Scrapers

Scraping Module Overview

Skyscraper supports several online and local sources when scraping data for your roms. This makes Skyscraper a hugely versatile tool since it also caches any resources that are gathered from any of the modules. The cached data can then be used to generate a game list and composite artwork later.

Choosing a scraping module is as simple as setting the -s <MODULE> option when running Skyscraper on the command line. You must also set a platform with -p <PLATFORM>. If you leave out the -s option, Skyscraper goes into game list generation mode and combines your cached data into a game list for the chosen platform and frontend. Read more about the resource cache if needed.
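The two-step workflow described above (scrape into the cache, then generate a game list) can be sketched as a small shell helper. The function name scrape_then_generate is made up for this illustration; it only prints the commands so you can review them first. Only the -p and -s options mentioned above are used.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the two-step workflow for one platform.
# Remove the leading "echo" on both lines to actually execute the commands.
scrape_then_generate() {
    platform="$1"
    module="$2"
    # Step 1: gather data from the chosen module into the resource cache.
    echo Skyscraper -p "$platform" -s "$module"
    # Step 2: no -s option, so Skyscraper builds the game list from the cache.
    echo Skyscraper -p "$platform"
}

scrape_then_generate snes screenscraper
```

Running the snippet prints one scraping command followed by one game list generation command for the snes platform.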

For scraping modules that support or require user credentials, you can either set them on the command line with -u <USER:PASSWD> or -u <KEY> or, better yet, add them permanently to the Skyscraper configuration at /home/<USER>/.skyscraper/config.ini as described in the configuration documentation.

On existing Skyscraper installations, remember to adapt the priorities.xml file for the more recently added metadata such as Manuals, Fan Art and Back of Cover. You may want to review the section Resource and Scraping Module Priorities.

Capabilities of Scrapers

This table summarizes the game metadata provided by each scraping module:

Metadata columns: Title, Release Date, Description, Max. Players, Developer, Publisher, Genre/Tags, Rating, Age Recommend., Cover, Screenshot, Wheel/Logo, Marquee, Video, Manual (PDF) (v3.12+), Fan Art (v3.18+), Back of Cover (v3.18+), Texture

Scraper (metadata coverage), with the cells that carry a remark:

  • Arcade DB (11/18): Cover ✓ ¹
  • ES GameList (15/18): Marquee ✓ ²
  • GameBase (10/18): Age Recommend. ✓ ³
  • Internet Game DB (IGDB) (12/18)
  • File Import (18/18)
  • MobyGames (8/18): Release Date ✓ ⁴; Max. Players, Rating, Age Recommend. and Video: see ⁴
  • OpenRetro (11/18)
  • ScreenScraper (18/18)
  • The Games DB (14/18)
  • ZXInfo (10/18): Age Recommend. ✓ ⁵
Scraper coverage per metadata:

  • Title: 10/10
  • Release Date: 10/10
  • Description: 8/10
  • Max. Players: 9/10
  • Developer: 9/10
  • Publisher: 10/10
  • Genre/Tags: 10/10
  • Rating: 7/10
  • Age Recommend.: 6/10
  • Cover: 10/10
  • Screenshot: 10/10
  • Wheel/Logo: 4/10
  • Marquee: 6/10
  • Video: 4/10
  • Manual: 3/10
  • Fan Art: 5/10
  • Back of Cover: 4/10
  • Texture: 2/10

Remarks:
¹ Skyscraper uses ArcadeDB's Flyer and as a failsafe the Title screen, as Arcade games usually were not sold in a box
² For historical reasons the gamelist element marquee contains the logo (wheel)
³ GameBase provides only an adult flag, thus it is either 18 or no age rating
⁴ With a Hobbyist API subscription the release date contains only the first release date worldwide. Age Recommendation, Rating, Max. Players, Video and release date per platform require an APIv2 Bronze subscription or higher. It is very unlikely that Skyscraper will support anything other than a Hobbyist subscription.
⁵ The source zxinfo.dk provides only an x-rated flag, thus it is either 18 or no age rating

Recognized Keywords in Query

Supported formats of the --query="" parameter, per module:

  • arcadedb: Only title
  • esgamelist: No query supported
  • gamebase: Game filename, game title and game CRC (automatically detected). Except for CRC, globbing patterns (* and ?) can be used.
  • igdb: Title, or use id=... to query by IGDB game ID
  • import: No query supported
  • mobygames: Title or numeric MobyGames ID (see Moby ID: right below the title when displaying a game on the website)
  • openretro: Only title
  • screenscraper: romnom=, crc=, md5=, sha1=; use of gameid= may time out (see #195). See the Screenscraper documentation for a description of these parameters.
  • thegamesdb, tgdb: Only title
  • zxinfo (worldofspectrum): Title, game ID (id=...) or game file hash (MD5 or SHA512)
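As an illustration of the hash-based screenscraper keywords, the following sketch computes a file's MD5 or SHA1 sum and prints a matching --query option. The helper name screenscraper_query is hypothetical, the md5sum and sha1sum tools are assumed to be installed (GNU coreutils), and whether ScreenScraper expects lower-case or upper-case hex digits is not covered here; consult the Screenscraper documentation for that.

```shell
#!/bin/sh
# Hypothetical helper: build a --query option for the screenscraper module
# from a file's MD5 or SHA1 checksum.
# Usage: screenscraper_query md5|sha1 <file>
screenscraper_query() {
    kind="$1"
    file="$2"
    case "$kind" in
        md5)  sum=$(md5sum "$file" | cut -d' ' -f1) ;;
        sha1) sum=$(sha1sum "$file" | cut -d' ' -f1) ;;
        *)    echo "unsupported checksum type: $kind" >&2; return 1 ;;
    esac
    # Print the option only; paste it into your own Skyscraper command line.
    printf '%s\n' "--query=\"$kind=$sum\""
}
```

For a file whose content is the five bytes hello, screenscraper_query md5 prints --query="md5=5d41402abc4b2a76b9719d911017c592".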

Aliases for Game Filenames

Except for the Import and EmulationStation Gamelist scrapers, you can also define aliases for each game filename. If an alias is found, it is used when searching for the game's metadata. Consult the file aliasMap.csv for details.

Successor of 'World of Spectrum' is 'ZXInfo'

Thanks to some kind soul there is a fully functional ZX Spectrum scraping source again, and you can use it with Skyscraper 3.17 onwards. For you as a user nothing changes: you may continue to use the -s scraper values worldofspectrum, wos or zxinfo (preferred) to use this scraper. In addition, you may now also scrape by game id and game hash (see the --query option for details).

Characteristics for Each Scraping Module

ScreenScraper

  • Shortname: screenscraper
  • Type: Online
  • Website: www.screenscraper.fr
  • Type: Rom checksum based, Exact file name based
  • User credential support: Yes, and strongly recommended, but not required
  • API request limit: 20k per day for registered users
  • Thread limit: 1 or more depending on user credentials
  • Platform support: Check list under "Systémes" or see screenscraper_platforms.json sibling to your config.ini
  • Media support: backcover, cover, fanart, manual, marquee, screenshot, texture, video, wheel
  • Example use:
    Skyscraper -p snes -s screenscraper
    

ScreenScraper is probably the most versatile and complete retro gaming database out there. It searches for games using either the checksums of the files or by comparing the exact file name to entries in their database.

It can be used for gathering data for pretty much all platforms, but it does have issues with platforms that are ISO based. Still, even for those platforms, it does locate some games.

It has the best support for the wheel and marquee artwork types of any of the databases, and also contains videos, fanart, backcovers and manuals for a lot of the games.

I strongly recommend supporting them by contributing data to the database, or by supporting them with a bit of money. This can also give you more threads to scrape with.

Note

Exact file name matching does not work well for the arcade derived platforms in cases where a data checksum doesn't match. The reason being that arcade and other arcade-like platforms are made up of several subplatforms. Each of those subplatforms have a high chance of containing the same file name entry. In those cases ScreenScraper can't determine a unique game and will return an empty result.

TheGamesDB (TGDB)

  • Shortname: thegamesdb, tgdb
  • Type: Online
  • Website: www.thegamesdb.net
  • Type: File name search based
  • User credential support: Not required
  • API request limit: Limited to 3000 requests per IP per month
  • Thread limit: None
  • Platform support: Link to list or see tgdb_platforms.json sibling to your config.ini
  • Media support: backcover, cover, fanart, marquee, screenshot, wheel
  • Example use:
    Skyscraper -p snes -s thegamesdb
    

For newer games there's no way around TheGamesDB. It recently had a huge redesign and their database remains one of the best out there. I would recommend scraping your roms with screenscraper first, and then using thegamesdb to fill the gaps in your cache.

There's a small caveat to this module, as it has a request limit per IP per month (see above). But this should be plenty for most people.

Their API is based on a file name search. This means that the returned results do have a chance of being faulty. Skyscraper does a lot internally to make sure accepted data is for the correct game. But it is impossible to ensure 100% correct results, so do keep that in mind when using it. Consider using the --flags interactive command line flag if you want complete control of the accepted entries.

The backcover files scraped with this scraper are on average larger than 1MiB. You are likely to get back cover files between 2MiB and 5MiB. Keep this in mind when space is at a premium on your system. Screenscraper, in contrast, provides hi-res back cover files below 1MiB each; the majority are around 500KiB.
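The recommended order (screenscraper first, then thegamesdb to fill gaps, optionally with interactive confirmation) could look like the following sketch. The function name fill_gaps_plan is made up for this example and it only prints the commands; it uses nothing beyond the -p, -s and --flags interactive options mentioned in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the suggested scraping order for one platform.
# Remove the leading "echo" on both lines to actually run the commands.
fill_gaps_plan() {
    platform="$1"
    # First pass: checksum and exact file name matching via ScreenScraper.
    echo Skyscraper -p "$platform" -s screenscraper
    # Second pass: file name search; interactive mode lets you confirm each match.
    echo Skyscraper -p "$platform" -s thegamesdb --flags interactive
}

fill_gaps_plan snes
```

Leaving out --flags interactive in the second pass trades control for speed, which may be acceptable if you trust Skyscraper's internal matching.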

IGDB (Internet Game Database)

  • Shortname: igdb
  • Type: Online
  • Website: www.igdb.com
  • Type: File name or IGDB Game Id search based
  • User credential support: Yes, free private API client-id and secret-key required! Read more below
  • API request limit: A maximum of 4 requests per second is allowed
  • Thread limit: 4 (each being limited to 1 request per second)
  • Platform support: List
  • Media support: cover, fanart, screenshot
  • Example use:
    Skyscraper -p fba -s igdb <SINGLE FILE TO SCRAPE>
    Skyscraper -p fba -s igdb --startat <FILE TO START AT> --endat <FILE TO END AT>
    

IGDB is a relatively new database on the market, but by no means a bad one. It has made a big leap forward recently, placing it right after ScreenScraper and TheGamesDB.

You must register with the Twitch dev program (IGDB is owned by Twitch) and create a free client-id and secret-key pair for use with Skyscraper. Getting this free client-id and secret-key pair is quite easy. Just follow these steps:

[igdb]
userCreds="CLIENTID:SECRETKEY"

Substitute CLIENTID and SECRETKEY with your own details. And that's it, you should now be able to use the IGDB module.

ArcadeDB (by motoschifo)

  • Shortname: arcadedb
  • Type: Online
  • Website: adb.arcadeitalia.net, youtube
  • Contact: arcadedatabase@gmail.com
  • Type: Mame file name id based
  • User credential support: None required
  • API request limit: None
  • Thread limit: 1
  • Platform support: Exclusively arcade platforms using official MAME files
  • Media support: cover, marquee, screenshot, video, wheel
  • Example use:
    Skyscraper -p fba -s arcadedb
    

Several Arcade databases using the MAME file name id's have existed throughout the years. Currently the best one, in my opinion, is the ArcadeDB made by motoschifo. It goes without saying that this module is best used for arcade platforms such as fba, arcade and any of the mame sub-platforms.

As it relies on the MAME file name id when searching, there's no use trying to use this module for any non-MAME files. It won't give you any results.

This module also supports videos for many games.

OpenRetro

  • Shortname: openretro
  • Type: Online
  • Website: www.openretro.org
  • Type: WHDLoad uuid based, File name search based
  • User credential support: None required
  • API request limit: None
  • Thread limit: 1
  • Platform support: Primarily Amiga, but supports others as well. Check list here to the right
  • Media support: cover, marquee, screenshot
  • Example use:
    Skyscraper -p amiga -s openretro
    

If you're looking to scrape the Amiga RetroPlay LHA files, there's no better way to do this than using the openretro module. It is by far the best WHDLoad Amiga database on the internet when it comes to data scraping, and maybe even the best Amiga game info database overall.

The database also supports many non-Amiga platforms, but there's no doubt that Amiga is the strong point.

MobyGames

  • Shortname: mobygames
  • Type: Online
  • Website: www.mobygames.com
  • Type: File name or MobyGames ID search based
  • User credential support: Yes, an API key is required (see below)
  • API request limit: 1 request per 5 seconds (Hobbyist subscription)
  • Thread limit: 1
  • Platform support: List or see mobygames_platforms.json sibling to your config.ini
  • Media support: cover, screenshot
  • Example use:
    Skyscraper -p fba -s mobygames <SINGLE FILE TO SCRAPE>
    Skyscraper -p fba -s mobygames --startat <FILE TO START AT> --endat <FILE TO END AT>
    

MobyGames APIv2 imposes more limits than APIv1. Not only will you need a paid subscription (to get an API key), but even with the entry-level (Hobbyist) subscription you cannot scrape the same data as with APIv1. These are the limitations:

  • Release date will contain only the first release date worldwide with a Hobbyist API subscription.
  • Age Recommendation, Rating, Maximum of Players, Video and release date per platform require an APIv2 Bronze subscription or higher.

It is very unlikely that Skyscraper will support anything other than a Hobbyist subscription. It is saddening to see the service of MobyGames degrading after the acquisition by Atari SA.

However, once you have obtained an API key (starting with moby_...) add it to the userCreds configuration (without any colon) in the [mobygames] INI-file section.
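To get a feeling for what the Hobbyist limit of 1 request per 5 seconds means in practice, here is a small arithmetic sketch. The function name min_scrape_minutes is made up, and the number of API requests Skyscraper issues per game is not stated here, so the input is a request count, not a game count.

```shell
#!/bin/sh
# Hypothetical helper: lower bound (in whole minutes, rounded up) for sending
# a number of API requests at a fixed pace.
# Usage: min_scrape_minutes <requests> <seconds per request>
min_scrape_minutes() {
    requests="$1"
    seconds_per_request="$2"
    total_seconds=$(( requests * seconds_per_request ))
    # Integer ceiling division by 60.
    echo $(( (total_seconds + 59) / 60 ))
}

# 500 requests at 1 request per 5 seconds: 2500 seconds, prints 42
min_scrape_minutes 500 5
```

This is only a lower bound; network latency and any extra requests per game add to it.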

ZXInfo (formerly World of Spectrum)

  • Shortname: zxinfo, worldofspectrum, wos
  • Type: Online
  • Website: zxinfo.dk
  • Type: File name search, Game Id search or Game hash search
  • User credential support: None required
  • API request limit: None
  • Thread limit: None
  • Platform support: Exclusively ZX Spectrum games
  • Media support: cover, screenshot
  • Example use:
    Skyscraper -p zxspectrum -s zxinfo
    

If you're looking specifically for ZX Spectrum data, this is the module to use. ZXInfo is probably the most complete ZX Spectrum resource and information database in existence. I strongly recommend visiting the site if you have any interest in these little machines. It's a cornucopia of information on the platform.

Custom Resource Import

  • Shortname: import
  • Type: Local
  • Website: Documentation@github
  • Type: Exact file name match
  • User credential support: None required
  • API request limit: None
  • Thread limit: None
  • Platform support: All
  • Media support: backcover, cover, fanart, manual, marquee, screenshot, texture, video, wheel
  • Example use:
    Skyscraper -p snes -s import
    

The import scraper always has --refresh set to true.
Read the thorough description of the import module to learn about all its capabilities.

GameBase DB

  • Shortname: gamebase
  • Type: Local
  • Website: about the format
  • Type: filename, title or CRC match, for filename and title wildcards '*' and '?' can be applied anywhere
  • User credential support: None required
  • API request limit: None
  • Thread limit: 1
  • Platform support: Platforms for which the community has compiled a GameBase database; several dozen platforms have one. Some examples: Commodore machines (VC-20, C64, Plus/4, Amiga), Sinclair Spectrum ("Speccy"); see the most comprehensive list
  • Media support: cover, screenshot
  • Example use:
    Skyscraper -p zxspectrum -s gamebase
    

A GameBase DB is a community-driven effort to collect game information on the common game releases for a platform and, more importantly, on homebrew and indie releases. It is a great source for finding information about the games and other media in one place, which is otherwise scattered across the internet. Skyscraper only uses the game information, but a GameBase DB also contains, for example, information and files of the platform's former magazines and short manuals. The usual GameBase DB frontend is Windows based and the database is in Microsoft Access (*.mdb) format. Binary data is held in subfolders (e.g. Screenshots, Cover) on the filesystem.

Read the setup and config description of the GameBase DB module.

EmulationStation Style Gamelists

  • Shortname: esgamelist
  • Type: Local
  • Website: https://emulationstation.org
  • Type: Exact file name match
  • User credential support: None required
  • API request limit: None
  • Thread limit: None
  • Platform support: All
  • Media support: backcover, cover, fanart, manual, marquee, screenshot, video
  • Example use:
    Skyscraper -p snes -s esgamelist
    Skyscraper -p megadrive -f batocera --refresh -s esgamelist
    Skyscraper -p c64 -f batocera -s esgamelist <game filename(s)>
    

This module allows you to import data from an existing EmulationStation (ES) game list (flavors: RetroPie-ES, Batocera-ES and ES variants, but not ES-DE) into the Skyscraper cache. This is useful if you already have a lot of data and artwork in a gamelist.xml and associated media files and you wish to use it with Skyscraper. Usually this is a one-off scraper for each platform. If you want to re-import and overwrite already cached data from a previous run with this module, set the --refresh flag. These media types are implicitly enabled: backcover, fanart, manual and video.
For historical reasons the gamelist element marquee contains the logo (wheel). This scraper will use the marquee gamelist element and store it in the wheel cache media type, to ensure consistency when generating a gamelist from this import again. This is why <wheel> is not ingested with this scraper from the gamelist.
This scraper does not scrape the kidgame flag from the gamelist, as Skyscraper internally uses the age to determine the kidgame output. Ingesting the <kidgame> from the ES Gamelist would be a loss of precision.
Also <texture> is not ingested, as it is usually not present in the majority of gamelist flavors.
Finally, you may have to adjust your priorities file to put esgamelist data higher in the output precedence.

Sparse Import

Remember that you can also provide a single game file or a set of game files on the command line, or provide a list of game files (e.g., with includeFrom), to do a sparse import. In that case no --refresh flag has to be provided; it is implicitly enabled.

Skyscraper will search for the gamelist.xml file at <gameListFolder>/gamelist.xml which by default is /home/<USER>/RetroPie/roms/<PLATFORM>/gamelist.xml. Media denoted by relative paths in the gamelist is by convention relative to the input folder for EmulationStation.
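Before running the import it can be handy to check that the gamelist is where Skyscraper will look for it. The sketch below covers only the RetroPie default path quoted above and assumes that $HOME corresponds to /home/<USER>; the function names default_gamelist and check_gamelist are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: default gamelist location for a platform,
# assuming $HOME is /home/<USER> and the RetroPie default folder layout.
default_gamelist() {
    platform="$1"
    echo "$HOME/RetroPie/roms/$platform/gamelist.xml"
}

# Hypothetical helper: report whether the default gamelist exists.
check_gamelist() {
    path=$(default_gamelist "$1")
    if [ -f "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
    fi
}

check_gamelist snes
```

If you have configured a different gameListFolder, or use a non-RetroPie frontend, the path will differ and this check does not apply.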

Heads Up, Batocera Users

It is advised to use the frontend switch whenever this import is run for a non-RetroPie EmulationStation gamelist. That way the matching gamelist folder, gamelist filename and input folder will be used.