Scraping Module Overview
Skyscraper supports several online and local sources when scraping data for your roms. This makes Skyscraper a hugely versatile tool since it also caches any resources that are gathered from any of the modules. The cached data can then be used to generate a game list and composite artwork later.
Choosing a scraping module is as simple as setting the -s <MODULE> option when running Skyscraper on the command line. It also requires a platform to be set with -p <PLATFORM>. If you leave out the -s option, Skyscraper goes into game list generation mode and combines your cached data into a game list for the chosen platform and frontend. Read more about the resource cache if needed.
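The two modes described above can be sketched as a small shell helper. This is only an illustration: the function name scrape_then_generate and the dry-run SKYSCRAPER variable are made up for this example, and snes and screenscraper stand in for whatever platform and module you use.

```shell
# Dry run by default: the commands are printed, not executed.
# Set SKYSCRAPER=Skyscraper in the environment to run them for real.
SKYSCRAPER="${SKYSCRAPER:-echo Skyscraper}"

# Hypothetical helper: scrape one platform with one module,
# then build the game list from the cached data.
scrape_then_generate() {
  platform="$1"   # value for -p, e.g. snes
  module="$2"     # value for -s, e.g. screenscraper
  # Step 1: gather data from the module into the resource cache.
  $SKYSCRAPER -p "$platform" -s "$module"
  # Step 2: no -s option, so Skyscraper goes into game list generation mode.
  $SKYSCRAPER -p "$platform"
}

scrape_then_generate snes screenscraper
```

In dry-run mode this prints the two command lines in the order you would run them.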
For scraping modules that support or require user credentials, you can either set them on the command line with -u <USER:PASSWD> or -u <KEY> or, better yet, add them permanently to the Skyscraper configuration at /home/<USER>/.skyscraper/config.ini as described in the configuration documentation.
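As a rough sketch of the command line variant, the helper below passes credentials with -u. The function name scrape_with_creds and the dry-run SKYSCRAPER variable are assumptions for this example, and USER:PASSWD is a placeholder, not a real credential.

```shell
# Dry run by default: the command is printed, not executed.
# Set SKYSCRAPER=Skyscraper in the environment to run it for real.
SKYSCRAPER="${SKYSCRAPER:-echo Skyscraper}"

# Hypothetical helper: scrape with credentials given on the command line.
scrape_with_creds() {
  platform="$1"   # value for -p
  module="$2"     # value for -s
  creds="$3"      # USER:PASSWD or KEY, depending on the module
  $SKYSCRAPER -p "$platform" -s "$module" -u "$creds"
}

scrape_with_creds snes screenscraper 'USER:PASSWD'
```

Credentials given this way usually end up in your shell history, which is one practical reason to prefer the configuration file.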
On existing Skyscraper installations, remember to adapt the priorities.xml file for the more recently added metadata such as Manuals, Fan Art and Back of Cover. You may want to review the section Resource and Scraping Module Priorities.
Capabilities of Scrapers
This table summarizes the game metadata provided by each scraping module. Hover over a table cell to display the scraper module as tooltip:
| Metadata → Scraper (Metadata coverage) ↓ | Title | Release Date | Description | Max. Players | Developer | Publisher | Genre/Tags | Rating | Age Recommend. | Cover | Screenshot | Wheel/Logo | Marquee | Video | Manual (PDF) (v3.12+) | Fan Art (v3.18+) | Back of Cover (v3.18+) | Texture |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arcade DB (11/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ ¹ | ✓ | ✓ | ✓ | ✓ | |||||||
| ES GameList (15/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ ² | ✓ | ✓ | ✓ | ✓ | |||
| GameBase (10/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ ³ | ✓ | ✓ | ||||||||
| Internet Game DB (IGDB) (12/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| File Import (18/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| MobyGames (8/18) | ✓ | ✓ ⁴ | ✓ | See ⁴ | ✓ | ✓ | ✓ | See ⁴ | See ⁴ | ✓ | ✓ |  |  | See ⁴ |  |  |  |  |
| OpenRetro (11/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| ScreenScraper (18/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| The Games DB (14/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| ZXInfo (10/18) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ ⁵ | ✓ | ✓ | ||||||||
| Scraper coverage per metadata | 10/10 Title | 10/10 Release Date | 8/10 Description | 9/10 Max. Players | 9/10 Developer | 10/10 Publisher | 10/10 Genre/Tags | 7/10 Rating | 6/10 Age Recommend. | 10/10 Cover | 10/10 Screenshot | 4/10 Wheel/Logo | 6/10 Marquee | 4/10 Video | 3/10 Manual | 5/10 Fan Art | 4/10 Back of Cover | 2/10 Texture |
Remarks:
¹ Skyscraper uses ArcadeDB's Flyer and as a failsafe the Title screen, as Arcade games usually were not sold in a box
² For historical reasons the gamelist element marquee contains the logo (wheel)
³ GameBase provides only an adult flag, thus it is either 18 or no age rating
⁴ With a Hobbyist API subscription, the release date contains only the first worldwide release date. Age Recommendation, Rating, Max. Players, Video and release date per platform require an APIv2 Bronze subscription or higher. It is very unlikely that Skyscraper will support anything other than a Hobbyist subscription.
⁵ The source zxinfo.dk provides only an x-rated flag, thus it is either 18 or no age rating
Recognized Keywords in Query
| Module | Supported Formats for the --query="" Parameter |
|---|---|
| arcadedb | Only title |
| esgamelist | No query supported |
| gamebase | Game filename, game title and game CRC (automatically detected). Except for CRC, globbing patterns (* and ?) can be used. |
| igdb | Title or use id=... to query by IGDB game ID |
| import | No query supported |
| mobygames | Title or numeric MobyGames ID (see Moby ID: right below the title when displaying a game on the website) |
| openretro | Only title |
| screenscraper | romnom=, crc=, md5=, sha1=, use of gameid= may timeout (see #195) ; see Screenscraper documentation for description of these parameters |
| thegamesdb, tgdb | Only title |
| zxinfo (worldofspectrum) | Title, game Id (id=...) or game filehash (MD5 or SHA512) |
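
A hedged sketch of how such a query might be passed on the command line. The helper name scrape_one_with_query and the dry-run SKYSCRAPER variable are made up for this example. The CRC, the game id and the file name game.zip are placeholders. It is also an assumption here that the game file the query applies to is given as the last argument.

```shell
# Dry run by default: the commands are printed, not executed.
# Set SKYSCRAPER=Skyscraper in the environment to run them for real.
SKYSCRAPER="${SKYSCRAPER:-echo Skyscraper}"

# Hypothetical helper: scrape a single game file with an explicit query.
scrape_one_with_query() {
  platform="$1"   # value for -p
  module="$2"     # value for -s
  query="$3"      # one of the formats from the table above
  romfile="$4"    # assumption: the file the query is meant for
  $SKYSCRAPER -p "$platform" -s "$module" --query="$query" "$romfile"
}

# Placeholder values only, not real checksums or ids.
scrape_one_with_query snes screenscraper 'crc=0123ABCD' game.zip
scrape_one_with_query snes igdb 'id=1234' game.zip
```

When running for real, a title query containing spaces stays one argument because the helper quotes it.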
Aliases for Game Filenames
Except for the Import and EmulationStation Gamelist scraper you can also define aliases for each game filename. If an alias is found it is applied for searching the game's metadata. Consult the file aliasMap.csv for details.
Successor of 'World of Spectrum' is 'ZXInfo'
Thanks to some kind soul there is a fully functional ZXSpectrum scraping source again and you can use it with Skyscraper 3.17 onwards.
For you as a user nothing changes: you may continue to use the -s scraper values worldofspectrum, wos or zxinfo (preferred)
for this scraper. In addition, you may now also scrape by game id and game hash (see the --query option for details).
Characteristics for Each Scraping Module
ScreenScraper
- Shortname: screenscraper
- Type: Online
- Website: www.screenscraper.fr
- Type: Rom checksum based, Exact file name based
- User credential support: Yes, and strongly recommended, but not required
- API request limit: 20k per day for registered users
- Thread limit: 1 or more depending on user credentials
- Platform support: Check list under "Systémes" or see screenscraper_platforms.json sibling to your config.ini
- Media support: backcover, cover, fanart, manual, marquee, screenshot, texture, video, wheel
- Example use:
ScreenScraper is probably the most versatile and complete retro gaming database out there. It searches for games using either the checksums of the files or by comparing the exact file name to entries in their database.
It can be used for gathering data for pretty much all platforms, but it does have issues with platforms that are ISO based. Still, even for those platforms, it does locate some games.
It has the best support for the wheel and marquee artwork types of any of the databases, and also contains videos, fanart, backcovers and manuals for a lot of the games.
I strongly recommend supporting them by contributing data to the database, or by supporting them with a bit of money. This can also give you more threads to scrape with.
Note
Exact file name matching does not work well for the arcade-derived platforms in cases where a data checksum doesn't match. The reason is that arcade and other arcade-like platforms are made up of several subplatforms. Each of those subplatforms has a high chance of containing the same file name entry. In those cases ScreenScraper can't determine a unique game and will return an empty result.
TheGamesDB (TGDB)
- Shortname: thegamesdb, tgdb
- Type: Online
- Website: www.thegamesdb.net
- Type: File name search based
- User credential support: Not required
- API request limit: Limited to 3000 requests per IP per month
- Thread limit: None
- Platform support: Link to list or see tgdb_platforms.json sibling to your config.ini
- Media support: backcover, cover, fanart, marquee, screenshot, wheel
- Example use:
For newer games there's no way around TheGamesDB. It recently had a huge redesign and their database remains one of the best out there. I would recommend scraping your roms with screenscraper first, and then using thegamesdb to fill out the gaps in your cache.
There's a small caveat to this module, as it has a monthly request limit per IP (see above). But this should be plenty for most people.
Their API is based on a file name search. This means that the returned results do have a chance of being faulty. Skyscraper does a lot internally to make sure accepted data is for the correct game. But it is impossible to ensure 100% correct results, so do keep that in mind when using it. Consider using the --flags interactive command line flag if you want complete control of the accepted entries.
The backcover files scraped with this scraper are on average larger than 1MiB. You are likely to get back cover files between 2MiB and 5MiB. Keep this in mind when space is at a premium on your system. Screenscraper, in contrast, provides hi-res back cover files below 1MiB each; the majority are around 500KiB.
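The recommended order (screenscraper first, then thegamesdb, optionally with interactive confirmation) could look roughly like this. The helper name scrape_with_fallback and the dry-run SKYSCRAPER variable are assumptions for this sketch. Only the -p, -s and --flags interactive options are taken from this page.

```shell
# Dry run by default: the commands are printed, not executed.
# Set SKYSCRAPER=Skyscraper in the environment to run them for real.
SKYSCRAPER="${SKYSCRAPER:-echo Skyscraper}"

# Hypothetical helper: two scraping passes for one platform.
scrape_with_fallback() {
  platform="$1"   # value for -p
  # First pass: checksum and exact file name lookups via ScreenScraper.
  $SKYSCRAPER -p "$platform" -s screenscraper
  # Second pass: file name search via TheGamesDB. The interactive flag
  # lets you accept or reject each returned entry yourself.
  $SKYSCRAPER -p "$platform" -s thegamesdb --flags interactive
}

scrape_with_fallback snes
```

Both passes only add to the cache. The game list itself is generated afterwards by running Skyscraper without -s.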
IGDB (Internet Game Database)
- Shortname: igdb
- Type: Online
- Website: www.igdb.com
- Type: File name or IGDB Game Id search based
- User credential support: Yes, free private API client-id and secret-key required! Read more below
- API request limit: A maximum of 4 requests per second is allowed
- Thread limit: 4 (each being limited to 1 request per second)
- Platform support: List
- Media support: cover, fanart, screenshot
- Example use:
IGDB is a relatively new database on the market. But absolutely not a bad one at that. It has made a big leap forward recently, placing it right after Screenscraper and The Games DB.
You must register with the Twitch dev program (IGDB is owned by Twitch) and create a free client-id and secret-key pair for use with Skyscraper. The process is quite easy. Just follow these steps:
- Signup at Twitch
- Enable two-factor authentication (mandatory)
- Register an application (call it whatever you like)
- Manage your newly created application
- Add https://localhost as OAuth redirect URL
- Generate a secret-key by selecting the button New secret
- Add your client-id and secret-key pair to the Skyscraper config ini (/home/<USER>/.skyscraper/config.ini):
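
A minimal sketch of what that entry might look like, printed by a small shell helper so you can paste it into config.ini. The section name [igdb], the userCreds key, the quoting and the colon-separated form are assumptions here. They are modelled on the userCreds setting described for the [mobygames] section and on the -u <USER:PASSWD> option. The helper name igdb_config_fragment is made up for this example.

```shell
# Hypothetical helper: print the assumed config.ini entry for IGDB.
igdb_config_fragment() {
  client_id="$1"    # client-id from your Twitch application
  secret_key="$2"   # secret-key generated with "New secret"
  # Assumed layout: one userCreds line inside an [igdb] section.
  printf '[igdb]\nuserCreds="%s:%s"\n' "$client_id" "$secret_key"
}

igdb_config_fragment CLIENTID SECRETKEY
```
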
Substitute CLIENTID and SECRETKEY with your own details. And that's it, you should now be able to use the IGDB module.
ArcadeDB (by motoschifo)
- Shortname: arcadedb
- Type: Online
- Website: adb.arcadeitalia.net, youtube
- Contact: arcadedatabase@gmail.com
- Type: Mame file name id based
- User credential support: None required
- API request limit: None
- Thread limit: 1
- Platform support: Exclusively arcade platforms using official MAME files
- Media support: cover, marquee, screenshot, video, wheel
- Example use:
Several Arcade databases using the MAME file name ids have existed throughout the years. Currently the best one, in my opinion, is the ArcadeDB made by motoschifo. It goes without saying that this module is best used for arcade platforms such as fba, arcade and any of the mame sub-platforms.
As it relies on the MAME file name id when searching, there's no use trying to use this module for any non-MAME files. It won't give you any results.
This module also supports videos for many games.
OpenRetro
- Shortname: openretro
- Type: Online
- Website: www.openretro.org
- Type: WHDLoad uuid based, File name search based
- User credential support: None required
- API request limit: None
- Thread limit: 1
- Platform support: Primarily Amiga, but supports others as well. Check list here to the right
- Media support: cover, marquee, screenshot
- Example use:
If you're looking to scrape the Amiga RetroPlay LHA files, there's no better way to do this than using the openretro module. It is by far the best WHDLoad Amiga database on the internet when it comes to data scraping, and maybe even the best Amiga game info database overall.
The database also supports many non-Amiga platforms, but there's no doubt that Amiga is the strong point.
MobyGames
- Shortname: mobygames
- Type: Online
- Website: www.mobygames.com
- Type: File name or MobyGames ID search based
- User credential support: None required
- API request limit: 1 request per 5 seconds (Hobbyist subscription)
- Thread limit: 1
- Platform support: List or see mobygames_platforms.json sibling to your config.ini
- Media support: cover, screenshot
- Example use:
MobyGames APIv2 imposes more limits than APIv1. Not only will you need a paid subscription (to get an API key), but even with the entry-level (Hobbyist) subscription you cannot scrape the same data as with APIv1. These are the limitations:
- With a Hobbyist API subscription, the release date contains only the first worldwide release date.
- Age Recommendation, Rating, Maximum of Players, Video and release date per platform require an APIv2 Bronze subscription or higher.
It is very unlikely that Skyscraper will support anything other than a Hobbyist subscription. It is saddening to see the service of MobyGames degrading after the acquisition by Atari SA.
However, once you have obtained an API key (starting with moby_...) add it to
the userCreds configuration (without any colon) in the
[mobygames] INI-file section.
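
As an illustration of the resulting entry, the helper below prints a [mobygames] section with the key as a single value and no colon. The helper name mobygames_config_fragment is made up, moby_YOURKEY is a placeholder, and the double quotes around the value are an assumption.

```shell
# Hypothetical helper: print the config.ini entry for MobyGames.
mobygames_config_fragment() {
  api_key="$1"   # key as issued by MobyGames, starting with moby_
  # The key is a single value, so there is no USER:PASSWD style colon.
  printf '[mobygames]\nuserCreds="%s"\n' "$api_key"
}

mobygames_config_fragment moby_YOURKEY
```
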
ZXInfo (formerly World of Spectrum)
- Shortname: zxinfo, worldofspectrum, wos
- Type: Online
- Website: zxinfo.dk
- Type: File name search, Game Id search or Game hash search
- User credential support: None required
- API request limit: None
- Thread limit: None
- Platform support: Exclusively ZX Spectrum games
- Media support: cover, screenshot
- Example use:
If you're looking specifically for ZX Spectrum data, this is the module to use. ZXInfo is probably the most complete ZX Spectrum resource and information database in existence. I strongly recommend visiting the site if you have any interest in these little machines. It's a cornucopia of information on the platform.
Custom Resource Import
- Shortname: import
- Type: Local
- Website: Documentation@github
- Type: Exact file name match
- User credential support: None required
- API request limit: None
- Thread limit: None
- Platform support: All
- Media support: backcover, cover, fanart, manual, marquee, screenshot, texture, video, wheel
- Example use:
The import scraper always runs with --refresh set to true.
Read a thorough description of the import module to recognize all capabilities.
GameBase DB
- Shortname: gamebase
- Type: Local
- Website: about the format
- Type: filename, title or CRC match; for filename and title, the wildcards '*' and '?' can be applied anywhere
- User credential support: None required
- API request limit: None
- Thread limit: 1
- Platform support: Any platform for which the community has compiled a GameBase database (several dozen platforms have one). Some examples: Commodore machines (VC-20, C64, Plus/4, Amiga), Sinclair Spectrum ("Speccy"); see the most comprehensive list
- Media support: cover, screenshot
- Example use:
A GameBase DB is a community-driven effort to collect game information on the
common game releases for a platform and, more importantly, on Homebrew and
Indie games. It is a great source for finding information about the
games and other media in one place, which is otherwise scattered across the
internet. Skyscraper only uses the game information, but a GameBase DB also
contains, for example, information and files from the platform's former
magazines and short manuals. The usual GameBase DB frontend is Windows based,
and a database is in Microsoft Access (*.mdb) format. Binary data is held in
subfolders (e.g. Screenshots, Cover) on the filesystem.
Read the setup and config description of the GameBase DB module.
EmulationStation Style Gamelists
- Shortname: esgamelist
- Type: Local
- Website: https://emulationstation.org
- Type: Exact file name match
- User credential support: None required
- API request limit: None
- Thread limit: None
- Platform support: All
- Media support: backcover, cover, fanart, manual, marquee, screenshot, video
- Example use:
This module allows you to import data from an existing EmulationStation (ES)
(flavors: RetroPie-ES, Batocera-ES and ES variants, but not ES-DE) game list
into the Skyscraper cache. This is useful if you already have a lot of data and
artwork in a gamelist.xml and associated media files and you wish to use it
with Skyscraper. Usually this is a one-off scraper for each platform. If you
want to re-import and overwrite already cached data from a previous run with
this module, do set the --refresh flag. These mediatypes are implicitly set
on: backcover, fanart, manual and video.
For historical reasons the gamelist element marquee contains the logo (wheel).
This scraper will use the marquee gamelist element and store it in the wheel
cache media type, to ensure consistency when generating a gamelist from this
import again. This is why <wheel> is not ingested with this scraper from the
gamelist.
This scraper does not scrape the kidgame flag from the gamelist, as Skyscraper
internally uses the age to determine the kidgame output. Ingesting the
<kidgame> from the ES Gamelist would be a loss of precision.
Also <texture> is not ingested, as it is usually not present in the majority
of gamelist flavors.
You may also have to adjust your
priorities file to put
esgamelist data higher in the output precedence.
Sparse Import
Remember you can also provide a single game file or a set of game files on the
command line, or provide a list of game files (e.g., with
includeFrom), to do a sparse import.
In that case the --refresh flag does not have to be provided; it is set
implicitly.
Skyscraper will search for the gamelist.xml file at
<gameListFolder>/gamelist.xml which by default is
/home/<USER>/RetroPie/roms/<PLATFORM>/gamelist.xml. Media denoted by relative
paths in the gamelist is by convention relative to the input folder for
EmulationStation.
Heads Up, Batocera Users
It is advised to use the frontend switch whenever this import is run for a non-RetroPie EmulationStation gamelist. That way the matching gamelist folder, gamelist filename and input folder will be used.
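
A rough sketch of a full re-import from an existing gamelist, with and without a frontend switch. The helper name import_es_gamelist and the dry-run SKYSCRAPER variable are made up for this example. The -f option and the frontend name batocera are assumptions, so check the frontend documentation for the exact spelling.

```shell
# Dry run by default: the commands are printed, not executed.
# Set SKYSCRAPER=Skyscraper in the environment to run them for real.
SKYSCRAPER="${SKYSCRAPER:-echo Skyscraper}"

# Hypothetical helper: import an existing gamelist into the cache,
# overwriting data cached by an earlier esgamelist run.
import_es_gamelist() {
  platform="$1"        # value for -p
  frontend="${2:-}"    # optional frontend name (assumed -f option)
  if [ -n "$frontend" ]; then
    $SKYSCRAPER -p "$platform" -s esgamelist -f "$frontend" --refresh
  else
    $SKYSCRAPER -p "$platform" -s esgamelist --refresh
  fi
}

import_es_gamelist snes            # default gamelist location
import_es_gamelist snes batocera   # assumed frontend name
```
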