Hello!
I recently processed the archives from modland, modarchive, and the keygen music library. I extracted all of the samples from these files and hashed them, which allows the files to be compared against each other. This means if you have one file you like, you can find other ones that share samples with it. This is great for finding remixes and similar songs. (It also helps with finding rippers.)
I'd also like to use this information to understand more about the artists and genres of module files, but unfortunately I do not see any database dumps available and I don't want to scrape this site. If you are interested in helping my work, I would greatly appreciate a TSV (preferred) or CSV dump containing the MD5 sum, numeric ID, genre, and artist of the module files. (This also makes my data more useful to others.)
Regardless, you can find my analysis of the files here:
https://iwalton.com/ushare/sample-index.gz
The format of the file is TSV, with the columns: File MD5, Sample MD5, Sample Length, Loop, Sample Name
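As a quick illustration of working with those columns, here is a rough sketch that lists the samples reused by the most files. The function name top_samples is made up for this example, and it assumes the download has been decompressed into a plain TSV with the file MD5 in column 1 and the sample MD5 in column 2. It keeps every (file, sample) pair in memory, so it may need a fair amount of RAM on the full index.

```shell
# top_samples INDEX [N]   (hypothetical helper, not part of the published data)
# Prints "<number of distinct files><TAB><sample MD5>" for the N most reused samples.
top_samples() {
    index=$1
    n=${2:-10}
    awk -F '\t' '
        # Assumed layout: column 1 = file MD5, column 2 = sample MD5.
        # Count each (file, sample) pair only once, even if it is listed twice.
        !seen[$1, $2]++ { files[$2]++ }
        END { for (s in files) printf "%d\t%s\n", files[s], s }
    ' "$index" | sort -t "$(printf '\t')" -k1,1nr -k2,2 | head -n "$n"
}
```

Example use, assuming the archive is an ordinary gzip file: gzip -dc sample-index.gz > sample-index, then top_samples sample-index 20.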
Additionally, here are files that also include the names of the files:
- Prefers Modland:
https://iwalton.com/ushare/sample-index-incl-filename.gz
- ModArchive Only:
https://iwalton.com/ushare/sample-index-modarchive.gz
If you'd like to find files similar to one you know of, you can use this shell command:
grep "md5-sum-of-file" sample-index | cut -f 2 | while read -r line; do grep "$line" sample-index | sort -u; done | cut -f 1 | sort | uniq -dc | sort -h
Edit: Also, here are computed ReplayGain values for all of the files too. The values were calculated by converting every mod file to a flac and running python-rgain over them.
https://iwalton.com/ushare/module-replaygain.gz
Edit 2: Here are guessed sample names:
https://iwalton.com/ushare/sample-names
Roughly 7.6 percent of the extracted samples have guessed names. Many of them may be incorrect. These were automatically guessed with this script:
https://pastebin.com/raw/Xn5N0ycs
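A note on the similarity search: the grep loop above rescans the whole index once per sample, which can be slow for files with many samples. Below is a sketch of a two-pass awk variant. The function name similar_files and its optional minimum argument are made up for illustration, and it assumes sample-index is the decompressed TSV with the column order listed above. It is not a drop-in replacement: it matches the file MD5 exactly in column 1, leaves the queried file out of the results, and counts each shared sample once per file. The default minimum of 2 shared samples mirrors the uniq -d filter.

```shell
# similar_files FILE_MD5 INDEX [MIN_SHARED]   (hypothetical helper)
# Prints "<shared sample count><TAB><file MD5>" for every other file that
# shares at least MIN_SHARED samples with FILE_MD5, fewest shared samples first.
similar_files() {
    target=$1
    index=$2
    min=${3:-2}
    awk -F '\t' -v target="$target" -v min="$min" '
        # Pass 1 (first read of the index): remember the target file samples.
        NR == FNR { if ($1 == target) wanted[$2] = 1; next }
        # Pass 2 (second read): count each matching (file, sample) pair once.
        $1 != target && ($2 in wanted) && !seen[$1, $2]++ { shared[$1]++ }
        END {
            for (f in shared)
                if (shared[f] >= min) printf "%d\t%s\n", shared[f], f
        }
    ' "$index" "$index" | sort -n
}
```

Example use: similar_files md5-sum-of-file sample-index. Passing 1 as a third argument also lists files that share only a single sample.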