Mod Archive Forums

Topics - iwalton3

Project / Coder's Corner / Related File Analysis
« on: February 17, 2020, 07:28:13 »
Hello!

I recently processed the archives from Modland, ModArchive, and the keygen music library. I extracted every sample from these files and hashed it, which allows cross-comparison between files. If you have one file you like, you can find others that share samples with it. This is great for finding remixes and similar songs, and also for finding rippers.

I'd also like to use this information to learn more about the artists and genres of module files, but unfortunately I do not see any database dumps available, and I don't want to scrape this site. If you are interested in helping with my work, I would greatly appreciate a TSV (preferred) or CSV dump containing the MD5 sum, numeric ID, genre, and artist of each module file. (This would also make my data more useful to others.)

Regardless, you can find my analysis of the files here: https://iwalton.com/ushare/sample-index.gz

The format of the file is TSV, with the columns: File MD5, Sample MD5, Sample Length, Loop, Sample Name
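
As a rough illustration of how those columns can be read, here is a minimal sketch that prints the sample MD5, length, and name for one module straight from the gzipped index. The function name samples_of is made up for this example, and it assumes the columns appear in exactly the order listed above with no header row.

```shell
# Hypothetical helper (not part of the published data): list the samples of one module.
# Usage: samples_of FILE_MD5 [INDEX_GZ]
samples_of() {
  # Assumed column order, taken from the post:
  # 1 File MD5, 2 Sample MD5, 3 Sample Length, 4 Loop, 5 Sample Name
  gzip -dc "${2:-sample-index.gz}" |
    awk -F'\t' -v file="$1" '$1 == file { print $2 "\t" $3 "\t" $5 }'
}
```

For example, samples_of "md5-sum-of-file" reads sample-index.gz from the current directory and prints one tab-separated line per sample in that module.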

Additionally, here are versions of the index that also include the file names:
 - Prefers Modland: https://iwalton.com/ushare/sample-index-incl-filename.gz
 - ModArchive Only: https://iwalton.com/ushare/sample-index-modarchive.gz

If you'd like to find files similar to one you know of, you can run this shell command against the decompressed index (sample-index):

grep "md5-sum-of-file" sample-index | cut -f 2 | while read -r line; do grep "$line" sample-index | sort -u; done | cut -f 1 | sort | uniq -dc | sort -h
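
The command above runs one grep over the whole index for every sample in the file, which can be slow on a large index. A possible alternative is the awk sketch below, which reads the index twice instead. It is only a sketch under a few assumptions: the function name similar_files is made up, the index is already decompressed, and matching is done on exact column values rather than substrings. It also behaves slightly differently from the one-liner: it counts each distinct shared sample once per file, leaves the known file itself out of the results, and lists files that share only a single sample as well.

```shell
# Hypothetical helper: rank other files by how many distinct samples they share
# with a known file.
# Usage: similar_files FILE_MD5 [INDEX]   (INDEX defaults to ./sample-index)
similar_files() {
  awk -F'\t' -v target="$1" '
    # Pass 1: remember every sample hash that the known file contains.
    NR == FNR { if ($1 == target) wanted[$2] = 1; next }
    # Pass 2: for every other file, count each shared sample hash once.
    ($2 in wanted) && $1 != target && !seen[$1, $2]++ { shared[$1]++ }
    # Output: shared-sample count, then file MD5.
    END { for (f in shared) print shared[f] "\t" f }
  ' "${2:-sample-index}" "${2:-sample-index}" | sort -n
}
```

Calling similar_files "md5-sum-of-file" prints one line per related file, with the closest matches at the bottom, in the same spirit as the sorted output of the original command.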

Edit: Here are computed ReplayGain values for all of the files as well. The values were calculated by converting every module file to FLAC and running python-rgain over the results: https://iwalton.com/ushare/module-replaygain.gz

Edit 2: Here are guessed sample names: https://iwalton.com/ushare/sample-names
Roughly 7.6 percent of the extracted samples have guessed names, and many of them may be incorrect. They were guessed automatically with this script: https://pastebin.com/raw/Xn5N0ycs
