I previously mentioned that Lucene rocks — well, that is not giving it enough credit. I’m working on the guts to a Lucene enabled Picard tagger, and in doing so I have created a simple script that chewed through a given set of mp3 files and attempts to match them up with MusicBrainz.
My friend Vee once gave me a CD full of hip-hop music to give to my GF. I took one look at it and stared in shock! What a mess — not many id3 tags, mostly no album names at all. Lots of friends vs friendz problems — much slang used in inconsistent ways. Ick!
I ran this through the old tagger a while back and it matched roughly 30% of the tracks. I’ve been using this set of files to tune the new tagging engine and once things got cached into memory, it chewed through over 100 files in under 7 seconds:
60% matched: 64 files matched, 41 files with suggestions, 1 files not matched.
60% !! Check the results for yourself!
And of the 41 files that have suggestions at least 80% of them have the correct match in the top 3 closest matches. I’m floored — it works so well, and there are a number of improvements still left to make. The downside? You need the 700Mb lucene index on your hard drive. That’s going to be more than 250Mb to download. I’ll have to work out the right combination of BitTorrent, caching, and P2P solutions to tackle that minor issue.
But this is really stunning!