I previously mentioned that Lucene rocks — well, that is not giving it enough credit. I’m working on the guts to a Lucene enabled Picard tagger, and in doing so I have created a simple script that chewed through a given set of mp3 files and attempts to match them up with MusicBrainz.
My friend Vee once gave me a CD full of hip-hop music to give to my GF. I took one look at it and stared in shock! What a mess — not many id3 tags, mostly no album names at all. Lots of friends vs friendz problems — much slang used in inconsistent ways. Ick!
I ran this through the old tagger a while back and it matched roughly 30% of the tracks. I’ve been using this set of files to tune the new tagging engine and once things got cached into memory, it chewed through over 100 files in under 7 seconds:
60% matched: 64 files matched, 41 files with suggestions, 1 files not matched.
60% !! Check the results for yourself!
And of the 41 files that have suggestions at least 80% of them have the correct match in the top 3 closest matches. I’m floored — it works so well, and there are a number of improvements still left to make. The downside? You need the 700Mb lucene index on your hard drive. That’s going to be more than 250Mb to download. 😦 I’ll have to work out the right combination of BitTorrent, caching, and P2P solutions to tackle that minor issue.
But this is really stunning!