Should we import all/some of the CD Baby data?

Derek from CD Baby and I have been discussing whether and how we should add CD Baby data to MusicBrainz. With that question in mind, I’ve taken a closer look at the CD Baby data to see what the corresponding releases in MusicBrainz look like. While there are many great-looking releases that match perfectly, there are lots of cases where CD Baby’s data differs a bit from ours. Read the full blog entry to see some examples.

We have a few ideas on what to do and I would like to get feedback from the community at large on these ideas:

  1. Automatically import all of the CD Baby releases that MusicBrainz doesn’t have. Assume that CD Baby data is always correct. None of these additions would go through the edit system.
  2. Provide a means to import CD Baby releases into MusicBrainz through an import system that works like the FreeDB import does now. Perhaps we could even make it a little smarter than the current FreeDB import.
  3. Something else. We’re open to suggestions.

What do you think? (See below for the promised examples)

For each of these examples, CD Baby is the top entry and MusicBrainz the bottom one:

10bass T — Do You Know the Way?:

1 Beat Generation

2 Hip Hop Culture

3 Good Times

4 10 Bass Hit

5 Scratch ‘n’ Sniff

6 What’s the Definitoin of a 10basst

7 Good Times (remix)

8 Open Your Eyes

9 Third World/first Person

10 People Gettin’ Down

11 Some Say We’re Spanish Some Say We’re Black

12 Beat Generation #2

10 Bass T — Do You Know the Way:

1 Beat Generation (Preview #1)

2 Hip Hop Culture

3 Good Times

4 10 Bass Hit (On & on)

5 Scratch ‘N’ Sniff

6 What’s the Definition of a 10BASST

7 Good Times (remix)

8 Open Your Eyes

9 Third World/First Person

10 People Gettin’ Down

11 Some Say We’re Spanish Some Say We’re Black

12 Beat Generation (Preview #2)

On MusicBrainz the version details in parentheses are much better filled out; this happens quite a bit. Then there are mostly benign cases where only caps are off:

the number twelve looks like you — nuclear sad nuclear:

1 The Devils Dick Disaster

2 Texas Dolly

3 Clarissa Explains Cuntainment

4 Track Four

5 The Proud Parent’s Convention Held In The ER

6 An Apathy Fictional Description

7 Like A Cat

8 Rememberance dialogue

9 An Excercise In Self Portraiture:go shoot yourself

10 Operating On A Re-run Episode

11 Track Eleven

12 Category

The Number Twelve Looks Like You — Nuclear. Sad. Nuclear.:

1 The Devil’s Dick Disaster

2 Texas Dolly

3 Clarissa Explains Cuntainment

4 Track Four

5 The Proud Parent’s Convention Held in the ER

6 An Aptly Fictional Description

7 Like a Cat

8 Rememberance Dialogue

9 An Exercise in Self-Portraiture: Go Shoot Yourself

10 Operating on a Re-Run Episode

11 Track Eleven

12 Category

(Not sure I want to ask about track #2.) And with classical we get into whole new territory:

Alex Masi — In the Name of Bach:

1 Toccata And Fugue In D Minor BWV 565

2 Prelude In G Major From Well Tempered Clavier BWV 870

3 Allemande From French Suite In D Minor

4 Toccata In E Minor BWV 565

5 Presto From Violin Sonata #1 In G Minor BWV 1001

6 Fugue In E Major From Well Tempered Clavier BWV 893

7 Contrapunctus #9 Alla Duodecima From Art Of Fugue BWV 1080

8 Siciliano In C Minor From Sonatas For Violin And Harpsichord BWV

9 Courante From English Suite In A Minor BWV 807

10 Invention F Major BWV 779

11 Allemande From Violin Partita #2 In D Minor BWV

12 Courante In E Minor BWV 996

13 Chromatic Fantasia BWV 903

14 Fugue BWV 903

15 Presto From Violin Sonata #1 (Acoustic)

Alex Masi — In The Name of Bach:

1 Toccata and fugue in D minor BWV 565

2 Preludie in G Major from Well Tempered Clavier BWV 870

3 Allemande from French Suite in D Minor BWV 812

4 Toccata in E Minor BWV 565

5 Presto from Violin Sonata #1 in G Minor BWV 1001

6 Fugue in E Major from Well Tempered Clavier BWV 893

7 Contrapunctus #9 Alia Duodecima from Art of Fugue BWV 1080

8 Siciliano in C Minor from Sonatas for Violin & Harpsichord BWV 101

9 Courante from English Suite in A Minor BWV 807

10 Invention F Major BWV 779

11 Allemande from Violin Partita #2 in D Minor BWV 1004

12 Courante in E Minor BWV 996

13 Chromatic Fantasia BWV 903

14 Fugue BWV 903

15 Presto from Violin Sonata #1 in G Minor BWV 1001 (acoustic version)
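
To make the kinds of mismatch above concrete, a smarter importer could sort each pair of track titles into rough buckets before a human looks at them. What follows is only a minimal Python sketch under my own assumptions; it is not existing MusicBrainz or CD Baby code, and the function name `classify` and the bucket labels are invented for illustration.

```python
import re
import unicodedata


def _fold_case(title: str) -> str:
    # Case-insensitive form of a title.
    return title.casefold()


def _fold_punctuation(title: str) -> str:
    # Case-insensitive form with punctuation removed and whitespace collapsed.
    text = unicodedata.normalize("NFKC", title).casefold()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def classify(cdbaby_title: str, musicbrainz_title: str) -> str:
    """Return a rough label for how two track titles differ.

    The labels are hypothetical buckets, not MusicBrainz terminology.
    """
    if cdbaby_title == musicbrainz_title:
        return "identical"
    if _fold_case(cdbaby_title) == _fold_case(musicbrainz_title):
        return "case-only"
    if _fold_punctuation(cdbaby_title) == _fold_punctuation(musicbrainz_title):
        return "punctuation-only"
    return "different"
```

With this sketch, "Like A Cat" against "Like a Cat" comes out as "case-only", while "Beat Generation" against "Beat Generation (Preview #1)" comes out as "different" and would still need an editor's eye. Misspellings and missing BWV numbers also land in "different", so this only separates the benign cases from the rest.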

6 thoughts on “Should we import all/some of the CD Baby data?”

  1. Geof F. Morris

    From a get-data-in-the-system perspective, I like the first option. From a keep-MB-the-way-it-has-operated perspective, I like the second one. On balance, I prefer the second, because I like MB’s principles.

  2. voiceinsideyou

    I wouldn’t mind an automatic import as long as it goes through the edit/approval system before actually getting into the DB. This would have to be trickled through to avoid overloading the edit queue and give people a chance to vote on the imports (or fix the entries).

    Although maybe it’s historical, and I’m sure the data quality on FreeDB is much lower, there sure are a lot of dodgy FreeDB imports to clean up, or dupes created by imports where MB style slightly disagrees with the imported release.

    So I guess I like the 2nd option of those two, if the option I suggest isn’t possible (or is retarded).

  3. HairMetalAddict

    Having lived through the nightmare that was the automatic FreeDB importer several years back, I fear auto-imports as do most others who were around for those days. 😛

    CDBaby’s data is obviously a lot better than FreeDB, so the fear is less for them.

    However, the post above shows that it’s not perfect. And because of that, it should be submitted for vote just like any other submission.

    Assuming the number of submissions per week is kept below a certain level (dunno what that level should be) and there’s people willing to check and clean up said submissions (done all under one account just like the old FreeDB importer always said submitted by FreeDB)…

    That would give time to:

    (1) Clean up the imperfections before it got through.
    (2) Verify it’s not a dupe. Said imperfections in CDBaby data can cause dupes to get through.

    At the very least, they would need to be done all under a single user account name, and that account should ONLY be used for adding releases, no other edits. Which would make it easier to dig through and clean up.

  4. artysmokes

    Option 2 sounds best to me. I had a very quick look at http://cdbaby.com/ and it seems to contain lots of useful data, but I’m unsure who creates the listings over there.
    With an automatic import of 180,000 releases, wouldn’t MBz be swamped with (possibly) duplicated data that none of our editors are ever likely to look at?

  5. Albin

    I agree with voiceinsideyou. If done that way and it turns out the user gets very few no votes, maybe a less restrictive method of importing the remaining data can be used.

    I think importing freedb data works great and I don’t think adding a CD Baby alternative would make a big difference. I would probably not even use it since freedb is a bigger database. Currently you can import track names from freedb, press the guess case button and manually go through it for corrections. A CD Baby lookup wouldn’t change that.

    In the future it’d also be nice if there would be ways to avoid forking the database. Like a two-way replication system with the live data feed one side and a web service for submitting data the other. I don’t know how that would work without risking issues with dupes though. Well.. just a thought.
