The BBC contributes some works data!

I’m very pleased to announce that the BBC has extracted a small chunk of its sizable Orpheus classical works database! The Orpheus database contains 114,160 works in total, with 206,179 sub-parts. The first 1,000 works are serialized into XML here:

Some notes about this data:

  1. The parties data hasn’t been matched to MusicBrainz artists yet. Nearly all composers have birth/death dates, which should help loads.
  2. Works can only have a single level of sub-parts, so for things like operas any deeper levels are encoded into the titles of the parts.
  3. There is also performance/recording information that could be included, but there might be political issues around that.
  4. More reference data is available, if that is useful.
  5. The party/party name mapping hasn’t been dumped properly. That can be done if it becomes a problem.
  6. The dump format can be adjusted if need be.
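
To illustrate note 1, here is a minimal sketch of how birth/death dates could help match parties to MusicBrainz artists. The `Person` record, its field names and the `match_parties` helper are all hypothetical; they are not taken from the dump’s actual schema or from any MusicBrainz API. The idea is to accept a match only when the normalized name and both years agree with exactly one artist.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record shape; the field names are assumptions, not the dump's schema."""
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def match_parties(parties, artists):
    """Map each party to an artist when exactly one artist shares its name and both dates."""
    index = {}
    for artist in artists:
        key = (normalize(artist.name), artist.birth_year, artist.death_year)
        index.setdefault(key, []).append(artist)

    matches = {}
    for party in parties:
        if party.birth_year is None or party.death_year is None:
            continue  # without both dates, leave the party for manual review
        key = (normalize(party.name), party.birth_year, party.death_year)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # skip ambiguous or missing candidates
            matches[party] = candidates[0]
    return matches


parties = [Person("Antonín Dvořák", 1841, 1904), Person("John Smith", None, None)]
artists = [Person("Antonin Dvorak", 1841, 1904), Person("John Smith", 1900, 1950)]
matches = match_parties(parties, artists)
```

Here only the Dvořák party is matched; the undated party is left alone rather than guessed at. A sketch like this won’t cope with reordered names such as “Surname, Forename”, so anything it skips would still need a human to look at it.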

The purpose of this data is twofold: First, I asked the BBC to provide us with some data that would allow us to do a good job of establishing our database of Works. As of right now we have not defined what exactly a Work in NGS is and what it will do. There are entire sets of Advanced Relationship link types linking to Works that are yet to be defined, and we need to start working on those link types soon. We also need to define the list of Work types that we’ll allow in NGS.

Second, this data would be awesome to have in MusicBrainz. While we haven’t been granted access to all of this data, that is certainly a possibility moving forward. The BBC has spent a lot of time over the years grooming this database, so importing the complete data set could be a great jump forward for MusicBrainz’s classical knowledge. However, this should be mostly a thought exercise for now – we’re not about to import any data, since we have a ton of other things to do. Once we’re comfortable with our Works setup, we can start considering if, how and when we’d integrate this data.

Big thanks to the BBC, and especially to Nick Humfrey, who tracked down this data and obtained permission for its release!