Category Archives: Data

New MusicBrainz server virtual machine available

Time to check the weather forecast for hell, because it appears to have frozen over! We have finally released a new Virtual Machine that contains all of the MusicBrainz server software and fixed all of the currently outstanding bugs (for the VM).

The new VM now uses a 64-bit architecture and has 80GB of disk-space so it should be much easier to get along with. I tried to ship one VM that has the search indexes build in, but after 3 hours (and increasing time) of trying to export that VM I killed it. If someone has better luck exporting a VM after building search indexes, please let me know. Also, VirtualBox seems to have improved in stability on Mac OS, so we are not going to build a VMWare version of the VM at this time.

All the details for the new VM are on our Server Setup page.

Remember to get your Live Data Feed access token here if you plan to use the replication.

Downstream Wikipedia link usage and migration to Wikidata

MusicBrainz has linked to Wikipedia for many years and we now have links to Wikidata as well. Wikidata, however, acts as a central repository for Wikipedia links, so it does not make sense for MusicBrainz to maintain its own separate set of Wikipedia links, especially since Wikipedia URLs are not very stable (because of page moves and deletions) and require a lot of maintenance. Most of our data with Wikipedia links is now also linked to Wikidata, so we plan to start removing Wikipedia links where we have a Wikidata link which has the same Wikipedia link.

What this means for downstream data users:

If you use Wikipedia links, we will provide Wikidata links but you will need to fetch the Wikipedia links you want from Wikidata separately. Wikidata has information on ways to access their data at https://www.wikidata.org/wiki/Wikidata:Data_access

We plan to start removing the links after the schema change this month, starting with the less common languages and entity types. It will take a while to work through the existing links, so we don’t expect to start removing English links from artists until after the Autumn schema change.

We recognise that some people may have code which depends on these links – if you’re using these links and the above sounds problematic, please let us know how you’re using the data (which languages and entity types) and how much time you would need to support Wikidata.

Editing: Making MusicBrainz better

Over the past few weeks I’ve received a number of emails from people who are concerned about some editors who are losing sight of some basic principles behind editing data in MusicBrainz. I wanted to chime in and remind people of some of the principles that should guide how we all get along when we edit data in MusicBrainz.

First and foremost is:

Be polite and give people the benefit of the doubt that they are doing the right thing.

I don’t have to explain being polite. Yes, we all have our bad days — that is a given. But if you’re having a bad day, stop editing MusicBrainz and step away from your computer. Go outside! When you do edit, please be kind to your fellow editors.

Giving people the benefit of the doubt that they are doing the right thing is also important. The vast majority of people who edit MusicBrainz have good intentions and you should assume that to be the case.

Second, edit to make the database better. Vote yes if an edit makes the data better.

This one is a lot more vague, since “better” is a subjective term. We should accept edits that are “good enough” and avoid asking people to make “perfect” edits.

Edits fit into four categories:

  1. Edits that makes things better (perfect or not)
  2. Edits makes things different (but neither are better)
  3. Edits that contain some correct things and some incorrect things
  4. Edits that are outright wrong (existing data is better)

The first type should clearly get a yes vote. For the second, if it doesn’t make things worse, abstain and leave a comment. The third is a judgement call and I would suggest applying this heuristic:

Unless it takes more time to fix the edit than to make a new one, vote yes.

Clearly, the fourth type deserves a no vote.

That brings me to the final topic for now: No votes. A no vote is a very strong expression that has potentially chilling effects that may prevent people from editing again. A no vote should be considered the last resort. Use a no vote if you can’t find another way to resolve an edit.

Finally, some tips for auto editors: If you see an edit that is not perfect, approve it and fix it.

Auto editors are supposed to set the tone for the project and auto editors should practically never vote no on something. You have more powers than fellow editors, so please use your powers for good!

Thanks and happy (and polite) editing!

Announcing the AcousticBrainz project

MetaBrainz and the Music Technology Group at Universitat Pompeu Fabra are pleased to announce the first public release of the AcousticBrainz project.

http://acousticbrainz.org/

What is AcousticBrainz?
The AcousticBrainz project aims to crowd source acoustic information for all of the music in the world and make it available to the public. The goal of AcousticBrainz is to provide music technology researchers and open source hackers with a massive database of information about music.

AcousticBrainz uses a state of the art research project called Essentia (http://essentia.upf.edu/), developed over the last 10 years at the Music Technology Group.

Data generated from processing audio files with Essentia is collected by the AcousticBrainz project and made available to the public under the CC0 license (public domain). In 6 weeks since its inception, AcousticBrainz contributors have already submitted data for 650,000 audio tracks using pre-release software.

Today we are releasing client programs to submit data to the AcousticBrainz server and our first public release containing audio features for over 650,000 audio files.

What data does it have?
AcousticBrainz contains information called audio features. This acoustic information describes the acoustic characteristics of music and includes low-level spectral information such as tempo, and additional high level descriptors for genres, moods, keys, scales and much more. These features are explained in more detail at http://acousticbrainz.org/sample-data

How can I get it?
You can access AcousticBrainz data via our API. See details at http://acousticbrainz.org/api
We also provide downloadable dumps of the whole dataset. You can download it (all 13 gigabytes!) at http://acousticbrainz.org/download

What can I do with it?
We hope that this database will spur the development of new music technology research and allow music hackers to create new and interesting recommendation and music discovery engines. Here are some ideas of things we would like to see:

  • Music discovery
  • Playlist generation
  • Improving the state of the art in genre recognition
  • Analytics on the musical structure of popular music
  • and more!

This is one of the largest datasets of this kind available for research, and the only one of this size that we know of which contains both freely available data as well as the reference source code used to compute the data.

How can I contribute?
If you are a music researcher, you can help us by contributing to the essentia project. Go to the essentia homepage to see how you can do this. If you do something cool with the data let us know. We’d like to start a “made with AcousticBrainz” page where we will showcase interesting projects.

If you have any audio files, we would love for you to contribute audio features to our project. You can do this by downloading our submission clients from http://acousticbrainz.org/download. We provide clients for Windows, Mac, and Linux.

If you find any bugs or errors in the AcousticBrainz stack please let us know! Report issues to http://tickets.musicbrainz.org/browse/AB.

We can’t wait to see what kind of things you will make with our data.

The AcousticBrainz team.

May 2014 virtual machines released

I’m pleased to announce that we’ve just released the May 2014 Virtual Machine images of the MusicBrainz server!

If you’d like to direct download or BitTorrent download these images, head on over to our Server Setup page.

There have been no real changes to the setup of the VM images — we still publish a VirtualBox and a VMWare image. The instructions for using the VMs haven’t changed; the latest data and the May 26th version of the software are loaded on these images.

Have fun!

LinkedBrainz: Alive and well!

Barry Norton has been a star and has created and hosted RDF dumps of the MusicBrainz data and also established a permanent SPARQL endpoint for our data on linkedbrainz.org.

The timing of this is perfect, because our next release will remove the RDFa from our pages. Proper RDF data and a SPARQL end-point are the best ways to move forward with MusicBrainz data in the context of linked open data.

This gives the MusicBrainz development team the freedom to focus on making MusicBrainz better while leaving the nitty gritty parts of making our data friendly to the linked open data hackers to experts like Barry.

Thanks so much for making this happen, Barry!

Important Update on Replication

We’ve been informed that some people have noticed their MusicBrainz slave servers have stopped applying replication packets. We’ve tracked the problem down and now have a fix for this problem.

How Do I Tell If I’m Affected?

To determine if you are affected, connect to the MusicBrainz PostgreSQL database (we recommend using ./admin/psql from your musicbrainz-server checkout) and run the following:

SELECT now() - last_replication_date FROM replication_control;

If you see a date that is larger than a few days, there’s a good chance you are affected by this bug and should upgrade.

How Do I Fix This?

Fixing this is easy, you simply need to update your Git checkout. We suggest the following:

cd musicbrainz-server
git fetch --tags origin
git checkout v-2013-10-14-replication-fix-2

EDIT: formerly used a different (broken) tag and recommended a different method of fetching tags.

When replication next runs, the data that cannot be applied will be rewritten and replication will continue as normal.