Category Archives: Data

New MusicBrainz virtual machine released

I have recently released a new MusicBrainz virtual machine. This virtual machine includes all the important bits of MusicBrainz so you can run your own copy! I had hoped to hear from anyone who encountered problems with this VM, but I have not received any feedback. Here's hoping that no news is good news!

For information on how to download, install and access this new virtual machine, take a look at our MusicBrainz Server setup page. The new VM can be downloaded from here via direct download or a torrent download.

Most of the outstanding bugs should be fixed in this release — if not, please open a new ticket.

The end of the replication nightmare!

I’m pleased to report that our nightmare of finding/reconstructing the missing replication packets is finally over!

Through many heroic hours of work, Bitmap and Chirlu have reconstructed the missing replication packets. All clients should now be on their way to being up to date. We’ve learned a number of lessons (some good, some bad — that’s life, right?) in this ordeal and we hope to avoid these issues in the future.

An integral part of this recovery process was the help of a number of people from our community: users mbcz, rembo10 and xeam sent us their complete DB dumps! Bitmap used these to sanity check and diff several other databases to finally extract the missing packets. Thank you for dropping what you were doing and sending us a few GB of data over blazingly fast connections. Without you this would not have been possible, and this is not an exaggeration. Thank you!

After some more rest we’re going to continue to put out smaller fires that remain from the move to NewHost, but for now, the big fires are put out. Just in time for the weekend!

In the 11-year history of the replication stream we’ve had to ask users to restart their stream three or four times because of problems on our end. Zero would’ve been nicer, but I’m proud that we’ve been able to make this system work for so long. On a daily basis we seem to have about 400 replicated copies of MusicBrainz running all over the world. Clearly this part of our service is well used and I sleep a little better at night knowing that our most critical data is backed up across the globe.

Just for fun, here is a graph of the replication API usage over the last 6 months:

hourly-api-usage.png

Towards the end, the graph shows the break of more than a week, then a small blip as some of our replicas got unstuck yesterday, followed by a much larger spike as the rest of the replicas got unstuck. Now, as to what caused the blip in mid-October: I have no idea.

Anyways, please accept my apologies for the replication stream outage and keep replicating!

Thanks!

New MusicBrainz test VM available

There is a new test VM available for anyone who would like to try the latest, possibly not fully debugged, VM. I’m not sure why the VM is nearly 20GB larger than the previous one, while containing roughly the same stuff, but that is what we’re stuck with for this test. I’ll try harder to minimize the size for the final build.

Grab the VM here.

Read these Usage tips.

IMPORTANT: Please ignore the usage tips published on the wiki; they do not apply to this release. For the next release I’ll try to match more of the characteristics of the last version. Do read the usage tips above!

File a bug here.

New MusicBrainz server virtual machine available

Time to check the weather forecast for hell, because it appears to have frozen over! We have finally released a new virtual machine that contains all of the MusicBrainz server software and fixes all of the currently outstanding bugs (for the VM).

The new VM now uses a 64-bit architecture and has 80GB of disk space, so it should be much easier to get along with. I tried to ship one VM that has the search indexes built in, but after 3 hours (and counting) of trying to export that VM I killed it. If someone has better luck exporting a VM after building search indexes, please let me know. Also, VirtualBox seems to have improved in stability on Mac OS, so we are not going to build a VMware version of the VM at this time.

All the details for the new VM are on our Server Setup page.

Remember to get your Live Data Feed access token here if you plan to use the replication.

Downstream Wikipedia link usage and migration to Wikidata

MusicBrainz has linked to Wikipedia for many years and we now have links to Wikidata as well. Wikidata, however, acts as a central repository for Wikipedia links, so it does not make sense for MusicBrainz to maintain its own separate set of Wikipedia links, especially since Wikipedia URLs are not very stable (because of page moves and deletions) and require a lot of maintenance. Most of our data with Wikipedia links is now also linked to Wikidata, so we plan to start removing a Wikipedia link wherever the same entity has a Wikidata link that already points to that Wikipedia page.

What this means for downstream data users:

If you use Wikipedia links, we will provide Wikidata links but you will need to fetch the Wikipedia links you want from Wikidata separately. Wikidata has information on ways to access their data at https://www.wikidata.org/wiki/Wikidata:Data_access
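For anyone wondering what that separate fetch might look like, here is a minimal sketch in Python. It assumes you use the `wbgetentities` action of the Wikidata web API with `props=sitelinks/urls`, which is one of several access methods described on the page above. The helper names (`build_sitelinks_url`, `extract_wikipedia_urls`, `fetch_wikipedia_urls`) and the User-Agent string are made up for this example, and none of this is part of MusicBrainz itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_sitelinks_url(wikidata_id, sites=("enwiki",)):
    """Build a wbgetentities request URL asking only for sitelinks with URLs."""
    params = {
        "action": "wbgetentities",
        "ids": wikidata_id,
        "props": "sitelinks/urls",
        "sitefilter": "|".join(sites),  # e.g. "enwiki|dewiki"
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def extract_wikipedia_urls(response, wikidata_id):
    """Map site codes (e.g. 'enwiki') to page URLs from a decoded API response."""
    entity = response.get("entities", {}).get(wikidata_id, {})
    sitelinks = entity.get("sitelinks", {})
    return {site: link["url"] for site, link in sitelinks.items() if "url" in link}


def fetch_wikipedia_urls(wikidata_id, sites=("enwiki",)):
    """Perform the HTTP request (needs network access; not called on import)."""
    request = Request(
        build_sitelinks_url(wikidata_id, sites),
        # Hypothetical identifier; replace with one describing your own client.
        headers={"User-Agent": "example-downstream-client/0.1"},
    )
    with urlopen(request) as reply:
        return extract_wikipedia_urls(json.load(reply), wikidata_id)
```

In use, you would take the Wikidata ID from the Wikidata URL that MusicBrainz provides (the `Q…` part at the end) and call `fetch_wikipedia_urls` with the site codes for the languages you care about. If you need links for many entities, the bulk access options on the Wikidata page above are probably a better fit than one request per entity.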

We plan to start removing the links after the schema change this month, starting with the less common languages and entity types. It will take a while to work through the existing links, so we don’t expect to start removing English links from artists until after the Autumn schema change.

We recognise that some people may have code which depends on these links – if you’re using these links and the above sounds problematic, please let us know how you’re using the data (which languages and entity types) and how much time you would need to support Wikidata.

Editing: Making MusicBrainz better

Over the past few weeks I’ve received a number of emails from people who are concerned about some editors who are losing sight of some basic principles behind editing data in MusicBrainz. I wanted to chime in and remind people of some of the principles that should guide how we all get along when we edit data in MusicBrainz.

First and foremost is:

Be polite and give people the benefit of the doubt that they are doing the right thing.

I don’t have to explain being polite. Yes, we all have our bad days — that is a given. But if you’re having a bad day, stop editing MusicBrainz and step away from your computer. Go outside! When you do edit, please be kind to your fellow editors.

Giving people the benefit of the doubt that they are doing the right thing is also important. The vast majority of people who edit MusicBrainz have good intentions and you should assume that to be the case.

Second, edit to make the database better. Vote yes if an edit makes the data better.

This one is a lot more vague, since “better” is a subjective term. We should accept edits that are “good enough” and avoid asking people to make “perfect” edits.

Edits fit into four categories:

  1. Edits that make things better (perfect or not)
  2. Edits that make things different (but neither version is better)
  3. Edits that contain some correct things and some incorrect things
  4. Edits that are outright wrong (existing data is better)

The first type should clearly get a yes vote. For the second, if it doesn’t make things worse, abstain and leave a comment. The third is a judgement call and I would suggest applying this heuristic:

Unless it takes more time to fix the edit than to make a new one, vote yes.

Clearly, the fourth type deserves a no vote.

That brings me to the final topic for now: No votes. A no vote is a very strong expression that has potentially chilling effects that may prevent people from editing again. A no vote should be considered the last resort. Use a no vote if you can’t find another way to resolve an edit.

Finally, some tips for auto editors: If you see an edit that is not perfect, approve it and fix it.

Auto editors are supposed to set the tone for the project and auto editors should practically never vote no on something. You have more powers than fellow editors, so please use your powers for good!

Thanks and happy (and polite) editing!