Category Archives: Data

Server update, 2018-08-14, with new virtual machine

A new virtual machine for running your own mirror of MusicBrainz is now available. Thanks to InvisibleMan78 for the continuous feedback and to Jeff Sturgis for his work with Docker Compose! This is the last virtual machine to include the soon defunct search server. Links:

Main changes:

  • Added helper script set-web-server-name to allow connecting to the web site from a remote host;
  • Added helper script turn-port to allow turning service ports on/off for db (postgresql), musicbrainz, redis, and search;
  • Fixed docker containers startup on VM boot;
  • Fixed potential access token loss due to helper scripts reindex;
  • Fixed and improved documentation (now copied inside the VM);
  • Improved error handling in helper scripts;
  • Updated MusicBrainz server, search server, base image and dependencies.

This server release converts more search results pages to React and fixes seven bugs. Thanks to @riipah for fixing @TouhouDB handler. Search scores are no longer displayed on search results page for clarity as they misleadingly weighed down the the importance of results appearing on the first page but not in the first place. The git tag is v-2018-08-14.

Sub-task

  • [MBS-9760] – Convert the event search results page to React
  • [MBS-9761] – Convert the series search results page to React

Bug

  • [MBS-9687] – Field error messages are not always translated
  • [MBS-9694] – TouhouDB URLs are not recognized
  • [MBS-9705] – URL overview header link is now appended with /show
  • [MBS-9744] – OAuth userinfo endpoint returns latin1 encoded JSON
  • [MBS-9755] – Fix unknown search in statistics for scripts
  • [MBS-9757] – Search page form is sometimes in the wrong language
  • [MBS-9784] – IMDb verification too strict (# of digits)

Task

  • [MBS-9780] – Normalize Creative Commons URLs to HTTPS

Improvement

  • [MBS-9767] – Remove search score display from the UI

New MusicBrainz virtual machine released

I have recently released a new MusicBrainz virtual machine. This virtual machine includes all the important bits of MusicBrainz so you can run your own copy! I’d been hoping for feedback if people have encountered any problems with this VM, but I’ve not received any feedback. Here is to hoping that no news is good news!

For information on how to download, install and access this new virtual machine, take a look at our MusicBrainz Server setup page. The new VM can be downloaded from here via direct download or a torrent download.

Most of the outstanding bugs should be fixed in this release — if not, please open a new ticket.

The end of the replication nightmare!

I’m pleased to report that our nightmare of finding/reconstructing the missing replication packets is finally over!

Through many heroic hours of work, Bitmap and Chirlu have reconstructed the missing replication packets. All clients should now be on their way to being up to date. We’ve learned a number of lessons (some good, some bad — that’s life, right?) in this ordeal and we hope to avoid these issues in the future.

An integral part of this recovery process were a number of people from our community who helped us: Users mbcz, rembo10 and xeam sent us their complete DB dumps! Bitmap used these to sanity check and diff several other database to finally extract the missing packets. Thank you for dropping what you were doing and sending us a few GB of data over blazingly fast connections. Without you this would not have been possible; and this is not an exaggeration. Thank you!

After some more rest we’re going to continue to put out smaller fires that remain from the move to NewHost, but for now, the big fires are put out. Just in time for the weekend!

In the 11 year history of the replication stream we’ve had to have users restart their stream about 3-4 times because of problems on our end. Zero would’ve been nicer, but I’m proud that we’ve been able to make this system work for so long. On a daily basis we seem to have about 400 replicated copies of MusicBrainz running all over the world. Clearly this part of our service is well used and I sleep a little better at night knowing that our most critical data is backed up across the globe.

Just for fun, here is a graph of the replication API usage over the last 6 months:

hourly-api-usage.png

Towards the end the graph shows the week plus long break, then a small blip as some of our replicas got unstuck yesterday and the much larger spike shows the rest of the replicas getting unstuck. Now, as to what caused the blip in mid-October — I have no idea.

Anyways, please accept my apologies for the replication stream outage and keep replicating!

Thanks!

New MusicBrainz test VM available

There is a new test VM available for anyone who would like to try the latest, possibly not fully debugged, VM. I’m not sure why the VM is nearly 20GB larger than the previous one, while containing roughly the same stuff, but that is what we’re stuck with for this test. I’ll try harder to minimize the size for the final build.

Grab the VM here.

Read these Usage tips.

IMPORTANT: Please ignore the usage tips published on the wiki — they do not apply to this release. For the next release I’ll try and match more of the characteristics of the last version. Do read the usage tips above!

File a bug here.

New MusicBrainz server virtual machine available

Time to check the weather forecast for hell, because it appears to have frozen over! We have finally released a new Virtual Machine that contains all of the MusicBrainz server software and fixed all of the currently outstanding bugs (for the VM).

The new VM now uses a 64-bit architecture and has 80GB of disk-space so it should be much easier to get along with. I tried to ship one VM that has the search indexes build in, but after 3 hours (and increasing time) of trying to export that VM I killed it. If someone has better luck exporting a VM after building search indexes, please let me know. Also, VirtualBox seems to have improved in stability on Mac OS, so we are not going to build a VMWare version of the VM at this time.

All the details for the new VM are on our Server Setup page.

Remember to get your Live Data Feed access token here if you plan to use the replication.

Downstream Wikipedia link usage and migration to Wikidata

MusicBrainz has linked to Wikipedia for many years and we now have links to Wikidata as well. Wikidata, however, acts as a central repository for Wikipedia links, so it does not make sense for MusicBrainz to maintain its own separate set of Wikipedia links, especially since Wikipedia URLs are not very stable (because of page moves and deletions) and require a lot of maintenance. Most of our data with Wikipedia links is now also linked to Wikidata, so we plan to start removing Wikipedia links where we have a Wikidata link which has the same Wikipedia link.

What this means for downstream data users:

If you use Wikipedia links, we will provide Wikidata links but you will need to fetch the Wikipedia links you want from Wikidata separately. Wikidata has information on ways to access their data at https://www.wikidata.org/wiki/Wikidata:Data_access

We plan to start removing the links after the schema change this month, starting with the less common languages and entity types. It will take a while to work through the existing links, so we don’t expect to start removing English links from artists until after the Autumn schema change.

We recognise that some people may have code which depends on these links – if you’re using these links and the above sounds problematic, please let us know how you’re using the data (which languages and entity types) and how much time you would need to support Wikidata.