Dataset creation challenges in AcousticBrainz

Datasets are an important part of the AcousticBrainz project. All machine learning models that are used to calculate high-level information about recordings (genre, mood, danceability, etc.; see https://beta.acousticbrainz.org/485bbe7f-d0f7-4ffe-8adb-0f1093dd2dbf for an example) first need to be trained on a dataset. Last year we released a platform that allows people to create and evaluate these datasets within AcousticBrainz. We’ve already seen a number of interesting datasets, and now we want to take this process a step further and make it more interesting.

Recently we started working on a new feature that allows us to organize dataset creation challenges. These challenges allow us to directly compare datasets created for the same classification tasks: genre, mood, instrumentation, etc. After a challenge ends, we can use the best models on all of the AcousticBrainz data.

Anyone can participate in a challenge, so we invite you to try the current version of the system at https://beta.acousticbrainz.org/! Right now there’s only one challenge, on classifying music with and without vocals, but we might add more later. To participate in a challenge:

  1. Create a dataset manually or by importing it from a CSV file created externally (this can be done from your profile page). Make sure it has the same structure (set of classes: “with vocals”, “without vocals”) as defined in the challenge requirements.
  2. Once you have built the dataset, select the “Evaluate” link on its page to go to the evaluation page. There, select the challenge that you would like to submit your dataset to (search for “Classifying vocals”).
  3. Wait for results! We’ll probably post an update once we have something interesting to show.
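If you prepare the CSV file outside of AcousticBrainz, it may help to check the class names before importing. The sketch below is only an illustration: the two-column layout (recording MBID, class name) is an assumption rather than the documented import format, the function names `check_classes` and `write_dataset_csv` are made up for this example, and the MBIDs are placeholders. The class names are the ones required by the challenge.

```python
import csv
import io

# Class names required by the "Classifying vocals" challenge, as stated above.
REQUIRED_CLASSES = {"with vocals", "without vocals"}


def check_classes(rows):
    """Compare the class names used in rows against REQUIRED_CLASSES.

    rows is an iterable of (recording_mbid, class_name) pairs.
    Returns (missing, unexpected) as two sets of class names.
    """
    found = {class_name for _mbid, class_name in rows}
    return REQUIRED_CLASSES - found, found - REQUIRED_CLASSES


def write_dataset_csv(rows, out):
    """Write one "mbid,class" line per recording to the file-like object out.

    ASSUMPTION: this two-column layout is hypothetical; check the import
    page for the format the dataset editor actually expects.
    """
    rows = list(rows)
    missing, unexpected = check_classes(rows)
    if missing or unexpected:
        raise ValueError(
            f"missing classes: {sorted(missing)}, "
            f"unexpected classes: {sorted(unexpected)}"
        )
    csv.writer(out).writerows(rows)


if __name__ == "__main__":
    # Both MBIDs are made-up placeholders, not real recordings.
    example = [
        ("11111111-1111-1111-1111-111111111111", "with vocals"),
        ("22222222-2222-2222-2222-222222222222", "without vocals"),
    ]
    buffer = io.StringIO()
    write_dataset_csv(example, buffer)
    print(buffer.getvalue())
```

Raising an error when a class is missing or misspelled catches structure mismatches locally, before the dataset is submitted to the challenge.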

Please keep in mind that this is a very early prototype, so some issues are to be expected. This is why we ask you to try it and tell us what you think. We encourage you to report any problems or make suggestions in JIRA or in the #metabrainz IRC channel (https://wiki.musicbrainz.org/Communication/IRC). Feel free to use IRC or the comments section if you have any questions or thoughts. Thanks!

We have several more useful features coming up later. The big ones are improvements to the dataset editor, an extension of the API for datasets that was added recently, and a way to collect user feedback on high-level data. The dataset editor should become easier to work with, especially when working with large datasets. The API will be useful for people who want to build their own tools on top of core dataset functionality in AcousticBrainz. And finally, user feedback will allow us and other dataset creators to see how their models perform on a much larger scale.

Server update, 2016-07-04

This is a small bug-fix release. Server development has slowed down while we work on writing Docker containers for our new hosting infrastructure, but code contributions are always welcome. Thanks to Ulrich Klauer (chirlu) for his work on MBS-8806 in this release.

The git tag is v-2016-07-04.

Bug

  • [MBS-8806] – Artist credits disappearing from tracklist & recordings after editing
  • [MBS-8987] – Untouched URLs are automatically/suddenly cleaned up
  • [MBS-8996] – Error message when trying to merge releases displays as “ARRAY(stuff)”
  • [MBS-8997] – Merging releases with “(Disc 1)”, “(Disc 2)” etc in titles selects all as disc 1
  • [MBS-8999] – Track lengths are in the wrong table column in the release editor

New Feature

  • [MBS-8994] – Allow passing &dismax=true through to the search server so Picard can show the same results as on the website
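
For anyone building their own client, here is a rough sketch of how a search URL with this parameter could be put together. It assumes the parameter is accepted on the `/ws/2/` search endpoints of the web service; the helper name `build_search_url` is made up for this example, and no request is actually sent.

```python
from urllib.parse import urlencode

# Root of the MusicBrainz web service (version 2).
WS_ROOT = "https://musicbrainz.org/ws/2"


def build_search_url(entity, query, dismax=False, fmt="json"):
    """Build a search URL for the given entity type (e.g. "artist").

    ASSUMPTION: dismax=true is passed as a plain query-string parameter
    next to the query itself, as the ticket title suggests.
    """
    params = {"query": query, "fmt": fmt}
    if dismax:
        params["dismax"] = "true"
    return f"{WS_ROOT}/{entity}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("artist", "pink floyd", dismax=True))
```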

Server update, 2016-06-20

Our release today contains the usual lot of bug fixes and improvements. Thanks to yvanz, chirlu, and reosarevok for their contributions. The git tag is v-2016-06-20.

Bug

  • [MBS-7358] – Edit types not translated on Release editor / edit note tab
  • [MBS-7963] – Release editor doesn’t prevent too long disambiguation comments
  • [MBS-8809] – ASIN cleanup can be defective
  • [MBS-8951] – Recent notes is broken
  • [MBS-8968] – JSON version of artist includes gender-id value, which has no xml equivalent
  • [MBS-8977] – Edit searches with relationship type conditions cause an internal server error
  • [MBS-8983] – « Show only results that are in my subscribed entities. » checkbox broken
  • [MBS-8985] – Fix CPDL URLs matching and clean it up
  • [MBS-9000] – HTML special characters are not escaped on aliases page

Improvement

  • [MBS-7111] – Improve URL matching for musik-sammler.de links

Task

  • [MBS-8982] – Stop “fixing” piano with Guess Case

Massive connectivity issues

As you are probably aware, we’ve been having lots of network connectivity issues with all services hosted at Digital West in California (all of our projects, except ListenBrainz and AcousticBrainz).

Today we spent all morning trying to replace what we thought to be a faulty switch. That process didn’t go very well at all – we hit every conceivable issue that we could’ve hit. And a few more.

But in the process, we connected our gateway machines directly to our uplink (not through our switch), and the network issues persisted! After testing this setup with both of our machines, we’ve now conclusively eliminated all of our equipment as a possible source of trouble.

At this point, it is up to Digital West to fix our troubles. Thankfully, the day staff will return to work in a few hours, and hopefully we will make some progress on this issue then.

Sorry for all of this hassle.😦

Server update, 2016-06-06

This server release contains fixes for bugs introduced in the recent schema change, and some other small improvements listed below. Thanks to reosarevok, chirlu, yvanz, and gcilou for their contributions this time around. The git tag is v-2016-06-06.

Bug

  • [MBS-7986] – Misleading error message for event setlists
  • [MBS-8941] – Beta: Last AC in a release sometimes doesn’t change with the rest when changing release artist
  • [MBS-8942] – Beta: cursor moves to the end of AC search field when removing a letter in the middle of the name
  • [MBS-8950] – iTunes favicon is not displayed on the sidebar
  • [MBS-8959] – Can’t add new packaging types – ISE

Improvement

  • [MBS-8824] – Update the rateyourmusic logo used in the sidebar
  • [MBS-8925] – Limit BookBrainz relationship to BookBrainz URLs

Task

  • [MBS-8957] – Update SecondHandSongs icon and add BBC logo in the sidebar

Updated search jar/war files

Given the utter slackers we are, we haven’t yet finished updating the search server to output the new MBIDs that were added to some entities in our last release. We’ll try and get that done soonish.

However, we did update the search code to fix this error in the search indexer:

ERROR: type “earth” does not exist

I’ve put both of these jar/war files on our FTP site:

If you would like to try and build these from source, you’ll need commit 4f677727 from mmd-schema and the latest master commit from search-server. To build them, please follow these instructions.

UPDATE: The build from the current master for search-server appears to not be able to load indexes upon startup. Please use the old war (we still use this in production) until we can release a fix.

September, October, and November 2015 Community Recap

Jeez. This has been overdue! A lot of things happened, and this Community Recap series kind of got put on the back burner, which obviously means a lot of things have piled up. If I forgot something you think should have been mentioned, please share it in the comments below. 🙂 I will be doing a few months at a time until I’ve caught up to the now().

With that said, let’s proceed!