Dataset creation challenges in AcousticBrainz

Datasets are an important part of the AcousticBrainz project. Every machine learning model that is used to calculate high-level information about recordings (genre, mood, danceability, etc.; see https://beta.acousticbrainz.org/485bbe7f-d0f7-4ffe-8adb-0f1093dd2dbf for an example) first needs to be trained on a dataset. Last year we released a platform that allows people to create and evaluate these datasets within AcousticBrainz. We’ve already seen a number of interesting datasets, and now we want to take this process a step further and make it more engaging.

Recently we started working on a new feature that allows us to organize dataset creation challenges. These challenges let us directly compare datasets created for the same classification task: genre, mood, instrumentation, etc. After a challenge ends, we can apply the best-performing models to all of the AcousticBrainz data.

Anyone can participate in a challenge, so we invite you to try the current version of the system at https://beta.acousticbrainz.org/! Right now there is only one challenge, about classifying music with and without vocals, but we might add more later. To participate in a challenge:

  1. Create a dataset manually or import it from an externally created CSV file (both can be done from your profile page). Make sure it has the structure defined in the challenge requirements, i.e. the set of classes “with vocals” and “without vocals”.
  2. Once you have built the dataset, click the “Evaluate” link on its page to go to the evaluation page. There, select the challenge you would like to submit your dataset to (search for “Classifying vocals”).
  3. Wait for results! We’ll probably post an update once we have something interesting to show.
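If you are preparing the CSV file outside AcousticBrainz, a small script can catch common mistakes before you import it. The sketch below is hypothetical: it assumes the importer accepts one `recording MBID,class name` pair per line with no header row (check the import form for the exact format it expects), and `build_dataset_csv` is an illustrative helper of our own, not part of AcousticBrainz. It normalizes each MBID, rejects class names outside the challenge’s two classes, and refuses to produce a file in which either class has no recordings.

```python
import csv
import io
import uuid

# Assumption: the "Classifying vocals" challenge uses exactly these two class names.
REQUIRED_CLASSES = {"with vocals", "without vocals"}


def build_dataset_csv(rows):
    """Return CSV text with one "recording MBID,class name" pair per line.

    rows is an iterable of (mbid, class_name) tuples. Raises ValueError for a
    malformed MBID, a class name outside REQUIRED_CLASSES, or a required
    class that ends up with no recordings.
    """
    counts = {name: 0 for name in REQUIRED_CLASSES}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for mbid, class_name in rows:
        # MBIDs are UUIDs; uuid.UUID raises ValueError for malformed strings,
        # and str() gives the canonical lowercase, hyphenated form.
        canonical = str(uuid.UUID(mbid.strip()))
        if class_name not in REQUIRED_CLASSES:
            raise ValueError(f"unknown class {class_name!r} for {canonical}")
        counts[class_name] += 1
        writer.writerow([canonical, class_name])
    # Both classes need at least one recording for a two-class model.
    empty = sorted(name for name, n in counts.items() if n == 0)
    if empty:
        raise ValueError(f"classes with no recordings: {empty}")
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder MBIDs: replace them with real recording MBIDs.
    print(build_dataset_csv([
        ("00000000-0000-0000-0000-000000000001", "with vocals"),
        ("00000000-0000-0000-0000-000000000002", "without vocals"),
    ]), end="")
```

Save the output to a `.csv` file and import it from your profile page.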

Please keep in mind that this is a very early prototype, so some issues are to be expected. This is why we ask you to try it and tell us what you think. We encourage you to report any problems or make suggestions in JIRA or in the #metabrainz IRC channel (https://wiki.musicbrainz.org/Communication/IRC). Feel free to use IRC or the comments section if you have any questions or thoughts. Thanks!

Several more useful features are coming later. The big ones are improvements to the dataset editor, an extension of the recently added datasets API, and a way to collect user feedback on high-level data. The dataset editor should become easier to use, especially with large datasets. The API will be useful for people who want to build their own tools on top of the core dataset functionality in AcousticBrainz. And finally, user feedback will allow us and other dataset creators to see how their models perform on a much larger scale.

5 thoughts on “Dataset creation challenges in AcousticBrainz”

  1. Nice interface, much easier than I expected!!
    I would have been interested in compiling a metal/hardcore punk list and seeing the results, but struggling to find vocal-less songs to compare! What kind of sets are useful for you?

  2. I have made a dataset. It would have been easier if I had been able to paste a URL instead of editing it manually: http://tickets.musicbrainz.org/browse/AB-138
    Then, when I click “Evaluate”, I have to type “v” for the challenge to show up; nothing shows up by default.
    Then, when I submit to “Classifying vocals”, it just says “Can’t add this dataset into evaluation queue because it’s incomplete.”

    Otherwise, it looks fun to do. 🙂

  3. Thank you!

    aerozol, I think whatever you come up with will be useful. You could also consider making a dataset for genre classification with music that you are interested in.
