Once again, we’re opting not to have a schema change release in the autumn of 2017. Expect our next schema change release on or about 15 May 2018.
The MetaBrainz Classical Music Enthusiasts Team is off to a strong start! If you are unaware of the team’s formation and the tasks at hand, you can read more about it on the forums.
It’s clear from the number of discussions and the level of engagement in the forum that a community effort on classical music was long overdue! It’s thrilling, and we are eager for the first mission: after some discussion and voting, we decided that the first community effort would be a clean-up of all our data for Claude Debussy.
As a composer with a huge influence on 20th-century music, yet with relatively few hard-to-edit compositions such as operas, Debussy is a great first choice for the community of classical editors to start actively working together to improve the data. If you’d like to help out but are new to classical editing or not too active in the community yet, don’t hesitate to reach out and ask questions. The classical community is active in its own forum category, and we’re hoping to see a lot of activity there, with editors both asking and answering questions.
What will we be working on in this first classical cleanup project?
- We will review the existing works and catalogues to make sure there are no duplicates and the info looks correct (several very active classical editors have already been working on this in preparation for this cleanup).
- We will check the release list for anything that doesn’t follow the classical guidelines. Those releases should of course be fixed to follow the guidelines; a release that ignores them is usually a sign that its recording and relationship info is incomplete as well.
- We will work on the recording list. The only recordings that should be there by the end of the cleanup are of Debussy himself as a performer. Anything else currently there should have performer relationships added to it if missing, then the artist credits for the recording should be changed to list the main performers.
- And we will add missing Debussy recordings! If you have enough info to add a release we’re missing that includes works by Debussy, that’s always useful. Just try to add as much info as possible from the get-go, so we don’t have to clean that addition up as well!
Don’t know where to begin? Let us know and we can help find a starting point, or just jump in and help out! We can’t wait for Mr. Debussy to be a great example of how much information MusicBrainz can provide!
ListenBrainz is a project that has the potential to gather a lot of data quickly, which would require us to have a Big Data infrastructure, and that can be expensive. In an effort to use our available cash wisely, we began to look around for ways to take advantage of other infrastructures at lower cost.
Two years ago at the Google Summer of Code mentor summit, I met with a representative from the BigQuery team who said that Google was happy to host any public data set for free! I immediately took them up on this offer and started a conversation. It took a long time, but we finally managed to get the data set live!
If you wish to play with the data, please do!
You’ll need a Google account to log in with. Once you’re logged in, every user gets 5 TB of query traffic free per month, which is quite a lot given how small this dataset currently is. The schema for this table is defined here, and the meaning of the data elements is defined in our API docs. To get you started, I’ve written a few sample queries:
- The last 25 listens for all users
- The top 25 artists for user “rob”
- Top tracks and artists for user “rob”
BigQuery uses an SQL-like syntax, so if you know some SQL, diving right in should be easy. The queries above should give you an idea of what you can do with this data. Please keep in mind that we currently have close to 30M listens, so the dataset is still quite small. We’re very interested to see what people come up with in the near future.
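If you don’t know SQL yet, the following rough Python sketch shows the kind of aggregation the sample queries above perform: the most recent listens across all users, and the most-played artists for one user. It runs on a tiny in-memory list, not on BigQuery. The field names (`user_name`, `artist_name`, `track_name`, `listened_at`) and the sample rows are assumptions made for illustration, so check the actual table schema before translating this into a query.

```python
from collections import Counter

# Hypothetical sample listens. The field names are assumptions for
# illustration and may not match the real BigQuery table schema.
listens = [
    {"user_name": "rob", "artist_name": "Portishead",
     "track_name": "Roads", "listened_at": 1506000000},
    {"user_name": "rob", "artist_name": "Massive Attack",
     "track_name": "Teardrop", "listened_at": 1506000300},
    {"user_name": "rob", "artist_name": "Portishead",
     "track_name": "Glory Box", "listened_at": 1506000600},
    {"user_name": "someone_else", "artist_name": "Claude Debussy",
     "track_name": "Clair de lune", "listened_at": 1506000900},
]


def last_listens(rows, limit=25):
    """Return the most recent listens for all users, newest first."""
    return sorted(rows, key=lambda row: row["listened_at"], reverse=True)[:limit]


def top_artists(rows, user_name, limit=25):
    """Return (artist, listen count) pairs for one user, most played first."""
    counts = Counter(
        row["artist_name"] for row in rows if row["user_name"] == user_name
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    for row in last_listens(listens, limit=2):
        print(row["user_name"], row["artist_name"], row["track_name"])
    for artist, count in top_artists(listens, "rob"):
        print(artist, count)
```

In SQL terms, `last_listens` corresponds to ordering by the listen timestamp and limiting the result, while `top_artists` corresponds to filtering on the user, grouping by artist, and ordering by the count.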
Finally, some notes about openness and proprietary software: Given that we have limited resources, we aim to make the most things happen with the services that are at our disposal. Google has been extremely generous to us over the years and we’re very pleased to have access to BigQuery now.
That is not to say that we’re putting all of our eggs in one basket or forcing people to use BigQuery. Our InfluxDB database, hosted on our own servers, keeps the master archival copy of our listen data. Soon we hope to make dumps of this data available for anyone to download and play with using whatever tools they like. With this setup we are not fully reliant on Google for keeping this project alive. We’re glad to have their support, but should circumstances change, we can find another Big Data solution and load our master archival copy there.
Now, go play with this very promising data and post some of your favorite queries in the comments!