Once again, we’re opting to not have a schema change release in the Autumn of 2017. Expect our next schema change release to be on or about 15 May, 2018.
ListenBrainz is a project is that has the potential to gather a lot of data quickly, which would require us to have a Big Data infrastructure, which can be expensive. In an effort to use our available cash wisely, we began to look around for ways to take advantage of other infrastructures with lower costs.
Two years ago at the Google Summer of Code mentor summit I met with a representative from the BigQuery team who said that Google was happy to host any public data set for free! I immediately took them up on this offer and started a conversation. With much time passed, we finally managed to get the data set live!
If you wish to play with the data, please do!
You’ll need a Google account to log in with — once you’re logged in, every user gets 5TB of query traffic free per month. That is quite a lot for how large this dataset is currently. The schema for this table is defined here and what the data elements mean are defined in our API docs. To get you started, I’ve written a few sample queries:
- The last 25 listens for all users
- The top 25 artists for user “rob”
- Top tracks and artists for user “rob”
BigQuery uses an SQL like syntax, so if you know some SQL then diving right in should be easy. The queries above should give you an idea of what you can do with this data. Now, please know that currently we have approaching 30M listens, so the dataset is still quite small. We’re very much interested to see what sort of things people can come up with in the near future.
Finally, some notes about openness and proprietary software: Given that we have limited resources, we aim to make the most things happen with the services that are at our disposal. Google has been extremely generous to us over the years and we’re very pleased to have access to BigQuery now.
That is not to say that we’re putting all of our eggs in one basket or forcing people to use BigQuery. Our InfluxDB database hosted on our own servers keeps the master archival copy of our listen data. Soon we hope to make dumps of this data available for anyone to download and play with using whatever tools they would like. With this setup we are not fully reliant on Google for keeping this project alive. We’re glad to have their support, but should circumstance change, we can find another BigData solution and load our master archival copy there.
Now, go play with this very promising data and post some of your favorite queries in the comments!
I’m pleased to announce that we released our first official beta version of ListenBrainz yesterday! As you may know, ListenBrainz is our project to collect, preserve and make available, user listening data similar to what Last.fm has been doing, but with open data.
In 2015 a small group of hackers gathered in London to hack on the first version of ListenBrainz alpha. We threw together a pile of new technologies and released the first version of ListenBrainz at the end of the weekend. In the end, we didn’t really like the new technologies (Cassandra, Kakfa) as both ended giving us a lot of problems that never seemed to end.
In 2016 we embarked on a journey to pick new technologies that we liked better and ended up setting on InfluxDB and RabbitMQ as backbones to our data ingestion pipeline. These tools were a good match for us, since we were already using them in production! Sadly, MetaBrainz’ move to our new hosting provider ended up sucking up any available time we had to devote to the projects, so progress was made in fits and starts.
Earlier this year Param Singh expressed interest to help with the project in hopes of joining us for a Google Summer of Code project. He started submitting a never ending stream of pull requests; slowly the project started moving forwards. Together we brought the codebase up to our current standards and integrated it into the workflow that we use for all of the MetaBrainz projects.
We proceeded to prepare the next version to be released at MetaBrainz’s new hosting facility and started a never ending series of tests. We kept pounding on the data ingestion pipeline, trying to find all of the relevant bugs and ways in which the data flow could get snagged. Finally the number of reported bugs relating to data ingestion dropped to zero and we managed to import 10M listens (a listen is a record of one song being played)!
That was our cue for promoting our pre-beta test to a full beta and unleashing it onto our production servers at our new hosting facility. Today we cleaned up the last bits of the release and we are ready for business!
What does this new release bring for you, the end users? Sadly, only a few new things, since most of the work has gone into building a stable and scalable system. We do have a few new things in this release:
- Incremental imports from Last.fm — now you don’t have to do a full import any time you wish to import your latest listens from Last.fm. The importer knows when you last did and import and will work accordingly.
- Last.fm compatible submission interface — with some system configuration changes you can submit your listens directly to ListenBrainz from any application with Last.fm support. (more info here)
- Last.fm file import — if you have an old skool Last.fm zip file with your listening history backed up, you can now import it.
- User data export — you can now download your own listens straight from the site, no waiting required.
- Adaptive rate limiting on the API — our server now uses a modern rate limiting system. For details, see our API docs.
The good news is that Param is now working on his Summer of Code project that will add a lot of graphs and other critical elements for making use of this new data set. We hope to release new features on an ongoing basis from here on out.
Most importantly, we want to publicly state that ListenBrainz is now ready for business! We don’t plan to reset the database from here on out — this is the real deal and we plan to safeguard and make this database available as soon as we can. If you have hesitated with sending your listen histories to ListenBrainz in the past, you should now feel free to send your listen information to us! If you are an author of a music player, we ask that you consider adding support for ListenBrainz in your player!
In a follow-up blog post I am going to write about how to start using ListenBrainz now — at the very least use it to back-up your Last.fm listening history!
If you find bugs with our latest release, please report them to our issue tracker. If you’re interested in this project and have questions for us, why not come and pop into our IRC channel or ask a question on our community forum?
P.S. The alpha version of ListenBrainz is still around.
P.P.S. We’ll have another cool announcement very shortly! Stay tuned!
We’re going to start our schema change release process today at 17h UTC.
We anticipate having a short downtime of a few minutes as we”ll need to restart our database server. As usual, we’re not certain when we will start the downtime, but we’ll keep people posted about our progress in IRC and on Twitter.
Once we’re done with the release we will post instructions on this blog on how to upgrade any replicated instances of MusicBrainz you might be running.
We have picked our set of tickets and the date for our May 2017 schema change release: May, 15th 2017. This will be a fairly standard and minor schema change release — we’re only tackling 3 tickets that affect downstream users and no other infrastructure changes.
Take a look at our list of tickets for this schema change release. There really are only two tickets that will affect most of our downstream users:
- MBS-8393: “Extend dynamic attributes to all entities” Currently our works have the concept of additional attributes which allows the community to decide which sorts of new attributes to apply to a work. (e.g. catalog numbers, rhythmic structures, etc) This ticket will implement these attributes to all of our entities. Also, this ticket will not change any of the existing database tables, it will only add new tables.
- MBS-5452: “Support multiple lyric language values for works” Currently only one language or the special case “multiple languages” may be used to identify the language used in lyrics. This ticket allows more than one language to be specified for lyrics of a work.
The following tickets are special cases — they will not really affect our downstream users who do not have edit data loaded into their system. We are only including this change at the schema change release time in order to bring some older replicated systems up to date. If you do not use the edit data, then please ignore these tickets.
- MBS-9271: “Prevent usernames from being reused” This ticket does not change the schema, but for sake of minimizing downstream disruption, we’re going to carry out this ticket during the schema change.
- MBS-9274: “Fix the edit_note_idx_post_time_edit index in older setups to handle NULL post_time” This ticket fixes an SQL index on an edit related table.
- MBS-9273: “Fix the a_ins_edit_note function in older setups to not populate edit_note_recipient for own notes” This ticket also fixes an SQL index on an edit related table.
This is it — really minor this time around. If you have any questions, feel free to post them in the comments or on the tickets themselves.
I’m pleased to announce that Yvan Rivierre has joined the MusicBrainz development team! Yvan is not new to our community — he has been participating MusicBrainz development for some time and more recently has been attending our weekly community meeting. He’s submitted several pull requests to MusicBrainz already, and now he joins us as a full time developer.
Yvan’s nickname on all things MetaBrainz is now yvanzo, was formerly yvanz, in case you’re wondering what happened. Expect him to be around even more, helping bitmap to make improvements and changes to MusicBrainz. The MusicBrainz search infrastructure and hosting are no longer core tasks for the MusicBrainz team, leaving yvanzo and bitmap to focus solely on MusicBrainz. This brings us to a new level of dedication to our most important project and should allow us to tackle more issues faster focus new areas of improvement. (e.g. hopefully we can start making improvements in UX/UI this year!)
Welcome aboard Yvan!
If you are an eligible university student who would like to participate in Summer of Code and get paid to hack on a MetaBrainz project over the summer, take a look at our ideas page for 2017. If this sounds interesting, take a look at our getting started page.
We kindly ask that you carefully read the ideas page and the getting started page before you contact us for help!
Thanks and good luck applying!