Upgrading Postgres for MusicBrainz Live Data Feed users

We’re slowly approaching that time of year: schema change release time. After skipping our fall update to focus on some internal tasks, we’re ready to have another schema change release in the spring, on May 16, 2016.

We have started the process to collect features we wish to release for this schema change release and we’ll be publishing that list in the coming weeks. However, we’re contemplating the impact of one more change we’d like to make: Upgrading to a more recent version of Postgres.

Internally we are going to upgrade to Postgres 9.5, which was recently released, so we expect that the Postgres team will have worked out the most significant kinks before we’re ready to move to it. However, even though we are moving to 9.5, we are considering the impact on our downstream users/customers who need to make the same or a similar change.

While we are moving to version 9.5 of Postgres, we have the option of only adopting features from Postgres 9.4, which means that our downstream users may continue to use Postgres 9.4. However, Postgres 9.5 has some nice features we’d like to use (e.g. UPSERT), so we’re pondering whether it is possible for us to require Postgres 9.5 from all of our Live Data Feed users starting on May 16, 2016.
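
For readers unfamiliar with UPSERT, here is a minimal sketch of the idea: insert a row, or update the existing row if the key is already present. It is an illustration only. It runs against Python’s built-in sqlite3 module (SQLite 3.24 or later), whose ON CONFLICT ... DO UPDATE clause is modeled on the Postgres 9.5 syntax, and the table and column names are hypothetical rather than taken from the MusicBrainz schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres 9.5.
conn = sqlite3.connect(":memory:")

# Hypothetical table, not part of the MusicBrainz schema.
conn.execute(
    "CREATE TABLE play_count (recording TEXT PRIMARY KEY, plays INTEGER NOT NULL)"
)

# Insert a new row, or add to the counter if the key already exists.
UPSERT = """
INSERT INTO play_count (recording, plays) VALUES (?, 1)
ON CONFLICT (recording) DO UPDATE
SET plays = play_count.plays + excluded.plays
"""


def record_play(recording):
    """Count one play without a separate SELECT-then-INSERT/UPDATE step."""
    conn.execute(UPSERT, (recording,))


record_play("a")
record_play("a")
record_play("b")
rows = dict(conn.execute("SELECT recording, plays FROM play_count"))
```

Without UPSERT, the same logic needs a lookup followed by either an INSERT or an UPDATE, which is more code and can race with concurrent writers.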

We have already informally queried a few of our users and so far it seems that requiring Postgres 9.5 is feasible. If you are a Live Data Feed user and feel that this requirement of Postgres 9.5 is too much for you and your organization by May 16, 2016, please leave a comment on this blog post!

BookBrainz February 2016 Release

Welcome, readers, to the first blog post from the BookBrainz team! I’m Ben (AKA LordSputnik), one of the two guys leading the BookBrainz project to create the most complete and thorough database of literature in the world. Or, in other words, doing for books what MusicBrainz does for music. In this post, I’m going to talk to you about the February 2016 release of BookBrainz, what we’ve been working on, and the current direction of the project.

Unit Testing

One of the biggest areas of work in this update is the new unit testing for the web service. Unit testing allows us to check that our functions work as we expect them to, and helps to find and prevent bugs in the code. This is something we’ve been putting off for months now, and it really needed doing. Luckily, the Google Code-in (GCI) happened, and one of our students, Stanisław Szcześniak, stepped up to the challenge of writing our test suite.

4000 lines of code and several test classes later, our test coverage (the proportion of web service code checked by the tests) has increased from 40% to 70%, and we’ve found about 10 bugs which we’ll be fixing for the next release. Stanisław is still helping out, now focusing his efforts on a BookBrainz plugin for Calibre (like Picard, but for eBooks) and the new BookBrainz client library.

Python Client Library

We’re planning the Python client library to be a key component of a couple of applications we’ll be writing in the future to increase the amount of data in BookBrainz. It’ll also allow outside developers to programmatically access and modify the information in BookBrainz through our web service. At the moment, it’s still in the early stages of development, with Stanisław playing about to find a clean and elegant architecture.

Reactification!

Another area we’re continuing to look into is changing our existing web page templates (written in Jade) to use the React JavaScript library. This helps us by allowing the same code to be used for templating in the browser and the server, and also allows us to use third-party libraries to simplify our user interface code (for example, React-Bootstrap and React-fontawesome).

So far, we’ve converted 9 pages, including login, registration, search and revision display, with a little help from our GCI students. An added benefit of this is that we’ve been able to apply the idea of progressive enhancement to allow JS-enabled browsers to refresh search results in real-time, while keeping the previous functionality for older or limited browsers.

Browser Compatibility

Since the last release, we’ve established a list of supported browsers, and signed up to a really useful automated test site called BrowserStack. This allows us to get screenshots of key pages of the site in many different browsers, and see where things are breaking. Although we’ve mainly been working on back-end code for the last couple of months, there are a few issues that we’ve found out about in older browsers which will hopefully be fixed soon™. If there are any issues that you’ve spotted when using the site, be sure to let us know in the BookBrainz JIRA.

Improving Error Messages

Working on how we display errors to users is another front-end issue that we’ve sadly been neglecting for about half a year now, in favor of improving the back-end. A good example of the problem is logging in – right now, the site sometimes displays a vague error message about not being able to log the user in, and sometimes it spits out the really unhelpful message “Internal Server Error”.

So informative, BookBrainz…

Part of the work we’ll be doing on error messages in the next few months will involve creating custom error pages and trying to eliminate unfriendly technical error messages. Giving the user a better idea of what to do when something does go wrong is also important, and we’ll be trying to achieve this along with making error display more consistent across the site.

Direct Database Access

The largest change we’ve been working on over the last couple of months is having the site access the database directly, rather than obtaining data through the web service, as we’ve been doing up until now. Originally, we decided to put the web service in between the site and database to ensure that the web service had good data editing functionality and that data representations were the same as those in the site.

However, this led to us having to effectively define our schema three times – once in the database, once for structuring web service data, and once to define the data models used in the site code. We found this to be a bad situation, because it’s easy to forget to keep the three schemas consistent. Last autumn, I tried to improve the web service to automatically generate web service data structures from the site data models, but eventually decided that this would be overly complex and time-consuming.

Instead, we made the decision to migrate all of our code to Node.js, currently used by the site, and then use a shared data model package for both the site code and web service code. With this change, we would only have to define the schema twice – once in the database, and once in JavaScript – better, but not ideal. Thankfully, the Node.js library bookshelf provides database reflection, which means that we can automatically generate the Node.js data models from the database schema, finally removing the need to define the schema in multiple places.
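
To illustrate what database reflection means here, the sketch below derives a data model from a table definition instead of declaring the fields a second time in code. It is not bookshelf and not BookBrainz code: it uses Python’s built-in sqlite3 module as a stand-in, and the table, columns and sample row are hypothetical.

```python
import sqlite3
from collections import namedtuple

conn = sqlite3.connect(":memory:")

# Hypothetical table; the real BookBrainz schema is different.
conn.execute(
    "CREATE TABLE edition (id INTEGER PRIMARY KEY, title TEXT, pages INTEGER)"
)


def reflect_model(conn, table):
    """Build a record type whose fields come from the table's columns."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name is interpolated directly, so it must come from trusted code.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return namedtuple(table.capitalize(), columns)


# The model is generated from the schema, so the schema is defined only once.
Edition = reflect_model(conn, "edition")

conn.execute(
    "INSERT INTO edition (title, pages) VALUES (?, ?)", ("Example Title", 100)
)
first = Edition(*conn.execute("SELECT * FROM edition").fetchone())
```

If a column is later added to the table, the generated model picks it up the next time `reflect_model` runs, with no second definition to keep in sync.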

Now, we’re about two-thirds of the way through updating the site to use the new data models – you can see our progress in the GitHub repository. Due to a new emphasis on code quality and implementing tests as we go along, progress has been slow but steady, but this should hopefully result in a more stable site when we complete this upgrade within the next couple of months.

New Web Service

Following the direct database site update, our schema will have changed in subtle ways which will make it impossible to keep using the existing web service code. Ideally, we would have kept the schema unchanged while we moved to Node.js, but partly due to ORM limitations and partly due to a desire to make things better, we’ve been tinkering with it as we’ve gone along.

This means that the web service will be unavailable for a few months while we rewrite it in Node.js. However, the new web service should be a big improvement on the old one, with a more carefully planned design learning from the mistakes of the current iteration. If you have any suggestions for what you think would be a good feature for the new web service, please let us know in the comments!

That’s it from me for now. I hope you’ve found it interesting to get an insight into the things we’ve been working on! For a more specific list of changes in the February 2016 release, please see our change log. If you have any suggestions for our future blog posts, I’d love to hear your feedback.

Server update, 2016-02-08

This update hopefully fixes some issues with “Edit Medium” edits that, in rare cases, resulted in an incorrect track listing. Sometimes tracks were being inexplicably deleted. The git tag for today’s release is v-2016-02-08.

Bug

  • [MBS-8752] – Database inconsistencies when updating medium
  • [MBS-8765] – instrument_annotation should not be backed up in mbdump.tar.bz2
  • [MBS-8770] – Banner not displayed

Task

  • [MBS-7475] – Get rid of Algorithm::Merge

Notifications and messaging in MetaBrainz projects

During the last MusicBrainz summit in Barcelona we decided to start working on finding possible ways to implement two features that have been requested for a long time:

  1. Messaging between users
  2. Notifications about various actions in MetaBrainz projects

Since MetaBrainz is more than just MusicBrainz these days, we also want to integrate these features into other projects. That means, for example, that a user reading reviews on CritiqueBrainz can see notifications about comments on their edits on MusicBrainz. The same applies to messaging. These features are intended to encourage our communities to communicate more easily with each other.

Messaging

http://tickets.musicbrainz.org/browse/MBS-8721

The only means of communication we have right now are two IRC channels, forums that we plan to replace with Discourse, and comments on individual edits. Sometimes we end up sending private emails to editors for one reason or another. Perhaps it would be better to have our own messaging system for this purpose? I imagine it being similar to messaging systems on forums, reddit, etc. We would like to know what you think the potential uses for this are and what it would need to look like to be useful.

Notifications

http://tickets.musicbrainz.org/browse/MBS-1801

Site-based notifications are another thing that people have been requesting for a long time. For example, these notifications can be related to edits on MusicBrainz, reviews on CritiqueBrainz, datasets in AcousticBrainz, etc. They could supplement or replace the email notifications that we currently have in MusicBrainz, perhaps in a form similar to the inbox feature that the Stack Exchange network has. People should be able to choose whether they want to keep receiving email notifications or only use the new site-based notifications.

Progress so far

We looked at a couple of ways to implement this functionality.

The first suggestion was to use the Layer toolkit. The problem with it is that we don’t want to be dependent on closed software and another company’s infrastructure, especially for such important features.

The second was to use the XMPP protocol to handle communication and notifications. We tried to implement a proof of concept using this protocol and encountered several issues at the start:

  • It’s unclear how to store messages and process them later;
  • It can be problematic to reuse the same connection in different browsers;
  • There are plenty of things that we’ll need to implement on top of this protocol ourselves (like authentication, storage, notifications).

The repository with everything that has been implemented so far is at https://github.com/metabrainz/xmpp-messaging-server. Given these problems, we started considering implementing our own server(s) for this purpose.

You can take a look at the document where we collect most information about current progress.

Feedback

There’s plenty of feedback on the site-based notifications feature request, and we have a pretty good understanding of what’s needed. This is not the case with the messaging feature. We explored several options for implementing this kind of functionality and decided that it’s time to refresh the list of requirements to get an idea of what needs to be done.

The goal of this blog post is to encourage discussion and gather ideas. If you are interested in these features, please share your thoughts and suggestions.

Server update, 2016-01-25

Backwards-Incompatible JSON Web Service Changes

In an effort to get our JSON Web Service out of “beta” status, we’ve made some backwards-incompatible changes to it in this release:

  • The video flag on recordings is now outputted as true or false instead of 1 or 0.
  • Empty relations arrays are not outputted for linked entities anymore, since linked entities never include relationships.
  • The iso_3166_1_codes, iso_3166_2_codes, and iso_3166_3_codes properties have been renamed to iso-3166-1-codes, iso-3166-2-codes, and iso-3166-3-codes, respectively. This only applies to lookup and browse requests; search requests already outputted these with hyphens.
  • The iso-3166- properties mentioned in the previous point are not outputted if they’re empty.
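
A client that has to cope with both the old and the new output could normalize responses along these lines. This is a rough sketch under stated assumptions: the helper name `normalize_entity` is hypothetical, the sample dictionaries are hand-written to mirror the changes listed above rather than copied from real web service responses, and only the video flag and the ISO 3166 properties are handled.

```python
ISO_KEYS = ("iso-3166-1-codes", "iso-3166-2-codes", "iso-3166-3-codes")


def normalize_entity(entity):
    """Return a copy of a decoded JSON entity in the new (2016-01-25) shape."""
    result = {}
    for key, value in entity.items():
        # Old lookup and browse output used underscores in the ISO 3166 keys.
        if key.startswith("iso_3166_"):
            key = key.replace("_", "-")
        result[key] = value

    # The video flag used to be 1 or 0; it is now true or false.
    # True == 1 in Python, so this covers old and new values alike.
    if "video" in result:
        result["video"] = result["video"] in (1, "1")

    # Empty ISO 3166 code lists are no longer output, so drop them.
    for key in ISO_KEYS:
        if key in result and not result[key]:
            del result[key]
    return result


# Hand-written samples in the old format (assumed shape, for illustration).
old_area = {"name": "United Kingdom", "iso_3166_1_codes": ["GB"], "iso_3166_2_codes": []}
old_recording = {"title": "Example", "video": 0}

new_area = normalize_entity(old_area)
new_recording = normalize_entity(old_recording)
```

Because the ISO 3166 properties can now be absent, code reading them should use something like `new_area.get("iso-3166-2-codes", [])` rather than indexing directly.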

Some other changes to the web service have been made, but are considered additions (not changes to existing output), so hopefully shouldn’t cause any problems. You can review them in the changelog below.

Other Changes

An issue where entities deleted from the database (but still present in the cache) remained visible has hopefully been fixed. There are several other miscellaneous bug fixes linked below. Thanks again to Ulrich Klauer for his contributions. The git tag is v-2016-01-25.

Bug

  • [MBS-5676] – JSON relationships output doesn’t include target-type
  • [MBS-6166] – Deleted accounts can still have details edited
  • [MBS-7241] – Non-transactional cache means the cache can sometimes fail to delete entities that are gone at the database level
  • [MBS-7735] – ws/2: recording’s “video” flag inconsistent between xml and json
  • [MBS-7921] – Internal server error when requesting /ws/2/isrc as JSON
  • [MBS-8367] – ws2 JSON incorrectly returns non-included field as null value
  • [MBS-8396] – JSON output has no ordering key attribute for release group series
  • [MBS-8563] – Release & Release Group browse requests without type/status filters return results which contradicts the documentation
  • [MBS-8688] – Random tagged entity type display inconsistency in personal tag page
  • [MBS-8722] – Edit stuck trying to change the gender of a group
  • [MBS-8726] – Replicated updates don’t invalidate cache entries on slave servers
  • [MBS-8730] – Reordering of sub work parts causes unwanted reordering of main work parts
  • [MBS-8746] – JSON web service doesn’t distinguish between relationships not existing vs. not being loaded

Server update, 2016-01-11

Our first release of 2016 consists mainly of data-display fixes by Ulrich Klauer and a couple of small improvements by Google Code-in students Caroline Gschwend and Ohm Patel. Notably, internationalized domain names are now displayed in decoded form: https://musicbrainz.org/url/2de1616a-7ca0-4688-92cc-0a8373190ede
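
As an illustration of what “decoded form” means, the snippet below converts a Punycode-encoded host name back to its Unicode form. It is a sketch using Python’s built-in idna codec, not the MusicBrainz server code, and the host name is a made-up example.

```python
def decode_hostname(hostname):
    """Turn an ASCII (Punycode) host name into its Unicode form."""
    # The built-in "idna" codec decodes each "xn--" label separately
    # and leaves plain ASCII labels unchanged.
    return hostname.encode("ascii").decode("idna")


decoded = decode_hostname("xn--bcher-kva.example")  # 'bücher.example'
```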

Thanks once again to the above contributors. :) The git tag for today’s release is v-2016-01-11 and the changelog is below.

Bug

  • [MBS-4575] – Old add release label edit does not display
  • [MBS-5205] – Text diff incorrectly highlights first word that didn’t change
  • [MBS-7844] – Name variation marker not used for artists in tracklists in “edit medium” edits
  • [MBS-8012] – Release dates/countries are displayed strangely in edit release label edits
  • [MBS-8161] – Medium titles have no diff highlighting when displaying edits
  • [MBS-8210] – Multiple “Remove ISRC/ISWC” edits on one page interfere
  • [MBS-8330] – Another name variation check after HTML entity conversion
  • [MBS-8413] – Removed URLs in edits are badly encoded
  • [MBS-8692] – Expired Catalyst sessions remain (partially) in Redis
  • [MBS-8698] – Content negotiation for JSON-LD representation does not work with multiple MIME types in Accept header

Improvement

  • [MBS-6407] – Add username to our verification mails
  • [MBS-8683] – Display internationalized domain names in decoded form
  • [MBS-8709] – Mark up removed entities as usual in add medium/edit medium edits
  • [MBS-8713] – Block SoundCloud search and tags URLs

One month of Google Code-in

Today marks one month since the Google Code-in competition started, with 18 days left until it ends. I wanted to take this opportunity to talk a bit about some of the things that have happened so far and where we’re at.

Since December 7th, when Google Code-in started, we have been in touch with 107 students on the Google Code-in site, of which 70 have completed at least one task and thus earned a digital certificate from Google. 11 students have so far earned themselves a t-shirt from Google by completing 3 or more tasks. The student with the highest number of completed tasks right now sits at 17 tasks, followed by one at 16 and another at 15 completed tasks. The student with the 10th most tasks completed has 3 tasks to their name.

Stanisław Szcześniak, GCI student from Poland, presenting about MusicBrainz.

We have had 7 students give presentations on MusicBrainz in at least India, Romania, England, and Poland; about 50 reviews written for CritiqueBrainz, with a few more in progress; a couple of MusicBrainz how-tos written for the wiki; one video tutorial made (which hasn’t been uploaded yet); a bunch of tests written for BookBrainz; a bunch of icons/logos updated or newly made in various places; and a bunch of code patches and tests written for almost all our projects, as well as for beets (a 3rd party music file tagger and organiser heavily using MB data).

We have also had to report 3 students for plagiarism, leading to their disqualification. :( However, compared to the amount of work and the number of students, I think it’s a reasonably small number.

Overall, I am (still!) really excited about MetaBrainz finally being a part of Google Code-in, and I definitely think the lack of sleep the first week and newbie questions on IRC and on the GCI tasks are worth it. We’re getting some great stuff done, that we may not have gotten around to in any reasonable time ourselves, and we get to help all these students learn about programming, open source, open data, licenses, and a bunch of other things. I’m happy and I’m not looking forward to picking only 5 finalists and only 2 winners. There are definitely more than that I would personally like to see in both categories. :)

Have you had any experiences with or thoughts on our Google Code-in participation so far? Please do share them with us in the comments!