Category Archives: Uncategorized

Help! Is there a Lucene doctor in the house?

UPDATE: Thanks to user selckin in the #lucene IRC channel for quickly solving this for us! Hopefully we can put this fix into production later today!

As our regular readers may know, we’ve been having lots of troubles with our lucene based search servers. Over the past few days we’ve spent a fair amount of time, tuning, debugging and otherwise trying to troubleshoot our setup. We’ve fixed and identified a number of problems, but most importantly we feel that we’ve identified the core issue: Our servers are simply overloaded.

Under normal conditions we find our servers loaded to about 25% – 35% CPU — things look good and we don’t think we have a capacity problem with our servers. Then a slow query comes in that starts to slow things down. Much like a traffic jam that evolves out of thin air, one slow query can make a giant mess for everyone.

We’ve started timing our queries and most of the time, they can be measured in milliseconds. However, when things get bad, they may take up to 7-8 seconds. Our upstream web servers time out on the search request after about 5 seconds in order to prevent traffic from getting backed-up. What we need to do next is to limit the duration that a lucene query can run and terminate it after the timeout.

I’ve started looking at this and quickly realized that this is much more of a job than adding a simple timeout parameter to the search call. We’re currently using this search function from IndexSearcher:

  public TopDocs search(Query query,  int n);

Ideally I would like to add a way to timeout queries after 3 seconds. So far, I’ve discovered that we could use

  public void search(Query query, Collector results)

with a TimeLimitedCollector. The old call returns TopDocs and our code assumes that we have a TopDocs object from which to cull our search results. Having stared at the docs for lucene for a while, I haven’t found an way to convert the data in TimeLimitedCollector and convert it to TopDocs. It doesn’t make sense to me.😦

How does one do this? Sadly, we have no Java programmers on our team, so we’re quite a bit out of our league here. Is there an easier way to do this? Would someone be willing to write this code for us and submit a PR? We’d find some really good chocolate and send it to you if you do!

More info on our project:

We are using Lucene 4.10.4 on a custom codebase that pre-dates SOLR — we have a new SOLR project to replace this one, but it isn’t quite done yet. (Again, not having Java programmers is a bit of a problem for us).

Any tips, explanations or pull requests would be deeply appreciated! Chocolate reward offered!

Thank you!

Important: Schema change delayed to May 23

With our ongoing hosting issues due to massive traffic increases and failing hardware we’ve been too distracted trying to manage those issues to finish all of the testing for the schema change release that was scheduled for today.

We deeply regret having to do this, but we’re going to delay the schema change release by a week. It is now scheduled for May 23, 2016. This week long delay will give us a chance to further tweak our server configuration (more on this in the next blog post) and to test the schema change release in much more detail.

We are, however, going to upgrade our database server to Postgres 9.5 either later today or tomorrow. During this upgrade we are going to employ a back-up database server and keep MusicBrainz running in read-only mode with a slightly reduced overall capacity (I’m sure everyone know what that means by now). This upgrade should have no other effects on our downstream data users.

We will give people plenty of notice before we start the postgres upgrade via our site banner and via our Twitter account (@musicbrainz).

Sorry for the continued drama affecting our services — we’re working hard to keep things together!

Important information about the May 16 schema change release

In the past few weeks we’ve been hit with massive increases in traffic and a couple of hardware failures. Trying to maintain a decent service quality in light of both of these events have taken a lot of time of our team and we don’t feel 100% confident about the schema change release tomorrow.

Fortunately, the entire team will be together in one place tomorrow. The first thing we’re going to do is review the current state of affairs and decide how to tackle the Postgres upgrade and the release. As soon as we have our plan put together, we will post an updated blog entry with all of the needed details. But, we may very well delay the release by 24 hours.

However, we found that we ran out of time on one feature: MBS-6024: Support more than one barcode on same release. This one ticket will not be included in the upcoming release. We’re really sorry for letting that one issue slip — sorry for any inconvenience this may cause you.

6 degrees of Vince Gill

I’m not sure that we’ve talked about this cool project yet, so I’ll catch up on that now. The new site Six Degrees of Vince Gill allows you to enter an artist name and see how many degrees of separation there are between your artist and Vince Gill. This project comes from Universal Music’ Nashville group — I’m happy to see our data get used in interesting ways like this!

Now, if you want to see someone relate to Vince Gill in seven degrees, have a look at how I relate to him.:)

Screen Shot 2016-03-16 at 17.45.29.

 

 

May 2016 schema change release details

In about two months time we’ll have the next schema change release: May 16, 2016. Even after skipping the fall schema change release, this release is going to have few changes that will impact our downstream users. Most of the tickets in this release will make minor improvements to database indexes and edit tables. If you are one of the few users of our edit data, then you should delve deeper into the list of tickets in this release. For everyone else, I will summarize the tickets with a greater impact.

In a previous blog post we also talked about upgrading the minimum required version of postgres. We received no real feedback requesting for us to upgrade to 9.4, but we did receive some feedback that some people would prefer 9.5, which is our preference as well. Based on that feedback, we’re going to make PostgreSQL version 9.5 the minimum required version. If you’d like to run a MusicBrainz replicated instance via our Live Data Feed, you will need to run Postgres 9.5!

The official minimum supported Ubuntu release as of now is still Ubuntu 10.04 LTS (Lucid Lynx) which reached end-of-life a year ago. We will upgrade that to Ubuntu 14.04 LTS (Trusty Tahr) at the schema change release. In particular, this means that we might start using Perl 5.18 features in the MusicBrainz Server code (as opposed to Perl 5.10 currently).

We understand that this is potentially a lot of work for some of our users, but occasionally we need to upgrade our requirements. We try and limit these sorts of upgrades as much as possible, so please bear with us.

Finally onward to the details of the release. Please take a look at the list of issues that will be addressed in this release. The few tickets worth discussing in details are:

  • MBS-8838 – “Add gids to all *_type tables“. This ticket adds MBIDs (GIDs in schema lingo) to all of our tables that define a type for some database element. Given that we recommend that external users never reference our data by row ids, we really need to provide proper permanent MBIDs to all elements of our database.
  • MBS-6024 – “Support more than one barcode on same release (SQL edition)“. This ticket adds the ability for the database to contain more than one barcode for a given release. However, this ticket does not include the user interface portions of this feature. The team will add the user interface/edit portions of this feature in a later, non schema change release.
  • MBS-4501 – “Alternative tracklists“. This ticket creates a new feature that would allow an alternative tracklist to be used for a given release. This is a better solution for handling conflicts between our style guidelines and how the data appears on the release. It is also a more elegant solution for translations of releases into different languages.

As usual, we will post final details about the release shortly before the release happens. If you have any questions about this release, feel free to ask specific questions in the tickets or general questions in the comments below.

(Edited 2016-03-16 at 12:55 UTC to add the upgraded Ubuntu requirement.)

Wary of the Web Sheriff

On September 29th, we received an email from a company called Web Sheriff, urgently requesting us to remove the name Adetayo Ayowale Onile-Ere from the Taio Cruz artist page. We investigated this request and quickly found that there were other resources on the net referencing both names.  We also found other evidence of Web Sheriff working to change the name of this artist in other venues/sites.

One of the cornerstone’s of our music database, and our principles, is to keep the data as clean and accurate as possible.  We aim to edit with due diligence.

We declined to make the change, and made the following request: If you can provide us with a birth certificate that shows Taio Cruz as the birth name, we’d be happy to make the change. Web Sheriff agreed to do that and on October 13th we received a copy of this document:

JTC - Birth Certificate

 

I inspected the document and quickly felt something was amiss. The document purports to be a Birth Certificate from the Chelsea and Westminster Hospital in London, UK and the father’s occupation is listed as “Lawyer”. While I am not a legal expert, my understanding is that the term for someone who practices law in the UK is not lawyer, but rather attorney, barrister, or solicitor.  The use of “Lawyer” in this context seemed strange to me.

With this observation as my motivation, I rang up Her Majesty’s government to ask how I would go about verifying the validity of a birth certificate. I was told that the UK government could not verify the authenticity of a certificate, but that I could request a copy of the certificate myself since they are public record. For a fee of 9 pounds 25p and two weeks time I would receive a copy of the certificate in email. I thought that was a good way forward and I asked to order a copy. The lovely lady who helped me, painstakingly endeavoured to ensure that all the details related to my request were communicated correctly. I have no doubts that I relayed the data I had accurately.

And with that, I waited. On November 8th, I received mail from Her Majesty’s government informing me that no such birth certificate could be found and that my payment will soon be refunded.

This strongly suggests that the document provided to us by Web Sheriff was not a legitimate copy of a birth certificate for Jacob Taio Cruz.

Being coerced to make changes to facts in our database based on false pretences –  I find such things despicable. We felt harassed and intimidated by the efforts of Web Sheriff, pressuring us to make this change.

I hope that this blog post will remind others to be wary and vigilant, and not give in easily to coercion.  This is what I intend to do with Web Sheriff, and others, going forward and I hope you will do the same!

For more info on Web Sheriff see their Wikipedia page.

 

UPDATE (March 9, 2016):  This post has been edited for clarity.  These changes follow a specific request from Web Sheriff that we delete or amend the post:

we must insist that you withdraw your false allegation of forgery

UPDATE: Since posting this, sharp readers spotted more problems with the certificate:

  1. There is no “county of Westminster”. There is a city of Westminster.
  2. The hospital name is missing a “t”: Wesminster.
  3. This hospital may not have been called that in 1985. It seems to have existed in that form since about 1993.

Upgrading Postgres for MusicBrainz Live Data Feed users

We’re slowly approaching that time of year: Schema change release time. After skipping our fall update to focus on some internal tasks, we’re ready to have another schema change release in the spring: May 16, 2016

We have started the process to collect features we wish to release for this schema change release and we’ll be publishing that list in the coming weeks. However, we’re contemplating the impact of one more change we’d like to make: Upgrading to a more recent version of Postgres.

Internally we are going upgrade to Postgres 9.5, which was recently released, so we expect that the Postgres team will have worked out the most significant kinks before we’re ready to move to it. However, even though we are moving to 9.5, we are considering the impact on our downstream users/customers who need to make the same or similar change.

While we are moving to version 9.5 of Postgres, we have the option of only adopting features from Postgres 9.4, which means that our downstream users may continue to use Postgres 9.4. However, Postgres 9.5 has some nice features we’d like to use (e.g. UPSERT), so we’re pondering if it is possible for us to require Postgres 9.5 from all of ours Live Data Feed users starting on May 16, 2016. 

We have already informally queried a few of ours users and so far it seems that requiring Postgres 9.5 is feasible. If you are a Live Data Feed user and feel that this requirement of Postgres 9.5 is too much for your and your organization by May 16, 2016, please leave a comment to this blog post!