Urgent schema update required

On Friday 24 May, 2013 at 15:00UTC we’re going to make an urgent schema update to fix a problem that occurred during our schema change last week. Please read this whole blog post carefully!

This update will not make any changes to the schema, but it will fix some data issues that have appeared on slave servers.

We apologize profusely for these problems — we’re working hard to rectify this problem and we’re going to improve our processes going forward to ensure that future releases will not encounter these problems.

What went wrong

Due to a misunderstanding of our database system, the ‘track’ table will be corrupted on the majority of replicated slave databases after the schema 17 migration. Specifically, depending on the internal choices of a given postgresql installation’s query planner and other system details, any particular server can end up with a variety of incompatible permutations of the track table, where ‘id’/’gid’ pairs will generally point to the incorrect track data. Unfortunately, this problem is compounded by replication, which is based on the ‘id’ column. Therefore, replication packets since the schema change are likely to have deleted and modified the incorrect rows of the ‘track’ table on slaves.

How are we fixing it

In order to ensure that no slaves continue to replicate incompatible changes, we are incrementing the schema number again to 18, which will force operators of slave servers to intervene appropriately. To ensure that slaves have a correct version of the ‘track’ table, we are providing an upgrade script that will download an exported snapshot of the production server’s ‘track’ table at a known point and import it, as well as correct some smaller issues. By importing this snapshot, slave servers will be reset to a correct version of this table and replication can continue.

Specific step by step instructions on running this upgrade will be in a separate blog post. Watch this space!

What problems may have arisen

  1. In the unlikely case an external program directly references track row ID numbers, or if it uses the newly-added track MBID field (the ‘gid’ column), these will not be correct if they were taken from any server but the production server. If an application stores either of these identifiers in any way, that data should be rebuilt.
  2. Due to the compounding problems from replication, some tracklists will have incorrect information — missing tracks, misnumbered tracks, links to the wrong recordings, wrong durations, and/or wrong track artist credits. Information of this sort that was derived from replicated slaves during the affected period should be regenerated after upgrading.

FAQ about this update

Q: We don’t use the track table, we use recordings. Am I affected?
A: You are not affected if you use recordings directly, i.e., looking up recording information by a stored recording MBID, except if you use track information linked to those recordings (for example, if you create a list of releases a given recording appears on). Since the link between the recording and the release tables is via the ‘track’ table, anything that connects these two entities is likely to be affected.

Q: How can I tell if any of the tracklists I am using are affected?
A: Due to the random permutation issue, it’s not completely possible to be 100% sure. However, it’s possible to know of tracklists that definitely have problems by two means: track counts, and sequence issues. The former can be tested with a fairly simple query: “

SELECT medium.id, medium.track_count, count(track.id) as track_track_count,
     medium.track_count  count(track.id) AS counts_differ
FROM medium join track on track.medium = medium.id
GROUP BY medium.id, medium.track_count
HAVING count(track.id)  medium.track_count;

Any medium that appears in that query has been affected and its tracklist should not be trusted (select ‘medium.release’ to get release IDs, if that’s your jam). Sequence issues are a more complex query:

SELECT distinct m.id FROM
  (SELECT DISTINCT medium.* FROM
    ( SELECT track.medium, min(track.position) AS first_track, max(track.position)
      AS last_track, count(track.position) AS track_count, sum(track.position)
      AS track_pos_acc
      FROM track
      GROUP BY track.medium) s
    JOIN medium ON medium.id = s.medium
    WHERE first_track != 1 OR last_track != s.track_count OR
        (s.track_count * (1 + s.track_count)) / 2  track_pos_acc
    ) m

(note: if you only get 10 rows for this query, you’re fine — they’re these ten, which are known problems)

For more safety, don’t trust anything in the track table that’s been updated since the schema change:

SELECT distinct medium
FROM track
WHERE last_updated > ‘2013-05-15’

If it’s possible in your application, it’s probably best to throw out any updates to tracklists since 2013-05-15.

Again, we’re sorry for the trouble this update may have caused you!

8 thoughts on “Urgent schema update required

  1. InvisibleMan78

    I’m actually at the point, where I have to apply the schema change release 17 to my little slave server.
    (If I don’t do that, I can’t replicate any more information from MB). There are NO DATA imported since 15.5.2013.

    What exactly do I have to do in this situation?

  2. warp

    Jonatton: If you’ve imported the 20130518 or 20130522 data dump, then the track table you have is correct. We still ask you to run the upgrade to schema 18 to fix a few smaller issues (particularly, artist_credit.ref_count is wrong, entries in medium_index may be missing and an index is missing on medium.release).

  3. warp

    InvisibleMan78: to run the upgrade to schema 18 you will need to be on schema 17. You can get to schema 17 by using a fullexport dump, which will have a correct track table, or you can do a normal upgrade from schema 16 to schema 17, in which cas your track table will be wrong. In both cases however the upgrade to schema 18 will fix the track table (if neccesary) and the other issues we’ve found since the release. So it’s up to you which is more convenient for you.

  4. Valerio Paolini

    Hi! We started from scratch using the full export of 20130518 and never activated the replica, however the second SQL query produces 10 rows. Does this mean that we should redo everything? Thanks!

  5. mayhem

    Valerio: Since you started with a new dump, you’re fully in the clear. Once 18 comes up, have replication run, do upgrade.sh for 18 and then you’re good. Or do a post 18 full import — your call.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s