For the past week we’ve been battling a variety of hosting issues: search servers acting up, gateways dropping packets, and now our database server freaking out for no apparent reason. We managed to fix or mitigate the first round of hiccups, which is good, and for three days everything looked peachy. Then, out of the blue, our database server did this:
You don’t have to understand much about hosting computers to understand that this is bad. Out of the blue our server started having to work much harder than before. Normally that means that something using the server has changed. We’ve looked for a traffic increase — we can’t find one. We’ve looked for someone being abusive to us — we found a couple of users, but blocking them didn’t change anything. We’ve tried turning off non-essential services that make use of the database server, but nothing changed. We’ve restarted the database server. We’ve slapped this, poked that, prodded, undid and tested just about everything we can think of. But the load comes in spikes and recedes again, over and over.
We’ve had amazing help from a number of people, but several skilled computer geeks, with support from lots of others, haven’t managed to make a dent in things. We’re exhausted and we need a bit of a break. So, that’s what we’re doing for the next 7 or so hours.
Then at 15h PDT, 18h EDT, 22h UK, 23h CET we’re going to start upgrading to the latest version of Postgres 9.1. We hope to be down for less than half an hour — but you never know. We’ll tweet about the downtime and put up a banner on the MusicBrainz site to let people know exactly when we’ll take the site down.
Sorry for the hassle — we’re all amazingly frustrated right now — please bear with us.