It's been a rocky week in the MusicBrainz universe, that’s for sure!
About three weeks ago the load on our database server started rising, most likely because after the April 1 release people went nutz adding labels to the database, along with vastly more AR links than before. The onslaught of extra data pushed our database over an invisible threshold and things started getting shaky.
For a database to run smoothly and efficiently, it should mostly fit into RAM, so that the database server rarely has to fetch data from disk. Once the data outgrows RAM and has to be fetched from disk continually, everything slows down drastically.
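To put rough numbers on why that threshold hurts so much, here is a minimal back-of-the-envelope sketch. The function name and the latency figures (about 0.1 microseconds for a read from RAM, about 8 milliseconds for a read from a spinning disk) are illustrative assumptions, not measurements from our server.

```python
def average_access_time_us(hit_ratio, ram_us=0.1, disk_us=8000.0):
    """Expected cost of one read, in microseconds.

    hit_ratio is the fraction of reads served from RAM. The default
    latencies are assumed ballpark figures, not measured values.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Weighted average: hits cost a RAM read, misses cost a disk read.
    return hit_ratio * ram_us + (1.0 - hit_ratio) * disk_us


if __name__ == "__main__":
    for hit_ratio in (1.0, 0.99, 0.9):
        cost = average_access_time_us(hit_ratio)
        print(f"hit ratio {hit_ratio:.2f}: about {cost:.1f} us per read")
```

Under these assumed figures, even a small share of reads going to disk dominates the average, which matches the sudden slowdown described above.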
That’s basically what happened three weeks ago: the database outgrew the server we have for it. At first we thought that some feature from the April 1 release was bogging down the server, so we did some triaging, with no luck. Finally we decided to throw out nearly 1 GB of useless Add TRM edit data, which shrank the database back down to a manageable size.
This, of course, is nothing more than a band-aid. In a few weeks this problem will be back. I have been anticipating this moment for over a year now and pushing for a large server donation. The Sun server donation was supposed to take care of the problem, but we had serious issues getting the database to run well on the Sun box. With the help of some Sun engineers we’ve gotten past those issues and are now in the final stages of preparing the Sun server for production use.
But in the middle of all this, more disaster struck. Lingling, which was taking over for Stimpy as our primary web server, had a power supply fail early in the morning this past Sunday. Stimpy, with redundant power supplies, was sitting idle waiting to be put back into service after he got a new motherboard from Dell. With all the other problems, we hadn’t had the time to switch Stimpy back in for Lingling. We had scheduled that to happen about 12 hours after Lingling failed.
Lingling has a new power supply arriving tomorrow. Moose, the Sun server, may go into service this weekend or early next week. Once we get these two tasks done, the site should be back to being zippy.
Until then, I apologize for the inconvenience!