Well, Binky went offline for 50 minutes today, and it was entirely my fault. Or rather, the downtime was my fault; the duration of the downtime was not.
To make a long story short, I succeeded in telling the server to stop listening to the network. (I had committed some half-completed changes to the routing tables.) This makes remote administration… difficult. Thankfully, we should be a bit more protected from that sort of problem with the services that we will be running from Matt’s colo rack.
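One common guard against this kind of self-inflicted lockout is a "confirm or roll back" timer: apply the change, and undo it automatically unless you confirm within a short window that you can still reach the machine. This is not what I was running at the time. It is a minimal sketch in Python, and `apply_with_rollback`, `apply`, `rollback`, and `confirm_timeout` are all hypothetical names made up for illustration. Nothing here touches real routing tables.

```python
import threading


def apply_with_rollback(apply, rollback, confirm_timeout):
    """Run apply(), then run rollback() unless confirm() is called in time.

    apply and rollback are caller-supplied callables (hypothetical here).
    Returns (confirm, timer) so the caller can confirm or wait on the timer.
    """
    apply()

    # If nobody confirms within confirm_timeout seconds, undo the change.
    timer = threading.Timer(confirm_timeout, rollback)
    timer.start()

    def confirm():
        # Cancelling the timer keeps the new configuration in place.
        timer.cancel()

    return confirm, timer
```

For this to help, the script would have to run on the server itself and survive the loss of the remote session. If the change cuts you off, you never get to call `confirm()`, so the rollback fires on its own.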
Since I was the one who broke things, the response on my end was immediate. I informed Harkins of the downtime, submitted a trouble ticket to the hosting provider, set my IM status message to let people know that I was aware of the outage, and then started hunting for a phone number to get in contact with the service techs directly (since the autoresponse I got from the trouble ticket said 12-24 hours).
The phone number hunt didn’t work out. I was able to find their billing line, but no number for tech support; there is only email. I'm not too pleased about that, and it makes me feel more justified in decommissioning Esme (who is hosted by the same company).
Anyhow, after I’d given up on the phone issue, I finally got an email from a real live person. Five emails and 20 minutes later, the information was finally in the hands of the right tech. He followed my instructions for the fix and had the machine back up in 5 minutes. So, while not terribly accessible, their tech staff is at least competent.
When we came back up, Apache, BIND, and SVN were not running… but Mongrel was. This confuses me, since all four are started by the same script. I will have to look into that further.
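A quick way to see which of the four actually came up after a restart is to try a TCP connection to each one. The sketch below assumes the usual default ports (80 for Apache, 53 for BIND, 3690 for a standalone svnserve, 3000 for Mongrel). Those ports are assumptions, not the actual configuration on Binky, and if SVN is served through Apache instead of svnserve the 3690 check would not apply. `port_open` and `check_services` are names invented for this example.

```python
import socket

# Assumed default ports; the real configuration may differ.
SERVICES = {
    "Apache": 80,
    "BIND": 53,
    "SVN (svnserve)": 3690,
    "Mongrel": 3000,
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False


def check_services(host, services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}
```

Calling `check_services("localhost")` on the server returns a dict of service names to `True` or `False`. An open port only shows that something is listening there, not that the service behind it is healthy.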
Regardless, services are all back and I’m not going to mess with things like this again for a good long while (and not w/o verifying that I know what I’m doing on Hedwig or Vimes first).