
xtrabackup innodb_file_per_table gotcha

Several months ago, I’d switched to using Percona’s xtrabackup & innobackupex for all of my mysql backup needs. I had successfully used these backups to restore and replicate databases across several systems. It is good stuff.

Last week, I needed to set up new replication of an 80gb database. This should have been routine by now, but when I attempted to prepare the backup this time, it whined and complained and failed. I was kind of frazzled by the time I gave up on the issue and declared it a fluke of one sort or another.

Last night, I tried again from Sunday’s full backup, and it happened again.

I gave up after poking a few things.

This morning’s fresh look turned up this bug report.

That’s right. There’s a bug in innodb restoration that interprets the location of /tmp (configurable in my.cnf) as relative instead of absolute.

So, if you have problems while trying to restore from an xtrabackup/ibbackup snapshot (or if you’re trying to recover innodb after a crash), just creating the offending tmp directory appears to work.
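
For example, say my.cnf sets tmpdir=/tmp/mysqltmp and the backup lives in /backup/full (both paths made up for illustration); something like this appears to do the trick:

[code]
# the prepare step resolves the tmpdir path relative to where you run it,
# so create that same path as a relative directory before preparing
cd /backup/full
mkdir -p tmp/mysqltmp
innobackupex --apply-log /backup/full
[/code]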

init.d template

This is a rudimentary template that I’ve been using for very quick and dirty /etc/init.d scripts recently.

It works under the assumption that your server daemon has a unique name and only ever runs a single instance – this also means that the binary and the init.d script cannot share a name – otherwise strange things happen 😉

Actual invocation logic may need to be updated on a per-service basis, and chkconfig-style headers would have to be added manually, but it works well for what it is.
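
A stripped-down sketch of the idea looks something like this (daemon name and path are placeholders):

[code]
#!/bin/sh
# quick and dirty init script: assumes the daemon runs as a single instance
# under a unique process name that differs from this script's name

NAME=mydaemon                  # unique daemon name (placeholder)
DAEMON=/usr/local/bin/$NAME    # path to the binary (placeholder)

start() {
    if pidof "$NAME" >/dev/null; then
        echo "$NAME is already running"
    else
        echo "Starting $NAME"
        "$DAEMON" &
    fi
}

stop() {
    if pidof "$NAME" >/dev/null; then
        echo "Stopping $NAME"
        killall "$NAME"
    else
        echo "$NAME is not running"
    fi
}

case "$1" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; sleep 1; start ;;
    status)  pidof "$NAME" >/dev/null && echo "$NAME is running" || echo "$NAME is stopped" ;;
    *)       echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
esac
[/code]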

i <3 rsync

I really love rsync. I’ll get to the specifics later, but first, the excessive backstory.

I’ve been doing a lot of backup scripting recently. Yes, there are tons of commercial apps out there, but none of them that I’ve looked into are a perfect match for all of our needs. I’ll eventually settle on one and it will probably replace 80% of my scripts, but plenty will remain.

One problem that I’ve encountered while doing the whole backup juggling bit is the ferocious rate of change in the nature of the data we’re archiving. Code I’d written a year ago was obsoleted 6 months ago by code that was obsoleted 3 months ago by code that I replaced a few weeks back that is being replaced by the code I’m writing right now.

Another one of the problems is that the sheer quantity of data involved is growing in a very uncontrolled way. Early last May (the oldest archive I have easy access to), a full archive of the entire system was barely 3gb in size. Today, it is closer to 60gb. ~20x growth over the last 9 months.

It’s been fun, if somewhat frustrating, dealing with all of the growth.

On January 20th, I needed to perform a long-deprecated sort of snapshot. The code that generated this sort of file no longer worked because so many things had changed. I wound up digging out old scripts from SVN and updating them to run against the new environment.

Because of the amount of data involved, this took a very long time. It didn’t help that the scripts consumed an unfair amount of system resources – I couldn’t run them with any meaningful priority during the day without crippling everyone else.
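
Something along these lines, for example (the script name is made up, but you get the idea):

[code]
# run the snapshot job at idle CPU and I/O priority so it stays out of everyone's way
nice -n 19 ionice -c3 ./generate_snapshot.sh
[/code]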

Lots of low priority io later, I finally had a 54gb tar file… In one of the three places I needed it.

The first transfer was simple, the hosts are on the same gigabit switch as each other. Unfortunately, scping that much data between two hosts at that kind of speed has negative effects on the systems involved. I had to throttle the transfer way down before it could run without visibly impacting performance.

[code]
rsync --partial --bwlimit=10000 -e "ssh -i ${RSA_KEYFILE}" ${LOCAL_FNAME} ${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_FNAME}
[/code]

The second transfer… wasn’t so easy. I needed to move the file to my office without negatively impacting everyone’s ability to work – and I couldn’t wait for the transfer to run at low enough speeds not to cripple the T1.

We have a backup 6mbit DSL link that I only use for emergencies and for testing. Even at a full 6mbit, the transfer would have taken more than 36 hours. Compressing the file took a while but brought the file size down to a much more manageable 24gb (~11 hours over the DSL).

The only remaining gotcha was that the DSL link can’t actually SSH through the firewall into the colo 😉 So… I started the transfer over https last night and went home.
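
Anything resumable on the receiving end works for this; wget -c, for example (host and path made up):

[code]
# -c resumes a partial download if the connection hiccups
wget -c https://colo.example.com/backups/archive_2009_01_20.tar.gz
[/code]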

This morning, it was finally time to decompress the monstrosity locally, but I noticed a hiccup in DSL traffic overnight and figured I’d run a check on things first – just to make sure that the http resume had worked correctly.
[code]
ammon@scruffy:~$ gunzip --test archive_2009_01_20.tar.gz
gunzip: archive_2009_01_20.tar.gz: invalid compressed data--format violated
[/code]
This was not good. I had a 24gb file that was somehow corrupted… somewhere.

Since re-downloading the whole thing would cost me another day, I had to find a way to repair the file in a reasonable amount of time. Some research and suggestion gathering later, it was confirmed that rsync would probably handle the task.

Assuming that I wouldn’t be using an unfair amount of bandwidth for this, I switched back to the T1 link so I could tunnel through SSH again.
[code]
ammon@scruffy:~$ rsync --checksum --inplace -e "ssh" wernstrom:/tmp/archive_2009_01_20.tar.gz archive_2009_01_20.tar.gz

sent 1280578 bytes  received 1440757 bytes  2622.97 bytes/sec
total size is 25619572576  speedup is 9414.34

ammon@scruffy:~$ gunzip --test archive_2009_01_20.tar.gz
ammon@scruffy:~$
[/code]
(remember, this is unix: no output implies success)

So, yeah. Rsync, I love it when you work. 😉

It took some time and generated a lot of disk activity when the process started, but it worked almost painlessly and only transferred the data I needed – thus leaving the shared network resource free for everyone else 🙂

ccent

So… I just finished the first half of my CCNA today.

I never really cared about networking much beyond what was needed to make sure clients on a lan can talk to their dns server… but we’ve been growing enough here at work that our needs quickly outpaced my prior skillset. And since I was the closest thing we had to a network admin, I got signed up for classes 😉

It’s been fun and profoundly enlightening. I didn’t expect to have my way of thinking so radically altered, but I’m hardly complaining.

I’ll be starting up the second class in a week or two. This’ll include such topics as VLANs, IPv6, and fancy routing protocols. I’m stoked.

It’s funny. I never finished college (though I took classes for roughly 10 years), so this is actually the first certificate of education I’ve received since high school.

The comment was made at work that I’d dinged as a sysadmin. I haven’t. I’m just cherry picking my next few levels in netadmin 😉

svn get revision

One of the more annoying things about svn is that (to my knowledge) there exists no single, simple command to retrieve the revision number from a shell.

What I want:
[code]
ammon@hermes:~/repo$ svn info --get-revision .
1234
[/code]

But of course, nothing like this exists.

Thankfully, svn info’s output IS easy enough to parse. You just have to do it yourself.

[code]
ammon@hermes:~/repo$ svn info | grep Revision | awk -- '{print $2}'
1234
[/code]

This gives you the revision of your current checkout without the network hit of a call to svn log.

To get the current version of the repo itself (hits the network), add "-r HEAD" to the svn info call:

[code]
ammon@hermes:~/repo$ svn info -r HEAD | grep Revision | awk -- '{print $2}'
1280
[/code]

Of course, svn info also supports xml output, so you could parse that in a more advanced environment where you’re still not using the svn api bindings.
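
For example, something quick with xmllint (if you have it handy):

[code]
ammon@hermes:~/repo$ svn info --xml | xmllint --xpath 'string(//entry/@revision)' -
1234
[/code]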

flash policy service daemon

Sorry it took me so long to post this, but WordPress 2.5 doesn’t seem to like me trying to upload gz/zip files, so I had to upload the source manually.

Well, it’s been months since I promised to post some usable socket policy service code, so I will.

The script here is meant to serve as a good starting point for people whose servers need to allow flash clients to make socket connections. I have not actually used this exact code in a production environment, but I have been using code that is 99% identical for a while now. I am confident that any blatant flaws are the result of simple copy-paste errors as I compiled the package. Please let me know if you find any.

I have, however, stress tested the heck out of this service. One instance successfully served up over 16000 policy file requests fed into it as rapidly as I could send them. The same networking code has also handled requests from at least 100 different hosts at roughly the same time.

Everything has been combined into a single cli php script that requires no special installation. Just plop it down on the server and run it as root. It will take care of the rest. The config defaults should be safe, but you probably want to set them explicitly for your environment anyway.
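
Once it’s up, you can sanity-check it by hand from another box; a flash socket policy request is just a null-terminated XML tag (hostname below is a placeholder, and this assumes the default policy port of 843):

[code]
# send a raw policy file request and print whatever the daemon sends back
printf '<policy-file-request/>\0' | nc policy.example.com 843
[/code]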

The daemon is made of three classes:

  • Logger – A rudimentary log file management class that I copy from project to project in one form or another. The included version is stripped down from some of the other versions I’ve written, and I’m planning on releasing a more feature-rich version in the future.
  • Daemon – A simple class for daemonizing a process. Adapted and re-adapted countless times from an original php4 class I found on the net a few years ago by some guy named Seth (whose email domain no longer exists).
  • FlashPolicyService – The meat and potatoes, a child of Daemon. Mostly, this is just the requisite networking code and glue to make everything work together.

As with any of my other code, this is licensed under CC Attribution 3.0.

Download:

Source code after the jump.