AWS Tip: Warm your EBS volumes

This post is inspired by the above tweet.  EBS volumes (Amazon's equivalent of a SAN as a Service) have different performance characteristics before and after their first reads and writes.  The read-side variability is particularly strong on the first read of volumes that were restored from a snapshot.  This makes a lot of sense when you realize that snapshots are stored in S3 behind the scenes, and their blocks are lazy-loaded onto the volume on first read.

The performance effects aren't clearly documented by Amazon, so I figured it was worth sharing my two key findings:

1) Reading every block of a volume restored from a snapshot is an extremely Good Thing.  This is partially because of the Amazon-documented '5-50% decrease in IOPS' on first read, but my primary driver for doing this is to cut off the high-latency tail.  The tail latency on cold EBS volumes is occasionally so high that MySQL bombs out after waiting 60+ seconds for a single read.  I strongly recommend reading every block (or at least every important block) on EBS volumes that you've restored from a snapshot.
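
One way to see the tail for yourself is to time individual reads at random offsets and compare the median with the high percentiles.  The following is a minimal sketch rather than the benchmark used for this post: the function names, sample count, and block size are all illustrative assumptions, and reads that are already in the page cache will look artificially fast.

```python
import math
import os
import random
import time


def sample_read_latencies(path, samples=1000, block_size=4096, seed=0):
    """Time `samples` reads at random block-aligned offsets.

    Returns a list of per-read latencies in seconds. The defaults are
    illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of regular files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        blocks = max(size // block_size, 1)
        latencies = []
        for _ in range(samples):
            offset = rng.randrange(blocks) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)
            latencies.append(time.perf_counter() - start)
        return latencies
    finally:
        os.close(fd)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Running it against a freshly restored volume and again after warming (for example `percentile(sample_read_latencies("/dev/xvdf"), 99)`, where the device name is hypothetical and reading it usually requires root) should make any difference in the tail visible.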

2) Writing to every block of a new volume is a mixed bag.  It provided a small but measurable win on my synthetic benchmarks, but it didn't make a material difference to any of my actual applications.  Unfortunately, it does increase EBS snapshot costs, so writing was a loser for me.

Having found that the "cat /dev/vol > /dev/null" and "dd if=/dev/vol of=/dev/null" techniques made a huge difference in tail latency, I decided to try to optimize the warming process by experimenting with block size, number of read threads, and the distribution of those reads within the volume.  I came up with 80 combinations that seemed worth exploring, and my first implementation showed that every single combination was worse than using cat.  I then reimplemented my test framework in C instead of Python, thinking that perhaps the overhead of the interpreter was eating the gains.  The C version, at its best, performed exactly as well as a single-threaded cat/dd solution.  Sometimes the simplest way is the best.
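
For anyone who wants the warming step inside a provisioning script rather than a shell one-liner, the same single-threaded sequential read can be sketched in Python.  This is an assumed equivalent of the cat/dd approach, not the test framework described above; the function name and the 1 MiB block size are illustrative choices.

```python
import time


def warm_volume(path, block_size=1024 * 1024):
    """Read `path` sequentially from start to end, discarding the data.

    Returns (bytes_read, slowest_read_seconds). The 1 MiB block size is an
    assumption for illustration, not a measured optimum.
    """
    total = 0
    slowest = 0.0
    # buffering=0 gives raw, unbuffered reads at the Python level.
    with open(path, "rb", buffering=0) as dev:
        while True:
            start = time.perf_counter()
            chunk = dev.read(block_size)
            elapsed = time.perf_counter() - start
            if not chunk:  # an empty read means end of device or file
                break
            total += len(chunk)
            slowest = max(slowest, elapsed)
    return total, slowest
```

A call such as `warm_volume("/dev/xvdf")` (the device name is hypothetical, and reading a raw device usually requires root) touches every block once.  The slowest-read figure it returns is a rough indicator of how bad the cold tail was during the warm-up.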