Disk Performance, Part 1: How Performance Is Measured

When we as computer users think of disk performance, we usually think about streaming, sequential performance, otherwise known as throughput. Desktop operating systems have trained us to think in this way, because the most prominent display of disk speed that your average person sees is an Explorer or Finder window showing file copy progress — we know that our music collection is being copied at 25 MB per second, for example. This measurement is a good fit for the task, because it gives us the best approximation of how long it will take until the file copy is finished.

In the server world, though, this generally isn’t how disk performance is measured. Servers are shared resources that do much more complicated things with data than typical desktop systems. Most database access is highly random — you pull a record here, a record there, and piece them together in the application. Rows in a MySQL table are usually no more than a couple of kilobytes each, and the rows you need to join together to service one complex query typically live all over the disk. For most other server-side applications, small files are accessed a lot, and large files are accessed infrequently. So instead of throughput, which is measured in bytes/sec, we typically work with a different measurement called IOPS (pronounced i-ops).

IOPS stands for I/O Operations Per Second, and it refers to the average number of random small reads and writes that a disk drive can perform in one second. Let’s start looking at some numbers and calculating something useful.

IOPS for a single [spinning] disk

Even though SSDs are gaining a lot of traction on servers because they supply a lot of random IOPS, I’m going to ignore them here. Why? Because their electronics are too sophisticated to model like spinning disks. To get the IOPS numbers for an SSD, your best bet is to look up some independent benchmarks. If those are unavailable, run your own. If that’s not doable, well, you’ll just have to trust the numbers that your vendor gives you.

To calculate for a single spinning disk, though, you just need two numbers: the disk’s rotational speed and its average seek time. Below is a table I’ve shamelessly lifted from Ronny Egner:

Metric                              Formula                                   Disk drive
RPM (revolutions per minute)        See vendor data sheet                     7200     10000    15000
RPS (revolutions per second)        RPM / 60 (seconds per minute)             120      166.67   250
RPms (revolutions per millisecond)  RPM / 60,000 (milliseconds per minute)    0.12     0.17     0.25
Full rotation time (ms)             1 / RPms                                  8.33     6        4
Avg. rotational latency (ms)        ½ × full rotation time                    4.17     3        2
Avg. seek time (ms)                 See vendor data sheet                     10       5        4
I/O time (ms)                       Avg. rotational latency + avg. seek time  14.17    8        6
IOPS                                (1 / I/O time in ms) × 1000               70.59    125      166.67

As you can see, an average 7200 RPM disk will net you about 70 IOPS, whereas a 15,000 RPM disk will give you closer to 170. These numbers can be affected by NCQ, caching, and other drive features. (In a RAID array, many of these drive features are disabled, because the controller does better when performance is more deterministic.)
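If you'd rather plug in your own drive's numbers, the table's arithmetic fits in a few lines of Python. This is only a sketch of the formulas above; the function name is mine, and it ignores NCQ, caching, and every other drive feature.

```python
def single_disk_iops(rpm: float, avg_seek_ms: float) -> float:
    """Estimate random IOPS for one spinning disk, following the table's formulas."""
    full_rotation_ms = 60_000 / rpm                   # time for one revolution
    avg_rotational_latency_ms = full_rotation_ms / 2  # on average, half a turn
    io_time_ms = avg_rotational_latency_ms + avg_seek_ms
    return 1000 / io_time_ms                          # I/Os that fit in one second


# The three example drives from the table: (RPM, average seek time in ms).
for rpm, seek_ms in [(7200, 10), (10000, 5), (15000, 4)]:
    print(f"{rpm} RPM, {seek_ms} ms seek: {single_disk_iops(rpm, seek_ms):.2f} IOPS")
```

This reproduces the bottom row of the table: 70.59, 125.00, and 166.67 IOPS.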

IOPS for a RAID array

I’m going to ignore caching for simplicity.

Things get complicated here because RAID implementations vary so widely, and I’m sure some people will disagree with the observations I’ve made. I’ll assume you have a passing familiarity with the most common RAID levels; if not, Wikipedia has a decent reference. I do want to pin down a couple of definitions first, though, because the storage industry can’t agree on nomenclature:

  • Segment size: The amount of data written to a single disk within a RAID stripe.
  • Stripe width: The amount of data contained in a single RAID stripe (segment size × number of data-bearing disks).

When calculating stripe width for RAID-4/5/6, do not include disks used for storing parity.
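To make the definition concrete, here is a small sketch of the stripe width calculation. The 64 KB segment size and the 8-disk array are made-up example values, and the function name is my own.

```python
def stripe_width_kb(segment_size_kb: int, total_disks: int, parity_disks: int = 0) -> int:
    """Stripe width = segment size x number of data-bearing disks."""
    data_disks = total_disks - parity_disks  # parity disks don't count
    if data_disks < 1:
        raise ValueError("array needs at least one data-bearing disk")
    return segment_size_kb * data_disks


# Hypothetical 8-disk arrays with a 64 KB segment size.
print(stripe_width_kb(64, 8))                  # RAID-0: 512
print(stripe_width_kb(64, 8, parity_disks=1))  # RAID-5: 448
print(stripe_width_kb(64, 8, parity_disks=2))  # RAID-6: 384
```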

RAID-0 (striping without mirroring)

RAID-0 doesn’t need to do any special calculations, and it doesn’t need to write anything twice. As a result, all of the disks can be used at the same time for random reads and writes with no penalties. Size your stripe width with sequential I/O in mind, rather than random I/O.

IOPS: (IOPS per disk × number of disks)
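As a quick sanity check, the formula looks like this in Python. The four-disk array is a hypothetical example; the 125 IOPS per disk comes from the 10,000 RPM column of the table.

```python
def raid0_iops(iops_per_disk: float, num_disks: int) -> float:
    """RAID-0: every disk serves random reads and writes, with no penalty."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    return iops_per_disk * num_disks


# Hypothetical array: four 10,000 RPM disks at 125 IOPS each.
print(raid0_iops(125, 4))  # 500
```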

RAID-0+1 (striping with mirroring)

I’m folding RAID-1 in here.

Most controllers will interleave reads between the mirrored disks in the array, which will double your random read performance versus a single disk. Note that this only speeds up random access, because sequential I/O is limited by how fast data streams off the platters rather than by seek time. Write performance is slightly worse than a single disk because the same data needs to be written to two disks: for a given write, whichever drive has the longer seek time will be slowing you down. Caching usually makes this irrelevant.

Read IOPS: (IOPS per disk × number of disks)
Write IOPS: (IOPS per disk × (number of disks / 2))
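Here are the two formulas side by side as a sketch. The eight-disk array is an invented example, and the code ignores caching and the "slightly worse than a single disk" write effect described above.

```python
def raid01_iops(iops_per_disk: float, num_disks: int) -> tuple[float, float]:
    """Return (read IOPS, write IOPS) for a mirrored array (RAID-1 or RAID-0+1)."""
    if num_disks < 2 or num_disks % 2 != 0:
        raise ValueError("mirroring needs an even number of disks")
    read_iops = iops_per_disk * num_disks         # reads interleave across all disks
    write_iops = iops_per_disk * (num_disks / 2)  # every write lands on two disks
    return read_iops, write_iops


# Hypothetical array: eight 10,000 RPM disks at 125 IOPS each.
reads, writes = raid01_iops(125, 8)
print(f"read: {reads:.0f} IOPS, write: {writes:.0f} IOPS")  # read: 1000 IOPS, write: 500 IOPS
```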

RAID-5 (striping with distributed parity)

RAID-5 is a really complicated case that’s heavily reliant on the relationship between your stripe width and your average I/O size, and that complication is why most major database vendors like Oracle will recommend never running on RAID-5.

Read IOPS: (IOPS per disk × (number of disks – 1))
Write IOPS: (IOPS per disk × (number of disks – 1) × RAID-5 write penalty)

The write penalty is a scaling factor that depends on how well your workload matches your array configuration. At best, there’s virtually no penalty at all. At worst, you might be getting 20% of your expected disk performance. I’m going to go over why that is, and how to optimize your RAID-5 arrays, in Part 2.
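To show how much that one factor moves the result, here is a sketch that treats the write penalty as a multiplier between 0 and 1. That range is my reading of the best and worst cases above, and the eight-disk array is a made-up example.

```python
def raid5_iops(iops_per_disk: float, num_disks: int, write_penalty: float) -> tuple[float, float]:
    """Return (read IOPS, write IOPS) for RAID-5 using the formulas above."""
    if num_disks < 3:
        raise ValueError("RAID-5 needs at least three disks")
    if not 0 < write_penalty <= 1:
        raise ValueError("write_penalty is a scaling factor in (0, 1]")
    read_iops = iops_per_disk * (num_disks - 1)
    write_iops = read_iops * write_penalty
    return read_iops, write_iops


# Hypothetical array: eight 10,000 RPM disks at 125 IOPS each.
for penalty in (1.0, 0.2):  # best case vs. the 20% worst case
    reads, writes = raid5_iops(125, 8, penalty)
    print(f"penalty {penalty}: read {reads:.0f} IOPS, write {writes:.0f} IOPS")
```

With these inputs, read IOPS stay at 875 while write IOPS drop from 875 to 175 between the best and worst cases.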

(Duncan Epping over at Yellow Bricks has a post where he uses some constant ratios for his RAID write penalties. I have a philosophical disagreement with this idea, but I’ll link to it because my answer of “it depends” isn’t really constructive either.)

RAID-6 (striping with double parity)

Performance characteristics are almost identical to RAID-5, with two significant differences. First, the parity calculations take an order of magnitude more processing power, because the algorithms are much more sophisticated. Second, two disks in each stripe are reserved for parity data — these disks will not contribute to your IOPS.

Read IOPS: (IOPS per disk × (number of disks – 2))
Write IOPS: (IOPS per disk × (number of disks – 2) × RAID-6 write penalty)
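To see what the second parity disk costs you, this sketch compares RAID-5 and RAID-6 on the same hypothetical eight-disk array. The 0.5 write penalty is an arbitrary placeholder applied to both levels, not a measured value, and the extra parity computation cost isn't modeled at all.

```python
def parity_raid_iops(iops_per_disk: float, num_disks: int,
                     parity_disks: int, write_penalty: float) -> tuple[float, float]:
    """Return (read IOPS, write IOPS) for a parity array (1 = RAID-5, 2 = RAID-6)."""
    data_disks = num_disks - parity_disks
    if data_disks < 2:
        raise ValueError("not enough disks for this RAID level")
    read_iops = iops_per_disk * data_disks
    return read_iops, read_iops * write_penalty


# Hypothetical array: eight disks at 125 IOPS each, placeholder penalty of 0.5.
for name, parity in [("RAID-5", 1), ("RAID-6", 2)]:
    reads, writes = parity_raid_iops(125, 8, parity, 0.5)
    print(f"{name}: read {reads:.0f} IOPS, write {writes:.1f} IOPS")
```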

Aggregating I/O profiles

One final thing to note: if you have enough sequential I/O tasks running concurrently, your I/O profile turns from sequential to random. The array is trying to keep these requests from being starved, and slow data is usually better than no data at all, so it starts seeking all over the place instead of streaming a nice, even line of consecutive blocks off the disk. Keep this very much in mind when deciding whether IOPS or sequential throughput is the more useful measurement for the workload you’re trying to size.
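To get a feel for how far apart the two measurements can be, you can convert IOPS into an effective throughput by multiplying by the I/O size. The sketch below assumes 4 KB random I/Os, which is my own example figure, and uses the 7200 RPM disk from the table.

```python
def random_throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Effective throughput when every I/O is a small random request."""
    return iops * io_size_kb / 1024


# 7200 RPM disk from the table (70.59 IOPS), assumed 4 KB per I/O.
effective = random_throughput_mb_s(70.59, 4)
print(f"{effective:.2f} MB/s")  # 0.28 MB/s
```

Compare that with the 25 MB/s file copy from the introduction: the same disk can look roughly two orders of magnitude slower once the workload turns random.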

In Part 2, I’ll go over the impact of stripe sizing and how almost everybody does it completely wrong.

6 Comments

  1. Hi
    Keep on going! I’m waiting for Part 2, as I need to be sure about stripe size when I’m configuring our server’s RAID for different types of needs like MySQL server, file hosting and …

  2. Me too! I’m looking forward to Part 2. Any links?
    It really helps me a lot! I’m doing this research and this article has given me many ideas. Thanks!

    Techie

  3. A lot of what I know about spinning disk performance came from IBM’s Best Practices redbook for the Midrange Storage series of SAN hardware. Most of it is probably totally irrelevant to you or anyone else, but there’s a lot of good stuff in there about general RAID performance if you dig for it.

    I’m hoping to break from my usual pattern and have Part 2 up tonight or tomorrow instead of 3 months from now.

  4. There is a small typo in the table. It should be:

    A 10000 RPM HDD with a 6 ms full rotation time has a 3 ms avg. rotational latency, and a 15000 RPM HDD has a 2 ms avg. rotational latency.



© 2019 @jgoldschrafe
