PostgreSQL 9.0 High Performance

Physical disk performance

A lot of high-level information about disk performance has been covered already, but to get useful benchmarks from drives you'll need to know a bit more about their physical characteristics. This section drops back to theory for a bit, followed by measurements of real disks that demonstrate the behavior you can commonly expect to see.

Random access and I/Os Per Second

Enterprise storage vendors like to talk in terms of Input/Outputs Per Second, or IOPS. If you're buying a SAN, for example, expect to be asked how many IOPS you need, both in total and per spindle, and for the measurements the vendor provides to demonstrate good performance to be quoted in this unit. This number represents typical disk performance on a seek-heavy workload, and unfortunately it is a poor one to fixate on for database applications. Database workloads are usually complicated mixes of sequential reads, seeks, and commits, with caching involved, rather than always being seek-bound.

Note

Spindle is often used as a synonym for a single disk drive, and it's used interchangeably that way here. Strictly speaking it refers to only one part of a disk: the rotating assembly the platters are mounted on. In this case, common use trumps correctness in most writing on the subject.

It's straightforward to compute IOPS for a single disk. You'll need to track down the manufacturer's data sheet, which gives the detailed timing specifications for the drive. The Seagate Momentus 7200.4 laptop drive used in the examples here has the following specifications:

  • Spindle Speed: 7,200 RPM
  • Average latency: 4.17 ms
  • Random read seek time: 11.0 ms

These figures reflect the fact that every random disk access requires:

  1. Seeking to the right track on the disk. That's the "random read seek time".
  2. Waiting for the sector we want to read to show up under the read head. That's the "Average [rotational] latency" time.

The "average latency" figure here represents rotational latency. That will always be exactly 1/2 of the rotation time of the drive. In this case, 7200RPM means one rotation happens every 1/120 of a second, which means a rotation every 8.33 ms. Since, on an average you won't have to wait for a full rotation, that's halved to give an average, making for an expected rotation latency time of 4.17 ms. All 7200 RPM drives will have an identical rotational latency figure, whereas seek times vary based on drive size, quality, and similar factors.

IOPS is simply a measurement of the average time for both of those operations, the seek time and the rotational latency, inverted to be a rate instead of an elapsed time. For our sample disk, it can be computed as follows:

Rotational latency RL = 1 / (RPM / 60) / 2 = 4.17 ms
Seek time S = 11.0 ms
IOPS = 1 / (RL + S)
IOPS = 1 / (4.17 ms + 11.0 ms) = 65.9 IOPS
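
If you'd rather let a script do this arithmetic, here's a minimal sketch of the same calculation in Python; the estimate_iops function is just an illustrative name, and the figures come from the Momentus data sheet quoted above:

def estimate_iops(rpm, seek_ms):
    # Average rotational wait is half a rotation, converted to milliseconds
    rotational_latency_ms = 1.0 / (rpm / 60.0) / 2.0 * 1000.0
    iops = 1000.0 / (rotational_latency_ms + seek_ms)
    return rotational_latency_ms, iops

# Seagate Momentus 7200.4 figures from the data sheet above
latency, iops = estimate_iops(rpm=7200, seek_ms=11.0)
print("Rotational latency: %.2f ms, worst-case IOPS: %.1f" % (latency, iops))
# Prints approximately: Rotational latency: 4.17 ms, worst-case IOPS: 65.9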

There are a number of resources discussing IOPS, including calculators, that you might find helpful.

Remember that the computed IOPS figure is a worst-case scenario: it's the performance the drive should deliver even when it's being hammered by requests scattered all over the disk. It will often do better, particularly on sequential reads and writes.

Sequential access and ZCAV

In many database situations, you're also concerned about the streaming sequential read or write rate of the drive: how fast it transfers data when it stays in one area instead of seeking around. Computing this value is complicated by the way disks are built.

The first thing to realize about modern hard disks is that the speed you'll see from them depends heavily on what part of the disk you're reading from. Disks spin at one speed all of the time, referred to as Constant Angular Velocity or CAV. A typical drive nowadays spins at 7,200 RPM, and the actual disk platter is circular. When the read/write head is near the outside of the disk, the surface passing underneath it moves faster than it does near the inside. This is the same way that in a car, the outside edge of a tire travels further than the inside one, even though the rotation count is the same.

Because of this speed difference, manufacturers are able to pack more data onto the outside edge of the drive than the inside. Drives are actually mapped into a series of zones with different densities on them. There is a longer discussion of this topic at http://www.coker.com.au/bonnie++/zcav/ and use of the zcav tool is shown later.

The practical result is that the logical beginning part of the disk is going to be significantly faster than its end. Accordingly, whenever you benchmark a disk, you have to consider what part of that disk you're measuring. Many disk benchmark attempts give bad data because they're comparing a fast part of the disk, likely the first files put onto the disk, with ones created later that are likely on a slower part.
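
One way to see the zone difference for yourself is to time sequential reads near the beginning and near the end of a raw disk device. The following Python sketch is only a rough check, not a replacement for zcav: the /dev/sdb path is a placeholder for whatever idle, non-system disk you can safely read, it needs root privileges, and it doesn't bypass the operating system cache, so repeated runs will overstate the speed:

import os
import time

DEVICE = "/dev/sdb"          # placeholder; point at an idle disk you can read raw
CHUNK = 1024 * 1024          # read in 1 MB pieces
SAMPLE = 256 * 1024 * 1024   # read 256 MB at each location

def mb_per_sec(offset):
    # Time a sequential read of SAMPLE bytes starting at the given offset
    fd = os.open(DEVICE, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        start = time.time()
        remaining = SAMPLE
        while remaining > 0:
            data = os.read(fd, min(CHUNK, remaining))
            if not data:
                break
            remaining -= len(data)
        elapsed = time.time() - start
        return (SAMPLE - remaining) / elapsed / (1024 * 1024)
    finally:
        os.close(fd)

fd = os.open(DEVICE, os.O_RDONLY)
disk_size = os.lseek(fd, 0, os.SEEK_END)   # device size in bytes
os.close(fd)

print("Outer (fast) zone: %.1f MB/s" % mb_per_sec(0))
print("Inner (slow) zone: %.1f MB/s" % mb_per_sec(disk_size - SAMPLE))

On many drives the second number will be considerably lower than the first, which is exactly the effect a benchmark needs to account for.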

Short stroking

Since disks have this clearly faster portion, and capacities are very large, one obvious conclusion is that you should put the most important pieces of data on the early part of the disk. One popular technique, named short stroking, limits the portion of the disk used to only the fastest part, ensuring you'll only ever access its best area. Short stroking can be done just by adjusting the disk's partition table to include only the early part. You might partition the slower portion anyway, but not use it regularly; saving it for backups or migration use can be worthwhile. Occasionally you can force short stroking by more physical means, such as a disk vendor or RAID controller tool that limits the capacity exposed to the operating system.

Commit rate

As covered in the previous chapter, how fast data can actually be committed permanently to disk is a critical performance aspect for database transaction processing, and it's important to measure this area carefully. Speeds that are dramatically higher than expected are usually a sign that one of the write caches has been put into a volatile write-back mode, which, as already explained, can result in data loss and database corruption. Some examples of how that can happen will be covered in Chapter 4, Disk Setup.

If you don't have any non-volatile caching available, the basic commit rate for a drive will be similar to its IOPS rating. Luckily, PostgreSQL will combine multiple transactions into a single physical commit when they're arriving faster than the disk can handle them individually, so throughput can exceed that figure under concurrent load.

PostgreSQL test_fsync

In a source code build of PostgreSQL, the src/tools/fsync directory contains a program named test_fsync, which may also be included in some packaged versions. It aims to test the commit rate for each of the ways a given PostgreSQL install might commit records to disk. Unfortunately, this program doesn't give results consistent with other tests, and before PostgreSQL 9.0 its output is in the wrong units (elapsed time rather than operations per second). Until it's improved a bit further, its output can't be relied upon.

INSERT rate

Each time you INSERT a record in a standard PostgreSQL install, a commit happens at the end of that statement. Therefore, any program that does a series of inserts in a loop and times them can measure the effective commit rate, presuming the records are small enough that raw disk throughput doesn't become the limiting factor. It's possible to run exactly such a test using the pgbench tool shipped with PostgreSQL. You should also be able to write a similar test in any programming language you're familiar with that can issue PostgreSQL INSERT statements one at a time, as shown in the sketch below. Just make sure you don't batch them into a larger transaction block. That's the right approach if you actually want good performance, but not for specifically testing the commit rate using small transactions.
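
Here is a minimal sketch of such a test in Python using the psycopg2 driver. The connection string, table name, and row count are assumptions to adjust for your environment; the important detail is that autocommit is enabled, so every INSERT is its own transaction and forces its own flush to disk:

import time
import psycopg2

ROWS = 1000
conn = psycopg2.connect("dbname=postgres")   # assumed connection settings
conn.autocommit = True                        # one commit per INSERT statement
cur = conn.cursor()

cur.execute("DROP TABLE IF EXISTS commit_test")
cur.execute("CREATE TABLE commit_test (id integer)")

start = time.time()
for i in range(ROWS):
    cur.execute("INSERT INTO commit_test (id) VALUES (%s)", (i,))
elapsed = time.time() - start

print("%.0f commits/second" % (ROWS / elapsed))
cur.close()
conn.close()

If the rate reported is far above the drive's computed IOPS figure, suspect that a volatile write cache is absorbing the commits, as described under Commit rate above.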

Windows commit rate

On the Windows platform, where sysbench and test_fsync are not available, an INSERT test is really the only good option for testing commit rate. Note that the PostgreSQL wal_sync_method setting, covered in a later chapter, needs to be set properly for this test to give valid results. As on most platforms, the Windows defaults include unsafe write-back cache behavior.