PostgreSQL 9.0 High Performance
上QQ阅读APP看书,第一时间看更新

Chapter 1. PostgreSQL Versions

PostgreSQL certainly has a reputation. It's known for having a rich feature set and very stable software releases. The secure stance which its default configuration takes is simultaneously praised by security fans and criticized for its learning curve. The SQL-specification conformance and data integrity features allow only the strictest ways to interact with the database, which is surprising to those who come from a background working with looser desktop database software. All of these points have an element of truth to them.

Another part of PostgreSQL's reputation is that it's slow. This too has some truth to it, even today. There are many database operations where "the right thing" takes longer to do than the alternative. As the simplest example of this, consider the date "February 29, 2009". With no leap year in 2009, that date is only valid as an abstract one. It's not possible for this to be the real date of something that happened. If you ask the database to store this value into a standard date field, it can just do that, the fast approach. Alternatively, it can check whether that date is valid to store into the destination field, note that there is no such date in a regular calendar, and reject your change. That's always going to be slower. PostgreSQL is designed by and for the sort of people who don't like cutting corners just to make things faster or easier, and in cases where the only way you can properly handle something takes a while that may be the only option available.

However, once you have a correct implementation of something, you can then go back and optimize it. That's the mode PostgreSQL has been in for the last few years. PostgreSQL usually rises above these smaller issues to give excellent database performance. Parts of it have the sort of great design that outperforms simpler approaches, even after paying the overhead that complexity can introduce. This is a fairly recent phenomenon though, which explains quite a bit about the perception that PostgreSQL is a slower database than its competitors.

Performance of historical PostgreSQL releases

In November of 2005, PostgreSQL 8.1 was released. It included a number of internal architectural changes, some of which aimed to improve how fast the database would run on a multi-processor system with many active clients. The result was a major improvement in the ability of the database to scale upwards to handle a heavy load. Benchmarks on modern hardware really highlight just how far that version leapfrogged earlier ones. You can find an excellent performance comparison of versions 8.0 through 8.4 from György Vilmos at http://suckit.blog.hu/2009/09/29/postgresql_history. This shows exactly how dramatic these improvements have been. These tests use the Online Transaction Processing (OLTP) test of the sysBench benchmarking software, available at http://sysbench.sourceforge.net/.

This test gives a transactions per second (TPS) figure that measures the total system speed, and you can run it in either a read-only mode or one that includes writes. The read-only performance improved by over four times from 8.0 to 8.1 and more than doubled again by 8.3:

 

Version           Peak Read-Only TPS           # of clients at peak
8.0.21           1256           4
8.1.17           5620           14
8.2.13           8109           18
8.3.7           13984           22
8.4.1           13546           22

The rise in the number of clients at the peak load gives us an idea of how well the database internals handle access to shared resources. The area 8.1 in particular included a significant upgrade. Performance improved similarly on the write side, with almost an 8 times gain between 8.0 and 8.3:

 

Version           Peak Write TPS           # of clients at peak
8.0.21           361           2
8.1.17           873           10
8.2.13           1358           14
8.3.7           2795           18
8.4.1           2713           12

The small decrease in performance from 8.3 to 8.4 in both these tests is due to some subtle re-tuning of the database to improve its worst-case performance. More statistics are collected in 8.4 to improve complicated queries, at the expense of slightly slowing the sort of trivial ones tested here. There's more about this Chapter 10, Query Optimization.

These improvements have been confirmed by other benchmarking results, albeit normally not covering such a wide range of versions. It's easy to see that any conclusion about PostgreSQL performance reached before late 2005, when 8.1 shipped, is completely out of date at this point. The speed improvement in 2008's 8.3 release was an additional large leap. Versions before 8.3 are not representative of the current performance and there are other reasons to prefer using that one or a later one too.

Choosing a version to deploy

Because of these dramatic gains, if you have an older PostgreSQL system you'd like to make faster, the very first thing you should ask yourself is not how to tweak its settings, but instead, if it's possible to upgrade to a newer version. If you're starting a new project, 8.3 is the earliest version you should consider. In addition to the performance improvements, there were some changes to that version that impact application coding that you'd be better off to start with to avoid needing to retrofit later.

Chapter 16, Avoiding Common Problems includes a reference guide to what performance-related features were added to each major version of PostgreSQL, from 8.1 through 9.0. You might discover that one of the features only available in a very recent version is compelling to you, and therefore have a strong preference to use that one. Many of these version-specific changes are also highlighted throughout the book.

Upgrading to a newer major version

Until very recently, the only way to upgrade an existing PostgreSQL version to a newer major version, such as going from 8.1.X to 8.2.X, was to dump and reload. The pg_dump and/or pg_dumpall programs are used to write the entire contents of the database to a file, using the newer versions of those programs. That way, if any changes need to be made to upgrade, the newer dumping program can try to handle them. Not all upgrade changes will happen automatically though. Then, depending on the format you dumped in, you can either restore that just by running the script it generates or use the pg_restore program to handle that task. pg_restore can be a much better alternative in newer PostgreSQL versions that include a version with parallel restore capabilities.

Note

If you are using a system that doesn't easily allow you to run more than one system with PostgreSQL version at a time, such as the current RedHat Linux RPM packages, getting both old and new versions of PostgreSQL installed on your system at the same time can be difficult. There are some changes to improve this situation under development for PostgreSQL 9.0. Make sure to check the feasibility of running more than one version at once as part of planning an upgrade.

Dumping can take a while, and restoring can take even longer. While this is going on, your database likely needs to be down, so that you don't allow any changes that won't then be migrated over by the dump. For large databases, this downtime can be both large and unacceptable.

The most demanding sites prefer near zero downtime, to run 24/7. There a dump and reload is never an acceptable option. Until recently, the only real approach available for doing PostgreSQL upgrades in those environments has been using statement replication to do so. Slony is the most popular tool for that, and more information about it is in the Chapter 14, Scaling with Replication. One of Slony's features is that you don't have to be running the same version of PostgreSQL on all the nodes you are replicating to. You can bring up a new node running a newer PostgreSQL version, wait for replication to complete, and then switch over once it matches the original.

Now, there is another way available that works without needing any replication software. A program originally called pg_migrator at http://pgfoundry.org/projects/pg-migrator/ is capable of upgrading from 8.3 to 8.4 without the dump and reload. This process is called in-place upgrading. You need to test this carefully, and there are both known limitations and likely still unknown ones related to less popular PostgreSQL features. Be sure to read the documentation of the upgrade tool very carefully. Starting in PostgreSQL 9.0, this module is included with the core database, with the name changed to pg_upgrade. While all in-place upgrades have some risk and need careful testing, in many cases, pg_upgrade will take you from 8.3 or 8.4 to 9.0 and hopefully beyond.

The PostgreSQL development community is now committed to allowing in-place upgrades to future versions. Now that terabyte and larger PostgreSQL installs are common, upgrading only using dump and reload just isn't always practical.

Upgrades to PostgreSQL 8.3+ from earlier ones

The major internal changes of 8.3 make it impossible to upgrade from any earlier version past it without dumping the entire database and reloading it into the later one. This makes 8.3 a doubly important version milestone to cross. Not only is it much faster than 8.2, once your data is in 8.3, you can perform in-place upgrades from there.

Going from an earlier version to PostgreSQL 8.3 or later can be a difficult change. Some older applications rely on non-character data types being transparently cast to the TEXT type, a behavior removed from 8.3 for a variety of reasons. See http://www.postgresql.org/docs/8.3/static/release-8-3.html for details.

While there's always a chance that upgrading your database version can introduce new issues, it is particularly likely that applications written against an earlier version will need to be updated to work against 8.3 or later. It is possible to work around this issue by manually adding back the automatic typecasting features that were removed. http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html provides a sample. However, fixing the behavior in your application instead is a more robust and sustainable solution to the problem. The old behavior was eliminated because it caused subtle application issues. If you just add it back, you'll both be exposed to those and need to continue doing this extra cast addition step with every new PostgreSQL release. There is more information available at http://blog.endpoint.com/2010/01/postgres-upgrades-ten-problems-and.html on this topic and on the general challenges of doing a major PostgreSQL upgrade.

Minor version upgrades

A dump/reload or the use of tools like pg_upgrade is not needed for minor version updates, for example, going from 8.4.1 to 8.4.2. These simply require stopping the server, installing the new version, and then running the newer database binary against the existing server data files. Some people avoid ever doing such upgrades once their application is running for fear that a change in the database will cause a problem. This should never be the case for PostgreSQL. The policy of the PostgreSQL project described at http://www.postgresql.org/support/versioning states very clearly:

While upgrades always have some risk, PostgreSQL minor releases fix only frequently-encountered security and data corruption bugs to reduce the risk of upgrading. The community considers not upgrading to be riskier than upgrading.

You should never find an unexpected change that breaks an application in a minor PostgreSQL upgrade. Bug, security, and corruption fixes are always done in a way that minimizes the odds of introducing an externally visible behavior change, and if that's not possible, the reason why and the suggested workarounds will be detailed in the release notes. What you will find is that some subtle problems, resulting from resolved bugs, can clear up even after a minor version update. It's not uncommon to discover a report of a problem to one of the PostgreSQL mailing lists is resolved in the latest minor version update compatible with that installation, and upgrading to that version is all that's needed to make the issue go away.