PostgreSQL for Data Architects
上QQ阅读APP看书,第一时间看更新

WAL and the WAL writer process

When we make changes to the data, the changes are not written to the data files immediately, as mentioned before (probably many times). Changes are made to the blocks in the buffer and records of these changes are written to the WAL buffer (as soon as changes to data are made). The changes are flushed to the WAL segments when the changes are committed.

In the pg_xlog directory, the WAL segments are each 16 MB in size:

[postgres@MyCentOS pg_xlog]$ pwd
/pgdata/9.3/pg_xlog
[postgres@MyCentOS pg_xlog]$ ls -alrt
total 16396
drwx------. 2 postgres postgres 4096 Oct 13 13:23 archive_status
drwx------. 3 postgres postgres 4096 Oct 13 13:23 .
drwx------. 15 postgres postgres 4096 Nov 15 20:17 ..
-rw-------. 1 postgres postgres 16777216 Nov 15 20:17 000000010000000000000001

We can find out the segment PostgreSQL is writing to now using the pg_current_xlog_location function:

[postgres@MyCentOS pg_xlog]$ psql
psql (9.3.0)
Type "help" for help.

postgres=# SELECT pg_xlogfile_name(pg_current_xlog_location());
     pg_xlogfile_name     
--------------------------
 000000010000000000000001
(1 row)

The name is not a random collection of digits and numbers. It's comprised of three parts of 8 characters each:

000000010000000000000001

The digits are classified as follows:

  • The first 8 digits identifies the timeline
  • The following 8 digits identifies the (logical) xlog file
  • The last ones represent the (physical) xlog file (Segment)

Each segment contains blocks of 8K. Usually, PostgreSQL moves from one segment to the next when one segment is filled up, that is, all the 16 MB is filled. However, it's also possible to trigger the switch.

There are many functions related to WAL, which are useful in archival, recovery, and so on.

For example, pg_switch_xlog moves to the next transaction log file, allowing the current file to be archived. If there has been no transaction log activity since the last transaction log switch, pg_switch_xlog does nothing and returns the start location of the transaction log file currently in use. WAL is mostly written to and rarely read from. Cases when WAL is read from include:

  • Recovery
  • Server startup
  • Replication

Let's take a look at a few uses of WAL.

Recovery

This is the primary purpose of the WAL concept. Here, we are referring to recovering transactions that have been committed, but have not found their way to the data files. All the changes made to the database will find their way into the WAL segments, irrespective of whether the changes have been reflected into the data files or not. In fact, it's mandatory that changes have been written to WAL files before they are written to the data files themselves. Loss of WAL files almost certainly means lost transactions.

Incremental backup and point-in-time recovery

We can take a snapshot of the PostgreSQL filesystem and then set up a WAL archival process. The snapshot taken need not be a consistent one. The WAL segments generated will keep getting archived and we can use the snapshot and the archived WAL segment to perform a point-in-time recovery. In this process (which will be covered in Chapter 10, Scaling, Replication, and Backup and Recovery), we restore the file snapshot, and then replay the WAL segments until a specific point in time or until a transaction.

Replication

WAL segments have been used for the two purposes mentioned earlier. From version 9.0, WAL segments have been used for replication also. The rationale is simple. All the changes happening in the server are being recorded in the WAL segments anyway. Why not use these and get a stand-by server ready for failover?

We will cover the steps to set these up in Chapter 10, Scaling, Replication, and Backup and Recovery.

WAL also reduces the number of disk writes necessary to guarantee that a transaction is committed, thus improving performance. This is achieved because WAL writes are sequential. If a number of small transactions are committed, they will appear sequentially in the log. If the database were to work without WAL, a transaction commit will immediately result in writing data out to all the data files that were affected by the transaction.

Now, let's also take a look at a few key parameters related to WAL. Change the directory to /pgdata/9.3 and execute the following command:

[postgres@MyCentOS 9.3]$ grep wal postgresql.conf 
wal_level = archive # minimal, archive, or hot_standby
#wal_sync_method = fsync # the default is the first option
#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers
#wal_writer_delay = 200ms # 1-10000 milliseconds
#max_wal_senders = 0 # max number of walsender processes
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables
#wal_receiver_status_interval = 10s # send replies at least this often
#wal_receiver_timeout = 60s # time that receiver waits for

The wal_level is an important parameter as this setting has a direct impact on what we can do with WAL, whether we can use it only for recovery from a crash, or we can use it also to archive for a point-in-time recovery, or whether we can use it to create a stand-by server. This setting determines the amount of information written to WAL. This is an enum type of value, which means that it can have one of a possible list of values. The possible values are minimal, archive, and hot_standby. The default value is minimal, which writes only the information needed to recover from a crash or immediate shutdown. The archive value adds logging required for WAL archiving, and hot_standby further adds information required to run read-only queries on a standby server. This parameter can only be set at server start.

In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make the operations faster. Minimal WAL does not contain enough information to reconstruct the data from a base backup and the WAL logs, so either archive or the hot_standby level should be used to enable WAL archiving (archive_mode) and streaming replication.

In the hot_standby level, the same information is logged as with archive, plus the information needed to reconstruct the status of running transactions from the WAL. To enable read-only queries on a standby server, wal_level must be set to hot_standby on the primary, and a hot_standby setting must be enabled in the standby.

When fsync is on, the PostgreSQL server tries to ensure that data really gets flushed to the disk (remember the different levels of cache?). Setting fsync to off might result in significant gains in performance, but carries a risk. This can result in an unrecoverable data corruption in the event of a power failure or a system crash. In cases where fsync is turned off, wal_sync_method is not relevant at all. When fsync is on, we could choose one of many options, such as open_datasync, fdatasync, fsync, and so on; each one has a different way to ensure that data gets flushed to disk. A description of these methods is available at http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-SYNC-METHOD.

For an in-depth discussion on the fsync options and its impact on the performance and possibility of data loss, please refer to PostgreSQL 9.0 High Performance, Gregory Smith, Packt Publishing.

The wal_buffers memory is the amount of shared memory set aside for the WAL buffer. Setting this to -1 will result in an automatic selection based on the shared_buffers setting. In automatic setting, the value can range from 64 KB to 16 MB. When set manually, the minimum possible value is 32 KB.

Another decision is how frequently the WAL writer should flush WAL to disk. What we need to remember is that this will be far more frequent than checkpoint frequency. The WAL flush interval is indicated by wal_writer_delay. The default value is 200 milliseconds.

There are a few more parameters relevant for replication and we will cover them in Chapter 10, Scaling, Replication, and Backup and Recovery.