
Understanding the shared buffer
When thousands of users are trying to read and write data to many different tables, reading directly from the directories/files (which we saw getting created when we installed PostgreSQL and created a database with a couple of tables) results in a miserably non-scalable system. Each read and write would mean searching for many files, opening them, using fseek() to reach specific data records, locking, editing, and unlocking. To make this far more scalable and faster, the concept of shared buffers (a memory area) is introduced. The backend processes no longer read from and write to the files directly, but deal with buffers in RAM, with a significant improvement in performance. The amount of memory to be allocated is decided by the shared_buffers parameter in postgresql.conf. This fixed-size block of shared memory is allocated when the server is started.
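As a quick, hedged illustration (the values shown are examples, not recommendations), shared_buffers can be inspected like any other parameter from psql; note that a change to it takes effect only after a server restart:

```sql
-- Check the current setting from a psql session
SHOW shared_buffers;

-- In postgresql.conf, the setting looks like this (example value):
-- shared_buffers = 128MB

-- On PostgreSQL 9.4 and later, ALTER SYSTEM can edit it for us;
-- the new value still requires a restart to take effect
ALTER SYSTEM SET shared_buffers = '256MB';
```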
It's not this memory chunk alone that is responsible for improving the response times, but the OS cache also helps quite a bit by keeping a lot of data ready-to-serve. Together, these two caches result in a significant reduction in the actual number and volume of physical reads and writes. In addition to these two levels of caching, there could be a disk controller cache, disk drive cache, and so on. The bottom line is that these caches improve performance by reducing the physical I/O necessary.
There are also risks associated with huge caches, such as the spikes in I/O, when large volumes of data get flushed from the buffer to disk. For a detailed understanding of the risks and benefits of caches, disks, and physical and logical I/O tuning, PostgreSQL 9.0 High Performance, Gregory Smith, Packt Publishing, is recommended.
Let's just consider the possible routes a simple SELECT statement might take, considering the shared buffer cache and OS cache alone.
The first thing the process checks is whether the data it wants is available in the database buffer cache. If it is not, a request goes to the OS to fetch the specific file/block(s). There is a chance that the OS cache already has the file/block(s) and passes them to the database cache. In both these cases, a physical I/O is avoided. It's only when the data is not present in either of these caches (or other caches) that a user-initiated read/write really results in a physical I/O. These three possibilities are shown in the following diagram:

It's evident that most of the user-driven data fetches and writes will happen via buffers. Exceptions to this would be databases where the buffer is minuscule compared to the data that is usually read and written. Even in cases where the number of transactions per second is very high, the physical I/O will be limited if these transactions are mostly working with the same datasets. It's only when different transactions are accessing data from entirely different areas of the file system that frequent buffer flushes to disk and reads from disk will occur.
Even in a scenario where a user makes a lot of changes to table data and issues a commit, it might not immediately result in writes to the underlying data files. This might just result in ensuring that Write Ahead Log (WAL) files are synchronized with the WAL buffer.
WAL forms a critical component in ensuring the Durability (D) and, to some extent, the Atomicity (A) of ACID properties of transactions. However, first let's continue with the buffer and see how user-changed data finally reaches the data files.
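The WAL machinery has its own, much smaller, memory area. As a hedged sketch, the relevant settings can be inspected from psql (the parameter names are standard; their values vary by installation):

```sql
-- Size of the WAL buffer; -1 means it is derived from shared_buffers
SHOW wal_buffers;

-- Whether a COMMIT waits for the WAL to be flushed to disk
SHOW synchronous_commit;
```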
Inspecting the buffer cache
PostgreSQL provides an extension to view what is in the buffer cache. It can be installed in a similar manner to what we did earlier. Log in to psql and create two databases:
CREATE DATABASE test;
CREATE DATABASE mydb;
Connect to the test database and execute:
test=# CREATE EXTENSION pg_buffercache;
CREATE EXTENSION
So, what happens when we do this? We get an idea by looking at the SQL. At the shell prompt, go to the /usr/local/pgsql/share/extension directory:
[postgres@MyCentOS ~]$ cd /usr/local/pgsql/share/extension
[postgres@MyCentOS extension]$ pwd
/usr/local/pgsql/share/extension
[postgres@MyCentOS extension]$ more pg_buffercache--1.0.sql
/* contrib/pg_buffercache/pg_buffercache--1.0.sql */

-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION pg_buffercache" to load this file. \quit

-- Register the function.
CREATE FUNCTION pg_buffercache_pages()
RETURNS SETOF RECORD
AS 'MODULE_PATHNAME', 'pg_buffercache_pages'
LANGUAGE C;

-- Create a view for convenient access.
CREATE VIEW pg_buffercache AS
    SELECT P.* FROM pg_buffercache_pages() AS P
    (bufferid integer, relfilenode oid, reltablespace oid,
     reldatabase oid, relforknumber int2, relblocknumber int8,
     isdirty bool, usagecount int2);

-- Don't want these to be available to public.
REVOKE ALL ON FUNCTION pg_buffercache_pages() FROM PUBLIC;
REVOKE ALL ON pg_buffercache FROM PUBLIC;
Tip
One thing we need to remember is that extensions get installed in a database in the cluster. If we have to use them in another database in the cluster, we must install the extensions in this database too. In PostgreSQL, a cluster refers to a set of databases using the same configuration files, listening for requests at a common port. Clusters, databases, and related objects will be covered in detail in Chapter 3, PostgreSQL – Object Hierarchy and Roles.
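For example, because extensions are installed per database, using pg_buffercache from the mydb database created earlier would need a second installation there (a short sketch of the psql steps):

```sql
-- Connect to the other database and install the extension there too
\c mydb
CREATE EXTENSION pg_buffercache;
```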
Let's connect to the test database and see what is present in the buffer cache:
[postgres@MyCentOS extension]$ psql -d test
Type "help" for help.
test=# SELECT DISTINCT reldatabase FROM pg_buffercache;
 reldatabase
-------------
       12896
           0
       24741
(3 rows)
So, we have some part of two databases in the cache. The record with 0 represents buffers that are not yet used:
test=# \! oid2name
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  16440           mydb  pg_default
  12896       postgres  pg_default
  12891      template0  pg_default
      1      template1  pg_default
  24741           test  pg_default
These are the test database (to which we are connected) and the postgres database.
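If the oid2name utility is not at hand, a similar OID-to-name mapping can be read straight from the pg_database catalog (a minimal equivalent, not a full replacement for oid2name):

```sql
-- Map database OIDs to database names
SELECT oid, datname FROM pg_database ORDER BY oid;
```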
We can link this with a couple of other views to get a better picture:
SELECT c.relname, count(*) AS buffers
FROM pg_class c
JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.relname
ORDER BY 2 DESC;
          relname          | buffers
---------------------------+---------
 pg_operator               |      13
 pg_depend_reference_index |      11
 pg_depend                 |       9
We see that it is mostly data dictionary tables and indexes.
Note
There are many data dictionary tables and views that provide us information about various objects, object types, permissions, and so on. Together, these manage the book-keeping activities for the cluster; pg_class is one of these catalog tables. One of the columns in this table is relname. Although it sounds as if it stores only relation/table names, it also holds data for other object types. We should use it along with the relkind column, which tells us what type of object the record refers to. Possible values in relkind include r (table), i (index), S (sequence), v (view), and so on.
Let's modify the preceding query slightly to filter out the catalog relations:
SELECT c.relname, count(*) AS buffers
FROM pg_class c
JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.relname NOT LIKE 'pg%'
GROUP BY c.relname
ORDER BY 2 DESC;
 relname | buffers
---------+---------
(0 rows)
We will now try populating the buffer with a user-created table. We will first create a table and insert a record:
[postgres@MyCentOS ~]$ psql -d test
CREATE TABLE emp(id serial, first_name varchar(50));
INSERT INTO emp(first_name) VALUES('Jayadeva');
SELECT * FROM emp;
 id | first_name
----+------------
  1 | Jayadeva
(1 row)
Note
The serial keyword refers to an autoincrementing integer type. Using this keyword automatically creates a sequence number generator (SEQUENCE), and the column (id in the table emp) will be populated from this sequence. For more information about numeric data types, refer to http://www.postgresql.org/docs/current/static/datatype-numeric.html#datatype-numeric-table. For details about sequences, refer to http://www.postgresql.org/docs/current/static/sql-createsequence.html.
We can repeat the query to inspect the buffer and check whether the buffer contains the newly created table and its sequence:
  relname   | buffers
------------+---------
 emp_id_seq |       1
 emp        |       1
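The pg_buffercache view also exposes a usagecount column, which the buffer eviction logic consults; a quick, hedged way to see how "warm" the cache as a whole is:

```sql
-- Distribution of usage counts across all buffers;
-- rows with NULL usagecount correspond to unused buffers
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
```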
Let's modify the query a bit:
SELECT c.relname, b.isdirty
FROM pg_class c
JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.relname NOT LIKE 'pg%';
  relname   | isdirty
------------+---------
 emp_id_seq | f
 emp        | f
Notice that the isdirty flag is f (false). Let's update a record:
UPDATE emp SET first_name = 'Newname';
UPDATE 1
If we repeat the query for buffers with the isdirty flag, we will get the following:
  relname   | isdirty
------------+---------
 emp_id_seq | f
 emp        | t
The output tells us that the buffer is dirty. Let's force a checkpoint:
CHECKPOINT;
CHECKPOINT
We will repeat the query:
  relname   | isdirty
------------+---------
 emp_id_seq | f
 emp        | f
(2 rows)
Now the buffer is no longer dirty.
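Checkpoint activity can also be observed from the statistics collector. As a sketch (the exact column set of this view varies across PostgreSQL versions), pg_stat_bgwriter reports how many checkpoints have occurred and how many buffers they wrote out:

```sql
SELECT checkpoints_timed,   -- checkpoints triggered by checkpoint_timeout
       checkpoints_req,     -- requested checkpoints, e.g. an explicit CHECKPOINT
       buffers_checkpoint   -- buffers written during checkpoints
FROM pg_stat_bgwriter;
```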