Database Replication in PostgreSQL

March 27, 2025

Introduction

If you've ever wished your database could be in two places at once, I'm here to tell you how to do just that. The database is at the heart of any application. Therefore, we need to ensure its stability and use it to derive insights from our application. Database replication is one important way we can achieve both these objectives. But first, we need to answer the question:

What is database replication?

Database replication is the process of copying and maintaining database objects, such as tables and their data, from one database (the primary) to another (the replica). Think of it as a synchronized dance where the primary server leads, and the replica follows every step, ensuring that the data remains consistent across both. This is achieved through mechanisms like Write-Ahead Logs (WAL) shipping, which ensures that every change made to the primary database is faithfully replayed on the replica.

Why and when is database replication used?

Broadly speaking, we use database replication for two categories of cases:

1. Operational resilience

It's always good to have a backup. A million terrible things can happen to your database, so it is paramount to have contingency plans. In addition, it's also good to have additional databases that can support your primary one. Concretely, we can use database replication for:

High Availability: If your primary database decides to give up and go down unexpectedly, replicas can step in to keep your application running smoothly.
Load Balancing: When your application is under heavy load, replicas can share the burden by handling read queries, leaving the primary to focus on write operations.
Disaster Recovery: In case of catastrophic failures (think hardware crashes or accidental DROP TABLE commands), replicas can save the day by providing a copy of your data.

2. Data analytics

Replication is particularly useful in scenarios where Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) workloads need to be separated.

OLTP Workloads: These involve high-frequency, short-duration transactions, such as inserting, updating, or deleting records in real-time applications like e-commerce platforms or banking systems. The primary database is optimized for these operations, ensuring low latency and high throughput.
OLAP Workloads: These involve complex, long-running queries for data analysis, reporting, and business intelligence. Analytical queries often require scanning large datasets and aggregating results, which can strain the primary database if run alongside OLTP operations.

By replicating the primary database to one or more replicas, OLAP queries can be offloaded to the replicas, ensuring that the primary database remains responsive for transactional operations. This separation of concerns not only improves performance but also allows data engineers to optimize replicas specifically for analytical workloads, such as creating additional indexes or materialized views without impacting the primary database.

TLDR

In summary, database replication is the process of creating identical replica(s) of your database to ensure either to ensure its operational resilience or to optimize for analytical queries without impacting the transactional database's performance.

Create a PostgreSQL database replica

Prerequisites

Before proceeding, ensure you have PostgreSQL installed on your device. You can find installation instructions for your operating system on the official PostgreSQL website:

PostgreSQL Installation Guide

Verify the installation by checking the version:

psql --version

Create a primary cluster

which initdb

sudo initdb -D /tmp/primary_db

Configure the primary cluster

vi /tmp/primary_db/postgresql.conf

In the postgresql.conf file, add:

listen_addresses = '*'          # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
                                        # (change requires restart)
port = 5432                             # (change requires restart)
max_connections = 100                   # (change requires restart)

Start cluster server

sudo pg_ctl -D /tmp/primary_db start

Connect to the database and create a user

psql -p 5432 postgres

postgres=# CREATE USER replicator REPLICATION;
CREATE ROLE

Configure client authentication

vi /tmp/primary_db/pg_hba.conf

Edit pg_hba.conf and add:

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             replicator      127.0.0.1/32            trust

Restart the database for changes to take effect:

sudo pg_ctl -D /tmp/primary_db restart

Create replica cluster

sudo pg_basebackup -h localhost -U replicator --checkpoint=fast -D /tmp/replica_db -R --slot=slot_name -C --port=5432

The port should be the same as the primary cluster’s when creating the replica.

Edit the postgresql.conf file and change the port for the replica to run on a different port:

vi /tmp/replica_db/postgresql.conf

# - Connection Settings -

listen_addresses = '*'                  # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
port = 5434                             # (change requires restart)

Start the replica

sudo pg_ctl -D /tmp/replica_db start

Test replication

Use psql to connect to the primary cluster

psql postgres -p 5432

Check replica count and status by running:

postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 42579
usesysid         | 16384
usename          | replicator
application_name | walreceiver
client_addr      | ::1
client_hostname  |
client_port      | 65406
backend_start    | 2022-10-10 15:07:59.376224+08
backend_xmin     |
state            | streaming
sent_lsn         | 0/9000148
write_lsn        | 0/9000148
flush_lsn        | 0/9000148
replay_lsn       | 0/9000148
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2022-10-10 15:09:09.565283+08

There should be one record for each replica

Connect to the replica cluster and check if WAL receiving is running

psql postgres -p 5434

postgres=# \x
Expanded display is on.

postgres=# SELECT * FROM pg_stat_wal_receiver;

-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 42578
status                | streaming
receive_start_lsn     | 0/9000000
receive_start_tli     | 1
written_lsn           | 0/9000148
flushed_lsn           | 0/9000000
received_tli          | 1
last_msg_send_time    | 2022-10-10 15:16:59.725174+08
last_msg_receipt_time | 2022-10-10 15:16:59.725259+08
latest_end_lsn        | 0/9000148
latest_end_time       | 2022-10-10 15:07:59.416296+08
slot_name             | replica_slot
sender_host           | localhost
sender_port           | 5432
conninfo              | user=replicator passfile=/Users/khalid/.pgpass channel_binding=prefer dbname=replication host=localhost port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any

Or create a database and a table in the primary cluster and insert data into it. Then check if the data is replicated in the replica.