This is an internal documentation. There is a good chance you’re looking for something else. See Disclaimer.

Postgres Replication

What to do if Replication Streaming on Postgres fails

Important! Read this section first before doing anything!

In this section, very powerful commands are described. So before execute anything, be 100% sure that you are on the right server! Else you could cause a lot of damage.

To check if you are on the right server use the following query. On master the result is empty. On slave it is not (see example).

Check if you are on slave

SELECT pg_last_xlog_receive_location();

  pg_last_xlog_receive_location
 -------------------------------
         2493/54AAF118

Check if Replication is working

First of all you should check the delay. If the delay is critical nagios or Backoffice will inform you about it. This will look like this:

Check the delay
```
POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB postgres (host:db01slave) 147866242312 seconds
```
If the number (delay in seconds) is extremely high then you know something must be wrong with the replication.
Another way to check the delay directly is
```
$ ssh db01a

$ psql -U postgres -h db01slave

SELECT now() - pg_last_xact_replay_timestamp() AS delay;

         delay
     --------------
     04:15:55.03134
```
It prints the delay in human readable time format like (hh:mm:ss). The advantage of this query is, that it is very comprehensive. So you don’t have to calculate a lot, the output is appropriate. If the delay is very small the slave is probably reading from master at that moment. But always have an look at the issue.
Check the last replay
```
SELECT pg_last_xact_replay_timestamp();

  pg_last_xact_replay_timestamp
 -------------------------------
  2017-10-10 11:45:00.945915+02
```
It prints the timestamp for the last replication. It is more comprehensive than just a number. If the delay is more than 3 hours, you should have a look at the postgres log file: /var/log/postgres/postgresql-9.5-main.log. If you find messages like “0000000200002440000000A8: file already removed” then you have to do the basebackup. Otherwise you could have a look at the postgres configurations. If you can’t see anything odd, a restart of postgres could solve the problem.

Hint

Postgres doesn’t replay any changes during dumps.

Restart Postgres-Slave

$ sudo systemctl restart postgresql@9.5-main.service

Basebackup

Important! All these commands must only be executed on de Slave (db01slave)! The initial user for this procedure should be tadm.

save password from /postgres/postgres_data/recovery.conf
change to user root

copy the password

sudo su

cat /postgres/postgres_data/recovery.conf

Base backup

screen -S restore_slavei

cp -a /etc/postgresql/9.5/main/ /postgres/backup/main

pg_dropcluster --stop 9.5 main

cd /etc/postgres

mkdir 9.5

chown -R postgres:postgres 9.5

cp -a /postgres/backup/main /etc/postgresql/9.5/main

^D or exit

sudo su postgres

pg_basebackup -h db01master -U pg_replica -D /postgres/postgres_data/ -v -P --xlog-method=stream

cd /postgres/postgres_data/ && mv recovery.done recovery.conf

recovery.conf change (Slave-IP) -> (Master-IP)

^D or exit

sudo systemctl start postgresql@9.5-main.service