前面已经在两台CentOS 7服务器上从源码编译安装了PostgreSQL,当前这两台服务器的PostgreSQL数据库是相互独立,它们之间没有任何关系。 本文将介绍基于repmgr的postgresql主备高可用方案,使用repmgr将这两个PostgreSQL设置为一个基本的复制集群,其占用一个作为主服务(primary),另一个作为备用服务(standby)。

repmgr是一个用于管理PostgreSQL集群的复制和故障转移的开源工具套件,它增强了PostgreSQL内建的热备功能,可以使用命令工具设置备用服务器、监控复制和执行任务管理,进行故障转移或者手动切换。

1.安装环境和准备工作

两台CentOS 7服务器如下:

1
2
192.168.100.151    node1
192.168.100.152    node2

前面已经在这两台服务器上编译安装了PostgreSQL 13.5。 要使用repmgr进行主备切换(switchover),需要用到node1和node2之间postgres用户的ssh无密码使用key直接登录,这个需要提前配置好。 此外,postgres用户还需要无密执行systemd启动和重启数据库服务。

1
echo "postgres ALL = (root) NOPASSWD:/usr/bin/systemctl start pgserver,/usr/bin/systemctl restart pgserver,/usr/bin/systemctl stop pgserver,/usr/bin/systemctl reload pgserver,/usr/bin/systemctl status pgserver" | sudo tee /etc/sudoers.d/postgres

下载并解压缩repmgr的源码:

1
tar -zxvf repmgr-5.3.0.tar.gz

安装相关工具及相关依赖:

1
yum install -y flex libselinux-devel libxml2-devel libxslt-devel openssl-devel pam-devel readline-devel

2.编译安装repmgr

接下来在node1和node2这两台服务器上从源码编译安装repmgr。进入到源码解压缩路径完成编译安装:

1
2
3
4
5
6
cd repmgr-5.3.0

export PG_CONFIG=/usr/local/pgsql/bin/pg_config
./configure
make
make install

编译安装后,在PostgreSQL安装目录的bin目录中会多出repmgrrepmgrd两个二进制文件:

1
2
3
ls /usr/local/pgsql/bin | grep repmgr
repmgr
repmgrd

3.repmgr配置

3.1 PostgreSQL的配置

这里选择node1作为主库,主库PostgreSQL的配置文件/home/postgres/data/postgresql.conf中关于复制的配置需要做一下调整:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
max_wal_senders = 10
max_replication_slots = 10
wal_level = replica
hot_standby = on
archive_mode = on
archive_command = '/bin/true'

wal_log_hints=on

shared_preload_libraries = 'repmgr'

修改完成后重启数据库服务systemctl restart pgserver,确保没有错误发生。

下面是一个可供参考的完整的postgresql.conf配置示例文件:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
data_directory = '/home/postgres/data'
hba_file = '/home/postgres/data/pg_hba.conf'
ident_file = '/home/postgres/data/pg_ident.conf'

external_pid_file = '/home/postgres/data/postmaster.pid'

listen_addresses = '*'
port = 5432
max_connections = 150
superuser_reserved_connections = 10

authentication_timeout = 30s
password_encryption = md5

shared_buffers = 512MB
dynamic_shared_memory_type = posix

wal_level = replica
fsync = on
wal_log_hints = on

max_wal_size = 1GB
min_wal_size = 80MB
wal_keep_size = 1GB

archive_mode = on
archive_command = '/bin/true'

max_wal_senders = 10
max_replication_slots = 10

hot_standby = on

log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_line_prefix = '%m [%p] '
log_timezone = 'PRC'

row_security = on

datestyle = 'iso, mdy'
timezone = 'PRC'

lc_messages = 'en_US.UTF-8'
lc_monetary = 'zh_CN.utf8'
lc_numeric = 'zh_CN.UTF-8'
lc_time = 'zh_CN.UTF-8'

default_text_search_config = 'pg_catalog.english'

shared_preload_libraries = 'repmgr'

3.2 创建repmgr用户和数据库

接下来在主服务器node1的PostgreSQL上创建一个专用超级用户repmgr和一个专用的数据库repmgr。

1
2
3
/usr/local/pgsql/bin/createuser -s repmgr
/usr/local/pgsql/bin/createdb repmgr -O repmgr
/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Upostgres -c 'ALTER USER repmgr SET search_path TO repmgr, "$user", public;'

repmgr将安装repmgr扩展,它将创建一个包含repmgr元数据表以及其他函数和视图的repmgr schema。

3.3 在pg_hba.conf中配置认证

在主服务器node1的/home/postgres/data/pg_hba.conf中配置repmgr用户的认证权限,确保其有适当的权限,并且能够以复制模式连接:

1
2
3
4
5
6
7
8
9
local   replication   repmgr                              trust
host    replication   repmgr      127.0.0.1/32            trust
host    replication   repmgr      192.168.100.151/32       trust
host    replication   repmgr      192.168.100.152/32       trust

local   repmgr        repmgr                              trust
host    repmgr        repmgr      127.0.0.1/32            trust
host    repmgr        repmgr      192.168.100.151/32       trust
host    repmgr        repmgr      192.168.100.152/32       trust

修改完配置需要重启服务sudo systemctl restart pgserver

3.4 备用节点node2上的准备工作

在备库node2上停止PostgreSQL服务,并删除其数据目录:

1
2
systemctl stop pgserver
rm -rf /home/postgres/data/*

从测试一下从备节点node2到主节点node1的连接,确保可以连接上:

1
/usr/local/pgsql/bin/psql 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'

因为这里是第一次初始主备PostgreSQL服务,我们将node1服务器作为主库,将node2服务器作为从库。

3.5 创建repmgr.conf配置文件

在node1和node2上创建/usr/local/pgsql/repmgr.conf配置文件

node1的repmgr.conf:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/data'
pg_bindir='/usr/local/pgsql/bin/'
ssh_options='-q -o ConnectTimeout=10'
failover='automatic'
promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'
service_start_command='sudo systemctl start pgserver'
service_stop_command='sudo systemctl stop pgserver'
service_restart_command='sudo systemctl restart pgserver'
service_reload_command='sudo systemctl reload pgserver'

node2的repmgr.conf:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
node_id=2
node_name=node2
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/data'
pg_bindir='/usr/local/pgsql/bin/'
ssh_options='-q -o ConnectTimeout=10'
failover='automatic'
promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'
service_start_command='sudo systemctl start pgserver'
service_stop_command='sudo systemctl stop pgserver'
service_restart_command='sudo systemctl restart pgserver'
service_reload_command='sudo systemctl reload pgserver'

3.6 注册主节点

为了使repmgr支持复制集群,主节点必须注册到repmgr。这将安装repmgr扩展和元数据对象,并为主服务器添加一个元数据记录。 在node1上使用repmgr命令将node1注册为主库:

1
2
3
4
5
6
7
su postgres
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf primary register --force

INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

查看一下此时集群状态,只有node1这一个主节点:

1
2
3
4
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2

repmgr元数据表中的数据如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Urepmgr

\x
Expanded display is on.
repmgr=# SELECT * FROM repmgr.nodes;
-[ RECORD 1 ]----+-------------------------------------------------------
node_id          | 1
upstream_node_id |
active           | t
node_name        | node1
type             | primary
location         | default
priority         | 100
conninfo         | host=node1 user=repmgr dbname=repmgr connect_timeout=2
repluser         | repmgr
slot_name        |
config_file      | /usr/local/pgsql/repmgr.conf

一个PostgreSQL复制集群中的每个服务器都将拥有自己的记录。在使用repmgrd时,当节点状态或角色发生变化时,upstream_node_idactivetype字段会更新。

3.7 克隆生成备节点数据

在备节点node2上,测试(dry run)一下能否clone主库的数据:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
su postgres
/usr/local/pgsql/bin/repmgr -h node1 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone --dry-run

NOTICE: destination directory "/home/postgres/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=node1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/data -h node1 -p 5432 -U repmgr -X stream
INFO: all prerequisites for "standby clone" are met

以上测试没有问题的话,在从库node2上运行下面的命令clone主库node1上的数据:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
/usr/local/pgsql/bin/repmgr -h node1 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone

NOTICE: destination directory "/home/postgres/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=node1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/home/postgres/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/data -h node1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start pgserver
HINT: after starting the server, you need to register this standby with "repmgr standby register"

实际上使用了pg_basebackup命令clone了主节点的数据目录文件。主节点数据目录中的配置文件也都被复制到了备节点的数据目录中,包括postgresql.conf, postgresql.auto.conf, pg_hba.confpg_ident.conf。 如果不需要针对备节点定制修改这些配置的话,就可以启动备节点的数据库服务了:

1
systemctl start pgserver

3.9 验证复制是否正常工作

连接到主节点数据库执行下面的查询:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Urepmgr

repmgr=# \x
Expanded display is on.
repmgr=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 3636
usesysid         | 16384
usename          | repmgr
application_name | node2
client_addr      | 192.168.100.152
client_hostname  |
client_port      | 44160
backend_start    | 2021-11-14 14:48:17.856758+08
backend_xmin     |
state            | streaming
sent_lsn         | 0/40002D0
write_lsn        | 0/40002D0
flush_lsn        | 0/40002D0
replay_lsn       | 0/40002D0
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2021-11-14 14:50:48.201894+08

这表明之前克隆的备用节点(在application_name字段中显示的node2)已经从IP地址192.168.100.152连接到主节点。

连接到备库执行下面的查询:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Urepmgr

sELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+----------------------------------
pid                   | 2927
status                | streaming
receive_start_lsn     | 0/5000000
receive_start_tli     | 1
written_lsn           | 0/50000A0
flushed_lsn           | 0/50000A0
received_tli          | 1
last_msg_send_time    | 2021-11-14 14:55:55.416634+08
last_msg_receipt_time | 2021-11-14 14:55:55.415109+08
latest_end_lsn        | 0/50000A0
latest_end_time       | 2021-11-17 14:55:55.408763+08
slot_name             |
sender_host           | node1
sender_port           | 5432

3.9 注册备节点

在node2上使用repmgr命令将node2注册为备节点:

1
2
3
4
5
6
7
8
su postgres
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby register --force

INFO: connecting to local node "node2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node2" (ID: 2) successfully registered

查看一下此时集群状态,node2作为备节点被加入到了集群中:

1
2
3
4
5
6
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

3.10 主备手动切换测试

现在主节点node1和备节点node2都正常运行,在备节点node1上执行切换操作,手动将node1切换为主节点:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby switchover

NOTICE: executing switchover on node "node2" (ID: 2)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)
DETAIL: executing server command "sudo systemctl stop pgserver"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/6000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
WARNING: node "node1" attached in state "startup"
INFO: waiting for node "node1" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: node "node2" (ID: 1) is currently attached to its upstream node in state "startup"
NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node2" is now primary and node "node1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully

上面的命令成功的将备节点node2切换为主节点,主节点node1重启后作为新的备节点加入到集群:

1
2
3
4
5
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

4.使用repmgrd实现自动故障转移

repmgrd是一个管理和监控守护进程,运行在PostgreSQL复制集群的每个节点上,可以自动执行故障转移和更新备用节点等操作以跟上新的主节点,并提供关于每个备用服务器状态的监视信息。

4.1 启动repmgrd

在node1和node2上创建repmgrd的systemd配置文件/etc/systemd/system/repmgrd.service

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
[Unit]
Description=A replication manager, and failover management tool for PostgreSQL
After=syslog.target
After=network.target
After=pgserver.service

[Service]
Type=forking

User=postgres
Group=postgres

# PID file
PIDFile=/home/postgres/data/repmgrd.pid

# Location of repmgr conf file:
Environment=REPMGRDCONF=/usr/local/pgsql/repmgr.conf
Environment=PIDFILE=/home/postgres/data/repmgrd.pid

# Where to send early-startup messages from the server
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog
ExecStart=/usr/local/pgsql/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose
ExecStop=/usr/bin/kill -TERM $MAINPID
ExecReload=/usr/bin/kill -HUP $MAINPID

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300

[Install]
WantedBy=multi-user.target

在node1和node2上启动repmgrd服务:

1
systemctl enable repmgrd --now

4.2 自动故障转移测试

当前集群的状态如下:

1
2
3
4
5
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

当前node2为主节点,现在模拟故障,将node2上数据库服务停掉:

1
systemctl stop pgserver

等一小会儿,node1会成为新的主节点,node2会被标记为failed状态:

1
2
3
4
5
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | - failed  | ?        | default  | 100      |          | host=node2 user=repmgr dbname=repmgr connect_timeout=2

此时如果将node2的数据库服务重新起来,就会出现下面的状态,这个也是这种主备高可用方案的缺点,出现了两个主节点,相当于发生了脑裂。

1
2
3
4
5
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | ! running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

node2的状态是不对的,将其服务再次停止,并重新以备节点的角色加入到集群:

1
2
3
4
5
6
7
8
systemctl stop pgserver
/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf node rejoin  -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind

/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

5.总结

本文中使用repmgr实现PostgreSQL主备高可用的方案的缺点就是,如果集群的网络出现脑裂(网络分区),就会出现两个节点都认为自己是主节点的情况,如果两个节点都同时承载了业务读写,就会出现数据不一致的情况。 一般造成脑裂的主要原因是主备之间网络问题导致备库认为主库故障自动切换为主库,这就造成出现了双主的情况,另一个原因是人工进行主备切换后,由将原来的主节点启动(人工启动或服务器宕机后开机启动启动)。 发生脑裂后通常需要人工介入,并且介入越早越好,如果可以忽略脑裂期间分叉造成的部分数据丢失,可以通过pg_rewind将其中一个节点重新作为备节点加入集群,恢复单主的集群状态。

关于脑裂的问题,repmgr还提供了witness节点的方案,witness节点作为一个普通的PostgreSQL实例,它不是流复制集群的一部分,其目的是,如果发生故障转移时,提供证据证明是主服务器本身故障,还是网络原因造成的脑裂。 关于repmgr中witness的部分,本文没有涉及。

参考