基于repmgr实现PostgreSQL的主备高可用

📅 2021-11-14 | 🖱️

前面已经在两台CentOS 7服务器上从源码编译安装了PostgreSQL，当前这两台服务器的PostgreSQL数据库是相互独立，它们之间没有任何关系。本文将介绍基于repmgr的postgresql主备高可用方案，使用repmgr将这两个PostgreSQL设置为一个基本的复制集群，其占用一个作为主服务(primary)，另一个作为备用服务(standby)。

repmgr是一个用于管理PostgreSQL集群的复制和故障转移的开源工具套件，它增强了PostgreSQL内建的热备功能，可以使用命令工具设置备用服务器、监控复制和执行任务管理，进行故障转移或者手动切换。

1.安装环境和准备工作 #

两台CentOS 7服务器如下:

1192.168.100.151    node1
2192.168.100.152    node2

前面已经在这两台服务器上编译安装了PostgreSQL 13.5。要使用repmgr进行主备切换(switchover)，需要用到node1和node2之间postgres用户的ssh无密码使用key直接登录，这个需要提前配置好。此外，postgres用户还需要无密执行systemd启动和重启数据库服务。

1echo "postgres ALL = (root) NOPASSWD:/usr/bin/systemctl start pgserver,/usr/bin/systemctl restart pgserver,/usr/bin/systemctl stop pgserver,/usr/bin/systemctl reload pgserver,/usr/bin/systemctl status pgserver" | sudo tee /etc/sudoers.d/postgres

下载并解压缩repmgr的源码:

1tar -zxvf repmgr-5.3.0.tar.gz

安装相关工具及相关依赖:

1yum install -y flex libselinux-devel libxml2-devel libxslt-devel openssl-devel pam-devel readline-devel

2.编译安装repmgr #

接下来在node1和node2这两台服务器上从源码编译安装repmgr。进入到源码解压缩路径完成编译安装:

1cd repmgr-5.3.0
2
3export PG_CONFIG=/usr/local/pgsql/bin/pg_config
4./configure
5make
6make install

编译安装后，在PostgreSQL安装目录的bin目录中会多出repmgr和repmgrd两个二进制文件:

1ls /usr/local/pgsql/bin | grep repmgr
2repmgr
3repmgrd

3.repmgr配置 #

3.1 PostgreSQL的配置 #

这里选择node1作为主库，主库PostgreSQL的配置文件/home/postgres/data/postgresql.conf中关于复制的配置需要做一下调整:

 1max_wal_senders = 10
 2max_replication_slots = 10
 3wal_level = replica
 4hot_standby = on
 5archive_mode = on
 6archive_command = '/bin/true'
 7
 8wal_log_hints=on
 9
10shared_preload_libraries = 'repmgr'

修改完成后重启数据库服务systemctl restart pgserver，确保没有错误发生。

下面是一个可供参考的完整的postgresql.conf配置示例文件:

 1data_directory = '/home/postgres/data'
 2hba_file = '/home/postgres/data/pg_hba.conf'
 3ident_file = '/home/postgres/data/pg_ident.conf'
 4
 5external_pid_file = '/home/postgres/data/postmaster.pid'
 6
 7listen_addresses = '*'
 8port = 5432
 9max_connections = 150
10superuser_reserved_connections = 10
11
12authentication_timeout = 30s
13password_encryption = md5
14
15shared_buffers = 512MB
16dynamic_shared_memory_type = posix
17
18wal_level = replica
19fsync = on
20wal_log_hints = on
21
22max_wal_size = 1GB
23min_wal_size = 80MB
24wal_keep_size = 1GB
25
26archive_mode = on
27archive_command = '/bin/true'
28
29max_wal_senders = 10
30max_replication_slots = 10
31
32hot_standby = on
33
34log_destination = 'stderr'
35logging_collector = on
36log_directory = 'log'
37log_filename = 'postgresql-%a.log'
38log_truncate_on_rotation = on
39log_rotation_age = 1d
40log_rotation_size = 0
41log_line_prefix = '%m [%p] '
42log_timezone = 'PRC'
43
44row_security = on
45
46datestyle = 'iso, mdy'
47timezone = 'PRC'
48
49lc_messages = 'en_US.UTF-8'
50lc_monetary = 'zh_CN.utf8'
51lc_numeric = 'zh_CN.UTF-8'
52lc_time = 'zh_CN.UTF-8'
53
54default_text_search_config = 'pg_catalog.english'
55
56shared_preload_libraries = 'repmgr'

3.2 创建repmgr用户和数据库 #

接下来在主服务器node1的PostgreSQL上创建一个专用超级用户repmgr和一个专用的数据库repmgr。

1/usr/local/pgsql/bin/createuser -s repmgr
2/usr/local/pgsql/bin/createdb repmgr -O repmgr
3/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Upostgres -c 'ALTER USER repmgr SET search_path TO repmgr, "$user", public;'

repmgr将安装repmgr扩展，它将创建一个包含repmgr元数据表以及其他函数和视图的repmgr schema。

3.3 在pg_hba.conf中配置认证 #

在主服务器node1的/home/postgres/data/pg_hba.conf中配置repmgr用户的认证权限，确保其有适当的权限，并且能够以复制模式连接:

1local   replication   repmgr                              trust
2host    replication   repmgr      127.0.0.1/32            trust
3host    replication   repmgr      192.168.100.151/32       trust
4host    replication   repmgr      192.168.100.152/32       trust
5
6local   repmgr        repmgr                              trust
7host    repmgr        repmgr      127.0.0.1/32            trust
8host    repmgr        repmgr      192.168.100.151/32       trust
9host    repmgr        repmgr      192.168.100.152/32       trust

修改完配置需要重启服务sudo systemctl restart pgserver。

3.4 备用节点node2上的准备工作 #

在备库node2上停止PostgreSQL服务，并删除其数据目录:

1systemctl stop pgserver
2rm -rf /home/postgres/data/*

从测试一下从备节点node2到主节点node1的连接，确保可以连接上:

1/usr/local/pgsql/bin/psql 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'

因为这里是第一次初始主备PostgreSQL服务，我们将node1服务器作为主库，将node2服务器作为从库。

3.5 创建repmgr.conf配置文件 #

在node1和node2上创建/usr/local/pgsql/repmgr.conf配置文件

node1的repmgr.conf:

 1node_id=1
 2node_name=node1
 3conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
 4data_directory='/home/postgres/data'
 5pg_bindir='/usr/local/pgsql/bin/'
 6ssh_options='-q -o ConnectTimeout=10'
 7failover='automatic'
 8promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'
 9follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'
10service_start_command='sudo systemctl start pgserver'
11service_stop_command='sudo systemctl stop pgserver'
12service_restart_command='sudo systemctl restart pgserver'
13service_reload_command='sudo systemctl reload pgserver'

node2的repmgr.conf:

 1node_id=2
 2node_name=node2
 3conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
 4data_directory='/home/postgres/data'
 5pg_bindir='/usr/local/pgsql/bin/'
 6ssh_options='-q -o ConnectTimeout=10'
 7failover='automatic'
 8promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'
 9follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'
10service_start_command='sudo systemctl start pgserver'
11service_stop_command='sudo systemctl stop pgserver'
12service_restart_command='sudo systemctl restart pgserver'
13service_reload_command='sudo systemctl reload pgserver'

3.6 注册主节点 #

为了使repmgr支持复制集群，主节点必须注册到repmgr。这将安装repmgr扩展和元数据对象，并为主服务器添加一个元数据记录。在node1上使用repmgr命令将node1注册为主库:

1su postgres
2/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf primary register --force
3
4INFO: connecting to primary database...
5NOTICE: attempting to install extension "repmgr"
6NOTICE: "repmgr" extension successfully installed
7NOTICE: primary node record (ID: 1) registered

查看一下此时集群状态，只有node1这一个主节点:

1/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
2 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
3----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
4 1  | node1 | primary | * running |          | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2

repmgr元数据表中的数据如下:

 1/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Urepmgr
 2
 3\x
 4Expanded display is on.
 5repmgr=# SELECT * FROM repmgr.nodes;
 6-[ RECORD 1 ]----+-------------------------------------------------------
 7node_id          | 1
 8upstream_node_id |
 9active           | t
10node_name        | node1
11type             | primary
12location         | default
13priority         | 100
14conninfo         | host=node1 user=repmgr dbname=repmgr connect_timeout=2
15repluser         | repmgr
16slot_name        |
17config_file      | /usr/local/pgsql/repmgr.conf

一个PostgreSQL复制集群中的每个服务器都将拥有自己的记录。在使用repmgrd时，当节点状态或角色发生变化时，upstream_node_id、active和type字段会更新。

3.7 克隆生成备节点数据 #

在备节点node2上，测试(dry run)一下能否clone主库的数据:

 1su postgres
 2/usr/local/pgsql/bin/repmgr -h node1 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone --dry-run
 3
 4NOTICE: destination directory "/home/postgres/data" provided
 5INFO: connecting to source node
 6DETAIL: connection string is: host=node1 user=repmgr dbname=repmgr
 7DETAIL: current installation size is 31 MB
 8INFO: "repmgr" extension is installed in database "repmgr"
 9INFO: replication slot usage not requested;  no replication slot will be set up for this standby
10INFO: parameter "max_wal_senders" set to 10
11NOTICE: checking for available walsenders on the source node (2 required)
12INFO: sufficient walsenders available on the source node
13DETAIL: 2 required, 10 available
14NOTICE: checking replication connections can be made to the source server (2 required)
15INFO: required number of replication connections could be made to the source server
16DETAIL: 2 replication connections required
17NOTICE: standby will attach to upstream node 1
18HINT: consider using the -c/--fast-checkpoint option
19INFO: would execute:
20  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/data -h node1 -p 5432 -U repmgr -X stream
21INFO: all prerequisites for "standby clone" are met

以上测试没有问题的话，在从库node2上运行下面的命令clone主库node1上的数据:

 1/usr/local/pgsql/bin/repmgr -h node1 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone
 2
 3NOTICE: destination directory "/home/postgres/data" provided
 4INFO: connecting to source node
 5DETAIL: connection string is: host=node1 user=repmgr dbname=repmgr
 6DETAIL: current installation size is 31 MB
 7INFO: replication slot usage not requested;  no replication slot will be set up for this standby
 8NOTICE: checking for available walsenders on the source node (2 required)
 9NOTICE: checking replication connections can be made to the source server (2 required)
10INFO: checking and correcting permissions on existing directory "/home/postgres/data"
11NOTICE: starting backup (using pg_basebackup)...
12HINT: this may take some time; consider using the -c/--fast-checkpoint option
13INFO: executing:
14  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/data -h node1 -p 5432 -U repmgr -X stream
15NOTICE: standby clone (using pg_basebackup) complete
16NOTICE: you can now start your PostgreSQL server
17HINT: for example: sudo systemctl start pgserver
18HINT: after starting the server, you need to register this standby with "repmgr standby register"

实际上使用了pg_basebackup命令clone了主节点的数据目录文件。主节点数据目录中的配置文件也都被复制到了备节点的数据目录中，包括postgresql.conf, postgresql.auto.conf, pg_hba.conf和pg_ident.conf。如果不需要针对备节点定制修改这些配置的话，就可以启动备节点的数据库服务了:

1systemctl start pgserver

3.9 验证复制是否正常工作 #

连接到主节点数据库执行下面的查询:

 1/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Urepmgr
 2
 3repmgr=# \x
 4Expanded display is on.
 5repmgr=# SELECT * FROM pg_stat_replication;
 6-[ RECORD 1 ]----+------------------------------
 7pid              | 3636
 8usesysid         | 16384
 9usename          | repmgr
10application_name | node2
11client_addr      | 192.168.100.152
12client_hostname  |
13client_port      | 44160
14backend_start    | 2021-11-14 14:48:17.856758+08
15backend_xmin     |
16state            | streaming
17sent_lsn         | 0/40002D0
18write_lsn        | 0/40002D0
19flush_lsn        | 0/40002D0
20replay_lsn       | 0/40002D0
21write_lag        |
22flush_lag        |
23replay_lag       |
24sync_priority    | 0
25sync_state       | async
26reply_time       | 2021-11-14 14:50:48.201894+08

这表明之前克隆的备用节点(在application_name字段中显示的node2)已经从IP地址192.168.100.152连接到主节点。

连接到备库执行下面的查询:

 1/usr/local/pgsql/bin/psql -p5432 -h127.0.0.1 -Urepmgr
 2
 3sELECT * FROM pg_stat_wal_receiver;
 4-[ RECORD 1 ]---------+----------------------------------
 5pid                   | 2927
 6status                | streaming
 7receive_start_lsn     | 0/5000000
 8receive_start_tli     | 1
 9written_lsn           | 0/50000A0
10flushed_lsn           | 0/50000A0
11received_tli          | 1
12last_msg_send_time    | 2021-11-14 14:55:55.416634+08
13last_msg_receipt_time | 2021-11-14 14:55:55.415109+08
14latest_end_lsn        | 0/50000A0
15latest_end_time       | 2021-11-17 14:55:55.408763+08
16slot_name             |
17sender_host           | node1
18sender_port           | 5432

3.9 注册备节点 #

在node2上使用repmgr命令将node2注册为备节点:

1su postgres
2/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby register --force
3
4INFO: connecting to local node "node2" (ID: 2)
5INFO: connecting to primary database
6WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
7INFO: standby registration complete
8NOTICE: standby node "node2" (ID: 2) successfully registered

查看一下此时集群状态，node2作为备节点被加入到了集群中:

1/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
2
3 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
4----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
5 1  | node1 | primary | * running |          | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
6 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

3.10 主备手动切换测试 #

现在主节点node1和备节点node2都正常运行，在备节点node1上执行切换操作，手动将node1切换为主节点:

 1/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby switchover
 2
 3NOTICE: executing switchover on node "node2" (ID: 2)
 4NOTICE: attempting to pause repmgrd on 2 nodes
 5NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
 6NOTICE: stopping current primary node "node1" (ID: 1)
 7NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)
 8DETAIL: executing server command "sudo systemctl stop pgserver"
 9INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
10NOTICE: current primary has been cleanly shut down at location 0/6000028
11NOTICE: promoting standby to primary
12DETAIL: promoting server "node2" (ID: 2) using pg_promote()
13NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
14NOTICE: STANDBY PROMOTE successful
15DETAIL: server "node2" (ID: 2) was successfully promoted to primary
16WARNING: node "node1" attached in state "startup"
17INFO: waiting for node "node1" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
18DETAIL: node "node2" (ID: 1) is currently attached to its upstream node in state "startup"
19NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
20NOTICE: switchover was successful
21DETAIL: node "node2" is now primary and node "node1" is attached as standby
22NOTICE: STANDBY SWITCHOVER has completed successfully

上面的命令成功的将备节点node2切换为主节点，主节点node1重启后作为新的备节点加入到集群:

1/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
2ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
3----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
4 1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
5 2  | node2 | primary | * running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

4.使用repmgrd实现自动故障转移 #

repmgrd是一个管理和监控守护进程，运行在PostgreSQL复制集群的每个节点上,可以自动执行故障转移和更新备用节点等操作以跟上新的主节点，并提供关于每个备用服务器状态的监视信息。

4.1 启动repmgrd #

在node1和node2上创建repmgrd的systemd配置文件/etc/systemd/system/repmgrd.service：

 1[Unit]
 2Description=A replication manager, and failover management tool for PostgreSQL
 3After=syslog.target
 4After=network.target
 5After=pgserver.service
 6
 7[Service]
 8Type=forking
 9
10User=postgres
11Group=postgres
12
13# PID file
14PIDFile=/home/postgres/data/repmgrd.pid
15
16# Location of repmgr conf file:
17Environment=REPMGRDCONF=/usr/local/pgsql/repmgr.conf
18Environment=PIDFILE=/home/postgres/data/repmgrd.pid
19
20# Where to send early-startup messages from the server
21# This is normally controlled by the global default set by systemd
22# StandardOutput=syslog
23ExecStart=/usr/local/pgsql/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose
24ExecStop=/usr/bin/kill -TERM $MAINPID
25ExecReload=/usr/bin/kill -HUP $MAINPID
26
27# Give a reasonable amount of time for the server to start up/shut down
28TimeoutSec=300
29
30[Install]
31WantedBy=multi-user.target

在node1和node2上启动repmgrd服务:

1systemctl enable repmgrd --now

4.2 自动故障转移测试 #

当前集群的状态如下:

1/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
2 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
3----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
4 1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
5 2  | node2 | primary | * running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

当前node2为主节点，现在模拟故障，将node2上数据库服务停掉:

1systemctl stop pgserver

等一小会儿，node1会成为新的主节点，node2会被标记为failed状态:

1/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
2 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
3----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
4 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
5 2  | node2 | primary | - failed  | ?        | default  | 100      |          | host=node2 user=repmgr dbname=repmgr connect_timeout=2

此时如果将node2的数据库服务重新起来，就会出现下面的状态，这个也是这种主备高可用方案的缺点，出现了两个主节点，相当于发生了脑裂。

1/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
2 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
3----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
4 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
5 2  | node2 | primary | ! running |          | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

node2的状态是不对的，将其服务再次停止，并重新以备节点的角色加入到集群:

1systemctl stop pgserver
2/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf node rejoin  -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind
3
4/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show
5 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
6----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
7 1  | node1 | primary | * running |          | default  | 100      | 3        | host=node1 user=repmgr dbname=repmgr connect_timeout=2
8 2  | node2 | standby |   running | node1    | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr connect_timeout=2

5.总结 #

本文中使用repmgr实现PostgreSQL主备高可用的方案的缺点就是，如果集群的网络出现脑裂(网络分区)，就会出现两个节点都认为自己是主节点的情况，如果两个节点都同时承载了业务读写，就会出现数据不一致的情况。一般造成脑裂的主要原因是主备之间网络问题导致备库认为主库故障自动切换为主库，这就造成出现了双主的情况，另一个原因是人工进行主备切换后，由将原来的主节点启动(人工启动或服务器宕机后开机启动启动)。发生脑裂后通常需要人工介入，并且介入越早越好，如果可以忽略脑裂期间分叉造成的部分数据丢失，可以通过pg_rewind将其中一个节点重新作为备节点加入集群，恢复单主的集群状态。

关于脑裂的问题，repmgr还提供了witness节点的方案，witness节点作为一个普通的PostgreSQL实例，它不是流复制集群的一部分，其目的是，如果发生故障转移时，提供证据证明是主服务器本身故障，还是网络原因造成的脑裂。关于repmgr中witness的部分，本文没有涉及。