View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000664 | Pgpool-II | Bug | public | 2020-11-30 12:32 | 2020-12-10 14:11 |
| Reporter | jeffty_w | Assigned To | pengbo | ||
| Priority | high | Severity | major | Reproducibility | always |
| Status | feedback | Resolution | open | ||
| Platform | Linux | OS | RHEL | OS Version | 7.7 |
| Product Version | 4.1.2 | ||||
| Summary | 0000664: PgPool with WatchDog Cluster doesn't work after reboot all the machines | ||||
| Description | Hi there, I installed Pgpool + WatchDog as the example https://www.pgpool.net/docs/latest/en/html/example-cluster.html described on 3 machines. Today they are powered off by accident. After power on them I found that db2 and db3 are in 'down' status. Is there a recommended way to auto start all of the nodes even after reboot? RPMs: postgresql: 12.3 pgpool-II-pg12: 4.1.2 Machines: jeffty-db11, jeffty-db2, jeffty-db3 VIP: 192.168.1.208 Services Configured during the installation: pgpool service is not enabled, systemctl enable postgresql-12 on every machine. cat /usr/lib/systemd/system/postgresql-12.service ExecStartPre=/usr/pgsql-12/bin/postgresql-12-check-db-dir ${PGDATA} ExecStart=/usr/pgsql-12/bin/postgres -D ${PGDATA} And here is how I initiated the recovery nodes on db1 several months ago: pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 Now after power on those machines, only the db1's postgresql service is active. I started the pgpool service manually one by one and check the watchdog info on db1: -bash-4.2$ pcp_watchdog_info -h 192.168.1.208 -p 9898 -U pgpool 3 YES jeffty-db1:9999 Linux jeffty-db1 jeffty-db1 jeffty-db1:9999 Linux jeffty-db1 jeffty-db1 9999 9000 4 MASTER jeffty-db2:9999 Linux jeffty-db2 jeffty-db2 9999 9000 7 STANDBY jeffty-db3:9999 Linux jeffty-db3 jeffty-db3 9999 9000 7 STANDBY Looks good. and I tried -bash-4.2$ psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes" node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | jeffty-db1 | 5432 | up | 0.333333 | primary | 247 | true | 0 | | | 2020-11-28 14:04:24 1 | jeffty-db2 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:26 2 | jeffty-db3 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:28 (3 rows) db2 & db3 are in 'down' status. Then I login them and tried: e.g. db2#su - postgres -bash-4.2$ psql -U postgres psql: error: could not connect to server: could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? So I have to start postgresql manually as /usr/pgsql-12/bin/pg_ctl -D /var/lib/pgsql/12/data/ start Now I can use psql -U postgres to login db2, but it seems that they are not in replicated since I create a table on db2 and cannot find the new table in db1. Finally after rebooting: 1. If I don't manually start postgresql service on db2 & db3, then it turns into only one node cluster, db1 acts as the only database instance. 2. If I manually start postgresql service on db2 & db3, it should still be one node instance since changes on db2 won't be synchronized to db1. What steps I missed during this process? Any suggestion or guidance would be appreciated. Best Regards, Jeffty | ||||
| Steps To Reproduce | 1. Install pgpool 4.1.2 and postgresql 12.3 on 3 machines as https://www.pgpool.net/docs/latest/en/html/example-cluster.html described. 2. enable the postgresql-12 services. 3. don't enable pgpool services. 4. reboot all of the machines. | ||||
| Tags | No tags attached. | ||||
|
|
Have you configure "8.2.7. /etc/sysconfig/pgpool Configuration"? https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-SYSCONFIG --- # vi /etc/sysconfig/pgpool ... OPTS=" -D -n" --- |
|
|
Hi pengbo, yes. e.g. pgpool file in all servers: vi /etc/sysconfig/pgpool # Options for pgpool # -n: don't run in daemon mode. does not detatch control tty # -d: debug mode. lots of debug information will be printed #OPTS=" -d -n" OPTS=" -D -n" STOP_OPTS=" -m fast" |
|
|
Checked and found the root cause why db2 & db3 cannot be started automatically: on db2 & db3: vi /usr/lib/systemd/system/postgresql-12.service ...... ExecStart=/usr/pgsql-12/bin/postgresql -D ${PGDATA} ...... it should be 'postgres' rather than 'postgresql'.I don't know why it happened, maybe I changed them manually before. Changed and exected: systemctl daemon-reload systemctl start postgresql-12 on db2 & db3. Now postgres service in db2 & db3 are running. And check the pool nodes status: [root@jeffty-db2 ~]# psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes" Password for user pgpool: node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | jeffty-db1 | 5432 | up | 0.333333 | primary | 36837 | true | 0 | | | 2020-11-28 14:04:24 1 | jeffty-db2 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:26 2 | jeffty-db3 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:28 db2 & db3 are still down. It's the correct status right? Since the document 4.1.2 doesn't mention it, the question now becomes: 1. After we initdb in db1, and execute pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 all postgresqls on db1-db3 are running. Now do we need to systemctl enable pgpool systemctl enable postgresql-12 on all 3 nodes? 2. After rebooting, do we need to manually execute below commands again? pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 Thanks a lot. |
|
|
> db2 & db3 are still down. It's the correct status right? Yes. If you didn't create standby nodes, it is the correct status. > 1. After we initdb in db1, and execute > pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 > pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 Yes. If you haven't created PostgreSQL node1 and node2, you need to execute "pcp_recovery_node" to create them. To create standby nodes you can do either 1. or 2.: 1. use "pcp_recovery_node" 2. create a streaming replication PostgreSQL cluster in advance (e.g. using pg_basebackup). > all postgresqls on db1-db3 are running. Now do we need to > > systemctl enable pgpool > systemctl enable postgresql-12 > > on all 3 nodes? Yes. > 2. After rebooting, do we need to manually execute below commands again? > > pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 > pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 No. Because you have created node1 and node2, you do not need to execute "pcp_recovery_node" again. Below is an configuration example: https://www.pgpool.net/docs/41/en/html/example-cluster.html |
|
|
Now I understand question 1 & 2. Thanks pengbo for the kindly support. But for question 3, the issue is after rebooting, starting postgres service on db2 & db3, and check the pool nodes status: [root@jeffty-db2 ~]# psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes" Password for user pgpool: node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | jeffty-db1 | 5432 | up | 0.333333 | primary | 36837 | true | 0 | | | 2020-11-28 14:04:24 1 | jeffty-db2 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:26 2 | jeffty-db3 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:28 The are 'down', and create a table or insert a record in db1 won't be replicated to db2 & db3 automatically. It seems that I need to execute something to notify pgpool to 'manage db2 & db3'? Otherwise the db change is always happening on db1. Thanks. |
|
|
Question 3: Is Virtual IP "192.168.1.208" set on the pgpool node which is on db1? Pgpool can't detect the standby nodes automatically once the nodes goes down. To automatically bring back the DOWN nodes, you can use the parameter "auto_failback". auto_failback = on https://www.pgpool.net/docs/41/en/html/runtime-config-failover.html#GUC-AUTO-FAILBACK |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2020-11-30 12:32 | jeffty_w | New Issue | |
| 2020-11-30 12:41 | pengbo | Note Added: 0003617 | |
| 2020-11-30 12:43 | pengbo | Assigned To | => pengbo |
| 2020-11-30 12:43 | pengbo | Status | new => feedback |
| 2020-11-30 12:55 | jeffty_w | Note Added: 0003619 | |
| 2020-11-30 12:55 | jeffty_w | Status | feedback => assigned |
| 2020-11-30 14:16 | jeffty_w | Note Added: 0003620 | |
| 2020-12-09 16:27 | pengbo | Note Added: 0003647 | |
| 2020-12-09 16:27 | pengbo | Status | assigned => feedback |
| 2020-12-09 16:34 | pengbo | Note Edited: 0003647 | |
| 2020-12-10 01:37 | jeffty_w | Note Added: 0003651 | |
| 2020-12-10 01:37 | jeffty_w | Status | feedback => assigned |
| 2020-12-10 14:04 | pengbo | Note Added: 0003654 | |
| 2020-12-10 14:05 | pengbo | Note Edited: 0003654 | |
| 2020-12-10 14:11 | pengbo | Status | assigned => feedback |