0000664: PgPool with WatchDog Cluster doesn't work after reboot all the machines - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000664	Pgpool-II	Bug	public	2020-11-30 12:32	2020-12-10 14:11

Reporter	jeffty_w	Assigned To	pengbo
Priority	high	Severity	major	Reproducibility	always
Status	feedback	Resolution	open
Platform	Linux	OS	RHEL	OS Version	7.7
Product Version	4.1.2

Summary	0000664: PgPool with WatchDog Cluster doesn't work after reboot all the machines
Description	Hi there, I installed Pgpool + WatchDog as the example https://www.pgpool.net/docs/latest/en/html/example-cluster.html described on 3 machines. Today they are powered off by accident. After power on them I found that db2 and db3 are in 'down' status. Is there a recommended way to auto start all of the nodes even after reboot? RPMs: postgresql: 12.3 pgpool-II-pg12: 4.1.2 Machines: jeffty-db11, jeffty-db2, jeffty-db3 VIP: 192.168.1.208 Services Configured during the installation: pgpool service is not enabled, systemctl enable postgresql-12 on every machine. cat /usr/lib/systemd/system/postgresql-12.service ExecStartPre=/usr/pgsql-12/bin/postgresql-12-check-db-dir ${PGDATA} ExecStart=/usr/pgsql-12/bin/postgres -D ${PGDATA} And here is how I initiated the recovery nodes on db1 several months ago: pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 Now after power on those machines, only the db1's postgresql service is active. I started the pgpool service manually one by one and check the watchdog info on db1: -bash-4.2$ pcp_watchdog_info -h 192.168.1.208 -p 9898 -U pgpool 3 YES jeffty-db1:9999 Linux jeffty-db1 jeffty-db1 jeffty-db1:9999 Linux jeffty-db1 jeffty-db1 9999 9000 4 MASTER jeffty-db2:9999 Linux jeffty-db2 jeffty-db2 9999 9000 7 STANDBY jeffty-db3:9999 Linux jeffty-db3 jeffty-db3 9999 9000 7 STANDBY Looks good. and I tried -bash-4.2$ psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes" node_id \| hostname \| port \| status \| lb_weight \| role \| select_cnt \| load_balance_node \| replication_delay \| replication_state \| replication_sync_state \| last_status_change ---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 \| jeffty-db1 \| 5432 \| up \| 0.333333 \| primary \| 247 \| true \| 0 \| \| \| 2020-11-28 14:04:24 1 \| jeffty-db2 \| 5432 \| down \| 0.333333 \| standby \| 0 \| false \| 0 \| \| \| 2020-11-28 14:01:26 2 \| jeffty-db3 \| 5432 \| down \| 0.333333 \| standby \| 0 \| false \| 0 \| \| \| 2020-11-28 14:01:28 (3 rows) db2 & db3 are in 'down' status. Then I login them and tried: e.g. db2#su - postgres -bash-4.2$ psql -U postgres psql: error: could not connect to server: could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? So I have to start postgresql manually as /usr/pgsql-12/bin/pg_ctl -D /var/lib/pgsql/12/data/ start Now I can use psql -U postgres to login db2, but it seems that they are not in replicated since I create a table on db2 and cannot find the new table in db1. Finally after rebooting: 1. If I don't manually start postgresql service on db2 & db3, then it turns into only one node cluster, db1 acts as the only database instance. 2. If I manually start postgresql service on db2 & db3, it should still be one node instance since changes on db2 won't be synchronized to db1. What steps I missed during this process? Any suggestion or guidance would be appreciated. Best Regards, Jeffty
Steps To Reproduce	1. Install pgpool 4.1.2 and postgresql 12.3 on 3 machines as https://www.pgpool.net/docs/latest/en/html/example-cluster.html described. 2. enable the postgresql-12 services. 3. don't enable pgpool services. 4. reboot all of the machines.
Tags	No tags attached.

pengbo 2020-11-30 12:41 developer ~0003617	Have you configure "8.2.7. /etc/sysconfig/pgpool Configuration"? https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-SYSCONFIG --- # vi /etc/sysconfig/pgpool ... OPTS=" -D -n" ---

jeffty_w 2020-11-30 12:55 reporter ~0003619	Hi pengbo, yes. e.g. pgpool file in all servers: vi /etc/sysconfig/pgpool # Options for pgpool # -n: don't run in daemon mode. does not detatch control tty # -d: debug mode. lots of debug information will be printed #OPTS=" -d -n" OPTS=" -D -n" STOP_OPTS=" -m fast"

jeffty_w 2020-11-30 14:16 reporter ~0003620	Checked and found the root cause why db2 & db3 cannot be started automatically: on db2 & db3: vi /usr/lib/systemd/system/postgresql-12.service ...... ExecStart=/usr/pgsql-12/bin/postgresql -D ${PGDATA} ...... it should be 'postgres' rather than 'postgresql'.I don't know why it happened, maybe I changed them manually before. Changed and exected: systemctl daemon-reload systemctl start postgresql-12 on db2 & db3. Now postgres service in db2 & db3 are running. And check the pool nodes status: [root@jeffty-db2 ~]# psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes" Password for user pgpool: node_id \| hostname \| port \| status \| lb_weight \| role \| select_cnt \| load_balance_node \| replication_delay \| replication_state \| replication_sync_state \| last_status_change ---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 \| jeffty-db1 \| 5432 \| up \| 0.333333 \| primary \| 36837 \| true \| 0 \| \| \| 2020-11-28 14:04:24 1 \| jeffty-db2 \| 5432 \| down \| 0.333333 \| standby \| 0 \| false \| 0 \| \| \| 2020-11-28 14:01:26 2 \| jeffty-db3 \| 5432 \| down \| 0.333333 \| standby \| 0 \| false \| 0 \| \| \| 2020-11-28 14:01:28 db2 & db3 are still down. It's the correct status right? Since the document 4.1.2 doesn't mention it, the question now becomes: 1. After we initdb in db1, and execute pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 all postgresqls on db1-db3 are running. Now do we need to systemctl enable pgpool systemctl enable postgresql-12 on all 3 nodes? 2. After rebooting, do we need to manually execute below commands again? pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 Thanks a lot.

pengbo 2020-12-09 16:27 developer ~0003647 Last edited: 2020-12-09 16:34	> db2 & db3 are still down. It's the correct status right? Yes. If you didn't create standby nodes, it is the correct status. > 1. After we initdb in db1, and execute > pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 > pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 Yes. If you haven't created PostgreSQL node1 and node2, you need to execute "pcp_recovery_node" to create them. To create standby nodes you can do either 1. or 2.: 1. use "pcp_recovery_node" 2. create a streaming replication PostgreSQL cluster in advance (e.g. using pg_basebackup). > all postgresqls on db1-db3 are running. Now do we need to > > systemctl enable pgpool > systemctl enable postgresql-12 > > on all 3 nodes? Yes. > 2. After rebooting, do we need to manually execute below commands again? > > pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1 > pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2 No. Because you have created node1 and node2, you do not need to execute "pcp_recovery_node" again. Below is an configuration example: https://www.pgpool.net/docs/41/en/html/example-cluster.html

jeffty_w 2020-12-10 01:37 reporter ~0003651	Now I understand question 1 & 2. Thanks pengbo for the kindly support. But for question 3, the issue is after rebooting, starting postgres service on db2 & db3, and check the pool nodes status: [root@jeffty-db2 ~]# psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes" Password for user pgpool: node_id \| hostname \| port \| status \| lb_weight \| role \| select_cnt \| load_balance_node \| replication_delay \| replication_state \| replication_sync_state \| last_status_change ---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 \| jeffty-db1 \| 5432 \| up \| 0.333333 \| primary \| 36837 \| true \| 0 \| \| \| 2020-11-28 14:04:24 1 \| jeffty-db2 \| 5432 \| down \| 0.333333 \| standby \| 0 \| false \| 0 \| \| \| 2020-11-28 14:01:26 2 \| jeffty-db3 \| 5432 \| down \| 0.333333 \| standby \| 0 \| false \| 0 \| \| \| 2020-11-28 14:01:28 The are 'down', and create a table or insert a record in db1 won't be replicated to db2 & db3 automatically. It seems that I need to execute something to notify pgpool to 'manage db2 & db3'? Otherwise the db change is always happening on db1. Thanks.

pengbo 2020-12-10 14:04 developer ~0003654 Last edited: 2020-12-10 14:05	Question 3: Is Virtual IP "192.168.1.208" set on the pgpool node which is on db1? Pgpool can't detect the standby nodes automatically once the nodes goes down. To automatically bring back the DOWN nodes, you can use the parameter "auto_failback". auto_failback = on https://www.pgpool.net/docs/41/en/html/runtime-config-failover.html#GUC-AUTO-FAILBACK

Date Modified	Username	Field	Change
2020-11-30 12:32	jeffty_w	New Issue
2020-11-30 12:41	pengbo	Note Added: 0003617
2020-11-30 12:43	pengbo	Assigned To	=> pengbo
2020-11-30 12:43	pengbo	Status	new => feedback
2020-11-30 12:55	jeffty_w	Note Added: 0003619
2020-11-30 12:55	jeffty_w	Status	feedback => assigned
2020-11-30 14:16	jeffty_w	Note Added: 0003620
2020-12-09 16:27	pengbo	Note Added: 0003647
2020-12-09 16:27	pengbo	Status	assigned => feedback
2020-12-09 16:34	pengbo	Note Edited: 0003647
2020-12-10 01:37	jeffty_w	Note Added: 0003651
2020-12-10 01:37	jeffty_w	Status	feedback => assigned
2020-12-10 14:04	pengbo	Note Added: 0003654
2020-12-10 14:05	pengbo	Note Edited: 0003654
2020-12-10 14:11	pengbo	Status	assigned => feedback