View Issue Details

IDProjectCategoryView StatusLast Update
0000664Pgpool-IIBugpublic2020-12-10 14:11
Reporterjeffty_w Assigned Topengbo  
PriorityhighSeveritymajorReproducibilityalways
Status feedbackResolutionopen 
PlatformLinuxOSRHELOS Version7.7
Product Version4.1.2 
Summary0000664: PgPool with WatchDog Cluster doesn't work after reboot all the machines
DescriptionHi there,

I installed Pgpool + WatchDog as the example https://www.pgpool.net/docs/latest/en/html/example-cluster.html described on 3 machines.
Today they are powered off by accident. After power on them I found that db2 and db3 are in 'down' status.
Is there a recommended way to auto start all of the nodes even after reboot?

RPMs:
postgresql: 12.3
pgpool-II-pg12: 4.1.2

Machines: jeffty-db11, jeffty-db2, jeffty-db3
VIP: 192.168.1.208
Services Configured during the installation:
pgpool service is not enabled,
systemctl enable postgresql-12 on every machine.

cat /usr/lib/systemd/system/postgresql-12.service
ExecStartPre=/usr/pgsql-12/bin/postgresql-12-check-db-dir ${PGDATA}
ExecStart=/usr/pgsql-12/bin/postgres -D ${PGDATA}


And here is how I initiated the recovery nodes on db1 several months ago:
pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1
pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2


Now after power on those machines, only the db1's postgresql service is active. I started the pgpool service manually one by one and check the watchdog info on db1:

-bash-4.2$ pcp_watchdog_info -h 192.168.1.208 -p 9898 -U pgpool

3 YES jeffty-db1:9999 Linux jeffty-db1 jeffty-db1

jeffty-db1:9999 Linux jeffty-db1 jeffty-db1 9999 9000 4 MASTER
jeffty-db2:9999 Linux jeffty-db2 jeffty-db2 9999 9000 7 STANDBY
jeffty-db3:9999 Linux jeffty-db3 jeffty-db3 9999 9000 7 STANDBY

Looks good. and I tried

-bash-4.2$ psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes"

 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | jeffty-db1 | 5432 | up | 0.333333 | primary | 247 | true | 0 | | | 2020-11-28 14:04:24
 1 | jeffty-db2 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:26
 2 | jeffty-db3 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:28
(3 rows)

db2 & db3 are in 'down' status. Then I login them and tried:
e.g.
db2#su - postgres
-bash-4.2$ psql -U postgres
psql: error: could not connect to server: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

So I have to start postgresql manually as
/usr/pgsql-12/bin/pg_ctl -D /var/lib/pgsql/12/data/ start

Now I can use psql -U postgres to login db2, but it seems that they are not in replicated since I create a table on db2 and cannot find the new table in db1.

Finally after rebooting:
1. If I don't manually start postgresql service on db2 & db3, then it turns into only one node cluster, db1 acts as the only database instance.
2. If I manually start postgresql service on db2 & db3, it should still be one node instance since changes on db2 won't be synchronized to db1.

What steps I missed during this process? Any suggestion or guidance would be appreciated.

Best Regards,
Jeffty
Steps To Reproduce1. Install pgpool 4.1.2 and postgresql 12.3 on 3 machines as https://www.pgpool.net/docs/latest/en/html/example-cluster.html described.
2. enable the postgresql-12 services.
3. don't enable pgpool services.
4. reboot all of the machines.
TagsNo tags attached.

Activities

pengbo

2020-11-30 12:41

developer   ~0003617

Have you configure "8.2.7. /etc/sysconfig/pgpool Configuration"?
https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG-SYSCONFIG

---
# vi /etc/sysconfig/pgpool
...
OPTS=" -D -n"
---

jeffty_w

2020-11-30 12:55

reporter   ~0003619

Hi pengbo, yes.

e.g. pgpool file in all servers:

vi /etc/sysconfig/pgpool
# Options for pgpool

# -n: don't run in daemon mode. does not detatch control tty
# -d: debug mode. lots of debug information will be printed

#OPTS=" -d -n"
OPTS=" -D -n"

STOP_OPTS=" -m fast"

jeffty_w

2020-11-30 14:16

reporter   ~0003620

Checked and found the root cause why db2 & db3 cannot be started automatically:

on db2 & db3: vi /usr/lib/systemd/system/postgresql-12.service
......
ExecStart=/usr/pgsql-12/bin/postgresql -D ${PGDATA}
......

it should be 'postgres' rather than 'postgresql'.I don't know why it happened, maybe I changed them manually before.

Changed and exected:
systemctl daemon-reload
systemctl start postgresql-12
on db2 & db3.

Now postgres service in db2 & db3 are running.
And check the pool nodes status:

[root@jeffty-db2 ~]# psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | jeffty-db1 | 5432 | up | 0.333333 | primary | 36837 | true | 0 | | | 2020-11-28 14:04:24
 1 | jeffty-db2 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:26
 2 | jeffty-db3 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:28

db2 & db3 are still down. It's the correct status right?

Since the document 4.1.2 doesn't mention it, the question now becomes:

1. After we initdb in db1, and execute
pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1
pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2

all postgresqls on db1-db3 are running. Now do we need to

systemctl enable pgpool
systemctl enable postgresql-12

on all 3 nodes?

2. After rebooting, do we need to manually execute below commands again?

pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1
pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2

Thanks a lot.

pengbo

2020-12-09 16:27

developer   ~0003647

Last edited: 2020-12-09 16:34

> db2 & db3 are still down. It's the correct status right?

Yes. If you didn't create standby nodes, it is the correct status.

> 1. After we initdb in db1, and execute
> pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1
> pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2

Yes.
If you haven't created PostgreSQL node1 and node2, you need to execute "pcp_recovery_node" to create them.

To create standby nodes you can do either 1. or 2.:
  1. use "pcp_recovery_node"
  2. create a streaming replication PostgreSQL cluster in advance (e.g. using pg_basebackup).

> all postgresqls on db1-db3 are running. Now do we need to
>
> systemctl enable pgpool
> systemctl enable postgresql-12
>
> on all 3 nodes?

Yes.

> 2. After rebooting, do we need to manually execute below commands again?
>
> pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 1
> pcp_recovery_node -h 192.168.1.208 -p 9898 -U pgpool -n 2

No. Because you have created node1 and node2, you do not need to execute "pcp_recovery_node" again.

Below is an configuration example:
https://www.pgpool.net/docs/41/en/html/example-cluster.html

jeffty_w

2020-12-10 01:37

reporter   ~0003651

Now I understand question 1 & 2.
Thanks pengbo for the kindly support.

But for question 3, the issue is after rebooting, starting postgres service on db2 & db3, and check the pool nodes status:

[root@jeffty-db2 ~]# psql -h 192.168.1.208 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | jeffty-db1 | 5432 | up | 0.333333 | primary | 36837 | true | 0 | | | 2020-11-28 14:04:24
 1 | jeffty-db2 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:26
 2 | jeffty-db3 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-11-28 14:01:28

The are 'down', and create a table or insert a record in db1 won't be replicated to db2 & db3 automatically.
It seems that I need to execute something to notify pgpool to 'manage db2 & db3'? Otherwise the db change is always happening on db1.

Thanks.

pengbo

2020-12-10 14:04

developer   ~0003654

Last edited: 2020-12-10 14:05

Question 3:

Is Virtual IP "192.168.1.208" set on the pgpool node which is on db1?
Pgpool can't detect the standby nodes automatically once the nodes goes down.

To automatically bring back the DOWN nodes, you can use the parameter "auto_failback".

  auto_failback = on

https://www.pgpool.net/docs/41/en/html/runtime-config-failover.html#GUC-AUTO-FAILBACK

Issue History

Date Modified Username Field Change
2020-11-30 12:32 jeffty_w New Issue
2020-11-30 12:41 pengbo Note Added: 0003617
2020-11-30 12:43 pengbo Assigned To => pengbo
2020-11-30 12:43 pengbo Status new => feedback
2020-11-30 12:55 jeffty_w Note Added: 0003619
2020-11-30 12:55 jeffty_w Status feedback => assigned
2020-11-30 14:16 jeffty_w Note Added: 0003620
2020-12-09 16:27 pengbo Note Added: 0003647
2020-12-09 16:27 pengbo Status assigned => feedback
2020-12-09 16:34 pengbo Note Edited: 0003647
2020-12-10 01:37 jeffty_w Note Added: 0003651
2020-12-10 01:37 jeffty_w Status feedback => assigned
2020-12-10 14:04 pengbo Note Added: 0003654
2020-12-10 14:05 pengbo Note Edited: 0003654
2020-12-10 14:11 pengbo Status assigned => feedback