View Issue Details

IDProjectCategoryView StatusLast Update
0000731Pgpool-IIBugpublic2021-09-14 10:58
Reportertom-sato Assigned ToMuhammad Usama  
PrioritynormalSeverityblockReproducibilityalways
Status closedResolutionopen 
Product Version4.2.4 
Target Version4.2.5Fixed in Version4.2.5 
Summary0000731: Fails to execute follow_primary_command in Pgpool-II 4.2.4
DescriptionWe have Pgpool-II and PostgreSQL with 3 nodes each, and the clustering mode is streaming replication. Failover occurs when the primary is shutdown, but the standby now fails to follow the new primary. The watchdog was also segfaulting. There was no problem with Pgpool-II 4.2.3.

2021-08-30 07:31:57: follow_child pid 11747: LOG: read from socket failed, remote end closed the connection
2021-08-30 07:31:57: follow_child pid 11747: WARNING: error reading from IPC command socket for ipc command c
2021-08-30 07:31:57: follow_child pid 11747: DETAIL: read from socket failed with error "Success"
2021-08-30 07:31:57: follow_child pid 11747: WARNING: execute cluster command failed
2021-08-30 07:31:57: follow_child pid 11747: DETAIL: issue command to watchdog returned NULL
2021-08-30 07:31:57: follow_child pid 11747: LOG: start triggering follow command.
2021-08-30 07:31:57: follow_child pid 11747: LOG: execute command: /var/lib/pgsql/test-4.2.4/pgpool0/etc/follow_primary.sh 0 /tmp 51000 /var/lib/pgsql/test-4.2.4/pgpool0/data0 1 0 /tmp 0 51001 /var/lib/pgsql/test-4.2.4/pgpool0/data1
2021-08-30 07:31:57: follow_child pid 11747: LOG: execute command: /var/lib/pgsql/test-4.2.4/pgpool0/etc/follow_primary.sh 2 /tmp 51002 /var/lib/pgsql/test-4.2.4/pgpool0/data2 1 0 /tmp 0 51001 /var/lib/pgsql/test-4.2.4/pgpool0/data1
2021-08-30 07:31:57: pcp_main pid 11471: LOG: forked new pcp worker, pid=11879 socket=7
2021-08-30 07:31:57: pcp_child pid 11879: LOG: received failback request for node_id: 2 from pid [11879]
2021-08-30 07:31:57: pcp_child pid 11879: LOG: failed to connect to watchdog command server socket
2021-08-30 07:31:57: pcp_child pid 11879: DETAIL: connect on "/var/run/postgresql/.s.PGPOOLWD_CMD.50002" failed with reason: "Connection refused"
2021-08-30 07:31:57: pcp_child pid 11879: WARNING: failover command on watchdog failed
2021-08-30 07:31:57: pcp_child pid 11879: DETAIL: issue command to watchdog returned NULL
2021-08-30 07:31:57: pcp_child pid 11879: ERROR: failback request for node_id: 2 from pid [11879] is canceled by other pgpool
2021-08-30 07:31:57: follow_child pid 11747: LOG: failed to connect to watchdog command server socket
2021-08-30 07:31:57: follow_child pid 11747: DETAIL: connect on "/var/run/postgresql/.s.PGPOOLWD_CMD.50002" failed with reason: "Connection refused"
2021-08-30 07:31:57: follow_child pid 11747: WARNING: execute cluster command failed
2021-08-30 07:31:57: follow_child pid 11747: DETAIL: issue command to watchdog returned NULL
2021-08-30 07:31:57: pcp_main pid 11471: LOG: PCP process with pid: 11879 exit with SUCCESS.
2021-08-30 07:31:57: pcp_main pid 11471: LOG: PCP process with pid: 11879 exits with status 0
2021-08-30 07:31:58: pcp_main pid 11471: LOG: restart request received in pcp child process
2021-08-30 07:31:58: main pid 11415: LOG: PCP child 11471 exits with status 0 in failover()
2021-08-30 07:31:58: main pid 11415: LOG: fork a new PCP child pid 11880 in failover()
2021-08-30 07:31:58: main pid 11415: LOG: Pgpool-II parent process received SIGUSR1
2021-08-30 07:31:58: main pid 11415: LOG: Pgpool-II parent process received inform quarantine nodes signal from watchdog
2021-08-30 07:31:58: main pid 11415: WARNING: watchdog child process with pid: 11428 was terminated by segmentation fault
2021-08-30 07:31:58: main pid 11415: LOG: fork a new watchdog child process with pid: 11881
Steps To ReproduceThe problem can be reproduced using watchdog_setup. The procedure is as follows:

# cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)
# setenforce 0
# systemctl stop firewalld
# yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
(snip)
Installed:
  pgdg-redhat-repo.noarch 0:42.0-18

Complete!
# yum -y install postgresql13-server
(snip)
Installed:
  postgresql13-server.x86_64 0:13.4-1PGDG.rhel7

Dependency Installed:
  postgresql13.x86_64 0:13.4-1PGDG.rhel7
  postgresql13-libs.x86_64 0:13.4-1PGDG.rhel7

Complete!
# yum -y install https://www.pgpool.net/yum/rpms/4.2/redhat/rhel-7-x86_64/pgpool-II-release-4.2-1.noarch.rpm
(snip)
Installed:
  pgpool-II-release.noarch 0:4.2-1

Complete!
# yum -y install pgpool-II-pg13 pgpool-II-pg13-extensions
(snip)
Installed:
  pgpool-II-pg13.x86_64 0:4.2.4-1pgdg.rhel7
  pgpool-II-pg13-extensions.x86_64 0:4.2.4-1pgdg.rhel7

Dependency Installed:
  libmemcached.x86_64 0:1.0.16-5.el7

Complete!
# su - postgres
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/pgsql/.ssh/id_rsa):
Created directory '/var/lib/pgsql/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/pgsql/.ssh/id_rsa.
Your public key has been saved in /var/lib/pgsql/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:FIScHsmZi1gebS/rsWQbwluERrzcs8igz55tpq1zZZo postgres@node-1.example.com
The key's randomart image is:
+---[RSA 2048]----+
| . + Bo |
| = % . |
| * O +. |
| o B B.. |
| . = o =S |
|. = @ |
| o % = |
| ++E + |
| .=Bo |
+----[SHA256]-----+
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:XVT71QTM3Yh9FWF4XOoRj9gNYzcRiJoGGJ2FFUZInkU.
ECDSA key fingerprint is MD5:92:28:4b:ed:7c:05:49:56:5c:ee:28:f0:68:2e:21:ea.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Mon Aug 30 07:26:41 2021
$ exit
logout
Connection to localhost closed.
$ mkdir ~/test-4.2.4
$ cd ~/test-4.2.4
$ watchdog_setup -n 3
(snip)
pgpool-II setting for streaming replication mode is done.
To start the whole system, use /var/lib/pgsql/test-4.2.4/pgpool2/startall.
To shutdown the whole system, use /var/lib/pgsql/test-4.2.4/pgpool2/shutdownall.
pcp command user name is "postgres", password is "postgres".
Each PostgreSQL, pgpool-II and pcp port is as follows:
0000001 port is 51000
0000002 port is 51001
0000003 port is 51002
pgpool port is 50008
pcp port is 50009
The info above is in README.port.
$ ./startall
waiting for server to start....11377 2021-08-30 07:30:39 JST LOG: redirecting log output to logging collector process
11377 2021-08-30 07:30:39 JST HINT: Future log output will appear in directory "log".
 done
server started
waiting for server to start....11389 2021-08-30 07:30:39 JST LOG: redirecting log output to logging collector process
11389 2021-08-30 07:30:39 JST HINT: Future log output will appear in directory "log".
 done
server started
waiting for server to start....11403 2021-08-30 07:30:39 JST LOG: redirecting log output to logging collector process
11403 2021-08-30 07:30:39 JST HINT: Future log output will appear in directory "log".
 done
server started
$ psql -h /tmp -p 50008 -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | /tmp | 51000 | up | 0.333333 | primary | 0 | false | 0 | | | 2021-08-30 07:31:27
 1 | /tmp | 51001 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2021-08-30 07:31:27
 2 | /tmp | 51002 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2021-08-30 07:31:27
(3 rows)

$ /usr/pgsql-13/bin/pg_ctl stop -D pgpool0/data0
waiting for server to shut down.... done
server stopped
$ psql -h /tmp -p 50008 -c "show pool_nodes"
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | /tmp | 51000 | down | 0.333333 | standby | 0 | false | 0 | | | 2021-08-30 07:31:57
 1 | /tmp | 51001 | up | 0.333333 | primary | 0 | true | 0 | | | 2021-08-30 07:33:02
 2 | /tmp | 51002 | down | 0.333333 | standby | 0 | false | 0 | streaming | async | 2021-08-30 07:31:57
(3 rows)

$ ./shutdownall
2021-08-30 07:33:15: main pid 30813: LOG: stop request sent to pgpool. waiting for termination...
.done.
2021-08-30 07:33:16: main pid 30816: LOG: stop request sent to pgpool. waiting for termination...
.done.
2021-08-30 07:33:17: main pid 30820: LOG: stop request sent to pgpool. waiting for termination...
.done.
pg_ctl: PID file "/var/lib/pgsql/test-4.2.4/pgpool0/data0/postmaster.pid" does not exist
Is server running?
waiting for server to shut down.... done
server stopped
waiting for server to shut down.... done
server stopped
Tagsfailover, watchdog

Activities

tom-sato

2021-08-30 08:27

reporter  

test-4.2.4.zip (104,957 bytes)

pengbo

2021-09-13 02:02

developer   ~0003937

Thank you for reporting this bug.
It is fixed by the following commit.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=3e8807f49af665e66da404c632e4a17819e826ec

This fix will be released in the next minor version 4.2.5.

Issue History

Date Modified Username Field Change
2021-08-30 08:27 tom-sato New Issue
2021-08-30 08:27 tom-sato Tag Attached: failover
2021-08-30 08:27 tom-sato Tag Attached: watchdog
2021-08-30 08:27 tom-sato File Added: test-4.2.4.zip
2021-08-30 09:44 pengbo Assigned To => pengbo
2021-08-30 09:44 pengbo Status new => assigned
2021-08-30 10:38 pengbo Assigned To pengbo => Muhammad Usama
2021-09-13 02:02 pengbo Note Added: 0003937
2021-09-13 02:02 pengbo Status assigned => resolved
2021-09-13 02:02 pengbo Target Version => 4.2.5
2021-09-14 10:58 administrator Status resolved => closed
2021-09-14 10:58 administrator Fixed in Version => 4.2.5