View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000731 | Pgpool-II | Bug | public | 2021-08-30 08:27 | 2021-09-14 10:58 |
| Reporter | tom-sato | Assigned To | Muhammad Usama | ||
| Priority | normal | Severity | block | Reproducibility | always |
| Status | closed | Resolution | open | ||
| Product Version | 4.2.4 | ||||
| Target Version | 4.2.5 | Fixed in Version | 4.2.5 | ||
| Summary | 0000731: Fails to execute follow_primary_command in Pgpool-II 4.2.4 | ||||
| Description | We have Pgpool-II and PostgreSQL with 3 nodes each, and the clustering mode is streaming replication. Failover occurs when the primary is shutdown, but the standby now fails to follow the new primary. The watchdog was also segfaulting. There was no problem with Pgpool-II 4.2.3. 2021-08-30 07:31:57: follow_child pid 11747: LOG: read from socket failed, remote end closed the connection 2021-08-30 07:31:57: follow_child pid 11747: WARNING: error reading from IPC command socket for ipc command c 2021-08-30 07:31:57: follow_child pid 11747: DETAIL: read from socket failed with error "Success" 2021-08-30 07:31:57: follow_child pid 11747: WARNING: execute cluster command failed 2021-08-30 07:31:57: follow_child pid 11747: DETAIL: issue command to watchdog returned NULL 2021-08-30 07:31:57: follow_child pid 11747: LOG: start triggering follow command. 2021-08-30 07:31:57: follow_child pid 11747: LOG: execute command: /var/lib/pgsql/test-4.2.4/pgpool0/etc/follow_primary.sh 0 /tmp 51000 /var/lib/pgsql/test-4.2.4/pgpool0/data0 1 0 /tmp 0 51001 /var/lib/pgsql/test-4.2.4/pgpool0/data1 2021-08-30 07:31:57: follow_child pid 11747: LOG: execute command: /var/lib/pgsql/test-4.2.4/pgpool0/etc/follow_primary.sh 2 /tmp 51002 /var/lib/pgsql/test-4.2.4/pgpool0/data2 1 0 /tmp 0 51001 /var/lib/pgsql/test-4.2.4/pgpool0/data1 2021-08-30 07:31:57: pcp_main pid 11471: LOG: forked new pcp worker, pid=11879 socket=7 2021-08-30 07:31:57: pcp_child pid 11879: LOG: received failback request for node_id: 2 from pid [11879] 2021-08-30 07:31:57: pcp_child pid 11879: LOG: failed to connect to watchdog command server socket 2021-08-30 07:31:57: pcp_child pid 11879: DETAIL: connect on "/var/run/postgresql/.s.PGPOOLWD_CMD.50002" failed with reason: "Connection refused" 2021-08-30 07:31:57: pcp_child pid 11879: WARNING: failover command on watchdog failed 2021-08-30 07:31:57: pcp_child pid 11879: DETAIL: issue command to watchdog returned NULL 2021-08-30 07:31:57: pcp_child pid 11879: ERROR: failback request for node_id: 2 from pid [11879] is canceled by other pgpool 2021-08-30 07:31:57: follow_child pid 11747: LOG: failed to connect to watchdog command server socket 2021-08-30 07:31:57: follow_child pid 11747: DETAIL: connect on "/var/run/postgresql/.s.PGPOOLWD_CMD.50002" failed with reason: "Connection refused" 2021-08-30 07:31:57: follow_child pid 11747: WARNING: execute cluster command failed 2021-08-30 07:31:57: follow_child pid 11747: DETAIL: issue command to watchdog returned NULL 2021-08-30 07:31:57: pcp_main pid 11471: LOG: PCP process with pid: 11879 exit with SUCCESS. 2021-08-30 07:31:57: pcp_main pid 11471: LOG: PCP process with pid: 11879 exits with status 0 2021-08-30 07:31:58: pcp_main pid 11471: LOG: restart request received in pcp child process 2021-08-30 07:31:58: main pid 11415: LOG: PCP child 11471 exits with status 0 in failover() 2021-08-30 07:31:58: main pid 11415: LOG: fork a new PCP child pid 11880 in failover() 2021-08-30 07:31:58: main pid 11415: LOG: Pgpool-II parent process received SIGUSR1 2021-08-30 07:31:58: main pid 11415: LOG: Pgpool-II parent process received inform quarantine nodes signal from watchdog 2021-08-30 07:31:58: main pid 11415: WARNING: watchdog child process with pid: 11428 was terminated by segmentation fault 2021-08-30 07:31:58: main pid 11415: LOG: fork a new watchdog child process with pid: 11881 | ||||
| Steps To Reproduce | The problem can be reproduced using watchdog_setup. The procedure is as follows: # cat /etc/centos-release CentOS Linux release 7.8.2003 (Core) # setenforce 0 # systemctl stop firewalld # yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm (snip) Installed: pgdg-redhat-repo.noarch 0:42.0-18 Complete! # yum -y install postgresql13-server (snip) Installed: postgresql13-server.x86_64 0:13.4-1PGDG.rhel7 Dependency Installed: postgresql13.x86_64 0:13.4-1PGDG.rhel7 postgresql13-libs.x86_64 0:13.4-1PGDG.rhel7 Complete! # yum -y install https://www.pgpool.net/yum/rpms/4.2/redhat/rhel-7-x86_64/pgpool-II-release-4.2-1.noarch.rpm (snip) Installed: pgpool-II-release.noarch 0:4.2-1 Complete! # yum -y install pgpool-II-pg13 pgpool-II-pg13-extensions (snip) Installed: pgpool-II-pg13.x86_64 0:4.2.4-1pgdg.rhel7 pgpool-II-pg13-extensions.x86_64 0:4.2.4-1pgdg.rhel7 Dependency Installed: libmemcached.x86_64 0:1.0.16-5.el7 Complete! # su - postgres $ ssh-keygen Generating public/private rsa key pair. Enter file in which to save the key (/var/lib/pgsql/.ssh/id_rsa): Created directory '/var/lib/pgsql/.ssh'. Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /var/lib/pgsql/.ssh/id_rsa. Your public key has been saved in /var/lib/pgsql/.ssh/id_rsa.pub. The key fingerprint is: SHA256:FIScHsmZi1gebS/rsWQbwluERrzcs8igz55tpq1zZZo postgres@node-1.example.com The key's randomart image is: +---[RSA 2048]----+ | . + Bo | | = % . | | * O +. | | o B B.. | | . = o =S | |. = @ | | o % = | | ++E + | | .=Bo | +----[SHA256]-----+ $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys $ ssh localhost The authenticity of host 'localhost (::1)' can't be established. ECDSA key fingerprint is SHA256:XVT71QTM3Yh9FWF4XOoRj9gNYzcRiJoGGJ2FFUZInkU. ECDSA key fingerprint is MD5:92:28:4b:ed:7c:05:49:56:5c:ee:28:f0:68:2e:21:ea. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts. Last login: Mon Aug 30 07:26:41 2021 $ exit logout Connection to localhost closed. $ mkdir ~/test-4.2.4 $ cd ~/test-4.2.4 $ watchdog_setup -n 3 (snip) pgpool-II setting for streaming replication mode is done. To start the whole system, use /var/lib/pgsql/test-4.2.4/pgpool2/startall. To shutdown the whole system, use /var/lib/pgsql/test-4.2.4/pgpool2/shutdownall. pcp command user name is "postgres", password is "postgres". Each PostgreSQL, pgpool-II and pcp port is as follows: 0000001 port is 51000 0000002 port is 51001 0000003 port is 51002 pgpool port is 50008 pcp port is 50009 The info above is in README.port. $ ./startall waiting for server to start....11377 2021-08-30 07:30:39 JST LOG: redirecting log output to logging collector process 11377 2021-08-30 07:30:39 JST HINT: Future log output will appear in directory "log". done server started waiting for server to start....11389 2021-08-30 07:30:39 JST LOG: redirecting log output to logging collector process 11389 2021-08-30 07:30:39 JST HINT: Future log output will appear in directory "log". done server started waiting for server to start....11403 2021-08-30 07:30:39 JST LOG: redirecting log output to logging collector process 11403 2021-08-30 07:30:39 JST HINT: Future log output will appear in directory "log". done server started $ psql -h /tmp -p 50008 -c "show pool_nodes" node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | /tmp | 51000 | up | 0.333333 | primary | 0 | false | 0 | | | 2021-08-30 07:31:27 1 | /tmp | 51001 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2021-08-30 07:31:27 2 | /tmp | 51002 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2021-08-30 07:31:27 (3 rows) $ /usr/pgsql-13/bin/pg_ctl stop -D pgpool0/data0 waiting for server to shut down.... done server stopped $ psql -h /tmp -p 50008 -c "show pool_nodes" node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | /tmp | 51000 | down | 0.333333 | standby | 0 | false | 0 | | | 2021-08-30 07:31:57 1 | /tmp | 51001 | up | 0.333333 | primary | 0 | true | 0 | | | 2021-08-30 07:33:02 2 | /tmp | 51002 | down | 0.333333 | standby | 0 | false | 0 | streaming | async | 2021-08-30 07:31:57 (3 rows) $ ./shutdownall 2021-08-30 07:33:15: main pid 30813: LOG: stop request sent to pgpool. waiting for termination... .done. 2021-08-30 07:33:16: main pid 30816: LOG: stop request sent to pgpool. waiting for termination... .done. 2021-08-30 07:33:17: main pid 30820: LOG: stop request sent to pgpool. waiting for termination... .done. pg_ctl: PID file "/var/lib/pgsql/test-4.2.4/pgpool0/data0/postmaster.pid" does not exist Is server running? waiting for server to shut down.... done server stopped waiting for server to shut down.... done server stopped | ||||
| Tags | failover, watchdog | ||||
|
|
|
|
|
Thank you for reporting this bug. It is fixed by the following commit. https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=3e8807f49af665e66da404c632e4a17819e826ec This fix will be released in the next minor version 4.2.5. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2021-08-30 08:27 | tom-sato | New Issue | |
| 2021-08-30 08:27 | tom-sato | Tag Attached: failover | |
| 2021-08-30 08:27 | tom-sato | Tag Attached: watchdog | |
| 2021-08-30 08:27 | tom-sato | File Added: test-4.2.4.zip | |
| 2021-08-30 09:44 | pengbo | Assigned To | => pengbo |
| 2021-08-30 09:44 | pengbo | Status | new => assigned |
| 2021-08-30 10:38 | pengbo | Assigned To | pengbo => Muhammad Usama |
| 2021-09-13 02:02 | pengbo | Note Added: 0003937 | |
| 2021-09-13 02:02 | pengbo | Status | assigned => resolved |
| 2021-09-13 02:02 | pengbo | Target Version | => 4.2.5 |
| 2021-09-14 10:58 | administrator | Status | resolved => closed |
| 2021-09-14 10:58 | administrator | Fixed in Version | => 4.2.5 |