
ID: 0000268
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2020-12-10 15:47
Reporter: chjischj
Assigned To: Muhammad Usama
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: open
Platform: Linux
OS: CentOS
OS Version: 7.0
Product Version: 3.5.4
Summary: 0000268: pgpool blocks forever in issue_command_to_watchdog() when the primary's network is cut off
Description:
pgpool: 3.5.4
PostgreSQL: 9.5.2
One primary and two standbys using streaming replication, with watchdog enabled. pgpool runs on each node together with PostgreSQL.

One of the nodes (node1) is both the PostgreSQL primary and the pgpool master. When this node's network is cut off, failover_command is never called.
The remaining nodes (node2 and node3) elected a new pgpool master (node3), but the new master blocked in issue_command_to_watchdog().

[root@node3 ~]# pcp_watchdog_info -w -v
Watchdog Cluster Information
Total Nodes : 3
Remote Nodes : 2
Quorum state : QUORUM EXIST
Alive Remote Nodes : 2
VIP up on local node : YES
Master Node Name : Linux_node3_9999
Master Host Name : node3

Watchdog Node Information
Node Name : Linux_node3_9999
Host Name : node3
Delegate IP : 192.168.0.220
Pgpool port : 9999
Watchdog port : 9000
Node priority : 1
Status : 4
Status Name : MASTER

Node Name : Linux_node1_9999
Host Name : node1
Delegate IP : 192.168.0.220
Pgpool port : 9999
Watchdog port : 9000
Node priority : 1
Status : 8
Status Name : LOST

Node Name : Linux_node2_9999
Host Name : node2
Delegate IP : 192.168.0.220
Pgpool port : 9999
Watchdog port : 9000
Node priority : 1
Status : 7
Status Name : STANDBY

pgpool log
--------------------------
Nov 15 23:12:37 node3 pgpool: 2016-11-15 23:12:37: pid 4088: ERROR: Failed to check replication time lag
Nov 15 23:12:37 node3 pgpool: 2016-11-15 23:12:37: pid 4088: DETAIL: No persistent db connection for the node 0
Nov 15 23:12:37 node3 pgpool: 2016-11-15 23:12:37: pid 4088: HINT: check sr_check_user and sr_check_password
Nov 15 23:12:37 node3 pgpool: 2016-11-15 23:12:37: pid 4088: CONTEXT: while checking replication time lag
Nov 15 23:12:39 node3 pgpool: 2016-11-15 23:12:39: pid 4088: LOG: failed to connect to PostgreSQL server on "node1:5433", getsockopt() detected error "No route to host"
Nov 15 23:12:39 node3 pgpool: 2016-11-15 23:12:39: pid 4088: ERROR: failed to make persistent db connection
Nov 15 23:12:39 node3 pgpool: 2016-11-15 23:12:39: pid 4088: DETAIL: connection to host:"node1:5433" failed
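
The message 'getsockopt() detected error "No route to host"' suggests the usual non-blocking connect pattern: start connect(), wait for the socket to become writable with select() under a timeout, then read the deferred result with getsockopt(SO_ERROR). The sketch below shows that generic pattern only; it is not pgpool's code, and the helper connect_with_timeout() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:port, waiting at most timeout_sec.
 * Returns a connected fd (left in non-blocking mode), or -1 with errno
 * set, e.g. EINVAL (bad address), EHOSTUNREACH, ECONNREFUSED, ETIMEDOUT. */
int connect_with_timeout(const char *ip, unsigned short port, int timeout_sec)
{
    assert(ip != NULL && timeout_sec >= 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Non-blocking mode makes connect() return immediately. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd; /* connected at once (common on loopback) */
    if (errno != EINPROGRESS) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* Wait for writability, but never longer than timeout_sec. */
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int rc = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (rc <= 0) {
        int saved = (rc == 0) ? ETIMEDOUT : errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* Writable does not mean success: fetch the deferred connect error. */
    int so_error = 0;
    socklen_t len = sizeof(so_error);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0 || so_error != 0) {
        int saved = so_error ? so_error : errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Against a host that has vanished from the network, a call like this returns -1 within the timeout (typically with EHOSTUNREACH or ETIMEDOUT) instead of hanging, which matches the kind of failure reported in the log above.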

stack of pgpool
--------------------
[root@node3 ~]# ps -ef|grep pgpool.conf
root 4048 1 0 Nov15 ? 00:00:00 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
root 5301 4832 0 00:10 pts/3 00:00:00 grep --color=auto pgpool.conf
[root@node3 ~]# pstack 4048
#0 0x00007f73647e98d3 in __select_nocancel () from /lib64/libc.so.6
#1 0x0000000000493d2e in issue_command_to_watchdog ()
#2 0x0000000000494ac3 in wd_degenerate_backend_set ()
#3 0x000000000040bcf3 in degenerate_backend_set_ex ()
#4 0x000000000040e1c4 in PgpoolMain ()
#5 0x0000000000406ec2 in main ()
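
The stack shows the main pgpool process (pid 4048) sitting in select(), reached from degenerate_backend_set_ex() through wd_degenerate_backend_set() and issue_command_to_watchdog(). A select()-based wait that has no timeout, or that is retried without an overall deadline, can wait indefinitely when the peer is a LOST node whose packets are silently dropped. The sketch below illustrates a bounded alternative: send a command, then wait for the reply against a fixed deadline that also survives EINTR retries. It is a generic illustration under that assumption, not pgpool's implementation or its actual fix; wait_readable() and issue_command_with_deadline() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable, for at most timeout_ms in
 * total even if select() is interrupted by signals.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set). */
int wait_readable(int fd, long timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);

    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L
                        + (now.tv_nsec - start.tv_nsec) / 1000000L;
        long remaining_ms = timeout_ms - elapsed_ms;
        if (remaining_ms < 0)
            remaining_ms = 0; /* deadline passed: one final non-blocking poll */

        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        struct timeval tv = { .tv_sec = remaining_ms / 1000,
                              .tv_usec = (remaining_ms % 1000) * 1000 };

        int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc > 0)
            return 1;
        if (rc == 0)
            return 0; /* timed out instead of blocking forever */
        if (errno != EINTR)
            return -1;
        if (remaining_ms == 0)
            return 0;
        /* EINTR: retry with the recomputed remaining time, not a fresh one. */
    }
}

/* Hypothetical helper: write cmd to out_fd, then wait up to timeout_ms for a
 * reply on in_fd. Returns 1 if a reply is ready, 0 on timeout, -1 on error.
 * Note: the write itself is still blocking; a complete design would bound it too. */
int issue_command_with_deadline(int out_fd, int in_fd, const char *cmd,
                                size_t len, long timeout_ms)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(out_fd, cmd + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += (size_t)n;
    }
    return wait_readable(in_fd, timeout_ms);
}
```

With a deadline like this, a caller in the position of degenerate_backend_set_ex() would get a timeout result back and could carry on (for example, by treating the unreachable peer as failed) rather than hanging; how pgpool should react to such a timeout is a design decision outside this sketch.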
Steps To Reproduce:
1. Set up one primary and two standbys.
2. Use streaming replication with watchdog enabled.
3. Run pgpool on every node.
4. Cut off the network of the pgpool master node (ip addr down ...).
Tags: streaming replication, watchdog

Activities

There are no notes attached to this issue.

Issue History

Date Modified     Username  Field         Change
2016-12-07 23:42  chjischj  New Issue
2016-12-07 23:42  chjischj  Tag Attached  streaming replication
2016-12-07 23:42  chjischj  Tag Attached  watchdog
2016-12-20 09:28  t-ishii   Assigned To   => Muhammad Usama
2016-12-20 09:28  t-ishii   Status        new => assigned
2017-08-29 09:41  pengbo    Status        assigned => closed