View Issue Details

IDProjectCategoryView StatusLast Update
0000032Pgpool-IIBugpublic2012-11-15 10:07
Reporteroleg_myrkAssigned Tot-ishii 
PrioritynormalSeveritycrashReproducibilitysometimes
Status resolvedResolutionopen 
Platformamd64OSDebianOS VersionWheezy
Product Version 
Target VersionFixed in Version 
Summary0000032: PGPool hangs on pcp_attach/detach
DescriptionI am observing a situation when pgpool hangs in master/slave mode with
postgresql streaming replication.
Steps To Reproduce* Start pgpool:
      sudo pgpool -f /etc/pgpool2/mx_large-pgpool.conf -F
/etc/pgpool2/pcp.conf -a /etc/pgpool2/pool_hba.conf -n -d
* To reproduce:
      $ sudo pcp_attach_node 100 localhost 9899 mx mx 1
      $ sudo pcp_node_info 100 localhost 9899 mx mx 1
      mx-db2 5434 1 0.600000
      $ sudo pcp_detach_node 100 localhost 9899 mx mx 1
      $ sudo pcp_node_info 100 localhost 9899 mx mx 1
      *** hangs here ***
* Also connecting to the port hangs:
      psql -p 6434 -U mx_large mx_large
* This freeze is not deterministic, but attaching/detaching standby
node and asking info for a few times will eventually freeze pgpool
Additional InformationPgpool version is: pgpool-II version 3.2.1, running on Debian
Wheezy, Postgresql 9.1


Pgpool's debug output:

$ sudo pgpool -f /etc/pgpool2/mx_large-pgpool.conf -F /etc/pgpool2/pcp.conf -a /etc/pgpool2/pool_hba.conf -n -d
2012-10-17 10:22:33 DEBUG: pid 12887: key: listen_addresses
2012-10-17 10:22:33 DEBUG: pid 12887: value: '*' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: port
2012-10-17 10:22:33 DEBUG: pid 12887: value: 6434 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: socket_dir
2012-10-17 10:22:33 DEBUG: pid 12887: value: '/var/run/postgresql' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: pcp_port
2012-10-17 10:22:33 DEBUG: pid 12887: value: 9899 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: pcp_socket_dir
2012-10-17 10:22:33 DEBUG: pid 12887: value: '/var/run/pgpool-mx_large' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: backend_hostname0
2012-10-17 10:22:33 DEBUG: pid 12887: value: 'mx-db1' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: backend_port0
2012-10-17 10:22:33 DEBUG: pid 12887: value: 5434 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: pool_config: port slot number 0
2012-10-17 10:22:33 DEBUG: pid 12887: key: backend_weight0
2012-10-17 10:22:33 DEBUG: pid 12887: value: 2 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: pool_config: weight slot number 0 weight: 2.000000
2012-10-17 10:22:33 DEBUG: pid 12887: key: backend_flag0
2012-10-17 10:22:33 DEBUG: pid 12887: value: 'DISALLOW_TO_FAILOVER' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: extract_string_tokens: token: DISALLOW_TO_FAILOVER
2012-10-17 10:22:33 DEBUG: pid 12887: pool_config: disallow_to_failover on
2012-10-17 10:22:33 DEBUG: pid 12887: pool_config: slot number 0 flag: 0001
2012-10-17 10:22:33 DEBUG: pid 12887: key: backend_hostname1
2012-10-17 10:22:33 DEBUG: pid 12887: value: 'mx-db2' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: backend_port1
2012-10-17 10:22:33 DEBUG: pid 12887: value: 5434 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: pool_config: port slot number 1
2012-10-17 10:22:33 DEBUG: pid 12887: key: backend_weight1
2012-10-17 10:22:33 DEBUG: pid 12887: value: 3 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: pool_config: weight slot number 1 weight: 3.000000
2012-10-17 10:22:33 DEBUG: pid 12887: key: enable_pool_hba
2012-10-17 10:22:33 DEBUG: pid 12887: value: on kind: 1
2012-10-17 10:22:33 DEBUG: pid 12887: key: authentication_timeout
2012-10-17 10:22:33 DEBUG: pid 12887: value: 60 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: pool_passwd
2012-10-17 10:22:33 DEBUG: pid 12887: value: 'mx_large-pool_passwd' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: ssl
2012-10-17 10:22:33 DEBUG: pid 12887: value: off kind: 1
2012-10-17 10:22:33 DEBUG: pid 12887: key: num_init_children
2012-10-17 10:22:33 DEBUG: pid 12887: value: 32 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: max_pool
2012-10-17 10:22:33 DEBUG: pid 12887: value: 4 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: child_life_time
2012-10-17 10:22:33 DEBUG: pid 12887: value: 300 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: child_max_connections
2012-10-17 10:22:33 DEBUG: pid 12887: value: 0 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: connection_life_time
2012-10-17 10:22:33 DEBUG: pid 12887: value: 0 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: client_idle_limit
2012-10-17 10:22:33 DEBUG: pid 12887: value: 0 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: key: log_destination
2012-10-17 10:22:33 DEBUG: pid 12887: value: 'stderr' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: print_timestamp
2012-10-17 10:22:33 DEBUG: pid 12887: value: on kind: 1
2012-10-17 10:22:33 DEBUG: pid 12887: key: log_connections
2012-10-17 10:22:33 DEBUG: pid 12887: value: on kind: 1
2012-10-17 10:22:33 DEBUG: pid 12887: key: log_hostname
2012-10-17 10:22:33 DEBUG: pid 12887: value: on kind: 1
2012-10-17 10:22:33 DEBUG: pid 12887: key: log_statement
2012-10-17 10:22:33 DEBUG: pid 12887: value: off kind: 1
2012-10-17 10:22:33 DEBUG: pid 12887: key: log_per_node_statement
2012-10-17 10:22:33 DEBUG: pid 12887: value: off kind: 1
2012-10-17 10:22:33 DEBUG: pid 12887: key: log_standby_delay
2012-10-17 10:22:33 DEBUG: pid 12887: value: 'none' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: syslog_facility
2012-10-17 10:22:33 DEBUG: pid 12887: value: 'LOCAL0' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: syslog_ident
2012-10-17 10:22:33 DEBUG: pid 12887: value: 'pgpool' kind: 4
2012-10-17 10:22:33 DEBUG: pid 12887: key: debug_level
2012-10-17 10:22:33 DEBUG: pid 12887: value: 0 kind: 2
2012-10-17 10:22:33 DEBUG: pid 12887: loading "/etc/pgpool2/pool_hba.conf" for client authentication configuration file
2012-10-17 10:22:33 LOG: pid 12887: read_status_file: 1 th backend is set to down status
2012-10-17 10:22:33 DEBUG: pid 12888: I am 12888
2012-10-17 10:22:33 DEBUG: pid 12888: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12889: I am 12889
2012-10-17 10:22:33 DEBUG: pid 12889: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12890: I am 12890
2012-10-17 10:22:33 DEBUG: pid 12890: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12891: I am 12891
2012-10-17 10:22:33 DEBUG: pid 12891: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12892: I am 12892
2012-10-17 10:22:33 DEBUG: pid 12892: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12893: I am 12893
2012-10-17 10:22:33 DEBUG: pid 12893: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12894: I am 12894
2012-10-17 10:22:33 DEBUG: pid 12894: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12895: I am 12895
2012-10-17 10:22:33 DEBUG: pid 12895: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12896: I am 12896
2012-10-17 10:22:33 DEBUG: pid 12896: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12897: I am 12897
2012-10-17 10:22:33 DEBUG: pid 12897: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12898: I am 12898
2012-10-17 10:22:33 DEBUG: pid 12898: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12899: I am 12899
2012-10-17 10:22:33 DEBUG: pid 12899: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12900: I am 12900
2012-10-17 10:22:33 DEBUG: pid 12900: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12901: I am 12901
2012-10-17 10:22:33 DEBUG: pid 12901: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12902: I am 12902
2012-10-17 10:22:33 DEBUG: pid 12902: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12903: I am 12903
2012-10-17 10:22:33 DEBUG: pid 12903: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12904: I am 12904
2012-10-17 10:22:33 DEBUG: pid 12904: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12906: I am 12906
2012-10-17 10:22:33 DEBUG: pid 12906: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12907: I am 12907
2012-10-17 10:22:33 DEBUG: pid 12907: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12908: I am 12908
2012-10-17 10:22:33 DEBUG: pid 12908: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12909: I am 12909
2012-10-17 10:22:33 DEBUG: pid 12909: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12910: I am 12910
2012-10-17 10:22:33 DEBUG: pid 12910: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12911: I am 12911
2012-10-17 10:22:33 DEBUG: pid 12911: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12912: I am 12912
2012-10-17 10:22:33 DEBUG: pid 12912: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12913: I am 12913
2012-10-17 10:22:33 DEBUG: pid 12913: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12914: I am 12914
2012-10-17 10:22:33 DEBUG: pid 12914: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12905: I am 12905
2012-10-17 10:22:33 DEBUG: pid 12905: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12915: I am 12915
2012-10-17 10:22:33 DEBUG: pid 12915: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12917: I am 12917
2012-10-17 10:22:33 DEBUG: pid 12917: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12918: I am 12918
2012-10-17 10:22:33 DEBUG: pid 12918: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 LOG: pid 12887: pgpool-II successfully started. version 3.2.1 (namameboshi)
2012-10-17 10:22:33 DEBUG: pid 12919: I am 12919
2012-10-17 10:22:33 DEBUG: pid 12919: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12921: I am 12921
2012-10-17 10:22:33 DEBUG: pid 12921: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12887: pool_ssl: SSL requested but SSL support is not available
2012-10-17 10:22:33 DEBUG: pid 12916: I am 12916
2012-10-17 10:22:33 DEBUG: pid 12916: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:33 DEBUG: pid 12921: pool_ssl: SSL requested but SSL support is not available
2012-10-17 10:22:33 DEBUG: pid 12887: s_do_auth: auth kind: 5
2012-10-17 10:22:33 DEBUG: pid 12921: s_do_auth: auth kind: 5
2012-10-17 10:22:33 DEBUG: pid 12921: s_do_auth: auth kind: 0
2012-10-17 10:22:33 DEBUG: pid 12887: s_do_auth: auth kind: 0
2012-10-17 10:22:33 DEBUG: pid 12887: s_do_auth: backend key data received
2012-10-17 10:22:33 DEBUG: pid 12887: s_do_auth: transaction state: I
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: extended:0 query:SELECT pg_is_in_recovery()
2012-10-17 10:22:33 DEBUG: pid 12921: s_do_auth: backend key data received
2012-10-17 10:22:33 DEBUG: pid 12921: s_do_auth: transaction state: I
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: kind: T
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: row description received
2012-10-17 10:22:33 DEBUG: pid 12887: num_fileds: 1
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: kind: D
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: data row received
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: kind: C
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: Command complete received
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: kind: Z
2012-10-17 10:22:33 DEBUG: pid 12887: do_query: Ready for query
2012-10-17 10:22:33 LOG: pid 12887: find_primary_node: primary node id is 0
2012-10-17 10:22:41 LOG: pid 12920: send_failback_request: fail back 1 th node request from pid 12920
2012-10-17 10:22:41 DEBUG: pid 12887: failover_handler called
2012-10-17 10:22:41 DEBUG: pid 12887: failover_handler: starting to select new master node
2012-10-17 10:22:41 LOG: pid 12887: starting fail back. reconnect host mx-db2(5434)
2012-10-17 10:22:41 LOG: pid 12887: Do not restart children because we are failbacking node id 1 hostmx-db2 port:5434 and we are in streaming replication mode
2012-10-17 10:22:41 LOG: pid 12887: find_primary_node_repeatedly: waiting for finding a primary node
2012-10-17 10:22:41 DEBUG: pid 12887: pool_ssl: SSL requested but SSL support is not available
2012-10-17 10:22:41 DEBUG: pid 12887: s_do_auth: auth kind: 5
2012-10-17 10:22:41 DEBUG: pid 12887: s_do_auth: auth kind: 0
2012-10-17 10:22:41 DEBUG: pid 12887: s_do_auth: backend key data received
2012-10-17 10:22:41 DEBUG: pid 12887: s_do_auth: transaction state: I
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: extended:0 query:SELECT pg_is_in_recovery()
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: kind: T
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: row description received
2012-10-17 10:22:41 DEBUG: pid 12887: num_fileds: 1
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: kind: D
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: data row received
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: kind: C
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: Command complete received
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: kind: Z
2012-10-17 10:22:41 DEBUG: pid 12887: do_query: Ready for query
2012-10-17 10:22:41 LOG: pid 12887: find_primary_node: primary node id is 0
2012-10-17 10:22:41 LOG: pid 12887: failover: set new primary node: 0
2012-10-17 10:22:41 LOG: pid 12887: failover: set new master node: 0
2012-10-17 10:22:41 LOG: pid 12887: failback done. reconnect host mx-db2(5434)
2012-10-17 10:22:41 LOG: pid 12921: worker process received restart request
2012-10-17 10:22:42 DEBUG: pid 12887: reap_handler called
2012-10-17 10:22:42 DEBUG: pid 12887: reap_handler: call wait3
2012-10-17 10:22:42 LOG: pid 12887: worker child 12921 exits with status 256
2012-10-17 10:22:42 LOG: pid 12887: fork a new worker child pid 12931
2012-10-17 10:22:42 DEBUG: pid 12887: reap_handler: normally exited
2012-10-17 10:22:42 LOG: pid 12920: pcp child process received restart request
2012-10-17 10:22:42 DEBUG: pid 12931: I am 12931
2012-10-17 10:22:42 DEBUG: pid 12931: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:42 DEBUG: pid 12887: reap_handler called
2012-10-17 10:22:42 DEBUG: pid 12887: reap_handler: call wait3
2012-10-17 10:22:42 LOG: pid 12887: PCP child 12920 exits with status 256
2012-10-17 10:22:42 DEBUG: pid 12931: pool_ssl: SSL requested but SSL support is not available
2012-10-17 10:22:42 LOG: pid 12887: fork a new PCP child pid 12933
2012-10-17 10:22:42 DEBUG: pid 12887: reap_handler: normally exited
2012-10-17 10:22:42 DEBUG: pid 12931: s_do_auth: auth kind: 5
2012-10-17 10:22:42 DEBUG: pid 12931: s_do_auth: auth kind: 0
2012-10-17 10:22:42 DEBUG: pid 12931: s_do_auth: backend key data received
2012-10-17 10:22:42 DEBUG: pid 12931: s_do_auth: transaction state: I
2012-10-17 10:22:42 DEBUG: pid 12931: pool_ssl: SSL requested but SSL support is not available
2012-10-17 10:22:42 DEBUG: pid 12931: s_do_auth: auth kind: 5
2012-10-17 10:22:42 DEBUG: pid 12931: s_do_auth: auth kind: 0
2012-10-17 10:22:42 DEBUG: pid 12931: s_do_auth: backend key data received
2012-10-17 10:22:42 DEBUG: pid 12931: s_do_auth: transaction state: I
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: extended:0 query:SELECT pg_current_xlog_location()
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: kind: T
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: row description received
2012-10-17 10:22:42 DEBUG: pid 12931: num_fileds: 1
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: kind: D
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: data row received
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: kind: C
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: Command complete received
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: kind: Z
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: Ready for query
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: extended:0 query:SELECT pg_last_xlog_replay_location()
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: kind: T
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: row description received
2012-10-17 10:22:42 DEBUG: pid 12931: num_fileds: 1
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: kind: D
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: data row received
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: kind: C
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: Command complete received
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: kind: Z
2012-10-17 10:22:42 DEBUG: pid 12931: do_query: Ready for query
2012-10-17 10:22:47 LOG: pid 12933: degenerate_backend_set: 1 fail over request from pid 12933
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler called
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: starting to select new master node
2012-10-17 10:22:47 LOG: pid 12887: starting degeneration. shutdown host mx-db2(5434)
2012-10-17 10:22:47 LOG: pid 12887: Restart all children
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12888
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12889
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12890
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12891
2012-10-17 10:22:47 DEBUG: pid 12888: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12892
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12893
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12894
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12895
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12896
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12897
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12898
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12899
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12900
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12901
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12902
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12903
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12904
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12905
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12906
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12907
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12908
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12909
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12910
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12911
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12912
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12913
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12914
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12915
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12916
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12917
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12918
2012-10-17 10:22:47 DEBUG: pid 12887: failover_handler: kill 12919
2012-10-17 10:22:47 LOG: pid 12887: find_primary_node_repeatedly: waiting for finding a primary node
2012-10-17 10:22:47 DEBUG: pid 12887: pool_ssl: SSL requested but SSL support is not available
2012-10-17 10:22:47 DEBUG: pid 12891: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12889: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12893: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12890: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12895: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12892: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12897: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12894: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12899: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12896: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12901: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12898: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12903: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12900: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12904: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12902: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12905: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12906: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12908: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12907: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12910: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12909: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12912: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12911: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12914: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12913: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12916: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12915: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12918: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12917: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12919: child received shutdown request signal 3
2012-10-17 10:22:47 DEBUG: pid 12887: s_do_auth: auth kind: 5
2012-10-17 10:22:47 DEBUG: pid 12887: s_do_auth: auth kind: 0
2012-10-17 10:22:47 DEBUG: pid 12887: s_do_auth: backend key data received
2012-10-17 10:22:47 DEBUG: pid 12887: s_do_auth: transaction state: I
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: extended:0 query:SELECT pg_is_in_recovery()
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: kind: T
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: row description received
2012-10-17 10:22:47 DEBUG: pid 12887: num_fileds: 1
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: kind: D
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: data row received
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: kind: C
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: Command complete received
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: kind: Z
2012-10-17 10:22:47 DEBUG: pid 12887: do_query: Ready for query
2012-10-17 10:22:47 LOG: pid 12887: find_primary_node: primary node id is 0
2012-10-17 10:22:47 LOG: pid 12887: failover: set new primary node: 0
2012-10-17 10:22:47 LOG: pid 12887: failover: set new master node: 0
2012-10-17 10:22:47 DEBUG: pid 12942: I am 12942
2012-10-17 10:22:47 DEBUG: pid 12942: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12942: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12943: I am 12943
2012-10-17 10:22:47 DEBUG: pid 12943: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12943: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12946: I am 12946
2012-10-17 10:22:47 DEBUG: pid 12946: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12946: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12947: I am 12947
2012-10-17 10:22:47 DEBUG: pid 12947: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12947: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12944: I am 12944
2012-10-17 10:22:47 DEBUG: pid 12944: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12944: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12948: I am 12948
2012-10-17 10:22:47 DEBUG: pid 12948: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12948: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12949: I am 12949
2012-10-17 10:22:47 DEBUG: pid 12949: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12949: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12951: I am 12951
2012-10-17 10:22:47 DEBUG: pid 12951: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 DEBUG: pid 12950: I am 12950
2012-10-17 10:22:47 LOG: pid 12951: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12950: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12950: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12952: I am 12952
2012-10-17 10:22:47 DEBUG: pid 12952: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 DEBUG: pid 12945: I am 12945
2012-10-17 10:22:47 LOG: pid 12952: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12945: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12945: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12954: I am 12954
2012-10-17 10:22:47 DEBUG: pid 12954: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 DEBUG: pid 12953: I am 12953
2012-10-17 10:22:47 LOG: pid 12954: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12953: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12953: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12955: I am 12955
2012-10-17 10:22:47 DEBUG: pid 12955: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12955: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12941: I am 12941
2012-10-17 10:22:47 DEBUG: pid 12941: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12941: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12956: I am 12956
2012-10-17 10:22:47 DEBUG: pid 12956: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12956: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12958: I am 12958
2012-10-17 10:22:47 DEBUG: pid 12958: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12958: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12959: I am 12959
2012-10-17 10:22:47 DEBUG: pid 12959: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12959: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12963: I am 12963
2012-10-17 10:22:47 DEBUG: pid 12963: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12963: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12957: I am 12957
2012-10-17 10:22:47 DEBUG: pid 12957: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12957: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12964: I am 12964
2012-10-17 10:22:47 DEBUG: pid 12964: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12964: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12961: I am 12961
2012-10-17 10:22:47 DEBUG: pid 12961: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12961: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12965: I am 12965
2012-10-17 10:22:47 DEBUG: pid 12965: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 DEBUG: pid 12960: I am 12960
2012-10-17 10:22:47 DEBUG: pid 12960: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12965: do_child: failback event found. restart myself.
2012-10-17 10:22:47 LOG: pid 12960: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12967: I am 12967
2012-10-17 10:22:47 DEBUG: pid 12962: I am 12962
2012-10-17 10:22:47 DEBUG: pid 12967: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 DEBUG: pid 12962: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12967: do_child: failback event found. restart myself.
2012-10-17 10:22:47 LOG: pid 12962: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12968: I am 12968
2012-10-17 10:22:47 DEBUG: pid 12966: I am 12966
2012-10-17 10:22:47 DEBUG: pid 12968: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 DEBUG: pid 12966: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12968: do_child: failback event found. restart myself.
2012-10-17 10:22:47 LOG: pid 12966: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12970: I am 12970
2012-10-17 10:22:47 DEBUG: pid 12969: I am 12969
2012-10-17 10:22:47 DEBUG: pid 12970: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 DEBUG: pid 12969: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12970: do_child: failback event found. restart myself.
2012-10-17 10:22:47 LOG: pid 12969: do_child: failback event found. restart myself.
2012-10-17 10:22:47 DEBUG: pid 12971: I am 12971
2012-10-17 10:22:47 DEBUG: pid 12971: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12971: do_child: failback event found. restart myself.
2012-10-17 10:22:47 LOG: pid 12931: worker process received restart request
2012-10-17 10:22:47 LOG: pid 12887: failover done. shutdown host mx-db2(5434)
2012-10-17 10:22:47 DEBUG: pid 12972: I am 12972
2012-10-17 10:22:47 DEBUG: pid 12972: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:47 LOG: pid 12972: do_child: failback event found. restart myself.
2012-10-17 10:22:48 LOG: pid 12933: pcp child process received restart request
2012-10-17 10:22:48 DEBUG: pid 12887: reap_handler called
2012-10-17 10:22:48 DEBUG: pid 12887: reap_handler: call wait3
2012-10-17 10:22:48 DEBUG: pid 12887: child 12888 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12889 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12890 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12891 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12892 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12893 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12894 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12895 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12896 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12897 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12898 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12899 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12900 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12901 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12902 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12903 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12904 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12905 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12906 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12907 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12908 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12909 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12910 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12911 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12912 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12913 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12914 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12915 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12916 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12917 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12918 exits with status 0
2012-10-17 10:22:48 DEBUG: pid 12887: child 12919 exits with status 0
2012-10-17 10:22:48 LOG: pid 12887: worker child 12931 exits with status 256
2012-10-17 10:22:48 LOG: pid 12887: fork a new worker child pid 12973
2012-10-17 10:22:48 DEBUG: pid 12887: reap_handler: normally exited
2012-10-17 10:22:48 DEBUG: pid 12973: I am 12973
2012-10-17 10:22:48 DEBUG: pid 12973: pool_initialize_private_backend_status: initialize backend status
2012-10-17 10:22:48 DEBUG: pid 12973: pool_ssl: SSL requested but SSL support is not available
2012-10-17 10:22:48 DEBUG: pid 12973: s_do_auth: auth kind: 5
2012-10-17 10:22:48 DEBUG: pid 12973: s_do_auth: auth kind: 0
2012-10-17 10:22:48 DEBUG: pid 12973: s_do_auth: backend key data received
2012-10-17 10:22:48 DEBUG: pid 12973: s_do_auth: transaction state: I
TagsNo tags attached.

Activities

oleg_myrk

2012-10-24 00:01

reporter  

pgpool.conf (19,745 bytes)

oleg_myrk

2012-10-24 00:02

reporter  

pgpool.out (33,582 bytes)

t-ishii

2012-11-14 14:56

developer   ~0000148

Here is the patch trying to solve the problem. Problem is, when pcp process exits, pgpool parent misses SIGCHLD in some situation. The patch makes pgpool parent to wait for pcp process exits, rather than catching in SIGCHLD signal handler, which is not so reliable regarding pcp process. Can you try it out?

t-ishii

2012-11-14 14:57

developer  

main.c.patch (1,045 bytes)
diff --git a/main.c b/main.c
index 86a5a01..c378c76 100644
--- a/main.c
+++ b/main.c
@@ -1677,6 +1677,8 @@ static void failover(void)
 	int new_primary;
 	int nodes[MAX_NUM_BACKENDS];
 	bool need_to_restart_children;
+	int status;
+	int sts;
 
 	pool_debug("failover_handler called");
 
@@ -2042,6 +2044,29 @@ static void failover(void)
 	 * Send restart request to pcp child.
 	 */
 	kill(pcp_pid, SIGUSR1);
+	for (;;)
+	{
+		sts = waitpid(pcp_pid, &status, 0);
+		if (sts != -1)
+			break;
+		if (sts == -1)
+		{
+			if (errno == EINTR)
+				continue;
+			else
+			{
+				pool_error("failover: waitpid failed. reson: %s", strerror(errno));
+				return;
+			}
+		}
+	}
+	if (WIFSIGNALED(status))
+		pool_log("PCP child %d exits with status %d by signal %d in failover()", pcp_pid, status, WTERMSIG(status));
+	else
+		pool_log("PCP child %d exits with status %d in failover()", pcp_pid, status);
+
+	pcp_pid = pcp_fork_a_child(pcp_unix_fd, pcp_inet_fd, pcp_conf_file);
+	pool_log("fork a new PCP child pid %d in failover()", pcp_pid);
 }
 
 /*
main.c.patch (1,045 bytes)

t-ishii

2012-11-15 10:07

developer   ~0000152

We tested the patch and it seems solves the problem. Fix committed.

Issue History

Date Modified Username Field Change
2012-10-24 00:01 oleg_myrk New Issue
2012-10-24 00:01 oleg_myrk File Added: pgpool.conf
2012-10-24 00:02 oleg_myrk File Added: pgpool.out
2012-11-14 14:56 t-ishii Note Added: 0000148
2012-11-14 14:57 t-ishii File Added: main.c.patch
2012-11-14 15:34 t-ishii Assigned To => t-ishii
2012-11-14 15:34 t-ishii Status new => assigned
2012-11-15 10:07 t-ishii Note Added: 0000152
2012-11-15 10:07 t-ishii Status assigned => resolved