[Pgpool-general] Issue with master node selection in pgpool-II-3.0.3-beta1

Tue Feb 22 23:32:55 UTC 2011

Thanks for the report. We are looking into this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hello,
> 
> I am having a problem with selection of master node when I manually
> attach slave node (with id #1) which came up after postgresql server reboot.
> When I attach the slave, pgpool tries to select master node, and for
> some reason reports that previously working master node (with id #0) is no longer
> available, marks the master node as unavailable, and selects the newly
> attached node (id #1) as master. If after that I manually attach the node with
> id #0 back to the pool it becomes a master.
> 
> More detailed explanation.
> 
> == Setup ==
> Two postgresql nodes (running postgresql 9.0.3 on FreeBSD 8.0):
> Master node (storage1) and a slave one (storage2).
> The system uses streaming replication with the slave node running in hot
> standby mode.
> pgpool-II-3.0.3-beta1 is installed on storage1 node with the following
> configuration file:
> 
> ======================================================================
> zaa:...local/etc# cat pgpool.conf | sort | grep -v '^#' | grep -v '^$'
> authentication_timeout = 30
> backend_hostname0 = 'storage1'
> backend_hostname1 = 'storage2'
> backend_port0 = 5432
> backend_port1 = 5432
> backend_weight0 = 0.01
> backend_weight1 = 0.99
> black_function_list = 'nextval,setval'
> child_idle_limit = 0
> child_life_time = 300
> child_max_connections = 0
> connection_cache = true
> connection_life_time = 0
> debug_level = 1
> enable_pool_hba = false
> fail_over_on_backend_error = false
> failover_command = ''
> health_check_period = 5
> health_check_user = 'pgpool'
> listen_addresses = 'localhost'
> load_balance_mode = true
> log_connections = true
> log_hostname = false
> log_per_node_statement = true
> log_standby_delay = 'always'
> log_statement = true
> master_slave_mode = true
> master_slave_sub_mode = 'stream'
> max_pool = 5
> num_init_children = 2
> parallel_mode = false
> pid_file_name = '/var/run/pgpool/pgpool.pid'
> port = 9999
> print_timestamp = true
> replication_mode = false
> reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
> socket_dir = '/tmp'
> white_function_list = ''
> ======================================================================
> 
> I can successfully connect to local port 9999 and issue DML and select
> queries.
> 
> == Problem ==
> 
> 1. I shutdown postgresql on slave node storage2 and see the following
> lines in syslog on the node with pgpool:
> 
> ======================================================================
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: detect_stop_postmaster_error: receive admin shutdown error from a node.
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG:   pid 53354: postmaster on DB node 1 was shutdown by administrative command
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG:   pid 53354: notice_backend_error: 1 fail over request from pid 53354
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler called
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: starting to select new master node
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG:   pid 52999: starting degeneration. shutdown host storage2(5432)
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: child received shutdown request signal 3
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: kill 53354
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: kill 53355
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53355: child received shutdown request signal 3
> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG:   pid 52999: failover_handler: set new master node: 0
> ...
> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: starting health checking
> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: health_check: 0 th DB node status: 2
> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: health_check: 1 th DB node status: 3
> ======================================================================
> 
> As you can see, pgpool correctly marks storage2 node as being down.
> 
> 2. After a while I start postgres on node storage2.
> 3. In order to return the node back into the pool I run the following
> command:
> 
> pcp_attach_node -d 120 localhost 9898 postgres postgres 1
> 
> and see those lines in syslog on storage1:
> 
> ======================================================================
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 53003: send_failback_request: fail back 1 th node request from pid 53003
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler called
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: starting to select new master node
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 52999: starting fail back. reconnect host storage2(5432)
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: kill 56131
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56131: child received shutdown request signal 3
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: kill 56132
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56132: child received shutdown request signal 3
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 52999: failover_handler: set new master node: 0
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 52999: failback done. reconnect host storage2(5432)
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56205: I am 56205
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: pool_ssl: SSL requested but SSL support is not available
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56206: I am 56206
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: auth kind: 0
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: parameter status data received
> Feb 21 23:28:03 zaa last message repeated 10 times
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: backend key data received
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: transaction state: I
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: T
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: num_fileds: 1
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: D
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: C
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: Z
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: T
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: num_fileds: 1
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: D
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: C
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: Z
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 52999: find_primary_node: primary node id is 0
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler called
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler: call wait3
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child 56132 exits with status 0
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child 56131 exits with status 0
> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler: normally exited
> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: starting health checking
> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: health_check: 0 th DB node status: 2
> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: health_check: 1 th DB node status: 1
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: T
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: num_fileds: 1
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: D
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: C
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: Z
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: E
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: pool_read: read failed (Connection reset by peer)
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 56021: notice_backend_error: 0 fail over request from pid 56021
> Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG:  unexpected EOF on client connection
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler called
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: starting to select new master node
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: starting degeneration. shutdown host storage1(5432)
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: kill 56205
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56205: child received shutdown request signal 3
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: kill 56206
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: failover_handler: set new master node: 1
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56206: child received shutdown request signal 3
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: failover done. shutdown host storage1(5432)
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56213: I am 56213
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56214: I am 56214
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: pool_ssl: SSL requested but SSL support is not available
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: auth kind: 0
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: parameter status data received
> Feb 21 23:28:08 zaa last message repeated 10 times
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: backend key data received
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: transaction state: I
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: T
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: num_fileds: 1
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: D
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: C
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: Z
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: T
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: num_fileds: 1
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: E
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: Z
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: find_primary_node: do_query returns no rows
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: find_primary_node: do_query returns no data
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: find_primary_node: primary node id is 1
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler called
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler: call wait3
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child 56206 exits with status 0
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child 56205 exits with status 0
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child process 56021 was terminated by segmentation fault
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: worker child 56021 exits with status 11 by signal 11
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: fork a new worker child pid 56216
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler: normally exited
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: I am 56216
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: pool_ssl: SSL requested but SSL support is not available
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: auth kind: 0
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: parameter status data received
> Feb 21 23:28:08 zaa last message repeated 10 times
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: backend key data received
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: transaction state: I
> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: starting health checking
> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: health_check: 0 th DB node status: 3
> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: health_check: 1 th DB node status: 1
> ======================================================================
> 
> As you can see, at first pgpool correctly chooses storage1 node as master, but
> then for some reason it gets information about the master node (storage1)
> unavailability:
> 
> ======================================================================
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: pool_read: read failed (Connection reset by peer)
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 56021: notice_backend_error: 0 fail over request from pid 56021
> Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG:  unexpected EOF on client connection
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler called
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: starting to select new master node
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: starting degeneration. shutdown host storage1(5432)
> ======================================================================
> 
> slave node storage2 becomes the master node and health check shows storage1
> node as unavailable.
> 
> If I run "pcp_attach_node -d 120 localhost 9898 postgres postgres 0" command,
> storage1 node again becomes the master node as it should be. 
> 
> Could someone please let me know what might be causing the issue? Is this a bug
> or I am doing something wrong?
> 
> PS. One more interesting issue is presence of those two lines:
> ======================================================================
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child process 56021 was terminated by segmentation fault
> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: worker child 56021 exits with status 11 by signal 11
> ======================================================================
> 
> -- 
> Alexander Zhuravlev
> http://ikato.com/
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general