[Pgpool-general] Issue with master node selection in pgpool-II-3.0.3-beta1

Tatsuo Ishii ishii at sraoss.co.jp
Wed Feb 23 01:49:58 UTC 2011


Zhuravlev,

I couldn't reproduce your problem here.  Since both
problems(unexpected failover and segfault) were occured in "worker
process", I would like to take a backtrace of the process.
The process looks like:

23567 ?        S      0:00 pgpool: worker process
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Thanks for the report. We are looking into this.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
> 
>> Hello,
>> 
>> I am having a problem with selection of master node when I manually
>> attach slave node (with id #1) which came up after postgresql server reboot.
>> When I attach the slave, pgpool tries to select master node, and for
>> some reason reports that previously working master node (with id #0) is no longer
>> available, marks the master node as unavailable, and selects the newly
>> attached node (id #1) as master. If after that I manually attach the node with
>> id #0 back to the pool it becomes a master.
>> 
>> More detailed explanation.
>> 
>> == Setup ==
>> Two postgresql nodes (running postgresql 9.0.3 on FreeBSD 8.0):
>> Master node (storage1) and a slave one (storage2).
>> The system uses streaming replication with the slave node running in hot
>> standby mode.
>> pgpool-II-3.0.3-beta1 is installed on storage1 node with the following
>> configuration file:
>> 
>> ======================================================================
>> zaa:...local/etc# cat pgpool.conf | sort | grep -v '^#' | grep -v '^$'
>> authentication_timeout = 30
>> backend_hostname0 = 'storage1'
>> backend_hostname1 = 'storage2'
>> backend_port0 = 5432
>> backend_port1 = 5432
>> backend_weight0 = 0.01
>> backend_weight1 = 0.99
>> black_function_list = 'nextval,setval'
>> child_idle_limit = 0
>> child_life_time = 300
>> child_max_connections = 0
>> connection_cache = true
>> connection_life_time = 0
>> debug_level = 1
>> enable_pool_hba = false
>> fail_over_on_backend_error = false
>> failover_command = ''
>> health_check_period = 5
>> health_check_user = 'pgpool'
>> listen_addresses = 'localhost'
>> load_balance_mode = true
>> log_connections = true
>> log_hostname = false
>> log_per_node_statement = true
>> log_standby_delay = 'always'
>> log_statement = true
>> master_slave_mode = true
>> master_slave_sub_mode = 'stream'
>> max_pool = 5
>> num_init_children = 2
>> parallel_mode = false
>> pid_file_name = '/var/run/pgpool/pgpool.pid'
>> port = 9999
>> print_timestamp = true
>> replication_mode = false
>> reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
>> socket_dir = '/tmp'
>> white_function_list = ''
>> ======================================================================
>> 
>> I can successfully connect to local port 9999 and issue DML and select
>> queries.
>> 
>> == Problem ==
>> 
>> 1. I shutdown postgresql on slave node storage2 and see the following
>> lines in syslog on the node with pgpool:
>> 
>> ======================================================================
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: detect_stop_postmaster_error: receive admin shutdown error from a node.
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG:   pid 53354: postmaster on DB node 1 was shutdown by administrative command
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG:   pid 53354: notice_backend_error: 1 fail over request from pid 53354
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler called
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: starting to select new master node
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG:   pid 52999: starting degeneration. shutdown host storage2(5432)
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: child received shutdown request signal 3
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: kill 53354
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: kill 53355
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53355: child received shutdown request signal 3
>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG:   pid 52999: failover_handler: set new master node: 0
>> ...
>> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: starting health checking
>> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: health_check: 0 th DB node status: 2
>> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: health_check: 1 th DB node status: 3
>> ======================================================================
>> 
>> As you can see, pgpool correctly marks storage2 node as being down.
>> 
>> 2. After a while I start postgres on node storage2.
>> 3. In order to return the node back into the pool I run the following
>> command:
>> 
>> pcp_attach_node -d 120 localhost 9898 postgres postgres 1
>> 
>> and see those lines in syslog on storage1:
>> 
>> ======================================================================
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 53003: send_failback_request: fail back 1 th node request from pid 53003
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler called
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: starting to select new master node
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 52999: starting fail back. reconnect host storage2(5432)
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: kill 56131
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56131: child received shutdown request signal 3
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: kill 56132
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56132: child received shutdown request signal 3
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 52999: failover_handler: set new master node: 0
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 52999: failback done. reconnect host storage2(5432)
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56205: I am 56205
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: pool_ssl: SSL requested but SSL support is not available
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56206: I am 56206
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: auth kind: 0
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: parameter status data received
>> Feb 21 23:28:03 zaa last message repeated 10 times
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: backend key data received
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: transaction state: I
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: T
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: num_fileds: 1
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: D
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: C
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: Z
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: T
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: num_fileds: 1
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: D
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: C
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: Z
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG:   pid 52999: find_primary_node: primary node id is 0
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler called
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler: call wait3
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child 56132 exits with status 0
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child 56131 exits with status 0
>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler: normally exited
>> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: starting health checking
>> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: health_check: 0 th DB node status: 2
>> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: health_check: 1 th DB node status: 1
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: T
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: num_fileds: 1
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: D
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: C
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: Z
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: E
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: pool_read: read failed (Connection reset by peer)
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 56021: notice_backend_error: 0 fail over request from pid 56021
>> Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG:  unexpected EOF on client connection
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler called
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: starting to select new master node
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: starting degeneration. shutdown host storage1(5432)
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: kill 56205
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56205: child received shutdown request signal 3
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: kill 56206
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: failover_handler: set new master node: 1
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56206: child received shutdown request signal 3
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: failover done. shutdown host storage1(5432)
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56213: I am 56213
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56214: I am 56214
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: pool_ssl: SSL requested but SSL support is not available
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: auth kind: 0
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: parameter status data received
>> Feb 21 23:28:08 zaa last message repeated 10 times
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: backend key data received
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: transaction state: I
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: T
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: num_fileds: 1
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: D
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: C
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: Z
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: T
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: num_fileds: 1
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: E
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: Z
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: find_primary_node: do_query returns no rows
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: find_primary_node: do_query returns no data
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: find_primary_node: primary node id is 1
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler called
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler: call wait3
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child 56206 exits with status 0
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child 56205 exits with status 0
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child process 56021 was terminated by segmentation fault
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: worker child 56021 exits with status 11 by signal 11
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: fork a new worker child pid 56216
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler: normally exited
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: I am 56216
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: pool_ssl: SSL requested but SSL support is not available
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: auth kind: 0
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: parameter status data received
>> Feb 21 23:28:08 zaa last message repeated 10 times
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: backend key data received
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: transaction state: I
>> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: starting health checking
>> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: health_check: 0 th DB node status: 3
>> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: health_check: 1 th DB node status: 1
>> ======================================================================
>> 
>> As you can see, at first pgpool correctly chooses storage1 node as master, but
>> then for some reason it gets information about the master node (storage1)
>> unavailability:
>> 
>> ======================================================================
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: pool_read: read failed (Connection reset by peer)
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 56021: notice_backend_error: 0 fail over request from pid 56021
>> Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG:  unexpected EOF on client connection
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler called
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: starting to select new master node
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: starting degeneration. shutdown host storage1(5432)
>> ======================================================================
>> 
>> slave node storage2 becomes the master node and health check shows storage1
>> node as unavailable.
>> 
>> If I run "pcp_attach_node -d 120 localhost 9898 postgres postgres 0" command,
>> storage1 node again becomes the master node as it should be. 
>> 
>> Could someone please let me know what might be causing the issue? Is this a bug
>> or I am doing something wrong?
>> 
>> PS. One more interesting issue is presence of those two lines:
>> ======================================================================
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child process 56021 was terminated by segmentation fault
>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG:   pid 52999: worker child 56021 exits with status 11 by signal 11
>> ======================================================================
>> 
>> -- 
>> Alexander Zhuravlev
>> http://ikato.com/
>> _______________________________________________
>> Pgpool-general mailing list
>> Pgpool-general at pgfoundry.org
>> http://pgfoundry.org/mailman/listinfo/pgpool-general
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general


More information about the Pgpool-general mailing list