View Issue Details

IDProjectCategoryView StatusLast Update
0000126Pgpool-IIBugpublic2016-04-19 23:33
ReportereldadAssigned Tonagata 
PriorityhighSeveritycrashReproducibilityhave not tried
Status assignedResolutionopen 
PlatformLinuxOSRHELOS Version6.3
Product Version 
Target VersionFixed in Version 
Summary0000126: pgpool get stuck after failover
Descriptionpgpool lost connection to one of the slaves in the postgres cluster and activated the failover script which does nothing since it is a slave.
immediately after it the pgpool stopped responding to new connections and print errors: ERROR: pid 9463: pool_read_kind: kind does not match between master(0) slot[0] (52)
This caused the application to stop responding since the pgpool didn't too a failover to the standby node and the VIP stayed on the main node.

active-standby pgpool II 3.3.3-1 64bit
4 nodes of postgres 9.3
Additional Informationlog file from the time of the issue:
2014-12-23 05:45:23 LOG: pid 8400: postmaster on DB node 0 was shutdown by administrative command
2014-12-23 05:45:23 LOG: pid 8400: degenerate_backend_set: 0 fail over request from pid 8400
2014-12-23 05:45:23 LOG: pid 24452: wd_start_interlock: start interlocking
2014-12-23 05:45:23 LOG: pid 24452: wd_assume_lock_holder: become a new lock holder
2014-12-23 05:45:23 LOG: pid 24471: wd_send_response: WD_STAND_FOR_LOCK_HOLDER received but lock holder exists already
2014-12-23 05:45:24 LOG: pid 24452: starting degeneration. shutdown host ptast001ppdb10(5432)
2014-12-23 05:45:24 LOG: pid 24452: Restart all children
2014-12-23 05:45:24 LOG: pid 347: connection received: host=15.224.241.72 port=56829
2014-12-23 05:45:24 LOG: pid 2450: connection received: host=15.224.241.72 port=56830
2014-12-23 05:45:24 ERROR: pid 8400: pool_flush_it: write failed to backend (0). reason: Broken pipe offset: 0 wlen: 5
2014-12-23 05:45:24 LOG: pid 9414: connection received: host=15.224.241.72 port=56832
2014-12-23 05:45:24 LOG: pid 24452: execute command: /etc/pgpool-II/failover_stream.sh 0 ptast001ppdb10 1 ptast001ppdb11 1
===================================================================================
START Node 0 failed on Tue Dec 23 05:45:24 PST 2014. Current master is 1
===================================================================================
Failed node ptast001ppdb10 is slave. Nothing is done
2014-12-23 05:45:24 LOG: pid 24452: wd_end_interlock: end interlocking
2014-12-23 05:45:24 LOG: pid 669: Replication of node:0 is behind 7819552 bytes from the primary server (node:1)
2014-12-23 05:45:25 LOG: pid 24452: failover: set new primary node: 1
2014-12-23 05:45:25 LOG: pid 24452: failover: set new master node: 1
2014-12-23 05:45:25 LOG: pid 9460: connection received: host=15.224.241.72 port=56834
2014-12-23 05:45:25 LOG: pid 9462: connection received: host=15.224.241.72 port=56835
2014-12-23 05:45:25 LOG: pid 9461: connection received: host=15.224.241.72 port=56833
2014-12-23 05:45:25 LOG: pid 9463: connection received: host=15.224.241.72 port=56836
2014-12-23 05:45:25 LOG: pid 9465: connection received: host=15.224.241.72 port=56838
2014-12-23 05:45:25 LOG: pid 9464: connection received: host=15.224.241.72 port=56837
2014-12-23 05:45:25 LOG: pid 9466: connection received: host=15.224.241.72 port=56839
2014-12-23 05:45:25 LOG: pid 9467: connection received: host=15.224.241.72 port=56840
2014-12-23 05:45:25 LOG: pid 9468: connection received: host=15.224.241.72 port=56841
2014-12-23 05:45:25 LOG: pid 9470: connection received: host=15.224.241.72 port=56842
2014-12-23 05:45:25 LOG: pid 9469: connection received: host=15.224.241.72 port=56843
2014-12-23 05:45:25 ERROR: pid 9462: pool_read_kind: kind does not match between master(0) slot[0] (52)
2014-12-23 05:45:25 LOG: pid 9471: connection received: host=15.224.241.72 port=56845
2014-12-23 05:45:25 ERROR: pid 9460: pool_read_kind: kind does not match between master(0) slot[0] (52)
2014-12-23 05:45:25 LOG: pid 9472: connection received: host=15.224.241.72 port=56844
2014-12-23 05:45:25 LOG: pid 9460: connection received: host=15.224.241.72 port=56848
2014-12-23 05:45:25 ERROR: pid 9461: pool_read_kind: kind does not match between master(0) slot[0] (52)
2014-12-23 05:45:25 LOG: pid 9474: connection received: host=15.224.241.72 port=56849
2014-12-23 05:45:25 LOG: pid 9473: connection received: host=15.224.241.72 port=56847
2014-12-23 05:45:25 LOG: pid 9475: connection received: host=15.224.241.72 port=56850
2014-12-23 05:45:25 ERROR: pid 9463: pool_read_kind: kind does not match between master(0) slot[0] (52)
2014-12-23 05:45:25 ERROR: pid 9465: pool_read_kind: kind does not match between master(0) slot[0] (52)
TagsNo tags attached.

Activities

nagata

2015-01-08 17:32

developer   ~0000509

I'm trying to reproduce this but I can't.
Are you able to reproduce this easily? If so, could you please provide
the detailed steps and pgpool.conf?

eldad

2015-01-08 21:23

reporter  

pgpool.conf (33,874 bytes)

eldad

2015-01-08 21:27

reporter   ~0000511

Hi,

I tried 2-3 times but I can't reproduce it, I guess
it happens due to network problem so just shutting down one of the nodes doesn't do the work.
I'm attaching my pgpool.conf
Can you please explain what is the meaning of the error:
ERROR: pid 9463: pool_read_kind: kind does not match between master(0) slot[0] (52)
Maybe I can give you some more info if I will understand the problem.

Thanks,
Eldad

nagata

2015-01-09 21:33

developer   ~0000512

ERROR: pid 9463: pool_read_kind: kind does not match between master(0) slot[0] (52)

pool_read_kind function reads a byte of "packet kind" from all valid backend nodes and rises the error if there is a mismatch between these.

This error says "received 0 from master node but received 52 from node #0".

However, node #0 was detached from pgpool in this case and invalid, so this should be skipped by pool_read_kind.

2014-12-23 05:45:23 LOG: pid 8400: degenerate_backend_set: 0 fail over request from pid 8400

This is very odd....

trevor

2016-04-19 23:33

reporter   ~0000787

I am experiencing the same problem. The problem seems to be that node 0 is not the master resulting in the error that primary node 0's kind does not match the uninitialized kind0 variable.

Issue History

Date Modified Username Field Change
2014-12-24 17:31 eldad New Issue
2015-01-08 11:07 nagata Assigned To => nagata
2015-01-08 11:07 nagata Status new => assigned
2015-01-08 17:32 nagata Note Added: 0000509
2015-01-08 17:32 nagata Status assigned => feedback
2015-01-08 21:23 eldad File Added: pgpool.conf
2015-01-08 21:27 eldad Note Added: 0000511
2015-01-08 21:27 eldad Status feedback => assigned
2015-01-09 21:33 nagata Note Added: 0000512
2016-04-19 23:33 trevor Note Added: 0000787