View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000126 | Pgpool-II | Bug | public | 2014-12-24 17:31 | 2023-09-25 22:00 |
| Reporter | eldad | Assigned To | nagata | ||
| Priority | high | Severity | crash | Reproducibility | have not tried |
| Status | assigned | Resolution | open | ||
| Platform | Linux | OS | RHEL | OS Version | 6.3 |
| Summary | 0000126: pgpool get stuck after failover | ||||
| Description | pgpool lost connection to one of the slaves in the postgres cluster and activated the failover script which does nothing since it is a slave. immediately after it the pgpool stopped responding to new connections and print errors: ERROR: pid 9463: pool_read_kind: kind does not match between master(0) slot[0] (52) This caused the application to stop responding since the pgpool didn't too a failover to the standby node and the VIP stayed on the main node. active-standby pgpool II 3.3.3-1 64bit 4 nodes of postgres 9.3 | ||||
| Additional Information | log file from the time of the issue: 2014-12-23 05:45:23 LOG: pid 8400: postmaster on DB node 0 was shutdown by administrative command 2014-12-23 05:45:23 LOG: pid 8400: degenerate_backend_set: 0 fail over request from pid 8400 2014-12-23 05:45:23 LOG: pid 24452: wd_start_interlock: start interlocking 2014-12-23 05:45:23 LOG: pid 24452: wd_assume_lock_holder: become a new lock holder 2014-12-23 05:45:23 LOG: pid 24471: wd_send_response: WD_STAND_FOR_LOCK_HOLDER received but lock holder exists already 2014-12-23 05:45:24 LOG: pid 24452: starting degeneration. shutdown host ptast001ppdb10(5432) 2014-12-23 05:45:24 LOG: pid 24452: Restart all children 2014-12-23 05:45:24 LOG: pid 347: connection received: host=15.224.241.72 port=56829 2014-12-23 05:45:24 LOG: pid 2450: connection received: host=15.224.241.72 port=56830 2014-12-23 05:45:24 ERROR: pid 8400: pool_flush_it: write failed to backend (0). reason: Broken pipe offset: 0 wlen: 5 2014-12-23 05:45:24 LOG: pid 9414: connection received: host=15.224.241.72 port=56832 2014-12-23 05:45:24 LOG: pid 24452: execute command: /etc/pgpool-II/failover_stream.sh 0 ptast001ppdb10 1 ptast001ppdb11 1 =================================================================================== START Node 0 failed on Tue Dec 23 05:45:24 PST 2014. Current master is 1 =================================================================================== Failed node ptast001ppdb10 is slave. Nothing is done 2014-12-23 05:45:24 LOG: pid 24452: wd_end_interlock: end interlocking 2014-12-23 05:45:24 LOG: pid 669: Replication of node:0 is behind 7819552 bytes from the primary server (node:1) 2014-12-23 05:45:25 LOG: pid 24452: failover: set new primary node: 1 2014-12-23 05:45:25 LOG: pid 24452: failover: set new master node: 1 2014-12-23 05:45:25 LOG: pid 9460: connection received: host=15.224.241.72 port=56834 2014-12-23 05:45:25 LOG: pid 9462: connection received: host=15.224.241.72 port=56835 2014-12-23 05:45:25 LOG: pid 9461: connection received: host=15.224.241.72 port=56833 2014-12-23 05:45:25 LOG: pid 9463: connection received: host=15.224.241.72 port=56836 2014-12-23 05:45:25 LOG: pid 9465: connection received: host=15.224.241.72 port=56838 2014-12-23 05:45:25 LOG: pid 9464: connection received: host=15.224.241.72 port=56837 2014-12-23 05:45:25 LOG: pid 9466: connection received: host=15.224.241.72 port=56839 2014-12-23 05:45:25 LOG: pid 9467: connection received: host=15.224.241.72 port=56840 2014-12-23 05:45:25 LOG: pid 9468: connection received: host=15.224.241.72 port=56841 2014-12-23 05:45:25 LOG: pid 9470: connection received: host=15.224.241.72 port=56842 2014-12-23 05:45:25 LOG: pid 9469: connection received: host=15.224.241.72 port=56843 2014-12-23 05:45:25 ERROR: pid 9462: pool_read_kind: kind does not match between master(0) slot[0] (52) 2014-12-23 05:45:25 LOG: pid 9471: connection received: host=15.224.241.72 port=56845 2014-12-23 05:45:25 ERROR: pid 9460: pool_read_kind: kind does not match between master(0) slot[0] (52) 2014-12-23 05:45:25 LOG: pid 9472: connection received: host=15.224.241.72 port=56844 2014-12-23 05:45:25 LOG: pid 9460: connection received: host=15.224.241.72 port=56848 2014-12-23 05:45:25 ERROR: pid 9461: pool_read_kind: kind does not match between master(0) slot[0] (52) 2014-12-23 05:45:25 LOG: pid 9474: connection received: host=15.224.241.72 port=56849 2014-12-23 05:45:25 LOG: pid 9473: connection received: host=15.224.241.72 port=56847 2014-12-23 05:45:25 LOG: pid 9475: connection received: host=15.224.241.72 port=56850 2014-12-23 05:45:25 ERROR: pid 9463: pool_read_kind: kind does not match between master(0) slot[0] (52) 2014-12-23 05:45:25 ERROR: pid 9465: pool_read_kind: kind does not match between master(0) slot[0] (52) | ||||
| Tags | No tags attached. | ||||
|
|
I'm trying to reproduce this but I can't. Are you able to reproduce this easily? If so, could you please provide the detailed steps and pgpool.conf? |
|
|
|
|
|
Hi, I tried 2-3 times but I can't reproduce it, I guess it happens due to network problem so just shutting down one of the nodes doesn't do the work. I'm attaching my pgpool.conf Can you please explain what is the meaning of the error: ERROR: pid 9463: pool_read_kind: kind does not match between master(0) slot[0] (52) Maybe I can give you some more info if I will understand the problem. Thanks, Eldad |
|
|
ERROR: pid 9463: pool_read_kind: kind does not match between master(0) slot[0] (52) pool_read_kind function reads a byte of "packet kind" from all valid backend nodes and rises the error if there is a mismatch between these. This error says "received 0 from master node but received 52 from node #0". However, node #0 was detached from pgpool in this case and invalid, so this should be skipped by pool_read_kind. 2014-12-23 05:45:23 LOG: pid 8400: degenerate_backend_set: 0 fail over request from pid 8400 This is very odd.... |
|
|
I am experiencing the same problem. The problem seems to be that node 0 is not the master resulting in the error that primary node 0's kind does not match the uninitialized kind0 variable. |
|
|
I am facing same issue. Is this already solved? 2023-09-22 03:19:28: pid 1: LOG: fork a new child process with pid: 27269 2023-09-22 03:19:33: pid 25212: ERROR: unable to read message kind 2023-09-22 03:19:33: pid 25212: DETAIL: kind does not match between main(0) slot[0] (52) 2023-09-22 03:19:38: pid 25212: ERROR: unable to read message kind 2023-09-22 03:19:38: pid 25212: DETAIL: kind does not match between main(0) slot[0] (52) 2023-09-22 03:19:38: pid 1: LOG: shutting down 2023-09-22 03:19:38: pid 1: LOG: terminating all child processes 2023-09-22 03:19:38: pid 27266: LOG: failed to connect to PostgreSQL server on "metrics-pg-postgresql-0.metrics-pg-postgresql-headless.nsxstg-stgsrv-4xc.svc.cluster.local:5432" using INET socket 2023-09-22 03:19:38: pid 27266: DETAIL: exit request has been sent 2023-09-22 03:19:38: pid 27266: LOG: failed to create a backend 0 connection 2023-09-22 03:19:38: pid 27266: DETAIL: skip this backend because because failover_on_backend_error is off and we are in streaming replication mode and node is standby node 2023-09-22 03:19:38: pid 27266: LOG: failed to connect to PostgreSQL server on "metrics-pg-postgresql-1.metrics-pg-postgresql-headless.nsxstg-stgsrv-4xp8zqcc.svc.cluster.local:5432" using INET socket 2023-09-22 03:19:38: pid 27266: DETAIL: exit request has been sent 2023-09-22 03:19:38: pid 27266: FATAL: failed to create a backend 1 connection 2023-09-22 03:19:38: pid 27266: DETAIL: not executing failover because failover_on_backend_error is off |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2014-12-24 17:31 | eldad | New Issue | |
| 2015-01-08 11:07 | nagata | Assigned To | => nagata |
| 2015-01-08 11:07 | nagata | Status | new => assigned |
| 2015-01-08 17:32 | nagata | Note Added: 0000509 | |
| 2015-01-08 17:32 | nagata | Status | assigned => feedback |
| 2015-01-08 21:23 | eldad | File Added: pgpool.conf | |
| 2015-01-08 21:27 | eldad | Note Added: 0000511 | |
| 2015-01-08 21:27 | eldad | Status | feedback => assigned |
| 2015-01-09 21:33 | nagata | Note Added: 0000512 | |
| 2016-04-19 23:33 | trevor | Note Added: 0000787 | |
| 2023-09-25 22:00 | rohitgupta06 | Note Added: 0004431 | |
| 2023-09-25 22:00 | rohitgupta06 | File Added: image.png |