[pgpool-general: 9090] Re: kind does not match between main(0) slot[0] (52)

Emond Papegaaij emond.papegaaij at gmail.com
Tue Apr 30 23:39:34 JST 2024


Op di 30 apr 2024 om 07:43 schreef Bo Peng <pengbo at sraoss.co.jp>:

> Hi,
>
> > We've noticed a failure in one of our test runs tonight which I can't
> > explain. During a reboot test of the nodes in the cluster, one of the
> > pgpool instances (the one with ip 172.29.30.2) starts returning the
> > following error:
> > pid 194: ERROR:  unable to read message kind
> > pid 194: DETAIL:  kind does not match between main(0) slot[0] (52)
>
> The error means the responses from the main node and node0 do not match.
>
> I checked the logs, and they show that node0 is down, but pcp_node_info
> shows "up".
>
> Could you share your pgpool.conf of all pgpool nodes and
> the test scenario?
>

This is our reboot test. It reboots all 3 nodes in the cluster in a
controlled way (a rough sketch of the wait-until-healthy and detach steps
follows the list):
* The test starts by restoring a fixed state in a cluster with 3 VMs.
Node 172.29.30.1 will be the watchdog leader and run the primary database.
The other nodes are healthy standbys.
* At timestamp 04:17:32: Node 172.29.30.2 is rebooted first.
* Wait until node 2 is fully up (this takes about 2 minutes 30 seconds after
it has booted).
* At timestamp 04:21:55: Node 172.29.30.3 is rebooted next.
* Wait until node 3 is fully up (this again takes about 2 minutes 30 seconds
after it has booted).
* At timestamp 04:25:54: Fail over all tasks from node 172.29.30.1 to
another node (node 2 is the most likely). This consists of first restarting
pgpool to force it to drop its leadership status. Once pgpool is back up and
in sync with the cluster, stop and detach the database to force a failover.
* At timestamp 04:26:17: Reboot node 1.
* Wait until all nodes report a fully healthy state.
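
In case it is useful for reproducing, the "wait until fully up/healthy" and
detach steps above are essentially a loop around the pcp tools. Below is a
rough, simplified sketch of what our test driver does; the host name, pcp
port/user and node count are placeholders, and the real harness drives the
reboots over SSH and takes its inventory from configuration management:

import subprocess
import time

PCP_HOST = "172.29.30.1"   # any reachable pgpool node (placeholder)
PCP_PORT = "9898"
PCP_USER = "pcpadmin"      # placeholder; authentication via ~/.pcppass (-w)
NODE_COUNT = 3

def backend_status(node_id):
    # Assumes the plain (non -v) one-line pcp_node_info output, where the
    # third field is the numeric backend status: 1/2 = up, 3 = down.
    out = subprocess.check_output(
        ["pcp_node_info", "-h", PCP_HOST, "-p", PCP_PORT,
         "-U", PCP_USER, "-w", "-n", str(node_id)],
        text=True)
    return int(out.split()[2])

def wait_until_all_up(timeout=600):
    # Poll every backend until all report up, or give up after `timeout` s.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(backend_status(n) in (1, 2) for n in range(NODE_COUNT)):
            return
        time.sleep(5)
    raise TimeoutError("cluster did not become healthy in time")

def detach_node(node_id):
    # Detach a backend to force a failover (the step at 04:25:54).
    subprocess.check_call(
        ["pcp_detach_node", "-h", PCP_HOST, "-p", PCP_PORT,
         "-U", PCP_USER, "-w", "-n", str(node_id)])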

As you can see in the logs, node 2 starts reporting 'kind does not match' at
the moment node 1 is in its reboot cycle. The first error is at 04:27:46,
which coincides exactly with the moment pgpool starts back up on node 1. The
logs from node 1 show pgpool starting, and the logs from node 2 show 'new
watchdog connection' just prior to the first 'kind does not match'.

I've attached an example pgpool.conf. It's not the exact file from this
test run, because the test does not export the configuration. All relevant
settings will be the same, but some names (such as
backend_application_nameX) will differ. The configuration is identical on
all nodes, because it is fully managed by configuration management.
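
For readers without the attachment: for a three-node setup like this, the
relevant part of pgpool.conf typically looks roughly like the excerpt below
(assuming 4.2+ style watchdog parameter names; host names and application
names are placeholders, not the values from this test):

# one backend block per database node (0, 1, 2)
backend_hostname0 = '172.29.30.1'
backend_port0 = 5432
backend_weight0 = 1
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'server0'   # placeholder

# one watchdog block per pgpool node (0, 1, 2)
use_watchdog = on
hostname0 = '172.29.30.1'
wd_port0 = 9000
pgpool_port0 = 9999
heartbeat_hostname0 = '172.29.30.1'
heartbeat_port0 = 9694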

Best regards,
Emond
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool.conf
Type: application/octet-stream
Size: 41292 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20240430/2d71ee76/attachment-0001.obj>

