Rob Reinhardt rreinhardt at eitccorp.com
Tue Apr 16 04:17:35 JST 2019

I performed a simple test today with everything up and in a normal state.
I shut down the primary server's db instance, and pgpool detected it
(that's fine). At the moment I don't have any automatic actions in place,
failover or otherwise, and I didn't take any manual ones.

I simply restarted my primary instance, and repmgr shows a nice green, healthy cluster.

At that point, pgpool was confused and incorrect. It still thought the
primary node was down, so I guess it doesn't check again. show pool_nodes
won't connect at this point either, and pgpool's log shows it continually
retrying to find a primary node.

Then, if I restart pgpool as the hint in pgpool's log suggests, it comes
back up and still can't see that there is a primary. show pool_nodes still
can't connect either.

So then I troubleshoot and take a look at /tmp/pgpool_status. It says:

On a lark, I shut down pgpool again, deleted that file, and then restarted
it.

This time it comes up and does check, and show pool_nodes can connect and
shows the correct cluster status.

I'm too new to pgpool, and too naive, to assume anything about what is or
isn't intended, so here are some stupid questions:

1) Shouldn't I expect pgpool to just handle this without requiring a
restart? Why isn't it re-checking and updating the status? It knows an
incident just occurred, so why wouldn't it keep rechecking the real status
and update itself automatically? Or is it supposed to, and this is just a
bug?
2) Even if I assume that manually restarting pgpool has to happen every
time my cluster status changes (for whatever reason, planned or otherwise),
why do I have to manually remove that file to get pgpool to see the real
state of things when it comes back up? Or am I not supposed to have to, and
this is just a bug?
3) If all of that is true and just the way it has to be, why doesn't the
systemd start script for pgpool.service that comes with it always remove
that file on startup, so that it CAN get a clean start? I'd have to assume
this file isn't meant to be managed manually or scripted into startup. Or
is that a service script bug?
4) And finally, if none of these are bugs and this is just the way it is
designed to work, do you foresee any problems with my adding code to the
service script to remove that file each time it starts up, in order to
force it to check the real cluster status automatically when it comes up?

