[pgpool-general: 5010] Re: Failover command not invoked by secondary pgpool after its promotion to primary in HA

Thu Sep 22 01:10:11 JST 2016

Hi Gabriele,

I had the same issue when search_primary_node_timeout was set to a bad value. With a value of 300, it works fine for me.
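
For reference, a minimal sketch of the corresponding pgpool.conf line, assuming streaming replication mode. The value 300 is simply what worked in this case, not a general recommendation.

```
# pgpool.conf (sketch; adjust to your environment)
# Maximum time in seconds to search for a primary node after a failover.
# Per the pgpool-II documentation, 0 means keep searching forever.
search_primary_node_timeout = 300
```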

I hope this helps.

Regards,

Christophe

From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On behalf of Gabriele Monfardini
Sent: Wednesday, 21 September 2016 16:12
To: pgpool-general at pgpool.net
Subject: [pgpool-general: 5006] Failover command not invoked by secondary pgpool after its promotion to primary in HA

Hi all,

I have the following setup: two identical virtual machines running CentOS 7, each with pgpool 3.5.4 and PostgreSQL 9.5.4.

The first one, let's call it A, runs the master PostgreSQL and the primary pgpool.
The second one, let's call it B, runs the slave PostgreSQL (in hot standby) and the secondary pgpool.
I think this setup is pretty common.

Both pgpool instances are configured with a delegate IP, watchdog, health checks on both backends, and a failover command.
The setup seems to work correctly at first.
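
A configuration of this kind typically involves pgpool.conf settings along the following lines. This is only a sketch: the IP address, script path, user name, and numeric values are illustrative placeholders, not the values used in the setup described here.

```
# pgpool.conf (illustrative values only)
use_watchdog = on
delegate_IP = '192.0.2.10'            # hypothetical virtual IP

health_check_period = 10              # seconds between health checks
health_check_timeout = 20             # seconds before a check is abandoned
health_check_user = 'pgpool'          # hypothetical database user
health_check_max_retries = 3          # retries before the backend is declared down
health_check_retry_delay = 5          # seconds between retries

# %d = failed node id, %h = failed node host name, %H = new master host name
failover_command = '/etc/pgpool-II/failover.sh %d %h %H'   # hypothetical script path
```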

The problem appears when we test failover by bringing down the network on VM A.
This should cause the following:

  *   pgpool on B should react by becoming the primary pgpool and adding the delegate IP to its Ethernet device
  *   pgpool on B, after the configured number of failed attempts to connect to the master database on A, should promote PostgreSQL on B using the configured failover_command.

The first point succeeded; the second did not.
After the attempts to connect to PostgreSQL on A, a message appears stating that the backend on A has been declared down, but no failover command is issued and SHOW pool_nodes still shows PostgreSQL on A as online.
Moreover, the attempts to connect to PostgreSQL on A continue forever.

Note that restarting pgpool on B fixes all of these problems.
It declares itself primary since it cannot reach its counterpart on A, brings up the delegate IP and, after the configured number of attempts, promotes PostgreSQL on B as the new master.

So this definitely seems to be a bug, and indeed a ticket has already been opened for it:
http://www.pgpool.net/mantisbt/view.php?id=227&history=1

My question is the following: is our setup, in particular having two nodes with one pgpool and one PostgreSQL each, a good one, or is it advisable to spread the services over more nodes or arrange them differently?

IMHO the probability that a node goes down entirely or becomes unreachable is higher than the probability that the PostgreSQL service alone fails.
Thus, due to this bug, we risk having no high availability at all and being left with a read-only database.
Obviously our client is not very happy with that, since we have a more complex setup with little advantage.

Is there any news on when the bug will be fixed?

Is there any workaround that I can use to mitigate the impact?
For instance, I could set up the pgpools so that the primary one runs on B.
Avoiding the pgpool promotion would probably allow the PostgreSQL failover to be triggered.

What is your opinion?

Best regards,

Gabriele Monfardini

-----
Gabriele Monfardini
LdP Progetti GIS
tel: 0577.531049
email: monfardini at ldpgis.it