[pgpool-general: 4114] Terminating Long Running Query Triggers Failover

Thu Oct 15 01:16:04 JST 2015

Hi everyone. We have some master/slave clusters running pgpool 3.2.4 in front of PostgreSQL 9.4 instances. Yesterday, a script I use to find long-running queries turned up some very long-running ones on our clusters that check pg_class for unlogged tables (relpersistence = 'u'). I noticed that the WHERE clause referenced pgpool_regclass and that one query had been active for 26 days. I also found some on other clusters that had been active for 14 days.

I terminated them, and pgpool then reported a failover on two of our clusters. The master databases never went down. On the pool servers for each cluster reporting the failover, I ran pcp_attach_node, after which pcp_node_info showed the master node back at a status of "2". Everything seemed fine afterward. An example of the query is below.

SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('mytable') AND c.relpersistence = 'u'|active|26 days
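
For anyone wanting to look for similar sessions, a query along the following lines against pg_stat_activity produces the same kind of information (query text, state, and how long it has been running). This is only a sketch, not the actual script used here; the one-hour threshold and the column selection are assumptions chosen for illustration.

```sql
-- Sketch only: list statements that have been active longer than a threshold.
-- The '1 hour' cutoff is an assumed value, not taken from the original script.
SELECT pid,
       usename,
       state,
       now() - query_start AS runtime,  -- how long the current statement has run
       query
FROM pg_catalog.pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '1 hour'
  AND pid <> pg_backend_pid()           -- skip the session running this check
ORDER BY runtime DESC;
```

Note that pg_cancel_backend(pid) cancels only the statement currently running in that session, while pg_terminate_backend(pid) closes the whole backend connection.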

A couple of questions:

  1.  Is the pgpool failover related to terminating long running queries like the one above?
  2.  Should this have triggered the failover?
  3.  What can I do to prevent pgpool from treating the instance as failed when it never actually went down? I don't want to set DISALLOW_TO_FAILOVER in the pool config, but I also don't want pgpool reacting to any sort of false positive.
  4.  How is the query in the example above related to pgpool? Is it a SELECT that pgpool itself issued and that got stuck in an active state for some reason?

Any help figuring this out would be greatly appreciated. I'm not new to Postgres, but I am fairly new to pgpool.

Thanks!
Daniel