[pgpool-general: 763] pgpool dropping backends too much

Karl von Randow karl+pgpool at cactuslab.com
Thu Jul 19 12:11:20 JST 2012


We are running pgpool with three backend servers (PostgreSQL 9.0,
streaming replication). Connections from clients to pgpool are non-SSL,
while connections from pgpool to the backends use SSL (my previous
email was in error; we do not appear to be using SSL between clients
and pgpool).
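
For context, the relevant pgpool.conf settings look roughly like this
(a sketch of our setup, not the exact file; as I read the docs, ssl = on
enables SSL support for both frontend and backend connections, and the
backends additionally need ssl = on in their postgresql.conf):

    # pgpool.conf (sketch)
    ssl = on    # enable SSL support in pgpool
    # ssl_key / ssl_cert are unset, so clients connect without SSL;
    # pgpool still negotiates SSL to the backends, which run ssl = on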

I have configured the primary server to disallow failover, and that
works: it never fails over. However, our slaves fail over once or twice
a day even though the slave has not in fact failed. I have to reattach
the node, and it then continues happily.
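
For reference, the no-failover flag and the reattach step look roughly
like this (a sketch; node ids, ports and credentials are placeholders,
using backend_flag and the pcp_attach_node positional syntax as I
understand them in pgpool-II 3.1):

    # pgpool.conf (sketch)
    backend_flag0 = 'DISALLOW_TO_FAILOVER'  # primary: never degenerate
    backend_flag1 = 'ALLOW_TO_FAILOVER'     # slave db2
    backend_flag2 = 'ALLOW_TO_FAILOVER'     # slave db3

    # reattach a dropped slave (node id 1) via PCP:
    # pcp_attach_node <timeout> <host> <pcp_port> <user> <passwd> <node_id>
    pcp_attach_node 10 localhost 9898 pgpool secret 1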

The syslog always contains this sequence, beginning with a discarded E
packet:

Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 31871: pool_process_query: discard E packet from backend 1
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 ERROR: pid 31871: pool_ssl: SSL_read: no SSL error reported
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 ERROR: pid 31871: pool_read: read failed (Success)
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 31871: degenerate_backend_set: 1 fail over request from pid 31871
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: starting degeneration. shutdown host db2(5432)
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: Restart all children
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: find_primary_node_repeatedly: waiting for finding a primary node
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: find_primary_node: primary node id is 0
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: failover: set new primary node: 0
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: failover: set new master node: 0
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 923: worker process received restart request
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 30346: failover done. shutdown host db2(5432)
Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 30346: worker child 923 exits with status 256
Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 924: pcp child process received restart request
Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 30346: fork a new worker child pid 9434
Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 30346: PCP child 924 exits with status 256
Jul 17 00:11:04 app2 pgpool: 2012-07-17 00:11:04 LOG:   pid 30346: fork a new PCP child pid 9435

Sometimes the preceding syslog entry is a LOG notice about a statement
that failed, e.g.:
Jul 17 00:11:03 app2 pgpool: 2012-07-17 00:11:03 LOG:   pid 26664: pool_send_and_wait: Error or notice message from backend: : DB node id: 1 backend pid: 15682 statement: <SNIP> message: canceling statement due to conflict with recovery
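
The cancellation itself is a PostgreSQL hot-standby matter rather than
a pgpool one; the standby-side knobs look like this (a sketch with
illustrative values, not our actual settings):

    # postgresql.conf on the slaves (sketch)
    max_standby_streaming_delay = 30s  # how long recovery waits before
                                       # canceling conflicting queries
    # hot_standby_feedback = on        # 9.1+ only; not available on 9.0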

I don't want to mark our slaves as no-failover, but it seems that
pgpool is either hitting an internal fault and interpreting it as a
backend failure, or is simply a bit too sensitive. I'm happy to test
patches!
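
One workaround I am considering (assuming pgpool-II 3.1 or later; I
have not verified that this is right for our case) is to stop pgpool
degenerating a backend on a mere read/write error and to rely on
health checks instead:

    # pgpool.conf (sketch)
    fail_over_on_backend_error = off  # no failover on a backend
                                      # communication error
    health_check_period = 10          # seconds between health checks
    health_check_timeout = 20         # seconds before a check gives up
    health_check_max_retries = 3      # retries before declaring the
                                      # node dead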

Best regards,
Karl

