[pgpool-general: 3773] documentation ambiguities

Igor Gueths igor.gueths at rackspace.com
Tue Jun 2 00:18:13 JST 2015


Hello,
For about a month now I have been working on deploying Pgpool in
front of a streaming replication cluster of PostgreSQL-9.3 instances,
and have found several items in the documentation that likely need
clarification. I am raising them here in the hope of getting them
fixed upstream.

Exactly which components of the heartbeat mechanism introduced in
Pgpool-3.3 require Pgpool to run as root in order to work properly? When
running as a non-root user (postgres in my case), there are
warnings/errors in the logs about failing to create the heartbeat
receive socket, SO_BINDTODEVICE requiring root privilege, and so on.
Even so, the heartbeat mechanism appears to work well enough to
recognize that the primary Pgpool instance has gone away and to tell the
secondary instance to claim the virtual IP.

One thing I did notice when running as postgres was that Watchdog
periodically got into a state where each of the two Pgpool nodes
believed the other was down, resulting in a very interesting failover
situation. This resembles the behavior described in another post about
a race condition that seems to occur when multiple Pgpool nodes are
started in quick succession. Unfortunately I was unable to reproduce
the split-brain behavior consistently, so I am not sure what is causing
it at present. However, running Pgpool (and therefore Watchdog) as root
and setting wd_heartbeat_deadtime = 120 or a similar value has
mitigated the issue.

Another item the documentation does not spell out is that configuration
parameter names appear to be case-sensitive; this is readily apparent
when setting delegate_ip rather than delegate_IP in pgpool.conf. With
the lowercase spelling (delegate_ip), at least under Pgpool-3.4.1, the
parameter appears to be ignored entirely, and no log message indicates
that pgpool.conf contained a parse error or an unrecognized entry.
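
Since Pgpool did not warn about this in my case, a small check run
against pgpool.conf before a restart might catch it. The following is
only a rough sketch in Python, not anything shipped with Pgpool: the
function name and the list of known spellings are my own assumptions
(the two names are taken from this message only), and it looks solely
at simple "name = value" lines.

```python
import re

# Assumption: spellings taken from this message only; extend the list from
# the sample pgpool.conf shipped with your Pgpool version.
KNOWN_PARAMS = ["delegate_IP", "wd_heartbeat_deadtime"]

# Matches the parameter name at the start of a "name = value" line.
_ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def find_case_mismatches(conf_text, known=KNOWN_PARAMS):
    """Return (line_number, found, expected) for every parameter name that
    differs from a known name only by letter case."""
    by_lower = {name.lower(): name for name in known}
    mismatches = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        match = _ASSIGNMENT.match(line)
        if match is None:
            continue  # blank line, comment, or a form this sketch does not parse
        found = match.group(1)
        expected = by_lower.get(found.lower())
        if expected is not None and found != expected:
            mismatches.append((lineno, found, expected))
    return mismatches
```

Feeding it the text of pgpool.conf (for example via
open(path).read(), with a path of your choosing) would report
delegate_ip together with the spelling it was probably meant to be.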

Last but not least, it is not entirely clear what Pgpool counts as a
backend error. The main client of our database had trouble interacting
with it because query errors from Postgres led Pgpool to conclude,
incorrectly, that a failover was required. Setting
failover_on_backend_error = off ended up resolving this one, although it
took several days to figure out that Pgpool's definition of a backend
error encompasses much more than just the death of the postmaster.

I would be happy to update relevant documentation items pertaining to 
the above, presuming that someone isn't already working on it. Thanks 
for reading!
-- 
Igor Gueths, Linux Systems Engineer Desk: +1.210.312.1952 Mobile: 
+1.210.997.9397

