[pgpool-general: 3465] Fwd: failed master recovery leads to it becoming primary

Aistis Zen aistis+pgpool at gmail.com
Mon Feb 9 17:46:14 JST 2015


Dear all

I'm new to pgpool2 and apparently missing something fundamental. My setup
is master/slave streaming replication (native PostgreSQL 9.3) with pgpool2
3.3.2 (from the Ubuntu package) on both machines. backend0 and backend1
point to devdb1.lan and slave.devdb1.lan, both are set to allow failover,
and I have load balancing and the watchdog with a VIP enabled in pgpool.
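
For reference, the relevant parts of my pgpool.conf look roughly like
this (the VIP address below is a placeholder, not my real one):

    backend_hostname0 = 'devdb1.lan'
    backend_port0 = 5432
    backend_weight0 = 1
    backend_flag0 = 'ALLOW_TO_FAILOVER'
    backend_hostname1 = 'slave.devdb1.lan'
    backend_port1 = 5432
    backend_weight1 = 1
    backend_flag1 = 'ALLOW_TO_FAILOVER'
    load_balance_mode = on
    master_slave_mode = on
    master_slave_sub_mode = 'stream'
    use_watchdog = on
    delegate_IP = '192.0.2.100'   # placeholder VIP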

All I want is this:
1) a VIP (virtual IP) on the current master, to which I'll send all
requests
2) automatic failover to the slave if the master dies
3) the slave becoming my new master
4) the old master to be recovered manually, so that it becomes the new
slave.

So far I can successfully do #1 through #3, but I struggle when it comes
to fixing the old master so that it becomes a slave of the current master.
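
The promotion itself is driven by pgpool's failover_command. A minimal
sketch of the kind of hook I mean (script name, ssh user and PostgreSQL
paths are illustrative, not my exact setup):

    # in pgpool.conf:
    # failover_command = '/etc/pgpool2/failover.sh %d %P %H'

    #!/bin/bash
    # failover.sh: promote the surviving standby when the primary dies.
    # Arguments (from pgpool placeholders):
    #   $1 = %d, id of the failed node
    #   $2 = %P, id of the old primary node
    #   $3 = %H, hostname of the new master candidate
    FAILED_NODE=$1
    OLD_PRIMARY=$2
    NEW_MASTER_HOST=$3

    # Only promote if it was the primary that went down;
    # losing a standby needs no action here.
    if [ "$FAILED_NODE" = "$OLD_PRIMARY" ]; then
        ssh postgres@"$NEW_MASTER_HOST" \
            "/usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main promote"
    fi
    exit 0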

Suppose the situation where the master dies and the slave takes over:

 node_id |     hostname     | port | status | lb_weight |  role
---------+------------------+------+--------+-----------+---------
 0       | devdb1.lan       | 5432 | 3      | 0.500000  | standby
 1       | slave.devdb1.lan | 5432 | 2      | 0.500000  | primary
(2 rows)
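
(That table is the output of "show pool_nodes" run through pgpool itself;
as far as I understand, status 2 means up and connected, 3 means down.
Assuming pgpool listens on its default port 9999:)

    psql -h localhost -p 9999 -U postgres -c "show pool_nodes;"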

After failover, node_id=1 holds the VIP and has read/write access to the
underlying DB; all is good.

I bring up the old master and want to make it a slave:

root@slave:/# pcp_recovery_node -d 300 localhost 9898 postgres postgres 0
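
As far as I understand, pcp_recovery_node just runs the configured
recovery_1st_stage_command on the current primary. Mine is roughly the
following sketch (script name, replication user and paths are
illustrative; pgpool 3.3 passes three arguments):

    # in pgpool.conf:
    # recovery_1st_stage_command = 'basebackup.sh'

    #!/bin/bash
    # basebackup.sh: rebuild a dead node as a standby of the CURRENT primary.
    # pgpool invokes it with:
    #   $1 = path of the primary's database cluster
    #   $2 = hostname of the node being recovered
    #   $3 = path of the recovered node's database cluster
    PRIMARY_DATA=$1
    DEST_HOST=$2
    DEST_DATA=$3
    PRIMARY_HOST=$(hostname)

    # Wipe the old cluster and take a fresh base backup from the primary.
    ssh postgres@"$DEST_HOST" "rm -rf $DEST_DATA && \
        pg_basebackup -h $PRIMARY_HOST -U replicator -D $DEST_DATA -x"

    # The crucial bit: write a recovery.conf pointing at the CURRENT
    # primary, so the node comes back as a standby, not a second primary.
    ssh postgres@"$DEST_HOST" "cat > $DEST_DATA/recovery.conf" <<EOF
    standby_mode = 'on'
    primary_conninfo = 'host=$PRIMARY_HOST port=5432 user=replicator'
    EOF
    exit 0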

As soon as that's done, the status changes to the following:

 node_id |     hostname     | port | status | lb_weight |  role
---------+------------------+------+--------+-----------+---------
 0       | devdb1.lan       | 5432 | 2      | 0.500000  | primary
 1       | slave.devdb1.lan | 5432 | 2      | 0.500000  | standby

and all hell breaks loose, as pgpool does this:

Feb  9 10:29:11 slave pgpool[1765]: failover_handler called
Feb  9 10:29:11 slave pgpool[1765]: failover_handler: starting to select new master node
Feb  9 10:29:13 slave pgpool[1765]: failover: set new primary node: 0
Feb  9 10:29:13 slave pgpool[1765]: failover: set new master node: 0

without telling the actual master that it should become a slave, so now I
have two servers that both think they are the master.
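
An easy way to see the split brain is to ask each backend directly; if
pg_is_in_recovery() returns f on both, both are running as primaries:

    psql -h devdb1.lan -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"
    psql -h slave.devdb1.lan -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"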

I have searched the forums and found one similar question that went
unanswered, while another was answered back in 2010 with "the next version
of pgpool will be able to deal with the problem" (
http://lists.pgfoundry.org/pipermail/pgpool-general/2010-November/003034.html
).

What am I missing here?



Thanks,
Aistis

