[Pgpool-general] Can a failed master rejoin as a slave?

Matt Solnit msolnit at soasta.com
Fri Jun 17 19:14:11 UTC 2011


It's possible that I did not have pgpool_walrecrunning() installed in every database.  I am trying again now to make certain my setup is correct.
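
If it helps, the function definition shipped with the pgpool-II source
(pgpool-walrecrunning.sql) looks roughly like this, and it has to be
loaded into *every* database that pgpool connects to ($libdir may differ
per install):

  -- sketch based on the pgpool-walrecrunning.sql from the source tree
  CREATE OR REPLACE FUNCTION pgpool_walrecrunning()
  RETURNS bool
  AS '$libdir/pgpool-walrecrunning'
  LANGUAGE C STRICT;

  -- sanity check: expected true on a standby, false on the primary
  SELECT pgpool_walrecrunning();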

-- Matt

On Jun 17, 2011, at 12:10 PM, <Daniel.Crespo at l-3com.com> wrote:

According to Matt, he is using pgpool-II 3.0.4 built from source. I have
not tried either.

From: pgpool-general-bounces at pgfoundry.org
[mailto:pgpool-general-bounces at pgfoundry.org] On Behalf Of Anton Koldaev
Sent: Friday, June 17, 2011 2:39 PM
To: Matt Solnit
Cc: pgpool-general at pgfoundry.org
Subject: Re: [Pgpool-general] Can a failed master rejoin as a slave?



Hmm... it seems to me your problem was resolved in 3.0.3:

3.0.3 (umiyameboshi) 2011/02/23

 * Version 3.0.3

 This version fixes various bugs since 3.0.1. Please note that
 3.0.2 was canceled due to a packaging problem.

 - Fix online recovery problem in the streaming replication
       mode (Tatsuo). Consider the following scenario: suppose node 0
       is the initial primary server and node 1 is the initial
       standby server.

1) Node 0 goes down and node 1 is promoted to the new primary.
2) Recover node 0 as new standby.
3) pgpool-II assumes that node 0 is the new primary.

        This problem happens because pgpool-II unconditionally
        regarded the youngest node as the primary. pgpool-II 3.0.3
        now checks each node using pgpool_walrecrunning() to see
        whether it is actually the primary, so it can avoid the
        problem and correctly regard the node as a standby. You can
        also use the new variable "%P" in the recovery script. If
        you do not install the function, the above problem is not
        resolved.
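
For what it's worth, a sketch of wiring the new %P escape into
pgpool.conf (the script path is a placeholder; %d is the failed node ID
and %H the new master host, per the standard failover_command escapes):

  # placeholder script; %P expands to the old primary node ID
  failover_command = '/usr/local/etc/failover.sh %d %P %H'

Comparing %d against %P lets the script tell a failed primary apart from
a failed standby.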



On Fri, Jun 17, 2011 at 8:02 PM, Matt Solnit <msolnit at soasta.com> wrote:
On Jun 17, 2011, at 8:17 AM, <Daniel.Crespo at l-3com.com> wrote:

Hi, Matt
pgpool-II immediately attempts to use it as a master again. This
doesn't work, obviously, because it's no longer a master.
I don't understand why it doesn't work.
AFAIK the node with the youngest ID (backendX in pgpool.conf) and status
2 (psql -c 'show pool_nodes;') will always become the primary node.

Check this out:
The backend which was given the DB node ID of 0 will be called
"Master DB". When multiple backends are defined, the service can be
continued even if the Master DB is down (not true in some modes). In
this case, the youngest DB node ID alive will be the new Master DB.
http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html
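
For example (hypothetical output; the exact columns vary by pgpool
version, and 9999 is just the default pgpool port):

  $ psql -p 9999 -c 'show pool_nodes;'
   hostname | port | status | lb_weight
  ----------+------+--------+-----------
   db0      | 5432 | 2      | 0.5
   db1      | 5432 | 2      | 0.5

Status 2 means the node is up and attached.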

The problem Matt points out arises precisely when the primary DB *is
re-attached*. After re-attaching the primary DB (node ID 0), it's "back
online", and therefore pgpool treats it as the master again, according
to your cited explanation. So I agree with Matt: the just-re-attached
node 0 should be a slave from now on, since it was technically attached
AFTER the new master (node 1 at this point) was selected.
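
(By "re-attached" I mean via pcp_attach_node, along these lines, where
host, port, and credentials are placeholders:

  # timeout, pgpool host, PCP port, PCP user, PCP password, DB node ID
  pcp_attach_node 10 pgpool-host 9898 pcp_user pcp_passwd 0

which brings DB node 0 back into the pool.)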

-Daniel
Exactly. With streaming replication, only the "true" master can accept
DML statements (insert/update/delete), so if pgpool-II attempts to send
them to the wrong node, you get a "cannot execute XYZ in a read-only
transaction" error.

This thread seems to cover the same question, but I couldn't really tell
what the resolution was:
http://lists.pgfoundry.org/pipermail/pgpool-general/2011-April/003568.html

-- Matt
_______________________________________________
Pgpool-general mailing list
Pgpool-general at pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general



--
Best regards,
Koldaev Anton

