[pgpool-general: 6675] Re: Query

Muhammad Usama m.usama at gmail.com
Mon Aug 19 22:05:59 JST 2019


On Sat, Aug 17, 2019 at 12:28 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > Hi Pgpool Team,
> >
> >               *We are nearing a production release and running into the
> > below issues.*
> > Replies at the earliest would be highly helpful and greatly appreciated.
> > Please let us know how to get rid of the below issues.
> >
> > We have a 3-node pgpool + postgres cluster - M1, M2, M3. The pgpool.conf
> > is as attached.
> >
> > *Case I:*
> > M1 - Pgpool Master + Postgres Master
> > M2, M3 - Pgpool slave + Postgres slave
> >
> > - M1 goes out of network. It's marked as LOST in the pgpool cluster.
> > - M2 becomes postgres master.
> > - M3 becomes pgpool master.
> > - When M1 comes back to the network, pgpool is able to resolve the split
> > brain. However, it changes the postgres master back to M1, logging the
> > statement "LOG:  primary node was chenged after the sync from new master",
> > so since M2 was already postgres master (and its trigger file is not
> > touched) it is not able to sync to the new master.
> > *I somehow want to avoid this postgres master change.. please let us know
> > if there is a way to avoid it.*
>
> Sorry, but I don't know how to prevent this. Probably when the former
> watchdog master recovers from a network outage and there's already a
> PostgreSQL primary server, the watchdog master should not sync the
> state. What do you think, Usama?
>

Yes, that's true: there is no functionality in Pgpool-II to disable the
backend node status sync. In fact, it would be hazardous if we somehow
disabled the node status syncing.

But having said that, in the mentioned scenario, when M1 comes back and
joins the watchdog cluster, Pgpool-II should have kept M2 as the true
master while resolving the split-brain. The algorithm used to resolve the
true master considers quite a few parameters, and for the scenario you
explained, M2 should have kept the master node status while M1 should have
resigned after rejoining the cluster. Effectively, M1 should have been
syncing the status from M2 (keeping the proper primary node), not the
other way around.
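
Once M1 rejoins, you can confirm which node each Pgpool-II instance
considers the watchdog master and which backend is primary with something
like the following (the host name, the PCP port 9898, the Pgpool-II port
9999 and the pcpadmin user are placeholders for your environment):

    # watchdog view: which Pgpool-II node is the current master/coordinator
    pcp_watchdog_info -h M3 -p 9898 -U pcpadmin -v

    # backend view: which PostgreSQL node Pgpool-II reports as primary
    psql -h M3 -p 9999 -U postgres -c "SHOW pool_nodes;"
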
In any case, could you please share the Pgpool-II log files so that I can
have a look at what went wrong in this case?

Thanks
Best Regards
Muhammad Usama


> In the meantime you could manually stop the primary PostgreSQL on M1,
> then execute pcp_recovery_node on it to avoid the dual primary server
> situation.
>
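For reference, that manual workaround would look roughly like the
following, run once M1 is reachable again (the data directory path, node
id 0 for M1, and the PCP credentials are placeholders, and online recovery
via recovery_1st_stage_command is assumed to be configured):

    # on M1: stop the old primary so only one primary remains in the cluster
    pg_ctl stop -D /path/to/data -m fast

    # from the current Pgpool-II master: re-attach M1 as a standby via online recovery
    pcp_recovery_node -h M3 -p 9898 -U pcpadmin -n 0
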
> > *Case II:*
> >
> > M1 - Pgpool Master + Postgres Master
> > M2, M3 - Pgpool slave + Postgres slave
> >
> > - Shut down M1, M2
> > - M3 is rightly elected as the pgpool master.
> > - However, when the failover request kicks in, the watchdog rejects it
> > with the below log. *Is there a way to make M3 the postgres master, in
> > spite of the quorum?*
> > *Please let me know.*
> >
> >
> > 2019-08-16T11:15:04+00:00 lcm-34-182 pgpool[11002]: [92-1] 2019-08-16
> > 11:15:04: pid 11002: LOG: watchdog is processing the failover command
> > [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
> > 2019-08-16T11:15:04+00:00 lcm-34-182 pgpool[11002]: [93-1] 2019-08-16
> > 11:15:04: pid 11002: LOG: failover requires the quorum to hold, which is
> > not present at the moment
> > 2019-08-16T11:15:04+00:00 lcm-34-182 pgpool[11002]: [93-2] 2019-08-16
> > 11:15:04: pid 11002: DETAIL: Rejecting the failover request
> > 2019-08-16T11:15:04+00:00 lcm-34-182 pgpool[11002]: [94-1] 2019-08-16
> > 11:15:04: pid 11002: LOG: failover command [DEGENERATE_BACKEND_REQUEST]
> > request from pgpool-II node "lcm-34-182.dev.lcm.local:9999 Linux
> > lcm-34-182.dev.lcm.local" is rejected because the watchdog cluster does not
> > hold the quorum
> > 2019-08-16T11:15:04+00:00 lcm-34-182 pgpool[11049]: [23-1] 2019-08-16
> > 11:15:04: pid 11049: LOG: degenerate backend request for 1 node(s) from pid
> > [11049], is changed to quarantine node request by watchdog
>
> You could manually promote PostgreSQL on M3 by using the pg_ctl promote
> command, then execute pcp_promote_node to make it the primary server from
> Pgpool-II's point of view.
>
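As a rough sketch of those manual steps (the data directory path, node id
2 for M3, and the PCP credentials are placeholders for your setup):

    # on M3: promote the standby PostgreSQL to primary
    pg_ctl promote -D /path/to/data

    # then tell Pgpool-II that this backend is now the primary
    pcp_promote_node -h M3 -p 9898 -U pcpadmin -n 2
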
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>