[pgpool-hackers: 3307] Re: duplicate failover request over allow_multiple_failover_requests_from_node=off

Tue Apr 16 18:17:25 JST 2019

On Tue, Apr 16, 2019 at 2:03 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> >> Thanks. However this will change existing behavior. Probably we should
> >> make the change against master branch only?
> >>
> >
> > Probably yes, because the fix I have in mind for this involves a
> > configurable timeout parameter to make the master pgpool resign.
> > Let me come up with the patch, and then we can work on the part of it
> > that needs to be back-ported.
> > And regarding the patch I shared upthread to continue the health check on
> > quarantined nodes, do you think we should also back-patch it to older
> > versions as well?
>
> Not sure we should back-port both of the two patches, since they will
> change existing behaviors (and one of those behaviors is even documented).
>
> What do you think?
>

Totally agreed. So I will go ahead and make it for the master branch only.
Many thanks for the valuable input.

Best regards
Muhammad Usama

> > Thanks
> > Best Regards
> > Muhammad Usama
> >
> >
> >>
> >> > Thanks
> >> > Best Regards
> >> > Muhammad Usama
> >> >
> >> >
> >> >> > Thanks
> >> >> > Best Regards
> >> >> > Muhammad Usama
> >> >> >
> >> >> >
> >> >> >> >> > Can you please try out the attached patch to see if the solution
> >> >> >> >> > works for the situation?
> >> >> >> >> > The patch is generated against the current master branch.
> >> >> >> >> >
> >> >> >> >> > Thanks
> >> >> >> >> > Best Regards
> >> >> >> >> > Muhammad Usama
> >> >> >> >> >
> >> >> >> >> > On Wed, Apr 10, 2019 at 2:04 PM TAKATSUKA Haruka <harukat at sraoss.co.jp> wrote:
> >> >> >> >> >
> >> >> >> >> >> Hello, Pgpool developers
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> I found that the Pgpool-II watchdog is too strict about duplicate
> >> >> >> >> >> failover requests with the
> >> >> >> >> >> allow_multiple_failover_requests_from_node=off setting.
> >> >> >> >> >>
> >> >> >> >> >> For example, consider a watchdog cluster with 3 pgpool instances.
> >> >> >> >> >> Their backends are PostgreSQL servers using streaming replication.
> >> >> >> >> >>
> >> >> >> >> >> When the communication between the master/coordinator pgpool and
> >> >> >> >> >> the primary PostgreSQL node is down for a short period
> >> >> >> >> >> (or pgpool makes a false-positive judgement for any reason),
> >> >> >> >> >> the pgpool tries to fail over but cannot get consensus,
> >> >> >> >> >> so it puts the primary node into quarantine status. The quarantine
> >> >> >> >> >> cannot be reset automatically. As a result, the service becomes
> >> >> >> >> >> unavailable.
> >> >> >> >> >>
> >> >> >> >> >> This case generates logs like the following:
> >> >> >> >> >>
> >> >> >> >> >> pid 1234: LOG:  new IPC connection received
> >> >> >> >> >> pid 1234: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
> >> >> >> >> >> pid 1234: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
> >> >> >> >> >> pid 1234: LOG:  Duplicate failover request from "pg1:5432 Linux pg1" node
> >> >> >> >> >> pid 1234: DETAIL:  request ignored
> >> >> >> >> >> pid 1234: LOG:  failover requires the majority vote, waiting for consensus
> >> >> >> >> >> pid 1234: DETAIL:  failover request noted
> >> >> >> >> >> pid 4321: LOG:  degenerate backend request for 1 node(s) from pid [4321], is changed to quarantine node request by watchdog
> >> >> >> >> >> pid 4321: DETAIL:  watchdog is taking time to build consensus
> >> >> >> >> >>
> >> >> >> >> >> Note that this case doesn't involve any communication trouble
> >> >> >> >> >> among the Pgpool watchdog nodes.
> >> >> >> >> >> You can reproduce it by changing one PostgreSQL's pg_hba.conf to
> >> >> >> >> >> reject the health check access from one pgpool node for a short
> >> >> >> >> >> period.
> >> >> >> >> >>
> >> >> >> >> >> The documentation doesn't say that duplicate failover requests
> >> >> >> >> >> put the node into quarantine immediately. I think it should just
> >> >> >> >> >> ignore the request.
> >> >> >> >> >>
> >> >> >> >> >> A patch file for the head of V3_7_STABLE is attached.
> >> >> >> >> >> Pgpool with this patch still prevents a failover driven by a
> >> >> >> >> >> single pgpool's repeated failover requests, but it can recover
> >> >> >> >> >> once the connection trouble is gone.
> >> >> >> >> >>
> >> >> >> >> >> Does this change have any problem?
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> with best regards,
> >> >> >> >> >> TAKATSUKA Haruka <harukat at sraoss.co.jp>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >>
> >>
>