[pgpool-hackers: 3975] Re: [pgpool-general: 7543] VIP with one node

Muhammad Usama m.usama at gmail.com
Thu Jul 15 15:09:33 JST 2021


On Thu, Jul 15, 2021 at 10:42 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> I am trying to understand your proposal. Please correct me if I am
> wrong.  It seems the proposal just gives up the concept of quorum. For
> example, we start with 3-node cluster A, B and C.  Due to a network
> problem, C is separated with A and B. A and B can still
> communicate. After wd_lost_node_to_remove_timeout passed, A, B become
> a 2-node cluster with quorum. C becomes a 1-node cluster with
> quorum. So a split brain occurs.
>

Hi Ishii_San

Your understanding of the proposal is correct. Basically, IMHO, whatever we
do to try to remedy the original issue, there will always be a chance of
split-brain.

The reason I am proposing this solution is that with this design the
behaviour would be configurable. For example, if the user sets
wd_lost_node_to_remove_timeout = 0, the lost-node removal function is
disabled and the watchdog behaves exactly as it does currently.
Normally I would expect wd_lost_node_to_remove_timeout to be set in the
range of 5 to 10 minutes, because a blackout of more than 5 to 10 minutes
would mean there is some serious problem in the network if a node is unable
to communicate for such a long period of time, and we need to resume the
service even if that comes with the risk of a split-brain.
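
To make the intended behaviour a bit more concrete, here is a minimal
sketch (plain Python for illustration only, not pgpool-II code; the node
states, the class and the function names are hypothetical, and only
wd_lost_node_to_remove_timeout comes from this proposal) of how the quorum
calculation could skip nodes once the timeout has expired:

# Illustrative sketch only, not pgpool-II source. Node states and names
# are hypothetical; wd_lost_node_to_remove_timeout is the parameter
# proposed in this thread.
import time
from dataclasses import dataclass

ALIVE, LOST, DEAD = "alive", "lost", "dead"

@dataclass
class WatchdogNode:
    name: str
    state: str = ALIVE
    lost_since: float = 0.0   # time.time() recorded when the node became LOST

def cluster_members(nodes, wd_lost_node_to_remove_timeout, now=None):
    """Return the nodes that still count towards the quorum.

    A properly shut down (DEAD) node never counts.  A LOST node keeps
    counting until the timeout expires; a timeout of 0 disables removal,
    so the cluster behaves exactly as the watchdog does today.
    """
    now = time.time() if now is None else now
    members = []
    for n in nodes:
        if n.state == DEAD:
            continue                      # removed immediately
        if (n.state == LOST and wd_lost_node_to_remove_timeout > 0
                and now - n.lost_since >= wd_lost_node_to_remove_timeout):
            continue                      # removed once the grace period is over
        members.append(n)
    return members

def has_quorum(nodes, wd_lost_node_to_remove_timeout):
    members = cluster_members(nodes, wd_lost_node_to_remove_timeout)
    alive = sum(1 for n in members if n.state == ALIVE)
    return alive > len(members) // 2      # strict majority of the *current* cluster

Applied to your example above: once the timeout has passed, A and B see a
2-node cluster holding the quorum while C sees a 1-node cluster holding the
quorum, which is exactly the split-brain risk we would be accepting. With
the timeout set to 0 a lost node is never removed, so nothing changes
compared to today.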

The second part of the proposal deals with nodes that are properly shut
down. In that case, the proposal is to stop counting those nodes towards
the quorum calculation, since we already know that they are not alive
anymore. But again, this also has associated risks in case a previously
shut-down node gets started again but is unable to communicate with the
existing cluster.
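
Continuing the sketch above (same hypothetical names), the difference
between a properly shut down node and a merely lost node would look like
this:

# A cleanly shut down node stops counting at once; a LOST node only
# stops counting after wd_lost_node_to_remove_timeout (here 300 s).
nodes = [WatchdogNode("A"), WatchdogNode("B"), WatchdogNode("C")]

nodes[2].state = DEAD                # C was shut down properly
print(has_quorum(nodes, 300))        # True: 2 of the 2 remaining nodes are alive

nodes[1].state = LOST                # B becomes unreachable
nodes[1].lost_since = time.time()
print(has_quorum(nodes, 300))        # False now; True once the 300 s have passed
                                     # (A then forms a 1-node cluster with quorum)

This is also where the risk mentioned above shows up: if C is later started
again but cannot reach A, each side can end up holding the quorum on its
own.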

Best regards
Muhammad Usama


>
> Am I missing something?
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Hi,
> >
> > I have been thinking about this issue, and I believe the concerns are
> > genuine and we need to figure out a way around them.
> >
> > IMHO one possible solution is to change how the watchdog does the quorum
> > calculation and which nodes make up the watchdog cluster.
> >
> > The current implementation calculates the quorum based on the number of
> > configured watchdog nodes and the number of alive nodes. If we make the
> > watchdog cluster adjust itself dynamically based on the current
> > situation, then we can have a better user experience.
> >
> > As of now the watchdog cluster definition recognises a node as either
> > alive or absent, and the number of alive nodes needs to be more than
> > half the total number of configured nodes for the quorum to hold.
> >
> > So my suggestion is that instead of using a binary status, we consider
> > that a watchdog node can be in one of three states, 'Alive', 'Dead' or
> > 'Lost', and that all dead nodes should be considered as not part of the
> > current cluster.
> >
> > Consider the example where we have 5 configured watchdog nodes.
> > With the current implementation the quorum will require 3 alive nodes.
> >
> > Now suppose we have started only 3 nodes. That would be good enough to
> > make the cluster hold the quorum, and one of the nodes will eventually
> > acquire the VIP, so no problems there. But as soon as we shut down one
> > of the nodes, or it becomes 'Lost', the cluster will lose the quorum and
> > release the VIP, making the service unavailable.
> >
> > Consider the same scenario with the above-mentioned new definition of the
> > watchdog cluster. When we initially start 3 nodes out of 5, the cluster
> > marks the remaining two nodes (after a configurable time) as dead and
> > removes them from the cluster until one of those nodes is started and
> > connects with the cluster. So after that configured time, even though we
> > have 5 configured watchdog nodes, our cluster dynamically adjusts itself
> > and considers the cluster to have only 3 nodes (instead of 5), which
> > requires only 2 nodes to be alive.
> >
> > By this new definition, if one of the nodes gets lost, the cluster will
> > still hold the quorum, since it considers itself to consist of 3 nodes.
> > That lost node will in turn be marked as dead after a configured amount
> > of time, eventually shrinking the cluster size further to 2 nodes.
> > Similarly, when some previously dead node joins the cluster, the cluster
> > will expand itself again to accommodate that node.
> >
> > On top of that, if some watchdog node is properly shut down, it would be
> > immediately marked as dead and removed from the cluster.
> >
> > Of course, this is not bullet-proof and comes with the risk of a
> > split-brain in a few network-partitioning scenarios, but I think it would
> > work in 99% of cases.
> >
> > This new implementation would require two new (proposed) configuration
> > parameters:
> > 1- wd_lost_node_to_remove_timeout (seconds)
> > 2- wd_initial_node_showup_time (seconds)
> >
> > Also, we could implement a new PCP command to force a lost node to be
> > marked as dead.
> >
> > Thoughts and suggestions?
> >
> > Thanks
> > Best regards
> > Muhammad Usama
> >
> > On Tue, May 11, 2021 at 7:18 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> Hi Pgpool-II developers,
> >>
> >> Recently we got the complaint below from a user.
> >>
> >> Currently Pgpool-II releases the VIP if the quorum is lost.  This is
> >> reasonable and safe so that we can prevent split-brain problems.
> >>
> >> However, I feel it would be nice if there were a way to allow holding
> >> the VIP even if the quorum is lost, for emergency's sake.
> >>
> >> Suppose we have a 3-node pgpool setup, each node in a different city.
> >> Two of those cities are knocked out by an earthquake, and the user
> >> wants to keep their business running on the remaining 1 node. Of course
> >> we could disable the watchdog and restart pgpool so that applications
> >> can directly connect to pgpool. However, in this case applications need
> >> to change the IP they connect to.
> >>
> >> Also, as the user pointed out, with a 2-node configuration the VIP can
> >> be used by enabling enable_consensus_with_half_vote even if only 1 node
> >> remains. It seems as if a 2-node config is better than a 3-node config
> >> in this regard. Of course this is not true, since a 3-node config is
> >> much more resistant to split-brain problems.
> >>
> >> I think there are multiple ways to deal with the problem:
> >>
> >> 1) invent a new config parameter so that pgpool keeps VIP even if the
> >> quorum is lost.
> >>
> >> 2) add a new pcp command which re-attaches the VIP after VIP is lost
> >> due to loss of the quorum.
> >>
> >> #1 could easily create duplicate VIPs. #2 looks better, but when other
> >> nodes come up, it is still possible that duplicate VIPs are created.
> >>
> >> Thoughts?
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >> > Dear all,
> >> >
> >> > I have a fairly common 3-node cluster, with each node running a PgPool
> >> > and a PostgreSQL instance.
> >> >
> >> > I have set up priorities so that:
> >> >   - when all 3 nodes are up, the 1st node is gonna have the VIP,
> >> >   - when the 1st node is down, the 2nd node is gonna have the VIP, and
> >> >   - when both the 1st and the 2nd nodes are down, then the 3rd node
> >> > should get the VIP.
> >> >
> >> > My problem is that when only 1 node is up, the VIP is not brought up,
> >> > because there is no quorum.
> >> > How can I get PgPool to bring up the VIP on the only remaining node,
> >> > which still could and should serve requests?
> >> >
> >> > Regards,
> >> >
> >> > tamas
> >> >
> >> > --
> >> > Rébeli-Szabó Tamás
> >> >
> >> > _______________________________________________
> >> > pgpool-general mailing list
> >> > pgpool-general at pgpool.net
> >> > http://www.pgpool.net/mailman/listinfo/pgpool-general
> >> _______________________________________________
> >> pgpool-hackers mailing list
> >> pgpool-hackers at pgpool.net
> >> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> >>
>