[pgpool-hackers: 2542] Re: New Feature with patch: Quorum and Consensus for backend failover

Tatsuo Ishii ishii at sraoss.co.jp
Tue Sep 19 11:18:35 JST 2017


>> Hi Ishii-San
>> 
>> On Mon, Sep 11, 2017 at 11:01 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> 
>>> Usama,
>>>
>>> I have modified watchdog regression script to test out your quorum
>>> aware failover patch. Here are differences from the existing script.
>>>
>>> - Install streaming replication primary and standby DB nodes (previously
>>>   raw mode with only 1 node). This is necessary to test an ordinary
>>>   failover scenario. The current script creates only one DB node, so if
>>>   we take the node down, the cluster goes into "all db nodes down"
>>>   status. I already reported a possible problem with the status up-thread
>>>   and am waiting for an answer from Usama.
>>>
>>> - Add one more Pgpool-II node, "standby2". For this purpose, a new
>>>   configuration file "standby2.conf" is added.
>>>
>>> - Add new test scenario: "fake" failover. By using the infrastructure
>>>   I have created to simulate the communication path between standby2
>>>   and DB node 1. The test checks if such a error raised a failover
>>>   request from standby2, but it is safely ignored.
>>>
>>> - Add new test scenario: "real" failover. Shutting down DB node 1,
>>>   which should raise a failover request.
>>>
>>> - Modify test.sh to match the changes above.
>>>
>> 
>> 
>> Thanks for testing and for the test script patch. I think the changes and
>> test scenario are spot on, but I think we should add a new test case with
>> these modifications and keep the 004.watchdog test case in its current
>> shape as well. This will help us identify watchdog issues more swiftly.
> 
> Agreed. Once your quorum patch is committed, I will add the test as
> 011.watchdog_quorum_failover (if you have a better idea for the test
> name, please let me know).
>
>>> Since the modification requires your quorum patch to be committed, I
>>> haven't pushed the change yet.

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

>> I was worried about the scenario mentioned above, but have now identified
>> it as an existing issue, so I will commit the patch tomorrow.
> 
> Ok.
> 
>> Thanks
>> Best regards
>> Muhammad Usama
>> 
>> 
>>> Best regards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese:http://www.sraoss.co.jp
>>>
>>> > I have tested the patch a little bit using the 004.watchdog regression
>>> > test.  After the test ended, I manually started the master and standby
>>> > Pgpool-II.
>>> >
>>> > 1) Stop the master PostgreSQL. Since only one PostgreSQL node is
>>> > configured, I expected:
>>> >
>>> > psql: ERROR:  pgpool is not accepting any new connections
>>> > DETAIL:  all backend nodes are down, pgpool requires at least one valid node
>>> > HINT:  repair the backend nodes and restart pgpool
>>> >
>>> > but master Pgpool-II replies:
>>> >
>>> > psql: FATAL:  failed to create a backend connection
>>> > DETAIL:  executing failover on backend
>>> >
>>> > Is this normal?
>>> >
>>> > 2) I shut down the master node to see if the standby escalates.
>>> >
>>> > After shutting down the master, I see this using pcp_watchdog_info:
>>> >
>>> > pcp_watchdog_info -p 11105
>>> > localhost:11100 Linux tishii-CF-SX3HE4BP localhost 11100 21104 4 MASTER
>>> > localhost:11000 Linux tishii-CF-SX3HE4BP localhost 11000 21004 10 SHUTDOWN
>>> >
>>> > Seems ok but I want to confirm.
>>> >
>>> > master and standby pgpool logs attached.
>>> >
>>> > Best regards,
>>> > --
>>> > Tatsuo Ishii
>>> > SRA OSS, Inc. Japan
>>> > English: http://www.sraoss.co.jp/index_en.php
>>> > Japanese:http://www.sraoss.co.jp
>>> >
>>> >> On Fri, Aug 25, 2017 at 5:05 PM, Tatsuo Ishii <ishii at sraoss.co.jp>
>>> wrote:
>>> >>
>>> >>> > On Fri, Aug 25, 2017 at 12:53 PM, Tatsuo Ishii <ishii at sraoss.co.jp>
>>> >>> wrote:
>>> >>> >
>>> >>> >> Usama,
>>> >>> >>
>>> >>> >> With the new patch, the regression tests all passed.
>>> >>> >>
>>> >>> >
>>> >>> > Glad to hear that :-)
>>> >>> > Did you have a chance to look at the node quarantine state I added?
>>> >>> > What are your thoughts on that?
>>> >>>
>>> >>> I'm going to look into the patch this weekend.
>>> >>>
>>> >>
>>> >> Many thanks
>>> >>
>>> >> Best Regards
>>> >> Muhammad Usama
>>> >>
>>> >>>
>>> >>> Best regards,
>>> >>> --
>>> >>> Tatsuo Ishii
>>> >>> SRA OSS, Inc. Japan
>>> >>> English: http://www.sraoss.co.jp/index_en.php
>>> >>> Japanese:http://www.sraoss.co.jp
>>> >>>
>>> >>> >> > Hi Ishii-San
>>> >>> >> >
>>> >>> >> > Please find the updated patch. It fixes the regression issue you
>>> >>> >> > were facing and also another bug which I encountered during my
>>> >>> >> > testing.
>>> >>> >> >
>>> >>> >> > -- Adding Yugo to the thread,
>>> >>> >> > Hi Yugo,
>>> >>> >> >
>>> >>> >> > Since you are an expert on the watchdog feature, I thought you
>>> >>> >> > might have something to say, especially regarding the discussion
>>> >>> >> > points mentioned in the initial mail.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > Thanks
>>> >>> >> > Best Regards
>>> >>> >> > Muhammad Usama
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > On Thu, Aug 24, 2017 at 11:25 AM, Muhammad Usama <
>>> m.usama at gmail.com>
>>> >>> >> wrote:
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> On Thu, Aug 24, 2017 at 4:34 AM, Tatsuo Ishii <
>>> ishii at sraoss.co.jp>
>>> >>> >> wrote:
>>> >>> >> >>
>>> >>> >> >>> After applying the patch, many of the regression tests fail. It
>>> >>> >> >>> seems pgpool.conf.sample has a bogus comment which causes the
>>> >>> >> >>> pgpool.conf parser to complain about a parse error.
>>> >>> >> >>>
>>> >>> >> >>> 2017-08-24 08:22:36: pid 6017: FATAL:  syntex error in configuration file
>>> >>> >> >>> "/home/t-ishii/work/pgpool-II/current/pgpool2/src/test/regression/tests/004.watchdog/standby/etc/pgpool.conf"
>>> >>> >> >>> 2017-08-24 08:22:36: pid 6017: DETAIL:  parse error at line 568 '*' token = 8
>>> >>> >> >>>
>>> >>> >> >>
>>> >>> >> >> Really sorry. Somehow I overlooked the sample config file changes
>>> >>> >> >> I made at the last minute.
>>> >>> >> >> I will send you the updated version.
>>> >>> >> >>
>>> >>> >> >> Thanks
>>> >>> >> >> Best Regards
>>> >>> >> >> Muhammad Usama
>>> >>> >> >>
>>> >>> >> >>>
>>> >>> >> >>> Best regards,
>>> >>> >> >>> --
>>> >>> >> >>> Tatsuo Ishii
>>> >>> >> >>> SRA OSS, Inc. Japan
>>> >>> >> >>> English: http://www.sraoss.co.jp/index_en.php
>>> >>> >> >>> Japanese:http://www.sraoss.co.jp
>>> >>> >> >>>
>>> >>> >> >>> > Usama,
>>> >>> >> >>> >
>>> >>> >> >>> > Thanks for the patch. I am going to review it.
>>> >>> >> >>> >
>>> >>> >> >>> > In the meantime, when I applied your patch, I got some trailing
>>> >>> >> >>> > whitespace errors. Can you please fix them?
>>> >>> >> >>> >
>>> >>> >> >>> > /home/t-ishii/quorum_aware_failover.diff:470: trailing whitespace.
>>> >>> >> >>> >
>>> >>> >> >>> > /home/t-ishii/quorum_aware_failover.diff:485: trailing whitespace.
>>> >>> >> >>> >
>>> >>> >> >>> > /home/t-ishii/quorum_aware_failover.diff:564: trailing whitespace.
>>> >>> >> >>> >
>>> >>> >> >>> > /home/t-ishii/quorum_aware_failover.diff:1428: trailing whitespace.
>>> >>> >> >>> >
>>> >>> >> >>> > /home/t-ishii/quorum_aware_failover.diff:1450: trailing whitespace.
>>> >>> >> >>> >
>>> >>> >> >>> > warning: squelched 3 whitespace errors
>>> >>> >> >>> > warning: 8 lines add whitespace errors.
>>> >>> >> >>> >
>>> >>> >> >>> > Best regards,
>>> >>> >> >>> > --
>>> >>> >> >>> > Tatsuo Ishii
>>> >>> >> >>> > SRA OSS, Inc. Japan
>>> >>> >> >>> > English: http://www.sraoss.co.jp/index_en.php
>>> >>> >> >>> > Japanese:http://www.sraoss.co.jp
>>> >>> >> >>> >
>>> >>> >> >>> >> Hi
>>> >>> >> >>> >>
>>> >>> >> >>> >> I was working on a new feature to make the backend node failover
>>> >>> >> >>> >> quorum aware, and halfway through the implementation I also added
>>> >>> >> >>> >> a majority-consensus feature for the same.
>>> >>> >> >>> >>
>>> >>> >> >>> >> So please find the first version of the patch for review. It makes
>>> >>> >> >>> >> backend node failover consider the watchdog cluster quorum status
>>> >>> >> >>> >> and seek majority consensus before performing a failover.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *Changes in the Failover mechanism with watchdog.*
>>> >>> >> >>> >> For this new feature I have modified Pgpool-II's existing failover
>>> >>> >> >>> >> mechanism with watchdog.
>>> >>> >> >>> >> Previously, as you know, when Pgpool-II needed to perform a node
>>> >>> >> >>> >> operation (failover, failback, promote-node) with the watchdog, the
>>> >>> >> >>> >> watchdog propagated the failover request to all the Pgpool-II nodes
>>> >>> >> >>> >> in the watchdog cluster, and as soon as the request was received by
>>> >>> >> >>> >> a node, it initiated the local failover; that failover was then
>>> >>> >> >>> >> synchronised on all nodes using distributed locks.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *Now Only the Master node performs the failover.*
>>> >>> >> >>> >> The attached patch changes the mechanism of synchronised failover:
>>> >>> >> >>> >> now only the Pgpool-II on the master watchdog node performs the
>>> >>> >> >>> >> failover, and all other standby nodes sync the backend statuses
>>> >>> >> >>> >> after the master Pgpool-II has finished the failover.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *Overview of new failover mechanism.*
>>> >>> >> >>> >> -- If a failover request is received by a standby watchdog node
>>> >>> >> >>> >> (from the local Pgpool-II), that request is forwarded to the master
>>> >>> >> >>> >> watchdog and the Pgpool-II main process gets the
>>> >>> >> >>> >> FAILOVER_RES_WILL_BE_DONE return code. Upon receiving
>>> >>> >> >>> >> FAILOVER_RES_WILL_BE_DONE from the watchdog for the failover
>>> >>> >> >>> >> request, the requesting Pgpool-II moves forward without doing
>>> >>> >> >>> >> anything further for that particular failover command.
>>> >>> >> >>> >>
>>> >>> >> >>> >> -- When the failover request from a standby node is received by the
>>> >>> >> >>> >> master watchdog, after performing the validation and applying the
>>> >>> >> >>> >> consensus rules, the failover request is triggered on its local
>>> >>> >> >>> >> Pgpool-II.
>>> >>> >> >>> >>
>>> >>> >> >>> >> -- When the failover request is received by the master watchdog
>>> >>> >> >>> >> node from its local Pgpool-II (on the IPC channel), the watchdog
>>> >>> >> >>> >> process informs the requesting Pgpool-II process to proceed with
>>> >>> >> >>> >> the failover (provided all failover rules are satisfied).
>>> >>> >> >>> >>
>>> >>> >> >>> >> -- After the failover is finished on the master Pgpool-II, the
>>> >>> >> >>> >> failover function calls *wd_failover_end*(), which sends a backend
>>> >>> >> >>> >> sync required message to all standby watchdogs.
>>> >>> >> >>> >>
>>> >>> >> >>> >> -- Upon receiving the sync required message from the master
>>> >>> >> >>> >> watchdog node, all Pgpool-II nodes sync the new status of each
>>> >>> >> >>> >> backend node from the master watchdog.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *No More Failover locks*
>>> >>> >> >>> >> Since with this new failover mechanism we no longer require any
>>> >>> >> >>> >> synchronisation or guards against the execution of failover_commands
>>> >>> >> >>> >> by multiple Pgpool-II nodes, the patch removes all the distributed
>>> >>> >> >>> >> locks from the failover function. This makes the failover simpler
>>> >>> >> >>> >> and faster.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *New kind of Failover operation NODE_QUARANTINE_REQUEST*
>>> >>> >> >>> >> The patch adds a new kind of backend node operation,
>>> >>> >> >>> >> NODE_QUARANTINE, which is effectively the same as NODE_DOWN, except
>>> >>> >> >>> >> that with node quarantine the failover_command is not triggered.
>>> >>> >> >>> >> A NODE_DOWN_REQUEST is automatically converted to a
>>> >>> >> >>> >> NODE_QUARANTINE_REQUEST when failover is requested on a backend
>>> >>> >> >>> >> node but the watchdog cluster does not hold the quorum.
>>> >>> >> >>> >> This means that in the absence of quorum the failed backend nodes
>>> >>> >> >>> >> are quarantined, and when the quorum becomes available again
>>> >>> >> >>> >> Pgpool-II performs the failback operation on all quarantined nodes.
>>> >>> >> >>> >> Again, when the failback is performed on a quarantined backend
>>> >>> >> >>> >> node, the failover function does not trigger the failback_command.
>>> >>> >> >>> >>
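>>> >>> >> >>> >> To illustrate that conversion, here is a minimal sketch in C (not
>>> >>> >> >>> >> the actual patch code; the helper name is a placeholder I made up
>>> >>> >> >>> >> for this mail, the request kinds are the ones mentioned above) of
>>> >>> >> >>> >> how a node-down request is downgraded to a quarantine request when
>>> >>> >> >>> >> the quorum is missing:
>>> >>> >> >>> >>
>>> >>> >> >>> >> #include <stdbool.h>
>>> >>> >> >>> >>
>>> >>> >> >>> >> /* Sketch only: adjust_request_for_quorum() is a made-up helper. */
>>> >>> >> >>> >> typedef enum
>>> >>> >> >>> >> {
>>> >>> >> >>> >>     NODE_UP_REQUEST,
>>> >>> >> >>> >>     NODE_DOWN_REQUEST,
>>> >>> >> >>> >>     PROMOTE_NODE_REQUEST,
>>> >>> >> >>> >>     NODE_QUARANTINE_REQUEST
>>> >>> >> >>> >> } POOL_REQUEST_KIND;
>>> >>> >> >>> >>
>>> >>> >> >>> >> static POOL_REQUEST_KIND
>>> >>> >> >>> >> adjust_request_for_quorum(POOL_REQUEST_KIND kind, bool quorum_exists)
>>> >>> >> >>> >> {
>>> >>> >> >>> >>     /*
>>> >>> >> >>> >>      * Without quorum we must not run failover_command, so the
>>> >>> >> >>> >>      * failed backend is only quarantined; it is failed back
>>> >>> >> >>> >>      * automatically once the quorum is available again.
>>> >>> >> >>> >>      */
>>> >>> >> >>> >>     if (kind == NODE_DOWN_REQUEST && !quorum_exists)
>>> >>> >> >>> >>         return NODE_QUARANTINE_REQUEST;
>>> >>> >> >>> >>     return kind;
>>> >>> >> >>> >> }
>>> >>> >> >>> >>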
>>> >>> >> >>> >> *Controlling the Failover behaviour.*
>>> >>> >> >>> >> The patch adds three new configuration parameters to configure the
>>> >>> >> >>> >> failover behaviour from the user side.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *failover_when_quorum_exists*
>>> >>> >> >>> >> When enabled, the failover command will only be executed when the
>>> >>> >> >>> >> watchdog cluster holds the quorum. When the quorum is absent and
>>> >>> >> >>> >> failover_when_quorum_exists is enabled, the failed backend nodes
>>> >>> >> >>> >> are quarantined until the quorum becomes available again.
>>> >>> >> >>> >> Disabling it restores the old behaviour of the failover commands.
>>> >>> >> >>> >>
>>> >>> >> >>> >>
>>> >>> >> >>> >> *failover_require_consensus*
>>> >>> >> >>> >> This new configuration parameter can be used to make sure we get a
>>> >>> >> >>> >> majority vote before performing the failover on a node. When
>>> >>> >> >>> >> *failover_require_consensus* is enabled, the failover is only
>>> >>> >> >>> >> performed after receiving failover requests from a majority of
>>> >>> >> >>> >> Pgpool-II nodes.
>>> >>> >> >>> >> For example, in a three-node cluster the failover will not be
>>> >>> >> >>> >> performed until at least two nodes ask for the failover of the
>>> >>> >> >>> >> particular backend node.
>>> >>> >> >>> >>
>>> >>> >> >>> >> It is also worthwhile to mention here that
>>> >>> >> >>> >> *failover_require_consensus* only works when
>>> >>> >> >>> >> failover_when_quorum_exists is enabled.
>>> >>> >> >>> >>
>>> >>> >> >>> >>
>>> >>> >> >>> >> *enable_multiple_failover_requests_from_node*
>>> >>> >> >>> >> This parameter works in connection with the
>>> >>> >> >>> >> *failover_require_consensus* config. When enabled, a single
>>> >>> >> >>> >> Pgpool-II node can vote for failover multiple times.
>>> >>> >> >>> >> For example, in a three-node cluster, if one Pgpool-II node sends
>>> >>> >> >>> >> the failover request for a particular node twice, that would be
>>> >>> >> >>> >> counted as two votes in favour of failover, and the failover will
>>> >>> >> >>> >> be performed even if we do not get a vote from the other two nodes.
>>> >>> >> >>> >>
>>> >>> >> >>> >> When *enable_multiple_failover_requests_from_node* is disabled,
>>> >>> >> >>> >> only the first vote from each Pgpool-II node will be accepted, and
>>> >>> >> >>> >> all subsequent votes will be marked duplicate and rejected.
>>> >>> >> >>> >> So in that case we require majority votes from distinct nodes to
>>> >>> >> >>> >> execute the failover.
>>> >>> >> >>> >> Again, *enable_multiple_failover_requests_from_node* only becomes
>>> >>> >> >>> >> effective when both *failover_when_quorum_exists* and
>>> >>> >> >>> >> *failover_require_consensus* are enabled.
>>> >>> >> >>> >>
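>>> >>> >> >>> >> As a concrete illustration (the values below are only an example
>>> >>> >> >>> >> for a three-node cluster, not the patch's proposed defaults), the
>>> >>> >> >>> >> strictest behaviour would be configured in pgpool.conf roughly like:
>>> >>> >> >>> >>
>>> >>> >> >>> >> failover_when_quorum_exists = on
>>> >>> >> >>> >> failover_require_consensus = on
>>> >>> >> >>> >> enable_multiple_failover_requests_from_node = off
>>> >>> >> >>> >>
>>> >>> >> >>> >> With these settings, a backend is failed over only after at least
>>> >>> >> >>> >> two distinct Pgpool-II nodes have requested it while the quorum
>>> >>> >> >>> >> exists; otherwise the backend is quarantined.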
>>> >>> >> >>> >>
>>> >>> >> >>> >> *Controlling the failover: The Coding perspective.*
>>> >>> >> >>> >> Although the failover functions are made quorum and consensus
>>> >>> >> >>> >> aware, there is still a way to bypass the quorum conditions and the
>>> >>> >> >>> >> requirement of consensus.
>>> >>> >> >>> >>
>>> >>> >> >>> >> For this the patch uses the existing request_details flags in
>>> >>> >> >>> >> POOL_REQUEST_NODE to control the behaviour of failover.
>>> >>> >> >>> >>
>>> >>> >> >>> >> Here are the newly added flag values.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *REQ_DETAIL_WATCHDOG*:
>>> >>> >> >>> >> Setting this flag while issuing the failover command will not send
>>> >>> >> >>> >> the failover request to the watchdog. This flag may not be useful
>>> >>> >> >>> >> anywhere other than where it is already used: mostly it is meant to
>>> >>> >> >>> >> keep a failover command that already originated from the watchdog
>>> >>> >> >>> >> from being sent back to the watchdog, otherwise we could end up in
>>> >>> >> >>> >> an infinite loop.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *REQ_DETAIL_CONFIRMED*:
>>> >>> >> >>> >> Setting this flag will bypass the *failover_require_consensus*
>>> >>> >> >>> >> configuration and immediately perform the failover if quorum is
>>> >>> >> >>> >> present. This flag can be used to issue a failover request
>>> >>> >> >>> >> originating from a PCP command.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *REQ_DETAIL_UPDATE*:
>>> >>> >> >>> >> This flag is used for the command where we are failing back the
>>> >>> >> >>> >> quarantined nodes. Setting this flag will not trigger the
>>> >>> >> >>> >> failback_command.
>>> >>> >> >>> >>
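>>> >>> >> >>> >> For illustration only, a caller issuing a failover on behalf of a
>>> >>> >> >>> >> PCP command could set the flag roughly like this (a sketch: the bit
>>> >>> >> >>> >> values, the struct layout, and the helper name are placeholders I
>>> >>> >> >>> >> made up for this mail, not the patch's actual definitions):
>>> >>> >> >>> >>
>>> >>> >> >>> >> /* Sketch only -- placeholder values and names. */
>>> >>> >> >>> >> #define REQ_DETAIL_WATCHDOG   0x01 /* don't forward to watchdog   */
>>> >>> >> >>> >> #define REQ_DETAIL_CONFIRMED  0x02 /* skip consensus requirement  */
>>> >>> >> >>> >> #define REQ_DETAIL_UPDATE     0x04 /* quarantine failback, no cmd */
>>> >>> >> >>> >>
>>> >>> >> >>> >> typedef struct
>>> >>> >> >>> >> {
>>> >>> >> >>> >>     int          node_id;
>>> >>> >> >>> >>     unsigned int request_details; /* OR of REQ_DETAIL_* flags */
>>> >>> >> >>> >> } REQUEST_NODE_SKETCH;
>>> >>> >> >>> >>
>>> >>> >> >>> >> static void
>>> >>> >> >>> >> request_node_down_from_pcp(REQUEST_NODE_SKETCH *req, int node_id)
>>> >>> >> >>> >> {
>>> >>> >> >>> >>     req->node_id = node_id;
>>> >>> >> >>> >>     /* PCP-originated requests are treated as already confirmed,
>>> >>> >> >>> >>      * so only the quorum check applies, not the consensus vote.
>>> >>> >> >>> >>      * (Assumes req was zero-initialized by the caller.) */
>>> >>> >> >>> >>     req->request_details |= REQ_DETAIL_CONFIRMED;
>>> >>> >> >>> >> }
>>> >>> >> >>> >>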
>>> >>> >> >>> >> *Some conditional flags used:*
>>> >>> >> >>> >> I was not sure about the configuration of each type of failover
>>> >>> >> >>> >> operation. As we have three main failover operations,
>>> >>> >> >>> >> NODE_UP_REQUEST, NODE_DOWN_REQUEST, and PROMOTE_NODE_REQUEST,
>>> >>> >> >>> >> I was thinking: do we need to give the users a configuration
>>> >>> >> >>> >> option to enable/disable quorum checking and consensus for each
>>> >>> >> >>> >> individual failover operation type?
>>> >>> >> >>> >> For example: is it a practical configuration where a user would
>>> >>> >> >>> >> want to ensure quorum while performing a NODE_DOWN operation but
>>> >>> >> >>> >> does not want it for NODE_UP?
>>> >>> >> >>> >> So in this patch I use three compile-time defines to enable or
>>> >>> >> >>> >> disable the individual failover operations, until we can decide on
>>> >>> >> >>> >> the best solution.
>>> >>> >> >>> >>
>>> >>> >> >>> >> NODE_UP_REQUIRE_CONSENSUS: defining it enables the quorum checking
>>> >>> >> >>> >> feature for NODE_UP_REQUESTs.
>>> >>> >> >>> >>
>>> >>> >> >>> >> NODE_DOWN_REQUIRE_CONSENSUS: defining it enables the quorum
>>> >>> >> >>> >> checking feature for NODE_DOWN_REQUESTs.
>>> >>> >> >>> >>
>>> >>> >> >>> >> NODE_PROMOTE_REQUIRE_CONSENSUS: defining it enables the quorum
>>> >>> >> >>> >> checking feature for PROMOTE_NODE_REQUESTs.
>>> >>> >> >>> >>
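>>> >>> >> >>> >> A minimal sketch of how such compile-time switches could gate the
>>> >>> >> >>> >> check (the function below is a placeholder I wrote for this mail,
>>> >>> >> >>> >> not the patch code; it assumes <stdbool.h> and the POOL_REQUEST_KIND
>>> >>> >> >>> >> enum sketched earlier):
>>> >>> >> >>> >>
>>> >>> >> >>> >> /* Sketch only -- placeholder helper around the proposed defines. */
>>> >>> >> >>> >> static bool
>>> >>> >> >>> >> request_requires_consensus(POOL_REQUEST_KIND kind)
>>> >>> >> >>> >> {
>>> >>> >> >>> >> #ifdef NODE_UP_REQUIRE_CONSENSUS
>>> >>> >> >>> >>     if (kind == NODE_UP_REQUEST)
>>> >>> >> >>> >>         return true;
>>> >>> >> >>> >> #endif
>>> >>> >> >>> >> #ifdef NODE_DOWN_REQUIRE_CONSENSUS
>>> >>> >> >>> >>     if (kind == NODE_DOWN_REQUEST)
>>> >>> >> >>> >>         return true;
>>> >>> >> >>> >> #endif
>>> >>> >> >>> >> #ifdef NODE_PROMOTE_REQUIRE_CONSENSUS
>>> >>> >> >>> >>     if (kind == PROMOTE_NODE_REQUEST)
>>> >>> >> >>> >>         return true;
>>> >>> >> >>> >> #endif
>>> >>> >> >>> >>     return false;
>>> >>> >> >>> >> }
>>> >>> >> >>> >>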
>>> >>> >> >>> >> *Some Points for Discussion:*
>>> >>> >> >>> >>
>>> >>> >> >>> >> *Do we really need to check the Req_info->switching flag before
>>> >>> >> >>> >> enqueuing a failover request?*
>>> >>> >> >>> >> While working on the patch I was wondering why we disallow
>>> >>> >> >>> >> enqueuing the failover command when a failover is already in
>>> >>> >> >>> >> progress. For example, in the *pcp_process_command*() function, if
>>> >>> >> >>> >> we see the *Req_info->switching* flag set we bail out with an error
>>> >>> >> >>> >> instead of enqueuing the command. Is this really necessary?
>>> >>> >> >>> >>
>>> >>> >> >>> >> *Do we need more granular control over each failover operation?*
>>> >>> >> >>> >> As described in the section "Some conditional flags used", I would
>>> >>> >> >>> >> like opinions on whether we need configuration parameters in
>>> >>> >> >>> >> pgpool.conf to enable/disable quorum and consensus checking for
>>> >>> >> >>> >> individual failover types.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *Which failovers should be marked as Confirmed:*
>>> >>> >> >>> >> As defined in the section on REQ_DETAIL_CONFIRMED above, we can
>>> >>> >> >>> >> mark a failover request as not needing consensus; currently the
>>> >>> >> >>> >> requests from the PCP commands are fired with this flag. But I was
>>> >>> >> >>> >> wondering whether there may be more places where we need to use
>>> >>> >> >>> >> the flag.
>>> >>> >> >>> >> For example, I currently use the same confirmed flag when failover
>>> >>> >> >>> >> is triggered because of *replication_stop_on_mismatch*.
>>> >>> >> >>> >>
>>> >>> >> >>> >> I think we should consider this flag for each place the failover
>>> >>> >> >>> >> is triggered from, for example when it is triggered
>>> >>> >> >>> >> because of a health_check failure,
>>> >>> >> >>> >> because of a replication mismatch,
>>> >>> >> >>> >> because of a backend_error,
>>> >>> >> >>> >> etc.
>>> >>> >> >>> >>
>>> >>> >> >>> >> *Node Quarantine behaviour.*
>>> >>> >> >>> >> What do you think about the node quarantine behaviour used by this
>>> >>> >> >>> >> patch? Can you think of any problem it could cause?
>>> >>> >> >>> >>
>>> >>> >> >>> >> *What should be the default value for each newly added config
>>> >>> >> >>> >> parameter?*
>>> >>> >> >>> >>
>>> >>> >> >>> >>
>>> >>> >> >>> >>
>>> >>> >> >>> >> *TODOs*
>>> >>> >> >>> >>
>>> >>> >> >>> >> -- Updating the documentation is still to do. I will do that once
>>> >>> >> >>> >> every aspect of the feature is finalised.
>>> >>> >> >>> >> -- Some code warnings and cleanups are still not done.
>>> >>> >> >>> >> -- I am still a little short on testing.
>>> >>> >> >>> >> -- Regression test cases for the feature.
>>> >>> >> >>> >>
>>> >>> >> >>> >>
>>> >>> >> >>> >> Thoughts and suggestions are most welcome.
>>> >>> >> >>> >>
>>> >>> >> >>> >> Thanks
>>> >>> >> >>> >> Best regards
>>> >>> >> >>> >> Muhammad Usama
>>> >>> >> >>> > _______________________________________________
>>> >>> >> >>> > pgpool-hackers mailing list
>>> >>> >> >>> > pgpool-hackers at pgpool.net
>>> >>> >> >>> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>>> >>> >> >>>
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>>
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers

