[pgpool-general: 6724] Re: Query

Lakshmi Raghavendra lakshmiym108 at gmail.com
Thu Oct 3 00:10:48 JST 2019


Sure, thanks for the update,
Muhammad. I currently don't have a setup; I will try out the fix once I have
one.

Thanks & Regards,

     Lakshmi Y M

On Wed, Oct 2, 2019, 7:48 PM Muhammad Usama <m.usama at gmail.com> wrote:

> Hi Lakshmi,
>
> Sorry for the delayed response and many thanks for providing the log files.
>
> I have been looking into a few similar bug reports, and after reviewing the
> log you sent and the ones shared at
> https://www.pgpool.net/mantisbt/view.php?id=547 I realized that the
> watchdog code was confused about how to handle life-check failure
> scenarios, especially the case where the life-check reports a node failure
> while the watchdog core is still able to communicate with the remote nodes,
> and the case where node A's life-check reports node B as lost while B still
> thinks A is alive and healthy.
>
> So I have reviewed the whole watchdog design around the life-check reports
> and made some fixes. I am not sure whether you have a development setup to
> verify the fix, but I am attaching the patch anyway in case you want to try
> it out. The patch is generated against the current MASTER branch; I will
> commit it after a little more testing and then backport it to all supported
> branches, so hopefully your issue will be fixed in the upcoming release of
> Pgpool-II.
>
> Thanks
> Best regards
> Muhammad Usama
>
>
>
> On Mon, Sep 2, 2019 at 9:31 AM Lakshmi Raghavendra <lakshmiym108 at gmail.com>
> wrote:
>
>> Hi Tatsuo,
>>
>>           Please find attached the zip file.
>>
>> Thanks And Regards,
>>
>>   Lakshmi Y M
>>
>> On Mon, Sep 2, 2019 at 5:13 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>
>>> Hi Lakshmi,
>>>
>>> Your attached files are too large for the mailing list to accept. Can
>>> you compress them and repost the message with the compressed files
>>> attached?
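>>> For example, one way to bundle and compress the logs before re-attaching
>>> them (file names here are illustrative, not the actual log paths):

```shell
# Illustrative: bundle and gzip log files before attaching to the list.
printf 'sample pgpool log line\n' > pgpool.log   # stand-in for the real log file
tar czf pgpool-logs.tar.gz pgpool.log            # create the compressed archive
ls -lh pgpool-logs.tar.gz                        # confirm the archive exists
```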
>>>
>>> Best regards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese:http://www.sraoss.co.jp
>>>
>>> From: Lakshmi Raghavendra <lakshmiym108 at gmail.com>
>>> Subject: Fwd: [pgpool-general: 6672] Query
>>> Date: Sun, 1 Sep 2019 23:14:30 +0530
>>> Message-ID: <
>>> CAHHVJ5sRoVFEEW4EoZLgudCTTm0cqGjXhbbkpnOiimcs4euUSw at mail.gmail.com>
>>>
>>> > ---------- Forwarded message ---------
>>> > From: Lakshmi Raghavendra <lakshmiym108 at gmail.com>
>>> > Date: Sat, Aug 31, 2019 at 10:17 PM
>>> > Subject: Re: [pgpool-general: 6672] Query
>>> > To: Tatsuo Ishii <ishii at sraoss.co.jp>
>>> > Cc: Muhammad Usama <m.usama at gmail.com>, <pgpool-general at pgpool.net>
>>> >
>>> >
>>> > Hi Usama / Tatsuo,
>>> >
>>> >          Received the email notification today, sorry for the delayed
>>> > response.
>>> > Please find attached the pgpool-II log for the same.
>>> >
>>> > So basically below is the short summary of the issue:
>>> >
>>> >
>>> > Node -1 : Pgpool Master + Postgres Master
>>> >
>>> > Node -2 : Pgpool Standby + Postgres Standby
>>> >
>>> > Node-3 : Pgpool Standby + Postgres Standby
>>> >
>>> >
>>> > When a network failure occurs and Node-1 drops off the network, the
>>> > status is:
>>> >
>>> > Node-1 : Pgpool Lost status + Postgres Standby (down)
>>> >
>>> > Node -2 : Pgpool Master + Postgres Master
>>> >
>>> > Node-3 : Pgpool Standby + Postgres Standby
>>> >
>>> >
>>> > Now, when Node-1 comes back onto the network, the status below shows
>>> > the pgpool cluster becoming imbalanced:
>>> >
>>> >
>>> >
>>> > lcm-34-189:~ # psql -h 10.198.34.191 -p 9999 -U pgpool postgres -c "show pool_nodes"
>>> > Password for user pgpool:
>>> >  node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
>>> > ---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
>>> >  0       | 10.198.34.188 | 5432 | up     | 0.333333  | primary | 0          | true              | 0                 | 2019-08-31 16:40:26
>>> >  1       | 10.198.34.189 | 5432 | up     | 0.333333  | standby | 0          | false             | 1013552           | 2019-08-31 16:40:26
>>> >  2       | 10.198.34.190 | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | 2019-08-31 16:40:26
>>> > (3 rows)
>>> >
>>> > lcm-34-189:~ # /usr/local/bin/pcp_watchdog_info -p 9898 -h 10.198.34.191 -U pgpool
>>> > Password:
>>> > 3 NO lcm-34-188.dev.lcm.local:9999 Linux lcm-34-188.dev.lcm.local 10.198.34.188
>>> >
>>> > lcm-34-189.dev.lcm.local:9999 Linux lcm-34-189.dev.lcm.local lcm-34-189.dev.lcm.local 9999 9000 7 STANDBY
>>> > lcm-34-188.dev.lcm.local:9999 Linux lcm-34-188.dev.lcm.local 10.198.34.188 9999 9000 4 MASTER
>>> > lcm-34-190.dev.lcm.local:9999 Linux lcm-34-190.dev.lcm.local 10.198.34.190 9999 9000 4 MASTER
>>> > lcm-34-189:~ #
>>> >
>>> >
>>> >
>>> > Thanks And Regards,
>>> >
>>> >    Lakshmi Y M
>>> >
>>> > On Tue, Aug 20, 2019 at 8:55 AM Tatsuo Ishii <ishii at sraoss.co.jp>
>>> wrote:
>>> >
>>> >> > On Sat, Aug 17, 2019 at 12:28 PM Tatsuo Ishii <ishii at sraoss.co.jp>
>>> >> wrote:
>>> >> >
>>> >> >> > Hi Pgpool Team,
>>> >> >> >
>>> >> >> >               *We are nearing a production release and running
>>> >> >> > into the below issues.*
>>> >> >> > Replies at the earliest would be highly helpful and greatly
>>> >> >> > appreciated. Please let us know how to resolve the issues below.
>>> >> >> >
>>> >> >> > We have a 3 node pgpool + postgres cluster - M1, M2, M3. The
>>> >> >> > pgpool.conf is as attached.
>>> >> >> >
>>> >> >> > *Case I :  *
>>> >> >> > M1 - Pgpool Master + Postgres Master
>>> >> >> > M2 , M3 - Pgpool slave + Postgres slave
>>> >> >> >
>>> >> >> > - M1 goes out of the network; it is marked as LOST in the pgpool
>>> >> >> > cluster.
>>> >> >> > - M2 becomes the Postgres master.
>>> >> >> > - M3 becomes the pgpool master.
>>> >> >> > - When M1 comes back onto the network, pgpool is able to resolve
>>> >> >> > the split brain. However, it changes the Postgres master back to
>>> >> >> > M1, logging the statement "LOG:  primary node was chenged after
>>> >> >> > the sync from new master" [sic]. Since M2 was already the Postgres
>>> >> >> > master (and its trigger file is not touched), it is not able to
>>> >> >> > sync to the new master.
>>> >> >> > *I want to avoid this Postgres master change somehow. Please let
>>> >> >> > us know if there is a way to avoid it.*
>>> >> >>
>>> >> >> Sorry, but I don't know how to prevent this. Probably when the
>>> >> >> former watchdog master recovers from a network outage and there is
>>> >> >> already a PostgreSQL primary server, the watchdog master should not
>>> >> >> sync the state. What do you think, Usama?
>>> >> >>
>>> >> >
>>> >> > Yes, that's true; there is no functionality in Pgpool-II to disable
>>> >> > the backend node status sync. In fact, it would be hazardous if we
>>> >> > somehow disabled the node status syncing.
>>> >> >
>>> >> > That said, in the mentioned scenario, when M1 comes back and rejoins
>>> >> > the watchdog cluster, Pgpool-II should have kept M2 as the true
>>> >> > master while resolving the split brain. The algorithm used to
>>> >> > resolve the true master considers quite a few parameters, and for
>>> >> > the scenario you explained, M2 should have kept the master node
>>> >> > status while M1 should have resigned after joining back the cluster;
>>> >> > effectively, the M1 node should have been syncing the status from M2
>>> >> > (keeping the proper primary node), not the other way around.
>>> >> > Can you please share the Pgpool-II log files so that I can have a
>>> >> > look at what went wrong in this case?
>>> >>
>>> >> Usama,
>>> >>
>>> >> Ok, the scenario (PostgreSQL primary x 2 in the end) should not have
>>> >> happened. That's good news.
>>> >>
>>> >> Lakshmi,
>>> >>
>>> >> Can you please provide the Pgpool-II log files as Usama requested?
>>> >>
>>> >> Best regards,
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS, Inc. Japan
>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> Japanese:http://www.sraoss.co.jp
>>> >>
>>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>
>>