[pgpool-general: 6692] Re: Query

Lakshmi Raghavendra lakshmiym108 at gmail.com
Mon Sep 2 02:40:41 JST 2019


Hi Usama / Tatsuo,

I received the email notification only today, sorry for the delayed
response.
Please find the pgpool-II log for this issue attached.

Here is a short summary of the problem:


Node-1 : Pgpool Master + Postgres Master

Node-2 : Pgpool Standby + Postgres Standby

Node-3 : Pgpool Standby + Postgres Standby
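
For reference, a layout like the one above would typically correspond to
pgpool.conf entries along the lines of the sketch below. This is only an
illustration, assuming Node-1/2/3 map to 10.198.34.188/.189/.190 as in the
outputs further down; the actual attached pgpool.conf is authoritative:

  # backend PostgreSQL nodes (same list on every Pgpool-II node)
  backend_hostname0 = '10.198.34.188'
  backend_port0 = 5432
  backend_hostname1 = '10.198.34.189'
  backend_port1 = 5432
  backend_hostname2 = '10.198.34.190'
  backend_port2 = 5432

  # watchdog section as it might look on Node-1;
  # each node names the other two as peers
  use_watchdog = on
  wd_hostname = 'lcm-34-188.dev.lcm.local'
  wd_port = 9000
  other_pgpool_hostname0 = 'lcm-34-189.dev.lcm.local'
  other_pgpool_port0 = 9999
  other_wd_port0 = 9000
  other_pgpool_hostname1 = 'lcm-34-190.dev.lcm.local'
  other_pgpool_port1 = 9999
  other_wd_port1 = 9000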


When a network failure occurs and Node-1 drops off the network, the status
becomes:

Node-1 : Pgpool Lost status + Postgres Standby (down)

Node-2 : Pgpool Master + Postgres Master

Node-3 : Pgpool Standby + Postgres Standby


Now, when Node-1 comes back onto the network, the cluster ends up in the
imbalanced state shown below (two watchdog MASTER nodes):



lcm-34-189:~ # psql -h 10.198.34.191 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | 10.198.34.188 | 5432 | up     | 0.333333  | primary | 0          | true              | 0                 | 2019-08-31 16:40:26
 1       | 10.198.34.189 | 5432 | up     | 0.333333  | standby | 0          | false             | 1013552           | 2019-08-31 16:40:26
 2       | 10.198.34.190 | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | 2019-08-31 16:40:26
(3 rows)
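
As a side note, the replication_delay of 1013552 on node 1 suggests the
standby on 10.198.34.189 is not keeping up with the current primary. A quick
way to confirm that from outside pgpool (a sketch only, assuming PostgreSQL
10+ and that a superuser such as postgres can connect to the backends
directly) would be:

  # on the lagging standby: is it in recovery, and what has it received/replayed?
  psql -h 10.198.34.189 -p 5432 -U postgres -c \
    "SELECT pg_is_in_recovery(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();"

  # on the current primary: is that standby streaming at all?
  psql -h 10.198.34.188 -p 5432 -U postgres -c \
    "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"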

lcm-34-189:~ # /usr/local/bin/pcp_watchdog_info -p 9898 -h 10.198.34.191 -U pgpool
Password:
3 NO lcm-34-188.dev.lcm.local:9999 Linux lcm-34-188.dev.lcm.local 10.198.34.188

lcm-34-189.dev.lcm.local:9999 Linux lcm-34-189.dev.lcm.local lcm-34-189.dev.lcm.local 9999 9000 7 STANDBY
lcm-34-188.dev.lcm.local:9999 Linux lcm-34-188.dev.lcm.local 10.198.34.188 9999 9000 4 MASTER
lcm-34-190.dev.lcm.local:9999 Linux lcm-34-190.dev.lcm.local 10.198.34.190 9999 9000 4 MASTER
lcm-34-189:~ #
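
The imbalance is visible here: two watchdog nodes (lcm-34-188 and lcm-34-190)
report MASTER at the same time. A minimal check for that condition could look
like the sketch below (assumes pcp_watchdog_info is on the PATH and that a
~/.pcppass entry allows -w; host, port and user are taken from the command
above):

  #!/bin/sh
  # count how many watchdog nodes currently claim the MASTER state
  masters=$(pcp_watchdog_info -h 10.198.34.191 -p 9898 -U pgpool -w | grep -c ' MASTER$')
  if [ "$masters" -ne 1 ]; then
      echo "watchdog imbalance: $masters nodes report MASTER" >&2
      exit 1
  fi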



Thanks and regards,

   Lakshmi Y M

On Tue, Aug 20, 2019 at 8:55 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > On Sat, Aug 17, 2019 at 12:28 PM Tatsuo Ishii <ishii at sraoss.co.jp>
> > wrote:
> >
> >> > Hi Pgpool Team,
> >> >
> >> >               *We are nearing a production release and running into
> >> > the below issues.*
> >> > Replies at the earliest would be highly helpful and greatly
> >> > appreciated. Please let us know how to get rid of the issues below.
> >> >
> >> > We have a 3-node pgpool + postgres cluster - M1, M2, M3. The
> >> > pgpool.conf is attached.
> >> >
> >> > *Case I :*
> >> > M1 - Pgpool Master + Postgres Master
> >> > M2, M3 - Pgpool slave + Postgres slave
> >> >
> >> > - M1 goes out of the network; it is marked as LOST in the pgpool
> >> >   cluster.
> >> > - M2 becomes postgres master.
> >> > - M3 becomes pgpool master.
> >> > - When M1 comes back to the network, pgpool is able to resolve the
> >> >   split brain. However, it changes the postgres master back to M1,
> >> >   logging the statement "LOG:  primary node was chenged after the
> >> >   sync from new master". Since M2 was already the postgres master
> >> >   (and its trigger file is not touched), it is not able to sync to
> >> >   the new master.
> >> > *I somehow want to avoid this postgres master change. Please let us
> >> > know if there is a way to avoid it.*
> >>
> >> Sorry, but I don't know how to prevent this. Probably when the former
> >> watchdog master recovers from a network outage and there is already a
> >> PostgreSQL primary server, the watchdog master should not sync the
> >> state. What do you think, Usama?
> >>
> >
> > Yes, that's true; there is no functionality in Pgpool-II to disable the
> > backend node status sync. In fact, it would be hazardous if we somehow
> > disabled the node status syncing.
> >
> > Having said that, in the mentioned scenario, when M1 comes back and
> > joins the watchdog cluster, Pgpool-II should have kept M2 as the true
> > master while resolving the split-brain. The algorithm used to resolve
> > the true master considers quite a few parameters, and for the scenario
> > you explained, M2 should have kept the master node status while M1
> > should have resigned after rejoining the cluster; effectively, the M1
> > node should have been syncing its status from M2 (keeping the proper
> > primary node), not the other way around.
> > Can you please share the Pgpool-II log files so that I can have a look
> > at what went wrong in this case?
>
> Usama,
>
> OK, the scenario (ending up with two PostgreSQL primaries) should not have
> happened. That's good news.
>
> Lakshmi,
>
> Can you please provide the Pgpool-II log files as Usama requested?
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>