[pgpool-general: 6916] Re: watchdog fails to start pgpool-4.1.0

Muhammad Usama m.usama at gmail.com
Thu Mar 5 20:41:27 JST 2020


Hi Wolf,

Thanks for the information and for providing the log files.
So the real problem was the misconfigured other_pgpool_port0 setting
on node 1.

This is what was happening.

When a watchdog node starts up, it tries to connect to all the configured
nodes. According to the configuration of node 1 in the
"pgpool.4.1.0-node1.conf" file, the initial handshake message sent from
node 1 to node 0 stated that the pgpool port of the watchdog node it was
intended for should be 5432 (because of the other_pgpool_port0 setting on
node 1). But since the actual port value on node 0 was 9999, the watchdog
on node 0 did not accept node 1 as part of the watchdog cluster. At the
same time, the handshake packet sent from node 0 to node 1 was perfectly
fine for node 1 (there was no misconfiguration in that direction), so
node 1 accepted node 0 while node 0 rejected node 1. In effect, it was a
network-partitioning scenario.

I think the watchdog should have thrown a more descriptive error message in
this case.

I will look into enhancing the error reporting in this scenario.
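The one-way rejection described above can be illustrated with a small, simplified model. This is a hypothetical sketch, not pgpool-II's actual watchdog code: `NodeConf`, `handshake_accepted`, and `one_way_links` are invented names, and the model assumes the receiver only checks whether the pgpool port the sender expects for it (its other_pgpool_portN value) matches the receiver's own port. It also assumes node 1 listens on port 9999 and that node 0's entry for node 1 is correct, which the thread implies but does not show.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeConf:
    """Hypothetical subset of one node's pgpool.conf watchdog settings."""
    name: str
    port: int  # the pgpool 'port' this node actually listens on
    # peer name -> pgpool port this node expects the peer to use
    # (models other_pgpool_hostnameN / other_pgpool_portN)
    other_pgpool_ports: dict[str, int] = field(default_factory=dict)


def handshake_accepted(sender: NodeConf, receiver: NodeConf) -> tuple[bool, str]:
    """Simplified model: the receiver accepts the sender only if the pgpool
    port the sender claims for the receiver matches the receiver's real port."""
    expected = sender.other_pgpool_ports.get(receiver.name)
    if expected is None:
        return False, f"{sender.name} has no other_pgpool entry for {receiver.name}"
    if expected != receiver.port:
        return False, (
            f"{receiver.name} rejected {sender.name}: handshake expects pgpool "
            f"port {expected}, but {receiver.name} uses port {receiver.port}; "
            f"check other_pgpool_port on {sender.name}"
        )
    return True, f"{receiver.name} accepted {sender.name}"


def one_way_links(nodes: list[NodeConf]) -> list[tuple[str, str]]:
    """Return (a, b) pairs where b accepts a, but a rejects b."""
    links = []
    for a in nodes:
        for b in nodes:
            if a is b:
                continue
            if handshake_accepted(a, b)[0] and not handshake_accepted(b, a)[0]:
                links.append((a.name, b.name))
    return links


# Settings from the thread (node 1's own port of 9999 is an assumption).
node0 = NodeConf("node0", port=9999, other_pgpool_ports={"node1": 9999})
node1 = NodeConf("node1", port=9999, other_pgpool_ports={"node0": 5432})

print(handshake_accepted(node0, node1)[1])  # node1 accepted node0
print(handshake_accepted(node1, node0)[1])  # node0 rejected node1: ...
print(one_way_links([node0, node1]))        # [('node0', 'node1')]
```

Setting node 1's other_pgpool_port0 to 9999, as in the fix, makes `one_way_links` return an empty list in this model. The rejection message also shows the kind of detail (the expected port and the actual port) that a more descriptive watchdog error could report.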

Thanks
Best regards
Muhammad Usama

On Wed, Mar 4, 2020 at 10:16 PM Wolf Schwurack <wolf at uen.org> wrote:

> Hey Muhammad
>
>
>
> I have this working now with version 4.1.0.
>
> Here are the changes I made to pgpool.conf on both nodes:
>
> On node 0:
>
> diff pgpool.4.1.0-new pgpool.4.1.0-node0.conf
> 463c463
> < failover_command = '/usr/lib/postgresql/11/bin/failover.sh %h %H %R'
> ---
> > failover_command = '/usr/lib/postgresql/10/bin/failover.sh %h %H %R'
> 700c700
> < heartbeat_device0 = 'eth0'
> ---
> > heartbeat_device0 = ''
>
> On node 1:
>
> diff pgpool.4.1.0-node1.conf pgpool.4.1.0-node1-new
> 594c594
> < delegate_IP = ''
> ---
> > delegate_IP = '10.11.0.204'
> 738c738
> < other_pgpool_port0 = 5432
> ---
> > other_pgpool_port0 = 9999
>
>
>
> *From: *Muhammad Usama <m.usama at gmail.com>
> *Date: *Wednesday, March 4, 2020 at 8:06 AM
> *To: *Wolfgang Schwurack <wolf at uen.org>
> *Cc: *Tatsuo Ishii <ishii at sraoss.co.jp>, PgPool General <pgpool-general at pgpool.net>
> *Subject: *Re: [pgpool-general: 6865] Re: watchdog fails to start pgpool-4.1.0
>
>
> On Wed, Mar 4, 2020 at 7:53 PM Wolf Schwurack <wolf at uen.org> wrote:
>
> Hi Muhammad
>
> I’m sending you the pgpool.log from both node0 and node1. The pgpool.log
> files are when I used pgpool.conf from 4.1.0. Also sending the pgpool.conf
> from 4.0.5 and 4.1.0 When I start pgpool using 4.0.5 pgpool.conf I’m not
> getting any errors but when I use pgpool.conf from 4.1.0 I’m getting “We
> are in split brain”
>
>
>
> I put all the files into pgpool.tar.gz.
>
> Many thanks. I am looking into this right now and will get back to you soon.
>
>
>
> Best regards
>
> Muhammad Usama
>
>
>
> Wolfgang Schwurack
> Database/System Administrator
> Utah Education Network
> 801-587-9444
> wolf at uen.org
>
> *From: *Muhammad Usama <m.usama at gmail.com>
> *Date: *Wednesday, March 4, 2020 at 2:20 AM
> *To: *Tatsuo Ishii <ishii at sraoss.co.jp>
> *Cc: *Wolfgang Schwurack <wolf at uen.org>, PgPool General <pgpool-general at pgpool.net>
> *Subject: *Re: [pgpool-general: 6865] Re: watchdog fails to start pgpool-4.1.0
>
>
>
> Hi Wolfgang,
>
>
>
> Sorry for the late reply. I just realized the email was sitting in my
> drafts folder and was never sent.
>
>
>
> Could you share the Pgpool log files for both nodes, preferably with
> debug logging enabled?
>
> Meanwhile, I am also trying to reproduce the scenario locally.
>
>
>
> Thanks
>
> Best regards
>
> Muhammad Usama
>
> On Tue, Feb 18, 2020 at 12:13 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
> Hi Usama,
>
> Any opinion on this?
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > I turned on enable_consensus_with_half_votes, and now node 0 acquires
> > the delegate IP. But when I start pgpool on node 1, I get the following
> > messages repeating in the log file (see below). When I check which node
> > has the virtual IP, it shows node 0, which is the master node.
> >
> > 2020-02-12 08:11:52: pid 29493: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
> > 2020-02-12 08:11:52: pid 29493: LOG:  I am announcing my self as master/coordinator watchdog node
> > 2020-02-12 08:11:52: pid 29493: LOG:  remote node "" decided it is the true master
> > 2020-02-12 08:11:52: pid 29493: DETAIL:  re-initializing the local watchdog cluster state because of split-brain
> > 2020-02-12 08:11:52: pid 29493: LOG:  watchdog node state changed from [MASTER] to [JOINING]
> > 2020-02-12 08:11:53: pid 29493: LOG:  new watchdog node connection is received from "10.11.0.202:12399"
> > 2020-02-12 08:11:56: pid 29493: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
> > 2020-02-12 08:11:57: pid 29493: LOG:  I am the only alive node in the watchdog cluster
> > 2020-02-12 08:11:57: pid 29493: HINT:  skipping stand for coordinator state
> >
> > My environment:
> > 2 pgpool hosts on Ubuntu 18
> > 2 PostgreSQL hosts on Ubuntu 18 with PostgreSQL 11
> >
> >
> > Wolfgang Schwurack
> > Database/System Administrator
> > Utah Education Network
> > 801-587-9444
> > Wolf at uen.org
> >
> >
> >
> >
> >
> > On 2/11/20, 3:50 PM, "Tatsuo Ishii" <ishii at sraoss.co.jp> wrote:
> >
> >>Have you turned on enable_consensus_with_half_votes?
> >>Starting with 4.1, you need to turn this on if you use an even
> >>number of Pgpool-II nodes.
> >>It's documented in the migration section in the doc:
> >>https://www.pgpool.net/docs/latest/en/html/release-4-1-0.html
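The effect of this setting on a two-node cluster comes down to simple arithmetic. The sketch below is a simplified, hypothetical model of the quorum rule, not pgpool-II source (`has_quorum` is an invented name, and the real watchdog also applies this counting to consensus votes): with the setting off, strictly more than half of the watchdog nodes must be alive; with it on, at least half is enough. For an odd number of nodes the two rules agree, which is why the setting only matters for even-sized clusters.

```python
def has_quorum(alive: int, total: int, enable_consensus_with_half_votes: bool) -> bool:
    """Simplified model of the watchdog quorum rule (not pgpool-II source)."""
    if enable_consensus_with_half_votes:
        return 2 * alive >= total  # at least half of the nodes
    return 2 * alive > total       # strictly more than half


# Two-node cluster with node 1 not started (or failed):
print(has_quorum(1, 2, enable_consensus_with_half_votes=False))  # False
print(has_quorum(1, 2, enable_consensus_with_half_votes=True))   # True
# Three nodes: the setting makes no difference.
print(has_quorum(2, 3, False), has_quorum(2, 3, True))           # True True
```

Under this model, a lone node in a two-node 4.1 cluster has no quorum by default, which is consistent with the reports below that node 0 did not acquire the delegate IP until node 1 was started.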
> >>
> >>Best regards,
> >>--
> >>Tatsuo Ishii
> >>SRA OSS, Inc. Japan
> >>English: http://www.sraoss.co.jp/index_en.php
> >>Japanese:http://www.sraoss.co.jp
> >>
> >>From: Wolf Schwurack <wolf at uen.org>
> >>Subject: [pgpool-general: 6865] Re: watchdog fails to start pgpool-4.1.0
> >>Date: Tue, 11 Feb 2020 18:10:25 +0000
> >>Message-ID: <56216C05-00F8-4C10-A32A-C793411C7891 at umail.utah.edu>
> >>
> >>> After doing some more testing on version 4.1.0, I have noticed that if
> >>>node 0 fails, node 1 never acquires the delegate IP. I compared this with
> >>>version 4.0.5, where node 1 does acquire the delegate IP when node 0 fails.
> >>>
> >>> Wolfgang Schwurack
> >>> Database/System Administrator
> >>> Utah Education Network
> >>> 801-587-9444
> >>> wolf at uen.org<mailto:wolf at uen.org>
> >>>
> >>> From: "pgpool-general-bounces at pgpool.net"
> >>><pgpool-general-bounces at pgpool.net> on behalf of Wolfgang Schwurack
> >>><wolf at uen.org>
> >>> Date: Tuesday, February 11, 2020 at 10:54 AM
> >>> To: "pgpool-general at pgpool.net" <pgpool-general at pgpool.net>
> >>> Subject: [pgpool-general: 6864] Re: watchdog fails to start pgpool-4.1.0
> >>>
> >>> It seems that version 4.1.0 requires the second node to be started
> >>>before it acquires the delegate IP.
> >>> After starting pgpool on node 1, I'm seeing that the watchdog
> >>>successfully acquired the delegate IP on node 0:
> >>>
> >>> 2020-02-11 10:45:26: pid 9928: LOG:  watchdog: escalation started
> >>> 2020-02-11 10:45:33: pid 9928: LOG:  successfully acquired the delegate IP:"10.11.0.204"
> >>> 2020-02-11 10:45:33: pid 9928: DETAIL:  'if_up_cmd' returned with success
> >>> 2020-02-11 10:45:33: pid 9577: LOG:  watchdog escalation process with pid: 9928 exit with SUCCESS.
> >>>
> >>> In previous versions, the watchdog would always acquire the delegate IP
> >>>without the second node being started.
> >>>
> >>>
> >>> From: "pgpool-general-bounces at pgpool.net"
> >>><pgpool-general-bounces at pgpool.net> on behalf of Wolfgang Schwurack
> >>><wolf at uen.org>
> >>> Date: Tuesday, February 11, 2020 at 10:22 AM
> >>> To: "pgpool-general at pgpool.net" <pgpool-general at pgpool.net>
> >>> Subject: [pgpool-general: 6863] watchdog fails to start pgpool-4.1.0
> >>>
> >>> I'm trying to get the watchdog to start using pgpool-4.1.0, but it fails
> >>>to start. I have been using pgpool-4.0.5 with the watchdog with no issues.
> >>> Has something changed in version 4.1.0 for the watchdog?
> >>> Hosts - Ubuntu 18.04
> >>> PostgreSQL 11
> >>>
> >>> I've been using pgpool for a long time; on each new release I have
> >>>always just done ./configure, make, make install.
> >>>
> >>> This is my start command:
> >>>
> >>> /usr/local/bin/pgpool -n -D -f /usr/local/etc/pgpool.conf > /var/log/pgpool/pgpool.log 2>&1 &
> >>>
> >>> In pgpool.log it would always show that it acquired the delegate IP.
> >>> Watchdog startup with version 4.0.5:
> >>>
> >>> 2020-02-11 10:13:05: pid 2195: LOG:  pgpool-II successfully started. version 4.0.5 (torokiboshi)
> >>> 2020-02-11 10:13:05: pid 2195: LOG:  node status[0]: 1
> >>> 2020-02-11 10:13:05: pid 2195: LOG:  node status[1]: 2
> >>> 2020-02-11 10:13:06: pid 2228: LOG:  creating socket for sending heartbeat
> >>> 2020-02-11 10:13:06: pid 2228: DETAIL:  bind send socket to device: eth0
> >>> 2020-02-11 10:13:06: pid 2228: LOG:  set SO_REUSEPORT option to the socket
> >>> 2020-02-11 10:13:06: pid 2228: LOG:  creating socket for sending heartbeat
> >>> 2020-02-11 10:13:06: pid 2228: DETAIL:  set SO_REUSEPORT
> >>> 2020-02-11 10:13:06: pid 2227: LOG:  createing watchdog heartbeat receive socket.
> >>> 2020-02-11 10:13:06: pid 2227: DETAIL:  bind receive socket to device: "eth0"
> >>> 2020-02-11 10:13:06: pid 2227: LOG:  set SO_REUSEPORT option to the socket
> >>> 2020-02-11 10:13:06: pid 2227: LOG:  creating watchdog heartbeat receive socket.
> >>> 2020-02-11 10:13:06: pid 2227: DETAIL:  set SO_REUSEPORT
> >>> 2020-02-11 10:13:12: pid 2200: LOG:  successfully acquired the delegate IP:"10.11.0.204"
> >>> 2020-02-11 10:13:12: pid 2200: DETAIL:  'if_up_cmd' returned with success
> >>> 2020-02-11 10:13:12: pid 2197: LOG:  watchdog escalation process with pid: 2200 exit with SUCCESS.
> >>>
> >>> Version 4.1.0 fails to start the watchdog:
> >>>
> >>> 2020-02-11 10:15:54: pid 8392: LOG:  pgpool-II successfully started. version 4.1.0 (karasukiboshi)
> >>> 2020-02-11 10:15:54: pid 8392: LOG:  node status[0]: 1
> >>> 2020-02-11 10:15:54: pid 8392: LOG:  node status[1]: 2
> >>> 2020-02-11 10:15:55: pid 8425: LOG:  creating socket for sending heartbeat
> >>> 2020-02-11 10:15:55: pid 8425: DETAIL:  bind send socket to device: eth0
> >>> 2020-02-11 10:15:55: pid 8425: LOG:  set SO_REUSEPORT option to the socket
> >>> 2020-02-11 10:15:55: pid 8425: LOG:  creating socket for sending heartbeat
> >>> 2020-02-11 10:15:55: pid 8425: DETAIL:  set SO_REUSEPORT
> >>> 2020-02-11 10:15:55: pid 8424: LOG:  createing watchdog heartbeat receive socket.
> >>> 2020-02-11 10:15:55: pid 8424: DETAIL:  bind receive socket to device: "eth0"
> >>> 2020-02-11 10:15:55: pid 8424: LOG:  set SO_REUSEPORT option to the socket
> >>> 2020-02-11 10:15:55: pid 8424: LOG:  creating watchdog heartbeat receive socket.
> >>> 2020-02-11 10:15:55: pid 8424: DETAIL:  set SO_REUSEPORT
> >>>
> >>>
> >>> Wolfgang Schwurack
> >>> Database/System Administrator
> >>> Utah Education Network
> >>> 801-587-9444
> >>> wolf at uen.org<mailto:wolf at uen.org>
> >>>
> >
>
>

