[pgpool-general: 7381] Re: Watchdog New Primary & Standby shutdown when Node 0 Fails

Tatsuo Ishii ishii at sraoss.co.jp
Mon Dec 21 21:41:40 JST 2020


Hi Joe,

I was able to reproduce your problem using watchdog_setup. I had simply
made a mistake in my earlier testing. Sorry for the noise.

Steps to reproduce the problem:

$ watchdog_setup -wn 3 -n 2	# create 3 watchdog nodes and 2 PostgreSQL nodes.
$ ./startall
$ cd pgpool0
Edit the shutdownall script so that it does not shut down PostgreSQL:

dir=`pwd`
PGPOOL_INSTALL_DIR=/usr/local
$PGPOOL_INSTALL_DIR/bin/pgpool -f $dir/etc/pgpool.conf -m f stop
while [ -f $dir/run/pgpool.pid ];do sleep 1;done
#/usr/local/pgsql/bin/pg_ctl -D /home/t-ishii/work/Pgpool-II/current/a/pgpool0/data0 -m f stop <-- comment out
#/usr/local/pgsql/bin/pg_ctl -D /home/t-ishii/work/Pgpool-II/current/a/pgpool0/data1 -m f stop <-- comment out

# shut down pgpool0
$ ./shutdownall
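
The manual edit above could also be scripted. This is only a minimal
sketch: the function name comment_out_pg_ctl_stop is hypothetical, and it
assumes that every line to disable contains "pg_ctl" followed later by
" stop", as in the generated shutdownall shown above. It keeps a .bak copy
of the original script.

```shell
# Hypothetical helper: comment out every not-yet-commented "pg_ctl ... stop"
# line in the given script, leaving a .bak copy of the original next to it.
comment_out_pg_ctl_stop() {
    # Only lines matching "pg_ctl ... stop" are touched; a line that already
    # starts with '#' is left alone, so running this twice is harmless.
    sed -i.bak -e '/pg_ctl.* stop/ s/^[^#]/#&/' "$1"
}

# Usage, from inside the pgpool0 directory:
#   comment_out_pg_ctl_stop ./shutdownall
#   ./shutdownall
```

The pgpool stop line is not matched, so pgpool0 is still shut down while
both PostgreSQL nodes keep running.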

With the 4.2.0 watchdog, the PCP ports of pgpool1 and pgpool2 can no longer be connected to (your problem).
$ pcp_watchdog_info -w -p 50005
ERROR: connection to socket "/tmp/.s.PGSQL.50005" failed with error "No such file or directory"
$ pcp_watchdog_info -w -p 50009
ERROR: connection to socket "/tmp/.s.PGSQL.50009" failed with error "No such file or directory"

With 4.2 stable head:
$ pcp_watchdog_info -w -p 50005
localhost:50004 Linux tishii-CFSV7-1 localhost 50004 50006 4 LEADER    <-- pgpool1 is properly promoted
localhost:50000 Linux tishii-CFSV7-1 localhost 50000 50002 10 SHUTDOWN <-- pgpool0 shutdown as expected
localhost:50008 Linux tishii-CFSV7-1 localhost 50008 50010 7 STANDBY

So I can confirm that the problem is solved in 4.2 stable head (which is supposed to become 4.2.1).
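
For repeated runs, the check for a promoted leader can be done by
filtering the pcp_watchdog_info output. This is a rough sketch only: the
function name wd_leader is hypothetical, and it assumes the state name is
the last field of each node line, as in the output above without my
"<--" annotations.

```shell
# Hypothetical helper: read pcp_watchdog_info output on stdin and print the
# node name (first field) of every line whose last field is LEADER.
wd_leader() {
    awk '$NF == "LEADER" { print $1 }'
}

# Usage against the PCP port of pgpool1 from the test above:
#   pcp_watchdog_info -w -p 50005 | wd_leader
```

An empty result means that no node reported itself as LEADER in the output
that was read.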

BTW, Pengbo said that she is going to release 4.2.1 this Wednesday,
December 23. I believe she is going to release RPMs as well.

> Hi Tatsuo,
> 
> I haven't tried it with the watchdog_setup, I just configured it myself following the documentation.
> 
> I will try before the holidays and let you know.
> 
> Thanks,
> 
> Joe Madden
> Senior Systems Engineer
> D 01412224666      
> joe.madden at mottmac.com
> 
> 
> -----Original Message-----
> From: Tatsuo Ishii <ishii at sraoss.co.jp> 
> Sent: 20 December 2020 10:58
> To: m.usama at gmail.com
> Cc: Joe Madden <Joe.Madden at mottmac.com>; pgpool-general at pgpool.net
> Subject: Re: [pgpool-general: 7372] Re: Watchdog New Primary & Standby shutdown when Node 0 Fails
> 
> Hi Usama,
> 
>> Both wd_cli and the lifecheck mechanism use the same path, and the commit 
>> messages only mention wd_cli.
>> Looking at the email, I think it's a very critical issue and we should 
>> do a point release for 4.2. The issue was caused by an oversight in the 
>> "simplifying watchdog configuration" feature, which was introduced in 
>> 4.2, so the older versions should not have the same problem.
>> 
>> Thanks
>> Best regards
>> Muhammad Usama
> 
> BTW, I wonder why the watchdog cluster created by watchdog_setup does not show the problem.
> 
>> If I configure with node 0 always being dead:
> 
> Do you have any idea?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp/

