[pgpool-general: 7683] Re: Failover question

Bo Peng pengbo at sraoss.co.jp
Fri Sep 3 11:24:22 JST 2021


> I found the issue: /bin/ip did not have the setuid bit set on node3. After setting the setuid bit on /bin/ip, the failover on node3 is working.
> 
> root at pgtest-03:~# ll /bin/ip
> -rwxr-xr-x 1 root root 611960 Feb 13  2020 /bin/ip*
> root at pgtest-03:~# chmod u=rwxs,g=rx,o=rx /bin/ip
> root at pgtest-03:~# ll /bin/ip
> -rwsr-xr-x 1 root root 611960 Feb 13  2020 /bin/ip*

Great!
Yes, you need to set the setuid bit (the "s" in -rwsr-xr-x, not the sticky bit) on the /bin/ip command, so that pgpool's escalation process can run it with root privileges when bringing up the delegate IP.
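A minimal sketch of the fix. The /bin/ip and /usr/bin/arping paths are assumptions for Ubuntu (check with `which ip arping`); the chmod commands require root, so the snippet below also demonstrates the same bit on a scratch file that needs no privileges:

```shell
# The actual fix (run as root; paths are assumptions for Ubuntu):
#   chmod u+s /bin/ip          # if_up_cmd/if_down_cmd need root to move the VIP
#   chmod u+s /usr/bin/arping  # arping_cmd needs root for gratuitous ARP
#
# Demonstration of the setuid bit on a scratch file (no root needed):
f=$(mktemp)
chmod u+s "$f"                 # u+s sets the setuid bit (the "s" in -rwsr-xr-x)
test -u "$f" && echo "setuid bit set"   # test -u succeeds when the bit is set
rm -f "$f"
```

Without the bit, `ip addr add` run by the non-root pgpool user fails with exactly the "RTNETLINK answers: Operation not permitted" error seen in the log below.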
 
> On 9/1/21, 10:54 AM, "pgpool-general on behalf of Wolf Schwurack" <pgpool-general-bounces at pgpool.net on behalf of wolf at uen.org> wrote:
> 
>     Hello
> 
>     This shows pcp_watchdog_info after node1 is added back in
>     postgres at pgtest-02:~$ pcp_watchdog_info -h localhost -U wolf
>     Password: 
>     3 YES pgtest-02:9999 Linux pgtest-02 pgtest-02
> 
>     pgtest-02:9999 Linux pgtest-02 pgtest-02 9999 9000 4 LEADER
>     pgtest-01:9999 Linux pgtest-01 pgtest-01 9999 9000 7 STANDBY
>     pgtest-03:9999 Linux pgtest-03 pgtest-03 9999 9000 7 STANDBY
> 
>     Here's the pgpool.log from node3 after shutdown of pgpool on node2
>     2021-09-01 10:43:38: pid 417478: LOG:  adding watchdog node "pgtest-01:9999 Linux pgtest-01" to the standby list
>     2021-09-01 10:43:38: pid 417478: LOG:  quorum found
>     2021-09-01 10:43:38: pid 417478: DETAIL:  starting escalation process
>     2021-09-01 10:43:38: pid 417478: LOG:  escalation process started with PID:554759
>     2021-09-01 10:43:38: pid 417478: LOG:  signal_user1_to_parent_with_reason(3)
>     2021-09-01 10:43:38: pid 417474: LOG:  Pgpool-II parent process received SIGUSR1
>     2021-09-01 10:43:38: pid 417474: LOG:  Pgpool-II parent process received watchdog quorum change signal from watchdog
>     2021-09-01 10:43:38: pid 417478: LOG:  new IPC connection received
>     2021-09-01 10:43:38: pid 417474: LOG:  watchdog cluster now holds the quorum
>     2021-09-01 10:43:38: pid 417474: DETAIL:  updating the state of quarantine backend nodes
>     2021-09-01 10:43:38: pid 417478: LOG:  new IPC connection received
>     2021-09-01 10:43:38: pid 554759: LOG:  watchdog: escalation started
>     RTNETLINK answers: Operation not permitted
>     2021-09-01 10:43:38: pid 554759: LOG:  failed to acquire the delegate IP address
>     2021-09-01 10:43:38: pid 554759: DETAIL:  'if_up_cmd' failed
>     2021-09-01 10:43:38: pid 554759: WARNING:  watchdog escalation failed to acquire delegate IP
> 
>     Here's pcp_watchdog_info on node3 after shutdown of pgpool on node2
>     postgres at pgtest-03:~$ pcp_watchdog_info -h localhost -U wolf
>     Password: 
>     3 YES pgtest-03:9999 Linux pgtest-03 pgtest-03
> 
>     pgtest-03:9999 Linux pgtest-03 pgtest-03 9999 9000 4 LEADER
>     pgtest-01:9999 Linux pgtest-01 pgtest-01 9999 9000 7 STANDBY
>     pgtest-02:9999 Linux pgtest-02 pgtest-02 9999 9000 10 SHUTDOWN
> 
>     Here's pcp_watchdog_info on node3 after start of pgpool on node2
>     postgres at pgtest-03:~$ pcp_watchdog_info -h localhost -U wolf
>     Password: 
>     3 YES pgtest-03:9999 Linux pgtest-03 pgtest-03
> 
>     pgtest-03:9999 Linux pgtest-03 pgtest-03 9999 9000 4 LEADER
>     pgtest-01:9999 Linux pgtest-01 pgtest-01 9999 9000 7 STANDBY
>     pgtest-02:9999 Linux pgtest-02 pgtest-02 9999 9000 7 STANDBY
> 
>     Still no watchdog IP enabled. It seems this is the issue on node3; maybe a permission issue?
>     RTNETLINK answers: Operation not permitted
> 
>     Wolf
> 
>     On 8/29/21, 9:18 PM, "Bo Peng" <pengbo at sraoss.co.jp> wrote:
> 
>         Hello,
> 
>         > Sorry, but you missed the part where node 1 was added back as a standby after the failover to node 2. At the point when I turn off pgpool on node 2, node 1 and node 3 are the standby nodes, so node 3 should take over the watchdog.
> 
>         I have tested Pgpool-II 4.2.4, but I could not reproduce this issue.
>         Could you share the following information?
> 
>         - result of "pcp_watchdog_info" after adding back node1 as a standby
>         - pgpool logs of node 1 and node 3 after turning off pgpool on node2.
> 
> 
>         > Wolf 
>         > 
>         > On 8/27/21, 10:07 AM, "Bo Peng" <pengbo at sraoss.co.jp> wrote:
>         > 
>         >     Hello,
>         > 
>         >     > My question is why watchdog doesn’t come up on node 3. Pgpool.conf is set the same on all 3 nodes.
>         > 
>         >     If you shut down pgpool on node1 and node2, only one pgpool node remains alive,
>         >     so the quorum does not exist.
>         > 
>         >     If you want to enable watchdog even if the quorum does not exist,
>         >     you need to enable the parameter "enable_consensus_with_half_votes".
>         > 
>         >     See more detail about "enable_consensus_with_half_votes":
>         >     https://www.pgpool.net/docs/latest/en/html/runtime-watchdog-config.html#GUC-ENABLE-CONSENSUS-WITH-HALF-VOTES
>         > 
>         >     > I have a 3-node setup for pgpool/postgresql using watchdog. When testing the failover of pgpool, I turn off pgpool on node 1, which fails over the watchdog to node 2. Then I turn on pgpool on node 1, which sets node 1 as a standby node. Next I turn off pgpool on node 2; the watchdog tries to fail over to node 3, but the watchdog IP never comes up on node 3 or any of the nodes. So I turn off pgpool on node 3 and the watchdog fails over to node 1.
>         >     > My question is why watchdog doesn’t come up on node 3. Pgpool.conf is set the same on all 3 nodes.
>         >     > 
>         >     > Here’s my output of show pool_nodes
>         >     > 
>         >     >  node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>         >     > ---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>         >     >  0       | pgtest-01 | 5432 | up     | 0.500000  | primary | 2003       | true              | 0                 |                   |                        | 2021-08-24 14:06:20
>         >     >  1       | pgtest-02 | 5432 | up     | 0.500000  | standby | 667        | false             | 0                 | streaming         | async                  | 2021-08-24 14:06:20
>         >     >  2       | pgtest-03 | 5432 | up     | 0.000000  | standby | 0          | false             | 0                 | streaming         | async                  | 2021-08-24 14:06:20
>         >     > Not sure if this is an issue, but lb_weight shows node 1 (pgtest-01) and node 2 (pgtest-02) as 0.500000 and node 3 (pgtest-03) as 0.000000.
>         >     > 
>         >     > In pgpool.conf I have backend_weight for each node set to 0.3
>         >     > 
>         >     > Hosts = Ubuntu 20.04
>         >     > Pgpool = 4.2.4
>         >     > PostgreSQL = 12.8
>         >     > 
>         >     > 
>         >     > -- Wolf
>         >     > 
>         >     > 
>         > 
>         > 
>         >     -- 
>         >     Bo Peng <pengbo at sraoss.co.jp>
>         >     SRA OSS, Inc. Japan
>         >     http://www.sraoss.co.jp/
>         > 
> 
> 
>         -- 
>         Bo Peng <pengbo at sraoss.co.jp>
>         SRA OSS, Inc. Japan
>         http://www.sraoss.co.jp/
> 
>     _______________________________________________
>     pgpool-general mailing list
>     pgpool-general at pgpool.net
>     http://www.pgpool.net/mailman/listinfo/pgpool-general
> 
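For reference, the quorum parameter mentioned above lives in pgpool.conf on each node. A minimal fragment (the value shown is the default; this is a sketch, see the linked documentation for the exact semantics in your cluster size):

```ini
# pgpool.conf -- watchdog quorum behavior
# Default is off: a strict majority of watchdog nodes is required for quorum.
# Set to on to also accept consensus when exactly half of the nodes agree
# (mainly relevant for even-sized clusters).
enable_consensus_with_half_votes = off
```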


-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan
http://www.sraoss.co.jp/

