[pgpool-general: 8107] Re: Problems taking node offline

Wed Apr 27 11:37:44 JST 2022

Hello,

On Tue, 26 Apr 2022 15:01:15 +0000
Jon SCHEWE <jon.schewe at raytheon.com> wrote:

> >> I want to take a backend node offline and am having some trouble with it.
> >>
> >> I check the status of my nodes:
> >> template1=> show pool_nodes;
> >>  node_id |       hostname       | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change 
> >> ---------+----------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >>  0       | psql-01.mgmt.bbn.com | 5432 | up     | 0.333333  | standby | 646198     | false             | 0                 | streaming         | sync                   | 2022-04-25 14:19:57
> >>  1       | psql-02.mgmt.bbn.com | 5432 | up     | 0.333333  | primary | 2115353    | true              | 0                 |                   |                        | 2022-04-25 14:16:24
> >>  2       | psql-03.mgmt.bbn.com | 5432 | up     | 0.333333  | standby | 2913       | false             | 0                 | streaming         | potential              | 2022-04-25 14:24:25
> >> (3 rows)
> >>
> >> I want to take psql-02 offline.
> >>
> >> pcp_detach_node -h psql.mgmt.bbn.com -p 9897 -U pgpool -g -n 1
> >> Password:
> >> pcp_detach_node -- Command Successful
> >>
> >>
> >> I check the status again:
> >> template1=> show pool_nodes;
> >>  node_id |       hostname       | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change 
> >> ---------+----------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >>  0       | psql-01.mgmt.bbn.com | 5432 | up     | 0.333333  | standby | 718555     | true              | 0                 | streaming         | sync                   | 2022-04-25 14:19:57
> >>  1       | psql-02.mgmt.bbn.com | 5432 | up     | 0.333333  | primary | 2373454    | false             | 0                 |                   |                        | 2022-04-25 14:16:24
> >>  2       | psql-03.mgmt.bbn.com | 5432 | up     | 0.333333  | standby | 3310       | false             | 0                 | streaming         | potential              | 2022-04-25 14:24:25
> >> (3 rows)
> >>
> >>
> >> I still see psql-02 online. Why is that?
> >
> >Could you share pgpool.conf
> 
> Yes, attached.
> 
> > and full log after running pcp_detach_node?
> 
> The only log messages are what I sent originally.
> 
> >Which version of Pgpool-II are you using?
> 
> 4.1.4

Thank you.

I suspect that watchdog is not working properly.
When you run pcp_detach_node for node 1, which is the primary, failover_command
and follow_master_command should be executed, but I could not find any
related messages in the logs you sent.

Could you check the watchdog status using the pcp_watchdog_info command?
Does this issue still occur if you disable watchdog (use_watchdog = off)?
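
For reference, here is a minimal sketch of the watchdog check. It assumes the
same PCP host, port and user as in your pcp_detach_node command, so please
adjust them if your setup differs. The -v option asks for verbose output, and
the "command -v" guard is only there to avoid an error when the PCP client
tools are not in PATH on the machine you run this from.

```shell
# Assumption: same PCP endpoint and user as the pcp_detach_node call quoted above.
PCP_HOST=psql.mgmt.bbn.com
PCP_PORT=9897
PCP_USER=pgpool

# Query the watchdog cluster status; -v prints verbose, labelled output.
if command -v pcp_watchdog_info >/dev/null 2>&1; then
    pcp_watchdog_info -h "$PCP_HOST" -p "$PCP_PORT" -U "$PCP_USER" -v
else
    echo "pcp_watchdog_info not found in PATH" >&2
fi
```

To test without watchdog, set use_watchdog = off in pgpool.conf on each
Pgpool-II node, restart Pgpool-II, and then run pcp_detach_node again.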

> This morning I checked and 2 of the nodes are marked as down and the primary has changed. Perhaps the pcp command took some more time (hours) to complete?
> 
> template1=> show pool_nodes;
>  node_id |       hostname       | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
> ---------+----------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>  0       | psql-01.mgmt.bbn.com | 5432 | up     | 0.333333  | primary | 27047705   | true              | 0                 |                   |                        | 2022-04-26 00:19:21
>  1       | psql-02.mgmt.bbn.com | 5432 | down   | 0.333333  | standby | 15170253   | false             | 0                 |                   |                        | 2022-04-26 00:19:21
>  2       | psql-03.mgmt.bbn.com | 5432 | down   | 0.333333  | standby | 214556     | false             | 0                 | streaming         | sync                   | 2022-04-26 00:19:21
> (3 rows)
> 
> 
> >
> >> Log messages during the pcp command:
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 11674: LOG:  new IPC connection received
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 11674: LOG:  online recovery request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "psql-02.mgmt.bbn.com:9898 Linux psql-02"
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 11674: DETAIL:  waiting for the reply...
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 13736: LOG:  PCP process with pid: 20049 exit with SUCCESS.
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 13736: LOG:  PCP process with pid: 20049 exits with status 0
> >>
> >>
> >> Apr 25 15:59:54 psql-02 pgpool[11672]: 2022-04-25 15:59:54: pid 11672: LOG:  child process with pid: 19261 exits with status 256
> >> Apr 25 15:59:54 psql-02 pgpool[11672]: 2022-04-25 15:59:54: pid 11672: LOG:  fork a new child process with pid: 20176
> >> Apr 25 15:59:54 psql-02 pgpool[11672]: 2022-04-25 15:59:54: pid 11672: LOG:  child process with pid: 19006 exits with status 256
> >> Apr 25 15:59:54 psql-02 pgpool[11672]: 2022-04-25 15:59:54: pid 11672: LOG:  fork a new child process with pid: 20178

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan
http://www.sraoss.co.jp/