[pgpool-general: 7149] Re: After FailOver.SH Both nodes remain on standby :(

Thu Jul 16 21:41:29 JST 2020

> Hi All,
> 
> I am investigating a possible bug in pgpool, where it keeps both servers as standby and does not promote the active node
> to the primary role.
> 
> Here is the output of show pool_nodes, which describes the issue:
> 
> [root@mgrdb100 ~]# PGPASSWORD=xxxxxxx psql -U postgres -h 10.65.181.99 -p 9999 -c 'show pool_nodes'
> node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
> ---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------
> 0       | 1.1.1.100 | 5432 | down   | 0.500000  | standby | 5392       | false             | 0
> 1       | 1.1.1.101 | 5432 | up     | 0.500000  | standby | 0          | true              | 0
> 
> 
> My scenario happened on Pgpool-II 3.6.6, and I have confirmed that it also happens on the latest version, 4.1.2.
> 
> The scenario is the following:
> 
>   1.  Two PostgreSQL servers are up and running: node 0 is the primary and node 1 is the standby.
>   2.  Two pgpool instances are up and running, and the VIP is attached properly.
>   3.  I execute kill -9 to kill PostgreSQL on 1.1.1.100, which was the primary (node_id 0).
>   4.  I examine the pgpool logs and see that FailOver.SH executed successfully.
> The trigger file is written on node 1 (the standby), and in the standby's PostgreSQL logs
> I found a message indicating that the trigger file was found.
> I also verified that PostgreSQL moved from standby to master:
>  autovacuum started working, and pg_is_in_recovery() returns false.
>   5.  So basically almost everything is fine, except that pgpool still reports the role as standby.
> 
> 
> I would appreciate some steps I should check in order to make progress toward a solution,
> or steps to confirm that it is a bug. I pulled the backend DB status out of the logs
> (see log below). I am not clear what status 2 means. My concern right now relates to an authentication issue,
> but it is still just a guess.
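The symptom in the report (a node is up, yet no node is listed as primary) can be checked mechanically from the show pool_nodes output. This is a minimal sketch, assuming psql's default aligned output as quoted above; parse_pool_nodes and has_no_primary are hypothetical helper names, not part of pgpool.

```python
from typing import Dict, List


def parse_pool_nodes(text: str) -> List[Dict[str, str]]:
    """Parse psql's aligned output of 'show pool_nodes' into one dict per node."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        stripped = ln.strip()
        if set(stripped) <= set("-+"):
            continue  # skip the ---+--- separator under the header
        if stripped.startswith("("):
            continue  # skip a trailing "(N rows)" footer, if present
        values = [v.strip() for v in ln.split("|")]
        rows.append(dict(zip(header, values)))
    return rows


def has_no_primary(rows: List[Dict[str, str]]) -> bool:
    """True when at least one node is up but no node is reported as primary."""
    any_up = any(row["status"] == "up" for row in rows)
    any_primary = any(row["role"] == "primary" for row in rows)
    return any_up and not any_primary


# Output quoted in the report above.
SAMPLE = """
node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------
0       | 1.1.1.100 | 5432 | down   | 0.500000  | standby | 5392       | false             | 0
1       | 1.1.1.101 | 5432 | up     | 0.500000  | standby | 0          | true              | 0
"""

if __name__ == "__main__":
    print(has_no_primary(parse_pool_nodes(SAMPLE)))  # True
```

Run periodically after a failover, a check like this would flag the "both nodes standby" state without reading the table by eye.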

Unfortunately the log is not very helpful. Detecting the primary node is
not done by the health check process. It would be better to share the log
from right after the failover, because that is where the primary node
should have been detected.
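A simplified model of that primary detection helps show where things can go wrong. It assumes each backend is asked SELECT pg_is_in_recovery() after the failover and that the first node answering false is taken as the primary; this is an illustration under that assumption, not pgpool's actual implementation, which is more involved and varies between versions. In this model, if the query against node 1 cannot run (for example, because of the authentication problem the original poster suspects), no node qualifies and both stay listed as standby. The function name find_primary is hypothetical.

```python
from typing import Optional, Sequence


def find_primary(in_recovery: Sequence[Optional[bool]]) -> Optional[int]:
    """Return the node_id of the first node whose pg_is_in_recovery() is False.

    Each entry is the query result for one node_id:
      False -> node is out of recovery (primary)
      True  -> node is still in recovery (standby)
      None  -> query could not be run (node down, auth failure, ...)
    Returns None if no primary is found.
    """
    for node_id, result in enumerate(in_recovery):
        if result is False:
            return node_id
    return None


if __name__ == "__main__":
    # Hypothetical results after the failover in the report:
    # node 0 was killed (query impossible), node 1 was promoted.
    print(find_primary([None, False]))  # 1
    # If the query against node 1 also fails, e.g. on authentication:
    print(find_primary([None, None]))   # None
```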

Also, sharing pgpool.conf would be useful for us.

Status "2" means "up" by the way.
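The numeric backend status seen in the log can be decoded with a small lookup. This sketch assumes the values follow the order of the BACKEND_STATUS enum in pgpool-II's source (CON_UNUSED, CON_CONNECT_WAIT, CON_UP, CON_DOWN), which agrees with "2" meaning "up"; the names are copied by assumption, so check them against the headers of the version you run.

```python
from enum import IntEnum


class BackendStatus(IntEnum):
    """Assumed to mirror pgpool-II's BACKEND_STATUS enum; verify in your version's source."""
    CON_UNUSED = 0        # unused slot
    CON_CONNECT_WAIT = 1  # waiting for connection to start
    CON_UP = 2            # up and running
    CON_DOWN = 3          # down, disconnected


if __name__ == "__main__":
    print(BackendStatus(2).name)  # CON_UP
```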

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp