[pgpool-general: 7148] After FailOver.SH Both nodes remain on standby :(

Thu Jul 16 20:22:03 JST 2020

Hi All,

I am investigating a possible bug in pgpool, where it keeps both servers as standby and does not promote the surviving node
to the primary role.

Here is the output of show pool_nodes, which illustrates the issue:

[root@mgrdb100 ~]# PGPASSWORD=xxxxxxx psql -U postgres -h 10.65.181.99 -p 9999 -c 'show pool_nodes'
node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------
0       | 1.1.1.100 | 5432 | down   | 0.500000  | standby | 5392       | false             | 0
1       | 1.1.1.101 | 5432 | up     | 0.500000  | standby | 0          | true              | 0

The scenario happened on Pgpool-II 3.6.6, and I confirmed that it also happens on the latest version, 4.1.2.

The scenario is the following:

  1.  Two PostgreSQL servers are up and running: node 0 is the primary, node 1 is the standby.
  2.  Two pgpool instances are up and running, and the VIP is attached properly.
  3.  I run kill -9 to kill PostgreSQL on 1.1.1.100, which was the primary (node_id 0).
  4.  I examine the pgpool logs and see that FailOver.SH executed successfully.
The trigger file is written on node 1 (the standby), and in the PostgreSQL log of the standby
I found a message indicating that the trigger file was found.
I also verified that PostgreSQL moved from standby to master:
autovacuum starts working and pg_is_in_recovery() returns false.
  5.  So almost everything is fine, except that pgpool still reports the role of node 1 as standby.
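
The promotion check from step 4 can be repeated by hand directly against the backend, bypassing pgpool. This is only a minimal sketch: it assumes the postgres user can connect to the backend on port 5432, and the helper names role_from_recovery and check_backend_role are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map the output of pg_is_in_recovery() to a role name.
# psql -At prints 't' for true (still in recovery) and 'f' for false.
role_from_recovery() {
    case "$1" in
        f) echo "primary" ;;
        t) echo "standby" ;;
        *) echo "unknown" ;;
    esac
}

# Ask the backend directly (port 5432), not through pgpool (port 9999).
# User and port are taken from the scenario above; adjust as needed.
check_backend_role() {
    state=$(psql -At -U postgres -h "$1" -p 5432 \
        -c 'select pg_is_in_recovery()') || state=""
    role_from_recovery "$state"
}

# Example (not run automatically): check_backend_role 1.1.1.101
```

If this prints "primary" for 1.1.1.101 while show pool_nodes still reports standby, the mismatch is on the pgpool side rather than in PostgreSQL.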

I would appreciate any steps I should check in order to make progress toward a solution,
or to confirm that this is a bug. I pulled the backend DB status out of the logs
(see the log below). I am not clear on what status 2 means. My concern right now is an authentication issue,
but that is just a guess.

2020-07-15 17:12:05: pid 25729: DEBUG:  starting health check
2020-07-15 17:12:05: pid 25729: DEBUG:  doing health check against database:postgres user:postgres
2020-07-15 17:12:05: pid 25729: DEBUG:  Backend DB node 0 status is 3
2020-07-15 17:12:05: pid 25729: DEBUG:  Backend DB node 1 status is 2
2020-07-15 17:12:05: pid 25729: DEBUG:  Trying to make persistent DB connection to backend node 1 having status 2
2020-07-15 17:12:05: pid 25729: DEBUG:  pool_flush_it: flush size: 41
2020-07-15 17:12:05: pid 25729: DEBUG:  pool_read: read 13 bytes from backend 1
2020-07-15 17:12:05: pid 25729: DEBUG:  authenticate kind = 5
2020-07-15 17:12:05: pid 25729: DEBUG:  pool_write: to backend: 1 kind:p
2020-07-15 17:12:05: pid 25729: DEBUG:  pool_flush_it: flush size: 41
2020-07-15 17:12:05: pid 25729: DEBUG:  pool_flush_it: flush size: 0
2020-07-15 17:12:05: pid 25729: DEBUG:  pool_read: read 321 bytes from backend 1
2020-07-15 17:12:05: pid 25729: DEBUG:  authenticate kind = 0
2020-07-15 17:12:05: pid 25729: DEBUG:  authenticate backend: key data received
2020-07-15 17:12:05: pid 25729: DEBUG:  authenticate backend: transaction state: I
2020-07-15 17:12:05: pid 25729: DEBUG:  persistent DB connection to backend node 1 having status 2 is successful
2020-07-15 17:12:05: pid 25729: DEBUG:  pool_write: to backend: 1 kind:X
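
The numeric codes in the log above can be decoded with a small helper. This is a sketch under stated assumptions, not something taken from the pgpool documentation: the status values are assumed to follow the BACKEND_STATUS enum in the pgpool source (0 unused, 1 connect wait, 2 up, 3 down), and the "authenticate kind" values are the PostgreSQL protocol authentication request codes (0 is AuthenticationOk, 5 is AuthenticationMD5Password). All function names are hypothetical.

```shell
#!/bin/sh
# Assumed mapping of pgpool backend status codes (BACKEND_STATUS enum).
backend_status_name() {
    case "$1" in
        0) echo "unused" ;;
        1) echo "connect_wait" ;;
        2) echo "up" ;;
        3) echo "down" ;;
        *) echo "unknown" ;;
    esac
}

# PostgreSQL protocol authentication request codes that appear in the log.
auth_kind_name() {
    case "$1" in
        0) echo "AuthenticationOk" ;;
        5) echo "AuthenticationMD5Password" ;;
        *) echo "other" ;;
    esac
}

# Read a pgpool debug log on stdin and decode every
# "Backend DB node N status is S" line into "node N: <name>".
decode_status_lines() {
    sed -n 's/.*Backend DB node \([0-9][0-9]*\) status is \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r node status; do
        echo "node $node: $(backend_status_name "$status")"
    done
}

# Example (not run automatically): decode_status_lines < pgpool.log
```

If that reading is right, the log shows node 0 as down and node 1 as up, and the md5 authentication against node 1 completed successfully (kind 5 followed by kind 0), which would point away from authentication as the cause.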
