[pgpool-hackers: 4142] Long standing bug with pgpool main process

Fri Mar 4 20:31:16 JST 2022

I have found a long-standing bug in the pgpool main process. The bug
is serious because when it is triggered, the pgpool main process
exits. If watchdog is not enabled, the only way to recover is to
restart pgpool. I have to blame myself for this bug. Sorry.

Below is the explanation of the bug, taken from the commit message.

The Pgpool-II main process tries to find the primary node whenever the
cluster status is changed by failover/failback. If a backend is failing
or shutting down while it does so, a socket write to that backend can
fail. Unfortunately, in that case do_query() throws a FATAL error,
which makes the Pgpool-II main process die like this:

2022-03-04 18:13:12.711: main pid 795826: WARNING:  write on backend 1 failed with error :"Broken pipe"
2022-03-04 18:13:12.711: main pid 795826: DETAIL:  while trying to write data from offset: 0 wlen: 32
2022-03-04 18:13:12.711: main pid 795826: LOG:  notice_backend_error: called from pgpool main. ignored.
2022-03-04 18:13:12.711: main pid 795826: LOG:  unable to flush data to backend
2022-03-04 18:13:12.711: main pid 795826: DETAIL:  do not failover because I am the main process
2022-03-04 18:13:12.711: main pid 795826: FATAL:  Backend throw an error message
2022-03-04 18:13:12.711: main pid 795826: DETAIL:  Exiting current session because of an error from backend
2022-03-04 18:13:12.711: main pid 795826: HINT:  BACKEND Error: "terminating connection due to administrator command"
2022-03-04 18:13:12.715: main pid 795826: LOG:  shutting down

To prevent it, change ereport(FATAL) to ereport(ERROR) in do_query().
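
To illustrate why the severity level matters, here is a minimal,
self-contained sketch. It is not Pgpool-II code: every toy_* name is
hypothetical, and setjmp/longjmp only stands in for the error recovery
that I assume ereport(ERROR) gives the caller, while FATAL is modelled
as ending the process.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy severity levels (hypothetical; not the real elog level values). */
enum toy_level { TOY_ERROR, TOY_FATAL };

/* Where a TOY_ERROR report unwinds to (assumed analogue of an error handler). */
static jmp_buf toy_recovery_point;

/* Toy stand-in for ereport(): FATAL ends the process, ERROR unwinds. */
static void toy_report(enum toy_level level, const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "%s: %s\n", level == TOY_FATAL ? "FATAL" : "ERROR", msg);
    if (level == TOY_FATAL)
        exit(1);                      /* the whole process goes away */
    longjmp(toy_recovery_point, 1);   /* the caller gets a chance to recover */
}

/* Toy stand-in for do_query(): reports at `level` when the write fails. */
static void toy_do_query(int write_fails, enum toy_level level)
{
    if (write_fails)
        toy_report(level, "unable to flush data to backend");
}

/*
 * Toy stand-in for the primary node search in the main process.
 * Returns 0 when the query succeeded and -1 when it failed but the
 * process survived. With TOY_FATAL instead of TOY_ERROR, a failed
 * write would never return here at all.
 */
int toy_find_primary(int write_fails)
{
    if (setjmp(toy_recovery_point) != 0)
        return -1;                    /* reached via longjmp from toy_report() */
    toy_do_query(write_fails, TOY_ERROR);
    return 0;
}
```

Calling toy_find_primary(1) returns -1 and the process keeps running,
so a later toy_find_primary(0) can still succeed. Passing TOY_FATAL to
toy_do_query() instead would terminate the process on the first failed
write, which corresponds to the behavior in the log above.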

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp