[pgpool-general: 7059] Re: Closing connection in OmniDB causes degenerate backend request on pgPool

Takuma Hoshiai hoshiai at sraoss.co.jp
Tue Jun 2 09:24:51 JST 2020


Hello,

Thank you for the information.
I found bug, this problem is caused in Pgpool-II 4.1.2.

In Pgpool-II 4.1.2 using native replication mode, this problem happens
for executing pg_terminate_backend command as you say.
If you are using Pgpool-II 4.1.2 installed from the source code, I think
that it will be improved by applying the attached patch and then reinstalling.

Best Regards,

Takuma Hoshiai

On Wed, 27 May 2020 13:25:42 +0000
Láznička Vladimír <Vladimir.Laznicka at cca.cz> wrote:

> Hello Takuma,
> 
> Thank you for your reply.
> 
> I reproduced the issue with logging parameters as requested and log output looks like this:
> 
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11987-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  Query message from frontend.
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11987-2] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |DETAIL:  query: "BEGIN"
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11988-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  DB node id: 0 backend pid: 2470 statement: SELECT version()
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11989-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  DB node id: 0 backend pid: 2470 statement: BEGIN
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11990-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  DB node id: 1 backend pid: 14439 statement: BEGIN
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11991-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  Query message from frontend.
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11991-2] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |DETAIL:  query: "select pg_terminate_backend(14368)"
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11992-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  found the pg_terminate_backend request for backend pid:14368 on backend node:1
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11992-2] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |DETAIL:  setting the connection flag
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11993-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  DB node id: 1 backend pid: 14439 statement: select pg_terminate_backend(14368)
> May 27 14:29:32 AISGPGP01 pgpool[11687]: [11684-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  reading and processing packets
> May 27 14:29:32 AISGPGP01 pgpool[11687]: [11684-2] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |DETAIL:  postmaster on DB node 1 was shutdown by administrative command
> May 27 14:29:32 AISGPGP01 pgpool[11687]: [11685-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  received degenerate backend request for node_id: 1 from pid [11687]
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [63-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  new IPC connection received
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [64-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  watchdog received the failover command from local pgpool-II on IPC interface
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [65-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [66-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  we do not need quorum to hold to proceed with failover
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [66-2] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |DETAIL:  proceeding with the failover
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [66-3] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |HINT:  failover_when_quorum_exists is set to false
> May 27 14:29:32 AISGPGP01 pgpool[30844]: [11987-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  Pgpool-II parent process has received failover request
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [67-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  new IPC connection received
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [68-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  received the failover indication from Pgpool-II on IPC interface
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [69-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  watchdog is informed of failover start by the main process
> May 27 14:29:32 AISGPGP01 pgpool[30844]: [11988-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  starting degeneration. shutdown host aisgdbd02.cca.cz(5432)
> May 27 14:29:32 AISGPGP01 pgpool[12160]: [11994-1] 2020-05-27 14:29:32 ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  Terminate message from frontend.
> May 27 14:29:32 AISGPGP01 pgpool[30844]: [11989-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  Restart all children
> May 27 14:29:32 AISGPGP01 pgpool[30844]: [11990-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  failover: set new primary node: -1
> May 27 14:29:32 AISGPGP01 pgpool[30844]: [11991-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  failover: set new master node: 0
> May 27 14:29:32 AISGPGP01 pgpool[31445]: [844-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  worker process received restart request
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [70-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  new IPC connection received
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [71-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  received the failover indication from Pgpool-II on IPC interface
> May 27 14:29:32 AISGPGP01 pgpool[30846]: [72-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  watchdog is informed of failover end by the main process
> May 27 14:29:32 AISGPGP01 pgpool[30844]: [11992-1] 2020-05-27 14:29:32 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  failover done. shutdown host aisgdbd02.cca.cz(5432)
> May 27 14:29:33 AISGPGP01 pgpool[31444]: [631-1] 2020-05-27 14:29:33 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  restart request received in pcp child proces
> 
> Let me know, if you find any new information from this. I will try to roll back to the older version of pgPool and try to replicate the issue - it is quite possible that we just missed it before (or thought it was some inconsistency in data that caused the failover).
> 
> Best regards,
> Vladimír Láznička
> 
> -----Original Message-----
> From: Takuma Hoshiai <hoshiai at sraoss.co.jp> 
> Sent: Wednesday, May 27, 2020 10:32 AM
> To: Láznička Vladimír <Vladimir.Laznicka at cca.cz>
> Cc: pgpool-general at pgpool.net
> Subject: Re: [pgpool-general: 7040] Closing connection in OmniDB causes degenerate backend request on pgPool
> 
> Hello,
> 
> On Tue, 26 May 2020 13:28:45 +0000
> Láznička Vladimír <Vladimir.Laznicka at cca.cz> wrote:
> 
> > Hello,
> > 
> > With the last version of pgPool (4.1.2) we have noticed a strange behavior while using the OmniDB client application. Firstly - we are running pgPool in a native replication mode in this configuration:
> > 
> > PgPool servers (coordinated by the Watchdog process):
> > 
> > AISGPGP01 - primary pgpool server with a virtual IP for client connections.
> > AISGPGP02 - standby pgpool server.
> > 
> > PostgreSQL servers (DML + DDL commands are replicated via pgPool to both, SELECTs are load balanced on one of them):
> > 
> > AISGDBD01 - DB node 0.
> > AISGDBD02 - DB node 1.
> > 
> > Our problem is following - when closing a window with editable SQL code in OmniDB application (effectively closing the session under which the window is connecting to the database), it causes a failover of the DB node as if the database on one of that node went down. The result is that one DB node is down, while other is still up. If it happens a second time (someone closes the connection from OmniDB), pgPool degenerates the last node and both databases are effectively down (PostgreSQL processes on AIGSDBD servers run fine) until they are manually brought up by PCP commands. The log from pgPool during the event is following:
> > 
> > May 26 12:07:59 AISGPGP01 pgpool[29612]: [2067-1] 2020-05-26 12:07:59 
> > ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No 
> > Connection] :U: |LOG:  new connection received May 26 12:07:59 
> > AISGPGP01 pgpool[29612]: [2067-2] 2020-05-26 12:07:59 ::: APP - [No 
> > Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: 
> > |DETAIL:  connecting host=nb197.cca.cz port=58065 May 26 12:07:59 
> > AISGPGP01 pgpool[29612]: [2068-1] 2020-05-26 12:07:59 ::: APP - OmniDB 
> > :A: DB - aisgdvyv :D: USER - aisg :U: |LOG:  found the 
> > pg_terminate_backend request for backend pid:32254 on backend node:0 
> > May 26 12:07:59 AISGPGP01 pgpool[29612]: [2068-2] 2020-05-26 12:07:59 
> > ::: APP - OmniDB :A: DB - aisgdvyv :D: USER - aisg :U: |DETAIL:  
> > setting the connection flag May 26 12:07:59 AISGPGP01 pgpool[29820]: 
> > [2340-1] 2020-05-26 12:07:59 ::: APP - OmniDB :A: DB - aisgdvyv :D: 
> > USER - aisg :U: |LOG:  reading and processing packets May 26 12:07:59 
> > AISGPGP01 pgpool[29820]: [2340-2] 2020-05-26 12:07:59 ::: APP - OmniDB 
> > :A: DB - aisgdvyv :D: USER - aisg :U: |DETAIL:  postmaster on DB node 
> > 0 was shutdown by administrative command May 26 12:07:59 AISGPGP01 
> > pgpool[29820]: [2341-1] 2020-05-26 12:07:59 ::: APP - OmniDB :A: DB - 
> > aisgdvyv :D: USER - aisg :U: |LOG:  received degenerate backend request for node_id: 0 from pid [29820] May 26 12:07:59 AISGPGP01 pgpool[22480]: [131-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  new IPC connection received May 26 12:07:59 AISGPGP01 pgpool[22480]: [132-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  watchdog received the failover command from local pgpool-II on IPC interface May 26 12:07:59 AISGPGP01 pgpool[22480]: [133-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface May 26 12:07:59 AISGPGP01 pgpool[22480]: [134-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  we do not need quorum to 
 hold to proceed with failover May 26 12:07:59 AISGPGP01 pgpool[22480]: [134-2] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |DETAIL:  proceeding with the failover May 26 12:07:59 AISGPGP01 pgpool[22480]: [134-3] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |HINT:  failover_when_quorum_exists is set to false May 26 12:07:59 AISGPGP01 pgpool[22478]: [2354-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  Pgpool-II parent process has received failover request May 26 12:07:59 AISGPGP01 pgpool[22480]: [135-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  new IPC connection received May 26 12:07:59 AISGPGP01 pgpool[22480]: [136-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  received
  the failover indication from Pgpool-II on IPC interface May 26 12:07:59 AISGPGP01 pgpool[22480]: [137-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  watchdog is informed of failover start by the main process May 26 12:07:59 AISGPGP01 pgpool[22478]: [2355-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  starting degeneration. shutdown host aisgdbd01.cca.cz(5432) May 26 12:07:59 AISGPGP01 pgpool[22478]: [2356-1] 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] :U: |WARNING:  All the DB nodes are in down status and skip writing status file.
> > May 26 12:07:59 AISGPGP01 pgpool[22478]: [2357-1] 2020-05-26 12:07:59 
> > ::: APP - [No Connection] :A: DB - [No Connection] :D: USER - [No 
> > Connection] :U: |LOG:  failover: no valid backend node found May 26 
> > 12:07:59 AISGPGP01 pgpool[22478]: [2358-1] 2020-05-26 12:07:59 ::: APP 
> > - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] 
> > :U: |LOG:  Restart all children May 26 12:07:59 AISGPGP01 
> > pgpool[22478]: [2359-1] 2020-05-26 12:07:59 ::: APP - [No Connection] 
> > :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  
> > failover: set new primary node: -1 May 26 12:07:59 AISGPGP01 
> > pgpool[22480]: [138-1] 2020-05-26 12:07:59 ::: APP - [No Connection] 
> > :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  new IPC 
> > connection received May 26 12:07:59 AISGPGP01 pgpool[22480]: [139-1] 
> > 2020-05-26 12:07:59 ::: APP - [No Connection] :A: DB - [No Connection] 
> > :D: USER - [No Connection] :U: |LOG:  received the failover indication 
> > from Pgpool-II on IPC interface May 26 12:07:59 AISGPGP01 
> > pgpool[29662]: [2327-1] 2020-05-26 12:07:59 ::: APP - [No Connection] 
> > :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  worker 
> > process received restart request May 26 12:07:59 AISGPGP01 
> > pgpool[22480]: [140-1] 2020-05-26 12:07:59 ::: APP - [No Connection] 
> > :A: DB - [No Connection] :D: USER - [No Connection] :U: |LOG:  
> > watchdog is informed of failover end by the main process May 26 
> > 12:07:59 AISGPGP01 pgpool[22478]: [2360-1] 2020-05-26 12:07:59 ::: APP 
> > - [No Connection] :A: DB - [No Connection] :D: USER - [No Connection] 
> > :U: |LOG:  failover done. shutdown host aisgdbd01.cca.cz(5432)
> > 
> > The log on the DB node 0 (that is being degenerated by pgPool) during that time is following:
> > 
> > 2020-05-26 12:07:59.265 CEST ::: APP - OmniDB :A: DB - aisgdvyv :D: 
> > USER - aisg :U: HOST - 172.20.15.64 :H: CMD - SELECT :C: |LOG:  AUDIT: 
> > SESSION,1,1,READ,SELECT,,,select pg_terminate_backend(32254),<not 
> > logged>
> > 2020-05-26 12:07:59.265 CEST ::: APP - OmniDB :A: DB - aisgdvyv :D: 
> > USER - aisg :U: HOST - 172.20.15.64 :H: CMD - idle :C: |FATAL:  
> > terminating connection due to administrator command
> > 2020-05-26 12:07:59.355 CEST ::: APP - OmniDB :A: DB - aisgdvyv :D: 
> > USER - aisg :U: HOST - 172.20.15.64 :H: CMD - idle in transaction :C: 
> > |LOG:  unexpected EOF on client connection with an open transaction
> > 
> > We have not noticed this behavior while running previous versions of the pgPool (4.1.1 and older) and even tested the same scenario on an environment with 4.1.1 without the nodes being degenerated, so it seems to be an issue of the new version, but I obviously cannot confirm that. I have also tried to change the failover_on_backend_error parameter, but it has no effect in this particular situation.
> 
> I think the cause of failover is pg_terminate_backend() command. Usually, it is happend by executing this command directly, without using Pgpool. 
> Because pgpool can not know if postgres shutdown or only terminate connection.
> 
> This behavior is dscried in documentation:
>   https://www.pgpool.net/docs/latest/en/html/restrictions.html
> 
> However I don't understand why it is happend by Pgpool-II 4.1.2 and it isn't happend by Pgpool-II 4.1.1 and under. 
> Could you Reproduce this work with enabled following paramer in pgpool ?
> 
>  log_per_node_statement = on
>  log_client_messages = on
> 
> > I am sending our pgPool configuration file as an attachment. Please, let me know, if there's a way to find out, what is actually happening between pgPool and OmniDB and how to prevent this behavior, or if you need to some additional information to solve this issue. Thank you for your time.
> > 
> > With regards,
> > Vladimír Láznička, CCA Group a. s.
> 
> Best Regarads,
> 
> Takuma Hoshiai
> 
> --
> Takuma Hoshiai <hoshiai at sraoss.co.jp>
> 


-- 
Takuma Hoshiai <hoshiai at sraoss.co.jp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_terminate_command_bug.patch
Type: application/octet-stream
Size: 884 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20200602/5d7ac0ed/attachment-0001.obj>


More information about the pgpool-general mailing list