[pgpool-hackers: 3786] pg_terminate_backend() does not work in native replication mode

Tatsuo Ishii ishii at sraoss.co.jp
Thu Aug 20 13:31:47 JST 2020


Hi Usma,

While looking into the 073.pg_terminate_backend test failure I found
an interesting issue.

Suppose we execute the following SQL in native replication mode:

session 1: select pg_sleep(60); /* at time 't1' */

session 2: select pg_terminate_backend('7615');	/* at time 't2' */

The pg_sleep() should be canceled at time t2, but actually it is
canceled at t2 + 60 seconds. Also after the cancel we get:

WARNING:  packet kind of backend 1 ['D'] does not match with master/majority nodes packet kind ['E']
WARNING:  write on backend 0 failed with error :"Success"
DETAIL:  while trying to write data from offset: 0 wlen: 5
FATAL:  failed to read kind from backend
DETAIL:  kind mismatch among backends. Possible last query was: "select pg_sleep(60);" kind details are: 0[E: terminating connection due to administrator command] 1[D]
HINT:  check data consistency among db nodes

What is actually happening here is:

2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 0 backend pid: 7615 statement: BEGIN
2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 1 backend pid: 7616 statement: BEGIN
2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 0 backend pid: 7615 statement: select pg_sleep(60); <-- pgpool 7603 is waiting for a response from backend 0.
2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid: 7632 statement: SELECT version()
2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid: 7632 statement: SELECT count(*) FROM pg_catalog.pg_proc AS p, pg_catalog.pg_namespace AS n WHERE p.proname = 'pg_terminate_backend' AND n.oid = p.pronamespace AND n.nspname ~ '.*' AND p.provolatile = 'v'
2020-08-20 13:02:06: psql pid 7598: LOG:  found the pg_terminate_backend request for backend pid:7615 on backend node:0
2020-08-20 13:02:06: psql pid 7598: DETAIL:  setting the connection flag
2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid: 7632 statement: select pg_terminate_backend(7615);
2020-08-20 13:02:06: psql pid 7603: LOG:  DB node id: 1 backend pid: 7616 statement: select pg_sleep(60);  <--- pgpool 7603 got a response from backend 0 because pg_terminate_backend was executed. pgpool 7603 then started to wait for a response from backend 1.
2020-08-20 13:03:06: psql pid 7603: WARNING:  packet kind of backend 1 ['D'] does not match with master/majority nodes packet kind ['E'] <-- after 60 seconds passed, pgpool 7603 had responses from both backend 0 and backend 1. <-- since backend 0 returned an error while backend 1 successfully executed pg_sleep(60), the packet kinds differed.
2020-08-20 13:03:06: psql pid 7603: FATAL:  failed to read kind from backend <-- and pgpool gets angry!
2020-08-20 13:03:06: psql pid 7603: DETAIL:  kind mismatch among backends. Possible last query was: "select pg_sleep(60);" kind details are: 0[E: terminating connection due to administrator command] 1[D]
2020-08-20 13:03:06: psql pid 7603: HINT:  check data consistency among db nodes
2020-08-20 13:03:06: psql pid 7603: WARNING:  write on backend 0 failed with error :"Success"
2020-08-20 13:03:06: psql pid 7603: DETAIL:  while trying to write data from offset: 0 wlen: 5
2020-08-20 13:03:06: main pid 7572: LOG:  child process with pid: 7603 exits with status 512
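
To make the timing in the log above easier to follow, here is a small toy
model in Python. It is only a sketch under my reading of the log, not
pgpool's actual code: it assumes the query is forwarded to backend 1 only
after backend 0 has answered, and that nothing terminates the session on
backend 1. The function name simulate() and its variables are made up for
illustration.

```python
from datetime import datetime, timedelta

SLEEP = timedelta(seconds=60)  # duration of pg_sleep(60)


def simulate(t1, t2):
    """Toy model of master-first dispatch (an assumption, not pgpool code).

    t1: time the query is sent to backend 0
    t2: time pg_terminate_backend() hits the backend 0 session
    """
    # Backend 0 either finishes the sleep normally ('D', DataRow) or is
    # terminated first and answers with an ErrorResponse ('E').
    if t2 < t1 + SLEEP:
        backend0_reply, backend0_kind = t2, "E"
    else:
        backend0_reply, backend0_kind = t1 + SLEEP, "D"

    # Assumed: the query goes to backend 1 only after backend 0 answered.
    backend1_sent = backend0_reply
    # Nothing terminates backend 1, so it sleeps the full 60 seconds.
    backend1_reply, backend1_kind = backend1_sent + SLEEP, "D"

    # The client gets an answer only once both replies have arrived.
    finished = max(backend0_reply, backend1_reply)
    kinds = {0: backend0_kind, 1: backend1_kind}
    mismatch = len(set(kinds.values())) > 1
    return backend1_sent, finished, kinds, mismatch


# Timestamps taken from the log above.
t1 = datetime(2020, 8, 20, 13, 1, 46)
t2 = datetime(2020, 8, 20, 13, 2, 6)
backend1_sent, finished, kinds, mismatch = simulate(t1, t2)
print("query sent to backend 1:", backend1_sent.time())
print("session 1 finishes:     ", finished.time())
print("packet kinds:", kinds, "mismatch:", mismatch)
```

With the timestamps from the log, the model sends the query to backend 1
at 13:02:06 and finishes at 13:03:06, i.e. t2 + 60 seconds, with kind 'E'
from backend 0 and 'D' from backend 1.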


Any idea how to deal with this problem?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

