[pgpool-hackers: 3788] Re: pg_terminate_backend() does not work in native replication mode

Tatsuo Ishii ishii at sraoss.co.jp
Thu Aug 20 15:01:29 JST 2020


Another fundamental problem with pg_terminate_backend() in native
replication mode (and snapshot isolation mode) is that pgpool needs to
send pg_terminate_backend() with a different argument to each backend,
because the argument is a process id, which is not consistent among
backends.
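
To illustrate the point, here is a minimal sketch of the per-backend
translation that would be needed. It is not pgpool code: the names
find_session and per_node_terminate are hypothetical, and it assumes
that the pid given by the client is the pid on node 0 (as the log
below suggests) and that pgpool can look up the pids of the same
session on the other nodes. The pids 7615 and 7616 are the ones from
the log below.

```python
# Hypothetical sketch; none of these names come from pgpool's source.
from typing import Dict, List, Optional

# One entry per pgpool session: DB node id -> backend pid on that node.
Session = Dict[int, int]


def find_session(sessions: List[Session], node_id: int, pid: int) -> Optional[Session]:
    """Return the session whose backend on node_id has the given pid."""
    for session in sessions:
        if session.get(node_id) == pid:
            return session
    return None


def per_node_terminate(sessions: List[Session], pid: int, node_id: int = 0) -> Dict[int, str]:
    """Build one pg_terminate_backend() statement per node.

    The pid supplied by the client is assumed to belong to node_id;
    every other node gets the pid of the same session on that node.
    An unknown pid yields an empty mapping.
    """
    session = find_session(sessions, node_id, pid)
    if session is None:
        return {}
    return {
        node: f"select pg_terminate_backend({node_pid});"
        for node, node_pid in sorted(session.items())
    }


# Session taken from the log: node 0 uses pid 7615, node 1 uses pid 7616.
sessions = [{0: 7615, 1: 7616}]
statements = per_node_terminate(sessions, 7615)
for node, sql in statements.items():
    print(f"node {node}: {sql}")
```

With the session from the log, node 0 would receive
pg_terminate_backend(7615) and node 1 pg_terminate_backend(7616).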

> Hi Usma,
> 
> While looking into the 073.pg_terminate_backend test failure I found
> an interesting issue.
> 
> Suppose we execute the following SQL in native replication mode:
> 
> session 1: select pg_sleep(60); /* at time 't1' */
> 
> session 2: select pg_terminate_backend('7615');	/* at time 't2' */
> 
> The pg_sleep() should be canceled at time t2, but it is actually
> canceled at t2 + 60 seconds. Also, after the cancel we get:
> 
> WARNING:  packet kind of backend 1 ['D'] does not match with master/majority nodes packet kind ['E']
> WARNING:  write on backend 0 failed with error :"Success"
> DETAIL:  while trying to write data from offset: 0 wlen: 5
> FATAL:  failed to read kind from backend
> DETAIL:  kind mismatch among backends. Possible last query was: "select pg_sleep(60);" kind details are: 0[E: terminating connection due to administrator command] 1[D]
> HINT:  check data consistency among db nodes
> 
> What is actually happening here is:
> 
> 2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 0 backend pid: 7615 statement: BEGIN
> 2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 1 backend pid: 7616 statement: BEGIN
> 2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 0 backend pid: 7615 statement: select pg_sleep(60); <-- pgpool 7603 waiting for response from backend 0.
> 2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid: 7632 statement: SELECT version()
> 2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid: 7632 statement: SELECT count(*) FROM pg_catalog.pg_proc AS p, pg_catalog.pg_namespace AS n WHERE p.proname = 'pg_terminate_backend' AND n.oid = p.pronamespace AND n.nspname ~ '.*' AND p.provolatile = 'v'
> 2020-08-20 13:02:06: psql pid 7598: LOG:  found the pg_terminate_backend request for backend pid:7615 on backend node:0
> 2020-08-20 13:02:06: psql pid 7598: DETAIL:  setting the connection flag
> 2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid: 7632 statement: select pg_terminate_backend(7615);
> 2020-08-20 13:02:06: psql pid 7603: LOG:  DB node id: 1 backend pid: 7616 statement: select pg_sleep(60);  <--- pgpool 7603 got response because pg_terminate_backend executed. pgpool 7603 started to wait for response from backend 1.
> 2020-08-20 13:03:06: psql pid 7603: WARNING:  packet kind of backend 1 ['D'] does not match with master/majority nodes packet kind ['E'] <-- after 60 seconds passed, pgpool 7603 had responses from backend 0 and 1. <-- since backend 0 got an error while backend 1 successfully executed pg_sleep(60), there was a difference in packet kind.
> 2020-08-20 13:03:06: psql pid 7603: FATAL:  failed to read kind from backend <-- and pgpool gets angry!
> 2020-08-20 13:03:06: psql pid 7603: DETAIL:  kind mismatch among backends. Possible last query was: "select pg_sleep(60);" kind details are: 0[E: terminating connection due to administrator command] 1[D]
> 2020-08-20 13:03:06: psql pid 7603: HINT:  check data consistency among db nodes
> 2020-08-20 13:03:06: psql pid 7603: WARNING:  write on backend 0 failed with error :"Success"
> 2020-08-20 13:03:06: psql pid 7603: DETAIL:  while trying to write data from offset: 0 wlen: 5
> 2020-08-20 13:03:06: main pid 7572: LOG:  child process with pid: 7603 exits with status 512
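
The timeline in the log can be reproduced with a toy model. This is
only a sketch, not pgpool code: run_sequentially is a hypothetical
name, and it assumes what the annotations above describe, namely that
the query is sent to node 0 first and to node 1 only after node 0 has
responded. Time 0 is 13:01:46, and the terminate arrives 20 seconds
later (13:02:06) on node 0 only.

```python
# Toy model of the timeline in the log above; not pgpool code.
from typing import List, Optional, Tuple

SLEEP = 60  # seconds, from "select pg_sleep(60);"


def run_sequentially(
    terminate_at: Optional[int], terminated_node: int, nodes: int = 2
) -> List[Tuple[int, int, str]]:
    """Send the query to each node in turn, waiting for each response.

    Returns (node id, time the response arrives, packet kind) per node.
    'E' is an error response, 'D' is a data row.
    """
    now = 0
    results = []
    for node in range(nodes):
        finish = now + SLEEP
        hit = (
            node == terminated_node
            and terminate_at is not None
            and now <= terminate_at < finish
        )
        if hit:
            # The backend is terminated while the query is running.
            now, kind = terminate_at, "E"
        else:
            # The query runs to completion on this node.
            now, kind = finish, "D"
        results.append((node, now, kind))
    return results


timeline = run_sequentially(terminate_at=20, terminated_node=0)
for node, t, kind in timeline:
    print(f"node {node}: response at t={t}s kind={kind}")
```

In this model node 0 answers with 'E' at 20 seconds and node 1 answers
with 'D' at 80 seconds, that is t2 + 60, which matches the 13:03:06
timestamp and the kind mismatch in the log.
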
> 
> 
> Any idea how to deal with this problem?
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp

