[pgpool-hackers: 3791] Re: pg_terminate_backend() does not work in native replication mode

Muhammad Usama m.usama at gmail.com
Tue Aug 25 15:36:38 JST 2020


Hi Ishii-San

I have been thinking about this issue in the past days and unfortunately
couldn't think of a good solution.

As you mentioned in native replication mode and with snapshot isolation
as well almost all queries are sent to multiple PostgreSQL backends and
even when we issue pg_terminate_backend for one backend process
the Pgpool-II keeps on waiting for the other backends to finish their tasks.
Now there could be few solutions to this problem but each has its own
drawback.

The first solution would be to identify all the relevant backend PIDs
and issue individual pg_terminate_backend function to each backend
node after substituting the appropriate PID argument to the function.
But the problem with this solution is not only this is complex to
implement but also has some fundamental differences from the
actual functionality provided by the pg_terminate_backend(),
Since the pg_terminate_backend is supposed to kill a single troubling
backend process that was identified by the user from pg_stat_activity or any
other similar means so the user might not expect the function to influence
or kill some other backend on a different PG server. SO that might
come as a surprise for some users.

The second solution is to clearly mention this behavior in the
documentation and informs the users about how to proceed
with pg_terminate_backend in native replication and snapshot
isolation modes.
i.e.
step1: get the PIDs of all the backend process related to the particular
pgpool child process
step2: issue pg_terminate_backend for each PID individually.

The third solution I can think of is to add a new pgpool-II local
function for the purpose.
For example, we can add a new function in Pgpool-II say
pool_terminate_all_backends(pgpool-child-pid) and that new function
issues pg_terminate_backed() from within for each backend process.

What are your thoughts on these possible solutions and on a way forward?

Thanks


On Thu, Aug 20, 2020 at 11:01 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Another fundamental problem with pg_terminate_backend() with native
> replication mode (and snapshot isolation mode) is, pgpool needs to
> send pg_termnaite_backend() with different argument because the
> argument is a process id, which is not incosistent among backends.
>
> > Hi Usma,
> >
> > While looking into the 073.pg_terminate_backend test failure I found
> > interesting issue.
> >
> > Supoose we execute following SQL in native replication mode:
> >
> > ssesion 1: select pg_sleep(60); /* at time 't1' */
> >
> > session 2: select pg_terminate_backend('7615');       /* at time 't2' */
> >
> > The pg_sleep() should be canceled at time t2, but actually it is
> > canceled at t2 + 60 seconds. Also after the cancel we get:
> >
> > WARNING:  packet kind of backend 1 ['D'] does not match with
> master/majority nodes packet kind ['E']
> > WARNING:  write on backend 0 failed with error :"Success"
> > DETAIL:  while trying to write data from offset: 0 wlen: 5
> > FATAL:  failed to read kind from backend
> > DETAIL:  kind mismatch among backends. Possible last query was: "select
> pg_sleep(60);" kind details are: 0[E: terminating connection due to
> administrator command] 1[D]
> > HINT:  check data consistency among db nodes
> >
> > What actually happening here is:
> >
> > 2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 0 backend pid:
> 7615 statement: BEGIN
> > 2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 1 backend pid:
> 7616 statement: BEGIN
> > 2020-08-20 13:01:46: psql pid 7603: LOG:  DB node id: 0 backend pid:
> 7615 statement: select pg_sleep(60); <-- pgpool 7603 waiting for response
> from backend 0.
> > 2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid:
> 7632 statement: SELECT version()
> > 2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid:
> 7632 statement: SELECT count(*) FROM pg_catalog.pg_proc AS p,
> pg_catalog.pg_namespace AS n WHERE p.proname = 'pg_terminate_backend' AND
> n.oid = p.pronamespace AND n.nspname ~ '.*' AND p.provolatile = 'v'
> > 2020-08-20 13:02:06: psql pid 7598: LOG:  found the pg_terminate_backend
> request for backend pid:7615 on backend node:0
> > 2020-08-20 13:02:06: psql pid 7598: DETAIL:  setting the connection flag
> > 2020-08-20 13:02:06: psql pid 7598: LOG:  DB node id: 0 backend pid:
> 7632 statement: select pg_terminate_backend(7615);
> > 2020-08-20 13:02:06: psql pid 7603: LOG:  DB node id: 1 backend pid:
> 7616 statement: select pg_sleep(60);  <--- pgpool 7603 got response because
> pg_terminate_backend executed. pgpool 7603 started to wait for response
> from backend 1.
> > 2020-08-20 13:03:06: psql pid 7603: WARNING:  packet kind of backend 1
> ['D'] does not match with master/majority nodes packet kind ['E'] <-- after
> 60 seconds passed, pgpool 7603 got response from bacnend 0 and 1. <-- since
> backend 0 got error while backend 1 sucessfully executed pg_sleep(60),
> there were difference in packet kind.
> > 2020-08-20 13:03:06: psql pid 7603: FATAL:  failed to read kind from
> backend <-- and pgpool get angry!
> > 2020-08-20 13:03:06: psql pid 7603: DETAIL:  kind mismatch among
> backends. Possible last query was: "select pg_sleep(60);" kind details are:
> 0[E: terminating connection due to administrator command] 1[D]
> > 2020-08-20 13:03:06: psql pid 7603: HINT:  check data consistency among
> db nodes
> > 2020-08-20 13:03:06: psql pid 7603: WARNING:  write on backend 0 failed
> with error :"Success"
> > 2020-08-20 13:03:06: psql pid 7603: DETAIL:  while trying to write data
> from offset: 0 wlen: 5
> > 2020-08-20 13:03:06: main pid 7572: LOG:  child process with pid: 7603
> exits with status 512
> >
> >
> > Any idea how to deal with this problem?
> >
> > Best regards,
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> > English: http://www.sraoss.co.jp/index_en.php
> > Japanese:http://www.sraoss.co.jp
> > _______________________________________________
> > pgpool-hackers mailing list
> > pgpool-hackers at pgpool.net
> > http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>


-- 
...
Muhammad Usama
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20200825/bc782a05/attachment.html>


More information about the pgpool-hackers mailing list