[Pgpool-general] Unable to accept new connection after terminating pgpool backend process

Wed Dec 29 20:25:57 UTC 2010

Hi,

Thanks for the information. We decided to try ignoring the
"ADMIN_SHUTDOWN_ERROR_CODE". So far our tests show that this gives us
what we want.

We put the change in pool_process_query.c's
"detect_postmaster_down_error" function:

3739,3746c3739,3744
<       /*
<       int r =  detect_error(backend, ADMIN_SHUTDOWN_ERROR_CODE, major, 'E', false);
<       if (r == SPECIFIED_ERROR)
<       {
<               pool_debug("detect_stop_postmaster_error: receive admin shutdown error from a node.");
<               return r;
<       }
<       */
---
>       int r =  detect_error(backend, ADMIN_SHUTDOWN_ERROR_CODE, major, 'E', false);
>       if (r == SPECIFIED_ERROR)
>       {
>               pool_debug("detect_stop_postmaster_error: receive admin shutdown error from a node.");
>               return r;
>       }
3748c3746
<       int r = detect_error(backend, CRASH_SHUTDOWN_ERROR_CODE, major, 'N', false);
---
>       r = detect_error(backend, CRASH_SHUTDOWN_ERROR_CODE, major, 'N', false);
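
For readers without the source at hand, the effect of the change can be
modelled roughly as below. This is a sketch under stated assumptions, not
pgpool code: struct backend_msg, detect_error_model and the two
postmaster_down_* functions are hypothetical stand-ins, and the macro
values are assumed for illustration (57P01 and 57P02 are the PostgreSQL
SQLSTATEs for admin_shutdown and crash_shutdown). The real detect_error
takes different arguments, as the diff shows.

```c
#include <string.h>

/* Stand-in values for illustration; the real definitions are in pgpool's headers. */
#define SPECIFIED_ERROR 1
#define ADMIN_SHUTDOWN_ERROR_CODE "57P01" /* PostgreSQL SQLSTATE admin_shutdown */
#define CRASH_SHUTDOWN_ERROR_CODE "57P02" /* PostgreSQL SQLSTATE crash_shutdown */

/* Hypothetical model of one backend message: its kind byte and SQLSTATE. */
struct backend_msg {
    char kind;            /* 'E' = ErrorResponse, 'N' = NoticeResponse */
    const char *sqlstate; /* five-character SQLSTATE string */
};

/* Hypothetical stand-in for detect_error(): match on kind and SQLSTATE. */
static int detect_error_model(const struct backend_msg *m,
                              const char *code, char kind)
{
    return (m->kind == kind && strcmp(m->sqlstate, code) == 0)
               ? SPECIFIED_ERROR : 0;
}

/* Unpatched behaviour: an admin-shutdown 'E' message counts as postmaster down. */
int postmaster_down_unpatched(const struct backend_msg *m)
{
    int r = detect_error_model(m, ADMIN_SHUTDOWN_ERROR_CODE, 'E');
    if (r == SPECIFIED_ERROR)
        return r;
    return detect_error_model(m, CRASH_SHUTDOWN_ERROR_CODE, 'N');
}

/* Patched behaviour: only the crash-shutdown 'N' message is checked. */
int postmaster_down_patched(const struct backend_msg *m)
{
    return detect_error_model(m, CRASH_SHUTDOWN_ERROR_CODE, 'N');
}
```

In this model, a message of kind 'E' with SQLSTATE 57P01 (what a
terminated backend sends) is reported by postmaster_down_unpatched but
not by postmaster_down_patched, while a crash-shutdown notice is still
reported by both.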

This allows us to kill a backend Postgres process without bringing
down the whole pgpool. The pgpool child holding the killed backend
connection exits, and a new child is forked to replace it. Other
pgpool child processes are not affected:

2010-12-29 14:17:42 DEBUG: pid 2765: detect_error: kind: E
2010-12-29 14:17:42 DEBUG: pid 2765: read_kind_from_backend: read kind from 0 th backend E NUM_BACKENDS: 1
2010-12-29 14:17:42 DEBUG: pid 2765: ProcessBackendResponse: kind from backend: E
2010-12-29 14:17:42 LOG:   pid 2765: do_child: exits with status 1 due to error
2010-12-29 14:17:42 ERROR: pid 2765: pool_flush_it: write failed to backend (0). reason: Broken pipe offset: 0 wlen: 5
2010-12-29 14:17:42 DEBUG: pid 2763: reap_handler called
2010-12-29 14:17:42 DEBUG: pid 2763: reap_handler: call wait3
2010-12-29 14:17:42 DEBUG: pid 2763: child 2765 exits with status 256
2010-12-29 14:17:42 DEBUG: pid 2763: fork a new child pid 3551
2010-12-29 14:17:42 DEBUG: pid 2763: reap_handler: normally exited
2010-12-29 14:17:42 DEBUG: pid 3551: I am 3551
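
The log shows the child reporting "exits with status 1" while the parent
logs "exits with status 256". The two are consistent if the parent prints
the raw wait status rather than the decoded exit code; on Linux an exit
code of 1 is typically encoded as 1 << 8 = 256. A minimal sketch of that
decoding follows. It assumes a POSIX system, uses waitpid rather than the
wait3 call named in the log, and run_child_and_decode is a hypothetical
helper, not a pgpool function.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately with exit_code, wait for it, and
 * return the decoded exit code (or -1 on failure or abnormal exit).
 * If raw_status is non-NULL, the undecoded wait status is stored there.
 */
int run_child_and_decode(int exit_code, int *raw_status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(exit_code);       /* child: exit without running atexit handlers */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;              /* wait failed */
    if (raw_status)
        *raw_status = status;   /* raw value, encoding is platform-specific */

    /* WEXITSTATUS is only meaningful when the child exited normally. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_decode(1, &raw) returns the decoded exit code 1,
and raw holds the platform's encoded status, which is the kind of value
the parent's "status 256" line appears to print.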

Thanks.
-Arthur Chang

On Wed, Dec 29, 2010 at 8:30 AM, Kelly Burkhart
<kelly.burkhart at gmail.com> wrote:
> On Wed, Dec 29, 2010 at 7:41 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>> What we'd like to have is the ability to:
>>>>> - Treat the error code got back from pg_terminate_backend (or the kill
>>>>> command) as a regular disconnect
>>>>
>>>> It isn't possible without changing PostgreSQL itself.
>>>
>>> Can you please explain why?  If we kill -INT a backend postgres
>>> process, it doesn't take the whole database cluster down, it shouldn't
>>> take the whole pool down either.  The fact that it does is a bug in
>>> pg_pool IMO.
>>
>> The problem is PostgreSQL returns exactly the same error code when
>> postmaster goes down. See:
>>
>> http://archives.postgresql.org/pgsql-hackers/2010-05/msg00629.php
>>
>
> OK, I'm not familiar enough with postgres code to understand why
> killing a single backend should return the same code to client as a
> controlled shutdown of the database.  However, if the DB is shut down,
> *every* pool process will get the error code; in our case only the one
> running the query that needed to be stopped gets it.
>
> Since ADMIN_SHUTDOWN_ERROR_CODE is sent by the backend for two
> completely different events, perhaps the answer is to not test for it.
>  If a single backend goes down you send the error code to the client
> and disconnect.  If all backends go down, that's effectively the same
> as the database machine crashing.
>
> -K
>