[Pgpool-general] Unable to accept new connection after terminating pgpool backend process

Mon Jan 3 15:25:24 UTC 2011

I verified that sending "kill -INT" to a Postgres backend doesn't
trigger failover (on both 2.3.3 and 3.0.1). In my previous tests, I used
"kill <PID>", which defaults to "kill -TERM", and that did trigger
failovers.
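
To make the distinction concrete, here is a minimal Python sketch of the
signal mapping as I understand it. The table and function names are
hypothetical, made up for illustration; the SQLSTATE values are PostgreSQL's
query_canceled (57014) and admin_shutdown (57P01), and nothing here touches a
real backend process.

```python
import signal

# Effect of each signal on a single PostgreSQL backend, plus the SQLSTATE
# the client (here, pgpool) receives. Names are illustrative only.
BACKEND_SIGNAL_EFFECT = {
    signal.SIGINT: ("cancel current query", "57014"),   # query_canceled
    signal.SIGTERM: ("terminate backend", "57P01"),     # admin_shutdown
}


def effect_of_kill(sig=signal.SIGTERM):
    """Plain `kill <PID>` sends SIGTERM, hence the default argument."""
    return BACKEND_SIGNAL_EFFECT[sig]


print(effect_of_kill())               # corresponds to kill <PID> / kill -TERM
print(effect_of_kill(signal.SIGINT))  # corresponds to kill -INT
```

Only the SIGTERM row yields the code that pgpool reacts to, which matches
what I saw in the tests above.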

We would still like kill -TERM not to trigger failover, and it appears
we can achieve that by ignoring ADMIN_SHUTDOWN_ERROR_CODE.
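
Roughly, the behavior we are after could be sketched like this in Python.
This is not pgpool's actual code (pgpool is written in C); the function name,
the flag, and the returned action strings are all hypothetical, and I am
assuming ADMIN_SHUTDOWN_ERROR_CODE corresponds to SQLSTATE 57P01.

```python
# Assumption: this mirrors pgpool's constant and equals PostgreSQL's
# admin_shutdown SQLSTATE.
ADMIN_SHUTDOWN_ERROR_CODE = "57P01"


def action_for_error(sqlstate, ignore_admin_shutdown):
    """Decide what the pool does when a backend reports an error.

    Hypothetical decision logic: with ignore_admin_shutdown=True the
    admin-shutdown code is treated as a regular disconnect, not a failover.
    """
    if sqlstate == ADMIN_SHUTDOWN_ERROR_CODE:
        if ignore_admin_shutdown:
            return "disconnect_client"
        return "failover"
    # Any other error is simply passed back to the client.
    return "forward_error"


print(action_for_error("57P01", ignore_admin_shutdown=False))
print(action_for_error("57P01", ignore_admin_shutdown=True))
print(action_for_error("57014", ignore_admin_shutdown=True))
```

The open point is whether anything else depends on the first branch, hence
the question below.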

Other than triggering failover, is there some other use of
ADMIN_SHUTDOWN_ERROR_CODE that we should be aware of?

Thanks.
-Arthur

On Sun, Jan 2, 2011 at 6:09 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> I couldn't reproduce your problem on CVS HEAD.
>
> connect to pgpool
> test=# begin;
> BEGIN
> test=# lock t1; <-- waiting to acquire the lock. I killed the PostgreSQL backend with SIGINT
> ERROR:  canceling statement due to user request
> test=# end;
> ROLLBACK
> test=#
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> Then it must be a bug, or at least an unexpected feature.
>> I will look into this.
>> --
>> Tatsuo Ishii
>>
>>> It does trigger a failover and worse, once a backend has been
>>> terminated with SIGINT, no new connections can be established through
>>> the pool until the whole pool is restarted.  That's exactly the
>>> problem we are trying to solve.
>>>
>>> -K
>>>
>>> On Sun, Jan 2, 2011 at 3:54 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>> On Wed, Dec 29, 2010 at 7:41 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>>>>>> What we would like to have is the ability to:
>>>>>>>>> - Treat the error code got back from pg_terminate_backend (or the kill
>>>>>>>>> command) as a regular disconnect
>>>>>>>>
>>>>>>>> It isn't possible without changing PostgreSQL itself.
>>>>>>>
>>>>>>> Can you please explain why?  If we kill -INT a backend postgres
>>>>>>> process, it doesn't take the whole database cluster down, it shouldn't
>>>>>>> take the whole pool down either.  The fact that it does is a bug in
>>>>>>> pg_pool IMO.
>>>>>>
>>>>>> The problem is PostgreSQL returns exactly the same error code when
>>>>>> postmaster goes down. See:
>>>>>>
>>>>>> http://archives.postgresql.org/pgsql-hackers/2010-05/msg00629.php
>>>>>>
>>>>>
>>>>> OK, I'm not familiar enough with the postgres code to understand why
>>>>> killing a single backend should return the same code to the client as a
>>>>> controlled shutdown of the database.  However, if the DB is shut down,
>>>>> *every* pool process will get the error code; in our case only the one
>>>>> running the query that needed to be stopped gets it.
>>>>>
>>>>> Since ADMIN_SHUTDOWN_ERROR_CODE is sent by the backend for two
>>>>> completely different events, perhaps the answer is to not test for it.
>>>>>  If a single backend goes down you send the error code to the client
>>>>> and disconnect.  If all backends go down, that's effectively the same
>>>>> as the database machine crashing.
>>>>
>>>> If the particular use is canceling a long-running query, why don't
>>>> you use query cancel? It can be done by sending SIGINT to the
>>>> PostgreSQL backend process. This does not trigger failover of
>>>> pgpool-II, of course.
>>>> --
>>>> Tatsuo Ishii
>>>>