[Pgpool-general] Replication and Failover

Mon Oct 11 12:44:36 UTC 2010

>  On 10/05/10 09:33, Gary Fu wrote:
>>  On 10/04/10 18:56, Tatsuo Ishii wrote:
>>>> I'm running pgpool2 3.0 with replication mode. I just noticed that
>>>> when the pgpool failover (due to mismatch error) is done by shutting
>>>> down the secondary db, my application failed due to the loss of
>>>> connection. The documentation mentions that when the failover is
>>>> performed, pgpool kills all its child processes and starts new child
>>>> processes for new connections from the clients.  Does this mean that
>>>> my application has to make a new connection when the failover happens?
>>> Yes.
>>>
>>>> If so, how does my application know there has been a failover?
>>> In this case libpq/your_favorite_driver returns error "server closed
>>> the connection unexpectedly This probably means the server
>>> terminated abnormally before or while processing the request."
>>> -- 
>>
>> Is there a way to just disable the secondary db when the mismatch
>> error happens, without failover, so that my application can keep
>> working with the primary db without making a new connection?
>>
>> I tested this before (as far as I can remember) with an old pgpool2
>> version: when I shut down one of the dbs, my application kept working
>> without a lost-connection error.  What's the difference between that
>> case and the failover case?
>>
>> Thanks,
>> Gary
> 
> Hi Tatsuo,
> 
> Could you provide an answer or suggestion on my questions above?
> 
> Thanks,
> Gary

Sorry for the delay. This is a frequently asked question. The answer is
in a comment in main.c:

/*
 * Before we tried to minimize restarting pgpool to protect existing
 * connections from clients to pgpool children. What we did here was,
 * if children other than master went down, we did not fail over.
 * This is wrong. Think about following scenario. If someone
 * accidentally plugs out the network cable, the TCP/IP stack keeps
 * retrying for long time (typically 2 hours). The only way to stop
 * the retry is restarting the process.  Bottom line is, we need to
 * restart all children in any case.  See pgpool-general list posting
 * "TCP connections are *not* closed when a backend timeout" on Jul 13
 * 2008 for more details.
 */

Here is the original complaint:

> Subject: [Pgpool-general] TCP connections are *not* closed when a backend timeout
> From: Maxence DUNNEWIND <maxence at dunnewind.net>
> To: pgpool-general at pgfoundry.org
> Date: Fri, 11 Jul 2008 11:34:37 +0200
> Sender: pgpool-general-bounces at pgfoundry.org
> 
> Hi,
> 
> I'm working on recovery with pgpool.
> When a backend fails (for example, when the postgresql server shuts
> down), all seems OK: connections are closed and the backend is set as
> down (status 3).
> 
> My problem is with network failures. If I remove the network link
> to the backend, pgpool correctly detects it as down when the health
> check times out, but it does *not* close tcp connections to the remote
> backend.
> 
> The problem is that when the link comes back and I start
> pcp_recovery_node, the second stage can't proceed because there are
> existing connections to the backend ...
> 
> Is this a normal thing? Or is this a bug ?
> 
> I'm trying to find out how I can close the connections when the
> health check times out...

Here is my reply:

> Thanks for the report.
> 
> In case of a network problem, the underlying TCP/IP stack keeps
> retrying, and the only safe way to shut down the connection is to
> restart the process. Even if we could safely close the limbo
> connection, we would need to pass the limbo connection info from the
> parent to the child process (remember that health checking is done in
> the parent, while the child holds the connection).
> 
> This is not a problem if the master goes down, since all children will
> restart anyway. I guess you removed the network cable for a node other
> than the master. The difference here is that pgpool tries to minimize
> restarting children. If the master does not fail, pgpool will not
> restart them.
> 
> From your report I think this logic is wrong and we need to restart
> children in *any* case. Included is the patch for this. Could you
> please try it out?

So we decided to always restart all pgpool child processes.
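
Since the children are always restarted, a client has to notice the
dropped connection and reconnect by itself. The following is only a
rough sketch of such a reconnect-and-retry loop, not code from pgpool
or libpq. The db_ops callbacks, run_with_reconnect() and the fake_*
simulation are hypothetical names made up for illustration; with libpq
the callbacks would probably wrap PQexec(), a check for
PQstatus() == CONNECTION_BAD, and PQreset().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback set (not a pgpool or libpq API). With libpq,
 * these would likely wrap PQexec(), PQstatus() == CONNECTION_BAD and
 * PQreset() respectively. */
typedef struct {
    bool (*run_query)(void *ctx);        /* true if the query succeeded   */
    bool (*connection_lost)(void *ctx);  /* true if the session is dead   */
    bool (*reconnect)(void *ctx);        /* true if a new session is open */
} db_ops;

/* Run one query. If it fails because the connection was lost (as after
 * a pgpool failover), reconnect and retry, at most max_reconnects times.
 * Any other failure is returned to the caller without a retry. */
bool run_with_reconnect(const db_ops *ops, void *ctx, int max_reconnects)
{
    for (int attempt = 0; ; attempt++) {
        if (ops->run_query(ctx))
            return true;
        if (!ops->connection_lost(ctx))
            return false;               /* ordinary error: do not retry */
        if (attempt >= max_reconnects)
            return false;               /* give up */
        if (!ops->reconnect(ctx))
            return false;               /* pgpool itself is unreachable */
    }
}

/* Simulated session, for demonstration only: each pending "failover"
 * kills the session during the next query. */
typedef struct {
    bool connected;
    int  failovers_pending;
    int  reconnect_count;
} fake_session;

static bool fake_run_query(void *ctx)
{
    fake_session *s = ctx;
    if (!s->connected)
        return false;
    if (s->failovers_pending > 0) {
        s->failovers_pending--;
        s->connected = false;           /* pgpool child was restarted */
        return false;
    }
    return true;
}

static bool fake_connection_lost(void *ctx)
{
    return !((fake_session *)ctx)->connected;
}

static bool fake_reconnect(void *ctx)
{
    fake_session *s = ctx;
    s->connected = true;
    s->reconnect_count++;
    return true;
}

const db_ops fake_ops = {
    fake_run_query, fake_connection_lost, fake_reconnect
};
```

Note that a blind retry is only safe for statements that can be
repeated. Any transaction that was open when the child was killed is
gone, so a real application would have to restart the whole transaction
and not just the last query.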

But if you are sure you will never unplug a network cable, maybe you
could bring back the ifdef'ed-out code that treats the master node
specially. Right after the comment in main.c you will see
#ifdef NOT_USED. Just remove the ifdef and try it out...
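
To show what removing the ifdef changes, here is a simplified sketch
of the two policies. This is not the actual code in main.c;
must_restart_children() and its parameters are hypothetical and only
illustrate the idea. NOT_USED is never defined during a normal build,
so the compiler drops the first branch and the "always restart" branch
is the one in effect.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from pgpool's main.c: decide whether
 * all child processes must be restarted when failed_node goes down. */
bool must_restart_children(int failed_node, int master_node)
{
#ifdef NOT_USED
    /* Old policy: protect existing client connections and restart
     * only when the master node itself failed. */
    return failed_node == master_node;
#else
    /* Current policy: always restart, so that children do not keep
     * TCP connections to an unreachable backend. */
    (void) failed_node;
    (void) master_node;
    return true;
#endif
}
```

Building with NOT_USED defined (or deleting the #ifdef/#else part)
would select the old policy in this sketch, at the cost of the stale
connection problem described above.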
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp