[Pgpool-general] Online recovery during load

Fernando Morgenstern fernando at consultorpc.com
Fri Jan 15 13:21:44 UTC 2010


Hello,

I will join this thread (without being invited :P) because I am having the same problem with pgpool.

I have had it running for a week in a high-load environment and decided to test online recovery, which always worked in my test environment, but I get:

ERROR: pid 1199: wait_connection_closed: existing connections did not close in 90 sec.

I tested different values for client_idle_limit_in_recovery and recovery_timeout, but all of them fail. pgpool simply can't close all connections.
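
For reference, the recovery-related part of my pgpool.conf looks roughly like this (the parameter names are the real ones, but the values and script names below are only examples of what I tried, not a recommendation):

  # online recovery settings (values are just what I was testing with)
  recovery_user = 'postgres'
  recovery_password = ''
  recovery_1st_stage_command = 'copy_base_backup'      # my base-backup script
  recovery_2nd_stage_command = 'pgpool_recovery_pitr'  # my 2nd stage script
  recovery_timeout = 90          # seems to be where the 90 sec in the error comes from
  client_idle_limit_in_recovery = 10    # also tried other values, same result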

Just sharing another experience and hoping for a possible solution.
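
Regarding the persistent-connection point in the quoted discussion below, one thing I may try is making the clients open and close a connection per unit of work instead of holding one open, so that the second stage only has to wait for in-flight queries. A rough sketch of that pattern (Python/psycopg2 here; host, port, and credentials are made up):

  import psycopg2

  def run_query(sql, params=None):
      # Open a connection, run one query, close it again. Because nothing
      # stays connected between requests, pgpool-II's second stage does not
      # have to wait on long-lived idle/persistent connections.
      conn = psycopg2.connect(host="pgpool-host", port=9999,
                              dbname="mydb", user="myuser")
      try:
          with conn:  # commits on success, rolls back on error
              with conn.cursor() as cur:
                  cur.execute(sql, params)
                  return cur.fetchall()  # assumes a SELECT
      finally:
          conn.close()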

Regards,
---

Fernando Marcelo
www.consultorpc.com
fernando at consultorpc.com

On 15/01/2010, at 06:11, Christophe Philemotte wrote:

> Hi Jaume Sabater,
> 
> Thanks for answering me.
> 
>>> Before presenting my problem, just a question about the 2nd stage
>>> (you'll understand that this question is linked to my problem). Why do the
>>> client connections have to be closed during this stage? Couldn't the
>>> recovered node catch up with the master node without stopping the service?
>> 
>> At the second stage, current idle connections are closed, and open
>> connections are offered some time to finish their work before being
>> closed. Connections need to be closed before the node being recovered
>> is started as, when it starts, it will obtain and process the pending
>> log files. This will put the two nodes in sync, and then the queued
>> requests will be processed.
>> 
>> Conclusion: only when all connections are closed can pgpool-II be
>> sure that the two nodes will be perfectly in sync.
> 
> Tell me if I'm wrong. That means it is impossible to recover online
> if there are heavily used, persistent open connections, and that I have to
> design my client application not to use persistent connections if I want
> to perform online recovery. Is that correct?
> 
>>> Now, let me present my problem. When I test online recovery during a
>>> typical database load, I run into two failure scenarios:
>>> 1. when client_idle_limit_in_recovery is set (the best value I found
>>> is 10s), the online recovery completes, but a few client requests
>>> fail (timeout or closed connection);
>>> 2. when client_idle_limit_in_recovery isn't set, the online recovery
>>> does not complete, because a few client connections cannot be closed
>>> (they are actually in use by client processes, not idle ones).
>> 
>> I have also been having problems under certain circumstances when
>> trying to recover a node since I first started with pgpool-II. Tatsuo
>> (and other contributors) have fixed some of these scenarios, but I
>> think there is still the problem that certain connections are not
>> dropped during the second stage, hence stalling the whole process.
> 
> Without client_idle_limit_in_recovery set, that is what I have noticed.
> 
>> I believe, but I cannot be sure, that when the DBA of my main client
>> has pgadmin3 open, it always fails. Usually, when connections are only
>> from the front-ends of the web platform (i.e. open connection, send a
>> request, (supposedly) close connection), everything goes fine. But
>> with "persistent" connections, so to speak, pgpool-II is not always
>> capable of dropping them.
> 
> OK, that is the feeling I expressed above.
> 
>> Does it make any sense to you, Tatsuo?
> Does it?
> 
> Regards,
> 
> Christophe Philemotte
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general
