[pgpool-general: 4780] Re: "replication" mode inconsistencies
ishii at postgresql.org
Tue Jul 12 11:49:14 JST 2016
> On Fri, Aug 7, 2015 at 5:54 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> > I've done some more thinking on this, and I think the solution is
>> > simply *not* to record pgpool_status when *all* nodes are down (i.e.
>> > only record it when at least one node is up). So pgpool_status will
>> > always reflect the last set of nodes to which any data was written.
>> > Upon restart, if the up-to-date (previously "up") node is in fact down
>> > (regardless of whether the stale ("down") node is back up), pgpool
>> > will detect this in its health check and will fail; if the up-to-date
>> > (previously "up") node is back up, then pgpool will commence using it.
>> A downside of this approach is that the first health check after
>> pgpool restarts may take a long time, due to the health-check retry
>> settings, if the node is still down. However, this downside is not
>> unique to your approach (and there is a workaround: users can
>> manually edit the pgpool_status file to set the node status to
>> "down"). Besides this, your approach seems attractive.
> I have been looking into this one and thinking of other possible
> solutions we could have. I think this solution of not recording the
> status of the last working node when it goes down should work without
> any problem, and I believe it is a very good solution in the sense
> that it requires a minimal amount of code changes. Its downside, that
> startup might take a long time when the previously last-alive node is
> still down, should not be worrisome, since that situation always
> requires user intervention to fix, and a little extra time should not
> hurt in that case.
I have implemented this and committed/pushed it to the master branch
(which will become the upcoming pgpool-II 3.6).
> Along with that, while we are at it, I think it would be good to
> record some more information in the pgpool_status file about a node
> that fails: in addition to the node status, if we also put the
> timestamp of the node status change and the last successful query
> processed by that backend node into the status file, it could be
> helpful in deciding the direction of recovery, and maybe for
> debugging the cause of the node failure.
I have not implemented this part, but it seems like a good idea.
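For illustration only, the proposed extension could turn each
one-word-per-node status line into a record like the following. This
layout is entirely hypothetical (the actual pgpool_status format is a
plain "up"/"down" per line), and the timestamps and queries shown are
invented:

```
up      2016-07-12 11:49:14     SELECT 1
down    2016-07-12 11:50:02     UPDATE accounts SET balance = 100
```

Here the second and third columns would record the time of the last
status change and the last query the backend successfully processed,
which is the information suggested above for guiding recovery.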
SRA OSS, Inc. Japan