[Pgpool-hackers] Replication node failure == aborted transaction... bug?

Tatsuo Ishii ishii at sraoss.co.jp
Thu Feb 4 14:04:33 UTC 2010


> > Yeah, it's desirable but pretty hard to implement.
> > 
> > For example, consider load balanced SELECT. If a node goes offline
> > while the SELECT result is being returned to the frontend, pgpool
> > needs to remember all the rows it has already received and restart
> > the transmission from that point against a different node.
> 
> Indeed, that'd be pretty complicated to do.  But I think there are some
> specific cases that would be less complicated to catch:
> 
>  * a SELECT is load balanced to a node that is offline (i.e. the write
>    to the backend with the query packet fails before any response is
>    received).
>  * an UPDATE or INSERT succeeds on the master node but fails on one or more
>    replication nodes (i.e. they are detected as out of sync and failed).
>  * an UPDATE or INSERT fails on the master node before any response is
>    received (similar to the first case), such that the "mastership" is
>    transferred to another node.
> 
> I'd think that in all of these cases it wouldn't be unreasonable to
> transparently catch it and hide it from the frontend, what do you think?

Agreed. But at the same time we need to think about interference with
health checking. Under those conditions the health check will detect
the backend failure and the failover process will start. Currently
failover terminates all child processes, so every live session between
a frontend and pgpool is killed. We need to change it so that, instead
of simply terminating, it sets a flag indicating that the backend has
gone down.
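
To illustrate the idea (this is only a sketch in Python pseudocode, not
pgpool's actual C code; all names are hypothetical): rather than the
health checker killing every child process on failure detection, it
could flip a per-backend status flag, and load balancing would simply
skip backends flagged as down, leaving live sessions intact.

```python
# Hypothetical sketch of "flag the backend down" instead of
# "terminate all children". Names are invented for illustration.
class BackendPool:
    def __init__(self, backends):
        # Map backend id -> "up" / "down" status flag.
        self.status = {b: "up" for b in backends}

    def mark_down(self, backend):
        # Called by the health checker on failure detection.
        # Live sessions are not killed; they just stop using
        # this backend for subsequent queries.
        self.status[backend] = "down"

    def pick_for_select(self):
        # Load balancing only considers backends still flagged up,
        # so a SELECT that would have hit a dead node is rerouted.
        live = [b for b, s in self.status.items() if s == "up"]
        if not live:
            raise RuntimeError("no live backends")
        return live[0]
```

For example, after `mark_down("node0")` on a two-node pool, a new
SELECT would be routed to `node1` while existing frontend sessions
stay connected.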
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
