[pgpool-hackers: 1504] Re: Proposal: minimize process restart when fail over occurs
Tatsuo Ishii
ishii at postgresql.org
Sun Apr 17 18:15:52 JST 2016
OK, I have succeeded in not restarting a child process when all of the
following conditions are met:
- Streaming replication mode
- pcp_detach_node is used
- does not use the load balance node (that means the process does not
issue queries to the load balance node)
- the detached node is not the primary node
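
As a rough illustration only (this is not the actual pgpool-II code), the
conditions above could be expressed as a single predicate. The names
FailoverOrigin, Session and can_keep_child are hypothetical, and the sketch
assumes that the load balance condition means "the session's load balance
node is not the node being detached".

```python
from dataclasses import dataclass
from enum import Enum, auto


class FailoverOrigin(Enum):
    """Hypothetical tag for what triggered the detach/fail over request."""
    PCP_DETACH_NODE = auto()
    OTHER = auto()  # anything else, e.g. a failure detected by timeout


@dataclass(frozen=True)
class Session:
    """Hypothetical per-child state; not an actual pgpool-II structure."""
    load_balance_node: int  # backend this session sends load-balanced queries to


def can_keep_child(streaming_replication: bool,
                   origin: FailoverOrigin,
                   detached_node: int,
                   primary_node: int,
                   session: Session) -> bool:
    """Return True if the child process may keep running after the detach."""
    return (
        streaming_replication
        and origin is FailoverOrigin.PCP_DETACH_NODE
        # Assumed reading of the load balance condition (see lead-in).
        and session.load_balance_node != detached_node
        and detached_node != primary_node
    )


if __name__ == "__main__":
    # Node 0 is primary, node 1 is being detached, the session balances to node 2.
    survivor = Session(load_balance_node=2)
    print(can_keep_child(True, FailoverOrigin.PCP_DETACH_NODE, 1, 0, survivor))
```

If any one of the four checks fails, the sketch falls back to the existing
behavior, that is, the child process is restarted.
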
At this point this just enables the following use cases (we assume
that pcp_detach_node detaches DB node N):
1) Lucky users connecting to the database server N are not affected by
the pcp_detach_node.
2) Planned DB shutdown. For demonstration purposes, I used pgbench -C:
- start pgbench -C
- edit pgpool.conf to set the weight of backend N to 0.
- pgpool reload
- pcp_detach_node N
- pgbench happily continues the benchmark
Probably #2 is practically useful.
I think we could extend this to certain other cases, such as when
PostgreSQL is shut down by the admin. I will continue to work on this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> I have moved forward a little bit with this. At this point I have just
> created the necessary infrastructure to achieve the goal. See
> [pgpool-committers: 3127] for more details.
>
>> So this is a proposal for pgpool-II 3.6.
>>
>> I already did some discussion on this:
>>
>> From: Tatsuo Ishii <ishii at postgresql.org>
>> Subject: [pgpool-hackers: 1413] Item #11, torward pgpool-II 3.6
>> Date: Fri, 19 Feb 2016 12:03:12 +0900 (JST)
>> Message-ID: <20160219.120312.816223524770393776.t-ishii at sraoss.co.jp>
>>
>> Here is a more or less formal proposal which replaces it.
>>
>> Goal:
>>
>> Currently pgpool-II kills all child processes when a fail over (or a
>> switch over by pcp_detach_node) occurs. Of course this disconnects all
>> existing client connections, because the peer process to which each
>> client is connected is gone. This proposal seeks a way to minimize
>> such session disconnections.
>>
>> o Precondition:
>>
>> I assume this proposal is for streaming replication mode only. Maybe
>> we could expand this for other modes in the future. I also assume the
>> broken server is not primary.
>>
>> o Consideration:
>>
>> Why do we need to kill the child processes? Basically the problem is
>> the retry in the TCP/IP stack layer when the connection goes wrong,
>> for example when the network cable is pulled out. In this case the
>> only way to stop the retry is to restart the process.
>>
>> There are several cases where we could avoid the restart:
>>
>> 1) Knowing that we are not dealing with a fail over caused by a
>> cabling problem. There are at least two cases where we know the
>> problem is not a cabling problem:
>>
>> a) the fail over is triggered by pcp_detach_node.
>>
>> b) the fail over is triggered by postmaster shutdown.
>>
>> For other cases we need to find a way to know whether the problem is
>> a cabling problem or not. Currently we use a timeout to detect that
>> situation. So if we could know whether the timeout has occurred, then
>> we could know whether the problem is a cabling problem.
>>
>> 2) Once we succeed in #1, the next thing we need to do is determine
>> whether the session in question is using the broken server. This is
>> fairly easy because we already have the info in shared memory. If the
>> session uses the broken server, then we need to restart the process;
>> there is no way around that. Otherwise we just close the connection
>> to the broken backend (if any).
>>
>> o Things we need to do:
>>
>> - Invent a way to know if the fail over request is created by
>> pcp_detach_node. Probably we add a new flag to the fail over request
>> packet to indicate whether the origin of the request is
>> pcp_detach_node or not.
>>
>> - The same technique can be used for the case where the admin shuts
>> down PostgreSQL.
>>
>> - Create an API to deal with connections using the broken server.
>>
>> o What are the benefits once the above proposal is implemented?
>>
>> - If the conditions below are met, the user session can survive a fail over.
>>
>> - Operated in streaming replication mode
>>
>> - The failed server is not primary
>>
>> - The session does not connect to the broken standby server
>>
>> Comments, opinions?