[pgpool-general: 8002] Re: pcp_node_info does not return when host is lost on 4.3.0

Wed Jan 26 16:50:17 JST 2022

>> > If that's not possible, the number of timeouts should be reduced to an
>> > absolute minimum.
>>
>> Ok, I would add timeout 1 second (that's the minimum) to the call for
>> PQpingParams.
>>
> 
> I think the PQpingParams should take the same connect_timeout as configured
> in the pgpool config. Otherwise this method might hit a timeout while other
> parts of the code don't. I think that would only cause confusion.

Hmm. That makes sense. Initially I thought PQpingParams was called only
for pcp_node_info (and show pool_nodes), so the timeout should be
small. But yes, users may be confused if a different timeout is used.
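
To illustrate the timeout point, here is a minimal sketch (not the actual
patch) of building the keyword/value arrays for PQpingParams so that the
ping uses the configured connect_timeout. The names PingParams and
build_ping_params are hypothetical. It assumes the configured value is in
milliseconds while libpq's connect_timeout keyword takes whole seconds, and
it assumes that 0 should be passed through as "no timeout".

```c
#include <stdio.h>
#include <string.h>

#define PING_MAX_PARAMS 4

/* Hypothetical holder for the ping parameters; not a pgpool structure. */
typedef struct
{
    const char *keywords[PING_MAX_PARAMS];
    const char *values[PING_MAX_PARAMS];
    char        port_buf[16];
    char        timeout_buf[16];
} PingParams;

/*
 * Fill the NULL-terminated keyword/value arrays in the form that
 * PQpingParams() takes. connect_timeout_ms is assumed to be in
 * milliseconds; it is rounded up to whole seconds. A value <= 0 is
 * passed through as 0 (assumption: no timeout configured).
 * The arrays point into *p, so *p must outlive their use.
 */
static void
build_ping_params(PingParams *p, const char *host, int port,
                  int connect_timeout_ms)
{
    int secs = 0;

    if (connect_timeout_ms > 0)
        secs = (connect_timeout_ms + 999) / 1000;

    snprintf(p->port_buf, sizeof(p->port_buf), "%d", port);
    snprintf(p->timeout_buf, sizeof(p->timeout_buf), "%d", secs);

    p->keywords[0] = "host";
    p->values[0] = host;
    p->keywords[1] = "port";
    p->values[1] = p->port_buf;
    p->keywords[2] = "connect_timeout";
    p->values[2] = p->timeout_buf;
    p->keywords[3] = NULL;
    p->values[3] = NULL;
}
```

With libpq available, the arrays would then be passed as
PQpingParams(p.keywords, p.values, 0), and any result other than PQPING_OK
would be treated as "not reachable".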

> The loop is in get_nodes. It always iterates over all
> nodes. inform_node_info (pcp_worker.c) calls this function and only prints
> the selected node.

>> > Also, it should not try to connect twice. If the first attempt fails, the
>> > second should be skipped.
>>
> 
> Looking at the previous patches, I think both patches are needed. The last
> patch (pcp_hang.patch) prevents a second timeout for the same backend when
> the ping fails. The first (pcp_node_info_hang.patch) is also still needed,
> because there's a small chance the database will get lost in between the
> calls. pcp_node_info should not perform retries. So the two things
> remaining are: setting a connect_timeout on PQpingParams and get_nodes
> should be refactored to only collect the information for one node if the
> pcp worker has a node id.

Right.

I have already fixed this and Peng is testing it now.
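
For readers following the thread, the two remaining points can be sketched
roughly as below. This is only an illustration of the idea, not the
committed fix: NodeInfo, collect_node_info and the stub_* probes are
hypothetical names, and the real get_nodes has a different signature. The
sketch collects information for a single node when a node id is given, and
skips the second connection attempt when the ping fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node result; not the structure pgpool uses. */
typedef struct
{
    int  node_id;
    bool reachable;       /* result of the ping step */
    bool detail_fetched;  /* result of the follow-up connection */
} NodeInfo;

typedef bool (*probe_fn)(int node_id);

/*
 * Collect info for one node (wanted_node >= 0) or for all nodes
 * (wanted_node < 0). Each probe is tried once, with no retries, and
 * fetch_detail is skipped entirely when the ping fails.
 * Returns the number of entries written to out.
 */
static int
collect_node_info(NodeInfo *out, int num_nodes, int wanted_node,
                  probe_fn ping, probe_fn fetch_detail)
{
    int first = (wanted_node < 0) ? 0 : wanted_node;
    int last = (wanted_node < 0) ? num_nodes - 1 : wanted_node;
    int n = 0;

    if (wanted_node >= num_nodes)
        return 0;

    for (int id = first; id <= last; id++)
    {
        out[n].node_id = id;
        out[n].reachable = ping(id);
        out[n].detail_fetched = out[n].reachable ? fetch_detail(id) : false;
        n++;
    }
    return n;
}

/* Stand-in probes for trying the sketch: node 1 is treated as lost. */
static int stub_ping_calls = 0;
static int stub_fetch_calls = 0;

static bool
stub_ping(int node_id)
{
    stub_ping_calls++;
    return node_id != 1;
}

static bool
stub_fetch(int node_id)
{
    (void) node_id;
    stub_fetch_calls++;
    return true;
}
```

Calling collect_node_info(out, 3, 1, stub_ping, stub_fetch) touches only
node 1, pings it once, and never calls stub_fetch, so a lost host costs at
most one timeout.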

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp