[pgpool-hackers: 4132] Re: invalid degenerate backend request, node id : 2 status: [2] is not valid for failover

Tatsuo Ishii ishii at sraoss.co.jp
Mon Feb 21 10:20:29 JST 2022


Hi Anirudh,

> Hello Tatsuo
> 
> Thank you so much for the explanation. I understand the problem but, same as you, cannot understand the reason behind it. The same variable 'BACKEND_INFO(i).backend_status' cannot have 2 different values. Maybe the VALID_BACKEND macro is false due to other checks that it has in its if/else blocks. I am not sure. I will try to have a look.

Maybe. It would be nice if you could attach a debugger to the health check
process which emits the log:

>> > invalid degenerate backend request, node id : 2 status: [2] is not valid for failover

then examine the variables which may be related to the issue. Here are the ones I am especially interested in:

(1)
static POOL_SESSION_CONTEXT * session_context = NULL;

This variable is expected to be NULL in the health check process. If it is not, we have trouble.


(2)
BACKEND_STATUS private_backend_status[MAX_NUM_BACKENDS];

This is a copy of the backend status taken from the shared memory area.
Each entry is expected to be 1 or 2 (CON_CONNECT_WAIT or CON_UP).
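
If attaching a debugger is inconvenient, here is a rough sketch (just an
illustration, not the attached patch) of a throwaway ereport you could drop
into degenerate_backend_set_ex() right before the "is not valid for
failover" message (quoted further down), to dump the same information. It
assumes pool_get_session_context() is visible in that file; you may need to
include pool_session_context.h:

	/* Sketch only: dump the session context pointer, the process-local
	 * status seen by VALID_BACKEND, and the status in shared memory. */
	ereport(LOG,
			(errmsg("failover diag: node %d session_context=%p local status=%d shared status=%d",
					node_id_set[i],
					(void *) pool_get_session_context(true),
					*my_backend_status[node_id_set[i]],
					BACKEND_INFO(node_id_set[i]).backend_status)));

If the local and shared status values printed there differ, that would
confirm where the disagreement comes from.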

BTW, I need to correct my explanation.

>> my_backend_status[backend_id]. The local variable is initialized at
>> the pgpool main process (src/main/pgpool_main.c) and inherited via
>> fork() to the health check process.
>>
>> 	/*
>> 	 * Initialize backend status area. From now on, VALID_BACKEND macro can be
>> 	 * used. (get_next_main_node() uses VALID_BACKEND)
>> 	 */
>>
>> 	for (i = 0; i < MAX_NUM_BACKENDS; i++)
>> 	{
>> 		my_backend_status[i] = &(BACKEND_INFO(i).backend_status);
>> 	}
>>
>> As you can see, my_backend_status actually stores a pointer to
>> BACKEND_INFO(i).backend_status, so *(my_backend_status[backend_id])
>> should be the same as BACKEND_INFO(backend_id).backend_status.

Actually, my_backend_status is not initialized like this. The health check process calls this instead:
	/* Initialize my backend status */
	pool_initialize_private_backend_status();

It copies the backend status from the shared memory area into process-local
memory. This is necessary for the pgpool child processes which handle
client sessions, so that they are not affected by status changes in the
shared memory area in the middle of processing.  BUT it is not
necessary for the health check process. I am going to fix this.
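
To make the correction concrete, here is a rough sketch (my paraphrase, not
the actual pgpool source) of what pool_initialize_private_backend_status()
is described to do: copy each backend's status out of shared memory and
point my_backend_status at the process-local copy, so that VALID_BACKEND no
longer reads shared memory directly:

	/* Sketch of the described behavior, using the variables from (2) above. */
	for (i = 0; i < MAX_NUM_BACKENDS; i++)
	{
		private_backend_status[i] = BACKEND_INFO(i).backend_status;
		my_backend_status[i] = &private_backend_status[i];
	}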

However, I don't think this can explain the issue. As long as there is no
difference between the local copy and the shared memory area during the
process's lifetime, the bug does not do anything bad, and I cannot think of
any scenario which could cause the two to differ.

> As for the reproducibility, you are right: in the past few months of development and testing, I have only seen this issue a couple of times. Earlier, I thought it was something that I missed in the config, but now that my setup is almost production ready, I opened this ticket for help. It is really difficult to reproduce, but I currently have a setup running where this problem is persistent. I have tried turning that replica off and on several times; it connects back to the cluster when it's up, but I see the same error when it's down.
> 
> So, if you prefer I can help you investigate more on this by sharing some detailed logs and config options from my setup.
> 
> Even though this issue comes up rarely, it is important for me to resolve because my setup uses synchronous replication, and if both of my replicas are down, I convert the replication mode from sync to async in the failover script. But if this bug shows up, then the failover script is not executed when the last replica goes down, and my queries hang forever because they do not have a replica committing transactions in synchronous mode.
> 
> So, if I can help you debug this by providing any logs or config or a direct access to my setup over a Zoom call, that’d be really great for me. Please let me know if this works for you.

Thanks. Is it possible for you to apply the attached patch so that pgpool
produces additional logging? (It will emit a log entry whenever failover
happens.) The log basically contains the information which could be
obtained with the debugger as explained above.

> Thank you
> 
> Anirudh
> On 19 Feb 2022, 2:33 AM +0100, Tatsuo Ishii <ishii at sraoss.co.jp>, wrote:
>> > Hello
>> >
>> > I had the following setup:
>> > 1 pgpool: 4.2.2
>> > 3 postgres nodes (all 12.5) - 1 primary, 2 replicas
>> >
>> > I turned off both replicas one after the other with a difference of a couple seconds. Failover was performed for one of the replicas but not for the other. For the second replica the healthcheck keeps on failing but it just keeps on performing the healthcheck in an infinite loop. When pgpool attempts the failover of the second replica, I see this error in the logs:
>> >
>> > invalid degenerate backend request, node id : 2 status: [2] is not valid for failover
>> >
>> > As far as I understand, status [2] is up, so the failover should be performed. Here's a snippet of the logs:
>>
>> You are right. The problem is that two different internal APIs, which both extract the status, do not agree.
>> The error message comes from here:
>> src/main/pool_internal_comms.c: degenerate_backend_set_ex()
>>
>> 	for (i = 0; i < count; i++)
>> 	{
>> 		if (node_id_set[i] < 0 || node_id_set[i] >= MAX_NUM_BACKENDS ||
>> 			(!VALID_BACKEND(node_id_set[i]) && BACKEND_INFO(node_id_set[i]).quarantine == false))
>> 		{
>> 			if (node_id_set[i] < 0 || node_id_set[i] >= MAX_NUM_BACKENDS)
>> 				ereport(elevel,
>> 						(errmsg("invalid degenerate backend request, node id: %d is out of range. node id must be between [0 and %d]"
>> 								,node_id_set[i], MAX_NUM_BACKENDS)));
>> 			else
>> 				ereport(elevel,
>> 						(errmsg("invalid degenerate backend request, node id : %d status: [%d] is not valid for failover"
>> 								,node_id_set[i], BACKEND_INFO(node_id_set[i]).backend_status)));
>>
>> I think VALID_BACKEND(node_id_set[i]) returned false
>> here. VALID_BACKEND returns false if the backend status in the shared
>> memory area is neither 1 (waiting for connection) nor 2 (up and
>> running). However BACKEND_INFO(node_id_set[i]).backend_status says
>> backend_status is actually 2 (up and running). The strange thing is,
>> both VALID_BACKEND and BACKEND_INFO(node_id_set[i]).backend_status
>> look into the same shared memory area. Let me explain.
>>
>> VALID_BACKEND is a macro which expands to:
>>
>> #define VALID_BACKEND(backend_id) \
>> 	((RAW_MODE && (backend_id) == REAL_MAIN_NODE_ID) || \
>> 	 (pool_is_node_to_be_sent_in_current_query((backend_id)) && \
>> 	  ((*(my_backend_status[(backend_id)]) == CON_UP) || \
>> 	   (*(my_backend_status[(backend_id)]) == CON_CONNECT_WAIT))))
>>
>> Since you are running pgpool in a mode other than RAW_MODE, the macro first
>> checks whether pool_is_node_to_be_sent_in_current_query(backend_id)
>> returns true. Here is the relevant portion of the code (defined in
>> src/context/pool_query_context.c).
>>
>> bool
>> pool_is_node_to_be_sent_in_current_query(int node_id)
>> {
>> 	POOL_SESSION_CONTEXT *sc;
>>
>> 	if (RAW_MODE)
>> 		return node_id == REAL_MAIN_NODE_ID;
>>
>> As I said earlier, RAW_MODE returns false. So it executes the next lines:
>>
>> 	sc = pool_get_session_context(true);
>> 	if (!sc)
>> 		return true;
>>
>> pool_get_session_context() returns NULL and this function returns
>> true, because the health check process does not use a "session context". The
>> session context only exists in pgpool child processes which deal with
>> user connections.
>>
>> Then the macro checks the local variable
>> my_backend_status[backend_id]. The local variable is initialized at
>> the pgpool main process (src/main/pgpool_main.c) and inherited via
>> fork() to the health check process.
>>
>> 	/*
>> 	 * Initialize backend status area. From now on, VALID_BACKEND macro can be
>> 	 * used. (get_next_main_node() uses VALID_BACKEND)
>> 	 */
>>
>> 	for (i = 0; i < MAX_NUM_BACKENDS; i++)
>> 	{
>> 		my_backend_status[i] = &(BACKEND_INFO(i).backend_status);
>> 	}
>>
>> As you can see, my_backend_status actually stores a pointer to
>> BACKEND_INFO(i).backend_status, so *(my_backend_status[backend_id])
>> should be the same as BACKEND_INFO(backend_id).backend_status.
>>
>> However, as I said earlier, they seem to disagree. At this
>> point I can't think of any explanation for this.
>>
>> How often do you see this problem? Is it reliably reproducible?
>> I have not been able to reproduce it so far.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: failover_log.patch
Type: text/x-patch
Size: 927 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20220221/2bc70c45/attachment-0001.bin>

