[pgpool-hackers: 4128] Re: invalid degenerate backend request, node id : 2 status: [2] is not valid for failover

Tatsuo Ishii ishii at sraoss.co.jp
Sat Feb 19 10:33:26 JST 2022


> Hello
> 
> I had the following setup:
> 1 pgpool: 4.2.2
> 3 postgres nodes (all 12.5) - 1 primary, 2 replicas
> 
> I turned off both replicas one after the other with a difference of a couple seconds. Failover was performed for one of the replicas but not for the other. For the second replica the healthcheck keeps on failing but it just keeps on performing the healthcheck in an infinite loop. When pgpool attempts the failover of the second replica, I see this error in the logs:
> 
> invalid degenerate backend request, node id : 2 status: [2] is not valid for failover
> 
> As far as I understand, status [2] is up, so the failover should be performed. Here’s a snippet of the logs:

You are right. The problem is that two internal APIs which both read
the backend status do not agree. The error message comes from here:
src/main/pool_internal_comms.c: degenerate_backend_set_ex()

	for (i = 0; i < count; i++)
	{
		if (node_id_set[i] < 0 || node_id_set[i] >= MAX_NUM_BACKENDS ||
			(!VALID_BACKEND(node_id_set[i]) && BACKEND_INFO(node_id_set[i]).quarantine == false))
		{
			if (node_id_set[i] < 0 || node_id_set[i] >= MAX_NUM_BACKENDS)
				ereport(elevel,
						(errmsg("invalid degenerate backend request, node id: %d is out of range. node id must be between [0 and %d]"
								,node_id_set[i], MAX_NUM_BACKENDS)));
			else
				ereport(elevel,
						(errmsg("invalid degenerate backend request, node id : %d status: [%d] is not valid for failover"
								,node_id_set[i], BACKEND_INFO(node_id_set[i]).backend_status)));

I think VALID_BACKEND(node_id_set[i]) returned false
here. VALID_BACKEND returns false if the backend status in the shared
memory area is neither 1 (waiting for connection) nor 2 (up and
running). However, BACKEND_INFO(node_id_set[i]).backend_status says
the status is actually 2 (up and running). The strange thing is that
both VALID_BACKEND and BACKEND_INFO(node_id_set[i]).backend_status
look into the same shared memory area. Let me explain.
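
To make the branch logic above easier to follow, here is a minimal
sketch of the check, not the actual pgpool code. The enum, the function
name check_degenerate_request() and the small MAX_NUM_BACKENDS value are
hypothetical; the two booleans stand in for VALID_BACKEND(node_id) and
BACKEND_INFO(node_id).quarantine.

```c
#include <stdbool.h>

#define MAX_NUM_BACKENDS 4	/* hypothetical small value for this sketch */

/* Hypothetical result codes; pgpool itself reports these via ereport(). */
typedef enum
{
	REQUEST_OK,
	REQUEST_OUT_OF_RANGE,	/* first error message in the excerpt */
	REQUEST_BAD_STATUS	/* "is not valid for failover" message */
} RequestCheck;

/*
 * valid_backend stands in for VALID_BACKEND(node_id),
 * quarantine for BACKEND_INFO(node_id).quarantine.
 */
static RequestCheck
check_degenerate_request(int node_id, bool valid_backend, bool quarantine)
{
	if (node_id < 0 || node_id >= MAX_NUM_BACKENDS)
		return REQUEST_OUT_OF_RANGE;
	if (!valid_backend && !quarantine)
		return REQUEST_BAD_STATUS;
	return REQUEST_OK;
}
```

In this sketch the "not valid for failover" result is only reachable
when the node id is in range, the validity check fails and the node is
not quarantined, which is why I suspect VALID_BACKEND.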

VALID_BACKEND is a macro which expands to:

#define VALID_BACKEND(backend_id) \
	((RAW_MODE && (backend_id) == REAL_MAIN_NODE_ID) ||		\
	(pool_is_node_to_be_sent_in_current_query((backend_id)) &&	\
	 ((*(my_backend_status[(backend_id)]) == CON_UP) ||			\
	  (*(my_backend_status[(backend_id)]) == CON_CONNECT_WAIT))))

Since you are running pgpool in a mode other than RAW_MODE, the macro
first checks whether pool_is_node_to_be_sent_in_current_query(backend_id)
returns true. Here is the relevant portion of the code (defined in
src/context/pool_query_context.c):

bool
pool_is_node_to_be_sent_in_current_query(int node_id)
{
	POOL_SESSION_CONTEXT *sc;

	if (RAW_MODE)
		return node_id == REAL_MAIN_NODE_ID;

As I said earlier, RAW_MODE is false, so it executes the next lines:

	sc = pool_get_session_context(true);
	if (!sc)
		return true;

pool_get_session_context() returns NULL, so this function returns
true, because the health check process does not use a "session
context". The session context only exists in the pgpool child
processes, which deal with user connections.
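
The following is a minimal sketch, under simplifying assumptions, of how
the macro evaluates in a process that has no session context. The names
SessionContext, session_context, raw_mode, real_main_node_id,
node_to_be_sent() and valid_backend() are hypothetical stand-ins, and
the send_to_node field is invented only so that the last branch
compiles; the real per-query decision is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Status codes: 1 and 2 follow the meanings given in this mail. */
enum { CON_CONNECT_WAIT = 1, CON_UP = 2 };

/* Hypothetical placeholder; the real session context has many fields. */
typedef struct
{
	bool		send_to_node[4];
} SessionContext;

/* NULL models a process without a session context (e.g. health check). */
static SessionContext *session_context = NULL;
static bool raw_mode = false;		/* stands in for RAW_MODE */
static int	real_main_node_id = 0;	/* stands in for REAL_MAIN_NODE_ID */

static bool
node_to_be_sent(int node_id)
{
	if (raw_mode)
		return node_id == real_main_node_id;
	if (!session_context)
		return true;	/* no session context: always true */
	/* Hypothetical: the real per-query decision is not modeled here. */
	return session_context->send_to_node[node_id];
}

/* Same shape as the VALID_BACKEND macro, with the status passed in. */
static bool
valid_backend(int backend_id, int status)
{
	return (raw_mode && backend_id == real_main_node_id) ||
		(node_to_be_sent(backend_id) &&
		 (status == CON_UP || status == CON_CONNECT_WAIT));
}
```

With session_context left as NULL, valid_backend() depends only on the
status value, which is the situation I expect in the health check
process.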

Then the macro checks a local variable,
my_backend_status[(backend_id)]. The local variable is initialized in
the pgpool main process (src/main/pgpool_main.c) and inherited via
fork() by the health check process.

	/*
	 * Initialize backend status area. From now on, VALID_BACKEND macro can be
	 * used. (get_next_main_node() uses VALID_BACKEND)
	 */

	for (i = 0; i < MAX_NUM_BACKENDS; i++)
	{
		my_backend_status[i] = &(BACKEND_INFO(i).backend_status);
	}

As you can see, my_backend_status actually stores a pointer to
BACKEND_INFO(i).backend_status. So *(my_backend_status[(backend_id)])
should be the same as BACKEND_INFO(backend_id).backend_status.
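
A small self-contained sketch of this aliasing follows. It is not the
pgpool source: BackendInfo, the static backend_info array (standing in
for the shared memory area), the small MAX_NUM_BACKENDS value and the
helper status_views_agree() are assumptions made for illustration, and
the enum values other than 1 and 2 are assumed as well.

```c
#include <stdbool.h>

#define MAX_NUM_BACKENDS 4	/* hypothetical small value for this sketch */

/* 1 and 2 match the meanings given in this mail; 0 and 3 are assumed. */
typedef enum
{
	CON_UNUSED = 0,
	CON_CONNECT_WAIT = 1,
	CON_UP = 2,
	CON_DOWN = 3
} BACKEND_STATUS;

/* Hypothetical, much reduced version of the per-backend record. */
typedef struct
{
	BACKEND_STATUS backend_status;
	bool		quarantine;
} BackendInfo;

/* A plain static array stands in for the shared memory area. */
static BackendInfo backend_info[MAX_NUM_BACKENDS];
#define BACKEND_INFO(i) (backend_info[(i)])

static BACKEND_STATUS *my_backend_status[MAX_NUM_BACKENDS];

/* Same loop as in the excerpt: each pointer aliases the status field. */
static void
init_backend_status(void)
{
	for (int i = 0; i < MAX_NUM_BACKENDS; i++)
		my_backend_status[i] = &(BACKEND_INFO(i).backend_status);
}

/* True when both access paths report the same status for node i. */
static bool
status_views_agree(int i)
{
	return *(my_backend_status[i]) == BACKEND_INFO(i).backend_status;
}
```

After init_backend_status(), any write to
BACKEND_INFO(i).backend_status is visible through
*(my_backend_status[i]), so in this sketch status_views_agree() cannot
return false as long as the pointers are left untouched.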

However, as I said earlier, they seem to disagree. At this
point I can't think of any explanation for this.

How often do you see this problem? Can it be reliably reproduced?
I have not been able to reproduce it so far.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

