[pgpool-hackers: 4134] Re: invalid degenerate backend request, node id : 2 status: [2] is not valid for failover

Tatsuo Ishii ishii at sraoss.co.jp
Mon Feb 21 15:40:56 JST 2022


> Dear Tatsuo
> 
> I am not an expert on C programming, but if I understand correctly, applying this debugger patch would mean doing it on a fresh copy of the pgpool code,
> re-compiling it and then running another instance of pgpool from that new code. If that’s the procedure, I am afraid we might not be able to reproduce the
> bug in that new instance, as it is so infrequent.

Ok.

> Is there a way we can debug this or get these variables at runtime? I highly doubt it, as I already tried increasing the log level to debug5 but couldn’t find
> anything related, but I still want to ask because you might know of better alternatives.

The only way is to attach a debugger to the process when the issue is reproduced.

> If not, I will definitely try to run it in a fresh pgpool instance and leave the current buggy one running as it is. Just hoping for a better alternative.

I have found one case which reproduces the issue. Unfortunately the
version I used is 4.3, not 4.2, and the test case needs a new option for
the pcp_promote_node command which does not exist in 4.2. Anyway, the
steps to reproduce the issue are as follows (using 3 PostgreSQL nodes):

(1) shutdown pgpool (node 0 is primary).

(2) set node 1 status to "down" in the pgpool_status file so that node 1
is regarded as down from pgpool's point of view.

(3) start pgpool.

(4) pcp_promote_node -s 2

(5) shutdown node 1 PostgreSQL.

(6) Pgpool complains: invalid degenerate backend request, node id : 1 status: [2] is not valid for failover

(7) infinite health check failures on node 1.

By attaching a debugger to the health check process for node 1, I
confirmed that private_backend_status for node 1 is "down", which
causes the issue. So I think the correct fix for this is to remove the
call to pool_initialize_private_backend_status() in the health check
process. I will work on finding a way to reproduce the issue on
Pgpool-II 4.2.
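
To make the mechanism concrete, here is a small self-contained sketch
(NOT pgpool code; the names only mirror my_backend_status /
private_backend_status / VALID_BACKEND, and the types, values and
helpers are simplified for the example). It shows how a status snapshot
taken at process start can later disagree with the shared memory:

    /* Simplified stand-alone model, not actual pgpool code. */
    #include <stdio.h>

    #define CON_CONNECT_WAIT 1
    #define CON_UP           2
    #define CON_DOWN         3
    #define NUM_BACKENDS     3

    /* stands in for BACKEND_INFO(i).backend_status in shared memory */
    static int shared_status[NUM_BACKENDS] = {CON_UP, CON_DOWN, CON_UP};

    /* stands in for private_backend_status[] / my_backend_status[] */
    static int private_status[NUM_BACKENDS];
    static int *my_status[NUM_BACKENDS];

    /* rough equivalent of pool_initialize_private_backend_status() */
    static void take_private_snapshot(void)
    {
        for (int i = 0; i < NUM_BACKENDS; i++)
        {
            private_status[i] = shared_status[i];
            my_status[i] = &private_status[i];  /* now points at the copy */
        }
    }

    /* rough equivalent of the status check inside VALID_BACKEND */
    static int valid_backend(int id)
    {
        return *(my_status[id]) == CON_UP ||
               *(my_status[id]) == CON_CONNECT_WAIT;
    }

    int main(void)
    {
        take_private_snapshot();    /* node 1 is down at this point */
        shared_status[1] = CON_UP;  /* later the shared status goes back to up */

        /* prints "shared: 2, valid_backend: 0" */
        printf("shared: %d, valid_backend: %d\n",
               shared_status[1], valid_backend(1));
        return 0;
    }

In the reproduction above the health check process takes its copy while
node 1 is still marked down, and the shared status for node 1 later
becomes 2 (up), so the two views diverge exactly like this.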

> Thank you
> 
> Anirudh
> On 21 Feb 2022, 2:20 AM +0100, Tatsuo Ishii <ishii at sraoss.co.jp>, wrote:
> 
>  Hi Anirudh,
> 
>  Hello Tatsuo
> 
>  Thank you so much for the explanation. I understand the problem but same as you, cannot understand the reason behind it. The same variable
>  'BACKEND_INFO(i).backend_status' cannot have 2 different values. Maybe the VALID_BACKEND macro is false due to other checks that it has in its
>  if else blocks. I am not sure. I will try to have a look.
> 
>  Maybe. It would be nice if you could attach a debugger to the health check
>  process which emits the log:
> 
>  invalid degenerate backend request, node id : 2 status: [2] is not valid for failover
> 
>  then examine the variables which may be related to the issue. I am especially interested in these two:
> 
>  (1)
>  static POOL_SESSION_CONTEXT * session_context = NULL;
> 
>  In the health check process this variable is expected to be NULL. If not, we have trouble.
> 
>  (2)
>  BACKEND_STATUS private_backend_status[MAX_NUM_BACKENDS];
> 
>  This is a copy of the backend status from the shared memory area.
>  It is expected to be 1 or 2 (CON_CONNECT_WAIT or CON_UP).
> 
>  BTW, I need to correct my explanation.
> 
>  my_backend_status[(backend_id)]. The local variable is initialized in
>  the pgpool main process (src/main/pgpool_main.c) and inherited via
>  fork() by the health check process.
> 
>      /*
>       * Initialize backend status area. From now on, VALID_BACKEND macro can be
>       * used. (get_next_main_node() uses VALID_BACKEND)
>       */
> 
>      for (i = 0; i < MAX_NUM_BACKENDS; i++)
>      {
>          my_backend_status[i] = &(BACKEND_INFO(i).backend_status);
>      }
> 
>  As you can see, my_backend_status actually stores a pointer to
>  BACKEND_INFO(i).backend_status. So *(my_backend_status[(backend_id)])
>  should be the same as BACKEND_INFO(backend_id).backend_status.
> 
>  Actually, in the health check process my_backend_status is not initialized like this. The health check process calls this:
> 
>      /* Initialize my backend status */
>      pool_initialize_private_backend_status();
> 
>  It copies the backend status in the shared memory area to process
>  local memory. This is necessary for the pgpool child processes which deal
>  with client sessions, so that they are not affected by status
>  changes in the shared memory area in the middle of processing. But this
>  is not necessary for the health check process. I am going to fix this.
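> 
>  Roughly, the copy looks like this (simplified; shown only to illustrate
>  the idea, see the actual pool_initialize_private_backend_status() for the
>  real code):
> 
>      for (i = 0; i < MAX_NUM_BACKENDS; i++)
>      {
>          /* copy the shared status into the process local array ... */
>          private_backend_status[i] = BACKEND_INFO(i).backend_status;
>          /* ... and make VALID_BACKEND read the local copy from now on */
>          my_backend_status[i] = &private_backend_status[i];
>      }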
> 
>  However, I don't think this alone can explain the issue. As long as there is
>  no difference between the local copy and the shared memory during the
>  process's lifetime, the bug does not do anything bad, and I cannot think of
>  any scenario which would make the two differ.
> 
>  As for the reproducibility, you are right: in the past few months of development and testing, I have only seen this issue a couple of times. Earlier, I
>  thought it was something that I missed in the config, but now that my setup is almost production ready, I opened this ticket for help. It is really
>  difficult to reproduce, but I currently have a setup running where this problem is persistent. I have tried turning that replica off and on several
>  times, and it connects back to the cluster when it's up but I see the same error when it’s down.
> 
>  So, if you prefer I can help you investigate more on this by sharing some detailed logs and config options from my setup.
> 
>  Even though this issue comes up rarely, it is important for me to resolve because my setup uses synchronous replication and, if both of my
>  replicas are down, I convert the replication mode from sync to async in the failover script. But if this bug shows up, the failover script is not
>  executed when the last replica goes down and my queries hang forever because they do not have a replica confirming transactions in synchronous
>  mode.
> 
>  So, if I can help you debug this by providing any logs or config or a direct access to my setup over a Zoom call, that’d be really great for me. Please
>  let me know if this works for you.
> 
>  Thanks. Is it possible for you to apply the attached patches so that
>  pgpool produces additional logging? (This will emit a log entry whenever failover
>  happens.) The log basically contains the information which could be obtained with the
>  debugger as explained above.
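> 
>  The additional log is basically along these lines (illustrative only, not
>  the exact patch): when a failover request is processed, print both views
>  of the backend status side by side, e.g.
> 
>      ereport(LOG,
>              (errmsg("node %d: shared status %d, my_backend_status %d",
>                      node_id, BACKEND_INFO(node_id).backend_status,
>                      *(my_backend_status[node_id]))));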
> 
>  Thank you
> 
>  Anirudh
>  On 19 Feb 2022, 2:33 AM +0100, Tatsuo Ishii <ishii at sraoss.co.jp>, wrote:
> 
>  Hello
> 
>  I had the following setup:
>  1 pgpool: 4.2.2
>  3 postgres nodes (all 12.5) - 1 primary, 2 replicas
> 
>  I turned off both replicas one after the other with a difference of a couple of seconds. Failover was performed for one of the replicas but not for
>  the other. For the second replica the health check keeps failing, and pgpool just keeps retrying it in an infinite loop. When
>  pgpool attempts the failover of the second replica, I see this error in the logs:
> 
>  invalid degenerate backend request, node id : 2 status: [2] is not valid for failover
> 
>  As far as I understand, status [2] is up, so the failover should be performed. Here’s a snippet of the logs:
> 
>  You are right. The problem is that two different internal APIs, which both read the status, do not agree.
>  The error message comes from here:
>  src/main/pool_internal_comms.c: degenerate_backend_set_ex()
> 
>      for (i = 0; i < count; i++)
>      {
>          if (node_id_set[i] < 0 || node_id_set[i] >= MAX_NUM_BACKENDS ||
>              (!VALID_BACKEND(node_id_set[i]) && BACKEND_INFO(node_id_set[i]).quarantine == false))
>          {
>              if (node_id_set[i] < 0 || node_id_set[i] >= MAX_NUM_BACKENDS)
>                  ereport(elevel,
>                          (errmsg("invalid degenerate backend request, node id: %d is out of range. node id must be between [0 and %d]"
>                                  ,node_id_set[i], MAX_NUM_BACKENDS)));
>              else
>                  ereport(elevel,
>                          (errmsg("invalid degenerate backend request, node id : %d status: [%d] is not valid for failover"
>                                  ,node_id_set[i], BACKEND_INFO(node_id_set[i]).backend_status)));
> 
>  I think VALID_BACKEND(node_id_set[i]) returned false
>  here. VALID_BACKEND returns false if the backend status in the shared
>  memory area is neither 1 (waiting for connection) nor 2 (up and
>  running). However, BACKEND_INFO(node_id_set[i]).backend_status says
>  backend_status is actually 2 (up and running). The strange thing is that
>  both VALID_BACKEND and BACKEND_INFO(node_id_set[i]).backend_status
>  look into the same shared memory area. Let me explain.
> 
>  VALID_BACKEND is a macro which expands to:
> 
>      #define VALID_BACKEND(backend_id) \
>          ((RAW_MODE && (backend_id) == REAL_MAIN_NODE_ID) || \
>           (pool_is_node_to_be_sent_in_current_query((backend_id)) && \
>            ((*(my_backend_status[(backend_id)]) == CON_UP) || \
>             (*(my_backend_status[(backend_id)]) == CON_CONNECT_WAIT))))
> 
>  Since you are running pgpool in a mode other than RAW_MODE, the macro first
>  checks whether pool_is_node_to_be_sent_in_current_query(backend_id)
>  returns true. Here is the relevant portion of the code (defined in
>  src/context/pool_query_context.c).
> 
>      bool
>      pool_is_node_to_be_sent_in_current_query(int node_id)
>      {
>          POOL_SESSION_CONTEXT *sc;
> 
>          if (RAW_MODE)
>              return node_id == REAL_MAIN_NODE_ID;
> 
>  As I said earlier, RAW_MODE is false here. So it executes the next lines:
> 
>          sc = pool_get_session_context(true);
>          if (!sc)
>              return true;
> 
>  pool_get_session_context() returns NULL and this function returns
>  true, because the health check process does not use a "session context". The
>  session context only exists in the pgpool child processes which deal with
>  user connections.
> 
>  Then the macro checks the local variable
>  my_backend_status[(backend_id)]. The local variable is initialized in
>  the pgpool main process (src/main/pgpool_main.c) and inherited via
>  fork() by the health check process.
> 
>      /*
>       * Initialize backend status area. From now on, VALID_BACKEND macro can be
>       * used. (get_next_main_node() uses VALID_BACKEND)
>       */
> 
>      for (i = 0; i < MAX_NUM_BACKENDS; i++)
>      {
>          my_backend_status[i] = &(BACKEND_INFO(i).backend_status);
>      }
> 
>  As you can see, my_backend_status actually stores a pointer to
>  BACKEND_INFO(i).backend_status. So *(my_backend_status[(backend_id)])
>  should be the same as BACKEND_INFO(backend_id).backend_status.
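> 
>  In other words, after this initialization both reads refer to the very
>  same shared memory slot:
> 
>      /* holds by construction after the loop above */
>      my_backend_status[backend_id] == &(BACKEND_INFO(backend_id).backend_status)
> 
>  so the two values should never be able to differ.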
> 
>  However, as I said earlier, they seem to disagree. At this
>  point I can't think of any explanation for this.
> 
>  How often do you see this problem? Is it reliably reproducible?
>  I have not been able to reproduce it so far.
> 
>  Best regards,
>  --
>  Tatsuo Ishii
>  SRA OSS, Inc. Japan
>  English: http://www.sraoss.co.jp/index_en.php
>  Japanese:http://www.sraoss.co.jp
> 

