[pgpool-general: 9071] Re: Segmentation after switchover

Tatsuo Ishii ishii at sraoss.co.jp
Thu Apr 4 10:19:56 JST 2024


> I was afraid this was going to be your answer :) Unfortunately, I really
> don't know what sequence of events and actions triggers the error. All I
> know is that our reboot test triggers it. Also, I don't know how to set up
> such a test case. The main sequence of events is as follows:
> 
> * Start up the cluster using the following steps:
>   - Start the database on node 1 and make sure it is the primary database
>   - Start pgpool on node 1 and wait a few seconds
>   - Start pgpool on node 2 and 3 and wait a few seconds
>   - Initialize the databases on node 2 and 3, setting them up as standby
>   - The result now is: node 1 is primary db and leader, node 2 and 3 are
> standby db and standby pgpool watchdog
> * Shutdown and reboot node 2
> * Wait for node 2 to come back up
> * Shutdown and reboot node 3
> * Wait for node 3 to come back up
> * Restart pgpool on node 1 to force it to drop its leader status and wait a
> few seconds
> * Shutdown and detach the database on node 1 to trigger a failover and wait
> for a new primary to be selected
> * Wait 10 more seconds
> * Rewind the database on node 1 and instruct it to follow the new primary
> * (CRASH ON NODE 2) Wait for the now standby database on node 1 to be fully
> synchronized with the new primary
> * (CRASH ON NODE 1) Shutdown and reboot node 1
> * Wait for node 1 to come back up
> * Verify the integrity of the cluster (all databases and pgpool instances
> must be up)
> 
> With Valgrind, I've been able to determine that the SIGSEGV on node 1 has
> the same backtrace as the SIGSEGV on node 3 (the one in
> free_persistent_db_connection_memory). I'm not sure about the exact moment
> for the crash on node 3, as it happens less frequently and I don't have
> reliable logs of a crash at the moment. The crash on node 2
> (get_query_result in pool_worker_child.c) is by far the most frequent,
> happening about 2 out of 3 times. It seems to happen at the moment when the
> database on node 1 is started up again, which triggers a failback. When the
> database on node 1 reports "database system is ready to accept read only
> connections", the SIGSEGV happens on node 2 only a few milliseconds later.
> 
> I dove into the code, and I think I've found the cause of the error. Just
> prior to crashing, it reports "find_primary_node:
> make_persistent_db_connection_noerror failed on node 0". This must come
> from pgpool_main.c:2782. This means that slots[0] is NULL. Then, at
> pgpool_main.c:2791 it enters verify_backend_node_status with this slots
> array. At lines 2569-2579 it loops over these slots,
> calling get_server_version for every slot, including slots[0], which is
> NULL. This crashes when get_server_version calls get_query_result, which
> tries to dereference slots[0]->con. At pgpool_main.c:2456 there is an
> explicit check for NULL; this check is missing in the other for loop, and
> it is also missing at line 2609.

But there's a check at line 2604 of pgpool_main.c:

if (pool_node_status[j] == POOL_NODE_STATUS_STANDBY)

If pool_node_status[j] is POOL_NODE_STATUS_STANDBY, the target node
(0) must have been alive in the past. I suspect node 0 went down after
pool_node_status[j] was updated. I should have checked the availability
of slots[j] before calling get_query_result at 2609.
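
To illustrate the guard I mean, here is a small self-contained sketch.
This is not the actual pgpool code: the "SlotInfo" type, the status
value and the printf output are invented for the example; the real fix
is in the attached patch.

/*
 * Sketch only: skip a NULL slots[] entry before dereferencing it.
 * Everything here except the POOL_NODE_STATUS_STANDBY name is an
 * assumption invented for this example.
 */
#include <stdio.h>
#include <stddef.h>

#define NUM_BACKENDS             3
#define POOL_NODE_STATUS_STANDBY 2    /* value assumed for the example */

typedef struct { int con; } SlotInfo;    /* stand-in for the real slot type */

static void
verify_standby_nodes(SlotInfo *slots[], int pool_node_status[])
{
    int j;

    for (j = 0; j < NUM_BACKENDS; j++)
    {
        if (pool_node_status[j] != POOL_NODE_STATUS_STANDBY)
            continue;

        /*
         * The backend can go down after pool_node_status[j] was set,
         * leaving slots[j] == NULL, so check before dereferencing.
         */
        if (slots[j] == NULL)
        {
            printf("node %d: no connection slot, skipping\n", j);
            continue;
        }

        printf("node %d: safe to query (con = %d)\n", j, slots[j]->con);
    }
}

int
main(void)
{
    SlotInfo   s2 = {42};
    SlotInfo  *slots[NUM_BACKENDS] = {NULL, NULL, &s2};    /* node 0 already gone */
    int        status[NUM_BACKENDS] = {POOL_NODE_STATUS_STANDBY, 0,
                                       POOL_NODE_STATUS_STANDBY};

    verify_standby_nodes(slots, status);
    return 0;
}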

As for the crash in health_check.c, I think I have found the cause. The
connection info is cached in HealthCheckMemoryContext and is pointed to
by "slot" (a static variable). When an error occurs, ereport(ERROR)
jumps to line 159. The code then proceeds to the for loop starting at
line 171. At line 174,
MemoryContextResetAndDeleteChildren(HealthCheckMemoryContext) is
called and the connection info is discarded [1]. The problem is that
the value of "slot" remains, which means slot now points to freed
memory. We should have cleared slot there.
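
Roughly, the idea of the fix is as below (the actual change is in the
attached patch; the context and variable names are as described above):

/* after resetting the health check memory context in the error path,
 * also drop the static pointer so the next health check reconnects
 * instead of touching freed memory */
MemoryContextResetAndDeleteChildren(HealthCheckMemoryContext);
slot = NULL;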

The same issue is found in pool_worker_child.c.

Attached is the patch for the above.

> I've attached another log from node 2, with the latest patch applied. I
> still see some uninitialized values. I've enabled origin tracking in
> valgrind to get detailed information on the origin of the uninitialized
> values.

I will look into it later.

Best regards,

[1] Pgpool-II imports PostgreSQL's memory management module.
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: segfault.patch
Type: text/x-patch
Size: 1687 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20240404/ed994bc2/attachment.bin>

