View Issue Details

IDProjectCategoryView StatusLast Update
0000384Pgpool-IIBugpublic2018-03-14 22:15
Reportergsucameli.nr Assigned To 
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionopen 
PlatformLinuxOSCentOSOS Version6.8
Product Version3.7.2 
Target Version3.7.3 
Summary0000384: Pool worker children uses random primary_node_id when in master_slave replication
DescriptionAFAICT the mechanism used to share POOL_REQUEST_INFO struct doesn't work on my CentOS 6.8 64-bit.

Here's the configuration:
- num_init_children = 50
- max_pool = 4
- master_slave_mode = on
- master_slave_sub_mode = 'stream'
- replication_mode = off
- load_balance_mode = on
- 2 backend nodes configured: 0 is the slave node, 1 is the primary one.
- white_function_list = ''
- black_function_list = 'nextval,setval,currval,lastval'

What I can see is a SELECT NEXTVAL('...') query randomly sent to the wrong backend.

Looking at the log file I can see the primary node is correctly identified:

> pid 32107: LOG: find_primary_node: primary node id is 1


Enabling DEBUG1 also NEXTVAL queries are recognized to be sent to primary node:

> pid 1634: DETAIL: destination = 2 for query= "SELECT NEXTVAL('change_log_id_seq')"
> pid 1634: DEBUG: function call walker, function name: "nextval"
> pid 1634: DEBUG: could not load balance because writing functions are used


BTW lots of workers are reporting a wrong PRIMARY_NODE_ID value, but also the same worker (e.g. pid 2155) sometimes reports PRIMARY_NODE_ID:1 and after a while PRIMARY_NODE_ID:0:

> pid 2155: DEBUG: pool_virtual_master_db_node_id: virtual_master_node_id:1 load_balance_node_id:1 PRIMARY_NODE_ID:1
> pid 2157: DEBUG: pool_virtual_master_db_node_id: virtual_master_node_id:0 load_balance_node_id:1 PRIMARY_NODE_ID:0
...
> pid 2156: DEBUG: pool_virtual_master_db_node_id: virtual_master_node_id:0 load_balance_node_id:1 PRIMARY_NODE_ID:1
> pid 2155: DEBUG: pool_virtual_master_db_node_id: virtual_master_node_id:0 load_balance_node_id:1 PRIMARY_NODE_ID:0


No failovers are occurring (failover is not configured at all).

Looking at code I can see:
1. the shared memory for POOL_REQUEST_INFO is created and attached by the parent process,
2. child processes are forked,
3. the Req_info->primary_node_id value is set

Since child process doesn't change that value and the parent process changes it on failover, but no failovers are occurring.
So the only reason each process see a different PRIMARY_NODE_ID value is the shared memory is not mapped correctly on child processes address space.

The issue disappears after changing the source code to make all children processes attaching the shared memory used by Req_info.
In addition looking at logs I can see all pool worker children reporting the same (and right) PRIMARY_NODE_ID value, i.e. 1.


I'm attaching the patch I used, BTW it is just something to understand the issue and I guess cannot be merged as is.

Parent process calls shmget, stores Req_info_shmid and then calls shmat, while child processes calls only shmat on the shared Req_info_shmid.
TagsNo tags attached.

Activities

gsucameli.nr

2018-03-08 03:01

reporter  

shmem_attach.diff (15,150 bytes)   
shmem_attach.diff (15,150 bytes)   

gsucameli.nr

2018-03-08 06:55

reporter   ~0001951

I have tried the current master and it seems not affected...
so maybe the issue has been fixed yet on master and it doesn't depend on shared memory...
I'm investigating further.

gsucameli.nr

2018-03-08 07:26

reporter   ~0001952

The issue can be closed as invalid.

I've built v3.7.2 from source (instead of using RPM) and the issue is there.
Then found that it has been fixed in commit 6b757807196dde (not related to shared mem at all...).

t-ishii

2018-03-14 22:14

developer   ~0001959

Thank you for the report. Yes, I think you were bitten by a bug in 3.7.2. Sorry for inconvenience.

I am going to close this issue.

Issue History

Date Modified Username Field Change
2018-03-08 03:01 gsucameli.nr New Issue
2018-03-08 03:01 gsucameli.nr File Added: shmem_attach.diff
2018-03-08 06:55 gsucameli.nr Note Added: 0001951
2018-03-08 07:26 gsucameli.nr Note Added: 0001952
2018-03-14 22:14 t-ishii Note Added: 0001959
2018-03-14 22:15 t-ishii Status new => resolved
2018-03-14 22:15 t-ishii Target Version => 3.7.3
2018-03-14 22:15 t-ishii Additional Information Updated