View Issue Details

IDProjectCategoryView StatusLast Update
0000384Pgpool-IIBugpublic2018-03-14 22:15
Reportergsucameli.nrAssigned To 
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionopen 
PlatformLinuxOSCentOSOS Version6.8
Product Version3.7.2 
Target Version3.7.3Fixed in Version 
Summary0000384: Pool worker children uses random primary_node_id when in master_slave replication
DescriptionAFAICT the mechanism used to share POOL_REQUEST_INFO struct doesn't work on my CentOS 6.8 64-bit.

Here's the configuration:
- num_init_children = 50
- max_pool = 4
- master_slave_mode = on
- master_slave_sub_mode = 'stream'
- replication_mode = off
- load_balance_mode = on
- 2 backend nodes configured: 0 is the slave node, 1 is the primary one.
- white_function_list = ''
- black_function_list = 'nextval,setval,currval,lastval'

What I can see is a SELECT NEXTVAL('...') query randomly sent to the wrong backend.

Looking at the log file I can see the primary node is correctly identified:

> pid 32107: LOG: find_primary_node: primary node id is 1


Enabling DEBUG1 also NEXTVAL queries are recognized to be sent to primary node:

> pid 1634: DETAIL: destination = 2 for query= "SELECT NEXTVAL('change_log_id_seq')"
> pid 1634: DEBUG: function call walker, function name: "nextval"
> pid 1634: DEBUG: could not load balance because writing functions are used


BTW lots of workers are reporting a wrong PRIMARY_NODE_ID value, but also the same worker (e.g. pid 2155) sometimes reports PRIMARY_NODE_ID:1 and after a while PRIMARY_NODE_ID:0:

> pid 2155: DEBUG: pool_virtual_master_db_node_id: virtual_master_node_id:1 load_balance_node_id:1 PRIMARY_NODE_ID:1
> pid 2157: DEBUG: pool_virtual_master_db_node_id: virtual_master_node_id:0 load_balance_node_id:1 PRIMARY_NODE_ID:0
...
> pid 2156: DEBUG: pool_virtual_master_db_node_id: virtual_master_node_id:0 load_balance_node_id:1 PRIMARY_NODE_ID:1
> pid 2155: DEBUG: pool_virtual_master_db_node_id: virtual_master_node_id:0 load_balance_node_id:1 PRIMARY_NODE_ID:0


No failovers are occurring (failover is not configured at all).

Looking at code I can see:
1. the shared memory for POOL_REQUEST_INFO is created and attached by the parent process,
2. child processes are forked,
3. the Req_info->primary_node_id value is set

Since child process doesn't change that value and the parent process changes it on failover, but no failovers are occurring.
So the only reason each process see a different PRIMARY_NODE_ID value is the shared memory is not mapped correctly on child processes address space.

The issue disappears after changing the source code to make all children processes attaching the shared memory used by Req_info.
In addition looking at logs I can see all pool worker children reporting the same (and right) PRIMARY_NODE_ID value, i.e. 1.


I'm attaching the patch I used, BTW it is just something to understand the issue and I guess cannot be merged as is.

Parent process calls shmget, stores Req_info_shmid and then calls shmat, while child processes calls only shmat on the shared Req_info_shmid.
TagsNo tags attached.

Activities

gsucameli.nr

2018-03-08 03:01

reporter  

shmem_attach.diff (15,150 bytes)
diff --git a/src/config/pool_config.c b/src/config/pool_config.c
index 0fe30ef..dd612b3 100644
--- a/src/config/pool_config.c
+++ b/src/config/pool_config.c
@@ -1833,7 +1833,7 @@ int pool_init_config(void)
 	memset(pool_config, 0, sizeof(POOL_CONFIG));
 
 #ifndef POOL_PRIVATE
-	g_pool_config.backend_desc = pool_shared_memory_create(sizeof(BackendDesc));
+	g_pool_config.backend_desc = pool_shared_memory_create(sizeof(BackendDesc), NULL);
 	memset(g_pool_config.backend_desc, 0, sizeof(BackendDesc));
 #else
 	g_pool_config.backend_desc = palloc0(sizeof(BackendDesc));
diff --git a/src/config/pool_config.l b/src/config/pool_config.l
index 082637c..fa7d246 100644
--- a/src/config/pool_config.l
+++ b/src/config/pool_config.l
@@ -108,7 +108,7 @@ int pool_init_config(void)
 	memset(pool_config, 0, sizeof(POOL_CONFIG));
 
 #ifndef POOL_PRIVATE
-	g_pool_config.backend_desc = pool_shared_memory_create(sizeof(BackendDesc));
+	g_pool_config.backend_desc = pool_shared_memory_create(sizeof(BackendDesc), NULL);
 	memset(g_pool_config.backend_desc, 0, sizeof(BackendDesc));
 #else
 	g_pool_config.backend_desc = palloc0(sizeof(BackendDesc));
diff --git a/src/include/pool.h b/src/include/pool.h
index 48cd829..27129f7 100644
--- a/src/include/pool.h
+++ b/src/include/pool.h
@@ -511,6 +511,7 @@ extern long int weight_master;	/* normalized weight of master (0-RAND_MAX range)
 extern int my_proc_id;  /* process table id (!= UNIX's PID) */
 extern ProcessInfo *process_info; /* shmem process information table */
 extern ConnectionInfo *con_info; /* shmem connection info table */
+extern int Req_info_shmid;
 extern POOL_REQUEST_INFO *Req_info;
 extern volatile sig_atomic_t *InRecovery;
 extern char remote_ps_data[];		/* used for set_ps_display */
@@ -622,7 +623,8 @@ extern void child_exit(int code);
 extern void init_prepared_list(void);
 extern void proc_exit(int);
 
-extern void *pool_shared_memory_create(size_t size);
+extern void *pool_shared_memory_attach(int shmid);
+extern void *pool_shared_memory_create(size_t size, int *o_shmid);
 extern void pool_shmem_exit(int code);
 
 extern void pool_semaphore_create(int numSems);
diff --git a/src/main/health_check.c b/src/main/health_check.c
index de3eda8..ddcc882 100644
--- a/src/main/health_check.c
+++ b/src/main/health_check.c
@@ -131,7 +131,9 @@ void do_health_check_child(int *node_id)
 	signal(SIGUSR2, SIG_IGN);
 	signal(SIGPIPE, SIG_IGN);
 
-    /* Create per loop iteration memory context */
+	Req_info = pool_shared_memory_attach(Req_info_shmid);
+
+	/* Create per loop iteration memory context */
 	HealthCheckMemoryContext = AllocSetContextCreate(TopMemoryContext,
                                              "health_check_main_loop",
 													 ALLOCSET_DEFAULT_MINSIZE,
diff --git a/src/main/pgpool_main.c b/src/main/pgpool_main.c
index 1b4da1e..b7b0ec9 100644
--- a/src/main/pgpool_main.c
+++ b/src/main/pgpool_main.c
@@ -183,6 +183,7 @@ extern char *hba_file;
 static int exiting = 0;		/* non 0 if I'm exiting */
 static int switching = 0;		/* non 0 if I'm failing over or degenerating */
 
+int Req_info_shmid = -1;
 POOL_REQUEST_INFO *Req_info;		/* request info area in shared memory */
 volatile sig_atomic_t *InRecovery; /* non 0 if recovery is started */
 volatile sig_atomic_t reload_config_request = 0;
@@ -236,7 +237,7 @@ int PgpoolMain(bool discard_status, bool clear_memcache_oidmaps)
 	/*
 	 * install the call back for preparation of system exit
 	 */
-    on_system_exit(system_will_go_down, (Datum)NULL);
+	on_system_exit(system_will_go_down, (Datum)NULL);
 
 	/* set unix domain socket path for connections to pgpool */
 	snprintf(un_addr.sun_path, sizeof(un_addr.sun_path), "%s/.s.PGSQL.%d",
@@ -3030,7 +3031,7 @@ static void initialize_shared_mem_objects(bool clear_memcache_oidmaps)
 	 * connection pool in each process. Of course this was wrong.
 	 */
 	size = pool_coninfo_size();
-	con_info = pool_shared_memory_create(size);
+	con_info = pool_shared_memory_create(size, NULL);
 	memset(con_info, 0, size);
 
 	size = pool_config->num_init_children * (sizeof(ProcessInfo));
@@ -3041,7 +3042,7 @@ static void initialize_shared_mem_objects(bool clear_memcache_oidmaps)
 					sizeof(ProcessInfo),
 					size)));
 
-	process_info = pool_shared_memory_create(size);
+	process_info = pool_shared_memory_create(size, NULL);
 	memset(process_info, 0, size);
 
 	for (i = 0; i < pool_config->num_init_children; i++)
@@ -3049,9 +3050,9 @@ static void initialize_shared_mem_objects(bool clear_memcache_oidmaps)
 		process_info[i].connection_info = pool_coninfo(i,0,0);
 	}
 
-	user1SignalSlot = pool_shared_memory_create(sizeof(User1SignalSlot));
+	user1SignalSlot = pool_shared_memory_create(sizeof(User1SignalSlot), NULL);
 	/* create fail over/switch over event area */
-	Req_info = pool_shared_memory_create(sizeof(POOL_REQUEST_INFO));
+	Req_info = pool_shared_memory_create(sizeof(POOL_REQUEST_INFO), &Req_info_shmid);
 
 	ereport(DEBUG1,
 			(errmsg("Request info are: sizeof(POOL_REQUEST_INFO) %zu bytes requested for shared memory",
@@ -3073,7 +3074,7 @@ static void initialize_shared_mem_objects(bool clear_memcache_oidmaps)
 	Req_info->switching = false;
 	Req_info->request_queue_head = Req_info->request_queue_tail = -1;
 	Req_info->primary_node_id = -2;
-	InRecovery = pool_shared_memory_create(sizeof(int));
+	InRecovery = pool_shared_memory_create(sizeof(int), NULL);
 	*InRecovery = RECOVERY_INIT;
 
 	ereport(DEBUG1,
@@ -3131,7 +3132,7 @@ static void initialize_shared_mem_objects(bool clear_memcache_oidmaps)
 	}
 
 	/* Initialize statistics area */
-	stat_set_stat_area(pool_shared_memory_create(stat_shared_memory_size()));
+	stat_set_stat_area(pool_shared_memory_create(stat_shared_memory_size(), NULL));
 	stat_init_stat_area();
 	/* initialize watchdog IPC unix domain socket address */
 	if (pool_config->use_watchdog)
diff --git a/src/pcp_con/pcp_child.c b/src/pcp_con/pcp_child.c
index 5c919fb..3926634 100644
--- a/src/pcp_con/pcp_child.c
+++ b/src/pcp_con/pcp_child.c
@@ -108,7 +108,7 @@ void pcp_main(int unix_fd, int inet_fd)
 	pcp_unix_fd = unix_fd;
 	pcp_inet_fd = inet_fd;
 
-	pcp_recovery_in_progress = pool_shared_memory_create(sizeof(bool));
+	pcp_recovery_in_progress = pool_shared_memory_create(sizeof(bool), NULL);
 	*pcp_recovery_in_progress = false;
 	/*
 	 * install the call back for preparation of exit
@@ -127,6 +127,8 @@ void pcp_main(int unix_fd, int inet_fd)
 	pool_signal(SIGPIPE, SIG_IGN);
 	pool_signal(SIGALRM, SIG_IGN);
 
+	Req_info = pool_shared_memory_attach(Req_info_shmid);
+
 	MemoryContextSwitchTo(TopMemoryContext);
 
 	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
diff --git a/src/pcp_con/pcp_worker.c b/src/pcp_con/pcp_worker.c
index 952904a..57644ff 100644
--- a/src/pcp_con/pcp_worker.c
+++ b/src/pcp_con/pcp_worker.c
@@ -122,6 +122,9 @@ pcp_worker_main(int port)
 	signal(SIGHUP, SIG_IGN);
 	signal(SIGPIPE, SIG_IGN);
 	signal(SIGALRM, SIG_IGN);
+
+        Req_info = pool_shared_memory_attach(Req_info_shmid);
+
 	/* Create per loop iteration memory context */
 	PCPMemoryContext = AllocSetContextCreate(TopMemoryContext,
 											 "PCP_worker_main_loop",
diff --git a/src/protocol/child.c b/src/protocol/child.c
index 066b035..bdac203 100644
--- a/src/protocol/child.c
+++ b/src/protocol/child.c
@@ -156,6 +156,8 @@ void do_child(int *fds)
 
 	on_system_exit(child_will_go_down, (Datum)NULL);
 
+	Req_info = pool_shared_memory_attach(Req_info_shmid);
+
 	int *walk;
 #ifdef NONE_BLOCK
 	/* set listen fds to none-blocking */
diff --git a/src/query_cache/pool_memqcache.c b/src/query_cache/pool_memqcache.c
index f2d35e3..a398f30 100644
--- a/src/query_cache/pool_memqcache.c
+++ b/src/query_cache/pool_memqcache.c
@@ -1766,7 +1766,7 @@ int pool_init_memory_cache(size_t size)
 	ereport(DEBUG1,
 		(errmsg("memory cache request size : %zd",size)));
 
-	shmem = pool_shared_memory_create(size);
+	shmem = pool_shared_memory_create(size, NULL);
 	return 0;
 }
 
@@ -1861,7 +1861,7 @@ int pool_init_fsmm(size_t size)
 	int maxblock = 	pool_get_memqcache_blocks();
 	int encode_value;
 
-	fsmm = pool_shared_memory_create(size);
+	fsmm = pool_shared_memory_create(size, NULL);
 	encode_value = POOL_MAX_FREE_SPACE/POOL_FSMM_RATIO;
 	memset(fsmm, encode_value, maxblock);
 	return 0;
@@ -1889,7 +1889,7 @@ static int *pool_fsmm_clock_hand;
  */
 void pool_allocate_fsmm_clock_hand(void)
 {
-	pool_fsmm_clock_hand = pool_shared_memory_create(sizeof(*pool_fsmm_clock_hand));
+	pool_fsmm_clock_hand = pool_shared_memory_create(sizeof(*pool_fsmm_clock_hand), NULL);
 	*pool_fsmm_clock_hand = 0;
 }
 
@@ -3375,7 +3375,7 @@ void pool_handle_query_cache(POOL_CONNECTION_POOL *backend, char *query, Node *n
 static POOL_QUERY_CACHE_STATS *stats;
 int pool_init_memqcache_stats(void)
 {
-	stats = pool_shared_memory_create(sizeof(POOL_QUERY_CACHE_STATS));
+	stats = pool_shared_memory_create(sizeof(POOL_QUERY_CACHE_STATS), NULL);
 	pool_reset_memqcache_stats();
 	return 0;
 }
@@ -3524,7 +3524,7 @@ int pool_hash_init(int nelements)
 	mask = ~0;
 	mask >>= shift;
 	size = (char *)&hh.elements - (char *)&hh + sizeof(POOL_HEADER_ELEMENT)*nelements2;
-	hash_header = pool_shared_memory_create(size);
+	hash_header = pool_shared_memory_create(size, NULL);
 	hash_header->nhash = nelements2;
     hash_header->mask = mask;
 
@@ -3536,7 +3536,7 @@ int pool_hash_init(int nelements)
 #endif
 
 	size = sizeof(POOL_HASH_ELEMENT)*nelements2;
-	hash_elements = pool_shared_memory_create(size);
+	hash_elements = pool_shared_memory_create(size, NULL);
 
 #ifdef POOL_HASH_DEBUG
 	ereport(LOG,
diff --git a/src/streaming_replication/pool_worker_child.c b/src/streaming_replication/pool_worker_child.c
index 7715876..81e7416 100644
--- a/src/streaming_replication/pool_worker_child.c
+++ b/src/streaming_replication/pool_worker_child.c
@@ -120,7 +120,9 @@ void do_worker_child(void)
 	signal(SIGUSR2, SIG_IGN);
 	signal(SIGPIPE, SIG_IGN);
 
-    /* Create per loop iteration memory context */
+	Req_info = pool_shared_memory_attach(Req_info_shmid);
+
+	/* Create per loop iteration memory context */
 	WorkerMemoryContext = AllocSetContextCreate(TopMemoryContext,
                                              "Worker_main_loop",
                                              ALLOCSET_DEFAULT_MINSIZE,
diff --git a/src/tools/pgmd5/pool_config.c b/src/tools/pgmd5/pool_config.c
index 278de70..8e24ac5 100644
--- a/src/tools/pgmd5/pool_config.c
+++ b/src/tools/pgmd5/pool_config.c
@@ -1833,7 +1833,7 @@ int pool_init_config(void)
 	memset(pool_config, 0, sizeof(POOL_CONFIG));
 
 #ifndef POOL_PRIVATE
-	g_pool_config.backend_desc = pool_shared_memory_create(sizeof(BackendDesc));
+	g_pool_config.backend_desc = pool_shared_memory_create(sizeof(BackendDesc), NULL);
 	memset(g_pool_config.backend_desc, 0, sizeof(BackendDesc));
 #else
 	g_pool_config.backend_desc = palloc0(sizeof(BackendDesc));
diff --git a/src/utils/pool_shmem.c b/src/utils/pool_shmem.c
index 71c5a98..047a5fd 100644
--- a/src/utils/pool_shmem.c
+++ b/src/utils/pool_shmem.c
@@ -40,15 +40,42 @@ static void IpcMemoryDelete(int status, Datum shmId);
 
 
 /*
+ * Attach to the passed a shared memory segment.  Also,
+ * register an on_shmem_exit callback to release the storage.
+ */
+void *
+pool_shared_memory_attach(int shmid)
+{
+        void       *memAddress;
+
+        /* attach to the segment */
+        memAddress = shmat(shmid, NULL, PG_SHMAT_FLAGS);
+
+        if (memAddress == (void *) -1)
+                ereport(FATAL,
+                        (errmsg("could not attach shared memory [id:%d]",shmid),
+                                errdetail("attach to shared memory [id:%d] failed with reason: \"%s\"",shmid,strerror(errno))));
+
+
+        /* Register on-exit routine to detach new segment before deleting */
+        on_shmem_exit(IpcMemoryDetach, (Datum) memAddress);
+
+        return memAddress;
+}
+
+/*
  * Create a shared memory segment of the given size and initialize.  Also,
  * register an on_shmem_exit callback to release the storage.
  */
 void *
-pool_shared_memory_create(size_t size)
+pool_shared_memory_create(size_t size, int *o_shmid)
 {
 	int			shmid;
 	void	   *memAddress;
 
+	if (o_shmid)
+		*o_shmid = -1;
+
 	/* Try to create new segment */
 	shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | IPCProtection);
 
@@ -61,16 +88,10 @@ pool_shared_memory_create(size_t size)
 	on_shmem_exit(IpcMemoryDelete, shmid);
 
 	/* OK, should be able to attach to the segment */
-	memAddress = shmat(shmid, NULL, PG_SHMAT_FLAGS);
-
-	if (memAddress == (void *) -1)
-		ereport(FATAL,
-			(errmsg("could not create shared memory for request size: %ld",size),
-				errdetail("attach to shared memory [id:%d] failed with reason: \"%s\"",shmid,strerror(errno))));
-
+	memAddress = pool_shared_memory_attach(shmid);
 
-	/* Register on-exit routine to detach new segment before deleting */
-	on_shmem_exit(IpcMemoryDetach, (Datum) memAddress);
+	if (o_shmid)
+		*o_shmid = shmid;
 
 	return memAddress;
 }
diff --git a/src/watchdog/wd_commands.c b/src/watchdog/wd_commands.c
index 4b7ab6a..93d0f1c 100644
--- a/src/watchdog/wd_commands.c
+++ b/src/watchdog/wd_commands.c
@@ -81,13 +81,13 @@ void wd_ipc_initialize_data(void)
 				 pool_config->wd_ipc_socket_dir,
 				 pool_config->wd_port);
 
-		watchdog_ipc_address = pool_shared_memory_create(strlen(wd_ipc_sock_addr) +1);
+		watchdog_ipc_address = pool_shared_memory_create(strlen(wd_ipc_sock_addr) +1, NULL);
 		strcpy(watchdog_ipc_address, wd_ipc_sock_addr);
 	}
 
 	if (ipc_shared_key == NULL)
 	{
-		ipc_shared_key = pool_shared_memory_create(sizeof(unsigned int));
+		ipc_shared_key = pool_shared_memory_create(sizeof(unsigned int), NULL);
 		*ipc_shared_key = 0;
 		while (*ipc_shared_key == 0) {
 			pool_random_salt((char*)ipc_shared_key);
@@ -96,13 +96,13 @@ void wd_ipc_initialize_data(void)
 
 	if (watchdog_require_cleanup == NULL)
 	{
-		watchdog_require_cleanup = pool_shared_memory_create(sizeof(bool));
+		watchdog_require_cleanup = pool_shared_memory_create(sizeof(bool), NULL);
 		*watchdog_require_cleanup = false;
 	}
 
 	if (watchdog_node_escalated == NULL)
 	{
-		watchdog_node_escalated = pool_shared_memory_create(sizeof(bool));
+		watchdog_node_escalated = pool_shared_memory_create(sizeof(bool), NULL);
 		*watchdog_node_escalated = false;
 	}
 }
diff --git a/src/watchdog/wd_lifecheck.c b/src/watchdog/wd_lifecheck.c
index b34f08b..741fbdd 100644
--- a/src/watchdog/wd_lifecheck.c
+++ b/src/watchdog/wd_lifecheck.c
@@ -591,9 +591,9 @@ static void load_watchdog_nodes_from_json(char* json_data, int len)
 	}
 
 	/* okay we are done, put this in shared memory */
-	gslifeCheckCluster = pool_shared_memory_create(sizeof(LifeCheckCluster));
+	gslifeCheckCluster = pool_shared_memory_create(sizeof(LifeCheckCluster), NULL);
 	gslifeCheckCluster->nodeCount = nodeCount;
-	gslifeCheckCluster->lifeCheckNodes = pool_shared_memory_create(sizeof(LifeCheckNode) * gslifeCheckCluster->nodeCount);
+	gslifeCheckCluster->lifeCheckNodes = pool_shared_memory_create(sizeof(LifeCheckNode) * gslifeCheckCluster->nodeCount, NULL);
 	for (i = 0; i < nodeCount; i++)
 	{
 		WDNodeInfo *nodeInfo = get_WDNodeInfo_from_wd_node_json(value->u.array.values[i]);
shmem_attach.diff (15,150 bytes)

gsucameli.nr

2018-03-08 06:55

reporter   ~0001951

I have tried the current master and it seems not affected...
so maybe the issue has been fixed yet on master and it doesn't depend on shared memory...
I'm investigating further.

gsucameli.nr

2018-03-08 07:26

reporter   ~0001952

The issue can be closed as invalid.

I've built v3.7.2 from source (instead of using RPM) and the issue is there.
Then found that it has been fixed in commit 6b757807196dde (not related to shared mem at all...).

t-ishii

2018-03-14 22:14

developer   ~0001959

Thank you for the report. Yes, I think you were bitten by a bug in 3.7.2. Sorry for inconvenience.

I am going to close this issue.

Issue History

Date Modified Username Field Change
2018-03-08 03:01 gsucameli.nr New Issue
2018-03-08 03:01 gsucameli.nr File Added: shmem_attach.diff
2018-03-08 06:55 gsucameli.nr Note Added: 0001951
2018-03-08 07:26 gsucameli.nr Note Added: 0001952
2018-03-14 22:14 t-ishii Note Added: 0001959
2018-03-14 22:15 t-ishii Status new => resolved
2018-03-14 22:15 t-ishii Target Version => 3.7.3
2018-03-14 22:15 t-ishii Additional Information Updated View Revisions