View Issue Details

ID: 0000054
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2013-06-26 11:55
Reporter: arshu arora
Assigned To: nagata
Priority: high
Severity: major
Reproducibility: unable to reproduce
Status: resolved
Resolution: fixed
Platform: RHEL 5.5 x86_64
Summary: 0000054: pgpool-II semaphore lock problem
Description

Hi,

One of our customers is facing a pgpool semaphore lock problem.

(Conditions)

・Server: BL460c G7 × 3

・OS: RHEL 5.5 x64

・Streaming replication (async), master/slave configuration

・Clients: Apache, JMeter, and JDBC connections


(Phenomenon)

pgpool-II failover often does not work properly.


(Inquiry)

1.) Is there a way to decrease the frequency of the failures, for example by adjusting the kernel semaphore parameters or the pgpool configuration?


2.) The information shown below is the result of the customer's analysis.

Do you think this analysis is an adequate conclusion for this issue?


Using JMeter, the customer generated a large number of client requests over a certain period to put their pgpool system under load. Under these conditions, pgpool could handle the first failover a few times, but it hung the rest of the time. Moreover, a second failover was never executed.


The customer's requirement is that pgpool be able to fail over multiple times.


According to the customer, the processes appeared to be stuck at a semaphore lock, as shown in the backtraces below.


watchdog process

#0  0x00000037e90d4ff7 in semop () from /lib64/libc.so.6
#1  0x0000000000421b1a in pool_semaphore_lock (semNum=<value optimized out>) at pool_sema.c:128
#2  0x0000000000405cc0 in degenerate_backend_set (node_id_set=0x7fff770f3838, count=1) at main.c:1475
#3  0x00000000004740e0 in wd_child (fork_wait_time=<value optimized out>) at wd_child.c:235
#4  0x0000000000473b61 in wd_main (fork_wait_time=1) at watchdog.c:127
#5  0x0000000000407ecc in main (argc=<value optimized out>, argv=0x7fff770f9da8) at main.c:643

main process

#0  0x00000035864d4ff7 in semop () from /lib64/libc.so.6
#1  0x0000000000421b1a in pool_semaphore_lock (semNum=<value optimized out>) at pool_sema.c:128
#2  0x0000000000404e4b in failover () at main.c:1717
#3  0x0000000000404d35 in pool_sleep (second=<value optimized out>) at main.c:2501
#4  0x0000000000407715 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:846


Based on the information above, they examined the source code of the pgpool-II 3.2.1 community version.

What they found was that a semaphore lock conflict occurs between the pgpool-II main (parent) process and its child processes. With many client requests over a certain period, many active child processes are naturally spawned, which makes it likely that the parent process's failover function hangs because it repeatedly blocks at the semaphore lock.


Moreover, the watchdog process is also a child of the main process, so competition for the semaphore lock tends to become intense, which further increases the occasions on which pgpool-II hangs up.
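
For reference, pool_semaphore_lock() is built on System V semaphores, and a blocked caller sleeps inside semop() exactly as the backtraces above show. The sketch below is my stand-alone approximation of that lock/unlock pattern, not the actual pgpool source:

/*
 * Stand-alone approximation of a blocking System V semaphore lock,
 * similar in spirit to pool_semaphore_lock() (not the pgpool source).
 * A caller that finds the semaphore at 0 sleeps inside semop() until
 * the holder releases it -- the state visible in the backtraces above.
 */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };

static int semid;

static void sema_lock(int sem_num)
{
	struct sembuf op = { sem_num, -1, SEM_UNDO };	/* decrement; blocks while 0 */

	if (semop(semid, &op, 1) < 0)
		perror("semop(lock)");
}

static void sema_unlock(int sem_num)
{
	struct sembuf op = { sem_num, 1, SEM_UNDO };	/* increment; wakes a waiter */

	if (semop(semid, &op, 1) < 0)
		perror("semop(unlock)");
}

int main(void)
{
	union semun arg;

	semid = semget(IPC_PRIVATE, 1, 0600);	/* set containing one semaphore */
	if (semid < 0) { perror("semget"); return 1; }

	arg.val = 1;				/* start in the unlocked state */
	if (semctl(semid, 0, SETVAL, arg) < 0) { perror("semctl"); return 1; }

	sema_lock(0);		/* acquired; another process would now sleep in semop() */
	sema_unlock(0);

	semctl(semid, 0, IPC_RMID);	/* remove the set */
	return 0;
}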

attached "info_pack0.zip" contents

---

JDBC_TEST_EDB_select_e.jmx --- for Apache Jmeter

pgpool.conf.cop01 --- was used on test case, for detail, pls refer "pgpool_watchdog_JDBC_testresults_20130329.xlsx"

pgpool.conf.cop02 --- was used on test case, for detail, pls refer "pgpool_watchdog_JDBC_testresults_20130329.xlsx"

pgpool_DEBUG_log.zip --- debug logs when the test case was done by the customer, you will know the time reffering "pgpool_watchdog_JDBC_testresults_20130329.xlsx"


Tags: No tags attached.

Activities

arshu arora

2013-04-05 18:15

reporter  

pgpool.conf.cop01 (25,508 bytes)

arshu arora

2013-04-08 18:18

reporter  

info_pack_02.zip (470,914 bytes)

nagata

2013-05-13 18:52

developer   ~0000274

Last edited: 2013-05-13 18:59

Thank you for reporting and analysis.

I investigated the code and realized that a deadlock can occur during failover when the watchdog is enabled.

After a backend DB failure is detected, degenerate_backend_set() sends a failover request signal while holding a semaphore lock. In addition, when the watchdog is enabled, this function also sends packets to the other pgpools to propagate the failover request. However, if another pgpool does the same thing simultaneously, a deadlock occurs because each failover request has to wait for the other's lock. The responses to the packets never return, so both pgpools are blocked.

If there are many clients, the deadlock will occur more frequently because many simultaneous failover requests and locks will occur.
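
To illustrate the cycle, here is a minimal stand-alone sketch (my illustration, not pgpool code): each thread stands for one pgpool node, and each mutex stands for that node's failover semaphore. Each side takes its own lock and then needs the peer's lock before it can finish, so both block forever:

/*
 * Minimal AB-BA deadlock sketch (illustration only, not pgpool code).
 * Each thread models one pgpool node; each mutex models that node's
 * failover semaphore. A node takes its own lock, then waits for work
 * that can only complete once the peer's lock is free -- the same
 * hold-and-wait cycle described above. Build with: cc demo.c -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;	/* node A's semaphore */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;	/* node B's semaphore */

static void *node_a(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&lock_a);	/* failover path takes the local lock */
	sleep(1);			/* ... sends the failover packet to node B ... */
	pthread_mutex_lock(&lock_b);	/* B's reply needs B's lock, still held by B */
	puts("node A finished");	/* never reached */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *node_b(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&lock_b);
	sleep(1);
	pthread_mutex_lock(&lock_a);
	puts("node B finished");	/* never reached */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, node_a, NULL);
	pthread_create(&b, NULL, node_b, NULL);
	pthread_join(a, NULL);		/* blocks forever: each node waits on the other */
	pthread_join(b, NULL);
	return 0;
}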

To resolve this, I wrote a patch that returns the response packet before the failover request is signaled. Once the response is returned, the semaphore is released soon afterward, so the deadlock can be avoided.

Could you please try the attached patch?

nagata

2013-05-13 18:53

developer  

patch_wd_child_0513.diff (2,669 bytes)
diff --git a/watchdog/wd_child.c b/watchdog/wd_child.c
index e6a1486..3c661c3 100644
--- a/watchdog/wd_child.c
+++ b/watchdog/wd_child.c
@@ -37,6 +37,7 @@
 pid_t wd_child(int fork_wait_time);
 static void wd_child_exit(int exit_signo);
 static int send_response(int sock, WdPacket * recv_pack);
+static void wd_node_request_signal(WD_PACKET_NO packet_no, WdNodeInfo *node);
 
 pid_t
 wd_child(int fork_wait_time)
@@ -95,7 +96,7 @@ wd_child(int fork_wait_time)
 		rtn = wd_recv_packet(fd, &buf);
 		if (rtn == WD_OK)
 		{
-			send_response(fd,&buf);
+			send_response(fd, &buf);
 		}
 		close(fd);
 	}
@@ -125,6 +126,7 @@ send_response(int sock, WdPacket * recv_pack)
 	WdNodeInfo * node;
 	WdPacket send_packet;
 	struct timeval tv;
+	bool is_node_packet = false;
 
 	if (recv_pack == NULL)
 	{
@@ -226,19 +228,19 @@ send_response(int sock, WdPacket * recv_pack)
 		case WD_FAILBACK_REQUEST:
 			node = &(recv_pack->wd_body.wd_node_info);	
 			wd_set_node_mask(WD_FAILBACK_REQUEST,node->node_id_set,node->node_num);
-			send_failback_request(node->node_id_set[0]);
+			is_node_packet = true;
 			send_packet.packet_no = WD_NODE_READY;
 			break;
 		case WD_DEGENERATE_BACKEND:
 			node = &(recv_pack->wd_body.wd_node_info);	
 			wd_set_node_mask(WD_DEGENERATE_BACKEND,node->node_id_set, node->node_num);
-			degenerate_backend_set(node->node_id_set, node->node_num);
+			is_node_packet = true;
 			send_packet.packet_no = WD_NODE_READY;
 			break;
 		case WD_PROMOTE_BACKEND:
 			node = &(recv_pack->wd_body.wd_node_info);	
 			wd_set_node_mask(WD_PROMOTE_BACKEND,node->node_id_set, node->node_num);
-			promote_backend(node->node_id_set[0]);
+			is_node_packet = true;
 			send_packet.packet_no = WD_NODE_READY;
 			break;
 		default:
@@ -249,5 +251,34 @@ send_response(int sock, WdPacket * recv_pack)
 
 	/* send response packet */
 	rtn = wd_send_packet(sock, &send_packet);
+
+	/* Send node request signal.
+	 * wd_node_request_signal() uses a semaphore lock internally, so it should
+	 * be called after sending a response packet to prevent deadlock.
+	 */
+	if (is_node_packet)
+		wd_node_request_signal(recv_pack->packet_no, node);
+
 	return rtn;
 }
+
+/* send node request signal */
+static void
+wd_node_request_signal(WD_PACKET_NO packet_no, WdNodeInfo *node)
+{
+	switch (packet_no)
+	{
+		case WD_FAILBACK_REQUEST:
+			send_failback_request(node->node_id_set[0]);
+			break;
+		case WD_DEGENERATE_BACKEND:
+			degenerate_backend_set(node->node_id_set, node->node_num);
+			break;
+		case WD_PROMOTE_BACKEND:
+			promote_backend(node->node_id_set[0]);
+			break;
+		default:
+			pool_error("wd_node_request_signal: unknown packet number");
+			break;
+	}
+}

ahsan.hadi@enterprisedb.com

2013-05-15 16:40

reporter   ~0000282

Hi Nagata,

Thanks for submitting a fix for this issue. We have a customer who is running into it and waiting for a fix. We are in the process of verifying the fix and will update the thread once we are done. Were you able to reproduce the system hang? Did you use a tool like pgbench or JMeter (the customer is using JMeter) to generate load with many concurrent clients?

Thanks.

Muhammad Usama

2013-05-15 20:44

developer   ~0000283

Hi Nagata,

I tried to reproduce the issue but couldn't replicate the exact deadlock scenario described in the bug report. However, while trying to trigger the problem by connecting a large number of clients with the watchdog enabled, instead of a deadlock at failover the pgpool-II log gets flooded with error messages like these:

"ERROR: pid 22825: semop(id=9371663) failed: Invalid argument "
"ERROR: pid 22840: semop(id=9371663) failed: Invalid argument "
"ERROR: pid 22840: semop(id=9371663) failed: Invalid argument "
"ERROR: pid 22840: semop(id=9371663) failed: Invalid argument "
"ERROR: pid 22819: semop(id=9371663) failed: Invalid argument "
...

This error comes from pool_sema.c line 128, and setting a breakpoint in the code shows the following call stack.

(gdb) bt
#0  0xb7599250 in semop () from /lib/i386-linux-gnu/libc.so.6
#1  0x0806b9f8 in pool_semaphore_lock (semNum=0) at pool_sema.c:128
#2  0x08051b0f in connection_count_up () at child.c:1871
#3  do_accept (unix_fd=<optimized out>, inet_fd=<optimized out>, timeout=0x0) at child.c:661
#4  0x08052aff in do_child (unix_fd=4, inet_fd=5) at child.c:194
#5  0x0804e700 in fork_a_child (unix_fd=4, inet_fd=5, id=12) at main.c:1233
#6  0x0804ff40 in failover () at main.c:1991
#7  0x0804cffd in main (argc=4, argv=0xbfab5254) at main.c:769

I think the failed semop() call is the reason I am not seeing the deadlock; I would likely have hit the deadlock if the call had succeeded.
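
For what it's worth, one well-known way semop() fails with "Invalid argument" is a stale semaphore ID: once a semaphore set has been removed, any process still using the old ID gets EINVAL. Whether that is the cause here is only my assumption, not something these logs confirm. A minimal stand-alone reproduction:

/*
 * Minimal reproduction of "semop failed: Invalid argument" (an assumed
 * cause, not confirmed for this report): after a System V semaphore set
 * is removed, semop() on the stale ID fails with EINVAL.
 */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
	int semid = semget(IPC_PRIVATE, 1, 0600);	/* create a set */
	struct sembuf op = { 0, -1, SEM_UNDO };

	if (semid < 0) { perror("semget"); return 1; }
	if (semctl(semid, 0, IPC_RMID) < 0)		/* remove it again */
	{ perror("semctl"); return 1; }
	if (semop(semid, &op, 1) < 0)			/* stale ID */
		perror("semop");			/* prints: semop: Invalid argument */
	return 0;
}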

After applying your patch I performed the same steps multiple times and never got the deadlock or the semop() error.
In my opinion the patch successfully fixes this issue and should also fix the deadlock scenario described in the original report.

Thanks

nagata

2013-05-17 12:18

developer   ~0000286

Hi Ahsan and Usama,

I couldn't reproduce either the deadlock or the semop error, but I'm glad to hear that the issue is fixed.
I committed the patch to the master and V3_2_STABLE branches.

Issue History

Date Modified Username Field Change
2013-04-04 19:28 arshu arora New Issue
2013-04-04 19:28 arshu arora File Added: new.tar.gz
2013-04-05 18:15 arshu arora File Added: pgpool.conf.cop01
2013-04-06 07:49 t-ishii File Deleted: new.tar.gz
2013-04-08 18:18 arshu arora File Added: info_pack_02.zip
2013-05-12 01:04 t-ishii Assigned To => t-ishii
2013-05-12 01:04 t-ishii Status new => assigned
2013-05-12 01:04 t-ishii Assigned To t-ishii => nagata
2013-05-13 18:52 nagata Note Added: 0000274
2013-05-13 18:53 nagata File Added: patch_wd_child_0513.diff
2013-05-13 18:59 nagata Note Edited: 0000274
2013-05-15 16:40 ahsan.hadi@enterprisedb.com Note Added: 0000282
2013-05-15 20:44 Muhammad Usama Note Added: 0000283
2013-05-17 12:18 nagata Note Added: 0000286
2013-06-26 11:55 nagata Status assigned => resolved
2013-06-26 11:55 nagata Resolution open => fixed