View Issue Details

IDProjectCategoryView StatusLast Update
0000045Pgpool-IIBugpublic2013-01-16 14:01
ReporterrpashinAssigned Tot-ishii 
PriorityhighSeveritymajorReproducibilityalways
Status resolvedResolutionopen 
Platformx86OSLinuxOS VersionDebian 6.0
Product Version 
Target VersionFixed in Version 
Summary0000045: LISTEN/NOTIFY doesn' t work if cluster contains more then 1 node in streaming replication mode
DescriptionThere are two nodes in pgpool-II cluster configured in streaming replication mode.

When I open one psql console on one box and do it from one console:

postgres=# LISTEN TEST1;
LISTEN
postgres=# NOTIFY TEST1;
NOTIFY
Asynchronous notification "test1" received from server process with PID 8812.

I'm getting notification response successfully.

When I'm openinig two different psql sessions from one box, or from different boxes (it doesn't matter) and trying to do the same but from different consoles, "listening" console hangs when I'm trying to receive notification response.

Steps To ReproduceAdd two nodes in cluster. Configure streaming replication between them.

On host1:

host1$ psql postgres
postgres=# LISTEN TEST1;
LISTEN
postgres=#

On host2:

host2$ psql postgres
postgres=# NOTIFY TEST1;
NOTIFY
postgres=#

then on host1:
postgres=# LISTEN TEST1;
...
that's it. It hangs forever after this command.
Additional InformationWhen both nodes are up and with running replication, if I disconnect first node (id=0), secondary node become primary node and if it's the only one alive node in cluster => LISTEN/NOTIFY works as expected. Until I return first node (id=0) back in cluster.
When first node is back - listening console is about to hang again.
TagsNo tags attached.

Activities

rpashin

2012-12-12 02:34

reporter   ~0000178

Sorry, forgot to say that it's last recent 3.2.1 version.

t-ishii

2012-12-12 11:49

developer   ~0000179

Confirmed the problem. Thinking about how to fix it...

t-ishii

2012-12-13 16:40

developer   ~0000185

Ok, I think I fixed the problem. Could you please try attached patches?

t-ishii

2012-12-13 16:41

developer  

listen-notify.patch (3,080 bytes)
diff --git a/pool_process_query.c b/pool_process_query.c
index 69e8eca..22f080b 100644
--- a/pool_process_query.c
+++ b/pool_process_query.c
@@ -1075,28 +1075,36 @@ POOL_STATUS SimpleForwardToFrontend(char kind, POOL_CONNECTION *frontend,
 	}
 	memcpy(p1, p, len);
 
-	for (i=0;i<NUM_BACKENDS;i++)
+	/*
+	 * If we received a notification message in master/slave mode,
+	 * other backends will not receive the message.
+	 * So we should skip other nodes otherwise we will hung in pool_read.
+	 */
+	if (!MASTER_SLAVE || kind != 'A')
 	{
-		if (VALID_BACKEND(i) && !IS_MASTER_NODE_ID(i))
+		for (i=0;i<NUM_BACKENDS;i++)
 		{
-			status = pool_read(CONNECTION(backend, i), &len, sizeof(len));
-			if (status < 0)
+			if (VALID_BACKEND(i) && !IS_MASTER_NODE_ID(i))
 			{
-				pool_error("SimpleForwardToFrontend: error while reading message length");
-				return POOL_END;
-			}
+				status = pool_read(CONNECTION(backend, i), &len, sizeof(len));
+				if (status < 0)
+				{
+					pool_error("SimpleForwardToFrontend: error while reading message length");
+					return POOL_END;
+				}
 
-			len = ntohl(len);
-			len -= 4;
+				len = ntohl(len);
+				len -= 4;
 
-			p = pool_read2(CONNECTION(backend, i), len);
-			if (p == NULL)
-				return POOL_END;
+				p = pool_read2(CONNECTION(backend, i), len);
+				if (p == NULL)
+					return POOL_END;
 
-			if (len != len1)
-			{
-				pool_debug("SimpleForwardToFrontend: length does not match between backends master(%d) %d th backend(%d) kind:(%c)",
- 						   len, i, len1, kind);
+				if (len != len1)
+				{
+					pool_debug("SimpleForwardToFrontend: length does not match between backends master(%d) %d th backend(%d) kind:(%c)",
+							   len, i, len1, kind);
+				}
 			}
 		}
 	}
@@ -3561,6 +3569,7 @@ POOL_STATUS read_kind_from_backend(POOL_CONNECTION *frontend, POOL_CONNECTION_PO
 	double max_count = 0;
 	int degenerate_node_num = 0;		/* number of backends degeneration requested */
 	int degenerate_node[MAX_NUM_BACKENDS];		/* degeneration requested backend list */
+	POOL_STATUS status;
 
 	POOL_MEMORY_POOL *old_context;
 
@@ -3572,6 +3581,31 @@ POOL_STATUS read_kind_from_backend(POOL_CONNECTION *frontend, POOL_CONNECTION_PO
 
 	memset(kind_map, 0, sizeof(kind_map));
 
+	if (MASTER_SLAVE)
+	{
+		status = read_kind_from_one_backend(frontend, backend, (char *)&kind, MASTER_NODE_ID);
+		if (status != POOL_CONTINUE)
+		{
+			pool_error("read_kind_from_backend: read_kind_from_one_backend for master node %d failed",
+					   MASTER_NODE_ID);
+			return status;
+		}
+
+		/*
+		 * If we received a notification message in master/slave mode,
+		 * other backends will not receive the message.
+		 * So we should skip other nodes otherwise we will hung in pool_read.
+		 */
+		if (kind == 'A')	
+		{
+			*decided_kind = 'A';
+			pool_log("read_kind_from_backend: received notification message for master node %d",
+					 MASTER_NODE_ID);
+			return POOL_CONTINUE;
+		}
+		pool_unread(CONNECTION(backend, MASTER_NODE_ID), &kind, sizeof(kind));
+	}
+
 	for (i=0;i<NUM_BACKENDS;i++)
 	{
 		/* initialize degenerate record */
listen-notify.patch (3,080 bytes)

rpashin

2012-12-14 22:16

reporter   ~0000186

Tatsuo,
thank you very much! After patching LISTEN/NOTIFY works in our staging environment. In the beginning of next week we'll put it into production. I'll let you know about the result.

rpashin

2012-12-27 01:03

reporter   ~0000207

Looks like something is still not working properly. Probably I was mistaken somewhere in my experiments in staging environment when I was saying that it works.

After patch, second (listening) psql console is not hanging up after receiving notification. It do continue functioning, but there is not even sign of received notification. I.e. instead of getting this:

 postgres=# LISTEN TEST1;
 LISTEN
 Asynchronous notification "test1" received from server process with PID 28610.
 postgres=#

I see only this:

 postgres=# LISTEN TEST1;
 LISTEN
 postgres=#


And there are two more things I wanted to notice:

1. I have 2 sql servers in cluster UP and running and master is second one, i.e. primary server = 1, slave = 0

When I'm sending notification from one psql console to another, I see in logs this:

Dec 26 07:24:18 ast-pg-test-1 pgpool[2857]: statement: NOTIFY TEST1;
Dec 26 07:24:18 ast-pg-test-1 pgpool[2857]: DB node id: 1 backend pid: 28838 statement: NOTIFY TEST1;
Dec 26 07:24:18 ast-pg-test-1 pgpool[2856]: pool_process_query: discard A packet from backend 1
Dec 26 07:24:20 ast-pg-test-1 pgpool[2856]: statement: LISTEN TEST1;
Dec 26 07:24:20 ast-pg-test-1 pgpool[2856]: DB node id: 1 backend pid: 28836 statement: LISTEN TEST1;

Looks like sql is sending A packet, but pgpool is discarding it because it consider sending node as "not master" while it

is really.


2. Conditions the same. 2 SQL nodes Up and running. Mester id=1, slave=0.

I connect 2 psql sessions to pgpool. Since I use streaming replication I suppose that requestst from both psql consoles

will be forwarded to second SQL server (id=1) which is master in replication. In this case, if I disconnect first node

(id=0) I should not even notice this in console. However if I disconnect first node from cluster via web-interface when

consoles are connected, I see message that 'connection to the server was lost':

 postgres=# LISTEN TEST1;
 LISTEN
 postgres=# LISTEN TEST1;
 LISTEN
 postgres=#

 == Here I disconnect first node ==

 postgres=# LISTEN TEST1;
 The connection to the server was lost. Attempting reset: Succeeded.
 postgres=# LISTEN TEST1;
 LISTEN

It seems to me a bit strange, but probably I just misunderstanding something.

Please tell me if I can provide any other information to help troubleshooting this.

t-ishii

2012-12-27 12:23

developer   ~0000208

> After patch, second (listening) psql console is not hanging up after receiving notification. It do continue functioning, but there is not even sign of received notification.

I confirmed this part happens only when primary is 1 and standby is 0. It seems there are some places where pgpool assumes that node 0 is primary. Will fix.

t-ishii

2012-12-30 21:00

developer   ~0000209

Ok, attached (listen_notify-v2.patch) is a revised patch. Please try it.

t-ishii

2012-12-30 21:02

developer  

listen_notify-v2.patch (11,364 bytes)
diff --git a/pool.h b/pool.h
index d4830d6..56458ee 100644
--- a/pool.h
+++ b/pool.h
@@ -164,6 +164,9 @@ typedef struct {
 	char *buf2;	/* buffer for pool_read2 */
 	int bufsz2;	/* its size in bytes */
 
+	char *buf3;	/* buffer for pool_push/pop */
+	int bufsz3;	/* its size in bytes */
+
 	int isbackend;		/* this connection is for backend if non 0 */
 	int db_node_id;		/* DB node id for this connection */
 
diff --git a/pool_process_query.c b/pool_process_query.c
index 69e8eca..1dfce5f 100644
--- a/pool_process_query.c
+++ b/pool_process_query.c
@@ -176,6 +176,20 @@ POOL_STATUS pool_process_query(POOL_CONNECTION *frontend,
 		}
 
 		/*
+		 * If we are not processing a query, now is the time to
+		 * extract retrieve pending data from buffer stack if any.
+		 */
+		if (!pool_is_query_in_progress())
+		{
+			for (i=0;i<NUM_BACKENDS;i++)
+			{
+				int plen;
+				if (VALID_BACKEND(i) && pool_stacklen(CONNECTION(backend, i)) > 0)
+					pool_pop(CONNECTION(backend, i), &plen);
+			}
+		}
+
+		/*
 		 * If we are prcessing query, process it.
 		 */
 		if (pool_is_query_in_progress())
@@ -291,30 +305,52 @@ POOL_STATUS pool_process_query(POOL_CONNECTION *frontend,
 										pool_error("pool_process_query: error while reading message kind from backend %d", i);
 										return POOL_ERROR;
 									}
-									pool_log("pool_process_query: discard %c packet from backend %d", kind, i);
 
-									if (MAJOR(backend) == PROTO_MAJOR_V3)
+									if (kind == 'A')
 									{
-										if (pool_read(CONNECTION(backend, i), &len, sizeof(len)) < 0)
+										/*
+										 * In replication mode, NOTIFY is sent to all backends.
+										 * However the order of arrival of 'Notification response'
+										 * is not necessarily the master first and then slaves.
+										 * So if it arrives slave first, we should try to read from master,
+										 * rather than just discard it.
+										 */
+										pool_unread(CONNECTION(backend, i), &kind, sizeof(kind));
+										pool_log("pool_process_query: received %c packet from backend %d. Don't dicard and read %c packet from master", kind, i, kind);
+										if (pool_read(CONNECTION(backend, MASTER_NODE_ID), &kind, sizeof(kind)) < 0)
 										{
-											pool_error("pool_process_query: error while reading message length from backend %d", i);
-											return POOL_ERROR;
-										}
-										len = ntohl(len) - 4;
-										string = pool_read2(CONNECTION(backend, i), len);
-										if (string == NULL)
-										{
-											pool_error("pool_process_query: error while reading rest of message from backend %d", i);
+											pool_error("pool_process_query: error while reading message kind from backend %d", MASTER_NODE_ID);
 											return POOL_ERROR;
 										}
+										pool_unread(CONNECTION(backend, MASTER_NODE_ID), &kind, sizeof(kind));
 									}
 									else
 									{
-										string = pool_read_string(CONNECTION(backend, i), &len, 0);
-										if (string == NULL)
+										pool_log("pool_process_query: discard %c packet from backend %d", kind, i);
+
+										if (MAJOR(backend) == PROTO_MAJOR_V3)
 										{
-											pool_error("pool_process_query: error while reading rest of message from backend %d", i);
-											return POOL_ERROR;
+											if (pool_read(CONNECTION(backend, i), &len, sizeof(len)) < 0)
+											{
+												pool_error("pool_process_query: error while reading message length from backend %d", i);
+												return POOL_ERROR;
+											}
+											len = ntohl(len) - 4;
+											string = pool_read2(CONNECTION(backend, i), len);
+											if (string == NULL)
+											{
+												pool_error("pool_process_query: error while reading rest of message from backend %d", i);
+												return POOL_ERROR;
+											}
+										}
+										else
+										{
+											string = pool_read_string(CONNECTION(backend, i), &len, 0);
+											if (string == NULL)
+											{
+												pool_error("pool_process_query: error while reading rest of message from backend %d", i);
+												return POOL_ERROR;
+											}
 										}
 									}
 								}
@@ -1075,28 +1111,36 @@ POOL_STATUS SimpleForwardToFrontend(char kind, POOL_CONNECTION *frontend,
 	}
 	memcpy(p1, p, len);
 
-	for (i=0;i<NUM_BACKENDS;i++)
+	/*
+	 * If we received a notification message in master/slave mode,
+	 * other backends will not receive the message.
+	 * So we should skip other nodes otherwise we will hung in pool_read.
+	 */
+	if (!MASTER_SLAVE || kind != 'A')
 	{
-		if (VALID_BACKEND(i) && !IS_MASTER_NODE_ID(i))
+		for (i=0;i<NUM_BACKENDS;i++)
 		{
-			status = pool_read(CONNECTION(backend, i), &len, sizeof(len));
-			if (status < 0)
+			if (VALID_BACKEND(i) && !IS_MASTER_NODE_ID(i))
 			{
-				pool_error("SimpleForwardToFrontend: error while reading message length");
-				return POOL_END;
-			}
+				status = pool_read(CONNECTION(backend, i), &len, sizeof(len));
+				if (status < 0)
+				{
+					pool_error("SimpleForwardToFrontend: error while reading message length");
+					return POOL_END;
+				}
 
-			len = ntohl(len);
-			len -= 4;
+				len = ntohl(len);
+				len -= 4;
 
-			p = pool_read2(CONNECTION(backend, i), len);
-			if (p == NULL)
-				return POOL_END;
+				p = pool_read2(CONNECTION(backend, i), len);
+				if (p == NULL)
+					return POOL_END;
 
-			if (len != len1)
-			{
-				pool_debug("SimpleForwardToFrontend: length does not match between backends master(%d) %d th backend(%d) kind:(%c)",
- 						   len, i, len1, kind);
+				if (len != len1)
+				{
+					pool_debug("SimpleForwardToFrontend: length does not match between backends master(%d) %d th backend(%d) kind:(%c)",
+							   len, i, len1, kind);
+				}
 			}
 		}
 	}
@@ -1823,11 +1867,12 @@ POOL_STATUS do_command(POOL_CONNECTION *frontend, POOL_CONNECTION *backend,
 				pool_error("do_command: error while reading message length");
 				return POOL_END;
 			}
+			pool_debug("len:%x", len);
 			len = ntohl(len) - 4;
-			
-			if (kind != 'N' && kind != 'E' && kind != 'S' && kind != 'C')
+
+			if (kind != 'N' && kind != 'E' && kind != 'S' && kind != 'C' && kind != 'A')
 			{
-				pool_error("do_command: error, kind is not N, E, S or C(%02x)", kind);
+				pool_error("do_command: error, kind is not N, E, S, C or A(%02x)", kind);
 				return POOL_END;
 			}
 			string = pool_read2(backend, len);
@@ -1836,6 +1881,22 @@ POOL_STATUS do_command(POOL_CONNECTION *frontend, POOL_CONNECTION *backend,
 				pool_error("do_command: error while reading rest of message");
 				return POOL_END;
 			}
+
+			/*
+			 * It is possible that we receives a notification response
+			 * ('A') from one of backends prior to "ready for query"
+			 * response if LISTEN and NOTIFY are issued in a same
+			 * connection. So we need to save notification response to
+			 * stack buffer so that we could retrieve it later on.
+			 */
+			if (kind == 'A')
+			{
+				int nlen = htonl(len+4);
+				pool_debug("nlen:%x", nlen);
+				pool_push(backend, &kind, sizeof(kind));
+				pool_push(backend, &nlen, sizeof(nlen));
+				pool_push(backend, string, len);
+			}
 		}
 		else
 		{
@@ -3561,6 +3622,7 @@ POOL_STATUS read_kind_from_backend(POOL_CONNECTION *frontend, POOL_CONNECTION_PO
 	double max_count = 0;
 	int degenerate_node_num = 0;		/* number of backends degeneration requested */
 	int degenerate_node[MAX_NUM_BACKENDS];		/* degeneration requested backend list */
+	POOL_STATUS status;
 
 	POOL_MEMORY_POOL *old_context;
 
@@ -3572,6 +3634,31 @@ POOL_STATUS read_kind_from_backend(POOL_CONNECTION *frontend, POOL_CONNECTION_PO
 
 	memset(kind_map, 0, sizeof(kind_map));
 
+	if (MASTER_SLAVE)
+	{
+		status = read_kind_from_one_backend(frontend, backend, (char *)&kind, MASTER_NODE_ID);
+		if (status != POOL_CONTINUE)
+		{
+			pool_error("read_kind_from_backend: read_kind_from_one_backend for master node %d failed",
+					   MASTER_NODE_ID);
+			return status;
+		}
+
+		/*
+		 * If we received a notification message in master/slave mode,
+		 * other backends will not receive the message.
+		 * So we should skip other nodes otherwise we will hung in pool_read.
+		 */
+		if (kind == 'A')	
+		{
+			*decided_kind = 'A';
+			pool_log("read_kind_from_backend: received notification message for master node %d",
+					 MASTER_NODE_ID);
+			return POOL_CONTINUE;
+		}
+		pool_unread(CONNECTION(backend, MASTER_NODE_ID), &kind, sizeof(kind));
+	}
+
 	for (i=0;i<NUM_BACKENDS;i++)
 	{
 		/* initialize degenerate record */
diff --git a/pool_query_context.c b/pool_query_context.c
index 6dbe397..c2ae66b 100644
--- a/pool_query_context.c
+++ b/pool_query_context.c
@@ -278,8 +278,14 @@ int pool_virtual_master_db_node_id(void)
 	}
 
 	/*
-	 * No query context exists. Returns master node id in private buffer.
+	 * No query context exists.  If in master/slave mode, returns
+	 * primary node if exists.  Oterwise returns my_master_node_id,
+	 * which represents the last REAL_MASTER_NODE_ID.
 	 */
+	if (MASTER_SLAVE)
+	{
+		return PRIMARY_NODE_ID;
+	}
 	return my_master_node_id;
 }
 
diff --git a/pool_stream.c b/pool_stream.c
index 92cc590..083619d 100644
--- a/pool_stream.c
+++ b/pool_stream.c
@@ -843,6 +843,66 @@ int pool_unread(POOL_CONNECTION *cp, void *data, int len)
 }
 
 /*
+ * pool_push: Push data into buffer stack.
+ */
+int pool_push(POOL_CONNECTION *cp, void *data, int len)
+{
+	char *p;
+
+	pool_debug("pool_push: len: %d", len);
+
+	if (cp->bufsz3 == 0)
+	{
+		p = cp->buf3 = malloc(len);
+		if (p == NULL)
+		{
+			pool_error("pool_push: malloc failed. len:%d", len);
+			return -1;
+		}
+	}
+	else
+	{
+		p = cp->buf3 + cp->bufsz3;
+		cp->buf3 = realloc(cp->buf3, cp->bufsz3 + len);
+	}
+
+	memcpy(p, data, len);
+	cp->bufsz3 += len;
+
+	return 0;
+}
+
+/*
+ * pool_pop: Pop data from buffer stack and put back data using
+ * pool_unread.
+ */
+void pool_pop(POOL_CONNECTION *cp, int *len)
+{
+	if (cp->bufsz3 == 0)
+	{
+		*len = 0;
+		pool_debug("pool_pop: len: %d", *len);
+		return;
+	}
+
+	pool_unread(cp, cp->buf3, cp->bufsz3);
+	*len = cp->bufsz3;
+	free(cp->buf3);
+	cp->bufsz3 = 0;
+	cp->buf3 = NULL;
+	pool_debug("pool_pop: len: %d", *len);
+}
+
+/*
+ * pool_stacklen: Returns buffer stack length
+ * pool_unread.
+ */
+int pool_stacklen(POOL_CONNECTION *cp)
+{
+	return cp->bufsz3;
+}
+
+/*
  * set non-block flag
  */
 void pool_set_nonblock(int fd)
diff --git a/pool_stream.h b/pool_stream.h
index dedd5a1..188c1aa 100644
--- a/pool_stream.h
+++ b/pool_stream.h
@@ -6,7 +6,7 @@
  * pgpool: a language independent connection pool server for PostgreSQL 
  * written by Tatsuo Ishii
  *
- * Copyright (c) 2003-2010	PgPool Global Development Group
+ * Copyright (c) 2003-2012	PgPool Global Development Group
  *
  * Permission to use, copy, modify, and distribute this software and
  * its documentation for any purpose and without fee is hereby
@@ -52,6 +52,9 @@ extern int pool_flush_it(POOL_CONNECTION *cp);
 extern int pool_write_and_flush(POOL_CONNECTION *cp, void *buf, int len);
 extern char *pool_read_string(POOL_CONNECTION *cp, int *len, int line);
 extern int pool_unread(POOL_CONNECTION *cp, void *data, int len);
+extern int pool_push(POOL_CONNECTION *cp, void *data, int len);
+extern void pool_pop(POOL_CONNECTION *cp, int *len);
+extern int pool_stacklen(POOL_CONNECTION *cp);
 extern void pool_set_nonblock(int fd);
 extern void pool_unset_nonblock(int fd);
 
listen_notify-v2.patch (11,364 bytes)

rpashin

2013-01-11 14:05

reporter   ~0000211

Last edited: 2013-01-11 14:06

View 2 revisions

Sorry for delay, we were having NY holidays.
I checked listen_notify-v2.patch in staging environment and looks like it works as expected after patch. I'll do a couple of additional tests and will reply shortly to confirm that everything works.

rpashin

2013-01-15 16:41

reporter   ~0000212

Tatsuo,

looks like last recent patch (listen_notify-v2.patch) completely resolved this issue.

Bug can be considered as resolved with this patch.

Thank you!

t-ishii

2013-01-16 13:25

developer   ~0000214

Thanks! I have committed the patch to master and V3.2 stable tree.

Issue History

Date Modified Username Field Change
2012-12-12 00:09 rpashin New Issue
2012-12-12 02:34 rpashin Note Added: 0000178
2012-12-12 11:49 t-ishii Note Added: 0000179
2012-12-12 11:49 t-ishii Assigned To => t-ishii
2012-12-12 11:49 t-ishii Status new => assigned
2012-12-13 16:40 t-ishii Note Added: 0000185
2012-12-13 16:41 t-ishii File Added: listen-notify.patch
2012-12-14 22:16 rpashin Note Added: 0000186
2012-12-27 01:03 rpashin Note Added: 0000207
2012-12-27 12:23 t-ishii Note Added: 0000208
2012-12-30 21:00 t-ishii Note Added: 0000209
2012-12-30 21:02 t-ishii File Added: listen_notify-v2.patch
2013-01-08 09:46 t-ishii Status assigned => feedback
2013-01-11 14:05 rpashin Note Added: 0000211
2013-01-11 14:05 rpashin Status feedback => assigned
2013-01-11 14:06 rpashin Note Edited: 0000211 View Revisions
2013-01-15 16:41 rpashin Note Added: 0000212
2013-01-16 13:25 t-ishii Note Added: 0000214
2013-01-16 13:25 t-ishii Status assigned => resolved
2013-01-16 14:01 t-ishii Changeset attached => pgpool2 master dc5b7ccc