View Issue Details

IDProjectCategoryView StatusLast Update
0000303Pgpool-IIBugpublic2017-08-03 14:44
ReporternagataAssigned Tot-ishii 
PrioritynormalSeverityminorReproducibilityunable to reproduce
Status resolvedResolutionopen 
Product Version 
Target VersionFixed in Version 
Summary0000303: db_node_is is not properly initialized
DescriptionPOOL_CONNECTION has a member named db_node_id that is DB node id for this connection. This value is used to output log or degenerate the backend in pool_read(), for example.


 221 /* if fail_over_on_backend_error is true, then trigger failover */
 222 if (pool_config->fail_over_on_backend_error)
 223 {
 224 notice_backend_error(cp->db_node_id, true);
 225
 226 /* If we are in the main process, we will not exit */
 227 child_exit(POOL_EXIT_AND_RESTART);
 228 ereport(ERROR,
 229 (errmsg("unable to read data from DB node %d",cp->db_node_id),
 230 errdetail("socket read failed with an error \"%s\"", strerror(errno))));
 231 }

However, db_node_id is initialized only in connect_backend() which is called only by child processes. In other processes the value is not initialized and always 0.

So, when pool_read is called from the parent process or the worker process during health check or streaming-replication delay check, and emits an error in the above code, log message output is as following, no matter which backend it tried to read from.

 ERROR: unable to read data from DB node 0
                            ~~~~~~~~~~~~~~~
To make the matter worse, if it is the worker process, backend 0 is degenerated.
(If it is the parent process, notice_backend_error() do nothing.)

The resolution is to initialize properly even in the parent process or the worker process.

I attached the patch to fix it. What do you think about this?
TagsNo tags attached.

Activities

nagata

2017-04-19 17:05

developer  

db_node_id.patch (7,255 bytes)
diff --git a/src/include/pool.h b/src/include/pool.h
index 40f425d..b21d4ae 100644
--- a/src/include/pool.h
+++ b/src/include/pool.h
@@ -621,9 +621,9 @@ extern POOL_STATUS OneNode_do_command(POOL_CONNECTION *frontend, POOL_CONNECTION
 
 /* child.c */
 extern POOL_CONNECTION_POOL_SLOT *make_persistent_db_connection(
-	char *hostname, int port, char *dbname, char *user, char *password, bool retry);
+	int db_node_id, char *hostname, int port, char *dbname, char *user, char *password, bool retry);
 extern POOL_CONNECTION_POOL_SLOT *make_persistent_db_connection_noerror(
-    char *hostname, int port, char *dbname, char *user, char *password, bool retry);
+    int db_node_id, char *hostname, int port, char *dbname, char *user, char *password, bool retry);
 extern void discard_persistent_db_connection(POOL_CONNECTION_POOL_SLOT *cp);
 
 /* define pool_system.c */
diff --git a/src/include/utils/pool_stream.h b/src/include/utils/pool_stream.h
index 5db8372..3518761 100644
--- a/src/include/utils/pool_stream.h
+++ b/src/include/utils/pool_stream.h
@@ -42,7 +42,7 @@
        (connection)->len = 0; \
     } while (0)
 
-extern POOL_CONNECTION *pool_open(int fd, bool backend_connection);
+extern POOL_CONNECTION *pool_open(int fd, bool backend_connection, int db_node_id);
 extern void pool_close(POOL_CONNECTION *cp);
 extern int pool_read(POOL_CONNECTION *cp, void *buf, int len);
 extern void pool_read_with_error(POOL_CONNECTION *cp, void *buf, int len,
diff --git a/src/main/pgpool_main.c b/src/main/pgpool_main.c
index 5983c1d..68acb93 100644
--- a/src/main/pgpool_main.c
+++ b/src/main/pgpool_main.c
@@ -2290,7 +2290,8 @@ do_health_check(bool use_template_db, volatile int *health_check_node_id)
 		ereport(DEBUG1,
 			(errmsg("Trying to make persistent DB connection to backend node %d having status %d", i, bkinfo->backend_status)));
 
-		slot = make_persistent_db_connection(bkinfo->backend_hostname,
+		slot = make_persistent_db_connection(*health_check_node_id,
+											 bkinfo->backend_hostname,
 											 bkinfo->backend_port,
 											 dbname,
 											 pool_config->health_check_user,
@@ -2883,7 +2884,8 @@ static bool
 
 	*is_standby = false;
 
-	s = make_persistent_db_connection_noerror(bkinfo->backend_hostname,
+	s = make_persistent_db_connection_noerror(backend_no,
+										  bkinfo->backend_hostname,
 										  bkinfo->backend_port,
 										  pool_config->sr_check_database,
 										  pool_config->sr_check_user,
diff --git a/src/protocol/child.c b/src/protocol/child.c
index a0f8c17..6953918 100644
--- a/src/protocol/child.c
+++ b/src/protocol/child.c
@@ -793,7 +793,7 @@ void cancel_request(CancelPacket *sp)
 			return;
 		}
 
-		con = pool_open(fd,true);
+		con = pool_open(fd, true, i);
 		if (con == NULL)
 			return;
 
@@ -878,13 +878,6 @@ static POOL_CONNECTION_POOL *connect_backend(StartupPacket *sp, POOL_CONNECTION
 		{
 			if (VALID_BACKEND(i))
 			{
-
-				/* set DB node id */
-				CONNECTION(backend, i)->db_node_id = i;
-
-				/* mark this is a backend connection */
-				CONNECTION(backend, i)->isbackend = 1;
-
 				pool_ssl_negotiate_clientserver(CONNECTION(backend, i));
 
 				/*
@@ -1177,7 +1170,7 @@ void child_exit(int code)
  * create a persistent connection
  */
 POOL_CONNECTION_POOL_SLOT *make_persistent_db_connection(
-	char *hostname, int port, char *dbname, char *user, char *password, bool retry)
+	int db_node_id, char *hostname, int port, char *dbname, char *user, char *password, bool retry)
 {
 	POOL_CONNECTION_POOL_SLOT *cp;
 	int fd;
@@ -1218,9 +1211,8 @@ POOL_CONNECTION_POOL_SLOT *make_persistent_db_connection(
                  errdetail("connection to host:\"%s:%d\" failed", hostname, port)));
 	}
 
-	cp->con = pool_open(fd,true);
+	cp->con = pool_open(fd, true, db_node_id);
 	cp->closetime = 0;
-	cp->con->isbackend = 1;
 	pool_ssl_negotiate_clientserver(cp->con);
 
 	/*
@@ -1297,13 +1289,14 @@ POOL_CONNECTION_POOL_SLOT *make_persistent_db_connection(
  * make_persistent_db_connection() which does not ereports in case of an error
  */
 POOL_CONNECTION_POOL_SLOT *make_persistent_db_connection_noerror(
-                        char *hostname, int port, char *dbname, char *user, char *password, bool retry)
+                        int db_node_id, char *hostname, int port, char *dbname, char *user, char *password, bool retry)
 {
     POOL_CONNECTION_POOL_SLOT *slot = NULL;
     MemoryContext oldContext = CurrentMemoryContext;
     PG_TRY();
     {
-        slot = make_persistent_db_connection(hostname,
+        slot = make_persistent_db_connection(db_node_id,
+                                                 hostname,
                                                  port,
                                                  dbname,
                                                  user,
@@ -2106,7 +2099,7 @@ static void validate_backend_connectivity(int front_end_fd)
 			 */
 			PG_TRY();
 			{
-				if ((cp = pool_open(front_end_fd,false)) == NULL)
+				if ((cp = pool_open(front_end_fd, false, -1)) == NULL)
 				{
 					close(front_end_fd);
 					child_exit(POOL_EXIT_AND_RESTART);
@@ -2184,7 +2177,7 @@ get_connection(int front_end_fd, SockAddr *saddr)
 
 	}
 
-	if ((cp = pool_open(front_end_fd,false)) == NULL)
+	if ((cp = pool_open(front_end_fd, false, -1)) == NULL)
 	{
 		close(front_end_fd);
         ereport(ERROR,
diff --git a/src/protocol/pool_connection_pool.c b/src/protocol/pool_connection_pool.c
index 289cec4..d97c2e8 100644
--- a/src/protocol/pool_connection_pool.c
+++ b/src/protocol/pool_connection_pool.c
@@ -804,7 +804,7 @@ static POOL_CONNECTION_POOL_SLOT *create_cp(POOL_CONNECTION_POOL_SLOT *cp, int s
 		return NULL;
 
 	cp->sp = NULL;
-	cp->con = pool_open(fd,true);
+	cp->con = pool_open(fd, true, slot);
 	cp->closetime = 0;
 	return cp;
 }
diff --git a/src/streaming_replication/pool_worker_child.c b/src/streaming_replication/pool_worker_child.c
index 824c0f5..3cff6e1 100644
--- a/src/streaming_replication/pool_worker_child.c
+++ b/src/streaming_replication/pool_worker_child.c
@@ -198,7 +198,8 @@ static void establish_persistent_connection(void)
 		if (slots[i] == NULL)
 		{
                 bkinfo = pool_get_node_info(i);
-                slots[i] = make_persistent_db_connection_noerror(bkinfo->backend_hostname,
+                slots[i] = make_persistent_db_connection_noerror(i,
+											  bkinfo->backend_hostname,
 											  bkinfo->backend_port,
 											  pool_config->sr_check_database,
 											  pool_config->sr_check_user,
diff --git a/src/utils/pool_stream.c b/src/utils/pool_stream.c
index 30d9da4..6bb7644 100644
--- a/src/utils/pool_stream.c
+++ b/src/utils/pool_stream.c
@@ -72,7 +72,7 @@ static MemoryContext
 * open read/write file descriptors.
 * returns POOL_CONNECTION on success otherwise NULL.
 */
-POOL_CONNECTION *pool_open(int fd, bool backend_connection)
+POOL_CONNECTION *pool_open(int fd, bool backend_connection, int db_node_id)
 {
 	POOL_CONNECTION *cp;
 
@@ -100,6 +100,9 @@ POOL_CONNECTION *pool_open(int fd, bool backend_connection)
 	cp->fd = fd;
 	cp->socket_state = POOL_SOCKET_VALID;
 
+	cp->isbackend = backend_connection ? 1 : 0;
+	cp->db_node_id = backend_connection ? db_node_id : -1;
+
     MemoryContextSwitchTo(oldContext);
 
 	return cp;
db_node_id.patch (7,255 bytes)

t-ishii

2017-04-20 10:57

developer   ~0001435

> To make the matter worse, if it is the worker process, backend 0 is degenerated.

Are you sure?

Replication delay check worker does not trigger in this case.
Typical log is something like:
> 2017-04-20 10:47:13: pid 22543: LOG: failed to connect to PostgreSQL server on "133.137.175.105:11003", getsockopt() detected error "Connection refused"
and pool_read is not called, thus does not trigger failover.

nagata

2017-04-20 13:06

developer   ~0001436

Last edited: 2017-04-20 13:13

View 3 revisions

I've not reproduced it, so the mechanism is my guess from the source code.
However, certainly the log provided from our client shows that

 2017-04-12 01:26:18: pid 7548: LOG: fork a new worker child process with pid: 12603
 ...
 2017-04-12 01:38:18: pid 12603: LOG: received degenerate backend request for node_id: 0 from pid [12603]
 ...
 2017-04-12 01:38:18: pid 12603: WARNING: child_exit: called from invalid process. ignored.
 2017-04-12 01:38:18: pid 12603: ERROR: unable to read data from DB node 0
 2017-04-12 01:38:18: pid 12603: DETAIL: socket read failed with an error "Connection reset by peer"

pool_read() is called during authentication in make_persistent_db_connection()
do_query() also calls pool_read()

t-ishii

2017-04-20 14:15

developer   ~0001437

Ok.

Regarding your patch, I don't like to modify pool_open(). I recommend leave it as it is. More members may be added to struct POOOL_CONNECTION in the future. We don't want to change pool_open() signature to set such new properties. We should add values for struct members such as is_backend and db_node_id, after calling pool_open().

nagata

2017-04-20 15:54

developer   ~0001438

Why don't initialize the new member in pool_open() even when a new member is added to POOL_CONNECTION? I think it is better to initialize the structure's members as early stage as possible to avoid use uninitialized variable.

t-ishii

2017-04-20 16:03

developer   ~0001439

As I said, that way will bring big maintenance headache in the future.

nagata

2017-04-20 16:17

developer   ~0001440

But, if we add a new member, anyway we have to maintenance this including initialization. So, is it easier to handle it in a single function like pool_open() than doing it different places?

nagata

2017-04-26 13:54

developer   ~0001475

Am I missing something?

t-ishii

2017-04-27 07:57

developer   ~0001480

Go for it.

t-ishii

2017-08-03 12:03

developer   ~0001619

Since there was no progress since April 2017, I took over this and committed/pushed fix.

Issue History

Date Modified Username Field Change
2017-04-19 17:05 nagata New Issue
2017-04-19 17:05 nagata File Added: db_node_id.patch
2017-04-20 10:57 t-ishii Note Added: 0001435
2017-04-20 13:06 nagata Note Added: 0001436
2017-04-20 13:07 nagata Note Edited: 0001436 View Revisions
2017-04-20 13:13 nagata Note Edited: 0001436 View Revisions
2017-04-20 14:15 t-ishii Note Added: 0001437
2017-04-20 15:54 nagata Note Added: 0001438
2017-04-20 16:03 t-ishii Note Added: 0001439
2017-04-20 16:17 nagata Note Added: 0001440
2017-04-26 13:54 nagata Note Added: 0001475
2017-04-27 07:57 t-ishii Note Added: 0001480
2017-04-27 08:01 t-ishii Assigned To => nagata
2017-04-27 08:01 t-ishii Status new => assigned
2017-08-03 12:02 t-ishii Assigned To nagata => t-ishii
2017-08-03 12:03 t-ishii Note Added: 0001619
2017-08-03 14:44 t-ishii Status assigned => resolved