View Issue Details

ID: 0000005    Project: Pgpool-II    Category: Bug    View Status: public    Last Update: 2012-03-29 12:56
Reporter: tuomas    Assigned To: 
Priority: normal    Severity: minor    Reproducibility: sometimes
Status: closed    Resolution: open
Platform: Linux    OS: Ubuntu    OS Version: 11.04
Product Version: 
Target Version:     Fixed in Version: 
Summary: 0000005: occasional pgpool child segmentation faults
Description:
Pgpool: 3.1.2
Postgresql: 9.1.2
mode: master-slave streaming replication

We are seeing occasional segfaults from pgpool children, possibly because the primary node id is for some reason different from the master node id.


Feb 15 08:26:33 pgpool[7920]: ProcessFrontendResponse: failed to read kind from frontend. frontend abnormally exited
Feb 15 08:26:33 kernel: pgpool[7920]: segfault at 0 ip 000000000040a505 sp 00007fff92887400 error 4 in pgpool[400000+de000]
Feb 15 08:26:33 kernel: Process pgpool (pid: 7920, threadinfo ffff880101006000, task ffff8800a263ada0)
Feb 15 08:26:33 pgpool[10468]: Child process 7920 was terminated by segmentation fault

Here's the gdb output.

#0 0x000000000040a505 in do_child (unix_fd=5, inet_fd=<value optimized out>) at child.c:356
356 sp = MASTER_CONNECTION(backend)->sp;
(gdb) bt
#0 0x000000000040a505 in do_child (unix_fd=5, inet_fd=<value optimized out>) at child.c:356
#1 0x00000000004047e5 in fork_a_child (unix_fd=5, inet_fd=6, id=5) at main.c:1072
#2 0x000000000040509a in reaper () at main.c:2150
#3 0x0000000000405cfd in pool_sleep (second=<value optimized out>) at main.c:2347
#4 0x00000000004074f8 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:708

(gdb) print backend
$1 = (POOL_CONNECTION_POOL *) 0xe86070

(gdb) print backend->slots
$3 = {0x0, 0xe89d00, 0x0 <repeats 126 times>}

(gdb) print status
$4 = POOL_END

(gdb) print Req_info->master_node_id
$5 = 0

(gdb) print Req_info->primary_node_id
$6 = 1


Another one in different place:

Program terminated with signal 11, Segmentation fault.
#0 0x00000000004479a2 in ProcessFrontendResponse (frontend=0xe870a0, backend=0xe86070) at pool_proto_modules.c:2012
2012 if (MAJOR(backend) == PROTO_MAJOR_V3)
(gdb) bt
#0 0x00000000004479a2 in ProcessFrontendResponse (frontend=0xe870a0, backend=0xe86070) at pool_proto_modules.c:2012
#1 0x0000000000414e36 in pool_process_query (frontend=0xe870a0, backend=0xe86070, reset_request=<value optimized out>) at pool_process_query.c:344
#2 0x000000000040a4f2 in do_child (unix_fd=5, inet_fd=<value optimized out>) at child.c:354
#3 0x00000000004047e5 in fork_a_child (unix_fd=5, inet_fd=6, id=1) at main.c:1072
#4 0x000000000040509a in reaper () at main.c:2150
#5 0x0000000000405cfd in pool_sleep (second=<value optimized out>) at main.c:2347
#6 0x00000000004074f8 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:709

(gdb) print backend
$1 = (POOL_CONNECTION_POOL *) 0xe86070

(gdb) print backend->slots
$2 = {0x0, 0xe850a0, 0x0 <repeats 126 times>}

(gdb) print Req_info->master_node_id
$3 = 0

(gdb) print Req_info->primary_node_id
$4 = 1

In both cases backend->slots[0]->sp is accessed even though backend->slots[0] is NULL.
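
To illustrate what I think is happening, here is a minimal stand-alone sketch of the failure pattern. This is not pgpool's code; the SLOT/POOL/REQ types and the simplified MASTER_CONNECTION macro are just stand-ins, assuming the real macro expands to something like backend->slots[Req_info->master_node_id]:

/*
 * Stand-alone sketch of the crash pattern (not the actual pgpool source).
 * Assumption: MASTER_CONNECTION(p) expands to roughly
 * p->slots[Req_info->master_node_id].  When master_node_id points at a
 * node whose connection slot was never opened (slots[0] == NULL here),
 * the ->sp dereference faults exactly as in the backtraces above.
 */
#include <stdio.h>
#include <stdlib.h>

#define MAX_NUM_BACKENDS 128

typedef struct { char *sp; } SLOT;                      /* stand-in for POOL_CONNECTION_POOL_SLOT */
typedef struct { SLOT *slots[MAX_NUM_BACKENDS]; } POOL; /* stand-in for POOL_CONNECTION_POOL */
typedef struct { int master_node_id; int primary_node_id; } REQ;

static REQ req = { .master_node_id = 0, .primary_node_id = 1 };

#define MASTER_CONNECTION(p) ((p)->slots[req.master_node_id])

int main(void)
{
	POOL backend = { .slots = { NULL } };

	/* only node 1 has a live connection, as in the gdb dumps */
	backend.slots[1] = malloc(sizeof(SLOT));
	backend.slots[1]->sp = "startup packet of node 1";

	SLOT *m = MASTER_CONNECTION(&backend);
	if (m == NULL)   /* hypothetical guard; the shipped code has no such check */
	{
		fprintf(stderr, "master slot %d is NULL (primary is %d)\n",
		        req.master_node_id, req.primary_node_id);
		return 1;
	}
	printf("%s\n", m->sp);   /* without the guard this line is the segfault */
	return 0;
}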

Events that led to this:

- Node 1 was primary and node 0 was a slave (a previous failover had made node 1 primary/master and node 0 a slave)
- pgpool lost connectivity to both backends
- Node 1 was attached back first; as it was primary, it continued to be primary and master
- Node 0 was attached back and, although node 1 stayed primary as it should, it seems node 0 became master:

[network issue broke all connections]
Cannot accept() new connection. all backends are down
Replication of node:0 is behind 0 bytes from the primary server (node:1)
[pcp_attach_node 1 here]
send_failback_request: fail back 1 th node request from pid 20679
starting fail back. reconnect host fdqa03.fd-qa.flowdock-int.com(5432)
Do not restart children because we are failbacking node id 1 hosta.b.c.d port:5432 and we are in streaming replication mode
find_primary_node_repeatedly: waiting for finding a primary node
find_primary_node: primary node id is 1
failover: set new primary node: 1
failover: set new master node: 1 <-- here master is still 0
worker process received restart request
failback done. reconnect host a.b.c.d(5432)
...
do_child: failback event found. restart myself.
...

[pcp_attach_node 0]
send_failback_request: fail back 0 th node request from pid 21988
starting fail back. reconnect host d.c.b.a(5432)
Do not restart children because we are failbacking node id 0 hostd.c.b.a port:5432 and we are in streaming replication mode
find_primary_node_repeatedly: waiting for finding a primary node
find_primary_node: primary node id is 1
failover: set new primary node: 1
failover: set new master node: 0 <--- here master node is set to 0 for some reason
failback done. reconnect host d.c.b.a(5432)
...

Let me know if any additional info would be useful.

I don't quite see the difference between the master node and the primary node, so I guess they should always be the same, at least in streaming replication mode.
Tags: No tags attached.

Activities

t-ishii

2012-02-22 15:16

developer   ~0000008

I think the problem is that [pcp_attach_node 0] tries to attach a down
node. pcp_attach_node attaches the specified node regardless of whether
the node is actually up or down, i.e. it is entirely the user's
responsibility to make sure they are doing the right thing. However, a
segfault is not good, of course.

[Technical aspect of the problem] A pgpool child keeps an internal
status cache (my_backend_status[(backend_id)]) so that it is not
disturbed by occasional backend status changes (e.g. a node going
down). The cache is updated whenever convenient, usually when the next
session starts, and it is consulted by the VALID_BACKEND macro. We also
have data in a shared memory area which records which node is the
master (where "master" means the first live backend). In this case the
master was set to 0 for the reason described above, so the pgpool child
looked into the down backend's info, which causes the segfault. It
seems we should employ a similar strategy to my_backend_status for the
VALID_BACKEND macro case as well. The problem is that the macro is used
so widely that I'm not sure whether it is safe to change it. I feel
like I need to study this issue more...
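
Roughly, the idea would look like this. This is just a sketch with assumed, simplified names, not actual pgpool code (in the real code my_backend_status is an array of pointers into a private copy):

/*
 * Sketch of the "private copy" strategy with assumed, simplified names.
 * At session start the child snapshots the shared-memory backend status
 * and the master node id into process-local variables; all later checks
 * in that session use the snapshot, so a concurrent failback/failover in
 * shared memory cannot redirect the child to a slot it never opened.
 */
#include <string.h>

#define MAX_NUM_BACKENDS 128

typedef enum { CON_UNUSED, CON_CONNECT_WAIT, CON_UP, CON_DOWN } BACKEND_STATUS;

/* shared-memory view, reduced to plain globals for this sketch */
static BACKEND_STATUS shared_backend_status[MAX_NUM_BACKENDS];
static int shared_master_node_id;

/* per-child (process local) snapshot */
static BACKEND_STATUS my_backend_status[MAX_NUM_BACKENDS];
static int my_master_node_id;

/* called once per session, before any query handling */
void snapshot_backend_state(void)
{
	memcpy(my_backend_status, shared_backend_status, sizeof(my_backend_status));
	my_master_node_id = shared_master_node_id;
}

/* session-scoped checks consult the snapshot, not shared memory */
int valid_backend(int id)
{
	return my_backend_status[id] == CON_UP ||
	       my_backend_status[id] == CON_CONNECT_WAIT;
}

int my_master_node(void)
{
	return my_master_node_id;
}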

t-ishii

2012-02-22 16:41

developer  

master_node.patch (1,824 bytes)
diff --git a/child.c b/child.c
index 335f6fb..c3c0879 100644
--- a/child.c
+++ b/child.c
@@ -1960,7 +1960,7 @@ static void init_system_db_connection(void)
 }
 
 /*
- * Initialize my backend status.
+ * Initialize my backend status and master node id.
  * We copy the backend status to private area so that
  * they are not changed while I am alive.
  */
@@ -1976,4 +1976,6 @@ void pool_initialize_private_backend_status(void)
 		/* my_backend_status is referred to by VALID_BACKEND macro. */
 		my_backend_status[i] = &private_backend_status[i];
 	}
+
+	my_master_node_id = REAL_MASTER_NODE_ID;
 }
diff --git a/main.c b/main.c
index 156be90..2eadb9c 100644
--- a/main.c
+++ b/main.c
@@ -181,6 +181,7 @@ static BackendStatusRecord backend_rec;	/* Backend status record */
 static pid_t worker_pid; /* pid of worker process */
 
 BACKEND_STATUS* my_backend_status[MAX_NUM_BACKENDS];		/* Backend status buffer */
+int my_master_node_id;		/* Master node id buffer */
 
 int myargc;
 char **myargv;
diff --git a/pool.h b/pool.h
index 5f85230..d05371b 100644
--- a/pool.h
+++ b/pool.h
@@ -268,6 +268,7 @@ typedef struct {
 extern bool pool_is_node_to_be_sent_in_current_query(int node_id);
 extern int pool_virtual_master_db_node_id(void);
 extern BACKEND_STATUS* my_backend_status[];
+extern int my_master_node_id;
 
 #define VALID_BACKEND(backend_id) \
 	((RAW_MODE && (backend_id) == REAL_MASTER_NODE_ID) ||		\
diff --git a/pool_query_context.c b/pool_query_context.c
index 9f5cac7..d09a0c6 100644
--- a/pool_query_context.c
+++ b/pool_query_context.c
@@ -276,7 +276,11 @@ int pool_virtual_master_db_node_id(void)
 	{
 		return sc->query_context->virtual_master_node_id;
 	}
-	return REAL_MASTER_NODE_ID;
+
+	/*
+	 * No query context exists. Returns master node id in private buffer.
+	 */
+	return my_master_node_id;
 }
 
 /*

t-ishii

2012-02-22 16:44

developer   ~0000011

OK, I have created and uploaded a patch which should fix the problem. Please try it if you like. The patch is against the git master head but should apply to pgpool-II 3.1.x as well.

tuomas

2012-02-23 18:15

reporter   ~0000012

Node 0 was up (the server was reachable and PostgreSQL was running correctly, at least) when it was attached; the PostgreSQL servers were also replicating fine, and there were "Replication of node:0 is behind 0 bytes from the primary server (node:1)" messages in the logs. I'm not sure why pgpool would have thought it was down. I guess the reconnect would fail in that case, at least?

Anyway, I'll apply the patch and try to reproduce.

tuomas

2012-02-29 21:43

reporter   ~0000015

Last edited: 2012-02-29 21:44


I've now been testing the patch. I still got some segfaults soon after attaching node 0 back, but now in a different place:

Node 0 is up and ready to accept connections, and it is attached back:

Feb 29 09:44:10 pgpool[479]: send_failback_request: fail back 0 th node request from pid 479
Feb 29 09:44:10 pgpool[445]: starting fail back. reconnect host node0(5432)
Feb 29 09:44:10 pgpool[445]: Do not restart children because we are failbacking node id 0 hostnode0 port:5432 and we are in streaming replication mode
Feb 29 09:44:10 pgpool[445]: find_primary_node_repeatedly: waiting for finding a primary node
Feb 29 09:44:10 pgpool[445]: find_primary_node: primary node id is 1
Feb 29 09:44:10 pgpool[445]: failover: set new primary node: 1
Feb 29 09:44:10 pgpool[445]: failover: set new master node: 0
Feb 29 09:44:10 pgpool[445]: failback done. reconnect host node0(5432)
Feb 29 09:44:10 pgpool[480]: worker process received restart request
Feb 29 09:44:11 pgpool[479]: pcp child process received restart request
Feb 29 09:44:11 pgpool[445]: worker child 480 exits with status 256
Feb 29 09:44:11 pgpool[445]: fork a new worker child pid 3654
Feb 29 09:44:11 pgpool[445]: PCP child 479 exits with status 256
Feb 29 09:44:11 pgpool[445]: fork a new PCP child pid 3655
Feb 29 09:44:12 pgpool[3654]: Replication of node:0 is behind 0 bytes from the primary server (node:1)
Feb 29 09:44:17 kernel: [1107706.281641] pgpool[1505]: segfault at 0 ip 000000000044aa43 sp 00007fffa1980710 error 4 in pgpool[400000+de000]
Feb 29 09:44:17 pgpool[445]: Child process 1505 was terminated by segmentation fault
Feb 29 09:44:17 pgpool[3708]: do_child: failback event found. restart myself.
...
Feb 29 09:49:05 pgpool[3654]: Replication of node:0 is behind 0 bytes from the primary server (node:1)
Feb 29 09:49:08 kernel: [1107997.010304] pgpool[458]: segfault at 0 ip 000000000044aa43 sp 00007fffa1980780 error 4 in pgpool[400000+de000]
Feb 29 09:49:08 pgpool[445]: Child process 458 was terminated by segmentation fault
Feb 29 09:49:08 pgpool[4219]: do_child: failback event found. restart myself.
...
Feb 29 09:55:50 kernel: [1108398.757059] pgpool[453]: segfault at 0 ip 000000000044aa43 sp 00007fffa1980780 error 4 in pgpool[400000+de000]
Feb 29 09:55:50 pgpool[445]: Child process 453 was terminated by segmentation fault
Feb 29 09:55:50 pgpool[4958]: do_child: failback event found. restart myself.
Feb 29 09:55:50 kernel: [1108398.772031] pgpool[474]: segfault at 0 ip 000000000044aa43 sp 00007fffa1980780 error 4 in pgpool[400000+de000]
Feb 29 09:55:50 pgpool[445]: Child process 474 was terminated by segmentation fault
Feb 29 09:55:50 pgpool[4960]: do_child: failback event found. restart myself.

first segfault (pid 1505):

Program terminated with signal 11, Segmentation fault.
#0 0x000000000044aa43 in pool_where_to_send (query_context=0x2677160, query=0x265a8c0 "SELECT x.* FROM x WHERE x.y = 'z' LIMIT 1",
    node=0x26782c8) at pool_query_context.c:360
360 MAJOR(backend) == PROTO_MAJOR_V3)
(gdb) bt
#0 0x000000000044aa43 in pool_where_to_send (query_context=0x2677160, query=0x265a8c0 "SELECT x.* FROM x WHERE x.y = 'z' LIMIT 1",
    node=0x26782c8) at pool_query_context.c:360
#1 0x0000000000443287 in SimpleQuery (frontend=0x265a030, backend=0x2655070, len=101,
    contents=0x265a8c0 "SELECT x.* FROM x WHERE x.y = 'z' LIMIT 1") at pool_proto_modules.c:286
#2 0x0000000000447b73 in ProcessFrontendResponse (frontend=0x265a030, backend=0x2655070) at pool_proto_modules.c:2046
#3 0x0000000000414e46 in pool_process_query (frontend=0x265a030, backend=0x2655070, reset_request=<value optimized out>) at pool_process_query.c:344
#4 0x000000000040a502 in do_child (unix_fd=5, inet_fd=<value optimized out>) at child.c:354
#5 0x00000000004047e5 in fork_a_child (unix_fd=5, inet_fd=6, id=31) at main.c:1073
#6 0x000000000040509a in reaper () at main.c:2151
#7 0x0000000000405cfd in pool_sleep (second=<value optimized out>) at main.c:2348
#8 0x00000000004074f8 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:709

(gdb) print backend->slots
$2 = {0x0, 0x2654080, 0x0 <repeats 126 times>}

(gdb) print Req_info->master_node_id
$3 = 0

(gdb) print Req_info->primary_node_id
$4 = 1

(gdb) print my_master_node_id
$1 = 1

(gdb) print session_context->query_context->virtual_master_node_id
$1 = 0

So within the query context the id is still 0.

The children that segfaulted had not restarted at that point, so there might still be a concurrency issue with children that have an open (even if idle) connection. Those queries are not long-running (they are very fast), but the client connections are kept alive.
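
If I read the code right, the path looks roughly like this (simplified, assumed names; the real functions are pool_start_query() and pool_virtual_master_db_node_id() in pool_query_context.c): the query context is seeded from the shared-memory master node id when the query starts, so a session that was already connected before the failback can still target node 0, whose connection slot is NULL, even though my_master_node_id is already 1.

/*
 * Rough sketch of my reading of the code path (simplified, assumed
 * names).  The stale shared-memory value leaks in at query start, so the
 * per-child my_master_node_id from the first patch never gets a chance
 * to protect this case.
 */
typedef struct { int virtual_master_node_id; } QUERY_CONTEXT;

static int shared_master_node_id;   /* Req_info->master_node_id, in shared memory: 0 here */
static int my_master_node_id;       /* per-child snapshot from the first patch: 1 here    */

/* at query start: the shared value is copied into the context */
void start_query(QUERY_CONTEXT *qc)
{
	qc->virtual_master_node_id = shared_master_node_id;
}

/* later, when deciding where to send the query */
int virtual_master_db_node_id(QUERY_CONTEXT *qc)
{
	if (qc != NULL)
		return qc->virtual_master_node_id;   /* 0, matching the gdb output above */
	return my_master_node_id;                /* 1; this branch was fixed by the first patch */
}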


Once all children had restarted (or segfaulted) I have not seen any more segfaults.

Btw, why do the children restart although it says "Do not restart children because we are failbacking node id 0 hostnode0 port:5432 and we are in streaming replication mode"?

t-ishii

2012-03-03 12:15

developer   ~0000016

I found another place which could refer to the newly attached node (in your case node 0). I have uploaded a small fix to deal with this.

> Btw, why do the children restart although it says "Do not restart children because we are failbacking node id 0 hostnode0 port:5432 and we are in streaming replication mode"?
Are you sure that the children restarted? From the pids, those do not look like ordinary children being restarted.
Note that those processes are special and are treated differently from ordinary pgpool children.
Feb 29 09:44:11 pgpool[445]: fork a new worker child pid 3654
Feb 29 09:44:11 pgpool[445]: PCP child 479 exits with status 256
Feb 29 09:44:11 pgpool[445]: fork a new PCP child pid 3655

t-ishii

2012-03-03 12:15

developer  

pool_query_context.patch (639 bytes)
diff --git a/pool_query_context.c b/pool_query_context.c
index d09a0c6..a9f393b 100644
--- a/pool_query_context.c
+++ b/pool_query_context.c
@@ -98,7 +98,7 @@ void pool_start_query(POOL_QUERY_CONTEXT *query_context, char *query, int len, N
 		query_context->original_query = query;
 		query_context->rewritten_query = NULL;
 		query_context->parse_tree = node;
-		query_context->virtual_master_node_id = REAL_MASTER_NODE_ID;
+		query_context->virtual_master_node_id = my_master_node_id;
 		query_context->is_cache_safe = false;
 		if (pool_config->memory_cache_enabled)
 			query_context->temp_cache = pool_create_temp_query_cache(query);

t-ishii

2012-03-04 13:02

developer   ~0000017

I found we need one more fix. Please apply the attached patch as well.

t-ishii

2012-03-04 13:11

developer  

pool_query_context.patch2 (536 bytes)

tuomas

2012-03-05 21:51

reporter   ~0000018

Hi,

Thanks! I'm now testing the new patches and they look good so far. I've done a couple of failovers and there have been no segfaults.

As for the children restarting: I tested it separately by stopping and starting a slave and then attaching it back.

(Node id 0 is primary, node id 1 is slave)
First I got:
pgpool[18087]: Do not restart children because we are failbacking node id 1 hosta.b.c.d port:5432 and we are in streaming replication mode

And then these for all the children:
pgpool[19773]: do_child: failback event found. discard existing connections
pgpool[19762]: do_child: failback event found. restart myself.
pgpool[19767]: do_child: failback event found. discard existing connections
pgpool[19770]: do_child: failback event found. discard existing connections
pgpool[19772]: do_child: failback event found. restart myself.
...
I verified that those pids no longer existed after that, and I also saw a dropped-connection error in the application that was connected.

So indeed it seems to restart the children.

t-ishii

2012-03-24 12:14

developer   ~0000020

Yes, the children will restart after all at a convenient time, which means after the client connection to the child terminates.

tuomas

2012-03-27 19:06

reporter   ~0000021

I recall it also restarted the active ones ("discarding existing connections"). But let's close this ticket as fixed, since the segfaults are no longer happening and it works great now.

I'll test the restart thing more separately and open a new ticket if I can reproduce a real issue with it.

Thanks

Issue History

Date Modified Username Field Change
2012-02-15 20:28 tuomas New Issue
2012-02-22 15:16 t-ishii Note Added: 0000008
2012-02-22 16:41 t-ishii File Added: master_node.patch
2012-02-22 16:44 t-ishii Note Added: 0000011
2012-02-23 18:15 tuomas Note Added: 0000012
2012-02-29 21:43 tuomas Note Added: 0000015
2012-02-29 21:44 tuomas Note Edited: 0000015
2012-03-03 12:15 t-ishii Note Added: 0000016
2012-03-03 12:15 t-ishii File Added: pool_query_context.patch
2012-03-04 13:02 t-ishii Note Added: 0000017
2012-03-04 13:11 t-ishii File Added: pool_query_context.patch2
2012-03-05 21:51 tuomas Note Added: 0000018
2012-03-24 12:14 t-ishii Note Added: 0000020
2012-03-27 19:06 tuomas Note Added: 0000021
2012-03-29 12:56 t-ishii Status new => closed