0000076: pgpool 3.3.1 has segfault under load of UPDATEs to psql 9.2 database. - Pgpool-II Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000076	Pgpool-II	Bug	public	2013-10-12 03:17	2013-10-19 12:47

Reporter	glenngerhardt	Assigned To	t-ishii
Priority	normal	Severity	major	Reproducibility	always
Status	resolved	Resolution	open
Platform	Linux	OS	CentOS	OS Version	2.6.32-279.22.1

Summary	0000076: pgpool 3.3.1 has segfault under load of UPDATEs to psql 9.2 database.
Description	Core file stack trace: Program terminated with signal 11, Segmentation fault. #0 0x000000000044ce18 in Execute (frontend=0x1248280, backend=0x10c27d0, len=5, contents=0x121d440 "") at pool_proto_modules.c:745 745 pool_proto_modules.c: No such file or directory. in pool_proto_modules.c #0 0x000000000044ce18 in Execute (frontend=0x1248280, backend=0x10c27d0, len=5, contents=0x121d440 "") at pool_proto_modules.c:745 0000001 0x000000000044e27d in ProcessFrontendResponse (frontend=0x1248280, backend=0x10c27d0) at pool_proto_modules.c:2496 0000002 0x000000000041c8d0 in pool_process_query (frontend=<value optimized out>, backend=0x10c27d0, reset_request=<value optimized out>) at pool_process_query.c:232 0000003 0x000000000040b968 in do_child (unix_fd=4, inet_fd=<value optimized out>) at child.c:371 0000004 0x0000000000404dff in fork_a_child (unix_fd=4, inet_fd=5, id=454) at main.c:1246 0000005 0x0000000000408010 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:654
Steps To Reproduce	Load test. Running approximately 300 concurrent UPDATES to pgpool. The UPDATE looks like : UPDATE client SET (device_profile, device_type, model, nonce, os, os_version, vendor, updated_at) = ('(iPhone4;4.2.1;Apple)', 'IMC2.3.0_IP_D2.2.0_S0', 'iPhone4', '<0>drm_token', 'iphone', '4.2.1', 'Apple', NOW() AT TIME ZONE 'UTC') WHERE device_id = 'brg3_IMC2.3.0_IP_D2.2.0_S0_1' AND owner_id = '2' AND user_token = 'brg3' Steps to reproduce are difficult. There's no 1, 2, 3, BANG that does it. I let the load run and occasionally see the crash.
Additional Information	Pgpool log: 2013-10-11 17:41:17 ERROR: pid 99744: Child process 100200 was terminated by segmentation fault /var/log/messages: Oct 11 17:41:01 butler kernel: pgpool[100200]: segfault at ffffffffbe51e684 ip 000000000044ce18 sp 00007fff65f626e0 error 4 in pgpool[400000+107000] This defect could be a duplicate of this, but not sure. http://www.pgpool.net/mantisbt/view.php?id=55
Tags	No tags attached.

glenngerhardt 2013-10-12 03:17 reporter	pgpool.conf (20,432 bytes)

t-ishii 2013-10-15 17:46 developer ~0000339	The place where segfault gets happend is in the code path of query cache. While bug 0000055 did not use the query cache. So I think it is not likely those bugs are directly connected. In the mean time I found a nasty bug with the code path of the query cache. This may or may not be connected your problem but can you please try it out?

t-ishii 2013-10-15 17:48 developer	pool_proto_modules.c.patch (539 bytes) pool_proto_modules.c.patch (539 bytes)

glenngerhardt 2013-10-16 02:27 reporter ~0000343	I am sorry, but the patch did not work. The patch was on line 748 and the crash is on line 745 (before that loop). It still crashes. Here is some info from the core file that should be helpful. It seems that the bind_msg->param_offset is a very large negative number. (gdb) bt #0 0x000000000044ce18 in Execute (frontend=0x1b75a20, backend=0x1aaa080, len=5, contents=0x1b6f630 "") at pool_proto_modu 0000001 0x000000000044e27d in ProcessFrontendResponse (frontend=0x1b75a20, backend=0x1aaa080) at pool_proto_modules.c:2496 0000002 0x000000000041c8d0 in pool_process_query (frontend=<value optimized out>, backend=0x1aaa080, reset_request=<value opti at pool_process_query.c:232 0000003 0x000000000040b968 in do_child (unix_fd=4, inet_fd=<value optimized out>) at child.c:371 0000004 0x0000000000404dff in fork_a_child (unix_fd=4, inet_fd=5, id=285) at main.c:1246 0000005 0x000000000040648a in reaper () at main.c:2446 0000006 0x0000000000408854 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:702 (gdb) p hex_str $1 = "^v\340", <incomplete sequence \313> (gdb) p query_in_bind_msg $2 = 0x722240 "" (gdb) p bind_msg $3 = (POOL_SENT_MESSAGE ) 0x1b67218 (gdb) p bind_msg $4 = {kind = 66 'B', len = 24, contents = 0x1b67258 "", num_tsparams = 0, name = 0x1b67278 "", query_context = 0x1b68bb0, param_offset = -2054491066} (gdb) p i $5 = <value optimized out> (gdb) p query_in_bind_msg $6 = 0x722240 "" (gdb) p query_in_bind_msg[i] value has been optimized out (gdb) p query_in_bind_msg[0] $7 = 0 '\000' (gdb)

t-ishii 2013-10-16 10:20 developer ~0000345	Thanks for the info. I think I found the cause of the problem. In the first place, the code should not execute if the query cannot be cached. bind_msg->param_offse is set when the query is SELECT. However, in the code path in question does not check if the query can be cached or not. As a result, if the query cannot be cached, the contents of bind_msg->param_offse is garbage. Attached patch should fix the problem. Also, there was a logic bug with the caliculation of bind_msg->param_offse and it was fixed in the patch as well. This should not affect your case however.

t-ishii 2013-10-16 10:20 developer	pool_proto_modules.c-v2.patch (964 bytes) pool_proto_modules.c-v2.patch (964 bytes)

glenngerhardt 2013-10-17 06:48 reporter ~0000346	Both of these changes have been applied to the code and the test has been running for about 6 hours with no crash seen. Previously it would crash in roughly an hour. Looks good, but we will continue to monitor the test. Thanks!

Date Modified	Username	Field	Change
2013-10-12 03:17	glenngerhardt	New Issue
2013-10-12 03:17	glenngerhardt	File Added: pgpool.conf
2013-10-15 17:46	t-ishii	Note Added: 0000339
2013-10-15 17:46	t-ishii	Assigned To	=> t-ishii
2013-10-15 17:46	t-ishii	Status	new => assigned
2013-10-15 17:48	t-ishii	File Added: pool_proto_modules.c.patch
2013-10-15 17:48	t-ishii	Status	assigned => feedback
2013-10-16 02:27	glenngerhardt	Note Added: 0000343
2013-10-16 02:27	glenngerhardt	Status	feedback => assigned
2013-10-16 10:20	t-ishii	Note Added: 0000345
2013-10-16 10:20	t-ishii	File Added: pool_proto_modules.c-v2.patch
2013-10-16 10:21	t-ishii	Status	assigned => feedback
2013-10-17 06:48	glenngerhardt	Note Added: 0000346
2013-10-17 06:48	glenngerhardt	Status	feedback => assigned
2013-10-19 12:47	t-ishii	Status	assigned => resolved