View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000076 | Pgpool-II | Bug | public | 2013-10-12 03:17 | 2013-10-19 12:47 |
| Reporter | glenngerhardt | Assigned To | t-ishii | ||
| Priority | normal | Severity | major | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Platform | Linux | OS | CentOS | OS Version | 2.6.32-279.22.1 |
| Summary | 0000076: pgpool 3.3.1 has segfault under load of UPDATEs to psql 9.2 database. | ||||
| Description | Core file stack trace: Program terminated with signal 11, Segmentation fault. #0 0x000000000044ce18 in Execute (frontend=0x1248280, backend=0x10c27d0, len=5, contents=0x121d440 "") at pool_proto_modules.c:745 745 pool_proto_modules.c: No such file or directory. in pool_proto_modules.c #0 0x000000000044ce18 in Execute (frontend=0x1248280, backend=0x10c27d0, len=5, contents=0x121d440 "") at pool_proto_modules.c:745 0000001 0x000000000044e27d in ProcessFrontendResponse (frontend=0x1248280, backend=0x10c27d0) at pool_proto_modules.c:2496 0000002 0x000000000041c8d0 in pool_process_query (frontend=<value optimized out>, backend=0x10c27d0, reset_request=<value optimized out>) at pool_process_query.c:232 0000003 0x000000000040b968 in do_child (unix_fd=4, inet_fd=<value optimized out>) at child.c:371 0000004 0x0000000000404dff in fork_a_child (unix_fd=4, inet_fd=5, id=454) at main.c:1246 0000005 0x0000000000408010 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:654 | ||||
| Steps To Reproduce | Load test. Running approximately 300 concurrent UPDATES to pgpool. The UPDATE looks like : UPDATE client SET (device_profile, device_type, model, nonce, os, os_version, vendor, updated_at) = ('(iPhone4;4.2.1;Apple)', 'IMC2.3.0_IP_D2.2.0_S0', 'iPhone4', '<0>drm_token', 'iphone', '4.2.1', 'Apple', NOW() AT TIME ZONE 'UTC') WHERE device_id = 'brg3_IMC2.3.0_IP_D2.2.0_S0_1' AND owner_id = '2' AND user_token = 'brg3' Steps to reproduce are difficult. There's no 1, 2, 3, BANG that does it. I let the load run and occasionally see the crash. | ||||
| Additional Information | Pgpool log: 2013-10-11 17:41:17 ERROR: pid 99744: Child process 100200 was terminated by segmentation fault /var/log/messages: Oct 11 17:41:01 butler kernel: pgpool[100200]: segfault at ffffffffbe51e684 ip 000000000044ce18 sp 00007fff65f626e0 error 4 in pgpool[400000+107000] This defect could be a duplicate of this, but not sure. http://www.pgpool.net/mantisbt/view.php?id=55 | ||||
| Tags | No tags attached. | ||||
|
|
|
|
|
The place where segfault gets happend is in the code path of query cache. While bug 0000055 did not use the query cache. So I think it is not likely those bugs are directly connected. In the mean time I found a nasty bug with the code path of the query cache. This may or may not be connected your problem but can you please try it out? |
|
|
|
|
|
I am sorry, but the patch did not work. The patch was on line 748 and the crash is on line 745 (before that loop). It still crashes. Here is some info from the core file that should be helpful. It seems that the bind_msg->param_offset is a very large negative number. (gdb) bt #0 0x000000000044ce18 in Execute (frontend=0x1b75a20, backend=0x1aaa080, len=5, contents=0x1b6f630 "") at pool_proto_modu 0000001 0x000000000044e27d in ProcessFrontendResponse (frontend=0x1b75a20, backend=0x1aaa080) at pool_proto_modules.c:2496 0000002 0x000000000041c8d0 in pool_process_query (frontend=<value optimized out>, backend=0x1aaa080, reset_request=<value opti at pool_process_query.c:232 0000003 0x000000000040b968 in do_child (unix_fd=4, inet_fd=<value optimized out>) at child.c:371 0000004 0x0000000000404dff in fork_a_child (unix_fd=4, inet_fd=5, id=285) at main.c:1246 0000005 0x000000000040648a in reaper () at main.c:2446 0000006 0x0000000000408854 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:702 (gdb) p hex_str $1 = "^v\340", <incomplete sequence \313> (gdb) p query_in_bind_msg $2 = 0x722240 "" (gdb) p bind_msg $3 = (POOL_SENT_MESSAGE *) 0x1b67218 (gdb) p *bind_msg $4 = {kind = 66 'B', len = 24, contents = 0x1b67258 "", num_tsparams = 0, name = 0x1b67278 "", query_context = 0x1b68bb0, param_offset = -2054491066} (gdb) p i $5 = <value optimized out> (gdb) p query_in_bind_msg $6 = 0x722240 "" (gdb) p query_in_bind_msg[i] value has been optimized out (gdb) p query_in_bind_msg[0] $7 = 0 '\000' (gdb) |
|
|
Thanks for the info. I think I found the cause of the problem. In the first place, the code should not execute if the query cannot be cached. bind_msg->param_offse is set when the query is SELECT. However, in the code path in question does not check if the query can be cached or not. As a result, if the query cannot be cached, the contents of bind_msg->param_offse is garbage. Attached patch should fix the problem. Also, there was a logic bug with the caliculation of bind_msg->param_offse and it was fixed in the patch as well. This should not affect your case however. |
|
|
|
|
|
Both of these changes have been applied to the code and the test has been running for about 6 hours with no crash seen. Previously it would crash in roughly an hour. Looks good, but we will continue to monitor the test. Thanks! |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2013-10-12 03:17 | glenngerhardt | New Issue | |
| 2013-10-12 03:17 | glenngerhardt | File Added: pgpool.conf | |
| 2013-10-15 17:46 | t-ishii | Note Added: 0000339 | |
| 2013-10-15 17:46 | t-ishii | Assigned To | => t-ishii |
| 2013-10-15 17:46 | t-ishii | Status | new => assigned |
| 2013-10-15 17:48 | t-ishii | File Added: pool_proto_modules.c.patch | |
| 2013-10-15 17:48 | t-ishii | Status | assigned => feedback |
| 2013-10-16 02:27 | glenngerhardt | Note Added: 0000343 | |
| 2013-10-16 02:27 | glenngerhardt | Status | feedback => assigned |
| 2013-10-16 10:20 | t-ishii | Note Added: 0000345 | |
| 2013-10-16 10:20 | t-ishii | File Added: pool_proto_modules.c-v2.patch | |
| 2013-10-16 10:21 | t-ishii | Status | assigned => feedback |
| 2013-10-17 06:48 | glenngerhardt | Note Added: 0000346 | |
| 2013-10-17 06:48 | glenngerhardt | Status | feedback => assigned |
| 2013-10-19 12:47 | t-ishii | Status | assigned => resolved |