View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000156 | Pgpool-II | Bug | public | 2015-11-16 14:34 | 2016-01-17 22:27 |
| Reporter | harukat | Assigned To | t-ishii | ||
| Priority | normal | Severity | minor | Reproducibility | sometimes |
| Status | resolved | Resolution | open | ||
| Product Version | 3.4.0 | ||||
| Summary | 0000156: Pgpool reload during high load causes the fatal error | ||||
| Description | Pgpool reload during high load causes the fatal error. Nov 13 14:42:10 host01 pgpool_xxxx[80744]: [4427-1] 2015-11-13 14:42:10 80744 LOG: reload config files. Nov 13 14:42:10 host01 pgpool_xxxx[80744]: [4428-1] 2015-11-13 14:42:10 80744 LOG: initializing pool configuration: backend weight for backend:1 changed from 1.000000 to 100.000000 Nov 13 14:42:10 host01 pgpool_xxxx[80744]: [4428-2] 2015-11-13 14:42:10 80744 DETAIL: This change will be effective from next client session Nov 13 14:42:10 host01 pgpool_xxxx[83077]: [2218-1] 2015-11-13 14:42:10 83077 LOG: reloading config file Nov 13 14:42:16 host01 pgpool_xxxx[84339]: [3780-1] 2015-11-13 14:42:16 84339 FATAL: failed to read kind from backend Nov 13 14:42:16 host01 pgpool_xxxx[84339]: [3780-2] 2015-11-13 14:42:16 84339 DETAIL: couldn't find first node. All backend down? Our pool_get_config() changes "backend_desc" in shared memory without an exclusive lock. A pgpool child process may therefore read an invalid "backend_desc" and get an error. | ||||
| Tags | No tags attached. | ||||
|
|
It is not clear to me: which configuration items did you change? |
|
|
This occurs when any kind of item is changed, even a reloadable one. |
|
|
In addition, it occurs even when the changed setting is not backend-related. |
|
|
Works fine for me. What I did was: (1) Run pgpool_setup. (2) Edit pgpool.conf to comment out "log_per_node_statement". (3) Run pgbench as "pgbench -p 11000 -S -c 10 -T 3600". (4) Edit pgpool.conf to enable "log_per_node_statement". (5) Reload pgpool using the "pgpool_reload" script. (6) Check whether per-node statement logs appear. |
|
|
This was reproduced in our customer's production environment, which has many CPU cores and over 900 frontends. pool_get_config() always sets "pool_config->backend_desc->num_backends" to 0 first and then counts it up again, even though this variable is in shared memory. --- int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context) ....(snip).... pool_config->backend_desc->num_backends = 0; total_weight = 0.0; for (i=0;i<MAX_CONNECTION_SLOTS;i++) { /* port number == 0 indicates that this server is out of use */ if (BACKEND_INFO(i).backend_port == 0) { clear_host_entry(i); } else { total_weight += BACKEND_INFO(i).unnormalized_weight; pool_config->backend_desc->num_backends = i+1; |
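
To make the window described above concrete, here is a minimal single-process sketch. It is not pgpool code: `BackendDesc`, `MAX_SLOTS`, and `recount_in_place()` are hypothetical stand-ins for the shared-memory descriptor, `MAX_CONNECTION_SLOTS`, and the loop in pool_get_config(). It recounts in place the same way and records every value that num_backends holds along the way, which are the values a concurrent reader could observe.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8   /* hypothetical stand-in for MAX_CONNECTION_SLOTS */

/* Hypothetical stand-in for the backend descriptor kept in shared memory. */
typedef struct {
    int num_backends;
    int backend_port[MAX_SLOTS];   /* 0 means the slot is out of use */
} BackendDesc;

/*
 * Recount in place (zero first, then count up) and record each value
 * written to num_backends. Returns how many values were recorded.
 */
size_t recount_in_place(BackendDesc *desc, int *seen, size_t seen_cap)
{
    size_t n = 0;

    assert(desc != NULL && seen != NULL);

    desc->num_backends = 0;                 /* readers can now see 0 backends */
    if (n < seen_cap)
        seen[n++] = desc->num_backends;

    for (int i = 0; i < MAX_SLOTS; i++) {
        if (desc->backend_port[i] != 0) {
            desc->num_backends = i + 1;     /* readers can see a partial count */
            if (n < seen_cap)
                seen[n++] = desc->num_backends;
        }
    }
    return n;
}
```

With two configured backends, the recorded sequence is 0, 1, 2, although the count was 2 before the reload and is 2 after it. A child that reads the field while it holds 0 would find no backend at all, which is consistent with the "couldn't find first node" detail in the log.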
|
|
Got it. However, I don't think locking the variable solves the problem. There are many places where child processes do something like: for (i=0;i<pool_config->backend_desc->num_backends;i++) { : : } With your approach we would have to lock the whole for loop, which would hurt concurrency. We probably need a process-local cache variable, something like my_backend_status. |
|
|
I think we must at least remove the unnecessary zeroing of num_backends. |
|
|
|
|
|
I attached V3_4_STABLE.pool_get_config.diff |
|
|
This doesn't work, because a child process will be confused while looping over pool_config->backend_desc->num_backends. As I said, a pgpool child should look at a local cache of pool_config->backend_desc->num_backends and update the cache whenever it is convenient (hint: check_restart_request() in child.c). |
|
|
After thinking about it more, I concluded that we don't need locking. Using an atomic variable (sig_atomic_t) instead is enough to solve the problem. Fix committed. Thanks. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2015-11-16 14:34 | harukat | New Issue | |
| 2015-11-25 10:57 | t-ishii | Note Added: 0000598 | |
| 2015-11-25 10:57 | t-ishii | Assigned To | => t-ishii |
| 2015-11-25 10:57 | t-ishii | Status | new => feedback |
| 2015-12-04 19:01 | harukat | Note Added: 0000602 | |
| 2015-12-04 19:01 | harukat | Status | feedback => assigned |
| 2015-12-04 19:05 | harukat | Note Added: 0000603 | |
| 2015-12-07 12:08 | t-ishii | Note Added: 0000604 | |
| 2015-12-07 12:09 | t-ishii | Status | assigned => feedback |
| 2015-12-20 19:10 | harukat | Note Added: 0000620 | |
| 2015-12-20 19:10 | harukat | Status | feedback => assigned |
| 2015-12-25 14:43 | t-ishii | Note Added: 0000623 | |
| 2015-12-29 11:18 | harukat | Note Added: 0000628 | |
| 2015-12-29 11:18 | harukat | File Added: V3_4_STABLE.pool_get_config.diff | |
| 2015-12-29 11:19 | harukat | Note Added: 0000629 | |
| 2015-12-29 11:30 | t-ishii | Note Added: 0000630 | |
| 2016-01-17 22:26 | t-ishii | Note Added: 0000636 | |
| 2016-01-17 22:27 | t-ishii | Status | assigned => resolved |