View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000303 | Pgpool-II | Bug | public | 2017-04-19 17:05 | 2017-08-03 14:44 |
| Reporter | nagata | Assigned To | t-ishii | ||
| Priority | normal | Severity | minor | Reproducibility | unable to reproduce |
| Status | resolved | Resolution | open | ||
| Summary | 0000303: db_node_is is not properly initialized | ||||
| Description | POOL_CONNECTION has a member named db_node_id that is DB node id for this connection. This value is used to output log or degenerate the backend in pool_read(), for example. 221 /* if fail_over_on_backend_error is true, then trigger failover */ 222 if (pool_config->fail_over_on_backend_error) 223 { 224 notice_backend_error(cp->db_node_id, true); 225 226 /* If we are in the main process, we will not exit */ 227 child_exit(POOL_EXIT_AND_RESTART); 228 ereport(ERROR, 229 (errmsg("unable to read data from DB node %d",cp->db_node_id), 230 errdetail("socket read failed with an error \"%s\"", strerror(errno)))); 231 } However, db_node_id is initialized only in connect_backend() which is called only by child processes. In other processes the value is not initialized and always 0. So, when pool_read is called from the parent process or the worker process during health check or streaming-replication delay check, and emits an error in the above code, log message output is as following, no matter which backend it tried to read from. ERROR: unable to read data from DB node 0 ~~~~~~~~~~~~~~~ To make the matter worse, if it is the worker process, backend 0 is degenerated. (If it is the parent process, notice_backend_error() do nothing.) The resolution is to initialize properly even in the parent process or the worker process. I attached the patch to fix it. What do you think about this? | ||||
| Tags | No tags attached. | ||||
|
|
|
|
|
> To make the matter worse, if it is the worker process, backend 0 is degenerated. Are you sure? Replication delay check worker does not trigger in this case. Typical log is something like: > 2017-04-20 10:47:13: pid 22543: LOG: failed to connect to PostgreSQL server on "133.137.175.105:11003", getsockopt() detected error "Connection refused" and pool_read is not called, thus does not trigger failover. |
|
|
I've not reproduced it, so the mechanism is my guess from the source code. However, certainly the log provided from our client shows that 2017-04-12 01:26:18: pid 7548: LOG: fork a new worker child process with pid: 12603 ... 2017-04-12 01:38:18: pid 12603: LOG: received degenerate backend request for node_id: 0 from pid [12603] ... 2017-04-12 01:38:18: pid 12603: WARNING: child_exit: called from invalid process. ignored. 2017-04-12 01:38:18: pid 12603: ERROR: unable to read data from DB node 0 2017-04-12 01:38:18: pid 12603: DETAIL: socket read failed with an error "Connection reset by peer" pool_read() is called during authentication in make_persistent_db_connection() do_query() also calls pool_read() |
|
|
Ok. Regarding your patch, I don't like to modify pool_open(). I recommend leave it as it is. More members may be added to struct POOOL_CONNECTION in the future. We don't want to change pool_open() signature to set such new properties. We should add values for struct members such as is_backend and db_node_id, after calling pool_open(). |
|
|
Why don't initialize the new member in pool_open() even when a new member is added to POOL_CONNECTION? I think it is better to initialize the structure's members as early stage as possible to avoid use uninitialized variable. |
|
|
As I said, that way will bring big maintenance headache in the future. |
|
|
But, if we add a new member, anyway we have to maintenance this including initialization. So, is it easier to handle it in a single function like pool_open() than doing it different places? |
|
|
Am I missing something? |
|
|
Go for it. |
|
|
Since there was no progress since April 2017, I took over this and committed/pushed fix. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2017-04-19 17:05 | nagata | New Issue | |
| 2017-04-19 17:05 | nagata | File Added: db_node_id.patch | |
| 2017-04-20 10:57 | t-ishii | Note Added: 0001435 | |
| 2017-04-20 13:06 | nagata | Note Added: 0001436 | |
| 2017-04-20 13:07 | nagata | Note Edited: 0001436 | |
| 2017-04-20 13:13 | nagata | Note Edited: 0001436 | |
| 2017-04-20 14:15 | t-ishii | Note Added: 0001437 | |
| 2017-04-20 15:54 | nagata | Note Added: 0001438 | |
| 2017-04-20 16:03 | t-ishii | Note Added: 0001439 | |
| 2017-04-20 16:17 | nagata | Note Added: 0001440 | |
| 2017-04-26 13:54 | nagata | Note Added: 0001475 | |
| 2017-04-27 07:57 | t-ishii | Note Added: 0001480 | |
| 2017-04-27 08:01 | t-ishii | Assigned To | => nagata |
| 2017-04-27 08:01 | t-ishii | Status | new => assigned |
| 2017-08-03 12:02 | t-ishii | Assigned To | nagata => t-ishii |
| 2017-08-03 12:03 | t-ishii | Note Added: 0001619 | |
| 2017-08-03 14:44 | t-ishii | Status | assigned => resolved |