View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000102 | Pgpool-II | Bug | public | 2014-06-09 10:57 | 2014-06-11 15:51 |
| Reporter | uehara | Assigned To | t-ishii | ||
| Priority | normal | Severity | major | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Summary | 0000102: pgpool-II(3.3.3) failover does not work properly many times. | ||||
| Description | Hi, The following error occurs in pgpool-II(3.3.3) ERROR: pid 31472: semop(id=16187394) failed: Identifier removed (Conditions) pgpool: version 3.3.3 raw mode backend_flag0 = 'ALLOW_TO_FAILOVER' (DB node is a single.) PostgreSQL: version 9.3.4 | ||||
| Steps To Reproduce | 1. Using the pgbench to give the load. $ pgbench -p 9999 -C -t 180 -c 20 test01 (Please change the load by machine performance.) 2. Shutdown in fast mode PostgreSQL during processing. $ pg_ctl stop -m fast -D $PGDATA 3. Activate the PostgreSQL. $ pg_ctl start -D $PGDATA 4. Run the pgbench again. Message is output in large amounts. ERROR: pid 31472: semop(id=16187394) failed: Identifier removed ... | ||||
| Additional Information | Cause of assuming:Semaphore has been removed by the unexpected circumstances I was sure that there are two case in the places following, semaphore is deleted. ○main.c 1588 if (getpid() != mypid) 1589 { 1590 pool_debug("exit_handler: I am not parent"); 1591 POOL_SETMASK(&UnBlockSig); * 1592 pool_shmem_exit(0); 1593 exit(0); 1594 } ○pool_sema.c 57 semun.buf = &seminfo; 58 if (semctl(semId, 0, IPC_STAT, semun) < 0 59 && (errno == EINVAL || errno == EACCES 60 #ifdef EIDRM 61 || errno == EIDRM 62 #endif 63 )) 64 return; 65 66 semun.val = 0; /* unused, but keep compiler quiet */ 67 * 68 if (semctl(semId, 0, IPC_RMID) < 0) 69 pool_log("semctl(%lu, 0, IPC_RMID, ...) failed: %s", semId, strerror(errno)); 70 } I create a patch for this error. Would you please check the patch? | ||||
| Tags | No tags attached. | ||||
|
|
|
|
|
Thanks for the report and patches! I will look into this. |
|
|
I could not reproduce "ERROR: pid 31472: semop(id=16187394) failed: Identifier removed". Instead I saw many errors like "semop(...) failed: Invalid arugument". Clearly the call to pool_shmem_exit() in main.c is a bug and should be removed as you suggested in your patches. After removing the line I don't see the error any more. I have committed the changes into all supported branches. However I am not so sure about the second part of your patches for pool_sema.c. If we don't call semctl() to remove semaphore when previous semctl() is interrupted, it puts us at risk not to remove semaphore, doesn't it? |
|
|
The modification of pool_sema.c may be a mistake. ---------- In my environment, error messages of two types is output. Message1. failed: Invalid argument Message2. failed: Identifier removed Updated debug log of main.c and pool_sema.c to confirm route. [main.c] 1588 if (getpid() != mypid) 1589 { * 1590 pool_log("exit_handler: I am not parent"); 1591 POOL_SETMASK(&UnBlockSig); 1592 pool_shmem_exit(0); 1593 exit(0); 1594 } [pool_sema.c] 66 semun.val = 0; /* unused, but keep compiler quiet */ * 67 pool_log("remove semId:%lu,semId"); 68 if (semctl(semId, 0, IPC_RMID) < 0) 69 pool_log("semctl(%lu, 0, IPC_RMID, ...) failed: %s", semId, strerror(errno)); 70 } Log of reproduction test using the debug logs that are updated. ... 2014-06-11 09:26:40 LOG: pid 10508: do_child: exits with status 1 due to error 2014-06-11 09:26:40 LOG: pid 10038: starting degeneration. shutdown host 192.168.1.3(5432) 2014-06-11 09:26:40 ERROR: pid 10038: failover_handler: no valid DB node found 2014-06-11 09:26:40 LOG: pid 10038: Restart all children 2014-06-11 09:26:40 LOG: pid 10038: failover: set new primary node: -1 2014-06-11 09:26:40 LOG: pid 10534: worker process received restart request (5432) * 2014-06-11 09:26:40 LOG: pid 10577: exit_handler: I am not parent * 2014-06-11 09:26:40 LOG: pid 10577: remove semId:0,semId 2014-06-11 09:26:40 LOG: pid 10576: exit_handler: I am not parent 2014-06-11 09:26:41 LOG: pid 10533: pcp child process received restart request 2014-06-11 09:26:41 LOG: pid 10038: PCP child 10533 exits with status 256 in failover() 2014-06-11 09:26:41 LOG: pid 10038: fork a new PCP child pid 10614 in failover() 2014-06-11 09:26:41 LOG: pid 10038: worker child 10534 exits with status 256 2014-06-11 09:26:41 LOG: pid 10038: fork a new worker child pid 10615 2014-06-11 09:28:01 ERROR: pid 10613: semop(id=7700482) failed: Identifier removed 2014-06-11 09:28:01 ERROR: pid 10613: semop(id=7700482) failed: Identifier removed ... As a result, I confirmed that semctl() ran after processing of pool_shmem_exit(). So, semctl() may not have caused the problem. I'm sorry. Confirmation was leaking. |
|
|
So just removing the call to pool_shmem_exit() in main.c solved your problem? I just want to make sure that. |
|
|
Thank you for your help. My problem has been resolved. |
|
|
Thanks. Glad to hear that! I'm going to mark this issue as "resolved". Again thanks for your findings and patches! |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2014-06-09 10:57 | uehara | New Issue | |
| 2014-06-09 10:57 | uehara | File Added: patch.zip | |
| 2014-06-09 11:12 | t-ishii | Note Added: 0000422 | |
| 2014-06-09 11:12 | t-ishii | Assigned To | => t-ishii |
| 2014-06-09 11:12 | t-ishii | Status | new => assigned |
| 2014-06-09 21:38 | t-ishii | Note Added: 0000423 | |
| 2014-06-09 22:01 | t-ishii | Changeset attached | => pgpool2 V3_1_STABLE 6d699404 |
| 2014-06-09 22:01 | t-ishii | Changeset attached | => pgpool2 V3_0_STABLE 4edde934 |
| 2014-06-11 09:43 | uehara | Note Added: 0000424 | |
| 2014-06-11 15:01 | t-ishii | Note Added: 0000425 | |
| 2014-06-11 15:23 | uehara | Note Added: 0000426 | |
| 2014-06-11 15:50 | t-ishii | Note Added: 0000427 | |
| 2014-06-11 15:51 | t-ishii | Status | assigned => resolved |