View Issue Details

IDProjectCategoryView StatusLast Update
0000102Pgpool-IIBugpublic2014-06-11 15:51
ReporterueharaAssigned Tot-ishii 
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionopen 
Product Version 
Target VersionFixed in Version 
Summary0000102: pgpool-II(3.3.3) failover does not work properly many times.
DescriptionHi,

The following error occurs in pgpool-II(3.3.3)

ERROR: pid 31472: semop(id=16187394) failed: Identifier removed


(Conditions)
pgpool: version 3.3.3
        raw mode
        backend_flag0 = 'ALLOW_TO_FAILOVER'
        (DB node is a single.)

PostgreSQL: version 9.3.4

Steps To Reproduce1. Using the pgbench to give the load.
  $ pgbench -p 9999 -C -t 180 -c 20 test01
  (Please change the load by machine performance.)

2. Shutdown in fast mode PostgreSQL during processing.
  $ pg_ctl stop -m fast -D $PGDATA

3. Activate the PostgreSQL.
  $ pg_ctl start -D $PGDATA

4. Run the pgbench again.
  Message is output in large amounts.

  ERROR: pid 31472: semop(id=16187394) failed: Identifier removed
  ...
Additional InformationCause of assuming:Semaphore has been removed by the unexpected circumstances

I was sure that there are two case in the places following, semaphore is deleted.

○main.c
  1588 if (getpid() != mypid)
  1589 {
  1590 pool_debug("exit_handler: I am not parent");
  1591 POOL_SETMASK(&UnBlockSig);
* 1592 pool_shmem_exit(0);
  1593 exit(0);
  1594 }


○pool_sema.c
  57 semun.buf = &seminfo;
  58 if (semctl(semId, 0, IPC_STAT, semun) < 0
  59 && (errno == EINVAL || errno == EACCES
  60 #ifdef EIDRM
  61 || errno == EIDRM
  62 #endif
  63 ))
  64 return;
  65
  66 semun.val = 0; /* unused, but keep compiler quiet */
  67
* 68 if (semctl(semId, 0, IPC_RMID) < 0)
  69 pool_log("semctl(%lu, 0, IPC_RMID, ...) failed: %s", semId, strerror(errno));
  70 }


I create a patch for this error.

Would you please check the patch?
TagsNo tags attached.

Activities

uehara

2014-06-09 10:57

reporter  

patch.zip (863 bytes)

t-ishii

2014-06-09 11:12

developer   ~0000422

Thanks for the report and patches! I will look into this.

t-ishii

2014-06-09 21:38

developer   ~0000423

I could not reproduce "ERROR: pid 31472: semop(id=16187394) failed: Identifier removed". Instead I saw many errors like "semop(...) failed: Invalid arugument".

Clearly the call to pool_shmem_exit() in main.c is a bug and should be removed as you suggested in your patches. After removing the line I don't see the error any more. I have committed the changes into all supported branches.

However I am not so sure about the second part of your patches for pool_sema.c. If we don't call semctl() to remove semaphore when previous semctl() is interrupted, it puts us at risk not to remove semaphore, doesn't it?

uehara

2014-06-11 09:43

reporter   ~0000424

The modification of pool_sema.c may be a mistake.

----------

In my environment, error messages of two types is output.

Message1. failed: Invalid argument
Message2. failed: Identifier removed

Updated debug log of main.c and pool_sema.c to confirm route.

[main.c]
  1588 if (getpid() != mypid)
  1589 {
* 1590 pool_log("exit_handler: I am not parent");
  1591 POOL_SETMASK(&UnBlockSig);
  1592 pool_shmem_exit(0);
  1593 exit(0);
  1594 }

[pool_sema.c]
  66 semun.val = 0; /* unused, but keep compiler quiet */
* 67 pool_log("remove semId:%lu,semId");
  68 if (semctl(semId, 0, IPC_RMID) < 0)
  69 pool_log("semctl(%lu, 0, IPC_RMID, ...) failed: %s", semId, strerror(errno));
  70 }


Log of reproduction test using the debug logs that are updated.

  ...
  2014-06-11 09:26:40 LOG: pid 10508: do_child: exits with status 1 due to error
  2014-06-11 09:26:40 LOG: pid 10038: starting degeneration. shutdown host 192.168.1.3(5432)
  2014-06-11 09:26:40 ERROR: pid 10038: failover_handler: no valid DB node found
  2014-06-11 09:26:40 LOG: pid 10038: Restart all children
  2014-06-11 09:26:40 LOG: pid 10038: failover: set new primary node: -1
  2014-06-11 09:26:40 LOG: pid 10534: worker process received restart request
  (5432)
* 2014-06-11 09:26:40 LOG: pid 10577: exit_handler: I am not parent
* 2014-06-11 09:26:40 LOG: pid 10577: remove semId:0,semId
  2014-06-11 09:26:40 LOG: pid 10576: exit_handler: I am not parent
  2014-06-11 09:26:41 LOG: pid 10533: pcp child process received restart request
  2014-06-11 09:26:41 LOG: pid 10038: PCP child 10533 exits with status 256 in failover()
  2014-06-11 09:26:41 LOG: pid 10038: fork a new PCP child pid 10614 in failover()
  2014-06-11 09:26:41 LOG: pid 10038: worker child 10534 exits with status 256
  2014-06-11 09:26:41 LOG: pid 10038: fork a new worker child pid 10615
  2014-06-11 09:28:01 ERROR: pid 10613: semop(id=7700482) failed: Identifier removed
  2014-06-11 09:28:01 ERROR: pid 10613: semop(id=7700482) failed: Identifier removed
  ...


As a result, I confirmed that semctl() ran after processing of pool_shmem_exit().

So, semctl() may not have caused the problem.

I'm sorry.
Confirmation was leaking.

t-ishii

2014-06-11 15:01

developer   ~0000425

So just removing the call to pool_shmem_exit() in main.c solved your problem? I just want to make sure that.

uehara

2014-06-11 15:23

reporter   ~0000426

Thank you for your help.

My problem has been resolved.

t-ishii

2014-06-11 15:50

developer   ~0000427

Thanks. Glad to hear that!

I'm going to mark this issue as "resolved".

Again thanks for your findings and patches!

Issue History

Date Modified Username Field Change
2014-06-09 10:57 uehara New Issue
2014-06-09 10:57 uehara File Added: patch.zip
2014-06-09 11:12 t-ishii Note Added: 0000422
2014-06-09 11:12 t-ishii Assigned To => t-ishii
2014-06-09 11:12 t-ishii Status new => assigned
2014-06-09 21:38 t-ishii Note Added: 0000423
2014-06-09 22:01 t-ishii Changeset attached => pgpool2 V3_1_STABLE 6d699404
2014-06-09 22:01 t-ishii Changeset attached => pgpool2 V3_0_STABLE 4edde934
2014-06-11 09:43 uehara Note Added: 0000424
2014-06-11 15:01 t-ishii Note Added: 0000425
2014-06-11 15:23 uehara Note Added: 0000426
2014-06-11 15:50 t-ishii Note Added: 0000427
2014-06-11 15:51 t-ishii Status assigned => resolved