[pgpool-general: 9500] Re: Weird error in pgpool 4.6 logs: WARNING: failed to lock semaphore

TV tvfan2014 at gmail.com
Mon Jun 2 23:54:33 JST 2025


Here is the strace result; hope this helps in some way. It takes about 10
minutes for the semaphores to 'go away'...
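
In case it helps narrow things down: a SysV semaphore set is removed with
semctl(..., IPC_RMID), so grepping the trace, e.g.

grep -nE 'semctl|IPC_RMID' strace.out

should show whether any of the traced pgpool processes removed the semaphore
themselves (assuming the attached strace.out is the file written by -o).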


On Mon, Jun 2, 2025 at 2:53 PM Tatsuo Ishii <ishii at postgresql.org> wrote:

> You have 4 pgpool nodes and you issue the queries through one of the
> pgpools. Suppose you issued the queries through node 0; the "status"
> field is only updated on node 0. On the other 3 nodes the status remains
> "waiting". However, the status on all 4 nodes will be synced when the
> status becomes "down", or becomes "up" when the former status was
> "down".
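>
> (If you want to confirm this, you could run pcp_node_info against each of the
> 4 pgpool hosts and compare; just a sketch, the host names are placeholders:
>
> for h in pgpool0 pgpool1 pgpool2 pgpool3; do
>     pcp_node_info -h $h -p 9898 -U pcpadmin -n 3
> done
>
> Only the pgpool you actually issue queries through would be expected to show
> a status other than "waiting".)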
>
> > That's where I wonder if this is working properly, because there seem to be
> > clients/connections happening. I ran the backend status command and this is
> > what it showed:
> >
> > node_id | hostname        | port | status | role    | select_cnt | insert_cnt | update_cnt | delete_cnt | ddl_cnt | other_cnt | panic_cnt | fatal_cnt | error_cnt
> > --------+-----------------+------+--------+---------+------------+------------+------------+------------+---------+-----------+-----------+-----------+----------
> > 0       | db-primarypriv  | 5434 | up     | primary | 7552       | 47         | 332        | 27         | 0       | 13678     | 0         | 0         | 6
> > 1       | db-replica1priv | 5434 | up     | standby | 6817       | 0          | 0          | 0          | 0       | 13682     | 0         | 0         | 0
> > 2       | db-replica2priv | 5434 | up     | standby | 6514       | 0          | 0          | 0          | 0       | 13693     | 0         | 0         | 1
> > 3       | db-replica3priv | 5434 | up     | standby | 6910       | 0          | 0          | 0          | 0       | 13679     | 0         | 0         | 1
> > (4 rows)
> >
> > So plenty of queries supposedly happening on them, but the nodes still show
> > status 1.  What gives?
> >
> >
> > On Mon, Jun 2, 2025 at 1:43 PM Tatsuo Ishii <ishii at postgresql.org> wrote:
> >
> >> > Hi, another thing I noticed - the status of all those nodes in
> >> > pcp_node_info seems to be wrong, for example:
> >> > $ pcp_node_info -h localhost -p 9898 -U pcpadmin --verbose -n 3
> >> > Hostname               : db-replica2
> >> > Port                   : 5434
> >> > Status                 : 1
> >> > Weight                 : 0.250000
> >> > Status Name            : waiting
> >> > Backend Status Name    : up
> >> > Role                   : standby
> >> > Backend Role           : standby
> >> > Replication Delay      : 0
> >> > Replication State      : streaming
> >> > Replication Sync State : async
> >> > Last Status Change     : 2025-06-02 12:31:46
> >>
> >> Are you saying this is wrong?
> >>
> >> > Status                 : 1
> >>
> >> Seems normal to me because it just means no client has connected to this
> >> pgpool server yet.
> >>
> >> > Do you think this is because of the semaphore problem, or does it suggest
> >> > something else completely?
> >> >
> >> > On Mon, Jun 2, 2025 at 12:59 PM Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >
> >> >> > Hi, I changed the pgpool.service file for startup as you said, and
> >> >> > replaced my start line with this:
> >> >> > ExecStart=strace -f -tt --trace=ipc /usr/sbin/pgpool -f
> >> >> > /etc/pgpool2/pgpool.conf $OPTS
> >> >> > I'm attaching the log from the replica again, but I'm not sure this
> >> >> > actually worked (as in, it doesn't look much different, to me at least).
> >> >>
> >> >> Yeah. It seems there's strace output in the log.
> >> >> Can you try the "-o" option of strace, which tells strace to write its
> >> >> output to the given file? e.g.
> >> >>
> >> >> ExecStart=strace -f -tt --trace=ipc -o /tmp/strace.out /usr/sbin/pgpool -f
> >> >> /etc/pgpool2/pgpool.conf $OPTS
> >> >>
> >> >> > On Mon, Jun 2, 2025 at 11:47 AM Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >> >
> >> >> >> > Hi,
> >> >> >> > I tried the strace as you showed, but it didn't really output anything:
> >> >> >> >
> >> >> >> > $ sudo strace -p 1662563 -f -tt --trace=ipc
> >> >> >> > strace: Process 1662563 attached
> >> >> >> >
> >> >> >> > ^Cstrace: Process 1662563 detached.   <--- this is me getting out of it,
> >> >> >> > since I saw in another window that the semaphores were gone already,
> >> >> >> > without anything being output here.  Unless something's wrong with the
> >> >> >> > syntax of the command, it didn't really output anything.  I changed the
> >> >> >> > log line you suggested as well.
> >> >> >> >
> >> >> >> > Also, FYI - I tried to downgrade to a lower version (4.3.7) as it was
> >> >> >> > available in my package repository, and the same semaphore issue
> >> >> >> > persists,
> >> >> >>
> >> >> >> Okay. I think you need to start pgpool from strace. I mean, modify
> >> >> >> pgpool.service from:
> >> >> >> ExecStart=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf $OPTS
> >> >> >> to:
> >> >> >> ExecStart=strace -f -tt --trace=ipc /usr/sbin/pgpool -f
> >> >> >> /etc/pgpool2/pgpool.conf $OPTS
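> >> >> >>
> >> >> >> (After editing the unit file, systemd has to pick up the change before the
> >> >> >> restart; roughly - the unit name may be pgpool or pgpool2 depending on the
> >> >> >> package:
> >> >> >>
> >> >> >> sudo systemctl daemon-reload
> >> >> >> sudo systemctl restart pgpool)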
> >> >> >>
> >> >> >> > On Mon, Jun 2, 2025 at 2:40 AM Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >> >> >
> >> >> >> >> > Hi,
> >> >> >> >> > Not sure I can patch the source, as it was installed via apt-get and not
> >> >> >> >> > compiled from source.  If that is doable with a package-built one and you
> >> >> >> >> > can give me some instructions, I can certainly try.
> >> >> >> >>
> >> >> >> >> Unfortunately I'm not familiar with debian packaging.
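> >> >> >> >>
> >> >> >> >> (That said, the usual Debian source-rebuild flow is roughly the following;
> >> >> >> >> treat it as an untested sketch - the package name pgpool2 and the patch
> >> >> >> >> path are assumptions, and deb-src entries must be enabled in apt:
> >> >> >> >>
> >> >> >> >> sudo apt-get build-dep pgpool2
> >> >> >> >> apt-get source pgpool2
> >> >> >> >> cd pgpool2-*/
> >> >> >> >> patch -p1 < /path/to/fix.patch
> >> >> >> >> dpkg-buildpackage -us -uc -b
> >> >> >> >> sudo dpkg -i ../pgpool2_*.deb)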
> >> >> >> >>
> >> >> >> >> > As for the pgpool log, I'm attaching it; hopefully it will go through,
> >> >> >> >> > it's about 5 MB as I enabled higher debug options.
> >> >> >> >>
> >> >> >> >> It was sent as a direct mail to me. Not sure who/what removed the
> >> >> >> >> semaphore. I observe the following from the log anyway:
> >> >> >> >>
> >> >> >> >> - pgpool started at 16:10:21. At that time the semaphore definitely existed.
> >> >> >> >> - Until 16:15:13, everything worked great.
> >> >> >> >> - At 16:15:23, the problem started.
> >> >> >> >>
> >> >> >> >> So it seems the semaphore was removed by someone between 16:15:13 and
> >> >> >> >> 16:15:23.
> >> >> >> >>
> >> >> >> >> Unfortunately the removal of the semaphore was not recorded in the log.
> >> >> >> >> To determine who removed the semaphore, can you please attach strace to
> >> >> >> >> the pgpool main process and share the log? This time the debug log is not
> >> >> >> >> necessary.
> >> >> >> >>
> >> >> >> >> - Find the pgpool parent process pid (in the debug log it was 1483510)
> >> >> >> >> - Become root (attaching strace usually requires root privilege)
> >> >> >> >> - Attach strace to the process $pid
> >> >> >> >>
> >> >> >> >> strace -p $pid -f -tt --trace=ipc
> >> >> >> >>
> >> >> >> >> -f: follow the child processes of pgpool main
> >> >> >> >> -tt: add timestamps (with microseconds)
> >> >> >> >> --trace=ipc: limit the traced system calls to ipc to reduce the log size
> >> >> >> >>
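> >> >> >> >> (If it is easier, the parent pid can also be taken from systemd; just a
> >> >> >> >> sketch, and the unit name pgpool is an assumption:
> >> >> >> >>
> >> >> >> >> pid=$(systemctl show -p MainPID --value pgpool)
> >> >> >> >> sudo strace -p "$pid" -f -tt --trace=ipc -o /tmp/strace.out)
> >> >> >> >>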
> >> >> >> >> BTW, I recommend you set log_line_prefix to something like this so that
> >> >> >> >> you get millisecond timestamps (%m) and the application name (%a) in the
> >> >> >> >> log. They are usable in 4.6.
> >> >> >> >>
> >> >> >> >> log_line_prefix = '%m: %a pid %p: '
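> >> >> >> >>
> >> >> >> >> (With that prefix, a log line would look roughly like this - the values
> >> >> >> >> here are made up:
> >> >> >> >>
> >> >> >> >> 2025-06-02 12:00:00.123: psql pid 12345: LOG:  ...)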
> >> >> >> >>
> >> >> >> >> > On Sun, Jun 1, 2025 at 4:20 PM Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >> >> >> >
> >> >> >> >> >> > To your points/questions:
> >> >> >> >> >> > 1) version is 4.6.1 - the source I'm using (via apt-get) doesn't
> >> >> >> >> >> > have 4.6.2 yet
> >> >> >> >> >>
> >> >> >> >> >> Ok.
> >> >> >> >> >>
> >> >> >> >> >> > 2) ipcs -s shows... nothing, literally.  Checked all nodes, i.e.
> >> >> >> >> >> >
> >> >> >> >> >> > postgres at db-replica3:~$ ipcs -s
> >> >> >> >> >> >
> >> >> >> >> >> > ------ Semaphore Arrays --------
> >> >> >> >> >> > key        semid      owner      perms      nsems
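> >> >> >> >> >> >
> >> >> >> >> >> > (One thing I can do is keep watching it while pgpool runs, e.g.
> >> >> >> >> >> >
> >> >> >> >> >> > watch -n 5 'ipcs -s'
> >> >> >> >> >> >
> >> >> >> >> >> > to see roughly when the semaphore array disappears.)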
> >> >> >> >> >> >
> >> >> >> >> >> > Also, it kinda seems like "failed to unlock semaphore" shows during
> >> >> >> >> >> > startup of pgpool and while it's running, but maybe I'm not reading
> >> >> >> >> >> > the log properly.
> >> >> >> >> >>
> >> >> >> >> >> So pgpool lost the semaphore soon after starting. Can you share the full
> >> >> >> >> >> pgpool log from before "failed to unlock semaphore" appears?
> >> >> >> >> >>
> >> >> >> >> >> > 3) regarding heartbeats, my pgpool.conf has it roughly like this:
> >> >> >> >> >> >
> >> >> >> >> >> > heartbeat_hostname0 = 'db-primary-priv'
> >> >> >> >> >> > heartbeat_port0 = 9694
> >> >> >> >> >> > heartbeat_device0 = ''
> >> >> >> >> >> >
> >> >> >> >> >> > heartbeat_hostname1 = 'db-replica1-priv'
> >> >> >> >> >> > heartbeat_port1 = 9694
> >> >> >> >> >> > heartbeat_device1 = ''
> >> >> >> >> >> >
> >> >> >> >> >> > heartbeat_hostname2 = 'db-replica2-priv'
> >> >> >> >> >> > heartbeat_port2 = 9694
> >> >> >> >> >> > heartbeat_device2 = ''
> >> >> >> >> >> >
> >> >> >> >> >> > heartbeat_hostname3 = 'db-replica3-priv'
> >> >> >> >> >> > heartbeat_port3 = 9694
> >> >> >> >> >> > heartbeat_device3 = ''
> >> >> >> >> >>
> >> >> >> >> >> Looks normal.
> >> >> >> >> >>
> >> >> >> >> >> I have checked the pgpool source code and found suspicious code. Before
> >> >> >> >> >> 4.6, the pgpool heartbeat receiver process binds to any address on the
> >> >> >> >> >> host. However, from 4.6 pgpool binds only to the loopback address, which
> >> >> >> >> >> is plain wrong. Is it possible for you to patch the pgpool source code
> >> >> >> >> >> and build a pgpool binary?
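> >> >> >> >> >>
> >> >> >> >> >> (One way to check this on your side should be to look at what address the
> >> >> >> >> >> heartbeat receive socket is bound to, assuming the heartbeat port 9694
> >> >> >> >> >> from your config:
> >> >> >> >> >>
> >> >> >> >> >> sudo ss -ulpn | grep 9694
> >> >> >> >> >>
> >> >> >> >> >> If it shows 127.0.0.1:9694 or [::1]:9694 rather than 0.0.0.0:9694 or the
> >> >> >> >> >> ens19 address, that would match this suspicion.)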
> >> >> >> >> >>
> >> >> >> >> >> > So it seems likely that the inability to lock/unlock semaphores is the
> >> >> >> >> >> > problem; the question is why, and how do I resolve it? :(
> >> >> >> >> >>
> >> >> >> >> >> Currently I have no idea. That's why I am requesting the pgpool log.
> >> >> >> >> >>
> >> >> >> >> >> > I'll c/p parts of my pgpool.service file as well, in case you see
> >> >> >> >> >> > something 'off' in it.
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > [Service]
> >> >> >> >> >> > User=postgres
> >> >> >> >> >> > Group=postgres
> >> >> >> >> >> >
> >> >> >> >> >> > EnvironmentFile=-/etc/default/pgpool2
> >> >> >> >> >> >
> >> >> >> >> >> > ExecStart=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf $OPTS
> >> >> >> >> >> > ExecStop=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf $STOP_OPTS stop
> >> >> >> >> >> > ExecReload=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf reload
> >> >> >> >> >> > LimitNOFILE=65536:65536
> >> >> >> >> >> > LimitNPROC=16384:16384
> >> >> >> >> >> > LimitSTACK=10485760:33554432
> >> >> >> >> >> > LimitMEMLOCK=infinity
> >> >> >> >> >> > LimitDATA=infinity
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > Thanks in advance.
> >> >> >> >> >> >
> >> >> >> >> >> > On Sun, Jun 1, 2025 at 2:06 PM Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >> >> >> >> >
> >> >> >> >> >> >> > Hi,
> >> >> >> >> >> >> > So long story short, I'm migrating a PG13 cluster from RHEL to Ubuntu
> >> >> >> >> >> >> > 24.04. In the process, the pgpool version is also being upgraded to 4.6
> >> >> >> >> >> >> > (from 4.2).
> >> >> >> >> >> >>
> >> >> >> >> >> >> What is the exact version of 4.6? I am asking because 4.6.0 has a
> >> >> >> >> >> >> vulnerability:
> >> >> >> >> >> >>
> >> >> >> >> >> >> https://pgpool.net/mediawiki/index.php/Main_Page#Pgpool-II_4.6.1.2C_4.5.7.2C_4.4.12.2C_4.3.15_and_4.2.22_officially_released_.282025.2F05.2F15.29
> >> >> >> >> >> >>
> >> >> >> >> >> >> Also 4.6.1 (and 4.6.0) have an issue with IPv6.
> >> >> >> >> >> >>
> >> https://www.pgpool.net/docs/latest/en/html/release-4-6-2.html
> >> >> >> >> >> >>
> >> >> >> >> >> >> If possible, I recommend upgrading to 4.6.2.
> >> >> >> >> >> >>
> >> >> >> >> >> >> > There is also repmgr running on this 4-node cluster, 1 primary
> >> >> >> >> >> >> > and 3 replicas.
> >> >> >> >> >> >>
> >> >> >> >> >> >> I am not familiar with repmgr, so I will ignore this part.
> >> >> >> >> >> >>
> >> >> >> >> >> >> > All the migration seems to have completed successfully,
> >> >> >> >> >> >> > but I've run into a weird problem with pgpool.  I have quite a few errors
> >> >> >> >> >> >> > that look like the entries below, and they seem to "come and go" when
> >> >> >> >> >> >> > checking the pgpool log.  The process itself seems to work (I think), but
> >> >> >> >> >> >> > every now and then, these errors (warnings?) show up:
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628043: DEBUG:  watchdog
> >> heartbeat:
> >> >> >> send
> >> >> >> >> >> >> heartbeat
> >> >> >> >> >> >> > signal to db-replica3-priv:9694
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628043: LOCATION:
> >> >> wd_heartbeat.c:771
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: LOG:  exit handler
> called
> >> >> >> >> (signal: 2)
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: LOCATION:
> >> >> pgpool_main.c:1272
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: WARNING:  failed to lock semaphore
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: DETAIL:  Invalid argument
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: LOCATION:  pool_sema.c:129
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: WARNING:  failed to
> >> unlock
> >> >> >> >> semaphore
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: DETAIL:  Invalid
> argument
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: LOCATION:
> >> pool_sema.c:202
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: LOG:  shutting down by
> >> >> signal 2
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: LOCATION:
> >> >> pgpool_main.c:1324
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: LOG:  terminating all
> >> child
> >> >> >> >> processes
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 627958: LOCATION:
> >> >> pgpool_main.c:1334
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628047: WARNING:  failed to
> lock
> >> >> >> semaphore
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628047: DETAIL:  Invalid
> argument
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628047: LOCATION:
> >> pool_sema.c:129
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628042: WARNING:  failed to
> lock
> >> >> >> semaphore
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628042: DETAIL:  Invalid
> argument
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628042: LOCATION:
> >> pool_sema.c:129
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628039: WARNING:  failed to
> lock
> >> >> >> semaphore
> >> >> >> >> >> >> > 2025-06-01 09:11:00: pid 628039: DETAIL:  Invalid
> argument
> >> >> >> >> >> >>
> >> >> >> >> >> >> Can you share the result of "ipcs -s" so that I can confirm pgpool
> >> >> >> >> >> >> properly allocates the semaphore.
> >> >> >> >> >> >>
> >> >> >> >> >> >> By the way, do you see "WARNING: failed to unlock semaphore" only when
> >> >> >> >> >> >> pgpool is shutting down?
> >> >> >> >> >> >>
> >> >> >> >> >> >> > Also, earlier in the log, I see errors about being unable to create a
> >> >> >> >> >> >> > socket; for some reason it's trying (and failing) to create a socket on
> >> >> >> >> >> >> > localhost (127.0.0.1:9694) and "creating socket on ::1:9694".  Are these
> >> >> >> >> >> >> > "ok to ignore"?
> >> >> >> >> >> >>
> >> >> >> >> >> >> First of all, "creating socket on ::1:9694" looks unusual because it
> >> >> >> >> >> >> means that heartbeat_hostname is set to "::1", which is the IPv6 loopback
> >> >> >> >> >> >> address. heartbeat_hostname is expected to be set to an IPv4 or IPv6
> >> >> >> >> >> >> inet address (not a loopback address).
> >> >> >> >> >> >>
> >> >> >> >> >> >> For now, I would like to wait for your reply regarding my questions
> >> >> >> >> >> >> above.
> >> >> >> >> >> >>
> >> >> >> >> >> >> Best regards,
> >> >> >> >> >> >> --
> >> >> >> >> >> >> Tatsuo Ishii
> >> >> >> >> >> >> SRA OSS K.K.
> >> >> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >> >> >> >> >> >> Japanese:http://www.sraoss.co.jp
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: DETAIL:  creating
> socket
> >> on
> >> >> >> >> >> >> 127.0.0.1:9694
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: LOCATION:
> >> >> wd_heartbeat.c:385
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: LOG:  set SO_REUSEPORT
> >> >> option
> >> >> >> to
> >> >> >> >> the
> >> >> >> >> >> >> socket
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: LOCATION:
> >> >> wd_heartbeat.c:891
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: LOG:  creating
> watchdog
> >> >> >> heartbeat
> >> >> >> >> >> >> receive
> >> >> >> >> >> >> > socket.
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: DETAIL:  creating
> socket
> >> on
> >> >> >> >> ::1:9694
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: LOCATION:
> >> >> wd_heartbeat.c:385
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: LOG:  failed to create
> >> >> watchdog
> >> >> >> >> >> >> heartbeat
> >> >> >> >> >> >> > receive socket. retrying...
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: DETAIL:  bind failed
> with
> >> >> >> reason:
> >> >> >> >> >> >> "Cannot
> >> >> >> >> >> >> > assign requested address"
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634772: LOCATION:
> >> >> wd_heartbeat.c:394
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: DEBUG:
> >> >> >> >> verify_backend_node_status:
> >> >> >> >> >> >> > true_primary 0
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: LOCATION:
> >> >> pgpool_main.c:2704
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: DEBUG:  node
> status[0]: 1
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: LOCATION:
> >> >> >> pool_worker_child.c:267
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: DEBUG:  node
> status[1]: 2
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: LOCATION:
> >> >> >> pool_worker_child.c:267
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: DEBUG:  node
> status[2]: 2
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: LOCATION:
> >> >> >> pool_worker_child.c:267
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: DEBUG:  node
> status[3]: 2
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: LOCATION:
> >> >> >> pool_worker_child.c:267
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: DEBUG:
> >> >> >> >> >> pool_release_follow_primary_lock
> >> >> >> >> >> >> > called
> >> >> >> >> >> >> > 2025-06-01 09:11:22: pid 635274: LOCATION:
> >> >> pgpool_main.c:4106
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOG:  set SO_REUSEPORT
> >> >> option
> >> >> >> to
> >> >> >> >> the
> >> >> >> >> >> >> socket
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOCATION:
> >> >> wd_heartbeat.c:891
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOG:  creating
> watchdog
> >> >> >> heartbeat
> >> >> >> >> >> >> receive
> >> >> >> >> >> >> > socket.
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: DETAIL:  creating
> socket
> >> on
> >> >> >> >> >> >> 127.0.0.1:9694
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOCATION:
> >> >> wd_heartbeat.c:385
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOG:  set SO_REUSEPORT
> >> >> option
> >> >> >> to
> >> >> >> >> the
> >> >> >> >> >> >> socket
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOCATION:
> >> >> wd_heartbeat.c:891
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOG:  creating
> watchdog
> >> >> >> heartbeat
> >> >> >> >> >> >> receive
> >> >> >> >> >> >> > socket.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: DETAIL:  creating socket on ::1:9694
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOCATION:  wd_heartbeat.c:385
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOG:  failed to create watchdog
> >> >> >> >> >> >> > heartbeat receive socket. retrying...
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: DETAIL:  bind failed with reason:
> >> >> >> >> >> >> > "Cannot assign requested address"
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634765: LOCATION:
> >> >> wd_heartbeat.c:394
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634767: LOG:  set SO_REUSEPORT
> >> >> option
> >> >> >> to
> >> >> >> >> the
> >> >> >> >> >> >> socket
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634767: LOCATION:
> >> >> wd_heartbeat.c:891
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634767: LOG:  creating socket
> for
> >> >> >> sending
> >> >> >> >> >> >> heartbeat
> >> >> >> >> >> >> > 2025-06-01 09:11:23: pid 634767: DETAIL:  set
> SO_REUSEPORT
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Other odd errors I see in the logs later on are when pgpool tries to run
> >> >> >> >> >> >> > the escalation script - I can't quite find where it's failing to find the
> >> >> >> >> >> >> > ipv4 address:
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > 2025-06-01 00:07:15: pid 395780: LOG:  Pgpool-II parent
> >> >> process
> >> >> >> >> >> received
> >> >> >> >> >> >> > SIGUSR1
> >> >> >> >> >> >> > 2025-06-01 00:07:15: pid 395780: LOG:  Pgpool-II parent
> >> >> process
> >> >> >> >> >> received
> >> >> >> >> >> >> > watchdog quorum change signal from watchdog
> >> >> >> >> >> >> > 2025-06-01 00:07:15: pid 396973: LOG:  watchdog:
> >> escalation
> >> >> >> started
> >> >> >> >> >> >> > 2025-06-01 00:07:15: pid 395786: LOG:  Setting failover
> >> >> command
> >> >> >> >> >> timeout
> >> >> >> >> >> >> to
> >> >> >> >> >> >> > 135
> >> >> >> >> >> >> > 2025-06-01 00:07:15: pid 395780: LOG:  watchdog cluster
> >> now
> >> >> >> holds
> >> >> >> >> the
> >> >> >> >> >> >> quorum
> >> >> >> >> >> >> > 2025-06-01 00:07:15: pid 395780: DETAIL:  updating the
> >> state
> >> >> of
> >> >> >> >> >> >> quarantine
> >> >> >> >> >> >> > backend nodes
> >> >> >> >> >> >> > Sun Jun  1 12:07:15 AM CEST 2025
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > escalation.sh: - remove VIP IP from all nodes
> >> >> >> >> >> >> > Node: db-primary-priv
> >> >> >> >> >> >> > Error: ipv4: Address not found.
> >> >> >> >> >> >> > Node: db-replica1-priv
> >> >> >> >> >> >> > Error: ipv4: Address not found.
> >> >> >> >> >> >> > Node: db-replica2-priv
> >> >> >> >> >> >> > Error: ipv4: Address not found.
> >> >> >> >> >> >> > Node: db-replica3-priv
> >> >> >> >> >> >> > Error: ipv4: Address not found.
> >> >> >> >> >> >> > 2025-06-01 00:07:20: pid 396973: LOG:  watchdog escalation successful
> >> >> >> >> >> >> > 2025-06-01 00:07:23: pid 396973: LOG:  successfully
> >> acquired
> >> >> the
> >> >> >> >> >> delegate
> >> >> >> >> >> >> > IP:"192.168.77.130"
> >> >> >> >> >> >> > 2025-06-01 00:07:23: pid 396973: DETAIL:  'if_up_cmd'
> >> >> returned
> >> >> >> with
> >> >> >> >> >> >> success
> >> >> >> >> >> >> > 2025-06-01 00:07:23: pid 395786: LOG:  watchdog
> escalation
> >> >> >> process
> >> >> >> >> >> with
> >> >> >> >> >> >> > pid: 396973 exit with SUCCESS.
> >> >> >> >> >> >> > 2025-06-01 00:07:27: pid 395962: WARNING:  failed to
> lock
> >> >> >> semaphore
> >> >> >> >> >> >> > 2025-06-01 00:07:27: pid 395962: DETAIL:  Invalid
> argument
> >> >> >> >> >> >> > 2025-06-01 00:07:27: pid 395962: WARNING:  failed to
> >> unlock
> >> >> >> >> semaphore
> >> >> >> >> >> >> > 2025-06-01 00:07:27: pid 395962: DETAIL:  Invalid
> argument
> >> >> >> >> >> >> > 2025-06-01 00:07:28: pid 396924: WARNING:  failed to
> lock
> >> >> >> semaphore
> >> >> >> >> >> >> > 2025-06-01 00:07:28: pid 396924: DETAIL:  Invalid
> argument
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > The way I started was I copied pgpool.conf from the "old" and working
> >> >> >> >> >> >> > pgpool 4.2 configuration and modified it slightly, though nothing major; I
> >> >> >> >> >> >> > changed hostnames, and e.g. the delegate_ip variable name was changed from
> >> >> >> >> >> >> > delegate_IP.  I also added this variable, as without it I didn't see a
> >> >> >> >> >> >> > socket created for port 9898:
> >> >> >> >> >> >> > unix_socket_directories = '/var/run/postgresql'
> >> >> >> >> >> >> > I had to change the network interfaces as well; on RHEL it was using eth0
> >> >> >> >> >> >> > whereas on Ubuntu it's ens18, so I changed lines like:
> >> >> >> >> >> >> > /usr/bin/sudo /sbin/ip addr add $_IP_$/23 dev ens18 label ens18:0
> >> >> >> >> >> >> > The other thing I had to change was the path to arping, as on Ubuntu 24
> >> >> >> >> >> >> > it's in /usr/bin (not sbin):
> >> >> >> >> >> >> > arping_cmd = '/usr/bin/sudo /usr/bin/arping -U $_IP_$ -w 1 -I ens18'
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > The other thing I'm confused about: I have two network interfaces on each
> >> >> >> >> >> >> > host, i.e.
> >> >> >> >> >> >> > 192.168.77.135/23 brd 192.168.77.255 scope global ens18
> >> >> >> >> >> >> > and
> >> >> >> >> >> >> > inet 192.168.78.135/24 brd 192.168.78.255 scope global ens19
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > In pgpool.conf, it is referencing the "priv" hosts (i.e.
> >> >> >> >> >> >> > heartbeat_hostname2 = 'db-replica2-priv'), which theoretically should be
> >> >> >> >> >> >> > on the ens19 network interface as shown above, but my old pgpool.conf was
> >> >> >> >> >> >> > only referencing eth0 on RHEL, even though the "priv" hosts were on eth1.
> >> >> >> >> >> >> > I'm wondering if that needs to be changed on Ubuntu, or does it make no
> >> >> >> >> >> >> > difference here?
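> >> >> >> >> >> >> >
> >> >> >> >> >> >> > (To check which interface the "priv" names actually map to, I assume
> >> >> >> >> >> >> > something like this is enough:
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > getent hosts db-replica2-priv
> >> >> >> >> >> >> > ip -o -4 addr show
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > i.e. if the name resolves into 192.168.78.0/24, the heartbeat traffic
> >> >> >> >> >> >> > should be going over ens19.)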
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Right now, I can't quite figure out whether it is something with
> >> >> >> >> >> >> > pgpool.conf or with the Ubuntu settings, and the semaphore error is quite
> >> >> >> >> >> >> > cryptic.  If you need me to post some parts of my pgpool.conf, please
> >> >> >> >> >> >> > advise which.  I've seen an older version of the FAQ that said repmgr and
> >> >> >> >> >> >> > pgpool don't work together, but then I saw mention that this was no longer
> >> >> >> >> >> >> > the case.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > I would much appreciate any help you can provide in solving this problem.
> >> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace.out
Type: application/octet-stream
Size: 44819 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20250602/a45ad60b/attachment-0001.obj>

