[pgpool-general: 9489] Re: Weird error in pgpool 4.6 logs: WARNING: failed to lock semaphore
Tatsuo Ishii
ishii at postgresql.org
Sun Jun 1 21:06:34 JST 2025
> Hi,
> So long story short, I'm migrating a PG13 cluster from RHEL to Ubuntu
> 24.04. In the process, the pgpool version is also being upgraded to 4.6
> (from 4.2).
What is the exact version of 4.6? I am asking because 4.6.0 has a vulnerability:
https://pgpool.net/mediawiki/index.php/Main_Page#Pgpool-II_4.6.1.2C_4.5.7.2C_4.4.12.2C_4.3.15_and_4.2.22_officially_released_.282025.2F05.2F15.29
Also 4.6.1 (and 4.6.0) have an issue with IPv6.
https://www.pgpool.net/docs/latest/en/html/release-4-6-2.html
If possible, I recommend upgrading to 4.6.2.
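To confirm the exact version that is installed, something like the
following should work (the output line is illustrative):

$ pgpool --version
pgpool-II version 4.6.2 (...)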
> There is also repmgr running on this 4 node cluster, 1 primary
> and 3 replicas.
I am not familiar with repmgr, so I will skip this part.
> All the migration seems to have completed successfully,
> but I've run into a weird problem with pgpool. I have quite a few errors
> that look like the entries below, and they seem to "come and go" when
> checking the pgpool log. The process itself seems to work (I think), but
> every now and then, these errors (warnings?) show up:
>
> 2025-06-01 09:11:00: pid 628043: DEBUG: watchdog heartbeat: send heartbeat
> signal to db-replica3-priv:9694
> 2025-06-01 09:11:00: pid 628043: LOCATION: wd_heartbeat.c:771
> 2025-06-01 09:11:00: pid 627958: LOG: exit handler called (signal: 2)
> 2025-06-01 09:11:00: pid 627958: LOCATION: pgpool_main.c:1272
>
>
> 2025-06-01 09:11:00: pid 627958: WARNING: failed to lock semaphore
> 2025-06-01 09:11:00: pid 627958: DETAIL: Invalid argument
> 2025-06-01 09:11:00: pid 627958: LOCATION: pool_sema.c:129
> 2025-06-01 09:11:00: pid 627958: WARNING: failed to unlock semaphore
> 2025-06-01 09:11:00: pid 627958: DETAIL: Invalid argument
> 2025-06-01 09:11:00: pid 627958: LOCATION: pool_sema.c:202
> 2025-06-01 09:11:00: pid 627958: LOG: shutting down by signal 2
> 2025-06-01 09:11:00: pid 627958: LOCATION: pgpool_main.c:1324
> 2025-06-01 09:11:00: pid 627958: LOG: terminating all child processes
> 2025-06-01 09:11:00: pid 627958: LOCATION: pgpool_main.c:1334
> 2025-06-01 09:11:00: pid 628047: WARNING: failed to lock semaphore
> 2025-06-01 09:11:00: pid 628047: DETAIL: Invalid argument
> 2025-06-01 09:11:00: pid 628047: LOCATION: pool_sema.c:129
> 2025-06-01 09:11:00: pid 628042: WARNING: failed to lock semaphore
> 2025-06-01 09:11:00: pid 628042: DETAIL: Invalid argument
> 2025-06-01 09:11:00: pid 628042: LOCATION: pool_sema.c:129
> 2025-06-01 09:11:00: pid 628039: WARNING: failed to lock semaphore
> 2025-06-01 09:11:00: pid 628039: DETAIL: Invalid argument
Can you share the result of "ipcs -s" so that I can confirm that pgpool
properly allocates the semaphore?
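For reference, on a healthy node the listing should include a System V
semaphore set owned by the user pgpool runs as. A minimal check, with
purely illustrative key/semid/nsems values and assuming pgpool runs as
the postgres user:

$ ipcs -s

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 32768      postgres   600        7

Note that semop() returns EINVAL ("Invalid argument") when the semaphore
set it references no longer exists, so an empty listing here would
suggest the set was removed while pgpool was still running (on
systemd-based systems, logind's RemoveIPC=yes can remove a user's System
V IPC objects when that user's last session ends).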
By the way, do you see "WARNING: failed to unlock semaphore" only when
pgpool is shutting down?
> Also, earlier in the log, I see errors about being unable to create a
> socket; for some reason it's trying (and failing) to create a socket on
> localhost (127.0.0.1:9694) and "creating socket on ::1:9694". Are these
> "ok to ignore"?
First of all, "creating socket on ::1:9694" looks unusual, because it
means that heartbeat_hostname is set to "::1", which is the IPv6
loopback address. heartbeat_hostname is expected to be set to an IPv4
or IPv6 inet address (not a loopback address).
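As a sketch, assuming the heartbeat traffic should run over your private
ens19 network (the host names and device below are taken from your
description; adjust the indexes to your node layout), the per-node
settings would look something like:

heartbeat_hostname0 = 'db-primary-priv'   # inet address or resolvable host name, not ::1 or 127.0.0.1
heartbeat_port0 = 9694
heartbeat_device0 = 'ens19'               # interface carrying the -priv network
heartbeat_hostname1 = 'db-replica1-priv'
heartbeat_port1 = 9694
heartbeat_device1 = 'ens19'
# ...and so on for the remaining nodes.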
For now, I would like to wait for your reply regarding my questions
above.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
> 2025-06-01 09:11:23: pid 634772: DETAIL: creating socket on 127.0.0.1:9694
> 2025-06-01 09:11:23: pid 634772: LOCATION: wd_heartbeat.c:385
> 2025-06-01 09:11:23: pid 634772: LOG: set SO_REUSEPORT option to the socket
> 2025-06-01 09:11:23: pid 634772: LOCATION: wd_heartbeat.c:891
> 2025-06-01 09:11:23: pid 634772: LOG: creating watchdog heartbeat receive
> socket.
> 2025-06-01 09:11:23: pid 634772: DETAIL: creating socket on ::1:9694
> 2025-06-01 09:11:23: pid 634772: LOCATION: wd_heartbeat.c:385
> 2025-06-01 09:11:23: pid 634772: LOG: failed to create watchdog heartbeat
> receive socket. retrying...
> 2025-06-01 09:11:23: pid 634772: DETAIL: bind failed with reason: "Cannot
> assign requested address"
> 2025-06-01 09:11:23: pid 634772: LOCATION: wd_heartbeat.c:394
>
>
> 2025-06-01 09:11:22: pid 635274: DEBUG: verify_backend_node_status:
> true_primary 0
> 2025-06-01 09:11:22: pid 635274: LOCATION: pgpool_main.c:2704
> 2025-06-01 09:11:22: pid 635274: DEBUG: node status[0]: 1
> 2025-06-01 09:11:22: pid 635274: LOCATION: pool_worker_child.c:267
> 2025-06-01 09:11:22: pid 635274: DEBUG: node status[1]: 2
> 2025-06-01 09:11:22: pid 635274: LOCATION: pool_worker_child.c:267
> 2025-06-01 09:11:22: pid 635274: DEBUG: node status[2]: 2
> 2025-06-01 09:11:22: pid 635274: LOCATION: pool_worker_child.c:267
> 2025-06-01 09:11:22: pid 635274: DEBUG: node status[3]: 2
> 2025-06-01 09:11:22: pid 635274: LOCATION: pool_worker_child.c:267
> 2025-06-01 09:11:22: pid 635274: DEBUG: pool_release_follow_primary_lock
> called
> 2025-06-01 09:11:22: pid 635274: LOCATION: pgpool_main.c:4106
> 2025-06-01 09:11:23: pid 634765: LOG: set SO_REUSEPORT option to the socket
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:891
> 2025-06-01 09:11:23: pid 634765: LOG: creating watchdog heartbeat receive
> socket.
> 2025-06-01 09:11:23: pid 634765: DETAIL: creating socket on 127.0.0.1:9694
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:385
> 2025-06-01 09:11:23: pid 634765: LOG: set SO_REUSEPORT option to the socket
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:891
> 2025-06-01 09:11:23: pid 634765: LOG: creating watchdog heartbeat receive
> socket.
>
>
>
> 2025-06-01 09:11:23: pid 634765: DETAIL: creating socket on ::1:9694
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:385
> 2025-06-01 09:11:23: pid 634765: LOG: failed to create watchdog heartbeat
> receive socket. retrying...
> 2025-06-01 09:11:23: pid 634765: DETAIL: bind failed with reason: "Cannot
> assign requested address"
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:394
> 2025-06-01 09:11:23: pid 634767: LOG: set SO_REUSEPORT option to the socket
> 2025-06-01 09:11:23: pid 634767: LOCATION: wd_heartbeat.c:891
> 2025-06-01 09:11:23: pid 634767: LOG: creating socket for sending heartbeat
> 2025-06-01 09:11:23: pid 634767: DETAIL: set SO_REUSEPORT
>
>
> Other odd errors I see in the logs later on are when pgpool tries to run
> the escalation script - I can't quite figure out where it's failing to
> find the IPv4 address:
>
> 2025-06-01 00:07:15: pid 395780: LOG: Pgpool-II parent process received
> SIGUSR1
> 2025-06-01 00:07:15: pid 395780: LOG: Pgpool-II parent process received
> watchdog quorum change signal from watchdog
> 2025-06-01 00:07:15: pid 396973: LOG: watchdog: escalation started
> 2025-06-01 00:07:15: pid 395786: LOG: Setting failover command timeout to
> 135
> 2025-06-01 00:07:15: pid 395780: LOG: watchdog cluster now holds the quorum
> 2025-06-01 00:07:15: pid 395780: DETAIL: updating the state of quarantine
> backend nodes
> Sun Jun 1 12:07:15 AM CEST 2025
>
> escalation.sh: - remove VIP IP from all nodes
> Node: db-primary-priv
> Error: ipv4: Address not found.
> Node: db-replica1-priv
> Error: ipv4: Address not found.
> Node: db-replica2-priv
> Error: ipv4: Address not found.
> Node: db-replica3-priv
> Error: ipv4: Address not found.
> 2025-06-01 00:07:20: pid 396973: LOG: watchdog escalation successful
> 2025-06-01 00:07:23: pid 396973: LOG: successfully acquired the delegate
> IP:"192.168.77.130"
> 2025-06-01 00:07:23: pid 396973: DETAIL: 'if_up_cmd' returned with success
> 2025-06-01 00:07:23: pid 395786: LOG: watchdog escalation process with
> pid: 396973 exit with SUCCESS.
> 2025-06-01 00:07:27: pid 395962: WARNING: failed to lock semaphore
> 2025-06-01 00:07:27: pid 395962: DETAIL: Invalid argument
> 2025-06-01 00:07:27: pid 395962: WARNING: failed to unlock semaphore
> 2025-06-01 00:07:27: pid 395962: DETAIL: Invalid argument
> 2025-06-01 00:07:28: pid 396924: WARNING: failed to lock semaphore
> 2025-06-01 00:07:28: pid 396924: DETAIL: Invalid argument
>
> The way I started was that I copied pgpool.conf from the "old" and
> working pgpool 4.2 configuration and modified it slightly, though nothing
> major: changed hostnames, and e.g. the delegate_ip variable name was
> changed from delegate_IP. I also added this setting, as without it I
> didn't see a socket created for port 9898:
> unix_socket_directories = '/var/run/postgresql'
> I had to change network interfaces as well; on RHEL it was using eth0,
> whereas on Ubuntu it's ens18, so I changed lines like:
> /usr/bin/sudo /sbin/ip addr add $_IP_$/23 dev ens18 label ens18:0
> The other thing I had to change was the path to arping, as on Ubuntu 24
> it's in /usr/bin (not /usr/sbin):
> arping_cmd = '/usr/bin/sudo /usr/bin/arping -U $_IP_$ -w 1 -I ens18'
>
> Another thing I'm confused about: I have two network interfaces on each
> host, e.g.
> 192.168.77.135/23 brd 192.168.77.255 scope global ens18
> and
> inet 192.168.78.135/24 brd 192.168.78.255 scope global ens19
>
> In pgpool.conf, it is referencing the "priv" hosts (e.g.
> heartbeat_hostname2 = 'db-replica2-priv'), which theoretically should be
> on the ens19 network interface as shown above, but in my old pgpool.conf
> it was only referencing eth0 on RHEL, even though the "priv" hosts were
> on eth1. I'm wondering whether that needs to be changed on Ubuntu, or
> whether it makes no difference here.
>
> Right now, I can't quite figure out whether it is something with
> pgpool.conf or with the Ubuntu settings, and the semaphore error is quite
> cryptic. If you need me to post some parts of my pgpool.conf, please
> advise which. I've seen an older version of the FAQ that said repmgr and
> pgpool don't work together, but then I saw a mention that this was no
> longer the case.
>
> I would much appreciate any help you can provide in solving this problem.