[pgpool-general: 9489] Re: Weird error in pgpool 4.6 logs: WARNING: failed to lock semaphore
Tatsuo Ishii
ishii at postgresql.org
Sun Jun 1 21:06:34 JST 2025
> Hi,
> So long story short, I'm migrating a PG13 cluster from RHEL to Ubuntu
> 24.04. In the process, the pgpool version is also being upgraded to 4.6
> (from 4.2).
What is the exact version of 4.6? I am asking because 4.6.0 has a vulnerability:
https://pgpool.net/mediawiki/index.php/Main_Page#Pgpool-II_4.6.1.2C_4.5.7.2C_4.4.12.2C_4.3.15_and_4.2.22_officially_released_.282025.2F05.2F15.29
Also 4.6.1 (and 4.6.0) have an issue with IPv6.
https://www.pgpool.net/docs/latest/en/html/release-4-6-2.html
If possible, I recommend upgrading to 4.6.2.
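To confirm the exact version that is installed, something like the
following should work (the output line is illustrative):

$ pgpool --version
pgpool-II version 4.6.2 (...)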
> There is also repmgr running on this 4 node cluster, 1 primary
> and 3 replicas.
I am not familiar with repmgr, so I will skip this part.
> All the migration seems to have completed successfully,
> but I've run into a weird problem with pgpool. I have quite a few errors
> that look like the entries below, and they seem to "come and go" when
> checking the pgpool log. The process itself seems to work (I think), but
> every now and then, these errors (warnings?) show up:
>
> 2025-06-01 09:11:00: pid 628043: DEBUG: watchdog heartbeat: send heartbeat
> signal to db-replica3-priv:9694
> 2025-06-01 09:11:00: pid 628043: LOCATION: wd_heartbeat.c:771
> 2025-06-01 09:11:00: pid 627958: LOG: exit handler called (signal: 2)
> 2025-06-01 09:11:00: pid 627958: LOCATION: pgpool_main.c:1272
>
>
> 2025-06-01 09:11:00: pid 627958: WARNING: failed to lock semaphore
> 2025-06-01 09:11:00: pid 627958: DETAIL: Invalid argument
> 2025-06-01 09:11:00: pid 627958: LOCATION: pool_sema.c:129
> 2025-06-01 09:11:00: pid 627958: WARNING: failed to unlock semaphore
> 2025-06-01 09:11:00: pid 627958: DETAIL: Invalid argument
> 2025-06-01 09:11:00: pid 627958: LOCATION: pool_sema.c:202
> 2025-06-01 09:11:00: pid 627958: LOG: shutting down by signal 2
> 2025-06-01 09:11:00: pid 627958: LOCATION: pgpool_main.c:1324
> 2025-06-01 09:11:00: pid 627958: LOG: terminating all child processes
> 2025-06-01 09:11:00: pid 627958: LOCATION: pgpool_main.c:1334
> 2025-06-01 09:11:00: pid 628047: WARNING: failed to lock semaphore
> 2025-06-01 09:11:00: pid 628047: DETAIL: Invalid argument
> 2025-06-01 09:11:00: pid 628047: LOCATION: pool_sema.c:129
> 2025-06-01 09:11:00: pid 628042: WARNING: failed to lock semaphore
> 2025-06-01 09:11:00: pid 628042: DETAIL: Invalid argument
> 2025-06-01 09:11:00: pid 628042: LOCATION: pool_sema.c:129
> 2025-06-01 09:11:00: pid 628039: WARNING: failed to lock semaphore
> 2025-06-01 09:11:00: pid 628039: DETAIL: Invalid argument
Can you share the result of "ipcs -s" so that I can confirm that pgpool
properly allocates the semaphore?
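For reference, on a healthy node the listing should include a System V
semaphore set owned by the user pgpool runs as. A minimal check, with
purely illustrative key/semid/nsems values and assuming pgpool runs as
the postgres user:

$ ipcs -s

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 32768      postgres   600        7

Note that semop() returns EINVAL ("Invalid argument") when the semaphore
set it references no longer exists, so an empty listing here would
suggest the set was removed while pgpool was still running (on
systemd-based systems, logind's RemoveIPC=yes can remove a user's System
V IPC objects when that user's last session ends).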
By the way, do you see "WARNING: failed to unlock semaphore" only when
pgpool is shutting down?
> Also, earlier in the log, I see errors about being unable to create a
> socket; for some reason it's trying (and failing) to create a socket on
> localhost (127.0.0.1:9694) and "creating socket on ::1:9694". Are these
> "ok to ignore"?
First of all, "creating socket on ::1:9694" looks unusual, because it
means that heartbeat_hostname is set to "::1", which is the IPv6
loopback address. heartbeat_hostname is expected to be set to an IPv4
or IPv6 inet address (not a loopback address).
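As a sketch, assuming the heartbeat traffic should run over your private
ens19 network (the host names and device below are taken from your
description; adjust the indexes to your node layout), the per-node
settings would look something like:

heartbeat_hostname0 = 'db-primary-priv'   # inet address or resolvable host name, not ::1 or 127.0.0.1
heartbeat_port0 = 9694
heartbeat_device0 = 'ens19'               # interface carrying the -priv network
heartbeat_hostname1 = 'db-replica1-priv'
heartbeat_port1 = 9694
heartbeat_device1 = 'ens19'
# ...and so on for the remaining nodes.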
For now, I would like to wait for your reply regarding my questions
above.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
> 2025-06-01 09:11:23: pid 634772: DETAIL: creating socket on 127.0.0.1:9694
> 2025-06-01 09:11:23: pid 634772: LOCATION: wd_heartbeat.c:385
> 2025-06-01 09:11:23: pid 634772: LOG: set SO_REUSEPORT option to the socket
> 2025-06-01 09:11:23: pid 634772: LOCATION: wd_heartbeat.c:891
> 2025-06-01 09:11:23: pid 634772: LOG: creating watchdog heartbeat receive
> socket.
> 2025-06-01 09:11:23: pid 634772: DETAIL: creating socket on ::1:9694
> 2025-06-01 09:11:23: pid 634772: LOCATION: wd_heartbeat.c:385
> 2025-06-01 09:11:23: pid 634772: LOG: failed to create watchdog heartbeat
> receive socket. retrying...
> 2025-06-01 09:11:23: pid 634772: DETAIL: bind failed with reason: "Cannot
> assign requested address"
> 2025-06-01 09:11:23: pid 634772: LOCATION: wd_heartbeat.c:394
>
>
> 2025-06-01 09:11:22: pid 635274: DEBUG: verify_backend_node_status:
> true_primary 0
> 2025-06-01 09:11:22: pid 635274: LOCATION: pgpool_main.c:2704
> 2025-06-01 09:11:22: pid 635274: DEBUG: node status[0]: 1
> 2025-06-01 09:11:22: pid 635274: LOCATION: pool_worker_child.c:267
> 2025-06-01 09:11:22: pid 635274: DEBUG: node status[1]: 2
> 2025-06-01 09:11:22: pid 635274: LOCATION: pool_worker_child.c:267
> 2025-06-01 09:11:22: pid 635274: DEBUG: node status[2]: 2
> 2025-06-01 09:11:22: pid 635274: LOCATION: pool_worker_child.c:267
> 2025-06-01 09:11:22: pid 635274: DEBUG: node status[3]: 2
> 2025-06-01 09:11:22: pid 635274: LOCATION: pool_worker_child.c:267
> 2025-06-01 09:11:22: pid 635274: DEBUG: pool_release_follow_primary_lock
> called
> 2025-06-01 09:11:22: pid 635274: LOCATION: pgpool_main.c:4106
> 2025-06-01 09:11:23: pid 634765: LOG: set SO_REUSEPORT option to the socket
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:891
> 2025-06-01 09:11:23: pid 634765: LOG: creating watchdog heartbeat receive
> socket.
> 2025-06-01 09:11:23: pid 634765: DETAIL: creating socket on 127.0.0.1:9694
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:385
> 2025-06-01 09:11:23: pid 634765: LOG: set SO_REUSEPORT option to the socket
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:891
> 2025-06-01 09:11:23: pid 634765: LOG: creating watchdog heartbeat receive
> socket.
>
>
>
> 2025-06-01 09:11:23: pid 634765: DETAIL: creating socket on ::1:9694
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:385
> 2025-06-01 09:11:23: pid 634765: LOG: failed to create watchdog heartbeat
> receive socket. retrying...
> 2025-06-01 09:11:23: pid 634765: DETAIL: bind failed with reason: "Cannot
> assign requested address"
> 2025-06-01 09:11:23: pid 634765: LOCATION: wd_heartbeat.c:394
> 2025-06-01 09:11:23: pid 634767: LOG: set SO_REUSEPORT option to the socket
> 2025-06-01 09:11:23: pid 634767: LOCATION: wd_heartbeat.c:891
> 2025-06-01 09:11:23: pid 634767: LOG: creating socket for sending heartbeat
> 2025-06-01 09:11:23: pid 634767: DETAIL: set SO_REUSEPORT
>
>
> Other odd errors I see in the logs later on are when pgpool tries to run
> the escalation script - I can't quite figure out where it's failing to
> find the IPv4 address:
>
> 2025-06-01 00:07:15: pid 395780: LOG: Pgpool-II parent process received
> SIGUSR1
> 2025-06-01 00:07:15: pid 395780: LOG: Pgpool-II parent process received
> watchdog quorum change signal from watchdog
> 2025-06-01 00:07:15: pid 396973: LOG: watchdog: escalation started
> 2025-06-01 00:07:15: pid 395786: LOG: Setting failover command timeout to
> 135
> 2025-06-01 00:07:15: pid 395780: LOG: watchdog cluster now holds the quorum
> 2025-06-01 00:07:15: pid 395780: DETAIL: updating the state of quarantine
> backend nodes
> Sun Jun 1 12:07:15 AM CEST 2025
>
> escalation.sh: - remove VIP IP from all nodes
> Node: db-primary-priv
> Error: ipv4: Address not found.
> Node: db-replica1-priv
> Error: ipv4: Address not found.
> Node: db-replica2-priv
> Error: ipv4: Address not found.
> Node: db-replica3-priv
> Error: ipv4: Address not found.
> 2025-06-01 00:07:20: pid 396973: LOG: watchdog escalation successful
> 2025-06-01 00:07:23: pid 396973: LOG: successfully acquired the delegate
> IP:"192.168.77.130"
> 2025-06-01 00:07:23: pid 396973: DETAIL: 'if_up_cmd' returned with success
> 2025-06-01 00:07:23: pid 395786: LOG: watchdog escalation process with
> pid: 396973 exit with SUCCESS.
> 2025-06-01 00:07:27: pid 395962: WARNING: failed to lock semaphore
> 2025-06-01 00:07:27: pid 395962: DETAIL: Invalid argument
> 2025-06-01 00:07:27: pid 395962: WARNING: failed to unlock semaphore
> 2025-06-01 00:07:27: pid 395962: DETAIL: Invalid argument
> 2025-06-01 00:07:28: pid 396924: WARNING: failed to lock semaphore
> 2025-06-01 00:07:28: pid 396924: DETAIL: Invalid argument
>
> The way I started was that I copied pgpool.conf from the "old" and
> working pgpool 4.2 configuration and modified it slightly, though nothing
> major: changed hostnames, and e.g. the delegate_ip variable name was
> changed from delegate_IP. I also added this setting, as without it I
> didn't see a socket created for port 9898:
> unix_socket_directories = '/var/run/postgresql'
> I had to change network interfaces as well; on RHEL it was using eth0,
> whereas on Ubuntu it's ens18, so I changed lines like:
> /usr/bin/sudo /sbin/ip addr add $_IP_$/23 dev ens18 label ens18:0
> The other thing I had to change was the path to arping, as on Ubuntu 24
> it's in /usr/bin (not /usr/sbin):
> arping_cmd = '/usr/bin/sudo /usr/bin/arping -U $_IP_$ -w 1 -I ens18'
>
> Another thing I'm confused about: I have two network interfaces on each
> host, e.g.
> 192.168.77.135/23 brd 192.168.77.255 scope global ens18
> and
> inet 192.168.78.135/24 brd 192.168.78.255 scope global ens19
>
> In pgpool.conf, it is referencing the "priv" hosts (e.g.
> heartbeat_hostname2 = 'db-replica2-priv'), which theoretically should be
> on the ens19 network interface as shown above, but in my old pgpool.conf
> it was only referencing eth0 on RHEL, even though the "priv" hosts were
> on eth1. I'm wondering whether that needs to be changed on Ubuntu, or
> whether it makes no difference here.
>
> Right now, I can't quite figure out whether it is something with
> pgpool.conf or with the Ubuntu settings, and the semaphore error is quite
> cryptic. If you need me to post some parts of my pgpool.conf, please
> advise which. I've seen an older version of the FAQ that said repmgr and
> pgpool don't work together, but then I saw a mention that this was no
> longer the case.
>
> I would much appreciate any help you can provide in solving this problem.