[pgpool-general: 9507] Re: Weird error in pgpool 4.6 logs: WARNING: failed to lock semaphore

TV tvfan2014 at gmail.com
Wed Jun 4 20:39:17 JST 2025


Hi, I changed wd_lifecheck_method to query and saw this in the log
afterwards:
2025-06-04 13:32:42.930: main pid 120326: LOG:  create socket files[0]:
/var/run/postgresql/.s.PGSQL.9999
2025-06-04 13:32:42.930: main pid 120326: LOCATION:  pgpool_main.c:4888
2025-06-04 13:32:42.930: main pid 120326: LOG:  listen address[0]: *
2025-06-04 13:32:42.930: main pid 120326: LOCATION:  pgpool_main.c:4920
2025-06-04 13:32:42.930: main pid 120326: LOG:  Setting up socket for
0.0.0.0:9999
2025-06-04 13:32:42.930: main pid 120326: LOCATION:  pgpool_main.c:999

2025-06-04 13:32:42.930: main pid 120326: LOG:  Setting up socket for :::9999
2025-06-04 13:32:42.930: main pid 120326: LOCATION:  pgpool_main.c:999
2025-06-04 13:32:42.932: child pid 120332: FATAL:  SCRAM authentication failed
2025-06-04 13:32:42.932: child pid 120332: DETAIL:  pool_passwd file does not contain an entry for "nobody"
2025-06-04 13:32:42.932: child pid 120332: LOCATION:  pool_hba.c:1272
2025-06-04 13:32:43.375: main pid 120326: LOG:  create socket files[0]:
/var/run/postgresql/.s.PGSQL.9898
2025-06-04 13:32:43.375: main pid 120326: LOCATION:  pgpool_main.c:4888
2025-06-04 13:32:43.375: main pid 120326: LOG:  listen address[0]: *
2025-06-04 13:32:43.375: main pid 120326: LOCATION:  pgpool_main.c:4920
2025-06-04 13:32:43.375: main pid 120326: LOG:  Setting up socket for
0.0.0.0:9898
2025-06-04 13:32:43.375: main pid 120326: LOCATION:  pgpool_main.c:999
2025-06-04 13:32:43.375: main pid 120326: LOG:  Setting up socket for
:::9898
2025-06-04 13:32:43.375: main pid 120326: LOCATION:  pgpool_main.c:999
2025-06-04 13:32:43.376: pcp_main pid 121356: LOG:  PCP process: 121356
started

Any idea about that pool_passwd error (perhaps the lifecheck query
connecting as the default wd_lifecheck_user = 'nobody'?) and about the
attempt to set up a socket for :::9999?
Again, not a big deal, just wondering if it's something to be concerned
about.
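
In case we do need an entry for that user, I assume it could be added to
pool_passwd with pg_enc, something like the following (the .pgpoolkey path
is a guess for our setup, and I haven't actually run this yet):

$ pg_enc -m -k /etc/pgpool2/.pgpoolkey -u nobody -p
db password: [type the password here]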

Also, I'm wondering about the status from that pcp_node_info command I
mentioned earlier... even though the semaphores seem to be present and we
have plenty of queries/connections, the status is still showing as
1/waiting, where I suspect it should be 2... I might be stressing over
nothing, but none of the nodes are showing status 2 at this point.
Example:
$ pcp_node_info -h localhost -p 9898 -U pcpadmin --verbose -n 2
Hostname               : db-replica3
Port                   : 5434
Status                 : 1
Weight                 : 0.250000
Status Name            : waiting
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2025-06-03 11:59:27
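
(For a single view of all nodes at once, I believe the same fields are
also visible via SHOW POOL_NODES through the pgpool port, e.g.:

$ psql -h localhost -p 9999 -U postgres -c 'SHOW POOL_NODES'

user and port are illustrative; adjust for our setup.)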


On Wed, Jun 4, 2025 at 2:18 AM Tatsuo Ishii <ishii at postgresql.org> wrote:

> Hi,
>
> > Hi, gave it a try, although we went with option #2 (RemoveIPC=no) as
> > it was quicker to do.  After 20+ minutes, the semaphores are still
> > present and there are no errors in the log.  Tomorrow I will discuss
> > with our sysadmins how we can implement option #1, as the UID of the
> > postgres user is above SYS_UID_MAX.  Maybe we won't go through with
> > changing the postgres user's UID; raising SYS_UID_MAX to a higher
> > number might be OK, as we only have a couple of users above it.
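> >
> > (To check the current limit, I believe something like this works:
> > $ grep -E 'SYS_UID_(MIN|MAX)' /etc/login.defs )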
>
> Thank you for sharing your thoughts.
>
> > Also, I forgot to mention: our repo showed the newer (4.6.2) version
> > as available, so I upgraded the packages and restarted pgpool, but
> > that ipv6 error at startup is still present.  It's more of a nuisance
> > than anything else, but figured I would let you know:
> >
> > 2025-06-03 18:56:39: pid 1950700: WARNING: failed to create watchdog heartbeat receive socket as much intended
> > 2025-06-03 18:56:39: pid 1950700: DETAIL: only 1 out of 2 socket(s) had been created
> > 2025-06-03 18:56:39: pid 1950700: LOCATION: wd_heartbeat.c:428
> > 2025-06-03 18:56:39: pid 1950704: LOG: failed to create watchdog heartbeat receive socket
> > 2025-06-03 18:56:39: pid 1950704: LOCATION: wd_heartbeat.c:408
> > 2025-06-03 18:56:39: pid 1950704: WARNING: failed to create watchdog heartbeat receive socket as much intended
> > 2025-06-03 18:56:39: pid 1950704: DETAIL: only 1 out of 2 socket(s) had been created
> > 2025-06-03 18:56:39: pid 1950704: LOCATION: wd_heartbeat.c:428
> >
> > Earlier, when I had debugging enabled, it was still showing that it
> > was unable to create this socket:
> >
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: LOG: creating watchdog heartbeat receive socket.
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: DETAIL: creating socket on 127.0.0.1:9694
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: LOCATION: wd_heartbeat.c:385
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: LOG: set SO_REUSEPORT option to the socket
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: LOCATION: wd_heartbeat.c:891
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: LOG: creating watchdog heartbeat receive socket.
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: DETAIL: creating socket on ::1:9694
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: LOCATION: wd_heartbeat.c:385
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: LOG: failed to create watchdog heartbeat receive socket. retrying...
> > 2025-06-03 11:57:24.833: heart_beat_receiver pid 4005: DETAIL: bind failed with reason: "Cannot assign requested address"
> >
> > It doesn't seem to affect pgpool functionality, but I'd prefer to
> > disable it altogether if possible, to have 'clean' logs.  So far I
> > tried to change listen_addresses and pcp_listen_addresses in
> > pgpool.conf from "*" to "0.0.0.0", but that didn't resolve it.
>
> pcp_listen_addresses is used for pcp commands only; it is not relevant
> to heartbeat.
>
> BTW, I think I found a possible cause of the heartbeat problem: before
> 4.6, the bind address of the heartbeat receive socket was specified as
> INADDR_ANY, which binds to all available network interfaces. But from
> 4.6.0, it only binds to the loopback address (127.0.0.1 for IPv4, ::1
> for IPv6). We should have kept binding to all available network
> interfaces in 4.6. Because of this, watchdog won't start lifecheck
> using heartbeat, as the other heartbeat sender processes send their
> heartbeat packets to hostnameX, which is of course not a loopback
> address. We have already created a patch to fix it and are running an
> internal test.
>
> Note that if wd_lifecheck_method = 'query', you are not bitten by this
> bug.
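>
> (For reference, the relevant pgpool.conf parameters; the values shown
> are, if I remember correctly, the defaults:
>
> wd_lifecheck_method = 'query'
> wd_lifecheck_query = 'SELECT 1'
> wd_lifecheck_dbname = 'template1'
> wd_lifecheck_user = 'nobody'
> wd_lifecheck_password = ''
> )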
>
> > Just curious if there's a way to skip IPv6 functionality?
>
> No.
>
> > On Tue, Jun 3, 2025 at 1:57 PM Tatsuo Ishii <ishii at postgresql.org> wrote:
> >
> >> > Hi,
> >> >
> >> > Thank you for the great information!
> >> >
> >> >> Hi,
> >> >> We've made some progress here and I'd like to get your feedback
> >> >> regarding the fix/workaround.  Basically, after some research
> >> >> (incl. google/chatgpt etc.), we came across this information:
> >> >>
> >> >>   1. systemd-logind manages user sessions
> >> >>
> >> >> In Ubuntu 24.04 (and since ~15.04), systemd manages user sessions.
> >> >> When a user session ends (e.g. when a background process or login
> >> >> shell finishes), the following happens:
> >> >>
> >> >>    - The session scope (session-XXXX.scope) is stopped.
> >> >>    - The user manager (user@UID.service) is also stopped *unless
> >> >>      lingering is enabled*.
> >> >>
> >> >>   If no processes remain and the user is not lingering:
> >> >>
> >> >>    - Shared memory and *System V IPC resources* (like semaphores)
> >> >>      *may be destroyed* automatically if no one is using them.
> >> >>    - If a process (like pgpool2 or PostgreSQL) relied on those
> >> >>      semaphores, you may later see:
> >> >>
> >> >>        failed to lock semaphore
> >> >>
> >> >> This is a strong sign that the user session ended and IPC resources
> >> >> (like semaphores) were destroyed.
> >> >>
> >> >> This does seem to be happening here, as I see these in
> >> >> /var/log/syslog around the time we discussed in the last posting,
> >> >> just prior to the semaphore errors showing in the pgpool logs:
> >> >>
> >> >> 2025-06-02T12:45:17.173486+02:00 db-replica3 systemd[1]: user@3000.service: Deactivated successfully.
> >> >> 2025-06-02T12:45:17.173997+02:00 db-replica3 systemd[1]: Stopped user@3000.service - User Manager for UID 3000.
> >> >> 2025-06-02T12:45:17.240314+02:00 db-replica3 systemd[1]: Stopping user-runtime-dir@3000.service - User Runtime Directory /run/user/3000...
> >> >> 2025-06-02T12:45:17.248017+02:00 db-replica3 systemd[1]: run-user-3000.mount: Deactivated successfully.
> >> >> 2025-06-02T12:45:17.249107+02:00 db-replica3 systemd[1]: user-runtime-dir@3000.service: Deactivated successfully.
> >> >> 2025-06-02T12:45:17.249353+02:00 db-replica3 systemd[1]: Stopped user-runtime-dir@3000.service - User Runtime Directory /run/user/3000.
> >> >> 2025-06-02T12:45:17.251672+02:00 db-replica3 systemd[1]: Removed slice user-3000.slice - User Slice of UID 3000.
> >> >> 2025-06-02T12:45:23.581108+02:00 db-replica3 strace[1692927]: [pid 1693495] 12:45:23.580360 semtimedop(41, [{sem_num=6, sem_op=-1, sem_flg=SEM_UNDO}], 1, NULL) = -1 EINVAL (Invalid argument)
> >> >>
> >> >> It also advises adding Type=forking to the [Service] section of the
> >> >> pgpool.service file, which we don't currently have.  I think I
> >> >> posted it earlier, but just in case, here is the relevant part of
> >> >> our service file:
> >> >>
> >> >> [Unit]
> >> >> Description=Pgpool-II
> >> >> After=syslog.target network.target postgresql.service
> >> >> Wants=postgresql.service
> >> >>
> >> >> [Service]
> >> >> User=postgres
> >> >> Group=postgres
> >> >>
> >> >> EnvironmentFile=-/etc/default/pgpool2
> >> >>
> >> >> ExecStart=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf $OPTS
> >> >> ExecStop=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf $STOP_OPTS stop
> >> >> ExecReload=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf reload
> >> >>
> >> >> -------------------
> >> >> One bit I read that is different is that Type=forking and adding -n
> >> >> are required (as you can see above, we don't have that right now):
> >> >>
> >> >> - User=postgres: Assumes pgpool should run as the postgres user.
> >> >>   Adjust if different.
> >> >> - ExecStart=/usr/sbin/pgpool -n: Use the *correct path and
> >> >>   arguments* for your installation.
> >> >> - pgpool must be configured to *not daemonize* (-n flag), or
> >> >>   systemd will lose track of it.
> >> >> - You can add environment variables for config files, ports, etc.
> >> >> -----------------------
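> >> >>
> >> >> (A minimal sketch of the variant we might try, untested, reusing
> >> >> the paths from our current unit file:
> >> >>
> >> >> [Service]
> >> >> User=postgres
> >> >> Group=postgres
> >> >> EnvironmentFile=-/etc/default/pgpool2
> >> >> ExecStart=/usr/sbin/pgpool -n -f /etc/pgpool2/pgpool.conf $OPTS
> >> >> ExecStop=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf $STOP_OPTS stop
> >> >> ExecReload=/usr/sbin/pgpool -f /etc/pgpool2/pgpool.conf reload
> >> >>
> >> >> i.e. just adding -n to ExecStart and leaving Type at its default of
> >> >> simple, since with -n pgpool stays in the foreground.)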
> >> >>
> >> >> Anyway, our sysadmins suggested changing the postgres user to have
> >> >> the Linger option on:
> >> >> $ sudo loginctl show-user postgres -p Linger
> >> >> Linger=yes
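> >> >> (For the record, I believe they enabled it with something like:
> >> >> $ sudo loginctl enable-linger postgres )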
> >> >> Once we did that, the semaphore issue disappeared; it's been
> >> >> running for over an hour now, and ipcs shows the semaphores:
> >> >> $ ipcs -s
> >> >>
> >> >> ------ Semaphore Arrays --------
> >> >> key        semid      owner      perms      nsems
> >> >> 0x00000000 6          postgres   600        8
> >> >>
> >> >>
> >> >> So, what do you think about this fix? I read that leaving linger
> >> >> on for a user is not generally recommended, but that seems to be
> >> >> the only way to keep the semaphores from disappearing right now...
> >> >
> >> > I am not familiar with this area. I will discuss with other
> >> > pgpool/PostgreSQL developers and reply back to you.
> >>
> >> I found this in the PostgreSQL manual:
> >>
> >> https://www.postgresql.org/docs/current/kernel-resources.html#SYSTEMD-REMOVEIPC
> >>
> >> If my reading is correct, we can do one of:
> >>
> >> 1) register the user to start pgpool as a "system user".
> >>
> >> 2) set "RemoveIPC=no" in /etc/systemd/logind.conf.
> >>
> >> For me, #1 is better, since #2 affects other servers in the system.
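> >>
> >> (As an illustration only, not verified here: for #1 the account would
> >> need a UID below SYS_UID_MAX, e.g. an account created with
> >> "useradd -r"; for #2, edit /etc/systemd/logind.conf and restart
> >> logind:
> >>
> >> RemoveIPC=no
> >>
> >> $ sudo systemctl restart systemd-logind )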
> >>
> >> BTW, I think the doc does not suggest using the Linger option.
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS K.K.
> >> English: http://www.sraoss.co.jp/index_en/
> >> Japanese: http://www.sraoss.co.jp
> >>
>

