[pgpool-hackers: 4035] PgPool automatically performs failover after few mins without any activity

anirudh.goel at cern.ch anirudh.goel at cern.ch
Thu Oct 7 02:32:56 JST 2021


Hello

I was following this amazing tutorial by Bo Peng to setup a pgpool cluster with pgpool exporter. The setup went fine and I was able to run the exporter successfully, but on further testing, I found something strange. The setup works fine for a few mins, but after that, even without any activity on the primary or standby, pgpool throws an "authentication failed, password authentication failed for user “postgres”” error and performs a failover. Here are the logs-

Configuring Pgpool-II...
No custom pgpool.conf detected. Use environment variables and default config.
Starting Pgpool-II...
2021-10-06 17:19:27: pid 20: LOG: Backend status file /tmp/pgpool_status does not exist
2021-10-06 17:19:27: pid 20: LOG: health_check_stats_shared_memory_size: requested size: 12288
2021-10-06 17:19:27: pid 20: LOG: memory cache initialized
2021-10-06 17:19:27: pid 20: DETAIL: memcache blocks :64
2021-10-06 17:19:27: pid 20: LOG: allocating (136555320) bytes of shared memory segment
2021-10-06 17:19:27: pid 20: LOG: allocating shared memory segment of size: 136555320
2021-10-06 17:19:27: pid 20: LOG: health_check_stats_shared_memory_size: requested size: 12288
2021-10-06 17:19:27: pid 20: LOG: health_check_stats_shared_memory_size: requested size: 12288
2021-10-06 17:19:27: pid 20: LOG: memory cache initialized
2021-10-06 17:19:27: pid 20: DETAIL: memcache blocks :64
2021-10-06 17:19:27: pid 20: LOG: pool_discard_oid_maps: discarded memqcache oid maps
2021-10-06 17:19:27: pid 20: LOG: Setting up socket for 0.0.0.0:9999
2021-10-06 17:19:27: pid 20: LOG: Setting up socket for :::9999
2021-10-06 17:19:27: pid 20: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2021-10-06 17:19:28: pid 20: LOG: find_primary_node: primary node is 0
2021-10-06 17:19:28: pid 20: LOG: find_primary_node: standby node is 1
2021-10-06 17:19:28: pid 54: LOG: PCP process: 54 started
2021-10-06 17:19:28: pid 57: LOG: process started
2021-10-06 17:19:28: pid 56: LOG: process started
2021-10-06 17:19:28: pid 55: LOG: process started
2021-10-06 17:19:28: pid 20: LOG: pgpool-II successfully started. version 4.2.3 (chichiriboshi)
2021-10-06 17:19:28: pid 20: LOG: node status[0]: 1
2021-10-06 17:19:28: pid 20: LOG: node status[1]: 2
2021-10-06 17:19:28: pid 47: LOG: pool_reuse_block: blockid: 0
2021-10-06 17:19:28: pid 47: CONTEXT: while searching system catalog, When relcache is missed
2021-10-06 17:19:43: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:19:43: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:19:53: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:19:53: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:03: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:03: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:13: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:13: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:23: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:23: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:33: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:33: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:43: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:43: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:53: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:20:53: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:03: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:03: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:13: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:13: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:23: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:23: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:33: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:33: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:43: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:43: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:48: pid 57: ERROR: authentication failed
2021-10-06 17:21:48: pid 57: DETAIL: password authentication failed for user "postgres"
2021-10-06 17:21:48: pid 57: LOG: health check failed on node 1 (timeout:0)
2021-10-06 17:21:48: pid 57: LOG: received degenerate backend request for node_id: 1 from pid [57]
2021-10-06 17:21:48: pid 20: LOG: Pgpool-II parent process has received failover request
2021-10-06 17:21:48: pid 20: LOG: starting degeneration. shutdown host postgres_replica(5432)
2021-10-06 17:21:48: pid 20: LOG: Do not restart children because we are switching over node id 1 host: postgres_replica port: 5432 and we are in streaming replication mode
2021-10-06 17:21:48: pid 20: LOG: failover: set new primary node: 0
2021-10-06 17:21:48: pid 20: LOG: failover: set new main node: 0
failover done. shutdown host postgres_replica(5432)2021-10-06 17:21:48: pid 20: LOG: failover done. shutdown host postgres_replica(5432)
2021-10-06 17:21:48: pid 55: LOG: worker process received restart request
2021-10-06 17:21:49: pid 54: LOG: restart request received in pcp child process
2021-10-06 17:21:49: pid 20: LOG: PCP child 54 exits with status 0 in failover()
2021-10-06 17:21:49: pid 20: LOG: fork a new PCP child pid 58 in failover()
2021-10-06 17:21:49: pid 20: LOG: worker child process with pid: 55 exits with status 256
2021-10-06 17:21:49: pid 58: LOG: PCP process: 58 started
2021-10-06 17:21:49: pid 20: LOG: fork a new worker child process with pid: 59
2021-10-06 17:21:49: pid 59: LOG: process started
2021-10-06 17:21:53: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:21:53: pid 47: LOG: status_changed_time 1633540908
2021-10-06 17:22:03: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:22:03: pid 47: LOG: status_changed_time 1633540908
2021-10-06 17:22:13: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:22:13: pid 47: LOG: status_changed_time 1633540908
2021-10-06 17:22:23: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:22:23: pid 47: LOG: status_changed_time 1633540908
2021-10-06 17:22:33: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:22:33: pid 47: LOG: status_changed_time 1633540908
2021-10-06 17:22:43: pid 47: LOG: status_changed_time 1633540768
2021-10-06 17:22:43: pid 47: LOG: status_changed_time 1633540908
2021-10-06 17:22:53: pid 47: LOG: status_changed_time 1633540768

As you will see in the logs, there’s no activity on my side and I have tried this multiple times and it always shows the same behaviour. I didn’t change anything in the configuration from Bo’s tutorial. Though I have added an additional Prometheus and Alertmanager in front of it. Here’s the configuration for those-

prometheus:
 restart: unless-stopped
 ports:
 - "9090:9090"
 volumes:
 - ./prometheus/config.yml:/etc/prometheus/prometheus.yml
 - ./prometheus/alert_rules.yml:/etc/prometheus/alert_rules/alert_rules.yml
 depends_on:
 - pgpool2_exporter
 image: "prom/prometheus"

 alertmanager:
 restart: unless-stopped
 volumes:
 - ./prometheus-alertmanager/config.yml:/etc/alertmanager/alertmanager.yml
 depends_on:
 - prometheus
 image: "quay.io/prometheus/alertmanager"


I don’t think prometheus or the alertmanager are causing any problems but I am unable to figure out why pgpool is doing a failover after several minutes of running fine.

Thanks in advance!

Anirudh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20211006/6792691f/attachment-0001.htm>


More information about the pgpool-hackers mailing list