0000576: getsockopt() detected error "Connection refused"

ID	Project	Category	View Status	Date Submitted	Last Update

0000576	Pgpool-II	Bug	public	2020-01-17 02:50	2020-01-19 11:43

Reporter	dcvythoulkas	Assigned To	t-ishii
Priority	normal	Severity	trivial	Reproducibility	always
Status	resolved	Resolution	open
Platform	Postgresql 12	OS	Debian	OS Version	10.2
Product Version	4.1.0

Summary	0000576: getsockopt() detected error "Connection refused"
Description	I install pgpool-II version 4.1.0 (karasukiboshi) from http://apt.postgresql.org/pub/repos/apt repo. I also install Postgresql 12. I configured a 3 node streaming replication cluster with physical replication slots (no archiving) and used modified versions of the failover, followmaster, etc, from the example here https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG. The setup works. Replication, healthchecks, watchdog, failover, online recovery, loadbalancing everything I tested (pg_rewind not yet because I am still using the initdb from Debian, so I use the pg_basebackup way). The only thing I cannot explain is why the following appears in the syslog of all servers. It appears for every server for its own local database: Jan 16 17:28:42 debian10 pgpool[10477]: 2020-01-16 17:28:42: pid 10523: LOG: failed to connect to PostgreSQL server on "pg2:5433", getsockopt() detected error "Connection refused" Jan 16 17:28:42 debian10 pgpool[10477]: 2020-01-16 17:28:42: pid 10523: LOCATION: pool_connection_pool.c:680 Both the intervals and the PID mentioned point to the healtheck as the culprit, however healthcheck appears to be working properly and instantly detects any problem that I manually create. Is it something that I can ignore? Is there something I miss configured? I have attached the pool_hba.conf (same for everyone) and every pgpool.conf from each node.
Tags	logs, pgpool health check, settings

dcvythoulkas 2020-01-17 02:50 reporter	pgpool.conf (44,432 bytes) pool_hba.conf (3,413 bytes) pgpool2.conf (44,432 bytes) pgpool3.conf (44,432 bytes)

t-ishii 2020-01-17 16:35 developer ~0003067	I guess health check succeeds after some retries because you said the heath check feature seems to be working. I need to confirm this by looking at the health check log in syslog. Can we share the log?

dcvythoulkas 2020-01-17 17:31 reporter ~0003068	Thanks for the rapid reply. I've attached extracts from syslog and postgresql log of the same time from 1 of the nodes. The whole setup is configured from my custom ansible roles. Equivalent logs exist on the other two nodes. The following is what i think to be the issue from the postgres log. The unknown@unknown login attempt followed by the proper pgpool@postgres 2020-01-17 07:57:11.319 UTC [7980] pg1(55164)[unknown][unknown]@[unknown] LOG: 00000: connection received: host=pg1 port=55164 2020-01-17 07:57:11.319 UTC [7980] pg1(55164)[unknown][unknown]@[unknown] LOCATION: BackendInitialize, postmaster.c:4296 2020-01-17 07:57:11.328 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOG: 00000: connection authorized: user=pgpool database=postgres 2020-01-17 07:57:11.328 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOCATION: PerformAuthentication, postinit.c:303 2020-01-17 07:57:11.370 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOG: 00000: disconnection: session time: 0:00:00.050 user=pgpool database=postgres host=pg1 port=55164 2020-01-17 07:57:11.370 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOCATION: log_disconnections, postgres.c:4666 pg1_postgres_log (41,923 bytes) pg1_postgres_log (41,923 bytes) pg1_syslog (21,055 bytes) pg1_syslog (21,055 bytes)

t-ishii 2020-01-18 07:12 developer ~0003069	I looked into pg1syslog and found a strange thing. It seems pgpool's health check process behaves as if it successfully connected to backend despite the getsockopt() error. Moreover, PostgreSQL backend accept the connection from health check process and does the authentication negotiation. I expected health check retry after getsockopt() error in this situation, and is also expected from the source code. Is it possible that in your debian package the source code is modified? Especially I suspect this(src/protocol/pool_connection_pool.c line 676): /* Non Solaris case */ if (error != 0) { ereport(LOG, (errmsg("failed to connect to PostgreSQL server on \"%s:%d\", getsockopt() detected error \"%s\"", host, port, strerror(error)))); return false; <--- maybe replaced with "return true;"? }

dcvythoulkas 2020-01-18 07:47 reporter ~0003070	If you check the pg1_postres_log you will that PostgreSQL did actually accepted a login connection, so healthcheck did actually manage to successfully connect. Now regarding the source, as I mentioned in the initial report, I used binary packages from http://apt.postgresql.org/pub/repos/apt . I did not compile anything from source, nor do I have the knowledge to do so. Also if I try stopping the backend, the healthcheck immediately notices and proceeds with quarantine, etc. If the code had changed as you suggest, then I think that healthcheck would fail to notice the change.

t-ishii 2020-01-18 09:16 developer ~0003071	Yes, I know you used binary packages. My guess was debian packagers patched the source code then created the binary package. I am not familiar with debian packages but there's should be a way to get the actual source code to be used to build debian packages (like source packages provided in RPM). If so, I could be able to confirm my speculation. > Also if I try stopping the backend, the healthcheck immediately notices and proceeds with quarantine, etc. If the code had changed as you suggest, then I think that healthcheck would fail to notice the change. If you stop PostgreSQL by using pg_ctl or other way and you connect to pgpool at the time, pgpool detects it by receiving a notice message from PostgreSQL. The health check process is not involved here. Also please note that there are multiple code path that detects failure in connecting to backend in health check and it is possible that health check detects the connection error in different ways.

t-ishii 2020-01-18 10:34 developer ~0003072 Last edited: 2020-01-18 11:00	I found this page: https://sources.debian.org/patches/pgpool2/4.1.0-1/ It seems there's no patch for pool_connection_pool.c. So my theory fails:-< Another theory is, getaddrinfo() for "pg1" returns multiple network address info, e.g. IPV4 and IPV6 address. In this case health check first tries IPV4 address. If it fails with the error you are seeing, then it tries IPV6 address. If it succeeds, health check succeeds. This theory explains the situation. If my theory is correct, you could avoid the error by enabling both IPV4 and IPV6 address of PostgreSQL.

dcvythoulkas 2020-01-19 02:43 reporter ~0003073	Your theory is correct. But it was not the IPv6, it was name resolution in the /etc/hosts file and the postgresql.conf. Specifically, the /etc/hosts was like this: vagrant@pg1:~$ cat /etc/hosts 127.0.0.1 localhost 127.0.1.1 pg1 .... 10.10.20.11 pg1 .... And postgresql.conf like this: .... listen_addresses = 'localhost,10.10.20.11' # what IP address(es) to listen on; .... So when healthcheck tried to reach the local node on every machine, it would check via hostname (pg1, pg2, etc). With the hosts file above that leeds to 127.0.1.1 that postgresql doesn't listen, so it would fail, hence the error, and then try with the 10.10.20.11 which would be successful. That also explains why there was no error from healthchecks of remote hosts. I changed the hosts file to the following: vagrant@pg1:~$ cat /etc/hosts 127.0.0.1 localhost 127.0.1.1 pg1-local .... 10.10.20.11 pg1 .... And the logs are clear. Thank you for your quick response and support. You can close this issue.

t-ishii 2020-01-19 08:24 developer ~0003074	Thanks for the feedback. I am going to close this issue.

Date Modified	Username	Field	Change
2020-01-17 02:50	dcvythoulkas	New Issue
2020-01-17 02:50	dcvythoulkas	File Added: pgpool.conf
2020-01-17 02:50	dcvythoulkas	File Added: pool_hba.conf
2020-01-17 02:50	dcvythoulkas	File Added: pgpool2.conf
2020-01-17 02:50	dcvythoulkas	File Added: pgpool3.conf
2020-01-17 02:50	dcvythoulkas	Tag Attached: logs
2020-01-17 02:50	dcvythoulkas	Tag Attached: pgpool health check
2020-01-17 02:50	dcvythoulkas	Tag Attached: settings
2020-01-17 16:35	t-ishii	Note Added: 0003067
2020-01-17 16:35	t-ishii	Assigned To	=> t-ishii
2020-01-17 16:35	t-ishii	Status	new => feedback
2020-01-17 17:31	dcvythoulkas	File Added: pg1_postgres_log
2020-01-17 17:31	dcvythoulkas	File Added: pg1_syslog
2020-01-17 17:31	dcvythoulkas	Note Added: 0003068
2020-01-17 17:31	dcvythoulkas	Status	feedback => assigned
2020-01-18 07:12	t-ishii	Note Added: 0003069
2020-01-18 07:13	t-ishii	Status	assigned => feedback
2020-01-18 07:47	dcvythoulkas	Note Added: 0003070
2020-01-18 07:47	dcvythoulkas	Status	feedback => assigned
2020-01-18 09:16	t-ishii	Note Added: 0003071
2020-01-18 09:17	t-ishii	Status	assigned => feedback
2020-01-18 10:34	t-ishii	Note Added: 0003072
2020-01-18 11:00	t-ishii	Note Edited: 0003072
2020-01-19 02:43	dcvythoulkas	Note Added: 0003073
2020-01-19 02:43	dcvythoulkas	Status	feedback => assigned
2020-01-19 08:24	t-ishii	Note Added: 0003074
2020-01-19 11:43	t-ishii	Status	assigned => resolved

View Issue Details

Activities

Issue History