View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000576 | Pgpool-II | Bug | public | 2020-01-17 02:50 | 2020-01-19 11:43 |
| Reporter | dcvythoulkas | Assigned To | t-ishii | ||
| Priority | normal | Severity | trivial | Reproducibility | always |
| Status | resolved | Resolution | open | ||
| Platform | Postgresql 12 | OS | Debian | OS Version | 10.2 |
| Product Version | 4.1.0 | ||||
| Summary | 0000576: getsockopt() detected error "Connection refused" | ||||
| Description | I install pgpool-II version 4.1.0 (karasukiboshi) from http://apt.postgresql.org/pub/repos/apt repo. I also install Postgresql 12. I configured a 3 node streaming replication cluster with physical replication slots (no archiving) and used modified versions of the failover, followmaster, etc, from the example here https://www.pgpool.net/docs/latest/en/html/example-cluster.html#EXAMPLE-CLUSTER-PGPOOL-CONFIG. The setup works. Replication, healthchecks, watchdog, failover, online recovery, loadbalancing everything I tested (pg_rewind not yet because I am still using the initdb from Debian, so I use the pg_basebackup way). The only thing I cannot explain is why the following appears in the syslog of all servers. It appears for every server for its own local database: Jan 16 17:28:42 debian10 pgpool[10477]: 2020-01-16 17:28:42: pid 10523: LOG: failed to connect to PostgreSQL server on "pg2:5433", getsockopt() detected error "Connection refused" Jan 16 17:28:42 debian10 pgpool[10477]: 2020-01-16 17:28:42: pid 10523: LOCATION: pool_connection_pool.c:680 Both the intervals and the PID mentioned point to the healtheck as the culprit, however healthcheck appears to be working properly and instantly detects any problem that I manually create. Is it something that I can ignore? Is there something I miss configured? I have attached the pool_hba.conf (same for everyone) and every pgpool.conf from each node. | ||||
| Tags | logs, pgpool health check, settings | ||||
|
|
|
|
|
I guess health check succeeds after some retries because you said the heath check feature seems to be working. I need to confirm this by looking at the health check log in syslog. Can we share the log? |
|
|
Thanks for the rapid reply. I've attached extracts from syslog and postgresql log of the same time from 1 of the nodes. The whole setup is configured from my custom ansible roles. Equivalent logs exist on the other two nodes. The following is what i think to be the issue from the postgres log. The unknown@unknown login attempt followed by the proper pgpool@postgres 2020-01-17 07:57:11.319 UTC [7980] pg1(55164)[unknown][unknown]@[unknown] LOG: 00000: connection received: host=pg1 port=55164 2020-01-17 07:57:11.319 UTC [7980] pg1(55164)[unknown][unknown]@[unknown] LOCATION: BackendInitialize, postmaster.c:4296 2020-01-17 07:57:11.328 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOG: 00000: connection authorized: user=pgpool database=postgres 2020-01-17 07:57:11.328 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOCATION: PerformAuthentication, postinit.c:303 2020-01-17 07:57:11.370 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOG: 00000: disconnection: session time: 0:00:00.050 user=pgpool database=postgres host=pg1 port=55164 2020-01-17 07:57:11.370 UTC [7980] pg1(55164)[unknown]pgpool@postgres LOCATION: log_disconnections, postgres.c:4666 |
|
|
I looked into pg1syslog and found a strange thing. It seems pgpool's health check process behaves as if it successfully connected to backend despite the getsockopt() error. Moreover, PostgreSQL backend accept the connection from health check process and does the authentication negotiation. I expected health check retry after getsockopt() error in this situation, and is also expected from the source code. Is it possible that in your debian package the source code is modified? Especially I suspect this(src/protocol/pool_connection_pool.c line 676): /* Non Solaris case */ if (error != 0) { ereport(LOG, (errmsg("failed to connect to PostgreSQL server on \"%s:%d\", getsockopt() detected error \"%s\"", host, port, strerror(error)))); return false; <--- maybe replaced with "return true;"? } |
|
|
If you check the pg1_postres_log you will that PostgreSQL did actually accepted a login connection, so healthcheck did actually manage to successfully connect. Now regarding the source, as I mentioned in the initial report, I used binary packages from http://apt.postgresql.org/pub/repos/apt . I did not compile anything from source, nor do I have the knowledge to do so. Also if I try stopping the backend, the healthcheck immediately notices and proceeds with quarantine, etc. If the code had changed as you suggest, then I think that healthcheck would fail to notice the change. |
|
|
Yes, I know you used binary packages. My guess was debian packagers patched the source code then created the binary package. I am not familiar with debian packages but there's should be a way to get the actual source code to be used to build debian packages (like source packages provided in RPM). If so, I could be able to confirm my speculation. > Also if I try stopping the backend, the healthcheck immediately notices and proceeds with quarantine, etc. If the code had changed as you suggest, then I think that healthcheck would fail to notice the change. If you stop PostgreSQL by using pg_ctl or other way and you connect to pgpool at the time, pgpool detects it by receiving a notice message from PostgreSQL. The health check process is not involved here. Also please note that there are multiple code path that detects failure in connecting to backend in health check and it is possible that health check detects the connection error in different ways. |
|
|
I found this page: https://sources.debian.org/patches/pgpool2/4.1.0-1/ It seems there's no patch for pool_connection_pool.c. So my theory fails:-< Another theory is, getaddrinfo() for "pg1" returns multiple network address info, e.g. IPV4 and IPV6 address. In this case health check first tries IPV4 address. If it fails with the error you are seeing, then it tries IPV6 address. If it succeeds, health check succeeds. This theory explains the situation. If my theory is correct, you could avoid the error by enabling both IPV4 and IPV6 address of PostgreSQL. |
|
|
Your theory is correct. But it was not the IPv6, it was name resolution in the /etc/hosts file and the postgresql.conf. Specifically, the /etc/hosts was like this: vagrant@pg1:~$ cat /etc/hosts 127.0.0.1 localhost 127.0.1.1 pg1 .... 10.10.20.11 pg1 .... And postgresql.conf like this: .... listen_addresses = 'localhost,10.10.20.11' # what IP address(es) to listen on; .... So when healthcheck tried to reach the local node on every machine, it would check via hostname (pg1, pg2, etc). With the hosts file above that leeds to 127.0.1.1 that postgresql doesn't listen, so it would fail, hence the error, and then try with the 10.10.20.11 which would be successful. That also explains why there was no error from healthchecks of remote hosts. I changed the hosts file to the following: vagrant@pg1:~$ cat /etc/hosts 127.0.0.1 localhost 127.0.1.1 pg1-local .... 10.10.20.11 pg1 .... And the logs are clear. Thank you for your quick response and support. You can close this issue. |
|
|
Thanks for the feedback. I am going to close this issue. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2020-01-17 02:50 | dcvythoulkas | New Issue | |
| 2020-01-17 02:50 | dcvythoulkas | File Added: pgpool.conf | |
| 2020-01-17 02:50 | dcvythoulkas | File Added: pool_hba.conf | |
| 2020-01-17 02:50 | dcvythoulkas | File Added: pgpool2.conf | |
| 2020-01-17 02:50 | dcvythoulkas | File Added: pgpool3.conf | |
| 2020-01-17 02:50 | dcvythoulkas | Tag Attached: logs | |
| 2020-01-17 02:50 | dcvythoulkas | Tag Attached: pgpool health check | |
| 2020-01-17 02:50 | dcvythoulkas | Tag Attached: settings | |
| 2020-01-17 16:35 | t-ishii | Note Added: 0003067 | |
| 2020-01-17 16:35 | t-ishii | Assigned To | => t-ishii |
| 2020-01-17 16:35 | t-ishii | Status | new => feedback |
| 2020-01-17 17:31 | dcvythoulkas | File Added: pg1_postgres_log | |
| 2020-01-17 17:31 | dcvythoulkas | File Added: pg1_syslog | |
| 2020-01-17 17:31 | dcvythoulkas | Note Added: 0003068 | |
| 2020-01-17 17:31 | dcvythoulkas | Status | feedback => assigned |
| 2020-01-18 07:12 | t-ishii | Note Added: 0003069 | |
| 2020-01-18 07:13 | t-ishii | Status | assigned => feedback |
| 2020-01-18 07:47 | dcvythoulkas | Note Added: 0003070 | |
| 2020-01-18 07:47 | dcvythoulkas | Status | feedback => assigned |
| 2020-01-18 09:16 | t-ishii | Note Added: 0003071 | |
| 2020-01-18 09:17 | t-ishii | Status | assigned => feedback |
| 2020-01-18 10:34 | t-ishii | Note Added: 0003072 | |
| 2020-01-18 11:00 | t-ishii | Note Edited: 0003072 | |
| 2020-01-19 02:43 | dcvythoulkas | Note Added: 0003073 | |
| 2020-01-19 02:43 | dcvythoulkas | Status | feedback => assigned |
| 2020-01-19 08:24 | t-ishii | Note Added: 0003074 | |
| 2020-01-19 11:43 | t-ishii | Status | assigned => resolved |