[pgpool-general: 6501] Pgpool Primary node not automatically returning to cluster

Nitish Kumar itcell.mpwz at mp.gov.in
Thu Apr 4 21:45:39 JST 2019


Hi Team,

I am using Pgpool-II 3.7 with three PostgreSQL 10.6 nodes at the backend.

Everything has been working fine, but today we noticed something unusual.

During a normal production run with heavy traffic, our primary node was marked down due to a network failure: the link between the Pgpool-II (master) server
and the primary node went down.
Pgpool then wrote the following line to the log:

2019-04-04 16:12:56: pid 27680:LOG: failed to connect to PostgreSQL server on "172.18.0.160:5432", getsockopt() detected error "No route to host"

Our write requests started failing. When we were alerted, we investigated and found that the master DB (the primary node) was up and working fine; only the network between
the primary node and the Pgpool-II master server was down.
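
From what I have read, pgpool's health check retry settings decide how long a transient network outage is tolerated before a backend is detached, so tuning them may be one way to ride out short failures like this. A sketch of the relevant pgpool.conf parameters (the values are illustrative assumptions, not our production settings):

    # pgpool.conf -- health check tuning (illustrative values)
    health_check_period = 10        # seconds between health checks
    health_check_timeout = 20       # abort a single check after this many seconds
    health_check_max_retries = 3    # retries before a node is considered down
    health_check_retry_delay = 5    # seconds between retries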

We fixed the network, and the Pgpool-II master process was able to connect to the primary node again. But it did not return the primary node to the Pgpool cluster automatically.
We kept getting the following lines in the logs continuously:


Apr 4 16:13:12 pgpool2 pgpool[21822]: [2325-1] 2019-04-04 16:13:12: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:12 pgpool2 pgpool[21822]: [2326-1] 2019-04-04 16:13:12: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:12 pgpool2 pgpool[21822]: [2327-1] 2019-04-04 16:13:12: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:13 pgpool2 pgpool[21822]: [2328-1] 2019-04-04 16:13:13: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:13 pgpool2 pgpool[21822]: [2329-1] 2019-04-04 16:13:13: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:13 pgpool2 pgpool[21822]: [2330-1] 2019-04-04 16:13:13: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:14 pgpool2 pgpool[21822]: [2331-1] 2019-04-04 16:13:14: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:14 pgpool2 pgpool[21822]: [2332-1] 2019-04-04 16:13:14: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:14 pgpool2 pgpool[21822]: [2333-1] 2019-04-04 16:13:14: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:15 pgpool2 pgpool[21822]: [2334-1] 2019-04-04 16:13:15: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:15 pgpool2 pgpool[21822]: [2335-1] 2019-04-04 16:13:15: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:15 pgpool2 pgpool[21822]: [2336-1] 2019-04-04 16:13:15: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:16 pgpool2 pgpool[21822]: [2337-1] 2019-04-04 16:13:16: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:16 pgpool2 pgpool[21822]: [2338-1] 2019-04-04 16:13:16: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:16 pgpool2 pgpool[21822]: [2339-1] 2019-04-04 16:13:16: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:17 pgpool2 pgpool[27247]: [2976-1] 2019-04-04 16:13:17: pid 27247:LOG: Replication of node:2 is behind 695032 bytes from the primary server (node:0)
Apr 4 16:13:17 pgpool2 pgpool[27247]: [2976-2] 2019-04-04 16:13:17: pid 27247:CONTEXT: while checking replication time lag
Apr 4 16:13:17 pgpool2 pgpool[21822]: [2340-1] 2019-04-04 16:13:17: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:17 pgpool2 pgpool[21822]: [2341-1] 2019-04-04 16:13:17: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:17 pgpool2 pgpool[21822]: [2342-1] 2019-04-04 16:13:17: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:19 pgpool2 pgpool[21822]: [2343-1] 2019-04-04 16:13:19: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:19 pgpool2 pgpool[21822]: [2344-1] 2019-04-04 16:13:19: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:19 pgpool2 pgpool[21822]: [2345-1] 2019-04-04 16:13:19: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:20 pgpool2 pgpool[21822]: [2346-1] 2019-04-04 16:13:20: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:20 pgpool2 pgpool[21822]: [2347-1] 2019-04-04 16:13:20: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:20 pgpool2 pgpool[21822]: [2348-1] 2019-04-04 16:13:20: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:21 pgpool2 pgpool[21822]: [2349-1] 2019-04-04 16:13:21: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:21 pgpool2 pgpool[21822]: [2350-1] 2019-04-04 16:13:21: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:21 pgpool2 pgpool[21822]: [2351-1] 2019-04-04 16:13:21: pid 21822:LOG: find_primary_node: checking backend no 2
Apr 4 16:13:22 pgpool2 pgpool[21822]: [2352-1] 2019-04-04 16:13:22: pid 21822:LOG: find_primary_node: checking backend no 0
Apr 4 16:13:22 pgpool2 pgpool[21822]: [2353-1] 2019-04-04 16:13:22: pid 21822:LOG: find_primary_node: checking backend no 1
Apr 4 16:13:22 pgpool2 pgpool[21822]: [2354-1] 2019-04-04 16:13:22: pid 21822:LOG: find_primary_node: checking backend no 2
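
While these messages looped, the former primary presumably still showed as down from pgpool's point of view; if I understand correctly, that status can be checked through pgpool itself with SHOW pool_nodes (the port, user and database below are assumptions for illustration):

    psql -h <pgpool-host> -p 9999 -U postgres -d postgres -c "SHOW pool_nodes;"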


To get the primary node back into the Pgpool-II cluster, we had to manually click "Return" in the pgpoolAdmin web app.
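
For reference, I believe the pgpoolAdmin "Return" button corresponds to running pcp_attach_node against the pgpool server, so the reattach can at least be scripted (host, PCP port and user below are assumptions):

    # reattach backend node 0 (the former primary) to the pgpool cluster
    pcp_attach_node -h <pgpool-host> -p 9898 -U pgpool -n 0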


My concern is: why did the primary node not return to the cluster automatically once the network was restored?


Kindly help me avert this kind of failover in the future. Is there something I am missing here?


Regards,
Nitish Kumar