View Issue Details

ID: 0000552
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-10-30 15:04
Reporter: sivapgpool89
Assigned To: t-ishii
Priority: urgent
Severity: major
Reproducibility: unable to reproduce
Status: closed
Resolution: open
Platform: Linux x86-64
OS: Red Hat Enterprise Linux Server
OS Version: 7.5 (Maipo)
Product Version: 3.7.11
Summary: 0000552: unable to read data from DB node 0 and "write on backend 0 failed with error :"Success""

Description:

Hi Team,

Thanks a lot for your support; you are doing amazing community work. Without it, users like me would not be able to maintain this stack of products. Thank you very much, and much appreciation for your ongoing support.

-----
Please find the issue case:

Recently we upgraded pgpool from 3.7.7 to 3.7.11 to overcome "write on backend 0 failed with error :"Success"".

--

pgpool info:

Linux Server Type : Virtual Linux Servers (Master and Slave pgpool nodes VMs Running on Hypervisor)
OS version : Red Hat Enterprise Linux Server release 7.5 (Maipo)
pgpool-II version (use "product version" menu) : 3.7.11 (amefuriboshi)
what mode you use : streaming replication
What Functionality you use : Connection Pooling, Load Balancing and Watchdog.
attached files : pgpool.conf and pgpool.log

Installed Components:
pgpool-II-pg95-debuginfo-3.7.11-1pgdg.rhel7.x86_64
pgpool-II-release-3.7-1.noarch
pgpool-II-pg95-3.7.11-1pgdg.rhel7.x86_64
pgpool-II-pg95-extensions-3.7.11-1pgdg.rhel7.x86_64
pgpool-II-pg95-devel-3.7.11-1pgdg.rhel7.x86_64

---
PostgreSQL Info:

Linux Server Type : Physical Servers
OS Version : Red Hat Enterprise Linux Server release 7.6 (Maipo)
PostgreSQL version : PostgreSQL 9.5.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
what kind of application language : pl/pgsql for server programming and JAVA 8
Type of workload : GIS Application

Installed Components:
postgresql95-devel-9.5.15-1PGDG.rhel7.x86_64
postgresql95-9.5.15-1PGDG.rhel7.x86_64
postgresql95-contrib-9.5.15-1PGDG.rhel7.x86_64
postgresql95-server-9.5.15-1PGDG.rhel7.x86_64
postgresql95-libs-9.5.15-1PGDG.rhel7.x86_64

DB Infrastructure:

we have 2 pgpool nodes and 4 back-end servers:

 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0 | 10.13.61.163 | 5432 | up | 0.250000 | primary | 2587017 | false | 0
 1 | 10.13.61.164 | 5432 | up | 0.250000 | standby | 1511372 | false | 0
 2 | 10.13.61.165 | 5432 | up | 0.250000 | standby | 1683973 | true | 0
 3 | 10.13.61.166 | 5432 | up | 0.250000 | standby | 1470830 | false | 0
(4 rows)

------------

Now we are receiving the issues below in the pgpool and application logs. I have extracted the main issues; all are included in the attached log file (Sep 10 and Oct 9 logs for reference).

------
Issues:

We are continuously getting the issues below.

Could you please help us find solutions:

A). What is causing these issues?
B). What configuration do I have to change to resolve them?

1.
pid 1785: WARNING: watchdog failed to ping host"10.13.61.165"
WARNING: watchdog failed to ping host"10.13.61.165"

2.
LOG: failed to connect to PostgreSQL server on "10.13.61.165:5432" using INET socket
DETAIL: health check timer expired

3.
WARNING: watchdog lifecheck, failed to connect to any trusted servers
LOG: informing the node status change to watchdog
DETAIL: node id :0 status = "NODE DEAD" message:"trusted server is unreachable"

4.
ERROR:unable to read data from DB node 0
DETAIL: socket read failed with an error "Connection timed out"

5.
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18

6.
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 5

7.
ERROR: unable to read data from DB node 0
DETAIL: socket read failed with an error "Connection timed out"
CONTEXT: while searching system catalog, When relcache is missed
LOG: statement: DISCARD ALL
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18

----------

Detailed Log:

1.
Sep 10 00:53:15 N1PPRL-UFGA0159 pgpool[1785]: [18-1] 2019-09-10 00:53:15: pid 1785: WARNING: watchdog failed to ping host"10.13.61.165"
Sep 10 00:53:15 N1PPRL-UFGA0159 pgpool: 2019-09-10 00:53:15: pid 1785: WARNING: watchdog failed to ping host"10.13.61.165"

2.
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: LOG: failed to connect to PostgreSQL server on "10.13.61.165:5432" using INET socket
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: DETAIL: health check timer expired
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: ERROR: failed to make persistent db connection
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: DETAIL: connection to host:"10.13.61.165:5432" failed
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: LOG: health check retrying on DB node: 2 (round:1)
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [13-2] 2019-09-10 01:02:29: pid 1925: DETAIL: health check timer expired
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [14-1] 2019-09-10 01:02:29: pid 1925: ERROR: failed to make persistent db connection
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [14-2] 2019-09-10 01:02:29: pid 1925: DETAIL: connection to host:"10.13.61.165:5432" failed
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [15-1] 2019-09-10 01:02:29: pid 1925: LOG: health check retrying on DB node: 2 (round:1)
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool[1785]: [23-1] 2019-09-10 01:02:31: pid 1785: WARNING: watchdog failed to ping host"10.13.61.164"


3.
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:31: pid 1785: WARNING: watchdog lifecheck, failed to connect to any trusted servers
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:31: pid 1785: LOG: informing the node status change to watchdog
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:31: pid 1785: DETAIL: node id :0 status = "NODE DEAD" message:"trusted server is unreachable"
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool[1785]: [23-2] 2019-09-10 01:02:31: pid 1785: DETAIL: ping process exits with code: 1


4.
Oct 9 12:19:40 N1PPRL-UFGA0159 pgpool[16638]: [29646-2] 2019-10-09 12:19:40: pid 16638: DETAIL: connecting host=n1vw-pa-gis11.india.airtel.itm port=63901
Oct 9 12:21:36 N1PPRL-UFGA0159 pgpool[9131]: [28320-1] 2019-10-09 12:21:36: pid 9131: ERROR: unable to read data from DB node 2
Oct 9 12:21:36 N1PPRL-UFGA0159 pgpool[9131]: [28320-2] 2019-10-09 12:21:36: pid 9131: DETAIL: socket read failed with an error "Connection timed out"
Oct 9 12:21:36 N1PPRL-UFGA0159 pgpool[9131]: [28321-1] 2019-10-09 12:21:36: pid 9131: LOG: statement: DISCARD ALL
Oct 9 12:21:39 N1PPRL-UFGA0159 pgpool[9131]: [28322-1] 2019-10-09 12:21:39: pid 9131: FATAL: unable to read data from DB node 0
Oct 9 12:21:39 N1PPRL-UFGA0159 pgpool[9131]: [28322-2] 2019-10-09 12:21:39: pid 9131: DETAIL: EOF encountered with backend
Oct 9 12:21:39 N1PPRL-UFGA0159 pgpool[9131]: [28323-1] 2019-10-09 12:21:39: pid 9131: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:21:39 N1PPRL-UFGA0159 pgpool[9131]: [28323-2] 2019-10-09 12:21:39: pid 9131: DETAIL: while trying to write data from offset: 0 wlen: 5


5.
Oct 9 12:30:01 N1PPRL-UFGA0159 pgpool[29546]: [26137-2] 2019-10-09 12:30:01: pid 29546: DETAIL: connecting host=n1vw-pa-gis12.india.airtel.itm port=53026
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28921-1] 2019-10-09 12:31:06: pid 19856: ERROR: unable to read data from DB node 0
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28921-2] 2019-10-09 12:31:06: pid 19856: DETAIL: socket read failed with an error "Connection timed out"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28922-1] 2019-10-09 12:31:06: pid 19856: LOG: statement: DISCARD ALL
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28923-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28923-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 18
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28924-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28924-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28925-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 1 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28925-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28926-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 2 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28926-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28927-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 3 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28927-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28928-1] 2019-10-09 12:31:06: pid 19856: LOG: new connection received
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28928-2] 2019-10-09 12:31:06: pid 19856: DETAIL: connecting host=10.92.204.141 port=51938


6.
Oct 9 12:31:11 N1PPRL-UFGA0159 pgpool[20079]: [27321-1] 2019-10-09 12:31:11: pid 20079: LOG: new connection received
Oct 9 12:31:11 N1PPRL-UFGA0159 pgpool[20079]: [27321-2] 2019-10-09 12:31:11: pid 20079: DETAIL: connecting host=n1vw-pa-gis12.india.airtel.itm port=53027
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26342-1] 2019-10-09 12:31:54: pid 18576: ERROR: unable to read data from DB node 0
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26342-2] 2019-10-09 12:31:54: pid 18576: DETAIL: socket read failed with an error "Connection timed out"
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26342-3] 2019-10-09 12:31:54: pid 18576: CONTEXT: while searching system catalog, When relcache is missed
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26343-1] 2019-10-09 12:31:54: pid 18576: LOG: statement: DISCARD ALL
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26344-1] 2019-10-09 12:31:54: pid 18576: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26344-2] 2019-10-09 12:31:54: pid 18576: DETAIL: while trying to write data from offset: 0 wlen: 18
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26345-1] 2019-10-09 12:31:54: pid 18576: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26345-2] 2019-10-09 12:31:54: pid 18576: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26346-1] 2019-10-09 12:31:54: pid 18576: LOG: new connection received

-------

Please help me to resolve the above issues.

Also please help me to clarify below queries:

1. Do the PostgreSQL Linux servers and the pgpool Linux servers need to be the same server type (both physical or both virtual)? If they mismatch, is there any impact?
2. Is it mandatory to have the same OS version on both the PostgreSQL and pgpool Linux servers?
3. How can we route one particular application server's traffic to the master database? Is there any functionality like black_function_list?
4. Sometimes pgpool takes a long time to execute "show pool_nodes" and other queries, from pgpool as well as from pgAdmin. Please suggest a process to troubleshoot this kind of issue.
5. Sometimes pgpool fails to ping the DB nodes. Please suggest how to configure ping retries, like the DB health check configuration.
6. Please suggest how to create a redundant network interface for pgpool to overcome network-related issues.

I know this is a very long list of items; please suggest the required changes and respond at your earliest convenience.

Once again thanks a lot for your support.

Regards,
Siva.


Tags: No tags attached.

Activities

sivapgpool89

2019-10-09 20:04

reporter  

pgpool_log.log (29,916 bytes)
Master_Pool_Config_Values.sql (42,758 bytes)   

t-ishii

2019-10-10 15:30

developer   ~0002915

There are so many questions:-) Let's attack one by one...

In the log below, the health check seems to retry despite health_check_retry_delay = 10, meaning the next retry should happen after 10 seconds have passed (i.e. at 01:02:39, rather than 01:02:29). Was your pgpool.conf (Master_Pool_Config_Values.sql) in use while the logs below were taken?

Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: LOG: failed to connect to PostgreSQL server on "10.13.61.165:5432" using INET socket
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: DETAIL: health check timer expired
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: ERROR: failed to make persistent db connection
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: DETAIL: connection to host:"10.13.61.165:5432" failed
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: LOG: health check retrying on DB node: 2 (round:1)
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [13-2] 2019-09-10 01:02:29: pid 1925: DETAIL: health check timer expired
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [14-1] 2019-09-10 01:02:29: pid 1925: ERROR: failed to make persistent db connection
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [14-2] 2019-09-10 01:02:29: pid 1925: DETAIL: connection to host:"10.13.61.165:5432" failed
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [15-1] 2019-09-10 01:02:29: pid 1925: LOG: health check retrying on DB node: 2 (round:1)
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool[1785]: [23-1] 2019-09-10 01:02:31: pid 1785: WARNING: watchdog failed to ping host"10.13.61.164"

sivapgpool89

2019-10-10 20:40

reporter   ~0002916

Thanks, t-ishii.

> In the log below, the health check seems to retry despite health_check_retry_delay = 10, meaning the next retry should happen after 10 seconds have passed (i.e. at 01:02:39, rather than 01:02:29). Was your pgpool.conf (Master_Pool_Config_Values.sql) in use while the logs below were taken?

==> Yes.

So, is there any impact because of this? Please suggest.

Also, please respond to the other questions as well.

In addition, we observed strange pgpool behavior: pgpool is running but is not able to accept connections from clients, and finally we get "server not listening on port".

Please suggest the reasons for this kind of behavior and how to overcome it.

t-ishii

2019-10-10 22:34

developer   ~0002917

> So, is there any impact because of this?
Yes, because the health check does not work normally.

BTW, do all these problems start to happen as soon as Pgpool-II starts?
Or do they happen occasionally? If so, is there any possible trigger?

> Please suggest the reasons for this kind of behavior and how to overcome it.
At this point, no idea. It seems the network is unstable from Pgpool-II's point of view, but I am not sure.

t-ishii

2019-10-11 09:08

developer   ~0002919

Last edited: 2019-10-11 09:19

I have noticed possible issues in your pgpool.conf:

1) trusted_servers points to the backend node IP addresses. This is plain wrong: trusted_servers should point to hosts other than the PostgreSQL backends and the pgpool nodes. If you cannot do that, you should disable trusted_servers (set it to an empty string).

From your log, pgpool failed to ping all backend IPs and decided to commit suicide, which prevented the health check process from retrying more times. If trusted_servers had been set correctly (or set to an empty string), the health check retries might have succeeded. I don't know why pgpool failed to ping all the PostgreSQL backends, but I think there was a network problem between the pgpool and backend nodes at that time.

2) failover_command is disabled. If the PostgreSQL primary goes down, a standby server will not be automatically promoted. I am not sure whether that is intended, though.
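As a sketch of the suggested trusted_servers fix, a pgpool.conf fragment might look like the following. The addresses here (a gateway and a public DNS server) are hypothetical placeholders; use hosts on your own network that are neither PostgreSQL backends nor pgpool nodes.

```
# Hosts the watchdog lifecheck pings to decide whether the network is up.
# They must NOT be the PostgreSQL backends or the pgpool nodes themselves;
# a router or DNS server reachable from both pgpool nodes is typical.
trusted_servers = '10.13.61.1,8.8.8.8'

# Alternatively, disable the trusted-server ping entirely:
# trusted_servers = ''
```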

t-ishii

2019-10-11 09:36

developer   ~0002920

Last edited: 2019-10-11 13:09

> 1. Do the PostgreSQL Linux servers and the pgpool Linux servers need to be the same server type (both physical or both virtual)?
No.

> 2. Is it mandatory to have the same OS version on both the PostgreSQL and pgpool Linux servers?
No.

> 3. How can we route one particular application server's traffic to the master database? Is there any functionality like black_function_list?

If the application running on that application server uses a unique application_name or database name, then you can use app_name_redirect_preference_list or database_redirect_preference_list to control which server the clients connect to.
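As an illustrative pgpool.conf sketch of these two parameters (the application name 'gisapp' and database name 'reports' are hypothetical):

```
# Send clients whose application_name is 'gisapp' to node 0 (the primary),
# and clients connecting to database 'reports' to standby node 1.
# Format is a comma-separated list of 'name:node_id' pairs.
app_name_redirect_preference_list = 'gisapp:0'
database_redirect_preference_list = 'reports:1'
```

The application must actually set application_name in its connection parameters for the first list to match.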

> 4. Sometimes pgpool takes a long time to execute "show pool_nodes" and other queries, from pgpool as well as from pgAdmin. Please suggest a process to troubleshoot this kind of issue.

No idea. The command is simple, and there's no part which could take a long time to obtain the data (basically it just fetches data from shared memory). If I were you, I would use a debugger or profiling tools to find which part of the code takes a long time, but this requires knowledge of Pgpool-II internals.

> 5. Sometimes pgpool fails to ping the DB nodes. Please suggest how to configure ping retries, like the DB health check configuration.

As I said, pinging the DB nodes is not a good idea.

> 6. Please suggest how to create a redundant network interface for pgpool to overcome network-related issues.

Pgpool-II itself does not have such a feature except heartbeat. You had better use bonding or a similar technique to run over multiple physical networks for that purpose. I am not a network expert; you had better consult someone else.

sivapgpool89

2019-10-11 16:25

reporter   ~0002921

Thanks a lot, t-ishii, for your suggestions. I am checking and will share my findings.

sivapgpool89

2019-10-11 18:34

reporter   ~0002922

Hi t-ishii,

Thanks for your support.

Could you please help me resolve the issues below as well.

1.
ERROR:unable to read data from DB node 0
DETAIL: socket read failed with an error "Connection timed out"

2.
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18

3.
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 5

4.
ERROR: unable to read data from DB node 0
DETAIL: socket read failed with an error "Connection timed out"
CONTEXT: while searching system catalog, When relcache is missed
LOG: statement: DISCARD ALL
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18

Regards,
Siva.

t-ishii

2019-10-12 13:21

developer   ~0002923

Basically, these are network-layer errors reported by the kernel.

> 1.
> ERROR:unable to read data from DB node 0
> DETAIL: socket read failed with an error "Connection timed out"
Pgpool tried to read data from backend 0 but failed: backend 0 did not send a response within the specified time.

> 2.
> WARNING: write on backend 0 failed with error :"Success"
> DETAIL: while trying to write data from offset: 0 wlen: 18

The connection to backend 0 was already broken, so Pgpool-II failed to write to the socket.

> 3.
> FATAL: unable to read data from DB node 0
> DETAIL: EOF encountered with backend
> WARNING: write on backend 0 failed with error :"Success"
> DETAIL: while trying to write data from offset: 0 wlen: 5
For the same reason as 2, Pgpool-II failed to read from the socket to backend 0.

> 4.
> ERROR: unable to read data from DB node 0
> DETAIL: socket read failed with an error "Connection timed out"
> CONTEXT: while searching system catalog, When relcache is missed
> LOG: statement: DISCARD ALL
> WARNING: write on backend 0 failed with error :"Success"
> DETAIL: while trying to write data from offset: 0 wlen: 18
Same reason as 3.

sivapgpool89

2019-10-12 16:39

reporter   ~0002924

Thank you very much t-ishii.

Sorry for raising so many issues.

From your comment "Basically they are network layer errors reported by the kernel", I clearly understand these are caused by network issues.

But in the logs I haven't seen any network breakage between the DB nodes and the pgpool nodes around the above issues, and the issues are reported almost all the time in the pgpool log file.

In one case, an incident happened: in the master DB, one procedure containing DML statements had been running since the previous midnight, taking nearly all the CPU resources (90% to 100%), and it held locks on tables; because of that, all other sessions were blocking each other and almost all 108 sessions went into a waiting state.

I observed a few points at that time, and I am not sure whether that incident might have caused the issues above.

1. We were not able to run psql -U pgpool --host 10.222.164.81 --dbname postgres -c "show pool_nodes"; it took a long time and finally produced "Could not connect to server".
2. It generated the issues below in the log:
a. ERROR:unable to read data from DB node 0
  DETAIL: socket read failed with an error "Connection timed out"
b.
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18
c.
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 5

After terminating the procedure, cancelling all the locking sessions, and restarting pgpool, pgpool was able to accept connections again. But even after the restart, the above issues are occasionally reported, and the application also reports the messages below.

<Warning> <Default> <BEA-000000> <SQLException occurred while connecting to 10.222.167.250:
5432
org.postgresql.util.PSQLException: ERROR: unable to read data from DB node 1
  Detail: socket read failed with an error "Connection timed out"
  
---

Please find my responses to your answers below.

> 1.
> ERROR:unable to read data from DB node 0
> DETAIL: socket read failed with an error "Connection timed out"
Pgpool tried to read data from backend 0 but failed. Backend 0 did not send a response in specified time.

Please find my responses and questions:

1. Can we ignore this in the first phase, or might it cause other issues?
2. There was actually no network breakage; what might cause the read to fail?
3. What is the threshold time for the backend to send a response to pgpool?
4. How do we configure the thresholds to overcome this kind of issue?
5. Is this because of connection pooling? We are using pgpool for connection pooling and load balancing.
6. Please suggest how to overcome this issue.


> 2.
> WARNING: write on backend 0 failed with error :"Success"
> DETAIL: while trying to write data from offset: 0 wlen: 18

Connection to backend 0 was already broken. Pgpool-II failed to write to the socket.

My responses and questions:
1. What might cause the connections to break?
2. Could the idle-connection configuration cause these issues?
3. Is this expected behavior, or will it impact pgpool in the long term?
4. Please suggest how to resolve these issues.

below are the Life Time Configurations:

# - Life time -

child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0

3. Occasionally we are getting backend node degeneration and failover in the logs.

health_check_period = 20
health_check_timeout = 20
health_check_user = 'pgpool'
health_check_password = ''
health_check_database = 'postgres'
health_check_max_retries = 10
health_check_retry_delay = 10
connect_timeout = 80000
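For reference, a rough sketch of what the health check values above imply (my arithmetic, not from the thread): each failed check waits up to health_check_timeout before retrying after health_check_retry_delay, so a node that keeps timing out is only degenerated after roughly all retries are exhausted.

```
# Approximate worst-case time before a failing node is degenerated:
#   health_check_max_retries * (health_check_timeout + health_check_retry_delay)
#   = 10 * (20 + 10) = ~300 seconds of continuously failing checks
health_check_timeout = 20
health_check_max_retries = 10
health_check_retry_delay = 10
```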
     
From your previous comment: "In the log below, the health check seems to retry despite health_check_retry_delay = 10, meaning the next retry should happen after 10 seconds have passed (i.e. at 01:02:39, rather than 01:02:29)."

Given the configuration above, we think we may have misconfigured the health check settings and that this is causing problems. Could you please suggest some good values to mitigate the issues?

We are using pgpool only for connection pooling and load balancing. Can we configure backend_flag0 = 'DISALLOW_TO_FAILOVER'? Would this help in these situations?

Thank you very much for your support.

Regards,
Siva.
---

t-ishii

2019-10-14 11:07

developer   ~0002925

I am waiting for the answer to the question below.
> BTW, do all these problems start to happen as soon as Pgpool-II starts?
> Or do they happen occasionally? If so, is there any possible trigger?

sivapgpool89

2019-10-14 13:09

reporter   ~0002926

Hi t-ishii,

Thanks for the update.

Please find my responses.

> BTW, do all these problems start to happen as soon as Pgpool-II starts?

These problems are being reported more and more frequently.

> Or do they happen occasionally? If so, is there any possible trigger?

The application is accessing pgpool.

Could you please let me know what you mean by a possible trigger?

Regards,
Siva.

sivapgpool89

2019-10-16 22:20

reporter   ~0002928

Hi t-ishii,

Could you please update on this request.

Thanks,
Siva.

t-ishii

2019-10-18 07:52

developer   ~0002933

> > BTW, do all these problems start to happen as soon as Pgpool-II starts?
> These problems are being reported more and more frequently.

Sorry, I don't understand what you are saying.

> > Or do they happen occasionally? If so, is there any possible trigger?
> The application is accessing pgpool.
> Could you please let me know what you mean by a possible trigger?
If Pgpool-II had been running for a while without problems and you suddenly start to see issues, then we can logically guess that something made the issue happen. I use "trigger" for that something.

sivapgpool89

2019-10-21 18:55

reporter   ~0002937

Hi t-ishii,

Thanks for update.

Below are a few observations.

At that particular point in time, all num_init_children slots were full. Please find the configuration at that point:

num_init_children = 128
max_pool = 8
                                                
# - Life time -
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0

[cyient@N1PPRL-UFGA0159 ~]$ netstat -s | grep overflow
    369125 times the listen queue of a socket overflowed
    
[root@s2n ~]# sysctl net.core.somaxconn
net.core.somaxconn = 128
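One hedged observation on the numbers above (my suggestion, not from the thread): the listen-queue overflows suggest the kernel accept backlog (net.core.somaxconn = 128) was saturated while all 128 pgpool children were busy. If the backlog itself is the bottleneck, it can be raised persistently with a sysctl fragment such as:

```
# /etc/sysctl.d/90-pgpool.conf  (hypothetical file name)
# Raise the kernel ceiling on the accept-queue length. The effective
# backlog is still capped by what pgpool passes to listen(), so this
# only removes the kernel-side limit.
net.core.somaxconn = 1024
```

Apply with sysctl --system (or a reboot). Pgpool also has a listen_backlog_multiplier parameter that scales the backlog it requests relative to num_init_children, which may be worth checking alongside this.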

--

As soon as I found the pool full, I changed the config to the values below.
num_init_children = 500
max_pool = 4
                                                
# - Life time -
child_life_time = 180
child_max_connections = 0
connection_life_time = 180
client_idle_limit = 180

After the above changes I didn't see the "Connection timed out" issues and everything was good. But due to client_idle_limit, the pool was terminating client connections and causing some application issues, so I changed the configs again.

I then tweaked some of the configs; below are the latest values.

num_init_children = 1000
max_pool = 1
                                                
# - Life time -
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
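A hedged rule of thumb for sizing these values (my arithmetic; verify against your servers' settings): pgpool can hold up to num_init_children × max_pool backend connections per pgpool node, and that product should fit within PostgreSQL's max_connections minus superuser_reserved_connections.

```
# Per pgpool node, worst-case backend connections:
#   num_init_children * max_pool = 1000 * 1 = 1000
# With two pgpool nodes, each backend would need to allow roughly
# 2000 connections; otherwise new child connections will be refused.
num_init_children = 1000
max_pool = 1
```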

Since changing the configs, I have been getting the issues below almost every hour; please find the error messages from the log.

-

Oct 21 01:42:50 N1PPRL-UFGA0159 pgpool[19142]: [81863-1] 2019-10-21 01:42:50: pid 19142: ERROR: unable to read data from DB node 1
Oct 21 01:42:50 N1PPRL-UFGA0159 pgpool[19142]: [81863-2] 2019-10-21 01:42:50: pid 19142: DETAIL: socket read failed with an error "Connection timed out"
Oct 21 01:42:50 N1PPRL-UFGA0159 pgpool[19142]: [81864-1] 2019-10-21 01:42:50: pid 19142: LOG: statement: DISCARD ALL
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[19142]: [81865-1] 2019-10-21 01:42:51: pid 19142: FATAL: unable to read data from DB node 0
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[19142]: [81865-2] 2019-10-21 01:42:51: pid 19142: DETAIL: EOF encountered with backend
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[19142]: [81866-1] 2019-10-21 01:42:51: pid 19142: WARNING: write on backend 0 failed with error :"Success"
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[19142]: [81866-2] 2019-10-21 01:42:51: pid 19142: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[2407]: [29694-1] 2019-10-21 01:42:51: pid 2407: LOG: child process with pid: 19142 exits with status 256


Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[23583]: [29435-1] 2019-10-21 06:22:00: pid 23583: LOG: new connection received
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[23583]: [29435-2] 2019-10-21 06:22:00: pid 23583: DETAIL: connecting host=n1vw-pa-gis14.india.airtel.itm port=51900
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[2407]: [30718-1] 2019-10-21 06:22:00: pid 2407: LOG: child process with pid: 28320 exits with status 256
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[2407]: [30719-1] 2019-10-21 06:22:00: pid 2407: LOG: fork a new child process with pid: 9333
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[2407]: [30720-1] 2019-10-21 06:22:00: pid 2407: LOG: child process with pid: 29056 exits with status 256
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[2407]: [30721-1] 2019-10-21 06:22:00: pid 2407: LOG: fork a new child process with pid: 9334
Oct 21 06:23:21 N1PPRL-UFGA0159 pgpool[2407]: [30722-1] 2019-10-21 06:23:21: pid 2407: LOG: child process with pid: 6256 exits with status 256
Oct 21 06:23:21 N1PPRL-UFGA0159 pgpool[2407]: [30723-1] 2019-10-21 06:23:21: pid 2407: LOG: fork a new child process with pid: 9519
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108033-1] 2019-10-21 06:24:02: pid 24428: ERROR: unable to read data from DB node 0
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108033-2] 2019-10-21 06:24:02: pid 24428: DETAIL: socket read failed with an error "Connection timed out"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108034-1] 2019-10-21 06:24:02: pid 24428: LOG: statement: DISCARD ALL
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108035-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 0 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108035-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 18
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108036-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 0 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108036-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108037-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 1 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108037-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108038-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 2 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108038-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108039-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 3 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108039-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[8055]: [30690-1] 2019-10-21 06:24:02: pid 8055: LOG: statement: DISCARD ALL


Have I missed something or misconfigured anything? Are these issues related to any other issue?
Could you please suggest how to resolve this kind of issue?

Thanks A Lot for your support.

Regards,
Siva.

sivapgpool89

2019-10-22 12:27

reporter   ~0002938

Hi t-ishii,

Please share the possible causes that could trigger the above issues, so that we can check all of them. Please also point us to the source code related to these issues.

Thank you
SIva.

t-ishii

2019-10-23 18:45

developer   ~0002939

You can use grep command:

cd /path/to/pgpool/source/code
cd src
grep -i "write on backend" */*.c

sivapgpool89

2019-10-24 13:48

reporter   ~0002940

Thanks for the source code details, t-ishii.

Could you please give an update on the issues.

t-ishii

2019-10-24 16:35

developer   ~0002941

> Have I missed something or misconfigured anything?

Besides trusted_servers, so far so good.

> Are these issues related to any other issue?

Probably. As I said before, there could be issues with: 1) the network connection between pgpool and PostgreSQL, or 2) PostgreSQL itself.

> Could you please suggest how to resolve this kind of issue?
I have no idea. They are not related to Pgpool-II itself. Again, I suggest you check the network and PostgreSQL.

sivapgpool89

2019-10-25 13:14

reporter   ~0002942

Thanks for the update, t-ishii.

sivapgpool89

2019-10-30 13:35

reporter   ~0002950

Thanks a lot for your support.

Please close this task.

t-ishii

2019-10-30 15:04

developer   ~0002951

Thanks. I am going to close the issue.

Issue History

Date Modified Username Field Change
2019-10-09 20:04 sivapgpool89 New Issue
2019-10-09 20:04 sivapgpool89 File Added: pgpool_log.log
2019-10-09 20:04 sivapgpool89 File Added: Master_Pool_Config_Values.sql
2019-10-10 15:12 t-ishii Assigned To => t-ishii
2019-10-10 15:12 t-ishii Status new => assigned
2019-10-10 15:30 t-ishii Note Added: 0002915
2019-10-10 15:30 t-ishii Status assigned => feedback
2019-10-10 15:30 t-ishii Description Updated
2019-10-10 20:40 sivapgpool89 Note Added: 0002916
2019-10-10 20:40 sivapgpool89 Status feedback => assigned
2019-10-10 22:34 t-ishii Note Added: 0002917
2019-10-11 09:08 t-ishii Note Added: 0002919
2019-10-11 09:09 t-ishii Status assigned => feedback
2019-10-11 09:19 t-ishii Note Edited: 0002919
2019-10-11 09:36 t-ishii Note Added: 0002920
2019-10-11 13:09 t-ishii Note Edited: 0002920
2019-10-11 16:25 sivapgpool89 Note Added: 0002921
2019-10-11 16:25 sivapgpool89 Status feedback => assigned
2019-10-11 18:34 sivapgpool89 Note Added: 0002922
2019-10-12 13:21 t-ishii Note Added: 0002923
2019-10-12 13:21 t-ishii Status assigned => feedback
2019-10-12 16:39 sivapgpool89 Note Added: 0002924
2019-10-12 16:39 sivapgpool89 Status feedback => assigned
2019-10-14 11:07 t-ishii Note Added: 0002925
2019-10-14 11:07 t-ishii Status assigned => feedback
2019-10-14 13:09 sivapgpool89 Note Added: 0002926
2019-10-14 13:09 sivapgpool89 Status feedback => assigned
2019-10-16 22:20 sivapgpool89 Note Added: 0002928
2019-10-18 07:52 t-ishii Note Added: 0002933
2019-10-18 08:01 t-ishii Status assigned => feedback
2019-10-21 18:55 sivapgpool89 Note Added: 0002937
2019-10-21 18:55 sivapgpool89 Status feedback => assigned
2019-10-22 12:27 sivapgpool89 Note Added: 0002938
2019-10-23 18:45 t-ishii Note Added: 0002939
2019-10-24 13:48 sivapgpool89 Note Added: 0002940
2019-10-24 16:35 t-ishii Note Added: 0002941
2019-10-25 10:25 t-ishii Status assigned => feedback
2019-10-25 13:14 sivapgpool89 Note Added: 0002942
2019-10-25 13:14 sivapgpool89 Status feedback => assigned
2019-10-30 13:35 sivapgpool89 Note Added: 0002950
2019-10-30 15:04 t-ishii Note Added: 0002951
2019-10-30 15:04 t-ishii Status assigned => closed