View Issue Details

ID: 0000552
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2019-10-30 15:04
Reporter: sivapgpool89
Assigned To: t-ishii
Priority: urgent
Severity: major
Reproducibility: unable to reproduce
Status: closed
Resolution: open
Platform: Linux x86-64
OS: Red Hat Enterprise Linux Server
OS Version: 7.5 (Maipo)
Product Version: 3.7.11
Summary: 0000552: unable to read data from DB node 0 and "write on backend 0 failed with error :"Success""

Description:

Hi Team,

Thanks a lot for your support; you are doing amazing community work. Without it, users like me would not be able to maintain this stack of products. Thank you very much, and much appreciation for your ongoing support.

-----
Please find the issue case:

Recently we upgraded pgpool from 3.7.7 to 3.7.11 to overcome "write on backend 0 failed with error :"Success"".

--

pgpool info:

Linux Server Type : Virtual Linux Servers (Master and Slave pgpool nodes VMs Running on Hypervisor)
OS version : Red Hat Enterprise Linux Server release 7.5 (Maipo)
pgpool-II version (use "product version" menu) : 3.7.11 (amefuriboshi)
what mode you use : streaming replication
What Functionality you use : Connection Pooling, Load Balancing and Watchdog.
attached files : pgpool.conf and pgpool.log

Installed Components:
pgpool-II-pg95-debuginfo-3.7.11-1pgdg.rhel7.x86_64
pgpool-II-release-3.7-1.noarch
pgpool-II-pg95-3.7.11-1pgdg.rhel7.x86_64
pgpool-II-pg95-extensions-3.7.11-1pgdg.rhel7.x86_64
pgpool-II-pg95-devel-3.7.11-1pgdg.rhel7.x86_64

---
PostgreSQL Info:

Linux Server Type : Physical Servers
OS Version : Red Hat Enterprise Linux Server release 7.6 (Maipo)
PostgreSQL version : PostgreSQL 9.5.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
what kind of application language : pl/pgsql for server programming and JAVA 8
Type of workload : GIS Application

Installed Components:
postgresql95-devel-9.5.15-1PGDG.rhel7.x86_64
postgresql95-9.5.15-1PGDG.rhel7.x86_64
postgresql95-contrib-9.5.15-1PGDG.rhel7.x86_64
postgresql95-server-9.5.15-1PGDG.rhel7.x86_64
postgresql95-libs-9.5.15-1PGDG.rhel7.x86_64

DB Infrastructure:

we have 2 pgpool nodes and 4 back-end servers:

 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0 | 10.13.61.163 | 5432 | up | 0.250000 | primary | 2587017 | false | 0
 1 | 10.13.61.164 | 5432 | up | 0.250000 | standby | 1511372 | false | 0
 2 | 10.13.61.165 | 5432 | up | 0.250000 | standby | 1683973 | true | 0
 3 | 10.13.61.166 | 5432 | up | 0.250000 | standby | 1470830 | false | 0
(4 rows)

------------

Now we are receiving the issues below in the pgpool and application logs. I have extracted the main issues; all are included in the attached log file (Sep 10 and Oct 9 logs for reference).

------
Issues:

We are continuously getting the issues below.

Could you please help us find solutions:

A). What is causing these issues?
B). What configuration do I have to change to resolve them?

1.
pid 1785: WARNING: watchdog failed to ping host"10.13.61.165"
WARNING: watchdog failed to ping host"10.13.61.165"

2.
LOG: failed to connect to PostgreSQL server on "10.13.61.165:5432" using INET socket
DETAIL: health check timer expired

3.
WARNING: watchdog lifecheck, failed to connect to any trusted servers
LOG: informing the node status change to watchdog
DETAIL: node id :0 status = "NODE DEAD" message:"trusted server is unreachable"

4.
ERROR:unable to read data from DB node 0
DETAIL: socket read failed with an error "Connection timed out"

5.
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18

6.
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 5

7.
ERROR: unable to read data from DB node 0
DETAIL: socket read failed with an error "Connection timed out"
CONTEXT: while searching system catalog, When relcache is missed
LOG: statement: DISCARD ALL
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18

----------

Detailed Log:

1.
Sep 10 00:53:15 N1PPRL-UFGA0159 pgpool[1785]: [18-1] 2019-09-10 00:53:15: pid 1785: WARNING: watchdog failed to ping host"10.13.61.165"
Sep 10 00:53:15 N1PPRL-UFGA0159 pgpool: 2019-09-10 00:53:15: pid 1785: WARNING: watchdog failed to ping host"10.13.61.165"

2.
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: LOG: failed to connect to PostgreSQL server on "10.13.61.165:5432" using INET socket
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: DETAIL: health check timer expired
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: ERROR: failed to make persistent db connection
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: DETAIL: connection to host:"10.13.61.165:5432" failed
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: LOG: health check retrying on DB node: 2 (round:1)
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [13-2] 2019-09-10 01:02:29: pid 1925: DETAIL: health check timer expired
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [14-1] 2019-09-10 01:02:29: pid 1925: ERROR: failed to make persistent db connection
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [14-2] 2019-09-10 01:02:29: pid 1925: DETAIL: connection to host:"10.13.61.165:5432" failed
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [15-1] 2019-09-10 01:02:29: pid 1925: LOG: health check retrying on DB node: 2 (round:1)
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool[1785]: [23-1] 2019-09-10 01:02:31: pid 1785: WARNING: watchdog failed to ping host"10.13.61.164"


3.
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:31: pid 1785: WARNING: watchdog lifecheck, failed to connect to any trusted servers
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:31: pid 1785: LOG: informing the node status change to watchdog
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:31: pid 1785: DETAIL: node id :0 status = "NODE DEAD" message:"trusted server is unreachable"
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool[1785]: [23-2] 2019-09-10 01:02:31: pid 1785: DETAIL: ping process exits with code: 1


4.
Oct 9 12:19:40 N1PPRL-UFGA0159 pgpool[16638]: [29646-2] 2019-10-09 12:19:40: pid 16638: DETAIL: connecting host=n1vw-pa-gis11.india.airtel.itm port=63901
Oct 9 12:21:36 N1PPRL-UFGA0159 pgpool[9131]: [28320-1] 2019-10-09 12:21:36: pid 9131: ERROR: unable to read data from DB node 2
Oct 9 12:21:36 N1PPRL-UFGA0159 pgpool[9131]: [28320-2] 2019-10-09 12:21:36: pid 9131: DETAIL: socket read failed with an error "Connection timed out"
Oct 9 12:21:36 N1PPRL-UFGA0159 pgpool[9131]: [28321-1] 2019-10-09 12:21:36: pid 9131: LOG: statement: DISCARD ALL
Oct 9 12:21:39 N1PPRL-UFGA0159 pgpool[9131]: [28322-1] 2019-10-09 12:21:39: pid 9131: FATAL: unable to read data from DB node 0
Oct 9 12:21:39 N1PPRL-UFGA0159 pgpool[9131]: [28322-2] 2019-10-09 12:21:39: pid 9131: DETAIL: EOF encountered with backend
Oct 9 12:21:39 N1PPRL-UFGA0159 pgpool[9131]: [28323-1] 2019-10-09 12:21:39: pid 9131: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:21:39 N1PPRL-UFGA0159 pgpool[9131]: [28323-2] 2019-10-09 12:21:39: pid 9131: DETAIL: while trying to write data from offset: 0 wlen: 5


5.
Oct 9 12:30:01 N1PPRL-UFGA0159 pgpool[29546]: [26137-2] 2019-10-09 12:30:01: pid 29546: DETAIL: connecting host=n1vw-pa-gis12.india.airtel.itm port=53026
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28921-1] 2019-10-09 12:31:06: pid 19856: ERROR: unable to read data from DB node 0
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28921-2] 2019-10-09 12:31:06: pid 19856: DETAIL: socket read failed with an error "Connection timed out"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28922-1] 2019-10-09 12:31:06: pid 19856: LOG: statement: DISCARD ALL
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28923-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28923-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 18
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28924-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28924-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28925-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 1 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28925-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28926-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 2 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28926-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28927-1] 2019-10-09 12:31:06: pid 19856: WARNING: write on backend 3 failed with error :"Success"
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28927-2] 2019-10-09 12:31:06: pid 19856: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28928-1] 2019-10-09 12:31:06: pid 19856: LOG: new connection received
Oct 9 12:31:06 N1PPRL-UFGA0159 pgpool[19856]: [28928-2] 2019-10-09 12:31:06: pid 19856: DETAIL: connecting host=10.92.204.141 port=51938


6.
Oct 9 12:31:11 N1PPRL-UFGA0159 pgpool[20079]: [27321-1] 2019-10-09 12:31:11: pid 20079: LOG: new connection received
Oct 9 12:31:11 N1PPRL-UFGA0159 pgpool[20079]: [27321-2] 2019-10-09 12:31:11: pid 20079: DETAIL: connecting host=n1vw-pa-gis12.india.airtel.itm port=53027
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26342-1] 2019-10-09 12:31:54: pid 18576: ERROR: unable to read data from DB node 0
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26342-2] 2019-10-09 12:31:54: pid 18576: DETAIL: socket read failed with an error "Connection timed out"
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26342-3] 2019-10-09 12:31:54: pid 18576: CONTEXT: while searching system catalog, When relcache is missed
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26343-1] 2019-10-09 12:31:54: pid 18576: LOG: statement: DISCARD ALL
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26344-1] 2019-10-09 12:31:54: pid 18576: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26344-2] 2019-10-09 12:31:54: pid 18576: DETAIL: while trying to write data from offset: 0 wlen: 18
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26345-1] 2019-10-09 12:31:54: pid 18576: WARNING: write on backend 0 failed with error :"Success"
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26345-2] 2019-10-09 12:31:54: pid 18576: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 9 12:31:54 N1PPRL-UFGA0159 pgpool[18576]: [26346-1] 2019-10-09 12:31:54: pid 18576: LOG: new connection received

-------

Please help me to resolve the above issues.

Also please help me to clarify below queries:

1. Do the PostgreSQL Linux servers and the pgpool Linux servers need to be the same server type (both physical or both virtual)? If they mismatch, is there any impact?
2. Is it mandatory to have the same OS version on both the PostgreSQL and pgpool Linux servers?
3. How can we route one particular application server's traffic to the master database? Is there any functionality like black_function_list?
4. Sometimes pgpool takes a long time to execute "show pool_nodes" and other queries, from pgpool as well as from pgAdmin. Please suggest a process to troubleshoot this kind of issue.
5. Sometimes pgpool fails to ping the DB nodes. Please suggest how to configure ping retries, like the DB health check configuration.
6. Please suggest how to create a redundant network interface for pgpool to overcome network-related issues.

I know this is a very long list of items; please suggest the required changes and respond at your earliest convenience.

Once again thanks a lot for your support.

Regards,
Siva.


Tags: No tags attached.

Activities

sivapgpool89

2019-10-09 20:04

reporter  

pgpool_log.log (29,916 bytes)
Master_Pool_Config_Values.sql (42,758 bytes)   

t-ishii

2019-10-10 15:30

developer   ~0002915

There are so many questions:-) Let's attack one by one...

In the log below, the health check seems to retry despite health_check_retry_delay = 10, meaning the next retry should happen after 10 seconds have passed (i.e. at 01:02:39, rather than 01:02:29). Was your pgpool.conf (Master_Pool_Config_Values.sql) in use while the logs below were taken?

Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: LOG: failed to connect to PostgreSQL server on "10.13.61.165:5432" using INET socket
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: DETAIL: health check timer expired
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: ERROR: failed to make persistent db connection
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: DETAIL: connection to host:"10.13.61.165:5432" failed
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool: 2019-09-10 01:02:29: pid 1925: LOG: health check retrying on DB node: 2 (round:1)
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [13-2] 2019-09-10 01:02:29: pid 1925: DETAIL: health check timer expired
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [14-1] 2019-09-10 01:02:29: pid 1925: ERROR: failed to make persistent db connection
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [14-2] 2019-09-10 01:02:29: pid 1925: DETAIL: connection to host:"10.13.61.165:5432" failed
Sep 10 01:02:29 N1PPRL-UFGA0159 pgpool[1925]: [15-1] 2019-09-10 01:02:29: pid 1925: LOG: health check retrying on DB node: 2 (round:1)
Sep 10 01:02:31 N1PPRL-UFGA0159 pgpool[1785]: [23-1] 2019-09-10 01:02:31: pid 1785: WARNING: watchdog failed to ping host"10.13.61.164"

sivapgpool89

2019-10-10 20:40

reporter   ~0002916

Thanks, t-ishii.

> In the log below, the health check seems to retry despite health_check_retry_delay = 10, meaning the next retry should happen after 10 seconds have passed (i.e. at 01:02:39, rather than 01:02:29). Was your pgpool.conf (Master_Pool_Config_Values.sql) in use while the logs below were taken?

==> Yes.

So, is there any impact because of this? Please suggest.

Also, please respond to the other questions as well.

In addition, we observed strange pgpool behavior: pgpool is running but is not able to accept connections from clients, and finally we get "server not listening on port".

Please suggest the reasons for this kind of behavior and how to overcome it.

t-ishii

2019-10-10 22:34

developer   ~0002917

> So, is there any impact because of this?
Yes, because the health check does not work normally.

BTW, do all these problems start to happen as soon as Pgpool-II starts?
Or do they happen occasionally? If so, is there any possible trigger?

> Please suggest the reasons for this kind of behavior and how to overcome it.
At this point, no idea. It seems the network is unstable from Pgpool-II's point of view, but I am not sure.

t-ishii

2019-10-11 09:08

developer   ~0002919

Last edited: 2019-10-11 09:19

I have noticed possible issues in your pgpool.conf:

1) trusted_servers points to the backend node IP addresses. This is plain wrong: trusted_servers should point to hosts other than the PostgreSQL backends and the pgpool nodes. If you cannot do that, you should disable trusted_servers (set it to an empty string).

From your log, pgpool failed to ping all backend IPs and decided to commit suicide, which prevented the health check process from retrying more times. If trusted_servers had been set correctly (or set to an empty string), the health check retries might have succeeded. I don't know why pgpool failed to ping all the PostgreSQL backends, but I think there was a network problem between the pgpool and backend nodes at that time.

2) failover_command is disabled. If the PostgreSQL primary goes down, a standby server will not be automatically promoted. I am not sure whether that is intended, though.
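As a sketch of the suggested trusted_servers fix, a pgpool.conf fragment might look like the following. The addresses here (a gateway and a public DNS server) are hypothetical placeholders; use hosts on your own network that are neither PostgreSQL backends nor pgpool nodes.

```
# Hosts the watchdog lifecheck pings to decide whether the network is up.
# They must NOT be the PostgreSQL backends or the pgpool nodes themselves;
# a router or DNS server reachable from both pgpool nodes is typical.
trusted_servers = '10.13.61.1,8.8.8.8'

# Alternatively, disable the trusted-server ping entirely:
# trusted_servers = ''
```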

t-ishii

2019-10-11 09:36

developer   ~0002920

Last edited: 2019-10-11 13:09

> 1. Do the PostgreSQL Linux servers and the pgpool Linux servers need to be the same server type (both physical or both virtual)?
No.

> 2. Is it mandatory to have the same OS version on both the PostgreSQL and pgpool Linux servers?
No.

> 3. How can we route one particular application server's traffic to the master database? Is there any functionality like black_function_list?

If the application running on that application server uses a unique application_name or database name, then you can use app_name_redirect_preference_list or database_redirect_preference_list to control which server the clients connect to.
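As an illustrative pgpool.conf sketch of these two parameters (the application name 'gisapp' and database name 'reports' are hypothetical):

```
# Send clients whose application_name is 'gisapp' to node 0 (the primary),
# and clients connecting to database 'reports' to standby node 1.
# Format is a comma-separated list of 'name:node_id' pairs.
app_name_redirect_preference_list = 'gisapp:0'
database_redirect_preference_list = 'reports:1'
```

The application must actually set application_name in its connection parameters for the first list to match.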

> 4. Sometimes pgpool takes a long time to execute "show pool_nodes" and other queries, from pgpool as well as from pgAdmin. Please suggest a process to troubleshoot this kind of issue.

No idea. The command is simple, and there's no part which could take a long time to obtain the data (basically it just fetches data from shared memory). If I were you, I would use a debugger or profiling tools to find which part of the code takes a long time, but this requires knowledge of Pgpool-II internals.

> 5. Sometimes pgpool fails to ping the DB nodes. Please suggest how to configure ping retries, like the DB health check configuration.

As I said, pinging the DB nodes is not a good idea.

> 6. Please suggest how to create a redundant network interface for pgpool to overcome network-related issues.

Pgpool-II itself does not have such a feature except heartbeat. You had better use bonding or a similar technique to run over multiple physical networks for that purpose. I am not a network expert; you had better consult someone else.

sivapgpool89

2019-10-11 16:25

reporter   ~0002921

Thanks a lot, t-ishii, for your suggestions. I am checking and will share my findings.

sivapgpool89

2019-10-11 18:34

reporter   ~0002922

Hi t-ishii,

Thanks for your support.

Could you please help me resolve the issues below as well.

1.
ERROR:unable to read data from DB node 0
DETAIL: socket read failed with an error "Connection timed out"

2.
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18

3.
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 5

4.
ERROR: unable to read data from DB node 0
DETAIL: socket read failed with an error "Connection timed out"
CONTEXT: while searching system catalog, When relcache is missed
LOG: statement: DISCARD ALL
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18

Regards,
Siva.

t-ishii

2019-10-12 13:21

developer   ~0002923

Basically, these are network-layer errors reported by the kernel.

> 1.
> ERROR:unable to read data from DB node 0
> DETAIL: socket read failed with an error "Connection timed out"
Pgpool tried to read data from backend 0 but failed: backend 0 did not send a response within the specified time.

> 2.
> WARNING: write on backend 0 failed with error :"Success"
> DETAIL: while trying to write data from offset: 0 wlen: 18

The connection to backend 0 was already broken, so Pgpool-II failed to write to the socket.

> 3.
> FATAL: unable to read data from DB node 0
> DETAIL: EOF encountered with backend
> WARNING: write on backend 0 failed with error :"Success"
> DETAIL: while trying to write data from offset: 0 wlen: 5
For the same reason as 2, Pgpool-II failed to read from the socket to backend 0.

> 4.
> ERROR: unable to read data from DB node 0
> DETAIL: socket read failed with an error "Connection timed out"
> CONTEXT: while searching system catalog, When relcache is missed
> LOG: statement: DISCARD ALL
> WARNING: write on backend 0 failed with error :"Success"
> DETAIL: while trying to write data from offset: 0 wlen: 18
Same reason as 3.

sivapgpool89

2019-10-12 16:39

reporter   ~0002924

Thank you very much t-ishii.

Sorry for raising so many issues.

From your comment "Basically they are network layer errors reported by the kernel", I clearly understand these are caused by network issues.

But in the logs I haven't seen any network breakage between the DB nodes and the pgpool nodes around the above issues, and the issues are reported almost all the time in the pgpool log file.

In one case, an incident happened: in the master DB, one procedure containing DML statements had been running since the previous midnight, taking nearly all the CPU resources (90% to 100%), and it held locks on tables; because of that, all other sessions were blocking each other and almost all 108 sessions went into a waiting state.

I observed a few points at that time, and I am not sure whether that incident might have caused the issues above.

1. We were not able to run psql -U pgpool --host 10.222.164.81 --dbname postgres -c "show pool_nodes"; it took a long time and finally produced "Could not connect to server".
2. It generated the issues below in the log:
a. ERROR:unable to read data from DB node 0
  DETAIL: socket read failed with an error "Connection timed out"
b.
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 18
c.
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 5

After terminating the procedure, cancelling all the locking sessions, and restarting pgpool, pgpool was able to accept connections again. But even after the restart, the above issues are occasionally reported, and the application also reports the messages below.

<Warning> <Default> <BEA-000000> <SQLException occurred while connecting to 10.222.167.250:
5432
org.postgresql.util.PSQLException: ERROR: unable to read data from DB node 1
  Detail: socket read failed with an error "Connection timed out"
  
---

Please find my responses to your answers below.

> 1.
> ERROR:unable to read data from DB node 0
> DETAIL: socket read failed with an error "Connection timed out"
Pgpool tried to read data from backend 0 but failed. Backend 0 did not send a response in specified time.

Please find my responses and questions:

1. Can we ignore this in the first phase, or might it cause other issues?
2. There was actually no network breakage; what might cause the read to fail?
3. What is the threshold time for the backend to send a response to pgpool?
4. How do we configure the thresholds to overcome this kind of issue?
5. Is this because of connection pooling? We are using pgpool for connection pooling and load balancing.
6. Please suggest how to overcome this issue.


> 2.
> WARNING: write on backend 0 failed with error :"Success"
> DETAIL: while trying to write data from offset: 0 wlen: 18

Connection to backend 0 was already broken. Pgpool-II failed to write to the socket.

My responses and questions:
1. What might cause the connections to break?
2. Could the idle-connection configuration cause these issues?
3. Is this expected behavior, or will it impact pgpool in the long term?
4. Please suggest how to resolve these issues.

below are the Life Time Configurations:

# - Life time -

child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0

3. Occasionally we are getting backend node degeneration and failover in the logs.

health_check_period = 20
health_check_timeout = 20
health_check_user = 'pgpool'
health_check_password = ''
health_check_database = 'postgres'
health_check_max_retries = 10
health_check_retry_delay = 10
connect_timeout = 80000
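For reference, a rough sketch of what the health check values above imply (my arithmetic, not from the thread): each failed check waits up to health_check_timeout before retrying after health_check_retry_delay, so a node that keeps timing out is only degenerated after roughly all retries are exhausted.

```
# Approximate worst-case time before a failing node is degenerated:
#   health_check_max_retries * (health_check_timeout + health_check_retry_delay)
#   = 10 * (20 + 10) = ~300 seconds of continuously failing checks
health_check_timeout = 20
health_check_max_retries = 10
health_check_retry_delay = 10
```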
     
From your previous comment: "In the log below, the health check seems to retry despite health_check_retry_delay = 10, meaning the next retry should happen after 10 seconds have passed (i.e. at 01:02:39, rather than 01:02:29)."

Given the configuration above, we think we may have misconfigured the health check settings and that this is causing problems. Could you please suggest some good values to mitigate the issues?

We are using pgpool only for connection pooling and load balancing. Can we configure backend_flag0 = 'DISALLOW_TO_FAILOVER'? Would this help in these situations?

Thank you very much for your support.

Regards,
Siva.
---

t-ishii

2019-10-14 11:07

developer   ~0002925

I am waiting for the answer to the question below.
> BTW, do all these problems start to happen as soon as Pgpool-II starts?
> Or do they happen occasionally? If so, is there any possible trigger?

sivapgpool89

2019-10-14 13:09

reporter   ~0002926

Hi t-ishii,

Thanks for the update.

Please find my responses.

> BTW, do all these problems start to happen as soon as Pgpool-II starts?

These problems are being reported more and more frequently.

> Or do they happen occasionally? If so, is there any possible trigger?

The application is accessing pgpool.

Could you please let me know what you mean by a possible trigger?

Regards,
Siva.

sivapgpool89

2019-10-16 22:20

reporter   ~0002928

Hi t-ishii,

Could you please update on this request.

Thanks,
Siva.

t-ishii

2019-10-18 07:52

developer   ~0002933

> > BTW, do all these problems start to happen as soon as Pgpool-II starts?
> These problems are being reported more and more frequently.

Sorry, I don't understand what you are saying.

> > Or do they happen occasionally? If so, is there any possible trigger?
> The application is accessing pgpool.
> Could you please let me know what you mean by a possible trigger?
If Pgpool-II had been running for a while without problems and you suddenly start to see issues, then we can logically guess that something made the issue happen. I use "trigger" for that something.

sivapgpool89

2019-10-21 18:55

reporter   ~0002937

Hi t-ishii,

Thanks for update.

Below are a few observations.

At that particular point in time, all num_init_children slots were full. Please find the configuration at that point:

num_init_children = 128
max_pool = 8
                                                
# - Life time -
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0

[cyient@N1PPRL-UFGA0159 ~]$ netstat -s | grep overflow
    369125 times the listen queue of a socket overflowed
    
[root@s2n ~]# sysctl net.core.somaxconn
net.core.somaxconn = 128
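One hedged observation on the numbers above (my suggestion, not from the thread): the listen-queue overflows suggest the kernel accept backlog (net.core.somaxconn = 128) was saturated while all 128 pgpool children were busy. If the backlog itself is the bottleneck, it can be raised persistently with a sysctl fragment such as:

```
# /etc/sysctl.d/90-pgpool.conf  (hypothetical file name)
# Raise the kernel ceiling on the accept-queue length. The effective
# backlog is still capped by what pgpool passes to listen(), so this
# only removes the kernel-side limit.
net.core.somaxconn = 1024
```

Apply with sysctl --system (or a reboot). Pgpool also has a listen_backlog_multiplier parameter that scales the backlog it requests relative to num_init_children, which may be worth checking alongside this.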

--

As soon as I found the pool full, I changed the config to the values below.
num_init_children = 500
max_pool = 4
                                                
# - Life time -
child_life_time = 180
child_max_connections = 0
connection_life_time = 180
client_idle_limit = 180

After the above changes I didn't see the "Connection timed out" issues and everything was good. But due to client_idle_limit, the pool was terminating client connections and causing some application issues, so I changed the configs again.

I then tweaked some of the configs; below are the latest values.

num_init_children = 1000
max_pool = 1
                                                
# - Life time -
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
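A hedged rule of thumb for sizing these values (my arithmetic; verify against your servers' settings): pgpool can hold up to num_init_children × max_pool backend connections per pgpool node, and that product should fit within PostgreSQL's max_connections minus superuser_reserved_connections.

```
# Per pgpool node, worst-case backend connections:
#   num_init_children * max_pool = 1000 * 1 = 1000
# With two pgpool nodes, each backend would need to allow roughly
# 2000 connections; otherwise new child connections will be refused.
num_init_children = 1000
max_pool = 1
```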

Since changing the configs, I have been getting the issues below almost every hour; please find the error messages from the log.

-

Oct 21 01:42:50 N1PPRL-UFGA0159 pgpool[19142]: [81863-1] 2019-10-21 01:42:50: pid 19142: ERROR: unable to read data from DB node 1
Oct 21 01:42:50 N1PPRL-UFGA0159 pgpool[19142]: [81863-2] 2019-10-21 01:42:50: pid 19142: DETAIL: socket read failed with an error "Connection timed out"
Oct 21 01:42:50 N1PPRL-UFGA0159 pgpool[19142]: [81864-1] 2019-10-21 01:42:50: pid 19142: LOG: statement: DISCARD ALL
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[19142]: [81865-1] 2019-10-21 01:42:51: pid 19142: FATAL: unable to read data from DB node 0
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[19142]: [81865-2] 2019-10-21 01:42:51: pid 19142: DETAIL: EOF encountered with backend
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[19142]: [81866-1] 2019-10-21 01:42:51: pid 19142: WARNING: write on backend 0 failed with error :"Success"
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[19142]: [81866-2] 2019-10-21 01:42:51: pid 19142: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 01:42:51 N1PPRL-UFGA0159 pgpool[2407]: [29694-1] 2019-10-21 01:42:51: pid 2407: LOG: child process with pid: 19142 exits with status 256


Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[23583]: [29435-1] 2019-10-21 06:22:00: pid 23583: LOG: new connection received
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[23583]: [29435-2] 2019-10-21 06:22:00: pid 23583: DETAIL: connecting host=n1vw-pa-gis14.india.airtel.itm port=51900
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[2407]: [30718-1] 2019-10-21 06:22:00: pid 2407: LOG: child process with pid: 28320 exits with status 256
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[2407]: [30719-1] 2019-10-21 06:22:00: pid 2407: LOG: fork a new child process with pid: 9333
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[2407]: [30720-1] 2019-10-21 06:22:00: pid 2407: LOG: child process with pid: 29056 exits with status 256
Oct 21 06:22:00 N1PPRL-UFGA0159 pgpool[2407]: [30721-1] 2019-10-21 06:22:00: pid 2407: LOG: fork a new child process with pid: 9334
Oct 21 06:23:21 N1PPRL-UFGA0159 pgpool[2407]: [30722-1] 2019-10-21 06:23:21: pid 2407: LOG: child process with pid: 6256 exits with status 256
Oct 21 06:23:21 N1PPRL-UFGA0159 pgpool[2407]: [30723-1] 2019-10-21 06:23:21: pid 2407: LOG: fork a new child process with pid: 9519
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108033-1] 2019-10-21 06:24:02: pid 24428: ERROR: unable to read data from DB node 0
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108033-2] 2019-10-21 06:24:02: pid 24428: DETAIL: socket read failed with an error "Connection timed out"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108034-1] 2019-10-21 06:24:02: pid 24428: LOG: statement: DISCARD ALL
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108035-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 0 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108035-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 18
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108036-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 0 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108036-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108037-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 1 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108037-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108038-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 2 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108038-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108039-1] 2019-10-21 06:24:02: pid 24428: WARNING: write on backend 3 failed with error :"Success"
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[24428]: [108039-2] 2019-10-21 06:24:02: pid 24428: DETAIL: while trying to write data from offset: 0 wlen: 5
Oct 21 06:24:02 N1PPRL-UFGA0159 pgpool[8055]: [30690-1] 2019-10-21 06:24:02: pid 8055: LOG: statement: DISCARD ALL


Have I missed something or misconfigured anything? Are these issues related to any other issue?
Could you please suggest how to resolve this kind of issue?

Thanks A Lot for your support.

Regards,
Siva.

sivapgpool89

2019-10-22 12:27

reporter   ~0002938

Hi t-ishii,

Please share the possible causes that could trigger the above issues, so that we can check all of them. Please also point us to the source code related to these issues.

Thank you
SIva.

t-ishii

2019-10-23 18:45

developer   ~0002939

You can use grep command:

cd /path/to/pgpool/source/code
cd src
grep -i "write on backend" */*.c

sivapgpool89

2019-10-24 13:48

reporter   ~0002940

Thanks for the source code details, t-ishii.

Could you please give an update on the issues.

t-ishii

2019-10-24 16:35

developer   ~0002941

> Have I missed something or misconfigured anything?

Besides trusted_servers, so far so good.

> Are these issues related to any other issue?

Probably. As I said before, there could be issues with: 1) the network connection between pgpool and PostgreSQL, or 2) PostgreSQL itself.

> Could you please suggest how to resolve this kind of issue?
I have no idea. They are not related to Pgpool-II itself. Again, I suggest you check the network and PostgreSQL.

sivapgpool89

2019-10-25 13:14

reporter   ~0002942

Thanks for the update, t-ishii.

sivapgpool89

2019-10-30 13:35

reporter   ~0002950

Thanks a lot for your support.

Please close this task.

t-ishii

2019-10-30 15:04

developer   ~0002951

Thanks. I am going to close the issue.

Issue History

Date Modified Username Field Change
2019-10-09 20:04 sivapgpool89 New Issue
2019-10-09 20:04 sivapgpool89 File Added: pgpool_log.log
2019-10-09 20:04 sivapgpool89 File Added: Master_Pool_Config_Values.sql
2019-10-10 15:12 t-ishii Assigned To => t-ishii
2019-10-10 15:12 t-ishii Status new => assigned
2019-10-10 15:30 t-ishii Note Added: 0002915
2019-10-10 15:30 t-ishii Status assigned => feedback
2019-10-10 15:30 t-ishii Description Updated
2019-10-10 20:40 sivapgpool89 Note Added: 0002916
2019-10-10 20:40 sivapgpool89 Status feedback => assigned
2019-10-10 22:34 t-ishii Note Added: 0002917
2019-10-11 09:08 t-ishii Note Added: 0002919
2019-10-11 09:09 t-ishii Status assigned => feedback
2019-10-11 09:19 t-ishii Note Edited: 0002919
2019-10-11 09:36 t-ishii Note Added: 0002920
2019-10-11 13:09 t-ishii Note Edited: 0002920
2019-10-11 16:25 sivapgpool89 Note Added: 0002921
2019-10-11 16:25 sivapgpool89 Status feedback => assigned
2019-10-11 18:34 sivapgpool89 Note Added: 0002922
2019-10-12 13:21 t-ishii Note Added: 0002923
2019-10-12 13:21 t-ishii Status assigned => feedback
2019-10-12 16:39 sivapgpool89 Note Added: 0002924
2019-10-12 16:39 sivapgpool89 Status feedback => assigned
2019-10-14 11:07 t-ishii Note Added: 0002925
2019-10-14 11:07 t-ishii Status assigned => feedback
2019-10-14 13:09 sivapgpool89 Note Added: 0002926
2019-10-14 13:09 sivapgpool89 Status feedback => assigned
2019-10-16 22:20 sivapgpool89 Note Added: 0002928
2019-10-18 07:52 t-ishii Note Added: 0002933
2019-10-18 08:01 t-ishii Status assigned => feedback
2019-10-21 18:55 sivapgpool89 Note Added: 0002937
2019-10-21 18:55 sivapgpool89 Status feedback => assigned
2019-10-22 12:27 sivapgpool89 Note Added: 0002938
2019-10-23 18:45 t-ishii Note Added: 0002939
2019-10-24 13:48 sivapgpool89 Note Added: 0002940
2019-10-24 16:35 t-ishii Note Added: 0002941
2019-10-25 10:25 t-ishii Status assigned => feedback
2019-10-25 13:14 sivapgpool89 Note Added: 0002942
2019-10-25 13:14 sivapgpool89 Status feedback => assigned
2019-10-30 13:35 sivapgpool89 Note Added: 0002950
2019-10-30 15:04 t-ishii Note Added: 0002951
2019-10-30 15:04 t-ishii Status assigned => closed