View Issue Details

ID: 0000319
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2017-07-19 16:48
Reporter: drrtuy
Assigned To:
Priority: normal
Severity: block
Reproducibility: sometimes
Status: closed
Resolution: open
Platform: x86_64
OS: Centos
OS Version: 7.1
Product Version: 3.6.4
Summary: 0000319: pgpool hangs in pool_check_fd()
Description:

Greetings,

I have an issue with pgpool 3.6.4 and postgres 9.4.12. Somehow packets of the frontend protocol got lost, and pgpool hangs waiting for an answer from postgres.
The topology is simple pgpool(P) ----> postgres(PG). P's ip is 10.69.64.27 and PG's ip is 10.69.64.164.
Here are the states of the pgpool process [1] and postgres [2]

1. https://pastebin.com/UeGLkUYA
2. https://pastebin.com/z2U8Xhfk

Would it be possible to call pool_set_timeout() with a reasonable timeout right before pool_check_fd()?
Tags: No tags attached.

Activities

t-ishii

2017-07-04 10:40

developer   ~0001568

I think there's no reasonable timeout, since each SQL command could take a very long time.

It is possible that Pgpool-II waits for an answer from PostgreSQL at the wrong time. To judge that, we need a self-contained test case.

drrtuy

2017-07-04 22:21

reporter   ~0001574

I have two questions regarding the issue:
1) What, in your opinion, could cause such lost messages between the pool and a backend? The only explanation I can think of is packet loss at the network layer, but in that case TCP retransmission would recover the lost segments.
2) Would it be acceptable to replace the infinite select() with a loop that calls select() with a configurable timeout and then a function that tries to send a Describe protocol message? The command will fail if the remote backend is waiting for the next command, or it will time out if the backend is genuinely busy. I could write a prototype and try it out while the situation is reproducible in my environment.
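
For illustration, the bounded wait proposed in point 2 could look roughly like the sketch below. It is a standalone example, not Pgpool-II code: wait_readable_sliced() and demo_sliced_wait() are hypothetical names, the slice length and slice count are assumed parameters, and the Describe probe itself is left out (a comment marks where it would go).

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, NOT a Pgpool-II function.
 * Waits for fd to become readable in slices of slice_ms milliseconds.
 * Returns 1 if readable, 0 if max_slices slices passed without data,
 * -1 if select() failed with an error other than EINTR. */
int wait_readable_sliced(int fd, int slice_ms, int max_slices)
{
    assert(fd >= 0 && fd < FD_SETSIZE);  /* FD_SET is undefined outside this range */

    for (int i = 0; i < max_slices; i++) {
        fd_set readfds;
        struct timeval tv;

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        /* Re-arm the timeout on every pass: select() may modify tv. */
        tv.tv_sec = slice_ms / 1000;
        tv.tv_usec = (slice_ms % 1000) * 1000;

        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc > 0)
            return 1;   /* data (or EOF) is ready to read */
        if (rc < 0 && errno != EINTR)
            return -1;  /* real select() failure */
        /* rc == 0 (slice expired) or EINTR: this is the point where a
         * caller could run a liveness check before waiting again. */
    }
    return 0;           /* every slice expired without data */
}

/* Usage sketch: a local pipe stands in for the backend socket.
 * Returns 0 if the helper behaved as expected. */
int demo_sliced_wait(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    int before = wait_readable_sliced(p[0], 10, 2);  /* nothing written yet */
    if (write(p[1], "x", 1) != 1)
        return -1;
    int after = wait_readable_sliced(p[0], 10, 2);   /* one byte pending */

    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : 1;
}
```

The sketch only bounds the wait. Whether giving up, or probing the backend, is safe in the middle of a query is a separate question that it does not answer.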

t-ishii

2017-07-19 14:35

developer   ~0001587

I am not sure what you mean by "lost messages". I think Pgpool-II just waits in vain for a message from the backend (or frontend).

> 2) Would it be acceptable to replace the infinite select() with a loop that calls select() with a configurable timeout and then a function that tries to send a Describe protocol message? The command will fail if the remote backend is waiting for the next command, or it will time out if the backend is genuinely busy. I could write a prototype and try it out while the situation is reproducible in my environment.

A failed command will cause lots of other problems: aborted transactions and unwanted error messages coming from the backend.

I believe the proper solution would be to find out the cause of the hang and fix the bug. That's why I requested a self-contained test case (but you did not respond to my request).

I'm not sure what kind of problem you have because there's no test case. Anyway, the attached patch *may* fix your problem. It was created for a different error report (bug317).
pgpool-hung.diff (9,560 bytes)   

drrtuy

2017-07-19 15:33

reporter   ~0001590

Thanks for the answer, Tatsuo. I will try the patch.
Meanwhile, the issue should be closed since I can't reproduce the behavior in a controllable fashion.

t-ishii

2017-07-19 16:48

developer   ~0001593

Ok, issue closed.

Issue History

Date Modified Username Field Change
2017-07-03 22:40 drrtuy New Issue
2017-07-04 10:40 t-ishii Note Added: 0001568
2017-07-04 22:21 drrtuy Note Added: 0001574
2017-07-19 14:35 t-ishii File Added: pgpool-hung.diff
2017-07-19 14:35 t-ishii Note Added: 0001587
2017-07-19 14:35 t-ishii Status new => feedback
2017-07-19 15:33 drrtuy Note Added: 0001590
2017-07-19 15:33 drrtuy Status feedback => new
2017-07-19 16:48 t-ishii Note Added: 0001593
2017-07-19 16:48 t-ishii Status new => closed