View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000046 | Pgpool-II | Bug | public | 2012-12-15 01:01 | 2013-01-23 11:01 |
| Reporter | mcousin | Assigned To | t-ishii | ||
| Priority | normal | Severity | major | Reproducibility | sometimes |
| Status | resolved | Resolution | open | ||
| Platform | Linux | OS | Linux | ||
| Summary | 0000046: Watchdog failing to connect sometimes | ||||
| Description | Hi, I'm having a problem similar to this one: http://www.pgpool.net/pipermail/pgpool-general/2012-December/001242.html, but on Linux. Capturing TCP frames, I see that a connection is correctly established between PGPool's watchdog and PostgreSQL, but PGPool thinks the connection is dead. We see messages like these in the log:<br>Nov 22 11:55:22 fantomas1 pgpool[30351]: connect_inet_domain_socket: connect() failed: Connection timed out<br>Nov 22 11:55:22 fantomas1 pgpool[30351]: connection to fantomas4.prod.extelia.fr(5432) failed<br>Nov 22 11:55:22 fantomas1 pgpool[30351]: new_connection: create_cp() failed<br>Nov 22 11:55:22 fantomas1 pgpool[30351]: degenerate_backend_set: 1 fail over request from pid 30351<br>A tcpdump from the same period shows that the connection was established. Digging further, I took a look at the code and am wondering whether the cause lies in the connect_inet_domain_socket_by_port() function. This function tries to connect using a non-blocking TCP connection: it first calls pool_set_nonblock(fd); and only then calls connect(fd, (struct sockaddr *)&addr, len) in a loop. Linux's manpage for socket(7) says this: « It is possible to do nonblocking I/O on sockets by setting the O_NONBLOCK flag on a socket file descriptor using fcntl(2). Then all operations that would block will (usually) return with EAGAIN (operation should be retried later); connect(2) will return EINPROGRESS error. The user can then wait for various events via poll(2) or select(2). » This seems to mean that a non-blocking connect() must be coupled with a poll or select. That's not what the code is doing. I think this may explain both what I am seeing (the connect is non-blocking and is sometimes still in progress when connect is called again, which then fails) and the Mac OS problem (the connect was in progress during the first iteration and already done on the second, hence the « Socket is already connected »). There is another potential problem in this code (if I'm still not mistaken): if the connect() takes time (over a slow link, for instance), it will spin in a tight loop over a system call and may eat a lot of CPU. | ||||
| Tags | No tags attached. | ||||
|
|
As you can see in the thread you are referring to, at least the "Socket is already connected" problem has been fixed in the git repo. |
|
|
OK, I rewrote connect_inet_domain_socket_by_port() using select(2). Can you try it out? The attached patch is against pgpool-II 3.2.1. |
|
|
|
|
|
Thanks a lot ! We will apply it ASAP and keep you informed. |
|
|
Hi, and sorry for the long wait. This patch was applied on Monday. Everything seems to have worked fine since then. We will report back in another week to confirm this. |
|
|
Thanks. Looking forward to hearing from you next week. |
|
|
The problem has completely disappeared. It hasn't occurred since applying the patch. |
|
|
Great. Fix committed to master and 3.2 stable. Thanks! |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2012-12-15 01:01 | mcousin | New Issue | |
| 2012-12-15 09:41 | t-ishii | Assigned To | => t-ishii |
| 2012-12-15 09:41 | t-ishii | Status | new => assigned |
| 2012-12-15 09:45 | t-ishii | Note Added: 0000188 | |
| 2012-12-15 17:49 | t-ishii | Note Added: 0000190 | |
| 2012-12-15 17:50 | t-ishii | File Added: patch_against_3.2.1.patch | |
| 2012-12-17 16:45 | mcousin | Note Added: 0000196 | |
| 2013-01-08 09:46 | t-ishii | Status | assigned => feedback |
| 2013-01-16 19:14 | mcousin | Note Added: 0000215 | |
| 2013-01-16 19:14 | mcousin | Status | feedback => assigned |
| 2013-01-16 20:11 | t-ishii | Note Added: 0000216 | |
| 2013-01-17 06:38 | t-ishii | Status | assigned => feedback |
| 2013-01-22 19:21 | mcousin | Note Added: 0000220 | |
| 2013-01-22 19:21 | mcousin | Status | feedback => assigned |
| 2013-01-23 10:50 | t-ishii | Note Added: 0000221 | |
| 2013-01-23 10:50 | t-ishii | Status | assigned => resolved |
| 2013-01-23 11:01 | t-ishii | Changeset attached | => pgpool2 master 249af07c |