[Pgpool-general] The Unplugged wire

Daniel Codina dcodina at laigu.net
Mon Mar 1 14:19:00 UTC 2010


I mentioned this problem in an earlier message, but I have since debugged
further and have more specific information that may be useful.
(The earlier thread was:
http://lists.pgfoundry.org/pipermail/pgpool-general/2010-February/002565.html
)

I am working with two VB virtual machines running CentOS 5 (i386),
PostgreSQL 8.3.9 and pgpool 2.3.2.1.

The test was simple: while I was inserting a value every second, I unplugged
one of the nodes.
The health check runs every second and its timeout is 2 seconds.

At that moment all inserts stop and pgpool waits.
The point where it stops is:

[...]
[pid 29444] 10:47:55.537470 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 29444] 10:47:55.537591 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 9
[pid 29444] 10:47:55.537726 setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 29444] 10:47:55.537886 connect(9, {sa_family=AF_INET,
sin_port=htons(5432), sin_addr=inet_addr("192.168.1.10")}, 16) = ?
ERESTARTSYS (To be restarted)
[pid 29444] 10:47:56.529113 --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid 29444] 10:47:56.529235 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS
FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
[pid 29444] 10:47:56.529428 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 29444] 10:47:56.529602 sigreturn() = ? (mask now [])
[pid 29444] 10:47:56.529894 connect(9, {sa_family=AF_INET,
sin_port=htons(5432), sin_addr=inet_addr("192.168.1.10")}, 16 <unfinished
...>


First it calls connect(), which is interrupted by SIGALRM and then continues.
But then connect() is issued again, and this time it does not receive any
SIGALRM, so it waits (I think) until the system gives up on the connection.

After waiting (too long) it starts working again (now with the node down):

[...]
[pid 29445] 10:49:30.273727 <... connect resumed> ) = -1 EHOSTUNREACH (No
route to host)
[pid 29444] 10:49:30.274739 <... connect resumed> ) = -1 EHOSTUNREACH (No
route to host)
[pid 29445] 10:49:30.274809 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS
FPE SEGV CONT SYS RTMIN RT_1], [], 8) = 0
[pid 29445] 10:49:30.275057 time(NULL)  = 1267436970
[pid 29445] 10:49:30.275202 stat64("/etc/localtime", {st_mode=S_IFREG|0644,
st_size=2593, ...}) = 0
[pid 29445] 10:49:30.275485 write(2, "2010-03-01 10:49:30 ERROR: pid 2"...,
1012010-03-01 10:49:30 ERROR: pid 29445: connect_inet_domain_socket:
connect() failed: No route to host
) = 101
[pid 29445] 10:49:30.275911 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 29445] 10:49:30.276062 close(7)    = 0
[pid 29445] 10:49:30.276221 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS
FPE SEGV CONT SYS RTMIN RT_1], [], 8) = 0
[pid 29445] 10:49:30.276389 time(NULL)  = 1267436970
[pid 29445] 10:49:30.276715 stat64("/etc/localtime", {st_mode=S_IFREG|0644,
st_size=2593, ...}) = 0
[pid 29445] 10:49:30.276895 write(2, "2010-03-01 10:49:30 ERROR: pid 2"...,
782010-03-01 10:49:30 ERROR: pid 29445: connection to 192.168.1.10(5432)
failed
) = 78
[...]

As you can see, it resumes after about a minute and a half (roughly 94
seconds in this trace), which is too long. The delay is always the same
(without changing any system values).

If necessary, I can post more debug output.

Looking through the source, we think the problem may be that the connect()
call blocks. Maybe it would be possible to make the socket non-blocking.
We suppose something is happening in pool_connection_pool.c around line 473
("connect_inet_domain_socket_by_port").
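
To make the non-blocking idea concrete, here is a minimal sketch of a
connect() bounded by a timeout. It is not taken from the pgpool source:
connect_with_timeout() is a hypothetical helper, and the choice of poll()
and an IPv4 dotted-quad address are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a pgpool function): connect to ip:port and give
 * up after roughly timeout_ms milliseconds.
 * Returns a connected, blocking socket on success and -1 on any failure.
 */
static int connect_with_timeout(const char *ip, int port, int timeout_ms)
{
    struct sockaddr_in addr;
    struct pollfd pfd;
    socklen_t len;
    int fd, flags, err, rc;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Switch to non-blocking mode so connect() returns immediately. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    if (rc < 0 && errno != EINPROGRESS) {
        close(fd);
        return -1;                      /* immediate failure */
    }

    if (rc < 0) {
        /* Handshake in progress: wait until writable or until timeout. */
        pfd.fd = fd;
        pfd.events = POLLOUT;
        do {
            /* Note: a retry after EINTR waits the full timeout again. */
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc < 0 && errno == EINTR);
        if (rc <= 0) {                  /* 0 means timeout, <0 means poll error */
            close(fd);
            return -1;
        }

        /* Writable does not mean connected: read the real result. */
        err = 0;
        len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
            close(fd);
            return -1;
        }
    }

    fcntl(fd, F_SETFL, flags);          /* restore the original (blocking) flags */
    return fd;
}
```

With something like connect_with_timeout("192.168.1.10", 5432, 2000), an
unplugged node would be reported as down after about 2 seconds instead of
waiting for the kernel to return EHOSTUNREACH.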

Or maybe I am doing something wrong... Has anybody else tested the
"unplugged wire" case? Does it work?

Thanks!