View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0000035||Pgpool-II||Bug||public||2012-11-20 11:54||2013-04-04 11:31|
|Target Version||Fixed in Version|
|Summary||0000035: Authentication is timeout|
I get this error message in the logs all the time, tens of thousands of times a day. The postgresql servers respond fast and there's no problems when manually connecting through pgpool or directly.
In the config authentication_timeout is set to 60.
|Additional Information||PGPool version is 3.2.1 but it happened with earlier versions too|
Postgresql version is 9.1.x and streaming replication is used.
Load balancing is off, paraller mode is off, query cache is off, watchdog is off.
|Tags||No tags attached.|
||Could you show us the log messages of PostgreSQL and pgpool-II in detail when timeout occurs?|
pgpool.log (56,259 bytes)
primary_postgresql.log (48,338 bytes)
pgpool.conf (28,404 bytes)
I attached the pgpool log and config and the log from primary postgresql.
The timeouts come in pretty much evenly during the day, 1000-1100 per hour or so.
The application's connections (user=app, db=app_production in the logs) are long lasting and there's usually 50-70 connections open to pgpool (and to the postgresql server), usually only 1-3 are processing something and others are idle. Max is 200. New connection is only opened when a new application thread is started (not so often).
The message of "authentication is timeout" means that pgpool's child process can't read a startup packet.
Could you confirm your system's networking again?
I haven't seen any issues with the network, this is also happening both in qa and production environments and I think it would be visible in other ways too, if there was network issues..
We've done performance testing to it and I don't get any connection failures through pgpool or directly to postgresql so this looks more of like a false error, something caused by pgpool's internal connections (health checks?) or an error that is retried by pgpool.
The application doesn't get any db errors and it also records the time spent in db operations and there hasn't been any slowness there.
I debugged this further. It seems that the timeout message occurs because pool_read fails in child.c:825 (/* read startup packet length */) and it just returns null without clearing the timer.
in pool_read len=4, readlen=0 and cp->isbackend is false, so it returns -1
in pool_stream.c:203 (/* if backend offers authentication method, frontend could close connection */)
I thought it was a problem communicating with the backend but it seems to be with frontend.
readlen is 0, so is there no data coming in? could it be just a client opening the connection but sending no data? Or are we maybe not waiting long enough for it?
The frontend connection is closed nearly instantly in child.c:187 (/* connection request from frontend timed out */), within the second from the connection and long ago before the timeout message comes.
||Ok, I know what is causing those, it's a haproxy doing health checks by opening the tcp connection and closing it. I guess the authentication timeout timer should be cleared if/when the frontend closes the connection.|
||How? As far as I know, there's no way to know reading socket is closed in UNUX/Linux API.|
Well, currently it seems to be detected (nearly?) instantly in child.c:187. I added some debug messages to see what happens:
Dec 16 02:10:03 pgpool: I am 31272 accept fd 12
Dec 16 02:10:03 pgpool: connection received: host=app01 port=41484
Dec 16 02:10:03 pgpool: pool_read debug: len: 4, consume_size: 0
Dec 16 02:10:03 pgpool: pool_read debug: ssl_active: 0
Dec 16 02:10:03 pgpool: pool_read debug: readlen: 0
Dec 16 02:10:03 pgpool: pool_read debug: returning -1 in pool_stream.c:203
Dec 16 02:10:03 pgpool: read_startup_packet debug: pool_read failed in child.c:825
Dec 16 02:10:03 pgpool: do_child debug: failed to read the startup packet, close frontend connection in child.c:187
Dec 16 02:11:03 pgpool: authentication is timeout
Dec 16 02:11:03 pgpool: child 31272 exits with status 256
The read failed and pgpool closed it's side nearly instantly when the client had closed the tcp-connection and the "authentication is timeout" message came about 60 seconds later (60 seconds is the authentication timeout). I suppose the child moved back to accept loop then.
Thanks for your debugging.
I fixed read_startup_packet() to reset timeout alarm correctly.
Could you try this snapshot?
||Thanks, no longer getting the "Authentication is timeout". I would probably change the log line from error to debug, though. If a monitoring is polling the port, then it will cause quite a bit of noise just like the authentication timeout message.|
|2012-11-20 11:54||tuomas||New Issue|
|2012-11-26 15:20||anzai||Assigned To||=> anzai|
|2012-11-26 15:20||anzai||Status||new => assigned|
|2012-11-26 17:41||anzai||Note Added: 0000169|
|2012-11-26 23:16||tuomas||File Added: pgpool.log|
|2012-11-26 23:17||tuomas||File Added: primary_postgresql.log|
|2012-11-26 23:32||tuomas||File Added: pgpool.conf|
|2012-11-26 23:36||tuomas||Note Added: 0000170|
|2012-12-13 11:34||anzai||Note Added: 0000183|
|2012-12-13 15:46||tuomas||Note Added: 0000184|
|2012-12-16 10:27||tuomas||Note Added: 0000191|
|2012-12-16 11:20||tuomas||Note Added: 0000192|
|2012-12-16 12:45||t-ishii||Note Added: 0000193|
|2012-12-16 22:51||tuomas||Note Added: 0000194|
|2012-12-17 13:55||anzai||Note Added: 0000195|
|2012-12-22 05:08||tuomas||Note Added: 0000206|
|2013-04-04 11:31||anzai||Status||assigned => resolved|
|2013-04-04 11:31||anzai||Resolution||open => fixed|