View Issue Details

IDProjectCategoryView StatusLast Update
0000025Pgpool-IIBugpublic2012-09-10 21:32
Reportersingh.gurjeetAssigned Tot-ishii 
PriorityhighSeverityminorReproducibilityalways
Status resolvedResolutionreopened 
PlatformOSHP-UX OS Version
Product Version 
Target VersionFixed in Version 
Summary0000025: s_do_auth doesn't handle NoticeResponse (N) message
DescriptionA customer of ours sees these messages in log fie:

2012-08-02 13:10:57 ERROR: pid 12428: pool_read2: EOF encountered with backend
2012-08-02 13:10:57 ERROR: pid 12428: make_persistent_db_connection:
s_do_auth failed
2012-08-02 13:10:57 DEBUG: pid 12428: pool_ssl: SSL requested but SSL
support is not available
2012-08-02 13:10:57 ERROR: pid 12393: pool_read2: EOF encountered with backend
2012-08-02 13:10:57 ERROR: pid 12393: make_persistent_db_connection:
s_do_auth failed
2012-08-02 13:10:57 ERROR: pid 12393: find_primary_node:
make_persistent_connection failed
2012-08-02 13:10:57 DEBUG: pid 12393: starting health checking
2012-08-02 13:10:57 DEBUG: pid 12393: health_check: 0 th DB node status: 2
......

2012-08-02 13:10:57 ERROR: pid 12428: s_do_auth: unknown response "N"
before processing BackendKeyData

......

2012-08-02 13:10:57 ERROR: pid 12428: s_do_auth: unknown response "J"
before processing BackendKeyData

......

2012-08-02 13:10:57 ERROR: pid 12393: health check failed during read. host
172.16.1.2 at port 5445 is down. reason: Connection reset by peer
2012-08-02 13:10:57 LOG: pid 12393: set 0 th backend down status
2012-08-02 13:10:57 DEBUG: pid 12393: failover_handler called 
2012-08-02 13:10:57 DEBUG: pid 12393: failover_handler: starting to select
new master node 

......

2012-08-02 13:10:57 LOG: pid 12393: execute command:
/opt/PostgresPlus/9.1AS/bin/failover_stream.sh 0 172.16.1.3
/opt/PostgresPlus/9.1AS/data/trigger_file

......

2012-08-02 13:09:28 ERROR: pid 12428: s_do_auth: unknown response "N" before processing BackendKeyData

Robert Hass' analysis of the problem in pgpool code:

<snip>
I analyzed the following error messages in detail:

2012-08-02 13:10:57 ERROR: pid 12428: s_do_auth: unknown response "N"
before processing BackendKeyData
2012-08-02 13:10:57 ERROR: pid 12428: s_do_auth: unknown response "J"
before processing BackendKeyData

It looks like the function s_do_auth() gets invoked when pgpool descides to try to create a persistent connection to the backend. It has a loop where it reads parameter-status (S) messages while waiting for a backend-key-data (K) message. If it gets any other message type, it bombs out with the above error. Now, a notice-response message (N) is actually valid here, but the code doesn't handle it.

Now, you might ask where the complaint about "J" comes from, since AFAICT there's no J message anywhere. But it turns out that after each complaint about an N message, there are three complaints about a message where the message type is a NUL byte followed by a complaint about a "J" message. This happens because the code is s_do_read doesn't actually bail out when it reads an unrecognized message-type byte, nor does it make any effort to skip the message and preserve sync. Instead it just blindly read a byte at a time and hopes something good happens, which of course it doesn't, because now it's
reading through the length-word of the protocol message a byte at a time, trying to interpret each byte as the beginning of a new message.
 Oops.

We will need to get this fixed in the community version of pgpool.
</snip>
TagsNo tags attached.

Activities

t-ishii

2012-08-28 09:54

developer   ~0000061

Last edited: 2012-08-28 10:52

View 2 revisions

Robert Haas's comment seems correct. I need to know how to reproduce (notice message arriving case). Can you provide?

t-ishii

2012-08-28 10:53

developer   ~0000062

Anyway I have committed fix to master.
http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=7205711e4da0037043dc85c04b0373493798fd1a

ketanaka

2012-08-28 16:58

reporter   ~0000063

Hello.
We can reproduce it with following setting.

Master server 's postgresql.conf
client_min_messages = debug5


And I confirmed that the patch fixes this problem on my test machine.
But I have no idea where "J" comes from.

Regards.

t-ishii

2012-08-28 20:05

developer   ~0000065

Glad to hear that. Thanks for the report!

singh.gurjeet

2012-08-29 01:04

reporter   ~0000066

The fix looks good. But I do not agree with this hunk of commit diff:

                        case 'S': /* parameter status */
+ case 'N': /* notice response */
+ case 'E': /* error response */
                                pool_debug("s_do_auth: parameter status data received");

That is plain misleading. Emitting the message "parameter status data received" upon receiving an error or notice response will mislead anyone reading the log file.

I think each of those cases deserve a message of their own.

t-ishii

2012-08-29 07:52

developer   ~0000067

Will fix.

Issue History

Date Modified Username Field Change
2012-08-28 03:57 singh.gurjeet New Issue
2012-08-28 09:52 t-ishii Assigned To => t-ishii
2012-08-28 09:52 t-ishii Status new => assigned
2012-08-28 09:54 t-ishii Note Added: 0000061
2012-08-28 10:52 t-ishii Note Edited: 0000061 View Revisions
2012-08-28 10:53 t-ishii Note Added: 0000062
2012-08-28 16:58 ketanaka Note Added: 0000063
2012-08-28 20:05 t-ishii Note Added: 0000065
2012-08-28 20:06 t-ishii Status assigned => resolved
2012-08-28 20:06 t-ishii Resolution open => fixed
2012-08-29 01:04 singh.gurjeet Note Added: 0000066
2012-08-29 01:04 singh.gurjeet Status resolved => feedback
2012-08-29 01:04 singh.gurjeet Resolution fixed => reopened
2012-08-29 07:52 t-ishii Note Added: 0000067
2012-08-29 11:44 t-ishii Status feedback => resolved