[Pgpool-hackers] Major bug with pcp_detach_node

Fri Jan 7 11:26:12 UTC 2011

Hi,

I'm not sure what you mean by the pcp server program. I've moved the
check into the pcp_do_child() function in pcp_child.c, which as far as I
can tell is where PCP commands are received. I assume libpcp clients go
through this code path too. I've also added the same fix for
pcp_recovery_node and pcp_attach_node, which don't handle this case
either.
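
To illustrate the kind of server-side check being discussed, here is a
minimal sketch. It is not the attached patch: the helper names
parse_node_id() and handle_detach_request() and the num_backends
parameter are hypothetical, and the reply strings are simply the ones
that appear in the debug output in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the node id sent by a PCP client and check
 * that it falls in [0, num_backends). Returns 0 and stores the id on
 * success, -1 if the text is not a number or the id is out of range.
 */
static int parse_node_id(const char *buf, int num_backends, int *node_id)
{
    char *end;
    long value;

    assert(buf != NULL && node_id != NULL);

    errno = 0;
    value = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0')
        return -1;              /* not a clean decimal number */
    if (value < 0 || value >= num_backends)
        return -1;              /* no such backend */

    *node_id = (int) value;
    return 0;
}

/*
 * Hypothetical handler: pick the reply for a detach request. A real
 * server would start the detach only after the check has passed.
 */
static const char *handle_detach_request(const char *buf, int num_backends)
{
    int node_id;

    if (parse_node_id(buf, num_backends, &node_id) != 0)
        return "NodeIdOutOfRange";

    /* ... the actual detach of node_id would happen here ... */
    return "CommandComplete";
}
```

With 3 backends, a request for node "3" would get NodeIdOutOfRange and
nothing would be detached, while "0" through "2" would go ahead. Doing
the check where the request is received means every client of the
server is covered, whether it is a pcp command or code using the
library.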

Here is the server response when the node id is out of range:

DEBUG: send: tos="R", len=46
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="e", len=21, data=NodeIdOutOfRange
DEBUG: command failed. reason=NodeIdOutOfRange
BackendError
DEBUG: send: tos="X", len=4

I hope this is what you requested; otherwise I don't know where else to
put the check.

Regards,

On 07/01/2011 10:12, Tatsuo Ishii wrote:
> Gilles,
>
> Thanks for the report.  That is definitely a bug. There should be a
> node id range check somewhere.  However I don't think placing the
> check in pcp_detach_node command is a good idea. Rather, we should put
> the check in the pcp server program. This way, the node id range is
> checked not only for the pcp commands but also for users of the pcp
> library.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> I found an annoying problem with the PCP command pcp_detach_node. I have
>> 3 computers, each running a PostgreSQL instance, in a streaming
>> replication chain. PgPool is running on the first node, which is the
>> master. The problem occurs when you give a node id outside the range of
>> existing nodes.
>>
>> As explained above, I have just 3 nodes, so node ids go from 0 to 2.
>> If I use node id 3, which doesn't exist, here are the results:
>>
>> /usr/bin/pcp_detach_node -d 10 192.168.1.11 9898 postgres postgres 3
>>
>> DEBUG: send: tos="R", len=46
>> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
>> DEBUG: send: tos="D", len=6
>> DEBUG: recv: tos="d", len=20, data=CommandComplete
>> DEBUG: send: tos="X", len=4
>> ------------- log file ----------------
>> LOG: notice_backend_error: node 0 is not valid backend.
>> LOG: starting degeneration. shutdown host 192.168.1.13(5432)
>> LOG: execute command: /home/postgres/bin/failover.sh 2 192.168.1.13 192.168.1.11 /home/postgres/data/postgres.trigger
>> LOG: failover_handler: set new master node: 0
>> LOG: failover done. shutdown host 192.168.1.13(5432)
>> LOG: find_primary_node: primary node id is 0
>>  
>> [postgres@vm1 ~]$ psql -p 9999 -c "SHOW pool_nodes;"
>>  node_id |   hostname   | port | status | lb_weight | state
>> ---------+--------------+------+--------+-----------+-------
>>  0       | 192.168.1.11 | 5432 | 2      | 0.333333  | P
>>  1       | 192.168.1.12 | 5432 | 2      | 0.333333  | S
>>  2       | 192.168.1.13 | 5432 | 3      | 0.333333  | S
>> (3 rows)
>>
>> As you can see, node 2 was detached instead of the command aborting and
>> displaying an error. I have also seen node 0 get detached, which is
>> worse.
>>
>> I've attached a patch that will return the following:
>>
>> /usr/bin/pcp_detach_node -d 10 192.168.1.11 9898 postgres postgres 3
>> DEBUG: send: tos="R", len=46
>> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
>> DEBUG: send: tos="D", len=6
>> EOFError
>> DEBUG: send: tos="X", len=4
>> ------------- log file ----------------
>> LOG: pcp_child: node id 3 is not valid
>> LOG: PCP child 32232 exits with status 256
>> LOG: fork a new PCP child pid 32299
>>
>>
>> Regards,
>>
>> -- 
>> Gilles Darold
>> http://dalibo.com - http://dalibo.org
>>

-- 
Gilles Darold
http://dalibo.com - http://dalibo.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-II-detach-bug-2.diff
Type: text/x-patch
Size: 1728 bytes
Desc: not available
URL: <http://pgfoundry.org/pipermail/pgpool-hackers/attachments/20110107/ccd81a0c/attachment.bin>