[Pgpool-hackers] Major bug with pcp_detach_node

Fri Jan 7 09:12:52 UTC 2011

Gilles,

Thanks for the report.  That is definitely a bug. There should be a
node id range check somewhere.  However, I don't think placing the
check in the pcp_detach_node command is a good idea. Rather, we should
put the check in the pcp server program. This way the node id range is
checked not only for the pcp commands but also for any program that
uses the pcp library.
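
A server-side range check of this kind could look roughly like the
sketch below. It is only an illustration: node_id_in_range,
handle_detach_request and the num_backends parameter are hypothetical
names, not the actual pgpool-II source, and the real detach logic is
left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not pgpool-II code): true only when node_id
 * names one of the num_backends configured nodes, counted from 0. */
static bool node_id_in_range(int node_id, int num_backends)
{
    return node_id >= 0 && node_id < num_backends;
}

/* Hypothetical server-side handler for a detach request.  Returns 0
 * when the request is accepted and -1 when it is rejected, so an
 * invalid id never reaches the failover logic. */
static int handle_detach_request(int node_id, int num_backends)
{
    if (!node_id_in_range(node_id, num_backends))
    {
        fprintf(stderr, "node id %d is not valid\n", node_id);
        return -1;
    }

    /* ... the real detach would be triggered here ... */
    return 0;
}
```

Because the check sits in the request handler rather than in one
client program, every client that sends a detach request gets the
same validation.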
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> I found an annoying problem with the PCP command pcp_detach_node. I have
> 3 computers, each running a PostgreSQL instance in a streaming
> replication setup. PgPool is running on the first node, which is the
> master. The problem occurs when you give a node id outside the range of
> existing nodes.
> 
> As I explained above, I have just 3 nodes, so node ids go from 0 to 2.
> If I use node id 3, which doesn't exist, here are the results:
> 
> /usr/bin/pcp_detach_node -d 10 192.168.1.11 9898 postgres postgres 3
> 
> DEBUG: send: tos="R", len=46
> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> DEBUG: send: tos="D", len=6
> DEBUG: recv: tos="d", len=20, data=CommandComplete
> DEBUG: send: tos="X", len=4
> ------------- log file ----------------
> LOG: notice_backend_error: node 0 is not valid backend.
> LOG: starting degeneration. shutdown host 192.168.1.13(5432)
> LOG: execute command: /home/postgres/bin/failover.sh 2 192.168.1.13
> 192.168.1.11 /home/postgres/data/postgres.trigger
> LOG: failover_handler: set new master node: 0
> LOG: failover done. shutdown host 192.168.1.13(5432)
> LOG: find_primary_node: primary node id is 0
>  
> [postgres@vm1 ~]$ psql -p 9999 -c "SHOW pool_nodes;"
>  node_id |   hostname   | port | status | lb_weight | state
> ---------+--------------+------+--------+-----------+-------
>  0       | 192.168.1.11 | 5432 | 2      | 0.333333  | P
>  1       | 192.168.1.12 | 5432 | 2      | 0.333333  | S
>  2       | 192.168.1.13 | 5432 | 3      | 0.333333  | S
> (3 rows)
> 
> As you can see, node 2 was detached instead of the command aborting and
> displaying an error. I have also seen node 0 being the one detached,
> which is worse.
> 
> I've attached a patch. With it, the same command returns the following:
> 
> /usr/bin/pcp_detach_node -d 10 192.168.1.11 9898 postgres postgres 3
> DEBUG: send: tos="R", len=46
> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> DEBUG: send: tos="D", len=6
> EOFError
> DEBUG: send: tos="X", len=4
> ------------- log file ----------------
> LOG: pcp_child: node id 3 is not valid
> LOG: PCP child 32232 exits with status 256
> LOG: fork a new PCP child pid 32299
> 
> 
> Regards,
> 
> -- 
> Gilles Darold
> http://dalibo.com - http://dalibo.org
>