[pgpool-general: 3453] Re: Failing node parameter (%d) different between failover_command and follow_master_command execution

Tatsuo Ishii ishii at postgresql.org
Tue Feb 3 23:47:06 JST 2015


Can you please show me the follow master command script
(i.e. follow.sh)?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi guys,
> 
> I'm running pgpool 3.3.2 in a lab with 3 hosts on top of PostgreSQL 9.2.
> I have the failover and follow commands set up to automate both processes.
> 
> The main test is to simulate a master failure, promote a slave to be the
> new master, and change the replication source for the other one.
> 
> Here is the ID list:
> [root at lrn-db-tst-03 ~]# for i in $(seq 0 2) ; do pcp_node_info 3
> 192.168.1.70 9898 pgpool pgpool $i ; done
> 192.168.1.68 5433 2 0.333333
> 192.168.1.69 5433 2 0.333333
> 192.168.1.70 5433 2 0.333333
> 
> 
> The failover process executes fine. Here is the log:
> 
> 2015-02-02 05:48:47 LOG:   pid 24925: execute command:
> /usr/pgsql-9.2/bin/failover.sh 2 2 0 /var/lib/pgsql/9.2/data
> + TEMPLOG=/var/log/pgpool/failover.log
> + FAILING_NODE=2
> + OLD_MASTER=2
> + NEW_MASTER=0
> + DB_PATH=/var/lib/pgsql/9.2/data
> + NODE[0]=192.168.1.68
> + NODE[1]=192.168.1.69
> + NODE[2]=192.168.1.70
> + NEW_MASTER_IP=192.168.1.68
> + unset 'NODE[0]'
> + FAILING_NODE_IP=192.168.1.70
> + unset 'NODE[2]'
> + NEW_SLAVE=1
> + NEW_SLAVE_IP=192.168.1.69
> ++ date
> + echo 'Mon Feb  2 05:48:47 CET 2015 NEW_SLAVE_IP=192.168.1.69
> FAILING_NODE_IP=192.168.1.70'
> + '[' 2 == 2 ']'
> ++ date
> + echo 'Mon Feb  2 05:48:47 CET 2015 MASTER FAILED'
> + /usr/bin/ssh -T 192.168.1.68 /usr/pgsql-9.2/bin/pg_ctl -p5433 -D
> /var/lib/pgsql/9.2/data promote
> + /usr/bin/ssh -T 192.168.1.68 'while test ! -f
> /var/lib/pgsql/9.2/data/recovery.done; do sleep 1; done; scp
> /var/lib/pgsql/9.2/data/pg_xlog/*history*
> 192.168.1.69:/var/lib/pgsql/9.2/data/pg_xlog/'
> + exit 0
> 2015-02-02 05:48:50 LOG:   pid 24925: find_primary_node_repeatedly:
> waiting for finding a primary node
> 2015-02-02 05:48:50 LOG:   pid 24925: find_primary_node: primary node id is 0
> 2015-02-02 05:48:50 LOG:   pid 24925: starting follow degeneration.
> shutdown host 192.168.1.69(5433)
> 2015-02-02 05:48:50 LOG:   pid 24925: starting follow degeneration.
> shutdown host 192.168.1.70(5433)
> 2015-02-02 05:48:50 LOG:   pid 24925: failover: 2 follow backends have
> been degenerated
> 2015-02-02 05:48:50 LOG:   pid 24925: failover: set new primary node: 0
> 2015-02-02 05:48:50 LOG:   pid 24925: failover: set new master node: 0
> 
> 
> 
> However, when the follow command is triggered, the parameter values are different:
> 
> 2015-02-02 05:48:50 LOG:   pid 25048: start triggering follow command.
> 2015-02-02 05:48:50 LOG:   pid 25048: execute command:
> /usr/pgsql-9.2/bin/follow.sh 1 2 0 /var/lib/pgsql/9.2/data
> + TEMPLOG=/var/log/pgpool/follow.log
> + FAILING_NODE=1
> + OLD_MASTER=2
> + NEW_MASTER=0
> + DB_PATH=/var/lib/pgsql/9.2/data
> + NODE[0]=192.168.1.68
> + NODE[1]=192.168.1.69
> + NODE[2]=192.168.1.70
> + NEW_MASTER_IP=192.168.1.68
> + unset 'NODE[0]'
> + FAILING_NODE_IP=192.168.1.69
> + unset 'NODE[1]'
> + NEW_SLAVE=1
> + NEW_SLAVE_IP=192.168.1.70
> ++ date
> 2015-02-02 05:48:50 LOG:   pid 24925: failover done. shutdown host
> 192.168.1.70(5433)
> 2015-02-02 05:48:50 LOG:   pid 24943: worker process received restart request
> + echo 'Mon Feb  2 05:48:50 CET 2015 failing old new dbpath'
> + /usr/bin/ssh -T 192.168.1.70 'sed -i
> '\''s/192.168.1.69/192.168.1.68/'\''
> /var/lib/pgsql/9.2/data/recovery.conf'
> + /usr/bin/ssh -T 192.168.1.70 /usr/pgsql-9.2/bin/pg_ctl -D
> /var/lib/pgsql/9.2/data restart
> 2015-02-02 05:48:51 LOG:   pid 24942: pcp child process received restart request
> 2015-02-02 05:48:51 LOG:   pid 24925: PCP child 24942 exits with
> status 256 in failover()
> 2015-02-02 05:48:51 LOG:   pid 24925: fork a new PCP child pid 25078
> in failover()
> 2015-02-02 05:48:51 LOG:   pid 24925: worker child 24943 exits with status 256
> 2015-02-02 05:48:51 LOG:   pid 24925: fork a new worker child pid 25079
> 
> 
> The ID passed for the %d option is different, causing the script to select
> the wrong master/slave. For example, here it tries to set up the failed
> master as a slave.
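
One possible reading of the two traces above, offered as an assumption
rather than a confirmed diagnosis: pgpool runs follow_master_command once
for each degenerated node other than the new primary, so %d in follow.sh
would be the standby that has to be re-pointed, not the node that
originally failed. Under that assumption, a minimal sketch of how follow.sh
could resolve its targets looks like this. The function names node_ip and
resolve_follow_targets are hypothetical and do not come from the original
script; the ID-to-IP mapping is taken from the pcp_node_info output quoted
above.

```shell
#!/bin/sh
# Hypothetical helper for follow.sh (not the original script).
# Assumption: the arguments are %d %P %m, where %d is the standby to
# re-point, %P is the old primary and %m is the new master.

# Map a pgpool node ID to its IP (values from the pcp_node_info listing).
node_ip() {
    case "$1" in
        0) echo 192.168.1.68 ;;
        1) echo 192.168.1.69 ;;
        2) echo 192.168.1.70 ;;
        *) return 1 ;;
    esac
}

# Print "<standby_ip> <new_master_ip> <old_primary_ip>" for the given IDs.
# Refuse to act when %d is the old primary or the new master, so the
# failed master is never reconfigured as a slave by mistake.
resolve_follow_targets() {
    standby_id=$1
    old_primary_id=$2
    new_master_id=$3
    if [ "$standby_id" = "$old_primary_id" ] || [ "$standby_id" = "$new_master_id" ]; then
        echo "node $standby_id is not a standby to re-point; skipping" >&2
        return 1
    fi
    echo "$(node_ip "$standby_id") $(node_ip "$new_master_id") $(node_ip "$old_primary_id")"
}
```

With the arguments from the follow.sh trace, resolve_follow_targets 1 2 0
prints "192.168.1.69 192.168.1.68 192.168.1.70": the host to restart is
192.168.1.69, and its recovery.conf would need 192.168.1.70 replaced by
192.168.1.68. Skipping the old primary is a design choice of this sketch,
not something pgpool requires.
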
> 
> Here are the configuration lines:
> 
>   follow_master_command = '/usr/pgsql-9.2/bin/follow.sh %d %P %m %D'
>   failover_command = '/usr/pgsql-9.2/bin/failover.sh %d %P %m %D'
> 
> Let me know if you need me to attach a log with debug symbols.
> 
> Regards,
> 
> -- 
> Emanuel Calvo http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general

