[pgpool-general: 3449] Failing node parameter (%d) different between failover_command and follow_master_command execution

Emanuel Calvo postgres.arg at gmail.com
Mon Feb 2 19:43:14 JST 2015


Hi guys,

I'm running pgpool 3.3.2 in a lab with 3 hosts on top of PostgreSQL 9.2.
I have failover and follow-master commands configured to automate both
processes.

The main test simulates a master failure, promotes a slave to be the new
master, and repoints replication on the remaining slave to it.

Here is the ID list:
[root@lrn-db-tst-03 ~]# for i in $(seq 0 2) ; do pcp_node_info 3 192.168.1.70 9898 pgpool pgpool $i ; done
192.168.1.68 5433 2 0.333333
192.168.1.69 5433 2 0.333333
192.168.1.70 5433 2 0.333333
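
For reading the listing above: each line of pcp_node_info output is hostname, port, status and load-balance weight. Below is a minimal sketch of decoding the status column, assuming the status codes documented for pgpool-II 3.x (0 = initializing, 1 = up with no connections, 2 = up with pooled connections, 3 = down). The function name describe_node is hypothetical and not part of pgpool.

```shell
# Hypothetical helper: summarize one line of pcp_node_info output.
# Assumed field order: hostname port status weight.
describe_node() {
    # Split the line on whitespace into $1..$4.
    set -- $1
    case "$3" in
        0) node_state="initializing" ;;
        1) node_state="up, no connections" ;;
        2) node_state="up, connections pooled" ;;
        3) node_state="down" ;;
        *) node_state="unknown status $3" ;;
    esac
    echo "$1:$2 $node_state (weight $4)"
}

describe_node "192.168.1.68 5433 2 0.333333"
# prints: 192.168.1.68:5433 up, connections pooled (weight 0.333333)
```

So before the test, all three nodes are reported as up with an equal weight.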


The failover process executes fine. Here is the log:

2015-02-02 05:48:47 LOG:   pid 24925: execute command:
/usr/pgsql-9.2/bin/failover.sh 2 2 0 /var/lib/pgsql/9.2/data
+ TEMPLOG=/var/log/pgpool/failover.log
+ FAILING_NODE=2
+ OLD_MASTER=2
+ NEW_MASTER=0
+ DB_PATH=/var/lib/pgsql/9.2/data
+ NODE[0]=192.168.1.68
+ NODE[1]=192.168.1.69
+ NODE[2]=192.168.1.70
+ NEW_MASTER_IP=192.168.1.68
+ unset 'NODE[0]'
+ FAILING_NODE_IP=192.168.1.70
+ unset 'NODE[2]'
+ NEW_SLAVE=1
+ NEW_SLAVE_IP=192.168.1.69
++ date
+ echo 'Mon Feb  2 05:48:47 CET 2015 NEW_SLAVE_IP=192.168.1.69
FAILING_NODE_IP=192.168.1.70'
+ '[' 2 == 2 ']'
++ date
+ echo 'Mon Feb  2 05:48:47 CET 2015 MASTER FAILED'
+ /usr/bin/ssh -T 192.168.1.68 /usr/pgsql-9.2/bin/pg_ctl -p5433 -D
/var/lib/pgsql/9.2/data promote
+ /usr/bin/ssh -T 192.168.1.68 'while test ! -f
/var/lib/pgsql/9.2/data/recovery.done; do sleep 1; done; scp
/var/lib/pgsql/9.2/data/pg_xlog/*history*
192.168.1.69:/var/lib/pgsql/9.2/data/pg_xlog/'
+ exit 0
2015-02-02 05:48:50 LOG:   pid 24925: find_primary_node_repeatedly:
waiting for finding a primary node
2015-02-02 05:48:50 LOG:   pid 24925: find_primary_node: primary node id is 0
2015-02-02 05:48:50 LOG:   pid 24925: starting follow degeneration.
shutdown host 192.168.1.69(5433)
2015-02-02 05:48:50 LOG:   pid 24925: starting follow degeneration.
shutdown host 192.168.1.70(5433)
2015-02-02 05:48:50 LOG:   pid 24925: failover: 2 follow backends have
been degenerated
2015-02-02 05:48:50 LOG:   pid 24925: failover: set new primary node: 0
2015-02-02 05:48:50 LOG:   pid 24925: failover: set new master node: 0



However, when the follow command is triggered, the parameter values are different:

2015-02-02 05:48:50 LOG:   pid 25048: start triggering follow command.
2015-02-02 05:48:50 LOG:   pid 25048: execute command:
/usr/pgsql-9.2/bin/follow.sh 1 2 0 /var/lib/pgsql/9.2/data
+ TEMPLOG=/var/log/pgpool/follow.log
+ FAILING_NODE=1
+ OLD_MASTER=2
+ NEW_MASTER=0
+ DB_PATH=/var/lib/pgsql/9.2/data
+ NODE[0]=192.168.1.68
+ NODE[1]=192.168.1.69
+ NODE[2]=192.168.1.70
+ NEW_MASTER_IP=192.168.1.68
+ unset 'NODE[0]'
+ FAILING_NODE_IP=192.168.1.69
+ unset 'NODE[1]'
+ NEW_SLAVE=1
+ NEW_SLAVE_IP=192.168.1.70
++ date
2015-02-02 05:48:50 LOG:   pid 24925: failover done. shutdown host
192.168.1.70(5433)
2015-02-02 05:48:50 LOG:   pid 24943: worker process received restart request
+ echo 'Mon Feb  2 05:48:50 CET 2015 failing old new dbpath'
+ /usr/bin/ssh -T 192.168.1.70 'sed -i
'\''s/192.168.1.69/192.168.1.68/'\''
/var/lib/pgsql/9.2/data/recovery.conf'
+ /usr/bin/ssh -T 192.168.1.70 /usr/pgsql-9.2/bin/pg_ctl -D
/var/lib/pgsql/9.2/data restart
2015-02-02 05:48:51 LOG:   pid 24942: pcp child process received restart request
2015-02-02 05:48:51 LOG:   pid 24925: PCP child 24942 exits with
status 256 in failover()
2015-02-02 05:48:51 LOG:   pid 24925: fork a new PCP child pid 25078
in failover()
2015-02-02 05:48:51 LOG:   pid 24925: worker child 24943 exits with status 256
2015-02-02 05:48:51 LOG:   pid 24925: fork a new worker child pid 25079


The node ID passed as %d differs between the two calls (2 for
failover.sh, 1 for follow.sh), which causes the follow script to select
the wrong master/slave pair. In this case it tries to set up the failed
master (192.168.1.70) as a slave.

Here are the configuration lines:

  follow_master_command = '/usr/pgsql-9.2/bin/follow.sh %d %P %m %D'
  failover_command = '/usr/pgsql-9.2/bin/failover.sh %d %P %m %D'
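
One possible reading of the log above is that pgpool runs follow_master_command for a standby it has detached ("starting follow degeneration"), so %d there would name the node to re-point rather than the failed master. That is an assumption drawn from the log, not confirmed behaviour. Under it, a follow script could take its target directly from %d and refuse to touch the old master (%P). This is a minimal sketch: node_ip and follow_target are hypothetical names, and the addresses are the ones from this lab.

```shell
# Hypothetical lookup from pgpool node id to IP (lab addresses from the node list).
node_ip() {
    case "$1" in
        0) echo 192.168.1.68 ;;
        1) echo 192.168.1.69 ;;
        2) echo 192.168.1.70 ;;
        *) return 1 ;;
    esac
}

# Decide which host to re-point, given the values passed as %d %P %m.
# Assumption: in follow_master_command, %d is the standby to re-point.
# Prints "<standby_ip> <new_master_ip>", or fails if nothing should be done.
follow_target() {
    node=$1 old_master=$2 new_master=$3
    # Never reconfigure the failed master or the freshly promoted node.
    if [ "$node" = "$old_master" ] || [ "$node" = "$new_master" ]; then
        return 1
    fi
    standby_ip=$(node_ip "$node") || return 1
    master_ip=$(node_ip "$new_master") || return 1
    echo "$standby_ip $master_ip"
}

follow_target 1 2 0   # arguments from the follow.sh call in the log
# prints: 192.168.1.69 192.168.1.68
```

With the arguments from the log (1 2 0), this picks 192.168.1.69 as the host whose recovery.conf should point at 192.168.1.68, rather than the 192.168.1.70 chosen in the trace above.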

Let me know if you need me to attach a debug-level log.

Regards,

--
Emanuel Calvo http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services