[pgpool-general: 4898] pgpool duplicate IP (3.5.3)

Shay Cohavi cohavisi at gmail.com
Thu Aug 11 23:46:14 JST 2016


Hi,

When I restart the primary pgpool node, the VIP is transferred to the 2nd
node. However, when the restarted primary boots up, it declares itself the
only node in the cluster and brings up the VIP as well (duplicate IP)!


the 1st node (startup):
2016-08-11 17:31:36: pid 1761: WARNING:  checking setuid bit of if_up_cmd
2016-08-11 17:31:36: pid 1761: DETAIL:  ifup[/sbin/ifconfig] doesn't have
setuid bit
2016-08-11 17:31:36: pid 1761: WARNING:  checking setuid bit of if_down_cmd
2016-08-11 17:31:36: pid 1761: DETAIL:  ifdown[/sbin/ifconfig] doesn't have
setuid bit
2016-08-11 17:31:36: pid 1761: WARNING:  checking setuid bit of arping
command
2016-08-11 17:31:36: pid 1761: DETAIL:  arping[/sbin/arping] doesn't have
setuid bit
2016-08-11 17:31:36: pid 1761: LOG:  waiting for watchdog to initialize
2016-08-11 17:31:36: pid 1767: LOG:  setting the local watchdog node name
to "Linux_mgrdb84_9999"
2016-08-11 17:31:36: pid 1767: LOG:  watchdog cluster configured with 1
remote nodes
2016-08-11 17:31:36: pid 1767: LOG:  watchdog remote node:0 on 1.1.1.85:9000
2016-08-11 17:31:36: pid 1767: LOG:  interface monitoring is disabled in
watchdog
2016-08-11 17:31:36: pid 1767: LOG:  IPC socket path:
"/tmp/.s.PGPOOLWD_CMD.9000"
2016-08-11 17:31:41: pid 1767: LOG:  watchdog node state changed from
[LOADING] to [JOINING]
2016-08-11 17:31:46: pid 1767: LOG:  watchdog node state changed from
[JOINING] to [INITIALIZING]
*2016-08-11 17:31:47: pid 1767: LOG:  I am the only alive node in the
watchdog cluster*
2016-08-11 17:31:47: pid 1767: HINT:  skiping stand for coordinator state
2016-08-11 17:31:47: pid 1767: LOG:  watchdog node state changed from
[INITIALIZING] to [MASTER]
2016-08-11 17:31:47: pid 1767: LOG:  I am announcing my self as
master/coordinator watchdog node
2016-08-11 17:31:48: pid 1767: LOG:  new watchdog node connection is
received from "1.1.1.85:37022"
2016-08-11 17:31:48: pid 1767: LOG:  quorum is complete after node
"Linux_mgrdb85_9999" joined the cluster
2016-08-11 17:31:48: pid 1767: DETAIL:  starting escalation process
2016-08-11 17:31:48: pid 1767: LOG:  escalation process started with
PID:2087
2016-08-11 17:31:48: pid 2087: LOG:  watchdog: escalation started
2016-08-11 17:31:50: pid 2087: WARNING:  watchdog failed to bring up
delegate IP, 'if_up_cmd' failed
2016-08-11 17:31:50: pid 2087: WARNING:  watchdog de-escalation failed to
bring down delegate IP
2016-08-11 17:31:50: pid 1767: LOG:  watchdog escalation process with pid:
2087 exit with SUCCESS.
2016-08-11 17:31:51: pid 1767: LOG:  new outbond connection to 1.1.1.85:9000
2016-08-11 17:31:53: pid 1767: LOG:  I am the cluster leader node
2016-08-11 17:31:53: pid 1767: DETAIL:  our declare coordinator message is
accepted by all nodes
2016-08-11 17:31:53: pid 1761: LOG:  watchdog process is initialized
2016-08-11 17:31:53: pid 1767: LOG:  new IPC connection received
2016-08-11 17:31:53: pid 1761: LOG:  Setting up socket for 0.0.0.0:9999
2016-08-11 17:31:53: pid 1761: LOG:  Setting up socket for :::9999
2016-08-11 17:31:53: pid 2103: LOG:  2 watchdog nodes are configured for
lifecheck
2016-08-11 17:31:53: pid 2103: LOG:  watchdog nodes ID:0
Name:"Linux_mgrdb84_9999"
2016-08-11 17:31:53: pid 2103: DETAIL:  Host:"1.1.1.84" WD Port:9000
pgpool-II port:9999
2016-08-11 17:31:53: pid 2103: LOG:  watchdog nodes ID:1
Name:"Linux_mgrdb85_9999"
2016-08-11 17:31:53: pid 2103: DETAIL:  Host:"1.1.1.85" WD Port:9000
pgpool-II port:9999
2016-08-11 17:31:53: pid 1761: LOG:  pgpool-II successfully started.
version 3.5.3 (ekieboshi)
2016-08-11 17:31:53: pid 1761: LOG:  find_primary_node: checking backend no
0
2016-08-11 17:31:53: pid 1761: LOG:  find_primary_node: primary node id is 0
2016-08-11 17:31:54: pid 2105: LOG:  createing watchdog heartbeat receive
socket.
2016-08-11 17:31:54: pid 2105: DETAIL:  bind receive socket to device:
"eth1"
2016-08-11 17:31:54: pid 2105: LOG:  set SO_REUSEPORT option to the socket
2016-08-11 17:31:54: pid 2105: LOG:  creating watchdog heartbeat receive
socket.
2016-08-11 17:31:54: pid 2105: DETAIL:  set SO_REUSEPORT
2016-08-11 17:31:54: pid 2107: LOG:  creating socket for sending heartbeat
2016-08-11 17:31:54: pid 2107: DETAIL:  bind send socket to device: eth1
2016-08-11 17:31:54: pid 2107: LOG:  set SO_REUSEPORT option to the socket
2016-08-11 17:31:54: pid 2107: LOG:  creating socket for sending heartbeat
2016-08-11 17:31:54: pid 2107: DETAIL:  set SO_REUSEPORT
2016-08-11 17:33:33: pid 2103: LOG:  watchdog: lifecheck started
2016-08-11 17:36:31: pid 1761: LOG:  child process with pid: 2254 exits
with status 256
2016-08-11 17:36:31: pid 1761: LOG:  fork a new child process with pid: 3193


the 2nd node:

*2016-08-11 17:17:34: pid 16256: WARNING:  checking setuid bit of if_up_cmd*
2016-08-11 17:17:34: pid 16256: DETAIL:  ifup[/sbin/ifconfig] doesn't have
setuid bit
*2016-08-11 17:17:34: pid 16256: WARNING:  checking setuid bit of
if_down_cmd*
2016-08-11 17:17:34: pid 16256: DETAIL:  ifdown[/sbin/ifconfig] doesn't
have setuid bit
*2016-08-11 17:17:34: pid 16256: WARNING:  checking setuid bit of arping
command*
2016-08-11 17:17:34: pid 16256: DETAIL:  arping[/sbin/arping] doesn't have
setuid bit
2016-08-11 17:17:34: pid 16256: LOG:  reading status file: 1 th backend is
set to down status
2016-08-11 17:17:34: pid 16256: LOG:  waiting for watchdog to initialize
2016-08-11 17:17:34: pid 16258: LOG:  setting the local watchdog node name
to "Linux_mgrdb84_9999"
2016-08-11 17:17:34: pid 16258: LOG:  watchdog cluster configured with 1
remote nodes
2016-08-11 17:17:34: pid 16258: LOG:  watchdog remote node:0 on
1.1.1.85:9000
2016-08-11 17:17:34: pid 16258: LOG:  interface monitoring is disabled in
watchdog
2016-08-11 17:17:34: pid 16258: LOG:  IPC socket path:
"/tmp/.s.PGPOOLWD_CMD.9000"
2016-08-11 17:17:38: pid 16258: LOG:  watchdog node state changed from
[LOADING] to [JOINING]
2016-08-11 17:17:43: pid 16258: LOG:  watchdog node state changed from
[JOINING] to [INITIALIZING]
2016-08-11 17:17:44: pid 16258: LOG:  I am the only alive node in the
watchdog cluster
2016-08-11 17:17:44: pid 16258: HINT:  skiping stand for coordinator state
2016-08-11 17:17:44: pid 16258: LOG:  watchdog node state changed from
[INITIALIZING] to [MASTER]
2016-08-11 17:17:44: pid 16258: LOG:  I am announcing my self as
master/coordinator watchdog node
2016-08-11 17:17:49: pid 16258: LOG:  I am the cluster leader node
2016-08-11 17:17:49: pid 16258: DETAIL:  our declare coordinator message is
accepted by all nodes
2016-08-11 17:17:49: pid 16258: LOG:  I am the cluster leader node.
Starting escalation process
2016-08-11 17:17:49: pid 16256: LOG:  watchdog process is initialized
2016-08-11 17:17:49: pid 16258: LOG:  escalation process started with
PID:16262
2016-08-11 17:17:49: pid 16262: LOG:  watchdog: escalation started
2016-08-11 17:17:49: pid 16258: LOG:  new IPC connection received
2016-08-11 17:17:49: pid 16256: LOG:  Setting up socket for 0.0.0.0:9999
2016-08-11 17:17:49: pid 16256: LOG:  Setting up socket for :::9999
2016-08-11 17:17:49: pid 16263: LOG:  2 watchdog nodes are configured for
lifecheck
2016-08-11 17:17:49: pid 16263: LOG:  watchdog nodes ID:0
Name:"Linux_mgrdb84_9999"
2016-08-11 17:17:49: pid 16263: DETAIL:  Host:"1.1.1.84" WD Port:9000
pgpool-II port:9999
2016-08-11 17:17:49: pid 16263: LOG:  watchdog nodes ID:1 Name:"Not_Set"
2016-08-11 17:17:49: pid 16263: DETAIL:  Host:"1.1.1.85" WD Port:9000
pgpool-II port:9999
2016-08-11 17:17:49: pid 16256: LOG:  pgpool-II successfully started.
version 3.5.3 (ekieboshi)
2016-08-11 17:17:49: pid 16256: LOG:  find_primary_node: checking backend
no 0
2016-08-11 17:17:49: pid 16256: LOG:  find_primary_node: primary node id is
0
2016-08-11 17:17:50: pid 16267: LOG:  createing watchdog heartbeat receive
socket.
2016-08-11 17:17:50: pid 16267: DETAIL:  bind receive socket to device:
"eth1"
2016-08-11 17:17:50: pid 16269: LOG:  set SO_REUSEPORT option to the socket
2016-08-11 17:17:50: pid 16269: LOG:  creating socket for sending heartbeat
2016-08-11 17:17:50: pid 16269: DETAIL:  set SO_REUSEPORT
*2016-08-11 17:17:51: pid 16262: WARNING:  watchdog failed to bring up
delegate IP, 'if_up_cmd' failed*
*2016-08-11 17:17:51: pid 16262: WARNING:  watchdog de-escalation failed to
bring down delegate IP*
2016-08-11 17:17:51: pid 16258: LOG:  watchdog escalation process with pid:
16262 exit with SUCCESS.
2016-08-11 17:18:09: pid 16256: LOG:  child process with pid: 16423 exits
with status 256
2016-08-11 17:18:09: pid 16256: LOG:  fork a new child process with pid:
16476
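
In case it helps when comparing the two logs above: a rough sketch in Python
(not part of pgpool-II) that joins the mail-wrapped log lines back into single
records and lists the watchdog state changes with their timestamps. The
function names join_wrapped and state_transitions are made up for
illustration, and the sketch assumes it is fed one node's log at a time in
the format shown above.

```python
import re

# A record starts with a timestamp, e.g. "2016-08-11 17:31:36: pid 1761: LOG:".
# The optional leading "*" covers lines that were emphasised in the mail.
RECORD_START = re.compile(
    r"^\*?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid (\d+): (\w+):\s+(.*)$"
)
STATE_CHANGE = re.compile(
    r"watchdog node state changed from \[(\w+)\] to \[(\w+)\]"
)


def join_wrapped(lines):
    """Merge wrapped continuation lines back into single log records."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = RECORD_START.match(line)
        if m:
            records.append(list(m.groups()))
        elif records and line.strip():
            # Not a new record: treat it as the tail of the previous one.
            records[-1][3] += " " + line.strip()
    # Drop a trailing "*" left over from emphasised lines.
    return [(ts, int(pid), level, msg.rstrip("*"))
            for ts, pid, level, msg in records]


def state_transitions(records):
    """Return (timestamp, old_state, new_state) for each state change."""
    found = []
    for ts, _pid, _level, msg in records:
        m = STATE_CHANGE.search(msg)
        if m:
            found.append((ts, m.group(1), m.group(2)))
    return found
```

Running state_transitions(join_wrapped(open(path))) on each node's log file
(path being wherever that log is stored) gives a short timeline that makes it
easier to see at what time each node moved to [MASTER].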




Please advise,
cohavisi