[pgpool-general: 5300] secondary node fails to start watchdog
Felix Hanley
felix at userspace.com.au
Mon Feb 6 14:49:32 JST 2017
Hi all,
This is most likely an issue with my setup, but I would like a sanity
check of my watchdog configuration. The first node starts up fine and
sets the delegate_IP. The second node successfully pings the delegate_IP
but then fails to start the watchdog and exits with the following:
2017-02-06 05:45:29: pid 1263: LOG: watchdog is verifying connectivity with a trusted server "ww01-01"
2017-02-06 05:45:31: pid 1263: DEBUG: watchdog ping process for host "ww01-01" exited successfully
2017-02-06 05:45:31: pid 1263: DEBUG: watchdog ping
2017-02-06 05:45:31: pid 1263: DETAIL: ping data: PING ww01-01.mel.userspace.com.au (10.240.0.89): 56 data bytes
--- ww01-01.mel.userspace.com.au ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.526/0.753/1.144/0.277 ms
2017-02-06 05:45:31: pid 1263: DEBUG: watchdog succeeded to ping a host "ww01-01"
2017-02-06 05:45:31: pid 1263: LOG: failed to create watchdog sending socket
2017-02-06 05:45:31: pid 1263: DETAIL: connect() reports failure "Connection refused"
2017-02-06 05:45:31: pid 1263: HINT: You can safely ignore this while starting up.
2017-02-06 05:45:31: pid 1263: LOG: watchdog sending packet for nodes
2017-02-06 05:45:31: pid 1263: DETAIL: packet for "10.240.0.33:9000" is canceled
2017-02-06 05:45:33: pid 1263: DEBUG: watchdog ping process for host "10.240.1.16" exited successfully
2017-02-06 05:45:33: pid 1263: DEBUG: watchdog ping
2017-02-06 05:45:33: pid 1263: DETAIL: ping data: PING 10.240.1.16 (10.240.1.16): 56 data bytes
--- 10.240.1.16 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.347/0.432/0.561/0.093 ms
2017-02-06 05:45:33: pid 1263: DEBUG: watchdog succeeded to ping a host "10.240.1.16"
2017-02-06 05:45:33: pid 1263: FATAL: failed to initialize watchdog, delegate_IP "10.240.1.16" already exists
I would expect a successful ping to indicate that the master is active
and so the node would assume the slave role, but it just dies.
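To separate the watchdog's own messages from the DEBUG ping output in the log above, a small filter can help. This is only a minimal sketch, assuming the line layout seen in that log ("timestamp: pid N: LEVEL: message"); `filter_log` is a hypothetical helper name, not something shipped with pgpool.

```python
import re

# Assumed layout, taken from the log excerpt above:
#   "YYYY-MM-DD HH:MM:SS: pid N: LEVEL: message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid (\d+): ([A-Z]+): (.*)$"
)

def filter_log(text, levels=("LOG", "FATAL")):
    """Return (level, message) pairs for the chosen levels.

    DETAIL and HINT lines are kept only when the message they
    follow was kept, so ping statistics attached to DEBUG lines
    are dropped along with them.
    """
    kept = []
    parent_kept = False
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # raw ping output and other unprefixed lines
        level, message = match.group(3), match.group(4)
        if level in ("DETAIL", "HINT"):
            if parent_kept:
                kept.append((level, message))
            continue
        parent_kept = level in levels
        if parent_kept:
            kept.append((level, message))
    return kept
```

Run over the excerpt above, this should leave only the "Connection refused" message with its DETAIL and HINT, the canceled packet for "10.240.0.33:9000", and the final FATAL line.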
My setup is as follows:
Two FreeBSD 11 hosts, pg01-01 and pg01-02. Network connectivity is
confirmed and the firewall is disabled for testing. The heartbeat is
being sent and received (confirmed with tcpdump). Both run pgpool 3.4.9
with the following config options (only the ones I believe are relevant):
listen_addresses = '0.0.0.0'
port = 5432
backend_hostname0 = 'pg01-01'
backend_port0 = 5431
backend_weight0 = 1
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'pg01-02'
backend_port1 = 5431
backend_flag1 = 'ALLOW_TO_FAILOVER'
replication_mode = on
master_slave_mode = off
use_watchdog = on
trusted_servers = 'ww01-01,ww01-02'
ping_path = '/sbin'
wd_hostname = '10.240.0.33'
wd_port = 9000
wd_authkey = ''
delegate_IP = '10.240.1.16'
ifconfig_path = '/sbin'
if_up_cmd = 'ifconfig vtnet1 alias $_IP_$ netmask 255.255.255.255'
if_down_cmd = 'ifconfig vtnet1 delete $_IP_$'
arping_path = '/usr/local/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1'
wd_lifecheck_method = 'heartbeat'
wd_heartbeat_port = 9694
heartbeat_destination0 = 'pg01-02'
heartbeat_destination_port0 = 9694
other_pgpool_hostname0 = 'pg01-02'
other_pgpool_port0 = 5432
other_wd_port0 = 9000
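Since only one node's options are listed above, one way to rule out a simple mismatch is to compare both nodes' files mechanically. The following is a rough sketch under stated assumptions: `parse_conf` and `mirror_problems` are hypothetical helper names, the parser handles only plain `key = 'value'` lines, and the checks cover only what I understand must line up between two watchdog peers (same delegate_IP and wd_authkey, and each node's other_* and heartbeat destination ports pointing at the peer's own ports).

```python
def parse_conf(text):
    """Parse pgpool.conf-style `key = 'value'` lines into a dict.

    Simplification: everything after '#' is treated as a comment,
    so a quoted value containing '#' would be cut short.
    """
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip().strip("'")
    return options

def mirror_problems(node_a, node_b):
    """List mismatches between two nodes' parsed options."""
    problems = []
    # Settings assumed to be identical on both nodes.
    for key in ("delegate_IP", "wd_authkey"):
        if node_a.get(key) != node_b.get(key):
            problems.append(f"{key} differs between the nodes")
    # Settings on one node that should point at the peer's own port.
    pairs = (
        ("other_wd_port0", "wd_port"),
        ("other_pgpool_port0", "port"),
        ("heartbeat_destination_port0", "wd_heartbeat_port"),
    )
    for name, local, peer in (("A", node_a, node_b), ("B", node_b, node_a)):
        for local_key, peer_key in pairs:
            if local.get(local_key) != peer.get(peer_key):
                problems.append(
                    f"node {name}: {local_key} does not match peer {peer_key}"
                )
    return problems
```

Hostnames are deliberately not compared, because one side may use an IP address (as wd_hostname does here) while the other uses a name such as pg01-02.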
Let me know if you need any more details.
-felix