[pgpool-general: 5300] secondary node fails to start watchdog

Felix Hanley felix at userspace.com.au
Mon Feb 6 14:49:32 JST 2017


Hi all,

This is most likely an issue with my setup, but I would like someone to
confirm my watchdog configuration. The first node starts up fine and sets
the delegate_IP. The second node successfully pings the delegate_IP, but
then fails to start watchdog and exits with the following:

    2017-02-06 05:45:29: pid 1263: LOG:  watchdog is verifying connectivity with a trusted server "ww01-01"
    2017-02-06 05:45:31: pid 1263: DEBUG:  watchdog ping process for host "ww01-01" exited successfully
    2017-02-06 05:45:31: pid 1263: DEBUG:  watchdog ping
    2017-02-06 05:45:31: pid 1263: DETAIL:  ping data: PING ww01-01.mel.userspace.com.au (10.240.0.89): 56 data bytes

            --- ww01-01.mel.userspace.com.au ping statistics ---
            3 packets transmitted, 3 packets received, 0.0% packet loss
            round-trip min/avg/max/stddev = 0.526/0.753/1.144/0.277 ms

    2017-02-06 05:45:31: pid 1263: DEBUG:  watchdog succeeded to ping a host "ww01-01"
    2017-02-06 05:45:31: pid 1263: LOG:  failed to create watchdog sending socket
    2017-02-06 05:45:31: pid 1263: DETAIL:  connect() reports failure "Connection refused"
    2017-02-06 05:45:31: pid 1263: HINT:  You can safely ignore this while starting up.
    2017-02-06 05:45:31: pid 1263: LOG:  watchdog sending packet for nodes
    2017-02-06 05:45:31: pid 1263: DETAIL:  packet for "10.240.0.33:9000" is canceled
    2017-02-06 05:45:33: pid 1263: DEBUG:  watchdog ping process for host "10.240.1.16" exited successfully
    2017-02-06 05:45:33: pid 1263: DEBUG:  watchdog ping
    2017-02-06 05:45:33: pid 1263: DETAIL:  ping data: PING 10.240.1.16 (10.240.1.16): 56 data bytes

            --- 10.240.1.16 ping statistics ---
            3 packets transmitted, 3 packets received, 0.0% packet loss
            round-trip min/avg/max/stddev = 0.347/0.432/0.561/0.093 ms

    2017-02-06 05:45:33: pid 1263: DEBUG:  watchdog succeeded to ping a host "10.240.1.16"
    2017-02-06 05:45:33: pid 1263: FATAL:  failed to initialize watchdog, delegate_IP "10.240.1.16" already exists

I would expect a successful ping of the delegate_IP to indicate that the
master is active, so that the node assumes the slave role, but it just dies.
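
The "Connection refused" reported by connect() in the log can be checked
outside pgpool with a plain TCP connect to the watchdog port. This is only
a minimal sketch, assuming Python 3 is available on the node. The function
name can_connect is made up for illustration; it is not part of pgpool.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    Hypothetical helper for illustration only; not part of pgpool.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and name resolution errors.
        return False
```

For example, can_connect("10.240.0.33", 9000) uses the address from the
"packet for ... is canceled" log line. A False result only says that nothing
accepted the connection at that moment; it does not say why.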

My setup is as follows:

Two FreeBSD 11 hosts, pg01-01 and pg01-02. Network connectivity is
confirmed and the firewall is disabled for testing. The heartbeat is being
sent and received (confirmed with tcpdump). Both are running pgpool 3.4.9
with the following config options (just the relevant ones, I think):

    listen_addresses = '0.0.0.0'
    port = 5432
    backend_hostname0 = 'pg01-01'
    backend_port0 = 5431
    backend_weight0 = 1
    backend_flag0 = 'ALLOW_TO_FAILOVER'
    backend_hostname1 = 'pg01-02'
    backend_port1 = 5431
    backend_flag1 = 'ALLOW_TO_FAILOVER'
    replication_mode = on
    master_slave_mode = off
    use_watchdog = on
    trusted_servers = 'ww01-01,ww01-02'
    ping_path = '/sbin'
    wd_hostname = '10.240.0.33'
    wd_port = 9000
    wd_authkey = ''
    delegate_IP = '10.240.1.16'
    ifconfig_path = '/sbin'
    if_up_cmd = 'ifconfig vtnet1 alias $_IP_$ netmask 255.255.255.255'
    if_down_cmd = 'ifconfig vtnet1 delete $_IP_$'
    arping_path = '/usr/local/sbin'
    arping_cmd = 'arping -U $_IP_$ -w 1'
    wd_lifecheck_method = 'heartbeat'
    wd_heartbeat_port = 9694
    heartbeat_destination0 = 'pg01-02'
    heartbeat_destination_port0 = 9694
    other_pgpool_hostname0 = 'pg01-02'
    other_pgpool_port0 = 5432
    other_wd_port0 = 9000
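
To compare the watchdog-related options of the two nodes side by side,
something like the following could be used. It is a rough sketch, assuming
the config file only contains simple key = value lines like the ones above.
The function names parse_conf and watchdog_opts are made up for
illustration, and the comment handling is deliberately naive.

```python
import re

# Matches lines such as:  wd_port = 9000   or   wd_hostname = '10.240.0.33'
LINE_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")


def parse_conf(text):
    """Parse simple key = value lines into a dict of strings."""
    opts = {}
    for raw in text.splitlines():
        # Naive comment strip: would break on a '#' inside a quoted value.
        line = raw.split("#", 1)[0]
        m = LINE_RE.match(line)
        if not m:
            continue
        key, value = m.groups()
        # Drop one pair of surrounding single quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1]
        opts[key] = value
    return opts


def watchdog_opts(opts):
    """Keep only the options that look watchdog-related (by name prefix)."""
    prefixes = ("wd_", "other_", "heartbeat_", "delegate_",
                "trusted_", "use_watchdog")
    return {k: v for k, v in opts.items() if k.startswith(prefixes)}
```

Running watchdog_opts(parse_conf(...)) on the config text from each node
gives two dicts that can be compared directly, for example to check that
other_wd_port0 on one node matches wd_port on the other.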

Let me know if you need any more details.

-felix

