[pgpool-general: 3728] Re: pgpool-general Digest, Vol 43, Issue 17

Wed May 20 05:08:35 JST 2015

Have you actually seen this happen? It looked to me that there was some
attempt by the watchdog coming up to contact other nodes *before* it tried
to bring up the delegate IP?

I have a different, but related question. What happens if the network
partitions and watchdog X is unable to communicate with watchdog Y?
Suppose X is the leader and holds the delegate IP. Y will see the heartbeat
failure from X and try to promote itself. Will Y fail if its ping of the
delegate IP succeeds?
I think I have seen this: the watchdog fails to initialize, and then
pgpool quits. It seems the watchdog should just revert to standby if the
ping succeeds.
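
To make the suggestion concrete, here is a rough sketch of the decision I
have in mind. This is not pgpool source code; the names Action and
on_promotion_attempt are made up for illustration, and the ABORT branch only
models the "delegate_IP ... already exists" FATAL shown in the quoted log
further down.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of a watchdog promotion attempt."""
    BECOME_ACTIVE = "bring up the delegate IP and become active"
    STAY_STANDBY = "leave the delegate IP alone and stay standby"
    ABORT = "fail watchdog initialization and exit"


def on_promotion_attempt(delegate_ip_answers_ping, fallback_to_standby=True):
    """Decide what to do before bringing up the delegate IP.

    delegate_ip_answers_ping: True if some host already answers on the
    delegate IP (for example the old leader on the far side of a partition).
    fallback_to_standby: True models the behaviour proposed above; False
    models the behaviour I believe I observed (an assumption on my part).
    """
    if not delegate_ip_answers_ping:
        # Nobody holds the address, so taking it over is safe.
        return Action.BECOME_ACTIVE
    if fallback_to_standby:
        return Action.STAY_STANDBY
    return Action.ABORT
```

With fallback_to_standby=True the node never exits just because another
node still answers on the delegate IP, which is all I am asking for.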

//w

On 5/19/15, 5:53 AM, "pgpool-general-request at pgpool.net"
<pgpool-general-request at pgpool.net> wrote:

>
>Today's Topics:
>
>   1. [pgpool-general: 3724] delegate ip lost (Janusz Borkowski)
>   2. [pgpool-general: 3725] pcp_promote problem (Janusz Borkowski)
>   3. [pgpool-general: 3726] Re: Questions about watchdog
>      (Gervais de Montbrun)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Tue, 19 May 2015 10:35:43 +0200
>From: Janusz Borkowski <janusz.borkowski at infobright.com>
>To: pgpool-general at pgpool.net
>Subject: [pgpool-general: 3724] delegate ip lost
>Message-ID: <555AF5DF.6080302 at infobright.com>
>Content-Type: text/plain; charset="utf-8"
>
>Hi!
>
>I wonder how to deal with such a situation:
>
>1. Node X runs an Active watchdog and has the delegate IP assigned.
>   The network goes down physically (or the network cable is unplugged)
>   for a while. The interface goes down when there is no Ethernet signal.
>
>2. A standby watchdog at node Y detects this and makes itself Active,
>   bringing up the delegate IP.
>
>3. The interface at node X comes back up once the network is up (the
>   cable is plugged in) again.
>
>4. The watchdog at X has considered itself Active the whole time.
>   Unfortunately, its delegate IP is lost: it is not restored when the
>   interface comes up again.
>
>5. The watchdog at Y is Active as well and has the delegate IP.
>
>6. We have 2 watchdogs thinking they are Active, while only one has the
>   delegate IP.
>
>7. When node Y goes down, the watchdog at X still considers itself
>   Active, so it does not need to do anything. But it no longer has the
>   delegate IP.
>
>
>Any clue? It seems a manual pgpool restart at node X is required after
>step 4...
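
One way to at least detect step 4 from outside pgpool is to check whether the
delegate IP is still configured on the node that believes it is Active. The
sketch below is only an external monitoring idea, not something pgpool does
itself. It assumes a Linux host with the iproute2 "ip" command; the function
names, the device name and the 192.0.2.10 address are placeholders.

```python
import subprocess


def has_address(ip_addr_output, address):
    """Return True if `address` appears as an inet address in the text
    printed by `ip -o -4 addr show` (one interface address per line)."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if "inet" not in fields:
            continue
        idx = fields.index("inet")
        if idx + 1 < len(fields):
            # The field after "inet" looks like 192.0.2.10/24.
            if fields[idx + 1].split("/")[0] == address:
                return True
    return False


def delegate_ip_present(device, address):
    """Run `ip` for one device and check for the delegate IP.
    Raises CalledProcessError if the device does not exist."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", device],
        capture_output=True, text=True, check=True,
    )
    return has_address(result.stdout, address)
```

A cron job or Nagios check could call delegate_ip_present("eth0",
"192.0.2.10") on the node that reports itself Active and raise an alert when
it returns False. What to do next (restart pgpool, as suggested above, or
something else) is a separate decision that this sketch does not make.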
>
>Cheers!
>
>JanuszB
>
>
>
>------------------------------
>
>Message: 2
>Date: Tue, 19 May 2015 13:22:35 +0200
>From: Janusz Borkowski <janusz.borkowski at infobright.com>
>To: pgpool-general at pgpool.net
>Subject: [pgpool-general: 3725] pcp_promote problem
>Message-ID: <555B1CFB.30308 at infobright.com>
>Content-Type: text/plain; charset="utf-8"
>
>Hi!
>
>I see in pgpoolAdmin:
>
>node 1 192.168.10.188 6544     Up. Disconnect. Running as standby server
>   postgres: Up   
>
>Indeed, recovery.conf file exists on .188 in the PGDATA folder. This is
>the only running node.
>
>However promoting is not possible:
>
>[root at ib-wawa-189 pgpoolAdmin]# sudo -u apache /usr/bin/pcp_promote_node
>-g  10 localhost 9898 admin 'pgpool' 1
>BackendError
>
>in the logs:
>May 19 12:35:12 localhost pgpool: 2015-05-19 12:35:12: pid 2494: FATAL:
>invalid pgpool mode for the command
>May 19 12:35:12 localhost pgpool: 2015-05-19 12:35:12: pid 2494: DETAIL:
>specified node is already primary node, can't promote node id 1
>
>Of course, I can promote with pg_ctl:
>
>[root at ib-wawa-188 ~]# sudo -u postgres /usr/pgsql-9.2/bin/pg_ctl -D
>/var/lib/pgsql/9.2/data/ promote
>server promoting
>
>Is it a pgpool bug? I use 3.4.2-1 version.
>
>BTW, pgpoolAdmin (pgpoolAdmin-3.4.1-2pgdg.rhel7) has a bug in php code
>servicing the 'Promote' button - it uses 'e1007' code for it, which
>resolves to "pcp_detach_node command error occurred" message.
>
>Thanks,
>JanuszB
>
>
>------------------------------
>
>Message: 3
>Date: Tue, 19 May 2015 09:53:24 -0300
>From: Gervais de Montbrun <gervais at silverorange.com>
>To: Wes Mitchell <wes.mitchell at ericsson.com>
>Cc: "pgpool-general at pgpool.net" <pgpool-general at pgpool.net>
>Subject: [pgpool-general: 3726] Re: Questions about watchdog
>Message-ID: <39D9CBD1-34E6-4238-B816-B229DCE93C88 at silverorange.com>
>Content-Type: text/plain; charset="windows-1252"
>
>Hey Wes,
>
>I have the same issue where if pgpool doesn't shut down properly, it
>leaves the socket files lying around and it prevents a proper startup.
>I'm using CentOS 7, so no init.d/pgpool-II scripts for me. I did try
>adding something to systemd files to remove the socket files, but no
>success. It remains an issue for me also. I have skipped this issue for
>now as I am not overly concerned about it. Nagios checks should let me
>know if anything is amiss.
>
>Cheers,
>Gervais
>
>> On May 18, 2015, at 9:10 PM, Wes Mitchell <wes.mitchell at ericsson.com>
>>wrote:
>> 
>> Hi Gervais, 
>> 
>> I have solved this particular problem; I had opened udp ports on the
>>firewall for both the heartbeat and the watchdog traffic, not realizing
>>the watchdog traffic was tcp. Now both pgpool instances initialize
>>properly, and failover even works!
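
Since the mistake here was opening UDP where the watchdog port needed TCP, a
quick probe from the other host can confirm the TCP side. This is a minimal
sketch; tcp_port_open is a name invented for this example, and the host and
port in the usage note come from the settings quoted below.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established
    within `timeout` seconds, False on refusal, timeout or lookup failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, tcp_port_open("pgpool2", 19000) run on pgpool1 checks the wd_port
path. The heartbeat port is UDP, which has no handshake, so a probe like this
cannot confirm it; for that I would look at the firewall rules or at the
heartbeat messages in the pgpool log.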
>> 
>> However, now I have a different issue: if I use the service command to
>>stop/restart pgpool, sometimes Unix domain socket files are left
>>dangling (and sometimes they are cleaned up properly), preventing the
>>restart from succeeding.
>> 
>> Is this a known issue? Should I pre-emptively remove the socket files,
>>perhaps in the init.d/pgpool-II script? Or, can I avoid the issue
>>altogether by some magic in pool_hba.conf?
>> I find it difficult to search the archives. Is there full-text search
>>capability somewhere for the list?
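
On pre-emptively removing the socket files: if you go that way, it seems
safer to remove only sockets that nobody is listening on. The sketch below
is my own illustration, not part of pgpool or its init script. The name
remove_stale_sockets is invented, and the paths you pass in depend on your
socket directory and port settings, which I can't tell from this thread.

```python
import errno
import os
import socket
import stat


def remove_stale_sockets(paths):
    """Unlink each path that is a Unix domain socket with no listener.

    A socket is treated as stale only when connect() fails with
    ECONNREFUSED. Missing files, non-socket files and sockets that accept
    a connection are left alone. Returns the list of removed paths.
    """
    removed = []
    for path in paths:
        try:
            mode = os.lstat(path).st_mode
        except FileNotFoundError:
            continue
        if not stat.S_ISSOCK(mode):
            continue
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(path)
        except OSError as exc:
            if exc.errno == errno.ECONNREFUSED:
                os.unlink(path)
                removed.append(path)
        finally:
            probe.close()
    return removed
```

Calling this just before starting pgpool (from the init script or a systemd
ExecStartPre= helper) would clear leftovers from an unclean stop without
touching the sockets of an instance that is still running.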
>> 
>> Here are my settings. I have a non-root user with sudo privileges only
>>for ifconfig and arping, which is why the paths don't look right (but it
>>works!)
>> 
>> use_watchdog = 'on'
>> wd_hostname = 'pgpool1'
>> Note: 'pgpool2' on second machine
>>  
>> wd_port = 19000
>> delegate_IP = 'your virtual IP'
>> if_up_cmd = 'sudo ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0'
>> ifconfig_path = '/usr/bin'
>> if_down_cmd = 'sudo ifconfig eth0:0 down'
>> arping_path = '/usr/bin'
>> arping_cmd = 'sudo arping -U $_IP_$ -w 1'
>> wd_interval = 3
>> wd_heartbeat_port = 19464
>> heartbeat_destination_port0 = 19464
>> other_pgpool_port0 = 15432
>> other_wd_port0 = 19000
>> Note: these lines must be different on the different hosts. On host
>>pgpool1, use
>> heartbeat_destination0 = 'pgpool2'
>> other_pgpool_hostname0 = 'pgpool2'
>>  
>> On host pgpool2, use
>> heartbeat_destination0 = 'pgpool1'
>> other_pgpool_hostname0 = 'pgpool1'
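
The per-host difference above is easy to get backwards, so here is a small
sketch that derives the three host-specific lines from the local hostname.
It is only an illustration for a two-node setup like this one; peer_settings
is an invented helper, and the default hostnames are the ones from the
settings above.

```python
def peer_settings(local_host, hosts=("pgpool1", "pgpool2")):
    """Return the host-specific watchdog lines for one node of a
    two-node pair: wd_hostname is the local host, while both
    heartbeat_destination0 and other_pgpool_hostname0 name the peer."""
    if len(hosts) != 2 or local_host not in hosts:
        raise ValueError("local_host must be one of exactly two hosts")
    peer = hosts[1] if local_host == hosts[0] else hosts[0]
    return "\n".join([
        f"wd_hostname = '{local_host}'",
        f"heartbeat_destination0 = '{peer}'",
        f"other_pgpool_hostname0 = '{peer}'",
    ])
```

print(peer_settings("pgpool1")) gives the three lines for the first host;
everything else in the watchdog section stays identical on both machines.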
>> Thanks for your response,
>> 
>> Wes
>> 
>> From: Gervais de Montbrun <gervais at silverorange.com>
>> Date: Monday, May 18, 2015 at 3:13 PM
>> To: Wes Mitchell <wes.mitchell at ericsson.com>
>> Cc: "pgpool-general at pgpool.net" <pgpool-general at pgpool.net>
>> Subject: Re: [pgpool-general: 3721] Questions about watchdog
>> 
>> Hi Wes,
>> 
>> Something must be awry with your configs. Can you share the watchdog
>>relevant settings of your configs?
>> Perhaps your heartbeat_destination0 and other_pgpool_hostname0 are not
>>set. Just a hunch. Make sure that the first is the hostname of the
>>server you are running on and the "other" points to the other pgpool
>>server.
>> 
>> Cheers,
>> Gervais
>> 
>>> On May 18, 2015, at 5:30 PM, Wes Mitchell <wes.mitchell at ericsson.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I am trying to configure pgpool-II for HA using watchdog.
>>> I am running into the following issue: if I specify the delegate_IP
>>>parameter on both pgpool hosts, then whichever one is brought up second
>>>fails:
>>> 
>>> 2015-05-18 16:11:44: pid 26948: LOCATION:  wd_ping.c:309
>>> 2015-05-18 16:11:44: pid 26948: FATAL:  failed to initialize watchdog,
>>>delegate_IP "10.61.156.162" already exists
>>> 
>>> And all processes then terminate.
>>> 
>>> Please help me understand the proper configuration. I am setting
>>> delegate_IP = '10.61.156.162'
>>> 
>>> I see that the interface is brought up and bound to that IP on the
>>>first instance, using ifconfig:
>>> eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:39:17:DF
>>>           inet addr:10.61.156.162  Bcast:10.61.156.255
>>>Mask:255.255.255.0
>>> 
>>> Is there some setting to tell pgpool process that it is master or
>>>standby? How do I set delegate_IP so that failover will bring up the IP
>>>on the promoted machine?
>>> 
>>> If you could also reply directly, I would appreciate it.
>>> 
>>> Thanks,
>>> //w
>>> _______________________________________________
>>> pgpool-general mailing list
>>> pgpool-general at pgpool.net
>>> http://www.pgpool.net/mailman/listinfo/pgpool-general