[pgpool-general: 4681] Re: pgpool-general Digest, Vol 55, Issue 7

Thu May 12 21:26:43 JST 2016

Hey Lazar:
I have a favor to ask. I was wondering if you could run the following test
in your setup:
- have node 1 be both the PostgreSQL primary and the pgpool master;
- make a test psql connection from your master to the virtual IP;
- now shut down the network interface on node 2 (the slave);
- make another test psql connection from your master to the virtual IP.

Does this last connection hang?
Thanks
Regards
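
To make the "does it hang?" test repeatable, each connection attempt can be bounded by a timeout instead of waiting on psql indefinitely. Below is a minimal Python sketch. It assumes pgpool listens on its default port 9999, and the VIP in the usage note is a hypothetical placeholder. It only checks that a TCP connection to pgpool opens within the timeout. It does not perform the PostgreSQL startup handshake, so a pgpool that accepts the socket but then blocks while contacting an unreachable backend would still count as a success here.

```python
from __future__ import annotations

import socket
import time


def probe(host: str, port: int, timeout: float = 5.0) -> tuple[bool, float]:
    """Try to open a TCP connection; return (connected, seconds_elapsed).

    An attempt that neither succeeds nor fails within `timeout` seconds is
    reported as (False, ~timeout) instead of blocking forever.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and socket timeouts.
        return False, time.monotonic() - start
```

Usage, with a hypothetical VIP: `probe("192.0.2.10", 9999)`. To bound the full login as well, psql accepts a libpq connection string with `connect_timeout`, for example `psql "host=192.0.2.10 port=9999 connect_timeout=5"`.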

On Wed, May 11, 2016 at 7:07 PM, Ricardo Larrañaga <
ricardo.larranaga at gmail.com> wrote:

> Hello Lazar, thanks a lot for your answers!
> When I want to send an email to the list, I add pgpool-general at pgpool.net
> to the To: field. I don't know why I seem to be off the list, but I'll take
> a look at the digest; maybe someone else also answered.
> I'll take a look at your config and compare it to mine. I'll send mine
> tomorrow, for reference.
>
> In your case, it definitely looks like an ARP issue. Does arping work
> when the node fails over? It does not look like it reaches the gateway.
> Maybe you have ARP filtering somewhere? I would sniff some traffic there.
>
> My case is different. For me the problem does not seem to be connecting
> to the virtual IP; it's that pgpool still tries to connect to the backend
> that went down, and when it can't, it does not mark that backend as
> failed.
>
> I am a little at a loss here; I am not sure where else to look.
>
> Regards!
>
>
>
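
The arping check above concerns gratuitous ARP. After the new active node brings up the virtual IP, it broadcasts an unsolicited ARP packet announcing the VIP with its own MAC address, so that switches and routers replace stale entries. pgpool's watchdog runs this through its arping_cmd setting. The Python sketch below only builds such a frame and does not send it. Its purpose is to show which fields to look for when sniffing traffic. The MAC and IP values are placeholders. Two details are assumptions to verify against a real capture: that the ARP request uses the VIP as both sender and target IP, and that the target MAC is zeroed.

```python
import socket
import struct

BROADCAST = b"\xff" * 6
ETH_P_ARP = 0x0806  # EtherType for ARP
ARP_REQUEST = 1


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address: {mac}")
    return raw


def gratuitous_arp_frame(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`.

    Sender and target IP are both `ip`, and the frame goes to the Ethernet
    broadcast address so every host and router on the segment sees the
    new IP-to-MAC binding.
    """
    src = mac_bytes(mac)
    ip_raw = socket.inet_aton(ip)
    eth_header = struct.pack("!6s6sH", BROADCAST, src, ETH_P_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                    # hardware type: Ethernet
        0x0800,               # protocol type: IPv4
        6, 4,                 # hardware / protocol address lengths
        ARP_REQUEST,          # operation
        src, ip_raw,          # sender MAC / sender IP (the VIP)
        b"\x00" * 6, ip_raw,  # target MAC (unknown) / target IP (the VIP)
    )
    return eth_header + arp_payload
```

Actually sending the frame requires a raw AF_PACKET socket and root privileges on Linux, which arping handles. If a capture on the router side never shows a 42-byte broadcast like this after failover, the announcement is being lost or filtered on the way.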
> On Wed, May 11, 2016 at 6:48 PM, Lazar Krumov <lkrumov_subs at yahoo.co.uk>
> wrote:
>
>> Hi Ricardo,
>>
>> I'm attaching the two config files.
>>
>> It would probably be better to continue this discussion on the mailing
>> list. That would allow other folks to share their thoughts, and perhaps
>> find help for their own problems in our thread :)
>> So if you are OK with sharing the discussion with the community, please
>> reply with CC: to
>>  pgpool-general at pgpool.net
>> I'm not sure exactly how I should send e-mails to the list. It is
>> described at:
>> http://www.sraoss.jp/mailman/listinfo/pgpool-general
>> but my first reply to you ended up outside the thread you started on
>> the mailing list!
>>
>> ---------------
>> Regarding the specifics of the pgpool config:
>> I run the pgpool daemon as a non-root user. As you can see, the
>> arp/ip/ifconfig commands are not the original ones. For pgpool I use
>> copies of the originals, but with the setuid flag. I suspect a security
>> gap here! But as long as you secure the directories well enough, these
>> executables become invisible to other users at the file-access level,
>> so the setuid bit is no longer a security concern :)
>>
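
The setuid argument above depends on one condition. Other users can only use a setuid copy if they have search (x) permission on every directory leading to it. The rough Python check below tests that condition under simplifying assumptions. It examines only the "other" permission bits and ignores group membership, ACLs and root. Any path you pass to it is your own deployment path, not something pgpool prescribes.

```python
import os
import stat
from typing import Optional


def is_setuid(path: str) -> bool:
    """True if the file has the set-user-ID bit."""
    return bool(os.stat(path).st_mode & stat.S_ISUID)


def blocking_dir(path: str) -> Optional[str]:
    """Return the nearest ancestor directory that 'other' users cannot search.

    None means no directory on the path lacks the o+x bit, so any local
    user can reach the file. Simplification: only the "other" bits are
    checked; group membership, ACLs and root are ignored.
    """
    current = os.path.dirname(os.path.abspath(path))
    while True:
        if not os.stat(current).st_mode & stat.S_IXOTH:
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached "/"
            return None
        current = parent
```

For a hypothetical copy at `/opt/pgpool/bin/ip`, `blocking_dir("/opt/pgpool/bin/ip")` should name the directory that hides it. A result of None means the setuid copy is reachable by everyone.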
>> ---------------
>> On my servers I use a Linux bond of 4 NICs in mode 802.3ad:
>>
>> > ca-dbs05:~# more /etc/modprobe.d/bonding_mode.conf
>> > options bonding mode=802.3ad
>>
>> The servers' network configuration is:
>> > ca-dbs05:~# more /etc/network/interfaces
>> > auto lo eth0 eth1 eth2 eth3 bond0
>> > iface lo inet loopback
>> >
>> > iface eth0 inet static
>> >         address 0.0.0.0
>> >
>> > iface eth1 inet static
>> >         address 0.0.0.0
>> >
>> > iface eth2 inet static
>> >         address 0.0.0.0
>> >
>> > iface eth3 inet static
>> >         address 0.0.0.0
>> >
>> > iface bond0 inet static
>> >         address 192.168.0.218
>> >         netmask 255.255.255.0
>> >         network 192.168.0.0
>> >         broadcast 192.168.0.255
>> >         gateway 192.168.0.245
>> >         # dns-* options are implemented by the resolvconf package, if installed
>> >         dns-nameservers 192.168.0.56
>> >         dns-search cadastre.bg
>> >         slaves eth0 eth1 eth2 eth3
>> >
>> > iface bond0:vip inet static
>> >         address 192.168.0.219
>> >         netmask 255.255.255.0
>> >         network 192.168.0.0
>> >         broadcast 192.168.0.255
>>
>> Notice the interface "bond0:vip". It is defined in the Debian config, but I
>> let PgPool activate it.
>>
>>
>> On the Cisco switches (Catalyst WS-C4948) I use EtherChannel:
>>
>> > #sh etherchannel 4 port-channel
>> >               Port-channels in the group:
>> >               ---------------------------
>> >
>> > Port-channel: Po4    (Primary Aggregator)
>> >
>> > ------------
>> >
>> > Age of the Port-channel   = 118d:20h:29m:40s
>> > Logical slot/port   = 11/4          Number of ports = 4
>> > Port state          = Port-channel Ag-Inuse
>> > Protocol            =   LACP
>> > Port security       = Disabled
>> >
>> > Ports in the Port-channel:
>> >
>> > Index   Load   Port     EC state        No of bits
>> > ------+------+------+------------------+-----------
>> >   2     00     Gi1/25   Active             0
>> >   0     00     Gi1/26   Active             0
>> >   1     00     Gi1/27   Active             0
>> >   3     00     Gi1/28   Active             0
>> >
>> > Time since last port bundled:    9d:04h:16m:22s    Gi1/28
>> > Time since last port Un-bundled: 9d:04h:18m:57s    Gi1/28
>>
>>
>>
>> All 4 ports on the switch are configured like this:
>>
>> > #sh run int Gi1/26
>> > interface GigabitEthernet1/26
>> >  description ca-dbs05 eth1 BOND
>> >  switchport trunk encapsulation dot1q
>> >  switchport mode trunk
>> >  channel-group 4 mode active
>> > end
>>
>> One more consideration: all the switch interfaces are configured in
>> "mode trunk". This is another avenue to investigate for my problem with
>> slow ARP table learning.
>>
>>
>> -----------------------------------------------------
>> I'm experiencing another problem, which is purely a network
>> configuration issue: the bonding does not work the way I want. I'd like
>> a single network transfer session to use all the NICs in the bond. This
>> is described in:
>> https://www.kernel.org/doc/Documentation/networking/bonding.txt
>> as "balance-rr".
>> Unfortunately I don't know how to set up the Cisco devices to agree with
>> the "balance-rr" bonding mode.
>> The only working configuration that I could establish with
>> Cisco port-channel / Linux bond is (Cisco LACP active) /
>> (Linux bond mode=802.3ad).
>> This stays on my TODO list... :)
>> I mention it because it could also be an important configuration detail!
>>
>> BR
>> LAZA
>>
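
The bonding limitation above comes from how mode 802.3ad chooses a NIC. In that mode, the bonding driver picks the transmit slave from a hash of header fields (xmit_hash_policy, layer2 by default), so every packet of one flow maps to the same slave. balance-rr instead rotates through the slaves packet by packet. The simplified Python model below is loosely based on the layer2 formula described in bonding.txt: the XOR of the last byte of the source and destination MAC and the packet type ID, taken modulo the slave count. The exact kernel formula differs between versions, so treat the results as illustrative.

```python
from __future__ import annotations

from itertools import cycle

SLAVES = ["eth0", "eth1", "eth2", "eth3"]  # the four bond members
ETH_P_IP = 0x0800  # packet type ID for IPv4


def layer2_slave(src_mac: str, dst_mac: str, slaves: list[str] = SLAVES,
                 ptype: int = ETH_P_IP) -> str:
    """Simplified model of xmit_hash_policy=layer2.

    hash = last byte of source MAC XOR last byte of destination MAC XOR
    packet type ID; slave index = hash modulo number of slaves. The result
    depends only on the MAC pair, so one session always uses the same NIC.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return slaves[(src_last ^ dst_last ^ ptype) % len(slaves)]


def balance_rr(n_packets: int, slaves: list[str] = SLAVES) -> list[str]:
    """balance-rr: consecutive packets go out on each slave in turn."""
    rr = cycle(slaves)
    return [next(rr) for _ in range(n_packets)]
```

For example, `layer2_slave("52:54:00:00:00:10", "52:54:00:00:00:21")` returns "eth1" on every call, while `balance_rr(8)` uses all four NICs twice. Traffic toward the server is spread by the switch's own port-channel hash, which is likewise per flow. bonding.txt notes that balance-rr generally needs the switch ports grouped statically rather than with LACP, and that it can deliver a TCP stream's packets out of order.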
>>
>>
>>
>> > Hi Lazar, thanks a lot.
>> > The problem I see here is that all my servers (clients and pgpool
>> > instances) are in the same subnet, so the router issue should not be a
>> > problem for me. (These are all VMs.) Any chance you could share your
>> > pgpool.conf so I can look at the differences?
>> > I would not have a problem sharing mine.
>> > Regards
>> >
>> >
>> >
>> > On Wed, May 11, 2016 at 11:14 AM, Lazar Krumov <
>> > lkrumov_subs at yahoo.co.uk> wrote:
>> >
>> > > Hi Ricardo,
>> > >
>> > > I replied to your post, but I'm afraid you haven't seen the reply.
>> > > You can see it in the mailing list archive; I'm also including the
>> > > reply below:
>> > >
>> > > http://www.sraoss.jp/pipermail/pgpool-general/2016-May/004722.html
>> > >
>> > > > Hi Ricardo,
>> > > >
>> > > > I'm experiencing a similar problem with PgPool failover.
>> > > > I am running 2 PgPool servers, version 3.5.2, with watchdog: Active and
>> > > > Standby PgPool instances sharing a virtual IP address (pgpool.conf:
>> > > > delegate_IP parameter).
>> > > > When a failover event occurs, the Standby server activates the virtual
>> > > > IP address. Clients in the same subnet see the new active PgPool with a
>> > > > very short delay. Clients in other IP network segments see the new
>> > > > active PgPool server only after more than 30 minutes!
>> > > >
>> > > > Investigating the problem, I found that the ARP table of the
>> > > > router-switch (an L3 Cisco switch) responsible for routing the IP
>> > > > network that contains the virtual PgPool IP address is not updated
>> > > > properly. The old ARP entry, pointing to the old active PgPool server,
>> > > > is kept for more than 30 minutes. If I replace the ARP entry in the
>> > > > router-switch manually, connections become possible immediately.
>> > > >
>> > > > I'm still investigating the problem. It looks like a network problem
>> > > > rather than a PgPool problem.
>> > > > Details that are probably important:
>> > > > - my virtual IP address and the primary interface of both PgPool
>> > > >   servers are in the same IP network segment
>> > > > - my servers run Debian GNU/Linux and I use a bond of 4 physical UTP
>> > > >   cards. The virtual IP is a second IP address on the bond interface.
>> > > > - the two PgPool servers are connected to different LAN switches. These
>> > > >   two switches are uplinked through the router-switch, the same
>> > > >   router-switch that causes the problem with the ARP table update.
>> > > >
>> > > > So you could check whether your problem is a network problem too, as
>> > > > mine seems to be.
>> > > >
>> > > > Best Regards,
>> > > > Lazar Krumov
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > >
>> > > > > Hello guys:
>> > > > > I am running pgpool 3.5.2 in a 2-node cluster with PostgreSQL 9.5 in
>> > > > > master-slave mode with streaming replication.
>> > > > >
>> > > > > I have been testing failover and failback for a while. When I trigger
>> > > > > failover by shutting down the processes, everything looks fine.
>> > > > >
>> > > > > The one test that fails is when I shut down the network interface of
>> > > > > one node. Right now I am shutting down the interface of the slave node
>> > > > > (both the pgpool slave and the PostgreSQL slave).
>> > > > >
>> > > > > The problem I run into is that after doing this, all connections to
>> > > > > my database (through pgpool) hang. I am testing with psql, and psql
>> > > > > just hangs without giving any output. When I bring the slave's
>> > > > > interface back up and connect with psql again, it looks like pgpool
>> > > > > never marked the PostgreSQL node as disconnected.
>> > > > >
>> > > > > I tried both with and without health check, and also with different
>> > > > > values of health_check_timeout. My connect timeout is the default
>> > > > > (10 seconds).
>> > > > >
>> > > > > Has anyone encountered this issue? I just don't see pgpool attempting
>> > > > > any failover. Pgpool is still running, though; I can see log messages
>> > > > > still coming in. I just never see an error.
>> > > > >
>> > > > > I am NOT using interface monitoring, and I would prefer not to use it.
>> > > > >
>> > > > > Any pointers on how I could troubleshoot this?
>> > > > > Thanks.
>> > > > > Regards
>> > > > >
>> > > > >
>> > >
>> >
>>
>
>
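
The original symptom is that the failed backend is never marked as down. Whatever the cause inside pgpool, any health check can only reach a "down" verdict if two things hold: each probe returns within a bounded time, and consecutive failures are counted. The generic Python model below illustrates that decision. Its parameter names are borrowed from pgpool's health_check_timeout, health_check_max_retries and health_check_retry_delay settings, but it is not pgpool's code, and its default values are arbitrary. The probe is passed in, so it can be wired to any check that honours a timeout, such as a TCP connect.

```python
import time
from typing import Callable


def backend_is_down(
    probe: Callable[[float], bool],
    timeout: float = 10.0,
    max_retries: int = 3,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if all 1 + max_retries probe attempts fail.

    `probe(timeout)` must itself return within about `timeout` seconds.
    A probe that can block forever means this function never returns,
    which looks exactly like a hang with no error in the logs.
    """
    for attempt in range(max_retries + 1):
        if probe(timeout):
            return False  # backend answered: not down
        if attempt < max_retries:
            sleep(retry_delay)  # wait before the next retry
    return True
```

In this model, the worst-case time to declare a backend down is roughly (max_retries + 1) × timeout + max_retries × retry_delay. A probe without a timeout stretches that bound to infinity.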