[pgpool-general: 2577] Re: pgpoll failure

Thu Feb 13 19:47:23 JST 2014

Hi,

On Wed, 12 Feb 2014 12:05:56 -0200
Gonzalo Gil <gonxalo2000 at gmail.com> wrote:

> i think it does not work...

I'm sorry for jumping to a wrong conclusion. load_balance_mode is irrelevant.
The problem is that pgpool-II considers itself down before failover has
completed. Until failover completes, pgpool-II's child process doesn't
know the backend server is down, so the lifecheck query 'SELECT 1' fails and
pgpool-II considers itself to be in down status.

To avoid this, the health check should be done more frequently, or the lifecheck
interval should be larger. In your configuration, health_check_max_retries = 3
and health_check_retry_delay = 10, so it takes more than 30 seconds to detect
that the backend DB is down and start failover. However, wd_interval = 5 and
wd_life_point = 3, so it is only about 15 to 20 seconds before pgpool-II decides
to go to down status.

Could you please try editing pgpool.conf? For example:

health_check_max_retries = 2
health_check_retry_delay = 5
wd_interval = 10
wd_life_point = 3
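
The timing argument above can be checked with a small sketch. It only multiplies the four settings named in this mail; the function names are made up for illustration, and the results are rough lower bounds (connection timeouts and the health check period add to them), not what pgpool-II computes internally.

```python
def health_check_detect_seconds(max_retries, retry_delay):
    # Rough lower bound on the time before a backend is declared down:
    # each retry waits retry_delay seconds. Timeouts come on top of this.
    return max_retries * retry_delay


def watchdog_down_seconds(wd_interval, wd_life_point):
    # Rough time before pgpool-II gives up on itself in query mode:
    # wd_life_point failed lifechecks, spaced wd_interval seconds apart.
    return wd_interval * wd_life_point


def failover_starts_first(max_retries, retry_delay, wd_interval, wd_life_point):
    # True when the backend failure should be detected (and failover started)
    # before the watchdog puts pgpool-II itself into down status.
    detect = health_check_detect_seconds(max_retries, retry_delay)
    give_up = watchdog_down_seconds(wd_interval, wd_life_point)
    return detect < give_up


# Values from the configuration discussed in this thread.
original = failover_starts_first(3, 10, 5, 3)   # 30 s detection vs 15 s watchdog
suggested = failover_starts_first(2, 5, 10, 3)  # 10 s detection vs 30 s watchdog

print("original config safe:", original)
print("suggested config safe:", suggested)
```

With the original values the watchdog gives up first; with the suggested values the health check should finish well before the watchdog does.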

In fact, I recommend using heartbeat mode instead of query mode. This mode
doesn't issue a query like 'SELECT 1' to check pgpool status, so it avoids
this kind of problem.
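
As a rough sketch only, heartbeat mode might be enabled in pgpool.conf along these lines, assuming pgpool-II 3.3 or later. The peer address is taken from the status URLs quoted in this thread and would be swapped on the other node; the port and timing values are illustrative assumptions, not tuned recommendations.

```
# Use heartbeat packets instead of the 'SELECT 1' lifecheck query
wd_lifecheck_method = 'heartbeat'

# Port on which this pgpool-II receives heartbeat signals (assumed value)
wd_heartbeat_port = 9694

# Seconds between heartbeat signals, and seconds of silence before the
# peer is regarded as down (assumed values)
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30

# The other pgpool-II node to send heartbeats to (here, as seen from 172.16.62.141)
heartbeat_destination0 = '172.16.62.142'
heartbeat_destination_port0 = 9694
```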

> 
> 
> http://172.16.62.141/status.php
>          IP Address   Port   Status                                                   Weight
>
> node 0   tad1         5432   Up. Connected. Running as primary server  postgres: Up   0.500
> node 1   tad2         5432   Up. Connected. Running as standby server  postgres: Up   0.500
> 
> http://172.16.62.142/status.php
>          IP Address   Port   Status                                                   Weight
>
> node 0   tad1         5432   Up. Connected. Running as primary server  postgres: Up   0.500
> node 1   tad2         5432   Up. Connected. Running as standby server  postgres: Up   0.500
> 
> shutdown  141, node0, tad1...
> 
> 
> 
> i attach logs....
> 
> 
> this was the final result....
> --->
>          IP Address   Port   Status                                                   Weight
>
> node 0   tad1         5432   Down  postgres: Down                                     0.500
> node 1   tad2         5432   Up. Connected. Running as standby server  postgres: Up   0.500
> <---
> 
> 
> 
> On Wed, Feb 12, 2014 at 4:11 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> 
> > Hi,
> >
> > Thanks for sending confs & logs.
> >
> > I found that this problem occurs when load_balance_mode = off.
> > Could you please try with load_balance_mode = on?
> >
> > I'll continue to analyze the detailed reason.
> >
> > On Mon, 10 Feb 2014 11:40:41 -0200
> > Gonzalo Gil <gonxalo2000 at gmail.com> wrote:
> >
> > > i send the message but it was too long.
> > > i'll attach the files....
> > >
> > > it happens again, even when node 2 was the postgres standby node.
> > >
> > > after i put the logs here, i shutdown node 1 (it has the primary database)
> > > and it happens the same thing. node 2 lost ip and no failover happens.
> > >
> > >
> > > TKS!
> > >
> > >
> > >
> > >
> > > On Mon, Feb 10, 2014 at 5:23 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
> > >
> > > > Hi,
> > > >
> > > > It is odd that pgpool-1 loses the VIP when server2 goes down. For analysis,
> > > > could you please send pgpool.conf and log output (of both pgpool1 and
> > > > pgpool2)?
> > > >
> > > > On Tue, 4 Feb 2014 13:38:16 -0200
> > > > Gonzalo Gil <gonxalo2000 at gmail.com> wrote:
> > > >
> > > > > Hello Tatsuo Ishii. I sent some query mails to
> > > > > pgpool-general at pgpool.net but i don't get my own messages. But i do
> > > > > receive other mails from the
> > > > > forum.
> > > > >
> > > > > Can you answer me some questions or forward them to the forum!?
> > > > >
> > > > >
> > > > > I'm running pgpool with streaming replication: pgpool1 - db postgres1
> > > > > (server 1) and pgpool2 - db postgres 2 (server 2).
> > > > > I'm using watchdog with a virtual ip and life_check_query.
> > > > >
> > > > > It's all configured and working .... more or less....
> > > > >
> > > > > INIT: I start my system: postgres1 is standby database and postgres2
> > > > > is master (streaming replication).
> > > > > pgpool1 has the virtual ip.(and pgpool2 no, obviously)
> > > > >
> > > > > i connect to database via pgpool and everything is ok.
> > > > > i stop postgres1 and nothing happens because i check new_master <>
> > > > > old_master (no master failure).
> > > > > i start postgres1 again (and returning it with pgpoolAdmin) or call a
> > > > > recovery and it works great.
> > > > >
> > > > > I stop postgres2 and failover fires ... and i get postgres1 as the new
> > > > > primary.
> > > > > and so on...
> > > > >
> > > > > this works fine.
> > > > >
> > > > >
> > > > > i go back to INIT again....
> > > > > and i do in server2
> > > > > reboot -h now
> > > > >
> > > > > i see in the server1 (pgpool1) log that pgpool2 is down...ok
> > > > > watching the log, i see pgpool1 lost the virtual ip address (!?)....and
> > > > > tell me to restart pgpool....(!?)
> > > > >
> > > > > i restart it and i see that failover fires ... but in the failover script i
> > > > > get new_master_node = old_master_node ...and thus i do not make touch and
> > > > > postgres1 keeps as a standby...
> > > > >
> > > > >
> > > > > I change failover.sh (and the command in the pgpool.conf). i include all
> > > > > parameters to see their values when failover.sh starts....
> > > > >
> > > > > Then, i restart server2 and "return" database to pgpool....
> > > > >
> > > > > again, pgpool1 has the virtual ip.
> > > > > i stop database in node 2 and failover fires.... but pgpool2 does it....and
> > > > > pgpool1 too (!?)
> > > > > i check network activity and saw that pgpool2 connects to server1 and make
> > > > > the touch and i did see log from pgpool1 firing the failover command too....
> > > > >
> > > > >
> > > > >
> > > > > Questions....
> > > > > 1. why pgpool1 lost virtual ip and ask me to restart!?
> > > > > 2. why pgpool2 fires failover? i thought just the "primary" pgpool (the one
> > > > > with the virtual ip) fires it.
> > > > >
> > > > >
> > > > > i hope you understand mr.
> > > > > tks a lot for your time..
> > > > > sorry for my english.
> > > >
> > > >
> > > > --
> > > > Yugo Nagata <nagata at sraoss.co.jp>
> > > >
> >
> >
> > --
> > Yugo Nagata <nagata at sraoss.co.jp>
> >

-- 
Yugo Nagata <nagata at sraoss.co.jp>