pcp_attach_node command. However, it should be confirmed that PostgreSQL
#0 actually is the primary and #1 the standby.
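
A quick way to confirm that is pg_is_in_recovery(), which returns false on a
primary and true on a standby (a minimal sketch; host names and ports are
taken from the pcp_node_info output quoted below):

  psql -h prod    -p 5432 -c "SELECT pg_is_in_recovery();"   # expected: f (primary)
  psql -h replica -p 5432 -c "SELECT pg_is_in_recovery();"   # expected: t (standby)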

>
> it is not. i can connect directly (psql) or via pgpool (psql -h virtual_ip)
>
> I can't recover the node either.
>
> # pcp_recovery_node 10 localhost 9898 postgres postgres 2
> BackendError

I think the reason is that pgpool regards backend #0 as down,
as shown by pcp_node_info. Some clues to the failure should be
left in the backend's log (#0 or #1), since the recovery command is
executed on the backend server by postgres.
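
For reference, the third column in the pcp_node_info output quoted below is
the backend status code: 0 = initializing, 1 = up (no connections yet),
2 = up (connected), 3 = down. Once a backend is confirmed to be running and
in sync, it can be re-attached with the same argument style used for the
other pcp commands in this thread (a sketch; the node id is only an example):

  pcp_attach_node 10 localhost 9898 postgres postgres 0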

>
>
> in pgpool.log:
>
> Mar 11 11:39:30 postgresql pgpool[14723]: starting recovering node 2
> Mar 11 11:39:30 postgresql pgpool[14723]: starting recovery command:
>   "SELECT pgpool_recovery('basebackup.sh', 'replica', '/var/lib/pgsql/9.0/datastb')"
> Mar 11 11:39:30 postgresql pgpool[14723]: exec_recovery: basebackup.sh command failed at 1st stage
>
>
> in the standby node,
>
> # pcp_node_info 10 localhost 9898 postgres postgres 0
> prod 5432 *1* 1.000000
> # pcp_node_info 10 localhost 9898 postgres postgres 1
> replica 5432 *1* 0.000000
> # pcp_node_info 10 localhost 9898 postgres postgres 2
> replica 5444 3 0.000000
>
> # pcp_recovery_node 10 localhost 9898 postgres postgres 2
>
> and the recovery starts ....
> but... the database is quite big, so i see
>         a new structure in the standby server /datastb and subdirectories
>         an rsync command in the master node...
>
> any clue why the pcp commands are not working in the master node!!?
>
>
>
> tks!
>
>
>
>
> On Wed, Feb 19, 2014 at 7:50 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>
> > On Fri, 14 Feb 2014 18:39:10 -0200
> > Gonzalo Gil <gonxalo2000 at gmail.com> wrote:
> >
> > > I changed it and it works fine.
> > > just one more thing.
> > > when one node goes down (the pgpool process, not the database), it takes
> > > the other one a minute and a half to make itself primary...
> > > i changed some parameters but it still takes 1.5 minutes to set the
> > > master pgpool node
> >
> > It is possible depending on parameter configuration, but I can't identify
> > the cause.
> > Could you please send your pgpool.conf and logs?
> >
> > >
> > >
> > > is it possible?
> > > how so?
> > >
> > > thanks again
> > >
> > >
> > > On Thu, Feb 13, 2014 at 11:26 AM, Gonzalo Gil <gonxalo2000 at gmail.com>
> > wrote:
> > >
> > > > Great!
> > > >
> > > >
> > > > On Thu, Feb 13, 2014 at 12:23 PM, Yugo Nagata <nagata at sraoss.co.jp>
> > wrote:
> > > >
> > > >> On Thu, 13 Feb 2014 11:59:20 -0200
> > > >> Gonzalo Gil <gonxalo2000 at gmail.com> wrote:
> > > >>
> > > >> > YES! it works!
> > > >>
> > > >> I'm glad to hear that.
> > > >>
> > > >> >
> > > >> > i will install heartbeat.... but i'm testing the installation and i
> > > >> > take the easy way...
> > > >> > i'll let you know when i get it running
> > > >>
> > > >> You don't need to install heartbeat (Pacemaker). Watchdog's heartbeat
> > > >> mode is pgpool-II's built-in functionality. For the simplest
> > > >> configuration, what you need to do is:
> > > >>
> > > >> wd_lifecheck_method = 'heartbeat'
> > > >>
> > > >> wd_heartbeat_port = 9694
> > > >> wd_heartbeat_keepalive = 2
> > > >> wd_heartbeat_deadtime = 30
> > > >>
> > > >> heartbeat_destination0 = 'tad2'        <= 'tad1' in the tad2 server
> > > >> heartbeat_destination_port0 = 9694
> > > >> heartbeat_device0 = ''
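> > > >>
> > > >> A minimal sketch of the matching lines in pgpool.conf on the tad2 side
> > > >> (only the destination host changes, as noted above; the rest stays the
> > > >> same):
> > > >>
> > > >> heartbeat_destination0 = 'tad1'
> > > >> heartbeat_destination_port0 = 9694
> > > >> heartbeat_device0 = ''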
> > > >>
> > > >>
> > > >> >
> > > >> > tks a lot!!
> > > >> >
> > > >> >
> > > >> > On Thu, Feb 13, 2014 at 8:47 AM, Yugo Nagata <nagata at sraoss.co.jp>
> > > >> wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > On Wed, 12 Feb 2014 12:05:56 -0200
> > > >> > > Gonzalo Gil <gonxalo2000 at gmail.com> wrote:
> > > >> > >
> > > >> > > > i think it does not work...
> > > >> > >
> > > >> > > I'm sorry for jumping to a wrong conclusion. load_balance_mode is
> > > >> > > irrelevant.
> > > >> > > The problem is that pgpool-II considers itself down before failover
> > > >> > > is done completely. Before failover has completed, pgpool-II's child
> > > >> > > process doesn't know the backend server is down, hence the lifecheck
> > > >> > > query 'SELECT 1' fails, and pgpool-II considers itself to be in down
> > > >> > > status.
> > > >> > >
> > > >> > > To avoid this, the health check should be done more frequently, or
> > > >> > > the lifecheck interval should be larger. In your configuration,
> > > >> > > health_check_max_retries = 3 and health_check_retry_delay = 10, so it
> > > >> > > takes more than 30 seconds to detect that the backend DB is down and
> > > >> > > to start failover. However, wd_interval = 5 and wd_life_point = 3, so
> > > >> > > it is about 15 to 20 seconds before pgpool-II decides to go to down
> > > >> > > status.
> > > >> > >
> > > >> > > Could you please try editing pgpool.conf? For example:
> > > >> > >
> > > >> > > health_check_max_retries = 2
> > > >> > > health_check_retry_delay = 5
> > > >> > > wd_interval = 10
> > > >> > > wd_life_point = 3
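> > > >> > >
> > > >> > > Roughly, with those values the watchdog's window becomes
> > > >> > > wd_interval * wd_life_point = 10 * 3 = 30 seconds, while detecting
> > > >> > > the dead backend takes only about health_check_max_retries *
> > > >> > > health_check_retry_delay = 2 * 5 = 10 seconds (plus
> > > >> > > health_check_period), so failover should finish before the watchdog
> > > >> > > declares pgpool-II itself down.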
> > > >> > >
> > > >> > > In fact, I recommend using heartbeat mode instead of query mode.
> > > >> > > That mode doesn't issue a query like 'SELECT 1' to check the pgpool
> > > >> > > status, so it avoids this kind of problem.
> > > >> > >
> > > >> > > >
> > > >> > > >
> > > >> > > > http://172.16.62.141/status.php
> > > >> > > >           IP Address   Port   Status                                      Weight
> > > >> > > > node 0    tad1         5432   Up. Connected. Running as primary server.   postgres: Up   0.500 |
> > > >> > > > node 1    tad2         5432   Up. Connected. Running as standby server.   postgres: Up   0.500 |
> > > >> > > >
> > > >> > > > http://172.16.62.142/status.php
> > > >> > > >           IP Address   Port   Status                                      Weight
> > > >> > > > node 0    tad1         5432   Up. Connected. Running as primary server.   postgres: Up   0.500 |
> > > >> > > > node 1    tad2         5432   Up. Connected. Running as standby server.   postgres: Up   0.500 |
> > > >> > > >
> > > >> > > > shutdown 141, node0, tad1...
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > i attach logs....
> > > >> > > >
> > > >> > > >
> > > >> > > > this was the final result....
> > > >> > > > --->
> > > >> > > >           IP Address   Port   Status                                      Weight
> > > >> > > > node 0    tad1         5432   Down.                                       postgres: Down   0.500 |
> > > >> > > > node 1    tad2         5432   Up. Connected. Running as standby server.   postgres: Up     0.500 |
> > > >> > > > <---
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, Feb 12, 2014 at 4:11 AM, Yugo Nagata <nagata at sraoss.co.jp>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > Hi,
> > > >> > > > >
> > > >> > > > > Thanks for sending confs & logs.
> > > >> > > > >
> > > >> > > > > I found that this problem occurs when load_balance_mode = off.
> > > >> > > > > Could you please try with load_balance_mode = on?
> > > >> > > > >
> > > >> > > > > I'll continue to analyze the detailed reason.
> > > >> > > > >
> > > >> > > > > On Mon, 10 Feb 2014 11:40:41 -0200
> > > >> > > > > Gonzalo Gil <gonxalo2000 at gmail.com> wrote:
> > > >> > > > >
> > > >> > > > > > i sent the message but it was too long.
> > > >> > > > > > i'll attach the files....
> > > >> > > > > >
> > > >> > > > > > it happens again, even when node 2 was the postgres standby node.
> > > >> > > > > >
> > > >> > > > > > after i put the logs here, i shut down node 1 (it has the primary
> > > >> > > > > > database) and the same thing happens: node 2 loses the ip and no
> > > >> > > > > > failover happens.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > TKS!
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Mon, Feb 10, 2014 at 5:23 AM, Yugo Nagata <nagata at sraoss.co.jp>
> > > >> > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi,
> > > >> > > > > > >
> > > >> > > > > > > It is odd that pgpool1 loses the VIP when server2 goes down.
> > > >> > > > > > > For analysis, could you please send pgpool.conf and the log
> > > >> > > > > > > output (of both pgpool1 and pgpool2)?
> > > >> > > > > > >
> > > >> > > > > > > On Tue, 4 Feb 2014 13:38:16 -0200
> > > >> > > > > > > Gonzalo Gil <gonxalo2000 at gmail.com> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hello Tatsuo Ishii. I sent some query mails to
> > > >> > > > > > > > pgpool-general at pgpool.net but i don't get my own messages.
> > > >> > > > > > > > But i do receive other mails from the forum.
> > > >> > > > > > > >
> > > >> > > > > > > > Can you answer some questions or forward them to the forum!?
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > I'm running pgpool with streaming replication: pgpool1 - db
> > > >> > > > > > > > postgres1 (server 1) and pgpool2 - db postgres2 (server 2).
> > > >> > > > > > > > I'm using watchdog with a virtual ip and life_check_query.
> > > >> > > > > > > >
> > > >> > > > > > > > It's all configured and working .... more or less....
> > > >> > > > > > > >
> > > >> > > > > > > > INIT: I start my system: postgres1 is the standby database and
> > > >> > > > > > > > postgres2 is the master (streaming replication).
> > > >> > > > > > > > pgpool1 has the virtual ip (and pgpool2 doesn't, obviously).
> > > >> > > > > > > >
> > > >> > > > > > > > i connect to the database via pgpool and everything is ok.
> > > >> > > > > > > > i stop postgres1 and nothing happens because i check new_master <>
> > > >> > > > > > > > old_master (no master failure).
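> > > >> > > > > > > > (a sketch of that check, roughly; the real failover.sh is not
> > > >> > > > > > > > shown in this thread, so the argument order and the trigger-file
> > > >> > > > > > > > path below are only examples)
> > > >> > > > > > > >
> > > >> > > > > > > > #!/bin/sh
> > > >> > > > > > > > # failover.sh failed_node_id old_master_id new_master_id new_master_host
> > > >> > > > > > > > failed_node=$1; old_master=$2; new_master=$3; new_master_host=$4
> > > >> > > > > > > > if [ "$new_master" = "$old_master" ]; then
> > > >> > > > > > > >     # a standby failed and the primary is still there: nothing to promote
> > > >> > > > > > > >     exit 0
> > > >> > > > > > > > fi
> > > >> > > > > > > > # otherwise promote the new primary by creating its trigger file ("the touch")
> > > >> > > > > > > > ssh postgres@"$new_master_host" touch /var/lib/pgsql/9.0/trigger
> > > >> > > > > > > >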
> > > >> > > > > > > > i start postgres1 again (returning it with pgpoolAdmin) or call a
> > > >> > > > > > > > recovery, and it works great.
> > > >> > > > > > > >
> > > >> > > > > > > > I stop postgres2 and failover fires ... and i get postgres1 as the
> > > >> > > > > > > > new primary.
> > > >> > > > > > > > and so on...
> > > >> > > > > > > >
> > > >> > > > > > > > this works fine.
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > i go back to INIT again....
> > > >> > > > > > > > and in server2 i do
> > > >> > > > > > > > reboot -h now
> > > >> > > > > > > >
> > > >> > > > > > > > i see in the server1 (pgpool1) log that pgpool2 is down... ok
> > > >> > > > > > > > watching the log, i see pgpool1 lost the virtual ip address
> > > >> > > > > > > > (!?)....and it tells me to restart pgpool....(!?)
> > > >> > > > > > > >
> > > >> > > > > > > > i restart it and i see that failover fires ... but in the failover
> > > >> > > > > > > > script i get new_master_node = old_master_node ...and thus i do not
> > > >> > > > > > > > make the touch and postgres1 stays as a standby...
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > I change failover.sh (and the command in pgpool.conf). i include
> > > >> > > > > > > > all parameters to see their values when failover.sh starts....
> > > >> > > > > > > >
> > > >> > > > > > > > Then, i restart server2 and "return" the database to pgpool....
> > > >> > > > > > > >
> > > >> > > > > > > > again, pgpool1 has the virtual ip.
> > > >> > > > > > > > i stop the database in node 2 and failover fires.... but pgpool2
> > > >> > > > > > > > does it....and pgpool1 too (!?)
> > > >> > > > > > > > i check network activity and see that pgpool2 connects to server1
> > > >> > > > > > > > and makes the touch, and i did see the log from pgpool1 firing the
> > > >> > > > > > > > failover command too....
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > Questions....
> > > >> > > > > > > > 1. why did pgpool1 lose the virtual ip and ask me to restart!?
> > > >> > > > > > > > 2. why does pgpool2 fire failover? i thought just the "primary"
> > > >> > > > > > > > pgpool (the one with the virtual ip) fires it.
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > i hope you understand me.
> > > >> > > > > > > > tks a lot for your time..
> > > >> > > > > > > > sorry for my english.
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > --
> > > >> > > > > > > Yugo Nagata <nagata at sraoss.co.jp>
> > > >> > > > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Yugo Nagata <nagata at sraoss.co.jp>
> > > >> > > > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Yugo Nagata <nagata at sraoss.co.jp>
> > > >> > >
> > > >>
> > > >>
> > > >> --
> > > >> Yugo Nagata <nagata at sraoss.co.jp>
> > > >>
> > > >
> > > >
> >
> >
> > --
> > Yugo Nagata <nagata at sraoss.co.jp>
> >

--
Yugo Nagata <nagata at sraoss.co.jp>