[pgpool-general: 1393] Re: pgpool using an out of sync database ?

Gilbert Soucy gsoucy at 36pix.com
Fri Feb 15 12:38:41 JST 2013


Hi

Thanks for the reply.

In my tests, it seems that as long as one of the pgpool instances stays up,
the state of a database (in sync or not) is correctly preserved. When both
go down, the out-of-sync state of the databases can be lost.

From your description, it may be technically correct for pgpool to use
postgres2 (which is out of sync), but I was not expecting this behavior. I
think that a database that is marked out of sync should remain marked out
of sync until a human explicitly resyncs it. If a database previously
marked out of sync becomes the new master just because of a specific
sequence of start-up events, I think that is risky.

I suppose that it is up to us to implement a failover command
(failover_command in pgpool.conf) that forbids a database marked as out of
sync from starting again until we can sync it properly and explicitly (e.g.
from backup and transaction log). I will probably do something like that.
Luckily, the sequence of events needed to reproduce this problem is
unlikely, but it is still certainly possible.
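
A minimal sketch of what such a guard might look like, assuming pgpool
passes the failed node id and hostname through the %d and %h placeholders
of failover_command. The script name, the state directory and the helper
names are hypothetical, and the sketch assumes the state directory is
readable from whatever host runs the PostgreSQL start-up wrapper:

```python
#!/usr/bin/env python3
"""Sketch: persist an 'out of sync' marker for a node detached by pgpool."""
import sys
import time
from pathlib import Path

# Hypothetical location; it must survive restarts of pgpool and PostgreSQL.
STATE_DIR = Path("/var/lib/pgpool-guard")


def marker_path(node_id, state_dir=STATE_DIR):
    """Return the marker file path for a given node id."""
    return Path(state_dir) / f"node_{node_id}.out_of_sync"


def mark_out_of_sync(node_id, hostname, state_dir=STATE_DIR):
    """Called from failover_command: record that this node was detached."""
    path = marker_path(node_id, state_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    path.write_text(f"{hostname} detached at {stamp}\n")
    return path


def may_start(node_id, state_dir=STATE_DIR):
    """Called by a start-up wrapper: a marked node must not be started."""
    return not marker_path(node_id, state_dir).exists()


def clear_marker(node_id, state_dir=STATE_DIR):
    """Run by a human, only after the node has been resynced explicitly."""
    marker_path(node_id, state_dir).unlink(missing_ok=True)


def main(argv, state_dir=STATE_DIR):
    """Entry point for: mark_out_of_sync.py NODE_ID HOSTNAME"""
    if len(argv) != 3:
        print("usage: mark_out_of_sync.py NODE_ID HOSTNAME", file=sys.stderr)
        return 2
    mark_out_of_sync(argv[1], argv[2], state_dir)
    return 0
```

With something like failover_command = '/usr/local/bin/mark_out_of_sync.py %d %h'
(path hypothetical) and a final sys.exit(main(sys.argv)) in the script, the
marker would outlive a restart of both pgpool instances. The start-up
wrapper would call may_start() before launching postgres, and only an
operator would call clear_marker() after a resync.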

Thanks

Gilbert

On Thu, Feb 14, 2013 at 9:47 PM, Nozomi Anzai <anzai at sraoss.co.jp> wrote:

> Hi,
>
> In conclusion, I think pgpool worked correctly.
>
> > Hello,
> >
> > I attach the logs for pgpool on server1 and server2.
> >
> > The sequence is as described in my first email. Server 2 was made out of
> > sync (simply by stopping postgres on server2):
> >
> > [root at daphne-d pg_log]# psql -p 5431 -h 192.168.0.109 -c "show pool_nodes;" -U postgres
> >  node_id |   hostname    | port | status | lb_weight |  role
> > ---------+---------------+------+--------+-----------+--------
> >  0       | 192.168.0.102 | 5432 | 2      | 0.500000  | master
> >  1       | 192.168.0.103 | 5432 | 3      | 0.500000  | slave
> > (2 rows)
> >
> > and, then:
> >
> > - stop pgpool1 and pgpool2
> > - stop postgres1,
> > - start postgres2
> > - start pgpool2
>
> At this time, pgpool tried to connect to postgres1 and failed, so failover
> was executed. pgpool marked postgres1 as down and decided that postgres2
> (node 1) was the new master.
> If pgpool2 had been started with both postgres1 and postgres2 running,
> failover would not have occurred.
>
> ----
> 2013-02-13 22:07:45 ERROR: pid 18382: connect_inet_domain_socket:
> connect() failed: Connection refused
> 2013-02-13 22:07:45 ERROR: pid 18382: connection to 192.168.0.102(5432)
> failed
> (snip)
> 2013-02-13 22:07:45 LOG:   pid 18341: Restart all children
> 2013-02-13 22:07:45 LOG:   pid 18341: failover: set new primary node: -1
> 2013-02-13 22:07:45 LOG:   pid 18341: failover: set new master node: 1
> 2013-02-13 22:07:45 LOG:   pid 18341: failover done. shutdown host
> 192.168.0.102(5432)
> ----
>
> The reason postgres2, which is in fact out of sync, can become the master
> node is as follows:
> pgpool does not know whether postgres2 is in sync when pgpool has just
> started.
> It can only detect differences among nodes from the results of queries
> (whether they succeeded or failed, and whether the row counts match).
>
>
> And,
>
> > - stop pgpool2
> > - start pgpool2
>
> At this time, failover was executed again, for the same reason.
>
> >
> > and, after those events, the situation was reversed:
> >
> > [root at daphne-d pg_log]# psql -p 5431 -h 192.168.0.109 -c "show pool_nodes;" -U postgres
> >  node_id |   hostname    | port | status | lb_weight |  role
> > ---------+---------------+------+--------+-----------+--------
> >  0       | 192.168.0.102 | 5432 | 3      | 0.500000  | slave
> >  1       | 192.168.0.103 | 5432 | 2      | 0.500000  | master
> > (2 rows)
> >
> > Server2 was out of sync, and merely stopping postgres on server1 and
> > restarting pgpool on server2 created that situation. Server2 was never
> > synchronized explicitly, but at the end of this sequence it is the only
> > live node.
> >
> > Thanks
> >
> > Gilbert
> >
> >
> >
> > On Wed, Feb 13, 2013 at 7:07 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >
> > > > Hello,
> > > >
> > > > I am still testing pgpool before deploying with our production database.
> > > >
> > > > When one of the databases goes down, my understanding is that it should
> > > > be marked as down by pgpool until we resync it manually. However, in the
> > > > following (admittedly paranoid) scenario, I think that it is not.
> > > >
> > > > Here is the case. We have 2 identical computers, both running pgpool
> > > > and postgres. Here is the sequence of events:
> > > >
> > > > - we have DB1 and DB2 running fine, perfectly in sync (replication mode)
> > > > - DB2 goes down for some reason and comes back a little later, but late
> > > >   enough to be marked as down by pgpool
> > > > - we run like that for a little while (DB2 now gets seriously out of sync)
> > > > - now, just as a test, do the following:
> > > >     - stop everything (pgpool1, pgpool2, DB1, DB2)
> > > >     - start DB2 and then pgpool2
> > > >     - everything is good, pgpool refuses to use DB2 (which is out of sync)
> > > >     - however, stop and start pgpool2 again (with DB1 down and DB2 up) and
> > > >       now pgpool happily starts using DB2 (while it was marked out of sync)
> > >
> > > > Is that normal/desired behavior? I would expect that DB2 should not be
> > > > used until I run a sync manually or I tell pgpool to forget about the
> > > > previous down status.
> > >
> > > Pgpool uses DB2 while it is marked out of sync? (Or it *was* out of
> > > sync, but now it is in sync?) This should not happen unless you are in
> > > raw mode (that means both replication_mode = off and master_slave_mode
> > > = off).
> > >
> > > Also, can you please show me the pgpool log?
> > > --
> > > Tatsuo Ishii
> > > SRA OSS, Inc. Japan
> > > English: http://www.sraoss.co.jp/index_en.php
> > > Japanese: http://www.sraoss.co.jp
> > >
>
>
> --
> Nozomi Anzai
> SRA OSS, Inc. Japan