[Pgpool-general] PgPool install, Part Deux

Tue Jul 14 07:26:39 UTC 2009

> I can imagine how tricky it is.. Two options:
> 1 - pgpool should 'read' the definition of every table before sending a
> query to backends, keep a 'pseudo-catalog' in memory that describes every
> table and then 'filter' queries making proper substitutions of default
> dates, etc.. Only add to its catalog the definition of queried tables, as a
> cache.

Actually we already do this (read system catalog and cache it) for
judging if the target table has a SERIAL column and needs to issue
LOCK TABLE. See need_insert_lock defined in pool_process_query.c.

> 2 - The administrator would somehow indicate to pgpool which tables are
> 'special' and need to be 'filtered'. Something to the paralell mode stuff..
> 
> Yes, another question is the impact on pgpool's performance...:( So, the 3rd
> option could be just to filter queries where there are 'insert into t1
> values (1, now());', pgpool to generate those dates and just not to support
> tables with 'current_timestamp' in its definition.. It would already reduce
> the list of restrictions...
> 
> 
> 2009/7/14 Tatsuo Ishii <ishii at sraoss.co.jp>
> 
> > Yes, once I thought about the same idea...
> >
> > The major problem is the case now() is used as the default:
> >
> > CREATE TABLE t1(i TIMESTAMP DEFAULT CURRENT_TIMESTAMP, j INTEGER);
> > INSERT INTO t1(j) VALUES(1);
> >
> > In this case we need to rewrite the INSERT to:
> >
> > INSERT INTO t1(i,j) VALUES('2009-07-13 12:02:49.521097+04', 1);
> >
> > This is a little bit tricky...
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> > > Of course we dont..
> > > And, what about intercepting the now() functions and make those
> > timestamps
> > > to be generated by pgpool? It is, substitute this query:
> > >   insert into insert into test values (1, now());
> > > by:
> > >   insert into test values (1, '2009-07-13 12:02:49.521097+04');
> > >
> > > 2009/7/13 Tatsuo Ishii <ishii at sraoss.co.jp>
> > >
> > > > > > Basically, with your example, it is easy to see how now() could be
> > a
> > > > > > problem, as if there is a slight delay on backend2 over backend1,
> > the
> > > > dates
> > > > > > could / would drift some ...
> > > > >
> > > > >
> > > > > I tested that situation with an small table many days ago.. Sort of:
> > > > >
> > > > >   create table test (a integer, b timestamp);
> > > > >   insert into test values (1, now());
> > > > >
> > > > > when i stated "select b from test;" in every node, i got indeed
> > slightly
> > > > > different values, in thousandths of seconds..
> > > > > Even more, when I executed the select through pgpool, it returned
> > > > different
> > > > > values on different sessions.. But pgpool anyway continued to work
> > > > > normally.. I thought that it would detect the data mismatch and
> > degrade
> > > > one
> > > > > of the nodes...
> > > >
> > > > Actually long time ago once we did it (checks the actually returned
> > > > rows). What we found was, it's terribly slow. Do we want to do it
> > > > again?
> > > > --
> > > > Tatsuo Ishii
> > > > SRA OSS, Inc. Japan