[Pgpool-hackers] GSoC 2011 - "Caching query results in pgpool-II" specification

Masanori Yamazaki m.yamazaki23 at gmail.com
Wed Jun 1 04:03:27 UTC 2011


> > Hello! I am a GSoC 2011 student.
> > My GSoC project is "Caching query results in pgpool-II",
> > and I am sending a specification for it.
> > I would appreciate any feedback on it.
> >
> > ---
> > ■ External specification (for users)
> >
> > [synopsis]
> >
> > The on-memory query cache speeds up database access by reusing
> > SELECT results.
> >
> > - The cache can be stored in either shared memory or memcached (not both).
> >
> > - The result of a SELECT using non-IMMUTABLE functions is not cached.
> >
> > - If the size of a SELECT result is larger than the value specified in
> > pgpool.conf, it is not cached.
> >
> > - A cache entry is removed if one of the following conditions is met:
> >
> >   - The cache entry's expiration time has passed.
> >   - The cache is full; the least recently used entry is evicted.
> >   - A table is dropped or its data modified; related cache entries are deleted.
> >   - A table definition is modified (ALTER TABLE); related cache entries are deleted.
> >   - A schema or database is dropped; related cache entries are deleted.
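The removal rules above can be sketched as a small LRU store with per-entry expiry. This is a toy sketch in Python under my own assumptions; the names are hypothetical and this is not pgpool-II code:

```python
import time
from collections import OrderedDict

class QueryCache:
    """Toy LRU cache with per-entry expiry (hypothetical, not pgpool-II code)."""

    def __init__(self, max_entries, expire):
        self.max_entries = max_entries
        self.expire = expire            # entry lifetime in seconds
        self.entries = OrderedDict()    # query key -> (stored_at, result)

    def put(self, key, result):
        if key in self.entries:
            del self.entries[key]
        elif len(self.entries) >= self.max_entries:
            # Cache is full: evict the least recently used entry.
            self.entries.popitem(last=False)
        self.entries[key] = (time.time(), result)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        stored_at, result = item
        if time.time() - stored_at > self.expire:
            del self.entries[key]       # expired entry is dropped on access
            return None
        self.entries.move_to_end(key)   # mark as most recently used
        return result
```

Table/schema/database invalidation (the last three bullets) needs extra bookkeeping on top of this, as discussed below.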
>
> How do you track DDL changes? A table being dropped or altered, or a
> database being dropped, for instance?
>

I am going to prepare a structure for tables and a structure for cache
data.
The structure for tables holds the table OID from RowDescription (table
OIDs are unique within a DB cluster) and a cache_id. The cache itself is
stored in memory blocks, and each cache entry is related to a cache_id in
the structure for tables.
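The bookkeeping described above might look roughly like this. This is only a sketch; the class and field names are hypothetical, not the actual pgpool-II structures:

```python
# Hypothetical sketch of the table-OID -> cache_id mapping described above.
class CacheRegistry:
    def __init__(self):
        self.by_table = {}   # table OID (from RowDescription) -> set of cache_ids
        self.storage = {}    # cache_id -> cached SELECT result

    def store(self, cache_id, result, table_oids):
        """Register a cached result and the tables it depends on."""
        self.storage[cache_id] = result
        for oid in table_oids:
            self.by_table.setdefault(oid, set()).add(cache_id)

    def invalidate_table(self, oid):
        """Called when a table is dropped or modified (DROP/ALTER/UPDATE ...)."""
        for cache_id in self.by_table.pop(oid, set()):
            self.storage.pop(cache_id, None)
```

Because table OIDs are unique in the cluster, dropping a table (or altering it) only needs one lookup in `by_table` to find every cache entry to delete.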


> > [pgpool.conf Setting]
> >
> > - To enable the memory cache functionality, set this to on (default is off)
> >
> > memory_cache_enabled = on/off
> >
> > - Memory cache behavior can be specified by the cache_param directive.
> >
> > cache_param = 'method=shmem,  *1
> >
> >                memcached_host=,  *2
> >                memcached_port=,  *3
> >                total_size=,  *4
> >                expire=60,  *5
> >                maxcache=,  *6
> >                cache_block_size=8192' *7
> >
> > *1 Cache store method. Either "shmem" (shared memory) or "memcached".
> > Default is shmem.
> >
> > *2 Memcached host name. Mandatory if method=memcached. If this parameter
> > starts with "/", a UNIX domain socket is assumed.
> >
> > *3 Memcached port number. Mandatory if method=memcached.
> >
> > *4 Total memory size in bytes for storing memory cache. Mandatory if
> > method=shmem.
> >
> > *5 Memory cache entry life time specified in seconds. Default is 60.
> >
> > *6 Maximum SELECT result size in bytes. If a result exceeds this value,
> > it is not cached.
> >
> > *7 Cache block size in bytes.
> >
> > Only meaningful when method=shmem. The memory cache is divided into
> > fixed-size blocks.
> > If the block size is small, memory usage is efficient, but processing
> > might be slow. On the other hand, if the block size is large, memory
> > usage is less efficient but processing might be faster. Default is 8192.
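As a concrete example, a memcached-backed setup might be configured like this (the values shown are illustrative only, not recommendations):

```
memory_cache_enabled = on
cache_param = 'method=memcached,
               memcached_host=localhost,
               memcached_port=11211,
               expire=120,
               maxcache=16384'
```

With method=memcached, total_size and cache_block_size are not needed, since block management is handled by memcached itself.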
> >
> >
> > ■ Internal specification
> >
> > 1) Fetch the query from the frontend.
> >
> > (Extended query protocol case)
> > The message is obtained during "Execute" processing, after "Parse" and
> > "Bind" have been handled.
>
> I don't understand what you meant here. I guess you want to save the
> query from the Parse message, then the parameters from the Bind one.
> Right?

Yes, exactly.



> > 2) Check the SQL from the frontend.
> >
> > 2.1) Parse the query. If the query is a SELECT:
> >
> > 2.1.1) Compute the MD5 hash of the query string and
> >  search the cache for an entry whose key matches
> >  that MD5 string.
>
> What about swapping 2.1.1 and 2.1? We would avoid parsing a query
> multiple times if it is already in the cache.
>

The query is parsed to find its command type ("SELECT", "INSERT", "UPDATE",
and so on), and that type drives the branch: only when it is "SELECT" do we
compute the MD5 hash of the query string and search the cache for matching
data. That is why I thought it necessary to parse the query before the
branch.


> > 2.1.1.1) If corresponding data is found
> >
> > 2.1.1.1.1) Send cached data to frontend.
> >
> > 2.1.1.2) If corresponding data is not found
> >
> > 2.1.1.2.1) Send the query to the backend and get the result.
> >
> > (Extended query protocol case)
> > The portal is executed via an "Execute" message and the results are
> > collected.
> >
> > 2.1.1.2.2) Check the result's size.
> >
> > 2.1.1.2.2.1) If the result's size from the backend is smaller than the
> > value of "maxcache" set in pgpool.conf
> >
> > 2.1.1.2.2.1.1) Query the PostgreSQL system catalogs and check the
> > volatility category of each function used in the SQL
>
> Couldn't we move this right after 2.1.1.2?
>

Yes, this check can be moved to right after 2.1.1.2.


> >
> > 2.1.1.2.2.1.1.1) If the SQL contains no functions, or only IMMUTABLE
> > functions
> >
> > 2.1.1.2.2.1.1.1.1) Store the query result in the shared memory cache or
> > in memcached.
>
>
> When reading this I thought about something else: what about tracking
> queries to exclude as well? Having a list of MD5 hashes to blacklist
> because they are non-SELECT queries or use non-IMMUTABLE functions?
> If possible, I guess it would allow us to shortcut some steps here in a
> lot of cases.
>

That is a good idea. I am rethinking the specification with this in mind.
Thanks!
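The suggested blacklist could be a small negative cache keyed on the same MD5 digest, checked before any parsing. A sketch, under the assumption that whether a given query text is cacheable never changes:

```python
import hashlib

uncacheable = set()   # MD5 digests of queries known to be non-cacheable

def query_key(query):
    return hashlib.md5(query.encode()).hexdigest()

def is_blacklisted(query):
    """Cheap pre-check: skip parsing and volatility checks entirely."""
    return query_key(query) in uncacheable

def blacklist(query):
    """Record a query as non-cacheable (non-SELECT, or uses a volatile function)."""
    uncacheable.add(query_key(query))
```

A query lands in the set the first time it fails the SELECT or volatility check; every later occurrence of the same text bypasses steps 2.1 through 2.1.1.2.2.1.1 with one hash lookup.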


>
> > ----------
> > best regards
> > Masanori Yamazaki
>
> - --
> Jehan-Guillaume (ioguix) de Rorthais
> DBA
> http://www.dalibo.com