[Pgpool-hackers] GSoC 2011 - "Caching query results in pgpool-II" specification

Masanori Yamazaki m.yamazaki23 at gmail.com
Tue May 31 16:11:03 UTC 2011


Thank Jehan Guillaume and Tatsuo Ishii very much for your continued support.
I am going to write a specification in more detail.


> Hackers
> >
> > Hello! I am a GSoC 2011 student.
> > My GSoC project's name is "Caching query results in pgpool-II",
> > and I am sending a specification about it.
> > If you give me your opinion, I appreciate it so much.
> >
> > ---
> > ■external specification(for user)
> >
> > [synopsis]
> >
> > On memory query cache allows you to speed-up database access by reusing
> > SELECT results.
> >
> > - The cache can be stored in either shared memory or memcached (not
> both).
> >
> > - The result of SELECT using none IMMUTABLE functions is not cached.
> >
> > - If the size of SELECT result is larger than the value specified in
> > pgpool.conf, it is not cached.
> >
> > - Cache will be removed if one of following conditions are met:
> >
> >   - Cache expiration time exceeds
> >   - If cache is full, the least recently used cache is removed
> >   - If a table is dropped or modified, related cache data is deleted.
> >   - If a table definition is modified(ALTER TABLE), related cache data is
> > deleted.
> >   - If a schema or database is dropped, related cache data is deleted.
>
> How do you track DDL changes ? table drop/altered or database dropped as
> instance ?
>
> > [pgpool.conf Setting]
> >
> > - To enable the memory cache functionality, set this to on (default is
> off)
> >
> > memory_cache_enabled = on/off
> >
> > - Memory cache behavior can be specified by cache_param directive.
> >
> > cache_param = 'method=shmem,  *1
> >
> >                memcached_host=,  *2
> >                memcached_port=,  *3
> >                total_size=,  *4
> >                expire=60,  *5
> >                maxcache=,  *6
> >                cache_block_size=8192' *7
> >
> > *1 Cache store method. Either "shmem"(shared memory) or "memcached".
> Default
> > is shmem.
> >
> > *2 Memcached host name. Mandatory if method=memcached. If this parameter
> is
> > starting with "/", UNIX domain socket is assumed.
> >
> > *3 Memcached port number. Mondatory if method=memcached.
> >
> > *4 Total memory size in bytes for storing memory cache. Mandatory if
> > method=shmem.
> >
> > *5 Memory cache entry life time specified in seconds. Default is 60.
> >
> > *6 Maximum SELECT result size in bytes. If exceeds this value, the result
> is
> > not cached.
> >
> > *7 Cache block size in bytes
> >
> > Only meaning full when method=shmem. Memory cache is devided into fixed
> size
> > of block.
> > If the block size is small, memory usage is efficient, but processing
> might
> > be slow. On the other hand, if the block size is large, memory usage is
> not
> > efficient but processing might be fast. Default is 8192.
> >
> >
> > ■Internal specification
> >
> > 1) Fetch query from frontend.
> >
> > (Extended Query's case)
> > get a message in "Execute" process after executing "Parse" or "Bind"
>
> I don't understand what you meant here. I guess you want to save the
> query from the Parse message, then parameters from the Bind one. right ?
>
> > 2) Check the SQL from Frontend
> >
> > 2.1) Parse the query. If the query is SELECT
> >
> > 2.1.1) Get md5 hash value of the query string.
> >  search for the corresponding data between this MD5 string and
> >  that in cache
>
> What about swapping 2.1.1 and 2.1 ? we would avoid parsing a query
> multiple times if it is already in the cache.
>
> > 2.1.1.1) If corresponding data is found
> >
> > 2.1.1.1.1) Send cached data to frontend.
> >
> > 2.1.1.2) If correnponding data is not found
> >
> > 2.1.1.2.1) Send the query to backend and get result.
> >
> > (Extended Query's case)
> > A portal is executed using an "Execute message" and get the  results
> >
> > 2.1.1.2.2) Check the result's size
> >
> > 2.1.1.2.2.1) If the result's size from Backend is smaller than  value of
> > "maxcache" set in pgpool.conf
> >
> > 2.1.1.2.2.1.1) Call PostgreSQL system catalog and check Function
> Volatility
> > Categories in SQL
>
> Couldn't we move this right after 2.1.1.2 ?
>
> >
> > 2.1.1.2.2.1.1.1) If there are no function, or only an IMMUTABLE function
> in
> > SQL
> >
> > 2.1.1.2.2.1.1.1.1) Store the query result in shared memory cache or
> > memcached.
>
>
> When reading this I thought about something else: what about tracking
> queries to exclude as well ? having a list of md5 to blacklist because
> of non-SELECT queries or using non IMMUTABLE functions ?
> If possible, I guess it would allow to shortcut some steps here in a lot
> of cases.
>
> > ----------
> > best regards
> > Masanori Yamazaki
>
> - --
> Jehan-Guillaume (ioguix) de Rorthais
> DBA
> http://www.dalibo.com
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk3ji1EACgkQXu9L1HbaT6I4gwCgjT6ROYPZR0mOo5YSM7Bz9+7Y
> WxMAnjHhBaGbIckglgu1K8pyChiYh6jo
> =J+rY
> -----END PGP SIGNATURE-----
>
-----BEGIN PGP SIGNED MESSAGE-----
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://pgfoundry.org/pipermail/pgpool-hackers/attachments/20110601/faddf7be/attachment.html>


More information about the Pgpool-hackers mailing list