[Pgpool-hackers] GSoC 2011 - "Caching query results in pgpool-II" specification

Masanori Yamazaki m.yamazaki23 at gmail.com
Sun May 29 15:21:55 UTC 2011


Hackers

Hello! I am a GSoC 2011 student.
My GSoC project's name is "Caching query results in pgpool-II",
and I am sending a specification about it.
If you give me your opinion, I appreciate it so much.

---
■external specification(for user)

[synopsis]

On memory query cache allows you to speed-up database access by reusing
SELECT results.

- The cache can be stored in either shared memory or memcached (not both).

- The result of SELECT using none IMMUTABLE functions is not cached.

- If the size of SELECT result is larger than the value specified in
pgpool.conf, it is not cached.

- Cache will be removed if one of following conditions are met:

  - Cache expiration time exceeds
  - If cache is full, the least recently used cache is removed
  - If a table is dropped or modified, related cache data is deleted.
  - If a table definition is modified(ALTER TABLE), related cache data is
deleted.
  - If a schema or database is dropped, related cache data is deleted.


[pgpool.conf Setting]

- To enable the memory cache functionality, set this to on (default is off)

memory_cache_enabled = on/off

- Memory cache behavior can be specified by cache_param directive.

cache_param = 'method=shmem,  *1

               memcached_host=,  *2
               memcached_port=,  *3
               total_size=,  *4
               expire=60,  *5
               maxcache=,  *6
               cache_block_size=8192' *7

*1 Cache store method. Either "shmem"(shared memory) or "memcached". Default
is shmem.

*2 Memcached host name. Mandatory if method=memcached. If this parameter is
starting with "/", UNIX domain socket is assumed.

*3 Memcached port number. Mondatory if method=memcached.

*4 Total memory size in bytes for storing memory cache. Mandatory if
method=shmem.

*5 Memory cache entry life time specified in seconds. Default is 60.

*6 Maximum SELECT result size in bytes. If exceeds this value, the result is
not cached.

*7 Cache block size in bytes

Only meaning full when method=shmem. Memory cache is devided into fixed size
of block.
If the block size is small, memory usage is efficient, but processing might
be slow. On the other hand, if the block size is large, memory usage is not
efficient but processing might be fast. Default is 8192.


■Internal specification

1) Fetch query from frontend.

(Extended Query's case)
get a message in "Execute" process after executing "Parse" or "Bind"

2) Check the SQL from Frontend

2.1) Parse the query. If the query is SELECT

2.1.1) Get md5 hash value of the query string.
 search for the corresponding data between this MD5 string and
 that in cache

2.1.1.1) If corresponding data is found

2.1.1.1.1) Send cached data to frontend.

2.1.1.2) If correnponding data is not found

2.1.1.2.1) Send the query to backend and get result.

(Extended Query's case)
A portal is executed using an "Execute message" and get the  results

2.1.1.2.2) Check the result's size

2.1.1.2.2.1) If the result's size from Backend is smaller than  value of
"maxcache" set in pgpool.conf

2.1.1.2.2.1.1) Call PostgreSQL system catalog and check Function Volatility
Categories in SQL

2.1.1.2.2.1.1.1) If there are no function, or only an IMMUTABLE function in
SQL

2.1.1.2.2.1.1.1.1) Store the query result in shared memory cache or
memcached.



----------
best regards
Masanori Yamazaki
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://pgfoundry.org/pipermail/pgpool-hackers/attachments/20110530/71c91db7/attachment.html>


More information about the Pgpool-hackers mailing list