[Pgpool-hackers] GSoC 2011 - "Caching query results in pgpool-II" specification

Tatsuo Ishii ishii at sraoss.co.jp
Mon May 30 23:27:43 UTC 2011


> Hackers
> 
> Hello! I am a GSoC 2011 student.
> My GSoC project's name is "Caching query results in pgpool-II",
> and I am sending a specification about it.
> If you give me your opinion, I appreciate it so much.
> 
> ---
o external specification(for user)
[remove non ASCII characters]

> [synopsis]
> 
> On memory query cache allows you to speed-up database access by reusing
> SELECT results.
> 
> - The cache can be stored in either shared memory or memcached (not both).
> 
> - The result of SELECT using none IMMUTABLE functions is not cached.
> 
> - If the size of SELECT result is larger than the value specified in
> pgpool.conf, it is not cached.
> 
> - Cache will be removed if one of following conditions are met:
> 
>   - Cache expiration time exceeds
>   - If cache is full, the least recently used cache is removed
>   - If a table is dropped or modified, related cache data is deleted.
>   - If a table definition is modified(ALTER TABLE), related cache data is
> deleted.
>   - If a schema or database is dropped, related cache data is deleted.

Other things we have to care:

- We should not cache data until current transaction committed. SELECT
  results generated in an aborted transaction apparently should not be
  cached.

- SELECT results from cursor operations shoud not be cached.

> [pgpool.conf Setting]
> 
> - To enable the memory cache functionality, set this to on (default is off)
> 
> memory_cache_enabled = on/off
> 
> - Memory cache behavior can be specified by cache_param directive.
> 
> cache_param = 'method=shmem,  *1
> 
>                memcached_host=,  *2
>                memcached_port=,  *3
>                total_size=,  *4
>                expire=60,  *5
>                maxcache=,  *6
>                cache_block_size=8192' *7
> 
> *1 Cache store method. Either "shmem"(shared memory) or "memcached". Default
> is shmem.
> 
> *2 Memcached host name. Mandatory if method=memcached. If this parameter is
> starting with "/", UNIX domain socket is assumed.
> 
> *3 Memcached port number. Mondatory if method=memcached.
> 
> *4 Total memory size in bytes for storing memory cache. Mandatory if
> method=shmem.
> 
> *5 Memory cache entry life time specified in seconds. Default is 60.
> 
> *6 Maximum SELECT result size in bytes. If exceeds this value, the result is
> not cached.
> 
> *7 Cache block size in bytes
> 
> Only meaning full when method=shmem. Memory cache is devided into fixed size
> of block.
> If the block size is small, memory usage is efficient, but processing might
> be slow. On the other hand, if the block size is large, memory usage is not
> efficient but processing might be fast. Default is 8192.
> 
> 
> ■Internal specification
> 
> 1) Fetch query from frontend.
> 
> (Extended Query's case)
> get a message in "Execute" process after executing "Parse" or "Bind"
> 
> 2) Check the SQL from Frontend
> 
> 2.1) Parse the query. If the query is SELECT
> 
> 2.1.1) Get md5 hash value of the query string.
>  search for the corresponding data between this MD5 string and
>  that in cache
> 
> 2.1.1.1) If corresponding data is found
> 
> 2.1.1.1.1) Send cached data to frontend.
> 
> 2.1.1.2) If correnponding data is not found
> 
> 2.1.1.2.1) Send the query to backend and get result.
> 
> (Extended Query's case)
> A portal is executed using an "Execute message" and get the  results
> 
> 2.1.1.2.2) Check the result's size
> 
> 2.1.1.2.2.1) If the result's size from Backend is smaller than  value of
> "maxcache" set in pgpool.conf
> 
> 2.1.1.2.2.1.1) Call PostgreSQL system catalog and check Function Volatility
> Categories in SQL
> 
> 2.1.1.2.2.1.1.1) If there are no function, or only an IMMUTABLE function in
> SQL
> 
> 2.1.1.2.2.1.1.1.1) Store the query result in shared memory cache or
> memcached.
> 
> 
> 
> ----------
> best regards
> Masanori Yamazaki


More information about the Pgpool-hackers mailing list