[Pgpool-hackers] GSoC 2011 - "Caching query results in pgpool-II" specification

Tatsuo Ishii ishii at sraoss.co.jp
Wed Jun 1 04:22:41 UTC 2011


Hi,

Please not break mail thread. You make tracking discussions difficult.

Also do not use non ASCII characters in your mail. This is an English
mailing list.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

>> > Hello! I am a GSoC 2011 student.
>> > My GSoC project's name is "Caching query results in pgpool-II",
>> > and I am sending a specification about it.
>> > If you give me your opinion, I appreciate it so much.
>> >
>> > ---
>> > ■external specification(for user)
>> >
>> > [synopsis]
>> >
>> > On memory query cache allows you to speed-up database access by reusing
>> > SELECT results.
>> >
>> > - The cache can be stored in either shared memory or memcached (not
>> both).
>> >
>> > - The result of SELECT using none IMMUTABLE functions is not cached.
>> >
>> > - If the size of SELECT result is larger than the value specified in
>> > pgpool.conf, it is not cached.
>> >
>> > - Cache will be removed if one of following conditions are met:
>> >
>> >   - Cache expiration time exceeds
>> >   - If cache is full, the least recently used cache is removed
>> >   - If a table is dropped or modified, related cache data is deleted.
>> >   - If a table definition is modified(ALTER TABLE), related cache data is
>> > deleted.
>> >   - If a schema or database is dropped, related cache data is deleted.
>>
>> How do you track DDL changes ? table drop/altered or database dropped as
>> instance ?
>>
> 
> I am going to prepare for a structure for tables and a structure for cache
> data.
> This structure for tables has table OID in RowDescription (table OID is
> unique in DB clustor),
> and cache_id. On the other hand, cache is stored in memory block and each
> cache data
> relate to cache_id of the structure for tables.
> 
> 
>> > [pgpool.conf Setting]
>> >
>> > - To enable the memory cache functionality, set this to on (default is
>> off)
>> >
>> > memory_cache_enabled = on/off
>> >
>> > - Memory cache behavior can be specified by cache_param directive.
>> >
>> > cache_param = 'method=shmem,  *1
>> >
>> >                memcached_host=,  *2
>> >                memcached_port=,  *3
>> >                total_size=,  *4
>> >                expire=60,  *5
>> >                maxcache=,  *6
>> >                cache_block_size=8192' *7
>> >
>> > *1 Cache store method. Either "shmem"(shared memory) or "memcached".
>> Default
>> > is shmem.
>> >
>> > *2 Memcached host name. Mandatory if method=memcached. If this parameter
>> is
>> > starting with "/", UNIX domain socket is assumed.
>> >
>> > *3 Memcached port number. Mondatory if method=memcached.
>> >
>> > *4 Total memory size in bytes for storing memory cache. Mandatory if
>> > method=shmem.
>> >
>> > *5 Memory cache entry life time specified in seconds. Default is 60.
>> >
>> > *6 Maximum SELECT result size in bytes. If exceeds this value, the result
>> is
>> > not cached.
>> >
>> > *7 Cache block size in bytes
>> >
>> > Only meaning full when method=shmem. Memory cache is devided into fixed
>> size
>> > of block.
>> > If the block size is small, memory usage is efficient, but processing
>> might
>> > be slow. On the other hand, if the block size is large, memory usage is
>> not
>> > efficient but processing might be fast. Default is 8192.
>> >
>> >
>> > ■Internal specification
>> >
>> > 1) Fetch query from frontend.
>> >
>> > (Extended Query's case)
>> > get a message in "Execute" process after executing "Parse" or "Bind"
>>
>> I don't understand what you meant here. I guess you want to save the
>> query from the Parse message, then parameters from the Bind one. right ?
>>
>> yes
> 
> 
> 
>> > 2) Check the SQL from Frontend
>> >
>> > 2.1) Parse the query. If the query is SELECT
>> >
>> > 2.1.1) Get md5 hash value of the query string.
>> >  search for the corresponding data between this MD5 string and
>> >  that in cache
>>
>> What about swapping 2.1.1 and 2.1 ? we would avoid parsing a query
>> multiple times if it is already in the cache.
>>
> 
> parse query and fetch the string "SELECT" or others ("INSERT", "UPDATE" and
> so on..).
> The string is used for the IF Condition. if the string is "SELECT", key of
> md5 hash value of
> the string "SELECT" search for caching data, so I thought it necessary to
> parse the
> query before IF Condition.
> 
> 
>> > 2.1.1.1) If corresponding data is found
>> >
>> > 2.1.1.1.1) Send cached data to frontend.
>> >
>> > 2.1.1.2) If correnponding data is not found
>> >
>> > 2.1.1.2.1) Send the query to backend and get result.
>> >
>> > (Extended Query's case)
>> > A portal is executed using an "Execute message" and get the  results
>> >
>> > 2.1.1.2.2) Check the result's size
>> >
>> > 2.1.1.2.2.1) If the result's size from Backend is smaller than  value of
>> > "maxcache" set in pgpool.conf
>> >
>> > 2.1.1.2.2.1.1) Call PostgreSQL system catalog and check Function
>> Volatility
>> > Categories in SQL
>>
>> Couldn't we move this right after 2.1.1.2 ?
>>
> 
> This can be shifted to right after 2.1.1.2.
> 
> 
>> >
>> > 2.1.1.2.2.1.1.1) If there are no function, or only an IMMUTABLE function
>> in
>> > SQL
>> >
>> > 2.1.1.2.2.1.1.1.1) Store the query result in shared memory cache or
>> > memcached.
>>
>>
>> When reading this I thought about something else: what about tracking
>> queries to exclude as well ? having a list of md5 to blacklist because
>> of non-SELECT queries or using non IMMUTABLE functions ?
>> If possible, I guess it would allow to shortcut some steps here in a lot
>> of cases.
>>
> 
> It is good idea. I am thinking about specification again.
> thanks!
> 
> 
>>
>> > ----------
>> > best regards
>> > Masanori Yamazaki
>>
>> - --
>> Jehan-Guillaume (ioguix) de Rorthais
>> DBA
>> http://www.dalibo.com
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.11 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>>
>> iEYEARECAAYFAk3ji1EACgkQXu9L1HbaT6I4gwCgjT6ROYPZR0mOo5YSM7Bz9+7Y
>> WxMAnjHhBaGbIckglgu1K8pyChiYh6jo
>> =J+rY
>> -----END PGP SIGNATURE-----
>>


More information about the Pgpool-hackers mailing list