[Pgpool-hackers] GSoC 2011 - "Caching query results in pgpool-II" specification

Masanori Yamazaki m.yamazaki23 at gmail.com
Wed Jun 1 04:31:26 UTC 2011


Sorry, I am careful to follow the rule of mailing list or English document.
I didn't know about them.



2011年6月1日13:22 Tatsuo Ishii <ishii at sraoss.co.jp>:

> Hi,
>
> Please not break mail thread. You make tracking discussions difficult.
>
> Also do not use non ASCII characters in your mail. This is an English
> mailing list.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> >> > Hello! I am a GSoC 2011 student.
> >> > My GSoC project's name is "Caching query results in pgpool-II",
> >> > and I am sending a specification about it.
> >> > If you give me your opinion, I appreciate it so much.
> >> >
> >> > ---
> >> > ■external specification(for user)
> >> >
> >> > [synopsis]
> >> >
> >> > On memory query cache allows you to speed-up database access by
> reusing
> >> > SELECT results.
> >> >
> >> > - The cache can be stored in either shared memory or memcached (not
> >> both).
> >> >
> >> > - The result of SELECT using none IMMUTABLE functions is not cached.
> >> >
> >> > - If the size of SELECT result is larger than the value specified in
> >> > pgpool.conf, it is not cached.
> >> >
> >> > - Cache will be removed if one of following conditions are met:
> >> >
> >> >   - Cache expiration time exceeds
> >> >   - If cache is full, the least recently used cache is removed
> >> >   - If a table is dropped or modified, related cache data is deleted.
> >> >   - If a table definition is modified(ALTER TABLE), related cache data
> is
> >> > deleted.
> >> >   - If a schema or database is dropped, related cache data is deleted.
> >>
> >> How do you track DDL changes ? table drop/altered or database dropped as
> >> instance ?
> >>
> >
> > I am going to prepare for a structure for tables and a structure for
> cache
> > data.
> > This structure for tables has table OID in RowDescription (table OID is
> > unique in DB clustor),
> > and cache_id. On the other hand, cache is stored in memory block and each
> > cache data
> > relate to cache_id of the structure for tables.
> >
> >
> >> > [pgpool.conf Setting]
> >> >
> >> > - To enable the memory cache functionality, set this to on (default is
> >> off)
> >> >
> >> > memory_cache_enabled = on/off
> >> >
> >> > - Memory cache behavior can be specified by cache_param directive.
> >> >
> >> > cache_param = 'method=shmem,  *1
> >> >
> >> >                memcached_host=,  *2
> >> >                memcached_port=,  *3
> >> >                total_size=,  *4
> >> >                expire=60,  *5
> >> >                maxcache=,  *6
> >> >                cache_block_size=8192' *7
> >> >
> >> > *1 Cache store method. Either "shmem"(shared memory) or "memcached".
> >> Default
> >> > is shmem.
> >> >
> >> > *2 Memcached host name. Mandatory if method=memcached. If this
> parameter
> >> is
> >> > starting with "/", UNIX domain socket is assumed.
> >> >
> >> > *3 Memcached port number. Mondatory if method=memcached.
> >> >
> >> > *4 Total memory size in bytes for storing memory cache. Mandatory if
> >> > method=shmem.
> >> >
> >> > *5 Memory cache entry life time specified in seconds. Default is 60.
> >> >
> >> > *6 Maximum SELECT result size in bytes. If exceeds this value, the
> result
> >> is
> >> > not cached.
> >> >
> >> > *7 Cache block size in bytes
> >> >
> >> > Only meaning full when method=shmem. Memory cache is devided into
> fixed
> >> size
> >> > of block.
> >> > If the block size is small, memory usage is efficient, but processing
> >> might
> >> > be slow. On the other hand, if the block size is large, memory usage
> is
> >> not
> >> > efficient but processing might be fast. Default is 8192.
> >> >
> >> >
> >> > ■Internal specification
> >> >
> >> > 1) Fetch query from frontend.
> >> >
> >> > (Extended Query's case)
> >> > get a message in "Execute" process after executing "Parse" or "Bind"
> >>
> >> I don't understand what you meant here. I guess you want to save the
> >> query from the Parse message, then parameters from the Bind one. right ?
> >>
> >> yes
> >
> >
> >
> >> > 2) Check the SQL from Frontend
> >> >
> >> > 2.1) Parse the query. If the query is SELECT
> >> >
> >> > 2.1.1) Get md5 hash value of the query string.
> >> >  search for the corresponding data between this MD5 string and
> >> >  that in cache
> >>
> >> What about swapping 2.1.1 and 2.1 ? we would avoid parsing a query
> >> multiple times if it is already in the cache.
> >>
> >
> > parse query and fetch the string "SELECT" or others ("INSERT", "UPDATE"
> and
> > so on..).
> > The string is used for the IF Condition. if the string is "SELECT", key
> of
> > md5 hash value of
> > the string "SELECT" search for caching data, so I thought it necessary to
> > parse the
> > query before IF Condition.
> >
> >
> >> > 2.1.1.1) If corresponding data is found
> >> >
> >> > 2.1.1.1.1) Send cached data to frontend.
> >> >
> >> > 2.1.1.2) If correnponding data is not found
> >> >
> >> > 2.1.1.2.1) Send the query to backend and get result.
> >> >
> >> > (Extended Query's case)
> >> > A portal is executed using an "Execute message" and get the  results
> >> >
> >> > 2.1.1.2.2) Check the result's size
> >> >
> >> > 2.1.1.2.2.1) If the result's size from Backend is smaller than  value
> of
> >> > "maxcache" set in pgpool.conf
> >> >
> >> > 2.1.1.2.2.1.1) Call PostgreSQL system catalog and check Function
> >> Volatility
> >> > Categories in SQL
> >>
> >> Couldn't we move this right after 2.1.1.2 ?
> >>
> >
> > This can be shifted to right after 2.1.1.2.
> >
> >
> >> >
> >> > 2.1.1.2.2.1.1.1) If there are no function, or only an IMMUTABLE
> function
> >> in
> >> > SQL
> >> >
> >> > 2.1.1.2.2.1.1.1.1) Store the query result in shared memory cache or
> >> > memcached.
> >>
> >>
> >> When reading this I thought about something else: what about tracking
> >> queries to exclude as well ? having a list of md5 to blacklist because
> >> of non-SELECT queries or using non IMMUTABLE functions ?
> >> If possible, I guess it would allow to shortcut some steps here in a lot
> >> of cases.
> >>
> >
> > It is good idea. I am thinking about specification again.
> > thanks!
> >
> >
> >>
> >> > ----------
> >> > best regards
> >> > Masanori Yamazaki
> >>
> >> - --
> >> Jehan-Guillaume (ioguix) de Rorthais
> >> DBA
> >> http://www.dalibo.com
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.4.11 (GNU/Linux)
> >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> >>
> >> iEYEARECAAYFAk3ji1EACgkQXu9L1HbaT6I4gwCgjT6ROYPZR0mOo5YSM7Bz9+7Y
> >> WxMAnjHhBaGbIckglgu1K8pyChiYh6jo
> >> =J+rY
> >> -----END PGP SIGNATURE-----
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://pgfoundry.org/pipermail/pgpool-hackers/attachments/20110601/2f75cad9/attachment.html>


More information about the Pgpool-hackers mailing list