[Pgpool-hackers] Memory based query cache

Tatsuo Ishii ishii at sraoss.co.jp
Sat Aug 20 12:57:51 UTC 2011


Hi,

Based on Ymazaki's GSoC project code, I'm working on enhancing it to
be a production level product. This is the status report of the work.
(patches included and how to use it is explained in the last of mail)

- What is memory based query cache anyway?

It's similar to MySQL's query cache except it is implemnted in a
middle ware (pgpool-II) rather than in DBMS itself. When a SELECT gets
executed, the result is preserved in shared memory or memcached(you
have choice). If a subsequent SELECT has identical query string, the
preserved result will be returned to frontend. The frontend will not
notice the difference whether the query result is returned from DBMS
or the cache except the respons time. Usually the cache returns data
much quicker than DBMS. For example, getting count(*) from 100,000
rows table(pgbench_accounts) takes 50-90ms by using PostgreSQL,
whereas getting from query cache takes only 0.2ms on my Linux
laptop. This is one of benefits of query cache. Other benefit is
reducing load of DBMS. Because using caches, the query never touches
DBMS.

Note that memory query cache can be used all of pgpool-II running
mode.

- How does it work?

Upon receiving a SELECT, md5(query string+username+dbname) is
calculated and SELECT result is stored in cache storage with the key
being the hash value. The reason why we include username here is to
prevent from users accessing tables which they don't have access
right. Same calculaion is done for later SELECTs. If the md5 hash
value matches, the data is retrieved from cache. Here we assumes that
same query returns identical result(except the case when table is
changed. We explain later). For this reason, following queries will
never be cached:

o SELECTs using non immutable functions(for example,
  current_timestamp, current_user)
o SELECTs using temporary tables
o SELECTs having SELECT INTO or FOR SHARE/FOR UPDATE clause

Also if the SELECT result is larger than certain value(can be
specified in pgpool.conf. Explained later), cache is not created. One
reason for this is to prevent a big data occupying cache
storage. Another reason is memcached has a data limit (1MB).

- What happens if a table gets changed?

While creating caches, table oids used in the SELECT are stored on a
file along with cache key (md5 value. This is the key for memcached)
or location on shared memory cache. Thoses files are placed under
directories named by database oids. If following commands:

o INSERT, UPDATE, DELETE,
o ALTER/TRUNCATE/DROP TABLE
o DROP DATABASE

are received, oids are retrieved from the files and examined. If a
file which has same oid is found, cache key or locatin retrieved from
the file and cache is removed.

- What happens if a transaction is aborted?

SELECTs included in an aborted transaction should never be
cached. Consider following scenario:

BEGIN;
INSERT INTO t1 ....
SELECT * FROM t1;
ABORT;

Because the last SELECT returns false data, we should not cache it.

- What are knobs to control query cache behavior?

Here are directives to control query cache:

# If true, use the memory cache functionality, false by default
memory_cache_enabled = true

# Cache store method. either shmem(shared memory) or Memcached. shmem by default
memqcache_method = 'shmem'

# Memcached host name. Mandatory if memqcache_method=memcached.
memqcache_memcached_host = '127.0.0.1'

# Memcached port number. Mondatory if memqcache_method=memcached.
memqcache_memcached_port=11211

# Total memory size in bytes for storing memory cache. Mandatory if memqcache_method=shmem.
memqcache_total_size=10240

# Memory cache entry life time specified in seconds. 60 by default.
memqcache_expire=60

# Maximum SELECT result size in bytes.
memqcache_maxcache=1024

# Cache block size in bytes. 1MB by default
memqcache_cache_block_size=1048576

# Temporary work directory to record table oids
memqcache_oiddir = '/var/log/pgpool/oiddir'

- What's difference from existing query cache in pgpool-II?

Existing query cache in pgpool-II has similar objective with memory
query cache but has several drawbacks:

o Cache storage is PostgreSQL(system DB). Much slower than memory.

o No cache invalidation mechanism. Outdated caches have to be removed
  by hand.

o Caches are created even in aborated transactions

In summary, once memory query cache is implmented, current query cache
functionality will be obsoleted.

- What are remaining works?

o Extended protocols (e.g. Java, PHP PDO etc.) cannot use query cache

o Shared memory query cache should use more sophiscated cache
  management mechanism(for example, searching query is based on
  sequential scan, rather than hash).

o Compressing cache data?

o DROP/TRUNCATE TABLE CASCADE does not invalidate caches used by
  tables implicitly specified because of CASCADE.

o DROP/ALTER TABLE does not invalidate caches used by
  child tables.

o Cache management facility needed. For example, to know cache hit
  rate, how much cache storage is used and so on.

- What is the release plan?

o I hope memory query cache will be include in 3.2 release.

- How to use patches?

Apply memqcache.patch to CVS HEAD.
$ cd somewhere/pgpool-II
$ patch -b -p0 < /tmp/memqcache.patch

run configure as usal
$ ./configure

If you want to use memcached, you must turn on --with-memcached option.
$ ./configure --with-memcached=/usr/local

make
make install

cp one of pgpoo.l.conf.sample*
edit pgpool.conf

To know if query result is fetched from query cache, turn on
log_per_node_statement. You wil see something like:

2011-08-20 21:46:02 LOG:   pid 16295: query result fetched from cache. statement: select * from t1;
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memqcache.patch.gz
Type: application/octet-stream
Size: 32620 bytes
Desc: not available
URL: <http://pgfoundry.org/pipermail/pgpool-hackers/attachments/20110820/06d1f3b1/attachment-0001.obj>


More information about the Pgpool-hackers mailing list