[pgpool-general: 2009] Re: is sharding possible with pgpool?

Lonni J Friedman netllama at gmail.com
Sat Aug 10 03:45:09 JST 2013


I doubt pgpool is capable of doing what you want with respect to
sharding.  pgpool has no means of keeping track of which data is in
which 'shard'.  You'd need to implement some other middleware layer to
keep track of that logic, and have that middleware sit either in front
of pgpool, or between pgpool & the database cluster, depending on how
the logic is setup.

On Fri, Aug 9, 2013 at 11:42 AM, Torsten Förtsch
<torsten.foertsch at gmx.net> wrote:
> Hi,
>
> I want to set up from a structural point of view identical backend
> databases. Each backend then is to hold the data of a certain range of
> customers. I was told pgpool can help here.
>
> So, I started experimenting and by now I have the impression that pgpool
> is the wrong tool. But maybe I am too stupid.
>
> I set up 4 backends each with a "pgptest" table with 2 columns "id" and
> "txt". The system database contains one record in
> pgpool_catalog.dist_def. pgpool_catalog.replicate_def is empty.
>
> pgpool=> select * from pgpool_catalog.dist_def ;
> -[ RECORD 1 ]-+----------------------------
> dbname        | pgpool
> schema_name   | public
> table_name    | pgptest
> col_name      | id
> col_list      | {id,txt}
> type_list     | {int,text}
> dist_def_func | pgpool_catalog.dist_pgptest
>
> The pgpool_catalog.dist_pgptest function is defined as follows:
>
> pgpool=> \df+ pgpool_catalog.dist_pgptest
> List of functions
> -[ RECORD 1 ]-------+---------------
> Schema              | pgpool_catalog
> Name                | dist_pgptest
> Result data type    | integer
> Argument data types | i integer
> Type                | normal
> Volatility          | immutable
> Owner               | pgpool
> Language            | sql
> Source code         |
>                     | select $1%4
>
> Pgpool is version 3.1.3 and configured with
>
> load_balance_mode = on
> replication_mode = off
> parallel_mode = on
>
> Now, if I insert data into the table it works as expected:
>
>   insert into pgptest(id, txt) values( 125, 'test125');
>
> The query is sent to only one backend and on the system db I see this
> statement being executed with $1=125:
>
>   SELECT pgpool_catalog.dist_pgptest($1::int)
>
> But the query
>
>   select * from pgptest where id=125
>
> is sent to all backends, although pgpool could know that id=125 can be
> only in backend 1. Also, a command to update the record with id=125 is
> sent to all backends.
>
> At first I thought that *all* read statements are sent to all backends
> and only writes are distributed. But then I tried:
>
>   select substr(txt, 1, 1) letter, count(*) count
>     from pgptest group by letter
>
> and was surprised to see that it was sent only to *one* backend. Though,
> the equivalent
>
>   select substr(txt, 1, 1) letter, count(*) count
>     from pgptest group by substr(txt, 1, 1)
>
> is sent to all backends.
>
> Also
>
>   with xx as (select * from pgptest) select * from xx
>
> is sent to only one backend.
>
> Very confusing...
>
> I have also tried turning replication_mode on and load_balance_mode off
> and both of them on. That didn't help. A google search came up with
> similar questions but I couldn't find any answer.
>
> What do I wrong?
>
> Torsten
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general



-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama at gmail.com
LlamaLand                       https://netllama.linux-sxs.org


More information about the pgpool-general mailing list