[Pgpool-general] Question about the load balancing algorithm

Wed Nov 5 08:53:26 UTC 2008

> Hi and thanks for your reply. What is a "session" in pgpool's
> terminology? Does pgpool spawn multiple processes/threads of itself?

Yes, pgpool spawns multiple worker processes, each of which calls
accept(2) and waits for connect(2) requests from clients (this
architecture is almost the same as Apache's). When a client sends a
connection request to the port pgpool listens on, one of the processes
accepts the request and starts to receive queries from the client. The
pgpool process selects one of the PostgreSQL backends according to
some predefined conditions such as the load balance weight. What I call
a "session" is the combination of client, pgpool process, and backend.
The session continues until the client sends a quit request.
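
To make the session idea above concrete, here is a rough sketch in
Python. It is not pgpool's actual code: the names Session,
select_backend and open_session are made up for illustration, and it
assumes that "load balance weight" means each backend is chosen with
probability proportional to its weight.

```python
import random
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical model of a session: client + pgpool process + backend."""
    client: str
    worker_pid: int
    backend: int


def select_backend(weights, rng=random):
    """Pick a backend index with probability proportional to its weight.

    Assumption for this sketch: weights are non-negative numbers and at
    least one of them is positive.
    """
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    point = rng.random() * sum(weights)
    cumulative = 0.0
    chosen = None
    for index, weight in enumerate(weights):
        if weight <= 0:
            continue  # a zero-weight backend is never selected
        cumulative += weight
        chosen = index
        if point < cumulative:
            break
    return chosen


def open_session(client, worker_pid, weights, rng=random):
    """The backend is chosen once, when the session starts."""
    return Session(client, worker_pid, select_backend(weights, rng))
```

In this model the backend is fixed when the session is opened, so every
query the client sends until it quits goes through the same worker
process to the same backend.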

> The reason I ask is that I'm writing a billing system for a client that
> uses ridiculous amounts of data. This client is planning massive expansion
> in the future (so more nodes will be added). What I need from the load
> balancing of this system is that if the client, or one of their clients
> who also participates in the billing (understand my client can start
> billing at 12:05 and their client at 12:06), executes a large SELECT while
> another client also executes a large SELECT, the resources will be
> appropriately balanced between the nodes so that both queries won't be
> executed on the same node, slowing down processing.

pgpool selects the backend in a completely random manner, so I think
the load will be balanced between backends in the long term. If you
want to balance the load in the short term, for example to notice that
backend A is loaded *heavier* than backend B and send the next query
to B, then no, pgpool is not smart enough for that. However, I think
the problem here is how you would define that backend A is *heavier*
than B. Number of connections, CPU load, I/O load? I'm not sure.
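
The difference between long-term and short-term balance can be
illustrated with a small Python simulation. This is only a sketch under
the assumption that each session picks its backend independently and at
random in proportion to a weight; same_node_probability and simulate are
hypothetical helper names, not anything pgpool provides.

```python
import random


def same_node_probability(weights):
    """Chance that two independent random picks land on the same backend.

    With pick probabilities p_i = w_i / sum(w), this is sum(p_i ** 2).
    """
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)


def simulate(weights, sessions, seed=0):
    """Count how many of `sessions` random picks go to each backend."""
    rng = random.Random(seed)
    backends = list(range(len(weights)))
    counts = [0] * len(weights)
    for _ in range(sessions):
        counts[rng.choices(backends, weights=weights)[0]] += 1
    return counts
```

With two equally weighted backends, same_node_probability([1, 1]) is
0.5, so two large SELECTs started a minute apart have an even chance of
ending up on the same node, while simulate([1, 1], 100000) puts close to
half of the sessions on each backend. Adding nodes lowers the chance of
such a collision (0.25 with four equal backends) but never removes it.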
--
Tatsuo Ishii
SRA OSS, Inc. Japan