[Pgpool-general] questions about pooling, load balancing and other things

Fri Dec 19 10:59:28 UTC 2008

> Hi Tatsuo,
> 
> I understand that pgpool does pooling by saving the connections to PG
> and reusing them when the same user/database is used, and indeed I see
> some pcp procs being reused 40+ times.  What I'm trying to figure out
> here is: does pgpool just pass the new query it receives through
> that same connection that was already opened previously, so that it
> is reused by this new request coming from the same user to the
> same database?

Yes.

> How does pgpool queue its incoming connections when it starts to
> receive more connections than num_init_children allows?

The queueing is actually done by the kernel, not by
pgpool. num_init_children pgpool child processes are spawned and
wait for incoming connections by issuing accept(2). If a client
tries to connect to pgpool's listening port, the kernel will assign one
of the pgpool processes to receive the connection. The next client will
connect to one of the free pgpool processes, and so on. So if all of the
pgpool processes become busy communicating with clients, new incoming
connection requests will be queued by the kernel. When one of the clients
disconnects from pgpool, its pgpool child process goes back and issues
accept(2); the kernel then picks up one of the requests from the queue
and assigns it to that pgpool process.

This is pretty close to what Apache does, I believe.
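
To make the accept(2) model above concrete, here is a minimal sketch in
Python. It is not pgpool's code (pgpool is written in C); the names
prefork, serve, echo_pid and shutdown are hypothetical and exist only for
this illustration. A fixed number of children are forked, each blocks in
accept() on the same listening socket, and a client that connects while
every child is busy simply waits in the kernel's listen queue.

```python
import os
import signal
import socket


def serve(listener, handle):
    """Child loop: block in accept(2), serve one client until it leaves, repeat."""
    while True:
        conn, _ = listener.accept()   # the kernel hands a queued connection to one child
        with conn:
            handle(conn)              # the child is busy until the client disconnects


def echo_pid(conn):
    """Hypothetical handler: answer every message with this child's PID."""
    while conn.recv(1024):            # an empty read means the client disconnected
        conn.sendall(str(os.getpid()).encode())


def prefork(num_children, handle, host="127.0.0.1", port=0, backlog=32):
    """Open one listening socket and fork num_children workers that share it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(backlog)          # size of the kernel queue for waiting clients
    pids = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:                  # in the child: serve forever, never return
            try:
                serve(listener, handle)
            finally:
                os._exit(0)
        pids.append(pid)
    return listener, pids


def shutdown(pids):
    """Terminate and reap the worker processes."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

With prefork(2, echo_pid), a third client can still connect and send
data, but it gets no answer until one of the first two clients
disconnects and that child returns to accept().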

> I'm
> pretty sure here that the "child_life_time" setting would be the one
> responsible for freeing up a pgpool child so that the newly
> queued connection can obtain access to PG through pgpool and execute
> its query, correct?

No. child_life_time only applies while a pgpool process is *free*, in
other words, while it is waiting for incoming connections.

"Freeing up the pgpool child" happens when the client disconnects and/or
when client_idle_limit expires, if it's set.
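
As a rough illustration of how these settings sit together in
pgpool.conf (the values below are made up for the example and are not
recommendations):

```conf
# pgpool.conf fragment; all values are illustrative assumptions
num_init_children = 32   # pre-forked children, i.e. the limit on concurrent clients
child_life_time = 300    # seconds a child may sit idle waiting for a connection
                         # before it is replaced; 0 disables
client_idle_limit = 60   # seconds a connected client may stay idle before pgpool
                         # disconnects it, which frees the child; 0 disables
```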

> With regard to load balancing, that can indeed be very helpful,
> especially since the master node is usually the one with a higher load.
> I'm pretty sure this may not be possible right now, but it would be
> pretty cool if pgpool only opened a connection to the backend that it
> chooses to run the SELECT query against.

I'm not sure what you mean here. Can you show an example?

> I'm pretty sure this may be
> complicated to implement, if at all possible, which it may not be, since
> this would affect how pgpool handles connections.
> 
> 
> Also you were right about the online recovery scripts. If I skip the
> second base backup it seems 30-50% faster in most cases. What takes
> the longest time is just the checkpoint that pg_start_backup has to do
> while a lot of writes are being done to the DB. But the new
> online recovery setting makes things perfect, since the client just
> keeps on trying to send the data over and eventually, when the 2nd
> stage is over, the rest of the data resumes being sent.
> 
> can't remember the other questions right now, sorry :)
--
Tatsuo Ishii
SRA OSS, Inc. Japan