[Pgpool-general] Several questions about pgpool

Sun Aug 13 19:12:31 UTC 2006

On Saturday 12 August 2006 08:35 am, Tatsuo Ishii wrote:
> > The whole idea is to break the one-to-one relationship between client, pgpool process, and backend process.
> > If a daemon starts working, its queries should be sent to an idle backend connection if one exists, not wait
> > until the assigned child finishes processing something time-consuming.
> > If there are no idle backend connections, one should be created. This way you keep unneeded backend
> > connections to a minimum while still having the advantages of pgpool: pooling, load balancing, etc.
> > Again, as I imagine it, the client connections and backend connections within pgpool should be separated.
> > Each of them should have a set of properties: username/database/protocol/whatever. Then at any moment
> > when a client sends a query, an idle backend connection with a matching set of properties is used to serve it.
> > I think if implemented this would cause some additional overhead, but not a significant one,
> > and the win on busy systems (because you don't have to keep hundreds of backends in memory)
> > will be well worth it.
> 
> To be honest, I doubt your idea makes things any better. In the long
> run, connection pools for a particular username/database in a pgpool
> process will be equally distributed among processes. So trying to find
> a username/database pair in another process would be a waste of
> time. Also I'm not clear about your definition of "idle
> connection". If you mean a connection waiting for the next transaction
> from a client that is still connected, rather than one whose client has
> already disconnected, reusing such a connection will break the concept
> of "session" in the database application.

What you said is probably true for small installations that serve 10-20 concurrent
short-lived connections. I'm talking about bigger installations that have to
serve hundreds of concurrent connections. With such a load, distributing clients
among processes just isn't good enough. What I'm suggesting is exactly the same
as the "connection managers" implemented in every middleware server I have seen or heard about,
like WebLogic/WebSphere/JBoss etc. I don't think they implemented it just as an
exercise. Those packages are specifically designed for large installations, and I think
their authors realized that this approach is beneficial. As I said before, pgpool is a kind of
middleware between database clients and database backends. It already does many
very useful things like pooling, balancing, etc. In my opinion it makes perfect sense
to at least accept that having connection manager functionality implemented in pgpool
would be a very nice and desirable addition.
Yes, by "idle connection" I mean a connection which is idle at the moment,
i.e. a connection waiting for the next transaction after the previous transaction has
finished. Now I'm not sure how you define a "database session". I'd define it as a connection
having a set of properties: username/database/protocol/etc. If another client with the same
properties starts a transaction, I don't see why the "database session" serving the first client
cannot be reused.
Yes, it will cause some overhead and require some IPC, but I believe for large installations it is well worth it.
It will also probably cause problems with clients that change run-time backend configuration with
the SET command, but I don't think that's a show-stopper. I think a workaround for this can be devised.
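
A minimal sketch of the matching scheme described above, in Python for brevity
rather than pgpool's C. Everything here is hypothetical and not pgpool code: the
names ConnKey, Backend and ConnectionManager are made up for illustration, the
cap reuses the "max_backend_connections" name floated earlier in this thread,
and the "dirty" flag stands in for one possible workaround for SET (resetting
the session, for example with something like RESET ALL, before the connection
is handed to another client).

```python
from collections import defaultdict, namedtuple

# Properties a backend connection must share with a client to serve it.
ConnKey = namedtuple("ConnKey", ["username", "database", "protocol"])


class Backend:
    """Stand-in for a real backend connection (hypothetical)."""

    def __init__(self, key, ident):
        self.key = key
        self.ident = ident
        self.dirty = False  # set to True once a client has issued SET


class ConnectionManager:
    """Hands idle backends to any client whose properties match."""

    def __init__(self, max_backend_connections):
        self.max_backend_connections = max_backend_connections
        self.idle = defaultdict(list)  # ConnKey -> list of idle backends
        self.total = 0                 # backends opened so far

    def acquire(self, key):
        """Return an idle matching backend, a new one, or None at the cap."""
        if self.idle[key]:
            return self.idle[key].pop()
        if self.total >= self.max_backend_connections:
            return None  # caller has to wait or reject the client
        self.total += 1
        return Backend(key, self.total)

    def release(self, backend):
        """Called when a transaction ends; makes the backend reusable."""
        if backend.dirty:
            # Assumption: a real implementation would reset the session
            # here (e.g. by sending RESET ALL) before any reuse.
            backend.dirty = False
        self.idle[backend.key].append(backend)
```

With this layout a second client presenting the same username/database/protocol
gets the backend the first client just released, and a new backend is opened
only when no idle match exists and the cap has not been reached.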

> > In no way am I suggesting abandoning the pre-fork architecture. What I'm saying is
> > it would be nice if pgpool started to fork additional children after the pre-forked ones
> > are exhausted or, better yet, if it could maintain a number of unused children at
> > all times, much like apache does.
> >
> > > Moreover, others may want to prevent an unlimited number of client
> > > connections from being made to the backend. To control that, you need a
> > > parameter to limit the max number of connections made to the backend anyway.
> >
> > Configuration parameters (something like "max_backend_connections" and "max_client_connections")
> > might be the best choice for it, again much like one can do in apache.
> 
> Changing the number of processes in the pool on demand would be nice (but
> for a different reason than the one you are suggesting). Actually I have wanted to
> implement such functionality since I started to implement pgpool
> (that's why the parameter name is num_init_children, rather than
> num_children). Currently users have to set it large enough to
> handle the maximum load. That hurts the performance of accepting
> incoming connections. This is due to the famous "stampede syndrome".
> 
> To implement this we need to know whether all of the child processes are busy
> or not, and that requires interprocess communication using shared
> memory. pgpool does not have such functionality right
> now. However pgpool-II, the next generation of pgpool, already has it,
> and on-demand process forking would be easy to implement there.
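
For illustration only, the on-demand forking decision could look roughly like
the sketch below. It is not taken from pgpool or pgpool-II. It assumes the
parent can read the number of busy children (in a real implementation, from
shared memory), and the parameter names min_spare, max_spare and max_children
are hypothetical, loosely modeled on apache's MinSpareServers and
MaxSpareServers settings.

```python
def plan_children(total, busy, min_spare, max_spare, max_children):
    """Return how many children to fork (positive) or retire (negative).

    total        -- children currently alive
    busy         -- children currently serving a client
    min_spare    -- fork more when fewer than this many are idle
    max_spare    -- retire some when more than this many are idle
    max_children -- hard upper limit on the number of children
    """
    idle = total - busy
    if idle < min_spare:
        # Restore the spare margin, but never go past the hard limit.
        wanted = min_spare - idle
        return max(0, min(wanted, max_children - total))
    if idle > max_spare:
        # Too many idle children: retire the surplus.
        return -(idle - max_spare)
    return 0
```

The parent would call this periodically, so the pool can start small, which
keeps few processes waiting in accept(), and still grow when every child is
busy.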

You mentioned pgpool-II several times on the list. Could you please point me
to where I can read about it and follow its progress?
Thanks.

Michael

> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> 
>