[Pgpool-general] Several questions about pgpool

Thu Aug 10 18:10:29 UTC 2006

On Thursday 10 August 2006 04:22 am, Taiki Yamaguchi wrote:
> 
> Michael Ulitskiy wrote:
> > On Tuesday 08 August 2006 09:51 pm, Taiki Yamaguchi wrote:
> >> Hi,
> >>
> >> Michael Ulitskiy wrote:
> >>> Hello,
> >>>
> >>> I'm using pgpool 3.1 and recently discovered several things that
> >>> I'm not sure whether they're bugs or intended. I'd appreciate if someone
> >>> can enlighten me on that.
> >>> 1. Will pgpool fork additional children when the available children are exhausted?
> >>> I was under the impression it would, but testing shows otherwise. Connecting clients just block
> >>> until another client disconnects and a pgpool process becomes available.
> >> No. pgpool only forks children up to num_init_children. If there is no
> >> available child left, other connections will be queued until one of
> >> the children becomes available.
> >>
> >>> 2. I thought that each pgpool process can serve "max_pool" number of usernames, multiplexing
> >>> queries from different usernames. My testing shows it doesn't work. I set 
> >>> num_init_children=4
> >>> max_pool=4
> >>> So I expect to be able to connect 16 clients using 4 different usernames, but after 4 clients
> >>> connected (with different usernames) the 5th blocks waiting for an available pgpool process.
> >>> Is my understanding wrong? What is the purpose of max_pool parameter then?
> >> Each pgpool child process *pools* up to max_pool connections (keyed on
> >> username, database, and protocol version). In your case, pgpool will
> >> pool at most 16 connections (4 children x 4 pooled connections each), but
> >> only 4 clients can connect concurrently, one per child.
> >>
> >>> 3. I'm using load-balancing mode with replication done by Slony-I. It seems that when a client
> >>> is connected, the pgpool child opens connections to both the primary and secondary backends,
> >>> which means that the number of concurrently served clients does not depend on the number
> >>> of backends, right? In other words, if I have num_init_children=32 then I won't be able to
> >>> connect more than 32 clients regardless of the number of backends I'm using?
> >> As I answered in question 1, pgpool only serves connections up to 
> >> num_init_children. The number of backends has nothing to do with it.
> >>
> >>> 4. This is more of a feature request. I have no idea how hard it is to do or whether it's possible
> >>> in the pgpool architecture. I'm expecting to have hundreds of clients connecting through pgpool
> >>> to the database cluster behind it. Some connections will be short-lived (several seconds),
> >>> others relatively long-lived (minutes to hours), and others permanent (daemons).
> >>> It would be really nice if pgpool children could process more than one connection, multiplexing
> >>> queries from different connections. Is it possible?
> >> I am not sure why you need this feature. pgpool can handle hundreds of
> >> clients if you set num_init_children to suit your application. When a
> >> short-lived client disconnects, the child process that was serving that
> >> client becomes available for the next incoming connection.
> > 
> > Thanks for the info.
> 
> No problem :)
> 
> > I need this feature because at the moment I already have around 80 idle postmaster processes
> > on each of 2 backends and this number is expected to grow significantly. When I say "idle"
> > I mean the client is still connected to the database, but doing some other work. I don't think that having
> > hundreds of idle postmasters is good.
> > Currently I have 2 choices: disconnect after every query or implement a connection manager
> > either on application side or in middleware. Since pgpool is actually a kind of middleware, 
> > I think it's very logical for it to have connection manager functionality.
> > I'd imagine it like this: 
> > - when a client connection comes in, pgpool checks if there's another connection
> >   with the same username/database. If no - open connections to the backends. If yes - do nothing, wait for a query.
> 
> pgpool already does this. The only thing that differs from what you are
> probably expecting is that a pgpool child process checks whether a
> connection with the same username, database, and protocol version exists
> only within itself, not in other child processes.

That's a fundamental difference :)

> > - when a client query comes in, check whether there are currently idle connections to the backend with the
> >   same credentials. If yes - use one. If no - open a new connection and use it.
> 
> I think it is faster to have another child process handle the new
> incoming connection than to do this checking and make one child process
> handle multiple client connections. Think about the case where a daemon
> starts working while its child is busy processing other, time-consuming
> connections.

The whole idea is to break the one-to-one relationship between client, pgpool process, and backend process.
If a daemon starts working, its queries should be sent to an idle backend connection if one exists, not wait
until its assigned child finishes processing something time-consuming.
If there are no idle backend connections, then one should be created. This way you keep unneeded backend
connections to a minimum while still having the advantages of pgpool - pooling, load-balancing, etc.
Again, as I imagine it, the client connections and backend connections within pgpool should be separated.
Each of them should have a set of properties - username/database/protocol/whatever. Then at any moment
when a client sends a query, an idle backend connection with a matching set of properties is used to serve it.
I think if implemented this would cause some additional overhead, but not a significant amount,
and the win on busy systems (because you don't have to keep hundreds of backends in memory)
will be well worth it.
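
To make the idea concrete, here is a rough sketch in Python. It is not pgpool code: the class and
method names are made up for illustration, connect() is a stand-in for opening a real backend
connection, and it ignores transactions, session state, and the fact that pgpool children are
separate processes.

```python
from collections import defaultdict


class BackendPool:
    """Toy pool that hands idle backend connections to whichever client needs one."""

    def __init__(self, connect):
        # connect(key) stands in for opening a real backend connection.
        self._connect = connect
        self._idle = defaultdict(list)  # key -> idle backend connections
        self.opened = 0                 # total backend connections ever opened

    def acquire(self, user, database, protocol):
        key = (user, database, protocol)
        if self._idle[key]:
            return key, self._idle[key].pop()  # reuse an idle connection
        self.opened += 1
        return key, self._connect(key)         # none idle: open a new one

    def release(self, key, conn):
        # The query is finished, so the backend connection is idle again.
        self._idle[key].append(conn)


pool = BackendPool(connect=lambda key: object())

# Two queries with the same credentials arrive one after another.
key, conn = pool.acquire("michael", "appdb", 3)
pool.release(key, conn)
key, conn2 = pool.acquire("michael", "appdb", 3)
# Both were served by the same backend connection; a second one is opened
# only if a query arrives while the first connection is still busy.
```

The point of keying on the property set rather than on the client is that the number of backend
connections tracks the number of queries in flight, not the number of connected clients.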

> > Also it would be very nice for pgpool to fork additional children as needed, as I believe it's always
> > good for an application to be able to adapt to run-time conditions instead of solely depending on
> > manual configuration.
> 
> pgpool adopts a pre-fork model in which all children are created when
> pgpool is started, just like apache2. The reason for this is that the
> overhead of forking additional processes degrades performance,
> and many applications may connect to the backend thousands of times, with
> each connection living only for a short period of time.

In no way am I suggesting abandoning the pre-fork architecture. What I'm saying is that
it would be nice if pgpool started forking additional children after the pre-forked ones
are exhausted, or better yet, if it could maintain a number of unused children at
all times, much like apache does.
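
Roughly, the parent process would periodically run a check like the one sketched below in Python.
This is only an illustration: min_spare and max_children are hypothetical names (loosely modeled on
apache's MinSpareServers idea), not existing pgpool parameters.

```python
def children_to_fork(idle, total, min_spare, max_children):
    """Return how many extra children to fork so that min_spare stay idle.

    idle         - children currently waiting for a connection
    total        - all children currently running
    min_spare    - hypothetical setting: idle children to keep ready
    max_children - hypothetical hard cap on the number of children
    """
    wanted = max(0, min_spare - idle)       # shortfall of idle children
    room = max(0, max_children - total)     # how many the cap still allows
    return min(wanted, room)


# All 4 pre-forked children are busy; keep 2 spare, never exceed 32.
extra = children_to_fork(idle=0, total=4, min_spare=2, max_children=32)
```

Since the forking happens ahead of demand, an incoming client would still find a ready child, so the
fork cost stays off the connection path just as with the current pre-fork model.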

> Moreover, others may want to prevent an unlimited number of client
> connections from being made to the backend. To control that, you need a
> parameter to limit the max number of connections made to the backend anyway.

Configuration parameters (something like "max_backend_connections" and "max_client_connections")
might be the best choice for that, again much like one can do in apache.
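
As a sketch of how two separate limits could work (Python again, purely illustrative: both parameter
names are the hypothetical ones suggested above, and the values are arbitrary):

```python
# Hypothetical limits; neither name is an existing pgpool setting.
LIMITS = {"max_client_connections": 200, "max_backend_connections": 40}


def can_accept_client(clients, limits=LIMITS):
    """True if one more client connection may be accepted."""
    return clients < limits["max_client_connections"]


def can_open_backend(backends, limits=LIMITS):
    """True if one more backend connection may be opened."""
    return backends < limits["max_backend_connections"]


# 150 clients are connected but only 40 backend connections exist:
# new clients are still accepted, while a query that finds no idle
# backend connection has to wait instead of opening a 41st one.
accept = can_accept_client(150)
open_new = can_open_backend(40)
```

Keeping the two limits independent is what lets many mostly-idle clients share a small number of
backends.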

Michael

> --
> Taiki Yamaguchi
> 
> > Again I have no idea how hard that is to implement. I realize that complexity and/or pgpool architecture
> > may prohibit it, but I believe this would be a very nice and logical addition. 
> > What do you think?
> > 
> > Michael
> > 
> >> --
> >> Taiki Yamaguchi
> >>
> >>> Thanks,
> >>> Michael 
> >>> _______________________________________________
> >>> Pgpool-general mailing list
> >>> Pgpool-general at pgfoundry.org
> >>> http://pgfoundry.org/mailman/listinfo/pgpool-general
> >>>
> >>>
> >>>
> >>