[Pgpool-general] Several questions about pgpool

Michael Ulitskiy mulitskiy at acedsl.com
Mon Aug 14 15:26:53 UTC 2006


On Monday 14 August 2006 05:31 am, Taiki Yamaguchi wrote:
> 
> Michael Ulitskiy wrote:
> > On Sunday 13 August 2006 09:02 pm, you wrote:
> >>>> To be honest, I doubt your idea makes things any better. In the long
> >>>> run, connection pools for a particular username/database pair will be
> >>>> distributed evenly among pgpool processes, so trying to find a
> >>>> username/database pair in another process would be a waste of time.
> >>>> Also, I'm not clear about your definition of "idle connection". If you
> >>>> mean a connection that is waiting for the next transaction from a
> >>>> client, not one whose client has already disconnected, then reusing
> >>>> such a connection will break the concept of a "session" in the
> >>>> database application.
> >>> What you said is probably true for small installations that serve 10-20 concurrent
> >>> short-lived connections. I'm talking about bigger installations that have to
> >>> serve hundreds of concurrent connections.
> >> Odd. If the number of concurrent transactions becomes larger,
> >> then the distribution of connections should become more uniform, no?
> > 
> > Hm, what do you mean by more uniform? I don't worry about connection distribution, but
> > about the number of connections. I'll give you an example. Imagine you have a server farm of,
> > let's say, 50 servers that are "workers" or "nodes" of a cluster setup. Each node has an identical
> > setup, and each node needs, let's say, 3 permanent connections to the database for
> > several daemons running on it, with each node using usernames "user1", "user2" and
> > "user3" for DB connections. In this case we will have 150 connections to pgpool. This is unavoidable,
> > but we will also have 150 database connections, which is what I'm trying to avoid.
> > Suppose the application I'm running is not very database intensive and those connections are
> > "idle" most of the time. In my particular application I'd estimate that at most 20 backend connections
> > would be able to serve all the needs if client connections could be multiplexed onto DB connections.
> > Do you think I'm talking about something marginal?
> 
> I'm not sure what everybody else thinks, but 150 doesn't sound that big 
> to me...

This is an imaginary example to demonstrate the problem. You can multiply these figures as needed 
until you feel it's high enough :) 
Also, I usually use a rule of thumb that if the number of concurrent connections exceeds 100,
then something needs to be done about it.
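
To make the multiplexing I have in mind a bit more concrete, here is a rough sketch in Python (purely
hypothetical, obviously not pgpool code; "connect" stands in for whatever opens a real backend connection):

    import queue
    import threading

    class BackendPool:
        """A small pool of real DB connections for one (user, database) pair."""
        def __init__(self, connect, size=20):
            self._idle = queue.Queue()
            for _ in range(size):
                self._idle.put(connect())      # open the real backend connections up front

        def run_transaction(self, work):
            conn = self._idle.get()            # block until some backend is idle
            try:
                return work(conn)              # run exactly one transaction on it
            finally:
                self._idle.put(conn)           # hand it straight back to the pool

    _pools = {}
    _lock = threading.Lock()

    def pool_for(user, database, connect):
        # One pool per "session key"; clients with the same key share backends.
        with _lock:
            key = (user, database)
            if key not in _pools:
                _pools[key] = BackendPool(connect)
            return _pools[key]

The point is just that a backend is tied to a client for the length of a transaction, not for the
length of the client connection, so 150 mostly-idle clients don't each hold a backend hostage.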
 
> Can't you disconnect your daemon connections if they are not busy? That 
> would solve the problem here, perhaps. I don't understand why daemons 
> need to be connected all the time when they are not doing anything.

Please see my reply to Tatsuo Ishii.

> -- yamaguti
> 
> > 
> >>> With such a load, just distributing clients
> >>> among processes isn't good enough. What I'm suggesting is exactly the same
> >>> as the "connection managers" implemented in every middleware server I have seen or heard about,
> >>> like Weblogic/Websphere/Jboss etc. I don't think they've implemented it just as an
> >>> exercise. Those packages are specifically designed for large installations, and I think
> >>> it was realized that this approach is beneficial. As I said before, pgpool is a kind of
> >>> middleware between database clients and database backends. It already does many
> >>> very useful things like pooling, balancing etc. In my opinion it makes perfect sense
> >>> to at least accept that implementing connection manager functionality in pgpool
> >>> would be a very nice and desirable addition.
> >> Your arguments don't sound very strong to me. It seems you are saying
> >> that "the big names do it like this, so we should do it like them too".
> > 
> > This is not what I meant to say. What I meant to say was that I think they've done it for a reason.
> > If you can suggest a solution better than the "connection manager" I described, I'd very much appreciate it.
> > 
> >>> Yes, by "idle connection" I mean a connection which is idle at the moment,
> >>> i.e. a connection waiting for the next transaction after the previous transaction has
> >>> finished. Now, I'm not sure how you define a "database session". I'd define it as a connection
> >>> having a set of properties: username/database/protocol/etc. If another client with the same
> >>> properties starts a transaction, I don't see why the "database session" serving the first client
> >>> cannot be reused.
> >>> Yes, it will cause some overhead and require some IPC, but I believe for large installations it is well worth it.
> >>> It will also probably cause problems with clients that change run-time backend configuration with
> >>> the SET command, but I don't think that's a show-stopper. I think a workaround for this can be devised.
> >> That's exactly what I'm worrying about. I don't think the workaround is
> >> trivial. Do you know how these big names handle this problem?
> > 
> > I don't know, but I can propose several workarounds off the top of my head. The first and simplest
> > is to require clients that use these commands to reset them after each query, similar to the way you now require
> > that calls to functions with side effects be issued in a manner that prevents them from being load-balanced.
> > In other words, you can make it clear in the documentation and push the responsibility onto the users.
> > Another way that comes to mind would be to use dedicated backend connections for such clients, i.e. to handle
> > them the same way pgpool works now.
> > And finally, you could watch which SET commands a client issues and attach them to the session properties, so no other
> > clients (unless they issued the same SET commands) would match.
> >  
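
To illustrate that last workaround, here is a hypothetical sketch in Python of folding the SET commands
a client has issued into its session key, so only clients with identical settings would share a backend
(names invented, parsing deliberately naive):

    def session_key(user, database, applied_sets):
        # applied_sets: e.g. {"search_path": "app, public"}
        return (user, database, tuple(sorted(applied_sets.items())))

    class ClientState:
        def __init__(self, user, database):
            self.user = user
            self.database = database
            self.applied_sets = {}

        def observe(self, sql):
            # Extremely naive "parser", just to show the bookkeeping.
            words = sql.strip().rstrip(";").split()
            if len(words) >= 4 and words[0].upper() == "SET" and words[2].upper() in ("=", "TO"):
                self.applied_sets[words[1].lower()] = " ".join(words[3:])

        def key(self):
            return session_key(self.user, self.database, self.applied_sets)

A client that never issues SET keeps the plain (user, database) key and shares freely; one that does is
effectively matched only with backends carrying the same settings.
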
> >>>> Changing the number of processes in the pool on demand would be nice (but
> >>>> for a different reason than the one you are suggesting). Actually, I have wanted to
> >>>> implement such functionality since I started to implement pgpool
> >>>> (that's why the parameter name is num_init_children, rather than
> >>>> num_children). Currently users have to set it large enough to handle
> >>>> the maximum expected load, which hurts the performance of accepting
> >>>> incoming connections. This is due to the famous "stampede syndrome" (the thundering herd problem).
> >>>>
> >>>> To implement this we need to know whether all of the child processes are busy
> >>>> or not, and that requires interprocess communication using shared
> >>>> memory. pgpool does not have such functionality right
> >>>> now. However, pgpool-II, the next generation of pgpool, already has it,
> >>>> so it would be easy to implement on-demand process forking.
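
For what it's worth, here is how I picture that mechanism, as a toy sketch in Python (invented numbers
and stubbed-out work; certainly not the pgpool-II implementation):

    import random
    import time
    import multiprocessing as mp

    MAX_CHILDREN = 128

    def child_loop(slot, busy):
        while True:
            time.sleep(random.uniform(0.0, 2.0))   # stand-in for waiting in accept()
            busy[slot] = 1                          # tell the parent we're serving a client
            time.sleep(0.5)                         # stand-in for serving one session
            busy[slot] = 0

    def parent_loop(num_init_children=4):
        busy = mp.Array('i', MAX_CHILDREN)          # one busy flag per potential child, in shared memory
        alive = num_init_children
        for slot in range(alive):
            mp.Process(target=child_loop, args=(slot, busy), daemon=True).start()
        while True:
            # Fork another child only when every running one is busy, instead of
            # pre-forking enough children for the worst case up front.
            if alive < MAX_CHILDREN and all(busy[i] for i in range(alive)):
                mp.Process(target=child_loop, args=(alive, busy), daemon=True).start()
                alive += 1
            time.sleep(0.1)

    if __name__ == "__main__":
        parent_loop()
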
> >>> You mentioned pgpool-II several times on the list. Could you please point me
> >>> to where I can read about it and watch its progress?
> >>> Thanks.
> >> The progress of the project is not public at the moment, but it will be
> >> finished by the end of this August anyway. pgpool-II will be
> >> distributed as open source software via pgfoundry in September. Also,
> >> the presentation material from the PostgreSQL Anniversary Summit is
> >> available at:
> >>
> >> http://www.sraoss.co.jp/event_seminar/2006/pgpool_feat_and_devel.pdf
> >>
> >> If you want to obtain the source code before September, please let me
> >> know.
> > 
> > Thanks for the info. I'll check that link. At the moment I'd just like to see where it's going.
> > 
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >>
> 

