[Pgpool-general] Several questions about pgpool

Taiki Yamaguchi yamaguchi at sraoss.co.jp
Mon Aug 14 09:31:23 UTC 2006



Michael Ulitskiy wrote:
> On Sunday 13 August 2006 09:02 pm, you wrote:
>>>> To be honest, I doubt your idea makes things any better. In the long
>>>> run, connection pools for a particular username/database pair in a pgpool
>>>> process will be equally distributed among processes, so trying to find
>>>> a username/database pair in another process would be a waste of
>>>> time. Also, I'm not clear about your definition of an "idle
>>>> connection". If you mean a connection waiting for the next transaction
>>>> from a client, rather than one whose client has already disconnected,
>>>> reusing such a connection will break the concept of a "session" in the
>>>> database application.
>>> What you said is probably true for small installations that serve 10-20 concurrent
>>> short-lived connections. I'm talking about bigger installations that have to
>>> serve hundreds of concurrent connections.
>> Odd. If the number of concurrent transactions becomes larger,
>> then the distribution of connections should become more uniform, no?
> 
> Hm, what do you mean by more uniform? I don't worry about the connection distribution, but
> about the number of connections. I'll give you an example. Imagine you have a server farm of,
> let's say, 50 servers that are "workers" or "nodes" of a cluster setup. Each node has an identical
> setup, and each node needs, let's say, 3 permanent connections to the database for
> several daemons running on it. Each node uses the usernames "user1", "user2" and
> "user3" for DB connections. In this case we will have 150 connections to pgpool. This is unavoidable,
> but we will also have 150 database connections, which is what I'm trying to avoid.
> Suppose the application I'm running is not very database-intensive and those connections are
> "idle" most of the time. In my particular application I'd estimate that 20 backend connections at most
> would be able to serve all the needs if client connections could be multiplexed onto DB connections.
> Do you think I'm talking about something marginal?
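The multiplexing proposed above can be sketched roughly as follows (all names are hypothetical, not pgpool internals): many logical clients for the same username/database pair borrow a backend connection only for the duration of a transaction, so 150 mostly idle clients need far fewer than 150 real connections.

```python
import queue

class MultiplexingPool:
    """Sketch: many idle clients share a few backend connections.

    A backend connection is borrowed only for the duration of a
    transaction and returned immediately afterwards, so 150 mostly
    idle clients can be served by far fewer real connections.
    All names here are illustrative, not pgpool internals.
    """

    def __init__(self, user, database, max_backends):
        self.key = (user, database)
        self._free = queue.Queue()
        for i in range(max_backends):
            # stand-in for a real libpq connection for this user/db pair
            self._free.put(f"backend-{user}-{database}-{i}")

    def run_transaction(self, work):
        conn = self._free.get()      # blocks if all backends are busy
        try:
            return work(conn)
        finally:
            self._free.put(conn)     # the connection is reusable at once

pool = MultiplexingPool("user1", "db1", max_backends=2)
# 150 logical client transactions, but only 2 backend connections exist
results = [pool.run_transaction(lambda c: c) for _ in range(150)]
```

In this sketch the blocking `get()` is what turns "hundreds of mostly idle clients" into a small, bounded number of database connections.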

I'm not sure what everybody else thinks, but 150 doesn't sound that big 
to me...

Can't you disconnect your daemon connections if they are not busy? That 
would solve the problem here, perhaps. I don't understand why daemons 
need to be connected all the time when they are not doing anything.
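The suggestion above — connect on demand rather than holding a permanent connection — can be sketched like this (a rough Python sketch; `connect()` and the connection object are hypothetical stand-ins for a real database driver):

```python
class LazyConnection:
    """Sketch: a daemon holds no backend connection while idle; it
    connects only when there is work and drops the connection once
    the work is done. All names here are hypothetical."""

    def __init__(self):
        self.conn = None
        self.opens = 0   # how many times we actually connected

    def _connect(self):
        self.opens += 1
        self.conn = object()      # stand-in for a real connection

    def query(self, sql):
        if self.conn is None:
            self._connect()       # open only when there is work
        result = f"ran: {sql}"
        self.conn = None          # disconnect instead of sitting idle
        return result

daemon = LazyConnection()
daemon.query("SELECT 1")
daemon.query("SELECT 2")
```

The trade-off, of course, is the connection-setup cost on every burst of work, which is exactly what pooling in front of the database is meant to absorb.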

-- yamaguti

> 
>>> With such a load, just distributing clients
>>> among processes isn't good enough. What I'm suggesting is exactly the same
>>> as the "connection managers" implemented in every middleware server I've seen or heard about,
>>> like WebLogic/WebSphere/JBoss, etc. I don't think they implemented it just as an
>>> exercise. Those packages are specifically designed for large installations, and I think
>>> it was realized that this approach is beneficial. As I said before, pgpool is a kind of
>>> middleware between database clients and database backends. It already does many
>>> very useful things like pooling, balancing, etc. In my opinion it makes perfect sense
>>> to at least accept that implementing connection-manager functionality in pgpool
>>> would be a very nice and desirable addition.
>> Your arguments don't sound very strong to me. It seems you are saying
>> that "the big names do it like this, so we should do it like them too".
> 
> This is not what I meant to say. What I meant was that I think they've done it for a reason.
> If you can suggest a better solution than the "connection manager" I described, I'd very much appreciate it.
> 
>>> Yes, by "idle connection" I mean a connection which is idle at the moment,
>>> i.e. a connection waiting for the next transaction after the previous transaction has
>>> finished. Now, I'm not sure how you define a "database session". I'd define it as a connection
>>> having a set of properties: username/database/protocol/etc. If another client with the same
>>> properties starts a transaction, I don't see why the "database session" serving the first client
>>> cannot be reused.
>>> Yes, it will cause some overhead and require some IPC, but I believe for large installations it is well worth it.
>>> It will also probably cause problems with clients that change the run-time backend configuration with the
>>> SET command, but I don't think that's a show-stopper. I think a workaround for this can be devised.
>> That's exactly what I'm worrying about. I don't think the workaround is
>> trivial. Do you know how these big names handle this problem?
> 
> I don't know, but I can propose several workarounds off the top of my head. The first and simplest
> is to require clients that use these commands to reset them after each query, similar to the way you now
> require that calls to functions with side effects be issued in a manner that prevents them from being load-balanced.
> In other words, you can make it clear in the documentation and push the responsibility onto the users.
> Another way that comes to mind would be to use dedicated backend connections for such clients, i.e. to work
> with them the same way pgpool works now.
> Finally, you could watch which SET commands are issued by a client and attach them to the session properties, so no other
> clients (unless they issued the same SET commands) would match.
>  
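The third workaround described above — folding observed SET commands into the session identity — could look roughly like this (hypothetical names, deliberately simplified SQL parsing):

```python
class SessionKey:
    """Sketch: fold SET commands into the session identity, so a
    backend is only reused by clients whose run-time settings match
    exactly. Names and parsing are illustrative, not pgpool code."""

    def __init__(self, user, database):
        self.user, self.database = user, database
        self.settings = {}   # run-time parameters changed via SET

    def observe(self, sql):
        # crude parse of "SET name TO value" / "SET name = value"
        parts = sql.strip().rstrip(";").split()
        if len(parts) == 4 and parts[0].upper() == "SET":
            self.settings[parts[1].lower()] = parts[3]

    def key(self):
        # two sessions match only if user, database, and every
        # observed SET command agree
        return (self.user, self.database,
                tuple(sorted(self.settings.items())))

a = SessionKey("user1", "db1")
a.observe("SET search_path TO app")
b = SessionKey("user1", "db1")
# a's backend must not be handed to b: their settings differ
mismatched = a.key() != b.key()
b.observe("SET search_path TO app")
# identical settings -> the backend is safe to share again
matched = a.key() == b.key()
```

A real implementation would also need to handle `RESET`, `SET LOCAL`, and quoted values, but the matching idea is the same.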
>>>> Changing the number of processes in the pool on demand would be nice (but
>>>> for a different reason than what you are suggesting). Actually, I have wanted to
>>>> implement such functionality since I started to implement pgpool
>>>> (that's why the parameter name is num_init_children, rather than
>>>> num_children). Currently users have to set it large enough to
>>>> handle the maximum load, which hurts the performance of accepting
>>>> incoming connections. This is due to the famous "stampede syndrome".
>>>>
>>>> To implement this we need to know whether all of the child processes are
>>>> busy or not, and that requires interprocess communication using shared
>>>> memory. pgpool does not have such functionality right
>>>> now. However pgpool-II, the next generation of pgpool, already has it,
>>>> and it would be easy to implement on-demand process forking.
>>> You mentioned pgpool-II several times on the list. Could you please point me
>>> to where I can read about it and watch it progress?
>>> Thanks.
>> The project is not developed in the open at the moment, but it will be
>> finished by the end of this August anyway. pgpool-II will be
>> distributed as open source software via pgFoundry in September. Also,
>> the presentation material from the PostgreSQL Anniversary Summit is
>> available at:
>>
>> http://www.sraoss.co.jp/event_seminar/2006/pgpool_feat_and_devel.pdf
>>
>> If you want to obtain the source code before September, please let me
>> know.
> 
> Thanks for the info. I'll check that link. At the moment I'd just like to see where it's going.
> 
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>>
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general