[Pgpool-general] Several questions about pgpool

Michael Ulitskiy mulitskiy at acedsl.com
Mon Aug 14 15:21:27 UTC 2006


On Monday 14 August 2006 04:51 am, you wrote:
> > > Odd. If the number of concurrent transactions becomes larger,
> > > then the distribution of connections should become more uniform, no?
> > 
> > Hm, what do you mean by more uniform? I'm not worried about connection distribution, but
> > about the number of connections. I'll give you an example. Imagine you have a server farm of,
> > let's say, 50 servers that are "workers" or "nodes" of a cluster setup. Each node has an identical
> > setup, and each node needs, let's say, 3 permanent connections to the database for
> > several daemons running on it. Again, each node uses the usernames "user1", "user2" and
> > "user3" for DB connections. In this case we will have 150 connections to pgpool. That is unavoidable,
> > but we will also have 150 database connections, which is what I'm trying to avoid.
> > Suppose the application I'm running is not very database-intensive and those connections are
> > idle most of the time. In my particular application I'd estimate that 20 backend connections at most
> > would be able to serve all the needs, if client connections could be multiplexed onto DB connections.
> > Do you think I'm talking about something marginal?
> 
> Why don't you ask the programmers to release database connections as soon
> as the database processing is finished, then?

In other words you're suggesting reopening the database connection for every query - which brings us back to
the point I made at the very beginning of this discussion, that I see two ways to deal with it:
1. reopening the connection for every transaction
2. employing a connection manager
I don't think the 1st method is always possible - think of cases where I don't have control over the source code - and,
more importantly, I don't think it's always practical or makes sense from a performance point of view.
When I say that a connection is idle most of the time, that doesn't mean it's idle for minutes.
Suppose those daemons issue a SELECT query on average every second, and the processing time of
those SELECTs averages 10 ms. The connection will be idle 99% of the time, but I don't think that
having 150 clients reconnecting/reauthenticating every second is a wise design.
After all, applications open permanent connections precisely so that they are readily available when
needed and clients can be served ASAP.
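To make the "connection manager" idea above concrete, here is a minimal sketch (my own illustration, not pgpool code; the pool size of 20 and the function names are assumptions taken from the estimate in this thread). 150 client connections stay permanently open, but each one borrows a backend connection from a small fixed pool only for the ~10 ms a query actually runs, then returns it:

```python
import queue
import threading
import time

BACKENDS = 20        # assumed backend pool size, from the estimate above
CLIENTS = 150        # 50 nodes x 3 daemons per node
QUERY_TIME = 0.010   # ~10 ms per SELECT, as in the example

# The "connection manager": a small fixed pool of backend connections
# that clients borrow for the duration of one query, then return.
pool = queue.Queue()
for i in range(BACKENDS):
    pool.put(f"backend-{i}")  # stand-ins for real database connections

served = 0
served_lock = threading.Lock()

def run_client(queries: int) -> None:
    """One daemon's client connection: it stays open the whole time,
    but holds a *backend* connection only while a query is running."""
    global served
    for _ in range(queries):
        conn = pool.get()           # borrow a backend (blocks if all 20 are busy)
        try:
            time.sleep(QUERY_TIME)  # stand-in for executing the SELECT
        finally:
            pool.put(conn)          # return it immediately after the query
        with served_lock:
            served += 1

threads = [threading.Thread(target=run_client, args=(3,)) for _ in range(CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{served} queries served through {BACKENDS} backend connections")
```

With 150 clients at 1 query/second and 10 ms per query, the offered load is only about 1.5 backend-seconds per second, so 20 backends leave ample headroom; the clients never reconnect or reauthenticate.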
 
> > > Your arguments do not sound very strong to me. It seems you are saying
> > > that "big names do it like this, then we should do it like them too".
> > 
> > This is not what I meant to say. What I meant was that I think they've done it for a reason.
> > If you can suggest a solution better than the "connection manager" I described, I'd very much appreciate it.
> 
> Looks like a circular argument:-) I think you should tell me the reason
> why the big names do it like that first.

I think I made myself as clear as I could. The reason is to better distribute available resources.
It is my understanding that a backend connection, even an idle one, eats up backend resources in terms
of memory, sockets, select/poll calls, etc. Every piece of documentation I've read says it's better
to keep the number of concurrent clients to a minimum. I also think it is a commonly
held view that postgres performance degrades as the number of concurrent connections grows.

> > > That's exactly what I'm worrying about. I don't think the workaround is
> > > trivial. Do you know how these big names handle this problem?
> > 
> > I don't know, but I can propose several workarounds off the top of my head. The first and simplest
> > is to require clients that use these commands to reset them after each query, similar to the way you now require
> > that calls to functions with side effects be issued in a manner that prevents them from being load-balanced.
> 
> My guess is they do exactly what we are doing now. Probably the only
> difference is that they use a thread pool while we are using a process pool. I
> don't think they allow stealing a not-yet-finished session (probably
> the thread would be marked as "busy").

Well, I don't think so. At least that is not what they advertise.
It doesn't matter anyway. I brought those names up as an example to make it
easier to understand what I mean. Again,
I can't see any way to cope with an ever-increasing number of concurrent backend connections
other than a connection manager. If you don't agree, please tell me what you think would be a better
solution.

> > In other words you can make it clear in the documentation and push the responsibility to the users.
> > Another way that comes to my mind would be to use dedicated backend connections for such clients, i.e. to work
> > with them the same way pgpool works now.
> > And finally, you can watch which SET commands are issued by a client and attach them to the session properties, so no other
> > client (unless it issued the same SET commands) would match.
> 
> Again, I think you'd better teach the programmers to release the database
> connection earlier. I think pgpool is designed for short sessions in large
> numbers, something like web-based systems.

pgpool may be designed to handle any number of sessions. Unfortunately, postgres isn't designed to
handle a large number of concurrent clients, and pgpool doesn't help with that in any way, although,
in my opinion, it's the logical place to do it (short of redesigning postgres, of course).
As such, I'd say that the pgpool-postgres pair at the moment is not designed to handle
a large number of concurrent sessions.
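The SET-tracking workaround quoted earlier in this thread could be pictured roughly as follows. This is purely my own illustration - the class and method names are invented, not pgpool internals: the pooler fingerprints each client's accumulated SET commands and only reuses an idle backend whose session state matches (otherwise it takes any idle backend and replays the client's SETs on it):

```python
# Hypothetical sketch: match clients to pooled backends by SET-command state.

class Backend:
    def __init__(self, name: str) -> None:
        self.name = name
        self.set_state = frozenset()  # SET commands replayed on this backend so far

class SessionAwarePool:
    def __init__(self, backends) -> None:
        self.idle = list(backends)

    def acquire(self, client_sets: frozenset) -> Backend:
        # Prefer an idle backend whose SET state already matches the client's.
        for b in self.idle:
            if b.set_state == client_sets:
                self.idle.remove(b)
                return b
        # Otherwise take any idle backend and replay the client's SETs on it.
        b = self.idle.pop()
        b.set_state = client_sets  # stand-in for actually issuing the SET commands
        return b

    def release(self, b: Backend) -> None:
        self.idle.append(b)

pool = SessionAwarePool([Backend("b0"), Backend("b1")])
plain = pool.acquire(frozenset())                          # client with default session
tuned = pool.acquire(frozenset({"SET datestyle TO ISO"}))  # client that issued a SET
pool.release(plain)
pool.release(tuned)
# A later client with the same SET fingerprint gets the matching backend back:
again = pool.acquire(frozenset({"SET datestyle TO ISO"}))
print(again.name)  # the backend that already carries the matching SET state
```

This keeps the "no other clients would match" property from the proposal above: a backend carrying session-level SET state is only handed to clients with the identical fingerprint, or has the state replaced before reuse.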

> > > > You mentioned pgpool-II several times on the list. Could you please point me
> > > > to where I can read about it and watch it progress?
> > > > Thanks.
> > > 
> > > The progress of the project is not open at the moment but it will be
> > > finished by the end of this August anyway. pgpool-II will be
> > > distributed as open source software via pgfoundry in September. Also
> > > the presentation material at the PostgreSQL Anniversary Summit is
> > > placed at:
> > > 
> > > http://www.sraoss.co.jp/event_seminar/2006/pgpool_feat_and_devel.pdf
> > > 
> > > If you want to obtain the source code before September, please let me
> > > know.
> > 
> > Thanks for the info. I'll check that link. At the moment I'd just like to see where it's going.
> 
> You are welcome. Please feel free to ask questions regarding
> pgpool-II.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> 

