[pgpool-hackers: 3716] Re: discuss of pgpool enhancement

Wed Jul 15 20:38:58 JST 2020

On Tue, Jul 14, 2020 at 4:40 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> > Hi Jianshen,
> >
> > I think it is a very good idea to have on-demand spawning of the
> > child processes and it will enable us to effectively configure the
> > Pgpool-II in environments where we get instantaneous connection spikes.
> > Currently we have to configure the Pgpool's num_init_children to a
> > value of "maximum number of connections expected" and most of the
> > time, 50 to 60 percent of child processes keep sitting idle and
> > consuming system resources.
> >
> > Similarly, I also agree with having an option of the global connection
> > pool and I believe that will enable us to have fewer open backend
> > connections and also in the future we can build different types of
> > pooling options on that, like transaction pooling and similar features.
>
> Not sure about this. Suppose we have num_init_children = 4 and max_pool
> = 1. We have 4 users u1, u2, u3 and u4. For simplicity, all those
> users connect to the same database.
>
> Current Pgpool-II:
> 1. u1 connects to pgpool child p1 and creates connection pool p_u1.
> 2. u2 connects to pgpool child p2 and creates connection pool p_u2.
> 3. u3 connects to pgpool child p3 and creates connection pool p_u3.
> 4. u4 connects to pgpool child p4 and creates connection pool p_u4.
>
> Pgpool-II with global connection pooling.
> 1. u1 connects to pgpool child p1 and creates connection pool p_u1.
> 2. u2 connects to pgpool child p2 and creates connection pool p_u2.
> 3. u3 connects to pgpool child p3 and creates connection pool p_u3.
> 4. u4 connects to pgpool child p4 and creates connection pool p_u4.
>
> So there's no difference with/without global connection pooling in the
> end.
>
> The case where global connection pooling wins would be when the number
> of distinct users and concurrent connections is much smaller than
> num_init_children. For example, if there's only one user u1 and
> there's only one concurrent connection, we will have:
>
> Current Pgpool-II:
> 1. u1 connects to pgpool child p1 and creates connection pool p_u1.
> 2. u1 connects to pgpool child p2 and creates connection pool p_u2.
> 3. u1 connects to pgpool child p3 and creates connection pool p_u3.
> 4. u1 connects to pgpool child p4 and creates connection pool p_u4.
>
> Pgpool-II with global connection pooling.
> 1. u1 connects to pgpool child p1 and creates connection pool p_u1.
> 2. u1 connects to pgpool child p2 and reuses connection pool p_u1.
> 3. u1 connects to pgpool child p3 and reuses connection pool p_u1.
> 4. u1 connects to pgpool child p4 and reuses connection pool p_u1.
>
> Thus global connection pooling wins by having only 1 connection pool if
> the number of distinct users and concurrent connections is much smaller
> than num_init_children.
>
> But the question is, if we have only 1 concurrent session,
> num_init_children = 1 would be enough in the first place. In this case
> we will have a similar result with current Pgpool-II.
>
> 1. u1 connects to pgpool child p1 and creates connection pool p_u1.
> 2. u1 connects to pgpool child p1 and reuses connection pool p_u1.
> 3. u1 connects to pgpool child p1 and reuses connection pool p_u1.
> 4. u1 connects to pgpool child p1 and reuses connection pool p_u1.
>
> So there's no point in having a global connection pool here.
>
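
The scenarios above can be counted with a few lines of Python. This is
only a toy model under the same simplifications as the example (one
database, max_pool = 1); count_pools and the scenario lists are
hypothetical names for illustration, not Pgpool-II code.

```python
def count_pools(connections, global_pool):
    """Count backend connection pools created for a list of
    (user, child) pairs. With a global pool, entries are keyed by user
    only; with per-child pools, by (user, child)."""
    pools = set()
    for user, child in connections:
        pools.add(user if global_pool else (user, child))
    return len(pools)


# Four users, each landing on a different child.
four_users = [("u1", "p1"), ("u2", "p2"), ("u3", "p3"), ("u4", "p4")]
# One user whose successive sessions land on different children.
one_user_spread = [("u1", "p1"), ("u1", "p2"), ("u1", "p3"), ("u1", "p4")]
# One user with num_init_children = 1.
one_user_one_child = [("u1", "p1")] * 4

for name, conns in [("four users", four_users),
                    ("one user, 4 children", one_user_spread),
                    ("one user, 1 child", one_user_one_child)]:
    print(name, count_pools(conns, False), count_pools(conns, True))
```

Only the middle scenario shows a difference (4 pools locally, 1 with a
global pool), which matches the argument above.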

Yes, I agree that with the current architecture there is no real
advantage in having a global connection pool.

What I meant to say was: I think if we go with the on-demand child
process spawning feature, then the global connection pool would have
some clear advantages.

With the on-demand child spawning feature implemented, Pgpool-II will
be spawning and killing child processes continuously (as per the
connection load on the system). In that case, having a global
connection cache would be helpful, because then we can ensure that the
backend connection cache stays intact even while child processes are
being spawned and killed continuously.

Secondly, I was thinking that having a local connection cache with the
on-demand child spawn feature would make the child scale-down logic much
more complex, and we would need to come up with some algorithm that
takes into account each child's connection cache state for selecting the
victim child processes to be killed.

But these are just the thoughts I had off the top of my head when I
read the email, and they require more grooming.
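
To make the two points concrete, here is a minimal Python sketch. It is
a toy model, not Pgpool-II code: LocalCaches, GlobalCache and
pick_victim are hypothetical names, the "fewest cached connections"
victim rule is just one possible policy, and max_pool > 1 is assumed so
that one child can hold more than one cached connection.

```python
class LocalCaches:
    """Per-child caches: cached backend connections die with their child."""

    def __init__(self):
        self.by_child = {}  # child id -> set of (user, database) keys

    def cache(self, child, key):
        self.by_child.setdefault(child, set()).add(key)

    def kill(self, child):
        self.by_child.pop(child, None)  # the child's cache is lost

    def cached_keys(self):
        return set().union(*self.by_child.values())

    def pick_victim(self, idle_children):
        # Scale-down has to look at cache state: here we kill the idle
        # child holding the fewest cached connections.
        return min(idle_children,
                   key=lambda c: len(self.by_child.get(c, ())))


class GlobalCache:
    """One shared cache: killing a child does not touch it."""

    def __init__(self):
        self.keys = set()

    def cache(self, child, key):
        self.keys.add(key)

    def kill(self, child):
        pass  # nothing to clean up, so any idle child is a fine victim

    def cached_keys(self):
        return set(self.keys)


local, shared = LocalCaches(), GlobalCache()
for pool in (local, shared):
    pool.cache("p1", ("u1", "db1"))
    pool.cache("p2", ("u2", "db1"))
    pool.cache("p2", ("u3", "db1"))

victim = local.pick_victim(["p1", "p2"])
for pool in (local, shared):
    pool.kill(victim)

print(victim, len(local.cached_keys()), len(shared.cached_keys()))
```

With local caches, killing the victim loses the cached connection for
("u1", "db1"); with the shared cache all three entries survive and no
victim-selection logic is needed.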

> > IMHO we should take both of these features as separate projects.
> > We can start with the on-demand child spawning feature and once we have
> > that in Pgpool-II we build the global connection pool option on top of
> > that.
> >
> > So if you are interested in working on that, you can send a proposal
> > and include details like how you are planning to manage the
> > child-process-pool and when Pgpool-II will spawn and destroy the
> > child processes.
> > My idea would be to make the child-process-pool as configurable as
> > possible.
> > Some of the configuration parameters I can think of for the purpose are:
> >
> > CPP_batch_size          /* number of child processes we will spawn
> >                          * when required */
> >
> > CPP_downscale_trigger   /* number of idle child processes in Pgpool-II
> >                          * to start killing the idle child processes */
> >
> > CPP_upscale_trigger     /* number of idle child processes in Pgpool-II
> >                          * to start spawning new child processes */
> >
> > CPP here stands for CHILD-PROCESS-POOL and these are just my thoughts and
> > you may want to choose different names and/or different types of
> > configurations altogether.
>
> Apache already has similar parameters:
>
> -------------------------------------------------------------------------
> # prefork MPM
> # StartServers: number of server processes to start
> # MinSpareServers: minimum number of server processes which are kept spare
> # MaxSpareServers: maximum number of server processes which are kept spare
> # MaxRequestWorkers: maximum number of server processes allowed to start
>
>         StartServers              5
>         MinSpareServers           5
>         MaxSpareServers          10
>         MaxRequestWorkers       150
> -------------------------------------------------------------------------
>
> I think our num_init_children looks similar to StartServers. So all we
> have to have are MinSpareServers, MaxSpareServers, and
> MaxRequestWorkers. (Probably we should rename them to more appropriate
> ones).
>
>
Agreed. These are far better options than what I suggested. @周建身
<zhoujianshen at highgo.com> I think studying the behavior of these
Apache configurations can provide a good idea for implementing the
on-demand child spawn behavior for Pgpool-II.
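
As a starting point, the spare-process idea could be sketched like
this. It is a simplified illustration in Python, not Apache's actual
algorithm and not Pgpool-II code; scale_decision and its parameter
names (min_spare, max_spare, max_children) are hypothetical stand-ins
for MinSpareServers, MaxSpareServers and MaxRequestWorkers.

```python
def scale_decision(total, idle, min_spare, max_spare, max_children):
    """Return how many child processes to spawn (positive) or kill
    (negative) so that the idle count moves back into the range
    [min_spare, max_spare] without exceeding max_children in total."""
    if idle < min_spare:
        wanted = min_spare - idle
        room = max_children - total
        return max(0, min(wanted, room))
    if idle > max_spare:
        return -(idle - max_spare)
    return 0


# Using the values from the Apache example above.
print(scale_decision(total=10, idle=2, min_spare=5, max_spare=10,
                     max_children=150))
print(scale_decision(total=20, idle=14, min_spare=5, max_spare=10,
                     max_children=150))
print(scale_decision(total=150, idle=0, min_spare=5, max_spare=10,
                     max_children=150))
```

The first call asks for 3 new children, the second asks to kill 4 idle
ones, and the third returns 0 because the upper limit is already
reached. A real implementation would run such a check periodically in
the parent process.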

Thanks
Best Regards
Muhammad Usama

> > Looking forward to getting an actual proposal.
> >
> > Thanks
> > Best regards
> > Muhammad Usama
> >
> >
> >
> > On Mon, Jul 13, 2020 at 2:56 PM 周建身 <zhoujianshen at highgo.com> wrote:
> >
> >> Hello Usama and Hackers,
> >>
> >>     I have tested the pgpool connection pool, and I think there are some
> >> parts that need to be enhanced.
> >>
> >>     When you set the parameter num_init_children = 32, only 32 child
> >> processes will be forked to wait for connections from clients. Each
> >> child process can only receive one client connection; therefore, only 32
> >> clients can connect to pgpool at the same time. Any extra connections
> >> can only wait for a previous connection to be disconnected. So, can we
> >> change the waiting connection structure of pgpool? When pgpool starts,
> >> we can fork ten child processes to wait for clients to connect. When a
> >> child process receives a connection request, it creates a new child
> >> process to maintain the session connection.
> >>
> >>     The other part which should be enhanced is the connection pool.
> >> Currently, each child process can only receive one client connection.
> >> Therefore, the connection pool in the child process does not achieve
> >> global reuse, and each connection will re-initialize the connection
> >> pool. Therefore we should implement a global connection pool to
> >> achieve backend reuse. However, we should decide how many connections
> >> the global connection pool should maintain. We should also decide how
> >> to respond to the arrival of new connections if the connection pool is
> >> full. I can come up with two kinds of solutions.
> >>
> >>     The first one is to wait until a connection in the connection pool
> >> is disconnected, and then receive the new connection. The second one is
> >> to check the number of connections and the last access time of each
> >> connection in the connection pool, and replace the connection which has
> >> the oldest access time with the new connection. Alternatively, we
> >> periodically check the access time of each connection and throw away
> >> the connections whose last access is older than a certain threshold;
> >> then we can use the freed space in the connection pool.
> >>
> >>     In my opinion, these two aspects need to be enhanced. What is your
> >> opinion? And what do you think we need to do to enhance these two
> >> aspects? Any suggestions and comments are welcome.
> >>
> >>
> >>
> >> Thanks
> >> Best regards
> >> Zhou Jianshen
> >> zhoujianshen at highgo.com
>