[pgpool-hackers: 4188] Dynamic spare process management of Pgpool-II children

Muhammad Usama muhammad.usama at percona.com
Tue Sep 13 04:12:07 JST 2022


Hi Hackers.

A few years back we had a discussion on implementing on-demand child
process spawning, and "zhoujianshen at highgo.com" also shared a patch for it.
Ref:
https://www.sraoss.jp/pipermail/pgpool-hackers/2020-September/003831.html

The patch had a few issues and open review comments, and one way or another
it never reached a committable state. So I decided to pick it up and
rework it.

A little background:
The motivation behind this feature is that when deciding the value of the
num_init_children configuration parameter, the administrator has to
provision for the maximum number of concurrent client connections the setup
must ever support, even if that maximum is hit only once a day or even once
a month depending on the type of setup, while 90% of the time only 5-10% of
those connections are actually needed. But because Pgpool-II always spawns
num_init_children child processes at startup, in such setups a large number
of child processes sit idle most of the time and consume system resources.
This approach is suboptimal in terms of resource usage and in some cases
also causes problems like the 'thundering herd' (although we do have
serialize_accept to work around that).

So the idea is to keep the number of spare child processes (processes
sitting idle in the 'waiting for connection' state) within configured
limits, and to scale the number of child processes up or down depending on
the connected client count.

Attached re-worked patch:
The original patch had a few shortcomings, but my biggest concern was its
approach to scaling down the child processes. IMHO the patch was too
aggressive in killing child processes once max_spare_children was exceeded,
and its victim-process selection was not smart. Secondly, the
responsibility for managing the spare children was not properly segregated
and was shared between the main and child processes.

So I took up the patch and basically redesigned it from the ground up. The
attached version gives the main process the responsibility of keeping track
of spare processes and scaling them, and also implements three scale-down
strategies. On top of that, it adds a switch that can be used to turn off
the auto-scaling feature and restore the current behaviour.

Moreover, instead of adding a new configuration parameter (max_children, as
in the original patch), the attached version reuses the existing
num_init_children config to preserve backward compatibility.

To summarise, the patch adds the following new config parameters to control
process scaling:

-- process_management_mode (default = static)
Can be set to either static or dynamic. static keeps the current
behaviour, while dynamic enables auto scaling of spare processes.

-- process_management_strategy (default = gentle)
Configures the process management strategy used to satisfy the spare
process count.
Valid options:
lazy:
In this mode the scale down is performed gradually
and only gets triggered when the excess spare process count
stays high for more than 5 minutes.
gentle:
In this mode the scale down is performed gradually
and only gets triggered when the excess spare process count
stays high for more than 2 minutes.
aggressive:
In this mode the scale down is performed aggressively
and gets triggered more frequently when the spare process count is high.
This mode uses a faster but slightly less smart selection criterion
to identify the child processes that can be terminated to satisfy
max_spare_children.

-- min_spare_children
Minimum number of spare child processes to keep in the 'waiting for
connection' state.
Effective only in the dynamic process management mode.

-- max_spare_children
Maximum number of spare child processes to keep in the 'waiting for
connection' state.
Effective only in the dynamic process management mode.
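For illustration, a dynamic setup under this patch might look like the
following pgpool.conf fragment (the parameter names are those described
above; the values are purely illustrative):

```
num_init_children = 200            # upper bound, same meaning as today
process_management_mode = dynamic  # enable auto scaling of spare children
process_management_strategy = gentle
min_spare_children = 5             # always keep at least 5 idle children
max_spare_children = 20            # start scaling down beyond 20 idle
```

With such a setup, Pgpool-II would not need to keep 200 children alive
around the clock; it would hold the idle pool between 5 and 20 and grow
toward num_init_children only under real load.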

Furthermore, the patch relies on the existing conn_counter to keep track of
the connected children count, which means it adds no extra overhead for
computing that information. Documentation updates are not yet part of the
patch; I will add those once we have an agreement on the approach and
usability of the feature.


Meanwhile, I am trying to figure out a way to benchmark whether this
feature brings any performance benefit, but I have not yet found a good way
to do that. Any suggestions on this topic are welcome.

Thanks
Best regards
Muhammad Usama
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dynamic_spare_process_management.diff
Type: application/octet-stream
Size: 32932 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20220913/d285b0be/attachment-0001.obj>

