[pgpool-general: 7810] Re: Error: sorry, too many clients

Tatsuo Ishii ishii at sraoss.co.jp
Mon Oct 18 14:46:05 JST 2021


Hi,

She is pretty busy right now, so I am going to answer your questions
instead.

> Hello Bo,
> 
> In our environment:
> reserved_connections=0
> listen_backlog_multiplier=2
> 
> So, if I understand correctly, we have 2 different problematic cases:
> 
> (1) a client silently disappears (doesn't properly close the connection):
> In this case, setting `client_idle_limit` to a non-zero value (say 15min)
> could mitigate the problem, and eventually the child worker will again be
> available to handle an incoming connection.
>
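
Right. A minimal pgpool.conf sketch of that mitigation (assuming the
15-minute limit you mention, i.e. 900 seconds):

client_idle_limit = 900   # 0 (the default) disables the check
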
> (2) a server-side child process crashes: In this case, Pgpool still keeps a
> dangling reference to this process, and any client routed there will get a
> "sorry, too many clients" error.

Since the counter is shared by all processes, any process could return the
"sorry..." error.

> For now, this problem cannot be
> mitigated unless the (parent) Pgpool process is restarted.

Yes.

> One more question:
> What can make a child process crash? (well, I mean, what are the most
> common reasons for that).

Besides an admin's mistake, probably the most common cause is the OOM killer.
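
If you suspect the OOM killer, the kernel log usually records the kill.
For example (commands and log locations vary by distribution, so treat
this only as a sketch):

dmesg -T | grep -i -E 'out of memory|killed process'
# or, on systemd-based systems:
journalctl -k | grep -i 'killed process'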

> Does this crash leave any trace (that I should search for) in Pgpool's
> error log? Does parent Pgpool get notified on this (e.g. via waitpid or
> SIGCHLD)?

Yes, the Pgpool parent process should leave a log entry when it catches
SIGCHLD. If I kill a child process using kill -9, I get the following log:

2021-10-18 14:44:48.951: main pid 997555: LOG:  child process with pid: 997582 exits with status 9 by signal 9
2021-10-18 14:44:48.952: main pid 997555: LOG:  fork a new child process with pid: 997609
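
If you want to reproduce it on a test system, something like this should
work (the log path is just an example; check your own logging setup, and
note that idle children usually show a "wait for connection request"
process title):

ps ax | grep 'pgpool: wait for connection request'   # pick a child pid
kill -9 <child_pid>
tail /path/to/pgpool.log   # look for the "exits with status ... by signal 9" lines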

> Best regards, Michail
> 
> 
> 
> On Wed, Oct 13, 2021 at 10:10 AM Bo Peng <pengbo at sraoss.co.jp> wrote:
> 
>> Hello,
>>
>> I could reproduce this "sorry, too many clients" issue,
>> even when reserved_connections=0 and pgpool child processes are available.
>>
>> The cause is that the pgpool child process a client was connected to is
>> terminated unexpectedly.
>> After that, if a client connects to that terminated child process,
>> the "sorry, too many clients" error occurs.
>>
>> So far, the only workaround is to restart pgpool.
>> We are considering a solution to avoid the problem.
>>
>> Regards,
>>
>> > Hello,
>> >
>> > I want to confirm whether the message "sorry, too many clients" is
>> > issued by Pgpool, because Pgpool-II issues "sorry, too many clients"
>> > messages only when reserved_connections is a non-zero value.
>> > What is the setting of "reserved_connections"?
>> >
>> > > I will try the `client_idle_limit` parameter.
>> > > I was reluctant to use it because some clients probably won't like it
>> > > (but it may be easier to fix it client-side).
>> > >
>> > > I have a basic question: how can I detect leaked (occupied and not
>> > > reused) connection slots?
>> > > Can `pcp_proc_info --all` provide some meaningful information in this
>> > > direction?
>> >
>> > You can use the pcp_proc_info command to confirm the used (occupied)
>> > connections.
>> > However, a "not reused" connection is due to your application.
>> > You may need to find out, on the application side, why the application
>> > doesn't reuse the pooled connections.
>> >
>> > For example, the application processes terminated unexpectedly, but the
>> > connections remained on the pgpool side.
>> >
>> > BTW, in the coming major version 4.3, the pcp_proc_info command is able
>> > to display more useful information such as:
>> > - connection created timestamp
>> > - last client connected timestamp
>> > - client idle duration (sec)
>> > - last client disconnected timestamp
>> > - connection-reuse counter
>> > - etc.
>> >
>> > > Let M be the maximum number (worst case) of connections that clients
>> > > can ask for.
>> > > Let N be the number of per-backend connection slots
>> > > (= num_init_children * max_pool) available in Pgpool.
>> > > Of course: M < N.
>> > >
>> > > I can measure the number m of occupied slots (say for backend #0) as:
>> > > m=$(pcp_proc_info ${pcp_opts} --all | grep '0$' | grep -c -v '^[[:space:]]')
>> > >
>> > > If m grows greater than M for some "steady" period of time (say for
>> > > over 10min), can I interpret this as a sign of leaked (non-reused)
>> > > connection slots?
>> >
>> > I think m cannot be greater than M,
>> > because M is the maximum number of connections that clients can ask for:
>> >
>> >   M = num_init_children - reserved_connections
>> >
>> > Regards,
>> >
>> > > On Tue, Oct 12, 2021 at 7:10 AM Bo Peng <pengbo at sraoss.co.jp> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > > Hello PgPool
>> > > > >
>> > > > > We are using PgPool 4.2.4 (from Alpine 3.13) as the frontend of a
>> > > > > master/standby PostgreSQL 10.14 cluster (2 synchronous standbys in
>> > > > > streaming replication mode). PgPool is operating as a load balancer
>> > > > > and a connection pool.
>> > > > >
>> > > > > PgPool forks 90 pool workers (num_init_children=90, max_pool=2).
>> > > > > All PostgreSQL backend servers are configured properly to accept
>> > > > > a maximum of 190 connections (so, num_init_children * max_pool is
>> > > > > below this maximum with a safety margin of 10).
>> > > > >
>> > > > > The relevant (pool-specific) part of the configuration is:
>> > > > > num_init_children = 90
>> > > > > max_pool = 2
>> > > > > child_life_time = 5min
>> > > > > child_max_connections = 0
>> > > > > connection_life_time = 0
>> > > > > client_idle_limit = 0
>> > > > >
>> > > > > After a while (5-10 days of operation), we face several errors of
>> > > > > the type of "Sorry, too many clients". These errors are not
>> > > > > originating from a PostgreSQL backend (error logs are completely
>> > > > > clear and pg_stat_activity shows no more than 140 (from 190)
>> > > > > active connections).
>> > > > >
>> > > > > This problem has been around for some time now, and we have to
>> > > > > resort to restarting the entire PgPool service. Of course, this is
>> > > > > not a perfect solution because it disrupts some database clients
>> > > > > (so, we also have to cascade those restarts to some other services,
>> > > > > and so on).
>> > > > >
>> > > > > The majority of database clients are connecting from a Hikari pool
>> > > > > (Java Spring/Boot) which creates long-standing connections (i.e.
>> > > > > not one per-transaction).
>> > > > >
>> > > > > Any light on this issue is highly appreciated!
>> > > >
>> > > > The connections haven't been reused?
>> > > > Because you are using connection pooling between the client and pgpool,
>> > > > I think the cached connections should be reused.
>> > > >
>> > > > You may need to configure the "client_idle_limit" parameter
>> > > > to prevent the Pgpool-II child processes from being occupied for a
>> > > > long time by broken connections.
>> > > >
>> > > >
>> > > >
>> > > > https://www.pgpool.net/docs/latest/en/html/runtime-config-connection-pooling.html#GUC-CLIENT-IDLE-LIMIT
>> > > >
>> > > > --
>> > > > Bo Peng <pengbo at sraoss.co.jp>
>> > > > SRA OSS, Inc. Japan
>> > > > http://www.sraoss.co.jp/
>> > > >
>> >
>> >
>> > --
>> > Bo Peng <pengbo at sraoss.co.jp>
>> > SRA OSS, Inc. Japan
>> > http://www.sraoss.co.jp/
>> > _______________________________________________
>> > pgpool-general mailing list
>> > pgpool-general at pgpool.net
>> > http://www.pgpool.net/mailman/listinfo/pgpool-general
>>
>>
>> --
>> Bo Peng <pengbo at sraoss.co.jp>
>> SRA OSS, Inc. Japan
>> http://www.sraoss.co.jp/
>>

