[pgpool-general: 6466] Re: High I/O Usage on PGPool nodes
ishii at sraoss.co.jp
Mon Mar 18 09:58:20 JST 2019
I have pushed the proposed patches to all supported branches.
SRA OSS, Inc. Japan
>> On Feb 27, 2019, 10:51 AM -0800, Stephen Eilert <contact at stepheneilert.com>, wrote:
>>> On Feb 26, 2019, 6:27 PM -0800, Tatsuo Ishii <ishii at sraoss.co.jp>, wrote:
>>> > > Hi,
>>> > >
>>> > > Apologies if this has been discussed before, but if so I can’t find a relevant thread.
>>> > I think this has not been discussed before. Thank you for the report.
>>> > > We are experiencing some strange behavior on our PGPool nodes. In some of them, load average is well in excess of 100 – although CPU is over 70% idle (rest spent in system), so this pointed to I/O. Checking on AWS, we see that there is a lot of disk activity, including writes. This was surprising, as we did not expect the pgpool machines to have any significant disk usage, nor we can find anything in the documentation that would indicate it to be the case.
>>> > >
>>> > > After some sysdig usage, the only file getting heavily written to appears to be /var/log/postgresql/pgpool_status
>>> > >
>>> > > It is multiple 6 bytes writes a second, all with the same data, by pgpool. So this shouldn’t be a problem as it is not too much data. Except that it seems to be getting flushed right away (fsync?).
>>> > >
>>> > > Is this behavior expected? The PG node status have not changed at any time, there are two nodes, always up. Why would the file get rewritten dozens (even hundreds, depending on load) of times per second?
>>> > >
>>> > > Would this be caused by a misconfiguration?
>>> > I appears that every time a client connects to Pgpool-II, and
>>> > Pgpool-II creates new connections to PostgreSQL node, it issues
>>> > fsync(). Maybe you disabled connection cache or set very short life
>>> > time for connection cache?
>>> > Anyway, I think current behavior of Pgpool-II is not best optimized
>>> > and I would like to propose attached patch to enhance it. It eliminate
>>> > the necessity of fsync(): instead of every time a connection is
>>> > created to PostgreSQL, fsync() will be issued only once for the first
>>> > time a connection is created to PostgreSQL.
>>> > The patch should be able to applied to 3.7.
>>> That would explain it.
>>> Our clients are a bunch of microservices running in Kubernetes. There are a lot of them. Their behavior is, whenever they need to perform any database operation, they will establish a new connection, perform any queries needed, then close the connection as soon as they are done. This is so because we experienced issues in the past with holding connections for longer periods. And not all of them require constant database communication(they may be idle waiting for API calls themselves), so it was also wasteful
>>> I don't have enough understanding of pgpool's internals yet, but it looks to me that the patch would solve this issue.
>>> Thanks for the response. We have increased resources in the pgpool machine for now while we wait a patch. Load is still very high, but manageable.
>>> – Stephen
>> It seems that I missed one thing and jumped the gun.
>> From what you have explained, it is not only when new client connections are created to PGPool, but they need to ALSO be created to PostgreSQL itself, for this behavior to manifest, right?
>> Then this doesn’t fully explain it. Connection cache is on (attached pgpool config file).
>> We do reach the configured 500 connection limit on pgpool processes quite often (after which pgpool will spawn another process). Could this be related?
> But you have:
> child_life_time = 300
> # Pool exits after being idle for this many seconds
> child_max_connections = 250
> # Pool exits after receiving that many connections
> # 0 means no exit
> connection_life_time = 180
> # Connection to backend closes after being idle for this many seconds
> # 0 means no close
> with these settings, Pgpool-II child process would die at some point
> (because child_life_time = 300, child_max_connections = 250), or
> closes the connection to backend (connection_life_time = 180). After
> these events happen, connections to backend in those process does not
> exist and you will hit the issue.
> Best regards,
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> pgpool-general mailing list
> pgpool-general at pgpool.net
More information about the pgpool-general