[pgpool-hackers: 3479] Re: Proposal: health check statistics

Mon Dec 16 17:13:52 JST 2019

>>> Currently Pgpool-II's health check process logs various information,
>>> including backend connection problems, retries to recover from them, and
>>> so on. This information is very important for users because it reports
>>> problems with the health of PostgreSQL. For example, an increasing
>>> retry count may suggest that the network connection between
>>> Pgpool-II and PostgreSQL is having trouble, so users could replace
>>> the switch before an actual failure occurs. The problem is that it is
>>> tedious to look for such information in log files afterward, since it
>>> may already have disappeared, or may never have been logged because of
>>> other problems (such as a full disk).
>>> 
>>> I would like to propose a new feature:
>>> 
>>> - Accumulate health check statistics in shared memory so that later on
>>>   users can look into the stats using PCP commands.
>>> 
>>> - Such statistics include:
>>>   - failure count per backend node
>>>   - retry count per backend node
>>>   - success count after retries
>> 
>> I think we should add statistics about:
>> - success count per backend node
>> 
>> If pgpool's statistics include this, we can calculate the failure percentage.
> 
> That's definitely a good thing for users. Thank you for your suggestion.
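With success and failure counts kept per node, the failure percentage is a simple ratio. A minimal C sketch, assuming "percentage of failure" means failed checks divided by all completed checks (success + failure); the function name is hypothetical, not existing Pgpool-II code:

```c
#include <stdint.h>

/* Hypothetical helper: percentage of failed health checks for one node,
 * computed from the per-node success and failure counters. Returns 0.0
 * before any health check has completed, to avoid dividing by zero. */
double hc_failure_percentage(uint64_t success_count, uint64_t failure_count)
{
    uint64_t total = success_count + failure_count;

    if (total == 0)
        return 0.0;
    return 100.0 * (double) failure_count / (double) total;
}
```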

So, here is the revised proposal for health check statistics
(all data are per node):

- total count
- total success count
- total failure count
- total retry count
- average retry count
- maximum retry count
- average response time
- maximum response time
- the latest health check timestamp
- the latest retry timestamp
- the latest status change timestamp
- cause of the status change (failover, failback etc.)
- current status (up, down...)
- the last 10 status change timestamps and the status at each time ("10" should be configurable)
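To make the list above more concrete, here is a rough C sketch of what one node's entry in shared memory might look like. All type, field, and function names are hypothetical, not existing Pgpool-II code. The averages assume "per health check round" as the denominator, which the list does not pin down. The status history is a ring buffer whose length is chosen at init time, up to a hypothetical compile-time cap (HC_MAX_HISTORY); that is one possible way to make the "10" configurable. Locking between health check processes and PCP readers is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical upper bound; the configured history length must not exceed it. */
#define HC_MAX_HISTORY 64

typedef enum { HC_NODE_UNKNOWN, HC_NODE_UP, HC_NODE_DOWN } hc_node_status;
typedef enum { HC_CAUSE_NONE, HC_CAUSE_FAILOVER, HC_CAUSE_FAILBACK } hc_cause;

typedef struct {
    time_t         timestamp;
    hc_node_status status;
} hc_status_change;

/* Hypothetical layout: one instance per backend node, kept in shared memory. */
typedef struct {
    uint64_t         total_count;
    uint64_t         success_count;
    uint64_t         failure_count;
    uint64_t         retry_count;       /* retries summed over all checks */
    uint64_t         max_retry;         /* most retries seen in one check */
    double           total_response_ms; /* sum, used to derive the average */
    double           max_response_ms;
    time_t           last_check_time;
    time_t           last_retry_time;
    time_t           last_status_change_time;
    hc_cause         last_change_cause;
    hc_node_status   current_status;
    int              history_size;      /* configured length, <= HC_MAX_HISTORY */
    int              history_next;      /* slot the next change overwrites */
    int              history_len;       /* number of valid entries */
    hc_status_change history[HC_MAX_HISTORY];
} hc_node_stats;

void hc_stats_init(hc_node_stats *s, int history_size)
{
    assert(history_size > 0 && history_size <= HC_MAX_HISTORY);
    memset(s, 0, sizeof(*s));
    s->history_size = history_size;
}

/* Record one health check round: whether it succeeded, how many retries
 * it needed, its response time, and when it finished. */
void hc_record_check(hc_node_stats *s, int success, uint64_t retries,
                     double response_ms, time_t now)
{
    s->total_count++;
    if (success)
        s->success_count++;
    else
        s->failure_count++;
    s->retry_count += retries;
    if (retries > s->max_retry)
        s->max_retry = retries;
    if (retries > 0)
        s->last_retry_time = now;
    s->total_response_ms += response_ms;
    if (response_ms > s->max_response_ms)
        s->max_response_ms = response_ms;
    s->last_check_time = now;
}

/* Averages per health check round (an assumed denominator). */
double hc_avg_retry(const hc_node_stats *s)
{
    return s->total_count ? (double) s->retry_count / s->total_count : 0.0;
}

double hc_avg_response_ms(const hc_node_stats *s)
{
    return s->total_count ? s->total_response_ms / s->total_count : 0.0;
}

/* Append a status change, overwriting the oldest entry once the ring is full. */
void hc_record_status_change(hc_node_stats *s, hc_node_status status,
                             hc_cause cause, time_t now)
{
    s->history[s->history_next].timestamp = now;
    s->history[s->history_next].status = status;
    s->history_next = (s->history_next + 1) % s->history_size;
    if (s->history_len < s->history_size)
        s->history_len++;
    s->current_status = status;
    s->last_change_cause = cause;
    s->last_status_change_time = now;
}

/* i = 0 is the most recent change; valid for 0 <= i < history_len. */
const hc_status_change *hc_history_at(const hc_node_stats *s, int i)
{
    assert(i >= 0 && i < s->history_len);
    return &s->history[(s->history_next - 1 - i + s->history_size) % s->history_size];
}
```

Keeping sums and maxima instead of precomputed averages keeps each update O(1); the averages can be derived when a PCP command reads the entry.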
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp