[pgpool-general: 559] Re: load balancing seems to be bottlenecked by performance of master

Lonni J Friedman netllama at gmail.com
Wed May 30 10:17:36 JST 2012


On Tue, May 29, 2012 at 5:32 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> On Tue, May 29, 2012 at 3:47 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>> On 05/29/2012 07:40 AM, Lonni J Friedman wrote:
>>>>> On Mon, May 28, 2012 at 10:28 PM, Matt Solnit<msolnit at soasta.com>
>>>>> wrote:
>>>>>> On May 28, 2012, at 6:53 PM, Lonni J Friedman<netllama at gmail.com>
>>>>>> wrote:
>>>>>>> On Mon, May 28, 2012 at 5:23 PM, Tatsuo Ishii<ishii at postgresql.org>
>>>>>>> wrote:
>>>>>>>>> On Mon, May 28, 2012 at 3:54 PM, Tatsuo Ishii<ishii at sraoss.co.jp>
>>>>>>>>> wrote:
>>>>>>>>>>> What are the reasons for analysing system catalogs on primary server?
>>>>>>>>>>
>>>>>>>>>> For example, if a table is a temporary one or not.
>>>>>>>>>
>>>>>>>>> Yes, but as I noted, I don't use temp tables at all. ?If this is the
>>>>>>>>> primary justification, then its not doing me any good, and causing
>>>>>>>>> unnecessary negative performance impact.
>>>>>>>>
>>>>>>>> But how does pgpool know that you are not going to use temporary
>>>>>>>> tables beforehand?
>>>>>>>
>>>>>>> Provide a new pgpool.conf option that tells it to ignore them (with
>>>>>>> the assumption that they do not exist).
>>>>>>
>>>>>> +1 for new pgpool.conf setting, although I think the default should be
>>>>>> the
>>>>>> current behavior, for backward compatibility.
>>>>>>
>>>>>> Maybe something like "enable_temp_tables", although that might be too
>>>>>> vague.  "on" means current behavior, all system catalog queries go to
>>>>>> the
>>>>>> master.  "off" means that system catalog queries are load-balanced
>>>>>> just
>>>>>> like any other.
>>>>>
>>>>> agreed.
>>>>>
>>>>> Another idea is to create a new pgpool.conf option which makes the
>>>>> system catalogue check a configurable interval, as one of the
>>>>> following:
>>>>> * every query (current/default behavior)
>>>>> * when pgpool is started only
>>>>> * once every X minutes or X queries
>>>>>
>>>>> This would provide people with a tradeoff between performance and
>>>>> accuracy.  Anyone who never or rarely makes changes (and/or never uses
>>>>> temp tables) would likely prefer the 2nd or 3rd setting, while those
>>>>> who often make changes and/or use temp tables, would stick with the
>>>>> default (current behavior).
>>>>
>>>> Yeah that's sort of where I was going with it, it's apparently not
>>>> easy to move those checks.
>>>>
>>>> enable_temp_tables=y/n seems like a possible workaround for now, but I
>>>> have to assume that there are people out there that want to use
>>>> temp/unlogged tables and also will be write heavy and want to scale
>>>> out in this manner.
>>>
>>> Before jump into any of these ideas, I would like to confirm if the
>>> those accesses are really a bottle neck or not. Pgpool caches results
>>> of accessing system catalogs and the cache persists as long as the
>>> connection pool remains. So unless your set the life time of
>>> connection pool very short, I guess those system catalog accesses are
>>> not the bottle neck in the real world.
>>
>> My settings are as follows:
>> child_life_time = 300
>> child_max_connections = 2990
>> connection_life_time = 0
>> client_idle_limit = 180
>>
>> and I have the weighting split 33/33/33/1 with 33 going to the read
>> servers, and 1 going to the master.
>>
>> All that I'm certain of is that the responsiveness of the database
>> cluster to any query degrades as the load on the master increases.
>> I've seen numerous cases where the standby/slave servers had a load of
>> less than 1.00, yet I was still waiting seconds to get a response from
>> a trivial query.  I've even had cases where I've manually run the same
>> query on the standby/slave directly, and it came before the query
>> going through pgpool.
>
> This sounds too vage evidence that pgpool's catalog accesess is the
> major bottle neck of your case. I would like to know clearer
> evidence. For example, how many times catalog accesses happen, and
> each access takes non trivial execution time.

How would I determine how often the catalog access happens?

Regardless, I'm not who implicated the catalog access as the problem,
it was you who suggested that initially.  All I did was describe the
symptoms of the problem.  I'm quite willing to pursue other
possibilities, as long as you can provide guidance on how to do so.

>
> Or even better, someone comes up with a patch and show the performance
> difference between with/without the patche.

That someone wouldn't be me, as I'm not remotely qualified to work on
the pgpool codebase.


More information about the pgpool-general mailing list