[pgpool-general: 1529] Re: Mysterious Load Spikes
Lonni J Friedman
netllama at gmail.com
Thu Mar 28 00:15:48 JST 2013
On Wed, Mar 27, 2013 at 8:09 AM, Quentin Hartman
<qhartman at direwolfdigital.com> wrote:
> We've been happily using the pgpool2 3.1.1 packages from Ubuntu 12.04 on
> Ubuntu 11.04 for nearly a year. We needed features in 3.1 and backporting
> the packages was the cleanest and most expedient way to get that version.
> We recently upgraded the distribution on those servers to 12.04, so we are
> still running the same pgpool2 packages, but now we have this mysterious
> "phantom load" on those boxes.
> Under the same workload as they had before, they now spike to a load of 5
> or 10 (these are dual core machines) every few minutes with no perceivable
> explanation. There is no process consuming CPU, plenty of RAM is available,
> and there is negligible disk IO. It may be worth mentioning these are
> virtualized machines running on Amazon EC2, and the steal rate is also very
> low, so I don't think there is something external to the machines causing
> the problem. Additionally, this upgrade was performed on nearly 40 machines
> in this cluster, and only the ones running pgpool2 are showing this
> behavior, so I'm confident it's related to pgpool2.
> I've seen at least one other post on the list mentioning this problem, but
> no resolution.
> Is this something anyone else has seen? Is there anything that has changed
> in the newer releases that is likely to affect this? My suspicion is that
> something changed in the 3.x kernel series that pgpool2 doesn't interact
> with well. Is there more information I can provide that would help debug /
FWIW, I also see roughly the same behavior, although my environment is
somewhat different from yours. Previously my pgpool server was
running on RHEL6, but it was migrated to Fedora 17 (without
changing the pgpool version). Before the migration the load on the
server was typically less than 1.00. Since migrating, the load average
is 30+ (with frequent spikes up to 60). Yet there's nothing
obvious causing the load. I'm certain something in pgpool is causing
this behavior, because the load drops to 0 during the rare times when
we need to stop pgpool for maintenance.
I suspect that this is a Linux kernel issue, as the version of pgpool
didn't change when the issue started. I'm at a loss as to what is
actually causing the problem, but perhaps it's some weird inefficiency
in context switching inside the kernel? I'd love to know if there's
some way to tune the kernel to eliminate the behavior, as it's always
quite scary to see a production server with a load of 30+ when it's
supposedly operating normally.
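
One way to narrow this down, offered only as a rough sketch: on Linux the
load average counts tasks in uninterruptible sleep (state D) as well as
runnable tasks (state R), so a high load with idle CPUs usually means tasks
are either blocked inside the kernel or becoming runnable in short bursts.
The Python below takes a snapshot of /proc and tallies R and D tasks by
command name. The function names are made up for illustration, nothing in
it is specific to pgpool, and it only looks at each process's main thread.

```python
import glob


def parse_stat_state(stat_line):
    """Return (comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so the state field is the first token after the LAST ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return comm, state


def load_contributors(stat_lines):
    """Count tasks per (comm, state) that are runnable (R) or in
    uninterruptible sleep (D), the two states Linux adds to the load."""
    counts = {}
    for line in stat_lines:
        try:
            comm, state = parse_stat_state(line)
        except (ValueError, IndexError):
            continue  # skip anything that does not look like a stat line
        if state in ("R", "D"):
            key = (comm, state)
            counts[key] = counts.get(key, 0) + 1
    return counts


def read_all_stat_lines():
    """Read /proc/<pid>/stat for every process currently visible."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited between glob() and open()
    return lines


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    tally = load_contributors(read_all_stat_lines())
    for (comm, state), n in sorted(tally.items()):
        print(f"{n:5d}  {state}  {comm}")
```

A single snapshot can easily miss a short burst, so it is probably worth
running this in a loop (say once a second) while a spike is in progress and
seeing whether the pgpool children show up as R or as D.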