[pgpool-general: 1297] Re: pgpool not load balancing

Tatsuo Ishii ishii at postgresql.org
Tue Jan 8 07:43:39 JST 2013


> On Sun, Jan 6, 2013 at 11:39 AM, Greg Donald <gdonald at gmail.com> wrote:
>> Works, I can now see equal work loads being distributed to all three servers.
> 
> I did a soft launch with just one of my web servers last night but I
> awoke to a broken site.
> 
> 
> In the logs I'm seeing tons of log messages like these:
> 
> 1) ProcessFrontendResponse: failed to read kind from frontend.
> frontend abnormally exited
> 
> 2) current transaction is aborted, commands ignored until end of
> transaction block
> 
> 
> And I think this is where my db2 was removed:
> 
> 
> pool pgpool[11979]: connection on node 1 was terminated due to
> conflict with recovery
> pool pgpool[11979]: do_child: exits with status 1 due to error
> pool pgpool[11984]: pool_process_query: discard E packet from backend 1
> pool pgpool[11984]: pool_read: read failed (Connection reset by peer)
> pool pgpool[11984]: degenerate_backend_set: 1 fail over request from pid 11984
> pool pgpool[11984]: pool_flush_it: write failed to backend (1).
> reason: Broken pipe offset: 0 wlen: 5
> pool pgpool[20766]: starting degeneration. shutdown host 10.123.165.60(5432)
> pool pgpool[20766]: Restart all children
> pool pgpool[20766]: find_primary_node_repeatedly: waiting for finding
> a primary node
> pool pgpool[11984]: pool_flush_it: write failed to backend (1).
> reason: Broken pipe offset: 0 wlen: 5
> pool pgpool[20766]: find_primary_node: primary node id is 0
> pool pgpool[20766]: failover: set new primary node: 0
> pool pgpool[20766]: failover: set new master node: 0
> pool pgpool[20800]: worker process received restart request
> pool pgpool[20766]: failover done. shutdown host 10.123.165.60(5432)
> 
> 
> Does "pool_flush_it: write failed to backend (1)" mean it's trying to
> write to my hot_standby slave?

No. pgpool tried to write to the socket connected to the standby node.

> I don't get it.  My db2 server seems to still be replicating fine from
> db1.  Why did this work fine for 4 hours of testing last night then
> completely break down 3 hours later?

Notice this message:
> pool pgpool[11979]: connection on node 1 was terminated due to
> conflict with recovery

This is not pgpool's fault: you would see the same error even if you
connected directly to the PostgreSQL standby node. It is a well-known
issue with PostgreSQL's streaming replication. Google "conflict with
recovery" or read:
http://www.postgresql.org/docs/9.2/static/hot-standby.html
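As the page above explains, recovery conflicts can often be mitigated by
tuning the standby. A sketch of the relevant postgresql.conf settings on
the standby (the values shown are illustrative, not recommendations):

```ini
# postgresql.conf on the standby (illustrative values)
max_standby_streaming_delay = 300s   # let standby queries delay WAL replay up to 5 min
hot_standby_feedback = on            # standby reports its oldest snapshot to the primary
```

Lengthening the delay trades replication lag for fewer cancelled queries;
hot_standby_feedback avoids some conflicts at the cost of extra bloat on
the primary.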

> My Django app also gave me a number of unusual errors I've never see before:
> 
> 
> 1) django.db.utils.DatabaseError: error with no message from the libpq
> 
> 
> 2) InterfaceError: connection already closed
> 
> 
> 3) django.db.utils.DatabaseError: connection was terminated due to
> conflict with recovery
> DETAIL:  User was holding a relation lock for too long.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> FATAL:  connection was terminated due to conflict with recovery
> DETAIL:  User was holding a relation lock for too long.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.

Because pgpool performed a failover, which dropped your app's existing
connections.
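Since failover drops in-flight connections, the client side has to
reconnect and retry. A minimal retry sketch in Python (the flaky()
callable below is a stand-in for your real query; with psycopg2 you
would test for OperationalError/InterfaceError rather than
ConnectionError):

```python
import time

def with_retry(operation, is_transient, retries=3, delay=1.0):
    """Run operation(); on a transient connection error, wait and retry."""
    for attempt in range(retries):
        try:
            return operation()
        except Exception as exc:
            if attempt == retries - 1 or not is_transient(exc):
                raise  # out of retries, or a non-transient error
            time.sleep(delay)  # give pgpool time to finish failover

# Stand-in for a DB call: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection was terminated")
    return "ok"

result = with_retry(flaky, lambda e: isinstance(e, ConnectionError), delay=0.01)
```

Note that retrying is only safe for idempotent statements; blindly
re-running an INSERT whose first attempt actually committed can itself
produce duplicate-key errors.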

> 4) django.db.utils.IntegrityError: duplicate key value violates unique
> constraint "cp_scriptrun_pkey"
> DETAIL:  Key (id)=(771338) already exists.

Apparently a problem in your app or in Django, not in pgpool.

> So I had to revert my soft launch until I can figure this out.
> 
> 
> 
> -- 
> Greg Donald

