[Pgpool-general] "torture test" for pgpool-ii?

Tatsuo Ishii ishii at sraoss.co.jp
Sun Dec 13 10:06:44 UTC 2009


> On 13.12.2009 02:03, Tomasz Chmielewski wrote:
> > Is there any "torture test" for pgpool-ii?
> > 
> > Say, a test which would connect over pgpool-ii and do the following:
> > 
> > - create a database,
> > 
> > - do lots of inserts etc. changes, over a certain period of time (30 
> > mins or so),
> > 
> > - ideally, compare databases which pgpool-ii uses.
> 
> I found I could use "pgbench", i.e.:
> 
> /usr/lib/postgresql/8.3/bin/pgbench -i -t 100 -s 10 -F 100 -h localhost -p 9999 -U pgpool2 -d bench_replication
> 
> 
> However, pgpool-ii doesn't work flawlessly for me - whenever I either kill pgpool while pgbench is running,
> or just detach one node, I'm no longer able to use the same "pgbench -i -t 100 ...", even after recovering that node.
> 
> 
> To reproduce:
> 
> - start this command:
> 
> /usr/lib/postgresql/8.3/bin/pgbench -i -t 100 -s 10 -F 100 -h localhost -p 9999 -U pgpool2 -d bench_replication
> 
> 
> - while the above command still runs, detach one node,
> 
> - recover this node,
> 
> - try to start pgbench command again - it will fail with:
> 
> Connection to database "bench_replication" failed:
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> 
> 
> pgpool -nd will log:
> 
> 2009-12-13 01:50:46 DEBUG: pid 24222: I am 24222 accept fd 5
> 2009-12-13 01:50:46 LOG:   pid 24222: connection received: host=127.0.0.1 port=34907
> 2009-12-13 01:50:46 DEBUG: pid 24222: Protocol Major: 1234 Minor: 5679 database:  user:
> 2009-12-13 01:50:46 DEBUG: pid 24222: SSLRequest: sent N; retry startup
> 2009-12-13 01:50:46 DEBUG: pid 24222: Protocol Major: 3 Minor: 0 database: bench_replication user: pgpool2
> 2009-12-13 01:50:46 DEBUG: pid 24222: new_connection: connecting 0 backend
> 2009-12-13 01:50:46 DEBUG: pid 24222: new_connection: connecting 1 backend
> 2009-12-13 01:50:46 DEBUG: pid 24222: pool_read_message_length: slot: 0 length: 8
> 2009-12-13 01:50:46 DEBUG: pid 24222: pool_read_message_length: slot: 1 length: 8
> 2009-12-13 01:50:46 ERROR: pid 24222: pool_read_kind: kind does not match between master(53) slot[1] (45)
> 2009-12-13 01:50:46 ERROR: pid 24222: pool_do_auth: failed to read kind before BackendKeyData
> 
> 
> What's wrong here? I use pgpool-2.3.

Probably online recovery failed. You could try connecting directly to
the second DB node using psql with database bench_replication to see
whether something is wrong there.
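
A direct check of each DB node, bypassing pgpool, could be sketched
roughly as below. This is only an illustration under assumptions: the
node addresses passed on the command line are hypothetical (the thread
does not give them), the user and database names are taken from the
pgbench command quoted above, and the table name "accounts" assumes the
pgbench schema shipped with PostgreSQL 8.3 (newer releases use
"pgbench_accounts").

```shell
#!/bin/sh
# Sketch: query each backend directly (not through pgpool on port 9999)
# and print a row count, so a node left broken by recovery stands out.
# DB and DBUSER come from the pgbench command in the thread;
# TABLE assumes the 8.3 pgbench schema. Override via environment.
DB="${DB:-bench_replication}"
DBUSER="${DBUSER:-pgpool2}"
TABLE="${TABLE:-accounts}"
PSQL="${PSQL:-psql}"

# check_node HOST:PORT -- run one count query against a single backend.
check_node() {
    host=${1%%:*}
    port=${1##*:}
    "$PSQL" -h "$host" -p "$port" -U "$DBUSER" -d "$DB" -At \
        -c "SELECT count(*) FROM $TABLE"
}

# check_all HOST:PORT... -- report every node, marking failed connections.
check_all() {
    for node in "$@"; do
        printf '%s: ' "$node"
        check_node "$node" || echo "FAILED"
    done
}

# Only run when node addresses are given as arguments.
if [ "$#" -gt 0 ]; then
    check_all "$@"
fi
```

Saved under a name of your choice and run with the real addresses of
both backends as HOST:PORT arguments, it prints one line per node. A
node that reports FAILED, or a count that differs from the other node,
points to the recovery not having completed on that node.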

The cause of the online recovery failure is not clear, but I have
experienced it a few times. In my case, just deleting PostgreSQL's
base directory (for example, /usr/local/data/base/) on the
to-be-recovered DB node before running rsync solved the
problem. Please make sure that you are *not* deleting the live
database's base directory.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

