[Pgpool-general] stopping/adding a node causes pgpool to return error to all running sqls

Wed Jun 29 16:57:09 UTC 2011

Hello,

I've just installed pgpool-II 2.3.2.1 with 2 nodes and I wanted to test what would happen if one of my backends would have failed.
So I gave a few select sqls to pgpool and I checked (with top) that the select sqls were distributed to both my nodes, so I was happy:-)

But after stopping the 2nd node, I expected to get a failure result only from those sqls that were running on the 2nd node (since I had just stopped it).
To my surprise I got error result from all my sqls!
Is this normal?
(Pgpool works fine afterwards (with node2 kicked-out of course).)
I also noticed that the same thing happens (running sqls fail) while adding node2 (pcp_attach_node 1 node1 9898 ilias 1234 1).
Again, is this normal?

Thank you
ilias

For your information, this is what I did:

root at node1:~# for ((i=1; $i<=10; i++)); do echo "select count(*) from test2"|psql -Upostgres test -p 5433 -h node1 & done
(test2 is a very big table.)

root at node2:~# /etc/init.d/postgresql stop
 * Stopping PostgreSQL 8.4 database server                                                                                                                                   [ OK ] 
root at node2:~# 

root at node1:~# pcp_node_info 1 node1 9898 ilias 1234 0
node1 5432 2 1073741823.500000
root at node1:~# pcp_node_info 1 node1 9898 ilias 1234 1
node2 5432 3 1073741823.500000

And then I got 10 times the following error:

root at node1:~# server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Here is the log:

Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 6827: pool_read: read failed (Connection reset by peer)
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 6827: notice_backend_error: 0 fail over request from pid 6827
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 6827: pool_flush_it: write failed to backend (0). reason: Broken pipe offset: 0 wlen: 5
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 6822: pool_read: read failed (Connection reset by peer)
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 6822: notice_backend_error: 0 fail over request from pid 6822
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 6822: pool_flush_it: write failed to backend (0). reason: Broken pipe offset: 0 wlen: 5
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 8108: statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 8108: DB node id: 0 backend pid: 8203 statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 8108: DB node id: 1 backend pid: 6520 statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 7660: statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 7660: DB node id: 0 backend pid: 8207 statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 7660: DB node id: 1 backend pid: 6524 statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 7669: statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 7669: DB node id: 0 backend pid: 8209 statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 7669: DB node id: 1 backend pid: 6526 statement:  DISCARD ALL
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 8108: pool_read: read failed (Connection reset by peer)
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 8108: notice_backend_error: 1 fail over request from pid 8108
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 7669: pool_read: read failed (Connection reset by peer)
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 7669: notice_backend_error: 1 fail over request from pid 7669
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 7669: pool_flush_it: write failed to backend (1). reason: Broken pipe offset: 0 wlen: 5
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 8108: pool_flush_it: write failed to backend (1). reason: Broken pipe offset: 0 wlen: 5
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 7660: pool_read: read failed (Connection reset by peer)
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 7660: notice_backend_error: 1 fail over request from pid 7660
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 ERROR: pid 7660: pool_flush_it: write failed to backend (1). reason: Broken pipe offset: 0 wlen: 5
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 2460: starting degeneration. shutdown host node2(5432)
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 2460: failover_handler: set new master node: 0
Jun 29 19:40:43 node1 pgpool: 2011-06-29 19:40:43 LOG:   pid 2460: failover done. shutdown host node2(5432)

And finally, here is my configuration file:

root at node1:~# grep -v ^# /etc/pgpool.conf|grep -v ^$

listen_addresses = '*'
port = 5433
pcp_port = 9898
socket_dir = '/var/run/postgresql'
pcp_socket_dir = '/var/run/postgresql'
backend_socket_dir = '/var/run/postgresql'
pcp_timeout = 10
num_init_children = 32
max_pool = 4
child_life_time = 300
connection_life_time = 0
child_max_connections = 0
client_idle_limit = 0
authentication_timeout = 60
logdir = '/var/run/postgresql'
pid_file_name = '/var/run/pgpool/pgpool.pid'
replication_mode = true
load_balance_mode = true
replication_stop_on_mismatch = false
replicate_select = false
reset_query_list = 'ABORT; DISCARD ALL'
print_timestamp = true
master_slave_mode = false
connection_cache = true
health_check_timeout = 20
health_check_period = 1
health_check_user = 'nobody'
failover_command = ''
failback_command = ''
fail_over_on_backend_error = true
insert_lock = true
ignore_leading_white_space = true
log_statement = true
log_per_node_statement = true
log_connections = true 
log_hostname = true
parallel_mode = false
enable_query_cache = false
pgpool2_hostname = ''
backend_hostname0 = 'node1'
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = 'node2'
backend_port1 = 5432
backend_weight1 = 1
enable_pool_hba = false
recovery_user = 'nobody'
recovery_password = ''
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
lobj_lock_table = ''
ssl = false
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://pgfoundry.org/pipermail/pgpool-general/attachments/20110629/531c5d50/attachment.html>