[pgpool-general: 3250] Reattached node not receiving queries, ignored by pgpool while having status=2

Rod cckramer at gmail.com
Fri Oct 31 14:18:47 JST 2014


Hello,

I have tried everything I can think of, and this looks like it may be a bug.


Configuration with three nodes, streaming mode:

---------------

backend_hostname0 = '10.2.2.2'
backend_port0 = 5432
backend_weight0 = 1
#backend_data_directory0 = '/data'
#backend_flag0 = 'DISALLOW_TO_FAILOVER'

backend_hostname1 = 'localhost'
backend_port1 = 5432

backend_hostname2 = '10.5.5.5'
backend_port2 = 5432
backend_weight2 = 1
#backend_flag2 = 'DISALLOW_TO_FAILOVER'


# - Authentication -

enable_pool_hba = true

num_init_children = 128
max_pool = 128

# - Life time -
child_life_time = 30
child_max_connections = 0
connection_life_time = 120
client_idle_limit = 90

connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off

insert_lock = on
lobj_lock_table = ''


# - Degenerate handling -
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off

load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'currval,lastval,nextval,setval'

master_slave_mode = on
master_slave_sub_mode = 'stream'

# - Streaming -

sr_check_period = 10
sr_check_user = 'user'
sr_check_password = 'password'
delay_threshold = 100

parallel_mode = off
enable_query_cache = off
health_check_period = 0

# Failover

failover_command = ''
failback_command = ''
fail_over_on_backend_error = on

relcache_expire = 0

------------------

All backends are PostgreSQL 9.1.


# show pool_nodes;
 node_id |   hostname    | port | status | lb_weight |  role
---------+---------------+------+--------+-----------+---------
 0       | 10.2.2.2      | 5432 | 2      | 0.333333  | primary
 1       | localhost     | 5432 | 2      | 0.333333  | standby
 2       | 10.5.2.2      | 5432 | 2      | 0.333333  | standby
(3 rows)


WHAT HAPPENS:

1. Node 2 is restarted; pgpool automatically detects this and takes it out of
circulation.
2. Node 2 is back up.
3. We reattach node 2:
$ pcp_attach_node -d 1 localhost 9898 postgres postgres 2
DEBUG: send: tos="R", len=46
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="c", len=20, data=CommandComplete
DEBUG: send: tos="X", len=4


Log:

Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915073:
send_failback_request: fail back 2 th node request from pid 2915073
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688:
starting fail back. reconnect host 10.5.2.2 (5432)
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688: Do
not restart children because we are failbacking node id 2 host10.5.2.2
port:5432 and we are in streaming replication mode
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688:
find_primary_node_repeatedly: waiting for finding a primary node
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688:
find_primary_node: primary node id is 0
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688:
failover: set new primary node: 0
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688:
failover: set new master node: 0
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688:
failback done. reconnect host 10.5.2.2(5432)
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915074:
worker process received restart request
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915084:
do_child: failback event found. restart myself.
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915166:
do_child: failback event found. restart myself.
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915195:
do_child: failback event found. restart myself.
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915197:
do_child: failback event found. restart myself.
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915108:
do_child: failback event found. restart myself.
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915141:
do_child: failback event found. restart myself.
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915094:
do_child: failback event found. discard existing connections
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915087:
do_child: failback event found. discard existing connections
Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2915073: pcp
child process received restart request
Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2899688: PCP
child 2915073 exits with status 256 in failover()
Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2899688: fork
a new PCP child pid 2915325 in failover()
Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2899688:
worker child 2915074 exits with status 256
Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2899688: fork
a new worker child pid 2915326
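
In the log above the child processes react to the failback in two different ways: some print "restart myself." and others print "discard existing connections". Whether that is related to the problem is not established here. For a longer log, the two reactions can be tallied with a small script. The following is only a sketch: the function names are made up for illustration, and it assumes the log has the same syslog-style wrapping as the excerpt above.

```python
import re
from collections import Counter

# A new syslog record starts with a "Mon DD HH:MM:SS " stamp; any other
# non-blank line is treated as a wrapped continuation of the previous record.
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def join_wrapped(lines):
    """Rejoin records that the mail client wrapped over several lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def count_child_reactions(records):
    """Count how children reacted to the failback event."""
    counts = Counter()
    for rec in records:
        m = re.search(r"do_child: failback event found\. (.+)$", rec)
        if m:
            counts[m.group(1).strip()] += 1
    return counts


# Three records copied from the log above, still wrapped.
SAMPLE_LOG = """\
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915084:
do_child: failback event found. restart myself.
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915166:
do_child: failback event found. restart myself.
Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915094:
do_child: failback event found. discard existing connections
"""

if __name__ == "__main__":
    records = join_wrapped(SAMPLE_LOG.splitlines())
    for reaction, n in count_child_reactions(records).items():
        print(n, reaction)
```
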


# show pool_nodes;
 node_id |   hostname    | port | status | lb_weight |  role
---------+---------------+------+--------+-----------+---------
 0       | 10.2.2.2      | 5432 | 2      | 0.333333  | primary
 1       | localhost     | 5432 | 2      | 0.333333  | standby
 2       | 10.5.2.2      | 5432 | 2      | 0.333333  | standby
(3 rows)
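
For anyone who wants to script this check, the output of "show pool_nodes" can be parsed mechanically. The following is a minimal sketch, not part of pgpool: the function names are made up, and it assumes the psql-style table layout shown above. According to the pgpool-II documentation, status 2 means the node is up, so the script reports which nodes pgpool itself considers up. It cannot tell whether a node actually receives queries; that has to be observed on the backend.

```python
def parse_pool_nodes(text):
    """Parse psql-style `show pool_nodes` output into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the ---+--- separator, the "(3 rows)" footer
        # and a pasted "# show pool_nodes;" prompt line.
        if not line or line[0] in "-(#":
            continue
        cells = [c.strip() for c in line.split("|")]
        if header is None:
            header = cells  # first data-like line is the column header
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def nodes_marked_up(rows):
    """Return the ids of nodes that pgpool reports with status 2."""
    return [int(r["node_id"]) for r in rows if r["status"] == "2"]


SAMPLE = """\
 node_id |   hostname    | port | status | lb_weight |  role
---------+---------------+------+--------+-----------+---------
 0       | 10.2.2.2      | 5432 | 2      | 0.333333  | primary
 1       | localhost     | 5432 | 2      | 0.333333  | standby
 2       | 10.5.2.2      | 5432 | 2      | 0.333333  | standby
(3 rows)
"""

if __name__ == "__main__":
    print(nodes_marked_up(parse_pool_nodes(SAMPLE)))
```
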

Yet node 2 does not receive any queries.
A pgpool2 reload does not help.

Only a full restart of pgpool brings it back to normal:

$ pgpool2 restart

Log:

received fast shutdown request
Oct 30 23:56:10 web02 pgpool: 2014-10-30 23:56:10 LOG:   pid 2899688:
pgpool main: close listen socket
Oct 30 23:56:10 web02 pgpool: 2014-10-30 23:56:10 ERROR: pid 2899688: Could
not open status file /var/log/postgresql/pgpool_status
Oct 30 23:56:11 web02 pgpool: 2014-10-30 23:56:11 LOG:   pid 2919540:
pgpool-II successfully started. version 3.3.4 (tokakiboshi)
Oct 30 23:56:11 web02 pgpool: 2014-10-30 23:56:11 LOG:   pid 2919540:
find_primary_node: primary node id is 0


If a node is reported with status = 2 but does not receive queries, it must
be a bug.

This worked fine with the same configuration on pgpool2 3.1.3.
It stopped working after the upgrade to pgpool 3.3.4.

Or is there anything wrong in my setup?
Thank you!