[pgpool-general: 3257] Re: Reattached node not receiving queries, ignored by pgpool while having status=2

Tatsuo Ishii ishii at postgresql.org
Wed Nov 5 07:45:53 JST 2014


Applications that are already connected to pgpool-II before node 2 is
attached do not recognize the node until they reconnect to
pgpool-II. Maybe you are using some kind of persistent connections in
your applications?
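
If the clients do hold persistent connections, one way to make them
pick up a reattached node is to recycle each connection after a
maximum age. The following is only a minimal Python sketch under that
assumption: RecyclingConnection, max_age, and the connect factory are
hypothetical names for illustration and are not part of pgpool-II or
of any driver. In a real application, connect would wrap the driver's
own connect call pointed at pgpool-II.

```python
import time
from typing import Any, Callable, Optional


class RecyclingConnection:
    """Hands out a connection, reopening it once it is older than max_age.

    Hypothetical helper for illustration only. `connect` is any
    zero-argument callable returning an object with a close() method.
    """

    def __init__(self, connect: Callable[[], Any], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect
        self._max_age = max_age
        self._clock = clock
        self._conn: Optional[Any] = None
        self._opened_at = 0.0

    def get(self) -> Any:
        now = self._clock()
        # Drop the cached connection once it has reached max_age seconds.
        if self._conn is not None and now - self._opened_at >= self._max_age:
            self._conn.close()
            self._conn = None
        # Open a fresh connection on first use or after a recycle.
        if self._conn is None:
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

Callers would ask for get() before each unit of work instead of
keeping one connection object for the life of the process.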

BTW, 

> num_init_children = 128
> max_pool = 128

do not look appropriate, since this could create up to 128*128 = 16384
connections to each PostgreSQL server.
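
The arithmetic above can be checked with a short Python sketch. Each
of the num_init_children pgpool-II child processes can cache up to
max_pool backend connections, so the worst case per backend is their
product. The function name is made up for illustration, and the second
call uses a hypothetical max_pool value only for comparison.

```python
def worst_case_backend_connections(num_init_children: int, max_pool: int) -> int:
    """Upper bound on connections pgpool-II may open to each backend."""
    if num_init_children < 1 or max_pool < 1:
        raise ValueError("both settings must be positive integers")
    return num_init_children * max_pool


# Values from the quoted configuration.
print(worst_case_backend_connections(128, 128))  # 16384
# Hypothetical smaller max_pool, for comparison only.
print(worst_case_backend_connections(128, 4))  # 512
```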

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hello,
> 
> I have tried everything, and it looks like this may be a bug.
> 
> 
> Configuration with three nodes, streaming mode:
> 
> ---------------
> 
> backend_hostname0 = '10.2.2.2'
> backend_port0 = 5432
> backend_weight0 = 1
> #backend_data_directory0 = '/data'
> #backend_flag0 = 'DISALLOW_TO_FAILOVER'
> 
> backend_hostname1 = 'localhost'
> backend_port1 = 5432
> 
> backend_hostname2 = '10.5.5.5'
> backend_port2 = 5432
> backend_weight2 = 1
> #backend_flag2 = 'DISALLOW_TO_FAILOVER'
> 
> 
> # - Authentication -
> 
> enable_pool_hba = true
> 
> num_init_children = 128
> max_pool = 128
> 
> # - Life time -
> child_life_time = 30
> child_max_connections = 0
> connection_life_time = 120
> client_idle_limit = 90
> 
> connection_cache = on
> reset_query_list = 'ABORT; DISCARD ALL'
> replication_mode = off
> replicate_select = off
> 
> insert_lock = on
> lobj_lock_table = ''
> 
> 
> # - Degenerate handling -
> replication_stop_on_mismatch = off
> failover_if_affected_tuples_mismatch = off
> 
> load_balance_mode = on
> ignore_leading_white_space = on
> white_function_list = ''
> black_function_list = 'currval,lastval,nextval,setval'
> 
> master_slave_mode = on
> master_slave_sub_mode = 'stream'
> 
> # - Streaming -
> 
> sr_check_period = 10
> sr_check_user = 'user'
> sr_check_password = 'password'
> delay_threshold = 100
> 
> parallel_mode = off
> enable_query_cache = off
> health_check_period = 0
> 
> # Failover
> 
> failover_command = ''
> failback_command = ''
> fail_over_on_backend_error = on
> 
> relcache_expire = 0
> 
> ------------------
> 
> All backends are PostgreSQL 9.1
> 
> 
> # show pool_nodes;
>  node_id | hostname  | port | status | lb_weight |  role
> ---------+-----------+------+--------+-----------+---------
>  0       | 10.2.2.2  | 5432 | 2      | 0.333333  | primary
>  1       | localhost | 5432 | 2      | 0.333333  | standby
>  2       | 10.5.2.2  | 5432 | 2      | 0.333333  | standby
> (3 rows)
> 
> 
> WHAT HAPPENS:
> 
> 1. Node 2 is restarted and pgpool automatically detects and takes it out of
> circulation.
> 2. Node 2 is back up.
> 3. We reattach node 2:
> $ pcp_attach_node -d 1 localhost 9898 postgres postgres 2
> DEBUG: send: tos="R", len=46
> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> DEBUG: send: tos="D", len=6
> DEBUG: recv: tos="c", len=20, data=CommandComplete
> DEBUG: send: tos="X", len=4
> 
> 
> log:
> 
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915073: send_failback_request: fail back 2 th node request from pid 2915073
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688: starting fail back. reconnect host 10.5.2.2 (5432)
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688: Do not restart children because we are failbacking node id 2 host10.5.2.2 port:5432 and we are in streaming replication mode
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688: find_primary_node_repeatedly: waiting for finding a primary node
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688: find_primary_node: primary node id is 0
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688: failover: set new primary node: 0
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688: failover: set new master node: 0
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2899688: failback done. reconnect host 10.5.2.2(5432)
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915074: worker process received restart request
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915084: do_child: failback event found. restart myself.
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915166: do_child: failback event found. restart myself.
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915195: do_child: failback event found. restart myself.
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915197: do_child: failback event found. restart myself.
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915108: do_child: failback event found. restart myself.
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915141: do_child: failback event found. restart myself.
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915094: do_child: failback event found. discard existing connections
> Oct 30 23:30:38 web02 pgpool: 2014-10-30 23:30:38 LOG:   pid 2915087: do_child: failback event found. discard existing connections
> Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2915073: pcp child process received restart request
> Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2899688: PCP child 2915073 exits with status 256 in failover()
> Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2899688: fork a new PCP child pid 2915325 in failover()
> Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2899688: worker child 2915074 exits with status 256
> Oct 30 23:30:39 web02 pgpool: 2014-10-30 23:30:39 LOG:   pid 2899688: fork a new worker child pid 2915326
> 
> 
> # show pool_nodes;
>  node_id | hostname  | port | status | lb_weight |  role
> ---------+-----------+------+--------+-----------+---------
>  0       | 10.2.2.2  | 5432 | 2      | 0.333333  | primary
>  1       | localhost | 5432 | 2      | 0.333333  | standby
>  2       | 10.5.2.2  | 5432 | 2      | 0.333333  | standby
> (3 rows)
> 
> Yet the node does not receive any queries.
> pgpool2 reload does not help.
> 
> Only a full restart of pgpool brings it back to normal:
> 
> $ pgpool2 restart
> 
> Log:
> 
> received fast shutdown request
> Oct 30 23:56:10 web02 pgpool: 2014-10-30 23:56:10 LOG:   pid 2899688: pgpool main: close listen socket
> Oct 30 23:56:10 web02 pgpool: 2014-10-30 23:56:10 ERROR: pid 2899688: Could not open status file /var/log/postgresql/pgpool_status
> Oct 30 23:56:11 web02 pgpool: 2014-10-30 23:56:11 LOG:   pid 2919540: pgpool-II successfully started. version 3.3.4 (tokakiboshi)
> Oct 30 23:56:11 web02 pgpool: 2014-10-30 23:56:11 LOG:   pid 2919540: find_primary_node: primary node id is 0
> 
> 
> If a node is reported with status = 2 but is not receiving queries, it
> must be a bug.
> 
> This was working fine with the same configuration on pgpool2 3.1.3.
> It does not work after the upgrade to pgpool 3.3.4.
> 
> Or is there anything wrong in my setup?
> Thank you!

