[pgpool-general: 59] Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

Tatsuo Ishii ishii at postgresql.org
Thu Dec 8 17:42:01 JST 2011


Lonni,

Thanks. I will look into this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> On Thu, Dec 8, 2011 at 12:06 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>> On Wed, Dec 7, 2011 at 11:10 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>> How is your PostgreSQL server configuration?
>>>
>>> Did you want a copy of my postgresql.conf ?
> 
> attached
> 
>>
>> Yes, please. Also please show me the output of "show pool_status;".
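
For reference, "show pool_status;" is handled by pgpool itself rather than by
PostgreSQL, so it can be issued with any libpq client against the pgpool port
(9999 in the output below); host, user and database here are only placeholders:

  psql -h <pgpool-host> -p 9999 -U <user> -d <database> -c 'show pool_status;'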
> 
> 
> show pool_status ;
>                  item                 |             value              |                           description
> --------------------------------------+--------------------------------+------------------------------------------------------------------
>  listen_addresses                     | *                              | host name(s) or IP address(es) to listen to
>  port                                 | 9999                           | pgpool accepting port number
>  socket_dir                           | /tmp                           | pgpool socket directory
>  num_init_children                    | 195                            | # of children initially pre-forked
>  child_life_time                      | 300                            | if idle for this seconds, child exits
>  connection_life_time                 | 3                              | if idle for this seconds, connection closes
>  client_idle_limit                    | 0                              | if idle for this seconds, child connection closes
>  child_max_connections                | 0                              | if max_connections received, chile exits
>  max_pool                             | 1                              | max # of connection pool per child
>  authentication_timeout               | 60                             | maximum time in seconds to complete client authentication
>  logdir                               | /tmp                           | PgPool status file logging directory
>  log_destination                      | stderr                         | logging destination
>  syslog_facility                      | LOCAL0                         | syslog local faclity
>  syslog_ident                         | pgpool                         | syslog program ident string
>  pid_file_name                        | /var/run/pgpool/pgpool.pid     | path to pid file
>  replication_mode                     | 0                              | non 0 if operating in replication mode
>  load_balance_mode                    | 1                              | non 0 if operating in load balancing mode
>  replication_stop_on_mismatch         | 0                              | stop replication mode on fatal error
>  failover_if_affected_tuples_mismatch | 0                              | failover if affected tuples are mismatch
>  replicate_select                     | 0                              | non 0 if SELECT statement is replicated
>  reset_query_list                     | ABORT; DISCARD ALL             | queries issued at the end of session
>  white_function_list                  |                                | functions those do not write to database
>  black_function_list                  | currval,lastval,nextval,setval | functions those write to database
>  print_timestamp                      | 1                              | if true print time stamp to each log line
>  master_slave_mode                    | 1                              | if true, operate in master/slave mode
>  master_slave_sub_mode                | stream                         | master/slave sub mode
>  sr_check_period                      | 0                              | sr check period
>  sr_check_user                        | postgres                       | sr check user
>  delay_threshold                      | 0                              | standby delay threshold
>  log_standby_delay                    | none                           | how to log standby delay
>  connection_cache                     | 1                              | if true, cache connection pool
>  health_check_timeout                 | 90                             | health check timeout
>  health_check_period                  | 0                              | health check period
>  health_check_user                    | postgres                       | health check user
>  failover_command                     |                                | failover command
>  follow_master_command                |                                | follow master command
>  failback_command                     |                                | failback command
>  fail_over_on_backend_error           | 0                              | fail over on backend error
>  insert_lock                          | 0                              | insert lock
>  ignore_leading_white_space           | 1                              | ignore leading white spaces
>  num_reset_queries                    | 2                              | number of queries in reset_query_list
>  pcp_port                             | 9898                           | PCP port # to bind
>  pcp_socket_dir                       | /tmp                           | PCP socket directory
>  pcp_timeout                          | 10                             | PCP timeout for an idle client
>  log_statement                        | 0                              | if non 0, logs all SQL statements
>  log_per_node_statement               | 0                              | if non 0, logs all SQL statements on each node
>  log_connections                      | 0                              | if true, print incoming connections to the log
>  log_hostname                         | 0                              | if true, resolve hostname for ps and log print
>  enable_pool_hba                      | 1                              | if true, use pool_hba.conf for client authentication
>  recovery_user                        | postgres                       | online recovery user
>  recovery_1st_stage_command           |                                | execute a command in first stage.
>  recovery_2nd_stage_command           |                                | execute a command in second stage.
>  recovery_timeout                     | 90                             | max time in seconds to wait for the recovering node's postmaster
>  client_idle_limit_in_recovery        | 0                              | if idle for this seconds, child connection closes in recovery 2n
>  lobj_lock_table                      |                                | table name used for large object replication control
>  ssl                                  | 0                              | SSL support
>  ssl_key                              |                                | path to the SSL private key file
>  ssl_cert                             |                                | path to the SSL public certificate file
>  debug_level                          | 0                              | debug message level
>  relcache_expire                      | 0                              | relation cache expiration time in seconds
>  parallel_mode                        | 0                              | if non 0, run in parallel query mode
>  enable_query_cache                   | 0                              | if non 0, use query cache
>  pgpool2_hostname                     | cuda-fs2                       | pgpool2 hostname
>  system_db_hostname                   | localhost                      | system DB hostname
>  system_db_port                       | 5432                           | system DB port number
>  system_db_dbname                     | pgpool                         | system DB name
>  system_db_schema                     | pgpool_catalog                 | system DB schema name
>  system_db_user                       | pgpool                         | user name to access system DB
>  backend_hostname0                    | cuda-db2                       | backend #0 hostname
>  backend_port0                        | 5432                           | backend #0 port number
>  backend_weight0                      | 0.090909                       | weight of backend #0
>  backend_status0                      | 2                              | status of backend #0
>  standby_delay0                       | 0                              | standby delay of backend #0
>  backend_flag0                        | ALLOW_TO_FAILOVER              | backend #0 flag
>  backend_hostname1                    | cuda-db1                       | backend #1 hostname
>  backend_port1                        | 5432                           | backend #1 port number
>  backend_weight1                      | 0.454545                       | weight of backend #1
>  backend_status1                      | 2                              | status of backend #1
>  standby_delay1                       | 0                              | standby delay of backend #1
>  backend_flag1                        | ALLOW_TO_FAILOVER              | backend #1 flag
>  backend_hostname2                    | cuda-db0                       | backend #2 hostname
>  backend_port2                        | 5432                           | backend #2 port number
>  backend_weight2                      | 0.454545                       | weight of backend #2
>  backend_status2                      | 2                              | status of backend #2
>  standby_delay2                       | 0                              | standby delay of backend #2
>  backend_flag2                        | ALLOW_TO_FAILOVER              | backend #2 flag
> (86 rows)
> 
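
Most of the values above correspond directly to pgpool.conf parameters; a
fragment consistent with this report would look roughly like the following
(reconstructed from the status output, not copied from the running file):

  num_init_children = 195
  max_pool = 1
  child_life_time = 300
  connection_life_time = 3
  load_balance_mode = true
  master_slave_mode = true
  master_slave_sub_mode = 'stream'
  backend_hostname0 = 'cuda-db2'
  backend_port0 = 5432
  backend_weight0 = 0.090909
  backend_hostname1 = 'cuda-db1'
  backend_port1 = 5432
  backend_weight1 = 0.454545
  backend_hostname2 = 'cuda-db0'
  backend_port2 = 5432
  backend_weight2 = 0.454545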
>>
>>>> You said before that you have 1 master and 2 standbys. Is this still
>>>> correct? The gdb trace seems to show only two servers, and I would like
>>>> to make sure.
>>>
>>> Yes, we still have 3 servers.  pcp_node_count reports 3.
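
For reference, pcp_node_count asks pgpool's PCP interface (port 9898 in the
status output above) for the total number of backend nodes. A typical
invocation, with host, user and password as placeholders and a leading
timeout of 10 seconds, is:

  pcp_node_count 10 <pgpool-host> 9898 <pcp-user> <pcp-password>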
>>
>> Ok.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>>>> Nope, no SSL enabled anywhere.
>>>>>
>>>>> On Tue, Dec 6, 2011 at 6:49 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>> Can you tell me whether you have SSL enabled between the frontend and
>>>>>> pgpool AND/OR between pgpool and PostgreSQL?
>>>>>> --
>>>>>> Tatsuo Ishii
>>>>>> SRA OSS, Inc. Japan
>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>
>>>>>>> Lonni,
>>>>>>>
>>>>>>> First of all, pgpool-general at pgfoundry has moved to  pgpool-general at pgpool.net.
>>>>>>> Please subscribe here:
>>>>>>>
>>>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>>>>>
>>>>>>> From: Lonni J Friedman <netllama at gmail.com>
>>>>>>> Subject: Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU
>>>>>>> Date: Tue, 6 Dec 2011 16:23:41 -0800
>>>>>>> Message-ID: <CAP=oouHQACD6ELcHOZz+3Oz8NkbbgjK3gSRbcbHrPKoi_DRP8g at mail.gmail.com>
>>>>>>>
>>>>>>>> On Wed, Nov 23, 2011 at 10:51 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>>>>>>> On Wed, Nov 23, 2011 at 10:42 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>>>>>>>>> Not wanting to be impatient, but I'm very concerned about this
>>>>>>>>>>>> problem, since it's impossible to predict when it will occur.  Is there
>>>>>>>>>>>> additional information that I can provide to investigate this further?
>>>>>>>>>>>
>>>>>>>>>>> I really need to know where pgpool is looping.
>>>>>>>>>>
>>>>>>>>>> OK, how can I capture that information?
>>>>>>>>>
>>>>>>>>> You have already attached to the pgpool process, so just typing "n"
>>>>>>>>> (for "next") will show you the next line to be executed. If pgpool is
>>>>>>>>> really looping, "n" should keep showing the same lines after some repetitions.
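
In outline, such a session looks like the following (the PID is whichever
pgpool child top shows spinning at 100% CPU; the transcript quoted below
follows the same pattern -- "bt" shows where the process is, and if it is a
genuine busy loop, repeated "n" keeps cycling through the same lines):

  $ gdb -p <pid-of-spinning-child>
  (gdb) bt
  (gdb) n
  (gdb) detach
  (gdb) quit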
>>>>>>>>
>>>>>>>> OK, this reproduced again.  Here's the output:
>>>>>>>> #######
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x0000000000419305 in pool_process_query (frontend=0x413e1f0, backend=0x24d3520, reset_request=<value optimized out>) at pool_process_query.c:111
>>>>>>>> #1  0x000000000040ae42 in do_child (unix_fd=3, inet_fd=<value optimized out>) at child.c:354
>>>>>>>> #2  0x00000000004054c5 in fork_a_child (unix_fd=3, inet_fd=4, id=126) at main.c:1072
>>>>>>>> #3  0x0000000000407b1c in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:549
>>>>>>>> (gdb) cont
>>>>>>>> Continuing.
>>>>>>>> ^C
>>>>>>>> Program received signal SIGINT, Interrupt.
>>>>>>>> pool_ssl_pending (cp=0x413e1f0) at pool_ssl.c:247
>>>>>>>> 247     {
>>>>>>>> (gdb) n
>>>>>>>> 248             if (cp->ssl_active > 0 && SSL_pending(cp->ssl) > 0)
>>>>>>>> (gdb) n
>>>>>>>> 251     }
>>>>>>>> (gdb) n
>>>>>>>> is_cache_empty (frontend=0x413e1f0, backend=0x24d3520) at pool_process_query.c:3232
>>>>>>>> 3232            if (!pool_read_buffer_is_empty(frontend))
>>>>>>>> (gdb) n
>>>>>>>> 3235            for (i=0;i<NUM_BACKENDS;i++)
>>>>>>>> (gdb) n
>>>>>>>> 3237                    if (!VALID_BACKEND(i))
>>>>>>>> (gdb) n
>>>>>>>> 3244                    if (pool_ssl_pending(CONNECTION(backend, i)))
>>>>>>>> (gdb) n
>>>>>>>> 3247                    if (CONNECTION(backend, i)->len > 0)
>>>>>>>> (gdb) n
>>>>>>>> 3237                    if (!VALID_BACKEND(i))
>>>>>>>> (gdb) n
>>>>>>>> 3244                    if (pool_ssl_pending(CONNECTION(backend, i)))
>>>>>>>> (gdb) n
>>>>>>>> 3247                    if (CONNECTION(backend, i)->len > 0)
>>>>>>>> (gdb) n
>>>>>>>> 3252    }
>>>>>>>> (gdb) n
>>>>>>>> 3235            for (i=0;i<NUM_BACKENDS;i++)
>>>>>>>> (gdb) n
>>>>>>>> 3252    }
>>>>>>>> (gdb) n
>>>>>>>> pool_process_query (frontend=0x413e1f0, backend=0x24d3520, reset_request=<value optimized out>) at pool_process_query.c:361
>>>>>>>> 361                             if (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 388                     if (got_sighup)
>>>>>>>> (gdb) n
>>>>>>>> 111             state = 0;
>>>>>>>> (gdb) n
>>>>>>>> 116                     if (state == 0 && reset_request)
>>>>>>>> (gdb) n
>>>>>>>> 159                     check_stop_request();
>>>>>>>> (gdb) n
>>>>>>>> 165                     if (*InRecovery > 0 && pool_config->client_idle_limit_in_recovery == -1)
>>>>>>>> (gdb) n
>>>>>>>> 179                     if (is_cache_empty(frontend, backend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 361                             if (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 388                     if (got_sighup)
>>>>>>>> (gdb) n
>>>>>>>> 111             state = 0;
>>>>>>>> (gdb) n
>>>>>>>> 116                     if (state == 0 && reset_request)
>>>>>>>> (gdb) n
>>>>>>>> 159                     check_stop_request();
>>>>>>>> (gdb) n
>>>>>>>> 165                     if (*InRecovery > 0 && pool_config->client_idle_limit_in_recovery == -1)
>>>>>>>> (gdb) n
>>>>>>>> 179                     if (is_cache_empty(frontend, backend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 361                             if (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 388                     if (got_sighup)
>>>>>>>> (gdb) n
>>>>>>>> 111             state = 0;
>>>>>>>> (gdb) n
>>>>>>>> 116                     if (state == 0 && reset_request)
>>>>>>>> (gdb) n
>>>>>>>> 159                     check_stop_request();
>>>>>>>> (gdb) n
>>>>>>>> 165                     if (*InRecovery > 0 && pool_config->client_idle_limit_in_recovery == -1)
>>>>>>>> (gdb) n
>>>>>>>> 179                     if (is_cache_empty(frontend, backend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 361                             if (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 388                     if (got_sighup)
>>>>>>>> (gdb) n
>>>>>>>> 111             state = 0;
>>>>>>>> (gdb) n
>>>>>>>> 116                     if (state == 0 && reset_request)
>>>>>>>> (gdb) n
>>>>>>>> 159                     check_stop_request();
>>>>>>>> (gdb) n
>>>>>>>> 165                     if (*InRecovery > 0 && pool_config->client_idle_limit_in_recovery == -1)
>>>>>>>> (gdb) n
>>>>>>>> 179                     if (is_cache_empty(frontend, backend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 361                             if (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) n
>>>>>>>> 379                             if (!pool_read_buffer_is_empty(MASTER(backend)) || pool_is_query_in_progress())
>>>>>>>> (gdb) bt
>>>>>>>> #0  pool_process_query (frontend=0x413e1f0, backend=0x24d3520, reset_request=<value optimized out>) at pool_process_query.c:379
>>>>>>>> #1  0x000000000040ae42 in do_child (unix_fd=3, inet_fd=<value optimized out>) at child.c:354
>>>>>>>> #2  0x00000000004054c5 in fork_a_child (unix_fd=3, inet_fd=4, id=126) at main.c:1072
>>>>>>>> #3  0x0000000000407b1c in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:549
>>>>>>>> (gdb) q
>>>>>>>> A debugging session is active.
>>>>>>>>
>>>>>>>>         Inferior 1 [process 22143] will be detached.
>>>>>>>> #######
>>>>>>>>
>>>>>>>>
>>>>>>>> Does this clarify where the problem exists?  If so, is it fixed in 3.1.1?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>
>>>>>>> Thanks. I will look into this. I'm sure this is not fixed in 3.1.1
>>>>>>> since the issue above has not been addressed yet.

