[Pgpool-general] pcp_child: pcp_read() failed. reason: Success

Steven Crandell steven.crandell at gmail.com
Fri Nov 6 05:37:34 UTC 2009


Actually I was running pgpool on db2 (backend_hostname1) and am now
running it on db3 (backend_hostname2).
I have actually suspected that pgpool might be opting for some sort of
socket connection to the local instance of postgres instead of using the
TCP/IP connection parameters in an effort to speed things up.

I have done my best to ensure that pgpool has completely separate socket
directories but it wouldn't be hard for pgpool to find a local postgres
socket if it wanted.  If I end up with another outage and this time db3 is
the postgres instance that locks up, I'll be fairly certain that this is the
problem, but for the moment I can only speculate.
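For reference, the three pgpool.conf directives that govern where pgpool creates and looks for UNIX sockets are collected below (the values match the config quoted later in this thread; whether setting them is enough to keep pgpool away from the local postgres socket is exactly the open question here):

```
# where pgpool creates its own listening socket
socket_dir = '/usr/local/pgpool'
# where the pcp tools' socket lives
pcp_socket_dir = '/usr/local/pgpool'
# where pgpool looks for a backend's UNIX socket when a
# backend_hostname is set to '' (empty string)
backend_socket_dir = '/usr/local/pgpool'
```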

I'm assuming you're suggesting I set backend_hostname0 = '' because it is
already weighted to 0.0 anyway?

I have db1 (backend_hostname0) weighted to 0.0 in an effort to direct all
selects to the two slave hosts (db2 and db3) but still benefit from pgpool
intelligently sending writes to db1.
db1 is the Mammoth Replicator master host and needs all available i/o to deal with
writes.
My understanding is that this is how "master_slave_mode = true" works.
Writes are always directed to backend_hostname0.

If I need to reevaluate that thinking, please advise, but that setup has
been working for me for months now.
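As a rough illustration of the weighting described above (a simplified sketch, not pgpool's actual selection code), a backend with weight 0.0 is never chosen for load-balanced SELECTs, while db2 and db3 split them roughly 40/60:

```python
import random

# hypothetical backends mirroring the pgpool.conf in this thread:
# the master (db1) carries weight 0.0 so it receives no SELECT traffic
backends = [("db1", 0.0), ("db2", 0.4), ("db3", 0.6)]

def pick_backend(rng):
    """Pick a backend for a SELECT in proportion to its weight."""
    total = sum(w for _, w in backends)
    x = rng.uniform(0, total)
    for name, w in backends:
        if x < w:
            return name
        x -= w
    return backends[-1][0]   # guard against floating-point edge cases

rng = random.Random(0)
counts = {name: 0 for name, _ in backends}
for _ in range(10000):
    counts[pick_backend(rng)] += 1

# db1 stays at 0; db2 and db3 land near the 40/60 split
print(counts)
```

Writes are unaffected by the weights: in master/slave mode they always go to backend 0.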

thx
-s

On Thu, Nov 5, 2009 at 9:26 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Setting aside the misleading error message from pcp_child (it seems
> someone assumed that EOF would set an error number in the global errno
> variable; I will fix this anyway), it looks to me as if the socket
> files are going dead. I suspect some network stack bug could cause
> this, but I'm not sure. One thing you might want to try is changing this:
>
> backend_hostname0 = 'db1.xxx.xxx'
>
> to:
>
> backend_hostname0 = ''
>
> This will make pgpool use a UNIX domain socket for the communication
> channel to PostgreSQL, rather than TCP/IP. It may or may not affect
> the problem you have, since the network code in the kernel will be
> different.
>
> (I assume you are running pgpool on db1.xxx.xxx)
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
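The point Tatsuo makes above about EOF and errno explains the odd "reason: Success" in the subject line, and can be reproduced with a short sketch: a read that hits end-of-file returns successfully with an empty result and leaves errno untouched at 0, so formatting the "reason" with strerror(errno) prints the string for errno 0 (which is "Success" on glibc):

```python
import os

# simulate the peer closing its end of the connection: reading from a
# pipe whose write end is closed yields EOF without raising an error
r, w = os.pipe()
os.close(w)            # writer goes away -> reader will see EOF

data = os.read(r, 16)  # returns b'' at EOF; errno is never set
os.close(r)

# errno is still 0, so strerror(0) gives the misleading "Success"
# (the exact string is platform-dependent; glibc says "Success")
print("read returned %r, reason: %s" % (data, os.strerror(0)))
```

In other words, pcp_child is treating a clean disconnect as if it were a failed system call and consulting errno, which nothing has set.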
>
> > Has anyone else run into this:
> >
> > My pgpool instance runs without problems for days on end and then
> > suddenly stops responding to all requests.
> > At the same moment, one of my three backend db hosts becomes completely
> > inaccessible.
> > Pgpool will not respond to shutdown, or even kill, and must be kill -9'd.
> > Once all pgpool processes are out of the way, the inaccessible postgres
> > server once again becomes responsive.
> > I restart pgpool and everything works properly for a few more days.
> >
> > At the moment the problem occurs, pgpool's log output, which typically
> > consists of just connection logging, turns into a steady stream of this:
> > Nov  5 11:33:18 src at obfuscated pgpool: 2009-11-05 11:33:18 ERROR: pid 12811:
> > pcp_child: pcp_read() failed. reason: Success
> > These errors show up sporadically in my pgpool logs all the time but don't
> > appear to have any adverse effects until the whole thing takes a dive.
> > I would desperately like to know what this error message is trying to tell me.
> >
> > I have not been able to correlate any given query/connection/process to
> > the timing of the outages.
> > Sometimes they happen at peak usage periods, sometimes they happen in
> > the middle of the night.
> >
> > I experienced this problem using pgpool-II v1.3 and have recently
> > upgraded to pgpool-II v2.2.5 but am still seeing the same issue.
> >
> > It may be relevant to point out that I am running pgpool on one of the
> > machines that is also acting as a postgres backend, and it is always the
> > postgres instance on the pgpool host that locks up.
> > This morning I moved the pgpool instance onto another one of the postgres
> > backend hosts to see whether the cohabitation of pgpool and postgres is
> > causing problems, whether there is simply an issue with postgres on that
> > host, or whether this is just a coincidence.
> > I likely won't gain anything from this test for a day or more.
> >
> > Also relevant is that I am running Mammoth Replicator and am only using
> > pgpool for connection load balancing and high availability.
> >
> > Below is my pgpool.conf.
> >
> > Any thoughts appreciated.
> >
> > -steve crandell
> >
> >
> >
> > #
> > # pgpool-II configuration file sample
> > # $Header: /cvsroot/pgpool/pgpool-II/pgpool.conf.sample,v 1.4.2.3 2007/10/12 09:15:02 y-asaba Exp $
> >
> > # Host name or IP address to listen on: '*' for all, '' for no TCP/IP
> > # connections
> > #listen_addresses = 'localhost'
> > listen_addresses = '10.xxx.xxx.xxx'
> >
> > # Port number for pgpool
> > port = 5432
> >
> > # Port number for pgpool communication manager
> > pcp_port = 9898
> >
> > # Unix domain socket path.  (The Debian package defaults to
> > # /var/run/postgresql.)
> > socket_dir = '/usr/local/pgpool'
> >
> > # Unix domain socket path for pgpool communication manager.
> > pcp_socket_dir = '/usr/local/pgpool'
> >
> > # Unix domain socket path for the backend. (The Debian package defaults
> > # to /var/run/postgresql.)
> > backend_socket_dir = '/usr/local/pgpool'
> >
> > # pgpool communication manager timeout. 0 means no timeout, but that
> > # is strongly discouraged!
> > pcp_timeout = 10
> >
> > # number of pre-forked child process
> > num_init_children = 32
> >
> >
> > # Number of connection pools allowed for a child process
> > max_pool = 4
> >
> >
> > # If idle for this many seconds, child exits.  0 means no timeout.
> > child_life_time = 30
> >
> > # If idle for this many seconds, connection to PostgreSQL closes.
> > # 0 means no timeout.
> > #connection_life_time = 0
> > connection_life_time = 30
> >
> > # If child_max_connections connections were received, child exits.
> > # 0 means no exit.
> > # change
> > child_max_connections = 0
> >
> > # Maximum time in seconds to complete client authentication.
> > # 0 means no timeout.
> > authentication_timeout = 60
> >
> > # Logging directory (more accurately, the directory for the PID file)
> > logdir = '/usr/local/pgpool'
> >
> > # Replication mode
> > replication_mode = false
> >
> > # Set this to true if you want to avoid deadlock situations when
> > # replication is enabled.  There will, however, be a noticeable
> > # performance degradation.  A workaround is to set this to false and
> > # insert a /*STRICT*/ comment at the beginning of the SQL command.
> > replication_strict = false
> >
> > # When replication_strict is set to false, there will be a chance for
> > # deadlocks.  Set this to nonzero (in milliseconds) to detect this
> > # situation and resolve the deadlock by aborting current session.
> > replication_timeout = 5000
> >
> > # Load balancing mode, i.e., all SELECTs except in a transaction block
> > # are load balanced.  This is ignored if replication_mode is false.
> > # change
> > load_balance_mode = true
> >
> > # if there's a data mismatch between master and secondary
> > # start degeneration to stop replication mode
> > replication_stop_on_mismatch = false
> >
> > # If true, replicate SELECT statement when load balancing is disabled.
> > # If false, it is only sent to the master node.
> > # change
> > replicate_select = true
> >
> > # Semicolon separated list of queries to be issued at the end of a
> > # session
> > reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
> >
> > # If true print timestamp on each log line.
> > print_timestamp = true
> >
> > # If true, operate in master/slave mode.
> > # change
> > master_slave_mode = true
> >
> > # If true, cache connection pool.
> > connection_cache = false
> >
> > # Health check timeout.  0 means no timeout.
> > health_check_timeout = 20
> >
> > # Health check period.  0 means no health check.
> > health_check_period = 0
> >
> > # Health check user
> > health_check_user = 'nobody'
> >
> > # If true, automatically lock table with INSERT statements to keep SERIAL
> > # data consistency.  An /*INSERT LOCK*/ comment has the same effect.  A
> > # /*NO INSERT LOCK*/ comment disables the effect.
> > insert_lock = false
> >
> > # If true, ignore leading white spaces of each query while pgpool judges
> > # whether the query is a SELECT so that it can be load balanced.  This
> > # is useful for certain APIs such as DBI/DBD which are known to add an
> > # extra leading white space.
> > ignore_leading_white_space = false
> >
> > # If true, print all statements to the log.  Like the log_statement
> > # option to PostgreSQL, this allows for observing queries without
> > # engaging in full debugging.
> > log_statement = false
> >
> > # If true, incoming connections will be printed to the log.
> > # change
> > log_connections = true
> >
> > # If true, hostname will be shown in ps status. Also shown in
> > # connection log if log_connections = true.
> > # Be warned that this feature adds the overhead of a hostname lookup.
> > log_hostname = false
> >
> > # if non 0, run in parallel query mode
> > parallel_mode = false
> >
> > # if non 0, use query cache
> > enable_query_cache = 0
> >
> > #set pgpool2 hostname
> > pgpool2_hostname = ''
> >
> > # system DB info
> > #system_db_hostname = 'localhost'
> > #system_db_port = 5432
> > #system_db_dbname = 'pgpool'
> > #system_db_schema = 'pgpool_catalog'
> > #system_db_user = 'pgpool'
> > #system_db_password = ''
> >
> > # backend_hostname, backend_port, backend_weight
> > # here are examples
> > backend_hostname0 = 'db1.xxx.xxx'
> > backend_port0 = 5433
> > backend_weight0 = 0.0
> >
> > backend_hostname1 = 'db2.xxx.xxx'
> > backend_port1 = 5433
> > backend_weight1 = 0.4
> >
> > backend_hostname2 = 'db3.xxx.xxx'
> > backend_port2 = 5433
> > backend_weight2 = 0.6
> >
> >
> >
> > # - HBA -
> >
> > # If true, use pool_hba.conf for client authentication. In pgpool-II
> > # 1.1, the default value is false. The default value will be true in
> > # 1.2.
> > enable_pool_hba = false
>