[pgpool-general: 314] Fwd: Re: Trouble with connection during long/large data queries

Joe Barnett joe.barnett at mr72.com
Wed Mar 28 04:49:18 JST 2012


I ran this again against pgpool with "ssl = off" in pgpool.conf and 
experienced the same interesting behavior: namely that while my script 
returned no errors and that all outward indications are that it 
completed successfully, the .dbf files in my shapes contain no data.  
This behavior only happens when running against pgpool with "ssl = off" 
in pgpool.conf.  Summary:

pgpool w/ "ssl = off" in pgpool.conf: behavior described above

pgpool w/ "ssl = on" in pgpool.conf: receive all data expected *for the 
queries that seem to finish before the connection goes away*.  
Connection seems to go away after about 400MB of data are returned.

postgresql directly (with and without ssl): receive all data expected 
for all queries. ~1.2GB

Following is the pgpool.log file (debug = 0 in pgpool.conf):

(I see some timeouts in there, some DISCARD ALL, perhaps those are 
telling?)

2012-03-26 21:50:05 LOG:   pid 2270: Backend status file 
/scratch0/var/run/pgpool/pgpool_status does not exist
2012-03-26 21:50:05 LOG:   pid 2270: pgpool-II successfully started. 
version 3.1.2 (hatsuiboshi)
2012-03-26 21:50:05 LOG:   pid 2270: find_primary_node: primary node id 
is 0
2012-03-26 21:51:51 LOG:   pid 2303: connection received: 
host=localhost port=51530
2012-03-26 21:52:06 LOG:   pid 2303: connection received: 
host=localhost port=51537
2012-03-26 21:52:09 LOG:   pid 2303: statement:  DISCARD ALL
2012-03-26 21:52:09 LOG:   pid 2303: DB node id: 0 backend pid: 3165 
statement:  DISCARD ALL
2012-03-26 21:52:09 LOG:   pid 2303: DB node id: 1 backend pid: 21796 
statement:  DISCARD ALL
2012-03-26 21:52:17 LOG:   pid 2301: connection received: 
host=localhost port=51542
2012-03-26 21:53:17 LOG:   pid 2301: authentication is timeout
2012-03-26 21:55:21 LOG:   pid 2300: connection received: 
host=boss.xxx.xxx port=42486
2012-03-26 21:55:21 LOG:   pid 2300: statement: 		CREATE TEMP TABLE 
temp_shapes_daily
		AS SELECT * FROM view_act_shapefiles_daily

2012-03-26 21:55:21 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: 		CREATE TEMP TABLE temp_shapes_daily
		AS SELECT * FROM view_act_shapefiles_daily

2012-03-26 21:55:48 LOG:   pid 2297: connection received: 
host=localhost port=51610
2012-03-26 21:55:54 LOG:   pid 2280: connection received: 
host=localhost port=51611
2012-03-26 21:56:48 LOG:   pid 2297: authentication is timeout
2012-03-26 21:56:54 LOG:   pid 2280: authentication is timeout
2012-03-26 23:19:57 LOG:   pid 2300: statement: 		SELECT COUNT(*)
		FROM temp_shapes_daily

2012-03-26 23:19:57 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname 
= 'pgpool_regclass'
2012-03-26 23:19:57 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE 
c.relname = 'pg_namespace' AND c.relnamespace = n.oid AND n.nspname = 
'pg_catalog'
2012-03-26 23:19:57 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE 
c.relname = 'temp_shapes_daily' AND c.relnamespace = n.oid AND n.nspname 
= 'pg_catalog'
2012-03-26 23:19:57 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_catalog.pg_class AS c, pg_attribute 
AS a WHERE c.relname = 'pg_class' AND a.attrelid = c.oid AND a.attname = 
'relistemp'
2012-03-26 23:19:57 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE 
c.relname = 'temp_shapes_daily' AND c.relnamespace = n.oid AND n.nspname 
~ '^pg_temp_'
2012-03-26 23:19:57 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: 		SELECT COUNT(*)
		FROM temp_shapes_daily

2012-03-26 23:19:59 LOG:   pid 2300: statement: 		SELECT column_name, 
data_type
		FROM information_schema.columns
		WHERE table_name = 'temp_shapes_daily'
		AND column_name NOT IN ('pg_footprint', 'pgg_footprint')
		ORDER BY ordinal_position

2012-03-26 23:19:59 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE 
c.relname = 'columns' AND c.relnamespace = n.oid AND n.nspname = 
'pg_catalog'
2012-03-26 23:20:00 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE 
c.relname = 'columns' AND c.relnamespace = n.oid AND n.nspname ~ 
'^pg_temp_'
2012-03-26 23:20:00 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_catalog.pg_class AS c, 
pg_catalog.pg_attribute AS a WHERE c.relname = 'pg_class' AND a.attrelid 
= c.oid AND a.attname = 'relpersistence'
2012-03-26 23:20:00 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.relname 
= 'columns' AND c.relpersistence = 'u'
2012-03-26 23:20:00 LOG:   pid 2300: DB node id: 1 backend pid: 22736 
statement: 		SELECT column_name, data_type
		FROM information_schema.columns
		WHERE table_name = 'temp_shapes_daily'
		AND column_name NOT IN ('pg_footprint', 'pgg_footprint')
		ORDER BY ordinal_position

2012-03-26 23:20:00 LOG:   pid 2300: statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'COMMISSIONING'
			AND original_product_id LIKE 'nac%'

2012-03-26 23:20:00 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'COMMISSIONING'
			AND original_product_id LIKE 'nac%'

2012-03-26 23:20:04 LOG:   pid 2300: statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'COMMISSIONING'
			AND original_product_id LIKE 'wac%'

2012-03-26 23:20:04 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'COMMISSIONING'
			AND original_product_id LIKE 'wac%'

2012-03-26 23:20:07 LOG:   pid 2300: statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'NOMINAL MISSION'
			AND original_product_id LIKE 'nac%'

2012-03-26 23:20:07 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'NOMINAL MISSION'
			AND original_product_id LIKE 'nac%'

2012-03-26 23:20:31 LOG:   pid 2300: statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'NOMINAL MISSION'
			AND original_product_id LIKE 'wac%'

2012-03-26 23:20:31 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'NOMINAL MISSION'
			AND original_product_id LIKE 'wac%'

2012-03-26 23:20:43 LOG:   pid 2300: statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'SCIENCE MISSION'
			AND original_product_id LIKE 'nac%'

2012-03-26 23:20:43 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'SCIENCE MISSION'
			AND original_product_id LIKE 'nac%'

2012-03-26 23:21:18 LOG:   pid 2300: statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'SCIENCE MISSION'
			AND original_product_id LIKE 'wac%'

2012-03-26 23:21:18 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement: 			SELECT *
			FROM temp_shapes_daily
			WHERE mission_phase_name = 'SCIENCE MISSION'
			AND original_product_id LIKE 'wac%'

2012-03-26 23:21:32 LOG:   pid 2300: statement:  DISCARD ALL
2012-03-26 23:21:32 LOG:   pid 2300: DB node id: 0 backend pid: 4546 
statement:  DISCARD ALL
2012-03-26 23:21:32 LOG:   pid 2300: DB node id: 1 backend pid: 22736 
statement:  DISCARD ALL
2012-03-27 04:00:25 LOG:   pid 2305: Replication of node:1 is behind 
16662680 bytes from the primary server (node:0)

-------- Original Message --------
Subject: [pgpool-general: 309] Re: Trouble with connection during 
long/large data queries
Date: 2012-03-26 22:08
 From: Joe Barnett <joe.barnett at mr72.com>
To: Matt Wise <matt at nextdoor.com>
Cc: pgpool-general at pgpool.net

Thank you for the responses.

We are running pgpool-ii 3.1.2 against postgres 9.1.2.  While we employ 
(postgres') streaming replication, we point pgpool only to the master 
server at this time (running a pgpool daemon local to each client 
machine).  In test, however, we do point to both the master and the 
(hot) standby server.  The pgpool behavior we are seeing is identical on 
both our test and production systems.

I followed the suggestion of disabling SSL at the pgpool level ("ssl = 
off" in pgpool.conf; prior I had just set PGSSLMODE=disable in the shell 
prior to running my Ruby script).  This produced some interesting 
results (which I will try to reproduce): the script ran without error, 
and errors were logged in neither the postgres log nor the pgpool log, 
but the resulting output contained no data.  The output, in this case, 
are shapefiles (ESRI formatted geographic data) each of which contain 
several pieces including a .dbf file.  The .dbf files were empty, though 
everything else looked as would be expected.  As the logs give no clue, 
I can not really identify what happened.  Again, I will try to reproduce 
this behavior.

In a more normal setting (SSL enabled both from the client to pgpool 
and from pgpool to the postgres server) we get an expected amount of 
data for the queries that run before the connection goes away.  In cases 
where this is run through pgpool we get about 1/3 of the way through the 
job before the connection goes away.  When we access the database 
directly, we get all of the data expected.  As I think about this, I 
might be able to modify the script to request a new connection for each 
iteration and that might prove to be another workaround--though even if 
it works, it will still only buy us some time rather than fix the root 
problem.

More on the issue of (nearly) empty return sets if I can reproduce 
them.  Otherwise, I hope this info helps shed some light on this.

Thanks,

Joe

On 2012-03-26 18:03, Matt Wise wrote:
> I believe we may be seeing this problem as well, but with much
> shorter querytimes. We have a few jobs that run database transactions
> for up to 15minutes or so. They will do 500-1000 individual database
> queries, that range from 1-30 seconds in length. We continue to see
> failures when running through PGPool and end up having to run these
> through our legacy PGBouncer service directly on the database hosts.
>
> We DO use SSL between PGPool and the backend PostgreSQL databases. We
> also use SSL in between PGPool and the clients (which is a local copy
> of PGBouncer on each client), but we do that with Stunnel so its
> transparent to the applications.
>
> —Matt
>
>
>
> On Mar 26, 2012, at 5:07 PM, Tatsuo Ishii wrote:
>
>> What pgpoo version are you using?
>>
>> Also if you disable SSL between pgpool and PostgreSQL, do you see 
>> any
>> difference?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>> We are using pgpool in front of a postrgesql 9.1 database and,
>>> generally, we are experiencing tremendous improvement in 
>>> performance
>>> (and even stability) over accessing the database directly.  We are
>>> seeing one behavior, however, which is troubling: long queries 
>>> which
>>> produce much resulting data seem to kill (or are killed by) the 
>>> pgpool
>>> process handling the connection.  By long/large, I mean connections
>>> lasting ninety minutes or more (often comprising just a handful of
>>> long running queries) and returning >1GB of data.  Specifically, 
>>> this
>>> is happening when querying (and returning) large, global sets of
>>> geographic data (which are not conducive to being run as smaller,
>>> independent queries), though we can reproduce this problem with any
>>> query taking sufficient time and returning sufficient data.
>>>
>>> The workaround is to run these queries against the database 
>>> directly,
>>> bypassing pgpool.  We would, however, like to require nearly all
>>> access to the database to go through pgpool, restricting direct 
>>> access
>>> to things of an administrative nature.
>>>
>>> Ideas, advice, and criticism welcome, thanks,
>>>
>>> Joe Barnett
>>>
>>> --- Errors and logs follow ---
>>>
>>> ### Error returned when connection goes away (from Ruby script):
>>>
>>> (with SSL enabled)
>>> ./bin/shapes.rb:150:in `exec': SSL SYSCALL error: EOF detected
>>> (PG::Error)
>>>
>>> (with SSL disabled)
>>> ./bin/shapes.rb:150:in `exec': PG::Error
>>>
>>> ### Excerpts from pgpool.log:
>>>
>>> 2012-03-26 13:16:35 LOG: pid 14860: statement: SELECT
>>> max(octet_length(hist_file_path::text)) from temp_shapes_daily
>>> 2012-03-26 13:16:35 LOG: pid 14860: DB node id: 0 backend pid: 
>>> 16005
>>> statement: SELECT max(octet_length(hist_file_path::text)) from
>>> temp_shapes_daily
>>> 2012-03-26 13:16:36 LOG: pid 14860: statement: SELECT
>>> max(octet_length(hist_file_name::text)) from temp_shapes_daily
>>> 2012-03-26 13:16:36 LOG: pid 14860: DB node id: 0 backend pid: 
>>> 16005
>>> statement: SELECT max(octet_length(hist_file_name::text)) from
>>> temp_shapes_daily
>>> 2012-03-26 13:16:37 LOG: pid 14860: statement: SELECT *
>>>                        FROM temp_shapes_daily
>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>                        AND original_product_id LIKE 'nac%'
>>>
>>> 2012-03-26 13:16:37 LOG: pid 14860: DB node id: 0 backend pid: 
>>> 16005
>>> statement: SELECT *
>>>                        FROM temp_shapes_daily
>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>                        AND original_product_id LIKE 'nac%'
>>>
>>> 2012-03-26 13:17:28 LOG: pid 14860: statement: SELECT *
>>>                        FROM temp_shapes_daily
>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>                        AND original_product_id LIKE 'wac%'
>>>
>>> 2012-03-26 13:17:28 LOG: pid 14860: DB node id: 0 backend pid: 
>>> 16005
>>> statement: SELECT *
>>>                        FROM temp_shapes_daily
>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>                        AND original_product_id LIKE 'wac%'
>>>
>>> 2012-03-26 13:17:33 LOG: pid 14846: connection received:
>>> host=localhost port=39636
>>> 2012-03-26 13:18:08 LOG: pid 14860: statement: SELECT *
>>>                        FROM temp_shapes_daily
>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>                        AND original_product_id LIKE 'nac%'
>>>
>>> 2012-03-26 13:18:08 LOG: pid 14860: DB node id: 0 backend pid: 
>>> 16005
>>> statement: SELECT *
>>>                        FROM temp_shapes_daily
>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>                        AND original_product_id LIKE 'nac%'
>>>
>>> 2012-03-26 13:22:38 LOG: pid 14860: statement: SELECT *
>>>                        FROM temp_shapes_daily
>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>                        AND original_product_id LIKE 'wac%'
>>>
>>> 2012-03-26 13:22:38 LOG: pid 14860: DB node id: 0 backend pid: 
>>> 16005
>>> statement: SELECT *
>>>                        FROM temp_shapes_daily
>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>                        AND original_product_id LIKE 'wac%'
>>>
>>> 2012-03-26 13:22:43 ERROR: pid 14860: pool_read2: EOF encountered 
>>> with
>>> backend
>>> 2012-03-26 13:22:43 LOG: pid 14860: do_child: exits with status 1 
>>> due
>>> to error
>>>
>>> (I have pgpool logs with DEBUG info as well, but they are very 
>>> large
>>> (and I am not certain how much useful information they contain).  I
>>> can certainly provide excerpts from those logs if anyone thinks 
>>> they
>>> are useful--perhaps suggest things for which I might look.)
>>>
>>> ### Excerpts from postgresql log:
>>>
>>> LOG:  could not send data to client: Connection reset by peer
>>> STATEMENT:                      SELECT *
>>>                                FROM temp_shapes_daily
>>>                                WHERE mission_phase_name = 'NOMINAL 
>>> MISSION'
>>>                                AND original_product_id LIKE 'wac%'
>>>
>>> LOG:  could not send data to client: Broken pipe
>>> STATEMENT:                      SELECT *
>>>                                FROM temp_shapes_daily
>>>                                WHERE mission_phase_name = 'NOMINAL 
>>> MISSION'
>>>                                AND original_product_id LIKE 'wac%'
>>>
>>> LOG:  SSL error: bad write retry
>>> LOG:  could not send data to client: Connection reset by peer
>>>
>>> _______________________________________________
>>> pgpool-general mailing list
>>> pgpool-general at pgpool.net
>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general

-- 
Joe Barnett
623.670.1326
_______________________________________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general

-- 
Joe Barnett
623.670.1326


More information about the pgpool-general mailing list