[pgpool-general: 340] Re: Trouble with connection during long/large data queries

Tatsuo Ishii ishii at postgresql.org
Tue Apr 3 20:26:08 JST 2012


I was able to reproduce your problem while retrieving large data from
PostgreSQL, with SSL enabled between pgpool and PostgreSQL. Attached
is a patch which should fix the problem (it solved the reproduced
problem in our environment). Please try it if you like.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Thank you for the responses.
> 
> We are running pgpool-II 3.1.2 against PostgreSQL 9.1.2.  While we
> employ PostgreSQL's streaming replication, we point pgpool only to the
> master server at this time (running a pgpool daemon local to each
> client machine).  In test, however, we do point to both the master and
> the (hot) standby server.  The pgpool behavior we are seeing is
> identical on our test and production systems.
> 
> I followed the suggestion of disabling SSL at the pgpool level ("ssl =
> off" in pgpool.conf; previously I had just set PGSSLMODE=disable in
> the shell before running my Ruby script).  This produced some
> interesting results (which I will try to reproduce): the script ran
> without error, and no errors were logged in either the postgres log or
> the pgpool log, but the resulting output contained no data.  The
> output, in this case, is a set of shapefiles (ESRI-formatted
> geographic data), each of which contains several pieces including a
> .dbf file.  The .dbf files were empty, though everything else looked
> as expected.  As the logs give no clue, I cannot really identify what
> happened.  Again, I will try to reproduce this behavior.
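> 
> (For reference, these are the two knobs being toggled; the values and
> comments below are illustrative, not our exact config:)
> 
>     # pgpool.conf -- pgpool's SSL support (frontend and backend sides)
>     ssl = off                    # was 'on' while the problem occurred
> 
>     # shell environment of the Ruby script (libpq)
>     export PGSSLMODE=disable     # client no longer requests SSL itself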
> 
> In a more normal setting (SSL enabled both from the client to pgpool
> and from pgpool to the postgres server) we get the expected amount of
> data for the queries that complete before the connection goes away.
> When run through pgpool we get about 1/3 of the way through the job
> before the connection drops; when we access the database directly, we
> get all of the data expected.  As I think about this, I might be able
> to modify the script to request a new connection for each iteration
> (see the sketch below), and that might prove to be another
> workaround--though even if it works, it will only buy us some time
> rather than fix the root problem.
> 
> More on the issue of (nearly) empty result sets if I can reproduce
> them.  Otherwise, I hope this info helps shed some light on this.
> 
> Thanks,
> 
> Joe
> 
> On 2012-03-26 18:03, Matt Wise wrote:
>> I believe we may be seeing this problem as well, but with much
>> shorter query times.  We have a few jobs that run database
>> transactions for up to 15 minutes or so.  They will do 500-1000
>> individual database queries that range from 1 to 30 seconds in
>> length.  We continue to see failures when running through PGPool and
>> end up having to run these through our legacy PGBouncer service
>> directly on the database hosts.
>>
>> We DO use SSL between PGPool and the backend PostgreSQL databases.  We
>> also use SSL between PGPool and the clients (each of which runs a
>> local copy of PGBouncer), but we do that with Stunnel so it's
>> transparent to the applications.
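>>
>> (Roughly like this; the names and ports are illustrative, not our
>> actual config.)
>>
>>     ; stunnel.conf on each client host
>>     [pgpool-tls]
>>     client = yes
>>     accept = 127.0.0.1:6432            ; PGBouncer points here, plaintext
>>     connect = pgpool.example.com:9999  ; TLS out to the pgpool host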
>>
>> ―Matt
>>
>>
>>
>> On Mar 26, 2012, at 5:07 PM, Tatsuo Ishii wrote:
>>
>>> What pgpool version are you using?
>>>
>>> Also if you disable SSL between pgpool and PostgreSQL, do you see any
>>> difference?
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>>> We are using pgpool in front of a PostgreSQL 9.1 database and,
>>>> generally, we are experiencing a tremendous improvement in performance
>>>> (and even stability) over accessing the database directly.  We are
>>>> seeing one troubling behavior, however: long queries that return
>>>> large result sets seem to kill (or be killed by) the pgpool
>>>> process handling the connection.  By long/large, I mean connections
>>>> lasting ninety minutes or more (often comprising just a handful of
>>>> long-running queries) and returning >1GB of data.  Specifically, this
>>>> happens when querying (and returning) large, global sets of
>>>> geographic data (which are not conducive to being run as smaller,
>>>> independent queries), though we can reproduce the problem with any
>>>> query taking sufficient time and returning sufficient data.
>>>>
>>>> The workaround is to run these queries against the database directly,
>>>> bypassing pgpool.  We would, however, like to require nearly all
>>>> access to the database to go through pgpool, restricting direct access
>>>> to things of an administrative nature (see the sketch below).
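>>>>
>>>> (Something like the following pg_hba.conf arrangement on the backend
>>>> is what we have in mind; the addresses and role names are
>>>> hypothetical.)
>>>>
>>>>     # pg_hba.conf: ordinary traffic only from the pgpool hosts,
>>>>     # direct connections only for the admin role (first match wins)
>>>>     hostssl  all  all    10.0.0.0/24  md5   # pgpool hosts
>>>>     hostssl  all  admin  0.0.0.0/0    md5   # direct access, admins only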
>>>>
>>>> Ideas, advice, and criticism welcome, thanks,
>>>>
>>>> Joe Barnett
>>>>
>>>> --- Errors and logs follow ---
>>>>
>>>> ### Error returned when connection goes away (from Ruby script):
>>>>
>>>> (with SSL enabled)
>>>> ./bin/shapes.rb:150:in `exec': SSL SYSCALL error: EOF detected
>>>> (PG::Error)
>>>>
>>>> (with SSL disabled)
>>>> ./bin/shapes.rb:150:in `exec': PG::Error
>>>>
>>>> ### Excerpts from pgpool.log:
>>>>
>>>> 2012-03-26 13:16:35 LOG: pid 14860: statement: SELECT
>>>> max(octet_length(hist_file_path::text)) from temp_shapes_daily
>>>> 2012-03-26 13:16:35 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>> statement: SELECT max(octet_length(hist_file_path::text)) from
>>>> temp_shapes_daily
>>>> 2012-03-26 13:16:36 LOG: pid 14860: statement: SELECT
>>>> max(octet_length(hist_file_name::text)) from temp_shapes_daily
>>>> 2012-03-26 13:16:36 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>> statement: SELECT max(octet_length(hist_file_name::text)) from
>>>> temp_shapes_daily
>>>> 2012-03-26 13:16:37 LOG: pid 14860: statement: SELECT *
>>>>                        FROM temp_shapes_daily
>>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>>                        AND original_product_id LIKE 'nac%'
>>>>
>>>> 2012-03-26 13:16:37 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>> statement: SELECT *
>>>>                        FROM temp_shapes_daily
>>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>>                        AND original_product_id LIKE 'nac%'
>>>>
>>>> 2012-03-26 13:17:28 LOG: pid 14860: statement: SELECT *
>>>>                        FROM temp_shapes_daily
>>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>>                        AND original_product_id LIKE 'wac%'
>>>>
>>>> 2012-03-26 13:17:28 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>> statement: SELECT *
>>>>                        FROM temp_shapes_daily
>>>>                        WHERE mission_phase_name = 'COMMISSIONING'
>>>>                        AND original_product_id LIKE 'wac%'
>>>>
>>>> 2012-03-26 13:17:33 LOG: pid 14846: connection received:
>>>> host=localhost port=39636
>>>> 2012-03-26 13:18:08 LOG: pid 14860: statement: SELECT *
>>>>                        FROM temp_shapes_daily
>>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>                        AND original_product_id LIKE 'nac%'
>>>>
>>>> 2012-03-26 13:18:08 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>> statement: SELECT *
>>>>                        FROM temp_shapes_daily
>>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>                        AND original_product_id LIKE 'nac%'
>>>>
>>>> 2012-03-26 13:22:38 LOG: pid 14860: statement: SELECT *
>>>>                        FROM temp_shapes_daily
>>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>                        AND original_product_id LIKE 'wac%'
>>>>
>>>> 2012-03-26 13:22:38 LOG: pid 14860: DB node id: 0 backend pid: 16005
>>>> statement: SELECT *
>>>>                        FROM temp_shapes_daily
>>>>                        WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>                        AND original_product_id LIKE 'wac%'
>>>>
>>>> 2012-03-26 13:22:43 ERROR: pid 14860: pool_read2: EOF encountered with
>>>> backend
>>>> 2012-03-26 13:22:43 LOG: pid 14860: do_child: exits with status 1 due
>>>> to error
>>>>
>>>> (I have pgpool logs with DEBUG info as well, but they are very large
>>>> (and I am not certain how much useful information they contain).  I
>>>> can certainly provide excerpts from those logs if anyone thinks they
>>>> are useful--perhaps suggest things for which I might look.)
>>>>
>>>> ### Excerpts from postgresql log:
>>>>
>>>> LOG:  could not send data to client: Connection reset by peer
>>>> STATEMENT:                      SELECT *
>>>>                                FROM temp_shapes_daily
>>>>                                WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>                                AND original_product_id LIKE 'wac%'
>>>>
>>>> LOG:  could not send data to client: Broken pipe
>>>> STATEMENT:                      SELECT *
>>>>                                FROM temp_shapes_daily
>>>>                                WHERE mission_phase_name = 'NOMINAL MISSION'
>>>>                                AND original_product_id LIKE 'wac%'
>>>>
>>>> LOG:  SSL error: bad write retry
>>>> LOG:  could not send data to client: Connection reset by peer
>>>>
> 
> -- 
> Joe Barnett
> 623.670.1326
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pool_ssl.patch
Type: text/x-patch
Size: 687 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20120403/4ba2a037/attachment.bin>

