[Pgpool-general] Parallel SELECT now works but is very slow...but dblink is parallel?

Sandeep Thakkar sandeeptt at yahoo.com
Tue Feb 22 04:51:52 UTC 2011


Is it a bug in pgpool-3.0.1 or have I misunderstood anything here? Please 
explain. 


Thankyou very much.





________________________________
From: Sandeep Thakkar <sandeeptt at yahoo.com>
To: Bob Lunney <bob_lunney at yahoo.com>; pgpool-general at pgfoundry.org
Sent: Fri, February 18, 2011 1:58:36 PM
Subject: Re: [Pgpool-general] Parallel SELECT now works but is very slow...but 
dblink is parallel?


Hi Bob,

I followed the tutorial, and in the end I executed these queries:

psql -p 5432 -c "SELECT * FROM pgbench_accounts" bench_parallel
psql -p 5433 -c "SELECT * FROM pgbench_accounts" bench_parallel
psql -p 9999 -c "select pool_parallel('SELECT * FROM pgbench_accounts')" 
bench_parallel

and these are time these commands took to execute:
start_date=Fri Feb 18 13:22:23 IST 2011
port=5432
end_date=Fri Feb 18 13:22:26 IST 2011
start_date=Fri Feb 18 13:22:26 IST 2011
port=5433
end_date=Fri Feb 18 13:22:30 IST 2011
start_date=Fri Feb 18 13:22:30 IST 2011
port=9999


That means, parallel query is not working?




________________________________
From: Bob Lunney <bob_lunney at yahoo.com>
To: pgpool-general at pgfoundry.org
Sent: Wed, February 16, 2011 8:43:42 PM
Subject: Re: [Pgpool-general] Parallel SELECT now works but is very slow...but 
dblink is parallel?

Alessandro,

Pgpool will try to run your query on the master backend node first to ensure it 
parses and executes, then will run it on the remaining nodes, collect the 
results, collate them and present the complete parallel result set to the 
client.

Try using the pool_parallel() function to surround your query if you are sure 
the query will parse and execute on a single node:
  select pool_parallel("select a_column 
                          from parallel_table 
                         where some_condition = ...")

That directs pgpool to skip checking the query to verify it can be executed in 
parallel and just send it to each backend node immediately.

Be sure to put the original query in quotes, and you do not have to prefix the 
table with the schema name (but it is a good idea) to  trigger a parallel query 
verification, since your are telling pgpool the query can be run in parallel. 

Good luck, and Thanks! to Tatsuo for a great bit of software.

Bob Lunney

--- On Wed, 2/16/11, Sandeep Thakkar <sandeeptt at yahoo.com> wrote:

From: Sandeep Thakkar
<sandeeptt at yahoo.com>
Subject: Re: [Pgpool-general] Parallel SELECT now works but is very slow...but 
dblink is parallel?
To: "Alessandro Candini" <candini at meeo.it>, pgpool-general at pgfoundry.org
Date: Wednesday, February 16, 2011, 7:21 AM

Yes, so how do you conclude that parallel query is working fine? Infact  using 
parallel query, the test query should return faster.. 

From: Alessandro Candini <candini at meeo.it>
To: pgpool-general at pgfoundry.org
Sent: Tue, February 1, 2011 7:57:41 PM
Subject: [Pgpool-general] Parallel SELECT now works but is very slow...but 
dblink is parallel?


Finally I was able to properly configure pgpool in order to perform a parallel
query.

But it is so slow that it seems that the query on my splitted database is
performed sequentially rather than in a parallel way.

I have on the same machine pgpool-II-3.0.1 and 4 istances of postgresql-9.0.2
(respectovely on port 5433, 5434, 5435, 5436).
Every istance has a different piece (no replication!) of the same database.

A test query that I prepared which retrieve a big  amount of data, takes 6
seconds with pgpool,
but if I launch the same query with 4 different but
contemporary threads directly on database ports (using a python script),
it takes only 0.9 seconds per thread.
Ok, the results are not merged together, but what a difference!

I think that pgpool splits the queries through the instancies, but launch them
sequentially.
I guess "1 sec * db + result_merge_time" = 6 seconds more or
less...

Is that possible and there is a way to fix it?
Is the dblink function launched in a parallel way (contemporarily) on the 4 db
instancies?

My configuration is the follows, thanks in advance...
listen_addresses = '*'
port = 9999
pcp_port = 9898
socket_dir = '/tmp'
pcp_socket_dir = '/tmp'
backend_socket_dir = '/tmp'
pcp_timeout = 10
num_init_children = 32
max_pool = 4
child_life_time = 0
connection_life_time = 0
child_max_connections =  0
client_idle_limit = 0
authentication_timeout = 60
logdir = '/var/log/pgpool'
pid_file_name = '/var/log/pgpool/pgpool.pid'
replication_mode = true
load_balance_mode = false
replication_stop_on_mismatch = false
failover_if_affected_tuples_mismatch = false
replicate_select = false
reset_query_list = 'ABORT; DISCARD ALL'
white_function_list = ''
black_function_list = 'nextval,setval'
print_timestamp =
true
master_slave_mode = false
master_slave_sub_mode = 'stream'
delay_threshold = 100
log_standby_delay = 'if_over_threshold'
connection_cache = true
health_check_timeout = 10
health_check_period = 10
health_check_user = 'postgis'
failover_command = '/usr/lib/pgpool-II/3.0.1/bin/failover.sh %h %H
/tmp/trigger_file0'
failback_command = '/usr/lib/pgpool-II/3.0.1/bin/failback.sh %h %M
/tmp/trigger_file0 %m'
fail_over_on_backend_error = false
insert_lock =  true
ignore_leading_white_space = false
log_statement = false
log_per_node_statement = false
log_connections = false
log_hostname = false
parallel_mode = true
enable_query_cache = false
pgpool2_hostname = ''
system_db_hostname = 'localhost'
system_db_port = 5433
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
system_db_password = 'gispost'
backend_hostname0 =
'localhost'
backend_port0 = 5433
backend_weight0 = 1
backend_data_directory0 = '/home/database/9.0.2/db_0/data'
backend_hostname1 = 'localhost'
backend_port1 = 5434
backend_weight1 = 1
backend_data_directory1 = '/home/database/9.0.2/db_1/data'
backend_hostname2 = 'localhost'
backend_port2 = 5435
backend_weight2 = 1
backend_data_directory2 = '/home/database/9.0.2/db_2/data'
backend_hostname3 = 'localhost'
backend_port3 = 5436
backend_weight3 =  1
backend_data_directory3 = '/home/database/9.0.2/db_3/data'
enable_pool_hba = true
recovery_user = 'postgis'
recovery_password = 'gispost'
recovery_1st_stage_command = 'basebackup.sh'
recovery_2nd_stage_command = ''
recovery_timeout = 60
client_idle_limit_in_recovery = 0
lobj_lock_table = 'pgpool_lobj_lock'
ssl = false
debug_level =
0

_______________________________________________
Pgpool-general mailing list
Pgpool-general at pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general




      
-----Inline Attachment Follows-----

_______________________________________________
Pgpool-general mailing list
Pgpool-general at pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general



      
_______________________________________________
Pgpool-general mailing list
Pgpool-general at pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general


      
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://pgfoundry.org/pipermail/pgpool-general/attachments/20110221/42f94839/attachment-0001.html>


More information about the Pgpool-general mailing list