[Pgpool-general] Parallel SELECT now works but is very slow...but dblink is parallel?

Bob Lunney bob_lunney at yahoo.com
Wed Feb 16 15:13:42 UTC 2011


Alessandro,

Pgpool will try to run your query on the master backend node first to ensure it parses and executes, then will run it on the remaining nodes, collect the results, collate them and present the complete parallel result set to the client.

Try using the pool_parallel() function to surround your query if you are sure the query will parse and execute on a single node:
  select pool_parallel("select a_column 
                          from parallel_table 
                         where some_condition = ...")

That directs pgpool to skip checking the query to verify it can be executed in parallel and just send it to each backend node immediately.

Be sure to put the original query in quotes, and you do not have to prefix the table with the schema name (but it is a good idea) to trigger a parallel query verification, since your are telling pgpool the query can be run in parallel. 

Good luck, and Thanks! to Tatsuo for a great bit of software.

Bob Lunney

--- On Wed, 2/16/11, Sandeep Thakkar <sandeeptt at yahoo.com> wrote:

From: Sandeep Thakkar
 <sandeeptt at yahoo.com>
Subject: Re: [Pgpool-general] Parallel SELECT now works but is very slow...but dblink is parallel?
To: "Alessandro Candini" <candini at meeo.it>, pgpool-general at pgfoundry.org
Date: Wednesday, February 16, 2011, 7:21 AM

Yes, so how do you conclude that parallel query is working fine? Infact using parallel query, the test query should return faster.. 
From: Alessandro Candini <candini at meeo.it>
To: pgpool-general at pgfoundry.org
Sent: Tue, February 1, 2011 7:57:41 PM
Subject: [Pgpool-general] Parallel SELECT now works but is very slow...but dblink is parallel?


Finally I was able to properly configure pgpool in order to perform a parallel
query.

But it is so slow that it seems that the query on my splitted database is
performed sequentially rather than in a parallel way.

I have on the same machine pgpool-II-3.0.1 and 4 istances of postgresql-9.0.2
(respectovely on port 5433, 5434, 5435, 5436).
Every istance has a different piece (no replication!) of the same database.

A test query that I prepared which retrieve a big amount of data, takes 6
seconds with pgpool,
but if I launch the same query with 4 different but
contemporary threads directly on database ports (using a python script),
it takes only 0.9 seconds per thread.
Ok, the results are not merged together, but what a difference!

I think that pgpool splits the queries through the instancies, but launch them
sequentially.
I guess "1 sec * db + result_merge_time" = 6 seconds more or
 less...

Is that possible and there is a way to fix it?
Is the dblink function launched in a parallel way (contemporarily) on the 4 db
instancies?

My configuration is the follows, thanks in advance...
listen_addresses = '*'
port = 9999
pcp_port = 9898
socket_dir = '/tmp'
pcp_socket_dir = '/tmp'
backend_socket_dir = '/tmp'
pcp_timeout = 10
num_init_children = 32
max_pool = 4
child_life_time = 0
connection_life_time = 0
child_max_connections = 0
client_idle_limit = 0
authentication_timeout = 60
logdir = '/var/log/pgpool'
pid_file_name = '/var/log/pgpool/pgpool.pid'
replication_mode = true
load_balance_mode = false
replication_stop_on_mismatch = false
failover_if_affected_tuples_mismatch = false
replicate_select = false
reset_query_list = 'ABORT; DISCARD ALL'
white_function_list = ''
black_function_list = 'nextval,setval'
print_timestamp =
 true
master_slave_mode = false
master_slave_sub_mode = 'stream'
delay_threshold = 100
log_standby_delay = 'if_over_threshold'
connection_cache = true
health_check_timeout = 10
health_check_period = 10
health_check_user = 'postgis'
failover_command = '/usr/lib/pgpool-II/3.0.1/bin/failover.sh %h %H
/tmp/trigger_file0'
failback_command = '/usr/lib/pgpool-II/3.0.1/bin/failback.sh %h %M
/tmp/trigger_file0 %m'
fail_over_on_backend_error = false
insert_lock = true
ignore_leading_white_space = false
log_statement = false
log_per_node_statement = false
log_connections = false
log_hostname = false
parallel_mode = true
enable_query_cache = false
pgpool2_hostname = ''
system_db_hostname = 'localhost'
system_db_port = 5433
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
system_db_password = 'gispost'
backend_hostname0 =
 'localhost'
backend_port0 = 5433
backend_weight0 = 1
backend_data_directory0 = '/home/database/9.0.2/db_0/data'
backend_hostname1 = 'localhost'
backend_port1 = 5434
backend_weight1 = 1
backend_data_directory1 = '/home/database/9.0.2/db_1/data'
backend_hostname2 = 'localhost'
backend_port2 = 5435
backend_weight2 = 1
backend_data_directory2 = '/home/database/9.0.2/db_2/data'
backend_hostname3 = 'localhost'
backend_port3 = 5436
backend_weight3 = 1
backend_data_directory3 = '/home/database/9.0.2/db_3/data'
enable_pool_hba = true
recovery_user = 'postgis'
recovery_password = 'gispost'
recovery_1st_stage_command = 'basebackup.sh'
recovery_2nd_stage_command = ''
recovery_timeout = 60
client_idle_limit_in_recovery = 0
lobj_lock_table = 'pgpool_lobj_lock'
ssl = false
debug_level =
 0

_______________________________________________
Pgpool-general mailing list
Pgpool-general at pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general




      
-----Inline Attachment Follows-----

_______________________________________________
Pgpool-general mailing list
Pgpool-general at pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general



      


More information about the Pgpool-general mailing list