View Issue Details

IDProjectCategoryView StatusLast Update
0000238Pgpool-IIEnhancementpublic2016-09-06 09:14
Reportersupp_kAssigned Tot-ishii 
PriorityhighSeverityfeatureReproducibilityalways
Status resolvedResolutionopen 
Platformx86 64OSCentOSOS Version6.x, 7.x
Product Version3.5.3 
Target Version3.6.0Fixed in Version 
Summary0000238: Performance of selects queries
DescriptionAfter integration of pgpool into our product we could detect the performance of a standalone 1 thread select is slowed down by 50%. It is good scaled when the number of clients is growing. But one single thread select now works slowly.

After a quick profiling of the pgpool's activity from the network perspective and having counted the number of flush operations it does we could detect that every single record (tuple) returned by PostgreSQL is processed by pgpool separately. Though the data is returned by PostgreSQL in one single array of bytes the Pgpool forwards response to the clients as separate (from network perspective) pieces of data with separate write/flush actions, from the OS's perspective every flush operation into network socket looks like a separate interruption. It directly impacts the performance of pgpool.


We could detect the described feature with the help of the strace tool (https://en.wikipedia.org/wiki/Strace) and standalone internal tests.

Please see the output below. From the log one can see the PostgreSQL has returned 1025 bytes but the pgpool chunks it into separate packets forwarded to the client which executes simple SQL: "select id from table" (the table contains several 16 records)
Steps To ReproduceWe have two backends behind the Pgpool.
Please see the attached quick patch (not sure if it is correst).
Also the chart the reflects performance testing results is attached:
 - the blue series reflects performance without pgpool,
 - the orange chart is the pgpool testing result
 - the X axis show the number of concurrent clients, the Y axis reflects the number of requests per second.
It can be seen the performance selects is significantly lower.
Additional Information1472135729.734737 select(10, [9], NULL, [9], NULL) = 1 (in [9])
1472135729.734792 read(9, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0010D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0"..., 1024) = 1024
1472135729.734828 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0010", 57) = 57
1472135729.734864 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011", 57) = 57
1472135729.734898 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010", 57) = 57
1472135729.734931 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0011", 57) = 57
1472135729.734971 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010", 57) = 57
1472135729.735003 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011", 57) = 57
1472135729.735036 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010", 57) = 57
1472135729.735073 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011", 57) = 57
1472135729.735106 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0010\0\0\0\0010", 57) = 57
1472135729.735138 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0010\0\0\0\0011", 57) = 57
1472135729.735172 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0010", 57) = 57
1472135729.735204 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010\0\0\0\0011\0\0\0\0011", 57) = 57
1472135729.735238 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0010", 57) = 57
1472135729.735270 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0011", 57) = 57
1472135729.735303 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0011\0\0\0\0010", 57) = 57
1472135729.735335 write(8, "D\0\0\0008\0\n\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0010\0\0\0\0011\0\0\0\0011\0\0\0\0011", 57) = 57
1472135729.735368 write(8, "D\0\0\0008\0\n\0\0\


We have discovered the code and implemented some fix (attached). It boosts the performance of pgpool by ratio 1.5 but not always because for sure we don't understand the whole logic of the code.

It seems the performance of pgpool can be boosted if the series of write/flush operations will be replaced with one single write/flush. We are not using caching and seems the parsing of whole the PostgreSQL response is unnecessary here.

We are ready to assist you with the performance and stability testing and provide the true data regarding our test environment and data.

Yours sincerely,
Sergey
HA architect
http://www.odin.com
TagsNo tags attached.

Activities

supp_k

2016-08-29 18:52

reporter  

flush_move.patch (1,747 bytes)
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index cc666b7..fbe0c98 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -211,8 +211,10 @@ POOL_STATUS pool_process_query(POOL_CONNECTION *frontend,
 		if (pool_is_query_in_progress() || !is_backend_cache_empty(backend))
 		{
 			status = ProcessBackendResponse(frontend, backend, &state, &num_fields);
-			if (status != POOL_CONTINUE)
+			if (status != POOL_CONTINUE) {
+				pool_flush(frontend);
 				return status;
+			}
 		}
 
 		/*
@@ -738,6 +740,8 @@ POOL_STATUS SimpleForwardToFrontend(char kind, POOL_CONNECTION *frontend,
 		execute_select = 0;
 	}
 #endif
+	
+	ereport(DEBUG1, (errmsg("SimpleForwardToFrontend==============="), errdetail("kind [%c] ", kind)));		
 
 	pool_read(MASTER(backend), &len, sizeof(len));
 
@@ -796,7 +800,8 @@ POOL_STATUS SimpleForwardToFrontend(char kind, POOL_CONNECTION *frontend,
 	pool_write(frontend, &kind, 1);
 	sendlen = htonl(len1+4);
 	pool_write(frontend, &sendlen, sizeof(sendlen));
-	pool_write_and_flush(frontend, p1, len1);
+//	pool_write_and_flush(frontend, p1, len1);
+	pool_write(frontend, p1, len1);	
 
 	ereport(DEBUG1,
 			(errmsg("SimpleForwardToFrontend: packet:%c length:%d",
diff --git a/src/utils/pool_stream.c b/src/utils/pool_stream.c
index e1336d3..8610e12 100644
--- a/src/utils/pool_stream.c
+++ b/src/utils/pool_stream.c
@@ -983,6 +983,7 @@ static int consume_pending_data(POOL_CONNECTION *cp, void *data, int len)
 	if (cp->len <= 0)
 		return 0;
 
+	ereport(DEBUG1, (errmsg("consume_pending_data"), errdetail("[%d] / [%d]", cp->len, len)));	
 	consume_size = Min(len, cp->len);
 	memmove(data, cp->hp + cp->po, consume_size);
 	cp->len -= consume_size;
flush_move.patch (1,747 bytes)

supp_k

2016-08-29 18:52

reporter  

chart_performance.png (11,437 bytes)
chart_performance.png (11,437 bytes)

supp_k

2016-08-31 05:44

reporter   ~0001038

We believe that if the next piece is modified it would boost the performance of selects significantly.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=shortlog;h=refs/heads/V3_5_STABLE
source file pool_proto_modules, line 2378:
            default:
                status = SimpleForwardToFrontend(kind, frontend, backend);
                if (pool_flush(frontend))
                    return POOL_END;
                break;
                
                
Here we think the "pool_flush" action can be replaced with some additional check which is already for instance if performed in "int pool_write_noerror(POOL_CONNECTION *cp, void *buf, int len)" function. As we see it executes the actual write when the buffer end is reached. Thus if reduce the number of flushes we can reduce the number of operations performed by OS and CPU. Of course we dont understand the whole logic of the code but what we observe is the performance of simple selects that return several records. The original PostgreSQL does the same - it returnds the byte array instead of several chunked pieces of the whole result set when it comes to several records (tuples).

Again we are ready and want to assist you with the performance and stability testing of the patch.

Yours sincerely,
Sergey
HA architect
http://www.odin.com

t-ishii

2016-08-31 13:56

developer   ~0001039

Last edited: 2016-08-31 13:57

View 2 revisions

Since SimpleForwardToFrontend is called for each raw data received from backend, above code you referred cause system call for each raw data. Yes, it's pretty expensive. However, using pool_write_noerror instead is pointless since writing to socket is already done in SimpleForwardToFrontend.

Attached diff (performance.diff) avoids flushing in SimpleForwardToFrontend when the data raw message in handled and removes pool_flush call after SimpleForwardToFronten. Other than data raw messages, flushing is done in SimpleForwardToFrontend.

I got 40 to 50% performance boost using pgbench and custom query (SELECT * FROM pgbench_accounts) on my laptop with attached diff (performance.diff).

Can you please test it?

t-ishii

2016-08-31 13:56

developer  

performance.diff (1,146 bytes)
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index cc666b7..5b852d6 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -796,7 +796,15 @@ POOL_STATUS SimpleForwardToFrontend(char kind, POOL_CONNECTION *frontend,
 	pool_write(frontend, &kind, 1);
 	sendlen = htonl(len1+4);
 	pool_write(frontend, &sendlen, sizeof(sendlen));
-	pool_write_and_flush(frontend, p1, len1);
+
+	if (kind == 'D')	/* optimization for "Data Row" packet */
+	{
+		pool_write(frontend, p1, len1);
+	}
+	else
+	{
+		pool_write_and_flush(frontend, p1, len1);
+	}
 
 	ereport(DEBUG1,
 			(errmsg("SimpleForwardToFrontend: packet:%c length:%d",
diff --git a/src/protocol/pool_proto_modules.c b/src/protocol/pool_proto_modules.c
index c833f48..9549bbb 100644
--- a/src/protocol/pool_proto_modules.c
+++ b/src/protocol/pool_proto_modules.c
@@ -2637,8 +2637,10 @@ POOL_STATUS ProcessBackendResponse(POOL_CONNECTION *frontend,
 
 			default:
 				status = SimpleForwardToFrontend(kind, frontend, backend);
+#ifdef NOT_USED
 				if (pool_flush(frontend))
 					return POOL_END;
+#endif
 				break;
 		}
 	}
performance.diff (1,146 bytes)

supp_k

2016-08-31 17:30

reporter   ~0001040

I see the performance boost by 50% but not always. The performance increase is is floating for unknown reasons. We have dedicated hardware for these tests.

Also I see the macros in the attached diff file (+#ifdef NOT_USED ... ).
Should I activate it? Or it has a different purpose?

t-ishii

2016-08-31 17:44

developer   ~0001041

My patch does not improve the performance if the query in questions returns small number of rows. For example, "pgbench -S" does not show improvement because the SELECT query in this pattern returns only 1 row at a time.

Also I don't see improvement when resources are saturated because the effect of the patch is masked. For example, there's large number of concurrent queries are going on.

> Also I see the macros in the attached diff file (+#ifdef NOT_USED ... ).
> Should I activate it? Or it has a different purpose?

It's a kind of customs in C coding :-) It means that a programmer wants to remove the part of the code temporarily. I will remove the part before commit.

supp_k

2016-08-31 21:36

reporter   ~0001042

We have illuminated all possible obstacles that can consume system resources. But we still see that the performance boost for one thread is not stable. Sometimes we see it gives 50% performance boost. It is unpredictable.

Can you please recommend what reasons may influence the performance of your patch?

supp_k

2016-08-31 22:15

reporter   ~0001043

Is it possible to implement the similar action for date passed to backends?

The strace util shows that simple "select 1" transferred to backend in separate flushes.

supp_k

2016-09-01 04:50

reporter   ~0001044

Maximum stable performance growth we could reach is 50% for the query (select 1).
It is possible to flush the whole request to backend and response from it in one operation? Anyway network clients (DB client and the PostgreSQL in this case) are waiting for the whole array of data to be received. In this case we see the performance level can be boosted more.

t-ishii

2016-09-02 07:57

developer   ~0001045

That depends on the message types. Please define "whole message". I don't think in general it is possible.

Anyway, I got 80% performance increase in my same test case by tweaking Pgpool-II further. Please test if you like (performance-v2.diff).

BTW, bug tracker is not the perfect place for this kind of discussion. Please move to the mailing list.

t-ishii

2016-09-02 07:58

developer  

performance-v2.diff (2,132 bytes)
diff --git a/src/protocol/CommandComplete.c b/src/protocol/CommandComplete.c
index f18f3db..a3069c5 100644
--- a/src/protocol/CommandComplete.c
+++ b/src/protocol/CommandComplete.c
@@ -370,17 +370,17 @@ static int foward_command_complete(POOL_CONNECTION *frontend, char *packet, int
 {
 	int sendlen;
 
-	pool_flush(frontend);
+	//pool_flush(frontend);
 
 	if (pool_write(frontend, "C", 1) < 0)
 		return -1;
 
-	pool_flush(frontend);
+	//pool_flush(frontend);
 
 	sendlen = htonl(packetlen+4);
 	if (pool_write(frontend, &sendlen, sizeof(sendlen)) < 0)
 		return -1;
-	pool_flush(frontend);
+	//pool_flush(frontend);
 	pool_write_and_flush(frontend, packet, packetlen);
 
 	return 0;
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index 4da4a56..25cbc0a 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -796,7 +796,22 @@ POOL_STATUS SimpleForwardToFrontend(char kind, POOL_CONNECTION *frontend,
 	pool_write(frontend, &kind, 1);
 	sendlen = htonl(len1+4);
 	pool_write(frontend, &sendlen, sizeof(sendlen));
-	pool_write_and_flush(frontend, p1, len1);
+
+	/*
+	 * Optimization for "Data Row" message.  Since it is too often to receive
+	 * and forward "Data Row" message, we do not flush the message to frontend
+	 * now. We expect that "Command Complete" message (or "ERROR" message)
+	 * follows the stream of data row message, and flushing will be done at
+	 * that time.
+	 */
+	if (kind == 'D' || kind == 'T')
+	{
+		pool_write(frontend, p1, len1);
+	}
+	else
+	{
+		pool_write_and_flush(frontend, p1, len1);
+	}
 
 	ereport(DEBUG1,
 			(errmsg("SimpleForwardToFrontend: packet:%c length:%d",
diff --git a/src/protocol/pool_proto_modules.c b/src/protocol/pool_proto_modules.c
index a32563e..8b5d4e4 100644
--- a/src/protocol/pool_proto_modules.c
+++ b/src/protocol/pool_proto_modules.c
@@ -2524,8 +2524,10 @@ POOL_STATUS ProcessBackendResponse(POOL_CONNECTION *frontend,
 
 			default:
 				status = SimpleForwardToFrontend(kind, frontend, backend);
+#ifdef NOT_USED
 				if (pool_flush(frontend))
 					return POOL_END;
+#endif
 				break;
 		}
 	}
performance-v2.diff (2,132 bytes)

supp_k

2016-09-06 06:15

reporter   ~0001047

Yes, the fix gives improvement in performance.
We see the 25-50% increase of performance.

Would it be possible to include the patch in the major codebase?

Thank you!

t-ishii

2016-09-06 09:12

developer   ~0001048

Yes, that's my plan.

Issue History

Date Modified Username Field Change
2016-08-29 18:52 supp_k New Issue
2016-08-29 18:52 supp_k File Added: flush_move.patch
2016-08-29 18:52 supp_k File Added: chart_performance.png
2016-08-30 07:37 t-ishii Assigned To => t-ishii
2016-08-30 07:37 t-ishii Status new => assigned
2016-08-30 07:37 t-ishii Additional Information Updated View Revisions
2016-08-31 05:44 supp_k Note Added: 0001038
2016-08-31 13:56 t-ishii Note Added: 0001039
2016-08-31 13:56 t-ishii File Added: performance.diff
2016-08-31 13:57 t-ishii Note Edited: 0001039 View Revisions
2016-08-31 13:57 t-ishii Status assigned => feedback
2016-08-31 17:30 supp_k Note Added: 0001040
2016-08-31 17:30 supp_k Status feedback => assigned
2016-08-31 17:44 t-ishii Note Added: 0001041
2016-08-31 21:36 supp_k Note Added: 0001042
2016-08-31 22:15 supp_k Note Added: 0001043
2016-09-01 04:50 supp_k Note Added: 0001044
2016-09-02 07:57 t-ishii Note Added: 0001045
2016-09-02 07:58 t-ishii File Added: performance-v2.diff
2016-09-06 06:15 supp_k Note Added: 0001047
2016-09-06 09:12 t-ishii Note Added: 0001048
2016-09-06 09:13 t-ishii Target Version => 3.6.0
2016-09-06 09:14 t-ishii Status assigned => resolved