View Issue Details

ID: 0000324
Project: Pgpool-II
Category: Bug
View Status: public
Last Update: 2017-08-04 13:57
Reporter: a.dushechkin
Assigned To: t-ishii
Priority: normal
Severity: major
Reproducibility: unable to reproduce
Status: closed
Resolution: open
Platform: x86_64
OS: Ubuntu
OS Version: 14.04
Product Version: 3.6.4
Target Version:
Fixed in Version:
Summary: 0000324: backend consumes large amount of RAM with long sessions
Description
Hello,

I have PostgreSQL 9.6.3 and Gitlab 9.4.1 (a Ruby application with prepared statements enabled) connected to it via pgpool2 in master/slave mode. Shortly after the upgrade (previously PostgreSQL 9.4, pgpool2 3.4.6 and an earlier Gitlab) I started to observe unusual memory consumption by the pgpool2 processes corresponding to the Gitlab connections (500-700 megabytes RSS, growing over time).



Steps To Reproduce
Excessive memory consumption is only observed on the production installation, so there are no known steps to reproduce.
Additional Information
Gitlab opens a connection per unicorn worker and holds it indefinitely (seen open for 24 hours straight). A memory dump taken with gcore contains a massive number of distinct SQL query texts (maybe thousands, checked with 'strings core.pid | sort | uniq -c'). Unfortunately, I can't provide the memory dump itself since it may contain sensitive information. Query caching is off in pgpool2, and there are no long-running transactions.
Tags
No tags attached.

Activities

t-ishii

2017-08-01 14:42

developer   ~0001611

I'm not familiar with Gitlab. Does it use the extended query protocol?

a.dushechkin

2017-08-01 14:57

reporter   ~0001612

Yes; I can see Parse/Bind messages in the DB traffic. I also found https://github.com/rails/rails/issues/14645 which says Ruby on Rails 4 (which is used by Gitlab) tends to create an excessive number of prepared statements. But I observe increased memory usage only on pgpool, not on the database backend processes.
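For reference, here is a minimal libpq sketch of the kind of traffic this produces (the connection string, table name and statement name are hypothetical): PQprepare sends a Parse message for a named statement, PQexecPrepared sends Bind/Execute, and nothing ever sends Close, so every distinct statement name stays allocated in the session until it disconnects.

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
	/* hypothetical connection string pointing at pgpool */
	PGconn *conn = PQconnectdb("host=pgpool port=9999 dbname=gitlabhq_production");
	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	/* Parse: create a named prepared statement ("a1234" mirrors the names seen in the dump) */
	PGresult *res = PQprepare(conn, "a1234",
	                          "SELECT * FROM projects WHERE id = $1", 1, NULL);
	PQclear(res);

	/* Bind/Execute the named statement */
	const char *params[1] = {"42"};
	res = PQexecPrepared(conn, "a1234", 1, params, NULL, NULL, 0);
	PQclear(res);

	/* No Close / DEALLOCATE: the named statement lives until the session ends */
	PQfinish(conn);
	return 0;
}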

t-ishii

2017-08-01 15:04

developer   ~0001613

In extended query mode Pgpool-II behaves differently than in the simple query case (it's quite complex). Maybe there's an oversight in the memory management in that mode. I will try to reproduce the leak.

t-ishii

2017-08-01 15:32

developer   ~0001614

I have tried a very simple case (pgbench -M extended) and could not reproduce the problem.
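A reproduction attempt along these lines would point pgbench at Pgpool-II and force the extended query protocol (host, port, client count, duration and database name are placeholders, not the exact command used):

pgbench -h <pgpool-host> -p 9999 -M extended -c 10 -T 600 bench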

Also, I want to make sure that you have this set in pgpool.conf:

master_slave_sub_mode = 'stream'

a.dushechkin

2017-08-01 15:41

reporter   ~0001615

Maybe that's because pgbench uses prepared statements properly and there are few of them (tens, not hundreds or thousands). I'm also trying to reproduce this issue.

Yes, I have that in my pgpool.conf:

# grep master_slave_sub_mode /etc/pgpool2/pgpool.conf
master_slave_sub_mode = 'stream'

a.dushechkin

2017-08-01 16:12

reporter   ~0001616

I was wandering around in the memory dump and spotted the session_context.message_list variable (the dump is 500 megabytes in size). Its .capacity is 32768 and its .size is 24041. I also looked at some .sent_messages elements (beginning, end): they all have status POOL_SENT_MESSAGE_CREATED and names like a1234, and from a PG traffic dump I can see these are statements which the application explicitly asked to be prepared.

Could that be the source of such memory consumption? If so, it's definitely the application misbehaving and not pgpool's fault.
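For what it's worth, a rough back-of-envelope sketch (not pgpool's actual data structures; the struct, field names and average query length are assumptions for illustration) shows that ~24,000 retained statements, each keeping its name and full SQL text, can plausibly add up to hundreds of megabytes:

#include <stdio.h>

/* Illustrative approximation only, NOT pgpool's real definitions */
typedef struct
{
	char *name;   /* e.g. "a1234" */
	char *query;  /* full SQL text kept for the life of the session */
} sent_message;

int main(void)
{
	int    size = 24041;               /* .size observed in the dump */
	size_t avg_query_len = 20 * 1024;  /* assumed average query text length */
	size_t total = (size_t) size * (sizeof(sent_message) + avg_query_len);

	printf("~%zu MB retained by statement texts alone\n", total / (1024 * 1024));
	return 0;
}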

t-ishii

2017-08-01 16:23

developer   ~0001617

But the same data structure was used in Pgpool-II 3.4.6 and you didn't see a memory leak in that version. So I still suspect there's something wrong in Pgpool-II 3.6.4. I will study this more.

t-ishii

2017-08-02 18:04

developer   ~0001618

I found a memory leak case. If the app uses named prepared statements or portals, they must be closed after execution. Pgpool-II remembers them in the sent message data structure and is supposed to release the memory for that data upon receiving a Close Complete message from the backend. However, Pgpool-II 3.6 does not do that properly. The attached patch tries to fix this case.

memory-leak.diff (2,373 bytes)
diff --git a/src/protocol/pool_process_query.c b/src/protocol/pool_process_query.c
index 92c19ef..5eec4a5 100644
--- a/src/protocol/pool_process_query.c
+++ b/src/protocol/pool_process_query.c
@@ -3628,11 +3628,20 @@ void read_kind_from_backend(POOL_CONNECTION *frontend, POOL_CONNECTION_POOL *bac
 		}
 		else
 		{
-			POOL_PENDING_MESSAGE *pending_message;
+			POOL_PENDING_MESSAGE *pending_message = NULL;
 
-			ereport(DEBUG1,
-					(errmsg("read_kind_from_backend: pending message was pulled out")));
-			pending_message = pool_pending_message_pull_out();
+			if (msg->type == POOL_CLOSE)
+			{
+				/* Pending message will be pulled out in CloseComplete() */
+				ereport(DEBUG1,
+						(errmsg("read_kind_from_backend: pending message was not pulled out because message type is CloseComplete")));
+			}
+			else
+			{
+				ereport(DEBUG1,
+						(errmsg("read_kind_from_backend: pending message was pulled out")));
+				pending_message = pool_pending_message_pull_out();
+			}
 
 			if (pending_message)
 				pool_pending_message_free_pending_message(pending_message);
diff --git a/src/protocol/pool_proto_modules.c b/src/protocol/pool_proto_modules.c
index a07e182..c3fd893 100644
--- a/src/protocol/pool_proto_modules.c
+++ b/src/protocol/pool_proto_modules.c
@@ -2091,12 +2091,22 @@ POOL_STATUS CloseComplete(POOL_CONNECTION *frontend, POOL_CONNECTION_POOL *backe
 	{
 		POOL_PENDING_MESSAGE *pmsg;
 
-		pmsg = pool_pending_message_get(POOL_CLOSE);
+		pmsg = pool_pending_message_pull_out();
 
 		if (pmsg)
 		{
-			kind = pool_get_close_message_spec(pmsg);
-			name = pool_get_close_message_name(pmsg);
+			/* Sanity check */
+			if (pmsg->type != POOL_CLOSE)
+			{
+				ereport(LOG,
+						(errmsg("CloseComplete: pending message was not CloseComplete")));
+			}
+			else
+			{
+				kind = pool_get_close_message_spec(pmsg);
+				name = pool_get_close_message_name(pmsg);
+				kind = kind=='S'?'P':'B';
+			}
 			pool_pending_message_free_pending_message(pmsg);
 		}
 	}
@@ -2115,11 +2125,11 @@ POOL_STATUS CloseComplete(POOL_CONNECTION *frontend, POOL_CONNECTION_POOL *backe
 		}
 	}
 	
-	if (kind == ' ')
+	if (kind != ' ')
 	{
 		pool_remove_sent_message(kind, name);
 		ereport(DEBUG1,
-				(errmsg("CloseComplete: remove uncompleted message. kind:%c, name:%s",
+				(errmsg("CloseComplete: remove sent message. kind:%c, name:%s",
 						kind, name)));
 	}
 

a.dushechkin

2017-08-03 18:15

reporter   ~0001621

I don't believe that's the issue I'm facing, because I don't observe any Close messages being sent by the application. So there really are 24 thousand prepared statements in a single session. As a "workaround" I disabled prepared statements in the application, and everything became normal again.
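For reference, in a Rails application this kind of workaround usually comes down to one database.yml setting (a generic Rails sketch; GitLab's actual configuration layout may differ):

production:
  adapter: postgresql
  prepared_statements: false  # ActiveRecord falls back to unnamed, per-query statements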

I'm not sure what to do with this bug report, since my original issue is within the application, not pgpool, but you did find another issue.

t-ishii

2017-08-04 13:46

developer   ~0001622

If you don't observe Close messages, then yes, my guess is not correct.

So I think we can close this issue for now if you don't mind.

a.dushechkin

2017-08-04 13:51

reporter   ~0001623

Sure, no problem.

t-ishii

2017-08-04 13:57

developer   ~0001624

Thanks. Issue closed.

Issue History

Date Modified Username Field Change
2017-07-31 20:09 a.dushechkin New Issue
2017-08-01 08:13 t-ishii Assigned To => t-ishii
2017-08-01 08:13 t-ishii Status new => assigned
2017-08-01 14:42 t-ishii Note Added: 0001611
2017-08-01 14:57 a.dushechkin Note Added: 0001612
2017-08-01 15:04 t-ishii Note Added: 0001613
2017-08-01 15:32 t-ishii Note Added: 0001614
2017-08-01 15:41 a.dushechkin Note Added: 0001615
2017-08-01 16:12 a.dushechkin Note Added: 0001616
2017-08-01 16:23 t-ishii Note Added: 0001617
2017-08-02 18:04 t-ishii File Added: memory-leak.diff
2017-08-02 18:04 t-ishii Note Added: 0001618
2017-08-03 09:38 t-ishii Status assigned => feedback
2017-08-03 18:15 a.dushechkin Note Added: 0001621
2017-08-03 18:15 a.dushechkin Status feedback => assigned
2017-08-04 13:46 t-ishii Note Added: 0001622
2017-08-04 13:51 a.dushechkin Note Added: 0001623
2017-08-04 13:53 t-ishii Status assigned => feedback
2017-08-04 13:57 t-ishii Note Added: 0001624
2017-08-04 13:57 t-ishii Status feedback => closed