[Pgpool-general] primary server cannot be recovered by online recovery

Tatsuo Ishii ishii at sraoss.co.jp
Wed Oct 5 10:45:59 UTC 2011


From: Sandeep Thakkar <sandeeptt at yahoo.com>
Subject: [Pgpool-general] primary server cannot be recovered by online recovery
Date: Tue, 4 Oct 2011 23:09:41 -0700 (PDT)
Message-ID: <1317794981.19634.YahooMailNeo at web121717.mail.ne1.yahoo.com>

> I use pgpool-II 3.0.3 and configured it in Streaming replication mode. My test cases work fine. 
> But sometimes, once in a few days, I see the following error during online recovery:
> 
> DEBUG: send: tos="R", len=41
> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> DEBUG: send: tos="D", len=6
> DEBUG: recv: tos="e", len=59, data=primary server cannot be recovered by online recovery.
> DEBUG: command failed. reason=primary server cannot be recovered by online recovery.
> DEBUG: send: tos="X", len=4
> BackendError
> 
> I looked into code and found that this error should appear when the master_slave_sub_mode value in pgpool.conf
> is not set to 'stream'. But my pgpool.conf settings are all fine. I just wanted to know if there could be other reason for this error?
>
> ...
> if (MASTER_SLAVE && !strcmp(pool_config->master_slave_sub_mode, MODE_STREAMREP))
> 						msg = "primary server cannot be recovered by online recovery.";
> ........

No. The code says the error message should appear when the
master_slave_sub_mode value in pgpool.conf is *set* to 'stream'.

I think the real cause of the problem is just before that code segment:

				if ((!REPLICATION &&
					 !(MASTER_SLAVE &&
					   !strcmp(pool_config->master_slave_sub_mode, MODE_STREAMREP))) ||
					(MASTER_SLAVE &&
					 !strcmp(pool_config->master_slave_sub_mode, MODE_STREAMREP) &&
Here ---->			 node_id == PRIMARY_NODE_ID))
				{
					int len;
					char *msg;

					if (MASTER_SLAVE && !strcmp(pool_config->master_slave_sub_mode, MODE_STREAMREP))
						msg = "primary server cannot be recovered by online recovery.";
					else
						msg = "recovery request is accepted only in replication mode or stereaming replication mode. ";

"PRIMARY_NODE_ID" is a macro:

#define PRIMARY_NODE_ID (Req_info->primary_node_id >=0?\
						 Req_info->primary_node_id:REAL_MASTER_NODE_ID)
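
To make the condition easier to read, here is a minimal sketch that
restates it as a pure function. This is not the pgpool-II source: the
function names and the boolean parameters (standing in for REPLICATION,
MASTER_SLAVE and the strcmp() against MODE_STREAMREP) are hypothetical,
and the assert on the fallback id is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same fallback as the PRIMARY_NODE_ID macro: use the recorded primary
 * if it is non-negative, otherwise the master node id.
 * Sketch assumption: the fallback id itself is always valid. */
int effective_primary(int primary_node_id, int real_master_node_id)
{
    assert(real_master_node_id >= 0);
    return primary_node_id >= 0 ? primary_node_id : real_master_node_id;
}

/* Returns true when the recovery request would be rejected.
 * replication, master_slave and stream_mode are hypothetical flags. */
bool recovery_rejected(bool replication, bool master_slave,
                       bool stream_mode, int node_id, int primary)
{
    bool streaming = master_slave && stream_mode;

    /* Case 1: neither replication mode nor streaming replication mode.
     * Case 2: streaming replication mode and the target is the primary. */
    return (!replication && !streaming) ||
           (streaming && node_id == primary);
}
```

In streaming replication mode only the second case can apply, so the
request is rejected exactly when node_id equals whatever pgpool
currently believes the primary to be.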

Req_info->primary_node_id is data in shared memory. It is set by
calling pgpool_walrecrunning(). The function checks whether the WAL
receiver process of PostgreSQL is running. If it is not running, the
node is assumed to be the primary server. Unfortunately, there is a
logic flaw in this. If something goes wrong (for example, the network
connection between primary and standby is broken), the WAL receiver on
the standby goes down. In that case pgpool misjudges which node is the
primary.

In summary, in your system the WAL receiver process occasionally goes
down, and this triggers the error described above.

Pgpool-II 3.1 changes the logic for detecting the primary to solve this
problem. So my bet is that upgrading to 3.1 will solve the problem.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

