View Issue Details

ID: 0000772    Project: Pgpool-II    Category: Bug    View Status: public    Last Update: 2022-12-13 14:24
Reporter: jaewan    Assigned To: t-ishii
Priority: normal    Severity: tweak    Reproducibility: always
Status: assigned    Resolution: open
Product Version: 4.3.3
Summary: 0000772: all backend node servers reboot
Description: Hello, I am an engineer who uses Pgpool-II.
My configuration consists of three servers running Pgpool-II and two servers running PostgreSQL.

When the PostgreSQL servers are powered off and powered back on, Pgpool-II attempts a failover shortly afterwards.
At that point, 'pcp_node_info' shows that the 'primary' node has become 'standby', so both PostgreSQL nodes remain standby.
With no 'primary', the cluster is unavailable for normal service.

If I manually attach the node that was previously 'primary' using 'pcp_attach_node', the primary comes back and everything returns to normal.

What I'm curious about is this: if all the backend servers go down at once, can't the cluster be restored automatically?
Is this the intended behavior?
Tags: No tags attached.

Activities

t-ishii

2022-11-09 14:05

developer   ~0004136

> What I'm curious about is that if all the backend servers are down at once, can't it be automatically restored?
Automatic restore is not technically difficult. However, Pgpool-II cares about the server's health after a reboot, so it intentionally leaves the status of a rebooted PostgreSQL node as down: it expects a human to make the judgement.

By the way, you can create your own script that periodically compares the status managed by Pgpool-II with the actual status of PostgreSQL by looking at the output of pcp_node_info.

t-ishii$ pcp_node_info -w -p 11001 -v
Hostname : localhost
Port : 11002
Status : 2
Weight : 0.500000
Status Name : up
Backend Status Name : up
Role : primary
Backend Role : primary
Replication Delay : 0
Replication State : none
Replication Sync State : none
Last Status Change : 2022-11-09 13:31:48

Hostname : localhost
Port : 11003
Status : 2
Weight : 0.500000
Status Name : up
Backend Status Name : up
Role : standby
Backend Role : standby
Replication Delay : 0
Replication State : streaming
Replication Sync State : async
Last Status Change : 2022-11-09 13:31:48

If "Backend Status Name" is "up" but "Status Name" is "down", the script can run pcp_attach_node to let Pgpool-II set the status back to "up".
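As a minimal sketch of such a watcher (assumptions: the PCP port 11001 from the example above, passwordless `-w` access via `.pcppass`; the `check_nodes` helper name is made up), it can parse the record layout shown above and report the node ids that need re-attaching:

```shell
#!/bin/sh
# Sketch: find backend nodes that Pgpool-II marks "down" while the actual
# PostgreSQL is "up", so they can be re-attached with pcp_attach_node.

# Reads `pcp_node_info -v` style output (blank-line-separated records, one
# per node, in node-id order) on stdin and prints the ids to re-attach.
check_nodes() {
    awk '
        BEGIN                  { node = 0 }
        /^Status Name/         { status  = $NF }
        /^Backend Status Name/ { backend = $NF }
        /^$/ {                 # end of one node record
            if (backend == "up" && status == "down") print node
            node++; status = ""; backend = ""
        }
        END {                  # last record may lack a trailing blank line
            if (backend == "up" && status == "down") print node
        }
    '
}

# Periodic use (e.g. from cron), assuming PCP on port 11001 with .pcppass:
# for id in $(pcp_node_info -w -p 11001 -v | check_nodes); do
#     pcp_attach_node -w -p 11001 -n "$id"
# done
```

The helper only inspects the two status lines per record, so extra fields such as "Replication State" are ignored safely.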

jaewan

2022-11-09 14:59

reporter   ~0004137

Thank you very much for your reply. There was no one around me to ask, and I found this place after many searches.

What do you mean by automatic restoration?
For example, turning the backend server on automatically, or having the backend server automatically attach to pgpool when it comes back up?

Is there a way in pgpool to automatically execute the "pcp_attach_node" you mentioned when a backend server goes down and comes back up?

Or should I write the script you mentioned to check the status of the backend nodes periodically?

Thank you again for your reply.

t-ishii

2022-11-09 16:28

developer   ~0004138

You are welcome.

> What do you mean by automatic restoration?
This one:
> when the back-end server turns on, it automatically attach to the pgpool?

> Is there a way in pgpool to automatically execute the last mentioned "pcp_attach_node" when the backend server is down and up again?
No.
> Or should I make the script you mentioned in the middle to check the status of the backend node periodically?
Yes.

jaewan

2022-11-09 17:54

reporter   ~0004139

Thank you for your answer.

> Or should I make the script you mentioned in the middle to check the status of the backend node periodically?
Yes.

You said yes, so for example, should I run a script that periodically checks pcp_node_info for all backend nodes and, if the Status Name is down, runs pcp_attach_node?

If the standby does not die at the same time, but only the primary dies and a failover occurs, pcp_node_info shows:
Hostname : DB01
Port : 5432
Status : 3
Weight : 0.500000
Status Name : down
Backend Status Name : up
Role : standby
Backend Role : primary
Replication Delay : 0
Replication State : none
Replication Sync State : none
Last Status Change : 2022-11-09 15:13:39
That's what happens.

If I run pcp_attach_node for this node, both backend nodes end up with Backend Role "primary", and replication delay occurs.

I wonder if I set something up wrong or if there is something else I need to do.

If the question is unclear, I can show the entire process with screen captures.


I have a few more questions, so I'll ask them all at once.

1. Is it intended that the pgpool main process dies when the network service on the server running the pgpool leader is restarted (e.g. systemctl restart network)?
2. The following is the state after a backend node failover occurred.
Hostname : DB01
Port : 5432
Status : 3
Weight : 0.500000
Status Name : down
Backend Status Name : up
Role : standby
Backend Role : primary
Replication Delay : 0
Replication State : none
Replication Sync State : none
Last Status Change : 2022-11-09 15:13:39

Is there an automated way to turn the Backend Role back to standby here?
I'm manually running pcp_recovery_node to normalize it, but I would like to automate this without human intervention, if possible.

3. Is there a way to automatically direct pgpool's load-balancing weight to the primary?
If I set the ratio to 1:0 in pgpool.conf, load balancing does not seem to work well during a failover.
The background is that there are currently two DB servers replicating asynchronously. With synchronous replication, the primary would wait indefinitely for the standby to sync when it dies, which would cause problems across the service.

Thank you very much for your response.
You are my savior.

t-ishii

2022-11-09 19:34

developer   ~0004140

Last edited: 2022-11-09 19:35

Ok, let's discuss a simpler strategy.
Probably the simplest solution is to prevent health checking from failing while PostgreSQL is rebooting.
Suppose the rebooting process takes 30 seconds. Set (health_check_retry_delay *
health_check_max_retries) > 30; for example, set health_check_retry_delay = 1 and health_check_max_retries = 60. With this setting, health check will retry for up to 60 seconds before declaring PostgreSQL down. So unless the reboot takes more than 60 seconds, failover will never happen, and clients can happily connect to pgpool and PostgreSQL after PostgreSQL's reboot.
Please note that you should also set failover_on_backend_error = off; otherwise, when a client connects to pgpool, pgpool tries to connect to PostgreSQL, fails, and triggers an immediate failover.
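Put together, the suggestion above corresponds to this pgpool.conf fragment (the 1 s delay / 60-retry budget is the example from this note; size it to your actual reboot time):

```
# Keep retrying health checks for up to 60 s (1 s delay x 60 retries)
# before declaring a backend down, so a ~30 s reboot never triggers failover.
health_check_retry_delay = 1
health_check_max_retries = 60

# Do not fail over just because a client-initiated backend connection failed;
# let the health-check retry logic make the decision instead.
failover_on_backend_error = off
```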

jaewan

2022-11-10 14:40

reporter   ~0004141

As you suggested, I modified the settings and tested them.

My understanding is that, during the time the server reboots, the primary should either only be able to serve SELECT queries or the DB should be unavailable.

In my test, if I query through the pgpool VIP while the server is rebooting, I get 'authentication failed'. Is this the result of some other mistake of mine?

What do you think about the idea that the service will be unavailable during the reboot?

jaewan

2022-11-10 15:06

reporter   ~0004142

The authentication failure was my mistake, made while creating a new server for testing.

jaewan

2022-11-10 16:03

reporter   ~0004143

We ran a proper test.
As I expected, I get the error "failed to create a backend 1 connection".

I think this method will be useful for short interruptions, such as restarting the network service after changing the OS network configuration.

t-ishii

2022-11-10 17:22

developer   ~0004144

> I think this method will be useful in a short time, such as restarting the network service after modifying the network of the OS.
Is it worth adding this method to the pgpool documentation?

jaewan

2022-11-10 18:04

reporter   ~0004145

I always appreciate your continuous response.

>> I think this method will be useful in a short time, such as restarting the network service after modifying the network of the OS.
> Is it worth to add the method to pgpool documentation?
I think it would be good enough as a tip.
If you write it with a detailed description of the situation, it will be helpful for regular users like me.

In the end, we found it best to make sure that actions after a failover are done manually or by a periodic script.

I have another question.

I understand that when the network goes down for a few seconds on a server where pgpool is running (for example, systemctl restart network), pgpool (watchdog) kills the pgpool main process after it tries to reach the network several times in a short period and fails.
I looked it up and found that it was intentionally made this way. Is this true?

Another question is about the 1:0 load-balancing ratio.
When a failover occurs, rather than a fixed 1:0 per node, can whichever node is currently primary always be assigned weight 1?
Is this possible?

t-ishii

2022-11-14 12:09

developer   ~0004146

> I understand that when a network dies for a few seconds on a server where pgpool is running (for example, systemctl restart network), pgpool (watchdog) kills the pgpool main process if it tries to access the network several times in a moment and fails.
> I looked it up and found out that it was intentionally made like this. Is this true?

I am going to ask this to other developers.

> Another question is to make the load balancing ratio 1:0.
> If a failover occurs, rather than a fixed 1:0, the primary node is assigned a fixed 1.
> Is this possible?

Yes. You can use the "database_redirect_preference_list" parameter for this purpose. Add the following to pgpool.conf and reload or restart pgpool:

database_redirect_preference_list = '.*:primary'

Any SELECT to any database (represented by the regular expression '.*') will be forwarded to the primary node, regardless of the backend node id.

See https://www.pgpool.net/docs/latest/en/html/runtime-config-load-balancing.html for more details.

jaewan

2022-11-15 09:02

reporter   ~0004147

Thank you for your answer.

Sorry about the database_redirect_preference_list parameter; I should have checked the documentation more carefully.
Thank you for letting me know. It was helpful.

> I am going to ask this to other developers.

I'd appreciate it if you could let me know after receiving your answer.

jaewan

2022-11-29 14:59

reporter   ~0004153

Do I have to wait a little longer?

t-ishii

2022-11-30 16:30

developer   ~0004155

A watchdog developer is looking at this. Please wait for a while.

jaewan

2022-12-01 12:00

reporter   ~0004156

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob;f=src/watchdog/watchdog.c;h=d7001df7d27f4601744628a91ad0479f4bfcba1e;hb=refs/heads/master

The relevant code is at line 6394 of the above page.

t-ishii

2022-12-01 13:36

developer   ~0004157

I know. I need confirmation from the developer who wrote the code.

t-ishii

2022-12-09 13:56

developer   ~0004163

> pgpool (watchdog) kills the pgpool main process if it tries to access the network several times in a moment and fails.
> I looked it up and found out that it was intentionally made like this. Is this true?

I think this is the answer for you too.
https://www.pgpool.net/pipermail/pgpool-general/2022-December/008570.html

jaewan

2022-12-09 15:19

reporter   ~0004164

But I didn't modify wd_monitoring_interfaces_list (i.e. it is left empty).
If it is empty, interface monitoring should be disabled, but my pgpool still commits suicide.

t-ishii

2022-12-09 17:19

developer   ~0004165

Ok, I have invited Muhammad Usama, who is the authority on watchdog.

t-ishii

2022-12-12 17:01

developer   ~0004167

Last edited: 2022-12-12 19:23

For some reason I don't know (maybe a Mantis issue), he cannot see this issue. After I forwarded the conversation, he said he needs the pgpool log file. Can you share it?
(Please attach the file; I will forward it to him.)

jaewan

2022-12-13 10:53

reporter   ~0004171

I want to do that right away, but I don't have an environment available at the moment.
It cannot be retested on the production server, and it may take several days to set up a test environment.
Thank you very much for offering to analyze the log.
I'll get ready as quickly as I can.

jaewan

2022-12-13 11:19

reporter   ~0004172

A problem occurred last night.
It has been restored now.
In the environment mentioned above (three pgpool servers and two DB servers), I rebooted one of the DB servers.

Then my client got the message "pgpool is not accepting any new connections".

We looked at pcp_node_info, and the results for both nodes were the same:
Role: standby
Backend Role: primary

The solution was to create a standby.signal file on one of the DBs and run pcp_recovery_node to bring the whole service back.

1. I wonder what you think would be a good solution.
2. There are two DB servers; is there a way to keep the service running when only one of them is down?
3. Did I set something up wrong?
4. When the above phenomenon occurs, how can I tell which DB was previously the standby?

t-ishii

2022-12-13 14:24

developer   ~0004173

Since your question is about general usage of Pgpool-II, I recommend moving the discussion to the pgpool-general mailing list.
https://www.pgpool.net/mailman/listinfo/pgpool-general

Issue History

Date Modified Username Field Change
2022-11-09 11:02 jaewan New Issue
2022-11-09 14:05 t-ishii Note Added: 0004136
2022-11-09 14:59 jaewan Note Added: 0004137
2022-11-09 16:24 t-ishii Assigned To => t-ishii
2022-11-09 16:24 t-ishii Status new => assigned
2022-11-09 16:28 t-ishii Note Added: 0004138
2022-11-09 17:54 jaewan Note Added: 0004139
2022-11-09 19:34 t-ishii Note Added: 0004140
2022-11-09 19:35 t-ishii Note Edited: 0004140
2022-11-10 14:40 jaewan Note Added: 0004141
2022-11-10 15:06 jaewan Note Added: 0004142
2022-11-10 16:03 jaewan Note Added: 0004143
2022-11-10 17:22 t-ishii Note Added: 0004144
2022-11-10 18:04 jaewan Note Added: 0004145
2022-11-14 12:09 t-ishii Note Added: 0004146
2022-11-15 09:02 jaewan Note Added: 0004147
2022-11-29 14:59 jaewan Note Added: 0004153
2022-11-30 16:30 t-ishii Note Added: 0004155
2022-12-01 12:00 jaewan Note Added: 0004156
2022-12-01 13:36 t-ishii Note Added: 0004157
2022-12-09 13:56 t-ishii Note Added: 0004163
2022-12-09 15:19 jaewan Note Added: 0004164
2022-12-09 17:19 t-ishii Note Added: 0004165
2022-12-12 17:01 t-ishii Note Added: 0004167
2022-12-12 19:23 t-ishii Note Edited: 0004167
2022-12-13 10:53 jaewan Note Added: 0004171
2022-12-13 11:19 jaewan Note Added: 0004172
2022-12-13 14:24 t-ishii Note Added: 0004173