View Issue Details

ID: 0000772    Project: Pgpool-II    Category: Bug    View Status: public    Last Update: 2022-12-13 14:24
Reporter: jaewan    Assigned To: t-ishii
Priority: normal    Severity: tweak    Reproducibility: always
Status: assigned    Resolution: open
Product Version: 4.3.3
Summary: 0000772: all backend node servers reboot
Description: Hello, I am an engineer who uses Pgpool-II.
My configuration consists of three servers running Pgpool-II and two servers running PostgreSQL.

When the PostgreSQL servers are powered off and powered back on, Pgpool-II attempts a failover shortly afterwards.
At that point, 'pcp_node_info' shows that the 'primary' node has become 'standby', so both PostgreSQL nodes remain standby.
With no 'primary', the cluster is unavailable for normal service.

If I manually attach the node that was previously 'primary' using 'pcp_attach_node', the primary comes back and everything returns to normal.

What I'm curious about is this: if all the backend servers go down at once, can't the cluster be restored automatically?
Is this the intended behavior?
Tags: No tags attached.

Activities

t-ishii

2022-11-09 14:05

developer   ~0004136

> What I'm curious about is that if all the backend servers are down at once, can't it be automatically restored?
Automatic restore is not technically difficult. However, Pgpool-II cares about the server's health after a reboot, so it intentionally leaves the status of a rebooted PostgreSQL node as down: it expects a human to make the judgement.

By the way, you can create your own script that periodically compares the status managed by Pgpool-II with the actual status of PostgreSQL by looking at the output of pcp_node_info.

t-ishii$ pcp_node_info -w -p 11001 -v
Hostname : localhost
Port : 11002
Status : 2
Weight : 0.500000
Status Name : up
Backend Status Name : up
Role : primary
Backend Role : primary
Replication Delay : 0
Replication State : none
Replication Sync State : none
Last Status Change : 2022-11-09 13:31:48

Hostname : localhost
Port : 11003
Status : 2
Weight : 0.500000
Status Name : up
Backend Status Name : up
Role : standby
Backend Role : standby
Replication Delay : 0
Replication State : streaming
Replication Sync State : async
Last Status Change : 2022-11-09 13:31:48

If "Backend Status Name" is "up" but "Status Name" is "down", the script can run pcp_attach_node to let Pgpool-II set the status back to "up".
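As a minimal sketch of such a watcher (assumptions: the PCP port 11001 from the example above, passwordless `-w` access via `.pcppass`; the `check_nodes` helper name is made up), it can parse the record layout shown above and report the node ids that need re-attaching:

```shell
#!/bin/sh
# Sketch: find backend nodes that Pgpool-II marks "down" while the actual
# PostgreSQL is "up", so they can be re-attached with pcp_attach_node.

# Reads `pcp_node_info -v` style output (blank-line-separated records, one
# per node, in node-id order) on stdin and prints the ids to re-attach.
check_nodes() {
    awk '
        BEGIN                  { node = 0 }
        /^Status Name/         { status  = $NF }
        /^Backend Status Name/ { backend = $NF }
        /^$/ {                 # end of one node record
            if (backend == "up" && status == "down") print node
            node++; status = ""; backend = ""
        }
        END {                  # last record may lack a trailing blank line
            if (backend == "up" && status == "down") print node
        }
    '
}

# Periodic use (e.g. from cron), assuming PCP on port 11001 with .pcppass:
# for id in $(pcp_node_info -w -p 11001 -v | check_nodes); do
#     pcp_attach_node -w -p 11001 -n "$id"
# done
```

The helper only inspects the two status lines per record, so extra fields such as "Replication State" are ignored safely.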

jaewan

2022-11-09 14:59

reporter   ~0004137

Thank you very much for your reply. There was no one around me to ask, and I found this place after many searches.

What do you mean by automatic restoration?
For example, turning the backend server on automatically, or having the backend server automatically attach to pgpool when it comes back up?

Is there a way in pgpool to automatically execute the "pcp_attach_node" you mentioned when a backend server goes down and comes back up?

Or should I write the script you mentioned to check the status of the backend nodes periodically?

Thank you again for your reply.

t-ishii

2022-11-09 16:28

developer   ~0004138

You are welcome.

> What do you mean by automatic restoration?
This one:
> when the back-end server turns on, it automatically attach to the pgpool?

> Is there a way in pgpool to automatically execute the last mentioned "pcp_attach_node" when the backend server is down and up again?
No.
> Or should I make the script you mentioned in the middle to check the status of the backend node periodically?
Yes.

jaewan

2022-11-09 17:54

reporter   ~0004139

Thank you for your answer.

> Or should I make the script you mentioned in the middle to check the status of the backend node periodically?
Yes.

You said yes, so for example, should I run a script that periodically checks pcp_node_info for all backend nodes and, if the Status Name is down, runs pcp_attach_node?

If the standby does not die at the same time, but only the primary dies and a failover occurs, pcp_node_info shows:
Hostname : DB01
Port : 5432
Status : 3
Weight : 0.500000
Status Name : down
Backend Status Name : up
Role : standby
Backend Role : primary
Replication Delay : 0
Replication State : none
Replication Sync State : none
Last Status Change : 2022-11-09 15:13:39
That's what happens.

If I run pcp_attach_node for this node, both backend nodes end up with Backend Role "primary", and replication delay occurs.

I wonder if I set something up wrong or if there is something else I need to do.

If the question is unclear, I can show the entire process with screen captures.


I have a few more questions, so I'll ask them all at once.

1. Is it intended that the pgpool main process dies when the network service on the server running the pgpool leader is restarted (e.g. systemctl restart network)?
2. The following is the state after a backend node failover occurred.
Hostname : DB01
Port : 5432
Status : 3
Weight : 0.500000
Status Name : down
Backend Status Name : up
Role : standby
Backend Role : primary
Replication Delay : 0
Replication State : none
Replication Sync State : none
Last Status Change : 2022-11-09 15:13:39

Is there an automated way to turn the Backend Role back to standby here?
I'm manually running pcp_recovery_node to normalize it, but I would like to automate this without human intervention, if possible.

3. Is there a way to automatically direct pgpool's load-balancing weight to the primary?
If I set the ratio to 1:0 in pgpool.conf, load balancing does not seem to work well during a failover.
The background is that there are currently two DB servers replicating asynchronously. With synchronous replication, the primary would wait indefinitely for the standby to sync when it dies, which would cause problems across the service.

Thank you very much for your response.
You are my savior.

t-ishii

2022-11-09 19:34

developer   ~0004140

Last edited: 2022-11-09 19:35

Ok, let's discuss a simpler strategy.
Probably the simplest solution is to prevent health checking from failing while PostgreSQL is rebooting.
Suppose the rebooting process takes 30 seconds. Set (health_check_retry_delay *
health_check_max_retries) > 30; for example, set health_check_retry_delay = 1 and health_check_max_retries = 60. With this setting, health check will retry for up to 60 seconds before declaring PostgreSQL down. So unless the reboot takes more than 60 seconds, failover will never happen, and clients can happily connect to pgpool and PostgreSQL after PostgreSQL's reboot.
Please note that you should also set failover_on_backend_error = off; otherwise, when a client connects to pgpool, pgpool tries to connect to PostgreSQL, fails, and triggers an immediate failover.
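Put together, the suggestion above corresponds to this pgpool.conf fragment (the 1 s delay / 60-retry budget is the example from this note; size it to your actual reboot time):

```
# Keep retrying health checks for up to 60 s (1 s delay x 60 retries)
# before declaring a backend down, so a ~30 s reboot never triggers failover.
health_check_retry_delay = 1
health_check_max_retries = 60

# Do not fail over just because a client-initiated backend connection failed;
# let the health-check retry logic make the decision instead.
failover_on_backend_error = off
```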

jaewan

2022-11-10 14:40

reporter   ~0004141

As you suggested, I modified the settings and tested them.

My understanding is that, during the time the server reboots, the primary should either only be able to serve SELECT queries or the DB should be unavailable.

In my test, if I query through the pgpool VIP while the server is rebooting, I get 'authentication failed'. Is this the result of some other mistake of mine?

What do you think about the idea that the service will be unavailable during the reboot?

jaewan

2022-11-10 15:06

reporter   ~0004142

The authentication failure was my mistake, made while creating a new server for testing.

jaewan

2022-11-10 16:03

reporter   ~0004143

We ran a proper test.
As I expected, I get the error "failed to create a backend 1 connection".

I think this method will be useful for short interruptions, such as restarting the network service after changing the OS network configuration.

t-ishii

2022-11-10 17:22

developer   ~0004144

> I think this method will be useful in a short time, such as restarting the network service after modifying the network of the OS.
Is it worth adding this method to the pgpool documentation?

jaewan

2022-11-10 18:04

reporter   ~0004145

I always appreciate your continuous response.

>> I think this method will be useful in a short time, such as restarting the network service after modifying the network of the OS.
> Is it worth to add the method to pgpool documentation?
I think it would be good enough as a tip.
If you write it with a detailed description of the situation, it will be helpful for regular users like me.

In the end, we found it best to make sure that actions after a failover are done manually or by a periodic script.

I have another question.

I understand that when the network goes down for a few seconds on a server where pgpool is running (for example, systemctl restart network), pgpool (watchdog) kills the pgpool main process after it tries to reach the network several times in a short period and fails.
I looked it up and found that it was intentionally made this way. Is this true?

Another question is about the 1:0 load-balancing ratio.
When a failover occurs, rather than a fixed 1:0 per node, can whichever node is currently primary always be assigned weight 1?
Is this possible?

t-ishii

2022-11-14 12:09

developer   ~0004146

> I understand that when a network dies for a few seconds on a server where pgpool is running (for example, systemctl restart network), pgpool (watchdog) kills the pgpool main process if it tries to access the network several times in a moment and fails.
> I looked it up and found out that it was intentionally made like this. Is this true?

I am going to ask this to other developers.

> Another question is to make the load balancing ratio 1:0.
> If a failover occurs, rather than a fixed 1:0, the primary node is assigned a fixed 1.
> Is this possible?

Yes. You can use the "database_redirect_preference_list" parameter for this purpose. Add the following to pgpool.conf and reload or restart pgpool:

database_redirect_preference_list = '.*:primary'

Any SELECT to any database (represented by the regular expression '.*') will be forwarded to the primary node, regardless of the backend node id.

See https://www.pgpool.net/docs/latest/en/html/runtime-config-load-balancing.html for more details.

jaewan

2022-11-15 09:02

reporter   ~0004147

Thank you for your answer.

Sorry about the database_redirect_preference_list parameter; I should have checked the documentation more carefully.
Thank you for letting me know. It was helpful.

> I am going to ask this to other developers.

I'd appreciate it if you could let me know after receiving your answer.

jaewan

2022-11-29 14:59

reporter   ~0004153

Do I have to wait a little longer?

t-ishii

2022-11-30 16:30

developer   ~0004155

A watchdog developer is looking at this. Please wait for a while.

jaewan

2022-12-01 12:00

reporter   ~0004156

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob;f=src/watchdog/watchdog.c;h=d7001df7d27f4601744628a91ad0479f4bfcba1e;hb=refs/heads/master

The relevant code is at line 6394 of the above page.

t-ishii

2022-12-01 13:36

developer   ~0004157

I know. I need confirmation from the developer who wrote the code.

t-ishii

2022-12-09 13:56

developer   ~0004163

> pgpool (watchdog) kills the pgpool main process if it tries to access the network several times in a moment and fails.
> I looked it up and found out that it was intentionally made like this. Is this true?

I think this is the answer for you too.
https://www.pgpool.net/pipermail/pgpool-general/2022-December/008570.html

jaewan

2022-12-09 15:19

reporter   ~0004164

But I didn't modify wd_monitoring_interfaces_list (i.e. it is left empty).
If it is empty, interface monitoring should be disabled, but my pgpool still commits suicide.

t-ishii

2022-12-09 17:19

developer   ~0004165

Ok, I have invited Muhammad Usama, who is the authority on watchdog.

t-ishii

2022-12-12 17:01

developer   ~0004167

Last edited: 2022-12-12 19:23

For some reason I don't know (maybe a Mantis issue), he cannot see this issue. After I forwarded the conversation, he said he needs the pgpool log file. Can you share it?
(Please attach the file; I will forward it to him.)

jaewan

2022-12-13 10:53

reporter   ~0004171

I want to do that right away, but I don't have an environment available at the moment.
It cannot be retested on the production server, and it may take several days to set up a test environment.
Thank you very much for offering to analyze the log.
I'll get ready as quickly as I can.

jaewan

2022-12-13 11:19

reporter   ~0004172

A problem occurred last night.
It has been restored now.
In the environment mentioned above (three pgpool servers and two DB servers), I rebooted one of the DB servers.

Then my client got the message "pgpool is not accepting any new connections".

We looked at pcp_node_info, and the results for both nodes were the same:
Role: standby
Backend Role: primary

The solution was to create a standby.signal file on one of the DBs and run pcp_recovery_node to bring the whole service back.

1. I wonder what you think would be a good solution.
2. There are two DB servers; is there a way to keep the service running when only one of them is down?
3. Did I set something up wrong?
4. When the above phenomenon occurs, how can I tell which DB was previously the standby?

t-ishii

2022-12-13 14:24

developer   ~0004173

Since your question is about general usage of Pgpool-II, I recommend moving the discussion to the pgpool-general mailing list.
https://www.pgpool.net/mailman/listinfo/pgpool-general

Issue History

Date Modified Username Field Change
2022-11-09 11:02 jaewan New Issue
2022-11-09 14:05 t-ishii Note Added: 0004136
2022-11-09 14:59 jaewan Note Added: 0004137
2022-11-09 16:24 t-ishii Assigned To => t-ishii
2022-11-09 16:24 t-ishii Status new => assigned
2022-11-09 16:28 t-ishii Note Added: 0004138
2022-11-09 17:54 jaewan Note Added: 0004139
2022-11-09 19:34 t-ishii Note Added: 0004140
2022-11-09 19:35 t-ishii Note Edited: 0004140
2022-11-10 14:40 jaewan Note Added: 0004141
2022-11-10 15:06 jaewan Note Added: 0004142
2022-11-10 16:03 jaewan Note Added: 0004143
2022-11-10 17:22 t-ishii Note Added: 0004144
2022-11-10 18:04 jaewan Note Added: 0004145
2022-11-14 12:09 t-ishii Note Added: 0004146
2022-11-15 09:02 jaewan Note Added: 0004147
2022-11-29 14:59 jaewan Note Added: 0004153
2022-11-30 16:30 t-ishii Note Added: 0004155
2022-12-01 12:00 jaewan Note Added: 0004156
2022-12-01 13:36 t-ishii Note Added: 0004157
2022-12-09 13:56 t-ishii Note Added: 0004163
2022-12-09 15:19 jaewan Note Added: 0004164
2022-12-09 17:19 t-ishii Note Added: 0004165
2022-12-12 17:01 t-ishii Note Added: 0004167
2022-12-12 19:23 t-ishii Note Edited: 0004167
2022-12-13 10:53 jaewan Note Added: 0004171
2022-12-13 11:19 jaewan Note Added: 0004172
2022-12-13 14:24 t-ishii Note Added: 0004173