View Issue Details

IDProjectCategoryView StatusLast Update
0000445Pgpool-IIBugpublic2018-11-27 16:44
ReporterMandyQCXAssigned Topengbo 
PriorityhighSeveritymajorReproducibilityrandom
Status resolvedResolutionopen 
PlatformLinuxOScentosOS Version7.3
Product Version4.0.1 
Target VersionFixed in Version 
Summary0000445: failover fail when I hold on connection with the port of pgpool-II ,9999
Description I have 2 severs with centOS7.3, one for pgpool-II-4.0.1 and postgresql-11.1, the other for postgresql-11.1. I installed pgpool-II and postgresql by using sources.Actually, pgpool-II-4.0.0 has the same problem.
the primary server is node 1 now, I want to change it to node 0. When I use the command,"psql -p 9999" to connect postgresql server, I use "pg_ctl stop" command to shut down the primary postgresql server(node 1),but the standby server(node 0) does not be promoted to be a new primary server.
The failover log says, it actually failovered successfully, the node 0 was promoted to a new primary, but after 5 seconds, it faliovered again.I failovered for change the primary from 0 to 1.
But the node 1 is down now, because I can not recovery it in a such short time. So the failover failed,it set the new primary as null.
Here is the log,

failover.sh FALLING_NODE: 1; FALLING_HOST: pgsrv14(node1); OLDPRIMARY_NODE: 1; NEW_PRIMARY: pgsrv13(node 0); at Mon Nov 12 05:22:19 EST 2018
ssh -f -n -T postgres@pgsrv13 /usr/local/pgpool-4.0.1/replscript/promote.sh -d pgsrv14
failover done!

failover.sh FALLING_NODE: 0; FALLING_HOST: pgsrv13(node 0); OLDPRIMARY_NODE: 0; NEW_PRIMARY: ; at Mon Nov 12 05:22:24 EST 2018
ssh -f -n -T postgres@ /usr/local/pgpool-4.0.1/replscript/promote.sh -d pgsrv13
failover done!

It can failover successfully when I did not hold connections with the port of pgpool. And when I hold connections with the port of pgpool-II, the accident happened when the new primary is changed form 1 to 0 randomly. It means that it's not always happened.It happened infrequently when the new primary is changed from 0 to 1.



Steps To ReproduceFirstly,use the "psql -p 9999 "command to hold on the connection of postgresql server with port of pgpool in pgpool server.
Secondly, use the "pg_ctl stop "command to shut down the primary server.
And then you will see the failover failed.I would offer you pgpool.conf and failover.sh.
Tagsfailover

Activities

MandyQCX

2018-11-14 11:13

reporter  

pgpool.conf (38,150 bytes)
failover.sh (1,111 bytes)

pengbo

2018-11-14 14:36

developer   ~0002247

Could you show the debug log of pgpool when failover failed.

MandyQCX

2018-11-14 15:15

reporter   ~0002248

Here is the debug log of pgpool when the failed failover happened , it is begin at Nov 12 05:22:19.

pgpool.log (7,304 bytes)

pengbo

2018-11-14 17:27

developer   ~0002249

Last edited: 2018-11-14 17:28

View 2 revisions

Thank you.

Could you share the debug log by starting pgpool with "-d" option?
You are setting "enable_pool_hba = on", how did you set the authentication method in pool_hba.conf and pg_hba.conf for postgres user?

MandyQCX

2018-11-14 19:03

reporter   ~0002250

Here is the information when I start pgpool with -d option, and pgpool_d.log is the log of pgpool at the same time.
[postgres@pgsrv13 ~]$ pgpool -d
2018-11-14 12:19:28: pid 2273: DEBUG: initializing pool configuration
2018-11-14 12:19:28: pid 2273: DETAIL: num_backends: 2 total_weight: 2.000000
2018-11-14 12:19:28: pid 2273: DEBUG: initializing pool configuration
2018-11-14 12:19:28: pid 2273: DETAIL: backend 0 weight: 1073741823.500000 flag: 0000
2018-11-14 12:19:28: pid 2273: DEBUG: initializing pool configuration
2018-11-14 12:19:28: pid 2273: DETAIL: backend 1 weight: 1073741823.500000 flag: 0000

pgpool_d.log (5,097 bytes)
pg_hba.conf (4,656 bytes)
pgpool-2.conf (38,150 bytes)

MandyQCX

2018-11-15 09:56

reporter   ~0002255

Sorry, this is the pool_hba.conf

pool_hba.conf (3,397 bytes)

pengbo

2018-11-15 18:22

developer   ~0002260

I mean this debug log include failover when failover fails.
And the failover log "/var/log/pgpool/failover.log".

Because I could not reproduce this issue, I want to try to find the reason in log files.

MandyQCX

2018-11-20 17:04

reporter  

pcp.conf (900 bytes)
pgpool-2.log (288,271 bytes)
failover.log (1,136 bytes)
pgpool-3.conf (38,150 bytes)

MandyQCX

2018-11-20 17:04

reporter   ~0002273

Hello ,I recorded video to reproduce the issue.It is failover_fail.
In the video,I did failover four times , the first three times are successful, but the last one is failed.
The failover in the first two times,I did not keep connection with the port of pgpool, and they were successful.
And failover in the last two times, I did keep connection with the port of pgpool, and it worked when I changed primary node from 0 to 1.but it failed when I changed primary node from 1 to 0.

If you need anything else , please let me know.

postgresql.conf (23,791 bytes)
pool_hba-2.conf (3,397 bytes)
pg_hba-2.conf (4,656 bytes)

MandyQCX

2018-11-20 17:07

reporter   ~0002274

this is the video.

MandyQCX

2018-11-20 18:22

reporter   ~0002275

hello,it seems that I could not upload the video,so I upload it to youtube.it is here,https://youtu.be/B1x6befx5tE

MandyQCX

2018-11-23 17:46

reporter   ~0002279

hello,do you have any idea about the video,https://youtu.be/B1x6befx5tE?

pengbo

2018-11-25 21:02

developer   ~0002280

Last edited: 2018-11-26 00:23

View 3 revisions

Thank you for your video. I confirmed the video.
I used the same config files provided by you but still I couldn't reproduce this issue.

Could you enable pgpool debug mode by editing "/etc/sysconfig/pgpool" file and do failover again?

Enable debug mode:
# vi /etc/sysconfig/pgpool
------------------------------
OPTS=" -d -D -n"
------------------------------

If possible, could you try if this issue occurs by pgpool-II 4.0.2?

MandyQCX

2018-11-26 11:09

reporter   ~0002281

Thank you for you reply!
I already tried with pgpool-II 4.0.2, and the issue occurs.
And ,I didn't find the parameter that called 'OPTS' in pgpool.conf. Is the parameter also called as 'log_min_messages' in pgpool.conf . if it is ,should I set it as debug 5?
I installed pgpool-II by source . Is the same install way with you ?

pengbo

2018-11-26 11:14

developer   ~0002282

I installed Pgpool-II by RPM.

If you installed Pgpool-II by source code, please set log_min_messages = debug5 to log the debug log.

MandyQCX

2018-11-26 11:54

reporter   ~0002283

Hello,
I have set log_min_messages=debug5 in pgpool.conf . And then I did failover again, it is failed when I did the second time failover where I change the primary node from 1 to 0.
The pgpool.log and failover.log which includes the failover time are for you. Thank you !

debug5.rar (95,858 bytes)

pengbo

2018-11-26 21:56

developer   ~0002285

Thank you for your log.

After the failover (node1->node0), Pgpool-II can't connect new primary node0 duo to "detecting postmaster down error" error, then failover occured again.

====
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [637-1] 2018-11-26 05:38:53: pid 2583: DEBUG: pool_read: read 117 bytes from backend 0
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [638-1] 2018-11-26 05:38:53: pid 2583: DEBUG: detect error: kind: E
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [639-1] 2018-11-26 05:38:53: pid 2583: DEBUG: detect error: kind: E
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [640-1] 2018-11-26 05:38:53: pid 2583: DEBUG: detecting postmaster down error
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [640-2] 2018-11-26 05:38:53: pid 2583: DETAIL: receive admin shutdown error from a node
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [641-1] 2018-11-26 05:38:53: pid 2583: LOG: reading and processing packets
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [641-2] 2018-11-26 05:38:53: pid 2583: DETAIL: postmaster on DB node 0 was shutdown by administrative command
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [642-1] 2018-11-26 05:38:53: pid 2583: LOG: received degenerate backend request for node_id: 0 from pid [2583]
Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [643-1] 2018-11-26 05:38:53: pid 2583: DEBUG: sending signal:10 to the parent process with PID:2286
====

It seems that due to some reasons the backend server shutdown. Is some shutdown signal sent to the new primary server?
What did you execute in promote.sh?

MandyQCX

2018-11-27 10:40

reporter   ~0002286

Thanks for you reply!
When the primary node is down , pgpool will perform the 'failover.sh', then perform 'promote.sh', and 'promote.sh' will perform 'disable_postgresql.sh'. The shells that used in failover were sent to you!
May I ask you for the shells that used in failover by you! I want to test failover again using your failover shells to see what will happen.
By the way, the failed failover event also occurs in other versions of pgpool , like pgpool-II 3.7.6 ,4.0.0, 4.0.1, 4.0.2.

failover_promote.rar (3,122 bytes)

MandyQCX

2018-11-27 16:39

reporter   ~0002288

Hello, I think I found the reason why failover failed.
In the promote.sh, I found that it execute the command 'pg_ctl restart' in primary node, which is the admin shutdown that show in log. After removing the command, the failover succeed. I should have checked the failover shells.I am so sorry.
Thank you for help! Thank you very much!
Best wish!

pengbo

2018-11-27 16:44

developer   ~0002290

Great!

I just made my failover script like the example below:
http://www.pgpool.net/docs/latest/en/html/example-cluster.html

I will mark this issue as "resolved".

Issue History

Date Modified Username Field Change
2018-11-14 11:13 MandyQCX New Issue
2018-11-14 11:13 MandyQCX File Added: failover.sh
2018-11-14 11:13 MandyQCX File Added: pgpool.conf
2018-11-14 11:13 MandyQCX Tag Attached: failover
2018-11-14 14:36 pengbo Note Added: 0002247
2018-11-14 15:15 MandyQCX File Added: pgpool.log
2018-11-14 15:15 MandyQCX Note Added: 0002248
2018-11-14 17:27 pengbo Note Added: 0002249
2018-11-14 17:28 pengbo Note Edited: 0002249 View Revisions
2018-11-14 19:03 MandyQCX File Added: pg_hba.conf
2018-11-14 19:03 MandyQCX File Added: pgpool-2.conf
2018-11-14 19:03 MandyQCX File Added: pgpool_d.log
2018-11-14 19:03 MandyQCX Note Added: 0002250
2018-11-15 09:56 MandyQCX File Added: pool_hba.conf
2018-11-15 09:56 MandyQCX Note Added: 0002255
2018-11-15 18:22 pengbo Note Added: 0002260
2018-11-20 11:38 pengbo Assigned To => pengbo
2018-11-20 11:38 pengbo Status new => feedback
2018-11-20 11:38 pengbo Description Updated View Revisions
2018-11-20 11:38 pengbo Steps to Reproduce Updated View Revisions
2018-11-20 17:04 MandyQCX File Added: failover.log
2018-11-20 17:04 MandyQCX File Added: pcp.conf
2018-11-20 17:04 MandyQCX File Added: pgpool-3.conf
2018-11-20 17:04 MandyQCX File Added: pgpool-2.log
2018-11-20 17:04 MandyQCX File Added: pg_hba-2.conf
2018-11-20 17:04 MandyQCX File Added: pool_hba-2.conf
2018-11-20 17:04 MandyQCX File Added: postgresql.conf
2018-11-20 17:04 MandyQCX Note Added: 0002273
2018-11-20 17:04 MandyQCX Status feedback => assigned
2018-11-20 17:07 MandyQCX Note Added: 0002274
2018-11-20 18:22 MandyQCX Note Added: 0002275
2018-11-23 17:46 MandyQCX Note Added: 0002279
2018-11-25 21:02 pengbo Note Added: 0002280
2018-11-25 21:04 pengbo Note Edited: 0002280 View Revisions
2018-11-26 00:23 pengbo Note Edited: 0002280 View Revisions
2018-11-26 11:09 MandyQCX Note Added: 0002281
2018-11-26 11:14 pengbo Note Added: 0002282
2018-11-26 11:54 MandyQCX File Added: debug5.rar
2018-11-26 11:54 MandyQCX Note Added: 0002283
2018-11-26 21:56 pengbo Note Added: 0002285
2018-11-27 10:40 MandyQCX File Added: failover_promote.rar
2018-11-27 10:40 MandyQCX Note Added: 0002286
2018-11-27 16:39 MandyQCX Note Added: 0002288
2018-11-27 16:44 pengbo Note Added: 0002290
2018-11-27 16:44 pengbo Status assigned => resolved