View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000445 | Pgpool-II | Bug | public | 2018-11-14 11:13 | 2018-11-27 16:44 |
| Reporter | MandyQCX | Assigned To | pengbo | ||
| Priority | high | Severity | major | Reproducibility | random |
| Status | resolved | Resolution | open | ||
| Platform | Linux | OS | centos | OS Version | 7.3 |
| Product Version | 4.0.1 | ||||
| Summary | 0000445: failover fail when I hold on connection with the port of pgpool-II ,9999 | ||||
| Description | I have 2 severs with centOS7.3, one for pgpool-II-4.0.1 and postgresql-11.1, the other for postgresql-11.1. I installed pgpool-II and postgresql by using sources.Actually, pgpool-II-4.0.0 has the same problem. the primary server is node 1 now, I want to change it to node 0. When I use the command,"psql -p 9999" to connect postgresql server, I use "pg_ctl stop" command to shut down the primary postgresql server(node 1),but the standby server(node 0) does not be promoted to be a new primary server. The failover log says, it actually failovered successfully, the node 0 was promoted to a new primary, but after 5 seconds, it faliovered again.I failovered for change the primary from 0 to 1. But the node 1 is down now, because I can not recovery it in a such short time. So the failover failed,it set the new primary as null. Here is the log, failover.sh FALLING_NODE: 1; FALLING_HOST: pgsrv14(node1); OLDPRIMARY_NODE: 1; NEW_PRIMARY: pgsrv13(node 0); at Mon Nov 12 05:22:19 EST 2018 ssh -f -n -T postgres@pgsrv13 /usr/local/pgpool-4.0.1/replscript/promote.sh -d pgsrv14 failover done! failover.sh FALLING_NODE: 0; FALLING_HOST: pgsrv13(node 0); OLDPRIMARY_NODE: 0; NEW_PRIMARY: ; at Mon Nov 12 05:22:24 EST 2018 ssh -f -n -T postgres@ /usr/local/pgpool-4.0.1/replscript/promote.sh -d pgsrv13 failover done! It can failover successfully when I did not hold connections with the port of pgpool. And when I hold connections with the port of pgpool-II, the accident happened when the new primary is changed form 1 to 0 randomly. It means that it's not always happened.It happened infrequently when the new primary is changed from 0 to 1. | ||||
| Steps To Reproduce | Firstly,use the "psql -p 9999 "command to hold on the connection of postgresql server with port of pgpool in pgpool server. Secondly, use the "pg_ctl stop "command to shut down the primary server. And then you will see the failover failed.I would offer you pgpool.conf and failover.sh. | ||||
| Tags | failover | ||||
|
|
|
|
|
Could you show the debug log of pgpool when failover failed. |
|
|
Here is the debug log of pgpool when the failed failover happened , it is begin at Nov 12 05:22:19. |
|
|
Thank you. Could you share the debug log by starting pgpool with "-d" option? You are setting "enable_pool_hba = on", how did you set the authentication method in pool_hba.conf and pg_hba.conf for postgres user? |
|
|
Here is the information when I start pgpool with -d option, and pgpool_d.log is the log of pgpool at the same time. [postgres@pgsrv13 ~]$ pgpool -d 2018-11-14 12:19:28: pid 2273: DEBUG: initializing pool configuration 2018-11-14 12:19:28: pid 2273: DETAIL: num_backends: 2 total_weight: 2.000000 2018-11-14 12:19:28: pid 2273: DEBUG: initializing pool configuration 2018-11-14 12:19:28: pid 2273: DETAIL: backend 0 weight: 1073741823.500000 flag: 0000 2018-11-14 12:19:28: pid 2273: DEBUG: initializing pool configuration 2018-11-14 12:19:28: pid 2273: DETAIL: backend 1 weight: 1073741823.500000 flag: 0000 |
|
|
Sorry, this is the pool_hba.conf |
|
|
I mean this debug log include failover when failover fails. And the failover log "/var/log/pgpool/failover.log". Because I could not reproduce this issue, I want to try to find the reason in log files. |
|
|
|
|
|
Hello ,I recorded video to reproduce the issue.It is failover_fail. In the video,I did failover four times , the first three times are successful, but the last one is failed. The failover in the first two times,I did not keep connection with the port of pgpool, and they were successful. And failover in the last two times, I did keep connection with the port of pgpool, and it worked when I changed primary node from 0 to 1.but it failed when I changed primary node from 1 to 0. If you need anything else , please let me know. |
|
|
this is the video. |
|
|
hello,it seems that I could not upload the video,so I upload it to youtube.it is here,https://youtu.be/B1x6befx5tE |
|
|
hello,do you have any idea about the video,https://youtu.be/B1x6befx5tE? |
|
|
Thank you for your video. I confirmed the video. I used the same config files provided by you but still I couldn't reproduce this issue. Could you enable pgpool debug mode by editing "/etc/sysconfig/pgpool" file and do failover again? Enable debug mode: # vi /etc/sysconfig/pgpool ------------------------------ OPTS=" -d -D -n" ------------------------------ If possible, could you try if this issue occurs by pgpool-II 4.0.2? |
|
|
Thank you for you reply! I already tried with pgpool-II 4.0.2, and the issue occurs. And ,I didn't find the parameter that called 'OPTS' in pgpool.conf. Is the parameter also called as 'log_min_messages' in pgpool.conf . if it is ,should I set it as debug 5? I installed pgpool-II by source . Is the same install way with you ? |
|
|
I installed Pgpool-II by RPM. If you installed Pgpool-II by source code, please set log_min_messages = debug5 to log the debug log. |
|
|
Hello, I have set log_min_messages=debug5 in pgpool.conf . And then I did failover again, it is failed when I did the second time failover where I change the primary node from 1 to 0. The pgpool.log and failover.log which includes the failover time are for you. Thank you ! |
|
|
Thank you for your log. After the failover (node1->node0), Pgpool-II can't connect new primary node0 duo to "detecting postmaster down error" error, then failover occured again. ==== Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [637-1] 2018-11-26 05:38:53: pid 2583: DEBUG: pool_read: read 117 bytes from backend 0 Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [638-1] 2018-11-26 05:38:53: pid 2583: DEBUG: detect error: kind: E Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [639-1] 2018-11-26 05:38:53: pid 2583: DEBUG: detect error: kind: E Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [640-1] 2018-11-26 05:38:53: pid 2583: DEBUG: detecting postmaster down error Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [640-2] 2018-11-26 05:38:53: pid 2583: DETAIL: receive admin shutdown error from a node Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [641-1] 2018-11-26 05:38:53: pid 2583: LOG: reading and processing packets Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [641-2] 2018-11-26 05:38:53: pid 2583: DETAIL: postmaster on DB node 0 was shutdown by administrative command Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [642-1] 2018-11-26 05:38:53: pid 2583: LOG: received degenerate backend request for node_id: 0 from pid [2583] Nov 26 05:38:53 pgsrv13 pgpool: postgres postgres idle[2583]: [643-1] 2018-11-26 05:38:53: pid 2583: DEBUG: sending signal:10 to the parent process with PID:2286 ==== It seems that due to some reasons the backend server shutdown. Is some shutdown signal sent to the new primary server? What did you execute in promote.sh? |
|
|
Thanks for you reply! When the primary node is down , pgpool will perform the 'failover.sh', then perform 'promote.sh', and 'promote.sh' will perform 'disable_postgresql.sh'. The shells that used in failover were sent to you! May I ask you for the shells that used in failover by you! I want to test failover again using your failover shells to see what will happen. By the way, the failed failover event also occurs in other versions of pgpool , like pgpool-II 3.7.6 ,4.0.0, 4.0.1, 4.0.2. |
|
|
Hello, I think I found the reason why failover failed. In the promote.sh, I found that it execute the command 'pg_ctl restart' in primary node, which is the admin shutdown that show in log. After removing the command, the failover succeed. I should have checked the failover shells.I am so sorry. Thank you for help! Thank you very much! Best wish! |
|
|
Great! I just made my failover script like the example below: http://www.pgpool.net/docs/latest/en/html/example-cluster.html I will mark this issue as "resolved". |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2018-11-14 11:13 | MandyQCX | New Issue | |
| 2018-11-14 11:13 | MandyQCX | File Added: failover.sh | |
| 2018-11-14 11:13 | MandyQCX | File Added: pgpool.conf | |
| 2018-11-14 11:13 | MandyQCX | Tag Attached: failover | |
| 2018-11-14 14:36 | pengbo | Note Added: 0002247 | |
| 2018-11-14 15:15 | MandyQCX | File Added: pgpool.log | |
| 2018-11-14 15:15 | MandyQCX | Note Added: 0002248 | |
| 2018-11-14 17:27 | pengbo | Note Added: 0002249 | |
| 2018-11-14 17:28 | pengbo | Note Edited: 0002249 | |
| 2018-11-14 19:03 | MandyQCX | File Added: pg_hba.conf | |
| 2018-11-14 19:03 | MandyQCX | File Added: pgpool-2.conf | |
| 2018-11-14 19:03 | MandyQCX | File Added: pgpool_d.log | |
| 2018-11-14 19:03 | MandyQCX | Note Added: 0002250 | |
| 2018-11-15 09:56 | MandyQCX | File Added: pool_hba.conf | |
| 2018-11-15 09:56 | MandyQCX | Note Added: 0002255 | |
| 2018-11-15 18:22 | pengbo | Note Added: 0002260 | |
| 2018-11-20 11:38 | pengbo | Assigned To | => pengbo |
| 2018-11-20 11:38 | pengbo | Status | new => feedback |
| 2018-11-20 11:38 | pengbo | Description Updated | |
| 2018-11-20 11:38 | pengbo | Steps to Reproduce Updated | |
| 2018-11-20 17:04 | MandyQCX | File Added: failover.log | |
| 2018-11-20 17:04 | MandyQCX | File Added: pcp.conf | |
| 2018-11-20 17:04 | MandyQCX | File Added: pgpool-3.conf | |
| 2018-11-20 17:04 | MandyQCX | File Added: pgpool-2.log | |
| 2018-11-20 17:04 | MandyQCX | File Added: pg_hba-2.conf | |
| 2018-11-20 17:04 | MandyQCX | File Added: pool_hba-2.conf | |
| 2018-11-20 17:04 | MandyQCX | File Added: postgresql.conf | |
| 2018-11-20 17:04 | MandyQCX | Note Added: 0002273 | |
| 2018-11-20 17:04 | MandyQCX | Status | feedback => assigned |
| 2018-11-20 17:07 | MandyQCX | Note Added: 0002274 | |
| 2018-11-20 18:22 | MandyQCX | Note Added: 0002275 | |
| 2018-11-23 17:46 | MandyQCX | Note Added: 0002279 | |
| 2018-11-25 21:02 | pengbo | Note Added: 0002280 | |
| 2018-11-25 21:04 | pengbo | Note Edited: 0002280 | |
| 2018-11-26 00:23 | pengbo | Note Edited: 0002280 | |
| 2018-11-26 11:09 | MandyQCX | Note Added: 0002281 | |
| 2018-11-26 11:14 | pengbo | Note Added: 0002282 | |
| 2018-11-26 11:54 | MandyQCX | File Added: debug5.rar | |
| 2018-11-26 11:54 | MandyQCX | Note Added: 0002283 | |
| 2018-11-26 21:56 | pengbo | Note Added: 0002285 | |
| 2018-11-27 10:40 | MandyQCX | File Added: failover_promote.rar | |
| 2018-11-27 10:40 | MandyQCX | Note Added: 0002286 | |
| 2018-11-27 16:39 | MandyQCX | Note Added: 0002288 | |
| 2018-11-27 16:44 | pengbo | Note Added: 0002290 | |
| 2018-11-27 16:44 | pengbo | Status | assigned => resolved |