View Issue Details

IDProjectCategoryView StatusLast Update
0000133Pgpool-IIBugpublic2016-03-25 02:43
ReportermbAssigned ToMuhammad Usama 
PrioritynormalSeverityminorReproducibilityrandom
Status feedbackResolutionopen 
Product Version 
Target VersionFixed in Version 
Summary0000133: Get empty parameters for failover.sh: Master and slave can not define a new primary node
DescriptionpgPool is configurate in master-slave-mode with streaming replication.
If I shut down the master database the failover.sh script should be execute and the slave database should be to master database. But the master execute the failover.sh script with empty parameters:

Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + FALLING_NODE=0
Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + OLDPRIMARY_NODE=0
Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + NEW_PRIMARY=
Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + PGDATA=
Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + '[' 0 = 0 ']'
Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + '[' '' '!=' '' ']'
Apr 21 14:58:05 lnx-dbpgxcdev2 pgpool: + exit 0

and the slave executes the failover.sh script with following parameters:


Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + FALLING_NODE=1
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + OLDPRIMARY_NODE=1
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + NEW_PRIMARY=10.1.0.2
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + PGDATA=/var/lib/postgresql/9.3/main
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + '[' 1 = 1 ']'
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + '[' 10.1.0.2 '!=' '' ']'
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + '[' 10.1.0.2 = 10.1.0.2 ']'
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + /usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main promote
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: Server wird befördert
Apr 21 14:57:50 lnx-dbpgxcdev1 pgpool: + exit 0

Master and slave can not define a new primary node. Master set new primary node to -1

Apr 21 14:58:16 lnx-dbpgxcdev2 pgpool[22338]: [3540-1] 2015-04-21 14:58:16: pid 22338: LOG: failover: set new primary node: -1
Apr 21 14:58:16 lnx-dbpgxcdev2 pgpool: 2015-04-21 14:58:16: pid 22338: LOG: failover: set new primary node: -1

and the slave set new primary node first to 0

Apr 21 14:57:55 lnx-dbpgxcdev1 pgpool[5669]: [30-1] 2015-04-21 14:57:55: pid 5669: LOG: failover: set new primary node: 0
Apr 21 14:57:55 lnx-dbpgxcdev1 pgpool[5669]: [31-1] 2015-04-21 14:57:55: pid 5669: LOG: failover: set new master node: 0

and after a second to -1

Apr 21 14:58:09 lnx-dbpgxcdev1 pgpool[5669]: [127-1] 2015-04-21 14:58:09: pid 5669: LOG: failover: set new primary node: -1
Apr 21 14:58:09 lnx-dbpgxcdev1 pgpool: 2015-04-21 14:58:09: pid 5669: LOG: failover: set new primary node: -1

The database is no longer available over virtual IP.
Steps To Reproduce1. Start postgresql on server1
2. Start postgresql on server2
3. Start pgPool-II on server1
4. Start pgPool-II on server2
5. Stop postgresql on server1

failover.sh script will be execute on master with empty parameters and on slave with correct parameters.
Additional InformationpgPool-II version: 3.4.2-0
postgresql version: 9.3

TagsNo tags attached.

Activities

mb

2015-04-22 15:52

reporter  

configuration_files_and_logs.zip (69,304 bytes)

Muhammad Usama

2015-04-25 01:29

developer   ~0000530

The problem is with your setup of pgpool-II on node 10.1.0.2. pgpool-II is not able to make a connection with PostgreSQL because of "Authentication failure"
see the related log entries at the start of "syslog_10.1.0.2" file.

Now, Since the pgpool-II is not able to communicate with any configured backends so no master node gets selected and failover script is called with empty parameter values for "master database host name %H" and "master database directory %R" as there is no master node.

mb

2015-05-04 16:24

reporter  

backtrace.txt (2,126 bytes)
[New LWP 2035]
[New LWP 1872]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `pgpool: lifecheck     '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000046b1d7 in ?? ()
#0  0x000000000046b1d7 in ?? ()
No symbol table info available.
#1  0x000000000046ceac in ?? ()
No symbol table info available.
#2  0x00000000004a0c8a in ?? ()
No symbol table info available.
#3  0x00007f630164e182 in start_thread (arg=0x7f62fbad1700) at pthread_create.c:312
        __res = <optimized out>
        pd = 0x7f62fbad1700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140063105947392, 775236642125850364, 0, 0, 140063105948096, 140063105947392, -862502257377728772, -863508777169770756}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#4  0x00007f63006c847d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.

Thread 2 (Thread 0x7f6301c91780 (LWP 1872)):
#0  0x00007f630164f66b in pthread_join (threadid=140063105947392, thread_return=0x7ffc3c144788) at pthread_join.c:92
#1  0x00000000004a0f3a in ?? ()
#2  0x000000000049d965 in ?? ()
#3  0x000000000049ad20 in ?? ()
#4  0x000000000049b073 in ?? ()
#5  0x000000000040aced in ?? ()
#6  0x00000000004053fe in ?? ()
#7  0x00007f63005efec5 in __libc_start_main (main=0x404ef0, argc=3, argv=0x7ffc3c153d28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc3c153d18) at libc-start.c:287
#8  0x0000000000405863 in ?? ()

Thread 1 (Thread 0x7f62fbad1700 (LWP 2035)):
#0  0x000000000046b1d7 in ?? ()
#1  0x000000000046ceac in ?? ()
#2  0x00000000004a0c8a in ?? ()
#3  0x00007f630164e182 in start_thread (arg=0x7f62fbad1700) at pthread_create.c:312
#4  0x00007f63006c847d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
backtrace.txt (2,126 bytes)

mb

2015-05-04 16:31

reporter   ~0000534

I have checked the configuration on node 10.1.0.2. The authentication parameters are correct.


But I discover another problem which produces the empty parameters for failover.sh script.

If the debug libraries for pgpool-II are installed on node 10.1.0.2 and on 10.1.0.3 the failover.sh script will be execute correctly with the right parameters.

But if this libraries aren't installed the failover.sh script will be execute on node 10.1.0.2 with empty parameters and I will get a CoreDump on node 10.1.0.2.

You can see the backtrace of CoreDump in attached files.

Muhammad Usama

2015-06-11 23:00

developer   ~0000539

Hi

First of all sorry for the delayed response.

Basically there are two seperate problems. One is the segmentation fault and the stacktrace shared by you suggests that it is in the lifecheck process so it should not have anything to do with the other problem you are facing of getting empty parameters in the failover command. I am looking into the crash issue which I am still not able to reproduce on my local setup.

And the analysis on the original problem is as follows

The reason you are getting the empty parameters in failover script is only because pgpool-II is not able to select a new master node after the failure of old primary node. The failover command in your pgpool-II configuration files is passing 4 parameters to the failover.sh script. parameter 3 and 4 (%H and %R) gets the value of Hostname and cluster path of new master node respectively. And these parameter will always get the empty values when pgpool-II does not find any valid backend node to select as a new master node.
So the most probable reason the failover script is getting empty values for %H and %R parameters is, pgpool-II is not able to connect to the backend PostgreSQL server. And the logs files you shared shows many errors suggesting that pgpool-II is failing to connect to PostgreSQL backend node.

For example in syslog_10.1.0.3 log file shared above there are many backend connection failure messages

--log file snippet start--

Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool[22397]: [6551-1] 2015-04-21 14:58:00: pid 22397: LOG: failed to connect to PostgreSQL server on "10.1.0.3:5432",

Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool: 2015-04-21 14:58:00: pid 22397: LOG: failed to connect to PostgreSQL server on "10.1.0.3:5432",

Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool[22397]: [6552-1] 2015-04-21 14:58:00: pid 22397: ERROR: failed to make persistent db connection

Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool[22397]: [6552-2] 2015-04-21 14:58:00: pid 22397: DETAIL: connection to host:"10.1.0.3:5432" failed

Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool: 2015-04-21 14:58:00: pid 22397: ERROR: failed to make persistent db connection

Apr 21 14:58:00 lnx-dbpgxcdev2 pgpool: 2015-04-21 14:58:00: pid 22397: DETAIL: connection to host:"10.1.0.3:5432" failed

--log file snippet end--

So can please check the reason of these connection failure messages appearing in the pgpool-II log files. See the PostgreSQL server log file to identify what is causing these connection failures. and secondly please check the output of "show pool_nodes;" command before triggering the failover, if pgpool-II is successfully connected to all PostgreSQL backend node.

Thanks and best regards

QM

2016-03-25 02:43

reporter   ~0000719

Hi, I have a feedback :

The same issue occur with pgpool 3.5 and posgresql 9.5

My configuration is based on : http://www.pgpool.net/pgpool-web/contrib_docs/watchdog_master_slave/en.html

When the debug libraries for pgpool-II are installed on all my nodes the failover.sh script is executed correctly with the right parameters. Without the debug libraries, if y shutdown postgres (the master) the failover.sh script is called with empty parameters.

Issue History

Date Modified Username Field Change
2015-04-22 15:52 mb New Issue
2015-04-22 15:52 mb File Added: configuration_files_and_logs.zip
2015-04-22 16:27 t-ishii Assigned To => Muhammad Usama
2015-04-22 16:27 t-ishii Status new => assigned
2015-04-25 01:29 Muhammad Usama Note Added: 0000530
2015-05-04 16:24 mb File Added: backtrace.txt
2015-05-04 16:31 mb Note Added: 0000534
2015-06-11 23:00 Muhammad Usama Note Added: 0000539
2015-06-29 18:22 Muhammad Usama Status assigned => feedback
2016-03-25 02:43 QM Note Added: 0000719