[Pgpool-general] client_idle_limit_in_recovery

Marcelo Martins pglists at zeroaccess.org
Fri Dec 5 17:42:02 UTC 2008


I have now been able to perform an online recovery successfully, with
no problems.
I have only done it once so far, but I will be doing a few more shortly.

Thanks Tatsuo

Marcelo
Linux/Solaris System Administrator
PostgreSQL DBA
http://www.zeroaccess.org

On Dec 4, 2008, at 9:21 PM, Marcelo Martins wrote:

> That's awesome, thank you.
> I will download the latest cvs version and try it out.
>
> On Dec 4, 2008, at 20:39, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
>> Hi Marcelo,
>>
>> With your help, I was able to find the problem.
>> If you connect to pgpool *before* starting recovery, the timeout
>> parameter to select(2) is set to NULL which means it will wait
>> forever. I have modified pool_process_query.c so that it will set
>> timeout whenever client_idle_limit_in_recovery > 0. Please grab the
>> CVS Head and try it out.
>>
>> Thanks for your great help!
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
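
To illustrate the fix described above: select(2) blocks indefinitely when
its timeout argument is NULL, so a child that entered select() before
recovery began never wakes up to notice the idle limit. The following is
only a minimal sketch in C, not pgpool-II's actual code. The helper names
choose_select_timeout and wait_for_input, and the one-second wake-up
interval, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not from pgpool-II): pick the timeout to pass to
 * select(2). If either idle limit is configured, return a short timeout so
 * the caller wakes up periodically, even when recovery starts only after
 * the call was entered. Otherwise return NULL, which blocks indefinitely.
 * Call this before every select(), because select() may modify the struct.
 */
struct timeval *choose_select_timeout(int client_idle_limit,
                                      int client_idle_limit_in_recovery,
                                      struct timeval *storage)
{
    assert(storage != NULL);

    if (client_idle_limit > 0 || client_idle_limit_in_recovery > 0) {
        storage->tv_sec = 1;    /* assumed one-second wake-up interval */
        storage->tv_usec = 0;
        return storage;
    }
    return NULL;                /* no idle limit: wait forever */
}

/*
 * Hypothetical helper: wait until fd is readable or the timeout expires.
 * Returns 0 on timeout, a positive count when fd is ready, -1 on error.
 */
int wait_for_input(int fd, struct timeval *timeout)
{
    fd_set readmask;

    FD_ZERO(&readmask);
    FD_SET(fd, &readmask);
    return select(fd + 1, &readmask, NULL, NULL, timeout);
}
```

With client_idle_limit = 0 and client_idle_limit_in_recovery = 7, as in the
gdb session quoted below, choose_select_timeout() returns a non-NULL
timeout, so wait_for_input() returns 0 about once per second while the
client is idle instead of blocking forever.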
>>
>>> Hi Tatsuo,
>>>
>>>
>>> I have also checked the value for fds
>>>
>>> This condition:
>>>
>>> if (((*InRecovery == 0 && pool_config->client_idle_limit > 0) ||
>>>      (*InRecovery && pool_config->client_idle_limit_in_recovery > 0)) &&
>>>     fds == 0)
>>>
>>> never becomes true unless I run some query inside the psql connection
>>> that I created in step 1.
>>>
>>> 1) connect to pgpool with psql
>>>
>>> 2) run pcp_recovery_node
>>>
>>> 3) When the log shows that it is stuck at the start of stage 2, attach
>>> gdb to the pgpool PID from step 1
>>>
>>> - 1st GDB backtrace  -
>>>
>>> - on frame 2 - in pool_process_query (frontend=0x8128c18,
>>> backend=0x8128a10, connection_reuse=0,
>>> first_ready_for_query_received=0) at pool_process_query.c:363
>>> 363                                     fds = select(num_fds,
>>> &readmask, &writemask, &exceptmask, &timeout);
>>>
>>> (gdb) frame 2
>>> #2  0x0805a499 in pool_process_query (frontend=0x8128c18,
>>> backend=0x8128a10, connection_reuse=0,
>>> first_ready_for_query_received=0) at pool_process_query.c:365
>>> 365                                     fds = select(num_fds,
>>> &readmask, &writemask, &exceptmask, NULL);
>>> (gdb) print *InRecovery
>>> $1 = 1
>>> (gdb) print pool_config->client_idle_limit
>>> $2 = 0
>>> (gdb) print pool_config->client_idle_limit_in_recovery
>>> $3 = 7
>>> (gdb) print fds
>>> $4 = 135432720
>>>
>>>
>>> 4) List databases with "\l" inside the psql connection created in step 1
>>>
>>> 5) Detach gdb from the PID to let "\l" run, then attach it again
>>>
>>> - 2nd GDB backtrace  -
>>>
>>> - on frame 2 - in pool_process_query (frontend=0x8128c18,
>>> backend=0x8128a10, connection_reuse=0,
>>> first_ready_for_query_received=0) at pool_process_query.c:363
>>> 363                                     fds = select(num_fds,
>>> &readmask, &writemask, &exceptmask, &timeout);
>>>
>>> (gdb) bt
>>> #0  0xb7f69402 in ?? ()
>>> #1  0xb7e810fd in select () from /lib/tls/i686/cmov/libc.so.6
>>> #2  0x0805a463 in pool_process_query (frontend=0x8128c18,
>>> backend=0x8128a10, connection_reuse=0,
>>> first_ready_for_query_received=0) at pool_process_query.c:363
>>> #3  0x0804f03e in do_child (unix_fd=3, inet_fd=4) at child.c:428
>>> #4  0x0804bc21 in fork_a_child (unix_fd=3, inet_fd=4, id=3) at main.c:814
>>> #5  0x0804d1e8 in failover () at main.c:1328
>>> #6  0x0804b16b in main (argc=7, argv=0xbfef7c64) at main.c:519
>>> (gdb) frame 2
>>> #2  0x0805a463 in pool_process_query (frontend=0x8128c18,
>>> backend=0x8128a10, connection_reuse=0,
>>> first_ready_for_query_received=0) at pool_process_query.c:363
>>> 363                                     fds = select(num_fds,
>>> &readmask, &writemask, &exceptmask, &timeout);
>>> (gdb) print *InRecovery
>>> $1 = 1
>>> (gdb) print pool_config->client_idle_limit
>>> $2 = 0
>>> (gdb) print pool_config->client_idle_limit_in_recovery
>>> $3 = 7
>>> (gdb) print fds
>>> $4 = 0
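
The two gdb sessions above can be replayed against the condition from
pool_process_query.c. This is a hedged sketch: idle_check_fires is a
hypothetical function that takes plain ints in place of *InRecovery and
the pool_config fields. In the first session fds still held 135432720,
presumably a stale value because select() had not yet returned, so the
check could not fire. In the second session fds held 0, which is
consistent with select() having timed out, and the check could fire.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the idle-limit condition quoted in this
 * thread. An idle limit is "active" when the limit matching the current
 * recovery state is positive; the check fires only if, in addition,
 * select() reported no ready descriptors (fds == 0).
 */
bool idle_check_fires(int in_recovery,
                      int client_idle_limit,
                      int client_idle_limit_in_recovery,
                      int fds)
{
    bool limit_active =
        (!in_recovery && client_idle_limit > 0) ||
        (in_recovery && client_idle_limit_in_recovery > 0);

    return limit_active && fds == 0;
}
```

Calling idle_check_fires(1, 0, 7, 135432720) gives false for the first
session, while idle_check_fires(1, 0, 7, 0) gives true for the second.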
>>>
>>>
>>> Once I attach to the process again, I can see a line in the pgpool
>>> log file, as shown below:
>>>
>>> Dec  4 09:41:58 debian-db6 pgpool: 2008-12-04 09:41:58 DEBUG: pid
>>> 24697: idle count:1 InRecovery:0 client_idle_limit:7
>>> client_idle_limit_in_recovery:-1074827064
>>>
>>> Then I let gdb continue the process, and recovery proceeds since the
>>> if statement is now able to evaluate to true.
>>>
>>>
>>>
>>>
>>> Hope that helps
>>>
>>> If you want to see this happening, let me know and I can set up some
>>> VMs and then provide you with access to them.
>>>
>>> -
>>> Marcelo
>>>
>>>
>>> On Dec 4, 2008, at 4:17 AM, Tatsuo Ishii wrote:
>>>
>>>> Thanks!
>>>>
>>>> Can you please print the value of:
>>>>
>>>> *InRecovery
>>>> *pool_config
>>>>
>>>> at frame #2?
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>>
>>>>> Hi Tatsuo,
>>>>>
>>>>> Sorry for the delay here.
>>>>> I was able to compile the CVS version now, with no problems regarding
>>>>> bison, thanks.
>>>>>
>>>>> I have also placed this back on the list
>>>>>>
>>>>>> Thanks. What I want to know is the following:
>>>>>>
>>>>>> 1) connect to pgpool-II using psql
>>>>>
>>>>> Ok connected to pgpool through psql
>>>>>>
>>>>>> 2) start recovery
>>>>>
>>>>> Ok, ./pcp_recovery_node 100 localhost 9898 nastpcp nastpcp 1
>>>>>
>>>>>>
>>>>>> 3) pgpool-II gets stuck at the beginning of the 2nd stage (this is
>>>>>> what I couldn't reproduce)
>>>>>
>>>>> Ok, got stuck
>>>>>
>>>>>>
>>>>>> 4) attach gdb to the pgpool-II child process that psql connected to
>>>>>> in step 1)
>>>>>
>>>>> Ok, gdb pgpool PID
>>>>>
>>>>>>
>>>>>> 5) get a backtrace to see where pgpool-II is stuck
>>>>>>
>>>>>>>
>>>>>
>>>>> Attaching to process 23712
>>>>> Reading symbols from /opt/pgpool-cvs.1.117/bin/pgpool...done.
>>>>> Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
>>>>> Reading symbols from /usr/lib/libpq.so.5...done.
>>>>> Loaded symbols for /usr/lib/libpq.so.5
>>>>> Reading symbols from /opt/pgpool-cvs.1.117/lib/libpcp.so.0...done.
>>>>> Loaded symbols for /opt/pgpool-cvs.1.117/lib/libpcp.so.0
>>>>> Reading symbols from /lib/tls/i686/cmov/libresolv.so.2...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libresolv.so.2
>>>>> Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1
>>>>> Reading symbols from /lib/tls/i686/cmov/libm.so.6...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libm.so.6
>>>>> Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libc.so.6
>>>>> Reading symbols from /lib/tls/i686/cmov/libcrypt.so.1...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libcrypt.so.1
>>>>> Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.8...done.
>>>>> Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.8
>>>>> Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8...done.
>>>>> Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8
>>>>> Reading symbols from /usr/lib/libkrb5.so.3...done.
>>>>> Loaded symbols for /usr/lib/libkrb5.so.3
>>>>> Reading symbols from /lib/libcom_err.so.2...done.
>>>>> Loaded symbols for /lib/libcom_err.so.2
>>>>> Reading symbols from /usr/lib/libgssapi_krb5.so.2...done.
>>>>> Loaded symbols for /usr/lib/libgssapi_krb5.so.2
>>>>> Reading symbols from /usr/lib/libldap_r.so.2...done.
>>>>> Loaded symbols for /usr/lib/libldap_r.so.2
>>>>> Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...done.
>>>>> [Thread debugging using libthread_db enabled]
>>>>> [New Thread -1214495040 (LWP 23712)]
>>>>> Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0
>>>>> Reading symbols from /lib/ld-linux.so.2...done.
>>>>> Loaded symbols for /lib/ld-linux.so.2
>>>>> Reading symbols from /lib/tls/i686/cmov/libdl.so.2...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
>>>>> Reading symbols from /usr/lib/libz.so.1...done.
>>>>> Loaded symbols for /usr/lib/libz.so.1
>>>>> Reading symbols from /usr/lib/libk5crypto.so.3...done.
>>>>> Loaded symbols for /usr/lib/libk5crypto.so.3
>>>>> Reading symbols from /usr/lib/libkrb5support.so.0...done.
>>>>> Loaded symbols for /usr/lib/libkrb5support.so.0
>>>>> Reading symbols from /usr/lib/liblber.so.2...done.
>>>>> Loaded symbols for /usr/lib/liblber.so.2
>>>>> Reading symbols from /usr/lib/libsasl2.so.2...done.
>>>>> Loaded symbols for /usr/lib/libsasl2.so.2
>>>>> Reading symbols from /usr/lib/libgnutls.so.13...done.
>>>>> Loaded symbols for /usr/lib/libgnutls.so.13
>>>>> Reading symbols from /usr/lib/libtasn1.so.3...done.
>>>>> Loaded symbols for /usr/lib/libtasn1.so.3
>>>>> Reading symbols from /usr/lib/libgcrypt.so.11...done.
>>>>> Loaded symbols for /usr/lib/libgcrypt.so.11
>>>>> Reading symbols from /usr/lib/libgpg-error.so.0...done.
>>>>> Loaded symbols for /usr/lib/libgpg-error.so.0
>>>>> Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2
>>>>> Failed to read a valid object file image from memory.
>>>>> 0xb7f39402 in ?? ()
>>>>>
>>>>> (gdb) bt
>>>>> #0  0xb7f39402 in ?? ()
>>>>> #1  0xb7e510fd in select () from /lib/tls/i686/cmov/libc.so.6
>>>>> #2  0x0805a499 in pool_process_query (frontend=0x8128c18,
>>>>> backend=0x8128a10, connection_reuse=0,
>>>>> first_ready_for_query_received=0)
>>>>>   at pool_process_query.c:365
>>>>> #3  0x0804f03e in do_child (unix_fd=3, inet_fd=4) at child.c:428
>>>>> #4  0x0804bc21 in fork_a_child (unix_fd=3, inet_fd=4, id=2) at main.c:814
>>>>> #5  0x0804d1e8 in failover () at main.c:1328
>>>>> #6  0x0804b16b in main (argc=7, argv=0xbff10594) at main.c:519
>>>>>
>>>>>
>>>>>
>>>>>
>>>