[pgpool-general: 8979] Re: Clients disconnection when slave node is off

Jesús Campoy jesuscampoy at gmail.com
Tue Dec 5 17:56:58 JST 2023


Hi Tatsuo,

I just downloaded version 4.5RC1 to check whether the client
disconnections are solved.
The problem is that I cannot compile this version on RHEL 5. We had no
problems compiling version 4.3.2.
Is the new version compatible with the RHEL 5 operating system?

Please find below the details of my environment:

Red Hat Enterprise Linux Client release 5.8 (Tikanga)
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52)
ldd (GNU libc) 2.5
Postgres 12.5

The compilation error is:

gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../src/include  -D_GNU_SOURCE -I
../../src/include/parser -I /opt/postgresql/include   -g -O2 -Wall
-Wmissing-prototypes -Wmissing-declarations -fno-strict-aliasing -c -o
copyfuncs.o copyfuncs.c
In file included from ../../src/include/parser/parsenodes.h:28,
                 from copyfuncs.c:30:
../../src/include/parser/primnodes.h:27: error: redefinition of type ‘TransactionId’
../../src/include/parser/pg_list.h:50: error: previous declaration of ‘TransactionId’ was here

Thank you for your support.

Best,
Jesús


On Mon, 15 May 2023 at 4:14, Tatsuo Ishii (<ishii at sraoss.co.jp>)
wrote:

> > Ok, thank you for your great work.
> >
> > In this case the failover is triggered by the slave node.
> > Since I have no load balancing, I think clients should be able to
> > connect to pgpool during this failover, because the master node is still
> > the same and it is alive.
>
> There are some code paths that access the standby node, each of which
> can trigger session disconnection:
>
> 1. When a client connects to pgpool, pgpool tries to connect to the
>    standby PostgreSQL.
>
> 2. When a client connects to pgpool, pgpool sends a SET application_name
>    command.
>
> 3. Pgpool detects the PostgreSQL shutdown event even if it comes from
>    the standby, which results in session disconnection.
>
> Although I am not sure if I can eliminate all the code paths above, I
> will try for the upcoming Pgpool 4.5.
>
> > Is there a plan to solve this limitation in future releases?
> >
> > Thanks in advance.
> >
> > Best,
> > Jesús
> >
> > On Sun, 14 May 2023 at 4:06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> Hi Jesús,
> >>
> >> > Hi Tatsuo. Thank you very much.
> >> > Should we add it as a bug in the Pgpool-II bug tracker?
> >>
> >> No, you don't need to, as it's a known limitation that pgpool does not
> >> accept new connections during failover.
> >>
> >> > On Sat, 13 May 2023 at 9:24, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >> >
> >> >> Ok. The errors were generated while clients tried to connect to
> >> >> pgpool.  My patch covers the case when failover happens while
> >> >> connections from clients to pgpool are *kept*.  However, the patch
> >> >> does not cover the case when clients try to establish a connection
> >> >> to pgpool during failover.
> >> >>
> >> >> I tested my patch using pgbench. If pgbench is given "-C" (create a
> >> >> new connection for each transaction), I get the same errors you
> >> >> mentioned.
> >> >>
> >> >> I have to admit my patch does not cover all the cases. I need more
> >> >> time to deal with these problems.
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I think we cannot connect to pgpool. I will show you the output of
> >> >> > my dbCheck script.
> >> >> >
> >> >> >
> >> >> >    - *Pgpool without patch and backend1 as slave:*
> >> >> >
> >> >> > [root at pg_client1 services]# ./dbcheck.sh $VIP_PGPOOL
> >> >> >
> >> >> > psql: ERROR:  do command failed
> >> >> >
> >> >> > DETAIL:  backend error: "SFATAL"
> >> >> >
> >> >> > psql: ERROR:  unable to read data from DB node 1
> >> >> >
> >> >> > DETAIL:  socket read failed with error "Connection reset by peer"
> >> >> >
> >> >> > psql: server closed the connection unexpectedly
> >> >> >
> >> >> >         This probably means the server terminated abnormally
> >> >> >
> >> >> >         before or while processing the request.
> >> >> >
> >> >> > psql: server closed the connection unexpectedly
> >> >> >
> >> >> >         This probably means the server terminated abnormally
> >> >> >
> >> >> >         before or while processing the request.
> >> >> >
> >> >> >         This probably means the server terminated abnormally
> >> >> >
> >> >> >         before or while processing the request.
> >> >> >
> >> >> >
> >> >> >
> >> >> >    - *Pgpool with patch and backend1 as slave:*
> >> >> >
> >> >> > psql: ERROR:  unable to read message kind
> >> >> >
> >> >> > DETAIL:  kind does not match between main(52)
> >> >> >
> >> >> >
> >> >> >
> >> >> >    - *Pgpool with patch and backend1 as master:*
> >> >> >
> >> >> > psql: ERROR:  unable to read data from DB node
> >> >> >
> >> >> > DETAIL:  socket read failed with error "Connection reset by peer"
> >> >> >
> >> >> > server closed the connection unexpectedly
> >> >> >
> >> >> >         This probably means the server terminated abnormally
> >> >> >
> >> >> >         before or while processing the request.
> >> >> >
> >> >> > connection to server was lost
> >> >> >
> >> >> > server closed the connection unexpectedly
> >> >> >
> >> >> >         This probably means the server terminated abnormally
> >> >> >
> >> >> >         before or while processing the request.
> >> >> >
> >> >> > connection to server was lost
> >> >> >
> >> >> >
> >> >> > Anyway, when a client that uses ODBC tries to access the database
> >> >> > during failover (triggered by the slave node), the following error
> >> >> > is displayed: "Driver Unable to Establish Connection with Data
> >> >> > Source".
> >> >> >
> >> >> > On Fri, 12 May 2023 at 9:40, Tatsuo Ishii (<ishii at sraoss.co.jp>)
> >> >> > wrote:
> >> >> >
> >> >> >> What do you mean by "database is not available"?
> >> >> >>
> >> >> >> 1. You can connect to pgpool but pgpool does not reply back.
> >> >> >>
> >> >> >> 2. You can connect to pgpool but pgpool immediately disconnects.
> >> >> >>
> >> >> >> > Hi Tatsuo,
> >> >> >> >
> >> >> >> > I'm working with your patch, but I still face a problem: the
> >> >> >> > database is unavailable for approximately 1 second (I have a
> >> >> >> > script that runs a SELECT query every 0.1 seconds to measure how
> >> >> >> > long the database is unavailable).
> >> >> >> >
> >> >> >> > I will explain two different cases:
> >> >> >> >
> >> >> >> > 1. Slave node (backend1 in pgpool.conf) is turned off. With your
> >> >> >> > patch the database is always available. Without your patch the
> >> >> >> > database is unavailable for 1 second.
> >> >> >> > 2. Master node (backend0) is turned off. Failover promotes
> >> >> >> > backend1. After that, I turn backend0 on again, and it is now the
> >> >> >> > slave node. If I turn off this slave node (backend0), the database
> >> >> >> > is unavailable for 1 second (with or without your patch).
> >> >> >> >
> >> >> >> > Do you have any idea why this behaviour occurs?
> >> >> >> >
> >> >> >> > Thanks in advance.
> >> >> >> >
> >> >> >> > Best,
> >> >> >> > Jesús
> >> >> >> >
> >> >> >> > On Fri, 14 Apr 2023 at 3:41, Tatsuo Ishii <ishii at sraoss.co.jp>
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> >> Hi Jesús,
> >> >> >> >>
> >> >> >> >> > Hi Tatsuo,
> >> >> >> >> >
> >> >> >> >> > First of all, thank you so much for taking the time to
> >> >> >> >> > investigate this issue.
> >> >> >> >>
> >> >> >> >> No problem.
> >> >> >> >>
> >> >> >> >> > I have compiled pgpool 4.3.2 with your patch and the problem
> >> >> >> >> > with pgbench is solved.
> >> >> >> >> > I still need to test it in my environment.
> >> >> >> >> >
> >> >> >> >> > Anyway, I had a look at your code and I have seen that the
> >> >> >> >> > session is closed only if failover is not completed in 30
> >> >> >> >> > seconds.
> >> >> >> >> > I have the following question about this change. Is this
> >> >> >> >> > session operative during the failover? I mean, if failover
> >> >> >> >> > takes 20 seconds, is this session blocked during this time,
> >> >> >> >> > or can it accept transactions?
> >> >> >> >>
> >> >> >> >> It is likely the session is blocked. The reason for "likely" is
> >> >> >> >> that the function containing the logic can be called frequently
> >> >> >> >> during a session, but not always. It is possible that a pgpool
> >> >> >> >> process has already called the function by the time failover
> >> >> >> >> starts, and then proceeds and sends a query to the backend.
> >> >> >> >>
> >> >> >> >> > Let me ask another question. Should we add this issue as a bug?
> >> >> >> >>
> >> >> >> >> No, you don't need to. Developers already recognize this as a
> >> >> >> >> bug report.
> >> >> >> >>
> >> >> >> >> > Thanks in advance.
> >> >> >> >> >
> >> >> >> >> > Best,
> >> >> >> >> > Jesús
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > On Wed, 12 Apr 2023 at 3:33, Tatsuo Ishii <ishii at sraoss.co.jp>
> >> >> >> >> > wrote:
> >> >> >> >> >
> >> >> >> >> >> > However, a downside of this is that during failover clients
> >> >> >> >> >> > cannot process queries, or at least processing slows down.
> >> >> >> >> >> > Below is the log from pgbench using the "-P 1" option to show
> >> >> >> >> >> > progress. As you can see, from 170 s pgbench starts to slow
> >> >> >> >> >> > down and recovers at 194 s. That is, the slowdown continued
> >> >> >> >> >> > for 24 seconds.
> >> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> After more research, I suspect the slowdown is due to the
> >> >> >> >> >> effect of checkpointing. If I add the "-S" option to switch to
> >> >> >> >> >> select-only transactions, I don't see the slowdown anymore.
> >> >> >> >> >>
> >> >> >> >> >> Best regards,
> >> >> >> >> >> --
> >> >> >> >> >> Tatsuo Ishii
> >> >> >> >> >> SRA OSS LLC
> >> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >> >> >> >> >> Japanese:http://www.sraoss.co.jp
> >> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >>
> >>
>
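
For reference, the availability check mentioned in the quoted messages (a
SELECT issued every 0.1 seconds to see how long the database is
unavailable) can be approximated with a small polling loop. This is only a
sketch of the idea, not the actual dbcheck.sh script: the function names,
the default port 9999, the database and user names, and the 5-second psql
timeout are all assumptions made for illustration.

```python
import subprocess
import time


def psql_probe(host, port=9999, dbname="postgres", user="postgres"):
    """Return True if a trivial SELECT through pgpool succeeds.

    Assumption: psql is on PATH and authentication needs no prompt.
    """
    cmd = ["psql", "-h", host, "-p", str(port), "-U", user, "-d", dbname,
           "-X", "-At", "-c", "SELECT 1"]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def measure_outages(probe, interval=0.1, duration=60.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call probe() every `interval` seconds for `duration` seconds.

    Returns a list of (start, end) offsets in seconds, one per period
    during which probe() kept failing.
    """
    outages = []
    down_since = None
    start = clock()
    while True:
        now = clock()
        if now - start >= duration:
            break
        ok = probe()
        if not ok and down_since is None:
            down_since = now - start          # outage begins
        elif ok and down_since is not None:
            outages.append((down_since, now - start))  # outage ends
            down_since = None
        sleep(interval)
    if down_since is not None:
        # Still failing when the measurement window closed.
        outages.append((down_since, clock() - start))
    return outages
```

With a hypothetical virtual IP, calling
measure_outages(lambda: psql_probe("pgpool-vip")) while a backend is shut
down would return the periods during which new connections failed. Because
each probe opens a new connection, this exercises the "client connects
during failover" case rather than the "connection is kept" case. The
measured durations also include the time each probe takes, so they are
approximate.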