[pgpool-general: 7503] Re: Problems with pg_rewind

Emond Papegaaij emond.papegaaij at gmail.com
Tue Apr 13 15:42:40 JST 2021


Hi,

> > From what I can see, the database on node 0 thinks it is still on
> > timeline 1, where node 1 is at timeline 2. Why does pg_rewind decide
> > that a rewind is not required? Could this be a timing issue? Maybe the
> > rewind is started before node 1 actually started the new timeline?
>
> I think it is not a timing issue, because follow_master_command is executed after failover_command.
> It may be caused by pg_rewind command.

The reason I think it's timing related is that it sometimes fails
but often works. Also, PostgreSQL does start at first, but with a
replication delay; a few seconds later it fails to start entirely.

> I can't reproduce the pg_rewind issue reported above.
> If you can share your failover and follow_master script,
> it helps to identify the cause.

I've attached the relevant scripts. They are inspired by the example
scripts and largely follow the same steps. Both pgpool and pg run as
docker containers, so the scripts often need to pass calls through
docker. Also, we do not use direct SSH connections between the
servers; instead, the remote parts are executed via a webhook
service. For example, the following code invokes pg_rewind.sh on
${FAILED_NODE_HOST} and passes ${NEW_MASTER_NODE_HOST} as a parameter:

    curl --fail -H "Content-Type: application/json" \
       -H "Authorization: Bearer ${WEBHOOK_SECRET}" \
       -d "{\"master_hostname\": \"${NEW_MASTER_NODE_HOST}\"}" \
       "http://${FAILED_NODE_HOST}:9000/hooks/pg_rewind"

> > In other tests we do see a rewind happening and in those cases the
> > database does start properly. The scripts we used are based on the
> > examples but modified to our setup. So pgpool_follow_master.sh first
> > checks if the database is up, then tries a pg_rewind and if this fails
> > a pg_basebackup. Finally it starts and attaches the node.
>
> After executing pg_rewind, you may need to try to start PostgreSQL.
> If it fails, then execute pg_basebackup.
>
> See the latest sample script in 4.1 branch:
>
> https://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob_plain;f=src/sample/scripts/follow_master.sh.sample;hb=refs/heads/V4_1_STABLE

We used that script as the basis for our scripts. The problem is that
the database does start correctly at first, but fails to replicate.
Only after some time (it's restarted after about 30 seconds) does it
fail to start entirely.
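
To make the intended order explicit, the flow the sample script
suggests is: rewind, try to start, and only fall back to
pg_basebackup if that start fails. A condensed sketch of that flow
using our webhook mechanism (the helper function and hook names below
are simplified for illustration, not our exact scripts):

    # Condensed sketch of the follow_master recovery order; the
    # invoke_webhook helper and hook names are illustrative only.
    invoke_webhook() {
        # $1 = target host, $2 = hook name, $3 = JSON payload
        curl --fail -H "Content-Type: application/json" \
            -H "Authorization: Bearer ${WEBHOOK_SECRET}" \
            -d "$3" "http://$1:9000/hooks/$2"
    }

    PAYLOAD="{\"master_hostname\": \"${NEW_MASTER_NODE_HOST}\"}"

    # 1. Try to rewind the failed node against the new primary.
    if ! invoke_webhook "${FAILED_NODE_HOST}" pg_rewind "${PAYLOAD}"; then
        # 2. If the rewind itself fails, rebuild the node from scratch.
        invoke_webhook "${FAILED_NODE_HOST}" pg_basebackup "${PAYLOAD}"
    fi

    # 3. Start PostgreSQL; if it does not come up, fall back to
    #    pg_basebackup and start again (the extra step Bo suggests).
    if ! invoke_webhook "${FAILED_NODE_HOST}" pg_start ""; then
        invoke_webhook "${FAILED_NODE_HOST}" pg_basebackup "${PAYLOAD}"
        invoke_webhook "${FAILED_NODE_HOST}" pg_start ""
    fi

    # 4. Re-attach the node to the pgpool cluster.
    pcp_attach_node -w -h localhost -p 9898 -n "${FAILED_NODE_ID}"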

One possible cause I can think of is that our testcases require the
creation of snapshots of the VMs running the cluster. These snapshots
are restored at the beginning of the test. We do make sure the
cluster is in a consistent state before we continue with the test,
but maybe some part of the cluster is still not entirely in sync.
Yesterday we changed the re-synchronization code to fully stop and
restart the entire cluster in a reliable way rather than trying to
fix a distorted cluster. We haven't seen the error since, but it will
require some more runs to be sure. It still is a strange issue,
because even if it's caused by restoring the VM snapshots, all
pgpools and PostgreSQL databases had already been restarted prior to
this failure.
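
If the snapshots do turn out to be the cause, one check we could add
before continuing a test is to verify that every node reports the
same timeline. Something along these lines (hosts and credentials are
illustrative):

    # Illustrative consistency check: print the timeline each node
    # reports so the test can refuse to continue on a mismatch.
    # Hosts and the postgres user are assumptions for this example.
    for host in 172.29.30.1 172.29.30.2 172.29.30.3; do
        tl=$(psql -h "$host" -U postgres -Atc \
            "SELECT timeline_id FROM pg_control_checkpoint();")
        echo "node $host is on timeline $tl"
    done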

Best regards,
Emond

On Tue, Apr 13, 2021 at 1:56 AM Bo Peng <pengbo at sraoss.co.jp> wrote:
>
> Hi,
>
> On Mon, 12 Apr 2021 09:48:26 +0200
> Emond Papegaaij <emond.papegaaij at gmail.com> wrote:
>
> > Hi all,
> >
> > In our test setup we see frequent failures where a database does not
> > start properly after a forced failover. In this test we start with a
> > cluster of 3 pgpool (4.1.4)+pg (12) nodes (every node runs both pgpool
> > and pg). Node 0 is master+primary and needs to be restarted so we
> > trigger a forced failover. First we stop and restart pgpool to force
> > another node in the cluster to become cluster master. Next, we detach
> > the backend at node 0 and this is where the trouble starts. These log
> > lines from the server show the steps being executed (I've removed some
> > to keep it concise):
> >
> > Apr 11 05:31:51 tkh-server tkh-force-failover: Node 0 status:
> > 172.29.30.1 5432 1 0.333333 waiting primary 0   2021-04-11 07:31:49
> > Apr 11 05:31:51 tkh-server tkh-force-failover: Detaching node 0 from
> > the pgpool cluster
> > Apr 11 05:31:51 tkh-server tkh-force-failover: pcp_detach_node --
> > Command Successful
> > Apr 11 05:31:51 tkh-server tkh-force-failover: Waiting for failover to complete
> > Apr 11 05:31:55 tkh-server tkh-force-failover: Node 0 status:
> > 172.29.30.1 5432 3 0.333333 down standby 0   2021-04-11 07:31:52
> > Apr 11 05:31:55 tkh-server webhook: pg_rewind: source and target
> > cluster are on the same timeline
> > Apr 11 05:31:55 tkh-server webhook: pg_rewind: no rewind required
> > Apr 11 05:31:56 tkh-server webhook: Creating tkh-db ... done
> > Apr 11 05:31:57 tkh-server tkh-force-failover: Node 0 status:
> > 172.29.30.1 5432 1 0.333333 waiting standby 160 streaming async
> > 2021-04-11 07:31:56
> > Apr 11 05:31:57 tkh-server tkh-force-failover: Failover complete
> >
> > As you can see, pg_rewind decides that a rewind is not needed.
> > However, when the database starts, it starts at a replication delay of
> > 160. This is the first indication of something being wrong. After
> > this, the database is stopped again, some docker packages are being
> > modified and the database is started again. Now, the database does not
> > start anymore and reports:
> >
> > {"log":"2021-04-11 07:32:34.145 CEST [1] LOG:  starting PostgreSQL
> > 12.6 (Debian 12.6-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc
> > (Debian 8.3.0-6) 8.3.0,
> > 64-bit\n","stream":"stderr","time":"2021-04-11T05:32:34.146994187Z"}
> > {"log":"2021-04-11 07:32:34.145 CEST [1] LOG:  listening on IPv4
> > address \"0.0.0.0\", port
> > 5432\n","stream":"stderr","time":"2021-04-11T05:32:34.147013641Z"}
> > {"log":"2021-04-11 07:32:34.145 CEST [1] LOG:  listening on IPv6
> > address \"::\", port
> > 5432\n","stream":"stderr","time":"2021-04-11T05:32:34.147017054Z"}
> > {"log":"2021-04-11 07:32:34.146 CEST [1] LOG:  listening on Unix
> > socket \"/var/run/postgresql/.s.PGSQL.5432\"\n","stream":"stderr","time":"2021-04-11T05:32:34.147019855Z"}
> > {"log":"2021-04-11 07:32:34.162 CEST [28] LOG:  database system was
> > shut down in recovery at 2021-04-11 07:32:07
> > CEST\n","stream":"stderr","time":"2021-04-11T05:32:34.165175019Z"}
> > {"log":"2021-04-11 07:32:34.163 CEST [28] LOG:  entering standby
> > mode\n","stream":"stderr","time":"2021-04-11T05:32:34.165196461Z"}
> > {"log":"2021-04-11 07:32:34.163 CEST [28] FATAL:  requested timeline 2
> > is not a child of this server's
> > history\n","stream":"stderr","time":"2021-04-11T05:32:34.165209297Z"}
> > {"log":"2021-04-11 07:32:34.163 CEST [28] DETAIL:  Latest checkpoint
> > is at 0/50354F8 on timeline 1, but in the history of the requested
> > timeline, the server forked off from that timeline at
> > 0/50354F8.\n","stream":"stderr","time":"2021-04-11T05:32:34.165212383Z"}
> > {"log":"2021-04-11 07:32:34.164 CEST [1] LOG:  startup process (PID
> > 28) exited with exit code
> > 1\n","stream":"stderr","time":"2021-04-11T05:32:34.168416599Z"}
> > {"log":"2021-04-11 07:32:34.164 CEST [1] LOG:  aborting startup due to
> > startup process
> > failure\n","stream":"stderr","time":"2021-04-11T05:32:34.168429202Z"}
> > {"log":"2021-04-11 07:32:34.166 CEST [1] LOG:  database system is shut
> > down\n","stream":"stderr","time":"2021-04-11T05:32:34.168432379Z"}
> >
> > The database (node 1) that now is the primary reports the following in the logs:
> >
> > {"log":"2021-04-11 07:30:34.303 CEST [1] LOG:  database system is
> > ready to accept read only
> > connections\n","stream":"stderr","time":"2021-04-11T05:30:34.305997834Z"}
> > {"log":"2021-04-11 07:30:34.320 CEST [32] LOG:  started streaming WAL
> > from primary at 0/5000000 on timeline
> > 1\n","stream":"stderr","time":"2021-04-11T05:30:34.321296398Z"}
> > {"log":"2021-04-11 07:31:51.674 CEST [28] LOG:  received promote
> > request\n","stream":"stderr","time":"2021-04-11T05:31:51.675934327Z"}
> > {"log":"2021-04-11 07:31:51.674 CEST [32] FATAL:  terminating
> > walreceiver process due to administrator
> > command\n","stream":"stderr","time":"2021-04-11T05:31:51.675968345Z"}
> > {"log":"2021-04-11 07:31:51.676 CEST [28] LOG:  redo done at
> > 0/50354C0\n","stream":"stderr","time":"2021-04-11T05:31:51.677261588Z"}
> > {"log":"2021-04-11 07:31:51.676 CEST [28] LOG:  last completed
> > transaction was at log time 2021-04-11
> > 07:21:00.991231+02\n","stream":"stderr","time":"2021-04-11T05:31:51.677276189Z"}
> > {"log":"2021-04-11 07:31:51.681 CEST [28] LOG:  selected new timeline
> > ID: 2\n","stream":"stderr","time":"2021-04-11T05:31:51.682159124Z"}
> > {"log":"2021-04-11 07:31:51.798 CEST [28] LOG:  archive recovery
> > complete\n","stream":"stderr","time":"2021-04-11T05:31:51.798821424Z"}
> > {"log":"2021-04-11 07:31:51.809 CEST [1] LOG:  database system is
> > ready to accept
> > connections\n","stream":"stderr","time":"2021-04-11T05:31:51.810598738Z"}
> >
> > From what I can see, the database on node 0 thinks it is still on
> > timeline 1, where node 1 is at timeline 2. Why does pg_rewind decide
> > that a rewind is not required? Could this be a timing issue? Maybe the
> > rewind is started before node 1 actually started the new timeline?
>
> I think it is not a timing issue, because follow_master_command is executed after failover_command.
> It may be caused by pg_rewind command.
>
> I can't reproduce the pg_rewind issue reported above.
> If you can share your failover and follow_master script,
> it helps to identify the cause.
>
> > In other tests we do see a rewind happening and in those cases the
> > database does start properly. The scripts we used are based on the
> > examples but modified to our setup. So pgpool_follow_master.sh first
> > checks if the database is up, then tries a pg_rewind and if this fails
> > a pg_basebackup. Finally it starts and attaches the node.
>
> After executing pg_rewind, you may need to try to start PostgreSQL.
> If it fails, then execute pg_basebackup.
>
> See the latest sample script in 4.1 branch:
>
> https://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob_plain;f=src/sample/scripts/follow_master.sh.sample;hb=refs/heads/V4_1_STABLE
>
> > Best regards,
> > Emond
> > _______________________________________________
> > pgpool-general mailing list
> > pgpool-general at pgpool.net
> > http://www.pgpool.net/mailman/listinfo/pgpool-general
>
>
> --
> Bo Peng <pengbo at sraoss.co.jp>
> SRA OSS, Inc. Japan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pg_rewind.sh
Type: application/x-shellscript
Size: 791 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210413/98b81d0e/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool_failover.sh
Type: application/x-shellscript
Size: 1146 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210413/98b81d0e/attachment-0007.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pg_basebackup.sh
Type: application/x-shellscript
Size: 454 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210413/98b81d0e/attachment-0008.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool_follow_master.sh
Type: application/x-shellscript
Size: 2232 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210413/98b81d0e/attachment-0009.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pg_promote.sh
Type: application/x-shellscript
Size: 108 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210413/98b81d0e/attachment-0010.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pg_start.sh
Type: application/x-shellscript
Size: 89 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210413/98b81d0e/attachment-0011.bin>

