[Pgpool-general] pcp_recovery_node and errors in postgres log
Tomasz Chmielewski
mangoo at wpkg.org
Tue Dec 22 22:29:29 UTC 2009
On 22.12.2009 23:05, Tomasz Chmielewski wrote:
> I followed http://linuxsilo.net/articles/postgresql-pgpool.html to set up pgpool-II replication.
>
>
> When I detach and recover a node with these commands:
>
> # pcp_detach_node -d 240 127.0.0.1 9898 user pass 1
> # pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1
>
>
> I can observe the following on node 1 in the postgres logs:
>
> 2009-12-23 06:03:15 SGT LOG: database system was interrupted; last known up at 2009-12-23 06:03:12 SGT
> 2009-12-23 06:03:15 SGT LOG: starting archive recovery
> 2009-12-23 06:03:15 SGT LOG: restore_command = '/usr/bin/scp db10:/var/lib/postgresql/8.3/main/pg_xlog_archive/%f %p'
> scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/00000002.history: No such file or directory
> Because of these errors, recovery sometimes fails.
>
> How does postgres on the node being recovered determine which %f files it needs to copy?
OK, I see it's normal that it asks for files which are not present:
http://developer.postgresql.org/pgdocs/postgres/continuous-archiving.html
It is important for the command to return a zero exit status if and
only if it succeeds. The command will be asked for file names that
are not present in the archive; it must return nonzero when so
asked.
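To make the exit status explicit, restore_command could point at a small wrapper instead of calling scp directly. This is only a sketch: the script name, the FETCH and ARCHIVE_SRC variables and the recovery.conf line are assumptions of mine, and only the db10 archive path is taken from the logs. It does not change which files postgres asks for; it only guarantees a nonzero status and no partial file when a requested file is missing.

```shell
#!/bin/sh
# Sketch of a restore_command wrapper. The script name, FETCH and ARCHIVE_SRC
# are assumptions for illustration; only the db10 archive path comes from the
# restore_command shown in the logs.
#
# Assumed usage in recovery.conf:
#   restore_command = '/usr/local/bin/restore_wal.sh %f %p'

# Copy program and archive location; both can be overridden from the environment.
FETCH="${FETCH:-/usr/bin/scp -q}"
ARCHIVE_SRC="${ARCHIVE_SRC:-db10:/var/lib/postgresql/8.3/main/pg_xlog_archive}"

restore_wal() {
    wal_file="$1"   # %f: file name requested by postgres
    target="$2"     # %p: path postgres wants the file copied to

    # $FETCH is intentionally unquoted so "scp -q" splits into command and flag.
    if $FETCH "${ARCHIVE_SRC}/${wal_file}" "${target}"; then
        return 0
    fi

    # Copy failed (typically: file not in the archive). Remove any partial
    # file and report failure with a nonzero status.
    rm -f "${target}"
    return 1
}

# Run only when called with the two arguments postgres passes.
if [ "$#" -eq 2 ]; then
    restore_wal "$1" "$2"
    exit $?
fi
```

A request for a file such as 00000003.history that is not in the archive then simply returns 1, which is what the documentation quoted above asks for.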
However, postgres on the recovered node fails to start if it finds no files to copy, for example:
2009-12-23 06:21:40 SGT LOG: database system was shut down at 2009-12-23 06:21:36 SGT
2009-12-23 06:21:40 SGT LOG: starting archive recovery
2009-12-23 06:21:40 SGT LOG: restore_command = '/usr/bin/scp db10:/var/lib/postgresql/8.3/main/pg_xlog_archive/%f %p'
scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/00000003.history: No such file or directory
scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/000000030000000000000063: No such file or directory
2009-12-23 06:21:40 SGT LOG: could not open file "pg_xlog/000000030000000000000063" (log file 0, segment 99): No such file or directory
2009-12-23 06:21:40 SGT LOG: invalid primary checkpoint record
2009-12-23 06:21:40 SGT LOG: incomplete startup packet
scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/000000030000000000000063: No such file or directory
2009-12-23 06:21:40 SGT LOG: could not open file "pg_xlog/000000030000000000000063" (log file 0, segment 99): No such file or directory
2009-12-23 06:21:40 SGT LOG: invalid secondary checkpoint record
2009-12-23 06:21:40 SGT PANIC: could not locate a valid checkpoint record
2009-12-23 06:21:40 SGT LOG: startup process (PID 24196) was terminated by signal 6: Aborted
2009-12-23 06:21:40 SGT LOG: aborting startup due to startup process failure
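As far as I understand it, the requested %f names are not arbitrary: a WAL segment name is 24 hex digits, 8 each for the timeline ID, the log file number and the segment number, which matches the "log file 0, segment 99" message above. A small helper (decode_wal_name is a name I made up) shows the mapping:

```shell
#!/bin/sh
# Split a 24-hex-digit WAL segment file name into its three 8-digit fields
# and print them as decimal numbers. decode_wal_name is a hypothetical helper.
decode_wal_name() {
    name="$1"
    tli=$(printf '%s' "$name" | cut -c1-8)     # timeline ID
    log=$(printf '%s' "$name" | cut -c9-16)    # log file number
    seg=$(printf '%s' "$name" | cut -c17-24)   # segment number
    # A 0x prefix makes printf read each field as hexadecimal.
    printf 'timeline %d, log file %d, segment %d\n' "0x$tli" "0x$log" "0x$seg"
}

decode_wal_name 000000030000000000000063
# prints: timeline 3, log file 0, segment 99
```

So in the log above postgres is looking for the segment holding the checkpoint record on timeline 3, and 00000003.history is the history file for that same timeline.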
To reproduce:
1) on a failed node, do:
tail -f /var/log/postgresql/postgresql-8.3-main.log
2) run pcp_recovery_node, then pcp_detach_node, and then pcp_recovery_node again:
pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1
pcp_detach_node -d 240 127.0.0.1 9898 user pass 1
pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1
The log on node 1 will show postgres startup failure; pcp_recovery_node will "hang" until it times out.
Is this expected?
--
Tomasz Chmielewski
http://wpkg.org