[Pgpool-general] pcp_recovery_node

Tue Jan 25 08:38:59 UTC 2011

cvs version can not compile:

pgpool-walrecvrunning]# make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv
-fpic -I. -I/usr/local/postgresql-8.3.7/include/server
-I/usr/local/postgresql-8.3.7/include/internal -D_GNU_SOURCE   -c -o
pgpool-walrecrunning.o pgpool-walrecrunning.c
pgpool-walrecrunning.c:33:37: error: replication/walreceiver.h: No such file
or directory
pgpool-walrecrunning.c: In function ‘pgpool_walrecrunning’:
pgpool-walrecrunning.c:47: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or
‘__attribute__’ before ‘*’ token
pgpool-walrecrunning.c:47: error: ‘walrcv’ undeclared (first use in this
function)
pgpool-walrecrunning.c:47: error: (Each undeclared identifier is reported
only once
pgpool-walrecrunning.c:47: error: for each function it appears in.)
pgpool-walrecrunning.c:47: error: ‘WalRcv’ undeclared (first use in this
function)
pgpool-walrecrunning.c:49: error: ‘WALRCV_RUNNING’ undeclared (first use in
this function)
make: *** [pgpool-walrecrunning.o] Error 1

but pgpool-II-3.1.0-alpha1 can,
I'll let you know when I test it.

On Tue, Jan 25, 2011 at 1:07 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> >>> This doesn't look good?
> >>>
> >>> 25314 2011-01-21 10:00:43 CET ERROR:  function pgpool_walrecrunning()
> does
> >>> not exist at character 32
>
> Have you installed pgpool_walrecrunning()?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > No answer here?
> >
> > That's real problem for me.
> > I would really appreciate any help or suggestion.
> >
> > Thanks.
> >
> > On Fri, Jan 21, 2011 at 9:21 AM, Armin Nesiren <armin.nesiren at intera.si
> >wrote:
> >
> >> This is what bothers me
> >> from node1 postgresql.log:
> >>
> >> 25319 2011-01-21 10:01:23 CET LOG:  statement: SELECT
> >> pg_current_xlog_location()
> >> 25319 2011-01-21 10:01:23 CET ERROR:  recovery is in progress
> >> 25319 2011-01-21 10:01:23 CET HINT:  WAL control functions cannot be
> >> executed during recovery.
> >>
> >> What this error mean?
> >> SELECT can not be executed on node1 and node1 have primary status.
> >> And where did I tell pgpool to switch primaries from node2 -> node1.
> >>
> >> new status:
> >>
> >> # psql -p 9999 -c "show pool_nodes;" -Upostgres
> >>  node_id |     hostname     | port | status | lb_weight | state
> >> ---------+------------------+------+--------+-----------+-------
> >>  0       | node1         | 5432 | 2      | 0.500000  | P
> >>  1       | node2         | 5432 | 2      | 0.500000  | S
> >>
> >>
> >> On Fri, Jan 21, 2011 at 9:15 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >>
> >>> This doesn't look good?
> >>>
> >>> 25314 2011-01-21 10:00:43 CET ERROR:  function pgpool_walrecrunning()
> does
> >>> not exist at character 32
> >>> --
> >>> Tatsuo Ishii
> >>> SRA OSS, Inc. Japan
> >>> English: http://www.sraoss.co.jp/index_en.php
> >>> Japanese: http://www.sraoss.co.jp
> >>>
> >>> > Ok, this is before starting pcp_recovery_node:
> >>> >
> >>> > # psql -p 9999 -c "show pool_nodes;" -Upostgres
> >>> >  node_id |     hostname     | port | status | lb_weight | state
> >>> > ---------+------------------+------+--------+-----------+-------
> >>> >  0       | node1         | 5432 | 3      | 0.500000  | S
> >>> >  1       | node2         | 5432 | 2      | 0.500000  | P
> >>> >
> >>> > then, I type:
> >>> >
> >>> > pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d
> >>> >
> >>> > (I have disabled creating trigger file in failback for this example)
> >>> >
> >>> > pgpool.log (I have removed some debug duplicated messages):
> >>> >
> >>> > 2011-01-21 08:58:18 LOG:   pid 4467: starting recovering node 0
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:18 LOG:   pid 4467: starting recovery command:
> "SELECT
> >>> > pgpool_recovery('basebackup.sh', 'puma.intera.si',
> >>> '/pgdata/data90.C/')"
> >>> >
> >>> > 2011-01-21 08:58:19 DEBUG: pid 4434: starting health checking
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:19 DEBUG: pid 4434: health_check: 0 th DB node
> status:
> >>> 3
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:19 DEBUG: pid 4434: health_check: 1 th DB node
> status:
> >>> 2
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:56 LOG:   pid 4467: 1st stage is done
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:56 LOG:   pid 4467: check_postmaster_started: try to
> >>> > connect to postmaster on hostname:node1 database:postgres
> user:postgres
> >>> > (retry 0 times)
> >>> > 2011-01-21 08:58:56 LOG:   pid 4467: check_postmaster_started: failed
> to
> >>> > connect to postmaster on hostname:node1 database:postgres
> user:postgres
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4467: check_postmaster_started: try to
> >>> > connect to postmaster on hostname:puma.intera.si database:postgres
> >>> > user:postgres (retry 1 times)
> >>> > 2011-01-21 08:58:59 LOG:   pid 4467: 0 node restarted
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4467: send_failback_request: fail back
> 0
> >>> th
> >>> > node request from pid 4467
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler called
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: starting to
> >>> select
> >>> > new master node
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4434: starting fail back. reconnect
> host
> >>> > node1(5432)
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4434: execute command:
> >>> > /usr/local/pgpool-II/etc/failback.sh 0 "node1" 5432 /pgdata/data90.C/
> 0
> >>> 1
> >>> > "node2" 1
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4944
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4944: child received shutdown request
> >>> signal
> >>> > 3
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4945
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4946
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4947
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4948
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4974
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4975
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4434: failover_handler: set new master
> >>> node:
> >>> > 0
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 5583: I am 5583
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4975: child received shutdown request
> >>> signal
> >>> > 3
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4971: child received shutdown request
> >>> signal
> >>> > 3
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4945: child received shutdown request
> >>> signal
> >>> > 3
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 5613: I am 5613
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4434: failback done. reconnect host
> >>> > node1(5432)
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 5614: I am 5614
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4964: child received shutdown request
> >>> signal
> >>> > 3
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: pool_ssl: SSL requested but SSL
> >>> support
> >>> > is not available
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: auth kind: 0
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: parameter status data
> >>> > received
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: parameter status data
> >>> > received
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: do_query: kind: E
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: do_query: kind: Z
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: do_query
> returns
> >>> no
> >>> > rows
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: do_query
> returns
> >>> no
> >>> > data
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: primary node
> id
> >>> is 0
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler called
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler: call wait3
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: child 4944 exits with status 0
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler: normally exited
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 LOG:   pid 4467: recovery done
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 DEBUG: pid 4468: do_query: kind: E
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 ERROR: pid 4468: pool_read: EOF encountered with
> >>> backend
> >>> >
> >>> >
> >>> > 2011-01-21 08:58:59 ERROR: pid 4468: do_query: error while reading
> >>> message
> >>> > kind
> >>> >
> >>> > 2011-01-21 08:58:59 ERROR: pid 4468: check_replication_time_lag:
> SELECT
> >>> > pg_current_xlog_location() failed
> >>> >
> >>> > 2011-01-21 08:59:09 DEBUG: pid 4434: starting health checking
> >>> >
> >>> >
> >>> > 2011-01-21 08:59:09 DEBUG: pid 4434: health_check: 0 th DB node
> status:
> >>> 1
> >>> >
> >>> >
> >>> > 2011-01-21 08:59:09 DEBUG: pid 4434: health_check: 1 th DB node
> status:
> >>> 2
> >>> >
> >>> >
> >>> > 2011-01-21 08:59:09 DEBUG: pid 4468: pool_ssl: SSL requested but SSL
> >>> support
> >>> > is not available
> >>> >
> >>> > 2011-01-21 08:59:09 DEBUG: pid 4468: s_do_auth: auth kind: 0
> >>> > 2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: T
> >>> > 2011-01-21 09:03:40 DEBUG: pid 4468: num_fileds: 1
> >>> > 2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: E
> >>> > 2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: Z
> >>> > 2011-01-21 09:03:40 ERROR: pid 4468: check_replication_time_lag:
> SELECT
> >>> > pg_current_xlog_location() returns no rows
> >>> > 2011-01-21 09:03:49 DEBUG: pid 4434: starting health checking
> >>> > 2011-01-21 09:03:49 DEBUG: pid 4434: health_check: 0 th DB node
> status:
> >>> 1
> >>> > 2011-01-21 09:03:50 DEBUG: pid 4434: health_check: 1 th DB node
> status:
> >>> 2
> >>> > 2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: T
> >>> > 2011-01-21 09:03:50 DEBUG: pid 4468: num_fileds: 1
> >>> > 2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: E
> >>> > 2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: Z
> >>> > 2011-01-21 09:03:50 ERROR: pid 4468: check_replication_time_lag:
> SELECT
> >>> > pg_current_xlog_location() returns no rows
> >>> > 2011-01-21 09:04:00 DEBUG: pid 4434: starting health checking
> >>> > 2011-01-21 09:04:00 DEBUG: pid 4434: health_check: 0 th DB node
> status:
> >>> 1
> >>> > 2011-01-21 09:04:00 DEBUG: pid 4434: health_check: 1 th DB node
> status:
> >>> 2
> >>> > 2011-01-21 09:04:00 DEBUG: pid 4468: do_query: kind: T
> >>> > 2011-01-21 09:04:00 DEBUG: pid 4468: num_fileds: 1
> >>> >
> >>> > postgresql.conf (node1):
> >>> >
> >>> > 25308 2011-01-21 10:00:40 CET LOG:  database system was interrupted;
> >>> last
> >>> > known up at 2011-01-21 09:58:57 CET
> >>> >
> >>> > 25308 2011-01-21 10:00:40 CET LOG:  creating missing WAL directory
> >>> > "pg_xlog/archive_status"
> >>> >
> >>> > 25308 2011-01-21 10:00:40 CET LOG:  entering standby mode
> >>> >
> >>> >
> >>> > 25309 2011-01-21 10:00:40 CET LOG:  streaming replication
> successfully
> >>> > connected to primary
> >>> >
> >>> > 25308 2011-01-21 10:00:41 CET LOG:  redo starts at 1/E4000020
> >>> >
> >>> >
> >>> > 25308 2011-01-21 10:00:41 CET LOG:  consistent recovery state reached
> at
> >>> > 1/E5000000
> >>> >
> >>> > 25306 2011-01-21 10:00:41 CET LOG:  database system is ready to
> accept
> >>> read
> >>> > only connections
> >>> >
> >>> > 25314 2011-01-21 10:00:43 CET LOG:  statement: SELECT
> >>> pg_is_in_recovery()
> >>> > AND pgpool_walrecrunning()
> >>> >
> >>> > 25314 2011-01-21 10:00:43 CET ERROR:  function pgpool_walrecrunning()
> >>> does
> >>> > not exist at character 32
> >>> >
> >>> > 25314 2011-01-21 10:00:43 CET HINT:  No function matches the given
> name
> >>> and
> >>> > argument types. You might need to add explicit type casts.
> >>> >
> >>> > 25314 2011-01-21 10:00:43 CET STATEMENT:  SELECT pg_is_in_recovery()
> AND
> >>> > pgpool_walrecrunning()
> >>> >
> >>> > 25314 2011-01-21 10:00:43 CET LOG:  unexpected EOF on client
> connection
> >>> >
> >>> >
> >>> > 25319 2011-01-21 10:00:53 CET LOG:  statement: SELECT
> >>> > pg_current_xlog_location()
> >>> >
> >>> > 25319 2011-01-21 10:00:53 CET ERROR:  recovery is in progress
> >>> >
> >>> >
> >>> > 25319 2011-01-21 10:00:53 CET HINT:  WAL control functions cannot be
> >>> > executed during recovery.
> >>> >
> >>> > 25319 2011-01-21 10:00:53 CET STATEMENT:  SELECT
> >>> pg_current_xlog_location()
> >>> >
> >>> >
> >>> > 25319 2011-01-21 10:01:03 CET LOG:  statement: SELECT
> >>> > pg_current_xlog_location()
> >>> >
> >>> > 25319 2011-01-21 10:01:03 CET ERROR:  recovery is in progress
> >>> >
> >>> >
> >>> > 25319 2011-01-21 10:01:03 CET HINT:  WAL control functions cannot be
> >>> > executed during recovery.
> >>> >
> >>> > 25319 2011-01-21 10:01:03 CET STATEMENT:  SELECT
> >>> pg_current_xlog_location()
> >>> >
> >>> >
> >>> > 25319 2011-01-21 10:01:13 CET LOG:  statement: SELECT
> >>> > pg_current_xlog_location()
> >>> >
> >>> > 25319 2011-01-21 10:01:13 CET ERROR:  recovery is in progress
> >>> >
> >>> >
> >>> > 25319 2011-01-21 10:01:13 CET HINT:  WAL control functions cannot be
> >>> > executed during recovery.
> >>> > 25319 2011-01-21 10:01:13 CET STATEMENT:  SELECT
> >>> pg_current_xlog_location()
> >>> > 25319 2011-01-21 10:01:23 CET LOG:  statement: SELECT
> >>> > pg_current_xlog_location()
> >>> > 25319 2011-01-21 10:01:23 CET ERROR:  recovery is in progress
> >>> > 25319 2011-01-21 10:01:23 CET HINT:  WAL control functions cannot be
> >>> > executed during recovery.
> >>> > 25319 2011-01-21 10:01:23 CET STATEMENT:  SELECT
> >>> pg_current_xlog_location()
> >>> > 25319 2011-01-21 10:01:33 CET LOG:  statement: SELECT
> >>> > pg_current_xlog_location()
> >>> >
> >>> > postgresql.conf (node2):
> >>> >
> >>> > 5853 2011-01-21 09:58:57 CET LOG:  statement: SELECT
> >>> > pgpool_recovery('basebackup.sh', node1', '/pgdata/data90.C/')
> >>> > 5856 2011-01-21 09:58:57 CET LOG:  statement: SELECT
> >>> > pg_start_backup('Streaming Replication', true)
> >>> >  pg_start_backup
> >>> > -----------------
> >>> >  1/E4000020
> >>> > (1 row)
> >>> >
> >>> > 5887 2011-01-21 09:59:34 CET LOG:  statement: SELECT pg_stop_backup()
> >>> > NOTICE:  pg_stop_backup complete, all required WAL segments have been
> >>> > archived
> >>> >  pg_stop_backup
> >>> > ----------------
> >>> >  1/E40000D8
> >>> > (1 row)
> >>> >
> >>> > 5853 2011-01-21 09:59:35 CET LOG:  statement: SELECT
> >>> > pgpool_remote_start('node1', '/pgdata/data90.C/')
> >>> > 5893 2011-01-21 09:59:36 CET LOG:  replication connection authorized:
> >>> > user=postgres host=192.168.10.120 port=45066
> >>> >
> >>> >
> >>> >
> >>> > On Fri, Jan 21, 2011 at 8:44 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
> >>> wrote:
> >>> >
> >>> >> Ok. Can you show the pgpool log and PostgreSQL log(both node 0 and
> 1)
> >>> >> when you did:
> >>> >>
> >>> >> >> >  pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d
> >>> >> --
> >>> >> Tatsuo Ishii
> >>> >> SRA OSS, Inc. Japan
> >>> >> English: http://www.sraoss.co.jp/index_en.php
> >>> >> Japanese: http://www.sraoss.co.jp
> >>> >>
> >>> >> > Actually, yesterday in desperate moments I have done that, I got
> >>> >> > cvs version of pgpool.
> >>> >> > I think I have same situation.
> >>> >> > I have added failback script that creates trigger file on my new
> >>> primary
> >>> >> > node,
> >>> >> > but standby is acting like primary..
> >>> >> >
> >>> >> > On Fri, Jan 21, 2011 at 8:19 AM, Tatsuo Ishii <ishii at sraoss.co.jp
> >
> >>> >> wrote:
> >>> >> >
> >>> >> >> > I'm having problem with pcp_recovery_node.
> >>> >> >> > I have 3 computer architecture. One computer is with installed
> >>> >> pg_pool,
> >>> >> >> and
> >>> >> >> > two other are nodes.
> >>> >> >> > Pg_pool version installed is 3.0.1, and postgres is 9.0.2
> >>> >> >> > I have streaming replication configured.
> >>> >> >> >
> >>> >> >> > I'm trying to setup automatic recovery of nodes.
> >>> >> >> > I set one server to be master, and other slave (read only
> >>> connection).
> >>> >> >> > When primary fails, slave is being used as a master and this
> >>> >> transition
> >>> >> >> from
> >>> >> >> > one
> >>> >> >> > node to another is automatic and everything works great.
> >>> >> >> > But, when I execute pcp_recovery_node command:
> >>> >> >> >
> >>> >> >> >  pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d
> >>> >> >> >
> >>> >> >> > pgpool is trying to make my first node (one that I'm trying to
> >>> >> recover)
> >>> >> >> > as a master node, and everything went well except crating
> trigger
> >>> >> file.
> >>> >> >>
> >>> >> >> That is the limitation of pgpool-II 3.0.x. pgpool-II 3.1 will
> make
> >>> >> >> this possible. You can try the to-be-pgpool-II 3.1 source code
> from
> >>> >> >> CVS HEAD or tar ball from
> >>> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://pgpool.projects.postgresql.org/contrib_docs/simple_sr_setting/index.html
> >>> >> >> --
> >>> >> >> Tatsuo Ishii
> >>> >> >> SRA OSS, Inc. Japan
> >>> >> >> English: http://www.sraoss.co.jp/index_en.php
> >>> >> >> Japanese: http://www.sraoss.co.jp
> >>> >> >>
> >>> >> >> > So, in log file there is no errors, failover.sh is not
> executed,
> >>> and
> >>> >> >> > node1 (just recovered node) is acting like it is master, but it
> is
> >>> not
> >>> >> >> > until I manually create trigger file.
> >>> >> >> >
> >>> >> >> > Why pgpool is trying to make my just recovered node as master,
> and
> >>> >> >> > not just bring it up and set it to act like slave. Can I
> configure
> >>> >> this?
> >>> >> >> >
> >>> >> >> > How can I know what action is provided by executing
> >>> pcp_recovery_node?
> >>> >> >> >
> >>> >> >> > I hope that I descibed my problem well.
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Armin
> >>>
> >>
> >>
> >>
> >> --
> >> Armin
> >>
> >
> >
> >
> > --
> > Armin
>

-- 
Armin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://pgfoundry.org/pipermail/pgpool-general/attachments/20110125/1739f80e/attachment-0001.html>