[Pgpool-general] pcp_recovery_node

Fri Jan 21 08:09:56 UTC 2011

Ok, this is before starting pcp_recovery_node:

# psql -p 9999 -c "show pool_nodes;" -Upostgres
 node_id |     hostname     | port | status | lb_weight | state
---------+------------------+------+--------+-----------+-------
 0       | node1         | 5432 | 3      | 0.500000  | S
 1       | node2         | 5432 | 2      | 0.500000  | P

then, I type:

pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d

(I have disabled creating trigger file in failback for this example)

pgpool.log (I have removed some debug duplicated messages):

2011-01-21 08:58:18 LOG:   pid 4467: starting recovering node 0

2011-01-21 08:58:18 LOG:   pid 4467: starting recovery command: "SELECT
pgpool_recovery('basebackup.sh', 'puma.intera.si', '/pgdata/data90.C/')"

2011-01-21 08:58:19 DEBUG: pid 4434: starting health checking

2011-01-21 08:58:19 DEBUG: pid 4434: health_check: 0 th DB node status: 3

2011-01-21 08:58:19 DEBUG: pid 4434: health_check: 1 th DB node status: 2

2011-01-21 08:58:56 LOG:   pid 4467: 1st stage is done

2011-01-21 08:58:56 LOG:   pid 4467: check_postmaster_started: try to
connect to postmaster on hostname:node1 database:postgres user:postgres
(retry 0 times)
2011-01-21 08:58:56 LOG:   pid 4467: check_postmaster_started: failed to
connect to postmaster on hostname:node1 database:postgres user:postgres

2011-01-21 08:58:59 LOG:   pid 4467: check_postmaster_started: try to
connect to postmaster on hostname:puma.intera.si database:postgres
user:postgres (retry 1 times)
2011-01-21 08:58:59 LOG:   pid 4467: 0 node restarted

2011-01-21 08:58:59 LOG:   pid 4467: send_failback_request: fail back 0 th
node request from pid 4467

2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler called

2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: starting to select
new master node

2011-01-21 08:58:59 LOG:   pid 4434: starting fail back. reconnect host
node1(5432)

2011-01-21 08:58:59 LOG:   pid 4434: execute command:
/usr/local/pgpool-II/etc/failback.sh 0 "node1" 5432 /pgdata/data90.C/ 0 1
"node2" 1
2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4944

2011-01-21 08:58:59 DEBUG: pid 4944: child received shutdown request signal
3

2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4945

2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4946

2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4947

2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4948
2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4974

2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4975

2011-01-21 08:58:59 LOG:   pid 4434: failover_handler: set new master node:
0

2011-01-21 08:58:59 DEBUG: pid 5583: I am 5583

2011-01-21 08:58:59 DEBUG: pid 4975: child received shutdown request signal
3

2011-01-21 08:58:59 DEBUG: pid 4971: child received shutdown request signal
3
2011-01-21 08:58:59 DEBUG: pid 4945: child received shutdown request signal
3

2011-01-21 08:58:59 DEBUG: pid 5613: I am 5613

2011-01-21 08:58:59 LOG:   pid 4434: failback done. reconnect host
node1(5432)

2011-01-21 08:58:59 DEBUG: pid 5614: I am 5614

2011-01-21 08:58:59 DEBUG: pid 4964: child received shutdown request signal
3

2011-01-21 08:58:59 DEBUG: pid 4434: pool_ssl: SSL requested but SSL support
is not available

2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: auth kind: 0

2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: parameter status data
received

2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: parameter status data
received
2011-01-21 08:58:59 DEBUG: pid 4434: do_query: kind: E

2011-01-21 08:58:59 DEBUG: pid 4434: do_query: kind: Z

2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: do_query returns no
rows

2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: do_query returns no
data

2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: primary node id is 0

2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler called

2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler: call wait3

2011-01-21 08:58:59 DEBUG: pid 4434: child 4944 exits with status 0
2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler: normally exited

2011-01-21 08:58:59 LOG:   pid 4467: recovery done

2011-01-21 08:58:59 DEBUG: pid 4468: do_query: kind: E

2011-01-21 08:58:59 ERROR: pid 4468: pool_read: EOF encountered with backend

2011-01-21 08:58:59 ERROR: pid 4468: do_query: error while reading message
kind

2011-01-21 08:58:59 ERROR: pid 4468: check_replication_time_lag: SELECT
pg_current_xlog_location() failed

2011-01-21 08:59:09 DEBUG: pid 4434: starting health checking

2011-01-21 08:59:09 DEBUG: pid 4434: health_check: 0 th DB node status: 1

2011-01-21 08:59:09 DEBUG: pid 4434: health_check: 1 th DB node status: 2

2011-01-21 08:59:09 DEBUG: pid 4468: pool_ssl: SSL requested but SSL support
is not available

2011-01-21 08:59:09 DEBUG: pid 4468: s_do_auth: auth kind: 0
2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: T
2011-01-21 09:03:40 DEBUG: pid 4468: num_fileds: 1
2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: E
2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: Z
2011-01-21 09:03:40 ERROR: pid 4468: check_replication_time_lag: SELECT
pg_current_xlog_location() returns no rows
2011-01-21 09:03:49 DEBUG: pid 4434: starting health checking
2011-01-21 09:03:49 DEBUG: pid 4434: health_check: 0 th DB node status: 1
2011-01-21 09:03:50 DEBUG: pid 4434: health_check: 1 th DB node status: 2
2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: T
2011-01-21 09:03:50 DEBUG: pid 4468: num_fileds: 1
2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: E
2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: Z
2011-01-21 09:03:50 ERROR: pid 4468: check_replication_time_lag: SELECT
pg_current_xlog_location() returns no rows
2011-01-21 09:04:00 DEBUG: pid 4434: starting health checking
2011-01-21 09:04:00 DEBUG: pid 4434: health_check: 0 th DB node status: 1
2011-01-21 09:04:00 DEBUG: pid 4434: health_check: 1 th DB node status: 2
2011-01-21 09:04:00 DEBUG: pid 4468: do_query: kind: T
2011-01-21 09:04:00 DEBUG: pid 4468: num_fileds: 1

postgresql.conf (node1):

25308 2011-01-21 10:00:40 CET LOG:  database system was interrupted; last
known up at 2011-01-21 09:58:57 CET

25308 2011-01-21 10:00:40 CET LOG:  creating missing WAL directory
"pg_xlog/archive_status"

25308 2011-01-21 10:00:40 CET LOG:  entering standby mode

25309 2011-01-21 10:00:40 CET LOG:  streaming replication successfully
connected to primary

25308 2011-01-21 10:00:41 CET LOG:  redo starts at 1/E4000020

25308 2011-01-21 10:00:41 CET LOG:  consistent recovery state reached at
1/E5000000

25306 2011-01-21 10:00:41 CET LOG:  database system is ready to accept read
only connections

25314 2011-01-21 10:00:43 CET LOG:  statement: SELECT pg_is_in_recovery()
AND pgpool_walrecrunning()

25314 2011-01-21 10:00:43 CET ERROR:  function pgpool_walrecrunning() does
not exist at character 32

25314 2011-01-21 10:00:43 CET HINT:  No function matches the given name and
argument types. You might need to add explicit type casts.

25314 2011-01-21 10:00:43 CET STATEMENT:  SELECT pg_is_in_recovery() AND
pgpool_walrecrunning()

25314 2011-01-21 10:00:43 CET LOG:  unexpected EOF on client connection

25319 2011-01-21 10:00:53 CET LOG:  statement: SELECT
pg_current_xlog_location()

25319 2011-01-21 10:00:53 CET ERROR:  recovery is in progress

25319 2011-01-21 10:00:53 CET HINT:  WAL control functions cannot be
executed during recovery.

25319 2011-01-21 10:00:53 CET STATEMENT:  SELECT pg_current_xlog_location()

25319 2011-01-21 10:01:03 CET LOG:  statement: SELECT
pg_current_xlog_location()

25319 2011-01-21 10:01:03 CET ERROR:  recovery is in progress

25319 2011-01-21 10:01:03 CET HINT:  WAL control functions cannot be
executed during recovery.

25319 2011-01-21 10:01:03 CET STATEMENT:  SELECT pg_current_xlog_location()

25319 2011-01-21 10:01:13 CET LOG:  statement: SELECT
pg_current_xlog_location()

25319 2011-01-21 10:01:13 CET ERROR:  recovery is in progress

25319 2011-01-21 10:01:13 CET HINT:  WAL control functions cannot be
executed during recovery.
25319 2011-01-21 10:01:13 CET STATEMENT:  SELECT pg_current_xlog_location()
25319 2011-01-21 10:01:23 CET LOG:  statement: SELECT
pg_current_xlog_location()
25319 2011-01-21 10:01:23 CET ERROR:  recovery is in progress
25319 2011-01-21 10:01:23 CET HINT:  WAL control functions cannot be
executed during recovery.
25319 2011-01-21 10:01:23 CET STATEMENT:  SELECT pg_current_xlog_location()
25319 2011-01-21 10:01:33 CET LOG:  statement: SELECT
pg_current_xlog_location()

postgresql.conf (node2):

5853 2011-01-21 09:58:57 CET LOG:  statement: SELECT
pgpool_recovery('basebackup.sh', node1', '/pgdata/data90.C/')
5856 2011-01-21 09:58:57 CET LOG:  statement: SELECT
pg_start_backup('Streaming Replication', true)
 pg_start_backup
-----------------
 1/E4000020
(1 row)

5887 2011-01-21 09:59:34 CET LOG:  statement: SELECT pg_stop_backup()
NOTICE:  pg_stop_backup complete, all required WAL segments have been
archived
 pg_stop_backup
----------------
 1/E40000D8
(1 row)

5853 2011-01-21 09:59:35 CET LOG:  statement: SELECT
pgpool_remote_start('node1', '/pgdata/data90.C/')
5893 2011-01-21 09:59:36 CET LOG:  replication connection authorized:
user=postgres host=192.168.10.120 port=45066

On Fri, Jan 21, 2011 at 8:44 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Ok. Can you show the pgpool log and PostgreSQL log(both node 0 and 1)
> when you did:
>
> >> >  pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Actually, yesterday in desperate moments I have done that, I got
> > cvs version of pgpool.
> > I think I have same situation.
> > I have added failback script that creates trigger file on my new primary
> > node,
> > but standby is acting like primary..
> >
> > On Fri, Jan 21, 2011 at 8:19 AM, Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >
> >> > I'm having problem with pcp_recovery_node.
> >> > I have 3 computer architecture. One computer is with installed
> pg_pool,
> >> and
> >> > two other are nodes.
> >> > Pg_pool version installed is 3.0.1, and postgres is 9.0.2
> >> > I have streaming replication configured.
> >> >
> >> > I'm trying to setup automatic recovery of nodes.
> >> > I set one server to be master, and other slave (read only connection).
> >> > When primary fails, slave is being used as a master and this
> transition
> >> from
> >> > one
> >> > node to another is automatic and everything works great.
> >> > But, when I execute pcp_recovery_node command:
> >> >
> >> >  pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d
> >> >
> >> > pgpool is trying to make my first node (one that I'm trying to
> recover)
> >> > as a master node, and everything went well except crating trigger
> file.
> >>
> >> That is the limitation of pgpool-II 3.0.x. pgpool-II 3.1 will make
> >> this possible. You can try the to-be-pgpool-II 3.1 source code from
> >> CVS HEAD or tar ball from
> >>
> >>
> http://pgpool.projects.postgresql.org/contrib_docs/simple_sr_setting/index.html
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese: http://www.sraoss.co.jp
> >>
> >> > So, in log file there is no errors, failover.sh is not executed, and
> >> > node1 (just recovered node) is acting like it is master, but it is not
> >> > until I manually create trigger file.
> >> >
> >> > Why pgpool is trying to make my just recovered node as master, and
> >> > not just bring it up and set it to act like slave. Can I configure
> this?
> >> >
> >> > How can I know what action is provided by executing pcp_recovery_node?
> >> >
> >> > I hope that I descibed my problem well.
>

-- 
Armin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://pgfoundry.org/pipermail/pgpool-general/attachments/20110121/a059f6fc/attachment-0001.html>