[Pgpool-general] pcp_recovery_node

Tatsuo Ishii ishii at sraoss.co.jp
Tue Jan 25 00:07:34 UTC 2011


>>> This doesn't look good?
>>>
>>> 25314 2011-01-21 10:00:43 CET ERROR:  function pgpool_walrecrunning() does not exist at character 32

Have you installed pgpool_walrecrunning()?
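
pgpool-II in streaming-replication mode runs "SELECT pg_is_in_recovery() AND
pgpool_walrecrunning()" (you can see it in your node1 log) to work out which
backend is the real primary. Because the function is missing, find_primary_node
gets no usable result ("do_query returns no rows") and ends up treating node 0
as primary, which is presumably why pgpool then runs pg_current_xlog_location()
against a standby and you see the "recovery is in progress" errors.

As a rough sketch of the install, assuming the sql/ layout of the 3.1 snapshot
source tree and a working pgxs build environment (adjust paths to your setup),
on every backend node:

  cd pgpool-II/sql/pgpool-walrecrunning
  make && make install
  psql -p 5432 -U postgres -f pgpool-walrecrunning.sql template1

Register it in every database that pgpool's checks connect to (the docs use
template1), then rerun pcp_recovery_node. Afterwards "SELECT
pgpool_walrecrunning()" should return true only on a node whose WAL receiver
is active.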
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> No answer here?
> 
> That's a real problem for me.
> I would really appreciate any help or suggestion.
> 
> Thanks.
> 
> On Fri, Jan 21, 2011 at 9:21 AM, Armin Nesiren <armin.nesiren at intera.si> wrote:
> 
>> This is what bothers me
>> from node1 postgresql.log:
>>
>> 25319 2011-01-21 10:01:23 CET LOG:  statement: SELECT pg_current_xlog_location()
>> 25319 2011-01-21 10:01:23 CET ERROR:  recovery is in progress
>> 25319 2011-01-21 10:01:23 CET HINT:  WAL control functions cannot be executed during recovery.
>>
>> What does this error mean?
>> The SELECT cannot be executed on node1, yet node1 has primary status.
>> And where did I tell pgpool to switch primaries from node2 -> node1?
>>
>> new status:
>>
>> # psql -p 9999 -c "show pool_nodes;" -Upostgres
>>  node_id |     hostname     | port | status | lb_weight | state
>> ---------+------------------+------+--------+-----------+-------
>>  0       | node1         | 5432 | 2      | 0.500000  | P
>>  1       | node2         | 5432 | 2      | 0.500000  | S
>>
>>
>> On Fri, Jan 21, 2011 at 9:15 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>
>>> This doesn't look good?
>>>
>>> 25314 2011-01-21 10:00:43 CET ERROR:  function pgpool_walrecrunning() does not exist at character 32
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>> > Ok, this is before starting pcp_recovery_node:
>>> >
>>> > # psql -p 9999 -c "show pool_nodes;" -Upostgres
>>> >  node_id |     hostname     | port | status | lb_weight | state
>>> > ---------+------------------+------+--------+-----------+-------
>>> >  0       | node1         | 5432 | 3      | 0.500000  | S
>>> >  1       | node2         | 5432 | 2      | 0.500000  | P
>>> >
>>> > then, I type:
>>> >
>>> > pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d
>>> >
>>> > (I have disabled creating the trigger file in failback for this example)
>>> >
>>> > pgpool.log (I have removed some duplicated debug messages):
>>> >
>>> > 2011-01-21 08:58:18 LOG:   pid 4467: starting recovering node 0
>>> > 2011-01-21 08:58:18 LOG:   pid 4467: starting recovery command: "SELECT pgpool_recovery('basebackup.sh', 'puma.intera.si', '/pgdata/data90.C/')"
>>> > 2011-01-21 08:58:19 DEBUG: pid 4434: starting health checking
>>> > 2011-01-21 08:58:19 DEBUG: pid 4434: health_check: 0 th DB node status: 3
>>> > 2011-01-21 08:58:19 DEBUG: pid 4434: health_check: 1 th DB node status: 2
>>> > 2011-01-21 08:58:56 LOG:   pid 4467: 1st stage is done
>>> > 2011-01-21 08:58:56 LOG:   pid 4467: check_postmaster_started: try to connect to postmaster on hostname:node1 database:postgres user:postgres (retry 0 times)
>>> > 2011-01-21 08:58:56 LOG:   pid 4467: check_postmaster_started: failed to connect to postmaster on hostname:node1 database:postgres user:postgres
>>> > 2011-01-21 08:58:59 LOG:   pid 4467: check_postmaster_started: try to connect to postmaster on hostname:puma.intera.si database:postgres user:postgres (retry 1 times)
>>> > 2011-01-21 08:58:59 LOG:   pid 4467: 0 node restarted
>>> > 2011-01-21 08:58:59 LOG:   pid 4467: send_failback_request: fail back 0 th node request from pid 4467
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler called
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: starting to select new master node
>>> > 2011-01-21 08:58:59 LOG:   pid 4434: starting fail back. reconnect host node1(5432)
>>> > 2011-01-21 08:58:59 LOG:   pid 4434: execute command: /usr/local/pgpool-II/etc/failback.sh 0 "node1" 5432 /pgdata/data90.C/ 0 1 "node2" 1
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4944
>>> > 2011-01-21 08:58:59 DEBUG: pid 4944: child received shutdown request signal 3
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4945
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4946
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4947
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4948
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4974
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: failover_handler: kill 4975
>>> > 2011-01-21 08:58:59 LOG:   pid 4434: failover_handler: set new master node: 0
>>> > 2011-01-21 08:58:59 DEBUG: pid 5583: I am 5583
>>> > 2011-01-21 08:58:59 DEBUG: pid 4975: child received shutdown request signal 3
>>> > 2011-01-21 08:58:59 DEBUG: pid 4971: child received shutdown request signal 3
>>> > 2011-01-21 08:58:59 DEBUG: pid 4945: child received shutdown request signal 3
>>> > 2011-01-21 08:58:59 DEBUG: pid 5613: I am 5613
>>> > 2011-01-21 08:58:59 LOG:   pid 4434: failback done. reconnect host node1(5432)
>>> > 2011-01-21 08:58:59 DEBUG: pid 5614: I am 5614
>>> > 2011-01-21 08:58:59 DEBUG: pid 4964: child received shutdown request signal 3
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: pool_ssl: SSL requested but SSL support is not available
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: auth kind: 0
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: parameter status data received
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: s_do_auth: parameter status data received
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: do_query: kind: E
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: do_query: kind: Z
>>> > 2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: do_query returns no rows
>>> > 2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: do_query returns no data
>>> > 2011-01-21 08:58:59 LOG:   pid 4434: find_primary_node: primary node id is 0
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler called
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler: call wait3
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: child 4944 exits with status 0
>>> > 2011-01-21 08:58:59 DEBUG: pid 4434: reap_handler: normally exited
>>> > 2011-01-21 08:58:59 LOG:   pid 4467: recovery done
>>> > 2011-01-21 08:58:59 DEBUG: pid 4468: do_query: kind: E
>>> > 2011-01-21 08:58:59 ERROR: pid 4468: pool_read: EOF encountered with backend
>>> > 2011-01-21 08:58:59 ERROR: pid 4468: do_query: error while reading message kind
>>> > 2011-01-21 08:58:59 ERROR: pid 4468: check_replication_time_lag: SELECT pg_current_xlog_location() failed
>>> > 2011-01-21 08:59:09 DEBUG: pid 4434: starting health checking
>>> > 2011-01-21 08:59:09 DEBUG: pid 4434: health_check: 0 th DB node status: 1
>>> > 2011-01-21 08:59:09 DEBUG: pid 4434: health_check: 1 th DB node status: 2
>>> > 2011-01-21 08:59:09 DEBUG: pid 4468: pool_ssl: SSL requested but SSL support is not available
>>> > 2011-01-21 08:59:09 DEBUG: pid 4468: s_do_auth: auth kind: 0
>>> > 2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: T
>>> > 2011-01-21 09:03:40 DEBUG: pid 4468: num_fileds: 1
>>> > 2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: E
>>> > 2011-01-21 09:03:40 DEBUG: pid 4468: do_query: kind: Z
>>> > 2011-01-21 09:03:40 ERROR: pid 4468: check_replication_time_lag: SELECT pg_current_xlog_location() returns no rows
>>> > 2011-01-21 09:03:49 DEBUG: pid 4434: starting health checking
>>> > 2011-01-21 09:03:49 DEBUG: pid 4434: health_check: 0 th DB node status: 1
>>> > 2011-01-21 09:03:50 DEBUG: pid 4434: health_check: 1 th DB node status: 2
>>> > 2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: T
>>> > 2011-01-21 09:03:50 DEBUG: pid 4468: num_fileds: 1
>>> > 2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: E
>>> > 2011-01-21 09:03:50 DEBUG: pid 4468: do_query: kind: Z
>>> > 2011-01-21 09:03:50 ERROR: pid 4468: check_replication_time_lag: SELECT pg_current_xlog_location() returns no rows
>>> > 2011-01-21 09:04:00 DEBUG: pid 4434: starting health checking
>>> > 2011-01-21 09:04:00 DEBUG: pid 4434: health_check: 0 th DB node status: 1
>>> > 2011-01-21 09:04:00 DEBUG: pid 4434: health_check: 1 th DB node status: 2
>>> > 2011-01-21 09:04:00 DEBUG: pid 4468: do_query: kind: T
>>> > 2011-01-21 09:04:00 DEBUG: pid 4468: num_fileds: 1
>>> >
>>> > postgresql.log (node1):
>>> >
>>> > 25308 2011-01-21 10:00:40 CET LOG:  database system was interrupted; last known up at 2011-01-21 09:58:57 CET
>>> > 25308 2011-01-21 10:00:40 CET LOG:  creating missing WAL directory "pg_xlog/archive_status"
>>> > 25308 2011-01-21 10:00:40 CET LOG:  entering standby mode
>>> > 25309 2011-01-21 10:00:40 CET LOG:  streaming replication successfully connected to primary
>>> > 25308 2011-01-21 10:00:41 CET LOG:  redo starts at 1/E4000020
>>> > 25308 2011-01-21 10:00:41 CET LOG:  consistent recovery state reached at 1/E5000000
>>> > 25306 2011-01-21 10:00:41 CET LOG:  database system is ready to accept read only connections
>>> > 25314 2011-01-21 10:00:43 CET LOG:  statement: SELECT pg_is_in_recovery() AND pgpool_walrecrunning()
>>> > 25314 2011-01-21 10:00:43 CET ERROR:  function pgpool_walrecrunning() does not exist at character 32
>>> > 25314 2011-01-21 10:00:43 CET HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
>>> > 25314 2011-01-21 10:00:43 CET STATEMENT:  SELECT pg_is_in_recovery() AND pgpool_walrecrunning()
>>> > 25314 2011-01-21 10:00:43 CET LOG:  unexpected EOF on client connection
>>> > 25319 2011-01-21 10:00:53 CET LOG:  statement: SELECT pg_current_xlog_location()
>>> > 25319 2011-01-21 10:00:53 CET ERROR:  recovery is in progress
>>> > 25319 2011-01-21 10:00:53 CET HINT:  WAL control functions cannot be executed during recovery.
>>> > 25319 2011-01-21 10:00:53 CET STATEMENT:  SELECT pg_current_xlog_location()
>>> > 25319 2011-01-21 10:01:03 CET LOG:  statement: SELECT pg_current_xlog_location()
>>> > 25319 2011-01-21 10:01:03 CET ERROR:  recovery is in progress
>>> > 25319 2011-01-21 10:01:03 CET HINT:  WAL control functions cannot be executed during recovery.
>>> > 25319 2011-01-21 10:01:03 CET STATEMENT:  SELECT pg_current_xlog_location()
>>> > 25319 2011-01-21 10:01:13 CET LOG:  statement: SELECT pg_current_xlog_location()
>>> > 25319 2011-01-21 10:01:13 CET ERROR:  recovery is in progress
>>> > 25319 2011-01-21 10:01:13 CET HINT:  WAL control functions cannot be executed during recovery.
>>> > 25319 2011-01-21 10:01:13 CET STATEMENT:  SELECT pg_current_xlog_location()
>>> > 25319 2011-01-21 10:01:23 CET LOG:  statement: SELECT pg_current_xlog_location()
>>> > 25319 2011-01-21 10:01:23 CET ERROR:  recovery is in progress
>>> > 25319 2011-01-21 10:01:23 CET HINT:  WAL control functions cannot be executed during recovery.
>>> > 25319 2011-01-21 10:01:23 CET STATEMENT:  SELECT pg_current_xlog_location()
>>> > 25319 2011-01-21 10:01:33 CET LOG:  statement: SELECT pg_current_xlog_location()
>>> >
>>> > postgresql.log (node2):
>>> >
>>> > 5853 2011-01-21 09:58:57 CET LOG:  statement: SELECT pgpool_recovery('basebackup.sh', 'node1', '/pgdata/data90.C/')
>>> > 5856 2011-01-21 09:58:57 CET LOG:  statement: SELECT pg_start_backup('Streaming Replication', true)
>>> >  pg_start_backup
>>> > -----------------
>>> >  1/E4000020
>>> > (1 row)
>>> >
>>> > 5887 2011-01-21 09:59:34 CET LOG:  statement: SELECT pg_stop_backup()
>>> > NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
>>> >  pg_stop_backup
>>> > ----------------
>>> >  1/E40000D8
>>> > (1 row)
>>> >
>>> > 5853 2011-01-21 09:59:35 CET LOG:  statement: SELECT pgpool_remote_start('node1', '/pgdata/data90.C/')
>>> > 5893 2011-01-21 09:59:36 CET LOG:  replication connection authorized: user=postgres host=192.168.10.120 port=45066
>>> >
>>> >
>>> >
>>> > On Fri, Jan 21, 2011 at 8:44 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>> >
>>> >> Ok. Can you show the pgpool log and the PostgreSQL logs (both node 0
>>> >> and 1) when you did:
>>> >>
>>> >> >> >  pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS, Inc. Japan
>>> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> Japanese: http://www.sraoss.co.jp
>>> >>
>>> >> > Actually, yesterday in a desperate moment I already did that; I got
>>> >> > the CVS version of pgpool.
>>> >> > I think I have the same situation.
>>> >> > I have added a failback script that creates the trigger file on my
>>> >> > new primary node, but the standby is acting like a primary.
>>> >> >
>>> >> > On Fri, Jan 21, 2011 at 8:19 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>> >> >
>>> >> >> > I'm having a problem with pcp_recovery_node.
>>> >> >> > I have a three-computer architecture: one computer has pgpool
>>> >> >> > installed, and the other two are database nodes.
>>> >> >> > The pgpool-II version installed is 3.0.1, and PostgreSQL is 9.0.2.
>>> >> >> > I have streaming replication configured.
>>> >> >> >
>>> >> >> > I'm trying to set up automatic recovery of nodes.
>>> >> >> > I set one server to be the master and the other the slave
>>> >> >> > (read-only connections).
>>> >> >> > When the primary fails, the slave takes over as master, and this
>>> >> >> > transition from one node to the other is automatic and works great.
>>> >> >> > But when I execute the pcp_recovery_node command:
>>> >> >> >
>>> >> >> >  pcp_recovery_node 20 localhost 9898 postgres <password> 0 -d
>>> >> >> >
>>> >> >> > pgpool tries to make my first node (the one I'm trying to recover)
>>> >> >> > the master node, and everything goes well except creating the
>>> >> >> > trigger file.
>>> >> >>
>>> >> >> That is a limitation of pgpool-II 3.0.x; pgpool-II 3.1 will make
>>> >> >> this possible. You can try the upcoming pgpool-II 3.1 source code
>>> >> >> from CVS HEAD, or the tarball from
>>> >> >> http://pgpool.projects.postgresql.org/contrib_docs/simple_sr_setting/index.html
>>> >> >> --
>>> >> >> Tatsuo Ishii
>>> >> >> SRA OSS, Inc. Japan
>>> >> >> English: http://www.sraoss.co.jp/index_en.php
>>> >> >> Japanese: http://www.sraoss.co.jp
>>> >> >>
>>> >> >> > So there are no errors in the log file, failover.sh is not
>>> >> >> > executed, and node1 (the just-recovered node) acts as if it were
>>> >> >> > the master, but it is not until I manually create the trigger file.
>>> >> >> >
>>> >> >> > Why does pgpool try to make my just-recovered node the master,
>>> >> >> > instead of just bringing it up and setting it to act as a slave?
>>> >> >> > Can I configure this?
>>> >> >> >
>>> >> >> > How can I know what actions pcp_recovery_node performs?
>>> >> >> >
>>> >> >> > I hope I have described my problem well.
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Armin
>>>
>>
>>
>>
>> --
>> Armin
>>
> 
> 
> 
> -- 
> Armin

