[Pgpool-general] Database nodes mismatch after failback

Tue Dec 15 14:49:38 UTC 2009

Tatsuo,
there was a "RAISE NOTICE" in the procedure body returning a variable
value for debug purporse (vcheck and vcheckvalidate.. trainee's
job!!).
It was removed now.
Maybe this N in pgpool log was this. let me know if i'm wrong!

        PL/pgSQL function "mova_rcv_tracker_data" line 380 at SQL statement
2009-12-14 12:05:06 BRST STATEMENT:  /*REPLICATION*/ select
mova_rcv_tracker_data(700106,'2009-12-14 14:04:36'::timestamp without
time zone, -2.5078, -44.284, 0 , 5, 55.7452, 1::boolean, 0::boolean,
0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean,
0::boolean, 0::boolean, 1::boolean,1::boolean, 1249141, 46, 14, 0, 0,
0, 0, 29923, 0::boolean)
2009-12-14 12:05:06 BRST NOTICE:  vcheckvalidate = true
2009-12-14 12:05:06 BRST CONTEXT:  SQL statement "SELECT * FROM
MOVA_ANALYZE_ALARM( $1 )"
        PL/pgSQL function "mova_rcv_tracker_data" line 380 at SQL statement
2009-12-14 12:05:06 BRST NOTICE:  vcheck = false

.....

> Yes, in this case pgpool-II does not do fail over. There were
> discussions if we should fail over in this case or not. Maybe we
> should add a knob to control the behavior?

Why pgpool doesn

On Mon, Dec 14, 2009 at 9:36 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> i'm using pgpool II with postgres 8.3 and Ubuntu as OS.
>> I have 2 database nodes configured with load balance and replication mode
>> and the replication_stop_on_mismatch is set to true.
>> the database has a moderated load, about 27.000 records a day, an average of
>> 18 records each minute, inserted trough a stored procedure that is called by
>> the application, to call the procedure i use /*REPLICATION*/ hint because
>> the proc changes data in DB. is it correct? is it necessary?
>>
>> i'm having 2 problems..
>>
>> 1 - sometimes i see in the logfile a message of difference between nodes,
>> but the node isn't being degenerated, and other times the degeneration
>> works..
>>
>> Example, here the node wasn't degenerated...
>>
>> Dec 14 12:04:56 pgsql1 pgpool: 2009-12-14 12:04:56 ERROR: pid 5558: pgpool
>> detected difference of the number of inserted, updated or deleted tuples.
>> Possible last query was: "UPDATE unit_alarm_violation SET read_by_user_id =
>> 41 , read_date = now() , status = 2 WHERE id  = 3114 AND status = 1"
>
> Yes, in this case pgpool-II does not do fail over. There were
> discussions if we should fail over in this case or not. Maybe we
> should add a knob to control the behavior?
>
>> and few minutes after...
>>
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5558: pgpool
>> detected difference of the number of inserted, updated or deleted tuples.
>> Possible last query was: "UPDATE unit_alarm_violation SET read_by_user_id =
>> 41 , read_date = now() , status = 2 WHERE id  = 3114 AND status = 1"
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5570:
>> read_kind_from_backend: 1 th kind N does not match with master or majority
>> connection kind D
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5570: kind
>> mismatch among backends. Possible last query was: "/*REPLICATION*/ select
>> mova_rcv_tracker_data(332775,'2009-12-14 14:03:51'::timestamp without time
>> zone, -2.53511, -44.2782, 0 , 3, 25.7428, 1::boolean, 1::boolean,
>> 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean,
>> 0::boolean, 1::boolean,1::boolean, 10042016, 787, 12, 0, 0, 0, 0, 65251,
>> 0::boolean)" kind details are: 0[D] 1[N]
>
> [D] means data packet. So the SELECT on the master node returned
> data. So far so good. [N] means notification packet (when you see
> "NOTICE..." N packet is sent from PostgreSQL). So it seems the slave
> returns NOTICE message while it should return data packet. You should
> be able to see what NOTICE message was sent by looking at PostgreSQL
> log on the slave. Can you show me it?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5570:
>> notice_backend_error: 1 fail over request from pid 5570
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
>> starting degeneration. shutdown host pgsql2(5432)
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
>> failover_handler: set new master node: 0
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
>> failover done. shutdown host pgsql2(5432)
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5727:
>> connection received: host=10.20.24.35 port=60233
>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5727:
>> statement: select u.id, r.name, u.rule_message, r.email, u.unit_label,
>> l.description as "level", TO_CHAR(u.date, 'DD/MM/YYYY HH24:MI:SS') as data
>>  , u.street ||', '||number||' - '||region||' - '||city||', '||state as
>> address  FROM  alarm_rule r  , unit_alarm_violation u  , alarm_rule_level l
>>  WHERE  u.rule_id = r.id  AND email_send = false  AND l.id = u.rule_level
>> ...
>> ...
>>
>>
>>
>> 2 - My other problem is to recover a node.
>> After failover i do the pcp_recovery_node, and the process works without any
>> error, the rsync works fine, but the database keeps with differences, the
>> node backs and goes down after few seconds.
>>
>>
>> Dec 14 11:58:28 pgsql1 pgpool[5392]: Executing base-backup as user postgres
>> Dec 14 11:58:28 pgsql1 pgpool[5393]: Executing pg_start_backup
>> Dec 14 11:58:28 pgsql1 pgpool[5396]: Creating file recovery.conf
>> Dec 14 11:58:28 pgsql1 pgpool[5397]: Rsyncing directory base
>> Dec 14 11:58:28 pgsql1 pgpool[5400]: Rsyncing directory global
>> Dec 14 11:58:29 pgsql1 pgpool[5403]: Rsyncing directory pg_clog
>> Dec 14 11:58:29 pgsql1 pgpool[5406]: Rsyncing directory pg_multixact
>> Dec 14 11:58:30 pgsql1 pgpool[5410]: Rsyncing directory pg_subtrans
>> Dec 14 11:58:30 pgsql1 pgpool[5413]: Rsyncing directory pg_tblspc
>> Dec 14 11:59:45 pgsql1 pgpool[5473]: Rsyncing directory pg_twophase
>> Dec 14 11:59:46 pgsql1 pgpool[5476]: Rsyncing directory pg_xlog
>> Dec 14 12:00:04 pgsql1 pgpool[5500]: Rsyncing file recovery.conf (with
>> source deletion)
>> Dec 14 12:00:05 pgsql1 pgpool[5503]: Executing pg_stop_backup
>> Dec 14 12:02:05 pgsql1 pgpool[5529]: Executing pgpool-recovery-pitr as user
>> postgres
>> Dec 14 12:02:05 pgsql1 pgpool[5530]: Executing pg_switch_xlog
>> Dec 14 12:02:05 pgsql1 pgpool[5539]: Executing pgpool_remote_start as user
>> postgres
>> Dec 14 12:02:05 pgsql1 pgpool[5540]: Starting remote PostgreSQL server
>>
>>
>> ps. I used the Jaume's article (at linuxsilo) as reference to implement
>> pgpool...
>>
>> Tks for your help
>>
>>
>>
>> --
>> Marcos Davi Reis
>> Mova
>> www.movaomundo.com
>> +55 21 3553-1511
>> +55 21 9923-8319
>

-- 
Marcos Davi Reis
Mova
www.movaomundo.com
+55 21 3553-1511
+55 21 9923-8319