[Pgpool-general] Database nodes mismatch after failback

Tue Dec 15 15:06:19 UTC 2009

> Tatsuo,
> the previous mail was sent accidentally unfinished..
> 
> there was a "RAISE NOTICE" in the procedure body returning a variable
> value for debug purpose (vcheck and vcheckvalidate.. trainee's
> job!!).
> It was removed now.
> Maybe this N in pgpool log was this. let me know if i'm wrong!

Yes, RAISE NOTICE definitely produces N packet.

>         PL/pgSQL function "mova_rcv_tracker_data" line 380 at SQL statement
>  2009-12-14 12:05:06 BRST STATEMENT:  /*REPLICATION*/ select
>  mova_rcv_tracker_data(700106,'2009-12-14 14:04:36'::timestamp without
>  time zone, -2.5078, -44.284, 0 , 5, 55.7452, 1::boolean, 0::boolean,
>  0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean,
>  0::boolean, 0::boolean, 1::boolean,1::boolean, 1249141, 46, 14, 0, 0,
>  0, 0, 29923, 0::boolean)
>  2009-12-14 12:05:06 BRST NOTICE:  vcheckvalidate = true
>  2009-12-14 12:05:06 BRST CONTEXT:  SQL statement "SELECT * FROM
>  MOVA_ANALYZE_ALARM( $1 )"
>         PL/pgSQL function "mova_rcv_tracker_data" line 380 at SQL statement
>  2009-12-14 12:05:06 BRST NOTICE:  vcheck = false
> 
> 
>  .....
> 
> > Yes, in this case pgpool-II does not do fail over. There were
> > discussions if we should fail over in this case or not. Maybe we
> > should add a knob to control the behavior?
> 
>  Why pgpool doesn't do failover? what is the difference calling a proc
> or executing a single statement?
> i think the node needs to be degenerated in this case too..

What I meant "in this case" was...

pgpool treats DML(INSERT/UPDATE/DELETE) specially. These statements
returns how many rows are affected by the query. pgpool compares those
numbers from DB nodes and complains if there are not identical.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> tks for your help!
> 
> 
> Att,
> Marcos
> 
> 
> 
> 
> 
> 
> 
> 
> > On Mon, Dec 14, 2009 at 9:36 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >>> i'm using pgpool II with postgres 8.3 and Ubuntu as OS.
> >>> I have 2 database nodes configured with load balance and replication mode
> >>> and the replication_stop_on_mismatch is set to true.
> >>> the database has a moderated load, about 27.000 records a day, an average of
> >>> 18 records each minute, inserted trough a stored procedure that is called by
> >>> the application, to call the procedure i use /*REPLICATION*/ hint because
> >>> the proc changes data in DB. is it correct? is it necessary?
> >>>
> >>> i'm having 2 problems..
> >>>
> >>> 1 - sometimes i see in the logfile a message of difference between nodes,
> >>> but the node isn't being degenerated, and other times the degeneration
> >>> works..
> >>>
> >>> Example, here the node wasn't degenerated...
> >>>
> >>> Dec 14 12:04:56 pgsql1 pgpool: 2009-12-14 12:04:56 ERROR: pid 5558: pgpool
> >>> detected difference of the number of inserted, updated or deleted tuples.
> >>> Possible last query was: "UPDATE unit_alarm_violation SET read_by_user_id =
> >>> 41 , read_date = now() , status = 2 WHERE id  = 3114 AND status = 1"
> >>
> >> Yes, in this case pgpool-II does not do fail over. There were
> >> discussions if we should fail over in this case or not. Maybe we
> >> should add a knob to control the behavior?
> >>
> >>> and few minutes after...
> >>>
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5558: pgpool
> >>> detected difference of the number of inserted, updated or deleted tuples.
> >>> Possible last query was: "UPDATE unit_alarm_violation SET read_by_user_id =
> >>> 41 , read_date = now() , status = 2 WHERE id  = 3114 AND status = 1"
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5570:
> >>> read_kind_from_backend: 1 th kind N does not match with master or majority
> >>> connection kind D
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5570: kind
> >>> mismatch among backends. Possible last query was: "/*REPLICATION*/ select
> >>> mova_rcv_tracker_data(332775,'2009-12-14 14:03:51'::timestamp without time
> >>> zone, -2.53511, -44.2782, 0 , 3, 25.7428, 1::boolean, 1::boolean,
> >>> 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean,
> >>> 0::boolean, 1::boolean,1::boolean, 10042016, 787, 12, 0, 0, 0, 0, 65251,
> >>> 0::boolean)" kind details are: 0[D] 1[N]
> >>
> >> [D] means data packet. So the SELECT on the master node returned
> >> data. So far so good. [N] means notification packet (when you see
> >> "NOTICE..." N packet is sent from PostgreSQL). So it seems the slave
> >> returns NOTICE message while it should return data packet. You should
> >> be able to see what NOTICE message was sent by looking at PostgreSQL
> >> log on the slave. Can you show me it?
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >>
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5570:
> >>> notice_backend_error: 1 fail over request from pid 5570
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
> >>> starting degeneration. shutdown host pgsql2(5432)
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
> >>> failover_handler: set new master node: 0
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
> >>> failover done. shutdown host pgsql2(5432)
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5727:
> >>> connection received: host=10.20.24.35 port=60233
> >>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5727:
> >>> statement: select u.id, r.name, u.rule_message, r.email, u.unit_label,
> >>> l.description as "level", TO_CHAR(u.date, 'DD/MM/YYYY HH24:MI:SS') as data
> >>>  , u.street ||', '||number||' - '||region||' - '||city||', '||state as
> >>> address  FROM  alarm_rule r  , unit_alarm_violation u  , alarm_rule_level l
> >>>  WHERE  u.rule_id = r.id  AND email_send = false  AND l.id = u.rule_level
> >>> ...
> >>> ...
> >>>
> >>>
> >>>
> >>> 2 - My other problem is to recover a node.
> >>> After failover i do the pcp_recovery_node, and the process works without any
> >>> error, the rsync works fine, but the database keeps with differences, the
> >>> node backs and goes down after few seconds.
> >>>
> >>>
> >>> Dec 14 11:58:28 pgsql1 pgpool[5392]: Executing base-backup as user postgres
> >>> Dec 14 11:58:28 pgsql1 pgpool[5393]: Executing pg_start_backup
> >>> Dec 14 11:58:28 pgsql1 pgpool[5396]: Creating file recovery.conf
> >>> Dec 14 11:58:28 pgsql1 pgpool[5397]: Rsyncing directory base
> >>> Dec 14 11:58:28 pgsql1 pgpool[5400]: Rsyncing directory global
> >>> Dec 14 11:58:29 pgsql1 pgpool[5403]: Rsyncing directory pg_clog
> >>> Dec 14 11:58:29 pgsql1 pgpool[5406]: Rsyncing directory pg_multixact
> >>> Dec 14 11:58:30 pgsql1 pgpool[5410]: Rsyncing directory pg_subtrans
> >>> Dec 14 11:58:30 pgsql1 pgpool[5413]: Rsyncing directory pg_tblspc
> >>> Dec 14 11:59:45 pgsql1 pgpool[5473]: Rsyncing directory pg_twophase
> >>> Dec 14 11:59:46 pgsql1 pgpool[5476]: Rsyncing directory pg_xlog
> >>> Dec 14 12:00:04 pgsql1 pgpool[5500]: Rsyncing file recovery.conf (with
> >>> source deletion)
> >>> Dec 14 12:00:05 pgsql1 pgpool[5503]: Executing pg_stop_backup
> >>> Dec 14 12:02:05 pgsql1 pgpool[5529]: Executing pgpool-recovery-pitr as user
> >>> postgres
> >>> Dec 14 12:02:05 pgsql1 pgpool[5530]: Executing pg_switch_xlog
> >>> Dec 14 12:02:05 pgsql1 pgpool[5539]: Executing pgpool_remote_start as user
> >>> postgres
> >>> Dec 14 12:02:05 pgsql1 pgpool[5540]: Starting remote PostgreSQL server
> >>>
> >>>
> >>> ps. I used the Jaume's article (at linuxsilo) as reference to implement
> >>> pgpool...
> >>>
> >>> Tks for your help
> >>>
> >>>
> >>>
> >>> --
> >>> Marcos Davi Reis
> >>> Mova
> >>> www.movaomundo.com
> >>> +55 21 3553-1511
> >>> +55 21 9923-8319
> >>
> >
> >
> >
> > --
> > Marcos Davi Reis
> > Mova
> > www.movaomundo.com
> > +55 21 3553-1511
> > +55 21 9923-8319
> >
> 
> 
> 
> -- 
> Marcos Davi Reis
> Mova
> www.movaomundo.com
> +55 21 3553-1511
> +55 21 9923-8319