[Pgpool-general] Database nodes mismatch after failback

Tue Dec 15 00:36:29 UTC 2009

> i'm using pgpool II with postgres 8.3 and Ubuntu as OS.
> I have 2 database nodes configured with load balance and replication mode
> and the replication_stop_on_mismatch is set to true.
> the database has a moderated load, about 27.000 records a day, an average of
> 18 records each minute, inserted trough a stored procedure that is called by
> the application, to call the procedure i use /*REPLICATION*/ hint because
> the proc changes data in DB. is it correct? is it necessary?
> 
> i'm having 2 problems..
> 
> 1 - sometimes i see in the logfile a message of difference between nodes,
> but the node isn't being degenerated, and other times the degeneration
> works..
> 
> Example, here the node wasn't degenerated...
> 
> Dec 14 12:04:56 pgsql1 pgpool: 2009-12-14 12:04:56 ERROR: pid 5558: pgpool
> detected difference of the number of inserted, updated or deleted tuples.
> Possible last query was: "UPDATE unit_alarm_violation SET read_by_user_id =
> 41 , read_date = now() , status = 2 WHERE id  = 3114 AND status = 1"

Yes, in this case pgpool-II does not do fail over. There were
discussions if we should fail over in this case or not. Maybe we
should add a knob to control the behavior?

> and few minutes after...
> 
> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5558: pgpool
> detected difference of the number of inserted, updated or deleted tuples.
> Possible last query was: "UPDATE unit_alarm_violation SET read_by_user_id =
> 41 , read_date = now() , status = 2 WHERE id  = 3114 AND status = 1"
> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5570:
> read_kind_from_backend: 1 th kind N does not match with master or majority
> connection kind D
> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5570: kind
> mismatch among backends. Possible last query was: "/*REPLICATION*/ select
> mova_rcv_tracker_data(332775,'2009-12-14 14:03:51'::timestamp without time
> zone, -2.53511, -44.2782, 0 , 3, 25.7428, 1::boolean, 1::boolean,
> 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean,
> 0::boolean, 1::boolean,1::boolean, 10042016, 787, 12, 0, 0, 0, 0, 65251,
> 0::boolean)" kind details are: 0[D] 1[N]

[D] means data packet. So the SELECT on the master node returned
data. So far so good. [N] means notification packet (when you see
"NOTICE..." N packet is sent from PostgreSQL). So it seems the slave
returns NOTICE message while it should return data packet. You should
be able to see what NOTICE message was sent by looking at PostgreSQL
log on the slave. Can you show me it?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5570:
> notice_backend_error: 1 fail over request from pid 5570
> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
> starting degeneration. shutdown host pgsql2(5432)
> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
> failover_handler: set new master node: 0
> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
> failover done. shutdown host pgsql2(5432)
> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5727:
> connection received: host=10.20.24.35 port=60233
> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5727:
> statement: select u.id, r.name, u.rule_message, r.email, u.unit_label,
> l.description as "level", TO_CHAR(u.date, 'DD/MM/YYYY HH24:MI:SS') as data
>  , u.street ||', '||number||' - '||region||' - '||city||', '||state as
> address  FROM  alarm_rule r  , unit_alarm_violation u  , alarm_rule_level l
>  WHERE  u.rule_id = r.id  AND email_send = false  AND l.id = u.rule_level
> ...
> ...
> 
> 
> 
> 2 - My other problem is to recover a node.
> After failover i do the pcp_recovery_node, and the process works without any
> error, the rsync works fine, but the database keeps with differences, the
> node backs and goes down after few seconds.
> 
> 
> Dec 14 11:58:28 pgsql1 pgpool[5392]: Executing base-backup as user postgres
> Dec 14 11:58:28 pgsql1 pgpool[5393]: Executing pg_start_backup
> Dec 14 11:58:28 pgsql1 pgpool[5396]: Creating file recovery.conf
> Dec 14 11:58:28 pgsql1 pgpool[5397]: Rsyncing directory base
> Dec 14 11:58:28 pgsql1 pgpool[5400]: Rsyncing directory global
> Dec 14 11:58:29 pgsql1 pgpool[5403]: Rsyncing directory pg_clog
> Dec 14 11:58:29 pgsql1 pgpool[5406]: Rsyncing directory pg_multixact
> Dec 14 11:58:30 pgsql1 pgpool[5410]: Rsyncing directory pg_subtrans
> Dec 14 11:58:30 pgsql1 pgpool[5413]: Rsyncing directory pg_tblspc
> Dec 14 11:59:45 pgsql1 pgpool[5473]: Rsyncing directory pg_twophase
> Dec 14 11:59:46 pgsql1 pgpool[5476]: Rsyncing directory pg_xlog
> Dec 14 12:00:04 pgsql1 pgpool[5500]: Rsyncing file recovery.conf (with
> source deletion)
> Dec 14 12:00:05 pgsql1 pgpool[5503]: Executing pg_stop_backup
> Dec 14 12:02:05 pgsql1 pgpool[5529]: Executing pgpool-recovery-pitr as user
> postgres
> Dec 14 12:02:05 pgsql1 pgpool[5530]: Executing pg_switch_xlog
> Dec 14 12:02:05 pgsql1 pgpool[5539]: Executing pgpool_remote_start as user
> postgres
> Dec 14 12:02:05 pgsql1 pgpool[5540]: Starting remote PostgreSQL server
> 
> 
> ps. I used the Jaume's article (at linuxsilo) as reference to implement
> pgpool...
> 
> Tks for your help
> 
> 
> 
> -- 
> Marcos Davi Reis
> Mova
> www.movaomundo.com
> +55 21 3553-1511
> +55 21 9923-8319