[Pgpool-general] Database nodes mismatch after failback

Tue Dec 15 14:54:10 UTC 2009

Tatsuo,
the previous mail was sent accidentally unfinished..

there was a "RAISE NOTICE" in the procedure body returning a variable
value for debug purpose (vcheck and vcheckvalidate.. trainee's
job!!).
It was removed now.
Maybe this N in pgpool log was this. let me know if i'm wrong!

        PL/pgSQL function "mova_rcv_tracker_data" line 380 at SQL statement
 2009-12-14 12:05:06 BRST STATEMENT:  /*REPLICATION*/ select
 mova_rcv_tracker_data(700106,'2009-12-14 14:04:36'::timestamp without
 time zone, -2.5078, -44.284, 0 , 5, 55.7452, 1::boolean, 0::boolean,
 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean,
 0::boolean, 0::boolean, 1::boolean,1::boolean, 1249141, 46, 14, 0, 0,
 0, 0, 29923, 0::boolean)
 2009-12-14 12:05:06 BRST NOTICE:  vcheckvalidate = true
 2009-12-14 12:05:06 BRST CONTEXT:  SQL statement "SELECT * FROM
 MOVA_ANALYZE_ALARM( $1 )"
        PL/pgSQL function "mova_rcv_tracker_data" line 380 at SQL statement
 2009-12-14 12:05:06 BRST NOTICE:  vcheck = false

 .....

> Yes, in this case pgpool-II does not do fail over. There were
> discussions if we should fail over in this case or not. Maybe we
> should add a knob to control the behavior?

 Why pgpool doesn't do failover? what is the difference calling a proc
or executing a single statement?
i think the node needs to be degenerated in this case too..

tks for your help!

Att,
Marcos

> On Mon, Dec 14, 2009 at 9:36 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>> i'm using pgpool II with postgres 8.3 and Ubuntu as OS.
>>> I have 2 database nodes configured with load balance and replication mode
>>> and the replication_stop_on_mismatch is set to true.
>>> the database has a moderated load, about 27.000 records a day, an average of
>>> 18 records each minute, inserted trough a stored procedure that is called by
>>> the application, to call the procedure i use /*REPLICATION*/ hint because
>>> the proc changes data in DB. is it correct? is it necessary?
>>>
>>> i'm having 2 problems..
>>>
>>> 1 - sometimes i see in the logfile a message of difference between nodes,
>>> but the node isn't being degenerated, and other times the degeneration
>>> works..
>>>
>>> Example, here the node wasn't degenerated...
>>>
>>> Dec 14 12:04:56 pgsql1 pgpool: 2009-12-14 12:04:56 ERROR: pid 5558: pgpool
>>> detected difference of the number of inserted, updated or deleted tuples.
>>> Possible last query was: "UPDATE unit_alarm_violation SET read_by_user_id =
>>> 41 , read_date = now() , status = 2 WHERE id  = 3114 AND status = 1"
>>
>> Yes, in this case pgpool-II does not do fail over. There were
>> discussions if we should fail over in this case or not. Maybe we
>> should add a knob to control the behavior?
>>
>>> and few minutes after...
>>>
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5558: pgpool
>>> detected difference of the number of inserted, updated or deleted tuples.
>>> Possible last query was: "UPDATE unit_alarm_violation SET read_by_user_id =
>>> 41 , read_date = now() , status = 2 WHERE id  = 3114 AND status = 1"
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5570:
>>> read_kind_from_backend: 1 th kind N does not match with master or majority
>>> connection kind D
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 ERROR: pid 5570: kind
>>> mismatch among backends. Possible last query was: "/*REPLICATION*/ select
>>> mova_rcv_tracker_data(332775,'2009-12-14 14:03:51'::timestamp without time
>>> zone, -2.53511, -44.2782, 0 , 3, 25.7428, 1::boolean, 1::boolean,
>>> 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean, 0::boolean,
>>> 0::boolean, 1::boolean,1::boolean, 10042016, 787, 12, 0, 0, 0, 0, 65251,
>>> 0::boolean)" kind details are: 0[D] 1[N]
>>
>> [D] means data packet. So the SELECT on the master node returned
>> data. So far so good. [N] means notification packet (when you see
>> "NOTICE..." N packet is sent from PostgreSQL). So it seems the slave
>> returns NOTICE message while it should return data packet. You should
>> be able to see what NOTICE message was sent by looking at PostgreSQL
>> log on the slave. Can you show me it?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>>
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5570:
>>> notice_backend_error: 1 fail over request from pid 5570
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
>>> starting degeneration. shutdown host pgsql2(5432)
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
>>> failover_handler: set new master node: 0
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 19479:
>>> failover done. shutdown host pgsql2(5432)
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5727:
>>> connection received: host=10.20.24.35 port=60233
>>> Dec 14 12:05:06 pgsql1 pgpool: 2009-12-14 12:05:06 LOG:   pid 5727:
>>> statement: select u.id, r.name, u.rule_message, r.email, u.unit_label,
>>> l.description as "level", TO_CHAR(u.date, 'DD/MM/YYYY HH24:MI:SS') as data
>>>  , u.street ||', '||number||' - '||region||' - '||city||', '||state as
>>> address  FROM  alarm_rule r  , unit_alarm_violation u  , alarm_rule_level l
>>>  WHERE  u.rule_id = r.id  AND email_send = false  AND l.id = u.rule_level
>>> ...
>>> ...
>>>
>>>
>>>
>>> 2 - My other problem is to recover a node.
>>> After failover i do the pcp_recovery_node, and the process works without any
>>> error, the rsync works fine, but the database keeps with differences, the
>>> node backs and goes down after few seconds.
>>>
>>>
>>> Dec 14 11:58:28 pgsql1 pgpool[5392]: Executing base-backup as user postgres
>>> Dec 14 11:58:28 pgsql1 pgpool[5393]: Executing pg_start_backup
>>> Dec 14 11:58:28 pgsql1 pgpool[5396]: Creating file recovery.conf
>>> Dec 14 11:58:28 pgsql1 pgpool[5397]: Rsyncing directory base
>>> Dec 14 11:58:28 pgsql1 pgpool[5400]: Rsyncing directory global
>>> Dec 14 11:58:29 pgsql1 pgpool[5403]: Rsyncing directory pg_clog
>>> Dec 14 11:58:29 pgsql1 pgpool[5406]: Rsyncing directory pg_multixact
>>> Dec 14 11:58:30 pgsql1 pgpool[5410]: Rsyncing directory pg_subtrans
>>> Dec 14 11:58:30 pgsql1 pgpool[5413]: Rsyncing directory pg_tblspc
>>> Dec 14 11:59:45 pgsql1 pgpool[5473]: Rsyncing directory pg_twophase
>>> Dec 14 11:59:46 pgsql1 pgpool[5476]: Rsyncing directory pg_xlog
>>> Dec 14 12:00:04 pgsql1 pgpool[5500]: Rsyncing file recovery.conf (with
>>> source deletion)
>>> Dec 14 12:00:05 pgsql1 pgpool[5503]: Executing pg_stop_backup
>>> Dec 14 12:02:05 pgsql1 pgpool[5529]: Executing pgpool-recovery-pitr as user
>>> postgres
>>> Dec 14 12:02:05 pgsql1 pgpool[5530]: Executing pg_switch_xlog
>>> Dec 14 12:02:05 pgsql1 pgpool[5539]: Executing pgpool_remote_start as user
>>> postgres
>>> Dec 14 12:02:05 pgsql1 pgpool[5540]: Starting remote PostgreSQL server
>>>
>>>
>>> ps. I used the Jaume's article (at linuxsilo) as reference to implement
>>> pgpool...
>>>
>>> Tks for your help
>>>
>>>
>>>
>>> --
>>> Marcos Davi Reis
>>> Mova
>>> www.movaomundo.com
>>> +55 21 3553-1511
>>> +55 21 9923-8319
>>
>
>
>
> --
> Marcos Davi Reis
> Mova
> www.movaomundo.com
> +55 21 3553-1511
> +55 21 9923-8319
>

-- 
Marcos Davi Reis
Mova
www.movaomundo.com
+55 21 3553-1511
+55 21 9923-8319