[Pgpool-general] General pgpool locking issues

Tatsuo Ishii ishii at sraoss.co.jp
Mon Aug 7 12:20:06 UTC 2006


You seem to use totally different envrionment this time. Last time you
used at least 7.4 or later version of PostgreSQL. Now you use 7.3 or
older version of PostgreSQL. I'm not sure what version of pgpool now
you are using.

Anyway, because your log seems imcomplete in the most important line, 

> pgpool: 2006-07-11 11:00:57 ERROR: pid 13899: pool_process_query: kind
> does not match between backends master(E) secondar

this is just a guess: it seems your SELECT fails on master (the log
says master returns "E", which is an error message packet from
PostgreSQL). I don't know what secondary returned since the log ends
in a halfway, but it definitely returns other than error packet. You
can issue the select statement to each PostgreSQL directly to see what
happens.
SELECT
maz_media.id,maz_media.kassetten_id,maz_media.beschreibu
ng,maz_media.title,maz_media.mandanten_id,maz_segment.timeanfang,maz_med
ia.media_id,maz_media.entrytime,maz_media.entryoper,maz_media.dupop
er,maz_segment.timelaenge,maz_segment.timeende from maz_media,
maz_segment where maz_media.id LIKE '%7%' and maz_media.mandanten_id='1'
and
maz_media.media_id = maz_segment.media_id
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Hi,
> 
> I got exactly the same behaviour, as I posted several weeks ago.
> One strange thing about this all is, that the degeneration happens on a
> regular schedule every 2 1/2 week.
> 
> Here you'll find my log and also the pgpool status result, I hope this
> helps to analyse this. 
> I would like to emphasize that this particular select statement lasts
> about 30 sec.
> 
> If you need further information, please don't hesitate to get in touch
> with me.
> 
> Log:
> 
> pgpool: 2006-07-11 11:00:24 DEBUG: pid 13899: read kind from frontend
> Q(51)
> pgpool: 2006-07-11 11:00:24 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:0 len:0
> pgpool: 2006-07-11 11:00:24 DEBUG: pid 13899: Query: SELECT
> maz_media.id,maz_media.kassetten_id,maz_media.beschreibu
> ng,maz_media.title,maz_media.mandanten_id,maz_segment.timeanfang,maz_med
> ia.media_id,maz_media.entrytime,maz_media.entryoper,maz_media.dupop
> er,maz_segment.timelaenge,maz_segment.timeende from maz_media,
> maz_segment where maz_media.id LIKE '%7%' and maz_media.mandanten_id='1'
> and
>  maz_media.media_id = maz_segment.media_id
> pgpool: 2006-07-11 11:00:24 DEBUG: pid 13899: waiting for master
> completing the query
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: read kind from backend P
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_process_query:
> waiting for secondary for data ready
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_process_query: kind
> from backend: P
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:6 len:1017
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:6 len:1017
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: read kind from backend
> pending data T len: 1016 po: 7
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_process_query: kind
> from backend: T
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:12 len:1011
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:12 len:1011
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: type oid:
> 1043
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: field
> size: 65535
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: modifier:
> 0
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:35 len:988
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:35 len:988
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: type oid:
> 1043
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: field
> size: 65535
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: modifier:
> 0
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:58 len:965
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
> from pending data. po:58 len:965
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: type oid:
> 1043
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: field
> size: 65535
> pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: modifier:
> 0
>                         .
>                         . (~ 30 sec. for select)
>                         .
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: read kind from backend
> pending data D len: 174 po: 476
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: pool_process_query: kind
> from backend: D
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 6 data:
> 211678
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 5 data:
> S0671
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 23 data:
> T-D1 EINPACKEN MOTIV 67
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 4 data:
> T-D1
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 1 data: 1
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 5 data:
> 19955
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 5 data:
> 17136
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 22 data:
> 2000-10-26 13:02:00+00
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 2 data: RC
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 3 data: 504
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 5 data:
> 20459
> pgpool: 2006-07-11 11:00:57 ERROR: pid 13899: pool_process_query: kind
> does not match between backends master(E) secondar
> pgpool: 2006-07-11 11:00:57 LOG:   pid 13899: notice_backend_error:
> master: 1 fail over request from pid 13899
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: failover_handler called
> pgpool: 2006-07-11 11:00:57 LOG:   pid 13891: starting degeneration.
> shutdown master host clever(5432)
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13896
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13897
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13898
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13899
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13900
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13901
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13902
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13903
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13896: child receives shutdown
> request signal 3
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13897: child receives shutdown
> request signal 3
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13898: child receives shutdown
> request signal 3
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13900: child receives shutdown
> request signal 3
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13901: child receives shutdown
> request signal 3
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13902: child receives shutdown
> request signal 3
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13903: child receives shutdown
> request signal 3
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 31938: I am 31938
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 31939: I am 31939
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 31940: I am 31940
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 31941: I am 31941
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 31942: I am 31942
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 31943: I am 31943
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 31944: I am 31944
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 31945: I am 31945
> pgpool: 2006-07-11 11:00:57 LOG:   pid 13891: degeneration done.
> shutdown master host clever(5432)
> pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: reap_handler called
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: I am 31945 accept fd 6
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: Protocol Major: 2 Minor: 0
> database: SmartMaz user: abit
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: trying clear text password
> authentication
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from backend
> pending data Z len: 0 po: 0
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_process_query: kind
> from backend: Z
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from frontend
> Q(51)
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_read_string: read all
> from pending data. po:0 len:0
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: Query:
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from backend I
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_process_query: kind
> from backend: I
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from backend
> pending data Z len: 0 po: 0
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_process_query: kind
> from backend: Z
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from frontend
> Q(51)
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_read_string: read all
> from pending data. po:0 len:0
> pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: Query: select version()
> 
> 
> Pgpool status:
> 
> listen_addresses             | *
> | host name(s) or IP address(es) to listen to
>  port                         | 9999
> | pgpool accepting port number
>  socket_dir                   | /tmp
> | pgpool socket directory
>  backend_host_name            | clever
> | master backend host name
>  backend_port                 | 5432
> | master backend port number
>  secondary_backend_host_name  | smart
> | secondary backend host name
>  secondary_backend_port       | 5432
> | secondary backend port number
>  num_init_children            | 8
> | # of children initially pre-forked
>  child_life_time              | 0
> | if idle for this seconds, child exits
>  connection_life_time         | 0
> | if idle for this seconds, connection closes
>  child_max_connections        | 0
> | if max_connections received, chile exits
>  max_pool                     | 4
> | max # of connection pool per child
>  logdir                       | /var/run
> | logging directory
>  backend_socket_dir           | /tmp
> | Unix domain socket directory for the PostgreSQL server
>  replication_mode             | 1
> | non 0 if operating in replication mode
>  replication_strict           | 1
> | non 0 if operating in strict mode
>  replication_timeout          | 5000
> | if secondary does not respond in this milli seconds, abort the session
>  load_balance_mode            | 0
> | non 0 if operating in load balancing mode
>  weight_master                | 0.500000
> | weight of master
>  weight_secondary             | 0.500000
> | weight of secondary
>  replication_stop_on_mismatch | 1
> | stop replication mode on fatal error
>  reset_query_list             | ABORT; RESET ALL; SET SESSION
> AUTHORIZATION DEFAULT;  | queries issued at the end of session
>  print_timestamp              | 1
> | if true print time stamp to each log line
>  master_slave_mode            | 0
> | if true, operate in master/slave mode
>  connection_cache             | 1
> | if true, cache connection pool
>  health_check_timeout         | 60
> | health check timeout
>  health_check_period          | 600
> | health check period
>  health_check_user            | postgres
> | health check user
>  insert_lock                  | 0
> | insert lock
>  ignore_leading_white_space   | 0
> | ignore leading white spaces
>  current_backend_host_name    | clever
> | current master host name
>  current_backend_port         | 5432
> | current master port #
>  replication_enabled          | 1
> | non 0 if actually operating in replication mode
>  master_slave_enabled         | 0
> | non 0 if actually operating in master/slave
>  num_reset_queries            | 3
> | number of queries in reset_query_list
>  server_status                | master(clever on 5432) up
> secondary(smart on 5432) up | server status
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> > Thank you very much for the reply.  Its helpful.
> 
> You are welcome.
> 
> > We have now turned on load_balancing and have set the weight to 1 and 
> > 0 (for db0 and db1).  And we still are getting "kind mismatches".
> > 
> > Is this weight distribution of 1/0 valid?  Or does 1/0 overwrite the
> load
> > balancing parameter.    Is this not a recommended setting?
> 
> Let me look into this.
> 
> > We do direct SELECT connects to db1 via port 5432, thus we wanted 
> > pgpool to just select from db0 vs do a 50/50 load balancing.
> > 
> > If the 1/0 setting is wrong let us know and we will change it so we 
> > will do something like 9/1 type load balancing.  If the 1/0 weight 
> > setting is valid why would we still be seeing "kind mismatches" on
> selects?
> > 
> > Thanks in advance for any suggestions/input here.
> 
> Can you please provide debugging out put of pgpool by turing on -d and
> -n option? We need info for when "kind mismatches" erros occures in your
> setting.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> 
> > Thanks,
> > John
> > 
> > 
> > 
> > On 7/30/06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> > >
> > > > We have been testing with pgpool for a little while here and have 
> > > > run
> > > into
> > > > various locking (or lack there of) issues.
> > > >
> > > > Our setup is 2 databases (pg v8.1.4) on seperate machines with all
> 
> > > > db connections connected via pgpool.  We have run into situations 
> > > > where we
> > > the
> > > > 1st or 2nd database was not getting the update or insert fast 
> > > > enough,
> > > and a
> > > > select query hitting pgpool at the same time causing the a
> "mismatch"
> > > error
> > > > causing pgpool to degrade and stop replicating.
> > > >
> > > > Although I assume this must be quite a normal issue, I do not see 
> > > > many people talking about it here.
> > > >
> > > > Can someone give me a basic rundown of the "best practices" for pg
> > > locking
> > > > on a pretty database intensive web app?
> > > >
> > > > On Inserts the best practice is to lock the entire table (LOCK 
> > > > TABLE
> > > ...) ?
> > > >
> > > > On Updates the best practice is to do a row lock (SELECT ... FOR
> > > UPDATE  or
> > > > FOR SHARE) ?
> > > > Should you do some sort of lock on SELECTS?
> > > > Do you recommend everything be wrapped in Transactions?
> > > >
> > > > Any advice is appreciated.
> > > >
> > > > Thank you.
> > >
> > > What you have seen is a table updating timing difference between 
> > > master and secondary as yo suspected. Although there's no way to 
> > > avoid the timing difference itself, I think there are some ways to 
> > > avoid this problem.
> > >
> > > 1) use load balancing. Since this will SELECT one of the servers,
> > >    there will be no chance to see the "kind mismatch error".
> > >
> > > 2) use table locking. This will prevent the SELECT from seeing yet
> > >    updated data in the secondary server.
> > >
> > > I don't think SELECT FOR UPDATE will work, since the lock does not 
> > > block SELECT.
> > > --
> > > Tatsuo Ishii
> > > SRA OSS, Inc. Japan
> > >
> 
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general at pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general
> 


More information about the Pgpool-general mailing list