[Pgpool-general] General pgpool locking issues
Axel.Weber at rtl.de
Mon Aug 7 12:03:59 UTC 2006
Hi,
I am seeing exactly the same behaviour that I posted about several
weeks ago. One strange thing about all this is that the degeneration
happens on a regular schedule, every 2 1/2 weeks.
Below you will find my log and the pgpool status output; I hope this
helps to analyse the problem.
I would like to emphasize that this particular select statement takes
about 30 seconds.
If you need further information, please don't hesitate to get in touch
with me.
Log:
pgpool: 2006-07-11 11:00:24 DEBUG: pid 13899: read kind from frontend
Q(51)
pgpool: 2006-07-11 11:00:24 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:0 len:0
pgpool: 2006-07-11 11:00:24 DEBUG: pid 13899: Query: SELECT
maz_media.id,maz_media.kassetten_id,maz_media.beschreibu
ng,maz_media.title,maz_media.mandanten_id,maz_segment.timeanfang,maz_med
ia.media_id,maz_media.entrytime,maz_media.entryoper,maz_media.dupop
er,maz_segment.timelaenge,maz_segment.timeende from maz_media,
maz_segment where maz_media.id LIKE '%7%' and maz_media.mandanten_id='1'
and
maz_media.media_id = maz_segment.media_id
pgpool: 2006-07-11 11:00:24 DEBUG: pid 13899: waiting for master
completing the query
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: read kind from backend P
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_process_query:
waiting for secondary for data ready
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_process_query: kind
from backend: P
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:6 len:1017
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:6 len:1017
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: read kind from backend
pending data T len: 1016 po: 7
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_process_query: kind
from backend: T
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:12 len:1011
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:12 len:1011
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: type oid:
1043
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: field
size: 65535
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: modifier:
0
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:35 len:988
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:35 len:988
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: type oid:
1043
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: field
size: 65535
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: modifier:
0
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:58 len:965
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: pool_read_string: read all
from pending data. po:58 len:965
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: type oid:
1043
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: field
size: 65535
pgpool: 2006-07-11 11:00:28 DEBUG: pid 13899: RowDescription: modifier:
0
.
. (~ 30 sec. for select)
.
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: read kind from backend
pending data D len: 174 po: 476
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: pool_process_query: kind
from backend: D
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 6 data:
211678
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 5 data:
S0671
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 23 data:
T-D1 EINPACKEN MOTIV 67
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 4 data:
T-D1
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 1 data: 1
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 5 data:
19955
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 5 data:
17136
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 22 data:
2000-10-26 13:02:00+00
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 2 data: RC
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 3 data: 504
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13899: AsciiRow: len: 5 data:
20459
pgpool: 2006-07-11 11:00:57 ERROR: pid 13899: pool_process_query: kind
does not match between backends master(E) secondar
pgpool: 2006-07-11 11:00:57 LOG: pid 13899: notice_backend_error:
master: 1 fail over request from pid 13899
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: failover_handler called
pgpool: 2006-07-11 11:00:57 LOG: pid 13891: starting degeneration.
shutdown master host clever(5432)
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13896
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13897
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13898
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13899
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13900
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13901
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13902
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: kill 13903
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13896: child receives shutdown
request signal 3
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13897: child receives shutdown
request signal 3
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13898: child receives shutdown
request signal 3
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13900: child receives shutdown
request signal 3
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13901: child receives shutdown
request signal 3
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13902: child receives shutdown
request signal 3
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13903: child receives shutdown
request signal 3
pgpool: 2006-07-11 11:00:57 DEBUG: pid 31938: I am 31938
pgpool: 2006-07-11 11:00:57 DEBUG: pid 31939: I am 31939
pgpool: 2006-07-11 11:00:57 DEBUG: pid 31940: I am 31940
pgpool: 2006-07-11 11:00:57 DEBUG: pid 31941: I am 31941
pgpool: 2006-07-11 11:00:57 DEBUG: pid 31942: I am 31942
pgpool: 2006-07-11 11:00:57 DEBUG: pid 31943: I am 31943
pgpool: 2006-07-11 11:00:57 DEBUG: pid 31944: I am 31944
pgpool: 2006-07-11 11:00:57 DEBUG: pid 31945: I am 31945
pgpool: 2006-07-11 11:00:57 LOG: pid 13891: degeneration done.
shutdown master host clever(5432)
pgpool: 2006-07-11 11:00:57 DEBUG: pid 13891: reap_handler called
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: I am 31945 accept fd 6
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: Protocol Major: 2 Minor: 0
database: SmartMaz user: abit
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: trying clear text password
authentication
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from backend
pending data Z len: 0 po: 0
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_process_query: kind
from backend: Z
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from frontend
Q(51)
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_read_string: read all
from pending data. po:0 len:0
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: Query:
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from backend I
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_process_query: kind
from backend: I
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from backend
pending data Z len: 0 po: 0
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_process_query: kind
from backend: Z
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: read kind from frontend
Q(51)
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: pool_read_string: read all
from pending data. po:0 len:0
pgpool: 2006-07-11 11:03:53 DEBUG: pid 31945: Query: select version()
Pgpool status:
listen_addresses | *
| host name(s) or IP address(es) to listen to
port | 9999
| pgpool accepting port number
socket_dir | /tmp
| pgpool socket directory
backend_host_name | clever
| master backend host name
backend_port | 5432
| master backend port number
secondary_backend_host_name | smart
| secondary backend host name
secondary_backend_port | 5432
| secondary backend port number
num_init_children | 8
| # of children initially pre-forked
child_life_time | 0
| if idle for this seconds, child exits
connection_life_time | 0
| if idle for this seconds, connection closes
child_max_connections | 0
| if max_connections received, child exits
max_pool | 4
| max # of connection pool per child
logdir | /var/run
| logging directory
backend_socket_dir | /tmp
| Unix domain socket directory for the PostgreSQL server
replication_mode | 1
| non 0 if operating in replication mode
replication_strict | 1
| non 0 if operating in strict mode
replication_timeout | 5000
| if secondary does not respond in this milli seconds, abort the session
load_balance_mode | 0
| non 0 if operating in load balancing mode
weight_master | 0.500000
| weight of master
weight_secondary | 0.500000
| weight of secondary
replication_stop_on_mismatch | 1
| stop replication mode on fatal error
reset_query_list | ABORT; RESET ALL; SET SESSION
AUTHORIZATION DEFAULT; | queries issued at the end of session
print_timestamp | 1
| if true print time stamp to each log line
master_slave_mode | 0
| if true, operate in master/slave mode
connection_cache | 1
| if true, cache connection pool
health_check_timeout | 60
| health check timeout
health_check_period | 600
| health check period
health_check_user | postgres
| health check user
insert_lock | 0
| insert lock
ignore_leading_white_space | 0
| ignore leading white spaces
current_backend_host_name | clever
| current master host name
current_backend_port | 5432
| current master port #
replication_enabled | 1
| non 0 if actually operating in replication mode
master_slave_enabled | 0
| non 0 if actually operating in master/slave
num_reset_queries | 3
| number of queries in reset_query_list
server_status | master(clever on 5432) up
secondary(smart on 5432) up | server status
> Thank you very much for the reply. It's helpful.
You are welcome.
> We have now turned on load_balancing and have set the weight to 1 and
> 0 (for db0 and db1). And we are still getting "kind mismatches".
>
> Is this weight distribution of 1/0 valid? Or does 1/0 override the
> load balancing parameter? Is this not a recommended setting?
Let me look into this.
> We make direct SELECT connections to db1 via port 5432, so we wanted
> pgpool to select only from db0 rather than do 50/50 load balancing.
>
> If the 1/0 setting is wrong, let us know and we will change it to
> something like 9/1 load balancing. If the 1/0 weight setting is
> valid, why would we still be seeing "kind mismatches" on selects?
>
> Thanks in advance for any suggestions/input here.
Can you please provide the debugging output of pgpool by turning on the
-d and -n options? We need information from when the "kind mismatch"
errors occur in your setting.
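A minimal sketch of such a debug run (the config-file path and log
location are illustrative; adjust them to your installation):

```shell
# Stop the running pgpool, then restart it in non-daemon mode (-n)
# with debug messages enabled (-d), capturing everything to a file.
pgpool stop
pgpool -n -d -f /usr/local/etc/pgpool.conf > /tmp/pgpool-debug.log 2>&1 &
```

Reproduce the failing SELECT, then send the portion of the log around
the "kind mismatch" error.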
--
Tatsuo Ishii
SRA OSS, Inc. Japan
> Thanks,
> John
>
>
>
> On 7/30/06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> > > We have been testing with pgpool for a little while here and have
> > > run into various locking (or lack thereof) issues.
> > >
> > > Our setup is 2 databases (pg v8.1.4) on separate machines, with
> > > all db connections going through pgpool. We have run into
> > > situations where the 1st or 2nd database was not getting the
> > > update or insert fast enough, and a select query hitting pgpool at
> > > the same time caused a "mismatch" error, making pgpool degenerate
> > > and stop replicating.
> > >
> > > Although I assume this must be quite a normal issue, I do not see
> > > many people talking about it here.
> > >
> > > Can someone give me a basic rundown of the "best practices" for
> > > pg locking on a pretty database intensive web app?
> > >
> > > On Inserts, is the best practice to lock the entire table (LOCK
> > > TABLE ...)?
> > >
> > > On Updates, is the best practice to do a row lock (SELECT ... FOR
> > > UPDATE or FOR SHARE)?
> > > Should you do some sort of lock on SELECTS?
> > > Do you recommend everything be wrapped in Transactions?
> > >
> > > Any advice is appreciated.
> > >
> > > Thank you.
> >
> > What you have seen is a table-updating timing difference between
> > master and secondary, as you suspected. Although there's no way to
> > avoid the timing difference itself, I think there are some ways to
> > avoid this problem.
> >
> > 1) Use load balancing. Since this sends a given SELECT to only one
> > of the servers, there will be no chance to see the "kind mismatch"
> > error.
> >
> > 2) Use table locking. This will prevent the SELECT from seeing
> > not-yet-updated data in the secondary server.
> >
> > I don't think SELECT FOR UPDATE will work, since a row lock does
> > not block plain SELECTs.
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
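A minimal sketch of the table-locking approach suggested above, using
the table name that appears in the log earlier in the thread (the
column list and values are purely illustrative):

```sql
-- Take an access-exclusive lock before writing, so that a concurrent
-- SELECT routed through pgpool blocks until both backends commit and
-- therefore sees the same data on master and secondary.
BEGIN;
LOCK TABLE maz_media IN ACCESS EXCLUSIVE MODE;
INSERT INTO maz_media (id, mandanten_id, media_id)
    VALUES ('211679', '1', '20460');
COMMIT;  -- lock released; both backends now hold the new row
```

Note that the default LOCK TABLE mode (ACCESS EXCLUSIVE) is needed
here: weaker modes such as EXCLUSIVE do not conflict with the ACCESS
SHARE lock a plain SELECT takes, so they would not block the read.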