CS848 Paper Presentation
Scalable Query Result Caching for Web Applications
Garrod, Manjhi, Ailamaki, Maggs, Mowry, Olston, Tomasic
PVLDB 2008
Presented by Rehan Rauf
David R. Cheriton School of Computer Science University of Waterloo
18th January 2010
1
Problem
• The central database server becomes a bottleneck in content distribution networks.
• Replicating the database across multiple servers removes the bottleneck, but ...
  • requires low-latency consistency, which conflicts with low latency between user and server
  • does not scale linearly with cost
• Traditional proxy-cache-based solutions remove the bottleneck, but ...
  • maintain consistency inefficiently
  • are limited in scalability by a low cache hit rate
2
High‐level Architecture of Ferdinand
[Diagram: four proxy servers, a DHT overlay, publish/subscribe, and a central server]
3
Ferdinand Proxy Server
4
Operations
Each query has a master proxy server.
[Animation, slides 5 to 10: a query Q shown with a proxy server, the master proxy server, and the central server]
[Animation, slides 11 to 12: an update U shown with a proxy server, the master proxy server, the central server, and a second proxy server; the final frame adds an update notification]
12
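The idea that each query has a master proxy can be sketched as follows. This is a minimal illustration, not Ferdinand's implementation: a modulo hash stands in for the DHT overlay, the lookup order (local cache, then master proxy, then central server) is an assumption taken from the slide diagrams, and the names `master_proxy_for`, `Proxy`, and `lookup` are hypothetical.

```python
import hashlib


def master_proxy_for(query, proxy_names):
    """Map a query string to one proxy name (stand-in for a DHT lookup)."""
    digest = hashlib.sha1(query.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(proxy_names)
    return proxy_names[index]


class Proxy:
    def __init__(self, name, registry, central):
        self.name = name
        self.registry = registry  # shared dict: proxy name -> Proxy
        self.central = central    # callable standing in for the central server
        self.cache = {}           # query string -> cached result
        registry[name] = self

    def lookup(self, query):
        """Return (result, source): local cache, then master proxy, then central."""
        if query in self.cache:
            return self.cache[query], "local"
        master = self.registry[master_proxy_for(query, sorted(self.registry))]
        if query in master.cache:
            result, source = master.cache[query], "master"
        else:
            # Overall miss: only now is the central server contacted.
            result, source = self.central(query), "central"
            master.cache[query] = result
        self.cache[query] = result
        return result, source
```

Once one proxy has fetched a result, a miss at any other proxy can be answered by the master proxy, so the central server sees the query only once.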
Proxy Server Cache
• When a proxy server receives an update notification, its consistency module invalidates the affected query result.
• The result is invalidated even if the new result could be computed from previously cached results.
13
Advantages/Disadvantages
• Advantages
  • High cache hit rate
  • More work offloaded from the backend server
• Disadvantage
  • The latency cost of an overall cache miss is greater
14
Consistency Management
• Any proxy caching a particular query must be notified when an update affects the result of that query.
• Query Update Multicast Association (QUMA) is used to ensure this requirement.
15
QUMA Solution
• Multicast groups are created.
  • Offline analysis of the application's database query templates is performed.
  • Independence analysis is done to determine the independence of query-update pairs.
• Goals
  • Any update notification published to a group should affect each query subscribed to that group.
  • Related queries should be clustered into the same multicast group to reduce the number of notifications.
• Each cached query subscribes to the appropriate multicast groups.
• Updates are published to these groups and hence reach the appropriate proxy servers.
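The subscribe/publish steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the two lookup tables stand in for the output of the offline analysis, the SQL templates and group names are invented for the example, and `PubSub`, `CachingProxy`, and `publish_update` are hypothetical names, not Scribe or Ferdinand APIs.

```python
from collections import defaultdict

# Hypothetical result of the offline analysis: the multicast groups each
# query template subscribes to and each update template publishes to.
QUERY_GROUPS = {
    "SELECT title FROM item WHERE id = ?": ["item"],
    "SELECT name FROM author WHERE id = ?": ["author"],
}
UPDATE_GROUPS = {
    "UPDATE item SET title = ? WHERE id = ?": ["item"],
}


class PubSub:
    """Tiny stand-in for a publish/subscribe system."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # group -> subscribed proxies

    def subscribe(self, group, proxy):
        self.subscribers[group].add(proxy)

    def publish(self, group):
        for proxy in list(self.subscribers[group]):
            proxy.on_notification(group)


class CachingProxy:
    def __init__(self, bus):
        self.bus = bus
        self.cache = {}                   # (template, params) -> result
        self.by_group = defaultdict(set)  # group -> cache keys

    def store(self, template, params, result):
        """Cache a result and subscribe to the groups of its template."""
        key = (template, params)
        self.cache[key] = result
        for group in QUERY_GROUPS[template]:
            self.by_group[group].add(key)
            self.bus.subscribe(group, self)

    def on_notification(self, group):
        """Invalidate every cached result associated with the group."""
        for key in self.by_group.pop(group, set()):
            self.cache.pop(key, None)


def publish_update(bus, template):
    """Publish an update notification to every group of the update template."""
    for group in UPDATE_GROUPS[template]:
        bus.publish(group)
```

In this sketch an update to `item` invalidates only the cached `item` query; the `author` query belongs to a different group and stays cached.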
16
QUMA Example
17
QUMA Solution
• Selection predicates are only practical when they are equality based.
• The process is not automated yet.
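Why equality predicates are the practical case can be seen in a small independence check. The sketch below is a simplified, conservative test for single-table statements and is not the analysis used in the paper; the dictionary layout (`table`, `where`, `sets`) and the function name `independent` are assumptions made for illustration.

```python
def independent(query, update):
    """Return True only if the update provably cannot change the query result.

    query:  {"table": str, "where": {column: value}}
    update: {"table": str, "where": {column: value}, "sets": set of columns}
    All predicates are equalities joined by AND.
    """
    if query["table"] != update["table"]:
        return True
    for column, q_value in query["where"].items():
        if column in update["sets"]:
            # The update rewrites this column, so it may move rows into
            # or out of the query's selection; no conclusion from it.
            continue
        if column in update["where"] and update["where"][column] != q_value:
            # Both statements pin the column to different constants,
            # so they touch disjoint sets of rows.
            return True
    return False  # conservative default: assume the update may matter


query = {"table": "item", "where": {"id": 7}}
update = {"table": "item", "where": {"id": 5}, "sets": {"title"}}
print(independent(query, update))  # True
```

With range or join predicates the same comparison is no longer a simple constant check, which is one reading of the limitation noted above.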
18
Consistency Management
• For each multicast group, there is a master multicast group, used for communication with master proxies.
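The pairing of groups and master groups can be sketched as below. The naming scheme for master groups and the publishing order (master groups first, then regular groups) are assumptions made for this illustration; `Bus`, `master_group`, and `notify` are hypothetical names.

```python
from collections import defaultdict


class Bus:
    """Tiny stand-in for a publish/subscribe system."""

    def __init__(self):
        self.members = defaultdict(list)  # group -> handlers

    def subscribe(self, group, handler):
        self.members[group].append(handler)

    def publish(self, group, update):
        for handler in list(self.members[group]):
            handler(group, update)


def master_group(group):
    """Name of the master multicast group paired with a regular group."""
    return "master:" + group


def notify(bus, groups, update):
    """Publish an update to the master groups first, then the regular groups.

    Assumption: this ordering is illustrative, not confirmed by the slides.
    """
    for group in groups:
        bus.publish(master_group(group), update)
    for group in groups:
        bus.publish(group, update)


log = []
bus = Bus()
# A master proxy listens on the master group; other proxies on the regular one.
bus.subscribe(master_group("item"), lambda g, u: log.append(("master proxy", g)))
bus.subscribe("item", lambda g, u: log.append(("proxy A", g)))
bus.subscribe("item", lambda g, u: log.append(("proxy B", g)))
notify(bus, ["item"], "UPDATE item ...")
```

After the call, `log` lists the master proxy's notification first, followed by those of the two regular proxies.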
19
Cache Miss
[Animation, slides 20 to 25: a query Q arrives at a proxy server; the proxy server is shown subscribing to multicast groups G; later frames add the master proxy, the master multicast groups MG, and finally the central server]
25
Update
[Animation, slides 26 to 30: an update U shown with a proxy server, the master proxy, and the central server; later frames add the master multicast groups MG, then the multicast groups G, and finally two further proxy servers]
30
Consistency Management
• Ensures that the query cache remains coherent with the central DB, but ...
  • Guarantees full consistency only for single-statement transactions.
  • Requires a reliable publish/subscribe system.
  • Requires serializability guarantees from the central database.
31
Implementation
• JDBC driver
• Proxy runs Apache Tomcat as a static cache
• Pastry overlay as DHT (based on PRR trees)
• Scribe for publish/subscribe
  • does not guarantee reliable delivery
• Ferdinand cache map is stored in MySQL 4
• Backend database is MySQL 4
32
Evaluation
• Performance comparison with several alternative approaches.
• Performance of DHT-based cooperative caching in varying network latency scenarios.
• Publish/subscribe vs. a simple broadcast-based system.
33
Evaluation
• Emulab testbed
  • Proxy ran on a 3 GHz Intel Pentium Xeon, 1 GB RAM, 10,000 RPM SCSI disk
  • Benchmark clients on 850 MHz client servers
• Benchmarks
  • TPC-W bookstore
  • RUBiS auction
  • RUBBoS bulletin board
• Conforms to the TPC-W model of emulated browsers
34
Evaluation: Comparison to other approaches
35
Evaluation: Cache Miss Rate
36
Evaluation: Throughput as a function of latency
37
Evaluation: Throughput compared to broadcast-based consistency
38
39
Closing Observations
• No scalability evaluation is shown.
• Not suitable for update-intensive applications.
• Requires offline analysis of database requests.
• Failure scenarios need to be dealt with.
• No consistency for multi-statement transactions.