![Page 1: Cache Provisioning for Interactive NLP Servicesweb.cse.ohio-state.edu/~stewart.962/Papers/kelleyladis13-pres.pdf · Quality-Aware Cache Management NLP Service Layer Main Memory Storage](https://reader030.fdocuments.net/reader030/viewer/2022041016/5ec7c3af514a3369d578f46e/html5/thumbnails/1.jpg)
Cache Provisioning for Interactive NLP Services
Jaimie Kelley and Christopher Stewart, The Ohio State University
Sameh Elnikety and Yuxiong He, Microsoft Research
Interactive NLP
“Watson looks at the language. It tries to understand what is being said and find the most appropriate, relevant answer...”
Rob High, IBM Fellow
● Natural Language Processing (NLP) Service
– Users issue queries via network messages
– Analyzes and understands human text in context
– Response times should be fast (bounded latency)
– Examples: Bing, Google, IBM Watson, OpenEphyra
Interactive NLP
NLP Service Layer
Lucene
Scalable, Distributed Main Memory Storage
Memcache-1 Memcache-N
Persistent Disk Storage Layer
MYSQL-1 MYSQL-M
Data Flow
1. NLP processing: Extract Keywords and synonyms
2. Fetch relevant data
inverted indexes
raw content
3. More processing, reply
OpenEphyra
Query: Who volunteered as District 12's tribute in the Hunger Games?
Answer: Katniss Everdeen
Interactive NLP Services
NLP Service Layer
Lucene
Scalable, Distributed Main Memory Storage
Memcache-1 Memcache-N
Persistent Disk Storage Layer
MYSQL-1 MYSQL-M
● To compete with Jeopardy champions, IBM Watson had 3 sec. latency bound
● Our experience: 8th graders -> 4 sec. bound
● Internet services demand sub-second response times
Tight Latency Bounds
● Access DRAM in parallel
● Disk accesses timeout!
OpenEphyra
Problem: Data Growth
Data is growing too fast to keep in main memory.
[Figure: World Data (Billions of GB) over time; annotated growth: 62% (2008-2009), 90% (2009-2011)]
Sources: IDC, Radicati Group, Facebook, TR research, Pew Internet
Cache Management
● When the data is too large to fit in cache, what should we evict?
● Traditional Compute Workloads:
Every data access is needed to answer a query
Evict least recently used (LRU)
● NLP services:
Only some data are needed (redundant content)
Remove redundant data with little quality loss
Quality-Aware Cache Management
NLP Service Layer
Main Memory Storage
Persistent Disk Storage Layer
New Data: Harry Potter, Chapter 1
New York Times, Nov 2006
Hunger Games, Chapter 9
New York Times, Oct 2006
What Data Will LRU Evict? New York Times or Hunger Games
For NLP services, evict data that will cause the least quality loss.
Main Memory Storage
NLP Service Layer
Our Approach
● Does existing cache management work well for NLP?
NLP Service Layer
Query: Who volunteered as District 12's tribute?
Real System: Main Memory Storage Size < Data Growth
Most Relevant Document: New York Times, Oct 2006
Ideal System: Main Memory Storage Size > Data Growth
Most Relevant Document: Hunger Games, Chapter 9
Real ≠ Ideal
We measure quality loss, i.e., dissimilarity between real and ideal
NLP Service Layer
Main Memory Storage Size: 40 GB
Cache Miss Rate: 20%
Avg. Quality Loss: 15%
Our Approach
● Can quality-aware cache management reduce provisioning costs over time?
NLP Service Layer
Main Memory Storage Size: 20 GB
Cache Miss Rate: 40%
Avg. Quality Loss: 15%
Outline
● Introduction
● Defining Quality Loss
– Intuition, base model, full model
● Quality Loss in NLP Services
– Representative queries, data sets, infrastructure, results
● Quality-Aware Cache Provisioning
● Conclusion
Intuition: What is quality loss?
Real System:
Query: Who volunteered as District 12's tribute?
Answers:
Katniss Foxface
Harry Potter
Peeta Mellark
Jeanine F. Pirro
Ideal System:
Query: Who volunteered as District 12's tribute?
Answers:
Katniss Everdeen
Peeta Mellark
Prim
Foxface
Base Model: Quality Loss
$$S(w, \hat{w}, D, Q) = 1 - \frac{\sum_{Q} \sum_{K} \Phi\left(\sum_{k_2} \left| R_{q,k}(\hat{w}, D) \cap R_{q,k_2}(w, D) \right|\right)}{|Q|\,K}$$
[Figure: Venn diagram with regions Real, Real & Ideal, and Ideal; answers shown: Harry Potter, Jeanine F. Pirro, Katniss Everdeen, Prim, Foxface, Peeta Mellark]
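The base model can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes that ŵ denotes the real setup and w the ideal setup, that each ranked answer is compared by exact string match (so each intersection has size 0 or 1), and that Ф is an indicator that returns 1 for any overlap. The names `phi` and `quality_loss` are hypothetical.

```python
from typing import Dict, List


def phi(overlap: int) -> int:
    """Assumed indicator: 1 if a real answer matches any ideal answer, else 0."""
    return 1 if overlap > 0 else 0


def quality_loss(real: Dict[str, List[str]],
                 ideal: Dict[str, List[str]],
                 k: int) -> float:
    """Base-model quality loss over the queries in `ideal`.

    `real[q]` and `ideal[q]` are ranked answer lists for query q.
    Only the top-k answers of each list are considered.
    """
    queries = list(ideal)
    if not queries or k <= 0:
        raise ValueError("need at least one query and k > 0")

    matched = 0
    for q in queries:
        real_top = real.get(q, [])[:k]
        ideal_top = ideal[q][:k]
        for answer in real_top:
            # Inner sum over k2: how many ideal answers equal this real answer.
            overlap = sum(1 for other in ideal_top if other == answer)
            matched += phi(overlap)

    # 1 minus the matched fraction, normalized by |Q| * K.
    return 1.0 - matched / (len(queries) * k)
```

As a hypothetical example, with K = 4, real answers `["Katniss Everdeen", "Harry Potter", "Peeta Mellark", "Jeanine F. Pirro"]` and ideal answers `["Katniss Everdeen", "Peeta Mellark", "Prim", "Foxface"]`, two of the four real answers match, so the loss is 0.5.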
Full Model: Quality Loss
● NLP responses present challenges:
– Synonyms
● Answers from ideal setup fall within categories
● Real setup should match categories
Query: Flowers in Washington State
Answers: florists, gardening, Coast Rhododendron
– Noise Tolerance
● Answers from the real setup can be a superset of answers from ideal setup
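One possible reading of these two challenges is sketched below. The slide does not give the full model's formula, so this is an assumption: answers are mapped to categories through a hypothetical `category_of` table, a real answer counts if it lands in the same category as an ideal answer (synonyms), and extra real answers are not penalized (noise tolerance).

```python
from typing import Dict, List


def category_loss(real_answers: List[str],
                  ideal_answers: List[str],
                  category_of: Dict[str, str]) -> float:
    """Fraction of ideal categories that no real answer falls into.

    Answers missing from `category_of` are ignored. Extra real answers
    never increase the loss, so a superset of the ideal answers scores 0.
    """
    ideal_cats = {category_of[a] for a in ideal_answers if a in category_of}
    if not ideal_cats:
        return 0.0
    real_cats = {category_of[a] for a in real_answers if a in category_of}
    missed = ideal_cats - real_cats
    return len(missed) / len(ideal_cats)
```

For instance, if "florists" and "flower shops" share a hypothetical category, a real setup that returns "flower shops" instead of "florists" is not charged for the substitution.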
Outline
● Introduction
● Defining Quality Loss
– Intuition, base model, full model
● Quality Loss in NLP Services
– Query trace, data sets, infrastructure, results
● Quality-Aware Cache Provisioning
● Conclusion
Infrastructure
NLP Service
Lucene
Scalable Main Memory
Redis
Disk Storage
DiskSim
Real NLP Service:
10 sec latency bound
Analyzes keywords
Ranks all indexes requested from storage.
Distributed Cache:
Set max size < |data|
Implemented interface between cache and disk.
Disk Storage:
Two 3-TB hard disks
We used Lucene libraries
Ideal NLP Service:
Exact same processing
Distributed Cache:
No set maximum size
9 GB / cache node
Provision more as needed
Disk Storage:
No timeouts
Key Insight: Ideal setup returns the result created by processing all relevant data without timeouts.
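The difference between the two setups can be sketched as a read-through lookup with an optional latency budget. This is an illustrative model only: plain dictionaries stand in for the cache and the disk, the disk latency is passed in as a number rather than measured, and `fetch` is a hypothetical name.

```python
from typing import Dict, Optional


def fetch(key: str,
          cache: Dict[str, str],
          disk: Dict[str, str],
          disk_latency_s: float,
          budget_s: Optional[float]) -> Optional[str]:
    """Return the value for `key`, or None if it cannot be read in time.

    budget_s=None models the ideal setup (no timeouts).
    A finite budget_s models the real setup's latency bound.
    """
    if key in cache:
        return cache[key]  # served from main memory
    if budget_s is not None and disk_latency_s > budget_s:
        return None  # disk access would exceed the latency bound
    return disk.get(key)
```

Calling `fetch` with `budget_s=None` always reaches the data on disk, which is how the ideal setup processes all relevant data. With a finite budget, a cache miss on a slow disk returns nothing and the query proceeds with partial data.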
Obtaining a Query Trace
● Google Trends
– Trace of most popular queries per category 2004-2013
– Most (over 70%) are multiple word queries
Sept. 2013 Books: The Bible, Fifty Shades of Grey, The Great Gatsby, Under the Dome, The Hunger Games, Psalms, The Lord of the Rings, Sword Art Online, 1984, The 85 Ways to Tie a Tie
June 2009 Books: The Bible, Alice's Adventures in ..., The Lord of the Rings, Midnight Sun, 1984, Kama Sutra, Romeo and Juliet, Quran, Diagnostic and ..., Diccionario de la lengua...
Jan. 2004 Books: The Bible, The Lord of the Rings, The Da Vinci Code, 1984, Kama Sutra, Romeo and Juliet, Hamlet, Macbeth, To Kill a Mockingbird, The World Factbook
Data Sets
Wikipedia
January 2001 – March 2013
Total Index size: 4.7 TB
Max Data/Month: 30 GB
New York Times
October 2004 – March 2006
Total Index size: 3 GB
Max Data/Month: 88 MB
Types of Data
Content:
Key: HG_ch9.txt
Value: ... Katniss Everdeen, tribute from District Twelve
Terms: Katniss, Everdeen, tribute, District, Twelve
Inverted Index:
Key: Katniss_term
Value: HG_ch2.txt, HG_ch9.txt
Key: Tribute_term
Value: New York Times 10/22, Wikipedia, HG_ch9.txt
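The relationship between the two data types can be sketched as follows: content entries map a document key to its text, and the inverted index maps each term key to the documents that contain it. This is a simplified stand-in for what Lucene does, and the tokenizer, the lowercase `<term>_term` key format, and the sample text for `HG_ch2.txt` are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(content: Dict[str, str]) -> Dict[str, List[str]]:
    """Map '<term>_term' keys to the sorted list of document keys containing the term."""
    index: Dict[str, List[str]] = defaultdict(list)
    for doc_key in sorted(content):
        # Lowercase and split on non-alphanumerics; count each term once per document.
        terms = set(re.findall(r"[a-z0-9]+", content[doc_key].lower()))
        for term in terms:
            index[f"{term}_term"].append(doc_key)
    return dict(index)


content = {
    "HG_ch2.txt": "Katniss hunts in the woods",  # hypothetical text
    "HG_ch9.txt": "Katniss Everdeen, tribute from District Twelve",
}
index = build_inverted_index(content)
```

Here `index["katniss_term"]` lists both chapters, while `index["tribute_term"]` lists only `HG_ch9.txt`. A query first reads index entries to find candidate documents, then reads the content entries for those documents.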
Caching Policies: LRU
● Least Recently Used Cache Management
– Common approach in distributed stores
– Implemented in Redis
● Infrequently used search terms are evicted to disk, where they cannot be accessed within latency bounds
Main Memory Storage (before): Tribute (index), Broomstick (content), Gift (index)
New Data: NY Times 11/2013
Eviction: Broomstick (content)
Main Memory Storage (after replacement): Tribute (index), Gift (index), NY Times 11/2013
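The eviction step on this slide can be sketched with a small LRU cache. This is a generic textbook LRU built on `collections.OrderedDict`, not the Redis implementation used in the study. Capacity is counted in entries rather than bytes to keep the example short.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self.items:
            return None  # miss: the caller would have to go to disk
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: str) -> Optional[str]:
        """Insert or update a key; return the evicted key, if any."""
        evicted = None
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            evicted, _ = self.items.popitem(last=False)  # drop the oldest entry
        self.items[key] = value
        return evicted
```

If a three-entry cache holds Tribute, Broomstick, and Gift, and only Tribute and Gift have been read recently, inserting the new NY Times data evicts Broomstick, matching the diagram.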
Caching Policies: LRU
[Figure: LRU. Quality Loss (%) from 0% to 100% versus Data / DRAM from 1 to 4; series: NYT, WIKI]
Quality loss rises as terms are evicted
Multiple word queries and single-word synonyms benefit from redundancy
● Non-evicted data may overlap evicted data
● Answers from ideal system can come from non-evicted terms
In both Wikipedia pages and NYT articles, content related to queries was found in non-evicted terms.
Caching Policies: Limit Content
Content:
Key: HG_ch9.txt
Value: ... Katniss Everdeen, tribute from District Twelve
Terms: Katniss, Everdeen, tribute, District, Twelve
Inverted Index:
Key: Katniss_term
Value: HG_ch2.txt, HG_ch9.txt
Key: Tribute_term
Value: New York Times 10/22, Wikipedia, HG_ch9.txt
Caching Policies: Limit Content
[Figure: Limit Content. Vertical axis from 0% to 100% versus Data / DRAM from 1 to 4; series: NYT, WIKI]
Quality loss rises as key documents are excluded from index
Documents may contain content that effectively has the same meaning
Wikipedia pages are less redundant, as a result of their review process
On average 1 of 2 NYT articles can be removed with low quality loss
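A minimal sketch of the Limit Content idea follows: excluded documents are dropped from the content store and from every inverted-index entry that references them. How documents are chosen for exclusion is not stated on the slides, so the `excluded` set is simply an input here, and `limit_content` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def limit_content(index: Dict[str, List[str]],
                  content: Dict[str, str],
                  excluded: Set[str]) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Return copies of the index and content with `excluded` documents removed."""
    pruned_content = {k: v for k, v in content.items() if k not in excluded}
    pruned_index: Dict[str, List[str]] = {}
    for term, docs in index.items():
        kept = [d for d in docs if d not in excluded]
        if kept:  # drop terms that no longer point to any document
            pruned_index[term] = kept
    return pruned_index, pruned_content
```

Every term stays searchable as long as at least one of its documents is retained, which is why redundant collections lose little quality when some documents are removed.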
Outline
● Introduction
● Defining Quality Loss
– Intuition, base model, full model
● Quality Loss in NLP Services
– Representative queries, data sets, infrastructure, results
● Quality-Aware Cache Provisioning
● Conclusion
DRAM Trends
Provisioning Cost = #GB RAM * $/GB RAM
Source: http://www.jcmit.com/memoryprice.htm
[Figure: $/GB RAM from 0 to 150 versus Year, 2005 to 2013]
Cache Provisioning Policies
● Over provisioning
– Always have more RAM available than data
● Provision on Data Growth
– Wait for more data to be added
● Provision on Quality Loss
– Wait for quality loss to pass a threshold
Main Memory Storage: Tribute (index), Broomstick (content), Gift (index)
New Data: NY Times 11/2013
Replace or Provision?
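The three policies can be compared with a small decision function. This is a sketch under stated assumptions, not the procedure used in the study: the 1.5x headroom, the 15% loss threshold, and the 9 GB step (one cache node) are illustrative defaults, and the function names are hypothetical. Cost follows the formula Provisioning Cost = #GB RAM * $/GB RAM.

```python
def provisioned_gb(policy: str,
                   data_gb: float,
                   current_gb: float,
                   quality_loss: float,
                   headroom: float = 1.5,        # assumed over-provisioning factor
                   loss_threshold: float = 0.15,  # assumed quality-loss threshold
                   step_gb: float = 9.0) -> float:  # assumed size of one cache node
    """Return the RAM (in GB) to have provisioned after this decision."""
    if policy == "overprovision":
        # Always hold more RAM than data.
        return max(current_gb, data_gb * headroom)
    if policy == "data_growth":
        # Grow the cache only when the data outgrows it.
        return max(current_gb, data_gb)
    if policy == "quality_loss":
        # Tolerate evictions until measured quality loss passes the threshold.
        if quality_loss > loss_threshold:
            return current_gb + step_gb
        return current_gb
    raise ValueError(f"unknown policy: {policy}")


def provisioning_cost(gb_ram: float, dollars_per_gb: float) -> float:
    """Provisioning Cost = #GB RAM * $/GB RAM."""
    return gb_ram * dollars_per_gb
```

With 40 GB of data, a 20 GB cache, and a measured loss of 10%, the quality-loss policy keeps the cache at 20 GB, while provisioning on data growth would grow it to 40 GB. At a hypothetical price of $10 per GB, that is $200 versus $400.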
Cost Savings
[Figure: two panels, LRU and Limit Content. Relative Cost from 0% to 100% versus #Months (0 to 8 in the first panel, 0 to 15 in the second); series: Overprovisioning, Provision on Data Growth, Provision on Quality Loss]
Conclusion
● Data is growing fast, forcing NLP services to respond to queries after accessing only a portion of the data
● NLP services can remove redundant content and/or terms from distributed caches with little quality loss
● New cache management approach: to reduce costs, wait until quality loss occurs before provisioning DRAM