
Clipper: A Low-Latency Online Prediction Serving System

Presenters: Alexey Tumanov, Daniel Crankshaw

Corey Zumar, Giulio Zhou, Xin Wang, Michael Franklin, Joseph Gonzalez, Ion Stoica

BDD/RISE Mini-Retreat 2017, May 2, 2017

[Diagram: Big Data → Training (Learning) → Complex Model]

Learning Produces a Trained Model

[Diagram: Big Data → Training (Learning) → Model; a Query goes to the Model, which returns a Decision: "CAT"]

Serving

[Diagram: Big Data → Training (Learning) → Model; an Application sends a Query to the served Model and receives a Decision]

Prediction-serving for interactive applications. Timescale: ~10s of milliseconds.

Serving: Google Translate

Ø 140 billion words a day [1]
Ø 82,000 GPUs running 24/7
Ø Invented new hardware: the Tensor Processing Unit (TPU)

[1] https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html

Prediction-Serving Raises New Challenges

Prediction-Serving Challenges

Challenge 1: Support low-latency, high-throughput serving workloads
Ø Models are getting more complex: 10s of GFLOPs per prediction [1]
Ø Using specialized hardware for predictions
Ø Deployed on the critical path: must maintain SLOs under heavy load

Challenge 2: Large and growing ecosystem of ML models and frameworks (e.g., Caffe, Vowpal Wabbit, ...)

[1] Deep Residual Learning for Image Recognition. He et al. CVPR 2016.

Large and growing ecosystem of ML models and frameworks

[Diagram: applications (Content Rec., Fraud Detection, Personal Asst., Robotic Control, Machine Translation), each backed by models from a different ML framework]

Big Companies Build One-Off Systems

Problems:
Ø Expensive to build and maintain
Ø Highly specialized, requiring both ML and systems expertise
Ø Tightly coupled model and application
Ø Difficult to change or update the model
Ø Only supports a single ML framework

Large and growing ecosystem of ML models and frameworks:
Ø Varying physical resource requirements
Ø Difficult to deploy and brittle to manage

But most companies can't build new serving systems…

Use existing systems: Offline Scoring

[Diagram: Big Data → Training → Model; Batch Analytics runs the model over inputs X to produce predictions Y, which are written to a datastore]

Low-Latency Serving: the application looks up the decision in the datastore

[Diagram: the Application sends a Query and looks up the corresponding Decision among the precomputed (X, Y) pairs in the datastore]

Problems:
Ø Requires the full set of queries ahead of time
  Ø Small and bounded input domain
Ø Wasted computation and space
  Ø Can render and store unneeded predictions
Ø Costly to update
  Ø Re-run the batch job
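To make the offline-scoring pattern concrete, here is a minimal Python sketch. The dict stands in for a real datastore, and ToyModel, batch_score, and serve are illustrative names, not part of any actual system:

    # Offline scoring: precompute predictions in a batch job, then serve by lookup.
    def batch_score(model, all_queries):
        """Batch job: render a prediction for every anticipated query."""
        return {q: model.predict(q) for q in all_queries}

    def serve(datastore, query, default=None):
        """Online path: no model evaluation, just a low-latency lookup.
        Queries outside the precomputed input domain miss the store."""
        return datastore.get(query, default)

    class ToyModel:
        def predict(self, x):
            return x * 2  # stand-in for an expensive model

    store = batch_score(ToyModel(), all_queries=range(1000))
    print(serve(store, 42))    # hit: 84
    print(serve(store, 5000))  # miss: None -> must re-run the batch job

The miss on the last line is exactly the "bounded input domain" problem above: any query not anticipated by the batch job has no answer.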


Clipper Solutions

How does Clipper address these challenges?
Ø Simplifies deployment through a layered architecture
Ø Serves many models across ML frameworks concurrently
Ø Employs caching, batching, and scale-out for high-performance serving

Clipper Decouples Applications and Models

[Diagram: Applications issue Predict and Feedback calls over Clipper's RPC/REST interface; Clipper forwards queries over RPC to model containers (MCs) hosting models from different frameworks, e.g. Caffe]
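As a rough illustration of what the Predict side of that interface looks like to an application, here is a hedged Python sketch. The port (1337), URL shape (/<app>/predict), and JSON body follow the open-source Clipper release as I recall it; treat them as assumptions and check them against your deployment:

    import requests  # third-party HTTP client: pip install requests

    def predict(app_name, x, host="localhost:1337"):
        """Send one query to Clipper's REST predict endpoint for `app_name`."""
        url = f"http://{host}/{app_name}/predict"
        resp = requests.post(url, json={"input": x})
        resp.raise_for_status()
        # The JSON response carries the prediction, plus a flag indicating
        # whether a default (fallback) value was returned, e.g. on an SLO miss.
        return resp.json()

    # Usage, assuming an application named "digits" has been registered:
    # print(predict("digits", [0.0] * 784))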

Clipper Architecture

[Diagram: Applications → Predict/Observe RPC/REST interface → Clipper → RPC → model containers (MCs), e.g. a Caffe container]

Model Abstraction Layer: provides a common interface to models while bounding latency and maximizing throughput.

Model Selection Layer: improves accuracy through bandit methods and ensembles, online learning, and personalization.
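A minimal sketch of the "common interface" idea: every model, whatever its framework, sits behind a container that maps a batch of inputs to an equal-length list of outputs. The class and method names here are illustrative, not Clipper's actual RPC classes:

    # Sketch of the interface the Model Abstraction Layer expects from a
    # model container: take a batch of inputs, return one output per input.
    class ModelContainer:
        def __init__(self, model):
            self.model = model

        def predict(self, inputs):
            """inputs: a list of queries batched together by Clipper.
            Must return exactly len(inputs) predictions, in order."""
            return [str(self.model.predict(x)) for x in inputs]

    # Any framework's model can sit behind the same interface:
    class SklearnLikeModel:
        def predict(self, x):
            return sum(x) > 0  # stand-in decision rule

    container = ModelContainer(SklearnLikeModel())
    print(container.predict([[1.0, -2.0], [3.0, 4.0]]))  # ['False', 'True']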

[Diagram: the same architecture, annotated with each layer's components: the Model Selection Layer implements the selection policy; the Model Abstraction Layer provides caching and adaptive batching]
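To illustrate the caching component, here is a minimal sketch of a prediction cache keyed on (model, input), so identical queries skip model evaluation entirely. All names are illustrative, and a production cache would also need eviction and handling of concurrent in-flight requests:

    class PredictionCache:
        def __init__(self):
            self._store = {}

        def get_or_compute(self, model_name, x, compute):
            key = (model_name, tuple(x))  # inputs must be hashable
            if key not in self._store:
                self._store[key] = compute(x)  # cache miss: evaluate the model
            return self._store[key]

    cache = PredictionCache()
    slow_model = lambda x: sum(x)  # stand-in for an expensive model call
    print(cache.get_or_compute("m1", [1, 2], slow_model))  # computed
    print(cache.get_or_compute("m1", [1, 2], slow_model))  # served from cache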

Model Selection Layer: Selection Policy

Selection policies supported by Clipper:
Ø Exploit multiple models to estimate confidence
Ø Use multi-armed bandit algorithms to learn the optimal model selection online
Ø Online personalization across ML frameworks

*See paper for details [NSDI 2017]
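The NSDI paper discusses bandit algorithms such as Exp3 for choosing among models; the toy selector below is written in that spirit as a sketch only, not Clipper's implementation. Names and the learning-rate constant are illustrative:

    import math, random

    class Exp3Selector:
        """Toy Exp3-style policy: one weight per model, updated from feedback."""
        def __init__(self, n_models, eta=0.1):
            self.weights = [1.0] * n_models
            self.eta = eta

        def _probs(self):
            total = sum(self.weights)
            return [w / total for w in self.weights]

        def select(self):
            # Sample a model index in proportion to its learned weight.
            r, acc = random.random(), 0.0
            probs = self._probs()
            for i, p in enumerate(probs):
                acc += p
                if r < acc:
                    return i
            return len(probs) - 1

        def observe(self, i, reward):
            # reward in [0, 1] from application feedback on model i's prediction.
            # Importance-weight by the selection probability so rarely chosen
            # models are not starved of credit.
            p = self._probs()[i]
            self.weights[i] *= math.exp(self.eta * reward / p)

    selector = Exp3Selector(n_models=3)
    i = selector.select()
    selector.observe(i, reward=1.0)  # e.g. the user accepted the prediction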

Batching to Improve Throughput

A single page load may generate many queries.

Ø Why batching helps:
  Ø Hardware acceleration
  Ø Helps amortize system overhead
Ø The optimal batch size depends on:
  Ø Hardware configuration
  Ø Model and framework
  Ø System load

Clipper Solution: adaptively trade off latency and throughput…
Ø Increase the batch size until the latency objective is exceeded (Additive Increase)
Ø If latency exceeds the SLO, cut the batch size by a fraction (Multiplicative Decrease)
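A minimal sketch of that AIMD rule; the constants and the driver loop are illustrative, not Clipper's tuned values:

    # AIMD batch-size control: grow the batch additively while the measured
    # latency stays under the SLO, cut it multiplicatively on a miss.
    def aimd_update(batch_size, observed_latency_ms, slo_ms,
                    increase=1, decrease=0.9, min_size=1):
        if observed_latency_ms > slo_ms:
            return max(min_size, int(batch_size * decrease))  # multiplicative decrease
        return batch_size + increase                          # additive increase

    # Serving-loop sketch: queued queries are drained in batches of the current size.
    batch_size = 1
    for latency in [5, 8, 12, 25, 9]:   # pretend per-batch latencies (SLO = 20 ms)
        batch_size = aimd_update(batch_size, latency, slo_ms=20)
        print(batch_size)               # 2, 3, 4, 3, 4

The single SLO violation (25 ms) pulls the batch size back, after which it probes upward again, converging near the largest batch the SLO allows.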


Conclusion

Ø Prediction-serving is an important and challenging area for systems research
  Ø Support low-latency, high-throughput serving workloads
  Ø Serve a large and growing ecosystem of ML frameworks
Ø Clipper is a first step towards addressing these challenges
  Ø Simplifies deployment through a layered architecture
  Ø Serves many models across ML frameworks concurrently
  Ø Employs caching, adaptive batching, and container scale-out to meet interactive serving workload demands
  Ø Moving beyond an academic prototype to build a real, open-source system

https://github.com/ucbrise/clipper
crankshaw@cs.berkeley.edu