DAWN: Infrastructure for Usable Machine Learningdawn.cs.stanford.edu/assets/dawn-overview.pdfDAWN:...

DAWN: Infrastructure for Usable Machine LearningPeter Bailis, Kunle Olukotun, Chris Ré, Matei Zaharia

It’s the Golden Age of DataIncredible advances in image recognition, natural language processing, planning, info retrieval

Society-scale impact: autonomous vehicles, personalized medicine, human trafficking

No end in sight for advances in ML

*for the best-funded, best-trained engineering teams

Building ML Products is Too Hard

Major successes (e.g., AlphaGo, ImageNet) require hundreds to thousands of engineers

Huge effort in data preparation, model tuning, experimentation, and productionizing

Domain experts cannot easily or cheaply build ML products

“Only a fraction of real-world ML systemsis composed of ML code”

The DAWN QuestionWhat if anyone with domain expertise could build their own production-quality ML products?• Without a PhD in machine learning• Without being an expert in systems• Without understanding the latest hardware

It’s happened before

It’s happened before: SearchBefore: Decades of research on information retrieval, indexes, ranking, etc

After: any developer can add search to an application by linking a library (e.g. Solr, Lucene); everyone (i.e., non-expert users) uses search

It’s happened before: SQLBefore: raw access to disk, manual layout of records, network databases (CODASYL)

After: SQL forms basis for transactional engines, data warehousing, business intelligence tools

Key idea: end-to-end systems that tackle the barriers to access & production use

The DAWN StackData Acquisition Feature Engineering Model Training Productionizing

Snorkel

DeepDive

MacroBase (Streaming Data)

NoScope (Video)

AutoRec, SimDex (Recommendation)

Data Fusion

Mulligan (SQL+graph+ML)

CPU GPU FPGA Cluster Mobile

New Hardware: FuzzyBit, Plasticine CGRA

End-to-End Compilers: Weld, Delite

ModelQAModelSnap

Example: MacroBasefor Continuous Analytics

End-to-end system to prioritize user attention

MacroBasemulti-dimensionaldata streams

anomalies &explanations

Too much data for manual inspectionEven harder when data is streaming

MacroBase SummaryEnd-to-end system to prioritize user attention• No ML expertise needed: MacroBase uses general

models and tunes them automatically• No separate step for production use• Co-design from algorithms to HW

Open source: github.com/stanford-futuredata/macrobase

Early users: automotive, cloud, mobile apps, manufacturing

Snorkel

DeepDive

NoScope (Video)

Data Fusion

ModelQAModelSnap

Weld: Rethinking the Interface to Data Analytics Libraries

Standard approach: users combine libraries using function calls that pass data via memory

Problem: for data-intensive apps, data movementcost dominates on modern hardware!

5-30x slowdowns in NumPy, Spark, TensorFlow, …

machine learningSQL graph

algorithms

DiverseAnalytics Tasks

CPUs GPUs FPGAsDiverseHardwarePlatforms

Weld IRCommonRuntime

Weld’s Approach

Open source: weld.stanford.edu

TPC-H Logistic RegressionVector Sum

Results: Existing Frameworks

10 15 20 25 30 35 40 45

TPC-H Q1 TPC-H Q6

Runtim

Workload

SparkSQLWeld

0 0.02 0.04 0.06 0.08 0.1

0.12 0.14 0.16 0.18 0.2

Runtim

NPNExpr

Integration effort: 500 lines glue, 30 lines/operator

LR (1T) LR (12T)

Runtim

cs; lo

Workload

TFHand-opt

1 Core 12 Cores

Results: Cross-Library Optimization

CurrentWeld, no CLOWeld, CLOWeld, 12 core

Pandas + NumPy

Scala UDFWeld

Spark SQL UDF

CLO = cross-library optimizationOpen source: weld.stanford.edu

Snorkel

DeepDive

NoScope (Video)

Data Fusion

ModelQAModelSnap

NoScope: Fast CNN-BasedVideo Queries

Opportunity: CNNs allow more accurate queries on visual data than ever

Challenge : processing 1 video in real time requires a $1000 GPU

Result: same accuracy but100-3000x faster through:• Scene-specific distillation• Temporal + spatial locality

bit.ly/NoScopeArxiv

Snorkel

DeepDive

NoScope (Video)

Data Fusion

ModelQAModelSnap

Training data is key enabler, barrier to entry

How can we leverage data that’s expensive to label at scale?

github.com/HazyResearch/snorkel

Snorkel’s Approach:Weak Supervision

1) User writes labeling functions: short programs that may not always give right label• E.g. regex to search in text

2) Snorkel simultaneously learns noise in LFs and a noise-aware target model (e.g. LSTM)

4 hours LF coding with bio experts: match months of hand-labeling

high-quality models from low-quality, scalable labeling functions

System NCBI Disease (F1)

CDR Disease(F1)

CDR Chem. (F1)

TaggerOne (Dogan, 2012)* 81.5 79.6 88.4Snorkel: Logistic Regression 79.1 79.6 88.4Snorkel: LSTM + Embeddings 79.2 80.4 88.2

DAWN: machine learning for everyone via novel techniques and interfaces that span hardware, systems, and algorithms

Find out more at dawn.cs.stanford.edu

Peter Bailis Chris Ré Kunle Olukotun Matei Zaharia

DAWN: Infrastructure for Usable Machine Learningdawn.cs.stanford.edu/assets/dawn-overview.pdfDAWN:...

Documents

Transcript of DAWN: Infrastructure for Usable Machine Learningdawn.cs.stanford.edu/assets/dawn-overview.pdfDAWN:...

Usable Presentation Design

Ram Usable

Space is the machine Bill Hillier - Dana Bixby Architecturedanabixby.com/media/2016/04/Space-is-the-Machine-Bill-Hillier.pdf · (meaning precise enough for the ideas to be usable

MT－76D...(2) New part (3) New part usable with old machine, but old part not usable with new machine (reference section 1). (4) Old part number is used through this machine serial

DAWN: Infrastructure for Usable Machine LearningThe DAWN Stack Data Acquisition Feature Engineering Model Training Productionizing aces ms ms e … Snorkel DeepDive MacroBase(Streaming

(Full Text Version) The Leadership Machine: All the Research about Women's Career Advancement Summed Up in One Usable Diagram

DAWN: Infrastructure for Usable Machine Learning · DAWN: Infrastructure for Usable Machine Learning Peter Bailis, KunleOlukotun, Chris Ré, MateiZaharia. It’s the Golden Age of

Usable Image

Stanford Data Science Initiative...the Microsoft Azure cloud Democratizing machine learning in the Stanford DAWN project After DAWN: The potential of usable machine learning today

Creating Usable Data

Accelerating Machine Learning Development with · Accelerating Machine Learning Development with Matei Zaharia @matei_zaharia. ... usable machine learning Cloud platform for large-scale

Usable Security

Re-usable metadata, re-usable content

Splunk Enterprise Overview - zc.proact.lvzc.proact.lv/wp-content/uploads/2017/03/Splunk Enterprise Overview.… · Splunk Enterprise Overview. 2 Make machine data accessible, usable

Privacy-friendly machine learning algorithms for intrusion ... · In this thesis, we present a set of practically usable machine-learning algorithms for intrusion detection systems

RE-USABLE WRITING BOOKS CATALOGUEgreenindiabooks.com/images/e-catalogue.pdfRE-USABLE WRITING BOOKS CATALOGUE MRP ` 199/-2/3 letters book MY RE-USABLE Manufacturer of Re-usable Writing

Designing usable forms

Dawn Dawn Mission. Dawn How Do We Get There? Dawn DAWN A Journey to the Beginning of the Solar System Vesta Travel Plans: Dawn’s Itinerary The Dawn Spacecraft.

Computer Apps. Computer concepts. What is a Computer? Turning data into usable information.. inputperforms processingproduces outputA computer is a machine.

Usable Content