VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

33
Vegas The Missing Matplotlib for Scala/Spark DB Tsai Roger Menezes

Transcript of VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Page 1: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

VegasThe Missing Matplotlib for Scala/Spark

DB TsaiRoger Menezes

Page 2: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Homepage Kids Page Downloads Page

Netflix Recommendations

Every aspect of the Experience is MachineLearned

Page 3: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

3

2017> 100M members> 190 countries

Page 4: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Multiple Devices

Page 5: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Genres: 23 rows/page average

Sims: 10 rows/page average

Page 6: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

My List:

Continue Watching:

Popular on Netflix:

Trending Now:

Watch It Again:

Top Picks:

Because You Watched:

Genres:

New Releases:

Recently Added:

Originals RowBillboard:

Page 7: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

ML at Netflix

● Optimize the Experimentation usecase vs Productionization● Experimentation

○ Opportunity sizing, Data Exploration○ Tweaks to ML algos○ Feature Selection○ Model Evaluation

Page 8: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Notebooks

● Optimal for Experimentation● Sharing reproducible research

○ Facilitates feedback loop with PMs● End to end ML experiment.

○ Interactivity drives productivity

Page 9: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Python Notebooks

Page 10: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Python Notebooks

● Seamless Experience - ML experimentation● Well known Scientific computing libraries● Huge catalog of Visualization plotting libraries

○ Matplotlib, Seaborn, Bokeh, BQPlot, Lightning, etc.

Page 11: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Scala Notebooks● Zeppelin, Jupyter, Databricks, Spark-Notebooks, ...● Computing library gap filling up● Lack of Visualization Libraries

○ Main friction point in adoption○ End to End ML use case not convincing

Page 12: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Introducing Vegas● Visualization Library in Scala● Mainly built for the notebook use case● Scala wrapper around Vega-Lite● Missing MatPlotLib for the Scala and Spark world.

Page 13: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

VegaLite● Statistical Visualization● Design considerations for vega-lite

○ Imperative vs Declarative API

Page 14: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

DECLARATIVE

STATISTICAL

VISUALIZATION GRAMMAR

IN SCALA

You tell it WHAT should be done with the data, and it knowsHOW to do it!

Operations such as filtering, aggregation, faceting are built into the visualization, rather than putting the burden on the user to massage the data into shape.

Complex visualizations can be built with a few high level abstractions:

DATA TRANS-FORMS SCALES GUIDES MARKS

cf : Altair Talk by Brian Granger in PyData 2016 https://youtu.be/v5mrwq7yJc4

Page 15: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Added Bonus of Declarative Visualizations:

INTERACTIVITY!

D3JS

VEGAS

VEGAS CODE EXPANDS OUT TO D3JS CODE!

Page 16: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Anatomy of a plot: Channels

X/Y channel

Shape Channel

Size Channel

Color Channel

Page 17: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Features…

Page 18: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

1. Supports most plot types

Page 19: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

2. Trellis plots

Page 20: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

3. Layers

Layer 1.

Layer 2.

Layer 3.

Page 21: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

4. Notebook and Consoles

Page 22: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

5. Built-in spark support

Vegas .withDataFrame(myDataFrame) .encodeX(“population”) .encodeY(“age”)

Mapped Columns

Pass In DF.

Page 23: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

6. Visual statistics

● Advanced Binning

● Sorting

● Scaling

● Custom Transforms

● Time Series

● Aggregation

● Filtering

● Math functions (log, etc)

● Missing data support

● Descriptive Statistics

Page 24: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

How It Works !

Page 25: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

1. Specify in Scala

2. Embed HTML (iFrame)

3. Render within iFrame using JS

Page 26: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

VEGA

D3JS

VEGA-LITE*

VEGAS

MORE ABS

TRACT

ION

SCALA DSL EMITS TYPE-CHECKED VEGA-LITE JSON

VEGA-LITE CONVERTS INTERNALLY TO VEGA JSON SPEC

VEGA TRANSLATES JSON TO D3JS CODE THAT CAN BE VERY VERBOSE

A SCALA DSL FOR VEGA-LITE

* Vega-Lite

Page 28: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

What’s coming

Page 29: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

1. Interactive selections

Page 30: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

2. Selections transforms

Page 31: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Contributors

Page 32: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

Thank you.

Page 33: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai

@NetflixResearch@rogermenezes @dbtsai

The missing MatPlotLib for Scala/Spark

http://vegas-viz.org