Page 1: Spark Hsinchu meetup

Spark Summit 2016 @San Francisco

Stana He

Page 2: Spark Hsinchu meetup

• gitter https://gitter.im/hubertfc/SparkHsinchu

• Gitter app https://gitter.im/apps

• meetup https://www.meetup.com/Apache-Spark-Hsinchu/

Page 3: Spark Hsinchu meetup

Who am I?

• Stana He

• Is-land

Page 4: Spark Hsinchu meetup

Agenda

• Something about Apache Spark

• Enterprise use case

Page 5: Spark Hsinchu meetup

What is Apache Spark?

• Open source cluster computing framework.

• Originally developed at UC Berkeley's AMPLab.

• Donated to the Apache Software Foundation.

Page 6: Spark Hsinchu meetup

Benefits of Apache Spark

• Speed

- Up to 100x faster than Hadoop MapReduce for large-scale, in-memory data processing.

• Ease of Use

- Easy-to-use APIs.

• Unified Engine

- Packaged with higher-level libraries for streaming data, SQL queries, machine learning, and graph processing.

Page 7: Spark Hsinchu meetup

What’s New in 2.0?

• Structured API improvements

- SQL, DataFrames, Datasets

• Structured Streaming

• MLlib model export

• MLlib R bindings

• SQL 2003 support

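• Scala 2.12 support (note: the 2.0 release builds against Scala 2.11 by default; 2.12 support arrived in later releases)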

Page 8: Spark Hsinchu meetup

What’s New in 2.0?

• Whole-stage code generation

- Fuses multiple operators into a single generated function

• Optimized input / output

- Apache Parquet + built-in cache

reference:http://www.slideshare.net/databricks/spark-summit-san-francisco-2016-matei-zaharia-keynote-apache-spark-20
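To make the "fuse across multiple operators" idea concrete, here is a minimal sketch in plain Python. It is not Spark's generated code, and every function name in it is hypothetical. It only contrasts an operator-at-a-time pipeline, where each row passes through a chain of separate iterators, with a single fused loop that computes the same result.

```python
from typing import Callable, Iterable, Iterator, List


# Operator-at-a-time style: each operator is its own iterator, so every
# row crosses several generator boundaries before it is aggregated.
def scan(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        yield row


def filter_op(child: Iterator[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    for row in child:
        if predicate(row):
            yield row


def project_op(child: Iterator[int], fn: Callable[[int], int]) -> Iterator[int]:
    for row in child:
        yield fn(row)


def sum_operator_at_a_time(rows: List[int]) -> int:
    pipeline = project_op(
        filter_op(scan(rows), lambda x: x % 2 == 0),
        lambda x: x * 10,
    )
    return sum(pipeline)


# Fused style: the same scan -> filter -> project -> sum plan written as
# one loop, with no per-operator call overhead between steps.
def sum_fused(rows: List[int]) -> int:
    total = 0
    for x in rows:
        if x % 2 == 0:
            total += x * 10
    return total


data = list(range(10))
unfused_result = sum_operator_at_a_time(data)
fused_result = sum_fused(data)
```

Both functions return the same value for the same input. The fused version simply avoids handing each row from one operator object to the next, which is the effect this slide attributes to whole-stage code generation.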

Page 9: Spark Hsinchu meetup

Enterprise use case

Page 11: Spark Hsinchu meetup

Winning the game with Spark!

Page 12: Spark Hsinchu meetup

Unfortunately, it doesn’t!

Page 13: Spark Hsinchu meetup

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 14: Spark Hsinchu meetup

Players and Data

• 67+ million monthly active players

• 500+ billion data points per day

• 26 petabytes of data collected since beta

Page 15: Spark Hsinchu meetup

What does Spark do?

• Spark SQL: data exploration and reporting

• Spark Streaming: network performance

• Spark MLlib: recommendation system

Page 16: Spark Hsinchu meetup

Spark SQL - Data exploration and reporting

Page 17: Spark Hsinchu meetup

Performance

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 18: Spark Hsinchu meetup

Spark Streaming - Network performance

Page 19: Spark Hsinchu meetup

Build network

Riot Direct

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 20: Spark Hsinchu meetup

Normal Network Model

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 21: Spark Hsinchu meetup

Detection model

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 22: Spark Hsinchu meetup

Another detection model

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 23: Spark Hsinchu meetup

[Architecture diagram; labels: Kafka, Spark, Consume/Aggregate, Model Building/Evaluation, HIVE (stores aggregated data), Elasticsearch, Alerts, Dashboards]

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark
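The Consume/Aggregate and Alerts stages named on this slide can be sketched roughly as follows. This is a plain-Python illustration, not the pipeline from the talk: the `Event` fields, the 60-second tumbling window, and the fixed latency threshold are all assumptions made for the example. A fixed threshold stands in for whatever detection model the real system uses, and a list stands in for events consumed from Kafka.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: int     # seconds (hypothetical field)
    region: str        # hypothetical grouping key
    latency_ms: float  # hypothetical network metric


WindowKey = Tuple[int, str]  # (window start in seconds, region)


def aggregate(events: Iterable[Event], window_s: int) -> Dict[WindowKey, float]:
    """Group events into tumbling windows per region and return mean latency."""
    sums: Dict[WindowKey, float] = defaultdict(float)
    counts: Dict[WindowKey, int] = defaultdict(int)
    for e in events:
        window_start = (e.timestamp // window_s) * window_s
        key = (window_start, e.region)
        sums[key] += e.latency_ms
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def find_alerts(means: Dict[WindowKey, float], threshold_ms: float) -> List[WindowKey]:
    """Return the windows whose mean latency exceeds the (assumed) threshold."""
    return sorted(key for key, mean in means.items() if mean > threshold_ms)


# In the real pipeline these would arrive from Kafka; here they are a list.
events = [
    Event(0, "na", 40.0),
    Event(30, "na", 60.0),
    Event(65, "na", 200.0),
    Event(70, "na", 220.0),
    Event(10, "eu", 35.0),
]

means = aggregate(events, window_s=60)
fired = find_alerts(means, threshold_ms=100.0)
```

With this sample data, only the second "na" window crosses the threshold, so it is the only entry in `fired`. The aggregated means are the kind of values that could then be stored or indexed for dashboards.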

Page 24: Spark Hsinchu meetup

Spark MLlib - Recommendation system

Page 25: Spark Hsinchu meetup

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 26: Spark Hsinchu meetup

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 27: Spark Hsinchu meetup

[Architecture diagram; labels: Game Server, Data, HIVE, Spark, SparkSQL, MLlib, Explore/Feature engineering, Modeling/Evaluation, Feature, Recommendation]

reference:http://www.slideshare.net/SparkSummit/video-games-at-scale-improving-the-gaming-experience-with-apache-spark

Page 28: Spark Hsinchu meetup

Q & A

Page 29: Spark Hsinchu meetup

Spark Summit

• https://spark-summit.org/2016/

• Slides and video

Page 30: Spark Hsinchu meetup

Spark Cookbook

• Ch1. Getting Started with Apache Spark (Chunhung Huang) (4 )

• Ch2. Developing Applications with Spark ( )

• Spark RDD (Allen )

• Ch3. External Data Sources ( ) (8 )

• Ch4. Spark SQL ( )

• Ch5. Spark Streaming ( )

• Ch6. Getting Started with Machine Learning Using MLlib ( )

• Ch7. Supervised Learning with MLlib - Regression (Dean Du)

• Ch8. Supervised Learning with MLlib - Classification ( )

• Ch9. Unsupervised Learning with MLlib (Vito)

• Ch10. Recommender System (Leorick)

• Ch11. Graph Processing using GraphX ( )

• Ch12. Optimizations and Performance Tuning ( )