(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014

Posted on 02-Jul-2015


As Netflix expands its service to more countries, devices, and content, it continues to evolve its big data analytics platform to meet the growing need for product and consumer insights. This year, Netflix reworked its big data platform: it upgraded to Hadoop 2, transitioned to the Parquet file format, experimented with Pig on Tez for the ETL workload, and adopted Presto as its interactive query engine. In this session, Netflix discusses its latest architecture, how it was built on the Amazon EMR infrastructure, the contributions it made to the open source community, and some performance numbers for running a big data warehouse on Amazon S3.

Transcript of (BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014

November 12, 2014 | Las Vegas, NV

Eva Tse, Netflix

[Slide: data pipeline. Event data: cloud apps → Suro → Ursula → Amazon S3 (15 min). Dimension data: Cassandra → Aegisthus → Amazon S3 (daily, from SSTables).]
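The event-data path on the pipeline slide lands data in Amazon S3 on a 15-minute cadence. The sketch below shows one way such batching could map event timestamps to S3 partitions. It is a minimal illustration under stated assumptions: the bucket name, the `dateint`/`hour`/`batch` partition layout, and the function names `batch_start`, `s3_key`, and `group_by_batch` are all hypothetical, not Netflix's actual Suro or Ursula implementation.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical batch window chosen to mirror the "15 min" label on the slide.
BATCH_MINUTES = 15


def batch_start(ts: datetime, minutes: int = BATCH_MINUTES) -> datetime:
    """Round a timestamp down to the start of its batch window."""
    floored = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored, second=0, microsecond=0)


def s3_key(prefix: str, event_type: str, ts: datetime) -> str:
    """Build a hypothetical Hive-style partitioned S3 prefix for one batch."""
    start = batch_start(ts)
    return (f"{prefix}/{event_type}/"
            f"dateint={start:%Y%m%d}/hour={start:%H}/batch={start:%M}")


def group_by_batch(events, prefix: str) -> dict:
    """Group (event_type, timestamp, payload) tuples by their target S3 prefix."""
    batches = defaultdict(list)
    for event_type, ts, payload in events:
        batches[s3_key(prefix, event_type, ts)].append(payload)
    return dict(batches)


if __name__ == "__main__":
    utc = timezone.utc
    events = [
        ("playback", datetime(2014, 11, 12, 14, 22, 7, tzinfo=utc), {"id": 1}),
        ("playback", datetime(2014, 11, 12, 14, 29, 59, tzinfo=utc), {"id": 2}),
        ("playback", datetime(2014, 11, 12, 14, 30, 0, tzinfo=utc), {"id": 3}),
    ]
    for key, payloads in group_by_batch(events, "s3://example-bucket/events").items():
        print(key, len(payloads))
```

The first two events fall into the 14:15 window and the third starts the 14:30 window, so each window becomes one partition that a downstream Hive or Pig job could read as a unit.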

Storage | Compute | Service | Tools

Amazon S3 | v2.0

• Works well on Amazon Simple Storage Service (S3)

YARN-1864, YARN-2026, YARN-2012, YARN-2214, YARN-2360, YARN-2540


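[Slide: Pig on Tez. Pig compiles a script into a Logical Plan and then a Physical Plan. The MRCompiler turns the Physical Plan into an MR Plan run by the MR Execution Engine; the TezCompiler turns it into a Tez Plan run by the Tez Execution Engine.]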
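Both back ends share the Logical and Physical Plan; they differ in how the Physical Plan is cut into executable units. The toy model below illustrates that difference and is not Pig's actual MRCompiler or TezCompiler code. The operator names, the `BLOCKING` set, the linear plan, and the rule that one MapReduce job holds at most one shuffle are simplifying assumptions. Under them, a chain of blocking operators becomes several MapReduce jobs that write intermediate results to storage and read them back, whereas Tez runs the same chain as one DAG whose vertices are connected by shuffle edges.

```python
# Toy model: operators that force a shuffle (data redistribution by key).
BLOCKING = {"GROUP", "JOIN", "ORDER"}


def mr_compile(ops: list[str]) -> list[list[str]]:
    """Cut a linear physical plan into MapReduce jobs, at most one shuffle each.

    When a second blocking operator appears, the current job writes its
    output to storage (STORE_TMP) and the next job reads it back (LOAD_TMP).
    """
    jobs: list[list[str]] = []
    current: list[str] = []
    has_shuffle = False
    for op in ops:
        if op in BLOCKING and has_shuffle:
            jobs.append(current + ["STORE_TMP"])
            current, has_shuffle = ["LOAD_TMP"], False
        current.append(op)
        if op in BLOCKING:
            has_shuffle = True
    jobs.append(current)
    return jobs


def tez_compile(ops: list[str]) -> tuple[list[list[str]], list[tuple[int, int]]]:
    """Cut the same plan into vertices of a single Tez-style DAG.

    Each blocking operator starts a new vertex; consecutive vertices are
    joined by a shuffle edge, so no STORE_TMP/LOAD_TMP round trip is needed.
    """
    vertices: list[list[str]] = []
    current: list[str] = []
    for op in ops:
        if op in BLOCKING and current:
            vertices.append(current)
            current = []
        current.append(op)
    vertices.append(current)
    edges = [(i, i + 1) for i in range(len(vertices) - 1)]
    return vertices, edges


if __name__ == "__main__":
    plan = ["LOAD", "FILTER", "GROUP", "FOREACH", "JOIN", "FOREACH", "STORE"]
    print("MR jobs:", mr_compile(plan))
    print("Tez DAG:", tez_compile(plan))
```

With two shuffles in the plan, the MR path yields two jobs joined by an intermediate write and read, while the Tez path yields three vertices in one DAG. In this model the gap widens with each additional shuffle, which is one reason multi-stage ETL scripts are a natural fit for trying Tez.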

Presto: A Distributed SQL Query Engine for Big Data

techblog.netflix.com

21 committed PRs and 14 PRs in review


HIVE-6783, HIVE-6785, HIVE-6938, HIVE-7800

PARQUET-100, PARQUET-106, PARQUET-2, PARQUET-22, PARQUET-70, PARQUET-75, PARQUET-92, PARQUET-99

PIG-3986

Talk | Time | Title
PFC-305 | Wednesday, 1:15pm | Embracing Failure: Fault Injection and Service Reliability
BDT-403 | Wednesday, 2:15pm | Next Generation Big Data Platform at Netflix
PFC-306 | Wednesday, 3:30pm | Performance Tuning EC2
DEV-309 | Wednesday, 3:30pm | From Asgard to Zuul, How Netflix’s proven Open Source Tools Can Accelerate and Scale Your Services
ARC-317 | Wednesday, 4:30pm | Maintaining a Resilient Front-Door at Massive Scale
PFC-304 | Wednesday, 4:30pm | Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
ENT-209 | Wednesday, 4:30pm | Cloud Migration, Dev-Ops and Distributed Systems
APP-310 | Friday, 9:00am | Scheduling using Apache Mesos in the Cloud

http://bit.ly/awsevals