Big Data is Dead, Long Live Business Intelligence?

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Michael Muckel, Head of Data Platform Markus Schmidberger, Data Platform Architect Glomex GmbH – A ProSiebenSat.1 Media SE company Berlin, April 12 th 2016 Big Data is Dead, Long Live Business Intelligence? berlin

Page 1

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Michael Muckel, Head of Data Platform

Markus Schmidberger, Data Platform Architect

Glomex GmbH – A ProSiebenSat.1 Media SE company

Berlin, April 12th 2016

Big Data is Dead, Long Live Business Intelligence?


Page 2

Glomex: A ProSiebenSat.1 company

Page 3

Glomex – The Global Media Exchange

Publishers

Content providers

Video Value Platform

Media Delivery Platform

Media Exchange Platform

Glomex

External broadcasters

Web-only content owners

Non-P7S1 publishers

Page 4

Glomex – Data Platform

Video Value Platform, Media Delivery Platform, Media Exchange Platform

Data Platform

Real-Time Monitoring, Batch Analytics, Machine Learning

Page 5

Key Components of our New Data Platform

Content Discovery: Find the most relevant content for our customers and their users.

Real-Time Monitoring: Enable our development teams to serve our content to our users in the best quality possible.

Analytics: Provide our teams access to the data to enable data-driven development of new features and products.

Page 6

Lambda Architecture

Graphic provided by http://lambda-architecture.net

≠ AWS Lambda
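
The Lambda Architecture named on this slide combines a batch layer, a speed layer and a serving layer. The following is a minimal sketch of that idea, not the Glomex implementation: the event fields (`video_id`, `ts`), the function names and the sample data are all hypothetical, and in-memory lists stand in for real storage and streams.

```python
from collections import Counter


def batch_view(master_dataset, cutoff):
    """Batch layer: recompute a complete view from all events up to `cutoff`."""
    return Counter(e["video_id"] for e in master_dataset if e["ts"] <= cutoff)


def speed_view(recent_events, cutoff):
    """Speed layer: cover only the events the last batch run has not seen."""
    return Counter(e["video_id"] for e in recent_events if e["ts"] > cutoff)


def query(video_id, batch, speed):
    """Serving layer: answer a query by merging the batch and speed views."""
    return batch[video_id] + speed[video_id]


# Hypothetical play events; `ts` is a simple logical timestamp.
events = [
    {"video_id": "a", "ts": 1},
    {"video_id": "b", "ts": 2},
    {"video_id": "a", "ts": 3},
    {"video_id": "a", "ts": 4},
]
cutoff = 2  # timestamp of the last completed batch run
batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
print(query("a", batch, speed))  # 3
```

The batch view can be thrown away and recomputed from the raw events at any time, while the speed view only has to bridge the gap since the last batch run.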

Page 7

Simplify Data Processing

data → ingest / collect → store → process / analyze → visualize / serve → answers

Time to Answer (Latency), Throughput, Cost

(more concrete numbers at the end)

Page 8

Data Processing in Big Data World

Stages: Collect, Store, Analyze, Consume

Sources: Applications, Mobile Apps (iOS, Android), Web Apps, Logging, IoT

Data types: Transactional Data, File Data, Stream Data, Search Data

Storage categories: Search, SQL, NoSQL, Cache, Stream Storage, File Storage

Processing categories: Stream Processing, Batch, Interactive, ML, ETL

Services and tools: Logstash, Amazon RDS, Amazon DynamoDB, Amazon ES, Amazon S3, Apache Kafka, Amazon Glacier, Amazon Kinesis, Amazon ElastiCache, Amazon Redshift, Impala, Pig, Amazon ML, Streaming, AWS Lambda, Amazon Elastic MapReduce, Amazon QuickSight

Consumers: Analysis & Visualization, Notebooks, Predictions, Apps & APIs, IDE

Temperature and speed labels: Hot, Warm, Cold; Fast, Slow

Page 9

Our Data Platform Architecture

Data Platform - MicroService Layout

Stages: INGEST, STORE, PROCESS & ANALYSE, VISUALIZE & SERVE

External elements: Portal, Team, CDN files, data streams, other modules

Import services: AdProxy Log Import Service, Player Feedback Import Service, VAS Log Import Service, CDN Log Import Service, Content Import Service, External Data Import Service

Data Layer, Data API, Data Lake

Platform services: Metadata Service, Data Platform Monitoring Service, Data Quality Service, Data Management Service

Analytics services: Technical Monitoring Service, Dev / Ops Analytics Service, Data Science Analytics Service, Content Discovery Service, KPI & Analytics Service

Interfaces: Data Platform Access, Real-Time Dashboards, Data Science UI, Content API

Page 10

Real-Time Player Monitoring


Page 11

Monitoring Video-Streaming Experience

Focus on Metrics from the User's Perspective

From Server-Uptime To (anonymized) Real-User Monitoring

Page 12

1. Analyze

2. Take Actions

3. Automate

Page 13

Our Ingest Process

Page 14

Kinesis Firehose is doing its job

Next session: “Streaming Data: The Opportunity and How to Work With It”

Page 15

Data Facts

20 GB: click-stream data per day in Kinesis Firehose

5 Billion: records processed per day

~100 ms: data freshness to S3

Page 16

ElasticSearch + Grafana for real-time analyses

Not AWS managed!

Page 17

ElasticSearch on Spot Instances

Page 18

CDN 'Batch Processing'


Page 19

Processing CDN-Logs

25 GB: zipped log files per day

300 Million: records processed per day

Plus the normal challenges with external data sources: out-of-order delivery, data quality issues, varying file sizes, etc.

Page 20

Requirements for our Data Processing Pipeline

Monitor Complete Pipeline

Enable Reprocessing of Historical Datasets

Be Ready to Scale

Page 21

Our CDN Pipeline

Page 22

• How to process an 800 MB gzipped log file?

• How to split compressed gzip files?

• Splitter using Amazon SQS and Amazon EC2 Spot Instances

AWS Lambda Limits

5 min: AWS Lambda timeout

512 MB: AWS Lambda temp disk
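
A gzip stream cannot be cut at arbitrary byte offsets, so one way to split a large file is to decompress it sequentially and re-compress it into smaller, line-aligned parts. The sketch below illustrates that idea only. It is not the splitter used in the talk: the function name `split_gzip`, the `lines_per_part` parameter and the in-memory sample data are hypothetical, and the Amazon SQS and EC2 Spot parts are left out.

```python
import gzip
import io


def split_gzip(stream, lines_per_part):
    """Yield gzip-compressed parts with at most `lines_per_part` lines each.

    The input is read sequentially, so only one part is held in memory.
    """
    part = []
    with gzip.open(stream, "rb") as src:
        for line in src:
            part.append(line)
            if len(part) == lines_per_part:
                yield gzip.compress(b"".join(part))
                part = []
    if part:  # emit the remaining lines as a final, shorter part
        yield gzip.compress(b"".join(part))


# In-memory stand-in for a large gzipped log file.
raw = b"".join(b"log line %d\n" % i for i in range(10))
big_file = io.BytesIO(gzip.compress(raw))

parts = list(split_gzip(big_file, lines_per_part=4))
print(len(parts))  # 3
```

Each part is a valid gzip file on its own, so it can be handed to a separate worker whose time and disk limits are smaller than the original file would require.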

Page 23

Our Meta Data Store

AWS Big Data Blog: https://blogs.aws.amazon.com/bigdata/post/Tx2YRX3Y16CVQFZ/Building-and-Maintaining-an-Amazon-S3-Metadata-Index-without-Servers

Page 24

Our Meta Data Store

Page 25

Be serverless and serve data

AWS Lambda (×2), Amazon API Gateway, Amazon Kinesis
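
As a rough illustration of serving data without servers, the sketch below shows a Lambda-style handler that could sit behind Amazon API Gateway. It assumes the Lambda proxy integration event and response shape (`pathParameters`, `statusCode`, `body`). The handler name, the `video_id` parameter and the `FAKE_STORE` dictionary are hypothetical and only stand in for whatever store the platform actually queries.

```python
import json

# Hypothetical in-memory stand-in for the platform's data store.
FAKE_STORE = {"video-1": {"plays": 120}, "video-2": {"plays": 45}}


def handler(event, context):
    """Lambda entry point: look up one item and return it as JSON."""
    # With proxy integration, path parameters may be missing or None.
    video_id = (event.get("pathParameters") or {}).get("video_id")
    item = FAKE_STORE.get(video_id)
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(item),
    }


# Local invocation with a hand-built event; no AWS account is needed.
response = handler({"pathParameters": {"video_id": "video-1"}}, None)
print(response["statusCode"], response["body"])
```

Because the handler is a plain function, it can be exercised locally with hand-built events before it is deployed.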

Page 26

CDN Batch Facts

600 rec/sec: processing time

1 $ / hour: cost for 25 GB/day CDN processing

6: parallel AWS Lambda functions

2.3 min: average run-time of AWS Lambda

Charts: AWS Lambda duration, Redshift CPU
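
A back-of-the-envelope check can relate these figures to the 300 million records per day on the "Processing CDN-Logs" slide. It rests on an assumption the slide does not state: that 600 rec/sec is the rate of a single AWS Lambda function and that the six functions run in parallel.

```python
RECORDS_PER_DAY = 300_000_000   # from the "Processing CDN-Logs" slide
REC_PER_SEC_PER_FUNCTION = 600  # assumption: the slide's figure is per function
PARALLEL_FUNCTIONS = 6          # from this slide
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate needed to keep up with one day of CDN logs.
required_rate = RECORDS_PER_DAY / SECONDS_PER_DAY

# Combined rate of all parallel functions under the assumption above.
available_rate = REC_PER_SEC_PER_FUNCTION * PARALLEL_FUNCTIONS

print(round(required_rate), available_rate)  # 3472 3600
```

Under that assumption the combined rate is slightly above the average daily load, which would be consistent with the pipeline keeping up with incoming logs.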

Page 27

Data Science Environment


Page 28

Data Science Environment

Project Jupyter: http://jupyter.org/

Page 29

Data Science Environment - Architecture

Layers: Data Sources, Cluster, Technology, Development

Components: Amazon Redshift, Amazon S3, Elasticsearch, Amazon EMR, Amazon Kinesis, Github

Two components are marked "In development".

Page 30

Our Lambda Architecture on AWS

Data Platform - Lambda Architecture

Layers: Batch Layer, Speed Layer, Serving Layer, Applications

External elements: data stream, CDN files, Portal, Team, other player modules, instance with Kinesis Agent

Components: Amazon Kinesis Firehose, S3, AWS Lambda (×3), EC2 with ElasticSearch, Amazon Redshift, Amazon Elastic MapReduce + Spark, Amazon API Gateway, EC2 with Jupyter, EC2 with Grafana, EC2 with Caravel

Page 31

Key Takeaways

Lambda Architecture

Enrich your traditional, batch-driven BI-workflow with real-time analytics

Use Lambda-Architecture as a guiding principle and adapt it to your needs

Page 32

Key Takeaways

AWS managed services provide a robust way to run complex big data infrastructures

Follow best-practices provided by AWS and the community

Focus on feature development and robust pipelines, not on infrastructure management

Page 33

Key Takeaways

Provide an open data environment

Structure your data so that it can be accessed in processed and raw form

Trust the creativity of your engineering teams to find insights in your datasets

Notebooks provide easy access to even large distributed datasets

Page 34

Michael Muckel, Head of Data Platform

Markus Schmidberger, Data Platform Architect

Glomex GmbH – A ProSiebenSat.1 Media SE company

We are hiring …

• Data Scientists

• Data Engineers

• Project Managers