AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence


description

Talk from the AWS Roadshow, autumn 2013

Transcript of AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Page 1: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

AWS Roadshow 2013
Above the clouds: free your IT

Data Analysis and Business Intelligence

Michael Hanisch, Mgr. Solutions Architecture

Matthias Jung, Solutions Architect

Constantin Gonzalez, Solutions Architect

Page 2: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

1. Introducing Big Data

2. From data to actionable information

3. Analytics and Cloud Computing

Overview

Page 3: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

1. Introducing Big Data

Page 4: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 5: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

The cost of data generation is falling

Page 6: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

The volume of data is increasing

Page 7: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Lower cost, higher throughput

Page 8: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Lower cost, higher throughput

Highly constrained

Page 9: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generated data

Available for analysis

Data volume

Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011

IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

Page 10: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Elastic and highly scalable

+ No upfront capital expense

+ Only pay for what you use

+ Available on-demand

= Remove constraints

Page 11: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Lower cost, higher throughput

Highly constrained

Page 12: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Accelerated

Page 13: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Big Data

Technologies and techniques for working productively with data, at any scale.

Page 14: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

2. From data to actionable information

Page 15: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

“Who buys video games?”

Page 16: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Per day:

3.5 billion records

13 TB of click stream logs

71 million unique cookies

Page 17: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence
Page 18: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence
Page 19: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Results:

500% return on ad spend

From 2 months of procurement time to a few minutes

Page 20: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

“Who is using our service?”

Page 21: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Identified early mobile usage

Invested heavily in mobile development

Finding signal in the noise of logs

Page 22: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

In January 2013:

9,432,061 unique mobile devices used the Yelp mobile app.

4 million+ calls. 5 million+ directions.

Page 23: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

3. Analytics and Cloud Computing

Page 24: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 25: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase

Page 26: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

EC2 & Elastic MapReduce

Page 27: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift

Page 28: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Generation

Collection & storage: S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase

Analytics & computation: EC2 & Elastic MapReduce

Collaboration & sharing: EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift

AWS Data Pipeline

Page 29: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Simple Storage Service

S3

Page 30: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Elastic MapReduce

EMR

Page 31: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

What is EMR?

Map-Reduce engine

Integrated with tools

Hadoop-as-a-service

Massively parallel

Cost-effective AWS wrapper

Integrated with AWS services
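
To make the "Map-Reduce engine" and "massively parallel" points concrete, here is a minimal single-process sketch of the MapReduce model in Python. It is not EMR or Hadoop code: the log format ("cookie page") and the names map_phase, shuffle, reduce_phase, count_page_hits and sum_counts are assumptions made for illustration. On a real cluster the map and reduce calls would be spread across many nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def count_page_hits(log_line):
    # Hypothetical click-stream line: "<cookie> <page>"
    _cookie, page = log_line.split()
    yield page, 1


def sum_counts(_key, values):
    return sum(values)


def run_job(records):
    pairs = map_phase(records, count_page_hits)
    return reduce_phase(shuffle(pairs), sum_counts)


logs = ["c1 /games", "c2 /games", "c1 /books"]
result = run_job(logs)
print(result)
```

Because each mapper call only sees one record and each reducer call only sees one key, both phases can be split across machines without coordination, which is the property the slide refers to.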

Page 32: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

How does it work?

EMR

EMR Cluster

S3

1. Put the data into S3 (or HDFS)

2. Launch your cluster. Choose:
• Hadoop distribution
• How many nodes
• Node type (hi-CPU, hi-memory, etc.)
• Hadoop apps (Hive, Pig, HBase)

3. Get the results

Page 33: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

EMR

EMR Cluster

How does it work?

S3

You can easily resize the cluster

Page 34: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

EMR

EMR Cluster

How does it work?

S3

Use Spot nodes to save time and money

Page 35: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

EMR

EMR Cluster

How does it work?

S3

Launch parallel clusters against the same data source (tune for the workload)

Page 36: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

How does it work?

EMR Cluster

S3

When the work is complete, you can terminate the cluster (and stop paying)

Page 37: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

How does it work?

You can store everything in HDFS (local disk)

High Storage nodes = 48 TB/node

EMR Cluster

Page 38: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

EMR Cluster

How does it work?

Launch in a Virtual Private Cloud for extra security

Page 39: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Thousands of Customers, 5+ Million Clusters

Page 40: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Integrates with Hadoop Ecosystem

EMR

Page 41: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Integrates with Hadoop Ecosystem

EMR

Page 42: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Give it a try: aws.amazon.com/elasticmapreduce

Cost to run a 100-node EMR cluster: EUR 6.15/hour ($8/h)
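
The hourly figure above can be turned into a rough job-cost estimate. The sketch below assumes pricing scales linearly with node count (so $8/h for 100 nodes gives 8 cents per node-hour) and that every started hour is billed in full. Both are assumptions for illustration, and cluster_cost_usd is a hypothetical helper, not an AWS API.

```python
import math

# Derived from the slide: $8 per hour for 100 nodes = 8 cents per node-hour.
CENTS_PER_NODE_HOUR = 8


def cluster_cost_usd(nodes: int, hours: float) -> float:
    """Estimate the cost in USD, billing every started hour in full (assumption)."""
    billed_hours = math.ceil(hours)
    return nodes * billed_hours * CENTS_PER_NODE_HOUR / 100


# The same 100 node-hours of work, run wide or run narrow:
wide = cluster_cost_usd(nodes=100, hours=1)
narrow = cluster_cost_usd(nodes=10, hours=10)
print(wide, narrow)
```

Under these assumptions the wide cluster costs the same as the narrow one but returns results ten times sooner, which is why terminating the cluster when the work is complete matters more than keeping it small.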

Page 43: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Photos:
renee_mcgurk https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/
Calgary Reviews https://www.flickr.com/photos/calgaryreviews/6328302248/in/photostream/

+

Page 44: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

What if all I want is a database?

Page 45: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Customers asked us for a data warehouse the AWS way:

No upfront costs, pay as you go

Really fast performance at a really low price

Open and flexible with support for popular tools

Easy to provision and scale up massively

Page 46: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Amazon Redshift is:

A fast and powerful, petabyte-scale data warehouse that is

A lot faster

A lot cheaper

A whole lot simpler

Page 47: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Amazon Redshift Dramatically Reduces IO

Column storage

Data compression

Zone maps

Direct-attached storage

Large data block sizes

Id    Age   State
123   20    CA
345   25    WA
678   40    FL
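
The column storage and zone map ideas on this slide can be illustrated with the Id/Age/State example. The following is a simplified in-memory sketch, not Redshift's actual storage format: the block size of 2 and the helper names to_columns, build_zone_map and count_greater_than are assumptions chosen for illustration.

```python
rows = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]
COLUMN_NAMES = ("id", "age", "state")


def to_columns(rows):
    """Column storage: keep each column's values together instead of each row's."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)}


def build_zone_map(values, block_size):
    """Split a column into blocks and record (min, max, block) for each block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block), block) for block in blocks]


def count_greater_than(zone_map, threshold):
    """Count values above the threshold, skipping blocks whose max rules them out."""
    matches = 0
    blocks_read = 0
    for _low, high, block in zone_map:
        if high <= threshold:
            continue  # no value in this block can match, so it is never read
        blocks_read += 1
        matches += sum(1 for value in block if value > threshold)
    return matches, blocks_read


columns = to_columns(rows)
age_zone_map = build_zone_map(columns["age"], block_size=2)
matches, blocks_read = count_greater_than(age_zone_map, threshold=30)
print(matches, blocks_read)
```

A query on Age touches only the age column (not Id or State), and the zone map lets it skip the block holding 20 and 25 entirely. Both effects reduce the amount of data read from disk.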

Page 48: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Amazon Redshift parallelizes and distributes everything

Query

Load

Backup

Restore

Resize

Page 49: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Amazon Redshift Runs on Optimized Hardware

HS1.8XL: 128GB RAM, 16 Cores, 24 Spindles, 16TB Storage, 2GB/sec scan rate

HS1.XL: 16GB RAM, 2 Cores, 3 Spindles, 2TB Storage

Optimized for I/O intensive workloads

High disk density

Runs in HPC - fast network

HS1.8XL available on Amazon EC2

Page 50: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Redshift lets you start small and grow big

Extra Large Node (XL): 3 spindles, 2TB, 15GiB RAM, 2 virtual cores, 10GigE

Single Node (2TB)

Cluster 2-32 Nodes (4TB – 64TB)

8 Extra Large Node (8XL): 24 spindles, 16TB, 120GiB RAM, 16 virtual cores, 10GigE

Cluster 2-100 Nodes (32TB – 1.6PB)

[Figure: grids of XL and 8XL node icons illustrating the cluster sizes]
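
The node counts and capacities on this slide follow from simple multiplication, for example 100 nodes x 16TB = 1.6PB. The sketch below sizes a cluster from a raw data volume using only the limits stated on the slide. It ignores compression and any storage overhead, and nodes_needed is a hypothetical helper written for illustration.

```python
import math

# Per-node capacity and node-count limits as stated on the slide.
NODE_TYPES = {
    "XL": {"tb_per_node": 2, "min_nodes": 1, "max_nodes": 32},
    "8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def nodes_needed(data_tb: float, node_type: str) -> int:
    """Return the smallest node count whose raw capacity covers data_tb."""
    spec = NODE_TYPES[node_type]
    nodes = max(spec["min_nodes"], math.ceil(data_tb / spec["tb_per_node"]))
    if nodes > spec["max_nodes"]:
        raise ValueError(
            f"{data_tb} TB needs {nodes} {node_type} nodes, "
            f"more than the maximum of {spec['max_nodes']}"
        )
    return nodes


print(nodes_needed(10, "XL"))     # small warehouse on XL nodes
print(nodes_needed(1600, "8XL"))  # the 1.6PB upper bound from the slide
```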

Page 51: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Priced to Analyze All the Customer’s Data

Plan                 Price per hour for HS1.XL single node   Effective hourly price per TB   Effective annual price per TB
On-Demand            $0.850                                  $0.425                          $3,723
1 Year Reservation   $0.500                                  $0.250                          $2,190
3 Year Reservation   $0.228                                  $0.114                          $999

Simple Pricing: Number of Nodes x Cost per Hour

No charge for Leader Node

Pay as you grow
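
The effective per-TB prices can be reproduced from the per-node hourly prices, given that an HS1.XL node holds 2TB and a year has 8,760 hours. This is a small worked calculation using the slide's 2013 figures. The function name effective_prices is an assumption, and current prices will differ.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760
TB_PER_XL_NODE = 2

# Hourly price per HS1.XL node, in USD, as listed on the slide.
node_price_per_hour = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_prices(price_per_node_hour: float):
    """Return (hourly price per TB, annual price per TB rounded to whole dollars)."""
    hourly_per_tb = price_per_node_hour / TB_PER_XL_NODE
    annual_per_tb = hourly_per_tb * HOURS_PER_YEAR
    return round(hourly_per_tb, 3), round(annual_per_tb)


table = {plan: effective_prices(price) for plan, price in node_price_per_hour.items()}
for plan, (hourly, annual) in table.items():
    print(f"{plan}: ${hourly:.3f} per TB-hour, ${annual:,} per TB-year")
```

With "Number of Nodes x Cost per Hour" pricing, the total for a cluster is the per-node hourly price multiplied by the node count and the hours it runs.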

Page 52: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Amazon Redshift Simplifies Provisioning

• Create a cluster in minutes

• Automatically patch your OS and data warehouse software

• Scale up to 1.6PB with a few clicks and no downtime

Amazon Redshift

Page 53: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Amazon Redshift Simplifies Operations

• Built-in security in transit, at rest, when backed up*

• Backup to S3 is continuous, incremental, and automatic

• Disk failures are transparent; nodes recover automatically

• Streaming restore lets you resume querying faster

Amazon S3

Clients

*SSL, Amazon VPC, AES-256 (Hardware Accelerated)

(Optional) SSL Continuous, Automatic Backup

Streaming Restore

Amazon Redshift

Page 54: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Initial Pilot Results

Current production environment: 32 nodes, 128 CPUs, 4.2TB RAM, 1.6 PB disk

Tested 2B row data set, 6 representative queries on a 2-node Amazon Redshift cluster

Queries ran > 10x faster

Page 55: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Amazon Redshift Integrates With All Data Sources

Amazon DynamoDB

Amazon Elastic MapReduce

Amazon Simple Storage Service (S3)

Amazon EC2

AWS Storage Gateway Service

Corporate Data Center

Amazon Relational Database Service (RDS)

Amazon Redshift

Page 56: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Integrates With Existing BI Tools

Connect your tools to Amazon Redshift using standard drivers from PostgreSQL.org

Amazon Redshift

JDBC/ODBC
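
Because the slide points to the standard PostgreSQL drivers, a BI tool is typically configured with an ordinary PostgreSQL-style JDBC URL. The sketch below only builds such a URL. The host name and database name are hypothetical placeholders, 5439 is Redshift's default port, and the ssl parameter should be checked against the documentation of the driver version in use.

```python
from urllib.parse import urlencode

REDSHIFT_DEFAULT_PORT = 5439


def jdbc_url(host: str, database: str, port: int = REDSHIFT_DEFAULT_PORT,
             use_ssl: bool = True) -> str:
    """Build a PostgreSQL-style JDBC URL for a cluster endpoint."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if use_ssl:
        # Ask the PostgreSQL JDBC driver to encrypt the connection.
        url += "?" + urlencode({"ssl": "true"})
    return url


# Hypothetical endpoint and database name, for illustration only.
example = jdbc_url("examplecluster.example.com", "analytics")
print(example)
```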

Page 57: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

On-Premises Integration

Data Integration Partners*

RDBMS

Redshift

OLTP

ERP

Reporting and BI

Page 58: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Cloud ETL for Big Data

• Maintain online SQL access to your historical data
• Transformation and enrichment with EMR
• Longer history ensures better insight

Redshift

Elastic MapReduce

S3

Reporting and BI

Page 59: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence


Learn More: aws.amazon.com/big-data

Page 60: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Thank you!

Page 61: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

AWS Data Pipeline

Data-intensive orchestration and automation

Reliable and scheduled

Easy to use, drag and drop

Execution and retry logic

Map data dependencies

Create and manage temporary compute resources
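
The "map data dependencies" and "execution and retry logic" points can be sketched in a few lines of Python. This is a conceptual illustration only, not the AWS Data Pipeline API or its definition format: run_pipeline, the task names and the retry limit are assumptions, and it relies on graphlib from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, depends_on, max_attempts=3):
    """Run tasks in dependency order, retrying each one up to max_attempts times.

    tasks maps a task name to a zero-argument callable;
    depends_on maps a task name to the set of tasks it must wait for.
    Returns a list of (task name, attempts used).
    """
    graph = {name: depends_on.get(name, set()) for name in tasks}
    history = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                history.append((name, attempt))
                break
            except Exception:
                if attempt == max_attempts:
                    raise RuntimeError(f"{name} failed after {max_attempts} attempts")
    return history


# Hypothetical three-step pipeline; the middle step fails once, then succeeds.
calls = {"transform": 0}


def flaky_transform():
    calls["transform"] += 1
    if calls["transform"] < 2:
        raise RuntimeError("transient failure")


tasks = {
    "copy_logs": lambda: None,
    "transform": flaky_transform,
    "load": lambda: None,
}
depends_on = {"transform": {"copy_logs"}, "load": {"transform"}}

history = run_pipeline(tasks, depends_on)
print(history)
```

A managed service adds scheduling, notifications and the creation of temporary compute resources on top of this basic ordering and retry loop.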

Page 62: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Anatomy of a pipeline

Page 63: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Additional checks and notifications

Page 64: AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence

Arbitrarily complex pipelines