The Third Why at Gx24

The Third “Why”

Julio Faerman, 2014-09-29, GX24

http://jfaerman.com.br/gx24

https://startwithwhy.com

Security

Availability

Compliance

Fault Tolerance

Throughput

Latency

…but what is the difference?

16 years

2000+ employees

40 million users

http://aws.amazon.com/solutions/case-studies/netflix/

http://www.enotechconsulting.com/2013/04/aws-s3-behind-netflix-success/

http://variety.com/2014/digital/news/netflix-youtube-bandwidth-usage-1201179643/

Amazon Web Services for 100% of Streaming

34.2% of all downstream traffic during primetime

Amazon Simple Storage Service (Amazon S3)

• Durable, scalable and fast storage (99.999999999% durability)
• 2+ trillion (10^12) objects
• 1.1+ million requests per second (RPS)
• Native HTTP/S
• Full featured: Permissions, Static Hosting, Logging, Versioning, Archival and Expiration Lifecycle, Torrent, Tags, Redundancy, Requester Pays, Encryption, Reduced Redundancy and more
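To put the eleven-nines figure in perspective, the short sketch below turns it into expected object losses per year. It assumes the percentage can be read as an independent per-object, per-year survival probability, which is a simplification; the function names are illustrative only.

```python
from fractions import Fraction

# Durability figure quoted on the slide: 99.999999999% (eleven nines).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected objects lost per year, ASSUMING the durability figure is an
    independent per-object, per-year survival probability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# Exact rational arithmetic avoids floating-point noise in the tiny loss rate.
print(years_per_lost_object(10_000_000))  # ten million stored objects
print(expected_annual_losses(2 * 10**12))  # the 2+ trillion objects on the slide
```

Under that assumption, ten million objects lose one object every 10,000 years on average, while a population of two trillion objects would expect about 20 losses per year across the whole service.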

DEMO

1. “Low, pay-as-you-go pricing with no up-front expenses or long-term commitments.”

2. “Instantly deploy new applications, scale up as your workload grows, and scale down based on demand.”

http://aws.amazon.com/about-aws/

“We will make electricity so cheap that only the rich will burn candles.”

Thomas Edison

The Big Switch: http://amzn.com/039334522X

http://aws.amazon.com/solutions/case-studies/

Fear, Uncertainty and Doubt

Topsy Elephant: https://www.youtube.com/watch?v=eh_mJfWKNTI

http://youtu.be/GRVPGC1haTM

Security, Compliance, Capacity, Fault Tolerance, Cost, Complexity, Billing, Scalability, Availability, Latency, Throughput

…

Proof of Concept

• Quantitative > Qualitative
• Iterative
• Incremental

http://www.infoq.com/presentations/JPL-cloud

JPL Missions

“Internet of Things”?

“Batch” Big Data

“Streaming” Big Data

How unique are data systems?

http://nathanmarz.com/blog

3 Interfaces to Amazon Web Services: Console, CLI, SDK

Amazon Kinesis

• Real-time processing of streaming data
• High throughput and elastic
• Integrates with Amazon S3, Amazon Redshift, and Amazon DynamoDB
• Locking, sharding, rollback and more with the Kinesis Client Library
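As a rough illustration of the sharding mentioned above: Kinesis hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch below assumes the key space is split evenly across shards, and the helper names are hypothetical rather than part of any AWS SDK.

```python
import hashlib
from bisect import bisect_right
from typing import List

HASH_SPACE = 2**128  # MD5 digests are 128-bit integers


def shard_starts(shard_count: int) -> List[int]:
    """Starting hash key of each shard, ASSUMING an even split of the key space."""
    return [i * HASH_SPACE // shard_count for i in range(shard_count)]


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, starts: List[int]) -> int:
    """Index of the shard whose hash key range contains the partition key."""
    return bisect_right(starts, hash_key(partition_key)) - 1


starts = shard_starts(4)
for key in ("sensor-1", "sensor-2", "sensor-3"):  # made-up partition keys
    print(key, "-> shard", shard_for(key, starts))
```

Records with the same partition key always land on the same shard, which is why the choice of key drives both ordering and load balance.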

Dashboard

CEP

Storage

Amazon Elastic MapReduce

• Distributed processing with Apache Hadoop
• Near-linear scalability
• Resizable and disposable clusters
• Apache Hadoop ecosystem: Hive, Pig, Impala, Spark, ...
• Instant automatic provisioning
• Simplified administration
• 5.5M+ clusters

Amazon Redshift

• Petabyte-scale data warehousing
• Massively parallel online analytical processing (OLAP)
• Resizable without downtime
• Managed provisioning and administration
• Compatible with PostgreSQL

Amazon Redshift Architecture

Leader Node

• SQL endpoint

• Stores metadata

• Coordinates query execution

Compute Nodes

• Local, columnar storage

• Execute queries in parallel

• Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH

Two hardware platforms

• Optimized for data processing
• DW1: HDD; scale from 2TB to 1.6PB
• DW2: SSD; scale from 160GB to 256TB

[Diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node; the leader node and the compute nodes (128GB RAM, 16TB disk, 16 cores each) are linked by 10 GigE (HPC); ingestion, backup and restore go through Amazon S3 / DynamoDB / SSH.]
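The division of labor between the leader node and the compute nodes can be pictured as a scatter-gather: the leader fans a query out, each compute node aggregates its own slice, and the leader merges the partial results. The sketch below is only an analogy in plain Python (threads stand in for nodes, and the round-robin distribution is an assumption); it is not how Redshift is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(rows, node_count):
    """Distribute rows round-robin across the (simulated) compute nodes."""
    return [rows[i::node_count] for i in range(node_count)]


def compute_node_partial(rows):
    """Work done on one compute node: a partial (sum, count) over its slice."""
    return sum(rows), len(rows)


def leader_average(rows, node_count=4):
    """Leader node: fan the query out, then merge the partial results."""
    slices = split_into_slices(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_partial, slices))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


print(leader_average([1, 2, 3, 4, 5, 6, 7, 8]))
```

An average is computed from per-node sums and counts rather than per-node averages, so the merged result stays correct even when slices differ in size.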

ETL from EMR/Hive to Amazon Redshift through Amazon S3

EMR → S3 → Redshift

Extract & Transform → Load

Unstructured, unclean → Structured, clean → Columnar, compressed
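The load step of this pipeline is typically a Redshift COPY from the S3 prefix where EMR wrote its cleaned output. The sketch below only builds the SQL text. The table, bucket and role names are placeholders, and the use of IAM_ROLE, a pipe delimiter and gzip-compressed files are assumptions rather than details from the talk.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role: str,
                         delimiter: str = "|", gzip: bool = True) -> str:
    """Build a Redshift COPY statement that loads delimited files from S3.

    Illustration only: values are interpolated without escaping, so do not
    feed this untrusted input.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",
        f"IAM_ROLE '{iam_role}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        parts.append("GZIP")  # input files are gzip-compressed
    return "\n".join(parts) + ";"


# All identifiers below are hypothetical placeholders.
print(build_copy_statement(
    table="events",
    s3_prefix="s3://example-bucket/clean/events/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
))
```

Because COPY reads every file under the prefix in parallel across the compute nodes, writing the EMR output as several similarly sized files generally loads faster than one large file.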

7+ Billion

~50 to ~3500 Instances in 3 days

Amazon Auto Scaling

• Adjust capacity to demand
• Automated and customizable provisioning
• Integrated monitoring and load balancing
• Maintain fleet size across availability zones
• On-demand or scheduled actions
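"Adjust capacity to demand" can be sketched as a proportional rule: resize the fleet so that the average load per instance moves back toward a target, within minimum and maximum bounds. This is a simplified illustration, not the algorithm Auto Scaling itself uses; the metric, target and bounds are assumed values.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional scaling rule (a simplification): pick a fleet size that
    would bring the average per-instance metric back to the target."""
    if target <= 0:
        raise ValueError("target must be positive")
    wanted = math.ceil(current * metric / target)
    # Never leave the configured fleet bounds.
    return max(min_size, min(max_size, wanted))


# 50 instances at 90% average CPU with a 45% target: double the fleet.
print(desired_capacity(current=50, metric=90.0, target=45.0,
                       min_size=2, max_size=3500))
```

The clamp to `min_size` and `max_size` is what keeps a metric spike, or a metric that drops to zero, from growing the fleet without limit or shrinking it to nothing.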

DEMO

280+ Releases in 2014

http://aws.amazon.com/new

http://aws.amazon.com/blogs/aws

Where to begin?

http://aws.amazon.com/training/intro_series/

http://aws.amazon.com/training/

Julio Faerman

http://jfaerman.com.br/gx24

Thank you! Questions?
