The Third Why at Gx24
The Third “Why”
Julio Faerman, 2014-09-29, GX24
http://jfaerman.com.br/gx24
https://startwithwhy.com
Security
Availability
Compliance
Fault Tolerance
Throughput
Latency
…but what is the difference?
16 years, 2000+ employees
40 million users
http://aws.amazon.com/solutions/case-studies/netflix/
http://www.enotechconsulting.com/2013/04/aws-s3-behind-netflix-success/
http://variety.com/2014/digital/news/netflix-youtube-bandwidth-usage-1201179643/
Amazon Web Services for 100% of streaming
34.2% of all downstream traffic during prime time
Amazon Simple Storage Service (S3)
• Durable, scalable and fast storage (99.999999999% durability)
• 2+ trillion (10^12) objects
• 1.1+ million requests per second
• Native HTTP/S
• Full featured: permissions, static hosting, logging, versioning, archival and expiration lifecycle, BitTorrent, tags, redundancy, Requester Pays, encryption, Reduced Redundancy Storage and more
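To make the "eleven nines" figure concrete, here is a back-of-the-envelope calculation. It assumes the durability number means a 10^-11 annual probability of losing any one object and that losses are independent; both are simplifying assumptions for illustration, not a description of how S3 actually achieves durability.

```python
# Assumption: "99.999999999% durability" is read as a 1e-11 chance, per year,
# of losing any given object, with losses independent of one another.
ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_annual_losses(num_objects: int, p: float = ANNUAL_LOSS_PROBABILITY) -> float:
    """Expected number of objects lost per year (linearity of expectation)."""
    return num_objects * p


def years_per_lost_object(num_objects: int, p: float = ANNUAL_LOSS_PROBABILITY) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(num_objects, p)


if __name__ == "__main__":
    n = 10_000_000
    print(f"{n:,} objects: about one object lost every {years_per_lost_object(n):,.0f} years")
```

Under these assumptions, storing 10 million objects works out to roughly one lost object every 10,000 years.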
DEMO
1. “Low, pay-as-you-go pricing with no up-front expenses or long-term commitments.”
2. “Instantly deploy new applications, scale up as your workload grows, and scale down based on demand.”
http://aws.amazon.com/about-aws/
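The first two points are easiest to see with numbers. The sketch below compares keeping peak capacity running around the clock with paying only for what each hour needs. The price and the demand curve are made up, and the model ignores up-front purchase costs and assumes the same hourly price in both cases.

```python
from typing import Sequence

# Hypothetical price, in cents per server-hour; not an actual AWS rate.
RATE_CENTS_PER_HOUR = 10


def peak_provisioned_cost(hourly_demand: Sequence[int], rate: int = RATE_CENTS_PER_HOUR) -> int:
    """Cost when capacity is sized for the peak and kept running every hour."""
    return max(hourly_demand) * len(hourly_demand) * rate


def pay_as_you_go_cost(hourly_demand: Sequence[int], rate: int = RATE_CENTS_PER_HOUR) -> int:
    """Cost when capacity scales up and down to match each hour's demand."""
    return sum(hourly_demand) * rate


if __name__ == "__main__":
    # Hypothetical day: quiet night, busy business hours, moderate evening.
    day = [2] * 8 + [10] * 8 + [4] * 8
    print("peak-provisioned:", peak_provisioned_cost(day), "cents")
    print("pay-as-you-go:   ", pay_as_you_go_cost(day), "cents")
```

The two costs only match when demand is flat; the spikier the workload, the larger the gap.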
3…
“We will make electricity so cheap that only the rich will burn candles.”
Thomas Edison
The Big Switch: http://amzn.com/039334522X
Day 1
http://aws.amazon.com/solutions/case-studies/
Fear, Uncertainty and Doubt
Topsy Elephant: https://www.youtube.com/watch?v=eh_mJfWKNTI
http://youtu.be/GRVPGC1haTM
Security, Compliance, Capacity, Fault Tolerance, Cost, Complexity, Billing, Scalability, Availability, Latency, Throughput
…
Proof of Concept
• Quantitative > Qualitative
• Iterative
• Incremental
http://www.infoq.com/presentations/JPL-cloud
JPL Missions
“Internet of Things”?
“Batch” Big Data
“Streaming” Big Data
How unique are data systems?
http://nathanmarz.com/blog
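The "batch" and "streaming" views of big data can be combined in the way Nathan Marz has written about (often called the Lambda Architecture): a batch layer recomputes views over all historical data, a speed layer incrementally covers recent events, and queries merge the two. Below is a minimal in-memory sketch of that idea using page-view counts; the names and the URL-counting example are illustrative assumptions, not code from the talk.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute the view from scratch over every historical event."""
    return Counter(master_dataset)


class RealtimeView:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, url: str) -> None:
        self.counts[url] += 1


def query(url: str, batch: Counter, realtime: RealtimeView) -> int:
    """Serving layer: merge the precomputed batch view with recent increments."""
    return batch[url] + realtime.counts[url]


if __name__ == "__main__":
    batch = batch_view(["/home", "/pricing", "/home"])  # yesterday's batch run
    live = RealtimeView()
    live.on_event("/home")  # arrived after the batch run finished
    print(query("/home", batch, live))
```

When the next batch run absorbs the recent events, the speed layer's counts for that period can be discarded.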
3 interfaces to Amazon Web Services: Console, CLI, SDK
Amazon Kinesis
• Real-time processing of streaming data
• High throughput and elastic
• Integrates with Amazon S3, Amazon Redshift, and Amazon DynamoDB
• Locking, sharding, rollback and more with the Kinesis Client Library
Dashboard, CEP (complex event processing), Storage
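A Kinesis stream is split into shards: each record's partition key is hashed with MD5 into a 128-bit integer, and the shard whose hash key range contains that value receives the record. The sketch below mimics that routing locally. It assumes the digest is read as a big-endian unsigned integer and that the hash key space is divided into equal, contiguous ranges; real shards can own uneven ranges after splits and merges.

```python
import hashlib
from collections import Counter

HASH_KEY_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5 (big-endian, assumed)."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_for(partition_key: str, num_shards: int) -> int:
    """Index of the shard owning this key, assuming equal-sized hash key ranges."""
    return hash_key(partition_key) * num_shards // HASH_KEY_SPACE


if __name__ == "__main__":
    # Hypothetical keys: records for the same user always land on the same shard.
    spread = Counter(shard_for(f"user-{i}", 4) for i in range(1000))
    print(sorted(spread.items()))
```

Because routing depends only on the key, per-key ordering is preserved within a shard, while many distinct keys spread load across shards.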
Amazon Elastic MapReduce
• Distributed processing with Apache Hadoop
• Near-linear scalability
• Resizable and disposable clusters
• Apache Hadoop ecosystem: Hive, Pig, Impala, Spark, ..., …, …
• Instant automatic provisioning
• Simplified administration
• 5.5M+ clusters
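Hadoop jobs follow the MapReduce model: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group. The single-process word count below only illustrates those three phases; on an EMR cluster, Hadoop runs the map and reduce tasks in parallel across nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (Hadoop does this between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the elastic cloud", "The cloud"]))
```

Map and reduce functions that are independent per line and per key are what make the near-linear scaling possible.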
Amazon Redshift
• Petabyte-scale data warehousing
• Massively parallel online analytical processing (OLAP)
• Resizable without downtime
• Managed provisioning and administration
• Compatible with PostgreSQL
Amazon Redshift Architecture
Leader Node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute Nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH
Two hardware platforms
• Optimized for data processing
• DW1: HDD; scale from 2TB to 1.6PB
• DW2: SSD; scale from 160GB to 256TB
[Diagram: SQL clients/BI tools connect to the leader node over JDBC/ODBC; nodes (each shown with 16 cores, 128GB RAM, 16TB disk) are interconnected with 10 GigE (HPC) networking; ingestion, backup, and restore go through Amazon S3 / DynamoDB / SSH.]
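The leader/compute split can be sketched in a few lines: the leader hands each compute node its share of the rows, each node aggregates only its local data in parallel, and the leader merges the partial results. This is a toy model with hypothetical names; the round-robin placement is similar in spirit to Redshift's EVEN distribution style, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def distribute(rows: Sequence[float], num_nodes: int) -> List[List[float]]:
    """Spread rows round-robin across hypothetical compute nodes."""
    return [list(rows[i::num_nodes]) for i in range(num_nodes)]


def node_partial(local_rows: List[float]) -> Tuple[float, int]:
    """Compute node: aggregate only its local rows, returning (sum, count)."""
    return sum(local_rows), len(local_rows)


def leader_average(rows: Sequence[float], num_nodes: int = 3) -> float:
    """Leader node: fan the work out, then merge the partial aggregates."""
    slices = distribute(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_partial, slices))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


if __name__ == "__main__":
    print(leader_average([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], num_nodes=3))
```

Nodes return (sum, count) rather than local averages because they hold different numbers of rows; averaging the averages would give the wrong answer.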
ETL from EMR/Hive to Amazon Redshift through Amazon S3
EMR (extract & transform: unstructured, unclean data) → S3 (structured, clean data) → Redshift (load: columnar, compressed)
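The three stages of the pipeline can be illustrated with a toy example: raw log lines are parsed and cleaned (malformed lines dropped), the clean rows are pivoted into columns, and each column is compressed with run-length encoding, one of the encodings a columnar store like Redshift can apply. The `date,country,bytes` log format is a hypothetical schema chosen for illustration, and everything runs in memory rather than on EMR and S3.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str, int]  # (date, country, bytes): a hypothetical log schema


def clean(raw_line: str) -> Optional[Record]:
    """Transform: parse a 'date,country,bytes' line; return None if malformed."""
    parts = [p.strip() for p in raw_line.split(",")]
    if len(parts) != 3 or not parts[2].isdecimal():
        return None
    date, country, size = parts
    return date, country.upper(), int(size)


def to_columns(records: List[Record]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    dates, countries, sizes = zip(*records) if records else ((), (), ())
    return {"date": list(dates), "country": list(countries), "bytes": list(sizes)}


def run_length_encode(column: list) -> List[Tuple[object, int]]:
    """Compress a column into (value, run length) pairs."""
    encoded: List[Tuple[object, int]] = []
    for value in column:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


def etl(raw_lines: List[str]) -> Dict[str, List[Tuple[object, int]]]:
    records = [r for r in (clean(line) for line in raw_lines) if r is not None]
    return {name: run_length_encode(col) for name, col in to_columns(records).items()}


if __name__ == "__main__":
    raw = ["2014-09-29, br, 120", "2014-09-29,BR,80", "garbage",
           "2014-09-29,us,x", "2014-09-30,us,200"]
    print(etl(raw))
```

Columns such as dates and countries tend to repeat, so storing them column by column makes long runs, which is why columnar layouts compress so well.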
7+ Billion
~50 to ~3500 Instances in 3 days
Amazon Auto Scaling
• Adjust capacity to demand
• Automated and customizable provisioning
• Integrated monitoring and load balancing
• Maintain fleet size across availability zones
• On-demand or scheduled actions
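To show what "adjust capacity to demand" and "maintain fleet size across availability zones" involve, the sketch below computes a desired group size from average CPU and then balances instances over zones. The proportional formula is a simplified rule chosen for illustration; it is not AWS's actual scaling-policy algorithm, and the zone names are placeholders.

```python
import math
from typing import Dict, List


def desired_capacity(current: int, avg_cpu: float, target_cpu: float,
                     min_size: int, max_size: int) -> int:
    """Scale the fleet proportionally so average CPU moves toward the target,
    clamped to the group's minimum and maximum size (simplified, hypothetical rule)."""
    if current <= 0:
        return min_size
    proposed = math.ceil(current * avg_cpu / target_cpu)
    return max(min_size, min(max_size, proposed))


def spread_across_zones(capacity: int, zones: List[str]) -> Dict[str, int]:
    """Balance instances as evenly as possible across availability zones."""
    base, extra = divmod(capacity, len(zones))
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


if __name__ == "__main__":
    n = desired_capacity(current=4, avg_cpu=90.0, target_cpu=60.0, min_size=2, max_size=20)
    print(n, spread_across_zones(n, ["us-east-1a", "us-east-1b", "us-east-1c"]))
```

Rounding up errs on the side of extra capacity, and the min/max bounds keep a noisy metric from shrinking the fleet to nothing or growing it without limit.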
DEMO
280+ Releases in 2014
http://aws.amazon.com/new
http://aws.amazon.com/blogs/aws
Where to begin?
http://aws.amazon.com/training/intro_series/
http://aws.amazon.com/training/