Big Data and HPC for AWS Brasil Summit

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Big Data and High Performance Computing Solutions in the AWS Cloud Michel Pereira, Enterprise Solutions Architect May 27, 2014

Description

Presentations from AWS Summit Sao Paulo 2014. Download the content prepared by our experts to help you on your journey to the cloud.

Transcript of Big Data and HPC for AWS Brasil Summit

Page 1: Big dataandhp cforawsbrasilsummit


Big Data and High Performance Computing Solutions in the AWS Cloud

Michel Pereira, Enterprise Solutions Architect

May 27, 2014

Page 2: Big dataandhp cforawsbrasilsummit

Big Data HPC

Customer Success Story

Getting Started on AWS

What we’ll cover today…

Page 3: Big dataandhp cforawsbrasilsummit

Big Data HPC

Customer Success Story

Getting Started on AWS

What we’ll cover today…

Page 4: Big dataandhp cforawsbrasilsummit

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 5: Big dataandhp cforawsbrasilsummit

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 6: Big dataandhp cforawsbrasilsummit

Big Data: Unconstrained data growth

95% of the 1.2 zettabytes of data in the digital universe is unstructured.

70% of this is user-generated content.

Unstructured data growth is explosive, with estimates of compound annual growth rate (CAGR) at 62% from 2008 to 2012. Source: IDC

[Figure: data volume scale rising from GB to TB, PB, EB and ZB]
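The IDC figure above uses the standard compound annual growth rate definition, CAGR = (end / start)^(1 / years) - 1. As a minimal sketch (the helper names are illustrative, not from the source), the snippet below shows what a 62% CAGR implies over the four compounding years from 2008 to 2012: the volume multiplies by roughly 6.9x.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1

def growth_factor(rate: float, years: float) -> float:
    """Total multiplication of a quantity growing at `rate` per year."""
    return (1 + rate) ** years

# A 62% CAGR sustained from 2008 to 2012 (4 compounding years)
factor = growth_factor(0.62, 4)
print(f"{factor:.2f}x")  # 6.89x

# Round trip: the same factor gives back the 62% rate
print(f"{cagr(1.0, factor, 4):.2%}")  # 62.00%
```

The same `cagr` helper can be applied to any start/end pair of volumes quoted later in a deck like this one.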

Page 7: Big dataandhp cforawsbrasilsummit

Lower cost, higher throughput

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 8: Big dataandhp cforawsbrasilsummit

Customer segmentation

Marketing spend optimization

Financial modeling & forecasting

Ad targeting & real-time bidding

Clickstream analysis

Fraud detection

Use Cases

Page 9: Big dataandhp cforawsbrasilsummit

Visits, views, clicks, purchases

Source, device, location, time

Latency, throughput, uptime

Likes, shares, friends, follows

Price, frequency

Metrics

Page 10: Big dataandhp cforawsbrasilsummit

Relational

NoSQL

Web servers

Mobile phones

Tablets

3rd party feeds

Sources

Page 11: Big dataandhp cforawsbrasilsummit

Structured

Unstructured

Text

Binary

Near real-time

Batched

Formats

Page 12: Big dataandhp cforawsbrasilsummit

Reporting

Dashboards

Sentiment

Clustering

Machine Learning

Optimization

Analysis

Page 13: Big dataandhp cforawsbrasilsummit

Lower cost, higher throughput

Highly constrained

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 14: Big dataandhp cforawsbrasilsummit

[Chart: data volume, generated data vs. data available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares]

Page 15: Big dataandhp cforawsbrasilsummit

Elastic and highly scalable

+ No upfront capital expense

+ Only pay for what you use

+ Available on-demand

= Remove constraints

Page 16: Big dataandhp cforawsbrasilsummit

Accelerated

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 17: Big dataandhp cforawsbrasilsummit

Technologies and techniques for working productively with data, at any scale.

Big Data

Page 18: Big dataandhp cforawsbrasilsummit

Big data and AWS cloud computing

Big data: Variety, volume, and velocity requiring new tools

Cloud computing: Variety of compute, storage, and networking options

Page 19: Big dataandhp cforawsbrasilsummit

Big data and AWS cloud computing

Big data: Potentially massive datasets

Cloud computing: Massive, virtually unlimited capacity

Page 20: Big dataandhp cforawsbrasilsummit

Big data and AWS cloud computing

Big data: Iterative, experimental style of data manipulation and analysis

Cloud computing: Iterative, experimental style of infrastructure deployment and usage

Page 21: Big dataandhp cforawsbrasilsummit

Big data and AWS cloud computing

Big data: Frequently not a steady-state workload; peaks and valleys

Cloud computing: At its most efficient with highly variable workloads
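The claim that cloud computing is most efficient with highly variable workloads can be checked with simple instance-hour arithmetic. The sketch below uses a hypothetical one-day demand profile (20 quiet hours at 10 instances, a 4-hour batch peak at 100) and compares a fleet sized for the peak with paying only for the instances each hour needs. It deliberately ignores price differences between owned hardware and on-demand capacity.

```python
from typing import Sequence

def fixed_fleet_hours(hourly_demand: Sequence[int]) -> int:
    # On-premises style: the fleet is sized for the peak hour and runs all day
    return max(hourly_demand) * len(hourly_demand)

def on_demand_hours(hourly_demand: Sequence[int]) -> int:
    # Elastic style: each hour you run (and pay for) only the instances needed
    return sum(hourly_demand)

def utilization(hourly_demand: Sequence[int]) -> float:
    # Fraction of the peak-sized fleet that actually does useful work
    return on_demand_hours(hourly_demand) / fixed_fleet_hours(hourly_demand)

# Hypothetical day: ~10 instances most of the time, a nightly peak of 100
demand = [10] * 20 + [100] * 4
print(fixed_fleet_hours(demand))     # 2400
print(on_demand_hours(demand))       # 600
print(f"{utilization(demand):.0%}")  # 25%
```

The spikier the profile, the lower the utilization of a peak-sized fleet, which is where elastic capacity pays off.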

Page 22: Big dataandhp cforawsbrasilsummit

Big data and AWS cloud computing

Big data: Absolute performance not as critical as “time to results”; shared resources are a bottleneck

Cloud computing: Parallel compute projects allow each workgroup to have more autonomy and get faster results

Page 23: Big dataandhp cforawsbrasilsummit

Ease of use Lower costs

Page 24: Big dataandhp cforawsbrasilsummit

no capital investment

pay as you go

no subscriptions

only pay for what you use

Ease of use Lower costs

Page 25: Big dataandhp cforawsbrasilsummit

programmable

zero admin

easy to configure

integrate with existing tools

Ease of use Lower costs

Page 26: Big dataandhp cforawsbrasilsummit

One tool to rule them all

Page 27: Big dataandhp cforawsbrasilsummit

Use the right tools

Amazon S3

Amazon Kinesis

Amazon DynamoDB

Amazon Redshift

Amazon Elastic MapReduce

Page 28: Big dataandhp cforawsbrasilsummit

Store anything

Object storage

Scalable

99.999999999% durability

Amazon S3
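One way to read the 99.999999999% ("eleven nines") durability figure is as an annual per-object loss probability of 10^-11. Under that simplified model, assuming independent losses and treating the figure as a design target rather than a contractual guarantee, the sketch below estimates expected losses for a hypothetical bucket of 10 million objects: about one object every 10,000 years.

```python
from fractions import Fraction

def expected_annual_losses(num_objects: int, nines: int = 11) -> Fraction:
    # 99.999999999% durability (11 nines) read as a 10**-11 annual loss
    # probability per object; exact fractions avoid float rounding
    loss_probability = Fraction(1, 10 ** nines)
    return num_objects * loss_probability

losses = expected_annual_losses(10_000_000)
print(losses)      # 1/10000
print(1 / losses)  # 10000  (years per expected single-object loss)
```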

Page 29: Big dataandhp cforawsbrasilsummit

Real-time processing

High throughput; elastic

Easy to use

EMR, S3, Redshift, DynamoDB

Integrations

Amazon Kinesis
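Kinesis spreads records across shards by hashing each record's partition key with MD5 into a 128-bit integer and routing it to the shard whose hash-key range contains that value. The sketch below is a simplified local model of that routing, not a Kinesis API call: it assumes the hash space is split evenly across shards, whereas a real stream's ranges (reported by DescribeStream) can be uneven after shard splits and merges. The key names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # partition keys are hashed into a 128-bit integer space

def hash_key(partition_key: str) -> int:
    # MD5 digest of the partition key, read as an unsigned 128-bit integer
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def shard_for_key(partition_key: str, num_shards: int) -> int:
    # Assumes an even split: shard i owns the hash keys h with
    # i * HASH_SPACE / num_shards <= h < (i + 1) * HASH_SPACE / num_shards
    return hash_key(partition_key) * num_shards // HASH_SPACE

for key in ["meter-001", "meter-002", "meter-003"]:
    print(key, "->", shard_for_key(key, 4))
```

Because the mapping depends only on the key, all records with the same partition key land on the same shard, which preserves their relative order.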

Page 30: Big dataandhp cforawsbrasilsummit

NoSQL Database

Seamless scalability

Zero admin

Single-digit millisecond latency

Amazon DynamoDB
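DynamoDB's provisioned throughput is sized in capacity units. As documented for provisioned mode, one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB; these unit sizes should be checked against current documentation. The helpers below are an illustrative sketch of that arithmetic, with hypothetical workload numbers.

```python
import math

READ_UNIT_KB = 4   # one strongly consistent read/s for an item up to 4 KB
WRITE_UNIT_KB = 1  # one write/s for an item up to 1 KB

def read_capacity_units(item_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    # Item size is rounded up to the next whole read unit
    units = math.ceil(item_kb / READ_UNIT_KB) * reads_per_sec
    # Eventually consistent reads cost half as much (rounded up)
    return units if strongly_consistent else math.ceil(units / 2)

def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    # Item size is rounded up to the next whole write unit
    return math.ceil(item_kb / WRITE_UNIT_KB) * writes_per_sec

print(read_capacity_units(6, 100))         # 200
print(read_capacity_units(6, 100, False))  # 100
print(write_capacity_units(6, 10))         # 60
```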

Page 31: Big dataandhp cforawsbrasilsummit

Relational data warehouse

Massively parallel

Petabyte scale

Fully managed

$1,000/TB/Year

Amazon Redshift
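"Massively parallel" refers to splitting both data and work across compute nodes: rows are distributed to slices by a distribution key, each slice aggregates its own rows, and a leader node merges the partial results. The sketch below is a simplified single-process model of that two-phase pattern, not Redshift's internals or API; the table rows are hypothetical, and `zlib.crc32` stands in for whatever hash the engine actually uses.

```python
import zlib
from collections import defaultdict

def distribute(rows, num_slices):
    # Route each row to a slice by hashing its distribution key
    # (crc32 keeps the assignment deterministic across runs)
    slices = defaultdict(list)
    for key, amount in rows:
        slices[zlib.crc32(key.encode("utf-8")) % num_slices].append((key, amount))
    return slices

def slice_partial_sums(slice_rows):
    # Each slice aggregates only its own rows; on a real cluster the slices
    # work in parallel, here they simply run one after another
    partial = defaultdict(float)
    for key, amount in slice_rows:
        partial[key] += amount
    return partial

def leader_merge(partials):
    # The leader combines the per-slice partial results into the final answer
    total = defaultdict(float)
    for partial in partials:
        for key, amount in partial.items():
            total[key] += amount
    return dict(total)

rows = [("sp", 10.0), ("rj", 5.0), ("sp", 2.5), ("df", 1.0), ("rj", 4.0)]
slices = distribute(rows, num_slices=4)
print(leader_merge(slice_partial_sums(s) for s in slices.values()))
```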

Page 32: Big dataandhp cforawsbrasilsummit

Hadoop/HDFS clusters

Hive, Pig, Impala, HBase

Easy to use; fully managed

On-demand and spot pricing

Tight integration with S3, DynamoDB, and Kinesis

Amazon Elastic MapReduce
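The Hadoop clusters that Amazon EMR manages run the MapReduce programming model: mappers emit key/value pairs, the framework groups them by key (shuffle), and reducers combine each group. The sketch below is a single-process illustration of those three phases on a word count; it is not the EMR or Hadoop API, which distribute map and reduce tasks across the cluster and store intermediate data in HDFS or S3.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit (word, 1) for every word in an input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: combine all values for one key
    return key, sum(values)

def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data on AWS", "big compute on AWS"]))
# {'big': 2, 'data': 1, 'on': 2, 'aws': 2, 'compute': 1}
```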

Page 33: Big dataandhp cforawsbrasilsummit

[Diagram: big data architecture on AWS. Components shown: Sources; Amazon Kinesis; Amazon S3; Amazon DynamoDB; Amazon RDS; Amazon EMR; Amazon Redshift; AWS Data Pipeline. Labels: HDFS, Analytics languages, Data management]

Page 34: Big dataandhp cforawsbrasilsummit

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 35: Big dataandhp cforawsbrasilsummit

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Amazon Glacier

S3

Amazon DynamoDB

Amazon RDS

Amazon Redshift

AWS Direct Connect

AWS Storage Gateway

AWS Import/Export

Amazon Kinesis

Amazon EMR

Page 36: Big dataandhp cforawsbrasilsummit

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Amazon EC2

Amazon EMR

Amazon Kinesis

Page 37: Big dataandhp cforawsbrasilsummit

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Amazon CloudFront

AWS CloudFormation

S3

Amazon DynamoDB

Amazon RDS

Amazon Redshift

Amazon EC2

Amazon EMR

AWS Data Pipeline

Page 38: Big dataandhp cforawsbrasilsummit

The right tools. At the right scale. At the right time.

Page 39: Big dataandhp cforawsbrasilsummit

Big Data HPC

Customer Success Story

Getting Started on AWS

What we’ll cover today…

Page 40: Big dataandhp cforawsbrasilsummit


AWS Customer Success Story: Victor Oliveira, Director of Engineering, Concrete Solutions; Marcos Prete, Partnerships Manager, SAS

Page 41: Big dataandhp cforawsbrasilsummit

The Power to Know

The Company - Worldwide

•  World leader in analytics
    ◦  From data to strategic information
    ◦  Faster decisions
    ◦  Anticipate opportunities

•  Founded in 1976
•  Headquartered in Cary, North Carolina
•  14,000 employees worldwide
•  134 countries, 400 offices
•  Great Place to Work: 1st place in the 2010, 2011 and 2012 rankings

Page 42: Big dataandhp cforawsbrasilsummit

The Company - Brazil

•  Operating since 1996
•  180+ customers
•  Offices in SP, RJ and DF
•  140+ employees
•  Top Employers certification, 2012 and 2013

Products are offered under a license model, but there is latent demand for delivering the software as a service (SaaS).

Page 43: Big dataandhp cforawsbrasilsummit

SAS's Challenge

•  Reduce operating costs for its customers
•  Acquire and manage physical servers
•  Simplify the sale (from license to SaaS)
•  Offer a complete solution
•  Reduce entry costs for its customers

Page 44: Big dataandhp cforawsbrasilsummit

Approach

•  Big Data •  The product already exists!

•  Business evolution •  Value proposition

•  Leverage AWS IaaS •  A smart partnership

•  Concrete Solutions and SAS

Page 45: Big dataandhp cforawsbrasilsummit

The Product - Visual Analytics

•  Unprecedented as SaaS in Brazil.
•  The tool benefits departments that need to:
    ◦  Make fast decisions based on a large volume and variety of data (Big Data)
    ◦  Analyze their business indicators more easily
•  Easy, fast delivery at a lower cost than the traditional model.
•  The customer does not need to manage multiple providers or maintain an in-house structure to support the application.

Page 46: Big dataandhp cforawsbrasilsummit

Introduction to Visual Analytics

Dashboards and Scorecards
•  Dynamic Dashboards •  Operational Scorecards •  Metrics Management

Corporate Reports
•  Page-perfect Operational Reporting •  Pixel-perfect Business Reporting •  Print-perfect Statements & Invoices

Dynamic and Ad Hoc Analysis
•  Visual Exploration •  Slice & Dice Investigative Analysis •  Root Cause Determination

Advanced Analytics and Data Mining
•  Ad Hoc Analysis •  Predictive Analysis •  Data Mining

Mobile Apps, Information Distribution and Alerts
•  Mobile Applications •  Massive Information Distribution •  iPad, iPhone, email •  Exception-based Alerts

Page 47: Big dataandhp cforawsbrasilsummit

AWS and Benefits

•  Capacity flexibility

•  Cash-flow planning

•  Scalability and agility at low cost

•  Payment flexibility

•  Fewer staff needed to manage the application

•  Improved cash flow

Complete Solution

•  Services: Installation, Support, Training, Data Loading

•  Software: SAS Visual Analytics

•  Managed Infrastructure: AWS and Concrete

Page 48: Big dataandhp cforawsbrasilsummit

Traditional BI vs. Data Exploration Environment

Page 49: Big dataandhp cforawsbrasilsummit

Thank you!

More information: visit us at the Concrete booth!

Marcos Prete, Alliances Manager, SAS Brasil, [email protected]

Victor Oliveira, Director of Engineering, [email protected], @v_oliv

Page 50: Big dataandhp cforawsbrasilsummit

Big Data HPC

Customer Success Story

Getting Started on AWS

What we’ll cover today…

Page 51: Big dataandhp cforawsbrasilsummit

Take a typical big computation task…

Page 52: Big dataandhp cforawsbrasilsummit

…for which an average cluster is too small (or which simply takes too long to complete)…

Page 53: Big dataandhp cforawsbrasilsummit

…optimization of algorithms can give some leverage…

Page 54: Big dataandhp cforawsbrasilsummit

…and complete the task at hand…

Page 55: Big dataandhp cforawsbrasilsummit

Applying a large cluster…

Page 56: Big dataandhp cforawsbrasilsummit

…can sometimes be overkill and too expensive

Page 57: Big dataandhp cforawsbrasilsummit

AWS instance clusters can be balanced to the job at hand…

Page 58: Big dataandhp cforawsbrasilsummit

…neither too large…

Page 59: Big dataandhp cforawsbrasilsummit

…nor too small…

Page 60: Big dataandhp cforawsbrasilsummit

…with multiple clusters running at the same time
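Balancing a cluster to the job at hand can be framed as sizing it to a deadline. The sketch below assumes a perfectly parallel job that scales linearly with node count and ignores partial-hour billing; the job size, per-node throughput, and deadlines are hypothetical. Under those assumptions the total node-hours stay the same whatever the cluster size, so a larger cluster buys faster time to results at little extra cost.

```python
import math

def nodes_needed(total_work: float, work_per_node_hour: float,
                 deadline_hours: float) -> int:
    # Assumes perfectly parallel work and linear scaling across nodes
    return math.ceil(total_work / (work_per_node_hour * deadline_hours))

def node_hours(total_work: float, work_per_node_hour: float) -> float:
    # Under linear scaling, total node-hours do not depend on cluster size
    return total_work / work_per_node_hour

# Hypothetical job: 1,000 work units, 10 units per node-hour
print(nodes_needed(1000, 10, 5))   # 20 nodes to finish in 5 hours
print(nodes_needed(1000, 10, 50))  # 2 nodes if 50 hours is acceptable
print(node_hours(1000, 10))        # 100.0 node-hours either way
```

Real jobs scale sub-linearly past some point, which is exactly when "neither too large" starts to matter.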

Page 61: Big dataandhp cforawsbrasilsummit

Why AWS for HPC?

Low cost with flexible pricing

Efficient clusters

Unlimited infrastructure

Faster time to results

Concurrent Clusters on-demand

Increased collaboration

Page 62: Big dataandhp cforawsbrasilsummit

Cluster compute instances

Implement HVM process execution

Intel® Xeon® processors

10 Gigabit Ethernet; C3 has Enhanced Networking (SR-IOV)

cc2.8xlarge

32 vCPUs 2.6 GHz Intel Xeon E5-2670 Sandy Bridge 60.5 GB RAM

4 x 840 GB Local HDD

c3.8xlarge

32 vCPUs 2.8 GHz Intel Xeon E5-2680v2 Ivy Bridge

60 GB RAM

2 x 320 GB Local SSD

AWS High Performance Computing

Page 63: Big dataandhp cforawsbrasilsummit

c3.8xlarge

32 vCPUs 2.8 GHz Intel Xeon E5-2680v2 Ivy Bridge

60 GB RAM

2 x 320 GB Local SSD

Top 500 Supercomputer using Amazon EC2

64th fastest supercomputer, Nov 2013

26,496 Intel® Xeon® cores

Linpack performance (Rmax): 484.2 TFlop/s

Theoretical peak (Rpeak): 593.5 TFlop/s
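The Rpeak figure on this slide can be reproduced from the core count and clock speed: theoretical peak = cores × clock × floating-point operations per cycle. The sketch below assumes 8 double-precision FLOPs per cycle per Ivy Bridge core (AVX), which is not stated on the slide, and then computes Linpack efficiency as Rmax / Rpeak, about 82% for this run.

```python
def rpeak_tflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    # Theoretical peak = cores x clock (GHz) x FLOPs per cycle, converted to TFLOP/s
    return cores * ghz * flops_per_cycle / 1000

def linpack_efficiency(rmax: float, rpeak: float) -> float:
    # Fraction of the theoretical peak actually achieved on Linpack
    return rmax / rpeak

# Assumption: 8 double-precision FLOPs per cycle per Ivy Bridge core (AVX)
peak = rpeak_tflops(26_496, 2.8, 8)
print(f"{peak:.1f} TFLOP/s")                     # 593.5 TFLOP/s
print(f"{linpack_efficiency(484.2, 593.5):.1%}")  # 81.6%
```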


Page 64: Big dataandhp cforawsbrasilsummit

Network placement groups

Cluster instances deployed in a placement group enjoy low latency and full-bisection 10 Gbps bandwidth.

AWS High Performance Computing
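To make the placement-group idea concrete, here is a minimal sketch that only assembles the keyword arguments one might pass to boto3's EC2 `run_instances`; it makes no AWS call. The AMI ID, group name, and instance count are hypothetical placeholders, and `cluster_launch_params` is an illustrative helper, not part of any AWS SDK.

```python
def cluster_launch_params(ami_id: str, instance_type: str, count: int,
                          group_name: str) -> dict:
    """Keyword arguments for an EC2 RunInstances request that places every
    instance in one placement group. No AWS call is made here."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,  # MinCount == MaxCount: launch all instances or none
        "MaxCount": count,
        "Placement": {"GroupName": group_name},
    }

# Hypothetical values for illustration only
params = cluster_launch_params("ami-12345678", "c3.8xlarge", 16, "hpc-cluster")
print(params["Placement"])  # {'GroupName': 'hpc-cluster'}
```

With boto3, one would typically create the group first with `ec2.create_placement_group(GroupName="hpc-cluster", Strategy="cluster")` and then call `ec2.run_instances(**params)`. Setting `MinCount` equal to `MaxCount` asks EC2 to launch the full set or nothing, which suits tightly coupled jobs.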

Page 65: Big dataandhp cforawsbrasilsummit

GPU compute instances

cg1.4xlarge

Intel® Xeon® X5570, 33.5 ECUs

22.5 GB RAM, 2x NVIDIA GPU, 448 cores and 3 GB memory each

g2.2xlarge

Intel® Xeon® E5-2670, 8 vCPUs

15 GB RAM, 1x NVIDIA GPU, 1,536 cores, 4 GB memory

G2 instances 1 NVIDIA Kepler GK104 GPU I/O Performance: Very High (10 Gigabit Ethernet)

CG1 instances 2 x NVIDIA Tesla “Fermi” M2050 GPUs I/O Performance: Very High (10 Gigabit Ethernet)

AWS High Performance Computing

Page 66: Big dataandhp cforawsbrasilsummit

HPC Partners and Apps

Page 67: Big dataandhp cforawsbrasilsummit

Making Production Cloud HPC easy from 64 cores to …

Pharma Johnson & Johnson

Manufacturing HGST, a Western Digital Company

Financial Services Pacific Life Insurance

Genomics Life Technologies

Research The Aerospace Corporation

… 156,314 cores for better solar panel materials for $33k, not $68M

Amazon EC2: 16,788 Spot Instances

Amazon S3: 4 TB processed

Spot Instances across all 8 Regions

1.21 PetaFLOPS

Intel Sandy Bridge on CC2

Page 68: Big dataandhp cforawsbrasilsummit

Big Data HPC

Customer Success Story

Getting Started on AWS

What we’ll cover today…

Page 69: Big dataandhp cforawsbrasilsummit


AWS Customer Success Story: Sergio Mafra, IT Innovation Lead, ONS (Operador Nacional do Sistema Elétrico, Brazil's National Electric System Operator)

Page 70: Big dataandhp cforawsbrasilsummit

•  The Operador Nacional do Sistema Elétrico (ONS, National Electric System Operator) is a private company responsible for planning and operating electricity generation and transmission in the National Interconnected System (Sistema Interligado Nacional, SIN).

•  With about 800 employees in 5 locations (Rio de Janeiro, Recife, Florianópolis and Brasília), ONS is an information-intensive company whose continuous use of mathematical models requires HPC (High Performance Computing) and Big Data.

“Amazon Web Services lets us provision high-performance clusters in minutes, significantly reducing total processing time.”

“With this, we realized that AWS turns High Performance Computers into High Performance Customers.”

- Sérgio Mafra

Page 71: Big dataandhp cforawsbrasilsummit
Page 72: Big dataandhp cforawsbrasilsummit

SIN - Brazilian Electric System

The SIN serves 98% of Brazil's electricity consumption.

Isolated systems (Legal Amazon): 2% of the market, predominantly thermal, 300+ isolated locations.

Predominantly hydroelectric model with large reservoirs and large interconnections.

Page 73: Big dataandhp cforawsbrasilsummit

The Challenge

•  Provide ONS with a platform of greater processing capacity that reduces the time to solve the mathematical models, at a cost matched to actual usage time, with easy management of the cluster environment, and transparent to the organization.

•  Enable “time-to-market” for the IT area, holding the knowledge and the responsiveness to handle unexpected demands from the organization's business areas.

“Scotty, We Need More Power”

Page 74: Big dataandhp cforawsbrasilsummit

Benefits Achieved

•  About 40% reduction in the time to solve the electro-energy planning mathematical models, at 30% lower cost.

•  Ability to analyze 5 strategies for using the Newave/Decomp models in record time (1 week), running 600 cases. On-premises, this would have taken 3 weeks, incompatible with the commitment agreed with the MME (Ministry of Mines and Energy).

[Diagram: Virtual Private Cloud. Labels: Work, Controlador (controller), Internet/AWS, subnets 10.24.0.0/24 and 10.24.1.0/24, network 10.21.0.0/16]
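The roles of the address ranges in the ONS VPC diagram (10.24.0.0/24, 10.24.1.0/24 and 10.21.0.0/16) are not recoverable from the slide, but their sizes and relationships can be checked with Python's standard `ipaddress` module. The sketch below assumes the 5 addresses AWS reserves in every VPC subnet (the first four and the last) and shows that the two /24 subnets are not contained in 10.21.0.0/16, so they cannot have been carved from that range.

```python
import ipaddress

RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address of each VPC subnet

def usable_addresses(cidr: str) -> int:
    # Addresses available for instances after the AWS-reserved ones
    return ipaddress.ip_network(cidr).num_addresses - RESERVED_PER_SUBNET

net = ipaddress.ip_network("10.21.0.0/16")
for cidr in ["10.24.0.0/24", "10.24.1.0/24"]:
    subnet = ipaddress.ip_network(cidr)
    print(cidr, usable_addresses(cidr), subnet.subnet_of(net))
# 10.24.0.0/24 251 False
# 10.24.1.0/24 251 False
```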

Page 75: Big dataandhp cforawsbrasilsummit

Benefits Achieved

•  “Wow... 40 minutes down to 4 minutes!!!!”
•  “Now I'm going to use all the calculation parameters to get a more complete study”
•  “Jump 4 x 80 to now!!!”
•  “Thanks for letting me leave 2 hours early. All the cases have already run.”
•  “We ran the study in 2 minutes. The system can become operational and will become an international success case.”
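The ONS results mix two ways of stating improvement: speedup (old time divided by new time) and percentage time reduction. The sketch below, with illustrative helper names, converts between them using the figures quoted for this case: 40 minutes down to 4 is a 10x speedup (90% less time), 3 weeks down to 1 is 3x, and a 40% time reduction corresponds to only about a 1.67x speedup.

```python
def speedup(before: float, after: float) -> float:
    # How many times faster the new run is
    return before / after

def time_reduction(before: float, after: float) -> float:
    # Fraction of the original run time that was eliminated
    return 1 - after / before

print(speedup(40, 4), f"{time_reduction(40, 4):.0%}")  # 10.0 90%
print(speedup(3, 1), f"{time_reduction(3, 1):.0%}")    # 3.0 67%
# A 40% time reduction leaves 60% of the original time
print(f"{speedup(1.0, 0.6):.2f}x")                      # 1.67x
```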

Page 76: Big dataandhp cforawsbrasilsummit

Synchronized Phasor Measurement System (Sistema de Medição Sincronizada de Fasores, SMSF)

PDC

Page 77: Big dataandhp cforawsbrasilsummit

SMSF Annual Storage

2013: 8.5 TB

2015: 70 TB

2018: 120 TB

2022: 312 TB

Estimated collection for only 7 measured quantities

[Chart labels: Big Data; Data; reference mark: total storage volume of the Rio data center in 2013]

Page 78: Big dataandhp cforawsbrasilsummit

Architecture (under study)

[Diagram components: PMUs; OpenPDC; collector (Coletor); controller (Controlador); Hadoop cluster with a master and nodes 1 to N, each with HDFS; processing (Processamento); analytics; Amazon S3; Amazon Glacier; storage (Armazenador); historian (Historiador); history (Histórico, 1 TB)]

Page 79: Big dataandhp cforawsbrasilsummit

Big Data HPC

Customer Success Story

Getting Started on AWS

What we’ll cover today…

Page 80: Big dataandhp cforawsbrasilsummit

Solution Architects

Professional Services

Premium Support

AWS Partner Network (APN)

AWS is here to help

Page 81: Big dataandhp cforawsbrasilsummit

AWS Architecture Diagrams

https://aws.amazon.com/architecture/

Processing large amounts of parallel data using a scalable cluster

Use commonly available cluster scheduling tools, such as Grid Engine or Condor

Page 82: Big dataandhp cforawsbrasilsummit

AWS Online Software Store

http://aws.amazon.com/marketplace

Big Data Case Studies

Learn from other AWS customers

https://aws.amazon.com/solutions/case-studies/big-data

Page 83: Big dataandhp cforawsbrasilsummit

AWS Online Software Store

https://aws.amazon.com/marketplace

AWS Marketplace

Page 84: Big dataandhp cforawsbrasilsummit


AWS Public Data Sets

Free access to big data sets

https://aws.amazon.com/publicdatasets

Page 85: Big dataandhp cforawsbrasilsummit


AWS Big Data Test Drives

APN Partner-provided labs

https://aws.amazon.com/testdrive/bigdata

Page 86: Big dataandhp cforawsbrasilsummit

Webinars, Bootcamps, and Self-Paced Labs https://aws.amazon.com/training

AWS Training & Events

https://aws.amazon.com/events

Page 87: Big dataandhp cforawsbrasilsummit


Big Data on AWS

Brand new course on Big Data

https://aws.amazon.com/training/course-descriptions/bigdata/

Page 88: Big dataandhp cforawsbrasilsummit


https://aws.amazon.com/big-data https://aws.amazon.com/hpc