© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Big Data and High Performance Computing Solutions in the AWS Cloud
Michel Pereira, Enterprise Solutions Architect
May 27, 2014
What we’ll cover today…
Big Data
HPC
Customer Success Story
Getting Started on AWS
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Big Data: Unconstrained data growth
GB, TB, PB, EB, ZB
95% of the 1.2 zettabytes of data in the digital universe is unstructured
70% of this is user-generated content
Unstructured data growth is explosive, with estimates of compound annual growth rate (CAGR) at 62% from 2008 to 2012. Source: IDC
Generation: lower cost, higher throughput
Collection & storage
Analytics & computation
Collaboration & sharing
Use Cases
Customer segmentation
Marketing spend optimization
Financial modeling & forecasting
Ad targeting & real time bidding
Clickstream analysis
Fraud detection
Metrics
Visits, views, clicks, purchases
Source, device, location, time
Latency, throughput, uptime
Likes, shares, friends, follows
Price, frequency
Sources
Relational
NoSQL
Web servers
Mobile phones
Tablets
3rd party feeds
Formats
Structured
Unstructured
Text
Binary
Near Real-time
Batched
Analysis
Reporting
Dashboards
Sentiment
Clustering
Machine Learning
Optimization
Lower cost, higher throughput
Highly constrained
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
[Chart: data volume, generated data vs. data available for analysis]
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
= Remove constraints
Accelerated
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Big Data
Technologies and techniques for working productively with data, at any scale.
Big data and AWS cloud computing
Big data: Variety, volume, and velocity requiring new tools
Cloud computing: Variety of compute, storage, and networking options
Big data: Potentially massive datasets
Cloud computing: Massive, virtually unlimited capacity
Big data: Iterative, experimental style of data manipulation and analysis
Cloud computing: Iterative, experimental style of infrastructure deployment/usage
Big data: Frequently not a steady-state workload; peaks and valleys
Cloud computing: At its most efficient with highly variable workloads
Big data: Absolute performance not as critical as “time to results”; shared resources are a bottleneck
Cloud computing: Parallel compute projects allow each workgroup to have more autonomy, get faster results
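To make the "peaks and valleys" point concrete, the sketch below compares a fixed cluster sized for peak demand with capacity that follows demand hour by hour. The demand profile and the hourly price are hypothetical numbers chosen only for illustration; they are not AWS prices.

```python
# Hypothetical hourly demand (instances needed) over one day: mostly quiet, one peak.
hourly_demand = [2] * 8 + [10] * 4 + [40] * 2 + [10] * 4 + [2] * 6

PRICE_PER_INSTANCE_HOUR = 1.00  # assumed price, for illustration only


def fixed_capacity_cost(demand, price):
    """Cost of provisioning for the peak and keeping it running all day."""
    return max(demand) * len(demand) * price


def pay_per_use_cost(demand, price):
    """Cost when capacity follows demand hour by hour."""
    return sum(demand) * price


fixed = fixed_capacity_cost(hourly_demand, PRICE_PER_INSTANCE_HOUR)
elastic = pay_per_use_cost(hourly_demand, PRICE_PER_INSTANCE_HOUR)
print(f"fixed: {fixed:.2f}  pay-per-use: {elastic:.2f}  ratio: {fixed / elastic:.1f}x")
```

The gap between the two figures grows with the ratio of peak to average demand, which is why highly variable workloads benefit most.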
Lower costs
no capital investment
pay as you go
no subscriptions
only pay for what you use
Ease of use
programmable
zero admin
easy to configure
integrate with existing tools
One tool to rule them all
Use the right tools
Amazon S3
Amazon Kinesis
Amazon DynamoDB
Amazon Redshift
Amazon Elastic MapReduce
Amazon S3
Store anything
Object storage
Scalable
99.999999999% durability
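As a rough illustration of what eleven nines of durability means, the arithmetic below treats the quoted figure as a per-object, per-year survival probability. That reading is an assumption made for this sketch, and the object count is an arbitrary example.

```python
from decimal import Decimal

DURABILITY_PERCENT = Decimal("99.999999999")  # figure quoted on the slide

# Assumption: read the figure as a per-object, per-year survival probability.
annual_loss_probability = (Decimal(100) - DURABILITY_PERCENT) / Decimal(100)


def expected_annual_losses(object_count):
    """Expected number of objects lost per year under the assumption above."""
    return Decimal(object_count) * annual_loss_probability


losses = expected_annual_losses(10_000_000)
print(f"loss probability per object-year: {annual_loss_probability}")
print(f"expected losses per year for 10,000,000 objects: {losses}")
print(f"roughly one object every {1 / losses:.0f} years")
```

Decimal arithmetic is used instead of floats so that subtracting two nearly equal percentages does not introduce rounding noise.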
Amazon Kinesis
Real-time processing
High throughput; elastic
Easy to use
Integrations: EMR, S3, Redshift, DynamoDB
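One way to picture how a stream spreads load is the mapping from partition keys to shards. Kinesis hashes each partition key with MD5 into a 128-bit value and routes the record to the shard whose hash key range contains it. The sketch below is a local simulation only: the even split of the hash space, the function name, and the sample keys are assumptions for illustration, and no AWS API is called.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, byteorder="big")
    range_size = HASH_SPACE // shard_count
    # min() guards the last shard when HASH_SPACE is not divisible by shard_count.
    return min(hash_value // range_size, shard_count - 1)


events = ["user-1", "user-2", "user-3", "user-1"]  # hypothetical partition keys
for key in events:
    print(key, "-> shard", shard_for_key(key, shard_count=4))
```

Records with the same partition key always land on the same shard, so ordering per key is preserved while different keys spread across shards.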
Amazon DynamoDB
NoSQL Database
Seamless scalability
Zero admin
Single digit millisecond latency
Amazon Redshift
Relational data warehouse
Massively parallel
Petabyte scale
Fully managed
$1,000/TB/Year
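The $1,000/TB/year figure lends itself to a quick back-of-the-envelope estimate. The sketch below applies it as a flat rate, which is a simplifying assumption: actual pricing depends on node type and purchase terms, and the warehouse sizes are arbitrary examples.

```python
PRICE_PER_TB_YEAR_USD = 1_000  # figure quoted on the slide


def warehouse_cost_usd(terabytes, years=1.0):
    """Flat-rate estimate: price per TB per year times size times duration."""
    return PRICE_PER_TB_YEAR_USD * terabytes * years


for size_tb in (2, 50, 1_000):  # 1,000 TB = 1 PB
    yearly = warehouse_cost_usd(size_tb)
    print(f"{size_tb:>5} TB: ${yearly:,.0f}/year, about ${yearly / 12:,.0f}/month")
```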
Amazon Elastic MapReduce
Hadoop/HDFS clusters
Hive, Pig, Impala, HBase
Easy to use; fully managed
On-demand and spot pricing
Tight integration with S3, DynamoDB, and Kinesis
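For readers new to the programming model that Hadoop clusters run, here is a minimal single-process sketch of the map, shuffle, and reduce phases using a word count. It runs locally in plain Python and does not use Hadoop or EMR; the function names and input lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


lines = ["big data on AWS", "big compute on AWS"]  # hypothetical input
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce calls run in parallel on different nodes, and the shuffle moves data between them over the network.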
[Architecture diagram: data sources; data management with HDFS (Amazon EMR), Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon Kinesis; analytics languages; AWS Data Pipeline]
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Amazon Glacier
Amazon S3
Amazon DynamoDB
Amazon RDS
Amazon Redshift
AWS Direct Connect
AWS Storage Gateway
AWS Import/Export
Amazon Kinesis
Amazon EMR
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Amazon EC2
Amazon EMR
Amazon Kinesis
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Amazon CloudFront
AWS CloudFormation
Amazon S3
Amazon DynamoDB
Amazon RDS
Amazon Redshift
Amazon EC2
Amazon EMR
AWS Data Pipeline
The right tools. At the right scale. At the right time.
AWS Customer Success Story
Victor Oliveira, Director of Engineering, Concrete Solutions
Marcos Prete, Partnerships Manager, SAS
The Power to Know
The Company - Worldwide
• World leader in analytics
  - Turning data into strategic information
  - Faster decisions
  - Anticipating opportunities
• Founded in 1976
• Headquartered in Cary, North Carolina
• 14,000 employees worldwide
• 134 countries, 400 offices
• Great Place to Work
  - 1st place in the 2010, 2011, and 2012 rankings
Products are offered under a license model, but there is latent demand for software delivered as a service (SaaS)
The Company - Brazil
• Operating since 1996
• 180+ customers
• Offices in SP, RJ, and DF
• 140+ employees
• Top Employers certification, 2012 and 2013
The SAS Challenge
• Reduce operating costs for its customers
• Acquire and manage physical servers
• Simplify sales (from license to SaaS)
• Offer a complete solution
• Reduce entry costs for its customers
• Big Data
• The product already exists!
• Business evolution
• Value proposition
• Leverage AWS IaaS
• Intelligent partnership
  - Concrete Solutions and SAS
Approach
• A first for SaaS in Brazil.
• The tool benefits departments that need to:
  - Make fast decisions based on a large volume and variety of data (Big Data)
  - Simplify the analysis of their business indicators
• Easy, fast delivery at lower cost than the traditional model.
• The customer does not need to manage multiple providers or maintain an internal structure to support the application.
The Product – Visual Analytics
Dashboards and Scorecards
Corporate Reports
Dynamic and Ad Hoc Analyses
Advanced Analytics and Data Mining
Mobile Apps, Information Distribution, and Alerts
• Ad Hoc Analysis • Predictive Analysis • Data Mining
• Visual Exploration • Slice & Dice Investigative Analysis • Root Cause Determination
• Page-perfect Operational Reporting • Pixel-perfect Business Reporting • Print-perfect Statements & Invoices
• Dynamic Dashboards • Operational Scorecards • Metrics Management
• Mobile Applications • Massive Information Distribution • iPad, iPhone, email • Exception-based Alerts
Introduction to Visual Analytics
AWS and Benefits
• Capacity flexibility
• Cash flow planning
• Scalability and agility at low cost
• Payment flexibility
• Fewer staff needed to manage the application
• Improved cash flow
Complete Solution
Services
• Installation • Support • Training • Data loading
Software
• SAS Visual Analytics
Managed Infrastructure
• AWS and Concrete
Traditional BI vs. Data Exploration Environment
Thank you!
More information: visit us at the Concrete booth!
Marcos Prete, Alliances Manager, SAS Brasil, [email protected]
Victor Oliveira, Director of Engineering, [email protected], @v_oliv
HPC
Take a typical big computation task…
…that an average cluster is too small for (or simply takes too long to complete)…
…optimization of algorithms can give some leverage…
…and complete the task at hand…
Applying a large cluster…
…can sometimes be overkill and too expensive
AWS instance clusters can be balanced to the job at hand…
…neither too large…
…nor too small…
…with multiple clusters running at the same time
Why AWS for HPC?
Low cost with flexible pricing
Efficient clusters
Unlimited infrastructure
Faster time to results
Concurrent Clusters on-demand
Increased collaboration
Cluster compute instances
Implement HVM process execution
Intel® Xeon® processors
10 Gigabit Ethernet
C3 has Enhanced Networking, SR-IOV
cc2.8xlarge
32 vCPUs
2.6 GHz Intel Xeon E5-2670 Sandy Bridge
60.5 GB RAM
4 x 840 GB Local HDD
c3.8xlarge
32 vCPUs
2.8 GHz Intel Xeon E5-2680v2 Ivy Bridge
60 GB RAM
2 x 320 GB Local SSD
AWS High Performance Computing
Top 500 Super Computer using Amazon EC2
64th fastest supercomputer, Nov 2013
26,496 Intel® Xeon® cores
Linpack Performance (Rmax): 484.2 TFlop/s
Theoretical (Rpeak): 593.5 TFlop/s
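The Rmax and Rpeak figures can be combined into the usual Linpack efficiency ratio. The short calculation below uses only the numbers on the slide; the per-core figure is a simple division and says nothing about how individual cores actually performed.

```python
RMAX_TFLOPS = 484.2   # measured Linpack performance, from the slide
RPEAK_TFLOPS = 593.5  # theoretical peak, from the slide
CORES = 26_496        # Intel Xeon cores, from the slide

# Efficiency is the fraction of theoretical peak achieved on the benchmark.
efficiency = RMAX_TFLOPS / RPEAK_TFLOPS
gflops_per_core = RMAX_TFLOPS * 1_000 / CORES

print(f"Linpack efficiency: {efficiency:.1%}")
print(f"Sustained performance per core: {gflops_per_core:.1f} GFlop/s")
```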
Network placement groups
Cluster instances deployed in a Placement Group enjoy low-latency, full-bisection 10 Gbps bandwidth
GPU compute instances
cg1.4xlarge (CG1 instances)
Intel® Xeon® X5570, 33.5 ECUs
22.5 GB RAM
2 x NVIDIA Tesla “Fermi” M2050 GPUs, 448 cores and 3 GB memory each
I/O Performance: Very High (10 Gigabit Ethernet)
g2.2xlarge (G2 instances)
Intel® Xeon® E5-2670, 8 vCPUs
15 GB RAM
1 x NVIDIA Kepler GK104 GPU, 1536 cores and 4 GB memory
I/O Performance: Very High (10 Gigabit Ethernet)
HPC Partners and Apps
Making Production Cloud HPC easy from 64 cores to …
Pharma: Johnson & Johnson
Manufacturing: HGST, a Western Digital Company
Financial Services: Pacific Life Insurance
Genomics: Life Technologies
Research: The Aerospace Corporation
… 156,314 cores for better solar panel materials for $33k, not $68M
Amazon EC2: 16,788 Spot Instances
Amazon S3: 4 TB processed
Spot Instances in all 8 Regions
1.21 PetaFLOPS
Intel Sandy Bridge on CC2
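The scale of this run is easier to appreciate with a little arithmetic on the slide's own figures. The sketch below derives a cost ratio, an average core count per instance, and a cost per core. The variable names are illustrative, and the $68M figure is taken as the alternative cost the slide contrasts against, without assuming how it was estimated.

```python
CLOUD_COST_USD = 33_000            # "$33k" from the slide
ALTERNATIVE_COST_USD = 68_000_000  # "$68M" from the slide
CORES = 156_314                    # from the slide
SPOT_INSTANCES = 16_788            # from the slide

cost_ratio = ALTERNATIVE_COST_USD / CLOUD_COST_USD
cores_per_instance = CORES / SPOT_INSTANCES
cost_per_core_usd = CLOUD_COST_USD / CORES

print(f"cost ratio: about {cost_ratio:,.0f}x")
print(f"average cores per Spot Instance: {cores_per_instance:.1f}")
print(f"cost per core for the whole run: ${cost_per_core_usd:.2f}")
```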
AWS Customer Success Story
Sergio Mafra, IT Innovation Leader
ONS – Operador Nacional do Sistema Elétrico (National Electric System Operator)
• The Operador Nacional do Sistema Elétrico (ONS) is a private company responsible for planning and operating electricity generation and transmission in the National Interconnected System (Sistema Interligado Nacional, SIN).
• With about 800 employees in 5 locations (Rio de Janeiro, Recife, Florianópolis, and Brasília), ONS is an information-intensive company that continuously uses mathematical models requiring HPC (High Performance Computing) and Big Data.
“Amazon Web Services lets us provision high-performance clusters in minutes, significantly reducing total processing time.”
“With that, we realized that AWS turns High Performance Computers into High Performance Customers”
- Sérgio Mafra
The SIN serves 98% of Brazil's electricity consumption.
SIN - Brazilian Electric System
Isolated systems (Legal Amazon): 2% of the market, predominantly thermal, 300+ isolated locations
A predominantly hydroelectric model with large reservoirs and extensive interconnections.
The Challenge
• Provide ONS with a platform with greater processing capacity, reducing the time needed to solve its mathematical models, at a cost proportional to usage time, with easy management of the cluster environment, and transparent to the organization.
• Enable “time-to-market” for the IT department, which holds the knowledge and the responsiveness needed to meet unexpected demands from across the organization.
“Scotty, We Need More Power”
Benefits achieved
• About a 40% reduction in the time to solve the electro-energetic planning mathematical models, at 30% lower cost.
• Ability to analyze 5 strategies for using the Newave/Decomp models in record time (1 week), running 600 cases. On premises this would have taken 3 weeks, incompatible with the commitment agreed with the MME.
[Diagram: Virtual Private Cloud with Work and Controller nodes and an Internet/AWS connection; networks 10.24.0.0/24, 10.24.1.0/24, and 10.21.0.0/16]
Benefits achieved
• “Wow... from 40 minutes to 4 minutes!!!!”
• “Now I'm going to use all the calculation parameters to get a more complete study”
• “Bring on 4 x 80 right now!!!”
• “Thanks for letting me leave 2 hours early. All the cases have already run”
• “We ran the study in 2 minutes. The system can go operational and will become an international success story”
Synchronized Phasor Measurement System (Sistema de Medição Sincronizada de Fasores, SMSF)
PDC
SMSF Annual Storage
2013 • 8.5 TB
2015 • 70 TB
2018 • 120 TB
2022 • 312 TB
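The storage projections above imply a compound annual growth rate that can be worked out directly. The sketch below uses the four data points from the slide; the `cagr` helper is a hypothetical name, and the standard CAGR formula is applied as-is.

```python
# Projected annual SMSF storage in TB, from the slide.
storage_tb = {2013: 8.5, 2015: 70, 2018: 120, 2022: 312}


def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


first_year, last_year = min(storage_tb), max(storage_tb)
overall = cagr(storage_tb[first_year], storage_tb[last_year], last_year - first_year)
print(f"{first_year}-{last_year}: {overall:.0%} per year")

# Growth rate for each interval between consecutive data points.
years = sorted(storage_tb)
for start, end in zip(years, years[1:]):
    rate = cagr(storage_tb[start], storage_tb[end], end - start)
    print(f"{start}-{end}: {rate:.0%} per year")
```

The per-interval rates show that the steepest growth is expected in the first two years, which is the kind of uneven demand that is hard to provision for in advance.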
Big Data
Data
Estimated collection for only 7 measured quantities
Total storage volume of the Rio data center in 2013
Historical
1 Tb
Architecture (under study)
[Diagram components: PMUs; OpenPDC; Collector; Controller; Hadoop cluster (Master, Nodes 1 to N, each with HDFS); Processing; Analytics; S3; Storer (Armazenador); Historian (Historiador); Glacier]
Getting Started on AWS
AWS is here to help
Solution Architects
Professional Services
Premium Support
AWS Partner Network (APN)
AWS Architecture Diagrams
https://aws.amazon.com/architecture/
Processing large amounts of parallel data using a scalable cluster
Use commonly-available cluster scheduling tools, such as Grid Engine or Condor
Big Data Case Studies
Learn from other AWS customers
https://aws.amazon.com/solutions/case-studies/big-data
AWS Marketplace
AWS Online Software Store
https://aws.amazon.com/marketplace
AWS Public Data Sets
Free access to big data sets
https://aws.amazon.com/publicdatasets
AWS Big Data Test Drives
APN Partner-provided labs
https://aws.amazon.com/testdrive/bigdata
AWS Training & Events
Webinars, Bootcamps, and Self-Paced Labs
https://aws.amazon.com/training
https://aws.amazon.com/events
Big Data on AWS
Brand new course on Big Data
https://aws.amazon.com/training/course-descriptions/bigdata/
https://aws.amazon.com/big-data
https://aws.amazon.com/hpc