Post on 19-Aug-2014
Outline Problem Background Computing Infrastructure Migration Storage Central Control System Conclusions
climateprediction.net: A Cloudy Approach
Master in High Performance Computing
Master’s Thesis
Diego Perez Montes
advised by
Tomas Fernandez Pena
Juan Antonio Anel Cabanelas
July 1, 2014
Diego Perez Montes climateprediction.net: A Cloudy Approach
1 Problem Background
  Current Infrastructure
  Problem Description
2 Computing Infrastructure Migration
  Measuring the Problem...
  Infrastructure Redesign
3 Storage
4 Central Control System
  Backend Components
  Dashboard
  Running the Simulation
5 Conclusions
Motivation
Solve a real problem that is useful to someone and can be expanded in further work.
Apply what I’ve learned in the Master courses.
I do love large infrastructure problems (and this is a big one!).
Current Infrastructure
First of all: How does the project currently work?
Figure : BOINC: High Level Architecture and Workflow
Problem Description
So, what is the problem then?
A new model (HadGEM) needs to be executed.
Its resource requirements are higher (hardware: computing and storage).
The current BOINC workunit processing time is 7-9 days; the goal is to reduce it.
Heterogeneous and unpredictable environment:
  Resources can't be managed on demand.
  Execution time can't be properly measured.
  Processed data goes missing.
Need to establish metrics for the project.
Rationalization of costs (how much does a simulation really cost?).
Project Objectives
How is it going to be solved?
Conversion to Infrastructure as a Service (IaaS) in the Cloud (Amazon Web Services, AWS: EC2 for computing and S3 for storage).
Creation of a new abstraction layer, the Central Control System:
  Infrastructure and resource management.
  Creation of metrics and statistics.
Free Software.
Fully documented.
Measuring the Problem...
The real size of the problem is unknown, as is how it will behave in the new environment with the new parametrization.
Initial data from the current infrastructure over BOINC (computing point of view):
A workunit takes on average 7 to 9 days to be processed.
A full simulation is (at minimum) 36,000 workunits, split into sections of 6,000.
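The figures above can be turned into a rough wall-clock estimate. The sketch below is illustrative only: it assumes each instance processes one workunit at a time and that all workunits in a section take the same time, neither of which is stated in the slides. The function names are hypothetical.

```python
import math

WORKUNITS_PER_SIMULATION = 36_000  # from the slides
WORKUNITS_PER_SECTION = 6_000      # from the slides


def sections_per_simulation() -> int:
    """Number of 6,000-workunit sections in a full simulation."""
    return WORKUNITS_PER_SIMULATION // WORKUNITS_PER_SECTION


def section_wall_clock_days(days_per_workunit: float, instances: int) -> float:
    """Estimate the wall-clock days needed to process one section.

    Assumption (not from the slides): every instance runs one workunit
    at a time, so a section is processed in ceil(6000 / instances) rounds.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    rounds = math.ceil(WORKUNITS_PER_SECTION / instances)
    return rounds * days_per_workunit


if __name__ == "__main__":
    print("sections:", sections_per_simulation())
    for n in (1_000, 3_000, 6_000):
        print(n, "instances ->", section_wall_clock_days(9.0, n), "days per section")
```

Under these assumptions, only a fleet as large as the section itself brings the time per section down to a single workunit time.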
Initial considerations:
Models used in the tests: weather@home UK floods and weather@home Australia New Zealand (full and regional: HadAM3P and HadRM3P).
Two representative systems (on EC2) were selected and 10 consecutive executions were run.
System #1: Moderate CPU
CPU: 2 x Xeon E5-2650
MEM: 8GB (4GB/Core)
GPU: No
Workunit Time: 7.32 days
Workunit Cost: USD 4.464
Full Simulation Cost: USD 160,704
System #2: Intensive CPU&GPU
CPU: 16 x Xeon X5570
MEM: 24GB (1.5GB/Core)
GPU: 2 x Tesla M2050
Workunit Time: 1.99 days
Workunit Cost: USD 100.966
Full Simulation Cost: USD 3,634,776
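The full simulation costs quoted for both systems follow from multiplying the workunit cost by the 36,000 workunits of a full simulation. The short sketch below reproduces that arithmetic and compares the two systems. The dictionary layout and function names are illustrative; `Decimal` is used only to keep the currency arithmetic exact.

```python
from decimal import Decimal

WORKUNITS_PER_SIMULATION = 36_000  # minimum size of a full simulation (slides)

# Measured values quoted on the slides for the two EC2 test systems.
SYSTEMS = {
    "System #1 (moderate CPU)": {
        "days": Decimal("7.32"), "usd": Decimal("4.464")},
    "System #2 (intensive CPU & GPU)": {
        "days": Decimal("1.99"), "usd": Decimal("100.966")},
}


def full_simulation_cost(workunit_cost_usd: Decimal) -> Decimal:
    """Cost of a full simulation: workunit cost times number of workunits."""
    return workunit_cost_usd * WORKUNITS_PER_SIMULATION


if __name__ == "__main__":
    for name, s in SYSTEMS.items():
        print(f"{name}: USD {full_simulation_cost(s['usd']):,.0f}")
    s1 = SYSTEMS["System #1 (moderate CPU)"]
    s2 = SYSTEMS["System #2 (intensive CPU & GPU)"]
    print("speed-up of #2 over #1:", round(s1["days"] / s2["days"], 2))
    print("cost ratio of #2 over #1:", round(s2["usd"] / s1["usd"], 2))
```

With these numbers, System #2 is roughly 3.7 times faster per workunit at roughly 22.6 times the cost.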
Figure : Workunit Processing Time Comparison
How much does it really cost?
Figure : Simulation Price Comparison
Going IaaS
Figure : Proposed Infrastructure
Steps:
1 Template an instance:
  Install the operating system (Amazon Linux 2014.03.1, 64-bit).
  Configure network and firewall.
  Configure local storage: 16 GB.
  Install and configure BOINC to use climateprediction.net.
  Install the local client (Simulation Terminator).
2 Contextualize and scale.
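Step 2 can be sketched as launching several copies of the templated image, each with a start-up script for contextualization. The code below is a minimal sketch, not the thesis implementation: it assumes the template was saved as an AMI, that an EC2 client such as `boto3.client("ec2")` is passed in, and that instances are labelled with a hypothetical `simulation` tag. The function names and the content of the start-up script are also assumptions.

```python
def build_launch_request(template_ami_id, instance_type, count,
                         user_data_script, simulation_id):
    """Build keyword arguments for an EC2 RunInstances call.

    template_ami_id:  image created from the templated instance (step 1)
    user_data_script: start-up script used to contextualize each copy
    simulation_id:    value stored in a hypothetical "simulation" tag
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": template_ami_id,
        "InstanceType": instance_type,
        "MinCount": count,   # request exactly `count` instances
        "MaxCount": count,
        "UserData": user_data_script,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "simulation", "Value": simulation_id}],
        }],
    }


def scale_out(ec2_client, request):
    """Launch the instances and return their IDs.

    `ec2_client` is expected to behave like boto3.client("ec2").
    """
    response = ec2_client.run_instances(**request)
    return [instance["InstanceId"] for instance in response["Instances"]]
```

Passing the client in as a parameter keeps the sketch free of credentials and makes it possible to exercise the logic with a stand-in object before touching a real account.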
Storage
Every simulation (36,000 workunits) outputs 3.6 TB of data.
There are not enough resources (disk space) on the current systems.
Figure : Shared Storage Architecture
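As a quick check on the storage figure, the average output per workunit can be derived from the totals quoted on this slide. The sketch assumes decimal units (1 TB = 10^12 bytes); the slides do not say whether decimal or binary units are meant.

```python
SIMULATION_OUTPUT_BYTES = 3_600_000_000_000  # 3.6 TB, assuming 1 TB = 10**12 bytes
WORKUNITS_PER_SIMULATION = 36_000            # from the slides


def output_per_workunit_bytes() -> int:
    """Average output size of a single workunit, in bytes."""
    return SIMULATION_OUTPUT_BYTES // WORKUNITS_PER_SIMULATION


if __name__ == "__main__":
    print(output_per_workunit_bytes() / 10**6, "MB per workunit")
```

Under that assumption each workunit produces about 100 MB on average.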
Architecture
Figure : Central System Architecture
Backend Components
Simple Scheduler: Runs and configures a simulation with the given parameters (start/stop instances).
Reaper: Releases resources (terminates instances) when they are powered off.
RESTful API: Gives access to configure and run simulations.
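The Reaper's job, as described, is to terminate instances once they have powered off. A minimal sketch of one reaping pass follows. It assumes an EC2 client such as `boto3.client("ec2")` is passed in and that the instances of a simulation carry a hypothetical `simulation` tag; the function name is illustrative, and pagination of large result sets is left out for brevity.

```python
def reap_stopped_instances(ec2_client, simulation_id):
    """Terminate stopped instances of one simulation; return their IDs.

    `ec2_client` is expected to behave like boto3.client("ec2").
    The "simulation" tag is an assumption made for this sketch.
    """
    response = ec2_client.describe_instances(Filters=[
        # Only instances that have been powered off.
        {"Name": "instance-state-name", "Values": ["stopped"]},
        # Only instances that belong to the given simulation.
        {"Name": "tag:simulation", "Values": [simulation_id]},
    ])
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:  # the terminate call must not receive an empty list
        ec2_client.terminate_instances(InstanceIds=instance_ids)
    return instance_ids
```

Running such a pass periodically would release instances shortly after they power off, so they stop accruing storage and allocation costs.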
API
RESTful API
Get simulation status.
Get metric/statistic data.
Set/modify simulation parameters (number of worker nodes/instances).
Stop simulation.
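The four operations listed above could map onto resources roughly as in the sketch below. This is an illustration only: the slides do not give the actual routes, so the paths, the in-memory store and the class names are all hypothetical, and the HTTP layer is reduced to a plain dispatch function.

```python
from dataclasses import dataclass, field


@dataclass
class Simulation:
    status: str = "running"
    instances: int = 0
    metrics: dict = field(default_factory=dict)


class SimulationAPI:
    """In-memory stand-in for the RESTful API (all routes are hypothetical)."""

    def __init__(self):
        self.simulations = {}

    def handle(self, method, path, body=None):
        """Dispatch a request and return a (status_code, payload) tuple."""
        parts = [p for p in path.split("/") if p]
        if (len(parts) not in (2, 3) or parts[0] != "simulations"
                or parts[1] not in self.simulations):
            return 404, {"error": "not found"}
        sim = self.simulations[parts[1]]
        action = parts[2] if len(parts) == 3 else ""

        if method == "GET" and action == "":          # get simulation status
            return 200, {"status": sim.status, "instances": sim.instances}
        if method == "GET" and action == "metrics":   # get metric/statistic data
            return 200, dict(sim.metrics)
        if method == "PUT" and action == "":          # set/modify parameters
            sim.instances = int((body or {})["instances"])
            return 200, {"status": sim.status, "instances": sim.instances}
        if method == "POST" and action == "stop":     # stop simulation
            sim.status = "stopped"
            return 200, {"status": sim.status, "instances": sim.instances}
        return 405, {"error": "method not allowed"}


if __name__ == "__main__":
    api = SimulationAPI()
    api.simulations["sim-1"] = Simulation(instances=10)
    print(api.handle("PUT", "/simulations/sim-1", {"instances": 50}))
    print(api.handle("POST", "/simulations/sim-1/stop"))
```

Keeping the dispatch logic separate from the web server makes it straightforward to put the same handlers behind whichever HTTP framework is used.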
Dashboard
Figure : Dashboard Interface
Running the Simulation
[Overview of a Live System]
Conclusions
Objectives Achieved
Computing and storage successfully migrated to the Cloud (EC2 and S3).
Simulations were executed, showing that running the model in the cloud is possible.
Development of a Central System (scheduler and dashboard).
Obtained costs and metrics for the project.
What’s Next?
Migrate BOINC server.
More control/interaction with clients so the scheduler can be improved (and give a full SaaS layer).
Costs: "warm up" stage to dynamically recalculate price.
Thanks!
Used Icons Links
Iconset Windows 8 metro style: https://www.iconfinder.com/iconsets/windows-8-metro-style
Link: http://sta.sh/0228t4fyjyjb