TRANSPARENT GRID ENABLEMENT OF WEATHER RESEARCH AND FORECASTING
S. MASOUD SADJADI1, LIANA FONG6, ROSA M. BADIA2, JAVIER FIGUEROA1,9, JAVIER DELGADO1, XABRIEL J. COLLAZO-MOJICA8, KHALID SALEEM1, RAJU RANGASWAMI1, SHU SHIMIZU4, HECTOR A. DURAN LIMON5, PAT WELSH3, SANDEEP PATTNAIK10, ANTHONY PRAINO6, DAVID VILLEGAS1, SELIM KALAYCI1, GARGI DASGUPTA7, ONYEKA EZENWOYE1, JUAN CARLOS MARTINEZ1, IVAN RODERO2, SHUYI CHEN9, JAVIER MUÑOZ1, DIEGO LOPEZ1, JULITA CORBALAN2, HUGH WILLOUGHBY1, MICHAEL MCFAIL1, CHRISTINE LISETTI1, AND MALEK ADJOUADI1
1: FLORIDA INTERNATIONAL UNIVERSITY (FIU), MIAMI, FLORIDA, USA; 2: BARCELONA SUPERCOMPUTING CENTER, BARCELONA, SPAIN; 3: UNIVERSITY OF NORTH FLORIDA, JACKSONVILLE, FLORIDA, USA; 4: IBM TOKYO RESEARCH LABORATORY, TOKYO, JAPAN; 5: UNIVERSITY OF GUADALAJARA, CUCEA, MEXICO; 6: IBM T. J. WATSON, NY, USA; 7: IBM IRL, INDIA; 8: UNIVERSITY OF PUERTO RICO, MAYAGUEZ CAMPUS, PUERTO RICO; 9: UNIVERSITY OF MIAMI, CORAL GABLES, FLORIDA, USA; 10: FLORIDA STATE UNIVERSITY, TALLAHASSEE, FLORIDA, USA
CONTACT: [email protected]
OUTLINE
Motivation
Grid Enablement
Application and Scenario
System Overview
Remaining Challenges & Lessons Learned
MOTIVATION
Weather prediction can:
Save lives
Help business owners
How? Through accurate results and precise location information
What do we have? WRF, the Weather Research and Forecasting model: “The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs.”
MOTIVATION (CONT.)
WRF status: runs on a single machine or cluster, over a single domain; fine resolution drives up resource requirements.
How to overcome this? Through Grid enablement.
Expected benefits to WRF: more available resources, different domains, faster results, and improved accuracy.
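As a rough illustration of why fine resolution drives resource requirements, the sketch below estimates how much more work a fixed-area run needs when grid spacing shrinks. It assumes a common rule of thumb rather than any measurement from this project: horizontal grid points grow with the square of the refinement ratio, the stable time step shrinks in proportion to grid spacing, and vertical levels stay fixed. `relative_cost` is a hypothetical helper.

```python
def relative_cost(coarse_dx_km: float, fine_dx_km: float) -> float:
    """Rough work multiplier when refining a fixed-area domain.

    Rule-of-thumb assumptions (not WRF measurements):
      * horizontal grid points grow with the square of the refinement ratio,
      * the stable time step shrinks in proportion to the grid spacing, so the
        number of time steps grows with the refinement ratio,
      * the number of vertical levels is unchanged.
    """
    ratio = coarse_dx_km / fine_dx_km
    horizontal_points = ratio ** 2
    time_steps = ratio
    return horizontal_points * time_steps


if __name__ == "__main__":
    for fine in (5.0, 1.0):
        print(f"15 km -> {fine:g} km: about {relative_cost(15.0, fine):,.0f}x the work")
```

Under these assumptions, going from 15 km to 5 km costs about 27 times the work, and going from 15 km to 1 km about 3,375 times, which is why a single machine or cluster quickly becomes the bottleneck.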
GRID ENABLEMENT
“Grid-enabling is the practice of taking existing applications, which currently run on a single node or on a cluster of homogeneous nodes, and adapt them (either automatically or manually) so that they can be deployed over non-homogeneous computing resources connected through the Internet across multiple organizational boundaries (e.g., multiple clusters from different organizations) without major modifications to the underlying source code.”
The Grid-enablement process is successful if the resulting Grid-enabled application “performs better” than the original application.
“Performs better” can be interpreted in different ways: improved execution time, better resource utilization, enabling collaboration, …
APPLICATION AND SCENARIO: THREE-LAYER NESTED DOMAIN
04/22/23
[Figure: Three-layer nested domain with grid resolutions of 15 km (outer), 5 km, and 1 km (innermost)]
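The three nests step down from 15 km to 5 km to 1 km, i.e. parent-to-nest grid ratios of 3 and 5. The sketch below derives those ratios and matching time steps. It assumes two pieces of general WRF guidance rather than this project's actual configuration: an outer time step of roughly 6 seconds per kilometre of grid spacing, and odd parent-to-nest ratios. `nest_settings` is a hypothetical helper; only its `parent_grid_ratio` key mirrors a WRF namelist option of that name.

```python
from __future__ import annotations


def nest_settings(dx_km: list[float], seconds_per_km: float = 6.0) -> list[dict]:
    """Derive parent-to-nest grid ratios and time steps for a chain of nests.

    dx_km lists grid spacing from the outermost to the innermost domain.
    Assumptions: the outer time step follows the ~6 s per km of grid spacing
    guideline, and each nest's time step is its parent's divided by the grid
    ratio (time-step ratio equal to grid ratio).
    """
    settings = []
    time_step = seconds_per_km * dx_km[0]
    for i, dx in enumerate(dx_km):
        ratio = 1
        if i > 0:
            exact = dx_km[i - 1] / dx
            ratio = round(exact)
            if abs(exact - ratio) > 1e-9:
                raise ValueError(f"domain {i + 1}: non-integer ratio {exact}")
            if ratio % 2 == 0:
                raise ValueError(f"domain {i + 1}: even ratio {ratio}; use an odd ratio")
            time_step /= ratio
        settings.append({"domain": i + 1, "dx_km": dx,
                         "parent_grid_ratio": ratio, "time_step_s": time_step})
    return settings


if __name__ == "__main__":
    for row in nest_settings([15.0, 5.0, 1.0]):
        print(row)
```

For the 15/5/1 km scenario this yields ratios 1, 3, 5 and time steps of 90, 30, and 6 seconds, so the innermost nest takes 15 steps for every outer step.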
SYSTEM OVERVIEW
Web-Based Portal
Grid Middleware (Plumbing)
Job-Flow Management
Meta-Scheduling
Profiling and Benchmarking
Development Tools and Environments
Transparent Grid Enablement (TGE)
TRAP: static and dynamic adaptation of programs (TRAP/BPEL, TRAP/J, TRAP.NET, etc.)
GRID superscalar: a programming paradigm for dynamically parallelizing a sequential application in a computational Grid
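GRID superscalar's core idea, as we understand it, is to turn a sequential series of calls into a task graph by tracking which files each call reads and writes, so independent calls can run in parallel. The toy model below sketches that idea; it is not the GRID superscalar API. `TaskGraph`, `call`, and `waves` are hypothetical names, only read-after-write dependencies are tracked (write-after-read and write-after-write hazards are assumed to be removed by file renaming), and the file names are made up. The stage names are the usual WPS/WRF programs, used only for illustration.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    deps: set[int] = field(default_factory=set)  # indices of tasks this one waits for


class TaskGraph:
    """Toy runtime: records calls in program order and makes each task depend on
    the latest earlier writer of every file it reads (read-after-write only)."""

    def __init__(self) -> None:
        self.tasks: list[Task] = []
        self.last_writer: dict[str, int] = {}

    def call(self, name: str, inputs: Iterable[str] = (), outputs: Iterable[str] = ()) -> int:
        task = Task(name, tuple(inputs), tuple(outputs))
        task.deps = {self.last_writer[f] for f in task.inputs if f in self.last_writer}
        task_id = len(self.tasks)
        self.tasks.append(task)
        for f in task.outputs:
            self.last_writer[f] = task_id
        return task_id

    def waves(self) -> list[list[str]]:
        """Group tasks into waves; tasks in the same wave can run in parallel."""
        level: list[int] = []
        for task in self.tasks:
            level.append(1 + max((level[d] for d in task.deps), default=-1))
        result: list[list[str]] = [[] for _ in range(max(level, default=-1) + 1)]
        for task, lv in zip(self.tasks, level):
            result[lv].append(task.name)
        return result


if __name__ == "__main__":
    g = TaskGraph()
    g.call("geogrid", outputs=["geo_em.d01.nc"])
    g.call("ungrib", inputs=["gfs.grib2"], outputs=["ungrib.out"])
    g.call("metgrid", inputs=["geo_em.d01.nc", "ungrib.out"], outputs=["met_em.d01.nc"])
    g.call("real", inputs=["met_em.d01.nc"], outputs=["wrfinput_d01"])
    g.call("wrf", inputs=["wrfinput_d01"], outputs=["wrfout_d01"])
    print(g.waves())
```

Here geogrid and ungrib land in the same wave because neither reads the other's output, while metgrid, real, and wrf form a chain.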
SYSTEM ARCHITECTURE
[Figure: System architecture, including the Grid middleware]
[Figure: Web-based portal screenshot, meteorologist login interface]
[Figure: Web-based portal screenshot, business owners’/emergency officials’ login interface]
GRID MIDDLEWARE
Middleware: “A layer between network operating systems and applications that aims to resolve heterogeneity and distribution”
Examples: CORBA, Java’s RMI, and .NET.
Grid middleware: middleware for Grid enablement. Examples: Globus, Legion, Condor-G, etc.
PEER-TO-PEER INTER-DOMAIN INTERACTIONS
[Figure: Two domains, BSC and FIU. Each domain has a Web-based portal used by a meteorologist, a job-flow manager, a meta-scheduler governed by resource policies, and local schedulers managing local resources. The domains interact through peer-to-peer protocols.]
[Figure: The BSC/FIU peer-to-peer diagram repeated with numbered interaction steps 1 through 7]
[Figure: Three domains (IBM, BSC, and FIU), each with a job-flow manager and a meta-scheduler, connected peer-to-peer. Local resource managers and clusters shown include TDWB (IBM-USA), TDWB (IBM-India), Fork, BSCgrid, LL/Fork (CEPBA), SGE, GCB, and GCBViz.]
FAULT-TOLERANT JOB-FLOW MANAGEMENT
[Figure: Two domains, Domain 1 (IBM) and Domain 2 (FIU). Components shown: portal clients, BPEL engines (ActiveBPEL and IBM’s WebSphere Process Server), a generic proxy in each domain, the IBM TDWB meta-scheduler, the Globus GridWay meta-scheduler, and local schedulers. Cross-domain interactions: re-submit job to remote domain; re-poll job at remote domain.]
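The “re-submit job to remote domain” and “re-poll job at remote domain” interactions describe the generic proxy's fault-tolerance role. The following is a minimal sketch of that idea, assuming a simple submit/poll interface; `MetaScheduler`, `GenericProxy`, `FakeDomain`, and the state strings are hypothetical and do not reflect the TDWB, GridWay, or BPEL-engine APIs.

```python
from __future__ import annotations

import time
from typing import Protocol


class MetaScheduler(Protocol):
    """Hypothetical minimal interface; the real TDWB and GridWay APIs differ."""
    name: str

    def submit(self, job: str) -> str: ...
    def poll(self, job_id: str) -> str: ...  # "RUNNING", "DONE" or "FAILED"


class JobFailed(Exception):
    pass


class GenericProxy:
    """Try each domain in order; if submission or execution fails, re-submit
    the job to the next (remote) domain and poll it there."""

    def __init__(self, domains: list[MetaScheduler], max_polls: int = 10,
                 poll_interval_s: float = 0.0) -> None:
        self.domains = domains
        self.max_polls = max_polls
        self.poll_interval_s = poll_interval_s

    def run(self, job: str) -> tuple[str, str]:
        errors: list[str] = []
        for domain in self.domains:
            try:
                job_id = domain.submit(job)
                for _ in range(self.max_polls):
                    state = domain.poll(job_id)
                    if state == "DONE":
                        return domain.name, job_id
                    if state == "FAILED":
                        raise JobFailed(f"job {job_id} failed")
                    time.sleep(self.poll_interval_s)
                raise JobFailed(f"job {job_id} did not finish in time")
            except (ConnectionError, JobFailed) as exc:
                errors.append(f"{domain.name}: {exc}")
        raise JobFailed("all domains failed: " + "; ".join(errors))


class FakeDomain:
    """Stand-in meta-scheduler for trying the proxy out."""

    def __init__(self, name: str, outcome: str) -> None:
        self.name = name
        self.outcome = outcome  # "down", "DONE" or "FAILED"

    def submit(self, job: str) -> str:
        if self.outcome == "down":
            raise ConnectionError("domain unreachable")
        return f"{self.name}-{job}"

    def poll(self, job_id: str) -> str:
        return self.outcome


if __name__ == "__main__":
    proxy = GenericProxy([FakeDomain("IBM", "down"), FakeDomain("FIU", "DONE")])
    print(proxy.run("wrf"))  # ('FIU', 'FIU-wrf')
```

Keeping the retry logic in the proxy, rather than in each job flow, lets the BPEL flows stay unaware of which domain actually ran the job.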
JOB FLOW MANAGEMENT ARCHITECTURE
[Figure: Job-flow management architecture, split into deployment time and run time. Components shown: a flow adapter that turns an input job flow into an adapted job flow, a rule editor, patterns, policies, logs, and a job-flow manager (FM) with monitor, correlator, and recovery components, connected to a meta-scheduler (MS) and a generic proxy. Interfaces shown: Proxy::GenericInvoke, FM::Notification, MS::Job Submission and Monitoring, and MS::Notification.]
Sample job flow (WS-BPEL + JSDL), to adapt: operation submitJob, partner link MS_JobSubmissionService.
Sample adapted job flow: operation submitJob, partner link Proxy_JobSubmissionService; or operation genericInvoke, partner link Proxy_GenericInvoke.
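The sample adaptation amounts to redirecting an invoke activity from the meta-scheduler's partner link to a proxy's. A minimal sketch of that rewrite over a WS-BPEL 2.0 document follows. The namespace is the standard WS-BPEL 2.0 executable-process namespace; `RULES`, `adapt`, and the sample process are hypothetical. A real flow adapter would also have to add the matching partner-link declarations and WSDL imports, which this sketch omits.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"

# Hypothetical adaptation rule: (partnerLink, operation) -> (partnerLink, operation)
RULES: dict[tuple[str, str], tuple[str, str]] = {
    ("MS_JobSubmissionService", "submitJob"): ("Proxy_JobSubmissionService", "submitJob"),
}


def adapt(bpel_xml: str, rules: dict[tuple[str, str], tuple[str, str]] = RULES) -> str:
    """Redirect matching <invoke> activities to the partner link named in the rules."""
    ET.register_namespace("", BPEL_NS)  # keep BPEL as the default namespace on output
    root = ET.fromstring(bpel_xml)
    for invoke in root.iter(f"{{{BPEL_NS}}}invoke"):
        key = (invoke.get("partnerLink"), invoke.get("operation"))
        if key in rules:
            partner_link, operation = rules[key]
            invoke.set("partnerLink", partner_link)
            invoke.set("operation", operation)
    return ET.tostring(root, encoding="unicode")


SAMPLE = f"""<process xmlns="{BPEL_NS}" name="wrfJobFlow">
  <sequence>
    <invoke name="runWRF" partnerLink="MS_JobSubmissionService" operation="submitJob"/>
  </sequence>
</process>"""

if __name__ == "__main__":
    print(adapt(SAMPLE))
```

Swapping in the rule `("MS_JobSubmissionService", "submitJob") -> ("Proxy_GenericInvoke", "genericInvoke")` gives the second adapted form from the sample.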
THE META-SCHEDULING PROTOCOL
[Figure: A consumer (Site A) and a producer (Site B), each with connection management, job management, and resource management modules, interact through a Connection API, a Job Management API, and a Resource Exchange API. Resource information is exchanged with requestResourceData() and resourceData() calls, in either pull mode or push mode.]
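Our reading of the diagram is that in pull mode the consumer asks the producer for resource data, while in push mode the producer sends it whenever it changes. A minimal in-process sketch of the Resource Exchange part follows, assuming snake_case stand-ins `request_resource_data` and `resource_data` for requestResourceData() and resourceData(), and a single free-CPU count as the resource data. Connection and job management are omitted, and all class names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourceData:
    site: str
    free_cpus: int


@dataclass
class Consumer:
    """Site A: keeps a view of the resources offered by its peers."""
    site: str
    view: dict[str, int] = field(default_factory=dict)

    def resource_data(self, data: ResourceData) -> None:
        """Receive resource data (called by the producer in push mode)."""
        self.view[data.site] = data.free_cpus

    def pull(self, producer: Producer) -> None:
        """Pull mode: ask the producer for its current resource data."""
        self.resource_data(producer.request_resource_data())


@dataclass
class Producer:
    """Site B: owns resources and shares their availability."""
    site: str
    free_cpus: int
    subscribers: list[Consumer] = field(default_factory=list)

    def request_resource_data(self) -> ResourceData:
        return ResourceData(self.site, self.free_cpus)

    def set_free_cpus(self, free_cpus: int) -> None:
        """Push mode: notify every subscribed consumer when availability changes."""
        self.free_cpus = free_cpus
        for consumer in self.subscribers:
            consumer.resource_data(ResourceData(self.site, free_cpus))


if __name__ == "__main__":
    site_a, site_b = Consumer("Site A"), Producer("Site B", free_cpus=32)
    site_a.pull(site_b)             # pull mode
    site_b.subscribers.append(site_a)
    site_b.set_free_cpus(8)         # push mode
    print(site_a.view)
```

Pull keeps the producer stateless about its peers; push trades a subscriber list for fresher data at the consumer.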
FIU: META-SCHEDULER INTERNAL ARCHITECTURE
[Figure: Components shown: a user client with JSDL job descriptions, a WS client, a global scheduling manager, a resource manager, a site scheduling manager, and connection, job, and resource management modules. Jobs are dispatched through GridWay and Globus to the GCB cluster (SGE) and the LA Grid cluster (Fork).]
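Jobs reach the meta-scheduler as JSDL documents. The sketch below builds a minimal JSDL 1.0 job definition for a POSIX executable with an exact CPU count. The element names and namespaces follow the JSDL 1.0 specification, while `jsdl_job`, the job name, the executable path, and the CPU count are hypothetical values for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)


def jsdl_job(name: str, executable: str, args: list[str], cpus: int) -> str:
    """Build a minimal JSDL 1.0 job definition for a POSIX executable."""
    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
    ident = ET.SubElement(desc, f"{{{JSDL}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL}}}JobName").text = name
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in args:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    resources = ET.SubElement(desc, f"{{{JSDL}}}Resources")
    cpu_count = ET.SubElement(resources, f"{{{JSDL}}}TotalCPUCount")
    ET.SubElement(cpu_count, f"{{{JSDL}}}Exact").text = str(cpus)
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical path and CPU count, for illustration only.
    print(jsdl_job("wrf-forecast", "/opt/wrf/run/wrf.exe", [], cpus=16))
```

Because the description carries the CPU requirement rather than a target host, the global scheduling manager remains free to choose the site.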
BETTER SCHEDULING BY MODELING WRF BEHAVIOR
Resources considered: CPU, cache, memory, disk, and network.
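Modeling WRF behavior is an iterative and incremental process that combines profiling, code inspection and modeling, mathematical modeling, and parameter estimation.
Resulting execution-time model:
$T_{exe} = \left(\alpha_0 + \frac{\alpha_1}{\#\mathrm{nodes}}\right)\left(\beta_0 + \frac{\beta_1}{\mathrm{clock}}\right)$
where #nodes is the number of allocated nodes, clock is the CPU clock rate, and $\alpha_0$, $\alpha_1$, $\beta_0$, $\beta_1$ are obtained by parameter estimation.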
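One way the execution-time model can support better scheduling is to predict the run time of each candidate allocation and choose the cheapest one that meets a deadline. The sketch below assumes that policy (fewest nodes first), which is our illustration rather than the project's scheduler; `ExecTimeModel`, `smallest_allocation`, and the coefficient values are hypothetical, with a0, a1, b0, b1 standing in for the model's four estimated parameters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExecTimeModel:
    """T_exe = (a0 + a1 / nodes) * (b0 + b1 / clock_ghz).

    a0, a1, b0, b1 stand in for the model's four estimated parameters; the
    values used below are made up, not fitted to WRF measurements.
    """
    a0: float
    a1: float
    b0: float
    b1: float

    def predict(self, nodes: int, clock_ghz: float) -> float:
        return (self.a0 + self.a1 / nodes) * (self.b0 + self.b1 / clock_ghz)


def smallest_allocation(model: ExecTimeModel, options: list[tuple[int, float]],
                        deadline_s: float) -> tuple[int, float] | None:
    """Pick the candidate (nodes, clock_ghz) with the fewest nodes whose predicted
    time meets the deadline; ties go to the faster prediction. None if none fit."""
    feasible = [opt for opt in options if model.predict(*opt) <= deadline_s]
    return min(feasible, key=lambda opt: (opt[0], model.predict(*opt)), default=None)


if __name__ == "__main__":
    model = ExecTimeModel(a0=100.0, a1=1000.0, b0=1.0, b1=2.0)  # hypothetical coefficients
    candidates = [(2, 2.0), (4, 2.0), (5, 1.0), (10, 2.0)]
    for n, c in candidates:
        print(n, c, model.predict(n, c))
    print(smallest_allocation(model, candidates, deadline_s=800.0))
```

With these made-up coefficients the four candidates are predicted at 1200, 700, 900, and 400 seconds, so an 800-second deadline selects 4 nodes at 2 GHz and leaves the 10-node allocation free for other work.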
RESULTS: EXECUTION TIME VS. ALLOCATED CPU
RESULTS: MODEL VALIDATION: A LINEAR MODEL!
[Figure: Computation time (seconds, 0 to 30,000) versus inverse CPU clock rate (1/GHz, 0 to 3), one series per node count from 2 to 15 nodes]
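Because the T_exe model is a product of a node-dependent factor and a factor that is linear in 1/clock, holding the node count fixed leaves execution time linear in the inverse clock rate, which is what the validation plot checks. A per-series check can be sketched as an ordinary least-squares line with an R² score. `fit_line` is a hypothetical helper, and the sample timings are synthetic, not the project's measurements.

```python
from __future__ import annotations


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Synthetic timings for one node count (not measured data): T = 1000 + 4000 / clock.
    clocks_ghz = [1.0, 2.0, 4.0]
    inverse_clock = [1.0 / c for c in clocks_ghz]
    times_s = [1000.0 + 4000.0 * x for x in inverse_clock]
    print(fit_line(inverse_clock, times_s))
```

For a given node count, the fitted intercept and slope correspond to the node-dependent factor multiplied by the model's constant and 1/clock coefficients, so fitting several series also exposes how that factor changes with the number of nodes.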
CHALLENGES THAT REMAIN TO BE ADDRESSED
High latency of the Internet compared to high-speed LANs
High overhead of the Grid middleware software
Risking compatibility with future WRF versions
High volume of the WRF source code
Compiling WRF on unsupported platforms
LESSONS LEARNED
There is no current, complete methodology for Grid enablement
Grid-enabling cluster applications raises issues such as LAN vs. WAN
WRF: lack of sufficient documentation, old programming techniques
Mathematical model: may optimize speedup, but also has an error margin; more clusters are needed
Still at an early stage of a concrete scenario for forecast ensembles
ACKNOWLEDGEMENTS
We are thankful to the following individuals for their contributions to some of the ideas presented in this paper: Yanbin Liu, Norman Bobroff, Balaji Viswanathan, Steve Luis, Shu-Ching Chen, Lloyd Trinish, Jason Liu, Alex Orta, T. N. Krishnamurti, Eric Johnson, and Donald Llopis.
This work was supported in part by IBM (SUR and Student Support awards) and the National Science Foundation (grants OISE-0730065, OCI-0636031, REU-0552555, and HRD-0317692). This work is part of the Latin American Grid (LA Grid) project.
Contact Information: S. Masoud Sadjadi
http://www.cs.fiu.edu/~sadjadi/
[email protected]
Thank you!
Questions?