Cesar R. S. da Silva 1 Pedro R. C. da Silveira 1 Renata M. Wentzcovitch 1,2
VLab: A Collaborative Cyberinfrastructure for
Computations of Materials Properties at High Pressures
and Temperatures
Cesar R. S. da Silva (1)
Pedro R. C. da Silveira (1)
Renata M. Wentzcovitch (1,2)
(1) Minnesota Supercomputing Institute, University of Minnesota
(2) Department of Chemical Engineering and Materials Science, University of Minnesota
Work Sponsored by NSF grant ITR-0426757
Outline
- “VLab is a cyberinfrastructure aimed to facilitate execution of complex calculations - mostly parameter sampling workflows - of materials at high pressures and temperatures.”
- Parameter Sampling Workflows
  - High P,T Cij as example
- Basic Problem:
  - Job deluge
- Proposed Solution:
  - Features
  - Performance
- Overall Requirements
- Workflow Support Specific Requirements
- Service Oriented Architecture
Thermodynamic Method
• VDoS and F(T,V) within the QHA
• F(T,V) fitted at several temperatures by either:
  - Vinet EoS, or
  - N-th order (N = 3, 4, 5, …) isothermal (Eulerian) finite strain EoS
[Diagram: equilibrium structure at each pressure (Pn); (kl); re-optimize]
Thermoelastic constant tensor Cij^S(T,P)
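For reference, the quantities named on this slide are usually written as below. The slide itself does not show the expressions, so these are the standard textbook forms and the symbols are assumptions: E_0(V) is the static energy, ω_{q,j}(V) are the phonon frequencies at wave vector q and branch j, and V_0, K_0, K_0' are the zero-pressure volume, bulk modulus and its pressure derivative fitted at each temperature.

```latex
% Quasiharmonic (QHA) Helmholtz free energy
F(V,T) = E_0(V)
       + \frac{1}{2}\sum_{q,j} \hbar\,\omega_{q,j}(V)
       + k_B T \sum_{q,j} \ln\!\left[ 1 - \exp\!\left( -\frac{\hbar\,\omega_{q,j}(V)}{k_B T} \right) \right]

% Vinet equation of state, fitted isotherm by isotherm
P(V) = 3 K_0 \, \frac{1 - x}{x^{2}} \,
       \exp\!\left[ \tfrac{3}{2}\,(K_0' - 1)\,(1 - x) \right],
\qquad x = \left( \frac{V}{V_0} \right)^{1/3}
```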
Basic Problem
- How can High (P,T) Materials Computations be improved?
Demand for Extensive Parameter Sampling
⇒ Typical High (P,T) study (e.g. Thermal Properties): {Pn}x{qi} => ~10^2 jobs
⇒ Huge High (P,T) study (Cij(P,T)): {Pn}x{i}x{qj} => ~10^3-10^4 jobs
• 10^2-10^4 jobs to prepare, submit and monitor
• Manual work is prone to human errors
• First Principles => Sheer number (10^15-10^20) of operations (today) => Well over 10^22 in 3-5 years
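The job counts above follow from taking a Cartesian product over the sampled parameters. A minimal sketch of that arithmetic is given here; the sampling sizes, the field names, and the reading of the middle index as a set of strained configurations are all assumptions made for illustration, not values from the slides.

```python
from itertools import product

def enumerate_jobs(pressures, strains, q_points):
    """Return one job description per (pressure, strain, q-point) combination."""
    return [
        {"pressure_GPa": p, "strain": e, "q_point": q}
        for p, e, q in product(pressures, strains, q_points)
    ]

# Hypothetical sampling sizes, chosen only to show how the count scales.
pressures = [10.0 * n for n in range(10)]      # 10 pressures
strains = [f"strain_{i}" for i in range(12)]   # 12 strained configurations
q_points = [f"q_{j}" for j in range(8)]        # 8 phonon wave vectors

# Thermal-properties study: pressures x q-points only.
thermal_jobs = enumerate_jobs(pressures, ["unstrained"], q_points)
# Cij(P,T) study: pressures x strains x q-points.
cij_jobs = enumerate_jobs(pressures, strains, q_points)

print(len(thermal_jobs), len(cij_jobs))
```

With these assumed sizes the first study needs on the order of 10^2 jobs and the second on the order of 10^3, which is the scale at which manual preparation and monitoring stops being practical.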
The VLab
- Consolidated Web Interface (Portal) to a set of tools:
  - Quantum ESPRESSO Package tools
  - Input preparation for pwscf, phonon, workflows, etc. …
  - Data Analysis Tools
  - Visualization Tools (VTK/OpenGL)
  - etc. …
- Workflow Management
- Leverages computing capabilities of distributed resources (TeraGrid, Compute Farms, scattered resources, other grids)
- Collaboration through shared access to resources
- Task Distribution and Data Recollection
The Big Challenge of Performance
Proposed Solution: Leveraging Concurrent Computing for Features and Performance
High Performance Parallel Computing
High Throughput Distributed Processing
• Scale-up approach is difficult
• Limited number of processors in a single system
• Even using the fastest vector processors is not enough
• Trend is towards denser processing, not faster single-thread execution
• MPP systems are not cost effective for this class of problems
• FFT and matrix transposition: limited scalability, or
• Low performance per processor
VLab - Not Just a Client/Server
The Client/Server Approach:
- The portal and the supporting modules have access to a large central multi-processor system.
- Can work as a facilitator but lacks other important features found in VLab:
  - No flexibility of scheduling
  - No redundancy => Poor availability
  - No choice for cost (usually high)
VLab - Not Just a Client/Server
The VLab Distributed System Approach:
- Distributed resources are replicated for:
  - Redundancy
  - Performance
  - Flexibility
- No central system to fail and bring everything down!
- More flexible scheduling for:
  - Cost
  - Turnaround Time
  - Job Throughput
  - Workload Balance
  - System Throughput
VLab requirements
•Workflow management => Facilitator
•Support for distributed computations
•Ease of use
•Support for collaboration
•Flexibility (update/add tools, new features)
•Fault tolerance
•Diversity of tools: analysis, visualization, data reduction, storage, etc.
VLab Workflows
Typical VLab workflows, like the High-T Cij calculation, involve iterations through the following steps:
1) Prepare inputs for tasks, and generate execution packages containing required files.
2) Dispatch the execution packages to compute nodes for execution.
3) Gather results for analysis and eventually iterate steps 1-3.
- Results always return to the input sources => Tree-like service architecture
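The three steps above can be sketched as a simple loop. This is only an illustration of the prepare / dispatch / gather cycle: the function names, the package layout, and the stopping rule are hypothetical, and a plain function call stands in for sending a package to a remote compute node.

```python
def prepare_packages(parameters):
    """Step 1: build one execution package (inputs plus required files) per task."""
    return [{"task_id": i, "inputs": p} for i, p in enumerate(parameters)]

def dispatch(packages, compute):
    """Step 2: send each package for execution; here `compute` is called directly."""
    return [
        {"task_id": pkg["task_id"], "output": compute(pkg["inputs"])}
        for pkg in packages
    ]

def run_workflow(initial_parameters, compute, analyze, max_iterations=10):
    """Step 3: gather results, analyze them, and iterate while new parameters are requested."""
    parameters = initial_parameters
    history = []
    for _ in range(max_iterations):
        packages = prepare_packages(parameters)
        results = dispatch(packages, compute)  # results return to the caller that issued them
        history.append(results)
        parameters = analyze(results)
        if not parameters:                     # nothing left to refine: stop iterating
            break
    return history

def analyze(results):
    # Hypothetical rule: request a follow-up task for every output above 10.
    return [r["output"] // 10 for r in results if r["output"] > 10]

history = run_workflow([2, 5], compute=lambda x: x * x, analyze=analyze)
print(len(history))
```

Because each round's results come back to the function that prepared the inputs, the call structure mirrors the tree-like arrangement noted on the slide.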
VLab Service Oriented Architecture
On the Web: http://dasilveira.msi.umn.edu:8080/vlab/
Usage oriented view of VLab SOA
=> Tree-like structure in 4 layers:
1) User Interface (Portal)
2) Workflow control and monitoring (Project Executor / Interaction)
3) Task Dispatching / Interaction, task data retrieving, Auxiliary Services
4) Heavy computations and Visualization resources layer.
Fault Tolerance
• Only Project Executor sessions and a few user and project interaction sessions are required to be persistent. Therefore, a simple approach to Fault Tolerance (FT) is possible:
- Reactive: We have not identified any need for proactive FT.
- Registry Based: Persistent sessions are registered and must periodically inform the registry that they are still alive.
- Redundant Registry and Metadata DB for data persistence.
- Full journaling (data and metadata) of critical transactions for data and metadata integrity. This guarantees that the state of any persistent session can be restored in case of failure.
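The registry-based scheme described on this slide can be illustrated with a small heartbeat registry. This is a sketch under stated assumptions, not VLab's implementation: the class name, the timeout value, and the session identifiers are hypothetical, and the registry is kept in memory rather than in a redundant database.

```python
import time

class SessionRegistry:
    """Tracks the last 'alive' report of each persistent session."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # silence longer than this marks a session as failed
        self.last_seen = {}          # session id -> time of last heartbeat

    def heartbeat(self, session_id, now=None):
        """Called periodically by a session to report that it is alive."""
        self.last_seen[session_id] = time.monotonic() if now is None else now

    def dead_sessions(self, now=None):
        """Reactive check: list sessions whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout_s)

# Explicit timestamps are passed so the example is deterministic.
registry = SessionRegistry(timeout_s=30.0)
registry.heartbeat("project-executor-1", now=0.0)
registry.heartbeat("project-executor-2", now=0.0)
registry.heartbeat("project-executor-1", now=25.0)  # only executor 1 reports again
stale = registry.dead_sessions(now=40.0)
print(stale)
```

A session found in the stale list would then be restarted and its state restored from the journaled transactions; that recovery step is not shown here.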
Scheduling
The usual approach:
- Use agents that interact with the broker
Problem: Agents are not stateless!
- More complicated to develop
- Persistence must be guaranteed
The VLab approach:
- Use an independent WS to monitor workload.
- Persistence of data is provided by a local DB.
- Compute WS and Workload Monitor are stateless!
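One way to picture a stateless workload monitor backed by a local database is sketched below. The table layout, function names, and resource names are assumptions for illustration; the slides do not describe VLab's actual schema or service interface. Each function keeps no state between calls, so everything it needs is read from or written to the database.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the local DB; pass a file path instead of ':memory:' for real persistence."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workload ("
        "resource TEXT PRIMARY KEY, queued_jobs INTEGER NOT NULL)"
    )
    return conn

def report_load(conn, resource, queued_jobs):
    """Record the latest workload reported for a compute resource."""
    conn.execute(
        "INSERT OR REPLACE INTO workload (resource, queued_jobs) VALUES (?, ?)",
        (resource, queued_jobs),
    )
    conn.commit()

def least_loaded(conn):
    """Pick the resource with the fewest queued jobs, or None if nothing is registered."""
    row = conn.execute(
        "SELECT resource FROM workload ORDER BY queued_jobs ASC, resource ASC LIMIT 1"
    ).fetchone()
    return row[0] if row else None

conn = open_db()
report_load(conn, "cluster-a", 12)
report_load(conn, "cluster-b", 3)
report_load(conn, "cluster-a", 1)   # a newer report replaces the older one
choice = least_loaded(conn)
print(choice)
```

Since the monitor can be restarted at any time without losing information, only the database needs to be made durable.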
VLab in Action
Watch a demonstration movie at vlab.msi.umn.edu -> Follow the links “portal” -> “movie”
Calculation of High P,T Thermodynamic Properties
- Cubic MgO, 2 atom cell
- Static + Lattice Dynamics calculation
- {Pn}x{i} sampling
- Show distributed computing capabilities
- Ability to integrate visualization and data analysis tools
VLab Workflows
Left: Extensive High-T Cij
Right: Detailed View of Cij and phonon