The Open Science Grid
NMTIE Conference, University of New Mexico, November 2015
Rob Gardner • University of Chicago [email protected]
Outline
1 Jumpstart high throughput computing
2 OSG, and how to use it
3 What training resources are available?
4 Campus to national cyberinfrastructure
What is high throughput computing (HTC)?
● Big tasks split into smaller tasks
● Batches of similar program runs
● Loops over independent tasks
● Many scientific “HPC tasks” are really HTC
Q: Can the job be split into smaller tasks?
If so, bake the cake with many small ovens
● use commodity technology
● ovens easier to maintain, replace
● easier to schedule time
HTC is many small ovens working on pieces of the problem independently
Why HTC? ⇒ Reduce time to Science
A pattern that can be replicated for many sciences
Lauren Michael, University of Wisconsin, ACI-REF program, OSG Summer School 2015
Serial execution vs. concurrent execution (save time!)
If the order of subtasks is not important, jobs can be run independently on available processors
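The difference between serial and concurrent execution can be sketched on a single machine. This is only an analogy, under the assumption that each subtask is fully independent: `run_task` is a hypothetical stand-in for one unit of real work, and on the OSG a job scheduler spreads such tasks over many machines rather than over background processes.

```shell
#!/bin/bash
# Sketch: run independent subtasks one after another, then concurrently.
# run_task is a hypothetical stand-in for one unit of real work.

run_task() {
    # Write the square of the task index to the task's own output file.
    local i="$1" outdir="$2"
    echo $(( i * i )) > "$outdir/task_$i.out"
}

outdir=$(mktemp -d)

# Serial execution: each task waits for the previous one to finish.
for i in 1 2 3 4; do
    run_task "$i" "$outdir"
done

# Concurrent execution: the tasks share nothing, so start them all in the
# background and wait for the whole batch to finish.
for i in 5 6 7 8; do
    run_task "$i" "$outdir" &
done
wait

# Gather step: combine the per-task outputs once every task is done.
total=$(cat "$outdir"/task_*.out | awk '{ s += $1 } END { print s }')
echo "sum of all task outputs: $total"
```

Because no task reads another task's output, the order in which the background tasks finish does not change the gathered result.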
Transitioning “serial” to HTC
Lauren Michael, University of Wisconsin, ACI-REF program, OSG Summer School 2015
Example use case: high energy physics
⇒ Datasets can be processed on independent clusters
⇒ Jobs use single or multiple cores on a server
Science and computational methods
Any computation that can be split into independent pieces:
● Parameter sweeps
● Multi-start simulations
● Statistical model optimization
● Image analysis
● Pattern recognition
● Text mining
● Data-intensive analysis
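A parameter sweep, the first item in the list, splits naturally into one job per parameter combination. A minimal sketch, assuming two hypothetical parameters (`alpha` and `seed`) with made-up values:

```shell
#!/bin/bash
# Sketch: expand a parameter sweep into one line per independent job.
# The parameter names (alpha, seed) and their values are hypothetical.

make_sweep() {
    # Print "alpha seed" for every combination, one job per line.
    local alpha seed
    for alpha in 0.1 0.5 1.0; do
        for seed in 1 2 3 4; do
            echo "$alpha $seed"
        done
    done
}

workdir=$(mktemp -d)
make_sweep > "$workdir/params.txt"
njobs=$(wc -l < "$workdir/params.txt")
echo "jobs in sweep: $njobs"
```

A list like this can be handed to a scheduler as is. HTCondor, for instance, accepts `queue alpha, seed from params.txt` in a submit file to start one job per line.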
Example: Evolutionary Biology
● Understanding evolution at the molecular scale in DNA with a combination of mathematical modeling and simulation
● How quickly does a genome fix a mutation?
● Role of randomness versus natural selection?
Joshua Plotkin, Penn: “We use the OSG to run computer simulations for complex processes. By simulating evolution in populations, we can study hypothetical situations that you can’t study in the wet lab or field.”
Image courtesy Joshua Plotkin. A computationally predicted structure of an influenza virus protein. Plotkin’s research group uses the Open Science Grid to study questions in evolutionary biology and ecology.
The Open Science Grid
● 500k jobs/day
● 925M CPU-hours/year
● A distributed computing partnership of over 125 campuses
● Data intensive: 0.5 PB of data transferred per day
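To put these figures in perspective, here is a back-of-envelope sketch. It assumes a 365-day year and 1 PB = 10^15 bytes; the derived values are rough averages computed from the slide's numbers, not figures reported by OSG.

```shell
#!/bin/bash
# Back-of-envelope figures derived from the slide's numbers.
# Assumptions (not from the slide): 365-day year, 1 PB = 10^15 bytes.

cpu_hours_per_year=925000000
jobs_per_day=500000
pb_per_day=0.5

# Average number of cores kept busy around the clock.
avg_cores=$(awk -v h="$cpu_hours_per_year" \
    'BEGIN { printf "%.0f", h / (365 * 24) }')

# Average CPU-hours consumed per job.
hours_per_job=$(awk -v h="$cpu_hours_per_year" -v j="$jobs_per_day" \
    'BEGIN { printf "%.1f", h / 365 / j }')

# Sustained transfer rate in gigabits per second.
gbps=$(awk -v p="$pb_per_day" \
    'BEGIN { printf "%.1f", p * 1e15 * 8 / 86400 / 1e9 }')

echo "average busy cores:    $avg_cores"       # 105594
echo "CPU-hours per job:     $hours_per_job"   # 5.1
echo "sustained rate (Gbps): $gbps"            # 46.3
```

Under these assumptions the workload corresponds to roughly 100,000 cores in continuous use, with a typical job running for a few CPU-hours.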
How do I use it? ⇒ OSG Connect
OSG as a campus cluster
★ Login host
★ Job scheduler
★ Software
★ Storage
OSG Connect Service
● http://osgconnect.net site
● Login node for job management: login.osgconnect.net
● Stash storage platform, common storage accessible via:
○ scp, rsync
○ http
○ Globus (gridftp)
● Recommended path for (most) new OSG users
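Staging input data to Stash with rsync through the login node might look like the sketch below. Only the host name comes from the slide; the user name `alice` and the remote path `stash/inputs/` are hypothetical placeholders. The function prints the command for review instead of running it.

```shell
#!/bin/bash
# Sketch: build the command that copies a local input directory to Stash
# through the login node. The user name and remote path are hypothetical;
# only the host name comes from the slide.

LOGIN_HOST="login.osgconnect.net"

stash_push_cmd() {
    # Print (rather than run) an rsync invocation so it can be reviewed.
    local user="$1" src="$2" dest="$3"
    printf 'rsync -av %s %s@%s:%s\n' "$src" "$user" "$LOGIN_HOST" "$dest"
}

stash_push_cmd "alice" "./inputs/" "stash/inputs/"
```

Printing the command first is a cautious design choice: the real account name and storage path depend on how the OSG Connect account is set up.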
OSG Connect Service
Has an identity bridge: local campus identity (CILogon) ‣ OSG Connect identity (Globus) ‣ virtual organization (OSG)
+ HTCondor glidein overlay
⇒ Goal is to provide a virtual HTC cluster experience
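From the user's side, the "virtual cluster" looks like an ordinary HTCondor pool: describe a batch of jobs in a submit file and hand it to the scheduler. A minimal sketch, assuming a hypothetical executable `run_task.sh`, a hypothetical `logs/` directory that already exists, and an arbitrary batch size of 100:

```shell
#!/bin/bash
# Sketch: write a minimal HTCondor submit description for a batch of
# independent jobs. File names and the job count are hypothetical.

write_submit_file() {
    local path="$1" njobs="$2"
    # \$(Process) is escaped so that HTCondor, not bash, expands it; each
    # job in the batch receives its own index (0, 1, 2, ...).
    cat > "$path" <<EOF
# One submit description, $njobs independent jobs.
executable     = run_task.sh
arguments      = \$(Process)
output         = logs/task_\$(Process).out
error          = logs/task_\$(Process).err
log            = logs/batch.log
request_cpus   = 1
request_memory = 1GB
queue $njobs
EOF
}

workdir=$(mktemp -d)
write_submit_file "$workdir/batch.submit" 100
cat "$workdir/batch.submit"
# On the login node the batch would then be submitted with:
#   condor_submit batch.submit
```

The script only writes the file, so it can be tried on any machine; the `condor_submit` step needs a host with HTCondor installed, such as the login node.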
Software & tools on the OSG
● Distributed software file system: OASIS
● Special module command
○ identical software on all clusters
○ 170 libraries
#!/bin/bash
switchmodules oasis
module load R
module load matlab
...
Storage service: “Stash”
● Storage service for job input/output data
● Globus Server for managed transfers between campus and the OSG
● POSIX access provided to the login host
● 500 TB capacity
● Personalized http accessible space
● Connected to 100 Gbps SciDMZ (I2, ESnet)
Simulations for high rate error correction codes for optical data communication and data compression
● David Mitchell, NMSU EE faculty
○ Ahmad Golmohammadi (EE graduate student)
○ Important for digital space and satellite communication & wireless data transmission
○ Whole system simulations - transmitter, decoder, receiver & stochastic noise, data compression
○ Computations are well-suited to HTC. Ahmad:
Local contact: Piyasat Nilkaew, Dir. Telecom, Networking, User Support
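Simulations of this kind suit HTC because each run is independent and only the final tallies need combining. The sketch below shows such a gather step. It is not the NMSU group's actual workflow: the result format, one line per job holding `<bit_errors> <bits_simulated>`, is a hypothetical assumption.

```shell
#!/bin/bash
# Sketch: combine per-job results from independent simulation runs.
# Assumed (hypothetical) format: each job writes a single line,
# "<bit_errors> <bits_simulated>", to its own result file.

combine_ber() {
    # Sum both columns over all result files and print the pooled
    # bit error rate in scientific notation.
    cat "$@" | awk '{ errors += $1; bits += $2 }
                    END { if (bits > 0) printf "%.3e\n", errors / bits }'
}

# Usage, once all jobs have finished:
#   combine_ber results/job_*.out
```

Pooling the raw counts, rather than averaging per-job rates, gives the correct overall rate even when jobs simulate different numbers of bits.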
OSG Connect - onboard quickly
Researchers without local HPC or HTC can log in to OSG Connect directly, but…
Bring the submit point to campus?
...some researchers prefer to work “from home”:
● Local standard configuration
● Local standard data management
● Local standard software access tools
● First point of contact for scientific computing consulting
Connect Client brings pools to campus
The idea is to bring the submit point “home”, i.e. onto campus
Submit locally, run globally
Annual OSG User School (July, Madison)
Week of lectures and intensive HTC challenges
Targeted at grad students, postdocs & HPC facilitators
Joint OSG Software Carpentry Workshops
● Two members of OSG User Support team are Software Carpentry instructors
● Extend the standard core curriculum with a day of HTC best practices
HTC Recipes, Online Help, Ticketing
● Helpdesk portal
● Knowledge base
● Online chat
● Submit help requests
How can I get my campus connected?
● Connect your campus researchers to the OSG
○ OSG Connect - a login service to the OSG
○ Connect Client: a job submit client for the local cluster
■ provides a “burst”-like capability for HTC jobs to shared opportunistic resources
● Connect a campus cluster to OSG
○ Lightweight connect: OSG sends “glidein” jobs to your cluster, using a simple user account
■ No local software or services needed!
○ Large scale: deploy the OSG software stack
■ Support more science communities at larger scale
■ Usually recommended as a second step
Campus to national networking
● OSG helps campuses analyze perfSONAR data
● Can create campus or regional dashboards
Summary of the Open Science Grid
● Helps researchers speed up their research using high throughput computing methods
● Helps campus HPC administrators share local resources for multi-campus and national collaborative research
● Collects, archives and analyzes network performance data for links between campuses and the national cyberinfrastructure
osgusers
support.opensciencegrid.org
www.opensciencegrid.org/links
opensciencegrid
This talk: http://bit.ly/1WZ2grg