Introduction to HTC
2015 OSG User School, Monday, Lecture 1
Greg Thain
University of Wisconsin–Madison
Center For High Throughput Computing
Welcome!
Why Are We Here?
Transform Your Research With Computing
Overview
Overview of Week
Monday
• High Throughput Computing locally
• Miscellaneous
  ‣ Survey
  ‣ UW reimbursement form
Tuesday
• Distributed High Throughput Computing
• Security
• Tour of Wisconsin Institutes for Discovery
Wednesday
• Distributed storage
• Practical issues with DHTC
Thursday
• From science to production
• Principles of HTC
• HTC Showcase
• Next steps
Overview of a Day
• Short introductory lectures
• Lots of hands-on exercises
• Some demos, interactive sessions, etc.
• Optional evening sessions
  ‣ Monday – Wednesday, 7–9 p.m.
  ‣ Union South (check TITU)
  ‣ School staff on hand
Keys to Success
• Work hard
• Ask questions!
  … during lectures
  … during exercises
  … during breaks
  … during meals
  … in person is best, email is OK
• If we do not know an answer, we will try to find the person who does
Ready?
One Thing
Computing is Cheap!
Goals For This Session
• Understand the basics of High Throughput Computing
• Understand a few things about HTCondor, which is one kind of HTC system
• Use basic HTCondor commands
• Run a job locally!
Why HTC?
Computing in Science
[Figure: Science, with Theory and Experiments]
Computing in Science
[Figure: Science, with Theory, Computing, and Experiments]
Example Challenge
• You have a program to run (simulation, Monte Carlo, data analysis, image analysis, stats, …)
• Each run takes about 1 hour
• You want to run the program 8 × 12 × 100 times
• 9,600 hours ≈ 1.1 years … running nonstop!
• Conference is next week
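The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the factors 8, 12, and 100 are copied from the slide without knowing which parameters they stand for.

```python
# Back-of-the-envelope check of the Example Challenge numbers.
HOURS_PER_RUN = 1            # "Each run takes about 1 hour"
RUNS = 8 * 12 * 100          # number of runs, as given on the slide
HOURS_PER_YEAR = 24 * 365    # ignoring leap years

total_hours = RUNS * HOURS_PER_RUN
total_years = total_hours / HOURS_PER_YEAR

print(f"{RUNS} runs -> {total_hours} hours ~ {total_years:.1f} years on one computer")
```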
Distributed Computing
• Use many computers to perform 1 computation
• Example:
  ‣ 2 computers => 4,800 hours ≈ ½ year
  ‣ 8 computers => 1,200 hours ≈ 2 months
  ‣ 100 computers => 96 hours = 4 days
  ‣ 9,600 computers => 1 hour! (but …)
These computers are no faster than your laptop!
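These figures assume the runs are independent, divide evenly across machines, and that every machine is as fast as the single-computer case. A small sketch of that idealized calculation follows; the function name `ideal_wall_hours` is hypothetical, and queueing, file transfer, and failures are ignored.

```python
import math

def ideal_wall_hours(total_runs, hours_per_run, computers):
    """Idealized wall-clock time when independent runs are spread evenly."""
    # Each computer works through its share of the runs one after another.
    rounds = math.ceil(total_runs / computers)
    return rounds * hours_per_run

for n in (1, 2, 8, 100, 9600):
    hours = ideal_wall_hours(9600, 1, n)
    print(f"{n:>5} computers -> {hours:>5} hours ({hours / 24:.1f} days)")
```

The speedup comes only from running more jobs at once, not from any single run finishing sooner.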
Performance vs. Throughput
• High Performance Computing (HPC)
  ‣ Focus on biggest, fastest systems (supercomputers)
  ‣ Maximize operations per second
  ‣ Often requires special code
  ‣ Often must request and wait for access
• High Throughput Computing (HTC)
  ‣ Focus on using all resources, reliably, all the time
  ‣ Maximize operations per year
  ‣ Use any kind of computer, even old, slow ones
  ‣ Must break task into separate, independent parts
  ‣ Access varies by availability, usage, etc.
HPC vs HTC: An Analogy
Example HTC Site (Wisconsin)
• Our local HTC systems
• Recent CPU hours:
  ~ 280,000 / day
  ~ 8.3 million / month
  ~ 78 million / year
Open Science Grid
• HTC scaled way up
  ‣ Over 110 sites
  ‣ Mostly U.S.
  ‣ Some others
  ‣ Past year:
    ~170 million jobs
    ~770 million CPU hours
    ~372 petabytes transferred
• Can submit jobs locally, move to OSG
• http://www.opensciencegrid.org/
Other Distributed Computing
• Other systems to manage a local cluster:
  ‣ PBS/Torque
  ‣ LSF
  ‣ Sun Grid Engine/Oracle Grid Engine
  ‣ SLURM
• Other wide-area systems:
  ‣ European Grid Infrastructure
  ‣ Other national and regional grids
  ‣ Commercial cloud systems used to augment grids
• HPC
  ‣ Various supercomputers (e.g., TOP500 list)
  ‣ XSEDE
HTCondor
HTCondor History and Status
• History
  ‣ Started in 1988 as a “cycle scavenger”
  ‣ Protected interests of users and machine owners
• Today
  ‣ Expanded to become CHTC team: 20+ full-time staff
  ‣ Current production release: HTCondor 8.4.0
  ‣ HTCondor software: ~700,000 lines of C/C++ code
• Miron Livny
  ‣ Professor, UW–Madison CompSci
  ‣ Director, CHTC
  ‣ Dir. of Core Comp. Tech., WID/MIR
  ‣ Tech. Director & PI, OSG
HTCondor Functions
• Users
  ‣ Define jobs, their requirements, and preferences
  ‣ Submit and cancel jobs
  ‣ Check on the state of a job
  ‣ Check on the state of the machines
• Administrators
  ‣ Configure and control the HTCondor system
  ‣ Declare policies on machine use, pool use, etc.
• Internally
  ‣ Match jobs to machines (enforcing all policies)
  ‣ Track and manage machines
  ‣ Track and run jobs
Terminology: Job
• Job: A computer program or one run of it
• Not interactive, no GUI (e.g., not Word or email)
  (How could you interact with 1,000 programs running at once?)
  1. Input: command-line arguments and/or files
  2. Run: do stuff
  3. Output: standard output & error and/or files
• Scheduling
  ‣ User decides when to submit job to be run
  ‣ System decides when to run job, based on policy
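The input, run, output pattern can be made concrete with a tiny non-interactive program. The sketch below is a hypothetical word counter, not a program from the school's exercises; the function names and the meaning of the second argument (how many top words to print) are assumptions for illustration.

```python
import sys
from collections import Counter

def top_words(text, limit):
    """Run step: count words and return the `limit` most common ones."""
    counts = Counter(text.lower().split())
    return counts.most_common(limit)

def main(argv):
    # Input: command-line arguments and a file; no prompts, no GUI.
    if len(argv) != 3:
        print(f"usage: {argv[0]} FILE LIMIT", file=sys.stderr)
        return 1
    path, limit = argv[1], int(argv[2])
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    # Output: results go to standard output, problems to standard error.
    for word, count in top_words(text, limit):
        print(f"{word} {count}")
    return 0

# Only act when arguments were supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Because everything the program needs arrives through arguments and files, a thousand copies can run unattended.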
Terminology: Machine, Slot
• Machine
  ‣ A machine is a physical computer (typically)
  ‣ May have multiple processors (computer chips)
  ‣ One processor may have multiple cores (CPUs)
• HTCondor: Slot
  ‣ One assignable unit of a machine (i.e., 1 job per slot)
  ‣ Most often, corresponds to one core
  ‣ Thus, typical machines today have 4–40 slots
• Advanced HTCondor feature: Can get 1 slot with many cores on 1 machine, for MPI(-like) jobs
Terminology: Matchmaking
• Two-way process of finding a slot for a job
• Jobs have requirements and preferences
  E.g.: I need Red Hat Linux 6 and 100 GB of disk space, and prefer to get as much memory as possible
• Machines have requirements and preferences
  E.g.: I run jobs only from users in the Comp. Sci. dept., and prefer to run ones that ask for a lot of memory
• Important jobs may replace less important ones
• Thus: Not as simple as waiting in a line!
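Two-way matching can be illustrated with a toy model. The sketch below is not HTCondor's ClassAd language or its matchmaking code; the `Job` and `Machine` classes, their fields, and the ranking rule are all assumptions chosen to mirror the two examples on the slide.

```python
from dataclasses import dataclass

@dataclass
class Job:
    owner_dept: str
    needs_os: str
    needs_disk_gb: int

@dataclass
class Machine:
    name: str
    os: str
    disk_gb: int
    memory_gb: int
    allowed_depts: tuple

def acceptable(job, machine):
    """Both sides must agree: job requirements AND machine requirements."""
    job_ok = machine.os == job.needs_os and machine.disk_gb >= job.needs_disk_gb
    machine_ok = job.owner_dept in machine.allowed_depts
    return job_ok and machine_ok

def best_machine(job, machines):
    """Among acceptable machines, the job prefers the one with most memory."""
    candidates = [m for m in machines if acceptable(job, m)]
    return max(candidates, key=lambda m: m.memory_gb, default=None)

job = Job("compsci", "RHEL6", 100)
pool = [
    Machine("a", "RHEL6", 500, 16, ("compsci",)),
    Machine("b", "RHEL6", 500, 64, ("physics",)),   # refuses this job's dept
    Machine("c", "RHEL6", 50, 128, ("compsci",)),   # too little disk
    Machine("d", "RHEL6", 200, 32, ("compsci",)),
]
match = best_machine(job, pool)
print(match.name if match else "no match")
```

The toy model covers requirements on both sides and the job's preference only; machine preferences and replacing less important jobs are left out to keep it short.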
Running a Job
Viewing Slots
condor_status

• With no arguments, lists all slots currently in pool
• Summary info is printed at the end of the list
• For more info: exercises, -h, manual, next lecture

[email protected] LINUX X86_64 Claimed   Busy 1.000 1024 0+19:09:
[email protected] LINUX X86_64 Claimed   Busy 1.000 1024 0+19:09:31
[email protected] LINUX X86_64 Unclaimed Idle 1.000 1024 0+17:37:54
[email protected] LINUX X86_64 Claimed   Busy 1.000 1024 0+19:09:32
[email protected] LINUX X86_64 Unclaimed Idle 0.000 1024 0+17:55:15
[email protected] LINUX X86_64 Unclaimed Idle 0.000 1024 0+17:55:16

               Total Owner Claimed Unclaimed Matched Preempting Backfill
INTEL/WINNT51      2     0       0         2       0          0        0
INTEL/WINNT61     52     2       0        50       0          0        0
X86_64/LINUX    2086   544    1258       284       0          0        0
Total           2140   546    1258       336
Viewing Jobs
condor_q

• With no args, lists all jobs waiting or running here
• For more info: exercises, -h, manual, next lecture

-- Submitter: osg-ss-submit.chtc.wisc.edu : <...> : ...
 ID   OWNER  SUBMITTED    RUN_TIME   ST PRI SIZE CMD
 6.0  cat    11/12 09:30  0+00:00:00 I  0   0.0  explore.py
 6.1  cat    11/12 09:30  0+00:00:00 I  0   0.0  explore.py
 6.2  cat    11/12 09:30  0+00:00:00 I  0   0.0  explore.py
 6.3  cat    11/12 09:30  0+00:00:00 I  0   0.0  explore.py
 6.4  cat    11/12 09:30  0+00:00:00 I  0   0.0  explore.py

5 jobs; 5 idle, 0 running, 0 held
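The summary line at the end of the listing is a tally of the ST (state) column. Here is a small sketch of that tally, using the five rows from the slide as sample text. It does not call HTCondor; it assumes the state code is the sixth whitespace-separated field and that I, R, and H stand for idle, running, and held. The helper name `summarize` is made up.

```python
from collections import Counter

# Job rows copied from the slide (header lines left out).
SAMPLE = """\
6.0 cat 11/12 09:30 0+00:00:00 I 0 0.0 explore.py
6.1 cat 11/12 09:30 0+00:00:00 I 0 0.0 explore.py
6.2 cat 11/12 09:30 0+00:00:00 I 0 0.0 explore.py
6.3 cat 11/12 09:30 0+00:00:00 I 0 0.0 explore.py
6.4 cat 11/12 09:30 0+00:00:00 I 0 0.0 explore.py
"""

STATE_NAMES = {"I": "idle", "R": "running", "H": "held"}

def summarize(listing):
    """Count jobs per state code and format a one-line summary."""
    rows = [line.split() for line in listing.splitlines() if line.strip()]
    states = Counter(row[5] for row in rows)   # sixth field is ST
    parts = ", ".join(f"{states[code]} {name}" for code, name in STATE_NAMES.items())
    return f"{len(rows)} jobs; {parts}"

print(summarize(SAMPLE))
```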
Basic Submit File
executable = word_freq.py
universe = vanilla
arguments = "words.txt 1000"

output = word_freq.out
error = word_freq.err
log = word_freq.log

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = words.txt

queue
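When several similar submit files are needed, they can be generated rather than typed. The following is only a sketch: the helper `render_submit` is hypothetical, and it simply writes `key = value` lines followed by `queue`, matching the layout of the submit file on this slide.

```python
def render_submit(settings):
    """Turn a dict of submit commands into submit-file text ending in 'queue'."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    lines.append("queue")
    return "\n".join(lines) + "\n"

# Values copied from the slide; the dict itself is just an illustration.
settings = {
    "executable": "word_freq.py",
    "universe": "vanilla",
    "arguments": '"words.txt 1000"',
    "output": "word_freq.out",
    "error": "word_freq.err",
    "log": "word_freq.log",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
    "transfer_input_files": "words.txt",
}

text = render_submit(settings)
print(text, end="")
```

Writing `text` to a file gives something that could be passed to `condor_submit`, though hand-written submit files are perfectly fine for a single job.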
Submit a Job
condor_submit submit-file

• Submits job to local submit machine
• Use condor_q to track

Submitting job(s).
1 job(s) submitted to cluster NNN.

• Each condor_submit creates one Cluster
• A job ID is written as cluster.process (e.g., 8.0)
• We will see how to make multiple processes later
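Because a job ID is just cluster.process, it is easy to take apart in a script. The sketch below is illustrative only: `parse_job_id` and `cluster_from_submit_output` are hypothetical helpers, the cluster number 8 in the sample text is invented, and the regular expression assumes the message format shown on the slide.

```python
import re

def parse_job_id(job_id):
    """Split 'cluster.process' (e.g. '8.0') into two integers."""
    cluster, process = job_id.split(".")
    return int(cluster), int(process)

def cluster_from_submit_output(output):
    """Pull the cluster number out of '... submitted to cluster N.'"""
    match = re.search(r"submitted to cluster (\d+)\.", output)
    return int(match.group(1)) if match else None

sample_output = "Submitting job(s).\n1 job(s) submitted to cluster 8."
print(parse_job_id("8.0"))
print(cluster_from_submit_output(sample_output))
```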
Remove a Job
condor_rm cluster [...]
condor_rm cluster.process [...]

• Removes one or more jobs from the queue
• Identify jobs by whole cluster or single job ID
• Only you (or admin) can remove your jobs

Cluster NNN has been marked for removal.
Your Turn!
Thoughts on Exercises
• Copy-and-paste is quick, but you may learn more by typing out commands yourself
• Experiment!
  ‣ Try your own variations on the exercises
  ‣ If you have time, try to apply to your own work
• If you do not finish, that’s OK; you can make up work later or during evenings, if you like
• If you finish early, try any extra challenges or optional sections, or move ahead to the next section if you are brave
Exercises!
• Ask questions!
• Lots of instructors around
• Coming next:
  Now – 10:30    Hands-on exercises
  10:30–10:45    Break
  10:45–11:15    Lecture
  11:15–12:15    Hands-on exercises