ALPS - Cray User Group
CUG 2006
This Presentation May Contain Preliminary Information That Is Subject To Change
ALPS
Application Level Placement Scheduler
Michael [email protected]
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change
ALPS Design Goals (1)
• Scalability
  • Thousands of OS instances and applications
• Efficiency
  • Maximize resource utilization
  • Minimize overhead
• Predictability
  • Consistent performance of applications
  • Guaranteed resource availability
• Adaptability
  • Mask architecture-specific details
  • Exploit architecture-specific capabilities
ALPS Design Goals (2)
• Extensibility
  • Adaptable to future architectures
  • Simplified integration with workload management systems
• Maintainability
  • Reduce complexity
  • Separate policy and mechanism
• Availability
  • Recover quickly with minimal impact
  • Minimize single points of failure
ALPS Operating Environment
• Hardware
  • Multiple node types
  • Multiple processor types
  • Processor and memory variations
  • Distributed shared memory
• Software
  • Multiple parallel programming paradigms
  • Multiple OS instances
  • Supported on Compute Node Linux only
  • Multiple workload managers
  • Administration and configuration tools
  • Resource and event monitoring
ALPS Core Services
• Launch and clean up applications
• Binary executable distribution
• Monitor and report application status
• Application ID assignment
• Resource reservation management
• Signal propagation
• Standard input, output, and error management
• Resource availability monitoring
• Provide external access to application processes for debugging and performance analysis
ALPS Features: Gang Scheduling
• ALPS manages context switching
  • Consistent across entire application
  • Configurable interval
• Allows short and long running jobs to coexist
• Supports configurable CPU oversubscription factor
• No support for memory oversubscription
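The gang-scheduling idea on this slide can be sketched as a round-robin over whole applications: in each time slice, every processing element of one application runs together. This is a minimal illustration, not the ALPS implementation. The function name `gang_schedule` is hypothetical, and treating the oversubscription factor as the maximum number of applications that may share the same CPUs is an assumption.

```python
from itertools import cycle, islice


def gang_schedule(apps, oversubscription, slices):
    """Return the application that owns the shared CPUs in each time slice.

    Assumption: the oversubscription factor caps how many applications
    may time-share the same CPUs. All PEs of the chosen application run
    during its slice, so the switch is consistent across the application.
    """
    if len(apps) > oversubscription:
        raise ValueError("more applications than the oversubscription factor allows")
    # Round-robin: each slice belongs to exactly one whole application.
    return list(islice(cycle(apps), slices))


schedule = gang_schedule(["app_a", "app_b"], oversubscription=2, slices=4)
```

Here `schedule` alternates between the two applications, one per configurable interval. Memory is not time-shared in this sketch, which matches the slide's statement that memory is not oversubscribed.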
ALPS Features: Reservations
• Maintain resource availability for batch jobs
• Support interactive users
• Reservation states:
  • FILED - Request registered
  • CONFIRMED - Resources locked
  • CLAIMED - Resources in use
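The three reservation states can be modeled as a small state machine. The sketch below assumes a strictly forward progression (FILED, then CONFIRMED, then CLAIMED); the slide lists the states but does not spell out the allowed transitions, so the ordering and the `advance` helper are illustrative assumptions.

```python
from enum import Enum


class ReservationState(Enum):
    FILED = "request registered"
    CONFIRMED = "resources locked"
    CLAIMED = "resources in use"


# Assumed forward-only ordering; not stated explicitly on the slide.
_NEXT_STATE = {
    ReservationState.FILED: ReservationState.CONFIRMED,
    ReservationState.CONFIRMED: ReservationState.CLAIMED,
}


def advance(state):
    """Move a reservation to its next state, or fail if it is already claimed."""
    if state not in _NEXT_STATE:
        raise ValueError(f"{state.name} has no following state in this sketch")
    return _NEXT_STATE[state]


state = advance(ReservationState.FILED)
```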
ALPS Features: BASIL
• Batch & Application Scheduler Interface Layer
• Extensible XML-RPC implementation
• Open interface specification
  • No proprietary APIs or libraries
  • Third party vendors manage integration
• Three primary functions:
  • Inventory
  • Reservation creation
  • Reservation cancellation
• BASIL programmer’s guide
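To give a feel for what an XML-RPC style request for the three functions might look like, the sketch below encodes a call with Python's standard `xmlrpc.client` module. The method names (`inventory`, `reserve`, `cancel`) and the `width` parameter are hypothetical placeholders; the real wire format is defined by the BASIL specification, not by this example.

```python
import xmlrpc.client

# Hypothetical method names standing in for the three primary functions.
BASIL_FUNCTIONS = ("inventory", "reserve", "cancel")


def encode_request(function, params=None):
    """Encode one request as an XML-RPC methodCall document (illustrative only)."""
    if function not in BASIL_FUNCTIONS:
        raise ValueError(f"unknown function: {function}")
    # xmlrpc.client.dumps expects the parameters as a tuple.
    return xmlrpc.client.dumps((params or {},), methodname=function)


request = encode_request("reserve", {"width": 3})
decoded_params, decoded_method = xmlrpc.client.loads(request)
```

Because the request is plain XML text, a workload manager could produce it without linking against any vendor library, which is the point of the open interface described on the slide.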
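ALPS Features: Fanout Tree
• Provides scalability
• Supports parallel operation
• Simulated broadcast on unicast network
• Configurable radix:

| Tree Depth | Radix 32 | Radix 16 | Radix 8 | Radix 4 | Radix 2 |
|---|---|---|---|---|---|
| 5 | 1082401 | 69905 | 4681 | 341 | 31 |
| 4 | 33825 | 4369 | 585 | 85 | 15 |
| 3 | 1057 | 273 | 73 | 21 | 7 |
| 2 | 33 | 17 | 9 | 5 | 3 |
| 1 | 1 | 1 | 1 | 1 | 1 |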
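The node counts on this slide are consistent with a complete tree in which depth 1 is the root alone and each further level multiplies the fanout by the radix, for example 1,082,401 nodes at radix 32 and depth 5. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of ALPS.

```python
def nodes_reached(radix, depth):
    """Nodes in a complete fanout tree with `depth` levels (depth 1 = root only)."""
    # Level i (counting from 0) holds radix**i nodes.
    return sum(radix ** level for level in range(depth))


def depth_needed(radix, node_count):
    """Smallest depth whose complete tree covers at least `node_count` nodes."""
    if radix < 2:
        raise ValueError("radix must be at least 2")
    depth = 0
    while nodes_reached(radix, depth) < node_count:
        depth += 1
    return depth


largest = nodes_reached(32, 5)
depth_for_1000 = depth_needed(32, 1000)
```

A larger radix shortens the tree, so a message needs fewer forwarding hops, but each node has to contact more children. Making the radix configurable lets a site pick that trade-off.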
ALPS Components
• Clients
  • aprun – Application submission
  • apstat – Application status
  • apkill – Signal delivery
  • apbasil – Workload manager interface
• Servers
  • apsys – Client interaction on login nodes
  • apinit – Process management on compute nodes
  • apsched – Reservations and placement
  • apbridge – System data collection
  • apwatch – Event monitoring
[Figure: ALPS process diagram. Labeled elements: apsched (service or login node); Login Node A with aprun (PEs 0, 1, 2), local apsys, apagent, and a stdin handler; Login Node B with apkill, apstat, aprun, local apsys, and apagent; Login Node C with aprun, local apsys, apagent, a stdin handler, apbasil, a login shell, and the WLM (qsub); three compute nodes, each with apinit, apshepherd, and one PE (PE 0, PE 1, PE 2); a service node with apbridge and apwatch; the System Database (SDB node); the event router (L1, L0 - SMW); and shared files. Edge labels: fork; fork, exec; signal; stdin; pipe; private port; control socket connection (includes stdout & stderr).]
![Page 12: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/12.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 12
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 13: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/13.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 13
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 14: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/14.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 14
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 15: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/15.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 15
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 16: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/16.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 16
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 17: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/17.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 17
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 18: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/18.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 18
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 19: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/19.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 19
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 20: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/20.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 20
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 21: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/21.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 21
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 22: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/22.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 22
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 23: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/23.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 23
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 24: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/24.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 24
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 25: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/25.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 25
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 26: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/26.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 26
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 27: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/27.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 27
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 28: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/28.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 28
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 29: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/29.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 29
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 30: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/30.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 30
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 31: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/31.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 31
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 32: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/32.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 32
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 33: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/33.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 33
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 34: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/34.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 34
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 35: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/35.jpg)
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 35
apsched(Service or
Login Node)
aprun(PEs 0,1,2)
Login Node A
apinit
apsheperd
PE 1
apinit
apsheperd
PE 0
apinit
apsheperd
PE 2
Compute Node
fork
fork
forkLocalapsys
appagent
stdin handler
apkill
Login Node B Localapsys
appagent fork
apstataprun
signal
Shared Files
fork
fork
aprun
Login Node C
Localapsys
appagent
stdin handler
fork
fork
apbasil
LoginShell
WLM fork,exec
fork,exec
apbridgeapwatchevent router(L1,L0 - SMW)
SystemDatabase
(SDB Node)
privateport
Service Nodepipe
fork, exec
fork, exec
fork, exec
To a ComputeNode
Compute Node
Compute Node
stdin
control socket connection – includes stdout & stderr
qsub
![Page 56: ALPS - Cray User Group · ALPS Components •Clients •aprun – Application submission •apstat – Application status •apkill – Signal delivery •apbasil – Workload manager](https://reader036.fdocuments.net/reader036/viewer/2022071212/6024378f8114202d5b4c9ba7/html5/thumbnails/56.jpg)
Q & A
• Questions?
• Thanks to the ALPS team:
  • Richard Lagerstrom - development
  • Marlys Kohnke - development
  • Carl Albing - development
  • Bob Gross - testing
  • Jan Gustafson - our current manager
  • Wayne Margotto - our former manager
• Thank You!