A CASE FOR TRANSFORMING PARALLEL RUNTIMES
Transcript of A CASE FOR TRANSFORMING PARALLEL RUNTIMES
A CASE FOR TRANSFORMING !PARALLEL RUNTIMES INTO OS KERNELS
Kyle Hale Peter Dinda
halek.co v3vee.org presciencelab.org xstack.sandia.gov/hobbes
3
parallel app
parallel runtime
general-purpose kernel
node HW
user mode
kernel mode
THE CURRENT OS/RUNTIME MODEL
5
user mode
kernel mode
ARE PROVIDED KERNEL ABSTRACTIONS THE RIGHT ONES?
NOT ALWAYS
I’d like to pin memory to a specific PFN
range please
NO!
runtime
general OS
6
user mode
kernel mode
NOT ALWAYS
I’d like to never be interrupted
please
NOPE
runtime
general OS
ARE PROVIDED KERNEL ABSTRACTIONS THE RIGHT ONES?
7
user mode
kernel mode
RESTRICTED ACCESS TO HARDWARE
I’d like to set up some custom
page mappings please
Uh no
runtime
general OS
8
user mode
kernel mode
I’d like to interrupt another
processor please
HA!
runtime
general OS
RESTRICTED ACCESS TO HARDWARE
14 we could mitigate these issues
FULL HARDWARE ACCESS
CONTROL OVER KERNEL ABSTRACTIONS
If runtime had
15
parallel app
parallel runtime
general-purpose kernel
node HW
user mode
kernel mode
THE CURRENT OS/RUNTIME MODEL
16
parallel app
hybrid runtime!
node HW
user mode
kernel mode
OUR PROPOSED MODEL: THE HYBRID RUNTIME (HRT)
17
parallel app
hybrid runtime!
node HW
user mode
kernel mode
OUR PROPOSED MODEL: THE HYBRID RUNTIME (HRT)
Mashup of OS and runtime
18
parallel app
hybrid runtime!
node HW
user mode
kernel mode
OUR PROPOSED MODEL: THE HYBRID RUNTIME (HRT)
The runtime IS the kernel, built within a kernel framework
19
parallel app
hybrid runtime!
node HW
user mode
kernel mode
OUR PROPOSED MODEL: THE HYBRID RUNTIME (HRT)
The runtime IS the kernel, built within a kernel framework
Everything is in kernel space
20
parallel app
hybrid runtime!
node HW
user mode
kernel mode
OUR PROPOSED MODEL: THE HYBRID RUNTIME (HRT)
The runtime IS the kernel, built within a kernel framework
Everything is in kernel space
HRT has full access to the hardware
21
parallel app
node HW
user mode
kernel mode
OUR PROPOSED MODEL: THE HYBRID RUNTIME (HRT)
HRT
HRT can control HW access
HRT can pick its own abstractions
22
parallel app
node HW
user mode
kernel mode
OUR PROPOSED MODEL: THE HYBRID RUNTIME (HRT)
HRT
MORE POWER!
25
We built a kernel framework to support HRTs
We ported an existing, complex parallel runtime
NAUTILUS
26
We built a kernel framework to support HRTs
We ported an existing, complex parallel runtime
NAUTILUS
LEGIONlegion.stanford.edu!
27
We built a kernel framework to support HRTs
We ported an existing, complex parallel runtime
We ported our framework to cutting-edge many-core hardware
NAUTILUS
LEGIONlegion.stanford.edu!
28
We built a kernel framework to support HRTs
We ported an existing, complex parallel runtime
We ported our framework to cutting-edge many-core hardware
NAUTILUS
LEGION
XEON PHI
legion.stanford.edu!
29
We built a kernel framework to support HRTs
We ported an existing, complex parallel runtime
We ported our framework to cutting-edge many-core hardware
We evaluated our port on a standard HPC benchmark
NAUTILUS
LEGION
XEON PHI
legion.stanford.edu!
30
We built a kernel framework to support HRTs
We ported an existing, complex parallel runtime
We ported our framework to cutting-edge many-core hardware
We evaluated our port on a standard HPC benchmark
NAUTILUS
LEGION
XEON PHI
HPCG
legion.stanford.edu!
31
0%
5%
10%
15%
20%
25%
1 2 4 8 16 32 64 128 200 220
Spe
ed
up o
ver L
inux
Legion Processor Count (Cores)
XEON PHI + NAUTILUS + LEGION + HPCG
32
0%
5%
10%
15%
20%
25%
1 2 4 8 16 32 64 128 200 220
Spe
ed
up o
ver L
inux
Legion Processor Count (Cores)
XEON PHI + NAUTILUS + LEGION + HPCG
33
0%
5%
10%
15%
20%
25%
1 2 4 8 16 32 64 128 200 220
Spe
ed
up o
ver L
inux
Legion Processor Count (Cores)
11% average speedup
XEON PHI + NAUTILUS + LEGION + HPCG
NAUTILUS
34
runtime
parallel app
threads sync. events HW info bootstrap paging timers IRQs console
Hardware
user mode
kernel mode
NAUTILUS
35
runtime
parallel app
threads sync. events HW info bootstrap paging timers IRQs console
Hardware
user mode
kernel mode
Nautilus primitives & utilities (HRT can use or not use any of them)
NAUTILUS
36
runtime
parallel app
threads sync. events HW info bootstrap paging timers IRQs console
Hardware
user mode
kernel mode
Nautilus primitives & utilities (HRT can use or not use any of them)
aerokernel
NAUTILUS
37
runtime
parallel app
threads sync. events HW info bootstrap paging timers IRQs console
Hardware
user mode
kernel mode
HRT
NAUTILUS
38
runtime
parallel app
threads sync. events HW info bootstrap paging timers IRQs console
Hardware
user mode
kernel mode
Kernel
LIGHTWEIGHT PRIMITIVES" EXAMPLE: THREADS
40
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
2 4 8 16 32 64
Cyc
les
Nautilus
Linux (pthreads)
x86_64 Opteron: 64 cores, 4 sockets, 8 numa zones, 128GB RAM
41
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
2 4 8 16 32 64
Cyc
les
Linux (pthreads)
LIGHTWEIGHT PRIMITIVES" EXAMPLE: THREADS
x86_64 Opteron: 64 cores, 4 sockets, 8 numa zones, 128GB RAM
42
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
2 4 8 16 32 64
Cyc
les
Nautilus
Linux (pthreads)
LIGHTWEIGHT PRIMITIVES" EXAMPLE: THREADS
x86_64 Opteron: 64 cores, 4 sockets, 8 numa zones, 128GB RAM
43
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
2 4 8 16 32 64
Cyc
les
Nautilus
Linux (pthreads)
LIGHTWEIGHT PRIMITIVES" EXAMPLE: THREADS
~3ms
~90µs
x86_64 Opteron: 64 cores, 4 sockets, 8 numa zones, 128GB RAM
FULL HARDWARE CONTROL"EXAMPLE: INTERRUPT CONTROL"
45
very simple modification: give runtime controlover interrupts in itstask scheduler
FULL HARDWARE CONTROL"EXAMPLE: INTERRUPT CONTROL"
46
very simple modification:
à modest speedups
give runtime controlover interrupts in itstask scheduler
FULL HARDWARE CONTROL"EXAMPLE: INTERRUPT CONTROL"
47
very simple modification:
MUCH more to come here
à modest speedups
give runtime controlover interrupts in itstask scheduler
48
in addition to Legion, we have 2 other high-level, parallel runtimes
running as HRTs
NESL: VCODE interpreter running as HRT
NDPC: home-grown, co-designed HRT
51
parallel app
hybrid runtime!
user mode
kernel mode parallel app
parallel runtime user mode
kernel mode parallel runtime
general OS
node hardware
Hybrid Virtual Machine (HVM)!
general virtualization model
specialized virtualization model
Legacy functionality from the Regular OS via the HVM!
Regular OS (ROS)
56
0%
5%
10%
15%
20%
25%
1 2 4 8 16 32 64 128 200 220
Spe
ed
up o
ver L
inux
Legion Processor Count (Cores)
11% average speedup
XEON PHI + NAUTILUS + LEGION + HPCG
my websitehalek.co
our development blog
haltloop.com
our labpresciencelab.org
the Hobbes project
xstack.sandia.gov/hobbes
Kyle Hale Peter Dinda
A CASE FOR TRANSFORMING !PARALLEL RUNTIMES INTO OS KERNELS
follow us here for: - experience report on
building OS for Phi - philix release (soon)
FULL HARDWARE CONTROL"EXAMPLE: INTERRUPT CONTROL"
59
0%
1%
2%
3%
4%
5%
6%
2 4 8 16 32 62
Spe
ed
up
Legion Processors (threads)
FULL HARDWARE CONTROL"EXAMPLE: INTERRUPT CONTROL"
60
0%
1%
2%
3%
4%
5%
6%
2 4 8 16 32 62
Spe
ed
up
Legion Processors (threads)
62
parallel app
hybrid runtime!
node HW
user mode
kernel mode parallel app
parallel runtime user mode
kernel mode parallel runtime
general OS
63
parallel app
hybrid runtime!
user mode
kernel mode parallel app
parallel runtime user mode
kernel mode parallel runtime
general OS
node hardware
Hybrid Virtual Machine (HVM)!
general virtualization model
specialized virtualization model
64
parallel app
hybrid runtime!
user mode
kernel mode parallel app
parallel runtime user mode
kernel mode parallel runtime
general OS
node hardware
Hybrid Virtual Machine (HVM)!
general virtualization model
specialized virtualization model
This is the performance path, through the HRT!
65
parallel app
hybrid runtime!
user mode
kernel mode parallel app
parallel runtime user mode
kernel mode parallel runtime
general OS
node hardware
Hybrid Virtual Machine (HVM)!
general virtualization model
specialized virtualization model
Legacy functionality from the Regular OS via the HVM!
Regular OS (ROS)
66
parallel app
hybrid runtime!
user mode
kernel mode parallel app
parallel runtime user mode
kernel mode parallel runtime
general OS
node hardware
Hybrid Virtual Machine (HVM)!
general virtualization model
specialized virtualization model
We can boot these things very quickly!!
67
parallel app
hybrid runtime!
user mode
kernel mode parallel app
parallel runtime user mode
kernel mode parallel runtime
general OS
node hardware
Hybrid Virtual Machine (HVM)!
general virtualization model
specialized virtualization model
parallel app
hybrid runtime!
parallel app
hybrid runtime!
parallel app
hybrid runtime!
several auxiliary HRTs spawned in!less than a millisecond !
68
0
1
2
3
4
5
6
1 2 4 8 16 32 64 128 200 220
Exe
cut
ion
Tim
e (
s)
Legion Processor Count (Cores)
Natuilus
Linux
HPCG IN LEGION ON XEON PHI
HPCG IN LEGION ON XEON PHI
69
0
1
2
3
4
5
6
1 2 4 8 16 32 64 128 200 220
Exe
cut
ion
Tim
e (
s)
Legion Processor Count (Cores)
Natuilus
Linux
71
port of NESL
- nested data parallel language aimed at vector machines
- we can run unmodified NESL programs in our kernel-mode VCODE interpreter
73
the first co-designed HRT: NDPC - Nested Data Parallelism in C/C++- subset of NESL- fork/join parallelism over flattened vector processing
74
the first co-designed HRT: NDPC - Nested Data Parallelism in C/C++- subset of NESL- fork/join parallelism over flattened vector processing- allows us to explore runtime/kernel co-design- e.g. smart kernel-mode thread fork
75
parallel app
hybrid runtime!
user mode
kernel mode parallel app
parallel runtime user mode
kernel mode parallel runtime
general OS
node hardware
Hybrid Virtual Machine (HVM)!
general virtualization model
specialized virtualization model
parallel app
hybrid runtime!
parallel app
hybrid runtime!
parallel app
hybrid runtime!
several auxiliary HRTs spawned in!less than a millisecond !
Regular OS (ROS)
specialized virtualization model
specialized virtualization model
specialized virtualization model
77
to get started with your own Xeon Phi prototype kernel:
• follow our blog • use our tool (philix) to boot it and
leverage MPSS stack
78
to get started with your own Xeon Phi prototype kernel:
• follow our blog • use our tool (philix) to boot it and
leverage MPSS stack