CS442: High Productivity and Performance with Domain-specific Languages in Scala
(C) 2011 Kunle Olukotun 2
CS442 Overview
Experimental Research Course
Lots of interest in DSLs
Opportunity to do something new
New ideas: DSLs for parallelism and productivity
New software: DSL infrastructure
May lead to publications
Goals
Knowledge and tools for DSL development
Bring domain experts and CS experts together
Develop new DSLs and push development of DSL infrastructure
Learn Scala: important new programming language
Course Information
Instructor: Kunle Olukotun
E-mail: [email protected]
Office: Gates 302
Office Hours: 11:00am-noon M/W or by appointment
TAs: Hassan Chafi, Arvind Sujeeth, Kevin Brown
E-Mail: cs442-spr1011-staff at lists dot stanford dot edu
Course Support: Darlene Hadding
E-Mail: darleneh at stanford dot edu
Location: Gates 408
Telephone: 650-723-1430
Office Hours: M-F 10:00 AM-3:00 PM
Course Information
Prerequisites
CS background: CS108, and a systems course (CS143, CS140), preferably CS 149.
Non CS background: Expertise in a particular domain
Participation (10% of grade)
You are expected to participate in the class discussion
Website
http://www.stanford.edu/class/cs442
Programming Assignments
Scala
DSL embedding with
Final Project (90% of grade)
The Final Project is an open-ended research project
New DSL or adapt existing DSL
2020 Vision for Parallelism
Make parallelism accessible to all programmers
Parallelism is not for the average programmer
Too difficult to find parallelism, to debug, maintain and get good performance for the masses
Need a solution for “Joe/Jane the programmer”
Can’t expose average programmers to parallelism
But auto-parallelization doesn’t work
What Is Computing?
Predicting the future
Modeling and simulation (weather, materials, products)
Decide what to build and experiment with, or use instead of building and experimenting ⇒ third pillar of science
Coping with the present (real time)
Embedded systems control (cars, planes, communication)
Virtual worlds (Second Life, Facebook)
Electronic trading (airline reservation, stock market)
Robotics (manufacturing, cars, household)
Understanding the past
Large data set analysis (commerce, web, census, simulation)
Discover trends and develop insight
The Changing Nature of Science
Explosion of Data Sources
Genetics Gets Personal
Computing System Power
$$\text{Power} = \frac{\text{Energy}}{\text{Op}} \times \frac{\text{Ops}}{\text{second}}$$
Power: FIXED
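The relation above can be worked through numerically: with power held fixed, throughput can only rise if energy per operation falls. The sketch below is a minimal illustration; the 100 W budget and the two energy-per-op figures are made-up values for the arithmetic, not measurements from the lecture.

```scala
object PowerBudget {
  // Power = (Energy / Op) * (Ops / second), so at fixed power:
  // Ops / second = Power / (Energy / Op).
  def opsPerSecond(powerWatts: Double, joulesPerOp: Double): Double =
    powerWatts / joulesPerOp

  def main(args: Array[String]): Unit = {
    val budget = 100.0                              // watts, hypothetical fixed budget
    val general = opsPerSecond(budget, 1e-9)        // 1 nJ per op, illustrative only
    val specialized = opsPerSecond(budget, 1e-11)   // 10 pJ per op, illustrative only
    println(f"general: $general%.3e ops/s, specialized: $specialized%.3e ops/s")
  }
}
```

Under these assumed numbers, a 100x reduction in energy per op buys a 100x increase in ops per second at the same power.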
Heterogeneous Hardware
Heterogeneous HW for energy efficiency
Multi-core, ILP, threads, data-parallel engines, custom engines
H.264 encode study
[Figure: Performance and Energy Savings on a log scale (1 to 1000), from 4 cores + ILP + SIMD + custom inst up to ASIC; annotation: 3 orders of magnitude]
Source: Understanding Sources of Inefficiency in General-Purpose Chips (ISCA’10)
DE Shaw Research: Anton
D. E. Shaw et al. SC 2009, Best Paper and Gordon Bell Prize
100 times more power efficient
Molecular dynamics computer
Heterogeneous Parallel Architectures Today
Cray Jaguar
Sun T2
Nvidia Fermi
Altera FPGA
Heterogeneous Parallel Programming
Cray Jaguar: MPI, PGAS
Sun T2: Pthreads, OpenMP
Nvidia Fermi: CUDA, OpenCL
Altera FPGA: Verilog, VHDL
Programmability Chasm
Too many different programming models
Applications: Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering
Cray Jaguar: MPI, PGAS
Sun T2: Pthreads, OpenMP
Nvidia Fermi: CUDA, OpenCL
Altera FPGA: Verilog, VHDL
Hypothesis
It is possible to write one program and run it on all these machines
Programmability Chasm
Applications: Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering
Ideal Parallel Programming Language
Cray Jaguar: MPI, PGAS
Sun T2: Pthreads, OpenMP
Nvidia Fermi: CUDA, OpenCL
Altera FPGA: Verilog, VHDL
The Ideal Parallel Programming Language
[Diagram: Performance, Productivity, Generality]
Successful Languages
Performance
Productivity Generality
True Hypothesis ⇒ Domain Specific Languages
[Diagram: Domain Specific Languages placed among Performance (Heterogeneous Parallelism), Productivity, Generality]
Domain Specific Languages
Domain Specific Languages (DSLs)
Programming language with restricted expressiveness for a particular domain
High-level, usually declarative, and deterministic
Benefits of Using DSLs for Parallelism
Productivity
•Shield average programmers from the difficulty of parallel programming
•Focus on developing algorithms and applications and not on low level implementation details
Performance
•Match high level domain abstraction to generic parallel execution patterns
•Restrict expressiveness to more easily and fully extract available parallelism
•Use domain knowledge for static/dynamic optimizations
Portability and forward scalability
•DSL & Runtime can be evolved to take advantage of latest hardware features
•Applications remain unchanged
•Allows innovative HW without worrying about application portability
Bridging the Programmability Chasm
Applications: Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering
Domain Specific Languages: Physics (Liszt), Data Analysis (SQL), Probabilistic (RandomT), Machine Learning (OptiML), Rendering
DSL Infrastructure
Domain Embedding Language (Scala): Staging, Polymorphic Embedding, Static Domain Specific Opt.
Parallel Runtime (Delite): Dynamic Domain Spec. Opt., Locality Aware Scheduling, Task & Data Parallelism
Heterogeneous Hardware
Liszt: DSL for Mesh PDEs
[Figure labels: Fuel injection, Transition, Thermal, Turbulence, Turbulence, Combustion]
Z. DeVito, N. Joubert, P. Hanrahan
Solvers for mesh-based PDEs
Complex physical systems
Huge domains
millions of cells
Example: Unstructured Reynolds-averaged Navier Stokes (RANS) solver
Goal: simplify code of mesh-based PDE solvers
Write once, run on any type of parallel machine
From multi-cores and GPUs to clusters
Liszt Language Features
Minimal programming language
Arithmetic, short vectors, functions, control flow
Built-in mesh interface for arbitrary polyhedra
Vertex, Edge, Face, Cell
Optimized memory representation of mesh
Collections of mesh elements
Element Sets: faces(c:Cell), edgesCCW(f:Face)
Mapping mesh elements to fields
Fields: val vert_position = position(v)
Parallelizable iteration
forall statements: for( f <- faces(cell) ) { … }
Liszt Code Example
for (edge <- edges(mesh)) {        // simple set comprehension
  val flux = flux_calc(edge)       // functions, function calls
  val v0 = head(edge)              // mesh topology operators
  val v1 = tail(edge)
  Flux(v0) += flux                 // field data storage
  Flux(v1) -= flux
}
Code contains possible write conflicts!
We use architecture specific strategies guided by domain knowledge
MPI: Ghost cell-based message passing
GPU: Coloring-based use of shared memory
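The write conflicts come from edges that share a vertex: both iterations update the same Flux entry. One way a coloring strategy can avoid this is to group edges into color classes in which no two edges touch the same vertex, then process one class at a time. The following is a minimal plain-Scala sketch of that idea under stated assumptions; `Edge`, `color`, and `accumulate` are hypothetical names, and this is not Liszt's actual implementation or its use of GPU shared memory.

```scala
object EdgeColoring {
  final case class Edge(head: Int, tail: Int)

  // Greedy coloring: give each edge the smallest color not already used
  // by an earlier edge touching either of its vertices.
  // Assumes the edges in the sequence are distinct.
  def color(edges: Seq[Edge]): Map[Edge, Int] = {
    val usedAt = scala.collection.mutable.Map
      .empty[Int, Set[Int]]
      .withDefaultValue(Set.empty[Int])
    edges.map { e =>
      val taken = usedAt(e.head) ++ usedAt(e.tail)
      val c = Iterator.from(0).find(k => !taken(k)).get
      usedAt(e.head) = usedAt(e.head) + c
      usedAt(e.tail) = usedAt(e.tail) + c
      e -> c
    }.toMap
  }

  // Apply the flux update one color class at a time. Within a class no two
  // edges share a vertex, so the += / -= updates could safely run concurrently.
  // This sketch runs them sequentially.
  def accumulate(edges: Seq[Edge], fluxOf: Edge => Double, numVertices: Int): Array[Double] = {
    val flux = new Array[Double](numVertices)
    val classes = color(edges).groupBy(_._2).toSeq.sortBy(_._1)
    for ((_, members) <- classes; (e, _) <- members) {
      flux(e.head) += fluxOf(e)
      flux(e.tail) -= fluxOf(e)
    }
    flux
  }
}
```

The design point is that the programmer's loop stays unchanged; the conflict-avoidance strategy is chosen per architecture by the DSL implementation.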
MPI Performance
Using 8 cores per node, scaling up to 96 cores (12 nodes, 8 cores per node, all communication using MPI)
[Figure: MPI Speedup 750k Mesh. x-axis: Number of MPI Nodes; y-axis: Speedup over Scalar; series: Linear Scaling, Liszt Scaling, Joe Scaling]
[Figure: MPI Wall-Clock Runtime. x-axis: Number of MPI Nodes; y-axis: Runtime, log scale (seconds); series: Liszt Runtime, Joe Runtime]
GPU Performance
Scaling mesh size from 50K (unit-sized) cells to 750K (16x) on a Tesla C2050. Comparison is against single-threaded runtime on host CPU (Core 2 Quad 2.66 GHz)
Single-precision: 31.5x, double-precision: 28x
[Figure: GPU Speedup over Single-Core. x-axis: Problem Size; y-axis: Speedup over Scalar; series: Speedup Double, Speedup Float]
OptiML: A DSL for ML
A. Sujeeth and H. Chafi
Machine Learning domain
Learning patterns from data
Applying the learned models to tasks
Regression, classification, clustering, estimation
Computationally expensive
Regular and irregular parallelism
Motivation for OptiML
Raise the level of abstraction
Use domain knowledge to identify coarse-grained parallelism
Single source ⇒ multiple heterogeneous targets
Domain specific optimizations
OptiML Language Features
Provides a familiar (MATLAB-like) language and API for writing ML applications
Ex. val c = a * b (a, b are Matrix[Double])
Implicitly parallel data structures
General data types: Vector[T], Matrix[T]
Independent from the underlying implementation
Special data types: TrainingSet, TestSet, IndexVector, Image, Video, …
Encode semantic information
Implicitly parallel control structures
sum { … }, (0::end) { … }, gradient { … }, untilconverged { … }
Allow anonymous functions with restricted semantics to be passed as arguments of the control structures
Example OptiML / MATLAB code (Gaussian Discriminant Analysis)
OptiML code (parallel)
// x : TrainingSet[Double]
// mu0, mu1 : Vector[Double]
val sigma = sum(0,x.numSamples) {
  if (x.labels(_) == false) {
    (x(_)-mu0).trans.outer(x(_)-mu0)
  }
  else {
    (x(_)-mu1).trans.outer(x(_)-mu1)
  }
}
MATLAB code
% x : Matrix, y: Vector
% mu0, mu1: Vector
n = size(x,2);
sigma = zeros(n,n);
parfor i=1:length(y)
  if (y(i) == 0)
    sigma = sigma + (x(i,:)-mu0)'*(x(i,:)-mu0);
  else
    sigma = sigma + (x(i,:)-mu1)'*(x(i,:)-mu1);
  end
end
Annotations on the OptiML code: ML-specific data types; implicitly parallel control structures; restricted index semantics
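To make explicit what both versions compute, here is a plain-Scala reading of the same sum: for each sample, subtract the mean of its class and accumulate the outer product of the difference with itself. This is a minimal sketch using standard arrays only; `GdaSigma` and its helpers are hypothetical names, and none of the OptiML types or the `sum` control structure are used. Because the result is a sum of independent per-sample terms, the terms can be computed in any order and combined by any reduction tree (up to floating-point rounding), which is the property an implicitly parallel `sum` relies on.

```scala
object GdaSigma {
  type Vec = Array[Double]
  type Mat = Array[Array[Double]]

  def sub(a: Vec, b: Vec): Vec = a.zip(b).map { case (p, q) => p - q }
  def outer(a: Vec, b: Vec): Mat = a.map(ai => b.map(bj => ai * bj))
  def add(m1: Mat, m2: Mat): Mat =
    m1.zip(m2).map { case (r1, r2) => r1.zip(r2).map { case (p, q) => p + q } }

  // sigma = sum over i of (x_i - mu_{y_i}) (x_i - mu_{y_i})^T,
  // where label false selects mu0 and label true selects mu1.
  def sigma(x: Seq[Vec], labels: Seq[Boolean], mu0: Vec, mu1: Vec): Mat = {
    val n = mu0.length
    val terms = x.zip(labels).map { case (xi, yi) =>
      val d = sub(xi, if (yi) mu1 else mu0)
      outer(d, d)                      // independent per-sample term
    }
    terms.foldLeft(Array.fill(n, n)(0.0))(add)   // sequential reduction
  }
}
```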
Benchmark Applications
6 machine learning applications
Gaussian Discriminant Analysis (GDA)
Generative learning algorithm for probability distribution
Loopy Belief Propagation (LBP)
Graph based inference algorithm
Naïve Bayes (NB)
Supervised learning algorithm for classification
K-means Clustering (K-means)
Unsupervised learning algorithm for clustering
Support Vector Machine (SVM)
Optimal margin classifier using SMO algorithm
Restricted Boltzmann Machine (RBM)
Stochastic recurrent neural network
Experimental Setup
4 Different implementations
OptiML+Delite
MATLAB (Original, GPU, Jacket)
System 1: Performance Tests
Intel Nehalem
2 sockets, 8 cores, 16 threads
24 GB DRAM
NVIDIA GTX 275 GPU
System 2: Scalability Tests
Sun Niagara T2+
4 sockets, 32 cores, 256 threads
128 GB DRAM
OptiML vs. MATLAB
[Figure: bar charts of Normalized Execution Time on 1 CPU, 2 CPU, 4 CPU, 8 CPU, and CPU + GPU; panels: GDA, RBM, Linear Regression, SVM, Naive Bayes, K-means; legend: DELITE (OptiML), MATLAB, Jacket]
Measuring Intracellular Signaling with Mass Cytometry
Bioinformatics Algorithm
Spanning-tree Progression Analysis of Density-normalized Events (SPADE)
P. Qiu, E. Simonds, M. Linderman, P. Nolan
Peng Qiu
SPADE is computationally intensive
Processing time for 30 files:
Matlab (parfor & vectorized loops): 2.5 days
C++ (hand-optimized OpenMP): 2.5 hours
…what happens when we have 1,000 files?
SPADE Downsample: OptiML
for (node <- G.nodes if node.density == 0) {
  val (closeNbrs, closerNbrs) =
    node.neighbors filter {dist(_,node) < kernelWidth}
                          {dist(_,node) < approxWidth}
  node.density = closeNbrs.count
  for (nbr <- closerNbrs) {
    nbr.density = closeNbrs.count
  }
}
[Figure label: kernelWidth]
Downsample: L1 distances between all 10^6 events in 13D space… reduce to 50,000 events
B. Wang and A. Sujeeth
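For readers without OptiML at hand, the density step can be read as the following plain-Scala loop. This is a sequential sketch under stated assumptions: events are points stored as arrays, every event is a candidate neighbor of every other (including itself), and distances are L1. `SpadeDensity`, `l1`, and `densities` are hypothetical names; the sketch shows the semantics only and makes no attempt at the parallel execution the DSL provides.

```scala
object SpadeDensity {
  // L1 (Manhattan) distance between two points of equal dimension.
  def l1(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (p, q) => math.abs(p - q) }.sum

  def densities(events: IndexedSeq[Array[Double]],
                kernelWidth: Double,
                approxWidth: Double): Array[Int] = {
    val density = new Array[Int](events.length)
    for (i <- events.indices) {
      if (density(i) == 0) {                       // not yet assigned
        val close  = events.indices.filter(j => l1(events(j), events(i)) < kernelWidth)
        val closer = events.indices.filter(j => l1(events(j), events(i)) < approxWidth)
        density(i) = close.size
        // Very near neighbors reuse this count instead of recomputing it.
        closer.foreach(j => density(j) = close.size)
      }
    }
    density
  }
}
```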
SPADE Downsample: Matlab
while sum(local_density==0)~=0
    % process no more than 1000 nodes each time
    ind = find(local_density==0); ind = ind(1:min(1000,end));
    data_tmp = data(:,ind);
    local_density_tmp = local_density(ind);
    all_dist = zeros(length(ind), size(data,2));
    parfor i=1:size(data,2)
        all_dist(:,i) = sum(abs(repmat(data(:,i),1,size(data_tmp,2)) - data_tmp),1)';
    end
    for i=1:size(data_tmp,2)
        local_density_tmp(i) = sum(all_dist(i,:) < kernel_width);
        local_density(all_dist(i,:) < apprx_width) = local_density_tmp(i);
    end
end
OptiML vs. C++
OptiML provides much simpler programming model
OptiML performance as good as C++ on full applications
[Figure: bar charts of Normalized Execution Time on 1 CPU, 2 CPU, 4 CPU, and 8 CPU; panels: Template Match, SPADE, LBP; legend: DELITE (OptiML), C++]
More DSLs …
Graphs
Social networks, data analysis
Visualization
Data analysis
Query Language
Data analysis, financial trading
Bio-simulation
Molecular dynamics, cells & viruses, drug-design, prosthetics
Bio-informatics
Data analysis
Your DSL goes here
New Problem
We need to develop all of these DSLs
Current DSL methods are unsatisfactory
Current DSL Development Approaches
Stand-alone DSLs
Can include extensive optimizations
Enormous effort to develop to a sufficient degree of maturity
Actual Compiler/Optimizations
Tooling (IDE, Debuggers, …)
Interoperation between multiple DSLs is very difficult
Purely embedded DSLs ⇒ “just a library”
Easy to develop (can reuse full host language)
Easier to learn DSL
Can combine multiple DSLs in one program
Can share DSL infrastructure among several DSLs
Hard to optimize using domain knowledge
Target same architecture as host language
Need to do better
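The "just a library" trade-off can be seen in a few lines. The sketch below is a hypothetical purely embedded vector DSL (the `Vec` type is invented for illustration): it is trivial to write and reads naturally, but every operator runs immediately, so the library never sees a whole expression such as `(a + b) * 2.0` and cannot fuse the two traversals or retarget them to other hardware.

```scala
object ShallowVectorDsl {
  // Each operator computes its result right away and allocates a new Vec.
  final case class Vec(data: Vector[Double]) {
    def +(that: Vec): Vec = Vec(data.zip(that.data).map { case (a, b) => a + b })
    def *(k: Double): Vec = Vec(data.map(_ * k))
    def sum: Double = data.sum
  }

  def main(args: Array[String]): Unit = {
    val a = Vec(Vector(1.0, 2.0))
    val b = Vec(Vector(3.0, 4.0))
    // Two separate passes and one intermediate Vec; the library cannot
    // observe that they could be a single loop.
    println((a + b) * 2.0)
  }
}
```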
Need to Do Better
Goal: Develop embedded DSLs that perform as well as stand-alone ones
Intuition: General-purpose languages should be designed with DSL embedding in mind
DSL Embedding Language
Mixes OO and FP paradigms
Targets JVM
Stanford/EPFL collaboration on leveraging Scala for parallelism
“Language Virtualization for Heterogeneous Parallel Computing” Onward 2010, Reno
What is Scala
Statically typed
Allows OOP
Improves it by adding traits
Everything is an object
Allows FP
Functions are just like any other object: can be passed to and returned from other functions/methods
Also offers closures
Compiles to JVM/.NET
Succinct, elegant and flexible syntax
Type inference
Syntactic sugaring features make libraries feel “native”
Sophisticated type system
Scalable language
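A few of the features listed above, shown together in a short sketch. All names here (`Shape`, `Describable`, `Rect`, `scaleBy`, `repeat`) are invented for illustration.

```scala
object ScalaFeatures {
  // Traits: reusable behavior that can be mixed into classes.
  trait Shape { def area: Double }
  trait Describable { self: Shape =>
    def describe: String = s"shape with area $area"
  }
  final case class Rect(w: Double, h: Double) extends Shape with Describable {
    def area: Double = w * h
  }

  // Functions are values: scaleBy returns a function that closes over k.
  def scaleBy(k: Double): Double => Double = x => x * k

  // Syntactic sugar: a by-name block parameter makes a library method
  // read like a built-in control structure.
  def repeat(n: Int)(body: => Unit): Unit = (1 to n).foreach(_ => body)

  // Type inference: no annotation needed, the type is List[Double].
  val areas = List(Rect(1, 2), Rect(3, 4)).map(_.area)
}
```

A call such as `repeat(3) { println("hi") }` looks like language syntax but is an ordinary method call, which is the property that makes embedded DSLs feel native.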
Lightweight Modular Staging Approach
Typical Compiler: Lexer → Parser → Type checker → Analysis → Optimization → Code gen
Embedded DSL gets it all for free, but can’t change any of it
Stand-alone DSL implements everything
Modular Staging provides a hybrid approach
DSLs adopt front-end from highly expressive embedding language but can customize IR and participate in backend phases
GPCE’10: Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs
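The hybrid idea can be illustrated with a hand-rolled toy: DSL operators build an IR tree instead of computing values, the DSL then runs its own optimization over that tree, and finally emits code. This is a minimal sketch of the general staging idea only; it is not the Lightweight Modular Staging library or its API, and `Exp`, `simplify`, and `gen` are hypothetical names.

```scala
object TinyStaging {
  // IR nodes: DSL operations build a tree rather than a value.
  // Scala's lexer, parser, and type checker are reused for free.
  sealed trait Exp {
    def +(that: Exp): Exp = Plus(this, that)
    def *(that: Exp): Exp = Times(this, that)
  }
  final case class Const(v: Double) extends Exp
  final case class Sym(name: String) extends Exp
  final case class Plus(a: Exp, b: Exp) extends Exp
  final case class Times(a: Exp, b: Exp) extends Exp

  // A backend phase the DSL controls: algebraic simplification on the IR.
  def simplify(e: Exp): Exp = e match {
    case Plus(a, b) => (simplify(a), simplify(b)) match {
      case (Const(0.0), y)      => y
      case (x, Const(0.0))      => x
      case (Const(x), Const(y)) => Const(x + y)
      case (x, y)               => Plus(x, y)
    }
    case Times(a, b) => (simplify(a), simplify(b)) match {
      case (Const(1.0), y)      => y
      case (x, Const(1.0))      => x
      case (Const(x), Const(y)) => Const(x * y)
      case (x, y)               => Times(x, y)
    }
    case other => other
  }

  // Code generation: emit source text for the optimized program.
  def gen(e: Exp): String = e match {
    case Const(v)    => v.toString
    case Sym(n)      => n
    case Plus(a, b)  => s"(${gen(a)} + ${gen(b)})"
    case Times(a, b) => s"(${gen(a)} * ${gen(b)})"
  }
}
```

For example, `gen(simplify(Sym("x") * Const(1.0) + Const(2.0) * Const(3.0)))` yields the string `(x + 6.0)`: the user wrote ordinary Scala expressions, while the simplification and the emitted text were decided by the DSL.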
Delite: A Framework for DSL Parallelism
Lexer → Parser → Type checker → Analysis → Optimization → Code gen
DSLs adopt front-end from highly expressive embedding language but can customize IR and participate in backend phases
Need a framework to simplify development of DSL backends: Delite
H. Chafi, A. Sujeeth, K. Brown, H. Lee
Benefits of a Shared DSL Infrastructure
Parallelization of DSLs
Map domain abstractions to low level parallel execution patterns
Why not provide a collection of these execution patterns?
Code generation to multiple targets
Most DSLs have common functionality
Why not handle this once for all DSLs?
Shared parallel runtime to accelerate DSL development
Handle other issues in a consistent fashion across DSLs
Debugging
IDE and tooling support
Intermediate Representation (IR)
Delite DSL Compiler
Provide a common IR that can be extended while still benefitting from generic analysis and opt.
Extend common IR and provide IR nodes that encode data parallel execution patterns
Now can do parallel optimizations and mapping
DSL extends appropriate data parallel nodes for their operations
Now can do domain-specific analysis and opt.
Generate an execution graph, kernels and data structures
[Diagram: Liszt program, OptiML program ⇒ IR layers ⇒ Code Generation]
Scala Embedding Framework: Base IR, Generic Analysis & Opt.
Delite Parallelism Framework: Delite IR, Parallelism Analysis, Opt. & Mapping
DS IR: Domain Analysis & Opt.
Code Generation outputs: Delite Execution Graph; Kernels (Scala, C, Cuda, MPI, Verilog, …); Data Structures (arrays, trees, graphs, …)
The Delite IR
[Diagram: IR layers, who works at each layer, and example operations]
Application (DSL User, Domain User Interface): M1 = M2 + M3, V1 = exp(V2), s = sum(M), C2 = sort(C1)
DS IR (DSL Author, Domain Analysis & Opt.): Matrix Plus, Vector Exp, Matrix Sum, Collection Quicksort
Delite Op IR (Delite, Parallelism Analysis & Opt.): ZipWith, Map, Reduce, Divide & Conquer
Base IR (Delite, Generic Analysis & Opt.): Expression
Code Generation
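The layering can be sketched in plain Scala: a small set of generic pattern nodes knows how to execute, and each domain operation only declares which pattern it is. This is a minimal illustration under stated assumptions, not Delite's real class hierarchy; all class names are hypothetical, vectors stand in for the matrices on the slide, and the patterns run sequentially here where a real framework would schedule them in parallel.

```scala
object ParallelPatterns {
  // Generic data-parallel pattern nodes (the middle layer of the diagram).
  sealed trait Op[R] { def execute(): R }

  class MapOp[A, B](in: Vector[A], f: A => B) extends Op[Vector[B]] {
    def execute(): Vector[B] = in.map(f)             // elements are independent
  }
  class ZipWithOp[A, B, C](a: Vector[A], b: Vector[B], f: (A, B) => C)
      extends Op[Vector[C]] {
    def execute(): Vector[C] = a.zip(b).map { case (x, y) => f(x, y) }
  }
  class ReduceOp[A](in: Vector[A], zero: A, f: (A, A) => A) extends Op[A] {
    def execute(): A = in.foldLeft(zero)(f)          // f is assumed associative
  }

  // Domain-specific nodes extend the pattern that describes their parallelism.
  final class VectorExp(v: Vector[Double])
      extends MapOp[Double, Double](v, math.exp)
  final class VectorPlus(x: Vector[Double], y: Vector[Double])
      extends ZipWithOp[Double, Double, Double](x, y, _ + _)
  final class VectorSum(v: Vector[Double])
      extends ReduceOp[Double](v, 0.0, _ + _)
}
```

The DSL author writes only the last three classes; how a Map, ZipWith, or Reduce is split across cores or devices is decided once, in the pattern layer.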
Delite Execution
Maps the machine-agnostic DSL compiler output onto the machine configuration for execution
Walk-time scheduling produces partial schedules
Code generation produces fused, specialized kernels to be launched on each resource
Run-time executor controls and optimizes execution
[Diagram]
Inputs: Application Inputs; Machine Inputs (SMP, GPU, Cluster); Delite Execution Graph; Kernels (Scala, C, Cuda, Verilog, …); Data Structures (arrays, trees, graphs, …)
Walk-Time: Scheduler, Code Generator; Fusion, Specialization, Synchronization; produces Partial schedules, Fused & specialized kernels
Run-Time: Schedule Dispatch, Dynamic load balancing, Memory management, Lazy data transfers, Kernel auto-tuning, Fault tolerance
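Kernel fusion, one of the walk-time steps named above, can be illustrated with element-wise kernels: two map kernels applied back to back are replaced by one kernel that applies the composed function, so the data is traversed once and no intermediate array is created. This is a minimal sketch with hypothetical names (`MapKernel`, `fuse`); it does not reproduce Delite's scheduler or code generator.

```scala
object KernelFusion {
  // A hypothetical "kernel": an element-wise function applied over an array.
  final case class MapKernel(name: String, f: Double => Double) {
    def run(in: Array[Double]): Array[Double] = in.map(f)
  }

  // Fusing two map kernels composes their functions into a single pass.
  def fuse(first: MapKernel, second: MapKernel): MapKernel =
    MapKernel(s"${first.name}_${second.name}", first.f.andThen(second.f))

  def main(args: Array[String]): Unit = {
    val scale = MapKernel("scale", _ * 2.0)
    val shift = MapKernel("shift", _ + 1.0)
    val fused = fuse(scale, shift)
    println(fused.run(Array(1.0, 2.0)).mkString(", "))
  }
}
```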
Conclusions
DSLs have potential to solve the heterogeneous parallel programming problem
Don’t expose programmers to explicit parallelism unless they ask for it
Look at the process of developing DSLs for parallelism
Need programming languages to be designed for flexible embedding (Scala)
Lightweight modular staging in Scala allows for more powerful embedded DSLs
Delite provides a framework for adding parallelism
Explore DSLs and DSL infrastructure in this course