Intel Array Building Blocks
By: Edward Jones
Background
Intel Ct: Developed in 2007
Parallel programming model for multicore chips
Exploits Single Instruction, Multiple Data (SIMD)
RapidMind: started in 2004
Provided a software product that simplified the use of multicore processors and graphics processing units (GPUs)
Intel acquired RapidMind on August 19, 2009
Intel ArBB
Intel ArBB is a C++ API
Promotes parallel programming
Hides the intricacies of the hardware and the vector ISA
Oriented toward data-intensive mathematical computations
Built-in protection: by default, an ArBB program cannot create race conditions or deadlocks
What is it used for?
Bioinformatics
Engineering Design
Financial Analytics
Oil and Gas
Medical Imaging
Visual Computing
Signal and Image Processing
Science and Research
Enterprise
Extend C++
Uses standard C++ features to create new types and operators
Constructs of ArBB
Scalar types – equivalent to primitive C++ types
Vector types – parallel collections of scalar data
Operators – Scalar and vector operators
Functions – User defined code fragments
Control flow
Scalar Types
Type | Description | C++ equivalent
f32, f64 | 32/64-bit floating-point number | float, double
i8, i16, i32, i64 | 8/16/32/64-bit signed integer | signed char, short, int, long long
u8, u16, u32, u64 | 8/16/32/64-bit unsigned integer | unsigned char, unsigned short, unsigned int, unsigned long long
boolean | Boolean value | bool
usize, isize | Unsigned/signed integer large enough to store an address | size_t, ptrdiff_t
Dense Containers
Very similar to std::vector
Can change size dynamically at runtime
Operations: element-wise scalar operations
Indexing
Reordering
Reductions
Property Access
Most operations run in parallel
Dense Containers Example
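#include <arbb.hpp>
using namespace arbb;

const int SIZE = 1024;

// ArBB function: element-wise sum of two dense containers
void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
    c = a + b;
}

int main(int argc, char** argv) {
    float a[SIZE] = {}, b[SIZE] = {}, c[SIZE] = {};
    // Bind each C++ array to its own ArBB dense container
    dense<f32> va; bind(va, a, SIZE);
    dense<f32> vb; bind(vb, b, SIZE);
    dense<f32> vc; bind(vc, c, SIZE);
    // call() creates a closure for vecsum, which is then run on the bound data
    call(vecsum)(va, vb, vc);
    return 0;
}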
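For comparison, here is a minimal sketch of the same element-wise sum in plain, sequential standard C++ with no ArBB involved. It assumes std::vector<float> as a stand-in for dense<f32>; the single loop hidden inside std::transform is the work that ArBB would spread across cores and SIMD lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Sequential analogue of vecsum: c[i] = a[i] + b[i] for every i.
// std::vector<float> stands in for ArBB's dense<f32> (an assumption for illustration).
void vecsum(const std::vector<float>& a, const std::vector<float>& b,
            std::vector<float>& c) {
    assert(a.size() == b.size());  // element-wise ops need equal lengths
    c.resize(a.size());
    std::transform(a.begin(), a.end(), b.begin(), c.begin(), std::plus<float>());
}
```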
Element-wise and Vector-scalar Operators
All standard C++ arithmetic, bitwise, and logical operators can be used in vector computations
This allows these operations to run in parallel, reducing runtime.
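Standard C++ has a small sequential analogue of these operators in std::valarray, which overloads arithmetic, bitwise, and comparison operators element-wise and between an array and a scalar. The sketch below only illustrates the semantics; scale_and_add and positive_mask are hypothetical names, and this is not ArBB code.

```cpp
#include <cassert>
#include <valarray>

// Element-wise (array op array) and vector-scalar (scalar op array) operators:
// computes r[i] = 2 * a[i] + b[i] for every i with no explicit loop.
std::valarray<float> scale_and_add(const std::valarray<float>& a,
                                   const std::valarray<float>& b) {
    assert(a.size() == b.size());
    std::valarray<float> r = 2.0f * a + b;
    return r;
}

// Comparison operators are element-wise too and yield a boolean mask.
std::valarray<bool> positive_mask(const std::valarray<float>& a) {
    std::valarray<bool> m = a > 0.0f;
    return m;
}
```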
Other operators
Operator | Description
abs | Absolute value
cos | Cosine
sin | Sine
tan | Tangent
exp | Exponential (e^x)
log | Natural logarithm
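The same math functions exist as element-wise overloads for std::valarray, which gives a quick way to see what applying abs, sin, cos, exp, or log to a whole vector means. This is a minimal sequential sketch; log_exp_abs and pythagorean_identity are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// Applies abs, exp, and log element-wise via the std::valarray overloads.
// r[i] = log(exp(|x[i]|)), which equals |x[i]| up to rounding.
std::valarray<double> log_exp_abs(const std::valarray<double>& x) {
    std::valarray<double> a = std::abs(x);
    std::valarray<double> e = std::exp(a);
    return std::log(e);
}

// cos^2 + sin^2 computed element-wise; every element should be close to 1.
std::valarray<double> pythagorean_identity(const std::valarray<double>& x) {
    std::valarray<double> c = std::cos(x);
    std::valarray<double> s = std::sin(x);
    std::valarray<double> r = c * c + s * s;
    return r;
}
```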
Collective Operators
Perform computations in which the output(s) depend on all of the inputs.
Example
Reduction – applies an operator over an entire vector to compute a distilled value or values.
add_reduce([1 0 2 -1 4]) yields 6
Scan – computes reductions on all prefixes of a collection
add_iscan([1 0 2 -1 4]) yields [1 (1+0) (1+0+2) (1+0+2+(-1)) (1+0+2+(-1)+4)] = [1 1 3 2 6]
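Both collective operators have direct counterparts in the C++17 <numeric> header: std::reduce for a reduction and std::inclusive_scan for an inclusive prefix sum. The sketch below reproduces the two examples above; sum_reduce and sum_iscan are hypothetical names, and both run sequentially here. Because + is associative, these algorithms are allowed to regroup the additions, which is what makes such operations parallelizable; C++17 also provides overloads that take an execution policy.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Reduction: combines every element with + into a single value.
int sum_reduce(const std::vector<int>& v) {
    return std::reduce(v.begin(), v.end(), 0);
}

// Inclusive scan: out[i] = v[0] + v[1] + ... + v[i].
std::vector<int> sum_iscan(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::inclusive_scan(v.begin(), v.end(), out.begin());
    return out;
}
```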
Other Types of Operators
Permutation operators: these operations alter the size and order of vectors
a = shift(b, -1, value);
a = rotate(b, -1);
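The sketch below models the two permutation operators on a std::vector<int>. It assumes that a negative offset moves elements toward the end of the vector, with shift filling the vacated slot with the given value and rotate wrapping the last element around to the front; ArBB's exact sign convention is not stated in these slides, so treat that direction as an assumption. shift_fill and rotate_wrap are hypothetical helpers, not ArBB functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: out[i] = v[i + offset] when that index is in range,
// otherwise the fill value. With offset = -1, every element moves one slot
// toward the end and out[0] becomes the fill value.
std::vector<int> shift_fill(const std::vector<int>& v, long offset, int fill) {
    const long n = static_cast<long>(v.size());
    std::vector<int> out(v.size(), fill);
    for (long i = 0; i < n; ++i) {
        long src = i + offset;
        if (src >= 0 && src < n) out[i] = v[src];
    }
    return out;
}

// Hypothetical helper: like shift_fill, but indices wrap around, so no element is lost.
std::vector<int> rotate_wrap(const std::vector<int>& v, long offset) {
    const long n = static_cast<long>(v.size());
    std::vector<int> out(v.size());
    for (long i = 0; i < n; ++i) {
        long src = ((i + offset) % n + n) % n;  // non-negative modulo
        out[i] = v[src];
    }
    return out;
}
```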
Facility operators: provide data-processing features
Operator | Dimension | Description
cat | 1, 2, 3 | Concatenate dense containers
page | 3 | Retrieve a slice of a dense container
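A minimal sketch of what the two facility operators compute, assuming a 3-D container stored contiguously, one page after another. Dense3D, cat1d, and page are hypothetical names for illustration; real ArBB containers manage their own layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 1-D concatenation, a's elements followed by b's.
std::vector<float> cat1d(const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> out(a);
    out.insert(out.end(), b.begin(), b.end());
    return out;
}

// Hypothetical 3-D container: element (x, y, p) lives at index
// (p * height + y) * width + x, so page p is one contiguous width x height slice.
struct Dense3D {
    std::size_t width, height, pages;
    std::vector<float> data;
};

// Hypothetical helper: returns page p as a flat vector of width * height elements.
std::vector<float> page(const Dense3D& d, std::size_t p) {
    assert(p < d.pages);
    const std::size_t slice = d.width * d.height;
    auto first = d.data.begin() + static_cast<std::ptrdiff_t>(p * slice);
    return std::vector<float>(first, first + static_cast<std::ptrdiff_t>(slice));
}
```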
Differences from C++
_for (i32 i = 0, i <= N, i++) {
    /* code */
} _end_for;

_while (condition) {
    /* code */
} _end_while;

_if (condition) {
    /* code */
} _else {
    /* code */
} _end_if;
Functions
Calling ArBB functions differs from normal C++ function calls. Form: mfc fnct = call(my_function);
Calling a function creates a closure for that function
Once created, the closure is cached and is never created again
Allows for currying
The map function lets the programmer execute a function for every element of a vector
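The "created once" behavior can be modeled as a cache keyed by the function. In this hypothetical sketch, Closure, get_closure, map_kernel, and compile_count are illustrative names, not ArBB API: where ArBB would capture the function and JIT-compile it, the sketch merely wraps the function pointer in a std::function, and the map step runs sequentially.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

using Kernel = float (*)(float);

// Hypothetical stand-in for an ArBB closure: a "compiled" wrapper around a kernel.
struct Closure {
    std::function<float(float)> compiled;
};

static int compile_count = 0;  // how many times a closure was actually built

// Builds the closure on the first request for a kernel; later requests reuse the cache.
const Closure& get_closure(Kernel k) {
    static std::map<Kernel, Closure> cache;
    auto it = cache.find(k);
    if (it == cache.end()) {
        ++compile_count;  // ArBB would capture and JIT-compile here
        it = cache.emplace(k, Closure{std::function<float(float)>(k)}).first;
    }
    return it->second;
}

// map-style execution: apply the kernel to every element (sequentially here).
std::vector<float> map_kernel(Kernel k, const std::vector<float>& v) {
    const Closure& c = get_closure(k);
    std::vector<float> out;
    out.reserve(v.size());
    for (float x : v) out.push_back(c.compiled(x));
    return out;
}

float square(float x) { return x * x; }
```

Calling map_kernel twice with square builds the closure only once, so compile_count stays at 1.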
Dynamic Execution Engine
Array Building Blocks provides a dynamic execution engine which comprises three major services:
Threading runtime: provides a fine-grained model for data- and task-parallel threading
Memory manager: segregates normal C++ memory from ArBB memory
Uses a set of lock-free memory interfaces and acts as a garbage collector
Just-in-time compiler/dynamic engine: constructs an intermediate representation of computations, performs optimizations, and generates code
Monte Carlo Computation of Pi
Monte Carlo Computation of Pi: C/C++
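#include <cmath>
#include <cstdlib>

const int NEXP = 1000000;  // number of random samples (example value)

double computepi() {
    int cnt = 0;
    for (int i = 0; i < NEXP; i++) {
        // random point in the unit square [0, 1] x [0, 1]
        float x = float(std::rand()) / float(RAND_MAX);
        float y = float(std::rand()) / float(RAND_MAX);
        float dst = std::sqrt(x*x + y*y);
        // count points inside the quarter circle of radius 1
        if (dst <= 1.0f) {
            cnt++;
        }
    }
    // hit ratio approximates (pi/4) / 1, the ratio of the two areas
    return 4.0 * ((double) cnt) / NEXP;
}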
*NEXP = O(2^p(n))
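Monte Carlo Computation of Pi: ArBB
void computepi(f64& pi) {
    random_generator rng;
    // NEXP random coordinates held in dense containers
    dense<f32> x = rng.randomize(NEXP);
    dense<f32> y = rng.randomize(NEXP);
    // element-wise distance from the origin
    dense<f32> dist = sqrt(x*x + y*y);
    // true where the point lies inside the quarter circle
    dense<boolean> mask = (dist <= 1.0f);
    // 1 for each hit, 0 for each miss, then a parallel sum
    dense<i32> cnt = select(mask, 1, 0);
    pi = 4.0 * add_reduce(cnt) / NEXP;
}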
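The ArBB version replaces the loop with whole-container operations. The same data-parallel structure (generate all samples, build a mask, count it) can be sketched in standard C++. This version runs sequentially and assumes std::mt19937 with a fixed seed for reproducibility; estimate_pi is a hypothetical name. Comparing the squared distance to 1 avoids the square root without changing which points count as hits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Monte Carlo estimate of pi written in the same whole-array style as the ArBB version.
double estimate_pi(std::size_t nexp, unsigned seed) {
    assert(nexp > 0);
    std::mt19937 rng(seed);  // fixed seed makes runs reproducible
    std::uniform_real_distribution<float> unit(0.0f, 1.0f);

    // like rng.randomize(NEXP): all x and y coordinates up front
    std::vector<float> x(nexp), y(nexp);
    for (std::size_t i = 0; i < nexp; ++i) {
        x[i] = unit(rng);
        y[i] = unit(rng);
    }

    // like mask = (dist <= 1.0f), using squared distance
    std::vector<bool> mask(nexp);
    for (std::size_t i = 0; i < nexp; ++i) {
        mask[i] = x[i] * x[i] + y[i] * y[i] <= 1.0f;
    }

    // like add_reduce(select(mask, 1, 0))
    const auto cnt = std::count(mask.begin(), mask.end(), true);
    return 4.0 * static_cast<double>(cnt) / static_cast<double>(nexp);
}
```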
Evaluation of Monte Carlo
Samples | Estimated pi (4 × fraction of distances <= 1)
10 | 3.2
1,000 | 3.212
1,000,000 | 3.14572
10,000,000 | 3.141176
50,000,000 | 3.141698
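Why the estimates converge slowly can be made precise with a standard statistical result (not part of the original slides). Each of the N = NEXP samples lands inside the quarter circle with probability p = π/4, so the hit count is binomial and the standard error of the estimate shrinks only as 1/√N:

```latex
\hat{\pi} = 4\,\frac{\mathrm{cnt}}{N}, \qquad
p = \frac{\pi}{4}, \qquad
\operatorname{SE}(\hat{\pi}) = 4\sqrt{\frac{p(1-p)}{N}} \approx \frac{1.64}{\sqrt{N}}
```

For N = 1,000,000 this is about 0.0016, so each additional correct digit costs roughly 100 times more samples.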
Intel ArBB Today
Preview release 1.0 beta 6: August 25, 2011
Project retired by Intel in October 2012
Overshadowed by Intel Cilk Plus and Intel Threading Building Blocks