Intel Array Building Blocks

By: Edward Jones

Background

Intel Ct: Developed in 2007

Parallel programming model for multicore chips

Exploits Single Instruction, Multiple Data (SIMD) parallelism

RapidMind: Started in 2004

Provided a software product that simplified the use of multicore processors and graphics processing units (GPUs)

Intel acquired RapidMind on August 19, 2009

Intel ArBB

Intel ArBB is a C++ API

Promotes parallel programming

Hides the intricacies of the hardware and its vector instruction set architecture (ISA)

Oriented toward data-intensive mathematical computations

Built-in protection: by default, an ArBB program cannot create race conditions or deadlocks

What is it used for?

Bioinformatics

Engineering Design

Financial Analytics

Oil and Gas

Medical Imaging

Visual Computing

Signal and Image Processing

Science and Research

Enterprise

Extending C++

Uses standard C++ features to create new types and operators

Constructs of ArBB

Scalar types – equivalent to primitive C++ types

Vector types – parallel collections of scalar data

Operators – Scalar and vector operators

Functions – User defined code fragments

Control flow

Scalar Types

Type                 Description                                                C++ equivalent
f32, f64             32/64-bit floating-point numbers                           float, double
i8, i16, i32, i64    8/16/32/64-bit signed integers                             signed char, short, int, long long
u8, u16, u32, u64    8/16/32/64-bit unsigned integers                           unsigned char, unsigned short, unsigned int, unsigned long long
boolean              Boolean value                                              bool
usize, isize         Unsigned/signed integers large enough to store addresses   size_t, ptrdiff_t

Dense Containers

Very similar to C++ vectors (std::vector)

Can change size dynamically at runtime

Operations:

Element-wise scalar operations

Indexing

Reordering

Reductions

Property access

Most operations run in parallel

Dense Containers Example

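#include <arbb.hpp>
using namespace arbb;

const int SIZE = 1024;

// Element-wise addition; ArBB runs it in parallel over all elements
void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
    c = a + b;
}

int main(int argc, char** argv) {
    float a[SIZE], b[SIZE], c[SIZE];
    for (int i = 0; i < SIZE; ++i) { a[i] = float(i); b[i] = 2.0f * i; }

    // Bind each ArBB container to its C++ array
    dense<f32> va; bind(va, a, SIZE);
    dense<f32> vb; bind(vb, b, SIZE);
    dense<f32> vc; bind(vc, c, SIZE);

    call(vecsum)(va, vb, vc);   // results land in c through the binding
    return 0;
}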

Element-wise and Vector-scalar Operators

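All standard C++ arithmetic, bitwise, and logical operators can be used in vector computations, either element-wise between two containers or between a container and a scalar

Because each element is computed independently, these operations can run in parallel and reduce runtime.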

Other operators

Operator   Description
abs        Absolute value
cos        Cosine
sin        Sine
tan        Tangent
exp        Exponential (e^x)
log        Natural logarithm

Collective Operators

Perform computations in which the output(s) depend on all of the inputs.

Examples

Reduction – applies an operator over an entire vector to compute a distilled value or values.

add_reduce([1 0 2 -1 4]) yields 6

Scan – computes reductions on all prefixes of a collection

add_iscan([1 0 2 -1 4]) yields [1 (1+0) (1+0+2) (1+0+2+(-1)) (1+0+2+(-1)+4)] = [1 1 3 2 6]

Other Types of Operators

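Permutation operators: these operations alter the order, and in some cases the size, of vectors

a = shift(b, -1, value);   // shift b by one position; the vacated slot is filled with value

a = rotate(b, -1);         // rotate b by one position; elements wrap around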

Facility operators: provide data-processing features

Operator   Container dimensions   Description
cat        1, 2, 3                Concatenate dense containers
page       3                      Retrieve a slice of a dense container

Differences from C++

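ArBB control flow uses macros (_for, _while, _if/_else), each closed by a matching _end_for, _end_while, or _end_if; _for separates its clauses with commas rather than semicolons. Ordinary C++ if/for/while statements run only while ArBB captures a function, whereas these macros record control flow that depends on ArBB values so that it becomes part of the compiled code.

_for (i32 i = 0, i <= N, i++) {
    /* code */
} _end_for;

_while (condition) {
    /* code */
} _end_while;

_if (condition) {
    /* code */
} _else {
    /* code */
} _end_if;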

Functions

Calling an ArBB function differs from a normal C++ function call. Form: mfc fnct = call(my_function);

Calling a function this way creates a closure for that function

Once created, the closure is cached and never created again

Allows for currying

The map() function executes a function for every element of a vector

Dynamic Execution Engine

Array Building Blocks provides a dynamic execution engine which comprises three major services:

Threading runtime: provides a fine-grained model for data-parallel and task-parallel threading

Memory manager: segregates normal C++ memory from ArBB memory

Uses a set of lock-free memory interfaces and acts as a garbage collector

Just-in-time compiler/dynamic engine: constructs an intermediate representation of computations, performs optimizations, and generates code.

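Monte Carlo Computation of Pi

Idea: a point (x, y) drawn uniformly from the unit square lands inside the quarter circle x*x + y*y <= 1 with probability pi/4, so pi ≈ 4 × (points inside) / (total points).

Monte Carlo Computation of Pi: C/C++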

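#include <cmath>
#include <cstdlib>

const int NEXP = 1000000;   // number of random samples (example value)

double computepi() {
    int cnt = 0;
    for (int i = 0; i < NEXP; i++) {
        float x = float(std::rand()) / float(RAND_MAX);   // uniform in [0, 1]
        float y = float(std::rand()) / float(RAND_MAX);
        float dst = std::sqrt(x * x + y * y);             // distance from the origin
        if (dst <= 1.0f) {
            cnt++;                                        // point is inside the quarter circle
        }
    }
    return 4.0 * ((double) cnt) / NEXP;
}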

*NEXP is the number of random samples (experiments); the work grows linearly with it, O(NEXP).

Monte Carlo Computation of Pi: ArBB

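void computepi(f64& pi) {
    random_generator rng;
    dense<f32> x = rng.randomize(NEXP);       // NEXP random x coordinates
    dense<f32> y = rng.randomize(NEXP);       // NEXP random y coordinates
    dense<f32> dist = sqrt(x * x + y * y);    // element-wise distances
    dense<boolean> mask = (dist <= 1.0f);     // true where the point is inside
    dense<i32> cnt = select(mask, 1, 0);      // 1 for a hit, 0 for a miss
    pi = 4.0 * add_reduce(cnt) / NEXP;        // parallel sum of the hits
}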

Evaluation of Monte Carlo

Samples       Estimated pi (4 × fraction of samples with distance <= 1)
10            3.2
1,000         3.212
1,000,000     3.14572
10,000,000    3.141176
50,000,000    3.141698
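The slow convergence in the table is expected: each sample is an independent hit-or-miss trial with probability p = pi/4, so by the standard binomial variance the estimate's standard error shrinks only as 1/sqrt(N). A worked estimate:

```latex
\begin{align*}
p &= \frac{\pi}{4} \approx 0.7854, \qquad \hat{\pi} = 4\,\frac{\text{hits}}{N},\\
\sigma_{\hat{\pi}} &= 4\sqrt{\frac{p(1-p)}{N}} \approx \frac{1.64}{\sqrt{N}},\\
N = 5\times 10^{7}:\quad \sigma_{\hat{\pi}} &\approx \frac{1.64}{7071} \approx 2.3\times 10^{-4}.
\end{align*}
```

The measured error at 50,000,000 samples, |3.141698 - 3.141593| ≈ 1.05 × 10^-4, is within one standard error.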

Intel ArBB Today

Preview release 1.0 beta 6: August 25, 2011

Project retired by Intel in October 2012

Overshadowed by Intel Cilk Plus and Intel Threading Building Blocks

Sources

http://www.drdobbs.com/parallel/array-building-blocks-a-flexible-paralle/227300084

http://openlab-mu-internal.web.cern.ch/openlab-mu-internal/03_Documents/4_Presentations/Slides/2010-list/02_CERN_openLab_Workshop-2010_Hans_Pabst.pdf





