Intel Array Building Blocks

Intel Array Building Blocks BY: EDWARD JONES


Transcript of Intel Array Building Blocks

Page 1: Intel Array Building Blocks

Intel Array Building Blocks BY: EDWARD JONES

Page 2: Intel Array Building Blocks

Background

Intel Ct: Developed in 2007

Parallel programming model for multicore chips

Exploits Single Instruction, Multiple Data (SIMD)

RapidMind: Started in 2004

Provided a software product that simplified the use of multi-core processors and graphics processing units (GPUs)

Intel acquired RapidMind on August 19, 2009

Page 3: Intel Array Building Blocks

Intel ArBB

Intel ArBB is a C++ API

Promotes parallel programming

Hides the intricacies of the hardware and the vector ISA

Oriented to data-intensive mathematical computations

Built-in protection: an ArBB program cannot create race conditions or deadlocks by default

Page 4: Intel Array Building Blocks

What is it used for?

Bioinformatics

Engineering Design

Financial Analytics

Oil and Gas

Medical Imaging

Visual Computing

Signal and Image Processing

Science and Research

Enterprise

Page 5: Intel Array Building Blocks

Extend C++

Uses standard C++ features to create new types and operators

Constructs of ArBB:

Scalar types – equivalent to primitive C++ types

Vector types – parallel collections of scalar data

Operators – Scalar and vector operators

Functions – User defined code fragments

Control flow

Page 6: Intel Array Building Blocks

Scalar Types

Types               Description                                   C++ equivalents
f32, f64            32/64-bit floating-point numbers              float, double
i8, i16, i32, i64   8/16/32/64-bit signed integers                char, short, int
u8, u16, u32, u64   8/16/32/64-bit unsigned integers              unsigned char, short, int
boolean             Boolean value                                 bool
usize, isize        Unsigned/signed integers sufficiently         size_t
                    large to store addresses
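As a rough guide to the table above, the ArBB scalar names line up with the fixed-width types in `<cstdint>`. The sketch below is only an illustration of the widths: the namespace `arbb_like` and its aliases are hypothetical, and real ArBB scalars are wrapper classes managed by the ArBB runtime rather than plain aliases.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for illustration only; these are NOT the ArBB definitions.
namespace arbb_like {
using f32 = float;
using f64 = double;
using i8  = std::int8_t;
using i16 = std::int16_t;
using i32 = std::int32_t;
using i64 = std::int64_t;
using u8  = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;
using u64 = std::uint64_t;
using usize = std::size_t;     // unsigned, large enough for any object size
using isize = std::ptrdiff_t;  // signed counterpart
}  // namespace arbb_like

// Compile-time checks that the aliases have the widths and signedness named in the table.
static_assert(sizeof(arbb_like::i32) == 4, "i32 is 32 bits wide");
static_assert(sizeof(arbb_like::u64) == 8, "u64 is 64 bits wide");
static_assert(std::is_unsigned<arbb_like::usize>::value, "usize is unsigned");
static_assert(std::is_signed<arbb_like::isize>::value, "isize is signed");
```

The alias for `i64` uses `std::int64_t` because the width of `long` differs between platforms.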

Page 7: Intel Array Building Blocks

Dense Containers

Very similar to vectors

Can change size dynamically at run time

Operations:

Element-wise scalar operations

Indexing

Reordering

Reductions

Property Access

Most operations run in parallel

Page 8: Intel Array Building Blocks

Dense Containers Example

#include <arbb.hpp>
using namespace arbb;

#define SIZE 1024

// ArBB function: element-wise sum of two dense containers.
void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
    c = a + b;
}

int main(int argc, char** argv) {
    float a[SIZE]; float b[SIZE]; float c[SIZE];

    // Bind each dense container to its C++ array.
    dense<f32> va; bind(va, a, SIZE);
    dense<f32> vb; bind(vb, b, SIZE);
    dense<f32> vc; bind(vc, c, SIZE);

    call(vecsum)(va, vb, vc);
}

Page 9: Intel Array Building Blocks

Element-wise and Vector-scalar Operators

All standard C++ arithmetic, bitwise, and logical operators can be used in vector computations

These operators are applied to all elements in parallel, which reduces run time.

Other operators

Operator   Description
abs        Absolute value
cos        Cosine
sin        Sine
tan        Tangent
exp        Exponential
log        Natural logarithm
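Standard C++ offers a similar programming style in `std::valarray`, which overloads arithmetic operators and math functions element by element. The sketch below uses it as a stand-in to show what element-wise and vector-scalar expressions compute; it is not ArBB code, it gives no parallel execution guarantee, and the function names are made up for illustration.

```cpp
#include <valarray>

// Element-wise: out[i] = sqrt(a[i]*a[i] + b[i]*b[i]).
std::valarray<float> hypotenuse(const std::valarray<float>& a,
                                const std::valarray<float>& b) {
    std::valarray<float> sum_of_squares = a * a + b * b;
    return std::sqrt(sum_of_squares);
}

// Vector-scalar: out[i] = v[i] * scale + offset, the scalars apply to every element.
std::valarray<float> scale_and_offset(const std::valarray<float>& v,
                                      float scale, float offset) {
    return v * scale + offset;
}
```

Calling `hypotenuse` on the containers {3, 6} and {4, 8} gives {5, 10}, with no explicit loop in the calling code.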

Page 10: Intel Array Building Blocks

Collective Operators

Perform computations where output(s) depend on all of the inputs.

Examples

Reduction – applies an operator over an entire vector to compute a distilled value or values.

add_reduce([1 0 2 -1 4]) yields 6

Scan – computes reductions on all prefixes of a collection

add_iscan([1 0 2 -1 4]) yields [1 (1+0) (1+0+2) (1+0+2+(-1)) (1+0+2+(-1)+4)] = [1 1 3 2 6]
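The two examples above can be reproduced with the standard library. The functions below are sequential stand-ins that borrow the names of the ArBB operators for readability; they are a sketch of the semantics, not the ArBB implementation.

```cpp
#include <numeric>
#include <vector>

// Reduction: collapse the whole vector into one value with +.
int add_reduce(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Inclusive scan: out[i] is the sum of v[0] through v[i].
std::vector<int> add_iscan(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}
```

A parallel runtime can evaluate both in a tree pattern because addition is associative, which is why they are offered as built-in collective operators rather than written as loops.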

Page 11: Intel Array Building Blocks

Other Types of Operators

Permutation Operators

These operations alter the size and order of vectors

a = shift(b, -1, value);

a = rotate(b, -1);

Facility Operators

Provide data processing features

Operator   Dimension   Description
cat        1, 2, 3     Concatenate dense containers
page       3           Retrieve slice of a dense container
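The difference between shifting and rotating can be sketched in standard C++. The slide does not state which direction a negative distance moves the data, so the helpers below avoid signed distances altogether: `shift_up` and `rotate_up` are hypothetical names, and both move elements toward higher indices by an unsigned amount.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Move every element n positions toward higher indices.
// Slots vacated at the front receive `fill`; elements pushed past the end are dropped.
std::vector<int> shift_up(const std::vector<int>& v, std::size_t n, int fill) {
    std::vector<int> out(v.size(), fill);
    for (std::size_t i = 0; i + n < v.size(); ++i) {
        out[i + n] = v[i];
    }
    return out;
}

// Same movement, but elements pushed past the end wrap around to the front.
std::vector<int> rotate_up(const std::vector<int>& v, std::size_t n) {
    std::vector<int> out(v);
    if (out.empty()) {
        return out;
    }
    n %= out.size();
    std::rotate(out.begin(), out.end() - static_cast<std::ptrdiff_t>(n), out.end());
    return out;
}
```

With the input {1, 2, 3, 4} and a distance of 1, `shift_up` with fill value 0 gives {0, 1, 2, 3}, while `rotate_up` gives {4, 1, 2, 3}.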

Page 12: Intel Array Building Blocks

Differences from C++

_for(i32 i=0, i<=N, i++) {
    /* code */
} _end_for;

_while(condition) {
    /* code */
} _end_while;

_if(condition) {
    /* code */
} _else {
    /* code */
} _end_if;

Page 13: Intel Array Building Blocks

Functions

Calling ArBB functions is different from normal function calls

Form: mfc fnct = call(my_function);

Calling a function creates a closure for that function

Once the closure has been created, later calls reuse it instead of creating it again

Allows for currying

‘map’ function allows the programmer to execute a function for every element in a vector
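The ideas of mapping and currying can be illustrated with lambdas in standard C++. This is a sequential sketch, not the ArBB API: `map_each` and `scale_by` are hypothetical names, and an actual ArBB map would be free to process the elements in parallel.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Apply f to every element and collect the results, one output per input.
std::vector<float> map_each(const std::vector<float>& v,
                            const std::function<float(float)>& f) {
    std::vector<float> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(), f);
    return out;
}

// Curried form of a two-argument multiply: supplying the first argument
// returns a one-argument function that remembers it.
std::function<float(float)> scale_by(float factor) {
    return [factor](float x) { return factor * x; };
}
```

For example, `map_each({1, 2, 3}, scale_by(2.0f))` yields {2, 4, 6}. The function returned by `scale_by` is created once and can be reused across many calls.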

Page 14: Intel Array Building Blocks

Dynamic Execution Engine

Array Building Blocks provides a dynamic execution engine which comprises three major services:

Threading Runtime: provides a fine-grained model for data-parallel and task-parallel threading

Memory Manager: segregates normal C++ memory from the ArBB memory

Set of lock-free memory interfaces as a garbage collector

Just-in-time Compiler/Dynamic Engine: constructs an intermediate representation of computations, performs optimizations, and generates code.

Page 15: Intel Array Building Blocks

Monte Carlo Computation of Pi
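The method rests on a ratio of areas. A point (x, y) drawn uniformly from the unit square falls inside the quarter circle of radius 1 with probability π/4, so the fraction of hits estimates π/4. Writing N for the number of samples (NEXP in the code) and cnt for the number of hits, a short derivation under that uniform-sampling assumption is:

```latex
\[
  P\!\left(x^2 + y^2 \le 1\right) = \frac{\pi}{4}
  \quad\Longrightarrow\quad
  \hat{\pi} = 4 \cdot \frac{\mathrm{cnt}}{N},
\]
\[
  \operatorname{sd}(\hat{\pi})
  = 4 \sqrt{\frac{\tfrac{\pi}{4}\left(1 - \tfrac{\pi}{4}\right)}{N}}
  \approx \frac{1.64}{\sqrt{N}} .
\]
```

The error therefore shrinks only with the square root of the sample count: gaining one more decimal digit takes roughly 100 times as many samples.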

Page 16: Intel Array Building Blocks

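Monte Carlo Computation of Pi: C/C++

#include <cmath>
#include <cstdlib>

// NEXP (the number of samples) is defined elsewhere.
double computepi() {
    int cnt = 0;
    for (int i = 0; i < NEXP; i++) {
        // Random point in the unit square.
        float x = float(rand()) / float(RAND_MAX);
        float y = float(rand()) / float(RAND_MAX);
        float dst = sqrtf(x*x + y*y);
        if (dst <= 1.0f) {
            cnt++;  // point lies inside the quarter circle
        }
    }
    return 4.0 * ((double) cnt) / NEXP;
}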

*NEXP = O(2p(n))

Page 17: Intel Array Building Blocks

Monte Carlo Computation of Pi: ArBB

void computepi(f64& pi) {
    random_generator rng;
    dense<f32> x = rng.randomize(NEXP);
    dense<f32> y = rng.randomize(NEXP);
    dense<f32> dist = sqrt(x*x + y*y);      // element-wise distance from the origin
    dense<boolean> mask = (dist <= 1.0f);   // true where the point is inside the quarter circle
    dense<i32> cnt = select(mask, 1, 0);
    pi = 4.0 * add_reduce(cnt) / NEXP;
}
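For readers without an ArBB installation, the same whole-vector steps can be followed in standard C++. The sketch below is sequential and is not the ArBB version: `compute_pi`, `nexp` and `seed` are hypothetical names, `std::mt19937` replaces the slide's random generator, and the square root is skipped because comparing the squared distance with 1 gives the same mask.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

double compute_pi(std::size_t nexp, unsigned seed) {
    if (nexp == 0) {
        return 0.0;  // no samples, no estimate
    }
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);

    // Step 1: fill two whole vectors with random coordinates.
    std::vector<float> x(nexp), y(nexp);
    for (std::size_t i = 0; i < nexp; ++i) {
        x[i] = uniform(rng);
        y[i] = uniform(rng);
    }

    // Step 2: element-wise mask, 1 inside the quarter circle and 0 outside.
    std::vector<int> cnt(nexp);
    for (std::size_t i = 0; i < nexp; ++i) {
        cnt[i] = (x[i] * x[i] + y[i] * y[i] <= 1.0f) ? 1 : 0;
    }

    // Step 3: reduction, summing the 0/1 flags into a hit count.
    long long inside = std::accumulate(cnt.begin(), cnt.end(), 0LL);
    return 4.0 * static_cast<double>(inside) / static_cast<double>(nexp);
}
```

A fixed seed makes the result repeatable, which is convenient when comparing runs with different sample counts.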

Page 18: Intel Array Building Blocks

Evaluation of Monte Carlo

Samples       Pi (distances <= 1)
10            3.2
1,000         3.212
1,000,000     3.14572
10,000,000    3.141176
50,000,000    3.141698

Page 19: Intel Array Building Blocks

Intel ArBB Today

Preview release: 1.0 beta 6, August 25, 2011

Project retired by Intel in October 2012

Overshadowed by Intel Cilk Plus and Intel Threading Building Blocks