EAVL Extreme-scale Analysis and Visualization Library

Post on 15-Jan-2016

42 views 0 download

Tags:

description

EAVL Extreme-scale Analysis and Visualization Library. Jeremy Meredith SDAV Next-Gen Library Meeting September, 2012. History. Originally ORNL LDRD Jeremy Meredith, Sean Ahern, Dave Pugmire plus Rob Sisneros joined as a postdoc - PowerPoint PPT Presentation

Transcript of EAVL Extreme-scale Analysis and Visualization Library

EAVL

EXTREME-SCALE ANALYSIS AND VISUALIZATION LIBRARY

Jeremy Meredith

SDAV Next-Gen Library MeetingSeptember, 2012

History

• Originally ORNL LDRD– Jeremy Meredith, Sean Ahern, Dave Pugmire– plus Rob Sisneros joined as a postdoc

• Many hours sitting in conference rooms arguing over things like “what does it mean to have one of your dimensions be unstructured?”– then determine what to do that’s practical without

falling off the data modeling deep end . . . .

• Exascale focus

Approaching the Exascale Problems

• Update traditional data model to handle modern simulation codes and a wider range of data.

• Investigate how an updated data and execution model can achieve the necessary computational, I/O, and memory efficiency.

• Explore methods for visualization algorithm developers to achieve these efficiency gains and better support exascale architectures.

DATA MODELING CHALLENGES

Connectivity3D Point CoordinatesCell FieldsPoint Fields

Dimensions3D Point CoordinatesCell FieldsPoint Fields

Dimensions3D Axis CoordinatesCell FieldsPoint Fields

A Traditional Data Set Model

Data Set

Rectilinear Structured Unstructured

Challenge: Non-Physical Data Analysis

• Graph Data– topologically 0D vertices, 1D edges– non-spatial; storing X/Y/Z values is wasted space

• Pure Parameter Studies– e.g. reaction rate of combustion• FOUR “spatial” dimensions

– e.g. methane concentration vs oxygen concentration vs temperature vs pressure

• more complex reaction higher dimensionality

oxyg

en

met

hane

temperature

pressure

Challenge: Molecular Data(e.g., LAMMPS, VASP)

• To represent using vtkPolyData or vtkUnstructuredGrid:– VTK_VERTEX cells for the atoms– VTK_LINE cells for the bonds

• Any field data must exist on both element types– Not only inefficient:

• dummy bond strengths on the atoms?• dummy atomic numbers on the bonds?

– But also incorrect:• e.g. average(BondStrength) uses dummy values from atoms?

H

C

H

C

H

H

BondStr11112

AtomicNum661111

Challenge: Side Sets(e.g. Exodus, flux surfaces)

• The flow from A to B is defined on a set of faces• The flux variable is defined only on those faces– do you combine them into a single mesh?• waste space on dummy values, potentially introducing errors

– or create a separate mesh and lose the mapping info?• horribly expensive and error-prone to recalculate mapping

A Bflux surface lives inside the volumetric mesh

Challenge: Dimensionality, Refinement(e.g. GenASiS)

• (a) seven (or eight) dimensional mesh– f(x,y,z,ϴ,ϕ,λ,F)=E, plus time

• (b) refinement occurs on a per-cell basis– can’t assume per-block refinement– sometimes referred to as “unstructured AMR”

www.vacet.org

Challenge: Unique Mesh Topologies(e.g. MADNESS)

• MADNESS does not have a traditional mesh– Just a quad-tree with polynomial coefficients– Up to 30 refinement levels / tree depth

31 2 4

5

root

1

2

345

spatial structure internal tree representation

www.vacet.org

Challenge: Very High Order Fields(e.g. MADNESS)

• Legendre polynomial series at each tree node– Each tree node has Kdim coefficients– K can be up to approx. 20• i.e. 400 coeffs per tree node in 2D, 8000 in 3D

(example with K=3, dim=2)

0.834 0.592 0.0030.592 0.003 0.0100.003 0.010 0.007

THE EAVL DATA MODEL

Connectivity3D Point CoordinatesCell FieldsPoint Fields

Dimensions3D Point CoordinatesCell FieldsPoint Fields

Dimensions3D Axis CoordinatesCell FieldsPoint Fields

A Traditional Data Set Model (again)

Data Set

Rectilinear Structured Unstructured

TreeConnectivity Dimensions

FieldNameComponent

NameAssociationValues

Cells[]Points[]Fields[]

The EAVL Data Set Model

Data Set

CellSet

Explicit Structured

CoordsField

QuadTreeCellList

Subset

Connectivity: (a bunch of cells)

FieldName: “c” “c” “c”Component: 0 1 2

Name: “c”Association: PointsValues[3*npts]

Cells[1]Points[1]Fields[1]

Example: An Unstructured Grid(with interleaved coordinates)

eavlDataSet

eavlExplicitCellSet

eavlCoordinates

eavlField

Connectivity: (a bunch of cells)

FieldName: “x” “y” “z”Component: 0 0 0

Name: “x”Association: PointsValues[npts]

Cells[1]Points[1]Fields[3]

Example: An Unstructured Grid(with separated coordinates)

eavlDataSet

eavlExplicitCellSet

eavlCoordinates

eavlField #0Name: “y”Association: PointsValues[npts]

eavlField #1Name: “z”Association: PointsValues[npts]

eavlField #2

RegularStructure: 30 40 30

FieldName: “x” “y” “z”Component: 0 0 0

Name: “x”Association: PointsValues[npts]

Cells[1]Points[1]Fields[3]

Example: A Curvilinear Grid

eavlDataSet

eavlStructuredCellSet

eavlCoordinates

eavlField #0Name: “y”Association: PointsValues[npts]

eavlField #1Name: “z”Association: PointsValues[npts]

eavlField #2

RegularStructure: 30 40 30

FieldName: “x” “y” “z”Component: 0 0 0

Name: “x”Association: LogicalDim0Values[ni]

Cells[1]Points[1]Fields[3]

Example: A Rectilinear Grid

eavlDataSet

eavlStructuredCellSet

eavlCoordinates

eavlField #0Name: “y”Association: LogicalDim1Values[nj]

eavlField #1Name: “z”Association: LogicalDim2Values[nk]

eavlField #2

RegularStructure: 30 40 30 4 360

FieldName: “x” “y” “z” “μ” “ϴ”Component: 0 0 0 0 0

Name: “x”Association: LogicalDim0Values[ni]

Cells[1]Points[1]Fields[5]

Example: High-Dimensional Grid

eavlDataSet

eavlStructuredCellSet

eavlCoordinates

eavlField #0Name: “y”Association: LogicalDim1Values[nj]

eavlField #1Name: “z”Association: LogicalDim2Values[nk]

eavlField #2Name: “μ”Association: LogicalDim3Values[nμ]

eavlField #3Name: “ϴ”Association: LogicalDim4Values[nϴ]

eavlField #4

RegularStructure: 30 40

FieldName: “lat” “lon”Component: 0 0

Name: “lat”Association: LogicalDim0Values[ni]

Cells[1]Points[2]Fields[3]

Example: Geospatial Data

eavlDataSet

eavlStructuredCellSet

eavlCoordinates

eavlField #0Name: “lon”Association: LogicalDim1Values[nj]

eavlField #1Name: “c”Association: PointsValues[3*npts]

eavlField #2

FieldName: “c” “c” “c”Component: 0 1 2

eavlCoordinates

Example: Molecular Data

Connectivity: the atoms

FieldName: “c” “c” “c”Component: 0 1 2

Name:”atomic number”Association: Cell Set #0Values[ncells #0]

Cells[2]Points[1]Fields[3]

eavlDataSet

eavlExplicitCellSet #0

eavlCoordinates

eavlField #1

Connectivity: the bonds

eavlExplicitCellSet #1

Name: “c”Association: PointsValues[3*npts]

eavlField #0Name: “bond strength”Association: Cell Set #1Values[ncells #1]

eavlField #2

Example: Face-centered Data

Connectivity: volumetric

FieldName: “c” “c” “c”Component: 0 1 2

Cells[2]Points[1]Fields[2]

eavlDataSet

eavlExplicitCellSet

eavlCoordinates

Parent: ( )

eavlAllFacesOfExplicit

Name: “c”Association: PointsValues[3*npts]

eavlField #0Name: “facevariable”Association: Cell Set #2Values[nfaces]

eavlField #1

FILTERING IN EAVL

Data flow networks in EAVL (or not)

• A “Filter” is a stage in a data flow network– Creates a new data set from an old one

• Many operations do not change a mesh structure (assuming data model is sufficiently descriptive)– Arithmetic expressions: only modifies fields– External facelist: points and structure remain– Feature edges: just a new cell set with old points– Smooth, displace, elevate: only modify coordinates

• So: eavlMutator is an alternative to eavlFilter– Modifies a data set in-place

eavlMutator• In-place data set modification• Support for destructive in-place operation– free memory as you go

• Execute multiple mutators simultaneously on the same data set (barring conflicts)– e.g. displace (coords) + threshold (cells) concurrently

• How about data flow network support?– encapsulate an eavlMutator through a

eavlFilterFromMutator facade • Of course, some operations are natively eavlFilters– can facade through eavlMutatorFromFilter (?)

FieldName: “x” “y” “z”Component: 0 0 0

• Explicit cells can be combined with structured coordinates.

Example: Thresholding an RGrid (a)

eavlCoordinates

Name: “x”

Association: LogicalDim0Values[ni]

eavlField#0Name: “y”

Association: LogicalDim1Values[nj]

eavlField#1Name: “z”

Association: LogicalDim2Values[nk]

eavlField#2

RegularStructure: 30 40 30

eavlStructuredCellSet

FieldName: “x” “y” “z”Component: 0 0 0

eavlCoordinates

Name: “x”

Association: LogicalDim0Values[ni]

eavlField#0Name: “y”

Association: LogicalDim1Values[nj]

eavlField#1Name: “z”

Association: LogicalDim2Values[nk]

eavlField#2

Connectivity: (a bunch of cells)

eavlExplicitCellSet

Cells: (…)

Parent: ( )

• A second Cell Set can be added which refers to the first one

Example: Thresholding an RGrid (b)

RegularStructure: 30 40 30

eavlStructuredCellSet eavlSubset

FieldName: “x” “y” “z”Component: 0 0 0

eavlCoordinates

Name: “x”

Association: LogicalDim0Values[ni]

eavlField#0Name: “y”

Association: LogicalDim1Values[nj]

eavlField#1Name: “z”

Association: LogicalDim2Values[nk]

eavlField#2

RegularStructure: 30 40 30

eavlStructuredCellSet

Name: “x”

Association: LogicalDim0Values[ni]

eavlField#0Name: “y”

Association: LogicalDim1Values[nj]

eavlField#1Name: “z”

Association: LogicalDim2Values[nk]

eavlField#2

FieldName: “x” “y” “z”Component: 0 0 0

eavlCoordinates

30x30 or 30x40

eavlStructSubset

30x30 or 30x40

eavlStructSubset

• Add six new subset-cell sets to original mesh

Example: Structured External Facelist

FieldName: “x” “y” “z”Component: 0 0 0

eavlCoordinates

Name: “x”

Association: PointsValues[npts]

eavlField#0Name: “y”

Association: PointsValues[npts]

eavlField#1Name: “z”

Association: PointsValues[npts]

eavlField#2

FieldName: “x” “y” “z”Component: 0 0 0

eavlCoordinates

Name: “x”

Association: PointsValues[npts]

eavlField#0Name: “y”

Association: PointsValues[npts]

eavlField#1Name: “z”

Association: PointsValues[npts]

eavlField#2

RegularStructure: 30 40 30

eavlStructuredCellSet30 40 30

eavlStructCellSet30x30 or 30x40

eavlStructSubset

x6

• No problem-sized data modifications.– Interleaved and separated coordinates can be used simultaneously.

Example: Elevating a Structured Grid

FieldName: “c” “c”Component: 0 1

eavlCoordinates

Name: “c”Association:PointsValue[2*npts]

eavlField#0Name: “val”Association:PointsValues[npts]

eavlField#1

RegularStructure: 30 40

eavlStructuredCellSet

FieldName: “c” “c” “val”Component: 0 1 0

eavlCoordinates

RegularStructure: 30 40

eavlStructuredCellSet

Name: “c”Association:PointsValue[2*npts]

eavlField#0Name: “val”Association:PointsValues[npts]

eavlField#1

• No problem-sized data modifications.– Some axes on logical dims, with others on the points.

Example: Elevating a Regular Grid

FieldName: “x” “y”Component: 0 0

eavlCoordinates

Name: “x”

Association: LogicalDim0Values[ni]

eavlField#0Name: “y”

Association: LogicalDim1Values[nj]

eavlField#1Name: “val”

Association: PointsValues[npts]

eavlField#2

RegularStructure: 30 40

eavlStructuredCellSet

Name: “x”

Association: LogicalDim0Values[ni]

eavlField#0Name: “y”

Association: LogicalDim1Values[nj]

eavlField#1Name: “val”

Association: PointsValues[npts]

eavlField#2

FieldName: “x” “y” “val”Component: 0 0 0

eavlCoordinates

RegularStructure: 30 40

eavlStructuredCellSet

DEALING WITH CONCURRENCY

Concurrency at Multiple Levels

• Distributed Parallelism– Message passing still works well– Avoid global communication• local domain interconnectivity information

– Hybrid (e.g. spatiotemporal) parallelism• Task Parallelism– Fine-grain dependency tracking• e.g. displace (coords) + threshold (cells) concurrently

– eavlMutator helps– single eavlDataSet container class helps

• Thread Parallelism– Fine-grain data parallelism; CUDA, OpenMP

Data Parallelism for Developers

• Functor + iterator paradigm

• Iteration patterns for mesh topologies

• CUDA + OpenMP execution back-ends

A Simple Data-Parallel Operation

void CellToCellDivide(Field &a, Field &b, Field &c){ for_each(i) c[i] = a[i] / b[i];} 

void CalculateDensity(...){ //... CellToCellDivide(mass, volume, density);}

Internal Library API Provides ThisAlgorithm Developer Writes This

Functor + Iterator Approach

void CalculateDensity(...){ //... CellToCellBinaryOp(mass, volume, density, Divide());}

template <class T> void CellToCellBinaryOp<T>(Field &a, Field &b, Field &c T &f){ for_each(i) f(a[i],b[i],c[i]);}

struct Divide{ void operator()(float &a, float &b, float &c) { c = a / b; }};

Internal Library API Provides ThisAlgorithm Developer Writes This

Custom Functor

void CalculateDensity(...){ //... CellToCellBinaryOp(mass, volume, density, MyFunctor());}

template <class T> void CellToCellBinaryOp<T>(Field &a, Field &b, Field &c T &f){ for_each(i) f(a[i],b[i],c[i]);}

struct MyFunctor{ void operator()(float &a, float &b, float &c) { c = a + 2*log(b); }};

Algorithm DeveloperWrites These

Internal Library API Provides This

Functor Efficiency on CPU and GPU

• Data: noise.silo• Surface normal

0 μs

100 μs

200 μs

300 μs

400 μs

500 μs

600 μs

unoptimized optimized unoptimized optimized

Intel Xeon L5420 NVIDIA GeForce 8800GTS

Functor Inline

Binding Values to Functorsstruct ScaleByConst{ float scale; ScaleByConst(float s) : scale(s) { } void operator()(float &a, float &b) { b = a * scale; }};

void CalculateDensity(...){ //... cell_volume = mesh_volume / mesh_numcells; CellToCellUnaryOp(mass, density, ScaleByConst(1.0/cell_volume));}

DATA PARALLELISM BASICS

Map with 1 input, 1 output

Simplest data-parallel operation. Each result item can be calculated from its corresponding input item alone.

x 3 7 0 1 4 0 0 4 5 3 1 0

0 1 2 3 4 5 6 7 8 9 10 11

6 14 0 2 8 0 0 8 10 6 2 0

struct f { float operator()(float x) { return x*2; }};

result

Map with 2 inputs, 1 output

With two input arrays, the functor takes two inputs. You can also have multiple outputs.

x 3 7 0 1 4 0 0 4 5 3 1 0

0 1 2 3 4 5 6 7 8 9 10 11

5 11 2 2 12 3 9 9 10 4 3 1

struct f { float operator()(float a, float b) { return a+b; }};

result

y 2 4 2 1 8 3 9 5 5 1 2 1

Scatter with 1 input (and thus 1 output)

Possibly inefficient, risks of race conditions and uninitialized results. (Can also scatter to larger array if desired.)Often used in a scatter_if–type construct.

x 3 7 0 1 4 0 0 4 5 3 1 0

0 1 2 3 4 5 6 7 8 9 10 11

0 1 3 0

No functor

result

indices

4

2 4 1 5 5 0 4 2 1 2 1 4

Gather with 1 input (and thus 1 output)

Unlike scatter, no risk of uninitialized data or race condition. Plus, parallelization is over a shorter indices array, and caching helps more, so can be more efficient.

x 3 7 0 1 4 0 0 4 5 3 1 0

0 1 2 3 4 5 6 7 8 9 10 11

7 3 0 3 1

No functor

result

indices 1 9 6 9 3

Reduction with 1 input (and thus 1 output)

Example: max-reduction. Sum is also common.Often a fat-tree-based implementation.

x 3 7 0 1 4 0 0 4 5 3 1 0

0 1 2 3 4 5 6 7 8 9 10 11

7result

struct f { float operator()(float a, float b) { return a>b ? a : b; }};

Inclusive Prefix Sum (a.k.a. Scan)with 1 input/output

Value at result[i] is sum of values x[0]..x[i].Surprisingly efficient parallel implementation.Basis for many more complex algorithms.

x 3 7 0 1 4 0 0 4 5 3 1 0

0 1 2 3 4 5 6 7 8 9 10 11

3 10 10 11 15 15 15 19 24 27 28 28

No functor.

result

+ + + + + + + + + + +

Exclusive Prefix Sum (a.k.a. Scan)with 1 input/output

Initialize with zero, value is sum of only up to x[i-1].May be more commonly used than inclusive scan.

x 3 7 0 1 4 0 0 4 5 3 1 0

0 1 2 3 4 5 6 7 8 9 10 11

0 10 10 11 15 15 15 19 24 27 28

No functor.

result 3

+ + + + + + + + + + +0

DATA PARALLELISM ON MESHES

Example: Surface Normal

• For each 2D cell(i.e. each polygon):– Get three adjacent points– Pair-wise vector subtract– Cross product

• Data-parallel:– Repeat for all cells

Example: Surface Normal

• INPUT:– 3-dimensional

coordinates array on the mesh NODES

– example: length = 9

• OUTPUT:– 3-component surface

normals array onthe mesh CELLS

– example: length = 4

Under the Covers: Node-to-Cell on CPUvoid NodeToCellOp3::ExecuteCPU(){#pragma omp parallel for for (int i=0; i<input->NumCells(); i++) { // get cell node indices

int nNodes, nodeIds[8]; float nodeValues[3][8]; conn.GetCellNodes(index, nNodes, nodeIds);

// get coordinates for nodes for (int i=0; i<nNodes; i++) { nodeValues[0][i] = array0[nodeIds[i]]; nodeValues[1][i] = array1[nodeIds[i]]; nodeValues[2][i] = array2[nodeIds[i]]; }

// call functor functor(nodeValues[0], nodeValues[1], nodeValues[2], &out0[i], &out1[i], &out2[i]); } }

Under the Covers: Node-to-Cell on GPU

void NodeToCellOp3::ExecuteGPU(){ float *d_arr0 = (float*)array0->GetCUDAArray(); float *d_arr1 = (float*)array1->GetCUDAArray(); float *d_arr2 = (float*)array2->GetCUDAArray(); float *d_out = (float*)output0->GetCUDAArray();

// calculate CUDA thread grid num blocks & threads // based on input->NumCells()

nodeToCellKernel3<<<nb,nt>>>(d_arr0, d_arr1, d_arr2, d_out0, d_out1, d_out2, conn, functor);}

Under the Covers: The CUDA Kerneltemplate <class F>__global__ void NodeToCellKernel3(float *array0, float *array1, float *array2, float *out0, float *out1, float *out2, ExplicitConnectivity conn, F functor){ const int index = blockIdx.x * blockDim.x + threadIdx.x; // get cell node indices int nNodes, nodeIds[8]; float nodeValues[3][8]; conn.GetCellNodes(index, nNodes, nodeIds); // get coordinates for nodes for (int i=0; i<nNodes; i++) { nodeValues[0][i] = array0[nodeIds[i]]; nodeValues[1][i] = array1[nodeIds[i]]; nodeValues[2][i] = array2[nodeIds[i]]; } // call functor functor(nodeValues[0], nodeValues[1], nodeValues[2], &out0[i], &out1[i], &out2[i]);}

WRITING ALGORITHMS IN EAVL

Face Surface Normal Functor+Iteratorstruct PolyNormalFunctor{ void operator()(int nvals, int shapetype, float x[], float y[], float z[], float *nx, float *ny, float *nz) { // get two adjacent edge vectors float ax = x[1]-x[0], ay = y[1]-y[0], az = z[1]-z[0]; float bx = x[2]-x[1], by = y[2]-y[1], bz = z[2]-z[1]; // calculate their cross product *nx = ay*bz - az*by; *ny = az*bx - ax*bz; *nz = ax*by - ay*bx; }};

void CalculateFaceNormals(...){ eavlExecutor::AddOperation( new eavlTopologyMap_3_3(inputcells, EAVL_NODES_OF_CELLS, xcoord, ycoord, zcoord, xnormal, ynormal, xnormal, PolyNormalFunctor()));}

Making it Easy for Developers

• Single functor code works on either CPU or GPU• Iteration patterns have multiple back-ends• CPU vs GPU execution choice is made at runtime• EAVL array class automatically supports

heterogeneous memory spaces for GPUs• Internal APIs support CUDA if developers want to

write their own CUDA kernels

Algorithms are Sequences of Operations(a) calculate face surface normals

with node-to-cell operation

(b) average to point normals via cell-to-node operation

EXAMPLE: THRESHOLD

47 52 63

32 49

31

Starting Mesh

We want to threshold a mesh based on its density values (shown here).

43 47 52 63

32 38 42 49

31 37 41 38

density 43 47 52 63 32 38 42 49 31 37 41 38

0 1 2 3 4 5 6 7 8 9 10 11

If we threshold 35 < density < 45, we want this result:

43

38 42

37 41 38

Which Cells to Include?

Evaluate a Map operation with this functor:struct InRange { float lo, hi; InRange(float l, float h) : lo(l), hi(h) { } int operator()(float x) { return x>lo && x<hi; }}

1 0 0 0

0 1 1 0

0 1 1 1

density 43 47 52 63 32 38 42 49 31 37 41 38

0 1 2 3 4 5 6 7 8 9 10 11

1 0 0 0 0 1 1 0 0 1 1 1inrange

InRange()

How Many Cells in Output?

Evaluate a Reduce operation using the Add<> functor.We can use this to create output cell length arrays.

1 0 0 0

0 1 1 0

0 1 1 1

0 1 2 3 4 5 6 7 8 9 10 11

1 0 0 0 0 1 1 0 0 1 1 1inrange

6result

plus

Where Do the Output Cells Go?

Inputindices

Outputindices

0 1 2 3

4 5 6 7

8 9 10 11

0 1 2 3 4 5 6 7 8 9 10 11

0 1 2 3 4 5output cell

0

1 2

3 4 5

input cell

How do we create this mapping?

Create Input-to-Output Indexing?

Exclusive Scan (exclusive prefix sum) gives us the output index positions.

0

1 2

3 4 5

0 1 2 3 4 5 6 7 8 9 10 11

1 0 0 0 0 1 1 0 0 1 1 1inrange

0 1 1 1 1 1 2 3 3 3 4 5startidx

+ + + + + + + + + + +0

startidx

Create Output-to-Input Indexing?

We want to work in the shorter output-length arrays and use gathers. A specialized scatter in EAVL creates this reverse index.

0 1 2 3 4 5 6 7 8 9 10 11

0 5 6 9 10 11revindex

0 1 1 1 1 1 2 3 3 3 4 5

0

5 6

9 10 11

density

Gather Input Mesh Arrays to Output?

We can now use simple gathers to pull input arrays (density, pressure) into the output mesh.

0 1 2 3 4 5 6 7 8 9 10 11

43 47 52 63 32 38 42 49 31 37 41 38

0 5 6 9 10 11revindex

43 38 42 37 41 38

43

38 42

37 41 38

output_density

PLANS AND STATUS

Plans and Status• Now Open Source (BSD “2-clause”)– Project website at http://ft.ornl.gov/eavl – Source code, docs at https://github.com/jsmeredith/EAVL

• Data Model– Most infrastructure in place– Still a few areas not fully fleshed out– The client-facing API needs work

• Execution Model– Currently imperative style– Exploring pipeline paths, contract/metadata needs

• Productization Status– More iterators– More optimization– Lots of internal cleanup