An Architecture for Large Scale Data
Dave Nadeau, SDSC Scientific Visualization Group


NPACI: National Partnership for Advanced Computational Infrastructure


Motivation

[Images: CT, Cryosection, Classification]

• Support analysis, filtering, and compositing
  – Larger-than-core (and swap) data sets
  – Multi-modal and time-varying data
  – Multiple data sets simultaneously
• And...
  – Do efficient data movement
  – Execute well on parallel architectures
  – Integrate easily w/existing applications & toolkits
• Support Alpha project applications


Layered Toolkit Architecture

• Application
• Data Grid Toolkit: manage an N-space data grid
• Data Management: cache pages for lazy I/O
• File Format Handling: support specific file formats
• SRB, ADR, etc.: manage file storage
• Mesh Toolkit: bind a coordinate system to data
• Expression Tree Toolkit: orchestrate filter execution
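One way to read the layering is that each layer talks only to the layer directly beneath it. The Python sketch below is a minimal illustration under that assumption, not the toolkit's code: the class names (`StorageBackend`, `RawFloatFormat`, `PageCache`, `DataGrid`) are hypothetical, an in-memory dict stands in for SRB or ADR, and the Mesh and Expression Tree layers are left out.

```python
import struct
from collections import OrderedDict


class StorageBackend:
    """Storage layer (stand-in for SRB, ADR, ...): named byte blobs in a dict."""

    def __init__(self):
        self._blobs = {}

    def write(self, name, data):
        self._blobs[name] = bytes(data)

    def read(self, name):
        return self._blobs[name]


class RawFloatFormat:
    """File-format layer: a page is a blob of little-endian float32 values."""

    def __init__(self, storage):
        self.storage = storage

    def save_page(self, page_id, values):
        self.storage.write(f"page{page_id}", struct.pack(f"<{len(values)}f", *values))

    def load_page(self, page_id):
        raw = self.storage.read(f"page{page_id}")
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))


class PageCache:
    """Data-management layer: loads pages lazily, keeps at most `capacity` (LRU).

    Read-only for brevity: evicted pages are simply dropped.
    """

    def __init__(self, fmt, capacity):
        self.fmt = fmt
        self.capacity = capacity
        self.pages = OrderedDict()
        self.loads = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # mark as most recently used
        else:
            self.pages[page_id] = self.fmt.load_page(page_id)  # lazy I/O
            self.loads += 1
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)   # evict least recently used
        return self.pages[page_id]


class DataGrid:
    """Data-grid layer: cell indexing on top of fixed-size pages."""

    def __init__(self, cache, page_size):
        self.cache = cache
        self.page_size = page_size

    def get(self, cell):
        page_id, offset = divmod(cell, self.page_size)
        return self.cache.get(page_id)[offset]


# Wire the layers bottom-up: storage -> format -> cache -> grid.
storage = StorageBackend()
fmt = RawFloatFormat(storage)
for p in range(4):
    fmt.save_page(p, [float(4 * p + k) for k in range(4)])
grid = DataGrid(PageCache(fmt, capacity=2), page_size=4)
print(grid.get(9))   # cell 9 is page 2, offset 1 -> 9.0
```

Because the grid never touches storage directly, swapping the dict for a real storage system or the float32 format for another format handler would, in this sketch, only change the layer being replaced.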


Managing Data Grids

• Manage a paged data grid (array-like)
  – An N-dimensional grid of cells
  – Spatial data & time-series
  – Arbitrary cell data content
• Handle larger-than-core data
  – Transparently pages data in/out
  – Support from ADR & DataCutter
  – Compressed data (disk & memory)
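A paged N-dimensional grid with compressed pages can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's API: `PagedGrid`, the brick (tile) shape, the zlib codec, and LRU eviction are all choices made here, and cells hold floats only rather than arbitrary content. Bricks live compressed in a dict standing in for disk, at most `capacity` are decompressed in memory, and `get`/`set` page bricks in and out transparently.

```python
import struct
import zlib
from collections import OrderedDict


class PagedGrid:
    """N-D grid of float cells kept as zlib-compressed bricks ("pages").

    At most `capacity` bricks are held decompressed in memory; the rest
    live compressed in `self.store`, which stands in for disk.
    """

    def __init__(self, shape, brick, capacity):
        self.shape = tuple(shape)
        self.brick = tuple(brick)
        self.capacity = capacity
        self.cells_per_brick = 1
        for b in self.brick:
            self.cells_per_brick *= b
        self.store = {}             # brick key -> compressed bytes
        self.cache = OrderedDict()  # brick key -> list of floats, LRU order
        self.page_ins = 0

    def _locate(self, index):
        if not all(0 <= i < s for i, s in zip(index, self.shape)):
            raise IndexError(index)
        key = tuple(i // b for i, b in zip(index, self.brick))
        offset = 0
        for i, b in zip(index, self.brick):   # row-major offset inside the brick
            offset = offset * b + i % b
        return key, offset

    def _page_in(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        self.page_ins += 1
        blob = self.store.get(key)
        if blob is None:                       # never written: all zeros
            cells = [0.0] * self.cells_per_brick
        else:
            raw = zlib.decompress(blob)
            cells = list(struct.unpack(f"<{self.cells_per_brick}d", raw))
        self.cache[key] = cells
        if len(self.cache) > self.capacity:
            old_key, old_cells = self.cache.popitem(last=False)
            # Compress the evicted brick back to "disk" (always, for brevity).
            self.store[old_key] = zlib.compress(
                struct.pack(f"<{self.cells_per_brick}d", *old_cells))
        return cells

    def get(self, index):
        key, offset = self._locate(index)
        return self._page_in(key)[offset]

    def set(self, index, value):
        key, offset = self._locate(index)
        self._page_in(key)[offset] = float(value)


grid = PagedGrid(shape=(64, 64, 64), brick=(16, 16, 16), capacity=2)
for z in range(64):
    grid.set((z, 0, 0), z)      # touches 4 bricks; only 2 fit in memory
print(grid.get((5, 0, 0)), grid.page_ins)   # 5.0 5
```

The caller never sees the paging: the fifth page-in decompresses the brick that was evicted earlier, and the value written to it comes back intact.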


Pre-fetching Intelligently

• Random access (slow)
  – Get/set cells in any order
• Structured access (faster)
  – Get/set cells in a pre-defined order
• Data-order access (fastest)
  – Get/set cells in the data’s storage order

[Figure: numbered cells visited in different access orders]
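The gap between random and data-order access shows up directly in a page-in count. The sketch below uses a hypothetical helper, `count_page_faults`, and assumes a plain LRU page cache with no prefetching. Structured access is not modeled: its advantage presumably comes from knowing the order ahead of time so that a pager can prefetch, and this simple counter does not do that.

```python
import random
from collections import OrderedDict


def count_page_faults(order, page_size, capacity):
    """Page-ins needed to visit cells in `order` with an LRU cache of `capacity` pages."""
    cache = OrderedDict()
    faults = 0
    for cell in order:
        page = cell // page_size
        if page in cache:
            cache.move_to_end(page)          # hit: mark as most recently used
        else:
            faults += 1                      # miss: page it in
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict least recently used page
    return faults


CELLS, PAGE_SIZE, CAPACITY = 4096, 64, 8     # 64 pages, room for 8 in memory
data_order = list(range(CELLS))              # the data's storage order
random_order = data_order[:]
random.Random(0).shuffle(random_order)

print("data order:  ", count_page_faults(data_order, PAGE_SIZE, CAPACITY))    # 64
print("random order:", count_page_faults(random_order, PAGE_SIZE, CAPACITY))
```

In storage order each page is read exactly once. In random order most accesses miss, because only 8 of the 64 pages can be resident at any moment.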


Paging Intelligently

• Neighborhood-aware paging
  – Page in nearby cells in N dimensions
  – Support convolution filtering, rendering, marching cubes, ...

[Figure: 5 x 5 grid of cells numbered 0 to 24, showing the current center cell and the filter window around it; callout: "Keep neighboring cells paged in as well"]
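Neighborhood-aware paging comes down to knowing which pages a filter window overlaps. The helper names below (`window_pages`, `pages_to_prefetch`) are hypothetical, and the sketch assumes the grid is split into fixed-size bricks. It returns the bricks to keep resident for the current window and the bricks to fetch before the window moves on by one cell.

```python
import itertools


def window_pages(center, radius, brick, shape):
    """Brick keys overlapped by the (2*radius+1)^N window around `center`.

    The window is clipped to the grid bounds given by `shape`.
    """
    ranges = []
    for c, b, s in zip(center, brick, shape):
        lo = max(c - radius, 0)
        hi = min(c + radius, s - 1)
        ranges.append(range(lo // b, hi // b + 1))
    return set(itertools.product(*ranges))


def pages_to_prefetch(current, nxt, radius, brick, shape):
    """Bricks the next window needs that the current window does not."""
    return (window_pages(nxt, radius, brick, shape)
            - window_pages(current, radius, brick, shape))


# A 5 x 5 grid (like cells 0..24 on the slide) stored as 2 x 2 bricks;
# a 3 x 3 filter window (radius 1) centered in the middle of the grid.
shape, brick = (5, 5), (2, 2)
print(sorted(window_pages((2, 2), 1, brick, shape)))
print(sorted(pages_to_prefetch((2, 2), (2, 3), 1, brick, shape)))
```

The same code works for any number of dimensions because `itertools.product` enumerates the brick ranges of every axis.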


Using Coordinate Systems

• Bind a coordinate system to a data grid
  – Euclidean, cylindrical, spherical, time-series, ...
  – Uniform, structured, unstructured
• Handle coordinate system-based operations
  – Resampling with interpolation
  – Lazy evaluation
• Multiple file format handlers
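To illustrate how binding a coordinate system supports lazy resampling, the sketch below binds a uniform Euclidean system (an origin plus per-axis spacing) to a 2-D grid and interpolates bilinearly only when a sample is requested. `UniformCoords`, `bilinear`, and `ResampledView` are hypothetical names. Cylindrical, spherical, and unstructured systems would need a different `to_index` mapping and are not shown.

```python
import math


class UniformCoords:
    """Uniform Euclidean coordinate system bound to a 2-D grid.

    Cell (i, j) sits at world position origin + (i * dx, j * dy).
    """

    def __init__(self, origin, spacing):
        self.origin = origin
        self.spacing = spacing

    def to_index(self, x, y):
        """World (x, y) -> fractional cell index (i, j)."""
        return ((x - self.origin[0]) / self.spacing[0],
                (y - self.origin[1]) / self.spacing[1])


def bilinear(grid, i, j):
    """Bilinear interpolation of grid[i][j] at fractional (i, j).

    The base cell is clamped to the grid, so points outside it are
    linearly extrapolated from the edge cells.
    """
    i0 = min(max(math.floor(i), 0), len(grid) - 2)
    j0 = min(max(math.floor(j), 0), len(grid[0]) - 2)
    fi, fj = i - i0, j - j0
    low = grid[i0][j0] * (1 - fj) + grid[i0][j0 + 1] * fj
    high = grid[i0 + 1][j0] * (1 - fj) + grid[i0 + 1][j0 + 1] * fj
    return low * (1 - fi) + high * fi


class ResampledView:
    """Lazy resampling: a value is interpolated only when it is sampled."""

    def __init__(self, grid, coords):
        self.grid = grid
        self.coords = coords

    def sample(self, x, y):
        return bilinear(self.grid, *self.coords.to_index(x, y))


view = ResampledView([[0.0, 10.0], [20.0, 30.0]],
                     UniformCoords(origin=(0.0, 0.0), spacing=(2.0, 2.0)))
print(view.sample(1.0, 1.0))   # midway between all four cells -> 15.0
```

Creating the view costs nothing. Work happens only in `sample`, which is what lets a resampled data set stay larger than memory in principle.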


Operating on Data

• Define an expression tree for data operations
  – Leaf nodes are data sets, functions, ...
  – Interior nodes are composite, filter, ...
  – Transforms align overlapping data sets
• Execute it to generate samples
  – Client defines the expression
  – Server on big iron executes it

[Figure: client and server]
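A minimal expression tree can be built from node objects that all answer one question: what is the value at this point? The node classes here (`Volume`, `Function`, `Translate`, `Filter`, `Composite`) are hypothetical stand-ins for the leaf, transform, and interior nodes the slide lists. Evaluation happens locally in this sketch. In the architecture described, the client would build such a tree and a server would execute it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


class Node:
    """Base class: every node can be sampled at a point in space."""

    def sample(self, p: Point) -> float:
        raise NotImplementedError


@dataclass
class Volume(Node):
    """Leaf: data on a unit-spaced grid indexed [x][y][z]; 0.0 outside it."""
    cells: List[List[List[float]]]

    def sample(self, p: Point) -> float:
        i, j, k = (round(c) for c in p)          # nearest cell
        nx, ny, nz = len(self.cells), len(self.cells[0]), len(self.cells[0][0])
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            return self.cells[i][j][k]
        return 0.0


@dataclass
class Function(Node):
    """Leaf: an analytic function of position."""
    f: Callable[[Point], float]

    def sample(self, p: Point) -> float:
        return self.f(p)


@dataclass
class Translate(Node):
    """Transform: places `child` at `offset` so overlapping data sets line up."""
    child: Node
    offset: Point

    def sample(self, p: Point) -> float:
        return self.child.sample(tuple(a - o for a, o in zip(p, self.offset)))


@dataclass
class Filter(Node):
    """Interior node: applies a point-wise operation to its child's value."""
    child: Node
    op: Callable[[float], float]

    def sample(self, p: Point) -> float:
        return self.op(self.child.sample(p))


@dataclass
class Composite(Node):
    """Interior node: weighted blend of two children."""
    a: Node
    b: Node
    weight: float = 0.5

    def sample(self, p: Point) -> float:
        return self.weight * self.a.sample(p) + (1 - self.weight) * self.b.sample(p)


vol = Volume([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])   # 2 x 2 x 2
tree = Composite(Filter(Translate(vol, (1.0, 0.0, 0.0)), lambda v: 2 * v),
                 Function(lambda p: 10.0))
print(tree.sample((2.0, 1.0, 1.0)))   # 0.5 * (2 * 8.0) + 0.5 * 10.0 = 13.0
```

Nothing is computed until `sample` is called on the root. That is also what makes re-ordering samples or operators possible before execution.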


Operating on Expressions

• Expressions can be optimized
  – Re-order operators
  – Similar to optimizing compilers & databases
• Sample order can be optimized
  – Re-order data accesses for better cache efficiency
• Data can be staged & intermediate results cached
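One simple kind of expression optimization, similar to algebraic simplification in a compiler, is folding nested operators. The sketch assumes a tiny hypothetical node set (`Leaf`, `Scale`, `Add`). It rewrites `Scale(Scale(x, a), b)` into `Scale(x, a*b)` and drops scales by 1.0, so each sample costs fewer operations. It is not the toolkit's optimizer, which the slide describes only as re-ordering operators.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Scale:
    child: "Expr"
    factor: float


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Scale, Add]


def optimize(e: Expr) -> Expr:
    """Bottom-up rewrite: fold Scale(Scale(x, a), b) into Scale(x, a*b)
    and replace Scale(x, 1.0) with x."""
    if isinstance(e, Scale):
        child, factor = optimize(e.child), e.factor
        # An optimized child is never Scale(Scale(...)), so one fold suffices.
        if isinstance(child, Scale):
            child, factor = child.child, child.factor * factor
        return child if factor == 1.0 else Scale(child, factor)
    if isinstance(e, Add):
        return Add(optimize(e.left), optimize(e.right))
    return e


def count_nodes(e: Expr) -> int:
    if isinstance(e, Scale):
        return 1 + count_nodes(e.child)
    if isinstance(e, Add):
        return 1 + count_nodes(e.left) + count_nodes(e.right)
    return 1


expr = Add(Scale(Scale(Leaf("ct"), 2.0), 0.5),
           Scale(Scale(Leaf("cryo"), 3.0), 4.0))
opt = optimize(expr)
print(opt)
print(count_nodes(expr), "->", count_nodes(opt))   # 7 -> 4
```

Because the rewrite happens once on the tree rather than once per sample, its cost is negligible next to evaluating a large data set.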


Combining Brain Data Sets

[Figure: expression tree with three inputs, a scalar CT scan (512 x 512 x 230), a color cryosection (547 x 710 x 672), and a color segmentation (547 x 710 x 672), and the operator nodes Scalar to RGB, RGB to HSI, Extract Hue, Mask by Hue, and Composite]
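The per-sample operators named in the figure can be sketched as plain functions. The wiring is an assumption because the figure's edges did not survive: here the segmentation's hue decides which cryosection samples are kept, and the result is composited over the CT scan rendered as gray. The three grids differ in size, so in a real tree a transform would first align them. This sketch works on samples that are already aligned. `rgb_to_hue` uses the standard HSI hue formula. The other function names and the gray transfer function are hypothetical.

```python
import math


def rgb_to_hue(r, g, b):
    """Hue in degrees from the HSI color model; r, g, b in [0, 1].

    Returns 0.0 for grays, where hue is undefined.
    """
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0.0:
        return 0.0
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    return theta if b <= g else 360.0 - theta


def scalar_to_rgb(v):
    """Map a scalar in [0, 1] to a gray RGB triple (a simple transfer function)."""
    return (v, v, v)


def mask_by_hue(rgb, segmentation_rgb, hue_lo, hue_hi):
    """Keep `rgb` where the segmentation hue lies in [hue_lo, hue_hi], else black."""
    h = rgb_to_hue(*segmentation_rgb)
    return rgb if hue_lo <= h <= hue_hi else (0.0, 0.0, 0.0)


def composite(over, under, alpha):
    """Blend two RGB samples: alpha * over + (1 - alpha) * under."""
    return tuple(alpha * o + (1 - alpha) * u for o, u in zip(over, under))


# One aligned voxel from each data set (values are made up for illustration).
ct = scalar_to_rgb(0.8)
cryo = (0.6, 0.3, 0.2)
seg = (0.0, 1.0, 0.0)          # a green segmentation label, hue 120 degrees
out = composite(mask_by_hue(cryo, seg, 90.0, 150.0), ct, 0.5)
print(out)
```

Each function depends only on one voxel's values, which fits the sample-driven execution model: a server can evaluate them in whatever sample order suits its caches.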


Combining Brain Data Sets

[Images: CT, Cryosection, Composited]


Combining Stellar Data Sets

• Complex expression trees
  – 60+ nodes in the Orion body
• 90+ separate expression trees
  – Orion, proplyds, shock fronts, ...


And more toolkits...

• Interactive imaging with...
  – Mitsubishi VolumePro cards
  – Point clouds & 3D texture mapping with graphics pipelines
• High-quality imaging with VISTA...

[Figure: toolkit stack extended with other toolkits: VolumePro, PointCloud, 3DTexture, VISTA]


Design Team

Scripps Research: Art Olson, Mike Pique, Michel Sanner
SDSC: Bernard Pailthorpe, Dave Nadeau, Jon Genetti, John Moreland, Mike Bailey, Rich Charles, Alex Decastro
U. Texas: Chandrajit Bajaj, Ariel Shamir


Data-Visualization Pipeline

• Get data from disk efficiently
• Manage data in memory efficiently
• Compute on data efficiently
• Visualize data efficiently

[Figure: pipeline stages labeled Data, Computation, and Visualization; components shown: SRB Server, MCAT (Metadata), ADR, DataCutter, KeLP, FloorPlan, Data Orchestration, ...]


Data-Visualization Pipeline

• Get data from disk efficiently
• Manage data in memory efficiently
• Compute on data efficiently
• Visualize data efficiently

[Figure: pipeline stages labeled Data, Computation, and Visualization; components shown: Data-Vis Toolkits, Interaction Tools, VISTA Renderer, Data Orchestration, ...]