An Architecture for Large Scale Data Dave Nadeau SDSC Scientific Visualization Group.
NPACI: National Partnership for Advanced Computational Infrastructure
An Architecture for Large Scale Data
Dave Nadeau
SDSC Scientific Visualization Group
Motivation
[Figure: CT, cryosection, and classification images]
• Support analysis, filtering, and compositing
– Larger-than-core (and swap) data sets
– Multi-modal and time-varying data
– Multiple data sets simultaneously
• And...
– Do efficient data movement
– Execute well on parallel architectures
– Integrate easily with existing applications & toolkits
• Support Alpha project applications
Layered Toolkit Architecture
• Application
• Data Grid Toolkit: Manage an N-space data grid
• Data Management: Cache pages for lazy I/O
• File Format Handling: Support specific file formats
• SRB, ADR, etc.: Manage file storage
• Mesh Toolkit: Bind a coordinate system to data
• Expression Tree Toolkit: Orchestrate filter execution
Managing Data Grids
• Manage a paged data grid (array-like)
– An N-dimensional grid of cells
– Spatial data & time-series
– Arbitrary cell data content
• Handle larger-than-core data
– Transparently pages data in/out
– Support from ADR & DataCutter
– Compressed data (disk & memory)
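The paging idea on this slide can be sketched in a few lines. This is only an illustration, not the toolkit's API: the class `PagedGrid`, the `load_page` callback, the 2-D restriction, and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class PagedGrid:
    """Hypothetical 2-D grid split into fixed-size pages held in an LRU cache."""

    def __init__(self, shape, page_shape, max_pages, load_page):
        self.shape = shape
        self.page_shape = page_shape
        self.max_pages = max_pages
        self.load_page = load_page      # callback standing in for disk I/O
        self.cache = OrderedDict()      # page key -> list of rows
        self.loads = 0                  # number of page-ins, for inspection

    def _page(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)             # mark as most recently used
        else:
            if len(self.cache) >= self.max_pages:
                self.cache.popitem(last=False)      # evict least recently used
            self.cache[key] = self.load_page(key, self.page_shape)
            self.loads += 1
        return self.cache[key]

    def get(self, row, col):
        if not (0 <= row < self.shape[0] and 0 <= col < self.shape[1]):
            raise IndexError("cell outside grid")
        ph, pw = self.page_shape
        page = self._page((row // ph, col // pw))
        return page[row % ph][col % pw]


def synthetic_page(key, page_shape):
    """Stand-in for reading a page from storage: value = global row * 1000 + col."""
    ph, pw = page_shape
    r0, c0 = key[0] * ph, key[1] * pw
    return [[(r0 + r) * 1000 + (c0 + c) for c in range(pw)] for r in range(ph)]


grid = PagedGrid(shape=(8, 8), page_shape=(4, 4), max_pages=2, load_page=synthetic_page)
values = [grid.get(5, 6), grid.get(5, 7), grid.get(0, 0)]
```

The caller only ever asks for cells; which pages are resident is hidden behind `get`, which is the sense in which paging is transparent here.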
Pre-fetching Intelligently
• Random access (slow)
– Get/set cells in any order
• Structured access (faster)
– Get/set cells in a pre-defined order
• Data-order access (fastest)
– Get/set cells in the data’s storage order
[Figure: cells numbered 1 through 9 shown in different access orders]
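One way to see why a pre-defined order helps: if the page sequence is known ahead of time, upcoming pages can be requested before the consumer reaches them. The simulation below is a rough sketch under assumptions not stated on the slide: the function `traverse`, the `lookahead` parameter, an unbounded set of resident pages, and the page order itself are all illustrative.

```python
def traverse(page_sequence, lookahead):
    """Simulate visiting pages in a known order with `lookahead` pages prefetched.

    Returns (stalls, prefetches): a stall is a page the consumer had to wait
    for, a prefetch is a page requested before the consumer reached it.
    Resident pages are never evicted in this simplified model.
    """
    resident = set()
    stalls = prefetches = 0
    for i, page in enumerate(page_sequence):
        if page not in resident:
            stalls += 1                     # demand miss: consumer waits
            resident.add(page)
        for nxt in page_sequence[i + 1 : i + 1 + lookahead]:
            if nxt not in resident:
                prefetches += 1             # fetched ahead of use
                resident.add(nxt)
    return stalls, prefetches


pages = [5, 1, 3, 2, 4, 6, 7, 8, 9]         # arbitrary illustrative page order
no_prefetch = traverse(pages, lookahead=0)  # order unknown: every page stalls
with_prefetch = traverse(pages, lookahead=2)
```

With no lookahead every page is a stall; with a lookahead of two, only the first page is, and the rest arrive as prefetches.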
Paging Intelligently
• Neighborhood-aware paging
– Page in nearby cells in N dimensions
– Support convolution filtering, rendering, marching-cubes, ...
[Figure: 5 x 5 grid of cells with a filter window around the current center cell]
 0  1  2  3  4
 5  6  7  8  9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
Figure labels: Current center cell; Keep neighboring cells paged-in as well; Filter window
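As a sketch of which cells a neighborhood-aware pager would keep resident, the helper below lists the cells inside a filter window around a center cell in an N-dimensional, row-major grid. The function name `neighborhood` and the `radius` parameter are assumptions for illustration; the toolkit's actual policy is not described on the slide.

```python
from itertools import product


def neighborhood(center, shape, radius):
    """Return flat row-major indices of cells within `radius` of `center`."""
    # Unflatten the center index into per-axis coordinates.
    coords, rest = [], center
    for extent in reversed(shape):
        coords.append(rest % extent)
        rest //= extent
    coords.reverse()

    result = []
    for offsets in product(range(-radius, radius + 1), repeat=len(shape)):
        cell = [c + o for c, o in zip(coords, offsets)]
        # Skip offsets that fall outside the grid.
        if all(0 <= x < extent for x, extent in zip(cell, shape)):
            flat = 0
            for x, extent in zip(cell, shape):
                flat = flat * extent + x
            result.append(flat)
    return result


window = neighborhood(center=12, shape=(5, 5), radius=1)   # 3 x 3 filter window
corner = neighborhood(center=0, shape=(5, 5), radius=1)    # window clipped at the edge
```

For the 5 x 5 grid numbered 0 to 24, a 3 x 3 window around cell 12 covers cells 6, 7, 8, 11, 12, 13, 16, 17, and 18.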
Using Coordinate Systems
• Bind a coordinate system to a data grid
– Euclidean, cylindrical, spherical, time-series, ...
– Uniform, structured, unstructured
• Handle coordinate system-based operations
– Resampling with interpolation
– Lazy evaluation
• Multiple file format handlers
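A minimal sketch of binding a uniform Euclidean coordinate system to a 2-D grid, with bilinear resampling evaluated lazily through a generator. The class `UniformMesh2D` and its `origin` and `spacing` parameters are hypothetical names chosen for the example, not the Mesh Toolkit's interface.

```python
import math


class UniformMesh2D:
    """Hypothetical binding of a uniform Euclidean coordinate system to a 2-D grid."""

    def __init__(self, cells, origin, spacing):
        self.cells = cells          # list of rows of cell values
        self.origin = origin        # (x0, y0): world position of cells[0][0]
        self.spacing = spacing      # (dx, dy): world distance between cells

    def sample(self, x, y):
        """Bilinearly interpolate the grid at world coordinate (x, y)."""
        fx = (x - self.origin[0]) / self.spacing[0]
        fy = (y - self.origin[1]) / self.spacing[1]
        rows, cols = len(self.cells), len(self.cells[0])
        # Clamp the lower-left index so the 2 x 2 stencil stays inside the grid.
        c0 = min(max(int(math.floor(fx)), 0), cols - 2)
        r0 = min(max(int(math.floor(fy)), 0), rows - 2)
        tx, ty = fx - c0, fy - r0
        top = (1 - tx) * self.cells[r0][c0] + tx * self.cells[r0][c0 + 1]
        bottom = (1 - tx) * self.cells[r0 + 1][c0] + tx * self.cells[r0 + 1][c0 + 1]
        return (1 - ty) * top + ty * bottom

    def resample(self, xs, ys):
        """Lazily yield interpolated samples; nothing is computed until iterated."""
        for y in ys:
            for x in xs:
                yield self.sample(x, y)


mesh = UniformMesh2D(cells=[[0.0, 10.0], [20.0, 30.0]], origin=(0.0, 0.0), spacing=(2.0, 2.0))
center = mesh.sample(1.0, 1.0)
lazy = mesh.resample(xs=[0.0, 1.0, 2.0], ys=[0.0])   # no samples computed yet
first_row = list(lazy)                               # evaluation happens here
```

Because `resample` is a generator, a consumer that stops early never pays for the samples it does not read.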
Operating on Data
• Define an expression tree for data operations
– Leaf nodes are data sets, functions, ...
– Interior nodes are composite, filter, ...
– Transforms align overlapping data sets
• Execute it to generate samples
– Client defines the expression
– Server on big iron executes it
[Figure: Client and Server]
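The expression-tree idea can be sketched as a handful of node classes that each answer "what is the sample at this coordinate?". Everything here is an assumption for illustration: the class names, the `sample(x, y)` interface, the maximum used as the compositing rule, and running the client and server roles in one process.

```python
class Leaf:
    """Leaf node: a data set or function that returns a value for a coordinate."""

    def __init__(self, fn):
        self.fn = fn

    def sample(self, x, y):
        return self.fn(x, y)


class Filter:
    """Interior node: applies a per-sample function to one child."""

    def __init__(self, child, fn):
        self.child, self.fn = child, fn

    def sample(self, x, y):
        return self.fn(self.child.sample(x, y))


class Translate:
    """Transform node: shifts a child's coordinates to align it with other data."""

    def __init__(self, child, dx, dy):
        self.child, self.dx, self.dy = child, dx, dy

    def sample(self, x, y):
        return self.child.sample(x - self.dx, y - self.dy)


class Composite:
    """Interior node: combines children; this sketch simply takes the maximum."""

    def __init__(self, *children):
        self.children = children

    def sample(self, x, y):
        return max(c.sample(x, y) for c in self.children)


def execute(tree, width, height):
    """'Server' side: walk the output grid and pull one sample per cell."""
    return [[tree.sample(x, y) for x in range(width)] for y in range(height)]


# 'Client' side: define the expression.
ramp = Leaf(lambda x, y: x)
constant = Leaf(lambda x, y: 2)
tree = Composite(Filter(Translate(ramp, dx=1, dy=0), lambda v: v * 2), constant)
samples = execute(tree, width=4, height=2)
```

Building the tree does no work on the data; samples are produced only when `execute` pulls them through the nodes.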
Operating on Expressions
• Expressions can be optimized
– Re-order operators
– Similar to optimizing compilers & databases
• Sample order can be optimized
– Re-order data accesses for better cache efficiency
• Data can be staged & intermediate results cached
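Caching intermediate results can be illustrated with a wrapper node that remembers each sample the first time it is computed, so a subexpression shared by two branches is evaluated once. The `Source`, `Cached`, and `Add` classes are hypothetical and use a 1-D index for brevity.

```python
class Source:
    """Leaf node that counts how often it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.evaluations = 0

    def sample(self, i):
        self.evaluations += 1
        return self.fn(i)


class Cached:
    """Wraps a node and stores each sample the first time it is computed."""

    def __init__(self, child):
        self.child = child
        self.store = {}

    def sample(self, i):
        if i not in self.store:
            self.store[i] = self.child.sample(i)
        return self.store[i]


class Add:
    """Interior node that sums two children."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def sample(self, i):
        return self.left.sample(i) + self.right.sample(i)


expensive = Source(lambda i: i * i)
shared = Cached(expensive)          # the same subexpression feeds both branches
tree = Add(shared, shared)
results = [tree.sample(i) for i in range(4)]
```

Without the `Cached` wrapper the source would be evaluated eight times for these four samples; with it, four.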
Combining Brain Data Sets
[Figure: expression tree combining three brain data sets]
Inputs:
• Scalar CT-scan, 512 x 512 x 230
• Color Cryosection, 547 x 710 x 672
• Color Segmentation, 547 x 710 x 672
Operators: Scalar to RGB, RGB to HSI, Extract Hue, Mask by Hue, Composite
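The operators named on this slide can be sketched per sample as below. The wiring between them is not recoverable from the slide, so the arrangement here is an assumption: the segmentation color's hue decides whether the cryosection sample is kept, and the CT value, mapped to gray, fills in elsewhere. Hue is taken from the standard library's HSV conversion as a stand-in for HSI, and all function names and the sample values are illustrative.

```python
import colorsys


def scalar_to_rgb(value, max_value=255):
    """Map a scalar (e.g. a CT density) to a gray RGB triple in [0, 1]."""
    g = value / max_value
    return (g, g, g)


def extract_hue(rgb):
    """Hue in [0, 1). Uses HSV hue from colorsys as a stand-in for HSI hue."""
    return colorsys.rgb_to_hsv(*rgb)[0]


def mask_by_hue(rgb, label_rgb, hue_lo, hue_hi):
    """Keep `rgb` where the label color's hue lies in [hue_lo, hue_hi], else None."""
    return rgb if hue_lo <= extract_hue(label_rgb) <= hue_hi else None


def composite(front, back):
    """Use the front sample where present, otherwise fall back to the back sample."""
    return front if front is not None else back


# Two made-up samples per data set, already aligned to the same positions.
ct = [0, 255]
cryo = [(0.8, 0.4, 0.3), (0.6, 0.5, 0.4)]
seg = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]    # a green label, then a red label

out = [
    composite(mask_by_hue(c, s, 0.25, 0.42), scalar_to_rgb(v))
    for v, c, s in zip(ct, cryo, seg)
]
```

The first sample keeps its cryosection color because its label is green; the second falls back to the gray CT value because its label is red.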
Combining Brain Data Sets
[Images: CT, Cryosection, Composited]
Combining Stellar Data Sets
• Complex expression trees
– 60+ nodes in the Orion body
• 90+ separate expression trees
– Orion, proplyds, shock fronts, ...
And more toolkits...
• Interactive imaging with...
– Mitsubishi VolumePro cards
– Point clouds & 3D texture mapping with graphics pipelines
• High-quality imaging with VISTA...
Other Toolkits: VolumePro, PointCloud, 3DTexture, VISTA
Design Team
• Scripps Research
– Art Olson
– Mike Pique
– Michel Sanner
• SDSC
– Bernard Pailthorpe
– Dave Nadeau
– Jon Genetti
– John Moreland
– Mike Bailey
– Rich Charles
– Alex Decastro
• U. Texas
– Chandrajit Bajaj
– Ariel Shamir
Data-Visualization Pipeline
[Figure: pipeline diagram]
Stages:
• Get data from disk efficiently
• Manage data in memory efficiently
• Compute on data efficiently
• Visualize data efficiently
Diagram labels: Data, Computation, Visualization, Data Orchestration
Components shown: SRB Server, MCAT (Metadata), ADR DataCutter, KeLP FloorPlan, ...
Data-Visualization Pipeline
[Figure: pipeline diagram]
Stages:
• Get data from disk efficiently
• Manage data in memory efficiently
• Compute on data efficiently
• Visualize data efficiently
Diagram labels: Data, Computation, Visualization, Data Orchestration
Components shown: Data - Vis Toolkits, Interaction Tools, VISTA Renderer, ...