Compiler (and Runtime) Support for CyberInfrastructure
Gagan Agrawal
(joint work with Wei Du, Xiaogang Li, Ruoming Jin, Li Weng)
What is CyberInfrastructure?
How computing is done is changing with advances in the internet and the emergence of the web
Access web pages, data, and web services from the internet
What does this mean in terms of large-scale computing?
Supercomputers are no longer stand-alone resources
Large data repositories are common
What is CyberInfrastructure?
Infrastructures we are familiar with: transportation, telecommunication, power supply/distribution
CyberInfrastructure means large-scale computing infrastructure on the internet
Enable sharing of resources
Enable large-scale web services
Access and process a 1-terabyte file as a web service
Run a job on a large supercomputer using your web browser!
CyberInfrastructure
CyberInfrastructure is also a new division within the CISE directorate of the National Science Foundation, which shows its importance
It needs new research at all levels: networking / parallel computing hardware, system software, applications
Why is Compiler Support Needed for CyberInfrastructure?
Compilers have often simplified application development
Application development for CyberInfrastructure is a hard problem!
We need transparency across different resources
We need transparency across different dataset sources and formats
We need applications to adapt to resource availability
…
Outline
Compiler-supported coarse-grained pipelined parallelism: Why? How?
XML-based front-ends to scientific datasets
Compiler support for application self-adaptation
A SQL front-end to a grid data management system
General Motivation
Language and compiler support for parallelism of many forms has been explored: shared-memory parallelism, instruction-level parallelism, distributed-memory parallelism, multithreaded execution
Application and technology trends are making another form of parallelism desirable and feasible: coarse-grained pipelined parallelism
Coarse-Grained Pipelined Parallelism (CGPP): Definition
Computations associated with an application are carried out in several stages, which are executed on a pipeline of computing units
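As a hedged illustration of this execution model (not the DataCutter-based implementation, which is described later), the stages can be sketched with threads standing in for computing units and bounded queues standing in for the communication links between them:

```python
import threading, queue

def run_pipeline(stages, inputs):
    """Run each stage on its own 'computing unit' (here, a thread);
    bounded FIFO queues stand in for the links between units."""
    qs = [queue.Queue(maxsize=4) for _ in range(len(stages) + 1)]

    def worker(stage, qin, qout):
        while True:
            item = qin.get()
            if item is None:          # end-of-stream marker
                qout.put(None)
                return
            qout.put(stage(item))     # process one chunk, forward the result

    threads = [threading.Thread(target=worker, args=(s, qs[i], qs[i + 1]))
               for i, s in enumerate(stages)]
    for t in threads:
        t.start()
    for item in inputs:               # feed data chunks into the first stage
        qs[0].put(item)
    qs[0].put(None)
    results = []
    while True:                       # drain the last stage's output
        item = qs[-1].get()
        if item is None:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

# Two illustrative stages: filter a chunk, then summarize it
stages = [lambda chunk: [x for x in chunk if x % 2 == 0],
          lambda chunk: sum(chunk)]
print(run_pipeline(stages, [[1, 2, 3, 4], [5, 6, 7, 8]]))  # [6, 14]
```

Because chunks stream through the stages, later chunks are filtered while earlier chunks are still being summarized, which is exactly the overlap that makes the pipeline worthwhile when stages sit on different machines.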
Example — K-nearest Neighbor
Given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a point (a, b, c),
we want to find the K nearest neighbors of the point within R.
Range query → find the K nearest neighbors
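A minimal sketch of the two stages of this example, the range query followed by the K-nearest-neighbor selection; the function names and the brute-force approach are illustrative assumptions, not the system's actual code. In a pipeline, the range query would naturally run at the data repository and the neighbor selection closer to the client:

```python
import math

def k_nearest_in_range(points, query, R, k):
    """Stage 1: filter points to the 3-D range R (range query).
    Stage 2: return the k points closest to the query point."""
    (x1, y1, z1), (x2, y2, z2) = R
    in_range = [p for p in points
                if x1 <= p[0] <= x2 and y1 <= p[1] <= y2 and z1 <= p[2] <= z2]
    in_range.sort(key=lambda p: math.dist(p, query))  # sort by distance
    return in_range[:k]

pts = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (9, 9, 9)]
print(k_nearest_in_range(pts, (1, 1, 0), ((0, 0, 0), (5, 5, 5)), 2))
# [(1, 1, 1), (0, 0, 0)]
```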
Coarse-Grained Pipelined Parallelism is Desirable & Feasible Application scenarios
[Figure: data repositories accessed over the internet, with data flowing through a pipeline of computing sites to the user's machine]
Coarse-Grained Pipelined Parallelism is Desirable & Feasible
A new class of data-intensive applications: scientific data analysis, data mining, data visualization, image analysis
Two direct ways to implement such applications:
Downloading all the data to the user's machine – often not feasible
Computing at the data repository – usually too slow
Our belief: a coarse-grained pipelined execution model is a good match
Coarse-Grained Pipelined Parallelism Needs Compiler Support
Computation needs to be decomposed into stages
Decomposition decisions depend on the execution environment: how many computing sites are available, how many computing cycles are available on each site, what communication links are available, and what the bandwidth of each link is
Code for each stage follows the same processing pattern, so it can be generated by the compiler
Shared- or distributed-memory parallelism needs to be exploited
High-level language and compiler support are necessary
The Entire Picture
[Figure: programs in a Java dialect are processed by the compiler support (decomposition and code generation), which targets the DataCutter runtime system]
Language Dialect
Goal: to give the compiler information about independent collections of objects, parallel loops and reduction operations, and pipelined parallelism
Extensions of Java: Pipelined_loop, Domain & Rectdomain, Foreach loop, reduction variables
ISO-Surface Extraction Example Code

public class isosurface {
  public static void main(String arg[]) {
    float iso_value;
    RectDomain<1> CubeRange = [min:max];
    CUBE[1d] InputData = new CUBE[CubeRange];
    Point<1> p, b;
    RectDomain<1> PacketRange = [1:runtime_def_num_packets];
    RectDomain<1> EachRange = [1:(max-min)/runtime_def_num_packets];
    Pipelined_loop (b in PacketRange) {
      Foreach (p in EachRange) {
        InputData[p].ISO_SurfaceTriangles(iso_value, …);
      }
      … …
    }
  }
}

The sequential equivalent of the inner processing:
for (int i = min; i < max-1; i++) { // operate on InputData[i] }
Pipelined_loop (b in PacketRange) {
  0. foreach (…) { … }
  1. foreach (…) { … }
  … …
  n-1. S;
}
Merge

RectDomain<1> PacketRange = [1:4];
Experimental Results: Versions
Default version: the site hosting the data only reads and transmits data, with no processing at all; the user's desktop only views the results, with no processing at all; all the work is done by the compute nodes (heavy workload on the computing nodes, high communication volume)
Compiler-generated version: intelligent decomposition is done by the compiler; more computations are performed on the end nodes to reduce the communication volume (workload balanced between nodes, reduced communication volume)
Manual version: hand-written DataCutter filters with a decomposition similar to the compiler-generated version
Experimental Results: ISO-Surface Rendering (Z-Buffer Based)
[Charts: execution time vs. width of pipeline (1, 2, 4) for the compiler-decomposed (Decomp) and default versions]
Small dataset (150 MB): speedups of 1.92 and 3.34
Large dataset (600 MB): speedups of 1.99 and 3.82
20% improvement over default version
Outline
Compiler-supported coarse-grained pipelined parallelism: Why? How?
XML-based front-ends to scientific datasets
Compiler support for application self-adaptation
A SQL front-end to a grid data management system
Motivation
The need: analysis of datasets is becoming crucial for scientific advances; emergence of X-Informatics; complex data formats complicate processing; need for applications that are easily portable – compatible with web/grid services
The opportunity: the emergence of XML and related technologies developed by W3C; XML is already extensively used as part of grid/distributed computing
Can XML help in scientific data processing?
The Big Picture
[Figure: datasets stored as text, NetCDF, relational databases, HDF5, and other formats are exposed through a logical XML view and queried with XQuery]
Programming/Query Language
High-level declarative languages ease application development (witness the popularity of Matlab for scientific computations)
New challenges in compiling them for efficient execution
XQuery is a high-level language for processing XML datasets, derived from database, declarative, and functional languages!
XPath (a subset of XQuery) embedded in an imperative language is another option
Approach / Contributions
Use of XML Schemas to provide high-level abstractions on complex datasets
Using XQuery with these Schemas to specify processing
Issues in translation:
high-level to low-level code
data-centric transformations for locality in low-level codes
issues specific to XQuery: recognizing recursive reductions, type inferencing and translation
System Architecture
[Figure: XQuery sources and an external schema are input to the compiler; an XML mapping service maps the logical XML schema to the physical XML schema; the compiler generates C/C++ code]
Satellite Data Processing
Data collected by satellites is a collection of chunks, each of which captures an irregular section of the earth at some time t
The entire dataset comprises multiple pixels for each point on earth at different times, but not for all times
Typical processing is a reduction along the time dimension – hard to write on the raw data format
Using a High-level Schema
High-level view of the dataset – a simple collection of pixels
Latitude, longitude, and time explicitly stored with each pixel
Easy to specify processing; no need to worry about locality or unnecessary scans
At least one order of magnitude overhead in storage – suitable as a logical format only
XQuery Overview
XQuery – a language for querying and processing XML documents
- functional language
- single assignment
- strongly typed
XQuery expressions:
- for/let/where/return (FLWR)
- unordered
- path expressions

unordered(
  for $d in document("depts.xml")//deptno
  let $e := document("emps.xml")//emp[deptno = $d]
  where count($e) >= 10
  return
    <big-dept>
      { $d,
        <count> { count($e) } </count>,
        <avg> { avg($e/salary) } </avg> }
    </big-dept>
)
Satellite – XQuery Code

unordered(
  for $i in ($minx to $maxx)
  for $j in ($miny to $maxy)
  let $p := document("sate.xml")/data/pixel[lat = $i and long = $j]
  return
    <pixel>
      <latitude> {$i} </latitude>
      <longitude> {$j} </longitude>
      <sum> { accumulate($p) } </sum>
    </pixel>
)

define function accumulate($p) as double {
  let $inp := item-at($p, 1)
  let $NVDI := (($inp/band1 - $inp/band0) div ($inp/band1 + $inp/band0) + 1) * 512
  return
    if (empty($p)) then 0
    else max($NVDI, accumulate(subsequence($p, 2)))
}
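The recursive accumulate function is exactly the "recursive reduction" pattern the compiler must recognize; once recognized, it can be translated into the simple loop it denotes. A hedged Python sketch of that translated form, with the pixel field names assumed for illustration:

```python
def accumulate(pixels):
    """Iterative equivalent of the recursive reduction: the maximum over
    time of the scaled NDVI value ((band1 - band0)/(band1 + band0) + 1)*512,
    as computed by the query; the recursion returns 0 on an empty sequence."""
    result = 0
    for p in pixels:
        nvdi = ((p["band1"] - p["band0"]) / (p["band1"] + p["band0"]) + 1) * 512
        result = max(result, nvdi)
    return result

pixels = [{"band0": 10, "band1": 30}, {"band0": 20, "band1": 20}]
print(accumulate(pixels))  # 768.0
```

The loop form matters because it can be evaluated in a single data-centric scan over the chunks on disk, whereas a naive evaluation of the recursion would materialize and rescan subsequences.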
Challenges
Need to translate to the low-level schema, focusing on correctness and avoiding unnecessary reads
Enhancing locality: data-centric execution of XQuery constructs, using information on the low-level data layout
Issues specific to XQuery: reductions expressed as recursive functions; generating code in an imperative language (for either direct compilation or for use as part of a runtime system) requires type conversion
Mapping to Low-level Schema
A number of getData functions to access element(s) of the required types
getData functions are written in XQuery, which allows analysis and transformations
We want to insert getData functions automatically, preserving correctness and avoiding unnecessary scans
getData(lat x, long y)
getData(lat x)
getData(long y)
getData(lat x, long y, time t)
….
Summary – XML-Based Front-ends
• A case for the use of XML technologies in scientific data analysis
• XQuery – a data-parallel language?
• Identified and addressed compilation challenges
• A compilation system has been built
• Very large performance gains from data-centric transformations
• Preliminary evidence that high-level abstractions and query language do not degrade performance substantially
Outline
Compiler-supported coarse-grained pipelined parallelism: Why? How?
XML-based front-ends to scientific datasets
Compiler support for application self-adaptation
A SQL front-end to a grid data management system
Applications in a Grid Environment
Characteristics summarized:
long-running applications
adaptation to changing environments is desirable
constraint-based response time
output can be varied in a given range: resolution, accuracy, precision
How to achieve adaptation?
Proposed Language Extensions

public interface Adapt_Spec {
  string constraints;       // "RESP_TIME <= 50ms"
  List<string> opti_vars;   // "m", "clipwin.x"
  List<string> thresholds;  // "m >= N", "sampling_factor >= 1"
  List<int> opti_dir;
}
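As a hedged sketch of how such a specification might drive adaptation at runtime (the 10% step size and the variable names are illustrative assumptions, not the actual implementation): when the response-time constraint is violated, an optimization variable is pushed toward its threshold, and when there is slack, it is raised again to improve output quality.

```python
def adapt(measured_resp_ms, m, resp_limit_ms=50.0, m_min=1):
    """One round of constraint-driven adaptation in the spirit of
    Adapt_Spec: m is an optimization variable (e.g., a sampling factor)
    with threshold m >= m_min; RESP_TIME <= resp_limit_ms is the constraint.
    The 10% step is an illustrative assumption."""
    if measured_resp_ms > resp_limit_ms:
        m = max(m_min, int(m * 0.9))   # constraint violated: degrade output
    elif measured_resp_ms < 0.8 * resp_limit_ms:
        m = int(m * 1.1) + 1           # slack available: raise quality
    return m

m = 100
m = adapt(70.0, m)   # deadline missed -> m shrinks
print(m)             # 90
m = adapt(30.0, m)   # plenty of slack -> m grows back
print(m)             # 100
```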
Implementation Issues & Strategies
Language Aspect Compiler Implementation Performance Modeling & Resource Monitoring Experimental Design
Outline
Compiler-supported coarse-grained pipelined parallelism: Why? How?
XML-based front-ends to scientific datasets
Compiler support for application self-adaptation
A SQL front-end to a grid data management system
Overview of the Project
The cyber-infrastructure/grid environment comprises distributed data sources
Users would like seamless access to the data
SQL is popular for accessing data from a single database
SQL for grid-based accesses:
data is distributed
data is not managed by a relational database system
need to export data layout information to the query planner
Overview (Contd.)
Use Grid-db-lite as the backend: a grid data management middleware
Define and use a data description language
Parse SQL queries and the data description language, and generate a Grid-db-lite application
Design
Dataset description file: dataset schema
Dataset list file: cluster configuration, dataset storage location
Meta-data: logical data space (number of dimensions), attributes for index declaration, partitioning, physical data storage annotation
[IPARS]
RID = INT2
TIME = INT4
X = FLOAT
Y = FLOAT
Z = FLOAT
POIL = FLOAT
PWAT = FLOAT
……

[bh]
DatasetDescription = IPARS
io = file
Dim = 17x65x65
Npart = 8
…
Osumed1 = osumed01.epn.osc.edu, osumed02.epn.osc.edu, …
0 = bh-10-1 osumed1 /scratch1/bh-10-1
1 = bh-10-2 osumed1 /scratch1/bh-10-2
……
Description file
Data list file
{
Group "ROOT" {
  DATASET "bh" {
    DATATYPE { IPARS }
    DATASPACE { RANK 3 }
    DATAINDEX { RID, TIME }
    PARTS { 9503, 9503, 9537, 9554, 9503, 9707, 9520, 9520 }
    DATA { DATASET SPACIAL, DATASET POIL, DATASET PWAT, …… }
  }
}
Group "SUBGROUP" {
  DATASET "SPACIAL" {
    DATATYPE { }
    DATASPACE {
      SKIP 4 LINES
      LOOP PARTS {
        X SPACE Y SPACE Z
        SKIP 1 LINE
      }
    }
    DATA { PART in (0,1,2,3,4,5,6,7) .0.PART.5.init }
  }
  DATASET "POIL" {
    DATATYPE { }
    DATASPACE {
      LOOP TIME {
        SKIP 1 double
        LOOP PARTS { POIL }
      }
    }
    DATA { PART in (0,1,2,3,4,5,6,7) .0.PART.5.0 }
  }
  ……
}
Meta-data
[TITAN]
X = INT4
Y = INT4
Z = INT4
S1 = INT4
S2 = INT4
S3 = INT4
S4 = INT4
S5 = INT4

[TitanData]
DatasetDescription = TITAN
io = file
Dim = NULL
Npart = 1
Osumed1 = osumed01.epn.osc.edu
0 = NULL osumed1 /scratch1/weng/Titan/
Description file
Data list file
{
Group "ROOT" {
  DATASET "TitanData" {
    DATATYPE { TITAN }
    DATASPACE { RANK 3 }
    DATAINDEX { FID, OFFSET, BSIZE }
    DATA { DATASET TITAN, INDEXSET TITANINDEX }
  }
}
Group "SUBGROUP" {
  DATASET "TITAN" {
    DATATYPE {
      struct TITAN_Record_t {
        unsigned int x, y, z;
        unsigned int s1, s2, s3, s4, s5;
      };
    }
    DATASPACE { LOOP { struct TITAN_Record_t } }
    DATA { 0 }
  }
  INDEXSET "TITANINDEX" {
    DATATYPE {
      HOST hostid;
      struct Block3D {
        MBR rect;
        JMP jmp;
        FID fid;
        OFFSET offset;
        BSIZE bsize;
      };
    }
    DATASPACE {
      LOOP {
        HOST SPACE struct Block3D
      }
    }
    DATA { IndexFile }
  }
}
}
Meta-data
Compilation Issues
Interface between Index() and Extractor():
range query – a chunk can be totally in the query range, partially in the query range, or totally outside the query range
how to choose a suitable size for indexed chunks
Interface between Extractor() and GridDB-lite:
explore alternative methods to get tuples/records
a smarter extractor can signal GridDB for some filtering operations
Query transformation and optimization? Host allocation for the stages (DP, DM, Client)
Some other potential issues: the granularity of a "tuple", data partitioning methods, …
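The three range-query cases determine how much work Extractor() must do: a fully contained chunk can be passed through whole, a disjoint chunk can be skipped without reading it, and only a partially overlapping chunk needs per-record tests. A minimal sketch of the classification, assuming axis-aligned chunk bounding boxes:

```python
def classify_chunk(chunk, query):
    """Classify a chunk's bounding box against a range query.
    chunk and query are lists of (lo, hi) bounds, one pair per dimension.
    'inside'  -> every record qualifies, pass the chunk through whole;
    'outside' -> skip the chunk without reading it;
    'partial' -> the extractor must test individual records."""
    inside, disjoint = True, False
    for (clo, chi), (qlo, qhi) in zip(chunk, query):
        if clo < qlo or chi > qhi:
            inside = False        # chunk sticks out of the query box
        if chi < qlo or clo > qhi:
            disjoint = True       # no overlap in this dimension at all
    if disjoint:
        return "outside"
    return "inside" if inside else "partial"

box = [(0, 10), (0, 10), (0, 10)]                         # chunk bounds
print(classify_chunk(box, [(0, 20), (0, 20), (0, 20)]))   # inside
print(classify_chunk(box, [(5, 20), (0, 20), (0, 20)]))   # partial
print(classify_chunk(box, [(11, 20), (0, 20), (0, 20)]))  # outside
```

The chunk-size trade-off follows directly: smaller chunks make the "inside"/"outside" cases more frequent but inflate the index, while larger chunks shrink the index but push more chunks into the expensive "partial" case.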
Other Research Areas
Runtime support systems: ease parallelization of data mining algorithms in a cluster environment (FREERIDE); grid-based processing of distributed data streams
Algorithms for Data Mining / OLAP: parallel and scalable algorithms; algorithms for processing distributed data streams
Group Members
Seven Ph.D. students: Liang Chen, Wei Du, Anjan Goswami, Ruoming Jin, Xiaogang Li, Li Weng, Xuan Zhang
Two Masters students: Leo Glimcher, Swarup Sahoo
Part-time student: Kolagatla Reddy
Getting Involved
Talk to me
Sign up for my 888