Compiler (and Runtime) Support for CyberInfrastructure
Gagan Agrawal
(joint work with Wei Du, Xiaogang Li, Ruoming Jin, Li Weng)
What is CyberInfrastructure?
How computing is done is changing with advances in the internet and the emergence of the web
Access web pages, data, and web services from the internet
What does this mean in terms of large-scale computing?
Supercomputers are no longer stand-alone resources
Large data repositories are common
What is CyberInfrastructure?
Infrastructures we are familiar with: transportation, telecommunication, power supply/distribution
CyberInfrastructure means large-scale computing infrastructure on the internet
Enable sharing of resources
Enable large-scale web services
Access and process a 1-terabyte file as a web service
Run a job on a large supercomputer using your web browser!
CyberInfrastructure
CyberInfrastructure is also a new division within the CISE directorate of the National Science Foundation, which shows its importance
It needs new research at all levels: networking / parallel computing hardware, system software, applications
Why is Compiler Support Needed for CyberInfrastructure?
Compilers have often simplified application development
Application development for CyberInfrastructure is a hard problem!
We need transparency across different resources
We need transparency across different dataset sources and formats
We need applications to adapt to resource availability
…
Outline
Compiler-supported coarse-grained pipelined parallelism: Why? How?
XML-based front-ends to scientific datasets
Compiler support for application self-adaptation
A SQL front-end to a grid data management system
General Motivation
Language and compiler support for parallelism of many forms has been explored: shared-memory parallelism, instruction-level parallelism, distributed-memory parallelism, multithreaded execution
Application and technology trends are making another form of parallelism desirable and feasible: coarse-grained pipelined parallelism
Coarse-Grained Pipelined Parallelism (CGPP): Definition
Computations associated with an application are carried out in several stages, which are executed on a pipeline of computing units
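As a hedged illustration of this execution model (not the DataCutter-based implementation, which is described later), the stages can be sketched with threads standing in for computing units and bounded queues standing in for the communication links between them:

```python
import threading, queue

def run_pipeline(stages, inputs):
    """Run each stage on its own 'computing unit' (here, a thread);
    bounded FIFO queues stand in for the links between units."""
    qs = [queue.Queue(maxsize=4) for _ in range(len(stages) + 1)]

    def worker(stage, qin, qout):
        while True:
            item = qin.get()
            if item is None:          # end-of-stream marker
                qout.put(None)
                return
            qout.put(stage(item))     # process one chunk, forward the result

    threads = [threading.Thread(target=worker, args=(s, qs[i], qs[i + 1]))
               for i, s in enumerate(stages)]
    for t in threads:
        t.start()
    for item in inputs:               # feed data chunks into the first stage
        qs[0].put(item)
    qs[0].put(None)
    results = []
    while True:                       # drain the last stage's output
        item = qs[-1].get()
        if item is None:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

# Two illustrative stages: filter a chunk, then summarize it
stages = [lambda chunk: [x for x in chunk if x % 2 == 0],
          lambda chunk: sum(chunk)]
print(run_pipeline(stages, [[1, 2, 3, 4], [5, 6, 7, 8]]))  # [6, 14]
```

Because chunks stream through the stages, later chunks are filtered while earlier chunks are still being summarized, which is exactly the overlap that makes the pipeline worthwhile when stages sit on different machines.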
Example — K-nearest Neighbor
Given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a point (a, b, c),
we want to find the K nearest neighbors of the point within R.
Range query → find the K nearest neighbors
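A minimal sketch of the two stages of this example, the range query followed by the K-nearest-neighbor selection; the function names and the brute-force approach are illustrative assumptions, not the system's actual code. In a pipeline, the range query would naturally run at the data repository and the neighbor selection closer to the client:

```python
import math

def k_nearest_in_range(points, query, R, k):
    """Stage 1: filter points to the 3-D range R (range query).
    Stage 2: return the k points closest to the query point."""
    (x1, y1, z1), (x2, y2, z2) = R
    in_range = [p for p in points
                if x1 <= p[0] <= x2 and y1 <= p[1] <= y2 and z1 <= p[2] <= z2]
    in_range.sort(key=lambda p: math.dist(p, query))  # sort by distance
    return in_range[:k]

pts = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (9, 9, 9)]
print(k_nearest_in_range(pts, (1, 1, 0), ((0, 0, 0), (5, 5, 5)), 2))
# [(1, 1, 1), (0, 0, 0)]
```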
Coarse-Grained Pipelined Parallelism is Desirable & Feasible Application scenarios
[Figure: data repositories accessed over the internet, with data flowing through a pipeline of computing sites to the user's machine]
Coarse-Grained Pipelined Parallelism is Desirable & Feasible
A new class of data-intensive applications: scientific data analysis, data mining, data visualization, image analysis
Two direct ways to implement such applications:
Downloading all the data to the user's machine – often not feasible
Computing at the data repository – usually too slow
Our belief: a coarse-grained pipelined execution model is a good match
Coarse-Grained Pipelined Parallelism Needs Compiler Support
Computation needs to be decomposed into stages
Decomposition decisions depend on the execution environment: how many computing sites are available, how many computing cycles are available on each site, what communication links are available, and what the bandwidth of each link is
Code for each stage follows the same processing pattern, so it can be generated by the compiler
Shared- or distributed-memory parallelism needs to be exploited
High-level language and compiler support are necessary
The Entire Picture
[Figure: programs in a Java dialect are processed by the compiler support (decomposition and code generation), which targets the DataCutter runtime system]
Language Dialect
Goal: to give the compiler information about independent collections of objects, parallel loops and reduction operations, and pipelined parallelism
Extensions of Java: Pipelined_loop, Domain & Rectdomain, Foreach loop, reduction variables
ISO-Surface Extraction Example Code

public class isosurface {
  public static void main(String arg[]) {
    float iso_value;
    RectDomain<1> CubeRange = [min:max];
    CUBE[1d] InputData = new CUBE[CubeRange];
    Point<1> p, b;
    RectDomain<1> PacketRange = [1:runtime_def_num_packets];
    RectDomain<1> EachRange = [1:(max-min)/runtime_def_num_packets];
    Pipelined_loop (b in PacketRange) {
      Foreach (p in EachRange) {
        InputData[p].ISO_SurfaceTriangles(iso_value, …);
      }
      … …
    }
  }
}

The sequential equivalent of the inner processing:
for (int i = min; i < max-1; i++) { // operate on InputData[i] }
Pipelined_loop (b in PacketRange) {
  0. foreach (…) { … }
  1. foreach (…) { … }
  … …
  n-1. S;
}
Merge

RectDomain<1> PacketRange = [1:4];
Experimental Results: Versions
Default version: the site hosting the data only reads and transmits data, with no processing at all; the user's desktop only views the results, with no processing at all; all the work is done by the compute nodes (heavy workload on the computing nodes, high communication volume)
Compiler-generated version: intelligent decomposition is done by the compiler; more computations are performed on the end nodes to reduce the communication volume (workload balanced between nodes, reduced communication volume)
Manual version: hand-written DataCutter filters with a decomposition similar to the compiler-generated version
Experimental Results: ISO-Surface Rendering (Z-Buffer Based)
[Charts: execution time vs. width of pipeline (1, 2, 4) for the compiler-decomposed (Decomp) and default versions]
Small dataset (150 MB): speedups of 1.92 and 3.34
Large dataset (600 MB): speedups of 1.99 and 3.82
20% improvement over default version
Outline
Compiler-supported coarse-grained pipelined parallelism: Why? How?
XML-based front-ends to scientific datasets
Compiler support for application self-adaptation
A SQL front-end to a grid data management system
Motivation
The need: analysis of datasets is becoming crucial for scientific advances; emergence of X-Informatics; complex data formats complicate processing; need for applications that are easily portable – compatible with web/grid services
The opportunity: the emergence of XML and related technologies developed by W3C; XML is already extensively used as part of grid/distributed computing
Can XML help in scientific data processing?
The Big Picture
[Figure: datasets stored as text, NetCDF, relational databases, HDF5, and other formats are exposed through a logical XML view and queried with XQuery]
Programming/Query Language
High-level declarative languages ease application development (witness the popularity of Matlab for scientific computations)
New challenges in compiling them for efficient execution
XQuery is a high-level language for processing XML datasets, derived from database, declarative, and functional languages!
XPath (a subset of XQuery) embedded in an imperative language is another option
Approach / Contributions
Use of XML Schemas to provide high-level abstractions on complex datasets
Using XQuery with these Schemas to specify processing
Issues in translation:
high-level to low-level code
data-centric transformations for locality in low-level codes
issues specific to XQuery: recognizing recursive reductions, type inferencing and translation
System Architecture
[Figure: XQuery sources and an external schema are input to the compiler; an XML mapping service maps the logical XML schema to the physical XML schema; the compiler generates C/C++ code]
Satellite Data Processing
Data collected by satellites is a collection of chunks, each of which captures an irregular section of the earth at some time t
The entire dataset comprises multiple pixels for each point on earth at different times, but not for all times
Typical processing is a reduction along the time dimension – hard to write on the raw data format
Using a High-level Schema
High-level view of the dataset – a simple collection of pixels
Latitude, longitude, and time explicitly stored with each pixel
Easy to specify processing; no need to worry about locality or unnecessary scans
At least one order of magnitude overhead in storage – suitable as a logical format only
XQuery Overview
XQuery – a language for querying and processing XML documents
- functional language
- single assignment
- strongly typed
XQuery expressions:
- for/let/where/return (FLWR)
- unordered
- path expressions

unordered(
  for $d in document("depts.xml")//deptno
  let $e := document("emps.xml")//emp[deptno = $d]
  where count($e) >= 10
  return
    <big-dept>
      { $d,
        <count> { count($e) } </count>,
        <avg> { avg($e/salary) } </avg> }
    </big-dept>
)
Satellite – XQuery Code

unordered(
  for $i in ($minx to $maxx)
  for $j in ($miny to $maxy)
  let $p := document("sate.xml")/data/pixel[lat = $i and long = $j]
  return
    <pixel>
      <latitude> {$i} </latitude>
      <longitude> {$j} </longitude>
      <sum> { accumulate($p) } </sum>
    </pixel>
)

define function accumulate($p) as double {
  let $inp := item-at($p, 1)
  let $NVDI := (($inp/band1 - $inp/band0) div ($inp/band1 + $inp/band0) + 1) * 512
  return
    if (empty($p)) then 0
    else max($NVDI, accumulate(subsequence($p, 2)))
}
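The recursive accumulate function is exactly the "recursive reduction" pattern the compiler must recognize; once recognized, it can be translated into the simple loop it denotes. A hedged Python sketch of that translated form, with the pixel field names assumed for illustration:

```python
def accumulate(pixels):
    """Iterative equivalent of the recursive reduction: the maximum over
    time of the scaled NDVI value ((band1 - band0)/(band1 + band0) + 1)*512,
    as computed by the query; the recursion returns 0 on an empty sequence."""
    result = 0
    for p in pixels:
        nvdi = ((p["band1"] - p["band0"]) / (p["band1"] + p["band0"]) + 1) * 512
        result = max(result, nvdi)
    return result

pixels = [{"band0": 10, "band1": 30}, {"band0": 20, "band1": 20}]
print(accumulate(pixels))  # 768.0
```

The loop form matters because it can be evaluated in a single data-centric scan over the chunks on disk, whereas a naive evaluation of the recursion would materialize and rescan subsequences.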
Challenges
Need to translate to the low-level schema, focusing on correctness and avoiding unnecessary reads
Enhancing locality: data-centric execution of XQuery constructs, using information on the low-level data layout
Issues specific to XQuery: reductions expressed as recursive functions; generating code in an imperative language (for either direct compilation or for use as part of a runtime system) requires type conversion
Mapping to Low-level Schema
A number of getData functions to access element(s) of the required types
getData functions are written in XQuery, which allows analysis and transformations
We want to insert getData functions automatically, preserving correctness and avoiding unnecessary scans
getData(lat x, long y)
getData(lat x)
getData(long y)
getData(lat x, long y, time t)
….
Summary – XML-Based Front-ends
• A case for the use of XML technologies in scientific data analysis
• XQuery – a data-parallel language?
• Identified and addressed compilation challenges
• A compilation system has been built
• Very large performance gains from data-centric transformations
• Preliminary evidence that high-level abstractions and query language do not degrade performance substantially
Outline
Compiler-supported coarse-grained pipelined parallelism: Why? How?
XML-based front-ends to scientific datasets
Compiler support for application self-adaptation
A SQL front-end to a grid data management system
Applications in a Grid Environment
Characteristics summarized:
long-running applications
adaptation to changing environments is desirable
constraint-based response time
output can be varied in a given range: resolution, accuracy, precision
How to achieve adaptation?
Proposed Language Extensions

public interface Adapt_Spec {
  string constraints;       // "RESP_TIME <= 50ms"
  List<string> opti_vars;   // "m", "clipwin.x"
  List<string> thresholds;  // "m >= N", "sampling_factor >= 1"
  List<int> opti_dir;
}
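As a hedged sketch of how such a specification might drive adaptation at runtime (the 10% step size and the variable names are illustrative assumptions, not the actual implementation): when the response-time constraint is violated, an optimization variable is pushed toward its threshold, and when there is slack, it is raised again to improve output quality.

```python
def adapt(measured_resp_ms, m, resp_limit_ms=50.0, m_min=1):
    """One round of constraint-driven adaptation in the spirit of
    Adapt_Spec: m is an optimization variable (e.g., a sampling factor)
    with threshold m >= m_min; RESP_TIME <= resp_limit_ms is the constraint.
    The 10% step is an illustrative assumption."""
    if measured_resp_ms > resp_limit_ms:
        m = max(m_min, int(m * 0.9))   # constraint violated: degrade output
    elif measured_resp_ms < 0.8 * resp_limit_ms:
        m = int(m * 1.1) + 1           # slack available: raise quality
    return m

m = 100
m = adapt(70.0, m)   # deadline missed -> m shrinks
print(m)             # 90
m = adapt(30.0, m)   # plenty of slack -> m grows back
print(m)             # 100
```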
Implementation Issues & Strategies
Language Aspect Compiler Implementation Performance Modeling & Resource Monitoring Experimental Design
Outline
Compiler-supported coarse-grained pipelined parallelism: Why? How?
XML-based front-ends to scientific datasets
Compiler support for application self-adaptation
A SQL front-end to a grid data management system
Overview of the Project
The cyber-infrastructure/grid environment comprises distributed data sources
Users would like seamless access to the data
SQL is popular for accessing data from a single database
SQL for grid-based accesses:
data is distributed
data is not managed by a relational database system
need to export data layout information to the query planner
Overview (Contd.)
Use Grid-db-lite as the backend: a grid data management middleware
Define and use a data description language
Parse SQL queries and the data description language, and generate a Grid-db-lite application
Design
Dataset description file: dataset schema
Dataset list file: cluster configuration, dataset storage location
Meta-data: logical data space (number of dimensions), attributes for index declaration, partitioning, physical data storage annotation
[IPARS]
RID = INT2
TIME = INT4
X = FLOAT
Y = FLOAT
Z = FLOAT
POIL = FLOAT
PWAT = FLOAT
……

[bh]
DatasetDescription = IPARS
io = file
Dim = 17x65x65
Npart = 8
…
Osumed1 = osumed01.epn.osc.edu, osumed02.epn.osc.edu, …
0 = bh-10-1 osumed1 /scratch1/bh-10-1
1 = bh-10-2 osumed1 /scratch1/bh-10-2
……
Description file
Data list file
{
Group "ROOT" {
  DATASET "bh" {
    DATATYPE { IPARS }
    DATASPACE { RANK 3 }
    DATAINDEX { RID, TIME }
    PARTS { 9503, 9503, 9537, 9554, 9503, 9707, 9520, 9520 }
    DATA { DATASET SPACIAL, DATASET POIL, DATASET PWAT, …… }
  }
}
Group "SUBGROUP" {
  DATASET "SPACIAL" {
    DATATYPE { }
    DATASPACE {
      SKIP 4 LINES
      LOOP PARTS {
        X SPACE Y SPACE Z
        SKIP 1 LINE
      }
    }
    DATA { PART in (0,1,2,3,4,5,6,7) .0.PART.5.init }
  }
  DATASET "POIL" {
    DATATYPE { }
    DATASPACE {
      LOOP TIME {
        SKIP 1 double
        LOOP PARTS { POIL }
      }
    }
    DATA { PART in (0,1,2,3,4,5,6,7) .0.PART.5.0 }
  }
  ……
}
Meta-data
[TITAN]
X = INT4
Y = INT4
Z = INT4
S1 = INT4
S2 = INT4
S3 = INT4
S4 = INT4
S5 = INT4

[TitanData]
DatasetDescription = TITAN
io = file
Dim = NULL
Npart = 1
Osumed1 = osumed01.epn.osc.edu
0 = NULL osumed1 /scratch1/weng/Titan/
Description file
Data list file
{
Group "ROOT" {
  DATASET "TitanData" {
    DATATYPE { TITAN }
    DATASPACE { RANK 3 }
    DATAINDEX { FID, OFFSET, BSIZE }
    DATA { DATASET TITAN, INDEXSET TITANINDEX }
  }
}
Group "SUBGROUP" {
  DATASET "TITAN" {
    DATATYPE {
      struct TITAN_Record_t {
        unsigned int x, y, z;
        unsigned int s1, s2, s3, s4, s5;
      };
    }
    DATASPACE { LOOP { struct TITAN_Record_t } }
    DATA { 0 }
  }
  INDEXSET "TITANINDEX" {
    DATATYPE {
      HOST hostid;
      struct Block3D {
        MBR rect;
        JMP jmp;
        FID fid;
        OFFSET offset;
        BSIZE bsize;
      };
    }
    DATASPACE {
      LOOP {
        HOST SPACE struct Block3D
      }
    }
    DATA { IndexFile }
  }
}
}
Meta-data
Compilation Issues
Interface between Index() and Extractor():
range query – a chunk can be totally in the query range, partially in the query range, or totally outside the query range
how to choose a suitable size for indexed chunks
Interface between Extractor() and GridDB-lite:
explore alternative methods to get tuples/records
a smarter extractor can signal GridDB for some filtering operations
Query transformation and optimization? Host allocation for the stages (DP, DM, Client)
Some other potential issues: the granularity of a "tuple", data partitioning methods, …
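The three range-query cases determine how much work Extractor() must do: a fully contained chunk can be passed through whole, a disjoint chunk can be skipped without reading it, and only a partially overlapping chunk needs per-record tests. A minimal sketch of the classification, assuming axis-aligned chunk bounding boxes:

```python
def classify_chunk(chunk, query):
    """Classify a chunk's bounding box against a range query.
    chunk and query are lists of (lo, hi) bounds, one pair per dimension.
    'inside'  -> every record qualifies, pass the chunk through whole;
    'outside' -> skip the chunk without reading it;
    'partial' -> the extractor must test individual records."""
    inside, disjoint = True, False
    for (clo, chi), (qlo, qhi) in zip(chunk, query):
        if clo < qlo or chi > qhi:
            inside = False        # chunk sticks out of the query box
        if chi < qlo or clo > qhi:
            disjoint = True       # no overlap in this dimension at all
    if disjoint:
        return "outside"
    return "inside" if inside else "partial"

box = [(0, 10), (0, 10), (0, 10)]                         # chunk bounds
print(classify_chunk(box, [(0, 20), (0, 20), (0, 20)]))   # inside
print(classify_chunk(box, [(5, 20), (0, 20), (0, 20)]))   # partial
print(classify_chunk(box, [(11, 20), (0, 20), (0, 20)]))  # outside
```

The chunk-size trade-off follows directly: smaller chunks make the "inside"/"outside" cases more frequent but inflate the index, while larger chunks shrink the index but push more chunks into the expensive "partial" case.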
Other Research Areas
Runtime support systems: ease parallelization of data mining algorithms in a cluster environment (FREERIDE); grid-based processing of distributed data streams
Algorithms for Data Mining / OLAP: parallel and scalable algorithms; algorithms for processing distributed data streams
Group Members
Seven Ph.D. students: Liang Chen, Wei Du, Anjan Goswami, Ruoming Jin, Xiaogang Li, Li Weng, Xuan Zhang
Two Masters students: Leo Glimcher, Swarup Sahoo
Part-time student: Kolagatla Reddy
Getting Involved
Talk to me
Sign up for my 888