OLAP Query Processing in Grids

OLAP Query Processing in Grids Nelson Kotowski Federal University of Rio de Janeiro, Brazil Alexandre A. B. Lima University of Grande Rio, Brazil Esther Pacitti, Patrick Valduriez INRIA and University of Nantes, France Marta Mattoso Federal University of Rio de Janeiro, Brazil DMG 2007


Page 1: OLAP Query Processing in Grids

OLAP Query Processing in Grids

Nelson Kotowski, Federal University of Rio de Janeiro, Brazil

Alexandre A. B. Lima, University of Grande Rio, Brazil

Esther Pacitti, Patrick Valduriez, INRIA and University of Nantes, France

Marta Mattoso, Federal University of Rio de Janeiro, Brazil

DMG 2007

Page 2: OLAP Query Processing in Grids


Agenda

• OLAP in Grids

• Database clusters

• GParGRES

• Preliminary experimental results

• Conclusion

Page 3: OLAP Query Processing in Grids


OLAP using Grids

• Problem: How to fulfill OLAP needs within current grid software infrastructure?
- Grid Services?
- Adapting database cluster techniques to grids?

[Figure: Grid. Figure thanks to Peter Kacsuk and Gergely Sipos]

Page 4: OLAP Query Processing in Grids


Using Database Clusters in Grids

• A sequential “black-box” DBMS runs at each node

• It is based on database replication

• The middleware coordinates parallel query execution

• Applications and databases are easily migrated from sequential environments

• Both inter- and intra-query parallelism can be exploited

[Figure: Clients, Middleware, and one DBMS per node of the PC Cluster]

Page 5: OLAP Query Processing in Grids


Inter-query Parallelism

[Figure: four nodes (Node 1 to Node 4), each running a DBMS, processing queries Q1 to Q4]

• Improves overall system throughput

• Good for OLTP applications

• Not adequate for OLAP
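A minimal sketch of inter-query parallelism, assuming a simple round-robin policy (the slides do not say which dispatch policy the middleware uses; `assign_queries` and the node names are hypothetical). Each query is sent whole to a single node, so throughput improves but no individual query gets faster, which is why this alone is not adequate for OLAP.

```python
from itertools import cycle


def assign_queries(queries, nodes):
    """Assign each whole query to exactly one node, round-robin.

    Hypothetical helper for illustration; not part of ParGRES.
    """
    assignment = {node: [] for node in nodes}
    for query, node in zip(queries, cycle(nodes)):
        assignment[node].append(query)
    return assignment


# Four independent queries spread over four nodes: one query per node.
plan = assign_queries(
    ["Q1", "Q2", "Q3", "Q4"],
    ["Node 1", "Node 2", "Node 3", "Node 4"],
)
```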

Page 6: OLAP Query Processing in Grids


Intra-query Parallelism

[Figure: four nodes (Node 1 to Node 4), each running a DBMS; query Q1 is split by Virtual Partitioning into subqueries Q11 to Q14; queries Q2, Q3 and Q4 are also shown]

• Reduces individual query execution time

• Required for high-performance OLAP
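Virtual partitioning can be sketched as splitting the value range of a partitioning key into one contiguous sub-interval per node, so that each node scans only its share of a fully replicated table. The helper below is an illustrative assumption (equal-width intervals, hypothetical name `virtual_partitions`, hypothetical key range), not the ParGRES implementation.

```python
def virtual_partitions(low, high, n):
    """Split the half-open key interval [low, high) into n contiguous
    half-open sub-intervals whose sizes differ by at most one key."""
    if n <= 0 or high <= low:
        raise ValueError("need n > 0 and a non-empty interval")
    size, extra = divmod(high - low, n)
    bounds = []
    start = low
    for i in range(n):
        # The first `extra` intervals absorb the remainder, one key each.
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


# Hypothetical key range 1..6,000,000 split across four nodes.
ranges = virtual_partitions(1, 6_000_001, 4)
```

Each pair `(start, end)` would fill the two range parameters of one subquery (Q11 to Q14 in the figure).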

Page 7: OLAP Query Processing in Grids


ParGRES

• Database cluster middleware developed by our research group

• Optimized for OLAP support

• Provides inter- and intra-query parallelism

• Offers high performance for heavy-weight query processing over large databases

- using inexpensive components
- in a non-intrusive way
  - Making no changes to database applications
  - Keeping the same DBMS
  - Keeping the same logical database schema

• Shows super-linear speedup
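For reference, speedup on n nodes is the single-node execution time divided by the n-node execution time, and it is super-linear when that ratio exceeds n. The sketch below only illustrates the arithmetic; the timings are hypothetical and are not measurements from ParGRES.

```python
def speedup(t_one_node, t_n_nodes):
    """Speedup = time on one node / time on n nodes."""
    return t_one_node / t_n_nodes


def is_super_linear(t_one_node, t_n_nodes, n):
    """True when n nodes run the query more than n times faster."""
    return speedup(t_one_node, t_n_nodes) > n


# Hypothetical timings: 120 s on one node, 25 s on four nodes.
example = speedup(120.0, 25.0)  # 4.8, which is greater than 4
```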

Page 8: OLAP Query Processing in Grids

GParGRES

Page 9: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

• Middleware that provides
- Transparent access to distributed databases in a grid
- Intra-query parallelism during heavy-weight query processing

• Based on ParGRES
- Assumes that grid nodes are PC clusters running ParGRES instances

• Intra-query parallelism is achieved through virtual partitioning

• Two levels of query splitting
- Grid-level splitting: implemented by GParGRES
- Node-level splitting: implemented by ParGRES
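The two levels of splitting can be sketched as a nested range split: the key interval is first divided among the clusters (grid level), and each cluster's interval is divided again among its nodes (node level). This is an illustration under stated assumptions: the grid-level split is equal-width regardless of cluster size, and the function names, cluster names and node counts are all hypothetical.

```python
def split_range(low, high, parts):
    """Split [low, high) into `parts` contiguous half-open intervals."""
    step, extra = divmod(high - low, parts)
    out, start = [], low
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        out.append((start, end))
        start = end
    return out


def two_level_split(low, high, nodes_per_cluster):
    """Grid-level split (one interval per cluster), followed by a
    node-level split of each cluster's interval among its nodes."""
    clusters = list(nodes_per_cluster)
    grid_level = split_range(low, high, len(clusters))
    return {
        cluster: split_range(lo, hi, nodes_per_cluster[cluster])
        for cluster, (lo, hi) in zip(clusters, grid_level)
    }


# Two hypothetical clusters with 2 and 3 nodes over keys [0, 1200).
plan = two_level_split(0, 1200, {"cluster_a": 2, "cluster_b": 3})
```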

Page 10: OLAP Query Processing in Grids


GParGRES: Architecture

Page 11: OLAP Query Processing in Grids

GParGRES: Architecture

Concentrates metadata concerning GParGRES services, such as the state of each FS and DQS instance, and ParGRES execution in the nodes

Page 12: OLAP Query Processing in Grids


GParGRES: Architecture

GParGRES entry point, responsible for creating new instances of DQS

Page 13: OLAP Query Processing in Grids

GParGRES: Architecture

Manages global query execution. Receives the query and splits it into subqueries by using virtual partitioning to implement intra-query parallelism. It also performs final result composition

Page 14: OLAP Query Processing in Grids


GParGRES: Architecture

Grid Local Query Service (GLQS) – local component responsible for receiving subqueries from DQS and passing them to the local ParGRES instance

Page 15: OLAP Query Processing in Grids


GParGRES: Architecture

Page 16: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

Page 17: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

Page 18: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

Page 19: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

Page 20: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

select o_orderpriority, count(*)
from orders
where o_orderdate >= date '1993-07-01'
group by o_orderpriority;

Page 21: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

create table temp_result_1 ( o_orderpriority varchar(2), order_count integer);

Page 22: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

select o_orderpriority, count(*)
from orders
where o_orderdate >= date '1993-07-01'
and o_orderkey >= ? and o_orderkey < ?
group by o_orderpriority;
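The rewrite from the original query to this parameterized subquery can be sketched as inserting a range predicate on the partitioning key ahead of the `group by` clause. The function below is a deliberately naive, string-based illustration (hypothetical name `add_range_predicate`; it only handles a single-block `select ... where ... group by ...` query) and is not how DQS is known to be implemented.

```python
def add_range_predicate(query, key):
    """Insert `key >= ? and key < ?` before GROUP BY.

    Naive sketch: assumes one SELECT block that already has a WHERE
    clause and a GROUP BY clause, written on a single line.
    """
    body = query.strip().rstrip(";")
    lowered = body.lower()
    pos = lowered.find(" group by ")
    if " where " not in lowered or pos == -1:
        raise ValueError("sketch only handles SELECT ... WHERE ... GROUP BY ...")
    predicate = f" and {key} >= ? and {key} < ?"
    return body[:pos] + predicate + body[pos:] + ";"


original = (
    "select o_orderpriority, count(*) from orders "
    "where o_orderdate >= date '1993-07-01' group by o_orderpriority;"
)
subquery = add_range_predicate(original, "o_orderkey")
```

The two `?` placeholders are then bound to the lower and upper limits of each virtual partition.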

Page 23: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

Page 24: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

Page 25: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

Page 26: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

insert into temp_result_1 values (?,?);

Page 27: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

select o_orderpriority, sum(order_count)
from temp_result_1
group by o_orderpriority;
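The whole sequence (run the range-restricted subquery per partition, insert partial counts into `temp_result_1`, then sum them) can be tried end to end with a small sketch. Assumptions, all for illustration only: an in-memory SQLite database stands in for the PostgreSQL nodes, one connection plays every node sequentially, dates are stored as ISO text (so the `date` keyword is dropped), an `order by` is added for a stable comparison, and the six rows are made up.

```python
import sqlite3

SUBQUERY = (
    "select o_orderpriority, count(*) from orders "
    "where o_orderdate >= '1993-07-01' "
    "and o_orderkey >= ? and o_orderkey < ? "
    "group by o_orderpriority"
)


def run_partitioned(conn, ranges):
    """Run one subquery per key range and compose the partial counts."""
    conn.execute(
        "create table temp_result_1 "
        "(o_orderpriority varchar(2), order_count integer)"
    )
    for low, high in ranges:
        partial = conn.execute(SUBQUERY, (low, high)).fetchall()
        conn.executemany("insert into temp_result_1 values (?, ?)", partial)
    return conn.execute(
        "select o_orderpriority, sum(order_count) from temp_result_1 "
        "group by o_orderpriority order by o_orderpriority"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders "
    "(o_orderkey integer, o_orderdate text, o_orderpriority text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (1, "1993-08-01", "1"),
        (2, "1993-06-15", "1"),  # excluded by the date predicate
        (3, "1994-01-10", "2"),
        (4, "1993-09-30", "1"),
        (5, "1995-03-02", "2"),
        (6, "1993-07-01", "3"),
    ],
)

# Two virtual partitions over o_orderkey: [1, 4) and [4, 7).
composed = run_partitioned(conn, [(1, 4), (4, 7)])

# The unpartitioned query, for comparison.
direct = conn.execute(
    "select o_orderpriority, count(*) from orders "
    "where o_orderdate >= '1993-07-01' "
    "group by o_orderpriority order by o_orderpriority"
).fetchall()
```

Summing the partial counts gives the same answer as the unpartitioned query because `count(*)` decomposes over disjoint key ranges; other aggregates (for example averages) would need a different composition step.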

Page 28: OLAP Query Processing in Grids


GParGRES: a Database Grid Middleware

Page 29: OLAP Query Processing in Grids


GParGRES: Preliminary Experimental Results

• A preliminary GParGRES prototype has been implemented in Java
- Simple versions of DQS and GLQS (using ParGRES components) were implemented

• Experimental Setup
- Two clusters from Grid’5000
  - Parasol cluster: 64 nodes, each with 2 Opteron 2.2 GHz CPUs, 2 GB RAM and 73 GB HD
  - Paraquad cluster: 64 nodes, each with 2 Dual Core Xeon 2.33 GHz CPUs, 4 GB RAM and 160 GB HD
- Kadeploy
  - Generate customized images of operating systems and applications
- PostgreSQL 8.2.4
- ParGRES
- TPC-H database and queries
  - SF = 1

Page 30: OLAP Query Processing in Grids


GParGRES: Preliminary Experimental Results (cont.)

• Two kinds of experiments
- Isolated clusters
- Mixed Configuration

Page 31: OLAP Query Processing in Grids


GParGRES: Preliminary Experimental Results (cont.)

• Isolated cluster - Parasol

Page 32: OLAP Query Processing in Grids


GParGRES: Preliminary Experimental Results (cont.)

• Isolated cluster - Paraquad

Page 33: OLAP Query Processing in Grids


GParGRES: Preliminary Experimental Results (cont.)

• Mixed Configuration

Page 34: OLAP Query Processing in Grids


GParGRES – Implementation Issues

• Goals
- To implement all components as grid services
- WSRF-compliant components: RS, FS and GLQS

• When running in a grid managed by Globus Toolkit 4, RS can be implemented by Web Service Monitoring and Discovery Service (WS MDS)

• Techniques employed in OGSA-DAI will help implement some components (e.g. FS)

Page 35: OLAP Query Processing in Grids


Related Work

• OGSA-DAI
- Open Grid Services Architecture - Data Access and Integration

• OGSA-DQP
- Open Grid Services Architecture - Distributed Query Processing

• New data models for grid warehouses
- Wehrle et al. propose a data model for distributing and querying a data warehouse in computing grids
  - The warehouse is formed by data “chunks”
  - Special structures are needed (e.g. X-Tree)

Page 36: OLAP Query Processing in Grids


Conclusion

• GParGRES is a grid service for OLAP query processing
- It provides transparent inter- and intra-query processing with
  - No need for application migration
  - No need for database schema migration
  - DBMS independence

• GParGRES exploits successful techniques implemented in ParGRES

• Two levels of query splitting
- Grid-level splitting: implemented by GParGRES
- Node-level splitting: implemented by ParGRES

• Components are WSRF-compliant, easing compatibility with existing grid solutions

• Preliminary results obtained in Grid’5000 show good performance

Page 37: OLAP Query Processing in Grids


Future Work

• Integration with OGSA-DAI

• Support for partial database replication

• Support for top-k queries
- Extension of best position algorithms

Page 38: OLAP Query Processing in Grids

A different view of the Grid

DMG 2007

Kandinsky, the Grid, 1923

Albertina Museum, Vienna

Thanks!