Towards Universally Robust Plans Perspective Seminar · Debuggers Interpreters Integrated...


Page 1:

Developing Software tools in Academia

Deepali Nemade

Anshuman Dutt Database Systems Lab

In the real world

Page 2:

Software Types

Software

System Software

Embedded Software

Programming Software

Application Software

Operating Systems Device drivers Server management

BIOS Washing machine control Routing utility

Compilers Debuggers Interpreters Integrated Development environment

Word processor Image/video editing Video games Simulation software Databases Mathematical software Medical software Computer-aided design Educational software Industrial automation Decision-making software

Free software

Proprietary software

Page 3:

Database software

• A general-purpose DBMS is a software system designed to allow the definition, creation, querying, update, and administration of databases.

– Oracle

– SAP

– Microsoft Access

– IBM DB2

– Microsoft SQL Server

– HP Non-stop SQL/MX

– PostgreSQL

– MySQL

– SQLite

Free software

Proprietary software

Page 4:

DBMS architecture

Page 5:

Query Optimizer

• The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans.

• There is a trade-off between

– the amount of time spent figuring out the best query plan

– the quality of the plan

• Hence, provide

– a "good enough" plan (performance comparable to the best)

– in a reasonable time

• The role of query optimizers has become especially critical in recent times due to the high degree of query complexity

– data warehousing and mining over databases

Page 6:

Query Plan Selection

• Core technique

[Figure: Query (Q) enters the Query Optimizer (dynamic programming), which uses the DB catalogs, Cost Model, and Search Space to produce the Minimum Cost Plan P(Q)]
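The core technique named on this slide can be illustrated with a toy sketch of dynamic programming over subsets of relations. The catalog cardinalities, the join selectivities, and the cost model (sum of intermediate result sizes) are illustrative assumptions, not the cost model of any real engine.

```python
from itertools import combinations

# Hypothetical catalog: base relation cardinalities (assumed values).
CARD = {"supplier": 10_000, "lineitem": 6_000_000, "orders": 1_500_000}
# Hypothetical join selectivities; a missing pair means a cross product.
SEL = {frozenset(("supplier", "lineitem")): 1 / 10_000,
       frozenset(("lineitem", "orders")): 1 / 1_500_000}


def result_size(rels):
    """Estimated cardinality of joining all relations in `rels`."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for pair, sel in SEL.items():
        if pair <= rels:
            size *= sel
    return size


def best_plan(relations):
    """Return (cost, plan) minimizing the sum of intermediate result sizes."""
    # Base case: scanning a single relation costs nothing in this toy model.
    best = {frozenset([r]): (0.0, r) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            out = result_size(subset)
            candidates = []
            # Try every split of the subset into two non-empty parts,
            # reusing the best sub-plans found in earlier iterations.
            for i in range(1, k):
                for left in map(frozenset, combinations(sorted(subset), i)):
                    right = subset - left
                    left_cost, left_plan = best[left]
                    right_cost, right_plan = best[right]
                    candidates.append((left_cost + right_cost + out,
                                       (left_plan, right_plan)))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]


cost, plan = best_plan(["supplier", "lineitem", "orders"])
print(cost, plan)
```

In this toy catalog the cheapest plan avoids joining supplier directly with orders, since no join predicate connects them and the intermediate result would be a cross product.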

Page 7:

Need for careful plan selection

• Cost difference between best plan choice and a random choice can be enormous (orders of magnitude!)

• Only a small percentage of really good plans over the (exponential) search space

Page 8:

Tools Developed

• PICASSO

– A tool developed at DSL – Best Demo in VLDB 2010

– Showed that current query optimizers have become much more complex than expected (VLDB 2005, VLDB 2008)

– This complexity of behavior is not actually required; we showed the possibility of simplification (VLDB 2007)

• CODD

– One more tool developed at DSL

– Testing of query optimizers does not need data, only statistics, so we can create DATALESS DATABASES (DBTest 2012)

Analyze the behavior of the optimizer for a given data instance

Creates futuristic testing scenarios (impractical otherwise)

Page 9:

Picasso: Drawing Out the Artistic Talents of DB Query Optimizers

Mr. Query Optimizer

See, I am a painter too !!

Page 10:

Parametric query

Select *

From authors, publishers, sources ….

Where language = 'ENG' and year between '1997' and '1999'

and lastname like 'Autier' and title like 'melanoma' …

Page 11:

Parametric query optimization (PQO)

• It attempts to identify several execution plans

– each one of which is optimal for a subset of all possible values of the run-time parameters

– termed as Parametric Optimal Set of Plans (POSP)

• At run time, when the parameter values are known

– a simple lookup using parameters to identify appropriate plan from POSP

– avoids full scale query optimization for each query instance

– hence saves optimization time

Expectations: a small number of plans in POSP; plan choices will not change frequently
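The run-time lookup described above can be sketched for a single parameter as follows. The plan names and the selectivity boundaries are hypothetical; a real POSP would be computed by the optimizer ahead of time and may span several dimensions.

```python
from bisect import bisect_right

# Hypothetical POSP for a one-parameter query template: each plan is
# optimal over one selectivity interval. Boundaries and names are assumed.
BOUNDARIES = [0.05, 0.40]          # interval edges on the selectivity axis
POSP = ["index_nested_loop", "hash_join", "sort_merge_join"]


def lookup_plan(selectivity):
    """Pick the precomputed plan whose interval contains `selectivity`."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must lie in [0, 1]")
    # Binary search over the interval edges replaces a full optimization.
    return POSP[bisect_right(BOUNDARIES, selectivity)]


print(lookup_plan(0.01))
print(lookup_plan(0.20))
print(lookup_plan(0.90))
```

The lookup costs a binary search per query instance, which is where the saving in optimization time comes from, provided the POSP is small.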

Page 12:

Query Template [Q7 of TPC-H]

select supp_nation, cust_nation, l_year, sum(volume) as revenue
from (select n1.n_name as supp_nation, n2.n_name as cust_nation,
             extract(year from l_shipdate) as l_year,
             l_extendedprice * (1 - l_discount) as volume
      from supplier, lineitem, orders, customer, nation n1, nation n2
      where s_suppkey = l_suppkey
        and o_orderkey = l_orderkey
        and c_custkey = o_custkey
        and s_nationkey = n1.n_nationkey
        and c_nationkey = n2.n_nationkey
        and ((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
          or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE'))
        and l_shipdate between date '1995-01-01' and date '1996-12-31'
        and o_totalprice <= C1
        and c_acctbal <= C2) as shipping
group by supp_nation, cust_nation, l_year
order by supp_nation, cust_nation, l_year

Determines the value of goods shipped between nations in a time period

C1: value determines selectivity of ORDERS relation

C2: value determines selectivity of CUSTOMER relation

Page 13:

Parametric Space

[Figure: selectivity space, with both axes labeled Selectivity]

Page 14:

Plan Diagram Generation Process

[Figure: axes ORDERS.o_totalprice and CUSTOMER.c_acctbal]
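The generation process can be sketched as a loop over a grid of selectivity values, asking the optimizer for its plan choice at each point. The `toy_optimizer` function below is a hypothetical stand-in for a real EXPLAIN call, and the mapping from selectivities to query constants is left out.

```python
def toy_optimizer(sel_orders, sel_customer):
    """Stand-in for an EXPLAIN call; returns a plan name (assumed rule)."""
    if sel_orders < 0.2 and sel_customer < 0.2:
        return "P1"
    if sel_orders >= sel_customer:
        return "P2"
    return "P3"


def plan_diagram(optimizer, resolution):
    """Fire one query per grid point and record the chosen plan."""
    # Grid points sit at the centre of each cell of the unit square.
    points = [(i + 0.5) / resolution for i in range(resolution)]
    return [[optimizer(x, y) for x in points] for y in points]


def coverage(diagram):
    """Fraction of grid points that each plan occupies."""
    cells = [plan for row in diagram for plan in row]
    return {p: cells.count(p) / len(cells) for p in sorted(set(cells))}


diagram = plan_diagram(toy_optimizer, 10)
for row in reversed(diagram):      # print with the y-axis pointing up
    print(" ".join(row))
print(coverage(diagram))
```

Coloring each distinct plan name then gives the plan diagram; recording the optimizer's estimated cost at the same points would give the cost diagram.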

Page 15:

Sample Plan Diagram [QT7,OptB]

Page 16:

Plan P1 Plan P3 Plan P5

Page 17:

Sample Cost Diagram [QT7,OptB]

MinCost: 6.08E3 MaxCost: 3.24E4

Page 18:

The Picasso Connection

Plan diagrams are often similar to cubist paintings! [Pablo Picasso, founder of the cubist genre]

Woman with a guitar

Georges Braque, 1913

Page 19:

Complex Plan Diagram [QT8,OptA*]

Extremely fine-grained coverage (P76 ~ 0.01%)

Highly irregular plan boundaries

Intricate Complex Patterns

# of plans: 76

Increases to 90 plans with 300x300 grid !

The Picasso Connection

Page 20:

Picasso Architecture

[Figure: the Picasso Client (visualization) talks to the Picasso Server, which sends EXPLAIN QUERY and STATS QUERY requests to the DATABASE ENGINE, repeated several times, and receives engine-specific plan information (Plan Tree) and statistics information (Column Statistics)]

Page 21:

Overview

Picasso is a Java tool that, given a multi-dimensional SQL query template and a choice of database engine, automatically generates the plan diagram and the cost diagram

– Fires queries at user-specified granularity (10, 30, 100, 300, 1000 queries per dimension)

– Visualization: 2-D plan diagrams (slices if n > 2); 3-D cost and cardinality diagrams; also plan trees and plan differences

– >60000 lines of code (2004-12) with ~100 classes

– Uses Java3D, VisAd, JGraph, Swing

Page 22:

Tool Status

• Operational on DB2/Oracle/SQLServer/Sybase/PostgreSQL

• Copyrighted by IISc in May 2006

• Released as free software in Nov 2006 by Associate Director of IISc

• Release of version 1.0 in May 2007, version 2.0 in Feb 2009, (version 3.0 in 2013)

• In use at academic and industrial labs worldwide

– CMU, Purdue, Duke, TU Munich, NU Singapore, IIT-B, …

– IBM, Microsoft, Oracle, Sybase, HP, …

• Received Best Software award in Very Large Data Base (VLDB) conference, 2010

Page 23:

Why do they care?

• Excited the interest of industrial and academic communities

– serious problems and anomalies in current optimizer design

• optimizer evaluator / debugger / designer

• database administrators – response time fault profiler

– testbed for database researchers

– educational aid for students

Page 24:

Why do we care ?

• Not software development for its own sake

• Development of the tool has thrown up many core CS research problems involving theory, algorithms, statistics, tree matching …

– One of these we will discuss next

Page 25:

Plan Diagram Reduction

Can the plan diagram be recolored with a smaller set of colors (i.e., some plans are “swallowed” by others), subject to the following guarantee?

Guarantee: no query point in the original diagram has its cost increased, post-swallowing, by more than λ percent (user-defined)

Analogy: Sri Lanka agrees to be annexed by India if it is assured that the cost of living of each Lankan citizen is not increased by more than λ percent
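The recoloring question can be made concrete with a small greedy sketch. This is a simple heuristic written for illustration, not the reduction algorithm from the VLDB 2007 work; it also assumes a hypothetical `cost(plan, point)` function that can estimate the cost of any plan at any query point.

```python
def reduce_diagram(diagram, cost, threshold_pct):
    """Greedily recolor points so fewer plans remain, within the threshold.

    diagram: dict mapping query point -> plan chosen by the optimizer
    cost:    function (plan, point) -> estimated cost of plan at point
    """
    limit = 1.0 + threshold_pct / 100.0
    reduced = dict(diagram)
    sizes = {}
    for plan in diagram.values():
        sizes[plan] = sizes.get(plan, 0) + 1
    # Heuristic (assumed): try to swallow the smallest plans first.
    for victim in sorted(sizes, key=sizes.get):
        points = [pt for pt, p in reduced.items() if p == victim]
        survivors = set(reduced.values()) - {victim}
        replacement = {}
        for pt in points:
            # The bound is always relative to the ORIGINAL plan's cost.
            budget = limit * cost(diagram[pt], pt)
            ok = [s for s in survivors if cost(s, pt) <= budget]
            if not ok:
                break              # victim cannot be swallowed completely
            replacement[pt] = min(ok, key=lambda s: cost(s, pt))
        else:
            reduced.update(replacement)
    return reduced


# Toy one-dimensional example with three assumed linear cost functions.
COSTS = {"A": lambda x: 100 + 10 * x,
         "B": lambda x: 104 + 9 * x,
         "C": lambda x: 150 + 2 * x}
cost = lambda plan, x: COSTS[plan](x)
original = {x: min(COSTS, key=lambda p: cost(p, x)) for x in range(10)}
reduced = reduce_diagram(original, cost, threshold_pct=10)
print(sorted(set(original.values())), "->", sorted(set(reduced.values())))
```

A plan is removed only if every one of its points can be handed to a surviving plan within the λ budget, so the guarantee holds point by point against the original diagram.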

Page 26:

Complex Plan Diagram [QT8, OptA*] Reduced Plan Diagram [λ=10%]

Reduced to 5 plans from 76 !

Comparatively smoother contours

Page 27:

Note:

• A 10% threshold is well within the confidence intervals of the cost estimates of modern optimizers

• The degradation value is an upper bound; the average degradation is much lower in practice

Page 28:

Demo Diagrams

• Plan Diagram

• Cost Diagram

• Reduced Plan-Diagram

• Plan-tree Diagram (qualitative / quantitative)

• Plan-difference Diagram

Page 29:

Picasso Demo

Page 30:

CODD: COnstructing Dataless Databases

Average Joe Mr. CODD

Page 31:

Why do we need such a tool?

• We want to construct futuristic metadata scenarios for effective testing:

Test the query optimizer for metadata corresponding to a 100 GB data scenario

DATA

METADATA

“Let's create a 100 GB data scenario”

CODD: “Impossible! I can directly create metadata”

Page 32:

CODD Overview

Database Engine

Metadata Store

CODD Metadata Processor

Vendor Neutral Interface

1. Metadata Construction

2. Metadata Retention

3. Metadata Porting

4. Metadata Scaling

• DB2 • Oracle • SQL Server

Page 33:

• Provides interface for ab-initio creation of metadata

Metadata Construction

Data

Metadata CODD

Interface

DB Tester

1. Create relation schemas

2. Input metadata values

3. Fill catalog tables

Metadata

Page 34:

Construct Mode Interface

Page 35:

Metadata Validation – Construct Mode

• Can the user input arbitrary values?

– No. The input metadata values must be

• legal (valid type and range)

• consistent with other metadata values

• Verification approach

– Construct a directed acyclic constraint graph CG(V, E)

• V represents the set of metadata entities and their structural constraints

• E represents the consistency constraints

– Run topological sort on CG to obtain CG_linear. Complexity: O(|V| + |E|)

– Force the user through the linear ordering and ensure that the constraints are met along the linear ordering

e.g. Column Cardinality (Colcard): Integer; -1 or Colcard > 0; Colcard <= Card
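The verification approach can be sketched with Python's standard `graphlib` module. The three-node graph is a hypothetical fragment; the Card and Colcard rules follow the constraints quoted on the slides, while the NumNULLS rule is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical fragment of a constraint graph: node -> nodes it depends on.
DEPENDS_ON = {"Card": set(), "Colcard": {"Card"}, "NumNULLS": {"Card"}}

# Legality (type and range) and consistency checks per node.
# `v` is the value being checked, `m` holds all metadata values.
CHECKS = {
    "Card": lambda v, m: v == -1 or v >= 0,
    "Colcard": lambda v, m: v == -1 or (v > 0 and v <= m["Card"]),
    "NumNULLS": lambda v, m: v == -1 or 0 <= v <= m["Card"],  # assumed rule
}


def validate(metadata):
    """Walk the nodes in topological order; return the first violation."""
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        value = metadata[name]
        if not isinstance(value, int) or not CHECKS[name](value, metadata):
            return name
    return None


print(validate({"Card": 1000, "Colcard": 200, "NumNULLS": 10}))
print(validate({"Card": 1000, "Colcard": 2000, "NumNULLS": 10}))
```

Because Card precedes every node that depends on it in the linear order, each consistency check only refers to values that have already passed their own checks.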

Page 36:

Metadata Validation

DB2 Directed Acyclic Constraint Graph

[Figure: numbered nodes grouped into relation level, index level, and column level metadata]

Card (1): Integer type, >= 0 or -1

NPages (2)

FPages (3)

Overflow (4)

Colcard (5)

NumNULLS (6)

AvgColLenChar (7)

Low2Key (8)

High2Key (9)

Frequency Value Distribution (10)

Quantile Value Distribution (11)

IndCard (12)

ClusterFactor (13)

Page_Fetch_Pairs (14)

NumRIDs (15)

NLeaf (16)

NLevels (17)

Num_Empty_Leafs (18)

Density (19)

Legend: Super Nodes, Legality Constraint, Consistency constraint, Signifies Order, Additional constraints

Page 37:

Metadata Validation

DB2 Constraint Graph – Super nodes expanded

[Figure: the two super nodes expanded]

Quantile Value Distribution: ColValue (1), ValCount (2), DistCount (3); ColValue (4), ValCount (5), DistCount (6); ColValue (7), ValCount (8), DistCount (9); ... ; ColValue (55), ValCount (56), DistCount (57); ColValue (58), ValCount (59), DistCount (60)

Frequency Value Distribution: ColValue (1), ValCount (2); ColValue (3), ValCount (4); ColValue (5), ValCount (6); ... ; ColValue (19), ValCount (20)

Each expanded super node is connected to High2Key (9), Low2Key (8), ColCard (5), and Card (1)

Page 38:

Space overheads due to data stored

• What if we have only the data and not the metadata?

Drop Mode

DATA

METADATA

DATA

I will remove data from database

No space overheads during testing process

Mr. CODD

METADATA

Page 39:

Porting Mode

• Supports transfer of metadata statistics from one engine to another

DB Engine 1

DB Engine 2

Read Catalogs

Write Catalogs

Metadata Metadata

Page 40:

Scale Mode

• Database engine testing has always involved testing on scaled database instances

– TPC-H benchmark provides 1 GB to 100 TB datasets

• Can we directly produce a scaled version of the metadata?

• After scaling, the data remains the same; only the metadata is scaled

Metadata corresponding to 1 GB database

Metadata corresponding to 100 GB database

CODD
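One way to picture metadata scaling is the sketch below. The `RelationStats` type and the scaling rule (row and page counts grow linearly, distinct counts grow only for key-like columns) are assumptions for illustration, not a description of CODD's actual scaling method; the page count is also an assumed value.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RelationStats:
    card: int        # number of rows
    npages: int      # number of pages holding rows
    colcard: dict    # column name -> number of distinct values


def scale(stats, factor, key_columns):
    """Scale row and page counts linearly; scale only key-like columns.

    Assumption: columns in `key_columns` grow with the table, while other
    columns (e.g. a status flag) keep their number of distinct values.
    """
    colcard = {c: n * factor if c in key_columns else n
               for c, n in stats.colcard.items()}
    return replace(stats, card=stats.card * factor,
                   npages=stats.npages * factor, colcard=colcard)


# ORDERS at the 1 GB scale has 1,500,000 rows; npages is an assumed value.
orders_1gb = RelationStats(card=1_500_000, npages=25_000,
                           colcard={"o_orderkey": 1_500_000,
                                    "o_orderstatus": 3})
orders_100gb = scale(orders_1gb, 100, key_columns={"o_orderkey"})
print(orders_100gb)
```

The scaled statistics still satisfy the consistency constraint Colcard <= Card, which is the kind of property a validation step would need to recheck after scaling.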

Page 41:

CODD in Action

• The following experiment shows how we can easily assess, using CODD, the optimizer’s altered behavior in response to futuristic scenarios

32 Plans 77 Plans

QT 9 Plan diagrams (Baseline and Scaled)

Page 42:

CODD in Action

• By iteratively executing CODD on a popular commercial query optimizer, with the database size increasing in each iteration, it was discovered that the cardinality estimation module “saturated” when the input data size exceeded 10e19 bytes. No mention of this threshold was found in the publicly available documents of the system

Page 43:

Overview

• CODD is a Java-based graphical tool that supports ab-initio creation, retention, scaling, and porting of metadata statistics

– Operational on DB2/Oracle/SQLServer/Sybase/PostgreSQL

– Around 40,000 LOC

– Released in 2012

– Accepted at DBTest, 2012

– Awarded at IBM-ICARE 2012.

Page 44:

Construct Mode Demo

• Database engine: DB2

• TPC-H Schema is created (Data is not loaded)

• The relation PART is chosen to be constructed from scratch

Page 45:

Construct Mode Demo

Page 46:

Thank you for your attention!

Questions ?