PostgreSQL BI & Performance - 2ndQuadrant

© 2ndQuadrant 2013 [email protected]

PostgreSQL Business Intelligence & Performance

Simon Riggs
CTO, 2ndQuadrant
PostgreSQL Major Contributor

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 318633

AXLE Project

• Analytics on Xtremely Large European data
  – Secure
  – Big
  – Fast
  – Hardware optimised
  – Visual Analytics

axleproject.eu

Topics

• Business Intelligence & Architecture
• BI Performance Feature Effectiveness
• Benchmark Analysis & Opportunities
• New Features in Progress

Business Intelligence

• ETL
• Reporting
• Ad-hoc queries
• Data Mining

• Many query types
• Counting
• Summarisation
• Strategic Analysis
• Analytics

BI Architecture

• SQL was invented for Business Intelligence
• Classic DW
  – DB2 v Teradata
• Specialist Databases
  – OLTP
  – DW

Specialist OLTP Problems

• MongoDB
  – Joins don't scale!

• VoltDB
  – No concurrency
  – All SQL must run in the same duration
  – Partial SQL implementation

Specialist DW Problems

• Second specialist system required
• ETL middleware also often required for loading
• Data delayed en route to second system
• Frequently highly compressed, so read-only or difficult to maintain

Get Real

• Big Data
  – 99% of databases are <100GB

• Real-time results
  – Business Intelligence required 24x7
  – Closed loop processing requires fast response

• SQL is much easier to use than the alternatives
  – More expressive
  – Already the de facto standard for BI

Minimal Approach

• Emphasise that additional BI technology will not reduce costs and may not offer solutions

• Keep Business Intelligence on PostgreSQL
• Use Hot Standby to expand capacity and isolate Business Intelligence workloads
• Minimise ETL whenever possible
• Gain benefits of SQL and concurrency
  – Immediate access to data

Things To Learn

• Query performance is important
• Custom/special data structures are important for increasing performance
• Stale answers are acceptable in many situations

BI Feature Effectiveness

• Problem 1: Get the work into the database

• Problem 2: Speed up the work in the database
  +++++ Work Avoidance
  ++ Algorithmic Improvement
  + Brute Force

Orange Data Mining

• Orange 3.0 generates SQL for all data flows
• Directly utilises the power of databases
• Integrates with PostgreSQL

BI Tuning Opportunities

• COPY batch optimisations defeated
• Btree insert bottlenecks on large data loads
• Aggregate Optimisation
  – Use sum()/count() not avg()
• Join Estimate/Actual Mismatch
  – Use enable_nestloop = off
• Plan Pushdown
  – Manual SQL rewrite
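One plausible reason for the sum()/count() advice in a BI setting is that sums and counts can be re-aggregated from partial results (per partition, per summary row, per worker), whereas averages cannot simply be averaged again. The Python sketch below illustrates only that arithmetic on made-up data; `Partial`, `partial_agg`, `combine` and `average_of_averages` are hypothetical names, and this is not a description of how PostgreSQL evaluates avg() internally.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate for one partition: running sum and row count."""
    total: float
    count: int


def partial_agg(rows: list[float]) -> Partial:
    # Equivalent of SELECT sum(x), count(x) over a single partition.
    return Partial(sum(rows), len(rows))


def combine(partials: list[Partial]) -> float:
    # Sums and counts add up exactly; divide only once, at the very end.
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count


def average_of_averages(partitions: list[list[float]]) -> float:
    # Naive roll-up of per-partition avg() values: wrong when partition sizes differ.
    avgs = [sum(rows) / len(rows) for rows in partitions]
    return sum(avgs) / len(avgs)


if __name__ == "__main__":
    # Hypothetical data: two partitions of very different sizes.
    partitions = [[10.0, 20.0, 30.0], [100.0]]
    print(combine([partial_agg(p) for p in partitions]))  # 160 / 4 = 40.0
    print(average_of_averages(partitions))                # (20 + 100) / 2 = 60.0
```

Keeping (sum, count) pairs is what lets a summary table or materialized view be rolled up further without going back to the detail rows.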

Speed Up: Work Avoidance

• Caching
  – Result Cache
  – Materialized Views

• Approximation
• Partition Elimination
• Improved Optimisation
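Caching avoids work by reusing an earlier answer, which is only acceptable because stale answers are fine for many BI questions. A minimal application-side sketch of a result cache with a bounded staleness window follows; the `ResultCache` class, its `max_age_seconds` parameter and the injectable clock are hypothetical illustrations, not a PostgreSQL feature or API.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ResultCache:
    """Hypothetical application-side result cache that tolerates bounded staleness."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age_seconds
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.misses = 0

    def get(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return a cached answer no older than max_age, else recompute and store it."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] <= self.max_age:
            return entry[1]  # work avoided: possibly stale, but within the allowed age
        self.misses += 1
        value = compute()  # stands in for running the expensive report query
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    fake_now = [0.0]
    cache = ResultCache(max_age_seconds=60, clock=lambda: fake_now[0])
    report = lambda: sum(range(1_000_000))  # stands in for a slow query
    cache.get("daily_sales", report)        # miss: computed
    fake_now[0] = 30.0
    cache.get("daily_sales", report)        # hit: 30s old, served from cache
    fake_now[0] = 90.0
    cache.get("daily_sales", report)        # too stale: recomputed
    print(cache.misses)                     # 2
```

A materialized view applies the same trade-off inside the database: the stored result is reused until it is explicitly refreshed.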

Speed Up: Algorithms

• Compression
• Column Orientation
• Vectors
• Hardware approaches

Speed Up: Brute Force

• Parallel Query is a brute force approach
• Gains in performance come from additional utilisation of resources, not from being smarter
  – Reduces overall concurrency
  – Still requires extensive optimizer changes

• The industry thinks we need it
• Some queries do require it
• PostgreSQL should do this; 2ndQuadrant can help, will help and has already helped
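The brute-force shape of parallel query can be sketched as: split the scan into chunks, let each worker compute a partial aggregate, then combine the partials in a final step. The Python below is a structural illustration only; `scan_chunk` and `parallel_sum_count` are hypothetical names, and because CPython's GIL serialises pure-Python CPU work on threads, this sketch shows the plan shape rather than a real speed-up or PostgreSQL's executor.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(rows, predicate):
    """One worker's share of the scan: filter its chunk, return a partial (sum, count)."""
    total, count = 0, 0
    for r in rows:
        if predicate(r):
            total += r
            count += 1
    return total, count


def parallel_sum_count(rows, predicate, workers=4):
    """Split rows into contiguous chunks, scan them concurrently, then combine partials."""
    chunk = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: scan_chunk(c, predicate), chunks))
    # Final step: the leader combines the workers' partial aggregates.
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


if __name__ == "__main__":
    rows = list(range(1, 101))
    print(parallel_sum_count(rows, lambda r: r % 2 == 0))  # (2550, 50)
```

The total work is unchanged; it is only spread across more workers, which is why the slide ranks this approach below work avoidance and better algorithms.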

9.4 BI Features In Progress

• Min Max Indexes
• Parallel Sort & Parallel Query infrastructure
• Materialized Views++
• Multi-core scalability gains (lwlocks)
• (DDL Locking impact reductions)
• (Row Level Security)

Min Max Indexes (9.4)

• Automatic Partitioning
  – Store min and max tuples for each page range
  – Use theorem proving to avoid sections of scan
  – Covers all columns, not just defined partition key
  – Can be added easily to existing applications
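A minimal Python sketch of the min/max idea, assuming a single integer column stored as a list of pages. `RangeSummary`, `build_minmax`, `scan_between` and `pages_per_range` are hypothetical names, and the sketch ignores real-world details such as unsummarised ranges, NULLs and concurrent updates. The "proof" step is simply: if a range's [min, max] cannot overlap the query's bounds, no row in that range can match, so its pages are never read.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RangeSummary:
    first_page: int
    last_page: int  # inclusive
    lo: int         # minimum value seen in this page range
    hi: int         # maximum value seen in this page range


def build_minmax(pages: Sequence[Sequence[int]], pages_per_range: int) -> List[RangeSummary]:
    """Summarise the column with one (min, max) pair per block of pages_per_range pages."""
    summaries = []
    for start in range(0, len(pages), pages_per_range):
        block = pages[start:start + pages_per_range]
        values = [v for page in block for v in page]
        if values:
            summaries.append(RangeSummary(start, start + len(block) - 1,
                                          min(values), max(values)))
    return summaries


def scan_between(pages: Sequence[Sequence[int]], summaries: List[RangeSummary],
                 low: int, high: int) -> Tuple[List[int], int]:
    """Return values with low <= v <= high and the number of pages actually read."""
    matches, pages_read = [], 0
    for s in summaries:
        if s.hi < low or s.lo > high:
            continue  # provably no match in this range: skip its pages entirely
        for p in range(s.first_page, s.last_page + 1):
            pages_read += 1
            matches.extend(v for v in pages[p] if low <= v <= high)
    return matches, pages_read


if __name__ == "__main__":
    # 100 pages of 10 values each, loaded in ascending order.
    pages = [list(range(i * 10, i * 10 + 10)) for i in range(100)]
    index = build_minmax(pages, pages_per_range=4)  # 25 summaries for 1,000 values
    found, read = scan_between(pages, index, 500, 519)
    print(len(found), "rows found after reading", read, "of", len(pages), "pages")
    # prints: 20 rows found after reading 4 of 100 pages
```

Because the summary holds one entry per page range rather than one per row, it stays tiny, and the same structure works for any column without declaring a partition key.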

Min Max Index results

2 GB table                  MinMax    B-Tree
Index build time            11s       96s
Index size                  24kB      1.1GB
Load time with index        1         x2-3
Index SELECT (1 row)        x2-3      1
Index SELECT (many rows)    same      same
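As a rough worked example of the index-size row, and assuming (an assumption, not a measured result) that both index sizes grow linearly with table size for the same column and data distribution, the figures can be extrapolated to a 1 PB table using decimal units:

$$
\frac{24\ \text{kB}}{2\ \text{GB}} = \frac{24\times10^{3}}{2\times10^{9}} = 1.2\times10^{-5}
\quad\Rightarrow\quad
10^{15}\ \text{B}\times1.2\times10^{-5} = 1.2\times10^{10}\ \text{B} \approx 12\ \text{GB}
$$

$$
\frac{1.1\ \text{GB}}{2\ \text{GB}} = 0.55
\quad\Rightarrow\quad
1\ \text{PB}\times0.55 = 550\ \text{TB}
$$

On that assumption the MinMax index would be around 12 GB, small enough to keep cached in memory on a large server, while the B-tree would be more than half the size of the table itself.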

MinMax Indexes

• Do not require complex DDL

• Generate almost no index inserts
  – Index fits in RAM even for petabytes of data

• Generate almost no additional WAL
  – Works well with Hot Standby data warehousing

• Only work with some data distributions
  – Additional indexing may be needed
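Why only some distributions work can be shown by counting how many page ranges a query must still read when the data is physically ordered versus shuffled. This is a self-contained Python sketch on synthetic data; `ranges_to_read` and `rows_per_range` are hypothetical names, and the seed is only there to make the run repeatable.

```python
import random


def ranges_to_read(values, rows_per_range, low, high):
    """Count how many (min, max) ranges could contain a value in [low, high]."""
    needed = 0
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        if min(chunk) <= high and max(chunk) >= low:
            needed += 1  # range cannot be excluded, so it must be scanned
    return needed


if __name__ == "__main__":
    ordered = list(range(100_000))       # e.g. a timestamp-like column loaded in order
    shuffled = ordered[:]
    random.Random(42).shuffle(shuffled)  # same values, no physical correlation
    total_ranges = len(ordered) // 1000
    print(ranges_to_read(ordered, 1000, 50_000, 50_999), "of", total_ranges)  # 1 of 100
    print(ranges_to_read(shuffled, 1000, 50_000, 50_999), "of", total_ranges)
```

With the ordered column a single range survives; with the shuffled copy almost every range spans nearly the whole value domain, so essentially nothing can be skipped. That is the case where an additional index, such as a B-tree, is still needed.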

PostgreSQL BI Roadmap

[Roadmap diagram: releases 9.4, 10.0 and 10.1, with themes Advanced Business Intelligence, High Security, Online Change and Very Large Database]

2ndQuadrant

• Consulting, Migration
• Training
• Support, RemoteDBA
• Open Source Development
