Grouping sets sfpug_20141118

Post on 01-Jul-2015

Upcoming features for PostgreSQL 9.5: GROUPING SETS, including CUBE and ROLLUP

GROUPING SETS, CUBE, ROLLUP, and Friends

SFPUG 2014/11/18
Copyright © 2014 David Fetter

Tuesday, November 18, 14

Thanks,

Why?!?

Analyzing

Reporting

• CUBE (Power set/Ring the changes)

• ROLLUP (Hierarchy)

• GROUPING SETS (Precision)
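
The three bullets above can be read as shorthand for lists of grouping sets. As a minimal sketch (the helper names `rollup` and `cube` are illustrative and not part of PostgreSQL), the expansion might look like this in Python:

```python
from itertools import combinations

def rollup(*cols):
    """ROLLUP(a, b, c) -> (a, b, c), (a, b), (a), (): every prefix, longest first."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]

def cube(*cols):
    """CUBE(a, b) -> every subset of the columns (the power set), largest first."""
    return [subset
            for n in range(len(cols), -1, -1)
            for subset in combinations(cols, n)]

print(rollup("employee_id", "quarter"))
print(cube("employee_id", "quarter"))
```

GROUPING SETS is the general form: you write out exactly the sets you want instead of taking all prefixes or all subsets.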

Shhh. A little code.

CREATE TABLE employee (
    id SERIAL PRIMARY KEY,
    first_name TEXT,
    last_name TEXT
);

CREATE TABLE sales (
    employee_id INTEGER NOT NULL,
    sale_closed TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    sale_amount MONEY, /* We need to fix this */
    FOREIGN KEY(employee_id) REFERENCES employee(id)
);

Tables

INSERT INTO employee (first_name, last_name)
VALUES
    ('Larry', 'Ellison'),
    ('Bill', 'Gates'),
    ('Vladimir', 'Yulianov');

Data

Moar Data

INSERT INTO sales (employee_id, sale_closed, sale_amount)
SELECT
    floor(random()*3)+1, /* Who */
    '2014-01-01 00:00:00+00'::timestamptz + random() * interval '1 year', /* When */
    (random() * 1000)::numeric(8,2)::MONEY /* How much */
FROM generate_series(1,1000);

How much did each employee sell each quarter?

SIMPLE!

SELECT
    employee_id,
    date_trunc('Quarter', sale_closed) AS "Quarter",
    SUM(sale_amount)
FROM sales
GROUP BY employee_id, date_trunc('Quarter', sale_closed)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);

* I left out some formatting.

Results:

┌─────────────┬─────────┬────────────┐
│ employee_id │ Quarter │    sum     │
├─────────────┼─────────┼────────────┤
│           1 │ 2014-Q1 │ $42,227.43 │
│           1 │ 2014-Q2 │ $42,974.71 │
│           1 │ 2014-Q3 │ $41,364.66 │
│           1 │ 2014-Q4 │ $33,910.34 │
│           2 │ 2014-Q1 │ $38,733.24 │
│           2 │ 2014-Q2 │ $40,480.96 │
│           2 │ 2014-Q3 │ $43,875.72 │
│           2 │ 2014-Q4 │ $42,041.28 │
│           3 │ 2014-Q1 │ $45,351.14 │
│           3 │ 2014-Q2 │ $36,672.08 │
│           3 │ 2014-Q3 │ $33,468.46 │
│           3 │ 2014-Q4 │ $42,793.36 │
└─────────────┴─────────┴────────────┘
(12 rows)

That's nice, BUT

(We all grimace when we hear that)

How about annual totals?

Old way: UNION ALL

(
    SELECT
        employee_id,
        to_char(date_trunc('Quarter', sale_closed), 'YYYY-"Q"Q') AS "Quarter",
        sum(sale_amount)
    FROM sales
    GROUP BY employee_id, date_trunc('Quarter', sale_closed)
    ORDER BY employee_id, date_trunc('Quarter', sale_closed)
)
UNION ALL
(
    SELECT
        employee_id,
        to_char(date_trunc('Year', sale_closed), 'YYYY') AS "Year",
        sum(sale_amount)
    FROM sales
    GROUP BY employee_id, date_trunc('Year', sale_closed)
    ORDER BY employee_id, date_trunc('Year', sale_closed)
);

Still Doable...Kinda

Results:

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           1 │ 2014    │ $160,477.14 │
│           2 │ 2014    │ $165,131.20 │
│           3 │ 2014    │ $158,285.04 │
└─────────────┴─────────┴─────────────┘
(15 rows)

That's nice, BUT

Can't we look at each sales rep with each of their quarterly totals?

ARGHH!!!!!!

These requests are reasonable!

But the code...not so much.

Take it from the top!

CUBE...ring the changes...

Quick stare

SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY CUBE (
    employee_id,
    date_trunc('Quarter', sale_closed)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);

Results:

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │ 2014-Q1 │ $126,311.81 │
│             │ 2014-Q2 │ $120,127.75 │
│             │ 2014-Q3 │ $118,708.84 │
│             │ 2014-Q4 │ $118,744.98 │
│             │         │ $483,893.38 │
└─────────────┴─────────┴─────────────┘
(20 rows)

That's nice, BUT

We don't care about undifferentiated quarterly totals.

ROLLUP...hierarchy...

Let's try that!

SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY ROLLUP(
    employee_id,
    date_trunc('Quarter', sale_closed)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │         │ $483,893.38 │
└─────────────┴─────────┴─────────────┘
(16 rows)

Hmmm...

That's nice, BUT

There was an extra line.

Hierarchies: Top to Bottom

We didn't want the top.

GROUPING SETS...Precision

SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY GROUPING SETS(
    (employee_id, date_trunc('Quarter', sale_closed)),
    (employee_id)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);

Results:

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
└─────────────┴─────────┴─────────────┘
(15 rows)

There we go!
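
To make the semantics concrete, here is a rough Python model of what a GROUPING SETS query returns: one aggregation per set, with `None` standing in for the NULL shown in columns a set leaves out. The function name, the dict-based rows, and the integer `cents` column are all assumptions for illustration. It makes one pass per set, much like the UNION ALL approach, and says nothing about how the executor actually does it.

```python
from collections import defaultdict

def grouping_sets(rows, columns, sets, value):
    """Aggregate `value` once per grouping set; absent columns become None."""
    out = []
    for gset in sets:
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[key] += row[value]
        out.extend(totals.items())
    return out

# Made-up sample data, amounts in integer cents.
rows = [
    {"employee_id": 1, "quarter": "2014-Q1", "cents": 100},
    {"employee_id": 1, "quarter": "2014-Q2", "cents": 250},
    {"employee_id": 2, "quarter": "2014-Q1", "cents": 400},
]
result = grouping_sets(
    rows,
    columns=("employee_id", "quarter"),
    sets=[("employee_id", "quarter"), ("employee_id",)],
    value="cents",
)
for key, total in result:
    print(key, total)
```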

HOW?!?

Extant Planner/Executor

• HashAgg

• GroupAgg

HashAgg

Result Group Intermediate State

• One pass:

  • Update hash value for each row

  • Output final value at the end
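
A minimal Python sketch of the one-pass idea, assuming a simple sum aggregate (the function name and sample rows are made up, and this is not the executor's code):

```python
from collections import defaultdict

def hash_agg(rows, key, value):
    """One pass over unsorted input; results exist only after the last row."""
    state = defaultdict(int)          # group key -> intermediate state (running sum)
    for row in rows:
        state[key(row)] += value(row)
    return dict(state)

# Made-up rows: (employee_id, quarter, amount)
sales = [(1, "Q1", 10), (2, "Q1", 5), (1, "Q2", 7), (1, "Q1", 3)]
totals = hash_agg(sales, key=lambda r: (r[0], r[1]), value=lambda r: r[2])
print(totals)
```

The whole table of intermediate states has to stay in memory until the input is exhausted.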

• Not yet in GROUPING SETS

• Algorithmic speedup opportunity: O(n) vs. O(n log n)

HashAgg-- :-(

• Non-hashable data types

• Aggregate functions with LOTS of state

• Ordered aggs

• Distinct aggs

• No spill-to-disk

GroupAgg

• Sorts all input to the agg node to:

  • Detect group boundary

  • Output that group

• Results before end-of-scan
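
A minimal Python sketch of the sorted approach, again assuming a sum aggregate and made-up rows. The generator emits each group as soon as the next key shows up, so only one group's state is held at a time:

```python
def group_agg(sorted_rows, key, value):
    """Stream over input sorted by `key`, yielding each group at its boundary."""
    current, total, started = None, 0, False
    for row in sorted_rows:
        k = key(row)
        if started and k != current:   # group boundary: emit the finished group
            yield current, total
            total = 0
        current, started = k, True
        total += value(row)
    if started:                        # flush the last group at end of input
        yield current, total

# Made-up rows: (employee_id, quarter, amount)
sales = [(1, "Q1", 10), (2, "Q1", 5), (1, "Q2", 7), (1, "Q1", 3)]
by_key = lambda r: (r[0], r[1])
groups = list(group_agg(sorted(sales, key=by_key), key=by_key, value=lambda r: r[2]))
print(groups)
```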

Phase I

GroupAgg for ROLLUP

• Sort for the hierarchy

• Output results at each boundary

• k for the price of one!
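
The "k for the price of one" idea can be sketched in Python: with the input sorted on all ROLLUP columns, every prefix of those columns is also in sorted order, so one pass can keep a running total per level and emit whichever levels hit a boundary. The function name and row layout are assumptions for illustration, not the actual executor code.

```python
def rollup_agg(sorted_rows, ncols):
    """Rows are (c1, ..., c_ncols, amount) tuples sorted by the first ncols columns."""
    totals = [0] * (ncols + 1)        # totals[n] = running sum for the n-column prefix
    prev, out = None, []
    for row in sorted_rows:
        key, amount = row[:ncols], row[ncols]
        if prev is not None and key != prev:
            # First column position where the key changed.
            changed = next(i for i in range(ncols) if key[i] != prev[i])
            # Every prefix longer than `changed` columns just hit a group boundary.
            for n in range(ncols, changed, -1):
                out.append((prev[:n] + (None,) * (ncols - n), totals[n]))
                totals[n] = 0
        for n in range(ncols + 1):
            totals[n] += amount
        prev = key
    if prev is not None:              # end of input closes every level
        for n in range(ncols, -1, -1):
            out.append((prev[:n] + (None,) * (ncols - n), totals[n]))
    return out

# Made-up rows, already sorted by (employee_id, quarter).
rows = [(1, "Q1", 10), (1, "Q1", 3), (1, "Q2", 7), (2, "Q1", 5)]
result = rollup_agg(rows, ncols=2)
for key, total in result:
    print(key, total)
```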

Phase II

GroupAgg !ROLLUP

• Re-plan input to sort with >1 order

• Plan keeps tons of global state

• Does NOT like to be called >1x/plan

GROUPING SETS ~ WINDOW

WINDOW implementation

Shuffle a deck of WindowAgg and Sort nodes.

WindowAgg → Sort → WindowAgg → Sort ...

Similar pattern

• Expand all GROUPING SETS

• Arrange into fewest ROLLUPs

• Shuffle Sort and ChainAgg
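
One way to picture the "arrange into ROLLUPs" step: each ROLLUP is a chain of nested grouping sets that a single sort order can serve. The sketch below uses a simple greedy pass, which is an assumption chosen for brevity; it is not guaranteed to find the fewest chains in every case and is not the planner's actual algorithm. Both function names are hypothetical.

```python
def chain_grouping_sets(grouping_sets):
    """Greedily pack grouping sets into chains of strictly nested sets."""
    chains = []
    for gset in sorted(map(frozenset, grouping_sets), key=len, reverse=True):
        for chain in chains:
            if gset < chain[-1]:       # strict subset of the chain's smallest set
                chain.append(gset)
                break
        else:
            chains.append([gset])      # nothing fits: this set needs its own sort
    return chains

def sort_order(chain):
    """Column order that makes every set in the chain a prefix of the sort key."""
    order = []
    for gset in reversed(chain):       # smallest set first
        order += sorted(c for c in gset if c not in order)
    return order

# CUBE(a, b) expands to four grouping sets.
chains = chain_grouping_sets([("a", "b"), ("a",), ("b",), ()])
for chain in chains:
    print(sort_order(chain), [sorted(s) for s in chain])
```

For CUBE(a, b) this yields two chains, so two sorts: one on (a, b) covering (a, b), (a), and (), and one on (b).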

GroupAgg → Sort → ChainAgg → Sort → (input data)

ChainAgg?!?

ChainAgg Nodes

• Pass input state through unchanged

• Update aggregate state

• Put rows into a chain-wide shared tuplestore when they hit a group boundary

The Last GroupAgg

• Produces its normal output until end-of-data

• Outputs the shared tuplestore
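
The GroupAgg → Sort → ChainAgg → Sort pipeline can be modeled with Python generators. This is only a toy model under stated assumptions: a plain list stands in for the shared tuplestore, `sorted()` stands in for the Sort nodes, and the function names and sample rows are invented.

```python
def chain_agg(rows, key, value, tuplestore):
    """Pass rows through unchanged; park each finished group in the shared store."""
    current, total, started = None, 0, False
    for row in rows:
        k = key(row)
        if started and k != current:
            tuplestore.append((current, total))
            total = 0
        current, started = k, True
        total += value(row)
        yield row                      # downstream nodes see the input untouched
    if started:
        tuplestore.append((current, total))

def final_group_agg(rows, key, value, tuplestore):
    """Normal grouped output until end-of-data, then drain the shared store."""
    current, total, started = None, 0, False
    for row in rows:
        k = key(row)
        if started and k != current:
            yield current, total
            total = 0
        current, started = k, True
        total += value(row)
    if started:
        yield current, total
    yield from tuplestore

# Made-up rows: (employee_id, quarter, amount); grouping sets (employee_id), (quarter).
sales = [(1, "Q1", 10), (2, "Q1", 5), (1, "Q2", 7), (1, "Q1", 3)]
amount = lambda r: r[2]
store = []
by_quarter = chain_agg(sorted(sales, key=lambda r: r[1]),
                       key=lambda r: (None, r[1]), value=amount, tuplestore=store)
result = list(final_group_agg(sorted(by_quarter, key=lambda r: r[0]),
                              key=lambda r: (r[0], None), value=amount,
                              tuplestore=store))
print(result)
```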

Phase III

Future

• HashAgg

  • Alone?

  • With ChainAggs?

• Agg Associativity: (A + B) + C = A + (B + C)

• Make CUBE a reserved word?
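
The associativity bullet above hints that coarser totals could be built from finer ones instead of rescanning the rows. A small Python illustration using employee 1's quarterly sums from the results earlier, converted to integer cents; the `combine_avg` helper is a hypothetical example of combining a (sum, count) state:

```python
# Employee 1's quarterly totals from the results above, in integer cents.
quarterly_cents = [4222743, 4297471, 4136466, 3391034]
q1, q2, q3, q4 = quarterly_cents

# Grouping the additions differently gives the same annual total.
left_to_right = ((q1 + q2) + q3) + q4
pairwise = (q1 + q2) + (q3 + q4)

def combine_avg(a, b):
    """Merge two partial (sum, count) states for an average."""
    return (a[0] + b[0], a[1] + b[1])

# Sums and counts here are invented partial states.
merged = combine_avg(combine_avg((10, 2), (5, 1)), (7, 1))
print(left_to_right, pairwise, merged)
```

Both groupings give 16047714 cents, matching the $160,477.14 annual total shown for employee 1.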

Questions? Comments?
