Posted on 01-Jul-2015
GROUPING SETS: CUBE, ROLLUP, and Friends
SFPUG 2014/11/18
Copyright © 2014 David Fetter
Tuesday, November 18, 14
Thanks,
Why?!?
Analyzing
Reporting
• CUBE (Power set/Ring the changes)
• ROLLUP (Hierarchy)
• GROUPING SETS (Precision)
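A rough way to see what each of these means: all three are shorthand for a list of grouping sets, and the query produces one batch of groups per grouping set. The Python below is a minimal sketch of that expansion, not PostgreSQL's parser code; `cube` and `rollup` are hypothetical helper names.

```python
from itertools import combinations


def cube(*cols):
    """CUBE(a, b, ...): every subset of the columns, i.e. the power set."""
    return [
        combo
        for k in range(len(cols), -1, -1)  # largest grouping sets first
        for combo in combinations(cols, k)
    ]


def rollup(*cols):
    """ROLLUP(a, b, ...): each prefix of the column list, down to the empty set ()."""
    return [cols[:k] for k in range(len(cols), -1, -1)]


if __name__ == "__main__":
    print(cube("employee_id", "quarter"))
    # [('employee_id', 'quarter'), ('employee_id',), ('quarter',), ()]
    print(rollup("employee_id", "quarter"))
    # [('employee_id', 'quarter'), ('employee_id',), ()]
```

With the talk's two grouping columns, CUBE gives 4 grouping sets and ROLLUP gives 3. That lines up with the row counts later on: CUBE returns 12 + 3 + 4 + 1 = 20 rows, ROLLUP drops the 4 per-quarter totals for 16, and the explicit GROUPING SETS list keeps only (employee_id, quarter) and (employee_id) for 15.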
Shhh. A little code.
Tables
CREATE TABLE employee (
    id SERIAL PRIMARY KEY,
    first_name TEXT,
    last_name TEXT
);

CREATE TABLE sales (
    employee_id INTEGER NOT NULL,
    sale_closed TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    sale_amount MONEY, /* We need to fix this */
    FOREIGN KEY(employee_id) REFERENCES employee(id)
);
Data
INSERT INTO employee (first_name, last_name)
VALUES
    ('Larry', 'Ellison'),
    ('Bill', 'Gates'),
    ('Vladimir', 'Yulianov');
Moar Data
INSERT INTO sales
SELECT
    floor(random()*3)+1, /* Who */
    '2014-01-01 00:00:00+00'::timestamptz + random() * interval '1 year', /* When */
    (random() * 1000)::numeric(8,2)::MONEY /* How much? */
FROM generate_series(1,1000);
How much did each sell each quarter?
SIMPLE!
SELECT employee_id, date_trunc('Quarter', sale_closed) AS "Quarter", SUM(sale_amount)
FROM sales
GROUP BY employee_id, date_trunc('Quarter', sale_closed)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);
* I left out some formatting (the to_char() call that renders each quarter as 2014-Q1 and so on).
Results:
 employee_id | Quarter |    sum
-------------+---------+------------
           1 | 2014-Q1 | $42,227.43
           1 | 2014-Q2 | $42,974.71
           1 | 2014-Q3 | $41,364.66
           1 | 2014-Q4 | $33,910.34
           2 | 2014-Q1 | $38,733.24
           2 | 2014-Q2 | $40,480.96
           2 | 2014-Q3 | $43,875.72
           2 | 2014-Q4 | $42,041.28
           3 | 2014-Q1 | $45,351.14
           3 | 2014-Q2 | $36,672.08
           3 | 2014-Q3 | $33,468.46
           3 | 2014-Q4 | $42,793.36
(12 rows)
That's nice, BUT
(We all grimace when we hear that)
How about annual totals?
Old way: UNION ALL
Still Doable...Kinda
(
    SELECT employee_id,
        to_char(date_trunc('Quarter', sale_closed), 'YYYY-"Q"Q') AS "Quarter",
        sum(sale_amount)
    FROM sales
    GROUP BY employee_id, date_trunc('Quarter', sale_closed)
    ORDER BY employee_id, date_trunc('Quarter', sale_closed)
)
UNION ALL
(
    SELECT employee_id,
        to_char(date_trunc('Year', sale_closed), 'YYYY') AS "Year",
        sum(sale_amount)
    FROM sales
    GROUP BY employee_id, date_trunc('Year', sale_closed)
    ORDER BY employee_id, date_trunc('Year', sale_closed)
);
Results:
 employee_id | Quarter |     sum
-------------+---------+-------------
           1 | 2014-Q1 |  $42,227.43
           1 | 2014-Q2 |  $42,974.71
           1 | 2014-Q3 |  $41,364.66
           1 | 2014-Q4 |  $33,910.34
           2 | 2014-Q1 |  $38,733.24
           2 | 2014-Q2 |  $40,480.96
           2 | 2014-Q3 |  $43,875.72
           2 | 2014-Q4 |  $42,041.28
           3 | 2014-Q1 |  $45,351.14
           3 | 2014-Q2 |  $36,672.08
           3 | 2014-Q3 |  $33,468.46
           3 | 2014-Q4 |  $42,793.36
           1 | 2014    | $160,477.14
           2 | 2014    | $165,131.20
           3 | 2014    | $158,285.04
(15 rows)
That's nice, BUT
Can't we look at each sales rep with each of their quarterly totals?
ARGHH!!!!!!
These requests are reasonable!
But the code...not so much.
Take it from the top!
CUBE...ring the changes...
Quick stare
SELECT employee_id,
    to_char(date_trunc('Quarter', sale_closed), 'YYYY-"Q"Q') AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY CUBE (employee_id, date_trunc('Quarter', sale_closed))
ORDER BY employee_id, date_trunc('Quarter', sale_closed);
Results:
 employee_id | Quarter |     sum
-------------+---------+-------------
           1 | 2014-Q1 |  $42,227.43
           1 | 2014-Q2 |  $42,974.71
           1 | 2014-Q3 |  $41,364.66
           1 | 2014-Q4 |  $33,910.34
           1 |         | $160,477.14
           2 | 2014-Q1 |  $38,733.24
           2 | 2014-Q2 |  $40,480.96
           2 | 2014-Q3 |  $43,875.72
           2 | 2014-Q4 |  $42,041.28
           2 |         | $165,131.20
           3 | 2014-Q1 |  $45,351.14
           3 | 2014-Q2 |  $36,672.08
           3 | 2014-Q3 |  $33,468.46
           3 | 2014-Q4 |  $42,793.36
           3 |         | $158,285.04
             | 2014-Q1 | $126,311.81
             | 2014-Q2 | $120,127.75
             | 2014-Q3 | $118,708.84
             | 2014-Q4 | $118,744.98
             |         | $483,893.38
(20 rows)
That's nice, BUT
We don't care about undifferentiated quarterly totals.
Those are the four rows with an empty employee_id and a quarter: $126,311.81, $120,127.75, $118,708.84, and $118,744.98.
ROLLUP...hierarchy...
Let's try that!
SELECT employee_id,
    to_char(date_trunc('Quarter', sale_closed), 'YYYY-"Q"Q') AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY ROLLUP (employee_id, date_trunc('Quarter', sale_closed))
ORDER BY employee_id, date_trunc('Quarter', sale_closed);
 employee_id | Quarter |     sum
-------------+---------+-------------
           1 | 2014-Q1 |  $42,227.43
           1 | 2014-Q2 |  $42,974.71
           1 | 2014-Q3 |  $41,364.66
           1 | 2014-Q4 |  $33,910.34
           1 |         | $160,477.14
           2 | 2014-Q1 |  $38,733.24
           2 | 2014-Q2 |  $40,480.96
           2 | 2014-Q3 |  $43,875.72
           2 | 2014-Q4 |  $42,041.28
           2 |         | $165,131.20
           3 | 2014-Q1 |  $45,351.14
           3 | 2014-Q2 |  $36,672.08
           3 | 2014-Q3 |  $33,468.46
           3 | 2014-Q4 |  $42,793.36
           3 |         | $158,285.04
             |         | $483,893.38
(16 rows)
Hmmm...
That's nice, BUT
There was an extra line.
The extra line is the last one: the grand total ($483,893.38), with both employee_id and Quarter empty.
Hierarchies: Top to Bottom
We didn't want the top.
GROUPING SETS...Precision
SELECT employee_id,
    to_char(date_trunc('Quarter', sale_closed), 'YYYY-"Q"Q') AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY GROUPING SETS (
    (employee_id, date_trunc('Quarter', sale_closed)),
    (employee_id)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);
Results:
 employee_id | Quarter |     sum
-------------+---------+-------------
           1 | 2014-Q1 |  $42,227.43
           1 | 2014-Q2 |  $42,974.71
           1 | 2014-Q3 |  $41,364.66
           1 | 2014-Q4 |  $33,910.34
           1 |         | $160,477.14
           2 | 2014-Q1 |  $38,733.24
           2 | 2014-Q2 |  $40,480.96
           2 | 2014-Q3 |  $43,875.72
           2 | 2014-Q4 |  $42,041.28
           2 |         | $165,131.20
           3 | 2014-Q1 |  $45,351.14
           3 | 2014-Q2 |  $36,672.08
           3 | 2014-Q3 |  $33,468.46
           3 | 2014-Q4 |  $42,793.36
           3 |         | $158,285.04
(15 rows)
There we go!
HOW?!?
Extant Planner/Executor
• HashAgg
• GroupAgg
HashAgg
(Diagram: a hash table with columns "Result Group" and "Intermediate State")
HashAgg
• One pass:
  • Update the matching group's intermediate state for each row
  • Output final values at the end
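A minimal sketch of the one-pass idea, assuming a plain SUM and a Python dict standing in for the executor's hash table. `hash_agg_sum` and the sample rows are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def hash_agg_sum(rows, key_fn, value_fn):
    """One pass over the input, keeping a running SUM per group in a hash table."""
    state = defaultdict(Decimal)  # group key -> intermediate state (running sum)
    for row in rows:
        state[key_fn(row)] += value_fn(row)  # update that group's state
    # Only now, at end of input, is every group's value final.
    return dict(state)


# Hypothetical sample rows: (employee_id, quarter, amount)
sales = [
    (1, "2014-Q1", Decimal("10.00")),
    (2, "2014-Q1", Decimal("5.50")),
    (1, "2014-Q2", Decimal("2.25")),
    (1, "2014-Q1", Decimal("1.00")),
]

totals = hash_agg_sum(sales, key_fn=lambda r: (r[0], r[1]), value_fn=lambda r: r[2])
print(totals)
```

Every group stays open until the last input row, so nothing can be returned early. That is the trade-off GroupAgg makes differently.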
HashAgg
• Not yet in GROUPING SETS
• Algorithmic speedup opportunity:
  • O(n) hashing vs. O(n log n) sorting
HashAgg-- :-(
• Non-hashable data types
• Aggregate functions with LOTS of state
• Ordered aggs
• Distinct aggs
• No spill-to-disk
GroupAgg
• Sorts all input to the agg node so it can:
  • Detect each group boundary
  • Output that group right away
• Results before end-of-scan
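The sorted-input approach, sketched the same way. Python's `sorted()` stands in for the Sort node beneath the agg, and `group_agg_sum` is a hypothetical name. Because each group is finished the moment the key changes, a generator can hand it upward before the scan ends.

```python
def group_agg_sum(sorted_rows, key_fn, value_fn):
    """SUM per group over input already sorted by the grouping key.

    A group is complete as soon as the key changes, so it is yielded right
    away instead of waiting for the end of the scan.
    """
    current_key, total, started = None, 0, False
    for row in sorted_rows:
        key = key_fn(row)
        if started and key != current_key:  # group boundary detected
            yield current_key, total
            total = 0
        current_key, started = key, True
        total += value_fn(row)
    if started:  # end of input: flush the last group
        yield current_key, total


# Hypothetical sample rows: (employee_id, quarter, amount); sorted() plays the Sort node.
rows = sorted([(2, "2014-Q1", 5), (1, "2014-Q2", 2), (1, "2014-Q1", 10), (1, "2014-Q1", 1)])

groups = group_agg_sum(rows, key_fn=lambda r: (r[0], r[1]), value_fn=lambda r: r[2])
print(next(groups))  # first group comes out after reading only three input rows
print(list(groups))
```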
Phase I
GroupAgg for ROLLUP
• Sort for the hierarchy
• Output results at each boundary
• k grouping sets for the price of one sort!
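Here is one way the ROLLUP trick can work, as a sketch rather than the patch's actual code: keep one running total per hierarchy level, and when the sort key changes at column i, every level that groups on column i or a later one is finished. With n columns, one sort yields n + 1 grouping sets. Rows are dicts, rolled-up columns come out as None (NULL), and `rollup_sum` is a hypothetical name.

```python
def rollup_sum(sorted_rows, cols, value_col):
    """ROLLUP(cols) computed in a single pass over rows sorted by cols.

    Level k groups by the first k columns (k = len(cols) down to 0), so one
    sort produces len(cols) + 1 grouping sets. Rolled-up columns are None.
    """
    n = len(cols)
    totals = [0] * (n + 1)  # totals[k]: running sum of the current level-k group
    prev = None

    def finish(k, key):
        # Build the output row for the level-k group that just ended, then reset it.
        result = tuple(key[:k]) + (None,) * (n - k) + (totals[k],)
        totals[k] = 0
        return result

    for row in sorted_rows:
        key = tuple(row[c] for c in cols)
        if prev is not None:
            # First position where this row's key differs from the previous row's.
            diff = next((i for i in range(n) if key[i] != prev[i]), n)
            # Every level that groups on that column (or a later one) has ended.
            for k in range(n, diff, -1):
                yield finish(k, prev)
        for k in range(n + 1):
            totals[k] += row[value_col]
        prev = key

    if prev is not None:  # end of input closes every level, grand total last
        for k in range(n, -1, -1):
            yield finish(k, prev)


# Hypothetical sample rows, sorted on the ROLLUP columns (plays the Sort node).
sales = sorted(
    [
        {"employee_id": 2, "quarter": "2014-Q1", "amount": 5},
        {"employee_id": 1, "quarter": "2014-Q2", "amount": 2},
        {"employee_id": 1, "quarter": "2014-Q1", "amount": 10},
        {"employee_id": 1, "quarter": "2014-Q1", "amount": 1},
    ],
    key=lambda r: (r["employee_id"], r["quarter"]),
)

for out in rollup_sum(sales, ["employee_id", "quarter"], "amount"):
    print(out)
```

The output order mirrors the ROLLUP results table: each employee's quarters, then that employee's total, and the grand total last.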
Phase II
GroupAgg !ROLLUP
• Re-plan input to sort with >1 order
• Plan keeps tons of global state
• Does NOT like to be called >1x/plan
GROUPING SETS ~ WINDOW
WINDOW implementation
Shuffle a deck of WindowAgg and Sort nodes.
WindowAgg → Sort → WindowAgg → Sort ...
Similar pattern
• Expand all GROUPING SETS
• Arrange into fewest ROLLUPs
• Shuffle Sort and ChainAgg
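"Arrange into fewest ROLLUPs" can be read as a minimum chain cover: each ROLLUP handles a chain of grouping sets in which every set is a proper subset of the next, so we want the fewest such chains. A standard way to get that minimum is n minus a maximum bipartite matching on the proper-subset relation. The sketch below finds the matching with simple augmenting paths (Kuhn's algorithm); it illustrates the idea and is not a claim about how the patch does it. `fewest_rollups` is a hypothetical name, and the grouping sets are assumed to be distinct.

```python
def fewest_rollups(grouping_sets):
    """Partition distinct grouping sets into the fewest chains A < B < C ...

    Each chain can be computed with one sort plus a ROLLUP-style pass.
    Minimum chain cover = n - (maximum matching on the proper-subset relation).
    """
    sets = [frozenset(s) for s in grouping_sets]
    n = len(sets)
    # succ[i]: indices j such that sets[i] is a proper subset of sets[j]
    succ = [[j for j in range(n) if sets[i] < sets[j]] for i in range(n)]
    match_to = [None] * n  # match_to[j] = i means edge i -> j is in the matching

    def try_augment(i, seen):
        for j in succ[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_to[j] is None or try_augment(match_to[j], seen):
                match_to[j] = i
                return True
        return False

    for i in range(n):
        try_augment(i, set())

    # Follow matched edges from smaller sets to larger ones.
    next_of = {i: j for j, i in enumerate(match_to) if i is not None}
    chains = []
    for start in range(n):
        if match_to[start] is None:  # no matched predecessor: a chain starts here
            chain, cur = [], start
            while cur is not None:
                chain.append(sets[cur])
                cur = next_of.get(cur)
            chains.append(chain)
    return chains


chains = fewest_rollups([("a", "b"), ("a",), ("b",), ()])
for chain in chains:
    print([tuple(sorted(s)) for s in chain])
```

For CUBE(a, b) this gives two chains, () ⊂ (a) ⊂ (a, b) and (b), so two sorts cover all four grouping sets.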
GroupAgg → Sort → ChainAgg → Sort → (input data)
ChainAgg?!?
ChainAgg Nodes
• Pass input rows through unchanged
• Update aggregate state
• Put result rows into a chain-wide shared tuplestore at each group boundary
The Last GroupAgg
• Produces its normal output until end-of-data
• Outputs the shared tuplestore
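A toy model of the pipeline GroupAgg → Sort → ChainAgg → Sort → (input data), written as Python generators. To keep it short, each node here handles a single grouping set rather than a whole ROLLUP, and a plain list stands in for the shared tuplestore. `chain_agg`, `final_group_agg`, and the sample rows are hypothetical.

```python
def chain_agg(sorted_rows, key_fn, value_fn, shared_store):
    """ChainAgg sketch: pass rows through unchanged, stash finished groups."""
    current, total, started = None, 0, False
    for row in sorted_rows:
        key = key_fn(row)
        if started and key != current:  # group boundary: stash this node's result
            shared_store.append((current, total))
            total = 0
        current, started = key, True
        total += value_fn(row)
        yield row  # the input row goes up to the next Sort untouched
    if started:
        shared_store.append((current, total))


def final_group_agg(sorted_rows, key_fn, value_fn, shared_store):
    """Last GroupAgg: normal output until end of data, then the shared store."""
    current, total, started = None, 0, False
    for row in sorted_rows:
        key = key_fn(row)
        if started and key != current:
            yield current, total
            total = 0
        current, started = key, True
        total += value_fn(row)
    if started:
        yield current, total
    yield from shared_store  # safe now: the ChainAgg below has finished


# Hypothetical sample rows: (employee_id, quarter, amount)
sales = [(1, "2014-Q1", 10), (2, "2014-Q2", 5), (1, "2014-Q2", 2), (2, "2014-Q1", 1)]
shared = []  # stands in for the chain-wide shared tuplestore

# GroupAgg(quarter) <- Sort(quarter) <- ChainAgg(employee) <- Sort(employee) <- input
by_employee = sorted(sales, key=lambda r: r[0])
passed_up = chain_agg(by_employee, lambda r: ("employee", r[0]), lambda r: r[2], shared)
by_quarter = sorted(passed_up, key=lambda r: r[1])
result = list(final_group_agg(by_quarter, lambda r: ("quarter", r[1]), lambda r: r[2], shared))
print(result)
```

The Sort above the ChainAgg drains it completely before the last GroupAgg sees a single row, so by the time the final node reaches end-of-data, the shared store holds every ChainAgg result.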
Phase III
Future
• HashAgg
  • Alone?
  • With ChainAggs?
• Agg Associativity (A + B) + C = A + (B + C)
• Make CUBE a reserved word?
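Associativity is what would let coarser groups be built from finer ones instead of rescanning the input. As a sketch, the per-(employee, quarter) sums from the results tables above are re-summed here into the employee totals, quarterly totals, and grand total that CUBE produced. This only works for aggregates whose partial results combine this way (SUM is associative and commutative); `reaggregate` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal

# Per-(employee_id, quarter) sums, copied from the results tables in this talk.
quarterly = {
    (1, "2014-Q1"): Decimal("42227.43"), (1, "2014-Q2"): Decimal("42974.71"),
    (1, "2014-Q3"): Decimal("41364.66"), (1, "2014-Q4"): Decimal("33910.34"),
    (2, "2014-Q1"): Decimal("38733.24"), (2, "2014-Q2"): Decimal("40480.96"),
    (2, "2014-Q3"): Decimal("43875.72"), (2, "2014-Q4"): Decimal("42041.28"),
    (3, "2014-Q1"): Decimal("45351.14"), (3, "2014-Q2"): Decimal("36672.08"),
    (3, "2014-Q3"): Decimal("33468.46"), (3, "2014-Q4"): Decimal("42793.36"),
}


def reaggregate(partials, keep):
    """Combine finer partial sums into a coarser grouping.

    `keep` lists which key positions survive. Because (A + B) + C = A + (B + C),
    summing partial sums gives the same answer as summing the raw rows.
    """
    out = defaultdict(Decimal)
    for key, subtotal in partials.items():
        out[tuple(key[i] for i in keep)] += subtotal
    return dict(out)


by_employee = reaggregate(quarterly, keep=(0,))
by_quarter = reaggregate(quarterly, keep=(1,))
grand_total = reaggregate(quarterly, keep=())
print(by_employee, by_quarter, grand_total, sep="\n")
```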
Questions? Comments?
Thanks!