Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious...

38
Db2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data Where it Resides Maryela Weihrauch, IBM Distinguished Engineer, [email protected]

Transcript of Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious...

Page 1: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

Db2 for z/OS is Serious About Analytics:

Leveraging SQL to Analyze Your Data

Where it Resides

Maryela Weihrauch, IBM Distinguished Engineer,

[email protected]

Page 2: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 2

Notices and disclaimers • Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

• Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

• IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.”

• Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

• Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

• References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

• Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

• It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

Page 3: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 3

Notices and disclaimers continued

• Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

• The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

• IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

Page 4: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 4

Agenda

• Db2 and Analytics – The bigger picture

• Db2 12 Optimizer enhancements

• OLAP BIFs in DB2 and IDAA

– OLAP Specification

– Rolling Aggregations

– Grouping

– Math functions

Page 5: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 5

System z: re-inventing business around an integrated data lifecycle

Reduced data movement, reduced complexity, reduced configuration resources

More accurate, more secure, more available

Better business response

VSAM

Db2

VSAM

IMS

DB2AA

Page 6: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

HTAP -Db2 with Analytics Accelerator

OLAP

Db

2 fo

r z/O

S

Pro

ce

ss

ing

IBM

Db

2 A

na

lytic

s A

cce

lera

tor

Db2 for z/OS CPU

savings target

• Operational (in

transaction) analytics

• (complex) OLTP

Accelerator focus

• Ad-hoc queries

• Complex queries

scanning large

amount of data

• ETL

acceleration/virtual

transformation

Complex queries (more history)

OLTP Transactions

High concurrency

Hybrid Transactional &

Analytical Processing

Standard reports

Page 7: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 7

Optimizer

Acce

lera

tor D

RD

AR

equ

esto

r

Application

Applicatio

n

Interface

Queries executed with Accelerator

Queries executed without Accelerator

Heartbeat (availability and performance indicators)

Query execution run-time for queries

that cannot be or should not be off-

loaded to Accelerator

SPU

MemorySPU

MemorySPU

MemorySPU

Memory

SM

PH

ost

Heartbeat

Db2 for z/OS

CPU FPGA

CPU FPGA

CPU FPGA

CPU FPGA

CPU FPGA

CPU FPGA

CPU FPGA

CPU FPGA

Accelerator

Query Execution Flow

Page 8: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 8

Db2 12 high-level query performance focus

• Query focus based upon newer workloads (modern applications)– Complex views or table UDFs

– UNION ALL

– Outer joins

– Join predicates with (stage 2) expressions

– CASE expressions, CAST functions, scalar functions

• Addressing general bottlenecks

– More sort avoidance, work file and sparse index usage optimization

– Reducing prepare cost and frequency

– List prefetch

– I/O performance

– Reduce unnecessary prefetch scheduling

• NOTE: 100% zIIP eligible for parallel child tasks

Page 9: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 9

Significant CPU reduction in Db2 query workloads

• 30% - 90% reduction in elapsed time and CPU for SQL containing – Complex outer joins, UNION ALL, UDFs and table UDFs

– Combinations of table expressions, views and outer joins

– VARGRAPHIC data type

– Disorganized data, poorly clustered indexes

– Nearly 100% new access paths vs. Db2 11

34%

66,00%

80%

85%

90%

0% 5% 10%15%20%25%30%35%40%45%50%55%60%65%70%75%80%85%90%

SAP CDS FIN

Cust1-Uncl

WAS Portal

SAP CDS Fiori

Cryst. Rpt.-Long

CPU time %

Complex real-time analytics - outer joins, UNION ALL, UDF/Table UDFs

Real-time analytics - large GROUP BY sorts/WF usage, sparse index on VARGRAPHIC

Reporting - poorly clustered index – list prefetch and hybrid join benefit

Complex OLTP -UNION ALL

Complex real-time analytics – outer joins, table expr, CASE, DGTTs

Db2 performanceworkloads

Db2 class 2 CPU reduction

Page 10: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 10

Significant CPU reduction in Db2 query workloads..

15,70%

17,00%

18,00%

19,00%

23,77%

24,00%

0% 5% 10% 15% 20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90%

Cryst.Rpt-…

Cust1-Clus

TPCH-…

TPCH-Seq

Cust2

BIDAY-ShortLarge sorts for GROUP BY

Quick complex SQL

Sort intensive and unclustered (DEG=1)

Sort intensive and unclustered (DEG=ANY)

Sub-second complex analytics queries

Traditional query workload

Db2 performanceworkloads

Db2 class 2 CPU

reduction

15% - 25% reduction in elapsed time and CPU for more traditional query workloads

Sort intensive and work file processing

Lots of aggregating

Many new access paths vs. Db2 11

Page 11: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 11

Significant CPU reduction in Db2 query workloads..

0%-15% reduction in elapsed time and CPU for

Simple query workloads

SQL where vast majority of CPU and elapsed time are due to data scanning (IDAA candidates)

As % drops - fewer new access paths vs. Db2 11

3,02%

4,52%

8,66%

11,00%

13,18%

0% 5% 10% 15% 20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90%

BIDAY-Long

SAP BW

TPCD

TPCH-SQLPL

Cust3 Traditional simple query application

Sort intensive, clustered version of TPCH

Lightweight query app.

Star Join, mostly scanning overhead (IDAA candidate)

Brute force data scanning (IDAA candidate)

Db2 performanceworkloads

Db2 class 2 CPU reduction

Page 12: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 12

• Rebind without APREUSE to see full potential

• Queries with new access paths consistently were the biggest winners

• Typical performance gains for new access path

– 10%-40% reduction in CPU and elapsed time (can be much higher)

• Typical performance gains with no access path change

– 0%-20% reduction in CPU and elapsed time

• Set DB2 zparm ACCELMODEL to enable accelerator modeling – Supports static and dynamic SQL– Explain will externalize reason if query is not eligible for acceleration– DB2 writes new fields (class 2 elapsed time and CPU time spent in

DB2 to execute SQL eligible for acceleration) in IFCID 3 and IFCID 148, similar to the SECP CPU field – needs to be formatted by a monitor

Give Db2 12 optimizer a chance

Page 13: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 13

BIF-Analytics through SQL

V9 V10 V11 V12

RANK √

DENSE_RANK √

ROW_NUMBER √

Moving AVERAGE √

Moving SUM √

Grouping √

ROLLUP √

CUBE √

MEDIAN √

PERCENTILE √

LISTAGG* √

Page 14: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 14

OLAP Specifications

• OLAP specifications can be used to determine ranking, row numbering, and aggregation as a scalar value

• The OLAP specification is defined using a Window

– A window may specify three (3) components (each is optional)

1. Partitioning – PARTITION BY CLAUSE

2. Ordering – ORDER BY clause

3. Aggregation Group – ROWS/RANGE

Page 15: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 15

Differences between the function types

Function Type

Action Examples

Scalar compute a single value for the current row

SUBSTR, CONCAT, LTRIM

Aggregate collapse a group of rows into a single row and compute a single value for this group

MAX, MIN, SUM, COUNT

Moving Aggregate

compute a single value for the current row based on some or all rows in a defined group

Aggregate function, like above, for a window

Page 16: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 16

RANK, DENSE_RANK, ROW_NUMBER

• RANK/DENSE_RANK

– Returns the ordinal rank of a row in the window

• RANK: may have a gap in the ranking

• DENSE_RANK: no gaps in the ranking

• ROW_NUMBER

– Returns the sequential row number

Page 17: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 17

RANK Example

SELECT EMPNO, LASTNAME, SALARY+BONUS AS TOTAL_SALARY,

RANK() OVER(ORDER BY SALARY+BONUS DESC) AS RANK_SALARY

FROM DSN8A10.EMP WHERE SALARY+BONUS > 30000 ORDER BY RANK_SALARY;

+-----------------------------------------------------------+

| EMPNO | LASTNAME | TOTAL_SALARY | RANK_SALARY |

+-----------------------------------------------------------+

1_| 000010 | HAAS | 53750.00 | 1 |

2_| 200010 | HEMMINGER | 47500.00 | 2 |

3_| 000110 | LUCCHESI | 47400.00 | 3 |

4_| 000020 | THOMPSON | 42050.00 | 4 |

5_| 000050 | GEYER | 40975.00 | 5 |

6_| 000030 | KWAN | 39050.00 | 6 |

7_| 000070 | PULASKI | 36870.00 | 7 |

8_| 000060 | STERN | 32850.00 | 8 |

9_| 000220 | LUTZ | 30440.00 | 9 |

10_| 200220 | JOHN | 30440.00 | 9 |

11_| 000090 | HENDERSON | 30350.00 | 11 |

+-----------------------------------------------------------+

LUTZ & JOHN have same total salary therefore they get the same ranking

HENDERSON gets the rank based off of the number of rows that previously qualified the prior rank

Page 18: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 18

DENSE_RANK –Rank based on uniqueness

SELECT EMPNO, LASTNAME, SALARY+BONUS AS TOTAL_SALARY,

DENSE_RANK() OVER(ORDER BY SALARY+BONUS DESC) AS RANK_SALARY

FROM DSN8A10.EMP WHERE SALARY+BONUS > 30000 ORDER BY RANK_SALARY;

+-----------------------------------------------------------+

| EMPNO | LASTNAME | TOTAL_SALARY | RANK_SALARY |

+-----------------------------------------------------------+

1_| 000010 | HAAS | 53750.00 | 1 |

2_| 200010 | HEMMINGER | 47500.00 | 2 |

3_| 000110 | LUCCHESI | 47400.00 | 3 |

4_| 000020 | THOMPSON | 42050.00 | 4 |

5_| 000050 | GEYER | 40975.00 | 5 |

6_| 000030 | KWAN | 39050.00 | 6 |

7_| 000070 | PULASKI | 36870.00 | 7 |

8_| 000060 | STERN | 32850.00 | 8 |

9_| 000220 | LUTZ | 30440.00 | 9 |

10_| 200220 | JOHN | 30440.00 | 9 |

11_| 000090 | HENDERSON | 30350.00 | 10 |

+-----------------------------------------------------------+

HENDERSON gets a ranking based on the # of unique total salaries that precede it

Page 19: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 19

ROW_NUMBER - Sequential row number

SELECT EMPNO, LASTNAME, SALARY+BONUS AS TOTAL_SALARY,

ROW_NUMBER() OVER(ORDER BY SALARY+BONUS DESC) AS RANK_SALARY

FROM DSN8A10.EMP WHERE SALARY+BONUS > 30000 ORDER BY RANK_SALARY;

+-----------------------------------------------------------+

| EMPNO | LASTNAME | TOTAL_SALARY | RANK_SALARY |

+-----------------------------------------------------------+

1_| 000010 | HAAS | 53750.00 | 1 |

2_| 200010 | HEMMINGER | 47500.00 | 2 |

3_| 000110 | LUCCHESI | 47400.00 | 3 |

4_| 000020 | THOMPSON | 42050.00 | 4 |

5_| 000050 | GEYER | 40975.00 | 5 |

6_| 000030 | KWAN | 39050.00 | 6 |

7_| 000070 | PULASKI | 36870.00 | 7 |

8_| 000060 | STERN | 32850.00 | 8 |

9_| 000220 | LUTZ | 30440.00 | 9 |

10_| 200220 | JOHN | 30440.00 | 10 |

11_| 000090 | HENDERSON | 30350.00 | 11 |

+-----------------------------------------------------------+

Page 20: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 20

Moving AVG Example

SELECT Territory, Month, Sales,

AVG(Sales) OVER (PARTITION BY Territory

ORDER BY Month

ROWS 2 PRECEDING) AS Moving_Avg

FROM Sales_History;

Territory Month Sales

East 199810 10

East 199811 4

East 199812 10

East 199901 7

East 199902 10

West 199810 8

West 199811 12

West 199812 11

West 199901 7

West 199902 6

Aggregation group

Compute the average sales over the current month and the preceding two months, for each territory, and month

Page 21: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 21

Territory Month Sales Moving_Average

East 199810 10 10

East 199811 4 7

East 199812 10 8

East 199901 7 7

East 199902 10 9

West 199810 8 8

West 199811 12 10

West 199812 7 9

West 199901 11 10

West 199902 6 8

Moving averagein action

The results of the query illustrate the “moving average”.In the first row 10 is the average of only that row.The second row is the average of the first 2 rows 10 + 4.The third row and all of the following rows through to the end of the table are the average of the current row and the two preceding 10 + 4 + 10 divided by 3.

Note that the moving average resets on a change in territory starting at row 6 (change EAST to WEST).

Page 22: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 22

Moving SUM Example

SELECT Territory, Month, Sales,

SUM(Sales) OVER (PARTITION BY Territory

ORDER BY Month ASC

ROWS UNBOUNDED PRECEDING) AS Cumulative_Sum

FROM Sales_History;

Aggregation group

Page 23: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 23

Territory Month Sales Cumulative_Sum

East 199810 10 10

East 199811 4 14

East 199812 10 24

East 199901 7 31

East 199902 10 41

West 199810 8 8

West 199811 12 20

West 199812 7 27

West 199901 11 38

West 199902 6 44

Cumulative Sum in action

Page 24: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 24

Grouping Sets –GROUPING, ROLLUP, CUBE

• Grouping sets allow multiple groups to be specified

• Querying and reporting are made easier and faster by producing a single result from essentially performing a UNION ALL of two or more groups of rows

• The group-by-clause has been enhanced:– grouping-sets

• Allows multiple grouping clauses to be specified in a single statement

• Allows the groups to be computed with a single pass over the data

• Can be used to determine subtotals and grand totals

– super-groups

• Pre-defined grouping-sets : ROLLUP, CUBE

Page 25: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 25

ROLLUP• Creates subtotals that ‘roll up’ from the most detailed level to a

grand total

• N elements translate to N+1 grouping sets:

The order specified is significant to the result:

GROUP BY ROLLUP (C1,C2,C3)

GROUP BY GROUPING SETS ((C1,C2,C3),

(C1,C2),

(C1),

());

equivalent

GROUP BY ROLLUP (C1,C2)GROUP BY GROUPING SETS ((C1,C2),

(C1),

());

equivalent

GROUP BY ROLLUP (C2,C1)

GROUP BY GROUPING SETS ((C2,C1),

(C2),

());

equivalent

versus

Page 26: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 26

CUBE • Creates subtotals for all permutations

• N elements translate to 2n grouping sets:

• Unlike ROLLUP, the order specified is NOT significant to the result therefore use ORDER BY to guarantee the order

• grand-total

– the overall aggregation of previous subtotals

– Both CUBE and ROLLUP return this row as the last row

GROUP BY CUBE (C1,C2,C3)GROUP BY GROUPING SETS ((C1,C2,C3),

(C1,C2), (C1,C3), (C2,C3),

(C1), (C2), (C3),

());

equivalent

8 groups

3 elements

grand-total

Page 27: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 29

Grouping Sets on DSN8B10.PROJ

DEPTNO RESPEMP PRSTAFF

B01 000020 1.00

C01 000030 2.00

C01 000030 1.00

D01 000010 12.00

D01 000010 6.50

D11 000060 9.00

D11 000220 2.00

D11 000150 3.00

D11 000160 3.00

D21 000070 6.00

D21 000230 2.00

D21 000250 1.00

D21 000270 2.00

E01 000050 6.00

E01 000050 5.00

E11 000090 5.00

E21 000100 4.00

E21 000320 1.00

E21 000330 1.00

E21 000340 1.00

DEPTNO RESPEMP PRSTAFF

B01 000020 1.00

C01 000030 3.00

D01 000010 18.50

D11 000060 9.00

D11 000150 3.00

D11 000160 3.00

D11 000220 2.00

D21 000070 6.00

D21 000230 2.00

D21 000250 1.00

D21 000270 2.00

E01 000050 11.00

E11 000090 5.00

E21 000100 4.00

E21 000320 1.00

E21 000330 1.00

E21 000340 1.00

SELECT DEPTNO, RESPEMP,

SUM(PRSTAFF)

FROM DSN8B10.PROJ

GROUP BY DEPTNO,RESPEMP;

SELECT DEPTNO, RESPEMP,

SUM(PRSTAFF)

FROM DSN8B10.PROJ

GROUP BY

GROUPING SETS ((DEPTNO,RESPEMP));

17 rows after grouping20 rows originally

equivalent

Page 28: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 30

ROLLUP on DSN8B10.PROJ

B01 ? 1.00

C01 ? 3.00

D01 ? 18.50

D11 ? 17.00

D21 ? 11.00

E01 ? 11.00

E11 ? 5.00

E21 ? 7.00

? ? 73.50

DEPTNO RESPEMP PRSTAFF

B01 000020 1.00

C01 000030 3.00

D01 000010 18.50

D11 000060 9.00

D11 000150 3.00

D11 000160 3.00

D11 000220 2.00

D21 000070 6.00

D21 000230 2.00

D21 000250 1.00

D21 000270 2.00

E01 000050 11.00

E11 000090 5.00

E21 000100 4.00

E21 000320 1.00

E21 000330 1.00

E21 000340 1.00

SELECT DEPTNO, RESPEMP,

SUM(PRSTAFF)

FROM DSN8B10.PROJ

GROUP BY

ROLLUP (DEPTNO,RESPEMP);

SELECT DEPTNO, RESPEMP,

SUM(PRSTAFF)

FROM DSN8B10.PROJ

GROUP BY

GROUPING SETS ((DEPTNO,RESPEMP),

(DEPTNO),

());

26 rows after ROLLUP

Grand total of ALL groups

Subtotals for each

DEPTNO group

equivalent

Subtotals for each

DEPTNO, RESPEMP group

Page 29: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

SELECT DEPTNO, RESPEMP, SUM(PRSTAFF)

FROM DSN8B10.PROJ

GROUP BY

GROUPING SETS ((DEPTNO,RESPEMP),

(DEPTNO),

(RESPEMP),

());

CUBE on DSN8B10.PROJ

? 000010 18.50

? 000020 1.00

? 000030 3.00

? 000050 11.00

? 000060 9.00

? 000070 6.00

? 000090 5.00

? 000100 4.00

? 000150 3.00

? 000160 3.00

? 000220 2.00

? 000230 2.00

? 000250 1.00

? 000270 2.00

? 000320 1.00

? 000330

? 000340 1.00

? ? 73.50

DEPT.. RESP.. PR...

B01 000020 1.00

C01 000030 3.00

D01 000010 18.50

D11 000060 9.00

D11 000150 3.00

D11 000160 3.00

D11 000220 2.00

D21 000070 6.00

D21 000230 2.00

D21 000250 1.00

D21 000270 2.00

E01 000050 11.00

E11 000090 5.00

E21 000100 4.00

E21 000320 1.00

E21 000330 1.00

E21 000340 1.00

SELECT DEPTNO, RESPEMP, SUM(PRSTAFF)

FROM DSN8B10.PROJ

GROUP BY

CUBE (DEPTNO,RESPEMP);

equivalent

Grand total of ALL groups

B01 ? 1.00

C01 ? 3.00

D01 ? 18.50

D11 ? 17.00

D21 ? 11.00

E01 ? 11.00

E11 ? 5.00

E21 ? 7.00

Subtotals for each

DEPTNO group

Subtotals for

each RESPEMP

group

Page 30: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 32

PERCENTILE_CONT and PERCENTILE_DISC - Overview

• PERCENTILE_CONT and PERCENTILE_DISC are categorized as "inverse distribution functions" in the SQL Standard that compute a percentile value within a given group.

• MEDIAN, which we know intuitively as producing the "middle" value can be equivalently expressed as PERCENTILE_CONT(0.5), where the argument 0.5 indicates that we want the function to return the "middle" percentage value.

Page 31: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 33

PERCENTILE_CONT vs PERCENTILE_DISC vs MEDIAN

• PERCENTILE_CONT treats the group as a "continuous distribution", meaning that if the percentage falls between two values, the result is interpolated between those two values.

– For example, the median (PERCENTILE_CONT(0.5)) of a group of 6 numbers would be average of the 3rd and 4th number, because we find the interpolated middle value between those two.

• PERCENTILE_DISC treats the group as a set of discrete values. The result is not interpolated and is always one of the values in the group. PERCENTILE_DISC(0.5) over a group of 6 numbers would return the 3rd number.

• MEDIAN is equivalent with PERCENTILE_CONT(0.5).

Salary

2450

2975

1375

Salary

2450

2975

1250

PERCENTILE_CONT

PERCENTILE_DISC

VS.

Page 32: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

PERCENTILE_CONT and MEDIAN

• Suppose we would like to determine the salary, within each department, which is at the 50th percentile

Salary

2450

2975

1375

Result:

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY SALARY)

FROM EMP

GROUP BY DEPT_NO;

SELECT MEDIAN(SALARY)

FROM EMP

GROUP BY DEPT_NO;

NAME SALARY DEPT_NO

Miller 1300 10

Clark 2450 10

King 5002 10

Smith 800 20

Adams 1100 20

Jones 2975 20

Scott 3000 20

Ford 3000 20

James 950 30

Martin 1250 30

Ward 1250 30

Turner 1500 30

Allen 1600 30

Blake 2850 30

MEDIAN == PERCENTILE(0.5)

Page 33: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

PERCENTILE_DISC

Name Salary DEPT_NO

Miller 1300 10

Clark 2450 10

King 5002 10

Smith 800 20

Adams 1100 20

Jones 2975 20

Scott 3000 20

Ford 3000 20

James 950 30

Martin 1250 30

Ward 1250 30

Turner 1500 30

Allen 1600 30

Blake 2850 30

Salary

2450

2975

1250

Result:

Suppose we would like to determine the

actual salary value, within each department,

which is at the 50th percentile

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY SALARY)

FROM EMP

GROUP BY DEPT_NO;

Page 34: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 37

LISTAGG Overview

• The LISTAGG function is used to aggregate a set of string values into a single string

• The values are appended based on the order specified by the WITHIN GROUP clause

• The values may be separated by a separator

Page 35: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

LISTAGG ALL

DID EName EID EBYear

10 Jack 0012 1983

10 Lily 0015 1990

20 Tom 0019 1983

20 Bob 0022 1976

20 Frank 0004 1983

20 Tom 0014 1985

30 Jerry 0028 1991

30 Chris 0021 1981

30 Jill 0002 1984

30 Jerry 0031 1984

30 Allan 0006 1995

DID Name_List

10 Jack; Lily

20 Bob; Frank; Tom; Tom

30 Chris; Jill; Jerry; Jerry; Allan

Result:

Output employee names, in ascending order, under

the same department according to their employee ID

and birth year from the EMP table

SELECT Dept_ID,

ListAgg(ALL E_Name, ‘; ’) WITHIN GROUP (ORDR BY Birth, Emp_ID)

AS Name_List

FROM EMP

GROUP BY Dept_ID;

*To be delivered in the service stream

Page 36: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

LISTAGG DISTINCT

DID EName EID EBYear

10 Jack 0012 1983

10 Lily 0015 1990

20 Tom 0019 1983

20 Bob 0022 1976

20 Frank 0004 1983

20 Tom 0014 1985

30 Jerry 0028 1991

30 Chris 0021 1981

30 Jill 0002 1984

30 Jerry 0031 1984

30 Allan 0006 1995

DID Name_List

10 Jack; Lily

20 Bob; Frank; Tom

30 Chris; Jill; Jerry; Allan

Result:

Suppose we would like to output employee names, in ascending order, under the same department

according to their employee ID and birth year from the EMP table, eliminating any duplicate names

SELECT Dept_ID,

ListAgg(DISTINCT E_Name, ‘; ’) WITHIN GROUP (ORDR BY Birth, Emp_ID)

AS Name_List

FROM EMP

GROUP BY Dept_ID;

Page 37: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

© 2016 IBM Corporation9/25/2017World of Watson 2016 40

Summary • Bringing analytics to the data is generally more efficient than bringing data to analytics

– Especially true with ever growing amount of data

– Improve data governance, data quality, e2e operational reliance

• Db2 12 Optimizer enhancements focus on emerging SQL pattern

• OLAP BIFs in DB2 and IDAA

– Simple and efficient to find complex insight into data

Page 38: Db2 for z/OS is Serious About Analytics: Leveraging SQL · PDF fileDb2 for z/OS is Serious About Analytics: Leveraging SQL to Analyze Your Data ... CAST functions, scalar ... Db2 for

Maryela Weihrauch IBM

[email protected]

Title

Db2 for z/OS is Serious About Analytics:

Leveraging SQL to Analyze Your Data

Where it ResidesPlease fill out your

session evaluation before

leaving!