Recursive SQL

75
1 May 22, 2008 • 08:00 a.m. – 09:00 a.m. Platform: DB2 for z/OS Suresh Sane DST Systems Inc. Recursive SQL – Unleash the Power! Session: G13 Recursive SQL is one of the most fascinating and powerful (and dangerous!) features offered in DB2 for z/OS Version 8. In this session, we will introduce the feature and show numerous examples of how it can be used to achieve things you would not have imagined being possible with SQL – all in one SQL statement! Fasten your seat belts and come join us in this exciting journey!

Transcript of Recursive SQL

Page 1: Recursive SQL

1

May 22, 2008 • 08:00 a.m. – 09:00 a.m.Platform: DB2 for z/OS

Suresh SaneDST Systems Inc.

Recursive SQL –Unleash the Power!

Session: G13

Recursive SQL is one of the most fascinating and powerful (and dangerous!) features offered in DB2 for z/OS Version 8. In this session, we will introduce the feature and show numerous examples of how it can be used to achieve things you would not have imagined being possible with SQL – all in one SQL statement! Fasten your seat belts and come join us in this exciting journey!

Page 2: Recursive SQL

2

1. Recursion basics 2. Case studies - mathematical3. Case studies - business 4. Performance aspects 5. Pitfalls and recommendations

2

Session Outline

Recursion basics•Theory and introduction

Case studies - mathematical•String of numbers•Factorial•Primes•Fibonacci Series

Case studies - business•Org chart•Generating test data•Missing data•Rollup•Allocation•Weighted allocation•RI children•Cheapest fare•Account linking

Performance aspects•Comparison to procedural logic

•Org chart •Rollup •Allocate

Pitfalls and recommendations•Best practices

Page 3: Recursive SQL

3

♦ Co-author-IBM Redbooks SG24-6418, May 2002SG24-7083, March 2004SG24-7111, July 2006

♦ Educational seminars and presentations at IDUG North America, Asia Pacific, Canada and Europe

♦ IDUG Solutions Journal article – Winter 2000♦ Numerous DB2 courses at various locations♦ IBM Certified Solutions Expert for both platforms for

Application Development and Database Administration

Suresh Sane

3

About the Instructor

Suresh Sane works as a Database Architect at DST Systems in Kansas City, MO and manages the Database Consulting Group, which provides strategic direction for deployment of database technology at DST and has overall responsibility for the DB2 curriculum for about 1,500 technical associates.

Contact Information:

[email protected]

Suresh SaneDST Systems, Inc.1055 BroadwayKansas City, MO 64105USA

(816) 435-3803

Page 4: Recursive SQL

4

4

http://www.dstsystems.com

About DST Systems

♦Leading provider of computer software solutions and services, NYSE listed – “DST”♦Revenue $2.24 billion♦110 million+ shareowner accounts

♦24,000 MIPS♦150 TB DASD♦145,000 workstations♦462,000 DB2 objects♦Non-mainframe: 600 servers (DB2, Oracle, Sybase) with 2.3 million objects

If you have ever invested in a mutual fund, have had a prescription filled, or are a cable or satellite television subscriber, you may have already had dealings with our company.

DST Systems, Inc. is a publicly traded company (NYSE: DST) with headquarters in Kansas City, MO. Founded in 1969, it employs about 12,000 associates domestically and internationally.

The three operating segments - Financial Services, Output Solutions and Customer Management - are further enhanced by DST’s advanced technology and e-commerce solutions.

Page 5: Recursive SQL

5

5

It’s cool but I will never use it…

“…400,000 customers in a hierarchy of about 4,000 nodesresulting in over 2.5 million queries, the application took daysto calculate total sales.”“ …a single recursive query processed …in about 5 minutes”

Dan Luksetich, IDUG Solutions Journal, May 2004.

♦ Of pure academic interest …”cool stuff” but♦ … does it have any business value? ♦ THINK AGAIN!

See ref #7 for details on this article. I have to admit my first reaction was that this feature was “cool” but of little business value. Dan’s article provoked my interest in this fascinating area of SQL.

Page 6: Recursive SQL

6

6

1. Recursion basics 2. Case studies - mathematical3. Case studies - business 4. Performance aspects 5. Pitfalls and recommendations

Where Are We?Where Are We?

We will begin with a simple explanation which introduces the concept of recursion in general (not just for SQL).

Page 7: Recursive SQL

7

7

Recursion Simplified

What is 4! ?

It is 4 * 3!

But what is 3! ?

It is 3 * 2!

But what is 2! ?

It is 2 * 1!

But what is 1! ? 1

2*1

2

3*2

6

4*6

24

A simple example to illustrate the concept.

Page 8: Recursive SQL

8

8

Case 1 – String of Numbers

WITH NUMBERS (LEVEL, NEXTONE) AS(

SELECT 1, 1 FROM SYSIBM.SYSDUMMY1UNION ALL SELECT LEVEL + 1, LEVEL + 1 FROM NUMBERS WHERE LEVEL < 100

)SELECT NEXTONE FROM NUMBERS ORDER BY NEXTONE

1 2 3 4 5

Prime

Pumpmore

CTE

Use the result

Coloring scheme used throughout thispresentation to denote each of the 4 parts

What makesthis recursive

The best way to introduce recursive SQL? Look at a simple example. Let’s dive right in.

The definition – WITH NUMBERS – is an example of a Common Table Expression (CTE) which is similar to the Nested Table Expression (NTE) most of us already familiar with. It is required for using recursive SQL.

Also note the coloring scheme used consistently throughout this presentation to denote the 4 parts of the SQL

Page 9: Recursive SQL

9

9

Recursion Basics

♦ A technique where a function calls itself to perform some part of the task

♦ Requires:An initialization select (“priming the pump”)An iterative select (“pump more”) – from the previous, how do I get the next one?A main select (“use the result”)

♦ Must use a common table expression (CTE)♦ Allows for looping “procedural” logic – all in one statement!

DO WHILE, REPEAT etc from SQL Procedures can sometimes be replaced by one SQL statement

Some of the basics. The “pump” terminology is borrowed from Tink Tysor (see ref #1). Tink provides very thorough introduction to this fascinating topic.

Page 10: Recursive SQL

10

10

Rules for CTE

♦ First full select of first UNION must not reference the CTE♦ All selects within CTE cannot use DISTINCT

This is a major limitation when cycles are present –DISTINCT on the outer query is expensiveNeed an ability to specify DISTINCT without the level

♦ All selects within CTE cannot use GROUP BY or HAVING♦ Include only 1 reference to the CTE♦ Initialization select and Iterative select columns must match (data

types, lengths, CCSIDs)♦ UNION must be a UNION ALL♦ Outer joins cannot be part of any recursion cycle♦ Subquery cannot be part of any recursion cycle

Some of the restrictions on what can be coded within the common table expression (CTE).

Page 11: Recursive SQL

11

Where Are We?

11

1. Recursion basics 2. Case studies - mathematical 3. Case studies - business 4. Performance aspects 5. Pitfalls and recommendations

Let us look at some mathematical applications starting with some simple examples. We will look at Factorial, Generating primes and Fibonacci series.

If mathematics scares you, do not get discouraged – these are actually easier to illustrate the concept. We will build on this foundation and cover several business cases in the next section.

Page 12: Recursive SQL

12

12

Case 2 – Factorial

WITH NUMBERS (LEVEL, FACTO) AS(

SELECT 1, 1 FROM SYSIBM.SYSDUMMY1UNION ALL SELECT LEVEL + 1, FACTO * (LEVEL + 1) FROM NUMBERS WHERE LEVEL < 12

)SELECT LEVEL AS NUMBER, FACTO AS FACTORIAL FROM NUMBERS ORDER BY LEVEL

1 1 2 2 3 6 4 24 5 120

Just a little bit more complex – we simply generate the numbers and report them in the “use the results” section of the SQL.

Page 13: Recursive SQL

13

WITH NUMBERS (LEVEL, PRIME) AS ( SELECT 1, 1

FROM SYSIBM.SYSDUMMY1UNION ALL SELECT LEVEL + 1, LEVEL + 1 FROM NUMBERS WHERE LEVEL < 5000 )

13

Case 3 – Generating Primes - SQL

SQL Continued on next slide

SQL to generate prime numbers.

Page 14: Recursive SQL

14

SELECT X.PRIME FROM (SELECT NUMBERS.LEVEL,

NUMBERS.PRIME FROM NUMBERS ) AS X WHERE NOT EXISTS

(SELECT 1 FROM NUMBERS Y WHERE Y.PRIME BETWEEN 2 AND SQRT(X.PRIME) AND MOD(X.PRIME, Y.PRIME) = 0)

ORDER BY X.PRIME

14

Case 3 – Generating Primes – SQL (cont)

12357

11

4689

10

Prime

Non-prime

Has a factor

Actual SQL to generate prime numbers, continued.

Note that we can stop checking for factors once we cross SQRT of that number (a big gain in performance – for example, instead of checking 5000, we can stop at 70).

On a theoretical note – the answer to the question “Is 1 a prime?” depends on the definition. Let’s leave that discussion to Math forums. Here, I assume it is a prime.

Page 15: Recursive SQL

15

15

Case 4 – Fibonacci Series

♦ The first two numbers in the series are one and one. ♦ To obtain each number of the series, you simply add the two

numbers that came before it. In other words, each number of the series is the sum of the two numbers preceding it.

1 1 2 3 5 8 13 21The Parthenon with the “Golden ratio”

The Fibonacci Spiral

Some historical background information about Fibonacci Series (made popular by the recent “Da Vinci Code”):

•A sequence of numbers first created by Leonardo Fibonacci in 1202.

•It is a deceptively simple series, but its ramifications and applications are nearly limitless.

•The number 1.618..., or Phi, is the ratio of the next number to the previous number. Called the “Golden ratio”, it has implications in art, architecture and numerous other disciplines.

Page 16: Recursive SQL

16

WITH NUMBERS (LEVEL, NEXTNUM, TOTAL) AS(SELECT 1, 1, 1 FROM SYSIBM.SYSDUMMY1UNION ALL SELECT LEVEL + 1

, (NEXTNUM + TOTAL) , (NEXTNUM + TOTAL + TOTAL)

FROM NUMBERS WHERE LEVEL = LEVEL AND LEVEL <= 22 )

16

Case 4 – Fibonacci Series - SQL

SQL Continued on next slide

Level Nextnum Total

1 1 12 2 33 5 84 13 215 34 55…

SQL to generate Fibonacci series.

Page 17: Recursive SQL

17

SELECT A.NEXTNUM FROM NUMBERS A WHERE A.LEVEL <= 22 UNION ALL SELECT B.TOTAL FROM NUMBERS B WHERE B.LEVEL <= 22 ORDER BY 1

17

Case 4 – Fibonacci Series – SQL (cont)

1 1 2 3 5 8

13213455..

Series

Level Nextnum Total

1 1 12 2 33 5 84 13 215 34 55…

Actual SQL to generate Fibonacci series, continued.

Page 18: Recursive SQL

18

18

1. Recursion basics 2. Case studies - mathematical 3. Case studies - business 4. Performance aspects 5. Pitfalls and recommendations

Where Are We?Where Are We?

Now let us start to explore recursive SQL for business applications.

Page 19: Recursive SQL

19

19

Case studies - Business

♦ Case 5 – Org chart♦ Case 6 – Generating test data♦ Case 7 – Missing data♦ Case 8 – Rollup♦ Case 9 – Even allocation♦ Case 10 – Weighted allocation♦ Case 11– RI children♦ Case 12 – Cheapest fare♦ Case 13 – Account linking

A list of case studies we explore in this section.

Page 20: Recursive SQL

20

20

Case 5 - Org Chart

1-Wolf

2-Sontag 3-Truscott 4-Rodwell

5-Hamman 6-Jacoby 7-Wei 8-Kelsey 9-Reese

10-Schapiro

Case of how recursive SQL can be used effectively for traversing hierarchical structures.

Page 21: Recursive SQL

21

21

Case 5 – RECASE05 Table

EMPID EMPNAME MGRID

1 Wolf -2 Sontag 13 Truscott 14 Rodwell 15 Hamman 26 Jacoby 27 Wei 38 Kelsey 49 Reese 410 Schapiro 9

RECASE05

Contents of the table RECASE05 that defines the hierarchy.

Page 22: Recursive SQL

22

22

Case 5 - SQL

WITH OC (LEVEL, MGRID, MGRNAME, EMPID, EMPNAME) AS ( SELECT 0, 0, ' ', EMPID, EMPNAME FROM RECASE05 WHERE MGRID IS NULLUNION ALL SELECT BOSS.LEVEL + 1, SUB.MGRID, BOSS.EMPNAME , SUB.EMPID, SUB.EMPNAME FROM OC BOSS, RECASE05 SUB WHERE BOSS.EMPID = SUB.MGRID AND BOSS.LEVEL < 5 )SELECT OC.LEVEL, OC.MGRNAME, OC.EMPNAME FROM OC WHERE LEVEL > 0 ORDER BY OC.LEVEL , OC.MGRID, OC.EMPID

My direct reports

Actual SQL to generate the org chart.

The initialization select (WHERE MGRID IS NULL) could have been written instead as: (WHERE EMPID = 1) .

Page 23: Recursive SQL

23

23

Case 5 - Intermediate Result

LEVEL MGRID MGRNAME EMPID EMPNAME

0 0 NULL 1 WOLF

1 1 WOLF 2 SONTAG1 1 WOLF 3 TRUSCOTT1 1 WOLF 4 RODWELL2 2 SONTAG 5 HAMMAN2 2 SONTAG 6 JACOBY 2 3 TRUSCOTT 7 WEI

2 4 RODWELL 8 KELSEY2 4 RODWELL 9 REESE 3 9 REESE 10 SCHAPIRO

Common Table ExpressionOC

Contents of the common table expression OC as the SQL is executed.

Page 24: Recursive SQL

24

24

Case 5 - Result

LEVEL MGRNAME EMPNAME1 WOLF SONTAG 1 WOLF TRUSCOTT 1 WOLF RODWELL 2 SONTAG HAMMAN 2 SONTAG JACOBY 2 TRUSCOTT WEI 2 RODWELL KELSEY 2 RODWELL REESE 3 REESE SCHAPIRO

… and the result.

Page 25: Recursive SQL

25

25

Case 6 – Generating Test Data

EMPID SMALLINT 1 thru 10000FNAME CHAR(20) 3-7 charsLNAME CHAR(20) 3-10 charsSALARY DEC(7,2) 1000 to 5000HIREDATE DATE 21 years thru 1 year

RECASE06

Table structure for a table containing various data types which needs to be populated with test data.

If you needed 10,000 rows of test data on this table you would use a table editor (or SPUFI) to insert some rows and repeat them. How “random” would this data really be? In real life, not really random at all. This technique allows you to do so quite easily.

Page 26: Recursive SQL

26

26

Case 6 – Generating Test Data - SQL

INSERT INTO RECASE06 WITH NUMBERS (LEVEL, NEXTONE) AS

(SELECT 1, 1 FROM SYSIBM.SYSDUMMY1UNION ALL SELECT LEVEL + 1, LEVEL + 1 FROM NUMBERS WHERE LEVEL < 10 )

SELECT INTEGER(ROUND(RAND()*9999,0)) + 1, LEFT(SUBSTR('BCDFGHJKLMNPRSTVWZ',

INTEGER(ROUND(RAND()*17,0))+1, 1)CONCAT SUBSTR('AEIOUY', INTEGER(ROUND(RAND()*5,0))+1, 1)<<< REPEAT 5 TIMES>>> , INTEGER(ROUND(RAND()*4,0)) + 3)

1 thru 10,000

5 sets of letters +vowels

Min 3, max 7SQL Continued on next slide

SQL used for this purpose.

Page 27: Recursive SQL

27

27

Case 6 – Generating Test Data – SQL (cont)

, LEFT(SUBSTR('BCDFGHJKLMNPRSTVWZ', INTEGER(ROUND(RAND()*17,0))+1, 1)CONCAT SUBSTR('AEIOUY', INTEGER(ROUND(RAND()*5,0))+1, 1)

<<< REPEAT 5 TIMES>>> , INTEGER(ROUND(RAND()*7,0)) + 3)

, DECIMAL((1000.00 + RAND()*4000),7,2) , CURRENT DATE - 1 YEAR

- INTEGER(20*365*RAND()) DAYS FROM NUMBERS

Same for Last name

Min 1000, max 5000

1 year thru21 years

Min 3, max 10

SQL continued.

Page 28: Recursive SQL

28

28

Case 6 – Generating Test Data - Result

EMPID FNAME LNAME SALARY HIREDATE

1369 ZITO RYDI 2250.93 2003-02-032714 KUMU PUDIN 3112.80 1997-02-023485 FOCAC VOGYNO 1155.08 1991-09-115351 CINIFI DELOV 2651.32 2003-02-166297 CIKEZ TAGINIMUW 1252.73 1990-10-296474 COLO VOLECECU 2948.95 1993-01-286629 LIHU DACACOMILY 4721.00 1989-06-078463 VIVO VOLEZO 1634.31 2002-01-158494 FALIHED SUPO 1542.59 1988-06-068787 MUDOD SIC 3874.86 1999-06-04…

…and the result from one of my test runs (we come up with some interesting names!). It could be adjusted to reflect the regional demographics.

Page 29: Recursive SQL

29

EMPID FRIDAY_DATE HOURS1 2005-03-04 41.01 2005-03-11 42.01 2005-03-18 43.01 2005-03-25 44.02 2005-03-04 39.02 2005-03-18 46.0

EMPID EMPNAME1. Alan Sontag 2. Bobby Wolf 3. Dorothy Truscott

29

Case 7 – Missing Data Non-recursive

EX7EMPL

EX7TIME

For the month, notice that empid 3 has not reported any time; 2 has entered only partially.

A simple case involving a set of two tables with time reported by week. Some employees may fail to report their time.

Page 30: Recursive SQL

30

EMPNAME FRIDAY_DATE TOTHOURSAlan Sontag 2005-03-04 41.0 Alan Sontag 2005-03-11 42.0 Alan Sontag 2005-03-18 43.0 Alan Sontag 2005-03-25 44.0 Bobby Wolf 2005-03-04 39.0 Bobby Wolf 2005-03-11 0.0 Bobby Wolf 2005-03-18 46.0 Bobby Wolf 2005-03-25 0.0 Dorothy Truscott 2005-03-04 0.0 Dorothy Truscott 2005-03-11 0.0 Dorothy Truscott 2005-03-18 0.0 Dorothy Truscott 2005-03-25 0.0

30

Case 7 – Missing Data Non-recursive

Required Report

List all employees and the reported time for March 2005. If an employee failed to report time, show it as zeroes.

We need a report that includes any missing information. Since we are dealing with optional data, our first thought would be a Left Outer Join.

However, how do you create a “left” table that has all the data when rows themselves are missing? See next slide to see how we can generate such a “left” table.

Page 31: Recursive SQL

31

SELECT CART.EMPNAME, CART.FRIDAY_DATE, SUM(COALESCE(TIME.HOURS,0.0)) AS TOTHOURS

FROM ( SELECT EMPL.EMPID, EMPL.EMPNAME,

FRIDAYS.FRIDAY_DATE FROM EX7EMPL EMPL Employees

INNER JOIN All Fridays( SELECT DISTINCT FRIDAY_DATE FROM EX7TIME WHERE FRIDAY_DATE BETWEEN

'2005-03-01' AND '2005-03-31‘) AS FRIDAYS ON 1=1 Cartesian Product) AS CART

31

Case 7 – Missing Data Non-recursive

SQL Continued on next slide

CART is a nested table expression (NTE) of all employee-Fridays combinations that is such a “left” table, mentioned in the previous slide.

Page 32: Recursive SQL

32

32

Case 7 – Missing Data Non-recursive (cont)

LEFT OUTER JOIN EX7TIME TIME … and their time if anyON CART.EMPID = TIME.EMPID AND CART.FRIDAY_DATE = TIME.FRIDAY_DATE GROUP BY

CART.EMPNAME , CART.FRIDAY_DATE

ORDER BY CART.EMPNAME

, CART.FRIDAY_DATE

We now perform a Left Outer Join of CART with the TIME table to show the time reported, if any.

Notice that if all of them cooperate and no one enters the time, this query will fail!

Page 33: Recursive SQL

33

33

Case 7 – Missing Data - Recursive

WITH DATES (LEVEL, NEXTDAY) AS(

SELECT 1, DATE('2005-03-01') FROM SYSIBM.SYSDUMMY1UNION ALL SELECT LEVEL + 1, NEXTDAY + 1 DAYFROM DATES WHERE LEVEL < 32

)

SQL Continued on next slide

An alternative solution to the same problem using recursive SQL.

Also note that unlike the previous solution that requires at least one employee to enter the time, this solution works irrespective of who has entered the time.

Page 34: Recursive SQL

34

34

Case 7 – Missing Data – Recursive (cont)

SELECT EMPL.EMPNAME , DATES.NEXTDAY AS FRIDAY_DATE , COALESCE(TIME.HOURS,0.0) AS TOTHOURS

FROM DATES INNER JOIN

EX7EMPL EMPL ON 1 = 1 LEFT OUTER JOIN

EX7TIME TIME ON DATES.NEXTDAY = TIME.FRIDAY_DATE AND EMPL.EMPID = TIME.EMPID WHERE NEXTDAY BETWEEN '2005-03-01' AND '2005-03-31'AND DAYOFWEEK(NEXTDAY) = 6 ORDER BY EMPL.EMPNAME

, DATES.NEXTDAY

Cartesian Product

is a Friday

SQL to generate the missing data, continued.

Alternatively, we could generate just the Friday dates by pushing these constructs into the CTE itself. However, care must be taken to start with a Friday (2005-03-04), otherwise the initial select adds zero rows to the CTE. You must also increment by 7 since first non-Friday will terminate the loop (this is left as an exercise for the reader).

Page 35: Recursive SQL

35

35

Case 8 – Rollup

1-CORP(500)

2-I.T.(400)

3-H.R.(100)

4-DB SVC(1000)

5-PROG(5000)

6-EMPL REL(50)

7-BENEFITS(30)

Case of a hierarchy where the salary amounts must be rolled up to the higher costs center (e.g. Database Services and Programming are to be rolled up to the IT salary budget).

Page 36: Recursive SQL

36

36

Case 8 – Rollup SQL

WITH ROLLUP (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, TOTAL_BUDGET) AS

(SELECT ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BUDGET_AMTFROM RECASE08

UNION ALL SELECT A.ACCT_NUM, A.ACCT_NAME,

A.PARENT_ACCT_NUM, B.TOTAL_BUDGETFROM RECASE08 A

, ROLLUP B WHERE A.ACCT_NUM = B.PARENT_ACCT_NUM AND B.PARENT_ACCT_NUM IS NOT NULL )

SQL Continued on next slide

me

and my boss

SQL used for this purpose.

The table RECCASE09 contains:

ACCT_NUM INTEGER NOT NULL (PK)ACCT_NAME CHAR(20) NOT NULLPARENT_ACCT_NUM INTEGER BUDGET_AMT DEC(9,2) NOT NULL

Page 37: Recursive SQL

37

37

Case 8 – Rollup SQL (cont)

SELECT ACCT_NUM, ACCT_NAME, SUM(TOTAL_BUDGET) AS TOTAL_BUDGET

FROM ROLLUP GROUP BY ACCT_NUM, ACCT_NAME ORDER BY ACCT_NUM

SQL continued.

Page 38: Recursive SQL

38

ACCT_NUM ACCT_NAME TOTAL_BUDGET

1 CORP 5002 I.T. 4003 H.R. 1004 DB SVC 10005 PROG 50006 EMPL REL 507 BENEFITS 30

38

Case 8 – Rollup Result

18064007080

… and the result.

Page 39: Recursive SQL

39

39

Case 9 – Even Allocation

1-CORP(6000)

2-I.T.(4000)

3-H.R.(1000)

4-DB SVC(5000)

5-PROG(1000)

6-EMPL REL(100)

7-BENEFITS(50)

Case of a hierarchy where bonus amounts are to be allocated evenly across all subordinates. For example, the $6,000 for DST is to be split into 2 parts ($3,000 each) for IT and HR. We must allocate from top down.

Page 40: Recursive SQL

40

40

Case 9 – Even Allocation SQL

WITH ALLOC (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, TOTAL_BONUS) AS(SELECT ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BONUS_AMT FROM RECASE10UNION ALL SELECT A.ACCT_NUM, A.ACCT_NAME, A.PARENT_ACCT_NUM , B.TOTAL_BONUS /

(SELECT COUNT(*) FROM RECASE10 X WHERE X.PARENT_ACCT_NUM =

A.PARENT_ACCT_NUM) FROM RECASE10 A

, ALLOC B WHERE B.ACCT_NUM = A.PARENT_ACCT_NUM)

DivideEvenly

SQL Continued on next slide

me

and mychildren

SQL used for this purpose.

The table RECCASE10 contains:

ACCT_NUM INTEGER NOT NULL (PK)ACCT_NAME CHAR(20) NOT NULL PARENT_ACCT_NUM INTEGER BONUS_AMT DEC(9,2) NOT NULL

Page 41: Recursive SQL

41

41

Case 9 – Even Allocation SQL (cont)

SELECT ACCT_NUM, ACCT_NAME, SUM(TOTAL_BONUS) FROM ALLOC GROUP BY ACCT_NUM, ACCT_NAME ORDER BY ACCT_NUM

SQL continued.

Page 42: Recursive SQL

42

ACCT_NUM ACCT_NAME TOTAL_BONUS

1 CORP 60002 I.T. 40003 H.R. 10004 DB SVC 50005 PROG 10006 EMPL REL 1007 BENEFITS 50

42

Case 9 – Even Allocation Result

850040007000

450021002050

…and the result.

Page 43: Recursive SQL

43

43

Case 10 – Weighted Allocation

1-CORP(5000,R=null)

2-I.T.(4000,R=4)

3-H.R.(1000,R=1)

5-DB SVC(5000,R=5)

6-PROG(1000,R=3)

7-EMPL REL(100,R=3)

8-BENEFITS(50,R=2)

Similar hierarchy as before but this time, we need to accomplish a weighted allocation based on rating. For example the $6,000 bonus for CORP is to be divided using the ratings of 5 and 1 for IT and HR - IT will obtain 4 / (4+1) = 4/5 of the $5,000. As before, we must process top down.

Page 44: Recursive SQL

44

44

Case 10 – Weighted Allocation SQL

WITH ALLOC (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, TOTAL_BONUS) AS(SELECT ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BONUS_AMT FROM RECASE11UNION ALL SELECT A.ACCT_NUM, A.ACCT_NAME, A.PARENT_ACCT_NUM , B.TOTAL_BONUS * A.RATING / (SELECT SUM(RATING) FROM RECASE11 X

WHERE X.PARENT_ACCT_NUM = A.PARENT_ACCT_NUM) FROM RECASE11 A

, ALLOC B WHERE B.ACCT_NUM = A.PARENT_ACCT_NUM )

DivideBasedOn Rating

SQL Continued on next slide

My children

SQL for this purpose.

The table RECCASE11 contains:

ACCT_NUM INTEGER NOT NULL (PK)ACCT_NAME CHAR(20) NOT NULL RATING SMALLINT PARENT_ACCT_NUM INTEGER BONUS_AMT DEC(9,2) NOT NULL

Page 45: Recursive SQL

45

45

Case 10 – Weighted Allocation SQL (cont)

SELECT ACCT_NUM, ACCT_NAME, SUM(TOTAL_BONUS)FROM ALLOC GROUP BY ACCT_NUM, ACCT_NAME ORDER BY ACCT_NUM

SQL continued.

Page 46: Recursive SQL

46

ACCT_NUM ACCT_NAME TOTAL_BONUS

1 CORP 50002 I.T. 40003 H.R. 10004 DB SVC 50005 PROG 10006 EMPL REL 1007 BENEFITS 50

46

Case 10 – Weighted Allocation Result

1000020008000

40001300850

41

53

32

… and the result.

Page 47: Recursive SQL

47

47

Case 11 – RI Children

A

B C D

E F G H I

J

An example of a hierarchy that show the referential integrity (RI) implemented using foreign keys. These relationships are visible in the catalog table SYSIBM.SYSRELS.

Notice the recursive relationship between B and itself, as well as the circular definition of constraints from D to H to J and back to D.

Page 48: Recursive SQL

48

48

Case 11 – RI Children SQL

WITH OC (LEVEL, PARENTTAB, CHILDTAB) AS (SELECT 0 , 'Z', ‘your start table'

FROM SYSIBM.SYSDUMMYUUNION ALL SELECT PARENT.LEVEL + 1, SUBSTR(CHILD.REFTBNAME,1,1) , SUBSTR(CHILD.TBNAME,1,1) FROM OC PARENT, SYSIBM.SYSRELS CHILD WHERE PARENT.CHILDTAB = CHILD.REFTBNAME AND CHILD.CREATOR = …AND CHILD.REFTBCREATOR = …AND CHILD.REFTBNAME <> CHILD.TBNAME AND PARENT.LEVEL < 10 )

Unicodein catalog

EliminateSelf-ref

SQL Continued on next slide

my children

SQL used for this purpose.

Page 49: Recursive SQL

49

49

Case 11 – RI Children SQL (cont)

SELECT DISTINCT OC.PARENTTAB, OC.CHILDTABFROM OC WHERE OC.LEVEL > 0 ORDER BY OC.PARENTTAB, OC.CHILDTAB

SQL continued.

Notice that the CTE generates much more data than necessary (e.g. node D is revisited multiple times) and the DISTINCT is needed to eliminate the duplicates. A very handy SQL enhancement would be for DB2 to allow us to specify DISTINCT (without level) within the CTE itself. V(future)) perhaps?

Page 50: Recursive SQL

50

PATRENTTAB CHILDTAB

A BA CA DB EB FC GC JD HD IH JJ D

50

Case 11 – RI Children Result

… and the result.

Page 51: Recursive SQL

51

51

Case 12 – Cheapest Fare

MCI

ORD STL

IAH DFW

Start City

EndCity

200100

100

50

25

50

150

500

Chicago

Kansas City

Houston

Dallas

St Louis

A case study to list and evaluate multiple paths.

Page 52: Recursive SQL

52

52

Case 12 – RECASE12 Table

FROM_CITY TO_CITY FAREMCI ORD 100MCI CDG 200MCI DFW 500ORD STL 50ORD IAH 100STL IAH 25STL DFW 150IAH DFW 50

RECASE12

Contents of the table RECASE12 that defines the available routes and the associated fares.

Page 53: Recursive SQL

53

53

Case 12 – Cheapest Fare SQL

WITH ROUTING (LEVEL, FROM_CITY, TO_CITY, CHAIN, FARE) AS( SELECT 0

, CAST('-->' AS CHAR(3) ) , CAST('MCI' AS CHAR(3) ) , CAST('-->MCI' AS VARCHAR(60) ) , 0

FROM SYSIBM.SYSDUMMY1 UNION ALL SELECT PARENT.LEVEL + 1 , CAST(SUBSTR(CHILD.FROM_CITY,1,3) AS CHAR(3) ), CAST(SUBSTR(CHILD.TO_CITY,1,3) AS CHAR(3) ) , PARENT.CHAIN CONCAT '->' CONCAT

CAST(SUBSTR(CHILD.TO_CITY,1,3) AS CHAR(3) ) , PARENT.FARE + CHILD.FARE FROM ROUTING PARENT, RECASE12 CHILD WHERE PARENT.TO_CITY = CHILD.FROM_CITY AND PARENT.LEVEL < 10)

SQL Continued on next slide

Builditinerary

Add fareFrom here to?

SQL used for this purpose.

Page 54: Recursive SQL

54

54

Case 12 – Cheapest Fare SQL (cont)

SELECT DISTINCT SUBSTR(ROUTING.CHAIN,4,56) AS ROUTE

, ROUTING.FARE FROM ROUTING WHERE ROUTING.LEVEL > 0 AND ROUTING.CHAIN LIKE '-->MCI%' AND ROUTING.CHAIN LIKE '%DFW' ORDER BY ROUTE

SQL continued.

The length of the generated path (60 bytes, of which 56 are reported) is arbitrary and truncated here to make the output readable.

Page 55: Recursive SQL

55

ROUTE FARE

MCI->DFW 500MCI->ORD->IAH->DFW 250MCI->ORD->STL->DFW 300MCI->ORD->STL->IAH->DFW 225MCI->STL->DFW 350MCI->STL->IAH->DFW 275

55

Case 12 – Cheapest Fare Result

… and the result.

A further enhancement (including the flight start and end days/times) would allow us to select connecting flights (e.g. next one must leave 1 hour to 4 hours from the arrival of the previous flight etc) and prepare a true itinerary like Travelocity or other search engines (… all with no pop-up ads!!).

Page 56: Recursive SQL

56

56

Case 13 – Account Linking

30,80

Start

4

80,1207

10,20,30 1

20,60,70 310,40,50 2

50,60,110 640,90

5

90,100 8 110,130 9

14010

140,15011

customeraccount

A case study to group accounts that share at least one customer. Such linked accounts may receive favorable pricing.

Page 57: Recursive SQL

57

57

Case 13 – Account Linking SQL

WITH LINKING (LEVEL, ACCT_NUM, CUST_NUM) AS( SELECT 0 , ACCT_NUM , CUST_NUM FROM RECASE13 WHERE ACCT_NUM = 4UNION ALL SELECT PARENT.LEVEL + 1 , CHILD2.ACCT_NUM , CHILD2.CUST_NUM FROM LINKING PARENT

, RECASE13 CHILD1 , RECASE13 CHILD2

WHERE PARENT.CUST_NUM = CHILD1.CUST_NUM AND PARENT.ACCT_NUM <> CHILD1.ACCT_NUM AND CHILD1.ACCT_NUM = CHILD2.ACCT_NUM AND PARENT.LEVEL < 5)

SELECT DISTINCT ACCT_NUM FROM LINKING

Starting acct num

Link by customer

Diff acctAll customers for that acct

SQL used for this purpose.

Page 58: Recursive SQL

58

58

1. Recursion basics 2. Case studies - mathematical 3. Case studies - business 4. Performance aspects 5. Pitfalls and recommendations

Where Are We?Where Are We?

OK, it is complex and compact but from a performance perspective, how does recursive SQL look? We will explore this issue by comparing it with procedural logic (COBOL program) that accomplishes the same tasks.

Page 59: Recursive SQL

59

59

Hierarchy

Level 0 : 1Level 1 : 4Level 2 : 16Level 3 : 64Level 4 : 256Level 5 : 1,024Level 6 : 4,096Level 7 : 16,384Level 8 : 65,536

(total 87,381 nodes)

Level 0

Level 1

Level 2

Level 8…..

The hierarchical structure used for the case study – each node with 4 children, 8 level deep.

Page 60: Recursive SQL

60

60

Table Structure

…ACCT_NUM INTEGERACCT_NAME CHAR(20)PARENT_ACCT_NUM INTEGERAMOUNT DEC(15,2)ACCT_LEVEL INTEGER

RECASE14

Index K0 (uniq)

Index K1

Index K2

Derived column

Table structure and the indexes available.

Page 61: Recursive SQL

61

61

Procedural Logic Setting Level

SELECT C.PARENT_ACCT_NUM , C.ACCT_NUM FROM RECASE14 P

, RECASE14 C WHERE P.ACCT_LEVEL = :WS-LOOP-LEVEL AND P.ACCT_NUM = C.PARENT_ACCT_NUMORDER BY PARENT_ACCT_NUM

, ACCT_NUM

SET LEVEL = 0 FOR ROOTFOR EACH LEVEL (0 TO 7 BY 1)

FOR EACH ROW OF CHILD-CURSOR UPDATE CHILD WITH LEVEL = PARENT + 1

NEXT CHILD NEXT LEVEL

my children

SQL for setting the level (used in all 3 cases) later.

Page 62: Recursive SQL

62

62

Procedural Logic for Org Chart

<<< using the pre-set level-number >>>

SELECT BOSS.LEVEL_NUM , BOSS.ACCT_NAME , SUB.ACCT_NAME

FROM RECASE14 BOSS , RECASE14 SUB

WHERE SUB.PARENT_ACCT_NUM = BOSS.ACCT_NUM ORDER BY BOSS.LEVEL_NUM , BOSS.ACCT_NUM, SUB.ACCT_NUM my

children

Logic for Org chart.

Page 63: Recursive SQL

63

63

Procedural Logic for Rollup

<<< using the pre-set level-number >>>

FOR EACH LEVEL 8 TO 1 BY -1FOR EACH ROW OF CHILD-CURSOR

ADD AMOUNT TO PARENT AND UPDATE NEXT CHILD

NEXT LEVEL

SHOW ALL ROWS

Logic for rollup.

Page 64: Recursive SQL

64

64

Procedural Logic for Allocate

<<< using the pre-set level-number >>>

FOR EACH LEVEL 0 TO 7 BY 1FOR EACH ROW OF PARENT-CURSOR

DETERMINE # OF CHILDREN FOR EACH ROW OF CHILD-CURSOR

ADD PRO-RATED AMOUNT AND UPDATE CHILD

NEXT CHILD NEXT PARENT

NEXT LEVEL

SHOW ALL ROWS

Logic for allocation.

Page 65: Recursive SQL

65

65

Benchmarks

Comparison of procedural logic with recursive logic

0

5

10

15

20

25

30

35

Org chart Roll up AllocateFunction

CPU

sec

ProceduralRecursive

…and a comparison of they stack up.

Unlike the Dan Luksetich case mentioned earlier, my results are not dazzling in favor of recursive SQL – it is better, but not by an order of magnitude.

Admittedly, other variables will also impact the benchmarks. For example, if the procedural logic is needlessly complex or inefficient, recursive SQL can look much better. I have avoided a biased approach – the best SQL and the best procedural logic are being compared. Other variables that could affect are the presence of indexes, depth of the hierarchy and how balanced the hierarchy is (very evenly balanced in this “lab” exercise). Perhaps, the size and number of SORTWORK datasets for DSNDB07 may also affect the performance.

Page 66: Recursive SQL

66

66

1. Recursion basics 2. Case studies - mathematical 3. Case studies - business 4. Performance aspects 5. Pitfalls and recommendations

Where Are We?Where Are We?

Let’s summarize what we have learned and point out some pitfalls.

Page 67: Recursive SQL

67

67

Limiting the Depth

♦ Prone to infinite cycles unless controlled properly ♦ DB2 imposes no limit on the depth of recursion but issues a

warning (SQLSTATE = 01605, SQLCODE=+347) for an SQL statement that does not use a control variable to limit the depth

Warning is essentially useless in SPUFI (displayed too late)♦ Warning is based on the absence of:

An integer column that increments by a constantA predicate of the type LEVEL < constant or LEVEL < :hvLEVEL < (subselect query) is allowed but issues a warningThe SQL parser logic is quite primitive – for example:

• a loop from 10 to 1 is not infinite but will still generate a warning• Subtracting -1 (same as adding 1) will still generate a warning

The potential to create an infinite loop is ever present since DB2 will not limit it to a specified depth.

Page 68: Recursive SQL

68

68

Gotchas

A

B C D

E F G H I

J

Danger lurks around the corner! Lets’ revisit case 11.

Page 69: Recursive SQL

69

69

Causing Infinite Loops

DSNT404I SQLCODE = 347, WARNING: THE RECURSIVE COMMONTABLE EXPRESSION OC MAY

CONTAIN AN INFINITE LOOPDSNT418I SQLSTATE = 01605 SQLSTATE RETURN CODE DSNT415I SQLERRP = DSNXODML SQL PROCEDURE DETECTING ERROR DSNT416I SQLERRD = 0 0 0 1209090530 0 0 SQL DIAGNOSTIC INFORMATION DSNT416I SQLERRD = X'00000000' X'00000000' X'00000000' X'481141E2'

X'00000000' X'00000000' SQL DIAGNOSTIC INFORMATION

LEVEL PARENTTAB CHILDTAB

Removed

WITH OC (LEVEL, PARENTTAB, CHILDTAB) AS

(SELECT 0 , 'Z', ‘your start table' FROM SYSIBM.SYSDUMMYU

UNION ALL SELECT PARENT.LEVEL + 1, SUBSTR(CHILD.REFTBNAME,1,1) , SUBSTR(CHILD.TBNAME,1,1) FROM OC PARENT, SYSIBM.SYSRELS

CHILD WHERE PARENT.CHILDTAB =

CHILD.REFTBNAME AND CHILD.CREATOR = …AND CHILD.REFTBCREATOR = …AND CHILD.REFTBNAME <>

CHILD.TBNAME AND PARENT.LEVEL < 10 )

Without the extra protection offered by the level clause, the warning and possible infinite loop.

Page 70: Recursive SQL

70

70

Recommendations

♦ Use controls to limit the depth of recursion♦ Consider possible future changes that could cause infinite loops

(e.g. recursive or circular references)♦ Very useful for reporting data that does not exist!♦ Allows looping logic that would otherwise require procedural

code or SQL Procedures ♦ In most cases, simplifies processing logic♦ In specific instances, can lead to a huge performance gain

Powerful but dangerous in the wrong hands.

Page 71: Recursive SQL

71

71

But it seems too hard…

“Don't blame me for the fact that competent programming, as I view it as an intellectual possibility, will be too difficult for 'the average programmer', you must not fall into the trap of rejecting a surgical technique because it is beyond the capabilities of the barber in his shop around the corner.”

E. F. Dijkstra

♦ Seems too complex for “the average programmer”?♦ Perhaps, but consider what Prof Dijkstra had to say…

Just because it is not for the masses, let it not stop you.

Page 72: Recursive SQL

72

1. Recursion basicsIntroducing the new feature

2. Case studies - mathematical String of numbers, factorial, primes and FibonacciIt’s not that hard!

3. Case studies - businessOrg chart, Generating test data, Missing data, Rollup, Even allocation, Weighted allocation, RI children, Cheapest fare, Account linkingA little complex but compact

4. Performance aspectsOrg chart, rollup, allocateCompact and a better performer

5. Pitfalls and recommendationsPowerful but dangerous!

72

Summary

I trust this session has empowered you with the knowledge to exploit Recursive SQL fully. Good Luck!

Page 73: Recursive SQL

73

73

References

1. Recursive SQL for Dummies – B.L. “Tink” Tysor - IDUG NA 2005 - Session A1

2. Recursive SQL – Unleash the Power! – Suresh Sane - IDUG EU 2007 - Session E5

3. DB2 UDB for z/OS V8– Everything You Ever Wanted to Know, … and More – SG24-6079

4. DB2 UDB for z/OS Version 8 Performance Topics – SG24-6465

5. Having Fun with Complex SQL – Suresh Sane - IDUG AP 2005 - Session B5

6. Parlez-Vous Klingon – Alexander Kopac - IDUG NA 2007 -Session ALT

7. “Rinse, Lather, Repeat: Utilizing Recursive SQL on DB2 UDB for z/OS” – Daniel L. Luksetich – IDUG Solutions Journal May 2004

Some of the many useful references.

Page 74: Recursive SQL

74

74

Recursive SQL –Unleash the Power!

Not really…in the unlikely event that we actually have time ….I doubt it!

Page 75: Recursive SQL

75

75

Suresh SaneDST Systems Inc.

[email protected]

Session G13Recursive SQL –Unleash the Power!

Thank you and good luck with Recursive SQL!