Type 2 Slowly Changing Dimensions - unibo.itbias.csr.unibo.it/golfarelli/dolap2012/Slides/DOLAP...

Type 2 Slowly Changing Dimensions: A Case Study Using the Co>Operating System Craig Stanfill Ab Initio Software DOLAP’12, Maui

Page 1: Type 2 Slowly Changing Dimensions - unibo.itbias.csr.unibo.it/golfarelli/dolap2012/Slides/DOLAP 2012... · 2012-12-03 · Type 2 Slowly Changing Dimensions: A Case Study Using the

Type 2 Slowly Changing Dimensions:

A Case Study Using the Co>Operating System

Craig Stanfill
Ab Initio Software
DOLAP’12, Maui

Copyright (c) Ab Initio Software LLC.

Overview

The Co>Operating System
• Ab Initio’s parallel computing framework
• Based on partitioned dataflow
• Graphic programming

A little about what graphs really look like
• Primary computation
• Secondary computations: “Salad Dressing”
• Five solutions to “Type 2 Slowly Changing Dimensions”

Performance:
• Scalability
• Insights into the important tradeoffs for optimization

The Co>Operating System

Parallel framework for Enterprise Computing
• Widely used for ETL, Data Warehousing
• Widely used for Mission Critical Realtime Apps
• Stock Exchanges, Telecommunications, Credit Card Processing
• Batch, Streaming, Service, Transactional Modes

Compared with MapReduce:
• Broader applicability
• More built-in functionality
• Handles extreme levels of complexity

Parallel Graphic Dataflow

Real World: Deal with Errors, Log Output, Auditing, etc.

Graphs Nest Very Deeply

9 Levels Deep; 33 Subgraphs; 259 Components

The Problem: Type 2 Slowly Changing Dimension

[Diagram: Raw Dimension and Old Surrogate Key Map feed Join + Salad Dressing, which produces Cooked Dimension, New Surrogate Key Map, and Other Annoying Stuff]
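The slide names the inputs and outputs but not the record layouts, so the following is only a minimal sketch under stated assumptions: the raw dimension is a list of (natural key, attributes) pairs, the old surrogate key map holds the current surrogate key and attributes per natural key, and a new surrogate key is issued whenever a natural key is new or its attributes have changed. The names `cook_dimension`, `seed`, and the tuple layouts are hypothetical.

```python
from typing import Dict, List, Tuple

Attrs = Tuple[str, ...]
# Assumed keymap layout: natural key -> (surrogate key, current attributes)
KeyMap = Dict[str, Tuple[int, Attrs]]


def cook_dimension(
    raw: List[Tuple[str, Attrs]],
    old_keymap: KeyMap,
    seed: int,
) -> Tuple[List[Tuple[int, str, Attrs]], KeyMap, int]:
    """Return (cooked rows, new keymap, next unused surrogate key)."""
    new_keymap = dict(old_keymap)  # leave the old map untouched
    cooked = []
    next_key = seed
    for natural_key, attrs in raw:
        current = new_keymap.get(natural_key)
        if current is not None and current[1] == attrs:
            continue  # unchanged record: no new dimension row
        # New natural key or changed attributes: issue a fresh surrogate key.
        new_keymap[natural_key] = (next_key, attrs)
        cooked.append((next_key, natural_key, attrs))
        next_key += 1
    return cooked, new_keymap, next_key
```

Under these assumptions an unchanged record produces no cooked row, so a reload of mostly unchanged data yields a small cooked dimension while the key map stays large.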

3 Cases

                     Raw         Old       Cooked      New
                     Dimension   Keymap    Dimension   Keymap
Initial              Huge        Empty     Huge        Big
Full Reload          Huge        Big       Small       Big
Incremental Reload   Small       Big       Small       Big

Which cases do you optimize for?
• Initial Load: Big Job, Only Once
• Full Reload: Room for optimization
• Incremental: Lots of room for optimization

(May not be a viable option)

Solution 1: Sort Merge Join

Depth: 2; Subgraphs: 3 (all shared); Components: 27
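The graph itself is not reproduced in this transcript. As a rough illustration of the sort-merge technique only, and not of the Ab Initio graph, the sketch below sorts both inputs on the natural key and walks them with two cursors. It assumes the key map has at most one entry per natural key; the function name and tuple layouts are hypothetical.

```python
from typing import Any, List, Optional, Tuple


def sort_merge_join(
    raw: List[Tuple[str, Any]],
    keymap: List[Tuple[str, int]],
) -> List[Tuple[str, Any, Optional[int]]]:
    """Left outer join of raw records against keymap entries on the natural key."""
    left = sorted(raw, key=lambda r: r[0])
    right = sorted(keymap, key=lambda r: r[0])
    out = []
    j = 0
    for key, attrs in left:
        # Advance the keymap cursor past entries smaller than the current key.
        while j < len(right) and right[j][0] < key:
            j += 1
        if j < len(right) and right[j][0] == key:
            out.append((key, attrs, right[j][1]))  # existing surrogate key
        else:
            out.append((key, attrs, None))  # no match: needs a new key
    return out
```

Both inputs must be sorted before the merge, which is the main cost of this approach when an input is huge.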

Salad Dressing: Handle Dups and Rejects

This is a reusable subgraph
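The slide gives only the title of this subgraph, so the policy below is an assumption made for illustration: a record with a missing or empty natural key is a reject, and the second and later records with the same natural key are duplicates. The function name and the `natural_key` field are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_dups_and_rejects(
    records: List[Record],
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Route each record to exactly one of (clean, dups, rejects)."""
    seen = set()
    clean, dups, rejects = [], [], []
    for rec in records:
        key = rec.get("natural_key")
        if not key:
            rejects.append(rec)  # missing or empty key: cannot be joined
        elif key in seen:
            dups.append(rec)  # key already seen: keep only the first record
        else:
            seen.add(key)
            clean.append(rec)
    return clean, dups, rejects
```

Keeping the three outputs separate lets the duplicate and reject streams be logged or audited without touching the main flow.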

Salad Dressing: Audit and Update Surrogate Key Seed

Read paper for how we generate surrogate keys

Solution 2: Hybrid Hash Join
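A true hybrid hash join keeps one partition's hash table in memory while spilling the other partitions to disk. The sketch below is a simplification: it partitions both inputs by a hash of the natural key and joins one partition at a time, with in-memory lists standing in for spill files. It is not the Ab Initio component; the names and tuple layouts are hypothetical.

```python
import zlib
from typing import Any, List, Optional, Tuple


def partition_of(key: str, n_partitions: int) -> int:
    # crc32 gives the same partition for the same key on both inputs.
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def partitioned_hash_join(
    raw: List[Tuple[str, Any]],
    keymap: List[Tuple[str, int]],
    n_partitions: int = 4,
) -> List[Tuple[str, Any, Optional[int]]]:
    """Left outer join of raw records against keymap entries, one partition at a time."""
    raw_parts = [[] for _ in range(n_partitions)]
    map_parts = [[] for _ in range(n_partitions)]
    for key, attrs in raw:
        raw_parts[partition_of(key, n_partitions)].append((key, attrs))
    for key, surrogate in keymap:
        map_parts[partition_of(key, n_partitions)].append((key, surrogate))

    out = []
    for raw_part, map_part in zip(raw_parts, map_parts):
        table = dict(map_part)  # build phase: hash table for this partition only
        for key, attrs in raw_part:  # probe phase
            out.append((key, attrs, table.get(key)))
    return out
```

Only one partition's hash table exists at a time, so the memory needed is bounded by the largest key map partition rather than the whole key map. Output order follows the partitioning, not the input order.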

Solution 3: Lookup Files

Solution 3: Lookup File Optimization

Solution 4: Keep Surrogate Keys in Database
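The slide does not say which database or schema was used. Purely as an illustration of keeping the key map in a database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `keymap`, its columns, and the `MAX + 1` key allocation are assumptions, and the allocation is only safe with a single writer.

```python
import sqlite3


def open_keymap() -> sqlite3.Connection:
    """Create an in-memory database holding an empty surrogate key map."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keymap ("
        "natural_key TEXT PRIMARY KEY, "
        "surrogate_key INTEGER NOT NULL)"
    )
    return conn


def surrogate_key_for(conn: sqlite3.Connection, natural_key: str) -> int:
    """Return the existing surrogate key, or allocate and store a new one."""
    row = conn.execute(
        "SELECT surrogate_key FROM keymap WHERE natural_key = ?",
        (natural_key,),
    ).fetchone()
    if row is not None:
        return row[0]
    # Single-writer assumption: the next key is one past the largest stored key.
    next_key = conn.execute(
        "SELECT COALESCE(MAX(surrogate_key), 0) + 1 FROM keymap"
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO keymap (natural_key, surrogate_key) VALUES (?, ?)",
        (natural_key, next_key),
    )
    return next_key
```

Each record costs at least one query here, which is the kind of per-record overhead that matters when the raw dimension is huge.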

Solution 5: Stored Procedure

Wall Clock Time (32 ways parallel)

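[Chart: wall clock time on a logarithmic axis (1% to 10000%) for the Load, Full Update, and Incremental cases; series: MJ, HJ, LU, SQL, SP, which appear to correspond to Solutions 1 through 5]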

Cores Used

[Chart: cores used (y-axis, 0 to 32) at x-axis values 1, 2, 4, 8, 16, 32; series: MJ, HJ, LU, SQL, SP]

Cores Used: MPP (Lookup Solution, Initial Load)

[Chart: cores used (y-axis, 0 to 128) at x-axis values 1, 2, 4, 8, 16, 32, 64, 128; series: 1 Node, 4 Nodes]

CPU Time/Record

[Chart: CPU time per record (y-axis, 0% to 150%) at x-axis values 1, 2, 4, 8, 16, 32; series: MJ, HJ, LU, SQL, SP]

CPU Time/Record (Lookup Solution, Initial Load)

[Chart: CPU time per record (y-axis, 0% to 150%) at x-axis values 1, 2, 4, 8, 16, 32, 64, 128; series: 1 Node, 4 Nodes]

Pipeline Factor

[Chart: pipeline factor (y-axis, 0 to 2.00) at x-axis values 1, 2, 4, 8, 16, 32; series: MJ, HJ, LU, SQL, SP]

Questions?