To REORG or not to REORG That is the Question


To REORG or not to REORG That is the Question

Kevin Baker, BMC Software


Objectives

› Identify I/O performance trends for DB2 pagesets

› Correlate reorganization benefits to I/O performance trends

› Understand the methods for collecting I/O performance data

› Identify object and application metrics that help identify objects in need of reorganization

› Establish a process for identifying application pagesets for analysis


Why Pageset Organization matters - the basics

› It's all about your data and the I/O required to access it.

› DB2 computes the best access path for an SQL statement to minimize I/O wait time.

› The access path chosen is primarily influenced by catalog statistics about the referenced tables and indexes.

› DB2’s primary mechanism for avoiding I/O wait time is asynchronous I/O via PREFETCH.

› PREFETCH is most effective in situations involving sequential processing, even if in small bursts, and is significantly affected by the physical ordering of the pages on disk.


About Access Paths - Prefetch

› Given viable indexes, the DB2 Optimizer will try to use PREFETCH to read in the data pages asynchronously.

› If it can determine that sequential processing is reasonable when the access path is determined, it will call for SEQUENTIAL PREFETCH.

› If it identifies the need for a lot of specific records that are not sequentially located it may call for RID-based LIST PREFETCH.

› Even if the access path calls for random access (which involves synchronous I/O), runtime monitoring may invoke DYNAMIC PREFETCH if the pages requested appear to be even loosely sequential.


About Access Paths - BIND

› For STATIC SQL, the access path is determined at the time the program containing the SQL goes through the BIND process.

› This means that the table and index statistics in the catalog at the time of the BIND determine the access path, and it remains fixed until the next time a BIND is done.

› For DYNAMIC SQL, the access path is set by the PREPARE process every time the statement is executed (more or less).

› This means that the access path can change from run to run if the relevant catalog statistics are updated.


About Access Paths - Statistics

› There are several dozen statistics, kept in various catalog tables, that are used by the DB2 optimizer to select access paths.

› Many of these influence the choice of indexes, order of joins, etc.

› For the purpose of this discussion, we are interested in just a few that indicate organization level of the table or index.

› We will cover some of the statistics traditionally used to recommend REORGS later in the presentation.


About Access Paths - Statistics

› Tables (TABLESPACES/PARTITIONS)
– CARD (cardinality) – the number of rows in the tablespace or partition
– FARINDREF – the number of rows relocated far from their original page

› Indexes (INDEXSPACES)
– CLUSTERRATIO – the percentage of rows in clustering order
– LEAFFAR – the percentage of leaf pages physically located far from the previous leaf page accessed in an index scan


Impact on SQL Performance

› To explore the impact on SQL performance we set up some special tables, indexes, and SQL workloads.

› To minimize variables:
– We used test DB2 subsystems with stable configurations throughout the testing.
– We isolated measured pagesets to their own buffer pools that were large, and cleared prior to each run.
– We avoided workloads that would cause disorganization to an extent that would drastically affect the access path.

› Factors explored were inserts, updates that relocate rows, and free space.


Case 1: Dynamic Sequential after Updates

› DB2 version 8

› 100k row table, no freespace, clustering index

› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch

› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.

› RUNSTATS for the table and index done after each update workload and key statistics captured.

› Access performance statistics for the SELECT workloads were gathered for both the table and index.


Case 1: Dynamic Sequential after Updates

(Pref = prefetch pages / requests, SIO = sync I/Os, GP = getpages, GP/SIO = getpages per sync I/O)

            | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run         | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run |     155       5    5   125     25 |       0        100         0 100000 |     733      23    2   699    350
Run1        |     338      12  112   242      2 |    2286        100      1696 100000 |     812      28   84   879     10
Run2        |     370      12  112   242      2 |    2372        100      3231 100000 |     867      32  144  1037      7
Run3        |     368      12  114   244      2 |    2380        100      4688 100000 |     924      38  222  1176      5
Run4        |     368      12  114   244      2 |    2380        100      6142 100000 |     955      41  283  1320      5
Run5        |     368      12  114   244      2 |    2380        100      7524 100000 |    1030      44  332  1456      4
Run6        |     368      12  114   244      2 |    2380        100      8887 100000 |    1063      46  407  1604      4
Run7        |     368      12  114   244      2 |    2380        100     10265 100000 |    1106      49  469  1757      4
Run8        |     368      12  114   244      2 |    2380        100     11629 100000 |    1196      54  533  1920      4
Run9        |     368      12  114   244      2 |    2380        100     12989 100000 |    1203      60  602  2058      3
Run10       |     368      12  114   244      2 |    2380        100     14335 100000 |    1266      65  661  2202      3
REORG
Final Run   |     187       6    5   142     28 |       0        100         0 100000 |     861      27    2   828    414


Case 2: Dynamic Random after Updates

› DB2 version 8

› 100k row table, no freespace

› DYNAMIC SELECT workload; returns 10k rows
– access path completely random

› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.

› RUNSTATS for the table and index done after each update workload.


Case 2: Dynamic Random after Updates

            | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run         | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run |       0       0  714  3027      4 |       0        100         0 100000 |       0       0  966  1003      1
Run1        |       0       0  886  3048      3 |    2286        100      1696 100000 |       0       0  990  1030      1
Run2        |       0       0  878  3045      3 |    2372        100      3231 100000 |       0       0 1037  1081      1
Run3        |       0       0  912  3055      3 |    2380        100      4688 100000 |       0       0 1068  1109      1
Run4        |       0       0  894  3052      3 |    2380        100      6142 100000 |       0       0 1078  1130      1
Run5        |       0       0  908  3060      3 |    2380        100      7524 100000 |       0       0 1105  1157      1
Run6        |       0       0  911  3054      3 |    2380        100      8887 100000 |       0       0 1128  1178      1
Run7        |       0       0  887  3052      3 |    2380        100     10265 100000 |       0       0 1182  1231      1
Run8        |       0       0  888  3052      3 |    2380        100     11629 100000 |       0       0 1193  1250      1
Run9        |       0       0  900  3051      3 |    2380        100     12989 100000 |       0       0 1227  1279      1
Run10       |       0       0  886  3046      3 |    2380        100     14335 100000 |       0       0 1229  1296      1
REORG
Final Run   |       0       0  747  3023      4 |       0        100         0 100000 |       0       0  981  1003      1


Case 3: Dynamic Sequential after Inserts

› DB2 version 8

› 100k row table, no freespace, clustering index

› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch

› Insert workload that inserts 100 random rows each run.

› RUNSTATS for the table and index done after each insert workload.


Case 3: Dynamic Sequential after Inserts

            | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run         | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run |     155       5    5   125     25 |       0        100         0 100000 |     733      23    2   698    349
Run1        |     196       7   75   195      3 |     142         99         0 100100 |     733      30    3   848    283
Run2        |     202       8  100   219      2 |     196         99         0 100200 |     731      35    9  1024    114
Run3        |     189       8  112   235      2 |     234         99         0 100300 |     731      34   14  1201     86
Run4        |     187       7  113   235      2 |     238         99         0 100400 |     731      53   20  1363     68
Run5        |     187       7  113   232      2 |     238         99         0 100500 |     731      54   25  1544     62
Run6        |     187       7  112   230      2 |     238         99         0 100600 |     731      58   31  1706     55
Run7        |     186       6  111   228      2 |     238         99         0 100700 |     731      61   37  1875     51
Run8        |     186       6  109   225      2 |     238         99         0 100800 |     699      62   43  2028     47
Run9        |     186       6  108   223      2 |     238         99         0 100900 |     730      81   47  2177     46
Run10       |     186       6  107   221      2 |     238         99         0 101000 |     726      85   51  2324     46
REORG
Final Run   |     155       5    5   121     24 |       0        100         0 101000 |     733      23    2   681    341


Case 4: Dynamic Random after Inserts

› DB2 version 8

› 100k row table, no freespace

› DYNAMIC SELECT workload; returns 10k rows
– access path completely random

› Insert workload that inserts 100 random rows each run.

› RUNSTATS for the table and index done after each Insert workload.


Case 4: Dynamic Random after Inserts

            | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run         | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run |       0       0  123  3022     25 |       0        100         0 100000 |       0       0  546  1003      2
Run1        |       0       0  194  3042     16 |     142         99         0 100100 |       0       0  549  1011      2
Run2        |       0       0  221  3045     14 |     196         99         0 100200 |       0       0  548  1024      2
Run3        |       0       0  234  3047     13 |     234         99         0 100300 |       0       0  541  1018      2
Run4        |       0       0  242  3046     13 |     238         99         0 100400 |       0       0  561  1049      2
Run5        |       0       0  241  3049     13 |     238         99         0 100500 |       0       0  561  1042      2
Run6        |       0       0  236  3055     13 |     238         99         0 100600 |       0       0  559  1064      2
Run7        |       0       0  239  3057     13 |     238         99         0 100700 |       0       0  570  1073      2
Run8        |       0       0  241  3066     13 |     238         99         0 100800 |       0       0  569  1080      2
Run9        |       0       0  240  3060     13 |     238         99         0 100900 |       0       0  573  1078      2
Run10       |       0       0  240  3033     13 |     238         99         0 101000 |       0       0  591  1113      2
REORG
Final Run   |       0       0  132  3020     23 |       0        100         0 101000 |       0       0  587  1013      2


Case 5: Dynamic Sequential after Updates without RUNSTATS

› DB2 version 8

› 100k row table, no freespace, clustering index

› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch

› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.

› No RUNSTATS between runs, so the catalog statistics are not being updated.


Case 5: Dynamic Sequential after Updates without RUNSTATS

            | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run         | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run |     155       5    5   125     25 |       0        100         0 100000 |     733      23    2   698    349
Run1        |     338      12  112   242      2 |       0        100         0 100000 |     812      28   84   879     10
Run2        |     370      12  112   242      2 |       0        100         0 100000 |     867      32  144  1037      7
Run3        |     368      12  114   244      2 |       0        100         0 100000 |     924      38  222  1176      5
Run4        |     368      12  114   244      2 |       0        100         0 100000 |     955      41  283  1320      5
Run5        |     368      12  144   244      2 |       0        100         0 100000 |    1030      44  332  1456      4
Run6        |     368      12  114   244      2 |       0        100         0 100000 |    1063      46  407  1604      4
Run7        |     368      12  114   244      2 |       0        100         0 100000 |    1106      49  469  1757      4
Run8        |     368      12  114   244      2 |       0        100         0 100000 |    1196      54  533  1920      4
Run9        |     368      12  114   244      2 |       0        100         0 100000 |    1203      60  602  2058      3
Run10       |     368      12  114   244      2 |       0        100         0 100000 |    1266      65  661  2202      3
REORG
Final Run   |     187       6    5   142     28 |       0        100         0 100000 |     861      27    2   828    414


Case 6: Static Sequential after Updates without Rebind or RUNSTATS

› DB2 version 8

› 100k row table, no freespace, clustering index

› STATIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch

› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.

› No RUNSTATS and no REBIND between runs

› Final runs after REORG, and then after RUNSTATS and REBIND


Case 6: Static Sequential after Updates without Rebind or RUNSTATS

                           | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run                        | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run                |     160       5    8   125     16 |       0        100         0 100000 |     704      23    5   698    140
Run1                       |       0       0  240   242      1 |       0        100         0 100000 |     736      23  107   879      8
Run2                       |       0       0  240   242      1 |       0        100         0 100000 |     736      23  204  1037      5
Run3                       |       0       0  242   244      1 |       0        100         0 100000 |     736      23  286  1176      4
Run4                       |       0       0  242   244      1 |       0        100         0 100000 |     736      23  370  1320      4
Run5                       |       0       0  242   244      1 |       0        100         0 100000 |     736      23  450  1456      3
Run6                       |       0       0  242   244      1 |       0        100         0 100000 |     736      23  537  1604      3
Run7                       |       0       0  242   244      1 |       0        100         0 100000 |     736      23  625  1757      3
Run8                       |       0       0  242   244      1 |       0        100         0 100000 |     736      23  724  1920      3
Run9                       |       0       0  242   244      1 |       0        100         0 100000 |     736      23  812  2058      3
Run10                      |       0       0  242   244      1 |       0        100         0 100000 |     736      23  904  2202      2
Final run after REORG      |     160       6    8   142     18 |       0        100         0 100000 |     832      27    5   828    166
After REORG, STATS, REBIND |     187       6    5   142     28 |       0        100         0 100000 |     861      27    2   828    414


Case 7: Dynamic Sequential after Updates, no RUNSTATS, with FREESPACE

› DB2 version 8

› 100k row table
– table PCTFREE = 10; index PCTFREE = 5

› Clustering index

› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch

› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.

› No RUNSTATS between runs


Case 7: Dynamic Sequential after Updates, with FREESPACE

            | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run         | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run |     187       6    5   133     27 |       0        100         0 100000 |     829      26    2   778    389
Run1        |     187       6    8   136     17 |    2286        100      1696 100000 |     829      26    2   780    390
Run2        |     187       6   28   156      6 |    2372        100      3231 100000 |     829      27    2   790    395
Run3        |     187       6   54   182      3 |    2380        100      4688 100000 |     829      27    2   814    407
Run4        |     214       7   78   306      3 |    2380        100      6142 100000 |     829      28    2   862    431
Run5        |     276       9  100   233      2 |    2380        100      7524 100000 |     829      35    3   928    309
Run6        |     276       9  120   253      2 |    2380        100      8887 100000 |     829      38   21  1028     49
Run7        |     276      10  126   259      2 |    2380        100     10265 100000 |     902      46   89  1190     13
Run8        |     276      10  126   259      2 |    2380        100     11629 100000 |     952      50  149  1338      9
Run9        |     284      10  126   259      2 |    2380        100     12989 100000 |     967      52  209  1471      7
Run10       |     284      10  126   259      2 |    2380        100     14335 100000 |    1027      57  275  1616      6
REORG
Final Run   |     187       6    5   149     30 |       0        100         0 100000 |     957      30    2   926    463


Chart of Case 1: Sequential after Updates without Freespace


Chart of Case 7: Sequential after Updates with Freespace


Conclusions about SQL Performance and Pageset Organization

› Relocating update activity reduces organization and significantly affects sequential performance
– can cause some increases in workload due to increased row sizes

› Insert activity reduces organization and significantly affects sequential performance
– can also increase workload (rows fetched)

› Random table access will not be significantly affected by reduced organization or improved by REORGs
– Index access can be degraded, although this is usually a lesser impact

› Freespace delays the worst performance impacts
– at the cost of unused disk space


Conclusions about SQL Performance and Pageset Organization

› Typical performance impacts include
– Increased getpage activity
• Can also be caused by increased workloads
– Increased sync I/Os, increased sync I/Os per getpage
• Can be masked by buffer pool tuning

› Updated statistics can help the optimizer compensate
– Dynamic SQL, Static SQL if rebound
– Can also cause unexpected access path changes; could make things much worse
– RUNSTATS causes statement invalidation in the dynamic statement cache.


Case 8: DB2 9 - Dynamic Sequential after Updates, no FREESPACE

› DB2 version 9

› 100k row table, no freespace, clustering index

› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch

› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.

› RUNSTATS for the table and index done after each update workload.


Case 8: DB2 9 - Dynamic Sequential after Updates, no FREESPACE

            | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run         | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run |     224       7    8   195     24 |       0        100         0 100000 |     736      23    5   698    140
Run1        |     224       7   81   267      3 |    1562        100      1707 100000 |     736      23  113   876      8
Run2        |      32       1  282   316      1 |    2508        100      3223 100000 |     736      23  210  1029      5
Run3        |       0       0  346   348      1 |    3062        100      4637 100000 |     736      23  297  1165      4
Run4        |       0       0  366   368      1 |    3382        100      6057 100000 |     736      23  389  1302      3
Run5        |       0       0  375   377      1 |    3552        100      7417 100000 |     736      23  479  1436      3
Run6        |       0       0  380   382      1 |    3658        100      8749 100000 |     736      23  569  1579      3
Run7        |       0       0  381   383      1 |    3720        100     10103 100000 |     736      23  658  1726      3
Run8        |       0       0  382   384      1 |    3742        100     11452 100000 |     736      23  754  1887      3
Run9        |       0       0  382   384      1 |    3754        100     12782 100000 |     736      23  841  2018      2
Run10       |       0       0  382   384      1 |    3768        100     14111 100000 |     736      23  932  2158      2
REORG
Final Run   |     224       7    8   195     24 |       0        100         0 100000 |     864      27    5   828    166


Case 9: DB2 9 - Dynamic Random after Updates, no FREESPACE

› DB2 version 9

› 100k row table, no freespace

› DYNAMIC SELECT workload; returns 10k rows
– access path completely random

› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.

› RUNSTATS for the table and index done after each update workload.


Case 9: DB2 9 - Dynamic Random after Updates, no FREESPACE

            | Indexspace thread statistics      | RUNSTATS statistics                 | Tablespace thread statistics
Run         | PrefPgs PrefReq  SIO    GP GP/SIO | LEAFFAR CLUSTRATIO FARINDREF   CARD | PrefPgs PrefReq  SIO    GP GP/SIO
Initial Run |       0       0  820  3020      4 |       0        100         0 100000 |       0       0  973  1003      1
Run1        |       0       0  881  3030      3 |    1562        100      1707 100000 |       0       0 1004  1040      1
Run2        |       0       0  932  3028      3 |    2508        100      3223 100000 |       0       0 1028  1067      1
Run3        |       0       0  945  3042      3 |    3062        100      4637 100000 |       0       0 1047  1080      1
Run4        |       0       0  939  3040      3 |    3382        100      6057 100000 |       0       0 1082  1121      1
Run5        |       0       0  949  3042      3 |    3552        100      7417 100000 |       0       0 1093  1131      1
Run6        |       0       0  938  3037      3 |    3658        100      8749 100000 |       0       0 1139  1186      1
Run7        |       0       0  955  3038      3 |    3720        100     10103 100000 |       0       0 1157  1214      1
Run8        |       0       0  955  3032      3 |    3742        100     11452 100000 |       0       0 1198  1261      1
Run9        |       0       0  951  3039      3 |    3754        100     12782 100000 |       0       0 1210  1272      1
Run10       |       0       0  968  3054      3 |    3768        100     14111 100000 |       0       0 1228  1281      1
REORG
Final Run   |       0       0  824  3026      4 |       0        100         0 100000 |       0       0  978  1003      1


DB2 9 and Real-Time Statistics (RTS)

› DB2 version 9 has added the ability to dynamically maintain pageset statistics in separate catalog tables.

– SYSIBM.SYSTABLESPACESTATS
– SYSIBM.SYSINDEXSPACESTATS

› These values are maintained without RUNSTATS

› The Optimizer does not use them for access path analysis

› Many of the statistics are relative to the last REORG

› With an understanding of when they get updated, they are basically always available.

› These are available in v8 too (even v7!). You just have to do some work to set them up.


Some RTS Statistics

› The RTS tables have the basic statistics we have been looking at:

– SYSIBM.SYSTABLESPACESTATS
• REORGLASTTIME – timestamp of last REORG
• REORGNEARINDREF – number of rows relocated since REORG but near the original page
• REORGFARINDREF – number of rows relocated since REORG far from the original page

– SYSIBM.SYSINDEXSPACESTATS
• NLEVELS – number of index levels
• REORGLEAFNEAR – number of leaf pages relocated but near their previous logical leaf page
• REORGLEAFFAR – number of leaf pages relocated far from their previous logical leaf page


Methods for deciding when to REORG

› Typical methods include
– Automatically on a fixed schedule
– When certain catalog statistics breach a threshold
• Change in cardinality and degraded cluster ratio
• Degraded page ordering (FARINDREF)
• Degraded leaf page distribution, ordering, levels

› We can save resources used for REORGs if we
– Correlate those statistics with performance data
– Only REORG if and when needed


Collecting Performance Data

› DB2 IFCID 199 contains key statistics for each pageset with at least 1 I/O per second average in a stat cycle:

– DBID, PSID, partition
– Getpage counts
– Sync I/Os
– Async I/Os
– Async pages read

› If activated, these records are produced on the regular DB2 statistics cycle.


Collecting Performance Data

› A custom program, SAS procedure, or vendor tool can be used to
– collect these records periodically (e.g. daily),
– summarize the numbers by pageset,
– add them to a performance statistics table.

› This performance statistics table can have columns for date-time, DBID, PSID, Partition, getpages, sync I/Os, async I/Os, async pages read.

› It’s also necessary to know when REORGS are done – add a column to indicate a REORG event
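
The summarization step can be sketched in a few lines of Python. This is only an illustration, not the program used for the presentation: it assumes the IFCID 199 records have already been decoded into simple objects, and the `PagesetInterval` class, its field names, and the sample numbers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PagesetInterval:
    """One decoded statistics-interval record for a pageset (hypothetical layout)."""
    dbid: int
    psid: int
    part: int
    getpages: int
    sync_io: int
    async_io: int
    async_pages: int


def summarize_by_pageset(records):
    """Sum the interval counters for each (DBID, PSID, partition) key."""
    totals = defaultdict(lambda: {"getpages": 0, "sync_io": 0,
                                  "async_io": 0, "async_pages": 0})
    for rec in records:
        row = totals[(rec.dbid, rec.psid, rec.part)]
        row["getpages"] += rec.getpages
        row["sync_io"] += rec.sync_io
        row["async_io"] += rec.async_io
        row["async_pages"] += rec.async_pages
    for row in totals.values():
        # Getpages per sync I/O; left as None when no sync I/O occurred.
        row["gp_per_sio"] = (round(row["getpages"] / row["sync_io"])
                             if row["sync_io"] else None)
    return dict(totals)


# Made-up sample: two intervals for one pageset, one interval for another.
day = [
    PagesetInterval(260, 2, 0, getpages=400, sync_io=1, async_io=12, async_pages=380),
    PagesetInterval(260, 2, 0, getpages=299, sync_io=1, async_io=11, async_pages=353),
    PagesetInterval(260, 5, 0, getpages=125, sync_io=5, async_io=5, async_pages=155),
]
summary = summarize_by_pageset(day)
```

Each entry in `summary` corresponds to one row of the performance statistics table for that day; the collection timestamp and the REORG indicator would be added when the row is stored.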


Real-Time Statistics

› With the availability of the RTS tables,
– it is now possible to capture key organization statistics,
– on a daily basis,
– easily correlate with performance statistics.
– Could capture them at the same time the performance statistics are summarized for the day.
– Could keep them in the same table.
– Solves the problem of capturing last REORG time.

› Now possible to do percentage change calculations on catalog statistics as well as performance statistics.
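
As a rough illustration of such calculations, the Python sketch below turns raw RTS counters into percentages of the object's size. The dictionary keys mirror column names from the sample table layout in this presentation; the functions themselves and the sample values are hypothetical.

```python
def percent_of(part, total):
    """Return part as a percentage of total, or 0.0 when total is zero."""
    return 100.0 * part / total if total else 0.0


def organization_percentages(row):
    """Derive percentage metrics from one captured row (missing columns count as 0)."""
    return {
        # Share of rows relocated far from their original page since the last REORG.
        "far_indref_pct": percent_of(row.get("TABLE_FARINDREF", 0),
                                     row.get("TABLE_TOTALROWS", 0)),
        # Share of leaf pages located far from their previous logical leaf page.
        "leaf_far_pct": percent_of(row.get("INDEX_REORGLEAFFAR", 0),
                                   row.get("INDEX_NLEAF", 0)),
    }


# Made-up values for one table space row and one index space row.
table_row = {"TABLE_FARINDREF": 1500, "TABLE_TOTALROWS": 100000}
index_row = {"INDEX_REORGLEAFFAR": 30, "INDEX_NLEAF": 1200}
table_pct = organization_percentages(table_row)
index_pct = organization_percentages(index_row)
```

Storing these percentages next to the daily performance numbers makes it straightforward to see whether a rising counter coincides with rising sync I/O.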


Collecting Performance Data - sample table layout

CREATE TABLE PAGESET_PERFORMANCE_TABLE
  (COLLECT_TIME           TIMESTAMP,
   DBID                   SMALLINT,
   PSID                   SMALLINT,
   PART                   SMALLINT,
   BPID                   SMALLINT,
   DBNAME                 CHAR(8),
   PSNAME                 CHAR(8),
   TYPE                   CHAR(1),
   REORGLASTTIME          TIMESTAMP,
   TABLE_FARINDREF        INTEGER,
   TABLE_NEARINDREF       INTEGER,
   TABLE_REORGUNCLUSTINS  INTEGER,
   TABLE_TOTALROWS        INTEGER,
   INDEX_REORGLEAFFAR     INTEGER,
   INDEX_REORGLEAFNEAR    INTEGER,
   INDEX_NLEAF            INTEGER,
   INDEX_TOTALENTRIES     INTEGER,
   GETPAGES               INTEGER,
   SYNC_IO                INTEGER,
   GPPERSIO               INTEGER,
   ASYNC_IO               INTEGER,
   ASYNC_PAGES            INTEGER);


Developing REORG Triggers

› Performance data analysis can be used either
– to recommend REORGS as needed, or
– to study the results of REORGS and use the information to adjust fixed schedules.

› Either way, the analysis usually depends on a “trigger”, which is a metric or formula threshold that is used to decide whether a REORG is needed or not.

› The threshold part of these triggers often has to be tailored to the needs of each application.

› Note that physical state and storage use triggers, such as extents, percentage of dropped table rows, etc. are indicators that represent issues unrelated to performance.


Developing REORG Triggers

› Recommendations from the Administration Guide

– Table spaces, REORG if
• More than 10% of rows relocated far
• If clustering index, then CLUSTERRATIO < 90%
– Else number of referenced rows far from optimal > 10%

– Index spaces, REORG if
• More than 10% of active leaf pages are far from optimal position
• The average distance between consecutive leaf pages exceeds 2
• More than a designated percentage of rows have been inserted or deleted
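
Two of these rules can be expressed as simple threshold checks. The Python below is a minimal sketch under stated assumptions: the function names and parameters are hypothetical, the cluster ratio is taken as a percentage from 0 to 100, and only the "rows relocated far", CLUSTERRATIO, and "leaf pages far" rules are covered.

```python
def tablespace_needs_reorg(far_indref, card, cluster_ratio=None):
    """Catalog-statistics trigger for a table space.

    cluster_ratio is a percentage (0-100), or None when there is no clustering index.
    """
    # More than 10% of rows relocated far from their original page.
    if card and 100.0 * far_indref / card > 10.0:
        return True
    # With a clustering index: cluster ratio below 90%.
    return cluster_ratio is not None and cluster_ratio < 90.0


def indexspace_needs_reorg(leaf_far, active_leaf_pages):
    """Catalog-statistics trigger for an index space: more than 10% of leaf pages far."""
    return bool(active_leaf_pages) and 100.0 * leaf_far / active_leaf_pages > 10.0
```

For example, `tablespace_needs_reorg(14335, 100000, 100)` is true because more than 10% of the rows are far indirect references, even though the cluster ratio is still 100.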


Developing REORG Triggers

› Factoring in degraded performance…

› Table spaces or index spaces, REORG if
– Baseline prefetch pages is > 2 x sync I/Os
– And sync I/Os have increased 20% since baseline
– And getpages per sync I/O have fallen 20%

› Any pageset nominated for REORG by the performance triggers and the catalog statistics triggers is a good candidate for REORG.
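
The three performance conditions can be combined into one check, sketched below in Python. The `PagesetPerf` class, its field names, and the sample numbers are hypothetical, and the sketch assumes that "prefetch pages" corresponds to the asynchronous pages read that are recorded for the pageset.

```python
from dataclasses import dataclass


@dataclass
class PagesetPerf:
    """Daily totals for one pageset (field names are illustrative)."""
    getpages: int
    sync_io: int
    prefetch_pages: int


def performance_trigger(baseline, current, change=0.20):
    """Return True when all three performance conditions hold."""
    if baseline.sync_io == 0 or current.sync_io == 0:
        return False  # ratios are undefined; this sketch treats that as "cannot decide"
    # Baseline access was mostly prefetch: prefetch pages > 2 x sync I/Os.
    mostly_prefetch = baseline.prefetch_pages > 2 * baseline.sync_io
    # Sync I/Os have increased by at least 20% since the baseline.
    sync_io_up = current.sync_io >= baseline.sync_io * (1 + change)
    # Getpages per sync I/O have fallen by at least 20%.
    base_ratio = baseline.getpages / baseline.sync_io
    curr_ratio = current.getpages / current.sync_io
    ratio_down = curr_ratio <= base_ratio * (1 - change)
    return mostly_prefetch and sync_io_up and ratio_down


# Illustrative numbers only.
baseline = PagesetPerf(getpages=700, sync_io=2, prefetch_pages=730)
degraded = PagesetPerf(getpages=2200, sync_io=660, prefetch_pages=1260)
steady = PagesetPerf(getpages=710, sync_io=2, prefetch_pages=730)
```

With these values, `performance_trigger(baseline, degraded)` is true and `performance_trigger(baseline, steady)` is false. The 20% figure is a parameter because the threshold usually has to be tailored per application.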


Developing REORG Triggers

› Post REORG analysis
– If using regularly scheduled REORGS, analysis of the performance data a day after the reorganization can indicate the degree of improvement.

› Interesting data points: (getpages, sync I/O, async pages, and getpages per sync I/O)
– Before the REORG, % increase in metrics since the last REORG (baseline).
– After the latest REORG, % decrease in metrics.
– Small values indicate REORG was too soon or not needed at all.
– Large values mean the REORG was too late.
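
A minimal Python sketch of this before-and-after comparison follows. The function names, the dictionary layout, and the numbers are hypothetical; each dictionary holds one day's metrics for a single pageset.

```python
def pct_change(old, new):
    """Percentage change from old to new; None when old is zero."""
    if old == 0:
        return None
    return 100.0 * (new - old) / old


def post_reorg_analysis(baseline, before, after):
    """For each metric, report growth since the baseline and the drop after the REORG."""
    report = {}
    for metric in baseline:
        increase = pct_change(baseline[metric], before[metric])
        change_after = pct_change(before[metric], after[metric])
        decrease = None if change_after is None else -change_after
        report[metric] = {"pct_increase_before": increase,
                          "pct_decrease_after": decrease}
    return report


# Illustrative numbers: day after the previous REORG, day before the latest
# REORG, and day after the latest REORG.
baseline = {"getpages": 1000, "sync_io": 50}
before = {"getpages": 1500, "sync_io": 200}
after = {"getpages": 1050, "sync_io": 50}
report = post_reorg_analysis(baseline, before, after)
```

Here sync I/O grew 300% before the REORG and dropped 75% after it, which points to a REORG that paid off. Getpages per sync I/O moves in the opposite direction (it falls as organization degrades), so its signs would be reversed.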


Putting it all Together

› Activate Real-Time Statistics in DB2
– (pre v9, do setup work)

› Activate DB2 statistics class 8 and begin recording IFCID 199 data

› Create the pageset performance table.

› Set up a daily job to summarize 199 data, collect RTS data, and populate the pageset performance table.

› Either use the data to
– adjust REORG schedules,
– directly trigger REORGS, or
– perform post REORG analysis on benefits.


Conclusion

› With a little work it is possible to set up a process to capture pageset performance statistics and real-time object statistics.

› With this data better triggers can be developed that only recommend REORGS when performance has been degraded.

› Post REORG analysis of this data can help to refine trigger thresholds or adjust schedules to balance performance versus costs.


Kevin Baker, BMC Software, [email protected]