IQ Indexes

66
IQ Indexes

description

2. IQ Indexes. Indexes. Typical RDBMS uses B-tree index usually a separate structure in addition to data consists of data pages with values and pointers Sybase (and others) has special type which contains pointers and data pages B-tree indexes are expensive - PowerPoint PPT Presentation

Transcript of IQ Indexes

Page 1: IQ Indexes

IQ Indexes

Page 2: IQ Indexes

Indexes

• Typical RDBMS uses B-tree index– usually a separate structure in addition to data– consists of data pages with values and pointers– Sybase (and others) has special type which contains

pointers and data pages

• B-tree indexes are expensive– consume space and time (to build)– require maintenance after data refresh– not useful with low cardinality data columns

Page 3: IQ Indexes

B-Tree – an example

Let’s assume a table with key values from 1 to 8 – and build the B-Tree

4

2

1

1 2 4

3

3

6

5

5 6 8

7

7

The decision tree is now very simple. The question asked is “Is the required value less than or equal to the value in the tree?”If it is take the left fork else go right

Leaf Level (Data)

Index Levels - Keys

Page 4: IQ Indexes

IQ-M Indexes

• Indexes are the data (a data access method)– no separate data store

• A column is likely to have multiple indexes– always has at least one index

• Index selection for a column is based upon– Number of discrete values (cardinality)– Usage in queries– To some extent, the column’s data type

Page 5: IQ Indexes

Index Types – In total now 10!

• Default Index (aka Fast Projection or FP)– Raw data compressed on disk– comes in 3 flavours

• Low Fast (LF)– Bit map index

• High Non Group (HNG)– Bit-wise index

• High Group (HG)– G-Array (relative of a B-tree)– Comes in 3 flavours

• Compare Index (CMP)– Column Compare Index

• Word Index (WD)– Sort of “Free Text” Index

Page 6: IQ Indexes

Default Index - FP

• Created automatically by Create Table

• Used for– Ad-Hoc joins– String searches– Certain calculations– Projection of data

• This index cannot be created or dropped

Page 7: IQ Indexes

IQ Unique Constraint - 1

Create Table department(dept_id char(4) Not Null IQ Unique(200),emp_lname varchar(25) Not Null IQ Unique(75000),

• Improves load processing– IQ will construct 1-byte FP index for dept_id– Won’t try either a 1-byte or 2-byte FP on emp_lname

Too many values for either

Page 8: IQ Indexes

IQ Unique Constraint - 2

• In terms of query performance the 1-byte and 2-byte FP indexes can speed up the server

• The first method is that searching an FP (of any type) can be done in parallel, which may be faster than a so-called fast index (LF and HG)

• The second speed up is that for a LIKE operator the 1-byte and 2-byte FP can be faster than an HNG index– Based on the width of the search predicate– And the width of the FP index (1 or 2 bytes)

Page 9: IQ Indexes

FP Index Types

• Depending on cardinality and the use of IQ UNIQUE, IQ-M will initially construct FP index in one of 3 ways

• If > 65536 distinct values in column– Flat FP index

• If < 256 distinct values– 1-byte FP

• Between 256 and 65536 distinct values– 2-byte FP

Page 10: IQ Indexes

Flat FP Index

• More than 65536 values in a column

• The raw data is compressed (on disk)

Color

Red

Blue

Green

Red

Page 11: IQ Indexes

1-byte FP Index

• Less than 256 values in a column

• A one byte lookup table is built

Data Values

Red

Blue

Green

Red

Color

Red

Blue

Green

1

2

3

1

1

1

2

3

3

3

2

Lookup TableData

Page 12: IQ Indexes

2-byte FP Index

• Between 256 and 65536 values in a column 4 pages

• A two byte lookup table is built

Data Values

Red

Blue

Green

Red

Color

Red

Blue

Green

1

2

3

1

1

1

2

3

3

3

2

Lookup Table Data

1

1

1

2

3

3

3

2

1

2

3

Page 13: IQ Indexes

FP Indexes

• During load, IQ-M will try to build Flat FP first, unless otherwise specified

• If you specify a low cardinality 1-byte FP at the start (by using IQ UNIQUE) then this will– *NEW in 12.4.3* Resorts to a 2-byte FP after 256 values or n pages

see two slides on…(not as we used to think directly to a flat FP)

• If you specify a 2-byte FP at the start (by using IQ UNIQUE) then this will– Resort to Flat FP after approx. 16500 values or n pages see two

slides on…

Page 14: IQ Indexes

FP Index Growth

• Although an FP can, as the cardinality grows, change (1-byte FP>2-byte FP) or (2-byte FP>Flat FP) it can never revert

• There is a high cost in conversion – either way

• The only backwards conversion is to drop the column (not the index), recreate and reload (Expensive)

Page 15: IQ Indexes

1-Byte FP to Flat FP

• FP_Number_Lookup_Pages Def 1

• This option controls the number of pages (in 12.4) in the 1-byte lookup table

• When the lookup table grows beyond this the index will flip to a 2-byte FP

• Do not make this more than 4 pages – or you will use a vast amount of memory for each and every FP – but you may prove me wrong!

• This was we use PageList for the storage of 1-byte in 12.4.

Page 16: IQ Indexes

Limit for 2-byte FP - 1

• FP_Lookup_Size Def 32767

• This option controls the number of discrete values that a 2-byte FP can contain, as a maximum Note this is in Kbytes

• This is the new 12.4.2 and beyond option for constraining the size of a 1- or 2- byte FP

• By default a 2-byte FP to Flat FP flip will occur when the lookup table grows beyond 32 MBNote – If you have a pre-12.4 1 byte FP it is constrained by

FP_Number_Lookup_Pages

Page 17: IQ Indexes

Limit for 2-byte FP - 2

• FP_Lookup_Size Def 32767

• For a bigint of 8 bytes, 65536 entries take only around 1+ MB.

• For a max varchar of 255 bytes, 65536 entries take no more than 18 MB.

• It is critical to keep the entire ByteStore (lookup table) in memory for performance reason.

Page 18: IQ Indexes

IQ UNIQUE – A catch - 1

• If you have specified a very incorrect value for the IQ UNIQUE and the real number of discrete values is high there can be a problem

Page 19: IQ Indexes

IQ UNIQUE – A catch - 2

• Say you have specified IQ UNIQUE 100, and the column has 1.2 million distinct values– The FP will start as a 1-byte, failover to a 2-byte then failover

to a flat FP– Now because you “thought” the column only had a cardinality

of 100 you did not put a fast index on it– Then the optimizer has no way to determine the exact

cardinality of the column. It knows that the IQ UNIQUE value is wrong (because there is a flat FP index), so it doubles the FP-lookup-size value and assumes this is the cardinality.

– If you had specified the IQ UNIQUE as greater than FP-lookup-size then the optimizer has to assume the IQ UNIQUE value is the cardinality

Page 20: IQ Indexes

Use of 1 and 2-byte FP Indexes - 1

• Even with the “catch” noted above the lookup FP provides exact counts for distinct values of the optimizer– This may mean you do not need an LF or HG

• Projection of the column can be significantly faster, because of fewer disk reads

• But these are very small advantages compared to …

Page 21: IQ Indexes

Use of 1 and 2-byte FP Indexes - 2

• Access Paths• In 12.4.2 and beyond we can perform more operations

on the lookup table of the FP – instead of the other indexes.– LIKE (simple and Complex)– Simple Predicate

YEAR(column_name) > ‘1995’ SQRT(column_name) < 100 DATEPART(hh,column_name) between 10 and 12

• Also the resulting “scan” is performed in parallel (if required)

Page 22: IQ Indexes

Low Fast Index

• Traditional Bit Map for Low Cardinality– Less than 10,000 unique values in a column– Can be unique

• Required for performance involving– Joins– Group by– MIN, MAX, SUM, AVG functions– Where clause predicates

Equality / Inequality, Ranges, IN lists

Page 23: IQ Indexes

Bitmap Indexes

What are bitmaps?

• Bitmaps are representations for each value in a field• True = 1• False = 0

• Bit position corresponds to a fixed row ID

• For each discrete value there is one bitmap – the length of which is number of rows in the table

Bitmap for "Yes" or "No" field

row-id YES NO

1 0 1

2 0 1

3 0 1

4 1 0

5 0 1

6 1 0

7 1 0

8 0 1

9 0 1

10 1 0

11 1 0

12 1 0

Page 24: IQ Indexes

Digression on Bitmaps - 1

• If there are 7,000,000 rows in the table each bitmap will be 7,000,000 bits long

• This could be 1,000,000 bytes (almost a megabyte)

• If there are 1000 possible values in the column this could mean 1 Gbyte (approx.) for the column

• Is this correct ?

Page 25: IQ Indexes

Digressions on Bitmaps - 2

• Yes, sort of…

• In IQ-M there are 4 ways of holding a bitmap page

• The conditions for the 4 types of page are– All Zero Bitmap– Few 1s– 20-80% 1s– Almost all 1s– All 1 bitmap

Page 26: IQ Indexes

Bitmap “Types”

• An all Zero bitmap page is not stored– just an entry in the block map

• An all 1 bitmap page is also not stored– a similar entry in the block map

• For the 20-80% 1s there is a real bitmap– called a bit vector

• For the nearly all 1s or nearly all zero pages the data in Run Length Encoded

Page 27: IQ Indexes

Run Length Encoding

• Used when there is a very sparse set in bits set (or not set)

• Very efficient on storage

1-50,90,102-135,1090-4573,7833,

9011-11430,...

Page 28: IQ Indexes

Bitmap Indexes

Bitmap Index for STATE

row-id AK AL AR CA CO DC DE FL

1 0 0 0 1 0 0 0 0

2 0 0 0 0 0 0 0 1

3 0 1 0 0 0 0 0 0

4 1 0 0 0 0 0 0 0

5 0 1 0 0 0 0 0 0

...

• Each unique value has it’s own bitmap

• Designed for incremental additions of rows

• Query:

select count(*) from customers where state =‘AL’

Page 29: IQ Indexes

High Group Index - 1

• High Cardinality data columns– More than 1000 unique values

• Can enforce uniqueness– Special internal structure for unique HG indexes– Automatically created by Create Table for columns with

UNIQUEUNIQUE or PRIMARYPRIMARY KEYKEY constraint (regardless of cardinality)

Page 30: IQ Indexes

High Group Index - 2

New Blocks can be addedinto the Linked List

B-TreeIndex

a b c

1,2 4

3,5,6

Page 31: IQ Indexes

High Group Index - 3

When a page is completely filledWith one value the array is

converted to a bitmap

B-TreeIndex

a B(ptr) C = 4

1,2

101101010100110100010010010010100101010

Much faster for loadMuch faster for Skewed data

give bitmaps directly to optimizer

Page 32: IQ Indexes

High Group Index - 4

• Required for performance on High Cardinality columns used for:– Joins– Select Distinct, Count Distinct– Group By

• Takes up the most space in the database• Requires the longest time to load/delete• Cannot be used with certain data types

Page 33: IQ Indexes

Specialised High Group - 1

• When a column is created (or altered) to have the following constraints– UNIQUE– Primary Key

• Then the column has a unique HG index created automatically

• This is a HG without a G-Array

Page 34: IQ Indexes

Specialised High Group - 2

• Provided the combination of two or more columns is unique, then you can generate a unique multi-column HG index on product of the two columns

• You may still generate other indexes on the base columns

Page 35: IQ Indexes

High Non Group Index (HNG)

• Bit-Wise Index– data stored as binary– vertically partitioned– patented by Sybase– cannot be unique– cannot be used with certain data types

• Used for– range searches for all cardinality columns– aggregation (sum and average functions)

Page 36: IQ Indexes

•Data with large number of values stored in binary form

•Data sliced vertically - each bit position can be manipulated separately

•Many bit positions are either all on or all off so no storage space is required with compression

HNG - High Card Bit-Wise Index

Sales

6951193712

Sales in binary form8 bit 4 bit 2 bit 1 bit

0 1 1 01 0 0 10 1 0 11 0 1 11 0 0 10 0 1 10 1 1 11 1 0 0

Sales in binary form8 bit 4 bit 2 bit 1 bit

0 1 1 01 0 0 10 1 0 11 0 1 11 0 0 10 0 1 10 1 1 11 1 0 0

Page 37: IQ Indexes

HNG Index Processing

Sales

6951193712

8 bit 4 bit 2 bit 1 bit

0 1 1 01 0 0 10 1 0 11 0 1 11 0 0 10 0 1 10 1 1 11 1 0 0

#1bits on*1 + #2bits on*2 + #4bits on*4 +#8bits on*8

(6*1) + (4*2) +(4*4)+ (4*8) = 62

For the query:select sum(sales) from customers

ASIQ performs the sumas follows :

Page 38: IQ Indexes

HNG Indexes with Other Indexes

• Any Cardinality columns also need an HNG Index– Columns used with aggregates (sum, avg)– Range searches– Root String searches

example: where cust_name like “Stan%” all other string searches will use the FP index

Page 39: IQ Indexes

Compare (CMP) Index

• The CMP index is used for “comparing” 2 columns in the same table

• It is really just 3 bitmaps– A “less than” bitmap– An “Equal to” bitmap– A “Greater than” bitmap

• Performance of t1.col1 > t1.col2 is substantially improved

• Load times are only marginally affected (<1%)

Page 40: IQ Indexes

Word (WD) Index

• This is a specialised index that indexes each and every “word” in a column

• Used for char() varchar() and long varchar()• Slightly faster to load than an equivalent column HG

index• Accessed by the “contains” verb

where t1.col1 contains (‘Richard’, ‘Soundy’)

Page 41: IQ Indexes

Word Index Use

• The delimiters and the length of the entries can be set for the WD index during index creation time.

• By default the delimiters are all the ASCII characters not defined as number or alphabetic

• The limit is the max size of each entry

CREATE WD INDEX earnings_wd ON earnings_report_table(varchar) DELIMITED BY ‘ :;.’ LIMIT 25

Page 42: IQ Indexes

Indexing Strategy

• Some columns may now need 2 or 3 indexes– Possible Combinations of indexes:

FP + LF FP + LF + HNG FP + HG FP + HG + HNG

• Unlikely Combination– FP + LF + HG

Page 43: IQ Indexes

Assigning Indexes

• Every column has the Default Index (FP)– Create Table command builds this index– cannot be dropped

• No additional indexes needed with– columns used only for Projection (Select list)– columns with ‘bit’ datatype

Page 44: IQ Indexes

High Group – 1

Regardless of cardinality put an HG on:

• Any Primary Key (if not defined in create or alter table – where it is created automatically)

• Any UNIQUE column (if not defined in create or alter table – where it is created automatically)

• Any column used in a join

Page 45: IQ Indexes

High Group – 2

• Why do the above?

• It is better and generally faster – especially on Low Cardinality Columns

• This is a change from previous thoughts

Page 46: IQ Indexes

High Group – Warning

• A High Group Index is the most complex index

• It takes the longest to load (and delete)

• The load time is directly proportional to the length or width of the data component– Smaller (in size not cardinality) data is better– Cardinality is also an issue the higher the number of “new”

values in an incremental load the slower the load can get– Can the data be broken into more than one column?

Page 47: IQ Indexes

Low Fast or High Group

• If a column is not– A Primary Key– UNIQUE– used in a join (ad-hoc or Join Index)

• But is still used for more than just a projection column it will require another one or two indexes)

Page 48: IQ Indexes

Low Fast or High Group

• Put an HG or an LF (depending on Cardinality) on the following column types– used in SELECT statement with functions:

min, max, count, count(distinct)– used in WHERE clause predicates:

equality / inequality IN list EXISTS

– used in GROUP BY– used in ORDER BY

Page 49: IQ Indexes

Additional Indexes

Misconception• The 1000 cardinality break point is not a fixed limit• Absolute limit is cardinality of 9,999• There are no hard and fast rules

– A 5,000 card LF may work faster in one case– In an other an 800 card HG may be faster

• Remember a LF is much faster to load data into than an HG

Page 50: IQ Indexes

Regardless of Cardinality

• Build an HNG index on all columns used– in BETWEEN or RANGE comparisons

– with AVG() or SUM() functions

– root string searches

• Examples:– where sold_date between ‘1/5/99’ and ‘2/5/99’

– where revenue > 1000

– Select sum(revenue) from ...

– where customer_name like ‘Syb%’

• But remember a 1 or 2 Byte FP may do the above faster!

Page 51: IQ Indexes

HNG and LF

• It is also suggested that for columns that are used in Range searches have both an HNG and an LF on them

• If the LF can be applied (for cardinality reasons)

• Why? –– The optimiser can (in certain circumstances) use either an

HNG or an LF to satisfy a range query – so lets have both – if we can afford the load and delete overheads

Page 52: IQ Indexes

HNG or LF (?)

• An HNG will be faster than an LF for SUM() anytime the number of distinct values is larger than the number of bits in the column’s data type

• An HNG will be faster than an LF for range predicates whenever the number of distinct values within the selected range exceeds ~2.5 times the number of bits in the column’s data type

• For very low cardinality, in a join column a HG will be the fastest index

Page 53: IQ Indexes

HG vs. LF and HNG

• For very low cardinality (dimension) tables then an HG may be better than both an LF and an HNG

• Why?

• The HG may only be 2 pages long (1 for the B-tree and 1 for the G-Array)

• Whereas the LF will be n pages long (1 for the B-Tree Identity page and 1 for each value)

• The HNG can also be large (1 page for the B-Tree Identity Page and 1 for each bit in the datatype)

Page 54: IQ Indexes

HNG

• Given the above slide – if you have an HNG and an HG on a low cardinality column, test the range queries

• If they are erratic in performance– Sometimes fast, sometimes slow

• Remove the HNG and try again

• This is an optimiser “deficiency” that might not be corrected…

Page 55: IQ Indexes

Indexing Tips - 1

• Use IQ Unique constraint in Create Table– This will improve (usually) load performance– Cardinality near the critical thresholds is the most important

• Create Indexes before loading Data– ASIQ is NOT like ASE – indexes load very fast, so do all the

index loads simultaneously

Page 56: IQ Indexes

Indexing Tips – 2

• Choose Indexes by committee– If in doubt as to the use of a column ask everybody!– If in doubt index everything, you can always drop indexes

later (but beware of load times on wide HG)

• Specify HG index as UNIQUE– This information is of great use to the optimiser– Also the UNIQUE HG is a very efficient index

Page 57: IQ Indexes

One Last thing on Indexes

• There is a bitmap on each and every column and each and every table called the Existence Bitmaps

• The row Existence Bitmap tells IQ-M whether there is a row for that RowId or not

• The Column Existence Bitmap tells IQ-M that the column has a value (not NULL)

Page 58: IQ Indexes

Index Use

Page 59: IQ Indexes

Optimiser Use of Indexes

• During the optimisation of a query, the query plan is created

• This is the road map for the execution of the query

• During this time the optimiser will determine the query (search and select) columns and what indexes are present

Page 60: IQ Indexes

Optimiser Use of Indexes

• The type of indexes present may allow the optimiser to select different “ways” of solving the query

• In addition certain search clauses can only be satisfied quickly by using certain indexes

Page 61: IQ Indexes

Index Selection - 1

Start Index Selection

INDEX_PREFERENCE LOW_FAST HIGH_GROUP HIGH_NON_GROUP FAST_PROJECTION

Selection Order

Query is Like

YES

NO

Query is equality YES

NO

FAST_PROJECTION

Found Set < 2000 rows

YES

NO

FAST_PROJECTION

Page 62: IQ Indexes

Index Selection - 2

Found Set < 2000 rows

YES

NO

FAST_PROJECTION

Query is distinct YES

NO

INDEX_PREFERENCE LOW_FAST HIGH_GROUP FAST_PROJECTION

Selection Order

Query is min, max group by YES

NO

Binary Collation

YES

NO

INDEX_PREFERENCE LOW_FAST HIGH_GROUP FAST_PROJECTION

INDEX_PREFERENCE LOW_FAST FAST_PROJECTION

Next Page

Page 63: IQ Indexes

Index Selection - 3

From Prev. Page

Selection Order

Query is Sum Avg YES

NO

Col Card. <

Col. Width YES

NO

INDEX_PREFERENCE LOW_FAST HIGH_GROUP HIGH_NON_GROUP FAST_PROJECTION

INDEX_PREFERENCE HIGH_NON_GROUP LOW_FAST HIGH_GROUP FAST_PROJECTION

h

Page 64: IQ Indexes

Index Selection - 4

INDEX_PREFERENCE HIGH_NON_GROUP LOW_FAST HIGH_GROUP FAST_PROJECTION

INDEX_PREFERENCE LOW_FAST HIGH_GROUP HIGH_NON_GROUP FAST_PROJECTION

Query is Between,

Range YES

NO

OK So What is the Query?

Range Card. <

Col. Width YES

NO

Page 65: IQ Indexes

A Well Forgotten Fact

• There are 2 reasons IQ-M needs enumerated indexes (LF, HG, 1-byte and 2-byte FP)

• The primary reason we have discussed, that is to solve the query

• In IQ 12.4 and beyond there is another reason, to provide the Optimiser with accurate distinct counts

• The second reason is very important – we will discuss it more when we get to the Query Tree

Page 66: IQ Indexes

Index Descriptions - End