MySQL Indexes


Page 1: MySQL Indexes

MySQL Indexes

Page 2: MySQL Indexes
Page 3: MySQL Indexes

Why use indexes?

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees

A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time

Page 4: MySQL Indexes

B-tree

Time complexity:

Full table scan = O(n)

Using index = O(log(n))
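
To get a feel for the gap between O(n) and O(log(n)), the sketch below counts lookup steps under a deliberately simplified model: a full scan touches every row, while an index lookup descends one tree level per step. The function names and the `fanout` parameter are illustrative assumptions, not MySQL internals.

```python
def full_scan_steps(row_count):
    """Worst case for a full table scan: every row is examined."""
    return row_count


def index_lookup_steps(row_count, fanout=2):
    """Levels of a balanced tree with the given fanout that can hold row_count keys."""
    steps, reachable = 0, 1
    while reachable < row_count:
        reachable *= fanout  # each extra level multiplies the reachable keys
        steps += 1
    return steps


if __name__ == "__main__":
    n = 1_000_000
    print(full_scan_steps(n))                 # 1000000
    print(index_lookup_steps(n))              # 20 (binary tree)
    print(index_lookup_steps(n, fanout=100))  # 3 (wide B-tree-like node)
```

A wider node (higher fanout) makes the tree shallower, which is why B-trees suit disk-based storage.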

Page 5: MySQL Indexes

Selectivity

Selectivity is the ratio of distinct values to the total number of rows in a column

The more unique the values, the higher the selectivity

The query engine likes highly selective key columns

The higher the selectivity, the faster the query engine can reduce the size of the result set

Page 6: MySQL Indexes

Selectivity and Cardinality

Cardinality is the number of unique values in the index.

In simple words:

Max cardinality: all values are unique

Min cardinality: all values are the same

Selectivity of index = cardinality/(number of records) * 100%

Perfect selectivity is 100%. It can be reached by unique indexes on NOT NULL columns.
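
The formula above can be sketched in a few lines of Python. This computes the numbers from an in-memory list of column values; the function names and the sample data are assumptions made for illustration only.

```python
def cardinality(values):
    """Number of distinct values in the column."""
    return len(set(values))


def selectivity_percent(values):
    """Selectivity = cardinality / number of records * 100%."""
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    return 100 * cardinality(values) / len(values)


if __name__ == "__main__":
    unique_ids = list(range(10_000))         # every value is unique
    two_valued = ["m", "f"] * 5_000          # only two distinct values
    print(selectivity_percent(unique_ids))   # 100.0
    print(selectivity_percent(two_valued))   # 0.02
```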

Page 7: MySQL Indexes

Query optimization

The main idea is not to tune your database, but to optimize your queries based on the data you have

Page 8: MySQL Indexes

Selectivity by example

Example:

Table of 10,000 rows with column `gender` (number of males ~ number of females)

Let’s compute the selectivity of the `gender` column:

Selectivity = 2/10000 * 100% = 0.02%, which is very low

Page 9: MySQL Indexes

When selectivity can be neglected

Selectivity can be neglected when values are distributed unevenly: an index still pays off for values that match only a small share of the rows

Example:

If our query selects rows with stat IN (0,1), then it can still use the index.

As a rule of thumb, we should create indexes on tables that are often queried for less than 15% of the table's rows
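
A minimal sketch of that rule of thumb: measure what share of the rows the queried values match and compare it with the 15% threshold. The `stat` distribution below is made up for illustration (it is not taken from the real table), and the function names are assumptions.

```python
from collections import Counter


def matched_share_percent(column_values, wanted):
    """Percentage of rows whose value is one of the wanted values."""
    counts = Counter(column_values)
    matched = sum(counts[value] for value in set(wanted))
    return 100 * matched / len(column_values)


def index_worth_considering(column_values, wanted, threshold_percent=15):
    """Rule of thumb from the slide: the query should touch under ~15% of rows."""
    return matched_share_percent(column_values, wanted) < threshold_percent


if __name__ == "__main__":
    # Hypothetical skewed column: three distinct values, unevenly distributed.
    stat = [2] * 9_000 + [0] * 600 + [1] * 400
    print(matched_share_percent(stat, [0, 1]))    # 10.0
    print(index_worth_considering(stat, [0, 1]))  # True
    print(index_worth_considering(stat, [2]))     # False
```

The column has very low selectivity (3 distinct values in 10,000 rows), yet filtering on the rare values still discards 90% of the table.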

Page 10: MySQL Indexes

How MySQL uses indexes

• Data Lookups

• Sorting

• Avoiding reading “data”

• Special Optimizations

Page 11: MySQL Indexes

Data Lookups

SELECT * FROM employees WHERE lastname='Smith'

The classical use of an index on (lastname)

Can use multiple-column indexes

SELECT * FROM employees WHERE lastname='Smith' AND dept='accounting'

Will use an index on (dept, lastname)

Page 12: MySQL Indexes

Use cases

Index (a,b,c) - order of columns matters

Will use Index for lookup (all listed keyparts)

a>5

a=5 AND b>6

a=5 AND b=6 AND c=7

a=5 AND b IN (2,3) AND c>5

Will NOT use Index

b>5 - Leading column is not referenced

b=6 AND c=7 - Leading column is not referenced

Will use Part of the index

a>5 AND b=2 - range on first column; only use this key part

a=5 AND b>6 AND c=2 - range on second column, use 2 parts

Page 13: MySQL Indexes

The thing with ranges

MySQL stops using key parts of a multi-part index as soon as it meets a real range (<, >, BETWEEN). It can, however, continue using key parts further to the right if an IN(…) range is used
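
The key-part rules from the last two slides can be condensed into a small model. This is a simplification of what the optimizer does, not its actual algorithm; the function name and the "eq" / "in" / "range" labels are assumptions made for this sketch.

```python
def usable_key_parts(index_columns, predicates):
    """Count how many leading key parts a lookup can use.

    predicates maps a column name to "eq", "in" or "range".
    """
    used = 0
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:      # column not referenced: the prefix ends here
            break
        used += 1
        if kind == "range":   # a real range (<, >, BETWEEN) is the last usable part
            break
    return used


if __name__ == "__main__":
    index = ["a", "b", "c"]
    print(usable_key_parts(index, {"a": "range"}))                       # 1
    print(usable_key_parts(index, {"a": "eq", "b": "in", "c": "range"})) # 3
    print(usable_key_parts(index, {"b": "eq", "c": "eq"}))               # 0
    print(usable_key_parts(index, {"a": "eq", "b": "range", "c": "eq"})) # 2
```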

Page 14: MySQL Indexes

Sorting

SELECT * FROM players ORDER BY score DESC LIMIT 10

Can use an index on the score column

Without an index, MySQL will do a “filesort” (external sort), which is very expensive

Often combined with using an index for lookup

SELECT * FROM players WHERE country='US' ORDER BY score DESC LIMIT 10

Best served by an index on (country, score)

Page 15: MySQL Indexes

Use Cases

It becomes even more restricted!

KEY(a,b)

Will use Index for Sorting

ORDER BY a - sorting by leading column

a=5 ORDER BY b - EQ filtering by 1st and sorting by 2nd

ORDER BY a DESC, b DESC - Sorting by 2 columns in same order

a>5 ORDER BY a - Range on the column, sorting on the same

Will NOT use Index for Sorting

ORDER BY b - Sorting by second column in the index

a>5 ORDER BY b - Range on first column, sorting by second

a IN(1,2) ORDER BY b - In-Range on first column

ORDER BY a ASC, b DESC - Sorting in the different order

Page 16: MySQL Indexes

Sorting rules

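You can’t sort by 2 columns in different orders (with a plain ascending index such as KEY(a,b); MySQL 8.0 supports descending key parts, which can lift this restriction)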

You can only have Equality comparison (=) for columns which are not part of ORDER BY

Not even IN() works in this case
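
These sorting rules can be written down as a small checker. It is a simplified model of the rules on these slides for a plain ascending index, not the optimizer's real logic; the function and parameter names are assumptions.

```python
def can_sort_with_index(index_columns, equality_columns, other_filter_columns, order_by):
    """Return True if the slide rules allow the index to provide the sort order.

    order_by is a list of (column, "ASC" | "DESC") pairs;
    other_filter_columns holds columns filtered with a range or IN().
    """
    if len({direction for _, direction in order_by}) > 1:
        return False  # mixed ASC/DESC cannot follow a single index order
    order_columns = [column for column, _ in order_by]
    # Columns outside ORDER BY may only be filtered with equality (=).
    if any(column not in order_columns for column in other_filter_columns):
        return False
    # Skip leading key parts that are pinned to one value by equality.
    start = 0
    while (start < len(index_columns)
           and index_columns[start] in equality_columns
           and index_columns[start] not in order_columns):
        start += 1
    candidate = list(index_columns[start:start + len(order_columns)])
    return candidate == order_columns


if __name__ == "__main__":
    key = ["a", "b"]
    print(can_sort_with_index(key, {"a"}, set(), [("b", "ASC")]))  # True
    print(can_sort_with_index(key, set(), {"a"}, [("b", "ASC")]))  # False
```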

Page 17: MySQL Indexes

Avoid reading the data

“Covering Index”

Applies to index use for specific query, not type of index.

Reading Index ONLY and not accessing the “data”

SELECT status FROM orders WHERE customer_id=123

KEY(customer_id, status)

Index is typically smaller than data

Access is a lot more sequential

Access through data pointers is often quite “random”
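
Whether an index covers a query boils down to a set check: every column the query touches must be part of the index. The helper below is a hypothetical illustration of that check, not a MySQL API.

```python
def is_covering(index_columns, referenced_columns):
    """True if every column the query references is stored in the index."""
    return set(referenced_columns) <= set(index_columns)


if __name__ == "__main__":
    # SELECT status FROM orders WHERE customer_id=123 with KEY(customer_id, status)
    print(is_covering(["customer_id", "status"], ["status", "customer_id"]))  # True
    # Selecting an extra column forces a trip to the row data.
    print(is_covering(["customer_id", "status"], ["status", "customer_id", "total"]))  # False
```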

Page 18: MySQL Indexes

Aggregation functions

Indexes help the MIN()/MAX() aggregate functions

But only these

SELECT MAX(id) FROM table;

SELECT MAX(salary) FROM employee GROUP BY dept_id

Will benefit from (dept_id, salary) index

“Using index for group-by”
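
Why a (dept_id, salary) index helps here: the index entries are sorted, so the largest salary of each department is simply the last entry of its group, and the reader can jump from group to group. The sketch below imitates this with a sorted list of tuples and `bisect`; it is a rough model with made-up data, not how the storage engine is implemented.

```python
import bisect


def max_per_group(sorted_entries):
    """sorted_entries: list of (dept_id, salary) tuples in index order."""
    result = {}
    position = 0
    while position < len(sorted_entries):
        dept_id = sorted_entries[position][0]
        # Jump straight past the last entry of this department.
        group_end = bisect.bisect_right(sorted_entries, (dept_id, float("inf")))
        result[dept_id] = sorted_entries[group_end - 1][1]
        position = group_end
    return result


if __name__ == "__main__":
    entries = sorted([(2, 70), (1, 300), (2, 50), (3, 10), (1, 100), (2, 90)])
    print(max_per_group(entries))  # {1: 300, 2: 90, 3: 10}
```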

Page 19: MySQL Indexes

Joins

MySQL performs joins as “Nested Loops”

SELECT * FROM posts p, comments c WHERE p.author='Peter' AND c.post_id=p.id

Scan table `posts` finding all posts which have Peter as an author

For every such post go to `comments` table to fetch all comments

Very important to have all JOINs Indexed

Index is only needed on table which is being looked up

The index on posts.id is not needed for this query's performance

Re-Design JOIN queries which can’t be well indexed
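
The nested-loop strategy described above can be imitated in Python. The rows are invented for illustration, and a dict stands in for the index on comments.post_id (a hash lookup rather than a B-tree, which is an assumption of this sketch).

```python
from collections import defaultdict

posts = [
    {"id": 1, "author": "Peter"},
    {"id": 2, "author": "Anna"},
    {"id": 3, "author": "Peter"},
]
comments = [
    {"id": 10, "post_id": 1},
    {"id": 11, "post_id": 2},
    {"id": 12, "post_id": 3},
    {"id": 13, "post_id": 1},
]


def build_index(rows, column):
    """Map each value of the column to the rows that hold it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


def nested_loop_join(posts, comments_by_post_id, author):
    joined = []
    for post in posts:  # outer loop: scan posts, keep the author's posts
        if post["author"] != author:
            continue
        # inner step: index lookup instead of scanning all comments
        for comment in comments_by_post_id.get(post["id"], []):
            joined.append((post["id"], comment["id"]))
    return joined


if __name__ == "__main__":
    comments_by_post_id = build_index(comments, "post_id")
    print(nested_loop_join(posts, comments_by_post_id, "Peter"))
    # [(1, 10), (1, 13), (3, 12)]
```

Only the table being looked up (`comments`) needs the index; `posts` is scanned once either way.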

Page 20: MySQL Indexes

Multiple indexes

MySQL can use more than one index

“Index Merge”

SELECT * FROM table WHERE a=5 AND b=6

Can often use Indexes on (a) and (b) separately

Index on (a,b) is much better

SELECT * FROM table WHERE a=5 OR b=6

2 separate indexes is as good as it gets

Index (a,b) can’t be used for this query
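
Conceptually, an index merge combines the row sets found through each single-column index: an intersection for AND, a union for OR. The sketch below models that with Python sets of row positions; the data and the helper name are assumptions, and the real implementation works on sorted row IDs rather than hash sets.

```python
def row_ids_by_value(rows, column):
    """Single-column 'index': value -> set of row positions."""
    index = {}
    for row_id, row in enumerate(rows):
        index.setdefault(row[column], set()).add(row_id)
    return index


rows = [
    {"a": 5, "b": 6},
    {"a": 5, "b": 1},
    {"a": 2, "b": 6},
    {"a": 3, "b": 3},
]
index_a = row_ids_by_value(rows, "a")
index_b = row_ids_by_value(rows, "b")

# WHERE a=5 AND b=6: intersect the two row sets
and_ids = index_a.get(5, set()) & index_b.get(6, set())
# WHERE a=5 OR b=6: union of the two row sets
or_ids = index_a.get(5, set()) | index_b.get(6, set())

if __name__ == "__main__":
    print(sorted(and_ids))  # [0]
    print(sorted(or_ids))   # [0, 1, 2]
```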

Page 21: MySQL Indexes

String indexes

There is no difference… really

Sort order is defined for strings (collation)

“AAAA” < “AAAB”

Prefix LIKE is a special type of Range

LIKE “ABC%” means

“ABC[LOWEST]”<KEY<“ABC[HIGHEST]”

LIKE “%ABC” can’t be optimized by use of the index
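
The prefix-as-range idea can be tried on a sorted list of strings. In this sketch Python's code-point ordering stands in for the collation, and the upper bound is built by bumping the last character of the prefix; both are simplifying assumptions.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Return the keys that start with prefix, using two binary searches."""
    if not prefix:
        return list(sorted_keys)
    # Smallest string greater than every key with this prefix.
    # (Assumes the last character is not the highest code point.)
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    low = bisect.bisect_left(sorted_keys, prefix)
    high = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[low:high]


if __name__ == "__main__":
    keys = sorted(["AAAA", "AAAB", "ABC", "ABCD", "ABCZ", "ABD", "XABC"])
    print(prefix_range(keys, "ABC"))  # ['ABC', 'ABCD', 'ABCZ']
```

A pattern such as '%ABC' has no fixed prefix, so there is no contiguous slice of the sorted keys to search; "XABC" above can only be found by checking every key.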

Page 22: MySQL Indexes

Real case: Problem

Let's take an example from the real world (Voltu first page campaigns list)

Page 23: MySQL Indexes

Real case: Timing

Initially the query took about 1m 20s to run for the first time

After MySQL cached the response, it took about 20s

Page 24: MySQL Indexes

Real case: Query

SELECT wk2_campaign.*,
       wk2_campaignGroup.category_id as group_category_id,
       wk2_campaignGroup.subcategory_id as group_subcategory_id,
       wk2_campaignGroup.summary as group_summary,
       IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) category_id
FROM `wk2_campaign`
LEFT JOIN wk2_resource_status ON (wk2_resource_status.id = wk2_campaign.CaID)
LEFT JOIN campaign_has_group ON (wk2_campaign.CaID = campaign_has_group.campaign_id)
LEFT JOIN wk2_campaignGroup ON (campaign_has_group.campaign_group_id = wk2_campaignGroup.GrID)
LEFT JOIN si_private_campaigns pc ON (pc.campaign_id = wk2_campaign.CaID)
WHERE (wk2_campaign.tracking_active = '1')
  AND ((IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) IS NOT NULL)
       AND (IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) NOT IN
            (SELECT id FROM campaign_categories WHERE name IN ('Mobile Content Subscription')))
       AND (countries REGEXP 'US'))
  AND (((wk2_campaign.stat IN ('0', '1'))
        AND (wk2_resource_status.resource_type = 'ca')
        AND (wk2_resource_status.status = '1')
        AND (wk2_campaign.access != '0')
        AND (wk2_campaign.external_id IS NULL)
        AND (wk2_campaign.name IS NOT NULL)
        AND (wk2_campaign.countries IS NOT NULL)
        AND (trim(wk2_campaign.countries) IS NOT NULL))
       OR (pc.campaign_id IS NOT NULL));

Page 25: MySQL Indexes

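Steps to optimize

1. Add missing indexes for the joined tables

2. Check the selectivity for different columns of the main table wk2_campaign

The `tracking_active` and `stat` columns turned out to be the best candidates. They have few possible values, so their selectivity by the formula above is low, but as noted under “When selectivity can be neglected”, an index still helps when the queried values match a small share of the rows. Such an index is also fast to build and boosts query response time.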

Page 26: MySQL Indexes

Steps to optimize

3. Add an index on these columns:

ALTER TABLE wk2_campaign ADD INDEX(tracking_active, stat);

4. We only needed to move some conditions so that they would fit the index

Page 27: MySQL Indexes

Result of optimization

With these manipulations we made the query use only indexes

The explain select of this query:

Query run                       Before    After    Speedup (before / after)

First time                      1m 20s    0m 2s    4000% (40x)

Subsequent (cached by MySQL)    20s       0.26s    7692% (about 77x)

Page 28: MySQL Indexes

Another example with “or”

Before

SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE (name LIKE '%buscape%' OR caid LIKE 'buscape%')
   OR mobile_app_id LIKE '%buscape%'
   OR caid in ('89630','89632');

130 rows in set (7.43 sec)

After

SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE (name LIKE '%buscape%' OR caid LIKE 'buscape%')
UNION
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE mobile_app_id LIKE '%buscape%'
UNION
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE caid in ('89630','89632');

130 rows in set (4.12 sec)
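
The rewrite relies on the fact that a UNION of the separate branches returns the same rows as the OR-ed filter, as long as the original result contains no duplicate rows (UNION removes duplicates). The Python sketch below checks that equivalence on a few invented rows keyed by `caid`; the data is hypothetical and only the predicate shapes follow the query above.

```python
rows = [
    {"caid": "89630", "name": "alpha", "mobile_app_id": "com.alpha"},
    {"caid": "89632", "name": "buscape offers", "mobile_app_id": "com.x"},
    {"caid": "90001", "name": "beta", "mobile_app_id": "com.buscape.app"},
    {"caid": "90002", "name": "gamma", "mobile_app_id": "com.gamma"},
]


def by_name_or_caid(row):
    return "buscape" in row["name"] or row["caid"].startswith("buscape")


def by_app_id(row):
    return "buscape" in row["mobile_app_id"]


def by_caid_list(row):
    return row["caid"] in ("89630", "89632")


def select(predicate):
    """Set of matching keys, so duplicates collapse as they do under UNION."""
    return {row["caid"] for row in rows if predicate(row)}


or_result = select(lambda r: by_name_or_caid(r) or by_app_id(r) or by_caid_list(r))
union_result = select(by_name_or_caid) | select(by_app_id) | select(by_caid_list)

if __name__ == "__main__":
    print(sorted(or_result))          # ['89630', '89632', '90001']
    print(or_result == union_result)  # True
```

Splitting the query lets each branch be planned on its own, so the branch on `caid IN (...)` can use an index even though the '%buscape%' patterns cannot.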

Page 29: MySQL Indexes

> SELECT text FROM questions LIMIT 5;

> EXPLAIN