MySQL InnoDB Storage Engine

41
MySQL 5.6 InnoDB Kristian Köhntopp

Transcript of MySQL InnoDB Storage Engine

Page 1: MySQL InnoDB Storage Engine

MySQL 5.6 InnoDBKristian Köhntopp

Page 2: MySQL InnoDB Storage Engine

InnoDB

• Transactional Engine (ACID).

• Mostly lockless.

• Checksums and Crash Recovery.

2

Page 3: MySQL InnoDB Storage Engine

Using InnoDB

• CREATE TABLE t (…) ENGINE=InnoDB; • ALTER TABLE t ENGINE=InnoDB;

• Must not have FULLTEXT, GIS type.

• Must not have compound PK with auto_increment.

3

Page 4: MySQL InnoDB Storage Engine

Transactions

• START TRANSACTION [READ ONLY|READ WRITE|WITH CONSISTENT SNAPSHOT];

• do stuff

• COMMIT;

• Changes visible to others on commit (Isolation).

• Changes undone on ROLLBACK (or disconnect).

4

Page 5: MySQL InnoDB Storage Engine

Transactions

• By Default: SET AUTOCOMMIT = 1; • ; is an implicit commit.

• Enter explicit transaction w/ START TRANSACTION even in autocommit mode.

• DDL such as CREATE/ALTER TABLE implies COMMIT.

5

Page 6: MySQL InnoDB Storage Engine

What a write does6

UPDATE t SET d = 'one' WHERE id = 1

id d txn#

1 one 2

2 zwei 1

3 drei 1

id d txn#

1 eins 1

Undo Log

Table t

Page 7: MySQL InnoDB Storage Engine

What a write does

• The insert pushes the old data into the undo log.

• The new data is put into place.

• On commit, nothing needs to be done.

• On rollback, the old row needs to be copied back.

• This used to be slow - 1000 to 10000 rows/txn,much improved.

7

Page 8: MySQL InnoDB Storage Engine

What reads do

• SET TRANSACTION ISOLATION LEVEL …

• Can be set per connection.

• Is a reader thing, only.

• Write into undo log is necessary anyway for rollback.

8

Page 9: MySQL InnoDB Storage Engine

What reads do

• … READ UNCOMMITTED; • Reads from the table, always.

• Can return 'illegal data' that has never been there, logically.

9

Page 10: MySQL InnoDB Storage Engine

What reads do

• … READ COMMITTED; • Follows roll pointer one deep.

• Only one write per row at any point in time.

• If roll pointer exists, it points to comitted data, always.

10

Page 11: MySQL InnoDB Storage Engine

What reads do

• … REPEATABLE READ; • Reader is in a transaction, too:

• START TRANSACTION … SELECT … SELECT … COMMIT

• “Follow the roll pointer chain and get the newest version that is older than the start of the readers transaction.”

11

Page 12: MySQL InnoDB Storage Engine

Shrinking the UNDO log

• For all transactions in the system:

• Determine the oldest txn#.

• Purge all records from Undo log that are older than this txn#.

• SHOW ENGINE INNODB STATUS\G

12

Page 13: MySQL InnoDB Storage Engine

Shrinking the UNDO log

• START TRANSACTION WITH CONSISTENT SNAPSHOT;

• Go on vacation.

• What happens?

13

Page 14: MySQL InnoDB Storage Engine

Shrinking the UNDO log

• START TRANSACTION WITH CONSISTENT SNAPSHOT;

• Go on vacation.

• What happens?

• wait_timeout / interactive_timeout = 28800;

• Undo Log always part of ibdata1.

• Should be set to autoextend.

14

Page 15: MySQL InnoDB Storage Engine

Shrinking the UNDO log15

Page 16: MySQL InnoDB Storage Engine

Shrinking the UNDO log

• mysql> pager grep ACTIVE mysql> show engine innodb status\G...---TRANSACTION A90E003AB, ACTIVE 16830 sec, process no 12098, OS thread id 1749563712...mysql> pager lessmysql> show engine innodb status\G...---TRANSACTION A90E003AB, ACTIVE 16830 sec, process no 12098, OS thread id 17495637123 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1 MySQL thread id 146473882, query id 4156570244 mc02cronapp-01 192.168.1.10 cron_bpTrx read view will not see trx with id >= A90E14DE8, sees < A90E14DE8

16

Page 17: MySQL InnoDB Storage Engine

Shrinking the UNDO log

• mysql> pager grep cron mysql> show processlist; ...| 146473882 | cron_bp | mc02cronapp-01:50154 | bp | Sleep | 16434 |…

• ssh mc02cronapp-01 ‘lsof -i -n -P | grep 50154’

• Und dies ist der Prozeß, der gekillt werden muß!

17

Page 18: MySQL InnoDB Storage Engine

Concurrent writes

• Implement a counter in the application

• SELECT value FROM cnt WHERE name = ? • Calculate new value.

• UPDATE cnt SET value = ? WHERE name = ?

• Does not work without locking.

18

Page 19: MySQL InnoDB Storage Engine

Concurrent writes19

BEGIN WORK

SELECT valueFROM cntWHERE name = ?FOR UPDATE

BEGIN WORK

Thread 1 Thread 2

name = ? lockedby Thread 1

UPDATE cntSET value = ?WHERE name = ?

COMMIT lock dropped

SELECT valueFROM cntWHERE name = ?FOR UPDATE

name = ? lockedby Thread 2

Page 20: MySQL InnoDB Storage Engine

Concurrent writes

• Concurrent writes are handled by

• SELECT … FOR UPDATE • Is a read statement, locks like a write.

• Locks are taken via the index, have a close look at the execution plan.

• SELECT … WHERE id IN ( … ) FOR UPDATE

20

Page 21: MySQL InnoDB Storage Engine

Isolation Level #4

• … SERIALIZEABLE

• Like an implied SELECT … IN SHARE MODE.

• Never needed:

• …FOR UPDATE: create X-Lock on every row.

• … IN SHARE MODE: create S-Lock on every row.

21

Page 22: MySQL InnoDB Storage Engine

Commit and Checkpoint22

Log Buffer

Buffer Pool(Data Page Cache)

Add't'l Mem Pool(Metadata Cache)

Redo Log

ib_logfile0

ib_logfile1

ibdata1

data dictionary

undo log

doublewrite bufferinsert merge buffer

table data

Page 23: MySQL InnoDB Storage Engine

Commit and Checkpoint

• Change a record:

• Load page

• Modify page, move old data to Undo Log

• Create Redo Log Record in Log Buffer

23

Page 24: MySQL InnoDB Storage Engine

Commit and Checkpoint

• On Commit:

• Write Log Buffer to Redo Log

24

Page 25: MySQL InnoDB Storage Engine

Commit and Checkpoint

• On checkpoint:

• Find records from Redo Log

• Determine dirty pages in Buffer Pool referenced

• Flush pages to Doublewrite Buffer

• Free Redo Log

• Flush pages to tablespaces

25

Page 26: MySQL InnoDB Storage Engine

Commit and Checkpoint

• Checkpointing - when?

• When mysqld feels idle.

• When innodb_max_dirty_pages_pct exceeded.

• When Redo Log is full.

• Many improvements in 5.6:

• Large buffer pools, writes to SSD, bandwidth mgmt.

26

Page 27: MySQL InnoDB Storage Engine

Commit and Checkpoint27

Page 28: MySQL InnoDB Storage Engine

Commit and Checkpoint28

Page 29: MySQL InnoDB Storage Engine

Primary Key29

Index Block

Index Block Index Block Index Block

Index BlockIndex Block Index Block Index Block

Data Data Data …

Page 30: MySQL InnoDB Storage Engine

Primary Key

• “There is no MYD”: Data in leaves of PK - data_size is PK.

• Data is stored in PK Order.

• auto_increment always inserts at the right end of the table.

• The tree is kept balanced: Lots of reorders.

• Change buffer optimization fixes this.

30

Page 31: MySQL InnoDB Storage Engine

Primary Key and auto_increment31

Page 32: MySQL InnoDB Storage Engine

Secondary Indexes

• Secondary Indices use the Primary Key as a row pointer.

• Size Limit 64T for other reasons.

• Keep the PK short, it is part of every Index.

• The optimizer can use this.

32

Page 33: MySQL InnoDB Storage Engine

Page Structure33

Page Header

Page Footer

Record

Record

Record

Page Address:Page # +

Directory Entry #

Page 34: MySQL InnoDB Storage Engine

Page Structure

• All I/O done in 16K pages.

• Page:

• Page Header w/ Page Directory

• Many Records + free space

• Page Footer w/ Checksum

34

Page 35: MySQL InnoDB Storage Engine

Page Structure

• SK Row Pointer = PK

• Does not change when record relocates.

• PK Records: Page# + Page Directory Slot#

• Does not change when record moves around in page.

35

Page 36: MySQL InnoDB Storage Engine

Page Structure

• Page Structure avoids Index Update Storms

• Records 8-15KB full (always some empty space)

• Strings can be grown in place most of the time.

• Page splits do not affect secondary indices (rebalancing the PK is enough).

36

Page 37: MySQL InnoDB Storage Engine

Configuration

• innodb_buffer_pool_size = …

• As large as possible.

• Needs vm.swappiness = 0 in /etc/sysctl.conf

• innodb_log_buffer_size = 8..32M

• innodb_additional_mem_pool_size = 16M

37

Page 38: MySQL InnoDB Storage Engine

Configuration

• innodb_log_files_in_group = 2

• innodb_log_file_size = <buffer pool/6>

• Total Limit: 4G (gone in 5.6 and higher)

• innodb_file_per_table = 1

• innodb_data_file_path =ibdata1:10M:autoextend

• innodb_autoextend_increment = 8

38

Page 39: MySQL InnoDB Storage Engine

Configuration

• innodb_flush_log_at_trx_commit = 0|1|2

• 1 = ACID: Commit writes to OS, OS flushes to disk

• 2 = ACID w/ software failure

• Commit writes to OS, OS flushes every second

• 0 = not ACID

• Commit is logical op, write every second

39

Page 40: MySQL InnoDB Storage Engine

–Kristian Köhntopp

innodb_flush_log_at_trx_commit = 0

“Still a better database than MongoDB”

40

Page 41: MySQL InnoDB Storage Engine

Monitoring

• SHOW ENGINE INNODB STATUS\G

• SHOW GLOBAL VARIABLES LIKE 'inno%';

41