My sql innovation work -innosql

David Jiang [email protected] weibo.com/insidemysql MySQL Innovation Works -- InnoSQL

Transcript of My sql innovation work -innosql

Page 1: My sql innovation work -innosql

David Jiang [email protected]

weibo.com/insidemysql

MySQL Innovation Works -- InnoSQL

Page 2: My sql innovation work -innosql

About Me
- 7+ years working on different databases: SQL Server, MySQL, Oracle
- Now working at the Netease Development and Research Center Lab on MySQL kernel development
- Author of <<Inside MySQL: InnoDB Storage Engine>> and <<Inside MySQL: SQL Programming>> (coming soon, 2012.3)

Page 3: My sql innovation work -innosql

What is InnoSQL
- A new MySQL branch
- Open source
- High performance (flash cache)
- Ease of use
- Fully compatible with original MySQL
- Collects creative ideas for MySQL and makes them happen
- MySQL Innovation Works: http://www.innomysql.org

Page 4: My sql innovation work -innosql

InnoSQL Feature
- Flash Cache for InnoDB
  - Provides higher performance than just using SSD as durable storage
- Shared memory (SHM) for InnoDB Buffer Pool
  - Quick warm-up of the InnoDB buffer pool: less than 1 sec!
- InnoDB IO Statistics
  - Get each SQL statement's physical and logical reads
- Page Cleaner Thread
  - Removes blocking in the user query thread

Page 5: My sql innovation work -innosql

InnoSQL Flash Cache
- Uses SSD as a cache
- Other flash cache solutions
  - Facebook flash cache
  - Oracle flash cache
  - Secondary Buffer Pool for InnoDB (InnoSQL 5.5.8)

Page 6: My sql innovation work -innosql

Facebook Flash Cache
- A general solution
- Open source: https://github.com/facebook/flashcache
- Integration with file systems, built using the Linux Device Mapper
- Not optimized for databases
  - Good for read-intensive workloads
  - Worse for write-intensive workloads
- Needs time to warm up

Page 7: My sql innovation work -innosql

Oracle Flash Cache
- Works with Oracle 11g
- Page writes to the flash cache are slow
  - Not so aggressive
- Needs warm-up

Page 8: My sql innovation work -innosql

Secondary Buffer Pool
- Supported in InnoSQL 5.5.8
- Good for read-intensive workloads
- Also not good for write-intensive workloads such as TPC-C
- Can warm up the database at startup
  - Slow on every start
- The cache is not persistent storage

Page 9: My sql innovation work -innosql

Why is warm-up needed?
- Capacity: SSD >> memory
- Speed: SSD << memory

Page 10: My sql innovation work -innosql

Flash Cache in InnoSQL 5.5.13
- Can cache both read and write operations
- Sequential writes on SSD
  - No random writes
- Merge write
- Cache is persistent

Page 11: My sql innovation work -innosql

Why not use SSD as durable storage
- SSD is good for random reads
  - 7000+ IOPS, versus 100 ~ 150 IOPS for disk
- SSD life cycle
- SSD write performance
  - Write unit: page
  - Wipe (erase) unit: extent (128 ~ 256 pages)
- Databases are not fully optimized for SSD
  - Read-ahead algorithm
  - 512-byte aligned writes for the log file
  - Random writes

Page 12: My sql innovation work -innosql

Why use SSD as Cache
- Cache is everywhere
  - Volatile: register, L1 cache, L2 cache, L3 cache, memory
  - Non-volatile: disk, tape
  - SSD

Page 13: My sql innovation work -innosql

Question
- Do you use your SSD as volatile or non-volatile storage?

Page 14: My sql innovation work -innosql

Analyze
- If SSD is used as durable storage
  - Non-volatile
  - But the database is not yet fully optimized for it
- If Secondary Buffer Pool or Oracle Flash Cache is used
  - Volatile
  - Performance degrades
    - Pages must be written twice (flash cache & durable storage)
- If Facebook flash cache is used
  - Volatile or non-volatile, based on the cache mode
    - Writethrough
    - Writearound
    - Writeback
  - Performance degrades
    - Still needs to write twice, but with some optimizations
  - Not fully optimized for databases

Page 15: My sql innovation work -innosql

Cache in MySQL InnoDB
- InnoDB Buffer Pool
  - Caches pages
  - Asynchronous page operations
    - Read a page from the buffer pool first
    - Modify a page in the buffer pool first
    - Then write it to disk with a fuzzy or sharp checkpoint
  - Needs a log manager for recovery
- More buffer pool, better performance
  - Because of the speed gap between disk and memory
  - However, we cannot get enough memory to cache the whole database

Page 16: My sql innovation work -innosql

Cache in MySQL InnoDB
- Insert Buffer
  - The insert buffer is a B+ tree
    - MySQL version < 4.1.x: one insert buffer tree per table
      - (page_no, fields_type_info, actual record)
    - MySQL version >= 4.1: only one insert buffer tree
      - (space_id, one-byte-marker, page_no, fields_type_info, actual record), indexed by (space_id, page_no)
  - Works for non-unique secondary indexes
  - Writes go to the insert buffer if the page is not in the buffer pool
  - An insert buffer bitmap page tracks the free space of each page
    - 2 bits per page
  - Merge write operation
- Merge write
  - Delaying page writes raises write performance
  - However, it increases read operations
- MySQL 5.5: Change Buffer
  - insert, purge, delete mark

Page 17: My sql innovation work -innosql

InnoDB Insert Buffer

mysql> show engine innodb status\G;
*************************** 1. row ***************************
Status:
=====================================
090922 11:52:51 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
……
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 2249, free list len 3346, seg size 5596,
374650 inserts, 51897 merged recs, 14300 merges
Hash table size 4980499, node heap has 1246 buffer(s)
1640.60 hash searches/s, 3709.46 non-hash searches/s

- size: used pages
- free list len: free pages
- seg size = size + free list len + 1
- merged recs : merges = insert buffer efficiency
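The two relations on this slide can be checked against the numbers in the status output. The following is a minimal sketch in C; the struct and function names are hypothetical and exist only for this illustration, they are not part of InnoDB.

```c
/* Counters copied from the "Ibuf:" line of SHOW ENGINE INNODB STATUS.
 * The struct and function names are hypothetical. */
struct ibuf_stats {
    unsigned long size;          /* used pages */
    unsigned long free_list_len; /* free pages */
    unsigned long merged_recs;   /* records merged into index pages */
    unsigned long merges;        /* number of merge operations */
};

/* seg size = size + free list len + 1 */
unsigned long ibuf_seg_size(const struct ibuf_stats *s)
{
    return s->size + s->free_list_len + 1;
}

/* merged recs : merges, i.e. average records applied per merge.
 * Returns 0.0 when no merge has happened yet. */
double ibuf_recs_per_merge(const struct ibuf_stats *s)
{
    if (s->merges == 0)
        return 0.0;
    return (double)s->merged_recs / (double)s->merges;
}
```

With the values above (size 2249, free list len 3346, 51897 merged recs, 14300 merges) the segment size comes out as 5596, matching the output, and each merge applies about 3.6 records.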

Page 18: My sql innovation work -innosql

Cache in MySQL InnoDB
- A cache can increase performance
  - Delays write operations
  - Gap between disk and cache
- However, there is another cache in InnoDB
  - Doublewrite

Page 19: My sql innovation work -innosql

What is Doublewrite?
- Doublewrite
  - Avoids the partial write problem
    - A 512-byte write is always OK
    - But a 16K write is not
- Doublewrite buffer
  - 2M
- Doublewrite file
  - 2M
  - In the shared tablespace: ibdata1
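The partial write problem and the role of the second copy can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the checksum and the page layout (checksum in the first 4 bytes) are toy stand-ins, not InnoDB's real page format, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 16384 /* 16K page, as on the slide */

/* Toy checksum over everything after the first 4 bytes (hypothetical,
 * not the InnoDB page checksum). */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t sum = 0;
    for (size_t i = 4; i < PAGE_SIZE; i++)
        sum = sum * 31u + page[i];
    return sum;
}

/* Store the checksum in the first 4 bytes before the page is written. */
void page_seal(unsigned char *page)
{
    uint32_t sum = page_checksum(page);
    memcpy(page, &sum, sizeof sum);
}

/* A torn (partially written) page no longer matches its stored checksum. */
int page_is_intact(const unsigned char *page)
{
    uint32_t stored;
    memcpy(&stored, page, sizeof stored);
    return stored == page_checksum(page);
}

/* Recovery step: returns 0 if the data-file copy is fine, 1 if it was
 * torn and has been restored from the doublewrite copy, -1 if both
 * copies are damaged. */
int recover_page(unsigned char *data_page, const unsigned char *dblwr_page)
{
    if (page_is_intact(data_page))
        return 0;
    if (!page_is_intact(dblwr_page))
        return -1;
    memcpy(data_page, dblwr_page, PAGE_SIZE);
    return 1;
}
```

If a crash leaves only the first 512-byte sector of a 16K page on disk, `page_is_intact` fails for the data-file copy and `recover_page` copies the complete page back from the doublewrite area.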

Page 20: My sql innovation work -innosql

Doublewrite Architecture
- Stores all data twice: first to the doublewrite buffer, and then to the actual data files
- Can be disabled with --skip-innodb_doublewrite

mysql> show global status like 'innodb_dbl%'\G;
************** 1. row ************************
Variable_name: Innodb_dblwr_pages_written
Value: 152362
************** 2. row ************************
Variable_name: Innodb_dblwr_writes
Value: 1465
2 rows in set (0.00 sec)
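The two status counters can be combined into an average number of pages per doublewrite operation. The slide does not state this ratio, so treat the interpretation as an assumption; the function name is hypothetical.

```c
/* Average pages written per doublewrite operation
 * (Innodb_dblwr_pages_written / Innodb_dblwr_writes).
 * Returns 0.0 if no doublewrite has happened yet. */
double dblwr_pages_per_write(unsigned long pages_written, unsigned long writes)
{
    if (writes == 0)
        return 0.0;
    return (double)pages_written / (double)writes;
}
```

For the values shown, 152362 pages in 1465 writes is about 104 pages per write.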

Page 21: My sql innovation work -innosql

Doublewrite Feature
- Size: 2M
- All pages are written here first
- Sequential write
- Cache write
- Hence, what about having a 100G or 300G doublewrite?
  - This is what makes the flash cache possible

Page 22: My sql innovation work -innosql

Flash Cache in InnoSQL 5.5.13
- Replaces the original doublewrite
  - Now the user can have a large doublewrite
  - Page writes are sequential
    - Suits SSD write characteristics
- The doublewrite can now be read
  - Uses SSD random read performance
- Caches both read and write operations
  - Persistent cache
  - Merge write
    - 60 ~ 70% in workloads like TPC-C
- Supports AIO reads on the flash cache
  - Not supported in Secondary Buffer Pool

Page 23: My sql innovation work -innosql

Flash Cache Architecture

Page 24: My sql innovation work -innosql

Flash Cache Data Structure

/** Flash cache block struct */
struct trx_flashcache_block_struct{
    unsigned space:32;            /*!< tablespace id */
    unsigned offset:32;           /*!< page number */
    unsigned fil_offset:32;       /*!< flash cache page number */
    unsigned state:2;             /*!< flash cache state */
    trx_flashcache_block_t* hash; /*!< hash chain */
};

Four states:
- BLOCK_NOT_USED
- BLOCK_READY_FOR_FLUSH
- BLOCK_READ_CACHE
- BLOCK_FLUSHED

Page 25: My sql innovation work -innosql

Flash Cache Data Structure

struct trx_flashcache_struct{
    mutex_t fc_mutex;              /*!< mutex protecting flash cache */
    hash_table_t* fc_hash;         /*!< hash table of flash cache pages */
    ulint fc_size;                 /*!< flash cache size */
    ulint write_off;               /*!< write to flash cache offset */
    ulint flush_off;               /*!< flush to disk this offset */
    ulint write_round;             /* write round */
    ulint flush_round;             /* flush round */
    trx_flashcache_block_t* block; /* flash cache block */
    byte* read_buf_unalign;        /* unaligned read buf */
    byte* read_buf;                /* read buf */
};

Page 26: My sql innovation work -innosql

From Developer Perspective View
- Flash cache file
  - A sequence of flash cache blocks
  - Write positions: write_offset, flush_offset
- Flash cache hash table (in memory)
  - Used for lookup
- Flash cache log file
  - write_offset, flush_offset, write_round, flush_round
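The lookup path through the in-memory hash table might look roughly like the sketch below. It is a simplified stand-in, not the InnoSQL source: the `fc_` names, the table size, and the fold function are all hypothetical, and only the block fields and the four state names come from the slides.

```c
#include <stddef.h>
#include <stdint.h>

#define FC_HASH_CELLS 1024 /* hypothetical table size */

enum fc_state {
    BLOCK_NOT_USED,
    BLOCK_READY_FOR_FLUSH,
    BLOCK_READ_CACHE,
    BLOCK_FLUSHED
};

struct fc_block {
    uint32_t space;        /* tablespace id */
    uint32_t offset;       /* page number */
    uint32_t fil_offset;   /* position of the page in the flash cache file */
    enum fc_state state;   /* flash cache state */
    struct fc_block *hash; /* next block in the same hash cell */
};

struct fc_hash {
    struct fc_block *cells[FC_HASH_CELLS];
};

/* Hypothetical fold of (space, offset) into a cell index. */
static size_t fc_fold(uint32_t space, uint32_t offset)
{
    uint64_t v = ((uint64_t)space << 20) + space + offset;
    return (size_t)(v % FC_HASH_CELLS);
}

/* Insert at the head of the chain, so the newest copy is found first. */
void fc_hash_insert(struct fc_hash *h, struct fc_block *b)
{
    size_t cell = fc_fold(b->space, b->offset);
    b->hash = h->cells[cell];
    h->cells[cell] = b;
}

/* Returns the cached block for (space, offset), or NULL on a miss. */
struct fc_block *fc_hash_lookup(const struct fc_hash *h,
                                uint32_t space, uint32_t offset)
{
    struct fc_block *b = h->cells[fc_fold(space, offset)];
    while (b != NULL && (b->space != space || b->offset != offset))
        b = b->hash;
    return b;
}
```

On a hit, `fil_offset` tells the reader where in the flash cache file the page lives; on a miss the page has to come from the data file on disk.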

Page 27: My sql innovation work -innosql

Flash Cache Flush Algorithms
- Flushes pages in the flash cache to disk
  - Takes over the flushing done in the master thread
  - Flushes in a flash cache background thread
- Algorithm
  - Less than innodb_flash_cache_write_cache_pct (default 10): no flush
  - Less than innodb_flash_cache_do_full_io_pct (default 90): flush 10% of innodb_io_capacity
  - Else: flush 100% of innodb_io_capacity
  - If idle: flush 100% of innodb_io_capacity
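The threshold rules above can be written as one small decision function. This is a sketch under assumptions the slide leaves open: `dirty_pct` is taken to be the share of the flash cache holding pages not yet flushed to disk, the idle rule is checked first, and the function name is hypothetical.

```c
/* Pages to flush in one round of the flash cache background thread.
 *
 * dirty_pct       assumed: percent of the flash cache not yet flushed to disk
 * server_idle     non-zero when the server is idle (assumed to take priority)
 * write_cache_pct innodb_flash_cache_write_cache_pct (default 10)
 * do_full_io_pct  innodb_flash_cache_do_full_io_pct (default 90)
 * io_capacity     innodb_io_capacity
 */
unsigned long fc_pages_to_flush(unsigned dirty_pct, int server_idle,
                                unsigned write_cache_pct,
                                unsigned do_full_io_pct,
                                unsigned long io_capacity)
{
    if (server_idle)
        return io_capacity;      /* idle: flush 100% of innodb_io_capacity */
    if (dirty_pct < write_cache_pct)
        return 0;                /* below the first threshold: no flush */
    if (dirty_pct < do_full_io_pct)
        return io_capacity / 10; /* between thresholds: flush 10% */
    return io_capacity;          /* otherwise: flush 100% */
}
```

With the defaults and an io_capacity of 200, a cache that is 50% unflushed would flush 20 pages per round, and one that is 95% unflushed would flush 200.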

Page 28: My sql innovation work -innosql

Merge Write in Flash Cache

(7,7) (2,6) (0,6) (3,7) …… (3,7) (2,6) (4,8)
[figure: pages in the flash cache, with write_offset and flush_offset markers]

- Pages (2,6) and (3,7) can be merged
- This is much like the insert buffer
  - Delays the write operation
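One way to see the merge effect is to count how many cached writes are superseded by a later write of the same page, since only the newest copy has to reach the disk. The sketch below assumes each pair is (space_id, page_no) and uses a simple quadratic scan for clarity; the names are hypothetical and this is not the InnoSQL implementation.

```c
#include <stddef.h>
#include <stdint.h>

struct page_id {
    uint32_t space;   /* tablespace id */
    uint32_t page_no; /* page number */
};

/* Counts entries in the not-yet-flushed part of the flash cache that are
 * followed by a newer write of the same page. Those older copies never
 * need to be written to the data file. */
size_t fc_count_mergeable(const struct page_id *log, size_t n)
{
    size_t merged = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (log[i].space == log[j].space &&
                log[i].page_no == log[j].page_no) {
                merged++;
                break; /* one newer copy is enough */
            }
        }
    }
    return merged;
}
```

For the seven pages visible on the slide, the earlier copies of (2,6) and (3,7) are superseded, so 2 of 7 disk writes are saved. The merge write ratio would then be merged / n.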

Page 29: My sql innovation work -innosql

Flash Cache Benchmark
- Sysbench OLTP
  - Read intensive
- TPC-C
  - Write intensive
- Blogbench
  - Oriented to blog-like applications
  - Developed by Netease

Page 30: My sql innovation work -innosql

Sysbench OLTP

- InnoDB Buffer Pool: 6G
- DB Size: 19G
- innodb_flush_method = O_DIRECT
- innodb_flush_log_at_trx_commit = 1

Page 31: My sql innovation work -innosql

TPC-C

- SSD: 3607.183 Tpm
- Flash Cache: 7230.05 Tpm
- Merge write ratio: 65.47%

- InnoDB Buffer Pool: 12G
- DB Size: 39G
- innodb_flush_method = O_DIRECT
- innodb_flush_log_at_trx_commit = 1
- Flash Cache: 100G

Page 32: My sql innovation work -innosql

Blogbench

- InnoDB Buffer Pool: 4G
- DB Size: 21G
- innodb_flush_method = O_DIRECT
- innodb_flush_log_at_trx_commit = 1
- Merge write ratio: 60%

Page 33: My sql innovation work -innosql

Conclusion
- Flash Cache works for both read and write workloads
- Works better than using SSD as durable storage
- Optimized for SSD in the database kernel
  - No more writes in flash cache
  - Merge write support

Page 34: My sql innovation work -innosql

SHM for InnoDB Buffer Pool
- Uses shared memory to allocate the InnoDB buffer pool
- Why use shared memory?
  - To speed up warm-up
- Warm-up speed?
  - Random read: 10 ~ 20 MB/sec
  - A 30G buffer pool needs 30 ~ 60 minutes
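The warm-up estimate is simple arithmetic: pool size divided by read throughput. A minimal sketch, assuming the slide's "30G" means GiB and "10~20M/sec" means MiB per second; the function name is hypothetical.

```c
/* Seconds needed to read a buffer pool of pool_gib GiB at a sustained
 * rate of mib_per_sec MiB/s. Ignores everything except raw read time. */
double warmup_seconds(double pool_gib, double mib_per_sec)
{
    return pool_gib * 1024.0 / mib_per_sec;
}
```

For a 30G pool this gives 3072 seconds (about 51 minutes) at 10 MB/sec and 1536 seconds (about 26 minutes) at 20 MB/sec, which is in the same range as the 30 ~ 60 minutes quoted above.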

Page 35: My sql innovation work -innosql

Warm up Method
- Use SQL to warm up
  - SELECT COUNT(*) FROM tbl_name FORCE INDEX (PRIMARY)
  - Warm-up reads are converted to sequential reads
  - But cannot restore the database to its previous workload environment
- Dump buffer pool to file
  - Supported in MySQL 5.6+
  - Warm-up reads are converted to sequential reads
  - Restores the database to its previous workload environment
  - Dump file is big
  - What about a database crash?

Page 36: My sql innovation work -innosql

Warm up Method
- Percona Server
  - Exports the (space_id, page_no) pairs in the LRU list to a file
  - Loads this file ordered by (space_id, page_no) when MySQL starts up, to make the reads sequential
  - Restores the database to its previous workload environment
  - Still needs a long time to warm up if you have a big buffer pool: 128G, 256G
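The ordering step is what turns random reads into mostly sequential ones. A minimal sketch of that sort using the standard `qsort`; the struct and function names are hypothetical and this is not Percona Server's code.

```c
#include <stdint.h>
#include <stdlib.h>

struct page_id {
    uint32_t space_id; /* tablespace id */
    uint32_t page_no;  /* page number within the tablespace */
};

/* Order by space_id first, then by page_no. */
static int page_id_cmp(const void *a, const void *b)
{
    const struct page_id *x = a;
    const struct page_id *y = b;
    if (x->space_id != y->space_id)
        return x->space_id < y->space_id ? -1 : 1;
    if (x->page_no != y->page_no)
        return x->page_no < y->page_no ? -1 : 1;
    return 0;
}

/* Sort the dumped LRU list so pages of one tablespace are read in
 * ascending page order at startup. */
void sort_for_sequential_read(struct page_id *pages, size_t n)
{
    qsort(pages, n, sizeof pages[0], page_id_cmp);
}
```

After sorting, the loader walks each tablespace file front to back instead of seeking in LRU order.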

Page 37: My sql innovation work -innosql

Warm up in InnoSQL
- Uses shared memory
  - --innodb_use_shm_preload=1
- Shared memory configuration is like Oracle's
  - /proc/sys/kernel/shmmax
  - /proc/sys/kernel/shmall
- Warm-up takes less than 1 sec
  - All pages are already in memory
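The reason this works is that a System V shared memory segment outlives the mapping that created it. The sketch below shows that property with the standard `shmget`/`shmat`/`shmdt` calls: data written to a segment is still there after detaching and attaching again. It is an illustration only, not InnoSQL's code. It uses IPC_PRIVATE and removes the segment at the end, whereas a server that wants to find its buffer pool after a restart would presumably use a fixed key and leave the segment in place. The function name is hypothetical.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Returns 1 if data written to a System V shared memory segment is still
 * present after detach and re-attach, 0 if it is not, -1 on error.
 * bytes must be at least 5 to hold the marker string. */
int shm_survives_detach(size_t bytes)
{
    int ok = -1;
    int id = shmget(IPC_PRIVATE, bytes, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    char *pool = shmat(id, NULL, 0);
    if (pool != (void *)-1) {
        strcpy(pool, "page");      /* stand-in for buffer pool contents */
        shmdt(pool);               /* like the server shutting down */

        pool = shmat(id, NULL, 0); /* like the server starting again */
        if (pool != (void *)-1) {
            ok = (strcmp(pool, "page") == 0);
            shmdt(pool);
        }
    }
    shmctl(id, IPC_RMID, NULL);    /* clean up, like `ipcrm -m` */
    return ok;
}
```

The segment size is bounded by the kernel settings listed above, so a large buffer pool needs shmmax and shmall raised accordingly.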

Page 38: My sql innovation work -innosql

SHM for InnoDB Buffer Pool

# list shared memory info
innosql@db-62:~$ ipcs -a

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0008c231 4653056    innosql    600        549715968  0

------ Semaphore Arrays --------
key        semid      owner      perms      nsems

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

# remove shared memory
innosql@db-62:~$ ipcrm -m 4653056

Page 39: My sql innovation work -innosql

InnoDB IO Statistics
- Get read IO statistics
  - Like SQL Server: SET STATISTICS IO ON
- InnoSQL implements it in the slow query log
  - Both file and table
- Helps SQL developers
  - 10 reads may not be good in an OLTP application
- Helps DBAs
  - Know the real IO statistics of a SQL statement
  - Not only the time it consumes
- Still in development
  - You can preview this feature

Page 40: My sql innovation work -innosql

InnoDB IO Statistics

# Time: 111103 13:29:06
# User@Host: root[root] @ localhost [::1]
# Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
use tpcc;
SET timestamp=1320298146;
select * from warehouse where w_id=1;
# Time: 111103 13:31:28
# User@Host: root[root] @ localhost [::1]
# Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
SET timestamp=1320298288;
select * from history;

Page 41: My sql innovation work -innosql

Configuration
- long_query_time
- io_slow_query
- slow_query_type
  - 0: long_query_time
  - 1: io_slow_query
  - 2: both

Page 42: My sql innovation work -innosql

Page Cleaner Thread
- Pages are flushed in the master thread
  - Adaptive flush
  - IO capacity
- Problem
  - The master thread has a lot to cope with
  - Async flush can block the user query thread
- Page cleaner thread
  - Supported in MySQL 5.6
  - InnoSQL supports it in MySQL 5.5
  - Can also help with flushing in FLUSH_LRU_LIST

Page 43: My sql innovation work -innosql

Flush Algorithms in InnoDB
- checkpoint_age: current_lsn - checkpoint_lsn
- async_water_mark: ~78% * Log_Group_Size
- sync_water_mark: ~90% * Log_Group_Size
- For example: log file size 1G, 2 log files
  - async_water_mark = ~1.5G
  - sync_water_mark = ~1.8G
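The example numbers follow directly from the two percentages. A small sketch of the arithmetic, using the approximate 78% and 90% factors from the slide rather than InnoDB's exact internal formula; the struct and function names are hypothetical.

```c
struct log_watermarks {
    double async_mark; /* checkpoint age that triggers async flush */
    double sync_mark;  /* checkpoint age that triggers sync flush */
};

/* Watermarks in the same unit as log_file_size (e.g. GiB).
 * The 0.78 and 0.90 factors are the approximate values from the slide. */
struct log_watermarks compute_log_watermarks(double log_file_size,
                                             unsigned n_log_files)
{
    double group_size = log_file_size * n_log_files;
    struct log_watermarks w;
    w.async_mark = 0.78 * group_size;
    w.sync_mark = 0.90 * group_size;
    return w;
}
```

For two 1G log files this gives about 1.56G and 1.8G, which the slide rounds to ~1.5G and ~1.8G.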

Page 44: My sql innovation work -innosql

Flush Algorithms in InnoDB
- checkpoint_age < async_water_mark
  - Adaptive flushing
  - 5% of innodb_io_capacity
- async_water_mark < checkpoint_age < sync_water_mark
  - Blocks one user query thread
  - Async flush
- checkpoint_age > sync_water_mark
  - Blocks all user query threads
  - Sync flush
- n_dirty_pages > innodb_max_dirty_pages_pct
  - Flush innodb_io_capacity
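The three checkpoint-age cases can be expressed as a single classification function. This is a sketch of the rules on the slide, not InnoDB source; the enum and function names are hypothetical, and the treatment of an age exactly equal to a watermark is an assumption, since the slide does not say.

```c
enum flush_action {
    FLUSH_ADAPTIVE, /* below async watermark: adaptive flushing */
    FLUSH_ASYNC,    /* between watermarks: async flush, one user thread blocked */
    FLUSH_SYNC      /* above sync watermark: sync flush, all user threads blocked */
};

/* All three arguments use the same unit (bytes of redo log).
 * An age equal to a watermark is treated as still below it (assumption). */
enum flush_action choose_flush(double checkpoint_age,
                               double async_water_mark,
                               double sync_water_mark)
{
    if (checkpoint_age > sync_water_mark)
        return FLUSH_SYNC;
    if (checkpoint_age > async_water_mark)
        return FLUSH_ASYNC;
    return FLUSH_ADAPTIVE;
}
```

With watermarks of 1.56G and 1.8G, a checkpoint age of 1.6G falls in the async range and 1.9G forces a sync flush.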

Page 45: My sql innovation work -innosql

Page Cleaner Thread
- Reduces the master thread's burden
- Async flush moves to this background thread
- No blocking happens in the user query thread

Page 46: My sql innovation work -innosql

However
- Flushing does not happen only in the master thread
- FLUSH_LRU_LIST
  - Checks whether at least 64 pages can be used
  - In this situation, the flush happens almost entirely in the user query thread
  - Adaptive flush and innodb_io_capacity do not help
- InnoSQL also moves this flush to the page cleaner thread
  - MySQL 5.6 does not support this
  - Still needs more optimization

Page 47: My sql innovation work -innosql

Q & A