  • THIRD EDITION

    High Performance MySQL

    Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko

    Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

  • High Performance MySQL, Third Edition
    by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko

    Copyright © 2012 Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko. All rights reserved.
    Printed in the United States of America.

    Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

    O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected].

    Editor: Andy Oram
    Production Editor: Holly Bauer
    Proofreader: Rachel Head
    Indexer: Jay Marchand
    Cover Designer: Karen Montgomery
    Interior Designer: David Futato
    Illustrator: Rebecca Demarest

    March 2004: First Edition. June 2008: Second Edition. March 2012: Third Edition.

    Revision History for the Third Edition:
    2012-03-01: First release

    See http://oreilly.com/catalog/errata.csp?isbn=9781449314286 for release details.

    Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. High Performance MySQL, the image of a sparrow hawk, and related trade dress are trademarks of O'Reilly Media, Inc.

    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

    While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

    ISBN: 978-1-449-31428-6 [LSI]

  • Table of Contents

    Foreword

    Preface

    1. MySQL Architecture and History
      MySQL's Logical Architecture
      Connection Management and Security
      Optimization and Execution
      Concurrency Control
      Read/Write Locks
      Lock Granularity
      Transactions
      Isolation Levels
      Deadlocks
      Transaction Logging
      Transactions in MySQL
      Multiversion Concurrency Control
      MySQL's Storage Engines
      The InnoDB Engine
      The MyISAM Engine
      Other Built-in MySQL Engines
      Third-Party Storage Engines
      Selecting the Right Engine
      Table Conversions
      A MySQL Timeline
      MySQL's Development Model
      Summary

    2. Benchmarking MySQL
      Why Benchmark?
      Benchmarking Strategies
      What to Measure
      Benchmarking Tactics
      Designing and Planning a Benchmark
      How Long Should the Benchmark Last?
      Capturing System Performance and Status
      Getting Accurate Results
      Running the Benchmark and Analyzing Results
      The Importance of Plotting
      Benchmarking Tools
      Full-Stack Tools
      Single-Component Tools
      Benchmarking Examples
      http_load
      MySQL Benchmark Suite
      sysbench
      dbt2 TPC-C on the Database Test Suite
      Percona's TPCC-MySQL Tool
      Summary

    3. Profiling Server Performance
      Introduction to Performance Optimization
      Optimization Through Profiling
      Interpreting the Profile
      Profiling Your Application
      Instrumenting PHP Applications
      Profiling MySQL Queries
      Profiling a Server's Workload
      Profiling a Single Query
      Using the Profile for Optimization
      Diagnosing Intermittent Problems
      Single-Query Versus Server-Wide Problems
      Capturing Diagnostic Data
      A Case Study in Diagnostics
      Other Profiling Tools
      Using the USER_STATISTICS Tables
      Using strace
      Summary

    4. Optimizing Schema and Data Types
      Choosing Optimal Data Types
      Whole Numbers
      Real Numbers
      String Types
      Date and Time Types
      Bit-Packed Data Types
      Choosing Identifiers
      Special Types of Data
      Schema Design Gotchas in MySQL
      Normalization and Denormalization
      Pros and Cons of a Normalized Schema
      Pros and Cons of a Denormalized Schema
      A Mixture of Normalized and Denormalized
      Cache and Summary Tables
      Materialized Views
      Counter Tables
      Speeding Up ALTER TABLE
      Modifying Only the .frm File
      Building MyISAM Indexes Quickly
      Summary

    5. Indexing for High Performance
      Indexing Basics
      Types of Indexes
      Benefits of Indexes
      Indexing Strategies for High Performance
      Isolating the Column
      Prefix Indexes and Index Selectivity
      Multicolumn Indexes
      Choosing a Good Column Order
      Clustered Indexes
      Covering Indexes
      Using Index Scans for Sorts
      Packed (Prefix-Compressed) Indexes
      Redundant and Duplicate Indexes
      Unused Indexes
      Indexes and Locking
      An Indexing Case Study
      Supporting Many Kinds of Filtering
      Avoiding Multiple Range Conditions
      Optimizing Sorts
      Index and Table Maintenance
      Finding and Repairing Table Corruption
      Updating Index Statistics
      Reducing Index and Data Fragmentation
      Summary

    6. Query Performance Optimization
      Why Are Queries Slow?
      Slow Query Basics: Optimize Data Access
      Are You Asking the Database for Data You Don't Need?
      Is MySQL Examining Too Much Data?
      Ways to Restructure Queries
      Complex Queries Versus Many Queries
      Chopping Up a Query
      Join Decomposition
      Query Execution Basics
      The MySQL Client/Server Protocol
      The Query Cache
      The Query Optimization Process
      The Query Execution Engine
      Returning Results to the Client
      Limitations of the MySQL Query Optimizer
      Correlated Subqueries
      UNION Limitations
      Index Merge Optimizations
      Equality Propagation
      Parallel Execution
      Hash Joins
      Loose Index Scans
      MIN() and MAX()
      SELECT and UPDATE on the Same Table
      Query Optimizer Hints
      Optimizing Specific Types of Queries
      Optimizing COUNT() Queries
      Optimizing JOIN Queries
      Optimizing Subqueries
      Optimizing GROUP BY and DISTINCT
      Optimizing LIMIT and OFFSET
      Optimizing SQL_CALC_FOUND_ROWS
      Optimizing UNION
      Static Query Analysis
      Using User-Defined Variables
      Case Studies
      Building a Queue Table in MySQL
      Computing the Distance Between Points
      Using User-Defined Functions
      Summary

    7. Advanced MySQL Features
      Partitioned Tables
      How Partitioning Works
      Types of Partitioning
      How to Use Partitioning
      What Can Go Wrong
      Optimizing Queries
      Merge Tables
      Views
      Updatable Views
      Performance Implications of Views
      Limitations of Views
      Foreign Key Constraints
      Storing Code Inside MySQL
      Stored Procedures and Functions
      Triggers
      Events
      Preserving Comments in Stored Code
      Cursors
      Prepared Statements
      Prepared Statement Optimization
      The SQL Interface to Prepared Statements
      Limitations of Prepared Statements
      User-Defined Functions
      Plugins
      Character Sets and Collations
      How MySQL Uses Character Sets
      Choosing a Character Set and Collation
      How Character Sets and Collations Affect Queries
      Full-Text Searching
      Natural-Language Full-Text Searches
      Boolean Full-Text Searches
      Full-Text Changes in MySQL 5.1
      Full-Text Tradeoffs and Workarounds
      Full-Text Configuration and Optimization
      Distributed (XA) Transactions
      Internal XA Transactions
      External XA Transactions
      The MySQL Query Cache
      How MySQL Checks for a Cache Hit
      How the Cache Uses Memory
      When the Query Cache Is Helpful
      How to Configure and Maintain the Query Cache
      InnoDB and the Query Cache
      General Query Cache Optimizations
      Alternatives to the Query Cache
      Summary

    8. Optimizing Server Settings
      How MySQL's Configuration Works
      Syntax, Scope, and Dynamism
      Side Effects of Setting Variables
      Getting Started
      Iterative Optimization by Benchmarking
      What Not to Do
      Creating a MySQL Configuration File
      Inspecting MySQL Server Status Variables
      Configuring Memory Usage
      How Much Memory Can MySQL Use?
      Per-Connection Memory Needs
      Reserving Memory for the Operating System
      Allocating Memory for Caches
      The InnoDB Buffer Pool
      The MyISAM Key Caches
      The Thread Cache
      The Table Cache
      The InnoDB Data Dictionary
      Configuring MySQL's I/O Behavior
      InnoDB I/O Configuration
      MyISAM I/O Configuration
      Configuring MySQL Concurrency
      InnoDB Concurrency Configuration
      MyISAM Concurrency Configuration
      Workload-Based Configuration
      Optimizing for BLOB and TEXT Workloads
      Optimizing for Filesorts
      Completing the Basic Configuration
      Safety and Sanity Settings
      Advanced InnoDB Settings
      Summary

    9. Operating System and Hardware Optimization
      What Limits MySQL's Performance?
      How to Select CPUs for MySQL
      Which Is Better: Fast CPUs or Many CPUs?
      CPU Architecture
      Scaling to Many CPUs and Cores
      Balancing Memory and Disk Resources
      Random Versus Sequential I/O
      Caching, Reads, and Writes
      What's Your Working Set?
      Finding an Effective Memory-to-Disk Ratio
      Choosing Hard Disks
      Solid-State Storage
      An Overview of Flash Memory
      Flash Technologies
      Benchmarking Flash Storage
      Solid-State Drives (SSDs)
      PCIe Storage Devices
      Other Types of Solid-State Storage
      When Should You Use Flash?
      Using Flashcache
      Optimizing MySQL for Solid-State Storage
      Choosing Hardware for a Replica
      RAID Performance Optimization
      RAID Failure, Recovery, and Monitoring
      Balancing Hardware RAID and Software RAID
      RAID Configuration and Caching
      Storage Area Networks and Network-Attached Storage
      SAN Benchmarks
      Using a SAN over NFS or SMB
      MySQL Performance on a SAN
      Should You Use a SAN?
      Using Multiple Disk Volumes
      Network Configuration
      Choosing an Operating System
      Choosing a Filesystem
      Choosing a Disk Queue Scheduler
      Threading
      Swapping
      Operating System Status
      How to Read vmstat Output
      How to Read iostat Output
      Other Helpful Tools
      A CPU-Bound Machine
      An I/O-Bound Machine
      A Swapping Machine
      An Idle Machine
      Summary

    10. Replication
      Replication Overview
      Problems Solved by Replication
      How Replication Works
      Setting Up Replication
      Creating Replication Accounts
      Configuring the Master and Replica
      Starting the Replica
      Initializing a Replica from Another Server
      Recommended Replication Configuration
      Replication Under the Hood
      Statement-Based Replication
      Row-Based Replication
      Statement-Based or Row-Based: Which Is Better?
      Replication Files
      Sending Replication Events to Other Replicas
      Replication Filters
      Replication Topologies
      Master and Multiple Replicas
      Master-Master in Active-Active Mode
      Master-Master in Active-Passive Mode
      Master-Master with Replicas
      Ring Replication
      Master, Distribution Master, and Replicas
      Tree or Pyramid
      Custom Replication Solutions
      Replication and Capacity Planning
      Why Replication Doesn't Help Scale Writes
      When Will Replicas Begin to Lag?
      Plan to Underutilize
      Replication Administration and Maintenance
      Monitoring Replication
      Measuring Replication Lag
      Determining Whether Replicas Are Consistent with the Master
      Resyncing a Replica from the Master
      Changing Masters
      Switching Roles in a Master-Master Configuration
      Replication Problems and Solutions
      Errors Caused by Data Corruption or Loss
      Using Nontransactional Tables
      Mixing Transactional and Nontransactional Tables
      Nondeterministic Statements
      Different Storage Engines on the Master and Replica
      Data Changes on the Replica
      Nonunique Server IDs
      Undefined Server IDs
      Dependencies on Nonreplicated Data
      Missing Temporary Tables
      Not Replicating All Updates
      Lock Contention Caused by InnoDB Locking Selects
      Writing to Both Masters in Master-Master Replication
      Excessive Replication Lag
      Oversized Packets from the Master
      Limited Replication Bandwidth
      No Disk Space
      Replication Limitations
      How Fast Is Replication?
      Advanced Features in MySQL Replication
      Other Replication Technologies
      Summary

    11. Scaling MySQL
      What Is Scalability?
      A Formal Definition
      Scaling MySQL
      Planning for Scalability
      Buying Time Before Scaling
      Scaling Up
      Scaling Out
      Scaling by Consolidation
      Scaling by Clustering
      Scaling Back
      Load Balancing
      Connecting Directly
      Introducing a Middleman
      Load Balancing with a Master and Multiple Replicas
      Summary

    12. High Availability
      What Is High Availability?
      What Causes Downtime?
      Achieving High Availability
      Improving Mean Time Between Failures
      Improving Mean Time to Recovery
      Avoiding Single Points of Failure
      Shared Storage or Replicated Disk
      Synchronous MySQL Replication
      Replication-Based Redundancy
      Failover and Failback
      Promoting a Replica or Switching Roles
      Virtual IP Addresses or IP Takeover
      Middleman Solutions
      Handling Failover in the Application
      Summary

    13. MySQL in the Cloud
      Benefits, Drawbacks, and Myths of the Cloud
      The Economics of MySQL in the Cloud
      MySQL Scaling and HA in the Cloud
      The Four Fundamental Resources
      MySQL Performance in Cloud Hosting
      Benchmarks for MySQL in the Cloud
      MySQL Database as a Service (DBaaS)
      Amazon RDS
      Other DBaaS Solutions
      Summary

    14. Application-Level Optimization
      Common Problems
      Web Server Issues
      Finding the Optimal Concurrency
      Caching
      Caching Below the Application
      Application-Level Caching
      Cache Control Policies
      Cache Object Hierarchies
      Pregenerating Content
      The Cache as an Infrastructure Component
      Using HandlerSocket and memcached Access
      Extending MySQL
      Alternatives to MySQL
      Summary

    15. Backup and Recovery
      Why Backups?
      Defining Recovery Requirements
      Designing a MySQL Backup Solution
      Online or Offline Backups?
      Logical or Raw Backups?
      What to Back Up
      Storage Engines and Consistency
      Replication
      Managing and Backing Up Binary Logs
      The Binary Log Format
      Purging Old Binary Logs Safely
      Backing Up Data
      Making a Logical Backup
      Filesystem Snapshots
      Recovering from a Backup
      Restoring Raw Files
      Restoring Logical Backups
      Point-in-Time Recovery
      More Advanced Recovery Techniques
      InnoDB Crash Recovery
      Backup and Recovery Tools
      MySQL Enterprise Backup
      Percona XtraBackup
      mylvmbackup
      Zmanda Recovery Manager
      mydumper
      mysqldump
      Scripting Backups
      Summary

    16. Tools for MySQL Users
      Interface Tools
      Command-Line Utilities
      SQL Utilities
      Monitoring Tools
      Open Source Monitoring Tools
      Commercial Monitoring Systems
      Command-Line Monitoring with Innotop
      Summary

    A. Forks and Variants of MySQL

    B. MySQL Server Status

    C. Transferring Large Files

    D. Using EXPLAIN

    E. Debugging Locks

    F. Using Sphinx with MySQL

    Index

  • Foreword

    I've been a fan of this book for years, and the third edition makes a great book even better. Not only do world-class experts share that expertise, but they have taken the time to update and add chapters with high-quality writing. While the book has many details on getting high performance from MySQL, the focus of the book is on the process of improvement rather than facts and trivia. This book will help you figure out how to make things better, regardless of changes in MySQL's behavior over time.

    The authors are uniquely qualified to write this book, based on their experience, principled approach, focus on efficiency, and commitment to improvement. By experience, I mean that the authors have been working on MySQL performance from the days when it didn't scale and had no instrumentation to the current period where things are much better. By principled approach, I mean that they treat this like a science, first defining problems to be solved and then using reason and measurement to solve those problems.

    I am most impressed by their focus on efficiency. As consultants, they don't have the luxury of time. Clients getting billed by the hour want problems solved quickly. So the authors have defined processes and built tools to get things done correctly and efficiently. They describe the processes in this book and publish source code for the tools.

    Finally, they continue to get better at what they do. This includes a shift in concern from throughput to response time, a commitment to understanding the performance of MySQL on new hardware, and a pursuit of new skills like queueing theory that can be used to understand performance.

    I believe this book augurs a bright future for MySQL. As MySQL has evolved to support demanding workloads, the authors have led a similar effort to improve the understanding of MySQL performance within the community. They have also contributed directly to that improvement via XtraDB and XtraBackup. I continue to learn from them and hope you take the time to do so as well.

    Mark Callaghan, Software Engineer, Facebook

  • Preface

    We wrote this book to serve the needs of not just the MySQL application developer but also the MySQL database administrator. We assume that you are already relatively experienced with MySQL. We also assume some experience with general system administration, networking, and Unix-like operating systems.

    The second edition of this book presented a lot of information to readers, but no book can provide complete coverage of a topic. Between the second and third editions, we took notes on literally thousands of interesting problems we'd solved or seen others solve. When we started to outline the third edition, it became clear that not only would full coverage of these topics require three to five thousand pages, but the book still wouldn't be complete. After reflecting on this problem, we realized that the second edition's emphasis on deep coverage was actually self-limiting, in the sense that it often didn't teach readers how to think about MySQL.

    As a result, this third edition has a different focus from the second edition. We still convey a lot of information, and we still emphasize the same goals, such as reliability and correctness. But we've also tried to imbue the book with a deeper purpose: we want to teach the principles of why MySQL works as it does, not just the facts about how it works. We've included more illustrative stories and case studies, which demonstrate the principles in action. We build on these to try to answer questions such as "Given MySQL's internal architecture and operation, what practical effects arise in real usage? Why do those effects matter? How do they make MySQL well suited (or not well suited) for particular needs?"

    Ultimately, we hope that your knowledge of MySQL's internals will help you in situations beyond the scope of this book. And we hope that your newfound insight will help you to learn and practice a methodical approach to designing, maintaining, and troubleshooting systems that are built on MySQL.

    How This Book Is Organized

    We fit a lot of complicated topics into this book. Here, we explain how we put them together in an order that makes them easier to learn.

    A Broad Overview

    Chapter 1, MySQL Architecture and History is dedicated to the basics: things you'll need to be familiar with before you dig in deeply. You need to understand how MySQL is organized before you'll be able to use it effectively. This chapter explains MySQL's architecture and key facts about its storage engines. It helps you get up to speed if you aren't familiar with some of the fundamentals of a relational database, including transactions. This chapter will also be useful if this book is your introduction to MySQL but you're already familiar with another database, such as Oracle. We also include a bit of historical context: the changes to MySQL over time, recent ownership changes, and where we think it's headed.

    Building a Solid Foundation

    The early chapters cover material we hope you'll reference over and over as you use MySQL.

    Chapter 2, Benchmarking MySQL discusses the basics of benchmarking, that is, determining what sort of workload your server can handle, how fast it can perform certain tasks, and so on. Benchmarking is an essential skill for evaluating how the server behaves under load, but it's also important to know when it's not useful.

    Chapter 3, Profiling Server Performance introduces you to the response time-oriented approach we take to troubleshooting and diagnosing server performance problems. This framework has proven essential to solving some of the most puzzling cases we've seen. Although you might choose to modify our approach (we developed it by modifying Cary Millsap's approach, after all), we hope you'll avoid the pitfalls of not having any method at all.

    In Chapters 4 through 6, we introduce three topics that together form the foundation for a good logical and physical database design. In Chapter 4, Optimizing Schema and Data Types, we cover the various nuances of data types and table design. Chapter 5, Indexing for High Performance extends the discussion to indexes, that is, physical database design. A firm understanding of indexes and how to use them well is essential for using MySQL effectively, so you'll probably find yourself returning to this chapter repeatedly. And Chapter 6, Query Performance Optimization wraps the topics together by explaining how MySQL executes queries and how you can take advantage of its query optimizer's strengths. This chapter also presents specific examples of many common classes of queries, illustrating where MySQL does a good job and how to transform queries into forms that use its strengths.

    Up to this point, we've covered the basic topics that apply to any database: tables, indexes, data, and queries. Chapter 7, Advanced MySQL Features goes beyond the basics and shows you how MySQL's advanced features work. We examine topics such as partitioning, stored procedures, triggers, and character sets. MySQL's implementation of these features is different from other databases, and a good understanding of them can open up new opportunities for performance gains that you might not have thought about otherwise.

    Configuring Your Application

    The next two chapters discuss how to make MySQL, your application, and your hardware work well together. In Chapter 8, Optimizing Server Settings, we discuss how you can configure MySQL to make the most of your hardware and to be reliable and robust. Chapter 9, Operating System and Hardware Optimization explains how to get the most out of your operating system and hardware. We discuss solid-state storage in depth, and we suggest hardware configurations that might provide better performance for larger-scale applications.

    Both chapters explore MySQL internals to some degree. This is a recurring theme that continues all the way through the appendixes: learn how it works internally, and you'll be empowered to understand and reason about the consequences.

    MySQL as an Infrastructure Component

    MySQL doesn't exist in a vacuum. It's part of an overall application stack, and you'll need to build a robust overall architecture for your application. The next set of chapters is about how to do that.

    In Chapter 10, Replication, we discuss MySQL's killer feature: the ability to set up multiple servers that all stay in sync with a master server's changes. Unfortunately, replication is perhaps MySQL's most troublesome feature for some people. This doesn't have to be the case, and we show you how to ensure that it keeps running well.

    Chapter 11, Scaling MySQL discusses what scalability is (it's not the same thing as performance), why applications and systems don't scale, and what to do about it. If you do it right, you can scale MySQL to suit nearly any purpose. Chapter 12, High Availability delves into a related-but-distinct topic: how to ensure that MySQL stays up and functions smoothly. In Chapter 13, MySQL in the Cloud, you'll learn about what's different when you run MySQL in cloud computing environments.

    In Chapter 14, Application-Level Optimization, we explain what we call full-stack optimization: optimization from the frontend to the backend, all the way from the user's experience to the database.

    The best-designed, most scalable architecture in the world is no good if it can't survive power outages, malicious attacks, application bugs or programmer mistakes, and other disasters. That's why Chapter 15, Backup and Recovery discusses various backup and recovery strategies for your MySQL databases. These strategies will help minimize your downtime in the event of inevitable hardware failure and ensure that your data survives such catastrophes.

    Miscellaneous Useful Topics

    In the last chapter and the book's appendixes, we delve into several topics that either don't fit well into any of the earlier chapters, or are referenced often enough in multiple chapters that they deserve a bit of special attention.

    Chapter 16, Tools for MySQL Users explores some of the open source and commercial tools that can help you manage and monitor your MySQL servers more efficiently.

    Appendix A introduces the three major unofficial versions of MySQL that have arisen over the last few years, including the one that our company maintains. It's worth knowing what else is available; many problems that are difficult or intractable with MySQL are solved elegantly by one of the variants. Two of the three (Percona Server and MariaDB) are drop-in replacements, so the effort involved in trying them out is not large. However, we hasten to add that we think most users are well served by sticking with the official MySQL distribution from Oracle.

    Appendix B shows you how to inspect your MySQL server. Knowing how to get status information from the server is important; knowing what that information means is even more important. We cover SHOW INNODB STATUS in particular detail, because it provides deep insight into the operations of the InnoDB transactional storage engine. There is a lot of discussion of InnoDB's internals in this appendix.

    Appendix C shows you how to copy very large files from place to place efficiently, a must if you are going to manage large volumes of data. Appendix D shows you how to really use and understand the all-important EXPLAIN command. Appendix E shows you how to decipher what's going on when queries are requesting locks that interfere with each other. And finally, Appendix F is an introduction to Sphinx, a high-performance, full-text indexing system that can complement MySQL's own abilities.

    Software Versions and Availability

    MySQL is a moving target. In the years since Jeremy wrote the outline for the first edition of this book, numerous releases of MySQL have appeared. MySQL 4.1 and 5.0 were available only as alpha versions when the first edition went to press, but today MySQL 5.1 and 5.5 are the backbone of many large online applications. As we completed this third edition, MySQL 5.6 was the unreleased bleeding edge.

    We didn't rely on a single version of MySQL for this book. Instead, we drew on our extensive collective knowledge of MySQL in the real world. The core of the book is focused on MySQL 5.1 and MySQL 5.5, because those are what we consider the current versions. Most of our examples assume you're running some reasonably mature version of MySQL 5.1, such as MySQL 5.1.50 or newer. We have made an effort to note features or functionalities that might not exist in older releases or that might exist only in the upcoming 5.6 series. However, the definitive reference for mapping features to specific versions is the MySQL documentation itself. We expect that you'll find yourself visiting the annotated online documentation (http://dev.mysql.com/doc/) from time to time as you read this book.

    Another great aspect of MySQL is that it runs on all of today's popular platforms: Mac OS X, Windows, GNU/Linux, Solaris, FreeBSD, you name it! However, we are biased toward GNU/Linux¹ and other Unix-like operating systems. Windows users are likely to encounter some differences. For example, file paths are completely different on Windows. We also refer to standard Unix command-line utilities; we assume you know the corresponding commands in Windows.²

    Perl is the other rough spot when dealing with MySQL on Windows. MySQL comes with several useful utilities that are written in Perl, and certain chapters in this book present example Perl scripts that form the basis of more complex tools you'll build. Percona Toolkit, which is indispensable for administering MySQL, is also written in Perl. However, Perl isn't included with Windows. In order to use these scripts, you'll need to download a Windows version of Perl from ActiveState and install the necessary add-on modules (DBI and DBD::mysql) for MySQL access.
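
    If you are not sure which server version you are running, one statement answers it. This is a generic check added here for convenience, not an example from the book:

        SELECT VERSION();                 -- reports the server version string
        SHOW VARIABLES LIKE 'version%';   -- version, version_comment, compile details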

    Conventions Used in This Book

    The following typographical conventions are used in this book:

    Italic
        Used for new terms, URLs, email addresses, usernames, hostnames, filenames, file extensions, pathnames, directories, and Unix commands and utilities.

    Constant width
        Indicates elements of code, configuration options, database and table names, variables and their values, functions, modules, the contents of files, or the output from commands.

    Constant width bold
        Shows commands or other text that should be typed literally by the user. Also used for emphasis in command output.

    Constant width italic
        Shows text that should be replaced with user-supplied values.

    This icon signifies a tip, suggestion, or general note.

    1. To avoid confusion, we refer to Linux when we are writing about the kernel, and GNU/Linux when we are writing about the whole operating system infrastructure that supports applications.

    2. You can get Windows-compatible versions of Unix utilities at http://unxutils.sourceforge.net or http://gnuwin32.sourceforge.net.

  • This icon indicates a warning or caution.

    Using Code Examples

    This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You don't need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book doesn't require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code doesn't require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

    Examples are maintained on the site http://www.highperfmysql.com and will be updated there from time to time. We cannot commit, however, to updating and testing the code for every minor release of MySQL.

    We appreciate, but don't require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "High Performance MySQL, Third Edition, by Baron Schwartz et al. (O'Reilly). Copyright 2012 Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko, 978-1-449-31428-6."

    If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at [email protected].

    Safari Books Online

    Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world's leading authors in technology and business. Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

    Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O'Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

  • How to Contact Us

    Please address comments and questions concerning this book to the publisher:

        O'Reilly Media, Inc.
        1005 Gravenstein Highway North
        Sebastopol, CA 95472
        800-998-9938 (in the United States or Canada)
        707-829-0515 (international or local)
        707-829-0104 (fax)

    We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

        http://shop.oreilly.com/product/0636920022343.do

    To comment or ask technical questions about this book, send email to:

        [email protected]

    For more information about our books, conferences, Resource Centers, and the O'Reilly Network, see our website at:

        http://www.oreilly.com

    Find us on Facebook: http://facebook.com/oreilly
    Follow us on Twitter: http://twitter.com/oreillymedia
    Watch us on YouTube: http://www.youtube.com/oreillymedia

    You can also get in touch with the authors directly. You can use the contact form on our company's website at http://www.percona.com. We'd be delighted to hear from you.

    Acknowledgments for the Third Edition

    Thanks to the following people who helped in various ways: Brian Aker, Johan Andersson, Espen Braekken, Mark Callaghan, James Day, Maciej Dobrzanski, Ewen Fortune, Dave Hildebrandt, Fernando Ipar, Haidong Ji, Giuseppe Maxia, Aurimas Mikalauskas, Istvan Podor, Yves Trudeau, Matt Yonkovit, and Alex Yurchenko. Thanks to everyone at Percona for helping in dozens of ways over the years. Thanks to the many great bloggers³ and speakers who gave us a great deal of food for thought, especially Yoshinori Matsunobu. Thanks also to the authors of the previous editions: Jeremy D. Zawodny, Derek J. Balling, and Arjen Lentz. Thanks to Andy Oram, Rachel Head, and the whole O'Reilly staff who do such a classy job of publishing books and running conferences. And much gratitude to the brilliant and dedicated MySQL team inside Oracle, as well as all of the ex-MySQLers, wherever you are, and especially to SkySQL and Monty Program.

    Baron thanks his wife Lynn, his mother, Connie, and his parents-in-law, Jane and Roger, for helping and supporting this project in many ways, but most especially for their encouragement and help with chores and taking care of the family. Thanks also to Peter and Vadim for being such great teachers and colleagues. Baron dedicates this edition to the memory of Alan Rimm-Kaufman, whose great love and encouragement are never forgotten.

    3. You can find a wealth of great technical blogging on http://planet.mysql.com.

    Acknowledgments for the Second Edition

    Sphinx developer Andrew Aksyonoff wrote Appendix F. We'd like to thank him first for his in-depth discussion.

    We have received invaluable help from many people while writing this book. It's impossible to list everyone who gave us help; we really owe thanks to the entire MySQL community and everyone at MySQL AB. However, here's a list of people who contributed directly, with apologies if we've missed anyone: Tobias Asplund, Igor Babaev, Pascal Borghino, Roland Bouman, Ronald Bradford, Mark Callaghan, Jeremy Cole, Britt Crawford and the HiveDB Project, Vasil Dimov, Harrison Fisk, Florian Haas, Dmitri Joukovski and Zmanda (thanks for the diagram explaining LVM snapshots), Alan Kasindorf, Sheeri Kritzer Cabral, Marko Makela, Giuseppe Maxia, Paul McCullagh, B. Keith Murphy, Dhiren Patel, Sergey Petrunia, Alexander Rubin, Paul Tuckfield, Heikki Tuuri, and Michael "Monty" Widenius.

    A special thanks to Andy Oram and Isabel Kunkle, our editor and assistant editor at O'Reilly, and to Rachel Wheeler, the copyeditor. Thanks also to the rest of the O'Reilly staff.

    From Baron

    I would like to thank my wife, Lynn Rainville, and our dog, Carbon. If you've written a book, I'm sure you know how grateful I am to them. I also owe a huge debt of gratitude to Alan Rimm-Kaufman and my colleagues at the Rimm-Kaufman Group for their support and encouragement during this project. Thanks to Peter, Vadim, and Arjen for giving me the opportunity to make this dream come true. And thanks to Jeremy and Derek for breaking the trail for us.

    From Peter

    I've been doing MySQL performance and scaling presentations, training, and consulting for years, and I've always wanted to reach a wider audience, so I was very excited when Andy Oram approached me to work on this book. I have not written a book before, so I wasn't prepared for how much time and effort it required. We first started talking about updating the first edition to cover recent versions of MySQL, but we wanted to add so much material that we ended up rewriting most of the book.

    This book is truly a team effort. Because I was very busy bootstrapping Percona, Vadim's and my consulting company, and because English is not my first language, we all had different roles. I provided the outline and technical content, then I reviewed the material, revising and extending it as we wrote. When Arjen (the former head of the MySQL documentation team) joined the project, we began to fill out the outline. Things really started to roll once we brought in Baron, who can write high-quality book content at insane speeds. Vadim was a great help with in-depth MySQL source code checks and when we needed to back our claims with benchmarks and other research.

    As we worked on the book, we found more and more areas we wanted to explore in more detail. Many of the book's topics, such as replication, query optimization, InnoDB, architecture, and design could easily fill their own books, so we had to stop somewhere and leave some material for a possible future edition or for our blogs, presentations, and articles.

    We got great help from our reviewers, who are the top MySQL experts in the world, from both inside and outside of MySQL AB. These include MySQL's founder, Michael Widenius; InnoDB's founder, Heikki Tuuri; Igor Babaev, the head of the MySQL optimizer team; and many others.

    I would also like to thank my wife, Katya Zaytseva, and my children, Ivan and Nadezhda, for allowing me to spend time on the book that should have been Family Time. I'm also grateful to Percona's employees for handling things when I disappeared to work on the book, and of course to Andy Oram and O'Reilly for making things happen.

    From Vadim

    I would like to thank Peter, who I am excited to have worked with on this book and look forward to working with on other projects; Baron, who was instrumental in getting this book done; and Arjen, who was a lot of fun to work with. Thanks also to our editor Andy Oram, who had enough patience to work with us; the MySQL team that created great software; and our clients who provide me the opportunities to fine-tune my MySQL understanding. And finally a special thank you to my wife, Valerie, and our sons, Myroslav and Timur, who always support me and help me to move forward.

    From Arjen

    I would like to thank Andy for his wisdom, guidance, and patience. Thanks to Baron for hopping on the second edition train while it was already in motion, and to Peter and Vadim for solid background information and benchmarks. Thanks also to Jeremy and Derek for the foundation with the first edition; as you wrote in my copy, Derek: "Keep 'em honest, that's all I ask."

    Also thanks to all my former colleagues (and present friends) at MySQL AB, where I acquired most of what I know about the topic; and in this context a special mention for Monty, whom I continue to regard as the proud parent of MySQL, even though his company now lives on as part of Sun Microsystems. I would also like to thank everyone else in the global MySQL community.

    And last but not least, thanks to my daughter Phoebe, who at this stage in her young life does not care about this thing called "MySQL," nor indeed has she any idea which of The Wiggles it might refer to! For some, ignorance is truly bliss, and they provide us with a refreshing perspective on what is really important in life; for the rest of you, may you find this book a useful addition on your reference bookshelf. And don't forget your life.

    Acknowledgments for the First Edition

    A book like this doesn't come into being without help from literally dozens of people. Without their assistance, the book you hold in your hands would probably still be a bunch of sticky notes on the sides of our monitors. This is the part of the book where we get to say whatever we like about the folks who helped us out, and we don't have to worry about music playing in the background telling us to shut up and go away, as you might see on TV during an awards show.

    We couldn't have completed this project without the constant prodding, begging, pleading, and support from our editor, Andy Oram. If there is one person most responsible for the book in your hands, it's Andy. We really do appreciate the weekly nag sessions.

    Andy isn't alone, though. At O'Reilly there are a bunch of other folks who had some part in getting those sticky notes converted to a cohesive book that you'd be willing to read, so we also have to thank the production, illustration, and marketing folks for helping to pull this book together. And, of course, thanks to Tim O'Reilly for his continued commitment to producing some of the industry's finest documentation for popular open source software.

    Finally, we'd both like to give a big thanks to the folks who agreed to look over the various drafts of the book and tell us all the things we were doing wrong: our reviewers. They spent part of their 2003 holiday break looking over roughly formatted versions of this text, full of typos, misleading statements, and outright mathematical errors. In no particular order, thanks to Brian "Krow" Aker, Mark "JDBC" Matthews, Jeremy "the other Jeremy" Cole, Mike "VBMySQL.com" Hillyer, Raymond "Rainman" De Roo, Jeffrey "Regex Master" Friedl, Jason DeHaan, Dan Nelson, Steve "Unix Wiz" Friedl, and, last but not least, Kasia "Unix Girl" Trapszo.

    From Jeremy

    I would again like to thank Andy for agreeing to take on this project and for continually beating on us for more chapter material. Derek's help was essential for getting the last 20-30% of the book completed so that we wouldn't miss yet another target date. Thanks for agreeing to come on board late in the process and deal with my sporadic bursts of productivity, and for handling the XML grunt work, Chapter 10, Appendix F, and all the other stuff I threw your way.

    I also need to thank my parents for getting me that first Commodore 64 computer so many years ago. They not only tolerated the first 10 years of what seems to be a lifelong obsession with electronics and computer technology, but quickly became supporters of my never-ending quest to learn and do more.

    Next, I'd like to thank a group of people I've had the distinct pleasure of working with while spreading the MySQL religion at Yahoo! during the last few years. Jeffrey Friedl and Ray Goldberger provided encouragement and feedback from the earliest stages of this undertaking. Along with them, Steve Morris, James Harvey, and Sergey Kolychev put up with my seemingly constant experimentation on the Yahoo! Finance MySQL servers, even when it interrupted their important work. Thanks also to the countless other Yahoo!s who have helped me find interesting MySQL problems and solutions. And, most importantly, thanks for having the trust and faith in me needed to put MySQL into some of the most important and visible parts of Yahoo!'s business.

    Adam Goodman, the publisher and owner of Linux Magazine, helped me ease into the world of writing for a technical audience by publishing my first feature-length MySQL articles back in 2001. Since then, he's taught me more than he realizes about editing and publishing and has encouraged me to continue on this road with my own monthly column in the magazine. Thanks, Adam.

    Thanks to Monty and David for sharing MySQL with the world. Speaking of MySQL AB, thanks to all the other great folks there who have encouraged me in writing this: Kerry, Larry, Joe, Marten, Brian, Paul, Jeremy, Mark, Harrison, Matt, and the rest of the team there. You guys rock.

    Finally, thanks to all my weblog readers for encouraging me to write informally about MySQL and other technical topics on a daily basis. And, last but not least, thanks to the Goon Squad.

    From Derek

    Like Jeremy, I've got to thank my family, for much the same reasons. I want to thank my parents for their constant goading that I should write a book, even if this isn't anywhere near what they had in mind. My grandparents helped me learn two valuable lessons, the meaning of the dollar and how much I would fall in love with computers, as they loaned me the money to buy my first Commodore VIC-20.

    I can't thank Jeremy enough for inviting me to join him on the whirlwind book-writing roller coaster. It's been a great experience and I look forward to working with him again in the future.

    A special thanks goes out to Raymond De Roo, Brian Wohlgemuth, David Calafrancesco, Tera Doty, Jay Rubin, Bill Catlan, Anthony Howe, Mark O'Neal, George Montgomery, George Barber, and the myriad other people who patiently listened to me gripe about things, let me bounce ideas off them to see whether an outsider could understand what I was trying to say, or just managed to bring a smile to my face when I needed it most. Without you, this book might still have been written, but I almost certainly would have gone crazy in the process.

  • CHAPTER 1

    MySQL Architecture and History

    MySQL is very different from other database servers, and its architectural characteristics make it useful for a wide range of purposes as well as making it a poor choice for others. MySQL is not perfect, but it is flexible enough to work well in very demanding environments, such as web applications. At the same time, MySQL can power embedded applications, data warehouses, content indexing and delivery software, highly available redundant systems, online transaction processing (OLTP), and much more.

    To get the most from MySQL, you need to understand its design so that you can work with it, not against it. MySQL is flexible in many ways. For example, you can configure it to run well on a wide range of hardware, and it supports a variety of data types. However, MySQL's most unusual and important feature is its storage-engine architecture, whose design separates query processing and other server tasks from data storage and retrieval. This separation of concerns lets you choose how your data is stored and what performance, features, and other characteristics you want.

    This chapter provides a high-level overview of the MySQL server architecture, the major differences between the storage engines, and why those differences are important. We'll finish with some historical context and benchmarks. We've tried to explain MySQL by simplifying the details and showing examples. This discussion will be useful for those new to database servers as well as readers who are experts with other database servers.

    MySQL's Logical Architecture

    A good mental picture of how MySQL's components work together will help you understand the server. Figure 1-1 shows a logical view of MySQL's architecture.

    The topmost layer contains the services that aren't unique to MySQL. They're services most network-based client/server tools or servers need: connection handling, authentication, security, and so forth.

    The second layer is where things get interesting. Much of MySQL's brains are here, including the code for query parsing, analysis, optimization, caching, and all the built-in functions (e.g., dates, times, math, and encryption). Any functionality provided across storage engines lives at this level: stored procedures, triggers, and views, for example.

    The third layer contains the storage engines. They are responsible for storing and retrieving all data stored in MySQL. Like the various filesystems available for GNU/Linux, each storage engine has its own benefits and drawbacks. The server communicates with them through the storage engine API. This interface hides differences between storage engines and makes them largely transparent at the query layer. The API contains a couple of dozen low-level functions that perform operations such as "begin a transaction" or "fetch the row that has this primary key." The storage engines don't parse SQL¹ or communicate with each other; they simply respond to requests from the server.
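
    Because the engine is chosen per table, this architecture is visible directly from SQL. A minimal sketch, with an illustrative table name that is not from the book:

        -- List the storage engines this server has built in or loaded
        SHOW ENGINES;

        -- Pick an engine explicitly when creating a table
        CREATE TABLE app_log (
            id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            msg VARCHAR(255) NOT NULL
        ) ENGINE=InnoDB;

        -- Check which engine an existing table uses
        SHOW TABLE STATUS LIKE 'app_log';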

    Connection Management and Security

    Each client connection gets its own thread within the server process. The connection's queries execute within that single thread, which in turn resides on one core or CPU. The server caches threads, so they don't need to be created and destroyed for each new connection.²

    When clients (applications) connect to the MySQL server, the server needs to authenticate them. Authentication is based on username, originating host, and password.

    Figure 1-1. A logical view of the MySQL server architecture

    1. One exception is InnoDB, which does parse foreign key definitions, because the MySQL server doesn't yet implement them itself.

    2. MySQL 5.5 and newer versions support an API that can accept thread-pooling plugins, so a small pool of threads can service many connections.

    X.509 certificates can also be used across an SSL (Secure Sockets Layer) connection. Once a client has connected, the server verifies whether the client has privileges for each query it issues (e.g., whether the client is allowed to issue a SELECT statement that accesses the Country table in the world database).
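
    As a concrete illustration of that model, an account like the following would permit exactly the kind of access just described; the username, host pattern, and password here are made up for the example:

        CREATE USER 'webapp'@'192.168.1.%' IDENTIFIED BY 'example_password';
        GRANT SELECT ON world.* TO 'webapp'@'192.168.1.%';

        -- The server then checks these privileges for every statement, e.g.:
        SELECT Name, Population FROM world.Country WHERE Code = 'FRA';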

    Optimization and Execution

    MySQL parses queries to create an internal structure (the parse tree), and then applies a variety of optimizations. These can include rewriting the query, determining the order in which it will read tables, choosing which indexes to use, and so on. You can pass hints to the optimizer through special keywords in the query, affecting its decision-making process. You can also ask the server to explain various aspects of optimization. This lets you know what decisions the server is making and gives you a reference point for reworking queries, schemas, and settings to make everything run as efficiently as possible. We discuss the optimizer in much more detail in Chapter 6.

    The optimizer does not really care what storage engine a particular table uses, but the storage engine does affect how the server optimizes the query. The optimizer asks the storage engine about some of its capabilities and the cost of certain operations, and for statistics on the table data. For instance, some storage engines support index types that can be helpful to certain queries. You can read more about indexing and schema optimization in Chapter 4 and Chapter 5.

    Before even parsing the query, though, the server consults the query cache, which can store only SELECT statements, along with their result sets. If anyone issues a query that's identical to one already in the cache, the server doesn't need to parse, optimize, or execute the query at all; it can simply pass back the stored result set. We write more about that in Chapter 7.
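
    For example, EXPLAIN (covered in detail in Appendix D) reports the plan the optimizer chose, and index hints are one of the ways to influence its decisions. A quick sketch; the table and index names are assumptions for illustration only:

        EXPLAIN SELECT * FROM Country WHERE Continent = 'Europe';

        -- One of the optimizer hint mechanisms mentioned above
        SELECT * FROM Country USE INDEX (idx_continent)
        WHERE Continent = 'Europe';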

    Concurrency Control

    Anytime more than one query needs to change data at the same time, the problem of concurrency control arises. For our purposes in this chapter, MySQL has to do this at two levels: the server level and the storage engine level. Concurrency control is a big topic to which a large body of theoretical literature is devoted, so we will just give you a simplified overview of how MySQL deals with concurrent readers and writers, so you have the context you need for the rest of this chapter.

    We'll use an email box on a Unix system as an example. The classic mbox file format is very simple. All the messages in an mbox mailbox are concatenated together, one after another. This makes it very easy to read and parse mail messages. It also makes mail delivery easy: just append a new message to the end of the file.

    But what happens when two processes try to deliver messages at the same time to the same mailbox? Clearly that could corrupt the mailbox, leaving two interleaved messages at the end of the mailbox file. Well-behaved mail delivery systems use locking to prevent corruption. If a client attempts a second delivery while the mailbox is locked, it must wait to acquire the lock itself before delivering its message.

    This scheme works reasonably well in practice, but it gives no support for concurrency. Because only a single process can change the mailbox at any given time, this approach becomes problematic with a high-volume mailbox.

    Read/Write Locks

    Reading from the mailbox isn't as troublesome. There's nothing wrong with multiple clients reading the same mailbox simultaneously; because they aren't making changes, nothing is likely to go wrong. But what happens if someone tries to delete message number 25 while programs are reading the mailbox? It depends, but a reader could come away with a corrupted or inconsistent view of the mailbox. So, to be safe, even reading from a mailbox requires special care.

    If you think of the mailbox as a database table and each mail message as a row, it's easy to see that the problem is the same in this context. In many ways, a mailbox is really just a simple database table. Modifying rows in a database table is very similar to removing or changing the content of messages in a mailbox file.

    The solution to this classic problem of concurrency control is rather simple. Systems that deal with concurrent read/write access typically implement a locking system that consists of two lock types. These locks are usually known as shared locks and exclusive locks, or read locks and write locks.

    Without worrying about the actual locking technology, we can describe the concept as follows. Read locks on a resource are shared, or mutually nonblocking: many clients can read from a resource at the same time and not interfere with each other. Write locks, on the other hand, are exclusive (i.e., they block both read locks and other write locks) because the only safe policy is to have a single client writing to the resource at a given time and to prevent all reads when a client is writing.

    In the database world, locking happens all the time: MySQL has to prevent one client from reading a piece of data while another is changing it. It performs this lock management internally in a way that is transparent much of the time.

Lock Granularity

One way to improve the concurrency of a shared resource is to be more selective about what you lock. Rather than locking the entire resource, lock only the part that contains the data you need to change. Better yet, lock only the exact piece of data you plan to change. Minimizing the amount of data that you lock at any one time lets changes to a given resource occur simultaneously, as long as they don't conflict with each other.

The problem is locks consume resources. Every lock operation (getting a lock, checking to see whether a lock is free, releasing a lock, and so on) has overhead. If the system spends too much time managing locks instead of storing and retrieving data, performance can suffer.

A locking strategy is a compromise between lock overhead and data safety, and that compromise affects performance. Most commercial database servers don't give you much choice: you get what is known as row-level locking in your tables, with a variety of often complex ways to give good performance with many locks.

MySQL, on the other hand, does offer choices. Its storage engines can implement their own locking policies and lock granularities. Lock management is a very important decision in storage engine design; fixing the granularity at a certain level can give better performance for certain uses, yet make that engine less suited for other purposes. Because MySQL offers multiple storage engines, it doesn't require a single general-purpose solution. Let's have a look at the two most important lock strategies.

    Table locks

The most basic locking strategy available in MySQL, and the one with the lowest overhead, is table locks. A table lock is analogous to the mailbox locks described earlier: it locks the entire table. When a client wishes to write to a table (insert, delete, update, etc.), it acquires a write lock. This keeps all other read and write operations at bay. When nobody is writing, readers can obtain read locks, which don't conflict with other read locks.

Table locks have variations for good performance in specific situations. For example, READ LOCAL table locks allow some types of concurrent write operations. Write locks also have a higher priority than read locks, so a request for a write lock will advance to the front of the lock queue even if readers are already in the queue (write locks can advance past read locks in the queue, but read locks cannot advance past write locks).

Although storage engines can manage their own locks, MySQL itself also uses a variety of locks that are effectively table-level for various purposes. For instance, the server uses a table-level lock for statements such as ALTER TABLE, regardless of the storage engine.
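If you want to see table-level shared and exclusive locks in action, the explicit LOCK TABLES command (which we return to later in this chapter) exposes the same behavior. The table name here is hypothetical, and this is purely an illustration, not a recommendation:

    mysql> LOCK TABLES mytable READ;    -- shared: other sessions can still read mytable, but writes must wait
    mysql> UNLOCK TABLES;
    mysql> LOCK TABLES mytable WRITE;   -- exclusive: other sessions block on both reads and writes
    mysql> UNLOCK TABLES;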

    Row locks

The locking style that offers the greatest concurrency (and carries the greatest overhead) is the use of row locks. Row-level locking, as this strategy is commonly known, is available in the InnoDB and XtraDB storage engines, among others. Row locks are implemented in the storage engine, not the server (refer back to the logical architecture diagram if you need to). The server is completely unaware of locks implemented in the storage engines, and as you'll see later in this chapter and throughout the book, the storage engines all implement locking in their own ways.
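As a rough sketch of what this buys you (the table and rows are hypothetical, and the exact locking behavior also depends on indexes and isolation level), two sessions can write to different rows of the same InnoDB table at the same time, whereas a table-locking engine would force the second session to wait:

    -- Session 1
    START TRANSACTION;
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- locks the row with id = 1

    -- Session 2, concurrently
    START TRANSACTION;
    UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- proceeds immediately; it locks a different row

    -- Each session then issues COMMIT (or ROLLBACK) to release its locks.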

Transactions

You can't examine the more advanced features of a database system for very long before transactions enter the mix. A transaction is a group of SQL queries that are treated atomically, as a single unit of work. If the database engine can apply the entire group of queries to a database, it does so, but if any of them can't be done because of a crash or other reason, none of them is applied. It's all or nothing.

Little of this section is specific to MySQL. If you're already familiar with ACID transactions, feel free to skip ahead to "Transactions in MySQL" later in this chapter.

A banking application is the classic example of why transactions are necessary. Imagine a bank's database with two tables: checking and savings. To move $200 from Jane's checking account to her savings account, you need to perform at least three steps:

1. Make sure her checking account balance is greater than $200.
2. Subtract $200 from her checking account balance.
3. Add $200 to her savings account balance.

The entire operation should be wrapped in a transaction so that if any one of the steps fails, any completed steps can be rolled back.

You start a transaction with the START TRANSACTION statement and then either make its changes permanent with COMMIT or discard the changes with ROLLBACK. So, the SQL for our sample transaction might look like this:

1 START TRANSACTION;
2 SELECT balance FROM checking WHERE customer_id = 10233276;
3 UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;
4 UPDATE savings SET balance = balance + 200.00 WHERE customer_id = 10233276;
5 COMMIT;

But transactions alone aren't the whole story. What happens if the database server crashes while performing line 4? Who knows? The customer probably just lost $200. And what if another process comes along between lines 3 and 4 and removes the entire checking account balance? The bank has given the customer a $200 credit without even knowing it.

Transactions aren't enough unless the system passes the ACID test. ACID stands for Atomicity, Consistency, Isolation, and Durability. These are tightly related criteria that a well-behaved transaction processing system must meet:

Atomicity
    A transaction must function as a single indivisible unit of work so that the entire transaction is either applied or rolled back. When transactions are atomic, there is no such thing as a partially completed transaction: it's all or nothing.


Consistency
    The database should always move from one consistent state to the next. In our example, consistency ensures that a crash between lines 3 and 4 doesn't result in $200 disappearing from the checking account. Because the transaction is never committed, none of the transaction's changes are ever reflected in the database.

Isolation
    The results of a transaction are usually invisible to other transactions until the transaction is complete. This ensures that if a bank account summary runs after line 3 but before line 4 in our example, it will still see the $200 in the checking account. When we discuss isolation levels, you'll understand why we said "usually invisible."

Durability
    Once committed, a transaction's changes are permanent. This means the changes must be recorded such that data won't be lost in a system crash. Durability is a slightly fuzzy concept, however, because there are actually many levels. Some durability strategies provide a stronger safety guarantee than others, and nothing is ever 100% durable (if the database itself were truly durable, then how could backups increase durability?). We discuss what durability really means in MySQL in later chapters.

ACID transactions ensure that banks don't lose your money. It is generally extremely difficult or impossible to do this with application logic. An ACID-compliant database server has to do all sorts of complicated things you might not realize to provide ACID guarantees.

Just as with increased lock granularity, the downside of this extra security is that the database server has to do more work. A database server with ACID transactions also generally requires more CPU power, memory, and disk space than one without them. As we've said several times, this is where MySQL's storage engine architecture works to your advantage. You can decide whether your application needs transactions. If you don't really need them, you might be able to get higher performance with a nontransactional storage engine for some kinds of queries. You might be able to use LOCK TABLES to give the level of protection you need without transactions. It's all up to you.

Isolation Levels

Isolation is more complex than it looks. The SQL standard defines four isolation levels, with specific rules for which changes are and aren't visible inside and outside a transaction. Lower isolation levels typically allow higher concurrency and have lower overhead.


Each storage engine implements isolation levels slightly differently, and they don't necessarily match what you might expect if you're used to another database product (thus, we won't go into exhaustive detail in this section). You should read the manuals for whichever storage engines you decide to use.

Let's take a quick look at the four isolation levels:

READ UNCOMMITTED
    In the READ UNCOMMITTED isolation level, transactions can view the results of uncommitted transactions. At this level, many problems can occur unless you really, really know what you are doing and have a good reason for doing it. This level is rarely used in practice, because its performance isn't much better than the other levels, which have many advantages. Reading uncommitted data is also known as a dirty read.

READ COMMITTED
    The default isolation level for most database systems (but not MySQL!) is READ COMMITTED. It satisfies the simple definition of isolation used earlier: a transaction will see only those changes made by transactions that were already committed when it began, and its changes won't be visible to others until it has committed. This level still allows what's known as a nonrepeatable read. This means you can run the same statement twice and see different data.

REPEATABLE READ
    REPEATABLE READ solves the problems that READ COMMITTED allows, such as nonrepeatable reads. It guarantees that any rows a transaction reads will look the same in subsequent reads within the same transaction, but in theory it still allows another tricky problem: phantom reads. Simply put, a phantom read can happen when you select some range of rows, another transaction inserts a new row into the range, and then you select the same range again; you will then see the new "phantom" row. InnoDB and XtraDB solve the phantom read problem with multiversion concurrency control, which we explain later in this chapter.
    REPEATABLE READ is MySQL's default transaction isolation level.

SERIALIZABLE
    The highest level of isolation, SERIALIZABLE, solves the phantom read problem by forcing transactions to be ordered so that they can't possibly conflict. In a nutshell, SERIALIZABLE places a lock on every row it reads. At this level, a lot of timeouts and lock contention can occur. We've rarely seen people use this isolation level, but your application's needs might force you to accept the decreased concurrency in favor of the data stability that results.

Table 1-1 summarizes the various isolation levels and the drawbacks associated with each one.


Table 1-1. ANSI SQL isolation levels

Isolation level     Dirty reads possible   Nonrepeatable reads possible   Phantom reads possible   Locking reads
READ UNCOMMITTED    Yes                    Yes                            Yes                      No
READ COMMITTED      No                     Yes                            Yes                      No
REPEATABLE READ     No                     No                             Yes                      No
SERIALIZABLE        No                     No                             No                       Yes

Deadlocks

A deadlock is when two or more transactions are mutually holding and requesting locks on the same resources, creating a cycle of dependencies. Deadlocks occur when transactions try to lock resources in a different order. They can happen whenever multiple transactions lock the same resources. For example, consider these two transactions running against the StockPrice table:

Transaction #1

    START TRANSACTION;
    UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4 and date = '2002-05-01';
    UPDATE StockPrice SET close = 19.80 WHERE stock_id = 3 and date = '2002-05-02';
    COMMIT;

Transaction #2

    START TRANSACTION;
    UPDATE StockPrice SET high = 20.12 WHERE stock_id = 3 and date = '2002-05-02';
    UPDATE StockPrice SET high = 47.20 WHERE stock_id = 4 and date = '2002-05-01';
    COMMIT;

If you're unlucky, each transaction will execute its first query and update a row of data, locking it in the process. Each transaction will then attempt to update its second row, only to find that it is already locked. The two transactions will wait forever for each other to complete, unless something intervenes to break the deadlock.

To combat this problem, database systems implement various forms of deadlock detection and timeouts. The more sophisticated systems, such as the InnoDB storage engine, will notice circular dependencies and return an error instantly. This can be a good thing; otherwise, deadlocks would manifest themselves as very slow queries. Others will give up after the query exceeds a lock wait timeout, which is not always good. The way InnoDB currently handles deadlocks is to roll back the transaction that has the fewest exclusive row locks (an approximate metric for which will be the easiest to roll back).

Lock behavior and order are storage engine-specific, so some storage engines might deadlock on a certain sequence of statements even though others won't. Deadlocks have a dual nature: some are unavoidable because of true data conflicts, and some are caused by how a storage engine works.


Deadlocks cannot be broken without rolling back one of the transactions, either partially or wholly. They are a fact of life in transactional systems, and your applications should be designed to handle them. Many applications can simply retry their transactions from the beginning.
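When InnoDB picks a victim, the failed statement returns an error (error 1213, ER_LOCK_DEADLOCK) telling you to restart the transaction, and you can ask the server to show you the most recently detected deadlock; look for the LATEST DETECTED DEADLOCK section in the output:

    mysql> SHOW ENGINE INNODB STATUS\G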

Transaction Logging

Transaction logging helps make transactions more efficient. Instead of updating the tables on disk each time a change occurs, the storage engine can change its in-memory copy of the data. This is very fast. The storage engine can then write a record of the change to the transaction log, which is on disk and therefore durable. This is also a relatively fast operation, because appending log events involves sequential I/O in one small area of the disk instead of random I/O in many places. Then, at some later time, a process can update the table on disk. Thus, most storage engines that use this technique (known as write-ahead logging) end up writing the changes to disk twice.

If there's a crash after the update is written to the transaction log but before the changes are made to the data itself, the storage engine can still recover the changes upon restart. The recovery method varies between storage engines.
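In InnoDB's case, for example, the size of the redo log files and how aggressively the log is flushed to disk are controlled by a few server variables; we look at these settings more closely in later chapters. The change shown here trades some durability for speed, so treat it as an illustration rather than a recommendation:

    mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_log%';
    mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- write the log at each commit, but flush it to disk only about once per second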

Transactions in MySQL

MySQL provides two transactional storage engines: InnoDB and NDB Cluster. Several third-party engines are also available; the best-known engines right now are XtraDB and PBXT. We discuss some specific properties of each engine in the next section.

    AUTOCOMMIT

MySQL operates in AUTOCOMMIT mode by default. This means that unless you've explicitly begun a transaction, it automatically executes each query in a separate transaction. You can enable or disable AUTOCOMMIT for the current connection by setting a variable:

mysql> SHOW VARIABLES LIKE 'AUTOCOMMIT';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SET AUTOCOMMIT = 1;

The values 1 and ON are equivalent, as are 0 and OFF. When you run with AUTOCOMMIT=0, you are always in a transaction, until you issue a COMMIT or ROLLBACK. MySQL then starts a new transaction immediately. Changing the value of AUTOCOMMIT has no effect on nontransactional tables, such as MyISAM or Memory tables, which have no notion of committing or rolling back changes.
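For example (the table name here is hypothetical), with AUTOCOMMIT disabled every statement you issue becomes part of an open transaction until you end it explicitly:

    mysql> SET AUTOCOMMIT = 0;
    mysql> UPDATE test.t SET col = col + 1;  -- part of an open transaction; other sessions don't see it yet
    mysql> ROLLBACK;                         -- discards the change; COMMIT would have made it permanent
    mysql> SET AUTOCOMMIT = 1;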


Certain commands, when issued during an open transaction, cause MySQL to commit the transaction before they execute. These are typically Data Definition Language (DDL) commands that make significant changes, such as ALTER TABLE, but LOCK TABLES and some other statements also have this effect. Check your version's documentation for the full list of commands that automatically commit a transaction.

MySQL lets you set the isolation level using the SET TRANSACTION ISOLATION LEVEL command, which takes effect when the next transaction starts. You can set the isolation level for the whole server in the configuration file, or just for your session:

    mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

MySQL recognizes all four ANSI standard isolation levels, and InnoDB supports all of them.
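You can also change the server-wide default at runtime, or simply check what is in effect; the configuration-file equivalent is typically the transaction-isolation option, and the variable names shown here are the ones used through MySQL 5.5 (they were renamed in later releases):

    mysql> SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- applies to connections opened after this
    mysql> SELECT @@global.tx_isolation, @@session.tx_isolation;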

    Mixing storage engines in transactions

MySQL doesn't manage transactions at the server level. Instead, the underlying storage engines implement transactions themselves. This means you can't reliably mix different engines in a single transaction.

If you mix transactional and nontransactional tables (for instance, InnoDB and MyISAM tables) in a transaction, the transaction will work properly if all goes well. However, if a rollback is required, the changes to the nontransactional table can't be undone. This leaves the database in an inconsistent state from which it might be difficult to recover and renders the entire point of transactions moot. This is why it is really important to pick the right storage engine for each table.

MySQL will not usually warn you or raise errors if you do transactional operations on a nontransactional table. Sometimes rolling back a transaction will generate the warning "Some nontransactional changed tables couldn't be rolled back," but most of the time, you'll have no indication you're working with nontransactional tables.
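Here is a minimal sketch of the failure mode (the table names are made up): after the ROLLBACK, the InnoDB table is empty again, but the MyISAM row is still there, and at best you get a warning:

    CREATE TABLE txn_t    (i INT) ENGINE=InnoDB;
    CREATE TABLE nontxn_t (i INT) ENGINE=MyISAM;
    START TRANSACTION;
    INSERT INTO txn_t    VALUES (1);
    INSERT INTO nontxn_t VALUES (1);
    ROLLBACK;                        -- may warn that nontransactional tables couldn't be rolled back
    SELECT COUNT(*) FROM txn_t;      -- returns 0
    SELECT COUNT(*) FROM nontxn_t;   -- returns 1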

    Implicit and explicit locking

InnoDB uses a two-phase locking protocol. It can acquire locks at any time during a transaction, but it does not release them until a COMMIT or ROLLBACK. It releases all the locks at the same time. The locking mechanisms described earlier are all implicit. InnoDB handles locks automatically, according to your isolation level.

However, InnoDB also supports explicit locking, which the SQL standard does not mention at all:3

    SELECT ... LOCK IN SHARE MODE
    SELECT ... FOR UPDATE

    3. These locking hints are frequently abused and should usually be avoided; see Chapter 6 for more details.


MySQL also supports the LOCK TABLES and UNLOCK TABLES commands, which are implemented in the server, not in the storage engines. These have their uses, but they are not a substitute for transactions. If you need transactions, use a transactional storage engine.

We often see applications that have been converted from MyISAM to InnoDB but are still using LOCK TABLES. This is no longer necessary because of row-level locking, and it can cause severe performance problems.

The interaction between LOCK TABLES and transactions is complex, and there are unexpected behaviors in some server versions. Therefore, we recommend that you never use LOCK TABLES unless you are in a transaction and AUTOCOMMIT is disabled, no matter what storage engine you are using.

Multiversion Concurrency Control

Most of MySQL's transactional storage engines don't use a simple row-locking mechanism. Instead, they use row-level locking in conjunction with a technique for increasing concurrency known as multiversion concurrency control (MVCC). MVCC is not unique to MySQL: Oracle, PostgreSQL, and some other database systems use it too, although there are significant differences because there is no standard for how MVCC should work.

You can think of MVCC as a twist on row-level locking; it avoids the need for locking at all in many cases and can have much lower overhead. Depending on how it is implemented, it can allow nonlocking reads, while locking only the necessary rows during write operations.

MVCC works by keeping a snapshot of the data as it existed at some point in time. This means transactions can see a consistent view of the data, no matter how long they run. It also means different transactions can see different data in the same tables at the same time! If you've never experienced this before, it might be confusing, but it will become easier to understand with familiarity.

Each storage engine implements MVCC differently. Some of the variations include optimistic and pessimistic concurrency control. We'll illustrate one way MVCC works by explaining a simplified version of InnoDB's behavior.

InnoDB implements MVCC by storing with each row two additional, hidden values that record when the row was created and when it was expired (or deleted). Rather than storing the actual times at which these events occurred, the row stores the system version number at the time each event occurred. This is a number that increments each time a transaction begins. Each transaction keeps its own record of the current system version, as of the time it began. Each query has to check each row's version numbers against the transaction's version. Let's see how this applies to particular operations when the transaction isolation level is set to REPEATABLE READ:

SELECT
    InnoDB must examine each row to ensure that it meets two criteria:
    a. InnoDB must find a version of the row that is at least as old as the transaction (i.e., its version must be less than or equal to the transaction's version). This ensures that either the row existed before the transaction began, or the transaction created or altered the row.
    b. The row's deletion version must be undefined or greater than the transaction's version. This ensures that the row wasn't deleted before the transaction began.
    Rows that pass both tests may be returned as the query's result.

INSERT
    InnoDB records the current system version number with the new row.

DELETE
    InnoDB records the current system version number as the row's deletion ID.

UPDATE
    InnoDB writes a new copy of the row, using the system version number for the new row's version. It also writes the system version number as the old row's deletion version.

The result of all this extra record keeping is that most read queries never acquire locks. They simply read data as fast as they can, making sure to select only rows that meet the criteria. The drawbacks are that the storage engine has to store more data with each row, do more work when examining rows, and handle some additional housekeeping operations.

MVCC works only with the REPEATABLE READ and READ COMMITTED isolation levels. READ UNCOMMITTED isn't MVCC-compatible4 because queries don't read the row version that's appropriate for their transaction version; they read the newest version, no matter what. SERIALIZABLE isn't MVCC-compatible because reads lock every row they return.

4. There is no formal standard that defines MVCC, so different engines and databases implement it very differently, and no one can say any of them is wrong.
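A quick way to see this consistent-snapshot behavior for yourself (the table and values are hypothetical) is to open two connections at the default REPEATABLE READ level:

    -- Session 1
    START TRANSACTION;
    SELECT c FROM test.t WHERE id = 1;   -- suppose this returns 'old'

    -- Session 2 (autocommit enabled, so this commits immediately)
    UPDATE test.t SET c = 'new' WHERE id = 1;

    -- Session 1, still inside its original transaction
    SELECT c FROM test.t WHERE id = 1;   -- still returns 'old': it reads from its snapshot
    COMMIT;
    SELECT c FROM test.t WHERE id = 1;   -- a new statement outside the old snapshot now returns 'new'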

MySQL's Storage Engines

This section gives an overview of MySQL's storage engines. We won't go into great detail here, because we discuss storage engines and their particular behaviors throughout the book. Even this book, though, isn't a complete source of documentation; you should read the MySQL manuals for the storage engines you decide to use.

MySQL stores each database (also called a schema) as a subdirectory of its data directory in the underlying filesystem. When you create a table, MySQL stores the table definition in a .frm file with the same name as the table. Thus, when you create a table named MyTable, MySQL stores the table definition in MyTable.frm. Because MySQL uses the filesystem to store database names and table definitions, case sensitivity depends on the platform. On a Windows MySQL instance, table and database names are case insensitive; on Unix-like systems, they are case sensitive. Each storage engine stores the table's data and indexes differently, but the server itself handles the table definition.

You can use the SHOW TABLE STATUS command (or, in MySQL 5.0 and newer versions, query the INFORMATION_SCHEMA tables) to display information about tables. For example, to examine the user table in the mysql database, execute the following:

mysql> SHOW TABLE STATUS LIKE 'user' \G
*************************** 1. row ***************************
           Name: user
         Engine: MyISAM
     Row_format: Dynamic
           Rows: 6
 Avg_row_length: 59
    Data_length: 356
Max_data_length: 4294967295
   Index_length: 2048
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2002-01-24 18:07:17
    Update_time: 2002-01-24 21:56:29
     Check_time: NULL
      Collation: utf8_bin
       Checksum: NULL
 Create_options:
        Comment: Users and global privileges
1 row in set (0.00 sec)

The output shows that this is a MyISAM table. You might also notice a lot of other information and statistics in the output. Let's look briefly at what each line means:

Name
    The table's name.

Engine
    The table's storage engine. In old versions of MySQL, this column was named Type, not Engine.

Row_format
    The row format. For a MyISAM table, this can be Dynamic, Fixed, or Compressed. Dynamic rows vary in length because they contain variable-length fields such as VARCHAR or BLOB. Fixed rows, which are always the same size, are made up of fields that don't vary in length, such as CHAR and INTEGER. Compressed rows exist only in compressed tables; see "Compressed MyISAM tables" later in this chapter.

Rows
    The number of rows in the table. For MyISAM and most other engines, this number is always accurate. For InnoDB, it is an estimate.

Avg_row_length
    How many bytes the average row contains.

Data_length
    How much data (in bytes) the entire table contains.

Max_data_length
    The maximum amount of data this table can hold. This is engine-specific.

Index_length
    How much disk space the index data consumes.

Data_free
    For a MyISAM table, the amount of space that is allocated but currently unused. This space holds previously deleted rows and can be reclaimed by future INSERT statements.

Auto_increment
    The next AUTO_INCREMENT value.

Create_time
    When the table was first created.

Update_time
    When data in the table last changed.

Check_time
    When the table was last checked using CHECK TABLE or myisamchk.

Collation
    The default character set and collation for character columns in this table.

Checksum
    A live checksum of the entire table's contents, if enabled.

Create_options
    Any other options that were specified when the table was created.

Comment
    This field contains a variety of extra information. For a MyISAM table, it contains the comments, if any, that were set when the table was created. If the table uses the InnoDB storage engine, the amount of free space in the InnoDB tablespace appears here. If the table is a view, the comment contains the text VIEW.
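If you prefer the INFORMATION_SCHEMA route mentioned above, roughly the same information is exposed through the TABLES table; for example (column list trimmed for brevity):

    mysql> SELECT TABLE_NAME, ENGINE, ROW_FORMAT, TABLE_ROWS, DATA_LENGTH
        -> FROM INFORMATION_SCHEMA.TABLES
        -> WHERE TABLE_SCHEMA = 'mysql' AND TABLE_NAME = 'user'\G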

The InnoDB Engine

InnoDB is the default transactional storage engine for MySQL and the most important and broadly useful engine overall. It was designed for processing many short-lived transactions that usually complete rather than being rolled back. Its performance and automatic crash recovery make it popular for nontransactional storage needs, too. You should use InnoDB for your tables unless you have a compelling need to use a different engine. If you want to study storage engines, it is also well worth your time to study InnoDB in depth to learn as much as you can about it, rather than studying all storage engines equally.

InnoDB's history

InnoDB has a complex release history, but it's very helpful to understand it. In 2008, the so-called InnoDB plugin was released for MySQL 5.1. This was the next generation of InnoDB created by Oracle, which at that time owned InnoDB but not MySQL. For various reasons that are great to discuss over beers, MySQL continued shipping the older version of InnoDB, compiled into the server. But you could disable this and install the newer, better-performing, more scalable InnoDB plugin if you wished. Eventually, Oracle acquired Sun Microsystems and thus MySQL, and removed the older codebase, replacing it with the plugin by default in MySQL 5.5. (Yes, this means that now the plugin is actually compiled in, not installed as a plugin. Old terminology dies hard.)

The modern version of InnoDB, introduced as the InnoDB plugin in MySQL 5.1, sports new features such as building indexes by sorting, the ability to drop and add indexes without rebuilding the whole table, and a new storage format that offers compression, a new way to store large values such as BLOB columns, and file format management. Many people who use MySQL 5.1 don't use the plugin, sometimes because they aren't aware of it. If you're using MySQL 5.1, please ensure that you're using the InnoDB plugin. It's much better than the older version of InnoDB.

InnoDB is such an important engine that many people and companies have invested in developing it, not just Oracle's team. Notable contributions have come from Google, Yasufumi Kinoshita, Percona, and Facebook, among others. Some of these improvements have been included into the official InnoDB source code, and many others have been reimplemented in slightly different ways by the InnoDB team. In general, InnoDB's development has accelerated greatly in the last few years, with major improvements to instrumentation, scalability, configurability, performance, features, and support for Windows, among other notable items. MySQL 5.6 lab previews and milestone releases include a remarkable palette of new features for InnoDB, too.

Oracle is investing tremendous resources in improving InnoDB performance, and doing a great job of it (a considerable amount of external contribution has helped with this, too). In the second edition of this book, we noted that InnoDB failed pretty miserably beyond four CPU cores. It now scales well to 24 CPU cores, and arguably up to 32 or even more cores, depending on the scenario. Many improvements are slated for the upcoming 5.6 release, but there are still opportunities for enhancement.
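If you aren't sure which InnoDB you're running, one simple check is the innodb_version variable; it exists in the plugin and in MySQL 5.5 and later, but not in the old built-in InnoDB:

    mysql> SHOW VARIABLES LIKE 'innodb_version';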

    InnoDB overview

InnoDB stores its data in a series of one or more data files that are collectively known as a tablespace. A tablespace is essentially a black box that InnoDB manages all by itself. In MySQL 4.1 and newer versions, InnoDB can store each table's data and indexes in separate files. InnoDB can also use raw disk partitions for building its tablespace, but modern filesystems make this unnecessary.

InnoDB uses MVCC to achieve high concurrency, and it implements all four SQL standard isolation levels. It defaults to the REPEATABLE READ isolation level, and it has a next-key locking strategy that prevents phantom reads in this isolation level: rather than locking only the rows you've touched in a query, InnoDB locks gaps in the index structure as well, preventing phantoms from being inserted.

InnoDB tables are built on a clustered index, which we will cover in detail in later chapters. InnoDB's index structures are very different from those of most other MySQL storage engines. As a result, it provides very fast primary key lookups. However, secondary indexes (indexes that aren't the primary key) contain the primary key columns, so if your primary key is large, other indexes will also be large.