Choosing a MySQL HA Solution
![Page 1: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/1.jpg)
Choosing a MySQL HA Solution
Ernie Souhrada, Senior Consultant
Webinar Presentation
05 June 2013
![Page 2: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/2.jpg)
www.percona.com
Agenda
● Know Thy Presenter
● Why Are We Here?
● High Availability? HA!
● So Many Options... So Little Time
● Escaping Choice Paralysis
● Square Pegs, Square Holes
● Square Pegs, Round Holes
● Pegs? What Pegs?
![Page 3: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/3.jpg)
Know Thy Presenter
● Joined Percona in April 2012
● Mathematics / Political Science academic
● Coming up on 20 years in IT, 15 with MySQL
● Skier, Psytrancer, Technological Generalist
● Specialization is for insects
![Page 4: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/4.jpg)
Why Are We Here?
● The great existential question
● There is no perfect solution.
● Beware the snake-oil peddlers.
● It's all about what works for you.
● Think. Ask questions. Experiment.
![Page 5: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/5.jpg)
High Availability is Easy?!
● Availability is usually expressed as a percentage of uptime, which translates into an allowable amount of downtime per year.
  ● 98% availability ~ 1 week per year
  ● 99% ~ 3.65 days | 99.9% ~ 8.76 hours
  ● The holy grail – 99.999% ~ 5 minutes
● But what does it mean for a system to be considered available?
  ● How do we check?
  ● When is an “UP” server not really “UP”?
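The percentages above are simple arithmetic on the number of hours in a year. A minimal sketch, assuming a 365-day year (the function name is illustrative, not from the slides):

```python
HOURS_PER_YEAR = 365 * 24  # assumption: leap years ignored


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours of downtime per year permitted by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for pct in (98.0, 99.0, 99.9, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why the last few nines are so expensive.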
![Page 6: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/6.jpg)
HA! Think again.
● PING www.example.com
    --- www.example.com ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 37851ms
    rtt min/avg/max/mdev = 32229.497/32229.497/32229.497/0.000 ms
● time HEAD http://www.example.com | head -1
    200 OK
    real 0m15.539s
● SELECT COUNT(*) FROM mysql.user;
    MySQL Server has gone away.
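All three probes above "respond" in some sense, yet none describes a usable system. One way to capture that is to treat a probe as passing only if it succeeds within a latency budget. A rough sketch; the `ProbeResult` type and the thresholds are hypothetical, and the timings are taken from the slide:

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    succeeded: bool   # did the probe report success at all?
    elapsed_s: float  # wall-clock seconds the probe took


def is_available(result: ProbeResult, max_latency_s: float) -> bool:
    """A probe counts as 'up' only if it succeeded AND was fast enough."""
    return result.succeeded and result.elapsed_s <= max_latency_s


# (result, latency budget in seconds); the budgets are illustrative assumptions
probes = {
    "ping": (ProbeResult(True, 32.229), 0.5),   # 0% packet loss, 32 s round trip
    "http": (ProbeResult(True, 15.539), 2.0),   # 200 OK after 15.5 s
    "sql": (ProbeResult(False, 0.0), 1.0),      # server has gone away
}

for name, (result, budget_s) in probes.items():
    status = "available" if is_available(result, budget_s) else "NOT available"
    print(f"{name}: {status}")
```

With these budgets, every probe is reported as not available, even though two of them technically succeeded.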
![Page 7: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/7.jpg)
A-HA, Now IT Makes Sense
● Thinking too simplistically about availability can be misleading.
  ● Downtime vs. scheduled downtime
  ● System responsiveness
  ● The system is greater than the sum of its parts.
● Have a meaningful SLA or don't have one at all.
![Page 8: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/8.jpg)
MySQL HA: So Many Options...
● Traditional (async) MySQL replication
  ● Master-Slave or Master-Master
  ● Manual Failover
    – Yes, this can be an HA solution.
  ● External manager frameworks / applications
    – An alphabet soup of options: PRM, MHA, MMM, VIPs, keepalived, Pacemaker, Heartbeat
![Page 9: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/9.jpg)
So Little Time.
● Non-traditional MySQL replication
  ● MySQL Semi-sync replication (MySQL 5.5+)
  ● Tungsten Replicator
  ● Galera (Percona XtraDB Cluster, MariaDB/Galera)
  ● MySQL Cluster (NDB)
● Non-MySQL replication
  ● Shared storage – DRBD, Lustre, NFS, SANs
  ● Other Esoterica – Clustrix, Xeround, etc.
![Page 10: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/10.jpg)
Escaping Choice Paralysis
● Choosing a solution doesn't have to be painful.
  ● Avoid tunnel vision.
  ● Avoid buzzwords and the flavor of the week.
  ● What works for Google probably isn't right for you.
● Consider three perspectives (often related)
  ● Business
  ● Philosophical
  ● Technological
![Page 11: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/11.jpg)
Satisfying the Suits
● Your CEOs may not know FHA from MHA, but they have a vested interest in the outcome.
  ● Cost of downtime
    – The Lamborghini Factor
    – It's not just financial
  ● Risk analysis
  ● Budgetary constraints
  ● Tolerance for lost transactions
  ● Maturity of the proposed solution
  ● Supportability
![Page 12: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/12.jpg)
Placating the Philosophers
● Automated versus Manual Failover
  ● GitHub's well-publicized issues from 2012.
    – The debate isn't likely to end anytime soon.
  ● Some people have very strong opinions here.
    – No automated solution is guaranteed to do the right thing every time, but manual failover may involve a longer outage if a human can't be reached in a timely fashion.
  ● Need to balance the ideals with what's realistic.
    – A 24x7 NOC is great for manual failover.
● Tolerance for data loss / drift
● Maturity of the proposed solution
![Page 13: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/13.jpg)
Talking Technically
● MySQL Feature Set Usage
● Application-Related Issues
● Performance
● Scalability
● Failover / Recovery Time
● Operational Complexity
● CAP Theorem
● In-house knowledge / skills
![Page 14: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/14.jpg)
Square Pegs, Square Holes
● In light of these perspectives, where do our previously-discussed HA options fit?
● Traditional MySQL Replication
  ● PRO: It's cheap and well-understood. Both automated and manual failover options are available. Read-scaling is easy, and it works with any storage engine.
  ● CON: It's MySQL replication. Data drift is obscenely easy. Active-active replication topologies are brittle, and without some serious voodoo, one slave can still only have one master.
![Page 15: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/15.jpg)
A Round of Hand-Waving
● I started out as a physicist
● Master-Slave w/Manual Failover
  – Simple slave promotion.
  – Not too many moving parts unless you have multiple slaves that need to be re-homed.
● Master-Master (1 active) w/Manual Failover
  – Change DNS, move a VIP, build failover awareness into your application, etc.
  – Still reasonably simple as long as writes go to one place.
  – Additional slaves can complicate matters; binary log coordinates differ between the master servers.
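Re-homing a slave in classic (non-GTID) replication means pointing it at the new master's own binary log file and position, because coordinates from the old master are meaningless there. A small sketch that only builds the `CHANGE MASTER TO` statement; the host name and coordinates are hypothetical, and working out the correct position on the new master is the hard part that this sketch does not attempt:

```python
def change_master_sql(host: str, port: int, log_file: str, log_pos: int) -> str:
    """Build a CHANGE MASTER TO statement for file/position-based replication."""
    # Crude guard for this illustration only; not a substitute for real escaping.
    for value in (host, log_file):
        if "'" in value or "\\" in value:
            raise ValueError(f"unsafe character in {value!r}")
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};"
    )


# Hypothetical coordinates, as they might be read from SHOW MASTER STATUS
# on the NEW master (not copied from the old one).
stmt = change_master_sql("db2.example.com", 3306, "mysql-bin.000042", 107)
print(stmt)
```

On a real slave this statement would be run between `STOP SLAVE` and `START SLAVE`.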
![Page 16: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/16.jpg)
Hand-Waving, Part Deux
● M-S or M-M with automated failover
  ● Keepalived: VIP management
  ● HAProxy: L4 traffic director
  ● Pacemaker or Heartbeat: VIP management
  ● PRM (Percona Replication Manager):
    ● Automated solution for master promotion and slave re-homing. Can be used manually, too, but it isn't really designed for that.
    ● Generally works very well, but because it's a Pacemaker resource agent, it's subject to the whims of the Pacemaker developers, which has been an issue recently with CentOS 6.4.
    ● Does not make any guarantees about node consistency after a failover, but can be paired with semi-sync replication.
![Page 17: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/17.jpg)
Hand-Waving, Part The Third
● M-S / M-M with automated failover, continued
  ● MMM – Multi-Master Replication Manager
    ● Agent-based system. Unreliable agent communication.
    ● Not sure if it's even still actively being developed. Don't use.
  ● MHA – Master High Availability for MySQL
    ● Tries very hard to ensure data consistency when promoting a slave into the master role.
    ● Can be dropped into an existing MySQL topology without extensive reconfiguration.
    ● The preferred choice of Percona's Remote DBA team.
  ● MySQL Utilities
    ● New tools from Oracle designed to work with MySQL 5.6 and GTID-based replication. Have yet to see this in the wild.
![Page 18: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/18.jpg)
Use Cases, Contraindications I
● Traditional replication is the most generic HA solution out there. If it runs fine on a single MySQL server, there's almost certainly a way to make it work reasonably well with replication. But...
● Pacemaker/Corosync solutions can have trouble with high-latency networks.
● None of the automated solutions handle extremely high load very well. MHA requires SSH connectivity; Pacemaker/Corosync solutions can lose messages and trigger spurious failovers.
● Under the hood, it's still replication. Periodic data consistency checks (think pt-table-checksum) should be de rigueur.
● Oracle's new MySQL Utilities are MySQL 5.6+ only.
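The idea behind a periodic consistency check can be illustrated with chunked checksums: hash the same primary-key ranges on master and slave and flag any chunk that differs. This is only a toy, in-memory illustration of the concept; it makes no claim about how pt-table-checksum is implemented, and all names and data below are made up:

```python
import zlib


def chunk_checksums(rows, chunk_size):
    """CRC32 of each chunk of rows; rows are tuples already sorted by primary key."""
    sums = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        payload = "\n".join(repr(row) for row in chunk).encode("utf-8")
        sums.append(zlib.crc32(payload))
    return sums


def drifted_chunks(master_rows, slave_rows, chunk_size=2):
    """Indexes of chunks whose checksums differ (or exist on one side only)."""
    m = chunk_checksums(master_rows, chunk_size)
    s = chunk_checksums(slave_rows, chunk_size)
    longest = max(len(m), len(s))
    return [i for i in range(longest)
            if i >= len(m) or i >= len(s) or m[i] != s[i]]


master = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
slave = [(1, "a"), (2, "b"), (3, "X"), (4, "d")]  # row 3 has drifted
print(drifted_chunks(master, slave))  # → [1]
```

Chunking keeps each comparison cheap and narrows a mismatch down to a small key range that can then be inspected or repaired.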
![Page 19: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/19.jpg)
Square Pegs, Round Holes
● Non-traditional MySQL replication
  ● Semi-sync replication (MySQL 5.5+)
    – Master doesn't return success to the client until a slave has acknowledged receipt of the event (or the semi-sync timeout occurs).
    – Sounded like a good idea in theory, but never saw wide adoption as far as I'm aware.
    – Still requires something like PRM, MHA, or other external management framework to actually effect a failover.
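The acknowledge-or-timeout behavior described above can be modeled in a few lines. This is a simplified simulation, not MySQL code; in a real server the timeout is governed by `rpl_semi_sync_master_timeout`, and the numbers below are illustrative:

```python
from typing import Optional, Tuple


def commit_outcome(ack_delay_ms: Optional[float], timeout_ms: float) -> Tuple[str, float]:
    """Return (mode, client_wait_ms) for one commit on a semi-sync master.

    ack_delay_ms is how long the fastest slave takes to acknowledge the event,
    or None if no slave acknowledges at all.
    """
    if ack_delay_ms is not None and ack_delay_ms <= timeout_ms:
        # A slave confirmed receipt: the event exists on at least two servers.
        return ("semi-sync", ack_delay_ms)
    # Timeout reached: the master gives up waiting and falls back to async.
    return ("async-fallback", timeout_ms)


print(commit_outcome(2.0, 10000.0))     # local slave, quick acknowledgement
print(commit_outcome(None, 10000.0))    # no slave reachable: client waits out the timeout
```

The second case shows why semi-sync alone is not an HA solution: when no slave answers, the client pays the full timeout and the master carries on without any guarantee.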
![Page 20: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/20.jpg)
Square Pegs, Round Holes II
● Tungsten Replicator (from Continuent)
  ● Comes in FOSS and commercial flavors.
  ● Java-based binary log processing and relaying framework.
  ● Complex replication topologies are possible, including multi-master and replication between different DB platforms. This is cool, if it works as advertised.
  ● Haven't seen it or worked with it in the wild, so I'm hesitant to make any judgement calls.
![Page 21: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/21.jpg)
All Your Square Pegs are in RAM
● MySQL Cluster (NDB)
  – Until recently, ran entirely in memory, but now disk-based tables are available (with some limitations).
  – Handles sharding and data redistribution automatically as nodes are added.
  – If your application is of the type that NDB was designed for (lots of small, simple writes, simple key-value lookups that don't require JOINs), it will likely outperform any other cluster/multi-machine solution in the MySQL ecosphere.
● But... Setup, tuning and configuration are extremely complicated; NDB-specific knowledge is required.
![Page 22: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/22.jpg)
Synchronous Pegs
● Galera-based Solutions
  ● Percona XtraDB Cluster (PXC), MariaDB + Galera
    – Synchronous replication
    – Can support true multi-master writing, reads are served up locally.
    – InnoDB-only (MyISAM support is experimental)
    – In the CAP theorem, Galera's focus is on consistency over availability.
    – Tremendous potential here; seems to solve a lot of the most common gripes about MySQL replication, but still in the process of maturing.
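"Consistency over availability" shows up most clearly during a network partition: only the side holding a strict majority of nodes keeps accepting writes. A minimal sketch of that majority rule, assuming every node carries equal weight (the function name is illustrative):

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if a partition holds a strict majority of the cluster."""
    return reachable_nodes * 2 > cluster_size


# Three-node cluster split 2 / 1: the pair stays writable, the loner does not.
print(has_quorum(2, 3), has_quorum(1, 3))
# Two-node cluster split 1 / 1: neither side has a majority.
print(has_quorum(1, 2), has_quorum(1, 2))
```

The two-node case is why an odd number of nodes (or an arbitrator) is the usual recommendation: an even split leaves no side able to continue.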
![Page 23: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/23.jpg)
Use Cases, Contraindications II
● Semi-sync replication
  – Can be paired with MHA or PRM.
  – Probably won't work very well if all the slaves are in remote datacenters.
● Tungsten
  – Doesn't appear to work with PXC.
● NDB Cluster
  – Extremely fast for a specific class of applications (small writes, key-value lookups without JOINs)
  – Can achieve 99.999% uptime if configured properly.
  – Setup, tuning, and troubleshooting are complex.
![Page 24: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/24.jpg)
Use Cases, Contraindications III
● PXC / MariaDB+Galera
  – Good for InnoDB where all the tables have PKs.
  – FK handling has had some issues recently.
  – Can be used over the WAN if the application or the end users can handle the increased latency at COMMIT time, but this can also be a deal-breaker.
  – A PXC cluster can automatically repair itself when a node drops out / returns; likewise, it can automatically expand the cluster if a new node joins.
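A back-of-the-envelope way to judge whether WAN commit latency is a deal-breaker: if each COMMIT costs at least one network round trip to the farthest node, a single connection committing serially cannot exceed roughly 1/RTT commits per second. This is a rough model under that stated assumption, not a measurement, and the RTT figures are illustrative:

```python
def max_commits_per_second(rtt_ms: float) -> float:
    """Upper bound on serial commits/second for ONE connection, given round-trip time."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


for label, rtt_ms in (("same rack", 0.5), ("cross-country", 70.0), ("intercontinental", 150.0)):
    print(f"{label}: at most ~{max_commits_per_second(rtt_ms):.0f} commits/s per connection")
```

Aggregate throughput can still be much higher with many concurrent connections; the bound mostly hurts workloads that commit many small transactions one after another.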
![Page 25: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/25.jpg)
Pegs? What Pegs?
● Replication outside of MySQL
  ● DRBD / Shared (SAN) Storage (typically used with Pacemaker or some other management framework)
    – DRBD is mature and focused on data integrity, but there's roughly a 20% disk performance penalty right off the top.
    – DRBD only really works with a two-server pair, and the second server sits idle.
    – DRBD failover requires a MySQL crash recovery when it spins up on the other side.
    – Many enterprise-level SANs have built-in facilities for snapshots and replication independent of the database.
![Page 26: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/26.jpg)
There is No Peg
● Other Esoterica
  ● Clustrix: Completely separate database product that's MySQL-protocol compatible on the wire. Can't say anything else about it, but I suspect that it's geared for a few specific use cases.
  ● Xeround: Apparently no longer in business.
![Page 27: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/27.jpg)
One Last Round of Use Cases
● I probably wouldn't entertain Clustrix unless it solved a specific need that I couldn't meet some other way. I just don't know enough about it.
● For small to medium installations where data integrity is paramount and the write load isn't likely to exceed the capacity of a single machine, DRBD is a very good choice. Bare metal servers with directly-connectable NIC ports are a must.
![Page 28: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/28.jpg)
Parting Thoughts
● It bears repeating: there is no perfect solution, there is only what fits best for you.
● Plan and test thoroughly now, or cry later.
● If you want another nine, be prepared to add another zero.
● Any Questions?
![Page 29: Choosing a MySQL HA Solution](https://reader036.fdocuments.net/reader036/viewer/2022071602/613d709d736caf36b75d5a23/html5/thumbnails/29.jpg)
Email: [email protected]
Twitter: @denshikarasu
Join us in Portland for Percona MySQL University on Monday, 17 June 2013. Registration is FREE, but space is limited.
Visit http://www.percona.com for additional information.