Choosing a MySQL HA Solution

Choosing a MySQL HA Solution Ernie Souhrada, Senior Consultant Webinar Presentation 05 June 2013

Transcript of Choosing a MySQL HA Solution

Page 1: Choosing a MySQL HA Solution

Choosing a MySQL HA Solution
Ernie Souhrada, Senior Consultant

Webinar Presentation
05 June 2013

Page 2: Choosing a MySQL HA Solution

www.percona.com

Agenda

● Know Thy Presenter
● Why Are We Here?
● High Availability? HA!
● So Many Options... So Little Time
● Escaping Choice Paralysis
● Square Pegs, Square Holes
● Square Pegs, Round Holes
● Pegs? What Pegs?

Page 3: Choosing a MySQL HA Solution

Know Thy Presenter

● Joined Percona in April 2012
● Mathematics / Political Science academic
● Coming up on 20 years in IT, 15 with MySQL
● Skier, Psytrancer, Technological Generalist
● Specialization is for insects

Page 4: Choosing a MySQL HA Solution

Why Are We Here?

● The great existential question
● There is no perfect solution.
● Beware the snake-oil peddlers.
● It's all about what works for you.
● Think. Ask questions. Experiment.

Page 5: Choosing a MySQL HA Solution

High Availability is Easy?!

● Availability is usually expressed as a percentage of uptime, which translates into an allowed amount of downtime per year.
    ● 98% availability ~ 1 week per year
    ● 99% ~ 3.65 days | 99.9% ~ 8.76 hours
    ● The holy grail – 99.999% ~ 5 minutes

● But what does it mean for a system to be considered available?
    ● How do we check?
    ● When is an “UP” server not really “UP”?
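
The downtime figures above are plain arithmetic, and it can help to see them computed. This is a minimal sketch that assumes a 365-day year; the function name is made up for illustration.

```python
# Allowed downtime per year for a given availability percentage.
# Assumes a 365-day year (8760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year that this availability allows."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (98, 99, 99.9, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Note that this says nothing about what counts as "down", which is the harder question.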

Page 6: Choosing a MySQL HA Solution

HA! Think again.

● PING www.example.com
    --- www.example.com ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 37851ms
    rtt min/avg/max/mdev = 32229.497/32229.497/32229.497/0.000 ms

● time HEAD http://www.example.com | head -1
    200 OK
    real 0m15.539s

● SELECT COUNT(*) FROM mysql.user;
    MySQL Server has gone away.
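
The three checks above all "succeed" or fail in ways a simple up/down test would miss. One way to capture that is a check that also looks at how long the probe took. This is only a sketch: `check` and its `probe` argument are hypothetical names, and in practice the probe would be a real query against the database rather than a stand-in function.

```python
import time
from typing import Callable


def check(probe: Callable[[], object], max_seconds: float) -> str:
    """Run a probe and classify the result as UP, DEGRADED, or DOWN.

    probe       -- any callable that raises an exception on failure
                   (hypothetical; e.g. a function that runs a test query)
    max_seconds -- the slowest response still considered acceptable
    """
    start = time.monotonic()
    try:
        probe()
    except Exception:
        return "DOWN"
    elapsed = time.monotonic() - start
    # An answer that arrives too late is not really "UP".
    return "UP" if elapsed <= max_seconds else "DEGRADED"
```

With a threshold of one second, the 32-second ping and the 15-second HTTP response would both come back as DEGRADED rather than UP.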

Page 7: Choosing a MySQL HA Solution

A-HA, Now IT Makes Sense

● Thinking too simplistically about availability can be misleading.
    ● Downtime vs. scheduled downtime
    ● System responsiveness
    ● The system is greater than the sum of its parts.

● Have a meaningful SLA or don't have one at all.

Page 8: Choosing a MySQL HA Solution

MySQL HA: So Many Options...

● Traditional (async) MySQL replication
    ● Master-Slave or Master-Master
    ● Manual Failover
        – Yes, this can be an HA solution.
    ● External manager frameworks / applications
        – An alphabet soup of options: PRM, MHA, MMM, VIPs, keepalived, Pacemaker, Heartbeat

Page 9: Choosing a MySQL HA Solution

So Little Time.

● Non-traditional MySQL replication
    ● MySQL Semi-sync replication (MySQL 5.5+)
    ● Tungsten Replicator
    ● Galera (Percona XtraDB Cluster, MariaDB/Galera)
    ● MySQL Cluster (NDB)

● Non-MySQL replication
    ● Shared storage – DRBD, Lustre, NFS, SANs
    ● Other Esoterica – Clustrix, Xeround, etc.

Page 10: Choosing a MySQL HA Solution

Escaping Choice Paralysis

● Choosing a solution doesn't have to be painful.
    ● Avoid tunnel vision.
    ● Avoid buzzwords and the flavor of the week.
    ● What works for Google probably isn't right for you.

● Consider three perspectives (often related)
    ● Business
    ● Philosophical
    ● Technological

Page 11: Choosing a MySQL HA Solution

Satisfying the Suits

● Your CEOs may not know FHA from MHA, but they have a vested interest in the outcome.
    ● Cost of downtime
        – The Lamborghini Factor
        – It's not just financial
    ● Risk analysis
    ● Budgetary constraints
    ● Tolerance for lost transactions
    ● Maturity of the proposed solution
    ● Supportability

Page 12: Choosing a MySQL HA Solution

Placating the Philosophers

● Automated versus Manual Failover
    ● GitHub's well-publicized issues from 2012.
        – The debate isn't likely to end anytime soon.
    ● Some people have very strong opinions here.
        – No automated solution is guaranteed to do the right thing every time, but manual failover may involve a longer outage if a human can't be reached in a timely fashion.
    ● Need to balance the ideals with what's realistic.
        – A 24x7 NOC is great for manual failover.
● Tolerance for data loss / drift
● Maturity of the proposed solution

Page 13: Choosing a MySQL HA Solution

Talking Technically

● MySQL Feature Set Usage
● Application-Related Issues
● Performance
● Scalability
● Failover / Recovery Time
● Operational Complexity
● CAP Theorem
● In-house knowledge / skills

Page 14: Choosing a MySQL HA Solution

Square Pegs, Square Holes

● In light of these perspectives, where do our previously-discussed HA options fit?

● Traditional MySQL Replication
    ● PRO: It's cheap and well-understood. Both automated and manual failover options are available. Read-scaling is easy, and it works with any storage engine.
    ● CON: It's MySQL replication. Data drift is obscenely easy. Active-active replication topologies are brittle, and without some serious voodoo, one slave can still only have one master.

Page 15: Choosing a MySQL HA Solution

A Round of Hand-Waving

● I started out as a physicist
● Master-Slave w/Manual Failover
    – Simple slave promotion.
    – Not too many moving parts unless you have multiple slaves that need to be re-homed.
● Master-Master (1 active) w/Manual Failover
    – Change DNS, move a VIP, build failover awareness into your application, etc.
    – Still reasonably simple as long as writes go to one place.
    – Additional slaves can complicate matters; binary log coordinates differ between the master servers.
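
To make "slave promotion" and "re-homing" a little more concrete, here is a rough sketch of the order of operations for a manual master-slave failover. It only builds a list of statements; it does not connect to anything. The function name and the host names in the example are hypothetical. It also assumes the old master is truly out of the picture, that every slave has applied the same events, and that the log file and position were read from SHOW MASTER STATUS on the newly promoted server. Real failovers are rarely this tidy.

```python
def promotion_plan(new_master, log_file, log_pos, other_slaves):
    """Return an ordered list of (host, SQL statement) pairs.

    new_master   -- host being promoted (hypothetical name)
    log_file/pos -- binary log coordinates taken on the new master
    other_slaves -- remaining slaves that need to be re-homed
    """
    plan = [
        (new_master, "STOP SLAVE"),
        # RESET SLAVE ALL requires MySQL 5.5.16 or later.
        (new_master, "RESET SLAVE ALL"),
        (new_master, "SET GLOBAL read_only = OFF"),
    ]
    for slave in other_slaves:
        plan.append((slave, "STOP SLAVE"))
        plan.append((
            slave,
            f"CHANGE MASTER TO MASTER_HOST='{new_master}', "
            f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}",
        ))
        plan.append((slave, "START SLAVE"))
    return plan


for host, statement in promotion_plan("db2", "mysql-bin.000042", 107, ["db3"]):
    print(f"{host}: {statement}")
```

With one slave this is three statements. With several slaves at different positions, working out the right coordinates for each is where the moving parts come from.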

Page 16: Choosing a MySQL HA Solution

Hand-Waving, Part Deux

● M-S or M-M with automated failover
    ● Keepalived: VIP management
    ● HAProxy: L4 traffic director
    ● Pacemaker or Heartbeat: VIP management
    ● PRM (Percona Replication Manager):
        ● Automated solution for master promotion and slave re-homing. Can be used manually, too, but not really designed for such.
        ● Generally works very well, but because it's a Pacemaker resource agent, it's subject to the whims of the Pacemaker developers, which has been an issue recently with CentOS 6.4.
        ● Does not make any guarantees about node consistency after a failover, but can be paired with semi-sync replication.

Page 17: Choosing a MySQL HA Solution

Hand-Waving, Part The Third

● M-S / M-M with automated failover, continued
    ● MMM – Multi-Master Replication Manager
        ● Agent-based system. Unreliable agent communication.
        ● Not sure if it's even still actively being developed. Don't use.
    ● MHA – Master High Availability for MySQL
        ● Tries very hard to ensure data consistency when promoting a new slave into the master role.
        ● Can be dropped into an existing MySQL topology without extensive reconfiguration.
        ● The preferred choice of Percona's Remote DBA team.
    ● MySQL Utilities
        ● New tools from Oracle designed to work with MySQL 5.6 and GTID-based replication. Have yet to see this in the wild.

Page 18: Choosing a MySQL HA Solution

Use Cases, Contraindications I

● Traditional replication is the most generic HA solution out there. If it runs fine on a single MySQL server, there's almost certainly a way to make it work reasonably well with replication. But...

● Pacemaker/Corosync solutions can have trouble with high-latency networks.

● None of the automated solutions handle extremely high load very well. MHA requires SSH connectivity; Pacemaker/Corosync solutions can lose messages and trigger spurious failovers.

● Under the hood, it's still replication. Periodic data consistency checks (think pt-table-checksum) should be de rigueur.

● Oracle's new MySQL Utilities are MySQL 5.6+ only.
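
The idea behind a periodic consistency check is simple enough to sketch: checksum the data in chunks on both sides and flag the chunks that disagree. The toy below works on in-memory lists of rows and is not how pt-table-checksum is implemented; the function names and the sample rows are made up for illustration.

```python
import zlib


def chunk_checksums(rows, chunk_size):
    """Return one CRC32 per chunk of rows, in order."""
    checksums = []
    for start in range(0, len(rows), chunk_size):
        crc = 0
        for row in rows[start:start + chunk_size]:
            # Fold each row's text representation into the running CRC.
            crc = zlib.crc32(repr(row).encode("utf-8"), crc)
        checksums.append(crc)
    return checksums


def drifted_chunks(master_rows, replica_rows, chunk_size=2):
    """Return the indexes of chunks whose checksums differ or are missing."""
    m = chunk_checksums(master_rows, chunk_size)
    r = chunk_checksums(replica_rows, chunk_size)
    longest = max(len(m), len(r))
    return [
        i for i in range(longest)
        if i >= len(m) or i >= len(r) or m[i] != r[i]
    ]


master = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
replica = [(1, "a"), (2, "b"), (3, "X"), (4, "d")]
print(drifted_chunks(master, replica))
```

Chunking matters because it narrows down where the drift is without comparing every row individually.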

Page 19: Choosing a MySQL HA Solution

Square Pegs, Round Holes

● Non-traditional MySQL replication
    ● Semi-sync replication (MySQL 5.5+)
        – Master doesn't return success to the client until a slave has acknowledged receipt of the event (or the semi-sync timeout occurs).
        – Sounded like a good idea in theory, but never saw wide adoption as far as I'm aware.
        – Still requires something like PRM, MHA, or other external management framework to actually effect a failover.
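
The "or the semi-sync timeout occurs" part is worth spelling out, because it decides what guarantee you actually get. The sketch below is a toy model of the decision, not server code. The function name is hypothetical, and the default timeout mirrors the 10000 ms default of the rpl_semi_sync_master_timeout variable.

```python
def commit_outcome(ack_delay_ms, timeout_ms=10000):
    """Model one commit on a semi-sync master.

    ack_delay_ms -- how long the fastest slave takes to acknowledge,
                    or None if no slave ever answers
    timeout_ms   -- how long the master waits before giving up

    Returns (mode, client_wait_ms).
    """
    if ack_delay_ms is not None and ack_delay_ms <= timeout_ms:
        # A slave has the event; the client waited for the acknowledgement.
        return ("semi-sync", ack_delay_ms)
    # No acknowledgement in time: the master stops waiting and carries on
    # asynchronously, so the event may exist only on the master.
    return ("async-fallback", timeout_ms)


print(commit_outcome(2))      # nearby slave answers quickly
print(commit_outcome(None))   # no slave answers at all
```

In the fallback case the client has waited the full timeout and still has no guarantee that a slave holds the transaction.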

Page 20: Choosing a MySQL HA Solution

Square Pegs, Round Holes II

● Tungsten Replicator (from Continuent)
    ● Comes in FOSS and commercial flavors.
    ● Java-based binary log processing and relaying framework.
    ● Complex replication topologies are possible, including multi-master and replication between different DB platforms. This is cool, if it works as advertised.
    ● Haven't seen it or worked with it in the wild, so I'm hesitant to make any judgement calls.

Page 21: Choosing a MySQL HA Solution

All Your Square Pegs are in RAM

● MySQL Cluster (NDB)
    – Until recently, ran entirely in memory, but now disk-based tables are available (with some limitations).
    – Handles sharding and data redistribution automatically as nodes are added.
    – If your application is of the type that NDB was designed for (lots of small, simple writes, simple key-value lookups that don't require JOINs), it will likely outperform any other cluster/multi-machine solution in the MySQL ecosphere.

● But... Setup, tuning, and configuration are extremely complicated; NDB-specific knowledge is required.

Page 22: Choosing a MySQL HA Solution

Synchronous Pegs

● Galera-based Solutions
    ● Percona XtraDB Cluster (PXC), MariaDB + Galera
        – Synchronous replication
        – Can support true multi-master writing, reads are served up locally.
        – InnoDB-only (MyISAM support is experimental)
        – In the CAP theorem, Galera's focus is on consistency over availability.
        – Tremendous potential here; seems to solve a lot of the most common gripes about MySQL replication, but still in the process of maturing.
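
"Consistency over availability" shows up most visibly when the cluster splits: only the side that can still see a majority of the nodes keeps accepting writes. The check below is a simplified sketch of that majority rule. It assumes every node carries equal weight, and the function name is made up for illustration.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if this partition holds a strict majority of the cluster.

    reachable_nodes -- nodes this partition can see, including itself
    cluster_size    -- nodes in the cluster before the split
    Assumes all nodes have equal weight.
    """
    return reachable_nodes * 2 > cluster_size


print(has_quorum(2, 3))  # two of three nodes: keeps serving
print(has_quorum(1, 3))  # isolated node: stops accepting writes
print(has_quorum(2, 4))  # even split: neither half has a majority
```

The even-split case is one reason odd node counts are the usual recommendation.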

Page 23: Choosing a MySQL HA Solution

Use Cases, Contraindications II

● Semi-sync replication
    – Can be paired with MHA or PRM.
    – Probably won't work very well if all the slaves are in remote datacenters.

● Tungsten
    – Doesn't appear to work with PXC.

● NDB Cluster
    – Extremely fast for a specific class of applications (small writes, key-value lookups without JOINs)
    – Can achieve 99.999% uptime if configured properly.
    – Setup, tuning, and troubleshooting are complex.

Page 24: Choosing a MySQL HA Solution

Use Cases, Contraindications III

● PXC / MariaDB+Galera
    – Good for InnoDB where all the tables have PKs.
    – FK handling has had some issues recently.
    – Can be used over the WAN if the application or the end users can handle the increased latency at COMMIT time, but this can also be a deal-breaker.
    – A PXC cluster can automatically repair itself when a node drops out / returns; likewise, it can automatically expand the cluster if a new node joins.

Page 25: Choosing a MySQL HA Solution

Pegs? What Pegs?

● Replication outside of MySQL
    ● DRBD / Shared (SAN) Storage (typically used with Pacemaker or some other management framework)
        – DRBD is mature and focused on data integrity, but there's roughly a 20% disk performance penalty right off the top.
        – DRBD only really works with a two-server pair, and the second server sits idle.
        – DRBD failover requires a MySQL crash recovery when it spins up on the other side.
        – Many enterprise-level SANs have built-in facilities for snapshots and replication independent of the database.

Page 26: Choosing a MySQL HA Solution

There is No Peg

● Other Esoterica
    ● Clustrix: Completely separate database product that's MySQL-protocol compatible on the wire. Can't say anything else about it, but I suspect that it's geared for a few specific use cases.
    ● Xeround: Apparently no longer in business.

Page 27: Choosing a MySQL HA Solution

One Last Round of Use Cases

● I probably wouldn't entertain Clustrix unless it solved a specific need that I couldn't meet some other way. I just don't know enough about it.

● For small to medium installations where data integrity is paramount and the write load isn't likely to exceed the capacity of a single machine, DRBD is a very good choice. Bare metal servers with directly-connectable NIC ports are a must.

Page 28: Choosing a MySQL HA Solution

Parting Thoughts

● It bears repeating: there is no perfect solution, there is only what fits best for you.

● Plan and test thoroughly now, or cry later.

● If you want another nine, be prepared to add another zero.

● Any Questions?

Page 29: Choosing a MySQL HA Solution

Email: [email protected]
Twitter: @denshikarasu

Join us in Portland for Percona MySQL University on Monday, 17 June 2013.
Registration is FREE, but space is limited.

Visit http://www.percona.com for additional information.