MMUG18 - MySQL Failover and Orchestrator

MMUG18: MySQL Failover and Orchestrator. Simon Mudd [email protected], 17th May 2017

Transcript of MMUG18 - MySQL Failover and Orchestrator


Thanks to Tuenti

17/05/2017 Madrid MySQL Users Group - MMUG18

• For allowing us to use their offices for this presentation

Content

• Handling failover with MySQL
  • Downtime & Requirements
  • MySQL Clustering solutions
  • Non-clustering solutions and considerations
• Orchestrator
• Questions


Is Downtime Acceptable?

• Do you have a system that needs to run 24x7?
• Not everyone does
• If you have a website then generally downtime is not acceptable


Requirements

Goal: Run 24x7x365 with no downtime
• Is this really necessary?
• If you ask management they'll always say yes…
• What is the cost?
• Shorter downtime requirements mean more effort spent to achieve that
• How do you reliably detect failure? Hard problem to solve

If you accept downtime how much can you really tolerate?
• 1s, 5s, 30s, 1min?
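
To put rough numbers on that question, an availability target can be turned into a yearly downtime budget. This is illustrative arithmetic only: the function names, the 365-day year and the 30-second failover duration are assumptions made for the example, not figures from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year


def downtime_budget_seconds(availability: float) -> float:
    """Seconds of downtime per year allowed by an availability target (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * SECONDS_PER_YEAR


def failovers_within_budget(availability: float, failover_seconds: float) -> int:
    """How many failovers of the given duration fit into the yearly budget."""
    if failover_seconds <= 0:
        raise ValueError("failover_seconds must be positive")
    return int(downtime_budget_seconds(availability) // failover_seconds)


if __name__ == "__main__":
    # Compare a few common targets against a hypothetical 30s failover.
    for target in (0.999, 0.9999, 0.99999):
        budget = downtime_budget_seconds(target)
        count = failovers_within_budget(target, 30)
        print(f"{target:.3%}: {budget:.0f}s per year, {count} failovers of 30s")
```

A 99.99% target leaves a little under 53 minutes per year, which is why the detection and failover time matters so much.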


What options are available?

• MySQL Cluster
  • carrier grade
  • very high uptime
  • Not InnoDB: specialised workloads
• Galera
  • Often with asynchronous replication between datacentres
• InnoDB Cluster
  • Very new
• All require clients to take action on failure of a node
  • If you use a proxy that can fail too…


What options are available?

"Cluster solutions"
• Do not work well cross-DC due to latency
• If you accept writes into multiple masters there's a chance of conflict
• Slows things down
• InnoDB Cluster now does not recommend this behaviour: requires care
• Only small setups work in a single data-centre so adaptation here is also needed
• Cluster setups do not scale easily to 10 or more servers


What options are available?

• Standard MySQL, MariaDB, Amazon RDS, Google Cloud SQL, …
  • Read scale-out
  • Asynchronous replication
  • Semi-sync helps improve performance and ensure data is "somewhere else" when acknowledging a transaction
• If you are out of the cloud then: different setups
  • SBR (statement-based) or RBR (row-based) replication?
  • No GTID, Oracle or MariaDB GTID?
  • Optional semi-sync?
• If you are out of the cloud then: do it yourself
  • MHA
  • MariaDB Replication Manager
  • Orchestrator


Orchestrator


• Handles master failover, but more…
• GUI to manage and visualise topology: very handy
• CLI to do the same things: good for scripting
• API calls to run at a distance (more generic interface)
• Needs a DB backend to store state
  • Normally MySQL but can be SQLite
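
As a rough idea of what "API calls at a distance" can look like from a script, here is a minimal sketch. The host name, the port and the `/api/<command>/<args>` path layout are assumptions for illustration; check the orchestrator documentation for the endpoints your version actually exposes.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: placeholder address for your own orchestrator service.
DEFAULT_BASE = "http://orchestrator.example.com:3000"


def api_url(base: str, command: str, *args: object) -> str:
    """Build an API URL of the assumed form <base>/api/<command>/<arg>/..."""
    parts = [quote(str(part), safe="") for part in (command, *args)]
    return base.rstrip("/") + "/api/" + "/".join(parts)


def call_api(base: str, command: str, *args: object, timeout: float = 5.0):
    """Issue a GET request and decode the JSON reply (not run in this sketch)."""
    with urlopen(api_url(base, command, *args), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; nothing is sent over the network here.
    print(api_url(DEFAULT_BASE, "discover", "db1.example.com", 3306))
```

Keeping URL construction separate from the request makes the script easy to test without a running service.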


Orchestrator

What failures does it handle?
• Master failures: needs to talk to external systems
• Intermediate master failures: can handle on its own
• Does not care about slaves or applications
• Works with GTID: Oracle or MariaDB
• Works without using GTID: can add Pseudo-GTID (events injected on the master are used to find a match) so no need to migrate to GTID if not wanted
• Handles multi-level topologies
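
The Pseudo-GTID idea can be pictured with a toy model. This is not orchestrator's implementation: binary logs are modelled as plain lists of strings, and the `pseudo-gtid:` marker prefix and the function names are hypothetical. The point is only that a unique marker injected on the master appears in every server's log, so it can be used to line two logs up even when their positions differ.

```python
from typing import List, Optional

MARKER_PREFIX = "pseudo-gtid:"  # hypothetical marker format


def last_marker_index(log: List[str]) -> Optional[int]:
    """Index of the most recent injected marker in a log, if any."""
    for index in range(len(log) - 1, -1, -1):
        if log[index].startswith(MARKER_PREFIX):
            return index
    return None


def matching_position(replica_log: List[str], target_log: List[str]) -> Optional[int]:
    """Index in target_log of the first event the replica has not yet applied.

    Returns None when no marker is found in both logs, or when the events
    after the marker do not line up (for example the target is behind).
    """
    marker_index = last_marker_index(replica_log)
    if marker_index is None:
        return None
    marker = replica_log[marker_index]
    try:
        target_index = target_log.index(marker)
    except ValueError:
        return None
    tail = replica_log[marker_index:]
    if target_log[target_index:target_index + len(tail)] != tail:
        return None
    return target_index + len(tail)


if __name__ == "__main__":
    target = ["pseudo-gtid:a", "insert 1", "insert 2",
              "pseudo-gtid:b", "insert 3", "insert 4"]
    replica = ["insert 2", "pseudo-gtid:b", "insert 3"]  # different offsets
    print(matching_position(replica, target))
```

In the example the replica's log starts at a different offset from the target's, yet the shared marker still identifies where the replica should resume.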


Orchestrator GUI


Orchestrator CLI

Over 100 commands you can use, e.g.
• relocate
• discover
• begin-downtime, end-downtime
• topology


Failure Notifications

• Using the hooks, orchestrator can talk to Jabber or email to advise of the actions taken
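
A hook of this kind can be as small as a script that turns the failover details into a message. The sketch below only builds the email; the script's arguments, the addresses and the message wording are all hypothetical, and how orchestrator passes the details to your hook depends on how you configure it.

```python
import sys
from email.message import EmailMessage


def build_failover_email(failure_type: str, failed_host: str,
                         successor_host: str, sender: str,
                         recipient: str) -> EmailMessage:
    """Compose a notification describing the action that was taken."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[orchestrator] {failure_type} on {failed_host}"
    message.set_content(
        f"Failure type: {failure_type}\n"
        f"Failed host: {failed_host}\n"
        f"Promoted host: {successor_host}\n"
    )
    return message


if __name__ == "__main__":
    # Hypothetical invocation: notify.py <failure_type> <failed_host> <successor_host>
    if len(sys.argv) != 4:
        sys.exit("usage: notify.py <failure_type> <failed_host> <successor_host>")
    email = build_failover_email(sys.argv[1], sys.argv[2], sys.argv[3],
                                 "orchestrator@example.com", "dba@example.com")
    # Printing instead of sending; smtplib.SMTP(...).send_message(email)
    # would deliver it through a mail server of your choice.
    print(email)
```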


Failure Auditing


Orchestrator Setup

• Source at github.com/github/orchestrator
• Binaries written in Go
• Daemon runs web service and discovery, client on each MySQL server
• State stored in MySQL/SQLite
• Single JSON configuration file: /etc/orchestrator.conf.json
  • How to reach backend database (stores state)
  • How to recognise delay
  • Most defaults are good to get you going
  • Which systems you want to trigger recovery on
  • Hooks to handle recovery (talking to external systems)
• If you need help please ask
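
To make the configuration items above more concrete, here is a hedged sketch that generates a small JSON fragment. The key names are given as I recall them from orchestrator's sample configuration and should be checked against the version you run; the host names, the cluster filter and the hook command are hypothetical.

```python
import json
from typing import Dict, List


def build_config(backend_host: str, backend_port: int,
                 recover_clusters: List[str],
                 post_failover_hooks: List[str]) -> Dict[str, object]:
    """Return a minimal configuration dictionary (assumed key names)."""
    return {
        # How to reach the backend database that stores state.
        "MySQLOrchestratorHost": backend_host,
        "MySQLOrchestratorPort": backend_port,
        "MySQLOrchestratorDatabase": "orchestrator",
        # How to recognise delay; left empty here, set a query if you
        # measure lag with your own heartbeat mechanism.
        "ReplicationLagQuery": "",
        # Which clusters recovery should be triggered on.
        "RecoverMasterClusterFilters": list(recover_clusters),
        "RecoverIntermediateMasterClusterFilters": list(recover_clusters),
        # Hooks that talk to external systems after a failover.
        "PostFailoverProcesses": list(post_failover_hooks),
    }


if __name__ == "__main__":
    config = build_config(
        "orchestrator-db.example.com", 3306,
        ["production-cluster"],                   # hypothetical filter
        ["/usr/local/bin/notify-failover"],       # hypothetical hook script
    )
    print(json.dumps(config, indent=2))
```

The output could be written to /etc/orchestrator.conf.json once the values match your environment.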


Orchestrator Characteristics

• Discover one server in your cluster and orchestrator will find the others
• Detects new servers in the cluster automatically
• Notifies you of problems seen
• Recovery is optional (per cluster)
• Optional selection of candidate masters or servers to blacklist
• Global ON/OFF switch: handy if several failures happen at once
• For paranoid DBAs, so far orchestrator has always done the right thing


Orchestrator HA?

Orchestrator can be run in HA mode
• Multiple daemons will co-operate so if one fails another one takes over (they share the database backend)
• Use a load balancer to provide an HA GUI service
• Use nginx (or similar) for authentication and TLS if needed
• Upgrades are easier
• Replicate the orchestrator MySQL backend to not lose data


Does it Scale?

Yes
• Booking.com has a large installation with a single cluster monitoring thousands of MySQL servers
• Recommended by YouTube for managing Vitess servers
• Quite a number of other users but they are not very visible


Future work

• Simplify configuration and setup so more people can use it
• Improve scalability
• Make it work on Amazon RDS
• Spread the word…


Further help needed?

• github.com/github/orchestrator
  • for Issues (Problems/Questions) and Pull Requests (patches)
• Google Group: Orchestrator MySQL
  • https://groups.google.com/forum/#!forum/orchestrator-mysql
• Feel free to contact me and I will try to help provide pointers


Oh, and Booking.com is hiring!

• Almost any role:
  • MySQL Engineer / DBA
  • System Administrator
  • System Engineer
  • Site Reliability Engineer
  • Developer
  • Designer
  • Technical Team Lead
  • Product Owner
  • Data Scientist
  • And many more…
• https://workingatbooking.com/


Questions?
