Orchestrator on Raft: internals, benefits and considerations

Shlomi Noach, GitHub

FOSDEM 2018

About me

• @github/database-infrastructure

• Author of orchestrator, gh-ost, freno, ccql and others.

• Blog at http://openark.org

• @ShlomiNoach

Agenda

• Raft overview

• Why orchestrator/raft

• orchestrator/raft implementation and nuances

• HA, fencing

• Service discovery

• Considerations

Raft

• Consensus algorithm

• Quorum-based

• In-order replication log

• Log delivery; followers may lag

• Snapshots!

HashiCorp raft

• golang raft implementation

• Used by Consul

• Recently hit 1.0.0

• github.com/hashicorp/raft
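
To make the terms above concrete: hashicorp/raft hands committed log entries, in order, to an application-defined raft.FSM, and uses snapshots both for log compaction and for catching up nodes that fell too far behind. A minimal sketch against that interface (the key/value state and command format are illustrative, not orchestrator's actual types):

```go
package example

import (
	"encoding/json"
	"io"
	"sync"

	"github.com/hashicorp/raft"
)

// kvFSM is a toy finite state machine replicated via raft.
// hashicorp/raft invokes Apply for every committed log entry,
// strictly in log order.
type kvFSM struct {
	mu   sync.Mutex
	data map[string]string
}

var _ raft.FSM = (*kvFSM)(nil)

func newKVFSM() *kvFSM { return &kvFSM{data: map[string]string{}} }

// Apply runs once a log entry has been acknowledged by a quorum.
func (f *kvFSM) Apply(l *raft.Log) interface{} {
	var cmd struct{ Key, Value string } // illustrative command format
	if err := json.Unmarshal(l.Data, &cmd); err != nil {
		return err
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.data[cmd.Key] = cmd.Value
	return nil
}

// Snapshot supports log compaction: raft persists the FSM state
// so that older log entries can be discarded.
func (f *kvFSM) Snapshot() (raft.FSMSnapshot, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	copied := make(map[string]string, len(f.data))
	for k, v := range f.data {
		copied[k] = v
	}
	return &kvSnapshot{data: copied}, nil
}

// Restore rebuilds the FSM from a snapshot, e.g. on a node that
// rejoins after falling behind the retained portion of the log.
func (f *kvFSM) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	f.mu.Lock()
	defer f.mu.Unlock()
	f.data = map[string]string{}
	return json.NewDecoder(rc).Decode(&f.data)
}

type kvSnapshot struct{ data map[string]string }

func (s *kvSnapshot) Persist(sink raft.SnapshotSink) error {
	if err := json.NewEncoder(sink).Encode(s.data); err != nil {
		sink.Cancel()
		return err
	}
	return sink.Close()
}

func (s *kvSnapshot) Release() {}
```

orchestrator's own FSM sits behind this same interface; this is also the machinery behind the rejoin scenario later in the talk, where a returning node restores from a snapshot and then replays the remaining log.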

orchestrator

• MySQL high availability solution and replication topology manager

• Developed at GitHub

• Apache 2 license

• github.com/github/orchestrator

"

"

"

" ""

"

" ""

"

" ""

"

""

Page 7: Orchestrator on Ra : internals, benefits and considerations · Joining ra! cluster o2 recovers from raft snapshot, acquires raft log from an active node, rejoins the ... • Eventual

Why orchestrator/raft

• Remove MySQL backend dependency

• DC fencing

And then good things happened that we hadn't planned for:

• Better cross-DC deployments

• DC-local KV control

• Kubernetes friendly

orchestrator/raft

• n orchestrator nodes form a raft cluster

• Each node has its own, dedicated backend database (MySQL or SQLite)

• All nodes probe the topologies

• All nodes run failure detection

• Only the leader runs failure recoveries
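
A sketch of how this division of labor can be expressed with hashicorp/raft: every node applies the shared log and probes topologies, but recovery actions are gated on raft leadership (runRecoveries is a hypothetical stand-in for orchestrator's recovery logic):

```go
package example

import (
	"time"

	"github.com/hashicorp/raft"
)

// superviseRecoveries runs on every orchestrator node: all nodes
// probe topologies and detect failures, but only the node that is
// currently the raft leader initiates recoveries.
func superviseRecoveries(r *raft.Raft, runRecoveries func()) {
	for range time.Tick(time.Second) {
		if r.State() == raft.Leader {
			runRecoveries() // hypothetical stand-in: fail over masters, etc.
		}
	}
}
```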

"

"

"

" ""

"

" ""

"

" ""

"

""

Page 9: Orchestrator on Ra : internals, benefits and considerations · Joining ra! cluster o2 recovers from raft snapshot, acquires raft log from an active node, rejoins the ... • Eventual

Implementation & deployment @ GitHub

• One node per DC

• 1-second raft polling interval

• step-down

• raft-yield

• SQLite-backed log store

• MySQL backend (SQLite backend use case in the works)

"

"

"

"

"

"

DC1

DC2

DC3
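
A deployment like the above is driven by configuration. A hedged sketch of the relevant orchestrator settings, using the raft parameter names from orchestrator's documentation (the hostnames are illustrative):

```json
{
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "<ip-or-fqdn-of-this-node>",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["orchestrator-dc1", "orchestrator-dc2", "orchestrator-dc3"],
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator"
}
```

For a SQLite backend one would instead set "BackendDB": "sqlite" together with "SQLite3DataFile".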

A high availability scenario

o2 is leader of a 3-node orchestrator/raft setup

"

"

" ""

"" ""

"""

o1

o2

o3

Injecting failure

master: killall -9 mysqld

o2 detects failure. About to recover, but…

"

"

" ""

"" ""

"""

o1

o2

o3

Injecting 2nd failure

o2: DROP DATABASE orchestrator;

o2 freaks out. 5 seconds later it steps down

"

"

" ""

"" ""

"""

o1

o2

o3
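
That step-down is orchestrator's self-health testing at work: a node that cannot use its own backend database must not remain leader. A conceptual sketch, assuming hypothetical checkBackend and stepDown hooks (orchestrator's real mechanism differs in detail):

```go
package example

import "time"

// superviseSelfHealth demotes this node if its backend database
// becomes unusable, as when the orchestrator schema is dropped.
// checkBackend and stepDown are hypothetical stand-ins.
func superviseSelfHealth(isLeader func() bool, checkBackend func() error, stepDown func()) {
	failingSince := time.Time{}
	for range time.Tick(time.Second) {
		if err := checkBackend(); err == nil {
			failingSince = time.Time{} // healthy again; reset
			continue
		}
		if failingSince.IsZero() {
			failingSince = time.Now()
		}
		// After ~5 seconds of failed health checks, a leader steps down.
		if isLeader() && time.Since(failingSince) > 5*time.Second {
			stepDown()
		}
	}
}
```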

orchestrator recovery

o1 grabs leadership

"

"

" ""

"" ""

"""

o1

o2

o3

MySQL recovery

o1 detected failure even before stepping up as leader.

o1, now leader, kicks off recovery and fails over the MySQL master

"

"

" ""

"

"

"

"""

o1

o3

o2

orchestrator self-health tests

Meanwhile, o2 panics and bails out.

"

"

" ""

"

"

"

"""

o1

o3

o2

puppet

Some time later, puppet kicks the orchestrator service back up on o2.

"

"

" ""

"

"

"

"""

o1

o3

o2

orchestrator startup

The orchestrator service on o2 bootstraps and creates the orchestrator schema and tables.

"

"

" ""

"

"

"

"""

o1

o3

o2

Joining raft cluster

o2 recovers from raft snapshot, acquires raft log from an active node, rejoins the group

"

"

" ""

"

"

"

"""

o1

o3

o2

Grabbing leadership

Some time later, o2 grabs leadership

"

"

" ""

"

"

"

"""

o1

o3

o2

DC fencing

• Assume this 3-DC setup

• One orchestrator node in each DC

• Master and a few replicas in DC2

• What happens if DC2 gets network partitioned?

• i.e. no network in or out of DC2

"

"

" ""

"" ""

"""

DC1

DC2

DC3

DC fencing

• From the point of view of DC2's servers, and in particular of DC2's orchestrator node:

• Master and replicas are fine.

• DC1 and DC3 servers are all dead.

• No need for failover.

• However, DC2’s orchestrator is not part of a quorum, hence not the leader. It doesn’t call the shots.

"

"

" ""

"" ""

"""

DC1

DC2

DC3

DC fencing

• In the eyes of either DC1’s or DC3’s orchestrator:

• All DC2 servers, including the master, are dead.

• There is a need for failover.

• DC1’s and DC3’s orchestrator nodes form a quorum. One of them will become the leader.

• The leader will initiate failover.

"

"

" ""

"" ""

"""

DC1

DC2

DC3
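
The arithmetic behind the fencing: an n-node raft group needs a majority of ⌊n/2⌋+1 nodes to elect and sustain a leader. A trivial illustration:

```go
package example

// quorum returns the majority size required by an n-node raft group.
func quorum(n int) int { return n/2 + 1 }

// With one orchestrator node per DC, quorum(3) == 2.
// A partitioned DC2 sees only 1 node and cannot lead;
// DC1 and DC3 together hold 2 nodes and elect the leader.
```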

DC fencing

• A potential failover result is depicted: the new master is in DC3.

"

"

"""

"

"

"

"

"""

DC1

DC2

DC3

orchestrator/raft & consul

• orchestrator is Consul-aware

• Upon failover, orchestrator updates Consul KV with the identity of the promoted master

• Consul @ GitHub is DC-local; no replication between Consul setups

• orchestrator nodes update Consul locally in each DC
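
A hedged sketch of the Consul-related settings, using parameter names from orchestrator's documentation (the address is illustrative):

```json
{
  "ConsulAddress": "127.0.0.1:8500",
  "KVClusterMasterPrefix": "mysql/master"
}
```

After failing over a cluster named, say, mycluster, one would expect orchestrator to write the promoted master's host:port under mysql/master/mycluster, with sub-keys such as hostname and port. Since each Consul setup is DC-local, each orchestrator node applies these writes to its own DC's Consul.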

Considerations: things to watch out for

• Eventual consistency is not always your best friend

• What happens if, upon replay of the raft log, you hit two failovers for the same cluster?

• NOW() and other time-based assumptions

• Reapplying snapshot/log upon startup
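
These bullets interact: a raft log entry may be applied long after it was appended, e.g. when reapplying the snapshot and log at startup, so NOW() inside the apply path yields replay time rather than append time, and the same failover may be seen twice. A hedged sketch of the defensive pattern (the command type and field names are illustrative, not orchestrator's):

```go
package example

import "time"

// FailoverCommand is an illustrative raft log payload. The
// timestamp travels inside the command, so replaying the log
// later reproduces the original time instead of time.Now().
type FailoverCommand struct {
	ClusterAlias string
	FailedMaster string
	IssuedAt     time.Time
}

// applyFailover must be idempotent: replaying the snapshot and log
// on startup can deliver two failovers for the same cluster, and
// only an entry newer than what was already handled should act.
func applyFailover(cmd FailoverCommand, lastHandled map[string]time.Time) bool {
	if !cmd.IssuedAt.After(lastHandled[cmd.ClusterAlias]) {
		return false // stale entry seen during replay; skip
	}
	lastHandled[cmd.ClusterAlias] = cmd.IssuedAt
	return true
}
```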

orchestrator/raft roadmap

• Kubernetes

• ClusterIP-based configuration in progress

• Already container-friendly via auto-reprovisioning of nodes through raft

Thank you!

Questions?

github.com/shlomi-noach · @ShlomiNoach