Zookeeper big sonata

Introduction to Zookeeper Anh Le @BigSonata


Transcript of Zookeeper big sonata

Page 1: Zookeeper  big sonata

Introduction to Zookeeper

Anh Le @BigSonata

Page 2: Zookeeper  big sonata

What is a Distributed System?

A distributed system consists of multiple computers that communicate through a computer network and interact with each other to achieve a common goal.

-Wikipedia

Page 3: Zookeeper  big sonata

Coordination in a Distributed System?

Coordination: an act that multiple nodes must perform together.

Examples:
- Leader election
- Managing group membership
- Managing metadata
- Synchronization (semaphore, mutex...)

Page 4: Zookeeper  big sonata

Coordination in a Distributed System?

To coordinate, processes can:
- Exchange messages over the network
- Read/write using shared storage
- Use distributed locks

Problems when exchanging messages:
- Message delays
- Processor speed
- Clock drift

Page 5: Zookeeper  big sonata

Use case for Master-Worker Applications

Problems:
- Master crashes
- Worker crashes
- Communication failures

Page 6: Zookeeper  big sonata

Use case for Master-Worker Applications

Problems when the master crashes:
- Use a backup master
- How does the backup recover the latest state?
- The backup master may suspect that the primary master has crashed when it is still running

→ Split-brain scenario

Page 7: Zookeeper  big sonata

Use case for Master-Worker Applications

Problems when a worker crashes:
- The master must detect worker crashes
- Assigned tasks must be recovered

Problems with communication failures:
- The same task must be executed only once

Page 8: Zookeeper  big sonata

Introduction to ZooKeeper

An open source, performant coordination service for distributed applications

Was a subproject of Hadoop but is now an Apache top-level project

Exposes common services through a simple interface:
- Leader election
- Naming
- Configuration management
- Locks & synchronization
- Group service

→ You don't have to write them from scratch

Page 9: Zookeeper  big sonata

ZooKeeper Use cases

Distributed cluster management:
- Node join/leave
- Node statuses in real time

Distributed synchronization:
- Locks
- Barriers
- Queues

Page 10: Zookeeper  big sonata

ZooKeeper Use cases

Apache HBase uses ZooKeeper to:
- Elect a cluster master
- Keep track of available servers
- Keep cluster metadata

Apache Kafka uses ZooKeeper to:
- Detect crashes
- Implement topic discovery
- Maintain state for topics

Page 11: Zookeeper  big sonata

ZooKeeper Guarantees

- Sequential Consistency: Updates from a client are applied in the order they were sent.
- Atomicity: Updates either succeed or fail; there are no partial results.
- Single System Image: A client sees the same view of the service regardless of the ZK server it connects to.
- Reliability: Once applied, an update persists until a client overwrites it.
- Timeliness: The client's view of the system is guaranteed to be up to date within a certain time bound (eventual consistency).

Page 12: Zookeeper  big sonata

ZooKeeper Services

- All machines store a copy of the data (in memory).
- A leader is elected on service startup.
- Each client connects to a single server and maintains a TCP connection to it.
- Clients can read from any server; writes go through the leader and need majority consensus.
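The majority requirement for writes comes down to simple arithmetic. The sketch below is not ZooKeeper code; `quorum_size` and `tolerated_failures` are hypothetical helper names used only for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Print ensemble size, quorum size, and tolerated failures side by side.
for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this arithmetic, an ensemble of 4 tolerates no more failures than an ensemble of 3, which is one reason odd ensemble sizes are the usual choice.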

Page 13: Zookeeper  big sonata

ZooKeeper Data Model

ZooKeeper has a hierarchical namespace.
- Each node is called a ZNode.
- Every ZNode has data (given as byte[]).
- ZNode paths are canonical, absolute, and slash-separated, with no relative references.
- Names can contain Unicode characters.
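The path rules above can be illustrated with a small validator. This is a simplified sketch, assuming only the rules listed on this slide plus the ban on empty names and the null character; the real server rejects a few more characters. `is_valid_znode_path` is a hypothetical name, not part of any ZooKeeper API.

```python
def is_valid_znode_path(path: str) -> bool:
    """Check a path against a simplified subset of the znode path rules."""
    if not path.startswith("/"):
        return False                      # paths must be absolute
    if path == "/":
        return True                       # the root is always valid
    if path.endswith("/"):
        return False                      # no trailing slash
    for name in path[1:].split("/"):
        if name in ("", ".", ".."):
            return False                  # no empty names or relative references
        if "\x00" in name:
            return False                  # the null character is not allowed
    return True


print(is_valid_znode_path("/app/workers/w-1"))
print(is_valid_znode_path("/app/../secrets"))
```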

Page 14: Zookeeper  big sonata

ZNode

Each ZNode maintains a stat structure with version numbers for data changes and ACL changes, plus timestamps.

Version numbers increase with each change

Data is read and written in its entirety

Page 15: Zookeeper  big sonata

ZNode types

- Persistent nodes: exist until explicitly deleted.
- Ephemeral nodes: exist as long as the session is active; can't have children.
- Sequence nodes (unique naming): append a monotonically increasing counter to the end of the path; applies to both persistent and ephemeral nodes.
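The three node types can be modelled with a toy in-memory tree. This is a minimal sketch, not the ZooKeeper client API: `TinyTree`, its method names, and the session ids are all hypothetical. The 10-digit zero-padded suffix mirrors how ZooKeeper formats sequence numbers.

```python
class TinyTree:
    """Toy in-memory stand-in for a znode tree (illustration only)."""

    def __init__(self):
        self.nodes = {"/": None}   # path -> owning session id (None = persistent)
        self.counters = {}         # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent] is not None:
            raise ValueError("ephemeral nodes cannot have children")
        if sequential:
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n:010d}"       # append the per-parent counter
        if path in self.nodes:
            raise ValueError(f"{path} already exists")
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral nodes disappear together with the session that owns them.
        for path in [p for p, owner in self.nodes.items() if owner == session]:
            del self.nodes[path]


tree = TinyTree()
tree.create("/election")                                   # persistent
a = tree.create("/election/n-", session="s1", ephemeral=True, sequential=True)
b = tree.create("/election/n-", session="s2", ephemeral=True, sequential=True)
tree.close_session("s1")                                   # removes only a
print(a, b, sorted(tree.nodes))
```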

Page 16: Zookeeper  big sonata

ZNode watches

Clients can set watches on znodes. Event types:
- NodeChildrenChanged
- NodeCreated
- NodeDataChanged
- NodeDeleted

Changes to a znode trigger the watch and ZooKeeper sends the client a notification.

- Watches are one-time triggers.
- Watches are always ordered.
- The client sees the watch event before it sees the new ZNode data.
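The one-time nature of watches is easy to miss, so here is a toy model of it. This is a sketch under stated assumptions: `WatchedNode` is a hypothetical in-memory class, not the ZooKeeper client, and watchers are plain callables that receive the event name.

```python
class WatchedNode:
    """Toy znode whose watches fire once and are then discarded."""

    def __init__(self, data: bytes):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)   # register a one-time watch
        return self.data

    def set_data(self, data: bytes):
        self.data = data
        fired, self._watchers = self._watchers, []   # clear before notifying
        for notify in fired:
            notify("NodeDataChanged")


events = []
node = WatchedNode(b"v1")
node.get_data(watcher=events.append)
node.set_data(b"v2")   # fires the watch
node.set_data(b"v3")   # no watch is registered any more, so nothing fires

seen = []
other = WatchedNode(b"")


def rewatch(event):
    # Re-register while reading, so the next change is also observed.
    seen.append((event, other.get_data(watcher=rewatch)))


other.get_data(watcher=rewatch)
other.set_data(b"a")
other.set_data(b"b")
print(events, seen)
```

The second half shows the usual pattern for continuous observation: the callback sets a new watch each time it reads the data.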

Page 17: Zookeeper  big sonata

ZNode APIs

String create(path, data, acl, flags)
void delete(path, expectedVersion)
Stat setData(path, data, expectedVersion)
(data, Stat) getData(path, watch)
Stat exists(path, watch)
String[] getChildren(path, watch)

→ Each API also has an asynchronous version
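The `expectedVersion` argument of `setData` and `delete` enables optimistic concurrency: a write succeeds only if the node has not changed since the client read it. The sketch below models that behaviour in memory. `VersionedNode` and `BadVersionError` are hypothetical names for this illustration (the Java client reports the same condition through `KeeperException.BadVersionException`), and, as in ZooKeeper, a version of -1 skips the check.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match (hypothetical name)."""


class VersionedNode:
    """Toy znode that keeps a data version, as the stat structure does."""

    def __init__(self, data: bytes):
        self.data = data
        self.version = 0

    def set_data(self, data: bytes, expected_version: int) -> int:
        # -1 means "do not check the version".
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, node is at {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


node = VersionedNode(b"10")
version_seen_by_a = version_seen_by_b = node.version   # both clients read version 0

node.set_data(b"11", version_seen_by_a)                # client A wins
try:
    node.set_data(b"12", version_seen_by_b)            # client B holds a stale version
    conflict = False
except BadVersionError:
    conflict = True                                    # B must re-read and retry

print(conflict, node.data, node.version)
```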

Page 18: Zookeeper  big sonata

ZooKeeper Recipes

Page 19: Zookeeper  big sonata

Recipe: Leader Election

/master

Page 20: Zookeeper  big sonata

Recipe: Leader Election

Continuous watching of a znode requires resetting the watch after every event/trigger

Too many watches on a single znode create the “herd effect”, causing bursts of traffic and limiting scalability

Page 21: Zookeeper  big sonata

Recipe: Leader Election (Improved)

1. All participants create an ephemeral-sequential node on the same election path.

2. The node with the smallest sequence number is the leader.

3. Each “follower” node watches the node with the next lower sequence number.

4. Upon leader removal, go to the election path and find the new leader, or become the leader if you have the lowest sequence number.

5. Upon session expiration, check the election state and go to election if needed.
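The steps above can be sketched as a pure function over the children of the election path. This is a minimal illustration, not the recipe's actual implementation: `sequence_of` and `election_state` are hypothetical names, and the child names assume the 10-digit sequence suffix that ZooKeeper appends to sequential nodes.

```python
def sequence_of(name: str) -> int:
    """Extract the 10-digit sequence suffix from a sequential node name."""
    return int(name[-10:])


def election_state(children):
    """Return (leader, watches), where watches maps each follower to the
    node with the next lower sequence number."""
    if not children:
        raise ValueError("no participants on the election path")
    ordered = sorted(children, key=sequence_of)
    leader = ordered[0]
    watches = {ordered[i]: ordered[i - 1] for i in range(1, len(ordered))}
    return leader, watches


children = ["n-0000000007", "n-0000000003", "n-0000000005"]
leader, watches = election_state(children)
print(leader, watches)

# The leader's ephemeral node disappears; only its direct successor was
# watching it, so a single participant is notified.
children.remove(leader)
new_leader, new_watches = election_state(children)
print(new_leader, new_watches)
```

Because each participant watches exactly one predecessor, a leader failure wakes one client rather than all of them, which is how this variant avoids the herd effect.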

Page 22: Zookeeper  big sonata

Zookeeper Programming

https://github.com/anhldbk/Zookeeper-Demo

Page 23: Zookeeper  big sonata

Zookeeper Programming

Page 24: Zookeeper  big sonata

Zookeeper Programming

Page 25: Zookeeper  big sonata

Zookeeper Programming

The ZooKeeper APIs are difficult to use. Connection issues:

- Initial connection: requires a handshake before any operation (create(), delete()...) can be executed.

- Session expiration: clients are expected to watch for this state, then close and re-create the ZooKeeper instance.

Page 26: Zookeeper  big sonata

Zookeeper Programming

The ZooKeeper APIs are difficult to use. Recoverable errors:

- When creating a sequential ZNode, the server may successfully create the ZNode but crash before returning the node name to the client, so the client cannot tell whether the create succeeded.

- The ZooKeeper client throws several recoverable exceptions. Users are expected to catch these exceptions and retry the operation.
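A catch-and-retry loop for recoverable errors might look like the sketch below. It is an illustration only: `RecoverableError` stands in for the client's recoverable exceptions, and `with_retries`, its parameters, and the exponential backoff schedule are assumptions, not something the slides prescribe.

```python
import time


class RecoverableError(Exception):
    """Stand-in for a recoverable client error such as connection loss."""


def with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying on RecoverableError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RecoverableError:
            if attempt == max_attempts:
                raise                                   # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))      # 0.5 s, 1 s, 2 s, ...


calls = {"n": 0}


def flaky_create():
    # Hypothetical operation that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RecoverableError("connection lost")
    return "/tasks/task-0000000000"


delays = []
result = with_retries(flaky_create, sleep=delays.append)   # record delays instead of sleeping
print(result, calls["n"], delays)
```

Blind retries are only safe for idempotent operations. For a sequential create, the case described above means a retry could leave two nodes behind, so the client would first need a way to check whether its earlier attempt already succeeded.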

Page 27: Zookeeper  big sonata

Zookeeper Programming

The ZooKeeper APIs are difficult to use. Recipes:

- The standard ZooKeeper "recipes" (locks, leaders, etc.) are only minimally described and subtly difficult to write correctly.

Page 28: Zookeeper  big sonata

Zookeeper Programming with Curator

Curator: the Netflix ZooKeeper library

Page 29: Zookeeper  big sonata

Zookeeper Programming with Curator

Page 30: Zookeeper  big sonata

Zookeeper Programming with Curator

Page 31: Zookeeper  big sonata

Zookeeper for Our Systems

Page 32: Zookeeper  big sonata