Zookeeper at the bigdata roundtable

Zookeeper: What it actually is. Tobias Schlottke, November 2011. Friday, 2 December 2011

description

Tobias Schlottke's presentation on Zookeeper at the first bigdata roundtable in Hamburg.

Transcript of Zookeeper at the bigdata roundtable

Page 1: Zookeeper at the bigdata roundtable

Zookeeper: What it actually is.

Tobias Schlottke, November 2011

Page 3

What it is

„ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.“

Page 4

WTF?

Page 5

In simple words

„A service which helps to coordinate distributed systems.“

Page 6

Invented by

Page 8

More details please!

• Naming nodes

• Configuring nodes

• Synchronizing nodes

• Organizing groups of nodes

• Heartbeat

• Democracy / Leader election
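The leader-election item is usually built with ZooKeeper's documented recipe: each candidate creates an ephemeral, sequential node under a shared parent, the candidate holding the lowest sequence number becomes leader, and every other candidate watches only its immediate predecessor. The sketch below simulates only that decision rule in plain Python; the `elect` helper and the `candidate-` node names are hypothetical, and no ZooKeeper client is involved.

```python
def elect(children):
    """Return (leader, watches) for the child names of an election node.

    Hypothetical helper: each name is assumed to end in ZooKeeper's
    10-digit sequence suffix, e.g. "candidate-0000000003".
    """
    # Order by the numeric sequence suffix, not by the whole name.
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    leader = ordered[0]
    # Each non-leader watches only the candidate directly before it.
    watches = {ordered[i]: ordered[i - 1] for i in range(1, len(ordered))}
    return leader, watches


nodes = ["candidate-0000000007", "candidate-0000000003", "candidate-0000000005"]
leader, watches = elect(nodes)
print(leader)  # candidate-0000000003
print(watches)

# The leader's session ends, so its ephemeral node disappears.
# Only candidate-0000000005 was watching it, and it re-runs the check.
nodes.remove(leader)
print(elect(nodes)[0])  # candidate-0000000005
```

Watching only the predecessor means one failure wakes one client instead of every candidate at once.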

Page 9

But how to organize this stuff?

Page 10

Filesystem Schema

application1
    application1/service1
        application1/service1/node0001
        application1/service1/node0002
    application1/servicen
applicationn
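Unlike `mkdir -p`, ZooKeeper's create call does not create missing parents: creating `node0001` fails unless `application1` and `application1/service1` already exist. A small helper that lists the ancestors a client would have to create first, shallowest first, might look like this. It is a hypothetical sketch in plain Python and assumes ZooKeeper's absolute path style with a leading slash, which the slide omits.

```python
def parents_to_create(path):
    """Return every ancestor of an absolute ZooKeeper-style path, shallowest first.

    Hypothetical helper: ZooKeeper's create() fails if the parent node is
    missing, so these ancestors have to exist before `path` can be created.
    """
    if not path.startswith("/") or path == "/":
        raise ValueError("expected an absolute, non-root path")
    parts = path.strip("/").split("/")
    # Build "/a", "/a/b", ... and stop before the node itself.
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


print(parents_to_create("/application1/service1/node0001"))
# ['/application1', '/application1/service1']
```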

Page 11

„Filesystem“ Features

• Organized as a tree of nodes, e.g. app1/service1

• „Ephemeral“ nodes

• „Sequential“ nodes, numbered like 0123

• Notification system (watches)

• High availability

• Data format does not matter (node data is opaque bytes)
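To make the node-tree features concrete, here is a toy, single-process model of the tree in plain Python. It is not the ZooKeeper client API: the class `ZNodeTree`, its methods and the integer session ids are hypothetical, and the notification and availability features are not modelled. It mimics a few documented behaviours: a sequential node gets a 10-digit counter (kept by its parent) appended to its name, ephemeral nodes vanish when their owner's session ends, ephemeral nodes cannot have children, and node data is stored as opaque `bytes`.

```python
class ZNodeTree:
    """Toy in-memory model of ZooKeeper's node tree (hypothetical, not the real API)."""

    def __init__(self):
        # path -> (data, owning session for ephemeral nodes, None for persistent ones)
        self.nodes = {"/": (b"", None)}
        # parent path -> next sequence number (simplified per-parent counter)
        self.counters = {}

    def create(self, path, data=b"", ephemeral=False, sequential=False, session=None):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent][1] is not None:
            raise ValueError("ephemeral nodes cannot have children")
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        if sequential:
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = (bytes(data), session if ephemeral else None)
        return path  # actual name, including any sequence suffix

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.nodes if p.startswith(prefix))
        # Keep direct children only: non-empty names without a further "/".
        return sorted(n for n in names if n and "/" not in n)

    def close_session(self, session):
        # Ephemeral nodes live exactly as long as the session that created them.
        for p in [p for p, (_, owner) in self.nodes.items() if owner == session]:
            del self.nodes[p]


tree = ZNodeTree()
tree.create("/app1")
tree.create("/app1/service1")
a = tree.create("/app1/service1/node", ephemeral=True, sequential=True, session=1)
b = tree.create("/app1/service1/node", ephemeral=True, sequential=True, session=2)
print(a, b)  # /app1/service1/node0000000000 /app1/service1/node0000000001
tree.close_session(1)                   # client 1 disconnects or times out
print(tree.children("/app1/service1"))  # ['node0000000001']
```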

Page 12

Other things you get for free

• „Atomicity“

• „Consistency“

• „Reliability“

• „Timeliness“

• „Conditional updates“ (Versioning)
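Conditional updates behave like compare-and-set: every node carries a version number that each write increments, and a write can name the version it expects, failing if another client wrote in between. ZooKeeper's `setData` call takes such an expected version, with -1 meaning "any version". The sketch below models only that rule in plain Python; `VersionedNode`, `BadVersion` and `increment` are hypothetical names, and the retry loop is a common optimistic-concurrency pattern rather than something shown in the talk.

```python
class BadVersion(Exception):
    """Raised when the expected version no longer matches (hypothetical)."""


class VersionedNode:
    """Toy model of one node's data and version (not the real client API)."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def get(self):
        return self.data, self.version

    def set(self, data, expected_version=-1):
        # -1 means "don't care", mirroring ZooKeeper's setData convention.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersion(f"expected {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


def increment(node):
    """Optimistically add 1 to a counter stored as ASCII digits."""
    while True:
        data, version = node.get()
        try:
            return node.set(str(int(data or b"0") + 1).encode(), version)
        except BadVersion:
            continue  # another writer got there first: re-read and retry


counter = VersionedNode(b"41")
increment(counter)
print(counter.get())  # (b'42', 1)
```

In this single-threaded demo the retry never fires; with several concurrent writers, a stale version makes `set` fail instead of silently overwriting newer data.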

Page 13

The Zookeeper Service

ZK Server 1 (elected master)

ZK Server 2 (slave)

ZK Server N (slave)

Application 1 … Application N

Page 14

The Zookeeper Service

• All servers keep a full copy of the data in memory

• Master (leader) election at startup and whenever the leader fails

• All updates go to the leader

• A write is acknowledged once a majority of servers has persisted it
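The majority rule explains how many server failures an ensemble of N servers can ride out: a write needs a strict majority, n // 2 + 1 servers, so the remaining servers may fail without blocking writes. A quick calculation in plain Python (the helper names are only for illustration):

```python
def quorum(n):
    """Smallest number of servers forming a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers may fail while a majority is still reachable."""
    return n - quorum(n)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

Three and four servers both survive exactly one failure, which is why ensembles are usually sized at an odd number such as 3, 5 or 7.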

Page 15

Snares

• Data must fit in memory

• Data stored in each node should be smaller than 1 MB

• Client libraries are not all that good yet

• Ephemeral nodes are not allowed to have children

Page 16

Use case

„I am an adserver. I deliver 20,000 ads a second, and for this you will most likely design me as a distributed service.“

Page 17

Adserver use case

Delivery1 Delivery2 Delivery3 Delivery4

Database

Coordinator

ZK

Thrift
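The slide shows only the components, not the protocol between them. One plausible reading, offered purely as an assumption, is that each delivery server registers an ephemeral node in ZK and keeps its session alive with heartbeats, while the coordinator watches that group and so always knows which delivery servers are alive. The sketch simulates this with explicit clock values instead of a real ZooKeeper session; `Membership`, `heartbeat`, `expire` and the 10-second timeout are hypothetical, and real ZooKeeper is more involved (negotiated session timeouts, and standard watches that fire once and must be re-registered).

```python
class Membership:
    """Toy model of ephemeral-node group membership with session timeouts.

    Hypothetical: a member counts as alive while its last heartbeat is at
    most `timeout` seconds old. Expiring a member removes it and notifies
    every watcher, much like an ephemeral node disappearing.
    """

    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self.last_seen = {}  # member name -> time of last heartbeat
        self.watchers = []   # callbacks receiving the new sorted member list

    def watch(self, callback):
        self.watchers.append(callback)

    def _notify(self):
        members = sorted(self.last_seen)
        for callback in self.watchers:
            callback(members)

    def heartbeat(self, name, now):
        is_new = name not in self.last_seen
        self.last_seen[name] = now
        if is_new:
            self._notify()  # a new member (ephemeral node) appeared

    def expire(self, now):
        dead = [n for n, t in self.last_seen.items() if now - t > self.timeout]
        for name in dead:
            del self.last_seen[name]
        if dead:
            self._notify()  # one or more members disappeared


group = Membership(timeout=10.0)
group.watch(lambda members: print("coordinator sees:", members))
for i in range(1, 5):
    group.heartbeat(f"delivery{i}", now=0.0)
for i in (1, 2, 4):  # delivery3 stops sending heartbeats
    group.heartbeat(f"delivery{i}", now=8.0)
group.expire(now=15.0)  # delivery3 was last seen 15 s ago and is dropped
```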

Page 18

That's it! Pretty simple, huh?
