ZooKeeper - the unsung hero - Berlin...

42
user perspective internals ZooKeeper use(r)s Praise and Rant ZooKeeper the unsung hero Thomas Koch June 6, 2011 Thomas Koch ZooKeeper

Transcript of ZooKeeper - the unsung hero - Berlin...

Page 1: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

ZooKeeperthe unsung hero

Thomas Koch

June 6, 2011

Thomas Koch

ZooKeeper

Page 2: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

me

Thomas Kochhttp://www.koch.ro

finished music, physics5 years software developer: PHP (RIP!), Java, Hadoop, Search,Crawling, ERP, Groupwarecurrently: finishing computer science bachelor, ETA: Q1/2012tags: quality, ATTAC, FSFE, FIfF, social responsibility, Romania,Switzerland

Thomas Koch

ZooKeeper

Page 3: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

Outline

user perspective

internals

ZooKeeper use(r)s

Praise and Rant

Thomas Koch

ZooKeeper

Page 4: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

what does it look like?

I file system of zNodes

I zNode is a file (max. 1MB)

I zNode if a folder

I watches / notifications

I atomic operations

I optimistic locking (changes provide last seen version number)

Thomas Koch

ZooKeeper

Page 5: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

zNode flags

ephemeral: removed when client disappearssequential: append incremental number

create(”/my-znode-”,SEQUENTIAL) → /my-znode-00000001

create(”/my-znode-”,SEQUENTIAL) → /my-znode-00000002

Thomas Koch

ZooKeeper

Page 6: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

zNode flags

ephemeral: removed when client disappearssequential: append incremental number

create(”/my-znode-”,SEQUENTIAL) → /my-znode-00000001

create(”/my-znode-”,SEQUENTIAL) → /my-znode-00000002

Thomas Koch

ZooKeeper

Page 7: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

zNode flags

ephemeral: removed when client disappearssequential: append incremental number

create(”/my-znode-”,SEQUENTIAL) → /my-znode-00000001

create(”/my-znode-”,SEQUENTIAL) → /my-znode-00000002

Thomas Koch

ZooKeeper

Page 8: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

ZooKeeper architecture

Thomas Koch

ZooKeeper

Page 9: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

ZooKeeper guaranties

I Sequential Consistency

I Atomicity (A from ACID)

I Single System Image

I Reliability (Durability from ACID)

I Timeliness

I No Split Brain

Thomas Koch

ZooKeeper

Page 10: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

why?

I simple, powerful building blocks for distributed algorithms

I one place for coordination (instead of a jungle of cron jobs /daemons / lock files)

Thomas Koch

ZooKeeper

Page 11: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

standard algorithms

I distributed lock (with wait queue)

I leader election

I configuration store

I rendezvous, group membership

I (low scale) producer consumer queue

Thomas Koch

ZooKeeper

Page 12: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

Outline

user perspective

internals

ZooKeeper use(r)s

Praise and Rant

Thomas Koch

ZooKeeper

Page 13: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

ZAB - ZooKeeper Atomic Broadcast

two modes: recovery / broadcast

Thomas Koch

ZooKeeper

Page 14: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

ZAB: Broadcast mode

Thomas Koch

ZooKeeper

Page 15: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

ZAB: Recovery (Leader election)

I elect member with highest zxid

I votes from more then n2 servers

I increment epoch (zxid: 00000011︸ ︷︷ ︸epoch

: 00004213︸ ︷︷ ︸counter

)

I replay all not yet committed proposals

Thomas Koch

ZooKeeper

Page 16: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

Outline

user perspective

internals

ZooKeeper use(r)s

Praise and Rant

Thomas Koch

ZooKeeper

Page 17: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

bookkeeperI originally intended for the Hadoop NameNode write-ahead-logI high availability, and high throughputI ledgers readable only after close

Thomas Koch

ZooKeeper

Page 18: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

Hedwid (Yahoo), Kafka (LinkedIn)

publish-subscribe systemsHedwig:

I strong durability guarantees

I many topics

I C++

Kafka:

I many subscribers and publishers

I replay already consumed messages

I Scala

Thomas Koch

ZooKeeper

Page 19: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

Hedwid (Yahoo), Kafka (LinkedIn)

publish-subscribe systemsHedwig:

I strong durability guarantees

I many topics

I C++

Kafka:

I many subscribers and publishers

I replay already consumed messages

I Scala

Thomas Koch

ZooKeeper

Page 20: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

more users1 (free software only)

I Eclipse Communication Framework

I Katta (distributed Lucene indexes)

I HBase: master election, server lease management,bootstrapping

I Norbert, LinkedIn, partitioned routing, cluster management

I Mesos, cluster computing management platform

I Neo4j, graph database

I S4, Yahoo, stream processing

I Apache CXF distributed OSGi: discovery

I Apache Solr: configuration, leader election

I Hadoop MapReduce 2.0?

1cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredByThomas Koch

ZooKeeper

Page 21: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

interesting development

I ZOOKEEPER-892 Remote replication of Zookeeper data

I Extracting Zab from Zookeeper2 - Andre Oriani

2https://github.com/aoriani/zab

Thomas Koch

ZooKeeper

Page 22: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

Outline

user perspective

internals

ZooKeeper use(r)s

Praise and Rant

Thomas Koch

ZooKeeper

Page 23: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

Praise

I right balance: functionality vs. usability

I turn key ready server

I bindings to other languages

I large user base

I proven scalability

Thomas Koch

ZooKeeper

Page 24: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

Comparison: JGroups

I Bela Ban (JBoss, Kreuzlingen), 1998

I “toolkit for reliable multicast communication”

I highly complex, highly powerful

I ISIS → Horrus → Ensemble → JGroups

I presumably missing: n2 + 1 commits, changeable leader

election

I failure detection, dynamic groups, auto discovery

google://JGroups ZooKeeper !others like JGroups3: Appia, Spread

3http://jgcs.sourceforge.net/implementations/

Thomas Koch

ZooKeeper

Page 25: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

PMDs opinion on ZooKeeper

Thomas Koch

ZooKeeper

Page 26: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

my favourite PMD violations: NPath complexity

possible paths through the method

ClientCnxn.run() NPath complexity of 314

.onConnected() NPath complexity of 750

.readResponse() NPath complexity of 9750

Thomas Koch

ZooKeeper

Page 27: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

my favourite PMD violations: NPath complexity

possible paths through the method

ClientCnxn.run() NPath complexity of 314.onConnected() NPath complexity of 750

.readResponse() NPath complexity of 9750

Thomas Koch

ZooKeeper

Page 28: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

my favourite PMD violations: NPath complexity

possible paths through the method

ClientCnxn.run() NPath complexity of 314.onConnected() NPath complexity of 750.readResponse() NPath complexity of 9750

Thomas Koch

ZooKeeper

Page 29: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

my favourite PMD violations: cyclomatic complexity

branching points

ClientCnxn.processEvent() Cyclomatic complexity 26

Thomas Koch

ZooKeeper

Page 30: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

other favourite PMD violations

I Assigning an Object to null is a code smell. Considerrefactoring.

I Avoid really long methods.

I This class has too many methods, consider refactoring it.

I A high ratio of statements to labels in a switch statement.Consider refactoring.

I Avoid empty catch blocks.

I Avoid really long parameter lists.

I (Class has) Too many fields.

Thomas Koch

ZooKeeper

Page 31: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

other favourite PMD violations

I Assigning an Object to null is a code smell. Considerrefactoring.

I Avoid really long methods.

I This class has too many methods, consider refactoring it.

I A high ratio of statements to labels in a switch statement.Consider refactoring.

I Avoid empty catch blocks.

I Avoid really long parameter lists.

I (Class has) Too many fields.

Thomas Koch

ZooKeeper

Page 32: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

ZooKeeper Devs on refactoring

code quality is important, and there are things we shouldkeep in mind, but in general i really don’t like the idea ofrisking code breakage because of a gratuitous codecleanup. . . .

vs.

Thomas Koch

ZooKeeper

Page 33: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

BIG BALL OF MUD4

I tight coupling

I therefor no units testable in isolation

I therefor most unit tests are actually acceptance tests

4http://www.laputan.org/mud/mud.html

Thomas Koch

ZooKeeper

Page 34: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

copy ’n paste programming

. . . leads to code duplicatione.g ZOOKEEPER-911 removes 162 lines of duplicate code

Thomas Koch

ZooKeeper

Page 35: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

client API

your experience?e.g. ZkClient, cagesno client only jar

Thomas Koch

ZooKeeper

Page 36: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

feature bloat

I chroot

I automatic event re-subscription

I chroot + automatic event re-subscription = broken

I chroot not fully transparent (ZOOKEEPER-1027)

I multi-update command in the works since half a year

Thomas Koch

ZooKeeper

Page 37: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

horrible concurrency

XYZ extends Thread

instead of

I implements runnable

I or better: executor framework

I or much better: actors (e.g. Akka)

Thomas Koch

ZooKeeper

Page 38: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

QA

Live is too short for crap.

Live is short. Strive for Excellence in Everything You Do!

http://www.koch.ro

http://identi.ca/thkoch

[email protected]

Thomas Koch

ZooKeeper

Page 39: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

QA

Live is too short for crap.

Live is short. Strive for Excellence in Everything You Do!

http://www.koch.ro

http://identi.ca/thkoch

[email protected]

Thomas Koch

ZooKeeper

Page 40: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

classification

Google free software

GFS → HDFS (KFS, . . . )MapReduce → Hadoop MapReduceBigTable → HBase (Hypertable)Chubby 6= ZooKeeper

Thomas Koch

ZooKeeper

Page 41: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

another distributed file system?

HDFS ZooKeeper

big files small “cookies”dumb watches, sequential, ephemeralstreaming random access

ordered

Thomas Koch

ZooKeeper

Page 42: ZooKeeper - the unsung hero - Berlin Buzzwords2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/...Hedwid (Yahoo), Kafka (LinkedIn) publish-subscribe systems Hedwig: I strong durability

user perspective internals ZooKeeper use(r)s Praise and Rant

ZooKeeper API

I String create (path, data, acl, ephemeral|sequential)

I void delete (path, expected Version)

I Stat setData (path, data, expected Version)

I byte[] getData (path, watch)

I Stat exists (path, watch)

I String[] getChildren (path, watch)

I void sync (path)

Thomas Koch

ZooKeeper