Apache Kafka - Free Friday

  • date post

    16-Apr-2017
  • Category

    Software

Transcript of Apache Kafka - Free Friday

  • Apache Kafka

    Free Friday

    Luiza Souza / Otvio [email protected]

    [email protected]

  • Apache Kafka

    Apache Kafka is a distributed messaging system

    Provides fast, highly scalable and redundant messaging through a pub-sub model

    It was built at LinkedIn to be used as a central hub for all of the messaging communication between their systems

    Focus on scalability and fault tolerance

  • Motivation

    Microservices
        "In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery." - Martin Fowler

    Monolith First
        Using microservices as a way to decompose monolithic infrastructures

    Message Queues
        Asynchronous processing
        Decoupling
        Load balancing
        Scalability

  • How is it different?

    High throughput
        Millions of events per second per node

    Fault-tolerance guarantees
        Relies on Apache Zookeeper for detection of node failures and leader election
        Maintains a structure called ISR (In-Sync Replica Set) in order to be able to tolerate node failures
        (Claims to) tolerate up to f failures with f+1 replicas without losing data

    Distributed
        More nodes can be included and the system keeps its high-performance and fault-tolerance capabilities
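
    The f+1 claim above can be illustrated with a toy model. This is only a sketch, not Kafka code: the class `Partition` and its methods are hypothetical names, and it assumes a write counts as committed once every replica currently in the ISR holds it.

```python
class Partition:
    """Toy model of one replicated partition (not Kafka's implementation)."""

    def __init__(self, replica_ids):
        self.logs = {r: [] for r in replica_ids}  # each replica's copy of the log
        self.isr = list(replica_ids)              # in-sync replicas; isr[0] acts as leader

    def append(self, msg):
        # Assumption: a write is committed once every in-sync replica has it.
        if not self.isr:
            raise RuntimeError("no in-sync replica available")
        for replica in self.isr:
            self.logs[replica].append(msg)

    def fail(self, replica_id):
        # A failed replica leaves the ISR; the next in-sync replica becomes leader.
        self.isr.remove(replica_id)

    def committed(self):
        # Committed data is whatever the current leader holds (None if nobody is left).
        return list(self.logs[self.isr[0]]) if self.isr else None


f = 2
p = Partition(["b0", "b1", "b2"])  # f + 1 = 3 replicas
p.append("m1")
p.fail("b0")                       # first failure: the leader is lost
p.append("m2")
p.fail("b1")                       # second failure
survivors = p.committed()          # the one remaining replica still has both writes
print(survivors)
```

    With three replicas the model survives two failures with both writes intact; a third failure leaves no replica to serve the data.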

  • Comparison with AMQP

    Broker-centric (AMQP)
        AMQP implementations are usually broker-centric
        Focus on delivery guarantees between producers/consumers
        Transient preferred over durable messages
        Use the broker itself to maintain state of what is consumed (via message acknowledgements)

    Producer-centric (Kafka)
        Partition a fire hose of event data into durable message brokers with cursors (pointers)
        Support for batch consumers that may be offline, or online consumers that want messages at low latency
        Doesn't have message acknowledgements; it assumes the consumer tracks what has been consumed so far
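
    The cursor idea can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a Kafka client: `Log` and `Consumer` are hypothetical names, and the log simply keeps every message so that each consumer can hold its own offset.

```python
class Log:
    """Append-only message log, standing in for one durable partition."""

    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)
        return len(self.messages) - 1  # offset of the new message

    def read(self, offset, max_messages=10):
        return self.messages[offset:offset + max_messages]


class Consumer:
    """Each consumer owns its cursor; the log keeps no per-consumer state."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)  # advance the cursor past what was read
        return batch

    def seek(self, offset):
        self.offset = offset       # rewind (or skip ahead) to replay messages


log = Log()
online, offline = Consumer(log), Consumer(log)
for i in range(3):
    log.append(f"event-{i}")
    online.poll()                  # online consumer reads each message as it arrives

caught_up = offline.poll()         # offline batch consumer catches up later in one go
offline.seek(1)
replayed = offline.poll()          # re-reads from offset 1 without involving the producer
print(caught_up, replayed)
```

    Because the log does not track acknowledgements, an offline consumer costs it nothing, and a consumer can replay old messages simply by moving its cursor.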

  • Kafka Terminology

    Producers
        Processes that publish messages to topics

    Consumers
        Processes that read messages from topics

    Topic
        Name of the feed to which messages are published

    Broker
        Process running on a single machine

    Cluster
        Group of brokers working together

  • Kafka Terminology

    Partitions
        Subdivision of topics
        Scalability
        Load balancing
        Consumers control their own offsets

  • Kafka Terminology

    Replication
        In-Sync Replica (ISR) sets

    Figure 1. A Kafka cluster with 4 brokers, 1 topic and 2 partitions, each with 3 replicas
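
    A layout like the one in Figure 1 can be sketched as a mapping from partitions to brokers. The placement rule below (consecutive brokers, starting at the partition number) is an assumption chosen for illustration; it is not Kafka's actual assignment algorithm, and the exact layout in the original figure is not recoverable from the transcript. `assign_replicas` is a hypothetical helper.

```python
def assign_replicas(num_brokers, num_partitions, replication_factor):
    """Place each partition's replicas on consecutive brokers (illustrative rule only)."""
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # The first broker in each list is treated as that partition's leader.
        assignment[partition] = [
            (partition + i) % num_brokers for i in range(replication_factor)
        ]
    return assignment


# 4 brokers, 1 topic with 2 partitions, 3 replicas per partition
layout = assign_replicas(num_brokers=4, num_partitions=2, replication_factor=3)
for partition, brokers in layout.items():
    print(f"partition {partition}: leader=broker {brokers[0]}, replicas={brokers}")
```

    No broker holds two copies of the same partition, which is what lets a partition survive the loss of a broker.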

  • Use Cases

    Messaging

    Distributed log / Log aggregation

    Change Data Capture

    Stream Processing / Event Sourcing

  • Use Cases - Messaging

    Messaging
        Simple queueing, e.g. a queue for sending e-mails
        Tracking user events
        Near real-time metrics

  • Use Cases - Distributed Log

    Distributed log / Log aggregation
        LinkedIn usage
            The whole platform is built around a central log
            13 million messages/sec, 15 gigabytes per sec
            Over 1100 brokers in more than 60 clusters

  • Use Cases - Change Data Capture

  • Use Cases - Stream Processing

    Stream Processing / Event Sourcing
        LinkedIn's example
        Netflix's example

  • DEMO

  • ISSUES

  • Issues

    CAP theorem (Consistency, Availability, Partition tolerance)
        "You can't sacrifice partition tolerance"

    Jepsen tests (@aphyr)
        To force failures on Kafka, the test shrinks the ISR (In-Sync Replica Set) to one node (the leader) and then loses the leader itself
        This causes a leader election, and a new leader is elected
        Kafka loses ~50% of the writes done during the partition
        Kafka users usually set a replication factor of 2 or 3 replicas for each partition on a given topic
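
    The failure sequence described above can be walked through with a toy simulation. It is a sketch under stated assumptions, not a reproduction of the Jepsen test: the names are hypothetical, a write is acknowledged once every ISR member has it, and the replica elected after the leader is lost was outside the ISR. The fraction lost here depends only on the made-up write counts, not on the measured ~50%.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []


def write(isr, acked, msg):
    # Assumption: a write is acknowledged once every in-sync replica has it.
    for replica in isr:
        replica.log.append(msg)
    acked.append(msg)


def run_scenario():
    leader, follower_a, follower_b = Replica("n1"), Replica("n2"), Replica("n3")
    isr = [leader, follower_a, follower_b]
    acked = []

    write(isr, acked, "w1")
    write(isr, acked, "w2")

    # A network partition isolates the followers; the ISR shrinks to the leader alone.
    isr = [leader]
    write(isr, acked, "w3")  # acknowledged, but only one copy exists
    write(isr, acked, "w4")

    # The leader is lost and a follower that never saw w3/w4 is elected.
    new_leader = follower_a
    lost = [msg for msg in acked if msg not in new_leader.log]
    return acked, new_leader.log, lost


acked, surviving, lost = run_scenario()
print("acknowledged:", acked)
print("on new leader:", surviving)
print("lost:", lost)
```

    Every write made while the ISR contained only the old leader was acknowledged to the producer yet is missing after the election, which is the loss the slide refers to.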

  • THANK YOU

    Luiza Souza / Otvio [email protected]

    [email protected]

  • Links

    https://aphyr.com/posts/315-jepsen-rabbitmq
    https://aphyr.com/posts/293-jepsen-kafka
    https://thoughtworks.jiveon.com/people/tbartlet/blog/2015/11/02/project-metamorphosis-with-kafka-spark
    https://thoughtworks.jiveon.com/message/1013489
    https://medium.com/@ikem/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8#.x4f9ezrwn
    https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
    http://microservices.io/patterns/microservices.html
    http://martinfowler.com/articles/microservices.html
    https://engineering.linkedin.com/kafka/running-kafka-scale
    https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka
    http://martinfowler.com/bliki/MonolithFirst.html

  • Links

    https://www.oreilly.com/learning/making-sense-of-stream-processing/page/3/integrating-databases-and-kafka-with-change-data-capture
    http://kafka.apache.org/documentation.html
    https://github.com/toddpalino/kafkafromscratch/blob/master/Apache%20Kafka%20from%20Scratch.pdf
    http://www.javaworld.com/article/3060078/big-data/big-data-messaging-with-kafka-part-1.html
    https://sookocheff.com/post/kafka/kafka-in-a-nutshell/

  • Use Cases - Change Data Capture

    Log compaction
    Kafka + Kafka Connect
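
    Log compaction can be pictured as keeping only the newest record per key. The function below is a minimal sketch of that idea on an in-memory list; `compact` and the sample keys are hypothetical, and the real broker-side process (segment handling, delete markers, timing) is not modelled.

```python
def compact(log):
    """Keep only the most recent record for each key, preserving log order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset
    ]


# A change stream for two database rows (sample data invented for this sketch)
changes = [
    ("user:1", "a@old.example"),
    ("user:2", "b@example"),
    ("user:1", "a@new.example"),
]
compacted = compact(changes)
print(compacted)
```

    A consumer that replays the compacted log still ends up with the latest state of every row, which is why compaction suits change data capture.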

  • Partitioning

    Custom Partitioner
        Write your own logic

    Default Partitioner
        Manual
        Hashing
            The most common approach
            Messages with the same key go to the same partition
        Spraying
            Random partitioning
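
    The partitioning strategies above can be sketched as plain functions. This is an illustration only: the function names are hypothetical, `zlib.crc32` stands in for whatever hash the real client uses, and the "vip:" rule in the custom partitioner is an invented example of "your own logic".

```python
import random
import zlib


def hash_partition(key: bytes, num_partitions: int) -> int:
    """Hashing: the same key always maps to the same partition."""
    return zlib.crc32(key) % num_partitions


def spray_partition(num_partitions: int, rng=None) -> int:
    """Spraying: pick a partition at random, ignoring any key."""
    rng = rng or random
    return rng.randrange(num_partitions)


def custom_partition(key: bytes, num_partitions: int) -> int:
    """Custom logic (invented rule): pin 'vip:' keys to partition 0."""
    if num_partitions < 2 or key.startswith(b"vip:"):
        return 0
    # All other keys are hashed across the remaining partitions.
    return 1 + zlib.crc32(key) % (num_partitions - 1)


num_partitions = 4
first = hash_partition(b"user-42", num_partitions)
second = hash_partition(b"user-42", num_partitions)
print(first == second)                       # hashing is stable per key
print(spray_partition(num_partitions))       # any partition from 0 to 3
print(custom_partition(b"vip:alice", num_partitions))
```

    Hashing preserves per-key ordering because one key never spans partitions; spraying gives up that ordering in exchange for an even spread.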