Multi-Tenant Apache Kafka for Hops


  • DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY, SECOND CYCLE, 30 CREDITS

    STOCKHOLM, SWEDEN 2016

    Multi-Tenant Apache Kafka for Hops
    Kafka Topic-Based Multi-Tenancy and ACL-Based Authorization for Hops

    MISGANU DESSALEGN MURUTS

    KTH SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

  • Multi-Tenant Apache Kafka for Hops

    Kafka Topic-Based Multi-Tenancy and

    ACL-Based Authorization for Hops

    Misganu Dessalegn Muruts

    Master of Science Thesis

    Software Engineering of Distributed Systems
    School of Information and Communication Technology

    KTH Royal Institute of Technology

    Stockholm, Sweden

    15 November 2016

    Examiner: Dr. Jim Dowling
    Supervisor: Gautier Berthou

    TRITA Number: TRITA-ICT-EX-2016:120

  • © Misganu Dessalegn Muruts, 15 November 2016

  • Abstract

    Apache Kafka is a distributed, high-throughput and fault-tolerant publish/subscribe messaging system in the Hadoop ecosystem. It is used as a distributed data streaming and processing platform. Kafka topics are the units of message feeds in the Kafka cluster. A Kafka producer publishes messages into these topics, and a Kafka consumer subscribes to topics to pull those messages. With the increased usage of Kafka in the data infrastructure of many companies, there are many Kafka clients that publish messages to and consume messages from Kafka topics. Some of these client operations can be malicious. To mitigate this risk, clients must authenticate themselves and their operations must be authorized before they can access a given topic. Kafka now ships with a pluggable Authorizer interface to implement access control list (ACL) based authorization for client operations. Kafka users can implement the interface differently to satisfy their security requirements. SimpleAclAuthorizer is the out-of-the-box implementation of the interface and uses Zookeeper for ACL storage.

    HopsWorks, based on Hops, a next-generation Hadoop distribution, provides support for project-based multi-tenancy, where projects are fully isolated at the level of the Hadoop Filesystem and YARN. In this project, we added Kafka topic-based multi-tenancy to Hops projects. A Kafka topic is created from inside a Hops project and persisted both in Zookeeper and in the NDBCluster database. Persisting a topic in the database enabled topic sharing across projects. ACLs are added to Kafka topics and are persisted only in the database. Client access to Kafka topics is authorized based on these ACLs. ACLs are added, updated, listed and removed from the HopsWorks web UI. HopsAclAuthorizer, a Hops implementation of the Authorizer interface, authorizes Kafka client operations using the ACLs in the database. The Apache Avro schema registry for topics enables producers and consumers to integrate better by exchanging messages in a pre-established format. The result of this project is the first Hadoop distribution that supports Kafka multi-tenancy.

    Keywords: Hadoop, Kafka, Hops, HopsWorks, Multi-Tenancy, Kafka Topics, Schema Registry, Messaging Systems, ACL Authorization
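
    The ACL-based authorization summarized above can be illustrated with a minimal Python sketch. It is not the Hops authorizer itself: the `Acl` record, the `authorize` function, the `project__user` principal naming and the sample topic are hypothetical names chosen for illustration. The sketch assumes two rules that Kafka's out-of-the-box authorizer also follows by default: a matching deny entry takes precedence over any allow entry, and an operation with no matching allow entry is rejected.

```python
from dataclasses import dataclass

WILDCARD = "*"  # assumption: "*" matches any principal or operation


@dataclass(frozen=True)
class Acl:
    """One ACL entry attached to a topic (hypothetical layout)."""
    topic: str
    principal: str   # hypothetical "project__user" naming
    operation: str   # e.g. "read" or "write"
    permission: str  # "allow" or "deny"


def _matches(acl, topic, principal, operation):
    """Return True if the ACL entry applies to the requested operation."""
    return (
        acl.topic == topic
        and acl.principal in (principal, WILDCARD)
        and acl.operation in (operation, WILDCARD)
    )


def authorize(acls, topic, principal, operation):
    """Deny wins over allow; no matching allow entry means rejection."""
    matching = [a for a in acls if _matches(a, topic, principal, operation)]
    if any(a.permission == "deny" for a in matching):
        return False
    return any(a.permission == "allow" for a in matching)


# Sample entries for one shared topic (illustrative data only).
acls = [
    Acl("sensor-data", "projectA__alice", "write", "allow"),
    Acl("sensor-data", WILDCARD, "read", "allow"),
    Acl("sensor-data", "projectB__mallory", WILDCARD, "deny"),
]
```

    With these entries, `authorize(acls, "sensor-data", "projectB__mallory", "read")` is rejected even though a wildcard allow entry exists for reads, because the deny entry takes precedence.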

  • Acknowledgements

    I would like to express my profound gratitude to Dr. Jim Dowling for his continuous support during my thesis period. His open-minded approach to new ideas and his insightful and tactful feedback on technical disagreements were invaluable assets to the success of the project. Moreover, learning from him was a thrilling experience that I could not pass over without singling it out.

    My advisor, Gautier Berthou, was also of much help in achieving the goals of the thesis. His door was always open when I needed to talk to him. I am also indebted to Ermias, Theofilos and the remaining Hops team members for their substantial assistance. Moreover, I enjoyed their friendship and, of course, the Friday fika.

    My extended appreciation goes to the Swedish Institute (SI), which sponsored my two years of master's study in Sweden. My contract with SI did not terminate with the completion of my master's. I believe the support has laid a foundation for my future career.

    Last but not least, my heartfelt gratitude goes to my family for standing by me to share my dream. The unfailing support they offer and the enduring confidence they have bestowed on me keep energizing me.

  • Contents

    1 Introduction
      1.1 Background
      1.2 Problem
      1.3 Purpose
        1.3.1 Benefits, Ethics and Sustainability
      1.4 Delimitation
      1.5 Outline

    2 Background
      2.1 What is Messaging
      2.2 Kafka Architecture
      2.3 Kafka Main Characteristics
      2.4 Kafka and other Messaging Services
      2.5 Apache Kafka Use Cases
      2.6 About SICS Hops Project
      2.7 Multi-Tenant Architecture
        2.7.1 Multi-Tenancy in Cloud Computing
        2.7.2 Multi-Tenancy in Database

    3 Related Works
      3.1 Kafka Security
      3.2 ACL Based Authorization
        3.2.1 Kafka SimpleAclAuthorizer
        3.2.2 DefaultPrincipalBuilder
        3.2.3 Other Implementations
      3.3 Schema Registry
      3.4 Hops Project based Multi-tenancy

    4 Methodology
      4.1 Goal
      4.2 Solution
      4.3 Methodology

    5 Analysis
      5.1 Kafka Topic
        5.1.1 Existing Apache Kafka Topic Operations
        5.1.2 Hops New Topic Operations
      5.2 Topic Availability Invariant
        5.2.1 Topic Availability States
        5.2.2 Why Always to State Four
      5.3 Why Topic Sharing
      5.4 Hops Schema Registry
        5.4.1 Kafka Clients Access to Avro Schema
        5.4.2 Avro Schema Compatibility and Evolution
      5.5 Synchronizing Zookeeper and Database for Kafka Topics
        5.5.1 Topic Synchronization Failure Scenarios
        5.5.2 Possible Synchronization Approaches
      5.6 Hops ACL Definition
        5.6.1 The Role Concept
        5.6.2 Fine-grained ACLs and Role-base ACLs
      5.7 HopsAclAuthorizer Class
        5.7.1 HopsPrincipalBuilder Class
        5.7.2 When is a topic operation authorized

    6 Test Configurations
      6.1 Kafka Broker Configurations
      6.2 Zookeeper Configuration
      6.3 Configuring Kafka Clients
      6.4 Local Resources
      6.5 Kafka Util Library
      6.6 Spark job

    7 Conclusions, Recommendations and Future Works
      7.1 Conclusions and Recommendations
      7.2 Future Works

    Bibliography

  • List of Figures

    2.1 Forms of messaging
    2.2 Kafka Architecture [1]
    2.3 Kafka consumer group abstraction [2]
    2.4 Shared database, shared schema [3]

  • List of Tables

    2.1 Degree of multi-tenancy in cloud computing
    5.1 Sample Hops ACL definitions
    5.2 Hops ACL definition levels

  • List of Listings

    3.1 Schema example
    6.1 Sample Kafka broker configuration

  • List of Acronyms and Abbreviations

    This document requires readers to be familiar with the terms and concepts of distributed systems, cloud computing multi-tenancy architecture and data access authorization. For clarity, we summarize some of these terms and concepts in the above context.

    ACL Access Control List

    CA Certificate Authority

    Hops Hadoop Open Platform-as-a-Service

    IaaS Infrastructure as a Service

    IDL Interface Definition Language

    JSON JavaScript Object Notation

    KIP Kafka Improvement Proposals

    OSDM Open Source Development Model

    PaaS Platform as a Service

    REST Representational State Transfer

    SaaS Software as a Service

    SASL Simple Authentic