All about Apache Kafka - AIOUG Kafka In Oracle Event Hub... · Oracle Event Hub Cloud Service...
Agenda
• Why do we need a streaming / messaging platform?
• Introduction to Apache Kafka
• Architecture of Kafka
• Kafka Core Concepts
• Kafka Components
• Oracle Event Hub Cloud Service
• Kafka in Event Hub Cloud
What is Kafka
• Apache Kafka is an open source technology for building real-time, fault-tolerant, highly scalable messaging systems based on immutable logs.
• Developed by LinkedIn and donated to the Apache Software Foundation
• Its key strength is its ability to make high-volume data available as a real-time stream.
• It can work with OLTP systems, batch systems like Hadoop, real-time systems that require low-latency access, stream processing engines, etc.
• It is well suited to distributed and disconnected applications.
[Figure labels: Real-Time, Batch]
How Kafka works… ?
[Figure: data sources and data sinks connected through the Kafka platform. A PRODUCER writes into the KAFKA PLATFORM and a CONSUMER reads from it.]
Data Source / Data Sink labels: REPORTS, RDBMS / NO SQL, HADOOP, DATA WAREHOUSE, ERP / CRM, SENSOR DATA, MOBILE DEVICES, LOGS / EVENTS / FEEDS, REAL-TIME ANALYTICS, DASHBOARDS / ALERTS, API / ML SYSTEM, AUDIT, STORAGE / DATABASE, DATA SCIENCE
KAFKA PLATFORM: TOPICS, PARTITIONS, OFFSETS, BROKERS, SCHEMA REGISTRY, REST PROXY, STREAMS, ZOOKEEPER
Kafka Ecosystem
Kafka Core Architecture
[Figure: ZOOKEEPER alongside a KAFKA CLUSTER of BROKERS, with Producers on one side and Consumers (in Consumer Groups) on the other. Arrow labels: push message, pull message, elect leader, commit & manage offset, assign broker.]
Real-time use case…
Customer calls for investment advice
portfolio analysis
service request
provide recommendations
investment is made
Create / Exchange / Analyse / Aggregate Data
(1 request processed per hr) (100 calls per hr)
Kafka Architecture (contd.)
[Figure: three columns labelled Customer Requests, Messaging Platform and Applications. Customer requests (Call, Chat, Scheduled) push messages into the KAFKA CLUSTER of BROKERS; applications (App 1, App 2, App 3), organized in Consumer Groups, pull messages and commit & manage offsets.]
1. Topics, Partitions & Offset
Topics:
• A topic is a particular stream of data
• It is basically a log file that stores data, and can be compared to a table in a database
• A topic is always identified by its name
Partitions:
• A topic is split into partitions, and its messages are stored in them
• Messages are ordered within each partition
• Each message in a partition is identified by an incremental id called the offset
• Data written to a partition cannot be changed (it is immutable)
Anatomy of a Topic
Partition 0: 0 1 2 3 4 5 6 7 8 9 10 11
Partition 1: 0 1 2 3 4 5 6 7 8
Old → New (writes are appended at the new end)
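The topic, partition and offset ideas above can be illustrated with a small in-memory model. This is only a sketch, not how Kafka stores data on disk; the class names `Topic` and `Partition` and the topic name `trades` are made up for illustration.

```python
class Partition:
    """An append-only log; a message's offset is its position in the log."""

    def __init__(self):
        self._log = []

    def append(self, message):
        # Existing entries are never modified; new ones go at the end.
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to this message

    def read(self, offset):
        return self._log[offset]


class Topic:
    """A named stream of data split into independent partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]


topic = Topic("trades", num_partitions=2)  # hypothetical topic name
first = topic.partitions[0].append("buy 10")
second = topic.partitions[0].append("sell 5")
other = topic.partitions[1].append("buy 7")
print(first, second, other)
```

Offsets are counted per partition, so the first message in Partition 1 gets offset 0 even though Partition 0 already holds two messages.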
2. Brokers (a.k.a Kafka Server)
• A Kafka cluster contains one or more Brokers (servers)
• A Broker is identified by its id
• A Broker takes care of certain partitions in a topic
• In a cluster, a connection to any Broker provides access to the entire cluster
Sample Kafka Cluster
3 Brokers; Topic 1: 3 partitions; Topic 2: 2 partitions
Broker 1: Topic 1 Partition 0, Topic 2 Partition 1
Broker 2: Topic 1 Partition 1, Topic 2 Partition 0
Broker 3: Topic 1 Partition 2
3. Producers
• Producers are responsible for writing data into a Topic
• A producer needs the Topic name and the address of at least one Broker to connect to the Kafka cluster and produce messages
• When the cluster has more than one Broker, Kafka takes care of Leader election and of routing each message to the Broker that leads the target Partition
• A producer can choose to add a Key to a message. Messages with the same key are always routed to the same Partition (as long as the number of partitions does not change).
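Key-based routing can be sketched as "hash the key, then take it modulo the number of partitions". The snippet below is an illustration only: Kafka's default partitioner uses a murmur2 hash, whereas this sketch substitutes `zlib.crc32` as a stand-in, and the function name `choose_partition` and the key values are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    crc32 is a stand-in for Kafka's murmur2 hash; the point is only that
    the mapping is deterministic for a given key and partition count.
    """
    return zlib.crc32(key) % num_partitions


p1 = choose_partition(b"customer-42", 3)
p2 = choose_partition(b"customer-42", 3)
p3 = choose_partition(b"customer-7", 3)
print(p1, p2, p3)
```

Because the result depends on `num_partitions`, adding partitions to a topic later can move a key to a different partition.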
4. Consumers
• Consumers are responsible for reading data from a Topic
• A consumer needs the Topic name and the address of at least one Broker to connect to the Kafka cluster and consume messages
• When the cluster has more than one Broker, Kafka takes care of Leader election and of pulling each message from the right Broker
• Data is read in order within each Partition.
• Consumers read data in Groups. Within a group, each Partition is read by exactly one consumer; a single consumer may read from several Partitions.
• So the number of Consumers in a group should be less than or equal to the number of Partitions; any extra consumers stay idle.
• The Kafka broker stores the offsets committed by each Consumer group.
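The consumer-group rules above can be sketched with a simple round-robin assignment. This is a simplified illustration, not Kafka's actual group coordinator or assignor code; `assign_partitions`, `commit` and the consumer and group names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Committed offsets are kept per (group, partition); the stored value is
# the offset of the next message the group should read.
committed = {}


def commit(group, partition, last_read_offset):
    committed[(group, partition)] = last_read_offset + 1


two = assign_partitions([0, 1, 2], ["c1", "c2"])
four = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
commit("reports", 0, last_read_offset=11)
print(two)
print(four)
print(committed)
```

With three partitions and four consumers, the fourth consumer receives an empty list, which is why a group larger than the partition count gains nothing.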
5. Zookeeper
• Zookeeper manages the Kafka brokers
• It helps the Brokers in performing the leader election for partitions
• Zookeeper sends notifications to the Brokers when a Topic is created or deleted, a Broker dies, a Broker comes up, etc.
• A Zookeeper cluster is called a Quorum, and it should contain an odd number of servers (1, 3, 5, 7…), because it needs a majority of servers to keep working
• The Zookeeper quorum elects its own leader, and the other servers are followers
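The preference for an odd number of servers follows from majority arithmetic, which can be worked through in a few lines. The function names below are made up for this sketch.

```python
def majority(n: int) -> int:
    """Smallest number of servers that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still available."""
    return n - majority(n)


for n in range(1, 8):
    print(n, majority(n), tolerated_failures(n))
```

A cluster of 4 servers tolerates the same single failure as a cluster of 3, so the extra server adds cost without adding fault tolerance.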
Kafka Schema Registry
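[Figure: a Producer sends json data to the KAFKA CLUSTER and registers the avro schema with the SCHEMA REGISTRY; a Consumer reads json data from the cluster and retrieves the avro schema from the SCHEMA REGISTRY.]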
Kafka REST Proxy
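[Figure: a Producer writes data with an http post and a Consumer consumes data with an http get, both through the REST PROXY in front of the KAFKA CLUSTER; schemas are registered with and retrieved from the SCHEMA REGISTRY.]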
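As a rough illustration of the "http post" path, the sketch below builds (but does not send) a produce request in the JSON format used by the Confluent REST Proxy v2 API. It assumes that proxy is the one in use; the base URL `http://localhost:8082`, the topic name `investment-requests` and the helper `build_produce_request` are assumptions for this example, not values from the slides.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build an HTTP POST that would produce JSON records to a topic."""
    body = json.dumps({"records": [{"value": v} for v in values]})
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    )


req = build_produce_request(
    "http://localhost:8082",          # assumed REST Proxy address
    "investment-requests",            # hypothetical topic name
    [{"customer": 42, "type": "advice"}],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a running REST Proxy at the assumed address.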
Kafka Connect & Streams Architecture
[Figure: KAFKA CONNECT WORKERS move data from a Source into the KAFKA CLUSTER of BROKERS and from the cluster out to a Sink; STREAM API applications process the data held in the cluster.]
Oracle Event Hub Cloud
Oracle Event Hub Cloud Service delivers the power of Kafka as a managed streaming data platform integrated with the rest of Oracle’s Cloud.
It uses Kafka to make working with streaming data rapid, secure and cost-effective.
Instances:
- An instance in Oracle Event Hub Cloud Service – Dedicated refers to a Kafka Cluster.
- An instance in Oracle Event Hub Cloud Service refers to a Kafka Topic.
Oracle Event Hub Cloud
Features
• Apache Kafka delivered as a managed service
• Available in dedicated and multi-tenant flavours
• Elastic by nodes and by partitions
Benefits
• Realtime streaming platform
• Easy to use with REST APIs
• High performance Native API support
• Lift and shift Kafka workloads from on-premises
• Elastic: scale from thousands to millions of events per second
• Reliable: highly available with in-cluster replication and cluster mirroring
Oracle Event Hub Cloud – REST APIs
Use the Event Hub Cloud Platform REST API to
• Create and manage Oracle Event Hub Cloud Service clusters and topics
• View and manage network security rules
• Monitor the health of your service
• Apply patches and scale clusters on demand
• Manage the life cycle of your Oracle Event Hub Cloud Service
The REST APIs offer easy provisioning and lifecycle management of Apache Kafka topics on
Oracle Public Cloud, with the ability to create partitions and to consume and produce messages.
These options are also available from the Oracle PaaS Service Manager CLI.
Oracle Event Hub Cloud – Setup Steps
• Access the Oracle Event Hub Cloud Service Console
• Create an SSH Key Pair
• Create a Kafka Cluster
• Create a Kafka Topic
• Produce to and Consume from a Topic
• View the runtime metrics for the Topic
• Kafka Connect and REST configuration are optional