Apache Kafka – PUC-Rio (endler/courses/RT-Analytics/transp/Kafka.pdf)

Transcript of Apache Kafka lecture slides

Page 1:

Apache Kafka

A distributed Publish/subscribe messaging system:

a middleware to reliably distribute data to consumers

Page 2:

Intro

• Originally developed by LinkedIn, and later turned into an Apache open source project

• Designed for processing real-time activity streams (e.g., log metrics collections)

• Written in Scala

• Features:
– Persistent messaging
– High throughput
– Supports both queue and Pub/Sub semantics
– Parallel data distribution
– Uses Zookeeper for forming a cluster of nodes

• http://kafka.apache.org

Page 3:

• A Kafka cluster consists of N Kafka brokers, which use Zookeeper to coordinate their states.

• Kafka brokers receive messages from Producers (push) and deliver messages to Consumers (pull)

• They are responsible for persisting the messages for a configurable period of time or up to a configurable amount of space

• Messages are persisted to append-only log files (sequential writes), and consumers read a range of these files (sequential reads).

(Figure: producers are push-based, consumers are pull-based)
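This push/pull split is visible directly in Kafka's standard Java client: producers call send() to push, consumers call poll() to pull. Below is a minimal sketch using the modern Java client (the 0.8-era API contemporary with these slides differs); the broker address, topic name and group id are placeholders:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PushPullSketch {
        public static void main(String[] args) {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // Producer side: push a message to the brokers.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("events", "hello"));
            }

            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("group.id", "demo-group");               // placeholder consumer group
            c.put("auto.offset.reset", "earliest");        // start from the oldest retained message
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            // Consumer side: pull messages from the brokers.
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(List.of("events"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> r : records)
                    System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
            }
        }
    }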

Page 4:

Example  

Page 5:

Need for Persistent Queuing

Typical configuration of real-time stream processing. Reasons for data persistence:
• workers may fail during processing, and the data must then be distributed again
• the system must be flexible enough to handle bursts in traffic => the persistent queue becomes the data buffer until more workers are started

(Figure: a stream producer feeds a dispatcher, which distributes the items among several workers)

Page 6:

Interface for a Single-consumer Persistent Queue Server

Basic idea:
• When an item is read, it is not immediately removed.
• The consumer sends an explicit ack telling that processing of the item was successful. Otherwise, a failure is reported and the item is reassigned to a new worker.
• Only when an item gets an ack is it definitively removed from the queue.

But what if many applications need to consume the same stream? (e.g. a page-view data stream -> App1: analysis of page views over time, App2: analysis of unique visitors over time)

class Item {
    long id;
    byte[] item;
}

interface Queue {
    Item get();
    void ack(long id);
    void fail(long id);
}
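The slides leave the server side open; the following is a minimal in-memory sketch of one way to implement this interface (my illustration, not from the slides). Items handed out by get() are parked in a pending map until they are acked or failed:

    import java.util.ArrayDeque;
    import java.util.HashMap;

    class InMemoryAckQueue implements Queue {
        private final ArrayDeque<Item> ready = new ArrayDeque<>();
        private final HashMap<Long, Item> pending = new HashMap<>();

        void put(Item item) { ready.addLast(item); }

        public Item get() {
            Item item = ready.pollFirst();                 // handed out, but not yet gone:
            if (item != null) pending.put(item.id, item);  // parked until ack/fail
            return item;                                   // null if the queue is empty
        }

        public void ack(long id) { pending.remove(id); }   // now definitively removed

        public void fail(long id) {                        // processing failed:
            Item item = pending.remove(id);                // put the item back so it can
            if (item != null) ready.addFirst(item);        // be reassigned to another worker
        }
    }

A real server would also have to re-queue items whose consumer dies without ever calling fail(); that per-item bookkeeping is exactly the state the next slides argue should move to the clients.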

Page 7:

Handling Several Consuming Applications

Option A:
• Make all applications reside in the same codebase (and run in the same workers), or use a single "queue consumer"
• Disadvantage: lack of isolation and of parallelism

Option B:
• Maintain a separate queue for each consumer application
• Disadvantage: the load on the server is now proportional to the number of separate applications and to the frequency of incoming events

The best choice would be:
• a single queue where adding a consumer is simple and introduces a minimal increase in load

The main problem of a single-consumer queue is that the queue server has to keep track of whether an item has been successfully consumed or not. Main idea: why not shift this responsibility to the consuming client applications?

(Figure: Option A, a single queue consumer feeding App 1 … App N)

Page 8:

Multi-consumer Queue

Main idea:
• each consumer application keeps track of the consumed event objects
• it can request the event stream to be replayed from any point in the event stream history
• the queue server guarantees that a certain amount of the stream is always available (e.g. the items produced during the last 12 hours, or the freshest events within 50 GB)
• it also ensures that events are processed in the order in which they were produced (their order in the queue)

(Figure: a multi-consumer queue holding items 0–8. App1, whose last successfully consumed item is 4, gets 3 items from position 5 on; App2, whose last successfully consumed item is 1, gets 7 items from position 2 on.)
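A toy sketch of this idea (again my illustration, not from the slides): the server only exposes positional reads over an append-only list, and each application keeps its own cursor, so adding one more consumer costs the server almost nothing:

    import java.util.ArrayList;
    import java.util.List;

    class MultiConsumerLog {
        private final ArrayList<byte[]> items = new ArrayList<>();  // append-only, single-threaded toy
        void append(byte[] item) { items.add(item); }
        // Positional read: any consumer may (re)read any retained range.
        List<byte[]> read(int fromPosition, int maxItems) {
            int to = Math.min(items.size(), fromPosition + maxItems);
            return items.subList(fromPosition, to);
        }
    }

    class AppCursor {
        int lastConsumed = -1;  // per-application state, kept by the client
        List<byte[]> fetch(MultiConsumerLog log, int n) {
            List<byte[]> batch = log.read(lastConsumed + 1, n);
            lastConsumed += batch.size();  // advance only after successful processing
            return batch;
        }
    }

(Retention, i.e. dropping the oldest items, is omitted here; Kafka's version of it appears on the retention-policy slide.)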

Page 9:

Kafka Main Concepts
• Data distribution is organized into topics. Data contained within a topic is somehow related (e.g. it is parsed in the same way).

• Topic:: a category or feed name to which messages are published.

• Producer:: any program that can publish a message to a topic (in some given serialization method). It can also send a set of messages in a single publish "burst" request.

• Published messages are stored in a set of distributed servers (brokers), called a Kafka cluster.
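In the Java client, the publish "burst" corresponds to producer-side batching: records sent close together are grouped into one request per partition. A sketch (the property values are arbitrary examples and the topic name page-views is made up):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("linger.ms", "50");      // wait up to 50 ms so sends can accumulate
    props.put("batch.size", "65536");  // at most 64 KB per batch and partition
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        for (int i = 0; i < 1000; i++)
            producer.send(new ProducerRecord<>("page-views", "view-" + i));
    }   // close() flushes any batches still open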

Page 10:

Logs
Kafka is "inspired" by logs. Logs are/have:
• an append-only data structure
• sequential, strictly ordered
• sequential writes and sequential reads can be made very fast
• the log records when things happened (through the message's offset)
• allow deterministic replay of the history of events
• a log can be seen as a queue of things that still have to be processed

• In Kafka, a topic is a sort of append-only log

Page 11:

Kafka Main Concepts
• Each topic is like a parallel queue.
• Data in a topic is further divided into partitions (disjoint parallel slices), which are a means of parallelizing the consumption of messages
• Each partition is an ordered, immutable sequence of messages of the topic that is continually appended to: a log.
• Each partition is held by a Kafka broker
• Producers choose which message to assign to which partition within the topic. This can be done in a round-robin fashion to balance load, or according to some semantic partition function (say, based on some key in the message), as the sketch below shows.
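Both strategies map directly onto the Java client's ProducerRecord constructors (continuing the producer sketch from the earlier slide; topic and key names are invented):

    // No key: the client spreads records over the partitions
    // (round-robin, or "sticky" batches in newer client versions).
    producer.send(new ProducerRecord<>("page-views", "some-view-event"));

    // With a key: hash(key) picks the partition, so all events of
    // user-42 land in the same partition, in production order.
    producer.send(new ProducerRecord<>("page-views", "user-42", "some-view-event"));

    // An explicit partition number can also be given:
    producer.send(new ProducerRecord<>("page-views", 0, "user-42", "some-view-event"));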

Page 12:

Advantage of Partitions

• Split up the message stream into independent message streams

• Split the load among the consumers (the workers) => horizontal parallelism

• Strict ordering (append-only) within each partition

• No ordering between partitions (the producer may write into the partitions in any order) => no need to coordinate consumption

Page 13:

Main Concepts

• The partitions of a topic serve two purposes:
– They allow the log to scale beyond a size that fits on a single broker. Each individual partition must fit on the broker that hosts it, but a topic may have many partitions, so it can handle an arbitrary amount of data.
– They act as a unit of parallelism (each consumer instance receives data from a single partition), which gives a way of distributing the produced data items (messages) to different workers for better load balancing.
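For concreteness, this is how a topic with several partitions can be created with the Java admin client (a sketch; the topic name and the counts are placeholders, and the replication factor is explained on the later slides):

    import java.util.List;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    try (AdminClient admin = AdminClient.create(props)) {   // props as in the producer sketch
        // 6 partitions for parallelism, each replicated on 3 brokers
        admin.createTopics(List.of(new NewTopic("page-views", 6, (short) 3)))
             .all().get();   // blocks until the brokers confirm; throws on failure
    }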

Page 14:

Parallel Distribution through Partitions

• Producers select the partitions to send their messages to.
• A consumer always consumes messages from a single partition sequentially (through a pull request to the Kafka cluster), informing the message's offset
• If the consumer acknowledges a particular message offset, it implies that the consumer has consumed all prior messages (see the sketch below).

Source: Abhishek Sharma – Kafka Architecture, 2014
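In the Java consumer this acknowledgement is an offset commit: committing after a poll() declares every earlier offset in each partition consumed. A sketch reusing the consumer properties c from the first example, with auto-commit disabled (process() is a hypothetical handler):

    c.put("enable.auto.commit", "false");   // we commit offsets explicitly
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
        consumer.subscribe(List.of("page-views"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records)
                process(r);              // hypothetical per-message processing
            consumer.commitSync();       // "consumed up to here" for each partition:
                                         // implies all prior offsets were processed
        }
    }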

Page 15:

Main Concepts
• Each partition of a topic corresponds to a logical log.
• Every time a producer publishes a message to a partition, the broker simply appends the message to the end of the log.
• The messages in the partitions are each assigned an id number, which is its offset in the log.

(Figure: partitions with the sequential ids of messages)

Page 16:

Consumer Groups
In Big Data settings it is common to have several machines working together to consume/process data from a topic
• Consumer machines label themselves with a consumer group
• Each message published to a topic is delivered to one machine within each consumer group
– If all consumers are in one consumer group -> Kafka provides traditional queuing messaging semantics
– If each consumer has its own group, then Kafka provides broadcast-type Pub/Sub semantics (all messages get delivered to all the consumer machines)
– More commonly, each topic has several consumer groups (each with some machines)
• Kafka assigns each partition of the topic to exactly one consumer instance/machine in the group. Since there are several partitions, this balances the load over the consumer instances.
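On the client side this whole choice reduces to the group.id setting (group names here are invented):

    // Queue semantics: all instances share one group,
    // so each message is handled by exactly one of them.
    c.put("group.id", "pageview-analytics");

    // Pub/Sub semantics: every instance gets its own group,
    // so each one receives every message.
    c.put("group.id", "pageview-analytics-" + java.util.UUID.randomUUID());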

Page 17:

Page 18:

Log-Structured Storage
• Kafka maintains separate logs for each partition of each topic

• Append-only log mechanism similar to the write-ahead-log protocol

• Write-ahead log: a new message (being written) is only made available to consumers after it has been committed to the log. So no consumer will consume a message that might be lost in the event of a broker failure.

• This maximizes throughput while guaranteeing reliable message delivery (messages are held on several brokers)

Page 19:

Kafka Delivery Guarantees
• Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a message M1 is sent by the same producer as a message M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.

• A consumer instance sees messages in the order they are stored in the log.

• A topic with replication factor N will tolerate up to N-1 server failures without losing any messages committed to the log.

Page 20:

Not a Pure Queuing System
• Kafka avoids the overhead of guaranteeing that messages are processed in the order in which they were received (as in ActiveMQ), and only guarantees ordering within each partition (for a same producer)

• There is no defined order of writes and reads to different partitions of a topic -> Kafka does not guarantee that different consumer instances will read messages in the same order

• This enhances parallelism

• Kafka also does not remove messages that have been delivered to consumers (it is the consumer's duty to keep track of the offset of the latest consumed message, as the sketch below shows)

• Messages are automatically removed after some time.
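Because delivery does not delete anything, a consumer can reposition itself at will. In the Java client this is seek() on a manually assigned partition (a sketch continuing the earlier consumer; topic name and offset are placeholders):

    import org.apache.kafka.common.TopicPartition;

    TopicPartition tp = new TopicPartition("page-views", 0);
    consumer.assign(List.of(tp));          // manual assignment, no group rebalancing
    consumer.seek(tp, 1234L);              // resume from a self-tracked offset ...
    consumer.seekToBeginning(List.of(tp)); // ... or replay everything still retained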

Page 21:

Replication
Kafka (from version 0.8 onwards) protects against broker failures by replicating data
• Each topic has a replication factor: each of the topic's partitions will have that number of synchronized replicas

• Replicas guarantee that committed messages won't be lost as long as at least one replica survives.

• One replica is designated as the leader
• The follower replicas fetch data from the leader
• The leader holds the list of "in-sync" replicas (the ISR, or in-sync replica set), i.e. brokers that have up-to-date logs of the partition(s).

Page 22:

Topics, Partitions and Replicas

Source: Michael G. Noll, Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node, www.michael-noll.com

Page 23:

Replication
• Each partition of a topic has a leader partition that is replicated in follower partitions.

• The replicas are distributed among the different physical brokers (each follower on a different broker)

• When a message arrives at the leader partition, it is first appended to the leader's log and then forwarded to the follower partitions in the ISR. Only after each follower partition sends an acknowledgement is the message considered committed and made available to consumers for reading.

• The leader also occasionally sends a high watermark with the offset of the most recently committed messages, and propagates it to the follower partitions in the ISR.

Page 24:

Replication and ISRs

(Figure: a producer writes to topic my_topic, whose partitions 0, 1 and 2 are each replicated on brokers 100, 101 and 102.)

Topic: my_topic    Partitions: 3    Replicas: 3

Partition   Leader   ISR
0           100      101,102
1           101      100,102
2           102      101,100
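This leader/ISR layout can be inspected at runtime with the Java admin client (a sketch; the exact result-accessor names have shifted between client versions):

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    try (AdminClient admin = AdminClient.create(props)) {
        TopicDescription d = admin.describeTopics(List.of("my_topic"))
                                  .all().get().get("my_topic");
        for (TopicPartitionInfo pi : d.partitions())
            System.out.printf("partition=%d leader=%s isr=%s%n",
                              pi.partition(), pi.leader().id(), pi.isr());
    }

The bundled kafka-topics command with --describe prints the same information.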

Page 25:

Replication
• If a broker containing a particular partition fails, it is removed from the ISR, and the leader no longer waits for its acknowledgement

• Kafka's Zookeeper support is in charge of keeping the set of live brokers for a topic with replication factor > 1

• After a failed follower partition recovers, it examines the last known high watermark and copies all messages from the leader partition up to the current committed offset. Once this is finished, it is added back to the ISR.

Page 26:

Three Levels of Acknowledgement in the Producer API

Producer and consumer are replication-aware. Durability can be configured with the producer configuration (see the sketch below):
• None: the producer awaits no ack from the leader partition. Highest throughput, but messages may be lost

• Leader: the leader partition sends an ack as soon as it has received the message. This reduces performance a bit, but offers a reasonable level of durability

• All: the leader sends an ack only after the message has been committed (through acks from all follower partitions in the ISR). Much worse performance, but the message can be recovered as long as there are partitions in the ISR.

• If fewer than Min_ISR follower partitions are active, the producer gets feedback and starts to buffer
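In the Java producer these three levels are the values of the acks property (a sketch; min.insync.replicas is the broker/topic-side counterpart of the slide's Min_ISR):

    // Pick exactly one of the three levels:
    props.put("acks", "0");    // "None": fire and forget, highest throughput
    props.put("acks", "1");    // "Leader": ack once the leader has appended the message
    props.put("acks", "all");  // "All": ack only after the message is committed in the ISR

    // Topic/broker side: with min.insync.replicas=2, an acks=all producer
    // receives an error (and can buffer/retry) when fewer than 2 ISR
    // members are alive.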

Page 27:

Kafka Retention Policies

Kafka is not a long-term storage system (like HDFS), so data cannot be persisted indefinitely on the brokers. A retention policy determines how much of the newest data is maintained in each partition. There are several policies (see the configuration sketch below):
• Space-based:: keep the last X GB of messages
• Time-based:: keep the messages produced during the last 24 hours
• Key-based:: keep only the N latest messages of each key.
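These policies correspond to per-topic configuration entries, which can be set when the topic is created (a sketch; topic names and values are arbitrary examples, and note that Kafka's log compaction keeps at least the latest message per key):

    import java.util.Map;

    // Time-based retention: keep messages from the last 24 hours.
    // (Space-based would be "retention.bytes", e.g. "53687091200" for ~50 GB.)
    NewTopic pageViews = new NewTopic("page-views", 6, (short) 3)
        .configs(Map.of("retention.ms", "86400000"));

    // Key-based: log compaction keeps the latest message per key.
    NewTopic userProfiles = new NewTopic("user-profiles", 6, (short) 3)
        .configs(Map.of("cleanup.policy", "compact"));

    // Pass these to admin.createTopics(...) as in the earlier sketch.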

Page 28:

Coordination of Kafka Brokers

• Kafka producers, subscribers and brokers need to know which Kafka brokers are active and executing

• For each partition, brokers need to know which one is the leader and which ones are the followers

• This coordination is done through the Zookeeper service, a distributed coordination service.