Stream Processing with Kafka and Samza

Diego Pacheco @diego_pacheco Principal Software Architect


● Kafka: LinkedIn, 2011

● Implemented in Scala and Java

● Motivation: real-time data feeds

● Goals:

– Low latency

– High throughput

● Kafka at LinkedIn (2014):

– 300+ brokers

– 18k topics

– 140k partitions

– 220B messages per day

– 40 TB inbound

– 160 TB outbound

– Peak load: 3.25M messages/second

● Use cases: activity streams, offline log processing
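The 2014 figures above can be sanity-checked with a little arithmetic. This is a minimal Python sketch; it assumes a uniform 86,400-second day and that the inbound and outbound volumes cover the same time window (the slide does not say which window that is).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 220e9   # 220B messages per day (from the slide)
peak_per_second = 3.25e6   # peak load: 3.25M messages/second (from the slide)
inbound_tb = 40            # 40 TB inbound (assumed same window as outbound)
outbound_tb = 160          # 160 TB outbound

# Average rate if the daily volume were spread evenly over the day.
avg_per_second = messages_per_day / SECONDS_PER_DAY

# How far the peak sits above that average.
peak_to_avg = peak_per_second / avg_per_second

# Bytes read out per byte written in.
fan_out = outbound_tb / inbound_tb

print(f"average: {avg_per_second:,.0f} msg/s")
print(f"peak/average: {peak_to_avg:.2f}x")
print(f"read fan-out: {fan_out:.1f}x")
```

Under these assumptions the average is about 2.5M messages/second, so the peak is only around 28% above it, and about four bytes are read for every byte written, which suggests each message is consumed by several readers on average.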

Page 5: Stream Processing with Kafka and Samza

No JMS: Kafka is not a JMS implementation


● Samza: LinkedIn, 2013

● Stream processing with save points (checkpointed input offsets)

● Multi-tenancy: one thread per container

● State is simple

– You handle logging and restoring

– Single-threaded programming

● Works with YARN

● Works well with Kafka

● Simple API: record-like, one message per process() call
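The save-point idea can be sketched without a cluster. The Python below is a toy simulation, not Samza's API (Samza's own API is Java, for example the `StreamTask` interface): messages are hashed by key onto partitions, one single-threaded task consumes a partition in order, and the task periodically checkpoints the offset it has reached. After a simulated crash it resumes from the last checkpoint, so messages handled after that checkpoint are processed again (at-least-once). All class and function names, the `crc32` partitioner (Kafka's default partitioner uses murmur2), and the checkpoint interval are hypothetical choices for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2
CHECKPOINT_EVERY = 3  # hypothetical interval: save the offset every 3 messages


def partition_for(key: str) -> int:
    # Stand-in partitioner: same key -> same partition, so per-key order is kept.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


class PartitionedLog:
    """Toy append-only log; a message's offset is its index in the partition."""

    def __init__(self) -> None:
        self.partitions = defaultdict(list)

    def append(self, key: str, value: str) -> None:
        self.partitions[partition_for(key)].append((key, value))


class CheckpointingTask:
    """Hypothetical single-threaded task that owns one partition."""

    def __init__(self, log, partition, checkpoints):
        self.log, self.partition, self.checkpoints = log, partition, checkpoints
        self.processed = []  # values this task has handled, in order

    def run(self, crash_after=None):
        # Resume from the last saved offset (0 if there is none yet).
        offset = self.checkpoints.get(self.partition, 0)
        messages = self.log.partitions[self.partition]
        handled = 0
        while offset < len(messages):
            if crash_after is not None and handled == crash_after:
                return  # simulated crash: progress since the last checkpoint is lost
            _key, value = messages[offset]
            self.processed.append(value)
            offset += 1
            handled += 1
            if offset % CHECKPOINT_EVERY == 0:
                self.checkpoints[self.partition] = offset  # save point
        self.checkpoints[self.partition] = offset


log = PartitionedLog()
for i in range(7):
    log.append("user-42", f"event-{i}")  # one key -> one partition, in order

p = partition_for("user-42")
checkpoints = {}  # stand-in for a durable checkpoint store
first = CheckpointingTask(log, p, checkpoints)
first.run(crash_after=5)  # handles event-0..event-4; last checkpoint is offset 3
second = CheckpointingTask(log, p, checkpoints)
second.run()  # restarts at offset 3, so event-3 and event-4 are seen twice
print(first.processed)
print(second.processed)
```

Because each task owns its partition and runs on one thread, the code inside `run` needs no locks; the price of checkpointing only every few messages is that some messages are replayed after a failure.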


● Stream Processing

● Low Latency

● Async Processing

● Local state

– Stored locally on disk

– On the same machine where the container runs

– Awesome fit for stateful processing

● Tight integration with Kafka

● Strong model for streams (Kafka): ordered, highly available, partitioned, and durable

● Full feature set of Kafka

● Client-side joins
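Local state and client-side joins fit together: a task keeps a table (for example, user profiles) in a local store and joins each incoming event against it without a remote lookup. In Samza a local store is typically backed by a Kafka changelog topic, so it can be rebuilt on another machine after a failure. The Python below is a toy simulation of that pattern, not Samza's API; `ChangeloggedStore`, `join_page_views`, and the plain list standing in for the changelog topic are hypothetical, and an in-memory dict replaces the on-disk key-value store.

```python
class ChangeloggedStore:
    """Hypothetical key-value store whose writes are also appended to a changelog."""

    def __init__(self, changelog: list) -> None:
        self.changelog = changelog  # stand-in for a durable changelog topic
        self.data = {}

    def put(self, key, value) -> None:
        self.data[key] = value
        self.changelog.append((key, value))  # log every write so it can be replayed

    def get(self, key):
        return self.data.get(key)

    @classmethod
    def restore(cls, changelog: list) -> "ChangeloggedStore":
        # Rebuild local state by replaying the changelog; later writes win.
        store = cls(changelog)
        for key, value in changelog:
            store.data[key] = value
        return store


def join_page_views(store, page_views):
    """Client-side join: enrich each (user_id, page) event from local state."""
    joined = []
    for user_id, page in page_views:
        profile = store.get(user_id)  # local lookup, no remote call
        joined.append((user_id, page, profile["country"] if profile else None))
    return joined


changelog = []
store = ChangeloggedStore(changelog)
store.put("u1", {"country": "BR"})
store.put("u2", {"country": "US"})
store.put("u1", {"country": "PT"})  # a profile update overwrites the old value

# Simulate losing the machine: rebuild the store elsewhere from the changelog.
recovered = ChangeloggedStore.restore(changelog)
result = join_page_views(recovered, [("u1", "/home"), ("u2", "/jobs"), ("u3", "/about")])
print(result)  # [('u1', '/home', 'PT'), ('u2', '/jobs', 'US'), ('u3', '/about', None)]
```

This only works when the profile updates and the page views are partitioned by the same key (the user id), so each task holds exactly the slice of the table that its events need.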


Thank you!