Real-time streams and logs with Storm and Kafka

Some of the biggest issues at the center of analyzing large amounts of data are query flexibility, latency, and fault tolerance. Modern technologies built upon the success of "big data" platforms, such as Apache Hadoop, have made it possible to spread the load of data analysis across commodity machines, but these analyses can still take hours to run and do not respond well to rapidly changing data sets. A new generation of data processing platforms -- which we call "stream architectures" -- converts data sources into streams of data that can be processed and analyzed in real time. This has led to the development of distributed real-time computation frameworks (e.g. Apache Storm) and multi-consumer data integration technologies (e.g. Apache Kafka). Together, they offer a way to do predictable computation on real-time data streams. In this talk, we give an overview of these technologies and how they fit into the Python ecosystem. As part of this presentation, we also released streamparse, a new Python library that makes it easy to debug and run large Storm clusters.
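To make the streaming model concrete, here is a minimal pure-Python sketch of the word-count example commonly used in Storm tutorials: a "spout" emits sentence tuples and downstream "bolts" split and tally them incrementally as each tuple arrives. This is illustrative only; it does not use the real Storm or streamparse APIs.

```python
from collections import Counter

def sentence_spout():
    """Stand-in for a Storm spout: yields a stream of sentence tuples."""
    for sentence in ["the cow jumped over the moon",
                     "the man went to the store"]:
        yield sentence

def split_bolt(stream):
    """Stand-in for a bolt: splits each sentence tuple into word tuples."""
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream, counts):
    """Terminal bolt: updates a running tally as each word tuple arrives."""
    for word in stream:
        counts[word] += 1

counts = Counter()
count_bolt(split_bolt(sentence_spout()), counts)
print(counts["the"])  # "the" appears 4 times across the two sentences
```

In a real Storm topology, each of these stages would run as a separate, horizontally scaled component, with Storm routing tuples between them; the dataflow shape is the same.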

Transcript of Real-time streams and logs with Storm and Kafka

  • 1. Real-time Streams & Logs. Andrew Montalenti, CTO; Keith Bourgoin, Backend Lead. (1 of 47)

  • 2. Agenda: problem space; Aggregating the stream (Storm); Organizing around logs (Kafka).

  • 3. Admin. Our presentations and code: This presentation's slides: This presentation's notes:

  • 4. What is

  • 5. What is: Web content analytics for digital storytellers.

  • 6. Velocity: Average post has
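The "organizing around logs" idea behind Kafka is that a topic is an append-only sequence of messages, and each consumer tracks only its own offset into that sequence, so many consumers can read the same log independently and at their own pace. A minimal sketch of that model (illustrative only; not Kafka's actual API):

```python
class Log:
    """Append-only log: messages are kept in order and never mutated."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)

class Consumer:
    """Each consumer owns its offset; the log itself holds no consumer state."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        """Return all messages appended since the last poll, then advance."""
        batch = self.log.messages[self.offset:]
        self.offset = len(self.log.messages)
        return batch

log = Log()
log.append("pageview:/article/1")
log.append("pageview:/article/2")

fast, slow = Consumer(log), Consumer(log)
print(fast.poll())  # the fast consumer reads both messages immediately
log.append("pageview:/article/3")
print(fast.poll())  # ...and sees only the new one on its next poll
print(slow.poll())  # the slow consumer catches up later, reading all three
```

Because consumption is just an offset, adding a new consumer (say, a batch archiver alongside a real-time dashboard) never disturbs existing ones; this multi-consumer property is what the abstract refers to as "data integration."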