ASPgems - kappa architecture
![Page 1: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/1.jpg)
December 2010
Kappa Architecture: Our Experience
![Page 2: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/2.jpg)
Who am I
CDO at ASPgems
Former President of Hispalinux (Spanish LUG)
Author of “La Pastilla Roja”, the first Spanish book about Free Software.
![Page 3: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/3.jpg)
Menu
A little context about Kappa Architecture
What’s Kappa Architecture
What is not Kappa Architecture
How we implement it
Real use cases with KA
![Page 4: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/4.jpg)
A little context
On July 2, 2014, Jay Kreps coined the term Kappa Architecture in an article for O’Reilly Radar.
![Page 5: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/5.jpg)
Who is Jay Kreps
Jay has been involved in lots of projects:
Author of the essay:
The Log: What every software engineer should know about real-time data's unifying abstraction (12/16/2013)
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
![Page 6: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/6.jpg)
Jay Kreps
Author of the book: I ♥ Logs
![Page 7: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/7.jpg)
Jay Kreps
Involved with projects such as:
Apache Kafka
Apache Samza
Voldemort
Azkaban
Ex-LinkedIn
Now co-founder and CEO of Confluent
![Page 8: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/8.jpg)
Lambda Architecture
It looks something like this:
https://www.mapr.com/developercentral/lambda-architecture
![Page 9: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/9.jpg)
Lambda Architecture
The batch layer provides the following functionality:
Managing the master dataset, an immutable, append-only set of raw data.
Pre-computing arbitrary query functions, called batch views.
https://www.mapr.com/developercentral/lambda-architecture
![Page 10: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/10.jpg)
Lambda Architecture
Serving layer
This layer indexes the batch views so that they can be queried ad hoc with low latency.
Speed layer
This layer accommodates all requests that are subject to low latency requirements. Using fast and incremental algorithms, the speed layer deals with recent data only.
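The way the three layers cooperate at query time can be sketched in a few lines of Python. This is only an illustration under assumptions: the event shape (`page`), the function names and the page-view count are hypothetical and do not come from the slides. In-memory lists stand in for the master dataset and the recent stream.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute the view from the whole immutable master dataset."""
    return Counter(event["page"] for event in master_dataset)

def speed_view(recent_events):
    """Speed layer: incrementally count only events the batch run has not seen yet."""
    view = Counter()
    for event in recent_events:
        view[event["page"]] += 1
    return view

def query(page, batch, speed):
    """Serving side: a query merges the batch view with the real-time view."""
    return batch[page] + speed[page]

# Hypothetical data: three events already in the master dataset, one recent event.
master = [{"page": "/home"}, {"page": "/home"}, {"page": "/pricing"}]
recent = [{"page": "/home"}]

batch = batch_view(master)
speed = speed_view(recent)
home_views = query("/home", batch, speed)
```

Note that the same counting logic appears twice, once per layer. In a real deployment those two copies would typically live in different frameworks.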
![Page 11: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/11.jpg)
Lambda Architecture
Batch layer datasets can be kept in a distributed filesystem, while MapReduce can be used to create batch views that are fed to the serving layer.
The serving layer can be implemented using NoSQL technologies such as HBase, Apache Druid, etc.
Querying can be implemented with technologies such as Apache Drill or Impala.
The speed layer can be realized with data streaming technologies such as Apache Storm or Spark Streaming.
https://www.mapr.com/developercentral/lambda-architecture
![Page 12: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/12.jpg)
Pros of Lambda Architecture
Retain the input data unchanged.
Model data transformations as a series of data states derived from the original input.
Lambda Architecture takes into account the problem of reprocessing data.
This happens all the time: the code will change, and you will need to reprocess all the information. There are lots of reasons for it, and you will have to live with this.
![Page 13: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/13.jpg)
Cons of Lambda Architecture
Maintaining code that needs to produce the same result on two complex distributed systems is painful.
The code for MapReduce is very different from the code for Storm/Apache Spark.
It is not only about different code; it is also about debugging and about interaction with other products (Hive, Oozie, Cascading, etc.).
In the end, it is a problem of different and diverging programming paradigms.
![Page 14: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/14.jpg)
So what is Kappa Architecture
Jay Kreps's proposal is quite simple:
Use Kafka (or another system) that lets you retain the full log of the data you need to reprocess.
When you want to do the reprocessing, start a second instance of your stream processing job that starts processing from the beginning of the retained data, but direct this output data to a new output table.
![Page 15: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/15.jpg)
So what is Kappa Architecture (part II)
When the second job has caught up, switch the application to read from the new table.
Stop the old version of the job, and delete the old output table.
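The four steps can be sketched in plain Python. This is a minimal illustration under assumptions, not the actual implementation: an in-memory list stands in for the retained Kafka log, dictionaries stand in for output tables, and the job functions and table names (`job_v1`, `job_v2`, `output_v1`, `output_v2`) are hypothetical.

```python
# Retained, append-only log (stand-in for a Kafka topic with full retention).
log = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": 7},
]

def job_v1(table, event):
    """Old version of the job: counts events per user."""
    table[event["user"]] = table.get(event["user"], 0) + 1

def job_v2(table, event):
    """New version of the job: sums amounts per user."""
    table[event["user"]] = table.get(event["user"], 0) + event["amount"]

def run_from_beginning(job, log):
    """Start a job instance at offset 0 and write into a fresh output table."""
    table = {}
    for event in log:
        job(table, event)
    return table

# The current job is running and the application reads its table.
tables = {"output_v1": run_from_beginning(job_v1, log)}
live = "output_v1"

# Steps 1-2: the code changed, so a second instance reprocesses the retained
# log from the beginning into a NEW output table.
tables["output_v2"] = run_from_beginning(job_v2, log)

# Step 3: once the second job has caught up, switch the application to it.
previous, live = live, "output_v2"

# Step 4: stop the old job and delete the old output table.
del tables[previous]

result = tables[live]
```

Until the old table is deleted, pointing `live` back at it is enough to roll back to the previous version.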
![Page 16: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/16.jpg)
So what is Kappa Architecture
This architecture looks something like this:
![Page 17: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/17.jpg)
So what is Kappa Architecture
The first benefit is that you only need to reprocess when you change the code.
You can check whether the new version is working correctly and, if not, revert to the old output table.
You can mirror a Kafka topic to HDFS, so you are not limited by the Kafka retention configuration.
You have only one codebase to maintain, on a single framework.
![Page 18: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/18.jpg)
So what is Kappa Architecture
The real advantage is not about efficiency at all (for example, you will need extra temporary storage while reprocessing). It is about allowing your team to develop, test, debug and operate their systems on top of a single processing framework.
![Page 19: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/19.jpg)
What is not Kappa Architecture
It is not a silver bullet that solves every Big Data problem.
It is not a prescribed list of technologies. You can implement it with your favorite frameworks.
It is not a rigid set of rules.
But it helps to keep complex projects simple.
![Page 20: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/20.jpg)
How we use Kappa Architecture
We start working on projects with a complex structure, much like LinkedIn's looked at an early stage. That is very common.
![Page 21: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/21.jpg)
How we use Kappa Architecture
![Page 22: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/22.jpg)
How we use Kappa Architecture
We try to refactor the data flows to fit a Kappa Architecture.
![Page 23: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/23.jpg)
How we use Kappa Architecture
![Page 24: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/24.jpg)
How we use Kappa Architecture
We use Kafka as our Stream Data Platform.
Instead of Samza, we feel more comfortable with Spark Streaming.
At ASPgems we chose Apache Spark as our Analytics Engine, and not only for Spark Streaming.
![Page 25: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/25.jpg)
How we use Kappa Architecture
In the end, Kappa Architecture is a design pattern for us.
We use/clone this pattern in almost all our projects.
We have projects of every size, data volume and speed requirement, and they fit the Kappa Architecture.
![Page 26: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/26.jpg)
Use Cases
![Page 27: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/27.jpg)
Telefónica - MSS
We use KA to calculate near-real-time KPIs and SLAs related to the managed security system.
We simplified the data flow of the input data.
Kafka is the streaming data platform.
As MPP we use Cassandra.
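As an illustration of a near-real-time KPI over a stream, the sketch below computes SLA compliance per one-minute tumbling window. Everything specific here is an assumption made for the example: the event fields (`ts`, `handled_in`), the 60-second window and the 30-second SLA threshold are hypothetical. A plain list stands in for the Kafka stream; the production job would run in the streaming engine instead.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical tumbling-window size
SLA_SECONDS = 30     # hypothetical SLA: an event must be handled within 30 s

def sla_compliance_per_window(events):
    """Return {window_start: fraction of events handled within the SLA}."""
    totals = defaultdict(int)
    within = defaultdict(int)
    for event in events:
        # Assign the event to the window that contains its timestamp.
        window_start = event["ts"] - event["ts"] % WINDOW_SECONDS
        totals[window_start] += 1
        if event["handled_in"] <= SLA_SECONDS:
            within[window_start] += 1
    return {w: within[w] / totals[w] for w in sorted(totals)}

# Hypothetical security events: timestamp and handling time, both in seconds.
events = [
    {"ts": 5, "handled_in": 10},
    {"ts": 42, "handled_in": 45},
    {"ts": 61, "handled_in": 20},
]
kpi = sla_compliance_per_window(events)
```

Because the input log is retained, changing the SLA threshold only requires replaying the same events through the updated function.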
![Page 28: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/28.jpg)
IOT - OBD II
One of our clients installs On Board Devices in its customers' cars.
We implemented an API to get all the information in real time and inject it into Kafka.
The business rules are implemented in a CEP running on Apache Spark Streaming.
As MPP we use Elasticsearch.
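A CEP-style business rule over the device stream could look like the sketch below. The rule itself is an assumption for illustration and is not taken from the project: it raises an alert when a vehicle reports three consecutive readings above an RPM limit. The field names (`vehicle`, `rpm`), the threshold and the rule name are hypothetical, and a list stands in for the Kafka topic.

```python
from collections import defaultdict

RPM_LIMIT = 4000  # hypothetical threshold
CONSECUTIVE = 3   # hypothetical number of consecutive readings that triggers the rule

def detect_over_revving(readings):
    """Yield one alert when a vehicle reaches CONSECUTIVE readings in a row above RPM_LIMIT."""
    streak = defaultdict(int)  # per-vehicle count of consecutive high readings
    for reading in readings:
        vehicle = reading["vehicle"]
        if reading["rpm"] > RPM_LIMIT:
            streak[vehicle] += 1
            # Fire exactly once, when the streak first reaches the limit.
            if streak[vehicle] == CONSECUTIVE:
                yield {"vehicle": vehicle, "rule": "over_revving"}
        else:
            streak[vehicle] = 0  # a normal reading resets the streak

# Hypothetical interleaved readings from two vehicles.
readings = [
    {"vehicle": "car-1", "rpm": 4200},
    {"vehicle": "car-2", "rpm": 4500},
    {"vehicle": "car-1", "rpm": 4300},
    {"vehicle": "car-2", "rpm": 2000},
    {"vehicle": "car-1", "rpm": 4100},
    {"vehicle": "car-1", "rpm": 4400},
]
alerts = list(detect_over_revving(readings))
```

The per-vehicle state (`streak`) is the part a streaming engine would have to keep between micro-batches.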
![Page 29: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/29.jpg)
Insurance Company
We implemented Kappa Architecture to process the click stream in real time and cluster users.
We show the content and offers that best fit each user.
![Page 30: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/30.jpg)
Energy Facility
We implemented Kappa Architecture to process and predict energy consumption.
Our customer's facilities include energy storage systems, and we get all the information about that storage (ultra-capacitors and batteries).
We process this information to calculate the effective lifetime of the components and their degradation.
![Page 31: ASPgems - kappa architecture](https://reader031.fdocuments.net/reader031/viewer/2022021919/587b09371a28abb15c8b4c35/html5/thumbnails/31.jpg)
Questions