Voxxed Days Thessaloniki 2016 - Streaming Engines for Big Data
![Page 1: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/1.jpg)
Streaming Engines for Big Data
Spark Streaming: a case study
Stavros Kontopoulos
Senior Software Engineer @ Lightbend, M.Sc.
21st October 2016, Thessaloniki
#VoxxedDaysThessaloniki
![Page 2: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/2.jpg)
Who Am I?
Fast Data Team Engineer @ Lightbend
OSS contributor (Apache Spark on Mesos) https://github.com/skonto
![Page 3: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/3.jpg)
● A bit of history...
● Streaming Engines for Big Data
  ○ Key concepts - Design Considerations
  ○ Modern analysis of infinite streams
  ○ Streaming Engines Examples
  ○ Which one to use?
● Spark Streaming: A Case Study
  ○ DStream API
  ○ Structured Streaming
![Page 4: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/4.jpg)
Who likes history?
![Page 5: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/5.jpg)
Why Streaming?
![Page 6: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/6.jpg)
Big Data - The story
● One decade ago people started looking into the problem of how to process massive data sets (Velocity, Variety, Volume).
● The Apache Hadoop project appeared at that time and became the golden solution for batch processing on commodity hardware. It later grew into an ecosystem of several other projects: Pig, Hive, HBase etc.
Timeline:
2003: GFS paper
2004: MapReduce paper
2006: Hadoop project, 0.1.0 release
2009: Hadoop sorts 1 Petabyte
2010: HBase, Pig, Hive graduate
2013: Spark on Yarn by Cloudera, Yarn in production
2014: Hadoop 2.4, 2.5, 2.6 releases
2015: Hadoop 2.7 release
... present
![Page 7: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/7.jpg)
Big Data - The story
[Diagram: MAP-REDUCE. Inputs X, Y, Z feed three MAP tasks; a SHUFFLE step groups the intermediate keys (A, B, A); two REDUCE tasks produce the outputs Q and W.]
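The three phases in the diagram can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps on a single machine; the function names and the word-count task are assumptions chosen for the example and are not Hadoop's API.

```python
from collections import defaultdict

def map_phase(document):
    """MAP: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_outputs):
    """SHUFFLE: group the emitted values by key across all mapper outputs."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """REDUCE: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    # One mapper per input split; a real cluster would run these in parallel.
    mapped = [map_phase(doc) for doc in documents]
    return reduce_phase(shuffle_phase(mapped))

counts = word_count(["a b a", "b c", "a"])
```

In a real deployment the shuffle moves data over the network between nodes, which is usually the expensive part of the job.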
![Page 8: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/8.jpg)
Big Data - The story
Hadoop pros/cons
● Batch jobs usually take hours, if not days, to complete; in many applications that is no longer acceptable.
● Traditionally the focus is on throughput rather than latency. Frameworks like Hadoop were designed with that in mind.
● Accuracy is the best you can get.
![Page 9: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/9.jpg)
Big Data - The story
● Giuseppe DeCandia et al., "Dynamo: Amazon's highly available key-value store", changed the database world in 2007.
● NoSQL databases, along with general systems like Hadoop, solve problems that cannot be solved with traditional RDBMSs.
● Technology facts: cheap memory, SSDs, HDDs are the new tape, more CPUs over more powerful CPUs.
![Page 10: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/10.jpg)
Big Data - The story
● Disruptive companies need to utilize ML and the latest information to come up with smart decisions sooner.
● And so we need streaming in the enterprise… We no longer talk about Big Data only; it's Fast Data first.
Searching
Recommendations
Real-time financial activities
Fraud Detection
![Page 11: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/11.jpg)
Big Data - The story
OpsClarity Report Summary:
● 92% plan to increase their investment in stream processing applications in the next year
● 79% plan to reduce or eliminate investment in batch processing
● 32% use real-time analysis to power core customer-facing applications
● 44% agreed that it is tedious to correlate issues across the pipeline
● 68% identified lack of experience and the underlying complexity of new data frameworks as their barrier to adoption
http://info.opsclarity.com/2016-fast-data-streaming-applications-report.html
![Page 12: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/12.jpg)
Key Concepts
![Page 13: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/13.jpg)
Streams
● A stream is a flow of data. The flow consists of ephemeral data elements flowing from a source to a sink.
● Streams become useful when a set of operations/transformations is applied to them.
● A stream can be infinite or finite in size. This translates to the notions of bounded/unbounded data.
![Page 14: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/14.jpg)
Stream Processing
Stream Processing: processing done on an (un)bounded data stream. Not all data are available.
[Diagram: Source → Processing → Sink]
![Page 15: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/15.jpg)
Stream Processing
Multiple Streams
[Diagram: Source 1 and Source 2 → Processing → Sink]
![Page 16: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/16.jpg)
Stream Processing
Processing can be…
● Stream management: connect, iterate...
● Data manipulation: map, flatmap…
● Input/Output
A graph is the abstraction for defining how all the pieces are put together and how data flows between them. Some systems use a DAG.
[Diagram: example processing graph with the nodes DFS, Map, Reduce, Count, Distinct, DB and DFS]
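As a rough illustration of how such a graph is wired together, the sketch below chains a source, a flatmap, a map and a sink using Python generators. The operator names (`source`, `flat_map`, `map_stream`, `sink`) are assumptions made for this example and do not belong to any particular engine.

```python
def source(lines):
    """Source: yields elements one at a time; it could just as well be unbounded."""
    for line in lines:
        yield line

def flat_map(fn, stream):
    """Data manipulation: each input element may produce zero or more outputs."""
    for element in stream:
        for out in fn(element):
            yield out

def map_stream(fn, stream):
    """Data manipulation: exactly one output per input element."""
    for element in stream:
        yield fn(element)

def sink(stream, store):
    """Output: pulling from the last stage drives the whole pipeline."""
    for element in stream:
        store.append(element)

# Wire the graph: source -> flat_map -> map -> sink
collected = []
words = flat_map(str.split, source(["to be", "or not"]))
sink(map_stream(str.upper, words), collected)
```

Nothing runs until the sink starts consuming, so elements flow through the stages one by one rather than being materialized between steps.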
![Page 17: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/17.jpg)
Stream Processing - Parallelism
[Diagram: a partitioner splits the stream coming from the Source across two parallel map tasks, which feed the Sink]
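A minimal sketch of the idea, assuming hash partitioning by key (the slide does not say which partitioning scheme is meant). `partition_for`, `partition_stream` and `parallel_map` are hypothetical helper names; the "parallel" map is executed sequentially here to keep the example self-contained.

```python
import zlib

def partition_for(key, num_partitions):
    """Pick a partition with a stable hash so a key always lands in the same place."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_stream(records, num_partitions):
    """Route each (key, value) record to the partition chosen for its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

def parallel_map(partitions, fn):
    """Apply fn per partition; on a cluster each partition would be its own task."""
    return [[fn(record) for record in part] for part in partitions]

records = [("user1", 1), ("user2", 2), ("user1", 3)]
parts = partition_stream(records, num_partitions=2)
doubled = parallel_map(parts, lambda kv: (kv[0], kv[1] * 2))
```

Keeping all records of a key in one partition is what lets keyed operations (counts, joins) run without coordination between the parallel tasks.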
![Page 18: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/18.jpg)
Stream Processing - Execution Model
Map your graph to an execution plan and run it.
Execution Model Abstractions: Job, Task etc.
Actors: JobManager, TaskManager.
Where do the TaskManager and Tasks run? Threads, nodes etc…
Important: code runs close to the data… The task code is serialized and sent over the network along with any dependencies, and the results are communicated back to the application...
![Page 19: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/19.jpg)
Stream vs Batch Processing
Batch processing is processing done on a finite data set with all data available.
Two types of engines: batch and streaming engines, which can actually be used for both types of processing!
![Page 20: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/20.jpg)
Streaming Applications
User code that materializes streams and applies stream processing.
...
...
![Page 21: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/21.jpg)
Streaming Engines for Big Data
Streaming engines allow building streaming applications:
[Diagram: streaming engine + API → Streaming App]
Streaming engines for Big Data provide in addition:
● A rich ecosystem built around them, for example connectors for common sources, outputs to different sinks etc.
● Fault tolerance, scalability (cluster management support), management of stragglers
● ML, Graph, CEP processing capabilities
![Page 22: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/22.jpg)
Streaming Engines for Big Data
A big data system at minimum needs:
● A data processing framework, e.g. a streaming engine.
● A Distributed File System.
![Page 23: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/23.jpg)
Designing A Streaming Engine
![Page 24: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/24.jpg)
Design Considerations of A Streaming Engine
● Strong consistency. If a machine fails, how are my results affected?
  ○ Exactly-once processing
  ○ Checkpointing
● Appropriate semantics for integrating time. Late data?
● API (Language Support, DAG, SQL Support etc)
![Page 25: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/25.jpg)
Design Considerations of A Streaming Engine
● Execution Model - integration with cluster manager(s)
● Elasticity - Dynamic allocation
● Performance: Throughput vs Latency
● Libraries for CEP, Graph, ML, SQL based processing
![Page 26: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/26.jpg)
Design Considerations of A Streaming Engine
● Deployment modes: local vs cluster mode
● Streaming vs batch mode: does the code look the same?
● Logging
● Local state management
● Support for session state
![Page 27: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/27.jpg)
Design Considerations of A Streaming Engine
● Backpressure
● Off Heap Management
● Caching
● Security
● UI
● CLI env for interactive sessions
![Page 28: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/28.jpg)
State of the Art Stream Analysis
![Page 29: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/29.jpg)
Analyzing Infinite Data Streams
● Recent advances in streaming are a result of pioneering work:
  ○ MillWheel: Fault-Tolerant Stream Processing at Internet Scale, VLDB 2013.
  ○ The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing, Proceedings of the VLDB Endowment, vol. 8 (2015), pp. 1792-1803
![Page 30: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/30.jpg)
Analyzing Infinite Data Streams
● Two cases for processing:
  ○ Single event processing: event transformation, triggering an alarm on an error event
  ○ Event aggregations: summary statistics, group-by, join and similar queries. For example, compute the average temperature for the last 5 minutes from a sensor data stream.
![Page 31: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/31.jpg)
Analyzing Infinite Data Streams
● Event aggregation introduces the concept of windowing with respect to the notion of time selected:
  ○ Event time (the time that events happen): important for most use cases where context and correctness matter at the same time. Example: billing applications, anomaly detection.
  ○ Processing time (the time events are observed during processing): use cases where I only care about what I process in a window. Example: accumulated clicks on a page per second.
  ○ System arrival or ingestion time (the time that events arrive at the streaming system).
● Ideally event time = processing time. In reality there is skew.
![Page 32: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/32.jpg)
Time in Modern Data Stream Analysis
Windows come in different flavors:
● Tumbling windows discretize a stream into non-overlapping windows.
  ○ E.g. report all distinct users every 10 seconds.
● Sliding windows slide over the stream of data.
  ○ E.g. report all distinct users for the last 10 minutes every 1 minute.
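The two window flavors can be sketched as window-assignment functions. This is a simplified illustration, assuming numeric timestamps and half-open windows `[start, start + size)`; the helper names are hypothetical and not taken from any engine.

```python
from collections import defaultdict

def tumbling_window_start(timestamp, size):
    """Each timestamp belongs to exactly one non-overlapping window."""
    return timestamp - (timestamp % size)

def sliding_window_starts(timestamp, size, slide):
    """Starts of every window [start, start + size) that contains the timestamp."""
    start = timestamp - (timestamp % slide)  # latest window that has already begun
    starts = []
    while start > timestamp - size:
        starts.append(start)
        start -= slide
    return sorted(starts)

def distinct_users_per_tumbling_window(events, size):
    """events are (timestamp, user) pairs; count distinct users per window."""
    users = defaultdict(set)
    for timestamp, user in events:
        users[tumbling_window_start(timestamp, size)].add(user)
    return {start: len(seen) for start, seen in sorted(users.items())}

events = [(1, "ann"), (4, "bob"), (9, "ann"), (12, "cat"), (19, "cat")]
distinct = distinct_users_per_tumbling_window(events, size=10)
overlapping = sliding_window_starts(12, size=10, slide=5)
```

With a size of 10 and a slide of 5, every element falls into two windows, which is why sliding windows cost more state than tumbling ones.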
![Page 33: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/33.jpg)
Analyzing Infinite Data Streams
● Watermarks: indicate that no elements with a timestamp older than or equal to the watermark timestamp should arrive for the specific window of data.
  ○ Allow us to mark late data. Late data can either be added to the window or discarded.
● Triggers: decide when the window is evaluated or purged.
  ○ Allow complex logic for window processing.
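A minimal sketch of how a watermark and a trigger interact, under two stated assumptions: the watermark is a heuristic that trails the largest event time seen by a fixed `max_out_of_orderness`, and late data is discarded rather than merged. The class and its attributes are hypothetical names for illustration only.

```python
class EventTimeWindowCounter:
    """Counts events per tumbling event-time window, firing when the watermark passes."""

    def __init__(self, window_size, max_out_of_orderness):
        self.window_size = window_size
        self.max_out_of_orderness = max_out_of_orderness
        self.watermark = float("-inf")
        self.open_windows = {}  # window start -> running count
        self.results = []       # (window start, count) emitted by the trigger
        self.late = []          # event times that arrived after their window fired

    def on_event(self, event_time):
        start = event_time - (event_time % self.window_size)
        if start + self.window_size <= self.watermark:
            # The window was already evaluated: this is late data (discarded here).
            self.late.append(event_time)
            return
        self.open_windows[start] = self.open_windows.get(start, 0) + 1
        # Heuristic watermark: assume nothing older than this will still arrive.
        self.watermark = max(self.watermark, event_time - self.max_out_of_orderness)
        # Trigger: evaluate and purge every window that ends at or before the watermark.
        for window_start in sorted(self.open_windows):
            if window_start + self.window_size <= self.watermark:
                self.results.append((window_start, self.open_windows.pop(window_start)))

counter = EventTimeWindowCounter(window_size=10, max_out_of_orderness=2)
for event_time in [1, 5, 11, 8, 13, 3, 25]:
    counter.on_event(event_time)
```

In this run the out-of-order event at time 8 is still counted because the watermark has not yet passed its window, while the event at time 3 arrives after window [0, 10) has fired and is recorded as late.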
![Page 34: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/34.jpg)
Analyzing Infinite Data Streams
● Apache Beam is the open source successor of Google’s Dataflow.
● It is becoming the standard API for streaming. It provides the advanced semantics that current streaming applications need.
![Page 35: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/35.jpg)
Streaming Engines for Big Data
OSS:
● Apache Flink
● Apache Spark Streaming
● Apache Storm
● Apache Samza
● Apache Apex
● Apache Kafka Streams (Confluent Platform)
● Akka Streams/Gearpump
● Apache Beam
Cloud:
● Amazon Kinesis
● Google Dataflow
![Page 36: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/36.jpg)
Streaming Engines for Big Data - Pick one
Many criteria: use case at hand, existing infrastructure, performance, customer support, cloud vendor, features
We recommend looking first at:
● Apache Flink for low latency and advanced semantics
● Apache Spark for its maturity and rich set of functionality: ML, SQL, GraphX
● Apache Kafka Streams for simple data transformations from and back to Kafka topics
![Page 37: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/37.jpg)
Apache Spark 2.0
![Page 38: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/38.jpg)
Spark in a Nutshell
Apache Spark: a memory-optimized distributed computing framework.
Supports caching of data in memory to speed up computations.
![Page 39: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/39.jpg)
Spark in a Nutshell - RDDs
Spark represents a bounded dataset as an RDD (Resilient Distributed Dataset).
An RDD can be seen as an immutable distributed collection.
Two types of operations can be applied to an RDD: transformations like map and actions like collect.
Transformations are lazy, while actions trigger computation on the cluster.
Operations like groupBy cause a shuffle of data across the network.
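The lazy-transformation / eager-action split can be illustrated with a toy, single-machine collection. `LazyDataset` is a hypothetical class written for this example; it only mimics the shape of `map`, `filter` and `collect` and is not Spark's RDD implementation.

```python
class LazyDataset:
    """Immutable collection that records transformations and runs them on an action."""

    def __init__(self, data, steps=()):
        self._data = tuple(data)    # the source never changes
        self._steps = tuple(steps)  # recorded (kind, function) pairs

    def map(self, fn):
        # Transformation: returns a new dataset, computes nothing yet.
        return LazyDataset(self._data, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also lazy.
        return LazyDataset(self._data, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: only now are the recorded steps executed, in order.
        result = list(self._data)
        for kind, fn in self._steps:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result

calls = []

def traced_double(x):
    calls.append(x)  # lets us observe when the function really runs
    return x * 2

dataset = LazyDataset([1, 2, 3]).map(traced_double).filter(lambda x: x > 2)
calls_before_action = list(calls)  # still empty: nothing has been computed
result = dataset.collect()
```

Deferring the work until an action is called is what gives an engine the chance to plan the whole chain of steps before running anything.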
![Page 40: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/40.jpg)
Spark in a Nutshell - Deployment Mode
![Page 41: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/41.jpg)
Spark in a Nutshell - Basic Components
![Page 42: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/42.jpg)
Spark Batch Sample Word Count
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
![Page 43: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/43.jpg)
Spark in a nutshell - Key Features
Dynamic Allocation
Memory management (Project Tungsten + off heap operations)
Cluster managers: Yarn, StandAlone, Mesos
Scala, Python, Java, R
Micro-batch engine
SQL API, ML library, GraphX
Monitoring UI
![Page 44: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/44.jpg)
Spark Streaming
Two flavors of Streaming:
● DStream API Spark 1.X -> mature API
● Structured Streaming (Alpha), Spark 2.0 -> Don’t go to production yet
“Based on Spark SQL. User does not need to reason about streaming end to end”
![Page 45: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/45.jpg)
Spark Streaming DStream API
Discretizes the stream based on batchDuration (batch interval), which is configured once.
Provides exactly-once semantics with KafkaDirect for DStream, or with WAL enabled for reliable receivers/drivers, plus checkpointing for driver context recovery.
Many of the transformations and actions available on an RDD are available on a DStream as well.
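Discretization into micro-batches can be sketched without Spark. The example below assumes events carry an arrival time in seconds and groups them by a fixed `batch_duration`; `discretize` and `report_max` are hypothetical helpers, with `report_max` loosely modeled on the per-batch maximum used later in the talk's example.

```python
def discretize(events, batch_duration):
    """Group (arrival_time, value) pairs into consecutive micro-batches."""
    batches = {}
    for arrival_time, value in events:
        batch_index = int(arrival_time // batch_duration)
        batches.setdefault(batch_index, []).append(value)
    # Simplification: intervals with no data produce no batch at all.
    return [batches[index] for index in sorted(batches)]

def report_max(batch):
    """Per-batch computation: parse each record as an int and take the maximum."""
    return max(int(value) for value in batch)

events = [(0.2, "3"), (0.9, "7"), (1.1, "4"), (2.5, "10"), (2.7, "9")]
micro_batches = discretize(events, batch_duration=1.0)
maxima = [report_max(batch) for batch in micro_batches]
```

Each micro-batch is then processed like a small bounded dataset, which is why batch-style operations carry over to the streaming API.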
![Page 46: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/46.jpg)
Spark Structured Streaming
● Integrates with the DF and Dataset API (Spark SQL) for structured queries
● Allows for end-to-end exactly-once for specific sources/sinks (HDFS/S3)
  ○ Requires replayable sources and idempotent sinks
● Input is sent to a query and the output of the query is written to a sink.
Two types of output implemented:
● Complete Mode - The entire updated Result Table will be written to the external storage. It is up to the storage connector to decide how to handle writing of the entire table.
● Append Mode - Only the new rows appended in the Result Table since the last trigger will be written to the external storage. This is applicable only on the queries where existing rows in the Result Table are not expected to change.
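The difference between the two output modes, and why an idempotent sink makes replays safe, can be shown with a toy model. Everything here is an assumption for illustration: `KeyedSink` and `run_query` are hypothetical names, the Result Table is a plain list, and each input batch stands for one trigger.

```python
class KeyedSink:
    """Idempotent sink: writing the same batch id twice leaves the same state."""

    def __init__(self):
        self.batches = {}

    def write(self, batch_id, rows):
        self.batches[batch_id] = list(rows)  # overwrite, never duplicate

def run_query(input_batches, sink, mode):
    """Feed one batch per trigger and write the output according to the mode."""
    result_table = []
    for batch_id, new_rows in enumerate(input_batches):
        result_table.extend(new_rows)
        if mode == "complete":
            sink.write(batch_id, result_table)  # the entire updated table
        elif mode == "append":
            sink.write(batch_id, new_rows)      # only rows added since last trigger
        else:
            raise ValueError("unknown output mode: " + mode)
    return result_table

append_sink, complete_sink = KeyedSink(), KeyedSink()
run_query([["a"], ["b", "c"]], append_sink, "append")
run_query([["a"], ["b", "c"]], complete_sink, "complete")

# Simulate a replay of batch 1 after a failure: the sink state does not change.
append_sink.write(1, ["b", "c"])
```

Because the source can replay batch 1 and the sink overwrites by batch id, processing the batch twice has the same effect as processing it once.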
![Page 47: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/47.jpg)
Spark Structured Streaming - Not Yet Implemented
● More Sources/Sinks
● Watermarks
● Late data management
● State Sessions
![Page 48: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/48.jpg)
DStream API Example
reportMax rdd.map(data => data.toInt).max()
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
![Page 49: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/49.jpg)
DStream API Example
Checkpointing
reportMax rdd.map(data => data.toInt).max()
get or create the streaming context
All streaming code goes here
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
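The "get or create" pattern behind checkpoint recovery can be sketched in plain Python. This is not the code from the linked repository and not Spark's API: `get_or_create`, `checkpoint` and `process` are hypothetical helpers, and the checkpoint is a small JSON file holding the running maximum and the index of the next batch.

```python
import json
import os
import tempfile

def get_or_create(checkpoint_path, create_state):
    """Restore state from the checkpoint if one exists, otherwise build it fresh."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            return json.load(handle)
    return create_state()

def checkpoint(checkpoint_path, state):
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, checkpoint_path)

def process(batches, checkpoint_path):
    """All streaming logic lives here; it resumes from the last checkpointed batch."""
    state = get_or_create(checkpoint_path, lambda: {"next_batch": 0, "max": None})
    for index in range(state["next_batch"], len(batches)):
        batch_max = max(batches[index])
        state["max"] = batch_max if state["max"] is None else max(state["max"], batch_max)
        state["next_batch"] = index + 1
        checkpoint(checkpoint_path, state)
    return state

batches = [[1, 5], [3], [9, 2]]
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "checkpoint.json")
    first_run = process(batches[:2], path)  # "crash" after two batches
    second_run = process(batches, path)     # restart: resumes at the third batch
```

The restart skips the batches that were already reflected in the checkpoint, which is the property driver recovery relies on.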
![Page 50: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/50.jpg)
Spark SQL - Batch
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
![Page 51: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/51.jpg)
Structured Streaming
The mean code is the same as in the batch case
readStream instead of read
writeStream instead of write
Session creation is the same as in the batch case
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
![Page 52: Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data](https://reader034.fdocuments.net/reader034/viewer/2022042619/586fdda71a28ab18428b68b9/html5/thumbnails/52.jpg)
Thank You!
Questions?