GNW03: Stream Processing with Apache Kafka by Gwen Shapira
I'll tell you about:
• What is stream processing and why it matters
• What is Apache Kafka
• How Kafka helps stream processing
Stay awake for this part
Stream Processing Paradigm
• Data is generated at its own rate as "streams"
• We can process as much or as little as we want
• Continuously
• Results are available in real time
• But nothing waits for specific results
• Time for data availability?
  • More than "a few ms"
  • Less than "hours"
This is the world-changing bit
• Most of the business is…
  • Not urgent enough to require an immediate response
  • But can't wait for the next day
• "Streams of events" represent something fundamental
• The same way relational tables are fundamental
But logs are also a STREAM of events. And Kafka stores those logs,
allowing us to read the past and keep getting updates about the future.
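To make the "log as a stream" idea concrete, here is a minimal in-memory sketch, assuming a single partition and ignoring persistence, replication and consumer groups. The `Log` and `Consumer` classes are hypothetical illustrations, not Kafka's client API: each record gets a sequential offset, and a consumer that remembers its own position can first replay history and then keep picking up new records.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Log:
    """Toy append-only log (hypothetical stand-in for one Kafka partition).

    Records are only ever appended, never modified or deleted.
    """
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset: int) -> List[Tuple[int, Any]]:
        # Every (offset, record) pair at or after `offset`.
        return [(i, self.records[i]) for i in range(offset, len(self.records))]


class Consumer:
    """Hypothetical reader that tracks its own position in the log."""

    def __init__(self, log: Log, start_offset: int = 0) -> None:
        self.log = log
        self.position = start_offset

    def poll(self) -> List[Any]:
        # Return everything written since the last poll, then advance.
        batch = self.log.read_from(self.position)
        if batch:
            self.position = batch[-1][0] + 1
        return [record for _, record in batch]


log = Log()
log.append("signup:alice")
log.append("click:alice")

consumer = Consumer(log)   # offset 0: start by reading the past
print(consumer.poll())     # ['signup:alice', 'click:alice']

log.append("click:bob")    # a new event arrives later
print(consumer.poll())     # ['click:bob']
```

Because old records are never mutated, a second consumer created with `start_offset=0` would see exactly the same history; that is what lets independent readers replay the past at their own pace.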
Method 2: The Stream Processing Frameworks
• Storm
• Spark
• Flink
• Samza
• Apex
• NiFi
• StreamBase
• InfoSphere Streams
• Google Dataflow (AKA Beam)
• I can go on for 5 more pages…
What do I mean by "too complex"?
[Diagram: an example architecture. Components shown: clients ("Swipe here!"), web app, Kafka, Flume agents, HDFS event sink, Solr sink, Spark Streaming with a local cache, HBase/memory, Solr search, and two Hadoop clusters (I and II, storage and processing) running HDFS, Hive/Impala, MapReduce and Spark. Labeled tasks: fetching & updating profiles; adjusting NRT stats; batch-time adjustments; automated & manual analytical adjustments and pattern detection; automated & manual review of NRT changes and counters.]
Why so many moving parts?
We needed…
• HBase to handle complex state
• HDFS, because Spark requires it
• An ingest layer
• A batch layer to handle re-calculations
No Framework
• It is just a library that does transformations
• We can add languages on top
• Kafka does everything we needed the framework to do
• You don't need a "framework" to run queries, so why would you need one to run queries continuously?
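The argument that continuous queries need no special framework can be sketched, as an analogy only, with plain Python generators: the same composed transformation runs over a finite list (a one-off query) or over an unbounded generator (a continuous query). `filter_events`, `map_events`, `pipeline` and the event shape are hypothetical names, not any stream-processing library's API, and the sketch has no state, fault tolerance or partitioning.

```python
import itertools
from typing import Callable, Iterable, Iterator


def filter_events(pred: Callable) -> Callable[[Iterable], Iterator]:
    # Hypothetical transformation: lazily keep events matching `pred`.
    return lambda events: (e for e in events if pred(e))


def map_events(fn: Callable) -> Callable[[Iterable], Iterator]:
    # Hypothetical transformation: lazily apply `fn` to each event.
    return lambda events: (fn(e) for e in events)


def pipeline(*steps: Callable[[Iterable], Iterator]) -> Callable[[Iterable], Iterator]:
    # Compose transformations left to right; nothing runs until iterated.
    def run(events: Iterable) -> Iterator:
        for step in steps:
            events = step(events)
        return iter(events)
    return run


# The "query": purchases over 100, converted to cents.
big_purchases = pipeline(
    filter_events(lambda e: e["amount"] > 100),
    map_events(lambda e: e["amount"] * 100),
)

# Run once over a finite batch...
print(list(big_purchases([{"amount": 50}, {"amount": 150}, {"amount": 200}])))
# [15000, 20000]

# ...or continuously over an unbounded source (amounts 60, 120, 180, ...).
endless = ({"amount": 60 * n} for n in itertools.count(1))
print(list(itertools.islice(big_purchases(endless), 3)))
# [12000, 18000, 24000]
```

Because every step is lazy, the unbounded source is only consumed as far as the caller asks; the query definition itself does not change between the batch and continuous cases.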
We can convert tables to streams and back:
• Stream -> Apply -> Table
• Table -> Change Capture -> Stream
This is called Table-Stream Duality.
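A toy sketch of the duality, assuming a table is a Python dict and a stream is a list of `(key, value)` change events in which `None` marks a deletion. `apply_stream` and `capture_changes` are hypothetical helpers, not a Kafka API; the point is that applying the captured changes to the old table reproduces the new one.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change-event shape: (key, new_value); None means "deleted".
Change = Tuple[str, Optional[int]]


def apply_stream(changes: List[Change],
                 table: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Stream -> Apply -> Table: replay change events into a key/value table."""
    result = dict(table or {})  # copy, so the input table is left untouched
    for key, value in changes:
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


def capture_changes(before: Dict[str, int], after: Dict[str, int]) -> List[Change]:
    """Table -> Change Capture -> Stream: diff two snapshots into change events."""
    changes: List[Change] = []
    for key, value in after.items():
        if before.get(key) != value:      # inserted or updated
            changes.append((key, value))
    for key in before:
        if key not in after:              # deleted
            changes.append((key, None))
    return changes


v1 = apply_stream([("alice", 1), ("bob", 1)])
v2 = apply_stream([("alice", 2), ("bob", None)], v1)
print(v2)                             # {'alice': 2}

diff = capture_changes(v1, v2)
print(diff)                           # [('alice', 2), ('bob', None)]
print(apply_stream(diff, v1) == v2)   # True
```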