AWS Webcast - Amazon Kinesis and Apache Storm
Transcript of AWS Webcast - Amazon Kinesis and Apache Storm
![Page 1: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/1.jpg)
© 2015 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed
in whole or in part without the express consent of Amazon.com, Inc.
CLICKSTREAM ANALYTICS –
AMAZON KINESIS AND
APACHE STORM
![Page 2: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/2.jpg)
Agenda
Clickstream Analytics
Data Ingestion
Amazon Kinesis
Data Processing
Apache Storm
Amazon EMR
Q & A
![Page 3: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/3.jpg)
Clickstream Analytics in Real-time
![Page 4: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/4.jpg)
Clickstream Analytics
From Wikipedia
“… clicks anywhere in the webpage or application, the
action is logged on …”
“… useful for web activity analysis, software testing,
market research …”
It’s all about People & Products !!!
![Page 5: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/5.jpg)
Clickstream Analytics in Real-time
Ingestion
Files to Events
Processing
Batch to Continuous
Consumption
Reports to Alerts
![Page 6: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/6.jpg)
Real-Time Analytics
Real-time Ingest
• Highly Scalable
• Durable
• Elastic
• Replay-able Reads
Continuous Processing Framework
• Load-balancing incoming streams
• Fault-tolerance, Checkpoint / Replay
• Elastic
• Enable multiple apps to process in parallel
Continuous data flow
Low end-to-end latency
Continuous, real-time workloads
![Page 7: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/7.jpg)
Data Ingestion
![Page 8: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/8.jpg)
Global top-10
foo-analysis.com
Starting simple...
![Page 9: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/9.jpg)
Global top-10
Elastic Beanstalk
foo-analysis.com
Distributing the workload…
![Page 10: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/10.jpg)
Global top-10
Elastic Beanstalk
foo-analysis.com
Local top-10
Local top-10
Local top-10
Or using an Elastic Data Broker…
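The local top-10 to global top-10 step on these slides can be sketched in plain Java. This is a minimal illustration, not code from the webcast: the class `TopNMerger` and its method are hypothetical names, and it assumes every worker reports its full per-key counts. If workers sent only their own top-10 lists, the merged result would be exact only when each key is always routed to the same worker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: merges per-worker counts into one global top-N list. */
final class TopNMerger {

    /** Sums the counts reported by each worker, then keeps the n largest totals. */
    static List<Map.Entry<String, Long>> globalTopN(List<Map<String, Long>> localCounts, int n) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> local : localCounts) {
            local.forEach((key, count) -> merged.merge(key, count, Long::sum));
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(merged.entrySet());
        // Highest count first; ties are broken alphabetically so the order is stable.
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        List<Map<String, Long>> locals = List.of(
                Map.of("google.com", 5L, "bing.com", 2L),
                Map.of("google.com", 1L, "yahoo.com", 4L));
        System.out.println(globalTopN(locals, 2));
    }
}
```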
![Page 11: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/11.jpg)
Global top-10
Elastic Beanstalk
foo-analysis.com
[Diagram labels: Kinesis, Stream, Shard, Data Record, Partition Key, Sequence Number (14, 17, 18, 21, 23), Worker, My top-10]
Amazon Kinesis – Managed Stream
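How a partition key selects a shard can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and delivers the record to the shard whose hash key range contains that value. The sketch assumes the shards split the hash space into equal ranges, which a real stream need not do after shards are split or merged; `ShardRouter` is a hypothetical name and none of this is the Kinesis client API.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of partition-key routing; not the Kinesis client API. */
final class ShardRouter {
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128); // 2^128

    /** MD5 of the partition key, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /** Index of the shard whose range holds the key, assuming shardCount equal ranges. */
    static int shardFor(String partitionKey, int shardCount) {
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        return Math.min(index, shardCount - 1); // the last shard absorbs the remainder
    }

    public static void main(String[] args) {
        System.out.println("user-42 -> shard " + shardFor("user-42", 4));
    }
}
```

Because the mapping depends only on the key, all records with the same partition key land in the same shard and keep their relative order there.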
![Page 12: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/12.jpg)
[Diagram: data sources send records through the AWS Endpoint into a stream of shards (Shard 1, Shard 2, … Shard N) spread over three Availability Zones. Apps 1 to 4 ([Data Archive], [Metric Extraction], [Sliding Window Analysis], [Machine Learning]) read the stream; the targets shown are S3, DynamoDB, Redshift and EMR.]
Amazon Kinesis – Common Data Broker
![Page 13: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/13.jpg)
Amazon Kinesis – Distributed Streams
From batch to continuous processing
Scale UP or DOWN without losing sequencing
Workers can replay records for up to 24 hours
Scale up to GB/sec without losing durability
Records stored across multiple availability zones
Multiple parallel Kinesis Apps
RDBMS, S3, Data Warehouse
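The combination of replayable reads and multiple parallel applications can be illustrated with a small in-memory stand-in for one shard. Everything here is an assumption made for illustration: `ReplayableShard` is a hypothetical class, sequence numbers are simplified to list indexes, and the retention window is not modelled. The point is only that each application keeps its own checkpoint, so one reader can rewind without affecting another.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for one shard; not the Kinesis API. */
final class ReplayableShard {
    private final List<String> records = new ArrayList<>();
    private final Map<String, Integer> checkpoints = new HashMap<>();

    /** Appends a record and returns its sequence number (simplified to the list index). */
    int put(String data) {
        records.add(data);
        return records.size() - 1;
    }

    /** Returns up to limit records after the app's checkpoint, without advancing it. */
    List<String> read(String app, int limit) {
        int from = checkpoints.getOrDefault(app, 0);
        int to = Math.min(records.size(), from + limit);
        return new ArrayList<>(records.subList(from, to));
    }

    /** Marks count more records as processed by this app only. */
    void checkpoint(String app, int count) {
        int next = Math.min(records.size(), checkpoints.getOrDefault(app, 0) + count);
        checkpoints.put(app, next);
    }

    /** Rewinds one app to the oldest retained record so it can replay. */
    void rewind(String app) {
        checkpoints.put(app, 0);
    }

    public static void main(String[] args) {
        ReplayableShard shard = new ReplayableShard();
        shard.put("click-1");
        shard.put("click-2");
        shard.checkpoint("archive", 1);
        System.out.println(shard.read("archive", 10)); // archive continues after its checkpoint
        System.out.println(shard.read("metrics", 10)); // metrics still sees every record
    }
}
```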
![Page 14: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/14.jpg)
Data Processing
![Page 15: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/15.jpg)
Clickstream – Real-time and Batch
[Diagram labels: Clickstream, Real Time, Batch, Streaming Analytics, Spark, Storm, KCL, Notifications & Alerts, Dashboards/visualizations, APIs, Data Archive, Batch Analysis, DW, Hadoop, Deep Learning, Dashboards/visualizations]
![Page 16: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/16.jpg)
Processing Stream in real-time
![Page 17: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/17.jpg)
Storm Concepts
Streams
Unbounded sequence of tuples
Spout
Source of Stream e.g. Read from Twitter streaming API
Bolts
Processes input streams and produces new streams e.g. Functions, Filters, Aggregation, Joins
Topologies
Network of spouts and bolts
![Page 18: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/18.jpg)
Storm Architecture
[Diagram: the Master Node runs Nimbus; three Zookeeper nodes provide Cluster Coordination; each Supervisor launches Worker processes.]
![Page 19: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/19.jpg)
Apache Storm
Guaranteed data processing
Horizontal scalability
Fault-tolerance
Integration with queuing system
Higher level abstractions
![Page 20: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/20.jpg)
Demo: Real time stream processing
![Page 21: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/21.jpg)
Real-time: Event-based processing
[Diagram labels: Producer, Amazon Kinesis, KinesisStormSpout, Apache Storm, ElastiCache (Redis), Node.js, Client (D3)]
http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a-Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache
![Page 22: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/22.jpg)
Creating a Storm Topology
KinesisSpoutConfig(streamName, zookeeperEndpoint)
  .withZookeeperPrefix(zookeeperPrefix)
  .withInitialPositionInStream(initialPositionInStream)
  .withRegion(Regions.fromName(regionName));
…
// The last argument of setSpout/setBolt is the parallelism hint.
builder.setSpout("Kinesis", spout, 2);
// Shuffle grouping: tuples are spread across the Parse tasks.
builder.setBolt("Parse", new ParseReferrerBolt(), 6).shuffleGrouping("Kinesis");
// Fields grouping: tuples with the same "referrer" go to the same Count task.
builder.setBolt("Count", new RollingCountBolt(5, 2, elasticCacheRedisEndpoint), 6)
  .fieldsGrouping("Parse", new Fields("referrer"));
..
StormSubmitter.submitTopology(topologyName, topoConf, builder.createTopology());
KinesisStormSpout
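The difference between the two groupings used in this topology can be sketched without Storm on the classpath. This is a simplified model under stated assumptions: `GroupingSketch` is a hypothetical class, the hash used by Storm itself may differ from `String.hashCode()`, and Storm's shuffle grouping also balances load across tasks, which plain random choice only approximates.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical model of two Storm stream groupings; not the Storm API. */
final class GroupingSketch {

    /** Fields grouping: the same field value always maps to the same task index. */
    static int fieldsGrouping(String fieldValue, int taskCount) {
        return Math.floorMod(fieldValue.hashCode(), taskCount);
    }

    /** Shuffle grouping (approximated): any task may receive the tuple. */
    static int shuffleGrouping(int taskCount) {
        return ThreadLocalRandom.current().nextInt(taskCount);
    }

    public static void main(String[] args) {
        int tasks = 6; // matches the parallelism hint of the Count bolt on the slide
        System.out.println("google.com -> count task " + fieldsGrouping("google.com", tasks));
        System.out.println("any tuple  -> parse task " + shuffleGrouping(tasks));
    }
}
```

Routing by referrer is what lets each Count task keep a complete running total for its own referrers; with shuffle grouping the counts for one referrer would be scattered over all six tasks.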
![Page 23: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/23.jpg)
Sliding window using Tick Tuple
…
public void execute(Tuple tuple) {
  if (TupleHelpers.isTickTuple(tuple)) {
    LOG.debug("Received tick tuple, triggering emit of current window counts");
    emitCurrentWindowCounts();
  } else {
    countObjAndAck(tuple);
  }
}
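The bookkeeping behind a tick-driven sliding window can be sketched in plain Java. Storm delivers tick tuples at the interval configured with `topology.tick.tuple.freq.secs`; what follows is a hypothetical slot-based counter, not the demo's `RollingCountBolt`, and it assumes the window is made of a fixed number of tick-sized slots.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical slot-based sliding window counter; not the demo's RollingCountBolt. */
final class SlidingWindowCounter {
    private final Map<String, long[]> slotsByKey = new HashMap<>();
    private final int slotCount;
    private int head = 0; // index of the slot currently being filled

    SlidingWindowCounter(int slotCount) {
        this.slotCount = slotCount;
    }

    /** Called for every ordinary tuple: count it in the current slot. */
    void count(String key) {
        slotsByKey.computeIfAbsent(key, k -> new long[slotCount])[head]++;
    }

    /** Called on every tick: total the window, then advance and clear the oldest slot. */
    Map<String, Long> tick() {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, long[]> entry : slotsByKey.entrySet()) {
            long sum = 0;
            for (long c : entry.getValue()) {
                sum += c;
            }
            totals.put(entry.getKey(), sum);
        }
        head = (head + 1) % slotCount;
        for (long[] slots : slotsByKey.values()) {
            slots[head] = 0; // this slot held the oldest data and is reused next
        }
        return totals;
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(3);
        counter.count("google.com");
        counter.count("google.com");
        System.out.println(counter.tick());
    }
}
```

Each count stays in the reported totals for `slotCount` ticks and then drops out, which is the sliding behaviour the slide's `emitCurrentWindowCounts()` call relies on.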
![Page 24: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/24.jpg)
Using Redis as an Event relay
for (Entry<Object, Long> entry : counts.entrySet()) {
  …
  msg.put("name", referrer);
  msg.put("time", currentEPOCH);
  msg.put("count", count);
  …
  jedis.publish("pubsubCounters", msg.toString());
}
ElastiCache(Redis)
![Page 25: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/25.jpg)
Node.js – Pub/Sub to Server-Sent Events
function ticker(req, res) {
  …
  subscriber.subscribe("pubsubCounters");
  subscriber.on("message", function(channel, message) {
    res.json(message);
    …
  res.json = function(obj) { res.write("data: " + obj + "\n\n"); }
}
connect(){
  ...
  if (req.url == '/eventCounters') { ticker(req, res); }
Node.js
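The `"data: " + obj + "\n\n"` string written by the server above is the Server-Sent Events wire format: each payload line is prefixed with `data: ` and a blank line ends the message. A minimal sketch of that framing follows, written in Java to match the Storm examples; `SseFrame` is a hypothetical helper and not part of the demo.

```java
/** Hypothetical helper that frames a payload as one Server-Sent Events message. */
final class SseFrame {

    /** Prefixes every payload line with "data: " and ends the message with a blank line. */
    static String format(String payload) {
        StringBuilder frame = new StringBuilder();
        for (String line : payload.split("\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        return frame.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(format("{\"name\":\"google.com\",\"count\":3}"));
    }
}
```

For a single-line JSON message this produces the same bytes as the slide's `res.write` call; the per-line prefix only matters when a payload contains newlines.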
![Page 26: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/26.jpg)
Visualizing the events in Client
var source = new EventSource('/ticker');
source.addEventListener('message', tick);
function tick(e) {
  if (e) {
    var eventData = JSON.parse(e.data);
    window[eventData.name].push([{ time: eventData.time, y: eventData.count }]);
Client(D3)
![Page 27: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/27.jpg)
Amazon EMR
Processing Streams with Hadoop
![Page 28: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/28.jpg)
Amazon EMR?
Map-Reduce engine
Integrated with tools
Hadoop-as-a-service
Massively parallel
Cost effective AWS wrapper
Integrated to AWS services
Introduction to Amazon EMR
![Page 29: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/29.jpg)
Master instance group
Core instance group
Task instance group
HDFS HDFS
Amazon S3
Amazon EMR - Architecture
Master instance
Controls the cluster
Core instance
Life of cluster
DataNode and TaskTracker daemons
Task instances
Added or removed to perform work (Spot instances)
S3 as underlying ‘file system’
![Page 30: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/30.jpg)
Analyzing Kinesis using Amazon EMR
[Diagram labels, Offline Analysis: Producer, Amazon Kinesis, Kinesis Application, S3, EMR]
[Diagram labels, Ad-hoc Analysis: Amazon Kinesis, EMR, Hive, Pig, Spark, MapReduce]
![Page 31: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/31.jpg)
Demo: Stream processing with Spark
![Page 32: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/32.jpg)
Spark Streaming and Kinesis
Launch an EMR cluster with Spark
http://blogs.aws.amazon.com/bigdata/post/Tx15AY5C50K70RV/Installing-Apache-Spark-on-an-Amazon-EMR-Cluster
Spark Streaming
http://spark.apache.org/docs/1.2.0/streaming-programming-guide.html
Spark Streaming Kinesis integration
http://spark.apache.org/docs/1.2.0/streaming-kinesis-integration.html
![Page 33: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/33.jpg)
Kinesis Word Count Example
private object KinesisWordCountASL extends Logging {
  …
  val sparkConfig = new SparkConf().setAppName("KinesisWordCount")
  val ssc = new StreamingContext(sparkConfig, batchInterval)
  val unionStreams = ssc.union(kinesisStreams)
  /* Convert each line of Array[Byte] to String, split into words, and count them */
  val words = unionStreams.flatMap(byteArray => new String(byteArray).split(" "))
  /* Map each word to a (word, 1) tuple so we can reduce/aggregate by key. */
  val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
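What the `flatMap` and `reduceByKey` steps compute for a single batch can be shown with Java streams and no Spark dependency. This is a sketch under assumptions: `BatchWordCount` is a hypothetical class, records are decoded as UTF-8, and empty tokens are dropped, which the Scala excerpt does not do.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical single-batch word count; mirrors the Spark steps without using Spark. */
final class BatchWordCount {

    /** Decodes each record, splits it into words, and counts occurrences per word. */
    static Map<String, Long> count(List<byte[]> records) {
        return records.stream()
                // flatMap step: one record becomes many words
                .flatMap(bytes -> Arrays.stream(new String(bytes, StandardCharsets.UTF_8).split(" ")))
                .filter(word -> !word.isEmpty())
                // reduceByKey step: group equal words and count each group
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<byte[]> batch = List.of(
                "to be or".getBytes(StandardCharsets.UTF_8),
                "not to be".getBytes(StandardCharsets.UTF_8));
        System.out.println(count(batch));
    }
}
```

In Spark Streaming the same computation runs once per batch interval over the records received from all Kinesis shards in that interval.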
![Page 34: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/34.jpg)
Amazon Kinesis with Apache Storm:
http://d0.awsstatic.com/whitepapers/building-sliding-window-analysis-of-clickstream-data-kinesis.pdf
Amazon Kinesis with Amazon EMR
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-kinesis.html
Amazon Kinesis with Apache Spark
http://spark.apache.org/docs/1.2.0/streaming-kinesis-integration.html
Q & A
![Page 35: AWS Webcast - Amazon Kinesis and Apache Storm](https://reader034.fdocuments.net/reader034/viewer/2022050719/55a78bae1a28ab306e8b46ab/html5/thumbnails/35.jpg)
THANK YOU !!!
http://aws.amazon.com/big-data