Posted on 14-Jul-2015
Amazon Kinesis
toshiake@amazon.co.jp @ToshiakiEnami
AWS Amazon Kinesis Amazon DynamoDB
AWS
S3
Process Submissions
Store Batches
Process Hourly w/ Hadoop
Clients Submitting Data
Data Warehouse
100ETL Job
100
Keep everything
Ingest
Client/Sensor
Ingest / Processing / Storage / Analytics + Visualization + Reporting
Ingest Layer
Processing
Kafka or Kinesis
Processing
Kinesis
Kinesis
Amazon Kinesis
Kinesis1AZ
POS
Kinesis
Kinesis Client Library + Connector Library
HTTPS Post
AWS SDK
LOG4J
Flume
Fluentd
Get* APIs
Apache Storm
Amazon Elastic MapReduce
MobileSDK & Cognito
Kinesis
Data Sources
App.4
[Machine Learning]
App.1
[Aggregate & De-Duplicate]
Data Sources
Data Sources
Data Sources
App.2
[Metric Extraction]
S3
DynamoDB
Redshift
App.3
[Real-timeDashboard]
Data Sources
Availability Zone
Shard 1
Shard 2
Shard N
Availability Zone
Availability Zone
Kinesis
AWS Endpoint
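Stream: a Stream is made up of one or more Shards.
Shard: each Shard accepts writes up to 1 MB/sec and 1000 TPS, and serves reads up to 2 MB/sec and 5 TPS.
Data Record: Data Records are kept for 24 hours across multiple AZs.
Throughput is adjusted by changing the number of Shards.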
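As a rough illustration of how the per-Shard limits above translate into a Shard count, here is a minimal sketch. The function name and the example workload are hypothetical, and the 5 TPS read limit is left out for simplicity.

```python
import math

# Per-Shard limits quoted on the slide.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest Shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )


if __name__ == "__main__":
    # Hypothetical workload: 3.5 MB/sec in, 2500 records/sec, 9 MB/sec out.
    print(shards_needed(3.5, 2500, 9.0))
```

In this sketch the read side dominates: 9 MB/sec of reads needs more Shards than either write limit does.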
Stream
Kinesis &
$0.0195/shard/hour
Put: $0.043 per 1,000,000 Puts
$14 Get EC2
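The $14 figure appears consistent with one Shard running for a 30-day month at the $0.0195 hourly price. The sketch below only reproduces that arithmetic; the 30-day month and the function name are assumptions, and Put charges are not included.

```python
SHARD_PRICE_PER_HOUR = 0.0195  # USD per Shard per hour, from the slide


def monthly_shard_cost(shards, days=30):
    """Shard-hour cost in USD for a month of the given length (assumed 30 days)."""
    return shards * SHARD_PRICE_PER_HOUR * 24 * days


if __name__ == "__main__":
    print(round(monthly_shard_cost(1), 2))
```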
PutRecord API http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html
AWS SDK for Java, JavaScript, Python, Ruby, PHP, .NET
Python (boto): put_record
http://docs.pythonboto.org/en/latest/ref/kinesis.html#module-boto.kinesis.layer1
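A minimal sketch of sending one Data Record through boto's put_record might look like the following. The helper put_event, the FakeKinesis stand-in, the stream name, and the region are hypothetical; the fake is there only so the example runs without AWS credentials.

```python
import json


def put_event(conn, stream_name, event, partition_key):
    """Serialize one event as JSON and send it as a single Data Record."""
    data = json.dumps(event)
    return conn.put_record(stream_name, data, partition_key)


class FakeKinesis(object):
    """Stand-in for a Kinesis connection so the sketch runs offline."""

    def __init__(self):
        self.sent = []

    def put_record(self, stream_name, data, partition_key):
        self.sent.append((stream_name, data, partition_key))
        # A PutRecord response carries the Shard ID and a sequence number.
        return {"ShardId": "shardId-000000000000",
                "SequenceNumber": str(len(self.sent))}


if __name__ == "__main__":
    # With boto installed and credentials configured, a real connection
    # would be created roughly like this (region is an assumption):
    #   import boto.kinesis
    #   conn = boto.kinesis.connect_to_region("ap-northeast-1")
    conn = FakeKinesis()
    print(put_event(conn, "my-stream", {"sensor": "pos-1", "value": 42}, "pos-1"))
```

Using a stable identifier such as a device ID as the partition key keeps records from one source in the same Shard.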
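The Shard that receives each Data Record is chosen by the MD5 hash of its partition key.
MD5(partition key) maps into the range 0 to 2^128
Shard-0: 0 to 2^127
Shard-1: 2^127 to 2^128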
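The MD5-based routing above can be sketched in a few lines. The two-Shard layout mirrors the diagram; the function names and the sample key are made up for illustration.

```python
import hashlib

# Two Shards splitting the 128-bit hash key space in half, as in the diagram.
SHARD_RANGES = [
    ("Shard-0", 0, 2**127 - 1),
    ("Shard-1", 2**127, 2**128 - 1),
]


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def shard_for_hash(value):
    """Find the Shard whose hash key range contains the value."""
    for name, start, end in SHARD_RANGES:
        if start <= value <= end:
            return name
    raise ValueError("hash key outside 0..2^128-1")


def shard_for_key(partition_key):
    return shard_for_hash(hash_key(partition_key))


if __name__ == "__main__":
    print(shard_for_key("pos-terminal-17"))
```

The same partition key always lands in the same Shard, which is why record order is preserved per key but not across Shards.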
KinesisStream 24
SeqNo(14)
SeqNo(17)
SeqNo(25)
SeqNo(26)
SeqNo(32)
Web
Fluentd Plugin Web
GitHub plugin: https://github.com/awslabs/aws-fluent-plugin-kinesis
Log4J JavaLog4J
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/kinesis-pig-publisher.html
Web
# KINESIS appender
log4j.logger.KinesisLogger=INFO, KINESIS
log4j.additivity.KinesisLogger=false
log4j.appender.KINESIS=com.amazonaws.services.kinesis.log4j.KinesisAppender
log4j.appender.KINESIS.layout=org.apache.log4j.PatternLayout
log4j.appender.KINESIS.layout.ConversionPattern=%m
log4j.properties
MQTT Broker Kinesis-MQTT Bridge
MQTT -> MQTT Broker -> MQTT-Kinesis Bridge -> Kinesis
GitHub: MQTT-Kinesis Bridge
https://github.com/awslabs/mqtt-kinesis-bridge
MQTT Broker Kinesis-MQTT Bridge
Auto scaling Group
Cognito + Mobile SDK -> Kinesis
App w/SDK
End Users
Login OAUTH/OpenID Access Token
Cognito ID, Temp
Credentials
Access Token Pool ID
Role ARNs
Put Record
Identity pool
Identity Providers
Access Policy identitypool Unauthenticated
Identities
authenticated identities AWS
Account
Amazon Cognito - ID
Get a position in a Shard with the GetShardIterator API, then read Data Records with the GetRecords API
http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html
http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html
AWS SDK for Java, JavaScript, Python, Ruby, PHP, .NET
Python (boto): get_shard_iterator, get_records
http://docs.pythonboto.org/en/latest/ref/kinesis.html#module-boto.kinesis.layer1
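A hedged sketch of the read path, using the boto method names listed above: get a shard iterator, then call get_records repeatedly and follow NextShardIterator. The helper read_shard, the FakeKinesis stand-in, and the max_calls bound are assumptions for illustration; a real consumer would also pause between calls to stay under the per-Shard read limit.

```python
def read_shard(conn, stream_name, shard_id,
               iterator_type="TRIM_HORIZON", max_calls=10):
    """Collect Data Records from one Shard in a bounded number of calls."""
    response = conn.get_shard_iterator(stream_name, shard_id, iterator_type)
    iterator = response["ShardIterator"]
    records = []
    for _ in range(max_calls):
        if iterator is None:
            break
        page = conn.get_records(iterator)
        records.extend(page["Records"])
        iterator = page.get("NextShardIterator")
    return records


class FakeKinesis(object):
    """Offline stand-in that serves pre-built pages of records."""

    def __init__(self, pages):
        self.pages = pages

    def get_shard_iterator(self, stream_name, shard_id, shard_iterator_type):
        return {"ShardIterator": "0"}

    def get_records(self, shard_iterator):
        index = int(shard_iterator)
        has_next = index + 1 < len(self.pages)
        return {"Records": self.pages[index],
                "NextShardIterator": str(index + 1) if has_next else None}


if __name__ == "__main__":
    conn = FakeKinesis([[{"Data": "a"}], [{"Data": "b"}, {"Data": "c"}]])
    print(read_shard(conn, "my-stream", "shardId-000000000000"))
```

On an open Shard the service keeps returning a next iterator, so the loop is bounded by max_calls rather than waiting for the iterator to run out.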
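GetShardIterator: the ShardIteratorType parameter of the GetShardIterator API sets where reading starts
ShardIteratorType
AT_SEQUENCE_NUMBER (start at the given sequence number)
AFTER_SEQUENCE_NUMBER (start right after the given sequence number)
TRIM_HORIZON (start at the oldest Data Record in the Shard)
LATEST (start just after the most recent Data Record)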
Seq: xxx
LATEST
AT_SEQUENCE_NUMBER
AFTER_SEQUENCE_NUMBER
TRIM_HORIZON
GetShardIterator
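To make the four ShardIteratorType values concrete, the sketch below models a Shard as an in-memory list of sequence numbers and returns the index where reading would begin. It is a model of the semantics only, not a call to the service; the function name is hypothetical and the sequence numbers are the ones used in the deck's diagrams.

```python
VALID_TYPES = ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER",
               "TRIM_HORIZON", "LATEST")


def start_index(sequence_numbers, iterator_type, sequence_number=None):
    """Return the index of the first record a new iterator would read."""
    if iterator_type not in VALID_TYPES:
        raise ValueError("unknown ShardIteratorType: %s" % iterator_type)
    if iterator_type == "TRIM_HORIZON":
        return 0  # oldest record still in the Shard
    if iterator_type == "LATEST":
        return len(sequence_numbers)  # only records added from now on
    position = sequence_numbers.index(sequence_number)
    if iterator_type == "AT_SEQUENCE_NUMBER":
        return position
    return position + 1  # AFTER_SEQUENCE_NUMBER


if __name__ == "__main__":
    shard = [14, 17, 25, 26, 32]
    for kind in VALID_TYPES:
        print(kind, shard[start_index(shard, kind, 25):])
```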
Shard 1
Shard 2
Shard 3
Shard n
Shard 4
KCL Worker 1
KCL Worker 2
EC2 Instance
KCL Worker 3
KCL Worker 4
EC2 Instance
KCL Worker n
EC2 Instance
Kinesis
Kinesis Client Library (KCL)
Client library for fault-tolerant, at-least-once, continuous processing
Workers are started per Shard, balanced across instances, replaced on failure, and can scale with Auto Scaling; processing is at least once
Kinesis Client Library
Stream
Shard-0
Shard-1
Kinesis
(KCL)
Instance A 12345
Instance A 98765
Data Record(12345)
Data Record(24680)
Data Record(98765)
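DynamoDB
Instance A
1. The Kinesis Client Library reads Data Records from the Shards
2. The sequence number (ID) of the last processed Data Record is stored in DynamoDB as a checkpoint
3. Shard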
Key, Attribute
Kinesis Client Library
Stream
Shard-0
Shard-1
Kinesis
(KCL)
Instance A 12345
Instance B 98765
Data Record(12345)
Data Record(24680)
Data Record(98765)
DynamoDB
Instance A
Kinesis
(KCL)
Instance B
1.
Key, Attribute
Kinesis Client Library
Stream
Shard-0
Shard-1
Kinesis
(KCL)
Instance A -> Instance B
12345
Instance B 98765
Data Record(12345)
Data Record(24680)
Data Record(98765)
DynamoDB
Instance A
Kinesis
(KCL)
Instance B
When Instance A fails, Instance B takes over its Shard and resumes from the checkpoint stored in DynamoDB
Key, Attribute
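The checkpoint and failover behaviour in these diagrams can be sketched with an in-memory table that maps each Shard to an owner and a checkpoint. This is only a model of the idea, assuming one row per Shard; the class and method names are hypothetical and are not the KCL's own API or its DynamoDB schema.

```python
class LeaseTable(object):
    """In-memory stand-in for the DynamoDB table: Shard -> owner, checkpoint."""

    def __init__(self):
        self.rows = {}

    def take(self, shard_id, owner):
        """Assign the Shard to an owner and return the checkpoint to resume from."""
        row = self.rows.setdefault(shard_id, {"owner": None, "checkpoint": None})
        row["owner"] = owner
        return row["checkpoint"]

    def checkpoint(self, shard_id, owner, sequence_number):
        """Record progress; refuse if another instance has taken the Shard."""
        row = self.rows[shard_id]
        if row["owner"] != owner:
            raise RuntimeError("%s no longer owns %s" % (owner, shard_id))
        row["checkpoint"] = sequence_number


if __name__ == "__main__":
    table = LeaseTable()
    table.take("Shard-0", "Instance A")
    table.take("Shard-1", "Instance A")
    table.checkpoint("Shard-0", "Instance A", 12345)
    table.checkpoint("Shard-1", "Instance A", 98765)
    # Instance A fails; Instance B picks up Shard-0 from the stored checkpoint.
    print(table.take("Shard-0", "Instance B"))
```

Records processed after the last checkpoint but before the failure are read again by the new owner, which is where the at-least-once behaviour comes from.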
Kinesis Client Library
Stream
Shard-0
Kinesis
(KCL)
Shard
Shard-0 Instance A 12345
Shard-1 Instance A 98765
Data Record(12345)
Data Record(24680)
DynamoDB
Instance A
When Shard-1 is added, an entry for Shard-1 is added to DynamoDB
Shard-1: Data Record(98765)
New
Key, Attribute
Kinesis
(12345)
(98765)
(24680)
(12345)
(98765)
(24680)
(KCL)
DynamoDB
Instance A
Shard
Shard-0 Instance A 12345
Shard-1 Instance A 98765
(KCL)
Instance A
Shard
Shard-0 Instance A 24680
Shard-1 Instance A 98765
Archive Table
Calc Table
Kinesis Client Library (KCL) for Python
KCL for Python runs on top of the KCL for Java MultiLangDaemon, which drives the Python process
The MultiLangDaemon and the Python process communicate over STDIN/STDOUT
KCL(Java)
Shard-0 -> Worker Thread -> Python Logic Process
Shard-1 -> Worker Thread -> Python Logic Process
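KCL for Python
#!/usr/bin/env python
from amazon_kclpy import kcl
import json, base64


class RecordProcessor(kcl.RecordProcessorBase):
    def initialize(self, shard_id):
        # Called once before any records from this Shard are delivered.
        pass

    def process_records(self, records, checkpointer):
        # Called with each batch of records read from the Shard.
        pass

    def shutdown(self, checkpointer, reason):
        # Called when processing of this Shard stops.
        pass


if __name__ == "__main__":
    kclprocess = kcl.KCLProcess(RecordProcessor())
    kclprocess.run()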
KCL for Python
https://github.com/awslabs/amazon-kinesis-client-python/blob/master/amazon_kclpy/kcl.py
https://github.com/awslabs/amazon-kinesis-client/tree/master/src/main/java/com/amazonaws/services/kinesis/multilang
KCL for Python
KCL for Java
Multi Language Protocol
Action          Parameter
initialize      "shardId" : "string"
processRecords  [{ "data" : "base64encoded_string", "partitionKey" : "partition key", "sequenceNumber" : "sequence number" }] // a list of records
checkpoint      "checkpoint" : "sequence number", "error" : "NameOfException"
shutdown        "reason" : "TERMINATE|ZOMBIE"
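As an illustration of the protocol table above, here is a minimal sketch that handles one JSON message the way a Python logic process might. The function name is hypothetical, and the "action" and "records" key names and the shape of the status reply are assumptions about the wire format rather than something the table states.

```python
import base64
import json


def handle_line(line):
    """Decode one protocol message; return (payloads, reply_json)."""
    message = json.loads(line)
    action = message["action"]
    payloads = []
    if action == "processRecords":
        for record in message["records"]:
            # Record data arrives base64 encoded.
            payloads.append(base64.b64decode(record["data"]))
    # Assumed reply shape: acknowledge the action that was handled.
    reply = json.dumps({"action": "status", "responseFor": action})
    return payloads, reply


if __name__ == "__main__":
    sample = json.dumps({
        "action": "processRecords",
        "records": [{
            "data": base64.b64encode(b"hello").decode("ascii"),
            "partitionKey": "pos-1",
            "sequenceNumber": "12345",
        }],
    })
    print(handle_line(sample))
```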
KCL for Python
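failoverTimeMillis: time after which a Worker that has not renewed its lease is treated as failed and its Shards go to other Workers; a larger value lowers the DynamoDB PIOPS needed
maxRecords: maximum number of records fetched in one GetRecords call
idleTimeBetweenReadsInMillis: idle time between record reads
callProcessRecordsEvenForEmptyRecordList: True or False; whether to call the record processor even when no records were returned
parentShardPollIntervalMillis: interval for checking whether the parent Shard has completed; a larger value lowers the DynamoDB PIOPS needed
cleanupLeasesUponShardCompletion: whether to clean up leases once a Shard has completed
taskBackoffTimeMillis: backoff time when a KCL task hits an exception
metricsBufferTimeMillis: maximum time metrics are buffered before a CloudWatch API call
metricsMaxQueueSize: maximum number of metrics buffered before a CloudWatch API call
validateSequenceNumberBeforeCheckpointing: whether to validate the sequence number before checkpointing
maxActiveThreads: maximum number of active threads for the MultiLangDaemon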
KCL for Python
[ec2-user@ip-172-31-17-43 samples]$ amazon_kclpy_helper.py --print_command -j /usr/bin/java -p /home/ec2-user/amazon-kinesis-client-python/samples/sample.properties
/usr/bin/java -cp /usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/amazon-kinesis-client-1.2.0.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/jackson-annotations-2.1.1.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/commons-codec-1.3.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/commons-logging-1.1.1.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/joda-time-2.4.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/jackson-databind-2.1.1.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/jackson-core-2.1.1.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/aws-java-sdk-1.7.13.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/httpclient-4.2.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/httpcore-4.2.jar:/home/ec2-user/amazon-kinesis-client-python/samples com.amazonaws.services.kinesis.multilang.MultiLangDaemon sample.properties
KCL
Kinesis
A
B
SeqNo(14)
SeqNo(17)
SeqNo(25)
SeqNo(26)
SeqNo(32)
Kinesis
Simple ETL: ingest with Kinesis, then load into S3, DynamoDB, or Redshift
ETL/MapReduce: ingest with Kinesis, then run ETL with Hadoop, Spark, or Storm
Filter: use Kinesis applications for filtering/MapReduce
AWS Lambda: process records with AWS Lambda
KCL
Dashboard
Redshift
DynamoDB
Simple ETL: load into DynamoDB, Redshift, or S3 with the Kinesis Connector Library
https://github.com/awslabs/amazon-kinesis-connectors
Redshift
S3
Redshift
S3
Transformer -> Filter -> Buffer -> Emitter
Kinesis Connector
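The four stages named above can be sketched as a small pipeline. The Connector Library itself is a Java library; this Python class only mirrors the idea, and its names, the buffer size, and the sample records are hypothetical.

```python
import json


class Pipeline(object):
    """Transformer -> Filter -> Buffer -> Emitter, modelled with callables."""

    def __init__(self, transform, keep, emit, buffer_size=3):
        self.transform = transform      # raw record -> item
        self.keep = keep                # item -> True to keep it
        self.emit = emit                # list of items -> destination
        self.buffer_size = buffer_size
        self.buffer = []

    def process(self, raw):
        item = self.transform(raw)
        if self.keep(item):
            self.buffer.append(item)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        """Emit whatever is buffered as one batch."""
        if self.buffer:
            self.emit(list(self.buffer))
            self.buffer = []


if __name__ == "__main__":
    batches = []
    pipeline = Pipeline(json.loads, lambda item: item["value"] >= 0,
                        batches.append, buffer_size=2)
    for raw in ['{"value": 1}', '{"value": -1}', '{"value": 2}', '{"value": 3}']:
        pipeline.process(raw)
    pipeline.flush()
    print(batches)
```

Buffering before emitting is what lets a destination such as S3 or Redshift receive a few large batches instead of many single records.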
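ETL/MapReduce (1): Hadoop/Spark. Run ETL and MapReduce over Kinesis data with Hive, Pig, and other Hadoop tools
Kinesis Stream, S3, DynamoDB, and HDFS can each be mapped as a Hive Table
JOIN across sources; schedule reads from Kinesis with Data Pipeline / crontab
Kinesis is supported from EMR AMI 3.0.4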
EMR Cluster S3
Data Pipeline
Data Pipeline -> Hive -> Kinesis -> S3
Kinesis
ETL/MapReduce (2): Apache Storm. The Kinesis Storm Spout feeds Kinesis records to Apache Storm Bolts
https://github.com/awslabs/kinesis-storm-spout
Data Sources
Data Sources
Data Sources
Storm Spout
Storm Bolt
Storm Bolt
Storm Bolt
Filter: Kinesis apps filter/MapReduce records from one Kinesis stream and pass the results on to another Kinesis stream
Data Sources
Data Sources
Data Sources
Kinesis App
Kinesis App
Kinesis App
Kinesis App
Filter Layer / Process Layer
Apache Spark / Apache Storm
Data Sources
Data Sources
Data Sources
Jubatus
Dashboard
Jubatus
AWS Lambda Lambda Function
Data Sources
Data Sources
Data Sources
AWS Lambda
Redshift
S3
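For the AWS Lambda pattern, a function receives Kinesis records in batches inside the invocation event, with each record's data base64 encoded. The handler below is a minimal sketch that assumes a Python runtime; the handler name and the sample event are made up, and writing to Redshift or S3 is left out.

```python
import base64


def handler(event, context):
    """Decode every Kinesis record delivered in one invocation."""
    payloads = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        payloads.append((kinesis["partitionKey"],
                         base64.b64decode(kinesis["data"])))
    return payloads


if __name__ == "__main__":
    # Hand-built event with the same shape as a Kinesis event source batch.
    sample_event = {"Records": [{"kinesis": {
        "partitionKey": "pos-1",
        "sequenceNumber": "12345",
        "data": base64.b64encode(b"hello").decode("ascii"),
    }}]}
    print(handler(sample_event, None))
```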
KinesisEC2
Jubatus
(iPhone)
HTTP/WS
Put Record
HTTP/WS Get Records
IoT
AWS