A quick introduction to AWS Kinesis

Transcript of A quick introduction to AWS Kinesis

Page 1: A quick introduction to AWS Kinesis

@ogeisser #Devoxx #Kinesis

A quick introduction to AWS Kinesis Streams

Oliver Geisser

Page 2: A quick introduction to AWS Kinesis

Kinesis Platform Family

Kinesis Streams

•  Build your own custom applications that process or analyze streaming data

•  Available since 2014

Kinesis Firehose

•  Load massive volumes of streaming data into Amazon S3 and Redshift

•  NEW Oct 2015

Kinesis Analytics

•  Analyze data streams using SQL queries

•  Announced for 2016

Page 4: A quick introduction to AWS Kinesis

Kinesis Streams – Example Use Case

Page 5: A quick introduction to AWS Kinesis

High Level Architecture

Page 6: A quick introduction to AWS Kinesis

Concepts (I)

Stream

•  Named event stream of Data Records

•  Data is stored for 24 hours (default), extendable up to 168 hours (7 days)

•  Data is partitioned into Shards

Data Record

•  Unit of data stored in a Stream

•  Data Record = Data Blob + Partition Key + Sequence Number
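
The record layout above can be pictured as a small data type. This is a minimal Python sketch, not the Kinesis API: the class name `DataRecord` and its field names are illustrative assumptions, and the sequence number is left unset because it is not supplied by the producer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataRecord:
    """Illustrative model of a Data Record (class and field names are assumptions)."""
    data: bytes                            # the Data Blob, opaque to the stream
    partition_key: str                     # chosen by the data producer
    sequence_number: Optional[str] = None  # not set by the producer


# A producer only supplies the blob and the partition key.
record = DataRecord(data=b'{"temperature": 21.5}', partition_key="sensor-42")
print(record.partition_key, record.sequence_number)
```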

Page 7: A quick introduction to AWS Kinesis

Concepts (II)

Partition Key

•  Assigned to the Data Record by the data producer

•  Used for partitioning of data across Shards

•  The MD5 hash of the Partition Key determines the Shard

Sequence Number

•  Unique identifier of a Data Record within its Shard

•  Assigned by Kinesis on write
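
How a Partition Key ends up in a Shard can be sketched in a few lines of Python. The sketch assumes that each Shard owns an equal, contiguous slice of the 128-bit MD5 hash space; a real stream may have uneven ranges, for example after shards were split or merged. The helper name `shard_for_key` is hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key.

    Assumption: shard i owns the i-th equal slice of the hash space.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash into [0, shard_count) without floating point.
    return hash_value * shard_count // HASH_SPACE


for key in ("sensor-1", "sensor-2", "sensor-3"):
    print(key, "-> shard", shard_for_key(key, shard_count=4))
```

Records with the same Partition Key always map to the same Shard, so a skewed key distribution can concentrate load on a single Shard.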

Page 8: A quick introduction to AWS Kinesis

Concepts (III)

Shard

•  A Shard is a group of Data Records in a Stream

•  A Stream is composed of multiple Shards

•  You scale Kinesis Streams by adding or removing Shards

•  Each Shard provides a fixed unit of capacity

•  Each Shard ingests up to 1 MB/sec of data or up to 1000 records/sec
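
The per-shard limits turn capacity planning into simple arithmetic. The following rough sketch uses only the ingest limits stated above (1 MB/sec and 1000 records/sec per Shard). The function name and the example workload are made up for illustration, 1 MB is taken as 1024 KB, and read-side limits are not considered.

```python
import math

MAX_MB_PER_SEC_PER_SHARD = 1.0        # ingest limit per shard (from the slide)
MAX_RECORDS_PER_SEC_PER_SHARD = 1000  # ingest limit per shard (from the slide)


def shards_needed(records_per_sec: float, avg_record_kb: float) -> int:
    """Estimate the shard count so that neither ingest limit is exceeded."""
    mb_per_sec = records_per_sec * avg_record_kb / 1024  # assumes 1 MB = 1024 KB
    by_bytes = math.ceil(mb_per_sec / MAX_MB_PER_SEC_PER_SHARD)
    by_records = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC_PER_SHARD)
    # Whichever limit is hit first decides; a stream has at least one shard.
    return max(1, by_bytes, by_records)


# Hypothetical workload: 5000 records/sec at 2 KB each (about 9.8 MB/sec).
print(shards_needed(records_per_sec=5000, avg_record_kb=2))  # prints 10
```

In this example the byte limit dominates: the record rate alone would need only 5 Shards, but the data volume requires 10.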

Page 9: A quick introduction to AWS Kinesis

Demo

Page 10: A quick introduction to AWS Kinesis

Closing Remarks

•  Understand the consequences of the limits

   •  Shards (= capacity), number of consumers, latency, etc.

•  Trade-off: vendor lock-in vs. managed service

   •  Alternative: manage your own Kafka cluster

•  Choose the right access library for your use case

   •  HTTP, SDK, Client, Producer, Connector, third party

Page 11: A quick introduction to AWS Kinesis

Thank you

Oliver Geisser

Twitter: @ogeisser