The Team - events.static.linuxfound.org · Apache Samza Distributed stream processing framework...


The Team

Renato Marroquín

● PhD student at:
● Interested in:
  ○ Stream processing
  ○ Distributed data management
● Apache contributor
  ○ Apache Gora, Giraph, Nutch, Samza

● rmarroquin [at] apache [dot] org


The Team

Yan Fang

● Software engineer at:
● Interested in:
  ○ Stream processing
  ○ Information retrieval
  ○ Natural language processing
● Committer and PMC for Apache Samza
● yfang [at] apache [dot] org

Background

● Messaging systems
  ○ Senders and receivers communicate through a shared message queue
  ○ Example: Twitter’s Kestrel

Background

● Stream processing
  ○ Message producers and consumers are not so trivial
  ○ Partitioning
  ○ State
  ○ Failure semantics
  ○ Reprocessing
  ○ Joins to services or databases

Apache Samza

● Distributed stream processing framework
● Developed @ LinkedIn
● Open sourced ~ 2013
● Used by
  ○ LinkedIn
  ○ Uber
  ○ Tivo
  ○ Nextel
  ○ Metamarkets
  ○ ...

Apache Samza’s architecture

● Processing layer
● Messaging layer: Amazon Kinesis
● Execution layer: YARN

Apache Samza

How it works?

● Streams
● Tasks
● Execution

Apache Samza: Streams

[Figure: a stream split into Partition 0, Partition 1, and Partition 2. Each partition is an ordered sequence of numbered messages, and new messages are appended to the end of a partition.]
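The partition picture can be sketched in a few lines of Java. This is only a toy model, not Samza or Kafka code: the class name PartitionedStream, hash-based partition selection, and 0-based offsets are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a partitioned stream: each partition is an append-only, ordered list. */
class PartitionedStream {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedStream(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends a message to the partition chosen by the key's hash (an assumption); returns the partition index. */
    int append(String key, String message) {
        int partition = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(partition).add(message);
        return partition;
    }

    /** Offset that the next appended message will receive in the given partition. */
    int nextOffset(int partition) {
        return partitions.get(partition).size();
    }

    /** Reads the message at an offset; offsets are only meaningful within one partition. */
    String read(int partition, int offset) {
        return partitions.get(partition).get(offset);
    }

    public static void main(String[] args) {
        PartitionedStream stream = new PartitionedStream(3);
        int p = stream.append("car-42", "position update 1");
        stream.append("car-42", "position update 2"); // same key, so same partition
        System.out.println("partition " + p + " holds " + stream.nextOffset(p) + " messages");
    }
}
```

Ordering is only guaranteed inside a partition, which is why the later slides care so much about which task reads which partition.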

Apache Samza: Tasks

[Figure: the input streams CarUpdates and StationStatus (Stream A and Stream B in the generic version) are consumed by Task 1, Task 2, and Task 3, which write to the output stream NearbyDiscounts (Stream C).]

Apache Samza: Tasks

// Requires java.util.HashMap, java.util.Map, org.apache.samza.system.* and org.apache.samza.task.*.
// TextMessage is the application's own message class, not part of Samza.
public class MyExampleTask implements StreamTask {

  // Output stream; the system and stream names here are placeholders.
  private static final SystemStream outStream = new SystemStream("kafka", "example-output");

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    TextMessage msg = (TextMessage) envelope.getMessage();
    Map<String, Object> outMap = new HashMap<>();
    outMap.put(msg.getMessageID(), msg.getText());
    collector.send(new OutgoingMessageEnvelope(outStream, outMap));
  }
}

Apache Samza: Execution

[Figure: Task 1, Task 2, and Task 3 (reading CarUpdates and StationStatus, writing NearbyDiscounts) run inside Samza Container 1 and Samza Container 2. The containers run on Host 1 and Host 2; each host has a YARN NodeManager and a Kafka Broker. A Samza YARN AM sits alongside the containers.]

Amazon Kinesis

Introduction

● Streams are split into shards, each identified by a ShardId (ShardId-0000000001, ShardId-0000000002, ...)
● Messages within a shard carry sequence numbers
● Scale out to as many shards as needed
● Scale down to the minimum number of shards needed

Amazon Kinesis

● Two types of API
  ○ Amazon KCL
  ○ Amazon Kinesis API

Amazon Kinesis

● Two types of API
  ○ Amazon KCL
    ■ Specification for handling records: IRecordProcessor interface
    ■ Assigns a shard to an IRecordProcessor
    ■ Create a KCL Worker to consume data
    ■ If there are many workers, they coordinate through Amazon DynamoDB
  ○ Amazon Kinesis API

KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(...);

// Record processor
IRecordProcessorFactory recordProcessorFactory = new RecordProcessorFactory();

// KCL worker
Worker worker = new Worker.Builder()
    .recordProcessorFactory(recordProcessorFactory)
    .config(config)
    .build();


Amazon Kinesis

● Two types of API
  ○ Amazon KCL
  ○ Amazon Kinesis API
    ■ Get data from Kinesis shards
    ■ To get a shard iterator: getShardIterator
    ■ To keep reading the same shard: getNextShardIterator

// Configure the Kinesis request
GetRecordsRequest getRecordsRequest = new GetRecordsRequest();
getRecordsRequest.setShardIterator(shardIterator);
getRecordsRequest.setLimit(25);

// Execute the request
GetRecordsResult getRecordsResult = client.getRecords(getRecordsRequest);
List<Record> records = getRecordsResult.getRecords();

Integration story

public interface SystemConsumer {
  void start();
  void stop();
  void register(SystemStreamPartition systemStreamPartition, String lastReadOffset);
  List<IncomingMessageEnvelope> poll(
      Map<SystemStreamPartition, Integer> systemStreamPartitions,
      long timeout) throws InterruptedException;
}

Integration story

public interface SystemProducer {
  void start();
  void stop();
  void register(String source);
  void send(String source, OutgoingMessageEnvelope envelope);
  void flush(String source);
}

Integration story

● Map each Kinesis shard → Samza’s logical partition
● Remember → Messages coming from a specific partition go to:
  ○ A particular task
  ○ Inside a specific container
● Approach 1:
  ○ Auto-scaling
  ○ Load balancing
● Approach 2:
  ○ Correctness
  ○ At-least-once guarantees
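The shard → partition → task → container chain above can be sketched as a fixed assignment. This is an illustrative model only: the class ShardAssignment and the round-robin placement are assumptions, not Samza's actual grouping logic.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a fixed shard -> task -> container assignment. */
class ShardAssignment {

    /**
     * One task per shard (partition); task i is placed in container i % containerCount.
     * Round-robin placement is an assumption made for this sketch.
     */
    static Map<String, Integer> assign(List<String> shardIds, int containerCount) {
        Map<String, Integer> containerByShard = new LinkedHashMap<>();
        for (int task = 0; task < shardIds.size(); task++) {
            containerByShard.put(shardIds.get(task), task % containerCount);
        }
        return containerByShard;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList(
                "ShardId-0000000001", "ShardId-0000000002", "ShardId-0000000003");
        // The same input always yields the same mapping, so a shard's
        // messages keep arriving at the same task and container.
        System.out.println(assign(shards, 2));
    }
}
```

Because the mapping depends only on the shard list and the container count, it stays stable across restarts, which is the property the two approaches trade off against auto-scaling.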

Integration story

Attempt 1: KCL

● Every task uses its own KCL
● KCL is in charge of auto-scaling and fault-tolerance
● But
  ○ Samza relies on a fixed mapping between stream partitions and containers
  ○ Messages should go to a specific container
● Messages could be lost!

[Figure: Task 1, Task 2, and Task 3, each with its own KCL instance, run in Samza Container 1 and Samza Container 2 and read Stream A, Stream B, and Stream C.]

Integration story

Attempt 2: Amazon Kinesis API

● Assuring correctness
● Letting Samza deal with fault-tolerance and load balancing

[Figure: Task 1, Task 2, and Task 3, each using the Kinesis API directly, run in Samza Container 1 and Samza Container 2 and read Stream A, Stream B, and Stream C.]

Integration story

Attempt 2: Amazon Kinesis API

● Register each Kinesis shard as a Samza partition
  ○ Involves creating a shardIterator per partition
● Create a GetShardIteratorRequest per partition to fetch data
  ○ From the beginning of the stream
  ○ From a specific sequence number
● Keep track of each record received from Kinesis
● Checkpoint in Samza
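The tracking and checkpointing steps above can be sketched without any AWS dependency. In this toy model a list stands in for one Kinesis shard and a list index stands in for a sequence number; CheckpointedReader and its methods are hypothetical names, not part of Samza or the Kinesis SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of reading one shard with checkpoints (at-least-once delivery). */
class CheckpointedReader {
    private final List<String> shard; // stand-in for one Kinesis shard
    private int checkpointed = 0;     // durable: where reading resumes after a restart
    private int position = 0;         // in memory only: lost when the container fails

    CheckpointedReader(List<String> shard) {
        this.shard = shard;
    }

    /** Returns up to 'limit' records starting at the current position. */
    List<String> poll(int limit) {
        int end = Math.min(position + limit, shard.size());
        List<String> batch = new ArrayList<>(shard.subList(position, end));
        position = end;
        return batch;
    }

    /** Records the current position as processed. */
    void checkpoint() {
        checkpointed = position;
    }

    /** Simulates a container restart: reading resumes from the last checkpoint. */
    void restart() {
        position = checkpointed;
    }

    public static void main(String[] args) {
        CheckpointedReader reader = new CheckpointedReader(Arrays.asList("a", "b", "c", "d"));
        System.out.println(reader.poll(2));
        reader.checkpoint();
        System.out.println(reader.poll(1)); // read but not yet checkpointed
        reader.restart();                   // failure before the next checkpoint
        System.out.println(reader.poll(2)); // the uncheckpointed record is delivered again
    }
}
```

A record read after the last checkpoint is delivered a second time after a restart. Nothing is lost, but duplicates are possible, which is what at-least-once means.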

Integration story

// Create a new GetRecordsRequest with an existing shardIterator
// (the shardIterator is the handle to a Kinesis shard)
GetRecordsRequest getRecordsRequest = new GetRecordsRequest();
getRecordsRequest.setShardIterator(shardIterator);

// Execute the request
GetRecordsResult result = kClient.getRecords(getRecordsRequest);

// Put the result into the record list, passing each message to Samza
for (Record record : result.getRecords()) {
  IncomingMessageEnvelope envelope = new IncomingMessageEnvelope(ssp,
      record.getSequenceNumber(),
      record.getPartitionKey(),
      record.getData());
  put(ssp, envelope);
  // Keep track of the messages received
  trackDeliveries(ssp.getName(), envelope);
}

Integration story: Samza’s fault tolerance

[Figure: Task 1, Task 2, and Task 3, each using the Kinesis API, run in Samza Container 1 and Samza Container 2 and read Stream A, Stream B, and Stream C. Each task has a "Local" store, and the local stores are backed by a durable change log.]

● When a container fails:
  ○ Restart the container
  ○ Replay the log from the durable change log
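The restart-and-replay step can be sketched as follows. This is a minimal model under stated assumptions: ChangelogStore is a hypothetical class, the "durable" log is just a list that outlives the store object, and real Samza state stores and changelog streams are considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a task-local store backed by a durable change log. */
class ChangelogStore {
    private final List<String[]> changelog;                    // durable: survives the container
    private final Map<String, String> local = new HashMap<>(); // local: lost with the container

    ChangelogStore(List<String[]> changelog) {
        this.changelog = changelog;
    }

    /** Writes to the local store and appends the same change to the durable log. */
    void put(String key, String value) {
        local.put(key, value);
        changelog.add(new String[] {key, value});
    }

    String get(String key) {
        return local.get(key);
    }

    /** Rebuilds the local state of a restarted container by replaying the log in order. */
    static ChangelogStore restore(List<String[]> changelog) {
        ChangelogStore store = new ChangelogStore(changelog);
        for (String[] change : changelog) {
            store.local.put(change[0], change[1]); // later entries overwrite earlier ones
        }
        return store;
    }

    public static void main(String[] args) {
        List<String[]> durableLog = new ArrayList<>();
        ChangelogStore before = new ChangelogStore(durableLog);
        before.put("station-7", "open");
        before.put("station-7", "full");
        // The container dies here; only durableLog survives.
        ChangelogStore after = ChangelogStore.restore(durableLog);
        System.out.println(after.get("station-7"));
    }
}
```

Replaying in order matters: the last write for a key wins, so the restored store matches the state at the moment of failure.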

Apache Samza next moves

● Tighter integration with Apache Kafka
● Samza as
  ○ stream processing as a service
  ○ the transformation layer for many services
● Pluggable mapping between streams, tasks, and containers
● Different execution layers
  ○ YARN is not always needed
  ○ Standalone mode
● Integrate with other systems

Thanks!

@renatomarroquin
rmarroquin [at] apache [dot] org