Introducing Exactly Once Semantics To Apache Kafka

Jason Gustafson, Guozhang Wang, Sriram Subramaniam, and Apurva Mehta

Transcript of Introducing Exactly Once Semantics To Apache Kafka

Page 1: Introducing Exactly Once Semantics To Apache Kafka

1

Introducing Exactly Once Semantics in Apache Kafka

Jason Gustafson, Guozhang Wang, Sriram Subramaniam, and Apurva Mehta

Page 2: Introducing Exactly Once Semantics To Apache Kafka

2

On deck..

• Kafka’s existing delivery semantics.

• Why did we improve them?

• What’s new?

• How do you use it?

• Summary.

Page 3: Introducing Exactly Once Semantics To Apache Kafka

3

Apache Kafka’s existing semantics

Page 4: Introducing Exactly Once Semantics To Apache Kafka

4

Existing Semantics

Page 13: Introducing Exactly Once Semantics To Apache Kafka

13

TL;DR – What we have today

• At-least-once, in-order delivery per partition.

• Producer retries can introduce duplicates.

Page 14: Introducing Exactly Once Semantics To Apache Kafka

14

Why improve?

Page 15: Introducing Exactly Once Semantics To Apache Kafka

15

Why improve?

• Stream processing is becoming an ever bigger part of the data landscape.

• Apache Kafka is the heart of the streams platform.

• Strengthening Kafka’s semantics expands the universe of streaming applications.

Page 16: Introducing Exactly Once Semantics To Apache Kafka

16

A motivating example..

A peer-to-peer lending platform that processes micro-loans between users.

Page 17: Introducing Exactly Once Semantics To Apache Kafka

17

A Peer to Peer Lender

Page 18: Introducing Exactly Once Semantics To Apache Kafka

18

The Basic Flow

Page 19: Introducing Exactly Once Semantics To Apache Kafka

19

Offset commits

Page 20: Introducing Exactly Once Semantics To Apache Kafka

20

Reprocessed transfer, eek!

Page 21: Introducing Exactly Once Semantics To Apache Kafka

21

Lost money! Eek eek!

Page 22: Introducing Exactly Once Semantics To Apache Kafka

22

What’s new?

Page 23: Introducing Exactly Once Semantics To Apache Kafka

23

What’s new

• Exactly-once, in-order delivery per partition

• Atomic writes across multiple partitions

• Performance considerations

Page 24: Introducing Exactly Once Semantics To Apache Kafka

24

What’s new, Part 1

Exactly-once, in-order delivery per partition

Page 25: Introducing Exactly Once Semantics To Apache Kafka

25

The idempotent producer

Page 33: Introducing Exactly Once Semantics To Apache Kafka

33

TL;DR

• Sequence numbers and producer ids:

    • enable de-dup

    • are in the log.

• Hence de-dup works transparently across leader changes.

• Will not de-dup application-level resends.

• Works transparently – no API changes.
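
The de-dup idea summarized above can be sketched as a toy model. This is not Kafka's broker code: the class name DedupPartitionLog and its methods are hypothetical, and real brokers track sequences per producer id and partition at batch granularity. The sketch only shows why a retried send is dropped while an application-level resend is not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical) of one partition log that de-duplicates by producer id and sequence number. */
final class DedupPartitionLog {
    // Last sequence number appended per producer id. Because this information
    // travels with the records in the log, a new leader can rebuild it.
    private final Map<Long, Integer> lastSequence = new HashMap<>();
    private final List<String> records = new ArrayList<>();

    /** Appends the record unless it is a retry of one already appended; returns true if appended. */
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // duplicate caused by a producer retry: do not append again
        }
        if (sequence != last + 1) {
            throw new IllegalStateException(
                "out-of-order sequence " + sequence + ", expected " + (last + 1));
        }
        lastSequence.put(producerId, sequence);
        records.add(value);
        return true;
    }

    List<String> records() {
        return records;
    }

    public static void main(String[] args) {
        DedupPartitionLog log = new DedupPartitionLog();
        log.append(42L, 0, "transfer-1");
        log.append(42L, 0, "transfer-1"); // retry after a lost ack: dropped
        log.append(42L, 1, "transfer-2");
        // An application-level resend gets a fresh sequence number,
        // so the log cannot tell it apart from a new record.
        log.append(42L, 2, "transfer-1");
        System.out.println(log.records());
    }
}
```

The last append in main is the case the slide warns about: the application itself sent the same payload twice, so it arrives with a new sequence number and is stored twice.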

Page 34: Introducing Exactly Once Semantics To Apache Kafka

34

What’s new, part 2

Multi partition writes.

Page 35: Introducing Exactly Once Semantics To Apache Kafka

35

Introducing ‘transactions’

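producer.initTransactions();  // register the transactional.id and fence older instances

try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.sendOffsetsToTransaction(…);  // consumed offsets commit atomically with the sends
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Another producer with the same transactional.id has taken over; this one must stop.
    producer.close();
} catch (KafkaException e) {
    // Any other error: abort so that none of the transaction's writes become visible.
    producer.abortTransaction();
}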

Page 36: Introducing Exactly Once Semantics To Apache Kafka

36

Introducing ‘transactions’

Page 37: Introducing Exactly Once Semantics To Apache Kafka

37

Initializing ‘transactions’

Page 38: Introducing Exactly Once Semantics To Apache Kafka

38

Transactional sends – part 1

Page 39: Introducing Exactly Once Semantics To Apache Kafka

39

Transactional sends – part 2

Page 40: Introducing Exactly Once Semantics To Apache Kafka

40

Commit – phase 1

Page 41: Introducing Exactly Once Semantics To Apache Kafka

41

Commit – phase 2

Page 43: Introducing Exactly Once Semantics To Apache Kafka

43

Success!

Page 44: Introducing Exactly Once Semantics To Apache Kafka

44

Let’s review the APIs

producer.initTransactions();

try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.sendOffsetsToTransaction(…);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    producer.close();
} catch (KafkaException e) {
    producer.abortTransaction();
}

Page 49: Introducing Exactly Once Semantics To Apache Kafka

49

Consumer returns only committed messages

Page 50: Introducing Exactly Once Semantics To Apache Kafka

50

Some notes on consuming transactions

• Two ‘isolation levels’: read_committed and read_uncommitted.

• Messages read in offset order.

• read_committed consumers read to the point where there are no open transactions.
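
The two isolation levels can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not the consumer's real implementation: the IsolationLevelDemo class and its entry types are hypothetical, every data record is assumed to be transactional, and each producer has at most one open transaction at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical) of how the two isolation levels see one partition log. */
final class IsolationLevelDemo {
    enum Kind { DATA, COMMIT, ABORT }

    /** One log entry; its offset is its index in the log list. */
    static final class Entry {
        final long producerId;
        final Kind kind;
        final String value;

        Entry(long producerId, Kind kind, String value) {
            this.producerId = producerId;
            this.kind = kind;
            this.value = value;
        }
    }

    static Entry data(long producerId, String value) {
        return new Entry(producerId, Kind.DATA, value);
    }

    static Entry marker(long producerId, Kind kind) {
        return new Entry(producerId, kind, null);
    }

    /** read_uncommitted: every data record, in offset order. */
    static List<String> readUncommitted(List<Entry> log) {
        List<String> out = new ArrayList<>();
        for (Entry e : log) {
            if (e.kind == Kind.DATA) {
                out.add(e.value);
            }
        }
        return out;
    }

    /** read_committed: committed data only, and only up to the earliest open transaction. */
    static List<String> readCommitted(List<Entry> log) {
        // Offsets of data records in each producer's currently open transaction.
        Map<Long, List<Integer>> open = new HashMap<>();
        boolean[] committed = new boolean[log.size()];
        for (int offset = 0; offset < log.size(); offset++) {
            Entry e = log.get(offset);
            if (e.kind == Kind.DATA) {
                open.computeIfAbsent(e.producerId, id -> new ArrayList<>()).add(offset);
            } else {
                List<Integer> closed = open.remove(e.producerId);
                if (closed != null && e.kind == Kind.COMMIT) {
                    for (int o : closed) {
                        committed[o] = true;
                    }
                }
            }
        }
        // The reader stops at the first offset that belongs to a still-open transaction.
        int stableEnd = log.size();
        for (List<Integer> offsets : open.values()) {
            stableEnd = Math.min(stableEnd, offsets.get(0));
        }
        List<String> out = new ArrayList<>();
        for (int offset = 0; offset < stableEnd; offset++) {
            if (committed[offset]) {
                out.add(log.get(offset).value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> log = new ArrayList<>();
        log.add(data(1, "a1"));
        log.add(data(2, "b1"));            // producer 2's transaction stays open for now
        log.add(data(1, "a2"));
        log.add(marker(1, Kind.COMMIT));
        System.out.println(readUncommitted(log));
        System.out.println(readCommitted(log)); // stops before b1, so a2 is not yet returned
        log.add(marker(2, Kind.ABORT));
        System.out.println(readCommitted(log)); // b1 is filtered out, a1 and a2 are returned
    }
}
```

Note how a committed record (a2) stays invisible to the read_committed reader while an earlier transaction is still open; this is what preserves offset order.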

Page 51: Introducing Exactly Once Semantics To Apache Kafka

51

TL;DR

• Transaction coordinator and transaction log maintain transaction state.

• Use the new producer APIs for transactions.

• Consumers can read only committed messages.

Page 52: Introducing Exactly Once Semantics To Apache Kafka

52

Part 3

Performance!

Page 53: Introducing Exactly Once Semantics To Apache Kafka

53

What’s new, part 3: Performance boost!

• Up to +20% producer throughput

• Up to +50% consumer throughput

• Up to -20% disk utilization

• Savings start when you batch

• Details: https://bit.ly/kafka-eos-perf

Page 54: Introducing Exactly Once Semantics To Apache Kafka

54

Too good to be true?

Let’s understand how!

Page 55: Introducing Exactly Once Semantics To Apache Kafka

55

The old message format

Page 56: Introducing Exactly Once Semantics To Apache Kafka

56

The new format

Page 57: Introducing Exactly Once Semantics To Apache Kafka

57

The new format -> new fields

Page 59: Introducing Exactly Once Semantics To Apache Kafka

59

The new format -> delta encoding

Page 60: Introducing Exactly Once Semantics To Apache Kafka

60

A visual comparison with 7 records, 10 bytes each

Page 61: Introducing Exactly Once Semantics To Apache Kafka

61

TL;DR

• With a batch size of 2, the new format starts saving space.

• Savings are maximal for large batches of small messages.

• Hence higher throughput when IO bound.

• Works as soon as you upgrade to the new format.
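
A rough way to see why batching pays off is to count bytes for just two fields, offset and timestamp. The sketch below is a simplified accounting, not Kafka's actual record layout or serializer: the class DeltaVarintSizes and its methods are hypothetical, and every other header field is ignored. It compares a fixed 8-byte offset and timestamp per record against one base value per batch plus a variable-length delta per record.

```java
/** Simplified byte accounting (hypothetical) for delta plus varint encoding of offsets and timestamps. */
final class DeltaVarintSizes {
    /** Bytes needed to encode a signed long as a zigzag varint (7 payload bits per byte). */
    static int varintSize(long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        int bytes = 1;
        while ((zigzag & ~0x7FL) != 0) {
            zigzag >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Every record carries its own 8-byte offset and 8-byte timestamp. */
    static long fixedWidthBytes(int records) {
        return records * 16L;
    }

    /** Base offset and base timestamp stored once (8 bytes each), then one varint delta each per record. */
    static long deltaEncodedBytes(long[] offsets, long[] timestamps) {
        long total = 16;
        for (int i = 0; i < offsets.length; i++) {
            total += varintSize(offsets[i] - offsets[0]);
            total += varintSize(timestamps[i] - timestamps[0]);
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 7;
        long[] offsets = new long[n];
        long[] timestamps = new long[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = 1000 + i;               // consecutive offsets
            timestamps[i] = 1_500_000_000_000L + i; // records one millisecond apart
        }
        System.out.println("fixed width: " + fixedWidthBytes(n) + " bytes");
        System.out.println("delta + varint: " + deltaEncodedBytes(offsets, timestamps) + " bytes");
    }
}
```

In this accounting the fixed cost of the batch-level base values is paid once, so a single-record batch is slightly larger while larger batches of small records shrink the most.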

Page 62: Introducing Exactly Once Semantics To Apache Kafka

62

Cool!

But how do I use this?

Page 63: Introducing Exactly Once Semantics To Apache Kafka

63

Producer Configs

• enable.idempotence = true

• max.in.flight.requests.per.connection = 1

• acks = “all”

• retries > 1 (preferably MAX_INT)

• transactional.id = ‘some unique id’

• enable.idempotence = true
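
The producer settings listed above can be collected in a plain java.util.Properties object. This is a minimal sketch: the class name ExactlyOnceProducerConfig and the broker address are placeholders, and the in-flight setting uses the key as it is spelled in the released client, max.in.flight.requests.per.connection.

```java
import java.util.Properties;

/** Builds producer settings for idempotent, transactional writes (hypothetical helper). */
final class ExactlyOnceProducerConfig {
    static Properties build(String transactionalId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.setProperty("enable.idempotence", "true");
        props.setProperty("max.in.flight.requests.per.connection", "1");
        props.setProperty("acks", "all");
        props.setProperty("retries", Integer.toString(Integer.MAX_VALUE));
        // Must be unique per producer instance and stable across restarts.
        props.setProperty("transactional.id", transactionalId);
        return props;
    }

    public static void main(String[] args) {
        build("loan-processor-0").list(System.out); // "loan-processor-0" is an example id
    }
}
```

To create a real producer from these properties, key and value serializer settings would also be needed; they are left out here because the slides do not cover them.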

Page 64: Introducing Exactly Once Semantics To Apache Kafka

64

Consumer configs

• isolation.level:

• “read_committed”, or

• “read_uncommitted”
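
On the consumer side, the isolation level is a single setting. The sketch below again uses plain java.util.Properties; the class name, group id, and broker address are placeholders. Disabling auto-commit is an assumption that fits the case where consumed offsets are committed through the producer's transaction instead.

```java
import java.util.Properties;

/** Builds consumer settings for a chosen isolation level (hypothetical helper). */
final class TransactionalConsumerConfig {
    static Properties build(String groupId, boolean onlyCommitted) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.setProperty("group.id", groupId);
        props.setProperty("isolation.level", onlyCommitted ? "read_committed" : "read_uncommitted");
        // Assumption: offsets are committed as part of the producer's transaction,
        // so the consumer should not commit them on its own.
        props.setProperty("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        build("loan-processors", true).list(System.out); // example group id
    }
}
```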

Page 65: Introducing Exactly Once Semantics To Apache Kafka

65

Streams config

• processing.guarantee = “exactly_once”

Page 66: Introducing Exactly Once Semantics To Apache Kafka

66

Putting it together

• We understood Kafka’s existing delivery semantics

• Understood why we want to improve them

• Learned how these have been strengthened

• Learned how the new semantics work

Page 67: Introducing Exactly Once Semantics To Apache Kafka

67

When is it available?

Available to try in Kafka 0.11, June 2017.

Page 68: Introducing Exactly Once Semantics To Apache Kafka

68

Thank You!