GOTO Night with Todd Montgomery: Aeron: What, why and what next?

123
Use promo code “ilovegoto” for $100 off registration

Transcript of GOTO Night with Todd Montgomery: Aeron: What, why and what next?

Use$promo$code!“ilovegoto”!!for$$100$off$registration!

Aeron What, Why, and What Next?

Todd L. Montgomery

@toddlmontgomery

THANK YOU!

1. Why build another Product?

2. What Features are really needed? 3. How does one Design for this?

4. How did things Evolve along the way?

5. What’s Next?

1. Why build another product?

Feature Bloat & Complexity

Not Fast Enough

Low & Predictable Latency is key

We are in a new world

Multi-core, Multi-socket, Cloud...

We are in a new world

UDP, IPC, InfiniBand,RDMA, PCI-e

Multi-core, Multi-socket, Cloud...

Aeron is trying a new approach

The Team

Todd Montgomery Richard Warburton

Martin Thompson

2. What featuresare really needed?

Publishers SubscribersChannel

Stream

Messaging

Channel

A library, not a framework, on which other abstractions and

applications can be built

Composable Design

OSI layer 4 Transport for message oriented streams

OSI Layer 4 (Transport) Services

1. Connection Oriented Communication

2. Reliability 3. Flow Control

4. Congestion Avoidance/Control

5. Multiplexing

Connection Oriented Communication

Reliability

Flow Control

Congestion Avoidance/Control

Multiplexing

Multi-Everything World!

Publishers Subscribers

Channel

Stream

Multi-Everything World

3. How does onedesign for this?

Design Principles

1. Garbage free in steady state running 2. Smart Batching in the message path

3. Wait-free algos in the message path

4. Non-blocking IO in the message path

5. No exceptional cases in message path

6. Apply the Single Writer Principle 7. Prefer unshared state

8. Avoid unnecessary data copies

It’s all about 3 things

It’s all about 3 things

1. System Architecture

It’s all about 3 things

1. System Architecture

2. Data Structures

It’s all about 3 things

1. System Architecture

2. Data Structures

3. Protocols of Interaction

Publisher

Subscriber

Subscriber

Publisher

Architecture

IPC Log Buffer

Sender

Receiver

Receiver

Sender

Publisher

Subscriber

Subscriber

Publisher

Architecture

Media

IPC Log BufferMedia (UDP, InfiniBand, PCI-e 3.0)

Conductor

Sender

Receiver

Conductor

Receiver

Sender

Publisher

Subscriber

Subscriber

Publisher

Admin

Events

Architecture

Admin

EventsMedia

IPC Log BufferMedia (UDP, InfiniBand, PCI-e 3.0)Function/Method CallVolatile Fields & Queues

ClientMedia DriverMedia Driver

Conductor

Sender

Receiver

Conductor

Receiver

Sender

Client

Publisher

Conductor Conductor

Subscriber

Subscriber

Publisher

Admin

Events

Architecture

Admin

EventsMedia

IPC Log Buffer

IPC Ring/Broadcast Buffer

Media (UDP, InfiniBand, PCI-e 3.0)Function/Method CallVolatile Fields & Queues

Is the Media Driver a broker?

Broker Relationship Status: Complicated…

Broker Relationship Status: Complicated…

1. Media Driver can be standalone

Broker Relationship Status: Complicated…

1. Media Driver can be standalone

2. Media Driver can be embedded

Broker Relationship Status: Complicated…

1. Media Driver can be standalone

2. Media Driver can be embedded

3. Core-to-Core memory handoff

Broker Relationship Status: Complicated…

1. Media Driver can be standalone

2. Media Driver can be embedded

3. Core-to-Core memory handoff

4. Isolation of protocol logic

Data Structures

• Maps

• IPC Ring Buffers

• IPC Broadcast Buffers

• ITC Queues

• Dynamic Arrays

• Log Buffers

Creates a replicated persistent log

of messages

What Aeron does

TailFile

Tail

File

Message 1

Header

Tail

File

Message 1

Header

Message 2

Header

Tail

File

Message 1

Header

Message 2

Header

Tail

File

Message 1

Header

Message 2

Header

Message 3

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Persistent data structures can be safe to read without locks

One big file that goes on forever?

No!!!

Page faults, page cache churn, VM pressure, ...

ActiveDirtyClean

Tail

Message

Header

Message

Header

Message

Header

Message

Header

Message

Header

Message

Header

Message

Header

Message

Header

How do we stay “wait-free”?

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Header

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Header

Padding

TailFile

Message 1

Header

Message 2

Header

Message 3

Header

Message Y

Header

Padding

File

Message X

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Header

Padding

File

Header

What’s in a header?

Data Message Header

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Version |B|E| Flags | Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+ |R| Frame Length | +-+-------------------------------------------------------------+ |R| Term Offset | +-+-------------------------------------------------------------+ | Session ID | +---------------------------------------------------------------+ | Stream ID | +---------------------------------------------------------------+ | Term ID | +---------------------------------------------------------------+ | Encoded Message ... ... | +---------------------------------------------------------------+

Header Evolution

1. Same header on wire & in memory

2. Frame Length originally 16-bits

3. IPv6-style Header Chains? Pfft!

4. Frame Alignment & Padding

5. Fragmentation? 2-bits

6. Position! Position! Position!

What is a position?

Unique identification of a byte within each stream across time

(streamId, sessionId, termId, termOffset)

count = currentTerm - initialTerm

(count * termLength) + termOffset

// termLength power of 2

bts = ntz(termLength) (count << bts) + termOffset

Position

How do we replicate a log?

We need a Protocol of messages

Sender Receiver

Receiver

Setup

Sender

Sender

Status

Receiver

DataData

Sender Receiver

DataData

Status

Sender Receiver

DataDataHeartbeat

Sender Receiver

DataData

NAK

Sender Receiver

Data

Sender Receiver

Protocol Evolution

1. 0-length Data Frames (SETUP & HB)

2. Ranged NAKs vs NAKing a Range

3. Send Padding & Frame Alignment!

4. Eliminating Special Cases

How are message streams reassembled?

High Water MarkFile

Completed

High Water Mark

File

Message 1

Header

Completed

High Water Mark

File

Message 1

Header

Message 3

Header

Completed

File

Message 1

Header

Message 2

Header

Message 3

Header

Completed High Water Mark

How do we know what is consumed?

Publishers, Senders, Receivers, and Subscribers all keep position counters

Counters are the key to flow control and monitoring

Flow Control

Three Flow Control Windows

1. Publisher to Sender – Counter

2. Sender to Receiver – Status Messages

3. Receiver to Subscriber – Counter

Status Messages

Completed Position of Subscriber +

Receiver Window

Status Message Generation

1. Term Rollover

2. Every ¼ Term Progression

3. On Timeout

Clocked by Sender Data Rate

Flow control strategies handle composition of status from

multiple receivers

Safety

No Status Messages,

No Completed Progression,

No Sending

Congestion Control

Status Messages

Completed Position of Subscriber +

Receiver Window

Dynamically adjust

Receiver Window based on loss

Loss, throughput, and buffer size are all strongly related!!!

Pro Tip:Know your OS network parameters and how to tune them

4. What else did we discover?

Some parts of Java really suck!

Some parts of Java really suck!

Unsigned Types?

Some parts of Java really suck!

Unsigned Types?

NIO (most of) - Locks

Some parts of Java really suck!

Unsigned Types?

NIO (most of) - Locks

Off-heap, PAUSE, Signals, etc.

Some parts of Java really suck!

Unsigned Types?

String Encoding

NIO (most of) - Locks

Off-heap, PAUSE, Signals, etc.

Some parts of Java really suck!

Unsigned Types?

String Encoding

NIO (most of) - Locks

Off-heap, PAUSE, Signals, etc.

Managing External Resources

Some parts of Java really suck!

Unsigned Types?

Off-heap, PAUSE, Signals, etc.

Selectors - GC

String Encoding

NIO (most of) - Locks

Managing External Resources

Some parts of Java are really nice!

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram

Profiling – Flight Recorder

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram

Bytecode Instrumentation

Profiling – Flight Recorder

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram

Bytecode Instrumentation

Unsafe!!! + Java 8

Profiling – Flight Recorder

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram

Bytecode Instrumentation

Profiling – Flight Recorder

The Optimiser

Unsafe!!! + Java 8

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram

Bytecode Instrumentation

Profiling – Flight Recorder

The Optimiser – Love/Hate

Unsafe!!! + Java 8

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram

Bytecode Instrumentation

Garbage Collection!!!

Profiling – Flight Recorder

Unsafe!!! + Java 8

The Optimiser – Love/Hate

5. What’s Next?

Finished a few passes ofProfiling and Tuning

20+ Million 40 bytemessages per second!!!

Replication

Services

Aeron Core

Persistence

Queueing

Performance

C/C++ Port

Batch Send/Recv

IPC

InfinibandMulti Unicast Send

Stream Query

In closing…

Aeron: https://github.com/real-logic/Aeron Twitter: @toddlmontgomery

Thank You!

Questions?