Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and...

73
1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015

Transcript of Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and...

Page 1: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

1

Tales of the TailHardware, OS, and Application-level

Sources of Tail Latency

Jialin Li, Naveen Kr. Sharma,Dan R. K. Ports and Steven D. Gribble

February 2, 2015

Page 2: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

2

Introduction What is Tail Latency?

What is Tail Latency?

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 3: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

2

Introduction What is Tail Latency?

What is Tail Latency?

Request Processing Time

Fraction of Requests

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 4: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

2

Introduction What is Tail Latency?

What is Tail Latency?

Request Processing Time

Fraction of Requests

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 5: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

2

Introduction What is Tail Latency?

What is Tail Latency?

Request Processing Time

Fraction of Requests

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 6: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

2

Introduction What is Tail Latency?

What is Tail Latency?

Request Processing Time

Fraction of Requests

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 7: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

3

Introduction What is Tail Latency?

Why is the Tail important?

Low latency is crucial for interactive services.

500ms delay can cause 20% drop in user traffic. [Google Study]

Latency is directly tied to traffic, hence revenue.

What makes it challenging is today’s datacenter workloads.

Interactive services are highly parallel.

Single client request spawns thousands of sub-tasks.

Overall latency depends on slowest sub-task latency.

Bad Tail ⇒ Probability of any one sub-task getting delayed is high.

Page 8: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

3

Introduction What is Tail Latency?

Why is the Tail important?

Low latency is crucial for interactive services.

500ms delay can cause 20% drop in user traffic. [Google Study]

Latency is directly tied to traffic, hence revenue.

What makes it challenging is today’s datacenter workloads.

Interactive services are highly parallel.

Single client request spawns thousands of sub-tasks.

Overall latency depends on slowest sub-task latency.

Bad Tail ⇒ Probability of any one sub-task getting delayed is high.

Page 9: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

4

Introduction What is Tail Latency?

A real-life example

Nishtala et. al. Scaling memcache at Facebook, NSDI 2013.

All requests have to finish within the SLA latency.

Page 10: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

4

Introduction What is Tail Latency?

A real-life example

Nishtala et. al. Scaling memcache at Facebook, NSDI 2013.

All requests have to finish within the SLA latency.

Page 11: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

5

Introduction What is Tail Latency?

What can we do?

People in industry have worked hard on solutions.

Hedged Requests [Jeff Dean et. al.]

Effective sometimes, but adds application specific complexity.

Intelligently avoid slow machines

Keep track of server status; route requests around slow nodes.

Attempts to build predictable response out of less predictable parts.

We still don’t know what is causing requests to get delayed.

Page 12: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

5

Introduction What is Tail Latency?

What can we do?

People in industry have worked hard on solutions.

Hedged Requests [Jeff Dean et. al.]

Effective sometimes, but adds application specific complexity.

Intelligently avoid slow machines

Keep track of server status; route requests around slow nodes.

Attempts to build predictable response out of less predictable parts.

We still don’t know what is causing requests to get delayed.

Page 13: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

6

Introduction What is Tail Latency?

Our Approach

1 Pick some real life applications: RPC Server, Memcached, Nginx.

2 Generate the ideal latency distribution.

3 Measure the actual distribution on a standard Linux server.

4 Identify a factor causing deviation from ideal distribution.

5 Explain and mitigate it.

6 Iterate over this till we reach the ideal distribution.

Page 14: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

7

Introduction What is Tail Latency?

Rest of the Talk

1 Introduction

2 Predicted Latency from Queuing Models

3 Measurements: Sources of Tail Latencies

4 Summary

Page 15: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 16: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 17: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 18: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

Server

Clients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 19: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 20: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 21: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 22: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 23: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 24: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 25: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

99th percentile ⇒ 60 µs

Page 26: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

99.9th percentile ⇒ 200 µs

Page 27: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 28: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 29: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 30: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Inherent tail latency due to request burstiness.

Page 31: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Tail latency depends on the average server utilization.

Page 32: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Additional workers can reduce tail latency, even at constant utilization.

Page 33: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

10

Measurements: Sources of Tail Latencies

1 Introduction

2 Predicted Latency from Queuing Models

3 Measurements: Sources of Tail Latencies

4 Summary

Page 34: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

11

Measurements: Sources of Tail Latencies

Testbed

Cluster of standard datacenter machines.

2 x Intel L5640 6 core CPU

24 GB of DRAM

Mellanox 10Gbps NIC

Ubuntu 12.04, Linux Kernel 3.2.0

All servers connected to a single 10 Gbps ToR switch.

One server runs Memcached, others run workload generating clients.

Other application results are in the paper.

Page 35: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

12

Measurements: Sources of Tail Latencies

Timestamping Methodology

Append a blank buffer ≈ 32 bytes to each request.

Overwrite buffer with timestamps as it goes through the server.

Incoming

Server NIC

Memcached

read() return

Outgoing

Server NIC

Memcached

write()

After TCP/UDP

processing

Memcached thread

scheduled on CPU

Very low overhead and no server side logging.

Page 36: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

13

Measurements: Sources of Tail Latencies

How far are we from the ideal?

Page 37: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

14

Measurements: Sources of Tail Latencies

How far are we from the ideal?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

Single CPU, single core, Memcached running at 80% utilization.

Page 38: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

14

Measurements: Sources of Tail Latencies

How far are we from the ideal?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

Single CPU, single core, Memcached running at 80% utilization.

Page 39: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

14

Measurements: Sources of Tail Latencies

How far are we from the ideal?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

Single CPU, single core, Memcached running at 80% utilization.

Page 40: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

15

Measurements: Sources of Tail Latencies

Rest of the talk

Source of Tail Latency Potential way to fix

Background Processes

Multicore Concurrency

Interrupt Processing

Page 41: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

15

Measurements: Sources of Tail Latencies

Rest of the talk

Source of Tail Latency Potential way to fix

Background Processes

Multicore Concurrency

Interrupt Processing

Page 42: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

16

Measurements: Sources of Tail Latencies

How can background processes affect tail latency?

Memcached threads time-share a CPU core with other processes.

We need to wait for other processes to relinquish CPU.

Scheduling time-slices are usually couple of milliseconds.

How can we mitigate it?

Raise priority (decrease niceness) ⇒ More CPU time.

Upgrade scheduling class to real-time ⇒ Pre-emptive power.

Run on a dedicated core ⇒ No interference what-so-ever.

Page 43: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

16

Measurements: Sources of Tail Latencies

How can background processes affect tail latency?

Memcached threads time-share a CPU core with other processes.

We need to wait for other processes to relinquish CPU.

Scheduling time-slices are usually couple of milliseconds.

How can we mitigate it?

Raise priority (decrease niceness) ⇒ More CPU time.

Upgrade scheduling class to real-time ⇒ Pre-emptive power.

Run on a dedicated core ⇒ No interference what-so-ever.

Page 44: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated CoreInterference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 45: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated CoreInterference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 46: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated CoreInterference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 47: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated Core

Interference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 48: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated CoreInterference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 49: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

18

Measurements: Sources of Tail Latencies

Source of Tail Latency Potential way to fix

Background Processes Isolate by running on a dedicated core.

Multicore Concurrency

Interrupt Processing

Page 50: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

18

Measurements: Sources of Tail Latencies

Source of Tail Latency Potential way to fix

Background Processes Isolate by running on a dedicated core.

Multicore Concurrency

Interrupt Processing

Page 51: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 52: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 53: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 54: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 55: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 56: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

20

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

Yes it does! Provided we maintain a single queue abstraction.

Memcached partitions requests statically among threads.

ServerClients

Ideal Model

ServerClients

Memcached Architecture

How can we mitigate it?

Modify Memcached concurrency model to use a single queue.

Page 57: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

20

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

Yes it does! Provided we maintain a single queue abstraction.

Memcached partitions requests statically among threads.

ServerClients

Ideal Model

ServerClients

Memcached Architecture

How can we mitigate it?

Modify Memcached concurrency model to use a single queue.

Page 58: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

20

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

Yes it does! Provided we maintain a single queue abstraction.

Memcached partitions requests statically among threads.

ServerClients

Ideal Model

ServerClients

Memcached Architecture

How can we mitigate it?

Modify Memcached concurrency model to use a single queue.

Page 59: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

20

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

Yes it does! Provided we maintain a single queue abstraction.

Memcached partitions requests statically among threads.

ServerClients

Ideal Model

ServerClients

Memcached Architecture

How can we mitigate it?

Modify Memcached concurrency model to use a single queue.

Page 60: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

21

Measurements: Sources of Tail Latencies

Impact of Multicore Concurrency Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

For multi-threaded applications, a singlequeue abstraction can reduce tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 61: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

21

Measurements: Sources of Tail Latencies

Impact of Multicore Concurrency Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

For multi-threaded applications, a singlequeue abstraction can reduce tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 62: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

21

Measurements: Sources of Tail Latencies

Impact of Multicore Concurrency Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

For multi-threaded applications, a singlequeue abstraction can reduce tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 63: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

21

Measurements: Sources of Tail Latencies

Impact of Multicore Concurrency Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

For multi-threaded applications, a singlequeue abstraction can reduce tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 64: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

22

Measurements: Sources of Tail Latencies

Source of Tail Latency Potential way to fix

Background Processes Isolate by running on a dedicated core.

Concurrency Model Ensure a single queue abstraction.

Interrupt Processing

Page 65: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

22

Measurements: Sources of Tail Latencies

Source of Tail Latency Potential way to fix

Background Processes Isolate by running on a dedicated core.

Concurrency Model Ensure a single queue abstraction.

Interrupt Processing

Page 66: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

23

Measurements: Sources of Tail Latencies

How can interrupts affect tail latency?

By default, Linux irqbalance spreads interrupts across all cores.

OS pre-empts Memcached threads frequently.

Introduces extra context switching overheads and cache pollution.

How can we mitigate it?

Separate cores for interrupt processing and application threads.

3 cores run Memcached threads, and 1 core processes interrupts.

Page 67: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

23

Measurements: Sources of Tail Latencies

How can interrupts affect tail latency?

By default, Linux irqbalance spreads interrupts across all cores.

OS pre-empts Memcached threads frequently.

Introduces extra context switching overheads and cache pollution.

How can we mitigate it?

Separate cores for interrupt processing and application threads.

3 cores run Memcached threads, and 1 core processes interrupts.

Page 68: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

24

Measurements: Sources of Tail Latencies

Impact of Interrupt Processing

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

4 core Linux - Interrupt spread

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

4 core Linux - Interrupt spread

4 core Linux - Separate Interrupt core

Separate cores for interrupt and applicationprocessing improves tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 69: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

24

Measurements: Sources of Tail Latencies

Impact of Interrupt Processing

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

4 core Linux - Interrupt spread

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

4 core Linux - Interrupt spread

4 core Linux - Separate Interrupt core

Separate cores for interrupt and applicationprocessing improves tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 70: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

24

Measurements: Sources of Tail Latencies

Impact of Interrupt Processing

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

4 core Linux - Interrupt spread

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

4 core Linux - Interrupt spread

4 core Linux - Separate Interrupt core

Separate cores for interrupt and applicationprocessing improves tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 71: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

25

Measurements: Sources of Tail Latencies

Other sources of tail latency

Source of Tail Latency Underlying Cause

Thread Scheduling Policy Non-FIFO ordering of requests.

NUMA Effects Increased latency across NUMA nodes.

Hyper-threadingContending hyper-threads can increaselatency.

Power Saving FeaturesExtra time required to wake CPU fromidle state.

Page 72: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

26

Summary

Summary and Future Works

We explored hardware, OS and application-level sources of tail latency.

Pin-point sources using finegrained timestaming, and an ideal model.

We obtain substantial improvements, close to ideal distributions.

99.9th percentile latency of Memcached from 5 ms to 32 µs.

Sources of tail latency in multi-process environment.

How does virtualization effect tail latency?

Overhead of virtualization, interference from other VMs.

New effects when moving to a distributed setting, network effects.

Page 73: Tales of the Tail Hardware, OS, and Application-level ......1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K.

26

Summary

Summary and Future Works

We explored hardware, OS and application-level sources of tail latency.

Pin-point sources using finegrained timestaming, and an ideal model.

We obtain substantial improvements, close to ideal distributions.

99.9th percentile latency of Memcached from 5 ms to 32 µs.

Sources of tail latency in multi-process environment.

How does virtualization effect tail latency?

Overhead of virtualization, interference from other VMs.

New effects when moving to a distributed setting, network effects.