Livio slides-libflexsc-usenix-atc11

Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto

Page 1: Livio slides-libflexsc-usenix-atc11

Exception-Less System Calls for Event-Driven Servers

Livio Soares and Michael Stumm

University of Toronto

Page 2: Livio slides-libflexsc-usenix-atc11

Livio Soares | Exception-Less System Calls for Event-Driven Servers

Talk overview
➔ At OSDI'10: exception-less system calls
  ➔ Technique targeted at highly threaded servers
  ➔ Doubled performance of Apache
➔ Event-driven servers are popular
  ➔ Faster than threaded servers

We show that exception-less system calls make event-driven servers faster
➔ memcached speeds up by 25-35%
➔ nginx speeds up by 70-120%

Page 3: Livio slides-libflexsc-usenix-atc11

Event-driven server architectures
➔ Support I/O concurrency with a single execution context
  ➔ Alternative to thread-based architectures
➔ At a high level:
  ➔ Divide program flow into non-blocking stages
  ➔ After each stage, register interest in event(s)
  ➔ Notification of an event is asynchronous, driving the next stage in the program flow
  ➔ To avoid idle time, applications multiplex execution of multiple independent stages
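
The staged structure described on this slide can be sketched in C as a small state machine. This is only an illustration: the stage names, the `struct conn` layout, and the `deliver_events()` stand-in for kernel notifications are all assumptions made for the example, not code from the talk.

```c
#include <stddef.h>

/* Hypothetical stages of one connection's program flow. */
enum stage { ST_READ, ST_WRITE, ST_CLOSE, ST_DONE };

struct conn {
    enum stage stage;  /* next non-blocking stage to run */
    int ready;         /* set when the awaited event has arrived */
    int steps_run;     /* number of stages executed so far */
};

/* Run exactly one non-blocking stage, then wait for the next event. */
void run_stage(struct conn *c)
{
    switch (c->stage) {
    case ST_READ:  c->stage = ST_WRITE; break;
    case ST_WRITE: c->stage = ST_CLOSE; break;
    case ST_CLOSE: c->stage = ST_DONE;  break;
    case ST_DONE:  return;
    }
    c->steps_run++;
    c->ready = 0;      /* interest registered; wait for a notification */
}

/* Stand-in for the kernel: notify every unfinished connection. */
void deliver_events(struct conn *conns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (conns[i].stage != ST_DONE)
            conns[i].ready = 1;
}

/* One execution context multiplexing all connections. */
void event_loop(struct conn *conns, size_t n)
{
    for (;;) {
        size_t done = 0;
        deliver_events(conns, n);
        for (size_t i = 0; i < n; i++) {
            if (conns[i].stage == ST_DONE) { done++; continue; }
            if (conns[i].ready)
                run_stage(&conns[i]);
        }
        if (done == n)
            break;
    }
}

/* Three independent connections interleaved in one loop. */
int demo_event_loop(void)
{
    struct conn conns[3] = {
        { ST_READ, 0, 0 }, { ST_WRITE, 0, 0 }, { ST_READ, 0, 0 }
    };
    event_loop(conns, 3);
    return conns[0].steps_run + conns[1].steps_run + conns[2].steps_run;
}
```

Each pass of the loop advances every ready connection by one stage, so no connection waits for another to finish its whole flow.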

Page 4: Livio slides-libflexsc-usenix-atc11

Example: simple network server

void server() {
    ...
    fd = accept();
    ...
    read(fd);
    ...
    write(fd);
    ...
    close(fd);
    ...
}

Page 5: Livio slides-libflexsc-usenix-atc11

Example: simple network server

void server() {
    ...
    fd = accept();
    ...
    read(fd);
    ...
    write(fd);
    ...
    close(fd);
    ...
}

Stages: S1, S2, S3, S4, S5

UNIX options:
➔ Non-blocking I/O: poll(), select(), epoll()
➔ Async I/O
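
As a rough illustration of the non-blocking option, the sketch below uses a pipe in place of a socket: a read on an empty non-blocking descriptor fails immediately, and poll() reports readiness once data arrives. The function name `demo_nonblocking_poll` and the negative step codes are made up for this example, and descriptors are not closed on the error paths.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Returns 0 if every step behaves as expected, else a negative step code. */
int demo_nonblocking_poll(void)
{
    int p[2];
    char buf[8];
    struct pollfd pfd;

    if (pipe(p) != 0)
        return -1;

    /* Switch the read end to non-blocking mode. */
    int flags = fcntl(p[0], F_GETFL, 0);
    if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) < 0)
        return -2;

    /* Nothing has been written yet, so read() returns at once with EAGAIN. */
    if (read(p[0], buf, sizeof buf) != -1 ||
        (errno != EAGAIN && errno != EWOULDBLOCK))
        return -3;

    /* Register interest in readability; with timeout 0 no event is pending. */
    pfd.fd = p[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 0) != 0)
        return -4;

    /* Once data is available, poll() reports the event and read() succeeds. */
    if (write(p[1], "hi", 2) != 2)
        return -5;
    if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLIN))
        return -6;
    if (read(p[0], buf, sizeof buf) != 2)
        return -7;

    close(p[0]);
    close(p[1]);
    return 0;
}
```

Note that every step here is itself a system call, which is the application/kernel communication the following slides count as overhead.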

Page 6: Livio slides-libflexsc-usenix-atc11

Performance: events vs. threads

[Figure: ApacheBench throughput (requests/sec., 0 to 14,000) vs. concurrency (0 to 500) for nginx (events) and Apache (threads)]

nginx delivers 1.7x the throughput of Apache; gracefully copes with high loads

Page 7: Livio slides-libflexsc-usenix-atc11

Issues with UNIX event primitives
➔ Do not cover all system calls
  ➔ Mostly work with file descriptors (files and sockets)
➔ Overhead
  ➔ Tracking progress of I/O involves both application and kernel code
  ➔ Application and kernel communicate frequently

Previous work shows that fine-grain mode switching can halve processor efficiency

Page 8: Livio slides-libflexsc-usenix-atc11

FlexSC component overview

FlexSC and FlexSC-Threads presented at OSDI 2010

This work: libflexsc for event-driven servers
1) memcached throughput increase of up to 35%
2) nginx throughput increase of up to 120%

Page 9: Livio slides-libflexsc-usenix-atc11

Benefits for event-driven applications

1) General purpose
  ➔ Any/all system calls can be asynchronous
2) Non-intrusive kernel implementation
  ➔ Does not require per syscall code
3) Facilitates multi-processor execution
  ➔ OS work is automatically distributed
4) Improved processor efficiency
  ➔ Reduces frequent user/kernel mode switches

Page 10: Livio slides-libflexsc-usenix-atc11

Summary of exception-less syscalls

Page 11: Livio slides-libflexsc-usenix-atc11

Exception-less interface: syscall page

write(fd, buf, 4096);

entry = free_syscall_entry();

/* write syscall */
entry->syscall = 1;
entry->num_args = 3;
entry->args[0] = fd;
entry->args[1] = buf;
entry->args[2] = 4096;
entry->status = SUBMIT;

while (entry->status != DONE)
    do_something_else();

return entry->return_code;
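
The hand-off on this slide can be imitated entirely in user space. In the sketch below, an ordinary pthread plays the part of the kernel side and a pipe stands in for the target descriptor. The `struct syscall_entry` layout, the FREE/SUBMIT/DONE values, and the C11 atomics are assumptions for illustration; the actual FlexSC syscall page is shared with the kernel and is not shown in the slides.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

enum { FREE, SUBMIT, DONE };          /* hypothetical status values */
#define SIM_SYS_WRITE 1               /* syscall number used on the slide */

struct syscall_entry {                /* hypothetical entry layout */
    int syscall;
    int num_args;
    intptr_t args[6];
    _Atomic int status;
    long return_code;
};

/* Stand-in for the kernel side: waits for SUBMIT, executes, marks DONE. */
void *syscall_thread(void *arg)
{
    struct syscall_entry *e = arg;

    while (atomic_load_explicit(&e->status, memory_order_acquire) != SUBMIT)
        sched_yield();
    if (e->syscall == SIM_SYS_WRITE && e->num_args == 3)
        e->return_code = write((int)e->args[0],
                               (const void *)e->args[1],
                               (size_t)e->args[2]);
    else
        e->return_code = -1;
    atomic_store_explicit(&e->status, DONE, memory_order_release);
    return NULL;
}

/* Application side: fill in the entry, submit it, and poll for completion. */
long demo_syscall_page(void)
{
    static const char msg[] = "hello";
    struct syscall_entry entry;
    pthread_t t;
    int p[2];

    if (pipe(p) != 0)
        return -1;
    atomic_init(&entry.status, FREE);
    if (pthread_create(&t, NULL, syscall_thread, &entry) != 0)
        return -1;

    entry.syscall = SIM_SYS_WRITE;
    entry.num_args = 3;
    entry.args[0] = p[1];
    entry.args[1] = (intptr_t)msg;
    entry.args[2] = 5;
    atomic_store_explicit(&entry.status, SUBMIT, memory_order_release);

    while (atomic_load_explicit(&entry.status, memory_order_acquire) != DONE)
        sched_yield();                /* placeholder for do_something_else() */

    pthread_join(t, NULL);
    close(p[0]);
    close(p[1]);
    return entry.return_code;         /* 5 bytes written to the pipe */
}
```

The release store on `status` publishes the arguments to the worker, and the matching acquire load makes `return_code` visible to the caller; no mode switch is needed on the submitting side.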

Page 12: Livio slides-libflexsc-usenix-atc11

Exception-less interface: syscall page

(Same code as on Page 11.)

Syscall page entry status: SUBMIT

Page 13: Livio slides-libflexsc-usenix-atc11

Exception-less interface: syscall page

(Same code as on Page 11.)

Syscall page entry status: DONE

Page 14: Livio slides-libflexsc-usenix-atc11

Syscall threads
➔ Kernel-only threads
➔ Part of application process
➔ Execute requests from syscall page
➔ Schedulable on a per-core basis

Page 15: Livio slides-libflexsc-usenix-atc11

Dynamic multicore specialization

[Figure: four cores (Core 0, Core 1, Core 2, Core 3)]

1) FlexSC makes specializing cores simple
2) Dynamically adapts to workload needs

Page 16: Livio slides-libflexsc-usenix-atc11

libflexsc: async syscall library
➔ Async syscall and notification library
  ➔ Similar to libevent
  ➔ But operates on syscalls instead of file descriptors
➔ Three main components:
  1) Provides main loop (dispatcher)
  2) Supports asynchronous syscalls with an associated callback to notify completion
  3) Cancellation support

Page 17: Livio slides-libflexsc-usenix-atc11

Main API: async system call

struct flexsc_cb {
    void (*callback)(struct flexsc_cb *); /* event handler */
    void *arg;                            /* auxiliary var */
    int64_t ret;                          /* syscall return */
};

int flexsc_##SYSCALL(struct flexsc_cb *, ... /* syscall args */);

/* Example: asynchronous accept */
struct flexsc_cb cb;
cb.callback = handle_accept;
flexsc_accept(&cb, master_sock, NULL, 0);

void handle_accept(struct flexsc_cb *cb) {
    int fd = cb->ret;
    if (fd != -1) {
        /* static so the control block outlives this function, since the
           read completes later; a real server would keep one per connection */
        static struct flexsc_cb read_cb;
        read_cb.callback = handle_read;
        flexsc_read(&read_cb, fd, read_buf, read_count);
    }
}
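
To show how a dispatcher might drive callbacks of this shape, here is a self-contained mock. Only `struct flexsc_cb` follows the slide; `mock_submit()`, `mock_dispatch()`, the fixed-size queue, and the canned results (fd 7, 128 bytes) are hypothetical and do not reflect how libflexsc is implemented.

```c
#include <stddef.h>
#include <stdint.h>

struct flexsc_cb {                        /* layout as shown on the slide */
    void (*callback)(struct flexsc_cb *); /* event handler */
    void *arg;                            /* auxiliary var */
    int64_t ret;                          /* syscall return */
};

/* Hypothetical completion queue standing in for the syscall page. */
#define MAX_PENDING 16
struct pending { struct flexsc_cb *cb; int64_t result; };
static struct pending queue[MAX_PENDING];
static size_t head, tail;

/* Mock "async syscall": records that cb will complete with `result`. */
int mock_submit(struct flexsc_cb *cb, int64_t result)
{
    if (tail - head == MAX_PENDING)
        return -1;                        /* queue full */
    queue[tail % MAX_PENDING] = (struct pending){ cb, result };
    tail++;
    return 0;
}

/* Mock main loop: deliver completions until none are outstanding. */
int mock_dispatch(void)
{
    int delivered = 0;
    while (head != tail) {
        struct pending p = queue[head % MAX_PENDING];
        head++;
        p.cb->ret = p.result;             /* store the syscall return */
        p.cb->callback(p.cb);             /* notify completion */
        delivered++;
    }
    return delivered;
}

/* Control blocks have static storage so they outlive the handlers. */
static struct flexsc_cb accept_cb, read_cb;
static int64_t bytes_read = -1;

static void handle_read(struct flexsc_cb *cb)
{
    bytes_read = cb->ret;
}

static void handle_accept(struct flexsc_cb *cb)
{
    if (cb->ret != -1) {
        read_cb.callback = handle_read;
        mock_submit(&read_cb, 128);       /* pretend read() returns 128 */
    }
}

int64_t demo_callbacks(void)
{
    accept_cb.callback = handle_accept;
    mock_submit(&accept_cb, 7);           /* pretend accept() returns fd 7 */
    mock_dispatch();                      /* runs accept, then read handler */
    return bytes_read;
}
```

The accept handler issues the next call from inside its callback, so one dispatch loop carries the connection through both stages.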

Page 18: Livio slides-libflexsc-usenix-atc11

memcached port to libflexsc
➔ memcached: in-memory key/value store
  ➔ Simple code base: 8K LOC
  ➔ Uses libevent
➔ Modified 293 LOC
  ➔ Transformed libevent calls to libflexsc
  ➔ Mostly in one file: memcached.c
  ➔ Most memcached syscalls are socket based

Page 19: Livio slides-libflexsc-usenix-atc11

nginx port to libflexsc
➔ Most popular event-driven webserver
  ➔ Code base: 82K LOC
  ➔ Natively uses both non-blocking (epoll) I/O and asynchronous I/O
➔ Modified 255 LOC
  ➔ Socket based code already asynchronous
  ➔ Not all file-system calls were asynchronous
    ➔ e.g., open, fstat, getdents
  ➔ Special handling of stack allocated syscall args
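
The slides do not say how stack-allocated arguments were handled in the nginx port. One plausible approach, sketched here under that caveat, is to move the arguments into a heap-allocated request object that stays valid until the deferred call completes. `struct stat_req` and its helper functions are invented for this example.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical per-request state kept on the heap, so that pointers handed
   to a deferred syscall remain valid after the issuing function returns. */
struct stat_req {
    char path[256];
    struct stat st;
    int ret;
};

/* Issue side: copy the path into the request instead of pointing at a
   caller-owned stack buffer. */
struct stat_req *stat_req_new(const char *path)
{
    struct stat_req *r = calloc(1, sizeof *r);
    if (r == NULL)
        return NULL;
    strncpy(r->path, path, sizeof r->path - 1);
    return r;
}

/* Completion side: stand-in for the call being executed some time later. */
void stat_req_complete(struct stat_req *r)
{
    r->ret = stat(r->path, &r->st);
}

int demo_heap_args(void)
{
    struct stat_req *r;
    {
        char local_path[] = "/";      /* stack buffer leaves scope below */
        r = stat_req_new(local_path);
    }
    if (r == NULL)
        return -1;
    stat_req_complete(r);             /* runs after the stack buffer is gone */
    int is_dir = (r->ret == 0) && S_ISDIR(r->st.st_mode);
    free(r);
    return is_dir;
}
```

With a synchronous call the stack buffer would have been fine; it is the delay between submission and execution that forces the copy.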

Page 20: Livio slides-libflexsc-usenix-atc11

Evaluation
➔ Linux 2.6.33
➔ Nehalem (Core i7) server, 2.3GHz
  ➔ 4 cores
➔ Client connected through 1Gbps network
➔ Workloads
  ➔ memslap on memcached (30% user, 70% kernel)
  ➔ httperf on nginx (25% user, 75% kernel)
➔ Default Linux (“epoll”) vs. libflexsc (“flexsc”)

Page 21: Livio slides-libflexsc-usenix-atc11

memcached on 4 cores

[Figure: throughput (requests/sec., 0 to 140,000) vs. request concurrency (0 to 1,000) for flexsc and epoll]

30% improvement

Page 22: Livio slides-libflexsc-usenix-atc11

memcached processor metrics

[Figure: relative performance (flexsc/epoll), scale 0 to 1.2, for CPI, L2, d-cache, and i-cache, shown separately for User and Kernel]

Page 23: Livio slides-libflexsc-usenix-atc11

httperf on nginx (1 core)

[Figure: throughput (Mbps, 0 to 120) vs. requests/s (0 to 60,000) for flexsc and epoll]

100% improvement

Page 24: Livio slides-libflexsc-usenix-atc11

nginx processor metrics

[Figure: relative performance (flexsc/epoll), scale 0 to 1.2, for CPI, L2, d-cache, i-cache, and Branch, shown separately for User and Kernel]

Page 25: Livio slides-libflexsc-usenix-atc11

Concluding remarks
➔ Current event-based primitives add overhead
  ➔ I/O operations require frequent communication between OS and application
➔ libflexsc: exception-less syscall library
  1) General purpose
  2) Non-intrusive kernel implementation
  3) Facilitates multi-processor execution
  4) Improved processor efficiency
➔ Ported memcached and nginx to libflexsc
  ➔ Performance improvements of 30-120%

Page 26: Livio slides-libflexsc-usenix-atc11

Exception-Less System Calls for Event-Driven Servers

Livio Soares and Michael Stumm

University of Toronto

Page 27: Livio slides-libflexsc-usenix-atc11

Backup Slides

Page 28: Livio slides-libflexsc-usenix-atc11

Difference in improvements

Why does nginx improve more than memcached?

1) Frequency of mode switches:

   Server                                     memcached    nginx
   Frequency of syscalls (in instructions)    3,750        1,460

2) nginx uses greater diversity of system calls
  ➔ More interference in processor structures (caches)
3) Instruction count reduction
  ➔ nginx with epoll() has connection timeouts

Page 29: Livio slides-libflexsc-usenix-atc11

Limitations
➔ Scalability (number of outstanding syscalls)
  ➔ Interface: operations perform linear scan
  ➔ Implementation: overheads of syscall threads non-negligible
➔ Solutions
  ➔ Throttle syscalls at application or OS
  ➔ Switch interface to scalable message passing
  ➔ Provide exception-less versions of async I/O
  ➔ Make kernel fully non-blocking
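
The linear-scan limitation and the throttling remedy can be pictured with a toy syscall page. The entry layout, the page size of 64, and the limit of 8 outstanding calls are arbitrary assumptions; the sketch only counts how many entries are probed as outstanding calls accumulate.

```c
#include <stddef.h>

enum status { FREE, SUBMIT, DONE };      /* hypothetical status values */
#define NUM_ENTRIES 64

struct entry { enum status status; };

/* Finding a free entry scans the page from the start: O(n) per submit. */
int find_free_entry(const struct entry *page, size_t n, size_t *probes)
{
    for (size_t i = 0; i < n; i++) {
        (*probes)++;
        if (page[i].status == FREE)
            return (int)i;
    }
    return -1;
}

/* Application-level throttle: refuse new submissions past `limit`. */
int submit_throttled(struct entry *page, size_t n, size_t *outstanding,
                     size_t limit, size_t *probes)
{
    if (*outstanding >= limit)
        return -1;                       /* throttled before any scan */
    int i = find_free_entry(page, n, probes);
    if (i < 0)
        return -1;
    page[i].status = SUBMIT;
    (*outstanding)++;
    return i;
}

/* Eight submissions probe 1 + 2 + ... + 8 entries; the ninth is refused. */
size_t demo_linear_scan(void)
{
    struct entry page[NUM_ENTRIES] = { { FREE } };
    size_t outstanding = 0, probes = 0;

    for (int k = 0; k < 8; k++)
        submit_throttled(page, NUM_ENTRIES, &outstanding, 8, &probes);

    int rejected = submit_throttled(page, NUM_ENTRIES, &outstanding, 8,
                                    &probes);
    return rejected == -1 ? probes : 0;
}
```

The probe count grows quadratically with the number of outstanding calls, which is why capping them keeps the scan cheap.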

Page 30: Livio slides-libflexsc-usenix-atc11

Latency (ApacheBench)

[Figure: latency (ms, 0 to 30) for epoll and flexsc on 1 core and 2 cores]

50% latency reduction