asyncio internals

asyncio internals Saúl Ibarra Corretgé @saghul PyGrunn 2014 Friday, May 9, 14


Slides from the talk given at PyGrunn 2014 about asyncio internals.

Transcript of asyncio internals

Page 1: asyncio internals

asyncio internals

Saúl Ibarra Corretgé

@saghul

PyGrunn 2014

Page 2: asyncio internals

Intro

New asynchronous I/O framework for Python

PEP-3156

Python >= 3.3 (in the standard library since 3.4; Trollius is a backport for Python 2)

Uses new language features: yield from

Designed to interoperate with other frameworks

You went to Rodrigo’s talk earlier today, right?



Page 4: asyncio internals

Architecture

Event loop

Coroutines, Futures and Tasks

Transports, Protocols and Streams

I’ll cover these: the event loop; coroutines, futures and tasks

Homework! Transports, protocols and streams


Page 5: asyncio internals

Event Loop


Page 6: asyncio internals

Calculate poll time → Poll → Run callbacks → repeat


Page 7: asyncio internals

There is no abstraction for an “event”

It runs callbacks which are put in a queue

Callbacks can be scheduled due to I/O, time or user desire

The event loop acts as an implicit scheduler


Page 8: asyncio internals

Simplified

    def call_soon(self, callback, *args):
        handle = events.Handle(callback, args, self)
        self._ready.append(handle)
        return handle


Page 9: asyncio internals

events.Handle is like a “callback wrapper”

The ready queue is a deque

Once per loop iteration, all handles in the ready queue are executed
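The ready queue can be sketched in a few lines. This is a minimal illustration, not asyncio's code: `Handle` and `ToyLoop` are hypothetical stand-ins that only keep the callback, its arguments and a cancelled flag, and the queue is drained by hand here.

```python
from collections import deque


class Handle:
    """Hypothetical toy callback wrapper: callback, args and a cancelled flag."""

    def __init__(self, callback, args):
        self._callback = callback
        self._args = args
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    def _run(self):
        self._callback(*self._args)


class ToyLoop:
    def __init__(self):
        self._ready = deque()  # the ready queue is a plain deque

    def call_soon(self, callback, *args):
        handle = Handle(callback, args)
        self._ready.append(handle)
        return handle


loop = ToyLoop()
calls = []
loop.call_soon(calls.append, 'a')
h = loop.call_soon(calls.append, 'b')
h.cancel()

# Drain the queue in FIFO order, skipping cancelled handles.
while loop._ready:
    handle = loop._ready.popleft()
    if not handle._cancelled:
        handle._run()
print(calls)  # ['a']
```

Returning the handle from `call_soon` is what makes cancellation possible: the caller keeps a reference and flips its flag before the loop gets to it.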


Page 10: asyncio internals

    def call_later(self, delay, callback, *args):
        return self.call_at(self.time() + delay, callback, *args)

    def call_at(self, when, callback, *args):
        timer = events.TimerHandle(when, callback, args, self)
        heapq.heappush(self._scheduled, timer)
        return timer

Simplified


Page 11: asyncio internals

Timers are stored in a heap (loop._scheduled)

TimerHandle subclasses Handle, but stores the time when it’s due and has comparison methods for keeping the heap ordered by due time
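A similar sketch for timers, assuming a hypothetical `TimerHandle` that only defines `__lt__` (all that `heapq` needs; the slide's version has more comparison methods). The earliest timer always sits at index 0, which is what the loop peeks at when computing its poll timeout.

```python
import heapq


class Handle:
    def __init__(self, callback, args):
        self._callback = callback
        self._args = args
        self._cancelled = False

    def _run(self):
        self._callback(*self._args)


class TimerHandle(Handle):
    """Hypothetical toy timer: remembers its due time, compares by it."""

    def __init__(self, when, callback, args):
        super().__init__(callback, args)
        self._when = when

    def __lt__(self, other):
        return self._when < other._when


scheduled = []
fired = []
for when, name in [(3.0, 'c'), (1.0, 'a'), (2.0, 'b')]:
    heapq.heappush(scheduled, TimerHandle(when, fired.append, (name,)))

# The heap keeps the earliest due time at scheduled[0].
print(scheduled[0]._when)  # 1.0
while scheduled:
    heapq.heappop(scheduled)._run()
print(fired)  # ['a', 'b', 'c']
```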


Page 12: asyncio internals

    ntodo = len(self._ready)
    for i in range(ntodo):
        handle = self._ready.popleft()
        if not handle._cancelled:
            handle._run()
    handle = None  # break cycles


Page 13: asyncio internals

This is the single place where the ready queue is iterated over

A thread-safe iteration method is used, since other threads could modify the ready queue (see call_soon_threadsafe)

If any handles are scheduled while the ready queue is being processed, they will be run on the next loop iteration
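The effect of snapshotting `len(self._ready)` before iterating can be seen in a toy version, assuming plain `(callback, args)` tuples instead of handles (`call_soon` and `run_once` here are hypothetical module-level helpers): a callback scheduled while the queue is being processed only runs on the next iteration.

```python
from collections import deque

ready = deque()
log = []


def call_soon(callback, *args):
    ready.append((callback, args))


def run_once():
    # Only run what was queued when this iteration started;
    # anything scheduled meanwhile waits for the next iteration.
    ntodo = len(ready)
    for _ in range(ntodo):
        callback, args = ready.popleft()
        callback(*args)


def second():
    log.append('second')


def first():
    log.append('first')
    call_soon(second)  # scheduled while the queue is being processed


call_soon(first)
run_once()
after_first = list(log)
print(after_first)  # ['first']
run_once()
print(log)  # ['first', 'second']
```

Without the snapshot, a callback that keeps rescheduling itself would starve I/O polling forever.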


Page 14: asyncio internals

I/O handling

Different polling mechanisms on Unix: select, poll, epoll, kqueue, devpoll

Windows is a completely different beast

Different paradigms: readiness vs completion

APIs are provided for both


Page 15: asyncio internals

I/O handling APIs

Readiness style

add_reader/add_writer

remove_reader/remove_writer

Completion style

sock_recv/sock_sendall

sock_connect/sock_accept


Page 16: asyncio internals

import selectors

New module in Python 3.4

Consistent interface to Unix polling mechanisms

On Windows it uses select()

64 file descriptors default* limit - WEBSCALE!

IOCP is the way to go, but has a different API

Caveat emptor: doesn’t work for file I/O


Page 17: asyncio internals

Simplified

    def add_reader(self, fd, callback, *args):
        handle = events.Handle(callback, args, self)
        try:
            key = self._selector.get_key(fd)
        except KeyError:
            self._selector.register(fd, selectors.EVENT_READ, (handle, None))
        else:
            mask, (reader, writer) = key.events, key.data
            self._selector.modify(fd, mask | selectors.EVENT_READ, (handle, writer))
            if reader is not None:
                reader.cancel()


Page 18: asyncio internals

The selector key stores the fd, the events mask and arbitrary user-provided data

In this case the data is the (reader, writer) handle tuple

Only one reader and one writer are allowed per fd: adding a new reader cancels the previous one
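A readiness-style sketch with the real `selectors` module, assuming a `socket.socketpair()` as a stand-in fd and plain functions instead of handles (`on_readable` is a hypothetical callback). As in `add_reader`, the key's `data` holds a `(reader, writer)` tuple, and the dispatch step mimics what `_process_events` does.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
rsock, wsock = socket.socketpair()
rsock.setblocking(False)

received = []


def on_readable(sock):
    received.append(sock.recv(1024))


# Store the (reader, writer) pair as the key's arbitrary data.
sel.register(rsock, selectors.EVENT_READ, (on_readable, None))

wsock.send(b'ping')
for key, mask in sel.select(timeout=1.0):
    reader, writer = key.data
    if mask & selectors.EVENT_READ and reader is not None:
        reader(key.fileobj)

print(received)  # [b'ping']
sel.unregister(rsock)
sel.close()
rsock.close()
wsock.close()
```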


Page 19: asyncio internals

Polling for I/O

1. Calculate timeout

2. Block for I/O

3. Process I/O events: schedule callbacks

4. Process timers: schedule callbacks

5. Run pending callbacks


Page 20: asyncio internals

    timeout = None
    if self._ready:
        timeout = 0
    elif self._scheduled:
        # Compute the desired timeout.
        when = self._scheduled[0]._when
        deadline = max(0, when - self.time())
        if timeout is None:
            timeout = deadline
        else:
            timeout = min(timeout, deadline)

    event_list = self._selector.select(timeout)
    self._process_events(event_list)

    end_time = self.time()
    while self._scheduled:
        handle = self._scheduled[0]
        if handle._when >= end_time:
            break
        handle = heapq.heappop(self._scheduled)
        self._ready.append(handle)

    # run all handles in the ready queue...

Simplified


Page 21: asyncio internals

If timeout is None an infinite poll is performed

_process_events puts the read / write handles in the ready queue, if applicable
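The timeout rule can be isolated into a pure function. `compute_timeout` is a hypothetical helper and timers are modelled as `(when, name)` tuples in a heap. The `min(timeout, deadline)` branch is dropped because, in the simplified code, `timeout` is always `None` when the `elif` branch runs.

```python
import heapq


def compute_timeout(ready, scheduled, now):
    """Return how long select() may block: 0, a deadline, or None (forever)."""
    if ready:
        return 0  # callbacks are waiting: just poll
    if scheduled:
        # scheduled is a heap of (when, name) pairs; the earliest is first.
        when = scheduled[0][0]
        return max(0, when - now)
    return None  # nothing to do but wait for I/O


timers = []
heapq.heappush(timers, (15.0, 'later'))
heapq.heappush(timers, (12.5, 'sooner'))
print(compute_timeout(['cb'], timers, now=10.0))  # 0
print(compute_timeout([], timers, now=10.0))      # 2.5
print(compute_timeout([], timers, now=20.0))      # 0 (timer overdue)
print(compute_timeout([], [], now=10.0))          # None
```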


Page 22: asyncio internals

    def call_soon_threadsafe(self, callback, *args):
        handle = self._call_soon(callback, args)
        self._write_to_self()
        return handle

Simplified


Page 23: asyncio internals

The event loop has the read end of a socketpair added to the selector

When _write_to_self is called, the loop will be “woken up” from the select/poll/epoll_wait/kevent syscall
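The self-pipe trick can be tried with the real `selectors`, `socket` and `threading` modules. This is a sketch, not the loop's code: `write_to_self` is a hypothetical stand-in for `_write_to_self`, and a timer thread plays the role of another thread calling `call_soon_threadsafe`.

```python
import selectors
import socket
import threading

sel = selectors.DefaultSelector()
self_reader, self_writer = socket.socketpair()
self_reader.setblocking(False)
sel.register(self_reader, selectors.EVENT_READ)


def write_to_self():
    # Any byte will do; its only purpose is to make select() return.
    self_writer.send(b'\0')


# Another thread wakes the "loop" up after a short delay.
threading.Timer(0.1, write_to_self).start()

# Without the wakeup this would block for the full 5 seconds.
events = sel.select(timeout=5.0)
woken = bool(events)
self_reader.recv(4096)  # drain the wakeup byte
print(woken)  # True

sel.close()
self_reader.close()
self_writer.close()
```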


Page 24: asyncio internals

Coroutines, Futures & Tasks


Page 25: asyncio internals

Coroutines

Generator functions that can also receive values

Use the @asyncio.coroutine decorator

Does extra checks in debug mode

Serves as documentation

Chain them with yield from
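A plain-generator sketch of both properties, without the decorator (`@asyncio.coroutine` was removed in Python 3.11): `send()` delivers a value into a suspended generator, and `yield from` forwards those sends to an inner generator and evaluates to its return value. `averager` and `delegator` are illustrative names.

```python
def averager():
    """Receives numbers via send() and yields the running mean."""
    total = 0.0
    count = 0
    mean = None
    while True:
        value = yield mean
        if value is None:
            return mean  # becomes the value of 'yield from'
        total += value
        count += 1
        mean = total / count


def delegator(results):
    # yield from forwards send() calls to the inner generator and
    # evaluates to its return value.
    result = yield from averager()
    results.append(result)


results = []
gen = delegator(results)
next(gen)            # prime: run up to the first yield
print(gen.send(10))  # 10.0
print(gen.send(20))  # 15.0
try:
    gen.send(None)   # tell the inner generator to finish
except StopIteration:
    pass
print(results)  # [15.0]
```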


Page 26: asyncio internals

Futures

Not actually PEP-3148 (concurrent.futures) futures

API almost identical

Represent a value which is not there yet

yield from can be used to wait for it!

asyncio.wrap_future can be used to wrap a PEP-3148 Future into one of these


Page 27: asyncio internals

f = Future()

Usually a future will be the result of a function

f.set_result / f.set_exception

Someone will set the result eventually

yield from f

Wait until the result arrives

add_done_callback / remove_done_callback

Callback based interface


Page 28: asyncio internals

    def set_result(self, result):
        if self._state != _PENDING:
            raise InvalidStateError('{}: {!r}'.format(self._state, self))
        self._result = result
        self._state = _FINISHED
        self._schedule_callbacks()

    def _schedule_callbacks(self):
        callbacks = self._callbacks[:]
        if not callbacks:
            return
        self._callbacks[:] = []
        for callback in callbacks:
            self._loop.call_soon(callback, self)


Page 29: asyncio internals

After the result or exception is set, all callbacks added with Future.add_done_callback are scheduled to run

Note how callbacks are scheduled in the event loop using call_soon
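With today's asyncio API (`loop.create_future()`), one can observe that `set_result` does not run done callbacks synchronously: they only run once the loop processes its ready queue. `main` and the `seen` list are illustrative names; `await asyncio.sleep(0)` is used to give the loop one iteration.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    seen = []
    fut.add_done_callback(lambda f: seen.append(f.result()))

    fut.set_result(42)
    # The callback was handed to call_soon, so it has not run yet.
    before = list(seen)

    await asyncio.sleep(0)  # yield to the loop for one iteration
    after = list(seen)
    return before, after, await fut


before, after, value = asyncio.run(main())
print(before, after, value)  # [] [42] 42
```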


Page 30: asyncio internals

Simplified

    def sock_connect(self, sock, address):
        fut = futures.Future(loop=self)
        self._sock_connect(fut, False, sock, address)
        return fut

    def _sock_connect(self, fut, registered, sock, address):
        fd = sock.fileno()
        if registered:
            self.remove_writer(fd)
        if fut.cancelled():
            return
        try:
            if not registered:
                sock.connect(address)
            else:
                err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
                if err != 0:
                    raise OSError(err, 'Connect call failed %s' % (address,))
        except (BlockingIOError, InterruptedError):
            self.add_writer(fd, self._sock_connect, fut, True, sock, address)
        except Exception as exc:
            fut.set_exception(exc)
        else:
            fut.set_result(None)


Page 31: asyncio internals

Not a coroutine, but we can wait on it using yield from because it returns a Future

The Uncallback Pattern (TM)

Hey, look at those nice exceptions: BlockingIOError, InterruptedError

Much nicer than checking if errno is EWOULDBLOCK or EINTR
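A quick check of those exception classes, assuming a non-blocking `socket.socketpair()` with nothing to read: the failed `recv` surfaces as `BlockingIOError`, a subclass of `OSError`, so no errno comparison is needed.

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)

caught = None
try:
    a.recv(1024)  # nothing has been sent yet
except BlockingIOError as exc:
    # Replaces "except OSError as e: if e.errno in (EAGAIN, EWOULDBLOCK): ..."
    caught = exc

print(type(caught).__name__)  # BlockingIOError
a.close()
b.close()
```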


Page 32: asyncio internals

    def run_until_complete(self, future):
        future = tasks.async(future, loop=self)
        future.add_done_callback(_raise_stop_error)
        self.run_forever()
        future.remove_done_callback(_raise_stop_error)
        if not future.done():
            raise RuntimeError('Event loop stopped before Future completed.')
        return future.result()


Page 33: asyncio internals

Loop.run_forever will run the loop until Loop.stop is called

_raise_stop_error is an implementation detail: it causes an exception to bubble up and makes run_forever return
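The same run-until-done idea can be sketched with the public API, assuming a done callback that calls `loop.stop()` in place of the private `_raise_stop_error` helper. `run_until_complete_sketch`, `compute` and `stop` are hypothetical names.

```python
import asyncio


async def compute():
    await asyncio.sleep(0.01)
    return 'done'


def run_until_complete_sketch(loop, coro):
    task = loop.create_task(coro)

    def stop(fut):
        loop.stop()  # makes run_forever() return after this iteration

    task.add_done_callback(stop)
    loop.run_forever()
    task.remove_done_callback(stop)
    if not task.done():
        raise RuntimeError('Event loop stopped before Future completed.')
    return task.result()


loop = asyncio.new_event_loop()
try:
    result = run_until_complete_sketch(loop, compute())
finally:
    loop.close()
print(result)  # done
```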


Page 34: asyncio internals

    def __iter__(self):
        if not self.done():
            self._blocking = True
            yield self  # This tells Task to wait for completion.
        assert self.done(), "yield from wasn't used with future"
        return self.result()  # May raise too.


Page 35: asyncio internals

Returning a value from a generator such as __iter__ ends it with StopIteration(value), and yield from evaluates to that value

The _blocking flag is used to check whether yield future was used instead of yield from future

When a Future is yielded to a Task, the Task waits for it to complete; it also checks that yield from was used (the _blocking flag)
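A stand-alone sketch of that protocol, assuming a hypothetical `ToyFuture` that copies the `__iter__` logic of the slide. Driving the generator by hand shows the future being yielded out (with `_blocking` set), then its result coming back as the value of `yield from`.

```python
class ToyFuture:
    """Hypothetical minimal future: just enough to show __iter__."""

    def __init__(self):
        self._done = False
        self._result = None
        self._blocking = False

    def done(self):
        return self._done

    def set_result(self, result):
        self._result = result
        self._done = True

    def result(self):
        return self._result

    def __iter__(self):
        if not self.done():
            self._blocking = True
            yield self  # hand the future to whoever drives the generator
        assert self.done(), "yield from wasn't used with future"
        return self.result()  # becomes the value of 'yield from'


def coro(fut):
    value = yield from fut
    return value * 2


fut = ToyFuture()
gen = coro(fut)
yielded = next(gen)  # the generator suspends, yielding the future itself
print(yielded is fut, fut._blocking)  # True True

fut.set_result(21)
final = None
try:
    gen.send(None)  # resume: __iter__ returns the result
except StopIteration as stop:
    final = stop.value
print(final)  # 42
```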



Page 37: asyncio internals

Tasks

Unit of concurrent asynchronous work

It’s actually a coroutine wrapped in a Future

Magic!

Schedules callbacks using loop.call_soon

Use asyncio.async to run a coroutine in a Task


Page 38: asyncio internals

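    import asyncio

    @asyncio.coroutine
    def f(n, x):
        while True:
            print(n)
            yield from asyncio.sleep(x)

    loop = asyncio.get_event_loop()
    # Python 3.4 API: asyncio.async() was later replaced by asyncio.ensure_future()
    # ('async' is a reserved keyword since 3.7; @asyncio.coroutine was removed in 3.11)
    asyncio.async(f('f1', 0.5))
    asyncio.async(f('f2', 1.5))
    loop.run_forever()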


Page 39: asyncio internals

Both coroutines will run concurrently

asyncio.async returns a Task if a coroutine was passed, or the unchanged value if a Future was passed

Go and check how asyncio.sleep is implemented, it’s really simple!
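In current Python, `asyncio.async` is spelled `asyncio.ensure_future` (`async` has been a reserved keyword since Python 3.7). The sketch below, assuming it runs inside `asyncio.run`, checks the behaviour described above: a coroutine comes back wrapped in a `Task`, while a Future is returned unchanged. `job` and `main` are illustrative names.

```python
import asyncio


async def job():
    return 'ok'


async def main():
    loop = asyncio.get_running_loop()
    # A coroutine gets wrapped in a Task...
    task = asyncio.ensure_future(job())
    # ...while an existing Future is returned unchanged.
    fut = loop.create_future()
    same = asyncio.ensure_future(fut)
    fut.set_result(None)
    return isinstance(task, asyncio.Task), same is fut, await task


is_task, unchanged, value = asyncio.run(main())
print(is_task, unchanged, value)  # True True ok
```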


Page 40: asyncio internals

    def __init__(self, coro, *, loop=None):
        assert iscoroutine(coro), repr(coro)  # Not a coroutine function!
        super().__init__(loop=loop)
        self._coro = iter(coro)  # Use the iterator just in case.
        self._fut_waiter = None
        self._loop.call_soon(self._step)

Simplified


Page 41: asyncio internals

Tasks are not run immediately; the actual work is done by Task._step, which is scheduled with loop.call_soon

_fut_waiter is used to store a Future which this Task is waiting for


Page 42: asyncio internals

Simplified

    def _step(self, value=None, exc=None):
        assert not self.done(), '_step(): already done'
        coro = self._coro
        self._fut_waiter = None
        try:
            if exc is not None:
                result = coro.throw(exc)
            elif value is not None:
                result = coro.send(value)
            else:
                result = next(coro)
        except StopIteration as exc:
            self.set_result(exc.value)
        except Exception as exc:
            self.set_exception(exc)
        except BaseException as exc:
            self.set_exception(exc)
            raise
        else:
            if isinstance(result, futures.Future):
                # Yielded Future must come from Future.__iter__().
                if result._blocking:
                    result._blocking = False
                    result.add_done_callback(self._wakeup)
                    self._fut_waiter = result
                else:
                    ...  # error: yield was used instead of yield from (elided)
            elif result is None:
                # Bare yield relinquishes control for one event loop iteration.
                self._loop.call_soon(self._step)
            else:
                ...  # error: bad yield (elided)


Page 43: asyncio internals

The Magic (TM)

The coroutine is stepped through until it finishes

Note the check of _blocking to verify yield vs yield from usage

The _wakeup function will schedule _step with either a result or an exception

At any point in time, either _step is scheduled or _fut_waiter is not None
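The whole mechanism fits in a toy implementation. This is a simplified sketch, not asyncio's code: `ToyLoop`, `ToyFuture`, `ToyTask` and `worker` are hypothetical, exceptions and cancellation are left out, and only the `_blocking` check, the `_fut_waiter` bookkeeping and the call_soon-based wakeups from the slides are kept.

```python
from collections import deque


class ToyLoop:
    def __init__(self):
        self._ready = deque()

    def call_soon(self, callback, *args):
        self._ready.append((callback, args))

    def run_until_idle(self):
        while self._ready:
            callback, args = self._ready.popleft()
            callback(*args)


class ToyFuture:
    def __init__(self, loop):
        self._loop = loop
        self._done = False
        self._result = None
        self._callbacks = []
        self._blocking = False

    def result(self):
        return self._result

    def add_done_callback(self, fn):
        self._callbacks.append(fn)

    def set_result(self, result):
        self._result, self._done = result, True
        for fn in self._callbacks:
            self._loop.call_soon(fn, self)  # scheduled, not called directly
        self._callbacks = []

    def __iter__(self):
        if not self._done:
            self._blocking = True
            yield self
        return self._result


class ToyTask(ToyFuture):
    def __init__(self, coro, loop):
        super().__init__(loop)
        self._coro = coro
        self._fut_waiter = None
        loop.call_soon(self._step)  # not run immediately

    def _step(self, value=None):
        self._fut_waiter = None
        try:
            result = self._coro.send(value)
        except StopIteration as exc:
            self.set_result(exc.value)
            return
        if isinstance(result, ToyFuture) and result._blocking:
            result._blocking = False
            result.add_done_callback(self._wakeup)
            self._fut_waiter = result  # now waiting on this future
        elif result is None:
            self._loop.call_soon(self._step)  # bare yield: one iteration
        else:
            raise RuntimeError('bad yield: {!r}'.format(result))

    def _wakeup(self, future):
        # Runs as a done callback scheduled by set_result.
        self._step(future.result())


loop = ToyLoop()
log = []


def worker(name, fut):
    log.append(name + ' waiting')
    value = yield from fut
    log.append('{} got {}'.format(name, value))
    return value


fut = ToyFuture(loop)
task = ToyTask(worker('w', fut), loop)
loop.call_soon(fut.set_result, 7)
loop.run_until_idle()
print(log)            # ['w waiting', 'w got 7']
print(task.result())  # 7
```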


Page 44: asyncio internals

There is a lot more in asyncio

Go read PEP-3156

Don’t be afraid of looking under the hood

Don’t rely on internals, they are implementation details

Join the mailing list and check out the third-party libraries!

raise SystemExit

“I hear and I forget. I see and I remember. I do and I understand.” - Confucius


Page 45: asyncio internals

Questions?

bettercallsaghul.com

@saghul
