Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure...

19
Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie https://runway.systems A new tool for distributed systems design

Transcript of Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure...

Page 1: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Runway

Diego OngaroLead Software Engineer, Compute Infrastructure

@ongardiehttps://runway.systems

A new tool for distributed systems design

Page 2: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Outline

1. Why we need new tools for distributed systems design

2. Overview and demo of Runway

3. Building a Runway model

Page 3: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Distributed Systems Are Hard

● Concurrency and message delays

● Failures, failures during failures

● Many possible interleavings of events

● Little visibility, poor debugging environments

Page 4: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Raft Background / Difficult Bug

Raft: fault-tolerant consensus algorithm

Used in many examples in this talk

Quick summary:

1. Use majority voting to elect a leader2. Leader replicates its log to followers

Difficult design bug:

Page 5: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Typical Approaches Find Design Issues Too Late

These are good techniques for implementation errors

● Localized: easy to fix

Too expensive for design errors

● May require large changes

● May cause unforeseen consequences

Let’s find the right design sooner...

Code reviews

Unit tests

System tests

Randomized tests, fuzzing, Jepsen

Benchmarks

Metrics

Dark launches

Bug reports

Page 6: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Communication:

● Build intuition quickly● Unambiguous● Reviewable: discuss major issues and

consider alternatives

Design Phase

Evaluation:

● Simplicity● Correctness● Performance● Availability

State of the art:

● Visualization (animation)● Specification● Model checking● Simulation

Commonly used today:

Tool

sG

oals

Page 7: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Visualization

Specification

Model checking

Simulation

A model is a representation of a system that captures its essential concepts and omits irrelevant details.

Design Tools Use System Models

Page 8: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

A Tour of Runway

Page 9: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Runway Overview

Integrated into one tool: write one model, get many benefits

Specify, simulate, visualize, and check system models

model(spec)

(error)interaction

graphs, data

visualization(animation)

randomized simulator

modelchecker

executionS2:recv

S3:proc

S1:send

Page 10: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Runway DemoToo many bananas, elevators, and Raft

Page 11: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Runway Integration

Independent tools: create independent models

● Write similar models for different tools● Change the design: revise them all

Runway: reuse the same model

● Lower cost, additional benefit ⇒ create models sooner● More likely to find modeling bugs

TLA+

500 LOC

JS

300 LOC

Rust

550 LOC

pseudo

150 LOC

Specification, simulation, and model checking all benefit from visualization

Page 12: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Building a Runway Model

Page 13: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Developing a Model

Idealized steps:

1. Sketch view by hand2. Define types, state variables3. Create view based on sketch4. Write invariants5. Write transition rules

visualization aids with

debugging

specification

view

Tip: set convenient starting state

Page 14: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

● Specification is code

● Define starting state, transition rules, and invariants

○ Labeled Transition System

● Rules encode behavior + failures

● Applying a rule is atomic (one at a time)

● A rule is active if applying it would change the state

● If multiple rules are active, system decides

○ Simulator: random choice

○ Model checker: walk the tree

Runway’s Specification Language

Page 15: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Example: Too Many Bananas (1)Type and variable declarations, invariant

type-safe variant: can’t access unless ReturningFromStore

Page 16: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Example: Too Many Bananas (2)Transition rule

no state changed:

inactive until readset changes

Page 17: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

It’s About Time

Developers: each server tries to approximate “the global clock”

Physicists: Ha! Blah blah blah, blah, blah! Blah blah blah blah. Blah!

Want some safety properties to hold even if clocks misbehave

Need time to describe availability and performance

Runway’s current approach: global clock, conditionally

server.timeoutAt <= clocktrue

Page 18: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

Summary

● Let’s apply tools to help us design distributed systems

● Modeling helps focus our attention on concepts, leaving out unimportant details

● Runway combines spec, model checking, simulation, and interactive visualization

● Go view the models, build your own, and help develop Runway

Page 19: Runway - web.stanford.edu · Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie A new tool for distributed systems design. Outline 1. Why we need new tools

solve design problems in the design phasehttps://runway.systems