unba.se - ACM CSCW 2017 - IWCES15

unba.se Framework for distributed computing and collaboration Daniel Norman CTO, güdTECH unba.se contributor Twitter: @DreamingInCode Michael MacFadden CTO, Convergence Labs unba.se contributor Twitter: @MMacFadden

Transcript of unba.se - ACM CSCW 2017 - IWCES15

Page 1: unba.se - ACM CSCW 2017 - IWCES15

unba.se
Framework for distributed computing and collaboration

Daniel Norman
CTO, güdTECH
unba.se contributor
Twitter: @DreamingInCode

Michael MacFadden
CTO, Convergence Labs
unba.se contributor
Twitter: @MMacFadden

Page 2: unba.se - ACM CSCW 2017 - IWCES15

Background: The Problem

For decades, we’ve suffered from various myths:

● Networks can be made reliable

● A single arbiter of state makes a consistency-model “strong”

● Objective state exists

● Objective simultaneity exists

These convenient beliefs were “close enough” for a while, but they will not be for much longer.

Page 3: unba.se - ACM CSCW 2017 - IWCES15

Background: The Problem

Distance between you and the stuff you probably care about: much, much less

Distance between you and the arbiter of linearization: usually thousands of km

Stuff you care about:

● You
● Collaborating with Bob & Alice down the hall
● Messages/docs you authored
● Messages/docs you read
● Your wristwatch
● Your IoT devices
● Next-door neighbors

Between you and the linearization you must visit, thousands of km of:

● Backhoes
● Congestion
● Light travel time
● Alligators
● Net non-neutrality
● BGP screwups
● Power outages
● State hacking
● Spanning tree errors
● Tripped-over cables
● Cat on server
● Submarine cable break
● Settlement-free peering disputes
● State surveillance
● Packet corruption
● Lie-Fi
● Under-provisioned hardware
● Cheap electrolytic capacitors
● DNS errors
● Saturated cell-tower backhaul
● RF reflections
● Interference from microwave oven
● Cosmic rays
● Disgruntled employees
● DDoS attacks
● Rat-chewed cables
● Corrosion
● Late internet payments
● Gravitational time dilation
● Doppler effect
● F*#KING BOINGO
● Failed B-side power
● Core router problem
● F*#KING TMOBILE
● F*#KING NETGEAR
● Cabinet switch failure
● Misconfigured health check
● NAT misconfiguration
● Load balancer failure
● NTP failures
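To put "thousands of km" and "light travel time" in perspective, the sketch below computes a rough lower bound on round-trip time to a distant arbiter. The numbers are assumptions for illustration only: light in optical fiber is taken to travel at about two thirds of its vacuum speed, and the three distances are hypothetical. Real paths are longer than the straight-line distance and add queuing, routing, and retransmission delay on top.

```python
# Rough lower bound on round-trip time (RTT) from light travel time alone.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # assumption: fiber refractive index of about 1.5


def min_rtt_ms(distance_km: float) -> float:
    """Return the minimum possible RTT in milliseconds over fiber."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1000


# Hypothetical distances, chosen only to show the orders of magnitude.
for label, km in [
    ("down the hall", 0.05),
    ("regional data center", 500),
    ("another continent", 8000),
]:
    print(f"{label:>22}: {min_rtt_ms(km):10.4f} ms")
```

Even before any of the failure modes listed above, physics alone puts the distant arbiter several orders of magnitude further away in time than a collaborator down the hall.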

Page 4: unba.se - ACM CSCW 2017 - IWCES15

Background: The Problem

We’ve tried lots of things

● Multi-mastering

● Geo-sharding

● Eventual consistency

● Simulated simultaneity (global wall clock)

One way or another, these all violate the user’s expectations.

Page 5: unba.se - ACM CSCW 2017 - IWCES15

Background: What do humans expect?

I set my glass on the table ≻ It’s there when I pick it up.

Edge computing IRL. Local, coordination-free consistency. No need to visit Ashburn.

Page 6: unba.se - ACM CSCW 2017 - IWCES15

Background: Examples of The Problem

● Total internet outage
    ○ Bob and Alice prevented from collaborating, even if they’re just down the hall

● LieFi
    ○ Elevator door closes, messaging UI hangs, unable to determine message state

● Replication failure
    ○ IO/network failure causes data loss – If you’re lucky, failover state machine works promptly, decisively and correctly. You’re not that lucky

● Application state gets “weird”
    ○ Eventual consistency models lead to non-causal outcomes, and unhappy/confused users.

Page 7: unba.se - ACM CSCW 2017 - IWCES15

Background: This problem isn’t going away

● Globalization, Trade, Collaboration are increasing

● System complexity is becoming unmanageable

● Local bandwidth requirements will scale exponentially, backbone will not

● Safety / Mission critical uses are proliferating, outages will get worse

● Distributed systems challenges are becoming increasingly problematic on-chip

Page 8: unba.se - ACM CSCW 2017 - IWCES15

unba.se design ideology: It’s time for another approach

Page 9: unba.se - ACM CSCW 2017 - IWCES15

unba.se design ideology: Motivating Suppositions

● State is an illusion, sort of...

● Computation and storage must occur near the edge

● A priori resource planning is undesirable

● Durability is non-boolean

● Distributed causal consistency – possible, efficient, desirable

Page 10: unba.se - ACM CSCW 2017 - IWCES15

unba.se motivating suppositions: State is an illusion, sort of

● State is a projection of observed events
    ○ Absolutely no such thing as shared state, ever.

● It’s an abstraction we like
    ○ Not entirely unlike “solid matter” vs Coulomb repulsion

● Some abstractions we need not question in CS
    ○ Example: “why don’t I fall through the floor?”

● Collaborative systems, however, DO warrant deeper reflection on “State”
    ○ ALL multi-user systems are collaborative systems
    ○ The user expects causal fidelity beyond that which can be provided by serializable systems.

Philosophy Corner:

What if there was no reality, only causality?
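The idea that "state is a projection of observed events" can be illustrated with a minimal sketch. Everything here is hypothetical and is not taken from unba.se: the `Event` shape, the `project` function, and the last-writer-wins fold are assumptions chosen only to show that two observers who have seen different events hold different, equally valid "states".

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """An immutable observed event (hypothetical shape for illustration)."""
    key: str
    value: str


def project(events: Iterable[Event]) -> Dict[str, str]:
    """Fold events, in the order they were observed, into a state view."""
    state: Dict[str, str] = {}
    for event in events:
        state[event.key] = event.value  # assumption: later observations win
    return state


log = [
    Event("glass", "on table"),
    Event("lamp", "off"),
    Event("glass", "in hand"),
]

alice_view = project(log)      # Alice has observed all three events
bob_view = project(log[:2])    # Bob has not yet observed the last event

print(alice_view["glass"], "/", bob_view["glass"])
```

Neither view is "the" state. Each is what its observer can derive from the events it has seen so far, which is the sense in which there is no shared state.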

Page 11: unba.se - ACM CSCW 2017 - IWCES15

unba.se motivating suppositions: Location, Location, Location

● Humans expect devices to behave like physical objects
    ○ Physical proximity is literally the only way to meet their causal expectations
    ○ How proximal is “proximal”?

● Travel to arbiter of linearization is an unacceptable default assumption
    ○ Physical reality does not have centralized arbiters of truth, and neither should we

● Data should be localized by origin and utility, not by shard
    ○ Why would you want to manually geo-shard?
    ○ Or worse, shard by some non-spatial criteria

● A priori shard/resource planning is expensive, and you’ll do it wrong
    ○ Humans are expensive, and terrible at system orchestration and capacity planning
    ○ Automated a priori planning is only marginally better
    ○ A squeezed balloon does not “plan” the expansion on the other side

Page 12: unba.se - ACM CSCW 2017 - IWCES15

unba.se motivating suppositions: Durability is not a boolean

● Every piece of data has a nonzero probability of loss

● Most database systems merely tend to experience data loss all at once
    ○ Usually through corruption
    ○ Sometimes through common mode failure

● We care more about some data than others

● So let’s set a specific target durability for each piece of data

We prefer to plan how often to lose data rather than engage in self-deception about its permanence.
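One way to make "a specific target durability for each piece of data" concrete is to ask how many replicas a given target requires. The sketch below is not from the talk. It assumes a deliberately simple model in which each replica is lost independently with probability `p_loss` over some period, so data is lost only if all `n` replicas are lost, with probability `p_loss ** n`. The function name and parameters are hypothetical, and the common mode failures mentioned above break the independence assumption in practice.

```python
import math


def replicas_needed(target_durability: float, p_loss: float) -> int:
    """Smallest n such that p_loss ** n <= 1 - target_durability.

    Assumes replicas fail independently, which real systems only approximate.
    """
    if not 0 < p_loss < 1:
        raise ValueError("p_loss must be strictly between 0 and 1")
    if not 0 < target_durability < 1:
        raise ValueError("target_durability must be strictly between 0 and 1")
    allowed_loss = 1 - target_durability
    return max(1, math.ceil(math.log(allowed_loss) / math.log(p_loss)))


# Hypothetical targets: data we care about more gets more replicas.
for name, target in [("scratch", 0.9), ("documents", 0.999), ("ledger", 0.999999)]:
    print(name, replicas_needed(target, p_loss=0.05))
```

The point of the exercise is the one the slide makes: the loss probability never reaches zero, so the useful question is how much loss to plan for per piece of data.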

Page 13: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Core Concepts Overview

● Immutable Data Structure

● Gravity and pressure

● Sparse vector clocks

● Commutative index merging

● Selective hearing

● Infectious knowledge consistency model

Page 14: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Immutable Data Structure

Page 15: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Immutable Data Structure

Page 16: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Immutable Data Structure

Page 17: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: State Projection

Page 18: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: State Projection

Page 19: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Infectious Knowledge

Page 20: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Gravity and Pressure

Page 21: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Beacons / Sparse Vector Clocks
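The transcript preserves only the title of this slide, so the following is a generic textbook vector clock with sparse storage, offered as background rather than as unba.se's design. "Beacons" are not modeled at all. The function names are hypothetical. A clock is a dictionary that stores counters only for replicas that have actually acted; absent replicas are treated as zero.

```python
from typing import Dict

# Sparse representation: replicas missing from the dict have counter 0.
Clock = Dict[str, int]


def tick(clock: Clock, replica: str) -> Clock:
    """Return a new clock with the given replica's counter advanced by one."""
    advanced = dict(clock)
    advanced[replica] = advanced.get(replica, 0) + 1
    return advanced


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum; commutative, associative, and idempotent."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def happened_before(a: Clock, b: Clock) -> bool:
    """True if a is causally before b: a <= b everywhere and a < b somewhere."""
    replicas = a.keys() | b.keys()
    no_greater = all(a.get(r, 0) <= b.get(r, 0) for r in replicas)
    some_less = any(a.get(r, 0) < b.get(r, 0) for r in replicas)
    return no_greater and some_less


def concurrent(a: Clock, b: Clock) -> bool:
    """True if each clock is ahead of the other on at least one replica."""
    replicas = a.keys() | b.keys()
    a_ahead = any(a.get(r, 0) > b.get(r, 0) for r in replicas)
    b_ahead = any(b.get(r, 0) > a.get(r, 0) for r in replicas)
    return a_ahead and b_ahead


alice = tick({}, "alice")                 # Alice edits locally
bob = tick({}, "bob")                     # Bob edits locally, unaware of Alice
bob_after = tick(merge(alice, bob), "bob")  # Bob observes Alice, then edits

print(concurrent(alice, bob), happened_before(alice, bob_after))
```

Because comparisons need only the entries that are present, causal ordering between two edits can be decided locally, without visiting any central arbiter.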

Page 22: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Probabilistic Merging

Page 23: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Commutative Index Merging

Page 24: unba.se - ACM CSCW 2017 - IWCES15

unba.se core concepts: Conclusion

With this design we may:

● Align system performance with user’s causal expectations

● Increase availability (for data I care about)

● Reduce system complexity

● Save time and money on system operations / staff

Page 25: unba.se - ACM CSCW 2017 - IWCES15

Thank You!

Daniel Norman
CTO, güdTECH
unba.se contributor
Twitter: @DreamingInCode

Michael MacFadden
CTO, Convergence Labs
unba.se contributor
Twitter: @MMacFadden

Page 26: unba.se - ACM CSCW 2017 - IWCES15

Bonus slides

Page 27: unba.se - ACM CSCW 2017 - IWCES15

Obligatory Slide: Problems with CAP Theorem

● Consistency
    ○ Is measured against human causal expectation, not serializability
    ○ Serializability is usually just a hack for causality

● Availability
    ○ Objectively no such thing as availability
    ○ Is only ever declared according to a human observational standard

● Partitions
    ○ Objectively no such thing as a partition
    ○ Only exist according to an observational standard

With sufficiently tortured definitions, one can prove a ham sandwich.

See Martin Kleppmann’s excellent paper: A critique of the CAP theorem – https://arxiv.org/abs/1509.05393