Yair Amir 1 2 Nov 07 The Insider Threat in Scalable Distributed Systems: Algorithms, Metrics, Gaps...


Page 1: The Insider Threat in Scalable Distributed Systems: Algorithms, Metrics, Gaps

Yair Amir

Distributed Systems and Networks lab, Johns Hopkins University

www.dsn.jhu.edu

ACM STC’07, 2 Nov 07

Page 2: Acknowledgement

• Johns Hopkins University – Claudiu Danilov, Jonathan Kirsch, John Lane
• Purdue University – Cristina Nita-Rotaru, Josh Olsen, Dave Zage
• Hebrew University – Danny Dolev
• Telcordia Technologies – Brian Coan

Page 3: This Talk in Context: Scalable Information Access & Communication

• High availability (80s – 90s)
  – Benign faults, accidental errors, crashes, recoveries, network partitions, merges.
  – Fault tolerant replication as an important tool.
  – Challenges – consistency, scalability and performance.
• Security (90s – 00s)
  – Attackers are external.
  – Securing the distributed system and the network.
  – Crypto+ as an important tool.
• Survivability (00s – …)
  – Millions of compromised computers: there is always a chance the system will be compromised.
  – Let's start the game when parts of it are already compromised.
  – Can the system still achieve its goal, and under what assumptions?
  – Challenges – assumptions, scalability, performance.

Page 4: Trends: Information Access & Communication

• Networks become one
  – From one’s network to one Internet.
  – Therefore: (inherently,) the environment becomes increasingly hostile.
• Stronger adversaries => weaker models.
  – Benign faults – mean time to failure, fault independence
    • Fail-stop, crash-recovery, network partitions-merges.
    • Goals: high availability, consistency (safety, liveness).
  – External attacks – us versus them
    • Eavesdropping, replay attacks, resource consumption kind of DoS.
    • Goals: keep them out. Authentication, Integrity, Confidentiality.
  – Insider attacks – the enemy is us
    • Byzantine behavior
    • Goals: safety, liveness, (performance?)

Page 5: The Insider Threat

• Networks are already hostile!
  – 250,000 new zombie nodes per day.
  – Very likely that some of them are part of critical systems.
  – Insider attacks are a real threat, even for well-protected systems.
• Challenges:
  – Service level: Can we provide “correct” service?
  – Network level: Can we “move” the bits?
  – Client level: Can we handle “bad” input?

Page 6: The Insider Threat in Scalable Systems

• Service level: Byzantine Replication
  – Hybrid approach: few trusted components, everything else can be compromised.
  – Symmetric approach: No trusted component, compromise up to some threshold.
• Network level: Byzantine Routing
  – Flooding “solves” the problem.
  – “Stable” networks – some limited solutions, good starting point [Awerbuch et al. 02]
  – “Dynamic” networks – open problem.
• Client level: ?
  – Input replication – not feasible in most cases.
  – Recovery after the fact – Intrusion detection, tracking and backtracking [Chen et al. 03].
  – Open question – is there a better approach?

Page 7: Outline

• Context and trends
• Various levels of the insider threat problem
• Service level problem formulation
• Relevant background
• Steward: First scalable Byzantine replication
  – A bit on how it works
  – Correctness
  – Performance
  – Tradeoffs
• Composable architecture
  – A bit on how it works
  – BLink – Byzantine link protocol
  – Performance and optimization
• Theory hits reality
  – Limitation of existing correctness criteria
  – Proposed model and metrics
• Summary

Page 8: Service Level: Problem Formulation

• Servers are distributed in sites, over a Wide Area Network.
• Clients issue requests to servers, then get back answers.
• Some servers can act maliciously.
• Wide area connectivity is limited and unstable.
• How to get good performance and guarantee correctness?
• What is correctness?

[Figure: a site made up of server replicas 1, 2, 3, …, N, with clients attached.]

Page 9: Relevant Prior Work

• Byzantine Agreement
  – Byzantine generals [Lamport et al. 82], [Dolev 83]
• Replication with benign faults
  – 2-phase commit [Eswaran, Gray et al. 76]
  – 3-phase commit [Skeen, Stonebraker 82]
  – Paxos [Lamport 98]
• Hybrid architectures
  – Hybrid Byzantine tolerant systems [Correia, Verissimo et al. 04]
• Symmetric approaches for Byzantine-tolerant replication
  – BFT [Castro, Liskov 99]
  – Separating agreement from execution [Yin, Alvisi et al. 03]
  – Fast Byzantine consensus [Martin, Alvisi 05]
  – Byzantine-tolerant storage using erasure codes [Goodson, Reiter et al. 04]

Page 10: Background: Paxos and BFT

• Paxos [Lamport 98]
  – Ordering coordinated by an elected leader.
  – Two rounds among servers during normal case (Proposal and Accept).
  – Requires 2f+1 servers to tolerate f benign faults.
• BFT [Castro, Liskov 99]
  – Extends Paxos into the Byzantine environment.
  – One additional round of communication, crypto.
  – Requires 3f+1 servers to tolerate f Byzantine servers.

[Figure: Paxos message flow between client C and servers 0–2: request, proposal, accept, reply.]
[Figure: BFT message flow between client C and servers 0–3: request, pre-prepare, prepare, commit, reply.]
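The two replica bounds above can be turned into a small calculation. The sketch below is only an illustration; the function names are made up here and are not part of Paxos, BFT, or the talk.

```python
def paxos_replicas(f: int) -> int:
    """Servers needed to tolerate f benign faults (2f + 1, per the slide)."""
    return 2 * f + 1


def bft_replicas(f: int) -> int:
    """Servers needed to tolerate f Byzantine servers (3f + 1, per the slide)."""
    return 3 * f + 1


def max_benign_faults(n: int) -> int:
    """Largest f such that 2f + 1 <= n."""
    return (n - 1) // 2


def max_byzantine_faults(n: int) -> int:
    """Largest f such that 3f + 1 <= n."""
    return (n - 1) // 3


if __name__ == "__main__":
    for f in (1, 2, 5):
        print(f, paxos_replicas(f), bft_replicas(f))
```

For example, 16 servers can tolerate at most 5 Byzantine servers under the 3f+1 bound, which is the configuration used in the experiments later in the talk.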

Page 11: Background: Threshold Crypto

• Practical Threshold Signatures [Shoup 2000]
  – Each participant receives a secret share.
  – Each participant signs a certain message with its share, and sends the signed message to a combiner.
  – Out of k valid signed shares, the combiner creates a (k, n) threshold signature.
• A (k, n) threshold signature
  – Guarantees that at least k participants signed the same message with their share.
  – Can be verified with simple RSA operations.
  – Combining the shares is fairly expensive.
  – Signature verification is fast.
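To give a feel for the "any k out of n shares" property, here is a toy sketch of Shamir secret sharing. This is a swapped-in, simpler technique, not the threshold RSA signature scheme cited on the slide: it shares and reconstructs a secret number rather than producing a signature. The prime, function names, and parameters are all assumptions made for illustration.

```python
import random

# Assumption: a Mersenne prime used as the field modulus for this toy example.
PRIME = 2**127 - 1


def split_secret(secret, k, n, rng):
    """Split `secret` into n shares; any k of them reconstruct it."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine_shares(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789, k=3, n=5, rng=random.Random(0))
    print(combine_shares(shares[:3]))
```

A real deployment would use a vetted threshold signature library and a cryptographic random source rather than this sketch.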

Page 12: Steward: First Byzantine Replication Scalable to Wide Area Networks [DSN 2006]

• Each site acts as a trusted unit that can crash or partition.
• Within each site: Byzantine-tolerant agreement (similar to BFT).
  – Masks f malicious faults in each site.
  – Threshold signatures prove agreement to other sites.
• ---------- that is optimally intertwined with ----------
• Between sites: light-weight, fault-tolerant protocol (similar to Paxos).
• There is no free lunch: we pay with more hardware.
  – 3f+1 servers in each site.

[Figure: a site made up of server replicas 1, 2, 3, …, N, with clients attached.]


Page 14: Main Idea 1: Common Case Operation

• A client sends an update to a server at its local site.
• The update is forwarded to the leader site.
• The representative of the leader site assigns order in agreement and issues a threshold signed proposal.
• Each site issues a threshold signed accept.
• Upon receiving a majority of accepts, servers in each site “order” the update.
• The original server sends a response to the client.

[Figure labels: Byzantine ordering; threshold signed proposal (2f+1); threshold signed accept (2f+1).]

Page 15: Steward Hierarchy Benefits

• Reduces the number of messages sent on the wide area network.
  – O(N²) → O(S²) – helps both in throughput and latency.
• Reduces the number of wide area crossings.
  – BFT-based protocols require 3 wide area crossings.
  – Paxos-based protocols require 2 wide area crossings.
• Optimizes the number of local Byzantine agreements.
  – A single agreement per update at leader site.
  – Potential for excellent performance.
• Increases system availability
  – (2/3 of total servers + 1) → (a majority of sites).
  – Read-only queries can be answered locally.
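As a rough illustration of the O(N²) versus O(S²) point, the sketch below counts messages in a single all-to-all exchange, once among N individual servers and once among S sites. It is a simplification under stated assumptions: it counts one round only, treats every server-to-server message in the flat case as a wide-area message, and the function names are hypothetical.

```python
def all_to_all_messages(participants: int) -> int:
    """Messages in one round where every participant sends to every other one."""
    return participants * (participants - 1)


def wide_area_reduction(total_servers: int, sites: int) -> float:
    """Ratio of flat (per-server) to hierarchical (per-site) message counts."""
    return all_to_all_messages(total_servers) / all_to_all_messages(sites)


if __name__ == "__main__":
    # Example values: 80 servers spread over 5 sites.
    print(all_to_all_messages(80), all_to_all_messages(5))
    print(wide_area_reduction(80, 5))
```

The quadratic term is what matters: the hierarchical count depends only on the number of sites, not on how many servers each site holds.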

Page 16: Steward Hierarchy Challenges

• Each site has a representative that:
  – Coordinates the Byzantine protocol inside the site.
  – Forwards packets in and out of the site.
• One of the sites acts as the leader in the wide area protocol.
  – The representative of the leading site is the one assigning sequence numbers to updates.
• Messages coming out of a site during leader election are based on communication between 2f+1 (out of 3f+1) servers inside the site.
  – There can be multiple sets of 2f+1 servers.
  – In some instances, multiple correct but different site messages can be issued by a malicious representative.
  – It is sometimes impossible to completely isolate a malicious server's behavior inside its own site.
• How do we select and change representatives in agreement?
• How do we select and change the leader site in agreement?
• How do we transition safely when we need to change them?

Page 17: Main Idea 2: View Changes

• Sites change their local representatives based on timeouts.
• Leader site representative has a larger timeout.
  – This allows it to contact at least one correct representative at other sites.
• After changing enough leader site representatives, servers at all sites stop participating in the protocol, and elect a different leading site.

Page 18: Correctness Criteria

• Safety:
  – If two correct servers order an update with the same sequence i, then these updates are identical.
• Liveness:
  – If there exists a set of a majority of sites, each consisting of at least 2f+1 correct, connected servers, and a time after which all sites in the set are connected, then if a client connected to a site in the set proposes an update, some correct server at a site in the set eventually orders the update.
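The static part of the liveness precondition can be checked mechanically. The sketch below is an illustration only: it tests whether a majority of sites each have at least 2f+1 correct, connected servers, and it ignores the requirement that those sites eventually stay connected to each other. The function name and input format are assumptions.

```python
def liveness_precondition_holds(correct_connected_per_site, f):
    """True if a majority of sites each have at least 2f + 1 correct, connected servers.

    `correct_connected_per_site` lists, for each site, how many of its servers
    are currently correct and connected (a hypothetical input format).
    """
    needed_per_site = 2 * f + 1
    good_sites = sum(1 for c in correct_connected_per_site if c >= needed_per_site)
    return good_sites > len(correct_connected_per_site) // 2


if __name__ == "__main__":
    # Five sites of 16 servers each, f = 5, so 11 correct servers are needed per site.
    print(liveness_precondition_holds([16, 11, 11, 10, 9], f=5))
    print(liveness_precondition_holds([16, 11, 10, 10, 9], f=5))
```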

Page 19: Intuition Behind a Proof

• Safety:
  – Any agreement (ordering or view change) involves a majority of sites, and 2f+1 servers in each.
  – Any two majorities intersect in at least one site.
  – Any two sets of 2f+1 servers in that site intersect in at least f+1 servers (which means at least one correct server).
  – That correct server will not agree to order two different updates with the same sequence.
• Liveness:
  – A correct representative or leader site cannot be changed by f local servers.
  – The selection of different timeouts ensures that a correct representative of the leader site has enough time to contact correct representatives at other sites.
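The two intersection claims in the safety argument can be checked by brute force for small parameters. The sketch below enumerates every pair of quorums and reports the smallest overlap; the helper name is an assumption and the enumeration is only practical for small values.

```python
from itertools import combinations


def min_overlap(n: int, q: int) -> int:
    """Smallest intersection size over all pairs of q-member subsets of n members."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


if __name__ == "__main__":
    # Site level: any two majorities of 5 sites share at least one site.
    print(min_overlap(5, 3))
    # Server level: any two sets of 2f+1 out of 3f+1 servers share at least f+1.
    for f in (1, 2, 3):
        print(f, min_overlap(3 * f + 1, 2 * f + 1))
```

The same result follows from counting: two sets of size 2f+1 drawn from 3f+1 servers overlap in at least 2(2f+1) − (3f+1) = f+1 servers, so at least one server in the overlap is correct.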

Page 20: Testing Environment

Platform: Dual Intel Xeon CPU, 3.2 GHz, 64 bits, 1 GByte RAM, Linux Fedora Core 4.

Library relies on OpenSSL:
  – Used OpenSSL 0.9.7a, Feb 2003.

Baseline operations:
  – RSA 1024-bit sign: 1.3 ms, verify: 0.07 ms.
  – Perform 1024-bit modular exponentiation: ~1 ms.
  – Generate a 1024-bit RSA key: ~55 ms.

Page 21: Symmetric Wide Area Network

• Synthetic network used for analysis and understanding.
• 5 sites, each connected to all other sites with equal bandwidth/latency links.
• One fully deployed site of 16 replicas; the other sites are emulated by one computer each.
• Total – 80 replicas in the system, emulated by 20 computers.
• 50 ms wide area links between sites.
• Varied wide area bandwidth and the number of clients.

Page 22: Write Update Performance

• Symmetric network.
• 5 sites.
• BFT:
  – 16 replicas total.
  – 4 replicas in one site, 3 replicas in each other site.
  – Up to 5 faults total.
• Steward:
  – 16 replicas per site.
  – Total of 80 replicas (four sites are emulated). Actual computers: 20.
  – Up to 5 faults in each site.
• Update only performance (no disk writes).

[Figure: Update Throughput. Updates/sec (0–90) vs. Clients (0–30). Series: Steward and BFT, each at 10 Mbps, 5 Mbps and 2.5 Mbps.]
[Figure: Update Latency. Latency in ms (0–1000) vs. Clients (0–30). Series: Steward and BFT, each at 10 Mbps, 5 Mbps and 2.5 Mbps.]

Page 23: Read-only Query Performance

• 10 Mbps on wide area links.
• 10 clients inject mixes of read-only queries and write updates.
• None of the systems was limited by bandwidth.
• Performance improves between a factor of two and more than an order of magnitude.
• Availability: Queries can be answered locally, within each site.

[Figure: Query Mix Throughput. Actions/sec (0–500) vs. Update ratio in % (0–100). Series: Steward, BFT.]
[Figure: Query Mix Latency. Latency in ms (0–500) vs. Update ratio in % (0–100). Series: Steward, BFT.]

Page 24: Wide-Area Scalability

• Selected 5 Planetlab sites, on 5 different continents: US, Brazil, Sweden, Korea and Australia.
• Measured bandwidth and latency between every pair of sites.
• Emulated the network on our cluster, both for Steward and BFT.
• 3-fold latency improvement even when bandwidth is not limited. (How come?)

[Figure: Planetlab Update Throughput. Updates/sec (0–90) vs. Clients (0–30). Series: Steward, BFT.]
[Figure: Planetlab Update Latency. Latency in ms (0–1400) vs. Clients (0–30). Series: Steward, BFT.]

Page 25: Non-Byzantine Comparison

• Based on a real experimental network (CAIRN).
• Several years ago we benchmarked benign replication on this network.
• Modeled on our cluster, emulating bandwidth and latency constraints, both for Steward and BFT.

[Figure: CAIRN topology. Nodes: ISIPC, ISIPC4, TISWPC, ISEPC3, ISEPC, UDELPC, MITPC. Locations: Virginia, Delaware, Boston, San Jose, Los Angeles. Link labels: 38.8 ms, 1.86 Mbits/sec; 1.4 ms, 1.47 Mbits/sec; 4.9 ms, 9.81 Mbits/sec; 3.6 ms, 1.42 Mbits/sec; and two links at 100 Mb/s, < 1 ms.]

Page 26: CAIRN Emulation Performance

• Steward is limited by bandwidth at 51 updates per second.
• 1.8 Mbps can barely accommodate 2 updates per second for BFT.
• Earlier experimentation with benign fault 2-phase commit protocols achieved up to 76 updates per sec. [Amir et al. 02].

[Figure: CAIRN Update Throughput. Updates/sec (0–90) vs. Clients (0–30). Series: Steward, BFT.]
[Figure: CAIRN Update Latency. Latency in ms (0–1400) vs. Clients (0–30). Series: Steward, BFT.]

Page 27: Steward: Approach Tradeoffs

• Excellent performance
  – Optimized based on intertwined knowledge among global and local protocols.
• Highly complex
  – Complex correctness proof.
  – Complex implementation.
• Limited model does not translate well to wide area environment needs
  – Global benign protocol over local Byzantine.
  – “What if the whole site is compromised?”
  – Partially addressed by implementing 4 different protocols: Byzantine/Benign, Byzantine/Byzantine, Benign/Benign, Benign/Byzantine (Steward).
  – “Different sites have different security profiles…”

Page 28: A Composable Approach [SRDS 2007]

• Use clean two-level hierarchy to maintain scalability.
  – Clean separation of the local and global protocols.
  – Message complexity remains O(Sites²).
• Use state machine based logical machines to achieve a customizable architecture.
  – Free substitution of the fault tolerance method used in each site and among the sites.
• Use efficient wide-area communication to achieve high performance.
  – Byzantine Link (BLink) protocol for inter logical machine communication.


Page 30: Building a Logical Machine

• A single instance of the wide-area replication protocol runs among a group of logical machines (LMs), one in each site.
  – Logical machines behave like single physical machines with respect to the wide-area protocol.
  – Logical machines send threshold-signed wide-area messages via BLink.
• Each logical machine is implemented by a separate instance of a local state machine replication protocol.
  – Physical machines in each site locally order all wide-area protocol events:
    • Wide-area message reception events.
    • Wide-area protocol timeout events.
• Each logical machine executes a single stream of wide-area protocol events.

[Figure: Site A (Logical Machine A) and Site B (Logical Machine B), each running the wide-area protocol on top of a local-area protocol, connected via BLink.]

Page 31: A Composable Architecture

• Clean separation and free substitution
  – We can choose the local-area protocol deployed in each site, and the wide-area protocol deployed among sites.
  – Trade performance for fault tolerance
• Protocol compositions: wide area / local area
  – Paxos on the wide area: Paxos/Paxos, Paxos/BFT
  – BFT on the wide area: BFT/Paxos, BFT/BFT


Page 33: An Example: Paxos/BFT

[Figure: five logical machines, LM1–LM5, each built from physical machines and connected by BLink logical links over the wide-area network. Labels: leader site logical machine, logical machine, physical machines, client.]

Pages 34–42: Paxos/BFT in Action

[Animation over the diagram of logical machines LM1–LM5, one step per page.]

• Page 34: Update initiation from client.
• Page 35: Local ordering of update, threshold signing of update.
• Page 36: Forwarding of update to leader LM via BLink.
• Page 37: Local ordering of update, threshold signing of proposal.
• Page 38: Dissemination of proposal via BLink.
• Page 39: Local ordering of proposal, threshold signing of accept.
• Page 40: Dissemination of accepts via BLink.
• Page 41: Local ordering of accepts, global ordering of proposal.
• Page 42: Reply to client.

Page 43: The BLink Protocol

• Faulty servers can block communication into and out of logical machines.
• Redundant message sending is not feasible in wide-area environments.
• Our approach: BLink protocol
  – Outgoing wide-area messages are normally sent only once.
  – Four sub-protocols, depending on fault tolerance method in sending and receiving logical machines:
    • (Byzantine, Byzantine), (Byzantine, benign)
    • (benign, Byzantine), (benign, benign)
  – This talk: (Byzantine, Byzantine)

Page 44: Constructing Logical Links

• Logical links are constructed from sets of virtual links.
• Each virtual link contains:
  – Forwarder from the sending logical machine.
  – Peer from the receiving logical machine.
• Virtual links are constructed via a mapping function.
• At a given time, the LM delegates wide-area communication responsibility to one virtual link on each logical link.
• Virtual links suspected of being faulty are replaced according to a selection order.

[Figure: a BLink logical link made up of virtual links between the sending logical machine and the receiving logical machine.]

Page 45: Intuition: A Simple Mapping

• Two important metrics:
  – Ratio of correct to faulty virtual links
  – Worst-case number of consecutive faulty virtual links
• With the simple mapping:
  – At least 1/3 of the virtual links are correct.
  – The adversary can block at most 2F consecutive virtual links.
• With a more sophisticated mapping:
  – At least 4/9 of the virtual links are correct.
  – The adversary can block at most 2F consecutive virtual links.

Example:
• F = 2, N = 3F+1 = 7
• Servers 0 and 1 from Sending LM and Servers 2 and 3 from Receiving LM faulty.
• Mapping function:
  – Virtual link i consists of the servers with id i mod N
• Selection order:
  – Cycle through virtual links in sequence (1, 2, 3, …)

[Figure: Sending LM servers 0–6 paired with Receiving LM servers 0–6; the four faulty servers are marked X.]
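The example on this page can be replayed in a few lines. The sketch below assumes that "the servers with id i mod N" means virtual link i pairs the forwarder with id i mod N and the peer with the same id, and that a virtual link is blocked if either endpoint is faulty. All function names are hypothetical.

```python
def simple_mapping(link_index: int, n: int):
    """Assumed simple mapping: forwarder and peer both have id (link_index mod n)."""
    return link_index % n, link_index % n


def link_is_correct(link_index, n, faulty_senders, faulty_receivers):
    forwarder, peer = simple_mapping(link_index, n)
    return forwarder not in faulty_senders and peer not in faulty_receivers


def correct_links(n, faulty_senders, faulty_receivers):
    """Indices of virtual links (within one cycle) whose endpoints are both correct."""
    return [i for i in range(n)
            if link_is_correct(i, n, faulty_senders, faulty_receivers)]


def longest_blocked_run(n, faulty_senders, faulty_receivers):
    """Longest run of consecutive faulty links when cycling through links in order."""
    longest = run = 0
    for i in range(2 * n):  # two cycles, so runs that wrap around are counted
        if link_is_correct(i, n, faulty_senders, faulty_receivers):
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return min(longest, n)


if __name__ == "__main__":
    F, N = 2, 7
    print(correct_links(N, {0, 1}, {2, 3}))
    print(longest_blocked_run(N, {0, 1}, {2, 3}))
```

With F = 2 and the faulty servers from the slide, three of the seven virtual links are correct (at least 1/3), and the longest blocked run is 4 = 2F.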

Page 46: Architectural Comparison

• Protocols were CPU-limited.
• Relative maximum throughput corresponds to the number of expensive cryptographic operations.

[Figure: Update Latency vs. Clients, 50 ms diameter, 10 Mbps links. Update latency in s (0–0.7) vs. Number of Clients (0–40). Series: Steward, Paxos/Paxos, Paxos/BFT, BFT/Paxos, BFT/BFT.]
[Figure: Update Throughput vs. Clients, 50 ms diameter, 10 Mbps links. Update throughput in updates/sec (0–90) vs. Number of Clients (0–40). Series: Steward, Paxos/Paxos, Paxos/BFT, BFT/Paxos, BFT/BFT.]

Protocol      Threshold RSA Sign   RSA Sign
Steward       1                    3
Paxos/Paxos   0                    2+(S-1)
BFT/Paxos     0                    3+2(S-1)
Paxos/BFT     1                    3+2(S-1)
BFT/BFT       2                    4+4(S-1)
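The table's expressions can be evaluated for a concrete deployment. The sketch below assumes that S is the number of sites and that the two columns are counts of threshold RSA signatures and plain RSA signatures; the dictionary and function names are made up for illustration.

```python
# (threshold RSA signs, RSA signs as a function of S), copied from the table.
SIGN_COUNTS = {
    "Steward": (1, lambda s: 3),
    "Paxos/Paxos": (0, lambda s: 2 + (s - 1)),
    "BFT/Paxos": (0, lambda s: 3 + 2 * (s - 1)),
    "Paxos/BFT": (1, lambda s: 3 + 2 * (s - 1)),
    "BFT/BFT": (2, lambda s: 4 + 4 * (s - 1)),
}


def sign_counts(protocol: str, sites: int):
    """Return (threshold RSA signs, RSA signs) for `protocol`, assuming S = sites."""
    threshold_signs, rsa_signs = SIGN_COUNTS[protocol]
    return threshold_signs, rsa_signs(sites)


if __name__ == "__main__":
    for name in SIGN_COUNTS:
        print(name, sign_counts(name, sites=5))
```

With 5 sites, as in the symmetric test network, this gives 3 RSA signs for Steward against 20 for BFT/BFT.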

Page 47: Architectural Comparison

• Protocols were CPU-limited.
• Relative maximum throughput corresponds to the number of expensive cryptographic operations.

[Figures repeated from Page 46: Update Latency vs. Clients and Update Throughput vs. Clients, 50 ms diameter, 10 Mbps links. Series: Steward, Paxos/Paxos, Paxos/BFT, BFT/Paxos, BFT/BFT.]

• Paxos/BFT vs. Steward
  – Same level of fault tolerance.
  – Paxos/BFT locally orders all wide-area protocol events, Steward orders events only when necessary.
  – Paxos/BFT achieves about 2.5 times lower throughput than Steward.
  – Difference is the cost of providing customizability!

Page 48: Performance Optimizations

• Computational Bottlenecks:
  – 1. Ordering all message reception events.
  – 2. Threshold signing outgoing messages.
• Solutions:
  – Aggregate local ordering: batching
  – Aggregate threshold signing: Merkle trees
    • Use a single threshold signature for many outgoing messages.
    • Outgoing messages contain additional information needed to verify the threshold signature.

Page 49: Merkle Hash Trees

• Use a single threshold signature for many outgoing wide-area messages.
• Each leaf contains the digest of a message to be sent.
• Each interior node contains the digest of the concatenation of its two children.
• Threshold signature is computed on the root hash.

[Figure: Merkle tree over eight messages.]
  Leaves: N1 = D(m1), N2 = D(m2), N3 = D(m3), N4 = D(m4), N5 = D(m5), N6 = D(m6), N7 = D(m7), N8 = D(m8)
  Level 2: N1-2 = D(N1 || N2), N3-4 = D(N3 || N4), N5-6 = D(N5 || N6), N7-8 = D(N7 || N8)
  Level 3: N1-4 = D(N1-2 || N3-4), N5-8 = D(N5-6 || N7-8)
  Root hash: N1-8 = D(N1-4 || N5-8), which is threshold signed.

Page 50: Yair Amir 1 2 Nov 07 The Insider Threat in Scalable Distributed Systems: Algorithms, Metrics, Gaps Distributed Systems and Networks lab Johns Hopkins University.

Example: Sending Message m4

[Figure: Merkle hash tree over m1..m8 with leaves N1..N8, interior nodes N1-2, N3-4, N5-6, N7-8, N1-4, N5-8, and root hash N1-8 = D(N1-4 || N5-8)]

• Outgoing message contains additional information needed to verify the signature.
– The message itself
– The siblings of the nodes on the path from m4 to the root hash
– The signature on the root hash

• To verify, use the digests to reconstruct the root hash, then verify the threshold signature.

Send: m4 || N3 || N1-2 || N5-8 (together with the signature on the root hash)
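The m4 example can be checked with a short Python sketch. It is an illustration under stated assumptions: SHA-256 stands in for D, the tree has a power-of-two number of leaves, and the helper names `merkle_proof` and `reconstruct_root` are hypothetical. It also assumes the receiver knows the position of the message in the tree, which is needed to decide whether each sibling goes on the left or the right. Verifying the threshold signature on the reconstructed root is omitted.

```python
import hashlib


def digest(data: bytes) -> bytes:
    """D(.) from the slide; SHA-256 is an assumption."""
    return hashlib.sha256(data).digest()


def merkle_proof(messages: list[bytes], index: int) -> tuple[bytes, list[bytes]]:
    """Return (root hash, siblings) for messages[index], siblings ordered leaf to root."""
    level = [digest(m) for m in messages]
    siblings = []
    while len(level) > 1:
        siblings.append(level[index ^ 1])  # the node sharing this node's parent
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], siblings


def reconstruct_root(message: bytes, index: int, siblings: list[bytes]) -> bytes:
    """Recompute the root hash from a message and its sibling digests."""
    node = digest(message)
    for sibling in siblings:
        # Even index: node is the left child. Odd index: node is the right child.
        node = digest(node + sibling) if index % 2 == 0 else digest(sibling + node)
        index //= 2
    return node


messages = [f"m{i}".encode() for i in range(1, 9)]
root_hash, proof_m4 = merkle_proof(messages, 3)  # m4 sits at zero-based index 3
# proof_m4 holds N3, N1-2, N5-8 in that order, matching the "Send" line above.
rebuilt = reconstruct_root(messages[3], 3, proof_m4)
```

A receiver would accept m4 only if `rebuilt` equals the root hash covered by a valid threshold signature; a tampered message or sibling produces a different root.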


Performance of Optimized Systems

Protocol Rounds

Protocol       Wide Area   Local Area   Total
Steward        2           4            6
Paxos/Paxos    2           6            8
BFT/Paxos      3           8            11
Paxos/BFT      2           11           13
BFT/BFT        3           15           18

[Figure: Update Latency vs. Clients, 50 ms diameter, 10 Mbps links. x-axis: number of clients (0 to 150); y-axis: update latency in seconds (0 to 0.7). Series: Paxos/Paxos, Optimized Steward, Paxos/BFT, BFT/Paxos, BFT/BFT.]

• Maximum throughput limited by wide-area bandwidth and impacted by number of wide-area rounds.

• Optimizations effectively eliminate the computational bottleneck associated with local ordering.

[Figure: Update Throughput vs. Clients, 50 ms diameter, 10 Mbps links. x-axis: number of clients (0 to 150); y-axis: update throughput in updates/sec (0 to 400). Series: Paxos/Paxos, Optimized Steward, Paxos/BFT, BFT/Paxos, BFT/BFT.]



• Paxos/BFT vs. Steward
– Paxos/BFT and Steward achieve almost identical maximum throughput.

• BFT/BFT vs. Paxos/BFT
– BFT/BFT offers stronger fault tolerance properties than Paxos/BFT and achieves roughly 75% of the throughput.


Outline

• Context and trends
• Various levels of the insider threat problem
• Service level problem formulation
• Relevant background
• Steward: First scalable Byzantine replication
– A bit on how it works
– Correctness
– Performance
– Tradeoffs
• Composable architecture
– A bit on how it works
– BLink: Byzantine link protocol
– Performance and optimization
• Theory hits reality
– Limitation of existing correctness criteria
– Proposed model and metrics
• Summary


Red Team Attack

• Steward under attack
– Five sites, 4 replicas each.
– Red team had full control (root) over five replicas, one in each site, and full access to the source code.
– Both representative and stand-by replicas were attacked.
– Compromised replicas were injecting:
  • Loss (up to 20% each)
  • Delay (up to 200 ms)
  • Packet reordering
  • Fragmentation (up to 100 bytes)
  • Replay attacks
– Compromised replicas were running modified servers that contained malicious code.



Red Team Results

• The system was NOT compromised!
– Safety and liveness guarantees were preserved.
– The system continued to run correctly under all attacks.

• Most of the attacks did not affect the performance.
• The system was slowed down when the representative of the leading site was attacked.
– Speed of update ordering was slowed down by a factor of 5.

• Big problem:
– A better attack could slow the system down by a factor of 100.
– Still ok in terms of liveness criterion.

• Main lesson:
– Correctness criteria used by the community are not good enough for scalable systems over wide area networks.


New Attack Model and Metrics

• In addition to existing safety and liveness.
• Performance attack:
– Once the adversary cannot compromise safety and cannot stop the system, the next best thing is to slow it down below usefulness.

• Performance metric:
– Can we guarantee a certain average fraction of the “clean” performance while under attack?
– Assumptions: correct nodes can freely communicate (non-resource-consumption denial of service); “clean” performance is defined as the performance of the “best” algorithm.

• Response metric:
– How fast can we get to the above average fraction?

Can we design algorithms that achieve these metrics?
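One way to make the two metrics concrete is to compute them from a throughput trace. The sketch below is only one possible reading of the slide, not a definition from the talk: the performance metric is taken as average throughput under attack divided by the clean baseline, and the response metric as the time after which throughput stays at or above a target fraction of that baseline. The function names, the one-second sampling interval, and the trace values are all hypothetical.

```python
def performance_fraction(attack_throughput: list[float], clean_throughput: float) -> float:
    """Average throughput under attack as a fraction of the clean baseline."""
    if not attack_throughput or clean_throughput <= 0:
        raise ValueError("need samples and a positive clean baseline")
    return sum(attack_throughput) / len(attack_throughput) / clean_throughput


def response_time(attack_throughput: list[float], clean_throughput: float,
                  target_fraction: float, interval_s: float = 1.0):
    """Seconds from attack start until throughput stays at or above the target.

    Returns None if the final sample is still below the target.
    """
    threshold = target_fraction * clean_throughput
    last_below = -1
    for i, sample in enumerate(attack_throughput):
        if sample < threshold:
            last_below = i
    if last_below == len(attack_throughput) - 1:
        return None
    return (last_below + 1) * interval_s


# Hypothetical trace (updates/sec, one sample per second), not measured data.
clean = 100.0
under_attack = [20.0, 20.0, 50.0, 80.0, 80.0, 80.0]
fraction = performance_fraction(under_attack, clean)
recovery = response_time(under_attack, clean, target_fraction=0.75)
```

For this made-up trace the average fraction is 0.55 and the system returns to at least 75% of clean throughput after 3 seconds. A liveness criterion alone would not distinguish this trace from one that stays at 1% of clean throughput indefinitely.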


Summary

• Insider threat problem is important on several levels.

• For the service level
– Algorithmic engines for scalable solutions seem on the right track.
– But still a gap between algorithmic engines and practical systems (e.g. management).

• Solutions for network and client levels less mature

• What fits small scale systems does not necessarily fit large scale systems, especially on the wide area.
– New attack models
– New metrics
– New algorithmic approaches