
Outline for Today’s Lecture

Administrative:

Objective:
– Peer-to-peer file systems

• Mechanisms employed
• Issues
• Some examples

The Security Environment

Threats

[Table: security goals and the corresponding threats]

Intruders

Common Categories
1. Casual prying by nontechnical users
2. Snooping by insiders
3. Determined attempt to make trouble (or personal gain)
4. Commercial or military espionage

Accidental Data Loss

Common Causes

1. Acts of God – fires, floods, wars

2. Hardware or software errors – CPU malfunction, bad disk, program bugs

3. Human errors – data entry, wrong tape mounted, rm *

Reliability Mechanisms (Redundancy)

• Replication of data, geographically distributed
– As simple as backups
– First-class replication (Coda)
– Voting schemes

• Error detection/correction (a sketch of the simplest case follows this list)
– Erasure codes (encode n blocks into m > n blocks; any r of the m, with r ≥ n, recover the original content of the original n)
– Parity bits, checksums
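As a concrete illustration of redundancy by coding, here is a minimal sketch of the simplest erasure code: one XOR parity block over n data blocks, so the loss of any single block can be repaired from the n survivors. The block contents are made up for the example.

```python
# Minimal sketch: a single XOR parity block over n equal-size data blocks.
# Any one lost block (data or parity) can be rebuilt from the n survivors.
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

data = [b"AAAA", b"BBBB", b"CCCC"]   # n = 3 data blocks
parity = reduce(xor, data)           # the (n+1)th, redundant block

# Simulate losing one data block and recovering it.
lost = data[1]
survivors = [data[0], data[2]]
recovered = reduce(xor, survivors, parity)   # parity ^ A ^ C == B
assert recovered == lost
```

Real erasure codes (e.g., Reed-Solomon) generalize this so that any n of the m encoded blocks suffice.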

Basics of Cryptography

[Figure: the relationship between the plaintext and the ciphertext]

• Secret-key crypto is also called symmetric-key crypto
– If keys are long enough, there are good algorithms
– The secret key must be shared by both parties

Secret-Key Cryptography
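Below is a minimal sketch of that shared-key workflow; the use of the third-party Python cryptography package and its Fernet scheme (an authenticated symmetric cipher) is an illustrative assumption, not something from the slides.

```python
# Symmetric (secret-key) crypto: the SAME key both encrypts and decrypts,
# so it must somehow be shared by the two parties in advance.
from cryptography.fernet import Fernet   # pip install cryptography

key = Fernet.generate_key()              # the shared secret
f = Fernet(key)

ciphertext = f.encrypt(b"attack at dawn")
assert f.decrypt(ciphertext) == b"attack at dawn"
```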

Public-Key Cryptography

• All users pick a public key/private key pair
– publish the public key
– private key not published
• Public key is (usually*) the encryption key
• Private key is (usually*) the decryption key
• RSA
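A minimal sketch of the asymmetric workflow with RSA, again using the cryptography package (an assumed choice): anyone can encrypt with the published public key; only the private-key holder can decrypt.

```python
# Public-key crypto: encrypt with the public key, decrypt with the private key.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()    # this half is published

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
ciphertext = public_key.encrypt(b"attack at dawn", oaep)
assert private_key.decrypt(ciphertext, oaep) == b"attack at dawn"
```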

One-Way Functions

• Function such that, given the formula for f(x)
– easy to evaluate y = f(x)
• But given y
– computationally infeasible to find x
• Example: hash functions – produce a fixed-size result
– MD5
– SHA
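For instance, Python’s standard hashlib exposes the hash functions named above: computing the digest is cheap, but no feasible computation recovers the input from it.

```python
import hashlib

# Fixed-size digests regardless of input length; easy forward, infeasible back.
print(hashlib.md5(b"some document").hexdigest())     # 128-bit digest
print(hashlib.sha256(b"some document").hexdigest())  # 256-bit digest
```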

Digital Signatures

• Computing a signature block
– The hash is fixed length; apply the private key as the encryption key*
• What the receiver gets
– Use the public key as the decryption key* on the signature block to get the hash back
– Compute the hash of the document part
– Do these match?
• Assumes E(D(x)) = x, whereas we usually want D(E(x)) = x
• Public key must be known by the receiver somehow – certificate
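A minimal sign-then-verify sketch with RSA (the library and padding choices are assumptions): the signer hashes the document and applies the private key; the receiver checks the signature block with the public key, which is exactly the match test above.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

document = b"the signed document"
signature = key.sign(document, pss, hashes.SHA256())  # hash, then private key

try:
    key.public_key().verify(signature, document, pss, hashes.SHA256())
    print("hashes match: document is authentic")
except InvalidSignature:
    print("mismatch: altered document or wrong key")
```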

Distributing Public Keys

• Certificate authority
– Trusted 3rd party
– Their public key is known
• A certificate is the name and public key, digitally signed by the CA (sketch below)
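A toy sketch of issuing and checking such a certificate (all names and formats invented for illustration): the CA signs the (name, public key) binding with its private key, and anyone who already knows the CA’s public key can verify it.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)
ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
ca_public = ca_key.public_key()          # assumed known to everyone

def issue_cert(name: bytes, user_public_key: bytes):
    body = name + b"|" + user_public_key          # the (name, key) binding
    return body, ca_key.sign(body, pss, hashes.SHA256())

def check_cert(body: bytes, sig: bytes) -> bool:
    try:
        ca_public.verify(sig, body, pss, hashes.SHA256())
        return True
    except InvalidSignature:
        return False

body, sig = issue_cert(b"alice", b"<alice's public key bytes>")
assert check_cert(body, sig)
```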

Byzantine Generals Problem

Reaching consensus among geographically separated (distributed) players when some of them are compromised.
• Generals of army units need to agree on a common plan of attack (consensus)
• Traitorous generals will lie (faulty or malicious)
• Generals communicate by sending messages directly general-to-general through runners between units (they won’t all see the same intel)
• Solutions let all loyal generals reach consensus, in spite of liars (up to some fraction of the generals being bad)

Solution with Digital Sigs

• Iteratively execute “rounds” of message exchanges
• As each message passes by, the receiving general digitally signs it and forwards it on
• Each general maintains the set of orders received
• Inconsistent orders indicate a traitor (see the sketch below)
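A toy sketch of why signatures help (the setup is invented; signatures are simulated with a keyed MAC for brevity, whereas real protocols use public-key signatures so anyone can verify them): a traitorous commander who signs conflicting orders is exposed as soon as the generals exchange their order sets, because both conflicting orders carry his valid signature.

```python
import hashlib, hmac

KEYS = {"commander": b"k0"}               # hypothetical signing key

def sign(general: str, order: str) -> str:
    return hmac.new(KEYS[general], order.encode(), hashlib.sha256).hexdigest()

def verify(general: str, order: str, sig: str) -> bool:
    return hmac.compare_digest(sig, sign(general, order))

# The commander (a traitor) sends different signed orders to two lieutenants.
msg_to_lt1 = ("attack",  sign("commander", "attack"))
msg_to_lt2 = ("retreat", sign("commander", "retreat"))

# After a round of exchanges, every general holds both messages.
orders = set()
for order, sig in (msg_to_lt1, msg_to_lt2):
    assert verify("commander", order, sig)   # both signatures are valid...
    orders.add(order)

if len(orders) > 1:                          # ...so inconsistency is proof
    print("traitorous commander detected:", orders)
```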

Peer-to-peer File Systems

Problems with Centralized Storage Server Farms

• Weak availability
– Susceptible to point failures and DoS attacks
• Management overhead
– Data often manually partitioned to obtain scale
– Management and maintenance are a large fraction of cost
• Per-application design (e.g., GoogleOS)
– High hurdle for new applications
• Don’t leverage the advent of powerful clients
– Limits scalability and availability

Slides from Shenker and Stoica, UCB

What is a P2P system?

• A distributed system architecture:
– No centralized control
– Nodes are symmetric in function
• Large number of (perhaps) server-quality nodes
• Enabled by technology improvements

[Figure: several nodes connected to one another through the Internet, with no central server]

Slides from Shenker and Stoica, UCB

P2P as Design Style

• Resistant to DoS and failures
– Safety in numbers, no single point of attack or failure

• Self-organizing
– Nodes insert themselves into the structure
– Need no manual configuration or oversight

• Flexible: nodes can be
– Widely distributed or co-located
– Powerful hosts or low-end PCs
– Trusted or unknown peers

Slides from Shenker and Stoica, UCB


Issues

• Goal is to have no centralized server and to utilize desktop-level idle resources
• Trust – privacy, security, data integrity
– Using untrusted hosts -- crypto solutions
• Availability
– Using lower-“quality” resources -- replication
– Using machines that may regularly go off-line
• Fairness – freeloaders who just use and don’t contribute any resources
– Using voluntarily contributed resources -- use economic incentives

What Interface?

• Challenge for P2P systems: finding content
– Many machines; must find the one that holds the file
• Essential task: Lookup(key)
– Given a key, find the host (IP) that has the file with that key
• Higher-level interface: Put()/Get() (sketch after this slide)
– Easy to layer on top of lookup()
– Allows the application to ignore details of storage
• System looks like one hard disk
– Good for some apps, not for others

Slides from Shenker and Stoica, UCB
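A minimal sketch (all names hypothetical) of layering Put()/Get() on top of Lookup(): lookup maps a key to the responsible host, and put/get simply talk to that host. Routing is faked here with a hash over a static host list, and the “hosts” are in-memory dicts.

```python
import hashlib

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical host IPs
STORE = {ip: {} for ip in NODES}                # stand-in for remote storage

def lookup(key: str) -> str:
    """Return the IP of the host that owns this key (fake DHT routing)."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

def put(key: str, value: bytes) -> None:
    STORE[lookup(key)][key] = value

def get(key: str) -> bytes:
    return STORE[lookup(key)].get(key)

put("song.mp3", b"...file bytes...")
assert get("song.mp3") == b"...file bytes..."   # same key -> same host
```

The application sees one big hash table; all placement detail hides behind lookup().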

Distributed Hash Tables vs Unstructured P2P

• DHTs good at:
– exact match for “rare” items
• DHTs bad at:
– keyword search, etc. [can’t construct DHT-based Google]
– tolerating extreme churn
• Gnutella etc. good at:
– general search
– finding common objects
– very dynamic environments
• Gnutella etc. bad at:
– finding “rare” items

Slides from Shenker and Stoica, UCB

DHT Layering

[Figure: a distributed application issues put(key, data) and get(key) → data against a distributed hash table spread over many nodes; underneath, a lookup service maps lookup(key) → the IP address of the responsible node]

• Application may be distributed over many nodes
• DHT distributes data storage over many nodes

Slides from Shenker and Stoica, UCB

Two Crucial Design Decisions

• Technology for infrastructure: P2P
– Take advantage of powerful clients
– Decentralized
– Nodes can be desktop machines or server quality
• Choice of interface: lookup and hash table
– Lookup(key) returns the IP of the host that “owns” the key
– Put()/Get(): standard hash-table interface
– Some flexibility in interface (no strict layers)

Slides from Shenker and Stoica, UCB

A DHT in Operation: Overlay

[Figure: an overlay of nodes, each holding a share of the (K, V) pairs]

Slides from Shenker and Stoica, UCB

A DHT in Operation: put()

[Figure sequence: put(K1, V1) enters the overlay at one node and is routed hop by hop to the node responsible for K1, which stores (K1, V1)]

Slides from Shenker and Stoica, UCB

A DHT in Operation: get()

[Figure sequence: get(K1) is routed to the node responsible for K1, which returns V1]

Slides from Shenker and Stoica, UCB

Key Requirement

• All puts and gets for a particular key must end up at the same machine
– Even in the presence of failures and new nodes (churn)
• This depends on the DHT routing algorithm
– Must be robust and scalable

Slides from Shenker and Stoica, UCB

DHTs

• Examples
– CAN
– Chord
– Pastry
– Tapestry
– Used in BitTorrent and the Coral CDN
• Keyspace partitioning – ownership of keys is split among the participating nodes
– A node has an ID and owns keys “close” to its ID by some distance function (see the sketch below)
• Hash a filename to get its key
• Routing in the overlay
– Forward toward a node with a closer ID, or else the key is mine
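A minimal sketch of this keyspace partitioning in the Chord style (ID-space size and addresses are assumptions): node IDs and keys hash onto one ring, and a key is owned by the first node ID at or after it, wrapping around; routing just moves toward closer IDs.

```python
import bisect, hashlib

def ring_id(name: str, bits: int = 16) -> int:
    """Hash a node address or filename into a small circular ID space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

node_ids = sorted(ring_id(a) for a in ("10.0.0.1", "10.0.0.2", "10.0.0.3"))

def owner(key: str) -> int:
    kid = ring_id(key)
    i = bisect.bisect_left(node_ids, kid)   # first node ID >= key ID...
    return node_ids[i % len(node_ids)]      # ...wrapping around the ring

print(owner("paper.pdf"))                   # the nodeID that owns this key
```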

PASTRY Overlay Network

[Figure: a lookup for key k is routed through the overlay, hop by hop, toward the node whose ID is closest to k]

• Nodes are assigned 1-dimensional IDs in the hash space at random (e.g., a hash of the IP address)
• Each node has log n neighbors and maintains a routing table
• A lookup with fileID k is routed to the live node with nodeID closest to k

PAST

• Rice Univ. and MSR Cambridge UK
• Based on an Internet-based overlay (Pastry, above)
• Not traditional file-system semantics
• A file is associated with a fileID upon insertion into PAST and can have k replicas
– fileID is a secure hash of the filename, the owner’s public key, and a random salt (see the sketch below)
– The k nodes whose nodeIDs are “closest” to the most significant bits of the fileID store the replicas
• Instead of directory lookup, retrieval requires knowing the fileID
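A minimal sketch of the fileID computation described above (PAST uses a secure hash such as SHA-1; the exact function shape here is an assumption):

```python
import hashlib, os

def make_file_id(filename: str, owner_public_key: bytes):
    salt = os.urandom(8)   # random salt, kept so the fileID can be recomputed
    digest = hashlib.sha1(filename.encode() + owner_public_key + salt).digest()
    return digest, salt    # 160-bit fileID

file_id, salt = make_file_id("thesis.pdf", b"<owner public key bytes>")
print(file_id.hex())
```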

• DHash replicates each key/value pair at the nodes after it on the circle (placement sketch below)

• It’s easy to find replicas

• Put(k,v) writes to all of them

• Get(k) reads from the closest

Data Availability via Replication

[Figure: Chord-style ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; key K19 is stored at its successor N20 and replicated at the nodes that follow, N32 and N40]

Slides from Shenker and Stoica, UCB
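A minimal sketch of this replica placement (the ring helper repeats the earlier sketch; k is an assumed replication factor): each pair is stored at the owner and at the k-1 nodes that follow it on the circle, so the replicas are trivial to locate.

```python
import bisect, hashlib

def ring_id(name: str, bits: int = 16) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

node_ids = sorted(ring_id("10.0.0.%d" % i) for i in range(1, 8))

def replica_set(key: str, k: int = 3):
    """The owner of the key plus the k-1 nodes after it on the circle."""
    i = bisect.bisect_left(node_ids, ring_id(key))
    return [node_ids[(i + j) % len(node_ids)] for j in range(k)]

print(replica_set("block19"))   # put() writes to all; get() reads the closest
```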

First Live Successor Manages Replicas

[Figure: ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Block 19 is stored at one node with a copy at N68, and if a holder fails, the first live successor takes over managing the replicas]

Slides from Shenker and Stoica, UCB

Other P2P FS examples

Farsite

• Microsoft Research – intended to look like NTFS
• Desktops on a LAN (not Internet-scale)
• 3 roles: client, member of a directory group, file host
• Directory metadata managed by Byzantine replication
• File hosts store encrypted, replicated file data
• The directory group stores a secure hash of content to validate the authenticity of a file
• Multiple namespace tree roots, with a namespace certificate provided by a CA
• File performance via local caching under a leasing system

LOCKSS

• Lots Of Copies Keep Stuff Safe (HPLabs, Stanford, Harvard, Intel)

• Library application for L-O-N-G term archival of digital library content (deals with bit rot, obsolescence of formats, malicious users)

• Continuous audit and repair of replicas, based on taking polls of sites holding copies of the content (comparing digests of the content and repairing my copy if it differs from the consensus)

• Rate limiting and churn of voter lists deter attackers from compromising enough copies to force a malicious “repair”

Sampled Poll

• Each peer holds, for every preserved Archival Unit:
– a reference list of peers it has discovered
– a friends list of peers its operator knows externally
– a history of interactions with others (balance of contributions)
• Periodically (faster than the rate of storage failures):
– the poller takes a sample of the peers in its reference list
– it invites them to vote: send a hash of their replica
• Compares the votes with its local copy (see the sketch below)
– Overwhelming agreement (> 70%) → sleep blissfully
– Overwhelming disagreement (< 30%) → repair
– Too close to call → raise an alarm
• To repair, the peer gets the copy of somebody who disagreed and then reevaluates the same votes
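A minimal sketch of the poll’s decision rule (the thresholds come from the slide; the peer structure and sample size are assumptions): sample the reference list, collect replica digests as votes, and compare them with the local copy.

```python
import hashlib, random
from collections import namedtuple

Peer = namedtuple("Peer", "replica")     # a peer and its copy of the AU

def digest(replica: bytes) -> str:
    return hashlib.sha256(replica).hexdigest()

def sampled_poll(local: bytes, reference_list, sample_size: int = 10) -> str:
    voters = random.sample(reference_list, min(sample_size, len(reference_list)))
    votes = [digest(p.replica) for p in voters]            # each voter's hash
    agreement = sum(v == digest(local) for v in votes) / len(votes)
    if agreement > 0.7:
        return "sleep blissfully"   # overwhelming agreement
    if agreement < 0.3:
        return "repair"             # fetch a disagreeing copy, re-evaluate
    return "raise an alarm"         # too close to call: possible attack

peers = [Peer(b"archived content")] * 9 + [Peer(b"bit-rotted content")]
print(sampled_poll(b"archived content", peers))
```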

Churn of Voter Lists

• Reference list
– Take out voters, so that the next poll is based on a different group
– Replenish with some “strangers” and some “friends”
• Strangers: accepted nominees proposed by voters who agreed with the poll outcome
• Friends: from the friends list
• The measure of favoring friends is called friend bias
• History
– The poller owes its voters a vote (for their future polls)
– Detected misbehavior is penalized in the victim’s history