TECS Week, Pune, 5-9 January 2009 1 The User is the Computer: From Decentralized Systems to Social...


The User is the Computer: From Decentralized Systems to Social Computing

Peter Druschel

Course overview

Today’s computer systems augment a wide range of human activity, including cooperation among individuals, organizations, businesses

This course deals with some of the technology underlying this trend, as well as the challenges and opportunities that come with it

Course overview

1. Decentralized systems (~2 hours)
   Overlays, object lookup, routing
   Shared state and coordination
   Applications
   Challenges

2. Accountability for distributed systems (~1.5 hours)
   Why and what is accountability?
   How can we implement it?
   How well does it work?

3. Social computing and applications (~1.5 hours)
   Exploiting social networks for distributed computing
   Example: enhancing Web search
   Example: thwarting unwanted communication

Credits

Group members: Andreas Haeberlen, Jeff Hoye, Petr Kuznetsov, Alan Mislove, Animesh Nandi, Ansley Post, Atul Singh, Jim Stewart

Colleagues: Krishna Gummadi (MPI-SWS), Rodrigo Rodrigues (MPI-SWS), Anne-Marie Kermarrec (INRIA), Ant Rowstron (MSRC), Miguel Castro (MSRC), Ion Stoica (UC Berkeley), John Kubiatowicz (UC Berkeley), Frank Dabek (Google), Y. Charlie Hu (Purdue)

Funding: Max Planck Society, National Science Foundation, Intel Research, Microsoft Research, Texas ATP


Decentralized (p2p) systems

Distributed computer system with
  Symmetric components
  Decentralized control and state
  Self-organization

Promise
  “Organic” growth
  Low barrier to deployment
  Resilience to faults, attack
  Resource abundance, diversity


Partly vs. fully decentralized systems

Partly decentralized systems have a dedicated controller node
  Organic growth, abundant/diverse resources
  Limited scalability, resilience

Fully decentralized systems
  Some fully decentralized systems have powerful supernodes
  Increased efficiency, but reduced resilience


Decentralized systems: deployment

Self-organization enables deployment in dynamic networks

Ad hoc wireless networks
  Mobile wireless devices

Delay-tolerant networks
  Devices with intermittent connectivity

Overlay networks (most common)
  Internet-connected devices


Outline

1. Decentralized systems: state-of-the-art
   Overlays, object lookup, routing
   Example: Pastry
   Shared state and coordination: DHTs and Scribe/DOLR
   Challenges
   Putting it all together: ePOST

2. Accountability for distributed systems
3. Social computing and applications


Overlay networks

Overlay links rely on unicast service in the Internet
Topology can be “structured” or “unstructured”

[Figure: an overlay network layered on top of the Internet]


Why overlays?

Overcome limitations of Internet architecture
  group communication, content-oriented networking
  enable innovation

Low barrier to deployment
  resource sharing enables “organic” growth
  self-organization simplifies operation

Robustness to faults, attacks, unexpected workloads
  decentralization
  resource diversity, wealth


Decentralized (p2p) systems: What do they enable?

Cooperative computing
  Content sharing/distribution (Kazaa, BitTorrent)
  Streaming media (SOPcast, PPLive, Joost, iPlayer)
  Telephony (Skype), popular scientific computing
  Low barrier to deployment, market entry: Innovation

Digital preservation
  Diversity, abundance of resources provides durability

Autonomous distributed systems
  Self-managing networks of little or mobile devices
  Decentralization is necessary for autonomy


Popular decentralized systems

File sharing, bulk content distribution
  BitTorrent, eDonkey dominate Internet traffic

Streaming media distribution
  PPLive, CoolStreaming, Joost, iPlayer, LiveStation

Skype

Volunteer computing
  BOINC apps perform 1 PFLOPS on average


Decentralized (p2p) systems: State-of-the-art

Decentralized state management
  Object location
  Replication
  Availability, Durability
  Load balancing

Efficient, consistent lookup routing in Internet overlays

Efficient cooperative content distribution
Dependable storage from untrusted components
Security: secure routing, content integrity, incentives


Key problem: Object location

Objects partitioned among participating nodes
Mapping from objects to nodes is dynamic

Unicast routing doesn’t help
  don’t know who to talk to
  don’t know where to store objects
  want to address (data) objects, not nodes!


Solution 1: Unstructured overlay

No assumptions about overlay graph structure
  New node is assumed to know one participant
  Performs random walk to find more nodes to attach to

Object placement
  Inserting node or random walk target
  May leave references along random path

Object lookup
  Scoped flooding or random walk

Examples: Gnutella, Kazaa, eDonkey


Unstructured object location


I inserts an object
  Leave reference on R

S floods a request
  Finds reference at R
Tradeoff between scalability and recall
Popular objects are easy to find
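The scalability/recall tradeoff of scoped flooding can be illustrated with a small sketch. It is not taken from any of the systems named above: the overlay graph, the node names (reusing I, R and S from the slide) and the `ttl` parameter are assumptions made for illustration only.

```python
from collections import deque

def scoped_flood(graph, refs, start, obj, ttl):
    """Breadth-first flood from `start`, limited to `ttl` overlay hops.

    graph: dict mapping node -> list of neighbor nodes (overlay links)
    refs:  dict mapping node -> set of object names it holds a reference to
    Returns the set of nodes within scope that hold a reference to `obj`.
    """
    seen = {start}
    hits = set()
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if obj in refs.get(node, set()):
            hits.add(node)
        if hops == ttl:
            continue  # scope exhausted: do not forward any further
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, hops + 1))
    return hits

# Hypothetical overlay: S - A - B - R, with the inserting node I attached to R.
graph = {"S": ["A"], "A": ["S", "B"], "B": ["A", "R"], "R": ["B", "I"], "I": ["R"]}
refs = {"R": {"song"}}  # I inserted the object and left a reference on R

near = scoped_flood(graph, refs, "S", "song", ttl=2)  # R is three hops away
far = scoped_flood(graph, refs, "S", "song", ttl=3)   # a larger scope reaches R
```

A small scope keeps the message count low but can miss the reference; a larger scope improves recall at the cost of contacting more nodes.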


Solution 2: structured overlay networks

Overlay graph conforms to a specific graph structure

Key-based routing primitive (KBR):

KBR(M, X): route message M to the live node that is currently responsible for the object associated with numerical id X

Basis for content-oriented networking

Examples: Chord, CAN, Pastry, Tapestry, Bamboo, Kademlia, SkipNet, Kelips, Accordion, etc.



Structured vs. unstructured overlays

Unstructured
  Simple overlay formation
  Tradeoff between recall and efficiency
  Robust to churn

Structured
  Pre-determined routes
  Efficient identity lookup, tree formation
  More susceptible to churn

Can be combined:
  Stable nodes form structure
  Others attach randomly



Pastry: Identifier space

Consistent hashing [Karger et al. ’97]

160-bit circular id space, from 0 to 2^160 - 1

nodeIds (uniform random)

keys (uniform random)

Each key is mapped to the live node with “closest” nodeId
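A minimal sketch of the key-to-node mapping on a circular id space follows. The use of SHA-1 to derive ids, the tie-breaking rule, and all function names are assumptions for illustration; the slide only states that each key maps to the live node with the "closest" nodeId.

```python
import hashlib

ID_BITS = 160
ID_SPACE = 1 << ID_BITS  # ids range over 0 .. 2**160 - 1

def make_id(name: str) -> int:
    """Hash an arbitrary name to a 160-bit id (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def circular_distance(a: int, b: int) -> int:
    """Distance between two ids when the id space wraps around."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def responsible_node(key: int, live_node_ids):
    """Return the live nodeId closest to `key`.

    Ties are broken toward the smaller id, which is an arbitrary choice
    made for this sketch.
    """
    return min(live_node_ids, key=lambda n: (circular_distance(n, key), n))

# Hypothetical nodeIds chosen by hand to show the wrap-around.
nodes = [10, 1 << 159, ID_SPACE - 5]
```

With these ids, key 3 maps to node 10, while key 2 maps to the node at `ID_SPACE - 5`, because the distance is measured around the ring. When a node joins or leaves, only keys in its neighborhood change owner.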


Pastry: lookup

KBR(M, X)

Msg with key X is routed to live node with nodeId closest to X

Problem: complete routing table not scalable

[Figure: circular id space from 0 to 2^160 - 1 with key X]


Pastry: prefix-based routing

Properties
• log_16 N steps
• O(log N) state

[Figure: routing KBR(M, d46a1c) from node 65a1fc; nodes shown: d13da3, d4213f, d462ba, d467c4, d471f1]
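The idea of resolving one more digit of the key per hop can be sketched as follows. The nodeIds are those shown in the figure, but the per-node routing tables and neighbor lists are hypothetical, and the numeric distance ignores wrap-around; this is a simplified illustration, not Pastry's actual routing procedure.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(node, key, routing_table, neighbors):
    """Choose the next hop for `key` at `node`, or None to deliver locally."""
    row = shared_prefix_len(node, key)
    if row == len(key):
        return None  # the node's id equals the key
    entry = routing_table.get((row, key[row]))
    if entry is not None:
        return entry  # shares at least one more digit with the key
    # No matching entry: fall back to the numerically closest known neighbor.
    best = min([node] + neighbors, key=lambda n: abs(int(n, 16) - int(key, 16)))
    return None if best == node else best

def route(start, key, tables, neighbor_sets):
    """Follow next_hop until some node decides to deliver the message."""
    path = [start]
    while True:
        here = path[-1]
        hop = next_hop(here, key, tables.get(here, {}), neighbor_sets.get(here, []))
        if hop is None:
            return path
        path.append(hop)

# Hypothetical routing state, keyed by (row, next digit of the key).
tables = {
    "65a1fc": {(0, "d"): "d13da3"},
    "d13da3": {(1, "4"): "d4213f"},
    "d4213f": {(2, "6"): "d462ba"},
}
neighbor_sets = {
    "d462ba": ["d467c4"],
    "d467c4": ["d462ba", "d471f1"],
}

path = route("65a1fc", "d46a1c", tables, neighbor_sets)
```

Each table lookup extends the matched prefix by at least one hex digit, which is where the log_16 N step bound comes from; the last hop uses neighbor information because no node with prefix d46a exists in this example.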


Pastry: routing table (node 65a1fcx)

log_16 N rows

Row 0: 0x 1x 2x 3x 4x 5x 7x 8x 9x ax bx cx dx ex fx
Row 1: 60x 61x 62x 63x 64x 66x 67x 68x 69x 6ax 6bx 6cx 6dx 6ex 6fx
Row 2: 650x 651x 652x 653x 654x 655x 656x 657x 658x 659x 65bx 65cx 65dx 65ex 65fx
Row 3: 65a0x 65a2x 65a3x 65a4x 65a5x 65a6x 65a7x 65a8x 65a9x 65aax 65abx 65acx 65adx 65aex 65afx
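The table layout above follows a simple rule: a remote node belongs in the row given by the length of the prefix it shares with the local nodeId, and in the column given by its next digit. The sketch below illustrates that rule; the function names and the second example id are assumptions for illustration.

```python
def slot(local: str, other: str):
    """Return (row, column digit) where `other` fits in `local`'s table.

    Returns None for the local node itself, which has no slot.
    """
    row = 0
    while row < len(local) and local[row] == other[row]:
        row += 1
    if row == len(local):
        return None
    return (row, other[row])

def expected_rows(n: int) -> int:
    """Smallest number of rows r such that 16**r >= n (about log_16 n)."""
    rows, capacity = 0, 1
    while capacity < n:
        capacity *= 16
        rows += 1
    return rows
```

For the local node 65a1fc, the node d13da3 falls into row 0 under column d, and a hypothetical node 65a2bb falls into row 3 under column 2. The column matching the local node's own digit is absent in every row (6x in row 0, 65x in row 1, and so on), which is why each row above lists 15 entries.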


Pastry: prefix-based routing

Similar to Plaxton Trees [Plaxton et al. ’97]

But added
  Neighbor sets for consistency, robustness, security
  Consistent routing
  Self-organization (dynamic joins, fault tolerance)
  Proximity neighbor selection for efficiency
  Secure routing to defend against malicious nodes


Neighbor sets

Stabilization protocol ensures eventual consistency
  aids routing consistency
  enables secure routing
  localizes fault detection within neighbor sets
  enables application-specific local coordination (e.g., object replica management)

Challenge: Inconsistent routing

Routing consistency: “At any time, at most one overlay node accepts messages with a given key”

Necessary for consistency of mutable data

Complicated by Internet routing anomalies

Example: new node N has informed X, but not yet Y, of its arrival

Challenge: Self-organization

Initializing and maintaining node state (overlay construction and maintenance)

Node addition
Node departure (failure)

Pastry: Node join

New node: d46a1c

[Figure: KBR(Join, d46a1c) is routed from node 65a1fc; nodes shown: d13da3, d4213f, d462ba, d467c4, d471f1]

Pastry: Node departure (failure)

Neighbor set members exchange keep-alive messages (failure detection, neighbor set stabilization)

Neighbor set repair (eager): request set from farthest live node in set

Routing table repair (lazy): get table from peers in the same row, then higher rows



Challenge: Overlay route efficiency

Nodes close in id space, but far away in Internet

Goal: choose routing table entries that yield few hops and low latency

[Figure: Internet hosts (CA-T1, CCI, Aros, Utah, CMU, MIT, MA-Cable, Cisco, Cornell, NYU, OR-DSL) labeled with nodeId prefixes 20x, 80x, 81x, 89x]

Proximity neighbor selection (PNS)

Assumptions:
  scalar proximity metric (e.g., RTT)
  a node can probe distance to any other node

Proximity invariant:

Each routing table entry refers to a node close to the local node (in the physical network), among all nodes with the appropriate nodeId prefix.

PNS: Routes in delay space

[Figure: Route(d46a1c) with nodes 65a1fc, d13da3, d4213f, d462ba, d467c4 and d471f1, shown both in nodeId space and in delay space]

PNS Properties

1) Low-delay routes: Average delay stretch, relative to IP, is a small constant (1.3 - 2.2) and can be derived from the physical network’s delay distribution

2) Route convergence: Routes of messages sent by nearby nodes with the same key converge at a node near the source nodes

Details in [Castro et al. MSR-TR-2002-82]







Sharing state: Distributed hash tables (DHT)

Hashtable API: put(obj,key), obj <- get(key)

Layered on top of a structured overlay
  Scalability, Robustness
  Persistent storage
  High availability

Examples: Chord/CFS, Pastry/PAST, Bamboo, Kelips, Kademlia

Distributed hash table (DHT)

Operations:
  insert(k,v)
  v=lookup(k)

• Structured overlay maps keys to nodes
• Decentralized and self-organizing
• Scalable, robust

[Figure: tuples (k1,v1) through (k6,v6) stored on nodes of the overlay network]

DHT: Insertion and replication

Storage Invariant: Tuple replicas are stored on r nodes with nodeIds closest to key

[Figure: Insert(key,value,r) with r=4]
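The storage invariant can be sketched as a function that picks the r live nodes closest to the key. The small 8-bit id space, the node ids and the tie-breaking rule are assumptions chosen to keep the example readable; they are not taken from the slides.

```python
def replica_set(key, live_node_ids, r, id_bits=160):
    """Return the r live nodeIds closest to `key` on a circular id space."""
    space = 1 << id_bits

    def dist(n):
        d = abs(n - key)
        return min(d, space - d)

    # Ties are broken toward the smaller id (an arbitrary choice).
    return sorted(live_node_ids, key=lambda n: (dist(n), n))[:r]

# Hypothetical 8-bit id space with six live nodes.
nodes = [10, 60, 120, 130, 200, 250]
before = replica_set(125, nodes, r=4, id_bits=8)

# If node 120 fails, the invariant requires a new replica on the next-closest node.
after = replica_set(125, [n for n in nodes if n != 120], r=4, id_bits=8)
```

Comparing `before` and `after` shows which node must fetch a copy when a replica holder fails, so the invariant can be restored by the key's current neighbors.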


DHT: Lookup

Object located in log_16 N steps (expected)

Usually locates replica nearest client C

[Figure: Lookup(key) issued by client C reaches one of the r replicas]

DHT: Dynamic caching

Nodes cache tuples in the unused portion of their allocated disk space

Tuples cached on nodes along the route of lookup and insert messages

Goals:
  maximize query throughput for popular tuples
  balance query load
  improve client latency

DHT: Dynamic caching

[Figure: Lookup(key) routed toward the key in delay space]

Coordination: Decentralized group management

E.g., SCRIBE [Rowstron et al., JSAC ’02]
  Spanning trees embedded in structured overlay
  Multicast, anycast primitives
  Scalable: large numbers of groups, members, wide range of members/group, dynamic membership

Cooperative group communication

Operations:
  create(g)
  join(g)
  leave(g)
  multicast(g,m)
  anycast(g,m)

• groupId g mapped to n0
• decentralized membership
• robust, scalable

[Figure: multicast tree for group g over nodes n0 to n4; n0 holds the child entries g:n1,n2 and an interior node holds g:n3,n4]

Scribe

[Figure: Join(groupId) routed toward groupId in delay space]

Structured overlay APIs

KBR: route(M, X)

DHT: insert(k,v), v=lookup(k)

SCRIBE / DOLR: create(g), join(g), leave(g), multicast(g,m), anycast(g,m)

[Dabek et al., IPTPS ’05]

Outline

Challenges: malicious participants
Putting it all together: ePOST

Malicious participants: threats

Prevent messages from reaching root
  drop or corrupt
  bias routing tables

Cause objects to be placed on faulty nodes
  choose nodeId values
  use many identities (Sybil attack)
  impersonate root

[Figure sequence: nodes A through L placed around the key; in the last step a node claims “F is my neighbor”]

Securing routing

Secure node identifier assignment
  thwarts Sybil and id choosing attacks

Secure membership protocol
  Prevents routing table bias attacks

Secure routing primitive
  Prevents root impersonation

Can tolerate up to 25% malicious nodes

Securing routing

Secure routing primitive
  Prevents root impersonation

[Figure: nodes A through M placed around the key; a node claims “F is my neighbor”]

[Castro et al., OSDI ’02]

Other threats

Freeloading: incentives mechanisms
Data corruption: crypto
Denial-of-service
Several defenses needed


Putting it all together: ePOST

Decentralized, cooperative email service
  Based on users’ desktops/notebooks
  Messages transmitted and stored securely
  Standard mail clients (IMAP/POP)
  Interoperability via SMTP
  Nodes may fail arbitrarily
  Users only trust their local node

[Mislove et al., EuroSys ’06]


HPDC-15, June 21, 2006

Why Email?

Demanding user expectations
  Privacy
  Integrity
  Durability
  Availability

Goal: Demonstrate that a decentralized, cooperative email service can be built that users can entrust with their production email

ePOST: Single-copy store

Emails split into MIME components, stored in the DHT
  Using its content-hash as the key
  Self-certifying (integrity)
  Identical items stored once
  Convergent encryption

Items replicated thrice for availability
  Additional erasure-coded replicas for durability (Glacier [Haeberlen et al., NSDI ’05])

[Figure: email data split into header, body, and attachments]
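The content-hash keying described above can be sketched with an in-memory dictionary standing in for the DHT. The class name, the choice of SHA-1, and the omission of convergent encryption and replication are all simplifying assumptions; this is not ePOST's actual storage code.

```python
import hashlib

class SingleCopyStore:
    """Toy single-copy store: the key of an item is the hash of its content."""

    def __init__(self):
        self._items = {}  # stands in for the DHT

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self._items.setdefault(key, data)  # identical items are stored once
        return key

    def get(self, key: str) -> bytes:
        data = self._items[key]
        # Self-certifying: anyone can check that the content matches its key.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("content does not match its key")
        return data

    def __len__(self):
        return len(self._items)

store = SingleCopyStore()
k1 = store.put(b"attachment bytes")
k2 = store.put(b"attachment bytes")  # same attachment sent to a second user
k3 = store.put(b"message body")
```

Because the key is derived from the content, a second insert of the same attachment yields the same key and adds nothing to the store, and a node that returns altered data is caught by the check in `get`.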

ePOST: Single-writer log

Per-user metadata (folders, inbox, etc.) stored as an update log
  All updates performed by owner
  Stored in the DHT

Entries form a hash chain
  Log head is signed with owner’s key
  Periodic snapshots stored in log

[Figure: a log head pointing to a chain of log entries (Insert msg x, Mark msg y read, Insert msg y) that reference email data (header, body, attachments)]

ePOST: Message Delivery

Message notifications are signed and contain encrypted headers and keys to the message’s components

Each user has a Scribe group
  Node joins user’s group if it has a message for the user
  User announces to the group when online
  Pending notifications delivered

ePOST: Security

Users have certificates (public key, node id)

Secure communication (SSL)

All content stored in the DHT is protected
  Authenticity
  Integrity
  Privacy

Incentives to prevent freeloading (Scrivener [Nandi, Middleware ’05])

Secure KBR

Deployment and Experience

Rice / MPI rings: reserved for internal members
PlanetLab ring: open membership ring, backed by PlanetLab

Usage
  26 internal users (16 used ePOST as primary email) over more than two years
  40 DHT nodes (Rice / MPI ring), 350 nodes (PlanetLab ring)
  Several times, ePOST was available when Rice or MPI-SWS email had failed
  No system-wide outages after initial testing phase
  Shut down due to overhead of tracking spam filtering

Decentralized systems challenges

Maintaining mutable distributed state remains hard
  Fortunately, lots of useful applications don’t require it

Incentives are basis for cooperation
  Strategy-proof protocols (e.g. tit-for-tat)
  Accountability

Need to control membership
  Certified identities (background check or fee)
  proof-of-work, social networks?

Decentralized systems challenges

Need to protect data
  Durability requires non-decreasing membership
  Scalable storage, high availability, churn resilience: pick two [Blake&Rodrigues, HotOS-IX]

Manageability
  Self-organization reduces administrative effort
  Hardware management is decentralized
  BUT: Evidence that lack of centralized control may make it difficult to manage system-wide disruptions

Outline

1. Decentralized systems: state-of-the-art
2. Accountability for distributed systems
   Why accountability?
   What is accountability?
   How can we implement it?
   How well does it work?
   Accountable virtual machines
3. Social computing and applications


PODC, Toronto, 18 August 2008

Byzantine faults occur in practice

Not all faults cause a node to stop
  The faulty node continues to operate, but its behavior deviates from that of a correct node

Examples:
  Hardware malfunction
  Misconfiguration
  Software error
  External security attack
  Intentional software modification


Example: LAX airport outage

Aug 2007: 17,000 passengers stranded at LAX
Cause: intermittent fault of a network card


Example: Botnets in the Internet

Compromised computer targets different domain
Admin A must localize fault, then convince admin B that her machine is faulty

[Figure: administrative domains A and B]


Example: Insider attack

Mar 2002: UBS PaineWebber admin disrupts trade for days to weeks

Difficult to detect, defuse logic bombs


Why is detecting faults difficult?

How to detect faults?
How to identify the faulty node?
How to convince others that a node is (not) faulty?

[Figure: an incorrect message and the responsible admin]

Learning from the 'offline' world

Relies on accountability
Example: Banks

Requirement             Solution
Commitment              Signed receipts
Tamper-evident record   Double-entry bookkeeping
Inspections             Audits

Record can be used to (manually) detect, identify and convince

Is accountability useful in distributed systems? Is it practical?

What does accountability mean?

Accountability := tamper-evident record + automated, reliable fault detection


Is accountability alone useful?

No, if faults are severe and irrecoverable
  need Byzantine fault tolerance (see Lorenzo’s course)

Yes, for
  systems that provide “best-effort” service
  systems that assume crash failures
  systems that mask severe/irrecoverable faults

Accountability
  reliably detects and localizes faults
  provides incentives to avoid faults
  builds trust, reputation


Which Systems can benefit?

Internet services (BGP, DNS, NTP, NNTP, SMTP)
Web services
Content distribution networks (CDN)
Grid computing
Peer-to-peer systems
Multi-player games
Cloud computing


Butler Lampson on accountability

"Don’t forget that in the real world, security depends more on police than on locks, so detecting attacks, recovering from them, and punishing the bad guys are more important than prevention." -- Butler Lampson, "Computer Security in the Real World", ACSAC 2000




Ideal accountability

Fault := Node deviates from expected behavior

Our goal is to automatically
  detect faults
  identify the faulty nodes
  convince others that a node is (or is not) faulty

Can we build a system that provides the following guarantee?

  Whenever a node is faulty in any way, the system generates a proof of misbehavior against that node


Can we detect all faults?

Problem: Faults that affect only a node's internal state
  Would require online trusted probes at each node

Focus on observable faults: Faults that affect a correct node

Can detect observable faults without requiring trusted components

[Figure: node A sends message X to node C]


Can we always get a proof?

Problem: He-said-she-said

Three possible causes:
  A never sent X
  B refuses to acknowledge X
  X was lost by the network

Cannot get proof of misbehavior!

Generalize to verifiable evidence:
  a proof of misbehavior, or
  a challenge that a faulty node cannot answer

What if the challenged node does not respond?
  Does not prove a fault, but node is suspected until it responds

[Figure: A claims “I sent X!”, B claims “I never received X!”, and C cannot tell which of them is faulty]


Practical accountability

We propose the following requirement for an accountable distributed system:

  Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node

This is useful
  Any (!) fault that affects a correct node is eventually detected and linked to a faulty node

It can be implemented in practice

An implementation: PeerReview

Adds accountability to a given system
  Implemented as a library
  Provides tamper-evident record
  Detects faults via state-machine replay

Assumptions:
1. Nodes can be modeled as deterministic state machines
2. Nodes have reference implementations of the state machines
3. Correct nodes can eventually communicate
4. Nodes can sign messages

PeerReview from 10,000 feet

All nodes keep logs of their inputs & outputs
  Including all messages

Each node has a set of witnesses, which audit the node periodically

If the witnesses detect misbehavior, they
  generate evidence
  make the evidence available to other nodes

Other nodes check evidence, report fault

[Figure: nodes A and B exchange message M and record it in their logs; A's witnesses C, D and E audit A's log and conclude “A is faulty”]

PeerReview detects tampering

What if a node modifies its log entries?
  Log entries form a hash chain
  Inspired by secure histories [Maniatis02]

Hash is included with every message authenticator
  Node commits to its current state
  Changes are evident

[Figure: B's log with entries Send(X), Recv(Y), Send(Z), Recv(M) linked by a hash chain H0 to H4; the message from A to B and the ACK each carry Hash(log)]
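A minimal sketch of a hash-chained log shows why later changes are evident. The entry layout, the use of SHA-256 and the function names are assumptions, and signatures are omitted; PeerReview's real log format is described in the SOSP ’07 paper and differs in detail.

```python
import hashlib

ZERO = b"\x00" * 32  # hash value preceding the first entry

def entry_hash(prev_hash: bytes, seq: int, kind: str, content: bytes) -> bytes:
    """Hash of one entry, chained to the hash of the previous entry."""
    h = hashlib.sha256()
    h.update(prev_hash)
    h.update(seq.to_bytes(8, "big"))
    h.update(kind.encode() + b"\x00")
    h.update(content)
    return h.digest()

def append(log, kind, content):
    """Append an entry and return the new top hash.

    In a real system the top hash would be signed and sent along with the
    message, committing the node to its log up to this point.
    """
    prev = log[-1]["hash"] if log else ZERO
    seq = len(log)
    log.append({"seq": seq, "kind": kind, "content": content,
                "hash": entry_hash(prev, seq, kind, content)})
    return log[-1]["hash"]

def matches_commitment(log, seq, committed_hash):
    """Recompute the chain up to `seq` and compare with an earlier commitment."""
    prev = ZERO
    for e in log[:seq + 1]:
        prev = entry_hash(prev, e["seq"], e["kind"], e["content"])
    return prev == committed_hash

log = []
append(log, "send", b"X")
append(log, "recv", b"Y")
commitment = append(log, "send", b"Z")  # hash handed out with message Z
```

If the node later rewrites the `recv` entry, the recomputed chain no longer matches the hash it already handed out, so any holder of that commitment can detect the change.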



PeerReview detects omission

What if a node omits log entries?

While inspecting A’s log, A’s witnesses send msg authenticators signed by B to B’s witnesses

Thus, witnesses learn about all messages their node has ever sent or acknowledged

Omission of a message from the log is a fault

[Figure: A's witnesses forward authenticators for message M, signed by B, to B's witnesses]

PeerReview detects inconsistencies

What if a node keeps multiple logs? forks its log?

Witnesses check whether all msg authenticators form a single hash chain

Two authenticators not connected by a log segment indicate a fault

[Figure: a forked log with hashes H0 to H4 in “View #1” and H3', H4' in “View #2”; entries include Create X, Read X and Read Z with results OK and Not found]



PeerReview detects faults

How to recognize faults?

Assumption: Nodes can be modeled as deterministic state machines

To audit a node, witness
  Fetches signed log
  Replays inputs to a trusted copy of the state machine
  Checks outputs against the log

[Figure: logged inputs from the network are replayed into a state machine (Module A, Module B); the replayed outputs are compared with the logged outputs, and a mismatch indicates a fault]
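The audit steps can be sketched as a replay loop over a log of inputs and outputs. The log format, the `RunningTotal` reference machine and the return convention are hypothetical; the sketch only illustrates checking logged outputs against a deterministic reference implementation.

```python
class RunningTotal:
    """Hypothetical reference state machine: replies with a running total."""

    def __init__(self):
        self.total = 0

    def handle(self, amount):
        self.total += amount
        return [self.total]  # list of outputs caused by this input

def audit(log, make_reference_machine):
    """Replay logged inputs and compare logged outputs with the reference.

    log: list of ("in", value) and ("out", value) entries in log order.
    Returns None if the log is consistent with the reference machine,
    otherwise the index of the first entry that diverges (len(log) if the
    log ends while outputs are still owed).
    """
    machine = make_reference_machine()
    expected = []  # outputs produced by the replay but not yet seen in the log
    for i, (kind, value) in enumerate(log):
        if kind == "in":
            expected.extend(machine.handle(value))
        elif not expected or expected.pop(0) != value:
            return i
    return None if not expected else len(log)

good_log = [("in", 5), ("out", 5), ("in", 7), ("out", 12)]
bad_log = [("in", 5), ("out", 5), ("in", 7), ("out", 13)]
```

Here `audit(good_log, RunningTotal)` finds no divergence, while the last entry of `bad_log` does not match what the reference machine produces. The approach depends on the determinism assumption: any nondeterministic input must itself be recorded in the log.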


PeerReview guarantees

1) Observable faults will be detected
   If node commits a fault + has a correct witness, then witness obtains a proof of misbehavior (PoM), or a challenge that the faulty node cannot answer

2) Good nodes cannot be accused
   If node is correct, there can never be a PoM, and it can answer any challenge

Formal definitions and proof in the TR


PeerReview is widely applicable

App #1: NFS server in the Linux kernel
  Many small, latency-sensitive requests
  Tampering with files
  Lost updates

App #2: Overlay multicast
  Transfers large volume of data
  Freeloading
  Tampering with content

App #3: P2P email
  Complex, large, decentralized
  Denial of service
  Attacks on DHT routing

Metadata corruption
Incorrect access control
Censorship

More information in [Haeberlen et al., SOSP ’07]







How much does PeerReview cost?

Log storage
  10 to 100 GByte per month, depending on application

Message signatures
  Message latency (e.g. 1.5ms RTT with RSA-1024)
  CPU overhead (embarrassingly parallel)

Log/authenticator transfer, replay overhead
  Depends on # witnesses
  Can be deferred to exploit bursty/diurnal load patterns


P2p email, dedicated witnesses

Dominant cost depends on number of witnesses W
  O(W^2) component

[Figure: average traffic (Kbps/node, 0 to 100) versus number of dedicated witnesses W (baseline, 1 to 5), broken down into baseline traffic, signatures and ACKs, and checking logs]


P2p email, mutual auditing

Small probability of error is inevitable
  Example: Replication

Can use this to optimize PeerReview
  Accept that an instance of a fault is found only with high probability
  Asymptotic complexity: O(N^2) reduced to O(log N)

[Figure: a small random sample of peers is chosen as witnesses for each node]

PeerReview is scalable

Assumption: up to 10% of nodes can be faulty
Probabilistic guarantees provide scalability
  Example: email system scales to over 10,000 nodes with P=0.999999

[Figure: average traffic (Kbps/node) versus system size (nodes) for the email system without accountability, with PeerReview (P=0.999999) and with PeerReview (P=1.0); annotations: O((log N)^2), O(log N), and the DSL/cable upstream limit]
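One simple way to see why a small witness set can suffice is to ask how many randomly chosen witnesses a node needs so that at least one of them is correct with a given probability. The sketch below assumes witnesses are picked independently and that each is faulty with the same probability; this is a simplified model for intuition and is not necessarily how P is defined in the figure.

```python
from fractions import Fraction

def min_witnesses(faulty_fraction, target_probability):
    """Smallest w such that 1 - faulty_fraction**w >= target_probability.

    Assumes independently chosen witnesses, each faulty with probability
    `faulty_fraction`. Exact rational arithmetic avoids rounding surprises.
    """
    if not (0 < faulty_fraction < 1) or not (0 <= target_probability < 1):
        raise ValueError("need 0 < faulty_fraction < 1 and 0 <= target < 1")
    w = 1
    all_faulty = faulty_fraction  # probability that all w witnesses are faulty
    while 1 - all_faulty < target_probability:
        all_faulty *= faulty_fraction
        w += 1
    return w

w = min_witnesses(Fraction(1, 10), Fraction(999999, 1000000))
```

Under this model, with 10% faulty nodes the required number of witnesses grows only with the number of nines demanded, not with the system size.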

PeerReview summary

Accountability is a new approach to handling faults in distributed systems
  detects faults
  identifies the faulty nodes
  produces evidence

PeerReview: A system that enforces accountability
  Offers provable guarantees and is widely applicable

Details in [Haeberlen et al., SOSP ’07]



Challenges

Tension between accountability and privacy
  PeerReview (PR) requires disclosure to witnesses
  Zero-knowledge proofs?

Fault detection
  PR uses state-machine replay for fault detection
  Can’t detect deterministic software bugs
  Different implementations of underspecified protocols may diverge
  Protocol specification or abstract model?


Challenges (cont’d)

Message signatures
  PR assumes a public-key infrastructure
  Web-of-trust (physical network, social network)?

Partial deployment
  Accountability zones, gateways?

PR requires source code modifications
  To enable deterministic replay
  Accountable virtual machines?

NetReview

Accountability applied to inter-domain routing

Fault detection based on a spec of the routing policy

Web-of-trust-based certificates
Auditing limited to peering partners
Partial deployment: accountability zones

Details in [Haeberlen et al., NSDI ’09]





Accountable virtual machines (AVM)

Make unmodified binary VMs accountable

VMM provides deterministic logging/replay

[Figure: an accountable VMM runs an unmodified binary inside an AVM, records a log, and attaches authenticators to packets]


What are AVMs good for?

Accountability for proprietary/legacy software

Accountable cloud computing
  Customer can verify correct execution

Making an entire host computer accountable
  Check for compromised software
  Forensics


Trusted network probes

Making the Internet accountable, one host at a time

[Figure: a trusted probe at the cable/DSL modem or ISP’s DSLAM, between the Internet and an accountable workstation, keeps a secure log; packets carry authenticators, and the chain of authenticators validates the log]


Related Work

Accountability [Lampson ’00, Yumerefendi&Chase ’05, Yumerefendi et al. ’07, Argyraki et al. ’07, Michalakis et al. ’07]
Practical byzantine fault tolerance [Castro&Liskov ’00, Ramasamy ’07]
General fault detection [Kihlstrom et al. ’07, Doudou et al. ’99, Malkhi&Reiter ’97]
Intrusion detection, reputation systems [Denning ’87, Ko et al. ’94, Kamvar et al. ’03]
Trusted computing [Garfinkel et al. ’02]
Fault-specific defenses [Cox&Noble ’03, Waldman&Mazieres ’03]
Tamper-evident logs [Schneier&Kelsey ’98, Maniatis&Baker ’02]


Conclusion

Byzantine faults in distributed systems are real

Accountability is a new approach to handling faults
  detects observable faults
  identifies the faulty node
  produces verifiable evidence

Presented a practical definition of accountability

Practical implementations exist

Many challenges remain

Outline

1. Decentralized systems: state-of-the-art
2. Accountability for distributed systems
3. Social computing and applications
   Exploiting social networks for distributed computing
   Example: enhancing Web search
   Example: thwarting unwanted communication

From service-centric to user-centric computing

Collaborative, social computing and communication

In peer-to-peer, users share technical resources
In social computing, users share knowledge, opinions, referrals, ratings

User-centric, social computing

Mass collaboration, enabled by technology

Human intelligence aggregated through technology

User contribution is the most important resource
(Underutilized resource of enormous scale?)

BUT: Outcome depends on user behavior
  depends on cooperation, good will
  vulnerable to spoilers

Social networks: two concepts

Users contribute
  Content
  Opinions, recommendations, ratings (explicit or implicit)

Users form social networks
  Graph connecting users (explicit or implicit)
  Links imply shared interest or trust

What are social networks?

Graphs connecting people
  Edges connect “friends”
  Imply shared interest or trust
  Online friends may have never met in real life
  E.g., email, Skype, IM

Online social networking sites
  Network hosted by a Web site
  Often used to share opinions, advice, ratings, multimedia content

Huge opportunity…

…to leverage collective user input, e.g.
  to deal with unwanted communication
  to thwart security attacks
  to enable better organization, filtering, search, ranking, and distribution of content
  may provide an answer to the ever-increasing flood of information


What’s it got to do with Systems?

Social networks enhance distributed systems
  Sybil attacks
  Unwanted communication
  Personalization

Social computing may need distribution
  Privacy
  Avoid dependence on a single provider

Leveraging social networks to enhance systems

Trust can help thwart security problems
  Sybil attacks: SybilGuard [SIGCOMM ’06]
  Clones unlikely to have diverse links

Trust can help block unwanted communication
  Friends unlikely to send SPAM: RE [NSDI ’06]
  Using social networks to thwart SPAM (Ostra)

Shared interest can improve search
  Web search: PeerSpective [HotNets ’06]
  Related users likely to visit relevant content

Leveraging social networks: More ideas

Sharing solutions and problem fixes
  Configurations that work
  Fixes that others have found
  “Copy what works for others”

Combine technology and social networks to truly “stand on the shoulders of giants”

Answer to the increasing complexity of the information age?

Example: social network based Web search

PeerSpective experiment
  Idea: users can query their friends’ previously viewed pages
  Results from friends appear alongside Google results

[Figure: PeerSpective results shown next to Google results]

PeerSpective implementation

Prototype is a lightweight HTTP proxy
  Runs on users’ desktop and indexes all browsed content
  When Google search is performed,
    query other PeerSpective proxies in parallel with Google
    present PeerSpective results alongside Google results

PeerSpective results summary

Explored potential of integrating Web and social network search

Evidence that PeerSpective added value
  Additional coverage for viewed sites
  Improved ranking of results
  Aided in finding content serendipitously

However, just an experiment
  Many challenges remain
  Opportunities as well

Details in [Mislove et al., HotNets ’06]

Unwanted communication

Well-known problem
  Email spam

Increasingly affects other systems
  Search-engine spam
  Mislabeled videos plaguing YouTube
  Unwanted invitations in Skype

Existing solutions insufficient
  Content filtering for videos?

Known defenses

Content filtering
  Works very well for email, but
  False positives reduce communication reliability
  Doesn’t work for multimedia

Holding senders accountable
  Requires strong user identities

Imposing a per-communication cost
  Refunded if communication is wanted
  Requires micro-payments/quota market

Ostra: Using social relationships

Assumptions

Cost for acquiring and maintaining social links
  Cannot create links arbitrarily fast
  Cannot maintain arbitrary number of links

Receivers are willing to classify content
  Explicit (Junk button)
  Implicit (Deletion, response)

Ostra: Pair-wise credit exchange

• Credit balance/bound associated with each link
• Credit balances decay at constant rate (10%/day)
• Sum of all credit = 0 (invariant)

[Figure: a link annotated with the values -2, 0, 2 (credit bounds and balance)]

Ostra: Pair-wise credit exchange

Message unwanted -> sender pays receiver one credit
Sending spam exhausts sender’s link balance

[Figure: sender and receiver joined by a link annotated -2, 0, 2]

Ostra: End-to-end credit exchange

Rate of spam a user can send is proportional to number of links (s)he has

[Figure: a multi-hop path from sender to receiver; each link is annotated with its credit bounds and balance]
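The pair-wise credit rule can be sketched for a single link. The class and method names are hypothetical, the bounds of -2 and 2 are read off the figure, and the bookkeeping is deliberately simplified; the mechanism described in the NSDI ’08 paper has more detail that is omitted here.

```python
class PairwiseLink:
    """Credit state of one social link, seen from the sender's side.

    `balance` is the sender's credit on this link; the receiver's balance
    is its negation, so the two always sum to zero.
    """

    def __init__(self, lower=-2, upper=2):
        self.lower = lower
        self.upper = upper
        self.balance = 0

    @property
    def peer_balance(self):
        return -self.balance

    def can_send(self):
        # A message risks one credit, so the balance must be able to drop by one.
        return self.balance - 1 >= self.lower

    def send(self, unwanted):
        """Try to send one message; returns False if the link is exhausted."""
        if not self.can_send():
            return False
        if unwanted:
            self.balance -= 1  # sender pays receiver one credit
        return True

    def decay(self, days=1, rate=0.10):
        """Balances drift back toward zero at a constant daily rate."""
        self.balance *= (1 - rate) ** days

link = PairwiseLink()
results = [link.send(unwanted=True) for _ in range(3)]
```

After two unwanted messages the sender's balance reaches the lower bound and the third attempt is refused, while wanted messages leave the balance unchanged. Decay gradually restores the ability to send, which bounds the sustained rate of unwanted messages per link.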


Sybil attacks are not effective

Total unwanted communication by Sybils is bounded by the number of links with other users

[Figure: a group of Sybil identities attached to the rest of the network]

Ostra

Thwarts unwanted communication in existing systems
  Examples: Email, Skype, IM, YouTube

Uses existing relationships among users
  Online social networks
  Graph of email/IM/Skype users

Does not require strong user identities
Does not rely on automatic content classification
Respects recipient’s idea of wanted/unwanted communication

Details in [Mislove et al., NSDI ’08]

SN and applications research agenda

Measurement/Analysis
  Theory of complex networks
  Empirical study of social networks
  Understanding SN evolution
  Understanding SN information flow

Design
  Personalized search, filtering, content distribution
  Using social networks to thwart unwanted behavior
  Online social networks and privacy


Max Planck Institute for Software Systems (MPI-SWS)

Part of Max Planck Society
Academic research institute, publicly funded
Focus on basic research
Kick-off in Aug 2005
17 faculty positions (tenure-track)
~100 doctoral/post-doc positions
Administrative and technical support staff
Top international research institution

MPI-SWS Faculty

Peter Druschel: Distributed systems
Krishna Gummadi: Networked systems
Andrey Rybalchenko: Program analysis and verification
Derek Dreyer: Functional Programming
Michael Backes (Fellow): Security and Cryptography
Rodrigo Rodrigues: Dependable systems
Paul Francis: Large scale Internet systems

Graduate program (MS/PhD)

Advised by MPI-SWS faculty
Stimulating, competitive environment
International, diverse student body (80%)
English language
Financial aid
Internships available

http://www.mpi-sws.org

Thanks for your attention!

