Cluster Computing on the Fly
Peer-to-Peer Scheduling of Idle Cycles in the Internet
Virginia Lo, Daniel Zappala, Dayi Zhou, Shanyu Zhao, and Yuhong Liu
Network Research Group
University of Oregon
Cycle Sharing Motivation
A variety of users and their applications need additional computational resources
Many machines throughout the Internet lie idle for large periods of time
Many users are willing to donate cycles
How to provide cycles to the widest range of users (beyond institutional barriers)?
Cycle Sharing Ancestors
Condor: load sharing within one institution --> Condor-G, Condor flocks
Grid computing: coordinated use of resources at supercomputer centers for large-scale scientific applications --> peer-based Grid computing
SETI@home: volunteer computing using a client-server model --> BOINC
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
[Figure: online access to scientific instruments at the Advanced Photon Source: real-time collection, tomographic reconstruction, wide-area dissemination, archival storage, and desktop & VR clients with shared controls]
The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Figure: TeraGrid/DTF topology: site resources at NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, with HPSS/UniTree archival storage, interconnected via external networks; www.teragrid.org]
Layered Grid Architecture and Globus Toolkit (Foster and Kesselman)
Application
Collective: "Coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services
Resource: "Sharing single resources": negotiating access, controlling use
Connectivity: "Talking to things": communication (Internet protocols) & security
Fabric: "Controlling things locally": access to, & control of, resources
[Figure: the layers shown alongside the Internet Protocol architecture: Application, Transport, Internet, Link]
CCOF Goals and Assumptions
Cycle sharing in an open peer-to-peer environment
P2P model; adopt principles from P2P networking
Application-specific scheduling
Long term fairness
Hosts retain local control, sandbox
CCOF Research Issues: Incentives and Fairness
What incentives are needed to encourage hosts to donate cycles?
How to keep track of resources consumed vs. resources donated?
How to prevent resource hogs from taking an unfair share?
CCOF Research Issues: Resource Discovery
How to discover hosts in a highly dynamic environment (hosts come and go, withdraw cycles, fail)?
How to discover hosts that can be trusted and will provide the needed resources?
CCOF Research Issues: Verification, Trust, and Reputation
How to check returned results?
How to catch malicious or misbehaving hosts that change results with low frequency?
Which reputation system?
CCOF Research Issues: Application-based Scheduling
How do trust and reputation influence scheduling?
How should a host decide from whom to accept work?
CCOF Research Issues: Quality of Service and Performance Monitoring
How to provide local admission control?
How to evaluate and provide QoS: guaranteed versus predictive service?
CCOF Research Issues: Security
How to prevent attacks launched from guest code running on the host?
How to prevent denial-of-service attacks in which useless code occupies many hosts?
Related Work
Systems most closely resembling CCOF: SHARP (Fu, Chase, Chun, Schwab, Vahdat, 2003); Partage, Self-organizing Flock of Condors (Hu, Butt, Zhang, 2003); BOINC (Anderson, 2003), limited to donation of cycles to workpile
Resource discovery: (Iamnitchi and Foster, 2002); Condor matchmaking
Load sharing within and across institutions: Condor, Condor Flocks, Grid computing
Incentives and fairness: see Berkeley Workshop on Economics of P2P Systems; OurGrid (Andrade, Cirne, Brasileiro, Roisenberg, 2003)
Trust and reputation: EigenRep (Kamvar, Schlosser, Garcia-Molina, 2003); TrustMe (Singh and Liu, 2003)
Cycle Sharing Applications
Four classes of applications that can benefit from harvesting idle cycles:
Infinite workpile
Workpile with deadlines
Tree-based search
Point-of-Presence (PoP)
Infinite workpile
Consume huge amounts of compute time
Master-slave model
"Embarrassingly parallel": no communication among hosts
Ex: SETI@home, Stanford Folding, etc.
Workpile with deadlines
Similar to infinite workpile but more moderate
Must be completed by a deadline (days or weeks)
Some capable of increasingly refined results given extra time
Ex: simulations with a large parameter space, ray tracing, genetic algorithms
Tree-based Search
Tree of slave processes rooted in a single master node
Dynamic growth as the search space is expanded
Dynamic pruning as costly solutions are abandoned
Low amount of communication among slave processes to share lower bounds
Ex: distributed branch and bound, alpha-beta search, recursive backtracking
Point-of-Presence
Minimal consumption of CPU cycles
Requires placement of application code dispersed throughout the Internet to meet specific location, topological distribution, or resource requirements
Ex: security monitoring systems, traffic analysis systems, protocol testing, distributed games
CCOF Architecture
Cycle-sharing communities based on factors such as interest, geography, performance, trust, or generic willingness to share
Span institutional boundaries without institutional negotiation
A host can belong to more than one community
May want to control community membership
CCOF Architecture
Application schedulers to discover hosts, negotiate access, export code, and collect and verify results
Application-specific (tailored to the needs of the application)
Resource discovery
Monitors jobs for progress; checks jobs for correctness
Kills or migrates jobs as needed
CCOF Architecture (cont.)
Local schedulers enforce local policy
Run in background mode vs. preempt when user returns
QoS through admission control and reservation policies
Local machine protected through sandbox
Tight control over communication
CCOF Architecture (cont.)
Coordinated scheduling across local schedulers and across application schedulers
Enforce long-term fairness
Enhance resource discovery through information exchange
CCOF Projects
Wave Scheduler [JSSPP'05, IPDPS'06]
Resource discovery [CCGRID'03]
Result verification [IEEE P2P'05]
Point-of-Presence Scheduler [HOTP2P'05]
Wave Scheduler
Well-suited for workpile with deadlines
Provides ongoing access to dedicated cycles by following night timezones around the globe
Uses a CAN-based overlay to organize hosts by timezone
Resource Discovery (Zhou and Lo, WGP2P'04 at CCGrid '04)
Highly dynamic environment (hosts come and go)
Hosts maintain profiles of blocks of idle time
Four basic search methods: rendezvous points, host advertisements, client expanding ring search, client random walk search
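Of these methods, client expanding ring search is the simplest to make concrete. A minimal sketch, assuming an adjacency-list overlay and a predicate standing in for matching a host's idle-time profile (all names and parameters here are ours, not from the talk):

```python
from collections import deque

def expanding_ring_search(adj, client, has_cycles, max_ttl=4):
    """Client expanding ring search: flood a query with TTL 1, 2, ...,
    until some host with matching idle cycles answers.
    `has_cycles` stands in for matching a host's idle-time profile."""
    for ttl in range(1, max_ttl + 1):
        seen, frontier = {client}, deque([(client, 0)])
        while frontier:
            u, d = frontier.popleft()
            if u != client and has_cycles(u):
                return u, ttl          # found within this ring
            if d < ttl:
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        frontier.append((v, d + 1))
    return None, None                  # no matching host within max_ttl hops
```

Each failed ring restarts the flood with a larger TTL, which is where the method's message overhead comes from relative to rendezvous points.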
Resource Discovery
Rendezvous points performed best: high job completion rate and low message overhead, but favors large jobs under heavy workloads
==> coordinated scheduling needed for long-term fairness
CCOF Verification
Goal: verify correctness of returned results for workpile and workpile with deadlines
Quizzes = easily verifiable computations that are indistinguishable from the actual work
Standalone quizzes vs. embedded quizzes
Quiz performance stored in reputation system
Quizzes vs. replication
Point-of-Presence Scheduler
Scalable protocols for identifying selected hosts in the community overlay network such that each ordinary node is within k hops of C of the selected hosts
The (C,k) dominating set problem
Useful for leader election, rendezvous point placement, monitor location, etc.
CCOF Dom(C,k) Protocol
Round 1: Each node says HI to its k-hop neighbors. <Each node knows the size of its own k-hop neighborhood.>
Round 2: Each node sends the size of its k-hop neighborhood to all its neighbors. <Each node knows the size of all neighbors' k-hop neighborhoods.>
Round 3: If a node is maximal among its neighborhood, it declares itself a dominator and notifies all neighbors. <Some nodes hear from some dominators, some don't.>
For those not yet covered by C dominators, repeat Rounds 1-3 excluding current dominators, until all nodes are covered.
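The rounds above can be sketched as a synchronous, centralized simulation (a real deployment would exchange messages; tie-breaking by node id is our addition to make "maximal among its neighborhood" well defined, and the sketch assumes the overlay is dense enough to supply c distinct dominators per node):

```python
from collections import deque

def k_hop_neighborhood(adj, node, k):
    # BFS out to distance k; every node reached (except `node`) is a k-hop neighbor
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == k:
            continue
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, d + 1))
    return seen - {node}

def dom_ck(adj, c=1, k=1):
    """Repeat Rounds 1-3 among still-uncovered nodes until every node
    is covered by at least c dominators."""
    dominators = set()
    covered_by = {u: set() for u in adj}
    while any(len(covered_by[u]) < c for u in adj):
        active = {u for u in adj if len(covered_by[u]) < c and u not in dominators}
        if not active:
            break                      # overlay too sparse to reach c-coverage
        # Rounds 1-2: each active node learns its (active) k-hop neighborhood size
        nbhd = {u: k_hop_neighborhood(adj, u, k) & active for u in active}
        size = {u: len(nbhd[u]) for u in active}
        # Round 3: declare yourself a dominator if maximal among your neighborhood
        new_doms = {u for u in active
                    if all((size[u], u) >= (size[v], v) for v in nbhd[u])}
        for d in new_doms:
            dominators.add(d)
            for v in nbhd[d] | {d}:
                covered_by[v].add(d)
    return dominators
```

On a five-node path 0-1-2-3-4 with c=1, k=1, the first pass elects node 3 (covering 2, 3, 4) and the second pass elects node 1 (covering 0 and 1), so every node ends up within one hop of a dominator.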
Result Verification and Trust-based Scheduling in Open Peer-to-Peer Cycle Sharing Systems
Student: Shanyu Zhao
Direct Research Project
Committee Members: Prof. Virginia Lo (Advisor), Prof. Jun Li, Prof. Sarah Douglas
CIS Department, University of Oregon, Dec. 2, 2004
What's this DRP about?
Problem: in open peer-to-peer cycle sharing systems, computational results can be faked
Current strategy: Replication
We propose:
Quiz, an alternative verification scheme
Trust-based Scheduling, to avoid scheduling on malicious hosts and to lower the cost of verification
Outline
Result Cheating is a Serious Problem
Result Verification: Replication vs. Quiz
Math Analysis of Repl. and Quiz
Trust-based Scheduling
Simulations
Related Work
Conclusion and Future Work
Berkeley's SETI@Home
Uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI)
Users download and run programs (screen saver) to participate
Currently over 5 million users
Utilizes 1000 years of CPU time every day
Faster than IBM ASCI White
Cheating in SETI@Home
To gain higher ranking on the website:
Some teams fake the number of work units completed
Every team member returns a copy of the result for the same task
Unauthorized patched code makes SETI run faster but returns wrong results
Current Strategies
Specialized schemes, e.g. Ringer, Encrypted Function: restrict the type of computation they can verify
Generic scheme: Replication
Susceptible to collusion, e.g. colluders coordinating through a Distributed Hash Table, or patches that return the same wrong results
Fixed high overhead
Result Cheating Models
Three basic types:
Type I: foolish cheater, always returns wrong results
Type II: ordinary cheater, returns wrong results with a fixed probability
Type III: smart cheater, performs well for a period, then turns into a Type II cheater
Non-colluding vs. colluding scenarios
Static vs. dynamic scenarios
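The three behaviors are easy to model directly; a small sketch (class and parameter names are ours, not from the talk):

```python
import random

class Host:
    """Cheater models from the talk. Type I always returns wrong results;
    Type II cheats with a fixed probability p; Type III performs well for
    `honest_period` tasks, then behaves like Type II."""
    def __init__(self, cheater_type, p=0.3, honest_period=100, seed=None):
        self.cheater_type = cheater_type
        self.p = p
        self.honest_period = honest_period
        self.completed = 0
        self.rng = random.Random(seed)

    def returns_wrong_result(self):
        self.completed += 1
        if self.cheater_type == 1:
            return True                       # foolish cheater
        if self.cheater_type == 3 and self.completed <= self.honest_period:
            return False                      # smart cheater, still building trust
        return self.rng.random() < self.p     # ordinary (Type II) behavior
```

A Type III host with honest_period=5 and p=1.0 returns five correct results and then cheats on every task, which is exactly the break-out behavior studied in the simulations.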
Quiz: A Result Verification Scheme
A quiz is an indistinguishable task whose result is already known to the client
Scheduling: mix t tasks and m quizzes to form a package; send the package to a host
Verification: discard all task results in a package if any quiz fails; otherwise accept them all
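The package mechanics fit in a few lines (a sketch; the host sees only an opaque, shuffled list and cannot tell quizzes from tasks):

```python
import random

def make_package(tasks, quizzes, rng=random.Random(0)):
    """Mix t tasks and m quizzes so the host cannot tell them apart."""
    package = [("task", t) for t in tasks] + [("quiz", q) for q in quizzes]
    rng.shuffle(package)
    return package

def verify_package(package, results, quiz_answers):
    """Accept all task results only if every quiz came back correct."""
    for (kind, item), result in zip(package, results):
        if kind == "quiz" and result != quiz_answers[item]:
            return None          # one failed quiz discards the whole package
    return [r for (kind, _), r in zip(package, results) if kind == "task"]
```

A host that cheats on a fraction of the package risks hitting a quiz, so acceptance of the package is probabilistic evidence that the task results are genuine.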
Replication vs. Quiz
Different principles: Replication is a voting principle; Quiz is a sampling principle
Quiz does not suffer from collusion
With unlimited resources, Quiz may have an unfavorable turnaround time
System Model
Philosophy: trust-but-verify!
Use reputation systems to choose hosts
Reduce verification overhead for trusted hosts
Reputation System Classification
Two steps to calculate reputation: formation of the local trust value, then formation of the aggregated trust value
Classification by degree of trust-info sharing (aggregation): Local Sharing, Partial Sharing, Global Sharing
Five Reputation Systems
Local Sharing
Local Reputation System: a private trusted list and an optional blacklist
Partial Sharing
NICE Reputation System: tree-based search for recruitment when the trusted list is exhausted
Gossip Reputation System: periodically ask the most trusted peers in the trusted list for their most trusted peers
Five Reputation Systems (cont.)
Global Sharing
Global Reputation System: a trusted list and blacklist shared by all peers; trust values updated on every transaction
EigenTrust Reputation System: a shared trusted list and blacklist; trust values updated periodically by the algorithm described in the EigenTrust paper
Trust-based Replication Algorithm
Scheduling
while task queue not empty:
  for each task in task queue:
    a. fetch a most trusted free host (aided by the reputation system)
    b. calculate the replication factor k, and determine the number of extra hosts needed, say c
    c. pick the c most trusted free hosts (aided by the reputation system);
       if not enough hosts are allocated, stall for a while
Verification and Reputation System Updating
for each task scheduled:
  upon receiving the result set from all replicas:
    a. cross-check the results:
       if the number of majority results > (k + 1)/2, accept the majority result
       else reject all results and reschedule that task later
    b. update the reputation system:
       if the task result is accepted, then for each host H chosen in replication:
         if host H gave the majority result, increase the trust value for H
         else if host H gave a minority result, decrease the trust value for H
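The voting step for a single task can be sketched in Python (hosts are modeled as callables; trust-ordered host selection and rescheduling are outside the sketch, and all names are ours):

```python
from collections import Counter

def replicate_and_vote(task, hosts, k):
    """Majority-vote verification for one task: replicate on k+1 hosts,
    accept a strict-majority result, and emit +1/-1 trust feedback."""
    results = [h(task) for h in hosts[:k + 1]]       # replicate on k+1 hosts
    value, votes = Counter(results).most_common(1)[0]
    if votes <= (k + 1) / 2:
        return None, []        # no strict majority: reject, reschedule later
    # trust feedback: +1 for majority hosts, -1 for minority hosts
    feedback = [1 if r == value else -1 for r in results]
    return value, feedback
```

With two honest hosts and one cheater at k=2, the majority result wins and the cheater receives negative feedback; with one of each at k=1 there is no strict majority, so everything is rejected.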
Trust-based Quiz Algorithm
Scheduling
while task queue not empty:
  a. fetch a most trusted free host H (aided by the reputation system)
  b. calculate the quiz ratio r, and determine the number of tasks to schedule, say t,
     and the number of quizzes to insert, say m
  c. pick t tasks from the task queue, mixed with m quizzes
Verification and Reputation System Updating
for each package scheduled:
  upon receiving all results from host H:
    a. check the results for the quizzes in that package:
       if all quiz results are correct, accept all the results in the package
       else reject all results and reschedule later
    b. update the reputation system:
       if the results are accepted, increase the trust value for H
       else decrease the trust value for H
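One pass of the scheduling loop, sketched with hosts as (name, trust) pairs and a placeholder trust-to-quiz-ratio function (the package size t is fixed for the sketch; all names and constants are ours):

```python
import math

def schedule_quiz_packages(tasks, hosts, quiz_pool, quiz_ratio_of, t=4):
    """Take the most trusted free host, derive its quiz ratio r from its
    trust value, and hand it a package of t tasks plus m = ceil(r * t)
    quizzes. `quiz_ratio_of` stands in for the trust-to-ratio function."""
    packages = []
    free = sorted(hosts, key=lambda h: -h[1])   # most trusted first
    qi = 0
    while tasks and free:
        name, trust = free.pop(0)
        r = quiz_ratio_of(trust)
        m = math.ceil(r * t)
        batch, tasks = tasks[:t], tasks[t:]
        quizzes, qi = quiz_pool[qi:qi + m], qi + m
        packages.append((name, batch, quizzes))
    return packages
```

A highly trusted host gets a package diluted with few quizzes; an unknown host pays the full verification overhead, which is the point of trust-based scheduling.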
Accuracy On Demand
Clients want a desired level of accuracy
Accuracy of trust-based scheduling depends on:
the trust value of the selected host
the quiz ratio / replication factor in the verification scheme
AOD dynamically adjusts the quiz ratio or replication factor based on the trust value
AOD Example
Calculate how many quizzes (m') should be in a package to obtain accuracy A. With p the prior probability that the host is malicious, f(x) the distribution of its cheating rate x, and m the total number of quizzes the host must pass:

A = \frac{(1-p) + p \int_0^1 f(x)(1-x)^{m+1}\,dx}{(1-p) + p \int_0^1 f(x)(1-x)^{m}\,dx}

The relation between A and m': m' = m - v, where v is how many quizzes this host has already passed, derived from its trust value.

Assume p = 0.5, f(x) = 1:

m' = \frac{1}{\sqrt{1-A}} - 2 - v
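Under these assumptions the integrals collapse (∫₀¹(1-x)^k dx = 1/(k+1)), so the accuracy expression has a closed form that can be checked and inverted directly (a sketch; function names and the rounding epsilon are ours):

```python
from math import ceil, sqrt

def accuracy(m, p=0.5):
    """A = [(1-p) + p/(m+2)] / [(1-p) + p/(m+1)], the f(x) = 1 case."""
    return ((1 - p) + p / (m + 2)) / ((1 - p) + p / (m + 1))

def quizzes_needed(A, v):
    """Invert accuracy() at p = 0.5: m = 1/sqrt(1-A) - 2 total quizzes,
    of which the host's trust value already accounts for v."""
    m = 1.0 / sqrt(1.0 - A) - 2.0
    return max(0, ceil(m - v - 1e-9))   # epsilon guards against float noise
```

For A = 0.99 the closed form gives m = 8 total quizzes, so a host whose trust already accounts for v = 3 passed quizzes needs only 5 more in its next package.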
Simulation Models
Cycle Sharing Model
1000 peers; a peer acts as both a client and a host
Dedicated, no private jobs
Tasks have the same length
Topology not considered; every two peers are connected
Synthetic Task Generation Model
Syn1: # of tasks: Normal, u=20; task runtime: 1 round; inter-arrivals: Exponential, u=50 rounds
Syn2: # of tasks: Normal, u ~ Exponential; task runtime: 1 round; inter-arrivals: Exponential, u=50 rounds
Simulation Models
Trust Function (to adjust the local trust value)
Linear Increase Sudden Die (LISD)
Additive Increase Multiplicative Decrease (AIMD)
Blacklisting
Quiz Ratio / Replication Factor Function
[Figure: the quiz ratio / replication factor decreases from MaxVal to MinVal as the trust value increases toward a threshold]
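Two of these pieces are simple enough to sketch: an AIMD local-trust update, and the quiz-ratio function whose shape the plot shows (the shape is from the slide; the constants are illustrative, not the paper's):

```python
def aimd_update(trust, verified, inc=0.05, dec=0.5):
    """Additive Increase Multiplicative Decrease on a trust value in [0, 1]:
    add a small increment on a verified result, halve on a failure."""
    return min(1.0, trust + inc) if verified else trust * dec

def quiz_ratio(trust, max_val=1.0, min_val=0.05, threshold=0.8):
    """Quiz ratio (or replication factor) falls linearly from max_val to
    min_val as trust rises toward the threshold, then stays at min_val."""
    if trust >= threshold:
        return min_val
    return max_val - (max_val - min_val) * (trust / threshold)
```

The multiplicative decrease makes a single failed verification expensive to recover from, which is what penalizes Type III hosts once they break out.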
Simulation Metrics
Accuracy = (# of tasks with correct results) / (# of tasks accepted by the clients)
Overhead = (# of quizzes or replicas + # of rejected tasks) / (# of tasks accepted by the clients)
Simulation Results
Contribution of different reputation systems
Replication vs. Quiz
Inspecting trust functions: LISD, AIMD, and Blacklisting
Accuracy on demand
Replication: Overhead Reduced by Reputation Systems
Parameters: no collusion; 30% Type II; replication factor: 1
Quiz: Accuracy Boosted to 1, Overhead Reduced by Reputation Systems
Parameters: 30% Type II; quiz ratio: [0.05, 1]
Quiz: Accuracy Converges to 1 After Break-Out of Type III Malicious Hosts
Parameters: 30% Type III; quiz ratio: [0.05, 1]
Replication vs. Quiz
Parameters: local reputation system; calculated over the first 500,000 tasks; quiz ratio and replication factor: [0.05, 1]
Inspecting Trust Functions for Quiz: LISD vs. Blacklisting
Parameters: 30% Type II; system load: u=60; quiz ratio: [0.05, 1]
Related Work on Cycle Sharing Systems
Three threads: Grid computing, peer-to-peer cycle sharing, computational network overlays
Taxonomy of current research projects:
                  Open                               Institution-based
P2P paradigm:     CCOF, Partage                      Flock of Condors, SHARP, OurGrid
Client-server:    SETI@Home, BOINC, Folding@Home     Globus, Condor-G, XenoServer
Related Work on Result Verification (1)
Encrypted Function, Sander, 1998
Basic idea:
The client sends the host two functions: f and E(f)
The host returns f(x) and E(f)(x); the client then verifies that P(E(f)(x)) = f(x)
But finding an encrypted function for a general computation is hard
Related Work on Result Verification (2)
Ringer Scheme, Golle, 2001
Basic idea:
f(x) is a one-way function; the host should compute f(x) for all x in D
The client pre-computes several values yi = f(xi)
The client sends all the yi; the host must return every matching xi
But this only applies when f(x) is strictly a one-way function
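Golle's ringer idea can be sketched with a hash standing in for the one-way function (the real scheme targets inversion-style workloads like SETI@home; all names here are ours):

```python
import hashlib
import random

def f(x):
    # stand-in one-way function; in SETI-like workloads this is the real computation
    return hashlib.sha256(str(x).encode()).hexdigest()

def plant_ringers(domain, n, rng=random.Random(0)):
    """Client side: pick n secret 'ringer' inputs and remember f of each."""
    ringers = set(rng.sample(domain, n))
    targets = {f(x) for x in ringers}
    return targets, ringers

def host_compute(domain, targets):
    """Host side: must evaluate f over the whole domain to find every
    preimage of a target; skipping part of the domain risks missing a ringer."""
    return {x for x in domain if f(x) in targets}

domain = list(range(1000))
targets, ringers = plant_ringers(domain, 5)
found = host_compute(domain, targets)
assert ringers <= found   # client accepts only if all planted ringers come back
```

Because the host cannot distinguish ringer targets from real ones, the only safe strategy is to do all the assigned work.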
Related Work on Result Verification (3)
Uncheatable Grid Computing, Du, 2003
Basic idea:
The host should compute f(x) for all x in D, with |D| ≈ 2^40
The host builds a Merkle tree over the results and sends the root of the tree (a hash value) to the client
The client randomly chooses some x and asks the host for a proof from the Merkle tree
Drawbacks:
Strong restrictions on the computation model
Building a huge Merkle tree is costly
Only combats the cheating of incomplete computation
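The Merkle-tree commitment behind the scheme can be sketched in a few lines (a power-of-two leaf count is assumed for brevity; Du's scheme works at |D| ≈ 2^40 and samples many leaves):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash each leaf, then repeatedly hash adjacent pairs up to the root."""
    level = [h(str(x).encode()) for x in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, idx):
    """Sibling hashes from leaf idx up to the root, with left/right flags."""
    level = [h(str(x).encode()) for x in leaves]
    proof = []
    while len(level) > 1:
        sib = idx ^ 1
        proof.append((level[sib], sib < idx))   # flag: sibling is on the left
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        idx //= 2
    return proof

def verify(leaf, proof, root):
    """Client side: recompute the path from the claimed leaf to the root."""
    node = h(str(leaf).encode())
    for sib, sib_is_left in proof:
        node = h(sib + node) if sib_is_left else h(node + sib)
    return node == root
```

The host commits to all results at once via the root; spot-checking a random leaf then forces the host to have actually computed it.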
Related Work on Reputation Systems (1)
Centralized reputation systems: eBay…
Improved Gnutella protocol, Damiani, 2002: adds a reputation polling phase
EigenTrust, Kamvar, 2003: forms global reputation values by calculating the principal eigenvector of the trust matrix; uses a DHT to decide the mother nodes that take care of the calculation
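The aggregation at the heart of EigenTrust is power iteration on the normalized local-trust matrix; a centralized sketch (the real system distributes this computation over a DHT):

```python
def eigentrust(C, iters=200):
    """Global trust = principal left eigenvector of the row-stochastic
    local-trust matrix C, where C[i][j] is how much peer i trusts peer j."""
    n = len(C)
    t = [1.0 / n] * n                 # start from uniform trust
    for _ in range(iters):
        t = [sum(t[i] * C[i][j] for i in range(n)) for j in range(n)]
        total = sum(t)
        t = [x / total for x in t]    # renormalize (a no-op if C is stochastic)
    return t
```

On a small three-peer example the iteration converges to the stationary distribution of the trust matrix, so peers that are trusted by well-trusted peers accumulate the most global trust.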
Related Work on Reputation Systems (2)
TrustMe, Singh, 2003: a bootstrap server decides the THAs for a node; trust queries are flooded, and the anonymous THAs return the trust value for the node
NICE, Lee, 2003: infers trust values via a trust chain
Weighted Majority Algorithm, Yu, 2004: assigns weights to advisors and adjusts the weights based on the quality of their advice
Conclusion & Future Work
Contributions:
Quiz: a result verification scheme
Math analysis of the two verification schemes
Trust-based scheduling: "trust-but-verify"
Future Work:
Quiz generation
Accuracy on Demand: how to estimate the fraction of malicious hosts and the distribution of b