CIS 6930.5: Federated Distributed Systems
Adriana Iamnitchi (Anda), [email protected]
CIS 6930.5: Federated Distributed Systems (Fall 2006)
Contact Info
– Email: [email protected]
– Office: ENB 334
– Office hours: by appointment (email me)
– Course page: http://www.csee.usf.edu/~anda/CIS6930.5
CIS 6930.5: Course Goals
Primary
– Gain a deep understanding of the fundamental issues that affect the design of large-scale federated distributed systems
– Map the primary contemporary research themes
– Gain experience in network research
Secondary
– By studying a set of outstanding papers, build knowledge of how to present research
– Learn how to read papers and evaluate ideas
What I’ll Assume You Know
Basic Internet architecture
– IP, TCP, DNS, HTTP
Basic principles of distributed computing
– Asynchrony (you cannot distinguish between communication failures and latency)
– Partial global state knowledge (you cannot know everything correctly)
– Failures happen. In very large systems, even rare failures happen often
If there are things that don’t make sense, ask!
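The last point is simple arithmetic. As a minimal sketch, assuming each of n components fails independently with the same probability p over some period (the numbers below are illustrative, not from the course), the expected number of failures is n·p and the chance that at least one component fails is 1 − (1 − p)^n:

```python
def expected_failures(p: float, n: int) -> float:
    """Expected number of failed components among n, each failing with probability p."""
    return n * p


def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent components fails."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative numbers: a 0.1% daily failure chance per machine.
    for n in (1, 100, 10_000):
        print(n, expected_failures(0.001, n), round(prob_any_failure(0.001, n), 5))
```

With 10,000 machines, a failure that hits any single machine only once in a thousand days is expected about ten times a day, and a day with no failure at all is very unlikely.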
Examples of Distributed Systems
Figure: the ATT web, the Gnutella network, the Internet, and a sensor network.
Definition (a version)
A distributed system is a collection of autonomous, programmable, failure-prone entities that communicate through an unreliable communication medium.
– Entity: a process on a device (PC, PDA, mote)
– Communication medium: a wired or wireless network
“Federated”: spanning multiple institutional or network (DNS) domains
Outline
Case studies (and project ideas):
– Volunteer computing: SETI@home and BOINC
– Grid computing
– P2P systems
Administrivia
SETI@home Operations
Figure: SETI@home operations. Components shown: data recorder, DLT tapes, splitters, work unit (WU) storage, data server, screensavers (clients), result queue, accounting queue, science DB, user DB, master DB, garbage collector, tape archive/delete, tape backup, redundancy checking, RFI elimination, repeat detection, web site, CGI program, web page generator.
How does it work?
– Fixed-rate data processing task
– Low bandwidth/computation ratio
– Independent parallelism
– Error tolerance
Figure: SETI@home master-worker architecture.
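A master-worker system of this kind can be sketched in a few lines. This is a minimal, single-process illustration, not SETI@home's code: threads stand in for volunteer machines, a `queue.Queue` stands in for the data server, and the hypothetical `analyze` function just sums numbers where a real client would search radio data for signals.

```python
import queue
import threading


def split(data, unit_size):
    """Splitter: cut the input into independent work units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def analyze(unit):
    """Hypothetical stand-in for the client-side science computation."""
    return sum(unit)


def worker(tasks, results):
    """Worker loop: fetch a unit, compute, report, until told to stop."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel from the master: no more work
            return
        unit_id, unit = item
        results.put((unit_id, analyze(unit)))


def master(data, unit_size=4, n_workers=3):
    """Master: hand out work units, then gather results in unit order."""
    tasks, results = queue.Queue(), queue.Queue()
    units = split(data, unit_size)
    for unit_id, unit in enumerate(units):
        tasks.put((unit_id, unit))
    for _ in range(n_workers):
        tasks.put(None)  # one stop sentinel per worker

    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Every unit produced exactly one (unit_id, value) pair.
    collected = dict(results.get() for _ in range(len(units)))
    return [collected[unit_id] for unit_id in range(len(units))]


if __name__ == "__main__":
    print(master(list(range(10)), unit_size=4))  # [6, 22, 17]
```

Workers never talk to each other; all coordination flows through `master`, which is why a master-worker design makes the master a single point of failure and a potential performance bottleneck.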
History and Statistics
– Conceived in 1995, launched in April 1999
– “scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). You can participate by running a free program that downloads and analyzes radio telescope data.”
– No ET signals yet, but other results
Statistics as of Wed Feb 23 07:04:51:
Statistic | Total | Last 24 hours
Users | 5,361,313 | 4,391
Results received | 1,779 million | 5 million
Total CPU time | 2.2 million years | 3610.717 years
Average CPU time per work unit | 10 hr 58 min 14.0 sec | 6 hr 19 min 30.1 sec
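The two averages in the table are consistent with the other rows: average CPU time per work unit is roughly total CPU time divided by results received. A small check, assuming a 365.25-day year (the table does not say which year length it uses) and the rounded figures above:

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: Julian year


def avg_hours_per_unit(cpu_years: float, results: float) -> float:
    """Average CPU hours spent per returned result (work unit)."""
    return cpu_years * HOURS_PER_YEAR / results


def fmt_hms(hours: float) -> str:
    """Format a duration in hours as 'H hr M min S sec' (whole seconds)."""
    total_s = round(hours * 3600)
    h, rem = divmod(total_s, 3600)
    m, s = divmod(rem, 60)
    return f"{h} hr {m} min {s} sec"


if __name__ == "__main__":
    print("total:   ", fmt_hms(avg_hours_per_unit(2.2e6, 1.779e9)))
    print("last 24h:", fmt_hms(avg_hours_per_unit(3610.717, 5e6)))
```

Both results land close to the table (about 10 hr 50 min and 6 hr 20 min); the larger gap on the total is expected, since “2.2 million years” is itself heavily rounded.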
Volunteer computing
– Also called “public-resource computing”
– Uses idle computing cycles over the Internet
– Other systems:
  > Original: GIMPS, distributed.net
  > Commercial: United Devices, Entropia, Porivo, Popular Power
  > Academic, open-source: Cosm, folding@home
None of them has the popularity of SETI@home!
How to get and retain users (from David Anderson, the leader of the SETI@home project):
– Graphics are important (but monitors do burn in)
– Teams: users recruit other users
– Keep users informed:
  > Science news
  > System management news
  > Periodic project emails
– Reward users:
  > PDF certificates
  > Milestone pages and emails
  > Leader boards (overall, country, …)
Millions and millions of computers! (Problems)
– Server scalability
– Dealing with excess CPU time
– Cheating
– Bad behavior:
  > Team recruitment by spam
  > Sale of accounts on eBay
– Malfunctions
– Network bandwidth costs money
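One common defense against cheating and malfunctioning clients is redundant computation: send each work unit to several clients and accept a result only when enough of them agree (compare the “redundancy checking” stage of the SETI@home operations diagram). A minimal sketch, assuming results can be compared for exact equality; the `quorum` value is a hypothetical parameter:

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def validate(results: Sequence[Hashable], quorum: int) -> Optional[Hashable]:
    """Accept a work unit's result only if exactly one value was reported
    by at least `quorum` independent clients; otherwise return None,
    meaning the unit should be reissued to more clients."""
    counts = Counter(results)
    winners = [value for value, count in counts.items() if count >= quorum]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    print(validate([42, 42, 7], quorum=2))  # 42: two agreeing clients outvote one bad one
    print(validate([42, 7, 13], quorum=2))  # None: no agreement, reissue the unit
```

Redundancy multiplies the CPU time spent per work unit by the replication factor, and real scientific results are usually floating-point values that can differ slightly across platforms, so a production validator would compare within a tolerance rather than require exact equality.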
SETI@home: Summary
– Master-worker design
  > Centralized solution
  > Master = central point of control
  > Single point of failure
  > Performance bottleneck
– Incentives for participation
  > Sometimes also mean incentives for cheating
– Massive (“embarrassing”) parallelism
– Low bandwidth/computation ratio
– Users do donate real resources: $1.5M/year in consumed power
More information: http://setiathome.ssl.berkeley.edu
BOINC: Berkeley Open Infrastructure for Network Computing
“Open-source software for volunteer computing and desktop grid computing.” http://boinc.berkeley.edu/
Project idea: install and configure BOINC on a set of machines at USF to run large, embarrassingly parallel applications.
– Two candidate applications, from mechanical engineering and physics (code already exists)
– Report on your experience. Think along the following lines: would it be beneficial to use the administrative desktops at USF for scientific computations?
Grid Computing: Current Status
– The metaphor: the power grid
– Many deployed grids running in production mode
– Scientists are the most traditional users
– Users:
  > 100s, 10s of institutions
  > Well-established communities
– Resources:
  > Computers, data, instruments, storage, applications
  > Owned/administered by institutions
– Applications: data- and compute-intensive processing
– Approach: common infrastructure
Why Don’t We Build a Huge Supercomputer?
Figure: the Top500 supercomputer list over time (1995 to 2001): Linpack performance (GFLOPS, log scale) vs. rank (log scale).
The Top500 list over time follows a Zipf distribution: $\mathrm{Perf}(\mathit{rank}) \approx \mathit{rank}^{-k}$
Figure: evolution of the parameter k, 1995 to 2003 (y-axis from -0.84 to -0.68).
Impact
Trend: it is increasingly attractive to aggregate the capabilities of the machines in the tail of this distribution.
– A virtual machine that aggregates the last 10 machines in the Top500 would rank 32nd in ’95 but 14th in ’03
Both Grid and P2P computing are results of this trend:
– Grids focus on assembling (a relatively small number of) resources to enable controlled, secure resource sharing
– P2P focuses on scale and deployability
Challenge: design services that offer the best of both worlds: complex, secure services that deliver controlled QoS, are scalable, and can be easily deployed.
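The aggregation claim can be checked against the Zipf model. If Perf(r) ≈ C · r^(-k), the last ten machines (ranks 491 to 500) together deliver C · Σ r^(-k), which matches a single machine at rank (Σ r^(-k))^(-1/k); the constant C cancels. A small sketch of that calculation, taking k as the magnitude of the values on the k plot; the k values tried are illustrative picks from that axis range, not fitted values:

```python
def equivalent_rank(k: float, first: int = 491, last: int = 500) -> float:
    """Rank of a single machine whose Zipf-model performance equals the
    combined performance of ranks first..last, assuming Perf(r) = C * r**(-k)."""
    aggregate = sum(r ** (-k) for r in range(first, last + 1))
    return aggregate ** (-1.0 / k)


if __name__ == "__main__":
    for k in (0.84, 0.76, 0.68):
        print(f"k = {k}: last 10 combined ~ rank {equivalent_rank(k):.1f}")
```

A flatter distribution (smaller k) pushes the aggregate higher up the list: roughly rank 32 for k = 0.84 and rank 17 for k = 0.68. That is the kind of shift behind the 32nd-versus-14th comparison; the exact ranks come from the real lists, not from this idealized model.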
Peer-to-Peer Systems
– Revived (?) by music sharing
– A variety of applications deployed today
– Def 1: “A class of applications that take advantage of resources (e.g., storage, cycles, content) available at the edge of the Internet.”
  > Edges are often turned off, lack permanent IP addresses, etc.
– Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.”
– Lots of other definitions fit in between
P2P Impact (1)
– Widespread adoption
  > KaZaA: 170 million downloads (3.5M/week), one of the most popular applications ever!
– (Almost) zero-cost data distribution
  > … is forcing companies to change their business models
  > … might impact copyright laws
P2P Impact (2)
– Killer application for broadband to consumers
  > P2P-generated traffic may be the single largest contributor to Internet traffic today
Figure: Internet2 traffic statistics, Feb. 2002 to July 2004, as a percentage of total traffic by category (file sharing, unidentified, data transfers, other). Source: www.internet2.edu
Applications (1)
– File sharing: the ‘killer’ application to date
  > Too many to list them all: Napster, FastTrack (KaZaA, iMesh), Gnutella (LimeWire, Morpheus, BearShare), …
– Streaming: the user ‘plays’ the data as it arrives
  > Client/server approach: the source sends the stream to every user and becomes overloaded (“Oh, I am exhausted!”)
  > P2P approach (a possible solution): the first few users get the stream from the server; new users get the stream from the server or from users who are already receiving it
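The P2P streaming idea above amounts to growing a distribution tree rooted at the source. A minimal sketch, assuming every participant (the source included) can forward the stream to at most a fixed number of others; `fanout` and the join policy (attach to the node closest to the source that still has spare capacity) are hypothetical choices for illustration, not those of any particular system:

```python
from collections import deque


class StreamTree:
    """Toy P2P streaming overlay rooted at the source. Each node forwards
    the stream to at most `fanout` children (assumed uniform capacity)."""

    def __init__(self, fanout: int = 2):
        if fanout < 1:
            raise ValueError("fanout must be at least 1")
        self.fanout = fanout
        self.children = {"source": []}

    def join(self, user: str) -> str:
        """Attach a new user and return the node it receives the stream from."""
        # Breadth-first search: try the source first, then users already
        # receiving the stream, nearest to the source first.
        pending = deque(["source"])
        while pending:
            node = pending.popleft()
            if len(self.children[node]) < self.fanout:
                self.children[node].append(user)
                self.children[user] = []
                return node
            pending.extend(self.children[node])
        # Unreachable for fanout >= 1: every leaf has spare capacity.
        raise RuntimeError("no node with spare capacity")


if __name__ == "__main__":
    tree = StreamTree(fanout=2)
    for u in ["a", "b", "c", "d", "e"]:
        print(u, "<-", tree.join(u))
```

The first two users are served by the source and later users are served by earlier ones, so the source's load stays at `fanout` streams no matter how many users join.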
Applications (2)
– Performance benchmarking. Problem:
  > Evaluate the performance of your Web site from the end-user perspective: multiple views of your site’s performance
  > Generate Internet statistics: connectivity statistics, routing errors, routing maps
– Backup storage (HiveNet, OceanStore)
– Collaborative environments (Groove Networks)
– Instant messaging (Yahoo, AOL)
– Web serving communities (uServ)
– Spam filtering
– Anonymous email
– Censorship-resistant publishing systems (Eternity, Freenet)
P2P Networks: Current Status
– Users:
  > Millions
  > Anonymous individuals
– Resources:
  > Computing cycles XOR files
  > Owned/administered (?) by users
  > Intermittent participation:
      > Gnutella: 60 min. (’01)
      > MojoNation: 1/6 of users always connected (’01)
      > Overnet: 50% of nodes available 70% of the time over a week (’02)
– Applications: file retrieval or parallel computations
– Approach: vertically integrated solutions
Network users (www.slyck.com, 06/14/’06):
Network | Users
eDonkey2K | 3,108,066
FastTrack | 2,848,606
Gnutella | 2,219,539
Overnet | 645,120
DirectConnect | ???
MP2P | ???
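Intermittent participation has a direct cost for anyone storing data on peers. As a rough sketch, assuming peers are online independently of each other with the same probability a (for example a = 0.7, loosely based on the Overnet measurement; real availability is correlated and varies by node), a file replicated on r peers is reachable with probability 1 − (1 − a)^r, and the smallest r meeting a target availability follows directly:

```python
def availability(a: float, r: int) -> float:
    """Probability that at least one of r independently available replicas is online."""
    return 1.0 - (1.0 - a) ** r


def replicas_needed(a: float, target: float) -> int:
    """Smallest number of replicas r with availability(a, r) >= target."""
    if not (0.0 < a <= 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < a <= 1 and 0 < target < 1")
    r = 1
    while availability(a, r) < target:
        r += 1
    return r


if __name__ == "__main__":
    print(replicas_needed(0.7, 0.99))  # 4
```

Under these assumptions, a peer online 70% of the time is far from enough on its own: reaching 99% availability takes four copies.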
Trend: Large, Dynamic, Self-Configuring Grids
Figure: Grids and P2P systems placed on two axes (scale & volatility; functionality & infrastructure), annotated with these properties:
• Large scale
• Weaker trust assumptions
• Ease of integration
• No centralized authority
• Intermittent resource/user participation
• Diversity in shared resources and sharing characteristics
• Variable technical support
• Infrastructure (sharable services)
• Support for diverse applications
Source: “On Death, Taxes, and the Convergence of Grid and P2P Systems”, Foster and Iamnitchi, IPTPS’03
Challenges in Distributed Systems
– Scale
– Real problems: spam, denial-of-service attacks (including distributed ones), security, fault tolerance, etc.
We’ll look at the latest solutions to such problems proposed in:
– Top conferences in systems and networking: SIGCOMM, OSDI, NSDI
– Top workshops (hot topics): IPTPS, HotOS
– Other venues
(Digression: how do you tell when a conference is top?)
Course Organization/Syllabus/etc.
Administrivia: Grading
– Reviewing: 30%
– Discussion leading: 15%
– Project: 55%
  > Aim high!
  > Have fun!
Administrivia: Paper Reviewing (1)
Goals:
– Think about what you read
– Get used to writing paper reviews
Reviews are due by noon before class.
Be professional in your writing.
Keep an eye on the writing style:
– Clarity
– Beware of traps: learn to use them in writing and detect them in reading
– Detect (and stay away from) trivial claims. E.g., the first sentence in the Introduction: “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”
Administrivia: Paper Reviewing (2)
Follow the form provided, when relevant:
– State the main contribution of the paper.
– Critique the main contribution: rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
– Rate how convincing the methodology is. Do the claims and conclusions follow from the experiments? Are the assumptions realistic? Are the experiments well designed? Are there different experiments that would be more convincing? Are there other alternatives the authors should have considered? (And, of course, is the paper free of methodological errors?)
Administrivia: Paper Reviewing (3)
– What is the most important limitation of the approach?
– What are the three strongest and/or most interesting ideas in the paper?
– What are the three most striking weaknesses in the paper?
– Name three questions that you would like to ask the authors.
– Detail an interesting extension to the work not mentioned in the future work section.
– Optional: comments on the paper that you’d like to see discussed in class.
Administrivia: Discussion Leading
Come prepared!
– Prepare a discussion outline
– Prepare questions:
  > “What if”s
  > Unclear aspects of the proposed solution
  > …
– Look for similar ideas in different contexts
– Initiate short brainstorming sessions
Leaders do NOT need to submit paper reviews.
Main goals:
– Keep the discussion flowing
– Keep the discussion relevant
– Engage everybody (I’ll keep an eye on this, too)
Administrivia: Projects
– Combine with your research if relevant to the class
– Get approval from all instructors if your final projects overlap:
  > Don’t sell the same piece of work twice
  > You can get more than twice as many results with less than twice as much work
– Aim high!
  > Put in one extra month and get a publication out of it
  > It is doable (we have proofs)
– Try ideas that you postponed out of fear: it’s just a class, not your PhD.
Administrivia: Project Deadlines (tentative)
– Sept. 14: 1-page project proposal
– Oct. 10: 3-page literature survey
  > Know the relevant work in your problem area
  > For an implementation project, list tools and similar projects
– Nov. 13: 5-page midterm project report due
  > Have a clear image of what’s possible/doable
  > Report preliminary results
– Last class(es): in-class project presentations
  > Demo, if appropriate
– Dec. 15: 10-page write-up
Next Class (Wed, August 30)
– In-class discussion of papers:
  > “Automated Worm Fingerprinting”, OSDI ’04
  > “Planet Scale Software Updates”, SIGCOMM ’06
– Discussion of some project ideas
– I need a discussion leader to team up with me for next week’s class: Real systems (1): BitTorrent
  > Exploiting BitTorrent For Fun (IPTPS’06)
  > A Case for Efficient Execution of Data-Intense Applications with BitTorrent on Computational Desktop Grids ()
Questions?