1
Seminar: Information Management in the Web
Gnutella, Freenet and more: an overview of file sharing architectures
Thomas Zahn
2
Peer-to-Peer - Introduction
• "opposite" of Client/Server
• no central servers; information is highly distributed
• every peer acts as a client AND server
• -> can query, reply to queries and route messages at the same time
• every peer can directly "talk" to any other peer
3
Popular Peer-to-Peer Networks
• Napster
• Gnutella
• Freenet
• FastTrack (Kazaa)
• CHORD, CAN, PASTRY, TAPESTRY
4
Napster
• was used primarily for file sharing
• NOT a pure peer-to-peer network
• => hybrid system
• peer turns to central DB for querying (client/server)
• peer downloads directly from other peer(s) (peer-to-peer)
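The hybrid lookup can be sketched as follows. This is a minimal illustration and not Napster's actual protocol: the `CentralIndex` and `Peer` classes, their methods and the sample file are hypothetical names chosen for this example.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central DB: maps file names to the peers offering them."""

    def __init__(self):
        self._owners = defaultdict(set)

    def register(self, peer_id, file_names):
        for name in file_names:
            self._owners[name].add(peer_id)

    def query(self, file_name):
        # Client/server step: the index answers with peer IDs, never with file data.
        return sorted(self._owners.get(file_name, ()))


class Peer:
    def __init__(self, peer_id, files):
        self.peer_id = peer_id
        self.files = dict(files)  # file name -> content

    def serve(self, file_name):
        # Peer-to-peer step: the file is handed over directly by its owner.
        return self.files[file_name]


index = CentralIndex()
peers = {
    "p1": Peer("p1", {"song.mp3": b"abc"}),
    "p2": Peer("p2", {}),
}
for p in peers.values():
    index.register(p.peer_id, p.files)

owners = index.query("song.mp3")            # query + response (central DB)
data = peers[owners[0]].serve("song.mp3")   # download request + file (peer)
```

Only the lookup depends on the central DB; the transfer itself never touches it.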
5
Napster
Figure: peers 1 to 6 connected to a central DB
1. Query (peer -> central DB)
2. Response (central DB -> peer)
3. Download request (peer -> peer)
4. File (peer -> peer)
6
Gnutella - overview
• pure peer-to-peer
• used for file sharing
• very popular => practically proven ?
• very simple protocol
• no routing "intelligence"
• messages are always broadcast
7
Gnutella - PING/PONG
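Figure: peer 1 pings a network of peers 2 to 8
• peer 1 sends "Ping 1"; the ping is forwarded over every link (7 times in total)
• known hosts of peer 1, in order of discovery: 2; 3, 4, 5; 6, 7, 8
• every peer answers with a pong (Pong 2 ... Pong 8); the pongs of 3, 4, 5 and of 6, 7, 8 are relayed back to peer 1
• Query/Response analogous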
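The broadcast behaviour can be sketched as a breadth-first flood with a hop limit. This is a simplified model and not the Gnutella wire protocol: the function name `flood_ping`, the duplicate suppression via a `seen` set and the tree topology below are assumptions (the topology is only loosely modelled on the figure).

```python
from collections import deque


def flood_ping(neighbours, origin, ttl):
    """Flood a ping from `origin` for at most `ttl` hops.

    Returns (peers that received the ping and would answer with a pong,
             number of ping messages put on the wire).
    """
    seen = {origin}
    messages = 0
    queue = deque([(origin, None, ttl)])
    while queue:
        peer, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL expired: do not forward any further
        for nxt in neighbours[peer]:
            if nxt == sender:
                continue  # never send the ping back where it came from
            messages += 1
            if nxt in seen:
                continue  # duplicate ping: received but dropped
            seen.add(nxt)
            queue.append((nxt, peer, hops_left - 1))
    return seen - {origin}, messages


# ASSUMED topology: 1-2, 2-{3,4,5}, 5-{6,7,8}
neighbours = {
    1: [2], 2: [1, 3, 4, 5], 3: [2], 4: [2],
    5: [2, 6, 7, 8], 6: [5], 7: [5], 8: [5],
}
reached, messages = flood_ping(neighbours, origin=1, ttl=7)
```

In a tree every link carries the ping exactly once; in a topology with cycles the message count grows faster than the number of reached peers, because duplicates are still transmitted before they are dropped.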
8
Gnutella - Pro & Con
• VERY simple protocol => easy to implement
• very little overhead
• practically proven functionality (?)
• message broadcasts flood network
• =>heavy network traffic
• => very poor scalability
9
Gnutella – Reachable Peers
T=1 T=2 T=3 T=4 T=5 T=6 T=7 T=8
N=2 2 4 6 8 10 12 14 16
N=3 3 9 21 45 93 189 381 765
N=4 4 16 52 160 484 1,456 4,372 13,120
N=5 5 25 105 425 1,705 6,825 27,305 109,225
N=6 6 36 186 936 4,686 23,436 117,186 585,936
N=7 7 49 301 1,813 10,885 65,317 391,909 2,351,461
N=8 8 64 456 3,200 22,408 156,864 1,098,056 7,686,400
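The values in the table match a simple model in which every peer holds N connections, forwards a message to its N-1 other neighbours, and no peer is reached twice. Reading N as connections per peer and T as the TTL in hops is an interpretation of the column and row labels; the function name `reachable_peers` is chosen for this sketch.

```python
def reachable_peers(n, t):
    """Peers reachable with n connections per peer within t hops,
    assuming a loop-free topology (no peer is counted twice)."""
    return sum(n * (n - 1) ** (hop - 1) for hop in range(1, t + 1))


# Reprint the table row by row.
for n in range(2, 9):
    print(f"N={n}", *(f"{reachable_peers(n, t):,}" for t in range(1, 9)))
```

For N=2 the network degenerates into a chain, so the count only grows linearly; for every larger N it grows exponentially in T.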
10
Gnutella – Generated Traffic in Bytes (1)
T=1 T=2 T=3 T=4 T=5 T=6 T=7 T=8
N=2 166 332 498 664 830 996 1,162 1,328
N=3 249 747 1,743 3,735 7,719 15,687 31,623 63,495
N=4 332 1,328 4,316 13,280 40,172 120,848 362,876 1,088,960
N=5 415 2,075 8,715 35,275 141,515 566,475 2,266,315 9,065,675
N=6 498 2,988 15,438 77,688 388,938 1,945,188 9,726,438 48,632,688
N=7 581 4,067 24,983 150,479 903,455 5,421,311 32,528,447 195,171,263
N=8 664 5,312 37,848 265,600 1,859,864 13,019,712 91,138,648 637,971,200
• query message length: 83 bytes
• simple query relaying (no responses)
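The traffic figures follow from the reachable-peer counts: one 83-byte query is delivered to every reachable peer. A small sketch under the same loop-free assumption (the names `QUERY_BYTES` and `query_traffic` are chosen for this example):

```python
QUERY_BYTES = 83  # query message length given on the slide


def query_traffic(n, t):
    """Bytes generated by relaying one query to every peer within t hops."""
    peers = sum(n * (n - 1) ** (hop - 1) for hop in range(1, t + 1))
    return QUERY_BYTES * peers


print(f"{query_traffic(4, 4):,}")
```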
11
Gnutella – Generated Traffic in Bytes (2)
T=1 T=2 T=3 T=4 T=5 T=6 T=7 T=8
N=3 283.68 1,418.4 4,822.56 13,900.3 36,594.7 91,061.3 218,150 508,638
N=4 378.24 2,647.68 12,860.2 53,710.1 206,897 758,371 2,688,530 9,306,220
N=5 472.8 4,255.2 26,949.6 147,986 753,170 3,658,050 17,214,200 79,185,000
N=6 567.36 6,240.96 48,793 332,473 2,105,470 12,743,500 74,798,500 429,398,000
N=7 661.92 8,604.96 80,092.3 651,991 4,941,123 35,823,800 252,002,000 1,734,360,000
N=8 756.48 11,347.2 122,550 1,160,440 10,242,000 86,526,900 709,521,000 5,693,470,000
• Mean percentage of users who typically share content: 30%
• Mean percentage of users who typically have responses to search queries: 40%
• Mean number of search responses the typical respondent offers: 10
• Mean length of search responses the typical respondent offers: 60
"Standard client settings yield a whopping 17MB generated in response to […] search query"
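The slide does not state how the response traffic is computed. The sketch below is a relation inferred from the numbers: a fraction 0.30 × 0.40 of the peers at hop t answers, and each answer is counted once per hop on its way back. The response size of 788 bytes is an assumption, back-derived so that 0.30 × 0.40 × 788 = 94.56 bytes per peer and hop; it is not given in the source. With these values the relation reproduces the table to the precision shown, except for the N=7, T=5 cell, which it puts at about 4,941,230.

```python
SHARE_RATIO = 0.30    # users who share content (from the slide)
HIT_RATIO = 0.40      # users who have responses to a query (from the slide)
RESPONSE_BYTES = 788  # ASSUMPTION: back-derived from the table, not in the source


def response_traffic(n, t):
    """Bytes of response traffic for one query with n connections and TTL t."""
    per_peer_and_hop = SHARE_RATIO * HIT_RATIO * RESPONSE_BYTES
    return sum(
        per_peer_and_hop * hop * n * (n - 1) ** (hop - 1)
        for hop in range(1, t + 1)
    )


print(f"{response_traffic(5, 7):,.0f}")
```

The extra factor `hop` is what makes this table grow faster than the query-only table: responses from distant peers cross more links.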
12
Freenet - Concepts
• peer-to-peer file storage & retrieval system
• every document has a globally unique ID
• efficient (?) retrieval algorithm
– documents are retrieved with sublinear effort
• routing based on likelihood of answer capability
• focus on security
13
Freenet – Query Routing (1)
• every peer maintains routing table
• table contains known peers along with the IDs of the documents they are storing
• a request is routed to the peer most likely to have an answer (closest matching ID)
• responses are sent back upstream
• intermediate peers also store document and augment their routing tables
14
Freenet – Query Routing (2)
Initial state:
– Peer A: Routing Table B: 14, 20; X: 47, 60 | Doc Cache 5, 89
– Peer B: Routing Table C: 19, 30; D: 45, 51 | Doc Cache 14, 20
– Peer C: Routing Table B: 14, 20 | Doc Cache 19, 30
– Peer D: Routing Table B: 14, 20; Z: 105, 110 | Doc Cache 17, 45, 51, 102, 205
Steps:
1. Query for doc 17 (A -> B)
2. Forward to best match (B -> C)
3. C has no match -> backtrack
4. Forward query to 2nd best match (B -> D)
5. Send back doc 17 (D -> B)
6. Route back response (B -> A)
State after the response:
– Peer B: Routing Table C: 19, 30; D: 17, 45, 51 | Doc Cache 14, 17, 20
– Peer A: Routing Table B: 14, 17, 20; X: 47, 60 | Doc Cache 5, 17, 89
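The six steps can be replayed with a small depth-first search. This is a sketch under several assumptions: the peer tables are read off the figure, "closest matching ID" is taken as the numeric distance between document IDs, the peers X and Z are not modelled and are skipped, and no hop limit is applied. The helper names `closeness` and `query` are hypothetical.

```python
def closeness(known_ids, key):
    """Distance between `key` and the nearest document ID known for a neighbour."""
    return min(abs(doc_id - key) for doc_id in known_ids)


def query(peers, name, key, trace):
    """Depth-first search for document `key`, starting at peer `name`."""
    trace.append(name)
    peer = peers[name]
    if key in peer["cache"]:
        return True
    candidates = [n for n in peer["routing"] if n in peers and n not in trace]
    for nxt in sorted(candidates, key=lambda n: closeness(peer["routing"][n], key)):
        if query(peers, nxt, key, trace):
            peer["cache"].add(key)         # store the document on the way back
            peer["routing"][nxt].add(key)  # remember who supplied it
            return True
    return False  # no neighbour left: backtrack


peers = {
    "A": {"routing": {"B": {14, 20}, "X": {47, 60}}, "cache": {5, 89}},
    "B": {"routing": {"C": {19, 30}, "D": {45, 51}}, "cache": {14, 20}},
    "C": {"routing": {"B": {14, 20}}, "cache": {19, 30}},
    "D": {"routing": {"B": {14, 20}, "Z": {105, 110}},
          "cache": {17, 45, 51, 102, 205}},
}
trace = []
found = query(peers, "A", 17, trace)
```

After the call, A and B both hold document 17 and list it under the neighbour that supplied it, which corresponds to the updated tables in the figure.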
15
Freenet – Document Insert
• analogous to query routing
• insert is routed to the peer most likely to be interested in new doc (closest matching ID)
• intermediate peers cache document and augment routing tables
• until TTL is reached
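An insert can be sketched the same way. The following is an illustration only: the check for an already existing ID is omitted, `ttl` is interpreted as the number of forwards, numeric distance again stands in for "closest matching ID", and the peer tables are hypothetical sample data.

```python
def insert(peers, start, key, ttl):
    """Store `key` at `start` and push it along the closest-matching neighbours.

    Returns the list of peers that now cache the document.
    """
    path = [start]
    peers[start]["cache"].add(key)
    for _ in range(ttl):  # every forward consumes one hop of the TTL
        peer = peers[path[-1]]
        candidates = [n for n in peer["routing"] if n in peers and n not in path]
        if not candidates:
            break
        nxt = min(
            candidates,
            key=lambda n: min(abs(d - key) for d in peer["routing"][n]),
        )
        peer["routing"][nxt].add(key)  # augment the routing table
        peers[nxt]["cache"].add(key)   # intermediate peer caches the document
        path.append(nxt)
    return path


peers = {
    "A": {"routing": {"B": {14, 20}, "X": {47, 60}}, "cache": {5, 89}},
    "B": {"routing": {"C": {19, 30}, "D": {45, 51}}, "cache": {14, 20}},
    "C": {"routing": {"B": {14, 20}}, "cache": {19, 30}},
    "D": {"routing": {"B": {14, 20}, "Z": {105, 110}}, "cache": {45, 51}},
}
path = insert(peers, "A", key=18, ttl=2)
```

Because inserts and queries follow the same closest-ID rule, a later query for the same ID tends to travel towards the peers that cached it.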
16
Freenet - Discussion
• efficient routing algorithm (compared to Gnutella)
• adequate security features/heuristics (the more popular a document, the more frequently it gets cached)
• no metasearch
• no updates or deletes possible
• worst case query routing = DFS
17
FUtella – Concepts
• peer-to-peer platform for general knowledge sharing
• tries to model learning style of humans
• content-based routing
• combines and extends approaches from:
– Gnutella (message format)
– JXTA (peer groups)
– JXTA Search (queryspaces and registrations)
– FreeNet (routing of registration discoveries)
18
FUtella - Knowledge Groups
Figure: a knowledge group inside the FUtella net
• Knowledge Group: queryspace "Computer Architecture"
• Group Head: Peer E
• Members M1 - Mi
• group head inserts registration into the FUtella net
19
FUtella - Knowledge Group Discovery 1
Initial state:
– Peer A: Routing Table "computer" -> B, "data base" -> X | Registration Cache "computer": B, "data base": X
– Peer B: Routing Table "computer analysis" -> C, "computer systems" -> D, "data base" -> A | Registration Cache "computer analysis": Y, "computer systems": Z, "data base": X
– Peer C: Routing Table "computer" -> B, "computer analysis" -> Y | Registration Cache "computer": B, "computer analysis": Y
– Peer D: Routing Table "computer" -> B, "computer systems" -> Z, "computer architecture" -> E | Registration Cache "computer systems": Z, "computer": B, "computer architecture": E
Steps:
1. Discovery request "computer architecture"
2. Forward discovery request
3. C has no cached registration for "computer architecture" -> backtrack
4. Forward discovery request to 2nd best match
20
FUtella - Knowledge Group Discovery 2
State after the discovery response:
– Peer A: Routing Table "computer" -> B, "computer architecture" -> D, "data base" -> X | Registration Cache "computer": B, "computer architecture": E, "data base": X
– Peer B: Routing Table "computer analysis" -> C, "computer architecture" -> D, "computer systems" -> D, "data base" -> A | Registration Cache "computer analysis": Y, "computer architecture": E, "computer systems": Z, "data base": X
– Peer D: Routing Table "computer" -> B, "computer systems" -> Z, "computer architecture" -> E | Registration Cache "computer systems": Z, "computer": B, "computer architecture": E
Steps:
5. Discovery response containing registration "computer architecture": E
6. Forward discovery response
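The discovery walk can be sketched like the Freenet query, with queryspace strings in place of document IDs. The slides do not define how the "best match" between queryspaces is measured; the common-prefix length used here is purely an assumption, chosen because it reproduces the order shown in the figure. Peers X, Y, Z and E are not modelled as routing nodes, and the names `similarity` and `discover` are hypothetical.

```python
from os.path import commonprefix


def similarity(a, b):
    # ASSUMPTION: longer shared prefix = better match (not from the slides).
    return len(commonprefix([a, b]))


def discover(peers, name, space, trace):
    """Return (group head, peer holding the registration) or None."""
    trace.append(name)
    peer = peers[name]
    if space in peer["regs"]:
        return peer["regs"][space], name
    ranked = sorted(peer["routing"], key=lambda s: similarity(s, space), reverse=True)
    for known_space in ranked:
        nxt = peer["routing"][known_space]
        if nxt not in peers or nxt in trace:
            continue
        found = discover(peers, nxt, space, trace)
        if found:
            head, holder = found
            peer["regs"][space] = head       # cache the registration
            peer["routing"][space] = holder  # route future requests to the holder
            return found
    return None  # nothing found here: backtrack


peers = {
    "A": {"routing": {"computer": "B", "data base": "X"},
          "regs": {"computer": "B", "data base": "X"}},
    "B": {"routing": {"computer analysis": "C", "computer systems": "D",
                      "data base": "A"},
          "regs": {"computer analysis": "Y", "computer systems": "Z",
                   "data base": "X"}},
    "C": {"routing": {"computer": "B", "computer analysis": "Y"},
          "regs": {"computer": "B", "computer analysis": "Y"}},
    "D": {"routing": {"computer": "B", "computer systems": "Z",
                      "computer architecture": "E"},
          "regs": {"computer systems": "Z", "computer": "B",
                   "computer architecture": "E"}},
}
trace = []
result = discover(peers, "A", "computer architecture", trace)
```

Once the registration is known, the requesting peer can address the group head E directly and send its query there.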
21
FUtella - Query Processing
Figure: peers A, B, C, D and the knowledge group "computer architecture" (group head E, members M1 ... Mi)
1. Discovery request "computer architecture"
2. Forward discovery request
3. C has no cached registration for "computer architecture" -> backtrack
4. Forward discovery request to 2nd best match
5. Discovery response containing cached registration
6. Forward discovery response
7. Send query
8. Forward query to member (M1 ... Mi)
9. Query response (M1 ... Mi)
22
FUtella - Test Results (1)
Figure: chart "Total Number of Messages"
• y-axis: # msg (0 to 250,000)
• groups: static peers, semi-dynamic peers, dynamic peers
• series: threshold 2, no threshold, Gnutella
23
FUtella - Test Results (2)
Figure: chart "Average Hit Ratio"
• y-axis: hit ratio (0% to 100%)
• groups: static peers, semi-dynamic peers, dynamic peers
• series: threshold 2, no threshold, Gnutella
24
Conclusion
• first and second generation P2P systems still most widely used
• practically proven
• very flexible in terms of topology
• poor scalability (Gnutella)
• no guaranteed lower bound on query effort (Freenet)
• (scientifically) far better approach: DHTs (see next presentation)
25
Questions ?