“Blessed are the poor in spirit: for theirs is the kingdom of heaven.”
Intelligent File Sharing Framework
A THESIS IN Computer Science
Changgyu Oh
5/2/2002
Contents
• Motivations
• Network Topologies
• Problem Domains
• Research Goal
• Related Works
• Intelligent File Sharing Framework
• Framework Figure
• Query Service Using Reasoning
• IS-A/Contained-In Hierarchies
• File Association Rules
• The Benefits of IFS Search
• Grouping Service
• IFS P2P V.S. P2P Network
• Benefits of Dynamic Group Partition
• Dynamic Group Partition
• IP Clue Mechanism
• File Transaction in IFS
• QUERY SERVICE TYPES
• IFS System Architecture
• Client View
• Server View
• IFS Prototype Implementation
• IFS Query Interface
• Experimental Results
• Comparative Analysis
• Contributions
• Conclusion
• Future Work
• References
Motivations
• Why P2P?
  – Limitations of Client/Server
  – Increasing interest in sharing and collaborative computing
  – Improving P2P technologies
• Why P2P File Sharing?
  – File reusability
  – Share available resources
• Significance of this research
  – Increase network scalability
  – Anonymity
  – Flexible and powerful query
Problem Domains (1)
• Limitations of P2P Network
  – Scalability
  – Utilization of network resources
  – P2P Network Topology
    • Broadcast
    • Logical mesh network
Problem Domains (2)
• Limitation of Resource Source’s Anonymity
  – The resource source’s IP address appears in the queryHit message
    • Privacy and security
    • How can the source node send it to the destination without publicly revealing its IP address?
Problem Domains (3)
• Limitation of Keyword Based Query
  – Primitive and limited
  – Searches for only one file at a time
  – Not flexible
  – Does not satisfy users’ requests
Research Goal
• To increase P2P network scalability
  – Message flow control (Dynamic Group Partition and Caching)
• To protect publisher anonymity
  – IP-Clue mechanism (Encoding/Decoding)
• To increase the capacity of file querying
  – File querying using intelligent reasoning, caching, dynamic peer group
Related Works-I: Anonymous Publication Service
• The Publius system [Marc W., 2000]
  – Document-anonymity: the key is split between the n servers, and without sufficient shares of the key a server is unable to decrypt the document that it stores.
  – Anonymity is based on a static, system-wide list of available servers.
  – Does not support adding new servers.
• The Eternity system [Ross J., 1996]
  – Provides publisher’s anonymity by using one-way anonymous re-mailers
  – Server anonymity is not provided
  – Reader anonymity is not provided by open public proxies
• Query and Advertising System [Heimbigner D., 2000]
  – An arbitrary name is placed at the first level server for each client.
  – The first level server has the actual IP addresses of clients
• Freenet [Ian C., 2000]
  – Provides document-anonymity
  – Server-anonymity is not provided.
Related Works-II: Meta Search Methods
• Efficient and Effective Metasearch [Yu C., 1999]
  – Representatives for each database optimizing relationship hierarchy
• Efficient Transitive Closure Reasoning [Lee Y., 2001]
  – Inheritance, classification transitive closure reasoning
  – Class/Part/Containment Hierarchy
• Browsing Large Digital Library Collections [Geffner S., 1999]
  – Classification hierarchies to increase the capabilities of data browsing in digital libraries.
Related Works-III: File Sharing Systems using Caching
• The Distributed File System [Burns, R.C., 2000]
  – Detecting network failures ensures that caches are consistent.
• Network File System [Palmer J., 1996]
  – Clients poll the server to find out when the file was last modified
  – Determines whether the cached version is valid.
• Hint-Based Cooperative Caching file system [Sarkar, P., 2000]
  – Helps clients make decisions based on the computer’s local state
  – Reduces overhead and access latency
Intelligent File Sharing Framework
• Major Building Blocks:
  – Query Service using Reasoning
  – IP-Clue Mechanism:
    • Encoding/Decoding
  – Dynamic Grouping and Caching Service
[Figure: IFS framework. Initial components taken from Gnutella (Message, Query, Network, and Host components) are combined with the framework's own mechanisms (File Association Rules with IS-A, RUN-WITH, and CONTAINED-IN components; modified caching and query mechanisms; partition algorithm; brother relationship mechanism; division mechanism; new routing mechanism) into services (grouping service, IP encoding/decoding service, query fast reasoning service, file download service using an HTTP 1.1 connection, and other Gnutella services). The combined result is an enhanced file sharing service with a flexible query mechanism, a good directory service, enhanced anonymity, and efficient routing.]
Query Service Using Reasoning
• Goal:
  – Fast search using the File Relation Hierarchy Set
  – More flexible query and directory services
• Approach:
  – Relationships:
    • IS-A
    • Contained-In
    • Run-With
  – File Relation Hierarchy Set <Ν,Ŗ,Ω,Њ>
    • Set of Number pairs (Ν)
    • Relation Type (Ŗ)
    • Constraint Rule (Ω)
    • Hierarchy Identifier (Њ)
  – File Association Rules
    • Generalized Association Rule
    • Aggregated Association Rule
    • Constraint-based Association Rule
IS-A/Contained-In Hierarchies
Hierarchy 1 (Contained_In):
[1,13],1: Shared root directory
[2,6],1: Multimedia directory
[3,4],1: MP3 directory
[4,4],1: Love.mp3
[5,5],1: Xfile.mpg
[6,6],1: Sport20.rm
[7,7],1: JavaSource.java
[8,8],1: UserManual.doc
[9,9],1: Tutorial.doc
[10,13],1: Game directory
[11,11],1: CardGame.zip
[12,12],1: Chess.zip
[13,13],1: RacingCar.zip
Hierarchy 2 (IS_A):
[1,16],2: Extension root directory
[2,3],2: MP3
[3,3],2: Love.mp3
[4,5],2: MPG
[5,5],2: Xfile.mpg
[6,7],2: RM
[7,7],2: Sport20.rm
[8,9],2: JAVA
[9,9],2: JavaSource.java
[10,12],2: DOC
[11,11],2: UserManual.doc
[12,12],2: Tutorial.doc
[13,16],2: ZIP
[14,14],2: CardGame.zip
[15,15],2: Chess.zip
[16,16],2: RacingCar.zip
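The number pairs in the hierarchies above look like an interval encoding: every entry's pair falls inside the pair of the directory that holds it. A minimal Python sketch of a subsumption test under that assumption follows. The slides do not spell out the test itself, so `Node`, `is_under`, and `descendants` are hypothetical names, and the containment rule is inferred from the listed pairs rather than taken from the thesis.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    start: int      # first number of the pair
    end: int        # second number of the pair
    hierarchy: int  # hierarchy identifier (1 is the directory hierarchy)
    name: str


# Hierarchy 1, copied from the slide.
HIERARCHY_1 = [
    Node(1, 13, 1, "Shared root directory"),
    Node(2, 6, 1, "Multimedia directory"),
    Node(3, 4, 1, "MP3 directory"),
    Node(4, 4, 1, "Love.mp3"),
    Node(5, 5, 1, "Xfile.mpg"),
    Node(6, 6, 1, "Sport20.rm"),
    Node(7, 7, 1, "JavaSource.java"),
    Node(8, 8, 1, "UserManual.doc"),
    Node(9, 9, 1, "Tutorial.doc"),
    Node(10, 13, 1, "Game directory"),
    Node(11, 11, 1, "CardGame.zip"),
    Node(12, 12, 1, "Chess.zip"),
    Node(13, 13, 1, "RacingCar.zip"),
]


def is_under(child: Node, ancestor: Node) -> bool:
    """Assumed rule: child's pair lies inside ancestor's pair, same hierarchy."""
    return (
        child != ancestor
        and child.hierarchy == ancestor.hierarchy
        and ancestor.start <= child.start
        and child.end <= ancestor.end
    )


def descendants(ancestor: Node, nodes: List[Node]) -> List[str]:
    """Names of all entries below `ancestor`, found without walking a tree."""
    return [n.name for n in nodes if is_under(n, ancestor)]


multimedia = HIERARCHY_1[1]
print(descendants(multimedia, HIERARCHY_1))
# ['MP3 directory', 'Love.mp3', 'Xfile.mpg', 'Sport20.rm']
```

Under this reading, a directory search needs only two integer comparisons per entry, which would fit the "fast search" goal stated for the query service.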
File Association Rules
• Generalized Association Rule
  – Subtype relationship between files
  – E.g., if Windows multimedia application X is a multimedia application Y, and a multimedia file Z runs with multimedia application Y, then X runs Z.
• Aggregated Association Rule
  – A directory contains multiple sub-directories or files
  – E.g., “Find the files on CS101 homework”
• Constraint-based Association Rule
  – File association based on constraints such as file size, network capacity, etc.
  – E.g., “Find a file whose size is less than 1 MB and can be opened with MS Word.”
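Two of the rules above can be illustrated with a short Python sketch. It is an illustration under assumptions, not the thesis implementation: the fact tables `IS_A` and `RUN_WITH`, the `SharedFile` record, the application names, the file sizes, and the choice of 1 MB = 1,000,000 bytes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical facts: X IS-A Y, and file Z runs with Y.
IS_A: Dict[str, str] = {"WindowsMediaApp": "MultimediaApp"}
RUN_WITH: Dict[str, Set[str]] = {"MultimediaApp": {"Xfile.mpg"}}


def runs(app: str, file_name: str) -> bool:
    """Generalized rule: if X IS-A Y and Z runs with Y, then X runs Z."""
    current: Optional[str] = app
    seen: Set[str] = set()
    while current is not None and current not in seen:
        if file_name in RUN_WITH.get(current, set()):
            return True
        seen.add(current)            # guard against cyclic IS-A facts
        current = IS_A.get(current)  # climb one level of the IS-A chain
    return False


@dataclass(frozen=True)
class SharedFile:
    name: str
    size_bytes: int
    opens_with: str  # hypothetical field naming the associated application


def find_with_constraints(
    files: List[SharedFile], max_size_bytes: int, application: str
) -> List[str]:
    """Constraint-based rule: size limit plus the application that opens the file."""
    return [
        f.name
        for f in files
        if f.size_bytes < max_size_bytes and f.opens_with == application
    ]


FILES = [
    SharedFile("UserManual.doc", 300_000, "MS Word"),
    SharedFile("Tutorial.doc", 2_500_000, "MS Word"),
    SharedFile("Xfile.mpg", 700_000, "MultimediaApp"),
]

print(runs("WindowsMediaApp", "Xfile.mpg"))               # True
print(find_with_constraints(FILES, 1_000_000, "MS Word"))  # ['UserManual.doc']
```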
The Benefits of IFS Search
Method | IFS Search | Keyword Based Search
Keyword Search | Yes | Yes
File Extension Search | Yes | No
Application Search | Yes | No
Directory Search | Yes | No
Keyword Search in a certain directory | Yes | No
File Extension Search in a certain directory | Yes | No
File Search with Constraints | Yes | Yes
Combination | Yes | No
Grouping Service
• Goal: Increase Scalability
  – Control the maximum hop
  – Control the number of replicas of a message generated by peer hosts
  – Control the number of peer hosts for message forwarding in the routing table of each peer host.
• Approach:
  – Group partition
  – Brother relationship
  – Caching
[Equation: F expressed through f(i, j), with i running from 1 to max hop and j running from 1 to #Elements of routing]
IFS P2P V.S. P2P Network
[Figure: hosts A to J shown twice, once under the decentralized method and once under the grouping method, where they are partitioned into Group I, Group II, and Group III.]
Benefits of Dynamic Group Partition
• Broadcast within the same group
  – Robust search against node failure
  – Ensures a shortest path
• Increase network scalability by grouping peers
  – Server-less and decentralized manner
  – Dynamic partition
  – Reduces network traffic
• Requires only one hop per group
Dynamic Group Partition
[Figure: Group 0 grows from 10 members to 500 members (grooming). Group 0 is then divided into Group 0 (250 members) and Group 1 (250 members).]
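The partition step pictured above, where a group of 500 members is divided into two groups of 250, can be sketched in Python as follows. The slides do not state the split threshold or how members are assigned to each half, so the `max_size` parameter and the split by list order are assumptions made only for illustration.

```python
from typing import List


def partition(groups: List[List[str]], max_size: int) -> List[List[str]]:
    """Divide every group that has reached max_size into two halves.

    Assumption: members are split by their order in the list; the
    thesis may use a different assignment policy.
    """
    result: List[List[str]] = []
    for members in groups:
        if len(members) >= max_size:
            mid = len(members) // 2
            result.append(members[:mid])   # keeps the original group
            result.append(members[mid:])   # becomes the new group
        else:
            result.append(members)
    return result


# Group 0 has grown to 500 members, as in the figure.
group_0 = [f"host-{i}" for i in range(500)]
groups = partition([group_0], max_size=500)
print([len(g) for g in groups])  # [250, 250]
```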
IP Clue Mechanism
• Goal: Protect the identity of the resource publisher in P2P file sharing
• Approach
  – IP Encoding/Decoding
    • Encode the IP in source peers
    • Decode the encoded IP in destination peers
  – Formula:
    • Assume that the IP address of A is represented as [W.X.Y.Z] (e.g., [255.122.25.5])
      – (1) W + the size of the query
      – (2) X + the first character of the query
      – (3) Y + the file extension size
      – (4) Z + the last character of the query message
Only the destination peers can recognize the IP Clue!
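The four-step formula above can be sketched in Python as follows. Several details are assumptions because the slide leaves them open: characters are converted with their code point (`ord`), "the query" and "the query message" are treated as the same string, the file extension size is the number of characters after the last dot, and encoded parts are kept as plain integers that may exceed 255. The function names are hypothetical.

```python
import os
from typing import List, Tuple


def _offsets(query: str) -> Tuple[int, int, int, int]:
    """Offsets derived from the query, one per IP part (assumed reading)."""
    extension = os.path.splitext(query)[1].lstrip(".")
    return len(query), ord(query[0]), len(extension), ord(query[-1])


def encode_ip(ip: str, query: str) -> List[int]:
    """Source peer: add the query-derived offsets to W, X, Y, Z."""
    parts = [int(p) for p in ip.split(".")]
    return [part + off for part, off in zip(parts, _offsets(query))]


def decode_ip(clue: List[int], query: str) -> str:
    """Destination peer: subtract the same offsets to recover the IP."""
    parts = [value - off for value, off in zip(clue, _offsets(query))]
    return ".".join(str(p) for p in parts)


clue = encode_ip("134.193.2.34", "bluesea.mpg")
print(clue)                            # [145, 291, 5, 137]
print(decode_ip(clue, "bluesea.mpg"))  # 134.193.2.34
```

A peer that relays the queryHit without knowing which query it answers lacks the offsets, so under these assumptions it cannot read the address directly from the clue.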
[Figure: the QUERY component in a source host (query component from Gnutella, modified query mechanism, IP-Clue algorithm) turns the source IP address into the encoded IP address of the source host, which travels in the queryHit across the P2P network. The QUERY component in the query originator (same parts) recovers the source IP address.]
IP-Clue Mechanism
Host A, Host D (134.193.2.34)
At the first connection between host A and host D, D sends the filename and the instance of the file association rule for the file: "bluesea.mpg", [5,6]
Now A can send a query for "bluesea.mpg" with the instance of the file association rule, which is [5,6].
Now who can figure out what [5,6] means?
File Transaction in IFS
* Query Message Format
MS | FS | FN | QT
MS (MinSpeed): The minimum speed (in kb/s) of servants
FS (FileSize): Holds the offset of the end of the file name field
FN (FileName): Stores the query string, which can be a full or partial file name
QT (QueryType): 0: File Search; 1: Application for the file; 2: File Information
* Result Set Format of the queryHit
FI | FS | RT | FN
FI (FileIndex), FS (FileSize), and FN (FileName) fields are the same as in Gnutella protocol ver. 0.4.
RT (ReplyType): 0: File Search; 1: Application for the file; 2: File Information
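The two formats above can be modelled in memory with a short Python sketch. It only mirrors the field names and type codes listed on the slide; field widths and wire encoding are not given there and are not modelled. The class names, the `reply_to` helper, and the idea that a reply copies the query's type code are assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class ServiceType(IntEnum):
    """Codes shared by QT (QueryType) and RT (ReplyType) on the slide."""
    FILE_SEARCH = 0
    APPLICATION_FOR_FILE = 1
    FILE_INFORMATION = 2


@dataclass
class QueryMessage:
    min_speed: int           # MS: minimum speed in kb/s
    fs: int                  # FS: offset of the end of the file name field
    file_name: str           # FN: full or partial file name
    query_type: ServiceType  # QT


@dataclass
class QueryHitResult:
    file_index: int          # FI
    file_size: int           # FS
    reply_type: ServiceType  # RT
    file_name: str           # FN


def reply_to(query: QueryMessage, file_index: int, file_size: int,
             file_name: str) -> QueryHitResult:
    """Build a result whose reply type copies the query type (assumed)."""
    return QueryHitResult(file_index, file_size, query.query_type, file_name)


query = QueryMessage(min_speed=0, fs=11, file_name="bluesea",
                     query_type=ServiceType.APPLICATION_FOR_FILE)
hit = reply_to(query, file_index=7, file_size=1_048_576,
               file_name="bluesea.mpg")
print(hit.reply_type.name)  # APPLICATION_FOR_FILE
```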
QUERY SERVICE TYPES
IFS System Architecture
• Component-based Architecture
• Servant Component
  – Highest level of component
  – Server + Client Components
• Manager Components:
  – Control work flow
  – Assign tasks to worker components
• Worker Components:
  – Perform actual tasks
• Service (Entity) Components:
  – Task description
Client View
[Figure: client-side component flow, steps 1 to 12, for a user searching for the query "AAA.mpg". Components shown: User, Computer, Listener, ConnectionManager, ServiceManager, QueryManager, FileAssociationManager (IsA, RunWith, ContainedIn), SendManager, SendWorker, DownloadManager, DownloadWorker.]
Server View
[Figure: server-side component flow, steps 1 to 12, for sending the requested file "AAA.mpg" back to a peer. Components shown: Computer, Listener, ConnectionManager, HostManager, ServiceManager, QueryManager, MsgManager, FileAssociationManager (IsA, RunWith, ContainedIn), GroupingWorker, ReadWorker, SendManager, SendWorker, DownloadWorker, Shared Directory.]
IFS Prototype Implementation
• The IFS prototype is built on top of the Gnutella Phex system
• Development environment
  – At least 25 MB of free memory
  – Java Virtual Machine
  – Pentium III 500 MHz CPU
• Event-driven methods
  – Each task is performed based on events
• Component-based programming
  – Manager Components
  – Worker Components
  – Service Components
Experimental Results: Dynamic Group Partition and Cache
[Figure: number of messages (log scale, 1 to 1E+11) versus number of hosts (50 to 10000), with TTL = 3 and routing table size = 10% of the network. Series: Broadcasting, Grouping Only, Caching Only, and Total (Grouping & Caching).]
Comparative Analysis
Measure | Napster | Gnutella | IFS
Topology | Client/Server | Logical Mesh | Logical Mesh
Design Purpose | MP3 file sharing | File sharing in a decentralized manner | Enhanced Gnutella
Size of Routing Table | Needs a server’s IP address | O(N) | O(K), where K << N
Node Join Operation | O(1) | O(1) | O(1)
Node Failure | Severe | Tolerable | Tolerable
Search Mechanism | File indexing based on keyword search | File indexing based on keyword search | Fast reasoning based on file association rules
Description | Client/server based P2P network. Heavy traffic on servers. Node failure is severe. | Decentralized. Heavy traffic due to the exponentially increased replicas of query messages. | Decentralized. Controls the network traffic. Flexible query mechanism.
Contributions
• Proposed a conceptual framework for decentralized P2P file sharing.
  – Dynamic group partition and caching
  – Query using fast reasoning
  – IP-clue mechanism (encoding/decoding)
• Designed a component-based architecture
• Implemented the framework as an extension of an existing file sharing system (Gnutella Phex)
Conclusion
• The IFS system
  – Supports decentralized P2P file sharing.
  – Increases network scalability.
  – Provides flexible file searching and querying.
  – Protects resource sources’ anonymity.
Future Work
• Further research on the latency due to the grouping
• File registration strategy in a heterogeneous environment
• Discover advanced mechanisms for reasoning about file relationships and file association rules
• Research on the grouping policies
  – Grouping by peer host’s network capacity
  – Grouping by interests
  – Grouping by context
  – Grouping by location
References
• C. T. Yu, W. Meng, K.-L. Liu, W. Wu, and N. Rishe. Efficient and effective metasearch for a large number of text databases. In CIKM, pages 217-224, 1999.
• Y. Lee and J. Geller. Efficient Transitive Closure Reasoning in a Combined Class/Part/Containment Hierarchy. Journal of Knowledge and Information System, 2002.
• S. Geffner, D. Agrawal, A. Abbadi, and T. Smith. Browsing Large Digital Library Collections Using Classification Hierarchies. In CIKM, pages 195-201, 1999.
• M. Waldman, A. Rubin, and L. F. Cranor. Publius: A robust, tamper-evident, censorship-resistant, web publishing system. In Proc. 9th USENIX Security Symposium, pages 59-72, August 2000.
• R. J. Anderson. The Eternity service. In Proceedings of the 1st International Conference on the Theory and Applications of Cryptology (PRAGOCRYPT '96), Prague, Czech Republic, 1996.
• J. Palmer, R. Strong, and E. Upfal. Nonblocking membership protocols with asymmetric safety. Technical Report RJ10096 (91912), IBM Research Division, December 1997.
• I. Clarke, O. Sandberg, B. Wiley, and T. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Proceedings of the Workshop on Design Issues in Anonymity and Unobservability, pages 46-66, July 2000.
• D. Heimbigner. Adapting Publish/Subscribe Middleware to Achieve Gnutella-like Functionality. Technical Report CU-CS-909-00, Department of Computer Science, University of Colorado, Sept. 2000.
• P. Sarkar and J. H. Hartman. ACM Transactions on Computer Systems (TOCS), Volume 18, Issue 4, November 2000.