
Intelligent File Sharing Framework

A THESIS IN Computer Science

Changgyu Oh, 5/2/2002

Contents

• Motivations
• Network Topologies
• Problem Domains
• Research Goal
• Related Works
• Intelligent File Sharing Framework
• Framework Figure
• Query Service Using Reasoning
• IS-A/Contained-In Hierarchies
• File Association Rules
• The Benefits of IFS Search
• Grouping Service
• IFS P2P vs. P2P Network
• Benefits of Dynamic Group Partition
• Dynamic Group Partition
• IP Clue Mechanism
• File Transaction in IFS
• Query Service Types
• IFS System Architecture
• Client View
• Server View
• IFS Prototype Implementation
• IFS Query Interface
• Experimental Results
• Comparative Analysis
• Contributions
• Conclusion
• Future Work
• References

Motivations

• Why P2P?
  – Limitations of client/server
  – Increasing interest in sharing and collaborative computing
  – Improving P2P technologies

• Why P2P file sharing?
  – File reusability
  – Sharing of available resources

• Significance of this research
  – Increased network scalability
  – Anonymity
  – Flexible and powerful queries

Network Topologies

Problem Domains (1)

• Limitations of P2P networks
  – Scalability
  – Utilization of network resources
  – P2P network topology
    • Broadcast
    • Logical mesh network

Problem Domains (2)

• Limitation of the resource source's anonymity
  – The resource source's IP address appears in the queryHit message
    • Privacy and security
    • How can a source node send a queryHit to the destination without revealing its IP address in public?

Problem Domains (3)

• Limitations of keyword-based queries
  – Primitive and limited
  – Search for only one file at a time
  – Not flexible
  – Do not satisfy users' requests

Research Goal

• To increase P2P network scalability
  – Message flow control (dynamic group partition and caching)

• To protect publisher anonymity
  – IP-Clue mechanism (encoding/decoding)

• To increase the capability of file querying
  – File querying using intelligent reasoning, caching, and dynamic peer groups

Related Works I: Anonymous Publication Service

• The Publius system [Marc W., 2000]
  – Provides document anonymity: the key is split among the n servers, and without sufficient shares of the key a server cannot decrypt the document it stores
  – Anonymity relies on a static, system-wide list of available servers
  – Does not support adding new servers

• The Eternity system [Ross J., 1996]
  – Provides publisher anonymity by using one-way anonymous remailers
  – Server anonymity is not provided
  – Reader anonymity is not provided by open public proxies

• Query and Advertising System [Heimbigner D., 2000]
  – An arbitrary name is assigned at the first-level server for each client
  – The first-level server holds the actual IP addresses of clients

• Freenet [Ian C., 2000]
  – Provides document anonymity
  – Server anonymity is not provided

Related Works II: Meta Search Methods

• Efficient and Effective Metasearch [Yu C., 1999]
  – Representatives for each database; optimized relationship hierarchy

• Efficient Transitive Closure Reasoning [Lee Y., 2001]
  – Transitive closure reasoning for inheritance and classification
  – Class/Part/Containment hierarchy

• Browsing Large Digital Library Collections [Geffner S., 1999]
  – Classification hierarchies that increase the capabilities of data browsing in digital libraries

Related Works III: File Sharing Systems Using Caching

• The Distributed File System [Burns R.C., 2000]
  – Detecting network failures ensures that caches are consistent

• Network File System [Palmer J., 1996]
  – Clients poll the server to find out when a file was last modified
  – This determines whether the cached version is valid

• Hint-Based Cooperative Caching file system [Sarkar P., 2000]
  – Helps clients make decisions based on the computer's local state
  – Reduces overhead and access latency

Intelligent File Sharing Framework

• Major building blocks:
  – Query service using reasoning
  – IP-Clue mechanism
    • Encoding/decoding
  – Dynamic grouping and caching service

[Figure: The IFS framework, built in layers on top of Gnutella (initial components, then components, then services, then a combined, enhanced file sharing service). Our framework contributes file association rules, a modified caching concept, a modified query concept, a partition algorithm, and a brother relationship mechanism.
Components: file association rule components (IS-A, RUN-WITH, CONTAINED-IN); Message component from Gnutella with a modified caching mechanism; Query component from Gnutella with a modified query mechanism; Network component from Gnutella with a division mechanism; Host component from Gnutella with a new routing mechanism.
Services: grouping service; IP encoding/decoding service; query fast reasoning service; file download service using HTTP 1.1 connections; other Gnutella services.
Combined, enhanced file sharing service: flexible query mechanism, good directory service, enhanced anonymity, efficient routing, and Gnutella service.]

Query Service Using Reasoning

• Goal:
  – Fast search using the file relation hierarchy set
  – More flexible query and directory services

• Approach:
  – Relationships:
    • IS-A
    • Contained-In
    • Run-With
  – File relation hierarchy set <Ν, Ŗ, Ω, Њ>:
    • Set of number pairs (Ν)
    • Relation type (Ŗ)
    • Constraint rule (Ω)
    • Hierarchy identifier (Њ)
  – File association rules:
    • Generalized association rule
    • Aggregated association rule
    • Constraint-based association rule

IS-A/Contained-In Hierarchies

Hierarchy 1 (Contained-In: directory structure)
[1,13],1: Shared root directory
  [2,6],1: Multimedia directory
    [3,4],1: MP3 directory
      [4,4],1: Love.mp3
    [5,5],1: Xfile.mpg
    [6,6],1: Sport20.rm
  [7,7],1: JavaSource.java
  [8,8],1: UserManual.doc
  [9,9],1: Tutorial.doc
  [10,13],1: Game directory
    [11,11],1: CardGame.zip
    [12,12],1: Chess.zip
    [13,13],1: RacingCar.zip

Hierarchy 2 (IS-A: file extensions)
[1,16],2: Extension root directory
  [2,3],2: MP3
    [3,3],2: Love.mp3
  [4,5],2: MPG
    [5,5],2: Xfile.mpg
  [6,7],2: RM
    [7,7],2: Sport20.rm
  [8,9],2: JAVA
    [9,9],2: JavaSource.java
  [10,12],2: DOC
    [11,11],2: UserManual.doc
    [12,12],2: Tutorial.doc
  [13,16],2: ZIP
    [14,14],2: CardGame.zip
    [15,15],2: Chess.zip
    [16,16],2: RacingCar.zip
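The number pairs in both hierarchies appear to follow a pre-order numbering: a node's first number is its position in a depth-first traversal, and its second number is the largest number among its descendants, so a leaf gets [n,n]. Under that reading (an assumption inferred from the listed values), the pairs for Hierarchy 1 can be regenerated with the short sketch below; `number_tree` and the nested-tuple tree format are hypothetical.

```python
def number_tree(tree, hierarchy_id):
    """Assign (number, max-descendant number, hierarchy id) to every node.

    `tree` is a nested (name, [children]) tuple; numbering is pre-order.
    """
    pairs = {}
    counter = 0

    def visit(node):
        nonlocal counter
        name, children = node
        counter += 1
        number = counter
        for child in children:
            visit(child)
        # After visiting all children, `counter` is the largest descendant number.
        pairs[name] = (number, counter, hierarchy_id)

    visit(tree)
    return pairs


# Hierarchy 1 (Contained-In) as listed on the slide.
HIERARCHY_1 = ("Shared root directory", [
    ("Multimedia directory", [
        ("MP3 directory", [("Love.mp3", [])]),
        ("Xfile.mpg", []),
        ("Sport20.rm", []),
    ]),
    ("JavaSource.java", []),
    ("UserManual.doc", []),
    ("Tutorial.doc", []),
    ("Game directory", [("CardGame.zip", []), ("Chess.zip", []), ("RacingCar.zip", [])]),
])

pairs = number_tree(HIERARCHY_1, 1)
print(pairs["Multimedia directory"])  # (2, 6, 1)
print(pairs["Game directory"])        # (10, 13, 1)
```

Because every descendant's number falls inside its ancestor's interval, a Contained-In or IS-A test reduces to two integer comparisons instead of a walk up the tree.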

File Association Rules

• Generalized association rule
  – Subtype relationship between files
  – E.g., if a Windows multimedia application X is a multimedia application Y, and a multimedia file Z runs with the multimedia application Y, then X runs Z.

• Aggregated association rule
  – A directory contains multiple subdirectories or files
  – E.g., “Find the files for CS101 homework.”

• Constraint-based association rule
  – File association based on constraints such as file size, network capacity, etc.
  – E.g., “Find a file whose size is less than 1 MB and whose type can be opened with MS Word.”
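A minimal sketch of the generalized and constraint-based rules, assuming IS-A links between applications and Run-With facts are stored as plain Python data. All application names, file sizes, and helper names (`IS_A`, `RUN_WITH`, `SIZES`, `runs`, `constrained`) are hypothetical illustrations, not the IFS implementation.

```python
ONE_MB = 1024 * 1024  # assumed meaning of "1 MB"

# Hypothetical IS-A links between applications: child -> parent.
IS_A = {
    "WindowsMediaPlayer": "MultimediaApplication",
    "MSWord": "WordProcessor",
}
# Hypothetical Run-With facts: (file, application that runs it).
RUN_WITH = {
    ("Xfile.mpg", "MultimediaApplication"),
    ("UserManual.doc", "MSWord"),
    ("Tutorial.doc", "MSWord"),
}
# Hypothetical file sizes in bytes.
SIZES = {"Xfile.mpg": 4_500_000, "UserManual.doc": 800_000, "Tutorial.doc": 1_200_000}


def ancestors(app):
    """The application itself followed by all of its IS-A ancestors."""
    chain = [app]
    while app in IS_A:
        app = IS_A[app]
        chain.append(app)
    return chain


def runs(app, file):
    """Generalized rule: X is-a Y and Z runs with Y  =>  X runs Z."""
    return any((file, a) in RUN_WITH for a in ancestors(app))


def constrained(app, max_size):
    """Constraint-based rule: files that `app` can open and that are smaller than max_size."""
    return sorted(f for f, size in SIZES.items() if size < max_size and runs(app, f))


print(runs("WindowsMediaPlayer", "Xfile.mpg"))  # True
print(constrained("MSWord", ONE_MB))            # ['UserManual.doc']
```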

The Benefits of IFS Search

Method | IFS Search | Keyword-Based Search
Keyword search | Yes | Yes
File extension search | Yes | No
Application search | Yes | No
Directory search | Yes | No
Keyword search in a certain directory | Yes | No
File extension search in a certain directory | Yes | No
File search with constraints | Yes | Yes
Combination | Yes | No
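A combined search, such as a file extension search within a certain directory, can be sketched by intersecting the two hierarchies: a file qualifies when its number lies inside the directory's interval in the Contained-In hierarchy and inside the extension's interval in the IS-A hierarchy. The number pairs are copied from the IS-A/Contained-In Hierarchies slide; `under` and `search` are hypothetical names, and this is only one possible way to evaluate such a query.

```python
# [number, max-descendant number] pairs from the two hierarchies on the slide.
CONTAINED_IN = {  # Hierarchy 1: directory structure
    "Shared root directory": (1, 13), "Multimedia directory": (2, 6),
    "MP3 directory": (3, 4), "Love.mp3": (4, 4), "Xfile.mpg": (5, 5),
    "Sport20.rm": (6, 6), "JavaSource.java": (7, 7), "UserManual.doc": (8, 8),
    "Tutorial.doc": (9, 9), "Game directory": (10, 13), "CardGame.zip": (11, 11),
    "Chess.zip": (12, 12), "RacingCar.zip": (13, 13),
}
IS_A = {  # Hierarchy 2: file extensions
    "Extension root directory": (1, 16), "MP3": (2, 3), "Love.mp3": (3, 3),
    "MPG": (4, 5), "Xfile.mpg": (5, 5), "RM": (6, 7), "Sport20.rm": (7, 7),
    "JAVA": (8, 9), "JavaSource.java": (9, 9), "DOC": (10, 12),
    "UserManual.doc": (11, 11), "Tutorial.doc": (12, 12), "ZIP": (13, 16),
    "CardGame.zip": (14, 14), "Chess.zip": (15, 15), "RacingCar.zip": (16, 16),
}
FILES = ["Love.mp3", "Xfile.mpg", "Sport20.rm", "JavaSource.java", "UserManual.doc",
         "Tutorial.doc", "CardGame.zip", "Chess.zip", "RacingCar.zip"]


def under(hierarchy, name, ancestor):
    """True if `name` lies in the subtree of `ancestor`: an interval test, no tree walk."""
    number, _ = hierarchy[name]
    low, high = hierarchy[ancestor]
    return low <= number <= high


def search(directory=None, extension=None):
    """Files satisfying every given condition (None means no condition)."""
    return sorted(
        f for f in FILES
        if (directory is None or under(CONTAINED_IN, f, directory))
        and (extension is None or under(IS_A, f, extension))
    )


print(search(directory="Multimedia directory", extension="MP3"))  # ['Love.mp3']
```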


Grouping Service

• Goal: Increase scalability
  – Control the maximum hop count
  – Control the number of message replicas generated by peer hosts
  – Control the number of peer hosts used for message forwarding in the routing table of each peer host

• Approach:
  – Group partition
  – Brother relationship
  – Caching

$$F(i,j) = \sum_{i=1}^{\text{max hop}} \prod_{j=1}^{i} \#\,\text{elements of routing } E_j$$
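The message-count formula on this slide relates the maximum hop count to the number of elements in each routing table E_j. Reading it as the sum over hops i of the product E_1 × ... × E_i (an assumed interpretation), it counts the query replicas one broadcast can generate. The sketch below follows that reading and ignores duplicate suppression, so it gives an upper bound; `message_count` and the example table sizes are hypothetical.

```python
from math import prod


def message_count(routing_sizes):
    """Upper bound on query replicas for one broadcast.

    routing_sizes[j - 1] is the number of routing-table entries used at hop j,
    so len(routing_sizes) is the maximum hop count (TTL). Hop i contributes
    E_1 * E_2 * ... * E_i messages; duplicate suppression is ignored.
    """
    return sum(prod(routing_sizes[:i]) for i in range(1, len(routing_sizes) + 1))


print(message_count([100, 100, 100]))  # 1010100
print(message_count([10, 10, 10]))     # 1110
```

With TTL = 3 and routing tables holding 10% of a 1000-host network (100 entries), the bound is 1,010,100 messages; with hypothetical 10-entry tables it drops to 1,110, which illustrates why controlling the number of forwarding peers matters.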

IFS P2P vs. P2P Network

[Figure: Peers A through J under the IFS grouping method, partitioned into Group I, Group II, and Group III, compared with the same peers connected by a plain decentralized method.]

Benefits of Dynamic Group Partition

• Broadcast within the same group
  – Search is robust against node failure
  – Ensures a shortest path

• Increases network scalability by grouping peers
  – Server-less and decentralized
  – Dynamic partition
  – Reduces network traffic

• Requires only one hop per group

Dynamic Group Partition

[Figure: Group 0 grows from 10 to 500 members and is then divided into Group 0 (250 members) and Group 1 (250 members).]
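A minimal sketch of the split shown in the figure, assuming a fixed threshold of 500 members and a split into two equal halves by join order. The threshold is read off the figure; the split policy, `GroupDirectory`, and `MAX_GROUP_SIZE` are hypothetical, since the slide does not say how members are assigned to the new group.

```python
from itertools import count

MAX_GROUP_SIZE = 500  # threshold read off the figure (assumption)


class GroupDirectory:
    """Hypothetical sketch: peers join a group; a full group is split in two."""

    def __init__(self):
        self.groups = {0: []}     # group id -> list of peer ids, in join order
        self._next_id = count(1)  # ids for newly created groups

    def join(self, peer, group_id=0):
        members = self.groups[group_id]
        members.append(peer)
        if len(members) >= MAX_GROUP_SIZE:
            self._split(group_id)

    def _split(self, group_id):
        members = self.groups[group_id]
        half = len(members) // 2
        self.groups[group_id] = members[:half]             # first half stays
        self.groups[next(self._next_id)] = members[half:]  # second half moves


directory = GroupDirectory()
for i in range(500):
    directory.join(f"peer{i}")
print({gid: len(m) for gid, m in directory.groups.items()})  # {0: 250, 1: 250}
```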

IP Clue Mechanism

• Goal: Protect the identity of the resource publisher in P2P file sharing

• Approach:
  – IP encoding/decoding
    • Encode the IP address in source peers
    • Decode the encoded IP address in destination peers
  – Formula: assume that the IP address of host A is [W.X.Y.Z] (e.g., [255.122.25.5])
    (1) W + the size of the query
    (2) X + the first character of the query
    (3) Y + the file extension size
    (4) Z + the last character of the query message

Only the destination peers can recognize the IP clue.
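The encoding formula can be sketched as follows. The slide does not fix how a character becomes a number or how the extension size is measured, so this sketch assumes Unicode code points (`ord`) and the length of the text after the last dot; it also does not wrap values, so an encoded component may exceed 255. The function names are hypothetical, and the address and query are borrowed from the IP-Clue example with Host D (134.193.2.34) and "bluesea.mpg".

```python
def _offsets(query):
    """Offsets (1)-(4) from the formula, all derived from the query string.

    Assumptions: size of the query = number of characters; a character is
    taken as its code point; extension size = length of the text after the last dot.
    """
    extension = query.rsplit(".", 1)[1] if "." in query else ""
    return (len(query), ord(query[0]), len(extension), ord(query[-1]))


def encode_ip(ip, query):
    """Source peer: add each offset to the matching octet W, X, Y, Z."""
    octets = [int(part) for part in ip.split(".")]
    return [octet + offset for octet, offset in zip(octets, _offsets(query))]


def decode_ip(encoded, query):
    """Destination peer: subtract the same offsets to recover the address."""
    return ".".join(str(value - offset) for value, offset in zip(encoded, _offsets(query)))


clue = encode_ip("134.193.2.34", "bluesea.mpg")
print(clue)                            # [145, 291, 5, 137]
print(decode_ip(clue, "bluesea.mpg"))  # 134.193.2.34
```

In this sketch the offsets depend only on the query string, so reversing the encoding requires the original query text.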

[Figure: In the source host, the QUERY component (the Query component from Gnutella with a modified query mechanism and the IP-Clue algorithm) encodes the source IP address. The queryHit carries only the encoded IP address of the source host across the P2P network, and the QUERY component in the query originator, which runs the same IP-Clue algorithm, recovers the source IP address.]

IP-Clue Mechanism

Host A, Host D (134.193.2.34)

At the first connection between hosts A and D, D sends the file name and an instance of the file association rule for the file: "bluesea.mpg", [5,6].

Now A can send a query for "bluesea.mpg" with the instance of the file association rule, which is [5,6].

Now, who can figure out what [5,6] means?

File Transaction in IFS

* Query Message Format

MS FS FN QT

MS (MinSpeed): The minimum speed (in kb/s) of servants
FS (FileSize): Holds the offset of the end of the file name field
FN (FileName): Stores the query string, which can be a full or partial file name
QT (QueryType): 0 = File search, 1 = Application for the file, 2 = File information

* Result Set Format of the queryHit

FI FS RT FN

FI (FileIndex), FS (FileSize), and FN (FileName) are the same as in Gnutella protocol version 0.4.
RT (ReplyType): 0 = File search, 1 = Application for the file, 2 = File information
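A sketch of packing and unpacking the IFS query message in the field order MS, FS, FN, QT. Field widths are not given on the slide, so this assumes a 2-byte little-endian MS (as in the Gnutella 0.4 query payload), a 2-byte FS holding the offset just past the null-terminated file name, and a 1-byte QT. `pack_query` and `unpack_query` are hypothetical names.

```python
import struct

# Query types (QT) from the slide.
FILE_SEARCH, APPLICATION_FOR_FILE, FILE_INFORMATION = 0, 1, 2

HEADER = struct.Struct("<HH")  # MS and FS, 2 bytes each, little-endian (assumed widths)


def pack_query(min_speed, file_name, query_type):
    """Build MS | FS | FN | QT, with FN null-terminated as in Gnutella 0.4."""
    fn = file_name.encode("ascii") + b"\x00"
    fs = HEADER.size + len(fn)  # offset just past the end of the FN field
    return HEADER.pack(min_speed, fs) + fn + bytes([query_type])


def unpack_query(payload):
    """Inverse of pack_query: returns (min_speed, file_name, query_type)."""
    min_speed, fs = HEADER.unpack_from(payload, 0)
    file_name = payload[HEADER.size:fs].rstrip(b"\x00").decode("ascii")
    query_type = payload[fs]  # indexing bytes yields an int
    return min_speed, file_name, query_type


message = pack_query(56, "bluesea.mpg", APPLICATION_FOR_FILE)
print(len(message), unpack_query(message))  # 17 (56, 'bluesea.mpg', 1)
```

Storing the end offset in FS lets a receiver locate QT without scanning for the terminator.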

QUERY SERVICE TYPES

IFS System Architecture

• Component-based architecture

• Servant component
  – Highest-level component
  – Server + client components

• Manager components
  – Control the work flow
  – Assign tasks to worker components

• Worker components
  – Perform the actual tasks

• Service (entity) components
  – Task description
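The manager/worker/service split can be illustrated with a small sketch in which a manager routes each service (entity) object to the worker registered for its kind. The class names echo the roles above and the SendWorker and DownloadWorker components of the client and server views, but the dispatch-by-kind design and every method name are assumptions, not the IFS code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceTask:
    """Service (entity) component: only describes a task."""
    kind: str
    payload: str


class Worker:
    """Worker component: performs the actual task."""

    def __init__(self, name: str, handler: Callable[[str], str]) -> None:
        self.name = name
        self.handler = handler

    def perform(self, task: ServiceTask) -> str:
        return self.handler(task.payload)


class Manager:
    """Manager component: controls the work flow and assigns tasks to workers."""

    def __init__(self) -> None:
        self.workers: Dict[str, Worker] = {}
        self.log: List[str] = []

    def register(self, kind: str, worker: Worker) -> None:
        self.workers[kind] = worker

    def dispatch(self, task: ServiceTask) -> str:
        worker = self.workers[task.kind]  # KeyError if no worker handles this kind
        self.log.append(f"{worker.name}:{task.kind}")
        return worker.perform(task)


manager = Manager()
manager.register("send", Worker("SendWorker", lambda p: f"sent {p}"))
manager.register("download", Worker("DownloadWorker", lambda p: f"downloaded {p}"))
print(manager.dispatch(ServiceTask("download", "AAA.mpg")))  # downloaded AAA.mpg
```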

Client View

[Figure: Client view. On the user computer, the user searches for the query "AAA.mpg". Numbered steps (1 to 12) connect the Listener, ServiceManager, QueryManager, FileAssociationManager (IsA, RunWith, ContainedIn), ConnectionManager, SendManager, SendWorker, DownloadManager, and DownloadWorker components.]

Server View

[Figure: Server view. The server sends the requested file "AAA.mpg" back to a peer. Numbered steps (1 to 12) connect the Listener, ServiceManager, HostManager, MsgManager, QueryManager, FileAssociationManager (IsA, RunWith, ContainedIn), GroupingWorker, ConnectionManager, SendManager, SendWorker, ReadWorker, DownloadWorker, and the shared directory.]

IFS Prototype Implementation

• The IFS prototype is built on top of the Gnutella Phex system

• Development environment
  – At least 25 MB of free memory
  – Java Virtual Machine
  – Pentium III 500 MHz CPU

• Event-driven methods
  – Each task is performed in response to events

• Component-based programming
  – Manager components
  – Worker components
  – Service components

IFS Query Interface

[Screenshot: IFS query interface with status "0 Returns, Searching..."]

IFS Query Interface

[Screenshot: IFS query interface with status "2 Returns, Searching..."]

Experimental Results: Dynamic Group Partition and Cache

[Figure: Number of messages (log scale, 1 to 1E+11) versus number of hosts (50, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000), with TTL = 3 and routing table size = 10% of the network. Series: Broadcasting, Grouping Only, Caching Only, Total (Grouping & Caching).]

Comparative Analysis

Measure | Napster | Gnutella | IFS
Topology | Client/server | Logical mesh | Logical mesh
Design purpose | MP3 file sharing | File sharing in a decentralized manner | Enhanced Gnutella
Size of routing table | Needs a server's IP address | O(N) | O(K), where K << N
Node join operation | O(1) | O(1) | O(1)
Node failure | Severe | Tolerable | Tolerable
Search mechanism | File indexing based on keyword search | File indexing based on keyword search | Fast reasoning based on file association rules
Description | Client/server-based P2P network; heavy traffic on servers; node failure is severe | Decentralized; heavy traffic due to exponentially increasing replicas of query messages | Decentralized; controls network traffic; flexible query mechanism

Contributions

• Proposed a conceptual framework for decentralized P2P file sharing
  – Dynamic group partition and caching
  – Query using fast reasoning
  – IP-clue mechanism (encoding/decoding)

• Designed a component-based architecture

• Implemented the framework as an extension of an existing file sharing system (Gnutella Phex)

Conclusion

• The IFS system
  – Supports decentralized P2P file sharing
  – Increases network scalability
  – Provides flexible file searching and querying
  – Protects resource sources' anonymity

Future Work

• Further research on the latency caused by grouping

• File registration strategies in heterogeneous environments

• Discovery of advanced mechanisms for reasoning about file relationships and file association rules

• Research on grouping policies
  – Grouping by peer host's network capacity
  – Grouping by interests
  – Grouping by context
  – Grouping by location

References

• C. T. Yu, W. Meng, K.-L. Liu, W. Wu, and N. Rishe. Efficient and effective metasearch for a large number of text databases. In CIKM, pages 217-224, 1999.

• Y. Lee and J. Geller. Efficient transitive closure reasoning in a combined class/part/containment hierarchy. Knowledge and Information Systems, 2002.

• S. Geffner, D. Agrawal, A. El Abbadi, and T. Smith. Browsing large digital library collections using classification hierarchies. In CIKM, pages 195-201, 1999.


• M. Waldman, A. Rubin, and L. F. Cranor. Publius: A robust, tamper-evident, censorship-resistant web publishing system. In Proc. 9th USENIX Security Symposium, pages 59-72, August 2000.

• R. J. Anderson. The Eternity service. In Proceedings of the 1st International Conference on the Theory and Applications of Cryptology (PRAGOCRYPT '96), Prague, Czech Republic, 1996.

• J. Palmer, R. Strong, and E. Upfal. Nonblocking membership protocols with asymmetric safety. Technical Report RJ10096 (91912), IBM Research Division, December 1997.


• I. Clarke, O. Sandberg, B. Wiley, and T. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Proceedings of the Workshop on Design Issues in Anonymity and Unobservability, pages 46-66, July 2000.

• D. Heimbigner. Adapting publish/subscribe middleware to achieve Gnutella-like functionality. Technical Report CU-CS-909-00, Department of Computer Science, University of Colorado, September 2000.

• P. Sarkar and J. H. Hartman. Hint-based cooperative caching. ACM Transactions on Computer Systems (TOCS), 18(4), November 2000.