SHRACK: A SELF-ORGANIZING PEER-TO-PEER SYSTEM FOR DOCUMENT SHARING AND TRACKING
by
Hathai Tanta-ngai
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
at
Dalhousie University
April 2010
DALHOUSIE UNIVERSITY
FACULTY OF COMPUTER SCIENCE
The undersigned hereby certify that they have read and recommend to
the Faculty of Graduate Studies for acceptance a thesis entitled “SHRACK: A
SELF-ORGANIZING PEER-TO-PEER SYSTEM FOR DOCUMENT SHARING
AND TRACKING” by Hathai Tanta-ngai in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
Dated: April 23, 2010
Research Supervisors: Dr. Evangelos E. Milios
Dr. Vlado Keselj
Dr. Nur Zincir-Heywood
TITLE: SHRACK: A SELF-ORGANIZING PEER-TO-PEER SYSTEM FOR DOCUMENT SHARING AND TRACKING
DEPARTMENT OR SCHOOL: Faculty of Computer Science
DEGREE: PhD CONVOCATION: October YEAR: 2010
Permission is herewith granted to Dalhousie University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions.
Signature of Author
The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author’s written permission.
The author attests that permission has been obtained for the use of any copyrighted material appearing in the thesis (other than brief excerpts requiring only proper acknowledgement in scholarly writing) and that all such use is clearly acknowledged.
my precious son, Leonard
and my beloved sister, Kamonwan
Table of Contents
Acknowledgements
1.1.2 Why Pull-Only Communication
1.2 Thesis Statement
Chapter 2 Background and Related Work
2.1 Data and Document Sharing Networks
2.2 Introduction to Peer-to-Peer Systems
2.2.1 Definition of Peer-to-Peer Networks
2.2.2 Peer-to-Peer Overlay Networks
2.3 Peer-to-Peer System for Data Sharing
2.4 Gossip Protocol
2.5.1 Blind Search
2.5.3 Shortcut Overlay
2.5.5 Community-Based Information Dissemination Networks
2.6 Re-Coll: Peer-to-Peer Document Tracking Network
2.7 Peer-to-Peer Research Collaboration on JXTA
2.8 Collaborative Filtering
2.8.2 Collaborative Filtering Applications
2.9 Summary
3.1 Problem Definition
3.2 Formal Definitions
3.3.2 Provider Peer Selection Module
3.4 Shrack Network
3.5 Peer Functionality
3.7 Summary
4.1 Information Dissemination Protocol
4.1.1 Shrack Messages
4.1.2 Pull Procedure
4.2.2 Term-Weight Document Metadata Model
4.2.3 Peer Discovery
4.3 Provider Peer Selection Module
4.3.1 Random Strategy
4.3.3 Hybrid Strategy
5.1 Authorship User Interest Model
5.2 Performance Evaluation Metrics
5.2.1 Quality of Received Documents
5.2.2 Dissemination Speed and Distance
5.2.3 Self-Organizing Network Property
5.3 ShrackSim: A Shrack Simulator
5.3.1 An Overview of PeerSim
5.3.2 PeerSim Event-Based Simulation
5.3.3 Life Cycle of a PeerSim Event-Based Simulation
5.3.4 Main Components of ShrackSim
5.4 Summary
6.1 Experiment Hypotheses
6.3 Parameter Definitions
6.4 Experimental Design
6.5.1 Experiment Setup
6.5.2 Experiment Results
7.1 Experiment Hypotheses and Road Map
7.2 Performance Metrics
7.3 Dataset Preparation
7.3.1 Simulated Users
7.4 Experimental Setup
7.5.1 Experiment Results
7.6.1 Experiment Results
7.7.1 Experiment Results
8.1 Experiment Setup and Performance Metrics
8.2 Quality of Received Documents
8.2.1 Experiment Results
8.3.1 Experiment Results
8.4.1 Experiment Results
9.2 Quality of Received Documents
9.2.1 Experiment Results
9.3.1 Experiment Results
9.5.1 Experiment Results
10.1.4 Simulation Environment
10.1.5 Evaluation Methodology
10.2 Future Work
Appendix A Statistical Results of ANOVA Tests on Self-Organizing Networks with Unlimited TTL
Appendix B Statistical Results of ANOVA Tests on Self-Organizing Networks with Limited TTL
Appendix C Statistical Results of ANOVA Tests on the Effect of TTL
Appendix D ShrackSim: A Shrack Simulator
D.1 Main Components of ShrackSim
D.1.1 ShrackNode and Related Components
D.1.2 Control Classes
D.2 The Configuration File
D.3 Running and Evaluating Experiments
List of Tables
5.1 An example of an authorship user interest model
6.1 Performance metrics for experiments on scalability of the Shrack dissemination protocol
6.2 Input parameters for experiments on scalability of the Shrack dissemination protocol
6.3 Parameter setup for experiments on the scalability of the Shrack dissemination protocol in the uniform model
6.4 The pull load and new messages per pull response of the Shrack dissemination protocol on a uniform model as a function of network size
6.5 Parameter setup for experiments on scalability of the Shrack dissemination protocol in the super-peer model
6.6 The average pull load observed by normal peers in super-peer models
6.7 The average new messages observed by normal peers in super-peer models
6.8 Comparison of pull load, new messages per pull response, and message overhead of super peers with a fixed number of super peers
6.9 Comparison of pull load, new messages per pull response, and message overhead of super peers with a varying number of super peers
6.10 The number of super peers in the network with fixed-super-peer and varying-super-peer configurations
7.1 The road map of experiments on self-organizing Shrack networks
7.2 Summary of performance metrics for the quality of received documents, the dissemination speed and distance, and the self-organizing network property
7.3 The number of users and documents in each subclass
7.4 Experimental parameter setup for self-organizing Shrack networks
7.5 Notation for provider peer selection strategies with item-based or term-based profile representations
9.1 Experimental parameter setup to study the effect of Time-To-Live (TTL)
9.2 Notation for the analysis of the effects of TTL
B.1 Tests of Between-Subjects Effects; Dependent Variable: Precision
B.2 Tests of Between-Subjects Effects; Dependent Variable: Recall
B.3 Tests of Between-Subjects Effects; Dependent Variable: F-score
B.4 Tests of Between-Subjects Effects; Dependent Variable: Relevant Pull Delay
B.5 Tests of Between-Subjects Effects; Dependent Variable: Relevant Path Length
B.6 Tests of Between-Subjects Effects; Dependent Variable: CCO
B.7 Tests of Between-Subjects Effects; Dependent Variable: CPL
C.1 Tests of Between-Subjects Effects; Dependent Variable: Precision
C.2 Tests of Between-Subjects Effects; Dependent Variable: Recall
C.3 Tests of Between-Subjects Effects; Dependent Variable: F-score
C.4 Tests of Between-Subjects Effects; Dependent Variable: Relevant Pull Delay
C.5 Tests of Between-Subjects Effects; Dependent Variable: Relevant Path Length
C.6 Tests of Between-Subjects Effects; Dependent Variable: CCO
C.7 Tests of Between-Subjects Effects; Dependent Variable: CPL
C.8 Tests of Between-Subjects Effects; Dependent Variable: Pull Load
List of Figures
3.2 Shrack network
3.3 Peer p1 publishes a document d, and its document metadata is disseminated to peer p2
4.1 Shrack pull request and pull response
4.2 The prototype of the knowledge integrator module
5.1 A flow chart of the PeerSim simulation life cycle …