
  • SELF ORGANIZING SEMANTIC TOPOLOGIES IN PEER DATABASE SYSTEMS

    AMI EYAL

  • RESEARCH THESIS

    SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN INFORMATION MANAGEMENT ENGINEERING

    SUBMITTED TO THE SENATE OF THE TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY

    TAMMUZ, 5767 HAIFA JUNE, 2007

  • THIS RESEARCH THESIS WAS SUPERVISED BY DR. AVIGDOR GAL UNDER THE AUSPICES OF THE INDUSTRIAL ENGINEERING AND MANAGEMENT DEPARTMENT

    ACKNOWLEDGMENT

    I would like to express my deepest gratitude to my supervisor, Professor Avigdor Gal, for his devoted guidance and wise counsel. My sincere thanks go to the faculty personnel for their help in all practical and administrative matters during my studies; special thanks are given to Judith Ish-Lev. Additional thanks go to my colleagues, Haggai, Inbal, Victor, and others, for helpful discussions, motivation, and support when I most needed it. Last and most important, I am deeply indebted to my dear family and friends, whose endless love and support enabled the completion of this work.

    THE GENEROUS FINANCIAL HELP OF THE EUROPEAN COMMISSION SIXTH FRAMEWORK IST PROJECT QUALEG AND THE TECHNION IS GRATEFULLY ACKNOWLEDGED

  • Contents

    Abstract
    List of Symbols
    1 Introduction
      1.1 Related Work
        1.1.1 Schema Matching
        1.1.2 Peer Database Systems
      1.2 Thesis Outline
      1.3 Contributions
    2 Model Definition
      2.1 The Data Model
      2.2 The Network Model
        2.2.1 Schema Mappings
        2.2.2 Query Dissemination
        2.2.3 Semantic Topology
      2.3 The Matching Model
        2.3.1 Mapping Accuracy
        2.3.2 Mapping Accuracy Preservation
      2.4 Evaluation of semantic topologies
        2.4.1 Self-Interest Based Topology Evaluation
        2.4.2 Cooperative Interest Based Topology Evaluation
    3 On Optimal Semantic Topologies
      3.1 Optimal Self-Interest Based Topologies
      3.2 Optimal Cooperative-Interest Based Topologies
        3.2.1 Degree Bounded Maximum Minimal Product Paths Tree (db-MMPT)
        3.2.2 Single Peer Single Query (SPSQ) Optimal Topology Problem
      3.3 Discussion
    4 Dynamic Self-Organizing Topologies
      4.1 Semantic Acquaintance
      4.2 Semantic Replacement
    5 Experiments
      5.1 Simulation Architecture
      5.2 Data and parameters
      5.3 Experimental setup
      5.4 Evaluation
      5.5 Results
        5.5.1 Good Initial Topologies
        5.5.2 Initial Bad Topologies
        5.5.3 Randomly Generated Topologies
      5.6 Discussion
    6 Discussion
      6.1 Conclusions
      6.2 Future Work
    References
    Hebrew Abstract

  • List of Figures

    2.1 A query reformulation example.
    2.2 DPMS model description.
    2.3 A semantic network graph, where peers' schemata are interlinked by schema mappings provided by the peers.
    2.4 Semantic Network Model: Query translation layers and a Topology with a limit of Kp = 2 neighbors.
    2.5 An example of mapping accuracy.
    2.6 An example of mapping preservation.
    2.7 An example of a query reformulation graph.
    2.8 An example of accuracy oriented semantic topology evaluation.
    3.1 Classification of the optimal CIV topology problem.
    3.2 Example of maximum minimal product paths tree (MMPT) and maximum product paths tree (MPT).
    3.3 Example of transformation from MPT to SPT.
    3.4 Example of MMPT vs. db-MMPT.
    3.5 Example of transformation from ATSP to db-MMPT.
    3.6 Example of transformation from db-MMPT to SPSQ.
    4.1 Semantically disconnected components.
    4.2 Acquaintance policies example.
    4.3 Bad replacement example.
    5.1 Simulation Model: domain, schemata, and query sets.
    5.2 Simulation Model: semantic topology and query translation layers.
    5.3 Simulation Model: sequence diagram of a single query cycle.
    5.4 Domain attributes probability for participation in peer schemas and queries.
    5.5 Attributes mapping accuracies distributions for similar and different attributes.
    5.6 Network topology: out degree vs. peer rank following power law.
    5.7 Replacement policies comparison: convergence in initial good topologies
    5.8 Replacement policies comparison: topology changes in initial good topologies
    5.9 Replacement policies comparison: SIV change in initial good topologies
    5.10 Replacement policies comparison: CIV change in initial good topologies
    5.11 Acquaintance policies comparison: convergence in initial bad topologies
    5.12 Acquaintance policies comparison: topology changes in initial bad topologies
    5.13 Acquaintance policies comparison: SIV change in initial bad topologies
    5.14 Acquaintance policies comparison: CIV change in initial bad topologies
    5.15 Acquaintance policies comparison: reachability change in initial bad topologies
    5.16 Acquaintance policies comparison: average CIV measure change in initial bad topologies
    5.17 Replacement policies comparison: convergence in initial bad topologies
    5.18 Replacement policies comparison: topology changes in initial bad topologies
    5.19 Replacement policies comparison: SIV change in initial bad topologies
    5.20 Replacement policies comparison: CIV change in initial bad topologies
    5.21 Replacement policies comparison: average CIV measure change
    5.22 Replacement policies comparison: reachability change in initial bad topologies
    5.23 Acquaintance policies comparison: convergence in randomly generated topologies
    5.24 Acquaintance policies comparison: topology changes in randomly generated topologies
    5.25 Acquaintance policies comparison: SIV change in randomly generated topologies
    5.26 Acquaintance policies comparison: CIV change in randomly generated topologies
    5.27 Acquaintance policies comparison: average CIV change in randomly generated topologies
    5.28 Acquaintance policies comparison: reachability change in randomly generated topologies
    5.29 Replacement policies comparison: convergence in randomly generated topologies