Data-Extraction Ontology Generation by Example Yuanqiu (Joe) Zhou Data Extraction Group Brigham Young University Sponsored by NSF
Date posted: 19-Dec-2015
Motivation

Semi-structured Web data must be extracted before they can be manipulated further.

In contrast to other wrapper-generation techniques, the BYU ontology-based data-extraction technique is resilient.

A by-example approach lets ordinary users generate ontologies easily.

Web-based System GUI

Screenshot of the system GUI; sample values shown: Canon PowerShot S40; 4.0; 1600 x 1200, 1024 x 768, 640 x 480

Architecture

Data Frame Library

User Defined Form

System GUI

Sample Pages

Ontology Generator

Extraction Engine

Test Pages

Populated Database

Extraction Ontology

Extraction Ontology

Object and Relationship Sets and Constraints

Extraction Patterns

Keywords

Context Expressions

Ontology Generation: Object and Relationship Sets and Constraints

Diagram labels: Base; A; B; C; D1, D2; E1, E2

Base [0:1] A [1:*]

Base [0:2] B [1:*]

Base [0:*] C [1:*]

Base [0:2] D1 [1:*] D2 [1:*]

Base [0:*] E1 [1:*] E2 [1:*]
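
The constraint lines on this slide follow a regular shape: an object-set name followed by a bracketed [min:max] range, repeated for each participant in the relationship set. As a minimal sketch of how such lines could be read into a data structure (the names Participation and parse_relationship are hypothetical, and reading "*" as "unbounded" is an assumption about the notation):

```python
import re
from typing import List, NamedTuple, Optional


class Participation(NamedTuple):
    """One object set in a relationship set, with its [min:max] range."""
    object_set: str
    minimum: int
    maximum: Optional[int]  # None stands for "*" (assumed to mean unbounded)


# An object-set name followed by a bracketed range such as [0:1] or [1:*].
_PARTICIPATION = re.compile(r"(\w+)\s*\[(\d+):(\d+|\*)\]")


def parse_relationship(line: str) -> List[Participation]:
    """Parse a line such as 'Base [0:2] D1 [1:*] D2 [1:*]'."""
    return [
        Participation(name, int(low), None if high == "*" else int(high))
        for name, low, high in _PARTICIPATION.findall(line)
    ]


for part in parse_relationship("Base [0:2] D1 [1:*] D2 [1:*]"):
    print(part)
```

The same function handles both the binary lines (Base with A, B, or C) and the lines with three participants (Base with D1 and D2, or with E1 and E2).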

Ontology Generation: Object and Relationship Sets and Constraints

Diagram labels: Base; A; B; B1; B2; F; G; H, I

B1, B2 : B

A [0:1] F [1:*]

B1 [0:1] G [1:*]

B2 [0:1] H [1:*] I [1:*]

Sample Web Page and User-Created Form

Digital Camera
  Brand: Canon
  Model: PowerShot G2
  CCD Resolution: 4.0
  Image Resolution: 2272 x 1074
  Zoom
    Optical Zoom: 3
    Digital Zoom: 2

Object and Relationship Sets and Constraints

DigitalCamera [-> object]
DigitalCamera [0:1] Brand [1:*]
DigitalCamera [0:1] Model [1:*]
DigitalCamera [0:1] CCDResolution [1:*]
DigitalCamera [0:1] ImageResolution [1:*]
DigitalCamera [0:1] Zoom [1:*]

Zoom [0:1] DigitalZoom [1:*]
Zoom [0:1] OpticalZoom [1:*]
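
The step from the user-created form to the constraint list can be sketched roughly as follows. This is an illustration only, not the generator's actual algorithm: the function name generate_constraints and the nested-dictionary form representation are hypothetical, and the sketch assumes every form field is single-valued, which is why each line gets [0:1] on the parent side.

```python
from typing import Dict, List, Optional

# A form is modeled as a nested dict: label -> sub-form, or None for a leaf.
Form = Dict[str, Optional[dict]]


def generate_constraints(root_label: str, fields: Form) -> List[str]:
    """Emit one relationship line per form field, parents before children."""
    root = root_label.replace(" ", "")
    lines = [f"{root} [-> object]"]
    pending = [(root, fields)]
    while pending:
        parent, children = pending.pop(0)
        for label, sub_form in children.items():
            name = label.replace(" ", "")  # "CCD Resolution" -> "CCDResolution"
            # Assumption: single-valued field, hence [0:1] for the parent.
            lines.append(f"{parent} [0:1] {name} [1:*]")
            if sub_form:
                pending.append((name, sub_form))
    return lines


camera_form = {
    "Brand": None,
    "Model": None,
    "CCD Resolution": None,
    "Image Resolution": None,
    "Zoom": {"Digital Zoom": None, "Optical Zoom": None},
}

for line in generate_constraints("Digital Camera", camera_form):
    print(line)
```

Run on the camera form, the sketch prints the same eight lines listed on this slide, with the nested Zoom fields after the top-level ones.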

Ontology Generation: Extraction Patterns

Data Frame Library
  Lexicons
  Synonym dictionaries or thesauri
  Regular expressions

Matching extraction patterns:
  Only one (bingo!)
  More than one (use extraction pattern filters)
  No matching extraction pattern (create one)
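
The three matching cases can be illustrated with a small sketch. The library contents below are invented for the example (the real data frame library is not shown in the slides), and matching_patterns and next_step are hypothetical names; the sketch only shows how a user-supplied sample value might be tested against every pattern and the outcome sorted into "only one", "more than one", or "none".

```python
import re

# Hypothetical stand-in for a data frame library: name -> regular expression.
DATA_FRAME_LIBRARY = {
    "Decimal": r"\d+\.\d+",
    "Integer": r"\d+",
    "SmallInteger": r"\d{1,2}",
    "Dimensions": r"\d+\s*x\s*\d+",
}


def matching_patterns(sample_value, library=DATA_FRAME_LIBRARY):
    """Return the names of all patterns that match the whole sample value."""
    return [name for name, pattern in library.items()
            if re.fullmatch(pattern, sample_value)]


def next_step(sample_value, library=DATA_FRAME_LIBRARY):
    """Map the number of matching patterns to the action named on the slide."""
    matches = matching_patterns(sample_value, library)
    if len(matches) == 1:
        return "use", matches       # only one: bingo
    if len(matches) > 1:
        return "filter", matches    # more than one: apply pattern filters
    return "create", matches        # none: a new pattern has to be created


print(next_step("2272 x 1074"))
print(next_step("3"))
print(next_step("PowerShot G2"))
```

How the filters choose among several matches, and how a new pattern is created, is not specified in the slides and is left out of the sketch.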

Ontology Generation: Keywords

Features a high-quality 4.0 Megapixel Resolution CCD

The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD

3 effective megapixel
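
One simple way to propose keywords from snippets like these is to look for words that every snippet shares. The sketch below assumes exactly that heuristic, which may differ from what the tool really does; candidate_keywords is a hypothetical name.

```python
import re


def candidate_keywords(snippets):
    """Return the lower-cased alphabetic words common to all snippets."""
    word_sets = [set(re.findall(r"[a-z]+", snippet.lower()))
                 for snippet in snippets]
    return sorted(set.intersection(*word_sets))


snippets = [
    "Features a high-quality 4.0 Megapixel Resolution CCD",
    "The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD",
    "3 effective megapixel",
]

print(candidate_keywords(snippets))  # ['megapixel']
```

With only the first two snippets, "ccd" and the article "a" would also survive, which suggests why more sample pages, or a stop-word list, would help narrow the candidates.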

Ontology Generation: Context Expressions

3.5x optical zoom (2.5x digital)

a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom

optical 3X /digital 6X zoom
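
In these snippets the zoom value only makes sense together with the "x" that follows it, and the word "optical" tells the optical figure apart from the digital one. A minimal sketch of that idea follows. Two things are assumptions rather than something the slides state: matching is case-insensitive, and the candidate closest to the keyword is taken as the answer. The name optical_zoom is hypothetical.

```python
import re

# Context: a one-digit number, optionally with one decimal, followed by "x".
# The captured group is the value to extract.
CONTEXT = re.compile(r"\b(\d(?:\.\d)?)x\b", re.IGNORECASE)
KEYWORD = re.compile(r"\boptical\b", re.IGNORECASE)


def optical_zoom(text):
    """Return the zoom value nearest to the keyword 'optical', or None."""
    keyword = KEYWORD.search(text)
    candidates = list(CONTEXT.finditer(text))
    if keyword is None or not candidates:
        return None

    def gap(match):
        # Character distance between the candidate and the keyword.
        return min(abs(match.start() - keyword.end()),
                   abs(keyword.start() - match.end()))

    return min(candidates, key=gap).group(1)


print(optical_zoom("3.5x optical zoom (2.5x digital)"))  # 3.5
print(optical_zoom("optical 3X /digital 6X zoom"))       # 3
```

The context expression keeps the "x" out of the extracted value while still requiring it to be present, so a bare digit elsewhere in the sentence is not mistaken for a zoom factor.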

Extraction Ontology

DigitalCamera [-> object];
DigitalCamera [0:1] Brand [1:*];
DigitalCamera [0:1] ImageResolution [1:*];
DigitalCamera [0:1] Zoom [1:*];
DigitalCamera [0:1] CCDResolution [1:*];
Zoom [0:1] OpticalZoom [1:*];

Brand matches [10]
  constant
    { extract "\bNikon\b"; },
    { extract "\bCanon\b"; },
    { extract "\bOlympus\b"; },
    { extract "\bMinolta\b"; },
    { extract "\bSony\b"; };
end;

CCDResolution matches [20]
  constant
    { extract "\b\d(\.\d{1,2})?\b"; };
  keyword "\bMegapixel\b", "\bCCD\b", "\bCCD Resolution\b";
end;

OpticalZoom matches [10]
  constant
    { extract "\b\d(\.\d)?";
      context "\b\d(\.\d)?(x)\b"; };
  keyword "\boptical\b";
end;
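
To give a feel for how the Brand and CCDResolution rules might behave on a sentence, the sketch below re-expresses them with Python's re module. It is not the BYU extraction engine: extract_record is a hypothetical function, the keywords are matched case-insensitively, and choosing the value closest to any keyword is an assumed disambiguation rule.

```python
import re

# Brand: the five constants from the ontology, as one alternation.
BRAND = re.compile(r"\b(?:Nikon|Canon|Olympus|Minolta|Sony)\b")
# CCDResolution: the extract pattern and keywords from the ontology.
CCD_VALUE = re.compile(r"\b\d(\.\d{1,2})?\b")
CCD_KEYWORD = re.compile(r"\bCCD Resolution\b|\bMegapixel\b|\bCCD\b",
                         re.IGNORECASE)


def extract_record(text):
    """Return a dict with whichever of Brand and CCDResolution were found."""
    record = {}
    brand = BRAND.search(text)
    if brand:
        record["Brand"] = brand.group(0)

    keywords = list(CCD_KEYWORD.finditer(text))
    values = list(CCD_VALUE.finditer(text))
    if keywords and values:
        def gap(value):
            # Distance from this value to the closest keyword occurrence.
            return min(min(abs(value.start() - k.end()),
                           abs(k.start() - value.end())) for k in keywords)
        record["CCDResolution"] = min(values, key=gap).group(0)
    return record


print(extract_record("The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD"))
```

Note that the model number 995 is not picked up as a resolution: the pattern allows only one digit before the optional decimal part, and the word boundaries prevent it from matching inside a longer number.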

Measurements

How much of the ontology was generated, relative to how much could have been generated?

How many of the generated components should not have been generated?

How do the precision and recall of data extracted with a system-generated ontology compare with those of an expert-generated ontology?

How many sample pages are necessary for acceptable system performance?
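
Precision and recall in this setting are the standard ratios: precision is the fraction of extracted items that are correct, and recall is the fraction of expected items that were extracted. A minimal sketch, with a hypothetical precision_recall helper and made-up field values used purely for illustration:

```python
def precision_recall(extracted, expected):
    """Return (precision, recall) for two collections of extracted facts."""
    extracted, expected = set(extracted), set(expected)
    correct = extracted & expected
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


# Illustrative values only, not results from the actual system.
expected = {("Brand", "Canon"), ("Model", "PowerShot G2"),
            ("CCDResolution", "4.0"), ("OpticalZoom", "3")}
extracted = {("Brand", "Canon"), ("CCDResolution", "4.0"),
             ("OpticalZoom", "2")}

precision, recall = precision_recall(extracted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same two ratios can be computed once with the system-generated ontology and once with the expert-generated one, which gives the comparison the third question asks for.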

Contributions

Proposes a by-example approach to semi-automatically generate data-extraction ontologies

Constructs a Web-based tool to generate data-extraction ontologies