![Page 1: DataBase and Information System … on Web The term information system refers to a system of persons, data records and activities that process the data.](https://reader036.fdocuments.net/reader036/viewer/2022062408/56649f315503460f94c4c612/html5/thumbnails/1.jpg)
Databases and Information Systems on the Web (Basi di Dati e Sistemi Informativi su Web)
Prof. Massimo Ruffolo
Ing. Ermelinda Oro
UNICAL - Academic Year 2008-2009
DataBase and Information System … on Web
The term information system refers to a system of persons, data records and activities that process the data and information in an organization, and it includes the organization's manual and automated processes.
A database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model.
Querying unstructured sources
Querying unstructured sources
Structured query over an unstructured document
Extract/Select/Annotate politicianNews
From http://...
Where politicianNews(X,Y,Z),
Z:politician(name:N),
N=hillaryClinton
[Fill database uri]
This kind of query can be executed over a database or over an unstructured document; only the rewriting strategy changes.
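As a rough illustration of this remark, the following Python sketch answers the same request (news about the politician hillaryClinton) over structured records and over raw text. Everything in it is hypothetical: the record layout, the `POLITICIANS` name list and the two functions merely stand in for the slide's `politicianNews` predicate and its rewriting strategies, and are not the system's actual machinery.

```python
import re

# Hypothetical knowledge: politician identifiers and the surface names that denote them.
POLITICIANS = {"hillaryClinton": ["Hillary Clinton", "Hillary Rodham Clinton"]}

def query_database(rows, politician_id):
    """Structured source: the politician is already an explicit field."""
    return [row["title"] for row in rows if row["politician"] == politician_id]

def query_documents(texts, politician_id):
    """Unstructured source: the politician must first be recognized in the text."""
    names = "|".join(re.escape(name) for name in POLITICIANS[politician_id])
    pattern = re.compile(names)
    return [text for text in texts if pattern.search(text)]

# Made-up sample data for both kinds of source.
rows = [
    {"title": "Clinton visits Ohio", "politician": "hillaryClinton"},
    {"title": "Budget vote delayed", "politician": "johnDoe"},
]
texts = [
    "Hillary Clinton spoke in Ohio on Monday.",
    "The budget vote was delayed again.",
]

db_hits = query_database(rows, "hillaryClinton")
doc_hits = query_documents(texts, "hillaryClinton")
```

The condition is the same in both cases; what differs is how it is rewritten, as a field comparison for the database and as a recognition step for the text.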
Information extraction and Annotation
Information extraction (IE) acquires the information contained in unstructured documents and stores it in structured form.
Turning the current Web into a Semantic Web requires automatic approaches for annotating existing data, since manual annotation approaches will not scale in general. More scalable semi-automatic approaches known from ontology learning deal with the extraction of ontologies from texts (also in tabular form).
An ontology-based system for information extraction from semi-structured and unstructured Web documents
Motivations
Existing IE approaches mainly exploit the syntactic structure of information and not its actual semantics.
Much work on IE from HTML documents:
- There is no single winning approach
- Extraction rules can identify tabular information only when such a structure is explicitly declared
- The variability of the HTML language and the use of Cascading Style Sheets make classic HTML approaches not robust
Too little work on IE from PDF documents:
- No ontology-based approaches
- Existing table recognition approaches and information extraction approaches pursue distinct goals
State of the Art: Existing Approaches and Systems
Manual approaches: TSIMMIS, Minerva, W4F, XWRAP, JEDI, FLORID
Supervised approaches: SRV, RAPIER, WHISK, WIEN, STALKER, SoftMealy, NoDoSe, DEByE, LixTo
Unsupervised approaches: STAVIES, DeLa, RoadRunner, EXALG, DEPTA
NLP-oriented systems: GATE, RAPIER, SRV, WHISK, TextRunner, SnowBall
PDF-oriented approaches: Flesca et al. (Fuzzy System) 06, Gottlob et al. 06
Document Understanding techniques
HTML
PDF Document: the standard format for document publication, sharing and exchange
IE from Adobe Portable Document Format (PDF)
- One of the most widespread unstructured document formats
- PDF documents are completely unstructured and their internal encoding is visualization-oriented
- The PDF document description language represents a PDF document as a collection of 2-dimensional typographic elements contained in content streams
- Traditional wrapping/IE systems cannot be applied
Goals
Information extraction from documents by means of extraction rules that:
i. Exploit a human-oriented document representation: a 2-dimensional representation
ii. Exploit the semantics of the information represented in a Knowledge Base
iii. Directly populate (enrich) the Knowledge Base with the extracted information
iv. Handle both natural language and document structures (by exploiting an embedded table recognition approach)
v. Allow (semantic) annotation of unstructured sources to enable semantic classification and search
Proposed Approach
2-Dimensional Document Representation
Attribute Grammar-based Patterns
Knowledge Representation Formalism
Extraction rules that allow:
- To exploit the semantics represented in a Knowledge Base
- To recognize information (whether it is organized in textual or in tabular form)
- To directly store the extracted information in the Knowledge Base
2-Dimensional Document Representation
Semantics is given by position: a cell holds a value about operating revenues, obtained in the year 2007.
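To make the idea of position-given semantics concrete, here is a minimal Python sketch. The table fragment, its figures and the helper `describe_cell` are all made up for illustration; only the labels "Operating revenues" and 2007 come from the slide.

```python
# Hypothetical table fragment: headers plus a grid of cell strings (figures are invented).
column_headers = ["2006", "2007"]
row_headers = ["Operating revenues", "Operating expenses"]
cells = [
    ["1,200", "1,350"],
    ["900", "950"],
]

def describe_cell(row, col):
    """Attach meaning to a cell from the headers that share its row and its column."""
    return {
        "value": cells[row][col],
        "about": row_headers[row],
        "year": column_headers[col],
    }

# The cell in the first row, second column: a value about operating revenues in 2007.
fact = describe_cell(0, 1)
```

The cell string alone says nothing; its meaning comes from the row and column it sits in, which is why a purely sequential reading of the text loses information.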
Internal Document Representation: Input Document
2-Dimensional Document Representation: Document Portion
[Figure: document page in a coordinate system with origin (0,0) and axes X and Y]
2-Dimensional Document Representation: Document Portion
[Figure: coordinate system with origin (0,0) and axes X and Y; a portion is marked by the points (1,32) and (4,33)]
2-Dimensional Document Representation: Document Portion
Portioning Process
$$\langle v_1, v_2, \text{``Tornado Warning''} \rangle, \qquad v_1 = (1,32), \quad v_2 = (4,33)$$
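A document portion can be pictured as a rectangle, given by two corner points, together with the text it covers. The Python sketch below models the slide's example with the points (1,32) and (4,33) and the text "Tornado Warning". The class name `Portion`, the `contains` helper and the coordinates of the inner portion are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Portion:
    """A rectangular document portion: two corner points and the text it contains."""
    v1: tuple  # (x, y) of one corner
    v2: tuple  # (x, y) of the opposite corner
    text: str

    def contains(self, other):
        """True if `other` lies entirely inside this portion's rectangle."""
        (x1, y1), (x2, y2) = self.v1, self.v2
        (ox1, oy1), (ox2, oy2) = other.v1, other.v2
        return x1 <= ox1 and ox2 <= x2 and y1 <= oy1 and oy2 <= y2

# The portion from the slide, and a hypothetical smaller portion inside it.
warning = Portion((1, 32), (4, 33), "Tornado Warning")
word = Portion((1, 32), (2, 33), "Tornado")
```

Spatial relations such as containment are what allow extraction rules to reason about layout rather than about character order alone.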
Attribute Grammars
Example: math expression
E → [+ | −] T [ (+ | −) T ]*
T → F [ (* | /) F]*
F → NUM | (E)
Each symbol of the grammar has an attribute, and local attributes are used as auxiliary variables. The semantic actions then compute the value of the expression (ris is the result, segno the sign, oper the operator):
E → {double E.ris; int segno =1;} [+ | − {segno= −1;} ] T1 {E.ris=segno*T1.ris;}
[ (+ {segno=1;} | − {segno=−1;}) T2 {E.ris=E.ris+segno*T2.ris;} ]*
T → {double T.ris; int oper;} F1 {T.ris=F1.ris;} [ (* {oper=1;} | / {oper=2;} )
F2 {T.ris=(oper==1)?T.ris*F2.ris : T.ris/F2.ris;}]*
F → {double F.ris;} NUM {F.ris=NUM.val;} | (E) {F.ris=E.ris;}
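The semantic actions above can be mirrored by a small recursive-descent evaluator, in which each method returns the `ris` attribute of its nonterminal. This is only a sketch of the idea in Python: the tokenizer, the class name `Evaluator` and the error handling are assumptions, and the ASCII `-` is used in place of the slide's minus sign.

```python
import re

def tokenize(text):
    """Split an expression into numbers, operators and parentheses."""
    return re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)

class Evaluator:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def E(self):
        # E -> [+ | -] T [ (+ | -) T ]*
        segno = 1
        if self.peek() in ("+", "-"):
            segno = -1 if self.take() == "-" else 1
        ris = segno * self.T()
        while self.peek() in ("+", "-"):
            segno = -1 if self.take() == "-" else 1
            ris = ris + segno * self.T()
        return ris

    def T(self):
        # T -> F [ (* | /) F ]*
        ris = self.F()
        while self.peek() in ("*", "/"):
            oper = 1 if self.take() == "*" else 2
            f2 = self.F()
            ris = ris * f2 if oper == 1 else ris / f2
        return ris

    def F(self):
        # F -> NUM | (E)
        token = self.take()
        if token == "(":
            ris = self.E()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return ris
        if token is None or not token[0].isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)

def evaluate(text):
    """Parse the whole input as an E and return its synthesized value."""
    parser = Evaluator(text)
    ris = parser.E()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return ris
```

For example, `evaluate("-2+3*(4-1)")` walks E, T and F exactly as the grammar prescribes and combines the partial results bottom-up.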
Attribute Grammar:
$$AG = \langle \underbrace{\Pi, V_N, V_T, S}_{CFG}, Att, Func, Pred \rangle$$
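Simple Extraction Patterns: regex
Recognize a float number (digits with an optional two-digit fractional part): \d+(\.\d{2})?
Mail address: ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$
The word "città" (Italian for "city"), capitalized or not: (C|c)ittà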
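The patterns on this slide can be tried directly with Python's `re` module. The sample strings below are made up, and the use of `re.IGNORECASE` for the mail pattern is an assumption: as written, the pattern only lists uppercase letters.

```python
import re

# The three patterns from the slide, unchanged.
FLOAT = re.compile(r"\d+(\.\d{2})?")
MAIL = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)
CITTA = re.compile(r"(C|c)ittà")

# fullmatch requires the whole string to fit the pattern.
float_ok = FLOAT.fullmatch("12.50") is not None   # two decimals: accepted
float_bad = FLOAT.fullmatch("12.5") is not None   # one decimal: rejected

# The mail pattern is already anchored with ^ and $.
mail_ok = MAIL.match("info@example.org") is not None

# Find every occurrence of the word, capitalized or not, in running text.
citta_hits = [m.group(0) for m in CITTA.finditer("Città di Roma, una città antica")]
```

Note that the float pattern accepts either no fractional part or exactly two digits, so it suits amounts such as prices better than arbitrary decimal numbers.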
Knowledge Representation Formalism
[Formula: the ontology O defined as a tuple whose components include D, A, C, R and I]
place: ID, name
city: ID, name, population, inState
chicago: "Chicago", 2833321, illinois
cityClimate
city
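The classes and the instance shown on this slide can be approximated with Python dataclasses. This is a sketch under assumptions: treating `city` as a specialization of `place` is inferred from the figure (the two share ID and name), and the dictionary `knowledge_base` is a hypothetical stand-in for the actual formalism.

```python
from dataclasses import dataclass

@dataclass
class Place:
    id: str
    name: str

@dataclass
class City(Place):
    # Attributes that city adds to those inherited from place (assumed hierarchy).
    population: int
    inState: str

# The instance listed on the slide.
chicago = City(id="chicago", name="Chicago", population=2833321, inState="illinois")

# A toy knowledge base: class name mapped to the objects that populate it.
knowledge_base = {"city": [chicago]}
```

Populating a class then amounts to appending newly extracted objects to the corresponding list.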
Self-Describing/Populating Ontology (SDO)
An SDO is an ontology in which objects and classes can be equipped with a set of rules called descriptors.
Descriptors are object-oriented grammatical rules that:
- Allow recognizing and extracting objects from documents and populating classes with the newly extracted objects
- Exploit the knowledge contained in the OOKB for the extraction
- Can exploit each other to describe more complex objects
Descriptors
Class Descriptors that handle 2-D capabilities:
class weatherRecord( wCity:city, wWarns:warnings, Temp:temperature, wHumid:percentage, wPress:pressure, wDescr:weatherDescription, wWind:wind).
<weatherRecord(C,Wa,T,H,P,D,Wi)> -> <X:city()>{C:=X;} (<X:warnings()>{Wa:=X;})? <X:temperature()>{T:=X;} <X:percentage()>{H:=X;} <X:pressure()>{P:=X;}
<X:weatherDescription()>{D:=X;} <X:wind()>{Wi:=X;} 2D-BOTH.
General or Domain Specific Knowledge
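To give a feel for how such a descriptor might be applied, the Python sketch below matches a sequence of already-recognized objects against a shortened version of the weatherRecord pattern, with `warnings` as the optional step. It is an illustration only: the function `match_descriptor`, the tuple encoding and the sample values are hypothetical, and the sketch ignores the 2-dimensional directive (2D-BOTH) entirely.

```python
# Each step: (class of the object to recognize, attribute it fills, is it optional?)
WEATHER_RECORD = [
    ("city", "wCity", False),
    ("warnings", "wWarns", True),
    ("temperature", "Temp", False),
    ("percentage", "wHumid", False),
]

def match_descriptor(pattern, objects):
    """Match (class, value) objects against the pattern in order.

    Returns the filled attributes, or None if the sequence does not fit.
    """
    record, i = {}, 0
    for cls, attr, optional in pattern:
        if i < len(objects) and objects[i][0] == cls:
            record[attr] = objects[i][1]
            i += 1
        elif not optional:
            return None
    # Every object must have been consumed for the match to succeed.
    return record if i == len(objects) else None

# Made-up sequences of recognized objects.
with_warning = [("city", "chicago"), ("warnings", "Tornado Warning"),
                ("temperature", "75"), ("percentage", "40")]
without_warning = [("city", "chicago"), ("temperature", "75"), ("percentage", "40")]

full = match_descriptor(WEATHER_RECORD, with_warning)
partial = match_descriptor(WEATHER_RECORD, without_warning)
failed = match_descriptor(WEATHER_RECORD, [("temperature", "75")])
```

Because each step names a class, a step can itself be recognized by another descriptor, which is how descriptors build on each other to describe more complex objects.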
The system architecture
Attribute Transition Network (ATN) implemented as logic programs in OntoDLP Language
The system architecture
Direct use of Chart Parsing Algorithms for AG parsing
The system architecture: 2-D matcher
Direct use of Chart Parsing Algorithms for AG parsing