Using Suffix Arrays for Efficient Recognition of Named Entities in Large Scale
Benjamin Adrian, Sven Schwarz
http://www.dfki.de/~lastname
A huge Web of Data
The Semantic Web offers techniques for ...
● representing,
● formalizing,
● and reasoning about information
… on the WWW in order to make information ...
● transferable,
● portable,
● and interpretable
… for machine consumption.
∑ 9,363,625 distinct literal values
Wouldn't it be great to … ?
… to link entity references in text to referents in RDF graphs.
Goal: Enrich natural language text with formal facts.
Benjamin works at DFKI, Kaiserslautern.
How to recognize entity references?
→ application of relational databases and suffix arrays
[Diagram: natural language text (“Benjamin works at DFKI, Kaiserslautern.”) and an efficient representation of the RDF source]
Entity Recognition Process
Stages: text, suffix array, database, RDF graph
Steps:
● noun-phrase chunking
● prefix hashing → hashes
● query → candidates with matching prefixes
● exact match → exact matches
RDF statements
symbols:
<#19810211> <rdfs:label> “Benjamin Adrian”
<#67478302> <rdfs:label> “DFKI”
relation:
<#19810211> <#employedAt> <#67478302>
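The distinction between symbols and relations on this slide can be sketched in a few lines. This is only an illustration: the function name `split_statements` and the convention that literals are written in double quotes are assumptions, not something the slides specify.

```python
def split_statements(triples):
    """Separate statements with literal objects (symbols) from
    statements that link two resources (relations)."""
    symbols, relations = [], []
    for subject, predicate, obj in triples:
        # Assumption: a quoted object is a literal, anything else is a resource.
        if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
            symbols.append((subject, predicate, obj[1:-1]))
        else:
            relations.append((subject, predicate, obj))
    return symbols, relations


triples = [
    ("#19810211", "rdfs:label", '"Benjamin Adrian"'),
    ("#67478302", "rdfs:label", '"DFKI"'),
    ("#19810211", "#employedAt", "#67478302"),
]
symbols, relations = split_statements(triples)
```

Only the symbols carry text that can occur in a document, so only they need to be searchable by the recognition step.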
Represent RDF data
Dictionaries:
● RESOURCE INDEX (URI, INDEX)
● LITERAL INDEX (LITERAL, INDEX, HASH)
Separate storage of symbols and relations:
● SYMBOLS (SUBJECT, PREDICATE, OBJECT)
● RELATIONS (SUBJECT, PREDICATE, OBJECT)
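A possible relational layout for these tables, sketched with an in-memory SQLite database. The lower-case table and column names, the column types, the helper functions, and the use of CRC32 over the first four characters for the HASH column are all assumptions made for illustration; the slides only name the tables and their columns.

```python
import sqlite3
import zlib

# "index" is a reserved word in SQL, so the INDEX column is called idx here.
SCHEMA = """
CREATE TABLE resource_index (uri TEXT UNIQUE, idx INTEGER PRIMARY KEY);
CREATE TABLE literal_index  (literal TEXT UNIQUE, idx INTEGER PRIMARY KEY, hash INTEGER);
CREATE TABLE symbols   (subject INTEGER, predicate INTEGER, object INTEGER);
CREATE TABLE relations (subject INTEGER, predicate INTEGER, object INTEGER);
"""


def resource_id(db, uri):
    """Dictionary lookup: map a URI to a compact integer, adding it if new."""
    db.execute("INSERT OR IGNORE INTO resource_index (uri) VALUES (?)", (uri,))
    return db.execute("SELECT idx FROM resource_index WHERE uri = ?", (uri,)).fetchone()[0]


def literal_id(db, literal):
    """Dictionary lookup for literals; also stores a hash of the literal's prefix."""
    prefix_hash = zlib.crc32(literal[:4].casefold().encode("utf-8"))  # assumed hash
    db.execute("INSERT OR IGNORE INTO literal_index (literal, hash) VALUES (?, ?)",
               (literal, prefix_hash))
    return db.execute("SELECT idx FROM literal_index WHERE literal = ?", (literal,)).fetchone()[0]


def store(db, subject, predicate, obj, object_is_literal):
    """Write a statement to SYMBOLS (literal object) or RELATIONS (resource object)."""
    s, p = resource_id(db, subject), resource_id(db, predicate)
    if object_is_literal:
        db.execute("INSERT INTO symbols VALUES (?, ?, ?)", (s, p, literal_id(db, obj)))
    else:
        db.execute("INSERT INTO relations VALUES (?, ?, ?)", (s, p, resource_id(db, obj)))


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
store(db, "#19810211", "rdfs:label", "Benjamin Adrian", True)
store(db, "#67478302", "rdfs:label", "DFKI", True)
store(db, "#19810211", "#employedAt", "#67478302", False)
```

Storing integers instead of strings in SYMBOLS and RELATIONS keeps those tables small, while the two dictionaries hold each URI and literal exactly once.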
Suffix Array
Text:
“Benjamin Adrian works in DFKI, Kaiserslautern”
Suffix array (sorted list of suffixes):
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
in DFKI, Kaiserslautern
Kaiserslautern
works in DFKI, Kaiserslautern
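A minimal sketch of the word-level suffix array from this slide. It assumes whitespace tokenization and case-insensitive ordering (the slide lists “in DFKI, …” before “Kaiserslautern”, which a plain case-sensitive sort would not do). The function name `word_suffix_array` is illustrative and does not come from the slides.

```python
def word_suffix_array(text):
    """Return (tokens, offsets), where offsets are the token positions
    ordered by the suffix that starts at each position."""
    tokens = text.split()
    offsets = sorted(range(len(tokens)),
                     key=lambda i: " ".join(tokens[i:]).casefold())
    return tokens, offsets


tokens, offsets = word_suffix_array("Benjamin Adrian works in DFKI, Kaiserslautern")
suffixes = [" ".join(tokens[i:]) for i in offsets]
for suffix in suffixes:
    print(suffix)
```

Keeping only the offsets, rather than copies of the suffixes, is what makes the structure an array over the text: each suffix can be rebuilt from its starting position when needed.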
Suffix Array
Text:
“Benjamin Adrian works in DFKI, Kaiserslautern”
Suffix array (sorted list of suffixes):
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
in DFKI, Kaiserslautern
Kaiserslautern
works in DFKI, Kaiserslautern
Phrases in text:
Benjamin Adrian
DFKI
Kaiserslautern
Reduced suffix array:
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
Kaiserslautern
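The reduction can be read as keeping only those suffixes that start at a word belonging to one of the phrases. The sketch below follows that reading, which matches the example on the slide but is otherwise an assumption. The phrases are passed in by hand; a real noun-phrase chunker is outside the scope of this example, and the names `normalise` and `reduced_suffix_array` are hypothetical.

```python
import string


def normalise(token):
    """Compare tokens without surrounding punctuation and without case."""
    return token.strip(string.punctuation).casefold()


def reduced_suffix_array(text, phrases):
    """Keep only the suffixes that begin at a token covered by a phrase."""
    tokens = text.split()
    norm = [normalise(t) for t in tokens]
    keep = set()
    for phrase in phrases:
        words = [normalise(w) for w in phrase.split()]
        for start in range(len(norm) - len(words) + 1):
            if norm[start:start + len(words)] == words:
                keep.update(range(start, start + len(words)))
    offsets = sorted(keep, key=lambda i: " ".join(tokens[i:]).casefold())
    return [" ".join(tokens[i:]) for i in offsets]


reduced = reduced_suffix_array(
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    ["Benjamin Adrian", "DFKI", "Kaiserslautern"],
)
```

In this example the suffixes starting at “works” and “in” are dropped, so two of the six suffixes never have to be looked up.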
Noun phrases in natural language text
Hashing prefixes
LITERAL INDEX (LITERAL, INDEX, HASH)
Suffix array (hashed prefix size = 4):
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
Kaiserslautern
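A sketch of the prefix hashing step. The slides give the prefix size (4) but not the hash function or the unit, so two assumptions are made here: the prefix is counted in characters, and CRC32 from Python's standard library stands in for whatever hash was actually used. The point of the example is that a literal and a text suffix beginning with that literal receive the same hash.

```python
import zlib

PREFIX_SIZE = 4  # from the slide; interpreting it as characters is an assumption


def prefix_hash(text, size=PREFIX_SIZE):
    """Hash only the first `size` characters, ignoring case (CRC32 is a stand-in)."""
    return zlib.crc32(text[:size].casefold().encode("utf-8"))


reduced_suffixes = [
    "Adrian works in DFKI, Kaiserslautern",
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    "DFKI, Kaiserslautern",
    "Kaiserslautern",
]
hashed = {suffix: prefix_hash(suffix) for suffix in reduced_suffixes}

# A literal from the RDF graph hashes to the same value as a suffix that starts with it.
same = prefix_hash("Benjamin Adrian") == hashed["Benjamin Adrian works in DFKI, Kaiserslautern"]
```

One limitation of this sketch: a literal shorter than the prefix size would be hashed in full, while the matching suffix is hashed over four characters, so such literals would need separate handling.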
Select candidates from database
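Candidate selection followed by the exact match could look roughly like this. The sketch assumes a single `literal_index` table in SQLite, a CRC32 hash over the first four characters, and a word-boundary check for the exact match; none of these details are given on the slides, and the function names are hypothetical.

```python
import sqlite3
import zlib


def prefix_hash(text, size=4):
    """Assumed hash: CRC32 over the case-folded first `size` characters."""
    return zlib.crc32(text[:size].casefold().encode("utf-8"))


def build_literal_index(literals):
    """Create an in-memory table of literals with an indexed prefix hash."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE literal_index "
               "(literal TEXT, idx INTEGER PRIMARY KEY, hash INTEGER)")
    db.execute("CREATE INDEX literal_hash ON literal_index (hash)")
    db.executemany("INSERT INTO literal_index (literal, hash) VALUES (?, ?)",
                   [(lit, prefix_hash(lit)) for lit in literals])
    return db


def recognise(db, suffixes):
    """Query candidates by prefix hash, then keep only exact matches."""
    matches = []
    for suffix in suffixes:
        rows = db.execute("SELECT literal FROM literal_index WHERE hash = ?",
                          (prefix_hash(suffix),)).fetchall()
        for (literal,) in rows:
            rest = suffix[len(literal):]
            # Exact match: the suffix starts with the literal and the literal
            # ends at a word boundary.
            if suffix.startswith(literal) and (not rest or not rest[0].isalnum()):
                matches.append(literal)
    return matches


db = build_literal_index(["Benjamin Adrian", "Benjamin Franklin", "DFKI", "Kaiserslautern"])
found = recognise(db, [
    "Adrian works in DFKI, Kaiserslautern",
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    "DFKI, Kaiserslautern",
    "Kaiserslautern",
])
```

Here “Benjamin Franklin” is returned as a candidate because it shares the hashed prefix, and is then rejected by the exact match.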
Response time
Summary
Stages: text, suffix array, database, RDF graph
Steps:
● noun-phrase chunking
● prefix hashing → hashes
● query → candidates with matching prefixes
● exact match → exact matches
Thank you
Questions?
Benjamin Adrian
Sven Schwarz