NIF - NLP Interchange Format
Creating Knowledge out of Interlinked Data
LOD2 Presentation . 02.09.2010 . Page http://lod2.eu AKSW, Universität Leipzig
Sebastian Hellmann
NIF – NLP Interchange Format
Open Linguistics@OKCon 30.6.2011 http://lod2.eu
Problem:
• Currently NLP software is organized in pipelines
• Integration is done „hard-wired“
  – For each tool and each framework an adapter has to be created (n*m)
• Difficult to exchange single components
Overview:
• NLP tools can be integrated via a common output format (a common pattern in Enterprise Application Integration)
• For each tool a wrapper needs to be created that reads NIF and produces NIF
• The combination of tools can be ad hoc, i.e. it is not a pipeline that needs to be configured
• Multi-layer and overlapping annotations are possible
• Ontologies provide interfaces for each layer and for applications
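The wrapper idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the NIF reference implementation: annotations are modelled as plain (subject, predicate, object) tuples, and the string URI, the `ex:` property names, the annotation values and both wrapper functions are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Wrapper = Callable[[Graph], Graph]

# Hypothetical URI identifying the string "Semantic" in some document.
WORD = "http://example.org/doc1#offset_0_8"


def stemmer_wrapper(graph: Graph) -> Graph:
    """Hypothetical wrapper: reads the graph, adds a stem annotation."""
    return graph | {(WORD, "ex:stem", "semant")}


def pos_wrapper(graph: Graph) -> Graph:
    """Hypothetical wrapper: reads the graph, adds a part-of-speech tag."""
    return graph | {(WORD, "ex:posTag", "JJ")}


def run(wrappers: Iterable[Wrapper], graph: Graph) -> Graph:
    """Apply wrappers one after another; each only adds triples."""
    for wrap in wrappers:
        graph = wrap(graph)
    return graph


start: Graph = frozenset({(WORD, "ex:anchorOf", "Semantic")})
a = run([stemmer_wrapper, pos_wrapper], start)
b = run([pos_wrapper, stemmer_wrapper], start)
```

Because every wrapper reads and writes the same format and only adds annotations, the two orderings yield the same graph here; no adapter between the two tools is needed.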
• First Challenge: Representing Strings in RDF
  – How to give a part of a document or text an identifier (URI)?
  – What properties can such URIs have?
LOD2 Event . 06.09.2010 . Page http://lod2.eu
Example URIs for annotating „Semantic Web“
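One way to give a substring its own URI is to encode its character offsets in the fragment identifier. The sketch below assumes a made-up `#offset_<begin>_<end>` pattern and a made-up document URI and sample sentence; it illustrates the idea only and is not a statement of the exact syntax used by NIF.

```python
def offset_uri(doc_uri: str, text: str, phrase: str) -> str:
    """Build a fragment URI from the character offsets of the first match.

    The '#offset_<begin>_<end>' pattern is an assumption for illustration.
    """
    begin = text.index(phrase)   # raises ValueError if the phrase is absent
    end = begin + len(phrase)    # end offset is exclusive
    return f"{doc_uri}#offset_{begin}_{end}"


text = "The Semantic Web is an extension of the Web."
uri = offset_uri("http://example.org/doc.txt", text, "Semantic Web")
# uri == "http://example.org/doc.txt#offset_4_16"

# Offsets shift as soon as text is inserted before the phrase.
shifted = offset_uri("http://example.org/doc.txt", "Intro. " + text, "Semantic Web")
# shifted == "http://example.org/doc.txt#offset_11_23"
```

The second call shows the main weakness of pure offsets: editing the document invalidates the identifiers, which is why a URI scheme has to be chosen with care.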
• URIs are used to integrate output. RDF merges naturally if the URIs are the same (or can be converted into one another using a fixed recipe).
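A minimal sketch of why shared URIs make integration cheap, again using plain tuples as stand-in triples. The string URI, the `ex:` properties and the two tool outputs are invented for illustration.

```python
# Hypothetical URI for the string "Semantic Web" in some document.
STRING_URI = "http://example.org/doc.txt#offset_4_16"

# Output of a hypothetical part-of-speech tagger.
tagger_output = {
    (STRING_URI, "ex:anchorOf", "Semantic Web"),
    (STRING_URI, "ex:posTag", "NNP"),
}

# Output of a hypothetical entity linker run on the same document.
linker_output = {
    (STRING_URI, "ex:anchorOf", "Semantic Web"),
    (STRING_URI, "ex:linksTo", "http://dbpedia.org/resource/Semantic_Web"),
}

# Merging is a set union: identical triples collapse, and all statements
# about the same URI end up attached to one node.
merged = tagger_output | linker_output
about_string = {(p, o) for s, p, o in merged if s == STRING_URI}
```

Both tools describe the same subject URI, so the union contains three statements about one string instead of two disconnected annotations.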
• Second challenge: Output of each layer is required to be stable.
  – Components and layers can be interchanged
  – OLiA provides an ontological interface
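The role of an ontological interface can be sketched as a mapping from tool-specific tags to shared categories, so that an application does not depend on the tagset of a particular tool. The category names and the lookup table below are illustrative assumptions, not actual OLiA identifiers; "JJ" and "NN" are Penn Treebank tags, "ADJA" and "NN" are STTS tags.

```python
# Illustrative mapping from (tagset, tag) to a shared category name.
# The category strings are placeholders, not real OLiA class names.
TAG_TO_CATEGORY = {
    ("penn", "NN"): "Noun",
    ("penn", "JJ"): "Adjective",
    ("stts", "NN"): "Noun",
    ("stts", "ADJA"): "Adjective",
}


def category(tagset: str, tag: str) -> str:
    """Look up the shared category for a tool-specific tag."""
    return TAG_TO_CATEGORY.get((tagset, tag), "Unknown")


def is_adjective(tagset: str, tag: str) -> bool:
    """Application-side check that works for any mapped tagset."""
    return category(tagset, tag) == "Adjective"


english = is_adjective("penn", "JJ")
german = is_adjective("stts", "ADJA")
unmapped = category("penn", "XYZ")
```

An application written against `is_adjective` keeps working when one tagger is swapped for another, as long as the new tagset is mapped to the same categories.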
Workplan
• EU Deliverable almost finished
• Integration of SnowballStemming and the Stanford Parser
• Next step: Integration of Knowledge Extraction tools (Zemanta, DBpedia Spotlight, Alchemy, OpenCalais)
• Web service that reads NIF and outputs NIF
• Google Code Project: http://code.google.com/p/nlp2rdf/
Future
• NIF allows NLP output to be represented using knowledge representation formalisms (RDF/OWL)
• NLP output can be mixed with other knowledge (e.g. Wikipedia/DBpedia)
• Good foundation to optimize machine learning:
  – Choose the best algorithms
  – Choose the best data
Reasons for Open Data
• Horváth et al. (ILP 2009): „A Logic-Based Approach to Relation Extraction from Texts“
  – POS-Tags and Dependency Trees in First-Order-Logic
  – ILP Machine Learning Approach
• TIDES Extraction (ACE) 2003 Multilingual Training Data
  – closed licence
  – about 3000 US $
• Barrier for reproduction of results
  – Authors could send me a (p)(r)e-print, but not a copy of the benchmark™
Thank you for your attention!