NIF - NLP Interchange Format

Transcript of NIF - NLP Interchange Format

Page 1: NIF - NLP Interchange Format

Creating Knowledge out of Interlinked Data

LOD2 Presentation, 02.09.2010, http://lod2.eu

AKSW, Universität Leipzig

Sebastian Hellmann

NIF – NLP Interchange Format

Page 2: NIF - NLP Interchange Format


Open Linguistics@OKCon 30.6.2011 http://lod2.eu

Problem:
• Currently, NLP software is organized in pipelines
• Integration is done "hard-wired"
– For each tool and each framework, an adapter has to be created (n*m)
• Single components are difficult to exchange

Page 3: NIF - NLP Interchange Format

Overview:
• NLP tools can be integrated via a common output format (a common pattern in Enterprise Application Integration)
• For each tool, a wrapper needs to be created that reads NIF and produces NIF
• Tools can be combined ad hoc, i.e. there is no pipeline that needs to be configured
• Multi-layer and overlapping annotations are possible
• Ontologies provide interfaces for each layer and for applications
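To make the wrapper idea concrete, here is a minimal Python sketch, assuming a NIF document can be treated as a plain set of (subject, predicate, object) triples. The document URI, the `#char=begin,end` fragment style, the property names `isString`, `anchorOf` and `stem`, and both toy tools are hypothetical stand-ins, not the actual NIF vocabulary or any wrapper from the project.

```python
from functools import reduce
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]        # (subject, predicate, object)
Graph = Set[Triple]
Wrapper = Callable[[Graph], Graph]   # every wrapper: NIF in, NIF out

DOC = "http://example.org/doc"       # hypothetical document URI


def text_of(g: Graph) -> str:
    """Read the full document text back out of the input graph."""
    return next(o for s, p, o in g if s == DOC and p == "isString")


def tokenizer(g: Graph) -> Graph:
    """Toy tokenizer wrapper: one string resource per space-separated token."""
    text, out, start = text_of(g), set(g), 0
    for tok in text.split(" "):
        end = start + len(tok)
        out.add((f"{DOC}#char={start},{end}", "anchorOf", tok))
        start = end + 1              # skip the single separating space
    return out


def lowercaser(g: Graph) -> Graph:
    """Toy stand-in for a stemmer: annotates every anchored string it finds."""
    return set(g) | {(s, "stem", o.lower()) for s, p, o in g if p == "anchorOf"}


def combine(tools: List[Wrapper], g: Graph) -> Graph:
    """Ad-hoc combination: apply the wrappers one after another."""
    return reduce(lambda acc, tool: tool(acc), tools, g)


doc: Graph = {(DOC, "isString", "Semantic Web")}
result = combine([tokenizer, lowercaser], doc)
for triple in sorted(result):
    print(triple)
```

Neither tool calls the other; they only share the data format, so adding a third tool means writing one more NIF-in/NIF-out function rather than one adapter per tool pair. The lowercaser still needs tokens to exist, so running it first simply adds nothing.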

Page 4: NIF - NLP Interchange Format

• First challenge: representing strings in RDF
• How to give a part of a document or text an identifier (URI)?
• What properties can such URIs have?
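One possible answer to both questions, sketched in Python under stated assumptions: a substring gets a URI built from the document URI plus its character offsets (the `#char=begin,end` fragment syntax borrowed from RFC 5147), and that URI can carry its offsets and the covered text as properties. The example document URI and the property names `beginIndex`, `endIndex` and `anchorOf` are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class StringRef:
    """A substring of a document, identified by character offsets."""
    doc: str     # URI of the whole document (hypothetical)
    begin: int   # offset of the first character (0-based)
    end: int     # offset just past the last character

    def uri(self) -> str:
        # Fragment in the style of RFC 5147 text/plain fragments: "#char=begin,end"
        return f"{self.doc}#char={self.begin},{self.end}"

    def properties(self, text: str) -> Dict[str, Union[int, str]]:
        # Properties a string URI can carry; the names are illustrative assumptions
        return {
            "beginIndex": self.begin,
            "endIndex": self.end,
            "anchorOf": text[self.begin:self.end],
        }


def parse(uri: str) -> StringRef:
    """Invert uri(): recover the document URI and offsets from a string URI."""
    doc, fragment = uri.rsplit("#char=", 1)
    begin, end = (int(x) for x in fragment.split(","))
    return StringRef(doc, begin, end)


text = "The Semantic Web is a web of data."
begin = text.index("Semantic Web")
ref = StringRef("http://example.org/doc", begin, begin + len("Semantic Web"))
print(ref.uri())                 # http://example.org/doc#char=4,16
print(ref.properties(text))
print(parse(ref.uri()) == ref)   # True
```

Offset-based URIs are cheap to compute and can be turned back into the covered text, but they shift whenever text earlier in the document changes.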

Page 5: NIF - NLP Interchange Format

LOD2 Event, 06.09.2010, http://lod2.eu

Page 6: NIF - NLP Interchange Format

Example URIs for annotating "Semantic Web"

Page 7: NIF - NLP Interchange Format

• First challenge: representing strings in RDF
• How to give a part of a document or text an identifier (URI)?
• What properties can such URIs have?

Page 8: NIF - NLP Interchange Format

• URIs are used to integrate output: RDF merges naturally if the URIs are the same (or can be converted into each other using a well-defined recipe)
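The merging idea can be sketched in Python, assuming each tool's output is a set of ground triples (no blank nodes), in which case merging RDF graphs amounts to a set union. Both tools, their property names, and the second URI recipe `#offset_begin_end` are hypothetical; the conversion function illustrates what "a recipe" between two URI schemes could look like.

```python
import re
from typing import Set, Tuple

Triple = Tuple[str, str, str]
DOC = "http://example.org/doc"   # hypothetical document URI
# Text: "The Semantic Web is a web of data."  ("Semantic Web" = chars 4..16)

# Hypothetical tool A: POS tags and a chunk, using "#char=b,e" URIs
tool_a: Set[Triple] = {
    (f"{DOC}#char=4,12", "posTag", "NNP"),
    (f"{DOC}#char=13,16", "posTag", "NNP"),
    (f"{DOC}#char=4,16", "chunk", "NP"),
}

# Hypothetical tool B: an entity link, using a different recipe "#offset_b_e"
tool_b: Set[Triple] = {
    (f"{DOC}#offset_4_16", "linksTo", "http://dbpedia.org/resource/Semantic_Web"),
}

OFFSET = re.compile(r"#offset_(\d+)_(\d+)$")


def to_char_uri(uri: str) -> str:
    """Conversion recipe: rewrite '#offset_b_e' into '#char=b,e'."""
    return OFFSET.sub(lambda m: f"#char={m.group(1)},{m.group(2)}", uri)


def normalize(g: Set[Triple]) -> Set[Triple]:
    """Apply the recipe to every subject URI in a graph."""
    return {(to_char_uri(s), p, o) for s, p, o in g}


# Without blank nodes, merging RDF graphs is just the union of their triples.
merged = normalize(tool_a) | normalize(tool_b)

# Both tools' annotations now hang off the same subject URI
for triple in sorted(t for t in merged if t[0] == f"{DOC}#char=4,16"):
    print(triple)
```

Without the conversion step, the entity link would sit on a separate subject and nothing would connect it to the chunk, even though both describe the same substring.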

Page 9: NIF - NLP Interchange Format

• Second challenge: the output of each layer is required to be stable
• Components and layers can be interchanged
• OLiA provides an ontological interface
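The idea of a stable, ontology-backed layer interface can be sketched as a mapping from each tool's tagset to shared concepts, assuming two interchangeable taggers that use the Penn Treebank and the STTS tagsets. The concept names `CommonNoun` and `ProperNoun` are modeled loosely on OLiA, but the dictionaries below are a hypothetical stand-in, not OLiA's actual models or API.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpts of tagset-to-concept mappings in the spirit of OLiA.
# Real OLiA models are OWL ontologies; these few entries are illustrative only.
PENN: Dict[str, str] = {"NN": "CommonNoun", "NNS": "CommonNoun", "NNP": "ProperNoun"}
STTS: Dict[str, str] = {"NN": "CommonNoun", "NE": "ProperNoun"}

Tagged = List[Tuple[str, str]]   # (word, tag or concept)


def to_concepts(tagged: Tagged, mapping: Dict[str, str]) -> Tagged:
    """Replace each tool-specific tag by a shared concept ('Unknown' if unmapped)."""
    return [(word, mapping.get(tag, "Unknown")) for word, tag in tagged]


def proper_nouns(tagged: Tagged, mapping: Dict[str, str]) -> List[str]:
    """An 'application' that depends only on the shared concepts, not on a tagset."""
    return [w for w, c in to_concepts(tagged, mapping) if c == "ProperNoun"]


# Output of two interchangeable taggers (Penn Treebank vs. STTS tags)
english = [("Leipzig", "NNP"), ("is", "VBZ"), ("in", "IN"), ("Saxony", "NNP")]
german = [("Leipzig", "NE"), ("liegt", "VVFIN"), ("in", "APPR"), ("Sachsen", "NE")]

print(proper_nouns(english, PENN))  # ['Leipzig', 'Saxony']
print(proper_nouns(german, STTS))   # ['Leipzig', 'Sachsen']
```

Swapping the tagger only requires swapping the mapping; `proper_nouns` itself stays unchanged, which is the sense in which the layer's output is stable for applications.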

Page 10: NIF - NLP Interchange Format

Page 11: NIF - NLP Interchange Format

Page 12: NIF - NLP Interchange Format

Page 13: NIF - NLP Interchange Format

Workplan

• EU Deliverable almost finished

• Integration of SnowballStemming and the Stanford Parser

• Next step: Integration of Knowledge Extraction tools (Zemanta, DBpedia Spotlight, Alchemy, OpenCalais)

• Web services that read NIF and output NIF

• Google Code Project: http://code.google.com/p/nlp2rdf/

Page 14: NIF - NLP Interchange Format

Future

• NIF makes it possible to represent NLP output using knowledge representation formalisms (RDF/OWL)

• NLP output can be mixed with other knowledge (e.g. Wikipedia/DBpedia)

• A good foundation for optimizing machine learning:
– Choose the best algorithms
– Choose the best data

Page 15: NIF - NLP Interchange Format

Reasons for Open Data

• Horváth et al. (ILP 2009): "A Logic-Based Approach to Relation Extraction from Texts"
– POS tags and dependency trees in first-order logic
– ILP machine learning approach

• TIDES Extraction (ACE) 2003 Multilingual Training Data
– Closed licence
– About 3,000 US$

• Barrier to reproducing results
– The authors could send me a (p)(r)e-print, but not a copy of the benchmark™

Page 16: NIF - NLP Interchange Format

Thank you for your attention!