Technologies du Web Master COMASIC Information Extraction ...€¦ · Technologies du Web Master...

35
Introduction Information Extraction Semantic Web Technologies du Web Master COMASIC Information Extraction and the Semantic Web Antoine Amarilli 1 December 2, 2014 1 Course material adapted from Fabian Suchanek’s slides: http://suchanek.name/work/teaching/IESW2010.pdf. SPARQL example from: en.wikipedia.org/w/index.php?title=SPARQL&oldid=575552762. Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/ 1/31

Transcript of Technologies du Web Master COMASIC Information Extraction ...€¦ · Technologies du Web Master...

Introduction Information Extraction Semantic Web

Technologies du WebMaster COMASIC

Information Extraction and the Semantic Web

Antoine Amarilli1

December 2, 2014

1Course material adapted from Fabian Suchanek’s slides:http://suchanek.name/work/teaching/IESW2010.pdf.SPARQL example from:en.wikipedia.org/w/index.php?title=SPARQL&oldid=575552762.Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer,Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

1/31

Introduction Information Extraction Semantic Web

Motivation

We have seen how search engines work at the level of words.Sometimes, this works...

Sometimes, it doesn’t:

Those hard queries would be easy on RDBMSes!→ We need to extract structured information.→ We would like to understand its semantics.

2/31

Introduction Information Extraction Semantic Web

Other motivations: job offeringsMotivation: Examples

Title Type Location

Business strategy Associate Part time Palo Alto, CA

Registered Nurse Full time Los Angeles

... ... 8

3/31

Introduction Information Extraction Semantic Web

Other motivations: scientific papersMotivation: Examples

Author Publication Year

Grishman Information Extraction... 2006

... ... ... 9

4/31

Introduction Information Extraction Semantic Web

Other motivations: price comparisonMotivation: Examples

Product Type Price

Dynex 32” LCD TV $1000

... ... 10

5/31

Introduction Information Extraction Semantic Web

Table of contents

1 Introduction

2 Information Extraction

3 Semantic Web

6/31

Introduction Information Extraction Semantic Web

Roadmap Information Extraction

Instance Extraction

Fact Extraction

Ontological Information Extraction

and beyond Information Extraction (IE) is the process of extracting structured information (e.g., database tables) from unstructured machine-readable documents (e.g., Web documents).

Person Nationality

Elvis Presley American

nationality

11

Elvis Presley singer

Angela Merkel politician

7/31

Introduction Information Extraction Semantic Web

Instance extraction with Hearst patterns

Entities can be extracted and categorized by automaticextraction of simple patterns:

Many scientists, including Einstein, believed...France, Germany and other countries have been plagued with...Other forms of government such as constitutional monarchy...

Difficulties:→ Must parse correctly.→ Must be resilient to noise.

8/31

Introduction Information Extraction Semantic Web

Instance extraction with set expansion

Start with a seed set of entities of a certain type.Find occurrences of them at specific position in documents:

Lists.Table columns.

Assume that other items are other entities of the same nature.→ Once again, this is noisy...

→ Precision and recall, see previous slides.

9/31

Introduction Information Extraction Semantic Web

Set expansion exampleInstance Extraction: Set Expansion Seed set: {Russia, USA, Australia}

Result set: {Russia, Canada, China, USA, Brazil, Australia, India, Argentina,Kazakhstan, Sudan}

17

10/31

Introduction Information Extraction Semantic Web

Fact extraction with wrapper inductionFact Extraction: Wrapper Induction Observation: On Web pages of a certain domain, the information is often in the same spot.

28

11/31

Introduction Information Extraction Semantic Web

Specifying a wrapper

A wrapper can be expressed:As a path in the DOM (usually XPath).Extensions to multiple pages, e.g., OXPath.As a regular expression.

A wrapper can be produced:Through manual annotation of the relevant fields.Using specific knowledge of the source.→ Wikipedia categories and infoboxes.

By comparison between similar pages to find what changed.Using seed pairs (known facts).

Possibility to iterate between patterns and facts.→ Risk of semantic drift.

12/31

Introduction Information Extraction Semantic Web

Fact extraction on text

Entity extraction:→ find entities in the text.

Named entity recognition:→ identify the type of entities.

→ person→ organization→ quantity→ address→ etc.

NLP patterns to extract facts→ POS patterns→ Parse trees.

13/31

Introduction Information Extraction Semantic Web

Fact extraction on textFact Extraction: Pattern Matching

Einstein ha scoperto il K68, quando aveva 4 anni.

Bohr ha scoperto il K69 nel anno 1960.

Person Discovery

Einstein K68

X ha scoperto il Y

Person Discovery

Bohr K69

The patterns can be more complex, e.g. •  regular expressions X discovered the .{0,20} Y •  POS patterns X discovered the ADJ? Y •  Parse trees

X discovered Y

PN

NP S

VP

V PN

NP

34

Try

14/31

Introduction Information Extraction Semantic Web

Ontologies

Ontology: a set of entities and relations.

Name Ment. Mfact Domain Publisher Start Update

YAGO >10 >120 general MPI 2008 2012DBPedia2 4.6 3 general OpenLink et al 2007 2014Wikidata3 15 30 general Wikimedia 2012 2014Freebase4 46 2 680 general Metaweb (Google) 2007 2014Knowl. Vault5 >570? >1 600 general Google 2014 2014MusicBrainz6 >35 >180 music MetaBrainz 2003 2013WordNet 0.4 2 English Princeton 1985 2006ConceptNet7 5.3 13 obvious MIT 2000 2014

2http://blog.dbpedia.org/2014/09/09/dbpedia-version-2014-released/3http://cacm.acm.org/magazines/2014/10/178785-wikidata/fulltext#body-94http://freebase.com/5http://www.newscientist.com/article/mg22329832.

700-googles-factchecking-bots-build-vast-knowledge-bank.html6https://musicbrainz.org/statistics7http://conceptnet5.media.mit.edu/downloads/

15/31

Introduction Information Extraction Semantic Web

Entity disambiguation

Map mentions in text to entities.Problem: mentions are ambiguous!→ Use the importance of entities.→ Use the likelihood that a term refers to an entity.→ Use semantic consistency on the mappings of a document.

(Demo: https://gate.d5.mpi-inf.mpg.de/webaida/)

16/31

Introduction Information Extraction Semantic Web

Entity disambiguation

Map mentions in text to entities.Problem: mentions are ambiguous!→ Use the importance of entities.→ Use the likelihood that a term refers to an entity.→ Use semantic consistency on the mappings of a document.

(Demo: https://gate.d5.mpi-inf.mpg.de/webaida/)

16/31

Introduction Information Extraction Semantic Web

Ontological IE

Use the existing ontology as reference.Extract information from additional documents.Use fuzzy rules to extend the ontology (often manual):→ Extraction rules→ Logical constraints→ Common sense

Reasoning:→ Datalog→ Weighted MAX-SAT→ Markov Logic Networks

17/31

Introduction Information Extraction Semantic Web

Ontological IE

Ontology

NL documents Elvis was born in 1935

Semantic Rules

birthdate<deathdate

type(Elvis_Presley,singer) subclassof(singer,person) ...

appears(“Elvis”,”was born in”, ”1935”) ... means(“Elvis”,Elvis_Presley,0.8) means(“Elvis”,Elvis_Costello,0.2) ...

born(X,Y) & died(X,Z) => Y<Z appears(A,P,B) & R(A,B) => expresses(P,R) appears(A,P,B) & expresses(P,R) => R(A,B) ...

First Order Logic Formulae

1935 born

New facts with 1. canonic relations 2. canonic entities 3. consistency with the ontology

Ontological IE: Reasoning

SOFIE system

18/31

Introduction Information Extraction Semantic Web

Open IE

Use the entire Web as corpus.Crawl the Web for new facts.Create new rules from what is extracted.Examples:

Open IE, University of Washington.

→ Demo: http://openie.cs.washington.edu/Read the Web, CMU.→ Demo: http://rtw.ml.cmu.edu/rtw/

19/31

Introduction Information Extraction Semantic Web

Open IE

Use the entire Web as corpus.Crawl the Web for new facts.Create new rules from what is extracted.Examples:

Open IE, University of Washington.→ Demo: http://openie.cs.washington.edu/

Read the Web, CMU.→ Demo: http://rtw.ml.cmu.edu/rtw/

19/31

Introduction Information Extraction Semantic Web

Open IE

Use the entire Web as corpus.Crawl the Web for new facts.Create new rules from what is extracted.Examples:

Open IE, University of Washington.→ Demo: http://openie.cs.washington.edu/

Read the Web, CMU.→ Demo: http://rtw.ml.cmu.edu/rtw/

19/31

Introduction Information Extraction Semantic Web

Table of contents

1 Introduction

2 Information Extraction

3 Semantic Web

20/31

Introduction Information Extraction Semantic Web

Motivation

Having structured data is nice.However, independent sources are not useful.We need to create links between data sources.→ Run a query across multiple relevant data stores.→ Perform complex transactions (booking a flight, a hotel...).→ Rich data visualization (integrating e.g. maps and statistics).

We need to define the semantics of the data.We need to enforce constraints.We need to evaluate complex queries over multiple sources.

21/31

Introduction Information Extraction Semantic Web

The Semantic Web

URIs Globally unique identification of entities and relations.OWL Constraint language over structured data.RDF Storage format for structured data.

SPARQL Query language for structured data.LOD Linked Open Data: draw links between data sources.

22/31

Introduction Information Extraction Semantic Web

URIs

Uniform Resource IdentifierLike URLs.Not always dereferenceable.URNs: urn:isbn:0486415864URLs, often with namespaces:→ dbp:Paris for http://dbpedia.org/resource/Paris

23/31

Introduction Information Extraction Semantic Web

RDF

Resource Description Framework.Triples: Subject predicate object.<dbp:Paris> <dbp:country> <dbp:France>→ The country of Paris (DBPedia resources).

<dbp:Paris> <foaf:homepage> <http://www.paris.fr>→ The homepage of Paris (FOAF relation, website).

<dbp:Paris> <foaf:name> "Paris"@en→ The name of Paris (FOAF relation, literal value).

Multiple serializations.

24/31

Introduction Information Extraction Semantic Web

RDFS

RDF Schema.<dbp:Paris> <rdf:type> <dbp:Settlement>→ Paris is a settlement.

<dbp:Settlement> <rdfs:subclassOf> <dbp:Place>→ If I am a Settlement then I am a Place.

<dbp:writer> <rdfs:subPropertyOf> <y:created>→ If you are the writer of something (for DBPedia)

then you are the creator of that thing (for YAGO).

25/31

Introduction Information Extraction Semantic Web

OWL

Ontology Web Language.<dbp:birthPlace> <rdf:type> <owl:FunctionalProperty>→ People are born in at most one place.

<dbp:Person> <owl:disjointWith> <dbp:Settlement>→ Something cannot be both a Person and a Settlement.

<schema:spouse> <owl:equivalentProperty> <dbp:spouse>→ spouse in Schema.org in DBPedia are equivalent properties.

<myonto:p4242> <owl:sameAs> <dbp:Douglas_Adams>→ Assert equalities between resources.

26/31

Introduction Information Extraction Semantic Web

SPARQL

SPARQL Protocol And RDF Query Language.Query language for RDF.PREFIX abc: <http://example.com/exampleOntology#>SELECT ?capital ?countryWHERE {

?x abc:cityname ?capital ;abc:isCapitalOf ?y .

?y abc:countryname ?country ;abc:isInContinent abc:Africa .

}

(Demo: http://dbpedia-live.openlinksw.com/sparql/)

27/31

Introduction Information Extraction Semantic Web

SPARQL

SPARQL Protocol And RDF Query Language.Query language for RDF.PREFIX abc: <http://example.com/exampleOntology#>SELECT ?capital ?countryWHERE {

?x abc:cityname ?capital ;abc:isCapitalOf ?y .

?y abc:countryname ?country ;abc:isInContinent abc:Africa .

}

(Demo: http://dbpedia-live.openlinksw.com/sparql/)

27/31

Introduction Information Extraction Semantic Web

Linked Open Data

Linked Datasets as of August 2014

Uniprot

AlexandriaDigital Library

Gazetteer

lobidOrganizations

chem2bio2rdf

MultimediaLab University

Ghent

Open DataEcuador

GeoEcuador

Serendipity

UTPLLOD

GovAgriBusDenmark

DBpedialive

URIBurner

Linguistics

Social Networking

Life Sciences

Cross-Domain

Government

User-Generated Content

Publications

Geographic

Media

Identifiers

EionetRDF

lobidResources

WiktionaryDBpedia

Viaf

Umthes

RKBExplorer

Courseware

Opencyc

Olia

Gem.Thesaurus

AudiovisueleArchieven

DiseasomeFU-Berlin

Eurovocin

SKOS

DNBGND

Cornetto

Bio2RDFPubmed

Bio2RDFNDC

Bio2RDFMesh

IDS

OntosNewsPortal

AEMET

ineverycrea

LinkedUser

Feedback

MuseosEspaniaGNOSS

Europeana

NomenclatorAsturias

Red UnoInternacional

GNOSS

GeoWordnet

Bio2RDFHGNC

CticPublic

Dataset

Bio2RDFHomologene

Bio2RDFAffymetrix

MuninnWorld War I

CKAN

GovernmentWeb Integration

forLinkedData

Universidadde CuencaLinkeddata

Freebase

Linklion

Ariadne

OrganicEdunet

GeneExpressionAtlas RDF

ChemblRDF

BiosamplesRDF

IdentifiersOrg

BiomodelsRDF

ReactomeRDF

Disgenet

SemanticQuran

IATI asLinked Data

DutchShips and

Sailors

Verrijktkoninkrijk

IServe

Arago-dbpedia

LinkedTCGA

ABS270a.info

RDFLicense

EnvironmentalApplications

ReferenceThesaurus

Thist

JudaicaLink

BPR

OCD

ShoahVictimsNames

Reload

Data forTourists in

Castilla y Leon

2001SpanishCensusto RDF

RKBExplorer

Webscience

RKBExplorerEprintsHarvest

NVS

EU AgenciesBodies

EPO

LinkedNUTS

RKBExplorer

Epsrc

OpenMobile

Network

RKBExplorerLisbon

RKBExplorer

Italy

CE4R

EnvironmentAgency

Bathing WaterQuality

RKBExplorerKaunas

OpenData

Thesaurus

RKBExplorerWordnet

RKBExplorer

ECS

AustrianSki

Racers

Social-semweb

Thesaurus

DataOpenAc Uk

RKBExplorer

IEEE

RKBExplorer

LAAS

RKBExplorer

Wiki

RKBExplorer

JISC

RKBExplorerEprints

RKBExplorer

Pisa

RKBExplorer

Darmstadt

RKBExplorerunlocode

RKBExplorer

Newcastle

RKBExplorer

OS

RKBExplorer

Curriculum

RKBExplorer

Resex

RKBExplorer

Roma

RKBExplorerEurecom

RKBExplorer

IBM

RKBExplorer

NSF

RKBExplorer

kisti

RKBExplorer

DBLP

RKBExplorer

ACM

RKBExplorerCiteseer

RKBExplorer

Southampton

RKBExplorerDeepblue

RKBExplorerDeploy

RKBExplorer

Risks

RKBExplorer

ERA

RKBExplorer

OAI

RKBExplorer

FT

RKBExplorer

Ulm

RKBExplorer

Irit

RKBExplorerRAE2001

RKBExplorer

Dotac

RKBExplorerBudapest

SwedishOpen Cultural

Heritage

Radatana

CourtsThesaurus

GermanLabor LawThesaurus

GovUKTransport

Data

GovUKEducation

Data

EnaktingMortality

EnaktingEnergy

EnaktingCrime

EnaktingPopulation

EnaktingCO2Emission

EnaktingNHS

RKBExplorer

Crime

RKBExplorercordis

Govtrack

GeologicalSurvey of

AustriaThesaurus

GeoLinkedData

GesisThesoz

Bio2RDFPharmgkb

Bio2RDFSabiorkBio2RDF

Ncbigene

Bio2RDFIrefindex

Bio2RDFIproclass

Bio2RDFGOA

Bio2RDFDrugbank

Bio2RDFCTD

Bio2RDFBiomodels

Bio2RDFDBSNP

Bio2RDFClinicaltrials

Bio2RDFLSR

Bio2RDFOrphanet

Bio2RDFWormbase

BIS270a.info

DM2E

DBpediaPT

DBpediaES

DBpediaCS

DBnary

AlpinoRDF

YAGO

PdevLemon

Lemonuby

Isocat

Ietflang

Core

KUPKB

GettyAAT

SemanticWeb

Journal

OpenlinkSWDataspaces

MyOpenlinkDataspaces

Jugem

Typepad

AspireHarperAdams

NBNResolving

Worldcat

Bio2RDF

Bio2RDFECO

Taxon-conceptAssets

Indymedia

GovUKSocietal

WellbeingDeprivation imd

EmploymentRank La 2010

GNULicenses

GreekWordnet

DBpedia

CIPFA

Yso.fiAllars

Glottolog

StatusNetBonifaz

StatusNetshnoulle

Revyu

StatusNetKathryl

ChargingStations

AspireUCL

Tekord

Didactalia

ArtenueVosmedios

GNOSS

LinkedCrunchbase

ESDStandards

VIVOUniversityof Florida

Bio2RDFSGD

Resources

ProductOntology

DatosBne.es

StatusNetMrblog

Bio2RDFDataset

EUNIS

GovUKHousingMarket

LCSH

GovUKTransparencyImpact ind.Households

In temp.Accom.

UniprotKB

StatusNetTimttmy

SemanticWeb

Grundlagen

GovUKInput ind.

Local AuthorityFunding FromGovernment

Grant

StatusNetFcestrada

JITA

StatusNetSomsants

StatusNetIlikefreedom

DrugbankFU-Berlin

Semanlink

StatusNetDtdns

StatusNetStatus.net

DCSSheffield

AtheliaRFID

StatusNetTekk

ListaEncabezaMientosMateria

StatusNetFragdev

Morelab

DBTuneJohn PeelSessions

RDFizelast.fm

OpenData

Euskadi

GovUKTransparency

Input ind.Local auth.Funding f.

Gvmnt. Grant

MSC

Lexinfo

StatusNetEquestriarp

Asn.us

GovUKSocietal

WellbeingDeprivation ImdHealth Rank la

2010

StatusNetMacno

OceandrillingBorehole

AspireQmul

GovUKImpact

IndicatorsPlanning

ApplicationsGranted

Loius

Datahub.io

StatusNetMaymay

Prospectsand

TrendsGNOSS

GovUKTransparency

Impact IndicatorsEnergy Efficiency

new Builds

DBpediaEU

Bio2RDFTaxon

StatusNetTschlotfeldt

JamendoDBTune

AspireNTU

GovUKSocietal

WellbeingDeprivation Imd

Health Score2010

LoticoGNOSS

UniprotMetadata

LinkedEurostat

AspireSussex

Lexvo

LinkedGeoData

StatusNetSpip

SORS

GovUKHomeless-

nessAccept. per

1000

TWCIEEEvis

AspireBrunel

PlanetDataProject

Wiki

StatusNetFreelish

Statisticsdata.gov.uk

StatusNetMulestable

Enipedia

UKLegislation

API

LinkedMDB

StatusNetQth

SiderFU-Berlin

DBpediaDE

GovUKHouseholds

Social lettingsGeneral Needs

Lettings PrpNumber

Bedrooms

AgrovocSkos

MyExperiment

ProyectoApadrina

GovUKImd CrimeRank 2010

SISVU

GovUKSocietal

WellbeingDeprivation ImdHousing Rank la

2010

StatusNetUni

Siegen

OpendataScotland Simd

EducationRank

StatusNetKaimi

GovUKHouseholds

Accommodatedper 1000

StatusNetPlanetlibre

DBpediaEL

SztakiLOD

DBpediaLite

DrugInteractionKnowledge

BaseStatusNet

Qdnx

AmsterdamMuseum

AS EDN LOD

RDFOhloh

DBTuneartistslast.fm

AspireUclan

HellenicFire Brigade

Bibsonomy

NottinghamTrent

ResourceLists

OpendataScotland SimdIncome Rank

RandomnessGuide

London

OpendataScotland

Simd HealthRank

SouthamptonECS Eprints

FRB270a.info

StatusNetSebseb01

StatusNetBka

ESDToolkit

HellenicPolice

StatusNetCed117

OpenEnergy

Info Wiki

StatusNetLydiastench

OpenDataRISP

Taxon-concept

Occurences

Bio2RDFSGD

UIS270a.info

NYTimesLinked Open

Data

AspireKeele

GovUKHouseholdsProjectionsPopulation

W3C

OpendataScotland

Simd HousingRank

ZDB

StatusNet1w6

StatusNetAlexandre

Franke

DeweyDecimal

Classification

StatusNetStatus

StatusNetdoomicile

CurrencyDesignators

StatusNetHiico

LinkedEdgar

GovUKHouseholds

2008

DOI

StatusNetPandaid

BrazilianPoliticians

NHSJargon

Theses.fr

LinkedLifeData

Semantic WebDogFood

UMBEL

OpenlyLocal

StatusNetSsweeny

LinkedFood

InteractiveMaps

GNOSS

OECD270a.info

Sudoc.fr

GreenCompetitive-

nessGNOSS

StatusNetIntegralblue

WOLD

LinkedStockIndex

Apache

KDATA

LinkedOpenPiracy

GovUKSocietal

WellbeingDeprv. ImdEmpl. Rank

La 2010

BBCMusic

StatusNetQuitter

StatusNetScoffoni

OpenElection

DataProject

Referencedata.gov.uk

StatusNetJonkman

ProjectGutenbergFU-BerlinDBTropes

StatusNetSpraci

Libris

ECB270a.info

StatusNetThelovebug

Icane

GreekAdministrative

Geography

Bio2RDFOMIM

StatusNetOrangeseeds

NationalDiet Library

WEB NDLAuthorities

UniprotTaxonomy

DBpediaNL

L3SDBLP

FAOGeopolitical

Ontology

GovUKImpact

IndicatorsHousing Starts

DeutscheBiographie

StatusNetldnfai

StatusNetKeuser

StatusNetRusswurm

GovUK SocietalWellbeing

Deprivation ImdCrime Rank 2010

GovUKImd Income

Rank La2010

StatusNetDatenfahrt

StatusNetImirhil

Southamptonac.uk

LOD2Project

Wiki

DBpediaKO

DailymedFU-Berlin

WALS

DBpediaIT

StatusNetRecit

Livejournal

StatusNetExdc

Elviajero

Aves3D

OpenCalais

ZaragozaTurruta

AspireManchester

Wordnet(VU)

GovUKTransparency

Impact IndicatorsNeighbourhood

Plans

StatusNetDavid

Haberthuer

B3Kat

PubBielefeld

Prefix.cc

NALT

Vulnera-pedia

GovUKImpact

IndicatorsAffordable

Housing Starts

GovUKWellbeing lsoa

HappyYesterday

Mean

FlickrWrappr

Yso.fiYSA

OpenLibrary

AspirePlymouth

StatusNetJohndrink

Water

StatusNetGomertronic

Tags2conDelicious

StatusNettl1n

StatusNetProgval

Testee

WorldFactbookFU-Berlin

DBpediaJA

StatusNetCooleysekula

ProductDB

IMF270a.info

StatusNetPostblue

StatusNetSkilledtests

NextwebGNOSS

EurostatFU-Berlin

GovUKHouseholds

Social LettingsGeneral Needs

Lettings PrpHousehold

Composition

StatusNetFcac

DWSGroup

OpendataScotland

GraphSimd Rank

DNB

CleanEnergyData

Reegle

OpendataScotland SimdEmployment

Rank

ChroniclingAmerica

GovUKSocietal

WellbeingDeprivation

Imd Rank 2010

StatusNetBelfalas

AspireMMU

StatusNetLegadolibre

BlukBNB

StatusNetLebsanft

GADMGeovocab

GovUKImd Score

2010

SemanticXBRL

UKPostcodes

GeoNames

EEARodAspire

Roehampton

BFS270a.info

CameraDeputatiLinkedData

Bio2RDFGeneID

GovUKTransparency

Impact IndicatorsPlanning

ApplicationsGranted

StatusNetSweetie

Belle

O'Reilly

GNI

CityLichfield

GovUKImd

Rank 2010

BibleOntology

Idref.fr

StatusNetAtari

Frosch

Dev8d

NobelPrizes

StatusNetSoucy

ArchiveshubLinkedData

LinkedRailway

DataProject

FAO270a.info

GovUKWellbeing

WorthwhileMean

Bibbase

Semantic-web.org

BritishMuseum

Collection

GovUKDev LocalAuthorityServices

CodeHaus

Lingvoj

OrdnanceSurveyLinkedData

Wordpress

EurostatRDF

StatusNetKenzoid

GEMET

GovUKSocietal

WellbeingDeprv. imdScore '10

MisMuseosGNOSS

GovUKHouseholdsProjections

totalHouseolds

StatusNet20100

EEA

CiardRing

OpendataScotland Graph

EducationPupils by

School andDatazone

VIVOIndiana

University

Pokepedia

Transparency270a.info

StatusNetGlou

GovUKHomelessness

HouseholdsAccommodated

TemporaryHousing Types

STWThesaurus

forEconomics

DebianPackageTrackingSystem

DBTuneMagnatune

NUTSGeo-vocab

GovUKSocietal

WellbeingDeprivation ImdIncome Rank La

2010

BBCWildlifeFinder

StatusNetMystatus

MiguiadEviajesGNOSS

AcornSat

DataBnf.fr

GovUKimd env.

rank 2010

StatusNetOpensimchat

OpenFoodFacts

GovUKSocietal

WellbeingDeprivation Imd

Education Rank La2010

LODACBDLS

FOAF-Profiles

StatusNetSamnoble

GovUKTransparency

Impact IndicatorsAffordable

Housing Starts

StatusNetCoreyavisEnel

Shops

DBpediaFR

StatusNetRainbowdash

StatusNetMamalibre

PrincetonLibrary

Findingaids

WWWFoundation

Bio2RDFOMIM

Resources

OpendataScotland Simd

GeographicAccess Rank

Gutenberg

StatusNetOtbm

ODCLSOA

StatusNetOurcoffs

Colinda

WebNmasunoTraveler

StatusNetHackerposse

LOV

GarnicaPlywood

GovUKwellb. happy

yesterdaystd. dev.

StatusNetLudost

BBCProgram-

mes

GovUKSocietal

WellbeingDeprivation Imd

EnvironmentRank 2010

Bio2RDFTaxonomy

Worldbank270a.info

OSM

DBTuneMusic-brainz

LinkedMarkMail

StatusNetDeuxpi

GovUKTransparency

ImpactIndicators

Housing Starts

BizkaiSense

GovUKimpact

indicators energyefficiency new

builds

StatusNetMorphtown

GovUKTransparency

Input indicatorsLocal authorities

Working w. tr.Families

ISO 639Oasis

AspirePortsmouth

ZaragozaDatos

AbiertosOpendataScotland

SimdCrime Rank

Berlios

StatusNetpiana

GovUKNet Add.Dwellings

Bootsnall

StatusNetchromic

Geospecies

linkedct

Wordnet(W3C)

StatusNetthornton2

StatusNetmkuttner

StatusNetlinuxwrangling

EurostatLinkedData

GovUKsocietal

wellbeingdeprv. imd

rank '07

GovUKsocietal

wellbeingdeprv. imdrank la '10

LinkedOpen Data

ofEcology

StatusNetchickenkiller

StatusNetgegeweb

DeustoTech

StatusNetschiessle

GovUKtransparency

impactindicatorstr. families

Taxonconcept

GovUKservice

expenditure

GovUKsocietal

wellbeingdeprivation imd

employmentscore 2010

28/31

Introduction Information Extraction Semantic Web

Linked Open Data (zoom)

DBpedia URI

Opencyc

Freebase YAGO

KUPKB

DBpedia

EUNIS

DrugbankFU-Berlin

DBpediaEL

UMBEL

DBpediaNL

StatusNetProgval

ArchiveshubLinkedData

CodeHaus

FOAF-Profiles

LOV

GeospeciesLinkedOpen Data

ofEcology

Taxonconcept

29/31

Introduction Information Extraction Semantic Web

Linked Open Data

Integrate many sources:→ General ontologies.→ Domain-specific ontologies.→ Open data dumps.→ Existing relational databases.

Find links→ Automatically.→ By hand (relations, rules...).

Statistics:→ Hundreds of ontologies.→ 504 million RDF links (2011).→ 52 billion triples (2012).8→ 5 billion entities (2012, incl., e.g., 854 million people).

→ Wealth of data, still underused8http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/

LinkingOpenData30/31

Introduction Information Extraction Semantic Web

Challenges

Manage trust and attribution.Manage time.Manage uncertainty throughout the pipeline.→ Extraction.→ Trust.→ Intrisic uncertainty.

Manage complex facts.→ Reification.

Manage noise.Compute alignments.

31/31