Felix Sasaki - Value beyond content creation - Introducing ITS 2.0; soapconf 2014

Sasaki – SOAP! 2014 1

Value Beyond Content Creation: Introducing ITS 2.0

Felix Sasaki, DFKI / W3C Fellow

Slides at http://www.w3.org/Talks/2014/1003-soap-sasaki.pdf

For a nice visual introduction to what ITS 2.0 is, go here:

“Linguini a la translation: An Introduction to ITS 2.0”

https://www.youtube.com/watch?v=5Goet3hX6Jo

What content authors normally do

• Make money by creating
  – Content
  – Layout
  – Apps

• More and more difficult
  – Growing amount of content & apps
  – What is the differentiator?

What content authors may do in the future

• Make money by enriching content
  – Using automatic tools with manual correction
  – Creating the basis for further processes
    • Translation, search engine optimization, contextualization, personalization, ...
  – Authors become content curators

• Background: R&D projects and their results

Background 1: LIDER project, http://lider-project.eu/

• EU-funded project with these aims:
  – Demonstrating the value of multilingual linguistic linked data sources
  – Exploring usage scenarios & requirements in various domains
  – Creating an R&D roadmap around the topic

Background 2: ITS 2.0, http://www.w3.org/TR/its20/

• W3C standard to foster multilingual content creation

• Defines metadata (“data categories”) to support the multilingual content life cycle

• A way to interlink Web content and multilingual linked data sources

ITS 2.0 data categories

• Translate
• Localization Note
• Terminology
• Directionality
• Language Information
• Elements Within Text
• Domain
• Text Analysis
• Locale Filter
• Provenance
• External Resource
• Target Pointer
• ID Value
• Preserve Space
• Localization Quality Issue
• Localization Quality Rating
• MT Confidence
• Allowed Characters
• Storage Size

ITS 2.0: High level features

• Can be applied to general XML content and to HTML5

• Partially supported natively in HTML5
  – E.g. the HTML5 “translate” attribute

• Applying data categories
  – locally: ITS attributes in content
  – globally: CSS-like selector mechanism, using XPath

• Independent data categories: neither tool makers nor users need to support all of them

Example: “Translate”, local and global

<p>The <span translate=no>World Wide Web Consortium</span> is making the World Wide Web worldwide!</p>

<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"> <its:translateRule selector="//h:code" translate="no" xmlns:h="http://www.w3.org/1999/xhtml"/></its:rules>
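
The interaction of the two mechanisms can be sketched with Python's standard ElementTree. This is a minimal illustration, not a conforming ITS processor. The sample document and function names are hypothetical. Selectors are limited to simple `//prefix:name` paths, because ElementTree does not evaluate full XPath. The sketch shows local markup taking precedence over the global rule, and the Translate value being inherited by child elements.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample: local translate="no" on a span, plus a <code>
# element that only the global rule marks as non-translatable.
DOC = f"""<html xmlns="{XHTML_NS}"><body>
<p>The <span translate="no">World Wide Web Consortium</span> is making
the World Wide Web worldwide! Run <code>make all</code>.</p>
</body></html>"""

RULES = f"""<its:rules xmlns:its="{ITS_NS}" version="2.0">
<its:translateRule selector="//h:code" translate="no" xmlns:h="{XHTML_NS}"/>
</its:rules>"""


def globally_untranslatable(doc_root, rules_root):
    """Hypothetical helper: elements that translateRule marks translate="no"."""
    marked = set()
    for rule in rules_root.iter(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") != "no":
            continue
        # Assumption: the selector is a simple "//prefix:name" path, which
        # ElementTree can evaluate relative to the root as ".//prefix:name".
        # ElementTree drops namespace declarations, so h: is bound by hand.
        marked.update(doc_root.findall("." + rule.get("selector"), {"h": XHTML_NS}))
    return marked


def translatable_text(elem, untranslatable, inherited=True):
    """Hypothetical helper: text runs to translate, in document order."""
    local = elem.get("translate")
    if local is not None:
        # Local markup takes precedence over global rules.
        translate = local.strip().lower() != "no"
    elif elem in untranslatable:
        translate = False
    else:
        # Without local or global markup, the parent's value is inherited.
        translate = inherited
    runs = [elem.text] if translate and elem.text else []
    for child in elem:
        runs.extend(translatable_text(child, untranslatable, translate))
        # A child's tail text belongs to the current element.
        if translate and child.tail:
            runs.append(child.tail)
    return runs


doc = ET.fromstring(DOC)
rules = ET.fromstring(RULES)
text = " ".join("".join(translatable_text(doc, globally_untranslatable(doc, rules))).split())
print(text)
```

The printed text omits “World Wide Web Consortium” because of the local markup, and “make all” because of the global rule. The rest of the paragraph remains translatable.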

Example: “Localization Note”

<data its:locNote="%1\$s is the original text's date in the format YYYY-MM-DD HH:MM always in GMT" …> <value>Translated from English content dated <span id="version-info">%1\$s</span> GMT.</value> </data>

Example: “Elements within Text”

<text xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0"> <body> <par>Text with <bold its:withinText="yes">bold</bold>.</par> </body></text>
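
A translation tool uses this information to decide where runs of text begin and end. The following Python sketch segments an XML tree accordingly. The function and sample names are hypothetical. For simplicity, the value `nested` is handled the same way as `no`, which a full implementation would not do.

```python
import xml.etree.ElementTree as ET

WITHIN_TEXT = "{http://www.w3.org/2005/11/its}withinText"


def text_units(elem):
    """Hypothetical helper: split an element's content into runs of text.

    withinText="yes" children stay inside the surrounding run. Any other
    value (the XML default "no", and, as a simplification here, "nested")
    makes the child's content a run of its own.
    """
    units = []
    current = [elem.text or ""]
    for child in elem:
        if child.get(WITHIN_TEXT, "no") == "yes":
            # Inline element (e.g. <bold>): keep its text in the current run.
            current.append("".join(child.itertext()))
        else:
            # Element that breaks the flow: close the run, recurse into it.
            units.append("".join(current))
            units.extend(text_units(child))
            current = []
        current.append(child.tail or "")
    units.append("".join(current))
    # Drop whitespace-only runs produced by line breaks between elements.
    return [u.strip() for u in units if u.strip()]


INLINE = """<text xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
<body><par>Text with <bold its:withinText="yes">bold</bold>.</par></body></text>"""
PLAIN = "<text><body><par>Text with <bold>bold</bold>.</par></body></text>"

print(text_units(ET.fromstring(INLINE)))
print(text_units(ET.fromstring(PLAIN)))
```

With `its:withinText="yes"` the paragraph remains one run, "Text with bold.". Without it, `<bold>` splits the paragraph into three runs.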

Example: “Locale Filter”

<book xmlns:its="http://www.w3.org/2005/11/its"> <info> <legalnotice its:localeFilterList="en-CA, fr-CA"> <para>This legal notice is only for English and French Canadian locales.</para> </legalnotice> </info></book>
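
To decide whether such content applies to a target locale, a tool must match the list against a language tag. ITS 2.0 builds on BCP 47 language ranges here. The sketch below uses the extended filtering algorithm from RFC 4647 (section 3.3.2), plus an include/exclude filter type. The function names are hypothetical, as is the treatment of an empty list.

```python
def extended_filter_match(language_range, tag):
    """RFC 4647 section 3.3.2 extended filtering of one range against one tag."""
    rng = language_range.strip().lower().split("-")
    sub = tag.strip().lower().split("-")
    # The first subtags must match, unless the range starts with "*".
    if rng[0] != "*" and rng[0] != sub[0]:
        return False
    i = j = 1
    while i < len(rng):
        if rng[i] == "*":
            i += 1                      # wildcard: skip this range subtag
        elif j >= len(sub):
            return False                # tag ran out of subtags
        elif rng[i] == sub[j]:
            i += 1                      # subtags match: advance both
            j += 1
        elif len(sub[j]) == 1:
            return False                # cannot skip past a singleton
        else:
            j += 1                      # skip a non-matching tag subtag
    return True


def locale_filter_applies(filter_list, locale, filter_type="include"):
    """Hypothetical helper: does content with this localeFilterList apply?

    In this sketch an empty list selects no locale and "*" selects every locale.
    """
    ranges = [r for r in (part.strip() for part in filter_list.split(",")) if r]
    matched = any(extended_filter_match(r, locale) for r in ranges)
    return matched if filter_type == "include" else not matched


for target in ("en-CA", "fr-CA", "en-US", "de"):
    print(target, locale_filter_applies("en-CA, fr-CA", target))
```

For the legal notice above, en-CA and fr-CA match, while en-US and de do not.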

Example: “Allowed Characters”

<p>Login names can only use letters from A to Z (upper or lowercase) and the characters underscore (_) and minus (-). For example: <code its-allowed-characters=[a-zA-Z_\-]>Huck_Finn</code>.</p>
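
A checker for this data category only has to test each character against the class. In the Python sketch below, `re` stands in for the regular expression dialect that ITS 2.0 refers to. That substitution is an assumption that holds for simple classes like this one. The function name is hypothetical.

```python
import re


def disallowed_characters(allowed_class, text):
    """Hypothetical helper: characters in text rejected by the allowed class.

    Assumption: Python's re syntax stands in for the ITS 2.0 regex dialect;
    both read a simple character class like [a-zA-Z_\\-] the same way.
    """
    char_re = re.compile(allowed_class)
    return [ch for ch in text if not char_re.fullmatch(ch)]


ALLOWED = r"[a-zA-Z_\-]"  # value of its-allowed-characters in the example
for login in ("Huck_Finn", "Huck Finn!"):
    bad = disallowed_characters(ALLOWED, login)
    print(login, "OK" if not bad else f"rejected: {bad}")
```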

Example: “Terminology”

<p>And he said: you need a new <quote its:term="yes" its:termInfoRef="http://www.directron.com/motherboards1.html" its:termConfidence="0.5">motherboard</quote></p>

Example: “MT Confidence”

<body its-annotators-ref="mt-confidence|file:///tools.xml#T1"> <p> <span its-mt-confidence=0.8982>Dublin is the capital of Ireland.</span> </p> </body>

Example: “Provenance”

<p its-tool-ref="http://www.onlinemtex.com/2012/7/25/wsdl/" its-org="acme-CAT-v2.3" its-prov-ref="http://www.examplelsp.com/excontent987/production/prov/e6354" its-rev-org="acme-CAT-v2.3" >This paragraph was translated from the machine.</p>

Example: “Localization Quality Issue”

<p> <span data-mytool-qacode=named_entity_not_found its-loc-quality-issue-comment="Should be Thomas Cahill." its-loc-quality-issue-profile-ref=http://example.org/qaMovel/v1 its-loc-quality-issue-severity=100 its-loc-quality-issue-type=inconsistent-entities>Christian Bale</span> (1867–1934) conceived of an instrument … </p>

Example: “Text Analysis”

• Identify concepts in content, like named entities
  – persons, places, events, …

• Store identifiers in (Web) content

• Provide a link to multilingual linked data sources – a basis for content curation

Example: “Text Analysis”

<p><span its-ta-confidence="0.7" its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location" its-ta-ident-ref="http://dbpedia.org/resource/Dublin" >Dublin</span> is the <span its-ta-source="Wordnet3.0" its-ta-ident="301467919" its-ta-confidence="0.5" >capital</span> of Ireland.</p>
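
Because the annotations are ordinary attributes, any HTML parser can read them. The following is a minimal sketch using Python's standard `html.parser`. The class name is hypothetical. The class collects each annotated text together with its `its-ta-*` attributes.

```python
from html.parser import HTMLParser


class TextAnalysisCollector(HTMLParser):
    """Hypothetical helper: gather (text, its-ta-* attributes) pairs."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._stack = []  # (tag, its-ta-* attributes, text chunks) per open element

    def handle_starttag(self, tag, attrs):
        ta = {name: value for name, value in attrs if name.startswith("its-ta-")}
        self._stack.append((tag, ta, []))

    def handle_data(self, data):
        # Text counts for every open annotated element (handles nesting).
        for _, ta, chunks in self._stack:
            if ta:
                chunks.append(data)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag (tolerates unclosed void elements).
        while self._stack:
            open_tag, ta, chunks = self._stack.pop()
            if ta:
                self.annotations.append(("".join(chunks), ta))
            if open_tag == tag:
                break


SAMPLE = (
    '<p><span its-ta-confidence="0.7" '
    'its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location" '
    'its-ta-ident-ref="http://dbpedia.org/resource/Dublin" >Dublin</span> '
    'is the <span its-ta-source="Wordnet3.0" its-ta-ident="301467919" '
    'its-ta-confidence="0.5" >capital</span> of Ireland.</p>'
)
collector = TextAnalysisCollector()
collector.feed(SAMPLE)
collector.close()
for text, attrs in collector.annotations:
    print(text, attrs.get("its-ta-ident-ref") or attrs.get("its-ta-ident"))
```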

What content authors can do with multilingual linked data sources and ITS 2.0

• Add value to content beyond the content itself

• Curate content: provide identifiers, context, cross-lingual information

• Tool examples:
  1) Generation of ITS 2.0 “Text Analysis” and Schema.org markup for ePub
  2) Generation of translation suggestions
  3) Working with linked data in the browser – without understanding details

TOOLING 1): GENERATION OF ITS 2.0 “TEXT ANALYSIS” AND SCHEMA.ORG MARKUP FOR EPUB

Setup

• oXygen XML editor, modified for ePub / XHTML5 author mode

• Input: ePub or XHTML5 documents

• Output: documents enriched with Schema.org structured information

• The user generates the information in a WYSIWYG mode

Process

1. Automatic generation of entity annotations, using DBpedia Spotlight, producing DBpedia identifiers

2. Access to DBpedia information with pre-defined linked data queries

3. Generation of Schema.org markup

1. Automatic generation of entity annotation

• Input:

<p>Welcome to Dublin in Ireland, the home of Samuel Beckett.</p>

1. Automatic generation of entity annotation

• Output, stored with ITS 2.0 “Text Analysis” markup:

<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>
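
The step from entity linker output to ITS markup can be sketched as follows. The `(offset, surface form, URI)` tuples are a hypothetical simplification of what a service such as DBpedia Spotlight returns. Real responses carry more fields and would first have to be mapped to this shape. The function name is also hypothetical.

```python
from html import escape


def add_text_analysis_markup(text, entities):
    """Hypothetical helper: wrap entity mentions in its-ta-ident-ref spans.

    entities: (offset, surface_form, uri) tuples, an assumed simplification
    of entity linker output; offsets are character offsets into text.
    """
    out = []
    pos = 0
    for offset, surface, uri in sorted(entities):
        if text[offset:offset + len(surface)] != surface:
            raise ValueError(f"offset {offset} does not point at {surface!r}")
        if offset < pos:
            raise ValueError("overlapping entity mentions are not supported")
        out.append(escape(text[pos:offset], quote=False))
        out.append(f'<span its-ta-ident-ref="{escape(uri)}">'
                   f'{escape(surface, quote=False)}</span>')
        pos = offset + len(surface)
    out.append(escape(text[pos:], quote=False))
    return "".join(out)


TEXT = "Welcome to Dublin in Ireland, the home of Samuel Beckett."
ENTITIES = [
    (TEXT.index("Samuel Beckett"), "Samuel Beckett",
     "http://dbpedia.org/resource/Samuel_Beckett"),
    (TEXT.index("Dublin"), "Dublin", "http://dbpedia.org/resource/Dublin"),
]
html_out = "<p>" + add_text_analysis_markup(TEXT, ENTITIES) + "</p>"
print(html_out)
```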

2. Access to DBpedia information

• Using DBpedia identifiers from the previous step in linked data query templates. Example query (excerpt), checking whether the entity is a person:

SELECT ?birthPlace ... WHERE { <http://dbpedia.org/resource/Samuel_Beckett> rdf:type foaf:Person . ... }
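
The sketch below shows how an identifier from the Text Analysis markup might be plugged into such a template. The endpoint URL, the template and the function name are assumptions. The sketch uses an ASK query, a simpler stand-in for the slide's SELECT that only answers whether the entity is a person. No request is sent; the code only builds the URL.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia SPARQL endpoint; the talk does not name one.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

# Simplified template: ASK instead of the slide's SELECT (braces doubled for format()).
PERSON_CHECK = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK {{ <{resource}> rdf:type foaf:Person . }}"""


def person_check_url(resource_uri):
    """Hypothetical helper: fill the template and build a GET request URL."""
    # Refuse characters that are not allowed inside a SPARQL <IRI> reference,
    # so the identifier cannot break out of the template.
    if any(ch in resource_uri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"not usable as a SPARQL IRI: {resource_uri!r}")
    query = PERSON_CHECK.format(resource=resource_uri)
    # SPARQL 1.1 Protocol: the query text travels in the "query" parameter.
    return DBPEDIA_SPARQL + "?" + urlencode({"query": query})


print(person_check_url("http://dbpedia.org/resource/Samuel_Beckett"))
```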

3. Generation of Schema.org structured information

• Using the output of the previous step (query result)

• Generating Schema.org structured information
  – Taking types derived from DBpedia into account, currently:
    • http://schema.org/Person
    • http://schema.org/Place

3. Generation of Schema.org structured information

• Input: linked data query result and marked-up document

<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>

3. Generation of Schema.org structured information

• Output: marked-up document with Schema.org structured information

<p>Welcome to <span ... itemscope="" itemtype="http://schema.org/Place"><a itemprop="url" href="http://en.wikipedia.org/wiki/Dublin"><span itemprop="name">Dublin</span></a></span>…</p>
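
The generated markup follows a fixed pattern, so it can be produced from a name, a URL and a type. The helper names in this sketch are hypothetical. It derives the Wikipedia URL from the DBpedia resource name, an assumption that holds for English DBpedia resources. The `its-ta-*` attributes elided as "..." above are not reproduced.

```python
from html import escape

SCHEMA_TYPES = ("Person", "Place")  # types the tool handles, per the slides


def wikipedia_url(dbpedia_uri):
    """Hypothetical helper: English DBpedia resource -> Wikipedia article URL.

    Assumption: the resource name equals the English Wikipedia title.
    """
    prefix = "http://dbpedia.org/resource/"
    if not dbpedia_uri.startswith(prefix):
        raise ValueError(f"not an English DBpedia resource: {dbpedia_uri}")
    return "http://en.wikipedia.org/wiki/" + dbpedia_uri[len(prefix):]


def schema_org_span(name, url, schema_type):
    """Hypothetical helper: wrap a name in microdata shaped like the output above."""
    if schema_type not in SCHEMA_TYPES:
        raise ValueError(f"unsupported schema.org type: {schema_type}")
    return (f'<span itemscope="" itemtype="http://schema.org/{schema_type}">'
            f'<a itemprop="url" href="{escape(url)}">'
            f'<span itemprop="name">{escape(name, quote=False)}</span></a></span>')


print(schema_org_span("Dublin", wikipedia_url("http://dbpedia.org/resource/Dublin"), "Place"))
```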

3. Generation of Schema.org structured information

• Output: auto-generated markup + text

<p>... Samuel Beckett ... (born in <span itemscope="" itemtype="http://schema.org/Place"><a itemprop="url" href="http://en.wikipedia.org/wiki/Foxrock"><span itemprop="name">Foxrock</span></a></span>)</p>

Checking the output with the Structured Data Testing Tool

Broad review: a view of schema.org types that may work well

• Book (dbpedia-owl:Book)
• City (dbpedia-owl:City)
• Country (dbpedia-owl:Country)
• Event (dbpedia-owl:Event)
• Hotel (dbpedia-owl:Hotel)
• Library (dbpedia-owl:Library)
• Movie (dbpedia-owl:Film)
• Person (foaf:Person)
• Place (dbpedia-owl:Place)
• Organization (dbpedia-owl:Organization)
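
A review like this can be turned into a lookup table. The sketch below expands `dbpedia-owl:` to `http://dbpedia.org/ontology/` and `foaf:` to `http://xmlns.com/foaf/0.1/`. Two rules are assumptions and not part of the talk: a more specific type (e.g. City) wins over Place or Organization, and remaining ties are broken alphabetically. The function name is hypothetical.

```python
DBO = "http://dbpedia.org/ontology/"
FOAF = "http://xmlns.com/foaf/0.1/"

# schema.org type for each linked data class listed above.
TYPE_MAP = {
    DBO + "Book": "Book",
    DBO + "City": "City",
    DBO + "Country": "Country",
    DBO + "Event": "Event",
    DBO + "Hotel": "Hotel",
    DBO + "Library": "Library",
    DBO + "Film": "Movie",
    FOAF + "Person": "Person",
    DBO + "Place": "Place",
    DBO + "Organization": "Organization",
}

# Assumption: these broad types lose to any more specific match.
BROAD = ("Place", "Organization")


def pick_schema_type(rdf_types):
    """Hypothetical helper: choose one schema.org type from rdf:type values."""
    candidates = {TYPE_MAP[t] for t in rdf_types if t in TYPE_MAP}
    specific = sorted(candidates.difference(BROAD))
    if specific:
        return specific[0]  # alphabetical tie-break, also an assumption
    for broad in BROAD:
        if broad in candidates:
            return broad
    return None


print(pick_schema_type([DBO + "Place", DBO + "City", "http://example.org/Other"]))
```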

TOOLING 2): GENERATION OF TRANSLATION SUGGESTIONS

Generating translation suggestions

• Input: like before

• Steps:
  1. Entity annotations (again)
  2. Access to DBpedia and Wikidata to get translation suggestions
  3. Storing the results as a localization note

1. Automatic generation of entity annotation

• Output, stored with ITS 2.0 “Text Analysis” markup:

<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>

2. Access to DBpedia and Wikidata to get translation suggestions

• Get translation suggestions from DBpedia

SELECT ?o WHERE { <http://dbpedia.org/resource/Samuel_Beckett> rdfs:label ?o}

2. Access to DBpedia and Wikidata to get translation suggestions

• Get translation suggestions from Wikidata

http://www.wikidata.org/w/api.php?action=wbgetentities&sites=itwiki&titles=Samuel%20Beckett
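
The same request can be built, and its answer read, in a few lines. This Python sketch assumes the `wbgetentities` parameters `props`, `languages` and `format`, in addition to those in the URL above. The function names are hypothetical. The sample response is hand-written to mimic the structure of a real one; it uses a placeholder item id and the Japanese label from the slides.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wbgetentities_url(title, site="itwiki", languages=("ja",)):
    """Hypothetical helper: the lookup above, limited to labels in some languages."""
    params = {
        "action": "wbgetentities",
        "sites": site,            # wiki whose article title identifies the item
        "titles": title,
        "props": "labels",        # only fetch labels
        "languages": "|".join(languages),
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def labels_from_response(payload, language):
    """Hypothetical helper: labels in one language from a JSON response."""
    data = json.loads(payload)
    labels = []
    for entity in data.get("entities", {}).values():
        # Entries without labels (e.g. titles that were not found) are skipped.
        label = entity.get("labels", {}).get(language)
        if label:
            labels.append(label["value"])
    return labels


# Hand-written sample mimicking the response structure; "Q0" is a placeholder id.
SAMPLE = json.dumps(
    {"entities": {"Q0": {"labels": {"ja": {"language": "ja", "value": "サミュエル・ベケット"}}}}}
)
print(wbgetentities_url("Samuel Beckett"))
print(labels_from_response(SAMPLE, "ja"))
```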

3. Storing the results as ITS 2.0 localization note

• Input: DBpedia + Wikidata query result and marked-up document

<p>… the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>

3. Storing the results as localization note

• Output: Translation suggestions stored as localization note

<p>… the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" its-loc-note="TRANSLATION SUGGESTIONS: 1) wikidata:サミュエル・ベケット 2) dbpedia:サミュエル・ベケット" ...>Samuel Beckett</span>.</p>
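
Once the suggestions are known, producing the attribute is string work. Here is a sketch with hypothetical helper names that reproduces the note format shown above.

```python
from html import escape


def translation_note(suggestions):
    """Hypothetical helper: format (source, label) pairs like the note above."""
    items = " ".join(f"{n}) {source}:{label}"
                     for n, (source, label) in enumerate(suggestions, start=1))
    return f"TRANSLATION SUGGESTIONS: {items}"


def add_loc_note(start_tag, note):
    """Hypothetical helper: add its-loc-note to a non-self-closing start tag."""
    if not start_tag.endswith(">") or start_tag.endswith("/>"):
        raise ValueError("expected a start tag such as <span ...>")
    # escape() turns quotes, ampersands and angle brackets into entities.
    return f'{start_tag[:-1]} its-loc-note="{escape(note)}">'


SUGGESTIONS = [("wikidata", "サミュエル・ベケット"), ("dbpedia", "サミュエル・ベケット")]
tag = '<span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett">'
print(add_loc_note(tag, translation_note(SUGGESTIONS)))
```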

TOOLING 3): WORKING WITH LINKED DATA IN THE BROWSER – WITHOUT UNDERSTANDING DETAILS

MLOD4CON

• Working with links to external multilingual data sources

• Under the hood: lots of technology
  – ITS 2.0, RDF, SPARQL, JavaScript, …

• Good news: the user does not need to know about these

Demo at http://www.w3.org/People/fsasaki/mlod4con/

EVERYTHING DONE?

Issues

• Learn from communities what they want to do with ITS 2.0 and linked data sources
  – Content creators and content architects, translators, XML / Web tool makers, researchers in the data and language technology area, …

• Provide adequate tooling

• Look carefully into requirements: “Too much information is no information!”

What next for you?

• ITS 2.0 tooling: https://www.w3.org/International/its/wiki/ITS_Implementations

• Videos explaining ITS 2.0 usage: https://www.youtube.com/user/W3CITS20/videos

• Linked Data for Language Technology Community Group: discuss use cases and requirements for multilingual linked data

http://www.w3.org/community/ld4lt/

• ITS Interest Group: Join the community of ITS 2.0 users and implementers

https://www.w3.org/International/its/ig/
