Felix Sasaki - Value beyond content creation - Introducing ITS 2.0; soapconf 2014
Sasaki – SOAP! 2014 1
Value Beyond Content Creation: Introducing ITS 2.0
Felix Sasaki, DFKI / W3C Fellow
Slides at http://www.w3.org/Talks/2014/1003-soap-sasaki.pdf
For a visual introduction to ITS 2.0, watch:
“Linguini a la translation: An Introduction to ITS 2.0”
https://www.youtube.com/watch?v=5Goet3hX6Jo
What content authors normally do
• Make money by creating
  – Content
  – Layout
  – Apps
• More and more difficult
  – Growing amount of content & apps
  – What is the differentiator?
What content authors may do in the future
• Make money by enriching content
  – Using automatic tools with manual correction
  – Create the basis for further processes
    • Translation, search engine optimization, contextualization, personalization, ...
  – Authors become content curators
• Background: R&D projects and their results
Background 1: LIDER project
http://lider-project.eu/
• EU funded project – aims:
  – Demonstrating the value of multilingual linguistic linked data sources
  – Exploring usage scenarios & requirements in various domains
  – Creating an R&D roadmap around the topic
Background 2: ITS 2.0
http://www.w3.org/TR/its20/
• W3C standard to foster multilingual content creation
• Defines metadata (“data categories”) to support the multilingual content life cycle
• A way to interlink Web content and multilingual linked data sources
ITS 2.0 data categories
• Translate
• Localization Note
• Terminology
• Directionality
• Language Information
• Elements Within Text
• Domain
• Text Analysis
• Locale Filter
• Provenance
• External Resource
• Target Pointer
• ID Value
• Preserve Space
• Localization Quality Issue
• Localization Quality Rating
• MT Confidence
• Allowed Characters
• Storage Size
ITS 2.0: High level features
• Can be applied to general XML content and to HTML5
• Partially natively supported in HTML5
  – E.g. HTML5 “translate” attribute
• Applying data categories
  – locally: ITS attributes in content
  – globally: CSS-like selector mechanism, using XPath
• Data categories are independent: tool makers and users do not need to support all of them
Example: “Translate”, local and global
Local:
<p>The <span translate=no>World Wide Web Consortium</span> is making the World Wide Web worldwide!</p>
Global:
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:translateRule selector="//h:code" translate="no" xmlns:h="http://www.w3.org/1999/xhtml"/>
</its:rules>
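A minimal Python sketch of how a tool might combine the global rule and the local attribute to find content excluded from translation. It is not a conforming ITS 2.0 processor: inheritance and rule precedence are ignored, the sample document and the function name `non_translatable` are made up for illustration, and the selector is prefixed with "." because the standard library's ElementTree only supports relative paths.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document (not from the slides).
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>Run <code>make install</code> and read the <span translate="no">README</span> file.</p>
</body></html>"""


def non_translatable(root, selector, namespaces):
    """Collect elements marked translate="no" globally (selector) or locally (attribute)."""
    # Global rule: ElementTree needs a relative path, so "//h:code" becomes ".//h:code".
    selected = set(root.findall("." + selector, namespaces))
    # Local markup: the HTML5 translate attribute on the element itself.
    selected.update(el for el in root.iter() if el.get("translate") == "no")
    return selected


root = ET.fromstring(DOC)
hits = non_translatable(root, "//h:code", {"h": XHTML_NS})
texts = sorted(el.text for el in hits)
print(texts)
```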
Example: “Localization Note”
<data its:locNote="%1\$s is the original text's date in the format YYYY-MM-DD HH:MM always in GMT" …>
  <value>Translated from English content dated <span id="version-info">%1\$s</span> GMT.</value>
</data>
Example: “Elements within Text”
<text xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <body>
    <par>Text with <bold its:withinText="yes">bold</bold>.</par>
  </body>
</text>
Example: “Locale Filter”
<book xmlns:its="http://www.w3.org/2005/11/its">
  <info>
    <legalnotice its:localeFilterList="en-CA, fr-CA">
      <para>This legal notice is only for English and French Canadian locales.</para>
    </legalnotice>
  </info>
</book>
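How a consumer might evaluate such a filter list can be sketched in Python. This is a simplification: ITS 2.0 defines matching in terms of BCP 47 extended language ranges, while the sketch below only does case-insensitive exact and prefix matching plus the "*" wildcard. The function name `locale_passes` is hypothetical; the `filter_type` parameter mirrors the "include" / "exclude" values of the ITS 2.0 locale filter type.

```python
def locale_passes(locale, filter_list, filter_type="include"):
    """Simplified check whether content applies to a locale (prefix matching only)."""
    ranges = [r.strip().lower() for r in filter_list.split(",") if r.strip()]
    loc = locale.lower()

    def matches(rng):
        # "*" matches everything; "en" matches "en" and "en-CA".
        return rng == "*" or loc == rng or loc.startswith(rng + "-")

    matched = any(matches(r) for r in ranges)
    return matched if filter_type == "include" else not matched


print(locale_passes("fr-CA", "en-CA, fr-CA"))
print(locale_passes("en-GB", "en-CA, fr-CA"))
```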
Example: “Allowed Characters”
<p>Login names can only use letters from A to Z (upper or lowercase) and the character underscore (_) and minus (-). For example: <code its-allowed-characters="[a-zA-Z_\-]">Huck_Finn</code>.</p>
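A translation or QA tool could use the annotation to validate text. The Python sketch below assumes the attribute value is a simple character class that Python's `re` module can read as-is, which holds for this example but is not guaranteed for every ITS 2.0 expression. The function name `uses_only_allowed` is hypothetical.

```python
import re


def uses_only_allowed(text, allowed_characters):
    """Return True if every character of text matches the allowed-characters class."""
    pattern = re.compile(allowed_characters)
    return all(pattern.fullmatch(ch) for ch in text)


ALLOWED = r"[a-zA-Z_\-]"  # value of its-allowed-characters in the example above
print(uses_only_allowed("Huck_Finn", ALLOWED))
print(uses_only_allowed("Huck Finn", ALLOWED))
```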
Example: “Terminology”
<p>And he said: you need a new <quote its:term="yes" its:termInfoRef="http://www.directron.com/motherboards1.html" its:termConfidence="0.5">motherboard</quote></p>
Example: “MT Confidence”
<body its-annotators-ref="mt-confidence|file:///tools.xml#T1"> <p> <span its-mt-confidence=0.8982>Dublin is the capital of Ireland.</span>
Example: “Provenance”
<p its-tool-ref="http://www.onlinemtex.com/2012/7/25/wsdl/"
   its-org="acme-CAT-v2.3"
   its-prov-ref="http://www.examplelsp.com/excontent987/production/prov/e6354"
   its-rev-org="acme-CAT-v2.3">This paragraph was translated from the machine.</p>
Example: “Localization Quality Issue”
<p>
  <span data-mytool-qacode=named_entity_not_found
        its-loc-quality-issue-comment="Should be Thomas Cahill."
        its-loc-quality-issue-profile-ref=http://example.org/qaMovel/v1
        its-loc-quality-issue-severity=100
        its-loc-quality-issue-type=inconsistent-entities>Christian Bale</span> (1867–1934) conceived of an instrument …
</p>
Example: “Text Analysis”
• Identify concepts in content, like named entities
  – persons, places, events, …
• Store identifiers in (Web) content
• Provide a link to multilingual linked data sources – a basis for content curation
Example: “Text Analysis”
<p><span its-ta-confidence="0.7"
         its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location"
         its-ta-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span> is the
   <span its-ta-source="Wordnet3.0"
         its-ta-ident="301467919"
         its-ta-confidence="0.5">capital</span> of Ireland.</p>
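Downstream tools need to read these annotations back out of the content. A rough Python sketch using only the standard library's `html.parser` follows; the class name `TextAnalysisCollector` is hypothetical, and the sketch assumes the input contains no void elements such as `<br>`, which would unbalance its simple element stack.

```python
from html.parser import HTMLParser

SAMPLE = (
    '<p><span its-ta-confidence="0.7" '
    'its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location" '
    'its-ta-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span> is the '
    '<span its-ta-source="Wordnet3.0" its-ta-ident="301467919" '
    'its-ta-confidence="0.5">capital</span> of Ireland.</p>'
)


class TextAnalysisCollector(HTMLParser):
    """Collect its-ta-* attributes together with the text they annotate."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._open = []  # one entry per open element; None if it is not annotated

    def handle_starttag(self, tag, attrs):
        ta = {name: value for name, value in attrs if name.startswith("its-ta-")}
        self._open.append({"attrs": ta, "text": ""} if ta else None)

    def handle_endtag(self, tag):
        if self._open:
            entry = self._open.pop()
            if entry is not None:
                self.annotations.append(entry)

    def handle_data(self, data):
        # Text belongs to every annotated element that is currently open.
        for entry in self._open:
            if entry is not None:
                entry["text"] += data


collector = TextAnalysisCollector()
collector.feed(SAMPLE)
for annotation in collector.annotations:
    print(annotation["text"], annotation["attrs"])
```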
What content authors can do with multilingual linked data sources and ITS 2.0
• Add value to content beyond the content itself
• Curate content: provide identifiers, context, cross-lingual information
• Tool examples:
  1) Generation of ITS 2.0 “Text Analysis” for ePub, and Schema.org markup
  2) Generation of translation suggestions
  3) Working with linked data in the browser – without understanding details
TOOLING 1): GENERATION OF ITS 2.0 “TEXT ANALYSIS” AND SCHEMA.ORG MARKUP FOR EPUB
Setup
• oXygen XML editor, modified for ePub / XHTML5 author mode
• Input: ePub or XHTML5 documents
• Output: documents enriched with Schema.org structured information
• User does information generation in a WYSIWYG mode
Process
1. Automatic generation of entity annotation, using DBpedia Spotlight, producing DBpedia identifiers
2. Access to DBpedia information with pre-defined linked data queries
3. Generation of Schema.org markup
1. Automatic generation of entity annotation
• Input:
<p>Welcome to Dublin in Ireland, the home of Samuel Beckett.</p>
1. Automatic generation of entity annotation
• Output, stored with ITS 2.0 “Text Analysis” markup:
<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>
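The step from annotator result to markup can be sketched in Python. The sketch assumes the annotator has already returned mentions as (offset, surface form, URI) triples; that input shape and the function name `annotate` are assumptions for illustration, and no call to DBpedia Spotlight is made here.

```python
from html import escape


def annotate(text, mentions):
    """Wrap each mention in a span carrying an ITS 2.0 its-ta-ident-ref attribute."""
    parts, pos = [], 0
    for offset, surface, uri in sorted(mentions):
        parts.append(escape(text[pos:offset], quote=False))
        parts.append(
            '<span its-ta-ident-ref="%s">%s</span>'
            % (escape(uri), escape(surface, quote=False))
        )
        pos = offset + len(surface)
    parts.append(escape(text[pos:], quote=False))
    return "".join(parts)


TEXT = "Welcome to Dublin in Ireland, the home of Samuel Beckett."
# Hypothetical annotator output; offsets are computed here instead of hard-coded.
MENTIONS = [
    (TEXT.index("Dublin"), "Dublin", "http://dbpedia.org/resource/Dublin"),
    (TEXT.index("Samuel Beckett"), "Samuel Beckett",
     "http://dbpedia.org/resource/Samuel_Beckett"),
]
annotated = "<p>" + annotate(TEXT, MENTIONS) + "</p>"
print(annotated)
```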
2. Access to DBpedia information
• Using DBpedia identifiers from previous steps in linked data query templates. Example query (part of the query), checking whether entity is a person:
SELECT ?birthPlace ... WHERE {
  <http://dbpedia.org/resource/Samuel_Beckett> rdf:type foaf:Person .
  ...
}
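A query template of this kind could be filled in as sketched below in Python. The slide shows only part of the query, so the `dbo:birthPlace` triple pattern, the template name `PERSON_QUERY`, and the function `person_query` are assumptions for illustration rather than the tool's actual query. The URI check is a basic guard against breaking out of the `<...>` term.

```python
import re

# Assumed template: only the rdf:type check is taken from the slide.
PERSON_QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?birthPlace WHERE {{
  <{entity}> rdf:type foaf:Person .
  <{entity}> dbo:birthPlace ?birthPlace .
}}"""


def person_query(entity_uri):
    """Insert an entity URI into the template after a basic safety check."""
    if re.search(r'[<>"{}|^`\\\s]', entity_uri):
        raise ValueError("unsafe character in entity URI: %r" % entity_uri)
    return PERSON_QUERY.format(entity=entity_uri)


query = person_query("http://dbpedia.org/resource/Samuel_Beckett")
print(query)
```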
3. Generation of Schema.org structured information
• Using output of previous step (query result)
• Generating Schema.org structured information
  – Taking types derived from DBpedia into account, currently
    • http://schema.org/Person
    • http://schema.org/Place
3. Generation of Schema.org structured information
• Input: linked data query result and marked-up document
<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>
3. Generation of Schema.org structured information
• Output: marked-up document with Schema.org structured information
<p>Welcome to <span ... itemscope="" itemtype="http://schema.org/Place"><a itemprop="url" href="http://en.wikipedia.org/wiki/Dublin"><span itemprop="name">Dublin</span></a></span>…</p>
3. Generation of Schema.org structured information
• Output: auto-generating markup + text
<p>... Samuel Beckett ... (born in <span itemscope="" itemtype="http://schema.org/Place"><a itemprop="url" href="http://en.wikipedia.org/wiki/Foxrock"><span itemprop="name">Foxrock</span></a></span>)</p>
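The microdata pattern in these outputs is regular enough to generate from three values. A small Python sketch follows; the function name `schema_org_span` is hypothetical, and how the tool actually derives the Wikipedia URL from the query result is not shown in the slides.

```python
from html import escape


def schema_org_span(name, item_type, url):
    """Build Schema.org microdata markup in the shape shown on the slides."""
    return (
        '<span itemscope="" itemtype="http://schema.org/%s">'
        '<a itemprop="url" href="%s"><span itemprop="name">%s</span></a></span>'
        % (item_type, escape(url), escape(name, quote=False))
    )


markup = schema_org_span("Foxrock", "Place", "http://en.wikipedia.org/wiki/Foxrock")
print(markup)
```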
Broad review: a view of schema.org types that may work well
• Book (dbpedia-owl:Book)
• City (dbpedia-owl:City)
• Country (dbpedia-owl:Country)
• Event (dbpedia-owl:Event)
• Hotel (dbpedia-owl:Hotel)
• Library (dbpedia-owl:Library)
• Movie (dbpedia-owl:Film)
• Person (foaf:Person)
• Place (dbpedia-owl:Place)
• Organization (dbpedia-owl:Organization)
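The pairing above can be held in a lookup table, as in this Python sketch. Trying more specific types (such as City) before Place is an assumption of the sketch, not something the slides state; the names `TYPE_MAP` and `schema_type_for` are hypothetical.

```python
DBO = "http://dbpedia.org/ontology/"   # namespace behind the dbpedia-owl prefix
FOAF = "http://xmlns.com/foaf/0.1/"

# Ordered so that specific types are tried before the generic Place.
TYPE_MAP = [
    (DBO + "City", "City"),
    (DBO + "Country", "Country"),
    (DBO + "Hotel", "Hotel"),
    (DBO + "Library", "Library"),
    (DBO + "Book", "Book"),
    (DBO + "Film", "Movie"),
    (DBO + "Event", "Event"),
    (FOAF + "Person", "Person"),
    (DBO + "Organization", "Organization"),
    (DBO + "Place", "Place"),
]


def schema_type_for(rdf_types):
    """Return the first matching Schema.org type URL, or None if nothing maps."""
    known = set(rdf_types)
    for source_type, schema_name in TYPE_MAP:
        if source_type in known:
            return "http://schema.org/" + schema_name
    return None


print(schema_type_for([DBO + "Place", DBO + "City"]))
```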
Generating translation suggestions
• Input: like before
• Steps:
  1. Entity annotations (again)
  2. Access to DBpedia and Wikidata to get translation suggestions
  3. Storing the results as a localization note
1. Automatic generation of entity annotation
• Output, stored with ITS 2.0 “Text Analysis” markup:
<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" ...>Dublin</span> in Ireland, the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>
2. Access to DBpedia and Wikidata to get translation suggestions
• Get translation suggestion from DBpedia
SELECT ?o WHERE { <http://dbpedia.org/resource/Samuel_Beckett> rdfs:label ?o}
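This query returns the entity's labels in every available language, so the caller still has to pick the target language. A Python sketch of that selection over a result in the standard SPARQL JSON results format follows; the sample result is hand-written for illustration (not actual endpoint output) and the function name `labels_in` is hypothetical.

```python
# Hand-written sample in SPARQL 1.1 JSON results format (illustrative only).
SAMPLE_RESULT = {
    "head": {"vars": ["o"]},
    "results": {
        "bindings": [
            {"o": {"type": "literal", "xml:lang": "en", "value": "Samuel Beckett"}},
            {"o": {"type": "literal", "xml:lang": "ja", "value": "サミュエル・ベケット"}},
        ]
    },
}


def labels_in(results, variable, lang):
    """Return the values bound to `variable` whose language tag equals `lang`."""
    return [
        binding[variable]["value"]
        for binding in results["results"]["bindings"]
        if variable in binding and binding[variable].get("xml:lang") == lang
    ]


japanese = labels_in(SAMPLE_RESULT, "o", "ja")
print(japanese)
```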
2. Access to DBpedia and Wikidata to get translation suggestions
• Get translation suggestion from Wikidata
http://www.wikidata.org/w/api.php?action=wbgetentities&sites=itwiki&titles=Samuel%20Beckett
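A Python sketch of building this request and reading labels from the response. No network call is made: the sample response is hand-written with a placeholder entity ID, and only mirrors the `entities` / `labels` / `value` nesting of `wbgetentities` JSON output. The function names and the added `format=json` parameter are choices made for this sketch.

```python
from urllib.parse import quote, urlencode

API = "http://www.wikidata.org/w/api.php"


def wikidata_url(site, title):
    """Build a wbgetentities request URL for a page title on a given wiki."""
    params = {"action": "wbgetentities", "sites": site, "titles": title, "format": "json"}
    # quote_via=quote encodes spaces as %20, as in the URL on the slide.
    return API + "?" + urlencode(params, quote_via=quote)


def labels_from_response(response, lang):
    """Collect the label in `lang` from every entity in a wbgetentities response."""
    found = []
    for entity in response.get("entities", {}).values():
        label = entity.get("labels", {}).get(lang)
        if label:
            found.append(label["value"])
    return found


# Hand-written sample with a placeholder entity ID (illustrative only).
SAMPLE_RESPONSE = {
    "entities": {
        "Q_PLACEHOLDER": {
            "labels": {"ja": {"language": "ja", "value": "サミュエル・ベケット"}}
        }
    }
}

url = wikidata_url("itwiki", "Samuel Beckett")
print(url)
print(labels_from_response(SAMPLE_RESPONSE, "ja"))
```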
3. Storing the results as ITS 2.0 localization note
• Input: DBpedia + Wikidata query result and marked-up document
<p>… the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" ...>Samuel Beckett</span>.</p>
3. Storing the results as localization note
• Output: Translation suggestions stored as localization note
<p>… the home of <span its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett" its-loc-note="TRANSLATION SUGGESTIONS: 1) wikidata:サミュエル・ベケット 2) dbpedia:サミュエル・ベケット" ...>Samuel Beckett</span>.</p>
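Assembling the note is plain string work. The Python sketch below reproduces the note format seen in the output above; the function names are hypothetical, and the escaping step is an addition of this sketch so that labels containing quotes or ampersands cannot break the attribute.

```python
from html import escape


def loc_note(suggestions):
    """Format (source, label) pairs as a numbered translation-suggestion note."""
    items = " ".join(
        "%d) %s:%s" % (number, source, label)
        for number, (source, label) in enumerate(suggestions, 1)
    )
    return "TRANSLATION SUGGESTIONS: " + items


def loc_note_attribute(suggestions):
    """Return an its-loc-note attribute with the note safely escaped."""
    return 'its-loc-note="%s"' % escape(loc_note(suggestions), quote=True)


SUGGESTIONS = [("wikidata", "サミュエル・ベケット"), ("dbpedia", "サミュエル・ベケット")]
attribute = loc_note_attribute(SUGGESTIONS)
print(attribute)
```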
TOOLING 3: WORKING WITH LINKED DATA IN THE BROWSER – WITHOUT UNDERSTANDING DETAILS
MLOD4CON
• Working with links to external multilingual data sources
• Under the hood: lots of technology
  – ITS 2.0, RDF, SPARQL, JavaScript, …
• Good news: the user does not need to know about these
Demo at http://www.w3.org/People/fsasaki/mlod4con/
Issues
• Learn from communities what they want to do with ITS 2.0 and linked data sources
  – Content creators and content architects, translators, XML / Web tool makers, researchers in the data and language technology area, …
• Provide adequate tooling
• Look carefully into requirements: “Too much information is no information!”
What next for you?
• ITS 2.0 Tooling
https://www.w3.org/International/its/wiki/ITS_Implementations
• Videos explaining ITS 2.0 usage
https://www.youtube.com/user/W3CITS20/videos
• Linked Data for Language Technology Community Group: discuss use cases and requirements for multilingual linked data
http://www.w3.org/community/ld4lt/
• ITS Interest Group: Join the community of ITS 2.0 users and implementers
https://www.w3.org/International/its/ig/