NoTube: Models & Semantics
notubeproject
Monday, March 26, 2012
WP1 Overview
• “Backend” shared datasets and services
• Mappings, integration and common vocabulary
• Extra datasets to support use-case scenarios
WP1: Year 3 Direction & Achievements
• Moving from single ‘warehouse’ to distributed set of databases, datasets and services
• Planning for sustainable life-after-project
• Integrating feedback from end-to-end demos
Why WP1? Two roles
• NoTube internal: a hub for data sharing
• NoTube external: show how shared datasets and vocabularies help with user-facing “Web and TV” problems
• “Show”, critically, includes “thinking out loud” as we explore, via blog, email, Twitter etc.
– scholarly articles rarely reach our target audiences
Outreach message
• Let metadata flow widely, advertising content rather than being a hidden asset
• Identify and link content with useful URLs(*)
• Open APIs to control TV and link devices [WP7c]
...from W3C TV & Web position paper (with Project Baird), Berlin 9 Feb 2011
WP1 is concerned primarily with the first two: getting metadata onto the Web from the source, rather than by scraping, guessing or approximating.
Aside: RDFa went mainstream
• Try ‘View source’ on IMDb, Rotten Tomatoes, BBC and tv.com sites to find RDF descriptions of TV content.
• NoTube’s approach was to lead by example, to engage with industry and to plan from the beginning for the ‘afterlife’.
• This strategy worked.
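To illustrate what such an embedded RDF description can look like, here is a minimal sketch that emits an HTML fragment in RDFa Lite using the schema.org vocabulary (TVEpisode, TVSeries, partOfSeries, episodeNumber). The function name and the example values are hypothetical, and the markup is an assumption for illustration; it is not copied from any of the sites named above.

```python
from html import escape


def tv_episode_rdfa(series: str, episode: str, number: int) -> str:
    """Return an HTML fragment describing one TV episode in RDFa Lite.

    Hypothetical helper: vocab/typeof/property are the RDFa Lite attributes,
    and the type and property names come from the schema.org vocabulary.
    """
    return (
        '<div vocab="https://schema.org/" typeof="TVEpisode">\n'
        f'  <h1 property="name">{escape(episode)}</h1>\n'
        f'  <span property="episodeNumber">{number}</span>\n'
        # The series is a nested resource linked via partOfSeries.
        '  <div property="partOfSeries" typeof="TVSeries">\n'
        f'    <span property="name">{escape(series)}</span>\n'
        '  </div>\n'
        '</div>'
    )


# Example data only.
print(tv_episode_rdfa("The Wire", "The Target", 1))
```

The same fragment is ordinary visible HTML to a browser, while an RDFa-aware consumer can extract a TVEpisode linked to a TVSeries from it; that dual use is what made "view source" a viable way to publish metadata.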
Facebook OGP
tv.com 'The Wire' page
...simple, extensible standards are being adopted
OGP since 2010; schema.org since 2011...
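The Open Graph protocol (OGP) places its properties in <meta property="og:..."> tags in the page head. A minimal sketch of reading them with Python's standard-library HTML parser follows. The sample page is made up (it is not the actual tv.com markup). The parser looks only at the property attribute and keeps the first value of each og: property; both behaviours are assumptions of this sketch.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect Open Graph (og:*) properties from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, attr["content"])


def extract_open_graph(html: str) -> dict[str, str]:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Made-up sample page for illustration.
sample = """<html><head>
<meta property="og:title" content="The Wire" />
<meta property="og:type" content="video.tv_show" />
<meta property="og:url" content="https://example.com/the-wire" />
</head><body></body></html>"""

print(extract_open_graph(sample))
```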
TV Data Warehouse
• We still host several crawls of TV EPG data
• Trend is for data to be more cleanly available from source, without scraping
• Crawling, aggregation and integration still useful, but less scraping required
• Crawled 'data warehouse' also used as a research testbed collection
WP1: Example Datasets
• WP7c/WP3 use DBpedia/Wikipedia URLs for topics; covers all mainstream areas.
• BBC also using Lonclass/UDC topic codes (we’re helping prepare this for sharing)
• For Music, we adopt MusicBrainz IDs
• Mapping diverse representations of ‘genre’
• “Organic” item/topic similarity measures derived from user data from WP3
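The slide does not say which similarity measure WP3 used. As a stand-in, the sketch below derives an "organic" item-to-item similarity from viewing histories alone, using cosine similarity over the binary sets of users who watched each item: |A ∩ B| / sqrt(|A| · |B|). The users, items and function name are hypothetical.

```python
from itertools import combinations
from math import sqrt


def item_similarities(history: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Cosine similarity between items, where each item is represented by
    the set of users who watched it. Pairs with no shared users are omitted."""
    # Invert user -> items into item -> users.
    viewers: dict[str, set[str]] = {}
    for user, items in history.items():
        for item in items:
            viewers.setdefault(item, set()).add(user)

    sims: dict[tuple[str, str], float] = {}
    for a, b in combinations(sorted(viewers), 2):
        overlap = len(viewers[a] & viewers[b])
        if overlap:
            sims[(a, b)] = overlap / sqrt(len(viewers[a]) * len(viewers[b]))
    return sims


# Hypothetical viewing data.
history = {
    "u1": {"news", "drama"},
    "u2": {"news", "drama", "sport"},
    "u3": {"sport"},
}
for pair, score in sorted(item_similarities(history).items()):
    print(pair, round(score, 3))
```

Because the measure uses only co-viewing, it needs no shared genre scheme. That is one reason usage-derived similarity can complement the genre mappings listed above.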
WP1: Data Services
• Data Services exposed as static files:
– Show how to embed RDFa in HTML
– Publish as RDF/XML Linked Data
• Interactive Data Services:
– Using W3C SPARQL, SQL or SOLR/Lucene, over HTTP and/or XMPP.
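For the SPARQL-over-HTTP case, the W3C SPARQL 1.1 Protocol lets a client send a query as the `query` parameter of an HTTP GET. The sketch below only builds the request and does not send it. The endpoint URL is a placeholder, and the query (Dublin Core titles) is a hypothetical example; it does not describe NoTube's actual data.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request
    that asks for results in the SPARQL JSON results format."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical query over Dublin Core titles.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item ?title WHERE { ?item dcterms:title ?title } LIMIT 10
"""

# Placeholder endpoint; substitute a real SPARQL endpoint to run it.
req = sparql_get_request("https://example.org/sparql", QUERY)
print(req.full_url)
print(req.get_header("Accept"))
# Sending it would be urllib.request.urlopen(req), then json.load on the body.
```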
WP1: Exploitation and Sustainability
• WP1’s approach designed to outlive NoTube
• Use, augment and contribute to external data
– e.g. DBpedia, Archive.org, W3C & wider Web of data trend (e.g. RDFa adoption)
– also we demonstrate, e.g. on the blog, how we did it, so others can replicate it
– WP4 enrichments can be fed back to externals, e.g. similarity metrics & clusters
WP1: Sustainability 2
• NoTube’s 2010 W3C “Web & TV” position paper lobbied for unique IDs & public metadata for video content; this is now going mainstream.
• VUA will continue hosting some data, using PURL.org so that it can later be passed on, e.g. to W3C.
• Collaboration with Facebook OGP (helped with their RDFa adoption) and now the search engines' Schema.org (RDFa and extending the TV vocabulary).
schema.org
Workpackage Links
• Background data for all Workpackages
• Collaborated with WP2 on BMF RDF models
• Closer ties throughout WP3/7 developments
• WP4 entity and topic URIs point to WP1
• Outreach work around RDFa, Position Paper
2nd review comments
• Not clear though how this work has built upon the results of year 1, and how the current progress is in line with the case studies.
– Worked more closely and pragmatically with the case studies in WP7, especially 7c and related WP3 work. Moved towards a more decentralised model instead of a 'warehouse'.
– 7c collaboration with KMI's 'Watch and Buy' scenario, and with WP4 timed ad insertion work, used EU p2pnext 'limo' work; also egtaMETA from EBU, from 7c.
– WP1 work became more "hands-on"; we helped WP7 extract datasets such as TED.com and Archive.org, which we expect will shortly be replaceable by cleaner information from 'official' sources.
2nd review comments
• No relevant state of the art is documented and no details or citations on automated algorithms are given. Evaluation is restricted to examples and no quantitative data are given.
– We accept the weakness in the report (lack of scholarly/scientific detail); we chose to focus on more informal communication with the outside world in the final phase. A 2nd version of the document was produced, but the main changes were around 'life after project' themes rather than adding more scientific and scholarly detail.
2nd review comments
• A close collaboration with WP7 is recommended in order to ensure that work meets the requirements of the use cases.
– This very well describes our emphasis in the final phase.
Lessons Learned
• It's hard to simulate an evolving global data ecosystem; but we've played a small part in some huge changes.
• Publishers will adopt simple Semantic Web standards when they are given an incentive.
• It's hard for a 4-year-old plan to stay relevant in such an environment; the ability to be agile was critically important.
WP1 Summary
• Used open standards (RDF) and largely open data (e.g. Wikipedia/DBpedia)
• Integrated, mapped and data-mined
• Contributing our additions back to the community / commons (highlight: BBC sims)
• Documenting what we learned for external developers and subsequent projects
Questions?
WP1: End-to-End issues
• In the final year, our end-to-end scenarios have more mature implementations
• Feedback from WP3/7c: key issue is sparsity of large vocabularies when used for record matching. No single solution here.
• Integrating techniques from WP4 (e.g. clustering, data-mining) is critical for applying large and chaotic vocabularies to practical recommendations.
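A small sketch can show why sparsity hurts record matching and how clustering can help. With a large vocabulary, a user's topic URIs and an item's topic URIs rarely coincide exactly, so a plain Jaccard overlap is often zero. The sketch backs off to comparing topic clusters, discounted by a fallback weight. The cluster assignments, the 0.5 weight and the function names are hypothetical, and this is one possible technique, not the WP4 method.

```python
DBR = "http://dbpedia.org/resource/"


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(user_topics: set[str], item_topics: set[str],
                cluster_of: dict[str, str], fallback_weight: float = 0.5) -> float:
    """Exact topic overlap, falling back to a discounted overlap of topic
    clusters when no exact topic URIs are shared."""
    exact = jaccard(user_topics, item_topics)
    if exact > 0:
        return exact
    # Map each topic to its cluster; unmapped topics stand for themselves.
    user_clusters = {cluster_of.get(t, t) for t in user_topics}
    item_clusters = {cluster_of.get(t, t) for t in item_topics}
    return fallback_weight * jaccard(user_clusters, item_clusters)


# Hypothetical cluster assignment produced by some clustering step.
clusters = {DBR + "Jazz": "jazz", DBR + "Miles_Davis": "jazz", DBR + "Bebop": "jazz"}
user = {DBR + "Jazz", DBR + "Miles_Davis"}
item = {DBR + "Bebop"}
print(jaccard(user, item))                # 0.0: exact URIs never coincide
print(match_score(user, item, clusters))  # 0.5: match found at cluster level
```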