MIT’s SIMILE Project: Demonstrating Practical Value of Semantic Web Technology for Digital Libraries
MacKenzie Smith, MIT Libraries
©MIT, CNI Spring 2006
Semantic Web Goals
Effective management and reuse of data across domains, at Web scale
Standards to reduce the social and technical costs of sharing data
i.e. data interoperability
RDF “Open World” Philosophy
The real world is very, very messy
• People lie, cheat, and make mistakes about their data
• They do what they must to get the job done
• Their decisions are subjective, inconsistent
• HTML, XML, other encodings are usually malformed
Standards for dealing with data must cope…
RDF allows for this better than most
RDF “Open World” Philosophy
Need to support a new kind of data?
• OK! Just add a new RDF statement; no need to change an XML or database schema
Need to mix data from several sources?
• OK! Just pour them together as RDF statements
Got data that contradicts itself?
• OK! Put both statements in RDF and equate or disambiguate them with more RDF statements; all points of view are possible
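The three “OK!” answers can be sketched by treating an RDF store as nothing more than a set of (subject, predicate, object) statements. This is a minimal illustration in plain Python, not a real triple store; the example.org URIs and the record values are hypothetical.

```python
# Minimal sketch: an RDF-like store as a set of (subject, predicate, object) triples.
# The example.org namespace and the record values are hypothetical.
EX = "http://example.org/"

def triple(s, p, o):
    # Subjects and predicates get URIs; objects here are plain literals.
    return (EX + s, EX + p, o)

source_a = {
    triple("item1", "title", "Guernica"),
    triple("item1", "creator", "Pablo Picasso"),
}
source_b = {
    triple("item1", "date", "1937"),  # a new kind of data: just one more statement
    triple("item1", "creator", "Picasso, Pablo, 1881-1973"),
}

# Mixing sources is a set union; no schema has to change first.
merged = source_a | source_b

# Variant or contradictory values coexist as separate statements,
# ready to be equated or disambiguated by further statements.
creators = sorted(o for s, p, o in merged if p == EX + "creator")
print(creators)  # ['Pablo Picasso', 'Picasso, Pablo, 1881-1973']
```

The point of the design is that nothing is rejected at load time: reconciliation happens later, by adding statements rather than rewriting data.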
The Only Two Things You Really Need to Know about RDF…
1. Every piece of data has a URI
• i.e. a globally unique identifier
http://web.mit.edu/simile/www/metadata/ocw/Contributor#john_dower
• Needn’t be resolvable on the Web
2. All data relationships are explicitly labeled
• Differs from XML and other data standards, which hide relationships in their structure
• Can model any kind of data this way
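The difference in point 2 can be shown in a few lines. In the hypothetical XML below, the only link between the course and the person is nesting; in the RDF version the relationship is itself a URI. The XML layout, the course URI, and the contributor predicate URI are assumptions for illustration; only the john_dower URI comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record: the person is related to the course only by nesting.
xml_record = """<course id="ocw-1">
  <person><name>John Dower</name></person>
</course>"""

root = ET.fromstring(xml_record)
# Structure tells us <person> sits inside <course>, but not whether the
# person taught it, contributed to it, or enrolled in it.
print(root[0].tag)  # person

# In RDF the same fact is a triple whose predicate is a URI of its own,
# so the kind of relationship is stated explicitly.
course = "http://example.org/course/ocw-1"            # hypothetical
contributor = "http://example.org/ocw#contributor"    # hypothetical predicate
john = "http://web.mit.edu/simile/www/metadata/ocw/Contributor#john_dower"

statement = (course, contributor, john)
subject, predicate, obj = statement
print(predicate)
```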
The Digital Library Problem
Digital repositories manage metadata
• descriptive, administrative, structural, technical/preservation
Metadata is highly diverse and it evolves
XML/RDBMS solutions are too brittle
• Need to reduce barriers to interoperability (e.g. cost, prior agreement)
Simple Example
Qualified DC for digital object description
• Supports display, search, browse, versioning, etc.
• Consistent across all collections/objects in repository
• Creates internal interoperability of the data model
But
• Metadata started out much richer (MARC, ONIX, PRISM, IMS LOM, VRA, DDI, FGDC, etc.)
• Many locally developed domain-based data models
So all of that rich description is lost
Simple Example
RDF for digital object description
• Supports display, search, browse, versioning, etc.
• Consistent across all collections, digital items
• Creates internal interoperability of the data model
And
• Still have all of the original metadata
• It’s just remodeled into RDF as a graph, and each data element has a URI added
So all of that rich description is still there to use
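The contrast between the two “Simple Example” slides can be sketched as two paths for the same record: a crosswalk to Dublin Core, which keeps only fields that have a DC equivalent, and a remodeling into triples, which keeps every field. The record, its field names, and the example.org namespace are hypothetical; this is not an actual VRA Core 3 or MARC serialization.

```python
# Hypothetical source record; field names are illustrative only.
source = {
    "title": "Study for a portrait",
    "agent": "Picasso, Pablo, 1881-1973",
    "technique": "charcoal",
    "accession": "acc-0001",
}

# Lossy path: a crosswalk keeps only fields with a Dublin Core equivalent.
DC_CROSSWALK = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "agent": "http://purl.org/dc/elements/1.1/creator",
}
qdc = {DC_CROSSWALK[k]: v for k, v in source.items() if k in DC_CROSSWALK}

# RDF path: every field becomes a triple; each predicate gets a URI in a
# hypothetical namespace for the source schema, so nothing is dropped.
NS = "http://example.org/vra#"
ITEM = "http://example.org/item/1"
graph = {(ITEM, NS + field, value) for field, value in source.items()}

print(len(qdc), len(graph))  # 2 4
```

The two views need not compete: the rich predicates can be linked to Dublin Core (for example, by declaring the agent predicate an rdfs:subPropertyOf dc:creator), so a DC-level display and the full detail come from the same graph.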
Simple Example
But metadata consists of values that have to be interpreted, made sense of
• Different encodings
e.g. Pablo Picasso == Picasso, Pablo, 1881-1973
• Typos
e.g. Pablo Picasso == Pablo Picassso
• Homonyms, other collisions across domains
e.g. apple the fruit vs. apple the computer
Simple Example
Qualified DC case
• Not much you can do except normalize values where possible
RDF case
• No need to normalize; add more RDF!
Pablo Picasso sameAs Picasso, Pablo, 1881-1973
Pablo Picasso sameAs Pablo Picassso
Bank (the place where you put your money) differentFrom Bank (the place next to the river)
Not a complete solution, but a big improvement
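One way an application might use sameAs statements is to group equivalent values with a union-find structure, so that a search for any variant also finds records indexed under the others. This is a minimal sketch: in OWL, owl:sameAs relates resources (URIs), while the slide and this sketch use name strings for readability; differentFrom would act as a constraint on such merges and is not implemented here. The record IDs are hypothetical.

```python
# Minimal sketch: union-find over sameAs pairs, so every variant of a
# name resolves to one canonical representative.
def make_resolver(same_as_pairs):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two equivalence classes
    return find

same_as = [
    ("Pablo Picasso", "Picasso, Pablo, 1881-1973"),
    ("Pablo Picasso", "Pablo Picassso"),
]
canonical = make_resolver(same_as)

# Hypothetical records, each indexed under whatever creator string it arrived with.
records = {"r1": "Picasso, Pablo, 1881-1973", "r2": "Pablo Picassso", "r3": "Georges Braque"}
query = canonical("Pablo Picasso")
hits = sorted(r for r, creator in records.items() if canonical(creator) == query)
print(hits)  # ['r1', 'r2']
```

Note that the original values are never rewritten; the equivalence lives only in the added statements, which is what distinguishes this from normalization.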
Mixing Data Quality, Semantics
But each collection offers a different set of qualities
e.g. level of granularity, correctness, consistency
[Diagram: collection C1 (quality goals Q1, metadata M1) + collection C2 (Q2, M2) = union C? (Q?, M?)]
What qualities does the union have? Is anybody happy?
RDF Challenges
RDF enthusiasts
• Very ivory tower, where the air is thin
RDF adoption rate
• Creates doubt in target audience
Future is cloudy on scalability
• Query engines look tractable
• Large-scale inferencing… ?
“It’s too complicated”
• RDF, RDF/S, OWL, OWL-lite, SPARQL, 50 years of AI research…
RDF To-Do
Scalability
• Just a matter of doing the work
• Not a problem for many domains, applications
• Don’t necessarily use RDF internally (only where interoperability or schema evolution is a problem)
Real world, public demonstrations
• e.g. Piggy Bank
• Uptake in other domains (e.g. biomedical, Oracle database)
• Build more short-term benefits for RDF adopters
• Demonstrations of interoperability wins
Lower the barrier to entry
• Open Source toolkits
• Tap into innovative energy on the periphery
SIMILE Goals
Make metadata interoperability easier for digital libraries by providing useful tools for browsing, searching and mapping heterogeneous metadata in RDF
Tools for Metadata Managers
Gadget
– XML inspector
RDFizers
– Batch tools to transform existing XML data into RDF
Solvent
– Firefox extension for JavaScript screen scraping
Welkin
– Graphical tool to inspect/edit RDF graphs
Gadget
Works on any well-formed XML
Used for
• Data exploration, understanding
• Data migration, transformation
• Data cleanup
• Complexity evaluation
• Schema adherence understanding
• Schema emergence (if none provided)
Gadget – the big picture of your XML
OCW: 2,002,015 lines of XML
[Screenshot: per-element columns for avg. string length, # of instances, # of unique values]
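The three columns in the screenshot suggest a simple per-element profile of an XML file. The sketch below computes those figures with Python’s standard ElementTree; it is an assumption about what such a profile contains, not Gadget’s implementation, and the sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def element_stats(xml_text):
    """Per element name: number of instances, unique text values, average text length."""
    texts = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():  # iter() visits the root and all descendants
        texts[elem.tag].append((elem.text or "").strip())
    stats = {}
    for tag, values in texts.items():
        stats[tag] = {
            "instances": len(values),
            "unique": len(set(values)),
            "avg_len": sum(map(len, values)) / len(values),
        }
    return stats

sample = """<courses>
  <course><title>Intro</title><dept>Physics</dept></course>
  <course><title>Waves</title><dept>Physics</dept></course>
</courses>"""

for tag, s in element_stats(sample).items():
    print(tag, s)
```

A column like “unique values” is what makes oddities jump out: an element expected to be an identifier with far fewer unique values than instances, or a controlled field with thousands of distinct spellings.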
Gadget – the big picture of your XML
That’s Odd
RDFizers
Input types
• Simple Dublin Core (via OAI-PMH)
• MARC/MODS
• OCW (soon IMS LOM)
• VRA Core 3
• Email
• BibTeX
Done with XSLT style sheets, simple scripts
Need to define RDF “ontologies” for each
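The slide says the RDFizers were written as XSLT style sheets and simple scripts. As a swapped-in illustration (Python rather than XSLT), the sketch below turns a Simple Dublin Core record, in the oai_dc format harvested via OAI-PMH, into N-Triples, reusing the DC element namespace as the predicate vocabulary. The record contents and subject URI are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical harvested record.
record = f"""<oai_dc:dc xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}">
  <dc:title>Example title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:type>Text</dc:type>
</oai_dc:dc>"""

def escape_literal(text):
    # N-Triples literal escaping for backslash, double quote, and line breaks.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def rdfize(xml_text, subject_uri):
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        # ElementTree reports namespaced tags as "{namespace}local".
        ns, _, local = child.tag[1:].partition("}")
        if ns == DC and child.text:
            value = escape_literal(child.text.strip())
            lines.append(f'<{subject_uri}> <{DC}{local}> "{value}" .')
    return "\n".join(lines)

print(rdfize(record, "http://example.org/item/42"))
```

For richer inputs such as MARC/MODS or VRA Core, the same loop would map each element to a predicate in a schema-specific namespace, which is why an RDF “ontology” has to be defined for each input type.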
Tools for End-Users
Longwell
– Web-based RDF faceted metadata browser
Piggy Bank
– Firefox extension for personal information management of metadata in RDF
Semantic Bank
– Web-based server that allows data publishing and sharing by individuals, groups, or communities
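Faceted browsing, as in Longwell, comes down to two operations: filter the items by the facet values selected so far, then count the remaining values of every facet so the user sees where to narrow next. The sketch below shows that idea over hypothetical items; it is not Longwell’s code.

```python
from collections import Counter

# Hypothetical items, each with a few properties used as facets.
items = {
    "i1": {"type": "Thesis", "department": "Physics"},
    "i2": {"type": "Article", "department": "Physics"},
    "i3": {"type": "Thesis", "department": "Biology"},
}

def facet_counts(items, selected):
    """Count facet values among items matching every (facet, value) in `selected`."""
    matching = [props for props in items.values()
                if all(props.get(f) == v for f, v in selected.items())]
    counts = {}
    for props in matching:
        for facet, value in props.items():
            counts.setdefault(facet, Counter())[value] += 1
    return len(matching), counts

n, counts = facet_counts(items, {"type": "Thesis"})
print(n, dict(counts["department"]))  # 2 {'Physics': 1, 'Biology': 1}
```

Because facets are simply predicates, an RDF browser can offer them for any data loaded into it, without a schema written for that collection.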
Example Collection – MIT Libraries
MIT Libraries public catalog
– books, other publications
MIT OpenCourseWare
– course material including visual images
DSpace@MIT
– articles, working papers, theses, images, datasets, etc.
FOAF for MIT people
RDF Ontologies
MODS
– for OPAC MARC data and DSpace data
OCW-specific
– Will migrate to IMS LOM eventually
FOAF for people
SIMILE-specific (glue)
OpenCourseWare example (N3)
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lomEdu: <http://www.imsproject.org/rdf/imsmd_educationalv1p3#> .
@prefix ocw: <http://web.mit.edu/simile/www/2004/01/ocw#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcq: <http://dublincore.org/2000/03/13/dcq#> .
@prefix : <#> .
[…]
ocw:Lecture rdfs:subClassOf lomEdu:LearningResourceType ;
    rdfs:label "Lecture"@en .
ocw:Bibliography rdfs:subClassOf lomEdu:LearningResourceType ;
    rdfs:label "Bibliography"@en .
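The subClassOf statements above are what let a generic tool treat OCW-specific types as IMS LOM learning resource types. The sketch loads those statements as plain tuples (language tags dropped for brevity) and follows rdfs:subClassOf transitively to list every OCW type that counts as a LearningResourceType. It is a hand-rolled illustration, not a full RDFS reasoner.

```python
# The statements from the N3 snippet, as (subject, predicate, object) triples.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OCW = "http://web.mit.edu/simile/www/2004/01/ocw#"
LOM = "http://www.imsproject.org/rdf/imsmd_educationalv1p3#"

triples = {
    (OCW + "Lecture", RDFS + "subClassOf", LOM + "LearningResourceType"),
    (OCW + "Bibliography", RDFS + "subClassOf", LOM + "LearningResourceType"),
    (OCW + "Lecture", RDFS + "label", "Lecture"),
    (OCW + "Bibliography", RDFS + "label", "Bibliography"),
}

def subclasses(cls, triples):
    """All direct and indirect subclasses of `cls` (rdfs:subClassOf is transitive)."""
    found, frontier = set(), [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == RDFS + "subClassOf" and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found

resource_types = subclasses(LOM + "LearningResourceType", triples)
labels = sorted(o for s, p, o in triples if p == RDFS + "label" and s in resource_types)
print(labels)  # ['Bibliography', 'Lecture']
```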
Piggy Bank
Firefox extension for managing metadata
• Loads RDF into local Longwell server
Search and faceted browse of local RDF
• Views defined by library, other users
Users can find, collect, annotate RDF
• Can then publish for access by others
Semantic Bank
To persist, share, publish data on a server
For individuals, groups, communities
e.g. conference proceedings