INLS 520

INLS 520 – Erik Mitchell INLS 520 Information Organization


Page 1: INLS 520

INLS 520 – Erik Mitchell

INLS 520

Information Organization

Page 2: INLS 520

Review

• Controlled vocabularies

– Term Lists, Hierarchies, Trees, Paradigms, Facets, Folksonomies

• Knowledge organization systems

– Term Lists, Thesauri, Taxonomies, Ontologies

Page 3: INLS 520

Today

• Poster topic review

• Poster creation concepts

• Automation in metadata & organization

– RDF

– OWL

• Guest Speaker – Barrie Hayes

Page 4: INLS 520

Creating poster presentations

• Content

– Methodological approach: title, question, overview, methods, findings, observations

– Conceptual approach: title, question, model or framework

• Structure

– Balancing words & images

– Document flow

• Examples - 1, 2, 3, 4

Page 5: INLS 520

Creating poster presentations

• Technology

– Creation tools: PowerPoint templates in Blackboard, an image editor, HTML

• Display

– Flickr, printed

• Some guidelines 1, 2, 3, 4

Page 6: INLS 520

Automation

• Group discussion

– Based on the articles by Hlava, Hearst, and Stearns, what are some of the primary uses of automation in information organization?

– Is automation primarily a tool to create representations of resources or to enable retrieval of resources?

Page 7: INLS 520

Automatic Indexing

• Automatic Extraction/representation

– Lancaster calls this: “Words or phrases appearing in a text are extracted and used to represent the content of the text as a whole” (Lancaster 284)

• Automatic Classification/Categorization

– The computer compares terms in the document against thesauri or controlled vocabularies to map a word in the resource to an accepted index term (Lancaster 287)

• Automatic Abstract/Surrogate Generation

– The extraction of sentences from documents to create an abstract (Lancaster 298)

• Automatic index/tool Generation

– Creating spelling dictionaries, word lists, indexes, etc. to be used for other automation techniques

– Latent Semantic Indexing: concepts are extracted and analyzed to detect relationships. These relationships are then used to help identify & rank documents (overview)
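A minimal sketch of the first two techniques above, assuming a simple word-frequency count for extraction and a dictionary lookup for classification. The stop list, the controlled vocabulary, and the sample text are all invented for illustration; real indexing systems are considerably more involved.

```python
import re
from collections import Counter

# Hypothetical stop list and controlled vocabulary, invented for this example.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on"}
CONTROLLED_VOCABULARY = {
    "cars": "Automobiles",
    "automobiles": "Automobiles",
    "engines": "Engines",
    "motors": "Engines",
}


def extract_terms(text, top_n=3):
    """Automatic extraction: keep the most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def classify_terms(terms):
    """Automatic classification: map extracted words to accepted index terms."""
    return sorted({CONTROLLED_VOCABULARY[t] for t in terms if t in CONTROLLED_VOCABULARY})


text = "Cars and motors: the engines of cars are motors. Cars need engines."
terms = extract_terms(text)
index_terms = classify_terms(terms)
print(terms)
print(index_terms)
```

Note how "motors" and "engines" collapse into one accepted index term: the extraction step represents the text with its own words, while the classification step swaps those words for vocabulary terms.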

Page 8: INLS 520

Automatic metadata use

• Metadata Harvesting

– Using automated techniques to discover and extract metadata from documents

– OAI-PMH

• Knowledge representation & discovery

– Using structured ontologies to make statements and inferences about resources

– OWL Ontologies

• Metadata interoperability

– Using structured metadata to automatically connect with other metadata systems

– OpenURL, XML Schema

• Transformation

– Creating new documents through the automatic extraction, analysis, and processing of resources

– XML/XSL
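To make metadata harvesting concrete, here is a rough sketch of an OAI-PMH exchange. It builds a `ListRecords` request for Dublin Core (`oai_dc`) records and pulls titles out of a response. The base URL `http://example.org/oai` and the trimmed sample response are assumptions for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def harvest_titles(response_xml):
    """Collect the text of every dc:title element in a harvested response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(f"{{{DC_NS}}}title")]


# Hypothetical, heavily trimmed response; a real one carries more envelope elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>The Hang: The Island of Black Jeans</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")  # placeholder repository address
titles = harvest_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```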

Page 9: INLS 520

Metadata standards

• XML

– Structure & syntax but no ‘semantic constraints’

• XML Schema

– Restricts the structure of XML, extends XML with datatypes

• RDF

– Data model showing relationships among resources

• RDF Schema

– ‘Vocabulary for describing properties and classes of RDF resources’

• OWL

– Added vocabulary for describing properties & classes: relationships, cardinality, equality, characteristics

Page 10: INLS 520

RDF

• Subject, property, object triples

• Transmitted in XML

• RDFS extends RDF with an ontology language

– Properties, specialization

• OWL

– More powerful extension of RDFS

– Uses the same syntax as RDF

Page 11: INLS 520

RDF Model

Subject (Resource): Webpage: http://www.stuff.com

Predicate (Property type): Author

Object (Value): “Saki Knafo”

“The author of the stuff webpage is Saki Knafo”

- A literal, a triple, a statement
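The statement above can be sketched as a plain data structure. This is only an illustration: the `Triple` type and `describe` helper are invented here, and the Dublin Core `creator` URI is an assumed stand-in for the slide's "Author" property.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property type
    obj: str        # the value (a literal in this example)


statement = Triple(
    subject="http://www.stuff.com",
    predicate="http://purl.org/dc/elements/1.1/creator",  # assumed stand-in for "Author"
    obj="Saki Knafo",
)


def describe(triple):
    """Write the triple as one N-Triples-style line with a literal object."""
    return f'<{triple.subject}> <{triple.predicate}> "{triple.obj}" .'


line = describe(statement)
print(line)
```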

Page 12: INLS 520

How is RDF different?

• RDF is a descriptive model that

– Allows variable contextualized description

– Deconstructs the descriptive process

– Allows more granular automated processing of data

– Uses exact markup to indicate the context of values (namespaces, schemas)

– Bags, Sequences, Alternative values, parseType

Page 13: INLS 520

Encoding RDF in XML

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://www.stuff.com/">
    <dc:title>The Hang: The Island of Black Jeans</dc:title>
    <dc:creator>SAKI KNAFO</dc:creator>
    <dc:date>Sun, 16 Sep 2007 01:04:40 GMT</dc:date>
    <dc:description>descriptive content</dc:description>
  </rdf:Description>
</rdf:RDF>
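As a rough illustration of how the XML encoding maps back to triples, the sketch below reads a shortened copy of the record with Python's standard `xml.etree.ElementTree`. It only handles the simple pattern shown here (one `rdf:Description` with literal-valued properties) and is not a general RDF/XML parser; the function name `extract_triples` is invented for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the slide's record (two properties only).
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.stuff.com/">
    <dc:title>The Hang: The Island of Black Jeans</dc:title>
    <dc:creator>SAKI KNAFO</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree stores qualified names as "{namespace}local";
            # joining the two parts gives the full predicate URI.
            predicate = prop.tag[1:].replace("}", "")
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


triples = extract_triples(RDF_XML)
for t in triples:
    print(t)
```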

Page 14: INLS 520

Iterative RDF description

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:vcard="http://dli.grainger.uiuc.edu/publications/metadatacasestudy/dc_schemas/vcard.xsd"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://www.stuff.com">
    <dc:title>The Hang: The Island of Black Jeans</dc:title>
    <dc:creator rdf:resource="#Creator_001"/>
    <dc:identifier>http://www.stuff.com</dc:identifier>
    <dc:date>Sun, 16 Sep 2007 01:04:40 GMT</dc:date>
    <dc:description>descriptive content</dc:description>
  </rdf:Description>
  <!-- The slide also carried rdf:about="http://dli.grainger.uiuc.edu/publications/metadatacasestudy/dc_,,," on the next element; rdf:ID and rdf:about cannot share one element -->
  <rdf:Description rdf:ID="Creator_001">
    <vcard:given>Saki</vcard:given>
    <vcard:family>Knafo</vcard:family>
    <vcard:email rdf:parseType="Resource">
      <vcard:userid>[email protected]</vcard:userid>
    </vcard:email>
  </rdf:Description>
</rdf:RDF>

Page 15: INLS 520

RDFS

• RDF Schema

– Defines additional RDF elements that help type relationships

• Special Classes

– Based on RDF Classes / Properties / Attributes with additional

– http://www.w3schools.com/rdf/rdf_reference.asp

• Allows the creation of vocabularies / ontologies
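One thing typed relationships make possible is simple inference: if a resource belongs to a class, it also belongs to every superclass reachable through `rdfs:subClassOf`. The sketch below imitates that behaviour with plain dictionaries. The class names (`NewsArticle`, `Article`, `Document`) and function names are hypothetical, and this is a toy, not an RDFS reasoner.

```python
def superclasses(cls, sub_class_of):
    """Return every class reachable from cls by following subClassOf links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(instance_types, sub_class_of):
    """Extend each instance's asserted classes with all inherited superclasses."""
    inferred = {}
    for instance, classes in instance_types.items():
        all_classes = set(classes)
        for c in classes:
            all_classes |= superclasses(c, sub_class_of)
        inferred[instance] = all_classes
    return inferred


# Hypothetical vocabulary: NewsArticle is a kind of Article, which is a kind of Document.
sub_class_of = {"NewsArticle": ["Article"], "Article": ["Document"]}
instance_types = {"http://www.stuff.com/": ["NewsArticle"]}

inferred = infer_types(instance_types, sub_class_of)
print(inferred)
```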

Page 16: INLS 520

Ontology Definitions

• “The study of being or existence”

• “A specification of a conceptualization” (Gruber)

• “An ontology formally defines a common set of terms that are used to describe and represent a domain.” (OWL)

Page 17: INLS 520

Ontology Concepts

• Classes

– Names of objects in the domain

• Relationships between classes

– Connections between classes

• Properties of classes

– Background or identifying knowledge of these objects

• Constraints on these properties & relationships

– Limits and parameters of the relationships

Page 18: INLS 520

A good ontology has

• Features:

– Meaningful: all classes have instances

– Accurate / correct

– Non-redundant: each class/instance is represented in a single way

– Rich in description: context, content

• Enabled functionality:

– Able to use queries to connect new pieces of information

– Use XML & definitions to integrate knowledge across domains

Page 19: INLS 520

Ontology Continuum

• Keyword Lists

• Basic Thesauri

• Complex Thesauri

• Taxonomies

• Simple Ontologies (WordNet)

• Complex Ontologies (OWL)

Page 20: INLS 520

SHOE Ontology project

• Possible to build an ontology for anything

– Simple HTML Ontology Extensions (SHOE) Project

– http://www.cs.umd.edu/projects/plus/SHOE/

– http://www.cs.umd.edu/projects/plus/SHOE/html-pages.html

• Sample projects

– Beer Ontology: http://www.cs.umd.edu/projects/plus/SHOE/onts/index.html#beer

– Document Ontology: http://www.cs.umd.edu/projects/plus/SHOE/onts/docmnt1.0.html

Page 21: INLS 520

Ontology Concepts

• Multiple inheritance

• Vertical and horizontal relationships

• Decomposed subject/object

• Predicate-based description (isRelatedto, hasVersion)

Page 22: INLS 520

Creating an Ontology

• Determine scope of field, define boundaries

• Check for existing ontologies, vocabularies

• Select a top-down/bottom-up approach

– Identify concepts, vocabulary, parameters, constraints

• Identify relationships

– Multiple hierarchies, inheritance

• Build, test, maintain

Page 23: INLS 520

OWL (Web Ontology Language)

• An ontology language geared towards representing information on the web

– Classes, properties, and relationships that describe URIs and their facets

• Based on the triple concept

– Subject, Predicate, Object

– 3 versions: OWL-Lite, OWL-DL, OWL-Full

• Formatted in RDF/XML

– Uses RDF and RDFS as a foundation

– Adds new elements in the owl namespace

Page 24: INLS 520

OWL Versions

• OWL-Lite

– Simple hierarchies, constraints

• OWL-DL

– Uses description logics: logic-based semantic markup based on first-order predicate logic

– Still guarantees finite relationship processing

– Adds ‘reasoning’ capacity to infer information/relationships

• OWL-Full

– Most complex

– Open ended, possible to get into infinite processing

Page 25: INLS 520

OWL Example

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/part.owl#"
    xml:base="http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/part.owl">
  <owl:Ontology rdf:about="">
    <owl:versionInfo rdf:datatype="http://www.w3.org/2001/X...">1.0</owl:versionInfo>
    <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      An ontology containing the basic part relations: partOf, hasPart, partOf_directly, and hasPart_directly. These are described in the accompanying note. Author: Chris Welty
    </rdfs:comment>
  </owl:Ontology>
  <owl:TransitiveProperty rdf:ID="partOf">
    <owl:inverseOf>
      <owl:TransitiveProperty rdf:ID="hasPart"/>
    </owl:inverseOf>
  </owl:TransitiveProperty>
  <owl:ObjectProperty rdf:ID="hasPart_directly">
    <rdfs:subPropertyOf rdf:resource="#hasPart"/>
    <owl:inverseOf>
      <owl:ObjectProperty rdf:ID="partOf_directly">
        <rdfs:subPropertyOf rdf:resource="#partOf"/>
      </owl:ObjectProperty>
    </owl:inverseOf>
  </owl:ObjectProperty>
</rdf:RDF>

(Chris Welty)
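To show what a reasoner could conclude from the part-whole declarations above, here is a small hand-rolled sketch. It applies three rules to a set of asserted `partOf_directly` pairs: the sub-property rule (direct parts are parts), transitivity of `partOf`, and the inverse relation `hasPart`. The piston/engine/car data and the function names are made up for illustration, and a real OWL reasoner works very differently internally.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a set of (a, b) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def infer_part_relations(part_of_directly):
    """Derive partOf and hasPart from asserted partOf_directly statements."""
    # rdfs:subPropertyOf: every partOf_directly statement is also a partOf statement;
    # owl:TransitiveProperty: partOf is then closed under chaining.
    part_of = transitive_closure(part_of_directly)
    # owl:inverseOf: hasPart is partOf with subject and object swapped.
    has_part = {(whole, part) for part, whole in part_of}
    return part_of, has_part


# Hypothetical asserted facts: (part, whole)
asserted = {("piston", "engine"), ("engine", "car")}
part_of, has_part = infer_part_relations(asserted)
print(sorted(part_of))
print(sorted(has_part))
```

The pair ("piston", "car") is never asserted; it appears only because `partOf` is declared transitive, which is the kind of inferred relationship the OWL-DL slide refers to.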

Page 26: INLS 520

OWL – Lite features

• Class

– A collection of things related to each other by properties

• rdfs:subClassOf

– A way of showing hierarchical class relationships

• rdf:Property

– A stated relationship between a thing and a value (hasChild, hasRelative, hasSibling, hasAge)

– Bi-directional, Transitive (hasAncestor)

• rdfs:subPropertyOf

– Similar to subClassOf, a way of showing property hierarchies

• Individual

– Instances of classes (objects)

Page 27: INLS 520

OWL relationships

Practical guide to OWL ontologies

Page 28: INLS 520

Some OWL Examples

• Airport

• Pizza

Page 29: INLS 520

Next Week(s)

• 10/28 – No class

• 11/4 – Metadata based services, guest speaker

• 11/11 – no class

• 11/18 – no class

• 11/25 – semantic web

• 12/2 – final projects due