Exposing Hidden Relationships: Practical Work in Linked Data using Digital Collections


EXPOSING HIDDEN RELATIONSHIPS: PRACTICAL WORK IN LINKED DATA USING DIGITAL COLLECTIONS

Cory Lampert and Silvia Southwick

UNLV University Libraries Digital Collections

April 23, 2015

Linked Data & RDF: New Frontiers in Metadata and Access Conference

OVERVIEW

Video Demo

UNLV Linked Data Project

Digital Collections Metadata: Source of Rich (But Hidden) Relationships

Video Demo

Next Steps, Future Questions

VIDEO DEMO

This short video (no sound, just image) will give a preview of what linked data may look like to users.

It shows the Virtuoso Pivot Viewer software acting upon UNLV’s Linked Open Data triplestore.

Think about how this is similar to or different from how users currently view data in library systems.

[PLAY PIVOTVIEWER.mp4]

EXPLORING LOD: TAKING THEORY TO PRACTICE

How we started

Goals set

What we accomplished

HOW WE BEGAN

Conferences and “buzz”

Curiosity and professional development

Exploration and pilot project

Compelling results; sharing impact of what we’ve learned

Assessment

Much more to do... A sense of humor is helpful!

Photo: Five men with burros, circa 1900, Tonopah/Goldfield Collection

MOTIVATION

Current Practice

Information encapsulated in records

Records contained in collections

Very few links are created within and/or across collections

Links have to be manually created

Existing links do not specify the nature of the relationships among records

This structure hides potential context (links) within and across collections

LOD Potential

Free metadata from silos

Expose rich relationships

Leverage powerful, seamless interlinking of data from multiple sources

Discover and query data in new ways

More precise searching

More opportunities to repurpose data

POLL

Please use the agree/disagree button, available from the pull-down menu at the top of the screen, to respond to the statement below:

Statement: There is interest in doing practical work with linked open data at my institution.

FOUNDATION OF PILOT

Our digital collections consist of unique materials documenting the history of Southern Nevada, stored in CONTENTdm. The project focused on LOD for visual material collections.

Definition of LOD we are using: “Linked Data refers to a set of best practices for publishing and interlinking data on the Web.”

A good way to better understand this is the 5-Star Data diagram: http://5stardata.info/

PREPARING FOR DEPARTURE

Before we launch into a discussion of how we created our linked data, let’s take a short trip.

We will start in our current data: digital collections metadata records, and end in the new world of linked open data.

Photo: Photograph of Howard Hughes in cockpit of the second XF-11, April 4, 1947, Howard Hughes Collection

Graphical Representation: Part of a Record

EXAMPLES OF RECORDS

Showgirls

Menus

Dreaming the Skyline

December 12, 1915

EXPOSING HIDDEN LINKS

POLL

Please use the agree/disagree button, available from the pull-down menu at the top of the screen, to respond to the statement below:

Statement: The diagrams helped me to see how linked data helps to reveal hidden relationships in existing metadata.

UNLV LINKED OPEN DATA PROJECT GOALS

Study the feasibility of developing a common process that would allow the conversion of our collection records into linked data while preserving their original expressivity and richness

Publish data from our collections in the Linked Open Data Cloud to improve discoverability and connections across our collections and with data from other related data sets on the Web

ACTIONS AND TECHNOLOGIES

Phase 1 (CONTENTdm): Clean data; Export data

Phase 2 (OpenRefine): Import data; Prepare data; Reconcile; Generate triples; Export RDF

Phase 3 (Mulgara / Virtuoso): Import data; Publish
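To make the "generate triples" and "export RDF" steps of Phase 2 more concrete, here is a minimal sketch in plain Python. It is not the project's actual OpenRefine workflow: the `BASE` namespace, the record, and the function names are all hypothetical, and Dublin Core terms are assumed as the predicate vocabulary purely for illustration.

```python
# Hypothetical sketch: every name, namespace, and record below is illustrative.
BASE = "http://example.org/"           # placeholder namespace, not UNLV's
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def record_to_ntriples(record, uri_fields):
    """Turn one flat metadata record into a list of N-Triples lines.

    Fields named in uri_fields hold reconciled URIs and become links;
    every other non-empty field is written as a plain literal.
    """
    subject = f"<{BASE}item/{record['id']}>"
    lines = []
    for field, value in record.items():
        if field == "id" or not value:
            continue  # skip the identifier itself and empty fields
        predicate = f"<{DCTERMS}{field}>"
        if field in uri_fields:
            obj = f"<{value}>"
        else:
            obj = f'"{escape_literal(value)}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


record = {
    "id": "pho001",
    "title": 'Photograph of a "showgirl" rehearsal',
    "creator": f"{BASE}agent/example-photographer",
    "date": "",
}
triples = record_to_ntriples(record, uri_fields={"creator"})
for line in triples:
    print(line)
```

The point of the sketch is the distinction between literals and links: only fields that were reconciled to a URI can connect one record to another once the data is loaded into a triplestore.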

WHAT WE LEARNED

With interest and motivation, Linked Open Data is a feasible goal

Visualization tools help convey the benefits of LOD work

A pilot quickly turned into a project and then into production

Moving into the next phase required careful examination of current practice focusing on expressing links (relationships)

Photo: Film transparency of a chimpanzee with slot machines at the Sands Hotel, Las Vegas, circa late 1950s, Sands Collection

LOD APPROACH AFTER THE PILOT

After learning the concepts, applying a model, and testing technologies, the LOD transformation process becomes repeatable

Sustainability of process depends upon data quality

Data begins with existing metadata in current collections; there are many lessons from the pilot that should inform revisions to current practice (even if LOD is more in the future than the present)

MINING THE METADATA

Application profile

Shared Vocabularies

Managing Controlled Vocabularies

Managing Linked Data

When should we start preparing metadata for Linked Data?

EVOLUTION OF METADATA

OUR FOCUS IS ON METADATA

Why? Metadata is essential for establishing relationships

Any metadata? The ability to discover relationships is directly affected by metadata quality

It is critical to:

Use well-established controlled vocabularies (particularly if they are linked data ready)

Rigorously control local terms

Re-use URIs

Assign URIs for local terms
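The last two points, re-using URIs and assigning URIs for local terms, can be sketched as a single lookup. This is only an illustration under stated assumptions: `LOCAL_BASE`, the `EXTERNAL_URIS` table, and the slug rule are hypothetical and do not reflect UNLV's actual URI scheme.

```python
import re

LOCAL_BASE = "http://example.org/vocab/"  # placeholder namespace (assumption)

# Hypothetical extract of an external controlled vocabulary: label -> URI.
EXTERNAL_URIS = {
    "hotels": "http://example.org/external-cv/hotels",
}


def slugify(term):
    """Lower-case a term and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", term.lower()))


def uri_for(term):
    """Re-use a published URI when one exists, otherwise mint a local one."""
    key = term.strip().lower()
    if key in EXTERNAL_URIS:
        return EXTERNAL_URIS[key]
    return LOCAL_BASE + slugify(term)


print(uri_for("Hotels"))
print(uri_for("Example Hotel (Las Vegas, Nev.)"))
```

Because the local URI is derived from the term itself, two records only receive the same URI if the term was entered identically, which is why rigorous control of local terms matters.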

METADATA CREATION – COMMON APPROACHES

Focus is on the collection being created

Usually metadata consistency is managed within collections

Not much rigor is used to enter controlled vocabulary terms. Examples: misspellings, use of terms that do not match the preferred terms, etc.

Limited control of local terms

Implications: Ability to identify relationships within and across collections is decreased
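One way to catch the misspellings and non-preferred terms mentioned above is to compare each entered term against the preferred list. The sketch below uses Python's standard `difflib` module; the term list, the `check_term` helper, and the 0.8 cutoff are assumptions chosen for illustration, not part of the UNLV workflow.

```python
from difflib import get_close_matches

PREFERRED_TERMS = ["Casinos", "Hotels", "Showgirls", "Menus"]  # illustrative


def check_term(entered, preferred=PREFERRED_TERMS, cutoff=0.8):
    """Return (status, suggestion) for a term typed by a metadata creator."""
    if entered in preferred:
        return ("ok", entered)
    # Look for the single closest preferred term above the similarity cutoff.
    matches = get_close_matches(entered, preferred, n=1, cutoff=cutoff)
    if matches:
        return ("suggest", matches[0])
    return ("unknown", None)


for term in ["Showgirls", "Showgrils", "Neon signs"]:
    print(term, check_term(term))
```

A term flagged as "unknown" is not necessarily wrong; it may be a candidate for the local controlled vocabulary.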

When should we start preparing metadata for Linked Data?

WHAT CAN WE DO TO CREATE “SAPIENT” METADATA?

Application Profile

Re-design strategies to manage and use CVs

WHAT DO I DO WITH MY LEGACY METADATA?

Adjust metadata according to the Application Profile

Apply strategies to manage and use CVs effectively

METADATA MILESTONES AT UNLV LIBRARIES

Adopted an approach that considers each individual digital collection as part of an integrated digital library.

THE UNLV APPLICATION PROFILE

Specifies:

which metadata terms UNLV Libraries uses for its digital collections

the source of metadata terms

how metadata should be expressed

labels to be used for each metadata field

Benefits:

Increases consistency of content across digital collections

Improves user interactions with digital collections

Indexing guidelines are easy to generate

Facilitates transformation to Linked Data

Increases compliance with regional and national aggregators
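An application profile can also be expressed as data and used to check records automatically. The following is a hypothetical sketch: the `PROFILE` fields, labels, and vocabularies are invented for illustration and are not the actual UNLV application profile.

```python
# Hypothetical profile: field names, labels, and vocabularies are illustrative.
PROFILE = {
    "title":   {"label": "Title",   "required": True,  "vocabulary": None},
    "subject": {"label": "Subject", "required": False,
                "vocabulary": {"Hotels", "Casinos"}},
    "date":    {"label": "Date",    "required": True,  "vocabulary": None},
}


def validate(record, profile=PROFILE):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, rule in profile.items():
        value = record.get(field)
        if not value:
            if rule["required"]:
                problems.append(f"missing required field: {rule['label']}")
            continue
        vocab = rule["vocabulary"]
        if vocab is not None and value not in vocab:
            problems.append(f"{rule['label']}: '{value}' is not a preferred term")
    # Fields the profile does not define are reported rather than dropped.
    for field in record:
        if field not in profile:
            problems.append(f"field not in profile: {field}")
    return problems


issues = validate({"title": "Hotel marquee", "subject": "Hotel", "notes": "x"})
for issue in issues:
    print(issue)
```

Keeping the profile in one machine-readable place is what lets the same rules apply to every collection, rather than being managed collection by collection.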

OUTCOMES

Well-established CVs allow re-use of URIs

Rigorous rules of data entry facilitate reconciliation

Local controlled vocabularies allow interlinking among local terms / names within collections

Shared vocabularies allow interlinking among local terms / names across collections

All these actions allow the creation of a single process to transform digital collections into linked data
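The effect of shared URIs across collections can be sketched with a small index. The records and URIs below are placeholders invented for illustration; the idea shown is only that two records citing the same URI become connected without anyone creating the link by hand.

```python
from collections import defaultdict

# Illustrative records from two collections; the URIs are placeholders.
RECORDS = [
    {"id": "menus/12", "collection": "Menus",
     "links": ["http://example.org/agent/hotel-a"]},
    {"id": "showgirls/7", "collection": "Showgirls",
     "links": ["http://example.org/agent/hotel-a",
               "http://example.org/agent/producer-b"]},
    {"id": "showgirls/9", "collection": "Showgirls",
     "links": ["http://example.org/agent/producer-b"]},
]


def cross_collection_links(records):
    """Map each URI to the records that cite it, keeping only URIs
    that connect records from more than one collection."""
    by_uri = defaultdict(list)
    for rec in records:
        for uri in rec["links"]:
            by_uri[uri].append(rec)
    return {
        uri: [r["id"] for r in recs]
        for uri, recs in by_uri.items()
        if len({r["collection"] for r in recs}) > 1
    }


shared = cross_collection_links(RECORDS)
print(shared)
```

In a triplestore the same question would normally be asked with a query rather than application code; the dictionary here simply stands in for that lookup.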

Video: [PLAY SUPER-SKELETON-WHH.mp4]

MOVING FROM EXPERIMENTATION TO IMPLEMENTATION

Cleaning and sharing controlled vocabularies from legacy collections (time consuming)

Re-training metadata creators

Re-designing workflows

Delegating additional data management responsibilities

DATA MANAGEMENT

Maintenance of local URIs:

Terms

Authoritative names

Design and implementation of new processes to maintain synchronization between digital library and linked data set

Design processes to enrich relationships with external data sets

NEXT STEPS

Future Activities

Resources

Video Demo

FUTURE ACTIVITIES

Publish data

Interlinking with other data sets

Documentation

Collaborative activities (regional controlled vocabularies)

Training and staff skill development

Interface design and development

Work with hierarchical data

VIDEO DEMO

This short video (no sound, just image) will give a preview of what linked data may look like to users.

It shows the RelFinder software acting upon UNLV’s Linked Open Data triplestore.

Think about how this is similar to or different from how users currently view data in library systems.

[PLAY SHOWING RELATIONSHIPS.mp4]

THE LINKED DATA CLOUD

RESOURCES

Leading to Linking: Introducing Linked Data to Academic Library Digital Collections: http://www.tandfonline.com/doi/pdf/10.1080/19386389.2013.826095

A Guide for Transforming Digital Collections Metadata into Linked Data Using Open Source Technologies:

http://www.tandfonline.com/doi/pdf/10.1080/19386389.2015.1007009

UNLV Linked Data Blog (videos posted here): https://www.library.unlv.edu/linked-data

Contact us!

TH

AN

K Y

OU

!

Contact Us:

Cory Lampert [email protected]

Silvia Southwick [email protected]

UNLV Digital Collections: www.d.library.unlv.edu

Questions?

Photo: Photograph of Bluebells posing outside of Pan Am jet, 1958, Donn Arden Collection
