Big Metadata: Mining Special Collections Catalogs for New Knowledge

Big Metadata

Mining Special Collections Catalogs for New Knowledge

@AllisonJaiODell

#rbms15

2015 RBMS Conference, 25 June, Oakland & Berkeley, CA

#gillsans #sorrynotsorry



Metadata

Data about data

“Metadata was traditionally in the card catalogs of libraries”

-- Wikipedia

“We kill people based on metadata”

Big Data

“Big data is an evolving term that describes any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be mined for information.”

-- Margaret Rouse

IT Acronyms: A Quick Reference Guide

Volume

Velocity

Variety

Veracity

Big Metadata

A voluminous amount of semi-structured data that has the potential to be mined for information

Data Mining

“Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.”

-- David Hand, Heikki Mannila, Padhraic Smyth

Principles of Data Mining
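As a rough sketch of what "summarizing the data in novel ways" can mean for a catalog, the Python below tallies places of publication and decades across a handful of records. The records, the field names, and the `summarize` helper are all invented for illustration; a real catalog export would need parsing and cleanup first.

```python
from collections import Counter

# Hypothetical catalog records, invented for illustration only.
RECORDS = [
    {"title": "A Book of Hours", "place": "Paris", "year": 1495},
    {"title": "Herbal", "place": "London", "year": 1597},
    {"title": "Atlas", "place": "Amsterdam", "year": 1662},
    {"title": "Sermons", "place": "London", "year": 1668},
    {"title": "Almanac", "place": "London", "year": 1591},
]


def summarize(records):
    """Count records by place of publication and by decade of publication."""
    by_place = Counter(r["place"] for r in records)
    by_decade = Counter((r["year"] // 10) * 10 for r in records)
    return by_place, by_decade


by_place, by_decade = summarize(RECORDS)
print(by_place.most_common())
print(sorted(by_decade.items()))
```

Even a tally this simple can surface patterns that are hard to see record by record, such as which decades or places dominate a collection.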

Digital Humanities

“By digital humanities, we mean research that uses information technology as a central part of its methodology, for creating and/or processing data.

“The digital humanities used to be known as Humanities Computing, or ICT (Information and Communications Technology) for humanities research. The use of the term reflects a growing sense of the importance that digital tools and resources now have for humanities subjects.”

-- University of Oxford

What are the Digital Humanities?

Visualization

“Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly”

-- SAS
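To make the idea concrete without any charting library, here is a minimal sketch that turns counts into a plain-text bar chart. The `text_bar_chart` helper and the counts are hypothetical, made up for illustration; the visualization tools named in this talk produce far richer output.

```python
def text_bar_chart(counts, width=20):
    """Return one line per label, largest count first, with a scaled bar.

    Assumes every count is a positive number.
    """
    if not counts:
        return []
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * n / largest)
        lines.append(f"{label.ljust(label_width)} {bar} {n}")
    return lines


# Hypothetical counts of records per place of publication.
counts = {"Paris": 6, "London": 12, "Amsterdam": 3}
lines = text_bar_chart(counts)
for line in lines:
    print(line)
```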

Topic Modeling

“Topic models provide a simple way to analyze large volumes of unlabeled text. A ‘topic’ consists of a cluster of words that frequently occur together.”

-- MAchine Learning for LanguagE Toolkit (MALLET)
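The intuition of "words that frequently occur together" can be sketched with plain co-occurrence counting. This is not topic modeling as MALLET performs it, only a much simpler stand-in for the underlying idea; the descriptions and the `cooccurrence` helper are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical catalog descriptions, invented for illustration only.
DESCRIPTIONS = [
    "whaling voyage logbook pacific",
    "whaling ship logbook nantucket",
    "botany herbal woodcuts plants",
    "herbal plants medicine woodcuts",
]


def cooccurrence(docs):
    """Count how many documents contain each unordered pair of words."""
    pairs = Counter()
    for doc in docs:
        words = sorted(set(doc.split()))  # sorted so each pair has one spelling
        pairs.update(combinations(words, 2))
    return pairs


pairs = cooccurrence(DESCRIPTIONS)
print(pairs.most_common(4))
```

Pairs that recur across records (here, words about whaling logbooks and about herbals) hint at the clusters a real topic model would estimate statistically.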

Pattern Matching

“Pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern.”

-- Wikipedia

“A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids.”

-- regular-expressions.info
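A short sketch of regular expressions applied to catalog text: the Python below pulls publication years and heights in centimeters out of free-text statements. The sample strings, the two patterns, and the helper names are assumptions made for illustration; real catalog data is messier and would need broader patterns.

```python
import re

# Hypothetical imprint and physical-description strings, invented for illustration.
STATEMENTS = [
    "London : Printed for J. Tonson, 1709.",
    "Paris : chez Didot, [1785?]",
    "[4], 212 p. ; 23 cm",
    "Philadelphia : s.n., 1801-1803.",
]

# Four-digit years from 1400 to 2099 that are not part of a longer number.
YEAR = re.compile(r"(?<!\d)(1[4-9]\d\d|20\d\d)(?!\d)")
# A height given in centimeters, as in "23 cm".
HEIGHT_CM = re.compile(r"(\d+)\s*cm\b")


def extract_years(text):
    """Return every year-like number found in the text, as integers."""
    return [int(y) for y in YEAR.findall(text)]


def extract_height_cm(text):
    """Return the first height in centimeters, or None if there is none."""
    match = HEIGHT_CM.search(text)
    return int(match.group(1)) if match else None


for statement in STATEMENTS:
    print(extract_years(statement), extract_height_cm(statement), statement)
```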

Tools

R

D3

Gephi

MIT Exhibit

Tableau

FusionCharts

PALLADIO

MALLET

Topic-Modeling-Tool

ArchExtract

Stanford Named Entity Recognizer

Jigsaw

More in the DH Toychest

Shop Your Closet

“You really can repurpose what you have. Look in the back of the closet at the garments and whole outfits you forgot you have. Mix it all up in new combinations.”

-- Deborah L. Jacobs

10 Ways to ‘Shop Your Closet’

Share Everything Plan

Data dumps

Export options

Harvesting enabled
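One common way libraries enable harvesting is OAI-PMH; the talk does not name a protocol, so treating it as the mechanism here is an assumption. The sketch below only builds a `ListRecords` request URL. The base URL, the set name, and the `list_records_url` helper are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (nothing is fetched)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name, for illustration only.
url = list_records_url("https://example.org/oai", set_spec="rare_books")
print(url)
```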

Provenance Metadata

“Assertions about description statements or description sets”

-- DCMI Metadata Provenance Task Group

Creation & revision history

Policy documentation
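One possible way to picture provenance metadata is as a revision history attached to each description statement. The sketch below is a hypothetical data model, not the DCMI task group's specification; the class names, fields, and sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Revision:
    """One provenance assertion: who touched a statement, when, under which policy."""
    agent: str
    when: date
    policy: str


@dataclass
class Statement:
    """A single description statement plus its creation and revision history."""
    field_name: str
    value: str
    history: list = field(default_factory=list)

    def last_revised(self):
        """Return the most recent revision date, or None if there is no history."""
        return max((r.when for r in self.history), default=None)


# Hypothetical statement and revisions, for illustration only.
stmt = Statement("place_of_publication", "London")
stmt.history.append(Revision("cataloger_a", date(2014, 11, 5), "local policy v1"))
stmt.history.append(Revision("cataloger_b", date(2015, 3, 2), "local policy v2"))
print(stmt.last_revised())
```

Recording the policy alongside each revision lets a data miner judge whether two statements were created under comparable rules before comparing them.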

Summary

Metadata is data

Your catalog is full of data

Do some data mining

Make some cool discovery experiences

Make your researchers happy

Questions?

Allison Jai O’Dell

Metadata Librarian

University of Florida


@AllisonJaiODell

#rbms15