Data Publishing in Archaeozoology

Data Publishing in Archaeozoology, or “Everybody knows that a 14 is a Sheep”. Sarah Whitcher Kansa, Alexandria Archive Institute, OpenContext.org. Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>

Transcript of Data Publishing in Archaeozoology

Page 1: Data Publishing in Archaeozoology

Data Publishing in Archaeozoology

or “Everybody knows that a 14 is a Sheep”

Sarah Whitcher Kansa

Alexandria Archive Institute

OpenContext.org

Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License

<http://creativecommons.org/licenses/by/3.0/>

Page 2: Data Publishing in Archaeozoology

Main Points

- Reproducibility and new research opportunities require data sharing

- Raw data are not sufficient

- Publishing open data on the Web is a solution

- Publishing data takes special expertise

Page 3: Data Publishing in Archaeozoology

Good scientific practice requires data sharing.

We cannot trust results based on hidden data.

Page 4: Data Publishing in Archaeozoology

The Challenges

• Limits of print (entrenched practice but not best practice)

• Data preservation crisis (wasted effort)

• Hard to compare and integrate data now

Page 5: Data Publishing in Archaeozoology

Policy Consensus:

Urgent Need for Better Data Practices!

Page 6: Data Publishing in Archaeozoology

DIPIR (http://www.dipir.org)

3-year project, Oct. 2010-Sept. 2013

National Leadership Grant from the Institute for Museum and Library Services (LG-06-10-0140-10)

Ixchel Faniel (PI), Elizabeth Yakel (Co-PI)

Page 7: Data Publishing in Archaeozoology

Raw Data Can Be Unappetizing

Page 8: Data Publishing in Archaeozoology

Data Documentation Practices

“I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, ‘This is ridiculous.’… I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)

A long way to go before we get usable, intelligible data
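To make the problem with private codes concrete, here is a minimal sketch of decoding a numeric taxon column against an explicit codebook. Only the pairing of 14 with sheep comes from the quote above; the other codes, the field names, and the sample rows are hypothetical.

```python
# Hypothetical codebook. Only 14 -> "sheep" comes from the interview quote;
# the other entries are invented for illustration.
CODEBOOK = {14: "sheep", 15: "goat", 20: "cattle"}


def decode_taxon(code, codebook=CODEBOOK):
    """Return the label for a numeric taxon code, or flag it as undocumented."""
    return codebook.get(code, f"UNDOCUMENTED CODE {code}")


# Hypothetical raw rows as they might sit in a personal spreadsheet.
raw_rows = [
    {"specimen": "A1", "taxon": 14},
    {"specimen": "A2", "taxon": 99},  # a code that is missing from the book
]

# Keep the original code and add a readable label next to it.
decoded = [{**row, "taxon_label": decode_taxon(row["taxon"])} for row in raw_rows]

for row in decoded:
    print(row["specimen"], row["taxon"], row["taxon_label"])
```

Without the codebook travelling with the spreadsheet, the second row cannot be interpreted by anyone but the original recorder.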

Page 9: Data Publishing in Archaeozoology

Sometimes data is better served cooked.

Page 10: Data Publishing in Archaeozoology

Adapt “publishing” metaphor to digital data

Page 11: Data Publishing in Archaeozoology

What is Data Publication?

Putting editorially-vetted data on the Web

• Cleaned, described, organized

• More intelligible and cohesive

• Open access

• Linked to other resources (including print publications)

• Machine-readable for discovery and reuse

• Archived and curated (CDL)

Page 12: Data Publishing in Archaeozoology

Benefits & Challenges

The Good:

• Enhanced presentation

• Enhanced search, discovery, understanding

• Depth & breadth (linked to project data, other datasets, print publications, etc.)

• Allowing for Linked Open Data = facilitates future use

• Professional advancement

The Bad:

• Takes time, effort

• Requires informatics expertise

Benefits need to outweigh challenges

Page 13: Data Publishing in Archaeozoology

Thousand Flowers

Started in 2007

Integrates and publishes various forms of archaeological documentation (structured data, media, documents)

Not a repository, but archived with California Digital Library

Interoperability via web services, increasing emphasis on Linked Data

Page 14: Data Publishing in Archaeozoology

Data Publishing

Data Quality and Standards Alignment

(1) Check consistency

(2) Edit functions

(3) Align to common standards (“Linked Data” if applicable)

(4) Issue tracking, version control
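As an illustration of step (1), a consistency check can be as simple as grouping values that differ only by case or stray whitespace. This is a rough sketch, not Open Context's actual tooling; the function name, the "element" column, and the sample rows are assumptions.

```python
from collections import defaultdict


def find_inconsistent_values(rows, column):
    """Group values in one column that differ only by case or outer whitespace.

    Returns a dict mapping the normalized value to the sorted list of
    variant spellings, for values that have more than one spelling.
    """
    variants = defaultdict(set)
    for row in rows:
        value = row.get(column)
        if value is None:
            continue
        variants[value.strip().lower()].add(value)
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


# Hypothetical rows from a contributed spreadsheet.
rows = [
    {"element": "Humerus"},
    {"element": "humerus "},
    {"element": "Tibia"},
]

issues = find_inconsistent_values(rows, "element")
print(issues)
```

A report like this gives the editor and the data author a concrete list of values to discuss, which is where the edit functions and issue tracking in steps (2) and (4) come in.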

Page 15: Data Publishing in Archaeozoology
Page 16: Data Publishing in Archaeozoology
Page 17: Data Publishing in Archaeozoology
Page 18: Data Publishing in Archaeozoology

Data Publishing

Comprehensive (Kenan Tepe: 30K photos, documents, object descriptions)

Added capabilities (search, analysis, visualization)

More attractive, usable data

Interactions with data editors improve data

Page 19: Data Publishing in Archaeozoology

• Citation provided for each item

• CDL archival service to give permanence

Page 20: Data Publishing in Archaeozoology

Beyond the Silo

Often too much emphasis on single systems, need to consider relationships across systems

Even if one reaches some scale, it can't be isolated from the rest of the Web

Machines are important “audiences” (e.g. RESTful Services: Atom, AtomPub, JSON, etc.)
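A short sketch of what serving a machine "audience" means in practice: a program reads a JSON representation of a record and uses its fields directly, with no screen-scraping. The payload below is hypothetical. Its field names and the example.org URI are illustrative assumptions and do not reproduce Open Context's actual schema.

```python
import json

# Hypothetical JSON payload, standing in for what a web service might return.
# Field names are invented for illustration.
payload = """
{
  "uri": "http://example.org/specimens/1",
  "label": "Bone 1",
  "taxon": "http://eol.org/pages/328699/"
}
"""

record = json.loads(payload)


def summarize(rec):
    """Build a one-line, human-readable summary from a parsed record."""
    return f'{rec["label"]} <{rec["uri"]}> identified as {rec["taxon"]}'


summary = summarize(record)
print(summary)
```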

Page 21: Data Publishing in Archaeozoology

Linked Open Data

Regarded as best practice for sharing data (among informatics researchers)

Page 22: Data Publishing in Archaeozoology

Web of Data (2009)

Growing, Decentralized Innovation

Page 23: Data Publishing in Archaeozoology

Web of Data (2011)

Page 24: Data Publishing in Archaeozoology

Web of Data (2011)

Need Archaeology on the Map

Contributions should not be isolated from other communities

Page 25: Data Publishing in Archaeozoology

Open Context: Record

HTTP URIs to identify resources at a meaningful level of granularity (“a URL per potsherd”)

Use HTTP URIs published by others

URIs act as “primary keys” that allow data to be related
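The "primary key" idea can be sketched as follows: two independently recorded datasets become comparable once both point at the same concept URI. The EOL URIs are the ones cited in these slides; the example.org record URIs, the field names, and the two project lists are hypothetical.

```python
from collections import defaultdict

# Concept URIs taken from the slides.
BOS_TAURUS = "http://eol.org/pages/328699/"
CAPRA_HIRCUS = "http://eol.org/pages/328660/"

# Hypothetical records from two projects; each record has its own URI.
project_a = [
    {"id": "http://example.org/project-a/bone/1", "taxon_uri": BOS_TAURUS},
    {"id": "http://example.org/project-a/bone/2", "taxon_uri": CAPRA_HIRCUS},
]
project_b = [
    {"id": "http://example.org/project-b/specimen/7", "taxon_uri": BOS_TAURUS},
]


def group_by_taxon(*datasets):
    """Relate records from any number of datasets through their shared taxon URI."""
    grouped = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            grouped[record["taxon_uri"]].append(record["id"])
    return dict(grouped)


combined = group_by_taxon(project_a, project_b)
for taxon_uri, record_ids in combined.items():
    print(taxon_uri, record_ids)
```

Because the join key is a URI published by a third party, neither project needs to know about the other's local vocabulary.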

Page 26: Data Publishing in Archaeozoology
Page 27: Data Publishing in Archaeozoology

Concept: Bos taurus (http://eol.org/pages/328699/)

Page 28: Data Publishing in Archaeozoology

Concept: Bos taurus (http://eol.org/pages/328699/)

Page 29: Data Publishing in Archaeozoology

Open Context: Record

Page 30: Data Publishing in Archaeozoology

Open Context Entity Reconciliation

Authors / Editors relate project-specific terminologies to global terminologies

“Common name : Cattle, domestic” = http://eol.org/pages/328699/ (Bos taurus)

Page 31: Data Publishing in Archaeozoology

Open Context Entity Reconciliation

Many project-specific terms related to global terminologies

Authors / Editors relate project-specific terminologies to global terminologies

Project-Specific Property     EOL Link (Global Terminology)
Species : Sheep / Goat        http://eol.org/pages/2851411/ (Caprinae)
Taxon : Bos taurus            http://eol.org/pages/328699/ (Bos taurus)
Species : Deer                http://eol.org/pages/38816/ (Dama sp.)
Type : Deer                   http://eol.org/pages/34547/ (Odocoileus sp.)
Taxon : Ovis / Capra          http://eol.org/pages/2851411/ (Caprinae)
Species : Cattle              http://eol.org/pages/34548/ (Bos taurus)
Species : Goat                http://eol.org/pages/328660/ (Capra hircus)
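The reconciliation table above can be pictured as a lookup from a project-specific (property, value) pair to a global URI. The pairs and URIs below are copied from the slide. The dictionary layout and the function names are assumptions made for this sketch, and a real system would also need to key on the project, since the same local term can mean different things in different projects.

```python
# (property, value) pairs and EOL URIs as listed on the slide.
RECONCILIATION = {
    ("Species", "Sheep / Goat"): "http://eol.org/pages/2851411/",
    ("Taxon", "Bos taurus"): "http://eol.org/pages/328699/",
    ("Species", "Deer"): "http://eol.org/pages/38816/",
    ("Type", "Deer"): "http://eol.org/pages/34547/",
    ("Taxon", "Ovis / Capra"): "http://eol.org/pages/2851411/",
    ("Species", "Cattle"): "http://eol.org/pages/34548/",
    ("Species", "Goat"): "http://eol.org/pages/328660/",
}


def reconcile(prop, value):
    """Return the global URI for a project-specific term, or None if unmapped."""
    return RECONCILIATION.get((prop, value))


def terms_for(uri):
    """List every project-specific term that was mapped to the given URI."""
    return [term for term, mapped in RECONCILIATION.items() if mapped == uri]


# Two differently worded local terms resolve to the same global concept.
print(reconcile("Species", "Sheep / Goat") == reconcile("Taxon", "Ovis / Capra"))
print(terms_for("http://eol.org/pages/2851411/"))
```

Note that "Deer" appears twice with different URIs: the local label is the same, but the editors linked each project's usage to a different genus.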

Page 32: Data Publishing in Archaeozoology

Open Context Entity Reconciliation

Editorial workflow helps annotate data for interoperability

Page 33: Data Publishing in Archaeozoology
Page 34: Data Publishing in Archaeozoology

Data Publishing Projects

EOL (2012) funding for publishing additional zooarchaeology datasets (Neolithic Anatolia), in project led by Ben Arbuckle (Baylor University)

Page 35: Data Publishing in Archaeozoology

Data Publishing Projects

NEH (2012) funding for publishing trade + exchange related datasets (Bronze-Iron Age Mediterranean)

Page 36: Data Publishing in Archaeozoology

Data Publishing Projects

Complement Conventional Publishing

Lockwood Press (“Archaeobiology Series”), Cotsen Institute Press (UCLA)

Page 37: Data Publishing in Archaeozoology

Data Publishing Projects

Driven by research interests and publication goals among researchers wanting to compare datasets, create reference collections, and have citable, full datasets linked to synthetic publications.

Page 38: Data Publishing in Archaeozoology

Summary

Outcomes of Publishing Data:

(1) Make “datasets” first-class citizens in the world of scholarly communications

(2) Provide needed transparency to published interpretations

(3) Enable new kinds of multi-disciplinary research across many datasets

Page 39: Data Publishing in Archaeozoology

Thank you!

Special Thanks!

Canan Çakırlar, RCAC, Koç University, ICAZ, and other sponsors