Interpretation, Context, and Metadata: Examples from Open Context
Eric Kansa (@ekansa)
Data often discussed using language of compliance (Taylorist perspectives)
● Linked: Links with other systems & data (tDAR, ORCID, etc)
● Open: Code, data (mainly CC-By) on GitHub, machine-readable formats, APIs
● Long-term: NSF, NEH data management. California Digital Library archiving
● Global: Mirroring, collaboration with the German Archaeological Institute (DAI)
Role: Publication (editorial & peer-review) and exhibition (like an online museum)
Promote Data Reuse: Attempt to document context, annotate data to common vocabularies. Increasing emphasis on intervening earlier in research data “life-cycle”.
Spectrum of Less and More Structure
1. More structured: classification, quantification
2. Less structured: images, field-notes
3. Structured and less structured information need to cross-reference (URIs useful), all provide context
Open Context ≠ A conventional digital repository
Information | Stable URI
300m wall circumference (estimated based on geomagnetic sounding, approximate) | http://arcserver.usc.edu/reports/reports/TAA_2000_to_2007.pdf
Wall foundation about 1.8m thick | http://opencontext.org/media/BF565965-98A8-4E84-2318-AFFA983277E1
Brick dimensions: 34 x 31 x 9 cm | http://opencontext.org/subjects/975143F2-B80E-436B-B078-1D67FD848352
Surviving wall height: 1.2 meters | http://opencontext.org/subjects/02B9D6E6-D6AD-4138-7FCC-3EF6F8BD5722
Specific Citation Promotes Reproducibility
1. Look at lots of pictures, read field notes.
2. URIs facilitate reproducibility, link assertions with specific information sources
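The pairing of each claim with a stable URI can be sketched as a small data structure. This is only an illustration, not how Open Context stores its data: the `Assertion` class and the `sources_for` helper are hypothetical names, and the values are copied from the wall measurements listed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One interpretive claim paired with the stable URI that documents it."""
    claim: str
    source_uri: str


# Values copied from the wall example; the structure itself is an assumption.
ASSERTIONS = [
    Assertion("Wall foundation about 1.8m thick",
              "http://opencontext.org/media/BF565965-98A8-4E84-2318-AFFA983277E1"),
    Assertion("Brick dimensions: 34 x 31 x 9 cm",
              "http://opencontext.org/subjects/975143F2-B80E-436B-B078-1D67FD848352"),
    Assertion("Surviving wall height: 1.2 meters",
              "http://opencontext.org/subjects/02B9D6E6-D6AD-4138-7FCC-3EF6F8BD5722"),
]


def sources_for(keyword):
    """Return the URIs backing every claim whose text mentions the keyword."""
    return [a.source_uri for a in ASSERTIONS if keyword.lower() in a.claim.lower()]


print(sources_for("wall"))
```

A reader who doubts a claim can follow its URI to the photograph or field record behind it, which is the sense of reproducibility the slide describes.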
URIs & Unstructured Data
APIs (Machine-Readable Data) make it easier to re-use, analyze, visualize, + interpret less structured data.
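As a rough sketch of why machine-readable output helps, the snippet below parses a JSON document and groups less structured items (images, field notes) by the URI of the entity they describe. The JSON layout, the field names, and the `example.org` URIs are invented for illustration. They are not Open Context's actual API response format.

```python
import json
from collections import defaultdict

# Hypothetical API response: field names and URIs are assumptions.
RESPONSE = """
[
  {"type": "image", "about": "http://example.org/subjects/wall-1",
   "uri": "http://example.org/media/photo-1"},
  {"type": "field-note", "about": "http://example.org/subjects/wall-1",
   "uri": "http://example.org/documents/note-7"},
  {"type": "image", "about": "http://example.org/subjects/brick-3",
   "uri": "http://example.org/media/photo-2"}
]
"""


def group_by_subject(raw_json):
    """Map each subject URI to the URIs of the media and notes that document it."""
    grouped = defaultdict(list)
    for item in json.loads(raw_json):
        grouped[item["about"]].append(item["uri"])
    return dict(grouped)


for subject, evidence in group_by_subject(RESPONSE).items():
    print(subject, "->", evidence)
```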
Open Context ≠ A conventional digital repository
Image Credit: Mark Skipper via Flickr (CC-BY) https://www.flickr.com/photos/bitterjug/7670055210
Challenge of Complexity
Entity Relation Diagram: Anglo-Saxon Graves and Grave Goods of the 6th and 7th Centuries AD: A Chronological Framework. John Hines (2013). http://dx.doi.org/10.5284/1018290
Aspect | Open Context | Digital Repository
Citation | Cite Archaeological Entities (sites, coins, bones, etc) | Cite Digital Files (can contain thousands of items)
Granularity | High (“1 URI per potsherd”) | Low (Information aggregated in big files)
Discovery, Querying | Common schema, common index for content, not just metadata | Index metadata only, content is more opaque
Cost | Expensive “Boutique Publishing” | Cheaper, easier to scale. Self-service models.
Managing Complexity: Data about this coin came from several different files (relational databases, spreadsheets)
Some archaeological projects can have dozens of different spreadsheets + databases!
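A minimal sketch of pulling together data about one find from several source files, assuming every file shares a `find_id` column. The table names, field names, and rows are hypothetical. Real project data rarely lines up this neatly.

```python
# Hypothetical rows, as if exported from three separate project files.
FINDS_DB = [{"find_id": "coin-1", "material": "bronze"}]
CONTEXT_SHEET = [{"find_id": "coin-1", "trench": "T4", "locus": "112"}]
PHOTO_LOG = [
    {"find_id": "coin-1", "photo": "IMG_0042.jpg"},
    {"find_id": "coin-1", "photo": "IMG_0043.jpg"},
]


def merge_records(*tables):
    """Combine rows that share a find_id into one record per find.

    Every field is stored as a list of distinct values, so repeated or
    conflicting values from different files are kept, not overwritten.
    """
    merged = {}
    for table in tables:
        for row in table:
            record = merged.setdefault(row["find_id"], {})
            for field, value in row.items():
                if field == "find_id":
                    continue
                values = record.setdefault(field, [])
                if value not in values:
                    values.append(value)
    return merged


print(merge_records(FINDS_DB, CONTEXT_SHEET, PHOTO_LOG))
```

Keeping every value as a list is a deliberate choice here: when two files disagree about a find, the disagreement stays visible for an editor to resolve.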
Publishing Workflow
Improve / Enhance
1. Consistency
2. Context (intelligibility)
Large scale data sharing & integration for exploring the origins of farming. Funded by EOL / NEH
“Bos taurus” http://eol.org/pages/328699
Terms from source datasets shown with this concept: Code: 14; Cattle; Code: 70; Code: 16; Bos taurus; Code: 15; Cattle, domestic; B. taurus; Cattle (dom.)
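The idea of annotating many local terms to one shared concept can be sketched as a lookup table. The terms and the EOL URI come from the slide. The dataset names and the pairing of terms with datasets are hypothetical, since a numeric code only has meaning within its own dataset.

```python
EOL_BOS_TAURUS = "http://eol.org/pages/328699"

# (dataset, local term) -> concept URI. Dataset names and the pairing of
# terms with datasets are hypothetical; the terms come from the slide.
ANNOTATIONS = {
    ("dataset-a", "Code: 14"): EOL_BOS_TAURUS,
    ("dataset-a", "Cattle"): EOL_BOS_TAURUS,
    ("dataset-b", "Code: 70"): EOL_BOS_TAURUS,
    ("dataset-b", "Bos taurus"): EOL_BOS_TAURUS,
    ("dataset-c", "Cattle, domestic"): EOL_BOS_TAURUS,
    ("dataset-d", "B. taurus"): EOL_BOS_TAURUS,
}


def reconcile(dataset, term):
    """Return the shared concept URI for a local term, or None if unannotated."""
    return ANNOTATIONS.get((dataset, term.strip()))


def count_by_concept(rows):
    """Tally (dataset, term) rows by shared concept URI, skipping unknown terms."""
    counts = {}
    for dataset, term in rows:
        uri = reconcile(dataset, term)
        if uri is not None:
            counts[uri] = counts.get(uri, 0) + 1
    return counts


rows = [("dataset-a", "Code: 14"), ("dataset-b", "Bos taurus"), ("dataset-b", "Code: 14")]
print(count_by_concept(rows))
```

The third row is skipped because "Code: 14" has no annotation in `dataset-b`: the same code cannot be assumed to mean the same thing across projects.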
Limitations
• Diverse recovery, sampling, identification methods…
• Data modeling problems in sources (esp. teeth)
• Researchers need to understand how to make data better suited for reuse
Bootstrapping Problem
• (Linked) Data can feel like having a telephone with nobody to call
• Links with other data can help build context. But relevance can have a very narrow scope
Pelagios: Geographic context emerging as a key way to aggregate multiple datasets (PIs: Leif Isaksen, Elton Barker)
● Digital Index of North American Archaeology (DINAA): David G. Anderson, Joshua Wells (PIs). NSF-funded.
● Publishes a gazetteer of archaeological “site” records (from state agencies). (A site is a key concept in archaeology.)
● Cross-referenced site URIs with relevant records in tDAR and other public databases
PeriodO (http://perio.do)
• Led by Adam Rabinowitz, Ryan Shaw, Eric Kansa (NEH funding)
• Sometimes little consensus in context (time periods)
PeriodO Gazetteer of Periods, modeling:
(1) Temporal scope
(2) Geographic coverage
(3) Scholarly authority [because of disagreements about High, Middle, and Low Chronologies]
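The three modeled dimensions can be sketched as a record type in which the same period label has competing definitions from different authorities. This is an illustrative structure, not PeriodO's actual data model. The class, the field names, and all values are placeholders, not real chronologies or sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodDefinition:
    label: str        # name of the period as the authority uses it
    start_year: int   # temporal scope (negative = BCE)
    end_year: int
    region: str       # geographic coverage
    authority: str    # the source asserting this definition


# Placeholder values only: not real chronologies or real sources.
DEFINITIONS = [
    PeriodDefinition("Example Period", -1600, -1200, "Region X",
                     "Author A (high chronology)"),
    PeriodDefinition("Example Period", -1550, -1150, "Region X",
                     "Author B (low chronology)"),
]


def authorities_including(label, year):
    """List the authorities whose definition of `label` contains `year`."""
    return [d.authority for d in DEFINITIONS
            if d.label == label and d.start_year <= year <= d.end_year]


print(authorities_including("Example Period", -1580))
print(authorities_including("Example Period", -1300))
```

Recording the authority alongside each date range means a disagreement shows up as two definitions side by side, without forcing a single "correct" range.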
New Publishing Services
1. Open Context will publish citable, formally modeled (SKOS) controlled vocabularies
2. Context-informed reconciliation services to help researchers / curators link data
3. Offer a recommendation service for relevant vocabularies for researchers (especially those seeking DMP help)
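To give a sense of what a formally modeled SKOS vocabulary entry might look like, the sketch below serializes one concept as Turtle using plain string formatting. The `example.org` concept URI, the function name, and the choice of `skos:closeMatch` for the external link are assumptions for illustration. The SKOS properties themselves (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:closeMatch`) are standard.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def concept_to_turtle(uri, pref_label, alt_labels=(), close_matches=()):
    """Serialize one concept as SKOS Turtle.

    Labels are assumed to be English and to contain no double quotes;
    a real serializer would need to escape them.
    """
    lines = [f'skos:prefLabel "{pref_label}"@en']
    lines += [f'skos:altLabel "{label}"@en' for label in alt_labels]
    lines += [f"skos:closeMatch <{match}>" for match in close_matches]
    body = " ;\n    ".join(lines)
    return f"{SKOS_PREFIX}\n\n<{uri}> a skos:Concept ;\n    {body} ."


# Hypothetical vocabulary entry; only the EOL URI appears in the talk.
print(concept_to_turtle(
    "http://example.org/vocab/cattle",
    "Cattle",
    alt_labels=["Cattle (dom.)", "B. taurus"],
    close_matches=["http://eol.org/pages/328699"],
))
```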
Final Thoughts
(Finally) some examples of data reuse and integration (in archaeology).
In many cases, reuse is still aspirational. Need long time scales to develop context.
“Context” is a hard research problem (including theoretical); requires better practice at each stage of the data life-cycle.
THANK YOU!
Special Thanks! DCC, DIPIR Team!