Thatcamp recap

THATCAMPSTL TAKEAWAYS: RECAPS OF SESSIONS AT THATCAMPSTL ON NOVEMBER 9, 2013

description

Session notes from THATCampSTL on November 9, 2013 at Washington University in St. Louis

Transcript of Thatcamp recap

Page 1: Thatcamp recap

THATCAMPSTL TAKEAWAYS: RECAPS OF SESSIONS AT THATCAMPSTL ON NOVEMBER 9, 2013

Page 2: Thatcamp recap

BEGINNERS DIGITAL HUMANITIES/SUBJECT LIBRARIAN BOOT CAMP

What is Digital Humanities? Hack (building scholarly digital editions, projects) vs. Yack (theory)

Two areas within the Hack part of DH: textual encoding with TEI (Text Encoding Initiative) XML, and text mining
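To illustrate the encoding side, a minimal TEI fragment can be built with Python's standard library. This is a toy sketch, not part of the session materials: `l` and `persName` are real TEI elements, but the verse text is invented.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # emit TEI as the default namespace

# Encode one line of verse containing a tagged personal name
line = ET.Element(f"{{{TEI_NS}}}l", n="1")
line.text = "Sing, "
name = ET.SubElement(line, f"{{{TEI_NS}}}persName")
name.text = "Calliope"
name.tail = ", of arms and the man"

print(ET.tostring(line, encoding="unicode"))
```

The point of the markup is that the name is now machine-identifiable data, not just a string inside the line.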

We all use digital tools now, so what differentiates digital humanities scholarship from 'traditional' scholarship? Digital humanities scholarship leverages the digital medium, i.e., it creates something that could not be duplicated in analog formats; if it could be reproduced in analog form with no loss, it's not DH

The research team of scholars, programmers, and librarians is characteristic of (and probably necessary to) DH, but new to the humanities, which have a tradition (if not an entirely accurate one) of the "lone wolf" scholar

Pointers toward resources in getting started in DH will be on the THATCamp STL site next week!

Session leaders: Chris Freeland and Andrew Rouner

Page 3: Thatcamp recap

POTENTIAL LITERATURE

We looked at Markov chain random text generation.
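A minimal sketch of this kind of Markov-chain generator (not the script used in the session; function names and defaults are illustrative):

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each word-tuple of length `order` to the words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length=20, seed=0):
    """Walk the chain from a random starting key, emitting up to `length` words."""
    random.seed(seed)
    key = random.choice(list(chain))
    out = list(key)
    for _ in range(length - len(key)):
        followers = chain.get(tuple(out[-len(key):]))
        if not followers:
            break  # dead end: the last key never occurred mid-text
        out.append(random.choice(followers))
    return " ".join(out)
```

Raising `order` makes the output more faithful to the source text and less surprising, which is the central creative trade-off of the technique.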

Playing around with a "rhymer" script led to a discussion of lexical resources for text generation.

We looked at a version of the "dada engine", which generates texts by applying a vocabulary to a grammar.
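The grammar-driven approach can be sketched as recursive expansion of a context-free grammar. This is a toy reimplementation of the idea, not the dada engine's actual syntax; the grammar and vocabulary here are invented.

```python
import random

def expand(grammar, symbol, rng):
    """Recursively expand a nonterminal by picking a random production."""
    if symbol not in grammar:
        return symbol  # terminal: a literal word
    production = rng.choice(grammar[symbol])
    return " ".join(expand(grammar, s, rng) for s in production)

# A tiny illustrative grammar: sentence -> noun phrase + verb phrase
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N":  [["poem"], ["machine"]],
    "V":  [["rewrites"], ["consumes"]],
}

sentence = expand(grammar, "S", random.Random(1))
print(sentence)
```

Swapping in a larger vocabulary and more productions yields the characteristic endless-but-plausible prose of such generators.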

We briefly surveyed John Cage's "mesostics".

All of this led to a discussion of quantitative measures for literary creativity.

Resources used in the session are available at http://ada.artsci.wustl.edu/dada/

Session leader: Stephen Pentecost

Page 4: Thatcamp recap

BUILDING A SEMI-AUTOMATIC GEOCODING PROGRAM FOR TEXT DOCUMENTS

Andrew introduced the concept of geocoding place references in text documents.

Aaron said technology is out there to do this.

Jeff demonstrated Viewshare, Library of Congress open source software, which he used for mapping important place references in oral histories compiled for the Missouri State Historical Society.

Aaron described how Clavin is based on a gazetteer, which enables you to access coordinates for real place names.

The problem is that it won’t recognize historical places that no longer exist. It also will not function at the fine grain of street addresses. There was discussion about the need to create a gazetteer for St. Louis to incorporate lost landscapes and street addresses.
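The core gazetteer idea can be sketched in a few lines. This is not Clavin's actual API, just a toy lookup: the coordinates are illustrative approximations, and real systems add entity recognition and disambiguation on top.

```python
# Toy gazetteer mapping place names to (latitude, longitude)
# (coordinates are illustrative approximations, not authoritative data)
GAZETTEER = {
    "St. Louis": (38.6270, -90.1994),
    "Kirkwood": (38.5834, -90.4068),
}

def geocode_text(text):
    """Return (place, lat, lon) for every gazetteer entry mentioned in the text."""
    return [(place, lat, lon)
            for place, (lat, lon) in GAZETTEER.items()
            if place in text]

hits = geocode_text("The family moved from Kirkwood to St. Louis in 1904.")
```

A St. Louis-specific gazetteer of the kind discussed would extend this table with lost landscapes and historical street addresses that global gazetteers omit.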

Brian demonstrated Open Calais, a name recognition software, and explained how he used it to map locations in St. Louis Beacon articles through the Google API.

Anupam asked about different types of geographical data output, other than web-based displays.

The session wrapped up with some experimentation with the Clavin demo to make it usable.

Session leader: Andrew Hurley

Page 5: Thatcamp recap

SLU CENTER FOR DIGITAL HUMANITIES PT. 1

SLU CDH origin story: accidental opportunities come from casting a wide net; pursue impossible ideas and you might make connections that make them possible

A collaborative spirit leaves the door open: Linked Open Data ideas can support interoperability and future collaboration, communication, or data reuse, even if the data is not exposed to the world

WashU Libraries: sharing experiences and seeking solutions for DLXS's lack of support; working with technologies like Fedora and Hydra; the power of finding user groups and library communities

WashU: unique struggles with 20th- and 21st-century texts; publishing an incomplete biographical text is a DH project that can best exist as an interactive digital object; financing, copyright, and access control can interrupt standards and interoperability; even requests are not standardized and change from institution to institution and country to country; finding tools that support standards, or that help mediate the IPR by remotely fetching images or supporting remote annotation (so access can be provided without violating rights), can help

Session leader: Patrick Cuba

Page 6: Thatcamp recap

SLU CENTER FOR THE DIGITAL HUMANITIES PT. 2

Mizzou: faces challenges in supporting Digital Humanities where the sciences are prominent and geography is isolating; DH may allow for more distant collaboration where interests overlap; sometimes DH projects need to precede institutional support until a critical mass of interest exists on campus

Webster: a film project for annotated documentaries or user-guided stories; Tradamus (SLU-CDH) borrowed from others to find standards and directions; sharing obstacles with peers can aid in the discovery of tangent tools which nearly meet challenges, as a starting point for new projects; visualization tools for moving through a graph may assist in composition or user interface; the LittleBigPlanet game allows users to move around well-defined visual components to create an experience, and the community reshares compositions (crowd-sourced documentary possibilities)

Eastern Illinois: the Past Tracker and Localities projects are great resources which would benefit from updates; challenges include a rotating grad position in charge of working on the project, lack of time at the institution, and decentralized resources for working with DH projects; contact with other institutions revealed on-campus resources that may be available; when creating this as a DH project, tracking the history of the project itself may be of interest, both as a matter of popular interest and as an aid to future scholars

Session leader: Patrick Cuba

Page 7: Thatcamp recap

BLURRING THE BOUNDARIES BETWEEN SCHOLARSHIP

1. Open-source tools in Digi Hum: call on the public to do creative work with material

Examples: http://t-pen.org/TPEN/ and http://rapgenius.com

2. Crowd-sourcing & social media

3. Community, broader impacts in digi-hum projects/products/methods

4. How to convince students? How to incorporate into class construction?

5. How do faculty involve students and still maintain the quality and integrity of the original project goals (this is true outside the student context too, at the community level)? Faculty-student collaboration? Faculty-student guidance/direction? Both?

6. The "subject" as another type of community?

7. Academic/faculty/scholar collaboration

8. Futures? Communities for scholarly peer review in DigiHum, simultaneous, long-distance scholar input (using Wikipedia as an example of the beginnings of this)

Session leader: Kristine Hildebrandt

Page 8: Thatcamp recap

STL LAMS

Going forward, the TECHO (Technology Exchange Humanities Cultural Organizations) group should:

continue meeting with a focus on projects; if it becomes merely a "sharing group," participants will lose interest, and projects require commitments

identify a better platform for collaboration than Google Groups, while also maintaining a public-facing resource so interested parties can contact the group to join in (possibly WordPress)

build its network of collaborators

begin planning ongoing, informal training on relevant platforms and standards

Session leader: Andrew Rouner

Page 9: Thatcamp recap

XML, OAC, RDF, JSON-LD AND THE KING STOOD: THE UNIVERSE IS METADATA:

TEI is a great schema for description and interoperability, but XML limits it in too many ways: overlapping ranges are not allowed when annotating; the XML document does not resemble the simulated original; metadata in headers and in-line tags are artificially different; massive XML documents must be parsed and processed to find relevant or wanted information

RDF sought to fix some of the problems, but RDF-XML still stumbles

OAC (openannotation.org) removes the description, conversation, and linking from the original digital object

It solves all the listed problems of XML, but leaves some common issues of vocabulary, convention, and data fragility; it allows TEI, DC, or any vocabulary to be used in description; it creates an independent digital object that can be stored, queried, or resolved from any location; complex chains of annotations and selectors can describe a resource so well that even if an original image or text becomes unavailable, the annotations can still recreate meaning

OAC abandons the idea that annotations should be easily human readable in favor of machine-navigable triples that can be passed easily between and within digital applications

Thinking in oa:Annotations instead of XML allows for new possibilities
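For a concrete sense of what an oa:Annotation looks like, here is a minimal JSON-LD sketch assembled in Python. The shape follows the Open Annotation vocabulary discussed in the session, but the URIs, selector, and body text are hypothetical.

```python
import json

# A minimal oa:Annotation: a textual body targeting a rectangular
# region of a page image (all URIs and values are hypothetical)
annotation = {
    "@context": "http://www.w3.org/ns/oa-context-20130208.json",
    "@type": "oa:Annotation",
    "hasBody": {
        "@type": "cnt:ContentAsText",
        "chars": "A marginal note transcribed from the page image.",
    },
    "hasTarget": {
        "@type": "oa:SpecificResource",
        "hasSource": "http://example.org/images/page1.jpg",
        "hasSelector": {
            "@type": "oa:FragmentSelector",
            "value": "xywh=100,120,640,80",
        },
    },
}

print(json.dumps(annotation, indent=2))
```

Because the annotation is a free-standing object, it can be stored, queried, or resolved anywhere, independently of the image it describes.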

SharedCanvas (shared-canvas.org) extends OAC and creates a sc:Canvas object for reference which has no content and is only annotated

Tradamus (SLU-CDH project) creates digital editions whose text is only

Session leader: Patrick Cuba

Page 10: Thatcamp recap

QGIS

Introduced QGIS and the history of the project

Discussed types of GIS possible with the software

Demonstrated how to search for data and add simple data to a QGIS project

Outlined various ways QGIS is similar to and different from ArcGIS

Session leader: Aaron Addison

Page 11: Thatcamp recap

DIGITAL PEDAGOGY

Even in instructional settings where teaching DH is not the primary goal, DH or simply technology-assisted projects (as basic as creating sites, blogging, tweeting) can encourage students to interact, take ownership of content, teach peers, & learn important lessons about source documentation & context

Ongoing projects in particular are great for incorporating new/young/uneducated students, giving them built-in peer teaching, engagement, bigger sense of purpose, & responsibility to “real” audience outside classroom (examples from participants: http://widewideworlddigitaledition.siue.edu/ http://talus.artsci.wustl.edu/spenserArchivePrototype/)

Combining content/theory & making/DH in one course is challenging: many approaches, incl. one hands-on session & one lecture each week, an additional lab option, periodic technical bootcamps throughout semester, or a DH-customized lab track of a larger survey course – none of them perfect, all requiring institutional support!

The DH playing field is absolutely not level: the digital divide is an issue in different institutional contexts, and not all languages can claim the level of successful digitization that English literature enjoys – so how can those of us who teach and/or study foreign languages expand the definition of DH to include basic digitization & translation projects that will be useful to them? Should we recenter DH to address socioeconomic & linguistic difference, especially if these are topics we encounter regularly in our classrooms? (possible example of richly multilingual project: http://library.princeton.edu/projects/bluemountain/)

Session leader: Wendy Love Anderson

Page 12: Thatcamp recap

INTEGRATING NEW TECHNOLOGIES INTO FIRST GENERATION DIGITIZATION PROJECTS

Problem of intellectual stewardship: who is custodian of an archive? Should you share files, cede ownership?

How do you ensure usability in the future? Front-end vs. back-end?

Uniformity of standards: metadata should talk across platforms, archives.

"We all want our stuff to work with other people's stuff to have better scholarship. Is the underlying issue that we should be agitating to change the rules?"

Session leader: Malgorzata Rymsza-Pawlowska

Page 13: Thatcamp recap

SPATIAL HUMANITIES

The discussion revolved around ways in which digital spatial tools have enhanced, or might in the future enhance, scholarship. The early part of the discussion focused on GIS mapping. There was also some discussion about 3D digital environments toward the end of the session.

Campers identified several types of research that lend themselves to electronic spatial analysis:

Research involving data produced by crowd-sourcing. Research involving massive amounts of data. Research about diffusion processes. Research attempting to flesh out the physical dimensions of a place. Research about material objects and architectural elements that can be reconstructed in 3D.

Limitations of employing spatial digital tools included:

Temporal analysis is difficult to display through maps.

Data collection and input along with the building of 3D environments is resource intensive and there is the danger that such enterprises will be monopolized by corporate behemoths like G*****.

The discussion ended on the subject of the portability of geographical data and issues of access.

Session leader: Andrew Hurley

Page 14: Thatcamp recap

UNSTRUCTURED DATA

• Types of NoSQL db’s – other Big Data technologies

• Applications and use cases in the Humanities

• Crowdsourcing data

• Word spotting

• Data mining of archives

• Need to be sure we are asking the right questions

• Importance of metadata for all processes

Session leader: Aaron Addison

Page 15: Thatcamp recap

WORDPRESS

WordPress can be used as a full content management system. It's not just a blogging platform.

Some example WordPress sites:  http://taylorfamilyinstitute.wustl.edu  http://mallinckrodt-academy.org/   http://historyofmedicine.wustl.edu/

The Advanced Custom Fields plugin makes it easy to enter and display data for site-specific types of content.

For developers, WordPress strikes a good balance between flexibility and ease of use.

WordPress is very popular. As free, open source software, it has a low barrier to entry. Its huge installed base makes it easy to find hosting, technical support, themes, and plugins.

The easiest way to get started with WordPress is to sign up for an account at wordpress.com.

Session leader: Brian Marston

Page 16: Thatcamp recap

TIME SERIES

The session on Databases Before Digital drew a small group for a discussion that spent some time on how to improve methods of working with tabular textual material that OCR often doesn't handle well, but also included shared curiosity about how people have historically organized data and bureaucracies. There was some overlap with earlier discussions of 19th-century St. Louis city directories and what might be done with them in the form of a structured digital historical resource. The session ended early so participants could attend other sessions of interest at the same time.

The session on Time Series delved into questions of modeling and visualization, and became a fascinating speculative conversation. We discussed how to represent spans of time, how to deal with fuzzy and unknown data, and Simile timeline tools.
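One simple way to model the fuzzy spans discussed is to give each endpoint earliest/latest bounds. This sketch is one possible design, not a representation any particular timeline tool uses; the class and method names are illustrative.

```python
from datetime import date

class FuzzySpan:
    """A time span whose start and end are each known only to within a range."""

    def __init__(self, start_earliest, start_latest, end_earliest, end_latest):
        self.start = (start_earliest, start_latest)
        self.end = (end_earliest, end_latest)

    def certainly_contains(self, d):
        """True only if `d` falls inside the span under every reading of the bounds."""
        return self.start[1] <= d <= self.end[0]

    def possibly_contains(self, d):
        """True if at least one reading of the bounds puts `d` inside the span."""
        return self.start[0] <= d <= self.end[1]

# E.g. an event that began sometime in 1850-1852 and ended sometime in 1860-1861
span = FuzzySpan(date(1850, 1, 1), date(1852, 12, 31),
                 date(1860, 1, 1), date(1861, 12, 31))
```

Distinguishing "certainly" from "possibly" lets a visualization render the certain core of a span solidly and its fuzzy margins differently.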

Session leader: Doug Knox

Page 17: Thatcamp recap

ATTRIBUTION AND COLLABORATION

Facing the challenges of attribution and credit in a digital world

Traditional publishing offers monolithic intellectual objects marked with citation conventions

Digital objects record micro-contributions and allow for chaining of annotations; precise citation and criticism become possible; crowd-sourced or collaborative work can be assembled by groups, rather than simply mass-contributed and then munged into cohesion by a single editing entity; if an editorial decision is discredited, it becomes easier to find dependent opinions and revise them

It introduces many scenarios we cannot resolve. How do we discriminate between users who contribute different types of work?

sparse but critical datasets; editorial choices; advanced transcription and collation; helpful visualizations; proof-reading and corrective changes; linking, citation, and supportive annotation

How do we balance quality against quantity? An RA may have created 95% of the annotations (editorial acts), but the PI may 'own' the critical, controversial, or significant 5%

The act of reviewing and accepting an annotation doesn't necessarily change the credit of the contributor, but establishes some editorial hegemony

Different institutions attach very different values to work like data collection, cataloging, transcription, collation, key-finding, inter-linking, etc.

Session leader: Patrick Cuba