Dutch Book Trade 1660-1750: using the STCN to gain insight in publishers’ strategies

e-Humanities Group Research Meeting: STCN, 2013/10/10. Wouter Beek, Albert Meroño Peñuela, Rinke Hoekstra, Fernie Maas, Inger Leemans

Description

Despite stagnating domestic demand near the end of the seventeenth century, Dutch book producers managed to maintain their international market position. In a so-called embedded research project, the Short Title Catalogue, Netherlands (STCN) was used to gain insight into the strategies and decisions of these publishers. The STCN is a retrospective bibliography of publications from 1540 to 1800, containing information on title, author, book producer, language, subject and collation. Historians and computer scientists collaborated to make the STCN accessible and to connect it to other relevant datasets. To explore the possibilities of, and difficulties in, opening up and linking the bibliography, the project focused on one particular strategy: publishing scandalous books. Besides explaining the process of converting and querying the STCN data, the presentation will deal with differences in the handling of data and with the advantages of an Open Data approach in humanities research.

Transcript of Dutch Book Trade 1660-1750: using the STCN to gain insight in publishers’ strategies

Page 2

‘OPENING’ THE STCN

LINKING THE STCN

Page 3

Open data

Page 5

Linked Open Data

• Connect to existing datasets
• Connect to services
• Queries/inferences run across datasets

– The Picarta topic hierarchy allows us to infer that certain publications cover related topics.

– GeoNames gives the geographic coordinates (latitude and longitude) of the places where publishing houses were located, allowing publishing decisions to be correlated with historical events.

– Lexvo / ISO standards allow translations to be traced via related languages (e.g. language families).

• Easy to create mashups / new applications.
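The Picarta example amounts to a simple inference over a topic hierarchy. The sketch below illustrates the idea in plain Python, assuming a child-to-parent ("broader") mapping between topics; the topic names and publication identifiers are invented placeholders, not real Picarta or STCN identifiers.

```python
# Minimal sketch: inferring that publications cover related topics via a topic hierarchy.
# All topic names and publication IDs are hypothetical placeholders.

# Child topic -> broader (parent) topic; assumed to be acyclic.
BROADER = {
    "theology:predestination": "theology",
    "theology:church-history": "theology",
    "theology": "religion",
}

# Publication -> topics assigned to it in the catalogue record.
PUB_TOPICS = {
    "pub:A": {"theology:predestination"},
    "pub:B": {"theology:church-history"},
    "pub:C": {"medicine"},
}


def ancestors(topic: str) -> set[str]:
    """Return the topic itself plus every broader topic above it."""
    result = {topic}
    while topic in BROADER:
        topic = BROADER[topic]
        result.add(topic)
    return result


def expanded_topics(pub: str) -> set[str]:
    """All topics of a publication, including the inferred broader ones."""
    topics: set[str] = set()
    for topic in PUB_TOPICS[pub]:
        topics |= ancestors(topic)
    return topics


def shared_topics(pub1: str, pub2: str) -> set[str]:
    """Topics two publications have in common once the hierarchy is applied."""
    return expanded_topics(pub1) & expanded_topics(pub2)


print(sorted(shared_topics("pub:A", "pub:B")))  # ['religion', 'theology']
print(sorted(shared_topics("pub:A", "pub:C")))  # []
```

Neither pub:A nor pub:B is tagged with "theology" directly; the shared topic only appears once the hierarchy is applied, which is the kind of cross-record inference the linked topic hierarchy makes possible.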

Page 6

[Diagram: linking STCN data to the Biografisch Portaal; relations shown include “same as” and “died in”.]
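A “same as” link is what lets facts from one dataset be used for records in another. A minimal sketch, assuming triples are held as plain Python tuples and the link is expressed with `owl:sameAs`; the `stcn:`/`bp:` identifiers and the `died_in` predicate are hypothetical stand-ins, not the actual STCN or Biografisch Portaal vocabulary.

```python
# Minimal sketch: answering a query across two datasets through an owl:sameAs link.
# Identifiers and predicates are hypothetical placeholders.

STCN = [("stcn:author/1", "rdfs:label", "Arminius, Jacobus")]
BIOPORTAAL = [("bp:person/1", "died_in", "Leiden")]
LINKSET = [("stcn:author/1", "owl:sameAs", "bp:person/1")]


def same_as_groups(triples):
    """Map each identifier to the set of identifiers it is (transitively) same-as."""
    groups = {}
    for s, p, o in triples:
        if p != "owl:sameAs":
            continue
        merged = groups.get(s, {s}) | groups.get(o, {o})
        for node in merged:
            groups[node] = merged
    return groups


def objects(triples, subject, predicate, groups):
    """Objects of (subject, predicate, ?), treating same-as identifiers as one entity."""
    aliases = groups.get(subject, {subject})
    return {o for s, p, o in triples if s in aliases and p == predicate}


all_triples = STCN + BIOPORTAAL + LINKSET
groups = same_as_groups(LINKSET)
print(objects(all_triples, "stcn:author/1", "died_in", groups))  # {'Leiden'}
```

Without the linkset the same query returns nothing, because the STCN record itself holds no place of death.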

Page 7

Taking the STCN to the Semantic Web

• 139,817 publications (4M facts)
• 23,543 authors (120K facts)
• 9,959 printers (55K facts)
• 37K enriched concepts (DBpedia, Yago, Heidelberg Diglit, …)
• 105 topics (1K facts)
• Relate to international standards (GGC/OCLC/ISO/RFC/IANA)
• Making the schema explicit (vocabulary)

Page 8

[Diagram: from source data to Linked Data]

Source formats:
• Relational DB: domain knowledge needed
• RDF files
• Text files: ambiguous
• XML files: depends on structure (domain knowledge)

Steps:
• Domain-independent data conversions: fully automated, producing simple RDF
• Domain-dependent data conversions: domain knowledge needed
• Link to external sources (linksets): domain knowledge needed
• Connect to services (e.g. query interface, maps): high level of reuse
• Fixing bad data: origin inconsistencies & inaccuracies
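The domain-independent step can be fully automated because it only maps table structure onto triples. A minimal sketch, assuming CSV input with an identifier column; the base namespace and column names are hypothetical, and every cell becomes a plain string literal.

```python
# Minimal sketch of a domain-independent conversion: tabular rows -> "simple RDF" (N-Triples).
# The base namespace and column names are hypothetical; no domain knowledge is applied.
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/stcn/"  # placeholder namespace, not the real STCN one


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str, key: str) -> list[str]:
    """One subject IRI per row, one predicate IRI per column, every cell a plain literal."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}record/{quote(row[key], safe='')}>"
        for column, value in row.items():
            # Skip the identifier column, overflow fields (key None) and empty cells.
            if column == key or column is None or not value:
                continue
            predicate = f"<{BASE}vocab/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


SAMPLE = "id,title,year\n1,Example title,1660\n2,Another title,\n"
for line in rows_to_ntriples(SAMPLE, key="id"):
    print(line)
```

Typing the year as a date, recognising that a column refers to a printer, or linking places to GeoNames are the domain-dependent steps that such a generic pass cannot perform.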

Page 9

FROM THE LIBRARY TO THE LAB

Page 11

“How many publications by Arminius?”

Page 12

“How many publications by Gomarus?”
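Both questions reduce to counting distinct publications per author. A minimal sketch over hand-made records; the records are invented for illustration, are not STCN data, and say nothing about the actual counts. It assumes author names are already normalised to one form per person.

```python
# Minimal sketch: counting publications per author.
# The records below are invented placeholders, not STCN data.
from collections import Counter

RECORDS = [
    {"id": "rec-1", "authors": ["Arminius, Jacobus"]},
    {"id": "rec-2", "authors": ["Gomarus, Franciscus"]},
    {"id": "rec-3", "authors": ["Arminius, Jacobus", "Gomarus, Franciscus"]},
    {"id": "rec-4", "authors": ["Arminius, Jacobus", "Arminius, Jacobus"]},  # duplicate entry
]


def publications_per_author(records):
    """Count each publication once per distinct author listed in it."""
    counts = Counter()
    for record in records:
        for author in set(record["authors"]):
            counts[author] += 1
    return counts


counts = publications_per_author(RECORDS)
print(counts["Arminius, Jacobus"])    # 3
print(counts["Gomarus, Franciscus"])  # 2
```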

Page 13

What happens to the average publication format after 1619?

Measured in terms of the number of folds (before → after 1619):
• Works by Arminius: 5.6 → 5.7
• Works by Gomarus: 6.8 → 4.9
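The comparison amounts to averaging a numeric format value per author for records before and after a cutoff year. A minimal sketch, assuming (not stated on the slide) that each record carries a publication year and a format notation such as `4°` or `8°`, that the number in the notation is the value averaged, and that records dated 1619 itself count as "before". The records are invented and do not reproduce the figures above.

```python
# Minimal sketch: mean format value before vs. after a cutoff year.
# Assumptions (not from the source): formats are stored as notations like "4°" or "8°",
# and records from the cutoff year itself count as "before".
from statistics import mean

CUTOFF = 1619


def format_value(notation: str) -> int:
    """Turn a format notation such as '8°' into the number 8."""
    return int(notation.rstrip("°").strip())


def mean_format_split(records, cutoff=CUTOFF):
    """Return (mean up to and including cutoff, mean after cutoff); None if a group is empty."""
    before = [format_value(r["format"]) for r in records if r["year"] <= cutoff]
    after = [format_value(r["format"]) for r in records if r["year"] > cutoff]
    return (mean(before) if before else None, mean(after) if after else None)


# Invented example records for one author.
records = [
    {"year": 1610, "format": "4°"},
    {"year": 1615, "format": "8°"},
    {"year": 1625, "format": "8°"},
    {"year": 1630, "format": "12°"},
]
print(mean_format_split(records))  # (6, 10)
```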

Distant reading!

Page 14

Methodological implications

From searching for resources (librarian) to validating/refuting hypotheses (scientist)

Page 15

humR

humanities + R (statistics processing software)

A WEB SERVICE FOR RESEARCH INVOLVING DISTANT READING

Page 19

Open issues 0: institutional hurdles

• The products of publicly funded research (papers & datasets) should be publicly available.
– Not everybody makes their data publicly available.

• Distant reading research is often restricted by the user interface through which the data is accessed.

Page 20

Open issues 1: meaning

A large percentage of the data has no or unknown meaning:
• “before 1808”
– “This book was published between the Big Bang and 1808.”

Context-dependent:
• “The first dinosaur walked the earth before 300M years BC.”
• “Einstein came up with the idea of general relativity before 1937.”

Fuzziness:
• “James Joyce’s Ulysses was published before 1925.”
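One way to make the meaning problem explicit is to translate date expressions into year intervals with possibly open ends. The sketch handles only three illustrative patterns (a plain year, "before YYYY", "after YYYY") and reads "before" literally as "up to the previous year"; it is not a parser for the STCN's actual date notations.

```python
# Minimal sketch: date expressions -> (earliest, latest) year intervals.
# Only three illustrative patterns are supported; None means "unbounded".
import re
from typing import Optional, Tuple

Interval = Tuple[Optional[int], Optional[int]]


def parse_date_expression(text: str) -> Interval:
    """Read 'YYYY', 'before YYYY' and 'after YYYY' literally as year intervals."""
    text = text.strip().lower()
    if m := re.fullmatch(r"before (\d{3,4})", text):
        # No lower bound: literally "between the Big Bang and ...".
        return (None, int(m.group(1)) - 1)
    if m := re.fullmatch(r"after (\d{3,4})", text):
        return (int(m.group(1)) + 1, None)
    if m := re.fullmatch(r"\d{3,4}", text):
        year = int(m.group(0))
        return (year, year)
    raise ValueError(f"unsupported date expression: {text!r}")


print(parse_date_expression("before 1808"))  # (None, 1807)
print(parse_date_expression("1660"))         # (1660, 1660)
```

The open lower bound is exactly the uninformative literal reading; narrowing it to something usable, such as "shortly before 1808", requires context that the record itself does not contain.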

Page 21

Open issues 2: statistics

• Which query results are statistically relevant?
• How can we detect whether a statistically significant difference reflects reality rather than the way in which the dataset was constructed?
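For the first question, one common choice (an assumption here, not necessarily what the project used) is a permutation test, for example on the format values of two groups of publications. A minimal sketch with invented sample values:

```python
# Minimal sketch: two-sided permutation test for a difference in mean values.
# A permutation test is one possible choice; the project's own method is not specified.
import random


def mean_of(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Estimate how often a random relabelling gives a mean difference at least as large."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = abs(mean_of(a) - mean_of(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean_of(pooled[:len(a)]) - mean_of(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)  # add-one correction avoids a p-value of 0


# Invented format values for two groups of publications.
group_before = [4, 4, 8, 4, 8, 4, 8, 8]
group_after = [8, 12, 8, 8, 12, 8, 12, 8]
print(permutation_test(group_before, group_after))
```

A small p-value only says the difference is unlikely under random relabelling of the records in the dataset. It cannot tell whether the difference stems from the period itself or from how the catalogue was compiled (for example, which copies survived or which collections were catalogued), which is the second question above.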