Dialogue 2014, Bekasovo. Anastasia Bonch-Osmolovskaya, NRU HSE.
Association of Digital Humanities Organizations (Europe, America, Australasia, Japan)
Digital humanities project
Scholarly dissemination
Big Data for Humanities
Distant reading and complex network analysis and visualisation
Linking cultural data: building standardized resources and interoperability
Digital humanities: state of the art field
a stack of books 2.7 m high
includes all published works, variants, unpublished drafts, diaries, letters, fragments
13 volumes of diaries
31 volumes of letters (8,500 letters)
about 14.5 million tokens
commentaries, indexes
Leo Tolstoy's 90-volume complete edition
A project to digitise the entire works of Leo Tolstoy – named All of Tolstoy in One Click – making them available for tablets and smartphones, turned out to be lighter work than expected for the Tolstoy Museum in Moscow, when thousands of readers from all over the world responded to a call for volunteers. (The Guardian)
Now, thanks largely to the efforts of these volunteers, nearly all of the great Russian writer’s massive body of work, including novels, diaries, letters, religious tracts, philosophical treatises, travelogues, and childhood memories, will soon be available online, in a form that can be easily downloaded, free of charge. (The New Yorker)
A Crowdsourcing Wonder
The idea of contemporary standards of cultural heritage web publishing
Tagging relevant structural elements of the text and textual data
Linking elements inside and outside the text
Project participants:
Tolstoy Museum (Fekla Tolstaya)
Higher School of Economics, philology department (Boris Orekhov, Anastasia Bonch-Osmolovskaya)
Tartu University (Roman Leibov)
ABBYY Compreno (Anatoly Starostin)
students of the philology department, HSE
Semantic Tolstoy
What should be tagged?
What tags should be used?
Should we do it manually or automatically?
Do we represent the book or the text? (Do we tag texts not written by Tolstoy?)
Semantic Tolstoy: how to start
What should be tagged? Everything that can be tagged with TEI
What tags should be used? The TEI scheme
Should we do it manually or automatically? It depends
Do we represent volumes or texts? Texts
Semantic Tolstoy: how to start
standard XML scheme for encoding books: http://www.tei-c.org
wiki, manuals, tutorials, events, discussions, groups of interest
ROMA - http://www.tei-c.org/Roma/ - customization generator for TEI scheme
Text Encoding Initiative
TEI scheme modules
critical apparatus: readings, variants
names, dates, places
tables, formulae, graphics, notated music
language corpora
dictionaries
linking, segmentation, alignment
linguistic annotation: POS tagging
certainty, precision, responsibility
Types of texts
documentary texts
literary texts: prose, verse, performance texts
spoken texts: transcriptions of speech
manuscripts
ancient texts: on papyri, stone
medieval texts: illuminated manuscripts
modern texts: variorum, handwritten, typewritten
<l>Я просыпаюсь. Я <choice> <orig>об'ят</orig> <reg>объят</reg> </choice> </l>
<l>Открывшимся. Я на <choice> <orig>учете</orig> <reg>учете.</reg> </choice> </l>
Normalization
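A minimal sketch of how a reader of such markup might switch between the original and the regularized reading. It uses Python's standard xml.etree.ElementTree; the <lg> wrapper, the SAMPLE fragment and the function names render and render_lines are assumptions made for illustration, not part of the project's tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the slide's example; <lg> is added only
# so that the two verse lines have a single root element.
SAMPLE = """<lg>
<l>Я просыпаюсь. Я <choice><orig>об'ят</orig><reg>объят</reg></choice></l>
<l>Открывшимся. Я на <choice><orig>учете</orig><reg>учете.</reg></choice></l>
</lg>"""


def render(elem, prefer="reg"):
    """Return the text of elem, keeping one reading of every <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            # Keep only the preferred alternative (<reg> or <orig>).
            picked = child.find(prefer)
            if picked is not None:
                parts.append(render(picked, prefer))
        else:
            parts.append(render(child, prefer))
        # Text that follows the child belongs to the parent element.
        parts.append(child.tail or "")
    return "".join(parts)


def render_lines(xml_text, prefer="reg"):
    """Render every verse line <l> with the preferred reading."""
    root = ET.fromstring(xml_text)
    return [render(line, prefer) for line in root.iter("l")]


if __name__ == "__main__":
    for line in render_lines(SAMPLE, prefer="reg"):
        print(line)
```

Calling render_lines(SAMPLE, prefer="orig") gives the unnormalized spelling instead, which is the point of keeping both readings inside <choice>.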
create a volume/text-type matrix
select TEI schemes for different text types
use modified XML from ABBYY FineReader for structural elements
parse indexes and link them to the text
define intertextual links
make a Semantic Tolstoy cookbook
Preliminary work
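Улыбка <forename>Аграфены Петровны</forename> означала, что письмо было от <roleName>княжны</roleName> <surname>Корчагиной</surname>, на которой, по мнению <forename>Аграфены Петровны</forename>, <surname>Нехлюдов</surname> собирался жениться. И это предположение, выражаемое улыбкой <forename>Аграфены Петровны</forename>, было неприятно <surname>Нехлюдову</surname>.
(English gloss: Agrafena Petrovna's smile meant that the letter was from Princess Korchagina, whom, in Agrafena Petrovna's opinion, Nekhlyudov was going to marry. And this supposition, expressed by Agrafena Petrovna's smile, was unpleasant to Nekhlyudov.)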
TEI for Tolstoy (cookbook)
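Once names are tagged this way, they can be harvested mechanically, for example to compare them with the printed indexes. The following is a rough sketch under stated assumptions: the <p> wrapper, the shortened SAMPLE sentence and the function collect_names are hypothetical, and tag names are compared case-insensitively because TEI P5 spells the role element roleName.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, shortened fragment modelled on the slide's example;
# <p> is added only to provide a single root element.
SAMPLE = (
    "<p>Улыбка <forename>Аграфены Петровны</forename> означала, что письмо "
    "было от <roleName>княжны</roleName> <surname>Корчагиной</surname>.</p>"
)

# Name-related tags to collect, lower-cased for case-insensitive matching.
NAME_TAGS = {"forename", "surname", "rolename"}


def collect_names(xml_text):
    """Group the text of every name element by its tag."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for elem in root.iter():
        if elem.tag.lower() in NAME_TAGS:
            found[elem.tag].append("".join(elem.itertext()).strip())
    return dict(found)


if __name__ == "__main__":
    for tag, values in collect_names(SAMPLE).items():
        print(tag, values)
```

The collected strings are inflected forms (Аграфены Петровны, Корчагиной); linking them to index entries would additionally need lemmatization, which this sketch does not attempt.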
Automatic date extraction (M. Kolbasov, HSE student)
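Direct, full (Прямой полный)
  Source: 17 марта 1847 года ("17 March 1847")
  Tagged: <date when="1847-03-17"> 17 марта 1847 года </date>
Direct, partial (Прямой неполный)
  Source: Числа 22 ("the 22nd")
  Tagged: <date when="1847-03-22"> Числа 22 </date>
Ray, backward (Лучевой задний)
  Source: Вот уже шестой день ("It has been six days already")
  Tagged: Вот уже <date from="1847-04-24" to="1848-01-01"> шестой день </date>
Interval, present (Отрезковый наст.)
  Source: Эту неделю я сижу дома ("This week I am staying at home")
  Tagged: Эту <date from="1847-04-19" to="1847-04-25"> неделю </date> я сижу дома
Point, past (Точечный прош.)
  Source: Я совершенно доволен собою за вчерашний день ("I am entirely satisfied with myself for yesterday")
  Tagged: Я совершенно доволен собою за <date when="1847-04-23"> вчерашний день </date>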
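The simplest of these cases, a full explicit date, can be approximated with a regular expression. This is only an illustrative sketch, not the student's actual extractor: the names MONTHS, FULL_DATE and tag_full_dates are hypothetical, the value of when is written in the ISO-style YYYY-MM-DD form, and partial or relative dates (which need the diary entry's own date as context) are not handled.

```python
import re

# Genitive forms of Russian month names, as they appear in written dates.
MONTHS = {
    "января": 1, "февраля": 2, "марта": 3, "апреля": 4,
    "мая": 5, "июня": 6, "июля": 7, "августа": 8,
    "сентября": 9, "октября": 10, "ноября": 11, "декабря": 12,
}

# Day, month name, four-digit year, optionally followed by "года".
FULL_DATE = re.compile(
    r"(?P<day>\d{1,2})\s+(?P<month>" + "|".join(MONTHS) + r")"
    r"\s+(?P<year>\d{4})(?:\s+года)?"
)


def tag_full_dates(text):
    """Wrap every full explicit date in a <date when="YYYY-MM-DD"> element."""
    def wrap(match):
        iso = "{:04d}-{:02d}-{:02d}".format(
            int(match.group("year")),
            MONTHS[match.group("month")],
            int(match.group("day")),
        )
        return '<date when="{}">{}</date>'.format(iso, match.group(0))

    return FULL_DATE.sub(wrap, text)


if __name__ == "__main__":
    print(tag_full_dates("17 марта 1847 года"))
```

Relative expressions such as "вчерашний день" would need a second pass that resolves them against the date of the entry in which they occur.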
Student projects:
Old2New orthography transliterator
Tolstoy corpus for ruscorpora
Universal index parser
Together with Compreno:
Named entity extraction
Evaluation of NE merging (indexes as a Gold Standard)
Fact extraction
Accompanying projects