Thomas Schmidt SFB 538 ‚Mehrsprachigkeit‘ University of Hamburg

KONVENS Wien, 15 Sep 2004 EXMARaLDA – A modeling and visualization framework for the computer-assisted transcription of spoken language Thomas Schmidt SFB 538 ‚Mehrsprachigkeit‘ University of Hamburg


Page 1

KONVENS Wien, 15 Sep 2004

EXMARaLDA – A modeling and visualization framework for the computer-assisted transcription of spoken language

Thomas Schmidt

SFB 538 ‚Mehrsprachigkeit‘

University of Hamburg

Page 2

Background

• Multilingual Database, SFB 538 „Mehrsprachigkeit“, University of Hamburg

• EXMARaLDA (Extensible Markup Language for Discourse Annotation)

• Dissertation project „Computer-based transcription of spoken language as a modelling and visualisation process“ (Supervisor: Angelika Storrer)

Page 3

Background

• Transcription of spoken language
– Interviewer / child interaction
– Classroom interaction
– Interpreted doctor-patient discourse

– for discourse / conversation analysis
– for (child) language acquisition studies

Page 4

Background

• Problem: Diversity of Transcription Data

– Theoretical diversity:
  • Entities of transcription (utterances, turns, non-verbal activities etc.)
  • Relations between entities (temporal, hierarchical, features, ...)
  • Presentation formats (partitur notation, column notation, ...)

– Technological diversity:
  • Storage formats (text, binary, RDB)
  • Software (syncWriter, HIAT-DOS, database management systems, word processors, ...)
  • Operating systems (Windows, Mac OS)

Page 5

Background

Page 6

Background

Page 7

Background

• Problem: Diversity of Transcription Data

• Aim: A common platform for computer-assisted transcription
– Exchange, reuse, archive transcription data
– Merge corpora
– Use different software tools with one piece of data

Page 8

Background

• Problem: Diversity of Transcription Data

• Aim: A common platform for computer-assisted transcription

• (Elements of a) Solution
– XML technology
– Three-level architecture
– Separate form from content
– Separate logical from physical structure

Page 9

Topics of this talk

1. Some methodological considerations:
– Linguistic methods
– Computer science methods
– „Computing in the humanities“
– Interdisciplinary communication

2. Components of the developed system

Page 10

Methodological considerations

Established view: Transcription as theory and „Verschriftlichung“ (rendering in writing)
– Quality criteria: readability, adequacy

Modified view: Computer-assisted transcription as modelling and visualisation
– Text technology view: content vs. form (document ...)
– Database view: E/R model and logical layer vs. view and application
– Model theory view: symbolic model vs. analogue model

Page 11

Methodological considerations

Transcription as Modeling and Visualization of spoken language

• Accordance with text-technological concepts
• One model, different visualizations
• No tradeoff between readability and adequacy
• No tradeoff between human and computer processability
• No “Standardization” of models
– a common modelling framework, not a common model
– no ontological specifications

• XML = Standardization of physical representation
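
The last point on this slide can be illustrated with a small sketch. The Python fragment below serializes a tiny time-aligned transcription to XML. The element and attribute names (`transcription`, `timeline`, `point`, `tier`, `event`) are assumptions made for this illustration and are not the actual EXMARaLDA schema. The sketch only shows that XML fixes how the data is written down, while the choice of tiers and categories stays with the transcriber.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-model: points on a common timeline and tiers of events.
timeline = ["T0", "T1", "T2"]
tiers = [
    {"speaker": "A", "category": "verbal",
     "events": [("T0", "T1", "Hello "), ("T1", "T2", "there.")]},
    {"speaker": "B", "category": "verbal",
     "events": [("T1", "T2", "Hi!")]},
]


def to_xml(timeline, tiers):
    """Write the model to an XML string (illustrative element names)."""
    root = ET.Element("transcription")
    tl = ET.SubElement(root, "timeline")
    for point in timeline:
        ET.SubElement(tl, "point", id=point)
    for tier in tiers:
        t = ET.SubElement(root, "tier",
                          speaker=tier["speaker"], category=tier["category"])
        for start, end, text in tier["events"]:
            # Each event refers to timeline points instead of character offsets.
            ev = ET.SubElement(t, "event", start=start, end=end)
            ev.text = text
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml(timeline, tiers)
print(xml_text)
```

Because events point to timeline entries by id, a different set of tiers or categories can be stored in the same physical format without changing the serializer.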

Page 12

Visualization to Model

Page 13

Visualization to Model

Structural relations: 1. Temporal sequence

Page 14

Visualization to Model

Structural relations:
1. Temporal sequence
2. Simultaneity

Page 15

Visualization to Model

Structural relations:
1. Temporal sequence
2. Simultaneity
3. Equivalence (Entity / Feature)

Page 16

Visualization to Model

Structural relations:
1. Temporal sequence
2. Simultaneity
3. Equivalence (Entity / Feature)
4. Hierarchy (Containment)
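
The four structural relations named on this slide can be made concrete with a minimal sketch. It assumes that every entity is an event anchored to two points on a common timeline and tagged with a speaker and a category. The class `Event` and the four predicate functions are hypothetical names chosen for this illustration and are not taken from the EXMARaLDA software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    speaker: str    # who produced the event
    category: str   # e.g. "verbal", "prosody", "utterance" (assumed labels)
    start: int      # index of the start point on the common timeline
    end: int        # index of the end point on the common timeline
    text: str


def precedes(a, b):
    """1. Temporal sequence: a has ended when b starts."""
    return a.end <= b.start


def simultaneous(a, b):
    """2. Simultaneity: the two intervals overlap."""
    return a.start < b.end and b.start < a.end


def annotates(feature, entity):
    """3. Equivalence: same speaker and interval, different category."""
    return (feature.speaker == entity.speaker
            and feature.category != entity.category
            and (feature.start, feature.end) == (entity.start, entity.end))


def contains(outer, inner):
    """4. Hierarchy: the inner interval lies within the outer one."""
    return outer.start <= inner.start and inner.end <= outer.end


hello = Event("A", "verbal", 0, 1, "Hello ")
there = Event("A", "verbal", 1, 2, "there.")
hi = Event("B", "verbal", 1, 2, "Hi!")
loud = Event("A", "prosody", 0, 1, "loud")
utterance = Event("A", "utterance", 0, 2, "Hello there.")

print(precedes(hello, there))       # sequence within one speaker's talk
print(simultaneous(there, hi))      # overlap between two speakers
print(annotates(loud, hello))       # a feature attached to an entity
print(contains(utterance, hello))   # an utterance containing a word
```

In this reading all four relations are derived from timeline anchors alone, which is one way to keep the model independent of any particular notation on the page.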

Page 17

Modeling framework

• Relational? Sequence? Simultaneity?
• OHCO (ordered hierarchy of content objects)? Simultaneity?
• DAG (directed acyclic graph): Annotation Graphs? Complexity? Transcription Graphs

Page 18

System architecture

Page 19

Application: Input tools

EXMARaLDA Partitur-Editor

Page 20

Application: Input tools

Simple EXMARaLDA Text file

Page 21

Application: Input tools

TASX annotator

Page 22

Application: Input tools

PRAAT

Page 23

Application: Input tools

EUDICO Linguistic Annotator (ELAN)

Page 24

Application: Visualization

... as a wrapped partitur
... as a line transcript
... in column notation
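
The idea of deriving several visualizations from one model can be sketched as follows. The code is a simplified illustration, not the EXMARaLDA implementation: the data layout and the functions `partitur` and `line_transcript` are hypothetical, every event is assumed to span exactly one timeline interval, and wrapping and column notation are left out.

```python
# Hypothetical mini-model: events anchored to points 0..3 on a common timeline.
tiers = {
    "A": [(0, 1, "Hello "), (1, 2, "there. ")],
    "B": [(1, 2, "Hi! "), (2, 3, "How are you?")],
}
N_INTERVALS = 3


def partitur(tiers, n):
    """Musical-score layout: one row per tier, simultaneous events aligned."""
    # Each timeline interval becomes a column as wide as its longest text.
    widths = [0] * n
    for events in tiers.values():
        for start, _end, text in events:
            widths[start] = max(widths[start], len(text))
    rows = []
    for name, events in tiers.items():
        cells = [""] * n
        for start, _end, text in events:
            cells[start] = text
        row = "".join(c.ljust(w) for c, w in zip(cells, widths))
        rows.append(f"{name}: {row}".rstrip())
    return "\n".join(rows)


def line_transcript(tiers):
    """Line notation: one line per event, ordered by start point."""
    events = [(start, name, text)
              for name, evs in tiers.items()
              for start, _end, text in evs]
    return "\n".join(f"{name}: {text.strip()}"
                     for _start, name, text in sorted(events))


print(partitur(tiers, N_INTERVALS))
print()
print(line_transcript(tiers))
```

Both functions read the same `tiers` structure and neither changes it, so a further output format can be added without touching the transcription data.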

Page 25

Application: Corpus management

EXMARaLDA Corpus Manager (COMA)

Page 26

Application: Query/Analysis

Search and Query Instrument for EXMARaLDA (SQUIRREL)

Page 27

Project status

• Software past beta stage
• Five projects at our own institution use EXMARaLDA for their corpus work
• Around 800 users in research and teaching outside SFB
• Used at the IDS in Mannheim
• Submitted a suggestion for integration of the data model into P5 of the TEI guidelines

Page 28

Summary

• Transcription as theory and „Verschriftlichung“
• Computer-assisted transcription as modelling and visualisation

• Interdisciplinary bridge / Methodology of computational techniques in „classical“ linguistics
• Concrete practical improvements for work with transcription data

• EXMARaLDA and Database „Multilingualism“
• Data model, formats and tools building on the separation of model and visualisation

Page 29

Fin.