Shebanq roma-2013-10-01

Post on 26-Jun-2015

description

The SHEBANQ project (half-way) as a use case in querying language resources. The corpus is the text of the Hebrew Bible with linguistic features, packaged in a special text database and converted to LAF.

Transcript of Shebanq roma-2013-10-01

Data Archiving and Networked Services

SHEBANQ

Dirk Roorda - researcher @ DANS, TLA

System for HEBrew Text: ANnotations for Queries and Markup

TEI pre-conference workshop: Query

Roma – 2013-10-01

Overview

1.  Context: text, data, research in Hebrew Bible

2.  MdF database model, MQL query language

3.  Sharing the research process

4.  CLARIN-NL project: SHEBANQ

5.  Towards new tools

1 (of 5) Context

Text, data and research in the Hebrew Bible

VU Amsterdam

Eep Talstra Centre for Bible and Computer

text + linguistic features => database

database + research questions => publications

2 (of 5) MdF and MQL

•  MdF database model

•  MQL query language

Monad Object Feature

1977-now: Eep Talstra et al. ECA, WIVU. Print reference (Google Books)

1988-1994 Crist-Jan Doedens: Text Databases – One Database Model and Several Retrieval Languages (google books reference)

2004: Ulrik Petersen. Emdros - a text database engine for analyzed or annotated text. COLING

Monad-Object-Feature

text (standard edition): בראשית ברא אלהים את השמים ואת הארץ׃

monads (atomic chunks of text): 1 2 3 4 5 6 7 8 9 10 11

object types: word, subphrase, phrase_atom, phrase, clause_atom, sentence

features of a word object:

lexeme_utf8 = ראשית

old_lexeme_utf8 = ראשית

vocalized_lexeme_utf8 = ראשית

surface_consonants_utf8 = ראשית

graphical_lexeme_utf8 = ראשי

features of a clause_atom object:

clause_atom_number = 1

clause_atom_relation = 0

clause_atom_relation_daughter_tense = unknown

clause_atom_relation_kind = No_relation

clause_atom_relation_mother_tense = unknown

clause_atom_relation_preposition_class = none

clause_atom_type = xQtl

indentation = 0
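The monad-object-feature idea can be sketched in a few lines of Python. This is only an illustration, not the Emdros implementation: the class name `TextObject`, the helper `embeds`, and the assignment of monads to objects are assumptions made for the example; the feature names and values are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """One object in a monad-object-feature style model (illustrative only)."""
    otype: str                                      # object type, e.g. "word"
    monads: set                                     # integer positions in the text
    features: dict = field(default_factory=dict)    # feature name -> value


def embeds(outer: TextObject, inner: TextObject) -> bool:
    """True if every monad of `inner` also belongs to `outer`."""
    return inner.monads <= outer.monads


# Assumed layout: the verse occupies monads 1..11 and the word with
# lexeme ראשית sits at monad 2. Both assignments are illustrative.
clause_atom = TextObject(
    "clause_atom",
    set(range(1, 12)),
    {"clause_atom_number": 1, "clause_atom_type": "xQtl"},
)
word = TextObject("word", {2}, {"lexeme_utf8": "ראשית"})

print(embeds(clause_atom, word))   # the word lies inside the clause atom
print(embeds(word, clause_atom))   # but not the other way round
```

Representing an object as a set of monads keeps embedding a plain subset test, and it also allows objects with gaps.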

MQL query language

topographic, i.e.:

the structure of the query expression mirrors the structure of the query results with respect to

•  sequence

•  embedding

Example

SELECT ALL OBJECTS
WHERE
[Clause
  [Phrase
    [Word FOCUS
      part_of_speech = verb AND
      lexeme = "FJM["
    ]
  ]
  ..
  [Phrase FOCUS
    phrase_function = Objc OR
    phrase_function = IrpO
  ]
  ..
  [Phrase FOCUS
    phrase_function = Objc OR
    phrase_function = IrpO
  ]
]
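To make "topographic" concrete, here is a minimal Python sketch of what the example query asks for: a clause that embeds a phrase with a matching verb, followed in sequence by two object phrases. It is not how an MQL engine evaluates queries. The class `Obj`, the function names, the toy data, and the values `Pred` and `Cmpl` are assumptions for illustration, and phrases are assumed not to overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    otype: str
    first: int                                   # first monad
    last: int                                    # last monad
    features: dict = field(default_factory=dict)


def inside(inner, outer):
    """Embedding: inner lies within the monad span of outer."""
    return outer.first <= inner.first and inner.last <= outer.last


def is_object_phrase(phrase):
    return phrase.features.get("phrase_function") in ("Objc", "IrpO")


def double_object_clauses(objects, lexeme):
    """Yield (clause, verb, object1, object2) for each clause holding a verb
    with the given lexeme that is followed by two object phrases."""
    by_type = {}
    for obj in objects:
        by_type.setdefault(obj.otype, []).append(obj)
    for clause in by_type.get("clause", []):
        phrases = sorted(
            (p for p in by_type.get("phrase", []) if inside(p, clause)),
            key=lambda p: p.first,
        )
        words = [w for w in by_type.get("word", []) if inside(w, clause)]
        for verb_phrase in phrases:
            verbs = [
                w for w in words
                if inside(w, verb_phrase)
                and w.features.get("part_of_speech") == "verb"
                and w.features.get("lexeme") == lexeme
            ]
            if not verbs:
                continue
            # Sequence: object phrases must start after the verb phrase ends.
            later = [p for p in phrases
                     if p.first > verb_phrase.last and is_object_phrase(p)]
            if len(later) >= 2 and later[1].first > later[0].last:
                yield clause, verbs[0], later[0], later[1]


toy = [
    Obj("clause", 1, 6),
    Obj("phrase", 1, 1, {"phrase_function": "Pred"}),
    Obj("word", 1, 1, {"part_of_speech": "verb", "lexeme": "FJM["}),
    Obj("phrase", 2, 3, {"phrase_function": "Objc"}),
    Obj("phrase", 4, 4, {"phrase_function": "Cmpl"}),
    Obj("phrase", 5, 6, {"phrase_function": "Objc"}),
    Obj("clause", 7, 9),
    Obj("phrase", 7, 7, {"phrase_function": "Pred"}),
    Obj("word", 7, 7, {"part_of_speech": "verb", "lexeme": "FJM["}),
    Obj("phrase", 8, 9, {"phrase_function": "Objc"}),
]

results = list(double_object_clauses(toy, "FJM["))
for clause, verb, obj1, obj2 in results:
    print(clause.first, clause.last, obj1.first, obj2.first)
```

Only the first toy clause qualifies; the second has a single object phrase. The nesting of the loops follows the nesting of the brackets in the query, which is the sense in which the query is topographic.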

3 (of 5) Sharing

Problem: how to share (intermediate) results of analysis

Solution: saving queries as annotations

Lock-in

scholarly-bibles.com

Stuttgart Electronic Study Bible

⇒ massive dissemination

But

⇒ not the right dynamics for tool development

Leiden: international workshop biblical scholarship

Desiderata:

new tool development

text transmission (variants)

linguistic analysis (features)

even combined!

a short history: 2012

leiden lorentz

Hebrew Text in the Archive

urn:nbn:nl:ui:13-ikjj-ek

how can the people annotate our work?

Research Data Cycle

Text transmission, tradition, editorial processes

Free University, theology faculty, server department, WIVU project

NWO projects

religious communities

theol. scholars

enlightened lay people

scholarly-bibles.com

CLARIN SHEBANQ

linguists

dig. hum

comp. hum

Wider public: Annotation, Query Saving, via Linked Data

Research Data Archiving

DANS

3 (of 5) Sharing (cont'd)

Solution: Queries As Annotations

queries-as-annotations

model | query | example

body | query instruction | SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]

targets | query results in context | ו ישכם יעקב ב בקר ו יקח את ה אבן אשר שם מראשתיו ו ישם אתה מצבה ו יצק שמן על ראשה

annotation | published query | qu123 (just an identifier)

metadata | researcher, date created, date last run, research question | Janet Dyk, 2004-02-16, 2012-01-27, Can the verb שים have a double object? - article in Foundations for Syriac Lexicography
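The mapping from query to annotation can be sketched as a small record. This is a hedged illustration only: the class `QueryAnnotation` and its field names are assumptions chosen to follow the slide's rows (body, targets, annotation, metadata), not Open Annotation vocabulary and not the demonstrator's actual data format. The values are the ones shown on the slide.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class QueryAnnotation:
    identifier: str                     # annotation: the published query id
    body: str                           # body: the query instruction
    targets: list                       # targets: query results in context
    metadata: dict = field(default_factory=dict)


qa = QueryAnnotation(
    identifier="qu123",
    body='SELECT ALL OBJECTS WHERE '
         '[Word FOCUS part_of_speech = verb AND lexeme = "שים"]',
    targets=[
        "ו ישכם יעקב ב בקר ו יקח את ה אבן אשר שם מראשתיו "
        "ו ישם אתה מצבה ו יצק שמן על ראשה",
    ],
    metadata={
        "researcher": "Janet Dyk",
        "date_created": "2004-02-16",
        "date_last_run": "2012-01-27",
        "research_question": "Can the verb שים have a double object?",
    },
)

# Round-trip through JSON, keeping the Hebrew readable in the output.
serialized = json.dumps(asdict(qa), ensure_ascii=False, indent=2)
restored = QueryAnnotation(**json.loads(serialized))
print(serialized)
```

Keeping the query text as the body and the results as targets means that a saved query can be re-run later and its targets refreshed without changing its identifier.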

OpenAnnotation: openannotation.org

provenance

motivation

demonstrator: datanetworkservice.nl/qaa

still missing:

saving queries

not semantic-web-enabled

sustainability

4 (of 5) Project

CLARIN-NL: SHEBANQ:

(A) Curation

(B) Demonstrator

SHEBANQ

System for Hebrew Text: ANnotations for Queries

CLARIN-NL project

data curation: LAF

demonstrator: query saver

#!/etc bc

s/g$/q/

Linguistic Annotation Framework

ISO 24612:2012

Nancy Ide, Laurent Romary

feature definitions

TEI ISO-FS schema

dcr:datcat on <fDecl> versus <f>

26,225,966 <f>s

2.5 GB redundant attribute material!
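A quick back-of-the-envelope check shows what these two figures imply per element, assuming (as the slide suggests but does not spell out) that the redundant material is spread evenly over the `<f>` elements. The function name and the two readings of "GB" are choices made for this sketch.

```python
F_ELEMENTS = 26_225_966      # number of <f> elements, from the slide
REDUNDANT_GB = 2.5           # redundant attribute material, from the slide


def bytes_per_element(total_gb, count, bytes_per_gb=10**9):
    """Average number of redundant bytes carried by each element."""
    return total_gb * bytes_per_gb / count


decimal = bytes_per_element(REDUNDANT_GB, F_ELEMENTS)
binary = bytes_per_element(REDUNDANT_GB, F_ELEMENTS, bytes_per_gb=2**30)
print(f"{decimal:.1f} bytes per <f> if 1 GB = 10^9 bytes")
print(f"{binary:.1f} bytes per <f> if 1 GB = 2^30 bytes")
```

Either way the result is on the order of a hundred bytes per `<f>`, which is roughly the size of one attribute holding a URL-like reference. That is consistent with declaring the data category once on `<fDecl>` rather than repeating it on every `<f>`.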

4 (of 5) Project

CLARIN-NL: SHEBANQ: (B) Demonstrator

select all objects where

[clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute] ]]

[Screenshot: query screen. Button: Execute. Status: Executing query ..., Query executed. Panels: Passage, Controls, Text, Query results (paged list with Prev and Next, each hit with a "view in context" link). Sample references: Gen 1:1, 2Chron 3:4, 1Sam 12:4, Ex 23:2. Sample passages:]

בראשית ברא אלהים את השמים ואת הארץ׃

ויאמר חזקיהו מה אות כי אעלה בית יהוה׃

Save this query

Name: valency ברא

Researcher: Oliver Glanz

Date created: 2013-08-25

Date last run: 2013-08-25

Project: Data and Tradition

Institute: VU/Eep Talstra Centre for Bible and Computing

Reason: irregular valency of ברא

Comments: needs to be combined with query on אלהים

Buttons: Save, Publish, Cancel

Edit Query

[Screenshot: saved-query screen with the same Passage, Controls and Text panels, sample passages and references as the query screen. Panels: Saved Query Results (paged list with Prev and Next, each hit with a "view in context" link), Information on this query, Query Info.]

Information on this query

Name: valency ברא

Researcher: Oliver Glanz

Date created: 2013-08-25

Date last run: 2013-08-25

Project: Data and Tradition

Institute: VU/Eep Talstra Centre for Bible and Computing

Reason: irregular valency of ברא

Comments: needs to be combined with query on אלהים

Query Info

MQL query text:

select all objects where

[clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute] ]]

Persistent Identifier: urn:nbn:nl:ui:13-scpm-ji

http://www.persistent-identifier.nl/?identifier=urn...

datanetworkservice.nl/qaa

SHEBANQ: implementing Q-a-A

5 (of 5) Towards new tools

•  LAF tools

•  or generic graph algorithms

•  Emdros tools

•  or generic database technology

•  Linked Data tools

•  or generic SPARQL queries
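As a hedged illustration of the "generic database technology" option, the sketch below stores text objects in SQLite and expresses the demonstrator query (a clause embedding an Objc phrase embedding an infinitive absolute word) as a plain SQL join. The table layout, the toy rows, and the values `Pred`, `perfect` and `imperfect` are assumptions made for this example. Objects are simplified to contiguous monad ranges, which a monad-set model does not require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obj (
        id INTEGER PRIMARY KEY,
        otype TEXT NOT NULL,
        first_monad INTEGER NOT NULL,
        last_monad INTEGER NOT NULL
    );
    CREATE TABLE feature (
        obj_id INTEGER NOT NULL REFERENCES obj(id),
        name TEXT NOT NULL,
        value TEXT NOT NULL
    );
""")

# Toy data: two clauses, only the first holds an infinitive absolute
# inside an object phrase.
conn.executemany("INSERT INTO obj VALUES (?, ?, ?, ?)", [
    (1, "clause", 1, 4),
    (2, "phrase", 1, 2),
    (3, "phrase", 3, 4),
    (4, "word", 3, 3),
    (5, "word", 1, 1),
    (6, "clause", 5, 6),
    (7, "phrase", 5, 6),
    (8, "word", 5, 5),
])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
    (2, "phrase_function", "Pred"),
    (3, "phrase_function", "Objc"),
    (4, "tense", "infinitive_absolute"),
    (5, "tense", "perfect"),
    (7, "phrase_function", "Objc"),
    (8, "tense", "imperfect"),
])

# Embedding is expressed as containment of monad ranges.
EMBEDDED_QUERY = """
    SELECT c.id, w.id
    FROM obj AS c
    JOIN obj AS p
      ON p.otype = 'phrase'
     AND p.first_monad >= c.first_monad AND p.last_monad <= c.last_monad
    JOIN feature AS pf
      ON pf.obj_id = p.id
     AND pf.name = 'phrase_function' AND pf.value = 'Objc'
    JOIN obj AS w
      ON w.otype = 'word'
     AND w.first_monad >= p.first_monad AND w.last_monad <= p.last_monad
    JOIN feature AS wf
      ON wf.obj_id = w.id
     AND wf.name = 'tense' AND wf.value = 'infinitive_absolute'
    WHERE c.otype = 'clause'
    ORDER BY c.id, w.id
"""

matches = conn.execute(EMBEDDED_QUERY).fetchall()
print(matches)   # (clause id, focus word id) pairs
```

The SQL is noticeably longer than the three-line MQL query it imitates, which gives a feel for the trade-off between a dedicated text database and generic technology.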

Side conditions

•  development close to the researchers

•  preferably in their own institutions

•  decent performance

•  within the scale of a laptop

•  usable to researchers

•  that is: non-programmers

•  persistence in mind

•  new results will be archived and re-enter the data cycle

thank you

dirk.roorda@dans.knaw.nl

slideshare.net/dirkroorda/
