Grammar for Fun: IT-based Grammar Teaching with VISL. Eckhard Bick, 2004

  • Slide 1

    Grammar for Fun: IT-based Grammar Teaching with VISL
    Eckhard Bick, 2004

  • Slide 2

    Talk outline

  • Slide 3

    Teaching projects
    - CTU 1996-99: Internet-based grammar teaching software (research and development)
    - ELU1 1998-2000: VISL tools for Danish universities and teacher seminaries
    - VISL-HHX 2001-03: VISL tools for Danish business schools
    - VISL-GYM 2001-02: VISL tools for Danish gymnasiums
    - PaNoLa, GREI 2002-2004: Major Nordic languages
    - VISL-SEM 2004-05: VISL didactics for teacher training colleges
    - URKAS 2004-05: Almen sprogforståelse ("general language understanding"), 1.g

  • Slide 4

    Unity in diversity: A unified approach for 22 languages

  • Slide 5

    VISL research languages

  • Slide 6

    The VISL teaching network

  • Slide 7

    Complexity progression (kompleksitetsprogression)

  • Slide 8

    Grammy i Klostermølleskoven
    - Story-line about grammar
    - Interactive exercises
    - Book = IT
    - Comments for teachers
    - Explanations for students

  • Slide 9

    The Paintbox game

  • Slide 10

    ShootingGallery: Hit a noun!
  • Slide 11

    WordFall - Tetris for grammarians

  • Slide 12

    Labyrinth - a word class maze

  • Slide 13

    Post office - stamping syntactic function

  • Slide 14

    Syntris - syntax brick by brick

  • Slide 15

    SpaceRescue: Alien syntax

  • Slide 16

    Constituent trees

  • Slide 17

    Interactive syntactic trees

  • Slide 18

    Teaching corpora of analyzed sentences

  • Slide 19

    Function categories

  • Slide 20

    BuildTree: Drag & drop constituents

  • Slide 21

    LabelTree: Drag & drop syntactic function

  • Slide 22

    Cross-language problems: Infinitive marker

  • Slide 23

    Cross-language problems: participial clauses

  • Slide 24

    Cross-language problems: Discontinuity

  • Slide 25

    VISL source notation

    Example sentence: "VISL er et forskningsprojekt der involverer mange forskellige sprog" (VISL is a research project that involves many different languages). Each "=" marks one level of tree depth.

    VISL lite vertical tree (non-graphical notation, filtered):

        UTT:cl
        S:prop  VISL
        P:v  er
        Cs:g
        =D:art  et
        =H:n  forskningsprojekt
        =D:cl
        ==S:pron  der
        ==P:v  involverer
        ==Od:g
        ===D:pron  mange
        ===D:adj  forskellige
        ===H:n  sprog

    VISL vertical tree (non-graphical notation, incl. morphology):

        STA:fcl
        S:prop("VISL")  VISL
        P:v-fin("være",pr,akt)  er
        Cs:np
        =DN:art("en",neu,sg,idf)  et
        =H:n("forskningsprojekt",neu,sg,idf,nom)  forskningsprojekt
        =DN:fcl
        ==S:pron-rel("der",nG,nN,nom)  der
        ==P:v-fin("involvere",pr,akt)  involverer
        ==Od:np
        ===DN:pron-indef("mange",nG,pl,nom)  mange
        ===DN:adj("forskellig",nG,pl,nD,nom)  forskellige
        ===H:n("sprog",neu,pl,idf,nom)  sprog

  • Slide 26

    CG source notation (function/dependency)

  • Slide 27

    Supported xml-formats
    - TIGER-xml (constituents)
    - TIGER-xml (dependency)
    - MALT-xml
    - VISL data file markers: pedagogical topic and chaptering attributes for dynamic html-layout

  • Slide 28

    Search interfaces for annotated corpora

  • Slide 29

    Menu-based searches

  • Slide 30

    Statistical tools

  • Slide 31

    Corpus annotation

  • Slide 32

    Annotated corpora

    Morphosyntactically tagged:
    - Korpus90 and Korpus2000, mixed genre, 56M words
    - DFK, mainly transcribed parliamentary discussions, 7M words
    - CETEMPúblico, European Portuguese, news text, 180M words
    - Folha de São Paulo, Brazilian news text, 90M words
    - CORDIAL-SIN, dialectal Portuguese, 30K words
    - NURC, transcribed Brazilian speech, 100K words
    - Tycho Brahe, historical Portuguese, 50K words

    Valency tagged:
    - NILC corpus, Brazilian Portuguese, journalistic and essays, 39M words

    Treebanks:
    - Floresta Sintá(c)tica, European Portuguese, 1M words (35K revised)
    - Arboretum, Danish, 50K words revised

  • Slide 33

    Integrating live NLP and language awareness teaching

  • Slide 34

    KillerFiller: Towards evaluation

  • Slide 35

    Performance statistics

  • Slide 36

    VISL http://visl.sdu.dk
    Eckhard Bick, lineb@hum.au.dk

  • Slide 37

    The most common syntactic categories

  • Slide 38

  • Slide 39

    The DanGram system in current numbers
    - Lexemes in morphological base lexicon: 146,342 (equals about 1,000,000 full forms), of these:
        - proper names: 44,839 (experimental)
        - polylexicals: 460 (+ names and certain number expressions)
    - Lexemes in the valency and semantic prototype lexicon: 95,308
    - Lexemes in the bilingual lexicon (Danish-Esperanto): 36,001
    - Danish CG-rules, in all: 6,233
        - morphological CG disambiguation rules: 2,678
        - syntactic mapping-rules: 1,701
        - syntactic CG disambiguation rules: 1,854
      (plus 429 bilingual rules in separate MT grammars, and a smaller number of semantic case-role and proper name rules in the semantics and name grammars)
    - Danish PSG-rules: 490 (for generating syntactic tree structures)
    - Performance: At full disambiguation (i.e., maximal precision), the system has an average correctness of 99% for word class (PoS), and about 96% for syntactic tags (depending on how fine-grained an annotation scheme is used)
    - Speed:
        - full CG-parse: ca. 400 words/sec for larger texts (start-up time 3-6 sec)
        - morphological analysis alone: ca. 1000 words/sec

  • Slide 40

    VISL parsing tools
    - Preprocessing: word and sentence boundaries, polylexicals
    - Lexicon- and rule-based morphological analysis: inflexion, derivation, compound recognition
    - Postprocessing: valency and semantic potential
    - Morphological contextual disambiguation (CG)
    - Syntactic mapping and disambiguation (CG)
    - Names CG, feature propagation CG, case role CG
    - PSG superstructure (overbygning): Teaching, Arboretum, Floresta

  • Slide 41

    Research projects
    - SHF 1999-2001: CG, syntax & semantics (da, en, po)
    - AC/DC 1999-?: Portuguese CG-corpora
    - Floresta 2000-?: Portuguese treebank
    - DSL 2001-?: Korpus90/2000 (Danish CG-corpora)
    - Arboretum 2002-?: Danish treebank
    - PaNoLa 2002-2003: Integration of Nordic CG research
    - Nomen Nescio: Automatic named entity recognition

  • Slide 42

    Running CG-annotation

        Da [da] KS @SUB
        den [den] ART UTR S DEF @>N
        gamle [gammel] ADJ nG S DEF NOM @>N
        sælger [sælger] N UTR S IDF NOM @SUBJ>
        kørte [køre] V IMPF AKT @FS-ADVL>
        hjem [hjem] N NEU P IDF NOM @ [...]
        [...] DET UTR S @>N
        bil [bil] N UTR S IDF NOM @P [...]
        [...] V IMPF AKT @FMV
        han [han] PERS UTR 3S NOM @ [...]
        [...] DET nG P NOM @>N
        små [lille] ADJ nG P nD NOM @>N
        dyr [dyr] N NEU P IDF NOM &ACI-SUBJ@ [...]
        [...] N
        våde [våd] ADJ nG P nD NOM @>N
        veje [vej] N UTR P IDF NOM @P<
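Each token in the running CG annotation of Slide 42 consists of a word form, a lemma in square brackets, a series of morphological tags, and one or more syntactic function tags prefixed with "@". The following is a minimal Python sketch of how such a token could be split into its parts. It assumes one token per input string and treats every tag that does not start with "@" as a non-function tag. The names `CGToken` and `parse_cg_token` are hypothetical and not part of any VISL tool.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed layout: word form, lemma in square brackets, whitespace-separated tags.
TOKEN_RE = re.compile(r"^(?P<form>\S+)\s+\[(?P<lemma>[^\]]+)\]\s*(?P<tags>.*)$")


@dataclass
class CGToken:
    """Hypothetical container for one annotated token."""
    form: str
    lemma: str
    other_tags: List[str] = field(default_factory=list)     # e.g. ADJ, nG, S
    function_tags: List[str] = field(default_factory=list)  # e.g. @>N, @SUBJ>


def parse_cg_token(text: str) -> CGToken:
    """Split one annotated token into form, lemma, and tag lists."""
    match = TOKEN_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable CG token: {text!r}")
    tags = match.group("tags").split()
    return CGToken(
        form=match.group("form"),
        lemma=match.group("lemma"),
        other_tags=[t for t in tags if not t.startswith("@")],
        function_tags=[t for t in tags if t.startswith("@")],
    )


if __name__ == "__main__":
    token = parse_cg_token("gamle [gammel] ADJ nG S DEF NOM @>N")
    print(token.form, token.lemma, token.other_tags, token.function_tags)
```

Keeping function tags in a separate list makes it easy to filter a text for, say, all tokens carrying a subject tag, which is the kind of query a search interface over annotated corpora would need.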