UCCTS, 2010 (Omskrik)
![Page 1: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/1.jpg)
IAC (ACCESS INTERFACE CORPUS)
DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA
TONI BADIA (BARCELONA MEDIA - UNIVERSITAT POMPEU FABRA)
JUDITH DOMINGO (BARCELONA MEDIA)
CARME COLOMINAS (UNIVERSITAT POMPEU FABRA)
![Page 2: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/2.jpg)
IAC CORPORA USE: REQUIREMENTS
It's easy to build corpora from the web, but they are difficult to search
We need tools that support frequency statistics, sorting of results, searches over linguistically annotated sequences, etc.
![Page 3: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/3.jpg)
IAC CORPORA: SEARCHING METHODS
Concordance software (MonoConc, Concordance)
Databases
Corpus query systems (e.g. CQP, EMDROS)
Useful but hard to learn
Not suited to training, as students spend too much time learning the query system
![Page 4: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/4.jpg)
IAC CORPORA: INTERFACES (SEARCHING METHODS)
DISADVANTAGES
From the user's point of view, more than one interface has to be learned
A background in programming and interface design is needed (external resources)
If new attribute types are added, the interface has to be redesigned, which requires new funding
Usually more expensive than the other options
ADVANTAGES
User-friendly
No training needed
![Page 5: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/5.jpg)
IAC (ACCESS INTERFACE CORPUS)
The Translation Department (UPF) had many corpora, which were constantly changing and growing
This is how IAC came about, developed by Barcelona Media and UPF
GOALS
Monolingual and aligned corpora
Fast and easy creation of corpus interfaces
A single interface design for all the corpora
![Page 6: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/6.jpg)
IAC INTERFACES
Simple: Key Word Out of Context (KWOC)
Advanced: Key Word In Context (KWIC)
Statistics: KWIC and frequency-based results
For corpus indexing and searching, IAC uses the Corpus Workbench (CWB), developed at IMS Stuttgart
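To illustrate what the Advanced (KWIC) view shows, here is a minimal Key Word In Context concordance: each hit of the keyword is listed with a fixed window of words on either side. This is a toy sketch over a plain token list, not how IAC or CWB actually build concordances; the function name `kwic` and its `width` parameter are hypothetical.

```python
from typing import List, Tuple


def kwic(tokens: List[str], keyword: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every hit of `keyword`.

    Matching is case-insensitive; `width` is the number of context words
    kept on each side (hypothetical parameter, for illustration only).
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


text = "the boy buys pencils and the girl buys books".split()
for left, kw, right in kwic(text, "buys", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>15} [{kw}] {right}")
```

A KWOC view, by contrast, would list the keyword once and the matching lines beneath it, without aligning the context around it.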
IAC EXAMPLES
![Page 7: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/7.jpg)
IAC CORPUS FORMAT
<metadata title="Demo" year="2010">
<func=subj>
The Det sg
boy Noun sg
</func>
buys Verb sg
<func=DO>
pencils Noun pl
</func>
</metadata>
Tabular, verticalized format: one token per line, with its annotations in columns
XML tags for metadata (<metadata>) and for structural data (<func>)
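To make the layout concrete, the sketch below reads the example above into Python: one token per line with columns for word, part of speech and number, plus XML-like lines for metadata and structural regions. It assumes the columns are tab-separated and that tags are spelled exactly as on the slide (`<func=...>`); it only illustrates the format and is not IAC's or CWB's own loader (CWB indexes such files with its own encoding tools). The name `parse_vertical` is hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# The slide's example, assuming tab-separated columns: word, POS, number.
SAMPLE = """<metadata title="Demo" year="2010">
<func=subj>
The\tDet\tsg
boy\tNoun\tsg
</func>
buys\tVerb\tsg
<func=DO>
pencils\tNoun\tpl
</func>
</metadata>"""

ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')   # key="value" pairs in <metadata ...>
FUNC_OPEN_RE = re.compile(r"^<func=(\w+)>$")      # opening tag of a syntactic-function region


def parse_vertical(text: str) -> Tuple[Dict[str, str], List[Dict[str, Optional[str]]]]:
    """Return (metadata, tokens); each token records the <func> region it falls in."""
    metadata: Dict[str, str] = {}
    tokens: List[Dict[str, Optional[str]]] = []
    func: Optional[str] = None  # currently open <func=...> region, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        func_open = FUNC_OPEN_RE.match(line)
        if line.startswith("<metadata"):
            metadata.update(ATTR_RE.findall(line))
        elif func_open:
            func = func_open.group(1)
        elif line == "</func>":
            func = None
        elif line.startswith("</"):
            continue  # other closing tags (e.g. </metadata>) carry no data
        else:
            word, pos, number = line.split("\t")
            tokens.append({"word": word, "pos": pos, "number": number, "func": func})
    return metadata, tokens


meta, toks = parse_vertical(SAMPLE)
print(meta)
for t in toks:
    print(t)
```

Keeping tokens one per line with attributes in columns is what makes queries such as "a Noun inside a subject region" cheap to answer: each column becomes an indexable attribute and each tag pair a region.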
![Page 8: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/8.jpg)
IAC CORPORA: INSERTING A CORPUS INTO IAC
Upload the corpus (a .txt file) to the server
Design the search interface with a graphical tool (included in IAC), according to the corpus type and the linguistic annotation it contains
![Page 9: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/9.jpg)
![Page 10: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/10.jpg)
IAC CONCLUSIONS
IAC is a flexible and powerful tool that overcomes the limitations of current corpus interfaces
User-friendly tool
Access to multiple corpora from the same platform
No need for an external developer or a programming background
Fast interface creation; interfaces can be modified easily
![Page 12: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/12.jpg)
SOME EXAMPLES…
![Page 13: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/13.jpg)
ADVANCED SEARCH
To demonstrate the advanced search, we use an annotated translation corpus.
Let's look for examples of sequences of one or more words containing syntax errors.
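The idea of matching "one or more words" can be sketched as finding maximal runs of consecutive tokens that carry the same error tag. The annotation scheme here is an assumption for illustration: each token is a hypothetical `(word, error_tag)` pair, with `None` meaning no error; the real corpus annotation and IAC's query mechanism may differ. The function name `error_spans` is also hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical annotation: (word, error tag or None).
Token = Tuple[str, Optional[str]]


def error_spans(tokens: List[Token], tag: str) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of maximal runs of tokens tagged `tag`."""
    spans = []
    start: Optional[int] = None
    for i, (_, err) in enumerate(tokens):
        if err == tag and start is None:
            start = i                  # a run begins
        elif err != tag and start is not None:
            spans.append((start, i))   # a run ends before token i
            start = None
    if start is not None:              # a run reaching the end of the sentence
        spans.append((start, len(tokens)))
    return spans


sentence = [("Yesterday", None), ("the", None), ("boys", "syntax"), ("buys", "syntax"),
            ("pencils", None), ("and", None), ("go", "syntax")]
for s, e in error_spans(sentence, "syntax"):
    print(" ".join(word for word, _ in sentence[s:e]))
```

Returning index spans rather than strings keeps the hits usable as KWIC anchors, since the surrounding context can be cut from the same token list.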
![Page 14: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/14.jpg)
ADVANCED SEARCH
![Page 15: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/15.jpg)
ADVANCED SEARCH
![Page 16: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/16.jpg)
ALIGNED CORPORA WITH METADATA
As an example of aligned corpora, we use a Spanish > English corpus.
Poder (verb) can be translated as: can, could, may, might
Our goal is to retrieve examples of poder (verb) translated as may or might in Economics texts.
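The query combines three conditions: a text-level metadata filter (domain), a lemma plus part-of-speech condition on the Spanish side, and a word condition on the English side of each aligned segment. The sketch below assumes a hypothetical `AlignedPair` record with `(word, lemma, pos)` Spanish tokens and a `"VERB"` tag; it checks co-occurrence within an aligned segment pair, which is a simplification of true word-level alignment, and is not IAC's implementation.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class AlignedPair:
    """Hypothetical record for one aligned Spanish > English segment pair."""
    domain: str                         # text-level metadata, e.g. "Economics"
    source: List[Tuple[str, str, str]]  # Spanish tokens as (word, lemma, pos)
    target: List[str]                   # English word tokens


def find_poder_as_modal(pairs: List[AlignedPair], modals: Set[str], domain: str) -> List[AlignedPair]:
    """Keep pairs in `domain` where the Spanish side has the verb poder and the English side a modal in `modals`."""
    hits = []
    for pair in pairs:
        if pair.domain != domain:
            continue
        has_poder = any(lemma == "poder" and pos == "VERB" for _, lemma, pos in pair.source)
        has_modal = any(word.lower() in modals for word in pair.target)
        if has_poder and has_modal:
            hits.append(pair)
    return hits


pairs = [
    AlignedPair("Economics",
                [("Los", "el", "DET"), ("precios", "precio", "NOUN"),
                 ("pueden", "poder", "VERB"), ("subir", "subir", "VERB")],
                "Prices may rise".split()),
    AlignedPair("Economics",  # poder as a noun: excluded by the POS condition
                [("El", "el", "DET"), ("poder", "poder", "NOUN"),
                 ("adquisitivo", "adquisitivo", "ADJ")],
                "Purchasing power".split()),
    AlignedPair("Law",        # right words, wrong domain: excluded by metadata
                [("Puede", "poder", "VERB"), ("recurrir", "recurrir", "VERB")],
                "He may appeal".split()),
]
for hit in find_poder_as_modal(pairs, {"may", "might"}, "Economics"):
    print(" ".join(w for w, _, _ in hit.source), "->", " ".join(hit.target))
```

The second toy pair shows why the slide specifies poder (verb): without the part-of-speech condition, the noun el poder would also match the lemma.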
![Page 17: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/17.jpg)
ALIGNED CORPORA WITH METADATA
![Page 18: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/18.jpg)
ALIGNED CORPORA WITH METADATA
![Page 19: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/19.jpg)
STATISTICS
Statistics provide quantitative results for sequences. Here, our goal is to find which prepositions follow the Spanish verb pensar (to think), and how often each one occurs.
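The underlying computation is a frequency count of the word immediately after any form of the lemma pensar, restricted to prepositions. The sketch assumes a lemmatized, POS-tagged token list with Universal Dependencies-style tags (`ADP` for prepositions); the tagset, the toy sentences and the function name `following_counts` are assumptions, and this is not how IAC computes its statistics.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical token format: (word, lemma, pos), UD-style tags assumed.
Token = Tuple[str, str, str]


def following_counts(tokens: List[Token], lemma: str, pos: str) -> Counter:
    """Count words tagged `pos` that immediately follow any form of `lemma`.

    Sentence boundaries are ignored in this toy version.
    """
    counts: Counter = Counter()
    for (_, lem, _), (word, _, tag) in zip(tokens, tokens[1:]):
        if lem == lemma and tag == pos:
            counts[word.lower()] += 1
    return counts


corpus = [
    ("Pienso", "pensar", "VERB"), ("en", "en", "ADP"), ("ti", "tú", "PRON"),
    ("¿Qué", "qué", "PRON"), ("piensas", "pensar", "VERB"), ("de", "de", "ADP"), ("esto", "esto", "PRON"),
    ("Pensaba", "pensar", "VERB"), ("en", "en", "ADP"), ("el", "el", "DET"), ("viaje", "viaje", "NOUN"),
    ("Pensamos", "pensar", "VERB"), ("que", "que", "SCONJ"), ("sí", "sí", "ADV"),
]
counts = following_counts(corpus, "pensar", "ADP")
print(counts.most_common())
```

Matching on the lemma rather than the word form is what lets pienso, piensas, pensaba and pensamos all count as pensar; the conjunction que is left out by the POS condition.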
![Page 20: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/20.jpg)
STATISTICS
![Page 21: UCCTS, 2010 (Omskrik)](https://reader035.fdocuments.net/reader035/viewer/2022062500/568159c4550346895dc71639/html5/thumbnails/21.jpg)
STATISTICS