The annotation of causative constructions in a dialectal corpus



Contacts: Av. Professor Gama Pinto, 2, 1649-003 Lisboa, Portugal | Tel.: +351 217 904 700 | Fax: +351 217 965 622 | www.clul.ul.pt

1. INTRODUCTION

• AIM: empirical exploitation of an annotated dialect corpus, which allows:
  - fast access to systematically organized and structured data and its geographical distribution;
  - automatic searching of precise morphological and syntactic information.
• CASE STUDY: causative (non-finite) constructions with the verbs deixar, fazer and mandar (cf. Gonçalves 1999; Carrilho & Pereira 2010)

Inflected Infinitive:
(1) A mãe fez os filhos comerem a sopa.
    the mother made the.MASC.PL kids.NOM to-eat.3PL the soup

Exceptional Case Marking (ECM):
(2) A mãe fez os filhos comer a sopa.
    the mother made the.MASC.PL kids.ACC to-eat the soup

Faire-INF:
(3) A mãe fez comer a sopa aos filhos.
    the mother made to-eat the soup to-the kids

‘The mother made the kids eat the soup.’
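The three patterns in (1)-(3) differ in the form of the infinitive and in where the causee appears, so they can be told apart from POS tags alone. The following is a minimal Python sketch of that idea, not part of the CORDIAL-SIN tooling. The `word/TAG` format and the tags `VB-D`, `VB`, `D-F`, `D-P`, `N`, `N-P` and `P` come from the poster's tagged example; `VB-F` is the poster's label for the inflected infinitive. The tagging of sentences (1) and (2) and the function name `classify` are assumptions made for illustration.

```python
def classify(tagged):
    """Heuristically label a tagged causative sentence (illustrative only)."""
    tags = [token.rsplit("/", 1)[1] for token in tagged.split()]
    # First finite verb: any 'VB-...' tag other than the inflected infinitive.
    finite = next(i for i, t in enumerate(tags)
                  if t.startswith("VB-") and t != "VB-F")
    # First infinitive (inflected or not) after the finite verb.
    inf = next(i for i, t in enumerate(tags)
               if i > finite and t in ("VB", "VB-F"))
    if tags[inf] == "VB-F":
        return "inflected infinitive"
    # A noun between the two verbs is taken to be the causee.
    if any(t.startswith("N") for t in tags[finite + 1:inf]):
        return "ECM"
    # A preposition after the infinitive is taken to introduce the causee.
    if "P" in tags[inf + 1:]:
        return "faire-inf"
    return "other"


# Sentence (3) is tagged as on the poster; (1) and (2) are hypothetical taggings.
s1 = "A/D-F mãe/N fez/VB-D os/D-P filhos/N-P comerem/VB-F a/D-F sopa/N ./."
s2 = "A/D-F mãe/N fez/VB-D os/D-P filhos/N-P comer/VB a/D-F sopa/N ./."
s3 = "A/D-F mãe/N fez/VB-D comer/VB a/D-F sopa/N a@/P @os/D-P filhos/N-P ./."

for sentence in (s1, s2, s3):
    print(classify(sentence))
```

A flat heuristic like this ignores null subjects and clitics, and it will misread any preposition after the infinitive as a causee marker, which is one reason the poster relies on syntactic annotation and structural queries instead.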

The annotation of causative constructions in a dialectal corpus

Sandra Pereira*

2. CORDIAL-SIN Annotation system (Tycho Brahe corpus; Penn parsed corpora)

• Input: POS annotated texts
  A/D-F mãe/N fez/VB-D comer/VB a/D-F sopa/N a@/P @os/D-P filhos/N-P ./.
• Causative Constructions
  - Inflected vs non-inflected infinitive
  - Position and form of the causee:

    Null subject *pro* or NP between the verbs:
    (4) Faz os porcos / *pro* criarem carne. (CLH23)
        makes the.MASC.PL pigs.NOM to-raise.3PL meat
        ‘It makes the pigs get fat’

    Null subject *arb*:
    (5) (…) até não deixavam trabalhar tanto (AJT03)
        even not let.3PL to-work much
        ‘they didn’t even let them work so hard’

    NP subject between or after the verbs:
    (6) Deixa esta vara crescer, (PST01)
        let.3SG this twig to-grow
        ‘(You) let this twig grow’
    (7) Deixe ir a sua Ester. (GRJ31)
        let to-go the.FEM.SG your Ester
        ‘Let your Ester go’

    PP subject between and after the verbs:
    (8) Mandávamos moer às moleiras (FIG02)
        sent.1PL to-grind to-the.FEM.PL millers
        ‘We sent millers to grind’

    Empty category coindexed with a clitic:
    (9) Mandaram-lhe arrumar a camioneta ali (AAL34)
        sent.3PL-him.DAT to-park the van there
        ‘They sent him to park the van there’
    (10) (…) ela mandou-o lá ir, (COV11)
         she sent-him.ACC there to-go
         ‘she sent him to go there’
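The POS-annotated input line shown for this section can be read mechanically. The following Python sketch splits each `word/TAG` token and rejoins the two halves of a contraction. It assumes, based only on the poster's example `a@/P @os/D-P`, that `@` marks the split point of a contraction such as "aos"; the function names are hypothetical and not part of any CORDIAL-SIN tool.

```python
def parse_tagged(line):
    """Split a 'word/TAG' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last slash so that './.' yields ('.', '.').
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


def surface_words(pairs):
    """Rejoin contraction halves marked with '@' (assumed convention)."""
    words = []
    for word, _tag in pairs:
        if word.startswith("@") and words and words[-1].endswith("@"):
            # 'a@' followed by '@os' becomes 'aos'.
            words[-1] = words[-1][:-1] + word[1:]
        else:
            words.append(word)
    return words


tagged = "A/D-F mãe/N fez/VB-D comer/VB a/D-F sopa/N a@/P @os/D-P filhos/N-P ./."
pairs = parse_tagged(tagged)
print(pairs[2])                        # the finite causative verb and its tag
print(" ".join(surface_words(pairs)))  # the sentence with 'aos' restored
```

Keeping the tagged pairs and the surface words separate means the preposition and the determiner stay individually searchable while the original spelling can still be recovered.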

CORDIAL-SIN

• Syntax-Oriented Corpus of Portuguese Dialects (http://www.clul.ul.pt/resources/411-cordial-corpus)
• Annotated corpus: (almost) spontaneous speech
• 42 locations / 600,000 words (± 68 hours of audio recording)
• Informants: elderly, non-educated, rural, born and raised in the place of the interview
• Syntactic annotation (under development: ± 200,000 words)
  - Automatic search (CorpusSearch2: http://corpussearch.sourceforge.net)
  - CORDIAL-SIN Syntactic Annotation System Manual (http://www.clul.ul.pt/cordial-sam)
  - Cf. Carrilho 2010; Magro 2010

Methods in Dialectology XV, Groningen, August 11-15, 2014

*Funded by PEst-OE/LIN/UI0214/2013

1. VPA. Vila Praia de Âncora (Viana do Castelo); 2. CTL. Castro Laboreiro (Viana do Castelo); 3. PFT. Perafita (Vila Real); 4. AAL. Castelo de Vide, Porto da Espada, S. Salvador de Aramenha, Sapeira, Alpalhão, Nisa (Portalegre); 5. PAL. Porches, Alte (Faro); 6. CLC. Câmara de Lobos, Caniçal (Funchal); 7. PST. Camacha, Tanque (Funchal); 8. MST. Monsanto (Castelo Branco); 9. FLF. Fajãzinha (Horta); 14. FIG. Figueiró da Serra (Guarda); 15. ALV. Alvor (Faro); 16. SRP. Serpa (Beja); 17. LVR. Lavre (Évora); 18. ALC. Alcochete (Setúbal)

Syntagmatic labels
NP-SBJ   Noun Phrase (Subject)
NP-ACC   Noun Phrase (Accusative)
NP-DAT   Noun Phrase (Dative)
PP-SBJ   Prepositional Phrase (Dative)
VB       Infinitive Verb
VB-F     Inflected Infinitive Verb

Phrasal labels
IP-MAT   Matrix / independent clause
IP-INF   Infinitive clause

CorpusSearch2: Search functions
iDoms       x immediately dominates y
HasSister   x and y have the same mother
Precedes    x comes before y but x does not dominate y
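The three search functions are plain tree relations, which a small Python model can make concrete. This sketch is not CorpusSearch2 code: the `Node` class and the function names are hypothetical, "Precedes" follows the poster's wording literally (earlier in a pre-order walk and not dominating), and the bracketing of sentence (1) is an assumed structure that may differ from the actual CORDIAL-SIN annotation.

```python
class Node:
    """A labelled tree node; words are nodes without children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def preorder(self):
        yield self
        for child in self.children:
            yield from child.preorder()


def i_doms(x, y):
    """x immediately dominates y."""
    return y.parent is x


def dominates(x, y):
    """x is an ancestor of y."""
    node = y.parent
    while node is not None:
        if node is x:
            return True
        node = node.parent
    return False


def has_sister(x, y):
    """x and y are distinct nodes with the same mother."""
    return x is not y and x.parent is not None and x.parent is y.parent


def precedes(root, x, y):
    """x comes before y in the tree, but x does not dominate y."""
    order = [id(node) for node in root.preorder()]
    return order.index(id(x)) < order.index(id(y)) and not dominates(x, y)


# Assumed bracketing of (1): A mãe fez [os filhos comerem a sopa].
subj = Node("NP-SBJ", [Node("os"), Node("filhos")])
inf = Node("VB-F", [Node("comerem")])
ip_inf = Node("IP-INF", [subj, inf, Node("NP-ACC", [Node("a"), Node("sopa")])])
verb = Node("VB-D", [Node("fez")])
root = Node("IP-MAT", [Node("NP-SBJ", [Node("A"), Node("mãe")]), verb, ip_inf])

print(has_sister(ip_inf, verb), i_doms(ip_inf, subj), precedes(root, subj, inf))
```

On this toy tree the infinitival clause is a sister of the finite verb and immediately dominates both its subject and the inflected infinitive, which is the configuration a query built from `HasSister` and `iDoms` describes.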

I. query: (IP-INF HasSister VB-*) AND (IP-INF iDoms NP-SBJ) AND (IP-INF iDoms VB-F*)

no matches found!!!

II. query: (IP-INF HasSister VB-*) AND (IP-INF iDoms {1}NP-SBJ|PP-SBJ) AND ({1}NP-SBJ|PP-SBJ iDoms !\*pro\*) AND (IP-INF iDoms !VB-F*)

AND ({1}NP-SBJ|PP-SBJ Precedes VB)

AND (VB Precedes {1}NP-SBJ|PP-SBJ)

The more specific the query is, the more refined the results are!

References:

CARRILHO, Ernestina. 2010. Tools for dialect syntax: the case of CORDIAL-SIN (an annotated corpus of Portuguese dialects). In Gotzon AURREKOETXEA and Jose Luis ORMAETXEA (eds.), Tools for Linguistic Variation. Bilbao: Universidad del Pais Vasco. 57-70.
CARRILHO, E. & S. PEREIRA. 2010. "Causees in European Portuguese dialects: some observations on the properties and the position of the causee in causative constructions in CORDIAL-SIN". Presented at Wedisyn's First Workshop on Syntactic Variation, IKER, Bayonne.
GONÇALVES, Anabela. 1999. Predicados Complexos Verbais em Contextos de Infinitivo não Preposicionado do Português Europeu. Dissertação de Doutoramento. Universidade de Lisboa.
MAGRO, Catarina. 2010. When CORDIAL becomes friendly: endowing the CORDIAL-SIN corpus with a syntactic annotation layer. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010). 3705-3711.
Penn parsed corpora: http://www.ling.upenn.edu/histcorpora/
PEREIRA, Sandra. 2012. Protótipo de um Glossário dos Dialetos Portugueses com Informação Sintática. Dissertação de Doutoramento. Universidade de Lisboa.
RANDALL, B. 2005-2007. CorpusSearch2 (http://corpussearch.sourceforge.net).
Tycho Brahe Corpus: http://www.tycho.iel.unicamp.br/~tycho/corpus/en/

Problem:
- We only search for what we know (the standard constructions);
- Some dialectal constructions (cf. Pereira 2012) may not be searchable by these queries.

For dialectal data, new queries are needed:

III. query: ({1}PP HasSister {2}VB*) AND ({1}PP iDoms IP-INF) AND ({2}VB* iDoms mand*|deix*|faz*|faç*|fiz*|fez*)
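The last clause of query III selects the causative verbs by word form, using a list of wildcard stems. The Python sketch below reproduces only that form filter with the standard `fnmatch` module, to show what the stem list accepts. Lower-casing the word before matching is an assumption of this sketch, and the helper name `is_causative_form` is hypothetical; none of this describes how CorpusSearch2 itself evaluates the pattern.

```python
from fnmatch import fnmatchcase

# Stem patterns copied from query III.
PATTERNS = "mand*|deix*|faz*|faç*|fiz*|fez*".split("|")


def is_causative_form(word):
    """True if the lower-cased word matches one of the wildcard stems."""
    form = word.lower()
    return any(fnmatchcase(form, pattern) for pattern in PATTERNS)


# Verb forms taken from examples (1)-(10), plus one non-causative infinitive.
examples = ["fez", "Faz", "deixavam", "Deixa", "Deixe", "Mandávamos", "criarem"]
matches = [word for word in examples if is_causative_form(word)]
print(matches)

# Stem wildcards overgenerate: the noun 'fazenda' also matches 'faz*'.
print(is_causative_form("fazenda"))
```

Because a bare stem filter also accepts unrelated words, the structural clauses of the query (the `VB*` label and the sisterhood with the PP) are what keep the results restricted to verbs in the relevant configuration.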

• CORDIAL-SIN is a resource for studying Portuguese dialect syntax (cf. Carrilho 2010);
• POS tags and a rich syntactic annotation scheme allow several possibilities of search;
• Standard syntactic constructions (e.g. causative constructions) can be easily found by CorpusSearch2;
• Non-standard syntactic constructions can also be retrieved.