CAT: an advanced environment for the manual annotation of text and corpora

Giovanni Moretti – FBK, Italy
Matteo Fuoli – Lund University, Sweden
Rachele Sprugnoli – FBK/Università di Trento, Italy

Outline

1.  Limitations of traditional corpus analysis software (i.e. concordancers)
    –  The case of evaluation (Hunston and Thompson, 2000; Thompson and Alba-Juez, 2014)
2.  Overview of the Content Annotation Tool – CAT
3.  Software demonstration

Background – Evaluation

“The expression of the speaker or writer’s attitude or stance towards, viewpoint on, or feelings about the entities or propositions that he or she is talking about” (Hunston and Thompson, 2000, p. 5)

Background – Challenges in the corpus-based analysis of evaluation

1.  Open-ended set of forms
2.  Multi-word expressions
3.  Role of context and co-text

Context/co-text – Polysemy

•  ExxonMobil is dedicated to minimizing adverse risks and impacts associated with our products. (EVALUATIVE)

•  This may seem strange in a column dedicated to that very subject, but I think it is excellent advice. (NON-EVALUATIVE)

Context/co-text – Evaluative polarity

•  Priority issues. Foster a diverse work environment that encourages employee growth. (POSITIVE)

•  BP operates throughout the world in locations, terrains and climates that are tremendously diverse and frequently challenging. (NEUTRAL/NEGATIVE)

Background – Challenges for the quantitative analysis of evaluation

1.  It is impossible to identify a definitive, finite list of forms that can be searched for using automatic corpus techniques
2.  Context needs to be taken into account

‘Top-down’ approach: focus on a restricted range of language forms with predictable evaluative meaning

‘Bottom-up’ approach: manual corpus annotation

The Content Annotation Tool – CAT

The Content Annotation Tool – CAT: Overview

•  A general-purpose, web-based tool for manual corpus annotation
•  User-friendly interface
•  Fully customizable annotation scheme
•  Allows the annotation of text spans of variable length, including discontinuous spans
•  Supports multiple annotation layers
•  Annotation data stored in stand-off XML format
   –  Easily manipulated and converted into tabular ‘case-by-variable’ format (see the sketch below)
•  Features a statistics module
   –  Frequency of annotated types and inter-coder agreement
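As a rough illustration of the stand-off-to-table conversion mentioned above, the following sketch parses a stand-off XML file into a case-by-variable table. The element and attribute names (token, Markable, "span") are assumptions for illustration only, since CAT's actual schema is not reproduced here.

```python
# Rough sketch of converting stand-off XML annotations into a
# 'case-by-variable' table: one row per markable, one column per attribute.
# The element/attribute names (<token>, <Markable>, "span") are assumptions.
import xml.etree.ElementTree as ET
import pandas as pd

def standoff_to_table(xml_path: str) -> pd.DataFrame:
    root = ET.parse(xml_path).getroot()
    # Token id -> surface form, so discontinuous spans can be reassembled.
    tokens = {t.get("id"): (t.text or "") for t in root.iter("token")}
    rows = []
    for m in root.iter("Markable"):
        span_ids = [i for i in m.get("span", "").split(",") if i]
        rows.append({
            "text": " ".join(tokens.get(i, "") for i in span_ids),
            **{k: v for k, v in m.attrib.items() if k != "span"},
        })
    return pd.DataFrame(rows)

# Example: df = standoff_to_table("annotations.xml")
#          df.to_csv("case_by_variable.csv", index=False)
```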

Software demo

The Content Annotation Tool – CAT: Main strengths

•  Ease of use and flexibility
•  Supports the annotation of discontinuous text spans
•  Multiple annotators can access the same project from different locations (see the agreement sketch below)
•  The annotation data are stored in stand-off XML format
   –  Flexible and easy to manipulate
   –  Easily converted into ‘case-by-variable’ tabular format
   –  Supports multiple annotation layers: the same tokens and texts can be annotated more than once
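Because several annotators can work on the same project, their exported tables can be compared directly. A minimal agreement sketch, assuming both coders' exports share a markable identifier and a polarity column (both names hypothetical):

```python
# Rough sketch of an inter-coder agreement check on two annotators'
# exported tables; "markable_id" and "polarity" are hypothetical columns.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

coder_a = pd.read_csv("annotator_a.csv")
coder_b = pd.read_csv("annotator_b.csv")

# Keep only the markables both coders annotated, aligned by identifier.
merged = coder_a.merge(coder_b, on="markable_id", suffixes=("_a", "_b"))
kappa = cohen_kappa_score(merged["polarity_a"], merged["polarity_b"])
print(f"Cohen's kappa (evaluative polarity): {kappa:.2f}")
```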

The Content Annotation Tool – CAT: Main strengths (continued)

•  Enables sophisticated statistical analyses based on manual corpus annotation
•  Enables new types of corpus-based analyses, e.g. quantifying functions instead of forms (see the sketch below)
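A minimal sketch of what quantifying functions instead of forms can look like on the exported table; the "semantics" and "company" column names are assumptions for illustration:

```python
# Rough sketch of quantifying annotated functions rather than word forms;
# the "semantics" and "company" columns are hypothetical.
import pandas as pd

df = pd.read_csv("case_by_variable.csv")

# How often each evaluative category occurs, regardless of its wording.
print(df["semantics"].value_counts())

# Distribution of categories per company, normalised by column.
print(pd.crosstab(df["semantics"], df["company"], normalize="columns").round(2))
```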

Evaluation – Variables of interest

•  What kind of evaluative meaning is being expressed?
•  Who is the stance-taker?
•  What/who is being evaluated?
•  Are evaluative expressions boosted/hedged?
•  What is the topic being discussed?
•  What is the discourse genre under analysis?

Fuoli and Glynn (2013) – Coding scheme

•  Part of speech
•  Evaluative semantics (coarse)
•  Evaluative semantics (fine)
•  Engagement
•  Graduation
•  Hypotheticality
•  Sentential negation
•  Target
•  Target person
•  Stance-taker
•  Subject person
•  Evaluative polarity
•  Topic
•  Company
•  Year
•  Period (before-after)

14 variables

Fuoli and Glynn (2013) – Statistical analysis

•  Univariate statistics – Chi-square test

•  Exploratory multivariate statistics – Correspondence analysis

•  Confirmatory multivariate statistics – Logistic regression
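A minimal sketch of the univariate and confirmatory steps, run on a case-by-variable table exported from CAT. The column names ("polarity", "company", "hypotheticality") are assumptions, and the correspondence analysis step is omitted because it is usually run with a dedicated package.

```python
# Rough sketch of the univariate and confirmatory steps; column names
# ("polarity", "company", "hypotheticality") are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency
import statsmodels.formula.api as smf

df = pd.read_csv("case_by_variable.csv")

# Univariate: chi-square test of independence between two annotated variables.
table = pd.crosstab(df["polarity"], df["company"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# Confirmatory: logistic regression predicting positive evaluative polarity.
df["positive"] = (df["polarity"] == "POSITIVE").astype(int)
fit = smf.logit("positive ~ C(company) + C(hypotheticality)", data=df).fit()
print(fit.summary())
```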

Multiple correspondence analysis – Evaluative polarity, target, engagement

Fuoli and Glynn (2013) – Conclusions

•  Evaluative semantics is not a significant factor
•  Stance-taker, hypotheticality and target are the strongest factors

Fuoli (2013) – Log-linear analysis of Appraisal in a specialized corpus

[Mosaic plot of the log-linear model: standardized residuals (from < −4 to > 4) for markable type (Affect, Engagement, Judgement) by company (BP, Chevron, Conoco, Exxon, Shell) and year (2008–2011)]
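A minimal sketch of a log-linear analysis along these lines, assuming the exported table has company, markable and year columns (names hypothetical): the cell counts are modelled with a Poisson GLM, and large residuals point to the same cells a mosaic plot highlights.

```python
# Rough sketch of a log-linear analysis of the company x markable x year
# contingency table via a Poisson GLM; column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("case_by_variable.csv")
counts = (df.groupby(["company", "markable", "year"])
            .size().reset_index(name="n"))

# Baseline independence model; interaction terms can be added and compared
# with likelihood-ratio tests to see which associations are needed.
fit = smf.glm("n ~ C(company) + C(markable) + C(year)",
              data=counts, family=sm.families.Poisson()).fit()

# Cells with large residuals are the ones a mosaic plot would highlight.
print(counts.assign(pearson_resid=fit.resid_pearson)
            .sort_values("pearson_resid"))
```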

Accessing CAT

•  The beta version can be freely accessed here: https://dh.fbk.eu/resources/cat-content-annotation-tool

Thank you for listening

Giovanni Moretti (FBK) moretti@fbk.eu

Rachele Sprugnoli (FBK) sprugnoli@fbk.eu

Matteo Fuoli (Lund University) matteo.fuoli@englund.lu.se