The CASS Technique for Evaluating the Performance of Argument Mining
Rory Duthie, John Lawrence, Katarzyna Budzynska, Chris Reed
Centre for Argument Technology, University of Dundee
IFiS, Polish Academy of Sciences
Outline
Motivation and Aim
• Problems when publishing evaluations and results
• CASS (Combined Argument Similarity Score)
Metric
• How CASS is calculated
Automation
• Deployment of CASS
Motivation
• Consistency for the Argument Mining community
• A metric that does not double-penalise mismatches
• Automated calculation
Motivation: Consistency for the community
Evaluation methods used in papers at the 2nd Workshop on Argument(ation) Mining:
• Inter-annotator agreement:
  - 3 papers: Cohen’s kappa
  - 3 papers: percentage agreement
  - 2 papers: precision and recall
  - 3 papers: other methods
• Automatic argument mining results:
  - 4 papers: accuracy
  - 5 papers: precision, recall and F-score
  - 1 paper: macro-averaged F-score
• Other metrics in computational linguistics: e.g. ROUGE in text summarization
Motivation: Metric (1/3)
(Kirschner et al., 2015) provides:
• A graph-based approach, APA and a weighted average
Problems:
• Segmentation differences are not handled
• Propositional content relations only
• Not all nodes in an analysis are considered (distance < 6)
• Relation direction is ignored
• Set-based metrics
Motivation: Metric (2/3)
CASS extends (Kirschner et al., 2015) to cover:
• Segmentation differences
• Propositional content relations and dialogical content relations, with:
  - confusion matrices
  - all nodes considered
  - support for differing segmentation
Motivation: Metric (3/3)
• Use CASS to combine scores
• CASS works with any metric
• Applicable to both annotator agreement and argument mining results
• Allows comparison of analyses in different annotation schemes
Motivation: Automatic Solution
• Manual vs. manual: Cohen’s kappa, Fleiss’ kappa, …
• Manual vs. automatic: precision, recall, F-score, accuracy, …
Metric: Segmentation (1/4)
Still, it is possible that, should war erupt in Iraq, American and British forces might fall foul of, for example, the provision of the ICC treaty outlawing attacks on military targets that cause "clearly excessive" harm to civilians.
Metric: Segmentation (2/4)
That is especially so if they do not learn lessons from recent wars and take corrective steps. The weapon most likely to produce such harm is the cluster bomb.
Metric: Segmentation (3/4)
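Segment sizes for two segmentations, S1 and S2, of the sentence below (both total 155 units):
S1: 20 | 18 | 29 | 39 | 31 | 18
S2: 12 | 31 | 18 | 10 | 28 | 17 | 12 | 27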
Still, it is possible that, should war erupt in Iraq, American and British forces might fall foul of, for example, the provision of the ICC treaty outlawing attacks on military targets that cause "clearly excessive" harm to civilians.
Metric: Segmentation (4/4)
• Pk (Beeferman et al., 1999)
• WindowDiff (Pevzner and Hearst, 2002)
• Segmentation Similarity (Fournier and Inkpen, 2012)
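To make the first two metrics concrete, here is a minimal Python sketch of Pk and WindowDiff (Segmentation Similarity is more involved and is not sketched). It assumes each segmentation is given as a list of segment sizes over a shared unit, and that the window size k defaults to half the mean reference segment size, a common convention rather than something stated on the slides. The helper names are hypothetical, and the sample data reuse the S1 and S2 segment sizes from the Segmentation (3/4) slide, whose unit is not stated.

```python
# Hypothetical helpers sketching Pk and WindowDiff; not code from the CASS paper.
from itertools import accumulate


def masses_to_boundaries(masses):
    """Turn segment sizes into a 0/1 vector over the gaps between units.

    b[i] == 1 means a boundary lies between unit i and unit i + 1.
    """
    b = [0] * (sum(masses) - 1)
    for pos in accumulate(masses[:-1]):
        b[pos - 1] = 1
    return b


def default_k(ref):
    # Assumed convention: half the mean segment size of the reference.
    return max(1, round(sum(ref) / len(ref) / 2))


def _prepare(ref, hyp, k):
    if sum(ref) != sum(hyp):
        raise ValueError("both segmentations must cover the same units")
    k = k or default_k(ref)
    return masses_to_boundaries(ref), masses_to_boundaries(hyp), k, sum(ref) - k


def pk(ref, hyp, k=None):
    """Pk: share of probes (unit pairs k apart) where ref and hyp disagree
    on whether the two units fall in the same segment."""
    br, bh, k, probes = _prepare(ref, hyp, k)
    errors = sum((sum(br[i:i + k]) == 0) != (sum(bh[i:i + k]) == 0)
                 for i in range(probes))
    return errors / probes


def window_diff(ref, hyp, k=None):
    """WindowDiff: share of windows whose boundary counts differ."""
    br, bh, k, probes = _prepare(ref, hyp, k)
    errors = sum(sum(br[i:i + k]) != sum(bh[i:i + k]) for i in range(probes))
    return errors / probes


# Segment sizes from the Segmentation (3/4) slide; unit unspecified.
S1 = [20, 18, 29, 39, 31, 18]
S2 = [12, 31, 18, 10, 28, 17, 12, 27]
print(pk(S1, S2), window_diff(S1, S2))
```

Both functions return error rates: 0 means the segmentations agree, and higher values mean more disagreement.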
Metric: Calculating Relations
• A guaranteed-matching formula is used for all propositions and locutions
• We use the Levenshtein distance between node texts
• Levenshtein distance and word positions are combined to give node matches
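The slides do not give the exact matching formula, so the following is only a sketch of the idea under stated assumptions: each node is an (id, text, start word) tuple, a pair's cost is the length-normalised Levenshtein distance plus a weight `alpha` times the normalised gap between start positions, and pairs are matched greedily, cheapest first. The weighting, the greedy strategy, and the names `match_nodes`, `alpha` and `doc_len` are all hypothetical.

```python
# Hypothetical sketch of node matching; the weighting and greedy strategy are
# assumptions, not the formula used by CASS.


def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def match_nodes(nodes1, nodes2, doc_len, alpha=0.5):
    """Greedy one-to-one matching of (id, text, start_word) nodes."""
    scored = []
    for id1, text1, start1 in nodes1:
        for id2, text2, start2 in nodes2:
            text_cost = levenshtein(text1, text2) / max(len(text1), len(text2), 1)
            pos_cost = abs(start1 - start2) / max(doc_len, 1)
            scored.append((text_cost + alpha * pos_cost, id1, id2))
    scored.sort()  # cheapest pairs first
    pairs, used1, used2 = {}, set(), set()
    for _, id1, id2 in scored:
        if id1 not in used1 and id2 not in used2:
            pairs[id1] = id2
            used1.add(id1)
            used2.add(id2)
    return pairs


ann1 = [(1, "war may erupt in Iraq", 0), (2, "cluster bombs cause harm", 10)]
ann2 = [("a", "cluster bombs cause harm", 10), ("b", "war might erupt in Iraq", 0)]
print(match_nodes(ann1, ann2, doc_len=20))  # → {2: 'a', 1: 'b'}
```

Mixing text similarity with position helps when two annotators segment the same words slightly differently, or when identical text occurs more than once.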
Metric: Propositional Relations (1/3)
[Figure: Annotation 1 and Annotation 2, two analyses of the same text drawn as graphs of numbered proposition nodes]
Metric: Propositional Relations (2/3)
[Figure: Annotation 1 and Annotation 2 (continued)]
Metric: Propositional Relations (3/3)
• Pair nodes and check the relation attached to each pair
• When segmentations differ, consider fine-grained and convergent arguments
• All node pairs are considered, giving a confusion matrix
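A small sketch of the confusion-matrix step, assuming every node already has a match in the other annotation (the fine-grained and convergent cases for differing segmentation are not modelled here). Relations are stored per annotation as a mapping from an ordered (source, target) pair to a label, so relation direction is kept; pairs without a relation count as "none". The function name and the label set ("support", "attack", "none") are hypothetical.

```python
# Hypothetical sketch: confusion counts over all ordered node pairs.
from collections import Counter
from itertools import permutations


def relation_confusion(nodes1, rel1, rel2, match):
    """Confusion counts over every ordered pair of annotation-1 nodes.

    rel1 / rel2 map (source, target) -> relation label in each annotation;
    pairs with no relation count as "none". `match` maps annotation-1 node
    ids to their matched annotation-2 ids.
    """
    counts = Counter()
    for a, b in permutations(nodes1, 2):
        label1 = rel1.get((a, b), "none")
        label2 = rel2.get((match.get(a), match.get(b)), "none")
        counts[(label1, label2)] += 1
    return counts


rel1 = {(1, 2): "support", (3, 2): "attack"}
rel2 = {("a", "b"): "support", ("b", "c"): "attack"}
match = {1: "a", 2: "b", 3: "c"}
for (l1, l2), n in sorted(relation_confusion([1, 2, 3], rel1, rel2, match).items()):
    print(f"{l1:>8} vs {l2:<8} {n}")
```

Here the two annotations agree on the support relation but place the attack between different node pairs, so that disagreement shows up twice in the matrix, once as ("none", "attack") and once as ("attack", "none").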
Metric: Dialogical Relations (1/3)
Metric: Dialogical Relations (2/3)
Metric: Dialogical Relations (3/3)
• Split the calculation into parts
• When segmentations differ, consider matched pairs
• All node pairs are considered, giving a confusion matrix
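The slides do not say how the dialogical calculation is divided, so this sketch makes two assumptions: the parts are decided by a caller-supplied function (here, locution-to-locution "transition" pairs versus locution-to-proposition "illocution" pairs), and only pairs whose endpoints were both matched are scored. It produces one confusion matrix per part. The names `dialogical_confusions`, `part_of` and `TYPES` are hypothetical. The labels are example illocution and transition names only.

```python
# Hypothetical sketch: one confusion matrix per dialogical part, matched pairs only.
from collections import Counter, defaultdict
from itertools import permutations


def dialogical_confusions(rel1, rel2, match, part_of):
    """Return {part: Counter((label1, label2) -> count)}.

    Only ordered pairs of matched annotation-1 nodes are scored.
    part_of(a, b) names the part a pair belongs to, or returns None to skip it.
    """
    parts = defaultdict(Counter)
    for a, b in permutations(match, 2):
        part = part_of(a, b)
        if part is None:
            continue
        label1 = rel1.get((a, b), "none")
        label2 = rel2.get((match[a], match[b]), "none")
        parts[part][(label1, label2)] += 1
    return dict(parts)


# Hypothetical node types for annotation 1: L = locution, I = proposition.
TYPES = {"L1": "L", "L2": "L", "I1": "I"}


def part_of(a, b):
    kinds = (TYPES[a], TYPES[b])
    if kinds == ("L", "L"):
        return "transition"   # locution -> locution
    if kinds == ("L", "I"):
        return "illocution"   # locution -> proposition
    return None               # other directions are not scored here


match = {"L1": "l1", "L2": "l2", "I1": "i1"}
rel1 = {("L1", "I1"): "Asserting", ("L1", "L2"): "Default Transition"}
rel2 = {("l1", "i1"): "Questioning", ("l1", "l2"): "Default Transition"}
print(dialogical_confusions(rel1, rel2, match, part_of))
```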
CASS technique
• The component scores are combined into the CASS score
• CASS can be applied to any consistent combination of scores
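As a rough illustration of combining the segmentation, propositional and dialogical scores, the sketch below assumes an unweighted arithmetic mean. The slides only require the combination to be consistent, so the averaging rule and the name `cass` are assumptions. One consequence of consistency is that Pk and WindowDiff are error rates (lower is better), so they would need converting, e.g. to 1 - Pk, before being averaged with agreement scores.

```python
# Hypothetical sketch: the unweighted mean is an assumed combination rule.
from statistics import fmean


def cass(segmentation, propositional, dialogical):
    """Combine three component scores that share one scale and direction
    (higher = more agreement) into a single value."""
    return fmean([segmentation, propositional, dialogical])


# Illustrative component scores only, not results from the paper.
print(cass(1 - 0.2, 0.6, 0.4))
```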
CASS: Evaluation
• We use CASS-Kappa, as kappa adjusts the score for chance agreement
• Kappa is not the only score that can be used with CASS
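Cohen's kappa corrects observed agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it from a confusion matrix stored as a Counter of (label in annotation 1, label in annotation 2) pairs. The name `cohen_kappa` and the handling of the degenerate case are our own choices.

```python
# Hypothetical helper name; standard Cohen's kappa formula.
from collections import Counter


def cohen_kappa(counts):
    """Cohen's kappa from a Counter mapping (label1, label2) -> count."""
    total = sum(counts.values())
    observed = sum(n for (l1, l2), n in counts.items() if l1 == l2) / total
    row, col = Counter(), Counter()
    for (l1, l2), n in counts.items():
        row[l1] += n
        col[l2] += n
    # Chance agreement: both annotators pick the same label independently.
    expected = sum(row[label] * col[label] for label in row) / total ** 2
    if expected == 1:
        return 1.0  # convention: a single shared label means perfect agreement
    return (observed - expected) / (1 - expected)


counts = Counter({("support", "support"): 1, ("none", "none"): 3,
                  ("none", "attack"): 1, ("attack", "none"): 1})
print(cohen_kappa(counts))
```

In this example, observed agreement is 4/6 and chance agreement is 18/36, so kappa is 1/3, well below the raw 67% agreement.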
CASS: Extension
• Any metric computed from a confusion matrix can be applied within CASS
  - e.g. balanced accuracy, informedness, …
• We provide a selected set, but no metric is ruled out
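For instance, balanced accuracy and informedness both derive from the four cells of a binary confusion matrix. The sketch below reduces a multi-label confusion Counter to one-vs-rest counts for a chosen label, treating annotation 1 as the reference (an assumption for illustration), and then applies the standard formulas. All function names are hypothetical.

```python
# Hypothetical helper names; standard balanced accuracy and informedness.
from collections import Counter


def one_vs_rest(counts, positive):
    """TP, FN, FP, TN for one label, treating annotation 1 as the reference."""
    tp = fn = fp = tn = 0
    for (ref, hyp), n in counts.items():
        if ref == positive and hyp == positive:
            tp += n
        elif ref == positive:
            fn += n
        elif hyp == positive:
            fp += n
        else:
            tn += n
    return tp, fn, fp, tn


def balanced_accuracy(tp, fn, fp, tn):
    # Mean of sensitivity (recall) and specificity.
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def informedness(tp, fn, fp, tn):
    # Youden's J: sensitivity + specificity - 1.
    return tp / (tp + fn) + tn / (tn + fp) - 1


counts = Counter({("support", "support"): 8, ("support", "none"): 2,
                  ("none", "support"): 1, ("none", "none"): 9})
cells = one_vs_rest(counts, "support")
print(cells, balanced_accuracy(*cells), informedness(*cells))
```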
Automation: AIF (Argument Interchange Format)
• AIF allows the calculation to be split into its component parts: segmentation, propositional relations and dialogical relations
• Other representation models can be translated into AIF
• This allows corpora in different representations to be compared
• However, the CASS technique itself is independent of AIF
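A sketch of the AIF split. It assumes the JSON layout of AIFdb exports as we understand it: a `nodes` list whose entries carry `nodeID`, `text` and `type`, and an `edges` list with `fromID` and `toID`. Locutions (L nodes) give the segmentation. RA/CA/MA scheme nodes between propositions (I nodes) give propositional relations, and YA/TA nodes between locutions and propositions give dialogical relations. Anchoring on scheme nodes is left out as a simplification, and `split_aif` is a hypothetical name.

```python
# Hypothetical sketch; assumes AIFdb-style JSON (nodes with nodeID/text/type,
# edges with fromID/toID). Anchoring on scheme nodes is deliberately ignored.
import json
from collections import defaultdict

# (scheme node types, endpoint node types) for each kind of relation.
PROPOSITIONAL = ({"RA", "CA", "MA"}, {"I"})  # inference, conflict, rephrase
DIALOGICAL = ({"YA", "TA"}, {"L", "I"})      # illocution, transition


def split_aif(aif_json):
    """Return (segments, propositional triples, dialogical triples)."""
    graph = json.loads(aif_json)
    nodes = {n["nodeID"]: n for n in graph["nodes"]}
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for edge in graph["edges"]:
        outgoing[edge["fromID"]].append(edge["toID"])
        incoming[edge["toID"]].append(edge["fromID"])

    def relations(spec):
        scheme_types, end_types = spec
        triples = []
        for nid, node in nodes.items():
            if node["type"] not in scheme_types:
                continue
            # A pattern X -> scheme node -> Y becomes the triple (X, Y, label).
            for src in incoming[nid]:
                for tgt in outgoing[nid]:
                    if nodes[src]["type"] in end_types and nodes[tgt]["type"] in end_types:
                        triples.append((src, tgt, f'{node["type"]}:{node["text"]}'))
        return triples

    segments = [n["text"] for n in nodes.values() if n["type"] == "L"]
    return segments, relations(PROPOSITIONAL), relations(DIALOGICAL)


sample = json.dumps({
    "nodes": [
        {"nodeID": "1", "text": "Bob : war is likely", "type": "L"},
        {"nodeID": "2", "text": "war is likely", "type": "I"},
        {"nodeID": "3", "text": "Asserting", "type": "YA"},
        {"nodeID": "4", "text": "Ann : it is not", "type": "L"},
        {"nodeID": "5", "text": "war is not likely", "type": "I"},
        {"nodeID": "6", "text": "Asserting", "type": "YA"},
        {"nodeID": "7", "text": "Default Conflict", "type": "CA"},
        {"nodeID": "8", "text": "Default Transition", "type": "TA"},
    ],
    "edges": [{"fromID": f, "toID": t} for f, t in [
        ("1", "3"), ("3", "2"), ("4", "6"), ("6", "5"),
        ("5", "7"), ("7", "2"), ("1", "8"), ("8", "4")]],
})
segs, prop, dial = split_aif(sample)
print(segs, prop, dial, sep="\n")
```

Each part can then be scored separately before the scores are combined. Node order in the JSON is not guaranteed to follow the text, so a real segmentation step would also need locution positions.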
Thank you.
Find out more at http://arg.tech
Come to COMMA 2016: Conference on Computational Models of Argument (Potsdam)
Investigate the datasets at http://aifdb.org
References
Christian Kirschner, Judith Eckle-Kohler, and Iryna Gurevych. 2015. Linking the thoughts: Analysis of argumentation structures in scientific publications. In Proceedings of the Second Workshop on Argumentation Mining, pages 1–11. Association for Computational Linguistics.
Doug Beeferman, Adam Berger, and John Lafferty. 1999. Statistical models for text segmentation. Machine Learning, 34(1–3):177–210.
Lev Pevzner and Marti A. Hearst. 2002. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28(1):19–36.
Chris Fournier and Diana Inkpen. 2012. Segmentation similarity and agreement. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 152–161. Association for Computational Linguistics.