ACM email corpus annotation analysis Andrew Rosenberg 2/26/2004.
ACM email corpus annotation analysis
Andrew Rosenberg, 2/26/2004
Overview
• Motivation
• Corpus Description
• Kappa Shortcomings
• Kappa Augmentation
• Classification of messages
• Corpus annotation analysis
• Next step: Sharpening method
• Summary
Motivation
• The ACM email corpus annotation raises two problems.
  – By allowing annotators to assign a message one or two labels, there is no clear way to calculate an annotation statistic.
    • An augmentation to the kappa statistic is proposed.
  – Interannotator reliability is low (K < 0.3).
    • Annotator reeducation and/or annotation material redesign are most likely necessary.
    • Available annotated data can be used, hypothetically, to improve category assignment.
Corpus Description
• 312 email messages exchanged among members of the Columbia chapter of the ACM.
• Annotated by 2 annotators with one or two of the following 10 labels:
  – question, answer, broadcast, attachment transmission, planning, planning scheduling, planning-meeting scheduling, action item, technical discussion, social chat
Kappa Shortcomings
• Before running ML procedures, we need confidence in assigning labels to the messages.
• In order to compute kappa (below), we need to count up the number of agreements.
• How do you determine agreement with an optional secondary label?
  – Ignore the secondary label?

K = (p(A) - p(E)) / (1 - p(E))
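As a minimal sketch, the standard two-annotator kappa above can be computed as follows for single-label annotations; the label sequences here are illustrative, not corpus data:

```python
from collections import Counter

def kappa(labels_a, labels_b):
    """K = (p(A) - p(E)) / (1 - p(E)) for two parallel label sequences."""
    n = len(labels_a)
    # p(A): observed agreement, the fraction of items labeled identically.
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p(E): chance agreement, from each annotator's relative label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_exp = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (p_obs - p_exp) / (1 - p_exp)

# Illustrative sequences: 3 of 4 labels agree.
print(kappa(["q", "a", "q", "b"], ["q", "a", "a", "b"]))  # ≈ 0.636
```

The question the slide raises is exactly where this breaks down: with an optional secondary label, the `a == b` agreement count is no longer well defined.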
Kappa Shortcomings (ctd.)
• Ignoring the secondary label isn't acceptable for two reasons.
  – It is inconsistent with the annotation guidelines.
  – It ignores partial agreements:
    • {a, ba} – singleton matches secondary
    • {ab, ca} – primary matches secondary
    • {ab, cb} – secondary matches secondary
    • {ab, ba} – secondary matches primary, and vice versa
• Note: The purpose is not to inflate the kappa value, but to accurately assess the data.
Kappa Augmentation
• When a labeler employs a secondary label, consider it as a single annotation divided between two categories.
• Select a value of p, where 0.5 ≤ p ≤ 1.0, based on how heavily to weight the secondary label:
  – Singleton annotations are assigned a score of 1.0
  – Primary: p
  – Secondary: 1 - p
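The weighting scheme above can be sketched as follows; the function name and the four-category set are illustrative:

```python
def label_weights(annotation, categories, p=0.6):
    """Split one annotation over categories per the proposed scheme:
    singleton -> 1.0, primary -> p, secondary -> 1 - p."""
    w = {c: 0.0 for c in categories}
    if len(annotation) == 1:
        w[annotation[0]] = 1.0
    else:
        primary, secondary = annotation
        w[primary] = p
        w[secondary] = 1.0 - p
    return w

# A 'b' primary with 'c' secondary at p = 0.6 contributes 0.6 to b, 0.4 to c.
row = label_weights(("b", "c"), "abcd")
```

Each message then contributes a weight vector per annotator rather than a single label, and these vectors are what the matrices on the next slide accumulate.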
Kappa Augmentation example

Annotator labels:

| Message | A   | B   |
|---------|-----|-----|
| 1       | a,b | b,d |
| 2       | b,a | a,b |
| 3       | b   | b   |
| 4       | c   | a,d |
| 5       | b,c | c   |

Annotation matrices with p = 0.6:

| Judge A | a   | b   | c   | d |
|---------|-----|-----|-----|---|
| 1       | 0.6 | 0.4 |     |   |
| 2       | 0.4 | 0.6 |     |   |
| 3       |     | 1.0 |     |   |
| 4       |     |     | 1.0 |   |
| 5       |     | 0.6 | 0.4 |   |
| Total   | 1.0 | 2.6 | 1.4 | 0 |

| Judge B | a   | b   | c   | d   |
|---------|-----|-----|-----|-----|
| 1       |     | 0.6 |     | 0.4 |
| 2       | 0.6 | 0.4 |     |     |
| 3       |     | 1.0 |     |     |
| 4       | 0.6 |     |     | 0.4 |
| 5       |     |     | 1.0 |     |
| Total   | 1.2 | 2.0 | 1.0 | 0.8 |

Each judge's totals sum to 5, the number of messages.
Kappa Augmentation example (ctd.)

Agreement matrix (each cell is the product of the two judges' weights for that message and category):

| Message | a    | b    | c   | d |
|---------|------|------|-----|---|
| 1       | 0    | 0.24 | 0   | 0 |
| 2       | 0.24 | 0.24 | 0   | 0 |
| 3       | 0    | 1.0  | 0   | 0 |
| 4       | 0    | 0    | 0   | 0 |
| 5       | 0    | 0    | 0.4 | 0 |
| Total   | 0.24 | 1.48 | 0.4 | 0 |

Grand total = 2.12, so p(A) = 2.12 / 5 = 0.424.
(Annotation matrices with p = 0.6 repeated from the previous slide for reference.)
Kappa Augmentation example (ctd.)
• To calculate p(E), use the relative frequencies of each annotator's label usage.

| P(Topic) | Judge A | Judge B | P(A) × P(B) |
|----------|---------|---------|-------------|
| a        | 0.2     | 0.24    | 0.048       |
| b        | 0.52    | 0.4     | 0.208       |
| c        | 0.28    | 0.2     | 0.056       |
| d        | 0       | 0.16    | 0           |

p(E) = 0.312

• Kappa is then computed as originally:

K' = (p(A) - p(E)) / (1 - p(E)) = (0.424 - 0.312) / (1 - 0.312) = 0.163
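Assuming per-message agreement is the category-wise product of the two judges' weight vectors (which matches the agreement matrix values in this example), a short script reproduces K' = 0.163 for the five-message example:

```python
CATEGORIES = "abcd"

def weights(annotation, p=0.6):
    # Singleton -> 1.0; otherwise primary -> p, secondary -> 1 - p.
    w = {c: 0.0 for c in CATEGORIES}
    if len(annotation) == 1:
        w[annotation[0]] = 1.0
    else:
        w[annotation[0]], w[annotation[1]] = p, 1.0 - p
    return w

def kappa_prime(ann_a, ann_b, p=0.6):
    n = len(ann_a)
    rows_a = [weights(a, p) for a in ann_a]
    rows_b = [weights(b, p) for b in ann_b]
    # p(A): mean over messages of per-category products of judge weights.
    p_obs = sum(ra[c] * rb[c]
                for ra, rb in zip(rows_a, rows_b) for c in CATEGORIES) / n
    # p(E): products of each judge's relative category mass.
    freq_a = {c: sum(r[c] for r in rows_a) / n for c in CATEGORIES}
    freq_b = {c: sum(r[c] for r in rows_b) / n for c in CATEGORIES}
    p_exp = sum(freq_a[c] * freq_b[c] for c in CATEGORIES)
    return (p_obs - p_exp) / (1 - p_exp)

# The five messages from the example slides (primary label listed first):
judge_a = [("a", "b"), ("b", "a"), ("b",), ("c",), ("b", "c")]
judge_b = [("b", "d"), ("a", "b"), ("b",), ("a", "d"), ("c",)]
print(round(kappa_prime(judge_a, judge_b), 3))  # 0.163
```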
Classification of messages
• This augmentation allows us to classify messages based on their individual kappa' values at different values of p.
  – Class 1: high kappa' at all values of p.
  – Class 2: low kappa' at all values of p.
  – Class 3: high kappa' at p = 1.0
  – Class 4: high kappa' at p = 0.5
• Note: mathematically kappa' needn't be monotonic w.r.t. p, but with 2 annotators it is.
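Since kappa' is monotonic in p with two annotators, checking the two endpoints suffices. A hypothetical classifier, assuming per-message kappa' scores at p = 1.0 and p = 0.5 and a threshold (0.5 here is an assumption, not from the slides) separating "high" from "low":

```python
def classify(kappa_at_p1, kappa_at_p05, threshold=0.5):
    """Assign one of the four classes from kappa' at p = 1.0 and p = 0.5.
    The 0.5 threshold separating 'high' from 'low' is an assumption."""
    high_at_1 = kappa_at_p1 >= threshold
    high_at_half = kappa_at_p05 >= threshold
    if high_at_1 and high_at_half:
        return 1  # high kappa' at all values of p
    if not high_at_1 and not high_at_half:
        return 2  # low kappa' at all values of p
    return 3 if high_at_1 else 4  # high only at p = 1.0, or only at p = 0.5
```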
Corpus Annotation Analysis
• Agreement is low at all values of p:
  – K'(p=1.0) = 0.299
  – K'(p=0.5) = 0.281
• Other views of the data will provide some insight into how to revise the annotation scheme.
  – Category distribution
  – Category co-occurrence
  – Category confusion
  – Class distribution
  – Category by class distribution
Corpus Annotation Analysis: Category Distribution

| Category                    | Total | gr | db  |
|-----------------------------|-------|----|-----|
| Question                    | 175   | 86 | 89  |
| Answer                      | 169   | 90 | 79  |
| Broadcast                   | 132   | 23 | 109 |
| Attachment Transmission     | 3     | 1  | 2   |
| Planning Meeting Scheduling | 63    | 32 | 31  |
| Planning Scheduling         | 27    | 22 | 5   |
| Planning                    | 92    | 76 | 16  |
| Action Item                 | 19    | 10 | 9   |
| Technical Discussion        | 31    | 22 | 9   |
| Social Chat                 | 36    | 29 | 7   |
Corpus Annotation Analysis: Category Co-occurrence

|                             | Q | A  | B  | A.T. | P.M.S. | P.S. | P. | A.I. | T.D. | S.C. |
|-----------------------------|---|----|----|------|--------|------|----|------|------|------|
| Question                    | x | 19 | 12 | 1    | 8      | 6    | 17 | 1    | 6    | 7    |
| Answer                      | x | x  | 2  | 0    | 15     | 3    | 4  | 1    | 7    | 2    |
| Broadcast                   | x | x  | x  | 0    | 2      | 2    | 8  | 0    | 0    | 1    |
| Attachment Transmission     | x | x  | x  | x    | 0      | 0    | 0  | 0    | 0    | 0    |
| Planning Meeting Scheduling | x | x  | x  | x    | x      | 2    | 1  | 0    | 0    | 0    |
| Planning Scheduling         | x | x  | x  | x    | x      | x    | 0  | 0    | 0    | 0    |
| Planning                    | x | x  | x  | x    | x      | x    | x  | 3    | 2    | 0    |
| Action Item                 | x | x  | x  | x    | x      | x    | x  | x    | 1    | 0    |
| Technical Discussion        | x | x  | x  | x    | x      | x    | x  | x    | x    | 1    |
| Social Chat                 | x | x  | x  | x    | x      | x    | x  | x    | x    | x    |
Corpus Annotation Analysis: Category Confusion

|                             | Q  | A  | B  | A.T. | P.M.S. | P.S. | P. | A.I. | T.D. | S.C. |
|-----------------------------|----|----|----|------|--------|------|----|------|------|------|
| Question                    | 62 | 36 | 21 | 0    | 18     | 13   | 47 | 7    | 13   | 10   |
| Answer                      | x  | 60 | 15 | 0    | 24     | 7    | 19 | 5    | 17   | 3    |
| Broadcast                   | x  | x  | 14 | 0    | 12     | 13   | 52 | 3    | 8    | 22   |
| Attachment Transmission     | x  | x  | x  | 0    | 0      | 0    | 1  | 0    | 0    | 1    |
| Planning Meeting Scheduling | x  | x  | x  | x    | 13     | 6    | 3  | 2    | 0    | 0    |
| Planning Scheduling         | x  | x  | x  | x    | x      | 2    | 4  | 1    | 1    | 0    |
| Planning                    | x  | x  | x  | x    | x      | x    | 7  | 5    | 5    | 0    |
| Action Item                 | x  | x  | x  | x    | x      | x    | x  | 1    | 2    | 1    |
| Technical Discussion        | x  | x  | x  | x    | x      | x    | x  | x    | 2    | 1    |
| Social Chat                 | x  | x  | x  | x    | x      | x    | x  | x    | x    | 4    |
Corpus Annotation Analysis: Class Distribution

| Class                   | Messages | Fraction |
|-------------------------|----------|----------|
| Constant High (Class 1) | 82       | 0.262821 |
| Constant Low (Class 2)  | 150      | 0.480769 |
| Low to High (Class 3)   | 40       | 0.128205 |
| High to Low (Class 4)   | 40       | 0.128205 |
| Total                   | 312      |          |
Corpus Annotation Analysis: Category by Class Distribution (1/2)

Class 1 (constant high):

| Category                    | Count | Fraction of category total |
|-----------------------------|-------|----------------------------|
| Question                    | 52    | 0.29714                    |
| Answer                      | 62    | 0.36686                    |
| Broadcast                   | 16    | 0.12121                    |
| Attachment Transmission     | 0     | 0                          |
| Planning Meeting Scheduling | 18    | 0.28571                    |
| Planning Scheduling         | 2     | 0.07407                    |
| Planning                    | 8     | 0.08695                    |
| Action Item                 | 0     | 0                          |
| Technical Discussion        | 2     | 0.06451                    |
| Social Chat                 | 4     | 0.11111                    |

Class 2 (constant low):

| Category                    | Count | Fraction of category total |
|-----------------------------|-------|----------------------------|
| Question                    | 37    | 0.21142                    |
| Answer                      | 42    | 0.24852                    |
| Broadcast                   | 92    | 0.69697                    |
| Attachment Transmission     | 3     | 1                          |
| Planning Meeting Scheduling | 24    | 0.38095                    |
| Planning Scheduling         | 13    | 0.48148                    |
| Planning                    | 60    | 0.65217                    |
| Action Item                 | 14    | 0.73684                    |
| Technical Discussion        | 17    | 0.54838                    |
| Social Chat                 | 22    | 0.61111                    |
Corpus Annotation Analysis: Category by Class Distribution (2/2)

Class 3 (low to high):

| Category                    | Count | Fraction of category total |
|-----------------------------|-------|----------------------------|
| Question                    | 46    | 0.26285                    |
| Answer                      | 40    | 0.23668                    |
| Broadcast                   | 6     | 0.04545                    |
| Attachment Transmission     | 0     | 0                          |
| Planning Meeting Scheduling | 4     | 0.06349                    |
| Planning Scheduling         | 5     | 0.18518                    |
| Planning                    | 5     | 0.05434                    |
| Action Item                 | 4     | 0.21052                    |
| Technical Discussion        | 11    | 0.35483                    |
| Social Chat                 | 6     | 0.16666                    |

Class 4 (high to low):

| Category                    | Count | Fraction of category total |
|-----------------------------|-------|----------------------------|
| Question                    | 40    | 0.22857                    |
| Answer                      | 25    | 0.14793                    |
| Broadcast                   | 18    | 0.13636                    |
| Attachment Transmission     | 0     | 0                          |
| Planning Meeting Scheduling | 17    | 0.26984                    |
| Planning Scheduling         | 7     | 0.25925                    |
| Planning                    | 19    | 0.20652                    |
| Action Item                 | 1     | 0.05263                    |
| Technical Discussion        | 1     | 0.03225                    |
| Social Chat                 | 4     | 0.11111                    |
Next step: Sharpening method
• In determining interannotator agreement with kappa and similar statistics, two available pieces of information are overlooked:
  – Some annotators are "better" than others.
  – Some messages are "easier to label" than others.
• By limiting the contribution of known poor annotators and difficult messages, we gain confidence in the final category assignment of each message.
• How do we rank annotators? Messages?
Sharpening Method (ctd.)
• Ranking annotators
  – Calculate kappa between each annotator and the rest of the group.
  – "Better" annotators have higher agreement with the group.
• Ranking messages
  – Compute the variance (or entropy, -p*log(p)) of the label vector summed over annotators.
  – Messages with high variance are more consistently annotated.
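The message ranking can be sketched with the entropy variant; the function name and the toy label vectors are illustrative (the annotator ranking, leave-one-out kappa against the group, would reuse a kappa routine and is not shown):

```python
import math

def message_entropy(label_vectors):
    """Entropy (-sum p*log p) of the summed label vector for one message;
    lower values mean the annotators' mass is concentrated on few labels,
    i.e. the message is more consistently labeled."""
    total = {}
    for vec in label_vectors:
        for cat, w in vec.items():
            total[cat] = total.get(cat, 0.0) + w
    mass = sum(total.values())
    probs = [w / mass for w in total.values() if w > 0]
    return -sum(q * math.log(q) for q in probs)

# Unanimous singleton labels score 0; split, disagreeing labels score higher.
unanimous = message_entropy([{"a": 1.0}, {"a": 1.0}])
split = message_entropy([{"a": 0.6, "b": 0.4}, {"b": 0.6, "d": 0.4}])
```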
Sharpening Method (ctd.)
• How do we use these ranks?
  – Weight the annotators based on their rank.
  – Recompute the message matrix with weighted annotator contributions.
  – Weight the messages based on their rank.
  – Recompute the kappa values with weighted message contributions.
  – Repeat these steps until the weights change beneath a threshold.
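The steps above can be sketched as a fixed-point loop. The two scoring functions are placeholders for the agreement-with-group and variance rankings; the tolerance and iteration cap are assumptions:

```python
def sharpen(annotations, score_annotators, score_messages,
            tol=1e-3, max_iter=50):
    """Alternate annotator and message reweighting until weights settle.
    `annotations[i][j]` is annotator i's label tuple for message j;
    the scoring callables are placeholders for the proposed rankings."""
    ann_w = [1.0] * len(annotations)       # one weight per annotator
    msg_w = [1.0] * len(annotations[0])    # one weight per message
    for _ in range(max_iter):
        new_ann = score_annotators(annotations, msg_w)
        new_msg = score_messages(annotations, new_ann)
        delta = max(
            max(abs(a - b) for a, b in zip(ann_w, new_ann)),
            max(abs(a - b) for a, b in zip(msg_w, new_msg)),
        )
        ann_w, msg_w = new_ann, new_msg
        if delta < tol:  # weights changed beneath the threshold
            break
    return ann_w, msg_w
```

Any scoring pair that stabilizes (e.g. agreement-weighted kappa and entropy-based message scores) terminates this loop; a non-contracting pair would run to `max_iter`.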
Summary
• The ACM email corpus annotation raises two problems.
  – By allowing annotators to assign a message one or two labels, there is no clear way to calculate an annotation statistic.
    • An augmentation to the kappa statistic is proposed.
  – Interannotator reliability is low (K < 0.3).
    • Annotator reeducation and/or annotation material redesign are most likely necessary.
    • Available annotated data can be used, hypothetically, to improve category assignment.