PASCAL
PASCAL CHALLENGE ON INFORMATION EXTRACTION
& MACHINE LEARNING
Designing Knowledge Management using Adaptive Information Extraction from Text
PASCAL Network of Excellence on Pattern Analysis, Statistical Modelling and Computational Learning
Call for participation:
Evaluating Machine Learning for Information Extraction
July 2004 - November 2004
The Dot.Kom European project and the Pascal Network of Excellence invite you to participate in the Challenge on Evaluation of Machine Learning for Information Extraction from Documents. The goal of the challenge is to assess the current state of Machine Learning (ML) algorithms for Information Extraction (IE), to identify future challenges, and to foster additional research in the field. Given a corpus of annotated documents, the participants will be expected to perform a number of tasks, each examining different aspects of the learning process.
Corpus: A standardised corpus of 1100 Workshop Calls for Papers (CFP) will be provided. 600 of these documents will be annotated with 12 tags that relate to pertinent information (names, locations, dates, etc.). Of the annotated documents, 400 will be provided to the participants as a training set; the remaining 200 will form the unseen test set used in the final evaluation. All the documents will be pre-processed to include tokenisation, part-of-speech and named-entity information.
Tasks
Full scenario: The only mandatory task for participants is learning to annotate implicit information: given the 400 training documents, learn the textual patterns necessary to extract the annotated information. Each participant provides results of a four-fold cross-validation experiment using the same document partitions for pre-competitive tests. A final test will be performed on the 200 unseen documents.
Active learning: Learning to select documents: the 400 training documents will be divided into fixed subsets of increasing size (e.g. 10, 20, 30, 50, 75, 100, 150, and 200). Using the subsets for training will show the effect of limited resources on the learning process. Secondly, given each subset, the participants can select the documents to add to reach the next size (i.e. 10 to 20, 20 to 30, etc.), thus showing the ability to select the most suitable set of documents to annotate.
Enriched scenario: The same procedure as the full scenario, except the participants will be able to use the unannotated part of the corpus (500 documents). This will show how the use of unsupervised or semi-supervised methods can improve the results of supervised approaches. An interesting variant of this task could concern the use of unlimited resources, e.g. the Web.
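The fixed subset schedule used for the active-learning task could be sketched as follows; this is an illustrative reconstruction, and the document IDs are invented:

```python
# Illustrative sketch (document IDs invented) of the fixed, nested training
# subsets described above: each size extends the previous one, so a system's
# learning curve is measured on a growing but consistent pool.

SUBSET_SIZES = [10, 20, 30, 50, 75, 100, 150, 200]

def nested_subsets(doc_ids, sizes=SUBSET_SIZES):
    """Return one training subset per size; later subsets contain earlier ones."""
    return [list(doc_ids[:size]) for size in sizes]

docs = [f"cfp_{i:03d}" for i in range(400)]   # hypothetical document IDs
subsets = nested_subsets(docs)
# every subset is contained in the next larger one
assert all(set(small) <= set(big) for small, big in zip(subsets, subsets[1:]))
```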
Participation: Participants from different fields such as machine learning, text mining, and natural language processing are welcome. Participation in the challenge is free. After registration, participants will receive the corpus of documents to train on and precise instructions on the tasks to be performed. At an established date, participants will be required to submit their systems' answers via a Web portal. An automatic scorer will compute the accuracy of extraction. A paper will have to be produced describing the system and the results obtained. Results of the challenge will be discussed in a dedicated workshop.
Timetable
5th July 2004: Formal definition of the tasks, annotated corpus and evaluation server
15th October 2004: Formal evaluation
November 2004: Presentation of the evaluation at a Pascal workshop
Organizers: Fabio Ciravegna, University of Sheffield, UK (coordinator); Mary Elaine Califf, Illinois State University, USA
Neil Ireson
Local Challenge Coordinator
Web Intelligent Group, Department of Computer Science, University of Sheffield
Organisers
• Sheffield – Fabio Ciravegna
• UCD Dublin – Nicholas Kushmerick
• ITC-IRST – Alberto Lavelli
• University of Illinois – Mary-Elaine Califf
• FairIsaac – Dayne Freitag
Website: http://tyne.shef.ac.uk/Pascal
Outline
• Challenge Goals
• Data
• Tasks
• Participants
• Results on Each Task
• Conclusion
Goal: Provide a testbed for comparative evaluation of ML-based IE
• Standardised data
  – Partitioning
  – Same set of features: corpus preprocessed using GATE; no features allowed other than the ones provided
• Explicit tasks
• Standard evaluation, provided independently by a server
• For future use
  – Available for further tests with the same or new systems
  – Possible to publish new corpora or tasks
Data (Workshop CFP)
[Timeline figure: documents spanning 1993 to 2005]
Training data: 400 Workshop CFP
Testing data: 200 Workshop CFP
[Figure build-up: the 400 training documents partitioned into four folds (Set0–Set3), each subdivided into ten numbered subsets (0–9)]
[Figure repeated with the unannotated enrichment data: Enrich Data 1 (250 Workshop CFP), Enrich Data 2 (250 Conference CFP), and the WWW]
Preprocessing
• GATE
  – Tokenisation
  – Part-of-speech
  – Named entities: Date, Location, Person, Number, Money
Annotation Exercise
• 4+ months
• Initial consultation
• 40 documents – 2 annotators
• Second consultation
• 100 documents – 4 annotators, to determine annotation disagreement
• Full annotation – 10 annotators
Annotators: Christopher Brewster, Sam Chapman, Fabio Ciravegna, Claudio Giuliano, Jose Iria, Ashred Khan, Vita Lanfranchi, Alberto Lavelli, Barry Norton
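Annotation disagreement between two annotators can be quantified as a pairwise F1 over their span sets; the following is a minimal sketch (the slide does not specify the measure used, and the spans below are invented):

```python
# Sketch of inter-annotator agreement as pairwise F1 over exact spans.
# Each span is (slot, start_offset, end_offset); all values are illustrative.

def pairwise_f1(spans_a, spans_b):
    """Treat one annotator as 'gold' and the other as 'response';
    F1 is symmetric, so the choice does not matter."""
    a, b = set(spans_a), set(spans_b)
    if not a or not b:
        return 0.0
    overlap = len(a & b)
    p, r = overlap / len(b), overlap / len(a)
    return 2 * p * r / (p + r) if p + r else 0.0

a1 = {("date", 10, 18), ("location", 40, 49)}
a2 = {("date", 10, 18), ("location", 41, 49)}
print(pairwise_f1(a1, a2))  # 0.5: only the date span matches exactly
```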
Annotation Slots

Slot                                     Training corpus      Test corpus
workshop name                             543   11.8%         245   10.8%
workshop acronym                          566   12.3%         243   10.7%
workshop homepage                         367    8.0%         215    9.5%
workshop location                         457   10.0%         224    9.9%
workshop date                             586   12.8%         326   14.3%
workshop paper submission date            590   12.9%         316   13.9%
workshop notification of acceptance date  391    8.5%         190    8.4%
workshop camera-ready copy date           355    7.7%         163    7.2%
conference name                           204    4.5%          90    4.0%
conference acronym                        420    9.2%         187    8.2%
conference homepage                       104    2.3%          75    3.3%
Total                                    4583  100.0%        2274  100.0%
Evaluation Tasks
• Task 1 – ML for IE: annotating implicit information
  – 4-fold cross-validation on the 400 training documents
  – Final test on the 200 unseen test documents
• Task 2a – Learning Curve
  – Effect of increasing amounts of training data on learning
• Task 2b – Active Learning: learning to select documents
  – Given seed documents, select the documents to add to the training set
• Task 3a – Enriched Data
  – Same as Task 1, but can use the 500 unannotated documents
• Task 3b – Enriched & WWW Data
  – Same as Task 1, but can use all available unannotated documents
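The shared four-fold protocol of Task 1 could be sketched as below; the round-robin fold assignment is an illustrative stand-in for the official fixed partition, which is not specified here:

```python
# Minimal sketch of a fixed four-fold partition over the 400 training
# documents. The challenge distributed one shared partition so results are
# comparable; the deterministic round-robin assignment here is an assumption.

def four_fold_splits(doc_ids):
    """Yield (train, test) pairs over a deterministic 4-way partition."""
    folds = [list(doc_ids[i::4]) for i in range(4)]   # fixed assignment
    for held_out in range(4):
        test = folds[held_out]
        train = [d for j in range(4) if j != held_out for d in folds[j]]
        yield train, test

for train, test in four_fold_splits(range(400)):
    assert len(train) == 300 and len(test) == 100
    assert not set(train) & set(test)   # folds never overlap
```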
Evaluation
• Precision / Recall / F1-measure
• MUC scorer
• Automatic evaluation server
• Exact matching
• Extract every slot occurrence
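The exact-match scoring described above could look roughly like this; it is a simplified sketch in the spirit of the MUC scorer, not the actual evaluation-server code, and the slot names and fills are invented:

```python
# Hedged sketch of exact-match slot scoring: a predicted fill counts only if
# both the slot type and the exact text match a gold annotation.

from collections import Counter

def score(gold, predicted):
    """gold / predicted: lists of (slot, text) pairs for one document."""
    gold_c, pred_c = Counter(gold), Counter(predicted)
    correct = sum((gold_c & pred_c).values())        # exact matches only
    precision = correct / sum(pred_c.values()) if predicted else 0.0
    recall = correct / sum(gold_c.values()) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("workshop name", "EMNLP 2004"), ("location", "Barcelona")]
pred = [("workshop name", "EMNLP 2004"), ("location", "Spain")]
p, r, f = score(gold, pred)   # each is 0.5: one of two fills matches exactly
```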
Participants

Participant (country/institution), ML approach, and number of system variants entered per task (1, 2a, 2b, 3a, 3b) for the 4-fold cross-validation and for the test corpus:
Amilcare (Sheffield, UK) LP2 2 2 1 1 1 1 1
Bechet (Avignon, France) HMM 2 1 2 2
Canisius (Netherlands) SVM, IBL 1 1
Finn (Dublin, Ireland) SVM 1 1
Hachey (Edinburgh, UK) MaxEnt, HMM 1 1
ITC-IRST (Italy) SVM 3 3 1
Kerloch (France) HMM 2 2 3 2
Sigletos (Greece) LP2, BWI, ? 1 3
Stanford (USA) CRF 1 1
TRex (Sheffield, UK) SVM 2
Yaoyong (Sheffield, UK) SVM 3 3 3 3 3 3
Total 15 8 4 0 0 20 10 5 1 1
Task 1
Information extraction with all the available data

Task 1: Test Corpus
[Scatter plot: recall vs precision (0–1) for Amilcare, Stanford, Yaoyong, ITC-IRST, Sigletos, Canisius, Trex, Bechet, Finn, Kerloch]
Task 1: Test Corpus (detail)
[Same scatter plot, zoomed to the 0.2–0.9 range]
Task 1: 4-Fold Cross-validation
[Scatter plot: recall vs precision (0.2–0.9) for Amilcare, Yaoyong, ITC-IRST, Sigletos, Canisius, Bechet, Finn, Kerloch]
Task 1: 4-Fold & Test Corpus
[Scatter plot: recall vs precision (0.2–0.9) comparing 4-fold and test-corpus results for Amilcare, Yaoyong, ITC-IRST, Sigletos, Canisius, Bechet, Finn, Kerloch]
Task 1: Slot F-measure
[Bar chart: mean and max F-measure per slot (0–1)]
Best Slot F-measures, Task 1: Test Corpus

Slot              Amilcare1  Yaoyong1  Stanford1  Yaoyong2  ITC-IRST2
workshop name       0.352      0.58      0.596      0.542     0.66
workshop acro       0.865      0.612     0.496      0.6       0.383
workshop date       0.694      0.731     0.752      0.69      0.589
workshop home       0.721      0.748     0.671      0.705     0.516
workshop loca       0.488      0.641     0.647      0.66      0.542
workshop pape       0.864      0.74      0.712      0.696     0.712
workshop noti       0.889      0.843     0.819      0.856     0.853
workshop came       0.87       0.75      0.784      0.747     0.783
conference name     0.551      0.503     0.493      0.477     0.481
conference acro     0.905      0.445     0.491      0.387     0.348
conference home     0.393      0.149     0.151      0.116     0.119
Slot Recall: All Participants
[Chart: per-slot recall (0–1) across participants for workshop name, acro, date, home, loca, pape, noti, came and conference name, acro, home]
Task 2a
Learning Curve
Task 2a: Learning Curve – F-measure
[Line chart: F-measure (0.2–0.8) vs fraction of training data (0.1–0.9) for Amilcare, Yaoyong1–3, ITC-IRST1, Bechet1–2, Kerloch2–3, Hachey, and the mean]
Task 2a: Learning Curve – Precision
[Line chart: precision (0–0.9) vs fraction of training data (0.1–0.9) for the same systems]
Task 2a: Learning Curve – Recall
[Line chart: recall (0–0.7) vs fraction of training data (0.1–0.9) for the same systems]
Task 2b
Active Learning
Active Learning procedure
[Diagram sequence: starting from 400 potential training documents and a fixed 200-document test set, each round selects 40 documents from the pool to annotate and add to the training set (Subset0: 40 documents; Subset0,1: 80 documents; and so on), trains an extractor on the accumulated subsets, and tests on the 200 test documents]
Task 2b: Active Learning
• Amilcare – maximum divergence from the expected number of tags
• Hachey – maximum divergence between two classifiers built on different feature sets
• Yaoyong (Gram-Schmidt) – maximum divergence between example subsets
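The divergence-based selection idea (closest to the two-classifier variant) could be sketched as below; this is an illustrative reconstruction, not any participant's actual code, and all names and scores are invented:

```python
# Sketch of divergence-based active learning: train two classifiers on
# different feature views and pick the unlabelled documents on which they
# disagree the most, as those are the most informative to annotate next.

def select_batch(pool, clf_a, clf_b, disagreement, batch_size=40):
    """Rank pool documents by classifier disagreement; return the top batch."""
    ranked = sorted(pool,
                    key=lambda doc: disagreement(clf_a(doc), clf_b(doc)),
                    reverse=True)
    return ranked[:batch_size]

# Toy usage: the "classifiers" output a score; disagreement is the absolute
# difference between the two scores.
pool = list(range(100))
clf_a = lambda d: d % 7
clf_b = lambda d: d % 5
batch = select_batch(pool, clf_a, clf_b, lambda a, b: abs(a - b), batch_size=5)
assert len(batch) == 5
```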
Task 2b: Active Learning – increase in F-measure over random selection
[Line chart: F-measure gain (-0.03 to 0.05) vs fraction of training data (0.1–0.9) for Amilcare, Yaoyong1–3, and Hachey]
Task 3
Semi-supervised learning
(insufficient participation)
Conclusions (Task 1)
• The top four systems use different algorithms:
  – Amilcare: rule induction
  – Yaoyong: SVM
  – Stanford: CRF
  – Hachey: HMM
Conclusions (Task 1: Test Corpus)
• The same algorithm (SVM) produced different results
[Scatter plot: recall vs precision (0.2–0.9) for the SVM systems Yaoyong, ITC-IRST, Canisius, Trex, Finn]
Conclusions (Task 1: 4-fold Corpus)
• The same algorithm (SVM) produced different results
[Scatter plot: recall vs precision (0.2–0.9) for Yaoyong, ITC-IRST, Canisius, Finn]
Conclusions (Task 1)
• Large variation in per-slot performance
• Good performance on:
  – "important" dates and workshop homepage
  – acronyms (for Amilcare)
• Poor performance on:
  – workshop name and location
  – conference name and homepage
Conclusions (Task 2 & Task 3)
• Task 2a: Learning Curve – systems' performance is largely as expected
• Task 2b: Active Learning – two approaches, Amilcare's and Hachey's, showed benefits
• Task 3: Enriched Data – insufficient participation to evaluate the use of enriched data
Future Work
• Performance differences:
  – Systems: what determines good/bad performance
  – Slots: different systems were better/worse at identifying different slots
• Combine approaches
• Active learning
• Enriched data – overcoming the need for annotated data
• Extensions:
  – Data: use different data sets and other features, including (HTML) structured data
  – Tasks: relation extraction
Why is Amilcare Good?
Contextual Rules
[Bar chart: precision, recall and F-measure (0–0.9) per slot without contextual rules]
Contextual Rules (with vs without context)
[Bar chart: precision, recall and F-measure per slot, comparing rules learned with and without context]
Rule Redundancy
[Scatter plot: number of rules (0–20,000) vs F-measure (0.3–0.9) per slot, with linear fit]