© 2010 IBM Corporation
Learning to Predict Readability using Diverse Linguistic Features
Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos, Chris Welty
Presented by: Young-Suk Lee
The University of Texas at Austin IBM T. J. Watson Research Center
Outline
Problem definition and motivations
Data
System and Features
Experimental Results
Readability
DARPA machine reading program (MRP)
“Readability is defined as a subjective judgment of how easily a reader can extract the information the writer or the speaker intended to convey.”
Task: given a general document, assign a readability score (1 to 5)
Sample Passage: High Readability
Industrial agriculture has grown increasingly paradoxical, replacing natural processes with synthetic practices and treating farms as factories. Consequently, food has become a marketing entity rather than a necessity to sustain life. …
Sample Passage: Low Readability
The word of the prince of believers may Allah God him Talk of gold this at present Reflections on the word of the prince of believers may Allah pleased with him, Prince of Believers May Allah be pleased with him: …
Readability: Motivations
Remove less readable documents from web search results
Filter out less readable documents before extracting knowledge
Select reading materials
Contrast With Other Work
Predicting readability: how well the message is conveyed, vs. reading difficulty (grades 1 to 12)
Document sources: multiple genres, vs. a single domain, genre, or reader group
Outline
Problem definition and motivations
Data
System and Features
Experimental Results
Data
390 training documents
Each document has:
- 8 expert ratings in [1, ..., 5]
- 6-10 "novice" ratings in [1, ..., 5]
Ratings differ by genre:
- Newswire and wiki documents: high
- MT documents: low

Genre      #Docs  Expert Rating  Novice Rating
nwire       56        4.93           4.23
wiki        56        4.83           4.13
weblog      55        4.46           3.75
q-trans     56        4.47           3.83
news-grp    55        4.26           3.34
ccap        56        4.13           3.53
mt          56        2.38           1.92
Data
[Figure: Histogram of novice ratings (1-5) by genre (nw, wk, wl, qt, ng, cc, mt). MT documents account for most of the low ratings. ng: newsgroup; cc: closed-caption speech.]
Outline
Problem definition and motivations
Data
System and Features
Experimental Results
System Overview
Training docs → Preprocessing → features (LM score, parser score, …) → Regression (WEKA)
A test doc is preprocessed into the same features; the trained regressor assigns it a system rating.
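The pipeline above can be sketched in miniature. This is a toy illustration, not the paper's system: the paper trained WEKA regressors over many real linguistic features, while this sketch fits ordinary least squares on one made-up feature per document.

```python
# Minimal sketch of the readability pipeline: one hypothetical feature
# (e.g. an LM score) per training document, regressed onto its rating.

def fit_linear(xs, ys):
    # Ordinary least squares for a single feature: rating ≈ a * x + b.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Toy training data: feature value and mean expert rating per document.
train_x = [0.9, 0.8, 0.7, 0.3, 0.2]
train_y = [5.0, 4.8, 4.5, 2.5, 2.0]

a, b = fit_linear(train_x, train_y)
test_rating = a * 0.6 + b  # system rating for a new test document
```

In the real system the regressor sees dozens of features at once; the single-feature version just makes the train-then-score flow concrete.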
Syntactical Features
Extracted using the Sundance [Riloff & Phillips 04] and English Slot Grammar (ESG) parsers:
- Ratio of sentences without verbs
- Avg. # clauses per sentence
- Avg. # NPs, # VPs, # PPs, # phrases per sentence
- Failure rate of the ESG parser
- …
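One of these features, the ratio of sentences without verbs, is easy to sketch. The sketch below assumes sentences arrive already POS-tagged (with Penn-style tags); the paper derived such features from full parser output, not from this toy tagging.

```python
# Sketch: "ratio of sentences without verbs" over pre-tagged sentences.
# Tags here are illustrative, not real parser output.

def no_verb_ratio(tagged_sentences):
    # A sentence "has a verb" if any token carries a VB* tag.
    no_verb = sum(
        1 for sent in tagged_sentences
        if not any(tag.startswith("VB") for _, tag in sent)
    )
    return no_verb / len(tagged_sentences)

sents = [
    [("The", "DT"), ("farm", "NN"), ("grew", "VBD")],  # has a verb
    [("Word", "NN"), ("of", "IN"), ("gold", "NN")],    # verbless fragment
]
ratio = no_verb_ratio(sents)  # → 0.5
```

Verbless fragments are common in the low-readability MT passages, which is why this ratio carries signal.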
Language Model (LM) Features
Normalized document probability under a 5-gram generic LM
Genre-specific LMs:
- Training data is readily available for these genres
- Genre is a strong predictor of readability
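Normalizing the document probability by length makes short and long documents comparable: the per-word average log-probability is the log of the geometric mean of the word probabilities. A sketch, using a made-up unigram model in place of the 5-gram LM:

```python
import math

# Sketch: length-normalized document log-probability under a toy unigram LM.
# The probabilities and OOV penalty below are invented for illustration.

lm = {"the": 0.07, "of": 0.04, "farm": 0.001, "gold": 0.0005}

def norm_log_prob(words, lm, oov_prob=1e-6):
    # Average per-word log-probability; unseen words get a small floor.
    total = sum(math.log(lm.get(w, oov_prob)) for w in words)
    return total / len(words)

doc = ["the", "farm", "of", "gold"]
score = norm_log_prob(doc, lm)  # higher (less negative) = more LM-typical text
```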
Genre-based Language Model Features
Perplexity of genre-specific LM M_j over a document D = w_1 … w_N, where h_i is the history (preceding words) of word w_i:
  PPL_j(D) = P_{M_j}(D)^(-1/N),  with  P_{M_j}(D) = ∏_{i=1}^{N} P_{M_j}(w_i | h_i)
Genre posterior (relative probability of genre j compared to all G genres, assuming uniform genre priors):
  P(M_j | D) = P(D | M_j) / Σ_{g=1}^{G} P(D | M_g)
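Both quantities can be computed from per-genre document log-probabilities. The sketch below uses toy unigram models in place of the genre-specific 5-gram LMs, and assumes uniform genre priors for the posterior; all model names and probabilities are invented.

```python
import math

# Sketch: perplexity under a genre LM, and the genre posterior over
# toy unigram "genre" models (stand-ins for genre-specific 5-gram LMs).

genres = {
    "nwire": {"the": 0.08, "report": 0.002, "said": 0.004},
    "weblog": {"the": 0.05, "report": 0.0005, "said": 0.001},
}

def log_prob(words, lm, oov=1e-6):
    return sum(math.log(lm.get(w, oov)) for w in words)

def perplexity(words, lm):
    # PPL = P(D)^(-1/N): lower means the LM finds the document more typical.
    return math.exp(-log_prob(words, lm) / len(words))

def genre_posterior(words):
    # P(M_j | D) with uniform genre priors, via a numerically stable softmax
    # over the per-genre log-probabilities.
    logs = {g: log_prob(words, lm) for g, lm in genres.items()}
    m = max(logs.values())
    exps = {g: math.exp(v - m) for g, v in logs.items()}
    z = sum(exps.values())
    return {g: v / z for g, v in exps.items()}

doc = ["the", "report", "said"]
ppl = perplexity(doc, genres["nwire"])
post = genre_posterior(doc)  # this doc looks far more "nwire" than "weblog"
```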
Lexical Features
Fraction of known words using dictionary and gazetteer of names
Out-of-vocabulary (OOV) rates using genre-based corpora
Ratio of function words (“the”, “of” etc.)
Ratio of pronouns
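Two of these lexical features are simple token-level ratios. A sketch with tiny illustrative word lists (the real system used a full dictionary, a gazetteer of names, and genre-based corpora):

```python
# Sketch: function-word ratio and OOV rate. The word lists below are
# miniature stand-ins for the real dictionary and function-word list.

FUNCTION_WORDS = {"the", "of", "and", "a", "to", "in"}
VOCABULARY = {"the", "of", "word", "prince", "believers", "gold"}

def function_word_ratio(words):
    return sum(w in FUNCTION_WORDS for w in words) / len(words)

def oov_rate(words):
    # Fraction of tokens not found in the dictionary/gazetteer.
    return sum(w not in VOCABULARY for w in words) / len(words)

doc = ["the", "word", "of", "xyzzq"]
fw = function_word_ratio(doc)  # → 0.5
oov = oov_rate(doc)            # → 0.25
```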
Experiments: Evaluation Metric
Pearson correlation coefficient, with the mean expert rating as the gold standard
To compare with novice judges:
- A sampling distribution representing the performance of novice judges was generated
- Its mean and upper critical value were computed
- If the correlation between system and mean expert ratings is above the upper critical value, the system is statistically significantly better than the novice judges
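The metric itself is standard Pearson correlation between the system's ratings and the mean expert ratings; the ratings below are invented for illustration.

```python
import math

# Sketch: Pearson correlation between system and mean expert ratings.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

system = [4.8, 4.5, 3.9, 2.1]  # toy system ratings
expert = [4.9, 4.4, 4.0, 2.4]  # toy mean expert ratings
r = pearson(system, expert)    # close to 1: ratings track each other
```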
Outline
Problem definition and motivations
Data
System and Features
Experimental Results
Experiments: Methodology
Compared regression algorithms
Feature ablation experiments
Results from 13-fold cross-validation with balanced genre representation in each fold
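Balanced genre representation means each fold contains roughly the same mix of genres. One simple way to build such folds (a sketch with made-up document IDs; the paper does not specify its fold-construction procedure) is to deal each genre's documents round-robin across the folds:

```python
from collections import defaultdict

# Sketch: genre-balanced cross-validation folds via round-robin dealing.

def balanced_folds(docs, n_folds):
    # docs: list of (doc_id, genre) pairs.
    by_genre = defaultdict(list)
    for doc_id, genre in docs:
        by_genre[genre].append(doc_id)
    folds = [[] for _ in range(n_folds)]
    for genre_docs in by_genre.values():
        # Dealing each genre round-robin keeps genres evenly spread.
        for i, doc_id in enumerate(genre_docs):
            folds[i % n_folds].append(doc_id)
    return folds

docs = [(f"{g}{i}", g) for g in ("nwire", "mt") for i in range(6)]
folds = balanced_folds(docs, 3)  # 3 folds of 4 docs, 2 per genre each
```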
Results: Regression Algorithms
Choice of regression algorithm is not critical.
[Figure: Correlation with mean expert ratings for Bagged Decision Trees, Linear Regression, SVM Regression, Gaussian Process Regression, and Decision Trees, compared against the novice distribution mean and upper critical value.]
Results: Feature Sets
Each feature set contributes; the LM-based feature set is most useful.
[Figure: Correlation for the All, Lexical, Syntactical, Lexical + Syntactical, and LM-based feature sets, compared against the novice distribution mean and upper critical value.]
Results: Genre-based Feature Sets
Genre-independent features alone beat the novice mean; genre-based features significantly improve performance.
[Figure: Correlation for the All, Genre-independent, and Genre-based feature sets, compared against the novice distribution mean and upper critical value.]
Results: Individual Feature Sets
Posterior perplexities are the best single feature set, but no single feature set is indispensable.
[Figure: Correlation for each feature set (Sundance, ESG, Perplexity, Posterior Perplexity, OOV rates), both by itself and ablated from the all-features system, alongside the full system and the novice distribution mean and upper critical value.]
Official Evaluation
Conducted by SAIC on behalf of DARPA
Three teams participated
Evaluation task: Predict readability of 150 test documents using the 390 documents for training
Official Evaluation Results
Our system performed favorably and scored better than the upper critical value: significantly better than the novice humans at p < 0.0001.
[Figure: Correlation for Our System, System B, and System C, compared against the novice mean and upper critical value.]
Conclusions
Readability system: regression over syntactical, lexical, and language model features
All features contribute, but LM features are most useful
The system is statistically significantly better than novice human judges
Questions?
Thank You!