© 2010 IBM Corporation

Learning to Predict Readability using Diverse Linguistic Features

Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos, Chris Welty

Presented by: Young-Suk Lee

The University of Texas at Austin IBM T. J. Watson Research Center

Outline

Problem definition and motivations

Data

System and Features

Experimental Results

Readability

DARPA Machine Reading Program (MRP)

“Readability is defined as a subjective judgment of how easily a reader can extract the information the writer or the speaker intended to convey.”

Task: given a general document, assign a readability score (1 to 5)

Sample Passage: High Readability

Industrial agriculture has grown increasingly paradoxical, replacing natural processes with synthetic practices and treating farms as factories. Consequently, food has become a marketing entity rather than a necessity to sustain life. …

Sample Passage: Low Readability

The word of the prince of believers may Allah God him Talk of gold this at present Reflections on the word of the prince of believers may Allah pleased with him, Prince of Believers May Allah be pleased with him: …

Readability: Motivations

Remove less readable documents from web-search

Filter out less readable documents before extracting knowledge

Select reading materials

Contrast With Other Work

Predicting readability: conveying message
– vs. reading difficulty (grade 1 to 12)

Document sources: multiple genres
– vs. single domain, genre or reader group

Data

390 training documents

Each document:
– 8 expert ratings: [1, ..., 5]
– 6-10 “novice” ratings: [1, ..., 5]

Ratings differ by genre
– Nwire and wiki documents: high
– MT documents: low

Genre      #Docs   Expert Rating   Novice Rating
nwire      56      4.93            4.23
wiki       56      4.83            4.13
weblog     55      4.46            3.75
q-trans    56      4.47            3.83
news-grp   55      4.26            3.34
ccap       56      4.13            3.53
mt         56      2.38            1.92

Data

[Figure: Histogram of Novice Ratings. x-axis: Rating (1 to 5); y-axis: Count (0 to 250); one series per genre (nw, wk, wl, qt, ng, cc, mt), with the MT docs called out. Legend notes: ng: newsgroup; Speech: closed caption.]

System Overview

[Diagram: Training Docs → Preprocessing → features (LM score, Parser score, …) → Regression (WEKA); Test Doc → Sys. Rating]
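The regression stage of the diagram can be illustrated with a small sketch. The slides only say that WEKA regression was used, so the pure-Python ordinary least squares below is a stand-in rather than the authors' setup, and the function names `fit_linear_regression` and `predict_rating` are hypothetical. It assumes each document has already been reduced to a numeric feature vector.

```python
from typing import List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    A bias column of 1.0 is appended, so the last weight is the intercept.
    """
    rows = [list(x) + [1.0] for x in X]
    n = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict_rating(weights: Sequence[float], features: Sequence[float]) -> float:
    """Apply the fitted weights and clamp the result to the 1-to-5 rating scale."""
    raw = sum(w * f for w, f in zip(weights, features)) + weights[-1]
    return min(5.0, max(1.0, raw))
```

Clamping to the 1-to-5 range is a choice made for this sketch; the slides do not say whether system ratings were clipped.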

Syntactical Features

Using Sundance [Riloff & Phillips 04] and English Slot Grammar (ESG) parsers

– Ratio of sentences without verbs
– Avg. # clauses per sentence
– Avg. #NPs, #VPs, #PPs, #Phrases per sentence
– Failure rate of ESG parser
– ..
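As a rough illustration of how such document-level features could be aggregated from per-sentence parser output, here is a minimal sketch. The `SentenceParse` record is hypothetical and is not the output format of Sundance or ESG; it only assumes a parser reports counts and a failure flag for each sentence.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SentenceParse:
    """Hypothetical per-sentence parser summary (not a real Sundance/ESG format)."""
    num_verbs: int
    num_clauses: int
    num_nps: int
    parse_failed: bool


def syntactic_features(parses: List[SentenceParse]) -> Dict[str, float]:
    """Average the per-sentence counts into document-level features."""
    n = len(parses)
    if n == 0:
        raise ValueError("document has no sentences")
    return {
        "verbless_sentence_ratio": sum(p.num_verbs == 0 for p in parses) / n,
        "avg_clauses_per_sentence": sum(p.num_clauses for p in parses) / n,
        "avg_nps_per_sentence": sum(p.num_nps for p in parses) / n,
        "parse_failure_rate": sum(p.parse_failed for p in parses) / n,
    }
```

Counts for VPs, PPs and other phrase types would follow the same pattern as `avg_nps_per_sentence`.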

Language Model (LM) Features

Normalized document probability:
– by a 5-gram generic LM

Genre-specific LMs
– Data readily available for those genres
– Certain genres are strong predictors of readability
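To make the idea of a normalized document probability concrete, the sketch below scores a document by its average per-word log probability. Two assumptions are worth flagging: the slide does not state how the probability is normalized (length normalization is assumed here), and an add-one smoothed unigram model (the hypothetical `UnigramLM` class) stands in for the 5-gram LM to keep the example short.

```python
import math
from collections import Counter
from typing import Iterable, List


class UnigramLM:
    """Add-one smoothed unigram model; a stand-in for the slide's 5-gram LM."""

    def __init__(self, training_tokens: Iterable[str]) -> None:
        self.counts = Counter(training_tokens)
        self.total = sum(self.counts.values())
        # Reserve one extra vocabulary slot for unseen words.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, word: str) -> float:
        return math.log((self.counts[word] + 1) / (self.total + self.vocab_size))


def normalized_log_prob(lm: UnigramLM, tokens: List[str]) -> float:
    """Average per-word log probability, so long and short documents are comparable."""
    if not tokens:
        raise ValueError("empty document")
    return sum(lm.log_prob(w) for w in tokens) / len(tokens)
```

Under this kind of score, text full of words the model has rarely or never seen receives a lower value than fluent in-domain text, which is the signal the regression can exploit.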

Genre-based Language Model Features

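Perplexity of genre-specific LM (Mj): [equation not preserved in this transcript]

Genre posterior perplexity (relative probability compared to all G genres): [equation not preserved in this transcript]

[Equation callouts: Document, History words, Word]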
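Since the slide's equations are not available in this transcript, the sketch below uses common textbook formulations rather than the authors' exact definitions: perplexity as the exponential of the negative mean log probability of the words, and a genre posterior obtained by normalizing the per-genre document probabilities over all genres, assuming a uniform genre prior. The function names are hypothetical.

```python
import math
from typing import Dict, List


def perplexity(word_log_probs: List[float]) -> float:
    """exp of the negative mean natural-log probability of the words."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


def genre_posteriors(doc_log_probs: Dict[str, float]) -> Dict[str, float]:
    """P(genre | document) under a uniform genre prior.

    Input maps each genre to the document's log probability under that
    genre's LM. The log-sum-exp trick avoids underflow for long documents.
    """
    top = max(doc_log_probs.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in doc_log_probs.values()))
    return {g: math.exp(v - log_norm) for g, v in doc_log_probs.items()}
```

A document that one genre model explains far better than the others gets a posterior near 1 for that genre, which is one way a "relative probability compared to all G genres" can be expressed.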

Lexical Features

Fraction of known words using dictionary and gazetteer of names

Out-of-vocabulary (OOV) rates using genre-based corpora

Ratio of function words (“the”, “of” etc.)

Ratio of pronouns
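The four lexical features above are simple token-level ratios, which a short sketch can make concrete. The word lists here are tiny illustrative placeholders, and the `dictionary` and `genre_vocab` arguments are hypothetical stand-ins for the dictionary plus name gazetteer and for a vocabulary drawn from a genre corpus.

```python
from typing import Dict, List, Set

# Tiny illustrative word lists; real lists would be much larger.
FUNCTION_WORDS: Set[str] = {"the", "of", "a", "an", "in", "on", "to", "and", "is"}
PRONOUNS: Set[str] = {"i", "you", "he", "she", "it", "we", "they", "him", "her", "them"}


def lexical_features(
    tokens: List[str], dictionary: Set[str], genre_vocab: Set[str]
) -> Dict[str, float]:
    """Compute token-level ratios over a lowercased document."""
    if not tokens:
        raise ValueError("empty document")
    lowered = [t.lower() for t in tokens]
    n = len(lowered)
    return {
        "known_word_fraction": sum(t in dictionary for t in lowered) / n,
        "oov_rate": sum(t not in genre_vocab for t in lowered) / n,
        "function_word_ratio": sum(t in FUNCTION_WORDS for t in lowered) / n,
        "pronoun_ratio": sum(t in PRONOUNS for t in lowered) / n,
    }
```

With several genre corpora, `oov_rate` would be computed once per genre vocabulary, giving one feature per genre.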

Experiments: Evaluation Metric

Pearson correlation coefficient
– Mean expert judge rating as the gold standard

To compare with novice judges:
– A sampling distribution representing performance of novice judges was generated
– Distribution mean and upper critical value were computed

Correlation between system and mean expert ratings
– If above the upper critical value: system significantly (statistically) better than novice judges
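The comparison against novice judges can be sketched as follows. The slide does not describe how the sampling distribution was built, so this is only one plausible reading: each trial simulates a novice judge by drawing one novice rating per document at random, correlates that judge with the mean expert ratings, and the upper critical value is taken as a one-sided percentile of the resulting correlations. The names, the `trials` count and the `alpha` level are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def novice_distribution(
    expert_means: Sequence[float],
    novice_ratings: Sequence[Sequence[float]],
    trials: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean, upper critical value) of simulated novice-judge correlations."""
    rng = random.Random(seed)
    samples: List[float] = sorted(
        pearson([rng.choice(r) for r in novice_ratings], expert_means)
        for _ in range(trials)
    )
    # One-sided cutoff: the value at the (1 - alpha) quantile of the sorted samples.
    upper = samples[min(trials - 1, int((1 - alpha) * trials))]
    return sum(samples) / trials, upper
```

A system whose correlation with the mean expert ratings exceeds the returned upper value would then be judged significantly better than the simulated novice judges at the chosen `alpha`.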

Experiments: Methodology

Compared regression algorithms

Feature ablation experiments

Results: 13-fold cross-validation
– Balanced genre representation
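One simple way to obtain folds with balanced genre representation is to deal documents round-robin, one genre at a time. The slides do not say how the folds were actually constructed, so the hypothetical `genre_balanced_folds` below is only a sketch of the idea.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def genre_balanced_folds(docs: Sequence[Tuple[str, str]], k: int = 13) -> List[List[str]]:
    """Split (doc_id, genre) pairs into k folds with near-equal genre counts per fold."""
    by_genre: Dict[str, List[str]] = defaultdict(list)
    for doc_id, genre in docs:
        by_genre[genre].append(doc_id)
    folds: List[List[str]] = [[] for _ in range(k)]
    next_fold = 0
    # Deal documents round-robin, one genre at a time, continuing where
    # the previous genre stopped so fold sizes stay even overall.
    for genre in sorted(by_genre):
        for doc_id in by_genre[genre]:
            folds[next_fold % k].append(doc_id)
            next_fold += 1
    return folds
```

With 390 documents and 13 folds, this dealing scheme puts 30 documents in each fold, and any single genre's count differs by at most one between folds.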

Results: Regression Algorithms

Choice of regression algorithm is not critical.

[Figure: bar chart of Correlation (y-axis, 0 to 1) for Bagged Decision Tree, Linear Regression, SVM Regression, Gaussian Process Regression, and Decision Trees, with reference lines for the Distribution Mean and the Upper Critical Value.]

Results: Feature Sets

[Figure: bar chart of Correlation (y-axis, 0 to 1) for the feature sets All, Lexical, Syntactical, Lexical + Syntactical, and LM Based, with reference lines for the Distribution Mean and the Upper Critical Value.]

Each feature set contributes; the LM-based feature set is the most useful.

Results: Genre-based Feature Sets

[Figure: bar chart of Correlation (y-axis, 0 to 1) for the feature sets All, Genre-independent, and Genre-based, with reference lines for the Distribution Mean and the Upper Critical Value.]

Genre-independent features: better than novice mean.

Genre-specific features: significantly improve performance.

Results: Individual Feature Sets

[Figure: bar chart of Correlation (y-axis, 0 to 1) for All, Sundance, ESG, Perp., Post. Perp., and OOV rates, with two series (By itself; Ablated from all), reference lines for the Distribution Mean and the Upper Critical Value, and a callout marking the system using all features.]

Posterior perplexities: best feature set, but no single feature set is indispensable.

Official Evaluation

Conducted by SAIC on behalf of DARPA

Three teams participated

Evaluation task: Predict readability of 150 test documents using the 390 documents for training

Official Evaluation Results

Our system performed favorably and scored better than the upper critical value.

[Figure: bar chart of Correlation (y-axis, 0 to 1) for Our System, System B, and System C, with reference lines for the Novice mean and the Upper Critical Value. Callout: Sig. better than human at p<0.0001.]

Conclusions

Readability system
– Regression over syntactical, lexical and language model features

All features contribute, but LM features are most useful

System is significantly (statistically) better than novice human judges

Questions??

Thank You!