ACTIVE LEARNING FOR TEXT CLASSIFICATION
Ankit Bhutani Y9094

AUTOMATIC TEXT CLASSIFICATION: A FEW HOURS ONLY
MANUAL TEXT CLASSIFICATION: TAKES YEARS

ORGANIZING LARGE VOLUMES OF TEXT
Massive volume of online text available.
Organisation into categories to enable efficient search.
Finds use in many applications such as Data Mining, Automatic Query Answering, Learning User Interests, Making Suggestions, etc.
Learning approaches: unsupervised, supervised and semi-supervised.

Terms Used
Multinomial Naïve Bayes:
◦Documents in bag-of-words format
◦Independence assumption: words are treated as conditionally independent given the class
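The two points above can be made concrete with a minimal sketch of Multinomial Naïve Bayes in plain Python. The function names (`train_mnb`, `predict_mnb`), the toy documents and the Laplace smoothing constant `alpha` are assumptions made for illustration; they do not come from the slides.

```python
import math
from collections import Counter

def train_mnb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on tokenized documents (lists of words).

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    classes = sorted(set(labels))
    vocab = sorted({w for doc in docs for w in doc})
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)  # bag of words: only the counts matter
    log_prior = {c: math.log(class_counts[c] / len(docs)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total)
                      for w in vocab}
    return log_prior, log_lik

def predict_mnb(model, doc):
    """Return the most probable class for a tokenized document."""
    log_prior, log_lik = model
    scores = {}
    for c in log_prior:
        # Independence assumption: per-word log-probabilities simply add up.
        # Words never seen in training are skipped.
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in doc if w in log_lik[c])
    return max(scores, key=scores.get)

docs = [["oil", "price", "rise"], ["oil", "barrel"],
        ["match", "goal"], ["goal", "team", "win"]]
labels = ["econ", "econ", "sport", "sport"]
model = train_mnb(docs, labels)
print(predict_mnb(model, ["oil", "price"]))
```

Working in log space avoids numerical underflow when documents are long.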

Terms Used
Semi-Supervised Learning:
◦Makes use of labeled as well as unlabeled data to learn the parameters of the model.
Expectation Maximization:
◦Class of iterative algorithms for maximum likelihood estimation in problems with incomplete data
◦Alternates between two quantities: the parameters of the model and the document labels
◦E-step: provide soft labels to documents based on the estimated model parameters
◦M-step: re-estimate the model parameters based on the soft labels
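The alternation between soft labels and parameter re-estimation can be sketched as follows, using Multinomial Naïve Bayes as the underlying model. This is an illustrative sketch only: the helper names (`m_step`, `e_step`, `em`), the smoothing constant `alpha`, the fixed iteration count and the toy data are assumptions, not details taken from the slides.

```python
import math
from collections import defaultdict

def m_step(docs, soft_labels, classes, vocab, alpha=1.0):
    """Re-estimate log-priors and log word probabilities from soft labels."""
    prior = {c: alpha for c in classes}  # smoothed class mass (assumption)
    counts = {c: defaultdict(float) for c in classes}
    for doc, weights in zip(docs, soft_labels):
        for c in classes:
            prior[c] += weights[c]
            for w in doc:
                counts[c][w] += weights[c]  # fractional word counts
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / total)
                      for w in vocab}
    return log_prior, log_lik

def e_step(doc, model, classes):
    """Soft label: posterior P(class | doc) under the current parameters."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in doc
                                    if w in log_lik[c])
              for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def em(labeled, unlabeled, classes, iterations=10):
    """Semi-supervised EM: labeled documents keep their hard labels."""
    docs_l = [d for d, _ in labeled]
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    vocab = sorted({w for d in docs_l + unlabeled for w in d})
    model = m_step(docs_l, hard, classes, vocab)  # start from labeled data
    for _ in range(iterations):
        soft = [e_step(d, model, classes) for d in unlabeled]       # E-step
        model = m_step(docs_l + unlabeled, hard + soft, classes, vocab)  # M-step
    return model

classes = ["econ", "sport"]
labeled = [(["oil", "price"], "econ"), (["goal", "team"], "sport")]
unlabeled = [["oil", "barrel"], ["barrel", "crude"],
             ["team", "match"], ["match", "win"]]
model = em(labeled, unlabeled, classes)
posterior = e_step(["crude"], model, classes)
```

In this toy run the word "crude" never appears in a labeled document, yet it drifts toward the "econ" class because it co-occurs with "barrel", which in turn co-occurs with the labeled word "oil".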

Terms Used
Active Learning:
◦Form of supervised machine learning
◦Learning algorithm is able to interactively query the user
◦Each query has an associated cost.
◦Algorithm requests the label for the document that maximizes the gain in information about the model parameters
But how to choose which document to request a label for?
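The query loop described on this slide can be sketched as a pool-based procedure with a fixed budget of label requests. Everything named here (`active_learning_loop`, `train`, `select`, `oracle`, `budget`) is hypothetical and chosen for illustration; the selection rule is deliberately left as a parameter, since choosing it is exactly the open question above.

```python
def active_learning_loop(labeled, pool, oracle, train, select, budget):
    """Pool-based active learning with a fixed budget of label requests.

    train(labeled)      -> model fitted on (document, label) pairs
    select(model, pool) -> index of the pool document to query next
    oracle(doc)         -> true label; each call is one paid label request
    """
    labeled = list(labeled)  # copy so the caller's lists are not modified
    pool = list(pool)
    requests = 0
    while requests < budget and pool:
        model = train(labeled)
        doc = pool.pop(select(model, pool))  # move the chosen doc out of the pool
        labeled.append((doc, oracle(doc)))   # pay for one label
        requests += 1
    return train(labeled), labeled, requests

# Toy run with stand-in components: the "documents" are integers, the
# "model" is just the number of labeled examples, and the selector always
# takes the first pool item.
start = [(1, 1)]
pool = [10, 11, 12, 13]
model, labeled, used = active_learning_loop(
    start, pool,
    oracle=lambda d: d % 2,
    train=lambda data: len(data),
    select=lambda m, p: 0,
    budget=3,
)
```

Retraining after every single request keeps the sketch simple; with a large pool, querying in small batches between retraining steps would be a cheaper variant.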

Terms Used
Query by Committee:
◦Divide the training set into 4–5 sets.
◦A model trained on each set acts as a committee member and gives probability estimates.
◦Query the document with maximum disagreement, measured by the maximum average KL divergence between all pairs of members
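The disagreement measure above can be sketched as follows, assuming each committee member has already produced a class-probability vector for every document in the pool. The function names, the small constant `eps` guarding against division by zero, and the example probabilities are assumptions for illustration.

```python
import math
from itertools import permutations

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def disagreement(member_probs):
    """Average KL divergence over all ordered pairs of committee members.

    Ordered pairs are used because KL divergence is not symmetric.
    Requires at least two members.
    """
    pairs = list(permutations(member_probs, 2))
    return sum(kl(p, q) for p, q in pairs) / len(pairs)

def pick_query(pool_probs):
    """Index of the pool document on which the committee disagrees most."""
    scores = [disagreement(members) for members in pool_probs]
    return max(range(len(scores)), key=scores.__getitem__)

# One entry per pool document; each entry holds one probability vector
# per committee member (three members, two classes).
pool_probs = [
    [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],      # full agreement
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],      # strong disagreement
    [[0.6, 0.4], [0.5, 0.5], [0.55, 0.45]],    # mild disagreement
]
chosen = pick_query(pool_probs)
```

The document at the returned index is the one whose label would be requested next.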

Terms Used
Semi-Supervised Frequency Estimate (SFE):
◦Slight variation on basic EM: a different parameter re-estimation formula.

NOTABLE WORK: Semi-Supervised Learning
Nigam et al., 1998-99:
◦MNB + EM
◦100 labeled + 2500 unlabeled documents
◦80–85% accuracy
Nigam & McCallum, 2000:
◦MNB + EM + Active Learning
◦Total 1000 documents
◦Label requests: 50, accuracy: ~90%

NOTABLE WORK: Semi-Supervised Learning
LYRL, 2004:
◦Compared various semi-supervised learning techniques
◦Introduced Reuters Corpus as a new benchmark
Su, Shirabad and Matwin, 2011:
◦MNB + SFE

My work
MNB + SFE + Active Learning
◦Data set: Reuters Corpus from LYRL 2004, containing around 8 lakh (800,000) documents
◦Experiments on 10,000 documents, starting with: (a) 50 labeled documents + 100 requests; (b) 100 labeled documents + 50 requests

Results so far