ML for IR: Sentiment Analysis and Multi-label Categorization
Jay Aslam Cheng Li, Bingyu Wang, Virgil Pavlu
Northeastern University
An Empirical Study of Skip-gram Features and Regularization for Learning on Sentiment Analysis
Sentiment Analysis
An Amazon Product Review: Positive
An IMDB Movie Review: Negative
Sentiment Analysis with Binary Text Classification Pipeline
Review Document
→ Feature Vector <waste:1, like:2, disappoint:1, worse:1,...>
→ Classifier (Logistic Regression, SVM)
→ Prediction (Positive, Negative)
Bottleneck: the feature vector (text representation)
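As a concrete sketch of this pipeline (assuming scikit-learn is available; the four toy reviews and labels below are invented for illustration):

```python
# Minimal sketch of the review -> feature vector -> classifier pipeline.
# The tiny training set below is made up for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "what a waste of time and money, very disappointing",
    "I like this movie, great acting",
    "worse than I expected, do not buy",
    "like it a lot, would recommend",
]
labels = ["negative", "positive", "negative", "positive"]

pipeline = make_pipeline(
    CountVectorizer(),     # bag-of-words feature vector, e.g. <waste:1, like:2, ...>
    LogisticRegression(),  # binary classifier; a linear SVM would also fit here
)
pipeline.fit(reviews, labels)

print(pipeline.predict(["a complete waste, very disappointing"])[0])
```

In practice the classifier choice matters far less than what goes into the feature vector, which is why the representation is the bottleneck.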
Text Representation Issues in Sentiment Analysis
• Unigrams (bag of words) capture sentiment indicator terms:
  <I:1, like:1, this:1, movie:1> → Logistic Regression → Positive
  but cannot capture negations:
  <I:1, don't:1, like:1, this:1, movie:1> → Logistic Regression → Positive (wrong)
• Adding bigrams captures negation-polarity word pairs:
  <I:1, don't:1, like:1, I don't:1, don't like:1,...> → Negative
  and two-word sentiment phrases:
  <how:1, could:1, sit through:1, anyone sit:1,...> → Negative
  <waste time:2, waste:2, money:1,...> → Negative
• Adding tri-grams, quad-grams, ... captures sentiment phrases with many words
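The bigram fix above can be seen directly in a vectorizer's vocabulary (a sketch with an invented one-sentence corpus; the custom `token_pattern` is only there to keep "don't" as a single token):

```python
# Sketch: unigrams alone lose the negation "don't like";
# ngram_range=(1, 2) adds bigram features such as "don't like".
from sklearn.feature_extraction.text import CountVectorizer

doc = ["I don't like this movie"]

unigrams = CountVectorizer(ngram_range=(1, 1), token_pattern=r"[\w']+")
bigrams = CountVectorizer(ngram_range=(1, 2), token_pattern=r"[\w']+")

print(sorted(unigrams.fit(doc).vocabulary_))
# unigrams treat "don't" and "like" as unrelated terms
print(sorted(bigrams.fit(doc).vocabulary_))
# bigrams include "don't like", a negation-polarity word pair
```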
Difficulty with High Order n-grams
• Many variations of the same phrase increase the dimensionality
• Rare cases (e.g., a phrase occurring 676 times in IMDB, with variants occurring only 6 and 4 times) leave insufficient data for parameter estimation
Skip-grams
• n-gram templates matched loosely
• Looseness is parameterized by the slop, the number of additional words allowed in between
• An n-gram is a skip-gram with slop 0

Skip-gram Examples
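One way to enumerate skip-grams as just defined, in plain Python (the `skip_grams` helper is my own illustrative sketch, not from the authors' code):

```python
from itertools import combinations

def skip_grams(tokens, n, slop):
    """All n-word ordered subsequences whose matched span contains
    at most `slop` extra words. With slop = 0 this reduces to n-grams."""
    grams = set()
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n + slop]
        # anchor the first word, choose the remaining n-1 within the window
        for idx in combinations(range(1, len(window)), n - 1):
            gram = (window[0],) + tuple(window[i] for i in idx)
            grams.add(" ".join(gram))
    return grams

tokens = "could not sit through it".split()
print(skip_grams(tokens, 2, 0))  # ordinary bigrams
print(skip_grams(tokens, 2, 1))  # also pairs with one word skipped, e.g. "not through"
```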
Advantages of Skip-grams
• Group infrequent n-grams into a frequent skip-gram
• Allow n-grams to borrow strength from each other
• Easier learning
• Better generalization
Difficulties with Skip-grams
• Huge number of skip-grams
• Many are non-informative or noisy: a single skip-gram with slop 2 can match phrases with very different meanings
Existing Use of Skip-grams in Sentiment Analysis
• Ask human assessors to pick informative skip-grams
  - limited by available domain knowledge
  - expensive
• Build dense word vectors on top of skip-grams
  - information loss
  - less interpretable
Goal of this Study
• Test whether skip-grams are helpful when used directly as features in sentiment analysis
• Test different automatic regularization/feature selection strategies
• Compare against n-grams and word vectors
Skip-gram Extraction
• Consider skip-grams with n <= 5 and slop <= 2 (up to 5-grams with up to 2 additional words in between)
• Discard skip-grams with very low frequencies (<= 5)

max n  max slop  # skip-grams on IMDB
1      0         2x10^4
2      0         1x10^5
3      0         2x10^5
5      0         4x10^5
2      1         3x10^5
3      1         9x10^5
5      1         1x10^6
2      2         6x10^5
3      2         2x10^6
5      2         3x10^6
L1 vs L2 regularization
Skip-gram features: huge number, correlated
• L1: shrinks weights; selects a subset of features, keeping only one out of several correlated features → compact model
• L2: shrinks weights; uses all features, spreading weight among correlated features → robust model
• L1+L2: shrinks weights; selects a subset of features; spreads weight among correlated features → compact and robust model
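These three regularizers map onto scikit-learn's logistic regression penalties (a sketch on synthetic data; the solver, sample sizes, and constants are my assumptions, not the paper's settings):

```python
# Sketch: the three regularizers as scikit-learn logistic regressions.
# The saga solver supports l1, l2, and elasticnet penalties.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

l2 = LogisticRegression(penalty="l2", solver="saga", max_iter=5000)
l1 = LogisticRegression(penalty="l1", solver="saga", max_iter=5000)
mix = LogisticRegression(penalty="elasticnet", l1_ratio=0.5,
                         solver="saga", max_iter=5000)

for name, model in [("L2", l2), ("L1", l1), ("L1+L2", mix)]:
    model.fit(X, y)
    nonzero = (model.coef_ != 0).sum()
    print(f"{name}: {nonzero} nonzero weights")
```

On data like this, L1 zeroes out most of the uninformative features while L2 keeps every weight nonzero, mirroring the compact-vs-robust trade-off above.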
Learning and Regularization
• L2-regularized linear SVM
• L1-regularized linear SVM
• L2-regularized Logistic Regression
• L1-regularized Logistic Regression
• L1+L2-regularized Logistic Regression
Datasets

dataset               positive                          negative
IMDB                  25,000 reviews with ratings 7-10  25,000 reviews with ratings 1-4
Amazon Baby Product   136,461 reviews with ratings 4-5  32,950 reviews with ratings 1-2
Amazon Phone Product  47,970 reviews with ratings 4-5   22,241 reviews with ratings 1-2

Binary classification with neutral reviews ignored
Classification Accuracy with Skip-gram Features
[Figure: accuracy curves for L2 LR, L1 LR, and L1+L2 LR]
• Blue line: moving from unigrams to bigrams gives substantial improvement
• Blue line: using high-order n-grams gives marginal improvement
• Green and red lines: increasing slop from 0 to 1 and 2 gives further improvement
• Max # features selected: L2: 10^6, L1: 10^4, L1+L2: 10^5
# Features Used vs Accuracy
[Figure: accuracy as a function of the number of features used]
Observations on L1 vs L2
• L2 achieves better overall accuracy
  - large training sets facilitate parameter estimation
  - effective handling of correlated features
• L1 produces much smaller models
• L1+L2 is a good compromise
Skip-gram Feature Contribution
[Figure: feature-type breakdown for all features, selected features, and weighted features]
• Comparing left with middle: the fraction of unigrams increases and the fraction of slop-2 trigrams decreases; many slop-2 trigrams are eliminated by L1.
• Right: standard n-grams (slop 0) contribute only 20% of the total weight; the remaining 80% comes from skip-grams with non-zero slop.
Comparison with Word Vectors

dataset        skip-gram  word vector
AMAZON BABY    96.85      88.84
AMAZON PHONE   92.58      85.38
IMDB           91.26      92.58 / 85.0

• Word vectors work extremely well on the given IMDB test set (92.58%) but poorly on random test sets (85%).
Other Results on IMDB
• Among the methods that only use labeled data, skip-grams achieved the highest accuracy
Conclusion
• Skip-grams group similar n-grams together, facilitating learning and generalization
• Using skip-grams achieves good sentiment analysis performance
• L1+L2 regularization reduces the number of features significantly while maintaining good accuracy
• Our code is available at: https://github.com/cheng-li/pyramid
Conditional Bernoulli Mixtures for Multi-label Classification
Task: Multi-label Classification
• binary classification: 1 out of 2
• multi-class classification: 1 out of many
• multi-label classification: many out of many
Multi-label Classification: Example
News Article Categorization
Internet ✓, crime ✗, NFL ✓, government ✗, Asia ✗, sports ✓, politics ✗, sports business ✓, Twitter ✓
Multi-label Classification: Example
Image Tagging
airport ✗, animal ✗, clouds ✓, book ✗, lake ✓, sunset ✓, sky ✓, cars ✗, water ✓, reflection ✓
Multi-label Classification: Mathematical Formulation
h : x → y = [1, 0, 0, 1, 0, ..., 1]  (a binary vector of length L)
• L: # candidate labels
• x: instance
• y: label subset, written as a binary vector of length L; y_ℓ = 1 if label ℓ occurs
Naive Approach: Predict Each Label Independently
Binary Relevance: not always effective
• water: easy to predict directly
• reflection: hard to predict directly (based on the given feature representation)
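Binary Relevance is simply one independent binary classifier per label; a sketch with scikit-learn's OneVsRestClassifier and invented toy data (the feature/label construction below is mine, for illustration):

```python
# Sketch of Binary Relevance: one independent classifier per label.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
# Toy labels: "water" depends directly on feature 0;
# "reflection" mostly co-occurs with water but is noisier on its own.
Y = np.column_stack([
    (X[:, 0] > 0).astype(int),
    ((X[:, 0] > 0) & (rng.random(100) < 0.8)).astype(int),
])

br = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(br.predict(X[:3]).shape)  # one 0/1 prediction per label
```

Because each label is fit separately, nothing in this model lets the easy "water" label inform the harder "reflection" label, which is exactly the weakness the next slide addresses.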
Better Solution: Exploit Label Dependencies
Let easy labels help difficult labels
• water: easy to predict directly
• reflection: often co-occurs with water
How to Model Label Dependencies?
Existing approaches:
• Power-Set: treat each label subset as a class and apply a multi-class classifier. Up to 2^L classes ⇒ poor scalability; cannot predict unseen subsets
• Conditional Random Field: manually specify label dependencies with a graphical model ⇒ only models the specified (e.g., all pair-wise) dependencies
• Classifier Chain: h(x, y_1, y_2, ..., y_{ℓ-1}) → y_ℓ ⇒ hard to predict the jointly most probable subset
Proposed Model: Conditional Bernoulli Mixtures
Idea: approximate p(y|x) by a Conditional Bernoulli Mixture (CBM) with fully factorized mixture components
• Step 1: write p(y) as a mixture
  Mixture: p(y) = Σ_{k=1}^{K} π_k p(y; θ_k)
• Step 2: factorize the component density
  Bernoulli Mixture: p(y) = Σ_{k=1}^{K} π_k Π_{ℓ=1}^{L} b(y_ℓ; θ_{kℓ})
• Step 3: condition on x
  CBM: p(y|x) = Σ_{k=1}^{K} π(z = k | x; α) Π_{ℓ=1}^{L} b(y_ℓ | x; β_{kℓ})
Proposed Model: Conditional Bernoulli Mixtures
CBM: p(y|x) = Σ_{k=1}^{K} π(z = k | x; α) Π_{ℓ=1}^{L} b(y_ℓ | x; β_{kℓ})
• π(z = k | x; α): probability of assigning x to component k; instantiated with a multi-class classifier, e.g., multinomial logistic regression with weight α
• b(y_ℓ | x; β_{kℓ}): probability of x having label ℓ in component k; instantiated with a binary classifier, e.g., binary logistic regression with weight β_{kℓ}
Prediction: argmax_y p(y|x)
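Given the gating probabilities π and the per-component Bernoulli probabilities b for a fixed x, the CBM joint and marginals are mechanical to evaluate; a numpy sketch with invented numbers (not the values from the slides):

```python
# Sketch: evaluating CBM  p(y|x) = sum_k pi_k * prod_l b_kl^y_l (1-b_kl)^(1-y_l)
# for one fixed input x. pi and B are illustration values only.
import numpy as np

pi = np.array([0.5, 0.3, 0.2])  # pi(z=k|x): gating over K=3 components
B = np.array([                  # b(y_l=1|x) per component, for L=2 labels
    [0.9, 0.8],                 # component 1: both labels likely together
    [0.7, 0.1],
    [0.1, 0.1],
])

def p_y_given_x(y):
    """Joint probability of label vector y under the mixture."""
    comp = np.prod(np.where(y, B, 1 - B), axis=1)  # factorized joint per component
    return float(pi @ comp)

# Marginal of each label = pi-weighted average of the component Bernoullis
print(pi @ B)                         # per-label marginals
print(p_y_given_x(np.array([1, 1])))  # joint probability of both labels occurring
```

Even though each component factorizes over labels, the mixture as a whole does not, which is how the correlation between the two labels survives.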
Proposed Model: Conditional Bernoulli Mixtures
• Property 1: automatically captures label dependencies
• Property 2: a flexible reduction method
• Property 3: easily adjust the complexity by changing the number of components K
• Property 4: simple training with EM
• Property 5: fast prediction by dynamic programming
Proposed Model: Conditional Bernoulli Mixtures
Property 1: automatically capture label dependencies
p(y|x) ≠ Π_{ℓ=1}^{L} p(y_ℓ|x)
Analogy: a Gaussian mixture with fully factorized components can represent a complex joint distribution
Proposed Model: Conditional Bernoulli Mixtures
Property 1: capture label dependencies (illustration)
• p(y|x) estimation provided by CBM
• showing only the top 4 components; row = component; bar = individual label probability; π = mixture coefficient
![Page 64: ML for IR: Sentiment Analysis and Multi-label Categorization · dataset positive negative IMDB 25,000 reviews with ratings 7-10 25,000 reviews with ratings 1-4 Amazon Baby Product](https://reader034.fdocuments.net/reader034/viewer/2022050410/5f86b119503701042a5dfd8e/html5/thumbnails/64.jpg)
Proposed Model: Conditional Bernoulli Mixtures
Property 1: capture label dependencies – illustration
- marginal probability = average of the bars weighted by π
- p(water|x) = 0.69, p(lake|x) = 0.56, p(sunset|x) = 0.66
- p(reflection|x) = 0.32 ⇒ reflection is missed by independent prediction ✗
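The π-weighted averaging above can be reproduced numerically. The component probabilities below are invented for illustration only; they are chosen so that the weighted averages match the marginals quoted on the slide:

```python
import numpy as np

# Hypothetical mixture coefficients and per-component label probabilities.
# Columns: water, lake, sunset, reflection. The numbers are made up,
# picked so the pi-weighted averages reproduce the slide's marginals.
pi = np.array([0.5, 0.3, 0.2])
bern = np.array([[0.9, 0.8, 0.9, 0.5],
                 [0.6, 0.4, 0.5, 0.2],
                 [0.3, 0.2, 0.3, 0.05]])

# Marginal p(y_l = 1 | x) = sum_k pi_k * b(y_l = 1 | x; beta_l^k)
marginals = pi @ bern  # [0.69, 0.56, 0.66, 0.32]
```

Note how reflection has probability 0.5 inside the dominant component, yet its marginal falls to 0.32 once averaged; thresholding marginals at 0.5 drops it.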
Proposed Model: Conditional Bernoulli Mixtures
Property 1: capture label dependencies – illustration
- reflection is positively correlated with lake, water, and sunset; from p(y|x): ρ(reflection, lake) = 0.5, ρ(reflection, water) = 0.4, ρ(reflection, sunset) = 0.17
Proposed Model: Conditional Bernoulli Mixtures
Property 1: capture label dependencies – illustration
p({clouds, lake, sky, sunset, water, reflection}|x) = 0.09
p({clouds, lake, sky, sunset, water}|x) = 0.06
⇒ predicting the most probable subset includes reflection ✓
Proposed Model: Conditional Bernoulli Mixtures
CBM: p(y|x) = \sum_{k=1}^{K} \pi(z=k \mid x; \alpha) \prod_{\ell=1}^{L} b(y_\ell \mid x; \beta_\ell^k)
Property 2: a flexible reduction method
- multi-label ⇒ multi-class + binary
- can be instantiated with many binary/multi-class classifiers, e.g., logistic regression, gradient boosted trees, neural networks
Proposed Model: Conditional Bernoulli Mixtures
CBM: p(y|x) = \sum_{k=1}^{K} \pi(z=k \mid x; \alpha) \prod_{\ell=1}^{L} b(y_\ell \mid x; \beta_\ell^k)

Property 3: easily adjust the complexity by changing the number of components K
Proposed Model: Conditional Bernoulli Mixtures
CBM: p(y|x) = \sum_{k=1}^{K} \pi(z=k \mid x; \alpha) \prod_{\ell=1}^{L} b(y_\ell \mid x; \beta_\ell^k)

Property 4: Simple Training with EM
Idea:
- maximum likelihood
- hidden variables ⇒ EM
- updating parameters ⇒ binary and multi-class classifier learning
Proposed Model: Conditional Bernoulli Mixtures
CBM: p(y|x) = \sum_{k=1}^{K} \pi(z=k \mid x; \alpha) \prod_{\ell=1}^{L} b(y_\ell \mid x; \beta_\ell^k)

Property 5: Fast Prediction by Dynamic Programming
A common difficulty in prediction:
- how to find argmax_y p(y|x) without enumerating all 2^L possibilities of y?

Existing solutions used in Power-Set, CRF, and Classifier Chain:
- restrict to y seen in the training set ⇒ will not predict unseen y
- approximate inference ⇒ suboptimal

CBM:
- efficiently finds the exact argmax_y p(y|x) by dynamic programming
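The paper's exact dynamic program is not reproduced here. As a hedged sketch of why prediction is nontrivial, the snippet below uses a simpler candidate-based strategy — score each component's independently most likely labeling under the full mixture — and, for the toy sizes used, checks it against brute-force enumeration over all 2^L subsets. All numbers are hypothetical, and the candidate heuristic is in general only an approximation, not the exact argmax CBM computes via DP.

```python
import numpy as np
from itertools import product

# Hypothetical toy model: K = 2 components, L = 3 labels.
pi = np.array([0.7, 0.3])
bern = np.array([[0.9, 0.8, 0.1],
                 [0.2, 0.1, 0.95]])

def cbm_prob(y, pi, bern):
    """Mixture probability p(y|x) for one label vector y."""
    per_label = np.where(np.asarray(y) == 1, bern, 1.0 - bern)
    return float(np.sum(pi * per_label.prod(axis=1)))

def predict_candidates(pi, bern):
    """Score each component's independently most likely labeling
    (threshold at 0.5) under the full mixture. A simple approximation,
    not the paper's exact dynamic program."""
    cands = {tuple(int(v) for v in (row >= 0.5)) for row in bern}
    return max(cands, key=lambda y: cbm_prob(y, pi, bern))

def predict_bruteforce(pi, bern, L):
    """Exact argmax by enumerating all 2^L label subsets (tiny L only)."""
    return max(product([0, 1], repeat=L), key=lambda y: cbm_prob(y, pi, bern))
```

Brute force costs O(2^L) and is hopeless for L in the hundreds, which is exactly the gap the exact dynamic-programming prediction closes.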
Proposed Model: Conditional Bernoulli Mixtures
CBM: p(y|x) = \sum_{k=1}^{K} \pi(z=k \mid x; \alpha) \prod_{\ell=1}^{L} b(y_\ell \mid x; \beta_\ell^k)

Summary
- Property 1: automatically captures label dependencies
- Property 2: a flexible reduction method
- Property 3: easily adjust the complexity by changing the number of components K
- Property 4: simple training with EM
- Property 5: fast prediction by dynamic programming
Experimental Results on Benchmark Datasets
- 5 datasets of different types:

| dataset | SCENE | RCV1 | TMC2007 | MEDIAMILL | NUS-WIDE |
|---|---|---|---|---|---|
| domain | image | text | text | video | image |
| #labels / #label subsets | 6 / 15 | 103 / 799 | 22 / 1341 | 101 / 6555 | 81 / 18K |
| #features / #datapoints | 294 / 2407 | 47K / 6000 | 49K / 29K | 120 / 44K | 128 / 270K |

- 2 instantiations of CBM: LR and GB
- 8 baselines: BinRel, PowSet, CC, PCC, ECC-label, ECC-subset, CDN, pairCRF
- evaluation measure: subset accuracy
Experimental Results on Benchmark Datasets
| Method | Learner | SCENE | RCV1 | TMC2007 | MEDIAMILL | NUS-WIDE |
|---|---|---|---|---|---|---|
| BinRel | LR | 51.5 | 40.4 | 25.3 | 9.6 | 24.7 |
| PowSet | LR | 68.1 | 50.2 | 28.2 | 9.0 | 26.6 |
| CC | LR | 62.9 | 48.2 | 26.2 | 10.9 | 26.0 |
| PCC | LR | 64.8 | 48.3 | 26.8 | 10.9 | 26.3 |
| ECC-label | LR | 60.6 | 46.5 | 26.0 | 11.3 | 26.0 |
| ECC-subset | LR | 63.1 | 49.2 | 25.9 | 11.5 | 26.0 |
| CDN | LR | 59.9 | 12.6 | 16.8 | 5.4 | 17.1 |
| pairCRF | linear | 68.8 | 46.4 | 28.1 | 10.3 | 26.4 |
| CBM | LR | 69.7 | 49.9 | 28.7 | 13.5 | 27.3 |

- with the LR learner, CBM is the best on 4 out of 5 datasets
Experimental Results on Benchmark Datasets

| Method | Learner | SCENE | RCV1 | TMC2007 | MEDIAMILL | NUS-WIDE |
|---|---|---|---|---|---|---|
| BinRel | LR | 51.5 | 40.4 | 25.3 | 9.6 | 24.7 |
| PowSet | LR | 68.1 | 50.2 | 28.2 | 9.0 | 26.6 |
| CC | LR | 62.9 | 48.2 | 26.2 | 10.9 | 26.0 |
| PCC | LR | 64.8 | 48.3 | 26.8 | 10.9 | 26.3 |
| ECC-label | LR | 60.6 | 46.5 | 26.0 | 11.3 | 26.0 |
| ECC-subset | LR | 63.1 | 49.2 | 25.9 | 11.5 | 26.0 |
| CDN | LR | 59.9 | 12.6 | 16.8 | 5.4 | 17.1 |
| pairCRF | linear | 68.8 | 46.4 | 28.1 | 10.3 | 26.4 |
| CBM | LR | 69.7 | 49.9 | 28.7 | 13.5 | 27.3 |
| BinRel | GB | 59.3 | 30.1 | 25.4 | 11.2 | 24.4 |
| PowSet | GB | 70.5 | 38.2 | 23.1 | 10.1 | 23.6 |
| CBM | GB | 70.5 | 43.0 | 27.5 | 14.1 | 26.5 |

- replacing LR with GB ⇒ further improvements on 2 datasets (SCENE: 69.7 → 70.5; MEDIAMILL: 13.5 → 14.1)
- use different learners for different applications
Conclusion
- proposed a new multi-label model, CBM
- it enjoys many nice properties
- it performs well on real data
- code available at https://github.com/cheng-li/pyramid
Thank You
Proposed Model: Conditional Bernoulli Mixtures
CBM: p(y|x) = \sum_{k=1}^{K} \pi(z=k \mid x; \alpha) \prod_{\ell=1}^{L} b(y_\ell \mid x; \beta_\ell^k)

Property 4: Simple Training with EM
Denote the posterior membership distribution p(z_n | x_n, y_n) as \gamma(z_n) = (\gamma_n^1, \gamma_n^2, ..., \gamma_n^K).

E step: re-estimate the posterior membership probabilities:

\gamma_n^k = \frac{\pi(z_n=k \mid x_n; \alpha) \prod_{\ell=1}^{L} b(y_{n\ell} \mid x_n; \beta_\ell^k)}{\sum_{k'=1}^{K} \pi(z_n=k' \mid x_n; \alpha) \prod_{\ell=1}^{L} b(y_{n\ell} \mid x_n; \beta_\ell^{k'})}
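A minimal numpy sketch of the E step above for a single example, assuming the gating probabilities π(z_n = k | x_n; α) and the component Bernoulli means b(y_ℓ = 1 | x_n; β_ℓ^k) have already been evaluated at x_n (all numbers hypothetical):

```python
import numpy as np

def e_step(pi, bern, y):
    """Posterior membership gamma_n = (gamma_n^1, ..., gamma_n^K) for one
    example, given pi[k] = pi(z_n = k | x_n; alpha) and
    bern[k, l] = b(y_l = 1 | x_n; beta_l^k) already evaluated at x_n."""
    y = np.asarray(y)
    comp_lik = np.where(y == 1, bern, 1.0 - bern).prod(axis=1)  # prod_l b(.)
    unnorm = pi * comp_lik          # numerator of the E-step formula
    return unnorm / unnorm.sum()    # normalize over components k

# Hypothetical numbers: K = 2 components, L = 3 labels
pi = np.array([0.7, 0.3])
bern = np.array([[0.9, 0.8, 0.1],
                 [0.2, 0.1, 0.95]])
gamma = e_step(pi, bern, [1, 1, 0])  # nearly all mass on component 1
```

In a full implementation the products of many Bernoulli terms would be accumulated in log space to avoid underflow; the direct products here are fine only at toy scale.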
Proposed Model: Conditional Bernoulli Mixtures
CBM: p(y|x) = \sum_{k=1}^{K} \pi(z=k \mid x; \alpha) \prod_{\ell=1}^{L} b(y_\ell \mid x; \beta_\ell^k)

Property 4: Simple Training with EM
M step: update the model parameters by decomposing into simple classification problems:

\alpha^{new} = \arg\min_\alpha \sum_{n=1}^{N} KL(\gamma(z_n) \,\|\, \pi(z_n \mid x_n; \alpha))

(multi-class classification with soft target labels)

\beta_\ell^{k,new} = \arg\min_{\beta_\ell^k} \sum_{n=1}^{N} \gamma_n^k \, KL(Ber(Y_{n\ell}; y_{n\ell}) \,\|\, b(Y_{n\ell} \mid x_n; \beta_\ell^k))

(weighted binary classification)
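The weighted binary subproblem in the M step can be sketched as logistic regression trained by plain gradient descent with per-example weights playing the role of the responsibilities γ_n^k. This is an illustrative solver, not the one used in pyramid, and the data and function names are hypothetical:

```python
import numpy as np

def fit_weighted_logistic(X, y, w, lr=0.1, iters=500):
    """Weighted binary logistic regression by gradient descent: an
    illustrative solver for one (k, l) subproblem, with per-example
    weights w standing in for the responsibilities gamma_n^k."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted b(y=1 | x; beta)
        grad = X.T @ (w * (p - y)) / w.sum()    # gradient of weighted log-loss
        beta -= lr * grad
    return beta

# Hypothetical toy data: first column is a bias feature
X = np.array([[1., -2.], [1., -1.], [1., 1.], [1., 2.]])
y = np.array([0., 0., 1., 1.])
beta = fit_weighted_logistic(X, y, np.ones(4))
```

Examples with near-zero responsibility for component k contribute almost nothing to the gradient, which is exactly how the M step makes each component specialize.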