Predicting Click Through Rate for Job Listings Manish Gupta Yahoo! HotJobs Jan 22, 2009.
Predicting Click Through Rate for Job Listings
Manish Gupta
Yahoo! HotJobs
Jan 22, 2009
CTR and its applications
• CTR = ratio of clicks (to get the full description of an entity) to views of a reduced version
• Rank results
• Impacts publisher revenue in pay-for-performance models
• Bidding in ad exchanges
• Trends can help detect click fraud
CTR for new job listings
• Avg CTR = 2.29%
• MLE would have high variance
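To illustrate why the MLE is noisy for a new listing, here is a minimal sketch. It assumes clicks follow a binomial model (an assumption, not stated on the slide) and uses the 2.29% average CTR; the view counts are hypothetical.

```python
import math

def ctr_mle(clicks, views):
    """Maximum-likelihood CTR estimate: observed clicks divided by views."""
    return clicks / views

def ctr_std_error(p, views):
    """Standard error of the MLE, assuming a binomial click model."""
    return math.sqrt(p * (1.0 - p) / views)

avg_ctr = 0.0229  # average CTR quoted on the slide

# Hypothetical view counts: a brand-new job versus older, well-viewed jobs.
for views in (50, 500, 5000):
    se = ctr_std_error(avg_ctr, views)
    print(f"views={views:5d}  std error={se:.4f}  relative to CTR={se / avg_ctr:.0%}")
```

With only a few dozen views the standard error is about as large as the CTR itself, which is why features borrowed from other jobs are attractive.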
CTR for job listings
Related work
• Regelson and Fain – Estimate CTR using topic clusters (job categories)
• Richardson et al. – Describe features for predicting CTR for ads
• Our baseline: avg CTR for a test job (2.29%)
Refined problem definition
• Ideal: Predict CTR(job j, position p, user cluster u, context c)
– Data sparsity
– Huge feature vector
• Predict CTR(job)
– Use CTR versus position curve
• Predict CTR(job, position)
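One possible reading of "use CTR versus position curve" is sketched below: build the curve from a click log, then scale a position-independent job CTR by the curve's value at the target position relative to the overall CTR. The multiplicative scaling, the function names, and the toy log are all assumptions made for illustration; the slides do not say how the curve was applied.

```python
def position_curve(click_log):
    """Average CTR per result position from (position, clicks, views) rows."""
    clicks, views = {}, {}
    for pos, c, v in click_log:
        clicks[pos] = clicks.get(pos, 0) + c
        views[pos] = views.get(pos, 0) + v
    return {pos: clicks[pos] / views[pos] for pos in views if views[pos] > 0}

def ctr_at_position(job_ctr, curve, position, overall_ctr):
    """Assumed multiplicative model: CTR(job, p) = CTR(job) * curve[p] / overall CTR."""
    return job_ctr * curve[position] / overall_ctr

# Hypothetical click log: (position, clicks, views)
log = [(1, 50, 1000), (2, 20, 1000), (1, 10, 200)]
curve = position_curve(log)
overall = sum(c for _, c, _ in log) / sum(v for _, _, v in log)

print(ctr_at_position(0.03, curve, position=1, overall_ctr=overall))
```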
Data set
• Used HotJobs data from Aug 11, 2008 to Aug 31, 2008 to predict CTR of jobs on Sep 1, 2008
• 40K jobs from 7K+ companies
• 32K as train set and 8K as test set
• Jobs have location, company name, category, creation date, posting date, optional position-wise click history, job source, title, snippet & job description
Different models
• Weka: Linear Regression and SMOReg
• Treenet: Gradient Boosted Decision Trees
• Feature selection:
– Weka: wrapper with evaluator=linear regression and search=GreedyStepwise
– Treenet: Variable importance metrics
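The wrapper idea can be sketched as a greedy forward search: at each step, add the feature that most reduces the error reported by an evaluator, and stop when nothing helps. This is a generic illustration, not Weka's implementation. The `evaluate` callback stands in for training and scoring a linear regression on a feature subset, and the feature names and gains in the toy evaluator are hypothetical.

```python
def greedy_forward_selection(features, evaluate, min_gain=1e-9):
    """Greedily grow a feature subset while the evaluator's error keeps dropping.

    `evaluate(subset)` must return an error score (lower is better).
    """
    selected = []
    best_err = evaluate(selected)
    remaining = list(features)
    while remaining:
        # Try each remaining feature and keep the one with the lowest error.
        err, feat = min((evaluate(selected + [f]), f) for f in remaining)
        if best_err - err <= min_gain:
            break  # no candidate improves the score enough
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err

# Toy evaluator (assumption): each feature removes a fixed amount of error.
toy_gain = {"same_company_ctr": 0.4, "same_title_ctr": 0.3, "title_num_words": 0.0}

def toy_evaluate(subset):
    return 1.0 - sum(toy_gain[f] for f in subset)

subset, err = greedy_forward_selection(toy_gain, toy_evaluate)
print(subset, err)
```

In a real wrapper the evaluator would typically be a cross-validated model error, which makes each step much more expensive than this toy.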
Features
• Features from Similar Jobs (60)
– CTR of jobs with same title/company/state/city+state/category and their cardinalities, posted in past one/two weeks or all jobs, based on the click history of past one/two/three weeks
• Features from Related Jobs (288)
– CTR_mn of related jobs with m = |A-B| and n = |B-A| and cardinalities (0 ≤ m, n ≤ 5), posted in past one/two weeks or all jobs, based on the click history of past one/two/three weeks
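A rough sketch of the CTR_mn idea follows. The slides do not say what the sets A and B contain; this example assumes they are the lowercased title words of the target job (A) and of a historical job (B). The history rows and function names are hypothetical.

```python
from collections import defaultdict

def title_tokens(title):
    """Assumed set representation of a job: its lowercased title words."""
    return set(title.lower().split())

def related_ctr_features(target_title, history, max_diff=5):
    """Return {(m, n): (ctr, cardinality)} with m = |A-B| and n = |B-A|.

    `history` is an iterable of (title, clicks, views) rows.
    """
    a = title_tokens(target_title)
    clicks, views, count = defaultdict(int), defaultdict(int), defaultdict(int)
    for title, c, v in history:
        b = title_tokens(title)
        m, n = len(a - b), len(b - a)
        if m > max_diff or n > max_diff:
            continue  # outside the 0 <= m, n <= 5 range from the slide
        clicks[(m, n)] += c
        views[(m, n)] += v
        count[(m, n)] += 1
    return {k: (clicks[k] / views[k], count[k]) for k in views if views[k] > 0}

# Hypothetical click history: (title, clicks, views)
history = [
    ("Senior Java Developer", 5, 100),
    ("Java Developer", 3, 100),
    ("Java Engineer", 2, 50),
    ("Registered Nurse", 1, 10),
]
features = related_ctr_features("Java Developer", history)
for key in sorted(features):
    print(key, features[key])
```

Restricting `history` to jobs posted in a given window, or to clicks from a given number of weeks, would yield the different variants listed above.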
Features
• Job Title Features (11)
– #words, #capitalized words, isAllCaps, hasHighPunct, hasLongWords, hasNumbers, vocabulary features
• Daily CTR Features for past 3 weeks (21)
• Other Features (10)
– Job Category, age, location specificity, job source, and job description page features
• Other potential features
– high-marketing-pitch words, brand value of company, spam feedback, seasonal variations
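The surface-level title features could be computed along these lines. The thresholds for "long" words and "high" punctuation are not given in the slides, so the defaults below are assumptions, and the vocabulary features are omitted.

```python
import string

def title_features(title, long_word_len=10, high_punct_count=3):
    """Surface features of a job title; both thresholds are assumed values."""
    words = title.split()
    letters = [ch for ch in title if ch.isalpha()]
    punct = sum(ch in string.punctuation for ch in title)
    return {
        "num_words": len(words),
        "num_capitalized_words": sum(1 for w in words if w[0].isupper()),
        "is_all_caps": bool(letters) and all(ch.isupper() for ch in letters),
        "has_high_punct": punct >= high_punct_count,
        "has_long_words": any(len(w) >= long_word_len for w in words),
        "has_numbers": any(ch.isdigit() for ch in title),
    }

print(title_features("URGENT!!! RN NEEDED - $5000 BONUS"))
print(title_features("Software Engineer"))
```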
Experiments and results
• Baseline: Predict avg CTR for a test job (2.29%)
• Predicting avg (category-wise) CTR (A)
• Linear Regression over 390 features (B) – uses only 142 regressors
• GBDT using Treenet over 390 features (C) – uses 300 regressors (at 256_600_0.01_100)
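The two simplest predictors above, the global average and the category-wise average (A), can be compared with a regression metric as in this sketch. RMSE is used here as one example metric, the fallback to the global average for unseen categories is an assumption, and the data is made up.

```python
import math
from collections import defaultdict

def global_average(train):
    """Mean CTR over all training jobs; `train` holds (category, ctr) pairs."""
    return sum(ctr for _, ctr in train) / len(train)

def category_averages(train):
    """Mean CTR per job category."""
    total, count = defaultdict(float), defaultdict(int)
    for cat, ctr in train:
        total[cat] += ctr
        count[cat] += 1
    return {cat: total[cat] / count[cat] for cat in total}

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical (category, CTR) data
train = [("IT", 0.03), ("IT", 0.05), ("Sales", 0.01), ("Sales", 0.01)]
test = [("IT", 0.04), ("Sales", 0.02), ("Legal", 0.025)]

avg = global_average(train)
by_cat = category_averages(train)
actual = [ctr for _, ctr in test]

baseline_rmse = rmse([avg] * len(test), actual)
# Assumption: categories unseen in training fall back to the global average.
category_rmse = rmse([by_cat.get(cat, avg) for cat, _ in test], actual)

print(f"baseline RMSE={baseline_rmse:.5f}  category-wise RMSE={category_rmse:.5f}")
```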
Analysis of regressor distribution
Important features
• Similar Jobs features
– Same company, title, city+state using 1 week click history
• Other features
– Creation date, job description page size, date of update, posting date, job category
• Related Jobs features
– Related_11, related_12 jobs posted in past 1/3 weeks over 1/3 week click history
Pruning the feature set
• Wrapper-based feature selection with linear regression and with Treenet's variable importance (E) – 11 features
In absence of click history …
• Linear regression with 369 features (F) – uses 187 regressors.
• Treenet uses 282 regressors at 256_600_0.01_20 (G)
Analysis of regressor distribution
None of the sets alone helps!
Pruning the feature set
Variable importance curves
Conclusion and future work
• We built a machine learning system to predict CTR for job listings and presented our results using various regression metrics.
• More features
• Dyadic models to predict user-personalized CTR with (job feature vector, user feature vector) dyads
• Auto model updates to correct model drift
Thanks for your time