
  • WEB PERSONALIZATION
    NLP Course Seminar
    Group 14: Vishaal Jatav (04d05013), Varun Garg (04d05015)

  • Roadmap
    Motivation
    Introduction
    The Personalization Process
    Personalization Approaches
    Personalization Techniques
    Issues
    Conclusion

  • Motivation: Some Facts
    Overwhelming amount of information on the web
    Not all documents are relevant to the user
    Users cannot fully convey their information needs
    Users rarely find any document 100% relevant
    Users expect more personal behavior:
    "I don't want results for Delhi when I am in Bombay."
    "I was looking for crane (the bird), not crane (the machine)."

  • Google Customization

  • Google (without personalization)

  • Google (with personalization)

  • Google Search History

  • Google Search History

  • Introduction
    Personalization:
    React differently to different users
    The system reacts in the way users want it to
    Ultimately brings the user back to the system
    Web Personalization:
    Apply machine learning and data mining
    Build models of user behavior (called profiles)
    Predict the user's needs and expectations
    Adaptively estimate better models

  • The Personalization Process
    Consider the following pieces of information:
    Geographical location
    Age, gender, ethnicity, religion, etc.
    Interests
    Previous reviews on products
    ...
    How could these pieces of information help?
    How do we collect this information?

  • The Personalization Process (Contd.)
    Collect plenty of information on user behavior
    Information must be attributable to a single user
    Decide on a user model
    featuring user needs, lifestyle, situations, etc.
    Create a user profile for each user of the system
    The profile captures the individuality of the user:
    habits, browsing behavior, lifestyle, etc.
    With every interaction, modify the user profile

  • The Personalization Process, More Formally
    The web is a collection of n items I = {i1, i2, ..., in}
    Users come from a set U = {u1, u2, ..., um}
    Each user uk rates items via r_uk : I → [0,1] ∪ {!},
    where r_uk(ij) = ! means ij is not rated by the user
    I!(uk) is the set of items not yet rated by user uk
    Ir(uk) is the set of items rated by user uk
    GOAL: recommend items ij ∈ I!(ua) to the active user ua that might be of interest
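    The setup above can be sketched with a sparse rating dictionary, where the absence of a key plays the role of the unrated marker "!" (the user and item names are illustrative):

```python
# Ratings in [0, 1]; a missing key plays the role of the "!" (unrated) symbol.
ratings = {
    "alice": {"i1": 0.9, "i2": 0.2},
    "bob":   {"i1": 0.8, "i3": 0.6},
}
items = {"i1", "i2", "i3"}

def unrated(user):
    """I_!(u): items the user has not yet rated -- the recommendation candidates."""
    return items - set(ratings[user])

print(sorted(unrated("alice")))  # -> ['i3']
```

    The personalization goal is then to rank the items in `unrated(ua)` by predicted interest.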

  • Classification of Personalization Approaches
    Individual vs. Collaborative
    Reactive vs. Proactive
    User vs. Item Information

  • Classification of Personalization Approaches
    Individual vs. Collaborative
    Individual approach (Google Personalized Search):
    uses only the individual user's data
    generates the user profile by analyzing
    the user's browsing behavior and
    the user's active feedback on the system
    Advantage: can be implemented on the client side, so no privacy violation
    Disadvantage: based only on past interactions, so it lacks serendipity

  • Classification of Personalization Approaches
    Individual vs. Collaborative (Contd.)
    Collaborative approach (Amazon recommendations):
    find the neighborhood of the active user
    react according to an assumption:
    if A is like B, then B likes the same things A likes
    Disadvantages: the new-item rating problem and the new-user problem
    Advantage: better than the individual approach, once these two problems are solved

  • Classification of Personalization Approaches
    Reactive vs. Proactive
    Reactive approach:
    explicitly ask the user for preferences,
    either in the form of queries or feedback
    Proactive approach:
    learn user preferences from user behavior,
    with no explicit preference demand on the user
    Behavior is extracted from
    click-through rates and navigational patterns

  • Classification of Personalization Approaches
    User vs. Item Information
    User information:
    geographic location (from the IP address)
    age, gender, marital status, etc. (explicit query)
    lifestyle, etc. (inferred from past behavior)
    Item information:
    content topics, movie genre, etc.
    product/domain ontology

  • Personalization Techniques
    Content-Based Filtering
    Collaborative Filtering
    Model-Based Personalization:
    rule-based
    graph-theoretic
    language model

  • Content-Based Filtering
    Syskill and Webert use explicit feedback
    (individual, reactive, item-information)
    Uses naive Bayes to distinguish likes from dislikes
    Initial probabilities are updated with each new interaction
    Uses the 128 most informative words from each item
    Letizia uses implicit feedback
    (individual, proactive, item-information)
    Finds likes/dislikes based on tf-idf similarity
    Other systems use nearest-neighbor methods for similarity
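    The tf-idf similarity idea can be sketched as follows; the tiny corpus is made up, and a liked page stands in for the user profile:

```python
import math
from collections import Counter

def tfidf(doc, docs):
    """tf-idf weights for one document against a small corpus of word lists."""
    n = len(docs)
    return {w: tf * math.log(n / sum(1 for d in docs if w in d))
            for w, tf in Counter(doc).items()}

def cosine(a, b):
    """Cosine similarity between two sparse tf-idf vectors."""
    dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [["crane", "bird"], ["crane", "machine"], ["nest", "twig"]]
profile = tfidf(docs[0], docs)  # a liked page stands in for the profile
print(round(cosine(profile, profile), 2))  # -> 1.0
```

    A content-based filter would rank candidate pages by their cosine similarity to the profile vector.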

  • Collaborative Filtering
    Found successful in recommendation systems
    General technique:
    for every user, a user neighborhood is computed;
    the neighborhood contains users who have rated several items almost equally
    Get candidate items for recommendation:
    items seen by the neighborhood but not by the active user ua
    Data is stored in the form of a rating matrix
    (items as rows and users as columns)

  • Collaborative Filtering (Contd.)
    The system must provide the following algorithms:
    Measure similarity between users
    (for creation of the neighborhood;
    Pearson and Spearman correlation, cosine similarity, etc.)
    Predict the rating of items not rated by the user
    (to decide the order in which items are presented;
    a weighted sum of ratings is most common)
    Select a neighborhood subset for prediction
    (to reduce the large amount of computation;
    a threshold on the similarity value is most common)
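    The similarity and weighted-sum steps above can be sketched as follows; the tiny rating matrix is illustrative, and a real system would also threshold the neighborhood:

```python
import math

# user -> {item: rating in [0, 1]} -- a toy rating matrix.
R = {
    "u1": {"i1": 0.9,  "i2": 0.1,  "i3": 0.8},
    "u2": {"i1": 0.8,  "i2": 0.2,  "i3": 0.9},
    "u3": {"i1": 0.85, "i2": 0.15},
}

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = set(R[a]) & set(R[b])
    if len(common) < 2:
        return 0.0
    xs = [R[a][i] for i in common]
    ys = [R[b][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)) * \
          math.sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def predict(user, item):
    """Similarity-weighted average of the neighbors' ratings for the item."""
    pairs = [(pearson(user, v), R[v][item])
             for v in R if v != user and item in R[v]]
    norm = sum(abs(s) for s, _ in pairs)
    return sum(s * r for s, r in pairs) / norm if norm else 0.0

print(round(predict("u3", "i3"), 2))  # -> 0.85
```

    Here u3 agrees with u1 and u2 on the commonly rated items, so their ratings of i3 dominate the prediction.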

  • Model-Based Personalization Approaches
    Executed in two stages:
    an offline process to create the actual model
    an online process using the model and the interaction
    Common data used for model generation:
    web usage data (web history, click-through rates, etc.)
    the item's structure and content data
    Examples:
    rule-based models
    graph-theoretic models
    language models

  • Model-Based Personalization: Rule-Based Models
    Association rule-based:
    item ia is in unordered association with ib;
    if the user considers ib, then ia is a good recommendation
    Sequence rule-based:
    item ia is in sequential association with ib;
    if the user considers ia, then ib is a good recommendation
    Associations between items can be stored as a dependency graph
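    A minimal association-rule sketch over made-up browsing sessions; the confidence threshold is illustrative, and a real miner would also filter by support:

```python
# Toy sessions: each is the set of items a user considered together.
sessions = [{"i1", "i2"}, {"i1", "i2", "i3"}, {"i2", "i3"}, {"i1", "i2"}]

def confidence(a, b):
    """P(b in session | a in session) -- strength of the rule a -> b."""
    with_a = [s for s in sessions if a in s]
    return sum(1 for s in with_a if b in s) / len(with_a)

def recommend(seen, threshold=0.7):
    """Items whose rule seen -> item clears the confidence threshold."""
    items = set().union(*sessions) - {seen}
    return sorted(i for i in items if confidence(seen, i) >= threshold)

print(recommend("i1"))  # -> ['i2']
```

    The learned rules (item pairs with their confidences) are exactly what the dependency graph on the slide would store as weighted edges.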

  • Model-Based Personalization: Graph-Theoretic Model
    Ratings data is transformed into a directed graph:
    nodes are users;
    an edge from ui to uj means that ui predicts uj;
    weights on the edges represent the predictability
    To predict whether an item ik will be of interest to ui:
    calculate the shortest path from ui to any user ur who has rated ik;
    the predicted rating is calculated as a function of the path between ui and ur
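    A sketch of the shortest-path prediction step; the graph, weights, and the length-decay function are all illustrative assumptions, since the slides do not fix the exact rating function of the path:

```python
import heapq

# Directed predictability graph: lower edge weight = more predictable.
graph = {"u1": {"u2": 1.0}, "u2": {"u3": 1.0}, "u3": {}}
rated = {"u3": {"i7": 0.9}}  # u3 has rated item i7

def shortest(src, dst):
    """Dijkstra shortest-path distance over the user graph."""
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")

def predict(user, item, decay=0.5):
    """Nearest rater's rating, damped by path length (decay is illustrative)."""
    best = min(((shortest(user, ur), r[item])
                for ur, r in rated.items() if item in r), default=None)
    if best is None or best[0] == float("inf"):
        return 0.0
    d, rating = best
    return rating * decay ** d

print(round(predict("u1", "i7"), 3))  # -> 0.225
```

    The farther the nearest rater, the weaker the prediction, which matches the intuition that predictability decays along the path.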

  • Model-Based Personalization: Language Modeling Approaches
    Without using the user's relevance feedback:
    simple language modeling
    Using the user's relevance feedback:
    n-gram based methods
    noisy-channel model based method

  • Language Model Approach: Simple Language Modeling
    Without using the user's feedback
    The history consists of all the words in past queries
    Learn the user profile as {(w1, P(w1)), ..., (wn, P(wn))},
    where P(wi) is the probability of wi estimated from the history

  • Language Model ApproachSimple Language ModelingSample User profile

  • Language Model Approach: Simple Language Modeling
    Re-ranking of unpersonalized results
    Re-ranking is done according to P(Q|D,u),
    interpolating the document model with the user profile:
    P(qi|D,u) = λ P(qi|D) + (1 − λ) P(qi|UP)
    λ is a weighting parameter between 0 and 1
    UP is the user profile
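    A minimal sketch of interpolated re-ranking, assuming a unigram mixture of the document model and the user profile; λ and the toy counts are illustrative:

```python
from collections import Counter

profile = Counter({"bird": 6, "crane": 3, "nest": 1})  # word counts from past queries
total = sum(profile.values())

def p_profile(w):
    """P(w|UP): relative frequency of w in the user's query history."""
    return profile[w] / total

def p_doc(w, doc):
    """P(w|D): relative frequency of w in the document."""
    return doc.count(w) / len(doc)

def score(query, doc, lam=0.7):
    """P(Q|D,u) = prod over qi of lam * P(qi|D) + (1 - lam) * P(qi|UP)."""
    s = 1.0
    for q in query:
        s *= lam * p_doc(q, doc) + (1 - lam) * p_profile(q)
    return s

bird_doc = ["crane", "bird", "nest", "bird"]
machine_doc = ["crane", "machine", "lift", "steel"]
q = ["crane", "bird"]
print(score(q, bird_doc) > score(q, machine_doc))  # -> True
```

    The profile term pulls the bird sense of "crane" ahead of the machine sense, which is exactly the disambiguation the Motivation slide asks for.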

  • Language Model Approach: N-gram Based Approach
    Using the user's relevance feedback
    Learn the user profile:
    let Hu represent the search history of user u,
    Hu = {(q1, rf1), (q2, rf2), (q3, rf3), ..., (qn, rfn)}
    Unigram:
    the user profile consists of {(w1, P(w1)), (w2, P(w2)), (w3, P(w3)), ..., (wn, P(wn))}

  • Language Model ApproachN gram based approachSample Unigram User Profile

  • Language Model Approach: N-gram Based Approach
    Bigram:
    the user profile consists of {(w1w2, P(w2|w1)), (w2w3, P(w3|w2)), ..., (wn-1wn, P(wn|wn-1))}

  • Language Model ApproachN gram based approachSample Bigram User Profile

  • Language Model Approach: N-gram Based Approach
    Re-ranking unpersonalized results
    Based on unigrams (λ = weighting parameter):
    Q = q1 q2 q3 ... qn
    P(q1 q2 q3 ... qn) = P(q1) P(q2) P(q3) ... P(qn)

  • Language Model Approach: N-gram Based Approach
    Based on bigrams:
    Q = q1 q2 q3 ... qn
    P(q1 q2 q3 ... qn) = P(q1) P(q2|q1) P(q3|q2) ... P(qn|qn-1)
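    The unigram/bigram profile estimates and the chain-rule query scoring can be sketched as follows; the toy history is made up, and the add-one smoothing is an assumption, since the slides do not specify one:

```python
from collections import Counter

# Toy search history: each inner list is one past query.
history = [["crane", "bird"], ["crane", "bird", "photos"], ["bird", "nest"]]
unigrams = Counter(w for q in history for w in q)
bigrams = Counter((a, b) for q in history for a, b in zip(q, q[1:]))
vocab = len(unigrams)

def p_uni(w):
    """Unigram profile probability P(w)."""
    return unigrams[w] / sum(unigrams.values())

def p_bi(w, prev):
    """Add-one smoothed bigram profile probability P(w|prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab)

def p_query(q):
    """P(q1 ... qn) = P(q1) * prod of P(qi | q_{i-1}) under the bigram model."""
    p = p_uni(q[0])
    for prev, w in zip(q, q[1:]):
        p *= p_bi(w, prev)
    return p

print(p_query(["crane", "bird"]) > p_query(["crane", "nest"]))  # -> True
```

    The query matching the user's habitual phrasing scores higher, which is the signal used to re-rank results.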

  • Language Model Approach: Noisy Channel Based Approach
    Using the user's (implicit) feedback
    User history is represented as
    Hi = (Q1, D1), (Q2, D2), ..., (QN, DN)
    Di is the document visited for Qi
    D consists of words w1, w2, ..., wm
    Basic idea: statistical machine translation
    Given parallel text of languages S and T,
    we get P(ti|si) for si ∈ S and ti ∈ T
    Using EM, we get the optimized model P(T|S)

  • Language Model Approach: Noisy Channel Based Approach
    Similarly:
    T = past queries Q1, Q2, ..., QK
    S = text of the relevant documents for those queries
    We learn the model P(Q|D), or more precisely P(qi|wj)
    Assumption: translate the ideal (information-containing) document into a query;
    the document is a verbose language, the query a compact language
    The user profile is stored as tuples <qi, wj, P(qi|wj)>

  • Language Model ApproachNoisy Channel based approachSample Noisy Channel User Profile

  • Language Model Approach: Noisy Channel Based Approach
    Re-ranking
    Re-rank the documents using P(Q|D,u),
    interpolating the translation model with a general-English model:
    P(qi|D,u) = λ P(qi|D) + (1 − λ) P(qi|GE)
    λ is a weighting parameter
    P(qi|GE) is the lexical probability of qi
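    A minimal sketch of the translation-based re-ranking step, assuming a pre-learned translation table P(q|w); the table values and the background model are made up, and a real system would learn the table with EM over the query-document pairs:

```python
# Hypothetical translation table P(query_word | doc_word), as if learned by EM.
trans = {
    ("crane", "heron"): 0.4,
    ("crane", "bird"):  0.3,
    ("crane", "crane"): 0.3,
}
general_english = {"crane": 0.001, "heron": 0.0005}  # background lexical model

def p_translate(q, doc):
    """P(q|D): average translation probability over the document's words."""
    return sum(trans.get((q, w), 0.0) for w in doc) / len(doc)

def score(query, doc, lam=0.8):
    """Interpolate the translation model with the general-English model."""
    s = 1.0
    for q in query:
        s *= lam * p_translate(q, doc) + (1 - lam) * general_english.get(q, 1e-6)
    return s

bird_doc = ["heron", "bird", "nest", "lake"]
machine_doc = ["steel", "lift", "tower", "site"]
print(score(["crane"], bird_doc) > score(["crane"], machine_doc))  # -> True
```

    Because the table translates "heron" and "bird" into the query word "crane", a bird document can score highly even without containing the query term, which is the point of treating documents as a verbose language.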

  • Issues in Personalization
    Cold-start problem (new-user problem)
    Latency problem (new-item problem)
    Data sparseness
    Scalability
    Privacy
    Recommendation-list diversity
    Robustness

  • Conclusion
    Web personalization is the need of the hour for e-businesses
    It is a relatively new research topic
    Several issues are yet to be solved effectively
    Data should be collected without invading user privacy
    Creating user models effectively, and scaling them to large numbers of users and items, is at the core of personalization

  • Bibliography
    Rohini U, Vamshi Ambati, and Vasudeva Varma. Statistical Machine Translation Models for Personalized Search. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), January 7-12, 2008, Hyderabad, India.
    Sarabjot S. Anand and Bamshad Mobasher. Intelligent