Data Mining for Web Personalization

Data Mining for Web Personalization

Patrick DudasData Mining for Web PersonalizationOutlinePersonalizationData miningExamplesWeb miningMapReduceData PreprocessingKnowledge DiscoveryEvaluationInformation High

PersonalizationGoal of data mining approach is for automatic personalizationAutomatic Personalization:Content-basedCollaborativeRule-based Rule-based (Brief overview)Create decision rulesImplicitly/ExplicitlyHighly domain dependent Rules nontransferable Profiles are based on user inputBiased StaticDegrade over time

Content-based (Brief overview)Profile based on users past experiences and their interest (ratings)Think Amazon, Pandora, eBay..Vector similarities based on cosine similarityBayesian classificationRemember: Ratings = Profile = RecommendationCollaborative (Brief overview)Creating groups of users based on ratingsNearest neighbor approachOnce grouped, recommendation based on the other neighbors are presentedMore users or items = more dimensions of dataDynamic or real-time not applicableData Mining Data rich descriptionsLarge volumes of datareliable modelsAutomated data collectionEvaluate results/make decisionsIntegration with existing data sources

Examples of Large Datasetshttp://aws.amazon.com/datasetsFeatured data sets:Illumina - Jay Flatley (CEO of Illumina) Human Genome Data Setcience 315(5814): 972.350 GBYRI Trio Dataset700GBSloan Digital Sky Survey DR6 Subset160 GBGenome, survey data, Google Books n-gram corpuses, traffic statistics, OpenStreetMap dataset, Wikipedia trafficData Mining Web Personalization Recommendations based on Web objects:ItemsPagesDocumentsNavigation by links

Web miningPros: Personalization (duh.), real-time, more enriched datasets Cons: Privacy issues, building complex systems that misrepresent the individualExtend the Data Mining ParadigmData Preparation and TransformationWeb logsDate/time usageSite informationResource requested (image, video, etc.)Site files/meta-data The power of the cookie

Server-side cookies!Data Preparation and Transformation (cont.)Pageview:User actions (where they clicked and the path)User events (what they are trying to accomplish)Session:Sequence of page views

MapReduceGoogle designHoodop implementedC++, C#, Erlang, Java, Ocaml, Perl, Python, Ruby, F#, R..

Example

Usage Data Pre-Processing

Pattern DiscoveryWe have data! Now what?ClusterClassificationAssociation Rule DiscoverySequential pattern DiscoveryMarkov ModelsLatent Variable Model

ClusteringPartitioningSplit your data into groupsK-meansHierarchical Divisive (top-down)Start with everything, find groupsAgglomerative (bottom-up)Start with a cluster and add additional informationModel-basedBuilding a model for the data (best fit)K-means

User-Based ClusteringStart with the user profilePartition into k-groups of profilesBased on similarity

Association DiscoverySupportmin(support)Confidencemin(confidence)

Evaluation (Personalization Model)Challenges:Recommendation algorithms may require unique set of evaluation metricsPersonalization actions may be differentDomainIntended applicationData gatheredCheck for overfitting dataTraining setROC CurveROC CurveTPR=TP/ (TP+FN)FPR=FP/ (FP+TN)"No""Yes"SNN.95.05.20.80

Information HighInformation is addictiveInformation can be misleadingInformation ethicsInformation is power, sometimes too powerful

Personal SuggestionsDevelop a hypothesisFigure out what data is neededMake informed decisionsDont trust just your judgmentExperts are experts for a reason!Develop a way to validate based on experienceThen get more data if needed

SourcesKohavi, R. and F. Provost (2001). "Applications of data mining to electronic commerce." Data Mining and Knowledge Discovery 5(1): 5-10.Dean, J. and S. Ghemawat (2008). "MapReduce: Simplified data processing on large clusters." Communications of the ACM 51(1): 107-113.Mobasher, B. (2007). "Data mining for web personalization." The adaptive web: 90-135.http://en.wikipedia.org/wiki/File:FrequentItems.pngZaharia, M., A. Konwinski, et al. (2008). Improving mapreduce performance in heterogeneous environments, USENIX Association.Witten, I. H. and E. Frank (2002). "Data mining: practical machine learning tools and techniques with Java implementations." ACM SIGMOD Record 31(1): 76-77.Senthil kumar, A. (2011). Knowledge discovery practices and emerging applications of data mining : trends and new domains. Hershey, PA, Information Science Reference.Thank you!Questions?

Data Mining for Web Personalization

Documents

Transcript of Data Mining for Web Personalization