MUSIC RECOMMENDATION SYSTEM FOR LAST.FM DATASET. Why is a music recommendation system required?

  • Slide 1

MUSIC RECOMMENDATION SYSTEM FOR LAST.FM DATASET

  • Slide 2

Why is a music recommendation system required?

  • Slide 3

What is data mining?

Data mining, also called data or knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information.

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
http://www.headsafrica.com/headsafrica/application/views/services/client/zf_files/images/data_mining/data_mining.jpg

  • Slide 4

Data mining modelling

Clustering: Items are grouped by their similar characteristics. The method considers the similarities among the data items themselves.

Classification: A very common technique for predicting interests. It refers to the categorization of data items: unclassified cases are assigned a class label based on the cases that are already labelled.

Association: A technique that examines the relationships among existing records in the database to determine which events occur together.

  • Slide 5

What is a recommendation engine?

A recommendation system is a system that interprets the data users enter into it and makes recommendations to those users.

  • Slide 6

Recommendation Techniques: Content-based Filtering

The salient features of content that a user previously liked or watched are stored, usually in a database, and a profile is created for the user. When making a recommendation, the system consults this profile and recommends the content whose features are nearest to the stored feature set.

https://www.ntt-review.jp/archive_html/200804/images/le1_fig02.gif

  • Slide 7

Recommendation Techniques: Collaborative Filtering

This is the foundation of approaches built on the idea that "those who like one thing also like similar things".
It does not depend on a single user's content-property profile. Instead, recommendations take into account users who like similar content or who have similar characteristics.

http://www.bridgewell.com/images_en/ec_03.jpg

  • Slide 8

Recommendation Techniques: Collaborative Filtering Types

User-based recommendation: This technique finds similar users and recommends the items they liked.

Item-based recommendation: The similarity between items is calculated, and similar items are recommended.

http://oytunyuksel.com/wp-content/uploads/post-02-01.jpg

  • Slide 9

How is a recommendation engine created?

  • Slide 10

When a recommendation engine is created, the following steps should be carried out:

    • Define the data representation
    • Create the database or file model structure
    • Pre-process the data to get the best result

http://www.w3.org/WAI/TIDE/phases.gif

  • Slide 11

What is Apache Mahout?

http://hortonworks.com/hadoop/mahout/
http://hortonworks.com/wp-content/uploads/2013/09/mantle-mahout.png

It is a Java library of scalable machine-learning algorithms, implemented on top of Apache Hadoop and using the MapReduce paradigm.

To use Mahout in a project:

    • Download the latest Mahout release, 0.8, from the link below:
      http://apache.fastbull.org/mahout/0.8/mahout-distribution-0.8.zip
    • Extract all the libraries and include them in a new Eclipse (or NetBeans) project as external JAR files.
    • Java 1.6.x or greater is required for installation.
    • Hadoop is not mandatory for creating a recommendation engine.

  • Slide 12

How to use Mahout for recommendation?
Recommendation in Mahout follows these steps:

    • The dataset is converted into a Mahout-compliant format
    • A compatible recommender component is chosen
    • Similarities are computed from ratings or preferences
    • The recommendation is evaluated

  • Slide 13

Recommender job flow

http://www.ibm.com/developerworks/library/j-mahout-scaling/

The main step doing the heavy lifting in the workflow is the "calculate co-occurrences" step. This step is responsible for doing pairwise comparisons across the entire matrix, looking for commonalities.

  • Slide 14

The background process of recommendation in the architecture

  • Slide 15

Graduation Project with Last.fm: Scheduling

  • Slide 16

Graduation Project with Last.fm: Gantt chart

  • Slide 17

Graduation Project with Last.fm: What are the important risks?

    • Big data
    • Time
    • Computer performance
    • Sparsity

http://www.pm-primer.com/wp-content/uploads/2012/04/risk1.jpg

  • Slide 18

Music recommendation project for Last.fm

The Last.fm Dataset-1K users dataset is used in the project. It contains information about user properties and about which songs each user listened to. The dataset consists of 2 files: a user profile file and a file containing the users' listening history. There are 1000 users and 19,150,868 lines of listening history belonging to those 1000 users.

  • Slide 19

Music recommendation project for Last.fm

The Last.fm API is used and a new CSV format is created. Although there are 1000 users, files with the desired properties were prepared for only 700 users during the project period because of time constraints. After the files were prepared, they were all loaded into database tables for easier data processing. The tables are:

    • Artists
    • Users
    • Tracks
    • TrackTags
    • UserTagTrack

  • Slide 20

Music recommendation project for Last.fm

The collaborative filtering method is used. Two types of segmentation are considered. One recommendation is made within clusters of users grouped by gender, age, and country.
The other recommendation is made across all users. A user-based recommendation engine is created. The JDBC and File data models are used for data representation.

  • Slide 21

Music recommendation project for Last.fm

Weka is used for clustering because of its simplicity. All user characteristics were represented as values (thesis, pages 33-34).

  • Slide 22

Music recommendation project for Last.fm

Many methods can be used for collaborative filtering:

    • Mean Squared Differences Algorithm
    • Vector Similarity
    • Pearson Correlation Coefficient

Strengths and Weaknesses of the Collaborative Filtering Method

The Pearson Correlation Similarity algorithm is used for the thesis data model because it is convenient and gives correct results for large amounts of data.

  • Slide 23

The functionality of the project system

  • Slide 24

JDBC Model: Database Tables

    Table           Columns
    Artists         artist id, artist name
    Tracks          track id, track name, artist id, published year
    TrackTags       tag id, tag name
    Users           user id, user name, gender, age, country
    UserTagTrack    usertagtrack id, user id, track id, tag id, preferences

This is the general (default) database; all files and other databases are created from it.

  • Slide 25

Recommendation Model

In JDBCDataModel, primary keys must be defined for the sake of time efficiency. The database format should be:

    Table            Columns
    PrefUserTag      user id, tag id, sum (preferences)
    PrefUserTrack    user id, track id, sum (preferences)
    PrefTagTrack     track id, tag id, sum (preferences)

  • Slide 26

Number of elements in tables

Tables whose names begin with "Pref" are formatted for the Mahout recommendation functions. They contain far less data than the UserTagTrack table.
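
The Pref tables (Slides 25-26) and the Pearson correlation named on Slide 22 can be illustrated together with a small sketch. This is not the thesis code and does not call Mahout: the function names, field names, and sample rows are hypothetical, and restricting the correlation to items that both users have a preference for is an assumption about the intended variant.

```python
from collections import defaultdict
from math import sqrt


def aggregate_preferences(rows, key_fields):
    """Sum the 'preferences' column per key, e.g. per (user_id, tag_id)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += row["preferences"]
    return dict(totals)


def pearson(prefs, user_a, user_b):
    """Pearson correlation over the items both users have a preference for."""
    a = {item: value for (user, item), value in prefs.items() if user == user_a}
    b = {item: value for (user, item), value in prefs.items() if user == user_b}
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return None  # not enough overlap to define a correlation
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return None  # a constant preference vector has no defined correlation
    return cov / sqrt(var_a * var_b)


# Hypothetical UserTagTrack rows (made-up ids and preference values).
user_tag_track = [
    {"user_id": 1, "track_id": 10, "tag_id": 100, "preferences": 2.0},
    {"user_id": 1, "track_id": 11, "tag_id": 100, "preferences": 1.0},
    {"user_id": 1, "track_id": 11, "tag_id": 101, "preferences": 4.0},
    {"user_id": 1, "track_id": 12, "tag_id": 102, "preferences": 5.0},
    {"user_id": 2, "track_id": 10, "tag_id": 100, "preferences": 6.0},
    {"user_id": 2, "track_id": 13, "tag_id": 101, "preferences": 8.0},
    {"user_id": 2, "track_id": 12, "tag_id": 102, "preferences": 10.0},
]

# PrefUserTag-style and PrefUserTrack-style tables: one summed value per pair.
pref_user_tag = aggregate_preferences(user_tag_track, ("user_id", "tag_id"))
pref_user_track = aggregate_preferences(user_tag_track, ("user_id", "track_id"))

# Similarity between users 1 and 2 on their tag preferences.
similarity = pearson(pref_user_tag, 1, 2)
```

In this sample the seven detail rows collapse to six user-tag pairs, which shows in miniature why the Pref tables are smaller than UserTagTrack. User 2's tag preferences are exactly twice user 1's, so the two users come out as perfectly correlated.
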

  • Slide 27

Number of elements in tables

Before the assignment of a primary key

With a primary key, the format is: user id, tag id, sum (preferences)

  • Slide 28

The introduction of the system

After the text file is created via the API, a standard line of text looks as follows. This line corresponds to the UserTagTrack table:

    Format:   user name, artist name, track name, published year, tags
    Example:  user_000103, Super Furry Animals, The Undefeated, 2003, indie, britpop, rock, trumpet, pop

  • Slide 29

The functions used in the recommendation engine

The working principle of the user-based recommendation engine:

  • Slide 30

Recommendation Results

An unlimited number of results can be obtained with the evaluator program. Pages 41-51 of the thesis contain many results under different conditions.

    Table name:                 PrefUserTag
    Neighbourhood size:         2
    For user id:                5
    Number of recommendations:  5

    Results                                          Tag name
    RecommendedItem[item:112040, value:213.03076]    missjudy76
    RecommendedItem[item:3387, value:211.02057]      my 750 essential songs
    RecommendedItem[item:8124, value:194.43637]      lionel richie
    RecommendedItem[item:8147, value:175.26286]      leona lewis
    RecommendedItem[item:1809, value:167.69398]      better than the original

  • Slide 31

Recommendation Results

    Table name:                 PrefUserTrack
    For user id:                5
    Number of recommendations:  5

    Neighbourhood size: 2

    Results                                     Track name
    RecommendedItem[item:7064, value:73.0]      Out Of Control

    Neighbourhood size: 7

    Results                                     Track name
    RecommendedItem[item:16570, value:304.5]    When You'Re Gone
    RecommendedItem[item:7064, value:73.0]      Out Of Control
    RecommendedItem[item:1466, value:9.0]       Aerodynamic
    RecommendedItem[item:7170, value:5.0]       Bring Me To Life
    RecommendedItem[item:2969, value:5.0]       Number Five With A Bullet

  • Slide 32

How to evaluate results?

The results of this recommendation engine are evaluated with the most common metrics, precision and recall. Precision is the ratio of relevant items that were recommended to the total number of items recommended.
Recall is the ratio of relevant items that were recommended to the total number of items that are relevant to the user. In terms of the table below, precision = TP / (TP + FP) and recall = TP / (TP + FN).

                             Actual positive    Actual negative
    Predicted as positive    TP                 FP
    Predicted as negative    FN                 TN

  • Slide 33

How to evaluate results?

Precision and recall are provided by RecommenderIRStatsEvaluator in Mahout. Its evaluate function returns the F-measure, precision, and recall of the recommendation engine. Several parameters are passed to this function. The most important is "at", the number of recommendations to consider when evaluating precision, i.e. "precision at N" for an integer N.

  • Slide 34

Evaluation Results

    Table name:                 PrefUserTag
    Data model structure:       User-Tag-Preference
    Row-column variable count:  # users: 700, # items: 14044
    Neighbourhood size:         2
    5 recommendations:          Precision: 0.9784243295019155
                                Recall: 0.9741058655221752

    Table name:                 PrefUserTrack
    Data model structure:       User-Track-Preference
    Row-column variable count:  # users: 700, # items: 316018
    Neighbourhood size:         2
    5 recommendations:          Precision: 0.033268482490272366
                                Recall: 0.005531505531505532

  • Slide 35

Evaluation Results

    Table name:                 PrefUserTrack
    Data model structure:       User-Track-Preference
    Row-column variable count:  # users: 700, # items: 316018
    Neighbourhood size:         3
    5 recommendations:          Precision: 0.036322463768115994
                                Recall: 0.012746512746512747

  • Slide 36

Comments on the evaluation results

    • As the neighbourhood size increases, the recommendation engine's results improve, because of the way the similarity function works.
    • The user-tag recommendation engine performs better than the user-track recommendation engine because of data size and sparsity.
    • People with similar characteristics also have similar musical tastes.
    • When the neighbourhood size increases, the number of recommended items increases.

  • Slide 37

Self-criticism I

Creating the dataset and the data representation took a long time. Using a ready-made dataset instead would have saved the project owner time. The data model contains a huge amount of data.
Scanning all the data and making recommendations took a long time because of limited computer capacity. A better computer would have helped. Out-of-memory errors were the most frequent problem when calculating evaluation results, caused by the low Java heap space available under the operating system or Java version in use.

  • Slide 38

Self-criticism II

Slowness and memory errors can be addressed with parallel programming. Using a server is another alternative. The User-Track profile results are not good, so the recommendation engine's performance for this model could be improved. With more computer capacity, more data could be used for the recommendation engine.

http://d1jb6zrebfcfrk.cloudfront.net/assets/content/cache/made/65b7808e1a1599d2/Think_Bigger,_Make_Better_3_860_484.png
http://thisiscolossal.com/wp-content/uploads/2011/01/better-3-600x337.jpg

  • Slide 39

Thank you for listening