Btp 3rd Report

Music Recommendation System

Content

1. Introduction
2. Work done
3. Progress of work
4. Algorithms and technical details
5. Dataset
6. References

BTP Report => 3

Panel No => 5

Group No => G6

Members => Dinesh Singh Yadav (200601026)

Vinay Kumar (200601102)

Faculty Advisor => Dr. Vikram Pudi

1. Introduction

A recommendation system suggests items that match a user's interests. The main goal of this system is to propose interesting music to the user, including unknown artists and their popular available tracks, based on the user's musical taste. The music domain differs from other domains in that implicit feedback is not collected as ratings but as playing, skipping, or stopping a recommended track. The work focuses on presenting a user with a list of artists, or creating an ordered set of tracks (a personalized playlist), using common approaches such as collaborative and content-based filtering. Typically, a recommender system compares the user's profile to some reference characteristics and seeks to predict the 'rating' that a user would give to an item they have not yet considered.

2. Work Done

The following techniques have been used to obtain the top-N recommended items or songs for users.

a. Content based filtering: Content-based filtering algorithms use a description of the track. MIR (Music Information Retrieval) is the science of retrieving information from music; it extracts features from the digital signal through signal processing. New music can be recommended if two songs have similar acoustic content, i.e. similar feature vectors. However, it is difficult to construct a vector representation that is compact yet meaningful enough to describe the whole audio. Nearly 1 million integers are required to represent 10 seconds of a CD-quality stereo song (44,100 16-bit samples per second per channel). Researchers therefore use the mean and variance of a set of features collected over N frames to reduce the dimensionality and obtain a description of the whole track. Feature extraction is itself a highly researched field in speech processing, multimedia audio and signal processing. The features feasible for the music domain are timbre, textual, MFCC, instrumental, rhythmic and harmony features, as defined in Pampalk. There are many machine learning and data-mining techniques for automatically classifying music, based on training/classification and clustering using kernel, Gaussian and SVM classifiers. Data mining includes k-means clustering of music and extracting hidden patterns from the vector representation of tracks. This was done in the last semester, but its quality still needs to be improved through more analysis of the data set. The vector mean similarity gives the average of all items' vectors; this is then compared to a new entry to find the type and category of the song. It provides the top-N songs that are similar to the song being played by the user.
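The mean-variance summary and the similarity lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of cosine similarity for comparing summary vectors, and the toy data layout are ours and are not taken from the Simreco code.

```python
import math

# 10 s of CD-quality stereo audio: 44,100 samples/s * 2 channels * 10 s
RAW_SAMPLES_10S = 44_100 * 2 * 10  # 882,000 integers, i.e. nearly 1 million


def mean_variance_summary(frames):
    """Collapse N per-frame feature vectors into one vector: means, then variances."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)]
    return means + variances


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query_vec, library, n):
    """library maps a track id to its summary vector (hypothetical layout)."""
    ranked = sorted(
        library.items(),
        key=lambda kv: cosine_similarity(query_vec, kv[1]),
        reverse=True,
    )
    return [track_id for track_id, _ in ranked[:n]]
```

However many frames a track has, the summary vector is only twice as long as a single frame's feature vector, which is what makes track-to-track comparison cheap.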

b. Collaborative Filtering: Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering typically involve very large data sets. It is a method of making automatic predictions (filtering) about the interests of a user by collecting taste information from many users (collaborating). This is almost done and will be shown in the next demo. The filtering process in collaborative filtering is based only on human judgment, not on machine analysis of content. Each user of a collaborative filtering system rates the songs they have experienced, in order to establish a profile of interests. The CF system then matches that user with people of similar interests or tastes, and the ratings from those similar people are used to generate recommendations for the user. CF has significant advantages over traditional content-based filtering, primarily because it does not depend on error-prone machine analysis of content. The advantages include the ability to filter any type of content (e.g. text, art work, music, mutual funds); the ability to filter based on complex and hard-to-represent concepts, such as taste and quality; and the ability to make serendipitous recommendations. It is important to note that CF technologies do not necessarily compete with content-based filtering. In most cases, they can be integrated to provide a powerful hybrid filtering solution. For entertainment domains collaborative filtering is quite effective, but it has yet to be successful in content domains where a higher risk is associated with accepting a filtering recommendation. Collaborative filtering has an issue that makes it perform poorly in some situations, called the cold start problem. It is a potential problem in computer-based information systems that involve a degree of automated data modeling.

Specifically, it concerns the issue that the system cannot draw any inferences for users or items about which it has not yet gathered sufficient information. Typically, a recommendation system compares the user's profile to some reference characteristics. These characteristics may be from the information item or the user's social environment.

In the content-based approach, the system must be capable of matching the characteristics of an item against relevant features in the user's profile. To do this, it must first construct a sufficiently detailed model of the user's tastes and preferences through preference elicitation. The problem is thus basically one of an item that no one in the community has previously rated. We are studying solutions to this problem and will apply one to this project soon, since it is required for good results. It can be overcome by introducing an element of collaboration among agents assisting various users: novel situations are handled by requesting other agents to share what they have already learnt from their respective users. In recommendation systems, the cold start problem is often reduced by adopting a hybrid of content-based matching and collaborative filtering. New items that have not yet received any ratings from the community are assigned a rating automatically, based on the ratings the community has assigned to other similar items. Item similarity is determined from the items' content-based characteristics. A plugin was made and embedded in two existing players, Amarok and Rhythmbox, to check the performance of the back end. It did not reach the expected accuracy, but the results were somewhat better.
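The hybrid cold-start idea above, where an unrated item borrows a rating from content-similar items, might look roughly like the sketch below. The tag-set representation, the Jaccard similarity, the neighbourhood size `k` and all function names are assumptions made for illustration; the report does not specify which content similarity is used for this step.

```python
def jaccard(tags_a, tags_b):
    """Overlap between two sets of content tags (assumed item representation)."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


def cold_start_rating(new_item_tags, rated_items, similarity=jaccard, k=3):
    """Estimate a rating for an unrated item.

    rated_items: list of (tags, mean_community_rating) pairs for items that
    already have ratings. Returns None when no rated item is similar at all.
    """
    scored = [(similarity(new_item_tags, tags), rating) for tags, rating in rated_items]
    # Keep only items with some content overlap, most similar first
    scored = sorted((sr for sr in scored if sr[0] > 0), key=lambda sr: sr[0], reverse=True)
    top = scored[:k]
    if not top:
        return None
    total_similarity = sum(s for s, _ in top)
    # Similarity-weighted average of the neighbours' community ratings
    return sum(s * r for s, r in top) / total_similarity


# Hypothetical catalogue: two rated tracks described by tags
catalogue = [({"rock", "guitar"}, 4.0), ({"jazz", "piano"}, 2.0)]
estimate = cold_start_rating({"rock", "piano"}, catalogue)
```

Returning `None` rather than a default value leaves it to the caller to decide how to treat an item that resembles nothing in the catalogue.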

3. Progress of Work

Collaborative filtering is on the verge of completion and will be finished by the time of the demo. We are studying algorithms for Query by Humming, which will most probably be listed as the next piece of work. We are following a few papers on this; good accuracy seems difficult to achieve because it is new, advanced work that is still a research topic. A site has now been developed as the client side. It is still simple and covers basic needs but is not very interactive. To make it more attractive it needs some CSS and JavaScript work, which is left for a later improvement phase. The site has been built in Ruby on Rails and the back end in Python. The site offers a search option for music and some other events, and the user is redirected to the results of the search query. Whatever song the user plays is used to retrieve other related songs of the same category or type. A list is displayed, ordered by the most similar and top-rated songs for that user. The web site is developed on the MVC model, which provides an efficient way to interact with the data models and return results.

We have hosted an open source project, Simreco, on GitHub; all of our work is online and free to extend.
Project URL: http://github.com/dinesh/simreco/tree/master
Git clone URL: git://github.com/dinesh/simreco.git

a. Web Interface - As explained in the paragraph above, the web interface has been developed with an MVC architecture, which is popular today since most sites use MVC-based frameworks, AJAX and JavaScript libraries. The site is intended to be public so that user profiles can be obtained easily when users rate songs and tell us their interests. It has a Flash player that plays the audio file on the client side. Good audio streaming requires an efficient implementation of the model, so that the load on the site does not grow as the number of registered users increases day by day.

b. Implementation of Web-based radio using Flash - Internet radio has a wider audience and eliminates the need to download and manage large audio libraries. It will be a streaming media server integrated with a Flash front end. The user client can give feedback as thumbs-up, thumbs-down, love, stop, skip and ban. It should also learn the pattern of the user's listening habits and incorporate the feedback.

c. Gathering implicit feedback - Implicit feedback is gathered in terms of skipping, playing and stopping songs. As described earlier, this personal feedback from users is used to improve the recommendations.

d. Improvement of recommendation algorithm - The results have improved over the previous ones, as expected. The newly added content-based filtering and collaborative filtering are each more effective in their own way than the algorithms we implemented previously.

e. Ability to recommend by artist, user-profiles as seeds - This is the ability to generate recommendations based on artists and user profiles, by extending the recommendation algorithm to search for similar artists with a k-nearest-neighbour (kNN) search algorithm.

f. Crawling/mining for new music - Music content is growing like mushrooms, and many new technologies such as podcasts, RSS feeds, FOAF profiles, weblogs and mp3 blogs publish updates on their sites. The recommendation system should make use of these services to regularly find new music, events and releases by users' favorite artists.
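As an illustration of item c, the play, skip and stop events could be folded into an implicit score per user and track along the following lines. The event names and, in particular, the numeric weights are hypothetical values chosen for this sketch; the report does not say how the events are weighted.

```python
# Hypothetical weights: a play counts in favour of a track, a skip or stop against it
EVENT_WEIGHTS = {"play": 1.0, "skip": -1.0, "stop": -0.5}


def implicit_scores(events):
    """Accumulate an implicit score for every (user, track) pair.

    events: iterable of (user_id, track_id, event_name) tuples.
    Unknown event names are ignored (weight 0).
    """
    scores = {}
    for user_id, track_id, event_name in events:
        key = (user_id, track_id)
        scores[key] = scores.get(key, 0.0) + EVENT_WEIGHTS.get(event_name, 0.0)
    return scores


# Example listening log for one user
log = [
    ("u1", "t1", "play"),
    ("u1", "t1", "play"),
    ("u1", "t2", "skip"),
    ("u1", "t1", "stop"),
]
scores = implicit_scores(log)
```

The resulting scores can be used in place of explicit ratings wherever a user has listened to a track but not rated it.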

4. Algorithm

We used a combination of collaborative and demographic recommendation this time. Demographic recommendation methods use only information about the users. The users are categorized based on the attributes of their demographic profiles in order to find users with similar features. Collaborative filtering can follow a user-based or an item-based approach; the item-based approach is much more scalable and is used in current industry. Several similarity measures have been used: cosine vector similarity, Pearson correlation and Spearman correlation. Among them, Pearson similarity performed better than the others, so we used it in our algorithm.
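For reference, the three similarity measures named above can be written in a few lines of plain Python. This is a minimal sketch, not the project's implementation; it assumes both vectors contain only the co-rated entries, and it computes Spearman correlation as Pearson correlation over average ranks.

```python
import math


def cosine(a, b):
    """Cosine vector similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson(a, b):
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            result[order[pos]] = average_rank
        start = end + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(a), ranks(b))
```

Unlike cosine similarity, Pearson correlation subtracts each vector's mean, so a listener who rates everything high and one who rates everything low can still come out as similar if their preferences move together.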

Collaborative filtering depends on explicit and implicit data in the form of ratings, demographic information about users, and various categories. Before explaining the algorithms we introduce some definitions that facilitate the explanation. First of all, we define:

A set of m users U = {u(x), where x = 1, 2, 3, …}
A set of n items I = {i(x), where x = 1, 2, 3, …}
A set of p categories C = {c(x), where x = 1, 2, 3, …}
A set of q explicit ratings R = {r(x), where x = 1, 2, 3, … and q*q < m*n total items}
A set of q implicit ratings R' = {r'(x), where x = 1, 2, 3, … and q*q < m*n total items}

We also define three matrices derived from our data: the user-item matrix, the user-categories matrix and the item-categories bitmap matrix.

The user-item matrix is a matrix of users against items whose elements are the explicit ratings given by users to items. Some of the cells are not filled.

The user-categories matrix is a matrix of users against item categories whose elements show the number of times a user has given a positive or negative rating to a category. For each category we keep two columns, one for positive and one for negative ratings.

The item-category bitmap matrix is a matrix of items against categories whose elements have the value 1 if the item belongs to the specific category and the value 0 otherwise.

Next we define user-based and item-based similarity for explicit and implicit data.
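Before moving on to similarity, the three matrices defined above can be sketched on toy data as follows. The sample users, items and categories, the use of `None` for unfilled cells, and the threshold that separates positive from negative ratings are all assumptions made for this illustration.

```python
# Toy data (hypothetical)
users = ["u1", "u2"]
items = ["i1", "i2", "i3"]
categories = ["rock", "jazz"]
ratings = {("u1", "i1"): 5, ("u1", "i3"): 2, ("u2", "i2"): 4}
item_categories = {"i1": ["rock"], "i2": ["jazz"], "i3": ["rock", "jazz"]}
POSITIVE_THRESHOLD = 3  # assumed: ratings >= 3 count as positive


def build_user_item(users, items, ratings):
    """Users x items; None marks a cell that is not filled."""
    return [[ratings.get((u, i)) for i in items] for u in users]


def build_item_category(items, categories, item_categories):
    """Items x categories bitmap: 1 if the item belongs to the category, else 0."""
    return [[1 if c in item_categories.get(i, []) else 0 for c in categories] for i in items]


def build_user_category(users, categories, ratings, item_categories, threshold):
    """Users x (2 * categories): for each category a positive and a negative count."""
    matrix = [[0] * (2 * len(categories)) for _ in users]
    for (u, i), rating in ratings.items():
        row = users.index(u)
        for c in item_categories.get(i, []):
            offset = 0 if rating >= threshold else 1  # 0 = positive column, 1 = negative
            matrix[row][2 * categories.index(c) + offset] += 1
    return matrix


user_item = build_user_item(users, items, ratings)
item_category = build_item_category(items, categories, item_categories)
user_category = build_user_category(
    users, categories, ratings, item_categories, POSITIVE_THRESHOLD
)
```

With this layout the columns of the user-categories matrix read, from left to right: rock positive, rock negative, jazz positive, jazz negative.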

User-based Similarity

1. Explicit based - Let I' = {i(x) : x = 1, 2, 3, …, n'; n' < n} be the subset of items that users u and v have both rated. The Pearson similarity between them is computed over I'.

2. Implicit based-

Item-based Similarity

1. Explicit based-

2. Implicit based-

These are the standard formulas for calculating similarity in the user-based and item-based approaches.
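The formulas themselves did not survive in this text. For orientation, the textbook Pearson forms for the explicit case are given below; they are not necessarily the exact expressions used in the report. The symbols r_{u,i} (rating of user u for item i), \bar{r}_u and \bar{r}_i (average rating of a user and of an item) and U' (the users who rated both items) are notation introduced here. An implicit variant would presumably substitute the implicit ratings r' for r, but that is an assumption.

```latex
% User-based, explicit ratings: I' is the set of items rated by both u and v
\mathrm{sim}(u,v) =
  \frac{\sum_{i \in I'} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
       {\sqrt{\sum_{i \in I'} (r_{u,i} - \bar{r}_u)^2}\,
        \sqrt{\sum_{i \in I'} (r_{v,i} - \bar{r}_v)^2}}

% Item-based, explicit ratings: U' is the set of users who rated both i and j
\mathrm{sim}(i,j) =
  \frac{\sum_{u \in U'} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}
       {\sqrt{\sum_{u \in U'} (r_{u,i} - \bar{r}_i)^2}\,
        \sqrt{\sum_{u \in U'} (r_{u,j} - \bar{r}_j)^2}}
```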

Prediction Algorithms

The user-based prediction algorithm is based on the user's average rating plus an adjustment (a similarity-weighted sum).

UP = user's average + adjustment

For the item-based prediction algorithm, the prediction is based on the item's average rating plus an adjustment.

IP = item's average + adjustment

For each active user u(a) we can get UP and IP from the above algorithms; our final task is to merge the two as per the equation:

CR = alpha*UP + (1-alpha)*IP

This is the recommendation score we use for recommending items to the user.

By changing the value of alpha we can vary the results. In our tests, alpha = 1/2 works well under normal conditions.
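The prediction and merging steps can be sketched as below. The blend CR = alpha*UP + (1-alpha)*IP comes from the text; the concrete form of the adjustment (a similarity-weighted average of each neighbour's deviation from its own mean) is the common textbook choice and is an assumption here, as are the function and parameter names.

```python
def predict_with_adjustment(baseline_average, neighbours):
    """Average plus a similarity-weighted adjustment.

    neighbours: list of (similarity, neighbour_rating, neighbour_average).
    For UP the neighbours are similar users; for IP they are similar items.
    """
    weight_total = sum(abs(sim) for sim, _, _ in neighbours)
    if weight_total == 0:
        return baseline_average  # no usable neighbours: fall back to the average
    adjustment = sum(sim * (rating - avg) for sim, rating, avg in neighbours) / weight_total
    return baseline_average + adjustment


def combined_rating(up, ip, alpha=0.5):
    """CR = alpha*UP + (1 - alpha)*IP, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * up + (1 - alpha) * ip


# Example with made-up numbers
up = predict_with_adjustment(3.0, [(1.0, 5.0, 4.0), (0.5, 2.0, 3.0)])
cr = combined_rating(4.0, 3.0)  # alpha = 1/2 as in the text
```

Setting alpha to 1 reduces the score to the purely user-based prediction and setting it to 0 to the purely item-based one, so alpha = 1/2 weights the two equally.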

5. Dataset

Edith Law (of TagATune fame) and Olivier Gillet have put together one of the most complete MIR research datasets since uspop2002. The dataset is called MagnaTagATune. It contains:

clips: 25863

source mp3: 5405

albums: 446

artists: 230

unique tags: 188

similarity triples: 533

votes for the similarity judgments: 7650

6. References

1. Adomavicius, G. and Tuzhilin, A. (2005), “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions”

http://tagatune.org/Datasets.html

http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html

http://www.gwap.com/gwap/gamesPreview/tagatune/

2. Aucouturier, J-J. and Pachet, F. (2002), “Music similarity measures: What’s the use?”

3. P. Cano, M. Koppenberger, N. Wack, “Content-based music audio recommendation”; B. Logan, “Music recommendation from song sets”

4. G. Tzanetakis, P. Cook, “Musical genre classification of audio signals”

5. Last.fm, Pandora and Amazon services and architecture.
