Combining Audio Content and Social Context for Semantic Music Discovery
José Carlos Delgado Ramos
Universidad Católica San Pablo
I. Introduction
II. Sources of Music Information
III. Combining multiple sources of music information
IV. Experiments
Introduction
• Most music IR systems focus on either content-based analysis of audio signals…
• …or context-based analysis of webpages…
• …user preference information…
• …and social tagging data.
Tags
• Short text-based tokens
• Helpful when describing songs
Tags
• Not always accurate: the strength of the semantic association between each song and each tag may vary.
Sources of semantic information
• Surveys
• Social tagging websites
• Annotation games
Relevance of tags to songs
• May be determined by using content-based audio analysis or by text-mining associated web documents.
Main sources for information retrieval
• Audio content, Social tags and Web documents
• Audio signals are analyzed using two acoustic feature representations, related to timbre and harmony.
Sources of Music Information
• A relevance score function r(s;t) is derived; it evaluates the relevance of a song s to a tag t.
• Song-tag representations are dense if based on audio content, sparse if based on social representations.
Representing Audio Content: Supervised Multiclass Labeling (SML)
• Audio track s represented as a bag of feature vectors X = {x1,x2,…,xT}
• 1: Expectation-maximization (EM) algorithm.
• 2: Identify the set of example songs with a given tag.
• 3: Mixture-hierarchies expectation-maximization algorithm.
Representing Audio Content: Supervised Multiclass Labeling (SML)
• Given a song s, X is extracted and its likelihood is evaluated using each of the tag GMMs.
• Result: a vector of probabilities, one per tag, which gives the relevance of song s to each tag t.
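As a minimal sketch of how per-tag likelihoods could be turned into a vector of probabilities, the function below normalizes log-likelihoods with Bayes' rule. The uniform prior over tags, the name `tag_relevance`, and the example numbers are assumptions made for illustration; they are not taken from the slides.

```python
import math

def tag_relevance(log_likelihoods):
    """Convert per-tag log-likelihoods log p(X | t) into normalized scores.

    Assumes a uniform prior over tags (an assumption of this sketch), so the
    posterior P(t | X) is proportional to p(X | t).
    """
    # Subtract the largest value before exponentiating to avoid underflow.
    peak = max(log_likelihoods.values())
    unnormalized = {t: math.exp(ll - peak) for t, ll in log_likelihoods.items()}
    total = sum(unnormalized.values())
    return {t: value / total for t, value in unnormalized.items()}

# Hypothetical log-likelihoods of one song's feature bag under three tag models.
scores = tag_relevance({"rock": -1200.0, "jazz": -1203.0, "calm": -1210.0})
```

The resulting dictionary sums to one, and tags whose model explains the audio better receive higher relevance.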
Representing Audio Content: Audio feature representations
• Mel Frequency Cepstral Coefficients (MFCC): associated with musical notion of timbre.
• Chroma: represents the harmonic content (keys, chords) by computing spectral energy at frequencies corresponding to the chromatic scale.
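A rough sketch of the chroma idea: spectral energy is folded into 12 pitch classes of the chromatic scale. The 440 Hz reference tuning, equal temperament, and the `(frequency, energy)` input format are assumptions of this example; practical chroma extractors operate on short-time spectra and are more elaborate.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch-class index (0 = C, ..., 9 = A, 11 = B)."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    return (semitones_from_a4 + 9) % 12

def chroma_vector(spectrum):
    """Sum energy per pitch class; spectrum is a list of (frequency_hz, energy)."""
    chroma = [0.0] * 12
    for freq_hz, energy in spectrum:
        if freq_hz > 0:
            chroma[pitch_class(freq_hz)] += energy
    return chroma

# Hypothetical spectrum: one C (261.63 Hz) and two A notes an octave apart.
example = chroma_vector([(261.63, 1.0), (440.0, 0.5), (880.0, 0.25)])
```

Both A notes land in the same bin, which is what makes chroma insensitive to octave.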
Representing Social Context:
• Summarize each song with annotation vector over a vocabulary of tags.
• Methods for retrieving tags: social and web-mined.
• Missing song-tag pair: the tag is either not relevant, or relevant but not annotated.
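A small sketch of an annotation vector over a tag vocabulary, keeping missing song-tag pairs distinct from a score of zero. Using `None` for a missing pair, and the vocabulary and scores shown, are assumptions of this example.

```python
def annotation_vector(vocabulary, song_tag_scores):
    """Return one entry per vocabulary tag; None marks a missing song-tag pair."""
    return [song_tag_scores.get(tag) for tag in vocabulary]

# Hypothetical vocabulary and sparse tag scores for one song.
vocabulary = ["rock", "jazz", "calm", "guitar"]
vector = annotation_vector(vocabulary, {"rock": 0.9, "guitar": 0.4})
missing = [tag for tag, value in zip(vocabulary, vector) if value is None]
```

Keeping the missing entries explicit leaves it to a later stage to decide whether a missing tag means "not relevant" or "not annotated".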
Representing Social Context: Social Tags
• Last.FM: Music discovery website.
• 20 million users a month annotate 3.8 million items over 50 million times, using a universe of 1.2 million tags.
• Last.FM db: 150 million songs/16 million artists.
Representing Social Context: Social Tags
• Two lists of social Last.FM tags for each song: one relating the song to tags, and one relating the artist to tags.
• Relevance r_social(s;t) = artist-list tag scores + song-list tag scores + tag scores for synonyms or wildcard matches of t on either list.
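The sum described above can be sketched as follows. The dictionary format of the tag lists, the synonym and wildcard arguments, and the numeric scores are all hypothetical; the slides do not specify how matches are detected.

```python
import fnmatch

def social_relevance(tag, song_tags, artist_tags, synonyms=(), wildcards=()):
    """Sum the scores of every entry on either list that matches the query tag.

    song_tags / artist_tags map tag name -> score. Returns None when nothing
    matches, so a missing song-tag pair stays distinguishable from a zero score.
    """
    total = 0.0
    found = False
    for tag_list in (song_tags, artist_tags):
        for name, score in tag_list.items():
            matches = (
                name == tag
                or name in synonyms
                or any(fnmatch.fnmatch(name, pattern) for pattern in wildcards)
            )
            if matches:
                total += score
                found = True
    return total if found else None

# Hypothetical tag lists for one song and its artist.
song_tags = {"rock": 100, "classic rock": 40}
artist_tags = {"rock": 80, "british": 30}
rock_score = social_relevance("rock", song_tags, artist_tags, wildcards=("* rock",))
jazz_score = social_relevance("jazz", song_tags, artist_tags)
```

Here "rock" matches exactly on both lists and "classic rock" matches the wildcard, while "jazz" has no match and is reported as missing.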
Representing Social Context: Web-Mined Tags
• Relevance Scoring (RS) algorithm.
• The relevance function is a function of tag frequency, document frequency, number of total words in documents, etc.
• Site-specific queries on high-quality websites.
• Steps: collect a document corpus, then tag songs.
Combining multiple sources of music information
• Given a query tag t, the goal is to find a single rank ordering of songs based on relevance to t.
• Tag score, web-relevance score and convex optimization are used.
• Three algorithms, all supervised: they use labeled training data for learning.
Calibrated Score Averaging (CSA)
• Using training data, we can learn a function g() that calibrates scores such that
• To learn g(), we start with a rank-ordered training set of N songs where
• If the data is perfectly ordered, then g is isotonic. Otherwise:
Calibrated Score Averaging (CSA)
• E.g. 7 songs with relevance scores (1,2,4,5,6,7,9) and ground truth labels (0,1,0,1,1,0,1)
• Then g(r) = 0 for r < 2, g(r) = 1/2 for 2<=r<5, g(r) = 2/3 for 5<=r<9 and g(r) = 1 for 9<=r.
• A missing song-tag score suggests that the tag isn’t relevant. Instead:
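The seven-song example can be worked through with the pool-adjacent-violators (PAV) procedure, a standard way to fit a non-decreasing (isotonic) function to labels sorted by score. That CSA fits g this way is an assumption of this sketch, and the name `pav_calibrate` is illustrative.

```python
def pav_calibrate(labels):
    """Fit a non-decreasing sequence to labels that are sorted by score.

    Adjacent blocks whose means violate the ordering are pooled and
    replaced by their common mean.
    """
    blocks = []  # each block is [sum_of_labels, number_of_songs]
    for label in labels:
        blocks.append([float(label), 1])
        # Pool while the previous block's mean exceeds the current block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            label_sum, count = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
    fitted = []
    for label_sum, count in blocks:
        fitted.extend([label_sum / count] * count)
    return fitted

scores = [1, 2, 4, 5, 6, 7, 9]   # relevance scores, already sorted
labels = [0, 1, 0, 1, 1, 0, 1]   # ground truth for the query tag
calibrated = pav_calibrate(labels)
```

Each entry of `calibrated` is the fraction of relevant songs in the pooled block containing that song, so the fitted values can be read as estimated probabilities of relevance.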
Rankboost algorithm
• For a given song, each weak ranking function is an indicator function that outputs 1 if the score for the associated representation is greater than the threshold, or if the score is missing and the default value is set to 1. Otherwise it outputs 0.
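A sketch of such a weak ranker, together with the weighted sum that RankBoost uses to combine weak rankers into a final ranking score. The thresholds, defaults, and weights below are hypothetical values chosen for illustration; in RankBoost they would be learned from training data.

```python
def weak_ranker(score, threshold, default):
    """Indicator-style weak ranker for one representation.

    score is a float, or None when the song has no score for this source.
    """
    if score is None:
        return default
    return 1 if score > threshold else 0

def combined_rank_score(song_scores, rankers):
    """Weighted sum of weak rankers; rankers holds
    (source, threshold, default, weight) tuples."""
    return sum(
        weight * weak_ranker(song_scores.get(source), threshold, default)
        for source, threshold, default, weight in rankers
    )

# Hypothetical rankers over three sources.
rankers = [("audio", 0.3, 0, 0.5), ("social", 50.0, 0, 1.0), ("web", 0.1, 1, 0.25)]
song_a = {"audio": 0.4, "social": 80.0}               # web score missing
song_b = {"audio": 0.2, "social": None, "web": 0.05}  # social score missing
score_a = combined_rank_score(song_a, rankers)
score_b = combined_rank_score(song_b, rankers)
```

Song A passes both thresholds and picks up the web ranker's default of 1, so it ranks above song B.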
Kernel Combination SVM (KC-SVM)
• Linear combination of M different kernels that each encode different data features:
• Since each kernel matrix Km is positive semi-definite, their positive-weighted sum K is also a valid positive semi-definite kernel.
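A minimal sketch of the weighted kernel sum, with kernels stored as nested lists. The two small matrices and the equal weights are made-up values; in KC-SVM the weights are learned rather than fixed by hand.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 2-song kernels from an audio feature and a social feature.
k_audio = [[1.0, 0.2], [0.2, 1.0]]
k_social = [[1.0, 0.6], [0.6, 1.0]]
k_combined = combine_kernels([k_audio, k_social], [0.5, 0.5])
```

The combined matrix stays symmetric, and with non-negative weights it inherits positive semi-definiteness from its parts.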
Kernel Combination SVM (KC-SVM)
• Km represents similarities between all songs in the data set. For the audio features, after the vectors X = {x1,x2,…,xT} are obtained from MFCC and Chroma, the entries of a probability product kernel (PPK) are computed.
Kernel Combination SVM (KC-SVM)
• For each of the social context features, a radial basis function (RBF) kernel is computed, with entries:
• Where K(i,j) represents the similarity between xi and xj, the annotation vectors for songs i and j.
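A sketch of an RBF kernel entry between two annotation vectors. The Gaussian form with bandwidth `sigma` is the common textbook parametrization and is an assumption here, as are the example vectors.

```python
import math

def rbf_kernel(x_i, x_j, sigma=1.0):
    """Gaussian RBF similarity: exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# Hypothetical annotation vectors for two songs over a 3-tag vocabulary.
song_i = [0.9, 0.0, 0.4]
song_j = [0.7, 0.1, 0.0]
similarity = rbf_kernel(song_i, song_j, sigma=0.5)
```

Identical vectors give a similarity of 1, and the value decays toward 0 as the vectors move apart.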
Kernel Combination SVM (KC-SVM)
• For each tag t and corresponding class-label vector y, the primal problem for a single-kernel SVM is to find the decision boundary with maximum margin separating the two classes.
• The optimum K can be learned by minimizing the function that optimizes the dual (thereby maximizing the margin) with respect to the kernel weights.
Kernel Combination SVM (KC-SVM)
• Where e is an n-vector of ones, and the constraint on the kernel weights forces them to sum to one. C is a hyperparameter that limits violations of the margin.
Kernel Combination SVM (KC-SVM)
• The solution returns a linear decision function that defines the distance of a new song sz from the hyperplane boundary between the positive and negative classes (i.e. the relevance of sz to tag t).
• b: offset of the decision boundary from the origin.
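A sketch of how such a decision value could be computed and used to rank songs for a tag. It assumes the standard kernel-SVM form, a sum of alpha_i * y_i * K(x_i, z) over training songs plus b; the dual coefficients, labels, kernel values, and offset below are made-up numbers.

```python
def decision_value(kernel_row, alphas, labels, b):
    """Signed distance-like score of a new song from the decision boundary.

    kernel_row[i] is the combined kernel value between training song i and
    the new song; alphas are dual coefficients; labels are +1 or -1.
    """
    return sum(a * y * k for a, y, k in zip(alphas, labels, kernel_row)) + b

# Hypothetical trained model with two training songs (one positive, one negative).
alphas = [0.5, 0.5]
labels = [1, -1]
b = 0.1

# Hypothetical kernel values between each new song and the two training songs.
new_songs = {"song_x": [1.0, 0.5], "song_y": [0.2, 0.9]}
values = {name: decision_value(row, alphas, labels, b) for name, row in new_songs.items()}
ranking = sorted(values, key=values.get, reverse=True)
```

Songs with larger decision values sit further on the positive side of the boundary and are ranked as more relevant to the tag.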
Semantic Music Retrieval Experiments
• 500 songs by 500 unique artists, each annotated by a minimum of 3 individuals using a 174-tag vocabulary.
• A song is annotated with a tag if 80% of its annotators agree that the tag is relevant.
• Experiment: 72 tags associated with at least 20 songs each.
Thanks!