Igor Kostiuk “How to tame a music recommender...

Igor Kostiuk | 2016

Tags: #music, #recommender_systems, #deep_learning, #neural_networks, #mel_spectrograms

How to train your music recommender system

Recommender systems are a family of methods that seek to predict the rating or preference that a user would give to an item © Wiki

Is there something similar to something else?

There are two common ways to make recommendations.

Collaborative filtering

- cold start problem (requires a large amount of information on a user in order to make accurate recommendations)

- will not recommend rare or new songs, games, etc. (popular items will be much easier to recommend than unpopular items)

- bad scalability

+ content-agnostic

Example: Last.fm recommends music based on a comparison of the listening habits of similar users.
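To make "comparison of the listening habits of similar users" concrete, here is a minimal sketch of user-to-user collaborative filtering with cosine similarity over play counts. The user names, song names, and counts are invented for illustration, and this is not a description of Last.fm's actual algorithm.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse play-count dicts {song: count}."""
    shared = set(a) & set(b)
    dot = sum(a[s] * b[s] for s in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(target, others, top_n=3):
    """Score songs the target has not heard by similarity-weighted play counts."""
    scores = {}
    for history in others:
        sim = cosine_similarity(target, history)
        for song, count in history.items():
            if song not in target:
                scores[song] = scores.get(song, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical listening histories (song -> play count).
alice = {"song_a": 5, "song_b": 3}
bob = {"song_a": 4, "song_b": 2, "song_c": 6}
carol = {"song_d": 7}

recommendations = recommend(alice, [bob, carol])
```

A song that nobody has played yet never appears in any history, so it can never receive a score. This is the "will not recommend rare or new songs" limitation in miniature.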


Popularity

Content-based filtering

- can only make recommendations that are similar to the original seed

- semantic gap between audio or video, and the various aspects of music / movie that affect user preferences (genre, mood)

- obvious recommendations (Doom → Doom 4, etc.)


Nothing is more similar to a tea kettle than another tea kettle.

Approaches

1. Automatic generation of social tags

Social tags are user-generated keywords associated with a song. Predicting these social tags directly from MP3 files avoids the "cold-start problem". Using a set of one-vs-all classifiers, one per tag, we can map audio features onto social tags collected from the Web.

2. Music genre classification

Attempt to classify songs into a set of genre classes. Clustering: each cluster represents a specific genre. Each cluster is labeled by "majority vote", that is, with the genre that was most common in that cluster.
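The majority-vote labeling step can be sketched as follows. This is a minimal illustration that assumes the clustering has already been done; the cluster assignments and genre names are made-up sample data.

```python
from collections import Counter

def label_clusters(cluster_ids, genres):
    """Map each cluster id to the most common genre among its members."""
    members = {}
    for cluster_id, genre in zip(cluster_ids, genres):
        members.setdefault(cluster_id, []).append(genre)
    return {cid: Counter(g).most_common(1)[0][0] for cid, g in members.items()}

# Hypothetical clustering output and known genres for six songs.
cluster_ids = [0, 0, 0, 1, 1, 1]
genres = ["rock", "rock", "pop", "jazz", "pop", "jazz"]
cluster_labels = label_clusters(cluster_ids, genres)
```

A new song would then receive the label of whichever cluster it falls into.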

https://en.wikipedia.org/wiki/Mel-frequency_cepstrum


Deep Learning approach

Predicting listening preferences from audio signals by training a regression model to predict the latent representations of songs that were obtained from a collaborative filtering model.

Data (from a collaborative filtering model) → latent factor vector extraction (matrix factorization) → output

Data (raw MP3) → mel-spectrogram extraction → input

Deep neural network: input → prediction

Advantages

+ Effectiveness in recommending new and unpopular songs

+ Good recommendations despite the semantic gap

Development stages

Data retrieval

The Echo Nest Taste Profile Subset

http://labrosa.ee.columbia.edu/millionsong/tasteprofile

user_id song_id play_count
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBSUJE12A6D4F8CF5 2
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBVFZR12A6D4F8AE3 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXALG12A8C13C108 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXHDL12A81C204C0 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBYHAJ12A6701BF1D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOCNMUH12A6D4F6E6D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODACBL12A8C13C273 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 5
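Each record in the sample is a (user ID, song ID, play count) triplet. A minimal sketch of loading such records into a per-user dictionary might look like this. It assumes whitespace-separated fields; the function name `parse_triplets` is made up for this example, and only three of the records are reproduced here.

```python
from collections import defaultdict

SAMPLE = """\
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBSUJE12A6D4F8CF5 2
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBVFZR12A6D4F8AE3 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 5
"""

def parse_triplets(lines):
    """Return {user_id: {song_id: play_count}} from 'user song count' lines."""
    plays = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        user_id, song_id, count = parts
        plays[user_id][song_id] = int(count)
    return plays

plays = parse_triplets(SAMPLE.splitlines())
user = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"
total_plays = sum(plays[user].values())
```

For the full dataset of about 48 million triplets, a sparse matrix would likely be a better fit than nested dictionaries.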

The Taste Profile Subset is big. Some numbers:

1,019,318 unique users
384,546 unique MSD songs
48,373,586 user - song - play count triplets


Data retrieval

https://www.7digital.com/

We were able to obtain 29-second audio clips for over 99% of the dataset.

The original dataset has no raw audio, only precomputed, poorly documented features.


Weighted matrix factorization

https://youtu.be/o8PiWO8C3zs

[Figure: matrix with axes user_id and song_id]


R ≈ P * Q

R – rating matrix, m × n (m users, n songs)
P – user matrix, m × f
Q – song matrix, f × n
f – number of features
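The factorization R ≈ P * Q can be sketched with alternating least squares in the style of the implicit-feedback paper listed in the references. This is an illustrative sketch, not the code used in the talk: the toy play-count matrix, the confidence weighting `1 + alpha * r`, and the values of `f`, `alpha`, and `reg` are all assumptions made for the example.

```python
import numpy as np

def wmf_loss(R, P, Q, alpha, reg):
    """Confidence-weighted squared error plus L2 regularization."""
    pref = (R > 0).astype(float)
    conf = 1.0 + alpha * R
    return float(np.sum(conf * (pref - P @ Q) ** 2)
                 + reg * (np.sum(P ** 2) + np.sum(Q ** 2)))

def wmf_als(R, f=2, alpha=40.0, reg=0.1, n_iters=15, seed=0):
    """Factorize play counts R (m x n) into P (m x f) and Q (f x n)."""
    m, n = R.shape
    rng = np.random.default_rng(seed)
    P = 0.01 * rng.standard_normal((m, f))
    Q = 0.01 * rng.standard_normal((f, n))
    pref = (R > 0).astype(float)   # binary preference: played or not
    conf = 1.0 + alpha * R         # confidence grows with play count
    ridge = reg * np.eye(f)
    losses = []
    for _ in range(n_iters):
        # Fix Q and solve a small weighted ridge regression per user.
        for u in range(m):
            weighted = Q * conf[u]                      # (f, n)
            P[u] = np.linalg.solve(weighted @ Q.T + ridge,
                                   weighted @ pref[u])
        # Fix P and solve the same kind of problem per song.
        for i in range(n):
            weighted = P.T * conf[:, i]                 # (f, m)
            Q[:, i] = np.linalg.solve(weighted @ P + ridge,
                                      weighted @ pref[:, i])
        losses.append(wmf_loss(R, P, Q, alpha, reg))
    return P, Q, losses

# Toy play counts: 4 users x 4 songs, two rough taste groups.
R = np.array([[5, 3, 0, 0],
              [4, 0, 0, 0],
              [0, 0, 2, 6],
              [0, 0, 1, 4]], dtype=float)
P, Q, losses = wmf_als(R)
scores = P @ Q   # predicted preference for every user-song pair
```

Because each half-step solves its least-squares problem exactly, the recorded loss should never increase from one iteration to the next. The columns of Q are the per-song latent factors.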


Alternating Least Squares


http://mendeley.github.io/mrec/

https://github.com/benanne/wmf

https://github.com/benanne/theano_wmf


[Figure: error vs. iteration]

http://benanne.github.io/2014/08/05/spotify-cnns.html

Mel-spectrograms

A mel-spectrogram is a kind of time-frequency representation.

It is obtained from an audio signal by computing the Fourier transforms of short, overlapping windows.

Finally, the frequency axis is changed from a linear scale to a mel scale.
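The change from a linear to a mel frequency axis can be illustrated with one commonly used conversion formula (the variant with constants 2595 and 700; other variants exist, and the slides do not say which one was used). The helper names below are made up for this sketch.

```python
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    """Edges (in Hz) of n_bands overlapping bands, equally spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

edges = mel_band_edges(n_bands=8, f_min=0.0, f_max=8000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]
```

Equal steps in mels turn into progressively wider steps in Hz, so low frequencies get finer resolution than high ones.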

https://en.wikipedia.org/wiki/Mel_scale


Mel-spectrograms

series = np.sin(time)

# filename = "The Prodigy - Invaders Must Die.mp3"
# filename = "Lady GaGa - Poker Face.mp3"

Mel-spectrograms

We used log-compressed mel-spectrograms with 128 components, a window size of 1024 audio frames, and a hop size of 512 audio frames.
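The talk links to librosa's `librosa.feature.melspectrogram` for this step. To show what such a function does internally, here is a NumPy-only sketch with the same window size, hop size, and number of mel components. It is a simplified approximation, not librosa's implementation: the sample rate of 22,050 Hz, the Hann window, the unpadded framing, the mel formula, and the `log1p` compression are all assumptions of this example.

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters equally spaced in mels, shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=128):
    """Return a (n_frames, n_mels) log-compressed mel-spectrogram."""
    # Short, overlapping windows (no padding at the edges, for simplicity).
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    # Fourier transform of every window, then power spectrum.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Linear frequency axis -> mel axis, then log compression.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log1p(mel)

sr = 22050                                   # assumed sample rate
time = np.arange(sr) / sr                    # one second of samples
series = np.sin(2 * np.pi * 440.0 * time)    # 440 Hz test tone
S = log_mel_spectrogram(series, sr)
peak_band = int(S.mean(axis=0).argmax())
```

For a pure tone, the energy concentrates in a few neighbouring mel bands around the tone's frequency, which is a quick sanity check for the filter bank.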

https://github.com/librosa/librosa

http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram




T-SNE

https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding


[Figure panels: 1024, 1024 * 2, 1024 * 4]

Mel-spectrograms

Convolutional neural network

A baseline deep neural network architecture could consist of two convolutional layers and two fully connected layers.


[Figure: Filters. Labels: 259 x 128 x 1, 4 x 128 x 1, 259 x 4 x 32, 4s, 0.0029s]


The network can be trained on windows of 3 seconds sampled randomly from the audio clips.
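Sampling random 3-second windows can be sketched as below. The slides do not state the sample rate; assuming 44,100 Hz and a hop of 512 samples with centered (padded) framing, 3 seconds correspond to 1 + (3 * 44100) // 512 = 259 frames, which agrees with the 259 x 128 input size that appears in the slides. The random array stands in for a real log-mel clip, and the helper names are made up for this example.

```python
import numpy as np

SR = 44100   # assumed sample rate (not stated in the slides)
HOP = 512    # hop size from the slides

def frames_for(seconds, sr=SR, hop=HOP):
    """Number of spectrogram frames for a clip, assuming centered framing."""
    return 1 + int(seconds * sr) // hop

def random_window(spectrogram, n_frames, rng):
    """Cut a random contiguous window of n_frames from a (frames, mels) array."""
    start = rng.integers(0, spectrogram.shape[0] - n_frames + 1)
    return spectrogram[start:start + n_frames]

rng = np.random.default_rng(0)
clip = rng.random((frames_for(29), 128))   # stand-in for a 29-second clip
window = random_window(clip, frames_for(3), rng)
```

Drawing a fresh window each time a clip is visited gives the network many slightly different training examples from the same 29-second clip.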

The last layer of the network is the output layer, which predicts the 40 latent factors obtained from collaborative filtering.

http://www.slideshare.net/erikbern/music-recommendations-mlconf-2014

Album cover based models

1) series = (np.sin(time) - np.sin(time / np.pi))
https://www.google.com.ua/#q=y+%3D+sin%28x%29+-+sin%28x+%2F+pi%29

2) Deep content-based music recommendation
http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf

3) Collaborative Filtering for Implicit Feedback Datasets
http://yifanhu.net/PUB/cf.pdf

4) Alternating Least Squares Method for Collaborative Filtering
http://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/

5) Recommending music on Spotify with deep learning
http://benanne.github.io/2014/08/05/spotify-cnns.html

6) *
http://papers.nips.cc/paper/3370-automatic-generation-of-social-tags-for-music-recommendation.pdf
http://cs229.stanford.edu/proj2013/FauciCastSchulze-MusicGenreClassification.pdf
http://ismir2011.ismir.net/papers/PS6-10.pdf
http://erikbern.com/2013/12/20/more-insight-into-recommender-algorithms/
http://www.slideshare.net/irecsys/matrix-factorization-in-recommender-systems


Let’s stay in touch:

Facebook

https://www.facebook.com/neverdraw

LinkedIn

https://www.linkedin.com/in/awesomengineer

Github

https://github.com/spaceuniverse


Thanks

