DEEP LEARNING FOR LARGE SCALE MUSIC RECOMMENDATION

DEEP LEARNING FOR LARGE SCALE MUSIC RECOMMENDATION

ORIOL NIETO [email protected]

BIOSTAT SEMINAR STANFORD, CA

OCTOBER 20, 2016

mailto:[email protected]?subject=

• Pandora Overview

• Music Informatics Overview

• Large Scale Music Recommendation with Deep Learning: • Machine Listening • Collaborative Filtering

Outline

BE#THE#EFFORTLESS#SOURCE#OF#PERSONALIZED#MUSIC#ENJOYMENT

Today

~80$MMAUs

24$HPer-Month

10$BStations

75BThumbs

>3$MSongs

130$KArtists&spinningevery&month

Today

80#MMAUs

24#HPer-Month

11#BStations

75#BThumbs

>3#MSongs

180#KArtists&spinningevery&month

Content&BasedCollective)Intelligence

PersonalizedFiltering




Outline

Music Informatics(AKA MUSIC INFORMATION RETRIEVAL)


Beats


BeatsDownbeats


Beats

Am Am/G F#m Fmaj7 Am G D

ChordsDownbeats


Beats


ChordsDownbeats

Verse

Structure


Beats


ChordsDownbeats

Verse

Structure

“electric guitar”,

“drums”,

“male singer”,

“pop”,

“classic rock”

Music InformaticsTHE BASICS

Music InformaticsTHE BASICS

Any sound can be built out of sinusoids

Discrete Fourier Transform

X = Wx

x 2 Rn

W 2 Cn⇥n

X 2 Cn

Discrete Fourier TransformEXAMPLES

Any sound can be built out of sinusoidsX = Wx

x (time domain) |X| (frequency domain)

Short Time Fourier Transform

x (time domain)

|X| (freq domain)

STFTfre

quency

time

Short Time Fourier TransformMY GUITAR GENTLY WEEPS -‐ THE BEATLES

Short Time Fourier TransformCLOCKWORK -‐ MESHUGGAH

Short Time Fourier TransformTHE WARNING -‐ NINE INCH NAILS

Extracting Information

Beats

ChordsDownbeats

Structure


“drums”,

“male singer”,

“pop”,

“classic rock”

Hand Crafted Features

Beats

ChordsDownbeats

Structure


“drums”,

“male singer”,

“pop”,

“classic rock”

MFCCs Spectral Rolloff Spectral Centroid Structural Features Chromagrams Tempograms

CERNs …

Learned Features

http://blogs-‐images.forbes.com/kevinmurnane/files/2016/03/google-‐deepmind-‐artificial-‐intelligence-‐2-‐970x0-‐970x646.jpg

Beats

ChordsDownbeats

Structure


“drums”,

“male singer”,

“pop”,

“classic rock”

DEEP LEARNING (Mostly Convolutional Neural Networks)




Outline

The Music Genome Project

>1.5 Million tracks manually analyzed

~400 attributes per track

https://static01.nyt.com/images/2009/10/18/magazine/18pandora.2-‐650.jpg

Attribute Examples

Breathy Voice Nasal Voice Odd Meter Has Banjo Joyful Lyrics

…

Recommending Music using the MGP

Ranked Tracks

Attributes

Query Track

Recommending Music using the MGPEXAMPLE

Arhst Title

Query Track Meshuggah Bleed

Ranked 1 Meshuggah Lethargica

Ranked 2 Meshuggah Pravus

Ranked 5 Mithras Into Black Holes Of Oblivion

Arhst Title

Query Track The Beatles While My Guitar Gently Weeps

Ranked 1 Paul McCartney Freedom

Ranked 2 Badly Drawn Boy What Is It Now

Ranked 3 Jefferson Starship Fading Lady Light

Recommending Music using the MGPEXAMPLE

USING DEEP LEARNINGEstimating MGP Attributes


USING DEEP LEARNINGEstimating MGP Piano Attribute


https://static.squarespace.com/static/539a01fae4b053c5aa5435dd/t/539a2575e4b0e16ced73e69f/1402611061234/11357509244_0e8af7a4ff_o.jpg

USING DEEP LEARNINGPiano Detector

USING DEEP LEARNINGPiano Detector

128 bins

N N N

Patch 1 Patch 3 Patch 2

DEEP ARCHITECTUREPiano Detector

128

N

X


256

128

N

X

f(X, ✓) = �(W ⇤X + b)

Convolutional Layer


256

128

N

X256

f(X, ✓) = �2(W2 ⇤ �1(W1 ⇤X + b1) + b2)

Convolutional Layers


128

N

X

256256

512 512


f(X, ✓) = �4(W4 ⇤ . . . ⇤ �1(W1 ⇤X + b1) + . . .+ b4)


128

N

X

256256

512 512


2048

Dense Layer

f(X, ✓) = �5(W1�4(W4 ⇤ . . . ⇤ �1(W1 ⇤X + b1) + . . .+ b5)


128

N

X

256256

512 512


2048

f(X, ✓) = �6(W2�5(W1�4(W4 ⇤ . . . ⇤ �1(W1 ⇤X + b1) + . . . b6)

2048

Dense Layers


256256

512 512

2048

2048128

N

Convolutional Layers Dense Layers

f(X, ✓) = �7(W3�6(W2�5(W1�4(W4 ⇤ . . . ⇤ �1(W1 ⇤X + b1) + . . .+ b7)

TRAININGPiano Detector

480k Patches

60k Patches

60k Patches

Training

Validation

Test(Different model)

ESTIMATING FULL TRACKSPiano Detector

RESULTSPiano Detector

OUTPUT EXAMPLESPiano Detector

Then, by Dewey Redman Quartet.

OUTPUT EXAMPLESPiano Detector

Contract On The World Love Jam by Public Enemy




Outline

PROBLEM OVERVIEWCollaborative Filtering

Users

? ?

? ? ?

?

? ?

? ?[Items (Tracks) [

Explicit

Thumbs (up and down)

Station Creation

Implicit

Track Completion

Track Skips

LATENT FACTORSCollaborative Filtering

AggressiveCalm

Simple Harmony

Complex Harmony

PROBLEM FORMULATIONCollaborative Filtering

qi 2 Rf

Given Item i and User u:

pu 2 Rf

Rating:

r̂ui = qTi pu

rui

Item Latent Factor:

User Latent Factor:

Rating Approximation:

Across all data:

? ?? ? ?

?? ?

? ?[ [Users

Item

s (Tr

acks

)

⇡[ [ [[f

f

Users

Item

s (Tr

acks

)

MATRIX FACTORIZATIONCollaborative Filtering

argminq⇤,p⇤X

u,i2S(rui � qTi pu)

2+�(||qi||2 + ||pu||2)

(Koren et al., 2009)

? ?? ? ?

?? ?

? ?[ [Users

Item

s (Tr

acks

)

⇡[ [ [[f

f

Users

Item

s (Tr

acks

)

Collaborative Filtering

Arhst Title

Query Track The Beatles While My Guitar Gently Weeps

Ranked 1 The Beatles A Day In The Life

Ranked 2 The Beatles A Day In The Life (Love Version)

Ranked 3 The BeatlesWhile My Guitar Gently Weeps

(Love Version)

EXAMPLE

THE GOOD AND THE BADCollaborative Filtering

Rich preference-‐driven similarity space Latent space is generally not interpretable

Powerful at matching the right song with the right listener

Can only recommend items that have already been rated

WITH DEEP LEARNINGApproximating Item Factors

128

N

f

256256

512 512

2048

2048

Convolutional Layers Dense Layers

(van den Oord et al., 2013)

TRAININGApproximating Item Factors

700k Patches

70k Patches

70k Patches

Training

Validation

Test

ESTIMATING FULL TRACKSApproximating Item Factors

Recommending from Learned Factors

Arhst Title

Query Track La Bossa d’Urina Mama i Papa

Ranked 1 Johnny Cash I'm Going To Memphis

Ranked 2 Merle Haggard The Woman Made A Fool Out Of Me

Ranked 3 Waylon Jennings Lonesome On'ry And Mean

EXAMPLE



? ?

? ? ?

?

? ?

? ?

ENSEMBLE OF RECOMMENDERS MAY PRODUCE OPTIMAL RECOMMENDATIONS

MAN vs MACHINE?

MAN + MACHINE

MAN + MACHINE“Mix,of,Art,and,Science”

THANKS!

References

Van Den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kavukcuoglu, K. (2016). Wavenet: A Generative Model for Raw Audio, 846–849. Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-‐Scale Image Recognition. International Conference on Learning Representations, 1–14. http://doi.org/10.1016/j.infsof.2008.09.005 Mcfee, B., Humphrey, E. J., & Bello, J. P. (2015). A Software Framework For Musical Data Augmentation. In Proc. of the 16th International Society for Music Information Retrieval Conference (pp. 248–254). Málaga, Spain. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49. http://doi.org/10.1109/MC.2009.263 Oord, A. Van Den, Dieleman, S., & Schrauwen, B. (2013). Deep Content-‐based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651. Retrieved from http://papers.nips.cc/paper/5004-‐deep-‐content-‐based-‐music-‐recommendation

http://doi.org/10.1016/j.infsof.2008.09.005

http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation

DEEP LEARNING FOR LARGE SCALE MUSIC RECOMMENDATION

Documents

Transcript of DEEP LEARNING FOR LARGE SCALE MUSIC RECOMMENDATION