DEEP LEARNING FOR LARGE SCALE MUSIC RECOMMENDATION
Transcript of DEEP LEARNING FOR LARGE SCALE MUSIC RECOMMENDATION
DEEP LEARNING FOR LARGE SCALE MUSIC RECOMMENDATION
ORIOL NIETO [email protected]
BIOSTAT SEMINAR STANFORD, CA
OCTOBER 20, 2016
• Pandora Overview
• Music Informatics Overview
• Large Scale Music Recommendation with Deep Learning: • Machine Listening • Collaborative Filtering
Outline
• Pandora Overview
• Music Informatics Overview
• Large Scale Music Recommendation with Deep Learning: • Machine Listening • Collaborative Filtering
Outline
BE#THE#EFFORTLESS#SOURCE#OF#PERSONALIZED#MUSIC#ENJOYMENT
Today
~80$MMAUs
24$HPer-Month
10$BStations
75BThumbs
>3$MSongs
130$KArtists&spinningevery&month
Today
80#MMAUs
24#HPer-Month
11#BStations
75#BThumbs
>3#MSongs
180#KArtists&spinningevery&month
Content&BasedCollective)Intelligence
PersonalizedFiltering
• Pandora Overview
• Music Informatics Overview
• Large Scale Music Recommendation with Deep Learning: • Machine Listening • Collaborative Filtering
Outline
Music Informatics(AKA MUSIC INFORMATION RETRIEVAL)
Music Informatics(AKA MUSIC INFORMATION RETRIEVAL)
Beats
Music Informatics(AKA MUSIC INFORMATION RETRIEVAL)
BeatsDownbeats
Music Informatics(AKA MUSIC INFORMATION RETRIEVAL)
Beats
Am Am/G F#m Fmaj7 Am G D
ChordsDownbeats
Music Informatics(AKA MUSIC INFORMATION RETRIEVAL)
Beats
Am Am/G F#m Fmaj7 Am G D
ChordsDownbeats
Verse
Structure
Music Informatics(AKA MUSIC INFORMATION RETRIEVAL)
Beats
Am Am/G F#m Fmaj7 Am G D
ChordsDownbeats
Verse
Structure
“electric guitar”,
“drums”,
“male singer”,
“pop”,
“classic rock”
Music InformaticsTHE BASICS
Music InformaticsTHE BASICS
Any sound can be built out of sinusoids
Discrete Fourier Transform
X = Wx
x 2 Rn
W 2 Cn⇥n
X 2 Cn
Discrete Fourier TransformEXAMPLES
Any sound can be built out of sinusoidsX = Wx
x (time domain) |X| (frequency domain)
Short Time Fourier Transform
x (time domain)
|X| (freq domain)
STFTfre
quency
time
Short Time Fourier TransformMY GUITAR GENTLY WEEPS -‐ THE BEATLES
Short Time Fourier TransformCLOCKWORK -‐ MESHUGGAH
Short Time Fourier TransformTHE WARNING -‐ NINE INCH NAILS
Extracting Information
Beats
ChordsDownbeats
Structure
“electric guitar”,
“drums”,
“male singer”,
“pop”,
“classic rock”
Hand Crafted Features
Beats
ChordsDownbeats
Structure
“electric guitar”,
“drums”,
“male singer”,
“pop”,
“classic rock”
MFCCs Spectral Rolloff Spectral Centroid Structural Features Chromagrams Tempograms
CERNs …
Learned Features
http://blogs-‐images.forbes.com/kevinmurnane/files/2016/03/google-‐deepmind-‐artificial-‐intelligence-‐2-‐970x0-‐970x646.jpg
Beats
ChordsDownbeats
Structure
“electric guitar”,
“drums”,
“male singer”,
“pop”,
“classic rock”
DEEP LEARNING (Mostly Convolutional Neural Networks)
• Pandora Overview
• Music Informatics Overview
• Large Scale Music Recommendation with Deep Learning: • Machine Listening • Collaborative Filtering
Outline
Content&BasedCollective)Intelligence
PersonalizedFiltering
The Music Genome Project
>1.5 Million tracks manually analyzed
~400 attributes per track
https://static01.nyt.com/images/2009/10/18/magazine/18pandora.2-‐650.jpg
Attribute Examples
Breathy Voice Nasal Voice Odd Meter Has Banjo Joyful Lyrics
…
Recommending Music using the MGP
Ranked Tracks
Attributes
Query Track
Recommending Music using the MGPEXAMPLE
Arhst Title
Query Track Meshuggah Bleed
Ranked 1 Meshuggah Lethargica
Ranked 2 Meshuggah Pravus
Ranked 5 Mithras Into Black Holes Of Oblivion
Arhst Title
Query Track The Beatles While My Guitar Gently Weeps
Ranked 1 Paul McCartney Freedom
Ranked 2 Badly Drawn Boy What Is It Now
Ranked 3 Jefferson Starship Fading Lady Light
Recommending Music using the MGPEXAMPLE
Arhst Title
Query Track The Beatles While My Guitar Gently Weeps
Ranked 1 Paul McCartney Freedom
Ranked 2 Badly Drawn Boy What Is It Now
Ranked 3 Jefferson Starship Fading Lady Light
Recommending Music using the MGPEXAMPLE
Arhst Title
Query Track The Beatles While My Guitar Gently Weeps
Ranked 1 Paul McCartney Freedom
Ranked 2 Badly Drawn Boy What Is It Now
Ranked 3 Jefferson Starship Fading Lady Light
Recommending Music using the MGPEXAMPLE
USING DEEP LEARNINGEstimating MGP Attributes
http://blogs-‐images.forbes.com/kevinmurnane/files/2016/03/google-‐deepmind-‐artificial-‐intelligence-‐2-‐970x0-‐970x646.jpg
USING DEEP LEARNINGEstimating MGP Piano Attribute
http://blogs-‐images.forbes.com/kevinmurnane/files/2016/03/google-‐deepmind-‐artificial-‐intelligence-‐2-‐970x0-‐970x646.jpg
https://static.squarespace.com/static/539a01fae4b053c5aa5435dd/t/539a2575e4b0e16ced73e69f/1402611061234/11357509244_0e8af7a4ff_o.jpg
USING DEEP LEARNINGPiano Detector
USING DEEP LEARNINGPiano Detector
128 bins
N N N
Patch 1 Patch 3 Patch 2
DEEP ARCHITECTUREPiano Detector
128
N
X
DEEP ARCHITECTUREPiano Detector
256
128
N
X
f(X, ✓) = �(W ⇤X + b)
Convolutional Layer
DEEP ARCHITECTUREPiano Detector
256
128
N
X256
f(X, ✓) = �2(W2 ⇤ �1(W1 ⇤X + b1) + b2)
Convolutional Layers
DEEP ARCHITECTUREPiano Detector
128
N
X
256256
512 512
Convolutional Layers
f(X, ✓) = �4(W4 ⇤ . . . ⇤ �1(W1 ⇤X + b1) + . . .+ b4)
DEEP ARCHITECTUREPiano Detector
128
N
X
256256
512 512
Convolutional Layers
2048
Dense Layer
f(X, ✓) = �5(W1�4(W4 ⇤ . . . ⇤ �1(W1 ⇤X + b1) + . . .+ b5)
DEEP ARCHITECTUREPiano Detector
128
N
X
256256
512 512
Convolutional Layers
2048
f(X, ✓) = �6(W2�5(W1�4(W4 ⇤ . . . ⇤ �1(W1 ⇤X + b1) + . . . b6)
2048
Dense Layers
DEEP ARCHITECTUREPiano Detector
256256
512 512
2048
2048128
N
Convolutional Layers Dense Layers
f(X, ✓) = �7(W3�6(W2�5(W1�4(W4 ⇤ . . . ⇤ �1(W1 ⇤X + b1) + . . .+ b7)
TRAININGPiano Detector
480k Patches
60k Patches
60k Patches
Training
Validation
Test(Different model)
ESTIMATING FULL TRACKSPiano Detector
RESULTSPiano Detector
OUTPUT EXAMPLESPiano Detector
Then, by Dewey Redman Quartet.
OUTPUT EXAMPLESPiano Detector
Contract On The World Love Jam by Public Enemy
• Pandora Overview
• Music Informatics Overview
• Large Scale Music Recommendation with Deep Learning: • Machine Listening • Collaborative Filtering
Outline
Content&BasedCollective)Intelligence
PersonalizedFiltering
PROBLEM OVERVIEWCollaborative Filtering
Users
? ?
? ? ?
?
? ?
? ?[Items (Tracks) [
Explicit
Thumbs (up and down)
Station Creation
Implicit
Track Completion
Track Skips
LATENT FACTORSCollaborative Filtering
AggressiveCalm
Simple Harmony
Complex Harmony
PROBLEM FORMULATIONCollaborative Filtering
qi 2 Rf
Given Item i and User u:
pu 2 Rf
Rating:
r̂ui = qTi pu
rui
Item Latent Factor:
User Latent Factor:
Rating Approximation:
Across all data:
? ?? ? ?
?? ?
? ?[ [Users
Item
s (Tr
acks
)
⇡[ [ [[f
f
Users
Item
s (Tr
acks
)
MATRIX FACTORIZATIONCollaborative Filtering
argminq⇤,p⇤X
u,i2S(rui � qTi pu)
2+�(||qi||2 + ||pu||2)
(Koren et al., 2009)
? ?? ? ?
?? ?
? ?[ [Users
Item
s (Tr
acks
)
⇡[ [ [[f
f
Users
Item
s (Tr
acks
)
Collaborative Filtering
Arhst Title
Query Track The Beatles While My Guitar Gently Weeps
Ranked 1 The Beatles A Day In The Life
Ranked 2 The Beatles A Day In The Life (Love Version)
Ranked 3 The BeatlesWhile My Guitar Gently Weeps
(Love Version)
EXAMPLE
THE GOOD AND THE BADCollaborative Filtering
Rich preference-‐driven similarity space Latent space is generally not interpretable
Powerful at matching the right song with the right listener
Can only recommend items that have already been rated
Content&BasedCollective)Intelligence
PersonalizedFiltering
WITH DEEP LEARNINGApproximating Item Factors
128
N
f
256256
512 512
2048
2048
Convolutional Layers Dense Layers
(van den Oord et al., 2013)
TRAININGApproximating Item Factors
700k Patches
70k Patches
70k Patches
Training
Validation
Test
ESTIMATING FULL TRACKSApproximating Item Factors
Recommending from Learned Factors
Arhst Title
Query Track La Bossa d’Urina Mama i Papa
Ranked 1 Johnny Cash I'm Going To Memphis
Ranked 2 Merle Haggard The Woman Made A Fool Out Of Me
Ranked 3 Waylon Jennings Lonesome On'ry And Mean
EXAMPLE
Content&BasedCollective)Intelligence
PersonalizedFiltering
? ?
? ? ?
?
? ?
? ?
ENSEMBLE OF RECOMMENDERS MAY PRODUCE OPTIMAL RECOMMENDATIONS
MAN vs MACHINE?
MAN + MACHINE
MAN + MACHINE“Mix,of,Art,and,Science”
THANKS!
References
Van Den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kavukcuoglu, K. (2016). Wavenet: A Generative Model for Raw Audio, 846–849. Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-‐Scale Image Recognition. International Conference on Learning Representations, 1–14. http://doi.org/10.1016/j.infsof.2008.09.005 Mcfee, B., Humphrey, E. J., & Bello, J. P. (2015). A Software Framework For Musical Data Augmentation. In Proc. of the 16th International Society for Music Information Retrieval Conference (pp. 248–254). Málaga, Spain. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49. http://doi.org/10.1109/MC.2009.263 Oord, A. Van Den, Dieleman, S., & Schrauwen, B. (2013). Deep Content-‐based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651. Retrieved from http://papers.nips.cc/paper/5004-‐deep-‐content-‐based-‐music-‐recommendation