HitPredict: Using Spotify Data to Predict Billboard Hitsegeorgie/HitPredict/ICML2020.pdf · 2020....
HitPredict: Using Spotify Data to Predict Billboard Hits
Elena Georgieva (1), Marcella Suta (2), Nicholas Burton (2)
(1) Stanford University Center for Computer Research in Music and Acoustics (CCRMA)
(2) Stanford University Department of Civil and Environmental Engineering
Overview
§ Inspiration
§ Spotify Audio Features
§ Dataset Selection
§ Machine Learning Approaches
§ Next Steps
We approach the “Hit Song Science” problem, aiming to predict which songs will become Billboard Hits.
Inspiration
Spotify Audio Features - NY Times
The New York Times used Spotify’s API to gather information on songs, reporting five features:
1. Loudness: “Volume of the song”
2. Energy: “How fast and noisy the song sounds”
3. Danceability: “Strength and regularity of the beat”
4. Acousticness: “Likelihood that the song uses acoustic instruments”
5. Valence: “How cheerful the song sounds”
Spotify Audio Features - Others
The New York Times chose to omit several available features from the Spotify API:
1. Speechiness: “How much spoken words are in a track”
2. Instrumentalness: “Detects whether a track contains no vocals”
3. Liveness: “Detects whether the track was performed live”
4. Tempo: “The beats per minute of a track”
5. Duration: “Duration of the track in minutes”
6. Mode (Major/Minor)
7. Key or Tonality
8. Time Signature
Top Songs
Using Spotify Audio Features to Study the Evolution of Pop Music
Elena Georgieva and Blair Kaneshiro
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University
WiMIR 1st Annual Workshop | Paris, France 2018
Abstract
Popular music is a symbol of culture, and is often looked at as a symbol of a time period or a generation. While there is much research on the evolution of pop music, most such research is anecdotal rather than scientific in nature [3]. We investigate the top 5 songs on the Billboard Hot 100 in the first week of September of the years 2018, 2008, 1998, and 1988 [1]. Nine audio features were taken from Spotify’s API [2]. Initial observations show that, on average, popular songs are becoming more danceable, louder, and shorter in length. Notably, tracks are showing more variety after a heavy similarity across all features in 2008.
Data
Discussion
Each of the first 6 features is scaled to [0, 1], where 0.5 is the average amount of that quality across all tracks on Spotify. Looking at these sonic footprints, the five top tracks are rather varied in 1988, 1998, and 2018, but strikingly similar in 2008. The ‘Valence’ and ‘Energy’ of top tracks seem to be decreasing as time progresses, while ‘Danceability’ is increasing. Values of ‘Liveness’, ‘Acousticness’, and ‘Speechiness’ are low across all tracks. The last 3 features are measured in physical units: tempo (BPM), song length (minutes), and average loudness (dB). Song length of top tracks has been steadily decreasing over the years, while loudness increased until 2008 and then fell back somewhat in 2018. Song tempo values varied very little in 2008, when tracks seemed to be most similar.
Further Research
In the future, we will look at the initial observations in more depth and investigate potential causes and repercussions of the observed trends. Furthermore, we will look at top songs in 5-year increments, dating back to the 1960s. This research has the potential to predict trends in future popular music, as well as which styles and recorded tracks will be commercially successful.
References
1. Billboard. (2018). Billboard Hot 100 Chart. Retrieved from: https://www.billboard.com/charts/hot-100
2. Chinoy, S. and Ma, J. (2018). Why Songs of the Summer Sound the Same. Nytimes.com. Retrieved from: https://www.nytimes.com/interactive/2018/08/09/opinion/do-songs-of-the-summer-sound-the-same.html
3. Mauch, M., MacCallum, R. M., Levy, M., and Leroi, A. M. (2015). The Evolution of Popular Music: USA 1960–2010. R. Soc. open sci.
[Figure: one radar chart per year showing six audio features (Dance, Energy, Speech, Acoustic, Liveness, Valence; scale 0 to 1) for the top five songs, plus bar charts by year (2018, 2008, 1998, 1988) of Song Tempo (BPM), Song Length (Minutes), and Loudness (dB).]
Top Songs: September 2018: "In My Feelings" Drake; "I Like It" Cardi B; "Girls Like You" Maroon 5; "Fefe" 6ix9ine; "Better Now" Post Malone
Top Songs: September 2008: "Disturbia" Rihanna; "Crush" David Archuleta; "Forever" Chris Brown; "I Kissed A Girl" Katy Perry; "Viva La Vida" Coldplay
Top Songs: September 1998: "The Boy is Mine" Brandy & Monica; "My Way" Usher; "The First Night" Monica; "Crush" Jennifer Paige; "Never Ever" All Saints
Top Songs: September 1988: "Monkey" George Michael; "I Don't Wanna Go On With You Like That" Elton John; "I Don't Wanna Live Without Your Love" Chicago; "Sweet Child O' Mine" Guns N' Roses; "Simply Irresistible" Robert Palmer
Hit Song Science
Industry
§ The Echo Nest
§ ChartMetric
§ Next Big Sound
Academia
§ International Society for Music Information Retrieval (ISMIR) Conference
Step 1 – Data Collection
Billboard Hits
§ All unique songs featured on “Billboard Hot 100”
§ 1990-2018
§ Billboard API Library
§ Dataset:
› Artist name, song title, other misc. features
Step 1 – Data Collection
Non-Hit Songs
§ Million Song Dataset (LabROSA, Columbia University)
§ 1990-2018
Step 1 – Data Collection
All Together
§ Remove overlapping songs
§ Balance datasets
~4000 songs, labeled 1 (Hit) or 0 (Non-Hit)
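The slides do not say how the overlap removal and balancing were done. The following is a minimal sketch of one plausible approach, assuming songs are matched on a case-insensitive (artist, title) pair and that the larger class is randomly downsampled. The function name `build_dataset` and both of those choices are illustrative assumptions, not the authors' code.

```python
import random

def build_dataset(hits, non_hits, seed=0):
    """Combine hit and non-hit songs into one labeled, balanced list.

    Each song is an (artist, title) tuple. The matching key and the
    downsampling strategy are assumptions made for illustration.
    """
    def key(song):
        artist, title = song
        return (artist.strip().lower(), title.strip().lower())

    # Keep one entry per unique hit song.
    unique_hits = {}
    for song in hits:
        unique_hits.setdefault(key(song), song)

    # Drop non-hits that also appear among the hits (the overlap).
    unique_non_hits = {}
    for song in non_hits:
        k = key(song)
        if k not in unique_hits:
            unique_non_hits.setdefault(k, song)

    # Balance the two classes by downsampling the larger one.
    rng = random.Random(seed)
    hit_list = list(unique_hits.values())
    non_hit_list = list(unique_non_hits.values())
    n = min(len(hit_list), len(non_hit_list))
    hit_list = rng.sample(hit_list, n)
    non_hit_list = rng.sample(non_hit_list, n)

    # Label hits 1 and non-hits 0.
    return [(s, 1) for s in hit_list] + [(s, 0) for s in non_hit_list]
```

Downsampling is only one way to balance classes; oversampling or class weights would be equally reasonable readings of the slide.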
Step 2 – Feature Collection
Audio Features
§ Spotify Web API
§ Chose 9 audio features:
› Danceability, Energy, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Loudness, and Tempo
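As a rough illustration of turning the nine chosen features into model input, the sketch below assumes each track's features have already been fetched and arrive as a dictionary keyed by the lowercase feature names. The helper `feature_vector` and the sample values are hypothetical; no API call is made here.

```python
# The nine features named on the slide, in a fixed order.
FEATURES = [
    "danceability", "energy", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "loudness", "tempo",
]

def feature_vector(audio_features):
    """Return the nine features as floats in a fixed order.

    Returns None when the input is missing or any feature is absent
    or non-numeric, so the caller can skip that track.
    """
    try:
        return [float(audio_features[name]) for name in FEATURES]
    except (KeyError, TypeError, ValueError):
        return None

# Made-up values for illustration only.
sample = {
    "danceability": 0.8, "energy": 0.6, "speechiness": 0.1,
    "acousticness": 0.05, "instrumentalness": 0.0, "liveness": 0.1,
    "valence": 0.4, "loudness": -5.0, "tempo": 120.0,
}
vector = feature_vector(sample)
```

Note that loudness (dB) and tempo (BPM) sit on different scales from the [0, 1] features, so some form of scaling before training may help.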
Step 2 – Feature Collection
An Additional Feature: “Artist Score”
§ Whether or not the artist has a previous Billboard Hit
§ Back to 1987
§ Using Billboard API Library
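One way to read the "Artist Score" is as a binary flag. The sketch below assumes "previous" means a hit in a strictly earlier calendar year, and that the chart history is available as a mapping from artist name to the years in which that artist charted. Both the data layout and the function name `artist_score` are assumptions for illustration.

```python
def artist_score(artist, song_year, hit_years_by_artist, earliest_year=1987):
    """Return 1 if the artist had a hit before `song_year`, else 0.

    Only hits from `earliest_year` onward are considered, mirroring
    the "back to 1987" lookback on the slide.
    """
    years = hit_years_by_artist.get(artist, ())
    return int(any(earliest_year <= y < song_year for y in years))

# Hypothetical chart history for illustration.
history = {"Usher": [1997, 2004], "Newcomer": [2010], "Veteran": [1985]}
```

For example, `artist_score("Usher", 2004, history)` counts the 1997 hit, while `artist_score("Newcomer", 2010, history)` finds no earlier hit.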
Step 3 – Classification
Figure. A plot of songs’ danceability vs. energy vs. loudness (dB). Black circles represent Billboard hits and red marks represent non-hits.
Step 3 – Classification
Supervised Learning
§ Logistic Regression (LR)
§ Gaussian Discriminant Analysis (GDA)
§ Neural Network (NN)
› 1 hidden layer of 6 units
› Sigmoid activation function
› L2 regularization to avoid overfitting
§ Training/testing: 75/25 split
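To make the 75/25 split and the logistic regression baseline concrete, here is a minimal pure-Python sketch using batch gradient descent. It is not the authors' implementation: the learning rate, epoch count, optional L2 term, and the synthetic one-feature data are all illustrative assumptions.

```python
import math
import random

def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and hold out `test_fraction` of the examples."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_fraction))

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[n_test:]), pick(idx[:n_test])

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=500, l2=0.0):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(rows), len(rows[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient and add the optional L2 penalty on weights.
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Classify as hit (1) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

def accuracy(w, b, rows, labels):
    return sum(predict(w, b, x) == y for x, y in zip(rows, labels)) / len(rows)

# Synthetic, linearly separable data standing in for real song features.
X = [[i / 10 - 2] for i in range(40)]
y = [1 if x[0] > 0 else 0 for x in X]
(train_X, train_y), (test_X, test_y) = train_test_split(X, y)
w, b = fit_logistic(train_X, train_y)
```

Comparing `accuracy` on the training and test portions is the same check the slides use to look for overfitting.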
Some Results
§ LR and GDA yielded accuracies of 75.9% and 73.7%, respectively, on the testing data. Accuracy on the training data was similar, which suggests the models are not overfitting.
Neural Network
• The NN gives similar accuracy to LR but noticeably higher precision, which suggests that the songs it labels as hits are more often true hits.
• Accuracy peaked at about 19,000 epochs.
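Accuracy and precision can diverge, which is why two models with similar accuracy may differ in precision. The sketch below uses made-up labels and predictions (not the study's results) to show two classifiers with identical accuracy and different precision.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means hit."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy_score(y_true, y_pred):
    """Fraction of all songs classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)

def precision_score(y_true, y_pred):
    """Fraction of predicted hits that are actual hits."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Illustrative labels: four hits followed by four non-hits.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 0, 0, 1, 1]  # finds every hit, with two false alarms
model_b = [1, 1, 0, 0, 0, 0, 0, 0]  # misses two hits, with no false alarms
```

Both toy models are correct on 6 of 8 songs, yet only `model_b` avoids labeling non-hits as hits.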
Error Analysis
Ablative Analysis
• In an ablative analysis, the features at the end of the list decreased prediction accuracy and were removed.
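The slides do not give the ablation procedure in detail. A generic sketch of the idea is shown below, assuming a caller-supplied `evaluate(features)` function that trains a model on the given features and returns validation accuracy. The function `ablate` and the toy scoring rule are hypothetical.

```python
def ablate(features, evaluate):
    """Try removing each feature, starting from the end of the list.

    A removal is kept only when accuracy improves, so features that
    hurt predictions are dropped and the rest are retained.
    """
    kept = list(features)
    best = evaluate(kept)
    for name in reversed(features):
        if len(kept) == 1:
            break  # always keep at least one feature
        trial = [f for f in kept if f != name]
        score = evaluate(trial)
        if score > best:
            kept, best = trial, score
    return kept, best

# Toy scoring rule for illustration: "useful" features add accuracy,
# "noisy" ones subtract it. A real run would train and score a model.
NOISY = {"liveness", "tempo"}

def toy_evaluate(features):
    useful = sum(1 for f in features if f not in NOISY)
    noisy = sum(1 for f in features if f in NOISY)
    return 0.70 + 0.02 * useful - 0.03 * noisy

kept, best = ablate(["danceability", "energy", "liveness", "tempo"], toy_evaluate)
```

With the toy rule, the two noisy features are removed and the two useful ones remain.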
Time
• We divided the data into subsets of five-year periods and split each subset into training and validation sets (80/20).
• In most cases, the accuracy on both the training and validation sets improved, implying that the features of pop music are somewhat unique to the time period of the song's release.
Figure. Accuracy on the validation set for specific time periods, for Logistic Regression and the Neural Network. Bars are shown for 1990-2018 (all data) and for 1990-1994, 1995-1999, 2000-2004, 2005-2009, 2010-2014, and 2015-2018; the accuracy axis runs from 50 to 100. Accuracy improves for individual time periods, indicating that hit songs have features unique to their time period.
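The per-period experiment can be sketched as follows, assuming each song is a dictionary with at least a "year" key (an assumed layout). The period boundaries come from the figure labels; the shuffling and rounding details are illustrative choices.

```python
import random

# Period boundaries as labeled in the figure (inclusive years).
PERIODS = [(1990, 1994), (1995, 1999), (2000, 2004),
           (2005, 2009), (2010, 2014), (2015, 2018)]

def split_by_period(songs, periods=PERIODS, val_fraction=0.2, seed=0):
    """Group songs by release period and split each group 80/20."""
    rng = random.Random(seed)
    result = {}
    for start, end in periods:
        subset = [s for s in songs if start <= s["year"] <= end]
        rng.shuffle(subset)
        n_val = int(round(len(subset) * val_fraction))
        result[(start, end)] = {"train": subset[n_val:], "val": subset[:n_val]}
    return result

# Synthetic catalogue: ten placeholder songs per year, for illustration.
songs = [{"year": year, "id": f"{year}-{i}"}
         for year in range(1990, 2019) for i in range(10)]
splits = split_by_period(songs)
```

A separate model would then be trained on each period's "train" portion and scored on its "val" portion.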
Conclusion & Future Work
• Why do songs in a given time period hold trends?
• Social culture? Commercial influences?
• “External factors”, difficult to quantify but may be very important in predicting a song’s Billboard success.
• Do we… want this?
Do we… want this?
NY Times: “Why Songs of the Summer Sound the Same”
HitPredict
Thanks to my collaborators Marcella Suta and Nicholas Burton! Thanks to Blair Kaneshiro!
References
Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., and Lamere, P. The Million Song Dataset. In Proceedings of International Society for Music Information Retrieval, 2011.
Chinoy, S. and Ma, J. Why songs of the summer sound the same. New York Times, 2018.
Dhanaraj, R. and Logan, B. Automatic prediction of hit songs. In Proceedings of International Society for Music Information Retrieval, 2005.
Guo, A. Python API for Billboard data. github.com. Retrieved from: https://pypi.org/project/billboard.py/.
Mauch, M., MacCallum, R. M., Levy, M., and Leroi, A. M. The evolution of popular music: USA 1960-2010. In Royal Society Open Science, 2015.
Ni, Y. and Santos-Rodriguez, R. Hit song science once again a science. In International Workshop on Machine Learning and Music, 2011.
Pachet, F. and Roy, P. Hit song science is not yet a science. In Proceedings of International Society for Music Information Retrieval, 2008.
Singhi, A. and Brown, D. G. Hit song detection using lyric features alone. In Proceedings of International Society for Music Information Retrieval, 2014.
Yang, L.-C., Chou, S.-Y., Liu, J.-Y., Yang, Y.-H., and Chen, Y.-A. Revisiting the problem of audio-based hit song prediction using convolutional neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
Zangerle, E., Pichl, M., Hupfauf, B., and Specht, G. Can microblogs predict music charts? An analysis of the relationship between #nowplaying tweets and music charts. In Proceedings of International Society for Music Information Retrieval, 2016.
HitPredict: Using Spotify Data to Predict Billboard Hits
egeorgie@ccrma.stanford.edu
THANK YOU!