2015/12/71 Music Information Retrieval: Overview and Challenges J.-S. Roger Jang (張智星)...
Music Information Retrieval: Overview and Challenges
J.-S. Roger Jang (張智星)
Multimedia Information Retrieval (MIR) Lab
CSIE Dept., National Taiwan Univ.
http://mirlab.org/jang

Outline
- Music information retrieval (MIR)
  - Intro to MIR
  - Intro to ISMIR & MIREX
- Two classical paradigms of MIR
  - QBSH (query by singing/humming)
  - AFP (audio fingerprinting)
- Conclusions

Introduction to QBSH
- QBSH: Query by Singing/Humming
  - Input: Singing or humming from a microphone
  - Output: A ranked list retrieved from the song database according to similarity to the query
- Progression
  - First paper: Around 1994
  - Extensive studies since 2001
  - State of the art: QBSH tasks at ISMIR/MIREX, since 2006

Two Steps in QBSH
- Pitch tracking: to detect the period of a waveform
  - Time domain
    - ACF (autocorrelation function)
    - NSDF (normalized squared difference function)
    - AMDF (average magnitude difference function)
  - Frequency domain
    - Harmonic product spectrum
    - Cepstrum
- Database comparison: to find similarity between query and database songs
  - Linear scaling
  - Dynamic time warping
  - Recursive alignment
  - Hybrid methods

Frame Blocking for Pitch Tracking
- Sample rate = 16 kHz
- Frame size = 512 samples
- Frame duration = 512/16000 = 0.032 s = 32 ms
- Overlap = 192 samples
- Hop size = frame size - overlap = 512 - 192 = 320 samples
- Frame rate = 16000/320 = 50 frames/sec = pitch rate
[Figure: waveform (amplitude vs. sample index) with a zoomed-in view marking one frame and the overlap with its neighbor]
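
The arithmetic on this slide can be checked with a short sketch. The function name `frame_blocking` and the use of NumPy are illustrative assumptions, not part of the original slides; only the numbers (16 kHz, 512-sample frames, 192-sample overlap) come from the slide.

```python
import numpy as np

def frame_blocking(signal, frame_size=512, overlap=192):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    signal = np.asarray(signal)
    hop = frame_size - overlap                      # 512 - 192 = 320 samples
    if len(signal) < frame_size:
        return np.empty((0, frame_size))
    n_frames = 1 + (len(signal) - frame_size) // hop
    # Row i holds samples i*hop ... i*hop + frame_size - 1
    idx = np.arange(frame_size)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

fs = 16000                          # sample rate in Hz
signal = np.zeros(fs)               # one second of silence as a stand-in input
frames = frame_blocking(signal)
frame_rate = fs / (512 - 192)       # frames per second, equal to the pitch rate
```

Trailing samples that do not fill a whole frame are dropped here; padding the last frame is an equally reasonable choice.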

ACF: Autocorrelation Function
- Original frame: s(t)
- Shifted frame: s(t-τ)
- Example: for τ = 30, acf(30) = inner product of the overlapping part of the two frames
$$\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$$
- To be safe, the frame size needs to cover at least two fundamental periods!
[Figure: original frame and shifted frame, with the pitch period marked]
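
A minimal sketch of ACF-based pitch estimation for one frame follows. The `acf` function implements the inner product of the overlapping part of the original and shifted frames. Taking the largest ACF value inside a lag range as the pitch period is a common simplification and an assumption here, as are the function names and the default search range (taken from the E2 to C6 range given on the next slide).

```python
import numpy as np

def acf(frame):
    """acf(tau) = sum over t = tau..n-1 of s(t) * s(t - tau), for tau = 0..n-1."""
    s = np.asarray(frame, dtype=float)
    n = len(s)
    return np.array([np.dot(s[tau:], s[:n - tau]) for tau in range(n)])

def acf_pitch(frame, fs=16000, fmin=82.0, fmax=1047.0):
    """Estimate pitch in Hz as fs / (lag of the largest ACF value in the search range)."""
    values = acf(frame)
    min_lag = int(np.ceil(fs / fmax))                  # shortest period considered
    max_lag = min(int(fs / fmin), len(values) - 1)     # longest period considered
    lag = min_lag + int(np.argmax(values[min_lag:max_lag + 1]))
    return fs / lag

# Usage: a 200 Hz sine in a 512-sample frame at 16 kHz has a period of 80 samples.
t = np.arange(512)
pitch = acf_pitch(np.sin(2 * np.pi * 200 * t / 16000))
```

Because the overlap shrinks as the lag grows, this unnormalized ACF tapers toward zero, which is one reason the frame should span at least two fundamental periods.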

Frequency to Semitone Conversion
- Semitone: a musical scale based on A440
$$\text{semitone} = 69 + 12 \log_2\left(\frac{\text{freq}}{440}\right)$$
- Reasonable pitch range: E2 - C6, i.e. 82 Hz - 1047 Hz (semitone 40 - 84)
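
The conversion is a one-line function. The sketch below also includes the inverse mapping; the function names are illustrative assumptions.

```python
import math

def freq_to_semitone(freq_hz):
    """Semitone number on the A440-based scale: A4 (440 Hz) maps to 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

def semitone_to_freq(semitone):
    """Inverse mapping from semitone number back to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)

low = freq_to_semitone(82.41)      # E2, the lower end of the pitch range
high = freq_to_semitone(1046.5)    # C6, the upper end of the pitch range
```

Working in semitones rather than Hz makes a fixed musical interval correspond to a fixed numerical difference, which is convenient when comparing pitch vectors.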

Demos
- Pitch related demos
  - Pitch tracking
  - Pitch shift

Basic Comparison Method: Linear Scaling
- Scale the query pitch linearly to match the candidates
[Figure: the original input pitch is stretched by 1.25 and 1.5 and compressed by 0.75 and 0.5; each version is compared with the target pitch in the database, and the best match is marked]
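
Linear scaling can be sketched as follows, using the five scaling factors shown in the figure. Several details are assumptions made for illustration and are not stated on the slide: the query is compared against the beginning of the target, the mean pitch is removed so that the match ignores key transposition, and the distance is the mean absolute difference in semitones.

```python
import numpy as np

def resample(pitch, new_len):
    """Linearly interpolate a pitch vector to new_len points."""
    pitch = np.asarray(pitch, dtype=float)
    old_x = np.linspace(0.0, 1.0, len(pitch))
    new_x = np.linspace(0.0, 1.0, new_len)
    return np.interp(new_x, old_x, pitch)

def linear_scaling_distance(query, target, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Return (best_distance, best_factor) over the candidate scaling factors."""
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    best = (np.inf, None)
    for f in factors:
        new_len = int(round(f * len(query)))
        if new_len < 2 or new_len > len(target):
            continue                           # scaled query would not fit the target
        scaled = resample(query, new_len)
        segment = target[:new_len]             # assumption: match from the song start
        # Assumption: subtract the means so a transposed query still matches
        diff = (scaled - scaled.mean()) - (segment - segment.mean())
        dist = float(np.mean(np.abs(diff)))
        if dist < best[0]:
            best = (dist, f)
    return best
```

Ranking the database then amounts to computing this distance for every candidate song and sorting by the result.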

Typical Result of Pitch Tracking
- Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower) [audio]

Comparison of Pitch Vectors
- Yellow line: target pitch vector

QBSH Demos
- QBSH demos by our lab
  - Description
  - QBSH on the web: MIRACLE
  - QBSH on toys
- Existing commercial QBSH systems
  - www.midomi.com
  - www.soundhound.com

Our QBSH System: Miracle
- Single server with GPU: NVIDIA 560 Ti, 384 cores (speedup factor = 10)
- Database size: ~20,000
[Diagram: clients (PC, PDA/smartphone, cellular), master server, single server. Request: pitch vector. Response: search result.]

Improving QBSH
- Many ways to improve QBSH
  - Sorted error vector
  - Various weights for rests
  - Re-ranking for better accuracy
  - Better memory arrangement in GPU
  - …

Intro to Audio Fingerprinting (AFP)
- Goal: Identify a noisy version of a given audio clip
- Also known as…
  - “Query by exact example”: no “cover versions” are allowed

AFP Applications
- Commercial applications of AFP
  - Music identification & purchase
  - Royalty assignment (over radio)
  - TV show or commercial identification (over TV)
  - Copyright violation (over web)
- Major commercial players
  - Shazam, SoundHound, IntoNow, Viggle…

Two Stages in AFP
- Offline
  - Feature extraction
  - Hash table construction for songs in database
  - Inverted indexing
- Online
  - Feature extraction
  - Hash table search
  - Ranked list of the retrieved songs/music

Robust Feature Extraction
- Various kinds of features for AFP
  - Invariance along time and frequency
  - Landmark of a pair of local maxima
  - Wavelets
  - …
- Extensive tests are required for choosing the best features

Representative Approaches to AFP
- Philips
  - J. Haitsma and T. Kalker, “A highly robust audio fingerprinting system”, ISMIR 2002.
- Shazam
  - A. Wang, “An industrial-strength audio search algorithm”, ISMIR 2003.
- Google
  - S. Baluja and M. Covell, “Content fingerprinting using wavelets”, Euro. Conf. on Visual Media Production, 2006.
  - V. Chandrasekhar, M. Sharifi, and D. A. Ross, “Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications”, ISMIR 2011.

Improvement on AFP
- Re-ranking of AFP by learning to rank
- Demo: http://mirlab.org/demo/audioFingerprinting

Shazam’s Method
- Ideas
  - Take advantage of local structures in music
    - Find salient peaks on the spectrogram
    - Pair peaks to form landmarks for comparison
  - Efficient search by hash tables
    - Use positions of landmarks as hash keys
    - Use song ID and offset time as hash values
    - Use time constraints to find matched landmarks
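
The hash-table idea can be sketched with a plain Python dictionary. The sketch assumes landmarks have already been extracted and that each one is reduced to a hashable key plus the time at which it occurs; the tuple key used in the example, the function names, and the voting scheme (counting landmarks that agree on the same song and the same time offset) are illustrative assumptions rather than the exact production design.

```python
from collections import Counter, defaultdict

def build_index(songs):
    """songs: {song_id: [(hash_key, time), ...]} -> {hash_key: [(song_id, time), ...]}"""
    index = defaultdict(list)
    for song_id, landmarks in songs.items():
        for key, t in landmarks:
            index[key].append((song_id, t))
    return index

def match(index, query_landmarks):
    """Rank songs by the number of landmarks that agree on one time offset."""
    votes = Counter()
    for key, t_query in query_landmarks:
        for song_id, t_song in index.get(key, ()):
            # Time constraint: true matches share the same (song time - query time)
            votes[(song_id, t_song - t_query)] += 1
    best = {}
    for (song_id, _offset), count in votes.items():
        best[song_id] = max(best.get(song_id, 0), count)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

# Usage with made-up landmark keys (hypothetical values, for illustration only)
songs = {
    "A": [((10, 20, 3), 0), ((15, 30, 2), 5), ((40, 42, 4), 9)],
    "B": [((10, 20, 3), 7), ((99, 1, 1), 8)],
}
index = build_index(songs)
ranked = match(index, [((15, 30, 2), 0), ((40, 42, 4), 4), ((10, 20, 3), 2)])
```

Building the index corresponds to the offline stage, and `match` corresponds to the online stage; a query excerpt taken from the middle of a song still matches because only the offset has to be consistent.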

How to Find Salient Peaks
- We need to find peaks that are salient along both the frequency and time axes
  - Frequency axis: Gaussian local smoothing
  - Time axis: Decaying threshold over time

How to Find Initial Threshold?
- Goal: To suppress neighboring peaks
- Ideas
  - Find the local maxima of the magnitude spectra of the initial 10 frames
  - Superimpose a Gaussian on each local maximum
  - Find the max of all Gaussians
[Figure: plot with legend entries "Original signal", "Positive local maxima", and "Final output"]

How to Update the Threshold along Time?
- Decay the threshold
- Find local maxima larger than the threshold; these are the salient peaks
- Define the new threshold as the max of the old threshold and the Gaussians passing through the active local maxima
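
A minimal sketch of the forward pass is given below, covering both the initial threshold and its update over time. It assumes the spectrogram is a NumPy array of shape (frequency bins, frames), reads "local maxima of the initial 10 frames" as local maxima of the bin-wise maximum over those frames, and uses hypothetical values for the decay rate and the Gaussian width.

```python
import numpy as np

def forward_peaks(spectrogram, decay=0.98, sigma=4.0, n_init=10):
    """Return salient peaks as (frame, bin) pairs using a time-decaying threshold."""
    S = np.asarray(spectrogram, dtype=float)
    n_bins, n_frames = S.shape
    bins = np.arange(n_bins)

    def gaussian(center, height):
        return height * np.exp(-0.5 * ((bins - center) / sigma) ** 2)

    def local_maxima(x):
        # Positive local maxima along the frequency axis (edge bins are skipped)
        return [k for k in range(1, n_bins - 1)
                if x[k] > 0 and x[k] > x[k - 1] and x[k] >= x[k + 1]]

    # Initial threshold: max of Gaussians placed on the local maxima
    # of the bin-wise maximum over the first n_init frames
    first = S[:, :n_init].max(axis=1)
    threshold = np.zeros(n_bins)
    for k in local_maxima(first):
        threshold = np.maximum(threshold, gaussian(k, first[k]))

    peaks = []
    for t in range(n_frames):
        threshold = threshold * decay                     # decay the threshold
        frame = S[:, t]
        salient = [k for k in local_maxima(frame) if frame[k] > threshold[k]]
        peaks.extend((t, k) for k in salient)
        for k in salient:                                 # raise it around new peaks
            threshold = np.maximum(threshold, gaussian(k, frame[k]))
    return peaks
```

A backward pass would run the same loop over the frames in reverse order; how the two passes are combined is not specified here.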

Time-decaying Thresholds
[Figure: two panels, "Forward pass" and "Backward pass", each plotting frequency index against frame index]
How to Pair Salient Peaks?
Target zone
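
Pairing can be sketched as follows: each salient peak is paired with later peaks that fall inside its target zone. The shape of the zone (limits on time and frequency difference), the fan-out limit, and all default values below are hypothetical choices for illustration; the slide names only the target zone itself.

```python
def pair_peaks(peaks, min_dt=1, max_dt=63, max_df=31, fan_out=3):
    """peaks: iterable of (frame, bin). Returns landmarks as (t1, f1, f2, dt)."""
    peaks = sorted(peaks)
    landmarks = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break                          # peaks are time-sorted, so stop early
            if dt >= min_dt and abs(f2 - f1) <= max_df:
                landmarks.append((t1, f1, f2, dt))
                paired += 1
                if paired == fan_out:          # limit landmarks per anchor peak
                    break
    return landmarks

# Usage with made-up peaks
landmarks = pair_peaks([(0, 10), (2, 12), (3, 100), (5, 15), (200, 10)])
```

In such a scheme, the triple (f1, f2, dt) is a natural candidate for a hash key, with t1 kept as the offset time.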
Salient Peaks and Landmarks
Peak picking after forward smoothing
Matched landmarks (green)
(Source: Dan Ellis)
Landmarks for Hash Table Access

Optimization Strategies for AFP
- Several ways to optimize AFP
  - Strategy for query landmark extraction
  - Confidence measure
  - Incremental retrieval
  - Better use of the hash table
  - Re-ranking for better performance

Demos of Audio Fingerprinting
- Commercial apps
  - Shazam
  - SoundHound
- Our demo
  - http://mirlab.org/demo/audioFingerprinting

QBSH vs. AFP
- QBSH
  - Goal: MIR
  - Feature: Pitch
    - Perceptible
    - Small data size
  - Method: LS (linear scaling)
  - Database
    - Harder to collect
    - Small storage
  - Bottleneck
    - CPU/GPU-bound
- AFP
  - Goal: MIR
  - Features: Landmarks
    - Not perceptible
    - Big data size
  - Method: Matched LM (landmarks)
  - Database
    - Easier to collect
    - Large storage
  - Bottleneck
    - I/O-bound

Conclusions
- Successful applications in MIR
  - QBSH
  - AFP
- Due to
  - Faster and bigger memory
  - Advances in GPU/CPU (Moore’s law)
  - New machine learning methods
- Challenges in MIR
  - Audio melody extraction from polyphonic music
  - Database collection for QBSH
  - Cover song ID (which cannot be handled by AFP)
  - Polyphonic music transcription
Thank you for your attention!
Questions & comments?