2015/12/71 Music Information Retrieval: Overview and Challenges J.-S. Roger Jang （張智星）...

112/04/21 1

Music Information Retrieval: Overview and Challenges

J.-S. Roger Jang (張智星)
Multimedia Information Retrieval (MIR) Lab
CSIE Dept., National Taiwan Univ.

http://mirlab.org/jang

Outline

Music Information Retrieval (MIR)
  Intro to MIR
  Intro to ISMIR & MIREX

Two classical paradigms of MIR
  QBSH (query by singing/humming)
  AFP (audio fingerprinting)

Conclusions

Introduction to QBSH

QBSH: Query by Singing/Humming
  Input: singing or humming from a microphone
  Output: a ranked list retrieved from the song database according to similarity to the query

Progression
  First paper: around 1994
  Extensive studies since 2001
  State of the art: QBSH tasks at ISMIR/MIREX, since

Two Steps in QBSH

Pitch tracking: to detect the period of a waveform
  Time domain
    ACF (autocorrelation function)
    NSDF (normalized squared difference function)
    AMDF (average magnitude difference function)
  Frequency domain
    Harmonic product spectrum
    Cepstrum

Database comparison: to find the similarity between the query and the database songs
  Linear scaling
  Dynamic time warping
  Recursive alignment
  Hybrid methods

Frame Blocking for Pitch Tracking

Sample rate = 16 kHz
Frame size = 512 samples
Frame duration = 512/16000 = 0.032 s = 32 ms
Overlap = 192 samples
Hop size = frame size - overlap = 512 - 192 = 320 samples
Frame rate = 16000/320 = 50 frames/sec = pitch rate
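The frame-blocking arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frame_blocking` is hypothetical, the signal is a plain list of samples, and an incomplete frame at the end is simply dropped.

```python
def frame_blocking(signal, frame_size=512, overlap=192):
    """Split a signal into overlapping frames (incomplete tail is dropped)."""
    hop = frame_size - overlap  # 512 - 192 = 320 samples
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frames.append(signal[start:start + frame_size])
    return frames


sample_rate = 16000
signal = [0.0] * sample_rate            # one second of silence as a stand-in
frames = frame_blocking(signal)

frame_duration = 512 / sample_rate      # 0.032 s = 32 ms
frame_rate = sample_rate / (512 - 192)  # 50 frames/sec = pitch rate
```

One second of audio yields 49 complete frames here rather than 50, because the last hop positions no longer leave room for a full 512-sample frame.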

[Figure: waveform with a zoomed-in view showing the overlap between adjacent frames]

ACF: Auto-correlation Function

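Original frame: s(t)
Shifted frame: s(t-τ)
Example: for τ = 30, acf(30) is the inner product of the overlapping part of the two frames.
Pitch period: the shift τ > 0 at which acf(τ) reaches its maximum.

To be safe, the frame size needs to cover at least two fundamental periods!

$$\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau), \qquad n = \text{frame size}$$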
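A minimal sketch of ACF-based pitch detection for a single frame, assuming the ACF is the inner product of the frame with a shifted copy of itself over their overlapping part. The function names, the test tone (200 Hz), and the restriction of the lag search to the 82 Hz - 1047 Hz range are illustrative choices, not taken from the slides.

```python
import math


def acf(frame, lag):
    """Inner product of the frame with itself shifted by `lag` samples."""
    return sum(frame[t] * frame[t - lag] for t in range(lag, len(frame)))


def acf_pitch(frame, sample_rate, f_min=82.0, f_max=1047.0):
    """Return the pitch (Hz) whose lag maximizes the ACF within [f_min, f_max]."""
    lag_min = int(sample_rate / f_max)  # shortest period considered
    # The frame must cover at least two periods, so cap the lag at half a frame.
    lag_max = min(int(sample_rate / f_min), len(frame) // 2)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: acf(frame, lag))
    return sample_rate / best_lag


sample_rate = 16000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(512)]
pitch = acf_pitch(frame, sample_rate)  # close to 200 Hz for this test tone
```

Because the overlap shrinks as the lag grows, this unnormalized ACF tapers off at large lags, which is one reason normalized variants such as NSDF exist.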

Frequency to Semitone Conversion

Semitone: a musical scale based on A440

Reasonable pitch range: E2 - C6, i.e. 82 Hz - 1047 Hz (semitone 40 - 84)

$$\text{semitone} = 69 + 12 \log_2\left(\frac{\text{freq}}{440}\right)$$
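A small sketch of the frequency-to-semitone conversion, assuming the usual convention in which A440 maps to semitone 69 and each octave spans 12 semitones. The function names are hypothetical.

```python
import math


def freq_to_semitone(freq_hz):
    """Convert a frequency in Hz to a (fractional) semitone number, A440 = 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse conversion: semitone number back to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


low = freq_to_semitone(82.0)     # about 40 (E2)
high = freq_to_semitone(1047.0)  # about 84 (C6)
```

Working in semitones makes the pitch axis linear in musical interval, so transposing a melody becomes adding a constant.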

Pitch-related demos
  Pitch tracking
  Pitch shift

Basic Comparison Method: Linear Scaling

Scale the query pitch linearly to match the candidates

[Figure: the original input pitch is compressed by 0.5 and 0.75 and stretched by 1.25 and 1.5; each version is compared with the target pitch in the database, and the closest one is the best match]
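Linear scaling can be sketched as follows. This is an illustration under several assumptions that the slides do not spell out: pitch vectors are lists of semitone values, the query is matched against the beginning of the target, key differences are removed by subtracting the mean of each vector, and the distance is the mean absolute difference. The names `resample` and `linear_scaling` are hypothetical.

```python
def resample(pitch, new_len):
    """Linearly interpolate a pitch vector (length >= 2) to new_len points."""
    if new_len == 1:
        return [pitch[0]]
    step = (len(pitch) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = min(int(pos), len(pitch) - 2)
        frac = pos - lo
        out.append(pitch[lo] * (1 - frac) + pitch[lo + 1] * frac)
    return out


def linear_scaling(query, target, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Return (distance, factor) of the best linearly scaled query."""
    best = (float("inf"), None)
    for factor in factors:
        n = int(round(len(query) * factor))
        if n < 2 or n > len(target):
            continue  # scaled query would be degenerate or longer than the target
        scaled = resample(query, n)
        segment = target[:n]
        # Assumed key transposition: compare both vectors after removing their means.
        mean_q = sum(scaled) / n
        mean_t = sum(segment) / n
        dist = sum(abs((q - mean_q) - (t - mean_t))
                   for q, t in zip(scaled, segment)) / n
        if dist < best[0]:
            best = (dist, factor)
    return best


# Target melody (semitones), and a query sung faster and 3 semitones higher.
target = [60.0] * 20 + [62.0] * 20 + [64.0] * 20 + [65.0] * 20 + [67.0] * 20
query = [p + 3.0 for p in resample(target, 80)]
distance, factor = linear_scaling(query, target)
```

In this toy case the query has to be stretched by 1.25 to line up with the target, so that factor gives the smallest distance.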

Typical Result of Pitch Tracking

Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower) (audio)

Comparison of Pitch Vectors

Yellow line: target pitch vector

QBSH Demos

QBSH demos by our lab
  Description
  QBSH on the web: MIRACLE
  QBSH on toys

Existing commercial QBSH systems
  www.midomi.com
  www.soundhound.com

Our QBSH System: Miracle

Single server with GPU: NVIDIA 560 Ti, 384 cores (speedup factor = 10)

[Diagram: clients (PDA/smartphone, cellular), master server, single server]
Request: pitch vector
Response: search result
Database size: ~20,000

Improving QBSH

Many ways to improve QBSH
  Sorted error vector
  Various weights for rests
  Re-ranking for better accuracy
  Better memory arrangement in GPU
  …

Intro to Audio Fingerprinting (AFP)

Goal: identify a noisy version of a given audio clip

Also known as "query by exact example": no "cover versions" are allowed

AFP Applications

Commercial applications of AFP
  Music identification & purchase
  Royalty assignment (over radio)
  TV show or commercial ID (over TV)
  Copyright violation (over web)

Major commercial players
  Shazam, Soundhound, Intonow, Viggle…

Two Stages in AFP

Offline
  Feature extraction
  Hash table construction for songs in database
  Inverted indexing

Online
  Feature extraction
  Hash table search
  Ranked list of the retrieved songs/music

Robust Feature Extraction

Various kinds of features for AFP
  Invariance along time and frequency
  Landmark of a pair of local maxima
  Wavelets
  …

Extensive tests are required to choose the best features

Representative Approaches to AFP

Philips: J. Haitsma and T. Kalker, “A highly robust audio fingerprinting system”, ISMIR 2002.

Shazam: A. Wang, “An industrial-strength audio search algorithm”, ISMIR 2003.

Google: S. Baluja and M. Covell, “Content fingerprinting using wavelets”, Euro. Conf. on Visual Media Production, 2006.

V. Chandrasekhar, M. Sharifi, and D. A. Ross, “Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications”, ISMIR 2011.

Improvement on AFP

Re-ranking of AFP by learning to rank
Demo: http://mirlab.org/demo/audioFingerprinting

Shazam’s Method

Ideas
  Take advantage of local structures in music
    Find salient peaks on the spectrogram
    Pair peaks to form landmarks for comparison
  Efficient search by hash tables
    Use positions of landmarks as hash keys
    Use song ID and offset time as hash values
    Use time constraints to find matched landmarks
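The hash-table idea can be sketched as below. This is an illustration only, not Shazam's actual implementation: a landmark is assumed to be an `(offset, key)` pair whose key is a hypothetical `(f1, f2, dt)` tuple, the hash table is a plain Python dictionary, and the time constraint is enforced by counting how many matches agree on the same time shift between song and query.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """Offline stage: map each landmark key to its (song ID, offset) values."""
    index = defaultdict(list)
    for song_id, landmarks in songs.items():
        for offset, key in landmarks:
            index[key].append((song_id, offset))
    return index


def query_index(index, query_landmarks):
    """Online stage: rank songs by the number of time-consistent matches."""
    votes = Counter()
    for q_offset, key in query_landmarks:
        for song_id, s_offset in index.get(key, []):
            # Matches from the same excerpt share one song-minus-query shift.
            votes[(song_id, s_offset - q_offset)] += 1
    best = Counter()
    for (song_id, _shift), count in votes.items():
        best[song_id] = max(best[song_id], count)
    return best.most_common()


songs = {
    "song_a": [(0, (10, 20, 3)), (5, (30, 25, 2)), (9, (12, 40, 4)), (14, (22, 18, 1))],
    "song_b": [(0, (10, 20, 3)), (7, (50, 45, 2)), (11, (33, 31, 5))],
}
# Excerpt of song_a starting at frame 5, plus one landmark that matches by chance.
query = [(0, (30, 25, 2)), (4, (12, 40, 4)), (9, (22, 18, 1)), (12, (10, 20, 3))]
ranked = query_index(build_index(songs), query)
```

The chance match hits both songs but with an inconsistent shift, so it does not add to the three aligned matches that identify `song_a`.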

How to Find Salient Peaks

We need to find peaks that are salient along both the frequency and time axes
  Frequency axis: Gaussian local smoothing
  Time axis: decaying threshold over time

How to Find Initial Threshold?

Goal: to suppress neighboring

Ideas
  Find the local maxima of the magnitude spectra of the initial 10 frames
  Superimpose a Gaussian on each local maximum
  Find the maximum of all Gaussians

[Figure legend: original signal, positive local maxima, final output]

How to Update the Threshold along Time?

Decay the threshold

Find local maxima larger than the threshold; these are the salient peaks

Define the new threshold as the max of the old threshold and the Gaussians passing through the active local maxima
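The threshold initialization and update can be sketched as a single forward pass. Everything concrete here is an assumption made for illustration: the spectrogram is a list of magnitude frames, the initial threshold is built from the bin-wise maximum of the first 10 frames, and the Gaussian width (`width`) and decay factor (`decay`) are arbitrary values. The backward pass is not shown.

```python
import math


def local_maxima(frame):
    """Indices of bins strictly larger than both neighbors."""
    return [k for k in range(1, len(frame) - 1)
            if frame[k] > frame[k - 1] and frame[k] > frame[k + 1]]


def spread(frame, peaks, width):
    """Envelope: max over Gaussians centered on each peak, scaled to its height."""
    return [max((frame[p] * math.exp(-0.5 * ((k - p) / width) ** 2) for p in peaks),
                default=0.0)
            for k in range(len(frame))]


def salient_peaks(spectrogram, n_init=10, width=2.0, decay=0.9):
    """Forward pass: return (frame index, bin index) pairs of salient peaks."""
    n_bins = len(spectrogram[0])
    # Initial threshold from the bin-wise maximum of the first n_init frames.
    head = [max(frame[k] for frame in spectrogram[:n_init]) for k in range(n_bins)]
    threshold = spread(head, local_maxima(head), width)
    found = []
    for t, frame in enumerate(spectrogram):
        threshold = [decay * v for v in threshold]          # decay over time
        active = [k for k in local_maxima(frame) if frame[k] > threshold[k]]
        found.extend((t, k) for k in active)
        bumps = spread(frame, active, width)                # Gaussians through peaks
        threshold = [max(a, b) for a, b in zip(threshold, bumps)]
    return found


spec = [[0.1] * 8 for _ in range(12)]
spec[3][4] = 10.0   # strong peak
spec[4][5] = 5.0    # weaker neighbor, masked by the Gaussian around bin 4
spec[11][2] = 4.0   # later peak, accepted once the threshold has decayed
peaks = salient_peaks(spec)
```

In this toy spectrogram the weaker neighbor is suppressed because it falls under the Gaussian raised by the strong peak one frame earlier.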

Time-decaying Thresholds

[Figure: time-decaying thresholds for the forward pass and the backward pass; x-axis: frame index]

How to Pair Salient Peaks?

Target zone
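Pairing within a target zone can be sketched as follows. The zone shape and all parameter values (`max_dt`, `max_df`, `fan_out`) are assumptions for illustration, as is the landmark layout `(t1, (f1, f2, dt))`; the slides only name the target zone.

```python
def pair_peaks(peaks, max_dt=63, max_df=31, fan_out=3):
    """Pair each peak with up to `fan_out` later peaks inside its target zone.

    peaks: list of (frame index, frequency bin) pairs.
    Returns landmarks as (t1, (f1, f2, t2 - t1)).
    """
    peaks = sorted(peaks)
    landmarks = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            if t2 - t1 > max_dt:
                break  # peaks are sorted by time, so the rest are too far away
            if t2 > t1 and abs(f2 - f1) <= max_df:
                landmarks.append((t1, (f1, f2, t2 - t1)))
                paired += 1
                if paired == fan_out:
                    break
    return landmarks


peaks = [(0, 10), (2, 15), (3, 80), (5, 12), (100, 11)]
landmarks = pair_peaks(peaks, max_dt=10, max_df=20, fan_out=2)
```

Limiting the fan-out keeps the number of landmarks roughly proportional to the number of peaks, which bounds the size of the hash table.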

Salient Peaks and Landmarks

Peak picking after forward smoothing

Matched landmarks (green)

(Source: Dan Ellis)

Landmarks for Hash Table Access

Optimization Strategies for AFP

Several ways to optimize AFP
  Strategy for query landmark extraction
  Confidence measure
  Incremental retrieval
  Better use of the hash table
  Re-ranking for better performance

Demos of Audio Fingerprinting

Commercial apps
  Shazam
  Soundhound

Our demo: http://mirlab.org/demo/audioFingerprinting

QBSH vs. AFP

QBSH
  Goal: MIR
  Feature: pitch (perceptible, small data size)
  Method: LS (linear scaling)
  Database: harder to collect, small storage
  Bottleneck: CPU/GPU-bound

AFP
  Goal: MIR
  Features: landmarks (not perceptible, big data size)
  Method: matched LM (landmarks)
  Database: easier to collect, large storage
  Bottleneck: I/O-bound

Conclusions

Successful applications in MIR
  QBSH
  AFP

Due to
  Faster, bigger memory
  Advances in GPU/CPU (Moore’s law)
  New machine learning methods

Challenges in MIR
  Audio melody extraction from polyphonic music
  Database collection for QBSH
  Cover song ID (which cannot be handled by AFP)
  Polyphonic music transcription

Thank you for your attention!

Questions & comments?
