Pitch Tracking ( 音高追蹤 )
![Page 1: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/1.jpg)
Pitch Tracking (音高追蹤)
Jyh-Shing Roger Jang (張智星)
MIR Lab, Dept of CSIE
National Taiwan University
http://mirlab.org/jang
![Page 2: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/2.jpg)
Pitch (音高)
Definition of pitch
  Fundamental frequency (FF, in Hz): reciprocal of the fundamental period in a quasi-periodic waveform
  Pitch (in semitones): obtained from the fundamental frequency through a log-based transformation (to be detailed later)
Characteristics of pitch
  Noise and unvoiced sounds do not have pitch.
![Page 3: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/3.jpg)
Pitch Tracking (音高追蹤)
Pitch tracking (PT): the process of computing the pitch vector of a given audio segment
Sample applications
  Query by singing/humming (哼唱選歌)
  Tone recognition for Mandarin (華語的音調辨識)
  Intonation scoring for English (英語的音調評分)
  Prosody analysis for speech synthesis (語音合成中的韻律分析)
  Pitch scaling and duration modification (音高調節與長度改變)
![Page 4: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/4.jpg)
Typical Steps for Pitch Tracking
Pre-processing
  Filtering
  Excitation extraction
Main processing
  Frame blocking
  PDF (periodicity detection function) computation
  Pitch candidates via max picking over PDF
Post-processing
  Unreliable pitch removal via volume/clarity thresholding
  Pitch refinement via parabolic interpolation
  Pitch smoothing via median filters, etc.
![Page 5: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/5.jpg)
Frame Blocking
Sample rate = 16 kHz
Frame size = 512 samples
Frame duration = 512/16000 = 0.032 s = 32 ms
Overlap = 192 samples
Hop size = frame size - overlap = 512 - 192 = 320 samples
Frame rate = 16000/320 = 50 frames/sec = pitch rate
[Figure: waveform with consecutive frames and their overlap marked, plus a zoomed-in view of one frame]
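The arithmetic on this slide can be checked with a short sketch in plain Python. The names `frame_params` and `block_frames` are illustrative only and are not taken from the author's MATLAB toolbox.

```python
def frame_params(fs, frame_size, overlap):
    """Derive hop size, frame duration, and frame rate from the frame settings."""
    hop = frame_size - overlap
    return {
        "frame_duration_s": frame_size / fs,
        "hop_size": hop,
        "frame_rate": fs / hop,  # frames per second, i.e. the pitch rate
    }


def block_frames(signal, frame_size, overlap):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    hop = frame_size - overlap
    last_start = len(signal) - frame_size
    return [signal[i:i + frame_size] for i in range(0, last_start + 1, hop)]


params = frame_params(16000, 512, 192)
frames = block_frames(list(range(1000)), 512, 192)
```

With the slide's settings, `params` gives a hop size of 320 samples and a frame rate of 50 frames/sec.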
![Page 6: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/6.jpg)
Periodicity Detection Functions
PDF (periodicity detection function) is used to detect the period of a waveform
Two categories of PDF
  Time domain (時域)
    ACF (Autocorrelation function)
    NSDF (Normalized squared difference function)
    AMDF (Average magnitude difference function)
  Frequency domain (頻域)
    Harmonic product spectrum
    Cepstrum
![Page 7: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/7.jpg)
ACF: Auto-correlation Function
Original frame s(t) and shifted frame s(t-τ); example with τ = 30
acf(30) = inner product of the overlap part
$$\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$$
The frame is 0-index based: [s(0), s(1), …, s(n-1)]
[Figure: original frame s(t) above the shifted frame s(t-τ), with the overlap part and the pitch period marked]
To play safe, the frame size needs to cover at least two fundamental periods!
Quiz candidate!
![Page 8: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/8.jpg)
ACF: Formula 1
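Assume a frame is represented by s(t), t = 0 ~ n-1
ACF formula (s(t-τ) is s(t) shifted to the right by τ):
$$\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$$
[Figure: s(t) and s(t-τ) plotted against t, with s(t-τ) shifted to the right]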
![Page 9: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/9.jpg)
ACF: Formula 2
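Assume a frame is represented by s(t), t = 0 ~ n-1
ACF formula (s(t+τ) is s(t) shifted to the left by τ):
$$\mathrm{acf}(\tau) = \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)$$
[Figure: s(t) and s(t+τ) plotted against t, with s(t+τ) shifted to the left]
This formula is the same as the previous one!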
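A small sketch in plain Python can confirm that the two ACF formulas agree. The function names are illustrative, and the code is not the author's frame2acf01.m.

```python
def acf_shift_right(s, tau):
    """Formula 1: sum over t = tau .. n-1 of s(t) * s(t - tau)."""
    n = len(s)
    return sum(s[t] * s[t - tau] for t in range(tau, n))


def acf_shift_left(s, tau):
    """Formula 2: sum over t = 0 .. n-1-tau of s(t) * s(t + tau)."""
    n = len(s)
    return sum(s[t] * s[t + tau] for t in range(0, n - tau))


frame = [1, 2, 3, 4]
right = [acf_shift_right(frame, tau) for tau in range(len(frame))]
left = [acf_shift_left(frame, tau) for tau in range(len(frame))]
```

Substituting t' = t - τ in the first sum gives the second, which is why both lists are identical. The values also shrink as τ grows because fewer samples overlap (the tapering effect).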
![Page 10: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/10.jpg)
Example of ACF
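sunday.wav
  Sample rate = 16 kHz
  Frame size = 512 (starting from point 9000)
Fundamental frequency
  Max of ACF occurs at index 131
  FF = 16000/131 = 123.077 Hz as printed on the slide. Note that 16000/131 ≈ 122.137 Hz, whereas 123.077 Hz = 16000/130, which corresponds to a lag of 130 samples (index 131 under MATLAB's 1-based indexing).
frame2acf01.m
[Figure: ACF curve with index 0 and index 131 marked]
We suppose it is zero-based indexing.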
![Page 11: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/11.jpg)
Locating the Pitch Point
If the range of human FF is [40, 1000] Hz, then the interval for locating the fundamental period (FP, in samples) follows from
$$40 \le \frac{fs}{FP} \le 1000 \;\Longrightarrow\; \frac{fs}{1000} \le FP \le \frac{fs}{40}$$
where fs is the sample rate.
frame2acfPitchPoint01.m
[Figure: input frame (top) and ACF vector, method = 1 (bottom), showing the original ACF, the truncated ACF, and the ACF pitch point; the distance from index 0 to the pitch point is FP]
Quiz candidate!
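The search interval and the max picking can be sketched in plain Python as follows. The function names and the default range of 40 to 1000 Hz are written for illustration; this is not the author's frame2acfPitchPoint01.m.

```python
import math


def acf(frame, tau):
    """Plain ACF: inner product of the frame with itself shifted by tau."""
    return sum(frame[t] * frame[t - tau] for t in range(tau, len(frame)))


def fp_search_range(fs, f_min=40.0, f_max=1000.0):
    """Lags (in samples) between fs/f_max and fs/f_min."""
    return math.ceil(fs / f_max), math.floor(fs / f_min)


def acf_pitch(frame, fs, f_min=40.0, f_max=1000.0):
    """Return (fundamental period in samples, fundamental frequency in Hz)."""
    lo, hi = fp_search_range(fs, f_min, f_max)
    hi = min(hi, len(frame) - 1)  # a lag cannot exceed the frame length
    fp = max(range(lo, hi + 1), key=lambda tau: acf(frame, tau))
    return fp, fs / fp


# Synthetic test frame: a pulse train with a period of 40 samples.
pulse_train = [1 if t % 40 == 0 else 0 for t in range(520)]
fp, ff = acf_pitch(pulse_train, fs=8000)
```

For fs = 16000 the search range is lags 16 to 400. Restricting the search this way also skips the trivial maximum at lag 0.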
![Page 12: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/12.jpg)
Locating the Fundamental Period (II)
The assumed human pitch range can fail
  Pitch too high
    Vitas (local short clip)
    Whistling
  Low-pitch singing/humming requires a big frame size
![Page 13: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/13.jpg)
Example of ACF Based PT
Specs
  Sample rate = 11025 Hz
  Frame size = 353 points ≈ 32 ms
  Overlap = 0
  Frame rate ≈ 31.25 frames/sec
Playback: original singing, pitch by ACF
wave2pitchByAcf01.m
[Figure: three panels against time (seconds): waveform, volume, and pitch in semitones; original pitch (blue) and volume-thresholded pitch (red)]
![Page 14: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/14.jpg)
Example of ACF Based PT (II)
Specs
  The previous script is converted into a function pitchTrackingSimple.m for easy access.
ptByAcf01.m
[Figure: three panels against time (seconds): waveform, volume, and pitch in semitones; original pitch (blue) and volume-thresholded pitch (red)]
![Page 15: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/15.jpg)
Demo of ACF-based PT
Real-time display of ACF for pitch tracking: goPtByAcf.mdl under SAP toolbox
Real-time pitch tracking for mic input: goPtByAcf2.mdl under SAP toolbox
![Page 16: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/16.jpg)
ACF Variants to Avoid Tapering
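Normalized version (method = 2), frame2acf02.m
$$\mathrm{acf}(\tau) = \frac{\sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)}{n-\tau}$$
Half-frame shifting (method = 3), frame2acf03.m
$$\mathrm{acf}(\tau) = \sum_{t=0}^{n/2} s(t)\, s(t+\tau)$$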
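The effect of the two variants can be seen on a constant frame, where the plain ACF tapers linearly while both variants stay flat. This is a sketch in plain Python with illustrative names. The half-frame version below sums over t = 0 .. n/2 - 1 and assumes τ ≤ n/2 so that every index stays inside the frame, which is a choice made here and may differ from frame2acf03.m.

```python
def acf_plain(s, tau):
    """Method 1: the overlap shrinks as tau grows, so the curve tapers."""
    return sum(s[t] * s[t - tau] for t in range(tau, len(s)))


def acf_normalized(s, tau):
    """Method 2: divide by the overlap length n - tau."""
    return acf_plain(s, tau) / (len(s) - tau)


def acf_half_frame(s, tau):
    """Method 3: always correlate the first half frame, shifted by tau <= n/2."""
    half = len(s) // 2
    return sum(s[t] * s[t + tau] for t in range(half))


flat = [1] * 8
plain = [acf_plain(flat, tau) for tau in range(5)]
normalized = [acf_normalized(flat, tau) for tau in range(5)]
half_frame = [acf_half_frame(flat, tau) for tau in range(5)]
```

The price of half-frame shifting is that only lags up to n/2 can be evaluated.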
![Page 17: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/17.jpg)
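NSDF: ACF Variant with Normalized Range
NSDF: normalized squared difference function
Formula:
$$\mathrm{nsdf}(\tau) = \frac{2\sum_t s(t)\, s(t-\tau)}{\sum_t s^2(t) + \sum_t s^2(t-\tau)}$$
A variant of ACF within the range [-1, 1], based on the inequality:
$$(x_i \pm y_i)^2 \ge 0 \;\Rightarrow\; -(x_i^2 + y_i^2) \le 2 x_i y_i \le x_i^2 + y_i^2 \;\Rightarrow\; -1 \le \frac{2\sum_i x_i y_i}{\sum_i x_i^2 + \sum_i y_i^2} \le 1$$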
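A minimal sketch of NSDF in plain Python, assuming the sums run over the overlap part t = τ .. n-1 (the slide does not show the limits). The function name is illustrative, and returning 0 for an all-zero overlap is a choice made here.

```python
def nsdf(s, tau):
    """2 * sum(s(t) s(t-tau)) / (sum(s(t)^2) + sum(s(t-tau)^2)) over the overlap."""
    n = len(s)
    num = 2 * sum(s[t] * s[t - tau] for t in range(tau, n))
    den = sum(s[t] ** 2 + s[t - tau] ** 2 for t in range(tau, n))
    return num / den if den != 0 else 0.0


# Alternating signal with a period of 2 samples.
alt = [1, -1, 1, -1, 1, -1]
values = [nsdf(alt, tau) for tau in range(1, 4)]
```

Because the value is bounded by 1 regardless of signal amplitude, the height of the pitch point can be compared against a fixed threshold.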
![Page 18: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/18.jpg)
NSDF Example
frame2nsdf01.m
Clarity: height of the pitch point
![Page 19: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/19.jpg)
AMDF: Average Magnitude Difference Function
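Original frame s(t) and shifted frame s(t-τ); example with τ = 30
amdf(30) = sum of absolute differences over the overlap part
$$\mathrm{amdf}(\tau) = \sum_{t=\tau}^{n-1} \left| s(t) - s(t-\tau) \right|$$
[Figure: original frame s(t) above the shifted frame s(t-τ), with the overlap part and the pitch period marked]
Quiz candidate!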
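A minimal AMDF sketch in plain Python follows; the names are illustrative. Unlike ACF, the pitch point is a minimum of the curve, so the search uses `min` over a lag range that the caller must supply.

```python
def amdf(s, tau):
    """Sum of absolute differences over the overlap part."""
    return sum(abs(s[t] - s[t - tau]) for t in range(tau, len(s)))


def amdf_pitch_lag(s, lag_min, lag_max):
    """Lag with the smallest AMDF value inside [lag_min, lag_max]."""
    return min(range(lag_min, lag_max + 1), key=lambda tau: amdf(s, tau))


# Sawtooth with a period of 4 samples.
saw = [0, 1, 2, 3] * 5
lag = amdf_pitch_lag(saw, 1, 7)
```

The AMDF also shrinks toward zero at large lags because the overlap gets short, which is one reason its pitch point can be harder to locate than the ACF maximum.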
![Page 20: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/20.jpg)
Comparison between ACF & AMDF
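Formulas
  ACF: $\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$
  AMDF: $\mathrm{amdf}(\tau) = \sum_{t=\tau}^{n-1} \left| s(t) - s(t-\tau) \right|$
Two major advantages of AMDF over ACF
  AMDF requires less computing power
  AMDF is less likely to have the risk of overflow
Quiz candidate!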
![Page 21: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/21.jpg)
Example of AMDF
sunday.wav
  Sample rate = 16 kHz
  Frame size = 512 (starting from point 9000)
Fundamental frequency
  Pitch point occurs at index 131, which is harder to determine
frame2amdf01.m
[Figure: AMDF curve with index 0 and index 131 marked]
![Page 22: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/22.jpg)
Example of AMDF to Pitch
sunday.wav
  Sample rate = 16 kHz
  Frame size = 512 (starting from point 9000)
Fundamental frequency
  Pitch point occurs at index 131, which is determined correctly
  FF = 16000/131 = 123.077 Hz as printed on the slide. Note that 16000/131 ≈ 122.137 Hz, whereas 123.077 Hz = 16000/130, which corresponds to a lag of 130 samples (index 131 under MATLAB's 1-based indexing).
frame2amdf4pt01.m
[Figure: input frame (top) and AMDF vector, method = 1 (bottom), showing the original AMDF4PT, the truncated AMDF4PT, and the AMDF pitch point, with index 0 and index 131 marked]
![Page 23: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/23.jpg)
Example of AMDF Based PT
Specs
  Sample rate = 11025 Hz
  Frame size = 353 points ≈ 32 ms
  Overlap = 0
  Frame rate ≈ 31.25 frames/sec
Playback: original singing, pitch by AMDF
ptByAmdf01.m
[Figure: three panels against time (seconds): waveform, volume, and pitch in semitones; original pitch (blue) and volume-thresholded pitch (red)]
![Page 24: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/24.jpg)
AMDF: Variations to Avoid Tapering
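Normalized version (method = 2), frame2amdf02.m
$$\mathrm{amdf}(\tau) = \frac{\sum_{t=\tau}^{n-1} \left| s(t) - s(t-\tau) \right|}{n-\tau}$$
Half-frame shifting (method = 3), frame2amdf03.m
$$\mathrm{amdf}(\tau) = \sum_{t=0}^{n/2} \left| s(t) - s(t+\tau) \right|$$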
![Page 25: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/25.jpg)
Combining ACF and AMDF
[Figure: four panels showing a frame, its ACF, its AMDF, and the ratio ACF/AMDF]
![Page 26: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/26.jpg)
Audio Features in Time Domain
Audio features presented in the time domain
Intensity
Fundamental period
Timbre: Waveform within an FP
![Page 27: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/27.jpg)
Audio Features in Frequency Domain
Energy: sum of power spectrum
Pitch: distance between harmonics
Timbre: smoothed spectrum
[Figure: spectrum annotated with energy, pitch frequency, first formant F1, and second formant F2]
![Page 28: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/28.jpg)
About DFT & FFT
Terminology
  DFT: Discrete Fourier transform
  FFT: Fast Fourier transform, which is an efficient method for computing DFT
More about DFT
![Page 29: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/29.jpg)
Harmonic Product Spectrum (HPS)
Procedure
1. Compute the power spectrum of a frame
2. Eliminate its trend, obtained from 20th-order polynomial fitting, so that formants are removed
3. Apply exponential weighting to suppress high-frequency harmonics
4. Down-sample and add to enhance the harmonics at the fundamental frequency
5. Find the max as the pitch point
![Page 30: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/30.jpg)
“Down Sample and Add” in HPS
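The "down sample and add" step can be sketched in plain Python as below. This is a simplified illustration, not the author's frame2hps01.m: it works on the log power spectrum, uses a naive O(n²) DFT to stay self-contained, and skips the trend removal and exponential weighting steps. The names, the number of harmonics, and the `eps` floor are assumptions.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at bins 0 .. n/2, computed with a naive DFT (fine for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        x = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(x) ** 2)
    return spec


def hps_pitch_bin(frame, num_harmonics=3, eps=1e-12):
    """Add the log power spectrum down-sampled by 1, 2, ..., and pick the max bin."""
    log_p = [math.log(p + eps) for p in power_spectrum(frame)]
    max_bin = (len(log_p) - 1) // num_harmonics  # keeps r * k inside the spectrum

    def hps(k):
        # Down-sampling by r maps bin r*k onto bin k.
        return sum(log_p[r * k] for r in range(1, num_harmonics + 1))

    return max(range(1, max_bin + 1), key=hps)


# Test frame: harmonics at bins 8, 16, 24, with a weak fundamental.
n = 128
frame = [0.3 * math.cos(2 * math.pi * 8 * t / n)
         + 1.0 * math.cos(2 * math.pi * 16 * t / n)
         + 0.8 * math.cos(2 * math.pi * 24 * t / n) for t in range(n)]
spec = power_spectrum(frame)
strongest_bin = max(range(1, len(spec)), key=lambda k: spec[k])
pitch_bin = hps_pitch_bin(frame)
```

The strongest single peak sits at the second harmonic, yet the summed spectrum peaks at the fundamental. The pitch bin converts to Hz as `pitch_bin * fs / n`.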
![Page 31: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/31.jpg)
Example of HPS
frame2hps01.m
[Figure: five panels: frame (against samples); power spectrum and its trend; trend-subtracted power spectrum and its tapering version; down-sampled versions of power spectrum; harmonic product spectrum (the last four against frequency in Hz)]
![Page 32: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/32.jpg)
Example of PT by HPS
ptByHps01.m
[Figure: three panels against time (seconds): waveform, volume, and pitch in semitones (blue: original pitch, black: volume-thresholded pitch)]
![Page 33: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/33.jpg)
PT by Cepstrum
Formula for cepstrum
$$\mathrm{cepstrum} = \mathrm{ifft}\left(\log\left|\mathrm{fft}(\mathrm{frame})\right|\right)$$
Procedure for PT by cepstrum
1. Compute the power spectrum of a frame.
2. Eliminate the trend of the power spectrum if necessary.
3. Take the inverse FFT on the (symmetric) power spectrum. (The result is real, why?)
4. Find the position of the max to compute the pitch.
![Page 34: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/34.jpg)
PT by Cepstrum: How Does It Work?
Close to sinusoids!
This should be a single pulse only!
![Page 35: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/35.jpg)
Example of Cepstrum
frame2ceps01.m
[Figure: three panels: frame, power spectrum, and cepstrum]
![Page 36: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/36.jpg)
Example of PT by Cepstrum
ptByCeps01.m
[Figure: three panels against time (seconds): waveform, volume, and pitch in semitones (blue: original pitch, black: volume-thresholded pitch)]
![Page 37: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/37.jpg)
Two Parts of PT
PT has two parts
  Voicing detection: decide if a frame has a melody pitch or not
  Pitch estimation: estimate the most likely melody pitch of a frame
These two parts can be performed in any order
Performance evaluation of PT depends on these two parts
![Page 38: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/38.jpg)
Performance Evaluation of PT
Several criteria for PT performance evaluation
  Raw pitch accuracy: probability of a correct pitch value (to within ±¼ tone, i.e. ±0.5 semitone) over the voiced frames
  Raw chroma accuracy: probability that the chroma (i.e. the note name) is correct over the voiced frames
  Overall accuracy: probability of a correct pitch value (via pitch estimation) and a correct pitched decision (via voicing detection) over all frames
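The three criteria can be sketched in plain Python as follows. The sketch assumes pitch vectors in semitones with 0 marking an unvoiced frame, and it counts a voiced reference frame that was estimated as unvoiced as an error; both conventions and all names are assumptions made for this illustration.

```python
def _voiced_pairs(ref, est):
    return [(r, e) for r, e in zip(ref, est) if r > 0]


def raw_pitch_accuracy(ref, est, tol=0.5):
    """Fraction of voiced reference frames whose estimate is within tol semitones."""
    pairs = _voiced_pairs(ref, est)
    return sum(e > 0 and abs(e - r) <= tol for r, e in pairs) / len(pairs)


def raw_chroma_accuracy(ref, est, tol=0.5):
    """Same as raw pitch accuracy, but octave errors are forgiven."""
    def chroma_ok(r, e):
        d = abs(e - r) % 12.0
        return e > 0 and min(d, 12.0 - d) <= tol
    pairs = _voiced_pairs(ref, est)
    return sum(chroma_ok(r, e) for r, e in pairs) / len(pairs)


def overall_accuracy(ref, est, tol=0.5):
    """Fraction of all frames with a correct voicing decision and, if voiced, pitch."""
    def ok(r, e):
        if r <= 0:
            return e <= 0
        return e > 0 and abs(e - r) <= tol
    return sum(ok(r, e) for r, e in zip(ref, est)) / len(ref)


ref = [60, 62, 0, 64]
est = [60.3, 74.1, 0, 0]  # frame 2 is an octave error, frame 4 a missed voicing
```

On this toy data the octave error counts against raw pitch accuracy but not against raw chroma accuracy.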
![Page 39: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/39.jpg)
Preprocessing for Pitch Tracking
Some commonly used preprocessing for the audio signals before pitch tracking
  Pre-filtering the signals
  Clipping the signals
  SIFT method for the signals
![Page 40: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/40.jpg)
Preprocessing: Pre-filtering
Observation
  Range of human pitch: [40, 1000] Hz
Idea
  Low-pass the signals with a cutoff frequency between 800 and 1000 Hz
Characteristics
  The effect is yet to be verified
![Page 41: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/41.jpg)
Preprocessing: Clipping
Observation
  Small signals near zero are likely to cause pitch tracking errors
Idea
  Clip the signals
Characteristics
  Saves computation for embedded systems
  Overall effect is yet to be verified
![Page 42: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/42.jpg)
Preprocessing: SIFT
Observation
  Channel effect is likely to cause pitch tracking errors
Idea of SIFT (simple inverse filter tracking)
  Identify the excitation via LPC
  Use the excitation for PDF
Characteristics
  Overall effect is yet to be verified
![Page 43: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/43.jpg)
Example of SIFT
siftAcf01.m
[Figure: three panels against frame index: original signal vs. LPC estimate; residual signal when order = 20; normalized ACF curves (normalized ACF on original frame, normalized ACF on excitation)]
![Page 44: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/44.jpg)
Example of PT based on SIFT & ACF
ptBySiftAcf01.m
[Figure: three panels against time (seconds): waveform, volume, and pitch in semitones (blue: original pitch, black: volume-thresholded pitch)]
![Page 45: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/45.jpg)
Postprocessing for Pitch Tracking
Some commonly used postprocessing for pitch tracking
  Smoothing to remove abrupt-changing pitch
  Interpolation to increase pitch precision
![Page 46: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/46.jpg)
Postprocessing: Smoothing
Smoothing by a median filter
ptWithMedianFilter01.m
[Figure: three panels against time (seconds): waveform, volume, and pitch in semitones (blue: original pitch, black: volume-thresholded pitch)]
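A median filter over a pitch vector can be sketched in a few lines of plain Python. The function name, the default odd order of 3, and the edge handling (repeating the first and last values) are choices made for this illustration and may differ from ptWithMedianFilter01.m.

```python
def median_filter(pitch, order=3):
    """Replace each value by the median of its neighbourhood (odd order)."""
    n = len(pitch)
    half = order // 2
    out = []
    for i in range(n):
        # Clamp indices so the window repeats the edge values at both ends.
        window = sorted(pitch[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1))
        out.append(window[half])
    return out


noisy = [60, 60, 72, 60, 61, 61]  # one frame jumps up an octave
smoothed = median_filter(noisy)
```

The isolated octave jump is removed while the step from 60 to 61 is kept, which is why a median filter is preferred here over an averaging filter.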
![Page 47: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/47.jpg)
Postprocessing: Interpolation
Idea
  Use the pitch point and its neighbors to identify the max position
ptWithParabolicFit01.m
[Figure: two panels against frame index: original pitch vs. fine-tuned pitch with parabolic fit; pitch difference]
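Parabolic interpolation fits a parabola through the pitch point and its two neighbors and takes the vertex as the refined position. A minimal sketch in plain Python, with an illustrative function name:

```python
def parabolic_peak(values, k):
    """Refine the peak at index k; returns (fractional index, interpolated value)."""
    if k <= 0 or k >= len(values) - 1:
        return float(k), values[k]  # no neighbor on one side
    a, b, c = values[k - 1], values[k], values[k + 1]
    denom = a - 2 * b + c
    if denom == 0:
        return float(k), b  # the three points lie on a line
    offset = 0.5 * (a - c) / denom
    return k + offset, b - 0.25 * (a - c) * offset


# Samples of a parabola whose true maximum is at x = 10.3.
curve = [-(x - 10.3) ** 2 for x in range(20)]
k = max(range(len(curve)), key=lambda i: curve[i])
position, height = parabolic_peak(curve, k)
```

The refined lag can then replace the integer lag in FF = fs / lag, which matters most at high pitch, where one sample of lag corresponds to a large frequency step.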
![Page 48: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/48.jpg)
UPDUDP (1/4)
UPDUDP: Unbroken Pitch Determination Using DP
Goal: to take pitch smoothness into consideration
$$\mathrm{cost}(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} \mathrm{amdf}_i(p_i) + \theta \sum_{i=1}^{n-1} \left| p_i - p_{i+1} \right|^m$$
$\mathbf{p} = [p_1, \dots, p_n]$: a given path in the AMDF matrix
$n$: number of frames
$\theta$: transition penalty
$m$: exponent of the transition difference
Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, No. 10, Vol. 7, Aug 2008.
![Page 49: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/49.jpg)
UPDUDP (2/4)
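Optimum-value function D(i, j): the minimum cost starting from frame 1 to position (i, j)
Recurrent formula:
$$D(i, j) = \mathrm{amdf}_i(j) + \min_{k \in [8, 160]} \left\{ D(i-1, k) + \theta\, |k - j|^2 \right\}, \quad j \in [8, 160]$$
for i = 2, …, n (frame 1 is covered by the initial conditions)
Initial conditions: $D(1, j) = \mathrm{amdf}_1(j),\; j \in [8, 160]$
Optimum cost: $\min_{j \in [8, 160]} D(n, j)$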
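The recurrence can be sketched in plain Python as below. This is an illustration of the dynamic programming idea, not the author's implementation: the matrix columns are indexed from 0 rather than over the lag range [8, 160], and the names `updudp`, `pdf`, `theta`, and `m` are assumptions.

```python
def updudp(pdf, theta, m=2):
    """Minimum-cost path through a (frames x lags) matrix, with backtracking."""
    n, k = len(pdf), len(pdf[0])
    D = [list(pdf[0])]  # initial conditions: D(1, j) = pdf_1(j)
    back = []
    for i in range(1, n):
        row, ptr = [], []
        for j in range(k):
            # Best predecessor lag q, including the transition penalty.
            q = min(range(k), key=lambda q: D[i - 1][q] + theta * abs(q - j) ** m)
            row.append(pdf[i][j] + D[i - 1][q] + theta * abs(q - j) ** m)
            ptr.append(q)
        D.append(row)
        back.append(ptr)
    j = min(range(k), key=lambda q: D[-1][q])
    cost = D[-1][j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path, cost


# Toy AMDF matrix: frame 2 has a spurious deep minimum at lag index 3.
toy = [[5, 0, 5, 5],
       [5, 4, 5, 0],
       [5, 0, 5, 5]]
free_path, free_cost = updudp(toy, theta=0)
smooth_path, smooth_cost = updudp(toy, theta=10)
```

With θ = 0 the path simply follows each frame's minimum and jumps. With θ = 10 the jump costs more than staying on the slightly worse value, so the path stays unbroken.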
![Page 50: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/50.jpg)
Example of UPDUDP
A typical example (via AMDF)
![Page 51: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/51.jpg)
Robustness of UPDUDP
Insensitivity to the transition penalty θ
[Figure: two panels against time (seconds) for the syllables xi, lu, chan, sheng, chang: waveform (top) and pitch in semitones (bottom), with pitch curves for θ = 0, 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000]
![Page 52: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/52.jpg)
Another Example of UPDUDP
Example of MATLAB code using UPDUDP (via ACF)
waveFile='arina_short.wav';
wObj=waveFile2obj(waveFile);
ptOpt=ptOptSet(wObj.fs, wObj.nbits, 1);
pitch=pitchTracking(wObj, ptOpt, 1);
Result
[Figure: five panels against time (sec): waveform of arina_short.wav; PF matrix (white dots: DP path, black dots: pitch after all kinds of thresholding/smoothing); computed pitch (semitone); volume; clarity]
![Page 53: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/53.jpg)
Frequency to Semitone Conversion
Semitone: a musical scale based on A440
$$\mathrm{semitone} = 12 \log_2\!\left(\frac{\mathrm{freq}}{440}\right) + 69$$
Reasonable pitch range: E2 - C6, i.e. 82 Hz - 1047 Hz
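The conversion and its inverse are short enough to check directly in plain Python. The function names are illustrative; the note frequencies used below (E2 ≈ 82.41 Hz, C6 ≈ 1046.5 Hz) are standard equal-temperament values.

```python
import math


def freq_to_semitone(freq):
    """semitone = 12 * log2(freq / 440) + 69, so A440 maps to 69."""
    return 12 * math.log2(freq / 440.0) + 69


def semitone_to_freq(semitone):
    """Inverse mapping back to Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


a4 = freq_to_semitone(440)
a5 = freq_to_semitone(880)
e2 = round(freq_to_semitone(82.41))
c6 = round(freq_to_semitone(1046.5))
```

One octave (a doubling of frequency) is exactly 12 semitones, and the reasonable range E2 to C6 corresponds to semitones 40 to 84.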
![Page 54: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/54.jpg)
Unreliable Pitch Removal (1/2)
Pitch removal via volume thresholding
[Figure: three panels against time (sec): waveform of 小毛驢.wav, volume, and pitch]
![Page 55: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/55.jpg)
Unreliable Pitch Removal (2/2)
Pitch removal via volume/clarity thresholding
[Figure: four panels against time (sec): waveform of 小毛驢.wav, volume, clarity, and pitch]
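Volume/clarity thresholding amounts to zeroing the pitch of any frame that fails either test. A minimal sketch in plain Python, assuming per-frame volume and clarity vectors are already available and that 0 marks "no pitch"; the function name and the threshold values in the example are hypothetical.

```python
def remove_unreliable_pitch(pitch, volume, clarity, volume_th, clarity_th):
    """Keep a pitch value only if the frame is loud enough and clear enough."""
    return [p if (v >= volume_th and c >= clarity_th) else 0
            for p, v, c in zip(pitch, volume, clarity)]


pitch = [55.0, 60.0, 61.0, 47.0]
volume = [100, 6000, 7000, 6500]
clarity = [0.9, 0.95, 0.8, 0.3]
cleaned = remove_unreliable_pitch(pitch, volume, clarity, volume_th=1000, clarity_th=0.5)
```

The first frame is dropped for low volume and the last for low clarity, a case that volume thresholding alone would miss.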
![Page 56: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/56.jpg)
Rest Handling
[Figure: three panels against frame index: original PV, useRest=1, and useRest=0]
Original pitch vectors with rests.
Rests are removed. Good for DTW.
Rests are replaced by previous nonzero pitch. Good for LS.
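The two ways of handling rests can be sketched in plain Python as follows, assuming a rest is stored as pitch 0. The function names are illustrative and are not mapped to the useRest option shown in the figure; leaving leading rests at 0 is a choice made here.

```python
def remove_rests(pv):
    """Drop rest frames entirely (the vector gets shorter)."""
    return [p for p in pv if p > 0]


def fill_rests(pv):
    """Replace each rest by the previous nonzero pitch (the length is kept)."""
    out, prev = [], 0
    for p in pv:
        if p > 0:
            prev = p
        out.append(prev)  # leading rests have no predecessor and stay 0
    return out


pv = [0, 60, 0, 0, 62, 0]
removed = remove_rests(pv)
filled = fill_rests(pv)
```

Removing rests changes the vector length, which DTW tolerates, while filling preserves the frame timing.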
![Page 57: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/57.jpg)
Typical Result of Pitch Tracking
Pitch tracking via autocorrelation for the audio of 茉莉花 (Jasmine Flower)
![Page 58: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/58.jpg)
Comparison of Pitch Vectors
Yellow line: target pitch vector
![Page 59: Pitch Tracking ( 音高追蹤 )](https://reader033.fdocuments.net/reader033/viewer/2022061602/56814467550346895db0fa24/html5/thumbnails/59.jpg)
Other Pitch Related Demos
Pitch scaling
  pitchShiftDemo/project1.exe
  pitchShift-multirate/multirate.m