An Overview of Pitch Detection Algorithms
Alexandre Savard
MUMT611: Music Information Acquisition, Preservation, and Retrieval
February 2006
Content
• Introduction
– Classification
– Applications
– Problems and Constraints
• Time Domain Algorithms
• Frequency Domain Algorithms
• Alternative Techniques
• Conclusion
Introduction: Prior Definitions
– Pitch: The perceptual impression of how high or low a sound is. It is related to the periodicity of the sound.
– Frequency: A physical attribute of a sound or of any other type of signal. It describes the number of times a repeated event occurs per unit of time.
– Fundamental Frequency: In a complex sound or signal, the frequency of the lowest partial.
Introduction: Applications of Pitch Tracking
– Automatic music transcription from audio signals to common music notation or to MIDI note numbers
– Score following
– Musical queries by singing or humming
– Acoustic feature for human-computer interaction
– Sound-editing operations such as pitch shifting and time scaling
Introduction: Non-Exclusive Classification
– Voice (speech, singing)
– Instrumental
– Monophonic
– Polyphonic
– Time-based algorithm
– Spectral-based algorithm
– Alternative
Introduction: Generally Encountered Problems
– Noise
– Reverberation
– Other sounds from the environment
– Short sustained portion in certain sounds
– Sounds must be analyzed right after the attack transient, where they are not yet fully stable
– Detuning during the sustained part of a sound
– Minimal output delay required for real-time use
Introduction: Music-Specific Difficulties
– Large frequency range of musical instruments
– Many instrumental sounds have inharmonic partials
– Expressiveness factors (glissando, vibrato, trill)
– Need for fast algorithms for real-time processing
– Multiphonics
Time Domain
• Zero-Crossing Detection
• Autocorrelation Function
• Average Magnitude Difference Function
Time Domain: Zero-Crossing Detection
– Based on a direct application of the definition of periodicity
– Counts the number of times the signal crosses a reference level
– Computationally inexpensive
– Sensitive to noise
– Performs poorly on signals with significant energy at high frequencies
Time Domain: Zero-Crossing Detection
http://www-ccrma.stanford.edu/~pdelac/154/m154paper.htm#_ftn5
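The counting procedure described for zero-crossing detection can be sketched in a few lines of Python. This is only a minimal illustration, assuming a clean, nearly sinusoidal input and a reference level of zero; the function name `zero_crossing_pitch` and the test tone are hypothetical and not taken from the slides.

```python
import math

def zero_crossing_pitch(samples, sample_rate):
    """Estimate f0 from the average spacing of upward zero crossings."""
    crossings = []
    for n in range(1, len(samples)):
        prev, curr = samples[n - 1], samples[n]
        if prev < 0.0 <= curr:
            # Linear interpolation gives a sub-sample crossing position.
            crossings.append(n - 1 + (-prev) / (curr - prev))
    if len(crossings) < 2:
        return None  # fewer than one full period observed
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / mean_period

# Hypothetical test tone: a 220 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 220.0 * n / sample_rate + 0.3) for n in range(2048)]
estimate = zero_crossing_pitch(tone, sample_rate)
print(estimate)
```

With noise or strong high-frequency partials the signal crosses zero several times per period, which is why the weaknesses listed above matter in practice.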
Time Domain: Autocorrelation Technique
– Cross-correlation is an operation that measures the similarity between two signals.
– The corresponding samples of one signal and of a time-shifted version of the other are multiplied and summed.
– The cross-correlation function then has a peak at the offset that corresponds to maximum similarity.
Time Domain: Autocorrelation Technique
– Autocorrelation is the cross-correlation of a signal with itself.
– Maximum similarity occurs at a time shift of zero.
– In theory, another maximum should occur when the time shift equals the fundamental period.
Time Domain: Autocorrelation Technique
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html
Time Domain: Autocorrelation Technique
– Not very effective for high fundamental frequencies.
– Direct computation of the correlation sum is expensive.
– Computational efficiency can be improved by using the FFT algorithm instead of direct convolution, which reduces the cost from O(N²) to O(N log N).
– Most variations of this technique differ in the mathematical definition of the autocorrelation used, in the way the maxima are located, and in how errors in identifying the maximum are reduced.
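As a rough illustration of the autocorrelation technique, the sketch below computes the correlation sum directly (the expensive O(N²)-style form, not the FFT variant) and picks the lag with the largest value inside an assumed search band. The function names, the 50 to 1000 Hz band, and the test signal are assumptions made for this example only.

```python
import math

def autocorrelation(samples, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag (direct form)."""
    n_samples = len(samples)
    return [
        sum(samples[n] * samples[n + k] for n in range(n_samples - k))
        for k in range(max_lag + 1)
    ]

def autocorrelation_pitch(samples, sample_rate, f_min=50.0, f_max=1000.0):
    """Pick the lag with maximum autocorrelation inside the search band."""
    min_lag = int(sample_rate / f_max)  # skips the trivial peak at lag 0
    max_lag = int(sample_rate / f_min)
    r = autocorrelation(samples, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return sample_rate / best_lag

# Hypothetical test signal: 200 Hz fundamental with two weaker harmonics.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * 200 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 600 * n / sample_rate)
    for n in range(1024)
]
estimate = autocorrelation_pitch(signal, sample_rate)
print(estimate)
```

Because the sum uses fewer terms at larger lags, peaks at multiples of the period are slightly attenuated, which helps this simple version avoid picking twice the true period on clean input.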
Time Domain: Average Magnitude Difference Function
– It is an alternative to the autocorrelation function.
– It computes the average magnitude of the difference between the signal and a time-shifted version of itself.
– While the autocorrelation has peaks at maximum similarity, the average magnitude difference function has valleys.
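A minimal sketch of the average magnitude difference function follows. To avoid landing in a later valley (at a multiple of the period), this example takes the first valley that drops below a fraction of the largest AMDF value; that threshold rule, the function names, and the test signal are assumptions of this sketch rather than part of the original slides.

```python
import math

def amdf(samples, max_lag):
    """d[k] = mean of |x[n] - x[n + k]| over the overlapping samples."""
    n_samples = len(samples)
    return [
        sum(abs(samples[n] - samples[n + k]) for n in range(n_samples - k))
        / (n_samples - k)
        for k in range(max_lag + 1)
    ]

def amdf_pitch(samples, sample_rate, f_min=50.0, f_max=1000.0, rel_threshold=0.2):
    """Locate the first deep valley of the AMDF inside the search band."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    d = amdf(samples, max_lag)
    threshold = rel_threshold * max(d)
    lag = next((k for k in range(min_lag, max_lag + 1) if d[k] < threshold), None)
    if lag is None:
        # No valley is deep enough: fall back to the global minimum.
        lag = min(range(min_lag, max_lag + 1), key=lambda k: d[k])
    # Walk down to the bottom of the valley that was entered.
    while lag < max_lag and d[lag + 1] < d[lag]:
        lag += 1
    return sample_rate / lag

# Hypothetical test signal: 200 Hz fundamental with two weaker harmonics.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * 200 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 600 * n / sample_rate)
    for n in range(1024)
]
estimate = amdf_pitch(signal, sample_rate)
print(estimate)
```

The AMDF needs only subtractions and absolute values, which is one reason it is attractive as an alternative to the multiply-and-add autocorrelation sum.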
Time Domain: Other Temporal Algorithms
– Waveform Maximum Detection
– Sum Magnitude Difference Squared Function
– Average Squared Difference Function
– Cumulative Mean Normalized Difference Function
– Circular Average Magnitude Difference Function
– Adaptive Filter
– Super Resolution Pitch Determination
Frequency Domain
• Harmonic Product Spectrum
• Cepstrum
Frequency Domain: Harmonic Product Spectrum
– The FFT is used to convert the temporal representation of the sound into its spectral representation
– Assumes that the signal is made of harmonic partials
– The spectrum is compressed by integer factors corresponding to the harmonic numbers
– Multiplying the compressed spectra with the original one amplifies the fundamental frequency
Frequency Domain: Harmonic Product Spectrum
– The highest peak most likely corresponds to the fundamental frequency
http://www-ccrma.stanford.edu/~pdelac/154/m154paper.htm#_ftn5
Frequency Domain: Harmonic Product Spectrum
– Highly robust in noisy environments
– Less effective for sounds that are not made of harmonic components
– Computationally inexpensive
– Octave errors can occur
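The harmonic product spectrum steps (FFT, compression by integer factors, multiplication, peak picking) might look as follows in Python. The small recursive `fft` helper, the Hann window, the choice of four harmonics, and the test signal are all assumptions of this sketch; a production version would normally rely on an optimized FFT library.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def hps_pitch(samples, sample_rate, num_harmonics=4):
    n = len(samples)
    # Hann window to limit spectral leakage.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    magnitude = [abs(c) for c in fft(windowed)[: n // 2]]
    # Multiply the spectrum by copies of itself compressed by 2, 3, ...
    limit = len(magnitude) // num_harmonics
    product = magnitude[:limit]
    for h in range(2, num_harmonics + 1):
        product = [product[k] * magnitude[k * h] for k in range(limit)]
    peak_bin = max(range(1, limit), key=lambda k: product[k])
    return peak_bin * sample_rate / n

# Hypothetical test signal: four harmonics of 250 Hz with 1/h amplitudes.
sample_rate = 8000
signal = [
    sum(math.sin(2 * math.pi * 250 * h * n / sample_rate) / h for h in range(1, 5))
    for n in range(2048)
]
estimate = hps_pitch(signal, sample_rate)
print(estimate)
```

The frequency resolution here is one FFT bin (sample_rate / n), so longer frames or interpolation around the peak would be needed for finer estimates.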
Frequency Domain: Cepstrum
– The cepstrum is defined as the inverse Fourier transform of the logarithm of the power spectrum of a signal
– The cepstrum extracts periodicity from the spectrum
– It can be written informally as:
$$c(\tau) = \mathcal{F}^{-1}\left\{ \log \left| \mathcal{F}\{ s(t) \} \right|^{2} \right\}$$
– The result has a peak that corresponds to the fundamental period
Frequency Domain: Calculation of the Cepstrum for Voice
– In the source-filter model, voiced speech s(t) can be considered as the convolution of a pulse train p(t) with the impulse response of the vocal tract h(t).
– In the spectrum we get:
$$S(f) = P(f)\,H(f)$$
– Taking the logarithm of the power spectrum on both sides we then obtain:
$$\log |S(f)|^{2} = \log |P(f)|^{2} + \log |H(f)|^{2}$$
Frequency Domain: Cepstrum
– The logarithm flattens the spectrum, which makes the method more robust to formants
– However, the same operation raises the noise level
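The cepstrum definition (inverse Fourier transform of the log power spectrum) can be tried out with the sketch below. The recursive `fft` helper, the Hann window, the small constant added before the logarithm, the 150 to 500 Hz search band, and the pulse-train test signal are assumptions chosen for this illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def cepstrum_pitch(samples, sample_rate, f_min, f_max):
    n = len(samples)
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    # Log power spectrum; the small constant avoids log(0).
    log_power = [math.log(abs(c) ** 2 + 1e-12) for c in fft(windowed)]
    # The log power spectrum of a real signal is real and even, so a
    # forward FFT scaled by 1/n equals the inverse FFT.
    cepstrum = [c.real / n for c in fft(log_power)]
    min_q = int(sample_rate / f_max)
    max_q = int(sample_rate / f_min)
    peak_q = max(range(min_q, max_q + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak_q

# Hypothetical test signal: a pulse train with a period of 32 samples (250 Hz).
sample_rate = 8000
pulses = [1.0 if n % 32 == 0 else 0.0 for n in range(2048)]
estimate = cepstrum_pitch(pulses, sample_rate, f_min=150.0, f_max=500.0)
print(estimate)
```

The search band is kept narrow on purpose: the cepstrum of a periodic signal also has peaks at multiples of the fundamental period, and a wide band could select one of those instead.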
Frequency Domain: Other Frequency Domain Algorithms
– Maximum Likelihood
– Linear Prediction Coding
– Spectral Autocorrelation
Alternative Technique: Teager Energy Function
– Referring again to the source-filter model, voice can be represented as a pulse train filtered by the vocal tract.
– The pulse train is produced by the successive opening and closing of the glottis.
– The production of speech is closely related to the release of energy through the glottis.
– Each opening/closing of the glottis results in a peak of energy in the signal.
Alternative Technique: Teager Energy Function
– The Teager energy function is a non-linear operator that defines the instantaneous energy, in its usual discrete-time form, as:
$$\Psi[x(n)] = x^{2}(n) - x(n-1)\,x(n+1)$$
– It is derived from the total energy of an oscillatory spring-mass system.
– Estimating the periodicity of the energy peaks of the signal leads to an approximation of the fundamental frequency.
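A small sketch of this idea is given below. It assumes the common discrete-time form of the Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and a toy source-filter signal (impulses every 40 samples exciting a damped resonance). The peak-picking rule with a relative threshold and all names are assumptions made for this example, not the method of any cited paper.

```python
import math

def teager_energy(samples):
    """psi[n] = x[n]^2 - x[n-1] * x[n+1], computed for interior samples."""
    return [
        samples[n] ** 2 - samples[n - 1] * samples[n + 1]
        for n in range(1, len(samples) - 1)
    ]

def teager_pitch(samples, sample_rate, rel_threshold=0.5):
    """Estimate f0 from the mean spacing of the Teager energy peaks."""
    energy = teager_energy(samples)
    threshold = rel_threshold * max(energy)
    peaks = [
        n for n in range(1, len(energy) - 1)
        if energy[n] > threshold
        and energy[n] >= energy[n - 1]
        and energy[n] > energy[n + 1]
    ]
    if len(peaks) < 2:
        return None  # not enough energy peaks to measure a period
    mean_period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return sample_rate / mean_period

# Toy source-filter signal (hypothetical): impulses every 40 samples,
# each exciting a damped 1 kHz resonance that stands in for the vocal tract.
sample_rate = 8000
period = 40
signal = [0.0] * 1024
for start in range(0, len(signal), period):
    for m in range(len(signal) - start):
        signal[start + m] += math.exp(-m / 6.0) * math.cos(2 * math.pi * 0.125 * m)

estimate = teager_pitch(signal, sample_rate)
print(estimate)
```

On real speech the energy peaks are far less clean, so some smoothing and a more careful peak selection would likely be needed.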
Alternative Technique: Miscellaneous Techniques
– Wavelet Transform
– Bayesian Statistical Model
– Hidden Markov Model
– Graphical Probabilistic Models
– Perceptual Pitch Detector
Conclusion