Lec04, Speech II, v1.06.ppt - ce.sharif.educe.sharif.edu/courses/91-92/2/ce342-1/resources... ·...
Multimedia Systems
Speech II
Course Presentation
Mahdi Amiri
February 2013
Sharif University of Technology
Road Map
Speech Compression
Based on Time Domain analysis
Differential Pulse-Code Modulation (DPCM)
Adaptive DPCM (ADPCM)
Based on Frequency Domain analysis
Linear Predictive Coding (LPC)
Code Excited Linear Prediction (CELP)
Page 1 Multimedia Systems, Speech II
Differential PCM (DPCM): Idea
Take advantage of data redundancy
PCM samples: [… 110 112 111 112 112 114 115 115 114 114 …]
Differences: [… +2 -1 +1 0 +2 +1 0 -1 0 …]
Or histogram of PCM samples in a chunk of digitized audio.
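The difference sequence above can be reproduced with a minimal sketch. It assumes the simplest possible predictor (the previous sample) and no quantization; the function names `dpcm_encode` and `dpcm_decode` are illustrative and do not come from the slides.

```python
def dpcm_encode(samples):
    """Keep the first sample, then store only sample-to-sample differences."""
    diffs = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        diffs.append(cur - prev)
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out


pcm = [110, 112, 111, 112, 112, 114, 115, 115, 114, 114]
encoded = dpcm_encode(pcm)
decoded = dpcm_decode(encoded)
print(encoded)  # [110, 2, -1, 1, 0, 2, 1, 0, -1, 0]
```

The differences span a much smaller range than the raw samples, which is the redundancy a DPCM coder exploits by spending fewer bits per value.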
Differential PCM (DPCM): Basic Scheme
General Predictive Coding
Delta Modulation (DM): $\sum_i a_i x_{n-i} \Rightarrow z^{-1}$
Problem?
Differential PCM (DPCM): Error Propagation
General Predictive Coding
The output of the dequantizer in the decoder is not equal to the input of the
quantizer in the encoder ⇒ the input of the predictor in the decoder is not the
same as the input of the predictor in the encoder ⇒ this is the source of error
propagation.
Differential PCM (DPCM): Better Structure
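A small experiment can make the error propagation concrete. The sketch below is not taken from the slides: it assumes a previous-sample predictor, a uniform quantizer, and that the better structure is the usual closed-loop arrangement in which the encoder predicts from its own reconstructed output. All names (`quantize`, `open_loop`, `closed_loop`) are hypothetical.

```python
import math


def quantize(d, step):
    """Uniform quantizer: round d to the nearest multiple of step."""
    return step * math.floor(d / step + 0.5)


def open_loop(x, step):
    """Encoder predicts from the ORIGINAL previous sample."""
    rec = [x[0]]
    for n in range(1, len(x)):
        q = quantize(x[n] - x[n - 1], step)
        rec.append(rec[-1] + q)  # decoder can only add to its own output
    return rec


def closed_loop(x, step):
    """Encoder predicts from the RECONSTRUCTED previous sample."""
    rec = [x[0]]
    for n in range(1, len(x)):
        q = quantize(x[n] - rec[-1], step)
        rec.append(rec[-1] + q)
    return rec


x = list(range(20))  # slow ramp: every true difference is 1
step = 4
err_open = [abs(a - b) for a, b in zip(x, open_loop(x, step))]
err_closed = [abs(a - b) for a, b in zip(x, closed_loop(x, step))]
print(max(err_open), max(err_closed))  # 19 2
```

In the open-loop version every difference of 1 is quantized to 0, so the decoder never moves and the error grows with each sample. In the closed-loop version the quantization error is fed back into the next prediction, so it stays within half a quantization step.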
Adaptive DPCM (ADPCM): Idea
Problem?
Adaptive DPCM (ADPCM): Size of Quantization Step
Delta Modulation (DM)
1-bit quantizer: 0 means $+\Delta$ and 1 means $-\Delta$
Adaptive Delta Modulation (ADM)
$\Delta[n] = M \, \Delta[n-1]$
$M = P > 1$ if $c[n] = c[n-1]$
$M = Q < 1$ if $c[n] \neq c[n-1]$
$P = 2, \; Q = \tfrac{1}{2}$
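The ADM step-size rule can be sketched as follows. This is a minimal illustration under stated assumptions: the initial step `delta0`, the starting approximation of 0, and the function names are not from the slides. The bit convention (0 means +Δ, 1 means −Δ) and the values P = 2, Q = 1/2 follow the text above.

```python
def adm_encode(x, delta0=1.0, P=2.0, Q=0.5):
    """Return the 1-bit code c[n] for each sample (0: step up, 1: step down)."""
    bits, approx, delta, prev_bit = [], 0.0, delta0, None
    for sample in x:
        bit = 0 if sample >= approx else 1
        if prev_bit is not None:
            # Same bit as before: grow the step; different bit: shrink it.
            delta *= P if bit == prev_bit else Q
        approx += delta if bit == 0 else -delta
        bits.append(bit)
        prev_bit = bit
    return bits


def adm_decode(bits, delta0=1.0, P=2.0, Q=0.5):
    """Rebuild the staircase approximation from the bit stream alone."""
    out, approx, delta, prev_bit = [], 0.0, delta0, None
    for bit in bits:
        if prev_bit is not None:
            delta *= P if bit == prev_bit else Q
        approx += delta if bit == 0 else -delta
        out.append(approx)
        prev_bit = bit
    return out


x = [10.0] * 8  # a sudden jump from 0 to a constant level
bits = adm_encode(x)
decoded = adm_decode(bits)
print(bits)     # [0, 0, 0, 0, 1, 1, 0, 0]
print(decoded)  # [1.0, 3.0, 7.0, 15.0, 11.0, 3.0, 7.0, 15.0]
```

The step doubles while the coder is chasing the jump, which reduces slope overload compared with a fixed Δ. The decoder needs no side information because it derives the same step sizes from the bit stream.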
Speech Compression Concepts: FFT, No Time Localization
Speech Signal
FFT
(is only localized in frequency)
Joseph Fourier, 1768-1830
Speech Compression Concepts: FFT, No Time Localization
See Power Spectral Density (PSD) examples in MATLAB
Speech Compression Concepts: STFT
Speech Signal
STFT
(fixed time and frequency localization)
Dennis Gabor, 1900-1979
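To illustrate what the STFT adds over a single FFT, the sketch below computes a magnitude STFT with a naive DFT and a Hann window, using the standard library only. The test signal (500 Hz followed by 2000 Hz), the 8 kHz sampling rate, the window size of 64, and the function names are all assumptions chosen for the example, not values from the lecture.

```python
import cmath
import math


def stft(signal, window_size, hop):
    """Magnitude spectrum (bins 0..N/2) of each Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
              for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(window_size)]
        spectrum = []
        for k in range(window_size // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n in range(window_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def tone(freq_hz, count, fs):
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(count)]


fs = 8000  # assumed sampling rate in Hz
N = 64     # window size, so bins are fs / N = 125 Hz apart
signal = tone(500, 256, fs) + tone(2000, 256, fs)
frames = stft(signal, N, hop=N)
peaks = [max(range(len(s)), key=s.__getitem__) for s in frames]
print(peaks)  # [4, 4, 4, 4, 16, 16, 16, 16]
```

A single FFT of the whole signal would show both tones but not when each occurs. The per-frame peaks show the change from bin 4 (500 Hz) to bin 16 (2000 Hz) halfway through.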
Speech Compression Concepts: Spectrogram
3D surface spectrogram of part of a music piece.
Speech Compression Concepts: Spectrogram
Spectrogram of a male voice saying ‘nineteenth century’.
Speech Compression Concepts: Spectrogram Display in Audacity
Waveform
Spectrogram
Speech Compression Concepts: Spectrogram Display in Audacity
Audacity | Edit | Preferences | Spectrograms | FFT Window | Window size
FFT window size: 128
FFT window size: 1024
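The effect of the two window sizes can be estimated with a short calculation. The sketch assumes a 44100 Hz sampling rate, which the slides do not state, and uses the rule of thumb that bin spacing is sample_rate / window_size and that one frame spans window_size / sample_rate seconds. The function name `stft_resolution` is illustrative.

```python
def stft_resolution(window_size, sample_rate):
    """Return (frequency bin spacing in Hz, frame duration in ms)."""
    freq_res_hz = sample_rate / window_size
    time_res_ms = 1000.0 * window_size / sample_rate
    return freq_res_hz, time_res_ms


SAMPLE_RATE = 44100  # assumed; not given in the slides

for size in (128, 1024):
    freq_res, time_res = stft_resolution(size, SAMPLE_RATE)
    print(f"window {size:5d}: {freq_res:8.2f} Hz per bin, {time_res:6.2f} ms per frame")
```

Under this assumption the 128-sample window gives roughly 345 Hz per bin and 2.9 ms per frame, while the 1024-sample window gives roughly 43 Hz per bin and 23.2 ms per frame: a larger window sharpens frequency detail and blurs timing.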
Speech Compression Concepts: Spectrogram, Demonstration
Bat Echolocation Call
Flute by Jean Pierre Rampal
Singing Voice
Face!
Speech Compression Concepts: Formant
The time and frequency domain representation of the vowels /a/, /i/, and /u/
Speech Compression Concepts: Sample Application
A computing system to answer
questions posed in natural language
Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson
www-943.ibm.com/innovation/us/watson/
Dr. David Ferrucci, Watson Principal Investigator
Linear Predictive Coding (LPC): Modeling
Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
Buzzer → Filter
Speech = Formants + Residue
Chunks: 30 to 50 frames/sec.
Predictor for each frame: $\tilde{x}[n] = \sum_{i=1}^{P} a_i \, x[n-i]$
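One common way to obtain the predictor coefficients $a_i$ for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method is used, so the sketch below is an assumption; the test signal (the impulse response of a known second-order all-pole filter) and the function names are also hypothetical.

```python
def autocorr(x, max_lag):
    """r[m] = sum over n of x[n] * x[n - m], for m = 0..max_lag."""
    return [sum(x[n] * x[n - m] for n in range(m, len(x)))
            for m in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin: coefficients a_1..a_P of x~[n] = sum_i a_i x[n-i]."""
    r = autocorr(x, order)
    a = []        # a[j] holds a_{j+1}
    err = r[0]    # prediction error energy
    for m in range(1, order + 1):
        k = (r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))) / err
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a


# Test frame: impulse response of x[n] = 1.3 x[n-1] - 0.6 x[n-2] + delta[n]
true_a = [1.3, -0.6]
frame = []
for n in range(200):
    value = 1.0 if n == 0 else 0.0
    for i, coeff in enumerate(true_a, start=1):
        if n - i >= 0:
            value += coeff * frame[n - i]
    frame.append(value)

estimated = lpc(frame, order=2)
print([round(c, 6) for c in estimated])
```

For this synthetic frame the recursion recovers coefficients very close to 1.3 and -0.6. On real speech the coefficients describe the formants, and whatever the predictor cannot explain is the residue.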
Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
The human vocal tract as an infinite impulse response (IIR) system
Vowel /a/
LPC Block Diagram
Linear Predictive Coding (LPC): Original Paper, Atal-Hanauer 1971
Comparison of wide-band sound spectrograms for synthetic and original speech signal for the utterance "It's
time we rounded up that herd of Asian cattle," spoken by a male speaker
Original
Synthetic
Linear Predictive Coding (LPC): Voiced Frame Example
Original vs. Synthetic (Time Domain, Frequency Domain)
180 samples, Pitch period: 75
Linear Predictive Coding (LPC): Unvoiced Frame Example
Original vs. Synthetic (Time Domain, Frequency Domain)
Synthetic: white noise with uniform distribution
180 samples
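The two frame examples correspond to the two excitation choices of an LPC decoder: a pulse train (buzz) for voiced frames and white noise (hiss) for unvoiced frames, each passed through the all-pole filter. The sketch below uses the frame length of 180 samples and the pitch period of 75 from the slides; the filter coefficients, the noise seed, and the function names are hypothetical.

```python
import random


def synthesize(excitation, a):
    """All-pole (IIR) synthesis: y[n] = e[n] + sum_i a_i * y[n-i]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc += coeff * y[n - i]
        y.append(acc)
    return y


def buzz(length, pitch_period):
    """Voiced excitation: one unit impulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]


def hiss(length, seed=0):
    """Unvoiced excitation: uniformly distributed white noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


a = [1.3, -0.6]  # hypothetical filter coefficients for one frame
voiced_excitation = buzz(180, 75)
unvoiced_excitation = hiss(180)
voiced = synthesize(voiced_excitation, a)
unvoiced = synthesize(unvoiced_excitation, a)
print(voiced[:3])
```

Only the coefficients, the voiced/unvoiced decision, the pitch period, and a gain need to be transmitted per frame, which is why LPC reaches very low bit rates.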
Code Excited Linear Prediction: CELP
Problem of LPC: where there is both hiss and buzz
Solution: encode the residue
Method: vector quantization (codebook)
Encoder
Decoder
Vector Quantization: Block Diagram
Vector Quantization: Example
Sample scalar quantizer
We have 3 possible colors for each square, so we can quantize each square with 2 bits ⇒ 28 * 2 = 56 bits for all 28 (7*4) squares.
Sample vector quantizer
We have 8 forms in the codebook, so we can quantize each form with 3 bits ⇒ 7 * 3 = 21 bits for all 28 (7*4) squares.
Codebook
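The bit counts above, and the encoding step itself, can be checked with a short sketch. The slide's figure is not available, so the codebook entries below are invented: each hypothetical form is a column of 4 squares with colors 0, 1, 2, and the picture is 7 such columns. The function names are illustrative.

```python
import math


def bits_per_symbol(alphabet_size):
    """Smallest whole number of bits that can index alphabet_size symbols."""
    return math.ceil(math.log2(alphabet_size))


scalar_bits = 28 * bits_per_symbol(3)  # one code per square
vector_bits = 7 * bits_per_symbol(8)   # one code per 4-square form

# Hypothetical codebook: 8 forms, each a column of 4 squares.
codebook = [
    (0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2), (0, 1, 0, 1),
    (1, 0, 1, 0), (0, 0, 1, 1), (1, 1, 2, 2), (2, 1, 0, 0),
]


def mismatch(u, v):
    """Number of squares in which two forms differ."""
    return sum(1 for p, q in zip(u, v) if p != q)


def vq_encode(columns):
    """Replace each column by the index of the closest codebook form."""
    return [min(range(len(codebook)), key=lambda i: mismatch(col, codebook[i]))
            for col in columns]


def vq_decode(indices):
    return [codebook[i] for i in indices]


picture = [codebook[i] for i in (3, 0, 7, 7, 1, 5, 2)]  # 7 columns of 4 squares
indices = vq_encode(picture)
print(scalar_bits, vector_bits, indices)  # 56 21 [3, 0, 7, 7, 1, 5, 2]
```

The saving comes from coding groups of squares jointly. The cost is that the decoder must hold the same codebook, and any column not in the codebook is replaced by its nearest form.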
Vector Quantization: Codebook Design
Comparison of Speech Coders: Sample Speech
A lathe is a big tool. Grab every dish of sugar.
Comparison of Speech Coders: Demonstration
Original, ADPCM, LPC, CELP
Speech Coding: ITU-T Standards
G.711: PCM, μ-law and A-law; 64, 80 and 96 kbps
G.722: ADPCM; 48, 56 and 64 kbps
G.728: a form of CELP; 16 kbps
Vocoders
Check out a complete list at http://en.wikipedia.org/wiki/List_of_codecs#Audio_codecs
A comparison of Internet audio compression formats: http://www.sericyb.com.au/audio.html
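The μ-law companding mentioned for G.711 can be sketched with the continuous formula $F(x) = \operatorname{sgn}(x)\,\ln(1 + \mu |x|) / \ln(1 + \mu)$ with μ = 255. This is an illustration only: the actual G.711 codec uses a segmented, piecewise-linear approximation of this curve on integer samples, and the function names below are not from any library.

```python
import math

MU = 255


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], expanding small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


# 8000 samples per second at 8 bits per sample gives the 64 kbps rate.
bit_rate = 8000 * 8

for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(f"x = {x:4.2f} -> y = {y:.4f} -> back = {mu_law_expand(y):.4f}")
```

Quiet samples are spread over a larger share of the output range before uniform 8-bit quantization, so the relative quantization error stays roughly constant across loud and soft passages.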
Speech Coding: Free and Open Source Code
HawkVoice
http://hawksoft.com/hawkvoice/
Check out voice samples of HawkVoice™ codecs at
http://hawksoft.com/hawkvoice/codecs.shtml
Thank You
Multimedia Systems
Speech II
Next Session: Entropy Coding
FIND OUT MORE AT...
1. http://ce.sharif.edu/~m_amiri/
2. http://www.dml.ir/