Lec04, Speech II, v1.08.ppt - ce.sharif.educe.sharif.edu/courses/92-93/2/ce342-1/resources... · 0...
Transcript of Lec04, Speech II, v1.08.ppt - ce.sharif.educe.sharif.edu/courses/92-93/2/ce342-1/resources... · 0...
Multimedia SystemsMultimedia Systems
Speech IISpeech II
Course PresentationCourse Presentation
Mahdi Amiri
February 2014
Sharif University of Technology
Speech Compression
Based on Time Domain analysis
Differential Pulse-Code Modulation (DPCM)
Adaptive DPCM (ADPCM)
Road MapRoad Map
Page 1 Multimedia Systems, Speech II
Based on Frequency Domain analysis
Linear Predictive Coding (LPC)
Code Excited Linear Prediction (CELP)
Differential PCM (DPCM)IdeaIdea
Take advantage of data redundancy
Page 2 Multimedia Systems, Speech II
[… 110 112 111 112 112 114 115 115 114 114… ] [… +2 -1 +1 0 +2 +1 0 -1 0 …]
Or histogram of PCM samples in a chunk
of digitized audio.Typ. Chunk length: 50 ms
Differential PCM (DPCM)
Simplest prediction: The difference between the current
sample value and previous sample is quantized. It means
we predicted current sample by assuming it to be equal to
the previous sample.
Another better prediction: The difference between the
PredictionPrediction
Page 3 Multimedia Systems, Speech II
Another better prediction: The difference between the
current sample value and the average of e.g. 2 previous
samples.
Much better prediction: The difference between the
current sample value and the weighted average of e.g. 10
previous samples.
Differential PCM (DPCM)Basic SchemeBasic Scheme
General Predictive Coding
Page 4 Multimedia Systems, Speech II
1Delta Modulation (DM): i n ia x z−−⇒∑
Problem?
Differential PCM (DPCM)Error PropagationError Propagation
General Predictive Coding
Page 5 Multimedia Systems, Speech II
The output of dequantizer in decoder is not
equal with the input of the quantizer in the
encoder
� The input of predictor in decoder is not
the same as input values of predictor in
encoder
� This is the source of error propagation.
General Predictive Coding
x_n: 0 0.6 1.2 1.8 2.4 3.0 3.6 4.2
p_n: 0 0 0.6 1.2 1.8 2.4 3.0 3.6
d_n: 0 0.6 0.6 0.6 0.6 0.6 0.6 0.6
Q : 0 1 1 1 1 1 1 1
^Q : 0 1 1 1 1 1 1 1
^dn: 0 1 1 1 1 1 1 1
^pn: 0 0 1 2 3 4 5 6
^xn: 0 1 2 3 4 5 6 7
Err: 0 0.4 0.8 1.2 1.6 2.0 2.4 2.8
Error propagation example in above structure when input signal increases constantly by +0.6.
Quantization step size is 1; Therefore, e.g. -0.5 thr. 0.5 quantized as 0 and 0.5 thr. 1.5 quantized as 1.
Differential PCM (DPCM)Better StructureBetter Structure
Page 6 Multimedia Systems, Speech II
Adaptive DPCM (ADPCM)IdeaIdea
Page 7 Multimedia Systems, Speech II
Problem?
Adaptive DPCM (ADPCM)Size of Quantization StepSize of Quantization Step
Delta Modulation (DM)
1 bit quantizer: 0 means + and 1 means ∆ −∆
Page 8 Multimedia Systems, Speech II
ADM: [ ] [ 1]n M n∆ = ∆ −
12, 2
P Q= =
1 if [ ] [ 1]
1 if [ ] [ 1]
M P c n c n
M Q c n c n
= > = −
= < ≠ −
Adaptive Delta Modulation (ADM)
Speech Compression ConceptsFFT, No Time LocalizationFFT, No Time Localization
Speech Signal
Page 9 Multimedia Systems, Speech II
FFT
(is only localized in frequency)
Joseph Fourier, 1768-1830
Speech Compression ConceptsFFT, No Time Localization, Ex. 1FFT, No Time Localization, Ex. 1
Page 10 Multimedia Systems, Speech II
See Power Spectral Density (PSD) examples in MATLAB
0 0.2 0.4 0.6 0.8 1-1
-0.5
0
0.5
1x4 = x1+x2+x3
1x5 = x1 then x2 then x3
Speech Compression ConceptsFFT, No Time Localization, Ex. 2FFT, No Time Localization, Ex. 2
0 0.005 0.01 0.015-1
0
1x1, freq.: 400 Hz
0 1000 2000 3000 4000 5000 6000 7000 80000
50
100
150PSD x1
Frequency (Hz)
0 0.005 0.01 0.015-1
0
1x1, freq.: 900 Hz
0 1000 2000 3000 4000 5000 6000 7000 80000
50
100
150PSD x2
Frequency (Hz)
1x1, freq.: 1600 Hz
150PSD x3
0 1000 2000 3000 4000 5000 6000 7000 80000
100
200
300
400
500PSD x4
Frequency (Hz)
500PSD x5
X4 = (x1 + x2 + x3) / 3
0 0.2 0.4 0.6 0.8 1-1
-0.5
0
0.5
1
0 0.2 0.4 0.6 0.8 1-1
-0.5
0
0.5
1x6 = x3 then x1 then x2
Page 11 Multimedia Systems, Speech II
See Power Spectral Density (PSD) examples in MATLAB
0 0.005 0.01 0.015-1
0
1
0 1000 2000 3000 4000 5000 6000 7000 80000
50
100
150
Frequency (Hz)
0 0.005 0.01 0.015-3
-2
-1
0
1
2
3x4 = x1+x2+x3
0 1000 2000 3000 4000 5000 6000 7000 80000
100
200
300
400
500
Frequency (Hz)
0 1000 2000 3000 4000 5000 6000 7000 80000
100
200
300
400
500PSD x6
Frequency (Hz)
0.325 0.33 0.335 0.34-1
-0.5
0
0.5
1x5 = x1 then x2 then x3
0.325 0.33 0.335 0.34 0.345-1
-0.5
0
0.5
1x6 = x3 then x1 then x2
Part of
x4
X5 = x1 then x2 then x3
X6 = x3 then x1 then x2Zoomed
Parts �
Part of
x5
Part of
x6
Speech Compression ConceptsSTFTSTFT
Speech Signal
Page 12 Multimedia Systems, Speech II
STFT
(fixed time and frequency localization)
Dennis Gabor, 1900-1979
Speech Compression ConceptsSpectrogramSpectrogram
Page 13 Multimedia Systems, Speech II
3D surface spectrogram of a part
from a music piece.
Speech Compression ConceptsSpectrogramSpectrogram
Page 14 Multimedia Systems, Speech II
Spectrogram of a male voice saying ‘nineteenth century’.
Speech Compression ConceptsSpectrogram Display in Spectrogram Display in AudaCityAudaCity
Page 15 Multimedia Systems, Speech II
Waveform
Spectrogram
Speech Compression ConceptsSpectrogram Display in Spectrogram Display in AudaCityAudaCity
AudaCity | Edit | Preferences |
Spectrograms | FFT Window |
Window size
Page 16 Multimedia Systems, Speech II
FFT Window size:128
FFT Window size:1024
Speech Compression ConceptsSpectrogram, DemonstrationSpectrogram, Demonstration
Bat Echolocation Call Flute by Jean Pierre Rampal
Page 17 Multimedia Systems, Speech II
Bat Echolocation Call Flute by Jean Pierre Rampal
Singing Voice Face!
Speech Compression ConceptsFormantFormant
Page 18 Multimedia Systems, Speech II
The time and frequency domain
presentation of vowels /a/, /i/, and /u//a/
/i/
/u/
Speech Compression ConceptsSample ApplicationSample Application
A computing system to answer
questions posed in natural language.
Design a reliable speech recognition
unit for IBM Watson project.
Page 19 Multimedia Systems, Speech II
Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson
www-943.ibm.com/innovation/us/watson/
Dr. David Ferrucci, Watson Principal Investigator
Linear Predictive Coding (LPC)ModelingModeling
Page 20 Multimedia Systems, Speech II
Simplified
Linear Predictive Coding (LPC)Modeling (Hiss or Buzz)Modeling (Hiss or Buzz)
Buzzer � Filter
Speech = Formants + Residue
Chuncks: 30 thr. 50 frames/sec.
Page 21 Multimedia Systems, Speech II
1
[ ] [ ]P
i
i
x n a x n i=
= −∑%
Predictor for each frame:
Speech = Formants + Residue
Linear Predictive Coding (LPC)Modeling (Hiss or Buzz)Modeling (Hiss or Buzz)
The human vocal tract as an infinite impulse response (IIR) system Vowel /a/
Page 22 Multimedia Systems, Speech II
LPC Block Diagram
Linear Predictive Coding (LPC)Original Paper, Original Paper, AtalAtal--HanauerHanauer 19711971
Original
Page 23 Multimedia Systems, Speech II
Comparison of wide-band sound spectrograms for synthetic and original speech signal for the utterance "It's
time we rounded up that herd of Asian cattle," spoken by a male speaker
Original
Synthetic
Linear Predictive Coding (LPC)Voiced Frame ExampleVoiced Frame Example
Original
Page 24 Multimedia Systems, Speech II
Synthetic
Time Domain Frequency Domain
180 samples, Pitch period: 75
Linear Predictive Coding (LPC)Unvoiced Frame ExampleUnvoiced Frame Example
Original
Page 25 Multimedia Systems, Speech II
Synthetic:
White noise
with uniform
distribution
Time Domain Frequency Domain
180 samples
Code Excited Linear Prediction
Problem of LPCWhere there is both Hiss and Buzz
Solution
CELPCELP
Encoder
Page 26 Multimedia Systems, Speech II
SolutionEncode residue
MethodVector Quantization
(Codebook)Decoder
Vector QuantizationBlock DiagramBlock Diagram
Page 27 Multimedia Systems, Speech II
Vector QuantizationExampleExample
Sample scalar quantizer
We have 3 possible colors for
each square; so we can quantize
each square with 2 bits � (28 *
2 = 56 bits for all 28 (7*4)
squares.
Page 28 Multimedia Systems, Speech II
squares.
Sample vector quantizer
We have 8 forms in the
codebook; so we can quantize
each form with 3 bits � (7 * 3
= 21 bits for all 28 (7*4)
squares.Codebook
Vector QuantizationCodebook DesignCodebook Design
Page 29 Multimedia Systems, Speech II
Comparison of Speech CodersSample SpeechSample Speech
A lathe is a big tool. Grab every dish of sugar.
Page 30 Multimedia Systems, Speech II
Comparison of Speech CodersDemonstrationDemonstration
Page 31 Multimedia Systems, Speech II
Original ADPCM
LPC CELP
Speech Coding
G.711
PCM
u-law, a-law
64, 80 and 96 kbps
G.722
ITUITU--T StandardsT Standards
Check out a complete list athttp://en.wikipedia.org/wiki/List_of_codecs#Audio_codecs
A comparison of Internet audio compression formats
http://www.sericyb.com.au/audio.html
Page 32 Multimedia Systems, Speech II
G.722
ADPCM
48, 56 and 64 kbps
G.728
A form of CELP
16 kbps
Vocoders
http://www.sericyb.com.au/audio.html
Speech Coding
HawkVoice
Free and Open Source CodeFree and Open Source Code
http://hawksoft.com/hawkvoice/
Page 33 Multimedia Systems, Speech II
Check out voice samples of HawkVoice™ codecs at
http://hawksoft.com/hawkvoice/codecs.shtml
Thank You
Multimedia SystemsMultimedia Systems
Speech IISpeech II
Page 34 Multimedia Systems, Speech II
Thank You
1. http://ce.sharif.edu/~m_amiri/
2. http://www.dml.ir/
FIND OUT MORE AT...
Next Session: Entropy CodingNext Session: Entropy Coding