EG-348_371_09
1
Multimedia Communications (371) Speech and Image Communications (348)
John Mason
Engineering
Swansea University
EG-348_371_09
2
Features in speech
[Figure: a speech waveform over time is acquired in frames (20-30 ms, sampling frequency 8 kHz); feature extraction then yields a sequence of feature vectors X1 ... Xi ...]
EG-348_371_09
4
Speech production
Air from the lungs -> vocal fold -> vocal tract -> speech
EG-348_371_09
5
LPC Short and Long
The spectral envelope reflects the morphological characteristics of the vocal tract
Model: noise -> H1(z) -> H2(z) -> synthesised speech
Production: air from the lungs -> vocal fold -> vocal tract -> speech
EG-348_371_09
6
Features: building of statistical model
[Figure: panels labelled T1 and T2]
EG-348_371_09
7
VT Shape & Some Vowels - Ladefoged ‘62
EG-348_371_09
8
Speech Processing - Applications
Why?
  Communications
  Synthesis
  Recognition: speech & speaker
How?
  Frame-based
  Systems approach
EG-348_371_09
9
Some Books
Flanagan - 'Speech Analysis, Synthesis and Perception', Springer-Verlag - a classic!
Furui - several books on recognition
Parsons - 'Voice and Speech Processing', McGraw-Hill - one of the first textbooks on computer speech processing
O'Shaughnessy - 'Speech Communication: Human and Machine', Addison-Wesley
Rabiner & Juang - 'Fundamentals of Speech Recognition', Prentice Hall, 1993
Ramachandran & Mammone (eds) - 'Modern Methods of Speech Processing', Kluwer Academic, 1995
EG-348_371_09
10
Speech Communications
Person-to-Person
Person-to-Machine: speech/speaker recognition
Machine-to-Person: speech synthesis
EG-348_371_09
11
(Electronic) Speech Communications
perhaps separated by long distance (or in time)
EG-348_371_09
12
Telephony & Broadcasting
Acoustic air path -> electronic link (channel transmission path) -> acoustic air path
EG-348_371_09
13
Speech Comms: Telephony
Electronic link (channel transmission path):
Sending: Microphone -> ADC -> Analysis -> Coding -> Transmitter
Receiving: Receiver -> Decoding -> (Re-)Synthesis -> DAC -> Loudspeaker
EG-348_371_09
14
Speech Bit Rates
[Figure: the speech chain. Speaker: message creation -> language coding -> human acoustic generation. Transmission through acoustic space. Listener: human hearing extraction -> language decoding -> message realisation. Axis: approximate bit rate in bps, marked tens, hundreds, thousands, tens of thousands.]
EG-348_371_09
15
Criteria in Speech Comms.
Quality versus bit-rate
[Figure: quality (Poor, Fair, Good, Excellent) plotted against bit rate (4, 8, 16, 32, 64 kbps), with GSM, ADPCM and CELP marked]
4 quality measures: intelligibility, loudness, naturalness, ease-of-listening
EG-348_371_09
16
Low Bit Rate Speech Coding: Compandent, http://www.compandent.com/
EG-348_371_09
17
Speech Processing
The three main application areas are:
  Speech Comms. (the 'electronic link')
  Automatic speech/speaker recognition
  Speech synthesis
Much of the underlying analysis is common, e.g. linear predictive coding
EG-348_371_09
18
What does speech look like?
EG-348_371_09
19
What does speech look like?
[Figure: speech waveform, sample index 0 to 7000]
Dynamic range - for flexibility and robustness
Time-varying - to convey information
EG-348_371_09
20
Frame-based Analysis
[Figure: speech waveform, sample index 0 to 7000]
To capture time variations:
• 20-30 ms frames - 'centi-second' labeling
• spectral analysis: FFT, filter-bank, linear predictive coding
EG-348_371_09
21
Speech Analysis/Coding
Two general cases:
  Waveform coders
  Source (voice) coders (vo-coders)
Source coders, e.g. linear predictive coding (LPC):
  Model the source, i.e. the vocal tract (VT)
  Linear, time-varying model of the VT, plus excitation
[Diagram: excitation e_n (voiced or unvoiced) -> H(z) -> speech s_n]
EG-348_371_09
22
Systems Approach
[Diagram: excitation (voiced, with pitch f0, or unvoiced) -> vocal tract model with time-varying parameters -> speech]
EG-348_371_09
23
LPC Analysis/Synthesis
Synthesis: input: excitation; output: speech
  e_n, E(z) -> H(z) (impulse response h_n) -> s_n, S(z)
Analysis: input: speech; output: excitation
  s_n, S(z) -> 1/H(z) -> e_n, E(z)
EG-348_371_09
24
‘Perfect’ Analysis/Synthesis
Analysis: s_n, S(z) -> 1/H(z) -> e_n, E(z)
Synthesis: e_n, E(z) -> H(z) -> s_n, S(z)
Input s_n and output s_n are identical (within arithmetic limits)
EG-348_371_09
25
Practical Analysis/Synthesis
Source coding: s_n -> Analysis -> Coding -> De-coding -> Synthesis -> ŝ_n
LPC-based systems (e.g. CELP):
  Analysis: s_n -> 1/H(z) -> e_n
  Re-synthesis: ê_n -> Ĥ(z) -> ŝ_n
EG-348_371_09
26
Practical Analysis/Synthesis
Sending: s_n, S(z) -> 1/H(z) -> e_n, E(z)
Transmission
Receiving: e_n, E(z) -> H(z) -> s_n, S(z)
Parameters for transmission:
• Input / excitation e_n
• Source model H(z)
Thus analysis must derive these parameters, and synthesis must use them to re-generate speech
EG-348_371_09
27
Linear Predictive Coding - LPC
Principle of linear prediction: the next value (or sample) in a series, i.e. at time n, is predicted or estimated by a weighted sum of previous values, i.e. those at times n-1, n-2, ...
Thus for a predictor of order p, we have:
$\hat{s}_n = a_1 s_{n-1} + a_2 s_{n-2} + a_3 s_{n-3} + \dots + a_p s_{n-p}$
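The weighted-sum predictor above can be sketched in a few lines of Python. This is only an illustration: the function name `predict`, the test signal and the choice to treat samples before the start of the signal as zero are assumptions, not part of the slides.

```python
def predict(samples, coeffs):
    """Return (predictions, residual) for s_hat[n] = sum_i a_i * s[n-i]."""
    p = len(coeffs)
    predictions, residual = [], []
    for n in range(len(samples)):
        # Samples before the start of the signal are taken as zero (assumption).
        s_hat = sum(coeffs[i] * samples[n - 1 - i]
                    for i in range(p) if n - 1 - i >= 0)
        predictions.append(s_hat)
        residual.append(samples[n] - s_hat)
    return predictions, residual


# A signal obeying s[n] = 0.9 * s[n-1] is predicted exactly by a_1 = 0.9,
# so the residual is zero everywhere except at the very first sample.
signal = [0.9 ** n for n in range(8)]
pred, err = predict(signal, [0.9])
```

Only the first sample, which has no history, leaves a non-zero error; this is the sense in which a good predictor removes redundancy from the signal.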
EG-348_371_09
28
Linear Prediction
$\hat{s}_n = a_1 s_{n-1} + a_2 s_{n-2} + \dots + a_p s_{n-p} = \sum_{i=1}^{p} a_i s_{n-i}$
Transforming to the z-domain gives:
$\hat{S}(z) = a_1 z^{-1} S(z) + a_2 z^{-2} S(z) + \dots + a_p z^{-p} S(z)$
$\hat{S}(z) = \{a_1 z^{-1} + a_2 z^{-2} + \dots + a_p z^{-p}\}\, S(z)$
$E(z) = S(z) - \{a_1 z^{-1} S(z) + a_2 z^{-2} S(z) + \dots + a_p z^{-p} S(z)\}$
$E(z) = A(z)\, S(z)$
where $A(z) = (1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_p z^{-p})$
EG-348_371_09
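29
LPC Error Terms
Error is simply the difference between predicted and actual values:
$e_n = s_n - \hat{s}_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$
$E(z) = S(z) - \hat{S}(z)$
$= S(z) - \{a_1 z^{-1} S(z) + a_2 z^{-2} S(z) + \dots + a_p z^{-p} S(z)\}$
$= A(z)\, S(z)$
where $A(z) = (1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_p z^{-p})$
so that $\frac{E(z)}{S(z)} = 1 - A'(z)$, with $A'(z) = a_1 z^{-1} + a_2 z^{-2} + \dots + a_p z^{-p}$
[Diagram: s_n feeds the predictor A'(z); the prediction is subtracted from s_n to give e_n]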
EG-348_371_09
30
Synthesis
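[Diagram: e_n -> H(z) -> s_n, realised as a feedback loop in which A'(z) acts on past outputs and its output is added to e_n to give s_n]
Parameters updated at frame rate
NB 'hat' of approximation omitted for simplicity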
EG-348_371_09
31
Analysis for Synthesis
The analysis and synthesis must match. What is needed for the synthesis?
Answer: e_n, the excitation, and H(z), the system
Thus the analysis must derive these terms (from s_n):
The speech signal s_n is analysed to give e_n and H(z), i.e. the A'(z) parameters, for transmission.
Synthesis: e_n -> H(z) -> s_n
Analysis: s_n, S(z) -> 1/H(z) -> e_n, E(z)
[Diagram: analysis filter, in which the output of A'(z) is subtracted from s_n to give e_n]
EG-348_371_09
32
Derivation of LPC Coefficients - A(z)
Recall:
$e_n = s_n - \hat{s}_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$
where a_i are the p prediction coefficients. The principle behind LPC is to find a set of p coefficients, a_1, a_2, a_3, ... a_p, which minimises the error signal e_n, in the least-squares sense, over a frame of N speech samples. This leads to a set of p coefficients for each frame.
$E = \sum_{n=0}^{N-1} e_n^2 = \sum_{n=0}^{N-1} (s_n - \hat{s}_n)^2 = \sum_{n=0}^{N-1} \Big( s_n - \sum_{i=1}^{p} a_i s_{n-i} \Big)^2$
EG-348_371_09
33
Derivation of A(z) - (2)
Minimisation of E is achieved by setting the p partial derivatives to zero:
$\frac{\partial E}{\partial a_j} = 0$ for j = 1, 2, ... p
From which:
$r_{j0} - \sum_{k=1}^{p} a_k r_{jk} = 0$ where: $r_{jk} = \sum_{n} s_{n-j}\, s_{n-k}$
In matrix form:
$\mathbf{r} - R\,\mathbf{a} = 0$ or $\mathbf{a} = R^{-1}\,\mathbf{r}$
The matrix [R] is symmetric Toeplitz, which allows numerically efficient inversion techniques - Durbin's recursion algorithm being one of the most popular.
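As an illustration of solving the matrix equation without an explicit inverse, the following is a minimal sketch of Durbin's recursion. It assumes the autocorrelation form of the normal equations, where r_jk depends only on |j - k|; the function names `autocorr` and `durbin` and the example frame are illustrative, not taken from the slides.

```python
def autocorr(frame, max_lag):
    """r[k] = sum_n s[n] * s[n-k], with samples outside the frame taken as zero."""
    N = len(frame)
    return [sum(frame[n] * frame[n - k] for n in range(k, N))
            for k in range(max_lag + 1)]


def durbin(r, p):
    """Solve sum_k a_k r[|j-k|] = r[j], j = 1..p. Returns (a_1..a_p, final error)."""
    a = [0.0] * (p + 1)          # a[1..p]; a[0] is unused
    err = r[0]                   # prediction error energy for order 0
    for i in range(1, p + 1):
        # Reflection coefficient for order i.
        k_i = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= (1.0 - k_i * k_i)
    return a[1:], err


frame = [1.0, 0.8, 0.3, -0.2, -0.6, -0.5, -0.1, 0.3]   # toy data, not real speech
r = autocorr(frame, 2)
coeffs, residual_energy = durbin(r, 2)
```

The recursion builds the order-p solution from the order-(p-1) solution, which is why it needs on the order of p squared operations rather than the p cubed of a general matrix inversion.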
EG-348_371_09
34
Derivation of A(z) – (3)
When N is very large, r contains the autocorrelation coefficients of s
s comes from e convolved with h (excitation & vocal tract); we are interested here in separating e and h
the predictor order, p, is small to reflect the short-term periodicities (formants)
with higher predictor orders we will get the longer-term periodicities (pitch)
2 practical problems with evaluating a:
  matrix singularities in R^-1
  unstable resultant H(z)
in practice both are solved by windowing, i.e. shaping the frame, e.g. with a Hamming window
EG-348_371_09
35
Speech Signal Characteristics
Duration
Dynamic range
Periodicities: vocal tract, pitch
Frame-based analysis
  frame size: quasi-stationary, yet able to capture transitions; typically 20-30 ms
  frame rate: task dependent; more frames mean more bandwidth/computation; up to 100 frames/second
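A minimal sketch of frame-based analysis with Hamming windowing follows. The defaults (8 kHz sampling, 20 ms frames, 100 frames/second) are taken from the figures quoted in these slides; the function names and the constant test signal are assumptions for illustration.

```python
import math


def hamming(N):
    """Hamming window: w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def split_into_frames(signal, fs=8000, frame_ms=20, frames_per_sec=100):
    """Cut the signal into overlapping frames and apply a Hamming window to each."""
    size = fs * frame_ms // 1000       # 160 samples for 20 ms at 8 kHz
    hop = fs // frames_per_sec         # 80 samples, i.e. 50% overlap
    window = hamming(size)
    out = []
    for start in range(0, len(signal) - size + 1, hop):
        out.append([signal[start + n] * window[n] for n in range(size)])
    return out


one_second = [1.0] * 8000              # placeholder signal, not real speech
windowed = split_into_frames(one_second)
```

With these settings each frame overlaps its neighbour by half, so a transition that falls on the edge of one frame sits near the centre of the next.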
EG-348_371_09
36
Harmonic Structures and Periodicities
Harmonic Structures & Periodicities give potential for data reduction
LPC is one way of gaining this compression
Speech has two obvious separate structures
vocal tract resonances
pitch
EG-348_371_09
37
Harmonic Structures and Periodicities
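Short-term prediction
$\hat{s}_n = \sum_{i=1}^{p} a_i s_{n-i}$
$e_n = s_n - \hat{s}_n$
$e_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$
$E = \sum_{n} (e_n)^2$
[Diagram: excitation e_n (voiced, with pitch period Tp, or unvoiced) -> H(z) (vocal tract, short term, order p) -> speech s_n]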
EG-348_371_09
38
Harmonic Structures and Periodicities
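Long-term prediction
$\hat{s}_n = \sum_{i=1}^{P} a_i s_{n-i}$
$e_n = s_n - \hat{s}_n$
$e_n = s_n - \sum_{i=1}^{P} a_i s_{n-i}$
$E = \sum_{n} (e_n)^2$
[Diagram: excitation e_n (voiced or unvoiced) -> Hlt(z) (pitch, period Tp) -> ep_n -> Hst(z) (vocal tract) -> speech s_n]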
EG-348_371_09
39
Harmonic Structures and Periodicities
Two structures: short-term (formants) & long-term - pitch (excitation)
[Diagram: e_n -> Hlt(z) -> ep_n -> Hst(z) -> s_n]
Per frame, e.g. 20 ms frame = 160 samples @ 8 kHz:
  excitation: gain, k
  Hlt(z): a_i, e.g. p = 3
  Hst(z): a_i, e.g. p = 10
NB Representations of these parameters are transmitted
EG-348_371_09
40
Practical Coding Systems
Waveform coders & source coders (vocoders)
Source coders (vocoders): 2 periodicities/redundancies in the source
  short-term (formants)
  long-term - pitch
Excitation e_n
[Diagram: e_n -> Hlt(z) -> ep_n -> Hst(z) -> s_n]
EG-348_371_09
41
‘Perfect’ Analysis/Synthesis (1)
Analysis: s_n, S(z) -> 1/H(z) -> e_n, E(z)
Synthesis: e_n, E(z) -> H(z) -> s_n, S(z)
Input s_n and output s_n are identical (within arithmetic limits)
EG-348_371_09
42
‘Perfect’ Analysis/Synthesis (2)
Analysis: s_n, S(z) -> 1/H(z) -> e_n, E(z)
Synthesis: e_n, E(z) -> H(z) -> s_n, S(z)
With H(z) = 1/(1 - A'(z)):
Analysis: s_n, S(z) -> 1 - A'(z) -> e_n, E(z)
Synthesis: e_n, E(z) -> 1/(1 - A'(z)) -> s_n, S(z)
EG-348_371_09
43
‘Perfect’ Analysis/Synthesis (3)
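s_n -> 1 - A'(z) -> e_n -> 1/(1 - A'(z)) -> s_n
[Diagram: analysis filter 1 - A'(z). The original speech s_n passes through a chain of z^-1 delays giving s_{n-1}, ..., s_{n-i}, ..., s_{n-p}; these are weighted by a_1, ..., a_i, ..., a_p, summed to give the prediction, and subtracted from s_n to give the residual e_n]
$e_n = s_n - \hat{s}_n = s_n - \sum_{i=1}^{p} a_i s_{n-i}$
Note the minus sign: in Matlab it is combined with the a_i
What determines p?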
EG-348_371_09
44
‘Perfect’ Analysis/Synthesis (4)
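s_n -> 1 - A'(z) -> e_n -> 1/(1 - A'(z)) -> s_n
[Diagram: original speech -> residual -> re-synthesised speech. Analysis (left): the delayed samples s_{n-1} ... s_{n-i} ... s_{n-p}, weighted by a_1 ... a_i ... a_p, are summed and subtracted from s_n to give e_n. Synthesis (right): the same weighted sum of past outputs is added to e_n to give s_n. Note: no minus sign in the synthesis filter.]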
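The 'perfect' analysis/synthesis pair can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, samples before the start of the signal are assumed to be zero, and the coefficients are borrowed from the 2nd-order example later in these slides.

```python
def analyse(s, a):
    """Inverse filter 1 - A'(z): speech in, residual out (prediction is subtracted)."""
    p = len(a)
    return [s[n] - sum(a[i] * s[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
            for n in range(len(s))]


def synthesise(e, a):
    """All-pole filter 1/(1 - A'(z)): residual in, speech out (prediction is added)."""
    p = len(a)
    s = []
    for n in range(len(e)):
        s.append(e[n] + sum(a[i] * s[n - 1 - i] for i in range(p) if n - 1 - i >= 0))
    return s


a = [1.5537, -0.8276]                       # stable 2nd-order example coefficients
speech = [0.0, 1.0, 0.5, -0.3, -0.8, -0.2, 0.4, 0.6]   # toy samples, not real speech
residual = analyse(speech, a)
rebuilt = synthesise(residual, a)
```

The two loops differ only in the sign of the weighted sum and in whether it is formed from past inputs or past outputs, which mirrors the two diagrams on this slide.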
EG-348_371_09
45
Practical System
[Diagram: s_n, S(z) -> 1/H(z) -> e_n, E(z) -> transmitted data frame -> H(z) -> s_n]
Input s_n and output s_n are "similar"
What does the transmitted data frame contain?
EG-348_371_09
46
Analysis-by-Synthesis: LPAS
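Integrated encoder & decoder at the encoder
[Diagram: LPAS encoder. An adaptive encoder drives a basic decoder; the decoder output is subtracted from s_n and the weighted error is fed back to the adaptive encoder]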
EG-348_371_09
47
Log Spectral Estimates
Comparisons between frames are very important in many situations
Log spectral estimates are the most common:
$D = \frac{1}{B}\int_{0}^{B} \big(S(w) - S'(w)\big)^2\, dw \approx \frac{1}{N}\sum_{k=0}^{N/2-1} \big(S_k - S'_k\big)^2$
where $S_k = \log(|\mathrm{DFT}(s_n)|)$ or $\log(|H(z)|)$ at $z = e^{jw}$, and S and S' are the log spectra of the two frames
In Comms., computation is expensive and parameter-vector approximations to D are used
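A rough sketch of a DFT-based log spectral distance between two frames is given below. The 1/N scaling over bins 0 to N/2 - 1, the natural logarithm, the small floor inside the log and all function names are assumptions made for illustration; a practical system would use an FFT rather than this direct DFT.

```python
import cmath
import math


def log_spectrum(frame):
    """Log magnitude of the DFT for bins k = 0 .. N/2 - 1 (direct O(N^2) DFT)."""
    N = len(frame)
    spectrum = []
    for k in range(N // 2):
        X = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        spectrum.append(math.log(abs(X) + 1e-12))   # floor avoids log(0) (assumption)
    return spectrum


def log_spectral_distance(frame_a, frame_b):
    """Sum of squared log-spectrum differences, scaled by 1/N (assumed scaling)."""
    Sa, Sb = log_spectrum(frame_a), log_spectrum(frame_b)
    N = len(frame_a)
    return sum((x - y) ** 2 for x, y in zip(Sa, Sb)) / N


impulse = [1.0] + [0.0] * 7        # flat spectrum, |X_k| = 1 in every bin
louder = [2.0] + [0.0] * 7         # same shape, twice the amplitude
d_same = log_spectral_distance(impulse, impulse)
d_gain = log_spectral_distance(impulse, louder)
```

Because the measure works on log spectra, a pure gain change shifts every bin by the same constant, so `d_gain` depends only on the gain ratio and not on the spectral shape.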
EG-348_371_09
48
Some Standards
Standard   Application         Coder      Bit rate (kb/s)
GSM        European cellular   RPE-LTP    13
FS1016     Secure voice        CELP       4.8
IS54       NA cellular         VSELP      7.95
IS96       NA cellular         QCELP      1-8
JDC-FR     Japanese cellular   VSELP      6.7
JDC-HR     Japanese cellular   PSI-CELP   3.67
G.728      (terrestrial)       LD-CELP    16
EG-348_371_09
51
CELP e.g.
[Diagram: CB index -> codebook -> gain -> e_n -> Hlt(z) -> Hst(z) -> s_n]
Transmitted: CB index, gain, long-term coefficients (pitch), short-term coefficients (formants)
The excitation e_n is represented by an address, i.e. the CB index
EG-348_371_09
52
CELP – LPAS (Encoder)
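[Diagram: basic decoder: CB index -> gain -> e_n -> Hlt(z) -> Hst(z) -> synthesised s_n. The adaptive encoder selects the CB index, gain, long-term coefficients (pitch) and short-term coefficients (formants); the decoder output is subtracted from the input s_n and the weighted error drives the adaptive encoder]
The excitation e_n is represented by an address, i.e. the CB index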
EG-348_371_09
53
Conversion of LPC Parameters
• $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + \dots + a_p z^{-p}$, and the a_i are to be transmitted
• Line Spectral Frequencies (LSFs) present a clever way of representing the LPC coefficients, the a_i of A(z)
• The a_i are floating-point numbers and their accuracy is important
• Factorising A(z) tends to give complex roots in the z-domain
• LSFs map these complex roots on to the unit circle
LSFs:
  lead to efficient coding
  ensure a minimum-phase filter
  bit errors are localised in the spectrum, minimising loss of speech quality
[Figure: z-plane (axis jy) with roots marked x at angle θ; LSF = ws · θ / 2π]
EG-348_371_09
54
Line Spectral Frequencies
• Consider
$P(z) = A(z) + z^{-(n+1)} A(z^{-1})$
and
$Q(z) = A(z) - z^{-(n+1)} A(z^{-1})$
where n is the order of A(z); P(z) and Q(z) lead to what are known as the LSFs
• Clearly, if P(z) and Q(z) are known then A(z) can be found: $A(z) = \{P(z) + Q(z)\} / 2$
• The roots of P(z) and Q(z) lie on the unit circle in the z-domain. Their locations give the LSFs, hence P(z) and Q(z), and whence A(z)
EG-348_371_09
55
LSF Evaluation
Consider one pair of complex roots, A1(z):
$A_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$
$P_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + z^{-3}(1 + a_1 z + a_2 z^{2})$
$= (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$
$Q_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - z^{-3}(1 + a_1 z + a_2 z^{2})$
$= (z^2 + (a_1 - a_2 + 1) z + 1)(z - 1)\, z^{-3}$
The trivial roots at z = -1 (from P) and z = +1 (from Q) are discarded
It follows that the LSFs, θ1 & θ2, are given by:
$\cos(\theta_1) = -(a_1 + a_2 - 1)/2$
and $\cos(\theta_2) = -(a_1 - a_2 + 1)/2$
Show: $a_1 = -(\cos(\theta_1) + \cos(\theta_2))$ and
$a_2 = (\cos(\theta_2) - \cos(\theta_1) + 1)$
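The second-order relations between the LPC coefficients and the two LSFs can be tried out directly. The sketch below is a hedged illustration: the function names are hypothetical, angles are in radians, and the inputs are assumed to keep the arguments of `acos` within [-1, 1], which holds for a stable second-order A(z).

```python
import math


def lpc_to_lsf(a1, a2):
    """LSFs (radians) of A(z) = 1 + a1 z^-1 + a2 z^-2 from the P and Q quadratics."""
    theta1 = math.acos(-(a1 + a2 - 1) / 2)    # from P(z)
    theta2 = math.acos(-(a1 - a2 + 1) / 2)    # from Q(z)
    return theta1, theta2


def lsf_to_lpc(theta1, theta2):
    """Invert the mapping: recover a1 and a2 from the two LSFs."""
    a1 = -(math.cos(theta1) + math.cos(theta2))
    a2 = math.cos(theta2) - math.cos(theta1) + 1
    return a1, a2


theta1, theta2 = lpc_to_lsf(-1.8, 0.9)
a1_back, a2_back = lsf_to_lpc(theta1, theta2)
```

The round trip recovers the original coefficients, which is the property that lets the LSFs be transmitted in place of the a_i.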
EG-348_371_09
56
LSF Test Example
$A_1(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$
$= (z^2 + a_1 z + a_2)\, z^{-2}$
$= (z^2 + 2\cos(\theta)\, w_n z + w_n^2)\, z^{-2}$
where w_n is the radius and θ is the angle from π. So: radius = $\sqrt{a_2}$ & root angle = π - θ
Note: in P & Q all w_n^2 terms (of the multiple 2nd-order factors) are unity
EG 1: a_2 = 1, then $\cos(\theta_1) = -(a_1 + a_2 - 1)/2 = -a_1/2$
  roots are already on the circle and do not move (unstable system - not practical)
EG 2: a_1 = 0, then $\cos(\theta_1) = -(a_1 + a_2 - 1)/2 = -(a_2 - 1)/2$
  $\cos(\theta_2) = -(a_1 - a_2 + 1)/2 = -(-a_2 + 1)/2$
  so the LSFs are symmetric about ws/4
EG-348_371_09
57
LSF Review & Example (1)
LSFs/LSPs are defined via:
$P(z) = A(z) + z^{-(n+1)} A(z^{-1})$
and $Q(z) = A(z) - z^{-(n+1)} A(z^{-1})$
thus $A(z) = \{P(z) + Q(z)\} / 2$
EG-348_371_09
58
LSF Review & Example (2)
For a second-order $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$:
$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^{2})\, z^{-3}$
$= (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$
$Q(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - (1 + a_1 z + a_2 z^{2})\, z^{-3}$
$= (z^2 + (a_1 - a_2 + 1) z + 1)(z - 1)\, z^{-3}$
cf: $(s^2 + (2\cos(\theta)\, w_n)\, s + w_n^2)$
EG-348_371_09
59
LSF Review & Example (3)
For a second-order $A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$:
$P(z) = (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$
$Q(z) = (z^2 + (a_1 - a_2 + 1) z + 1)(z - 1)\, z^{-3}$
cf: $(s^2 + (2\cos(\theta)\, w_n)\, s + w_n^2)$
Thus: $(a_1 + a_2 - 1) = 2\cos(\theta) = -2\cos(\theta_1)$
& $(a_1 - a_2 + 1) = -2\cos(\theta_2)$
So, given:
  i) LPC coeffs. a_1 and a_2, then the LSFs θ1 & θ2 can be found
  ii) LSFs θ1 & θ2, then the LPC coeffs. a_1 and a_2 can be found
[Figure: z-plane plot showing the roots of P(z) and Q(z) on the unit circle at angles θ1 and θ2]
EG-348_371_09
60
LSF Review & Example (4)
For a second order, with P(z) corresponding to the first root and Q(z) to the second root:
$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^{2})\, z^{-3} = (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$
for the second pair of θ_i, 1.37 and 1.77:
$= (z^2 - 2\cos(1.37)\, z + 1)(z + 1)\, z^{-3}$
$= (z^3 + (1 - 2\cos(1.37))\, z^2 + (1 - 2\cos(1.37))\, z + 1)\, z^{-3}$
Likewise
$Q(z) = 1 + a_1 z^{-1} + a_2 z^{-2} - (1 + a_1 z + a_2 z^{2})\, z^{-3}$
$= (z^2 + (a_1 - a_2 + 1) z + 1)(z - 1)\, z^{-3}$
$= (z^2 - 2\cos(1.77)\, z + 1)(z - 1)\, z^{-3}$
$= (z^3 + (-1 - 2\cos(1.77))\, z^2 + (1 + 2\cos(1.77))\, z - 1)\, z^{-3}$
Then
$A(z) = \{P(z) + Q(z)\} / 2 = (z^3 - (\cos(1.37) + \cos(1.77))\, z^2 + (1 - \cos(1.37) + \cos(1.77))\, z)\, z^{-3}$
EG-348_371_09
61
LSF Examples
LPC coeffs.       LSFs
a1      a2        θ1            θ2
0       0.5       1.31812       1.82348
-1.8    0.9       0.31756       0.554811
+1.8    0.9       π-0.554811    π-0.31756
                  2.2274        2.3743
[Figure: three z-plane plots, axes -1 to 1]
EG-348_371_09
62
LSF Examples
LPC coeffs.       LSFs
a1      a2        θ1            θ2
0       0.5       1.31812       1.82348
-1.8    0.9       0.31756       0.554811
+1.8    0.9       π-0.554811    π-0.31756
                  2.2274        2.3743
$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}$
$P(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + (1 + a_1 z + a_2 z^{2})\, z^{-3}$
$= (z^2 + (a_1 + a_2 - 1) z + 1)(z + 1)\, z^{-3}$
$= (z^2 + (-1.8 + 0.9 - 1) z + 1)(z + 1)\, z^{-3}$
$= (z^2 - 1.9 z + 1)(z + 1)\, z^{-3}$
cf: $(z^2 + (2\cos(\theta)\, w_n)\, z + w_n^2)$
thus $\cos(\theta) = -1.9/2$, or θ = 2.824, and θ1 = π - θ = 0.318
EG-348_371_09
63
Example Bit Allocation
Bit allocation     Voiced   Unvoiced
V/U decision       1        1
Excitation         11       11
Sync               1        1
Φ1 = 0.3176        5        5
Φ2 = 0.5548        5        5
Φ3 = 1.4454        5        5
Φ4 = 1.6961        5        5
Φ5                 4        0
Φ6                 4        0
Φ7                 4        0
Φ8                 4        0
Φ9                 3        0
Φ10                2        0
Error check        0        21
Total / frame      54       54
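The totals in the allocation can be checked, and turned into a bit rate, with a short sketch. The frame duration is not stated on this slide; the 20 ms used in the example call is an assumption borrowed from the frame size quoted elsewhere in these slides, and the dictionary keys and function names are illustrative.

```python
# Bits per frame copied from the allocation table: (voiced, unvoiced).
ALLOCATION = {
    "V/U decision": (1, 1),
    "Excitation": (11, 11),
    "Sync": (1, 1),
    "Phi1": (5, 5), "Phi2": (5, 5), "Phi3": (5, 5), "Phi4": (5, 5),
    "Phi5": (4, 0), "Phi6": (4, 0), "Phi7": (4, 0), "Phi8": (4, 0),
    "Phi9": (3, 0), "Phi10": (2, 0),
    "Error check": (0, 21),
}


def bits_per_frame(voiced):
    """Sum the voiced or unvoiced column of the allocation table."""
    column = 0 if voiced else 1
    return sum(bits[column] for bits in ALLOCATION.values())


def bit_rate_bps(frame_ms, voiced=True):
    """Bit rate for a given frame duration; the duration is supplied by the caller."""
    return bits_per_frame(voiced) * 1000 / frame_ms


rate_at_20ms = bit_rate_bps(20)    # 20 ms is an assumed frame duration
```

Unvoiced frames spend the bits saved on the higher-order LSFs on error checking, so both frame types occupy the same 54 bits and the bit rate stays constant.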
EG-348_371_09
64
Codebooks & VQ
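[Figure: a sequence over time of p-element vectors is replaced by codebook indices i (0 ... N-1), with N = 2^L; the receiver holds an identical book and recovers the p-element vectors over time]
Data reduction: (p x B) to L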
EG-348_371_09
65
Codebook Compression
Principle
  representative data sets
  a data vector is replaced / represented by the "nearest" vector, chosen from a "codebook" - a closed set of vectors
Examples
  LPC parameter sets
  Excitation, as in CELP
[Figure: codebook of N = 2^k vectors, each with M elements, addressed by index i; the index selects A(z) for H(z) in e_n -> H(z) -> s_n]
EG-348_371_09
66
Codebook Compression - CELP
[Diagram: e_n (y ms) -> H(z) -> s_n (y ms); the codebook holds P vectors of time-domain samples, each y ms long, selected by start point]
e_n are time-domain samples (integers)
R samples per second (e.g. 8000 Hz)
Frame rate governs vector size
P = 2^j
Bit rate = j/y bits/ms
NB e_n also includes gain
EG-348_371_09
67
Codebook Compression of H(z)
[Figure: A[z] at time t, one vector every x ms along the time axis, replaced by index i into a codebook of N = 2^k vectors, each with M elements]
Vector with M elements, every x ms
Codebook with N = 2^k vectors
Bit rate = k/x bits per ms (not a function of M)
In practice A[z] is converted to LSFs.
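A minimal sketch of codebook look-up and the resulting bit rate follows. The tiny four-entry codebook, the squared Euclidean distance and the function names are assumptions chosen for illustration; the slides do not specify the distance measure.

```python
import math


def quantise(vector, codebook):
    """Return the index i of the nearest codebook vector (squared Euclidean distance)."""
    def distance(entry):
        return sum((v - c) ** 2 for v, c in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def bit_rate_bits_per_ms(codebook_size, x_ms):
    """Bit rate = k / x, where the codebook holds N = 2^k vectors."""
    k = math.log2(codebook_size)
    return k / x_ms


codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]   # N = 4 = 2^2, M = 2
index = quantise([0.9, 1.2], codebook)    # only this index is transmitted
decoded = codebook[index]                 # receiver looks it up in an identical book
rate = bit_rate_bits_per_ms(len(codebook), 20)
```

Only the index crosses the channel, so the rate depends on the codebook size and the update interval, not on the number of elements M in each vector.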
EG-348_371_09
68
Codebook Generation
1) Initialise: form a single centroid of all training data, N = 1
2) Repeat
     Split centroids: N -> 2N
     Repeat
       Cluster data to nearest centroid
     until convergence
   until N large enough
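The splitting procedure above can be sketched as follows. This is an illustrative version under stated assumptions: centroids are split by adding and subtracting a small offset `eps`, distance is squared Euclidean, an empty cell keeps its old centroid, and convergence means the centroids stop changing. None of these details is given on the slide.

```python
def nearest(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_codebook(data, target_size, eps=0.01, max_iters=50):
    codebook = [centroid(data)]                        # 1) single centroid, N = 1
    while len(codebook) < target_size:                 # 2) until N large enough
        # Split every centroid into two slightly offset copies: N -> 2N.
        codebook = [[c + s for c in entry] for entry in codebook for s in (eps, -eps)]
        for _ in range(max_iters):                     # cluster until convergence
            cells = [[] for _ in codebook]
            for vec in data:
                cells[nearest(vec, codebook)].append(vec)
            updated = [centroid(cell) if cell else entry
                       for cell, entry in zip(cells, codebook)]
            if updated == codebook:
                break
            codebook = updated
    return codebook


training = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
book = train_codebook(training, 2)
```

On this toy data the single starting centroid splits once and each half settles on the mean of one cluster.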
EG-348_371_09
69
VQ Performance on Unseen Data
[Figure from Ramachandran & Mammone (eds), 'Modern Methods of Speech Processing', Kluwer Academic, 1995]
EG-348_371_09
70
VQ Performance on Unseen Data
[Figure from Ramachandran & Mammone (eds), 'Modern Methods of Speech Processing', Kluwer Academic, 1995]
EG-348_371_09
71
LPC & FFT Spectra
[Figure: waveform against time (0 to 25.6 ms); magnitude (dB) against frequency (kHz, 0 to Fs/2)]
LPC roots: -0.6651 ± 0.6695i, -0.0560 ± 0.9709i, 0.7228 ± 0.6225i, 0.8714 ± 0.3694i, 0.5758, -0.4200
LSFs:
θ2 of Q(z)   θ1 of P(z)
2.3743       2.2274
1.6540       1.5997
0.8261       0.6954
0.6106       0.3937
EG-348_371_09
72
LPC Spectra & LSFs
[Figure: magnitude (dB) against frequency (kHz, 0 to Fs/2) with the LSFs marked; z-plane plot of the roots, axes -1 to 1]
LPC roots: -0.6651 ± 0.6695i, -0.0560 ± 0.9709i, 0.7228 ± 0.6225i, 0.8714 ± 0.3694i, 0.5758, -0.4200
LSFs:
θ2 of Q(z)   θ1 of P(z)
2.3743       2.2274
1.6540       1.5997
0.8261       0.6954
0.6106       0.3937
EG-348_371_09
73
LPC & FFT Spectra - 2nd Order
[Figure: waveform against time (0 to 25.6 ms); magnitude (dB) against frequency (kHz, 0 to Fs/2)]
A(z): 1.5537, -0.8276    Roots: 0.7769 ± 0.4733i
$H(0) = \frac{K}{1 - (1.5537 - 0.8276)}$
$H(w_s/2) = \frac{K}{1 - (-1.5537 - 0.8276)}$
$\frac{H(0)}{H(w_s/2)} = \frac{K/0.274}{K/3.38} = 21.8\ \mathrm{dB}$
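The 21.8 dB figure can be reproduced by evaluating the all-pole filter on the unit circle. In this sketch the function name `gain_db` is hypothetical and K is set to 1, which does not affect the ratio.

```python
import cmath
import math


def gain_db(a, omega, K=1.0):
    """|H(e^{jw})| in dB for H(z) = K / (1 - sum_i a_i z^-i)."""
    denominator = 1 - sum(a_i * cmath.exp(-1j * omega * (i + 1))
                          for i, a_i in enumerate(a))
    return 20 * math.log10(abs(K / denominator))


a = [1.5537, -0.8276]                                  # coefficients from this slide
ratio_db = gain_db(a, 0.0) - gain_db(a, math.pi)       # H(0) relative to H(ws/2)
```

Evaluating at other values of `omega` between 0 and π traces out the whole LPC spectral envelope, with a peak near the angle of the complex root pair.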
EG-348_371_09
74
GSM
Groupe Spécial Mobile - EU
First digital cellular system in world
See Hodge 1990
Based on TDMA & FDMA at 900 MHz, and RPE-LPC (i.e. it is an 'LPAS' system)
Now at 1800 MHz
Carriers at 200 kHz
Supporting 8 TDMA time slots each
Time slots: 577 µs - 156.25 bit slots
8 time slots form 1 GSM frame of 4.62 ms
Modulation: Gaussian minimum shift keying
26-bit training in every time slot
Round-trip delay ~ 80 ms
EU: GSM; US: D-AMPS
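The timing figures can be cross-checked with simple arithmetic. The sketch below takes 577 µs per slot and 8 slots per frame from the slide, and assumes 156.25 bit periods per slot (the standard GSM figure); the constant and function names are illustrative.

```python
SLOT_US = 577.0            # time slot duration in microseconds (from the slide)
SLOTS_PER_FRAME = 8        # TDMA time slots per GSM frame (from the slide)
BITS_PER_SLOT = 156.25     # bit periods per time slot (assumed standard value)


def frame_duration_ms():
    """Duration of one TDMA frame in milliseconds."""
    return SLOT_US * SLOTS_PER_FRAME / 1000


def gross_bit_rate_kbps():
    """Gross bit rate on one carrier: bits per slot divided by slot duration."""
    return BITS_PER_SLOT / SLOT_US * 1000


frame_ms = frame_duration_ms()
carrier_rate = gross_bit_rate_kbps()
```

The frame duration agrees with the 4.62 ms quoted above, and the gross rate per carrier comes out at roughly 270.8 kb/s, shared between the 8 time slots.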
EG-348_371_09
75
Other Related Topics
Spectral Lifting: $H(z) = (1 - a z^{-1})$
Codebook Training
Spectral Differences between 2 frames
Cepstra
Modeling Speech Space - HMM’s
EG-348_371_09
76
Pre-Emphasis Example
[Figure Q1: two 30 ms waveform panels, (a) and (b), with amplitude axes -8000 to 8000 and -1 to 1]
EG-348_371_09
77
Pre-Emphasis Example
[Figure: z-plane (axis jy) with a zero at a on the real axis; the distance from the zero to the point at ws/2 is 1 + a = 2]
G(ws/2) = 1 + a
G(0) = 1 - a
For G(ws/2) > G(0), a must be > 0
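The two gains can be seen directly by passing a constant (0 Hz) signal and an alternating (ws/2) signal through a first-order pre-emphasis filter. This sketch assumes the form H(z) = 1 - a z^-1 with a = 0.95 as an illustrative value; the function name and test signals are hypothetical.

```python
def pre_emphasis(x, a=0.95):
    """y[n] = x[n] - a * x[n-1], with x[-1] taken as zero (assumption)."""
    return [x[n] - a * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


dc = [1.0] * 6                                   # frequency 0
alternating = [(-1.0) ** n for n in range(6)]    # frequency ws/2

dc_out = pre_emphasis(dc)             # settles at amplitude 1 - a
alt_out = pre_emphasis(alternating)   # settles at amplitude 1 + a
```

After the first sample the constant input is attenuated to 0.05 while the alternating input is boosted to 1.95, so any positive a tilts the spectrum towards the high frequencies.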
EG-348_371_09
78
Z-plane to Magnitude Spectrum
[Figure: z-plane plot (real part against imaginary part, axes -1 to 1) with 1 + a = 2 at ws/2; magnitude (dB, -30 to 50) against frequency (kHz, 0 to Fs/2)]
EG-348_371_09
80
ST & LT Prediction
[Diagram: speech s_n -> STP filter 1 - A'(z) (z^-1 delays, coefficients a_1 ... a_i ... a_p) -> residual e_n -> LTP filter 1 - A'(z) (z^-1 delays, coefficients a_1 ... a_i ... a_p) -> e'_n]