Overview of Adaptive Multi-Rate Narrow Band (AMR-NB) Speech Codec
Presented by Peter
AMR Narrow Band
Adaptive Multi-Rate Codec for narrow band speech (AMR-NB)
Specified by 3GPP for GSM/3G systems
Input: 8 kHz sampling rate, 13-bit PCM
20 ms frames, no overlap
8 modes + comfort noise
Output bit-rate from 4.75 to 12.2 kbit/s
Algebraic Code Excited Linear Prediction (ACELP) is used as the speech codec
Frequency Response
Speech Encoder
Pre-processing
Linear prediction analysis and quantization
Open-loop pitch analysis
Impulse response computation
Target signal computation
Adaptive codebook
Algebraic codebook
Quantization of the adaptive and fixed codebook gains
Memory update
Principles of the adaptive multi-rate speech encoder
Eight source codecs with bit-rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s
A 10th-order linear prediction (LP), or short-term, synthesis filter is used, which is given by
$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + \sum_{i=1}^{m} a_i z^{-i}}, \qquad m = 10$$
The long-term, or pitch, synthesis filter is given by
$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}$$
where $T$ is the pitch delay and $g_p$ is the pitch gain
The pitch synthesis filter is implemented using the adaptive codebook approach
ACELP
[Figure: block diagram of the ACELP encoder. Frame level: pre-processing of s(n); LPC analysis (twice per frame) by windowing and autocorrelation R[ ] followed by Levinson-Durbin R[ ] -> A(z); LSP quantization and interpolation for the 4 subframes (LSP -> A(z)^); computation of the weighted speech (4 subframes) and open-loop pitch search. Subframe level: adaptive codebook search (compute target x(n), compute impulse response h(n), find best delay and gain, quantize LTP gain); innovative codebook search (compute target for innovation, find best innovation); fixed codebook gain quantization; filter memory update (compute excitation, update filter memories for next subframe). Outputs: LSP indices, pitch index, LTP gain index, code index, fixed codebook gain index.]
Pre-Processing
Two pre-processing functions
  High-pass filtering
  Signal down-scaling, to prevent overflow
A filter with a cut-off frequency of 80 Hz is used
$$H_{h1}(z) = \frac{0.927246093 - 1.8544941 z^{-1} + 0.927246903 z^{-2}}{1 - 1.906005859 z^{-1} + 0.911376953 z^{-2}}$$
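The two pre-processing steps can be sketched in Python as below. This is a minimal floating-point sketch, assuming the down-scaling is a division by 2 (the factor that post-processing later undoes) and that the filter starts from zero state. The function name `preprocess` is illustrative and does not come from the slides.

```python
def preprocess(samples):
    """Down-scale the input by 2, then apply the 80 Hz high-pass biquad H_h1(z)."""
    b = (0.927246093, -1.8544941, 0.927246903)   # numerator coefficients
    a = (1.0, -1.906005859, 0.911376953)         # denominator coefficients
    x1 = x2 = y1 = y2 = 0.0                      # zero initial state (assumption)
    out = []
    for s in samples:
        x0 = s / 2.0                             # signal down-scaling
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


if __name__ == "__main__":
    dc = preprocess([1000.0] * 2000)
    print(dc[0], dc[-1])   # a constant input is almost entirely removed in steady state
```

A real encoder works on 16-bit fixed-point samples and keeps the filter state across frames; both are left out here to keep the sketch short.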
Linear Prediction Analysis
Frame is split into four sub-frames
12.2 kbit/s mode
  Performed twice per frame
  30 ms asymmetric window
  No lookahead
10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  Performed once per frame
  30 ms asymmetric window
  5 ms lookahead
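Windowing and Auto-correlation Computation
12.2 kbit/s mode
  Two different asymmetric windows
  1st window concentrates on the 2nd sub-frame
  2nd window concentrates on the 4th sub-frame
$$w_I(n) = \begin{cases}
0.54 - 0.46 \cos\left(\dfrac{\pi n}{L_1^{(I)} - 1}\right), & n = 0, \ldots, L_1^{(I)} - 1, \\
0.54 + 0.46 \cos\left(\dfrac{\pi (n - L_1^{(I)})}{L_2^{(I)} - 1}\right), & n = L_1^{(I)}, \ldots, L_1^{(I)} + L_2^{(I)} - 1.
\end{cases}$$
$$w_{II}(n) = \begin{cases}
0.54 - 0.46 \cos\left(\dfrac{2 \pi n}{2 L_1^{(II)} - 1}\right), & n = 0, \ldots, L_1^{(II)} - 1, \\
\cos\left(\dfrac{2 \pi (n - L_1^{(II)})}{4 L_2^{(II)} - 1}\right), & n = L_1^{(II)}, \ldots, L_1^{(II)} + L_2^{(II)} - 1.
\end{cases}$$
with $L_1^{(II)} = 232$ and $L_2^{(II)} = 8$
[Figure: positions of the windows w_I(n) and w_II(n) over frame n-1 and frame n along the time axis t; a frame is 20 ms (160 samples) and a sub-frame is 5 ms (40 samples).]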
Windowing and Auto-correlation Computation
10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  One asymmetric window
  Concentrates on the 4th sub-frame
  5 ms (40 samples) lookahead
$$w(n) = \begin{cases}
0.54 - 0.46 \cos\left(\dfrac{2 \pi n}{2 L_1 - 1}\right), & n = 0, \ldots, L_1 - 1, \\
\cos\left(\dfrac{2 \pi (n - L_1)}{4 L_2 - 1}\right), & n = L_1, \ldots, L_1 + L_2 - 1.
\end{cases}$$
with $L_1 = 200$ and $L_2 = 40$
[Figure: position of the window over the previous, current and next frames.]
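The asymmetric window made of a half-Hamming rise and a quarter-cosine fall can be sketched as below. This is an illustration only; the function name `asymmetric_window` is an assumption, while the length pairs (232, 8) and (200, 40) are the ones given on the slides. Both pairs give 240 samples, which is the 30 ms window at 8 kHz.

```python
import math


def asymmetric_window(l1, l2):
    """Half-Hamming rise over l1 samples followed by a quarter-cosine fall over l2 samples."""
    rise = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (2 * l1 - 1)) for n in range(l1)]
    fall = [math.cos(2.0 * math.pi * n / (4 * l2 - 1)) for n in range(l2)]
    return rise + fall


if __name__ == "__main__":
    w_12k2_second = asymmetric_window(232, 8)   # second window of the 12.2 kbit/s mode
    w_other_modes = asymmetric_window(200, 40)  # single window of the other modes
    print(len(w_12k2_second), len(w_other_modes))
```

The windowed speech is then the sample-by-sample product of this window with the 240 most recent pre-processed samples.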
Auto-correlation Computation
Lags 0 to 10 are computed
$$r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \qquad k = 0, \ldots, 10$$
where $s'(n)$, $n = 0, \ldots, 239$, is the windowed speech
A 60 Hz bandwidth expansion is applied by lag windowing
$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2 \pi f_0 i}{f_s}\right)^2\right], \qquad i = 1, \ldots, 10$$
with $f_0 = 60$ Hz and $f_s = 8000$ Hz
$r_{ac}(0)$ is multiplied by the white noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB
$$r'_{ac}(0) = 1.0001\, r_{ac}(0), \qquad r'_{ac}(k) = r_{ac}(k)\, w_{lag}(k), \quad k = 1, \ldots, 10$$
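A direct floating-point sketch of the modified autocorrelation follows. The function name `modified_autocorrelation` and its keyword arguments are illustrative; the constants 60 Hz, 8000 Hz and 1.0001 are the values quoted on the slide.

```python
import math


def modified_autocorrelation(windowed, order=10, f0=60.0, fs=8000.0):
    """Autocorrelation for lags 0..order with white-noise correction and lag windowing."""
    n_samples = len(windowed)
    r = [sum(windowed[n] * windowed[n - k] for n in range(k, n_samples))
         for k in range(order + 1)]
    r[0] *= 1.0001                                   # white noise correction
    for k in range(1, order + 1):
        r[k] *= math.exp(-0.5 * (2.0 * math.pi * f0 * k / fs) ** 2)  # lag window
    return r


if __name__ == "__main__":
    print(modified_autocorrelation([1.0] * 240)[:3])
```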
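Levinson-Durbin algorithm
The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are obtained by solving the set of equations
$$\sum_{k=1}^{10} a_k\, r'_{ac}(|i - k|) = -r'_{ac}(i), \qquad i = 1, \ldots, 10$$
The Levinson-Durbin algorithm uses the following recursion:
$$\begin{array}{l}
E_{LD}(0) = r'_{ac}(0) \\
\text{for } i = 1 \text{ to } 10 \text{ do} \\
\quad a_0^{(i-1)} = 1 \\
\quad k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)}\, r'_{ac}(i - j)\right] / E_{LD}(i-1) \\
\quad a_i^{(i)} = k_i \\
\quad \text{for } j = 1 \text{ to } i - 1 \text{ do} \\
\qquad a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)} \\
\quad \text{end} \\
\quad E_{LD}(i) = (1 - k_i^2)\, E_{LD}(i-1) \\
\text{end}
\end{array}$$
The final solution is given as
$$a_j = a_j^{(10)}, \qquad j = 1, \ldots, 10$$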
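The recursion translates almost line for line into Python. The sketch below is a floating-point illustration; the function name `levinson_durbin` and the returned tuple are assumptions, and it uses the sign convention A(z) = 1 + sum of a_i z^-i.

```python
def levinson_durbin(r, order=10):
    """Solve for a[1..order] from autocorrelations r[0..order]; returns (a, final error)."""
    a = [1.0] + [0.0] * order
    err = r[0]                                   # E_LD(0)
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                           # reflection coefficient k_i
        prev = a[:]                              # coefficients of order i-1
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k                       # E_LD(i)
    return a, err


if __name__ == "__main__":
    # Autocorrelation of a first-order process: only a[1] should be non-zero.
    coeffs, residual = levinson_durbin([0.5 ** k for k in range(11)])
    print(coeffs[:3], residual)
```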
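LP to LSP conversion
The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes
The LSPs are defined as the roots of the sum and difference polynomials
$$F_1'(z) = A(z) + z^{-11} A(z^{-1}), \qquad F_2'(z) = A(z) - z^{-11} A(z^{-1})$$
All roots of these polynomials are on the unit circle and they alternate with each other
The roots at $z = -1$ and $z = 1$ are eliminated
$$F_1(z) = F_1'(z) / (1 + z^{-1}), \qquad F_2(z) = F_2'(z) / (1 - z^{-1})$$
$$F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \qquad F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2 q_i z^{-1} + z^{-2}\right)$$
where $q_i = \cos(\omega_i)$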
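One way to find the roots is to evaluate the two deflated polynomials on the unit circle and look for sign changes. The sketch below does this with a uniform grid and bisection. The grid size and all function names are assumptions made for illustration, not the search used by the reference codec, and the code assumes an even LP order such as 10.

```python
import math


def _deflate(p, c):
    """Divide p(z) by (1 + c z^-1); p holds the coefficients of z^0, z^-1, ..."""
    q = [p[0]]
    for k in range(1, len(p) - 1):
        q.append(p[k] - c * q[k - 1])
    return q


def _on_circle(f, w):
    """Real factor of a symmetric polynomial f evaluated at z = exp(jw)."""
    h = (len(f) - 1) // 2
    return f[h] + 2.0 * sum(f[k] * math.cos((h - k) * w) for k in range(h))


def _roots(f, grid):
    """Angles in (0, pi) where the symmetric polynomial f changes sign."""
    found = []
    prev_w, prev_v = 0.0, _on_circle(f, 0.0)
    for i in range(1, grid + 1):
        w = i * math.pi / grid
        v = _on_circle(f, w)
        if prev_v * v < 0.0:
            lo, hi, v_lo = prev_w, w, prev_v
            for _ in range(50):                  # bisection refinement
                mid = 0.5 * (lo + hi)
                v_mid = _on_circle(f, mid)
                if v_lo * v_mid <= 0.0:
                    hi = mid
                else:
                    lo, v_lo = mid, v_mid
            found.append(0.5 * (lo + hi))
        prev_w, prev_v = w, v
    return found


def lp_to_lsf(a, grid=400):
    """a = [1, a_1, ..., a_10]; returns the angles w_i in ascending order (q_i = cos w_i)."""
    ext = list(a) + [0.0]
    rev = ext[::-1]                              # coefficients of z^-(m+1) A(1/z)
    f1 = _deflate([x + y for x, y in zip(ext, rev)], 1.0)    # remove root at z = -1
    f2 = _deflate([x - y for x, y in zip(ext, rev)], -1.0)   # remove root at z = +1
    return sorted(_roots(f1, grid) + _roots(f2, grid))


if __name__ == "__main__":
    # With A(z) = 1 the ten angles are evenly spaced at i * pi / 11.
    print(lp_to_lsf([1.0] + [0.0] * 10))
```

A fixed grid can miss two roots that fall inside the same step, so a practical implementation needs a finer grid or a fallback when fewer than ten roots are found.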
Quantization of the LSP coefficients
12.2 kbit/s mode
  Two sets of LSPs are quantized using the representation in the frequency domain
  1st-order MA prediction is applied
  The two residual LSF vectors are jointly quantized using split matrix quantization (SMQ)
  A weighted LSP distortion measure is used in the quantization process
10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  1st-order MA prediction is applied
  The residual LSF vector is quantized using split vector quantization
  A weighted LSP distortion measure is used
The representation in the frequency domain is
$$f_i = \frac{f_s}{2\pi} \arccos(q_i), \qquad i = 1, \ldots, 10$$
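The mapping between the cosine-domain LSPs and frequencies in Hz is a one-liner each way. The sketch below assumes the 8000 Hz sampling rate given earlier; both function names are illustrative.

```python
import math


def lsp_to_lsf_hz(q, fs=8000.0):
    """Convert LSPs q_i = cos(w_i) to line spectral frequencies in Hz."""
    return [fs / (2.0 * math.pi) * math.acos(qi) for qi in q]


def lsf_hz_to_lsp(f, fs=8000.0):
    """Inverse mapping, from frequencies in Hz back to the cosine domain."""
    return [math.cos(2.0 * math.pi * fi / fs) for fi in f]


if __name__ == "__main__":
    print(lsp_to_lsf_hz([1.0, 0.0, -1.0]))
```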
Interpolation of the LSPs
12.2 kbit/s mode
  Interpolated LSP vectors at the 1st and 3rd subframes are given by
$$q_1^{(n)} = 0.5\, q_4^{(n-1)} + 0.5\, q_2^{(n)}, \qquad q_3^{(n)} = 0.5\, q_2^{(n)} + 0.5\, q_4^{(n)}$$
10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  Interpolated LSP vectors at the 1st, 2nd, and 3rd subframes are given by
$$\begin{aligned}
q_1^{(n)} &= 0.75\, q_4^{(n-1)} + 0.25\, q_4^{(n)}, \\
q_2^{(n)} &= 0.5\, q_4^{(n-1)} + 0.5\, q_4^{(n)}, \\
q_3^{(n)} &= 0.25\, q_4^{(n-1)} + 0.75\, q_4^{(n)}.
\end{aligned}$$
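Both interpolation schemes are plain weighted averages of LSP vectors. The sketch below returns the four per-subframe vectors of frame n; the function names and the list-of-lists return shape are assumptions made for illustration.

```python
def _mix(x, y, wx, wy):
    """Element-wise weighted sum of two LSP vectors."""
    return [wx * xi + wy * yi for xi, yi in zip(x, y)]


def interpolate_lsp_12k2(q4_prev, q2_cur, q4_cur):
    """12.2 kbit/s mode: subframes 2 and 4 are transmitted, 1 and 3 are interpolated."""
    q1 = _mix(q4_prev, q2_cur, 0.5, 0.5)
    q3 = _mix(q2_cur, q4_cur, 0.5, 0.5)
    return [q1, list(q2_cur), q3, list(q4_cur)]


def interpolate_lsp_other(q4_prev, q4_cur):
    """Other modes: only subframe 4 is transmitted, 1 to 3 are interpolated."""
    q1 = _mix(q4_prev, q4_cur, 0.75, 0.25)
    q2 = _mix(q4_prev, q4_cur, 0.5, 0.5)
    q3 = _mix(q4_prev, q4_cur, 0.25, 0.75)
    return [q1, q2, q3, list(q4_cur)]


if __name__ == "__main__":
    print(interpolate_lsp_other([0.0, 1.0], [1.0, 0.0]))
```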
Open-loop pitch analysis
Performed twice per frame (each 10 ms) for the 12.2, 10.2, 7.95, 7.40, 6.70 and 5.90 kbit/s modes
Performed once per frame for the 5.15 and 4.75 kbit/s modes
Filtering the pre-processed signal with a perceptual weighting filter
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 + \sum_{i=1}^{10} a_i \gamma_1^i z^{-i}}{1 + \sum_{i=1}^{10} a_i \gamma_2^i z^{-i}}$$
Flat: $\gamma_1 = 0.94$, $\gamma_2 = 0.6$
Tilted: $\gamma_1 = 0.94$, $\gamma_2$ from 0.4 to 0.7
[Figure: original and weighted pole locations relative to the unit circle.]
[Figure: frequency response for /ey, in dB against frequency (Hz) from 0 to 4000 Hz, showing the perceptual weighting function and the vocal tract (LP) filter.]
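The weighting filter only needs the LP coefficients scaled by powers of the two factors. The sketch below applies W(z) as a direct-form difference equation from zero state. The function names are illustrative, and the default factors are the "flat" pair quoted above.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i is scaled by gamma**i."""
    return [c * gamma ** i for i, c in enumerate(a)]


def perceptual_weighting(x, a, gamma1=0.94, gamma2=0.6):
    """Filter x through W(z) = A(z/gamma1) / A(z/gamma2), starting from zero state."""
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    y = []
    for n in range(len(x)):
        acc = sum(num[i] * x[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[i] * y[n - i] for i in range(1, len(den)) if n - i >= 0)
        y.append(acc)
    return y


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7
    print(perceptual_weighting(impulse, [1.0, -0.5]))
```

With equal factors the numerator and denominator cancel and the signal passes through unchanged, which is a quick sanity check for an implementation.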
Impulse response computation
The impulse response, h(n), of the weighted synthesis filter is computed each subframe
$$H(z) W(z) = \frac{A(z/\gamma_1)}{A(z)\, A(z/\gamma_2)}$$
It is needed for the search of the adaptive and fixed codebooks
Computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/A(z)$ and $1/A(z/\gamma_2)$
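The procedure described here (zero-extend the coefficients of the numerator, then run two all-pole filters) can be sketched as below. The function names are illustrative, the default factors are assumed values, and a single LP coefficient set is used for both the synthesis and weighting filters.

```python
def all_pole(x, den):
    """Filter x through 1/D(z), with den = [1, d_1, ...] and zero initial state."""
    y = []
    for n, xn in enumerate(x):
        acc = xn - sum(den[i] * y[n - i] for i in range(1, len(den)) if n - i >= 0)
        y.append(acc)
    return y


def impulse_response(a, gamma1=0.94, gamma2=0.6, length=40):
    """h(n) of A(z/gamma1) / (A(z) A(z/gamma2)) over one subframe of `length` samples."""
    num = [c * gamma1 ** i for i, c in enumerate(a)]
    den2 = [c * gamma2 ** i for i, c in enumerate(a)]
    padded = (num + [0.0] * length)[:length]     # coefficients extended by zeros
    return all_pole(all_pole(padded, a), den2)


if __name__ == "__main__":
    print(impulse_response([1.0, -0.5])[:5])
```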
Adaptive codebook
Adaptive codebook search is performed on a subframe basis
The parameters are the delay and gain of the pitch filter
The codebook contains entries taken from the previously synthesized excitation signal
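A much simplified sketch of the search is shown below. It tries integer delays only, matches the candidate vector against the target directly instead of first filtering it through h(n), and takes the delay range as a caller-supplied parameter. All of these are simplifying assumptions, and the function names are illustrative.

```python
def adaptive_vector(past_exc, delay, length=40):
    """Copy the excitation from `delay` samples back; short delays reuse copied samples."""
    buf = list(past_exc)
    start = len(buf)
    for n in range(length):
        buf.append(buf[start + n - delay])
    return buf[start:]


def search_adaptive_codebook(past_exc, target, t_min, t_max):
    """Return (delay, gain) maximizing corr^2 / energy; t_max must not exceed len(past_exc)."""
    best_score, best_delay, best_gain = 0.0, None, 0.0
    for delay in range(t_min, t_max + 1):
        v = adaptive_vector(past_exc, delay, len(target))
        corr = sum(x * y for x, y in zip(target, v))
        energy = sum(y * y for y in v)
        if energy > 0.0 and corr > 0.0 and corr * corr / energy > best_score:
            best_score = corr * corr / energy
            best_delay, best_gain = delay, corr / energy
    return best_delay, best_gain


if __name__ == "__main__":
    past = ([1.0] + [0.0] * 49) * 3              # pulse train with a period of 50 samples
    target = [0.8] + [0.0] * 39
    print(search_adaptive_codebook(past, target, 20, 143))
```

The gain returned is the least-squares gain for the chosen delay; a real encoder also bounds and quantizes it.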
Algebraic codebook
Encodes the random portion of the excitation signal
The periodic portion of the weighted residual is first removed; only the random portion remains to be coded by the fixed codebook
The codebook is searched by minimizing the error between the perceptually weighted input speech and the reconstructed speech
Based on the interleaved single-pulse permutation (ISPP) design
  A few sparse impulse sequences that are phase-shifted versions of each other
  All the pulses have the same magnitude
  Amplitudes are +1 or -1
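The interleaved-track structure can be illustrated by building a codevector from (track, slot, sign) triples, where track t holds positions t, t + n_tracks, t + 2 n_tracks and so on. The triple format, the function name and the default of 5 tracks over 40 samples are assumptions chosen for illustration; the slides do not give the per-mode track layout.

```python
def build_codevector(pulses, n_tracks=5, length=40):
    """Place unit pulses on interleaved tracks; each pulse is (track, slot, sign)."""
    code = [0] * length
    for track, slot, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("pulse amplitude must be +1 or -1")
        position = track + n_tracks * slot       # tracks are shifted copies of track 0
        if not (0 <= track < n_tracks and 0 <= position < length):
            raise ValueError("pulse falls outside the subframe")
        code[position] += sign
    return code


if __name__ == "__main__":
    vector = build_codevector([(0, 0, 1), (1, 2, -1), (4, 7, 1)])
    print([i for i, c in enumerate(vector) if c != 0])
```

Because only positions and signs are transmitted, the codebook itself never has to be stored as a table.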
Speech decoder
Codebook parameters are decoded by table look-up
LSP coefficients are interpolated and converted to LP coefficients
Excitation = sum of the adaptive and fixed codebook vectors multiplied by their respective gains in each subframe
Speech = excitation passed through the vocal tract filter
Perceived quality is enhanced by adaptive post-filtering
[Figure: block diagram of the speech decoder. Frame level: decode LSP from the LSP indices and interpolate the LSPs for the 4 subframes to obtain A(z)^. Subframe level: decode the adaptive codebook (pitch index), the innovative codebook (code index) and the gains (gain indices), construct the excitation, and run the synthesis filter to obtain s(n)^. Post-processing: post filter, giving s'(n)^.]
Synthesis model
[Figure: synthesis model. The fixed codebook vector c(n) scaled by g_c and the adaptive codebook vector v(n) scaled by g_p are summed to give the excitation u(n); LP synthesis through 1/A(z) gives s(n)^, and post-filtering gives s'(n)^.]
To reconstruct speech
  A noise-like excitation
  A pitch filter model of the glottal vibrations
  A linear prediction filter model of the vocal tract
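The synthesis model for one subframe reduces to a gain-weighted sum followed by an all-pole filter. The sketch below leaves out post-filtering; the function name, the argument order and the `memory` convention (past output samples, most recent last) are assumptions made for illustration.

```python
def synthesize_subframe(v, c, g_p, g_c, a, memory):
    """u(n) = g_p v(n) + g_c c(n), then s(n) = u(n) - sum_i a_i s(n - i).

    `a` is [1, a_1, ..., a_m]; `memory` holds the last m output samples, most recent last.
    Returns (excitation, speech, updated memory).
    """
    order = len(a) - 1
    if order < 1 or len(memory) != order:
        raise ValueError("memory must hold exactly len(a) - 1 samples")
    u = [g_p * vn + g_c * cn for vn, cn in zip(v, c)]
    hist = list(memory)
    speech = []
    for un in u:
        s = un - sum(a[i] * hist[-i] for i in range(1, order + 1))
        speech.append(s)
        hist = hist[1:] + [s]                    # slide the filter memory
    return u, speech, hist


if __name__ == "__main__":
    u, s, mem = synthesize_subframe([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                                    0.5, 2.0, [1.0, -0.5], [0.0])
    print(u, s, mem)
```

Passing the returned memory into the next call keeps the filter continuous across subframes.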
Post‑processing
Adaptive post-filtering
  Cascade of two filters: a formant postfilter and a tilt compensation filter
  Updated every subframe of 5 ms
High-pass filter
  Against undesired low-frequency components
  Cut-off frequency of 60 Hz is used
Up-scaling by a factor of 2 to compensate for the down-scaling by 2 that is applied to the input signal