Overview of Adaptive Multi-Rate Narrow Band (AMR-NB) Speech Codec

Presented by Peter

AMR Narrow Band

Adaptive Multi-Rate Codec for narrow band speech (AMR-NB)

Specified by 3GPP for GSM/3G systems
Input: 8 kHz sampling rate, 13-bit PCM
20 ms frames, no overlap
8 modes + comfort noise
Output bit rate from 4.75 to 12.2 kbit/s
Algebraic Code Excited Linear Prediction (ACELP) is used as the speech codec
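The frame size and per-mode bit budget follow directly from the figures above. A minimal sketch, assuming only the 8 kHz sampling rate, the 20 ms frame, and the eight listed bit rates; the names `MODE_KBITS` and `bits_per_frame` are illustrative and not taken from any codec source:

```python
# Bit rates of the eight AMR-NB modes in kbit/s, as listed on the slides.
MODE_KBITS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]
FRAME_MS = 20          # frame duration in milliseconds
SAMPLE_RATE_HZ = 8000  # input sampling rate


def bits_per_frame(kbits_per_s, frame_ms=FRAME_MS):
    """Coded bits per frame: (kbit/s) * (ms) gives bits."""
    return round(kbits_per_s * frame_ms)


# Number of input samples in one frame.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000

# Bit budget of every mode, keyed by bit rate.
BITS_PER_FRAME = {rate: bits_per_frame(rate) for rate in MODE_KBITS}
```

For example, `BITS_PER_FRAME[12.2]` is the number of bits the 12.2 kbit/s mode spends on each 160-sample frame.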

Frequency Response

Speech Encoder

Pre-processing
Linear prediction analysis and quantization
Open-loop pitch analysis
Impulse response computation
Target signal computation
Adaptive codebook
Algebraic codebook
Quantization of the adaptive and fixed codebook gains
Memory update

Principles of the adaptive multi-rate speech encoder

Eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s

A 10th-order linear prediction (LP), or short-term, synthesis filter is used, which is given by

$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}, \quad m = 10$$

The long-term, or pitch, synthesis filter is given by

$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}$$

where $T$ is the pitch delay and $g_p$ is the pitch gain.

The pitch synthesis filter is implemented using the adaptive codebook approach

ACELP

[Figure: block diagram of the ACELP encoder, not reproduced. Recoverable block labels:
Frame level: pre-processing of s(n); windowing and autocorrelation R[ ]; Levinson-Durbin (R[ ] to A(z)); LPC analysis (twice per frame); LSP quantization (output: LSP indices); interpolation for the 4 subframes (LSP to $\hat{A}(z)$); compute weighted speech (4 subframes); find open-loop pitch.
Subframe level: compute target x(n) for the adaptive codebook; compute impulse response h(n); find best delay and gain (output: pitch index); quantize LTP gain (output: LTP gain index); compute adaptive codebook contribution; compute target x2(n) for the innovation; find best innovation (output: code index); fixed codebook gain quantization (output: fixed codebook gain index); compute excitation; update filter memories for the next subframe.
Stage groupings: pre-processing, LPC analysis, open-loop pitch search, adaptive codebook search, innovative codebook search, filter memory update.]

Pre-Processing

Two pre-processing functions:
  High-pass filtering
  Signal down-scaling, to prevent overflow

A filter with a cut-off frequency of 80 Hz is used:

$$H_{h1}(z) = \frac{0.927246093 - 1.8544941 z^{-1} + 0.927246903 z^{-2}}{1 - 1.906005859 z^{-1} + 0.911376953 z^{-2}}$$
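The two pre-processing steps can be sketched as a direct-form second-order filter preceded by a division by 2. This is a floating-point illustration under the assumption that the coefficients are those of the 80 Hz high-pass filter on this slide; the name `preprocess` is hypothetical, and a real codec would use fixed-point arithmetic and keep the filter state across frames:

```python
# Numerator (B) and denominator (A) coefficients of the 80 Hz high-pass filter.
B = (0.927246093, -1.8544941, 0.927246903)
A = (1.0, -1.906005859, 0.911376953)


def preprocess(samples):
    """Down-scale the input by 2, then apply the second-order high-pass filter."""
    out = []
    x1 = x2 = y1 = y2 = 0.0  # filter state, zero at start-up
    for s in samples:
        x0 = s / 2.0  # down-scaling to prevent overflow
        y0 = B[0] * x0 + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

Feeding a constant (DC) input should give an output that decays towards zero, while a signal at half the sampling rate should pass at close to half its input amplitude.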

Linear Prediction Analysis

The frame is split into four subframes

12.2 kbit/s mode:
  Performed twice per frame
  30 ms asymmetric window
  No lookahead

10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes:
  Performed once per frame
  30 ms asymmetric window
  5 ms lookahead

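Windowing and Auto-correlation Computation: 12.2 kbit/s mode

Two different asymmetric windows:
  The 1st window concentrates on the 2nd subframe
  The 2nd window concentrates on the 4th subframe

$$w_I(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{\pi n}{L_1^{(I)} - 1}\right), & n = 0, \ldots, L_1^{(I)} - 1, \\[2ex] 0.54 + 0.46 \cos\left(\dfrac{\pi \left(n - L_1^{(I)}\right)}{L_2^{(I)} - 1}\right), & n = L_1^{(I)}, \ldots, L_1^{(I)} + L_2^{(I)} - 1. \end{cases}$$

$$w_{II}(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2 \pi n}{2 L_1^{(II)} - 1}\right), & n = 0, \ldots, L_1^{(II)} - 1, \\[2ex] \cos\left(\dfrac{2 \pi \left(n - L_1^{(II)}\right)}{4 L_2^{(II)} - 1}\right), & n = L_1^{(II)}, \ldots, L_1^{(II)} + L_2^{(II)} - 1. \end{cases}$$

with $L_1^{(II)} = 232$ and $L_2^{(II)} = 8$.

[Figure: positions of the windows $w_I(n)$ and $w_{II}(n)$ relative to frame n-1 and frame n on the time axis t; frame = 20 ms (160 samples), subframe = 5 ms (40 samples). Not reproduced.]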

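Windowing and Auto-correlation Computation: 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes

One asymmetric window
Concentrates on the 4th subframe
5 ms (40 samples) lookahead

$$w_{II}(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2 \pi n}{2 L_1^{(II)} - 1}\right), & n = 0, \ldots, L_1^{(II)} - 1, \\[2ex] \cos\left(\dfrac{2 \pi \left(n - L_1^{(II)}\right)}{4 L_2^{(II)} - 1}\right), & n = L_1^{(II)}, \ldots, L_1^{(II)} + L_2^{(II)} - 1. \end{cases}$$

with $L_1^{(II)} = 200$ and $L_2^{(II)} = 40$.

[Figure: position of the window across the previous, current, and next frames. Not reproduced.]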
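The asymmetric window for these modes is a half Hamming rise of 200 samples followed by a quarter-cosine decay of 40 samples. A minimal sketch, assuming that two-part shape; the function name `lp_window` is illustrative. Calling it with `l1=232, l2=8` should give the second window of the 12.2 kbit/s mode, which has the same form:

```python
import math


def lp_window(l1=200, l2=40):
    """Asymmetric LP analysis window: Hamming-style rise over l1 samples,
    then a quarter-cycle cosine decay over l2 samples."""
    window = []
    for n in range(l1):
        window.append(0.54 - 0.46 * math.cos(2.0 * math.pi * n / (2 * l1 - 1)))
    for n in range(l1, l1 + l2):
        window.append(math.cos(2.0 * math.pi * (n - l1) / (4 * l2 - 1)))
    return window
```

The window spans 240 samples (30 ms) and peaks around sample 200, which is why the analysis concentrates on the 4th subframe and needs 40 samples of lookahead.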

Auto-correlation Computation

Lags 0 to 10 are computed:

$$r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \quad k = 0, \ldots, 10$$

where $s'(n)$, $n = 0, \ldots, 239$, is the windowed speech.

A 60 Hz bandwidth expansion is applied by lag windowing:

$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2 \pi f_0 i}{f_s}\right)^2\right], \quad i = 1, \ldots, 10$$

with $f_0 = 60$ Hz and $f_s = 8000$ Hz.

$r_{ac}(0)$ is multiplied by the white noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB:

$$r'_{ac}(0) = 1.0001\, r_{ac}(0), \qquad r'_{ac}(k) = r_{ac}(k)\, w_{lag}(k), \quad k = 1, \ldots, 10$$
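The autocorrelation, lag window, and white noise correction can be combined in one short routine. This is a floating-point sketch under the slide's parameters (order 10, 60 Hz expansion, 8000 Hz sampling rate); the function name `autocorr_with_lag_window` is an illustrative choice:

```python
import math


def autocorr_with_lag_window(windowed, order=10, f0=60.0, fs=8000.0):
    """Return the modified autocorrelations r'(0..order) of the windowed speech."""
    count = len(windowed)
    # Plain autocorrelation for lags 0..order.
    r = [sum(windowed[n] * windowed[n - k] for n in range(k, count))
         for k in range(order + 1)]
    # White noise correction: raise r(0) slightly (noise floor at about -40 dB).
    r[0] *= 1.0001
    # Lag windowing: Gaussian taper equivalent to a 60 Hz bandwidth expansion.
    for k in range(1, order + 1):
        r[k] *= math.exp(-0.5 * (2.0 * math.pi * f0 * k / fs) ** 2)
    return r
```

Both modifications make the subsequent normal equations better conditioned, at the cost of slightly broadened spectral peaks.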

Levinson‑Durbin algorithm

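The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are obtained by solving the set of equations

$$\sum_{k=1}^{10} a_k\, r'_{ac}(|i - k|) = -r'_{ac}(i), \quad i = 1, \ldots, 10$$

The Levinson-Durbin algorithm uses the following recursion:

$E_{LD}(0) = r'_{ac}(0)$
for $i = 1$ to $10$ do
  $a_0^{(i-1)} = 1$
  $k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)}\, r'_{ac}(i-j)\right] / E_{LD}(i-1)$
  $a_i^{(i)} = k_i$
  for $j = 1$ to $i-1$ do
    $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$
  end
  $E_{LD}(i) = (1 - k_i^2)\, E_{LD}(i-1)$
end

The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \ldots, 10$.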
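The recursion translates almost line for line into code. A minimal floating-point sketch, assuming the sign convention $A(z) = 1 + \sum a_i z^{-i}$ used on these slides; the function name `levinson_durbin` is illustrative, and no safeguard is included for a zero prediction error:

```python
def levinson_durbin(r, order=10):
    """Solve the normal equations for the LP coefficients.

    r is the list of modified autocorrelations r'(0..order).
    Returns (a, err) with a = [1, a_1, ..., a_order] and the final
    prediction error energy err.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient k_i from the order-(i-1) solution.
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        prev = a[:]  # keep the order-(i-1) coefficients
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k
    return a, err
```

As a sanity check, the autocorrelation sequence of a first-order process, `r[k] = 0.5 ** k`, should yield `a_1` close to -0.5 with all higher coefficients close to zero.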

LP to LSP conversion

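The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes

The LSPs are defined as the roots of the sum and difference polynomials

$$F_1'(z) = A(z) + z^{-11} A(z^{-1}), \qquad F_2'(z) = A(z) - z^{-11} A(z^{-1})$$

All roots of these polynomials lie on the unit circle, and they alternate with each other

The roots at $z = -1$ (in $F_1'$) and $z = 1$ (in $F_2'$) are eliminated:

$$F_1(z) = \frac{F_1'(z)}{1 + z^{-1}}, \qquad F_2(z) = \frac{F_2'(z)}{1 - z^{-1}}$$

$$F_1(z) = \prod_{i = 1, 3, \ldots, 9} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \qquad F_2(z) = \prod_{i = 2, 4, \ldots, 10} \left(1 - 2 q_i z^{-1} + z^{-2}\right)$$

where $q_i = \cos(\omega_i)$ and $\omega_i$ are the line spectral frequencies.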
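One way to find the LSPs numerically is to form the sum and difference polynomials, divide out the fixed roots at z = -1 and z = 1, and look for sign changes of the resulting symmetric polynomials on the unit circle. The sketch below does this with a uniform grid plus bisection. It is not claimed to be the search procedure of the standard; the grid size, the bisection depth, and the name `lp_to_lsp` are assumptions, and a root that falls exactly on a grid point would be missed:

```python
import math


def lp_to_lsp(a, grid=1024):
    """Convert a = [1, a_1, ..., a_m] (m even) to LSPs q_i = cos(w_i),
    returned in order of increasing frequency w_i."""
    m = len(a) - 1
    if m % 2:
        raise ValueError("an even LP order is expected")
    ext = list(a) + [0.0]  # a_(m+1) = 0
    f1p = [ext[i] + ext[m + 1 - i] for i in range(m + 1)]  # sum polynomial
    f2p = [ext[i] - ext[m + 1 - i] for i in range(m + 1)]  # difference polynomial
    # Divide F1' by (1 + z^-1) and F2' by (1 - z^-1).
    f1, f2 = [f1p[0]], [f2p[0]]
    for i in range(1, m + 1):
        f1.append(f1p[i] - f1[i - 1])
        f2.append(f2p[i] + f2[i - 1])
    half = m // 2

    def on_circle(f, w):
        # Real-valued part of a symmetric polynomial evaluated at z = exp(jw).
        return f[half] + 2.0 * sum(f[i] * math.cos((half - i) * w)
                                   for i in range(half))

    def find_roots(f):
        roots = []
        w_lo, v_lo = 0.0, on_circle(f, 0.0)
        for step in range(1, grid + 1):
            w_hi = math.pi * step / grid
            v_hi = on_circle(f, w_hi)
            if v_lo * v_hi < 0.0:  # sign change: refine by bisection
                lo, hi = w_lo, w_hi
                for _ in range(50):
                    mid = 0.5 * (lo + hi)
                    if v_lo * on_circle(f, mid) > 0.0:
                        lo = mid
                    else:
                        hi = mid
                roots.append(0.5 * (lo + hi))
            w_lo, v_lo = w_hi, v_hi
        return roots

    omegas = sorted(find_roots(f1) + find_roots(f2))
    return [math.cos(w) for w in omegas]
```

For the trivial filter A(z) = 1 the ten line spectral frequencies are evenly spaced at multiples of pi/11, which makes a convenient check.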

Quantization of the LSP coefficients

12.2 kbit/s mode:
  Two sets of LSPs are quantized using their representation in the frequency domain
  1st-order MA prediction is applied
  The two residual LSF vectors are jointly quantized using split matrix quantization (SMQ)
  A weighted LSP distortion measure is used in the quantization process

10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes:
  1st-order MA prediction is applied
  The residual LSF vector is quantized using split vector quantization
  A weighted LSP distortion measure is used

The frequency-domain representation (line spectral frequencies, LSFs, in Hz) is

$$f_i = \frac{f_s}{2\pi} \arccos(q_i), \quad i = 1, \ldots, 10$$

where $f_s = 8000$ Hz is the sampling frequency.
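The mapping from LSPs in the cosine domain to frequencies in Hz is a one-line computation. A small sketch, assuming the 8000 Hz sampling rate used throughout; the function name `lsp_to_lsf` is illustrative:

```python
import math


def lsp_to_lsf(lsp, fs=8000.0):
    """Map LSPs q_i = cos(w_i) to line spectral frequencies in Hz."""
    return [fs / (2.0 * math.pi) * math.acos(q) for q in lsp]
```

An LSP of 1 maps to 0 Hz, 0 maps to a quarter of the sampling rate, and -1 maps to half the sampling rate.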

Interpolation of the LSPs

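12.2 kbit/s mode: the interpolated LSP vectors at the 1st and 3rd subframes are given by

$$\hat{q}_1^{(n)} = 0.5\, \hat{q}_4^{(n-1)} + 0.5\, \hat{q}_2^{(n)}, \qquad \hat{q}_3^{(n)} = 0.5\, \hat{q}_2^{(n)} + 0.5\, \hat{q}_4^{(n)}$$

10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes: the interpolated LSP vectors at the 1st, 2nd, and 3rd subframes are given by

$$\hat{q}_1^{(n)} = 0.75\, \hat{q}_4^{(n-1)} + 0.25\, \hat{q}_4^{(n)}, \qquad \hat{q}_2^{(n)} = 0.5\, \hat{q}_4^{(n-1)} + 0.5\, \hat{q}_4^{(n)}, \qquad \hat{q}_3^{(n)} = 0.25\, \hat{q}_4^{(n-1)} + 0.75\, \hat{q}_4^{(n)}$$

where $\hat{q}_k^{(n)}$ is the quantized LSP vector of subframe $k$ in frame $n$.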
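Both interpolation schemes are plain weighted averages of LSP vectors. A sketch covering the two cases, where passing a second-subframe vector selects the 12.2 kbit/s rule; the function name `interpolate_lsp` and its argument names are illustrative:

```python
def interpolate_lsp(q4_prev, q4_cur, q2_cur=None):
    """Return the LSP vectors for subframes 1 to 4 of the current frame.

    q4_prev: 4th-subframe LSPs of the previous frame.
    q4_cur:  4th-subframe LSPs of the current frame.
    q2_cur:  2nd-subframe LSPs of the current frame (12.2 kbit/s mode only).
    """
    def mix(x, y, weight_x):
        return [weight_x * xi + (1.0 - weight_x) * yi for xi, yi in zip(x, y)]

    if q2_cur is not None:
        # 12.2 kbit/s mode: subframes 2 and 4 are transmitted, 1 and 3 interpolated.
        return [mix(q4_prev, q2_cur, 0.5), list(q2_cur),
                mix(q2_cur, q4_cur, 0.5), list(q4_cur)]
    # Other modes: only subframe 4 is transmitted.
    return [mix(q4_prev, q4_cur, 0.75), mix(q4_prev, q4_cur, 0.5),
            mix(q4_prev, q4_cur, 0.25), list(q4_cur)]
```

Each interpolated vector would then be converted back to LP coefficients for use in its subframe.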

Open-loop pitch analysis

Performed twice per frame (every 10 ms) for the 12.2, 10.2, 7.95, 7.40, 6.70 and 5.90 kbit/s modes
Performed once per frame for the 5.15 and 4.75 kbit/s modes
The pre-processed signal is first filtered with a perceptual weighting filter:

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 + \sum_{i=1}^{10} a_i \gamma_1^i z^{-i}}{1 + \sum_{i=1}^{10} a_i \gamma_2^i z^{-i}}$$

Flat: $\gamma_1 = 0.94$, $\gamma_2 = 0.6$
Tilted: $\gamma_1 = 0.94$, $0.4 \le \gamma_2 \le 0.7$

[Figure: frequency response for /ey/, magnitude in dB (-20 to 20) versus frequency in Hz (0 to 4000), comparing the vocal tract (LP) filter with the perceptual weighting function; further legend entries: original, weighted, unit circle. Not reproduced.]
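The weighting filter only needs the LP coefficients scaled by powers of the two gamma factors. A minimal sketch of that scaling and of the resulting pole-zero filter, with default factors taken from the "flat" setting on the slide; the names `bandwidth_expand` and `weighting_filter` are illustrative, and the filter starts from zero state rather than carrying memory between subframes:

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i is scaled by gamma**i."""
    return [coef * gamma ** i for i, coef in enumerate(a)]


def weighting_filter(signal, a, gamma1=0.94, gamma2=0.6):
    """Filter `signal` through W(z) = A(z/gamma1) / A(z/gamma2)."""
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    out = []
    for n in range(len(signal)):
        # Zero (numerator) part acts on the input.
        acc = sum(num[i] * signal[n - i] for i in range(len(num)) if n - i >= 0)
        # Pole (denominator) part acts on past outputs.
        acc -= sum(den[i] * out[n - i] for i in range(1, len(den)) if n - i >= 0)
        out.append(acc)
    return out
```

With equal gamma factors the numerator and denominator cancel and the signal passes unchanged, which is a quick way to check the implementation.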

Impulse response computation

The impulse response $h(n)$ of the weighted synthesis filter

$$H(z) W(z) = \frac{A(z/\gamma_1)}{\hat{A}(z)\, A(z/\gamma_2)}$$

is computed each subframe
It is needed for the search of the adaptive and fixed codebooks
It is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$
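The procedure amounts to treating the coefficients of A(z/gamma1) as an input signal and passing it through two all-pole filters in cascade. A sketch under the assumptions of a 40-sample subframe and the flat gamma setting; the names `all_pole` and `impulse_response` are illustrative:

```python
def all_pole(signal, den):
    """Filter `signal` through 1/D(z), den = [1, d_1, ...], zero initial state."""
    out = []
    for n, x in enumerate(signal):
        acc = x - sum(den[i] * out[n - i] for i in range(1, len(den)) if n - i >= 0)
        out.append(acc)
    return out


def impulse_response(a, a_q, gamma1=0.94, gamma2=0.6, length=40):
    """Impulse response h(n) of A(z/gamma1) / (A_q(z) * A(z/gamma2)).

    a:   unquantized LP coefficients [1, a_1, ...].
    a_q: quantized LP coefficients [1, a_q1, ...].
    """
    num = [coef * gamma1 ** i for i, coef in enumerate(a)]
    den2 = [coef * gamma2 ** i for i, coef in enumerate(a)]
    # Coefficients of A(z/gamma1) extended by zeros to the subframe length.
    x = (num + [0.0] * length)[:length]
    return all_pole(all_pole(x, a_q), den2)
```

The resulting `h` can be convolved with candidate codebook vectors during the adaptive and fixed codebook searches.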

Adaptive codebook

Adaptive codebook search is performed on a subframe basis

The parameters are the delay and gain of the pitch filter

The codebook contains entries taken from the previously synthesized excitation signal
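A heavily simplified closed-loop search illustrates the idea: each candidate delay selects a segment of past excitation, the segment is filtered through h(n), and the delay whose filtered segment best matches the target wins. This sketch assumes integer delays only, a hypothetical delay range of 20 to 143 samples, and a gain clamp of [0, 1.2]; none of these are stated on the slides, and fractional delays, which a real codec would also search, are not modelled. The function names are illustrative:

```python
import math


def convolve_truncated(v, h):
    """Convolve v with h, keeping only the first len(v) output samples."""
    return [sum(v[i] * h[n - i] for i in range(n + 1) if n - i < len(h))
            for n in range(len(v))]


def adaptive_codebook_search(target, past_exc, h, t_min=20, t_max=143):
    """Return (delay, gain, codevector), or None if no candidate has energy."""
    length = len(target)
    best = None
    for lag in range(t_min, min(t_max, len(past_exc)) + 1):
        v = []
        for n in range(length):
            # Read from the past excitation; for delays shorter than the
            # subframe, repeat the part of v built so far.
            v.append(past_exc[n - lag] if n < lag else v[n - lag])
        y = convolve_truncated(v, h)
        energy = sum(s * s for s in y)
        if energy <= 0.0:
            continue
        corr = sum(x * s for x, s in zip(target, y))
        score = corr / math.sqrt(energy)  # normalized correlation
        if best is None or score > best[0]:
            best = (score, lag, corr / energy, v)
    if best is None:
        return None
    _, lag, gain, v = best
    return lag, min(max(gain, 0.0), 1.2), v  # assumed gain clamp
```

With a pulse train of period 50 as past excitation and a matching target, the search should return a delay of 50 and a gain of 1.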

Algebraic codebook

Encodes the random portion of the excitation signal
The periodic portion of the weighted residual is first removed; only the random portion remains to be coded by the fixed codebook
The codebook is searched by minimizing the error between the perceptually weighted input speech and the reconstructed speech
Based on an interleaved single-pulse permutation (ISPP) design
  A few sparse impulse sequences that are phase-shifted versions of each other
  All pulses have the same magnitude
  Amplitudes are +1 or -1
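An algebraic codevector is fully described by a handful of pulse positions and signs, so no stored table of codevectors is needed. The sketch below builds such a vector and lists the candidate positions of one interleaved track. The 40-sample subframe comes from the slides, but the choice of 5 tracks is an illustrative assumption, since the number of tracks and pulses varies by mode; the function names are hypothetical:

```python
def track_positions(track, n_tracks=5, subframe=40):
    """Positions available to a pulse on the given interleaved track."""
    return list(range(track, subframe, n_tracks))


def build_codevector(pulses, subframe=40):
    """Build a sparse codevector from (position, sign) pairs, sign = +1 or -1."""
    code = [0.0] * subframe
    for position, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("pulse amplitudes must be +1 or -1")
        code[position] += sign
    return code
```

For example, `build_codevector([(0, 1), (7, -1)])` places a positive pulse at sample 0 and a negative pulse at sample 7, leaving all other samples at zero.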

Speech decoder

Codebook parameters are decoded by table look-up
LSP coefficients are interpolated and converted to LP coefficients
Excitation = sum of the adaptive and fixed codebook vectors multiplied by their respective gains in each subframe
Speech = excitation passed through the vocal tract (LP synthesis) filter
Perceived quality is enhanced by adaptive post-filtering
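The core of the decoder for one subframe is the gain-scaled sum of the two codebook vectors followed by the LP synthesis filter. A floating-point sketch that leaves out parameter decoding and post-filtering; the function name `decode_subframe` and the convention that `synth_mem` holds the last output samples (oldest first, one per filter tap) are assumptions made for illustration:

```python
def decode_subframe(v, c, g_p, g_c, a_q, synth_mem):
    """Reconstruct one subframe of speech.

    v, c:      adaptive and fixed codebook vectors.
    g_p, g_c:  their decoded gains.
    a_q:       quantized LP coefficients [1, a_1, ..., a_m], m >= 1.
    synth_mem: last m output samples of the previous subframe, oldest first.
    Returns (excitation, speech, new_synth_mem).
    """
    order = len(a_q) - 1
    # Excitation: u(n) = g_p * v(n) + g_c * c(n).
    excitation = [g_p * vi + g_c * ci for vi, ci in zip(v, c)]
    history = list(synth_mem)
    speech = []
    for x in excitation:
        # All-pole synthesis: s(n) = u(n) - sum_i a_i * s(n - i).
        y = x - sum(a_q[i] * history[-i] for i in range(1, order + 1))
        history.append(y)
        speech.append(y)
    return excitation, speech, history[-order:]
```

The returned excitation would also be appended to the excitation history, since it supplies the adaptive codebook entries for later subframes.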

Speech decoder

[Figure: block diagram of the speech decoder, not reproduced. Recoverable block labels:
Frame level: LSP indices; decode LSP; interpolation of LSP for the 4 subframes; LSP to $\hat{A}(z)$.
Subframe level: pitch index to decode adaptive codebook; code index to decode innovative codebook; gains indices to decode gains; construct excitation; synthesis filter, giving $\hat{s}(n)$.
Post-processing: post filter, giving $\hat{s}'(n)$.]

Synthesis model

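[Figure: synthesis model, not reproduced. The adaptive codebook vector $v(n)$ scaled by $g_p$ and the fixed codebook vector $c(n)$ scaled by $g_c$ are summed to form the excitation $u(n)$, which drives the LP synthesis filter $1/\hat{A}(z)$ to give $\hat{s}(n)$; post-filtering then gives $\hat{s}'(n)$.]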

Synthesis model

To reconstruct speech:
  A noise-like source signal
  A pitch filter model of the glottal vibrations
  A linear prediction filter model of the vocal tract

Post‑processing

Adaptive post-filtering
  Cascade of two filters: a formant postfilter and a tilt compensation filter
  Updated every subframe of 5 ms

High-pass filter
  Removes undesired low-frequency components
  A cut-off frequency of 60 Hz is used

Up-scaling by a factor of 2, to compensate for the down-scaling by 2 that is applied to the input signal
