Page 1: Time-Scale Modification of Speech Signals Bill Floyd ECE 5525 – Digital Speech Processing December 14, 2004.

Time-Scale Modification of Speech Signals

Bill Floyd

ECE 5525 – Digital Speech Processing

December 14, 2004


Objectives

- Introduction
- Background
- Theory
- Methods
- Examples
- Matlab Code
- Short Time Fourier Transform
- Short Time Fourier Transform Magnitude
- Speech Samples
- Conclusion
- Questions
- References


Introduction

- Goal
  - To either speed up or slow down a speech signal while maintaining the approximate pitch
- Applications
  - Change voice mail playback speed
  - Court stenographers: play proceedings quicker
  - Sound effects
  - Etc.


Introduction

- Option 1 – Change sample rate
  - If you modify the sample rate, you can change the speed, but the pitch is also changed
  - Increase sample rate = higher pitch (chipmunk sound)
  - Decrease sample rate = lower pitch (drawn-out echo sound)
- Option 2 – Decimate or interpolate signal
  - If you change the number of samples, the result is the same as modifying the sample rate
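The equivalence stated in Option 2 can be checked numerically. The following is a minimal Matlab sketch, not part of the original code: it assumes a synthetic 200 Hz tone at an assumed 10,000 samples/sec in place of real speech, keeps every second sample, and locates the spectral peak before and after. All variable names are illustrative.

```matlab
fs = 10000;                          % assumed sample rate (samples/sec)
f0 = 200;                            % assumed test-tone frequency (Hz)
x  = sin(2*pi*f0*(0:fs-1)'/fs);      % 1 second of tone standing in for speech

y  = x(1:2:end);                     % keep every second sample (no anti-alias filter; adequate for this tone)

% Locate the strongest FFT bin in the lower half of each spectrum
X = abs(fft(x));
Y = abs(fft(y));
[~, kx] = max(X(1:floor(length(x)/2)));
[~, ky] = max(Y(1:floor(length(y)/2)));

peak_x = (kx-1)*fs/length(x);        % peak of the original, in Hz
peak_y = (ky-1)*fs/length(y);        % peak of the decimated signal at the same playback rate
```

Played back at the unchanged sample rate, the decimated tone lasts half as long and its peak moves from peak_x to peak_y, which is the same pitch change that Option 1 produces.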


Introduction

- Option 3 – Use more complex methods
  - This will change the speed of the sample while preserving the pitch data
    - Short Time Fourier Transform
    - Short Time Fourier Transform Magnitude
    - Sinusoidal Synthesis
    - Linear Prediction Synthesis


Terminology

[Figure: "Window Representation", with the Window Size and the Frame Rate labeled]


Theory

- Short Time Fourier Transform Methods
  - Chapter 7 in our text (Discrete-Time Speech Signal Processing)
  - Refer to the class notes for the mathematical theory of operation
  - I will pick up from where Dr. Kepuska stopped in his notes


Short Time Fourier Transform

- Short Time Fourier Transform
  - Also called the Fairbanks method
  - Extract successive short-time segments and then discard the following ones

[Block diagram: Signal -> STFT -> Decimate Samples -> IFFT -> OLA -> Output]


Short Time Fourier Transform

- Frame rate factor L
- In the frequency domain, after taking the STFT, you get X(nL, ω)
- Form a new signal by Y(nL, ω) = X(snL, ω), where s = compression factor
- Take the inverse Fourier transform
- Use the overlap-and-add method to form the new signal
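The steps above can be sketched in a few lines of Matlab. This is a minimal illustration under stated assumptions, not the code from the later slides: the input is a synthetic tone at an assumed 10,000 samples/sec, s is restricted to an integer, and the triangular window is computed by hand so that no toolbox is needed. The names N, L, s, keep, and y are illustrative.

```matlab
fs = 10000;                          % assumed sample rate (samples/sec)
N  = 200;                            % window size in samples
L  = 100;                            % frame rate (hop) in samples
s  = 2;                              % integer compression factor (2 = twice as fast)
x  = sin(2*pi*200*(0:fs-1)'/fs);     % 1 second test tone standing in for speech

n  = (1:N)';
w  = 1 - abs(2*n - N - 1)/N;         % triangular window (same values as triang(N) for even N)

% Analysis: one FFT column per frame, X(mL, w)
num_frames = floor((length(x) - N)/L) + 1;
X = zeros(N, num_frames);
for m = 1:num_frames
    idx = (m-1)*L + (1:N);
    X(:,m) = fft(x(idx) .* w);
end

% Modification: Y(nL, w) = X(snL, w), i.e. keep every s-th frame
keep = 1:s:num_frames;

% Synthesis: inverse FFT and overlap-add at the original hop L
y = zeros((length(keep)-1)*L + N, 1);
for k = 1:length(keep)
    idx = (k-1)*L + (1:N);
    y(idx) = y(idx) + real(ifft(X(:,keep(k))));
end
```

With these values, 99 analysis frames become 50 synthesis frames, so y is roughly half as long as x. Because L is half of N, the overlapped triangular windows sum to one and no extra gain correction is needed in this sketch.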


Short Time Fourier Transform

[Figure: two panels of window positions, labeled X(nL, ω) and Y(nL, ω) = X(2nL, ω)]


Short Time Fourier Transform

[Figure: "Window Representation", showing the Original Windowed Sequence and the New Sequence]


Short Time Fourier Transform

- Problems
  - Pitch synchronization
    - It is highly likely that the pitch periods will not line up properly


Short Time Fourier Transform Magnitude

- Short Time Fourier Transform Magnitude
  - Problems with the STFT method relate directly to the linear phase component of the STFT
  - Time shift = phase change
  - Alternate approach is to use only the magnitude portion of the STFT: the Short Time Fourier Transform Magnitude (STFTM)
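The "time shift = phase change" point can be verified on a single frame. The sketch below is illustrative only: it assumes a smooth test pulse and a circular shift (so that the DFT relation holds exactly), and the names x, d, mag_diff, and phase_diff are not from the original code.

```matlab
N  = 64;
n  = (0:N-1)';
x  = exp(-0.5*((n-20)/4).^2);        % assumed test frame: a smooth pulse centered at n = 20
d  = 5;                              % assumed shift in samples
xs = circshift(x, d);                % same pulse, circularly delayed by d samples

X  = fft(x);
Xs = fft(xs);
k  = (0:N-1)';

% The magnitudes are identical ...
mag_diff   = max(abs(abs(Xs) - abs(X)));
% ... and the spectra differ only by the linear phase term exp(-j*2*pi*k*d/N)
phase_diff = max(abs(Xs - X .* exp(-1i*2*pi*k*d/N)));
```

Both differences come out at rounding-error level, which is why moving a frame in time leaves the magnitude usable while the phase no longer matches its neighbors.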


Short Time Fourier Transform Magnitude

- Compression
  - With the Fairbanks method, time slices were discarded
  - Now we can just compress the time slices
  - Form a new signal by |Y(nM, ω)| = |X(nL, ω)|, where M = compression factor = L / speed
  - i.e., for speeding up by two, M = L/2


Short Time Fourier Transform Magnitude

- Compression
  - Take the inverse Fourier transform
  - Use the overlap-and-add method to form the new signal
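The change of frame spacing from L to M = L / speed can be sketched as follows. This is a simplified illustration under stated assumptions, not the presentation's implementation: it uses a synthetic tone at an assumed 10,000 samples/sec, requires M to be an integer, and reuses each frame's own phase instead of rebuilding a signal from the magnitude alone (the code on the later slides takes the inverse FFT of abs(...) instead). The variable names are illustrative.

```matlab
fs = 10000;                          % assumed sample rate (samples/sec)
N  = 200;                            % window size in samples
L  = 100;                            % analysis frame rate (hop) in samples
speed = 2;                           % 2 = twice as fast
M  = L/speed;                        % synthesis frame spacing (must be an integer here)
x  = sin(2*pi*200*(0:fs-1)'/fs);     % 1 second test tone standing in for speech

n  = (1:N)';
w  = 1 - abs(2*n - N - 1)/N;         % triangular window (same values as triang(N) for even N)
W0 = sum(w);

num_frames = floor((length(x) - N)/L) + 1;
y = zeros((num_frames-1)*M + N, 1);
for m = 1:num_frames
    in_idx  = (m-1)*L + (1:N);       % frame taken every L samples from the input
    out_idx = (m-1)*M + (1:N);       % same frame placed every M samples in the output
    frame   = real(ifft(fft(x(in_idx) .* w)));
    y(out_idx) = y(out_idx) + frame*(M/W0);   % M/W0 compensates for the denser window overlap
end
```

No frames are discarded here, unlike the Fairbanks method: every frame is used, only packed more tightly. Because neighboring frames are added with their original phases, the pitch periods generally do not line up, which is the synchronization problem noted for the STFT method.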


Short Time Fourier Transform Magnitude

[Figure: two panels of window positions, labeled X(nL, ω) and Y(nM, ω) = X(nL, ω) with M = L/2]


Short Time Fourier Transform Magnitude

[Figure: "Window Representation", showing the Original Windowed Sequence and the New Sequence]


Other Methods

- Sinusoidal Synthesis (Chapter 9)
  - Time-warp the sinewave frequency track and the amplitude function
  - This technique has been successful not only with speech but also with music, biological, and mechanical signals
- Problems
  - Does not maintain the original phase relations
  - Suffers from reverberance


Other Methods

- Linear Prediction Synthesis
  - Use homomorphic and linear prediction results to modify the time base
  - The book briefly mentions that this is possible, but I ran out of time before I could investigate this process further


Other Methods

- New Techniques
  - Internet search showed several methods trying to improve on what is out there now
- Software
  - Different software programs that will change speed for you
  - Adobe Audition is one of the most all-encompassing right now


Matlab Code-Prepare the Workspace

%%%%%%%%%%%%%%%%
% Prepare Workspace
%%%%%%%%%%%%%%%%

close all;
clear all;

window_size_1 = 200;
frame_rate_1 = 100;

% Speed factor (2 = twice as fast, 0.5 = half speed)
speed = 2;


Matlab Code-Load the Speech Signal

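%%%%%%%%%%%%%%%%
% Load Data File
%%%%%%%%%%%%%%%%

% Read the file name as a string ('s') so it can be typed without quotes
filename = input('Please enter the file name to be used. ', 's');

% Note: wavread was removed from later Matlab releases; audioread replaces it
[sample_data,sample_rate,nbits] = wavread(filename);

loop_time = floor(max(size(sample_data))/frame_rate_1);

% Zero-pad past the last sample so that the final window is complete
sample_data(max(size(sample_data))+1:(loop_time+1)*frame_rate_1) = 0;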


Matlab Code-Develop the Window

%%%%%%%%%%%%%%%%
% Create Windows
%%%%%%%%%%%%%%%%

% Want windows of 25ms
% File sampled at 10,000 samples/sec
% Want a window of size 10000 * 25ms(10ms)
% Note: with window_size_1 = 200 at 10,000 samples/sec, the window used here is 20 ms

triangle_30ms = triang(window_size_1);
%triangle_30ms = hamming(window_size_1);

W0 = sum(triangle_30ms);


Matlab Code-Window the Entire Speech Signal

%%%%%%%%%%%%%%%%
% Window the speech
%%%%%%%%%%%%%%%%

for i = 0:loop_time-1
    window_data(:,i+1) = sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)).*triangle_30ms;
end


Matlab Code-Perform the Fast Fourier Transform

%%%%%%%%%%%%%%%%
% Create FFT
%%%%%%%%%%%%%%%%

for i = 1:loop_time
    window_data_fft(:,i) = fft(window_data(:,i),1024);
end


Matlab Code-Recreate the Modified Signal

%%%%%%%%%%%%%%%%
% Recreate Original Signal
%%%%%%%%%%%%%%%%

%Initialize the recreated signals

reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;
real_reconstructed_signal(1:(loop_time+1)*frame_rate_1) = 0;

modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed)) = 0;

modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed)) = 0;


Matlab Code-Recreate the Modified Signal

% Perform the ifft

for i = 1:loop_time
    % Inverse FFT of the full (complex) spectrum
    recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
    % Inverse FFT of the magnitude only (phase discarded)
    real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);

    % Keep the first window_size_1 samples and scale by frame_rate_1/W0
    truncated_recreated_data_ifft(:,i) = recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
    real_truncated_recreated_data_ifft(:,i) = real_recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
end


Matlab Code-Recreate the Modified Signal

% Get back to the original signal

for i = 0:loop_time-1
    reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + truncated_recreated_data_ifft(:,i+1)';

    real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i+1)';
end


Matlab Code-Recreate the Modified Signal

% Get a modified signal by deleting certain parts (STFT)

for i = 0:(loop_time-1)/speed
    modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i*speed+1)';
end


Matlab Code-Recreate the Modified Signal

% Initialize the compressed sequence (STFTM)

modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1) = truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1)';

% Get a modified signal by compressing

for i = 0:(loop_time-2)
    modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) = modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) + real_truncated_recreated_data_ifft(:,i+2)';
end


Matlab Code-Plot Results

%%%%%%%%%%%%%%%%
% Plot Results
%%%%%%%%%%%%%%%%

figure;
subplot(211)
plot(sample_data)
title('Original Speech');
v1 = axis;
hold on;
subplot(212)
plot(real(modified_reconstructed_signal))
title(['STFT Synthesis w/ Speed = ',num2str(speed),'X']);
v2 = axis;
% Give both plots the same axes: those of the longer signal
if speed > 1
    subplot(211); axis(v1)
    subplot(212); axis(v1)
else
    subplot(211); axis(v2)
    subplot(212); axis(v2)
end


Matlab Code-Write Sound Files

%%%%%%%%%%%%%%%%
% Write sound files
%%%%%%%%%%%%%%%%

wavwrite(modified_reconstructed_signal,sample_rate,nbits,'C:\Classes\ECE_5525\tea party fairbanks 2x.wav')


Examples – Baseline Samples

- STFT sound file
- STFTM sound file
- Original file
- Sample rate 2X
- Sample rate 0.5X


Examples STFT—Speed 0.5X

[Figure: waveform plots titled "Original Speech" and "STFT Synthesis w/ Speed = 0.5X", with a link to the sound file]


Examples STFT—Speed 2X

[Figure: waveform plots titled "Original Speech" and "STFT Synthesis w/ Speed = 2X", with a link to the sound file]


Examples STFT—Speed 4X

[Figure: waveform plots titled "Original Speech" and "STFT Synthesis w/ Speed = 4X", with a link to the sound file]


Examples STFTM—Speed 0.5X

[Figure: waveform plots titled "Original Speech" and "STFTM Synthesis w/ Speed = 0.5X", with a link to the sound file]


Examples STFTM—Speed 2X

[Figure: waveform plots titled "Original Speech" and "STFTM Synthesis w/ Speed = 2X", with a link to the sound file]


Examples STFTM—Speed 4X

[Figure: waveform plots titled "Original Speech" and "STFTM Synthesis w/ Speed = 4X", with a link to the sound file]


More Results

- Change in window size
  - If the window size becomes too small, then a change in pitch will occur
  - Need window to be 2 to 3 pitch periods long
  - I generally used 20–30 ms windows
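The 20–30 ms figure follows from the "2 to 3 pitch periods" rule for a typical voice. A small worked example in Matlab, assuming a 100 Hz pitch (an illustrative value, not one stated in the slides) and the 10,000 samples/sec rate mentioned in the code comments:

```matlab
fs = 10000;                                  % samples/sec, as stated in the Matlab code comments
pitch_hz = 100;                              % assumed pitch of the speaker (Hz)

period_samples = fs/pitch_hz;                % one pitch period in samples
window_samples = [2 3]*period_samples;       % 2 to 3 pitch periods, in samples
window_ms      = 1000*window_samples/fs;     % the same range in milliseconds
```

For this assumed pitch the window should span 200 to 300 samples, which is 20 to 30 ms. A lower-pitched voice has longer periods and therefore needs a longer window.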


More Results

- Change in frame rate
  - If the frame rate decreases too much, then there will be too many samples overlapping to get an intelligible signal


More Results

- Change filter type
  - Tried Hamming: not much perceptual difference
  - Using the window energy becomes important here
  - Frame Rate/W0 is not equal to one
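The frame-rate-to-W0 ratio can be compared directly for the two windows. The sketch below is illustrative: it builds both windows from their closed-form definitions rather than calling triang and hamming, so it runs without a toolbox, and it uses the window size of 200 and frame rate of 100 from the workspace setup.

```matlab
N = 200;                                        % window size in samples
L = 100;                                        % frame rate (hop) in samples
n = (1:N)';

w_tri = 1 - abs(2*n - N - 1)/N;                 % same values as triang(N) for even N
w_ham = 0.54 - 0.46*cos(2*pi*(n-1)/(N-1));      % same values as hamming(N)

W0_tri = sum(w_tri);
W0_ham = sum(w_ham);

scale_tri = L/W0_tri;                           % gain applied after the inverse FFT
scale_ham = L/W0_ham;
```

For the triangular window W0 equals the frame rate, so the scale factor is one and can be ignored. For the Hamming window W0 is about 107.5, giving a scale factor of about 0.93, so leaving it out would make the overlap-added output slightly too loud.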


Conclusion

- Optimum area
  - Frame rate is one half of the window size
  - Window size needs to be 2 to 3 pitch periods long
- It is possible to easily change the time scale and still maintain the original pitch, although the result is not always natural sounding


Conclusion

- Further investigation
  - What to do when you want to slow down to less than half speed
  - Using the STFTM means there will be gaps between the sequences
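The gaps follow from the frame spacing M = L / speed used by the STFTM method. A short worked example, assuming the window size of 200 and frame rate of 100 from the code and an illustrative speed of 0.25:

```matlab
N = 200;                     % window size in samples
L = 100;                     % frame rate (hop) in samples
speed = 0.25;                % assumed example: slow down to one quarter speed

M   = L/speed;               % spacing between frames in the output
gap = M - N;                 % samples left uncovered between consecutive frames
speed_limit = L/N;           % frames stop touching once speed drops below this value
```

Here the frames are placed 400 samples apart but are only 200 samples long, leaving a 200-sample gap after each one. With the frame rate at half the window size, the limit is a speed of 0.5.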


Conclusion

- Further investigation
  - What to do when you want to slow down to less than half speed
  - Could replicate windowed segments


Conclusion

- Further investigation
  - Use the other methods to determine quality
    - Implement Sinusoidal Synthesis
    - Implement Linear Predictive Synthesis using linear prediction and homomorphic methods
  - Work on synchronizing pitch periods
    - Shift samples so that the peaks line up
    - Scott and Gerber: Synchronized Overlap and Add (SOLA)
      - Cross-correlation of two samples to find peak
      - Use the peaks to line up samples
    - Align the window at the same relative location within a pitch period
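The cross-correlation step can be sketched as a search for the lag at which an incoming frame best matches the signal already written. This is a hedged illustration of the general idea only, not Scott and Gerber's procedure: it assumes two synthetic 125 Hz tones offset by a known amount, and the names a, b, max_lag, overlap, and best_lag are hypothetical.

```matlab
fs = 10000;                              % assumed sample rate (samples/sec)
f0 = 125;                                % assumed pitch (Hz), i.e. an 80-sample period
n  = (0:399)';
a  = sin(2*pi*f0*n/fs);                  % stands in for the output built so far
true_shift = 23;                         % misalignment to be recovered, in samples
b  = sin(2*pi*f0*(n - true_shift)/fs);   % stands in for the next frame

max_lag = 40;                            % search range, about half a pitch period
overlap = 200;                           % number of samples compared at each lag

best_lag   = 0;
best_score = -Inf;
for lag = 0:max_lag
    seg_a = a(1:overlap);
    seg_b = b(lag + (1:overlap));
    % Normalized cross-correlation of the two segments at this lag
    score = sum(seg_a .* seg_b) / sqrt(sum(seg_a.^2) * sum(seg_b.^2));
    if score > best_score
        best_score = score;
        best_lag   = lag;
    end
end
```

The search recovers the 23-sample offset, so starting the frame at best_lag before overlap-adding would line its pitch periods up with those already in the output.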


Questions

Are there any questions?


References

Quatieri, Thomas F. Discrete-Time Speech Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2002.

Rabiner, L.R. and Schafer, R.W. Digital Processing of Speech Signals. Prentice Hall, Upper Saddle River, NJ, 1978.

Oppenheim, A.V. and Schafer, R.W. Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1975.

Scott, R. and Gerber, S. “Pitch Synchronous Time-Compression of Speech,” Proc. Conf. Speech Communications Processing, pp. 63-85, April 1972.


Fairbanks, G., Everitt, W.L., and Jaeger, R.P. “Method for Time or Frequency Compression-Expansion of Speech,” IEEE Transactions on Audio and Electroacoustics, vol. AU-2, pp. 7-12, Jan. 1954.