Sinusoidal Synthesis of Speech Using MATLAB
8/13/2019 Sinusoidal Synthesis of Speech Using MATLAB
SINUSOIDAL SYNTHESIS OF SPEECH USING MATLAB
Thesis
Submitted in partial fulfillment of the requirement of
BITS C421T Thesis
BY
AKSHAY VIJAY JAIN
2009B4A8568P
Under the supervision of
Dr. RAHUL SINGHAL
Assistant Professor, EEE Dept.
BITS-Pilani
AT
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
November, 2013
ACKNOWLEDGEMENT
I would first like to thank the Almighty for his blessings.
I am obliged to Prof. B.N. Jain, Vice Chancellor, Birla Institute of Technology & Science, Pilani, for providing us with a course pattern in which a student gets exposure to projects.
I wish to express my deep sense of gratitude to Dr. Rahul Singhal, my supervisor for the thesis named Sinusoidal Synthesis of Speech Using MATLAB, for giving me this wonderful opportunity to learn about the various parameters associated with speech and about the synthesis of speech from a spectrogram. I would also like to thank him for his constant advice, encouragement and support during the study.
I also wish to express my gratitude to all the other people, as well as all the websites, that provided content for this research work.
Last but not least, I would like to thank my parents for their constant support and motivation.
CERTIFICATE
This is to certify that the Thesis entitled Sinusoidal Synthesis of Speech using MATLAB, submitted by Akshay Vijay Jain, ID No. 2009B4A8568P, in partial fulfillment of the requirement of the BITS C421T Thesis, embodies the work done by him under my supervision.
Signature of Supervisor
Date: 25 November 2013
Dr. Rahul Singhal
Assistant Professor, EEE Department,
BITS PILANI, PILANI CAMPUS
Thesis Abstract
This thesis report discusses the speech signal: how it is stored on a computer, how it is analyzed and how it is synthesized. One way of analyzing a speech signal is the Short Time Fourier Transform, which is discussed in the report along with its parameters. Based on this analysis, we extract a matrix containing the frequencies present in the signal as a function of time. Having obtained this matrix from the spectrogram generated by MATLAB, we then resynthesize the speech signal by sinusoidal addition using MATLAB code.
TABLE OF CONTENTS
1) Introduction
2) Recording of speech signal
3) Analysis of speech signal
a) Long term frequency analysis
b) Window sequence
c) Effect of window
d) Choice of window
e) Parameters of Short Term Frequency Spectrum
f) Time-Frequency domain: spectrogram
g) Length of window and fundamental frequency
4) Why sinusoids?
5) Additive synthesis
6) Frequency Vs Time matrix from spectrogram in MATLAB
a) GenerateFreqVsTime MATLAB Code
b) Croplimits MATLAB Code
c) Screenshots
7) Speech signal from Frequency Vs Time matrix in MATLAB
a) GenerateSoundData MATLAB Code
b) TestAtLevel MATLAB Code
8) Results
9) Conclusion
10) Bibliography/References
1) Introduction
Speech is an acoustic signal, that is, a mechanical wave: an oscillation of pressure transmitted through a solid, liquid or gas and composed of frequencies within the hearing range. Sound is a sequence of waves of pressure that propagates through compressible media such as air or water. (Sound can propagate through solids as well, but there are additional modes of propagation.) Sound that is perceptible by humans has frequencies from about 20 Hz to 20,000 Hz. In air at standard temperature and pressure, the corresponding wavelengths of sound waves range from 17 m to 17 mm. During propagation, waves can be reflected, refracted, or attenuated by the medium.
Figure 1. Typical sound signal
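The wavelength figures quoted above follow from dividing the speed of sound by the frequency. The short sketch below reproduces that arithmetic; the value of 343 m/s for the speed of sound in air is an assumption (roughly its value at room temperature) and is not stated in the thesis.

```matlab
% Wavelength = speed of sound / frequency
c = 343;                     % speed of sound in air in m/s (assumed value)
f = [20, 20000];             % approximate limits of human hearing in Hz
lambda = c ./ f;             % wavelengths in metres
fprintf('%6d Hz -> %.4g m\n', [f; lambda]);
```

With this assumed speed the two limits come out at about 17 m and 17 mm, matching the range given in the text.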
2) Recording of Speech
Sound recording is an electrical or mechanical inscription of sound waves, such as spoken voice, singing, instrumental music, or sound effects. The two main classes of sound recording technology are analog recording and digital recording. Acoustic analog recording is achieved by a small microphone diaphragm that detects changes in atmospheric pressure (acoustic sound waves) and records them as a graphic representation of the sound waves on a medium such as a phonograph (in which a stylus senses grooves on a record). In magnetic tape recording, the sound waves vibrate the microphone diaphragm and are converted into a varying electric current, which is then converted to a varying magnetic field by an electromagnet, which makes a representation of the sound as magnetized areas on a plastic tape with a magnetic coating on it.
Digital recording converts the analog sound signal picked up by the microphone to a digital form by a process of digitization, allowing it to be stored and transmitted by a wider variety of media. Digital recording stores audio as a series of binary numbers representing samples of the amplitude of the audio signal at equal time intervals, at a sample rate high enough to convey all sounds capable of being heard. Digital recordings are considered higher quality than analog recordings not necessarily because they have higher fidelity (wider frequency response or dynamic range), but because the digital format can prevent much of the loss of quality found in analog recording due to noise and electromagnetic interference in playback, and mechanical deterioration or damage to the storage medium. A digital audio signal must be reconverted to analog form during playback before it is applied to a loudspeaker or earphones.
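The digitization step described above (sampling at equal time intervals, then storing each sample as a binary number) can be sketched as follows. This is only an illustration and not part of the thesis code: the 440 Hz test tone, the variable names and the simple uniform quantizer are assumptions, while the 8000 Hz rate and 8-bit depth mirror the audiorecorder(8000,8,1) call used later in the thesis.

```matlab
% Sample a test tone at equal time intervals and quantize it to 8 bits.
fs    = 8000;                 % sample rate in Hz
nBits = 8;                    % bits per sample
f0    = 440;                  % test tone frequency in Hz (assumed)
t     = (0:fs-1) / fs;        % one second of sample instants
x     = sin(2*pi*f0*t);       % stand-in for the analog amplitude, in [-1, 1]

% Uniform quantizer: map [-1, 1] onto the integer codes 0 .. 2^nBits - 1.
nLevels = 2^nBits;
codes   = round((x + 1) / 2 * (nLevels - 1));

% Playback side: map the stored codes back to amplitudes.
xq = codes / (nLevels - 1) * 2 - 1;

maxErr = max(abs(x - xq));    % bounded by half a quantization step
fprintf('levels: %d, max error: %.5f\n', nLevels, maxErr);
```

The reconstruction error never exceeds half of one quantization step, which is why adding bits per sample reduces quantization noise.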
3) Analysis of Speech Signal
The long-term frequency analysis of speech signals yields good information about the overall frequency spectrum of the signal, but no information about the temporal location of those frequencies. Since speech is a very dynamic signal with a time-varying spectrum, it is often insightful to look at frequency spectra of short sections of the speech signal.
a) Long-term frequency analysis
The frequency response of a system is defined as the discrete-time Fourier transform (DTFT) of the system's impulse response h[n]:
$$ H(\omega) = \sum_{n=-\infty}^{\infty} h[n]\, e^{-j\omega n} $$
Similarly, for a sequence x[n], its long-term frequency spectrum is defined as the DTFT of the sequence:
$$ X(\omega) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} $$
Theoretically, we must know the sequence x[n] for all values of n (from n = -∞ until n = ∞) in order to compute its frequency spectrum. Fortunately, all terms where x[n] = 0 do not matter in the sum, and therefore an equivalent expression for the sequence's spectrum is
$$ X(\omega) = \sum_{n=0}^{N-1} x[n]\, e^{-j\omega n} $$
Here we've assumed that the sequence starts at 0 and is N samples long. This tells us that we can apply the DTFT only to the non-zero samples of x[n], and still obtain the sequence's true spectrum X(ω). But what is the correct mathematical expression to compute the spectrum over a short section of the sequence, that is, over only part of the non-zero samples of the sequence?
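As a small numerical illustration of the finite sum above, the sketch below evaluates the DTFT of an N-sample sequence directly at N equally spaced frequencies and compares the result with MATLAB's fft, which computes the same samples. The example sequence and the variable names are assumptions chosen only for this demonstration.

```matlab
% Evaluate X(w) = sum_{n=0}^{N-1} x[n] exp(-j w n) directly and compare to fft.
N = 64;
n = 0:N-1;
x = cos(2*pi*5*n/N) + 0.5*sin(2*pi*12*n/N);   % example sequence (assumed)

w = 2*pi*(0:N-1)/N;                    % N equally spaced frequencies in [0, 2*pi)
X = zeros(1, N);
for k = 1:N
    X(k) = sum(x .* exp(-1j*w(k)*n));  % finite DTFT sum at frequency w(k)
end

Xfft = fft(x);                         % the same spectrum samples via the FFT
maxDiff = max(abs(X - Xfft));
fprintf('max |DTFT - FFT| = %.2e\n', maxDiff);
```

The difference is at the level of floating-point rounding, and the largest magnitude appears at the fifth frequency bin, where the stronger cosine component lies.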
b) Window sequence
It turns out that the mathematically correct way to do that is to multiply the sequence x[n] by a window sequence w[n] that is non-zero only for n = 0 ... L-1, where L, the length of the window, is smaller than the length N of the sequence x[n]:
$$ x_w[n] = x[n]\, w[n] $$
Then we compute the spectrum of the windowed sequence xw[n] as usual:
$$ X_w(\omega) = \sum_{n=-\infty}^{\infty} x_w[n]\, e^{-j\omega n} $$
The following figure illustrates how a window sequence w[n] is applied to the sequence x[n]:
Figure 2 Result of application of windowed sequence to data sequence
As the figure shows, the windowed sequence is shorter in length than the original sequence. So we can further truncate the DTFT of the windowed sequence:
$$ X_w(\omega) = \sum_{n=0}^{L-1} x_w[n]\, e^{-j\omega n} $$
Using this windowing technique, we can select any section of arbitrary length of the input sequence x[n] by choosing the length and location of the window accordingly. The only question that remains is: how does the window sequence w[n] affect the short-term frequency spectrum?
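The windowing operation described above can be sketched in MATLAB as follows. The test signal (200 Hz in the first half, 800 Hz in the second), the 400-sample rectangular window and all variable names are assumptions made for illustration; the point is only that moving the window selects a different section and therefore yields a different short-term spectrum.

```matlab
% Select L samples of x[n] with a window and compute the short-term spectrum.
fs = 8000;
N  = 4000;
n  = 0:N-1;
% Signal whose frequency changes halfway: 200 Hz, then 800 Hz
x  = [sin(2*pi*200*n(1:N/2)/fs), sin(2*pi*800*n(N/2+1:end)/fs)];

L  = 400;                               % window length in samples (50 ms)
w  = ones(1, L);                        % rectangular window

n0 = [1, N/2 + 1];                      % start of each analysed section (1-based)
peakHz = zeros(1, 2);
for s = 1:2
    xw = x(n0(s) : n0(s)+L-1) .* w;     % windowed section: x_w[n] = x[n] w[n]
    Xw = abs(fft(xw));                  % L-point short-term magnitude spectrum
    [~, k] = max(Xw(1:L/2));
    peakHz(s) = (k-1) * fs / L;         % convert bin index to Hz
end
disp(peakHz);
```

The long-term spectrum of x would show both frequencies at once; the two short-term spectra show which frequency is present in which section.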
c) Effect of the window
To answer that question, we need to introduce an important property of the Fourier transform. The diagram below illustrates the property graphically:
I. Implementation of an LTI system in the time domain.
II. Equivalent implementation of an LTI system in the frequency domain.
The two implementations of an LTI system are equivalent: they will give the same output for the same input. Hence, convolution in the time domain = multiplication in the frequency domain:
$$ y[n] = x[n] * h[n] \;\Longleftrightarrow\; Y(\omega) = X(\omega)\, H(\omega) $$
And since the time domain and the frequency domain are each other's dual in the Fourier transform, it is also true that multiplication in the time domain = convolution in the frequency domain:
$$ x_w[n] = x[n]\, w[n] \;\Longleftrightarrow\; X_w(\omega) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\theta)\, W(\omega - \theta)\, d\theta $$
This shows that multiplying the sequence x[n] by the window sequence w[n] in the time domain is equivalent to convolving the spectrum of the sequence, X(ω), with the spectrum of the window, W(ω). The result of the convolution of the spectra in the frequency domain is that the spectrum of the sequence is smeared by the spectrum of the window. This is best illustrated by the example in the figure below:
Figure 3 Result of application of window sequence in time and
frequency domain
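The duality between multiplication in time and convolution in frequency can also be checked numerically. The sketch below uses the DFT, the sampled counterpart of the DTFT, for which the convolution becomes circular and the scale factor becomes 1/N. The example sequence, the explicitly written Hamming window and the variable names are assumptions for illustration.

```matlab
% Check: fft(x .* w) equals (1/N) * circular convolution of fft(x) and fft(w).
N = 32;
n = 0:N-1;
x = cos(2*pi*3*n/N);                 % example sequence (assumed)
w = 0.54 - 0.46*cos(2*pi*n/(N-1));   % Hamming window written out explicitly

X = fft(x);
W = fft(w);

% Circular convolution of the two spectra, scaled by 1/N
Y = zeros(1, N);
for k = 0:N-1
    for m = 0:N-1
        Y(k+1) = Y(k+1) + X(m+1) * W(mod(k-m, N) + 1);
    end
end
Y = Y / N;

Xw = fft(x .* w);                    % spectrum of the windowed sequence
maxDiff = max(abs(Xw - Y));
fprintf('max difference = %.2e\n', maxDiff);
```

The two results agree to rounding error, which is the discrete form of the statement that windowing smears the spectrum of the sequence with the spectrum of the window.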
d) Choice of window
Because the window determines the spectrum of the windowed sequence to a great extent, the choice of the window is important. MATLAB supports a number of common windows, each with its own strengths and weaknesses. Some common choices of windows are shown below.
Figure 4 Rectangular window sequence
Figure 5 Triangular and Hamming window sequence
All windows share the same characteristics. Their spectrum has a peak, called the main lobe, and ripples to the left and right of the main lobe, called the side lobes. The width of the main lobe and the relative height of the side lobes are different for each window. The main lobe width determines how accurately a window is able to resolve different frequencies: wider is less accurate. The side lobe height determines how much spectral leakage the window has. An important thing to realize is that we can't have short-term frequency analysis without a window. Even if we don't explicitly use a window, we are implicitly using a rectangular window.
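The main lobe and side lobe properties can be measured directly from a window's spectrum. The following sketch compares a rectangular and a Hamming window of the same length; the window length, the zero-padded FFT size and the way the lobes are located (walking down from the peak to the first local minimum) are assumptions of this illustration, and the Hamming window is written out from its formula so that no toolbox function is needed.

```matlab
% Compare main lobe width and highest side lobe of two windows.
L    = 64;                                   % window length (assumed)
n    = 0:L-1;
rect = ones(1, L);
hamm = 0.54 - 0.46*cos(2*pi*n/(L-1));        % Hamming window formula

nfft = 4096;                                 % dense frequency grid via zero padding
wins = {rect, hamm};
mainLobeBins = zeros(1, 2);
sideLobeDb   = zeros(1, 2);
for i = 1:2
    S = abs(fft(wins{i}, nfft));
    S = S(1:nfft/2) / S(1);                  % normalize so the peak at DC is 1
    % The main lobe ends at the first local minimum of the magnitude.
    k = 2;
    while S(k+1) < S(k)
        k = k + 1;
    end
    mainLobeBins(i) = k - 1;                 % half-width of the main lobe in bins
    sideLobeDb(i)   = 20*log10(max(S(k:end)));   % highest side lobe in dB
end
fprintf('rectangular: %d bins, %.1f dB\n', mainLobeBins(1), sideLobeDb(1));
fprintf('Hamming:     %d bins, %.1f dB\n', mainLobeBins(2), sideLobeDb(2));
```

The rectangular window has the narrower main lobe but side lobes only about 13 dB below the peak, while the Hamming window trades a wider main lobe for much lower side lobes, that is, less spectral leakage.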
e) Parameters of the short-term frequency spectrum
Besides the type of window (rectangular, Hamming, etc.), there are two other factors in MATLAB that control the short-term frequency spectrum: window length and the number of frequency sample points. The window length controls the fundamental trade-off between time resolution and frequency resolution of the short-term spectrum, irrespective of the window's shape. A long window gives poor time resolution, but good frequency resolution. Conversely, a short window gives good time resolution, but poor frequency resolution. For example, a 250 millisecond long window can, roughly speaking, resolve frequency components when they are 4 Hz or more apart (1/0.250 = 4), but it can't tell where in those 250 milliseconds those frequency components occurred. On the other hand, a 10 millisecond window can only resolve frequency components when they are 100 Hz or more apart (1/0.010 = 100), but the uncertainty in time about the location of those frequencies is only 10 milliseconds. The result of short-term spectral analysis using a long window is referred to as a narrowband spectrum (because a long window has a narrow main lobe), and the result of short-term spectral analysis using a short window is called a wideband spectrum. In short-term spectral analysis of speech, the window length is often chosen with respect to the fundamental period of the speech signal, i.e., the duration of one period of the fundamental frequency. A common choice for the window length is either less than 1 times the fundamental period, or greater than 2-3 times the fundamental period.
Examples of narrowband and wideband short-term spectral analysis of speech are given in the figures below:
Figure 6 Wideband and Narrowband analysis of speech
The other factor controlling the short-term spectrum in MATLAB is the number of points at which the frequency spectrum H(ω) is evaluated. The number of points is usually equal to the length of the window. Sometimes a greater number of points is chosen to obtain a smoother looking spectrum. Evaluating H(ω) at fewer points than the window length is possible, but very rare.
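The rule of thumb used in this section (frequency resolution is roughly the reciprocal of the window duration) is easy to tabulate. The sketch below reproduces the two examples from the text and also converts each window duration into a length in samples; the 8000 Hz sample rate is taken from the recording settings used later in the thesis, and the variable names are illustrative.

```matlab
% Window duration versus approximate frequency resolution.
fs            = 8000;                       % sample rate in Hz
windowMs      = [250, 10];                  % window durations from the text
windowSec     = windowMs / 1000;
resolutionHz  = 1 ./ windowSec;             % approx. smallest resolvable spacing
windowSamples = round(windowSec * fs);      % window length in samples

for i = 1:numel(windowMs)
    fprintf('%4d ms window: %5d samples, about %g Hz resolution\n', ...
            windowMs(i), windowSamples(i), resolutionHz(i));
end
```

At 8000 Hz the 250 ms window spans 2000 samples and the 10 ms window spans 80 samples.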
f) Time-frequency domain: Spectrogram
An important use of short-term spectral analysis is the short-time Fourier transform, or spectrogram, of a signal. The spectrogram of a sequence is constructed by computing the short-term spectrum of a windowed version of the sequence, then shifting the window over to a new location and repeating this process until the entire sequence has been analyzed. The whole process is illustrated in the figure below:
Figure 7 Demonstration of making of spectrogram
Together, these short-term spectra (bottom row) make up the spectrogram, and are typically shown in a two-dimensional plot, where the horizontal axis is time, the vertical axis is frequency, and magnitude is the color or intensity of the plot. For example:
Figure 8 A typical spectrogram
The appearance of the spectrogram is controlled by a third parameter: window overlap. Window overlap determines how much the window is shifted between repeated computations of the short-term spectrum. Common choices for window overlap are 50% or 75% of the window length. For example, if the window length is 200 samples and window overlap is 50%, the window would be shifted over 100 samples between each short-term spectrum. In the case that the overlap was 75%, the window would be shifted over 50 samples. The choice of window overlap depends on the application. When a temporally smooth spectrogram is desirable, window overlap should be 75% or more. When computation should be at a minimum, no overlap or 50% overlap are good choices. If computation is not an issue, you could even compute a new short-term spectrum for every sample of the sequence. In that case, window overlap = window length - 1, and the window would only shift 1 sample between the spectra. But doing so is wasteful when analyzing speech signals, because the spectrum of speech does not change at such a high rate. It is more practical to compute a new spectrum every 20-50 milliseconds, since that is the rate at which the speech spectrum changes.
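The construction of a spectrogram from shifted windows can be sketched in a few lines of base MATLAB. This is a simplified illustration, not the spectrogram function used in the thesis: the test tone, the explicitly written Hamming window and the variable names are assumptions, while the 200-sample window with 50% overlap follows the example in the text.

```matlab
% Minimal short-time Fourier transform with a sliding window.
fs = 8000;
x  = sin(2*pi*440*(0:fs-1)/fs);             % 1 s test tone (assumed)
winLen  = 200;                              % window length in samples
overlap = 0.5;                              % 50% overlap
hop     = winLen - round(overlap*winLen);   % shift between frames: 100 samples
nFrames = floor((numel(x) - winLen)/hop) + 1;

n = 0:winLen-1;
w = 0.54 - 0.46*cos(2*pi*n/(winLen-1));     % Hamming window

S = zeros(winLen/2 + 1, nFrames);           % rows: frequency, columns: time
for m = 1:nFrames
    first = (m-1)*hop + 1;                  % first sample of frame m
    frame = x(first:first+winLen-1) .* w;   % windowed section
    F = fft(frame);
    S(:, m) = abs(F(1:winLen/2 + 1)).';     % keep non-negative frequencies
end
fprintf('%d frames, hop of %d samples\n', nFrames, hop);
```

Setting overlap to 0.75 reduces the shift to 50 samples and roughly doubles the number of columns, which gives a temporally smoother picture at twice the computation.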
g) Length of the window and fundamental frequency
In a wideband spectrogram (i.e., using a window shorter than the fundamental period), the fundamental frequency of the speech signal resolves in time. That means that you can't really tell what the fundamental frequency is by looking at the frequency axis, but you can see energy fluctuations at the rate of the fundamental frequency along the time axis. In a narrowband spectrogram (i.e., using a window 2-3 times the fundamental period), the fundamental frequency resolves in frequency, i.e., you can see it as an energy peak along the frequency axis. See for example the figures below:
Figure 9. Wideband Speech Spectrogram
Figure 10. Narrowband Speech Spectrogram
4) Why Sinusoids?
In general, the goal of modelling a signal is to reduce redundancy and to obtain a more compact representation of the data. There are different techniques for modelling a time series, and which technique to apply depends on the signal. Sinusoids are especially suited for modelling speech with harmonic content. Most natural acoustical sounds exhibit this attribute, and the reason for this sinusoidal character can be found in the way speech is produced. The human voice production system consists of two fundamental parts working together, namely the vocal cords (the excitation source) and the pharynx with the mouth and nasal cavities, acting as an acoustical filter. During voiced parts of speech the vocal cords open and close at a certain frequency (the fundamental frequency, f0), modulating the airstream coming from the lungs. The harmonic overtone structure results from the structure of the pharynx, which in a simplified view can be seen as an open tube that lets all overtones develop, with the overtones f1 ... fn being integer multiples of the fundamental f0.
5) Additive Synthesis
Sine waves can be considered the building blocks of speech. In fact, it was shown in the 19th century by the mathematician Joseph Fourier that any periodic function can be expressed as a series of sinusoids of varying frequencies and amplitudes. This concept of constructing a complex speech signal out of sinusoidal terms is the basis for additive synthesis, sometimes called Fourier synthesis for the aforementioned reason. In addition, the concepts of additive synthesis have existed since the introduction of the organ, where different pipes of varying pitch are combined to create a sound or timbre.
A simple block diagram of the additive form may appear like
Figure 11. Block Diagram representation of Sinusoidal Synthesis
Its mathematical form based on Fourier series will be
$$ y(t) = a_0 + \sum_{k=1}^{N} a_k \sin(b_k t) $$
where a_0 is an offset value for the whole function (typically 0),
a_k = the amplitude weightings for each sine term,
b_k = the frequency multiplier value.
With hundreds of terms, each with its own individual frequency and amplitude weighting, we can design and specify some incredibly complex sounds, especially if we can modulate the parameters over time.
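A minimal additive synthesis example is sketched below: a handful of sine terms, each with its own amplitude weighting and frequency, are summed into one waveform. The 120 Hz fundamental, the four amplitude weights, the choice of integer-multiple (harmonic) frequencies and the variable names are all assumptions made for illustration, not values from the thesis.

```matlab
% Additive synthesis: sum of weighted sine terms at harmonic frequencies.
fs   = 8000;                              % sample rate in Hz
t    = (0:fs/2-1) / fs;                   % 0.5 s of sample instants
f0   = 120;                               % fundamental frequency in Hz (assumed)
amps = [1.0, 0.5, 0.33, 0.25];            % amplitude weight of each term (assumed)

y = zeros(size(t));                       % offset term taken as 0
for k = 1:numel(amps)
    y = y + amps(k) * sin(2*pi*k*f0*t);   % k-th term at k times the fundamental
end
y = y / max(abs(y));                      % normalize to the range [-1, 1]

% sound(y, fs);                           % uncomment to listen to the result
```

Changing the entries of amps changes the timbre while the pitch stays at f0, and making the weights vary from frame to frame is the step that leads to speech-like sounds.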
6) Frequency Vs Time Matrix from Spectrogram in MATLAB
The frequency content present in speech at a particular instant of time can be determined approximately with the Short Term Fourier Transform (STFT). For our thesis work we use the narrowband spectrogram produced by MATLAB. We choose narrowband because it gives better frequency resolution and acceptable time resolution. We also tried the wideband spectrogram, but the speech synthesized using information from it was very noisy.
First of all, we take the spectrogram of the speech signal with the help of the MATLAB command spectrogram. The plot produced by the spectrogram command is saved as an RGB image in decibel scale, in which the intensities above 0 dB are expressed in varying shades of red. We therefore separate out the Red component of the RGB image. In the separated component we can easily identify the frequencies which had higher intensities in the speech, since the pixels corresponding to high-intensity frequencies appear white, the others appear black, and the intermediate ones appear in shades of gray. The Red component is then appropriately cropped and resized so that the number of rows is 400, implying that every row covers a 10 Hz range, and the number of columns is one hundred times the duration of the speech signal in seconds, implying that each column of the image corresponds to 10 milliseconds of speech.
It has been found that when we convert the resized image into black and white, by turning gray pixels nearer to white into white and gray pixels nearer to black into black, the quality of the synthesized speech is very near to the original speech. So we produce the black and white image, which corresponds to the Frequency Vs Time graph of the speech signal.
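The thresholding step and the meaning of the rows and columns can be illustrated on a small synthetic image. In the sketch below the image contents, the threshold of 0.9 and the variable names are assumptions; the comparison against level mimics what a black-and-white conversion does for an 8-bit image, and the mapping of 10 Hz per row and 10 ms per column follows the description above (with row 1 taken as the lowest frequency).

```matlab
% Threshold a grayscale image and read it as a Frequency Vs Time matrix.
nRows = 400;                           % 10 Hz per row
nCols = 300;                           % 10 ms per column, i.e. 3 s of speech
img = zeros(nRows, nCols, 'uint8');    % hypothetical Red component
img(50, 1:100)    = 250;               % strong component in row 50
img(120, 101:300) = 240;               % strong component in row 120
img(200, :)       = 100;               % weak component, should be discarded

level = 0.9;                           % threshold in [0, 1]
bw = double(img) / 255 > level;        % logical matrix: 1 = frequency present

[r, c]  = find(bw);
freqHz  = r * 10;                      % row index -> frequency in Hz
timeSec = (c - 1) * 0.01;              % column index -> start of 10 ms frame

fprintf('active cells: %d, frequencies: %s Hz\n', ...
        nnz(bw), mat2str(unique(freqHz).'));
```

Only the two strong components survive the threshold, at 500 Hz for the first second and at 1200 Hz for the remaining two seconds.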
a) The MATLAB code for performing the above task is as follows
% Script GenerateFreqVsTime
% Record speech, plot its spectrogram, and turn the saved plot into a
% black and white Frequency Vs Time matrix.
f = input('Enter the time in seconds for which you want to record: ');
recObj = audiorecorder(8000, 8, 1);        % 8 kHz, 8 bits, mono
disp('Start speaking.');
recordblocking(recObj, f);
disp('End of Recording.');
% Play back the recording.
play(recObj);
% Store data in double-precision array.
myRecording = getaudiodata(recObj);
figure(1)
plot(myRecording); title('sound');
% Plot the spectrogram: 1000-sample window, 923-sample overlap, 1024-point FFT
figure(2)
spectrogram(myRecording, 1000, 923, 1024, 8E3, 'yaxis');
h = gcf;
set(h, 'Position', get(0, 'Screensize'));  % Maximize figure.
level = input('Please enter level between 0 and 1: ');
% Save the plot and read it back as an RGB image.
saveas(h, 'spectrogram1.jpg');
fig = imread('spectrogram1.jpg');
figGray = rgb2gray(fig);
figure(9)
imshow(figGray); title('FigGray');
% Keep only the Red component of the image.
figRed = fig(:, :, 1);
figure(3)
imshow(figRed);
title('figRed');
% Crop away axes, labels and margins so that only the plot area remains.
[xmin, ymin, width, height] = croplimits(figRed);
figure(4)
figRedCropped = imcrop(figRed, [xmin ymin width height]);
imshow(figRedCropped); title('figRed Cropped');
% Resize to 400 rows (10 Hz each) and 100 columns per second (10 ms each).
figure(5)
figRedCroppedResized = imresize(figRedCropped, [400 100*f]);
imshow(figRedCroppedResized); title('figRedCroppedResized');
% Flip vertically so that row 1 is the lowest frequency.
figRedCroppedResizedCorrected = flipud(figRedCroppedResized);
% Convert to black and white using the chosen threshold.
figure(6)
figRedCroppedResizedBW = im2bw(figRedCroppedResized, level);
imshow(figRedCroppedResizedBW); title('figRedCroppedResizedBW');
figure(7)
figRedCroppedResizedBWCorrected = flipud(figRedCroppedResizedBW);
imshow(figRedCroppedResizedBWCorrected);
b) MATLAB code for the croplimits function used in the above code is as follows
function [xmin, ymin, width, height] = croplimits(img)
% Find the crop rectangle of the plot area in the saved spectrogram image.
% A pixel value of 255 (white) is treated as figure background.
xmin = 0; r2 = 0; ymin = 0; c2 = 0;
[row, column] = size(img);
midRow = floor(row/2);          % integer indices for the image centre
midCol = floor(column/2);
% Top edge: first non-white pixel in the middle column, rows 30 to 90
for i = 30:90
    if img(i, midCol) ~= 255
        ymin = i + 5;
        break
    end
end
% Bottom edge: scan upwards over the last 120 rows
count = 0;
for ki = row:-1:row-120
    if img(ki, midCol) ~= 255
        for kj = midCol:midCol+50
            if img(ki, kj) ~= 255
                count = count + 1;
            else
                count = count - 1;
            end
            if count > 0
                r2 = ki;
                break           % leaves only the inner loop over kj
            end
        end
    end
end
% Left edge: scan columns 80 to 180
count = 0;
for j = 80:180
    if img(floor(row/3), j) ~= 255
        for i = midRow:midRow+40
            if img(i, j) ~= 255
                count = count + 1;
            else
                count = count - 1;
            end
        end
        if count > 24
            xmin = j + 8; break;
        end
    end
end
% Right edge: scan leftwards over the last 120 columns
count = 0;
for j = column:-1:column-120
    if img(midRow, j) ~= 255
        for i = midRow:midRow+100
            if img(i, j) ~= 255
                count = count + 1;
            else
                count = count - 1;
            end
        end
        if count > 0
            c2 = j; break;
        end
    end
end
height = r2 - ymin + 1;
width  = c2 - xmin + 1;
end
c) Screenshots
i. Speech Waveform
Figure 12 Speech Waveform
ii. Spectrogram of above speech using Matlab
Figure 13 Spectrogram of above speech using Matlab
iii. Grayscale Spectrogram
Figure 14 Grayscale Spectrogram
iv. Image of Red component of spectrogram, since the red component represents positive magnitude
Figure 15 Red component of spectrogram
v. Same figure after being cropped by the MATLAB function croplimits
Figure 16. Same figure after being cropped by the MATLAB function croplimits
vi. Above figure resized by MATLAB function so that each column of pixels corresponds to 10 milliseconds
Figure 17 Resized using Matlab
vii. Above figure inverted so that the first row corresponds to the 10 Hz frequency, the next row to 20 Hz, and the last (400th) row to 4 kHz
Figure 18 Same figure as previous but inverted
viii. Same figure as above with pixels having intensity less than 0.9 reduced to zero while others are set to 1
Figure 19 Same figure as above with pixels having intensity less than 0.9 reduced to zero while others are set to 1
7) Speech signal from Frequency Vs Time Matrix in MATLAB
Once we have the Frequency Vs Time matrix, we can generate all the frequencies marked in a column using the sin function of MATLAB and add them together. We do this for every column, each of which corresponds to 10 milliseconds. We then concatenate the data generated for each column, and the result is the speech signal.
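The MATLAB code for performing the above series of tasks is as follows
a) GenerateSoundData MATLAB Code:
function sounddata = GenerateSoundData(img)
% Synthesize speech from a Frequency Vs Time matrix.
% Row j stands for the frequency 10*j Hz, each column for 10 ms of signal.
[row, column] = size(img);
img = double(img);                 % 0/1 mask from im2bw, used as amplitude weights
timeResolution = .01;              % 10 milliseconds per column
samplingRate = 8000;               % 8000 Hz
samplesPerFrame = round(timeResolution*samplingRate);   % 80 samples
time = (1:samplesPerFrame)/samplingRate;
sounddata = zeros(1, samplesPerFrame*column);
for i = 1:column
    y = zeros(1, samplesPerFrame);
    % Add one sinusoid per row, for rows 10 to row-100 (100 Hz to 3000 Hz)
    for j = 10:row-100
        y = y + sqrt(img(j,i))*sin(2*pi*time*j*10);
    end
    sounddata(samplesPerFrame*(i-1)+1 : samplesPerFrame*i) = y;
end
sounddata = sounddata';
end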
In this code we generate only frequencies in the range 100 Hz to 3000 Hz, because frequencies outside this range have much less effect on how the speech is heard.
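One property of frame-by-frame synthesis is worth checking: if the time vector restarts in every 10 ms frame, a sinusoid whose frequency is not a multiple of 100 Hz does not complete a whole number of cycles per frame, so its phase jumps at every frame boundary. The sketch below, which is not part of the thesis code, compares that approach with a phase-continuous variant that uses one running time vector. The choice of row 44 (440 Hz), the signal length and the variable names are assumptions for illustration.

```matlab
% Compare per-frame time restart with a phase-continuous time vector.
fs       = 8000;
frameLen = 80;                       % 10 ms per column at 8 kHz
nCols    = 50;                       % 0.5 s of signal
row      = 44;                       % single active row (assumed)
fHz      = 10 * row;                 % 440 Hz at 10 Hz per row

% Variant 1: the time vector restarts in every frame
tFrame   = (1:frameLen) / fs;
yRestart = repmat(sin(2*pi*fHz*tFrame), 1, nCols);

% Variant 2: one running time vector across all frames (phase-continuous)
tAll        = (1:frameLen*nCols) / fs;
yContinuous = sin(2*pi*fHz*tAll);

% Dominant frequency of each variant
N  = frameLen * nCols;
Y1 = abs(fft(yRestart));
Y2 = abs(fft(yContinuous));
[~, k1] = max(Y1(1:N/2));
[~, k2] = max(Y2(1:N/2));
peakRestartHz    = (k1 - 1) * fs / N;
peakContinuousHz = (k2 - 1) * fs / N;
fprintf('restart: %g Hz, continuous: %g Hz\n', peakRestartHz, peakContinuousHz);
```

With the restart variant every frame is identical, so the output repeats every 10 ms and its energy falls on multiples of 100 Hz, with the strongest line at 400 Hz rather than 440 Hz; the running time vector keeps the peak at 440 Hz. Whether this matters audibly for real speech matrices has not been evaluated here.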
b) TestAtLevel MATLAB Code:
function sdata = TestAtLevel(spectrograph, level)
% Threshold the image at the given level, synthesize speech and play it.
bwspectrograph = im2bw(spectrograph, level);
sdata = GenerateSoundData(bwspectrograph);
soundsc(sdata, 8000);
end
To the above function, TestAtLevel, we pass the matrix named figRedCroppedResizedCorrected obtained from GenerateFreqVsTime, along with the level, which specifies the threshold: values below level are converted to 0 and values greater than level are converted to 1.
8) Results
The speech waveforms generated with different values of level for the conversion of the Red component of the spectrogram into a black and white image are shown below, along with their spectrograms.
a) Level = 0.8
b) Level = 0.9
c) Level = 0.95
9) Conclusion
From the above three speech waveforms, it seems that a level of around 0.9 is the best threshold for the Red component of the spectrogram generated by MATLAB, in the sense that the speech generated by the MATLAB function GenerateSoundData then matches the original speech most closely.
The sinusoidal model, a framework for modelling speech and music signals, has been presented. Sinusoidal synthesis of speech by extracting frequency and time information from the spectrogram gives an acceptable quality of speech. Another strategy would be to decompose the signal into deterministic and stochastic parts and to use different models for the different portions of the speech, as proposed by [5].
10) Bibliography/References
[1] R. McAulay, T. Quatieri: Speech Analysis/Synthesis Based on a Sinusoidal Representation, in IEEE Transactions on Acoustics, Speech, and Signal Processing, August 1986
[2] J. Smith III, X. Serra: PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation
[3] K. Fitz, L. Haken: On the Use of Time-Frequency Reassignment in Additive Sound Modelling
[4] M. Lagrange, S. Marchand, M. Raspaud, J.-B. Rault: Enhanced Partial Tracking using Linear Prediction, in Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), September 2003
[5] X. Serra: A System for Sound Analysis/Transformation/Synthesis based on a Deterministic plus Stochastic Decomposition, Thesis, Stanford University, 1989