
Algorithms in Signal Processors

Audio Applications

2005

DSP Project Course using

Texas Instruments TMS320C6713 DSK

Dept. of Electroscience, Lund University, Sweden


Contents

I   Pitch Estimation
    Erik Simmons, Bo Elovson Grey, Johan Nystrom, Tommaso Bresciani
    1 Introduction
    2 Method for pitch estimation
      2.1 The autocorrelation method
        2.1.1 Problems with the autocorrelation method
    3 Our algorithms
      3.1 MatLab algorithms
      3.2 The algorithms for the DSP
    4 Results
    5 Reflections on RTDX
      5.1 A short explanation of RTDX
      5.2 Our problems
      5.3 Our recommendations to the faculty

II  Synthesizing FM signals using Texas DSK C6713
    Jonas Aspsater, Hans Aulin, Pascal Bentioulis, Johan Gardsmark
    1 Introduction
    2 The MIDI Format
      2.1 The building of midi files
    3 Creating the sound
      3.1 Harmonical sound
      3.2 Synthesis
      3.3 Sound analysis
      3.4 Envelope
      3.5 How the states were implemented
    4 Manual for setting up the DSP card
      4.1 How to play a midi file
    5 The source code
    6 The process
    7 Conclusion
    8 References

III Reverberation
    Hakan Johansson, Danny Smith, Erik Stromdahl, Helene Svensson
    1 Introduction
    2 Jot's algorithm
      2.1 Feedback Delay Networks, FDN
      2.2 Algorithm description
      2.3 Early reflections
    3 Implementation
      3.1 Buffers
      3.2 Matrix multiplication
      3.3 Data types
      3.4 Memory management
    4 Results
    5 Future work

IV  Talking Orchestra - a vocoder approach based on cross synthesization
    Pascal Collberg, Catia Hansen, Benny Johansson, Fia Lasson
    1 Introduction
    2 Equipment
    3 Theory
      3.1 Speech
      3.2 Characteristics of instruments
      3.3 Talking orchestra with LPC
      3.4 Talking orchestra with CS
    4 Method
    5 Conclusions
      5.1 Problems and Solutions
      5.2 Results
      5.3 Acquired knowledge

V   Voice Alteration using LPC analysis
    Magnus Gunnarsson, Anders Frick, Joakim Persson, Johan Runnberg
    1 Voice Alteration
      1.1 The purpose of this project
      1.2 What is LPC?
      1.3 From input to output
        1.3.1 Highpass filter
        1.3.2 The allpole filter
        1.3.3 Correlation
        1.3.4 Schur recursion
        1.3.5 Calculate the a-coefficients using Step-Up Recursion
        1.3.6 Calculate the roots in A(z) and filter the signal
        1.3.7 Create an allpole model 1/A(z), modify and filter the residual through 1/Amod(z)
        1.3.8 Lowpass filter
    2 Implementation
    3 Results and conclusions
      3.1 Results
      3.2 Conclusions

VI  Subtractive Synthesis on the TMS320C6713 DSP card
    Daniel Camera
    1 Introduction
    2 Theory
      2.1 Subtractive Synthesis
      2.2 MIDI
        2.2.1 MIDI files
    3 First stages of the project
    4 Methods
      4.1 Low pass filters
      4.2 Multiband filters
      4.3 Envelope Creation
      4.4 Debugging Methods
    5 Implementation
      5.1 Square Wave generation
      5.2 Filters
      5.3 Interpolation
      5.4 Envelopes
      5.5 Frequency Modulation
    6 Results
    7 Conclusions

VII Chorus
    Christian Isaksson, Per-Ola Larsson, Marcus Lindh, Andreas Nilsson
    1 Introduction
    2 Signal Processing behind chorus
      2.1 Delay
      2.2 Frequency analysis of the different LFO waveforms
      2.3 Filtering
      2.4 Amplitude modulation
      2.5 Full model
    3 Implementation in C-code
      3.1 Algorithm
        3.1.1 Put the sample in the circular buffer
        3.1.2 Calculate the delay
        3.1.3 Interpolate
        3.1.4 Filter
        3.1.5 Modulate amplitude
        3.1.6 Sum up the output signal
      3.2 Memory handling
    4 Conclusions

Part I

Pitch Estimation

Erik Simmons, Bo Elovson Grey, Johan Nystrom, Tommaso Bresciani

Abstract

Pitch estimation can be described as a method that, for example, tells you what notes you play on the piano. Our project goal was to implement an algorithm in C code that estimated the pitch from an audio input signal. The C code is loaded into a DSP card that can run the pitch estimation algorithm in real-time and output the notes into a file. Our algorithm uses autocorrelation for the frequency estimation and pre- and post-processing to give a robust pitch estimate. The autocorrelation method compares a signal with versions of itself delayed by successive intervals and gives a high value if the signals compared are similar and a low value if they are different from each other. The first highest peaks are detected in the autocorrelation output, and the frequency of the signal is related to the number of samples between the peaks. The pitch detection can be used for various musical applications. One example is to write out the notes from a solo-played musical piece.


1 Introduction

In this project we make a pitch estimator. A pitch estimator, or pitch detector, is a device that tries to determine the fundamental pitch (pitch period) of the input signal. The device should find a frequency that a human listener would agree to be the same pitch as the input signal. Pitch estimation should only be made on melodic sounds; estimating the pitch of percussive sounds like cymbal crashes, snare drums and short thuds or rumbles makes no sense.

Pitch estimation has several applications. One common application is speech analysis. There are also several musical applications of pitch detection. One application that is used frequently today is sound transformation, where pitch estimation is used as a guide for pitch-shifting and time-scaling operations. Pitch detection can also be used for transcribing music, especially some kinds of ethnic music that otherwise cannot be transcribed into ordinary musical notes.

In this project we implement pitch estimation software on a Texas Instruments C6713 DSP. The pitch estimation algorithm and the pre- and post-processing algorithms are first developed in Mathworks MatLab and then converted to C code.

2 Method for pitch estimation

There are several different methods for pitch estimation. They are mainly divided into three categories: time-domain methods, frequency-domain methods and methods based on models of the human ear. All the methods have their pros and cons. Some are good at detecting low frequencies and others are good at detecting high frequencies; no method detects all frequencies equally well. The method chosen for this project is the autocorrelation method. After comparing performance results from different scientific articles, the autocorrelation method seemed to have the best average performance [3].

2.1 The autocorrelation method

The pitch detector samples the input signal into a buffer and then uses different lags to compare the signal with a delayed version of itself. A typical autocorrelation function looks like this:

autocorrelation[lag] = sum_{n=0}^{N} signal[n] ∗ signal[n + lag]        (1)

where n is the input sample index and N is the window length. The magnitude of autocorrelation[lag] determines how similar the signal contained in a window is to a version of itself delayed by lag samples. If the lag is 0, the output of the normalized autocorrelation will be 1, which means that the signals are identical.
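As an illustration only (not the project's DSP code), equation (1) for a single lag can be written in C like this; the function and variable names are made up:

/* Sketch: one point of the short-time autocorrelation in equation (1).
   'signal', 'N' and the function name are illustrative. */
float autocorr_at_lag(const float *signal, int N, int lag)
{
    float sum = 0.0f;
    int n;
    for (n = 0; n + lag < N; n++)        /* stay inside the window */
        sum += signal[n] * signal[n + lag];
    return sum;
}

Dividing this value by the lag-0 value gives the normalized autocorrelation, which is 1 for lag = 0 as stated above.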

2.1.1 Problems with the autocorrelation method

The first problem with the autocorrelation method is that it can be hard to know which peaks in the function correspond to the pitch period. Different noise sources can contribute false peaks.

The autocorrelation method requires the use of a window for calculating the short-time autocorrelation function. The longer the window is, the more noise will be suppressed. On the other hand, if the window is too long the calculations become more time-consuming. The window length also defines which frequency band the pitch detector can handle. Ideally a window should contain two or three complete pitch periods. For high frequencies the window should be just a couple of milliseconds, but for low frequencies the window needs to be between 20 and 50 milliseconds, if not more. The higher the sample rate, the more values per time unit are required, which also results in a larger window.

To be able to detect high frequencies we need a high sample rate. There will also be less aliasing with a high sample rate. However, when detecting low frequencies the calculations will be massive. Aliasing and noise can be removed to a certain extent with the proper pre-processing.

3 Our algorithms

In this section we will describe the algorithms we use for pitch detection. First the algorithms were developed in MatLab and later converted to C.

3.1 MatLab algorithms

The general idea for our algorithms is that we pre-process the signal to remove undesired noise, both high and low frequency. Otherwise we might detect the noise as the true frequency in the autocorrelation function. The maximum lag of the autocorrelation function is selected to fit the exact number of periods desired for the lowest frequency we want to be able to detect at a given sample rate. The window length is a tradeoff between noise insensitivity and the ability to handle fast pitch variations in the input signal. The signal is then distributed over a number of windows and the autocorrelation is calculated. A certain amount of overlap between the windows is needed to be able to detect the frequencies correctly over the entire signal. Peak detection is then used on each autocorrelation function to find the different frequencies in the signal.

MatLab is a registered trademark of The MathWorks Inc.


Figure 1: The plots on the left side show the audio signal in the time domain. On the right side we plot the same information in the frequency domain. The first row shows the original signal. In the second row the high-pass FIR filtering is done. Then in the third row low-pass filtering is computed. The last row shows the autocorrelation of the preprocessed signal.

The number of peaks used in the peak detection has to be considered carefully. If too few peaks are used, fast variations in the pitch will show up as different frequencies, though it could very well be the same note playing. Averaging over a higher number of peaks will make the pitch detection less sensitive to, for example, a vibrato. On the other hand, if too many peaks are used, fast true pitch variations will not be detected properly, especially if we try to detect low frequencies, since the peaks stretch over a fairly large amount of time.

After the peak detection we do some post-processing to get rid of the frequencies that are undesired. What we do here is to match frequencies that occur close to each other and then delete those that differ too much.

Figure 2 shows how the pitch detection works in the MatLab code.

Figure 2: The pitch estimation for the piano sound file (estimated frequency in Hz versus number of frames, for 14 notes played on a piano). The sampling frequency was 8 kHz and each note was played for 0.5 seconds. The estimation is made without any post-processing.

3.2 The algorithms for the DSP

To be able to run an algorithm on the DSP it has to be written in either C or assembly code in Code Composer Studio, Texas Instruments' integrated development environment. We chose to use C since it is a much easier language. We assumed that our application would be fast enough without any assembly optimizations.

When the MatLab code was working we started to translate the code into C in Code Composer Studio. The first thing to be translated was just a simple algorithm that was supposed to calculate the pitch of a sine wave. The code written in MatLab is quite similar to the code that was going to be written in C, which was helpful.

Code Composer Studio is a registered trademark of Texas Instruments Inc.


The issue with the C program was that it had to do the pitch detection in real-time, which is why we did not have a specified length for the input signal. A solution to this was to sample the input via the PIP object pipRX into a buffer; when the buffer was full the calculations could be made. Three software interrupt threads are used, see figure 3.

Figure 3: Execution graph of the four threads running in our application (audio sampling, buffer to memory, calculations, print to file). The audio sampling thread is a hardware interrupt and will always run. The software interrupts run with the same priority.

The first thread seen in the picture (audio sampling) is a hardware interrupt handled by the operating system that does the actual sampling of the input signal into a buffer called audiosrc. When this buffer is full an interrupt occurs which calls buffer-to-memory. This thread writes the values of audiosrc into a larger buffer (buffer in). When this vector is full the calculation thread is called, which runs all the computations required for the pitch estimation. At the end there is a call to the print-to-file thread, which prints the estimated notes into a text file.

The file output is supposed to look like this:

3A 4C 4C# 4D 4D# 4E 4E# 4F etc.

With our thread schedule we are able to receive new samples into the input buffer during the calculations. In this way we managed to get the pitch estimation running in real-time, with just a small amount of delay and with the input buffering running continuously.
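A rough sketch of the hand-off between the sampling buffer and the larger analysis buffer might look as follows. The buffer names follow the text (audiosrc and buffer in, here written buffer_in), but the sizes, the analysis entry point and the code itself are assumptions, not the actual DSP/BIOS implementation:

#define AUDIOSRC_LEN 256                 /* size of the sampling buffer (assumed)   */
#define BUF_LEN      4096                /* analysis buffer length, as in Section 4 */

short audiosrc[AUDIOSRC_LEN];            /* filled by the audio-sampling interrupt  */
short buffer_in[BUF_LEN];                /* larger buffer used for the calculations */
static int write_pos = 0;

void run_calculations(const short *buf, int len);   /* hypothetical analysis entry point */

/* Sketch of the "buffer to memory" thread: copy the small sampling buffer into
   the large analysis buffer and trigger the calculations when it is full. */
void buffer_to_memory(void)
{
    int i;
    for (i = 0; i < AUDIOSRC_LEN; i++)
        buffer_in[write_pos + i] = audiosrc[i];
    write_pos += AUDIOSRC_LEN;

    if (write_pos >= BUF_LEN) {          /* analysis buffer full */
        write_pos = 0;
        run_calculations(buffer_in, BUF_LEN);
    }
}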


For the pre-processing we use a bandpass-filter function that is suggested in the Code Composer Studio help. The filter is a FIR bandpass filter and it has 64 taps since it is designed to be sharp. The filter coefficients were calculated in MatLab. The bandpass filter has almost the same properties as the two filters from the MatLab algorithms combined, see figure 4.

Figure 4: The impulse response of the bandpass FIR filter used for the pre-processing. The cutoff frequencies are 72 and 7200 Hz.

To save calculations we combine the peak detection with the calculation of the autocorrelation function. Each point in the autocorrelation function is calculated on the fly and used in the peak-detection algorithm. To optimize the peak detection even more, we stop the computation of the autocorrelation function when we have found the desired number of peaks.
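The idea of interleaving the autocorrelation with the peak detection and stopping early could be sketched like this (illustrative names, simplified peak criterion):

/* Sketch: compute the autocorrelation lag by lag and stop as soon as the
   desired number of peaks has been found. Names are illustrative. */
int find_peaks_on_the_fly(const float *x, int N, int maxLag,
                          int wantedPeaks, int *peakLags)
{
    int lag, n, found = 0;
    float prev = 0.0f, curr = 0.0f;

    for (lag = 1; lag <= maxLag && found < wantedPeaks; lag++) {
        float next = 0.0f;
        for (n = 0; n + lag < N; n++)
            next += x[n] * x[n + lag];   /* one autocorrelation point */

        /* a local maximum marks a peak candidate at the previous lag */
        if (lag >= 3 && curr > prev && curr > next)
            peakLags[found++] = lag - 1;

        prev = curr;
        curr = next;
    }
    return found;                        /* number of peaks actually found */
}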


Figure 5 shows how the CPU load drops as higher and higher frequencies are detected. This is expected behaviour, since far fewer calculations are needed to detect a high frequency because the spacing, i.e. the number of values between the peaks, is much shorter. This fits well with the theory discussed in sections 2.1.1 and 3.1.

Figure 5: The CPU load for the DSP during pitch estimation. Frequencies from 110 to 7040 Hz are played in a chromatic scale. The graph clearly shows how the CPU load drops when the frequency is increased. Two discontinuities occur when the number of periods used by the algorithm for calculating the pitch increases from 1 to 2 and from 2 to 4.

The conversion between frequencies and musical notes covers six octaves, corresponding to frequencies between 110 and 7040 Hz.
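A generic way to map a detected frequency to a note name of the form used in the output file is a logarithm relative to a reference pitch. The sketch below assumes equal temperament with A4 = 440 Hz; it is not necessarily the exact conversion used in the project:

#include <math.h>
#include <stdio.h>

/* Sketch: convert a frequency in Hz to a note string such as "4C#".
   Reference pitch and octave numbering are assumptions. Valid for the
   110-7040 Hz range mentioned above. */
static const char *note_names[12] =
    { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };

void freq_to_note(double f, char *out)
{
    int semis  = (int)lround(12.0 * log2(f / 16.3516)); /* semitones above C0 */
    int octave = semis / 12;
    int note   = semis % 12;
    sprintf(out, "%d%s", octave, note_names[note]);     /* e.g. 440 Hz -> "4A" */
}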


4 Results

We have had our fair share of problems with the results coming out of our pitch detection. The MatLab implementation went smoothly and we could see good results from our autocorrelation algorithm early in the project. Converting the MatLab algorithms into C code did not go very smoothly, though. The first problem we ran into was being able to buffer the samples from the input signal. Several buffer sizes were tried. In the beginning too small buffers were used, which meant that the windows for calculating the autocorrelation were too small to handle low frequencies. Later we ran out of memory because the buffers were too large for the internal memory on the DSP. Now external SDRAM is used and memory is not an issue any more. An interesting error showed up because of a badly chosen buffer size: we were unable to detect 400 Hz when the buffer was 800 values long. Now the buffer is 4096 values long. Only samples from one of the input channels are used, since the frequency should be the same in both input channels; we are only interested in detecting frequencies in mono sound.

Another issue we ran into was that the application detected the double frequency, which was caused by improper type conversion.

At first we did the peak detection after calculating the entire autocorrelation function. This made the calculations unnecessarily slow. Now the peak detection stops when two periods are found. This might be too few periods, but we get acceptable results this way.

The application uses a sampling frequency of 48 kHz to make sure that we can use as many values as possible to detect high frequencies. Since we get a huge number of values we have problems detecting low frequencies. Theoretically, based on the FIR bandpass filter, we should be able to detect 70 Hz, but this is not the case. About 200 Hz is the actual lower limit and around 7 kHz is the upper limit. The upper limit is set by the bandpass filter's upper cutoff frequency, since we thought that the frequencies we want to detect would not be higher than that.

The application is not suited for speech recognition since we have trouble detecting frequencies lower than 200 Hz.

The detection of the musical notes is accurate. We have compared the results from the pitch detection with sine waves created and played in MatLab. In this way we have been able to verify that singing, whistling and playing musical instruments were pitch-detected correctly. We used our own ears to compare the sounds from the different sources.


5 Reflections on RTDX

5.1 A short explanation of RTDX

RTDX stands for Real-Time Data eXchange and it is an interface that allows data transmission between a host and a target DSP in real-time without interrupting the application running on the DSP. RTDX uses an Object Linking and Embedding (OLE) application programming interface to achieve this. OLE is a way for different threads, in a Microsoft Windows operating system, to send objects between them. This means that data processed on the target DSP can be viewed or saved in real-time. The bandwidth of RTDX is 8 kbit/s.

RTDX has many uses and there is a wide variety of applications that use it, for example control, servo and audio applications. It is also very suitable for system debugging: instead of stopping the application running on the DSP, all calculations and data can be visualized in real-time.

5.2 Our problems

We originally planned to make a graphical user interface for our application. The idea was to use an RTDX stream from the DSP application to a host application written in Microsoft Visual Studio .NET 2003. Since there are several RTDX examples in the tutorial it seemed to be a fairly easy task, especially since it is very easy to design an interface in Visual Studio. The examples were mainly written in Visual Basic and in some cases a C++ version also existed. There is a huge drawback though: Visual Basic 5 is completely code incompatible with Visual Basic .NET. There is a code-conversion wizard in Visual Studio .NET, but there are some problems the wizard cannot solve. The older versions of Visual Basic were very weakly typed. The new versions, however, have stronger data types. This means that you have to declare your variables properly in the .NET versions, and this was not the case in the older versions.

Another problem with the older versions is that there was a default data type for each object. To give an example: assume that there is a class Person, which describes a person. An object of the class Person contains a lot of information, e.g. name, age, sex and, since we are Swedish, a unique personal number. Let's say that the default type is the personal number. Then both of these lines of code are valid:

dim Kalle as Person
dim Eva as Person

' Create a new object of the class Person
Kalle = new Person

' Now let us create a person using the default data type
Eva = "800101-1124"

The last line of code is invalid in Visual Basic .NET, and here lies the largest problem with the old Visual Basic examples. In the examples, objects are initialized with other complex objects as data types. Complete documentation of the classes is needed to be able to instantiate the objects and use the correct methods for assigning the values.

Visual Studio .NET is a registered trademark of Microsoft Corporation. Visual Basic is also a registered trademark of Microsoft Corporation.

What about the C++ code then? The example code should be totally compatible. The problem is that a DLL file, rtdxint.dll, is needed to be able to use RTDX communication. The DLL contains pre-compiled objects, and in this case one needs to be able to use the objects and their namespace to be able to write working code. There is a standard model for objects under Windows, the Component Object Model, or COM for short, and the DLL supports COM. COM has also been through a major change between Visual Studio 6 and Visual Studio .NET, and COM DLLs created under different versions are incompatible. This means that Visual Studio .NET refuses to recognize the namespace, which renders the code impossible to compile. A namespace is not necessary, but one has to know the names of the classes, and that information is not available. Even if the class names were known, there is no guarantee that the classes would be accepted.

5.3 Our recommendations to the faculty

RTDX provides excellent functionality, and laboratory sessions should benefit from RTDX applications.

The example code in C++ from TI support compiles in Visual Studio .NET and works fine. Since no updated Visual Basic examples are available we cannot recommend developing in that language, since the old language syntax and functionality differ so significantly from the current version. Applications written in C++, or perhaps even C#, should work very well.

Visual Studio .NET is a very good development tool in our opinion. It is stable and provides lots of nice functionality such as project management and advanced debugging. Graphical interfaces are very easy to build in the Forms editor: just select the desired components and place them where you want them. All the properties of the components can easily be changed. The code for the entire form is auto-generated in large parts. If an action handler is needed, just double-click on the component, for example a button, and the handler will also be added automatically. The only thing needed is the code for the actions of the different components.

If developing in a traditional programming language seems too time-consuming, applications can be implemented in Microsoft Excel, LabView or MatLab. LabView is probably the easiest way to develop the applications needed for laboratory classes.

References

[1] Roads, Curtis, The Computer Music Tutorial, pp. 504-520.

[2] Rabiner, Lawrence, On the Use of Autocorrelation Analysis for Pitch Detection, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-25, no. 1, February 1977.

[3] Rabiner, Lawrence, et al., A Comparative Performance Study of Several Pitch Detection Algorithms, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-24, no. 5, October 1976.


Part II

Synthesizing FM signals using Texas DSK C6713

Jonas Aspsater, Hans Aulin, Pascal Bentioulis, Johan Gardsmark

Abstract

The purpose of this assignment was for us to become familiar with the MIDI format and develop a MIDI player based on FM and AM synthesis. To help us we used the programmable DSP card DSK C6713 made by Texas Instruments. The project has also been an introduction to the programming language C, which is used in many signal processing algorithms. We were also supposed to imitate different kinds of instruments.

In this report you can read about how the MIDI format is built up and find a theoretical description of how different kinds of synthesis work. The reader will also get an overview of how to make a sound analysis of different musical instruments. In the middle of the report there is a description of different practical implementations and solutions for the MIDI player. Finally, our experiences and problems are discussed.


1 Introduction

During the '80s Yamaha introduced FM synthesis in the DX-7 synthesizer. It came to revolutionize the world of synths. Thanks to the new FM synthesis, manufacturers could create sounds that were more similar to real instruments. Yamaha based their synths on the MIDI format, a format still alive today. Even though MIDI is an old technique it is used in cell phones, computers, sound cards and of course synths. The MIDI format is a language written in hexadecimal with information about the instruments, rate, pace of the melody and amplitude for every tone that is played. To be able to play MIDI sounds a MIDI player is used to decode the information. The minimum requirements for the MIDI player in this project are given in the specification below.

• Two channels

• Five simultaneously playing tones

• An envelope applied to each tone

• A set of instruments

2 The MIDI Format

The MIDI format is a type of language that is used to tell a synth how it is supposed to behave. It has instructions for when to play a tone, which tone, which instrument (channel) to play, when to stop, how much vibrato and so on. A MIDI file consists of different hexadecimal instructions that are decoded by a synthesizer. The keys of a piano are numbered with a low number to the left and a greater number to the right of the piano. For example, a 5-octave piano has middle C in the center, and middle C is given the number 60.

Figure 1: MIDI commands

Most MIDI commands consist of 3 bytes (see figure 1). The first four bits of the first byte decide whether a tone is to be played; if so, the bits are set to 1001. The following four bits represent a number between 0 and 15 and tell us which channel is being used. The first byte is called the status byte.

After this comes the first data byte. The first bit in a data byte always contains the value 0. The following seven bits represent a number between 0 and 127 and tell us which note is played. No piano has 127 keys, so the highest and lowest tones are not possible to play. The last data byte is built up in the same way as the first. The only difference is that the number between 0 and 127 stands for hit velocity and is a measure of how fast a key is pressed down. Now we have all the information needed to start a tone. To shut down a tone you send the same three bytes as when you played the tone, but the status byte starts with 1000. The byte sequence to shut down a tone also contains information about hit velocity.
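A sketch of how such a three-byte message can be decoded; this follows the generic MIDI layout described above, not the project's own midi.c:

/* Sketch: decode a 3-byte MIDI channel message as described above.
   Generic MIDI interpretation; field names are illustrative. */
typedef struct {
    int note_on;    /* 1 = note on (status 1001....), 0 = note off (1000....) */
    int channel;    /* 0-15, lower four bits of the status byte               */
    int key;        /* 0-127, first data byte                                 */
    int velocity;   /* 0-127, second data byte (hit velocity)                 */
} MidiNoteMsg;

int decode_note_message(const unsigned char *bytes, MidiNoteMsg *msg)
{
    unsigned char status = bytes[0] >> 4;      /* upper four bits */
    if (status != 0x9 && status != 0x8)
        return 0;                              /* not a note on/off command */

    msg->note_on  = (status == 0x9);
    msg->channel  = bytes[0] & 0x0F;
    msg->key      = bytes[1] & 0x7F;
    msg->velocity = bytes[2] & 0x7F;
    return 1;
}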

One advantage of MIDI is that it is possible to play 16 channels at the same time. For example, you can play piano on channel one and drums on channel 10. This makes your MIDI sound like a band even though you only have access to one synth. In a normal MIDI player the different channels are set to predefined instruments.

A MIDI file can also contain information about hit sensitivity per key. The pianos that use this technique are very expensive and this command is not often used. Decoding programs can often handle this kind of information, and it is stored in a similar way as the hit velocity. The only difference is that the first four bits of this status byte are set to 1010.

2.1 The building of midi files

To be able to generate good test signals while developing the MIDI player, we used a program called MusicSculpture. The program is structured as a piano and you can record different tunes on different channels. We recorded a C major scale with one note on each channel to test our player. MIDI files are available in two formats, MIDI 0 and MIDI 1. MIDI 0 saves all the information in one track while MIDI 1 stores the information on different tracks. Our program is built to handle MIDI 0, so we needed some kind of conversion program between MIDI 1 and 0. For this task the shareware program MIDconverter was used.

3 Creating the sound

3.1 Harmonical sound

To make a sound feel nice and not so stiff you need some sidebands. These sidebands should be harmonic, a condition that is fulfilled if they are placed at multiples of the main frequency. Most instruments have a number of distinct sidebands that can be imitated with different kinds of synthesis or simply by sampling the original instrument. If you buy a modern keyboard today it is likely to use sampled instruments and/or different synthesis methods.


3.2 Synthesis

One method that was common in the classical synthesizers but is not so widely used today is subtractive synthesis. This means you have a sound that is rich in overtones, which you filter to get the desired tones. According to theories founded by Fourier in the early 19th century, any signal can be rebuilt through summation of an infinite number of sine generators. This is the theory used in additive synthesis, but there is a problem because you need so many sine generators. A considerably less computationally demanding method that can give similar frequency spectra is FM synthesis. The mathematical definition of an FM-modulated signal is (Roads, The Computer Music Tutorial):

s(t) = A sin(2πCt + I sin(2πMt))

In the formula the main frequency is C and the modulation frequency is M. The formula will cause a peak at frequency C and sideband peaks at frequencies C + nM and C − nM, where n is an integer. The sideband peaks are always placed symmetrically around the main frequency. Those frequency components that end up on the negative axis are folded onto the positive axis, but with a phase shift of 180 degrees. This leads to a problem if the folded frequencies result in a disharmonic sound. The folded frequencies must be placed at certain distances from the main frequency. Considering the expressions C + nM and C − nM and the previously mentioned condition for harmonic sound, it is clear that M should be an integer multiple of C. The term I in the formula is a measure of how much energy is distributed to the sidebands. For example, if I is set to 0 there will only be a spike at the main frequency. Increasing I distributes a greater amount of energy to the sidebands.
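For illustration, one sample of such an FM signal can be computed directly from the formula; the sample rate and parameter values below are assumptions:

#include <math.h>

#define FS      48000.0              /* assumed sample rate, Hz */
#define TWO_PI  6.28318530717959

/* Sketch: one FM-synthesis sample, s(t) = A sin(2*pi*C*t + I sin(2*pi*M*t)).
   C is the main (carrier) frequency, M the modulation frequency and I the
   modulation index; all values are illustrative. */
float fm_sample(long n, float A, float C, float I, float M)
{
    double t = (double)n / FS;
    return (float)(A * sin(TWO_PI * C * t + I * sin(TWO_PI * M * t)));
}

/* Example call for the tone in figure 2: fm_sample(n, 1.0f, 262.0f, 1.0f, 262.0f) */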

The frequency content of two FM signals is shown in figure 2 and figure 3. The respective parameters for the FM equation are, in figure 2: C = 262 Hz, I = 1 and M = 262 Hz, and in figure 3: C = 262 Hz, I = 3 and M = 262 Hz.

3.3 Sound analysis

One of our goals with the project was to try to simulate a real instrument. The first instrument we tried to simulate was a piano. To help us we had a recorded sequence of a real piano on which we performed an FFT analysis. The results are shown in figure 4.

In the graph it is clearly seen that the sound contains a main frequency (in this case a C at 262 Hz) and a number of sidebands placed at harmonic distances. Using our new knowledge about the different kinds of synthesis, it seemed a good idea to use FM synthesis since it is not so computationally demanding. It is really easy to position the sidebands, but a problem arose when we tried to control their amplitudes.

Figure 2: FM-synthesis spectrum (amplitude versus frequency).

After some reading we understood that this is possible using Bessel functions, an area we chose not to investigate further. The most natural way to continue was to use additive synthesis. It is computationally more demanding but very simple to use, because you have exact control over where you position the sidebands and over their amplitudes. When an FFT spectrum similar to the one belonging to the piano was obtained, the sound was played. To our disappointment it was not very similar to the original sound; something more had to be done with the spectrum. If you look at the spectrum in figure 4 you can see that the peaks have some non-harmonic transients. These also had to be imitated. This was done by using FM synthesis for every sine oscillator. Adding the FM signals did make the sound more similar, but there was still something missing. A spectrogram for the piano is plotted in figure 5. In the spectrogram it is clearly seen that some frequencies decay faster than others. This was easy to imitate by multiplying every sideband with an exponentially decreasing function.
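The combination described above, additive partials where each partial is FM-modulated and multiplied by its own exponential decay, could be sketched like this. The partial table, decay constants and sample rate are invented for illustration, not taken from the piano analysis:

#include <math.h>

#define FS      48000.0
#define TWO_PI  6.28318530717959

/* Sketch: additive synthesis where every partial is an FM oscillator with
   its own exponentially decreasing amplitude. All numbers are illustrative. */
typedef struct { float freq, amp, index, decay; } Partial;

float piano_like_sample(long n, const Partial *p, int numPartials)
{
    double t = (double)n / FS;
    double sum = 0.0;
    int k;
    for (k = 0; k < numPartials; k++) {
        double env = exp(-p[k].decay * t);                 /* per-partial decay */
        double mod = p[k].index * sin(TWO_PI * p[k].freq * t);
        sum += p[k].amp * env * sin(TWO_PI * p[k].freq * t + mod);
    }
    return (float)sum;
}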

3.4 Envelope

In order to get a smooth sound when playing a tone you need to use an envelope. The envelope consists of four states, which can be seen in figure 6. The first state is the attack state, and its function is to shape the sound such that when a key is pressed down on the synthesizer the amplitude increases fast, linearly, to a maximum. When the maximum is reached you have reached the second state, the decay state. The decay state decreases the amplitude in a similar way as the attack state increased it. The difference is that in the decay state the amplitude only decreases to 70% of the maximum amplitude. After the decay, a constant amplitude is kept for the remaining time the tone is playing. This state is called the sustain state. When we release the key, i.e. the tone is not to be played anymore, we do not want it to stop immediately because that results in a rough sound. Therefore the envelope has yet another state, the release state. This state simply decreases the signal linearly to zero amplitude.

Figure 3: FM-synthesis spectrum with I = 3 (amplitude versus frequency).

3.5 How the states were implemented

The first state is just a linear function that starts at zero amplitude and goes to maximum amplitude. The time in which this happens is ATTACK TIME. The second state, the decay state, uses the same linear function as the first state, with the difference that its slope is negative. The decay goes from maximum amplitude to 70% of maximum amplitude; its time window is from ATTACK TIME to END ATTACK. In the next state, the sustain state, the amplitude is kept constant at 70% of the maximum amplitude. This is done until we get the tone-off command. When the tone is set to tone off we keep the amplitude and multiply it by a linear function that decreases to zero during the time DECAY TIME. The envelope applied to a tone can be seen in figure 7.

Figure 4: A piano FFT
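A sketch of the four states as a function of the envelope counter is shown below. The constant names follow the text (written here with underscores), but their values, the release bookkeeping and the exact slopes are assumptions, not the project's code:

/* Sketch of the envelope described in Section 3.5; values are assumptions. */
#define ATTACK_TIME   480            /* samples, illustrative                */
#define END_ATTACK    960            /* end of the decay segment             */
#define DECAY_TIME    2400           /* length of the release segment        */
#define SUSTAIN_LEVEL 0.70f          /* 70% of the maximum amplitude         */

float envelope(long counter, int tone_off, long release_start, float maxAmp)
{
    if (!tone_off) {
        if (counter < ATTACK_TIME)                       /* attack  */
            return maxAmp * (float)counter / ATTACK_TIME;
        if (counter < END_ATTACK)                        /* decay   */
            return maxAmp - (maxAmp - maxAmp * SUSTAIN_LEVEL)
                   * (float)(counter - ATTACK_TIME)
                   / (END_ATTACK - ATTACK_TIME);
        return maxAmp * SUSTAIN_LEVEL;                   /* sustain */
    }
    /* release: linear ramp from the sustain level down to zero */
    {
        long k = counter - release_start;
        if (k >= DECAY_TIME) return 0.0f;
        return maxAmp * SUSTAIN_LEVEL * (1.0f - (float)k / DECAY_TIME);
    }
}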

4 Manual for setting up the DSP card

The DSP card needs one input and one output. The input is a USB cable that loads the program to the DSP card and the output is an analog cable connected to the speakers. When we play our MIDI file we do that through the program Code Composer, the same program we used to create our project.

4.1 How to play a midi file

In order to play a MIDI file you first have to build the project. You do that from Project and then Rebuild all. When the project is built you have to load it onto the DSP card. That is done from File and then Load program. When the project is loaded you can start playing MIDI files. You do that with the following steps:

1. From the menu bar: DSP/BIOS and then Host Channel Control; a new window is opened.

2. In the new window, press Bind and then choose the MIDI file you want to play. If the input is already bound, do Unbind first.

3. Run the program from File and then Run program, or press the shortcut button for run.

4. Press play in the Host Channel Control window.

Soft sound will now appear in the speakers.

Figure 5: A piano spectrogram

5 The source code

The source code written in the FM-synthesis project consists mainly of two parts. The first part is the process function defined in pip_audio.c. This function is invoked when the program is ready to copy data from the input buffer to the output buffer. Before copying, the process function processes the input data (e.g. synthesizes the tones given by the MIDI commands in the input data) and then copies the processed data to the output buffer. The second part of the source code is the instrument functions defined in instruments.c.

When process is invoked it reads a MIDI command from the input buffer by calling MIDI_runCommand. The interpretation of the commands given by the input data is done in midi.c. The interpreted parameters used in this project are:

• the tone to play, stored in the external parameter Mtone.

• the velocity of the key when pressed down, stored in the external parameter Mvel

• the channel, stored in the external parameter Mchan

• on/off commands, stored in the external parameter Mcom.


• the time (or number of ticks) to the next event, stored in dxPause

• the beat (or pace) of the MIDI file, stored in period.

Figure 6: An envelope

When MIDI_runCommand is finished a channel is updated by calling updateChannel, which turns the tone Mtone on or off on the channel Mchan. The program features a maximum of five channels, each with the possibility to play a maximum of ten tones simultaneously. The envelope calculations are performed using an envelope counter and the velocity parameter Mvel. The calculations result in an amplitude used as an input parameter to the instruments, along with the envelope counter used as the "time" and the note to play (a set of notes, ordered by rising frequency, is stored in the vector notes). The instrument functions are responsible for synthesizing the output values. The output is then written to the output buffer.

6 The process

The development of the project has been quite a journey. Since none of us were especially good at programming in C, we felt that a good thing would be to start in MatLab. MatLab has proved to be a very good tool for seeing how things work. Our project was based on FM. In order to create sounds, we played around in MatLab quite a lot. In addition to creating different FM syntheses, i.e. synthesizers, we actually managed to create a piano that sounded quite good. Unfortunately the piano did not work so well when used in the project (in the C code). The main reason for that was that the piano needed lots of calculations, since it was built from different sinusoids added together. Our conclusion was that the sound of a piano should probably be sampled instead of synthesized.

Figure 7: An envelope applied to a tone

When making the envelope we did all the programming in MatLab. This was done so we could actually plot it and see that it was working correctly. Strangely to us, it did not work at all in our project. We spent a lot of time thinking about why it did not work. When we still could not understand why, we decided to sample the output signal on another computer and then plot it. This helped us figure out where the problem was. It turned out that our envelope was working correctly. The problem was that the output signal, later to be sent to the instruments, became zero due to multiplications and divisions executed in the wrong order (the division of a parameter of type short by a parameter of type int became zero).
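The bug described is the classic integer-division pitfall; a minimal illustration (not the project's actual code):

/* Minimal illustration of the integer-division bug described above. */
void division_pitfall(void)
{
    short vel    = 64;                       /* e.g. a MIDI velocity           */
    int   maxVel = 127;

    float wrong = vel / maxVel * 1000.0f;    /* 64 / 127 in integer math is 0, */
                                             /* so the whole product is 0.0f   */
    float right = (float)vel / maxVel * 1000.0f;   /* cast first: about 503.9  */
    (void)wrong; (void)right;
}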

7 Conclusion

This project has given us good experience in programming an external DSP card. We have also learned how the MIDI format and different kinds of synthesis work. The programming technique is a bit different from programming a normal PC, and we have been forced to live within the tighter limits of the card's memory and speed. We have realized that there is a difference between imitating pipe instruments and imitating string instruments.

None of us had any experience in C programming, so this was a difficult task for us to master. Our limited experience in C has led to code that is a bit clumsy, which makes it hard to add other functions to the program. We encountered a problem when trying to use floating-point variables. In the beginning this worked just fine, but as we continued building, the program slowed down or even crashed. This was because the type float is computationally more demanding than, for example, integer.

8 References

• Roads, C., The Computer Music Tutorial, The MIT Press, 1996.

• Salomonsson, F., Additive Synthesis on Texas Instruments TMS320C6701 DSP Card, Master of Science Thesis, Department of Electroscience, April 2004.

• Yamaha Co., What's MIDI: Musical Instrument Digital Interface.

• Music net & Ernst Nethorst-Boss, http://www.musiknet.se/arkiv/MIDIskola, February 2005.

Part III

Reverberation

Hakan Johansson, Danny Smith, Erik Stromdahl, Helene Svensson

Abstract

The purpose of this project was to familiarize ourselves with real-time programming on a TI family processor. Our task was to implement a good-sounding stereo reverb. We chose a feedback delay network structure as proposed by Jot for our implementation because it is known for its high-quality sound and it was fairly easy to implement. An early reflections module was added, but the calculation of its parameters lies outside this project's scope; we used an external application to generate the early reflection parameters. Controllable parameters include late reflections gain, early reflections gain, direct sound gain and reverb time for low and high frequencies. Our task was solved mainly by trying different structures and parameters, and our perception of the reverb quality was our instrument for deciding whether a solution was sufficient or not. The finished reverberation module sounds very good and is adaptable to most room sizes. Our implementation uses a lot of memory and CPU capacity, but there is room for improvement. This project has given us a well-needed foundation in DSP programming for a real-time environment.


1 Introduction

'Reverb', short for reverberation, is the complicated set of reflections that are produced when a sound travels from a source to a listener by bouncing off many different surfaces. These reflections provide important cues to our brain, for example allowing us to distinguish whether we are in a small bathroom or a large stadium.

The reverberation effect consists of three different parts: the direct sound, the early reflections and the late reflections. The sound from a sound source, a speaker for instance, that travels directly to your ears is called direct sound. This is just one of the paths the sound waves can travel to reach your ears. Other paths arise from reflections off objects in the room such as walls, ceilings, furniture etc. These waves arrive attenuated, because the reflecting objects absorb some of the energy, and delayed, because of the longer path they had to travel. Of course, the reflected waves can bounce around several times before arriving at the listener (see Fig. 1). This creates a dense series of echoes and is called late reflections. Late reflections are what gives the room its sense of spaciousness and makes the sound more natural.

Figure 1: Reflection waves

The early reflections are the first few reflections that reach the listener, usually between 20 and 80 milliseconds after the direct sound. Early reflections are very important because they give the listener a sense of direction, how large the room is, where the speakers are, where the listener is positioned, etc.

The reflections also become colored by the reflecting objects. For instance, most rooms attenuate higher frequencies more than lower frequencies. These effects (delay, attenuation and coloring) give different rooms different sound characteristics. A small room will sound different from a big room; a tiled room will sound different from a room draped in soft materials.

In Fig. 2 we see a typical impulse response for a room. First we have the direct sound, then the early reflections and finally the late reflections.


Figure 2: Typical impulse response for a room

The decay time of the late reflections is called t60 and it tells us how long it takes before the sound level has dropped by 60 dB. For a large hall this would be something like 3 seconds for low frequencies and around 1.5 seconds for high frequencies.

Since reverberation is a natural effect that we expect to hear, the task of producing an artificial reverb with good sound quality has become important. The reverberation in a car, for example, may not be sufficient to create the sound of a symphony orchestra, and when using headphones there is no natural reverberation added to the music. A very dry signal can sound quite unnatural. Since we cannot always listen to music in a concert hall or other pleasing environments, we add reverberation to the recording itself.

Over the years many ways of creating artificial reverb have been utilized. One of the more popular methods of reverb simulation was realized with big hanging metallic plates under tension with electromagnetic transducers attached. It was possible to control the oscillation time of such plates using damping. A lower-quality variation of this approach was the spring reverberation used earlier in guitar amplifiers. One end of the spring had an electromagnetic transducer that made it oscillate and the other end had a sound receiver that picked up all its useful and parasitic oscillations. Nowadays it is done by means of digital signal processing.

Much of the early work on digital reverb was done by Schroeder, and one of his well-known reverberator designs uses four comb filters and two allpass filters, as seen in Fig. 3.

Later on, Dattorro, Gardner, Moorer and Jot [3] continued working on artificial reverb and offered different approaches to obtaining a reverberation algorithm with some specific characteristics: natural sound, high echo density, control of the reverberation time, etc.

More advanced algorithms can be developed to model specific room sizes. With a chosen room geometry, source and listener location, ray-tracing techniques can be used to come up with a reverb pattern. Typically, a finite impulse response (FIR) filter is used to create the early reflections, and then IIR filters are used to create the late reflections. In this project we use a reverberator design proposed by Jot which is based on a feedback delay network.

Figure 3: Schroeder's reverberation design with comb and allpass filters

2 Jot’s algorithm

The algorithm we have chosen to implement is named Jot's algorithm after Jean-Marc Jot, who proposed it in the beginning of the nineties. It is a reverberation algorithm which, among other characteristics, gives a natural sound, control of the reverberation time and high echo density.

2.1 Feedback Delay Networks, FDN

To understand Jot's algorithm, the structure of a Feedback Delay Network must be known, see Fig. 4. Jot uses a Feedback Delay Network simply to achieve delay lines with feedback. A general FDN structure is composed of a set of digital delay lines and a single feedback matrix, often named A, which connects the inputs and the outputs of the delay lines.

Figure 4: A general FDN structure with matrix and delay lines


The matrix has the function of spreading the energy among the N delay lines. By choosing an appropriate structure of the matrix, different combinations of unit filters can be represented, comb as well as all-pass. To get a stable and efficient system there are some restrictions to follow. To ensure stability of the system the matrix is often chosen to be unitary, which means that the poles are on the unit circle. A unitary feedback matrix circulates the energy of the input signal endlessly through the network. All energy is preserved and therefore it is also called a lossless feedback matrix. If the matrix is chosen with no zero coefficients, a faster increase of the echo density is achieved due to the recirculation through the many delays, i.e. a matrix with no zero coefficients gives maximum density without an increase in the number of delay lines. There are several matrix designs that fulfil these criteria, and in our reverb implementation we have chosen to use one of the Householder matrices. In addition to the advantage that this matrix is unitary and lossless, it also reduces the complexity of the implementation from O(N²) to O(N log N) for an N by N matrix.

The lengths of the delay lines should be chosen to be mutually prime. This is important to avoid superposition of the signals. If the lengths of the delay lines have one or more common factors, superposition will easily occur due to the recirculation in the FDN, which will not sound smooth to the ear.

2.2 Algorithm description

This subsection describes how the original standard algorithm works. Figure 5 shows the overall structure [4]. The N inputs of the FDN are fed with the sampled input signal x. The N outputs result in a single output y after being summed and filtered. Inside the FDN each delay line consists of a simple delay in series with a first-order IIR lowpass filter. Because the FDN is lossless, the lowpass filter is used as a dampener to set the reverberation time, i.e. how long it takes for the energy to dissipate from the FDN. The transfer function of this filter is

Hi(z) = gi (1 − ai) / (1 − ai z^(−1)),

gi = 10^(−3 Mi T / t60(0)),    ai = (ln(10) / 4) · log10(gi) · (1 − 1/α²),    α = t60(π/T) / t60(0)

where gi gives the desired reverberation time at DC and ai sets the reverberation time at high frequencies [2]. Here Mi is the length of delay line i in samples and T is the sampling period.
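A direct transcription of these formulas into C could look as follows; the variable names and the reverberation-time values are illustrative:

#include <math.h>

/* Sketch: absorption-filter coefficients for delay line i, following the
   formulas above. fs, t60_dc and t60_ny (the reverberation time at the
   Nyquist frequency) are illustrative inputs. */
void jot_lowpass_coeffs(int Mi, double fs, double t60_dc, double t60_ny,
                        double *gi, double *ai)
{
    double T     = 1.0 / fs;                    /* sampling period        */
    double alpha = t60_ny / t60_dc;             /* t60(pi/T) / t60(0)     */

    *gi = pow(10.0, -3.0 * Mi * T / t60_dc);    /* gain for the DC t60    */
    *ai = (log(10.0) / 4.0) * log10(*gi)        /* sets the high-freq t60 */
          * (1.0 - 1.0 / (alpha * alpha));
}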

The structure also contains multipliers bk and ck, also called gains. These are used to set the amplitudes of the early reflections and the output signals, respectively, to desired values.

Figure 5: The algorithm proposed by Jot

A tonal correction filter is needed for the late reverb. Since each delay line in the FDN uses a low-pass filter hk(ω) to set the reverberation time, it naturally also affects the characteristics of the frequency response. The tonal correction filter reverses this so that the frequency content is restored. Since the direct signal does not pass through the FDN, the tonal filter is not needed there. The tonal correction filter for the late reverb is

E(z) = (1 − b z^(−1)) / (1 − b),    where    b = (1 − α) / (1 + α),

with the same α as before [3].

2.3 Early reflections

The early reflections are important for building up and modelling a room with its characteristics. They can be achieved with different approaches.

Jot made an improvement to his first algorithm, called Jot2 (Fig. 6), which models the early reflections separately [4]. The gains and delay lengths belonging to the first echoes are handled by an FIR filter separate from the rest of the structure, which accordingly gives full control over the early reflections. The filter outputs are added to the signals in the FDN and hence also fed into the matrix and filters. Jot2 works fine for smaller rooms, but for larger rooms, when the inputs of the FDN are fed with the samples from the early reflections, we experienced that the reverb almost dried out before the next early reflection arrived, which made it sound like a lot of echoes.


Figure 6: Jot2, separate modelling of early reflections

Instead we chose to implement the early reflections as a part separate from the rest of the structure, and hence the early reflections are not fed into the FDN. Figure 7 shows the structure. Our approach requires an extra buffer (see predelay, Fig. 7), but the advantage is clearly that it makes it possible to adjust the early and the late reflections separately and independently of each other. Since the calculation of the early reflection parameters lies outside this project's scope, we used an external application to generate these parameters [5].

Figure 7: Our reverb structure


3 Implementation

3.1 Buffers

The buffers used in the FDN and for the early reflection have been imple-mented as circular buffers. This means that no elements in the buffers aremoved, instead it is the pointer to the address where new data are insertedthat moves. When the pointer moves outside the buffer it starts over at theother end of the buffer.

In the FDN there are buffers with two different tasks. Both got a pointerto the last element in the buffer that is updated every time a new value isadded to a buffer. First, there are the input buffers containing the inputdata to the feedback delay network. There are two input buffers, one foreach audio channel. Second, there are the buffers used inside the FDN. Thenumber of buffers used for each audio channel equals the dimension of theFDN matrix. These buffers contain the output from the network and areused by the low pass filters in the feedback delay network.

The buffers used for the early reflections are different from those in the FDN. They have an array of pointers instead of a single pointer. Each array of pointers keeps track of the delayed elements that are to be used, including one pointer that points at the last inserted value. The number of taps used is four times the dimension of the FDN matrix: one set for each audio channel and then two more sets crossing samples between the audio channels.

Figure 8: Structure of the buffers

3.2 Matrix multiplication

The decision to use a Householder matrix has made it possible to perform the matrix multiplication in the feedback delay network with only two multiplications, one for each audio channel being used. This is possible as the matrix can be written as a matrix with identical elements added to an identity matrix. In the implementation the FDN outputs are added together and then multiplied with the value of the elements in the matrix with identical elements. To update the buffers inside the FDN this product is added to each of the FDN outputs (representing the multiplication with the identity matrix) and inserted into the buffers.
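The following C sketch illustrates the idea, assuming the Householder feedback matrix I − (2/N)·11ᵀ that is commonly used with FDNs; the dimension N and the function name are our assumptions.

#define N 4                            /* FDN dimension (assumed) */

/* Multiply the FDN output vector by A = I - (2/N)*ones(N,N):
 * one sum, one scaling, and one addition per element. */
static void householder_feedback(const float out[N], float fed[N])
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < N; i++)
        sum += out[i];                 /* outputs added together */

    float shared = -2.0f / N * sum;    /* contribution of the identical elements */
    for (i = 0; i < N; i++)
        fed[i] = out[i] + shared;      /* identity part plus the shared term */
}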

This matrix implementation proved to give a substantial computation gain. We were able to move from an 8 kHz sampling rate to 32 kHz without an increase in the number of calculations. Later on, after we discovered the optimization build option, we increased the sampling rate to 48 kHz.

Figure 9: The matrix multiplication

3.3 Data types

All calculations in the implementation are made using float, but the input and output data of the system are of type unsigned short. The conversion between these has shown to be a bit sensitive. All input data is therefore first converted to short and then to float. An equivalent process is used for the output data, converting from float to short and then to unsigned short.

3.4 Memory management

Using the memory on the DSP card correctly was more time consuming than we first thought. With an early version of our program the cache memory was sufficient; the problems came later on as the implementation needed more memory. The problem was mainly to understand how to use the SDRAM memory correctly, as it was hard to find any good tutorials or instructions. Getting familiar with the memory settings in Code Composer Studio and using the MEM_calloc() function finally led us to the solution.


Figure 10: Impulse response of reverb algorithm sampled at 48 kHz

4 Results

A reverb algorithm can be viewed as a linear digital filter. In order to make the reverb algorithm sound natural, the impulse response of the reverb filter must resemble the impulse response of a real room. One way to determine the quality of the reverb is to examine its frequency response. A frequency response containing a lot of peaks indicates that the impulse response does not agree with the impulse response of a real room. Figure 10 shows the impulse response of the reverb and Fig. 11 shows the magnitude spectrum. In Fig. 10 one can easily see the early reflections (the beginning of the impulse response) and the late reverb. Notice that the impulse response decays in a smooth fashion. This is due to the fact that the delay lengths in the FDN are all mutually prime, which helps prevent superposition of samples circulating in the FDN. As can be seen in Fig. 11, the high frequencies are amplified, probably by the tonal correction filter, though this is not perceived by our ears.

Figure 11: Frequency response of reverb algorithm sampled at 48 kHz


The most important thing when evaluating the quality of the reverb is how it actually sounds. This is the criterion we have used when designing the reverb. Various pieces of music have been used in order to tune the reverb parameters. A very important detail when fine tuning the reverb is that the music being used does not contain any reverb of its own; if it does, the filtered result often sounds really bad.

5 Future work

Since we have used a trial and error method in choosing the parameters, the reverb can still be improved. One way to improve the reverb and try to make it sound more natural is to introduce variable delay lengths in the FDN. This means that the delays are multiplied with a time varying function, e.g. a sinusoid or the output from an unstable IIR filter. The function must of course have an offset in order to obtain positive delays. This approach introduces some implementation difficulties when it comes to the handling of the circular buffers.

When designing the reverb we have only experimented with the actual reverb algorithm. No user interface has been implemented. Allowing the user to change the reverb time in real time would have been a nice feature.

References

[1] Jot, J.-M., Acoustics, Speech, and Signal Processing, 1992 (ICASSP-92), 1992 IEEE International Conference on, Volume 2, 23-26 March 1992

[2] http://ccrma.stanford.edu/~jos/Reverb/Reverb_4up.pdf

[3] http://ccrma.stanford.edu/~jos/Reverb/Tonal_Correction_Filter.html

[4] Lilja, Ola, Algorithms for reverberation - theory and implementation, 2002 (Master Thesis, LTH)

[5] http://www.musicdsp.org/files/early_echo_eng.c



Part IV

Talking Orchestra - a vocoder approach based on cross synthesization

Pascal Collberg, Catia Hansen, Benny Johansson, Fia Lasson

Abstract

The aim of this study was to implement a "Talking Orchestra". There were two possible implementation alternatives, both based on vocoder techniques. The first alternative examined was the LPC approach, which was abandoned due to miserable results in the MATLAB prototype. The second approach was the cross synthesization version, which was completely implemented. In the cross synthesization implementation a closer look into subband filtering and envelope functions was a must. The second assignment was to port the MATLAB code to C code and a Texas Instruments DSP card. This process was problematic, but the final result was comparable with the MATLAB prototype. Further improvements could be to match the subband filters better or to let some voice subbands pass unchanged through the vocoder.


1 Introduction

In the course Algorithms in Signal Processors the students are supposed to implement an algorithm on a digital signal processor. This report, called Talking Orchestra after the course assignment, aims to explain how to make an instrument talk. To accomplish this goal the technique starts from analysis of two sounds, using the characteristics of one sound (for example speech) to modify the characteristics of another sound (for example an instrument).

Talking orchestra is in some way a misleading name, since it suggests multiple instruments talking simultaneously. Every instrument has a characteristic that makes its sound specific. In reality, implementing a real talking orchestra is a very complex problem since it requires each instrument to be separated from the others.

A simpler interpretation of talking orchestra is when only one instrument is used. To combine speech and the sound from an instrument, vocoding is a convenient method.

In this report two different vocoder structures are discussed, and after validation one of the methods was chosen for implementation. The method was implemented in C and transferred to the TI C6713 DSP platform.

The purpose of this report is to show how the characteristics of two different signals, speech and sound from an instrument, can be combined to make it sound like the instrument talks, and how to realize this on a Texas Instruments TMS320C6713 DSP.


2 Equipment

To solve this assignment the following equipment has been used:

• MATLAB version 6.5

• DSP-card Texas Instruments C6713

• Code Composer Studio

• Behringer microphone amplifier

• Shure microphone

The DSP-card is connected via the computer's USB port for data transmission. Via this connection binary source files are downloaded to the DSP-card. When the DSP-card is in run mode audio signals are passed from the line-in input to the headphones-out output. Since the goal of this project is to mix voice signals with instrument signals, two inputs are needed. This is accomplished by splitting the stereo signal into two mono signals (voice on the left channel and instrument on the right channel). To be able to control the gain of the microphone, a pre-amplifier is connected between the microphone and the DSP-card. To play instrument signals the sound recorder program in Windows is used.

Code Composer Studio is a powerful development environment whose use was mandated in this project. Special features used during the development of this application are the LOG window, used to print out control messages and values of variables, and the CPU-load graph, used to supervise the processor's working load.


Figure 1: Equipment arrangement


Figure 2: The letter ”AAAA” spoken by a woman.

3 Theory

To understand the method of the project it is important to understand fundamental theory about speech in order to be able to manipulate it. There are at least two ways to implement a talking orchestra: Linear Predictive Coding (LPC) and Cross Synthesization (CS). The theory behind each technique and the fundamentals of speech are presented below, starting with the "Speech" section.

3.1 Speech

Human speech is a mix of voiced and unvoiced sounds. To create a sound, air from the lungs is passed through the throat. When a voiced sound is to be created the vocal cords are tensed, resulting in a harmonic vibration. Unvoiced sound, on the other hand, is created when air passes unchanged through loose vocal cords; when the air is pressed out of the mouth it obtains a noisy characteristic. Each unvoiced sound has its own characteristic formed by the oral cavity. A spectrum investigation of the signal after the vocal cords shows two different plots. In the voiced case periodic notches appear at the keynote and its overtones, with flat amplitude. In the unvoiced case the plot shows only noise. The air stream passed through the vocal cords (called the carrier) is shaped by the oral cavity. This can be seen as a filter operating on the carrier, forming it into a specific sound; this is why it is called the formant [3]. Now the plots show a different characteristic: the spectrum is not flat anymore and its appearance depends on the letter being spoken. This is why humans can distinguish between different letters even though they actually have the same origin.


The vocal cords are also the reason why the voices of men and women differ slightly. A man has longer vocal cords, resulting in a lower resonance frequency, for example around 200 Hz for a bass and 500 Hz for a soprano. In figure 2 both the frequency notches and the formant shape are clearly visible. [1]

3.2 Characteristics of instruments

Musical instruments create sounds in a very similar way to the human voice. Instruments can be divided into two subgroups: string instruments and wind instruments. For string instruments the sound is generated by one or several strings oscillating, giving rise to frequencies resonating in the sounding body of the instrument. Wind instruments also have a sounding body, but instead air blown into the instrument makes a membrane vibrate.

3.3 Talking orchestra with LPC

LPC (Linear Predictive Coding) is a powerful tool to analyze speech. It is used in various applications such as mobile phone communications [6]. LPC models the oral cavity's transfer function by filter coefficients. These coefficients are calculated by the well-known Levinson-Durbin algorithm, in which different variables are calculated; the reflection coefficients are used to compute the AR parameters. This is a way to describe the oral cavity's transfer function. [2]

To render the LPC more efficient, highpass filtering is applied before extracting the AR coefficients. This is done to raise the spectrum of the signal, which in most cases is concentrated at low frequencies. [3]

In the LPC implementation the input signal is passed through an inverse filter of the AR parameters. The output of this filter, called the residual, represents the speech signal without the oral cavity's influence; its spectrum is flattened again.

Through speech analysis with the LPC algorithm it is possible to derive the oral cavity's transfer function. This approach diverges from the common procedure of mixing an instrument with speech using LPC. To make an instrument talk, the idea is to pass its carrier signal through the formant of human speech. This is performed by handling the two signals separately. The filter parameters are called LPC coefficients. Since LPC only acts on voiced sounds, an LPC-based vocoder is sensitive to the input signal. If the spoken sound is unvoiced, which is mainly noise, it is much harder to calculate any LPC coefficients [3]. In this case the LPC algorithm is often overridden by a monitoring algorithm and the signal is represented by noise characteristics.

By choosing the pitch from either the speech or the music and combining this with the characteristics from the other source it is possible to create the impression of a talking instrument [2].


3.4 Talking orchestra with CS

The Cross Synthesization (CS) vocoder deals with the carrier-formant notion in a slightly different way than the LPC vocoder. CS does not handle the signals as they are created by the vocal cords and the oral cavity or the instrument's body. Instead both input signals are divided in frequency into several subbands. Every subband of the carrier is then shaped by information from the corresponding subband of the formant signal. To achieve a correct output signal the shaped carrier subband signals are added together. The signal information from the formant that is used to shape the carrier is the amplitude; it is actually the formant's amplitude function that is multiplied with the carrier. Historically, when the CS vocoder was implemented in analogue circuits, this multiplication was performed by voltage controlled amplifiers (VCAs), but in today's digital computers this is a very simple task. [4]

Computing an envelope essentially amounts to calculating a mean value of the signal [8]. This could be done with the very simple formula seen in the equation below.

y(n-1) = \frac{x(n) + x(n-2)}{2} \qquad (1)

To extract the carrier's amplitude function the envelope is used. Given that the DSP-card represents the signals numerically, a basic amplitude envelope where the output signal is the absolute value of the input signal can be used. The problem with such a simple envelope is that high frequency noise occurs at the output, so the envelope needs modification. Improvements can be made in two ways: using a lowpass filter on the output signal, or integrating an exponential function into the envelope algorithm, giving a smoother output signal. In this application a function called envelope performs the envelope calculations. In Table 1 the structure of the algorithm is clarified.


Table 1

Step  Explanation
1     The function has two in-parameters, "minne" and "envIn". "Minne" is a vector where the old values are stored and "envIn" is the new input value.
2     First "envIn" is divided by 8000. This is done because the input samples are in short format and need to be translated to float format between 0 and 1; the specific value 8000 has been determined empirically.
3     Envelopes are always positive. Therefore, if "envIn" is negative the value is multiplied by −1 so it is always positive.
4     The envelope output ("minne") only depends on whether the new input is larger or smaller than the previous output. If "envIn" is larger than or equal to "minne", then "minne" = "minne" + "minne" · 0.04, where 0.04 is an empirical value. If "envIn" is less than "minne", then "minne" = "minne" − "minne" · 0.01, where 0.01 again is an empirical value. In this way the envelope function generates a smooth output signal that follows the input maximum amplitude very well.
5     Finally the algorithm checks that the value is neither too small nor too big. If "minne" is smaller than 0.001 it is set to 0.001, and if "minne" is larger than 1 it is set to 1.
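A C sketch of the envelope function following the steps in Table 1 might look as follows; the project's actual code is not reproduced here, and treating "minne" as one state value per subband is our assumption.

/* Envelope follower following the steps in Table 1.  The names mirror the
 * report: "minne" is the state (memory) for one subband, "envIn" the input. */
static float envelope(float *minne, short envIn_raw)
{
    float envIn = (float)envIn_raw / 8000.0f;   /* step 2: short -> roughly 0..1 */
    if (envIn < 0.0f)
        envIn = -envIn;                         /* step 3: envelopes are positive */

    if (envIn >= *minne)
        *minne += *minne * 0.04f;               /* step 4: rise, empirical 0.04 */
    else
        *minne -= *minne * 0.01f;               /* step 4: fall, empirical 0.01 */

    if (*minne < 0.001f) *minne = 0.001f;       /* step 5: keep state in [0.001, 1] */
    if (*minne > 1.0f)   *minne = 1.0f;

    return *minne;
}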

Subband filtering is done by filtering the same input signal with several different filters. In this application a decision was made to use eight subbands [4]. Since human speech carries most of its information in the band between 200 and 500 Hz, more precise measures should be taken in this region, i.e. narrower filters, while subband filters outside this region can be less precise. Subband technology is based on bandpass filters that divide a signal into smaller subsignals.

Adding the subsignals together will result in the original signal, with the only distinction that the phase may be distorted, but containing the same frequency information. Creating a bandpass filter is a very simple task in theory: in the frequency domain all that is needed is to cut out the important part, see figure 4.

Unfortunately this is not possible to carry out, because such sharp spectral edges transform into infinite ripples in the time domain. An attempt to implement such a filter in the time domain leads to infinite filter lengths.

Figure 3: Picture shows subbands in the frequency domain

Figure 4: A bandpass filter can be modelled by cutting out a part with the width w from the signal

Since it is impossible to use filters of infinite length it is necessary to round off the edges of the spectrum. This is done with a window function. In this application the Hamming window was chosen because of its favourable properties: the main lobe has sharp edges and the side lobes are significantly attenuated. Applying a window function to a signal means that they are multiplied in the time domain. The Hamming window time function is shown in the equation below.

w_H(n) = 0.54 - 0.46 \cos\left(\frac{2\pi\left(n - \frac{M-1}{2}\right)}{M-1}\right) \qquad (2)

In the equation M represents the filter length. The bandpass filters are then created by a lowpass and a highpass filter connected in series. A typical lowpass filter is implemented by the following function:

h_{Lp}(n) = 2 f_1 \, \frac{\sin\left(2\pi f_1\left(n - \frac{M-1}{2}\right)\right)}{2\pi f_1\left(n - \frac{M-1}{2}\right)} \qquad (3)

where f_1 is the cut-off frequency and M the filter length. A lowpass filter of this kind is essentially a sinc function. By transposing the lowpass filter up to a higher frequency it will act as a bandpass filter. This is done by multiplying it with a cosine. This application used the following equation:

h_{Hp}(n) = 2 \cos\left(2\pi f_0\left(n - \frac{M-1}{2}\right)\right) \qquad (4)

where M is the filter length and f_0 the centre frequency. By altering the parameter f_0 the bandpass filter can be moved along the frequency axis, see figure 5; the f_1 parameter controls the bandwidth. [1]

Through empirical methods eight subband filters were developed; Table 2 displays the cut-off frequencies of the different filters.

Table 2

Subband   f0 − f1    f0 + f1
   1         0        250 Hz
   2       175 Hz     475 Hz
   3       410 Hz     690 Hz
   4       650 Hz    1000 Hz
   5       980 Hz    1550 Hz
   6      1500 Hz    2315 Hz
   7      2245 Hz    3465 Hz
   8      3370 Hz      –
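As a sketch of how such a filter could be designed directly in C (the project used MATLAB for this), the function below combines the windowed-sinc lowpass of Eq. (3), a standard-form Hamming window in the spirit of Eq. (2), and the cosine transposition of Eq. (4). Frequencies are normalised to the sampling rate and the function name is our own.

#include <math.h>

#define PI 3.14159265358979323846

/* Design one bandpass FIR of length M, centred at f0 with half-bandwidth f1
 * (both normalised to the sampling rate, i.e. 0..0.5). */
static void design_bandpass(float *h, int M, double f0, double f1)
{
    int n;
    for (n = 0; n < M; n++) {
        double t  = n - (M - 1) / 2.0;
        double w  = 0.54 - 0.46 * cos(2.0 * PI * n / (M - 1));   /* Hamming window */
        double lp = (t == 0.0) ? 2.0 * f1                        /* sinc limit at t = 0 */
                               : sin(2.0 * PI * f1 * t) / (PI * t);
        h[n] = (float)(w * lp * 2.0 * cos(2.0 * PI * f0 * t));   /* transpose to f0 */
    }
}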

If the speech signal is mixed with a carrier signal from a synthesizer it generates the well known "robot voice" effect. Since this effect was discovered it has been used a lot by various music artists such as Kraftwerk and Cher.

A very important point is that the carrier-formant relation in a CS vocoder is inverted compared to the LPC technique. In CS vocoders the voice is the formant that shapes the instrument's amplitude, while the LPC vocoder on the other hand uses the instrument's body to form the speech


Figure 5: By multiplying a signal with a cosine the basic frequency is transposed

Figure 6: Frequency responses of subband filters 1-4 (|H(f)| in dB versus f in Hz)


Figure 7: Frequency responses of subband filters 5-8 (|H(f)| in dB versus f in Hz)


signal. Another main difference between the two vocoders is that the CS vocoder was originally implemented in analogue circuits. The LPC vocoder, on the other hand, only exists in digital form, due to its adaptive algorithm characteristic, which is impossible to implement in analogue circuits.

4 Method

First approach (LPC)

• Elicitation: The choice of technique was governed by the guidelines for this project. Since focus on an LPC approach was stated in the project description, this approach was accepted instantly, without considering any other techniques.

• Analysis and specification: A small booklet was provided before the project started. It contained a brief description of an implementation using LPC. Additional research was performed to achieve a deeper understanding of the method, which was necessary to really exploit all its possibilities.

• Implementation and verification: It was possible to recycle MATLAB code from a laboratory exercise in a previous course. With small modifications the new code could extract the residual and gamma parameters (representing the oral cavity), and processing this data led to an audible result of an LPC vocoder in action. To verify the result, two independent implementations were made.

• Validation: The LPC method was dropped after validation since it showed large flaws in sound quality in both subgroups' implementations. The absence of a commercial LPC vocoder made it impossible to make any comparison and convince the group of the technique's ability to function properly as a vocoder.

Second approach (CS)

• Elicitation: Searching the internet for an alternative solution resulted in another technique known as CS. Though found in the form of an analogue circuit diagram, it provided a fun challenge to realize as a digital implementation. [4]

• Analysis and specification: Before introducing this technique it was necessary to understand how amplitude envelopes, subband filters, and VCAs work. Consulting a teacher led to a recommendation of an old master's thesis explaining one solution to amplitude envelopes. Refreshing old knowledge about subband filters was not enough to understand how to implement these filters efficiently in C code, so the tutor helped out with a compendium about filter derivation and an extra informative lecture. All this information was necessary to make the change of approach possible. Concerning the VCA, working digitally simplified things substantially: the whole component could be replaced with a simple multiplication operation.

• Implementation and verification: Before any implementation was performed, a sketch of the signal scheme was drawn to derive the overall program structure. This simplified the selection of variable names and resulted in code that is easy to grasp. Once tested and verified in MATLAB, the functions were implemented on the hardware with Code Composer Studio.

• Validation: Since the result is in the form of speech mixed with music, the best instrument in the world to judge quality was close at hand (the human ear). Showing clear results using plots is very hard since the major focus is on sound. [7]

5 Conclusions

5.1 Problems and Solutions

During the implementation phase several problems were encountered. Below the most important ones are mentioned, together with how they were solved. The first draft turned out to have a lot of distortion and noise when the formant was mixed with the carrier in the VCA function. The subbands and the envelope function seemed to work, although the noise remained. To solve this problem a thorough review of all the code was necessary. The MATLAB code was rewritten in a more C-friendly way to ensure that both implementations performed the same task. MATLAB plots were used to check that the filter coefficients were correct; the plots pointed out several errors and the code could be corrected.

When reimplemented in C another problem occurred: notches appeared in the left output channel. This turned out to be caused by the output variable being cleared every time the run time system left the process function. By declaring this variable static the problem disappeared. Up to this point the program had only treated the signal in one band.

The next step in the development was to split the signal into several subbands. This caused yet more noise and distortion, which almost led the project back to square one! Fortunately some simple transformations of scalar variables to vectors brought the project back on track.


Another annoying problem occurred at the USB port on the DSP-card. Every time the plug was moved it caused CCS to crash; apparently the plug was poorly fitted. CCS also seems to be a sensitive program, as it crashed when MATLAB or Windows Media Player were running at the same time.

The major advantage of CS is its simple structure, consisting of only three basic units: subband filtering, envelope and VCA. The simplicity made it possible to derive every part from scratch, which led to an increased understanding. Deriving each function gave an opportunity to exploit knowledge from earlier courses. Problems were encountered when deducing the subband filter and the envelope function.

5.2 Results

The goal of this assignment was to implement a functional vocoder with competitive qualities on a Texas Instruments processor. There are some difficulties with presenting results in a written report, since the main results are sounds, and plots of frequencies cannot represent sound quality or the desired effect of a vocoder. In view of the fact that this application is intended for music, quality is best judged with the ear.

The first approach, with the LPC implementation, showed unsatisfying properties already at the verification stage. Although only tested on vowels, it was hard to hear differences in the modified signal, and noise was suspected to come from a poorly implemented algorithm.

Since the output from both MATLAB and the DSP card sounded similar, the second approach was approved at the validation stage. An audible result was achieved; however, it left more to be desired. In the present state it is very hard to understand the spoken words; an improvement would be if the speech came through more clearly. It is possible to obtain more speech characteristics by sending the speech signal from the high frequency subbands straight through the application, without passing through either the envelope function or the VCA.

5.3 Acquired knowledge

In this project the group members learned:

• how to derive filters using only trigonometric functions such as the sinc function.

• how to apply a window function to obtain a smooth, strong filter.

• to trust their already acquired knowledge instead of copy-pasting from known techniques. With support from the teachers the group members felt confident to try out their own ideas (such as the envelope function).

52 Talking Orchestra - a vocoder approach based on cross synthesization

• producing code in C for an embedded system.

• about Code Composer Studio and more about MATLAB.

• the importance of well structured and commented code, for ease of understanding and debugging.

• more about writing technical reports.

• creating and following a realistic time plan.

• how to go from theory to representative implementation.

• more about how DSPs work and what to consider when working in real time.

Greater interaction with the other groups could have yielded better results for both parties, but because of the time plan (late working hours due to a substantial load from other courses) the group never met the other groups during implementation. All the group members are thankful for the support given by Martin Stridh, Frida Nilsson and Bengt Mandersson, and believe these three persons made the project more interesting and a bit easier.


Figure 8: Top: frequency plot of the spoken sentence "Detta ar en test signal!" ("This is a test signal!"). Bottom: filtered spectrum, left from the DSP implementation, right from the MATLAB implementation.


Figure 9: Top: time domain plot of the spoken sentence "Detta ar en test signal!". Bottom: envelope, left from the DSP implementation, right from the MATLAB implementation.


Figure 10: Left column shows characteristics from speech and right column shows characteristics from the carrier music. Top pictures show time domain plots, middle pictures show the spectra after filtering with subband 3, bottom pictures show (left) the envelope from speech and (right) the output after the VCA.


References

[1] Mandersson, Bengt, "Tillampad elektronik, Tidsdiskreta kretsar och signaler D (ETT021)", 1998, Institutionen for tillampad elektronik, Lunds Tekniska Hogskola

[2] Monson Hayes, "Statistical digital signal processing and modeling", John Wiley, 1996. ISBN: 0471594318

[3] Stridh, Martin, 2003, "Optimal Signalbehandling Labbhandledning", Department of Electroscience, Lund University, Sweden

[4] http://www.paia.com/vocodwrk.htm - 2005-02-13

[5] Mitra, "Digital Signal Processing. A Computer Based Approach", 2001, McGraw-Hill, ISBN: 0-07-118175-X

[6] Cook, Perry R., "Real Sound Synthesis for Interactive Applications", 2002, ISBN 1-56881-168-3

[7] Lauesen, Soren, Software Requirements - Styles and Techniques, Addison-Wesley, ISBN 0-201-74570-4, 2002.

[8] http://www.tvms.net/Tech Articles/Synchronous vs Envelope Detection.htm


Part V

Voice Alteration using LPC-analysis

Magnus Gunnarsson, Anders Frick, Joakim Persson, Johan Runnberg

Abstract

What we want to show in this report is how you can alter your voice using Linear Predictive Coding (LPC) analysis. LPC has been used in the GSM networks since the middle of the 1980's and is therefore a well-known method that works well and is very robust. What LPC in short does is that it tries to predict output samples based on a linear combination of previous samples, and in this way create an approximation of the input signal. The reason why we are using LPC is that it analyses the signal in a way that needs a relatively small amount of computing power, which makes it a very fast and powerful tool for a real-time application. All of the signal processing was done on a separate DSP-card from Texas Instruments because of its custom design for this kind of application.


1 Voice Alteration

Figure 1: A simple flowgraph describing the signal processing

A very simplified model of how the voice can be altered using Linear Predictive Coding analysis is shown in figure 1.

1.1 The purpose of this project

This report shows how you can alter the voice of a person. This should be done in real time, so a fast and simple method to process the voice is required. One way of doing this is by using Linear Predictive Coding (LPC) analysis, which is a method that fulfills these criteria. The signals in this report were sampled at an 8 kHz sampling rate, and all of the signal processing was done on a separate DSP-card from Texas Instruments because of its custom design for this kind of application. All code is written in C and used in a specially developed environment, also from Texas Instruments, called Code Composer Studio.

1.2 What is LPC?

Linear Prediction is a way to, with the help of spectrum analysis, predict the output samples from a combination of previous samples, and in this way create an approximation of the input signal. LPC makes a model of the signal using an AR process, i.e. an all-pole filter. The all-pole filter models the spectrum with a few formant (energy) peaks. Such a model can make a good approximation of the human voice and of some musical instruments.

1.3 From input to output

On the input we get a real-time signal, e.g. a human voice, that is to be altered. The idea is to find an all-pole model, filter the signal through the inverse of the all-pole filter, i.e. a FIR filter, and in this way reduce the voice information in the signal. Then the all-pole filter is changed by moving the poles, and finally the original signal is altered by sending the filtered signal through the modified filter to the output. Figure 2, shown below, is a flowgraph describing how the signal is handled from the input all the way to the modified output.


Figure 2: Flowgraph describing the signal processing in every step of the process

1.3.1 Highpass filter

The first thing that happens to the input signal is highpass filtering. This step is done to give the frequency spectrum a more useful form, where all the frequencies get closer in magnitude. Doing this makes the further modelling of the signal more effective. The highpass filter is a simple filter, seen in equation 1.

H_{HP}(z) = 1 - 0.98 z^{-1} \qquad (1)

When this filtering is done the signal is now ready to be processed.

1.3.2 The allpole filter

The all-pole filter that will be used to model the signal is of order two and looks like equation 2.

V(z) = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}} \qquad (2)

To be able to calculate this all-pole filter, the first thing needed is to estimate the correlation coefficients.

1.3.3 Correlation

Correlation and correlation coefficients are much used in signal processing. The correlation coefficients describe the relation between two variables and are used to predict the next sample in a set of samples by weighting old samples. In intervals of 128 samples (16 ms), new correlation coefficients are calculated to update the all-pole filter modelling the signal. The order of the all-pole filter is only two, so only 3 coefficients need to be calculated. This is easily done using equation 3, where the first correlation coefficient is normalized to one and where x is a vector of samples. [1]

r_x(k) = \sum_{i=k}^{127} x(i)\, x(i-k) \qquad (3)
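A direct C translation of Eq. (3) could look like the sketch below; the 128-sample frame length follows the text, while the function name is ours.

#define FRAME_LEN 128   /* samples per block (16 ms at 8 kHz) */

/* Estimate the first p+1 autocorrelation values of one block of samples,
 * following Eq. (3); r[0] can be used for normalisation afterwards. */
static void autocorr(const float x[FRAME_LEN], float *r, int p)
{
    int i, k;
    for (k = 0; k <= p; k++) {
        float sum = 0.0f;
        for (i = k; i < FRAME_LEN; i++)
            sum += x[i] * x[i - k];
        r[k] = sum;
    }
}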

1.3.4 Schur recursion

The sequence of correlation coefficients is now used to calculate the corresponding sequence of reflection coefficients (Γ). These are used because of their lower sensitivity to truncation and roundoff errors, making the precision in further calculations much better. The Schur recursion [2] in its whole can be seen in Table 1.

Table 1: The Schur Recursion

1. Set g_0(k) = g_0^R(k) = r_x(k) for k = 0, 1, ..., p

2. For j = 0, 1, ..., p-1

   (a) Set \Gamma_{j+1} = -g_j(j+1) / g_j^R(j)

   (b) For k = j+2, ..., p:  g_{j+1}(k) = g_j(k) + \Gamma_{j+1}\, g_j^R(k-1)

   (c) For k = j+1, ..., p:  g_{j+1}^R(k) = g_j^R(k-1) + \Gamma_{j+1}^*\, g_j(k)

3. \varepsilon_p = g_p^R(p)

Magnus Gunnarsson, Anders Frick, Joakim Persson, Johan Runnberg 61

Figure 3: The equivalence between the correlation sequence, the reflection coefficients and the all-pole model

1.3.5 Calculate the a-coefficients using Step-Up Recursion

As shown in figure 3, the way to obtain the wanted a-coefficients (a_2(1), a_2(2)) of the all-pole filter (1/A(z)) is to use the Levinson-Durbin step-up recursion [3] shown in Table 2, with the help of the already calculated reflection coefficients Γ.

Table 2: The Step-up Recursion

1. Initialize the recursion: a_0(0) = 1

2. For j = 0, 1, ..., p-1

   (a) For i = 1, 2, ..., j:  a_{j+1}(i) = a_j(i) + \Gamma_{j+1}\, a_j^*(j - i + 1)

   (b) a_{j+1}(j+1) = \Gamma_{j+1}

3. b(0) = \sqrt{\varepsilon_p}
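A possible C sketch of the two recursions for the real-valued, second-order case used in this project is shown below; the names and the temporary-array bookkeeping are our own choices, not the project's code.

#define P 2   /* model order used in the project */

/* Reflection coefficients (Schur recursion, Table 1) followed by the
 * a-coefficients (step-up recursion, Table 2), from autocorrelations r[0..P].
 * Real-valued signals are assumed, so the conjugates reduce to plain values. */
static void schur_stepup(const float r[P + 1], float gamma[P + 1], float a[P + 1])
{
    float g[P + 1], gr[P + 1], old[P + 1];
    int j, k, i;

    for (k = 0; k <= P; k++) {        /* step 1: initialise both sequences */
        g[k]  = r[k];
        gr[k] = r[k];
    }

    for (j = 0; j < P; j++) {         /* Schur recursion, step 2 */
        float gam = -g[j + 1] / gr[j];
        gamma[j + 1] = gam;

        for (k = 0; k <= P; k++)      /* keep g_j, needed for the g^R update */
            old[k] = g[k];
        for (k = j + 2; k <= P; k++)
            g[k] = old[k] + gam * gr[k - 1];
        for (k = P; k >= j + 1; k--)  /* downwards so gr[k-1] is still the old value */
            gr[k] = gr[k - 1] + gam * old[k];
    }
    /* gr[P] now equals the modelling error epsilon_p */

    a[0] = 1.0f;                      /* step-up recursion */
    for (j = 0; j < P; j++) {
        for (i = 1; i <= j; i++)
            old[i] = a[i] + gamma[j + 1] * a[j - i + 1];
        for (i = 1; i <= j; i++)
            a[i] = old[i];
        a[j + 1] = gamma[j + 1];
    }
}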

1.3.6 Calculate the roots in A(z) and filter the signal

To calculate the roots of this second order FIR filter, the well-known p-q formula can be used; see equation 4.

62 Voice Alteration using LPC–analysis

p = a_2(1), \quad q = a_2(2)

x^2 + p x + q = 0 \;\Rightarrow\; x = -\frac{p}{2} \pm \sqrt{\left(\frac{p}{2}\right)^2 - q} \qquad (4)

It is very important to check that the roots are stable, i.e. inside the unit circle, and if not, move them towards the origin until they are. When a stable filter is achieved the next thing to do is to actually filter the signal through A(z).
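A sketch in C of this root calculation and stability check might look as follows; the safety radius 0.99 is an assumption of ours, since the report only says that unstable roots are moved towards the origin until they are stable.

#include <math.h>

/* Move a pole towards the origin if it lies on or outside the unit circle. */
static void force_stable(float *re, float *im)
{
    double rad = sqrt((double)(*re) * (*re) + (double)(*im) * (*im));
    if (rad >= 1.0) {
        double s = 0.99 / rad;              /* assumed safety radius */
        *re = (float)(*re * s);
        *im = (float)(*im * s);
    }
}

/* Roots of x^2 + p*x + q with p = a2(1), q = a2(2), following Eq. (4). */
static void pq_roots(float p, float q,
                     float *re1, float *im1, float *re2, float *im2)
{
    double disc = (p / 2.0) * (p / 2.0) - q;
    if (disc >= 0.0) {                           /* two real roots */
        *re1 = (float)(-p / 2.0 + sqrt(disc));   *im1 = 0.0f;
        *re2 = (float)(-p / 2.0 - sqrt(disc));   *im2 = 0.0f;
    } else {                                     /* complex-conjugate pair */
        *re1 = *re2 = (float)(-p / 2.0);
        *im1 = (float)sqrt(-disc);
        *im2 = -(*im1);
    }
    force_stable(re1, im1);
    force_stable(re2, im2);
}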

1.3.7 Create an allpole model 1/A(z), modify and filter the residual through 1/Amod(z)

If this were a re-creation of the signal, it would just be sent through the all-pole filter 1/A(z). In this case an alteration of the signal is made to make it sound a bit different. By changing the placement of the poles in the filter there will be a displacement of the formants in the signal frequency spectrum. This gives an amplification of other frequencies than in the original signal. The interesting part is to change the angle of the pole without changing its distance to the origin. This can be done with equations 5, 6 and 7, using the information in figure 4.

Figure 4: An illustration showing the movement of the poles

r = |pole| = \sqrt{\mathrm{Re}(pole)^2 + \mathrm{Im}(pole)^2} \qquad (5)

\alpha = \arctan\left(\frac{\mathrm{Im}(pole)}{\mathrm{Re}(pole)}\right) \qquad (6)

\text{New pole:}\; z = r\cos(\alpha + \varphi) + i\, r\sin(\alpha + \varphi) \qquad (7)

If the pole has an imaginary part there is always a complex conjugated root. Therefore it is only necessary to change the sign of the imaginary part of the first calculated pole to obtain the conjugated one. An example of a pole placement, in this case with eight poles, is shown below to the left in figure 5. To the right, two pole pairs have been moved according to the theory in equations 5, 6 and 7 above.

Figure 5: To the left an example of a pole placement, and to the right two pole pairs have been moved

With the new pole placements a new polynomial for the all-pole filter (1/Amod(z)) can be calculated. To re-create the signal with some alteration, the residual is filtered through the modified all-pole filter.
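A C sketch of the pole modification and the rebuilding of the second-order denominator could look like this; atan2 is used instead of a plain arctangent so that poles in the left half plane are handled, and the names are ours.

#include <math.h>

/* Rotate a complex-conjugate pole pair by phi radians (Eqs. (5)-(7)) and
 * rebuild the coefficients of 1/Amod(z) = 1/(1 + a1*z^-1 + a2*z^-2).
 * For a conjugate pair at radius r and angle beta:
 * a1 = -2*r*cos(beta) and a2 = r*r. */
static void rotate_pole_pair(float re, float im, float phi,
                             float *a1_mod, float *a2_mod)
{
    float r     = sqrtf(re * re + im * im);   /* Eq. (5): radius is kept */
    float alpha = atan2f(im, re);             /* Eq. (6): original angle */
    float beta  = alpha + phi;                /* Eq. (7): new angle */

    *a1_mod = -2.0f * r * cosf(beta);
    *a2_mod = r * r;
}

According to the results section, a rotation of about 30 degrees (phi ≈ 0.52 rad) gave the clearest audible effect.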

1.3.8 Lowpass filter

The last thing to be done before listening to the output signal is to reverse the effect of the highpass filter that was applied to the signal in the beginning. The easiest way of achieving this is to invert the highpass filter, but since the highpass filter has a zero at 0.98 there can be problems with stability. To avoid this, the inverse filter was created as in equation 8.

H_{LP}(z) = \frac{1}{1 - 0.95 z^{-1}} \qquad (8)


2 Implementation

The main program, which is written in C, has four variables for the common user. Their purpose is to change the modes in the code in an easy way. Two of them are used for changing the angle and magnitude of the poles, and the other two, called sound left and sound right, are used for changing the output signal from the DSP-card. The latter ones can be set according to the following schedule:

Variable for changing the sound output.

• 0: Original

• 1: Silent

• 2: Highpass filtered

• 3: Residual

• 4: Altered (end product)

• 5: Lowpass filtered

Recommended settings are 4/0.

The sound samples are put in a vector with every other sample belonging to the right and the left channel, respectively. This must be taken into consideration when calculating the correlation. In this project all calculations are based on the signal in the left channel alone.

Processing the signal is then done using different C-code functions that operate according to the theory in chapter 1.

The DSP-card can only manage an output of ±32000. The program therefore checks that the signal does not exceed this value before it is sent to the output. If it does, it is set to ±31999 and a warning is written in the LOG window.


3 Results and conclusions

3.1 Results

According to the theory in chapter 1, when altering a signal the formants should move depending on the change in pole placement. With a positive change the formants are moved up in frequency, amplifying the treble, and with a negative change the formants are moved down, amplifying the bass.

Figure 6: Frequency spectrum of a signal sent straight through the DSP-card without alteration

Comparing figure 6 and figure 7, the formants are clearly moved from lower to higher frequencies, making the signal sound a bit different, reducing the bass and amplifying the treble.

After a lot of tests it was found that a change of 30 degrees gave the best result in terms of audible differences.

Figure 7: Frequency spectrum of a signal sent through the DSP-card with a 30 degree change in the pole placement

3.2 Conclusions

The goal of the project was to alter a voice, such that it would be difficult to recognize the speaker, using an 8th order model. This turned out to be hard for several reasons. In MATLAB it was easy to simulate the change in pole placement, but when this was implemented in C the workload of calculating the roots of the 8th order A(z) polynomial became too much for the DSP-card to handle. With this limitation the model had to be cut down to 2nd order instead. This made it impossible to reach the original goal of altering the voice enough to make it unrecognizable.

Because of the use of a 2nd order model it was necessary to add a bandpass filter when calculating the correlation coefficients. The reason for this was that only the first formant was of any use with this model, considering that it contains the keynote. This also made the highpass and lowpass filters unnecessary.

To obtain better stability when altering the poles, they were also scaled down by a factor of 0.7.

The implementation in C was very time consuming and numerous problems were encountered, mostly concerning the function of the different operations in the signal processing.


Working with this project has given an insight into how voice alteration works, but also into how difficult it can be to make an efficient and well working algorithm for signal processing.


References

[1] Monson H. Hayes, Statistical Digital Signal Processing and Modeling, John Wiley & Sons, Inc 1996, page 194

[2] Monson H. Hayes, Statistical Digital Signal Processing and Modeling, John Wiley & Sons, Inc 1996, page 245

[3] Monson H. Hayes, Statistical Digital Signal Processing and Modeling, John Wiley & Sons, Inc 1996, page 233


Part VI

Subtractive Synthesis on the TMS320C6713 DSP card

Daniel Camera

Abstract

This project was aimed at programming a simple synthesizer based on the subtractive synthesis method. The target hardware was the TMS320C6713 Evaluation Module, which is a Texas Instruments DSP on a board with several peripherals: A/D converter, external memory, USB communications, power supply, etc. The programming interface was TI's Code Composer Studio and the programming language was C++. The musical tracks were acquired through the transmission of MIDI files from the computer to the board, and the notes were played with the aid of the processor's PIP buffers and its interrupt service routines. The musical waveforms were also enhanced with frequency modulation and an amplitude envelope.


1 Introduction

Modern synthesizers can be based on a wide variety of methods: subtractive synthesis, FM synthesis, wavetable synthesis, and additive synthesis among others. The first two are very common, and the focus of this project is on the first method. Subtractive synthesis was traditionally used in analog synthesizers and is still very popular [2]. The Theory section gives a brief description of it.

The hardware used in the project was Texas Instruments' TMS320C6713 Evaluation Module, connected to a PC via USB in an LTH computer lab. On the software side, the interface was Code Composer Studio, and additionally MATLAB was very important for designing and testing different filters and signal sources.

The synthesized instruments of this project don't have a very convincing sound, but nonetheless the work was a great exercise in real-time programming and debugging. Creating rich musical tones with a computer or DSP is a challenging task.


2 Theory

2.1 Subtractive Synthesis

This type of synthesis consists in sculpting the spectrum of a certain signal source to obtain sounds that resemble musical instruments, voice or something else. As a simplification, the phase spectrum of the signal to be shaped is assumed to be unimportant. Its amplitude spectrum should be rich in harmonics so as to have more possibilities and to add character to the sound. Suitable waveforms for subtractive synthesis are square, triangular and sawtooth waves, pulse signals and noise. A nice addition is modulation of parameters, the most prominent ones being frequency and duty cycle.

This method will in principle produce stationary sounds, as the sound is taken from the steady-state output of the filters. Several techniques can be used in order to make the sound more natural and interesting:

• Amplitude envelopes, with attack, decay, sustain and release.

• Frequency modulation to simulate vibrato, or just to make the sound less synthetic.

• More advanced techniques like time-varying filters and all-pass filters (for phase manipulation) [1].

On the practical side, subtractive synthesis is often regarded as an efficient synthesis method. It is probably less computationally intensive than additive synthesis, mostly because the signal source is so easy to generate.

2.2 MIDI

MIDI stands for "Musical Instrument Digital Interface" and is a hardware and software specification that allows communication between electronic musical instruments. The hardware specification is an RS232-like serial interface, and the software side of the specification is the actual language of the data being sent: its structure, order and other characteristics.

An essential aspect of MIDI data is that it doesn't contain sampled sound; instead it is a "series of commands or instructions which represent musical performance-related events" [4].

2.2.1 MIDI files

A MIDI file consists of "chunks" of data, which are just blocks of bytes. There are two types of them: header and track chunks. Every file must have one header chunk and one or more track chunks. The data is stored with its most significant bit first.


The block of header data specifies global properties of the file:

• The format of the file, which can take the values 0, 1 and 2. In this project, format 0 was used. This means that the MIDI file contains just a single track chunk, which can potentially hold multi-channel data.

• The number of track chunks in the file.

• A variable called tickdiv which represents the timing interval to be used. This timing can be metrical (tempo related) or absolute (given in seconds).

The track chunk (for format 0) is comprised of the following variables:

• A variable-length delta-time quantity that stores the time since the previous event.

• At least 2 bytes that denote the event (MIDI, SysEx and Meta).

MIDI events contain the musical commands, with the channel to be played and the MIDI message. SysEx stands for System Exclusive and refers to escape events and other events. Meta events consist of information that would not be in the MIDI stream, like TrackName, Text, KeySig, etc.
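Delta-times and some other fields are stored as MIDI variable-length quantities: seven data bits per byte, with the top bit set on every byte except the last. A small C sketch of a reader for such a quantity, with names of our own choosing, is:

/* Read a MIDI variable-length quantity (e.g. a delta-time) from buf,
 * starting at *pos; *pos is advanced past the bytes that were read. */
static unsigned long read_varlen(const unsigned char *buf, int *pos)
{
    unsigned long value = 0;
    unsigned char byte;

    do {
        byte  = buf[(*pos)++];
        value = (value << 7) | (byte & 0x7F);   /* append 7 data bits */
    } while (byte & 0x80);                      /* top bit set: more bytes follow */

    return value;
}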


3 First stages of the project

When faced with the task of executing subtractive synthesis to generate instrument-like sounds, I thought that the most important part was to generate the filters for each different note. This could be done manually by comparing the spectrum of the input signal and the signal to be modelled, and then designing bandpass filters (also called 'multiband filters') to boost or attenuate selected harmonics.

The input signal was chosen to be a square wave with variable duty cycle in order to have almost all harmonics available. If the duty cycle were fixed at 50% then only odd harmonics would be possible. On the other hand, many instruments lack very high frequency harmonics, so the addition of a suitable lowpass filter is essential.

The prospect of manually designing multiband filters for every note, octave, and instrument inspired the use of adaptive filters that could 'automatically' shape a square wave's spectrum to match a desired instrument. The system for this analysis is shown in Figure 1:

Figure 1: Block diagram of the Adaptive setup

The adaptive filter can be based on the LMS algorithm or on the Normalized LMS algorithm. After some adaptation time the y(n) sequence matches the desired signal d(n) with a small, step-size dependent error. The x(n) input is correlated with d(n) if the fundamental frequencies are equal, and then we can expect the error to decrease in time. Nonetheless, the filters that were realized with this idea are not implemented in this project.
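For reference, one NLMS iteration for the setup in Figure 1 could be sketched in C as below; since this part was never implemented in the project, the filter length and step size are purely assumed values.

#define TAPS 32            /* adaptive filter length (assumed) */
#define MU   0.5f          /* NLMS step size (assumed) */

/* One Normalized LMS iteration: x is the square-wave input history
 * (x[0] newest), d the desired sample from the instrument to be modelled,
 * w the adaptive coefficients.  Returns the error e(n). */
static float nlms_step(const float x[TAPS], float d, float w[TAPS])
{
    float y = 0.0f, energy = 1e-6f;        /* small constant avoids division by zero */
    int k;
    for (k = 0; k < TAPS; k++) {
        y      += w[k] * x[k];
        energy += x[k] * x[k];
    }

    float e = d - y;                       /* error that should decrease over time */
    for (k = 0; k < TAPS; k++)
        w[k] += MU * e * x[k] / energy;    /* normalized coefficient update */

    return e;
}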


4 Methods

4.1 Low pass filters

Low pass filters were designed with the GUI fdatool (Filter Design and Analysis Tool) found in MATLAB. This tool lets the user design analog and digital filters with a wide variety of methods. One can design from amplitude spectrum specifications such as passband and stopband gain, or one can choose to fix the filter order and cutoff frequency. The latter way was chosen. The filters were specified as FIR filters and designed using the window method, with the default window (Kaiser). The coefficients were easily exported either to the workspace (to test the performance) or to a C header file (for implementation on the DSP).

As an example, to create a basic sound with the fundamental tone and only one harmonic, one can create a high order lowpass filter with its cutoff at the second harmonic and then filter the square wave with it. This simple sound approximates a flute, which has a very sinusoidal waveshape. The challenging thing about synthesizing this particular instrument is choosing an envelope: a flute musician can be very expressive with the volume, and will probably play a note sequence with increasing, decreasing or alternating envelopes. But this is outside the scope of this project.

4.2 Multiband filters

The creation of the multiband filters was done with a MATLAB function called fircls, found in the Signal Processing Toolbox. Its inputs are:

• A vector of transition frequencies. The frequency values of this vector must be normalized in such a way that 1 is the Nyquist frequency.

• A vector of desired amplitudes in the frequency response. Each element corresponds to one of the frequency intervals specified in the frequency vector. Thus, the amplitude vector has one less element than the frequency vector.

• The order of the filter.

• Two vectors that define an amplitude tolerance for the frequency intervals.

I found that the tolerance vectors are chosen in a more or less arbitrary manner. What worked for me was to set them equal to the amplitude vector plus or minus some reasonable constant. The output of the function is then a vector of FIR filter coefficients, generated with "an iterative least-squares algorithm to obtain an equiripple response".


Figure 2: A Multiband filter

In Figure 2 there is a plot of the fircls function input (the dashed line) and the amplitude response of its output FIR filter. The filter leaves frequencies above 0.15 almost unaltered, but shapes the low frequency harmonics of the signal. The high peak will amplify the 4th harmonic of the note for which the filter was designed by a factor of 3-4. There are noticeable ripples in the response, which are explained by the algorithm that fircls uses.

This filter design procedure would in principle have to be done for every note in the musical scale of the instrument, and for every instrument, if the synthesizer is to be a good one. Nonetheless, cheap synthesizers only change filters every octave, as Professor Bengt Mandersson observed. And if the filter order is high enough, the transition bands can be very abrupt and small, making the filter acceptable for notes in a neighborhood of the tone for which it was designed.

Furthermore, a clever way to get around the problem of generating a filter for every octave is to design the filters for the highest octave and then interpolate the filter output to obtain 2 or 4 times more samples, thus doubling or quadrupling the period of the note. This was suggested by Frida Nilsson, who is also in charge of the course. This has some drawbacks, as it assumes that the instrument's character doesn't change with frequency, and the time it takes to execute the interpolation function on the DSP can disrupt the real-time operation.

4.3 Envelope Creation

A generic envelope should have attack, decay, sustain and release. From a real instrument waveform (an isolated banjo note from a music track) an envelope was derived with the help of MATLAB's Curve Fitting tool. This GUI lets the user select custom points from the data vector and make the fit go through those instead of the whole sequence. This was important because it provided a way of selecting the pertinent 'high' points of the waveform. A sum of two Gaussian functions was used, and the result is shown in Figure 3.

Figure 3: The Curve Fitting Tool

4.4 Debugging Methods

This might as well be in the Results section, but since most of the time spent on the program was spent debugging, it makes sense to comment on how it was actually done.

In Code Composer Studio the breakpoints and the memory map were very useful in debugging the program. I could never figure out how to get the probe points and attached graphs to do their job, but the memory map was enough to know what was going on with the variables.

For some time the FIR function was thought to be malfunctioning, and to debug it I used MATLAB to simulate the sequence that should have been in DSP memory for a correct algorithm. For this simulation the square wave function had to be programmed in a MATLAB script file as well.

After some time it was discovered that the CPU could be reset and restarted from a Code Composer menu, whereas at first the only way I could find to start debugging from the base address and states was to load the whole program again. This used to be time consuming and aggravating.

A loop was programmed to generate a chromatic scale of 12 notes, and this was very useful when testing the functions of the program that were not related to MIDI parsing:

• Hearing the scale played over and over again was a test of implementation robustness.


• The accuracy and quality of the sound was tested by comparing the DSP's output sound with the sound produced on a computer using MATLAB.

5 Implementation

5.1 Square Wave generation

The square wave was created with a function that returns only one sample. This function is constantly called upon and its input arguments are frequency and duty cycle. The peak-to-peak amplitude is 2, and the mean (which is a function of the duty cycle) is subtracted to obtain a waveform without any DC component. Whether to output +1 or -1 is determined by a static variable that is reset every period of the signal.
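As a rough illustration, such a one-sample-at-a-time square wave generator could look like the sketch below; the function name, the SAMPLE_RATE constant and the phase accumulator are assumptions made for illustration, not the original code.

#define SAMPLE_RATE 44100.0f

/* One sample of a zero-mean square wave with the given frequency (Hz) and
   duty cycle (0..1). */
float square_sample(float freq, float duty)
{
    static float phase = 0.0f;              /* position inside the period, 0..1 */
    float raw, out;

    raw = (phase < duty) ? 1.0f : -1.0f;    /* +1 during the duty part, else -1 */
    out = raw - (2.0f*duty - 1.0f);         /* subtract the mean -> no DC       */

    phase += freq / SAMPLE_RATE;
    if (phase >= 1.0f)
        phase -= 1.0f;                      /* reset once every period          */
    return out;
}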

5.2 Filters

A static array was used to store the square wave's past values. Before calling the function that computes the finite convolution of the filter and the signal, one has to put the current (present) value of it in the first position of the array. The output was the current sample of the filtered signal.
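A minimal sketch of this shift-then-convolve step is given below; FIR_LEN and the array names are illustrative assumptions, not the actual program.

#define FIR_LEN 64                      /* assumed filter length */

static float h_coef[FIR_LEN];           /* filter coefficients, filled elsewhere  */
static float x_hist[FIR_LEN];           /* past square wave samples, newest first */

/* Shift the history, put the current sample in the first position and
   return the finite convolution of filter and signal. */
float fir_step(float x_new)
{
    float y = 0.0f;
    int i;

    for (i = FIR_LEN - 1; i > 0; i--)   /* make room at position 0 */
        x_hist[i] = x_hist[i - 1];
    x_hist[0] = x_new;                  /* current value in the first position */

    for (i = 0; i < FIR_LEN; i++)       /* y[n] = sum h[i]*x[n-i] */
        y += h_coef[i] * x_hist[i];
    return y;
}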

5.3 Interpolation

Linear interpolation can be used if the sampling rate is high enough. The implementation problem is naturally solved with a recursive algorithm because if upsampling by a factor of 4 instead of 2 is required then one only has to interpolate the interpolated sequence! This of course has a limit, as the distortion will increase with each step, and because good interpolation is done with ‘sinc-like’ pulses. But band-limited interpolation is a very CPU intensive task, and is not usual in commercial synthesizers [3].
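A minimal sketch of one factor-2 linear interpolation pass could look as follows; applying it a second time to its own output gives the factor-4 case. The names and the buffer handling are illustrative assumptions.

/* Factor-2 linear interpolation: 'out' must hold 2*len samples. */
void upsample2(const float *in, int len, float *out)
{
    int i;

    for (i = 0; i < len - 1; i++) {
        out[2*i]     = in[i];
        out[2*i + 1] = 0.5f * (in[i] + in[i + 1]);   /* midpoint of neighbours */
    }
    out[2*len - 2] = in[len - 1];
    out[2*len - 1] = in[len - 1];                    /* repeat the last sample */
}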

5.4 Envelopes

To implement the envelope the filter output sequences are multiplied by the current value of the envelope, which is computed in every step using the Gaussian function parameters stored in the DSP. The other possibility is to store the sequence values of the envelope in memory, but this would be a very large array at the current sampling rate (44.1 kHz). It could also be practical to use a piecewise linear function that approximates the curve.

It has to be taken into account that this type of envelope decays to zero very quickly, so a linearly decaying zone is implemented on the final part of it.
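As an illustration, the per-sample envelope could be computed from the fitted parameters roughly as in the sketch below. The parameter values are placeholders (not the fitted banjo values), and the linear fade-out on the final part is omitted.

#include <math.h>

/* Sum-of-two-Gaussians envelope, the same model as MATLAB's 'gauss2' fit:
   e(t) = a1*exp(-((t-b1)/c1)^2) + a2*exp(-((t-b2)/c2)^2). */
static const float a1 = 1.0f, b1 = 0.05f, c1 = 0.10f;
static const float a2 = 0.4f, b2 = 0.30f, c2 = 0.25f;

float envelope(float t)                  /* t = seconds since note onset */
{
    float g1 = a1 * exp(-((t - b1)/c1) * ((t - b1)/c1));
    float g2 = a2 * exp(-((t - b2)/c2) * ((t - b2)/c2));
    return g1 + g2;                      /* multiply the filter output by this */
}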


5.5 Frequency Modulation

The input to the square wave generating function was the sum of a fixed frequency (the actual note) plus a small and slowly varying sinusoidal variation that changed the pitch slightly over time. The amplitude of this sinusoidal variation increases as the frequency of the notes to be generated increases, so that the effect is perceptually ‘constant’.
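A sketch of such a vibrato computation is shown below; the 5 Hz rate and the 0.5 % depth are assumed values, not those of the actual program.

#include <math.h>

#define SAMPLE_RATE  44100.0f
#define VIBRATO_RATE 5.0f                /* assumed LFO rate in Hz */

/* Instantaneous frequency for sample n: the note frequency plus a slow
   sinusoidal deviation whose depth grows with the note frequency, so the
   effect stays perceptually constant. */
float modulated_freq(float note_freq, long n)
{
    float depth = 0.005f * note_freq;    /* assumed 0.5 % deviation */
    return note_freq + depth * sin(2*3.1415*VIBRATO_RATE*n / SAMPLE_RATE);
}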

6 Results

• Distortion was in my experience usually caused by algorithms taking too much time. For example, a function that calculates 2^(n/12), which is the interval between semitones, can take a lot of time. Instead a look-up table was used (a small sketch of such a table is shown after this list).

• There were a lot of problems with the compiler when trying to program C structures. These would have been useful to represent the characteristics of an instrument in a compact way.

• The flute-like instrument channel sounds good, albeit a little synthetic and monotonous.
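The look-up table mentioned in the first item could, for example, be a small constant array of the twelve semitone ratios:

/* Precomputed semitone ratios 2^(n/12) for n = 0..11.  Multiplying a base
   frequency by semitone_ratio[n] gives the note n semitones higher; a full
   octave doubles the frequency.  Values rounded to five decimals. */
static const float semitone_ratio[12] = {
    1.00000f, 1.05946f, 1.12246f, 1.18921f, 1.25992f, 1.33484f,
    1.41421f, 1.49831f, 1.58740f, 1.68179f, 1.78180f, 1.88775f
};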

To program the interpolation function a recursive solution was adopted. It seemed very natural (and fulfilling) to do it in this manner. But there were some problems:

1. Recursive functions require local variables, but these were filling the stack and overwriting other values. The resulting corruption was leading the program to crash.

2. To avoid this 1st problem one could define the variables as static, but this would overwrite the last instance of the variables during a recursive call, thus making it impossible to return to the previous state correctly.

3. The way around it was to use dynamic memory allocation in this way

var = (int*)MEM_alloc(INTERNALHEAP,sizeof(int)*10,0);

This worked wonderfully as each instance of the function saved its variables in a different place of the internal memory. The instruction

MEM_free(INTERNALHEAP,var,sizeof(int)*10);

also needed to be executed after the function was finished to dynamically clear the allocated memory.


4. Nonetheless, the dynamic memory allocation seemed to take too much time, judging from the sound of the interpolated notes. If, when using static variables, one only interpolates once (meaning no recursive calls take place), then the sound is very good. But once the memory problem is taken into account, speed problems appear. The only solution is to program a recursive function without any local variables whatsoever, with the help of static arrays declared in the beginning of the synthesizer program.

7 Conclusions

As with other programming language interfaces I have used before, one of the most important things is to lose fear of the interface. With a little practice one gains confidence, which makes the work evolve at a faster pace.

It seems reasonable to conclude that recursive functions are not too appropriate in real time applications. But trying to do it was good training in many programming techniques such as dynamic memory allocation, pointer manipulation, and usage of indexes.

New ideas arose as a by-product of working on this project. For example, instead of a sine modulating wave one could use white noise filtered with an extremely narrowband filter to obtain a more natural sounding frequency modulation. This filtered noise would resemble a sinusoid. Another idea is to implement time-variable IIR filters to introduce a non-stationarity with few parameters and make the sound more interesting.

References

[1] C. Roads, The Computer Music Tutorial, The MIT Press, 1996

[2] Subtractive synthesis, http://www.audiomidi.com/classroom/synths/subtractive.cfm

[3] Sound, synthesis and audio reproduction, http://www.helsinki.fi/~ssyreeni/dsound/dsound-c-07

[4] Introductory guide to MIDI, http://users.argonet.co.uk/users/lenny/midi/help/intro.html



Part VII

Chorus

Christian Isaksson, Per-Ola Larsson, Marcus Lindh, Andreas Nilsson

Abstract

The chorus effect makes one voice or instrument sound as many. It is created with variable delays that emulate the lack of synchronization in a natural choir or orchestra. To further add to the effect, amplitude modulation is also used. Sometimes it is desired to add delays that do not sound exactly like the original voice or instrument. This is accomplished with the use of filtering. This report is the result of a project where the mentioned effects have been implemented in C-code. The code has then been compiled and uploaded to a DSP-card. In spite of some problems with the implementation, the programming was not the most complex task; that was instead the tuning. The tuning mostly consists of finding the optimal length of the delay and the magnitude of the amplitude modulation. It is possible to find tunings that work quite well on almost all sounds, but if an optimal effect is wanted every sound has to be tuned individually.


1 Introduction

In almost all music, vocal or instrumental, a number of effects are added to enrich the sound. One of these effects is the chorus effect, which is used to make one voice or instrument sound as many. The basic idea is to make the sound “thicker” by adding delayed copies to it. In this report a brief theoretical background on chorus effects is given, followed by the strategy for implementing it on a DSP-card from Texas Instruments. The theory is given in chapter 2 and the implementation is described in chapter 3. A discussion on the difficulties encountered during the project is held in chapter 4.


2 Signal Processing behind chorus

Chorus is an effect that can make one voice or instrument sound as many. The effect adds thickness to the sound by adding copies of it, which are modified with respect to delay, amplitude and pitch. In, for example, a choir not everyone can sing in perfect synchronization; the voices differ in delay, strength and frequency (e.g. soprano or alto). This is emulated by the chorus effect with a variable delay, an amplitude modulation and different filters [3]. The model for the chorus effect is of single input single output type. What the model should look like will be investigated in the following chapters. See Fig. 1.

Figure 1: A simple block diagram of the chorus effect.

The parameters for the different parts of the chorus effect have to be modified to fit both every unique sound and every unique environment for that sound. Thus the difficulty is not to implement the different effects, but to actually find parameters that make it sound “good” in different situations. This is quite obvious; otherwise there would be no use of mixer tables.

2.1 Delay

The delay consists of two parts, a constant delay and a variable delay. The constant delay is normally between 30-100 ms. With a longer delay an echo effect appears, which is not the intent of the chorus. In a real chorus the delay between the different voices is not the same all the time; it changes over time. To model this delay sweep, the delay time is calculated with a low frequency oscillation (LFO) waveform, see Fig. 2. Normally a sinusoidal, a logarithmic or a triangular shape is used [2], see Fig. 3 for examples. These waveforms usually have a frequency of about 1 Hz, but can be up to 3 Hz [2]. The sweep depth (the size of the variable delay) is usually between 20-40 ms. The choices of frequency and sweep depth are significant for the resulting voice.

An illustration of how the delay changes over time is presented in Fig. 4. The largest total delay is given when the waveform reaches its maximum value and the smallest delay when it reaches its minimum value.


Figure 2: Block diagram of the chorus effect including the LFO.

Figure 3: Three different LFO waveforms: sinusoid, triangular and logarithmic.


Figure 4: The delay change over time using a sinusoidal LFO.

2.2 Frequency analysis of the different LFO waveforms

To get a feeling for what the different waveforms add to the sound it could be useful to perform a frequency analysis. The spectrograms of the output for the different waveforms are shown in Fig. 5-7. In all three cases the input signal is a sine wave. If no effect was used the spectrogram would only contain a single horizontal line.

In the case of the sine wave one can clearly see that it introduces all frequencies in a certain interval around the input frequency. This is because when the sine wave goes from its lowest to its highest value it starts off with a very low derivative. The derivative increases until it reaches the mid-point, where it starts to decrease. This explains the sinusoidal shape of the spectrogram.

The logarithmic delay varies very slowly everywhere except where it starts a new period. This gives a spectrogram with harmonics.

For the triangle wave there are only two different values of the derivative. This means that the frequency is never the original one, but either lower or higher.


Figure 5: Spectrogram of the output when a sinusoidal LFO was used on a 5000 Hz sinusoidal input signal.

Figure 6: Spectrogram of the output when a logarithmic LFO was used on a 5000 Hz sinusoidal input signal.


Figure 7: Spectrogram of the output when a triangular LFO was used on a 2000 Hz sinusoidal input signal.

2.3 Filtering

To make the chorus effect sound as desired, the spectra can be shaped by using different filters on the delayed signals. Filters are also used directly on the input signal to remove various disturbances. The filters that are used have linear phase, but this is not necessary. See an updated version of the block diagram in Fig. 8. A typical useful filter is a low-pass FIR-filter with 42 taps and a cut-off frequency of 4000 Hz.

Figure 8: Block diagram now with filter.


2.4 Amplitude modulation

With the LFO delay we have introduced variable delay and frequency distortion. This is however not enough to mimic a true choir. It is not very likely that the entire choir or orchestra is able to maintain the same intensity through the whole music piece.

Thus, an effect is needed that modulates the amplitude of the delayed sound. The most common modulation is a sinusoidal one. The block diagram can now be updated as seen in Fig. 9 and the effect of the modulation is shown in Fig. 10.

Figure 9: An updated version of the chorus effect, now with modulation feature.

Figure 10: Example of a signal with strong amplitude modulation.


2.5 Full model

When using it all at the same time and with multiple delay branches the model becomes like in Fig. 11. With stereo input there should be different delay paths for the different channels.

Figure 11: Full chorus effect using multiple branches.


3 Implementation in C-code

To realize the chorus effect the DSP-card DSP C6713 from Texas Instruments is used. The development environment used with this card is Code Composer Studio (CCS). To implement the chorus effect on the DSP-card with CCS a C program has to be constructed.

3.1 Algorithm

The strategy to make it all work is the following:

1. Put the sample in the circular buffer.

2. Calculate the delay.

3. Interpolate.

4. Filter.

5. Modulate amplitude.

6. Sum up the output signal.

In the following sections the steps in the algorithm are thoroughly described.

3.1.1 Put the sample in the circular buffer

To be able to delay the signal a certain time, all the previous samples which correspond to that time have to be stored. In this implementation that is done in a circular buffer. A circular buffer is a vector where a pointer points at the location where the next element is to be inserted. When an element is inserted the pointer is incremented to the next position in the vector. See Fig. 12 for an illustration of the buffer. With a sample rate of 44.1 kHz a suitable buffer size is 7000 for each channel.

Figure 12: Graphical overview of a circular buffer.
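As an illustration, the insert-and-read operations on such a circular buffer could look like the sketch below; the static array (instead of the externally allocated xvec used later in this report) and the function names are assumptions.

#define BUF_SIZE 7000                  /* one channel at 44.1 kHz, as above */

static short xvec_buf[BUF_SIZE];       /* sample history */
static int   write_pos = 0;            /* next free position */

/* Store the newest sample and advance the pointer, wrapping at the end. */
void buffer_put(short sample)
{
    xvec_buf[write_pos] = sample;
    write_pos = (write_pos + 1) % BUF_SIZE;
}

/* Read the sample 'delay' positions behind the newest one (0 = newest). */
short buffer_get(int delay)
{
    int idx = (write_pos - 1 - delay + 2*BUF_SIZE) % BUF_SIZE;
    return xvec_buf[idx];
}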


3.1.2 Calculate the delay

As mentioned in previous sections there must be a variable delay to make the chorus effect sound good. This is implemented by adding a constant delay to a varying one. The variable delay is calculated in the different functions sinDelay, logDelay and triDelay, which are shown below. When adding the constant and variable delay, the result is the total delay given as a number of samples.

/* Sinusoidal LFO: amplitude sets the sweep depth in samples, freq is in Hz. */
float sinDelay(int n, int amplitude, short phase, short freq) {
    return amplitude*sin(2*3.1415*freq/Fs*n + phase);
}

/* Logarithmic LFO. */
float logDelay(int n) {
    return log(n+1);
}

/* Triangular LFO: the sign flips at the start of each period (n == 0). */
float triDelay(int n, int delay) {
    static int sign = -1;
    if (n==0) sign = -sign;
    return sign * delay/Fs * 2 * n;
}

3.1.3 Interpolate

The total delay can be a non-integer value. This is because the variable delay functions sinDelay, logDelay and triDelay all return a float. If the calculated delay is non-integer it can be rounded off to the sample before or after. However, this introduces noise in the output signal. To get rid of this noise, interpolation is needed. A simple interpolation is the linear interpolation. An illustrative example is shown in Fig. 13. Below is the code that performs the interpolation.

Figure 13: The used interpolation principle.


/* Linear interpolation between the two buffer samples surrounding the
   (possibly non-integer) position delayAndSweep. */
float interpolate(float delayAndSweep, short *xvec, int xvecLength) {
    int var_index;
    int index1, index2;
    var_index = floor(delayAndSweep);
    index1 = var_index % xvecLength;
    index2 = (var_index+1) % xvecLength;      /* wrap at the buffer end */
    return (xvec[index1] + (xvec[index2] - xvec[index1])
            * (delayAndSweep - var_index));
}

3.1.4 Filter

In some delay branches filtering is desired. In our implementation the filtering is done with FIR-filters. A general filter function is used for this purpose (see code below). The function uses the filter coefficients h and the input signal x as two of the input parameters.

/* FIR filter of length n: h holds the coefficients, x is the current input
   sample and xprev is a history buffer of n+1 past samples. */
static short firfilt(int n, float *h, short x, short *xprev) {
    int i, j;
    short sum;
    sum = x*h[0];
    for (i=1; i<n; i++)
        sum += h[i]*xprev[n-i-1];
    for (j=0; j<n; j++)               /* shift the history one step */
        xprev[j] = xprev[j+1];
    xprev[n] = x;                     /* store the newest sample */
    return sum;
}

3.1.5 Modulate amplitude

The amplitude modulation is done in the modulateAmplitude function. The function makes the amplitude vary with a sinusoidal shape. The output is a value between 1-amp and 1+amp.

/* Returns a gain between 1-amp and 1+amp that varies sinusoidally with
   frequency ampFreq. */
float modulateAmplitude(float amp, float ampFreq, float phase, int n) {
    return (1 - amp*sin(2*3.1415*ampFreq/Fs * n + phase));
}


3.1.6 Sum up the output signal

As the final step in the algorithm all the branches are summed up to one output signal. It is preferable that the branches have different parameters for the calls to the functions for delay, amplitude modulation and filtering. It is of course possible to give the branches different weights, so that the different effects have more or less impact on the output signal.
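As an illustration, one branch of the chain could be put together per output sample roughly as below, using the functions shown in the previous sections (and assuming, as they do, a global sample rate Fs and that the code sits in the same file as the static firfilt). All parameter values, weights and the writePos bookkeeping are illustrative assumptions, not the tuned values of the actual program.

/* One output sample from a single chorus branch. */
short process_branch(short x_in, int n, int writePos,
                     short *xvec, int xvecLength,
                     float *h, int hLen, short *xprev)
{
    float constDelay, totalDelay, readPos, gain;
    short delayed, filtered;

    constDelay = 0.05f * Fs;                           /* 50 ms constant delay   */
    totalDelay = constDelay + sinDelay(n, 1000, 0, 1); /* ~23 ms sweep at 1 Hz   */
    readPos = writePos - totalDelay;                   /* where to read the copy */
    if (readPos < 0.0f)
        readPos += xvecLength;                         /* wrap inside the buffer */

    delayed  = (short)interpolate(readPos, xvec, xvecLength);
    filtered = firfilt(hLen, h, delayed, xprev);       /* optional branch filter */
    gain     = modulateAmplitude(0.3f, 0.5f, 0.0f, n); /* slow amplitude wobble  */

    /* weighted mix of the dry signal and this single branch */
    return (short)(0.7f*x_in + 0.3f*gain*filtered);
}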

3.2 Memory handling

To be able to have the delay large enough, the sample vector has to be at least the length of the delay. With a higher sample rate the vector size grows rapidly and more memory is required. Since the cache memory is not enough, the vector has to be allocated on the external memory (SDRAM). The external memory is slower than cache but it is fast enough in this application.

To be able to use the external memory, it has to be declared with the name given in the memory section of DSP/BIOS-Config; in this case it is called EXTERNALHEAP. The wanted variable also has to be declared. This is done by the following two rows that are placed at the top of the .c-file amongst the other declarations.

extern Int EXTERNALHEAP;
short *xvec;

In main() the vector has to be allocated (here with size 7000).

xvec = (short*)MEM_alloc(EXTERNALHEAP, 7000*sizeof(short), 0);

Now the variable xvec, declared in the external memory, can be used in the program as a usual variable declared in the cache.


4 Conclusions

During the project a number of difficulties were encountered. The difficulties regarded both the material and the implementation. One material problem was discovered when we first started to test the equipment. There were serious problems with the cables that led to a situation where changes in the program were not detected on the output of the DSP-card. After some testing it was determined that there was a ground error in one of the cables; a simple fault that caused a lot of irritation. We also had some problems with Code Composer Studio; it crashed several times. The problem could be caused by interference from other programs such as MATLAB and Windows Media Player.

In the beginning of the project we used a sample rate of 8 kHz. This proved however to give a poor sound quality, thus the sample rate had to be increased. The final choice of sample rate was 44.1 kHz, since this is the CD standard. Note, the choice of sample rate is a trade-off between CPU load and sound quality.

The longest delay length in the chorus effect is approximately 100 ms. As previously shown, this delay is achieved by saving old samples that correspond to the 100 ms delay. When the sample rate is increased the needed delay vector grows linearly. For 44.1 kHz the vector length (for one channel) is around 7000 elements. To be able to have a vector that big, the external memory had to be used. Our first attempt was to use an ordinary vector to store the samples. For every new sample all other samples in the vector are moved one step. This burdens the CPU with a heavy load, and with this technique it was impossible to use as high a sample frequency as 44.1 kHz. In order to have a better sound quality a new solution was needed, therefore the circular buffer was implemented. This change led to a dramatic decrease in CPU load. In the case of 8 kHz sample rate the CPU load went from 87% to 43%. The execution graphs for the different vector implementations are shown in Fig. 14.

Figure 14: Execution graphs for the ordinary buffer (top) and the circular buffer (bottom).


A vital and time-consuming task is to choose parameters that make the chorus effect sound good. Every sound has its own optimal set of parameters. Something that sounds good for a trumpet can be really bad for a guitar. Because of this it has been difficult to find good test sounds. Almost all recordings contain more than one single instrument and often already have a chorus effect applied. However, it is possible to give a hint of what the parameters should be, see Tables 1 and 2. When testing different parameters one should keep in mind that the chorus effect should not make too much impact on the original sound, only give a greater thickness to it.

Constant delay length       30-100 ms
Variable delay length       20-40 ms
Variable delay frequency    0.1-1 Hz

Table 1: Ranges for the delay variables.

Modulation amplitude (% of the original amplitude)    0-80 %
Modulation frequency                                  0-1 Hz

Table 2: Ranges for the amplitude modulation variables.


References

[1] Tobias O., Hubert P., Irene H. and Elisabeth S., The Not So Short Introduction to LaTeX 2e, 2003.

[2] Scott L., Harmony Central®, www.harmony-central.com, 1996.

[3] Curtis R., The Computer Music Tutorial, pp. 439-440.