Adaptive Antenna-array Processing - TU Delft

Technical note TN-2007-00023

Issued: 01/2007

Adaptive Antenna-array Processing: A DSP Implementation

T. Abayomi

Philips Research Eindhoven

Philips Restricted

© Koninklijke Philips Electronics N.V. 2007


Concerns: Interim Report

Period of Work: 23 January 2006 - 10 January 2007

Notebooks: None

Authors’ addresses: T. Abayomi [email protected]

M. Stassen [email protected]

© KONINKLIJKE PHILIPS ELECTRONICS N.V. 2007. All rights reserved. Reproduction or dissemination in whole or in part is prohibited without the prior written consent of the copyright holder.


Title: Adaptive Antenna-array Processing

Author(s): T. Abayomi

Reviewer(s): M. Stassen

Technical Note: TN-2007-00023

AdditionalNumbers:

Subcategory:

Project: Beamforming - CoSiNe (2007-00023) http://www.philips.com

Customer: Philips Research

Keywords: Beamforming, DSP, Adaptive array

Abstract: Adaptive beamforming uses antenna arrays to increase the gain of a receiver in the direction of a desired signal and to decrease the gain in the direction of interference and noise.

This report describes the implementation of a baseband processor that calculates the beamforming parameters and subsequently uses these parameters to control the analog circuits that steer a beam towards a transmitter. The baseband processor is implemented on a Digital Signal Processor (DSP).

We aim to reduce the processing time of existing implementations, thereby allowing real-time simulations that give insight into the advantages of beamforming.

We present a fixed-point implementation with no implementation loss in the relevant Signal-to-Noise Ratios (SNR) and improve the processing speed of existing implementations by a factor of 46.



Conclusions: In this report, we have illustrated the implementation of a beamforming receiver on a Digital Signal Processor (DSP). Initial simulations were done in Matlab and suitable algorithms were chosen. However, to investigate the benefits of beamforming in fast-changing environments, more processing power is needed so that the calculations can be done in real time.

Adaptive beamforming improves the Signal-to-Noise Ratio (SNR) of a signal by focusing on the transmitter with a narrow beam and reducing output in undesired directions, albeit with more complexity than switched beamforming. Combining beamforming with multiple-input multiple-output (MIMO) antenna systems, which use spatial multiplexing to enhance the peak data transmission rate and/or transmit diversity methods to enhance the robustness of the transmission, enhances the performance of a wireless transmission system.

A 1 GHz DSP from Texas Instruments, the TMS320C6455, was chosen as the development platform. The algorithms were converted to the C programming language and implemented in fixed point before being ported to the DSP. We carried out simulations to verify the new implementation, with the Matlab implementation as a reference. After optimizing the code using the methods discussed in this report, we showed that we can obtain a speed improvement by a factor of 47. Without the optimizations, and with the code running on the same general-purpose processor as Matlab, we obtain a speed improvement by a factor of 7.

The DSP implementation was subsequently integrated in a hardware demonstrator, and the concept of a beamforming receiver running on a DSP in real time is deemed to be feasible.



Contents

1 Introduction
    1.1 Background
        1.1.1 Goals

2 Beamforming: Concepts
    2.1 Introduction
    2.2 Areas of application
    2.3 Beamforming Methods
        2.3.1 Switched beamformers
        2.3.2 Adaptive Beamformers
    2.4 Adaptive Beamforming approaches
        2.4.1 Side lobe Cancellers
        2.4.2 Linearly Constrained Minimum Variance (LCMV) Beamforming
        2.4.3 Null-steering Beamforming
        2.4.4 Eigen-Beamforming

3 Beamforming Setup
    3.1 Introduction
    3.2 Transmitter
        3.2.1 Training Sequences
    3.3 Receiver Baseband Processor
        3.3.1 Power Detection
        3.3.2 Frequency and Timing Synchronization
        3.3.3 Channel Estimation & OFDM Demapping
        3.3.4 Beamformer
        3.3.5 Carrier tracking
        3.3.6 Equalizer
        3.3.7 Receiver Chain Analysis

4 Algorithm Implementation
    4.1 DSP Selection
        4.1.1 Evaluating Processor Performance
        4.1.2 Choosing a processor
    4.2 C/Floating Point Implementation
        4.2.1 Header Processing
        4.2.2 Data Processing
        4.2.3 Configuration Utilities
        4.2.4 Baseband Processor - Communication
    4.3 Fixed Point Implementation
        4.3.1 The Q. Format
        4.3.2 System Analysis
        4.3.3 Performance Analysis
        4.3.4 Data Acquisition
    4.4 TMS320C6713 DSP Implementation
        4.4.1 Issues
    4.5 TMS320C6455 DSP Implementation
        4.5.1 Profiling and Optimization
        4.5.2 The Tuning Dashboard
        4.5.3 Optimization
        4.5.4 Implementation Results

5 Testbed Setup
    5.1 Setup Components
    5.2 Data Acquisition
    5.3 Processing

6 Conclusions
    6.1 Outlook

A Acknowledgements

References



List of Figures

2.1 Radiation pattern for a line of 4 antenna elements
2.2 Radiation pattern for a line of 4 antenna elements
2.3 Adaptive beamforming setup

3.1 Adaptive beamforming setup
3.2 Transmitter block diagram
3.3 OFDM packet structure
3.4 Baseband processor
3.5 Power Detection Plot
3.6 Carrier Tracking control loop
3.7 Equalizer Block Diagram
3.8 Matlab Profiler
3.9 Operation Count Estimation

4.1 Baseband Receiver - Top Level Class Diagram
4.2 Header Processing - Class Diagram
4.3 Data Processing - Class Diagram
4.4 Configuration - Class Diagram
4.5 Baseband Processor - Communication Diagram
4.6 Fractional Q.15 Format
4.7 Fractional Q3.12 Format
4.8 Setup Architecture
4.9 Packet Error Rate vs. SNR for Fixed-point, Floating-point and Matlab Implementations
4.10 TMS320C6455 Development Starter Kit
4.11 Code Profiling - Goals Window
4.12 Code Profiling - Profile Setup
4.13 Code Profiling - Profile Viewer
4.14 Code Profiling - Profile Cycle Count
4.15 C64x Dot product mnemonic - dst = dot p(src1,scr2)
4.16 Code Profiling - Profile Setup

5.1 Testbed Setup
5.2 Testbed Setup - Communication diagram
5.3 Testbed Setup - Sequence diagram
5.4 Headers sent by the transmitter
5.5 Matlab GUI for monitoring



Section 1

Introduction

1.1 Background

Wireless communication has had a great impact on the way we communicate. Looking back to the advent of the Internet, data communication meant we had to find an access point, connect our devices or use existing devices, and then transmit the information we wanted to communicate. Presently it is possible to carry our devices anywhere and transmit or receive messages or email whenever we are in range of a wireless network.

The Philips Research CoSiNe group, located at the NatLab on the High Tech Campus in Eindhoven, does research on several topics in wireless connectivity, such as systems with multiple antennas (MIMO), channel coding, the digital television standard DVB-H and 60 GHz, with emphasis on the development of PHY-layer algorithms.

1.1.1 Goals

Advancements in research have helped to make almost any device compatible with a wide range of networks, be it a mobile phone, PDA, laptop or printer. With these opportunities available, demand arises for applications that need high throughput, such as data, video or news content downloads.

However, there are big hurdles to overcome to achieve high data rate communication. The communication channel is crowded with devices that need to talk to each other and that potentially interfere with one another. This problem, combined with the need for high throughput or high data rate communication, requires the devices to be able to adapt to changes in the communication channel.

Beamforming is a technique used to steer a receiver's beam towards a desired transmitter. This increases throughput between the communicating parties. Because the transmitter is potentially mobile, the beamforming algorithm on the receiver has to be able to adapt accordingly.

The goal of my project is to deliver a real-time implementation of the beamforming algorithm. This report describes the extent to which such an implementation is feasible, as well as the limiting factors.



Section 2

Beamforming: Concepts

2.1 Introduction

"Beamforming" is a general signal processing technique used to control the directionality of the reception or transmission of a signal on a transceiver array. When receiving a signal, beamforming can increase the gain in the direction of wanted signals and decrease the gain in the direction of interference and noise. When transmitting a signal, beamforming can increase the gain in the direction in which the signal is to be sent. This is done by creating lobes and nulls in the radiation pattern.

The demand for higher data rates and the demand for greater user capacity are two constants in the telecommunication industry. Both depend on spectrum efficiency, the ratio of information bits transmitted to the amount of spectrum used (usually expressed in bits/s/Hz). Improving that efficiency generally involves tradeoffs between quality of service, power, and coverage. Combining the radio signals from a set of small non-directional antennas to simulate a large directional antenna improves the signal-to-noise ratio (SNR) of the transmission, which in turn allows for an improvement in spectrum efficiency. The simulated antenna can then be pointed electronically towards a receiver to bring about a gain in the direction of the receiver.

2.2 Areas of application

Beamforming can be used to improve the range of existing systems. An example is the 60 GHz system, which is confined to pico-cell zones due to high levels of atmospheric attenuation. Multiple-input multiple-output (MIMO) antenna systems use spatial multiplexing to enhance the peak data transmission rate and/or transmit diversity methods to enhance the robustness of the transmission. Beamforming improves received signal gain and reduces interference to other users. To employ the frequency band effectively in the complex 60 GHz environment, a combination of digital beamforming and MIMO systems can be the key, because their antenna arrays can not only generate desirable radiation patterns but also improve the capacity of systems with multimedia and multi-rate services [1].

Another area of application is found in cellular base stations. A cellular base station is responsible for organising the communication of many users at one time. The methods used to increase the efficiency of the system include cell sectorization and spatial diversity. The efficiency of the system can be increased further by focusing a narrow beam on a single user. This is achieved through beamforming.



Sonar, a technique that uses sound propagation under water to navigate or to detect other vessels, also applies beamforming to improve quality. It creates a pulse of sound, often called a "ping", and then listens for reflections of the pulse. To measure the distance to an object, one measures the time from emission of a pulse to reception. To measure the bearing, one uses several hydrophones, and beamforming is used to measure the relative arrival time at each hydrophone.

2.3 Beamforming Methods

The general idea of beamforming is to use an array of antennas to steer a beam in a certain direction according to the applied amplitude and phase difference.

Figure 2.1: Radiation pattern for a line of 4 antenna elements

A simple directional antenna consists of a linear array of small radiating antenna elements, each fed with identical signals (the same amplitude and phase) from one transmitter. As the total width of the array increases, the central beam becomes narrower. As the number of elements increases, the side lobes become smaller. Figure 2.1 shows the radiation pattern for a line of 4 elements (small dipole antennas) spaced 1/2 wavelength apart.

Figure 2.2: Radiation pattern for a line of 4 antenna elements

By varying the signal phases of the elements in a linear array, its main beam can be steered.



In an electronically steered array, programmable electronic phase shifters are used at each element in the array. The antenna is steered by programming the required phase shift value for each element. Figure 2.2 shows the beam pattern for a 4-element linear array with a progressive phase shift of 0.7π per element. The central beam has been steered to the right.
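The effect of a progressive phase shift can be reproduced numerically with the array factor of a uniform linear array. The sketch below is only an illustration: the function names, the scan resolution and the sign convention (a positive phase step moves the beam towards positive angles from broadside) are assumptions, not taken from the report.

```python
import cmath
import math


def array_factor(theta_deg, phase_step, n_elements=4, spacing=0.5):
    """Normalized response of a uniform linear array toward angle theta_deg.

    theta_deg is measured from broadside, spacing is in wavelengths and
    phase_step is the progressive phase shift per element in radians.
    """
    # Net phase difference between neighbouring elements (assumed sign convention).
    psi = 2 * math.pi * spacing * math.sin(math.radians(theta_deg)) - phase_step
    total = sum(cmath.exp(1j * n * psi) for n in range(n_elements))
    return abs(total) / n_elements


def main_beam_direction(phase_step):
    """Scan -90..90 degrees in 0.1 degree steps and return the strongest angle."""
    angles = [i / 10.0 - 90.0 for i in range(1801)]
    return max(angles, key=lambda a: array_factor(a, phase_step))


unsteered = main_beam_direction(0.0)            # identical phases, as in Figure 2.1
steered = main_beam_direction(0.7 * math.pi)    # 0.7*pi per element, as in Figure 2.2
print(unsteered, steered)
```

With half-wavelength spacing, the main beam sits at broadside when all elements share the same phase and moves to the angle whose sine is 0.7 (about 44.4 degrees) for a phase step of 0.7π, under the convention assumed here.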

Beamforming techniques can be broadly divided into two categories:

• Switched beamformers

• Adaptive beamformers

2.3.1 Switched beamformers

Complex weights are used to steer a beam in a specific direction. If the complex weights are selected from a library of weights that form beams in specific, predetermined directions, the process is called switched beamforming. In this process, a hand-off between beams is required as receivers move tangentially to the antenna array [3].

2.3.2 Adaptive Beamformers

An adaptive beamformer is able to adapt its response automatically to different situations. If the weights are computed and adaptively updated in real time, the process is known as adaptive beamforming. The adaptive process permits narrower beams and reduced output in other directions, significantly improving the signal-to-interference-plus-noise ratio (SINR). This is applied in cellular technology, where each user's signal is transmitted and received by the base station only in the direction of that particular user. This drastically reduces the overall interference in the system and allows multiple users to share the same frequency.

Figure 2.3: Adaptive beamforming setup

Figure 2.3 shows a setup for adaptive beamforming. The output of the array, $y(t)$, with variable element weights is the weighted sum of the received signals $s(t)$ at the array elements and the noise $n(t)$ at the receivers connected to each element. The weights $w_m$ are iteratively computed based on the array output, a reference signal $d(t)$ that approximates the desired signal, and the previous weights. Although adaptive beamforming is more complex to implement than switched beamforming, it allows a greater increase in the performance of a transmission system, especially when the transmitter, the receiver, or both are mobile. In the following sections, we focus on various approaches to implementing adaptive beamforming.
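The iterative weight computation described for Figure 2.3 can be illustrated with a least-mean-squares (LMS) update. This is a minimal sketch: the report does not name LMS as the update rule, and the step size, the array geometry and the noise-free single-source scenario are all hypothetical choices made for illustration.

```python
import cmath
import math


def lms_step(weights, x, d, step):
    """One LMS update; returns (new_weights, array_output, error)."""
    # Array output y = w^H x: the weighted sum of the element signals.
    y = sum(w.conjugate() * xi for w, xi in zip(weights, x))
    e = d - y
    new_weights = [w + step * xi * e.conjugate() for w, xi in zip(weights, x)]
    return new_weights, y, e


# Toy scenario (all values hypothetical): 4 elements, half-wavelength spacing,
# a single noise-free source at 30 degrees from broadside.
steering = [cmath.exp(1j * math.pi * m * math.sin(math.radians(30.0))) for m in range(4)]
weights = [0j] * 4
errors = []
for n in range(200):
    d = cmath.exp(0.3j * n)                 # reference signal d(t)
    x = [a * d for a in steering]           # signals seen at the elements
    weights, y, e = lms_step(weights, x, d, step=0.05)
    errors.append(abs(e))

gain = abs(sum(w.conjugate() * a for w, a in zip(weights, steering)))
print(errors[0], errors[-1], gain)
```

Each update uses only the current array output, the reference signal and the previous weights. In this toy case the error shrinks geometrically and the gain towards the source settles at 1.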

2.4 Adaptive Beamforming approaches

2.4.1 Side lobe Cancellers

The side-lobe canceller system consists of a main antenna and an auxiliary antenna. The main antenna is highly directional and is pointed in the desired signal direction.

It is assumed that the main antenna receives both the desired signal and, through its sidelobes, the interfering signals. The auxiliary antenna primarily receives the interfering signals, since it has very low gain in the direction of the desired signal. The auxiliary array weights are chosen such that they cancel the interfering signals that are present in the sidelobes of the main array response.

The idea is to make the interferer responses of the main and auxiliary antennas similar. When this happens, the overall response to the interferers cancels out, which can result in white noise [2].

2.4.2 Linearly Constrained Minimum Variance (LCMV) Beamforming

The basic idea behind linearly constrained minimum variance (LCMV) beamforming is to constrain the response of the beamformer so that signals from the direction of interest are passed with a specified gain and phase. The weights are chosen to minimize the output variance or power subject to the response constraint. This preserves the desired signal while minimizing contributions to the output from interfering signals and noise arriving from directions other than the direction of interest [4].
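In the usual textbook formulation, the constrained minimization above has a closed-form solution. The symbols below are not defined in this report and are introduced only for illustration: $R$ is the covariance matrix of the received array signal, $C$ is the constraint matrix whose columns are the steering vectors of the constrained directions, and $f$ is the vector of desired responses.

```latex
\begin{aligned}
\min_{w}\; & w^{H} R\, w \quad \text{subject to} \quad C^{H} w = f, \\
w_{\mathrm{LCMV}} &= R^{-1} C \left( C^{H} R^{-1} C \right)^{-1} f .
\end{aligned}
```

With a single constraint (one steering vector and unit response), this reduces to the minimum variance distortionless response beamformer.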

2.4.3 Null-steering Beamforming

Null steering represents a generalization of classical pattern synthesis techniques. The main beam that is formed points in the direction of the desired signal, and nulls are formed in the directions of strong undesired sources. The null synthesis method is based on utilizing the spatial phases obtained from controlling the positions of the antenna array elements [5].

2.4.4 Eigen-Beamforming

Eigen-beamforming aims to mitigate the disadvantages of traditional beamforming techniques by implementing a degree of diversity combining along with those techniques. This can be used to significantly enhance the performance of space and space-time processors and/or lead to significant complexity reductions and improved flexibility in a large variety of mobile radio environments [6].



Section 3

Beamforming Setup

3.1 Introduction

Figure 3.1: Adaptive beamforming setup

Figure 3.1 shows the setup of an adaptive beamforming system. The actual setup in the laboratory has 9 receivers and 1 transmitter, which is not shown in the figure. The beamforming system is based on the IEEE 802.11a OFDM standard [7]. The system consists of an RF frontend and a Baseband processor. The RF frontend performs the filtering, amplification (LNA) and downmixing of the incoming signal. The Baseband processor has the task of detecting and decoding data. In addition, the Baseband processor calculates the weights needed by the phase shifters to steer a beam in a desired direction. This, in principle, is the idea behind adaptive RF processing: some features of the RF frontend are controlled by parameters calculated in software during the baseband processing. In the following sections, we look at the transmitter functionality and then take a deeper look into the Baseband processor to gain insight into the tasks carried out in the system.



3.2 Transmitter

Figure 3.2: Transmitter block diagram

Figure 3.2 shows the components of the transmitter. It has been developed to be IEEE 802.11a compliant. The data packets sent are equipped with service fields. Training sequences precede every packet sent by the transmitter: 10 short training sequences and 2 long training sequences.

The transmitter also implements the following processing blocks:

• Data Scrambler

• Convolutional Coder with configurable code rate (1/2 , 2/3, 3/4)

• Puncturer

• Interleaver

• Symbol mapping/ Modulation (QPSK, 16QAM, 64QAM)

• Subcarrier mapping

• IFFT

• Cyclic prefix append

3.2.1 Training Sequences

The IEEE 802.11a standard [7] is used as a reference: 10 short training sequences (STS) and 2 long training sequences (LTS) are prepended to every packet that is sent. The short training sequence utilizes 16 sub-carriers of an OFDM symbol, while the long training sequence utilizes 64. Figure 3.3 shows the structure of an OFDM packet and the positions of the STS and LTS. The amplitude of the sub-carriers for the STS and LTS is 1.



Figure 3.3: OFDM packet structure

3.3 Receiver Baseband Processor

Figure 3.4: Baseband processor

Figure 3.4 shows the individual algorithms carried out in the Baseband processor as blocks. The individual blocks are explained in detail in the following sections.

3.3.1 Power Detection

The signal from the antenna is constantly sampled to detect an incoming signal. IEEE 802.11a is used as a reference standard, in which 10 short training sequences and 2 long training sequences are prepended to every packet that is sent. The power detection block determines the presence of a signal on the radio channel. This is implemented by an autocorrelation of the received signal. A power threshold is set, and depending on whether the result of the autocorrelation exceeds this threshold, the block decides whether there is an incoming signal. The threshold level is determined empirically from simulations that calculate the power level of incoming signals at different SNRs, in order to reduce the probability of false alarms and/or false detections.
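A minimal sketch of such a detector is given below. It correlates the received samples with a copy of themselves delayed by one training-sequence period and compares the averaged magnitude with a threshold. The lag, window length, threshold value and test signal are hypothetical; the report determines its threshold empirically from simulations.

```python
import cmath
import math
import random


def detect_packet(samples, lag=16, window=32, threshold=0.5):
    """Return the first index whose windowed lag-`lag` autocorrelation
    magnitude exceeds `threshold`, or None if no window does."""
    for start in range(len(samples) - lag - window + 1):
        corr = sum(samples[start + k + lag] * samples[start + k].conjugate()
                   for k in range(window))
        if abs(corr) / window > threshold:
            return start
    return None


rng = random.Random(1)


def noise(n, sigma=0.05):
    """Complex Gaussian noise samples with per-component deviation sigma."""
    return [complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n)]


# Hypothetical burst: 100 noise samples, then a unit-amplitude pattern of
# 16 samples repeated 10 times (standing in for the short training sequences).
pattern = [cmath.exp(2j * math.pi * rng.random()) for _ in range(16)]
preamble = [p + w for p, w in zip(pattern * 10, noise(160))]
received = noise(100) + preamble

first_hit = detect_packet(received)
noise_only = detect_packet(noise(300))
print(first_hit, noise_only)
```

Because the training pattern repeats every 16 samples, the correlation grows as soon as the window overlaps the preamble, so the detector fires shortly before sample 100, while noise alone never crosses the threshold.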



Figure 3.5: Power Detection Plot

Figure 3.5 shows a plot of the autocorrelation results. In this case, the block decides that there is an incoming signal, and the synchronization block is notified to carry out the coarse and fine frequency offset estimation.

3.3.2 Frequency and Timing Synchronization

The OFDM symbol in the IEEE 802.11a standard comprises 64 carriers, which share the channel bandwidth. The distance in the frequency band between the sub-carriers is called the carrier spacing, which is usually constant between all sub-carriers. Due to a mismatch between the oscillators in the transmitter and receiver, as well as channel impairments, a frequency offset $\delta f$ is introduced. This offset has to be determined in order to avoid adjacent channel interference (ACI).

When it has been determined that there is an incoming signal, the synchronization block starts to process the headers associated with each packet. The baseband receiver needs to identify the beginning of the packet in the signal.

The synchronization block determines the frequency offset $\delta f$ as well as the starting point of the packet. To determine the starting index, a number of correlations are carried out on the STS and LTS. We define a vector $s$ with 16 sub-carriers, $s = \{s_1, s_2, \ldots, s_{16}\}$.

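$$S_{kk}(\tau) = \sum_{k=1}^{16} s(k) \cdot s^{*}(k-\tau), \qquad \tau = 0, 1, 2, \ldots, \mathrm{SearchArea} \tag{3.1}$$

$$S_{STS} = \sum_{m=1}^{8} S_{kk}(l - 16 \cdot m) \tag{3.2}$$

$$I_{STS} = \mathrm{indexof}(\max(|S_{STS}|)) \tag{3.3}$$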



where $s(k)$ is the reference short training sequence (STS) and $s^{*}(k-\tau)$ is the conjugate of the received short training sequence (STS) delayed by $\tau$ sub-carriers.

In Equation 3.2 we fold and sum the correlation results for all but 2 of the 10 short training sequences [7]. The first 2 sequences are ignored because the stability of the receiver may be in question for the first two symbols after receiving noise. The index of the peak in the resulting vector $S_{STS}$ determines the start of the short training sequence. We implement a similar operation to determine the start of the long training sequence in Equations 3.4 to 3.6.

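$$S_{qq}(\tau) = \sum_{q=1}^{64} s(q) \cdot s^{*}(q-\tau), \qquad \tau = 0, 1, 2, \ldots, \mathrm{SearchArea} \tag{3.4}$$

$$S_{LTS} = \sum_{m=1}^{2} S_{qq}(l - 64 \cdot m) \tag{3.5}$$

$$I_{LTS} = \mathrm{indexof}(\max(|S_{LTS}|)) \tag{3.6}$$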

The starting index of the packet is determined as

$$I = I_{STS} + I_{LTS} \tag{3.7}$$
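The fold-and-peak search of Equations 3.1 to 3.3 can be sketched as follows. The indexing is an assumption made for illustration (zero-based offsets, folding forward from the candidate start rather than backward from $l$), and the toy burst contains exactly eight repetitions of a hypothetical reference sequence so that the peak is unique.

```python
import cmath
import math
import random


def find_sts_start(received, reference, repeats=8):
    """Index of the peak of the folded correlation against `reference`."""
    period = len(reference)
    last = len(received) - period * repeats

    def corr(tau):
        # Reference times the conjugate of the received samples at offset tau.
        return sum(reference[k] * received[tau + k].conjugate() for k in range(period))

    single = [corr(tau) for tau in range(len(received) - period + 1)]
    # Fold: add the correlation values that lie one period apart.
    folded = [sum(single[l + period * m] for m in range(repeats)) for l in range(last + 1)]
    return max(range(len(folded)), key=lambda l: abs(folded[l]))


rng = random.Random(7)
reference = [cmath.exp(2j * math.pi * rng.random()) for _ in range(16)]
# Hypothetical burst: 37 idle samples, 8 repetitions of the reference, 20 idle samples.
received = [0j] * 37 + reference * 8 + [0j] * 20

start = find_sts_start(received, reference)
print(start)
```

Only the candidate offset at which all eight folded correlations line up with a repetition collects the full correlation energy, so the peak marks the first repetition. The same routine with a 64-sample reference and `repeats=2` would mirror Equations 3.4 to 3.6.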

The estimation of the frequency offset happens in two phases. The first phase is called the coarse synchronization phase, because the estimate it yields is not accurate enough for decoding the OFDM data. The coarse offset is estimated according to Equation 3.8, where $F_s$ is the sample rate in Hz. The second phase, called the fine synchronization phase, is used to increase the accuracy of the coarse estimate. The fine frequency offset is determined in Equation 3.9, and the total estimated frequency offset is the combination of the coarse and fine frequency offsets (Equation 3.10).

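$$\delta f_{coarse} = \frac{F_s}{2 \cdot 16 \cdot \pi} \cdot \arg(S_{STS}(I_{STS})) \tag{3.8}$$

$$\delta f_{fine} = \frac{F_s}{2 \cdot 64 \cdot \pi} \cdot \arg(S_{LTS}(I_{LTS})) \tag{3.9}$$

$$\delta f = \delta f_{coarse} + \delta f_{fine} \tag{3.10}$$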
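The two-stage estimate of Equations 3.8 to 3.10 can be illustrated with a lag autocorrelation: a frequency offset rotates each repetition of a training sequence by a fixed phase relative to the previous one, and that phase is proportional to the offset. The sketch below is noise-free, and the sample rate, the offset value and the random training patterns are hypothetical.

```python
import cmath
import math
import random


def estimate_offset(samples, period, sample_rate):
    """Frequency offset from the phase of the lag-`period` autocorrelation."""
    corr = sum(samples[n + period] * samples[n].conjugate()
               for n in range(len(samples) - period))
    return sample_rate / (2 * math.pi * period) * cmath.phase(corr)


def rotate(samples, freq, sample_rate, first_index=0):
    """Apply a frequency shift of `freq` Hz to the samples."""
    return [s * cmath.exp(2j * math.pi * freq * (first_index + n) / sample_rate)
            for n, s in enumerate(samples)]


def unit(n, rng):
    """n unit-amplitude samples with random phases (stand-in training pattern)."""
    return [cmath.exp(2j * math.pi * rng.random()) for _ in range(n)]


rng = random.Random(3)
fs = 20e6                      # sample rate in Hz (assumed)
true_offset = 120e3            # hypothetical oscillator mismatch in Hz

sts = rotate(unit(16, rng) * 10, true_offset, fs)                   # 10 short sequences
lts = rotate(unit(64, rng) * 2, true_offset, fs, first_index=160)   # 2 long sequences

coarse = estimate_offset(sts, 16, fs)                          # Equation 3.8
lts_corrected = rotate(lts, -coarse, fs, first_index=160)
fine = estimate_offset(lts_corrected, 64, fs)                  # Equation 3.9
total = coarse + fine                                          # Equation 3.10
print(round(coarse), round(fine), round(total))
```

The short lag of 16 samples gives a wide unambiguous range ($\pm F_s/32$) but a noisier estimate in practice; the lag of 64 samples covers a narrower range with finer resolution, which is why it is applied only to the residual left after the coarse correction.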

3.3.3 Channel Estimation & OFDM Demapping

The long training sequence (LTS) is processed by the channel estimation block to determine the channel response vector for each receive antenna. First, the frequency offset in the received LTS is corrected. The LTS on each receive antenna consists of 2 OFDM symbols, which are averaged and normalized according to Equation 3.11, where $l_{1,n}$ and $l_{2,n}$ are the long training sequences for the n-th antenna. The averaged sequence is then transformed into the frequency domain by the Fast Fourier Transform (FFT) algorithm and multiplied with the reference LTS to generate the channel response vectors in Equation 3.12. The pilot carriers are removed, and the channel response vectors are used later to calculate the beamforming weights as well as the equalizer coefficients.

$$l_n = \frac{l_{1,n} + l_{2,n}}{2} \tag{3.11}$$

$$H_n = \mathcal{F}\{l_n\} \cdot L_{ref} \tag{3.12}$$

where $L_{ref} \subset \{-1, 1\}$ is the reference long training sequence.
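Equations 3.11 and 3.12 can be sketched for a single antenna as follows. A direct DFT stands in for the FFT to keep the example free of dependencies, and the reference sequence, the two-tap channel and the disturbance are hypothetical test data. Because every element of the reference is +1 or -1, multiplying by the reference removes it from the received spectrum.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Direct discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def estimate_channel(lts1, lts2, reference):
    """Average the two LTS symbols, transform, and multiply by the reference."""
    averaged = [(a + b) / 2 for a, b in zip(lts1, lts2)]       # Equation 3.11
    spectrum = dft(averaged)
    return [s * r for s, r in zip(spectrum, reference)]        # Equation 3.12


rng = random.Random(5)
reference = [rng.choice((-1, 1)) for _ in range(64)]           # hypothetical L_ref
channel = dft([1.0, 0.4j] + [0.0] * 62)                        # toy two-tap channel
clean = dft([h * r for h, r in zip(channel, reference)], inverse=True)
ripple = [complex(rng.gauss(0, 0.01), rng.gauss(0, 0.01)) for _ in range(64)]
lts1 = [c + e for c, e in zip(clean, ripple)]
lts2 = [c - e for c, e in zip(clean, ripple)]                  # cancels in the average

estimate = estimate_channel(lts1, lts2, reference)
worst = max(abs(a - b) for a, b in zip(estimate, channel))
print(worst)
```

The opposite-signed disturbance on the two symbols is an artificial way to show the benefit of averaging; with independent noise, averaging would halve the noise power rather than remove it.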

3.3.4 Beamformer

This block calculates the weights that are needed to update the phase shifters on the analog circuits in the RF frontend. The carriers in the channel response vectors for each receive antenna are averaged over frequency. Subsequently, the vector resulting from this operation is multiplied by its complex-conjugate version to generate an $N \times N$ matrix, where $N$ is the number of receive antennas.

$$H_{avg,n} = \frac{1}{M} \sum_{m=1}^{M} H_{n,m}$$

$$W_n = \max(\mathrm{eig}(H^{*}_{avg,n} * H_{avg,n})) \tag{3.13}$$

where $M$ is the number of subcarriers.

$W_n$ is the dominant eigenvector of $H^{*}_{avg,n} * H_{avg,n}$, and this vector is used as the weights for the receive antennas. The weights are sent on a feedback line to the RF frontend and are also used, in combination with the channel response vector, to calculate the equalizer coefficients.
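The weight calculation can be sketched with a power iteration, which converges to the dominant eigenvector of the $N \times N$ matrix. Power iteration is used here only as a dependency-free stand-in; the report does not state which eigen-solver it uses. The channel values and the orientation of the outer product are likewise assumptions.

```python
import math


def dominant_eigenvector(matrix, iterations=50):
    """Power iteration: repeatedly multiply by the matrix and renormalize."""
    n = len(matrix)
    v = [1 + 0j] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        v = [x / norm for x in v]
    return v


# Hypothetical channel responses H[n][m]: 3 receive antennas, 4 subcarriers.
H = [
    [1.0 + 0.2j, 0.9 + 0.1j, 1.1 + 0.3j, 1.0 + 0.2j],
    [0.5 - 0.5j, 0.6 - 0.4j, 0.4 - 0.6j, 0.5 - 0.5j],
    [-0.2 + 0.8j, -0.1 + 0.7j, -0.3 + 0.9j, -0.2 + 0.8j],
]
h_avg = [sum(row) / len(row) for row in H]                     # average over frequency
R = [[hi * hj.conjugate() for hj in h_avg] for hi in h_avg]    # N x N outer product
weights = dominant_eigenvector(R)

h_norm = math.sqrt(sum(abs(h) ** 2 for h in h_avg))
alignment = abs(sum(w.conjugate() * h for w, h in zip(weights, h_avg))) / h_norm
print(alignment)
```

Because the matrix is the outer product of a single vector, its dominant eigenvector is the averaged channel vector itself, normalized to unit length, so the alignment printed at the end is 1. The iteration would fail to start only if the initial all-ones vector were orthogonal to the averaged channel vector.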

3.3.5 Carrier tracking

The main function of the carrier tracking block is to eliminate inter-(sub)carrier interference (ICI) by correcting the common phase error (CPE), which is caused by phase noise.

Figure 3.6: Carrier Tracking control loop

Figure 3.6 illustrates the algorithms carried out by the carrier tracking loop. The signal coming from the analog front-end (AFE) is sampled by the ADC. The signal $s(t)$ has a phase offset $\delta\phi$ and a frequency offset $\delta f$, so the sampled signal can be written as $s(t)\,e^{-j(\omega t + \delta\phi)}$. The frequency offset, which corresponds to the $\omega t$ term in the exponent, is corrected by the synchronization block through multiplication by $c(t) = e^{j\omega t}$. The signal is transformed to the frequency domain by the FFT, and the OFDM demapper then extracts the pilots that were embedded in the signal. There are four pilots, each with an amplitude of 1; with an averaging function, the phase offset $dP = e^{j\delta\phi}$ is determined in the phase detect block and subsequently corrected in the CPE correction block.

The phase offset calculated as described above depends only on the four pilots in the current symbol. To increase the accuracy of the offset, the 2nd-order control loop in Figure 3.6 is used. The loop consists of one component, $c_2 \cdot \delta\phi$, which incorporates the phase offset calculated from the current pilots, and a second component, $a_n$, shown in Equation 3.14, which incorporates the phase offsets calculated from the pilots in previously received symbols. Simulations were carried out to determine suitable values for the coefficients $c_1$ and $c_2$.

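$$a_n = \frac{c_1 \cdot \delta\phi}{1 - Z^{-1}} \tag{3.14}$$

$$a_n = a_{n-1} + c_1 \cdot \delta\phi \tag{3.15}$$

where $a_{n-1} = a_n \cdot Z^{-1}$

$$b_n = a_n + c_2 \cdot \delta\phi \tag{3.16}$$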

Both components of the phase offset are combined in Equation 3.16. When the 2nd-order control loop is used, the variable $c(t)$ in Figure 3.6 now includes the phase offset component, $e^{j(\omega t + \delta\phi)}$, which should, ideally, correct the symbol $x(t)$.

The phase offset for the next symbol is updated, and the process is repeated for the rest of the symbols in the packet.
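The loop of Equations 3.15 and 3.16 can be sketched as below. The wiring is an assumption about Figure 3.6: the correction $b_n$ is applied to the next symbol, and $\delta\phi$ is the phase error still visible on the pilots after that correction. The coefficient values and pilot values are hypothetical; the report determines $c_1$ and $c_2$ by simulation.

```python
import cmath


def pilot_phase_error(received_pilots, known_pilots):
    """Average the pilots and return the common phase error in radians."""
    total = sum(r * k.conjugate() for r, k in zip(received_pilots, known_pilots))
    return cmath.phase(total)


def track_phase(symbol_phases, c1=0.5, c2=0.3):
    """Run the loop of Equations 3.15 and 3.16 over per-symbol phase offsets."""
    a = 0.0            # integrator state a_{n-1}
    b = 0.0            # correction applied to the current symbol
    residuals = []
    for phase in symbol_phases:
        delta = phase - b           # error left after the current correction
        residuals.append(delta)
        a = a + c1 * delta          # Equation 3.15
        b = a + c2 * delta          # Equation 3.16
    return residuals


known = [1, 1, 1, -1]                                   # hypothetical pilot values
rotated = [k * cmath.exp(0.2j) for k in known]
measured = pilot_phase_error(rotated, known)            # close to 0.2 rad

residuals = track_phase([0.4] * 60)                     # constant 0.4 rad offset
print(measured, residuals[0], abs(residuals[-1]))
```

For a constant phase offset the residual error decays towards zero as the integrator term $a_n$ takes over the full correction, while the proportional term $c_2 \cdot \delta\phi$ reacts to the most recent measurement.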

3.3.6 Equalizer

Figure 3.7: Equalizer Block Diagram

After common phase error (CPE) correction, the data in the signal is ready to undergo equalization and demodulation. The equalizer block carries out these tasks. The OFDM symbols from the various receive antennas are first combined into one stream by summing the streams with different degrees of attenuation, which have been calculated by the beamforming block. The combined stream is then demodulated from a 64-QAM constellation into soft-bits, which are subsequently equalized with coefficients also received from the beamforming block.
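The combining step can be illustrated with a simplified sketch. It departs from the order described above: instead of equalizing soft bits after demapping, it divides each combined subcarrier by the effective channel (a zero-forcing assumption made only to keep the example short). The weights, channel values and transmitted symbols are hypothetical.

```python
def combine_and_equalize(streams, weights, channels):
    """Weighted sum over antennas, then one complex division per subcarrier."""
    n_sub = len(streams[0])
    out = []
    for k in range(n_sub):
        combined = sum(w.conjugate() * r[k] for w, r in zip(weights, streams))
        effective = sum(w.conjugate() * h[k] for w, h in zip(weights, channels))
        out.append(combined / effective)
    return out


# Hypothetical data: 2 receive antennas, 3 subcarriers, 64-QAM grid points.
sent = [3 + 5j, -7 + 1j, 1 - 3j]
channels = [
    [1.0 + 0.5j, 0.8 - 0.2j, 1.2 + 0.1j],
    [0.3 - 0.7j, 0.5 + 0.4j, -0.6 + 0.2j],
]
weights = [0.8 + 0.1j, 0.5 - 0.3j]
streams = [[h * x for h, x in zip(row, sent)] for row in channels]

recovered = combine_and_equalize(streams, weights, channels)
worst = max(abs(a - b) for a, b in zip(recovered, sent))
print(worst)
```

In this noise-free case the transmitted constellation points are recovered exactly; with noise, the choice of weights determines how much each antenna contributes to the combined signal-to-noise ratio.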

3.3.7 Receiver Chain Analysis

The initial analysis was done on the blocks in the receiver chain. The aim is to obtain an approximate operation count, identify the bottlenecks and suggest solutions to resolve them.

The functions analyzed are:

• Synchronization

• Frequency offset correction

• Channel Estimation

• Sub-carrier demapping

• Carrier tracking loop

• Equalizer and QAM-Demapping

• De-interleaver

• Depuncturer

• Viterbi decoder

Baseband Processor Profile

The Matlab profiler is used to present an overview of the time spent in the functions. Figure 3.8 shows a picture with the results from the profiler.



Figure 3.8: Matlab Profiler

Figure 3.9 shows an estimated number of operations for the blocks in the receiver chain. The estimate was obtained by going through every block and counting the number of memory loads, memory stores and operations needed by the block. Loops are also taken into account by estimating the number of times a loop will be executed; if this cannot be estimated, a worst-case scenario is used. Since the counts are worst-case, the actual operation counts are expected to be lower, but the relative factors between blocks should stay roughly the same. This gives an impression of which blocks are likely to be bottlenecks.



Figure 3.9: Operation Count Estimation

Bottlenecks

This section identifies the bottlenecks and proposes solutions to resolve them.

The first bottleneck is identified as the synchronization block. This is a result of the moving windows that slide over the signal for correlation. The function implements traditional DSP functions (correlations) and can be optimized by tuning the code and exploiting the pipelining capabilities of a DSP.

The second bottleneck identified is the Viterbi decoder, which contains many conditional executions that are not traditional DSP operations such as multiply-accumulates.

The third bottleneck which can be identified at this stage is the method of acquiring data onto the hardware (I/O). Data coming in from the ADC may not arrive at the same rate at which the processor can process it.



Section 4

Algorithm Implementation

4.1 DSP Selection

4.1.1 Evaluating Processor Performance

There are several ways of evaluating the performance of a processor or, rather, of choosing the processor that best suits the designer’s application. DSP vendors supply a number of metrics describing the performance of their processors, some of which are explained below.

MIPS

MIPS stands for Millions of Instructions per Second. This is the number of instructions, in millions, that a processor can carry out in one second with respect to its instruction set. With the cycle time expressed in seconds, it can be calculated as

$$\text{MIPS} = \frac{\text{Instructions per Cycle}}{\text{Cycle Time} \times 10^{6}} \tag{4.1}$$

MOPS/MFLOPS

MOPS/MFLOPS stands for Millions of Operations per Second/Millions of Floating-Point Operations per Second. The MFLOPS metric is used when the DSP in question is a floating-point DSP.

MMACS

MMACS stands for Millions of Multiply-Accumulates per Second. The multiply-accumulate (MAC) is a central signal processing operation that occurs in blocks like filters, correlation, convolution etc.

Application Benchmarking

A common approach used to benchmark computer systems is to use complete applications, or even suites of applications. Application benchmarking allows the memory usage and energy consumption performance of a processor to be measured.



Algorithm Benchmarking

Algorithm blocks are the building blocks of most signal processing systems and include functions such as fast Fourier transforms, vector additions, filters etc. Each algorithm in the application is simulated independently, measuring the performance of each DSP. In this way the critical portions of the application can be highlighted. This method is very practical because it lets the developer determine the DSP that suits the application.

4.1.2 Choosing a processor

The metrics (MIPS, MFLOPS, MMACS) can be vague and misleading because DSPs have different instruction sets, especially DSPs from different vendors, and the amount of work carried out by one instruction varies greatly from one DSP to another. As for the MAC instruction, there are many other operations a DSP must carry out in an application, so depending on the MMACS metric alone would be insufficient. Moreover, these metrics do not address secondary performance issues like memory usage and power consumption. This is a severe limitation because execution time means little if application memory requirements exceed system constraints.

Furthermore, if high memory consumption requires using slower external memory, then the processor’s speed may be reduced. Likewise, in a portable application, a processor is useless if its power consumption exceeds the available battery capacity. Many manufacturers quote a typical power consumption at a given clock rate. However, power consumption varies with different instructions and data values, so such specifications are suspect without details on the precise instructions and data used in the measurement. Furthermore, such measurements do not account for special power-saving modes available when a processor (or portions of it) is idle.

The metrics MOPS, MIPS and MFLOPS are not totally useless. They can be used, for instance, to familiarize oneself with the DSPs and to sort the DSPs into groups, from which candidates are selected for testing using a mixture of application and algorithm benchmarking.

We chose to test two DSPs for the beamforming system.

• Texas Instruments TMS320C6713

• Texas Instruments TMS320C6455

These DSPs were chosen because of their high performance capabilities. The TMS320C6455 operates at a clock rate of 1 GHz. It also has dedicated hardware to carry out Viterbi processing. The TMS320C6455 was the preferred processor because it is a fixed-point processor and the implementation would execute faster than on its floating-point counterpart, but due to a delay in the order process, the availability of the TMS320C6713 and the similarity in the development steps for both processors, the TMS320C6713 was also added to the list. We discuss the properties and implementation steps taken for both processors in the next sections.

4.2 C/Floating Point Implementation

This section describes the procedure which was followed to convert the existing algorithms in Matlab into the C programming language. The algorithms were divided into functional blocks to organize the processing. Figure 4.1 shows a UML 2.0 top level class diagram of the baseband receiver. The Unified Modeling Language (UML) uses graphical notations to create an abstract model of a system [10]. UML class diagrams show the classes of the system, their inter-relationships, and the operations and attributes of the classes. Figures 4.1, 4.2 and 4.3 show a model of the classes which are used in the program.

Figure 4.1: Baseband Receiver- Top Level Class Diagram

The cluster block Data Generator simulates the data acquisition by generating random data. The Header Processing cluster includes all the functions which process the training sequences, and the Data Processing cluster, as implied, processes the actual data in the received packets. The Utilities cluster includes functions needed by various blocks in all clusters, e.g. the FFT needed in the channel estimation as well as the symbol tracking.

4.2.1 Header Processing

Figure 4.2 illustrates in depth the functions carried out by the blocks in the header processing cluster. The diagram shows the attributes of each block followed by the operations/functions and the input parameters accepted by each function.



Figure 4.2: Header Processing - Class Diagram

4.2.2 Data Processing

Figure 4.3 illustrates in depth the functions carried out by the blocks in the data processing cluster. At this point, we highlight the control structures which are given as parameters to the blocks and which are identified by XX_CTRL, where XX is the name of the block. These act as a type of message passing between functions as well as initialization parameters for the specific functions. The structure CONFIG_CTRL holds important configuration parameters and data needed by the baseband receiver, including the channel estimation matrix, the frequency offset and the beamforming weights.

Figure 4.3: Data Processing - Class Diagram

4.2.3 Configuration Utilities

The Utilities cluster includes functions needed by various blocks in all clusters. It starts with the bf_config block, which implements, among others, the functions needed to multiply complex numbers, multiply complex conjugate versions, calculate the matrix transpose etc. The control structures consist of the configuration structures used in all blocks in the chain. Finally, the FFT block uses the DSP FFT signal libraries for an efficient FFT implementation.



Figure 4.4: Configuration - Class Diagram

4.2.4 Baseband Processor - Communication

Figure 4.5 shows the flow of communication between the individual blocks in the baseband processor. The data generator sends the incoming packets to the synchronization block as well as to the symbol synchronizer, which unpacks the data field into parallel streams for processing. To do this, it needs the index of the starting point of the data, which it receives from the synchronization block. The synchronization block also processes the headers and passes the long training sequence (LTS) to the channel estimation block, which in turn calculates the channel matrix for the beamforming block to calculate the weights which are sent to the analog circuits at the receiver front-end. Other outputs from the beamforming block include the coefficients needed by the equalizer.

Figure 4.5: Baseband Processor - Communication Diagram



4.3 Fixed Point Implementation

4.3.1 The Q. Format

Most fixed-point DSPs have the 16-bit data width shown above. This data structure is suitable for representing fractional data types, i.e. numbers between -1 and 1, which occur frequently in DSP applications, and also integer types, i.e. whole numbers between -32768 and 32767 or non-negative numbers between 0 and 65535. The range depends on how the variable is interpreted, i.e. unsigned or signed.

For the fractional data types, the range [-1,1] can be extended, although at the cost of precision because the distance between representable fractions is increased. Figure 4.6 shows the Q.15 format, which is also applied by Texas Instruments for its fractional data types. This format has been extended to represent fractions in the range [-8,8] by using the Q3.12 format illustrated in Figure 4.7.

Figure 4.6: Fractional Q.15 Format

Figure 4.7: Fractional Q3.12 Format

$$\text{Q.15 number} = -a_0 2^{0} + a_1 2^{-1} + \cdots + a_{15} 2^{-15} \tag{4.2}$$

In a Qm.n format, m bits are used to represent the two's complement integer portion of the number and n bits are used to represent the fractional portion. m+n+1 bits are needed to store a general Qm.n number; the extra bit stores the sign of the number in the most significant bit position. The representable range is $[-2^{m}, 2^{m})$ and the finest fractional resolution is $2^{-n}$. For example, in the Q3.12 format the finest fractional resolution is $2^{-12} = 2.441 \times 10^{-4}$ and for the Q.15 format it is $2^{-15} = 3.05 \times 10^{-5}$. The Q.15 number is assembled according to Equation 4.2.
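To make the Q notation concrete, the following sketch converts between real values and 16-bit Q words with a configurable number of fractional bits (15 for Q.15, 12 for Q3.12). The helper names `float_to_q` and `q_to_float` are hypothetical; rounding to nearest and saturation at the ends of the range are assumptions of this sketch, not a description of the project code.

```c
#include <stdint.h>

/* Convert a real value to a 16-bit Q word with frac_bits fractional bits.
 * Values outside the representable range are saturated. */
int16_t float_to_q(float x, int frac_bits)
{
    float scaled = x * (float)(1L << frac_bits);
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;   /* round to nearest */
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)scaled;
}

/* Convert a 16-bit Q word back to a real value. */
float q_to_float(int16_t q, int frac_bits)
{
    return (float)q / (float)(1L << frac_bits);
}
```

For example, 0.5 maps to 16384 in Q.15, while 2.5 cannot be represented in Q.15 but maps to 10240 in Q3.12.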

Negative numbers are represented using the two’s complement, which is calculated by performing an XOR operation between the positive number and the all-ones word, in this case 65535, and then adding 1.
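The XOR-and-add-one rule can be written directly in C. The function name `twos_complement16` is chosen here for illustration only.

```c
#include <stdint.h>

/* Two's complement negation of a 16-bit word:
 * invert all bits (XOR with 65535) and add 1, keeping the low 16 bits. */
uint16_t twos_complement16(uint16_t x)
{
    return (uint16_t)((x ^ 0xFFFFu) + 1u);
}
```

Applied to 16384 (0.5 in Q.15), this gives 0xC000, which is -16384 (-0.5 in Q.15) when read as a signed word.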

One disadvantage of fixed-point implementations is their susceptibility to overflows, because the position of the binary point is fixed. For example, when two 16-bit fixed-point numbers are multiplied in hardware, the result is a 32-bit number, which is wider than the 16 bits normally used to store a number. The result has to be scaled down to 16 bits, which may change the value. Likewise, when fixed-point numbers are added together, great care has to be taken to ensure that the result is not out of range.
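A sketch of the multiplication case for Q.15 operands follows. The name `q15_mul` is hypothetical, and rounding before the shift is a choice made for this example. The code assumes that right-shifting a negative value is an arithmetic shift, which holds for the common compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Multiply two Q.15 numbers. The 32-bit product has 30 fractional bits
 * and is scaled back to 15 fractional bits before it is stored. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q.30 intermediate result  */
    p = (p + (1 << 14)) >> 15;             /* round and rescale to Q.15 */
    if (p > INT16_MAX) {
        p = INT16_MAX;                     /* only (-1) * (-1) overflows */
    }
    return (int16_t)p;
}
```

The only product that does not fit is (-1) x (-1) = +1, which lies just outside the Q.15 range and is therefore clipped to the largest positive value.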

4.3.2 System Analysis

To develop a fixed-point implementation of the baseband processor, we need to analyze each function and variable. The results of this analysis determine the range of each variable so that a suitable Q format can be assigned to it.

In the following sections, we go through the Data Generator, Synchronization and Channel Estimation functions and discuss the issues that needed to be dealt with. We chose these functions to illustrate the fixed-point implementation because they contain notable issues to be discussed; the other implemented functions contain the same issues and are not listed here to avoid repetition.

Data Generator

The input to the baseband processor is a vector of complex numbers with In-phase (I) and Quadrature (Q) components. These numbers are converted to the Q3.12 format, which has a range from -8 to +8, which is also the range of the A/D converter. This format is chosen with knowledge of the format of the data that will be acquired from the A/D converter.

Synchronization

The main task of the synchronization is to determine the start of the data through the use of correlations. An issue that had to be solved was the possibility of overflow in the correlation results. The operands of the correlation are the input signals, which are in the Q3.12 format and are 16 bits wide. The results of the correlations are stored in a Q19.12 format, which is 32 bits wide.

Another issue we encountered was the determination of the fine frequency offset. After correlation, the phase difference between peaks is needed to calculate the offset. The quantization errors induced by the correlation as well as the errors induced by the fixed-point trigonometric implementation led to an unacceptable error in the fine frequency offset. To solve this problem, the trigonometric operations are carried out in floating point.
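The widening from Q3.12 operands to a Q19.12 result can be sketched as follows. The example is simplified to real-valued sequences, whereas the receiver works on complex I/Q samples, and the function name `correlate_q3_12` is hypothetical. As in the earlier sketches, an arithmetic right shift is assumed for negative values.

```c
#include <stdint.h>
#include <stddef.h>

/* Correlate two real Q3.12 sequences and return the sum in Q19.12.
 * Each 16 x 16 bit product has 24 fractional bits; shifting by 12
 * brings it back to 12 fractional bits before accumulation. */
int32_t correlate_q3_12(const int16_t *x, const int16_t *y, size_t len)
{
    int32_t acc = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        int32_t prod = (int32_t)x[i] * (int32_t)y[i];
        acc += prod >> 12;
    }
    return acc;
}
```

Each term has a magnitude of at most 2^18 after the shift, so the 32-bit accumulator leaves headroom for a few thousand terms before it can overflow.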

Channel Estimation

Since the implementation platform was known at this stage to be a Texas Instruments DSP, we decided to use the FFT libraries for the channel estimation block. The libraries were modified to enable an implementation for various Q formats. In the channel estimation block, the FFT library was modified for the Q3.12 format.

Beamforming

To determine the equalization coefficients, the intermediate results were expanded to a 32-bit Q4.28 format. The actual coefficients are stored in a Q19.12 format.

A fixed-point implementation to calculate the beamforming weights did not generate accurate weights. The quantization errors did not allow for correct numerical processing, so the calculation of the eigenvalues of the channel matrix is carried out in floating point.

Figure 4.8 shows which functional blocks run on the different hardware platforms.

Figure 4.8: Setup Architecture

4.3.3 Performance Analysis

The existing implementation was done with Matlab and had been thoroughly tested. The goal at this stage was to compare the results of the Matlab simulations with the new fixed-point simulations to determine the implementation loss introduced by fixed-point processing.

To carry out this process, the individual blocks were fitted with MEX interfaces and compiled as libraries so they could be accessed from Matlab. The simulations tested the fixed-point algorithms including the frequency offset estimation, carrier tracking and phase correction.

The parameters of the simulation environment were

• Data rate: 54 Mbit/s

• Channel model: 50ns delay spread channel

• 10ppm carrier offset



Figure 4.9: Packet Error Rate vs. SNR for Fixed-point, Floating-point and Matlab Implementations

In Figure 4.9, the packet error rate (PER) is plotted against the signal-to-noise ratio (SNR) of the input signal. The different sets in the plot show the performance of the algorithms that were implemented in Matlab as well as the floating-point and fixed-point implementations. The maximum packet error rate that we consider as required is $10^{-2}$, but we simulate past this rate to determine where the floors occur in all simulations. The differences between the floors of the three datasets come from the different quantization levels. The fixed-point and floating-point implementations are 16 and 32 bits wide respectively, compared with the 64-bit wide Matlab implementation. We conclude from the simulations that there is no implementation loss due to the fixed-point implementation at the relevant signal-to-noise ratios (SNR).

4.3.4 Data Acquisition

To simulate the receiver on the DSP, we decided to integrate the DSP with Matlab in a so-called Hardware-in-the-Loop architecture. The transmitter functions as well as a channel model were already available in Matlab, so we modified the Matlab simulations to transmit each successive packet to the DSP; the DSP processes the packets and sends the beamforming weights as well as the equalizer output back to Matlab. To do this, we replaced the data generation block in Figure 4.5 with a data acquisition block. The data acquisition block used the Real-Time Data Exchange (RTDX) interface of Texas Instruments to implement the transfer.

Real-Time Data Exchange (RTDX) is a technology developed by Texas Instruments that enables real-time bi-directional communication between a digital signal processor (DSP) or microcontroller and a host application. RTDX consists of both target and host components. A small RTDX software library runs on the DSP, and the DSP application makes function calls to this library’s API in order to pass data to or from it. This library makes use of a scan-based emulator to move data to or from the host (Matlab) platform via a USB interface. Data transfer to Matlab occurs in real time while the target application is running.

4.4 TMS320C6713 DSP Implementation

The TMS320C6713 devices compose the floating-point DSP generation in the TMS320C6000 DSP platform. The C6713 device is based on the high-performance, advanced very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI) [8].

Operating at 225 MHz, the C6713 delivers up to 1350 million floating-point operations per second (MFLOPS), 1800 million instructions per second (MIPS), and with dual fixed-/floating-point multipliers up to 450 million multiply-accumulate operations per second (MMACS).

The TMS320C67x DSP generation is supported by benchmark development tools, including an optimizing C/C++ Compiler, the Code Composer Studio Integrated Development Environment (IDE), JTAG-based emulation and real-time debugging, and the DSP/BIOS kernel.

Some features of the TMS320C6713 include

• High-Performance Floating-Point Digital Signal Processor (DSP)

• Eight 32-Bit Instructions/Cycle

• 32/64-Bit Data Word

• 225-MHz Clock Rate

• Very Long Instruction Word (VLIW) architecture

• Instruction Set Features Native Instructions for IEEE 754 format

• 512M-Byte Total Addressable External Memory Space
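As a rough cross-check of the quoted peak figures, the rates follow from the clock rate and the per-cycle resources listed above. This is a back-of-the-envelope sketch which assumes that all eight issue slots and both multipliers are busy in every cycle:

```latex
8\ \tfrac{\text{instructions}}{\text{cycle}} \times 225\ \text{MHz} = 1800\ \text{MIPS},
\qquad
2\ \tfrac{\text{MACs}}{\text{cycle}} \times 225\ \text{MHz} = 450\ \text{MMACS}
```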

4.4.1 Issues

This section describes some issues encountered when porting the software code onto the DSP.

The first is the implementation of the software stack. While implementing under a Windows PC environment, the stack size did not pose a problem because of the readily available memory space. On the TMS320C6713 DSP, there is only 4KB available for the program cache and, without any modifications, the implementation caused a stack overflow. The synchronization block was identified as being exceptionally problematic in this regard: the long vectors needed to store the correlation results caused the stack overflow. The solution was to restructure the code and only calculate and keep in memory the vectors that were needed immediately. This led to some increase in processing time, but within reasonable limits.
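One way to avoid holding a long correlation vector on the stack is to compute each correlation value on the fly and keep only the running maximum. The sketch below illustrates that idea only; it is not the restructuring that was actually applied, the function name `find_sync_peak` is hypothetical, and real-valued Q3.12 samples are assumed for brevity.

```c
#include <stdint.h>
#include <stddef.h>

/* Return the lag at which the reference sequence correlates best with
 * the received samples. No correlation vector is stored: each value is
 * computed, compared with the best one so far and then discarded. */
size_t find_sync_peak(const int16_t *rx, size_t rx_len,
                      const int16_t *ref, size_t ref_len)
{
    size_t best_lag = 0;
    int32_t best_val = INT32_MIN;
    size_t lag, i;

    if (rx_len < ref_len) {
        return 0;
    }
    for (lag = 0; lag + ref_len <= rx_len; lag++) {
        int32_t acc = 0;
        for (i = 0; i < ref_len; i++) {
            acc += ((int32_t)rx[lag + i] * ref[i]) >> 12;  /* Q19.12 */
        }
        if (acc > best_val) {
            best_val = acc;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

The memory footprint is a handful of scalars regardless of the packet length, at the cost of recomputing each window from scratch.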



4.5 TMS320C6455 DSP Implementation

The TMS320C6455 device is a high-performance fixed-point DSP in the TMS320C6000 DSP platform. It has a performance of up to 8000 million instructions per second (MIPS) [or 8000 16-bit MMACS] at a clock rate of 1 GHz [9].

Figure 4.10: TMS320C6455 Development Starter Kit

Figure 4.10 shows the development board we used to implement the receiver algorithms. It carries a C6455 DSP, which is highlighted in the figure, and has a USB connection to the host PC used to control the board and other peripherals.

Although not implemented, the receiver chain algorithm included a Viterbi decoder, and the C6455 DSP was chosen because it has an embedded Viterbi decoder coprocessor.

The C6455 has a complete set of development tools which includes a new C compiler, an assembly optimizer to simplify programming and scheduling, and a Windows® debugger interface for visibility into source code execution.

Other features of the TMS320C6455 include:

• 1-GHz Clock Rate

• Eight 32-Bit Instructions/Cycle

• 8000 MIPS/MMACS (16-Bits)

• L1/L2 Memory Architecture:

• 256K-Bit (32K-Byte) Program Memory

• 256K-Bit (32K-Byte) Data Memory

• 16M-Bit (2048K-Byte) RAM

• 32M-Byte Total Addressable External Memory Space

4.5.1 Profiling and Optimization

This section explains the procedure of examining and optimizing the beamforming DSP software in the Texas Instruments Integrated Development Environment (IDE) called Code Composer Studio. Two tools which we used extensively to examine the software were the Tuning Dashboard and the Compiler Consultant.

4.5.2 The Tuning Dashboard

The Tuning Dashboard provides a focal point for all profile data to be collected and viewed, suggestions for which tuning tool to use next, and the standard place from which to launch each of the tuning tools. It consists of the Goals window, the Profile Setup and the Profile Viewer.

The Goals window allows you to indicate your tuning priorities. It shows the number of cycles used by the application and the code size each time the application is run. It is used to compare the performance between runs after an optimization step has been taken.

Figure 4.11: Code Profiling - Goals Window

The Profile Setup window allows you to determine the general or specific items for which you want data collected. This includes setting which functions or loops are to be profiled, specifying the program ranges for which data is to be collected and setting halt or exit points in the application. After the profile has been set up, the application can be run to acquire information on its performance.



Figure 4.12: Code Profiling - Profile Setup

The procedure we followed for optimizing the application to exploit the CPU capabilities is as follows.

• Compile the program with full optimization enabled and include symbolic debugging information.

• Set up all functions as profile ranges and run the application to generate profile data on the execution of the entire program.

• If the application already meets the real-time constraints, no optimization is needed; this was not the case here.

• Use the Profile Viewer to identify the bottlenecks. This procedure is explained below. Set up more profile ranges to further subdivide the offending function and run the application to generate profile data.

• Repeat the previous step until the function(s) or loop(s) causing the bottlenecks are found and then rewrite the code using the advice given by the Compiler Consultant.

• Repeat all steps.

The data collected during profiling is displayed in the Profile Viewer window. For all functions or loops that have been set up for profiling, it displays the access count and the number of cycles used by the function itself, both including and excluding all sub-functions. Figure 4.13 shows the Profile Viewer window for the synchronization block after it was identified as a bottleneck.



Figure 4.13: Code Profiling - Profile Viewer

Figure 4.14: Code Profiling - Profile Cycle Count

Figure 4.14 shows the initial cycle count for the complete receiver chain without any optimization steps taken.

4.5.3 Optimization

Several optimization steps were taken to reduce the number of cycles needed by the receiver algorithms. This section explains the steps which were implemented.

Function Inlining

The compiler cannot optimize loops that contain function calls, so function calls are avoided in loops. Small functions are instead declared inline to maintain the readability of the code.
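A small helper declared `static inline` keeps the loop body readable while allowing the compiler to expand it in place. The sketch below is illustrative only; the type `cplx16_t` and the function names are hypothetical, and Q3.12 samples with an arithmetic right shift are assumed.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { int16_t re; int16_t im; } cplx16_t;

/* Squared magnitude of a Q3.12 complex sample, result in Q19.12.
 * Declared inline so that no call remains inside the loop below. */
static inline int32_t cplx_power(cplx16_t z)
{
    return (((int32_t)z.re * z.re) >> 12) + (((int32_t)z.im * z.im) >> 12);
}

/* Sum of the squared magnitudes of a block of samples. */
int32_t block_energy(const cplx16_t *x, size_t len)
{
    int32_t acc = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        acc += cplx_power(x[i]);
    }
    return acc;
}
```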

Alignment

When the compiler can determine that pointers to data are aligned on certain address boundaries, usually 4-byte or 8-byte boundaries, it can combine multiple operations together into a single instruction. In some cases, information about pointer alignment has to be given to the compiler to enable it to optimize the code.
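With the TI compiler, alignment information is commonly passed through the `_nassert` intrinsic. The sketch below wraps it in a hypothetical macro, `ASSUME_ALIGNED8`, that expands to nothing on other compilers so the code stays portable; the function name `vec_add16` is also an assumption of this example.

```c
#include <stdint.h>

/* _nassert is a TI compiler intrinsic; elsewhere the macro is a no-op. */
#ifdef __TI_COMPILER_VERSION__
#define ASSUME_ALIGNED8(p) _nassert(((int)(p) & 7) == 0)
#else
#define ASSUME_ALIGNED8(p) ((void)(p))
#endif

/* Element-wise sum of two 16-bit vectors. The caller promises that all
 * three buffers start on an 8-byte boundary. */
void vec_add16(const int16_t *a, const int16_t *b, int16_t *out, int n)
{
    int i;
    ASSUME_ALIGNED8(a);
    ASSUME_ALIGNED8(b);
    ASSUME_ALIGNED8(out);
    for (i = 0; i < n; i++) {
        out[i] = (int16_t)(a[i] + b[i]);
    }
}
```

The promise has to be true at run time: on the TI tools the buffers themselves would typically be placed with `#pragma DATA_ALIGN(buffer, 8)`.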

Eliminate Aliasing

Aliasing is when two or more memory references, one of which is typically through a pointer, may access the same memory location. When the compiler can determine that these memory references do not alias, optimization often improves. The "restrict" keyword is used to inform the compiler that memory references do not overlap.
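A minimal example of the keyword follows, assuming a C99-capable compiler. The function name `scale_q15` is hypothetical and the arithmetic right shift on negative values is again assumed.

```c
#include <stdint.h>
#include <stddef.h>

/* Scale a Q.15 vector by a Q.15 gain. The restrict qualifiers promise
 * that in and out never overlap, so loads from in may be scheduled
 * ahead of earlier stores to out. */
void scale_q15(const int16_t *restrict in, int16_t *restrict out,
               int16_t gain, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        out[i] = (int16_t)(((int32_t)in[i] * gain) >> 15);
    }
}
```

Calling the function with overlapping buffers breaks the promise and leads to undefined behaviour, so the qualifier should only be added where the call sites are known.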

Loop information

The compiler can often improve the optimization of loops when it knows certain things about the number of times a loop iterates. The number of loop iterations is called the trip count. It can be beneficial to know the minimum or maximum trip count, or whether the trip count is always a certain multiple. The MUST_ITERATE pragma, which is specific to the Code Composer Studio tools, can be used to inform the compiler of the properties of the trip count for the loop.
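The pragma takes the minimum trip count, the maximum trip count and a multiple, and is placed directly before the loop. In this sketch it is guarded so that other compilers ignore it; the bounds (8, 64, 8) and the function name `dot_q15` are assumptions made for the example.

```c
#include <stdint.h>

/* Dot product of two Q.15 vectors, accumulated with 15 fractional bits.
 * The caller guarantees 8 <= n <= 64 and that n is a multiple of 8. */
int32_t dot_q15(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    int i;
#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(8, 64, 8)
#endif
    for (i = 0; i < n; i++) {
        acc += ((int32_t)a[i] * b[i]) >> 15;
    }
    return acc;
}
```

With this information the compiler can unroll the loop by eight without generating code for the leftover iterations.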

Remove Overflow check

Up until now, the fixed-point processing had always checked for overflows, and the variables were saturated to a maximum value when an overflow occurred. This took a significant part of the processing time because of the conditional statements, which are not traditional DSP flow processing. Removal of the overflow check would have caused a wrap-around when an overflow occurred, but in some cases the functions had been verified and the removal of the checks would not pose a problem. In other cases, the C64x-specific instructions which implement saturation were used. An example of this is the SADD2 instruction, which adds two pairs of signed 16-bit integers with saturation.
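The behaviour of a packed saturating add can be emulated in portable C as shown below. This is a reference model for illustration, not the intrinsic itself: on the C64x the same result would come from a single instruction. The helper names `sat16`, `pack2` and `sadd2_emul` are hypothetical.

```c
#include <stdint.h>

/* Clip a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Pack two signed 16-bit values into one 32-bit word. */
uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Add the upper and lower halves of a and b independently,
 * saturating each 16-bit result. */
uint32_t sadd2_emul(uint32_t a, uint32_t b)
{
    int16_t lo = sat16((int32_t)(int16_t)(a & 0xFFFFu) + (int16_t)(b & 0xFFFFu));
    int16_t hi = sat16((int32_t)(int16_t)(a >> 16) + (int16_t)(b >> 16));
    return pack2(hi, lo);
}
```

The two branches per half in `sat16` are exactly the conditional code that the hardware instruction removes from the inner loops.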

Packed Data Processing

The C64x DSPs have a capability called packed data processing, which is the ability to operate on multiple data with one instruction. The data to be processed can be packed into one variable, e.g. two 16-bit integers are packed into one 32-bit integer, and the operation is carried out on these operands. The C64x DSPs also have specific instructions (mnemonics) that carry out common signal processing operations using packed data processing. An example is the DOTP2 mnemonic, which calculates the dot product between two pairs of signed, packed 16-bit values.
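What the packed dot product computes can be written out in portable C. As with the saturating add, this is an emulation for illustration and the names `pack2` and `dotp2_emul` are hypothetical; on the DSP the operation is a single instruction.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word. */
uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Multiply the lower halves and the upper halves of a and b as signed
 * 16-bit values and return the sum of the two products. */
int32_t dotp2_emul(uint32_t a, uint32_t b)
{
    int32_t lo = (int32_t)(int16_t)(a & 0xFFFFu) * (int16_t)(b & 0xFFFFu);
    int32_t hi = (int32_t)(int16_t)(a >> 16) * (int16_t)(b >> 16);
    return lo + hi;
}
```

For a complex correlation, packing the real and imaginary parts of a sample into one word lets one such operation produce a complete real-part term.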



Figure 4.15: C64x Dot product mnemonic - dst = _dotp2(src1, src2)

All these optimization steps were applied to the application. Figure 4.16 shows the improvement in the simulation cycle count for the various steps. The dataset original shows the cycle count for the functions without optimization. The dataset saturation shows the cycle count after the overflow checks were removed from the code. The step sync_opt shows the cycle count when packed data processing was implemented for the synchronization block. The last optimization step, opt_all, includes all the optimization steps that were described earlier, e.g. function inlining, aliasing, data alignment and loop information.



Figure 4.16: Code Profiling - Profile Setup

4.5.4 Implementation Results

Table 4.1 shows the improvement in raw simulation time, i.e. without taking data Input/Output (I/O) into account. This shows a speed improvement over the Matlab simulations of a factor of 47.4 for the fixed-point implementation on the C6455 DSP. The floating-point benchmark values that were simulated on the C6713 DSP do not show an improvement when compared to the Matlab simulation because no optimization techniques were implemented, as the C6713 DSP was not the final target platform.

Platform          Pentium IV   Pentium IV           C6713                C6455
Implementation    Matlab       C (floating-point)   C (floating-point)   C (fixed-point)
Simulation time   360 ms       50 ms                49 ms                7.6 ms

Table 4.1: Simulation time benchmarking
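The speed-up factors follow directly from the simulation times in Table 4.1 with the Matlab run as the reference. A short worked check, assuming that the 50 ms entry is the floating-point C build running on the Pentium IV:

```latex
\frac{360\ \text{ms}}{7.6\ \text{ms}} \approx 47.4,
\qquad
\frac{360\ \text{ms}}{50\ \text{ms}} = 7.2
```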



Section 5

Testbed Setup

5.1 Setup Components

The setup is designed to test the algorithms that have been developed in a hardware setup which simulates a real-world environment with a transmitter and a receiver as well as a real radio channel as the transmission medium. The data to be transmitted is generated in Matlab and sent over an RF frontend to the antenna. The receiver system consists of 9 antennas arranged in a circular array, which is illustrated in Figure 5.1. The array is connected to analog circuits implementing the receiver frontend with attenuators and phase shifters which are controlled by FPGAs. All streams are connected to one ICS652 A/D converter which is inserted into a PCI slot in a computer.

Figure 5.1: Testbed Setup

The data acquired from the A/D converter is then processed with Matlab, which shows a Graphical User Interface (GUI) with the necessary parameters to be monitored. The ideal way to integrate our system into this setup would be to purchase a daughter card with an A/D converter for the TMS320C6455 DSK and acquire the data directly from the frontend. However, due to the time restriction on the project and the number of development hours that would be needed to implement this frontend, it was decided to make use of the existing A/D converter and interface the DSP with the computer in order to have a working demonstration platform in time.

The following sections describe the steps taken to integrate the DSP hardware into the setup.



5.2 Data Acquisition

The system is illustrated in the communication and sequence diagrams in Figures 5.2 and 5.3. The system control was implemented in Visual Studio 2005 with the C++ programming language.

Figure 5.2: Testbed Setup - Communication diagram

Figure 5.3: Testbed Setup - Sequence diagram

Initially, the A/D converter is initialized for data acquisition. The sampling frequency is set at 64 MHz. The A/D converter and the system control communicate over the serial port. A loop is then set up. In each run of this loop, the system control monitors and passes data to all parties in the setup to manage the data processing. To acquire data from the A/D converter, the FPGAs need to fill the A/D’s FIFO buffers. This trigger is sent to a Matlab script which is connected over a TCP connection to the system control. When an acknowledgement has been received that the buffers are filled, the system control begins the data acquisition.



Some processing is done before the data is sent to the DSP. This includes DC component removal, down-conversion and sample rate conversion to 20 MHz. After this, the data is arranged into serial sequences signifying information for each receive antenna. To understand why this is done, we need to explain the format of the data sent by the transmitter.

Figure 5.4: Headers sent by the transmitter

Figure 5.4 shows the format of the data which is transmitted over the channel. It consists of a repeating sequence of long training sequences (LTS) which we refer to as headers. The objective is to train the receive channels with the beamforming weights for a length of time before the data is actually transmitted. However, the implementation of the data detection is not included in the setup at this stage.

To train the receive channels, the attenuators are set to attenuate the information sent over the channels in a sequence. Basically, this is used to acquire data from only one receive channel at a certain point in time. For the duration of 3 OFDM symbols, receive channel 1 is activated and the rest are not; then, for the next 3 OFDM symbols, receive channel 2 is activated, and this goes on until channel 9.
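The activation schedule reduces to simple index arithmetic. The sketch below maps a header symbol index to the receive channel that is active during that symbol; the function name `active_channel` is hypothetical, and wrapping back to channel 1 after channel 9 is an assumption based on the header sequence being repeated.

```c
#define SYMBOLS_PER_CHANNEL 3u
#define NUM_CHANNELS 9u

/* Return the 1-based index of the receive channel that is active
 * during header symbol sym_idx (0-based). */
int active_channel(unsigned sym_idx)
{
    return (int)((sym_idx / SYMBOLS_PER_CHANNEL) % NUM_CHANNELS) + 1;
}
```

One full pass over the nine channels therefore takes 27 OFDM symbols.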

5.3 Processing

After the acquisition and initial processing, the baseband processing can begin. The headers are sent to the DSP using RTDX, which is implemented in Visual Studio this time around. The DSP calculates the weights for each receive branch/channel and returns them, also through RTDX. When the control system receives these weights, it sends them over a TCP connection back to Matlab, which sets the phase shifters on the FPGAs and displays the data to be monitored on a GUI that has been implemented.



Figure 5.5: Matlab GUI for monitoring

Figure 5.5 shows the GUI that displays information on the receiver performance. For each receive antenna, the phase shift and attenuation are calculated and displayed in the figure. For example, for the antenna on branch 9, a phase shift of 0.75π and an attenuation level of -20.43 dB are calculated. The radiation pattern of the antenna array is also displayed below the branches. When the location of the transmit antenna changes, the radiation pattern changes as well, steering towards the transmitter.



Section 6

Conclusions

In this report, we have illustrated the implementation of a beamforming receiver on a Digital Signal Processor (DSP). Initial simulations were done in Matlab and suitable algorithms were chosen. However, to investigate the benefits of beamforming in fast-changing environments, more processing power is needed so that the calculations can be done in real time.

Adaptive beamforming improves the Signal-to-Noise Ratio (SNR) of a signal by focusing on the transmitter with a narrow beam and reducing the gain in undesired directions, albeit with more complexity than switched beamforming. Combining beamforming with multiple-input multiple-output (MIMO) antenna systems, which use spatial multiplexing to enhance the peak data transmission rate and/or transmit diversity methods to enhance the robustness of the transmission, enhances the performance of a wireless transmission system.

A 1 GHz DSP from Texas Instruments, the TMS320C6455, was chosen as the development platform. The algorithms were converted to the C programming language and implemented in fixed point before being ported to the DSP. We carried out simulations to verify the integrity of the new implementation, with the Matlab implementation as a reference. After optimizing the code using the various methods discussed, we showed that we can obtain a speed improvement by a factor of 47. Without the optimizations, and with the code running on the same general-purpose processor as Matlab, we obtain a speed improvement by a factor of 7.

The DSP implementation was subsequently integrated in a hardware demonstrator, and the concept of a beamforming receiver running on a DSP in real time is deemed feasible.

6.1 Outlook

To fully exploit the processing power of the DSP, the hardware should be fully integrated in the demonstrator. Currently, the A/D converters are connected to the PCI slots of a PC and the data is acquired through Visual Studio, with some pre-processing, before it is transferred to the DSP. Future work on this project could add a daughter card with a directly connected A/D converter to the DSP board, which would eliminate the PC-based data acquisition that is currently the bottleneck. This would enable real-time performance.



Appendix A

Acknowledgements

I want to thank God for the plentiful blessings I have received and for guiding my career steps, my parents for their support and love, and my able supervisor, Maurice Stassen, for his mentoring, the interest he has shown in my work and his help in advancing my career.

I also want to thank my supervisors and colleagues at TU Eindhoven for their support and, last but not least, my friends for the much-needed refreshing times at the end of the work day.



References

[1] A 60 GHz Integrated Antenna Array for High-Speed Digital Beamforming Applica-tions Ji-Yong Park, Yuanxun Wang Member, Tatsuo Itoh, Microwave Symposium Di-gest, 2003 IEEE MTT-S International Volume 3, 8-13 June 2003 Page(s):1677 - 1680vol.3

[2] A Novel Algorithm for Uplink Interference Supression using Smart Antennas in Mo-bile Communiciations. Shetty, Kiran Kumar, Department of Electrical and ComputerEngineering, Florida state university. February 2004.

[3] A switched beamforming system with multiuser detectors Hak-Lim Ko; Bong-WeeYu; Jong Heon Lee; Vehicular Technology Conference Proceedings, 2000. VTC 2000-Spring Tokyo.

[4] B. D. Van Veen and K. M. Buckley, Beamforming: A Versatile Approach to SpatialFiltering, IEEE Acoust., Speech, Sig. Process. Magazine, pp. 4-24, Apr. 1988.

[5] T. Ismail and M. Dawoud, ”Null steering in phased arrays by controlling the el-ements positions”, IEEE Trans. Antennas Propagation, vol. AP-39, PP.1561-1566,NOV., 1991.

[6] Eigenbeamforming - A Novel Concept in Array Signal Processing Joachim S. Ham-merschmidt, Institute for Integrated Circuits, Munich, Christopher Brunner, Institutefor Circuit Theory and Signal Processing and Christian Drewes, Institute for Inte-grated Circuits, Munich

[7] ”Part II: Wireless LAN medium access control (MAC) and physical layer (PHY) speci-fications. High Speed Physical Layer in the 5GHz Band”. IEEE Standard 802.11-1999.The Institute of Electrical and Electronics Engineers, 1999.

[8] TMS320C6713 Floating-Point Digital Signal Processor Datasheet www.ti.com -tms320c6713.pdf

[9] TMS320C6455 Floating-Point Digital Signal Processor Datasheet www.ti.com -tms320c6455.pdf

[10] The Unified Modeling Language User Guide Grady Booch, James Rumbaugh, IvarJacobson

