
The Pennsylvania State University

The Graduate School

AN INFORMATION ELASTICITY FRAMEWORK FOR

CONSTANT FALSE ALARM RATE DETECTION

A Thesis in Electrical Engineering

by Andrew Z. Liu

© 2020 Andrew Z. Liu

Submitted in Partial Fulfillment of the Requirements

for the Degree of

Master of Science

May 2020

The thesis of Andrew Z. Liu was reviewed and approved by the following:

Ram M. Narayanan
Professor of Electrical Engineering
Thesis Adviser

Timothy J. Kane
Professor of Electrical Engineering

Muralidhar Rangaswamy
Special Member

Kultegin Aydin
Professor of Electrical Engineering
Head of the Department of Electrical Engineering


Abstract

Within a decision making process, adjusting the amount of available information generally causes the effectiveness of decisions to change. Often, an increase in this information quantity causes the decision effectiveness to improve. However, under certain circumstances, increasing the amount of information beyond a certain point causes the decision effectiveness to suffer. This phenomenon, known as information overload, presents many important research problems. One major concern is determining how much information a decision maker needs for the decision effectiveness to be maximized. Another key problem is defining the metrics used to model information quantity and decision effectiveness, given the specific contextual factors and preferences of a decision maker. Recently, the concept of information elasticity has been proposed to address these problems.

This thesis designs a framework using the concept of information elasticity to observe the usability of information within different constant false alarm rate detectors. Within this framework, the different factors which either benefit or hinder the performance of these detectors are studied, and are used along with contextual factors to characterize the effectiveness of decisions. Two different applications of this framework are studied: the first involves the ordered statistics constant false alarm rate detector, and the second involves the adaptive matched filter. The point at which information overload occurs is identified within each of these applications, allowing a decision maker to make choices that maximize the decision effectiveness.


Contents

List of Figures
List of Tables
Dedication
Acknowledgments

1 Introduction
1.1 Introduction to Information Elasticity
1.2 Introduction to CFAR detection

2 Background Theory
2.1 Information Elasticity
2.1.1 General Framework
2.1.2 Application of Information Elasticity: Phase coded modulation
2.2 Topics in Multi-Objective Optimization

3 Fundamentals of CFAR Detection
3.1 Detection Theory
3.1.1 Coherent Detection
3.1.2 Range Resolution
3.1.3 Performance measures for detection
3.1.4 Statistical analysis of interference and targets
3.2 Scalar CFAR detection
3.2.1 Likelihood Ratio Test
3.2.2 Distribution of Test Statistic
3.2.3 PFA and PD of detector
3.2.4 Performance under Swerling Fluctuation Models

4 Robust Decision Making for Ordered Statistic CFAR
4.1 Ordered Statistic CFAR
4.1.1 OS-CFAR performance
4.1.2 Effects of Interfering targets
4.2 Information Elasticity Framework for OS-CFAR
4.2.1 Estimation of J using FAOSOSD
4.2.2 Performance Function for OS-CFAR
4.2.3 Robust decision making method

5 Information Elasticity Framework for the AMF
5.1 Clairvoyant Detector
5.1.1 Likelihood Ratio Test
5.1.2 PD and PFA of Clairvoyant Detector
5.2 Adaptive Matched Filter
5.2.1 Sample Matrix Inversion
5.2.2 Rank Constrained Maximum Likelihood Estimation
5.2.3 Additional SNR required for clairvoyant performance
5.3 Information Elasticity Framework for the AMF
5.3.1 Approximation for SNR loss
5.3.2 User-defined constraint function
5.3.3 AMF decision effectiveness

6 Conclusion

Appendices
Derivation of the AMF
A.0.1 Distribution of AMF test statistic

List of Figures

1 Decision effectiveness shown as a function of information quantity, displaying an example of the inverted U-curve.
2 Output of matched filter using PCM pulse compression, shown using two waveforms with frequency f = 5 MHz. The waveform compressed in (a) has a signal length of T = 1.5 µs, code length of 15 bits, and chip length TC = 0.1 µs. The waveform compressed in (b) has a signal length of T = 3.1 µs, code length of 31 bits, and chip length TC = 0.1 µs.
3 D(Q) (PSLR) shown as a function of Q (sequence length).
4 C(Q) (autocorrelation computation time) shown as a function of Q.
5 E(Q) (PSLR per ns of processing time) shown as a function of Q (sequence length).
6 Example showing decisions in the criterion space along with their Pareto frontier, utopia point, and nadir point.
7 distributions
8 distributions
11 Probability distributions for the sufficient statistic Λ under hypotheses 0 and 1. The H1 distributions are shown for 3 different SNR values.
12 PD as a function of SNR (SNR is represented with A in equation (38)) for different values of K for PFA = 1 · 10^-6.
13 PD vs SNR curves shown for the Swerling I, Swerling III, and non-fluctuating cases for K = 10 and PFA = 10^-4. Note that the PD vs SNR curves for Swerling II and Swerling IV are equal to those of Swerling I and Swerling III respectively when scalar data samples are used.
14 PD vs m for K = 24 and PFA = 1 · 10^-4, shown for different SNR values.
15 PD vs SNR for K = 24 and PFA = 1 · 10^-4 for the Swerling I case, shown for both CA-CFAR and OS-CFAR.
16 Effects of interfering targets on PD vs m shown at RSS = 10 dB. Note that the results from 10^6 Monte Carlo simulations are displayed with the dotted curves, showing close agreement with the results in (61). α is found via (53) for PFA = 10^-4.
17 Effects of interfering targets on PD vs m shown at RSS = 20 dB. Note that the results from 10^6 Monte Carlo simulations are displayed with the dotted curves, showing close agreement with the results in (61). α is found via (53) for PFA = 10^-4.
18 Effects of interfering targets on PFA vs m for RSS = 10 dB. α is found via (53) for PFA = 10^-4.
19 P(J|J) shown for K = 20 and different RSS values.
20 ψ0 shown for P∗FA = 1 · 10^-4.
21 µ(x, A) vs Var(x, A) shown for J = 5, N = 20, and P∗FA = 10^-4. Points shown for 8400 decision points and A = {5, 10, . . . , 90, 95}. The Pareto frontier is shown by the black curve.
22 Decision effectiveness shown as a function of the measure of robustness, Var. Note that the overload point occurs at the decision m = 12 and α = 8.8940.
23 PD vs SNR for PFA = 1 · 10^-4 and N = 10, shown for different values of K.
24 PD vs SNR for PFA = 1 · 10^-4 and N = 30, shown for different values of K.
25 PD vs SNR for PFA = 1 · 10^-4 and N = 20, shown for different values of K. Note that the degenerate N = K case is shown in blue.
26 PD vs SNR curves shown for N = 16, r = 7, K = {3, 4, . . . , 24}, and PFA = 10^-4.
27 PD vs SNR curves shown for N = 24, r = 9, K = {3, 4, . . . , 24}, and PFA = 10^-4.
28 PD vs SNR for PFA = 10^-4 and K = {N, . . . , 200}, shown for different N values. Note that darker curves represent PD values of larger K values.
29 SNR loss for AMFs of different N values, shown as a function of K.
30 SNR loss for RCML AMFs of different N and r values, shown as a function of K.
31 Comparison of PD obtained using numerical methods as in (97) and approximation as in (101). PD is shown for N = 5, PFA = {10^-4, 10^-5, 10^-6} values and K = {5, 10, . . . , 50}.
32 Comparison of PD obtained using numerical methods as in (97) and approximation as in (101). PD is shown for N = 50, PFA = {10^-4, 10^-5, 10^-6} values and K = {50, 51, . . . , 100}.
33 Comparison of PD obtained using numerical methods as in (97) and approximation as in (101). PD is shown for N = 500, PFA = {10^-4, 10^-5, 10^-6} values and K = {500, 510, . . . , 600}.
34 SNR loss as a function of K. Calculations from numerical methods in (97) and approximation in (102) are displayed together, showing close agreement.
35 Constraint function C1(Q) for λ1 = λ2 = 0.5, n = m = 1, a = 10^-4, b = 10^-6, c = 40, and d = 60.
36 Constraint function C2(Q) for λ1 = 1/3, λ2 = 2/3, n = 2, m = 4, a = 10^-4, b = 10^-6, c = 40, and d = 60.
37 Decision metric D(Q) for the AMF using SMI and N = 20. Domain parameters are a = 10^-4, b = 10^-6, c = 40. Note that the PFA and K axes are inverted from the axes in Figures 35 and 36.
38 C1(Q) and D(Q) for different decision points for N = 20. Note that points of a shared color represent data of a shared K value.
39 C2(Q) and D(Q) for different decision points for N = 20. Note that points of a shared color represent data of a shared K value.
40 Pareto fronts for C1(Q) and D(Q), as well as C2(Q) and D(Q). Note that these are labeled as Pareto front #1 and Pareto front #2 respectively.
41 Decision effectiveness E of Pareto efficient decisions shown as a function of their cost C1(Q).
42 Decision effectiveness E of Pareto efficient decisions shown as a function of their cost C2(Q).

List of Tables

1 Threshold value α for different values of m and K, for PFA = 1 · 10^-4. These values are obtained using a MATLAB routine involving a line search method on equation (53).
2 Decisions at which E is minimized for DM 1.
3 Decisions at which E is minimized for DM 2.
4 Constraint function parameters.
5 Specification for decision metrics.
6 Decisions at which E is minimized for different constraint functions and decision metrics. Note that w1 = w2 for each decision.

Dedication

This thesis is dedicated to my parents, Zheji and Xia, siblings, Michael and Sarah, and wife, Kristi. Their love and encouragement gave me the inspiration to begin and complete this academic journey. Also, to God, who upholds me each day.

'His faithfulness is a shield and bulwark.' (Psalm 91:4b)

Acknowledgments

The completion of this thesis would not have been possible without the support of my thesis advisor, Dr. Ram Narayanan. I am extremely grateful for his patience, understanding, encouragement, and advice, which have profoundly impacted me for the better. I would also like to express gratitude to Dr. Muralidhar Rangaswamy. I am deeply thankful for his invaluable guidance and expertise, and for his help in establishing foundational concepts in this work. I would also like to thank Dr. Timothy Kane for serving on my thesis committee, and for generously offering his time and support. Thanks should also go to the members of the Radar and Communications Lab for their insight into the different topics covered in this research.

Finally, I would like to extend thanks to Dr. Doug Riecken of the US Air Force Office of Scientific Research for supporting this research under grant FA9550-17-1-0032. This content is solely the responsibility of the author, and does not necessarily represent the views of the funding agency.

Chapter 1

Introduction

1.1 Introduction to Information Elasticity

The task of radar detection often requires the selection of different decision parameters, which are chosen with the goal of improving the overall performance of the system. This choice of parameters and the definition of "performance" generally vary between contexts and the decision makers (DM) making choices within them. For example, a decision that produces poor performance for one DM in a given context may produce good performance for a DM in another context. Thus, the contextual factors and preferences of the DM must be carefully characterized in order for decisions to be compared and analyzed.

Information elasticity is a concept that has recently been proposed [1] which seeks to characterize the usability properties of information and its interaction with its surrounding context. Note that in this sense, information does not specifically refer to Shannon entropy or other concepts in information theory. Rather, it refers to information in a general sense: information takes the form of data, signals, or processes which increase the knowledge level of a DM.

With this in mind, certain decision parameters have the ability to affect the quantity of information seen by a radar system. When these parameters are adjusted, the quantity of information and the general effectiveness of these decision parameters change. It is generally assumed that the effectiveness of decisions improves as the quantity of information increases. However, it has been shown that this is not always the case, and in certain instances, more information may actually cause the decision effectiveness to worsen [2], [3], [4].

Information elasticity is defined as the ratio of the incremental change in decision effectiveness to the incremental change in the quantity of information [5]. Thus, a system characterized by a high information elasticity sees a large increase in decision effectiveness as more information becomes available. Similarly, a system with low information elasticity sees a small increase in decision effectiveness as more information becomes available. Furthermore, a system with negative information elasticity sees a decrease in decision effectiveness as more information becomes available.

Since these measures of information quantity and decision effectiveness generally depend on contextual factors, they must be specified by a DM. The decision effectiveness, as defined by a DM, is often affected by two types of factors. The first type are known as decision quality metrics, which generally take the form of attributes that aid a DM in making better decisions. For radar applications, examples of decision quality metrics may be probability of detection or signal-to-noise ratio (SNR). The second type are known as constraint metrics, which generally take the form of attributes that are undesirable at high levels. Examples of radar constraint metrics may be processing time or power usage.

These metrics are generally functions of the information quantity, and in many cases exhibit conflicting trade-off behavior. Typically, as the information quantity increases, the decision quality metrics improve. On the other hand, this increase in information quantity is also associated with an increase in the undesirable constraints. Under certain circumstances, increasing the information quantity beyond a certain point may cause the effects from the system constraints to dominate the effects from the decision quality metrics, resulting in a decrease in decision effectiveness (the information elasticity is negative). When a system reaches this point, the decision effectiveness reaches a maximum, and increasing the information quantity no longer provides any benefit. This phenomenon is known as information overload and is shown in the form of an example in Figure 1. The exact point at which information overload occurs is of interest, since it allows the decision maker to select the decision that maximizes the decision effectiveness.

Figure 1: Decision effectiveness shown as a function of information quantity, displaying an example of the inverted U-curve.

1.2 Introduction to CFAR detection

The detection of targets in the presence of clutter, noise, and other disturbances is an

important signal processing problem for many different radar systems. Typically a “detection

threshold” is defined, which is used to declare data values above the threshold as targets and

data values below the threshold as merely disturbance. Certain assumptions can often be

made involving the statistical behavior of this disturbance, allowing for different detection

schemes to be used. Since radar disturbance and interference generally vary depending on

the time/range at which the data are collected (non-stationary disturbance), these detection

schemes are often adaptive, and change according to the disturbance surrounding the data

under test.

Many commonly used adaptive schemes change the detection threshold to fix the rate

at which data containing disturbance only is mistakenly declared as a target (also known

4

as a false alarm). Any detector that accomplishes this task is known to have the constant

false alarm rate (CFAR) property. With this property, a detector is able to achieve a desired

performance level in the receiver operating characteristic space (discussed in Chapter 3.1).

Often, radar data are obtained as a collection of scalar values, representing the power

received from different range bins. This data is used to form decision statistics for each

range bin, which are thereby compared against the detection threshold. Different detection

algorithms typically use different types of decision statistics, depending on the application

of the radar system and the types of limitations they wish to overcome. Two commonly

used algorithms, which are analyzed in this thesis, are cell-averaging CFAR (CA-CFAR)

and ordered statistics CFAR (OS-CFAR) [6].
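As a rough illustration of these two decision statistics, the sketch below forms a CA-CFAR threshold (a scaled mean of the reference window) and an OS-CFAR threshold (a scaled m-th smallest reference sample) for the same cell under test. The sample values, scale factor α, and order m are invented for illustration only; in practice α is derived from the desired false alarm rate, as developed in later chapters.

```python
def cfar_threshold(window, method, alpha, m=None):
    """Adaptive threshold from reference cells surrounding the cell under
    test. 'ca' averages the window; 'os' takes the m-th smallest sample.
    alpha is the scale factor that sets the false alarm rate."""
    if method == "ca":
        stat = sum(window) / len(window)
    elif method == "os":
        stat = sorted(window)[m - 1]
    else:
        raise ValueError("method must be 'ca' or 'os'")
    return alpha * stat

# Toy power samples from K = 8 reference cells around the cell under test
ref = [1.1, 0.9, 1.3, 0.8, 1.0, 1.2, 7.5, 0.7]   # one interfering target
cell_under_test = 5.0

# The CA-CFAR threshold is inflated by the interferer; OS-CFAR (m = 6) is not
t_ca = cfar_threshold(ref, "ca", alpha=4.0)
t_os = cfar_threshold(ref, "os", alpha=4.0, m=6)
assert cell_under_test < t_ca and cell_under_test > t_os
```

This also previews why OS-CFAR is attractive in the presence of interfering targets, a point developed in Chapter 4.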

Other radar systems collect range data over a multitude of different antenna elements as well as across different pulses sent by each element. In this case, each range bin is represented by a 1 × N vector rather than a single scalar value. This parameter, N, is known as the spatio-temporal product [7], and is simply the number of antenna elements of the radar system multiplied by the number of pulses being considered. This thesis studies a well known multi-dimensional CFAR detector known as the adaptive matched filter (AMF) [8].

The goal of this thesis is to analyze where this information overload behavior exists in different applications of CFAR detection, and to exploit it to make decisions with maximum decision effectiveness. To accomplish this task, an information elasticity framework is employed in two different applications using CFAR detection, the first involving OS-CFAR, and the second involving the AMF. This thesis is organized as follows: Chapter 2 provides an overview of information elasticity and topics in multi-objective optimization. Chapter 3 reviews background detection theory and analyzes the scalar CA-CFAR detector. Chapter 4 analyzes the performance of the OS-CFAR detector, and presents an information elasticity framework which seeks to increase the robustness of decisions. Chapter 5 analyzes the performance of the AMF, and presents an information elasticity framework for making decisions within different contexts. Chapter 6 serves as the conclusion.

Chapter 2

Background Theory

2.1 Information Elasticity

2.1.1 General Framework

As discussed in Section 1.1, information elasticity is defined as the ratio of the incremental change of decision effectiveness with respect to the incremental change of information quantity. This is denoted as follows [5]:

\varepsilon = \frac{dE/E}{dQ/Q} \qquad (1)

where ε is the information elasticity, E is the decision effectiveness, and Q is the information quantity. Thus, dQ represents the infinitesimal variation in information quantity, and dE represents its associated infinitesimal variation in decision effectiveness. Note that it is possible for Q to be defined such that it exists only in discrete quantities. In this case, these differential terms are replaced by their respective forward difference terms, ∆E and ∆Q.
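For a discretely defined Q, the forward-difference form of (1) can be evaluated directly. The sketch below uses an invented inverted-U effectiveness curve (not data from this thesis) to show the sign change of ε past the overload point:

```python
def elasticity(E, Q, i):
    """Forward-difference information elasticity at index i:
    (ΔE / E) / (ΔQ / Q), per equation (1)."""
    dE, dQ = E[i + 1] - E[i], Q[i + 1] - Q[i]
    return (dE / E[i]) / (dQ / Q[i])

# Invented inverted-U effectiveness curve sampled at discrete Q values
Q = [1, 2, 3, 4, 5]
E = [1.0, 1.8, 2.2, 2.1, 1.7]

eps = [elasticity(E, Q, i) for i in range(len(Q) - 1)]
# Elasticity is positive while E is rising and negative past the peak
assert eps[0] > 0 and eps[-1] < 0
```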

The decision effectiveness is generally a function of the decision quality and constraint metrics described in Section 1.1. Furthermore, each of these metrics is a function of the information quantity Q. The decision quality metric, represented by D(Q), is defined such that it improves as Q increases. Similarly, the constraint function C(Q) is defined such that it gets worse as Q increases. Thus, both D(Q) and C(Q) are monotonic in Q. Whether these functions monotonically increase or decrease depends on whether it is desirable for D(Q) or C(Q) to be minimized or maximized. From these definitions, a trade-off behavior exists between D(Q) and C(Q), since increasing Q yields a better decision quality metric but a worse constraint function.

Note that the form of E is dependent on the DM. Consider the following simple model for decision effectiveness:

E = \frac{D(Q)}{C(Q)} \qquad (2)

This formulation for decision effectiveness can be thought of as representing the amount of decision quality achieved per unit of constraint metric. Note that in subsequent chapters, other formulations for E are used.

Using this simple formulation, the elasticity ε can be broken down as follows. From (2):

\ln E(Q) = \ln D(Q) - \ln C(Q)

Differentiating with respect to Q yields:

\frac{1}{E}\frac{dE}{dQ} = \frac{1}{D}\frac{dD}{dQ} - \frac{1}{C}\frac{dC}{dQ}
\implies \frac{dE/E}{dQ/Q} = \frac{dD/D}{dQ/Q} - \frac{dC/C}{dQ/Q}

The partial elasticities of D and C are defined as:

\varepsilon_D = \frac{dD/D}{dQ/Q} \qquad (3)

\varepsilon_C = \frac{dC/C}{dQ/Q} \qquad (4)

Thus, the elasticity ε can be decomposed as:

\varepsilon = \varepsilon_D - \varepsilon_C \qquad (5)

At the point of information overload, ε = 0. Thus, in order for information overload to occur under this particular formulation of E, there must be some value of Q such that \varepsilon_D = \varepsilon_C.
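This condition can be illustrated numerically. In the sketch below, D and C are hypothetical metrics chosen only for demonstration (logarithmic growth for D, linear growth for C, not metrics from this thesis); the peak of E = D/C coincides with the point where the partial elasticities cross:

```python
import math

# Hypothetical metrics: D grows with diminishing returns, C grows linearly
def D(Q): return math.log(1.0 + Q)
def C(Q): return 1.0 + 0.05 * Q
def E(Q): return D(Q) / C(Q)

def partial_elasticity(f, Q, h=1e-6):
    # ε_f = (df/f) / (dQ/Q), estimated with a central difference
    return (f(Q + h) - f(Q - h)) / (2 * h) * Q / f(Q)

# Scan a grid of Q values for the peak of E (the overload point)
Qs = [q / 10 for q in range(1, 2001)]
overload = max(Qs, key=E)

# Near the peak of E, the decomposed elasticity ε = ε_D - ε_C is about zero
eps_at_peak = partial_elasticity(D, overload) - partial_elasticity(C, overload)
assert abs(eps_at_peak) < 0.05
```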

Consider also that it is possible for a decision maker to use multiple decision quality metrics and constraint metrics [2], represented as D_k(Q), k = 1, 2, \ldots, K and C_l(Q), l = 1, 2, \ldots, L respectively. A simple example of a decision effectiveness that uses multiple decision quality and constraint metrics is given as follows:

E = \frac{\prod_{k=1}^{K} D_k^{m_k}}{\prod_{l=1}^{L} C_l^{n_l}} \qquad (6)

where m_k and n_l represent exponential weightings, which allow a DM to place emphasis on metrics that have greater importance. Using the same rearrangements used above, the elasticity of this particular model can be decomposed as:

\varepsilon = \sum_{k=1}^{K} m_k \varepsilon_{D_k} - \sum_{l=1}^{L} n_l \varepsilon_{C_l} \qquad (7)
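The decomposition in (7) can be checked numerically against (6); the metrics and weightings below are invented for illustration:

```python
import math

# Hypothetical metrics for a two-quality, one-constraint instance of (6)
D1 = lambda Q: math.sqrt(Q)
D2 = lambda Q: math.log(1 + Q)
C1 = lambda Q: 1 + 0.1 * Q
m1, m2, n1 = 2.0, 1.0, 3.0

def E(Q):
    # E = (D1^m1 * D2^m2) / C1^n1, as in equation (6)
    return D1(Q) ** m1 * D2(Q) ** m2 / C1(Q) ** n1

def elast(f, Q, h=1e-6):
    # Elasticity of f at Q, via a central difference
    return (f(Q + h) - f(Q - h)) / (2 * h) * Q / f(Q)

Q = 7.0
lhs = elast(E, Q)
rhs = m1 * elast(D1, Q) + m2 * elast(D2, Q) - n1 * elast(C1, Q)
assert abs(lhs - rhs) < 1e-4  # total elasticity equals the weighted sum (7)
```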

2.1.2 Application of Information Elasticity: Phase coded modulation

In this section, the decision effectiveness model in (2) is used on an example radar application to demonstrate how the information elasticity framework can be used to make decisions. This example deals with the application of phase-coded modulated (PCM) waveforms for the purpose of pulse compression. This technique is used by many radar systems to improve the resolution and SNR of a radar. Pulse compression is implemented using a matched filter, which considers the auto-correlation of a signal. Auto-correlation can be thought of as the cross correlation of a signal with a time-reversed copy of itself [9].

Consider the cross-correlation operation defined as follows:

r_{xy}[m] = \sum_{n=-\infty}^{\infty} x[n]\, y[m - n] \qquad (8)

where x[n] is a copy of the transmitted waveform and y[n] is the received waveform. The radar waveform is typically modulated in such a way that when matched filtering occurs, the majority of the energy is compressed into a single main pulse of decreased width. In particular, PCM separates the waveform into sections of equal length known as chips. A phase shift is applied to each chip based on a given code, or sequence of numbers.

While many different types of codes exist, binary sequences are often considered due to their simplicity and ease of implementation. In particular, maximal-length sequences (MLSs) are a class of binary sequences that exhibit desirable auto-correlation properties [10]. These codes are generated using linear feedback shift registers with taps located at specific positions. These shift registers produce cyclic sequences with periods of 2^n - 1 bits, where n is the number of stages in the shift register. Thus, MLSs only exist in lengths of 2^n - 1.
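A minimal sketch of this construction is shown below. The tap positions correspond to the primitive polynomial x^4 + x^3 + 1, a standard maximal choice for a 4-stage register; the all-ones seed is arbitrary, and this code is an illustration rather than one used in this thesis.

```python
def mls(taps, n):
    """Generate one period (2**n - 1 bits) of a maximal-length sequence
    from an n-stage Fibonacci LFSR with the given feedback tap positions."""
    state = [1] * n          # arbitrary nonzero seed
    out = []
    for _ in range(2 ** n - 1):
        out.append(state[-1])            # output the last stage
        fb = 0
        for t in taps:                   # XOR the tapped stages
            fb ^= state[t - 1]
        state = [fb] + state[:-1]        # shift, feeding back fb
    return out

# Taps [4, 3] (x^4 + x^3 + 1) give a maximal 4-stage register: period 15
seq = mls([4, 3], 4)
assert len(seq) == 15 and sum(seq) == 8  # an MLS has 2**(n-1) ones per period
```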

Two examples of pulse compression using PCM are shown in Figure 2. These examples show that after modulating the waveforms, most of the energy of the signal is concentrated in the central lobe, also known as the mainlobe. Note, however, that some of the energy exists outside of the mainlobe, in sections of the signal known as sidelobes. These sidelobes are often problematic in ranging radar applications, since the sidelobes associated with a strong target may mask the mainlobe of a weaker target, preventing a user from detecting the presence of the weaker target [9]. Thus, the metric known as the peak-sidelobe ratio (PSLR) is often used to characterize the size of the sidelobes relative to the mainlobe. Generally, this metric is simply the ratio of the magnitude of the mainlobe to the magnitude of the largest sidelobe. Clearly, a larger PSLR is desirable for ranging applications.
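The PSLR can be computed directly from a code's aperiodic autocorrelation. The sketch below uses the 13-bit Barker code rather than an MLS, only because its PSLR is known exactly; the same function applies to any bipolar phase code.

```python
def pslr(code):
    """Peak-to-sidelobe ratio of the aperiodic autocorrelation of a
    bipolar (+1/-1) code: mainlobe magnitude over largest sidelobe."""
    n = len(code)
    # Aperiodic autocorrelation at every non-negative lag k
    r = [sum(code[i] * code[i + k] for i in range(n - k)) for k in range(n)]
    mainlobe = abs(r[0])                 # equals n for a +/-1 code
    sidelobe = max(abs(v) for v in r[1:])
    return mainlobe / sidelobe

# 13-bit Barker code: all autocorrelation sidelobes have magnitude 1
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
assert pslr(barker13) == 13.0
```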


Figure 2: Output of matched filter using PCM pulse compression, shown using two waveforms with frequency f = 5 MHz. The waveform compressed in (a) has a signal length of T = 1.5 µs, code length of 15 bits, and chip length TC = 0.1 µs. The waveform compressed in (b) has a signal length of T = 3.1 µs, code length of 31 bits, and chip length TC = 0.1 µs.

The examples shown in Figure 2 each use a code of a different length. Note that the result in Figure 2(a) uses a code length of 15 bits and has a much lower PSLR than the result in Figure 2(b), which uses a code length of 31 bits. For MLSs in general, the longer the code length, the higher the PSLR. However, increasing the code length also increases the number of operations involved in the matched filtering operation in (8). This is observed by measuring the computation time required to complete the matched filtering of a pulse.

Thus, for this particular application of radar, a decision maker may consider the sequence length to represent the information quantity, Q. The PSLR, which has been shown to be a function of Q, is used as the decision quality metric D(Q). Finally, the computation time required for auto-correlation, which is also a function of Q, is used as the constraint function C(Q). D(Q) and C(Q) are shown in Figures 3 and 4 respectively. Furthermore, using these metrics, the formulation of E given in (2) is shown in Figure 5.

these metrics, the formulation of E given in (2) is shown in Figure 5.

Note that the PSLR generally changes based on what cyclic permutation of the phase


Figure 3: D(Q) (PSLR) shown as a function of Q (sequence length).


Figure 4: C(Q) (autocorrelation computation time) shown as a function of Q.


Figure 5: E(Q) (PSLR per ns of processing time) shown as a function of Q (sequence length).

code is being used [11]. Thus, the data shown in Figure 3 correspond to the cyclic permutation that minimizes the peak sidelobe (i.e., maximizes the PSLR). Furthermore, the data collected in Figure 4 consider the autocorrelation time using the "xcorr" function in MATLAB, averaged over 10,000 runs. Note, however, that these forms of data collection are meant to represent the decision quality and constraint metrics of the DM in this particular example only. Other forms of data collection can be used to fit the given context and DM. For this particular example, information overload is shown to occur at a sequence length of 2047 bits, as shown in Figure 5.

2.2 Topics in Multi-Objective Optimization

Equations (2) and (6) present simple models for the decision effectiveness. However, E may

take other forms, as long as it considers the trade-offs between the constraint functions C

and decision quality metrics D. As C and D can both be thought of as separate objectives

that the DM wishes to optimize, the field of multi-objective optimization can be helpful in

producing a measure for the decision effectiveness E.

In practice, there is typically no single solution that will optimize every objective in

question. For example, in the application in Section 2.1.2, a lower sequence length must

be selected to improve C, but a higher sequence length must be selected to improve D.

Note, however, that it is entirely possible for one decision to be strictly better than another

decision. For example, decision A is strictly better than decision B if all of A's objectives are

more optimized than the corresponding objectives of B. A decision is known as Pareto optimal/efficient

if no other possible decision is strictly better than it [12].

The formal definition of Pareto optimality is as follows. Let f(x) represent a vector

containing the objectives of decision x, and let fi(x) represent the ith objective of decision

x. Say that the DM wishes to minimize each of these objectives. A decision x∗ is considered

Pareto efficient if there is no decision x such that fi(x) < fi(x∗) for all i. The Pareto efficient


solutions exist along a frontier known as the Pareto front [12], an example of which is shown

in Figure 6.
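The strict-dominance definition above translates directly into code. A minimal sketch (the decision values below are purely illustrative, and both objectives are minimized, following the convention used here):

```python
def pareto_efficient(points):
    """Return the points for which no other point is strictly better
    (smaller in every objective), per the strict-dominance definition."""
    def strictly_better(p, q):
        return all(pi < qi for pi, qi in zip(p, q))
    return [q for q in points
            if not any(strictly_better(p, q) for p in points if p != q)]

decisions = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), (4, 4), (5, 5)]
print(pareto_efficient(decisions))
# (4, 4) and (5, 5) are strictly dominated by (3, 3) and are excluded.
```

The surviving points form the Pareto front in the criterion space.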

Figure 6 shows the criterion space, where the objectives of different decisions are dis-

played. In this particular example, C and D are the only objectives, simulating a possible

example in the information elasticity framework. Note that this example considers low values

of C and D to be more desirable. The figure shows the Pareto efficient solutions using red

x’s. Note that none of the other decisions shown (shown using black x’s) produce objectives

that are strictly better than any of that of the Pareto optimal decisions.

Figure 6 also shows two other points labelled the utopia point and the nadir point. These

points do not represent the objectives of real decisions. Rather, the utopia point represents

the point at which each objective is minimized, represented as:

F0 = {min_x f1(x), min_x f2(x), . . . , min_x fn(x)}

where n is the number of objectives. This point typically only exists in the criterion space


Figure 6: Example showing decisions in the criterion space along with their Pareto frontier, utopia point, and nadir point.


[13], and not in the decision space, since it is not realistic that a single decision optimizes every

objective. This point serves as an idealized baseline where the optimal value of each objective

is represented. Similarly, the nadir point represents the point at which each objective is

maximized, represented as:

F1 = {max_x f1(x), max_x f2(x), . . . , max_x fn(x)}

This point serves as a baseline where the worst possible value of each objective is represented.

A method known as compromise programming allows the DM to select a solution among

the Pareto optimal set. This method considers the utopia point as an idealized baseline,

and measures the distance from this ideal point to decisions along the Pareto front. In this

method, decisions that produce objectives that are closer in distance to the utopia point are

considered to be better. Thus, the point that minimizes this distance is selected.

Note, however, that in some cases, different objectives have different scales or units.

Thus, this measure of distance may show bias towards objectives that are larger in scale

or objectives that use units that are generally larger in size [13]. Thus, these objectives

are typically normalized before the distance measure is taken. Furthermore, a DM may

consider certain objectives to be more important than others. Thus, a DM may weight

these objectives based on their relative level of importance. The normalized and weighted

distance is given below [13]:

[ Σ_{i=1}^{n} wi ( (fi(x) − min_{x*∈X} fi(x*)) / (max_{x*∈X} fi(x*) − min_{x*∈X} fi(x*)) )^p ]^{1/p}   (9)

where wi represents the weight of the ith objective, and Σ_{i=1}^{n} wi = 1. Note also that this

normalization and weighting is defined such that the normalized and weighted distance to

the nadir point is 1.
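A minimal sketch of this compromise-programming distance (the objective bounds and weights below are illustrative); by construction the utopia point sits at distance 0 and the nadir point at distance 1:

```python
import numpy as np

def cp_distance(f, f_min, f_max, w, p=2):
    """Normalized, weighted compromise-programming distance of equation (9).
    f, f_min, f_max, w are arrays over the n objectives, with sum(w) == 1."""
    g = (f - f_min) / (f_max - f_min)      # normalize each objective to [0, 1]
    return np.sum(w * g**p) ** (1.0 / p)

f_min = np.array([0.0, 10.0])              # utopia point (per-objective minima)
f_max = np.array([8.0, 50.0])              # nadir point (per-objective maxima)
w = np.array([0.6, 0.4])

print(cp_distance(f_min, f_min, f_max, w))  # utopia point: distance 0.0
print(cp_distance(f_max, f_min, f_max, w))  # nadir point: distance 1.0
```

In compromise programming, the Pareto-optimal decision minimizing this distance would be selected.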

Chapter 3

Fundamentals of CFAR Detection

3.1 Detection Theory

3.1.1 Coherent Detection

A received waveform signal can be represented as follows:

V (t) = A sin(2πft+ Φ) (10)

where A is the amplitude of the signal, f is the frequency of the signal, and Φ is the phase shift

of the signal. Consider that if the radar data are acquired as a collection of instantaneous

values of V (t), then the information in A and Φ is lost. This is problematic, since A

provides the amplitude of the radar signal at a given range bin. Thus, signals are often

modulated in such a way that the amplitude and phase information can both be recovered.

This process is known as coherent detection.

Since we are only concerned with finding the amplitude and the phase, consider a sinusoid

similar to the signal in equation (10), but that does not contain the 2πft term in its argument:

VQ(t) = A sin(Φ)

      = Im(Ae^{jΦ})   (11)

where Im(·) is the imaginary component of its argument. Note that the second line arises

due to Euler’s formula. Note also that VQ(t) is still a function of t, since often the A and Φ

terms are both functions of t.

Consider also a signal that is identical to VQ(t), except it has a 90◦ phase shift:

VI(t) = A sin(Φ + π/2)

      = A cos(Φ)

      = Re(Ae^{jΦ})   (12)

=⇒  Ae^{jΦ} = VI(t) + jVQ(t)   (13)

where Re(·) is the real component of its argument.

With the two modulated signals in (11) and (12), it becomes very simple to recover the

amplitude and phase information using the instantaneous values of VI(t) and VQ(t):

A = √(VI(t)^2 + VQ(t)^2)   (14)

Φ = arctan(VQ(t)/VI(t))   (15)

VI(t) and VQ(t) can be obtained by mixing the original signal V (t) with either a sine

term or cosine term, then applying a low pass filter. VI(t) is obtained by mixing V (t) with

sin(2πft), and is known as the “in-phase” component. VQ(t) is obtained by mixing V (t)

with cos(2πft), and is known as the “quadrature” component. This process is demonstrated


below:

VQ(t) = LPF[V(t) · cos(2πft)]

      = LPF[A sin(4πft + Φ) + A sin(Φ)]

      = A sin(Φ)   (16)

VI(t) = LPF[V(t) · sin(2πft)]

      = LPF[A cos(Φ) − A cos(4πft + Φ)]

      = A cos(Φ)   (17)

where LPF(·) represents the low pass filter operation.
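A minimal sketch of this demodulation chain (the carrier, sample rate, amplitude, and phase below are arbitrary test values). Here the low-pass filter is realized by averaging over an integer number of carrier periods, which removes the 4πft terms exactly; the factor of 2 compensates for the 1/2 that the product-to-sum identity introduces:

```python
import numpy as np

# Simulated received signal V(t) = A sin(2*pi*f*t + Phi)
A, Phi = 2.0, 0.5
f, fs = 10.0, 1000.0                  # 10 Hz carrier sampled at 1 kHz
t = np.arange(int(fs)) / fs           # exactly 10 carrier periods
V = A * np.sin(2 * np.pi * f * t + Phi)

# Mix with cos/sin and low-pass filter by averaging over whole periods.
VQ = 2 * np.mean(V * np.cos(2 * np.pi * f * t))   # quadrature: A sin(Phi)
VI = 2 * np.mean(V * np.sin(2 * np.pi * f * t))   # in-phase:   A cos(Phi)

A_hat = np.hypot(VI, VQ)              # equation (14)
Phi_hat = np.arctan2(VQ, VI)          # equation (15)
print(A_hat, Phi_hat)                 # ~2.0, ~0.5
```

The amplitude and phase are recovered to floating-point precision, illustrating why coherent detection preserves the information that raw instantaneous samples of V(t) would lose.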

3.1.2 Range Resolution

After the in phase and quadrature components of the received signal are obtained, equation

(14) is used to calculate the amplitude of the received signal at different points in time. This

amplitude of the received waveform is used to provide information on the range of different

targets. Consider a signal that is transmitted at time 0, is reflected off of a target, and arrives

at a radar receiver at time T . Assuming that the signal is being transmitted in free-space,

the range at which the signal was reflected is given by:

R = cT/2   (18)

where c is the speed of light. The factor of 2 is introduced to account for the round-trip

travel time.

Furthermore, given the waveform and signal processing techniques used, there exists


a minimum range at which two different targets can no longer be differentiated from one

another [9]. This distance is known as the range resolution, which is theoretically expressed

as:

∆R = c/(2B)   (19)

where B is the bandwidth of the radar waveform. Since targets that have a separation less

than ∆R can no longer be differentiated, the received waveform is typically only sampled at a

rate of B. In other words, the data will be sampled at discrete time instances {1/B, 2/B, 3/B, . . .}.

Using equation (18), these time instances directly correspond to ranges {c/(2B), 2c/(2B), 3c/(2B), . . .}.

Thus, each sample corresponds to a different resolution cell or range bin.
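Equations (18) and (19) can be sketched numerically (the 50 MHz bandwidth below is an arbitrary example value):

```python
# Range resolution and range-bin centers for a given waveform bandwidth,
# following equations (18) and (19).
c = 299_792_458.0          # speed of light [m/s]
B = 50e6                   # waveform bandwidth [Hz]

delta_R = c / (2 * B)      # equation (19): ~3 m for 50 MHz
print(delta_R)

# Sampling at rate B gives time instances k/B, i.e. ranges k*c/(2B) = k*delta_R.
bins = [k * delta_R for k in range(1, 6)]
print(bins)
```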

3.1.3 Performance measures for detection

Detection schemes use this sampled amplitude data to form a decision statistic for each range

bin. As discussed in Section 1.2, if the statistic is larger than a given threshold, the detector will

declare that a target is present within the range bin. Otherwise, the detector will declare

that the range bin contains only noise and interference. This process of hypothesis testing

may result in two types of errors. The first error, known as a missed detection, occurs when

a statistic containing target information falls below the threshold, and is incorrectly labelled

as disturbance. The second type of error, known as a false alarm, occurs when a statistic

containing only disturbance lies above the threshold and is incorrectly labelled as a target.

If the statistical behavior of the target and interference data are known, the probability

of these errors occurring for a given threshold can be found. Clearly, a high probability of

error is undesirable. Thus, the probability of detection, PD (which is simply 1 minus the

probability of making a missed detection), and the probability of false alarm PFA, are used

as measures of performance for a detector.


3.1.4 Statistical analysis of interference and targets

Radar interference often arises from phenomena known as clutter. Clutter is the portion

of the radar signal that comes from echoes of unwanted scatterers [14] (for example, birds,

trees, other terrain, etc.). In many cases (when the radar resolution is not too high), a

considerably large amount of these scatterers contribute to the interference in a given range

cell [15]. From the central limit theorem, the total sum of interference from scatterers can

often be thought of as a zero mean Gaussian random variable.

Since the data are demodulated using the process shown in (16) and (17), the interference

within VI(t) and VQ(t) can be thought of as Gaussian distributed as well. Thus,

the interference is a complex Gaussian random variable when the signal is represented in

its complex form, which is given in equation (13). This random variable is represented as

∼ CN (0, σ2), where σ2 is the noise variance, often assumed to be unknown.

The amplitude A of this interference is thus found by taking the magnitude of a complex

Gaussian random variable. For a random variable X ∼ CN (0, σ2), the real and imaginary

parts are distributed, respectively, as follows: XRe ∼ N (0, σ2/2), XIm ∼ N (0, σ2/2). Thus

the magnitude is A = √(X_Re^2 + X_Im^2). It is well known that this set of operations produces a

Rayleigh distributed random variable with parameter σ/√2 [16].
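This fact is easy to verify by simulation. A minimal Monte Carlo sketch (σ = 1 and the sample count are arbitrary choices): the mean of a Rayleigh(σ/√2) variable is (σ/√2)·√(π/2) = σ√π/2 ≈ 0.8862σ.

```python
import numpy as np

# Monte Carlo check: the magnitude of X ~ CN(0, sigma^2) is Rayleigh
# distributed with parameter sigma/sqrt(2), whose mean is sigma*sqrt(pi)/2.
rng = np.random.default_rng(0)
sigma = 1.0
n = 200_000

X = rng.normal(0, sigma / np.sqrt(2), n) + 1j * rng.normal(0, sigma / np.sqrt(2), n)
A = np.abs(X)

print(A.mean())                      # ~0.8862
print(sigma * np.sqrt(np.pi) / 2)    # theoretical mean
```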

The probability density function of this statistical model is well known, allowing for the

probability of false alarm to be calculated quite easily. Let P0(x) represent the PDF of the

interference. The probability of false alarm is simply the probability that the interference is

above the detection threshold:

PFA = ∫_η^∞ P0(x) dx   (20)

where η is the value of the detection threshold.

The probability of detection is calculated in a similar manner. Consider, when a range

cell contains target information, the data sample contains a combination of target and in-

terference data. The target data are assumed to be deterministic but unknown, and the


interference data are assumed to be random and complex Gaussian distributed. Thus, the

sum of target and interference data is complex Gaussian distributed with a non-zero mean

(∼ CN (a, σ2), where a is a deterministic but unknown complex scalar accounting for the

target’s reflectivity and channel propagation effects [7]).

When a range cell contains a target, the cell has an associated property known as the signal-to-noise

ratio (SNR). The SNR is simply the ratio between the power of the target information (Ps)

and the power of the noise (Pn), expressed as SNR = Ps/Pn. For randomly distributed

signals and noise, this definition can be extended to SNR = E(a^2)/E(n^2) [9]. However, the target

portion of the signal is known to be deterministic, so E(a^2) = a^2. Furthermore, we know

that the noise is zero mean with variance σ^2. Therefore, E(n^2) = σ^2, and thus:

SNR = A = a^2/σ^2   (21)

For the hypothesis 1 case, the signal amplitude can again be found via A = √(X_Re^2 + X_Im^2).

However, when X has a non-zero mean (i.e. X ∼ CN(a, σ^2)), the real and imaginary parts

are distributed as X_Re ∼ N(a/√2, σ^2/2) and X_Im ∼ N(a/√2, σ^2/2). It is well known that

these components yield a magnitude which follows the Rician distribution with parameters

a and σ/√2 [16].

Let P1(x) represent the PDF of the target + interference. The detection probability is

calculated as follows:

PD = ∫_η^∞ P1(x) dx   (22)

Figure 7 shows the PDF of the interference (P0(x)) alongside the PDF of the target +

interference (P1(x)). A visual representation of the integrations done in (20) and (22) is

shown in Figure 8. It is clear from this figure that both PFA and PD are dependent on the

threshold η and the distributions. Furthermore, these distributions are dependent on the

SNR by merit of a and σ2.

Figure 9 shows PD and PFA as a function of η using a fixed σ^2 and a. As deduced from


Figure 9, if η is close to 0, both PD and PFA are near their maximum value of 1. As η begins

to increase, both PD and PFA monotonically decrease, until they each reach nearly 0. Thus,

each value of η provides an operating point with a distinct pairing of PD and PFA values.

Thus, each PD value is directly associated with a PFA, and vice versa.

From this, PD can be considered to be a function of PFA. This is displayed using the

receiver operating characteristic (ROC) curve, which is shown in Figure 10. As this figure

shows, there is a direct trade-off behavior between PD and PFA. Note that each curve

exhibits monotonic behavior, in that decreasing the false alarm rate will also decrease the

detection probability. Similarly, increasing the probability of detection will also increase the

probability of getting a false alarm.

ROC curves provide a visual representation of how the detector performs at different

threshold values. These performance curves change when the noise variance changes, as

shown in Figure 10. Note that this threshold value is a decision variable, since it is


Figure 7: This figure shows the probability distributions of the interference (∼ Rayleigh(σ/√2)) and

the target + interference (∼ Rice(a, σ/√2)). For this example, the noise variance is σ^2 = 1 and

the target amplitude is a = 2.



Figure 8: Visual representation of finding PD and PFA using a threshold value of 1.7. P0(x)

and P1(x) are again distributed as X0 ∼ Rayleigh(σ/√2) and X1 ∼ Rice(a, σ/√2) respectively,

where a = 2 and σ^2 = 1. In this example, PFA = 0.0555 and PD = 0.9606.


Figure 9: PD and PFA as a function of the threshold value η. PDFs P0(x) and P1(x) are the same as the distributions used in Figures 7 and 8.

selected by a DM. However, the noise variance is an environmental variable, as it cannot be

chosen and is often unknown to the decision maker. Thus, the threshold of the detection


Figure 10: ROC curves shown for three different values of σ^2 (σ^2 = 1, 2, 3, corresponding to SNR = 4, 2, and 1.333), while a = 2.

scheme should be carefully selected to provide suitable operating conditions of PD and PFA,

even when the noise variance changes. One way this is accomplished is by adapting

the threshold based on the noise variance surrounding the target to provide the exact same

PFA for all detections. This methodology is known as CFAR detection.

3.2 Scalar CFAR detection

3.2.1 Likelihood Ratio Test

Section 3.1 discussed the statistical behavior of typical radar interference and target + in-

terference data. Using this assumption of Gaussian behavior, a likelihood ratio test is set

up to determine whether data from different range cells contain target information or not.

Consider the amplitude data from a single range cell, represented as x. This is sometimes

referred to as the “cell under test” (CUT) or the primary data, since we are testing it for

the presence of target information. This data must follow one of two different hypotheses.

Hypothesis 0, represented as H0, states that the range cell in question contains disturbance


only. Hypothesis 1, represented as H1, states that the range cell contains a combination of

disturbance and target data. Under these hypotheses, the primary data is given as follows:

H0 : x = n

H1 : x = a+ n

where n is a complex Gaussian random variable (as discussed in Section 3.1) and a is a

deterministic but unknown complex scalar accounting for the target’s reflectivity and channel

propagation effects [7]. It is unknown whether the primary observation data follows H0 or

H1, however it is assumed that it follows a complex Gaussian distribution in either instance.

The complex Gaussian distribution of a complex scalar under each hypothesis is as follows

[17]:

f_{x|H0}(x|H0, σ) = (1/(πσ^2)) e^{−x^2/σ^2}   (23)

f_{x|H1}(x|H1, σ) = (1/(πσ^2)) e^{−(x−a)^2/σ^2}   (24)

These distributions are each functions of σ, and describe the likelihood that x came from

either H0 or H1. Thus, the ratio between the two generally describes a level of confidence that

one hypothesis occurred over the other. In practice, if this ratio is larger than a threshold,

then H1 is selected. Otherwise, H0 is selected. This likelihood ratio test Λ is derived as

follows:

Λ(x) = f_{x|H1}(x|H1, σ) / f_{x|H0}(x|H0, σ)

     = [(1/(πσ^2)) e^{−(x−a)^2/σ^2}] / [(1/(πσ^2)) e^{−x^2/σ^2}]

     = e^{[−(x^2 − 2ax + a^2) + x^2]/σ^2}

This likelihood ratio test is simplified by taking its natural logarithm, yielding the log like-

lihood ratio:

Λ(x) = ln Λ(x)

     = (2ax − a^2)/σ^2   (25)

To account for the fact that this a parameter is unknown, the value of a that maximizes this

likelihood ratio test is used. This maximization proceeds as follows:

∂Λ(x)/∂a = 2(x − a)/σ^2 = 0  =⇒  a* = x

where a∗ is the maximum likelihood estimate of a. Substituting this into equation (25) for

a yields:

Λ(x) = |x|^2/σ^2 = |y|^2  ≷_{H0}^{H1}  η   (26)

where y = x/σ.

3.2.2 Distribution of Test Statistic

The Λ(x) term given in (26) is the test statistic used in the hypothesis test. Thus, if

the distribution of Λ(x) is known, then PFA and PD can be easily found. From (23), we

know that x is distributed as xH0 ∼ CN (0, σ2) when H0 is assumed. Thus, y must be

distributed as yH0 ∼ CN (0, 1) when H0 is assumed. Similarly, from (24) we know that x is

distributed as x_H1 ∼ CN(a, σ^2) when H1 is assumed. Thus, it is clear that y is distributed

as y_H1 ∼ CN(a/σ, 1) when H1 is assumed. Furthermore, from (21), it is clear that the mean

value, a/σ, is simply the square root of the SNR. Thus, y_H1 ∼ CN(√A, 1), where A is the

SNR.


Consider the magnitude of y under each hypothesis. From the discussion in Section 3.1,

it is clear that the magnitude of y_H0 ∼ CN(0, 1) is distributed as |y_H0| ∼ Rayleigh(√2/2),

and the magnitude of y_H1 ∼ CN(√A, 1) is distributed as |y_H1| ∼ Rice(√A, √2/2). It is

well known that taking the square of a Rayleigh distributed variable produces an exponen-

tially distributed variable, and taking the square of a Rician distributed variable produces a

variable with a non-central Chi-squared distribution [16].

Specifically, |yH0|2 = ΛH0 ∼ exp(1) and |yH1|2 = ΛH1 ∼ 0.5 · χ2(2, 2A) (non-central Chi-

squared with 2 degrees of freedom, non-centrality parameter 2A, and a scaling factor of 0.5).

The exact form of these distributions are given as:

f_Λ(x|H0) = e^{−x} for x > 0   (27)

f_Λ(x|H1) = e^{−(x+A)} I_0(2√(Ax)) for x > 0   (28)

where Iα(·) is the modified Bessel function of the first kind. These distributions are shown in

Figure 11. However, as discussed, the SNR parameter A = a^2/σ^2 is unknown. To account for the

unknown a value, the maximum likelihood estimate a* is used. To account for the unknown σ

parameter, an estimate σ̂ is used. This estimate is found using multiple pieces of observation

data other than the primary data sample. These pieces of data are commonly referred to

as secondary or training data samples, and are represented here as x(k) : k = {1, . . . , K}.

These samples are typically made up of data from K range cells surrounding the CUT, since

it is assumed that the disturbance in these cells is similar to that in the CUT. However,

typically a specified number of range cells immediately surrounding the CUT are not used

for these secondary samples, as they may contain data related to the primary data that may

bias the estimate [9]. These unused cells are known as guard cells.

For the purpose of analysis, it is assumed that these secondary samples contain distur-


Figure 11: Probability distributions for the sufficient statistic Λ under hypotheses 0 and 1. The H1 distributions are shown for three different SNR values (10 dB, 15 dB, and 18 dB).

bance only. Using these data samples, the sample variance is found as follows:

σ̂^2 = (1/K) Σ_{k=1}^{K} |x(k)|^2   (29)

Since these secondary samples are all assumed to follow hypothesis 0, x(k) ∼ CN(0, σ^2).

Furthermore, each sample can be split into its real and imaginary parts, given by

x_Re(k) ∼ N(0, σ^2/2) and x_Im(k) ∼ N(0, σ^2/2) respectively. These variables can each be

2) respectively. These variables can each be

rewritten as a scalar times a standard Gaussian random variable:

x_Re(k) = √(σ^2/2) z1(k)        x_Im(k) = √(σ^2/2) z2(k)

where z1(k), z2(k) ∼ N (0, 1). Using these statements, the sample variance in equation (29)

can be expanded as:


σ̂^2 = (1/K) Σ_{k=1}^{K} [x_Re^2(k) + x_Im^2(k)]

    = (1/K) Σ_{k=1}^{K} [ (√(σ^2/2) z1(k))^2 + (√(σ^2/2) z2(k))^2 ]

    = (σ^2/(2K)) Σ_{k=1}^{K} [ (z1(k))^2 + (z2(k))^2 ]

σ̂^2 = (σ^2/K) T   (30)

where T = (1/2) Σ_{k=1}^{K} [ (z1(k))^2 + (z2(k))^2 ].

Clearly, T is one half of the sum of 2K squared standard Gaussians. It is well known

that the sum of m squared standard Gaussians follows a Chi-squared distribution with m

degrees of freedom [16]. Thus, T must follow a chi-squared distribution with 2K degrees of

freedom that is scaled by 1/2. The exact distribution follows:

f_T(t) = [2/(2^K (K−1)!)] (2t)^{K−1} e^{−2t/2}

f_T(t) = [t^{K−1}/(K−1)!] e^{−t} for t > 0   (31)
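This distribution is straightforward to check by simulation: T is half a sum of 2K squared standard Gaussians, i.e. Gamma(K, 1), so both its mean and variance equal K. A minimal Monte Carlo sketch (K = 10 and the trial count are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of equation (31): T = (1/2) * sum of 2K squared standard
# Gaussians follows a Gamma(K, 1) distribution, so E[T] = K and Var[T] = K.
rng = np.random.default_rng(1)
K = 10
trials = 200_000

z = rng.standard_normal((trials, 2 * K))
T = 0.5 * np.sum(z**2, axis=1)

print(T.mean(), T.var())   # both ~ K = 10
```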

Now, substituting the σ^2 in equation (26) with the estimate σ̂^2 found in (30), we obtain:

|x|^2/σ̂^2 = K|x|^2/(σ^2 T)  ≷_{H0}^{H1}  η

Λ/T  ≷_{H0}^{H1}  α   (32)

where α = η/K is the new threshold value, since K is known. Both Λ and T are random

variables, and are assumed to be independent from one another. Their ratio distribution can


be found using the following formula [18]:

f_Z(z|H) = ∫_{−∞}^{∞} |t| f_Λ(zt|H) f_T(t) dt   (33)

where Z = Λ/T. Thus, for the hypothesis 0 case:

f_Z(z|H0) = ∫_0^∞ |t| e^{−zt} [t^{K−1}/(K−1)!] e^{−t} dt

          = [1/(K−1)!] ∫_0^∞ t^K e^{−t(1+z)} dt

          = [1/(K−1)!] · K!/(1+z)^{K+1}

f_Z(z|H0) = K/(1+z)^{K+1} for z > 0   (34)

For the hypothesis 1 case:

f_Z(z|H1) = ∫_0^∞ |t| e^{−(zt+A)} I_0(2√(Azt)) [t^{K−1}/(K−1)!] e^{−t} dt

          = [e^{−A}/(K−1)!] ∫_0^∞ t^K e^{−t(1+z)} Σ_{m=0}^{∞} [(√(Azt))^{2m}/(m!)^2] dt

          = [e^{−A}/(K−1)!] Σ_{m=0}^{∞} [(Az)^m/(m!)^2] ∫_0^∞ t^{K+m} e^{−t(1+z)} dt

f_Z(z|H1) = [e^{−A}/(K−1)!] Σ_{m=0}^{∞} [(Az)^m/(m!)^2] [(K+m)!/(1+z)^{K+m+1}]   (35)

3.2.3 PFA and PD of detector

To find the probability of false alarm, one must apply equation (20), using equation (34) as

the distribution of the test statistic under hypothesis 0. Thus:

PFA = ∫_α^∞ f_Z(z|H0) dz

    = ∫_α^∞ K/(1+z)^{K+1} dz

    = (1 + α)^{−K}   (36)

Note that the PFA is independent from both the noise variance σ2 and the SNR. In fact, the

probability of false alarm is only dependent on the detection threshold α, and the number

of samples, K, that are used to estimate the noise variance.

Thus, given some number of samples K, one can set a desired PFA by carefully

choosing the detection threshold α. For a given PFA and K, this choice of α is given as:

α = PFA^{−1/K} − 1   (37)

Note that in order for a detector to be CFAR, it is necessary for the PFA to be independent

of the true interference variance parameter, σ2. In the case of multi-dimensional CFAR

(discussed in Chapter 5), PFA must be independent of the true interference covariance matrix,

Σ.
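Equations (36) and (37) lend themselves to a direct simulation check. A minimal sketch (the design PFA of 0.1, K = 10, and the trial count are arbitrary choices): under H0, |x|^2 is exponential with mean σ^2; forming the sample variance of (29) from K secondary cells and detecting when |x|^2 > αK·σ̂^2 should produce the design false alarm rate for any noise power, which is precisely the CFAR property.

```python
import numpy as np

# Monte Carlo check of equations (36)-(37): the threshold alpha set from the
# design PFA yields that false-alarm rate regardless of the noise power sigma2.
rng = np.random.default_rng(2)
K, pfa_design = 10, 0.1
alpha = pfa_design ** (-1 / K) - 1
trials = 200_000

for sigma2 in (1.0, 5.0):
    x2 = rng.exponential(sigma2, trials)               # |x|^2 under H0
    s2 = rng.exponential(sigma2, (trials, K)).mean(1)  # sample variance, eq. (29)
    pfa_emp = np.mean(x2 > alpha * K * s2)             # test of eq. (32)
    print(sigma2, pfa_emp)                             # ~0.1 in both cases
```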

Similarly, the detection probability is found by applying (22) using (35) as the test

statistic distribution under hypothesis 1:

PD = ∫_α^∞ f_Z(z|H1) dz

   = ∫_α^∞ [e^{−A}/(K−1)!] Σ_{m=0}^{∞} [(Az)^m/(m!)^2] [(K+m)!/(1+z)^{K+m+1}] dz

   = [e^{−A}/(K−1)!] Σ_{m=0}^{∞} [A^m (K+m)!/(m!)^2] ∫_α^∞ z^m/(1+z)^{K+m+1} dz

   = [e^{−A}/(K−1)!] Σ_{m=0}^{∞} ( [A^m (K+m)!/(m!)^2] Σ_{j=0}^{m} [(K+m−1−j)!/(K+m)!] [m!/(m−j)!] [α^{m−j}/(α+1)^{K+m−j}] )

   = [e^{−A}/(K−1)!] Σ_{m=0}^{∞} ( (A^m/m!) Σ_{j=0}^{m} [(K+m−1−j)!/(m−j)!] [α^{m−j}/(α+1)^{K+m−j}] )

   = [e^{−A}/(K−1)!] Σ_{m=0}^{∞} ( (A^m/m!) Σ_{i=0}^{m} [(K+i−1)!/i!] [α^i/(α+1)^{K+i}] )

   = [e^{−A}/(K−1)!] Σ_{m=0}^{∞} ( (A^m/m!) Σ_{i=0}^{m} (K−1)! C(i+K−1, i) [α^i/(α+1)^{K+i}] )

   = e^{−A} Σ_{m=0}^{∞} ( (A^m/m!) Σ_{i=0}^{m} C(i+K−1, i) (α/(α+1))^i (1/(α+1))^K )

Note that the term within the second summation, C(i+K−1, i) (α/(α+1))^i (1/(α+1))^K, is the probability

mass function of a negative binomial random variable with parameters K and α/(1+α).

Thus, the summation of these terms produces the negative binomial cumulative distribu-

tion function, which has the form of a regularized incomplete Beta function [16]. Thus, the

detection probability can be written as:

PD = e^{−A} Σ_{m=0}^{∞} (A^m/m!) I_{1/(1+α)}(K, m+1)   (38)

where the regularized incomplete beta function is I_x(a, b) = [(a+b−1)!/((a−1)!(b−1)!)] ∫_0^x t^{a−1}(1−t)^{b−1} dt.

Note that other closed forms of equation (38) exist, including a form that does not make use

of an infinite series [19].
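Equation (38) can be sanity-checked numerically. The sketch below (the SNR, K, and PFA values are arbitrary choices) evaluates a truncated version of the series, using scipy.special.betainc for the regularized incomplete beta, and compares it against a Monte Carlo simulation of the test statistic Z = Λ/T:

```python
import numpy as np
from scipy.special import betainc, gammaln

# Numerical check of equation (38). Under H1, Lambda ~ 0.5 * noncentral
# chi-squared(2, 2A) and T ~ Gamma(K, 1); a detection occurs when
# Lambda / T > alpha.
rng = np.random.default_rng(3)
A = 10.0                              # SNR (10 dB)
K, pfa = 20, 1e-2
alpha = pfa ** (-1 / K) - 1           # equation (37)

# Truncated series form of equation (38); the Poisson weights decay fast.
m = np.arange(200)
terms = np.exp(-A + m * np.log(A) - gammaln(m + 1)) * betainc(K, m + 1, 1 / (1 + alpha))
pd_series = terms.sum()

# Monte Carlo simulation of the same detector.
trials = 200_000
y = np.sqrt(A) + (rng.standard_normal(trials) + 1j * rng.standard_normal(trials)) * np.sqrt(0.5)
lam = np.abs(y) ** 2
T = rng.gamma(K, 1.0, trials)
pd_mc = np.mean(lam / T > alpha)

print(pd_series, pd_mc)               # the two estimates agree closely
```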

The performance of this detector is shown in Figure 12 by displaying the relationship

between PD and SNR. Each curve on this figure shows PD vs SNR for different values of K.

A monotonic relationship between PD and SNR is clearly shown in this figure. Furthermore,

it can be noted that as K increases, the PD vs SNR curve for this detector appears to shift to

the left. This shift implies that at a fixed SNR value, two detectors with different K values

will perform differently. The detector with larger K will have a higher PD.

3.2.4 Performance under Swerling Fluctuation Models

In the above analysis, the SNR is assumed to be a deterministic value. However in certain

applications of radar, the SNR has the tendency to fluctuate and thus behave like a random

variable. Consider a radar whose antenna beam dwells on different targets for a given amount

of time. These periods of time during which the radar is collecting data are called “scans”.


Figure 12: PD as a function of SNR (SNR is represented with A in equation (38)) for different values of K (K = 10, 20, 80), for PFA = 1 · 10^{−6}.

Consider also that during each scan, the radar collects data across multiple pulses.

When considering a radar that collects data this way, the fluctuation in SNR is often

thought to follow one of four different cases [20]. First consider a case when the SNR stays

relatively the same across different pulses, but fluctuates across different scans. Secondly,

consider a case when the SNR fluctuates wildly across every single pulse. Swerling cases I

and III describe cases when the fluctuation is from scan to scan, and cases II and IV describe

the pulse-to-pulse fluctuation. Note also that the type and number of scatterers present also

affects the type of fluctuation. Swerling cases I and II both consider when a target has many

different independent scatterers of about the same size. Swerling cases III and IV both

consider when a target is a combination of one larger scattering surface and many smaller

reflectors [20].

The SNR must now be represented as a random variable. One way to represent the

SNR is γA, where A is a deterministic value representing the average SNR, and γ is its random

loss/gain multiplier term. Depending on the Swerling case, γ is distributed as [20]:

f(γ) = [M^M/(M−1)!] γ^{M−1} exp(−Mγ) for γ > 0   (39)

where M = 1 for Swerling I, N for Swerling II, 2 for Swerling III, and 2N for Swerling IV,

where N is the spatio-temporal product. Note that N = 1 in the scalar CFAR case.

The detection probability can be found using the same process as before, except using a

different H1 distribution fΛ(x|H1). This distribution can be found by replacing the A term

in equation (28) with γA, then taking the expectation with respect to γ. This process is

shown below for the Swerling I case:

f_Λ(x|H1) = E[ e^{−(x+γA)} I_0(2√(Aγx)) ]

          = ∫_0^∞ e^{−(x+γA)} Σ_{m=0}^{∞} [(γAx)^m/(m!)^2] f(γ) dγ

          = e^{−x} Σ_{m=0}^{∞} [(Ax)^m/(m!)^2] ∫_0^∞ e^{−γ(A+1)} γ^m dγ

          = e^{−x} Σ_{m=0}^{∞} [(Ax)^m/(m!)^2] [m!/(1+A)^{m+1}]

          = [e^{−x}/(1+A)] Σ_{m=0}^{∞} (1/m!) (Ax/(1+A))^m

          = [e^{−x}/(1+A)] exp(Ax/(1+A))

          = [1/(1+A)] exp(−x/(1+A))   (40)

Since the conditional distribution for Λ depends on the unknown parameter σ (by merit

of A), the conditional distribution for the variable Z = Λ/T is found instead, as in equation (33)

(since Z instead depends on the estimate σ̂). This new test statistic Z is distributed

as follows:

f_Z(z|H1) = ∫_0^∞ |t| [1/(1+A)] exp(−zt/(1+A)) [t^{K−1}/(K−1)!] e^{−t} dt

          = [1/((K−1)!(1+A))] ∫_0^∞ t^K e^{−t(1+A+z)/(1+A)} dt

          = [1/((K−1)!(1+A))] · K!/((1+A+z)/(1+A))^{K+1}

          = K(1+A)^K/(1+A+z)^{K+1}   (41)

Finally, equation (22) is used to find the detection probability, using the fZ(z|H1) term

found in (41) as the distribution of the target + interference:

PD,Swerling I = ∫_α^∞ K(1+A)^K/(1+A+z)^{K+1} dz

             = K(1+A)^K [ (−1/K)(1+A+z)^{−K} ]_α^∞

             = ( (1+A)/(1+A+α) )^K   (42)

This formulation for PD is much less cumbersome than PD for the non-fluctuating case,

shown in (38). The PD for the Swerling III case can also be found, using a very similar

process. While the steps are not shown here, the final form of PD under Swerling III is given

as follows:

PD,Swerling III = ( (2+A)/(A+2(1+α)) )^{K+1} ( 1 + 2α(2+A(1+K))/(2+A)^2 )   (43)

Note that the Swerling fluctuation models only affect the SNR parameter A. Since A

is not used in the derivation of the PFA or the detection threshold α, these values are the

same as the formulations in (36) and (37), regardless of which fluctuation model is being


Figure 13: PD vs SNR curves shown for the Swerling I, Swerling III, and non-fluctuating cases for K = 10 and PFA = 10^{−4}. Note that the PD vs SNR curves for Swerling II and Swerling IV are equal to those of Swerling I and Swerling III respectively when scalar data samples are used.

used. The same statement applies to the 0 hypothesis distributions, fΛ(x|H0) in (27) and

fZ(z|H0) in (34). The relationships between PD and SNR for the different Swerling cases are

shown in Figure 13.
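The closed form in (42) is easy to verify by simulation, since under Swerling I the H1 statistic Λ is exponential with mean 1 + A per equation (40) and T ∼ Gamma(K, 1). A minimal sketch (the SNR, K, and PFA values are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of equation (42) for the Swerling I case.
rng = np.random.default_rng(4)
A, K, pfa = 10.0, 10, 1e-2
alpha = pfa ** (-1 / K) - 1                       # equation (37)

pd_closed = ((1 + A) / (1 + A + alpha)) ** K      # equation (42)

trials = 200_000
lam = rng.exponential(1 + A, trials)              # Lambda | H1, per eq. (40)
T = rng.gamma(K, 1.0, trials)                     # per eq. (31)
pd_mc = np.mean(lam / T > alpha)

print(pd_closed, pd_mc)                           # the two agree closely
```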

Chapter 4

Robust Decision Making for Ordered

Statistic CFAR

4.1 Ordered Statistic CFAR

So far, the analysis of PFA and PD for scalar CFAR has assumed that the secondary samples

x(k) are independent and identically distributed as noise/interference. Because of this, it is

simple to estimate the variance of this noise/interference by using the sample variance, as in

(29). This can also be thought of as taking the mean value of y(k) = |x(k)|^2:

σ̂^2 = (1/K) Σ_{k=1}^{K} y(k)

These y(k) samples are obtained by applying a square law detector on the secondary samples

x(k) [9]. Since the detection scheme uses a statistic, σ2, that is a sample average of these

y(k) data samples, the detection scheme discussed in section 3.2 is commonly known as cell

averaging CFAR (CA-CFAR) [9].

Unfortunately, while a sample average is easy to implement, it is very susceptible to

producing poor estimates when the secondary data have outliers or samples that are not

identically distributed. Furthermore, in practice, many scenarios arise where secondary


samples are distributed differently. One such scenario is when the clutter interference comes

from different sources (such as different terrain or scatterer types) [6]. Another scenario

is when some of the secondary samples contain target information. These differing clutter

types and interfering targets cause the secondary samples to be non-homogeneous, which is

known to degrade detection performance [21].

When secondary samples contain non-homogeneous data, the sample variance σ2 given

in (29) no longer provides an accurate estimate for the variance of the disturbance in the

primary data sample. Thus, other methods of estimating this variance are used which aim

to be more robust when these heterogeneities exist in the secondary data. One such method

orders the secondary data samples by their size and selects the mth smallest value as the interference estimate. This method is known as ordered statistic CFAR (OS-CFAR).

4.1.1 OS-CFAR performance

The assumptions on noise used in Section 3.2.1 are used in this section as well. Specifically, it

is assumed that the interference follows a complex Gaussian distribution, and is independent

from sample to sample. It has been shown that this detector has the CFAR property when the

interference follows an exponential distribution [6]. As we have shown in previous sections,

if x ∼ CN (0, σ2), then |x|2 ∼ exponential(1/σ2). Thus, let the primary data sample be

represented as z = |x|2. Similarly, let the secondary data samples be represented as z(k) =

|x(k)|2 for k = {1, . . . , K}.

Out of the K secondary samples, let the random variable T represent the mth smallest value (the mth order statistic). The hypothesis test for OS-CFAR is defined as [6]:

z \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \alpha T \qquad (44)

where α is a constant, scalar multiplier term. Note that there are random variables on both

sides of this inequality, since both z and T are random. It is entirely possible to put both

random variables on the same side of the inequality to obtain a single detection statistic in the form of the ratio z/T. However, the PDF of z/T does not easily admit a closed form. Thus, instead, the PFA is calculated as follows:

P_{FA|T=t} = \int_{t\alpha}^{\infty} f_z(z|H_0)\, dz \qquad (45)

where fz(·) is the distribution of z, and PFA|T=t is the probability of false alarm given that

the random variable T is equal to the value t.

Thus, to find PFA, one can take the expectation of PFA|T=t across T :

P_{FA} = \mathbb{E}_T\!\left[P_{FA|T=t}\right] = \int_{-\infty}^{\infty} f_T(t) \int_{t\alpha}^{\infty} f_z(z|H_0)\, dz\, dt \qquad (46)

Similarly, the detection probability PD has the form:

P_D = \int_{-\infty}^{\infty} f_T(t) \int_{t\alpha}^{\infty} f_z(z|H_1)\, dz\, dt \qquad (47)

From (46) and (47), PFA and PD can be calculated as long as fT (t), fz(z|H0) and fz(z|H1)

are known.

Recall from equation (26) that Λ = |x|²/σ², which implies that z = Λσ². Thus, fz(z|Hi) is just a scaled version of fΛ(x|Hi), with a scaling coefficient of σ²:

f_z(z|H_i) = \frac{1}{\sigma^2}\, f_\Lambda\!\left(\frac{z}{\sigma^2}\,\middle|\,H_i\right)

This equation can be applied to equations (27) and (40), yielding:

f_z(z|H_0) = \frac{1}{\sigma^2} \exp\!\left(-z/\sigma^2\right) \qquad (48)

f_z(z|H_1) = \frac{1}{\sigma^2(1+A)} \exp\!\left(\frac{-z}{\sigma^2(1+A)}\right) \qquad (49)

Note that equation (28) can also be used to solve for fz(z|H1). However, (40) is used here

since it is less cumbersome and more easily yields a closed form for the detection probability.

Recall that T represents the mth smallest of the secondary samples. The assumption that all secondary samples are independent and identically distributed as interference only is used for this derivation as well. For a collection of K independent and identically distributed random variables, the probability distribution of the mth smallest value is as follows [16]:

f_T(t) = f_m(z) = m \binom{K}{m} \left[1 - F(z(k))\right]^{K-m} \left[F(z(k))\right]^{m-1} f(z(k)) \qquad (50)

where F (z(k)) is the cumulative distribution function of z(k), and f(z(k)) is the probability

density function of z(k). Since these secondary samples follow the noise only hypothesis,

f(z(k)) has the same distribution as (48), which has the cumulative distribution:

F(z(k)) = 1 - \exp\!\left(-z(k)/\sigma^2\right)

Using these distributions, fT(t) is found to be:

f_T(t) = m \binom{K}{m} \left[\exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{K-m} \left[1 - \exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{m-1} \frac{1}{\sigma^2} \exp\!\left(\frac{-t}{\sigma^2}\right)

f_T(t) = \frac{m}{\sigma^2} \binom{K}{m} \left[\exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{K-m+1} \left[1 - \exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{m-1} \quad \text{for } t > 0 \qquad (51)

Thus, PFA is:

P_{FA} = \int_{0}^{\infty} \frac{m}{\sigma^2} \binom{K}{m} \left[\exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{K-m+1} \left[1-\exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{m-1} \int_{t\alpha}^{\infty} \frac{1}{\sigma^2} \exp\!\left(-y/\sigma^2\right) dy\, dt

= \frac{m}{\sigma^2} \binom{K}{m} \int_{0}^{\infty} \left[\exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{K-m+1} \left[1-\exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{m-1} \exp\!\left(\frac{-\alpha t}{\sigma^2}\right) dt

= \frac{m}{\sigma^2} \binom{K}{m} \int_{0}^{\infty} \exp\!\left(\frac{-t}{\sigma^2}(K-m+1+\alpha)\right) \left[1-\exp\!\left(\frac{-t}{\sigma^2}\right)\right]^{m-1} dt \qquad (52)


Consider the change of variables x = t/σ², so that dx = dt/σ²:

P_{FA} = m \binom{K}{m} \int_{0}^{\infty} \exp\left(-x(K-m+1+\alpha)\right) \left[1 - \exp(-x)\right]^{m-1} dx

= \prod_{i=0}^{m-1} \frac{K-i}{K-i+\alpha} \qquad (53)

PD is found in this same manner. Since both (48) and (49) are exponentially distributed,

the derivation is very similar and is excluded here. The equation for PD is as follows:

P_D = \prod_{i=0}^{m-1} \frac{K-i}{K-i+\dfrac{\alpha}{1+A}} \qquad (54)

For the CA-CFAR case, a closed form for α given some desired PFA value was obtained

in equation (37). However, in the OS-CFAR case, the threshold term given a desired PFA

must be obtained from equation (53), which does not easily permit a closed form. However, PFA monotonically decreases in α, so numerical methods such as line search or Newton's method can be used to solve for α given a desired PFA. Below, Table 1 shows selected values of α that have been numerically obtained using a MATLAB routine involving a line search method.
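That line search is easy to reproduce: since (53) decreases monotonically in α, a simple bisection converges quickly. The routine below is an illustrative Python stand-in for the MATLAB routine mentioned above (function names are ours):

```python
def os_cfar_pfa(alpha, K, m):
    """PFA of OS-CFAR from equation (53)."""
    pfa = 1.0
    for i in range(m):
        pfa *= (K - i) / (K - i + alpha)
    return pfa

def solve_alpha(pfa_target, K, m, lo=0.0, hi=1e6, tol=1e-10):
    """Bisection: PFA(alpha) decreases monotonically from 1 toward 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if os_cfar_pfa(mid, K, m) > pfa_target:
            lo = mid          # PFA still too high: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = solve_alpha(1e-4, K=24, m=20)
print(alpha)   # ≈ 7.2268, matching the K = 24, m = 20 entry of Table 1
```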

Using these numerically calculated threshold values, equation (54) is used to calculate

the PD as a function of m at different SNR values. This relationship is shown in Figure

14. From this figure, it’s clear that the detection probability is quite poor at low m. As m

increases, however, the PD begins to rise up to a maximum point. Eventually, the PD begins

to decrease again as m nears K.

Figure 14 shows the particular case of K = 24 and PFA = 1 · 10−4. In this case, the PD

reaches a maximum around m = 20, or m = 21, depending on the SNR. As shown in the

figure, the PD values surrounding the maximum are very close. Much of the literature on

OS-CFAR agrees that, while the maximum PD occurs at around m = 7K/8, it is better to

use a value of m = 3K/4, since it allows for the censoring of more interfering targets [6], [22].


Table 1: Threshold value α for different values of m and K, for PFA = 1 · 10−4. These values are obtained using a MATLAB routine involving a line search method on equation (53).

m     α for K = 16    α for K = 20    α for K = 24
14    7.43066660      11.6549278      15.6056340
15    5.94201100      9.999266        13.6610266
16    4.49169440      8.5735688       12.0110788
17                    7.315811        10.5878312
18                    6.173234        9.34080450
19                    5.0893566       8.23103300
20                    3.9641966       7.22683550
21                                    6.30026700
22                                    5.42278990
23                                    4.55637560
24                                    3.61892660

Figure 15 compares the PD vs SNR curves for CA-CFAR and OS-CFAR when 24 sec-

ondary samples are used. Clearly, there is a slight loss in PD when using OS-CFAR and

m = 7K/8. However, as discussed, OS-CFAR is much more robust when dealing with sec-

ondary data samples that are differently distributed (non-homogeneous). This small amount

of loss in performance is traded for increased performance in scenarios where interfering tar-

gets or clutter edges are present.


Figure 14: PD vs m for K = 24 and PFA = 1 · 10−4, shown for SNR values of 5 to 10 dB.

Figure 15: PD vs SNR for K = 24 and PFA = 1 · 10−4 for the Swerling I case, shown for both CA-CFAR and OS-CFAR with m = 20.


4.1.2 Effects of Interfering Targets

Note that equations (53) and (54) provide formulations for the PFA and PD when all sec-

ondary samples are homogeneous and disturbance only. While OS-CFAR has been proposed

as an algorithm with increased robustness towards non-homogeneous secondary samples, the

effects of non-homogeneous secondary samples still cause the PD and PFA to suffer. In par-

ticular, the effects of the secondary samples containing interfering targets are observed in

this section.

Let J represent the number of interfering targets present within the secondary samples.

Consider that each interfering target has a relative signal strength (RSS), which must also

be taken into account. Note that equation (50) provides the probability distribution for the mth smallest secondary sample, assuming that all secondary samples are independent and identically distributed. The distribution of the mth smallest secondary sample when the secondary samples are independent but not necessarily identically distributed is given as follows [23]:

g_m(t) = \frac{1}{(m-1)!\,(K-m)!} \sum_{p} F_{i_1}(t)\cdots F_{i_{m-1}}(t) \cdot f_{i_m}(t) \cdot \{1 - F_{i_{m+1}}(t)\}\cdots\{1 - F_{i_K}(t)\} \qquad (55)

where Fi1 , . . . , FiK and fi1 , . . . , fiK respectively represent the CDFs and PDFs of the K

different secondary samples. Note that i1, i2, . . . , iK represent different possible orderings of

1, 2, . . . , K, and \sum_p represents the summation over every possible permutation of i1, i2, . . . , iK.

Note that K! total permutations exist for a list of length K, and thus there is an extremely

large number of summed terms in (55) for large K. However, this number of summed

terms is reduced if one considers that the secondary samples containing disturbance are

identically distributed. Furthermore, the number of summed terms is reduced even further

if the assumption is made that each interfering target has the same RSS [24].

Given these assumptions, each secondary sample containing an interfering target follows

hypothesis 1, and has the PDF given in (49). Similarly, each secondary sample containing


disturbance follows hypothesis 0 and has the distribution shown in (48). Thus:

f_0(t) = \frac{1}{\sigma^2} \exp(-t/\sigma^2) \qquad (56)

f_1(t) = \frac{1}{\sigma^2(1+A)} \exp\!\left(\frac{-t}{\sigma^2(1+A)}\right) \qquad (57)

F_0(t) = 1 - \exp(-t/\sigma^2) \qquad (58)

F_1(t) = 1 - \exp\!\left(\frac{-t}{\sigma^2(1+A)}\right) \qquad (59)

where f0(t) and F0(t) represent the distributions of the secondary samples containing distur-

bance, and f1(t) and F1(t) represent the distributions of the secondary samples containing

interfering targets, and A is the RSS of the interfering targets.

Thus, now that the distributions in (55) are represented by a combination of J ones and K − J zeros, not every permutation of i1, i2, . . . , iK will be unique. In fact, out of the K! permutations of J ones and K − J zeros, only \binom{K}{J} of them are unique. Note that each of these unique permutations is repeated a total of J!(K − J)! times, thus resulting in \binom{K}{J} · J!(K − J)! = K! total permutations. Furthermore, since (55) considers the product of repeated terms, the number of summed terms in (55) is reduced even further. By taking all of these observations into account, the following formulation of gm(t) is written:

of these observations into account, the following formulation of gm(t) is written:

g_m(t) = \frac{J!\,(K-J)!}{(m-1)!\,(K-m)!} \Bigg\{ \sum_{\ell_2=\max(0,\,J-K+m)}^{\min(m-1,\,J)} F_1^{\ell_2} F_0^{m-1-\ell_2} \cdot f_0 \cdot \{1-F_1\}^{J-\ell_2}\{1-F_0\}^{K-m-J+\ell_2} \binom{m-1}{\ell_2}\binom{K-m}{J-\ell_2}

\qquad + \sum_{\ell_1=\max(0,\,J-1-K+m)}^{\min(m-1,\,J-1)} F_1^{\ell_1} F_0^{m-1-\ell_1} \cdot f_1 \cdot \{1-F_1\}^{J-1-\ell_1}\{1-F_0\}^{K-m-J+1+\ell_1} \binom{m-1}{\ell_1}\binom{K-m}{J-1-\ell_1} \Bigg\} \qquad (60)

As this is the distribution for the mth smallest secondary sample, it can be used to find the PD of the OS-CFAR detector via (47) as follows:

P_D = \int_{0}^{\infty} g_m(t) \int_{t\alpha}^{\infty} \frac{1}{\sigma^2(1+A)} \exp\!\left(\frac{-y}{\sigma^2(1+A)}\right) dy\, dt

= \int_{0}^{\infty} g_m(t) \exp\!\left(\frac{-\alpha t}{\sigma^2(1+A)}\right) dt \qquad (61)

Using the threshold values found via (53), the relationship between PD and m given in (61) is shown in Figures 16 and 17. It is clear from these figures that while m = 7K/8 = 21 provides the best results when J = 0, the PD worsens significantly for J > 0. Furthermore, m = 3K/4 = 18 is clearly much more robust in these cases, as was concluded in [6] and [22]. Figures 16 and 17 also compare the formulation in (61) with data collected from 10^6 Monte Carlo simulations, showing close agreement.
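The Monte Carlo experiment behind Figures 16 and 17 can be sketched compactly: replace J of the K noise-only secondary cells with interferer cells of power σ²(1 + A) at the chosen RSS, and count detections. The parameters below are illustrative choices, not the exact simulation settings of the thesis:

```python
import random

def os_cfar_pd_mc(K, m, alpha, J, snr_db, rss_db, trials=50_000, seed=0):
    """Empirical PD of OS-CFAR with J interfering targets among the secondary data."""
    rng = random.Random(seed)
    A = 10 ** (snr_db / 10)        # primary target SNR (Swerling I: exponential power)
    A_i = 10 ** (rss_db / 10)      # interferer RSS
    detections = 0
    for _ in range(trials):
        z = rng.expovariate(1.0 / (1 + A))                         # primary cell, H1
        sec = [rng.expovariate(1.0 / (1 + A_i)) for _ in range(J)]  # interferer cells
        sec += [rng.expovariate(1.0) for _ in range(K - J)]         # noise-only cells
        T = sorted(sec)[m - 1]                                      # mth smallest
        detections += z > alpha * T
    return detections / trials

alpha = 7.2268355                  # Table 1 entry for K = 24, m = 20, PFA = 1e-4
pd_clean = os_cfar_pd_mc(24, 20, alpha, J=0, snr_db=20, rss_db=20)
pd_contaminated = os_cfar_pd_mc(24, 20, alpha, J=2, snr_db=20, rss_db=20)
print(pd_clean, pd_contaminated)   # PD drops when interferers inflate the estimate
```

The interferers push the order statistic (and hence the threshold αT) upward, so the empirical PD with J = 2 sits visibly below the clean case, consistent with the trend in the figures.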

Note that interfering targets also affect the PFA value. The distribution for the mth smallest secondary sample given in (60) can be used to find the PFA via (46) as follows:

P_{FA} = \int_{0}^{\infty} g_m(t) \int_{t\alpha}^{\infty} \frac{1}{\sigma^2} \exp\!\left(-y/\sigma^2\right) dy\, dt

P_{FA} = \int_{0}^{\infty} g_m(t) \exp\!\left(\frac{-\alpha t}{\sigma^2}\right) dt \qquad (62)

Figure 16: Effects of interfering targets on PD vs m, shown at RSS = 10 dB for J = 0, . . . , 5. The results from 10^6 Monte Carlo simulations are displayed with the dotted curves, showing close agreement with the results in (61). α is found via (53) for PFA = 10−4.


The effects of interfering targets on the PFA are shown in Figure 18. The data in this figure use an α that is calculated via (53) for PFA = 10−4. Note that this α works as intended in the case when there are no interfering targets, and the PFA is set to a constant value of 10−4. However, the PFA begins to decrease below the prescribed value of 10−4 as interfering targets are introduced. Note that a lower PFA value is also associated with a lower PD value, as explained in Section 3.1.4. Part of this loss in PD is observed in Figures 16 and 17.

4.2 Information Elasticity Framework for OS-CFAR

Given a fixed K, the PD and PFA of an OS-CFAR detector are generally functions of four different variables. Two of these are parameters that the decision maker has control of, namely the threshold α and the order statistic parameter m. The remaining two are environmental parameters that are typically unknown to a decision maker: the number of interfering targets J and the RSS of said targets.

Ideally, the decision maker would select the decision variables (α and m) so as to fix the PFA at a constant value while also providing a satisfactory PD, as is the goal

Figure 17: Effects of interfering targets on PD vs m, shown at RSS = 20 dB for J = 0, . . . , 5. The results from 10^6 Monte Carlo simulations are displayed with the dotted curves, showing close agreement with the results in (61). α is found via (53) for PFA = 10−4.


of many CFAR detectors. Unfortunately, this task is difficult to accomplish if the number

of interfering targets and their associated RSS values are unknown. Fortunately, methods,

such as the Forward Automatic Order Selection Ordered Statistics Detector (FAOSOSD)

[25], have been proposed to provide an estimate for the number of interfering targets present.

However, the accuracy of the estimate provided by the FAOSOSD algorithm has been shown

to depend on the RSS of the interfering targets [4]. Thus, while a decision may be suitable at one value of RSS, it may not necessarily be suitable at a different value. An information elasticity framework is therefore proposed to improve the system's robustness to the variations in

RSS. In this section, the performance of the FAOSOSD estimate is analyzed, a performance

function is defined, and a measure of robustness and absolute performance are determined.

4.2.1 Estimation of J using FAOSOSD

In [25], an estimation technique known as information theoretic criteria (ITC) is used to find an estimate for J (denoted by Ĵ). ITC are generally used to compare different statistical

models. The ITC provide a measure of quality for these different models, given a set of

observations [26], [27]. These criteria were first used in a signal detection application in [28],

Figure 18: Effects of interfering targets on PFA vs m for RSS = 10 dB and J = 0, . . . , 5. α is found via (53) for PFA = 10−4.


where ITC were used to estimate the number of signals present in observed multichannel

time-series data. This technique is applied in the FAOSOSD [25], comparing different models

describing the number of interfering targets present.

One particular type of ITC is known as minimum description length (MDL), and is first

given in [27]. The form of the MDL used in the FAOSOSD is given as [25]:

\mathrm{MDL}(n) = -(K-n)K \ln\!\left(\frac{G(\lambda_{n+1}, \ldots, \lambda_K)}{A(\lambda_{n+1}, \ldots, \lambda_K)}\right) + \frac{1}{2}\, n(2K-n) \ln(K) \qquad (63)

where λ1 ≥ λ2 ≥ . . . ≥ λK represent the ordered secondary samples, and G(·, . . . , ·) and

A(·, . . . , ·) represent the geometric and algebraic means of their arguments, respectively.

The value of n at which MDL(n) reaches a minimum (denoted as n∗) is considered to be the

model that best fits the observed data, and the K − n∗ + 1 largest samples are assumed to

come from interfering targets [14]. Thus, the estimated number of interfering targets is:

\hat{J} = K - \operatorname*{argmin}_{n}\left[\mathrm{MDL}(n)\right] + 1 \qquad (64)
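A sketch of the MDL computation in (63) is below. The indexing conventions (the range of n, and the descending sort that places the candidate interferers in λ1, . . . , λn) follow the equations as written here; the exact conventions of the FAOSOSD in [25] may differ, and (64) then maps the minimizing n∗ to the estimate Ĵ:

```python
import math
import random

def mdl_curve(samples):
    """MDL(n) from (63) for n = 0 .. K-2, with samples sorted descending so that
    lambda_{n+1}, ..., lambda_K form the candidate noise-only tail."""
    lam = sorted(samples, reverse=True)
    K = len(lam)
    mdl = []
    for n in range(K - 1):                      # keep at least 2 samples in the tail
        tail = lam[n:]
        geo = math.exp(sum(math.log(v) for v in tail) / len(tail))   # geometric mean
        alg = sum(tail) / len(tail)                                  # algebraic mean
        mdl.append(-(K - n) * K * math.log(geo / alg)
                   + 0.5 * n * (2 * K - n) * math.log(K))
    return mdl

# toy data: 17 unit-power noise cells plus 3 strong interferers
rng = random.Random(3)
data = [rng.expovariate(1.0) for _ in range(17)] + [200.0, 250.0, 300.0]
mdl = mdl_curve(data)
n_star = min(range(len(mdl)), key=mdl.__getitem__)
print(n_star)   # the MDL minimum; (64) then maps n* to the estimate J-hat
```

With strong interferers present, including them in the tail makes the geometric-to-algebraic mean ratio collapse, so the minimum moves away from n = 0.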

Using Monte Carlo simulations, the general performance of this algorithm is approximated. FAOSOSD is used to estimate Ĵ for K simulated secondary samples where J interfering targets are present. This process is repeated in 10^6 separate Monte Carlo simulations. Using the relative frequencies of Ĵ, the conditional probability mass function P(Ĵ|J) is approximated for 1 ≤ J ≤ K and 1 ≤ Ĵ ≤ K. This represents the probability of the FAOSOSD estimating Ĵ given that J interfering targets exist. Using Bayes' rule:

P(J|\hat{J}) = \frac{P(J)\, P(\hat{J}|J)}{\sum_{j=0}^{K} P(j)\, P(\hat{J}|j)}

where P(J|Ĵ) represents the a posteriori PMF, describing the probability that J interfering targets are present given that the FAOSOSD has estimated Ĵ. P(J) represents the a priori PMF of J. If this knowledge of J is unknown to a user, then ignorance may be assumed,


and each value of J is considered equally likely to occur. In this case, a uniform distribution is assumed for P(J), yielding the following:

P(J|\hat{J}) = \frac{P(\hat{J}|J)}{\sum_{j=0}^{K} P(\hat{J}|j)}
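Under the uniform prior, the posterior is just a column normalization of the simulated confusion matrix. A minimal sketch, with a made-up 3 × 3 confusion matrix standing in for the Monte Carlo estimates of P(Ĵ|J):

```python
def posterior_uniform(conf):
    """conf[J][Jhat] = P(Jhat | J); returns post[Jhat][J] = P(J | Jhat)
    under a uniform prior on J (column-normalize the confusion matrix)."""
    n = len(conf)
    post = []
    for jhat in range(n):
        col = [conf[J][jhat] for J in range(n)]
        s = sum(col)
        post.append([c / s for c in col])
    return post

# hypothetical conditional PMFs P(Jhat | J) for J = 0, 1, 2
conf = [[0.8, 0.15, 0.05],
        [0.2, 0.60, 0.20],
        [0.1, 0.30, 0.60]]
post = posterior_uniform(conf)
print(post[1])   # P(J | Jhat = 1) ≈ [0.143, 0.571, 0.286]
```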

This a posteriori PMF is shown for Ĵ = 5 and K = 20 in Figure 19. This figure shows that at an RSS of 20 dB, the probability of correctly estimating J is larger than the probability of incorrectly estimating, i.e., P(J = 5|Ĵ = 5) > P(J ≠ 5|Ĵ = 5). However, as the RSS decreases to 17 dB, the probability of correctly estimating J decreases, and now P(J = 5|Ĵ = 5) < P(J ≠ 5|Ĵ = 5). The probability of correctly estimating J drops even further when RSS is decreased to 7 dB. In general, Monte Carlo simulations have shown that the FAOSOSD becomes more likely to correctly estimate J as the RSS increases.

Figure 19: P(J|Ĵ = 5) shown for K = 20 at RSS = 20 dB, 17 dB, and 7 dB.


4.2.2 Performance Function for OS-CFAR

This a posteriori distribution for J is used to produce a function describing the performance of a decision given Ĵ from the FAOSOSD algorithm. As discussed in Section 1.1, the performance varies depending on the context/application and the DM. In this particular application, it is desired for PD to be high and PFA to be close to its prescribed value. Thus, we define a function which increases as PFA approaches its desired value and as PD increases:

\psi_0 = P_D^2 \exp\!\left(-\left|1 - \frac{P_{FA}}{P_{FA}^{*}}\right|\right) \qquad (65)

where P ∗FA is the prescribed PFA value. Note that this is merely an example of a performance

function that can be used. Other functions can be defined to match the DM’s preferences.

This performance function ψ0 is shown in Figure 20 for P ∗FA = 10−4.
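Evaluating (65) is a one-liner; the quick sketch below shows the intended behavior (the PD and PFA values are arbitrary):

```python
import math

def psi0(pd, pfa, pfa_star):
    """Performance function (65): rewards high PD and PFA near its prescribed value."""
    return pd ** 2 * math.exp(-abs(1 - pfa / pfa_star))

print(psi0(0.8, 1e-4, 1e-4))   # PFA on target: psi0 = PD^2 = 0.64
print(psi0(0.8, 1e-2, 1e-4))   # PFA 100x too high: heavily penalized
```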

Clearly, ψ0 is a function of PD and PFA. Furthermore, PD and PFA are functions of

Figure 20: ψ0 shown for P∗FA = 1 · 10−4.


α, m, J, and RSS, as discussed in Section 4.1.2. Thus, ψ0 is also a function of these four variables. Let the decision variables α and m be contained in the vector x = (α, m). The performance function can thus be represented as ψ0(x, J, RSS). To account for the fact that J is unknown, the conditional mean of this performance function is obtained using the a posteriori distribution of J, as follows:

\psi_{\hat{J}}(\mathbf{x}, \mathrm{RSS}) = \sum_{J=0}^{K} \psi_0(\mathbf{x}, J, \mathrm{RSS})\, P(J|\hat{J}, \mathrm{RSS}) \qquad (66)

This conditional mean can also be thought of as a weighted sum of ψ0 values. The performance ψ0(x, J, RSS) is weighted more heavily if J is more probable, given the estimate Ĵ.

4.2.3 Robust decision making method

ψĴ(x, RSS) provides a relative measure for the performance of a decision x at a specific RSS

value. However, as discussed, the RSS of the interfering targets is unknown. Let A =

{A1, A2, . . . , AM} represent a vector containing M possible RSS values for the interfering

targets. The goal is to select a decision x that generally performs well at the RSS values

within A (increased absolute performance), while also reducing the sensitivity of the per-

formance function to these RSS values (increased robustness). The concepts of absolute

performance and robustness are characterized using the mean and variance of a performance

metric respectively, in [29]. Thus, the sample mean and variance across values in A are used

in this framework to represent absolute performance and the robustness respectively. The

sample mean and sample variance of ψĴ(x, RSS) are obtained across these RSS values as

follows:

\mu(\mathbf{x}, \mathcal{A}) = \frac{1}{M} \sum_{i=1}^{M} \psi_{\hat{J}}(\mathbf{x}, A_i) \qquad (67)

\mathrm{Var}(\mathbf{x}, \mathcal{A}) = \frac{1}{M-1} \sum_{i=1}^{M} \left|\psi_{\hat{J}}(\mathbf{x}, A_i) - \mu(\mathbf{x}, \mathcal{A})\right|^2 \qquad (68)


The µ(x,A) and Var(x,A) for many different decision points are shown in Figure 21.

It is desired for µ(x,A) to be high and Var(x,A) to be low. With this in mind,

there is a clearly defined Pareto frontier for these decision points. Looking at these Pareto

efficient solutions only, a clear trade-off between absolute performance and robustness is

observed.

To select a decision that balances these trade-offs, the normalized distance from the

utopia point is used as a measure of decision effectiveness, E, as discussed in Section 2.2.

This measure is given as follows:

E = \left[ \left( \frac{\mu - \max_{\mathbf{x}}(\mu)}{\max_{\mathbf{x}}(\mu)} \right)^2 + \left( \frac{\mathrm{Var}}{\max_{\mathbf{x}}(\mathrm{Var})} \right)^2 \right]^{1/2} \qquad (69)
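Equations (67)–(69) reduce to a few lines of code. The sketch below runs them over three hypothetical decisions (the ψ values are made up for illustration): one with high mean and low spread, one with high spread, and one with low mean:

```python
def effectiveness(psi_by_decision):
    """psi_by_decision[d] = list of psi_Jhat(x_d, A_i) over the RSS grid.
    Returns (mu, var, E) per decision following (67)-(69)."""
    mus, vars_ = [], []
    for psis in psi_by_decision:
        M = len(psis)
        mu = sum(psis) / M                                   # (67)
        var = sum((p - mu) ** 2 for p in psis) / (M - 1)     # (68)
        mus.append(mu)
        vars_.append(var)
    mu_max, var_max = max(mus), max(vars_)
    E = [(((mu - mu_max) / mu_max) ** 2 + (var / var_max) ** 2) ** 0.5
         for mu, var in zip(mus, vars_)]                     # (69)
    return mus, vars_, E

decisions = [[0.30, 0.32, 0.28, 0.31],   # high mean, low spread
             [0.10, 0.35, 0.05, 0.40],   # high spread
             [0.05, 0.06, 0.05, 0.06]]   # low mean, low spread
mus, vars_, E = effectiveness(decisions)
print(min(range(3), key=E.__getitem__))  # → 0 (high mean, low spread wins)
```

Lower E means closer to the utopia point of maximum mean and zero variance, so the first decision is the most effective of the three.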

The decision effectiveness is shown as a function of Var in Figure 22. Clearly, as the measure of robustness improves (Var decreases), the decision effectiveness improves (E de-

Figure 21: µ(x,A) vs Var(x,A) shown for Ĵ = 5, N = 20, and P∗FA = 10−4. Points shown for 8400 decision points and A = {5, 10, . . . , 90, 95}. The Pareto frontier is shown by the black curve.


creases). Eventually, a minimum/overload point is reached. Decreasing Var beyond this

overload point only causes the decision effectiveness to get worse. Thus, information over-

load is observed using this framework, allowing the DM to select the decision with the

maximum decision effectiveness. For this particular example, the overload point is reached

when Var = 0.4577, which occurs at the decision m = 12 and α = 8.8940.

Figure 22: Decision effectiveness shown as a function of the measure of robustness, Var. Note that the overload point occurs at the decision m = 12 and α = 8.8940.

Chapter 5

Information Elasticity Framework for

the AMF

Chapters 3 and 4 discussed detectors where the primary and secondary samples consist

of scalar values. This chapter focuses on the adaptive matched filter (AMF), where these

primary and secondary samples are each N × 1 vectors, with N being the dimensionality,

or the spatio-temporal product (as was briefly discussed in Chapter 1). With data being

N -dimensional, the behavior of the disturbance is now characterized using a multivariate

complex Gaussian distribution. This distribution considers an N × 1 mean vector and an

N ×N covariance matrix, represented by Σ.

Just as the scalar CFAR detector discussed in Section 3.2 used a test statistic involving

an estimate for σ2, the AMF uses a test statistic involving an estimate for Σ. Two methods

of estimating Σ are analyzed in this thesis, namely sample matrix inversion (SMI) and rank

constrained maximum likelihood estimation (RCML). These methods each produce an esti-

mate from K secondary samples, and when these samples are assumed to be homogeneous and

disturbance only, the performance of the AMF improves as K increases, up to a theoretical

limit [30]. This limit is the performance of a detector, known as the clairvoyant detector

where the disturbance covariance matrix is known.


While increasing K provides improved performance when secondary samples are assumed

to be homogeneous and target free, in practice, this choice of K is often constrained due to

the fact that only a certain number of homogeneous training samples are available within

a given environment. Unfortunately, radar data is often non-homogeneous in practice, and

increasing K increases the likelihood that the training samples contain a non-homogeneity

[9], [31], such as an interfering target or different clutter type. As these non-homogeneities

cause the detection performance to suffer, a high number of training samples is undesirable.

Thus, a DM must consider these trade-offs when selecting K.

This chapter develops an information elasticity framework for selecting decision param-

eters for the AMF. In this framework, the decision quality metric is based on a comparison

between the AMF performance and the clairvoyant detector performance. The clairvoyant

detector is based on the likelihood ratio test, which is derived in [32] and [8] and included

in section 5.1 for completeness. The PD for the AMF is originally derived in [8], and the

derivation is given in Section 5.2 for completeness. In this chapter, we define a constraint

function with user-tunable parameters, allowing a DM to specify the level of cost associ-

ated with using different decisions. Using this framework, information overload is observed,

allowing the DM to select cost-efficient solutions.

5.1 Clairvoyant Detector

5.1.1 Likelihood Ratio Test

As discussed, the primary data now takes the form of an N × 1 vector, represented by x. Just

as in Section 3.2, this observed data must follow one of two hypotheses:

H0 : x = n

H1 : x = as + n


where H0 is the disturbance only hypothesis and H1 is the target + disturbance hypothesis.

The n vector is assumed to follow a multivariate complex Gaussian distribution with mean 0

and a covariance matrix Σ. Furthermore, a is the same deterministic and unknown complex

scalar introduced in Section 3.2.1, and s is the steering vector, which is deterministic and

known.

Just as in Section 3.2.1, the detection test is set up using the likelihood ratio. The

conditional distributions of the observation vector are given as [17]:

f_{x|H_1}(x|H_1, \Sigma) = \frac{1}{\pi^N |\Sigma|}\, e^{-(x-as)^H \Sigma^{-1} (x-as)}

f_{x|H_0}(x|H_0, \Sigma) = \frac{1}{\pi^N |\Sigma|}\, e^{-x^H \Sigma^{-1} x}

Again, the ratio of these conditional distributions is taken, and its natural log is considered:

\Lambda(x) = e^{\,x^H \Sigma^{-1} x \,-\, (x-as)^H \Sigma^{-1} (x-as)}

\ln \Lambda(x) = x^H \Sigma^{-1} x - (x-as)^H \Sigma^{-1} (x-as)

\ln \Lambda(x) = 2\,\mathrm{Re}\!\left(a^{*} s^H \Sigma^{-1} x\right) - |a|^2 s^H \Sigma^{-1} s \qquad (70)

Again, the maximum likelihood estimator for a is obtained via differentiation:

\frac{\partial}{\partial a} \ln \Lambda(x) = 0 \implies 2\, s^H \Sigma^{-1} x = 2\, \hat{a}\, s^H \Sigma^{-1} s

\hat{a} = \frac{s^H \Sigma^{-1} x}{s^H \Sigma^{-1} s}

where â is the maximum likelihood estimate of a. Substituting this in for a in equation (70)


yields:

\ln \Lambda(x) = 2 \left( \frac{s^H \Sigma^{-1} x}{s^H \Sigma^{-1} s} \right)^{*} s^H \Sigma^{-1} x - \left| \frac{s^H \Sigma^{-1} x}{s^H \Sigma^{-1} s} \right|^2 s^H \Sigma^{-1} s

= 2\, \frac{\left|s^H \Sigma^{-1} x\right|^2}{s^H \Sigma^{-1} s} - \frac{\left|s^H \Sigma^{-1} x\right|^2}{s^H \Sigma^{-1} s}

\Lambda(x) = \frac{\left|s^H \Sigma^{-1} x\right|^2}{s^H \Sigma^{-1} s} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \eta \qquad (71)

This is the likelihood ratio test for the observed primary data vector when the true covariance

matrix Σ is known. Thus, Λ(x) is used as the sufficient statistic for the clairvoyant detector.

The distribution for Λ(x) is given in the following subsection.
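The clairvoyant statistic (71) is a whitened matched filter followed by a magnitude-squared. A small numeric sketch (pure Python, with an identity covariance and made-up steering and data vectors):

```python
import math

def clairvoyant_stat(x, s, sigma_inv):
    """Lambda(x) = |s^H Sigma^{-1} x|^2 / (s^H Sigma^{-1} s), equation (71)."""
    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
    def inner(a, b):   # a^H b
        return sum(ai.conjugate() * bi for ai, bi in zip(a, b))
    num = abs(inner(s, matvec(sigma_inv, x))) ** 2
    den = inner(s, matvec(sigma_inv, s)).real
    return num / den

# hypothetical 3-channel snapshot with white noise, Sigma = I (so Sigma^{-1} = I)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
s = [1, 1j, -1]                  # assumed steering vector
x = [0.5 + 1j, -0.2j, 0.1]       # observed primary data vector
stat = clairvoyant_stat(x, s, identity)
eta = -math.log(1e-4)            # threshold for PFA = 1e-4, via (75)
print(stat, stat > eta)
```

Here Λ(x) ≈ 0.347, well below the η ≈ 9.21 needed for PFA = 10−4, so H0 would be declared for this snapshot.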

5.1.2 PD and PFA of Clairvoyant Detector

Let y = \frac{s^H \Sigma^{-1} x}{\sqrt{s^H \Sigma^{-1} s}}. Clearly, the test statistic in (71) can be rewritten as Λ(x) = |y|². Furthermore, let w = \frac{\Sigma^{-1} s}{\sqrt{s^H \Sigma^{-1} s}}, which is clearly a deterministic N × 1 vector, so that y = w^H x. Thus y can be thought of as a linear combination of the elements of x, which follow a multivariate complex Gaussian distribution. It is well known that a linear combination of Gaussian random variables also produces a Gaussian random variable [16], and thus y must be Gaussian.

Although y is known to be Gaussian, its mean and variance depend on which hypothesis

is being considered. For H0, the mean and variance are found as follows:

E [y|H0] = E[wHx|H0

]= wH · E [x|H0] = 0

Var(y|H0) = E(|wHx|2 |H0)− E(wHx |H0)2

= E(wHxxHw |H0)− 0

= wHE(xxH |H0)w

57

= wHΣw

=sHΣ−1ΣΣ−1s

sHΣ−1s= 1.

Thus, y ∼ CN(0, 1) under hypothesis 0. Similarly:

E[y|H_1] = E\!\left[w^H x \middle| H_1\right] = w^H\, E[x|H_1] = a\, w^H s

\mathrm{Var}(y|H_1) = E\!\left[|w^H x|^2 \middle| H_1\right] - \left|E\!\left[w^H x \middle| H_1\right]\right|^2
= w^H\, E\!\left[x x^H \middle| H_1\right] w - |a|^2 w^H s s^H w
= w^H\, E\!\left[(as+n)(as+n)^H\right] w - |a|^2 w^H s s^H w
= w^H\!\left[|a|^2 s s^H + a\, s\, E[n^H] + a^{*} E[n]\, s^H + E[n n^H]\right] w - |a|^2 w^H s s^H w
= |a|^2 w^H s s^H w + w^H \Sigma w - |a|^2 w^H s s^H w = 1

since E[n] = 0 and E[nn^H] = Σ. Thus, y ∼ CN(a w^H s, 1) under hypothesis 1.

The w vector can be thought of as filter weights, x as the filter input, and y = w^H x as the filter output. Furthermore, now that the distribution of y under both hypotheses is known, the following can be written:

H_0 : y = n
H_1 : y = a\, w^H s + n


where n ∼ CN (0, 1). Using this consideration, the SNR of this output signal can be found

the same way as in Section 3.2.2:

\mathrm{SNR} = A = \frac{|a\, w^H s|^2}{E[|n|^2]} = \frac{|a|^2\, w^H s s^H w}{1} = |a|^2\, \frac{(s^H \Sigma^{-1} s)(s^H \Sigma^{-1} s)}{s^H \Sigma^{-1} s} = |a|^2\, s^H \Sigma^{-1} s \qquad (72)

Furthermore, |a w^H s| = √A, so (absorbing the phase into the mean) y ∼ CN(0, 1) under hypothesis 0 and y ∼ CN(√A, 1) under hypothesis 1. From the discussion in Section 3.2.2, Λ(x) = |y|² is exponentially distributed under hypothesis 0 and follows a non-central χ² distribution under hypothesis 1, as follows:

f_{\Lambda|H_0}(t) = e^{-t} \quad \text{for } t > 0 \qquad (73)

f_{\Lambda|H_1}(t) = e^{-(t+A)}\, I_0\!\left(2\sqrt{At}\right) \quad \text{for } t > 0 \qquad (74)

Thus, using equation (20), the PFA for the clairvoyant detector is as follows:

P_{FA} = \int_{\eta}^{\infty} e^{-t}\, dt = e^{-\eta} \qquad (75)

where η is the threshold term given in (71). For this detector, the PFA can be set to a

desired value by setting the threshold term to η = − ln(PFA). Using (22), PD can be found

as follows:

P_D = \int_{\eta}^{\infty} e^{-(t+A)}\, I_0\!\left(2\sqrt{At}\right) dt

= e^{-A} \int_{\eta}^{\infty} e^{-t} \sum_{m=0}^{\infty} \frac{(\sqrt{At})^{2m}}{(m!)^2}\, dt

= e^{-A} \sum_{m=0}^{\infty} \frac{A^m}{m!} \int_{\eta}^{\infty} \frac{t^m e^{-t}}{m!}\, dt

= e^{-A} \sum_{m=0}^{\infty} \frac{A^m}{m!}\, \Gamma(\eta, m+1) \qquad (76)

where \Gamma(\eta, m+1) = \int_{\eta}^{\infty} \frac{t^m e^{-t}}{m!}\, dt is the normalized upper incomplete gamma function. Note that both the PFA and PD for these detectors are independent of the dimensionality N.
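Equations (75) and (76) can be verified numerically. Writing the normalized incomplete gamma as Γ(η, m+1) = e^{−η} Σ_{j=0}^{m} η^j/j! lets the series be evaluated with simple recurrences; the Monte Carlo check draws y ∼ CN(√A, 1) directly (real and imaginary parts each with variance 1/2). Parameter choices are illustrative:

```python
import math
import random

def pd_series(A, eta, terms=100):
    """PD of the clairvoyant detector via the series (76), using
    Gamma(eta, m+1) = exp(-eta) * sum_{j=0}^{m} eta^j / j!."""
    gam = 0.0      # running sum_{j<=m} eta^j / j!
    pois = 1.0     # eta^m / m!
    a_term = 1.0   # A^m / m!
    total = 0.0
    for m in range(terms):
        gam += pois
        total += a_term * gam
        pois *= eta / (m + 1)
        a_term *= A / (m + 1)
    return math.exp(-A) * math.exp(-eta) * total

rng = random.Random(0)
A = 10.0                       # 10 dB clairvoyant SNR
eta = -math.log(1e-2)          # modest PFA so the Monte Carlo check is fast
hits, trials = 0, 200_000
for _ in range(trials):
    # y ~ CN(sqrt(A), 1): real and imaginary parts each have variance 1/2
    y_re = rng.gauss(math.sqrt(A), math.sqrt(0.5))
    y_im = rng.gauss(0.0, math.sqrt(0.5))
    hits += (y_re ** 2 + y_im ** 2) > eta
pd_mc = hits / trials
print(pd_series(A, eta), pd_mc)   # the two agree to Monte Carlo accuracy
```

As a sanity check, setting A = 0 collapses the series to e^{−η}, recovering the PFA of (75).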

If the Swerling I model is assumed, the random SNR fluctuation term distributed by (39) is applied, and the expectation is taken:

P_{D,\text{Swerling I}} = E_\gamma\!\left[e^{-A\gamma} \sum_{m=0}^{\infty} \frac{(A\gamma)^m}{m!}\, \Gamma(\eta, m+1)\right]

= \sum_{m=0}^{\infty} E_\gamma\!\left[e^{-A\gamma}\gamma^m\right] \frac{A^m}{m!} \int_{\eta}^{\infty} \frac{t^m e^{-t}}{m!}\, dt

= \sum_{m=0}^{\infty} \left[\int_{0}^{\infty} e^{-\gamma(A+1)} \gamma^m\, d\gamma\right] \frac{A^m}{m!} \int_{\eta}^{\infty} \frac{t^m e^{-t}}{m!}\, dt

= \sum_{m=0}^{\infty} \frac{m!}{(1+A)^{m+1}}\, \frac{A^m}{m!} \int_{\eta}^{\infty} \frac{t^m e^{-t}}{m!}\, dt

= \int_{\eta}^{\infty} \frac{e^{-t}}{1+A} \sum_{m=0}^{\infty} \frac{1}{m!} \left(\frac{At}{1+A}\right)^m dt

= \frac{1}{1+A} \int_{\eta}^{\infty} e^{-t}\, e^{tA/(1+A)}\, dt

= e^{-\eta/(1+A)} \qquad (77)

5.2 Adaptive Matched Filter

The AMF uses the likelihood ratio test given in (71) and replaces the true interference covariance matrix Σ with an estimated covariance matrix Σ̂:

\frac{\left|s^H \hat{\Sigma}^{-1} x\right|^2}{s^H \hat{\Sigma}^{-1} s} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \eta \qquad (78)


5.2.1 Sample Matrix Inversion

As discussed, Σ̂ can be obtained using different methods. This section focuses on sample covariance estimation, otherwise known as SMI. Using K secondary data samples represented as x(k) for k = {1, 2, . . . , K}, the estimated covariance matrix is as follows:

\hat{\Sigma} = \frac{1}{K} \sum_{k=1}^{K} x(k)\, x(k)^H \qquad (79)

where (·)^H represents the Hermitian transpose. Note that Σ̂ is known to follow the Wishart distribution when x(k) is a multivariate zero mean complex Gaussian variable [33].
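The SMI estimate (79) in code, for two made-up 2-channel snapshots; the estimate is Hermitian (and positive semidefinite) by construction:

```python
def smi_estimate(snapshots):
    """Sample covariance (79): (1/K) sum_k x(k) x(k)^H, for list-of-lists snapshots."""
    K, N = len(snapshots), len(snapshots[0])
    S = [[0j] * N for _ in range(N)]
    for x in snapshots:
        for i in range(N):
            for j in range(N):
                S[i][j] += x[i] * x[j].conjugate() / K
    return S

# two hypothetical 2-channel snapshots
X = [[1 + 1j, 0.5], [0.2j, -1.0]]
S = smi_estimate(X)
# Hermitian by construction: S[i][j] == conj(S[j][i])
print(S[0][1], S[1][0].conjugate())
```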

The distribution of the sufficient statistic given in (78) is needed to find PFA and PD via (22) and (20). However, using the form given in (78), this distribution is difficult to obtain. Reference [8] derives this distribution by rewriting the sufficient statistic in terms of rotation and whitening matrices. Through rewriting this hypothesis test, it is shown that the sufficient statistic is independent of the true covariance matrix Σ, implying that the AMF is indeed CFAR. Ultimately, the hypothesis test in (78) is simplified to the following:

\frac{|v|^2}{T} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \alpha\rho \qquad (80)

where α is the detection threshold, T ∼ χ²(K + 1 − N), ρ ∼ β(K + 2 − N, N − 1), and v ∼ CN(a√ρ √(s^H Σ^{-1} s), 1). Recall from equation (72) that the SNR of the clairvoyant detector is given as |a|² s^H Σ^{-1} s. Thus, v ∼ CN(√(ρA), 1), where A is the clairvoyant SNR. Note also that a = 0 for the H0 case. The full derivation of the simplification given in (80) is given in [8], and is provided in the Appendix for completeness.

The PDFs for T and ρ are given as follows:

fT(t) = t^(L−1) e^(−t) / (L − 1)!   for t > 0    (81)

fρ(ρ) = K! ρ^L (1 − ρ)^(N−2) / (L!(N − 2)!)   for 0 < ρ < 1    (82)

where L = K − N + 1.

Crucially, note that the hypothesis test for the N = 1 case (given in (32)) has a form

that is very similar to the hypothesis test given in (80). The two are compared below:

|y|² / T  ≷_{H0}^{H1}  α    (83)

|v|² / T  ≷_{H0}^{H1}  ρα    (84)

where y ∼ CN(√A, 1) and v ∼ CN(√(ρA), 1). Note also that T ∼ χ²(K) for the N = 1

case, and T ∼ χ²(L) for the multidimensional case. Thus, the decision statistic for the

multidimensional AMF can be formed by taking the decision statistic of the scalar detector,

applying the ρ term to A and α, and replacing K with L [8]. Since ρ is a beta random

variable, it randomly takes values between 0 and 1. Thus, ρ can be thought of as a random

"loss factor" being applied to the clairvoyant SNR A and the threshold α.

Thus, PFA and PD for the multidimensional case are found by using the PD and PFA

equations for the scalar case (for example, equations (38), (42) or (43) for PD, and equation

(36) for PFA). The K term is replaced with L, the random loss factor ρ is applied to the

α and A terms, and finally the expected value is taken according to the distribution (82).

Thus:

PD = ∫_0^1 PD,N=1(ρα, ρA, L) fρ(ρ) dρ    (85)

PFA = ∫_0^1 PFA,N=1(ρα, L) fρ(ρ) dρ    (86)

The equation for PFA for the N = 1 case is the same for all fluctuation models, and is

given in (36). Thus, the following is defined:

PFA,N=1(α, K) = (1 + α)^(−K)


Using this definition, the equation for PFA for the multidimensional case is as follows:

PFA = ∫_0^1 (1 + ρα)^(−L) [K! ρ^L (1 − ρ)^(N−2) / (L!(N − 2)!)] dρ

    = [K! / (L!(N − 2)!)] ∫_0^1 ρ^L (1 − ρ)^(N−2) / (1 + ρα)^L dρ    (87)

Unfortunately, this equation does not easily yield a closed form for α, so numerical tech-

niques are used to obtain this threshold variable. In this thesis, a given PFA is specified, and

α is computed using a line search method implemented in MATLAB.
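The same line search can be sketched in Python as an illustrative analogue of the MATLAB implementation (the bracket endpoints below are assumptions, not values from the thesis): evaluate (87) by quadrature and solve for α with a bracketed root finder.

```python
import math
from scipy.integrate import quad
from scipy.optimize import brentq

def pfa_amf_smi(alpha, N, K):
    """Evaluate P_FA from (87) by numerical integration over rho."""
    L = K - N + 1
    # log of the constant K! / (L! (N-2)!) via log-gamma, for numerical stability
    logc = math.lgamma(K + 1) - math.lgamma(L + 1) - math.lgamma(N - 1)
    f = lambda r: math.exp(logc + L * math.log(r)
                           + (N - 2) * math.log1p(-r) - L * math.log1p(r * alpha))
    return quad(f, 0.0, 1.0)[0]

def threshold_for_pfa(pfa_target, N, K, bracket=(1e-9, 1e3)):
    """Line search for the threshold alpha achieving a specified P_FA.
    The bracket endpoints are assumed wide enough to contain the root."""
    return brentq(lambda a: pfa_amf_smi(a, N, K) - pfa_target, *bracket)
```

Since PFA in (87) decreases monotonically in α from 1 toward 0, any bracketing root finder converges reliably here.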

Using α, PD can be found for the multidimensional case. The equation for PD for the

N = 1 case differs between fluctuation models. This thesis has so far presented PD,N=1

for the non-fluctuating case (equation (38)), the Swerling I case (equation (42)), and the

Swerling III case (equation (43)). Considering the equation for PD in the non-fluctuating

case, the following is defined:

PD,N=1(α, A, K) = e^(−A) ∑_{m=0}^∞ (A^m / m!) I_{1/(1+α)}(K, m + 1)

Using this definition, the equation for PD in the multidimensional case is given as follows:

PD = ∫_0^1 e^(−ρA) [∑_{m=0}^∞ ((ρA)^m / m!) I_{1/(1+ρα)}(L, m + 1)] [K! ρ^L (1 − ρ)^(N−2) / (L!(N − 2)!)] dρ    (88)

The exact PD values in (88) can be computed using numerical integration. The result of this

numerical integration is shown below in Figures 23, 24 and 25.
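Because T, ρ, and v in (80) all have known distributions ((81), (82), and the text above), the numerical integration can also be cross-checked by direct Monte Carlo simulation. A sketch, with illustrative parameter values:

```python
import numpy as np

def amf_pd_mc(snr_db, alpha, N, K, trials=200_000, seed=0):
    """Monte Carlo estimate of AMF P_D via the scalar-equivalent test (80):
    declare H1 when |v|^2 / T > alpha * rho."""
    rng = np.random.default_rng(seed)
    A = 10.0 ** (snr_db / 10.0)                 # clairvoyant SNR, linear scale
    L = K - N + 1
    rho = rng.beta(K + 2 - N, N - 1, trials)    # random loss factor, pdf (82)
    T = rng.gamma(L, 1.0, trials)               # pdf (81)
    # v ~ CN(sqrt(rho * A), 1): complex Gaussian with unit total variance
    v = np.sqrt(rho * A) + (rng.standard_normal(trials)
                            + 1j * rng.standard_normal(trials)) / np.sqrt(2.0)
    return np.mean(np.abs(v) ** 2 / T > alpha * rho)
```

Setting A = 0 (no target) in the same simulation recovers an empirical PFA, which is a useful consistency check against (87).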

Figure 23: PD vs SNR for PFA = 1 · 10−4 and N = 10, shown for K = {20, 30, 40}.

Figure 24: PD vs SNR for PFA = 1 · 10−4 and N = 30, shown for K = {60, 90, 120}.

Figure 25: PD vs SNR for PFA = 1 · 10−4 and N = 20, shown for K = {20, 40, 60}. Note that the degenerate N = K case is shown in blue.

Recall from (78) that the test statistic for the AMF uses the inverted matrix Σ̂^(−1). When

using sample covariance matrix estimation to estimate Σ, as in (79), K ≥ N samples

are required for Σ̂ to be invertible [34]. However, even when K = N, the matrix Σ̂ is nearly

singular, and the performance of the detector is extremely poor, as shown in Figure 25.

5.2.2 Rank Constrained Maximum Likelihood Estimation

As a general rule of thumb, the AMF requires at least K = 2N homogeneous secondary

samples in order for the expected SNR loss due to the random loss factor ρ to be below

approximately 3 dB [8]. Unfortunately, in many cases, such as when the dimensionality of

the radar is very large, it is unrealistic that K = N homogeneous samples exist, let alone

K = 2N. In these cases, sample covariance estimation produces very poor performance

within the AMF, due to the non-homogeneous training samples.

Thus, the reduction of the required number of training samples for covariance matrix

estimation has become an important topic of research. Certain methods have been proposed


which are able to produce invertible covariance matrix estimates using fewer than N samples,

such as fast maximum likelihood (FML) estimation [35] and rank constrained maximum

likelihood (RCML) estimation [36]. In this section, the RCML estimation method is outlined,

since it has been shown to greatly outperform other estimation methods on many different

metrics, including the detection performance of the AMF [30].

The RCML estimation method is proposed in [37], and assumes that the true disturbance

covariance matrix Σ has a specific structure, splitting the disturbance covariance matrix into

components of noise and clutter:

Σ = σ²I + Σc    (89)

where σ² is the noise power, I is the N × N identity matrix, and Σc is the clutter covariance

matrix. This estimation method also assumes that the clutter covariance matrix Σc is positive

semidefinite, is rank deficient (has a rank less than N), and has a known rank r.

The rank of the clutter matrix can be found via the Brennan rule [38] when certain

operating conditions are met (mainly in airborne radar scenarios [36]). This rule states the

following:

rank(Σc) = J + γ(P − 1) (90)

where J is the number of spatial array elements being used, P is the number of pulse-

repetition intervals being used, and γ is the slope of the clutter ridge. Note that the dimen-

sionality is the spatio-temporal product: N = JP .

The derivation of the RCML estimation method is given in [37],[36], and is briefly de-

scribed here. Similar to sample covariance estimation, the RCML estimate is obtained using

K training samples. The log likelihood of observing these K training samples, given some

covariance matrix Σ, is derived assuming complex Gaussian disturbance statistics. RCML

estimation then amounts to finding the Σ that maximizes this log likelihood. This maximization problem is simplified to minimizing the following convex function:

d^T λ − 1^T log λ    (91)

where d is an N-length vector containing the eigenvalues of (1/σ²)S in descending order, and λ is an

N-length vector containing the eigenvalues of σ²Σ^(−1) in ascending order. Note that S is defined

as the sample covariance matrix, given in (79). The λ vector that minimizes equation (91)

is found using convex optimization, i.e.:

λ* = argmin_λ (d^T λ − 1^T log λ)    (92)

Using λ*, the estimated covariance matrix Σ̂ is found as follows:

Σ̂ = σ² V Λ*^(−1) V^H    (93)

where V is the eigenvector matrix of S (from the eigendecomposition S = VDV^H), and Λ* is

a diagonal matrix containing the elements of λ*.

Note that equations (92) and (93) both require knowledge of the noise power σ². This can

typically be estimated by finding the thermal or "kTB" noise and applying the relevant noise

factor terms, or by collecting receiver data when the radar is in receive-only mode [39], [9].

Reference [36] also provides a method of obtaining the RCML estimate when σ² is unknown

but its lower bound is known.

Note also that [36] provides a closed form for λ*, which is obtained using convex

optimization techniques. This is given as follows:

λ*_i = min(1, 1/d_i)   for i = 1, 2, . . . , r

λ*_i = 1               for i = r + 1, r + 2, . . . , N    (94)


where λ*_i represents the ith element of λ*, and d_i represents the ith element of d.
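The closed form (94) together with (93) makes the RCML estimate straightforward to compute. A NumPy sketch (function and variable names are mine; the noise power σ² and clutter rank r are assumed known, as in the derivation above):

```python
import numpy as np

def rcml_estimate(X, sigma2, r):
    """RCML covariance estimate: eigendecompose the sample covariance (79),
    apply the closed form (94), and rebuild Sigma via (93).
    X is N x K (secondary samples as columns); sigma2 and r are assumed known."""
    N, K = X.shape
    S = (X @ X.conj().T) / K                    # sample covariance, eq. (79)
    d, V = np.linalg.eigh(S / sigma2)           # eigh returns eigenvalues ascending
    d, V = d[::-1], V[:, ::-1]                  # reorder descending, as in (91)
    lam = np.ones(N)
    lam[:r] = np.minimum(1.0, 1.0 / d[:r])      # eq. (94)
    # Sigma_hat = sigma^2 * V * diag(1/lam) * V^H, eq. (93)
    return sigma2 * (V * (1.0 / lam)) @ V.conj().T
```

A useful structural check: since every λ*_i ≤ 1, the eigenvalues of the estimate are σ²/λ*_i ≥ σ², i.e. the estimate never dips below the noise floor.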

The Σ̂ found in (93) can thereby be used as the covariance matrix estimate in equation

(78). Note that when RCML is used in the AMF, PD is dependent on the SNR and Σ [30].

Unlike the PD vs SNR relationship for the SMI case, PD does not easily permit a closed

form when RCML estimation is used. Thus, these results are obtained empirically. Using a

true covariance matrix that matches the structure given in equation (89), training samples

are generated for RCML estimation. RCML estimation is used on these training samples to

calculate the α required to achieve the desired PFA value. This threshold value α is then

used to calculate the detection probability at different SNR values. Note that the methods

used to find PD vs SNR in this thesis are similar to the methods used in [30]. These PD vs

SNR curves are shown for two true covariance matrices of different sizes in Figures

26 and 27.

Figure 26: PD vs SNR curves shown for N = 16, r = 7, K = {3, 4, . . . , 24}, and PFA = 10−4.

Figure 27: PD vs SNR curves shown for N = 24, r = 9, K = {3, 4, . . . , 24}, and PFA = 10−4.

5.2.3 Additional SNR required for clairvoyant performance

The PD vs SNR curves serve as a means of comparing the relative performance between

detectors. However, when PFA and SNR are held constant, it is well known that PD has a

theoretical upper bound [30], characterized by the PD of the clairvoyant detector given in (76).

In general, as K increases, the PD vs SNR curves approach that of the clairvoyant detector,

as shown in Figure 28 below:

Note that the clairvoyant detector is able to achieve any PD value at a lower SNR than

the AMF detectors can. In other words, some additional amount of SNR is required

for the AMF detectors to reach clairvoyant performance. This can also be thought of as an

"SNR loss", due to the fact that K training samples are used for AMF detection since the

covariance matrix is unknown. Note that this loss is shown to decrease as K increases.

Reference [40] analyzes this SNR loss for a different but similar multidimensional CFAR

detector known as the generalized likelihood ratio test (GLRT) detector. In this reference,

Kelly states that this SNR loss is partly due to the effective loss factor term ρ (described in

Section A.0.1), and partly due to other losses. Thus, note that the "SNR loss" described

Figure 28: PD vs SNR for PFA = 10−4 and K = {N, . . . , 200}, shown for N = 5 and N = 15. Note that darker curves represent PD values of larger K values.

in this section refers to the additional SNR required for the AMF to reach clairvoyant

performance, and not the ρ term described in Section A.0.1.

Note that this SNR loss term can be found using the formulations for PD of the clairvoyant

detector and AMF, given in (76) and (88) respectively. Using these equations, the PD is fixed

to some value, and the SNR that achieves this PD is obtained for both detectors. Finally,

the ratio between these SNRs yields the SNR loss (the SNR values are subtracted if units

are in decibels).
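For the Swerling I clairvoyant detector, the required SNR even has a closed form: inverting (77) with η = −ln(PFA) (assuming the clairvoyant relation PFA = e^(−η)) gives A = ln(PFA)/ln(PD) − 1. A short sketch of that inversion (the function name is mine):

```python
import numpy as np

def clairvoyant_snr_swerling1(pd, pfa):
    """Invert (77): with eta = -ln(PFA), solving PD = exp(-eta / (1 + A))
    for the linear SNR A gives A = ln(PFA) / ln(PD) - 1."""
    return np.log(pfa) / np.log(pd) - 1.0
```

The SNR loss is then the ratio of an AMF's required SNR to this clairvoyant value (or their difference when both are expressed in dB).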

Note, however, that equations (76) and (88) do not easily permit a closed form for the

SNR. Thus, line search methods (along with the numerical methods required to calculate

(76) and (88)) are used to solve for the SNR loss. These results are shown in Figure 29

below. Clearly, as K increases, the SNR loss approaches 0 dB. Furthermore, the rate at

which the SNR loss converges depends on the value of N.

When RCML estimation is used, a simple interpolation method is used to obtain the

SNR loss, since PD does not easily permit a closed form. The SNR loss is taken at PD = 0.6,

since the PD vs SNR curves appear to be linear in this region, as shown in Figures 26 and 27.

Linear interpolation is used in this region to approximate the SNR at which PD = 0.6 for


the RCML detector. The ratio between this SNR and the clairvoyant SNR is taken, yielding

the SNR loss. Figure 30 shows the SNR loss as a function of K for two RCML detectors.

Clearly, the SNR loss is dependent on what N and r values are being used.

Figure 29: SNR loss for AMFs of different N values (N = 5, 10, 15, 20, 25, 30), shown as a function of K.

Figure 30: SNR loss for RCML AMFs of different N and r values, shown as a function of K.


5.3 Information Elasticity Framework for the AMF

To summarize the previous analysis, it is clear that as the number of secondary samples,

K, increases, the PD of a detector at a given SNR improves up to an asymptotic limit.

The convergence of performance towards this limit can be characterized by the SNR loss, as

shown in Figure 29. However, all of the analysis used to find the SNR loss is based on the

assumption that the K secondary samples are all independent and identically distributed as

disturbance only (thus, homogeneous).

As discussed in Chapter 4, the radar data used as secondary samples are often non-

homogeneous, due to the interference environment and other system factors. Furthermore,

the detection performance has been shown to degrade significantly as a result of using non-homogeneous training data [21]. Thus, a DM's selection of K is often constrained by how many homogeneous training samples are available in a given environment.

Thus, the choice of K should not solely be based on reducing the SNR loss, but it should

also be based on avoiding the use of non-homogeneous training samples. In general, the

likelihood that the secondary samples contain non-homogeneities increases as K increases [9,

31]. Thus, a trade-off behavior is observed between these two factors. Using the information

elasticity framework, a DM can weigh the cost and benefit of using different K.

Note that the DM also selects a desired PFA value for the AMF, which in turn determines

what the threshold α is set to. Thus, this desired PFA value also affects the PD and the

SNR loss, by virtue of α. In general, as PFA increases, the SNR loss decreases. However,

increasing the PFA is also more costly for a DM, since it increases the number of false alarms.

Thus, the selection of the desired PFA also exhibits a trade-off behavior.

In this particular application, the information quantity parameter is represented by a

vector of length 2: Q = {PFA, K}. K clearly represents the quantity of training data samples

being used to estimate the disturbance covariance matrix. PFA, on the other hand, represents a

probability rather than a quantitative value. However, the desired PFA is directly related to


the quantity of false alarms per unit time [9]. The decision quality metric is defined to be the

SNR loss, since it characterizes the relative performance of the detector being considered.

Furthermore, the SNR loss is a function of both PFA and K, and is thus represented as

D(Q).

The constraint function C(Q), on the other hand, is defined to be a function describing

the relative cost of using a particular decision Q. As discussed, a larger K is associated

with a higher likelihood of non-homogeneities, and a larger PFA is associated with more

false alarms. Thus, C(Q) is defined such that an increase in either parameter produces

an increase in C(Q). Furthermore, these constraining factors are highly dependent on the

context, environmental factors, and the preference of the DM. Thus, C(Q) is defined with

user-tunable parameters, allowing a DM to define the relative costs of Q to fit his/her given

application.

Generally, a DM does not consider every possible decision Q. For example, K = 10^5 is

not a reasonable choice, since it is highly unlikely that 10^5 homogeneous training samples

are available in any realistic environment. Similarly, K = N is not a reasonable choice, since

the SNR loss is unreasonably large in this case. The same logic applies to selecting PFA. For

example, a DM would most likely not select PFA = 0.5, since this would produce far too

many false alarms. Similarly, a DM would not select PFA = 0, since PD = 0 when the PFA

is this low. Given these factors, a DM may choose to select lower and upper bounds of PFA

and K as follows:

a ≤ PFA ≤ b (95)

c ≤ K ≤ d (96)

Note that the selection of these bounds depends on the context and preferences of the DM.

Given decisions in this domain, an approximation for the SNR loss (D(Q)) when SMI is used

is derived in Section 5.3.1. Furthermore, the user-tunable cost function of these decisions

(C(Q)) is defined in Section 5.3.2.


5.3.1 Approximation for SNR loss

The numerical methods used to calculate the SNR loss in Figure 29 are computationally

expensive, especially when the SNR loss of many different decisions must be calculated.

Thus, a closed-form approximation for this SNR loss is derived, using

the PD for the Swerling I case. Note that this fluctuation model is considered here because

its equation for PD is much simpler than that of the non-fluctuating case.

From (42) and (85), the AMF detection probability under the Swerling I fluctuation

model is given as follows:

PD = ∫_0^1 [(1 + Aρ) / (1 + (α + A)ρ)]^L · [K! ρ^L (1 − ρ)^(N−2) / (L!(N − 2)!)] dρ    (97)

Furthermore, the clairvoyant detection probability under the Swerling I fluctuation model is

given in (77). Consider the PD conditioned on the random loss factor ρ:

PD|ρ = [(1 + Aρ) / (1 + (α + A)ρ)]^L = [(1/ρ + A) / (1/ρ + α + A)]^L    (98)

Since the integral in (97) cannot easily be evaluated without using numerical tools, an

approximation is used instead. This approximation is obtained by considering the fact that

the variance of 1/ρ is negligible compared to other terms in (98). This behavior is verified

using Monte Carlo simulations, but is also explained below.

Note that the distribution of ρ is given in (82). Thus, the variance of 1/ρ is derived as

follows:

Var(1/ρ) = E(1/ρ²) − E(1/ρ)²

E(1/ρ²) = ∫_0^1 (1/ρ²) · K! ρ^L (1 − ρ)^(N−2) / (L!(N − 2)!) dρ

        = [K(K − 1) / (L(L − 1))] ∫_0^1 (K − 2)! ρ^(L−2) (1 − ρ)^(N−2) / ((L − 2)!(N − 2)!) dρ

        = K(K − 1) / (L(L − 1))        (the remaining integral equals 1)

E(1/ρ) = ∫_0^1 (1/ρ) · K! ρ^L (1 − ρ)^(N−2) / (L!(N − 2)!) dρ

       = (K/L) ∫_0^1 (K − 1)! ρ^(L−1) (1 − ρ)^(N−2) / ((L − 1)!(N − 2)!) dρ

       = K/L    (99)

Var(1/ρ) = K(N − 1) / (L²(L − 1)) = m(N − 1) / [(m − 1)(N(m − 1) + 1)²]    (100)

where m is defined to be the ratio m = K/N. Since K > N is necessary for the estimate in

(79) to be non-singular, m must be greater than 1. Note that PD|ρ ≈ 0 when the SNR is

low, regardless of what value of m is used. Consider the cases when m is near 1, and when

m >> 1.

When m is near 1, we can consider that K is close in value to N. When this is the case,

the PDF of ρ places more probability at lower ρ values. This causes α to become large, due to

the nature of (87). Intuitively, this is because the threshold term α is affected by the random

loss factor: since ρ is more likely to be small, the threshold α must be selected to

be larger to account for this random loss. Since α is large, it is generally true that

α >> (1/ρ + A) at low SNR when m is near 1. Thus, PD|ρ ≈ 0. However, as the SNR

increases, α >> (1/ρ + A) will no longer be true, and PD|ρ is no longer ≈ 0. Since α is relatively large in

this case, the SNR at which this occurs is also relatively large. Furthermore, A >> Var(1/ρ)

when the SNR is large enough such that PD|ρ is no longer ≈ 0.


Consider also the case when m >> 1, i.e., K >> N. Furthermore, since L = K − N + 1,

we can also consider L >> 1 in this case. Due to the size of L,

PD|ρ = [(1/ρ + A) / (1/ρ + α + A)]^L will be approximately 0 unless the ratio

(1/ρ + A) / (1/ρ + α + A) is very close to 1. When the SNR is low, this is not the case,

i.e. PD|ρ ≈ 0 for low SNR. However, as the SNR increases, eventually

PD|ρ is no longer ≈ 0. Furthermore, when the SNR increases, it is generally true that A >> Var(1/ρ),

since Var(1/ρ) is extremely small when m >> 1, as shown in equation (100).

To summarize the statements made above, either the SNR is low enough such that

PD|ρ ≈ 0, or the SNR is large enough such that A >> Var(1/ρ). Thus, if the variance of 1/ρ

can be treated as negligible, we can approximate PD by replacing the 1/ρ term in (98)

with its expectation, which is given in (99). Thus:

PD = Eρ(PD|ρ) ≈ [(E[1/ρ] + A) / (E[1/ρ] + α + A)]^L

PD ≈ [(K/L + A) / (K/L + α + A)]^L    (101)

This approximation shows very close agreement with the PD given in equation (97) calculated

using numerical integration. This close agreement holds for many different PFA, K and N

values, as shown in Figures 31, 32, and 33. Using equations (77) and (101), the SNR loss for

the Swerling I case can be approximated to be:

SNR loss ≈ [α · PD^(1/L) / (1 − PD^(1/L)) − K/L] · log(PD) / log(PFA/PD)    (102)

Although this approximation was derived using the Swerling I fluctuation model, this ap-

proximation also fits the SNR loss for the nonfluctuating case very closely as well, as shown

in Figure 34.
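The agreement between (97) and (101) can be checked directly with a few lines of SciPy, since ρ with the pdf in (82) is a Beta(K + 2 − N, N − 1) random variable (the parameter values in the usage check are illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

def pd_swerling1_exact(A, alpha, N, K):
    """Numerically integrate (97); rho has the pdf in (82), i.e. Beta(K+2-N, N-1)."""
    L = K - N + 1
    f = lambda r: ((1 + A * r) / (1 + (alpha + A) * r)) ** L \
        * beta.pdf(r, K + 2 - N, N - 1)
    return quad(f, 0.0, 1.0)[0]

def pd_swerling1_approx(A, alpha, N, K):
    """Closed-form approximation (101): replace 1/rho with its mean K/L."""
    L = K - N + 1
    u = K / L
    return ((u + A) / (u + alpha + A)) ** L
```

For example, with N = 50, K = 100, α = 0.3, and A corresponding to 13 dB, the two functions agree to within a few hundredths in PD, consistent with the close agreement reported above.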

Figure 31: Comparison of PD obtained using numerical methods as in (97) and the approximation in (101). PD is shown for N = 5, PFA = {10−4, 10−5, 10−6}, and K = {5, 10, . . . , 50}.

Figure 32: Comparison of PD obtained using numerical methods as in (97) and the approximation in (101). PD is shown for N = 50, PFA = {10−4, 10−5, 10−6}, and K = {50, 51, . . . , 100}.

Figure 33: Comparison of PD obtained using numerical methods as in (97) and the approximation in (101). PD is shown for N = 500, PFA = {10−4, 10−5, 10−6}, and K = {500, 510, . . . , 600}.

Figure 34: SNR loss as a function of K. Calculations from the numerical methods in (97) and the approximation in (102) are displayed together, showing close agreement.

5.3.2 User-defined constraint function

As discussed, the cost function should increase when either K or PFA is increased. Thus,

consider the following functions:

CPFA(PFA) = [(PFA − a) / (b − a)]^n   for a ≤ PFA ≤ b    (103)

CK(K) = [(K − c) / (d − c)]^m   for c ≤ K ≤ d    (104)

Equation (103) represents the component of the total cost associated with

using a given PFA, and (104) represents the component of the total cost associated with

using a given K. CPFA(PFA) is defined to increase from 0 to 1, such that CPFA(a) = 0 and

CPFA(b) = 1. Similarly, CK(K) is defined to increase from 0 to 1, such that CK(c) = 0 and

CK(d) = 1. Finally, the n and m parameters allow the DM to change the rate at which these

functions increase.

These functions are defined as such since they are meant to characterize relative cost.

Thus, a scale is defined over which decisions can be compared, where 1 is defined to be the

maximum cost and 0 is defined to be the minimum cost. With this in mind, the relative cost

of a decision Q is defined as follows:

C(Q) = λ1CPFA(PFA) + λ2CK(K) (105)

for a ≤ PFA ≤ b and c ≤ K ≤ d

s.t. λ1 + λ2 = 1

λ1 and λ2 are defined as weights characterizing the relative importance of CPFA and CK

respectively. Furthermore, λ1 + λ2 = 1 so that the maximum value of C(Q) is still 1. C(Q)

can thus be thought of as a linear combination of the individual cost components CPFA and

CK . Furthermore, C(Q) increases from 0 to 1 as the parameters in Q increase over the

domains defined in (96) and (95), as desired.
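The pieces (103)–(105) combine into a small helper; a sketch (the function name and default weights are mine), with the Figure 35 parameters used in the endpoint check below:

```python
def relative_cost(pfa, k, a, b, c, d, n=1, m=1, lam1=0.5, lam2=0.5):
    """Relative cost C(Q) of a decision Q = {PFA, K}, per (103)-(105)."""
    assert abs(lam1 + lam2 - 1.0) < 1e-12       # weights must sum to 1
    c_pfa = ((pfa - a) / (b - a)) ** n          # eq. (103): 0 at PFA = a, 1 at PFA = b
    c_k = ((k - c) / (d - c)) ** m              # eq. (104): 0 at K = c, 1 at K = d
    return lam1 * c_pfa + lam2 * c_k            # eq. (105)
```

With the Figure 35 parameters (a = 1e-4, b = 1e-6, c = 40, d = 60), the cost is 0 at the decision {a, c} and 1 at {b, d}, matching the endpoint behavior described above.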

5.3.3 AMF decision effectiveness

As with the previous examples of information elasticity, the decision effectiveness E for this

application is dependent on the decision quality metric D(Q) and the constraint function

C(Q). Since D(Q) and C(Q) are conflicting criteria which the DM wishes to minimize, multi-

objective optimization techniques are used once again to define the decision effectiveness.

For comparison purposes, two constraint functions are shown in Figures 35 and 36, and are

labelled C1(Q) and C2(Q) respectively. Furthermore, a decision quality metric is shown in

Figure 37, labelled D(Q). For the sake of clarity, let DM 1 be a decision maker who defines

C1(Q) as their constraint function and D(Q) as their decision quality metric. Similarly, let

DM 2 be a second decision maker who uses C2(Q) as their constraint function and D(Q) as

their decision quality metric.

Figure 35: Constraint function C1(Q) for λ1 = λ2 = 0.5, n = m = 1, a = 10−4, b = 10−6, c = 40, and d = 60.

Figure 36: Constraint function C2(Q) for λ1 = 1/3, λ2 = 2/3, n = 2, m = 4, a = 10−4, b = 10−6, c = 40, and d = 60.

Figure 37: Decision metric D(Q) for the AMF using SMI and N = 20. Domain parameters are a = 10−4, b = 10−6, c = 40, and d = 60. Note that the PFA and K axes are inverted from the axes in Figures 35 and 36.

Note that each of these functions is defined over the same domain, given by a = 10−4,

b = 10−6, c = 40, and d = 60. Given these functions, each decision in this space has an associated cost

and an associated decision quality metric, forming the criterion space. The criterion space for

DM 1 is shown in Figure 38, while the criterion space for DM 2 is shown in Figure 39. Each

of these figures displays a clearly defined Pareto front within the criterion space. Note that

only 100 equally spaced PFA values per K value are represented in these figures. However,

a higher resolution for PFA may be used if necessary. Furthermore, in these figures, points

of a shared color represent the criteria of a shared K value. Thus, the Pareto front is clearly

made up of decisions containing different K and PFA values.

These Pareto fronts are shown in Figure 40. Note that each Pareto front shares the

nadir and utopia points in the criterion space. Since D(Q) decreases when either K or PFA

increases, D(Q) reaches a minimum when PFA = b and K = d. Note that this is also the

same as the point of maximum cost, or QMax. Furthermore, C(Q) is defined such that it

Figure 38: C1(Q) and D(Q) for different decision points for N = 20. Note that points of a shared color represent data of a shared K value.

Figure 39: C2(Q) and D(Q) for different decision points for N = 20. Note that points of a shared color represent data of a shared K value.

Figure 40: Pareto fronts for C1(Q) and D(Q), as well as C2(Q) and D(Q), labeled Pareto front #1 and Pareto front #2 respectively. Both fronts share the utopia point {0, 1.687} and the nadir point {1, 3.0297}.

always has a minimum value of 0. Thus, the utopia point is always given as:

F0 = {0, D(QMax)}

Similarly, D(Q) reaches a maximum when PFA = a and K = c, which is the point of

minimum cost, or QMin. Furthermore, C(Q) is defined such that it always has a maximum

value of 1. Thus, within this framework, the nadir point is always given as

FN = {1, D(QMin)}

Just as in Section 4.2, the normalized and weighted L2 distance to the Pareto front is

used to characterize the decision effectiveness E. This function is given as follows:

E = ( w1 [C(Q)]² + w2 [(D(Q) − D(QMax)) / (D(QMin) − D(QMax))]² )^(1/2)    (106)

where w1 and w2 represent the weights on C(Q) and D(Q) respectively. The decision


effectiveness as a function of relative cost for both DM 1 and DM 2 is shown in Figures 41

and 42 respectively. Note that these figures also portray E using different weighting values

w1 and w2. The overload decisions as well as their associated SNR loss and relative cost

values are provided in Tables 2 and 3.
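Equation (106) itself is a one-line computation; a sketch (the function name is mine), using the utopia/nadir values from Figure 40 in the endpoint check:

```python
import math

def effectiveness(c_q, d_q, d_qmin, d_qmax, w1, w2):
    """Normalized, weighted L2 distance of (106); d_qmin and d_qmax are the
    values of D(Q) at the minimum-cost (QMin) and maximum-cost (QMax) decisions."""
    d_norm = (d_q - d_qmax) / (d_qmin - d_qmax)     # normalize D(Q) to [0, 1]
    return math.sqrt(w1 * c_q ** 2 + w2 * d_norm ** 2)
```

At the utopia point (cost 0, D(Q) = D(QMax)) this evaluates to 0, and at the nadir point (cost 1, D(Q) = D(QMin)) it evaluates to 1 whenever w1 + w2 = 1, so minimizing E pulls the decision toward the utopia point.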

Table 2: Decisions at which E is minimized for DM 1.

            SNR loss     Cost      K    PFA         α
w1 < w2     2.8037 dB    0.3952    55   5 · 10−6    0.6458
w1 > w2     3.3155 dB    0.2452    49   5 · 10−6    0.87167

Table 3: Decisions at which E is minimized for DM 2.

            SNR loss     Cost       K    PFA           α
w1 < w2     2.7336 dB    0.18306    54   2.7 · 10−5    0.56325
w1 > w2     2.9113 dB    0.098678   52   2 · 10−5      0.64004

Figure 41: Decision effectiveness E of Pareto efficient decisions shown as a function of their cost C1(Q), for weightings w1 = 0.3, w2 = 0.7 and w1 = 0.7, w2 = 0.3.

Figure 42: Decision effectiveness E of Pareto efficient decisions shown as a function of their cost C2(Q), for weightings w1 = 0.3, w2 = 0.7 and w1 = 0.7, w2 = 0.3.

Note that when w2 is increased, the overload decision's SNR loss decreases. On the

other hand, increasing w1 causes the cost to decrease. This behavior demonstrates that

these weights allow the DM to emphasize the importance of one criterion over the other.

Note also that the overload solutions for DM 2 have a lower SNR loss than the overload

solutions for DM 1. The reason for this can be seen in Figures 35 and 36. In Figure 37, the

region associated with low SNR loss is the region where PFA and K are near 10−4 and 60

respectively. C2(Q) exhibits a larger gradient than C1(Q) in this region, since n = 2 and

m = 4 for DM 2, while n = m = 1 for DM 1. Thus, the decisions associated with low SNR

loss are generally cheaper for DM 2 than for DM 1, which is also exhibited in the criterion

space, shown in Figures 38 and 39.

More constraint functions are defined in Table 4. Table 5, on the other hand, shows the

specifications of the different AMF detectors that are used in the different decision quality metrics.

Note that two of these detectors use SMI while the other two use RCML. The decision effectiveness E

is obtained using different combinations of the cost functions in Table 4 and the decision metrics

Table 4: Constraint function parameters.

         λ1     λ2     n     m
Ca(Q)    1/4    3/4    4     2
Cb(Q)    3/4    1/4    4     2
Cc(Q)    1/4    3/4    5     10
Cd(Q)    3/4    1/4    5     10
Ce(Q)    1/4    3/4    15    15
Cf(Q)    3/4    1/4    15    15

Table 5: Specification for decision metrics.

         Estimation Type    N     r      a       b       c     d
Da(Q)    SMI                16    N/A    10−4    10−6    30    40
Db(Q)    RCML               16    7      10−4    10−6    10    24
Dc(Q)    SMI                24    N/A    10−4    10−6    50    60
Dd(Q)    RCML               24    9      10−4    10−6    15    24

in Table 5. The overload solutions for these different combinations are given in Table 6. A

comparison of the associated SNR loss and relative cost values of these overload solutions

is also included in this table.

Clearly, from Table 5, the AMF detectors using RCML use a much smaller domain for K

than the AMF detectors using SMI. In particular, Db(Q) and Dd(Q) only consider K values

up to 24, while Da(Q) considers K values up to 40 and Dc(Q) considers K values up to 60.

The domains are defined as such in order to highlight the fact that RCML estimation is able

to perform well in contexts where the amount of usable training data is scarce.

For example, note that both Da(Q) and Db(Q) are defined for detectors with a dimen-

sionality of 16, but the former considers an SMI detector and the latter considers an RCML

detector. Given the same constraint function Cf(Q), the RCML detector has an overload

solution of K = 22, while the SMI detector has an overload solution of K = 39. Although

the SMI detector selects a much higher value for K, the SNR loss of the RCML detector's

overload solution is still significantly lower.

Note, however, that there is only one case where the overload solution for an RCML detector does not outperform that of an SMI detector. This is when the


Table 6: Decisions at which E is minimized for different constraint functions and decision metrics. Note that w1 = w2 for each decision.

Decision Metric   Constraint   SNR loss    Cost     K    PFA
Da(Q)             Ca(Q)        3.8877 dB   0.2013   35   4.9e-5
                  Cb(Q)        3.6149 dB   0.1329   37   3.5e-5
                  Cc(Q)        3.6617 dB   0.0687   36   7.3e-5
                  Cd(Q)        3.4017 dB   0.0855   38   5.9e-5
                  Ce(Q)        3.3645 dB   0.0322   38   7.8e-5
                  Cf(Q)        3.2565 dB   0.0532   39   6.7e-5
Db(Q)             Ca(Q)        3.0309 dB   0.1929   17   3.9e-5
                  Cb(Q)        2.3051 dB   0.1294   20   2.3e-5
                  Cc(Q)        2.0572 dB   0.0432   20   5.9e-5
                  Cd(Q)        1.8198 dB   0.0545   22   2.7e-5
                  Ce(Q)        1.7214 dB   0.0286   21   8.0e-5
                  Cf(Q)        1.6339 dB   0.0343   22   7.5e-5
Dc(Q)             Ca(Q)        3.2503 dB   0.2080   55   5.4e-5
                  Cb(Q)        3.0432 dB   0.1704   58   3.5e-5
                  Cc(Q)        3.0592 dB   0.1328   57   7.0e-5
                  Cd(Q)        2.9917 dB   0.0889   58   6.3e-5
                  Ce(Q)        2.9697 dB   0.0366   58   8.1e-5
                  Cf(Q)        2.9096 dB   0.0548   59   7.0e-5
Dd(Q)             Ca(Q)        3.4998 dB   0.1819   19   6.1e-5
                  Cb(Q)        2.9632 dB   0.1404   21   4.5e-5
                  Cc(Q)        2.7906 dB   0.0728   22   5.5e-5
                  Cd(Q)        2.5754 dB   0.0841   23   4.0e-5
                  Ce(Q)        2.7107 dB   0.0386   22   8.5e-5
                  Cf(Q)        2.4782 dB   0.0438   23   6.5e-5

constraint function Ca(Q) is used for the decision metrics Dc(Q) and Dd(Q); as Table 6 shows, the overload solution of Dd(Q) has a larger SNR loss than that of Dc(Q) in this case. For any other constraint function, the overload solution for the RCML detector outperforms that of the SMI detector. This is because Ca(Q), in particular, assigns a very high cost to decisions where the SNR loss is low, causing the overload solution to select values of K that produce a large SNR loss. Furthermore, Dd(Q) is defined to represent a scenario with limited training data, whereas Dc(Q) considers a scenario without this limitation. Even though the RCML detector in Dd(Q) faces this limitation, it still greatly outperforms the SMI detector in Dc(Q) when constraint functions other than Ca(Q) are used.
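The overload solutions reported in Table 6 come from a compromise-programming step: each candidate decision (K, PFA) is scored by a weighted distance between its (SNR loss, cost) pair and the ideal point, and the minimizer is selected. The sketch below illustrates this step with equal weights w1 = w2 and purely hypothetical criterion curves; the min-max normalization and L2 distance are assumptions for illustration, not necessarily the thesis's exact formulation of E.

```python
import numpy as np

def overload_solution(snr_loss, cost, w1=0.5, w2=0.5):
    """Pick the decision minimizing the weighted distance to the ideal point.

    snr_loss, cost: 1-D arrays of criterion values, one entry per candidate
    decision. Both criteria are min-max normalized so the weighted L2
    distance to the ideal point (0, 0) is scale-free.
    """
    f1 = (snr_loss - snr_loss.min()) / (snr_loss.max() - snr_loss.min())
    f2 = (cost - cost.min()) / (cost.max() - cost.min())
    E = np.sqrt(w1 * f1**2 + w2 * f2**2)  # compromise metric
    return int(np.argmin(E))

# Hypothetical candidate decisions: SNR loss falls with K, cost rises with K.
K = np.arange(20, 41)
snr_loss = 10.0 / np.sqrt(K - 15)   # illustrative trend only
cost = 0.01 * (K - 19)              # illustrative trend only
best = overload_solution(snr_loss, cost)
print("overload solution: K =", K[best])
```

With these illustrative curves the minimizer falls strictly inside the candidate range, reflecting the trade-off between an SNR loss that decreases with K and a cost that increases with K.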

Chapter 6

Conclusion

This thesis presented a framework for making decisions in different applications pertaining to CFAR detection. The framework is based on the concept of information elasticity and seeks to characterize the usability properties of the different data and processes that provide information/knowledge to a DM. This usability, or decision effectiveness, is characterized as directly related to factors that either improve performance (decision quality metrics) or hinder performance (constraint functions). By observing how changing the quantity of information affects these conflicting factors, a point of maximum decision effectiveness is found and information overload is exhibited.

This framework is applied to an OS-CFAR detector that uses the FAOSOSD algorithm to estimate the number of interfering targets present. The accuracy of these estimates at different SNR levels is analyzed using Monte Carlo simulations. A performance function is defined by the DM to characterize the relative level of performance of a given decision. Given the estimated number of interfering targets found via FAOSOSD, the performance function of many different decisions is evaluated at values of SNR that are relevant to the DM. The sample mean and variance of the performance function over these SNR values are then computed. In this decision-making scheme, the DM seeks a decision that produces a high average performance, which represents absolute performance, and a low variance on performance,


which represents the robustness of a decision. A trade-off between this mean and variance is exhibited, and compromise programming is used to find the overload solution.

This framework is also applied to the AMF. A domain of usable decisions over K and the desired PFA is defined by the DM. A user-defined function is proposed that allows the DM to specify the relative cost of using these decisions. The additional SNR required for the AMF to perform as well as the clairvoyant detector is also found for each usable decision. By observing these decisions in the criterion space, a Pareto front is obtained, and compromise programming is again used to find the overload solution. This analysis is applied to the AMF when either SMI or RCML is used for covariance matrix estimation. It is shown that an RCML detector with the same dimensionality as an SMI detector obtains an overload solution with a much lower SNR loss, even when the DM specifies the RCML detector to be in a much more data-starved scenario.

Appendix

Derivation of the AMF

Note that it is possible for the observation vector and the steering vector not to point along the same direction. When this occurs, there is a loss in detection performance [8], [17]. This phenomenon, known as signal mismatch, is analyzed in [8]; however, signal mismatch is beyond the scope of this thesis and its effects are not considered here.

Equation (78) has two random variables: the covariance estimate Σ̂ and the primary data vector x. The sufficient statistic can be rewritten by incorporating a "rotation matrix" U as well as a "whitening matrix" Σ^{-1/2}.

Consider the rotated and whitened terms:

Whitening:

u = Σ^{-1/2} s
y = Σ^{-1/2} x
Σ̄ = Σ^{-1/2} Σ̂ Σ^{-1/2}  =⇒  Σ̄^{-1} = Σ^{1/2} Σ̂^{-1} Σ^{1/2}

Rotation:

d e = U^H u = U^H Σ^{-1/2} s
z = U^H y = U^H Σ^{-1/2} x
C = U^H Σ̄ U = U^H Σ^{-1/2} Σ̂ Σ^{-1/2} U

where Σ̂ denotes the sample covariance estimate of (79), Σ is the true covariance, and d is a scalar with |d| = √(s^H Σ^{-1} s).


Note that U is selected such that the primary data vector and steering vector are "rotated" into the direction of the first elementary vector e = [1 0 · · · 0]^T, and such that U is unitary [8] (i.e., U^H U = U U^H = I). Replacing Σ̂ with the expression in (79) yields:

C = U^H Σ^{-1/2} ( (1/K) ∑_{k=1}^{K} x(k) x(k)^H ) Σ^{-1/2} U

  = (1/K) ∑_{k=1}^{K} U^H Σ^{-1/2} x(k) x(k)^H Σ^{-1/2} U

  = (1/K) ∑_{k=1}^{K} z(k) z(k)^H

  = (1/K) S,    where  S = ∑_{k=1}^{K} z(k) z(k)^H        (107)

Note that because U is unitary, UHU = I. Substituting all of these whitened and rotated

terms into the sufficient statistic (78) yields:

Λ = |s^H Σ̂^{-1} x|² / (s^H Σ̂^{-1} s)

  = |s^H Σ^{-1/2} U U^H Σ^{1/2} Σ̂^{-1} Σ^{1/2} U U^H Σ^{-1/2} x|² / (s^H Σ^{-1/2} U U^H Σ^{1/2} Σ̂^{-1} Σ^{1/2} U U^H Σ^{-1/2} s)

  = |(d e)^H C^{-1} z|² / ((d e)^H C^{-1} (d e))

  = |d|² |e^H C^{-1} z|² / (|d|² e^H C^{-1} e)

  = |e^H C^{-1} z|² / (e^H C^{-1} e)

  = |e^H (K S^{-1}) z|² / (e^H (K S^{-1}) e)

  = K |e^H S^{-1} z|² / (e^H S^{-1} e)

Λ = |e^H S^{-1} z|² / (e^H S^{-1} e)  ≷  α        (108)

where H1 is decided when Λ exceeds the threshold constant α (and H0 otherwise), and where, in the last line, the constant K in front has been absorbed into α.
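The chain of equalities leading to (108) can be checked numerically. The sketch below (an illustration, not part of the thesis) builds a random true covariance, a sample covariance as in (79), and a unitary U whose first column is aligned with the whitened steering vector u via a QR factorization; it then confirms that the raw statistic (78) equals K times the whitened/rotated form in (108).

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 4, 16

# Random Hermitian positive-definite "true" covariance and vectors.
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
Sigma = A @ A.conj().T + N * np.eye(N)
s = rng.normal(size=N) + 1j * rng.normal(size=N)            # steering vector
x = rng.normal(size=N) + 1j * rng.normal(size=N)            # primary data
X = rng.normal(size=(N, K)) + 1j * rng.normal(size=(N, K))  # secondary data
Sigma_hat = (X @ X.conj().T) / K                            # sample covariance, as in (79)

# Lambda computed directly from the sufficient statistic (78).
Si = np.linalg.inv(Sigma_hat)
lam_direct = abs(s.conj() @ Si @ x) ** 2 / (s.conj() @ Si @ s).real

# Whitening: Sigma^{-1/2} via eigendecomposition.
w, V = np.linalg.eigh(Sigma)
W = V @ np.diag(w ** -0.5) @ V.conj().T
u, y = W @ s, W @ x

# Rotation: unitary U whose first column spans u, so U^H u = d e.
M = np.column_stack([u, rng.normal(size=(N, N - 1)) + 1j * rng.normal(size=(N, N - 1))])
U, _ = np.linalg.qr(M)
z = U.conj().T @ y
S = U.conj().T @ W @ (X @ X.conj().T) @ W @ U               # S = K * C, as in (107)

Sinv = np.linalg.inv(S)
e0 = np.zeros(N, dtype=complex); e0[0] = 1.0
lam_rot = K * abs(e0 @ Sinv @ z) ** 2 / (e0 @ Sinv @ e0).real

print(np.isclose(lam_direct, lam_rot))  # the two forms agree
```

Any unitary U whose first column is proportional to u works here; the QR construction is simply one convenient choice.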


Note that the mean of z is:

E[z] = E[U^H Σ^{-1/2} x] = U^H Σ^{-1/2} E[x] = a U^H Σ^{-1/2} s

The magnitude of this mean can be found as:

|E[z]| = √(E[z]^H E[z]) = √(a² s^H Σ^{-1/2} U U^H Σ^{-1/2} s) = a √(s^H Σ^{-1} s)

Furthermore, if no signal mismatch is assumed, E[z] must point in the same direction as e (since x points in the same direction as s). Thus, using the magnitude and the direction of E[z], the mean of z is as follows:

E[z] = (a √(s^H Σ^{-1} s)) e        (109)

Since the steering vector was rotated into the direction of the first elementary vector (its only non-zero element is in the first position), the following notation is introduced:

z = [ z_A ]        P = S^{-1} = [ P_AA  P_AB ] = [ S_AA  S_AB ]^{-1}
    [ z_B ]                     [ P_BA  P_BB ]   [ S_BA  S_BB ]

where z_A is a scalar representing the first element of z, and z_B is a vector of length N − 1 representing the rest of z. Similarly, P_AA is a scalar representing the first element of P, P_AB is a 1 × (N − 1) vector, P_BA is an (N − 1) × 1 vector, and P_BB is an (N − 1) × (N − 1) matrix.


Using this notation, the statistic can be rewritten. First, consider the denominator of (108):

e^H S^{-1} e = [ 1  0 ] [ P_AA  P_AB ] [ 1 ]  = P_AA
                        [ P_BA  P_BB ] [ 0 ]

Using the Frobenius relations for partitioned matrices, otherwise known as matrix inversion in block form, P_AA, P_BA, and P_AB can be rewritten as follows [41]:

e^H S^{-1} e = P_AA = (S_AA − S_AB S_BB^{-1} S_BA)^{-1}        (110)

P_BA = −S_BB^{-1} S_BA (S_AA − S_AB S_BB^{-1} S_BA)^{-1} = −S_BB^{-1} S_BA P_AA

P_AB = P_BA^H = −P_AA S_AB S_BB^{-1}        (111)
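The block-inversion identities (110) and (111) are easy to verify numerically for a random Hermitian positive-definite S (an illustrative sketch only; all names are local to it):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Random Hermitian positive-definite S, partitioned with a scalar A-block.
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
S = A @ A.conj().T + N * np.eye(N)
SAA, SAB = S[0, 0], S[0:1, 1:]
SBA, SBB = S[1:, 0:1], S[1:, 1:]

P = np.linalg.inv(S)
SBBi = np.linalg.inv(SBB)

# (110): P_AA = (S_AA - S_AB S_BB^{-1} S_BA)^{-1}
PAA = 1.0 / (SAA - SAB @ SBBi @ SBA)[0, 0]
print(np.isclose(P[0, 0], PAA))

# (111): P_AB = -P_AA S_AB S_BB^{-1}
PAB = -PAA * (SAB @ SBBi)
print(np.allclose(P[0:1, 1:], PAB))
```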

Now, consider the numerator of (108):

e^H S^{-1} z = [ 1  0 ] [ P_AA  P_AB ] [ z_A ]  = P_AA z_A + P_AB z_B
                        [ P_BA  P_BB ] [ z_B ]

Substituting the result in (111) yields:

e^H S^{-1} z = P_AA z_A − P_AA S_AB S_BB^{-1} z_B = P_AA (z_A − S_AB S_BB^{-1} z_B)        (112)


Using the form of the denominator given in (110) and the form of the numerator given in (112), the sufficient statistic (108) can be rewritten as:

Λ = |P_AA (z_A − S_AB S_BB^{-1} z_B)|² / P_AA

  = |z_A − S_AB S_BB^{-1} z_B|² / P_AA^{-1}

  = |z_A − S_AB S_BB^{-1} z_B|² / (S_AA − S_AB S_BB^{-1} S_BA)        (113)

Now, the following is defined:

y = z_A − S_AB S_BB^{-1} z_B

T = S_AA − S_AB S_BB^{-1} S_BA

Λ = |y|² / T        (114)
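The reduction from (108) to (114) is pure algebra and can likewise be checked numerically with random whitened/rotated data (an illustrative sketch, not part of the thesis):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 4, 16

# Random whitened/rotated secondary data and primary vector z.
Z = rng.normal(size=(N, K)) + 1j * rng.normal(size=(N, K))
S = Z @ Z.conj().T                     # S as defined in (107)
z = rng.normal(size=N) + 1j * rng.normal(size=N)

# Lambda from the partitioned form (108).
Sinv = np.linalg.inv(S)
e0 = np.zeros(N, dtype=complex); e0[0] = 1.0
lam_full = abs(e0 @ Sinv @ z) ** 2 / (e0 @ Sinv @ e0).real

# Lambda from the reduced form (114).
SAB, SBB, SBA = S[0:1, 1:], S[1:, 1:], S[1:, 0:1]
SBBi = np.linalg.inv(SBB)
y = z[0] - (SAB @ SBBi @ z[1:])[0]
T = (S[0, 0] - (SAB @ SBBi @ SBA)[0, 0]).real
lam_reduced = abs(y) ** 2 / T

print(np.isclose(lam_full, lam_reduced))
```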

A.0.1 Distribution of AMF test statistic

The derivation of the statistical behavior of this test statistic is given in [17] and is repeated here for completeness. Note from equation (107) that S can be thought of as K times the sample covariance matrix of z. Thus, it is clear that S_AB = ∑_{k=1}^{K} z_A(k) z_B(k)^H. To find the distribution of y, it is deconstructed as follows:

y = z_A − S_AB S_BB^{-1} z_B = z_A − ∑_{k=1}^{K} z_A(k) z_B(k)^H S_BB^{-1} z_B        (115)

y depends on the scalars z_A and z_A(k), as well as on the vectors/matrices z_B, z_B(k), and S_BB^{-1}. To find the distribution of y, the "B vectors" are first treated as deterministic (given). After the conditional distribution given these B vectors is found, the expectation over the B vectors can be used to find the unconditional distribution. From (109), it is clear that z_A ∼ CN(a √(s^H Σ^{-1} s), 1) (the covariance of z is I after whitening, thus the variance of z_A is 1). Furthermore, the secondary data matrix is assumed to contain interference/noise only, and is thus distributed as z_A(k) ∼ CN(0, 1).

In (115), y is shown to be a linear combination of Gaussians (when the B vectors are given). Thus, y must also be Gaussian. Its mean and variance can be found to be:

E[y] = E[z_A] − ∑_{k=1}^{K} E[z_A(k)] z_B(k)^H S_BB^{-1} z_B = a √(s^H Σ^{-1} s) − 0

Var(y) = Var(z_A) + ∑_{k=1}^{K} Var(z_A(k) z_B(k)^H S_BB^{-1} z_B)

       = 1 + ∑_{k=1}^{K} Var(z_A(k)) |z_B(k)^H S_BB^{-1} z_B|²

       = 1 + ∑_{k=1}^{K} z_B^H S_BB^{-1} z_B(k) z_B(k)^H S_BB^{-1} z_B

       = 1 + z_B^H S_BB^{-1} ( ∑_{k=1}^{K} z_B(k) z_B(k)^H ) S_BB^{-1} z_B

       = 1 + z_B^H S_BB^{-1} z_B
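The third equality in this variance chain rests on the deterministic identity ∑_k |z_B(k)^H S_BB^{-1} z_B|² = z_B^H S_BB^{-1} z_B, which follows from S_BB = ∑_k z_B(k) z_B(k)^H. A quick numerical check (illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
Nb, K = 3, 16  # Nb = N - 1 "B" dimensions

zBk = rng.normal(size=(Nb, K)) + 1j * rng.normal(size=(Nb, K))  # secondary B vectors
zB = rng.normal(size=Nb) + 1j * rng.normal(size=Nb)             # primary B vector
SBB = zBk @ zBk.conj().T
SBBi = np.linalg.inv(SBB)

# Left side: sum of squared magnitudes; right side: quadratic form.
lhs = sum(abs(zBk[:, k].conj() @ SBBi @ zB) ** 2 for k in range(K))
rhs = (zB.conj() @ SBBi @ zB).real
print(np.isclose(lhs, rhs))
```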

Thus, y ∼ CN(a √(s^H Σ^{-1} s), 1 + z_B^H S_BB^{-1} z_B). To simplify the problem, this random variable is normalized to have unit variance; thus, the following variables are introduced:

v = y √ρ        (116)

ρ = (1 + z_B^H S_BB^{-1} z_B)^{-1}        (117)

Thus, the new test statistic is:

Λ = |v/√ρ|² / T = |v|² / (ρ T)  =⇒  |v|²/T  ≷  α ρ        (118)

where H1 is decided when |v|²/T exceeds αρ, and H0 otherwise.

From this normalization, v ∼ CN(a √ρ √(s^H Σ^{-1} s), 1). It is important to note that both T and ρ are random variables. References [17] and [40] show that T is a chi-squared random variable with K + 1 − N complex degrees of freedom, and that ρ follows a beta distribution with parameters K + 2 − N and N − 1.
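These distributional results imply E[T] = K + 1 − N and E[ρ] = (K + 2 − N)/(K + 1). As an illustrative check (not from the thesis), both moments can be confirmed by simulating T and ρ under H0 with whitened unit-variance complex Gaussian data:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, trials = 4, 16, 4000

def cn(*shape):
    """Standard complex normal CN(0, 1) samples (unit variance)."""
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)

T_samples, rho_samples = [], []
for _ in range(trials):
    Zk = cn(N, K)               # whitened secondary data under H0
    z = cn(N)                   # whitened/rotated primary data under H0
    S = Zk @ Zk.conj().T
    SBB = S[1:, 1:]
    SBBi = np.linalg.inv(SBB)
    # T and rho as defined in (114) and (117).
    T = (S[0, 0] - (S[0:1, 1:] @ SBBi @ S[1:, 0:1])[0, 0]).real
    rho = 1.0 / (1.0 + (z[1:].conj() @ SBBi @ z[1:]).real)
    T_samples.append(T)
    rho_samples.append(rho)

print(np.mean(T_samples))    # ~ K + 1 - N = 13
print(np.mean(rho_samples))  # ~ (K + 2 - N) / (K + 1) = 14/17
```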

Bibliography

[1] T. Gospodarek, “Elasticity of information,” in Proc. 14th International Congress of Cybernetics and Systems of WOSC, Wrocław, Poland, pp. 511–520, Sept. 2008.

[2] R. M. Narayanan, A. Z. Liu, P. G. Singerman, and M. Rangaswamy, “Information

elasticity in radar systems,” Electronics Letters, vol. 54, no. 17, pp. 1049–1051, 2018.

[3] R. Narayanan, A. Liu, and M. Rangaswamy, “Information elasticity in pseudorandom code pulse compression,” p. 14, May 2018.

[4] A. Z. Liu, R. M. Narayanan, and M. Rangaswamy, “Robust decision making method

for adaptive ordered-statistics CFAR technique using information elasticity,” in Radar

Sensor Technology XXIII (K. I. Ranney and A. Doerry, eds.), vol. 11003, pp. 59 – 67,

International Society for Optics and Photonics, SPIE, 2019.

[5] D. Bougherara, G. Grolleau, and N. Mzoughi, “Is more information always better? an

analysis applied to information-based policies for environmental protection,” 2007.

[6] H. Rohling, “New CFAR-processor based on an ordered statistic,” in International

Radar Conference, pp. 271–275, 1985.

[7] W. L. Melvin, “A STAP overview,” IEEE Aerospace and Electronic Systems Magazine, vol. 19, pp. 19–35, Jan 2004.


[8] F. C. Robey, D. R. Fuhrmann, E. J. Kelly, and R. Nitzberg, “A CFAR adaptive matched filter detector,” IEEE Transactions on Aerospace and Electronic Systems, vol. 28, pp. 208–216, Jan 1992.

[9] M. Richards, W. Holm, and J. Scheer, Principles of Modern Radar: Basic Principles.

Electromagnetics and Radar, Institution of Engineering and Technology, 2010.

[10] S. W. Golomb, Shift Register Sequences: Secure And Limited-access Code Generators, Efficiency Code Generators, Prescribed Property Generators, Mathematical Models (Third Revised Edition). World Scientific Publishing Company, 2017.

[11] A. Boehmer, “Binary pulse compression codes,” IEEE Transactions on Information

Theory, vol. 13, pp. 156–167, April 1967.

[12] K. Chang, e-Design: Computer-Aided Engineering Design. Elsevier Science, 2016.

[13] G. O. Odu and O. E. Charles-Owaba, “Review of multi-criteria optimization methods -

theory and applications,” IOSR Journal of Engineering, vol. 3, no. 10, pp. 1–14, 2013.

[14] M. Barkat and P. K. Varshney, “On adaptive cell-averaging CFAR (Constant False-

Alarm Rate) radar signal detection,” tech. rep., Oct. 1987.

[15] K. J. Sangston and K. R. Gerlach, “Coherent detection of radar targets in a non-Gaussian background,” IEEE Transactions on Aerospace and Electronic Systems, vol. 30, pp. 330–340, April 1994.

[16] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes.

Boston: McGraw Hill, fourth ed., 2002.

[17] E. J. Kelly, “An adaptive detection algorithm,” IEEE Transactions on Aerospace and

Electronic Systems, vol. AES-22, pp. 115–127, March 1986.

[18] J. H. Curtiss, “On the distribution of the quotient of two chance variables,” The Annals

of Mathematical Statistics, vol. 12, no. 4, pp. 409–421, 1941.


[19] E. J. Kelly, “Finite-sum expressions for signal detection probabilities,” NASA

STI/Recon Technical Report N, vol. 81, May 1981.

[20] P. Swerling, “Probability of detection for fluctuating targets,” IRE Transactions on

Information Theory, vol. 6, pp. 269–308, April 1960.

[21] B. Himed and W. L. Melvin, “Analyzing space-time adaptive processors using measured data,” in Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers (Cat. No.97CB36136), vol. 1, pp. 930–935, Nov 1997.

[22] P. P. Gandhi and S. A. Kassam, “Analysis of CFAR processors in nonhomogeneous background,” IEEE Transactions on Aerospace and Electronic Systems, vol. 24, pp. 427–445, July 1988.

[23] H. David and H. Nagaraja, Order Statistics. Wiley Series in Probability and Statistics,

Wiley, 2004.

[24] S. Blake, “OS-CFAR theory for multiple targets and nonuniform clutter,” IEEE Transactions on Aerospace and Electronic Systems, vol. 24, pp. 785–790, Nov 1988.

[25] B. Magaz, A. Belouchrani, and M. Hamadouche, “Automatic threshold selection in OS-CFAR radar detection using information theoretic criteria,” Progress In Electromagnetics Research B, vol. 30, pp. 157–175, 2011.

[26] H. Akaike, Information Theory and an Extension of the Maximum Likelihood Principle,

pp. 199–213. New York, NY: Springer New York, 1998.

[27] J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, no. 5, pp. 465

– 471, 1978.

[28] M. Wax and T. Kailath, “Detection of signals by information theoretic criteria,” IEEE

Transactions on Acoustics, Speech, and Signal Processing, vol. 33, pp. 387–392, April

1985.


[29] Y. Jin and B. Sendhoff, “Trade-off between performance and robustness: An evolu-

tionary multiobjective approach,” in Evolutionary Multi-Criterion Optimization (C. M.

Fonseca, P. J. Fleming, E. Zitzler, L. Thiele, and K. Deb, eds.), (Berlin, Heidelberg),

pp. 237–251, Springer Berlin Heidelberg, 2003.

[30] B. Kang, V. Monga, and M. Rangaswamy, “On the practical merits of rank constrained ML estimator of structured covariance matrices,” in 2013 IEEE Radar Conference (RadarCon13), pp. 1–6, April 2013.

[31] M. Weiss, “Analysis of some modified cell-averaging CFAR processors in multiple-target situations,” IEEE Transactions on Aerospace and Electronic Systems, vol. AES-18, pp. 102–114, Jan 1982.

[32] I. S. Reed, J. D. Mallett, and L. E. Brennan, “Rapid convergence rate in adaptive

arrays,” IEEE Transactions on Aerospace and Electronic Systems, vol. AES-10, pp. 853–

863, Nov 1974.

[33] N. R. Goodman, “Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction),” Ann. Math. Statist., vol. 34, pp. 152–177, 1963.

[34] F. Gini and M. Rangaswamy, Knowledge Based Radar Detection, Tracking and Clas-

sification (Adaptive and Learning Systems for Signal Processing, Communications and

Control Series). New York, NY, USA: Wiley-Interscience, 2008.

[35] M. Steiner and K. Gerlach, “Fast converging adaptive processor for a structured covariance matrix,” IEEE Transactions on Aerospace and Electronic Systems, vol. 36, pp. 1115–1126, Oct 2000.

[36] B. Kang, V. Monga, and M. Rangaswamy, “Rank-constrained maximum likelihood

estimation of structured covariance matrices,” IEEE Transactions on Aerospace and

Electronic Systems, vol. 50, pp. 501–515, January 2014.


[37] V. Monga and M. Rangaswamy, “Rank constrained ML estimation of structured covariance matrices with applications in radar target detection,” in 2012 IEEE Radar Conference, pp. 0475–0480, May 2012.

[38] J. Ward, “Space-time adaptive processing for airborne radar,” tech. rep., Massachusetts Institute of Technology, Dec. 1994.

[39] M. Skolnik, Radar Handbook, Third Edition. Electronics electrical engineering, McGraw-

Hill Education, 2008.

[40] E. J. Kelly, “Adaptive detection in non-stationary interference, Part III,” Tech. Rep.

761, MIT Lincoln Laboratory, Lexington, MA, Aug. 1987.

[41] E. J. Kelly, “Adaptive detection in non-stationary interference. Part I and Part II,”

Tech. Rep. 724, MIT Lincoln Laboratory, Lexington, MA, May 1985.