Review of Signals and Detection

A broad overview of signal detection and processing.


  • Review of Signals and Detection

  • Introduction

    Data Modulation and Demodulation

    Data Modulation converts information bits into waveforms or signals suitable for transmission over communication channels

    Data Detection reverses the modulation, i.e., it finds which bits were transmitted over the noisy channel

  • Illustrative Simple Example

    Suppose we have a passband channel

    Since DC does not pass through the channel, mapping the binary bits to 0 V and 1 V, respectively, will NOT work

    However, one can use a simple Binary Phase Shift Keying (BPSK) modulation:

    $x_0(t) = \cos(2\pi \cdot 150\, t)$;  $x_1(t) = -\cos(2\pi \cdot 150\, t)$

    Detection: detect +1 or -1 at the output (and then map it back to bits)

    Caveat: this is for a single transmission (we will look at successive transmissions very soon)
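
    A minimal numerical sketch of this BPSK example (assuming a 150 Hz carrier and a simple correlator receiver; the sampling rate and noise level are illustrative choices, not from the slides):

      import numpy as np

      rng = np.random.default_rng(0)
      fs, T, fc = 10_000, 0.1, 150.0            # sample rate, symbol duration, carrier (Hz)
      t = np.arange(0, T, 1 / fs)
      carrier = np.cos(2 * np.pi * fc * t)

      bit = 1                                   # bit to send: 0 -> +cos, 1 -> -cos
      x = (1 - 2 * bit) * carrier               # BPSK waveform
      y = x + 0.5 * rng.standard_normal(t.size) # noisy channel output (no DC path needed)

      # Correlate against the carrier and detect the sign (+1 or -1), then map back to a bit
      stat = np.dot(y, carrier)
      bit_hat = 0 if stat > 0 else 1
      print(bit, bit_hat)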

  • Mapping of vectors to waveforms: Consider the set of real-valued functions $\{f(t)\}$, $t \in [0, T]$, such that $\int_0^T |f(t)|^2\,dt < \infty$

    This is called a Hilbert space of continuous functions, i.e., $L^2[0, T]$

    Measure of distance (inner product): $\langle f, g \rangle = \int_0^T f(t)\,g(t)\,dt$

    Basis functions:

    A class of functions can be expressed in terms of basis functions $\{\varphi_n(t)\}$ as:

    $x(t) = \sum_{n=0}^{N-1} x_n\,\varphi_n(t)$, where $x_n = \langle x(t), \varphi_n(t) \rangle$

    This waveform carries the information contained in the coefficients $\{x_n\}$ through the communication channel

    Thus this relationship implies a mapping between the vector $\mathbf{x} = (x_0, \dots, x_{N-1})$ and the waveform $x(t)$:  $\mathbf{x} \leftrightarrow x(t)$
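
    A small sketch of this basis expansion and coefficient mapping, using a discretized inner product on [0, T] (the cosine/sine basis below is an illustrative choice):

      import numpy as np

      fs, T = 1000, 1.0
      t = np.arange(0, T, 1 / fs)
      dt = 1 / fs

      # Orthonormal basis on [0, T]: normalized cosine and sine
      phi = np.vstack([np.sqrt(2 / T) * np.cos(2 * np.pi * t / T),
                       np.sqrt(2 / T) * np.sin(2 * np.pi * t / T)])

      x_vec = np.array([1.5, -0.7])                 # coefficients (the vector x)
      x_t = x_vec @ phi                             # waveform x(t) = sum_n x_n phi_n(t)

      # Recover the coefficients by projection: x_n = <x(t), phi_n(t)>
      x_rec = phi @ x_t * dt
      print(np.round(x_rec, 3))                     # ~ [1.5, -0.7]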

  • Signal Constellation: A set of M vectors $\{x_i\}$, $i = 0, \dots, M-1$, is called a signal constellation.

    Examples of Signal Constellations: Binary antipodal, QPSK (Quadrature Phase Shift Keying)

    The previous mapping lets us identify points of $L^2[0, T]$ with vectors in $\mathbb{R}^N$ and work with the geometric properties of $\mathbb{R}^N$

  • Vector Mapper

    Mapping of a binary vector into one of the signal points

    The mapping is not arbitrary (good choices, e.g., Gray labelling, lead to better performance over noisy channels)

    Intuitively good idea: label the points so that points that are close in Euclidean distance are also close in Hamming distance

    Why? (Hint: think of Gray labelling)

  • Modulator: Implements the basis expansion

    Signal Set: the set of modulated waveforms $\{x_i(t)\}$, $i = 0, \dots, M-1$, corresponding to the signal constellation $\{x_i\}$, where $x_i(t) = \sum_{n=0}^{N-1} x_{i,n}\,\varphi_n(t)$

  • Average Energy: $\mathcal{E}_x = E\left[\|x\|^2\right] = \sum_{i=0}^{M-1} \|x_i\|^2\, p_X(i)$

    where $p_X(i)$ is the probability of choosing $x_i$. The probability $p_X(i)$ depends on:

    the underlying probability distribution of bits in the message source

    the vector mapper

    Average power: $P_x = \mathcal{E}_x / T$ (energy per unit time)
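
    A short sketch of these definitions for an illustrative 4-point (4-PAM) constellation with non-uniform priors (all values chosen here only for illustration):

      import numpy as np

      x = np.array([-3.0, -1.0, 1.0, 3.0])      # signal points x_i (4-PAM)
      p = np.array([0.1, 0.4, 0.4, 0.1])        # priors p_X(i), from source statistics + vector mapper
      T = 1e-3                                  # symbol duration in seconds (illustrative)

      E_x = np.sum(p * x**2)                    # average energy  E_x = sum_i p_X(i) ||x_i||^2
      P_x = E_x / T                             # average power   P_x = E_x / T
      print(E_x, P_x)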

  • Example: Consider a 16 QAM constellation with basis functions (cosine/sine pulses):

    Gram-Schmidt procedure allows the choice of a minimal basis to represent the signal sets {xi} (this might be reviewed in some exercise)

  • Demodulation

    Takes the continuous-time waveform and extracts the discrete version

    Demodulation extracts the coefficients of the expansion by projecting the signal on its basis

    Thus, in the noiseless case, demodulation is just recovering the coefficients of the basis functions

  • Matched Filter

    An operation equivalent to the recovery of the coefficients of the basis expansion: filtering the received signal with $\varphi_n(T - t)$ and sampling the output at $t = T$ yields $y_n = \langle y(t), \varphi_n(t) \rangle$
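
    A sketch showing that filtering with the time-reversed basis function and sampling at t = T gives the same coefficient as the direct inner product (discretized; the sine-shaped basis function is an illustrative choice):

      import numpy as np

      fs, T = 1000, 1.0
      t = np.arange(0, T, 1 / fs)
      dt = 1 / fs

      phi = np.sqrt(2 / T) * np.sin(np.pi * t / T)   # one unit-energy basis function on [0, T]
      y = 0.8 * phi + 0.05 * np.random.default_rng(1).standard_normal(t.size)

      # Demodulator as projection: y_n = <y(t), phi(t)>
      y_proj = np.sum(y * phi) * dt

      # Matched filter: convolve y with phi(T - t) and sample the output at t = T
      mf_out = np.convolve(y, phi[::-1]) * dt
      y_mf = mf_out[len(t) - 1]                      # sample index corresponding to t = T

      print(round(y_proj, 4), round(y_mf, 4))        # the two values coincide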

  • Model so far

  • Data Detection

    Assume that the demodulator captures the essential (discrete) information about x from y(t) (the notion of "essential" will be clarified soon)

    We can consider an equivalent discrete channel

    In the discrete domain the channel is described by $p_{Y|X}(y \mid x_i)$

    Example: $y = x + z$, so $p_{Y|X}(y \mid x) = p_Z(y - x)$

    Criterion for detection: detection is guessing the input $x_i$ given the noisy output $y$. Denote the detection decision by a function $\hat{m} = H(y)$. If $M = m_i$ was the message sent, then the conditional error probability is

    $P_{e \mid M = m_i} = \Pr\left(H(y) \neq m_i \mid M = m_i\right)$

  • Optimum detector: Minimizes error probability over all detectors

    The probability of observing $Y = y$ if the message $m_i$ was sent is denoted $p_{Y|X}(y \mid x_i)$

    Decision Rule: $H : \mathcal{Y} \to \mathcal{M}$ is a function which takes an observed realization $y$ and outputs a guess of the transmitted message.

    The (deterministic) detector $H(y)$ divides the space $\mathbb{R}^N$ into M regions (M hypotheses).

  • Then, it follows that the probability of a correct decision is $\Pr(\text{correct}) = \int p_{X|Y}\!\left(H(y) \mid y\right) p_Y(y)\, dy$

    Implication: The decision rule $H_{MAP}(y) = \arg\max_i\, p_{X|Y}(x_i \mid y)$

    maximizes the probability of being correct, i.e., minimizes the error probability. This optimal decision rule is called the Maximum-a-posteriori (MAP) decision rule

  • Some comments:

    The MAP detector needs knowledge of the priors $p_X(x_i)$

    It can be simplified as follows: $p_{X|Y}(x_i \mid y) = \dfrac{p_{Y|X}(y \mid x_i)\, p_X(x_i)}{p_Y(y)}$

    and since $p_Y(y)$ is common to all hypotheses, the MAP rule is equivalent to $H_{MAP}(y) = \arg\max_i\, p_{Y|X}(y \mid x_i)\, p_X(x_i)$
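
    A toy sketch of the MAP rule arg max_i p_{Y|X}(y|x_i) p_X(x_i) versus the ML rule arg max_i p_{Y|X}(y|x_i), for a made-up discrete channel with two messages and three output symbols (all numbers are illustrative):

      import numpy as np

      # p_{Y|X}(y | x_i): rows = messages i, columns = output symbols y
      p_y_given_x = np.array([[0.7, 0.2, 0.1],
                              [0.1, 0.3, 0.6]])
      p_x = np.array([0.9, 0.1])                        # strongly non-uniform priors

      for y in range(3):
          ml = np.argmax(p_y_given_x[:, y])             # ML: ignores the priors
          map_ = np.argmax(p_y_given_x[:, y] * p_x)     # MAP: weights by the priors
          print(f"y={y}: ML -> {ml}, MAP -> {map_}")
      # With these priors, MAP decides message 0 for every y, while ML decides 1 for y=1 and y=2.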

  • Maximum Likelihood Detector

    Let us assume that the priors are uniform, i.e., $p_X(x_i) = 1/M$. Then, the MAP rule becomes:

    $H_{ML}(y) = \arg\max_i\, p_{Y|X}(y \mid x_i)$

    The ML rule chooses the message that most likely caused the observation (ignoring how likely the message itself was)

    The ML rule is clearly inferior to MAP for non-uniform priors. Suppose now that the prior probabilities are unknown.

    Question: Is there a robust detection scheme? One can think of this as a game where nature chooses the prior distribution while the detection rule is under our control

  • Theorem: The ML detector minimizes the maximum possible average error probability when the prior distribution is unknown, provided the conditional probability of error

    $P_{e,ML \mid M = m_i} = \Pr\left(H_{ML}(y)\ \text{is incorrect} \mid M = m_i\right)$ is independent of $i$

    Hence, $P_{e,ML}(p_X) = \sum_i p_X(i)\, P_{e,ML \mid M = m_i} = P_{e,ML \mid M = m_0}$ for every prior $p_X$

    Therefore, $\max_{p_X} P_{e,ML}(p_X) = P_{e,ML \mid M = m_0}$

    For any decision rule (hypothesis test) $H$: $\max_{p_X} P_{e,H}(p_X) \ge P_{e,H}(\text{uniform}) \ge P_{e,ML}(\text{uniform}) = \max_{p_X} P_{e,ML}(p_X)$

    Thus, $\min_H \max_{p_X} P_{e,H}(p_X) = \max_{p_X} P_{e,ML}(p_X)$

    Interpretation: ML also has a canonical robustness for detection under uncertainty about the priors

  • AWGN Example

    In this case, we have simply: $y = x_i + z$

    where $z \sim \mathcal{N}(0, \sigma^2 I)$

    Hence, $p_{Y|X}(y \mid x_i) = \dfrac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\dfrac{\|y - x_i\|^2}{2\sigma^2}\right)$

    implying that: $\ln p_{Y|X}(y \mid x_i) = -\dfrac{N}{2}\ln(2\pi\sigma^2) - \dfrac{\|y - x_i\|^2}{2\sigma^2}$

    For the MAP rule, we maximize the conditional probability $p_{X|Y}(x_i \mid y)$

  • Thus, the MAP rule will be given by: $H_{MAP}(y) = \arg\min_i \left[\dfrac{\|y - x_i\|^2}{2\sigma^2} - \ln p_X(x_i)\right]$

    The ML rule will be given by: $H_{ML}(y) = \arg\min_i \|y - x_i\|^2$

    The ML detector selects the message that is closest in Euclidean distance to the received signal

    The MAP detector additionally includes a shift due to the prior probabilities (larger decision regions for more probable messages)
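
    A sketch of these two rules for a 2-D constellation in AWGN (QPSK-like points, priors, and noise level chosen only for illustration):

      import numpy as np

      rng = np.random.default_rng(2)
      X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)  # constellation {x_i}
      p_x = np.array([0.7, 0.1, 0.1, 0.1])                             # non-uniform priors
      sigma = 0.8

      i_true = 1
      y = X[i_true] + sigma * rng.standard_normal(2)

      d2 = np.sum((y - X) ** 2, axis=1)                     # squared distances ||y - x_i||^2
      i_ml = np.argmin(d2)                                  # ML: nearest point
      i_map = np.argmin(d2 / (2 * sigma**2) - np.log(p_x))  # MAP: distance shifted by the prior
      print(i_true, i_ml, i_map)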

  • Observation: In both the MAP and ML decision rules, one does not need $y$ itself, but just the distances $\|y - x_i\|^2$,

    $i = 0, \dots, M-1$

    There is no loss of information by retaining only $\{\|y - x_i\|^2\}_{i=0}^{M-1}$

    Sufficient Statistic: a function that retains the essential information about the parameter of interest

    Decision regions are given by:

    $D_i = \left\{\, y : \|y - x_i\|^2 < \|y - x_j\|^2,\ \forall j \neq i \,\right\}$

    These regions correspond to the Voronoi cells of the points $\{x_i\}_{i=0}^{M-1}$. Decision regions are separated by hyperplanes

    In the case of the MAP rule, the decision regions are separated by shifted hyperplanes

  • Minmax decision rule (Binary case)

    What if the prior distributions are not known? Let $p_0 = p_X(x_0)$ and $p_1 = 1 - p_0$

    For a decision rule H which does not depend on $p_X$, $P_{e,H}(p_0)$ is linear in $p_0$

    A robust decision rule H means the following: $\min_H \max_{p_0} P_{e,H}(p_0)$

    Clearly, for a given decision rule H which does not depend on $p_X$,

    $\max_{p_0} P_{e,H}(p_0) = \max\left\{\Pr(\hat{m} = 1 \mid m_0),\ \Pr(\hat{m} = 0 \mid m_1)\right\}$

  • This would be, for instance, the case of an ML decision rule (since it does not depend on $p_X(x)$)

  • Let us look now at the MAP rule for every choice of $p_0$, denoting $V(p_0) = P_{e,MAP}(p_0)$

    The MAP rule does depend on $p_X(x)$, so $V(p_0) = P_{e,MAP}(p_0)$ is not a linear function (homework problem)

    $V(p_0)$ can be shown to be concave with a unique maximizer $p_0^*$

    $p_0^*$ is the worst prior for the MAP rule

  • Since MAP is the optimal decision rule,

    $P_{e,H}(p_0) \ge P_{e,MAP}(p_0) = V(p_0)$ for each $p_0$. Thus, the line $P_{e,H}(p_0)$ lies above the curve $V(p_0)$

    The best we can do is to make $P_{e,H}(p_0)$ tangential to $V(p_0)$ for some $p_0$: the decision rule H is then the MAP detector designed for that prior $p_0$

    To find $\min_H \max_{p_0} P_{e,H}(p_0)$ over all possible decision rules H, choose the horizontal tangent line (slope 0):

    $p_0 = p_0^*$  →  MAP rule designed for $p_0^*$  →  $P_{e,H}(p_0) = V(p_0^*)$

    (independent of $p_0$)

  • $P_{e,H}(p_0) = p_0 \Pr(\hat{m} = 1 \mid m_0) + (1 - p_0) \Pr(\hat{m} = 0 \mid m_1) = \Pr(\hat{m} = 1 \mid m_0) = \Pr(\hat{m} = 0 \mid m_1)$

    Performance for all priors is as bad as for the worst prior

    Hence, $P_{e,H \mid M = m_0} = P_{e,H \mid M = m_1}$ (same conditional error probability), and the average error probability $P_{e,H}$ is independent of $p_0$

    In the case that $p_0^* = 1/2$ (i.e., a uniform prior), the min-max rule is the ML rule (this happens in practice)

    If ML satisfies $P_{e,ML \mid M = m_0} = P_{e,ML \mid M = m_1}$, then the worst-case prior for the MAP rule is uniform and ML becomes the min-max rule
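
    A numerical sketch of V(p_0) = P_{e,MAP}(p_0) for two scalar Gaussian hypotheses (means ±1, unit noise standard deviation, both chosen for illustration), showing that it is concave and that the worst-case prior here is p_0 = 1/2, where MAP reduces to ML:

      import numpy as np
      from math import erfc

      Q = lambda u: 0.5 * erfc(u / np.sqrt(2))      # Gaussian tail function
      mu0, mu1, sigma = -1.0, 1.0, 1.0

      def pe_map(p0):
          # MAP threshold for N(mu0, s^2) vs N(mu1, s^2): decide m1 when y > tau
          tau = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * np.log(p0 / (1 - p0))
          return p0 * Q((tau - mu0) / sigma) + (1 - p0) * Q((mu1 - tau) / sigma)

      p0 = np.linspace(0.01, 0.99, 99)
      V = np.array([pe_map(p) for p in p0])
      print("worst-case prior p0* ~", p0[np.argmax(V)])     # ~0.5 by symmetry
      print(V.max(), Q((mu1 - mu0) / (2 * sigma)))          # equals the ML error probability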

  • Other decision rules (Bayes rule)

    Error probability is just one possible criterion for choosing a detector

    Detectors can, in general, minimize other cost functions

    Let $C_{i,j}$ denote the cost of choosing hypothesis $i$ when actually hypothesis $j$ was true

    The expected cost incurred by some decision rule H(y) when hypothesis $j$ was true is:

    $R_j(H) = \sum_i C_{i,j}\, \Pr\left(H(y) = i \mid M = j\right)$

    The overall average cost (risk) is obtained by taking the prior probabilities into account:

    $R(H) = \sum_j p_X(j)\, R_j(H)$

    Question: What is the optimal decision rule to minimize the above risk?

    Notice that the error probability criterion corresponds to the cost assignment: $C_{i,j} = 1$ for $i \neq j$; $C_{i,j} = 0$ for $i = j$

  • Bayes rule for minimizing risk

    Considering the case M = 2, i.e., distinguishing between 2 hypotheses: $R(H) = p_0 R_0(H) + p_1 R_1(H)$

    where $R_j(H) = C_{0,j} \Pr(H = 0 \mid M = j) + C_{1,j} \Pr(H = 1 \mid M = j) = C_{0,j}\left[1 - \Pr(H = 1 \mid M = j)\right] + C_{1,j} \Pr(H = 1 \mid M = j)$, $j = 0, 1$

    Operating:

    $R(H) = p_0 C_{0,0} + p_1 C_{0,1} + p_0 (C_{1,0} - C_{0,0}) \Pr(H = 1 \mid M = 0) + p_1 (C_{1,1} - C_{0,1}) \Pr(H = 1 \mid M = 1)$

    $= \sum_{j=0}^{1} p_j C_{0,j} + \sum_{j=0}^{1} p_j (C_{1,j} - C_{0,j}) \int_{D_1} p_{Y|M}(y \mid j)\, dy$

  • $R(H) = \sum_{j=0}^{1} p_j C_{0,j} + \int_{D_1} \sum_{j=0}^{1} p_j (C_{1,j} - C_{0,j})\, p_{Y|M}(y \mid j)\, dy$

    To minimize R(H), collect in $D_1$ all the regions where the integrand is negative:

    $\sum_{j=0}^{1} p_j (C_{1,j} - C_{0,j})\, p_{Y|M}(y \mid j) = p_0 (C_{1,0} - C_{0,0})\, p_{Y|M}(y \mid 0) + p_1 (C_{1,1} - C_{0,1})\, p_{Y|M}(y \mid 1)$

    Hence

    $D_1 = \left\{ y : p_0 (C_{1,0} - C_{0,0})\, p_{Y|M}(y \mid 0) + p_1 (C_{1,1} - C_{0,1})\, p_{Y|M}(y \mid 1) < 0 \right\}$

    $= \left\{ y : \dfrac{p_{Y|M}(y \mid 1)}{p_{Y|M}(y \mid 0)} > \dfrac{p_0 (C_{1,0} - C_{0,0})}{p_1 (C_{0,1} - C_{1,1})} \right\} = \left\{ y : L(y) > \eta \right\}$

    where $\eta = \dfrac{p_0 (C_{1,0} - C_{0,0})}{p_1 (C_{0,1} - C_{1,1})}$ and $L(y) = \dfrac{p_{Y|M}(y \mid 1)}{p_{Y|M}(y \mid 0)}$ (Likelihood Ratio)

    MAP corresponds to choosing $C_{0,1} = C_{1,0} = 1$ and $C_{0,0} = C_{1,1} = 0$, i.e., $\eta = \dfrac{p_0}{p_1}$

    (min. average error prob.)
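
    A sketch of this likelihood-ratio threshold test, using two scalar Gaussian hypotheses and an arbitrary cost assignment (all numbers illustrative); with 0/1 costs it reduces to the MAP threshold p_0/p_1:

      import numpy as np

      def gauss(y, mu, sigma):
          return np.exp(-(y - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

      p0, p1, sigma = 0.6, 0.4, 1.0
      C = np.array([[0.0, 5.0],     # C[i, j] = cost of deciding i when j is true
                    [1.0, 0.0]])

      eta = p0 * (C[1, 0] - C[0, 0]) / (p1 * (C[0, 1] - C[1, 1]))   # threshold
      y = 0.3                                                        # an observation
      L = gauss(y, +1.0, sigma) / gauss(y, -1.0, sigma)              # likelihood ratio L(y)
      decision = 1 if L > eta else 0
      print(f"L(y)={L:.3f}, eta={eta:.3f}, decide m{decision}")

      # 0/1 costs recover the MAP threshold p0/p1
      C01 = np.array([[0.0, 1.0], [1.0, 0.0]])
      eta_map = p0 * (C01[1, 0] - C01[0, 0]) / (p1 * (C01[0, 1] - C01[1, 1]))
      print(eta_map, p0 / p1)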

  • Irrelevance and Sufficient Statistic

    An observable Y may contain data that is irrelevant for the detection problem at hand

    But how do we decide what is superfluous (irrelevant) for the detection process?

    Consider the following example: $Y_1 = X + Z_1$, $Y_2 = Y_1 + Z_2$

    Notice that we have a Markov chain: $X \to Y_1 \to Y_2$

    If $Z_1$ and $Z_2$ are statistically independent and independent of X, then $Y_2$ is irrelevant for detecting X: the MAP detector using only $Y_1$ has the same performance as the one using both $Y = [Y_1, Y_2]^T$

    Theorem: If $Y = [Y_1, Y_2]^T$ and $X \to Y_1 \to Y_2$ forms a Markov chain, then $Y_2$ is irrelevant for the detection of X

  • Let T(Y) be a function (either stochastic or deterministic) of Y

    Definition: A function T(Y) of an observable Y is a sufficient statistic for X if $X \to T(Y) \to Y$ forms a Markov chain

    Notice that if T(Y) is a sufficient statistic, then indeed $P_{X \mid T(Y), Y} = P_{X \mid T(Y)}$: a MAP detector observing T(Y) achieves the same average error probability as one observing Y

    In the previous example, $T(Y) = Y_1$

    Remember the MAP rule we derived for the AWGN example with two hypotheses:

    we only need to know T(Y) to decide whether $\hat{m} = 0$ or $\hat{m} = 1$ (Y becomes irrelevant once we know T(Y))

  • Reversibility Theorem: The application of an invertible mapping on the channel output vector y does not affect the performance of the MAP detector.

    Let $y_2$ be the channel output and $y_1 = G(y_2)$, where $G(\cdot)$ is an invertible map

    Then $y_2 = G^{-1}(y_1)$, which implies that $p_{X \mid y_1, y_2} = p_{X \mid y_1}$. Thus, applying the Irrelevance Theorem, we can drop $y_2$

  • Example: Continuous additive white Gaussian noise channel

    Let's go through the entire detection chain for a continuous-time (waveform) channel

    Our channel is given by: $y(t) = x(t) + z(t)$, $t \in [0, T]$

    Additive White Gaussian Noise: the process z(t) is Gaussian and white, which means:

    $E[z(t)\,z(s)] = \dfrac{N_0}{2}\,\delta(t - s)$

    Consider the vector channel representation:

    $x(t) = \sum_{n=0}^{N-1} x_n\,\varphi_n(t)$

    where $\{\varphi_n(t)\}_{n=0}^{N-1}$ is an orthonormal basis in $L^2[0, T]$

    $y(t) = \sum_{n=0}^{N-1} x_n\,\varphi_n(t) + z(t)$

  • To define the equivalent vector channel, consider the inner products: $y_n = \langle y(t), \varphi_n(t) \rangle$, $z_n = \langle z(t), \varphi_n(t) \rangle$, $n = 0, \dots, N-1$

    $y_n = x_n + z_n$, $n = 0, \dots, N-1$

    Notice that:

    $y(t) \neq \sum_{n=0}^{N-1} y_n\,\varphi_n(t)$ in general (the noise is not confined to the signal space)

    Lemma (Uncorrelated noise samples): Given any orthonormal basis $\{\varphi_n(t)\}$ and white Gaussian noise z(t), the coefficients $z_n = \langle z(t), \varphi_n(t) \rangle$ of the basis expansion are i.i.d. Gaussian with variance

    $\sigma^2 = \dfrac{N_0}{2}$

    Thus, if we extend the basis $\{\varphi_n(t)\}$ to span z(t), the coefficients of the resulting expansion will be independent of the rest of the coefficients

    $y(t) = x(t) + z(t) = \sum_{n=0}^{N-1} (x_n + z_n)\,\varphi_n(t) + \tilde{z}(t)$

    where $\tilde{z}(t)$ is the component of the noise orthogonal to the signal space

    In the vector expansion, $y$ is the vector containing the basis coefficients $y_n = x_n + z_n$. The term $\tilde{z}(t)$ can be dropped: the vector model $y = x + z$ is sufficient for detection purposes

  • This equivalence allows us to develop MAP and ML rules as we have done before

  • Canonical example: Binary constellation error probability

    In the simplest case of M = 2 equally likely signal points $x_0$ and $x_1$ (ML detection):

    The conditional error probability is given by: $P_{e \mid M = 0} = \Pr\left(\left\langle z, \dfrac{x_1 - x_0}{\|x_1 - x_0\|}\right\rangle > \dfrac{\|x_1 - x_0\|}{2}\right)$

    Notice that $\left\langle z, \dfrac{x_1 - x_0}{\|x_1 - x_0\|}\right\rangle$ is Gaussian with mean 0 and variance $\sigma^2$. Thus:

    $P_e = p_0\, P_{e \mid M = 0} + p_1\, P_{e \mid M = 1} = Q\!\left(\dfrac{\|x_1 - x_0\|}{2\sigma}\right)$
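
    A quick Monte Carlo sketch checking P_e = Q(||x_1 − x_0||/(2σ)) for an arbitrary pair of 2-D points (the points, noise level, and trial count are illustrative):

      import numpy as np
      from math import erfc

      rng = np.random.default_rng(3)
      Q = lambda u: 0.5 * erfc(u / np.sqrt(2))

      x0, x1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
      sigma, n_trials = 1.0, 200_000

      bits = rng.integers(0, 2, n_trials)
      X = np.where(bits[:, None] == 0, x0, x1)
      Y = X + sigma * rng.standard_normal((n_trials, 2))

      # ML decision: nearest of the two points
      d0 = np.sum((Y - x0) ** 2, axis=1)
      d1 = np.sum((Y - x1) ** 2, axis=1)
      bits_hat = (d1 < d0).astype(int)

      pe_sim = np.mean(bits_hat != bits)
      pe_theory = Q(np.linalg.norm(x1 - x0) / (2 * sigma))
      print(pe_sim, pe_theory)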

  • Invariance Properties

    Theorem (Rotation Invariance): If all the data symbols $\{x_i\}_{i=0}^{M-1}$ are rotated by an

    orthogonal transformation, i.e., $\tilde{x}_i = Q\,x_i$, $i = 0, \dots, M-1$, where $Q^T Q = I$,

    then the average probability of error of the MAP/ML receiver remains unchanged over an AWGN channel

    $\tilde{y} = \tilde{x} + z = Q\,x + z$

    $Q^T \tilde{y} = x + Q^T z$

    But $Q^T z$ is Gaussian and

    $E\left[Q^T z\, z^T Q\right] = \sigma^2 Q^T Q = \sigma^2 I$, so $Q^T z$ is probabilistically equivalent to $z$ → the average probability of error is unchanged

    Translational Invariance: If a signal constellation is translated by a constant vector, i.e., $\tilde{x}_i = x_i + a$, $i = 0, \dots, M-1$, the average probability of error of the MAP/ML receiver remains unchanged over an AWGN channel

    Minimum energy translate: In order to get an equivalent minimum-energy constellation, we have to subtract the mean $E[x]$ from every signal point, obtaining a zero-mean constellation.

  • Union Bound, M > 2 [ML Detector]

    Assuming an ML detector for the AWGN channel, it can easily be seen that:

    $P_{e \mid M = i} \le \sum_{j \neq i} Q\!\left(\dfrac{\|x_j - x_i\|}{2\sigma}\right) \le (M - 1)\, Q\!\left(\dfrac{d_{\min}}{2\sigma}\right)$

    since $Q(\cdot)$ is a monotonically decreasing function

  • Nearest Neighbor Union Bound (NNUB) [ML Detector]

    Let $N_e$ be the (average) number of points sharing a decision boundary with a constellation point. It is clear that the following holds:

    $P_e \le N_e\, Q\!\left(\dfrac{d_{\min}}{2\sigma}\right)$

  • Signal Sets and Measures

    Number of dimensions: If the signal bandwidth is approximately W and the signal is approximately time-limited to T, then the Dimensionality Theorem from Information Theory [Shannon; Landau and Pollak] states that the equivalent space has dimension N, which is:

    $N \approx 2WT$

    If we carry $b$ bits in a constellation of dimension N, then the normalized parameters are $\bar{b} = b/N$ (bits per dimension) and $\bar{\mathcal{E}}_x = \mathcal{E}_x/N$ (energy per dimension)

    $\bar{b}$ is a useful measure in compound signal sets with different numbers of dimensions

  • Signal-to-noise ratio (SNR)

    $\mathrm{SNR} = \dfrac{\bar{\mathcal{E}}_x}{\sigma^2} = \dfrac{\text{energy per dimension}}{\text{noise energy per dimension}}$

    Constellation Figure of Merit (CFM):

    $\zeta_x = \dfrac{(d_{\min}/2)^2}{\bar{\mathcal{E}}_x}$

    $\zeta_x$ measures the quality of a constellation used with an AWGN channel

    As $\zeta_x$ increases, we get better performance in AWGN (for the same number of bits/dim. only)

    Fair comparison between constellations:

    Make a multi-parameter comparison across the measures introduced above (e.g., $\bar{b}$, $\bar{\mathcal{E}}_x$, $d_{\min}$, $\zeta_x$)

  • Signal Constellations

    Cubic constellations:

    $x = \sum_{k=0}^{N-1} u_k\,\varphi_k$,

    where N is the number of dimensions,

    $u_k \in \{0, 1\}$ depending on the bit sequence, and $M = 2^N$

  • Orthogonal Constellations: $\langle x_i, x_j \rangle = 0$ for $i \neq j$

    Example: Bi-orthogonal (antipodal) signal set: for every signal, its negative is also included, giving $M = 2N$ signal points

    Circular Constellations: signal points placed at the M-th roots of unity

  • Example 1: Quadrature Phase-Shift Keying (QPSK)

    The constellation consists of the M = 4 points $x_i = \sqrt{\dfrac{\mathcal{E}_x}{2}}\,(\pm 1, \pm 1)$, i.e.,

    $\sqrt{\tfrac{\mathcal{E}_x}{2}}\,(-1, -1)$, $\sqrt{\tfrac{\mathcal{E}_x}{2}}\,(-1, +1)$, $\sqrt{\tfrac{\mathcal{E}_x}{2}}\,(+1, -1)$, $\sqrt{\tfrac{\mathcal{E}_x}{2}}\,(+1, +1)$

    $d_{\min}^2 = 2\mathcal{E}_x$, $\bar{\mathcal{E}}_x = \dfrac{\mathcal{E}_x}{2}$, hence

    $\zeta_x = \dfrac{(d_{\min}/2)^2}{\bar{\mathcal{E}}_x} = 1$

    In the particular case of BPSK, $d_{\min}^2 = 4\mathcal{E}_x$

  • What is the error probability of QPSK?

    $P_e = \sum_{i=0}^{3} p_X(i)\, P_{e \mid i} = P_{e \mid 0} = 1 - P_{c \mid 0}$

    $= 1 - \left(1 - Q\!\left(\tfrac{d_{\min}}{2\sigma}\right)\right)^2$

    $= 2\,Q\!\left(\tfrac{d_{\min}}{2\sigma}\right) - Q^2\!\left(\tfrac{d_{\min}}{2\sigma}\right)$

    $< 2\,Q\!\left(\tfrac{d_{\min}}{2\sigma}\right)$ = NNUB

    Notice how, for reasonably large SNR, the NNUB becomes tight
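
    A small numerical sketch comparing the exact QPSK error probability 2Q − Q² with the NNUB 2Q over a range of d_min/(2σ) values (the grid is an illustrative choice), showing how the bound tightens as the SNR grows:

      import numpy as np
      from math import erfc

      Q = lambda u: 0.5 * erfc(u / np.sqrt(2))

      for arg in [0.5, 1.0, 2.0, 3.0, 4.0]:         # arg = d_min / (2 sigma)
          q = Q(arg)
          pe_exact = 2 * q - q**2                   # exact QPSK error probability
          nnub = 2 * q                              # nearest-neighbour union bound
          print(f"d/2sigma={arg}: exact={pe_exact:.3e}, NNUB={nnub:.3e}, ratio={nnub/pe_exact:.3f}")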

  • Example 2: M-ary Phase-Shift Keying (MPSK) [ML Detector]

    $d_{\min} = 2\sqrt{\mathcal{E}_x}\,\sin\!\left(\tfrac{\pi}{M}\right)$, $N_e = 2$

    $\left(\tfrac{d_{\min}}{2}\right)^2 = \mathcal{E}_x \sin^2\!\left(\tfrac{\pi}{M}\right)$, $\zeta_x = 2\sin^2\!\left(\tfrac{\pi}{M}\right)$

    $P_e < N_e\, Q\!\left(\tfrac{d_{\min}}{2\sigma}\right) = 2\,Q\!\left(\tfrac{\sqrt{\mathcal{E}_x}}{\sigma}\,\sin\tfrac{\pi}{M}\right)$

  • Lattice-based Constellations

    A lattice is a regular arrangement of points in an N-dimensional space: $\Lambda = \{\, G\,m : m \in \mathbb{Z}^N \,\}$,

    where G is called the generator matrix

    Ex: the integer lattice $\mathbb{Z}^N$ corresponds to the case $G = I$

    The Pulse Amplitude Modulation (PAM) constellation corresponds to N = 1; with spacing d between adjacent points:

    $x_i \in \left\{ \pm\tfrac{d}{2},\ \pm\tfrac{3d}{2},\ \dots,\ \pm\tfrac{(M-1)\,d}{2} \right\}$,

    $\mathcal{E}_x = \dfrac{d^2}{12}\,(M^2 - 1)$, $\zeta_x = \dfrac{3}{M^2 - 1}$

    It can be easily shown that the probability of error of an ML detector is:

    $P_e = \dfrac{M-2}{M}\cdot 2\,Q\!\left(\dfrac{d}{2\sigma}\right) + \dfrac{2}{M}\, Q\!\left(\dfrac{d}{2\sigma}\right)$

    $= 2\left(1 - \dfrac{1}{M}\right) Q\!\left(\dfrac{d}{2\sigma}\right)$
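
    A sketch that checks the M-PAM expression P_e = 2(1 − 1/M) Q(d/(2σ)) against a Monte Carlo simulation (M, d, σ, and the trial count are illustrative):

      import numpy as np
      from math import erfc

      rng = np.random.default_rng(4)
      Q = lambda u: 0.5 * erfc(u / np.sqrt(2))

      M, d, sigma, n = 8, 2.0, 0.7, 200_000
      points = d * (np.arange(M) - (M - 1) / 2)        # {+-d/2, +-3d/2, ...}

      idx = rng.integers(0, M, n)
      y = points[idx] + sigma * rng.standard_normal(n)

      # ML detection: nearest constellation point
      idx_hat = np.argmin(np.abs(y[:, None] - points[None, :]), axis=1)

      pe_sim = np.mean(idx_hat != idx)
      pe_theory = 2 * (1 - 1 / M) * Q(d / (2 * sigma))
      print(pe_sim, pe_theory)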

  • Other lattice-based constructions

    Quadrature Amplitude Modulation (QAM): a cookie-slice of the 2-dimensional integer lattice. Other constellations are carved out of other lattices (e.g., hexagonal lattices, etc.)

    Other performance measures of interest that may appear in the course:

    Coding gain: $\gamma = \dfrac{\zeta_{x,1}}{\zeta_{x,2}}$ (ratio of the CFMs of two constellations)

    Shaping gain of a lattice

    Peak-to-average ratio

  • Passband Systems

    Passband transmission is centered at a carrier frequency $f_c$. Examples: TV broadcast, cellular and cordless phones, etc.

  • Equivalent representations

    The carrier-modulated signal x(t) is given by: $x(t) = r(t)\cos(\omega_c t + \theta(t))$

    Quadrature decomposition: $x(t) = x_I(t)\cos(\omega_c t) - x_Q(t)\sin(\omega_c t)$

    Hence:

    $r(t) = \sqrt{x_I^2(t) + x_Q^2(t)}$, $\theta(t) = \tan^{-1}\!\left(\dfrac{x_Q(t)}{x_I(t)}\right)$

    $x_I(t) = r(t)\cos\theta(t)$, $x_Q(t) = r(t)\sin\theta(t)$

    (Complex) baseband-equivalent signal:

    $x_{bb}(t) = x_I(t) + j\,x_Q(t)$ (note that there is no reference to $\omega_c$)

    Analytic (passband) equivalent signal:

    $x_A(t) = x_{bb}(t)\,e^{j\omega_c t} = x(t) + j\,\check{x}(t)$

    $= \left[x_I(t)\cos(\omega_c t) - x_Q(t)\sin(\omega_c t)\right] + j\left[x_I(t)\sin(\omega_c t) + x_Q(t)\cos(\omega_c t)\right]$

    What is $\check{x}(t)$?

  • $\check{x}(t)$ is the Hilbert transform of $x(t)$:

    $\check{x}(t) = x(t) * h(t)$, where $h(t) = \begin{cases} \dfrac{1}{\pi t}, & t \neq 0 \\ 0, & t = 0 \end{cases}$

    Letting $\check{x}(t) = \operatorname{Im}\{x_A(t)\}$, notice that:

    $x_A(t) = x_{bb}(t)\,e^{j\omega_c t} = \left(x_I(t) + j\,x_Q(t)\right)\left(\cos\omega_c t + j\sin\omega_c t\right)$

    so that $\check{x}(t) = x_I(t)\sin(\omega_c t) + x_Q(t)\cos(\omega_c t)$

  • Summarizing, there are four equivalent representations for $x(t) = r(t)\cos(\omega_c t + \theta(t))$:

    1. Magnitude and phase: $r(t)$, $\theta(t)$

    2. In-phase and quadrature phase: $x_I(t)$, $x_Q(t)$

    3. Complex baseband-equivalent signal: $x_{bb}(t)$

    4. Analytic signal: $x_A(t)$
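
    A sketch relating these representations numerically, using scipy.signal.hilbert to obtain the analytic signal (the carrier frequency, I/Q waveforms, and sampling parameters are illustrative choices):

      import numpy as np
      from scipy.signal import hilbert

      fs, fc = 8000, 400.0
      t = np.arange(0, 0.25, 1 / fs)

      # Slowly varying in-phase/quadrature components (bandwidth << fc)
      xI = np.cos(2 * np.pi * 5 * t)
      xQ = 0.5 * np.sin(2 * np.pi * 3 * t)

      x = xI * np.cos(2 * np.pi * fc * t) - xQ * np.sin(2 * np.pi * fc * t)  # passband x(t)

      xA = hilbert(x)                                  # analytic signal x(t) + j*xhat(t)
      xbb = xA * np.exp(-2j * np.pi * fc * t)          # complex baseband equivalent
      r, theta = np.abs(xbb), np.angle(xbb)            # magnitude and phase representation

      # The recovered I/Q components are close to the originals (up to edge effects)
      print(np.max(np.abs(xbb.real - xI)[100:-100]), np.max(np.abs(xbb.imag - xQ)[100:-100]))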

  • Frequency Analysis

    Let us assume that the signal x(t) is bandlimited around the carrier, so that $X(\omega) = 0$ for $\left||\omega| - \omega_c\right| > W$

    If $\mathcal{F}\{\cdot\}$ denotes the Fourier Transform, then:

    $X(\omega) = \mathcal{F}\{x(t)\} = \mathcal{F}\left\{x_I(t)\cos(\omega_c t) - x_Q(t)\sin(\omega_c t)\right\}$

    $= \tfrac{1}{2}\left[X_I(\omega - \omega_c) + X_I(\omega + \omega_c)\right] + \tfrac{j}{2}\left[X_Q(\omega - \omega_c) - X_Q(\omega + \omega_c)\right]$

  • Now consider the Hilbert transform filter in the frequency domain:

    $H(\omega) = \mathcal{F}\{h(t)\} = \mathcal{F}\left\{\dfrac{1}{\pi t}\right\} = -j\,\mathrm{sign}(\omega)$

    Hence, $\check{X}(\omega) = \mathcal{F}\{\check{x}(t)\} = -j\,\mathrm{sign}(\omega)\,X(\omega)$

    The Hilbert Transform only affects the phase but not the magnitude: $|\check{X}(\omega)| = |{-j}\,\mathrm{sign}(\omega)|\cdot|X(\omega)| = |X(\omega)|$

    Therefore, $X_A(\omega) = X(\omega) + j\,\check{X}(\omega) = X(\omega) + j\left(-j\,\mathrm{sign}(\omega)\,X(\omega)\right) = \left(1 + \mathrm{sign}(\omega)\right) X(\omega)$

  • Fourier Relationships

  • Channel Input/Output Relationships

    Given a passband channel with impulse response h(t), we can write the channel output as: $Y(\omega) = H(\omega)\,X(\omega)$

    For the analytic signals, $Y_A(\omega) = (1 + \mathrm{sign}\,\omega)\,H(\omega)\,X(\omega) = H(\omega)\,X_A(\omega) = \tfrac{1}{2}\,H_A(\omega)\,X_A(\omega)$,

    since $X_A(\omega) \neq 0$ only for $\omega > 0$, where $\tfrac{1}{2}(1 + \mathrm{sign}\,\omega) = 1$

    Similarly, $Y_{bb}(\omega) = \tfrac{1}{2}\,H_{bb}(\omega)\,X_{bb}(\omega)$

  • Representation of Passband Channels

  • Baseband equivalent channels

    Representation with baseband signals

  • Summary of signal representations

    Passband: $x(t) = \operatorname{Re}\{x_A(t)\} = \tfrac{1}{2}\left(x_A(t) + x_A^*(t)\right)$

    Analytic equivalent:

    $x_A(t) = x(t) + j\,\check{x}(t) = x_{bb}(t)\,e^{j\omega_c t}$, with $X_A(\omega) = \left(1 + \mathrm{sign}\,\omega\right) X(\omega)$, i.e., $X(\omega) = \tfrac{1}{2}\left[X_A(\omega) + X_A^*(-\omega)\right]$

    Baseband:

    $x_{bb}(t) = x_A(t)\,e^{-j\omega_c t} = x_I(t) + j\,x_Q(t)$

  • Baseband equivalent Gaussian noise

    Consider the WSS noise process n(t) with autocorrelation $r_n(\tau)$:

    $r_n(\tau) = E\left[n(t)\,n(t - \tau)\right]$

    with a PSD given by:

    $S_n(\omega) = \mathcal{F}\{r_n(\tau)\} = \begin{cases} \dfrac{N_0}{2}, & \omega_c - W < |\omega| < \omega_c + W \\ 0, & \text{otherwise} \end{cases}$

    (bandlimited noise spectral density)

  • Similarly to what we did before, let: $n_A(t) = n(t) + j\,\check{n}(t)$

    where $\check{N}(\omega) = -j\,\mathrm{sign}(\omega)\,N(\omega)$

    Since $|{-j}\,\mathrm{sign}(\omega)|^2 = 1$, this implies that $S_{\check{n}}(\omega) = S_n(\omega)$, and hence $r_{\check{n}}(\tau) = r_n(\tau)$

    Consider now the cross-correlations between $n$ and $\check{n}$:

    $E\left[\check{n}(t)\,n(t - \tau)\right] = \check{r}_n(\tau)$

    $E\left[n(t)\,\check{n}(t - \tau)\right] = -\check{r}_n(\tau)$

    Hence, the following holds:

    $r_{n_A}(\tau) = E\left[n_A(t)\,n_A^*(t - \tau)\right] = r_n(\tau) + r_{\check{n}}(\tau) + j\left(\check{r}_n(\tau) + \check{r}_n(\tau)\right)$

    Therefore, we get that: $r_{n_A}(\tau) = 2\left(r_n(\tau) + j\,\check{r}_n(\tau)\right)$

    which implies that: $S_{n_A}(\omega) = 2\left(1 + \mathrm{sign}(\omega)\right) S_n(\omega) = \begin{cases} 4\,S_n(\omega), & \omega > 0 \\ 0, & \omega < 0 \end{cases}$

  • Hence:

    Since $n_{bb}(t) = n_A(t)\,e^{-j\omega_c t}$, we have that $S_{n_{bb}}(\omega) = S_{n_A}(\omega + \omega_c)$, yielding:

    $S_{n_{bb}}(\omega) = \begin{cases} 2 N_0, & |\omega| < W \\ 0, & \text{otherwise} \end{cases}$

    Problem: the baseband noise has double the energy compared to the passband noise

    Here is where the factor 2 shows up

  • The factor of 2 (cont'd)

    A similar issue occurs with deterministic modulated signals. Consider the QAM (complex) baseband signal:

    $x_{bb}(t) = \sqrt{2}\,\varphi(t)\left(a_1 + j\,a_2\right) = \sqrt{2}\,a_1\,\varphi(t) + j\,\sqrt{2}\,a_2\,\varphi(t)$, with $a_1, a_2 \in \{+1, -1\}$

    and the corresponding two QAM passband basis functions:

    $\varphi_1(t) = \sqrt{2}\,\varphi(t)\cos(\omega_c t)$, $\varphi_2(t) = -\sqrt{2}\,\varphi(t)\sin(\omega_c t)$

    The modulated signal is: $x(t) = a_1\,\varphi_1(t) + a_2\,\varphi_2(t) = \sqrt{2}\,a_1\,\varphi(t)\cos(\omega_c t) - \sqrt{2}\,a_2\,\varphi(t)\sin(\omega_c t)$

    It can be checked that if $\varphi(t)$ is normalized, i.e., $\|\varphi\| = 1$, then: $\|\varphi_1\| = \|\varphi_2\| = 1$

    Thus, under modulation, the factor $\sqrt{2}$ is needed.

  • Let us verify it:

    Representation of additive noise channel

  • Scaling is necessary only for analytical convenience, since the SNR at the receiver is not changed (the signal is processed after it is received).

  • Circularly symmetric complex Gaussian processes

    Let $Z = Z_R + j\,Z_I$, where $Z_R$ and $Z_I$ denote the real and imaginary components, respectively.

    Z is a complex Gaussian ⟺ $Z_R$, $Z_I$ are jointly Gaussian

    Thinking of $\begin{pmatrix} Z_R \\ Z_I \end{pmatrix}$ as a vector, its covariance matrix is:

    $K = \begin{pmatrix} \sigma_R^2 & \sigma_{RI} \\ \sigma_{IR} & \sigma_I^2 \end{pmatrix}$

    but since $\sigma_{RI} = \sigma_{IR}$, there are only three degrees of freedom.

    Notice however that: $E\left[|Z|^2\right] = E\left[Z Z^*\right] = \sigma_R^2 + \sigma_I^2$

    thus, this alone is not sufficient to specify the Gaussian random variable Z.

    Consider also the quantity: $E\left[Z Z\right] = \sigma_R^2 - \sigma_I^2 + 2j\,\sigma_{RI}$

    which gives the two missing degrees of freedom

    A random variable Z is circularly symmetric iff $E[Z Z] = 0$, which implies:

    $\sigma_{RI} = 0$ and $\sigma_R^2 = \sigma_I^2$

    $\Rightarrow\ K = \dfrac{E\left[|Z|^2\right]}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$  (2D circular)

  • The same concept can be generalized easily to vectors:

    $Z = [Z_1, \dots, Z_N]^T$, $Z = Z_R + j\,Z_I$

    Z is a circularly symmetric complex Gaussian random vector iff $E\left[(Z - \mu)(Z - \mu)^T\right] = 0$

    The probability density function (pdf) for circularly symmetric complex Gaussian vectors is:

    $p_Z(z) = \dfrac{1}{\pi^N \det(K_Z)}\,\exp\!\left(-(z - \mu)^H K_Z^{-1} (z - \mu)\right)$

    where $\mu = E[Z]$ and $K_Z = E\left[(Z - \mu)(Z - \mu)^H\right]$

    How is hypothesis testing done in the complex domain?

  • Gaussian Hypothesis Testing - Complex case

    Consider the usual problem: $y = x + z$, where $y, x \in \mathbb{C}^N$ and $z$ is a circularly symmetric complex Gaussian vector with

    $K_Z = \sigma^2 I$, $E[z z^T] = 0$

    so that $p_Z(z) = \dfrac{1}{(\pi\sigma^2)^N}\,\exp\!\left(-\dfrac{\|z\|^2}{\sigma^2}\right)$, where the inner product is $\langle u, v \rangle = v^H u$

    Regarding the ML Detector: $\hat{m}_{ML} = \arg\max_i\, \exp\!\left(-\dfrac{\|y - x_i\|^2}{\sigma^2}\right)$

    $= \arg\max_i\, \left(2\operatorname{Re}\langle y, x_i \rangle - \|x_i\|^2\right) = \arg\min_i\, \|y - x_i\|^2$