Review of Signals and Detection

A broad overview of signal detection and processing.


  • Review of Signals and Detection

  • Introduction

    Data Modulation and Demodulation

    Data Modulation converts information bits into waveforms or signals suitable for transmission over communication channels

    Data Detection reverses the modulation, i.e., it finds which bits were transmitted over the noisy channel

  • Illustrative Simple Example

    Suppose we have a passband channel

    Since DC does not pass through the channel, mapping the binary bits to 0 V and 1 V, respectively, will NOT work

    However, one can use a simple Binary Phase Shift Keying (BPSK) modulation:

    $x_0(t) = \cos(2\pi \cdot 150\, t)$;  $x_1(t) = -\cos(2\pi \cdot 150\, t)$

    Detection: detect +1 or -1 at the output (and then map it back to bits)

    Caveat: this is for a single transmission (we will look at successive transmissions very soon)
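
    A minimal numerical sketch of this BPSK example (assuming a 150 Hz carrier and a simple correlator receiver; the sampling rate and noise level are illustrative choices, not from the slides):

      import numpy as np

      rng = np.random.default_rng(0)
      fs, T, fc = 10_000, 0.1, 150.0            # sample rate, symbol duration, carrier (Hz)
      t = np.arange(0, T, 1 / fs)
      carrier = np.cos(2 * np.pi * fc * t)

      bit = 1                                   # bit to send: 0 -> +cos, 1 -> -cos
      x = (1 - 2 * bit) * carrier               # BPSK waveform
      y = x + 0.5 * rng.standard_normal(t.size) # noisy channel output (no DC path needed)

      # Correlate against the carrier and detect the sign (+1 or -1), then map back to a bit
      stat = np.dot(y, carrier)
      bit_hat = 0 if stat > 0 else 1
      print(bit, bit_hat)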

  • Mapping of vectors to waveforms: Consider the set of real-valued functions $\{f(t)\}$, $t \in [0, T]$, such that $\int_0^T |f(t)|^2\,dt < \infty$

    This is called a Hilbert space of continuous functions, i.e., $L^2[0, T]$

    Measure of distance (inner product): $\langle f, g \rangle = \int_0^T f(t)\,g(t)\,dt$

    Basis functions:

    A class of functions can be expressed in terms of basis functions $\{\varphi_n(t)\}$ as:

    $x(t) = \sum_{n=0}^{N-1} x_n\,\varphi_n(t)$, where $x_n = \langle x(t), \varphi_n(t) \rangle$

    This waveform carries the information contained in the coefficients $\{x_n\}$ through the communication channel

    Thus this relationship implies a mapping between the vector $\mathbf{x} = (x_0, \dots, x_{N-1})$ and the waveform $x(t)$:  $\mathbf{x} \leftrightarrow x(t)$
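
    A small sketch of this basis expansion and coefficient mapping, using a discretized inner product on [0, T] (the cosine/sine basis below is an illustrative choice):

      import numpy as np

      fs, T = 1000, 1.0
      t = np.arange(0, T, 1 / fs)
      dt = 1 / fs

      # Orthonormal basis on [0, T]: normalized cosine and sine
      phi = np.vstack([np.sqrt(2 / T) * np.cos(2 * np.pi * t / T),
                       np.sqrt(2 / T) * np.sin(2 * np.pi * t / T)])

      x_vec = np.array([1.5, -0.7])                 # coefficients (the vector x)
      x_t = x_vec @ phi                             # waveform x(t) = sum_n x_n phi_n(t)

      # Recover the coefficients by projection: x_n = <x(t), phi_n(t)>
      x_rec = phi @ x_t * dt
      print(np.round(x_rec, 3))                     # ~ [1.5, -0.7]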

  • Signal Constellation: A set of M vectors $\{x_i\}$, $i = 0, \dots, M-1$, is called a signal constellation.

    Examples of Signal Constellations: Binary antipodal, QPSK (Quadrature Phase Shift Keying)

    The previous mapping lets us identify points of $L^2[0, T]$ with vectors in $\mathbb{R}^N$ and work with the geometric properties of $\mathbb{R}^N$

  • Vector Mapper

    Mapping of a binary vector into one of the signal points

    The mapping is not arbitrary (good choices, e.g., Gray labelling, lead to better performance over noisy channels)

    Intuitively good idea: label the points so that points that are close in Euclidean distance are also close in Hamming distance

    Why? (Hint: think of Gray labelling)

  • Modulator: Implements the basis expansion

    Signal Set: the set of modulated waveforms $\{x_i(t)\}$, $i = 0, \dots, M-1$, corresponding to the signal constellation $\{x_i\}$, where $x_i(t) = \sum_{n=0}^{N-1} x_{i,n}\,\varphi_n(t)$

  • Average Energy: $\mathcal{E}_x = E\left[\|x\|^2\right] = \sum_{i=0}^{M-1} \|x_i\|^2\, p_X(i)$

    where $p_X(i)$ is the probability of choosing $x_i$. The probability $p_X(i)$ depends on:

    the underlying probability distribution of bits in the message source

    the vector mapper

    Average power: $P_x = \mathcal{E}_x / T$ (energy per unit time)
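
    A short sketch of these definitions for an illustrative 4-point (4-PAM) constellation with non-uniform priors (all values chosen here only for illustration):

      import numpy as np

      x = np.array([-3.0, -1.0, 1.0, 3.0])      # signal points x_i (4-PAM)
      p = np.array([0.1, 0.4, 0.4, 0.1])        # priors p_X(i), from source statistics + vector mapper
      T = 1e-3                                  # symbol duration in seconds (illustrative)

      E_x = np.sum(p * x**2)                    # average energy  E_x = sum_i p_X(i) ||x_i||^2
      P_x = E_x / T                             # average power   P_x = E_x / T
      print(E_x, P_x)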

  • Example: Consider a 16 QAM constellation with basis functions (cosine/sine pulses):

    Gram-Schmidt procedure allows the choice of a minimal basis to represent the signal sets {xi} (this might be reviewed in some exercise)

  • Demodulation

    Takes the continuous-time waveform and extracts the discrete version

    Demodulation extracts the coefficients of the expansion by projecting the signal on its basis

    Thus, in the noiseless case, demodulation is just recovering the coefficients of the basis functions

  • Matched Filter

    An operation equivalent to the recovery of the coefficients of the basis expansion: filtering the received signal with $\varphi_n(T - t)$ and sampling the output at $t = T$ yields $y_n = \langle y(t), \varphi_n(t) \rangle$
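
    A sketch showing that filtering with the time-reversed basis function and sampling at t = T gives the same coefficient as the direct inner product (discretized; the sine-shaped basis function is an illustrative choice):

      import numpy as np

      fs, T = 1000, 1.0
      t = np.arange(0, T, 1 / fs)
      dt = 1 / fs

      phi = np.sqrt(2 / T) * np.sin(np.pi * t / T)   # one unit-energy basis function on [0, T]
      y = 0.8 * phi + 0.05 * np.random.default_rng(1).standard_normal(t.size)

      # Demodulator as projection: y_n = <y(t), phi(t)>
      y_proj = np.sum(y * phi) * dt

      # Matched filter: convolve y with phi(T - t) and sample the output at t = T
      mf_out = np.convolve(y, phi[::-1]) * dt
      y_mf = mf_out[len(t) - 1]                      # sample index corresponding to t = T

      print(round(y_proj, 4), round(y_mf, 4))        # the two values coincide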

  • Model so far

  • Data Detection

    Assume that the demodulator captures the essential (discrete) information about x from y(t) (the notion of "essential" will be clarified soon)

    We can consider an equivalent discrete channel

    In the discrete domain the channel is described by $p_{Y|X}(y \mid x_i)$

    Example: $y = x + z$, so $p_{Y|X}(y \mid x) = p_Z(y - x)$

    Criterion for detection: detection is guessing the input $x_i$ given the noisy output $y$. Denote the detection decision by a function $\hat{m} = H(y)$. If $M = m_i$ was the message sent, then the conditional error probability is

    $P_{e \mid M = m_i} = \Pr\left(H(y) \neq m_i \mid M = m_i\right)$

  • Optimum detector: Minimizes error probability over all detectors

    The probability of observing $Y = y$ if the message $m_i$ was sent is denoted $p_{Y|X}(y \mid x_i)$

    Decision Rule: $H : \mathcal{Y} \to \mathcal{M}$ is a function which takes an observed realization $y$ and outputs a guess of the transmitted message.

    The (deterministic) detector $H(y)$ divides the space $\mathbb{R}^N$ into M regions (M hypotheses).

  • Then, it follows that the probability of a correct decision is $\Pr(\text{correct}) = \int p_{X|Y}\!\left(H(y) \mid y\right) p_Y(y)\, dy$

    Implication: The decision rule $H_{MAP}(y) = \arg\max_i\, p_{X|Y}(x_i \mid y)$

    maximizes the probability of being correct, i.e., minimizes the error probability. This optimal decision rule is called the Maximum-a-posteriori (MAP) decision rule

  • Some comments:

    The MAP detector needs knowledge of the priors $p_X(x_i)$

    It can be simplified as follows: $p_{X|Y}(x_i \mid y) = \dfrac{p_{Y|X}(y \mid x_i)\, p_X(x_i)}{p_Y(y)}$

    and since $p_Y(y)$ is common to all hypotheses, the MAP rule is equivalent to $H_{MAP}(y) = \arg\max_i\, p_{Y|X}(y \mid x_i)\, p_X(x_i)$
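
    A toy sketch of the MAP rule arg max_i p_{Y|X}(y|x_i) p_X(x_i) versus the ML rule arg max_i p_{Y|X}(y|x_i), for a made-up discrete channel with two messages and three output symbols (all numbers are illustrative):

      import numpy as np

      # p_{Y|X}(y | x_i): rows = messages i, columns = output symbols y
      p_y_given_x = np.array([[0.7, 0.2, 0.1],
                              [0.1, 0.3, 0.6]])
      p_x = np.array([0.9, 0.1])                        # strongly non-uniform priors

      for y in range(3):
          ml = np.argmax(p_y_given_x[:, y])             # ML: ignores the priors
          map_ = np.argmax(p_y_given_x[:, y] * p_x)     # MAP: weights by the priors
          print(f"y={y}: ML -> {ml}, MAP -> {map_}")
      # With these priors, MAP decides message 0 for every y, while ML decides 1 for y=1 and y=2.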

  • Maximum Likelihood Detector

    Let us assume that the priors are uniform, i.e., $p_X(x_i) = 1/M$. Then, the MAP rule becomes:

    $H_{ML}(y) = \arg\max_i\, p_{Y|X}(y \mid x_i)$

    The ML rule chooses the message that most likely caused the observation (ignoring how likely the message itself was)

    The ML rule is clearly inferior to MAP for non-uniform priors. Suppose now that the prior probabilities are unknown.

    Question: Is there a robust detection scheme? One can think of this as a game where nature chooses the prior distribution while the detection rule is under our control

  • Theorem: The ML detector minimizes the maximum possible average error probability when the prior distribution is unknown, provided the conditional probability of error

    $P_{e,ML \mid M = m_i} = \Pr\left(H_{ML}(y)\ \text{is incorrect} \mid M = m_i\right)$ is independent of $i$

    Hence, $P_{e,ML}(p_X) = \sum_i p_X(i)\, P_{e,ML \mid M = m_i} = P_{e,ML \mid M = m_0}$ for every prior $p_X$

    Therefore, $\max_{p_X} P_{e,ML}(p_X) = P_{e,ML \mid M = m_0}$

    For any decision rule (hypothesis test) $H$: $\max_{p_X} P_{e,H}(p_X) \ge P_{e,H}(\text{uniform}) \ge P_{e,ML}(\text{uniform}) = \max_{p_X} P_{e,ML}(p_X)$

    Thus, $\min_H \max_{p_X} P_{e,H}(p_X) = \max_{p_X} P_{e,ML}(p_X)$

    Interpretation: ML also has a canonical robustness for detection under uncertainty about the priors

  • AWGN Example

    In this case, we have simply: $y = x_i + z$

    where $z \sim \mathcal{N}(0, \sigma^2 I)$

    Hence, $p_{Y|X}(y \mid x_i) = \dfrac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\dfrac{\|y - x_i\|^2}{2\sigma^2}\right)$

    implying that: $\ln p_{Y|X}(y \mid x_i) = -\dfrac{N}{2}\ln(2\pi\sigma^2) - \dfrac{\|y - x_i\|^2}{2\sigma^2}$

    For the MAP rule, we maximize the conditional probability $p_{X|Y}(x_i \mid y)$

  • Thus, the MAP rule will be given by: $H_{MAP}(y) = \arg\min_i \left[\dfrac{\|y - x_i\|^2}{2\sigma^2} - \ln p_X(x_i)\right]$

    The ML rule will be given by: $H_{ML}(y) = \arg\min_i \|y - x_i\|^2$

    The ML detector selects the message that is closest in Euclidean distance to the received signal

    The MAP detector additionally includes a shift due to the prior probabilities (larger decision regions for more probable messages)
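
    A sketch of these two rules for a 2-D constellation in AWGN (QPSK-like points, priors, and noise level chosen only for illustration):

      import numpy as np

      rng = np.random.default_rng(2)
      X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)  # constellation {x_i}
      p_x = np.array([0.7, 0.1, 0.1, 0.1])                             # non-uniform priors
      sigma = 0.8

      i_true = 1
      y = X[i_true] + sigma * rng.standard_normal(2)

      d2 = np.sum((y - X) ** 2, axis=1)                     # squared distances ||y - x_i||^2
      i_ml = np.argmin(d2)                                  # ML: nearest point
      i_map = np.argmin(d2 / (2 * sigma**2) - np.log(p_x))  # MAP: distance shifted by the prior
      print(i_true, i_ml, i_map)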

  • Observation: In both the MAP and ML decision rules, one does not need $y$ itself, but just the distances $\|y - x_i\|^2$,

    $i = 0, \dots, M-1$

    There is no loss of information by retaining only $\{\|y - x_i\|^2\}_{i=0}^{M-1}$

    Sufficient Statistic: a function that retains the essential information about the parameter of interest

    Decision regions are given by:

    $D_i = \left\{\, y : \|y - x_i\|^2 < \|y - x_j\|^2,\ \forall j \neq i \,\right\}$

    These regions correspond to the Voronoi cells of the points $\{x_i\}_{i=0}^{M-1}$. Decision regions are separated by hyperplanes

    In the case of the MAP rule, the decision regions are separated by shifted hyperplanes

  • Minmax decision rule (Binary case)

    What if the prior distributions are not known? Let $p_0 = p_X(x_0)$ and $p_1 = 1 - p_0$

    For a decision rule H which does not depend on $p_X$, $P_{e,H}(p_0)$ is linear in $p_0$

    A robust decision rule H means the following: $\min_H \max_{p_0} P_{e,H}(p_0)$

    Clearly, for a given decision rule H which does not depend on $p_X$,

    $\max_{p_0} P_{e,H}(p_0) = \max\left\{\Pr(\hat{m} = 1 \mid m_0),\ \Pr(\hat{m} = 0 \mid m_1)\right\}$

  • This would be, for instance, the case of an ML decision rule (since it does not depend on $p_X(x)$)

  • Let us look now at the MAP rule for every choice of $p_0$, denoting $V(p_0) = P_{e,MAP}(p_0)$

    The MAP rule does depend on $p_X(x)$, so $V(p_0) = P_{e,MAP}(p_0)$ is not a linear function (homework problem)

    $V(p_0)$ can be shown to be concave with a unique maximizer $p_0^*$

    $p_0^*$ is the worst prior for the MAP rule

  • Since MAP is the optimal decision rule,

    $P_{e,H}(p_0) \ge P_{e,MAP}(p_0) = V(p_0)$ for each $p_0$. Thus, the line $P_{e,H}(p_0)$ lies above the curve $V(p_0)$

    The best we can do is to make $P_{e,H}(p_0)$ tangential to $V(p_0)$ for some $p_0$: the decision rule H is then the MAP detector designed for that prior $p_0$

    To find $\min_H \max_{p_0} P_{e,H}(p_0)$ over all possible decision rules H, choose the horizontal tangent line (slope 0):

    $p_0 = p_0^*$  →  MAP rule designed for $p_0^*$  →  $P_{e,H}(p_0) = V(p_0^*)$

    (independent of $p_0$)

  • $P_{e,H}(p_0) = p_0 \Pr(\hat{m} = 1 \mid m_0) + (1 - p_0) \Pr(\hat{m} = 0 \mid m_1) = \Pr(\hat{m} = 1 \mid m_0) = \Pr(\hat{m} = 0 \mid m_1)$

    Performance for all priors is as bad as for the worst prior

    Hence, $P_{e,H \mid M = m_0} = P_{e,H \mid M = m_1}$ (same conditional error probability), and the average error probability $P_{e,H}$ is independent of $p_0$

    In the case that $p_0^* = 1/2$ (i.e., a uniform prior), the min-max rule is the ML rule (this happens in practice)

    If ML satisfies $P_{e,ML \mid M = m_0} = P_{e,ML \mid M = m_1}$, then the worst-case prior for the MAP rule is uniform and ML becomes the min-max rule
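
    A numerical sketch of V(p_0) = P_{e,MAP}(p_0) for two scalar Gaussian hypotheses (means ±1, unit noise standard deviation, both chosen for illustration), showing that it is concave and that the worst-case prior here is p_0 = 1/2, where MAP reduces to ML:

      import numpy as np
      from math import erfc

      Q = lambda u: 0.5 * erfc(u / np.sqrt(2))      # Gaussian tail function
      mu0, mu1, sigma = -1.0, 1.0, 1.0

      def pe_map(p0):
          # MAP threshold for N(mu0, s^2) vs N(mu1, s^2): decide m1 when y > tau
          tau = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * np.log(p0 / (1 - p0))
          return p0 * Q((tau - mu0) / sigma) + (1 - p0) * Q((mu1 - tau) / sigma)

      p0 = np.linspace(0.01, 0.99, 99)
      V = np.array([pe_map(p) for p in p0])
      print("worst-case prior p0* ~", p0[np.argmax(V)])     # ~0.5 by symmetry
      print(V.max(), Q((mu1 - mu0) / (2 * sigma)))          # equals the ML error probability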

  • Other decision rules (Bayes rule)

    Error probability is just one possible criterion for choosing a detector

    Detectors can, in general, minimize other cost functions

    Let $C_{i,j}$ denote the cost of choosing hypothesis $i$ when actually hypothesis $j$ was true

    The expected cost incurred by some decision rule H(y) when hypothesis $j$ was true is:

    $R_j(H) = \sum_i C_{i,j}\, \Pr\left(H(y) = i \mid M = j\right)$

    The overall average cost (risk) is obtained by taking the prior probabilities into account:

    $R(H) = \sum_j p_X(j)\, R_j(H)$

    Question: What is the optimal decision rule to minimize the above risk?

    Notice that the error probability criterion corresponds to the cost assignment: $C_{i,j} = 1$ for $i \neq j$; $C_{i,j} = 0$ for $i = j$

  • Bayes rule for minimizing risk

    Considering the case M = 2, i.e., distinguishing between 2 hypotheses: $R(H) = p_0 R_0(H) + p_1 R_1(H)$

    where $R_j(H) = C_{0,j} \Pr(H = 0 \mid M = j) + C_{1,j} \Pr(H = 1 \mid M = j) = C_{0,j}\left[1 - \Pr(H = 1 \mid M = j)\right] + C_{1,j} \Pr(H = 1 \mid M = j)$, $j = 0, 1$

    Operating:

    $R(H) = p_0 C_{0,0} + p_1 C_{0,1} + p_0 (C_{1,0} - C_{0,0}) \Pr(H = 1 \mid M = 0) + p_1 (C_{1,1} - C_{0,1}) \Pr(H = 1 \mid M = 1)$

    $= \sum_{j=0}^{1} p_j C_{0,j} + \sum_{j=0}^{1} p_j (C_{1,j} - C_{0,j}) \int_{D_1} p_{Y|M}(y \mid j)\, dy$

  • $R(H) = \sum_{j=0}^{1} p_j C_{0,j} + \int_{D_1} \sum_{j=0}^{1} p_j (C_{1,j} - C_{0,j})\, p_{Y|M}(y \mid j)\, dy$

    To minimize R(H), collect in $D_1$ all the regions where the integrand is negative:

    $\sum_{j=0}^{1} p_j (C_{1,j} - C_{0,j})\, p_{Y|M}(y \mid j) = p_0 (C_{1,0} - C_{0,0})\, p_{Y|M}(y \mid 0) + p_1 (C_{1,1} - C_{0,1})\, p_{Y|M}(y \mid 1)$

    Hence

    $D_1 = \left\{ y : p_0 (C_{1,0} - C_{0,0})\, p_{Y|M}(y \mid 0) + p_1 (C_{1,1} - C_{0,1})\, p_{Y|M}(y \mid 1) < 0 \right\}$

    $= \left\{ y : \dfrac{p_{Y|M}(y \mid 1)}{p_{Y|M}(y \mid 0)} > \dfrac{p_0 (C_{1,0} - C_{0,0})}{p_1 (C_{0,1} - C_{1,1})} \right\} = \left\{ y : L(y) > \eta \right\}$

    where $\eta = \dfrac{p_0 (C_{1,0} - C_{0,0})}{p_1 (C_{0,1} - C_{1,1})}$ and $L(y) = \dfrac{p_{Y|M}(y \mid 1)}{p_{Y|M}(y \mid 0)}$ (Likelihood Ratio)

    MAP corresponds to choosing $C_{0,1} = C_{1,0} = 1$ and $C_{0,0} = C_{1,1} = 0$, i.e., $\eta = \dfrac{p_0}{p_1}$

    (min. average error prob.)
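
    A sketch of this likelihood-ratio threshold test, using two scalar Gaussian hypotheses and an arbitrary cost assignment (all numbers illustrative); with 0/1 costs it reduces to the MAP threshold p_0/p_1:

      import numpy as np

      def gauss(y, mu, sigma):
          return np.exp(-(y - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

      p0, p1, sigma = 0.6, 0.4, 1.0
      C = np.array([[0.0, 5.0],     # C[i, j] = cost of deciding i when j is true
                    [1.0, 0.0]])

      eta = p0 * (C[1, 0] - C[0, 0]) / (p1 * (C[0, 1] - C[1, 1]))   # threshold
      y = 0.3                                                        # an observation
      L = gauss(y, +1.0, sigma) / gauss(y, -1.0, sigma)              # likelihood ratio L(y)
      decision = 1 if L > eta else 0
      print(f"L(y)={L:.3f}, eta={eta:.3f}, decide m{decision}")

      # 0/1 costs recover the MAP threshold p0/p1
      C01 = np.array([[0.0, 1.0], [1.0, 0.0]])
      eta_map = p0 * (C01[1, 0] - C01[0, 0]) / (p1 * (C01[0, 1] - C01[1, 1]))
      print(eta_map, p0 / p1)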

  • Irrelevance and Sufficient Statistic

    An observable Y may contain data that is irrelevant for the detection problem at hand

    But how do we decide what is superfluous (irrelevant) for the detection process?

    Consider the following example: $Y_1 = X + Z_1$, $Y_2 = Y_1 + Z_2$

    Notice that we have a Markov chain: $X \to Y_1 \to Y_2$

    If $Z_1$ and $Z_2$ are statistically independent and independent of X, then $Y_2$ is irrelevant for detecting X: the MAP detector using only $Y_1$ has the same performance as the one using both $Y = [Y_1, Y_2]^T$

    Theorem: If $Y = [Y_1, Y_2]^T$ and $X \to Y_1 \to Y_2$ forms a Markov chain, then $Y_2$ is irrelevant for the detection of X

  • Let T(Y) be a function (either stochastic or deterministic) of Y

    Definition: A function T(Y) of an observable Y is a sufficient statistic for X if $X \to T(Y) \to Y$ forms a Markov chain

    Notice that if T(Y) is a sufficient statistic, then indeed $P_{X \mid T(Y), Y} = P_{X \mid T(Y)}$: a MAP detector observing T(Y) achieves the same average error probability as one observing Y

    In the previous example, $T(Y) = Y_1$

    Remember the MAP rule we derived for the AWGN example with two hypotheses:

    we only need to know T(Y) to decide whether $\hat{m} = 0$ or $\hat{m} = 1$ (Y becomes irrelevant once we know T(Y))

  • Reversibility Theorem: The application of an invertible mapping on the channel output vector y does not affect the performance of the MAP detector.

    Let $y_2$ be the channel output and $y_1 = G(y_2)$, where $G(\cdot)$ is an invertible map

    Then $y_2 = G^{-1}(y_1)$, which implies that $p_{X \mid y_1, y_2} = p_{X \mid y_1}$. Thus, applying the Irrelevance Theorem, we can drop $y_2$

  • Example: Continuous additive white Gaussian noise channel

    Let's go through the entire detection chain for a continuous-time (waveform) channel

    Our channel is given by: $y(t) = x(t) + z(t)$, $t \in [0, T]$

    Additive White Gaussian Noise: the process z(t) is Gaussian and white, which means:

    $E[z(t)\,z(s)] = \dfrac{N_0}{2}\,\delta(t - s)$

    Consider the vector channel representation:

    $x(t) = \sum_{n=0}^{N-1} x_n\,\varphi_n(t)$

    where $\{\varphi_n(t)\}_{n=0}^{N-1}$ is an orthonormal basis in $L^2[0, T]$

    $y(t) = \sum_{n=0}^{N-1} x_n\,\varphi_n(t) + z(t)$

  • To define the equivalent vector channel, consider the inner products: $y_n = \langle y(t), \varphi_n(t) \rangle$, $z_n = \langle z(t), \varphi_n(t) \rangle$, $n = 0, \dots, N-1$

    $y_n = x_n + z_n$, $n = 0, \dots, N-1$

    Notice that:

    $y(t) \neq \sum_{n=0}^{N-1} y_n\,\varphi_n(t)$ in general (the noise is not confined to the signal space)

    Lemma (Uncorrelated noise samples): Given any orthonormal basis $\{\varphi_n(t)\}$ and white Gaussian noise z(t), the coefficients $z_n = \langle z(t), \varphi_n(t) \rangle$ of the basis expansion are i.i.d. Gaussian with variance

    $\sigma^2 = \dfrac{N_0}{2}$

    Thus, if we extend the basis $\{\varphi_n(t)\}$ to span z(t), the coefficients of the resulting expansion will be independent of the rest of the coefficients

    $y(t) = x(t) + z(t) = \sum_{n=0}^{N-1} (x_n + z_n)\,\varphi_n(t) + \tilde{z}(t)$

    where $\tilde{z}(t)$ is the component of the noise orthogonal to the signal space

    In the vector expansion, $y$ is the vector containing the basis coefficients $y_n = x_n + z_n$. The term $\tilde{z}(t)$ can be dropped: the vector model $y = x + z$ is sufficient for detection purposes

  • This equivalence allows us to develop MAP and ML rules as we have done before

  • Canonical example: Binary constellation error probability

    In the simplest case of M = 2 equally likely signal points $x_0$ and $x_1$ (ML detection):

    The conditional error probability is given by: $P_{e \mid M = 0} = \Pr\left(\left\langle z, \dfrac{x_1 - x_0}{\|x_1 - x_0\|}\right\rangle > \dfrac{\|x_1 - x_0\|}{2}\right)$

    Notice that $\left\langle z, \dfrac{x_1 - x_0}{\|x_1 - x_0\|}\right\rangle$ is Gaussian with mean 0 and variance $\sigma^2$. Thus:

    $P_e = p_0\, P_{e \mid M = 0} + p_1\, P_{e \mid M = 1} = Q\!\left(\dfrac{\|x_1 - x_0\|}{2\sigma}\right)$
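
    A quick Monte Carlo sketch checking P_e = Q(||x_1 − x_0||/(2σ)) for an arbitrary pair of 2-D points (the points, noise level, and trial count are illustrative):

      import numpy as np
      from math import erfc

      rng = np.random.default_rng(3)
      Q = lambda u: 0.5 * erfc(u / np.sqrt(2))

      x0, x1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
      sigma, n_trials = 1.0, 200_000

      bits = rng.integers(0, 2, n_trials)
      X = np.where(bits[:, None] == 0, x0, x1)
      Y = X + sigma * rng.standard_normal((n_trials, 2))

      # ML decision: nearest of the two points
      d0 = np.sum((Y - x0) ** 2, axis=1)
      d1 = np.sum((Y - x1) ** 2, axis=1)
      bits_hat = (d1 < d0).astype(int)

      pe_sim = np.mean(bits_hat != bits)
      pe_theory = Q(np.linalg.norm(x1 - x0) / (2 * sigma))
      print(pe_sim, pe_theory)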

  • Invariance Properties

    Theorem (Rotation Invariance): If all the data symbols $\{x_i\}_{i=0}^{M-1}$ are rotated by an

    orthogonal transformation, i.e., $\tilde{x}_i = Q\,x_i$, $i = 0, \dots, M-1$, where $Q^T Q = I$,

    then the average probability of error of the MAP/ML receiver remains unchanged over an AWGN channel

    $\tilde{y} = \tilde{x} + z = Q\,x + z$

    $Q^T \tilde{y} = x + Q^T z$

    But $Q^T z$ is Gaussian and

    $E\left[Q^T z\, z^T Q\right] = \sigma^2 Q^T Q = \sigma^2 I$, so $Q^T z$ is probabilistically equivalent to $z$ → the average probability of error is unchanged

    Translational Invariance: If a signal constellation is translated by a constant vector, i.e., $\tilde{x}_i = x_i + a$, $i = 0, \dots, M-1$, the average probability of error of the MAP/ML receiver remains unchanged over an AWGN channel

    Minimum energy translate: In order to get an equivalent minimum-energy constellation, we have to subtract the mean $E[x]$ from every signal point, obtaining a zero-mean constellation.

  • Union Bound, M > 2 [ML Detector]

    Assuming an ML detector for the AWGN channel, it can easily be seen that:

    $P_{e \mid M = i} \le \sum_{j \neq i} Q\!\left(\dfrac{\|x_j - x_i\|}{2\sigma}\right) \le (M - 1)\, Q\!\left(\dfrac{d_{\min}}{2\sigma}\right)$

    since $Q(\cdot)$ is a monotonically decreasing function

  • Nearest Neighbor Union Bound (NNUB) [ML Detector]

    Let $N_e$ be the (average) number of points sharing a decision boundary with a constellation point. It is clear that the following holds:

    $P_e \le N_e\, Q\!\left(\dfrac{d_{\min}}{2\sigma}\right)$

  • Signal Sets and Measures

    Number of dimensions: If the signal bandwidth is approximately W and the signal is approximately time-limited to T, then the Dimensionality Theorem from Information Theory [Shannon; Landau and Pollak] states that the equivalent space has dimension N, which is:

    $N \approx 2WT$

    If we carry $b$ bits in a constellation of dimension N, then the normalized parameters are $\bar{b} = b/N$ (bits per dimension) and $\bar{\mathcal{E}}_x = \mathcal{E}_x/N$ (energy per dimension)

    $\bar{b}$ is a useful measure in compound signal sets with different numbers of dimensions

  • Signal-to-noise ratio (SNR)

    $\mathrm{SNR} = \dfrac{\bar{\mathcal{E}}_x}{\sigma^2} = \dfrac{\text{energy per dimension}}{\text{noise energy per dimension}}$

    Constellation Figure of Merit (CFM):

    $\zeta_x = \dfrac{(d_{\min}/2)^2}{\bar{\mathcal{E}}_x}$

    $\zeta_x$ measures the quality of a constellation used with an AWGN channel

    As $\zeta_x$ increases, we get better performance in AWGN (for the same number of bits/dim. only)

    Fair comparison between constellations:

    Make a multi-parameter comparison across the measures introduced above (e.g., $\bar{b}$, $\bar{\mathcal{E}}_x$, $d_{\min}$, $\zeta_x$)

  • Signal Constellations

    Cubic constellations:

    $x = \sum_{k=0}^{N-1} u_k\,\varphi_k$,

    where N is the number of dimensions,

    $u_k \in \{0, 1\}$ depending on the bit sequence, and $M = 2^N$

  • Orthogonal Constellations: $\langle x_i, x_j \rangle = 0$ for $i \neq j$

    Example: Bi-orthogonal (antipodal) signal set: for every signal, its negative is also included, giving $M = 2N$ signal points

    Circular Constellations: signal points placed at the M-th roots of unity

  • Example 1: Quadrature Phase-Shift Keying (QPSK)

    The constellation consists of the M = 4 points $x_i = \sqrt{\dfrac{\mathcal{E}_x}{2}}\,(\pm 1, \pm 1)$, i.e.,

    $\sqrt{\tfrac{\mathcal{E}_x}{2}}\,(-1, -1)$, $\sqrt{\tfrac{\mathcal{E}_x}{2}}\,(-1, +1)$, $\sqrt{\tfrac{\mathcal{E}_x}{2}}\,(+1, -1)$, $\sqrt{\tfrac{\mathcal{E}_x}{2}}\,(+1, +1)$

    $d_{\min}^2 = 2\mathcal{E}_x$, $\bar{\mathcal{E}}_x = \dfrac{\mathcal{E}_x}{2}$, hence

    $\zeta_x = \dfrac{(d_{\min}/2)^2}{\bar{\mathcal{E}}_x} = 1$

    In the particular case of BPSK, $d_{\min}^2 = 4\mathcal{E}_x$

  • What is the error probability of QPSK?

    $P_e = \sum_{i=0}^{3} p_X(i)\, P_{e \mid i} = P_{e \mid 0} = 1 - P_{c \mid 0}$

    $= 1 - \left(1 - Q\!\left(\tfrac{d_{\min}}{2\sigma}\right)\right)^2$

    $= 2\,Q\!\left(\tfrac{d_{\min}}{2\sigma}\right) - Q^2\!\left(\tfrac{d_{\min}}{2\sigma}\right)$

    $< 2\,Q\!\left(\tfrac{d_{\min}}{2\sigma}\right)$ = NNUB

    Notice how, for reasonably large SNR, the NNUB becomes tight
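
    A small numerical sketch comparing the exact QPSK error probability 2Q − Q² with the NNUB 2Q over a range of d_min/(2σ) values (the grid is an illustrative choice), showing how the bound tightens as the SNR grows:

      import numpy as np
      from math import erfc

      Q = lambda u: 0.5 * erfc(u / np.sqrt(2))

      for arg in [0.5, 1.0, 2.0, 3.0, 4.0]:         # arg = d_min / (2 sigma)
          q = Q(arg)
          pe_exact = 2 * q - q**2                   # exact QPSK error probability
          nnub = 2 * q                              # nearest-neighbour union bound
          print(f"d/2sigma={arg}: exact={pe_exact:.3e}, NNUB={nnub:.3e}, ratio={nnub/pe_exact:.3f}")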

  • Example 2: M-ary Phase-Shift Keying (MPSK) [ML Detector]

    $d_{\min} = 2\sqrt{\mathcal{E}_x}\,\sin\!\left(\tfrac{\pi}{M}\right)$, $N_e = 2$

    $\left(\tfrac{d_{\min}}{2}\right)^2 = \mathcal{E}_x \sin^2\!\left(\tfrac{\pi}{M}\right)$, $\zeta_x = 2\sin^2\!\left(\tfrac{\pi}{M}\right)$

    $P_e < N_e\, Q\!\left(\tfrac{d_{\min}}{2\sigma}\right) = 2\,Q\!\left(\tfrac{\sqrt{\mathcal{E}_x}}{\sigma}\,\sin\tfrac{\pi}{M}\right)$

  • Lattice-based Constellations

    A lattice is a regular arrangement of points in an N-dimensional space: $\Lambda = \{\, G\,m : m \in \mathbb{Z}^N \,\}$,

    where G is called the generator matrix

    Ex: the integer lattice $\mathbb{Z}^N$ corresponds to the case $G = I$

    The Pulse Amplitude Modulation (PAM) constellation corresponds to N = 1; with spacing d between adjacent points:

    $x_i \in \left\{ \pm\tfrac{d}{2},\ \pm\tfrac{3d}{2},\ \dots,\ \pm\tfrac{(M-1)\,d}{2} \right\}$,

    $\mathcal{E}_x = \dfrac{d^2}{12}\,(M^2 - 1)$, $\zeta_x = \dfrac{3}{M^2 - 1}$

    It can be easily shown that the probability of error of an ML detector is:

    $P_e = \dfrac{M-2}{M}\cdot 2\,Q\!\left(\dfrac{d}{2\sigma}\right) + \dfrac{2}{M}\, Q\!\left(\dfrac{d}{2\sigma}\right)$

    $= 2\left(1 - \dfrac{1}{M}\right) Q\!\left(\dfrac{d}{2\sigma}\right)$
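
    A sketch that checks the M-PAM expression P_e = 2(1 − 1/M) Q(d/(2σ)) against a Monte Carlo simulation (M, d, σ, and the trial count are illustrative):

      import numpy as np
      from math import erfc

      rng = np.random.default_rng(4)
      Q = lambda u: 0.5 * erfc(u / np.sqrt(2))

      M, d, sigma, n = 8, 2.0, 0.7, 200_000
      points = d * (np.arange(M) - (M - 1) / 2)        # {+-d/2, +-3d/2, ...}

      idx = rng.integers(0, M, n)
      y = points[idx] + sigma * rng.standard_normal(n)

      # ML detection: nearest constellation point
      idx_hat = np.argmin(np.abs(y[:, None] - points[None, :]), axis=1)

      pe_sim = np.mean(idx_hat != idx)
      pe_theory = 2 * (1 - 1 / M) * Q(d / (2 * sigma))
      print(pe_sim, pe_theory)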

  • Other lattice-based constructions

    Quadrature Amplitude Modulation (QAM): a cookie-slice of the 2-dimensional integer lattice. Other constellations are carved out of other lattices (e.g., hexagonal lattices, etc.)

    Other performance measures of interest that may appear in the course:

    Coding gain: $\gamma = \dfrac{\zeta_{x,1}}{\zeta_{x,2}}$ (ratio of the CFMs of two constellations)

    Shaping gain of a lattice

    Peak-to-average ratio

  • Passband Systems

    Passband transmission is centered at a carrier frequency $f_c$. Examples: TV broadcast, cellular and cordless phones, etc.

  • Equivalent representations

    The carrier-modulated signal x(t) is given by: $x(t) = r(t)\cos(\omega_c t + \theta(t))$

    Quadrature decomposition: $x(t) = x_I(t)\cos(\omega_c t) - x_Q(t)\sin(\omega_c t)$

    Hence:

    $r(t) = \sqrt{x_I^2(t) + x_Q^2(t)}$, $\theta(t) = \tan^{-1}\!\left(\dfrac{x_Q(t)}{x_I(t)}\right)$

    $x_I(t) = r(t)\cos\theta(t)$, $x_Q(t) = r(t)\sin\theta(t)$

    (Complex) baseband-equivalent signal:

    $x_{bb}(t) = x_I(t) + j\,x_Q(t)$ (note that there is no reference to $\omega_c$)

    Analytic (passband) equivalent signal:

    $x_A(t) = x_{bb}(t)\,e^{j\omega_c t} = x(t) + j\,\check{x}(t)$

    $= \left[x_I(t)\cos(\omega_c t) - x_Q(t)\sin(\omega_c t)\right] + j\left[x_I(t)\sin(\omega_c t) + x_Q(t)\cos(\omega_c t)\right]$

    What is $\check{x}(t)$?

  • $\check{x}(t)$ is the Hilbert transform of $x(t)$:

    $\check{x}(t) = x(t) * h(t)$, where $h(t) = \begin{cases} \dfrac{1}{\pi t}, & t \neq 0 \\ 0, & t = 0 \end{cases}$

    Letting $\check{x}(t) = \operatorname{Im}\{x_A(t)\}$, notice that:

    $x_A(t) = x_{bb}(t)\,e^{j\omega_c t} = \left(x_I(t) + j\,x_Q(t)\right)\left(\cos\omega_c t + j\sin\omega_c t\right)$

    so that $\check{x}(t) = x_I(t)\sin(\omega_c t) + x_Q(t)\cos(\omega_c t)$

  • Summarizing, there are four equivalent representations for $x(t) = r(t)\cos(\omega_c t + \theta(t))$:

    1. Magnitude and phase: $r(t)$, $\theta(t)$

    2. In-phase and quadrature phase: $x_I(t)$, $x_Q(t)$

    3. Complex baseband-equivalent signal: $x_{bb}(t)$

    4. Analytic signal: $x_A(t)$
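
    A sketch relating these representations numerically, using scipy.signal.hilbert to obtain the analytic signal (the carrier frequency, I/Q waveforms, and sampling parameters are illustrative choices):

      import numpy as np
      from scipy.signal import hilbert

      fs, fc = 8000, 400.0
      t = np.arange(0, 0.25, 1 / fs)

      # Slowly varying in-phase/quadrature components (bandwidth << fc)
      xI = np.cos(2 * np.pi * 5 * t)
      xQ = 0.5 * np.sin(2 * np.pi * 3 * t)

      x = xI * np.cos(2 * np.pi * fc * t) - xQ * np.sin(2 * np.pi * fc * t)  # passband x(t)

      xA = hilbert(x)                                  # analytic signal x(t) + j*xhat(t)
      xbb = xA * np.exp(-2j * np.pi * fc * t)          # complex baseband equivalent
      r, theta = np.abs(xbb), np.angle(xbb)            # magnitude and phase representation

      # The recovered I/Q components are close to the originals (up to edge effects)
      print(np.max(np.abs(xbb.real - xI)[100:-100]), np.max(np.abs(xbb.imag - xQ)[100:-100]))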

  • Frequency Analysis

    Let us assume that the signal x(t) is bandlimited around the carrier, so that $X(\omega) = 0$ for $\left||\omega| - \omega_c\right| > W$

    If $\mathcal{F}\{\cdot\}$ denotes the Fourier Transform, then:

    $X(\omega) = \mathcal{F}\{x(t)\} = \mathcal{F}\left\{x_I(t)\cos(\omega_c t) - x_Q(t)\sin(\omega_c t)\right\}$

    $= \tfrac{1}{2}\left[X_I(\omega - \omega_c) + X_I(\omega + \omega_c)\right] + \tfrac{j}{2}\left[X_Q(\omega - \omega_c) - X_Q(\omega + \omega_c)\right]$

  • Now consider the Hilbert transform filter in the frequency domain:

    $H(\omega) = \mathcal{F}\{h(t)\} = \mathcal{F}\left\{\dfrac{1}{\pi t}\right\} = -j\,\mathrm{sign}(\omega)$

    Hence, $\check{X}(\omega) = \mathcal{F}\{\check{x}(t)\} = -j\,\mathrm{sign}(\omega)\,X(\omega)$

    The Hilbert Transform only affects the phase but not the magnitude: $|\check{X}(\omega)| = |{-j}\,\mathrm{sign}(\omega)|\cdot|X(\omega)| = |X(\omega)|$

    Therefore, $X_A(\omega) = X(\omega) + j\,\check{X}(\omega) = X(\omega) + j\left(-j\,\mathrm{sign}(\omega)\,X(\omega)\right) = \left(1 + \mathrm{sign}(\omega)\right) X(\omega)$

  • Fourier Relationships

  • Channel Input/Output Relationships

    Given a passband channel with impulse response h(t), we can write the channel output as: $Y(\omega) = H(\omega)\,X(\omega)$

    For the analytic signals, $Y_A(\omega) = (1 + \mathrm{sign}\,\omega)\,H(\omega)\,X(\omega) = H(\omega)\,X_A(\omega) = \tfrac{1}{2}\,H_A(\omega)\,X_A(\omega)$,

    since $X_A(\omega) \neq 0$ only for $\omega > 0$, where $\tfrac{1}{2}(1 + \mathrm{sign}\,\omega) = 1$

    Similarly, $Y_{bb}(\omega) = \tfrac{1}{2}\,H_{bb}(\omega)\,X_{bb}(\omega)$

  • Representation of Passband Channels

  • Baseband equivalent channels

    Representation with baseband signals

  • Summary of signal representations

    Passband: $x(t) = \operatorname{Re}\{x_A(t)\} = \tfrac{1}{2}\left(x_A(t) + x_A^*(t)\right)$

    Analytic equivalent:

    $x_A(t) = x(t) + j\,\check{x}(t) = x_{bb}(t)\,e^{j\omega_c t}$, with $X_A(\omega) = \left(1 + \mathrm{sign}\,\omega\right) X(\omega)$, i.e., $X(\omega) = \tfrac{1}{2}\left[X_A(\omega) + X_A^*(-\omega)\right]$

    Baseband:

    $x_{bb}(t) = x_A(t)\,e^{-j\omega_c t} = x_I(t) + j\,x_Q(t)$

  • Baseband equivalent Gaussian noise

    Consider the WSS noise process n(t) with autocorrelation $r_n(\tau)$:

    $r_n(\tau) = E\left[n(t)\,n(t - \tau)\right]$

    with a PSD given by:

    $S_n(\omega) = \mathcal{F}\{r_n(\tau)\} = \begin{cases} \dfrac{N_0}{2}, & \omega_c - W < |\omega| < \omega_c + W \\ 0, & \text{otherwise} \end{cases}$

    (bandlimited noise spectral density)

  • Similarly to what we did before, let: $n_A(t) = n(t) + j\,\check{n}(t)$

    where $\check{N}(\omega) = -j\,\mathrm{sign}(\omega)\,N(\omega)$

    Since $|{-j}\,\mathrm{sign}(\omega)|^2 = 1$, this implies that $S_{\check{n}}(\omega) = S_n(\omega)$, and hence $r_{\check{n}}(\tau) = r_n(\tau)$

    Consider now the cross-correlations between $n$ and $\check{n}$:

    $E\left[\check{n}(t)\,n(t - \tau)\right] = \check{r}_n(\tau)$

    $E\left[n(t)\,\check{n}(t - \tau)\right] = -\check{r}_n(\tau)$

    Hence, the following holds:

    $r_{n_A}(\tau) = E\left[n_A(t)\,n_A^*(t - \tau)\right] = r_n(\tau) + r_{\check{n}}(\tau) + j\left(\check{r}_n(\tau) + \check{r}_n(\tau)\right)$

    Therefore, we get that: $r_{n_A}(\tau) = 2\left(r_n(\tau) + j\,\check{r}_n(\tau)\right)$

    which implies that: $S_{n_A}(\omega) = 2\left(1 + \mathrm{sign}(\omega)\right) S_n(\omega) = \begin{cases} 4\,S_n(\omega), & \omega > 0 \\ 0, & \omega < 0 \end{cases}$

  • Hence:

    Since $n_{bb}(t) = n_A(t)\,e^{-j\omega_c t}$, we have that $S_{n_{bb}}(\omega) = S_{n_A}(\omega + \omega_c)$, yielding:

    $S_{n_{bb}}(\omega) = \begin{cases} 2 N_0, & |\omega| < W \\ 0, & \text{otherwise} \end{cases}$

    Problem: the baseband noise has double the energy compared to the passband noise

    Here is where the factor 2 shows up

  • The factor of 2 (cont'd)

    A similar issue occurs with deterministic modulated signals. Consider the QAM (complex) baseband signal:

    $x_{bb}(t) = \sqrt{2}\,\varphi(t)\left(a_1 + j\,a_2\right) = \sqrt{2}\,a_1\,\varphi(t) + j\,\sqrt{2}\,a_2\,\varphi(t)$, with $a_1, a_2 \in \{+1, -1\}$

    and the corresponding two QAM passband basis functions:

    $\varphi_1(t) = \sqrt{2}\,\varphi(t)\cos(\omega_c t)$, $\varphi_2(t) = -\sqrt{2}\,\varphi(t)\sin(\omega_c t)$

    The modulated signal is: $x(t) = a_1\,\varphi_1(t) + a_2\,\varphi_2(t) = \sqrt{2}\,a_1\,\varphi(t)\cos(\omega_c t) - \sqrt{2}\,a_2\,\varphi(t)\sin(\omega_c t)$

    It can be checked that if $\varphi(t)$ is normalized, i.e., $\|\varphi\| = 1$, then: $\|\varphi_1\| = \|\varphi_2\| = 1$

    Thus, under modulation, the factor $\sqrt{2}$ is needed.

  • Let us verify it:

    Representation of additive noise channel

  • Scaling is necessary only for analytical convenience, since the SNR at the receiver is not changed (the signal is processed after it is received).

  • Circularly symmetric complex Gaussian processes

    Let $Z = Z_R + j\,Z_I$, where $Z_R$ and $Z_I$ denote the real and imaginary components, respectively.

    Z is a complex Gaussian ⟺ $Z_R$, $Z_I$ are jointly Gaussian

    Thinking of $\begin{pmatrix} Z_R \\ Z_I \end{pmatrix}$ as a vector, its covariance matrix is:

    $K = \begin{pmatrix} \sigma_R^2 & \sigma_{RI} \\ \sigma_{IR} & \sigma_I^2 \end{pmatrix}$

    but since $\sigma_{RI} = \sigma_{IR}$, there are only three degrees of freedom.

    Notice however that: $E\left[|Z|^2\right] = E\left[Z Z^*\right] = \sigma_R^2 + \sigma_I^2$

    thus, this alone is not sufficient to specify the Gaussian random variable Z.

    Consider also the quantity: $E\left[Z Z\right] = \sigma_R^2 - \sigma_I^2 + 2j\,\sigma_{RI}$

    which gives the two missing degrees of freedom

    A random variable Z is circularly symmetric iff $E[Z Z] = 0$, which implies:

    $\sigma_{RI} = 0$ and $\sigma_R^2 = \sigma_I^2$

    $\Rightarrow\ K = \dfrac{E\left[|Z|^2\right]}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$  (2D circular)

  • The same concept can be generalized easily to vectors:

    $Z = [Z_1, \dots, Z_N]^T$, $Z = Z_R + j\,Z_I$

    Z is a circularly symmetric complex Gaussian random vector iff $E\left[(Z - \mu)(Z - \mu)^T\right] = 0$

    The probability density function (pdf) for circularly symmetric complex Gaussian vectors is:

    $p_Z(z) = \dfrac{1}{\pi^N \det(K_Z)}\,\exp\!\left(-(z - \mu)^H K_Z^{-1} (z - \mu)\right)$

    where $\mu = E[Z]$ and $K_Z = E\left[(Z - \mu)(Z - \mu)^H\right]$

    How is hypothesis testing done in the complex domain?

  • Gaussian Hypothesis Testing - Complex case

    Consider the usual problem: $y = x + z$, where $y, x \in \mathbb{C}^N$ and $z$ is a circularly symmetric complex Gaussian vector with

    $K_Z = \sigma^2 I$, $E[z z^T] = 0$

    so that $p_Z(z) = \dfrac{1}{(\pi\sigma^2)^N}\,\exp\!\left(-\dfrac{\|z\|^2}{\sigma^2}\right)$, where the inner product is $\langle u, v \rangle = v^H u$

    Regarding the ML Detector: $\hat{m}_{ML} = \arg\max_i\, \exp\!\left(-\dfrac{\|y - x_i\|^2}{\sigma^2}\right)$

    $= \arg\max_i\, \left(2\operatorname{Re}\langle y, x_i \rangle - \|x_i\|^2\right) = \arg\min_i\, \|y - x_i\|^2$