
The Mathematics of Entanglement - Summer 2013 27 May, 2013

Quantum Entropy

Lecturer: Aram Harrow Lecture 3

1 Shannon Entropy

In this part, we want to understand quantum information in a quantitative way. One of the important concepts is entropy. But let us first look at classical entropy.

We are given a probability distribution $p \in \mathbb{R}^d_+$ with $\sum_i p_i = 1$. The Shannon entropy of $p$ is

$$H(p) = -\sum_i p_i \log p_i$$

(log is always to base two, as we are talking about bits as units, and we use the convention $\lim_{x \to 0} x \log x = 0$). Entropy quantifies uncertainty. We have maximal certainty for a deterministic distribution, e.g. $p = (1, 0, \dots, 0)$, $H(p) = 0$. The distribution with maximal uncertainty is $p = (\tfrac{1}{d}, \dots, \tfrac{1}{d})$, $H(p) = \log d$.

In the following we want to give Shannon entropy an operational meaning with the help of the problem of data compression. For this, imagine you have a binary alphabet ($d = 2$) and you draw $n$ independent, identically distributed samples from the distribution $p = (\pi, 1-\pi)$; we write $X_1, \dots, X_n \sim_{\mathrm{i.i.d.}} p$ with values in $\{0,1\}$, where $\mathrm{Prob}[X_i = 0] = \pi$ and $\mathrm{Prob}[X_i = 1] = 1 - \pi$.
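To make the definition concrete, here is a minimal Python sketch (not from the lecture) that evaluates $H(p)$ for a few distributions; the function name shannon_entropy and the example values are my own.

    import numpy as np

    def shannon_entropy(p):
        """Shannon entropy in bits, with the convention 0 log 0 = 0."""
        p = np.asarray(p, dtype=float)
        nz = p[p > 0]                      # drop zero entries: lim x->0 of x log x = 0
        return float(-np.sum(nz * np.log2(nz)))

    print(shannon_entropy([1.0, 0.0]))     # deterministic distribution: 0.0
    print(shannon_entropy([0.5, 0.5]))     # uniform on d = 2: 1.0 = log2(2)
    print(shannon_entropy([0.11, 0.89]))   # biased coin: roughly 0.5 bits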

Typically, the number of 0's in the string is $n\pi \pm O(\sqrt{n})$ and the number of 1's is $n(1-\pi) \pm O(\sqrt{n})$. In order to see why this is the case, consider the sum $S = X_1 + \dots + X_n$ (this equals the number of 1's in the string). The expectation of this random variable is

$$E[S] = E[X_1] + \dots + E[X_n] = n(1-\pi),$$

where we used the linearity of the expectation value. Furthermore, the variance of $S$ is

$$\mathrm{Var}[S] = \mathrm{Var}[X_1] + \dots + \mathrm{Var}[X_n] = n\,\mathrm{Var}[X_1] = n\pi(1-\pi) \le \frac{n}{4}.$$

Here we used the independence of the random variables in the first equation and

$$\mathrm{Var}[X_1] = E[X_1^2] - E[X_1]^2 = (1-\pi) - (1-\pi)^2 = \pi(1-\pi)$$

in the third. This implies that the standard deviation of $S$ is at most $\frac{\sqrt{n}}{2}$.
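As a quick sanity check of this concentration claim, the following sketch (my own, with illustrative values of $n$ and $\pi$) simulates many strings and compares the empirical mean and standard deviation of $S$ with $n(1-\pi)$ and $\sqrt{n\pi(1-\pi)} \le \sqrt{n}/2$.

    import numpy as np

    rng = np.random.default_rng(0)
    n, pi = 10_000, 0.3                      # illustrative values, not from the notes
    samples = rng.random((1000, n)) >= pi    # each entry is 1 with probability 1 - pi
    S = samples.sum(axis=1)                  # number of 1's in each sampled string

    print(S.mean(), n * (1 - pi))            # empirical mean vs. E[S] = n(1 - pi)
    print(S.std(), (n * pi * (1 - pi)) ** 0.5, n ** 0.5 / 2)   # SD[S] <= sqrt(n)/2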

What does this have to do with compression? The total number of possible $n$-bit strings is $|\{0,1\}^n| = 2^n$. The number of strings with $n\pi$ 0's is

$$\binom{n}{\pi n} = \frac{n!}{(\pi n)!\,((1-\pi)n)!} \approx \frac{(n/e)^n}{(\pi n/e)^{\pi n}\,((1-\pi)n/e)^{(1-\pi)n}} = (1/\pi)^{n\pi}\,(1/(1-\pi))^{n(1-\pi)},$$

where we used Stirling's approximation. We can rewrite this as $\exp(n\pi \log 1/\pi + n(1-\pi) \log 1/(1-\pi)) = \exp(nH(p))$. Hence, we only need to store around $\exp(nH(p))$ possible strings, which we can do in a memory having $nH(p)$ bits. (Note that we ignored the fluctuations; if we took them into account, we would only need an additional $O(\sqrt{n})$ bits.) This analysis easily generalises to arbitrary alphabets (not only binary).
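The Stirling estimate can be checked numerically; the sketch below (my own, with an arbitrary choice of $n$ and $\pi$) compares $\log_2 \binom{n}{\pi n}$, computed via the log-gamma function, with $nH(p)$.

    from math import lgamma, log, log2

    def log2_binom(n, k):
        # log2 of the binomial coefficient C(n, k), via the log-gamma function
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

    def H2(pi):
        # binary Shannon entropy in bits
        return -pi * log2(pi) - (1 - pi) * log2(1 - pi)

    n, pi = 100_000, 0.3                     # illustrative values
    k = round(pi * n)                        # number of 0's in a typical string
    print(log2_binom(n, k), n * H2(pi))      # both are about 8.8e4 bits; they differ only by O(log n)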

2 Typical sets

I now want to give you a different way of looking at this problem, a way that is both more rigorous and will more easily generalise to the quantum case. This we will do with the help of typical sets.

Again let $X_1, \dots, X_n$ be i.i.d. distributed with distribution $p$. The probability of a string is then given by


$$\mathrm{Prob}[X_1 \dots X_n = x_1, \dots, x_n] = p(x_1)p(x_2) \cdots p(x_n) = p^{\otimes n}(x^n),$$

where we used the notation (random variables are in capital letters and values in small letters)

$$p^{\otimes n} = p \otimes p \otimes \cdots \otimes p \quad (n \text{ times}),$$

$$x^n = (x_1, \dots, x_n) \in \Sigma^n,$$

where $\Sigma = \{1, \dots, d\}$ is the alphabet and $\Sigma^n$ denotes strings of length $n$ over that alphabet. Note that

$$\log p^{\otimes n}(x^n) = \sum_{i=1}^n \log p(x_i) \approx n\,E[\log p(X_i)] \pm \sqrt{n}\sqrt{\mathrm{Var}[\log p(X_i)]} = -nH(p) \pm O(\sqrt{n}),$$

where we used

$$E[\log p(X_i)] = \sum_{x \in \Sigma} p(x) \log p(x) = -H(p).$$

Let us now define the typical set as the set of strings whose probability is close to $2^{-nH(p)}$:

$$T_{p,n,\delta} = \{x^n \in \Sigma^n : |\log p^{\otimes n}(x^n) + nH(p)| \le n\delta\}.$$

Then, for all $\delta > 0$,

$$\lim_{n\to\infty} p^{\otimes n}(T_{p,n,\delta}) = 1.$$
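One can see this convergence numerically. The following sketch (my own; the distribution, $\delta$, and sample sizes are arbitrary choices) samples strings from $p^{\otimes n}$ and estimates $p^{\otimes n}(T_{p,n,\delta})$ for increasing $n$.

    import numpy as np

    rng = np.random.default_rng(1)
    p = np.array([0.2, 0.3, 0.5])            # illustrative distribution on d = 3 symbols
    H = -np.sum(p * np.log2(p))              # Shannon entropy in bits
    delta = 0.05

    for n in [100, 1000, 10000]:
        x = rng.choice(len(p), size=(2000, n), p=p)   # 2000 strings sampled from p^{(tensor) n}
        logprob = np.log2(p)[x].sum(axis=1)           # log2 of the product probability of each string
        in_typical = np.abs(logprob + n * H) <= n * delta
        print(n, in_typical.mean())                   # estimated probability of the typical set -> 1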

Our compression algorithm simply keeps all the strings in the typical set and throws away all others. Hence, all we need to know is the size of the typical set. This is easy. Note that

$$x^n \in T_{p,n,\delta} \implies \exp(-nH(p) - n\delta) \le p^{\otimes n}(x^n) \le \exp(-nH(p) + n\delta).$$

Note that

$$1 \ge p^{\otimes n}(T_{p,n,\delta}) \ge |T_{p,n,\delta}| \min_{x^n} p^{\otimes n}(x^n),$$

where the minimum is over all strings in the typical set. This implies

$$1 \ge |T_{p,n,\delta}| \exp(-nH(p) - n\delta),$$

which is equivalent to

$$\log |T_{p,n,\delta}| \le nH(p) + n\delta.$$
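For a binary alphabet the typical set can even be counted exactly, since a string with $k$ ones has probability $\pi^{n-k}(1-\pi)^k$, so membership in $T_{p,n,\delta}$ depends only on $k$. The sketch below (my own, with illustrative parameters) computes $\log_2 |T_{p,n,\delta}|$ this way and checks it against the bound $n(H(p)+\delta)$.

    import numpy as np
    from math import lgamma, log, log2

    def log2_binom(n, k):
        # log2 of the binomial coefficient C(n, k), via the log-gamma function
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

    n, pi, delta = 1000, 0.3, 0.05           # illustrative values
    H = -pi * log2(pi) - (1 - pi) * log2(1 - pi)

    # A string with k ones has log2-probability (n - k) log2(pi) + k log2(1 - pi),
    # so it is typical iff that value is within n*delta of -nH.
    typical_k = [k for k in range(n + 1)
                 if abs((n - k) * log2(pi) + k * log2(1 - pi) + n * H) <= n * delta]

    # log2 |T| via a numerically stable log-sum over the binomial coefficients
    logs = np.array([log2_binom(n, k) for k in typical_k])
    log2_T = logs.max() + log2(np.sum(2.0 ** (logs - logs.max())))
    print(log2_T, n * (H + delta))           # log2 |T| indeed stays below n(H + delta)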

Exercise: Show that this is optimal. More precisely, show that we cannot compress to $nR$ bits for $R < H(p)$ unless the error fails to go to zero. Hint: use Chebyshev's inequality: for a random variable $Z$, $\mathrm{Prob}[|Z - E[Z]| \ge k\,\mathrm{SD}[Z]] \le 1/k^2$. Possible simplifications: 1) pretend all strings are typical; 2) use exactly $nR$ bits.
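The hinted inequality itself is easy to check empirically; the sketch below (my own, and only a sanity check of Chebyshev's inequality, not a solution to the exercise) uses an arbitrary choice of $Z$.

    import numpy as np

    rng = np.random.default_rng(2)
    Z = rng.binomial(n=1000, p=0.3, size=100_000)   # any random variable will do
    mu, sd = Z.mean(), Z.std()

    for k in [1, 2, 3]:
        frac = np.mean(np.abs(Z - mu) >= k * sd)
        print(k, frac, 1 / k**2)             # empirical tail probability vs. the bound 1/k^2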


3 Quantum compression

Probability distributions are replaced by density matrices $\rho^{\otimes n} = \rho \otimes \cdots \otimes \rho$ ($n$ times). If $\rho$ is a state of a qubit, then $\rho^{\otimes n}$ lives on a $2^n$-dimensional space. The goal of quantum data compression is to represent this state on a smaller-dimensional subspace. Just as before with bits, we now measure the size in terms of the number of qubits needed to represent that subspace, i.e. the log of its dimension.

It turns out to be possible (and optimal) to do this in $nS(\rho) \pm n\delta$ qubits, where $S$ is the von Neumann entropy

$$S(\rho) = -\sum_i \lambda_i \log \lambda_i = H(\lambda) = -\mathrm{tr}\,\rho \log \rho,$$

where the $\lambda_i$ are the eigenvalues of the density operator.
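As a small illustration (not from the notes), the following sketch computes $S(\rho)$ from the eigenvalues of a density matrix; the function name and example states are my own. A pure state gives $S = 0$, while the maximally mixed qubit gives $S = 1$ bit.

    import numpy as np

    def von_neumann_entropy(rho):
        """S(rho) = -tr(rho log2 rho), computed from the eigenvalues of rho."""
        lam = np.linalg.eigvalsh(rho)        # eigenvalues of the (Hermitian) density matrix
        lam = lam[lam > 1e-12]               # drop numerically zero eigenvalues (0 log 0 = 0)
        return float(-np.sum(lam * np.log2(lam)))

    pure = np.array([[1, 0], [0, 0]], dtype=float)   # |0><0|
    mixed = np.eye(2) / 2                            # maximally mixed qubit
    print(von_neumann_entropy(pure))    # 0.0
    print(von_neumann_entropy(mixed))   # 1.0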
