Information Theory and Communications
CSM25 Secure Information Hiding
Dr Hans Georg Schaathun
University of Surrey
Spring 2007
Learning Outcomes
become familiar with fundamental concepts in communications: Entropy and Redundancy, Error-control coding, Compression
be able to link communications fundamentals to steganography
Outline
1 Communications essentials: Communications and Redundancy; Digital Communications; Shannon Entropy; Security; Prediction
2 Compression: Recollection; Huffman Coding; Huffman Steganography
3 Grammars
Communications essentials: Communications and Redundancy
The communications problem
Alice: m → Enc. → c → [Noisy channel] → r → Dec. → m̂ : Bob
Bob’s problem: estimate m, given the (partly) random output r from the channel; m̂ is his estimate.
How much (un)certainty does Bob have about m? Information theory and Shannon entropy.
Redundancy of English
Fact: The English language is more than 50% redundant.
t** p*oce*s o**hid**g *ata**nsid* o*her**ata. For ex*****, a **xt f*lec**ld*** hid*** "in**de"****im*ge or***s**nd *ile* By look****at t*eim*g***or list***** to th**s**nd,*yo* w*u*d n*t *no**that***ere is *x*rainfo******* *r*sent.
t*e p*oce*s o* hid**g *ata*insid* o*her*data. For ex*m***, a t*xt f*lec**ld*b* hidd** "ind*de" a**im*ge or*a*s*und *ile* By look**g*at t*eim*g*,*or list**in* to th* s**nd,*yo* w*uld n*t *no**that *here is *x*rainfo*****on *r*sent.
the process of hiding data inside other data. For example, a text file could be hidden "inside" an image or a sound file. By looking at the image, or listening to the sound, you would not know that there is extra information present.
from http://www.cdt.org/crypto/glossary.shtml
Message destroyed on the channel: redundancy allows Bob to determine the original m.
Benefits of redundancy
Crossword puzzles. Understand foreigners with imperfect pronunciation.
How much would you understand of a lecture without redundancy?
Hear in a noisy environment. Read bad handwriting.
How could I mark exam scripts without redundancy?
Cryptanalysis? Steganalysis?
What if there were no redundancy?
No use for steganography! Any text would be meaningful;
in particular, ciphertext would be meaningful. Simple encryption would give a stegogramme indistinguishable from cover-text.
Problems in natural language
Natural languages are arbitrary: some words/sentences have a lot of redundancy,
others have very little.
Unstructured: hard to automate correction
Communications essentials: Digital Communications
Coding: channel and source coding
Source coding (aka compression): remove redundancy; make a compact representation.
Channel coding (aka error-control coding): add mathematically structured redundancy; computationally efficient error correction; optimised (low error rate, small space).
Two aspects of Information Theory.
Channel and Source Coding
Message → Comp. → Encrypt. → Enc. → Channel → r → Dec. → Decrypt. → Decom. → Message
(Comp./Decom.: remove redundancy; Encrypt./Decrypt.: scramble; Enc./Dec.: add redundancy)
Communications essentials: Shannon Entropy
Uncertainty: Shannon Entropy
m and r are stochastic variables (drawn at random from a distribution).
How much uncertainty about the message m? Uncertainty is measured by entropy: H(m) before any message is received; H(m|r), the conditional entropy, after receipt of r.
Mutual information is derived from entropy: I(m; r) = H(m) − H(m|r). I(m; r) is the amount of information contained in r about m, and I(m; r) = I(r; m).
Shannon entropy: Definition
Random variable X ∈ 𝒳:
H_q(X) = − Σ_{x ∈ 𝒳} Pr(X = x) log_q Pr(X = x)
Usually q = 2, giving entropy in bits; q = e (natural logarithm) gives entropy in nats.
If Pr(X = x_i) = p_i for x_1, x_2, . . . ∈ 𝒳, we write H(X) = h(p_1, p_2, . . .).
Example: one question Q; Yes/No with 50-50 probability:
H(Q) = −2 (1/2 · log 1/2) = 1
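To make the definition concrete, here is a small Python sketch (not from the slides; the example distributions are chosen arbitrarily for illustration):

    import math

    def entropy(probs, q=2):
        # H_q(X) = - sum_x Pr(X = x) log_q Pr(X = x); zero-probability terms vanish
        return -sum(p * math.log(p, q) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))    # the 50-50 yes/no question Q: 1.0 bit
    print(entropy([0.9, 0.1]))    # a biased question: about 0.47 bits
    print(entropy([0.25] * 4))    # uniform on four outcomes: 2.0 bits

As expected, the uniform distribution maximises the uncertainty.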
Shannon entropy: Properties
1 Additivity: if X and Y are independent, then H(X, Y) = H(X) + H(Y). If you are uncertain about two completely different questions, the entropy is the sum of the uncertainties for each question.
2 If X is uniformly distributed, then H(X) increases when the size of 𝒳 increases. The more possibilities, the more uncertainty.
3 Continuity: h(p_1, p_2, . . .) is continuous in each p_i.
Shannon entropy is a measure in mathematical terms.
What it tells us: Shannon entropy
Consider a message X of entropy k = H(X) (in bits). The average size of a file F describing X is at least k bits. If the size of F is exactly k bits on average, then we have found a perfect compression of F: each message bit contains one bit of information on average.
A banal example
A single bit may contain more than one bit of information. E.g. image compression:
0: Mona Lisa
10: Lenna
110: Baboon
11100: Peppers
11110: F-16
11101: Che Guevara
11111...: other images
However, on average, the maximum information in one bit is one bit (most of the time it is less).
The example is based on Huffman coding.
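A sketch of the example as code (not from the slides): the codeword table is the one above, but the image probabilities are assumed purely for illustration, since the slides do not give them:

    import math

    code = {"0": "Mona Lisa", "10": "Lenna", "110": "Baboon",
            "11100": "Peppers", "11110": "F-16", "11101": "Che Guevara"}

    def decode(bits):
        # The code is prefix free, so the first codeword match is unambiguous
        images, word = [], ""
        for b in bits:
            word += b
            if word in code:
                images.append(code[word])
                word = ""
        return images

    print(decode("010110"))   # ['Mona Lisa', 'Lenna', 'Baboon']

    # Assumed probabilities; average codeword length vs. entropy
    p = {"0": 0.5, "10": 0.25, "110": 0.125,
         "11100": 0.05, "11110": 0.05, "11101": 0.025}
    print(sum(q * len(w) for w, q in p.items()))          # 2.0 bits per image
    print(-sum(q * math.log2(q) for q in p.values()))     # about 1.94 bits

The average length (2.0 bits) is at least the entropy (about 1.94 bits), as the previous slide requires.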
Communications essentials: Security
Cryptography
Alice → ciphertext c → Bob: m → c → m; Eve observes c.
Eve seeks information about m, observing c. If I(m; c) > 0 then Eve succeeds in theory, or if I(k; c) > 0 (k being the key).
If H(m|c) = H(m) then the system is absolutely secure. The above are strong statements: even if Eve has information I(m; c) > 0, she may be unable to make sense of it.
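As a sketch of what "absolutely secure" means (not from the slides): a one-time pad on a single bit gives I(m; c) = 0 whatever the prior on m. The prior below is assumed for illustration:

    import math

    p_m = {0: 0.9, 1: 0.1}    # assumed prior on the message bit
    p_k = {0: 0.5, 1: 0.5}    # uniform key bit: the one-time pad

    # Joint distribution of (m, c) with c = m XOR k
    joint = {(m, m ^ k): p_m[m] * p_k[k] for m in p_m for k in p_k}
    p_c = {c: sum(pr for (_, cc), pr in joint.items() if cc == c) for c in (0, 1)}

    I = sum(pr * math.log2(pr / (p_m[m] * p_c[c])) for (m, c), pr in joint.items())
    print(round(I, 12))   # 0.0: H(m|c) = H(m), so Eve learns nothing from c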
Steganalysis
Question: does Alice send secret information to Bob? Answer: X ∈ {yes, no}.
What is the uncertainty H(X)? Eve intercepts a message S;
is there any information I(X; S)?
If H(X|S) = H(X), then the system is absolutely secure.
Communications essentials: Prediction
Random sequences
Text is a sequence of random samples (letters): (l_1, l_2, l_3, . . .); l_i ∈ A = {A, B, . . . , Z}.
Each letter has a probability distribution P(l), l ∈ A. Statistical dependence (aka redundancy):
P(l_i | l_{i−1}) ≠ P(l_i); H(l_i | l_{i−1}) < H(l_i): letter i − 1 contains information about l_i. Use this information to guess l_i.
The more letters l_{i−j}, . . . , l_{i−1} we have seen, the more reliably we can predict l_i.
Wayner (Ch. 6.1) gives examples of first-, second-, . . . , fifth-order prediction, using j = 0, 1, 2, 3, 4.
First-order prediction: example from Wayner (figure omitted).
Second-order prediction: example from Wayner (figure omitted).
Third-order prediction: example from Wayner (figure omitted).
Fourth-order prediction: example from Wayner (figure omitted).
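A minimal sketch (not from the slides) of how such j-th order prediction tables can be built and sampled; the tiny sample text is a stand-in for a real corpus:

    import random
    from collections import Counter, defaultdict

    def build_model(text, order):
        # Tabulate counts approximating P(next letter | previous `order` letters)
        model = defaultdict(Counter)
        for i in range(len(text) - order):
            model[text[i:i + order]][text[i + order]] += 1
        return model

    def generate(model, seed, length):
        # Sample each next letter from the conditional distribution;
        # the seed must be `order` letters long and occur in the corpus
        out = seed
        for _ in range(length):
            counts = model[out[-len(seed):]]
            letters, weights = zip(*counts.items())
            out += random.choices(letters, weights=weights)[0]
        return out

    sample = "the process of hiding data inside other data " * 3
    model = build_model(sample, order=4)   # fifth-order prediction: j = 4
    print(generate(model, "the ", 40))

With higher order the output looks more like the sample text, at the cost of a much larger table.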
Compression: Recollection
Compression
F∗ is the set of binary strings of arbitrary length.
Definition: A compression system is a function c : F∗ → F∗, such that E(length(m)) > E(length(c(m))) when m is drawn from F∗.
The compressed string is expected to be shorter than the original.
Definition: A compression c is perfect if all target strings are used, i.e. if for any m ∈ F∗, c⁻¹(m) is a sensible file (cover-text).
Decompress a random string, and it makes sense!
Steganography by Perfect Compression (Anderson and Petitcolas 1998)
A perfect compression scheme. A secure cipher.
Embedding: Message → Encrypt (Key) → C → Decompress → S (the stegogramme).
Extraction: S → Compress → C → Decrypt (Key) → Message.
Steganography without data hiding.
Compression: Huffman Coding
Huffman Coding
Short codewords for frequent quantities; long codewords for unusual quantities. Each symbol (bit) should be equally probable.
[Huffman tree: from the root, branch 0 leads to a leaf of probability 50%; branch 1 leads to a node whose branches 0 and 1 each lead to a leaf of probability 25%.]
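A sketch of the construction (the slides only draw the tree; this bottom-up merge is the standard Huffman algorithm):

    import heapq

    def huffman(probs):
        # Repeatedly merge the two least probable nodes; prepend 0/1 to their codes
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    print(huffman({"a": 0.50, "b": 0.25, "c": 0.25}))
    # {'a': '0', 'b': '10', 'c': '11'}, matching the tree above (up to 0/1 swaps)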
Example
[Huffman tree figure: branches labelled 0 and 1; node probabilities 25%, 25%, 25%, 12 1/2 %, 7 1/4 %, 7 1/4 %.]
Decoding
Huffman codes are prefix free: no codeword is the prefix of another. This simplifies the decoding.
This is expressed in the Huffman tree: follow the edges for each coded bit; (only) a leaf node resolves to a message symbol.
When a message symbol is recovered, start over for the next symbol.
Ideal Huffman code
Each branch equally likely: P(b_i | b_{i−1}, b_{i−2}, . . .) = 1/2. Maximum entropy: H(B_i | B_{i−1}, B_{i−2}, . . .) = 1.
A uniform distribution of compressed files implies perfect compression.
In practice, the probabilities are rarely powers of 1/2, hence the Huffman code is imperfect.
Compression: Huffman Steganography
Reverse Huffman
Core Reading
Peter Wayner: Disappearing Cryptography Ch. 6-7
Stego-encoder: Huffman decompression. Stego-decoder: Huffman compression.
Is this similar to Anderson & Petitcolas’ Steganography by Perfect Compression?
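A sketch of the reverse use (not Wayner's actual code; the small code table is assumed and fixed, where a real scheme would condition it on context as in the fifth-order example below):

    code = {"e": "0", "t": "10", "a": "110", "o": "111"}   # assumed Huffman code

    def embed(bits):
        # Stego-encoder: Huffman *decompression* of the message bits into letters
        text, word = "", ""
        for b in bits:
            word += b
            for letter, w in code.items():
                if w == word:
                    text += letter
                    word = ""
                    break
        return text   # assumes the bit string ends on a codeword boundary

    def extract(text):
        # Stego-decoder: Huffman *compression* of the stegotext back into bits
        return "".join(code[letter] for letter in text)

    stego = embed("0101100111")
    print(stego)                            # "etaeo"
    print(extract(stego) == "0101100111")   # True

If the message bits come from a secure cipher they are uniformly random, which is exactly the input distribution an ideal Huffman decompressor expects.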
The Stegogramme
The stegogramme looks like random text: use a probability distribution based on sample text; higher-order statistics make it look natural.
Fifth-order statistics is reasonable; higher order will look more natural.
Example: Fifth order
For each 5-tuple of letters A_0, A_1, A_2, A_3, A_4: let l_{i−4}, . . . , l_i be consecutive letters in natural text; tabulate P(l_i = A_0 | l_{i−j} = A_j, j = 1, 2, 3, 4).
For each 4-tuple A_1, A_2, A_3, A_4, make an (approximate) Huffman code for A_0.
We may omit some values of A_0, or have non-unique codewords.
We encode a message by Huffman decompression, using the Huffman code depending on the last four stegogramme symbols, obtaining a fifth-order random text.
Example: Fifth order
Consider the four preceding letters "comp". The next letter may be:

letter        r      e      l      a      o
probability   40%    12%    22%    18%    8%
combined      52% (r/e)     22% (l)      26% (a/o)
rounded       50%           25%          25%

Rounding to powers of 1/2.
Combining several letters reduces rounding error.
The example is arbitrary and fictitious.
Example: The Huffman code
Huffman code based on fifth-order conditional probabilities:
[Huffman tree: branch 0 from the root leads to the leaf r/e; branch 1 leads to a node whose branch 0 leads to the leaf l and whose branch 1 leads to the leaf a/o.]
When two letters are possible, choose at random (according to probability in natural text): decoding (compression) is still unique; encoding (decompression) is not unique.
This evens out the statistics in the stegogramme.
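A sketch of this non-unique encoding step (not from the slides); the table is the fictitious "comp" example above:

    import random

    groups = {"0": [("r", 0.40), ("e", 0.12)],    # r/e share codeword 0
              "10": [("l", 0.22)],
              "11": [("a", 0.18), ("o", 0.08)]}   # a/o share codeword 11

    letter_code = {l: w for w, ls in groups.items() for l, _ in ls}

    def embed(bits):
        # Decompression with a random pick inside each letter group,
        # weighted by the letters' natural-text probabilities
        text, word = "", ""
        for b in bits:
            word += b
            if word in groups:
                letters, weights = zip(*groups[word])
                text += random.choices(letters, weights=weights)[0]
                word = ""
        return text

    stego = embed("01011")    # e.g. "rla" or "elo": encoding is not unique
    print(stego)
    print("".join(letter_code[c] for c in stego))   # always "01011": decoding is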
Is this practical? (Exercise)
To be discussed in groups of 2-4.
How would you steganalyse a potential Huffman-based stegogramme? How practical is the steganalysis? How would you implement Huffman-based steganography?
Which implementation issues/challenges do you foresee?
Grammars
Grammar
A grammar describes the structure of a language. A simple grammar:
sentence → noun verb
noun → Mr. Brown | Miss Scarlet
verb → eats | drinks
Each choice can map to a message symbol: 0: Mr. Brown, eats; 1: Miss Scarlet, drinks.
Two messages can be stego-encrypted. No cover-text is input.
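A sketch of the simple grammar as a stego-encoder (not from the slides); each choice point consumes one message bit:

    rules = {"sentence": [["noun", "verb"]],
             "noun": [["Mr. Brown"], ["Miss Scarlet"]],
             "verb": [["eats"], ["drinks"]]}

    def encode(symbol, bits):
        # Expand `symbol`; where the grammar offers alternatives, spend one bit
        options = rules[symbol]
        choice = options[bits.pop(0)] if len(options) > 1 else options[0]
        words = []
        for part in choice:
            words += encode(part, bits) if part in rules else [part]
        return words

    print(" ".join(encode("sentence", [0, 1])))   # Mr. Brown drinks
    print(" ".join(encode("sentence", [1, 0])))   # Miss Scarlet eats

The decoder parses the sentence against the grammar and reads the choices back off as bits; with more than two alternatives per rule, each choice can carry more than one bit.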
More complex grammar
sentence → noun verb addition
noun → Mr. Brown | Miss Scarlet | . . . | Mrs. White
verb → eats | drinks | celebrates | . . . | cooks
addition → addition term | ∅
term → on Monday | in March | with Mr. Green | . . . | in Alaska | at home
general → sentence | question
question → Does noun verb addition ?
general → general | sentence, because sentence
Discussion
How practical is a grammar-based stego-system? Which implementation issues do you foresee? Can you visualise a grammar-variant for images?