CSM25 Secure Information Hiding Dr Hans Georg … · CSM25 Secure Information Hiding Dr Hans Georg...

Statistics and Steganalysis
CSM25 Secure Information Hiding

Dr Hans Georg Schaathun

University of Surrey

Spring 2008

Learning Outcomes

After this session, everyone should
  understand how statistical methods apply to steganography
  understand how a statistical hypothesis can be used
  be able to implement the basic χ2 test of steganalysis

Suggested Reading

«Higher-order statistical steganalysis of palette images»
by Jessica Fridrich, Miroslav Goljan, David Soukal
in Proc. SPIE Electronic Imaging, Jan 2003, pp. 178-190

General Introduction Statistical models

Outline

1 General Introduction
    Statistical models
    Histogramme

2 The χ2 test
    Pairs of Values
    A visual approach
    Hypothesis testing
    The error types

3 Postlogue
    Generalised χ2 test
    Summary

The fundamental question

Wendy the Warden intercepts an image.
  Is the image a stegogramme?
    Is it a probable, natural image?
    Is it a probable stegogramme?

Depends on a model for natural images
  Statistical models and probability distributions

With a perfect model, Alice and Bob could use a cipher with ciphertexts distributed as natural images

If Wendy has a better model than Alice and Bob, then she can do effective steganalysis

In reality, we do not know what a natural image looks like

A visual example

Two different patterns in LSB ... sharp border
  Why?

Corresponding border in full image?
  No explanation in full image
  ⇒ probably stego ...

... but not certain

The remit of statistics

Statistics can estimate ‘normal’ behaviour
  and compare behaviours

Advantages
  Automated decisions
  Extract detail
  Exact, quantifiable features
  Aggregate measures

General Introduction Histogramme

A typical image

Image histogram made by imhist in Matlab
  Gives number of pixels per colour-value

And a stego-image

What happened?

Histogram of stego-image: More ragged
  Every other bar sticks out.
  Why?
  50.8% 1-s in the binary message.

What is characteristic?
Pairs of values

Consider colour 2i (i = 0, 1, ..., 127)
What happens under LSB embedding?
  2i → 2i, 2i + 1
  Never 2i → 2i − 1.
Likewise 2i + 1 → 2i, 2i + 1
(2i, 2i + 1) is a Pair of Values
A pixel in (2i, 2i + 1) before embedding
... is a pixel in (2i, 2i + 1) after embedding
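The pair-of-values property can be checked with a small sketch. It is written in Python with the standard library only (the lecture itself uses Matlab); the function name `lsb_embed` and the random toy "image" are assumptions made for illustration, not part of the course material.

```python
import random

def lsb_embed(pixels, bits):
    """Replace the least significant bit of each pixel with a message bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

random.seed(1)
cover = [random.randrange(256) for _ in range(1000)]   # toy 8-bit pixel values
message = [random.randrange(2) for _ in cover]          # random message bits
stego = lsb_embed(cover, message)

# The top 7 bits are untouched, so every pixel stays inside its pair (2i, 2i+1).
same_pair = all(c >> 1 == s >> 1 for c, s in zip(cover, stego))
print(same_pair)
```

Because the pair index `p >> 1` never changes, the number of pixels falling in each pair is the same before and after embedding; only the split within the pair moves.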

The χ2 test Pairs of Values

Pairs of Values
The statistic

Image X. Random variable $Y_k = \#\{(x, y) \mid X_{xy} = k\}$
The $Y_k$-s form the Histogramme.

Recall that (2l, 2l + 1) is a pair of values.
First 7 pixel bits determined by image colour
  i.e. which pair
Last bit (LSB) determined by message
  i.e. which half of the pair
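A minimal Python sketch of the histogramme and of the split of a pixel value into pair and LSB. The helper name `histogram` and the 2×3 example image are made up for illustration; in Matlab, imhist plays the role of the first function.

```python
def histogram(image, levels=256):
    """Y[k] = number of pixels (x, y) with image[x][y] == k."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts

image = [[12, 13, 13], [200, 12, 201]]   # made-up 2x3 grey-scale image
Y = histogram(image)

# First 7 bits -> which pair, last bit -> which half of the pair.
pair, lsb = divmod(201, 2)
print(Y[12], Y[13], pair, lsb)   # 2 2 100 1
```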

Pairs of Values
Expected behaviour

Sum $Y_{2l} + Y_{2l+1}$ unaffected by embedding.
For a random message
  Expect 50-50 split between 2l and 2l + 1
  i.e. $E(Y_{2l}) = \frac{1}{2}(Y_{2l} + Y_{2l+1})$

Can we make a statistic out of this?

The χ2 statistic

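$$S = \sum_{o \in \Omega} \frac{(F_o - E(F_o))^2}{E(F_o)} \qquad \text{(general } \chi^2 \text{ statistic)}$$

$$S = \sum_{l=0}^{127} \frac{\left(Y_{2l} - \frac{1}{2}(Y_{2l} + Y_{2l+1})\right)^2}{\frac{1}{2}(Y_{2l} + Y_{2l+1})} \qquad \text{(pairs of values)}$$

Definition

$$S_{\mathrm{PoV}} = \sum_{l=0}^{127} \frac{1}{2} \cdot \frac{(Y_{2l} - Y_{2l+1})^2}{Y_{2l} + Y_{2l+1}}$$

#Ω − 1 degrees of freedom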
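The definition translates almost directly into code. The following Python sketch is one possible reading, not the course's reference solution: the name `pov_statistic` is hypothetical, and skipping pairs that never occur (which avoids division by zero and lowers the degrees of freedom) is an assumption of this sketch.

```python
def pov_statistic(Y):
    """Return (S_PoV, degrees of freedom) for a histogram Y with 2*n bins."""
    s = 0.0
    used_pairs = 0
    for l in range(len(Y) // 2):
        total = Y[2 * l] + Y[2 * l + 1]
        if total == 0:
            continue                      # assumption: ignore pairs that never occur
        diff = Y[2 * l] - Y[2 * l + 1]
        s += 0.5 * diff * diff / total    # (1/2) (Y_2l - Y_2l+1)^2 / (Y_2l + Y_2l+1)
        used_pairs += 1
    return s, used_pairs - 1

Y = [0] * 256
Y[10], Y[11] = 30, 10     # unbalanced pair: contributes 0.5 * 20**2 / 40 = 5.0
Y[20], Y[21] = 25, 25     # balanced pair: contributes 0
s, dof = pov_statistic(Y)
print(s, dof)             # 5.0 1
```

A perfectly balanced histogram gives S = 0; the more the two halves of each pair differ, the larger S becomes.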

The χ2 PDF

The Pairs-of-Values χ2 Distribution

[Figure: χ2 PDF with 127 degrees of freedom. Shaded areas: red 2% probability; red plus green 5%; red plus green plus blue 10%.]

Cumulative Density Function (CDF): area under the curve

χ2 in Matlab

Defined in the Statistics toolbox
Simplified functions available on website:
  chi2cdf
  chi2pdf
  chi2inv

You may have to exclude pixel values which do not occur
  this may give fewer degrees of freedom

The χ2 test A visual approach

The p-value

Let S be a stochastic χ2 distributed variable
Let s be the observed χ2 statistic
Define p-value: p = P(S > s)

I.e. low p-value ⇒ s is unusually large
  Improbable if the image is a stegogramme.
  Conclusion: probably natural image
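Tail probabilities of the χ2 distribution can be computed without the Statistics toolbox. The Python sketch below evaluates the CDF through the regularised incomplete gamma function (series expansion for small arguments, continued fraction otherwise). The name `chi2_cdf` mirrors Matlab's chi2cdf but is our own implementation, and the observed value 160 is a made-up example.

```python
import math

def chi2_cdf(x, dof):
    """P(S < x) for S chi-square distributed with `dof` degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    log_front = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the regularised lower incomplete gamma function.
        term = total = 1.0 / a
        n = a
        while abs(term) > abs(total) * 1e-15:
            n += 1.0
            term *= z / n
            total += term
        return total * math.exp(log_front)
    # Continued fraction (modified Lentz method) for the upper tail.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - h * math.exp(log_front)

s = 160.0                     # hypothetical observed statistic
lower = chi2_cdf(s, 127)      # P(S < s)
upper = 1.0 - lower           # P(S > s)
print(lower, upper)
```

The two tails always sum to one, so either can be obtained from the other.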

Plots: No message

[Figure: χ2 statistic and p-value]

Plots: 30% of capacity

The χ2 test Hypothesis testing

The null hypothesis

Null hypothesis
  H0: The image X is a stegogramme.

Statistic with known distribution under H0
  S is χ2 distributed with 127 degrees of freedom.

We decide on a threshold T such that Pr(S > T | H0) is small

If the observed s > T, we reject H0.

Level of Significance

Before testing, choose desired level of significance α

Threshold T is taken such that Pr(S > T | H0) < α.
  If we observe S > T, we reject H0 at significance level α
  If we observe S < T, we cannot reject H0 at significance level α

Equivalently, compare the p-value against α
  p < α ⇒ Reject

Remark
If H0 is true, the probability that the hypothesis test gives the wrong conclusion is at most α.

Choosing the level of significance

Say you gather the data first, and then choose the level of significance.
  How does this influence the test?
  Error probability?

Tuning α to observations means you always reject the null hypothesis
  (a priori) error probability under H0 is 100%
  or bounded by the maximum α you would have accepted.

Level of significance is only meaningful if chosen in advance.

Common misconceptions

After the test, when we have or have not rejected H0
  The probability that H0 is correct is not α.
  The probability that H0 is false is not α either.

Remark
No simple relation between level of significance and the probability of any hypothesis being right or wrong.

In Matlab

Consider the relation between threshold and level of significance

  Pr(S > T | H0) < α

  alpha = 1 - chi2cdf(T, 127)
  T = chi2inv(1 - alpha, 127)

To plot the PDF
  X = 0:1:300;
  plot(X, chi2pdf(X, 127))
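Where chi2inv is not available, the relation between T and α can be approximated. The Python sketch below uses the Wilson-Hilferty cube-root approximation, which is a swapped-in technique and not the toolbox's algorithm; the function names `chi2_inv_wh` and `chi2_cdf_wh` are hypothetical. For 127 degrees of freedom the approximation is close, but it is still an approximation.

```python
from statistics import NormalDist

def chi2_inv_wh(prob, dof):
    """Approximate chi2inv(prob, dof) with the Wilson-Hilferty formula."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def chi2_cdf_wh(x, dof):
    """Approximate chi2cdf(x, dof) by inverting the same formula."""
    c = 2.0 / (9.0 * dof)
    z = ((x / dof) ** (1.0 / 3.0) - (1.0 - c)) / c ** 0.5
    return NormalDist().cdf(z)

alpha = 0.05
T = chi2_inv_wh(1 - alpha, 127)          # threshold for significance level alpha
alpha_back = 1 - chi2_cdf_wh(T, 127)     # recover alpha from the threshold
print(T, alpha_back)
```

Lowering α pushes the threshold T further into the upper tail.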

The χ2 test The error types

Hypothesis tests

Hypothesis testing is a recurring theme in statistics.
Typical hypotheses
  Treatment A makes patients recover more quickly than no treatment.
  The climate in South-East Britain is as warm today as it was 100 years ago.
  The image sent by Alice is a stegogramme.

When the hypothesis has been phrased, experiments can tell us whether it is plausible or not.

Hypothesis tests

Asymmetry of hypothesis testing

Treatment A makes patients recover more quickly than no treatment.
  (This is the claim under test; H0 is that the treatment makes no difference.)

One error is more serious than another.
  Type I: Accepting the hypothesis when it is wrong (H0 rejected although true)
    Patients get ineffective (or unhealthy) medicine.
  Type II: Rejecting the hypothesis when it is right (H0 retained although false)
    More research will be made to optimise the treatment.

            H0 retained      H0 rejected
H0 true     No error         Error Type I
H0 false    Error Type II    No error
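The table can be read as a lookup from (truth, decision) to outcome. A small Python sketch, with a made-up log of test results and a hypothetical helper `classify`:

```python
def classify(h0_true, h0_rejected):
    """Name the outcome of one hypothesis test, following the error table."""
    if h0_true and h0_rejected:
        return "Type I"
    if not h0_true and not h0_rejected:
        return "Type II"
    return "No error"

# Hypothetical test log: (H0 actually true?, H0 rejected by the test?)
log = [(True, False), (True, True), (False, False), (False, True), (True, False)]
counts = {"No error": 0, "Type I": 0, "Type II": 0}
for truth, rejected in log:
    counts[classify(truth, rejected)] += 1
print(counts)   # {'No error': 3, 'Type I': 1, 'Type II': 1}
```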

The weirdness of the steganalysis

H0: The message is a stegogramme.

We consider it (implicitly) serious to declare the message innocent when it is a stegogramme.
  Why?
  Makes strong surveillance regime.
  Might be appropriate for prison scenario.

Real reason
  Probability distribution known only for stegogrammes.
  We require known distribution under H0.

Calculating probability of Type I Errors

Definition
A Type I Error is the event that
  H0 is true; and
  H0 is rejected.

What is the error rate?
We want to calculate the conditional probability

  Pr(Reject H0 | H0) = Pr(S > T | H0).

Because of H0, the distribution of S is known.
Hence the error probability can be looked up.

Type II Errors

In theory: Similar to Type I Errors.
In practice: What is the distribution of S when H0 is false?
  Do we know this distribution at all?

Remark
Very often, we will not know the error probability.

Type II Errors

A problem of the χ2 test

Accusing Alice of sending a stegogramme when she is not is called a false positive.
Suppose false positives are a serious matter.
How can we limit the risk of false positives?
False positives are Type II Errors.
  Distribution when H0 is false is unknown

Remark
We cannot (theoretically) bound the probability of false positives in the χ2 test.

Postlogue Generalised χ2 test

Randomised location

PoV assumes embedding in consecutive bits
Generalised χ2 proposes a fix
Fridrich et al. (2003) suggest an implementation
No rigorous hypothesis test or statistical theory
  works experimentally

Postlogue Summary

Summary

Steganalysis can be cast as a problem of statistics
  standard statistical theory applies

The Pairs-of-Values χ2 test is a simple example
The weekly exercise is to implement and test this steganalysis technique.
  See website for detailed assignment.
