G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data...

50
G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN – Latin-American School on High Energy Physics Natal, Brazil, 4 April 2011 Glen Cowan Physics Department Royal Holloway, University of London [email protected] www.pp.rhul.ac.uk/~cowan

Transcript of G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data...

Page 1: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1

Topics in Statistical Data Analysis for HEPLecture 1: Parameter Estimation

CERN – Latin-American Schoolon High Energy Physics

Natal, Brazil, 4 April 2011

Glen CowanPhysics DepartmentRoyal Holloway, University of [email protected]/~cowan

Page 2: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 2

Outline

Lecture 1: Introduction and basic formalismProbabilityParameter estimationStatistical tests

Lecture 2: Statistics for making a discoveryMultivariate methodsDiscovery significance and sensitivitySystematic uncertainties

Page 3: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 3

A definition of probability

Consider a set S with subsets A, B, ...

Kolmogorovaxioms (1933)

Also define conditional probability:

Page 4: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 4

Interpretation of probabilityI. Relative frequency

A, B, ... are outcomes of a repeatable experiment

cf. quantum mechanics, particle scattering, radioactive decay...

II. Subjective probabilityA, B, ... are hypotheses (statements that are true or false)

• Both interpretations consistent with Kolmogorov axioms.• In particle physics frequency interpretation often most useful, but subjective probability can provide more natural treatment of non-repeatable phenomena: systematic uncertainties, probability that Higgs boson exists,...

Page 5: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 5

Bayes’ theoremFrom the definition of conditional probability we have

and

but , so

Bayes’ theorem

First published (posthumously) by theReverend Thomas Bayes (1702−1761)

An essay towards solving a problem in thedoctrine of chances, Philos. Trans. R. Soc. 53(1763) 370; reprinted in Biometrika, 45 (1958) 293.

Page 6: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 6

Frequentist Statistics − general philosophy In frequentist statistics, probabilities are associated only withthe data, i.e., outcomes of repeatable observations.

Probability = limiting frequency

Probabilities such as

P (Higgs boson exists), P (0.117 < s < 0.121),

etc. are either 0 or 1, but we don’t know which.The tools of frequentist statistics tell us what to expect, underthe assumption of certain probabilities, about hypotheticalrepeated observations.

The preferred theories (models, hypotheses, ...) are those for which our observations would be considered ‘usual’.

Page 7: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 7

Frequentist approach to parameter estimationThe parameters of a probability distribution function (pdf) are constants that characterize its shape, e.g.

random variable

Suppose we have a sample of observed values:

parameter

We want to find some function of the data to estimate the parameter(s):

← estimator written with a hat

Page 8: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 8

Properties of estimatorsIf we were to repeat the entire measurement, the estimatesfrom each would follow a pdf:

biasedlargevariance

best

We want small (or zero) bias (systematic error):

→ average of repeated measurements should tend to true value.

And we want a small variance (statistical error):

→ small bias & variance are in general conflicting criteria

Page 9: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 9

The likelihood functionSuppose the entire result of an experiment (set of measurements)is a collection of numbers x, and suppose the joint pdf forthe data x is a function that depends on a set of parameters :

Now evaluate this function with the data obtained andregard it as a function of the parameter(s). This is the likelihood function:

(x constant)

Page 10: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 10

Maximum likelihood estimatorsIf the hypothesized is close to the true value, then we expect a high probability to get data like that which we actually found.

So we define the maximum likelihood (ML) estimator(s) to be the parameter value(s) for which the likelihood is maximum.

ML estimators not guaranteed to have any ‘optimal’properties, (but in practice they’re very good).

Page 11: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 11

Example: fitting a straight line

Data:

Model: measured yi independent, Gaussian:

assume xi and i known.

Goal: estimate 0

(don’t care about 1).

Page 12: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 12

Maximum likelihood fit with Gaussian data

In this example, the yi are assumed independent, so thelikelihood function is a product of Gaussians:

Maximizing the likelihood is here equivalent to minimizing

i.e., for Gaussian data, ML same as Method of Least Squares (LS)

Page 13: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 13

Correlation between

causes errors

to increase.

Standard deviations from

tangent lines to contour

Variance of estimators

Several methods possible for obtaining (co)variances (effectively, the “statistical errors” of the estimates).

Page 14: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 14

The information on 1

improves accuracy of

Frequentist case with a measurement t1 of 1

Page 15: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 15

Bayesian Statistics − general philosophy In Bayesian statistics, interpretation of probability extended todegree of belief (subjective probability). Use this for hypotheses:

posterior probability, i.e., after seeing the data

prior probability, i.e.,before seeing the data

probability of the data assuming hypothesis H (the likelihood)

normalization involves sum over all possible hypotheses

Bayesian methods can provide more natural treatment of non-repeatable phenomena: systematic uncertainties, probability that Higgs boson exists,...

No golden rule for priors (“if-then” character of Bayes’ thm.)

Page 16: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 16

Bayesian methodWe need to associate prior probabilities with 0 and 1, e.g.,

Putting this into Bayes’ theorem gives:

posterior likelihood prior

← based on previous measurement

reflects ‘prior ignorance’, in anycase much broader than

Page 17: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 17

Bayesian method (continued)

Usually need numerical methods (e.g. Markov Chain MonteCarlo) to do integral.

We then integrate (marginalize) p(0, 1 | x) to find p(0 | x):

In this example we can do the integral (rare). We find

Page 18: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 18

Digression: marginalization with MCMCBayesian computations involve integrals like

often high dimensionality and impossible in closed form,also impossible with ‘normal’ acceptance-rejection Monte Carlo.

Markov Chain Monte Carlo (MCMC) has revolutionizedBayesian computation.

MCMC (e.g., Metropolis-Hastings algorithm) generates correlated sequence of random numbers:

cannot use for many applications, e.g., detector MC;effective stat. error greater than naive √n .

Basic idea: sample multidimensional look, e.g., only at distribution of parameters of interest.

Page 19: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 19

Although numerical values of answer here same as in frequentistcase, interpretation is different (sometimes unimportant?)

Example: posterior pdf from MCMCSample the posterior pdf from previous example with MCMC:

Summarize pdf of parameter ofinterest with, e.g., mean, median,standard deviation, etc.

Page 20: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 20

MCMC basics: Metropolis-Hastings algorithmGoal: given an n-dimensional pdf

generate a sequence of points

1) Start at some point

2) Generate

Proposal densitye.g. Gaussian centredabout

3) Form Hastings test ratio

4) Generate

5) If

else

move to proposed point

old point repeated

6) Iterate

Page 21: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 21

Metropolis-Hastings (continued)This rule produces a correlated sequence of points (note how each new point depends on the previous one).

For our purposes this correlation is not fatal, but statisticalerrors larger than naive

The proposal density can be (almost) anything, but chooseso as to minimize autocorrelation. Often take proposaldensity symmetric:

Test ratio is (Metropolis-Hastings):

I.e. if the proposed step is to a point of higher , take it;

if not, only take the step with probability

If proposed step rejected, hop in place.

Page 22: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 22

Bayesian method with alternative priorsSuppose we don’t have a previous measurement of 1 but rather, e.g., a theorist says it should be positive and not too much greaterthan 0.1 "or so", i.e., something like

From this we obtain (numerically) the posterior pdf for 0:

This summarizes all knowledge about 0.

Look also at result from variety of priors.

Page 23: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 23

Introduction to hypothesis testingA hypothesis H specifies the probability for the data, i.e., the outcome of the observation, here symbolically: x.

x could be uni-/multivariate, continuous or discrete.

E.g. write x ~ f (x|H).

x could represent e.g. observation of a single particle, a single event, or an entire “experiment”.

Possible values of x form the sample space S (or “data space”).

Simple (or “point”) hypothesis: f (x|H) completely specified.

Composite hypothesis: H contains unspecified parameter(s).

The probability for x given H is also called the likelihood ofthe hypothesis, written L(x|H).

Page 24: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 24

Definition of a testConsider e.g. a simple hypothesis H0 and alternative H1.

A test of H0 is defined by specifying a critical region W of thedata space such that there is no more than some (small) probability, assuming H0 is correct, to observe the data there, i.e.,

P(x W | H0 ) ≤

If x is observed in the critical region, reject H0.

is called the size or significance level of the test.

Critical region also called “rejection” region; complement isacceptance region.

Page 25: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 25

Definition of a test (2)But in general there are an infinite number of possible critical regions that give the same significance level .

So the choice of the critical region for a test of H0 needs to take into account the alternative hypothesis H1.

Roughly speaking, place the critical region where there is a low probability to be found if H0 is true, but high if H1 is true:

Page 26: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 26

Rejecting a hypothesisNote that rejecting H0 is not necessarily equivalent to thestatement that we believe it is false and H1 true. In frequentiststatistics only associate probability with outcomes of repeatableobservations (the data).

In Bayesian statistics, probability of the hypothesis (degreeof belief) would be found using Bayes’ theorem:

which depends on the prior probability (H).

What makes a frequentist test useful is that we can computethe probability to accept/reject a hypothesis assuming that itis true, or assuming some alternative is true.

Page 27: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 27

Physics context of a statistical test

Event Selection: the event types in question are both known to exist.

Example: separation of different particle types (electron vs muon)or known event types (ttbar vs QCD multijet).Use the selected sample for further study.

Search for New Physics: the null hypothesis H0 means Standard Model events, and the alternative H1 means "events of a type whose existence is not yet established" (to establish or exclude the signal model is the goalof the analysis).

Many subtle issues here, mainly related to the heavy burdenof proof required to establish presence of a new phenomenon.

The optimal statistical test for a search is closely related to that used for event selection.

Page 28: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 28

Suppose we want to discover this…

high pT

muons

high pT jets

of hadrons

missing transverse energy

p p

SUSY event (ATLAS simulation):

Page 29: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 29

But we know we’ll have lots of this…

SM event also has high p

T jets and muons, and

missing transverse energy.

→ can easily mimic a SUSY event and thus constitutes abackground.

ttbar event (ATLAS simulation)

Page 30: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 30

For each reaction we consider we will have a hypothesis for thepdf of , e.g.,

Example of a multivariate statistical testSuppose the result of a measurement for an individual event is a collection of numbers

x1 = number of muons,

x2 = mean pt of jets,

x3 = missing energy, ...

follows some n-dimensional joint pdf, which depends on the type of event produced, i.e., was it

etc.

Often call H0 the background hypothesis (e.g. SM events);H1, H2, ... are possible signal hypotheses.

Page 31: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 31

Defining a multivariate critical region

Each event is a point in x-space; critical region is now definedby a ‘decision boundary’ in this space.

What is best way to determine the decision boundary?

W

H1

H0

Perhaps with ‘cuts’:

Page 32: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 32

Other multivariate decision boundaries

Or maybe use some other sort of decision boundary:

WH1

H0

W

H1

H0

linear or nonlinear

Page 33: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 33

Test statisticsThe decision boundary can be defined by an equation of the form

We can work out the pdfs

Decision boundary is now a single ‘cut’ on t, defining the critical region.

So for an n-dimensional problem we have a corresponding 1-d problem.

where t(x1,…, xn) is a scalar test statistic.

Page 34: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 34

Significance level and power

Probability to reject H0 if it is true (type-I error):

(significance level)

Probability to accept H0 if H1 is true (type-II error):

( power)

Page 35: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 35

Signal/background efficiency

Probability to reject background hypothesis for background event (background efficiency):

Probability to accept a signal eventas signal (signal efficiency):

Page 36: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 36

Purity of event selectionSuppose only one background type b; overall fractions of signaland background events are s and b (prior probabilities).

Suppose we select signal events with t > tcut. What is the‘purity’ of our selected sample?

Here purity means the probability to be signal given thatthe event was accepted. Using Bayes’ theorem we find:

So the purity depends on the prior probabilities as well as on thesignal and background efficiencies.

Page 37: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 37

Constructing a test statisticHow can we choose a test’s critical region in an ‘optimal way’?

Neyman-Pearson lemma states:

To get the highest power for a given significance level in a testH0, (background) versus H1, (signal) (highest s for a given b)choose the critical (rejection) region such that

where c is a constant which determines the power.

Equivalently, optimal scalar test statistic is

N.B. any monotonic function of this is leads to the same test.

Page 38: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 38

Testing significance / goodness-of-fit

Suppose hypothesis H predicts pdf

observations

for a set of

We observe a single point in this space:

What can we say about the validity of H in light of the data?

Decide what part of the data space represents less compatibility with H than does the point less

compatiblewith H

more compatiblewith H

(Not unique!)

Page 39: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 39

p-values

where (H) is the prior probability for H.

Express level of agreement between data and H with p-value:

p = probability, under assumption of H, to observe data with equal or lesser compatibility with H relative to the data we got.

This is not the probability that H is true!

In frequentist statistics we don’t talk about P(H) (unless H represents a repeatable observation). In Bayesian statistics we do; use Bayes’ theorem to obtain

For now stick with the frequentist approach; result is p-value, regrettably easy to misinterpret as P(H).

Page 40: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 40

p-value example: testing whether a coin is ‘fair’

i.e. p = 0.0026 is the probability of obtaining such a bizarreresult (or more so) ‘by chance’, under the assumption of H.

Probability to observe n heads in N coin tosses is binomial:

Hypothesis H: the coin is fair (p = 0.5).

Suppose we toss the coin N = 20 times and get n = 17 heads.

Region of data space with equal or lesser compatibility with H relative to n = 17 is: n = 17, 18, 19, 20, 0, 1, 2, 3. Addingup the probabilities for these values gives:

Page 41: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan 41

Significance from p-value

Often define significance Z as the number of standard deviationsthat a Gaussian variable would fluctuate in one directionto give the same p-value.

1 - TMath::Freq

TMath::NormQuantile

CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1

Page 42: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 42

The significance of an observed signalSuppose we observe n events; these can consist of:

nb events from known processes (background)ns events from a new process (signal)

If ns, nb are Poisson r.v.s with means s, b, then n = ns + nb

is also Poisson, mean = s + b:

Suppose b = 0.5, and we observe nobs = 5. Should we claimevidence for a new discovery?

Give p-value for hypothesis s = 0:

Page 43: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan page 43

When to publish

HEP folklore is to claim discovery when p = 2.9 10,corresponding to a significance Z = 5.

This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g.,

phenomenon reasonable p-value for discoveryD0D0 mixing ~0.05Higgs ~ 107 (?)Life on Mars ~10

Astrology

One should also consider the degree to which the data arecompatible with the new phenomenon, and the reliability ofthe model on which the p-value is based, not only the level ofdisagreement with the null hypothesis; p-value is only first step!

CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1

Page 44: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan page 44

Distribution of the p-valueThe p-value is a function of the data, and is thus itself a randomvariable with a given distribution. Suppose the p-value of H is found from a test statistic t(x) as

CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1

The pdf of pH under assumption of H is

In general for continuous data, under assumption of H, pH ~ Uniform[0,1]and is concentrated toward zero for Some (broad) class of alternatives. pH

g(pH|H)

0 1

g(pH|H′)

Page 45: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan page 45

Using a p-value to define test of H0

So the probability to find the p-value of H0, p0, less than is

CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1

We started by defining critical region in the original dataspace (x), then reformulated this in terms of a scalar test statistic t(x).

We can take this one step further and define the critical region of a test of H0 with size as the set of data space where p0 ≤ .

Formally the p-value relates only to H0, but the resulting test willhave a given power with respect to a given alternative H1.

Page 46: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan page 46

Inverting a test to obtain a confidence interval

CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1

Page 47: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 47

Page 48: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 48

Summary of lecture 1

Page 49: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 49

Extra slides

Page 50: G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 1 Topics in Statistical Data Analysis for HEP Lecture 1: Parameter Estimation CERN.

G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 1 page 50

Some Bayesian references P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences, CUP, 2005

D. Sivia, Data Analysis: a Bayesian Tutorial, OUP, 2006

S. Press, Subjective and Objective Bayesian Statistics: Principles, Models and Applications, 2nd ed., Wiley, 2003

A. O’Hagan, Kendall’s, Advanced Theory of Statistics, Vol. 2B, Bayesian Inference, Arnold Publishers, 1994

A. Gelman et al., Bayesian Data Analysis, 2nd ed., CRC, 2004

W. Bolstad, Introduction to Bayesian Statistics, Wiley, 2004

E.T. Jaynes, Probability Theory: the Logic of Science, CUP, 2003