G. Cowan ATLAS UK Cambridge 1 May 2012 / Statistics for SUSY/Exotics 1
Statistics for SUSY/Exotics
ATLAS UK SUSY/Exotics
Cambridge, 1 May, 2012
Glen Cowan Physics Department Royal Holloway, University of London [email protected] www.pp.rhul.ac.uk/~cowan
Outline Some developments for Bayesian methods
Bayesian limits Reference priors Bayes factors for discovery
Discovery sensitivity: beyond s/√b
Some comments on unfolding
Quick review of probability
The Bayesian approach to limits In Bayesian statistics one needs to start with a ‘prior pdf’ π(θ); this reflects the degree of belief about θ before doing the experiment.
Bayes’ theorem tells how our beliefs should be updated in light of the data x: p(θ | x) = L(x | θ) π(θ) / ∫ L(x | θ′) π(θ′) dθ′.
Integrate posterior pdf p(θ | x) to give interval with any desired probability content.
For e.g. n ~ Poisson(s + b), the 95% CL upper limit s_up on s follows from 0.95 = ∫_0^(s_up) p(s | n) ds.
Bayesian prior for Poisson parameter Include knowledge that s ≥ 0 by setting the prior π(s) = 0 for s < 0.
Bayesian interval with flat prior for s Solve numerically to find the limit s_up.
For special case b = 0, Bayesian upper limit with flat prior numerically same as one-sided frequentist case (‘coincidence’).
Otherwise Bayesian limit is everywhere greater than the one-sided frequentist limit, and here (Poisson problem) it coincides with the CLs limit.
Never goes negative.
Doesn’t depend on b if n = 0.
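As a quick illustration (not from the original slides), the flat-prior Bayesian upper limit can be computed by brute-force numerical integration of the posterior; the cutoff s_max and grid size below are arbitrary choices:

```python
import math

def log_poisson(n, lam):
    """Log of the Poisson probability P(n | lam)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def bayes_upper_limit(n, b, cl=0.95, s_max=50.0, steps=100000):
    """Bayesian upper limit on s for n ~ Poisson(s + b) with a flat
    prior pi(s) = const for s >= 0, by direct numerical integration."""
    ds = s_max / steps
    post = [math.exp(log_poisson(n, i * ds + b)) if i * ds + b > 0 else 0.0
            for i in range(steps + 1)]
    norm = sum(post) * ds
    cum = 0.0
    for i, p in enumerate(post):
        cum += p * ds
        if cum >= cl * norm:
            return i * ds
    raise ValueError("increase s_max")
```

For n = 0 this returns s_up = -ln(0.05), about 3.0, regardless of b, illustrating the last two bullets above.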
Priors from formal rules Because of difficulties in encoding a vague degree of belief in a prior, one often attempts to derive the prior from formal rules, e.g., to satisfy certain invariance principles or to provide maximum information gain for a certain set of measurements.
Often called “objective priors”; they form the basis of Objective Bayesian Statistics.
The priors do not reflect a degree of belief (but might represent possible extreme cases).
In Objective Bayesian analysis, can use the intervals in a frequentist way, i.e., regard Bayes’ theorem as a recipe to produce an interval with certain coverage properties.
Priors from formal rules (cont.) For a review of priors obtained by formal rules see, e.g.,
Formal priors have not been widely used in HEP, but there is recent interest in this direction, especially the reference priors of Bernardo and Berger; see e.g.
L. Demortier, S. Jain and H. Prosper, Reference priors for high energy physics, Phys. Rev. D 82 (2010) 034002, arXiv:1002.1111.
D. Casadei, Reference analysis of the signal + background model in counting experiments, JINST 7 (2012) P01012; arXiv:1108.4270.
Casadei’s approach: describe nuisance parameters with informative gamma priors and marginalize; from this find the Jeffreys’ prior for the (single) parameter of interest.
Jeffreys’ prior According to Jeffreys’ rule, take the prior according to π(θ) ∝ √det I(θ), where I_ij(θ) = −E[ ∂² ln L(x | θ) / ∂θ_i ∂θ_j ] is the Fisher information matrix.
One can show that this leads to inference that is invariant under a transformation of parameters.
For a Gaussian mean, the Jeffreys’ prior is constant; for a Poisson mean µ it is proportional to 1/√µ.
Jeffreys’ prior for Poisson mean Suppose n ~ Poisson(µ). To find the Jeffreys’ prior for µ, compute the Fisher information: I(µ) = −E[ ∂² ln P(n | µ) / ∂µ² ] = E[n]/µ² = 1/µ, so π(µ) ∝ √I(µ) = 1/√µ.
So e.g. for µ = s + b, this means the prior π(s) ~ 1/√(s + b), which depends on b. Note this is not designed as a degree of belief about s.
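The statement I(µ) = 1/µ behind the 1/√µ prior can be checked with a small Monte Carlo sketch (the sampler and sample sizes are illustrative choices):

```python
import math, random

def sample_poisson(rng, mu):
    """Poisson sampler via Knuth's multiplication method (fine for small mu)."""
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def fisher_info_mc(mu, n_trials=200000, seed=1):
    """MC estimate of I(mu) = E[(d ln P(n|mu)/dmu)^2] = E[(n/mu - 1)^2]."""
    rng = random.Random(seed)
    return sum((sample_poisson(rng, mu) / mu - 1.0) ** 2
               for _ in range(n_trials)) / n_trials
```

fisher_info_mc(4.0) comes out close to 1/4, consistent with √I(µ) ∝ 1/√µ.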
Bayesian limit on s with uncertainty on b The uncertainty on b goes into the prior, e.g., a prior pdf π_b(b) reflecting that uncertainty. Put this into Bayes’ theorem: p(s, b | n) ∝ P(n | s + b) π_s(s) π_b(b). Marginalize over the nuisance parameter b, p(s | n) = ∫ p(s, b | n) db, then use p(s|n) to find intervals for s with any desired probability content.
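A minimal numerical sketch of this marginalization, assuming for illustration a truncated-Gaussian prior for b (the prior choice and grid sizes are hypothetical):

```python
import math

def log_poisson(n, lam):
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def marginal_posterior(s, n, b0, sig_b, nb=400):
    """p(s|n) up to normalization: integrate P(n|s+b) * pi_b(b) over b,
    with pi_b a Gaussian of mean b0, width sig_b, truncated at b >= 0."""
    lo, hi = max(0.0, b0 - 5 * sig_b), b0 + 5 * sig_b
    db = (hi - lo) / nb
    acc = 0.0
    for i in range(nb + 1):
        b = lo + i * db
        if s + b > 0:
            acc += math.exp(log_poisson(n, s + b)
                            - 0.5 * ((b - b0) / sig_b) ** 2)
    return acc * db

def upper_limit(n, b0, sig_b, cl=0.95, s_max=30.0, steps=3000):
    """Upper limit on s from the marginal posterior (flat prior on s)."""
    ds = s_max / steps
    post = [marginal_posterior(i * ds, n, b0, sig_b) for i in range(steps + 1)]
    norm = sum(post) * ds
    cum = 0.0
    for i, p in enumerate(post):
        cum += p * ds
        if cum >= cl * norm:
            return i * ds
    return s_max
```

For n = 0 the marginal posterior still factorizes as e^(−s) times a constant, so the limit stays near 3.0 whatever the assumed b0 and sig_b.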
Digression: marginalization with MCMC Bayesian computations involve integrals like p(s | n) = ∫ p(s, b | n) db, often of high dimensionality and impossible in closed form, also impossible with ‘normal’ acceptance-rejection Monte Carlo.
Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation.
MCMC (e.g., the Metropolis-Hastings algorithm) generates a correlated sequence of random numbers θ1, θ2, θ3, ...
cannot use for many applications, e.g., detector MC; the effective statistical error is greater than if all values were independent.
Basic idea: sample the full multidimensional parameter space, then look, e.g., only at the distribution of the parameters of interest.
Although the numerical value of the answer here is the same as in the frequentist case, the interpretation is different (sometimes unimportant?).
Example: posterior pdf from MCMC Sample the posterior pdf from previous example with MCMC:
Summarize pdf of parameter of interest with, e.g., mean, median, standard deviation, etc.
Bayesian model selection (‘discovery’)
(Figure: competing hypotheses, e.g., “no Higgs” vs. “Higgs”.)
The probability of hypothesis H0 relative to an alternative H1 is often given by the posterior odds: P(H0 | x) / P(H1 | x) = B01 × P(H0) / P(H1), where B01 = P(x | H0) / P(x | H1) is the Bayes factor and P(H0)/P(H1) are the prior odds.
The Bayes factor is regarded as measuring the weight of evidence of the data in support of H0 over H1.
Interchangeably use B10 = 1/B01
Rewriting the Bayes factor Suppose we have models Hi, i = 0, 1, ..., each with a likelihood P(x | θi, Hi) and a prior pdf for its internal parameters πi(θi), so that the full prior is π(θi, Hi) = P(Hi) πi(θi), where P(Hi) is the overall prior probability for Hi. The Bayes factor comparing Hi and Hj can be written Bij = ∫ P(x | θi, Hi) πi(θi) dθi / ∫ P(x | θj, Hj) πj(θj) dθj.
Bayes factor is independent of P(Hi) For Bij we need the posterior probabilities marginalized over all of the internal parameters of the models: P(Hi | x) = P(Hi) ∫ P(x | θi, Hi) πi(θi) dθi / P(x) (using Bayes’ theorem). The Bayes factor is therefore Bij = [P(Hi | x) / P(Hj | x)] / [P(Hi) / P(Hj)] = ∫ P(x | θi, Hi) πi(θi) dθi / ∫ P(x | θj, Hj) πj(θj) dθj. The prior probabilities P(Hi) cancel: the Bayes factor is a ratio of marginal likelihoods.
Numerical determination of Bayes factors Both numerator and denominator of Bij are of the form m = ∫ L(x | θ) π(θ) dθ (the ‘marginal likelihood’).
Various ways to compute these, e.g., using sampling of the posterior pdf (which we can do with MCMC).
Harmonic Mean (and improvements) Importance sampling Parallel tempering (~thermodynamic integration) Nested sampling (MultiNest) ...
See e.g.
Assessing Bayes factors One can use the Bayes factor much like a p-value (or Z value).
There is an “established” scale, analogous to HEP's 5σ rule:

  B10         Evidence against H0
  ----------  ----------------------------------
  1 to 3      Not worth more than a bare mention
  3 to 20     Positive
  20 to 150   Strong
  > 150       Very strong
Kass and Raftery, Bayes Factors, J. Am. Stat. Assoc. 90 (1995) 773.
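A toy Bayes factor for the counting experiment of the earlier slides (not from the slides themselves): H0 is n ~ Poisson(b), H1 is n ~ Poisson(s + b) with, for illustration, a flat prior for s on [0, s_max]. Note that B10 depends on this prior choice:

```python
import math

def log_poisson(n, lam):
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def bayes_factor_10(n, b, s_max=20.0, steps=20000):
    """B10 = m1/m0: ratio of the marginal likelihoods of H1 and H0."""
    ds = s_max / steps
    # m1: integrate the likelihood over s against the flat prior 1/s_max
    m1 = sum(math.exp(log_poisson(n, i * ds + b))
             for i in range(steps + 1)) * ds / s_max
    m0 = math.exp(log_poisson(n, b))  # H0 has no free parameters
    return m1 / m0
```

With b = 5, an observation of n = 5 gives B10 < 1 (no evidence for signal), while n = 15 gives B10 well above 150 (“very strong” on the scale above).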
The Look-Elsewhere Effect
Gross and Vitells, Eur. Phys. J. C 70 (2010) 525-530; arXiv:1005.1891
Suppose a model for a mass distribution allows for a peak at a mass m with amplitude µ.
The data show a bump at a mass m0.
How consistent is this with the no-bump (µ = 0) hypothesis?
p-value for fixed mass First, suppose the mass m0 of the peak was specified a priori.
Test consistency of the bump with the no-signal (µ = 0) hypothesis with, e.g., the likelihood ratio tfix = −2 ln [ L(0) / L(µ̂) ], where “fix” indicates that the mass of the peak is fixed to m0.
The resulting p-value, pfix = P(tfix ≥ tfix,obs | µ = 0), gives the probability to find a value of tfix at least as great as observed at the specific mass m0.
p-value for floating mass But suppose we did not know where in the distribution to expect a peak.
What we want is the probability to find a peak at least as significant as the one observed anywhere in the distribution.
Include the mass as an adjustable parameter in the fit, and test the significance of the peak using tfloat = −2 ln [ L(0) / L(µ̂, m̂) ], where both µ and m are fitted.
(Note m does not appear in the µ = 0 model.)
Distributions of tfix, tfloat For a sufficiently large data sample, tfix ~ chi-square for 1 degree of freedom (Wilks’ theorem).
For tfloat there are two adjustable parameters, µ and m, and naively Wilks’ theorem says tfloat ~ chi-square for 2 d.o.f.
In fact Wilks’ theorem does not hold in the floating mass case because one of the parameters (m) is not defined in the µ = 0 model.
So getting tfloat distribution is more difficult.
Gross and Vitells
Approximate correction for LEE We would like to be able to relate the p-values for the fixed and floating mass analyses (at least approximately).
Gross and Vitells show the p-values are approximately related by pfloat ≈ pfix + 〈N(c)〉, where 〈N(c)〉 is the mean number of “upcrossings” of −2 ln λ(0) in the fit range above a threshold c = tfix = Zfix², and where Zfix is the significance for the fixed mass case. So we can either carry out the full floating-mass analysis (e.g., use MC to get the p-value), or do the fixed mass analysis and apply a correction factor (much faster than MC).
Gross and Vitells
Upcrossings of -2lnλ
〈N(c)〉 can be estimated from MC (or the real data) using a much lower threshold c0: 〈N(c)〉 ≈ 〈N(c0)〉 e^(−(c − c0)/2).
Gross and Vitells
The Gross-Vitells formula for the trials factor requires 〈N(c)〉, the mean number of “upcrossings” of −2 ln λ(0) in the fit range above the threshold c = tfix = Zfix².
In this way 〈N(c)〉 can be estimated without need of large MC samples, even if the threshold c is quite high.
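A sketch of how this is done in practice (function and variable names are mine, not from the paper): count upcrossings of a scanned test statistic above a low threshold c0, then extrapolate with the exponential factor for a chi-square (1 d.o.f.) field:

```python
import math

def count_upcrossings(scan, c):
    """Number of upward crossings of threshold c in a scan of the
    test statistic q(m) evaluated on a grid of masses."""
    return sum(1 for lo, hi in zip(scan, scan[1:]) if lo < c <= hi)

def p_float_approx(p_fix, z_fix, n_up_c0, c0):
    """Gross-Vitells: p_float ~ p_fix + <N(c)>, with
    <N(c)> = <N(c0)> * exp(-(c - c0)/2) and c = z_fix**2."""
    c = z_fix ** 2
    return p_fix + n_up_c0 * math.exp(-(c - c0) / 2.0)
```

The scan would be obtained by evaluating −2 ln λ(0) on a grid of masses in MC (or in the data, for a low c0).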
Multidimensional look-elsewhere effect Generalization to multiple dimensions: number of upcrossings replaced by expectation of Euler characteristic:
Applications: astrophysics (coordinates on sky), search for resonance of unknown mass and width, ...
Vitells and Gross, Astropart. Phys. 35 (2011) 230-234; arXiv:1105.4355
Remember the Look-Elsewhere Effect arises when we test a single model (e.g., the SM) with multiple observations, i.e., in multiple places.
Note there is no look-elsewhere effect when considering exclusion limits. There we test specific signal models (typically once) and say whether each is excluded.
With exclusion there is, however, the analogous issue of testing many signal models (or parameter values) and thus excluding some even in the absence of signal (“spurious exclusion”).
Approximate correction for LEE should be sufficient, and one should also report the uncorrected significance.
“There's no sense in being precise when you don't even know what you're talking about.” –– John von Neumann
Summary on Look-Elsewhere Effect
Discovery significance for n ~ Poisson(s + b)
Consider the frequently seen case where we observe n events, modelled as following a Poisson distribution with mean s + b (assume b is known).
1) For an observed n, what is the significance Z0 with which we would reject the s = 0 hypothesis?
2) What is the expected (or more precisely, median) Z0 if the true value of the signal rate is s?
Gaussian approximation for Poisson significance For large s + b, n → x ~ Gaussian(µ, σ), µ = s + b, σ = √(s + b).
For observed value xobs, the p-value of s = 0 is Prob(x > xobs | s = 0): p = 1 − Φ( (xobs − b)/√b ).
The significance for rejecting s = 0 is therefore Z0 = Φ⁻¹(1 − p) = (xobs − b)/√b.
The expected (median) significance assuming signal rate s is median[Z0 | s + b] = s/√b.
Better approximation for Poisson significance
The likelihood function for the parameter s is L(s) = (s + b)^n e^(−(s+b)) / n!, or equivalently the log-likelihood is ln L(s) = n ln(s + b) − (s + b) + const. Finding the maximum by setting d ln L / ds = n/(s + b) − 1 = 0 gives the estimator for s: ŝ = n − b.
Approximate Poisson significance (continued) The likelihood ratio statistic for testing s = 0 is q0 = −2 ln [ L(0) / L(ŝ) ] = 2 [ n ln(n/b) − (n − b) ] for ŝ ≥ 0 (and q0 = 0 otherwise).
For sufficiently large s + b, Z0 = √q0 (use Wilks’ theorem).
To find median[Z0 | s + b], let n → s + b (i.e., the Asimov data set): median[Z0] = √( 2 [ (s + b) ln(1 + s/b) − s ] ).
This reduces to s/√b for s « b.
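The two approximations side by side, as a direct transcription of the formulas above:

```python
import math

def z_asimov(s, b):
    """Median discovery significance from the Asimov data set (n -> s + b)."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

def z_naive(s, b):
    """The simple s/sqrt(b) figure of merit."""
    return s / math.sqrt(b)
```

For s = 5, b = 1 the naive formula gives 5.0 while the Asimov formula gives about 3.4; for s « b the two agree.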
n ~ Poisson(µs + b): median significance, assuming µ = 1, of the hypothesis µ = 0.
“Exact” values from MC; jumps are due to the discreteness of the data. The Asimov √q0,A is a good approximation for a broad range of s, b; s/√b is only good for s « b.
CCGV, arXiv:1007.1727
Some comments on unfolding
Given an observed distribution subject to statistical fluctuations and event migration between bins, try to estimate the “true” underlying distribution.
The (naive) Maximum Likelihood solution Maximizing the likelihood gives the estimator µ̂ = R⁻¹(n − β), where R is the response matrix and β is the expected background. The resulting estimates typically show huge variances and bin-to-bin oscillations (“catastrophic failure???”).
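The variance blow-up can be seen in a tiny two-bin example (purely illustrative numbers): with strong migrations, inverting the response matrix amplifies a modest statistical fluctuation into a wild oscillation of the estimates.

```python
def unfold_ml_2bin(R, n, beta):
    """Naive ML unfolding for two bins: mu_hat = R^{-1} (n - beta)."""
    v = [n[0] - beta[0], n[1] - beta[1]]
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return [(R[1][1] * v[0] - R[0][1] * v[1]) / det,
            (R[0][0] * v[1] - R[1][0] * v[0]) / det]

# 40% bin-to-bin migration; true expectation nu = [100, 100]
R = [[0.6, 0.4], [0.4, 0.6]]
print(unfold_ml_2bin(R, [100.0, 100.0], [0.0, 0.0]))  # [100.0, 100.0]
print(unfold_ml_2bin(R, [110.0, 90.0], [0.0, 0.0]))   # [150.0, 50.0]
```

A ±10 fluctuation in the data becomes ±50 in the unfolded estimates.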
Regularized unfolding
Stat. and sys. errors of unfolded solution In general the statistical covariance matrix of the unfolded estimators is not diagonal; one needs to report the full matrix Ustat.
But unfolding necessarily introduces biases as well, corresponding to a systematic uncertainty (also correlated between bins).
This is more difficult to estimate. Suppose, nevertheless, we manage to report both Ustat and Usys.
To test a new theory depending on parameters θ, use, e.g., χ²(θ) = (µ̂ − µ(θ))ᵀ (Ustat + Usys)⁻¹ (µ̂ − µ(θ)).
Mixes frequentist and Bayesian elements; interpretation of result can be problematic, especially if Usys itself has large uncertainty.
Folding Suppose a theory predicts f(y) → µ (may depend on parameters θ). Given the response matrix R and expected background β, this predicts the expected numbers of observed events: ν = Rµ + β. From this we can get the likelihood, e.g., for Poisson data, L(θ) = ∏i νi^(ni) e^(−νi) / ni!. And using this we can fit parameters and/or test, e.g., using the likelihood ratio statistic −2 ln [ L(θ) / L(θ̂) ].
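A minimal forward-folding fit along these lines (a toy model with a single signal-strength parameter µ and a crude grid scan; all numbers are illustrative):

```python
import math

def nll(mu, n, s, b):
    """Poisson negative log-likelihood for nu_i = mu * s_i + b_i."""
    total = 0.0
    for ni, si, bi in zip(n, s, b):
        lam = mu * si + bi
        total += lam - ni * math.log(lam)
    return total

def fit_mu(n, s, b, lo=0.0, hi=5.0, steps=5000):
    """Crude grid-scan maximum-likelihood fit of mu."""
    best = min(range(steps + 1),
               key=lambda i: nll(lo + (hi - lo) * i / steps, n, s, b))
    return lo + (hi - lo) * best / steps
```

With expected signal s = [10, 20], background b = [5, 5], and data equal to the µ = 1 expectation, n = [15, 25], the fit returns µ̂ = 1.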
Versus unfolding If we have an unfolded spectrum and full statistical and systematic covariance matrices, to compare this to a model µ compute the likelihood ln L(θ) = −½ (µ̂ − µ(θ))ᵀ U⁻¹ (µ̂ − µ(θ)) + const., where U = Ustat + Usys.
Complications because one needs estimate of systematic bias Usys.
If we find a gain in sensitivity from the test using the unfolded distribution, e.g., through a decrease in statistical errors, then we are exploiting information inserted via the regularization (e.g., imposed smoothness).
ML solution again From the standpoint of testing a theory or estimating its parameters, the ML solution, despite catastrophically large errors, is equivalent to using the uncorrected data (same information content).
There is no bias (at least from unfolding), so use the statistical covariance only: χ²(θ) = (µ̂ − µ(θ))ᵀ Ustat⁻¹ (µ̂ − µ(θ)).
The estimators of θ should have close to optimal properties: zero bias, minimum variance.
The corresponding estimators from any unfolded solution cannot in general match this.
Crucial point is to use full covariance, not just diagonal errors.
Some unfolding references H. Prosper and L. Lyons (eds.), Proceedings of PHYSTAT 2011, CERN-2011-006; many contributions on unfolding from p 225.
G. Cowan, A Survey Of Unfolding Methods For Particle Physics, www.ippp.dur.ac.uk/old/Workshops/02/statistics/proceedings/cowan.pdf
G. Cowan, 2012 CERN Academic Training Lecture #4: indico.cern.ch/conferenceDisplay.py?confId=173728
G. Cowan, Statistical Data Analysis, OUP (1998), Chapter 11.
V. Blobel, An Unfolding Method for High Energy Physics Experiments, arXiv:hep-ex/0208022v1
G. D'Agostini, Improved iterative Bayesian unfolding, arXiv:1010.0632
G. Choudalakis, Fully Bayesian Unfolding, arXiv:1201.4612
Tim Adye, Unfolding algorithms and tests using RooUnfold, arXiv:1105.1160v1
Summary
Bayesian methods for limits
Beware of issues with flat priors
Consider reference priors
Issues related to discovery
For (frequentist) sensitivity, use the Asimov median significance √q0,A (not simply s/√b)
Look-Elsewhere Effect ~solved
For Bayesian discovery, use Bayes Factor
Unfolding
A big topic – avoid if possible!
Extra slides
Common practice in HEP has been to claim a discovery if the p-value of the no-signal hypothesis is below 2.9 × 10⁻⁷, corresponding to a significance Z = Φ⁻¹(1 − p) = 5 (a 5σ effect).
There are a number of reasons why one may want to require such a high threshold for discovery:
The “cost” of announcing a false discovery is high.
Unsure about systematics.
Unsure about look-elsewhere effect.
The implied signal may be a priori highly improbable (e.g., violation of Lorentz invariance).
Why 5 sigma?
But the primary role of the p-value is to quantify the probability that the background-only model gives a statistical fluctuation as big as the one seen or bigger.
It is not intended as a means to protect against hidden systematics, nor as the high standard required for a claim of an important discovery.
In the process of establishing a discovery there comes a point where it is clear that the observation is not simply a fluctuation, but an “effect”, and the focus shifts to whether this is new physics or a systematic.
Provided the LEE is dealt with, that threshold is probably closer to 3σ than 5σ.
Why 5 sigma (cont.)?
Reference priors J. Bernardo, L. Demortier, M. Pierini Maximize the expected Kullback–Leibler divergence of the posterior relative to the prior: choose π(θ) to maximize E_x[ ∫ p(θ | x) ln( p(θ | x) / π(θ) ) dθ ]. This maximizes the expected posterior information about θ when the prior density is π(θ).
Finding reference priors is “easy” for one parameter: under regularity conditions the one-parameter reference prior coincides with the Jeffreys’ prior.
(PHYSTAT 2011)
Reference priors (2) J. Bernardo, L. Demortier, M. Pierini
Actual recipe to find reference prior nontrivial; see references from Bernardo’s talk, website of Berger (www.stat.duke.edu/~berger/papers) and also Demortier, Jain and Prosper, Phys. Rev. D 82 (2010) 034002, arXiv:1002.1111:
Prior depends on order of parameters. (Is order dependence important? Symmetrize? Sample result from different orderings?)
(PHYSTAT 2011)
MCMC basics: Metropolis-Hastings algorithm Goal: given an n-dimensional pdf p(θ), generate a sequence of points θ1, θ2, θ3, ...
1) Start at some point θ0.
2) Generate a proposed point θ′ from a proposal density q(θ′; θ0), e.g., a Gaussian centred about θ0.
3) Form the Hastings test ratio A = min[ 1, p(θ′) q(θ0; θ′) / ( p(θ0) q(θ′; θ0) ) ].
4) Generate u ~ Uniform[0, 1].
5) If u ≤ A, move to the proposed point: θ1 = θ′; else, the old point is repeated: θ1 = θ0.
6) Iterate.
Metropolis-Hastings (continued) This rule produces a correlated sequence of points (note how each new point depends on the previous one).
For our purposes this correlation is not fatal, but the statistical errors are larger than for a naive independent sample.
The proposal density can be (almost) anything, but choose it so as to minimize autocorrelation. Often the proposal density is taken symmetric: q(θ′; θ) = q(θ; θ′).
The test ratio is then (Metropolis-Hastings): A = min[ 1, p(θ′) / p(θ) ].
I.e., if the proposed step is to a point of higher p(θ), take it; if not, only take the step with probability p(θ′)/p(θ). If the proposed step is rejected, hop in place.
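The steps above can be sketched in a few lines for a one-dimensional target (a minimal illustration; the step size and chain length are arbitrary choices):

```python
import math, random

def metropolis(log_p, x0, step, n_iter, seed=7):
    """Random-walk Metropolis for a 1-d target density p(x); the Gaussian
    proposal is symmetric, so the Hastings ratio reduces to p(x')/p(x)."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    chain = []
    for _ in range(n_iter):
        xp = x + rng.gauss(0.0, step)
        lpp = log_p(xp)
        # accept with probability min(1, p(x')/p(x))
        if lpp >= lp or rng.random() < math.exp(lpp - lp):
            x, lp = xp, lpp
        chain.append(x)  # on rejection the old point is repeated
    return chain
```

Sampling log p(x) = −x²/2 (a standard Gaussian) gives a chain whose mean and variance approach 0 and 1, up to the enlarged errors from autocorrelation.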