Common Statistical Methods in Geoscience
Intro to Quantitative Geology www.helsinki.fi/yliopisto
Introduction to Quantitative Geology Lecture 3
Common statistical methods in geoscience
Lecturer: David [email protected]
16.3.2015
Goals of this lecture
• Address the many challenges of geological data samples
• Discuss uncertainty in Earth science data
• Review the basic mathematical representations of measurement uncertainty
Why are statistics important for us?
Photos: Bighorn Basin and Bighorn Mountains, Wyoming, USA
Even here, bedrock exposures are limited and widely scattered
• We can directly observe only a tiny fraction of most geological features
• Bedrock exposure is limited
• Many rock units are very large and/or widely dispersed
• Sometimes, even well-exposed units can be hard to access
Fully funded field opportunity - Bighorn Basin
More information at https://rock.geosociety.org/eo/
A few statistical definitions
• Population: The total number of occurrences of a particular thing present in a defined area
• In geology, we can almost never observe the entire population
• Subset: A collected portion of the population
• Sampling units: The collected material
A representative sample
• Our goal, as Earth scientists, is to collect data that form a representative sample
• Representative sample: A sample that can be used to infer the characteristics of the population
• The best way to get a representative sample is to collect sampling units at random, without bias
• Why might this be difficult?
The trouble with “Random” sampling
• Making our best effort to collect a “random” sample does not guarantee it is a representative sample
1. Samples taken from the same population may be very different from one another
Second, even if two populations are very different, samples from each may be similar simply by chance, and therefore give the misleading impression the populations are also similar (Figure 1.2).
Finally, variation within samples may make it difficult to interpret any effect of differences in location. There is often so much variation within a sample (and a population) that differences in location may be difficult to interpret. For example, imagine you are an environmental geologist working to assess a landfill contaminated with lead. The lead content in a sample
Figure 1.1 Even a random sample may not necessarily be a good representative of the population. Two samples have been taken at random from a Devonian oil field in Ghawar. By chance, sample 1 contains a group of relatively large fossils, while those in sample 2 are relatively small, and the types of fossils in the two samples are also different.
Figure 1.1, McKillup and Dyar, 2010
2. Samples taken from very different populations may be quite similar
of ten cores from the oldest part of the landfill is 1000 mg/kg Pb on average, and ranges from 100–9000 mg/kg. In contrast, a sample of ten cores from the youngest part of the landfill contains 2000 mg/kg Pb on average but ranges from 100–7000 mg/kg. Which of these two areas would you consider to be most contaminated?
Variability within samples can also obscure the effect of experimental treatments. For example, opaque brown topaz crystals may change to
Figure 1.2 Samples selected at random from very different populations may not necessarily be different. Simply by chance the samples from populations 1 and 2 are similar in size and composition.
Figure 1.2, McKillup and Dyar, 2010
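The two pitfalls of "random" sampling can be illustrated with a small simulation. This is a minimal sketch and not part of the original lecture: the two populations are synthetic fossil lengths (hypothetical values, in mm), and the names `population_a`, `population_b` and `sample_means` are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical populations of fossil lengths in mm (synthetic values).
population_a = [rng.gauss(20.0, 8.0) for _ in range(10_000)]
population_b = [rng.gauss(30.0, 8.0) for _ in range(10_000)]


def sample_means(population, n, repeats=200):
    """Means of `repeats` random samples of n units, drawn without replacement."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]


# 1. Small samples from the SAME population can differ a lot from one another.
means_small = sample_means(population_a, n=5)
print(f"n=5:  sample means span {min(means_small):.1f} to {max(means_small):.1f} mm")

# Larger samples scatter less around the population mean.
means_large = sample_means(population_a, n=50)
print(f"n=50: sample means span {min(means_large):.1f} to {max(means_large):.1f} mm")

# 2. Samples from DIFFERENT populations can look similar by chance.
means_b = sample_means(population_b, n=5)
look_alike = sum(abs(a - b) < 2.0 for a, b in zip(means_small, means_b))
print(f"{look_alike} of {len(means_b)} sample pairs have means within 2 mm")
```

The exact numbers depend on the seed and on the made-up population parameters. The point is only that the spread of sample means shrinks as the number of sampling units grows.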
Using a sample to infer something about a population
• Collecting a representative sample is a clear challenge, but it is needed to make inferences about a population
• Every precaution must be taken to collect a representative sample
• Collecting a sufficient number of measurements
• Making measurements or collecting samples in the most logical locations
• Carefully preparing samples for analysis or measurement
Uncertainty
Deep water oil/gas exploration is expensive
Drill ship: ~350,000 € per day
Oil platform: ~300,000 € per day
Know your uncertainties!
Uncertainty in Earth science
• In addition to challenges collecting a representative sample, all geological measurements have inherent uncertainty
• It is not possible to make an exact measurement
• We may use tools with very high precision, but measurements are still uncertain
• A scientist’s goal is to reduce uncertainty when taking measurements by making them as accurate and precise as possible
• Our goal should be to always include that uncertainty when comparing measured values to predictions
Precision versus accuracy
• Random error: Experimental uncertainty revealed by repeated measurements
• Small random error = Precise
• Systematic error: Error that cannot be revealed by repeated measurement
• Small systematic error = Accurate
• How can we detect systematic error?
Fig. 4.1, Taylor, 1997
• As it turns out, detecting systematic error is difficult
• Scatter might reveal large random errors, but without a reference for the true value, the systematic error is unclear
• This is one reason that blanks and standards are run when measuring isotope concentrations for geochronometer ages, for example
Fig. 4.2, Taylor, 1997
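The difference between random and systematic error can be sketched with simulated measurements. This is an illustrative example under stated assumptions, not the lecture's own material: `TRUE_VALUE`, `BIAS` and `NOISE_SD` are hypothetical numbers, and the reference standard is assumed to have a perfectly known value.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 100.0  # hypothetical known value of a reference standard
BIAS = 3.0          # hypothetical systematic error (e.g. a miscalibrated tool)
NOISE_SD = 0.5      # hypothetical random error of a single measurement

# Every measurement carries the same bias plus its own random noise.
measurements = [TRUE_VALUE + BIAS + rng.gauss(0.0, NOISE_SD) for _ in range(50)]

mean = statistics.mean(measurements)
scatter = statistics.stdev(measurements)  # random error: visible in repeats
print(f"mean = {mean:.2f}, scatter = {scatter:.2f}")

# The scatter is small, so the data look precise, yet it says nothing
# about BIAS. Only a comparison with a known reference reveals it.
offset = mean - TRUE_VALUE
print(f"offset from reference = {offset:.2f}")
```

Without `TRUE_VALUE`, the repeated measurements alone would suggest a precise result centred on the wrong number.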
Sources of uncertainty
• If we think about it, uncertainty is everywhere
• A typical field compass is graduated in 1°, so we can’t expect any higher precision
• Magnetic declination can also only be set to within 1°, and the uncertainty is larger if it is not set correctly
• Outcrops generally don’t provide perfect exposure, and repeated measurements often vary by 3-5° (at least)
Reported measurements
• Ignoring the declination, the most precise measurement we can make with our compass is
$x_{\mathrm{best}} \pm 0.5^\circ$
• Imagine we take 5 measurements of the strike of a rock unit and find: 33°, 36°, 32°, 35°, 34°
• Based on these numbers alone, we could state the strike is 34±2°
• Including the uncertainty in reading the compass, we would say the strike is 34±2.5°
• Generally, any measurement should be reported in the form
$x_{\mathrm{best}} \pm \delta x$
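The reported strike can be reproduced in a few lines. This is a minimal sketch under two assumptions: the slide does not state how the ±2° was obtained, so the code takes it to be half the range of the readings, and it adds the compass reading uncertainty by simple summation because that matches the 34±2.5° quoted above. The variable names are illustrative.

```python
import statistics

strikes = [33, 36, 32, 35, 34]  # strike measurements from the slide (degrees)
READING_UNCERTAINTY = 0.5       # half of the 1 degree compass graduation

x_best = statistics.mean(strikes)

# Assumption: the scatter is summarised by half the range of the readings.
spread = (max(strikes) - min(strikes)) / 2

# Assumption: the two contributions are simply added.
delta_x = spread + READING_UNCERTAINTY

print(f"strike = {x_best:g} ± {delta_x:g}°")  # prints "strike = 34 ± 2.5°"
```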
Mean and standard deviation
• For the compass measurements, it is quite natural to take the average value, or mean, for the best estimate of the strike
• The mean, $\bar{x}$, is simply the sum of the measured values divided by the number of measurements
$$\bar{x} = \frac{x_1 + x_2 + \ldots + x_N}{N} = \frac{\sum x_i}{N}$$
where $x_i$ is the $i$th measured value, $N$ is the total number of measurements and $\sum$ represents sigma notation of a sum
$$\sum_i x_i = \sum_{i=1}^{N} x_i = x_1 + x_2 + \ldots + x_N$$
• For our example, calculating the mean is easy
$$\bar{x} = \frac{33^\circ + 36^\circ + 32^\circ + 35^\circ + 34^\circ}{5} = 34^\circ$$
• If we consider the mean as the best estimate of the quantity, a natural value to consider as well is the deviation (or residual) for each measurement
$$d_i = x_i - \bar{x}$$
• If the deviation is small, the measurements are precise; if it is large, the measurements are not so precise
• It might be tempting to average the deviation to determine how much measurements deviate from the mean on average
• What is the average deviation in our dataset?
• If we square the deviation, all values will be positive, and then we can average and take the square root of the result
$$\sigma_x = \sqrt{\frac{1}{N-1}\sum (d_i)^2} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
• This is the sample standard deviation, $\sigma_x$
• If we plug in our values we see
Measured value   Mean   Deviation   Deviation squared
33               34     -1          1
36               34      2          4
32               34     -2          4
35               34      1          1
34               34      0          0
$$\sigma_x = \sqrt{\frac{1}{N-1}\sum (d_i)^2} = \sqrt{\frac{1}{4}(1 + 4 + 4 + 1 + 0)} \approx 1.6^\circ$$
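The worked example can be checked numerically. The sketch below uses the five strike values from the slides; the variable names are illustrative, and the comparison with `statistics.stdev` (which also divides by N − 1) is added here only as a cross-check.

```python
import math
import statistics

strikes = [33, 36, 32, 35, 34]  # strike measurements (degrees)
N = len(strikes)

x_bar = sum(strikes) / N                   # mean: 34.0
deviations = [x - x_bar for x in strikes]  # d_i = x_i - x_bar

# Averaging the raw deviations is uninformative: positive and negative
# values cancel, so the result is zero.
mean_deviation = sum(deviations) / N

# Squaring first removes the sign; dividing by N - 1 and taking the
# square root gives the sample standard deviation.
sigma_x = math.sqrt(sum(d ** 2 for d in deviations) / (N - 1))

print(f"mean = {x_bar:g}, average deviation = {mean_deviation:g}")
print(f"sample standard deviation = {sigma_x:.2f}")  # 1.58, i.e. about 1.6
print(math.isclose(sigma_x, statistics.stdev(strikes)))
```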
What does the standard deviation tell us?
• The 2σ uncertainty is often reported for geological measurements
• For the measured values, ~95% of the measurements are within ±2σ of the mean μ (μ is the population mean, $\bar{x}$ is the sample mean)
7.3.3 The standard deviation of a population
The importance of the variance is apparent when you obtain the standard deviation, which is symbolized for a population by σ and is just the square root of the variance. For example, if the variance is 64, the standard deviation is 8.
The standard deviation is important because the mean of a normally distributed population, plus or minus one standard deviation, includes 68.27% of the values within that population.
Even more importantly, 95% of the values in the population will be within ±1.96 standard deviations of the mean. This is especially useful because the remaining 5% of values will be outside this range and therefore further away from the mean (Figure 7.3). Remember from Chapter 6 that 5% is the commonly used significance level.
These two statistics are all you need to describe the location and shape of a normal distribution and can also be used to determine the proportion of the population that is less than or more than a particular value (Box 7.1).
7.3.4 The Z statistic
The proportions of the normal distribution described in the previous section can be expressed in a different and more workable way. For a normal distribution, the difference between any value and the mean, divided by the standard deviation, gives a ratio called the Z statistic that is also normally
Figure 7.3 Illustration of the proportions of the values in a normally distributed population. (a) 68.27% of values are within ±1 standard deviation from the mean and (b) 95% of values are within ±1.96 standard deviations from the mean. These percentages correspond to the area of the distribution enclosed by the two vertical lines. (y axis: frequency; both curves centred on µ)
~68% of measurements within ±1σ; ~95% of measurements within ±2σ
Fig. 7.3, McKillup and Dyar, 2010
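The percentages quoted for ±1σ, ±1.96σ and ±2σ follow directly from the normal distribution. As a quick check, the sketch below evaluates them with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal population lying within ±k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1.0, 1.96, 2.0):
    print(f"within ±{k:g}σ: {100 * fraction_within(k):.2f}%")
# within ±1σ: 68.27%
# within ±1.96σ: 95.00%
# within ±2σ: 95.45%
```

This holds only if the measurements are approximately normally distributed, which is why ±2σ is usually described as covering "about" 95% of the values.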
Normal distribution
• The normal distribution refers to a bell-shaped curve that mathematically describes the frequency that a measured value is observed
• This is also known as a Gaussian curve
• Many geological variables follow a normal distribution
• We will look at this in detail on Wednesday in lab
chosen at random and plot the frequency of these on the Y axis and angle on the X axis, the distribution will look like a symmetrical bell, which has been called the normal distribution (Figure 7.1).
The normal distribution has been found to apply to many types of variables in natural phenomena (e.g. grain size distributions in rocks, the shell length of many species of marine snails, stellar masses, the distribution of minerals on beaches, etc.).
The very useful thing about normally distributed variables is that two descriptive statistics – the mean and the standard deviation – can describe this distribution. From these, you can predict the proportion of data that will be less than or greater than a particular value. Consequently, tests that use the properties of the normal distribution are straightforward, powerful and easy to apply. To use them you have to be sure your data are reasonably “normal.” (There are methods to assess normality and these will be described later.)
To understand parametric tests you need to be familiar with some statistics used to describe the normal distribution and some of its properties.
7.3.1 The mean of a normally distributed population
First, the mean (the average) symbolized by the Greek μ describes the location of the center of the normal distribution. It is the sum of all the
Figure 7.1 An example of a normally distributed population. The shape of the distribution is symmetrical about the average and the majority of values are close to the average, with an upper and lower “tail” of steeply and gently sloping cinder cones, respectively. (x axis: cinder cone angle (°), 0 to 70; y axis: frequency of each angle)
Fig. 7.1, McKillup and Dyar, 2010
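For reference, the bell-shaped curve has a standard closed form. The expression below is the usual Gaussian probability density, written with the population mean μ and standard deviation σ. It is not shown on the slides and is added here only to make the "mathematical description" concrete.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The curve peaks at x = μ, and σ controls its width, which is why the mean and standard deviation together are enough to describe a normal distribution.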
A silly example of why uncertainties matter
Regression line
Question: Is this a good fit?
Answer: Cannot say. No uncertainties shown
Trend line fits the data
Conclusion: More treats = Happier cat
Major breakthrough!
More treats = happier cat, but cat happiness is not linear
Reporting uncertainties
• The point is that uncertainties matter
• What our data can say is a function of the uncertainty
• A model fit to data is meaningful only if the predictions lie within the uncertainty in the data
• A geological age of 1.88 Ga for an event might be interesting, but an age of 1.88±0.02 Ga is far more powerful
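Whether a model "fits" depends on the size of the error bars, which a short check makes explicit. This is a minimal sketch with made-up numbers: `observed` and `predicted` are hypothetical, the function name is an assumption, and the 2σ threshold is chosen only because 2σ uncertainties are commonly reported.

```python
def fraction_within_uncertainty(observed, uncertainty, predicted, n_sigma=2.0):
    """Fraction of predictions lying within n_sigma error bars of the data."""
    hits = [
        abs(obs - pred) <= n_sigma * unc
        for obs, unc, pred in zip(observed, uncertainty, predicted)
    ]
    return sum(hits) / len(hits)


# Hypothetical measurements and model predictions (e.g. from a regression line).
observed = [1.0, 2.1, 2.9, 4.2, 5.0]
predicted = [1.1, 2.0, 3.0, 4.0, 4.9]

loose = [0.15] * len(observed)  # large 1-sigma uncertainties
tight = [0.02] * len(observed)  # small 1-sigma uncertainties

print(fraction_within_uncertainty(observed, loose, predicted))  # 1.0
print(fraction_within_uncertainty(observed, tight, predicted))  # 0.0
```

The same line is consistent with every point when the uncertainties are large and with none when they are small. This is why a fit cannot be judged without them.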
Recap
• Collecting a representative sample is difficult and requires careful planning
• Uncertainty is everywhere in Earth science data
• Characterising and reporting uncertainty is as important as the measured values themselves
References
McKillup, S., & Dyar, M. D. (2010). Geostatistics explained: an introductory guide for earth scientists. Cambridge University Press.
Taylor, J. R. (1997). An introduction to error analysis: the study of uncertainties in physical measurements. Sausalito, CA, USA: University Science Books.