Lecture 3 2014 Statistical Data Treatment and Evaluation

UWI MONA

Chemical Analysis I

Statistical Data Treatment and Evaluation

Experimentalists use statistical calculations to sharpen their judgments concerning the quality of experimental measurements. These applications include:

Defining a numerical interval around the mean of a set of replicate analytical results within which the population mean can be expected to lie with a certain probability. This interval is called the confidence interval (CI).

Determining the number of replicate measurements required to ensure at a given probability that an experimental mean falls within a certain confidence interval.

Estimating the probability that (a) an experimental mean and a true value, or (b) two experimental means, are different.

Determining at a given probability level whether the precision of two sets of measurements differs.

Comparing the means of more than two samples to determine whether differences in the means are real or the result of random error. This process is known as analysis of variance (ANOVA).

Deciding whether an apparent outlier in a set of replicate measurements is the result of a gross error or is a legitimate result.

Confidence Intervals

Significance level and confidence level

Significance level: the probability that a result falls outside the confidence interval is often called the significance level. When expressed as a fraction, the significance level is given the symbol α.

Confidence level: the confidence level is the probability 1 − α associated with a confidence interval, where α is the level of significance. It can also be expressed as a percentage and is sometimes called the confidence coefficient.

The confidence level (CL) is related to α on a percentage basis by CL = (1 − α) × 100%.

Areas under a Gaussian curve for various values of z: we may assume that 90 times out of 100 the true mean, μ, will be within ±1.64σ of any measurement that we make. The confidence level is 90% and the confidence interval is z = ±1.64.

Confidence level for various values of z:

Confidence Level, %    z
50.0                   0.67
68.0                   1.00
80.0                   1.28
90.0                   1.64
95.0                   1.96
95.4                   2.00
99.0                   2.58
99.7                   3.00
99.9                   3.29

Confidence interval when s is a good approximation of σ

EXAMPLE A

Determine the 80% and 95% confidence intervals for (a) 1108 mg/L glucose and (b) the mean value (1100.3 mg/L) for month 1 in the example. Assume that in each part, s = 19 is a good estimate of σ.

Ans: (a) z = 1.28 and 1.96 for the 80% and 95% confidence levels:
80% CI = 1108 ± 1.28 × 19 = 1108 ± 24.3 mg/L
95% CI = 1108 ± 1.96 × 19 = 1108 ± 37.2 mg/L
(b) For the 7 measurements, the interval narrows by √7:
80% CI = 1100.3 ± 1.28 × 19/√7 = 1100.3 ± 9.2 mg/L
95% CI = 1100.3 ± 1.96 × 19/√7 = 1100.3 ± 14.1 mg/L
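The arithmetic in Example A can be checked with a short script (Python here for illustration; the function name is ours, not from the lecture):

```python
import math

def ci_known_sigma(mean, sigma, n, z):
    """Confidence interval for the population mean when sigma is known:
    CI = mean +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# (a) a single measurement: n = 1
print(ci_known_sigma(1108, 19, 1, 1.28))    # 80% CI: 1108 +/- 24.3 mg/L
print(ci_known_sigma(1108, 19, 1, 1.96))    # 95% CI: 1108 +/- 37.2 mg/L
# (b) the mean of 7 measurements: the interval narrows by sqrt(7)
print(ci_known_sigma(1100.3, 19, 7, 1.96))  # 95% CI: 1100.3 +/- 14.1 mg/L
```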

How many replicate measurements in month 1 in Example A are needed to decrease the 95% confidence interval to 1100.3 ± 10.0 mg/L of glucose? Solving N = (zσ/E)² = (1.96 × 19/10.0)² = 13.9, we conclude that 14 measurements are needed to provide a slightly better than 95% chance that the population mean will lie within ±10 mg/L of glucose of the experimental mean.

Finding the confidence interval when σ is unknown
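The same calculation, rounding up to the next whole measurement, can be sketched as (the function name is ours):

```python
import math

def replicates_needed(z, sigma, half_width):
    # Smallest N satisfying z * sigma / sqrt(N) <= half_width
    return math.ceil((z * sigma / half_width) ** 2)

print(replicates_needed(1.96, 19, 10.0))  # -> 14
```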

N.B. t → z as the number of degrees of freedom becomes infinite.

Example of calculating a confidence interval: consider measurement of dissolved Ti in a standard seawater (NASS-3).
Data: 1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48 nM
DF = n − 1 = 7 − 1 = 6
x̄ = 1.34 nM
s = 0.17

95% confidence interval: t(df=6, 95%) = 2.447, so CI95 = 1.34 ± 0.16 nM

50% confidence interval: t(df=6, 50%) = 0.718, so CI50 = 1.34 ± 0.05 nM
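Both intervals for the Ti data can be reproduced with the standard library (t values taken from the t-table in this lecture):

```python
import math
import statistics

data = [1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48]  # dissolved Ti, nM
xbar = statistics.mean(data)   # ~1.34 nM
s = statistics.stdev(data)     # ~0.17 nM
for t, label in [(2.447, "95%"), (0.718, "50%")]:  # t(df=6) from the table
    half = t * s / math.sqrt(len(data))
    print(f"{label} CI = {xbar:.2f} +/- {half:.2f} nM")
```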

Values of t for Various Levels of Probability

Degrees of Freedom   80%     90%     95%     99%     99.9%
1                    3.08    6.31    12.7    63.7    637
2                    1.89    2.92    4.30    9.92    31.6
3                    1.64    2.35    3.18    5.84    12.9
4                    1.53    2.13    2.78    4.60    8.61
5                    1.48    2.02    2.57    4.03    6.87
6                    1.44    1.94    2.45    3.71    5.96
7                    1.42    1.90    2.36    3.50    5.41
8                    1.40    1.86    2.31    3.36    5.04
9                    1.38    1.83    2.26    3.25    4.78
10                   1.37    1.81    2.23    3.17    4.59
15                   1.34    1.75    2.13    2.95    4.07
20                   1.32    1.73    2.09    2.84    3.85
40                   1.30    1.68    2.02    2.70    3.55
60                   1.30    1.67    2.00    2.62    3.46
∞                    1.28    1.64    1.96    2.58    3.29

Statistical Aids to Hypothesis Testing

Experimental results seldom agree exactly with those predicted from a theoretical model. Scientists and engineers frequently must judge whether a numerical difference is a result of the random errors inevitable in all measurements or a result of systematic errors. Certain statistical tests are useful in sharpening these judgments. Tests of this kind make use of a null hypothesis, which assumes that the numerical quantities being compared are the same.

Null hypothesis

In statistics, a null hypothesis is a hypothesis that is presumed true until statistical evidence in the form of a hypothesis test indicates otherwise. It is a hypothesis that two or more populations are identical. The purpose of hypothesis testing is to test the viability of the null hypothesis in the light of experimental data. Depending on the data, the null hypothesis either will or will not be rejected as a viable possibility. The null hypothesis is often the reverse of what the experimenter actually believes; it is put forward to allow the data to contradict it.

Hypothesis testing

Hypothesis testing is a method of inferential statistics. An experimenter starts with a hypothesis about a population parameter called the null hypothesis. Data are then collected and the viability of the null hypothesis is determined in light of the data. If the data are very different from what would be expected under the assumption that the null hypothesis is true, then the null hypothesis is rejected. If the data are not greatly at variance with what would be expected under the assumption that the null hypothesis is true, then the null hypothesis is not rejected. Failure to reject the null hypothesis is not the same thing as accepting the null hypothesis.

Significance level

The significance level (also called the level of significance) is the probability of a false rejection of the null hypothesis in a statistical test, that is, the probability that the test statistic will reject the null hypothesis when the hypothesis is true. In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis. It is used as follows: first, the difference between the results of the experiment and the null hypothesis is determined. Then, assuming the null hypothesis is true, the probability of a difference that large is computed. Finally, this probability is compared to the significance level.

If the probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant. Traditionally, experimenters have used either the .05 level (sometimes called the 5% level) or the .01 level (1% level). The lower the significance level, the more the data must diverge from the null hypothesis to be significant. Therefore, the .01 level is more conservative than the .05 level. At the 95% confidence level, the significance level is 5%.

Rejection regions for the 95% confidence level:
(a) Two-tailed test for Ha: μ ≠ μ0. Note the critical value of z is ±1.96.
(b) One-tailed test for Ha: μ > μ0. Here, the critical value of zcrit is 1.64, so that 95% of the area is to the left of zcrit and 5% of the area is to the right.
(c) One-tailed test for Ha: μ < μ0. Here the critical value is again 1.64, so that 5% of the area lies to the left of zcrit.
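The critical z values quoted above can be recovered from the standard normal curve with the standard library, which makes the one-tailed/two-tailed distinction explicit:

```python
from statistics import NormalDist

nd = NormalDist()                # standard normal curve
alpha = 0.05                     # significance level for the 95% CL
z_two_tailed = nd.inv_cdf(1 - alpha / 2)  # alpha split between both tails
z_one_tailed = nd.inv_cdf(1 - alpha)      # all of alpha in one tail
print(round(z_two_tailed, 2), round(z_one_tailed, 2))  # 1.96 1.64
```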

Comparing an Experimental Mean with the True Value

A common way of testing for bias in an analytical method is to use the method to analyze a sample whose composition is accurately known. Bias in an analytical method is illustrated by the two curves on the next slide, which show the frequency distribution of replicate results in the analysis of identical samples by two analytical methods. Method A has no bias, so the population mean μA is the true value xt. Method B has a systematic error, or bias, given by

bias = μB − xt = μB − μA

Bias affects all the data in the set in the same way, and it can be either positive or negative. If a good estimate of σ is available, the above equation can be modified by replacing t with z and s with σ.

Comparing an experimental mean with a known value (contd)

A standard material known to contain 38.9% Hg was analysed by atomic absorption spectroscopy. The results were 38.9%, 37.4% and 37.1%. At the 95% confidence level, is there any evidence for a systematic error in the method?

Assume the null hypothesis (no bias). Only reject this if |x̄ − μ| > ts/√N.

But t (from the table) = 4.30, s (calculated from the three results) = 0.964% and N = 3, so ts/√N = 4.30 × 0.964/√3 = 2.39%, whereas |x̄ − μ| = |37.8 − 38.9| = 1.1%.
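Equivalently, one can compute tcalc = |x̄ − μ|√N/s and compare it directly with the tabulated t (a short Python check, not part of the original slide):

```python
import math
import statistics

results = [38.9, 37.4, 37.1]     # % Hg found
mu = 38.9                        # accepted (true) value
xbar = statistics.mean(results)  # 37.8
s = statistics.stdev(results)    # ~0.96
t_calc = abs(xbar - mu) * math.sqrt(len(results)) / s
print(round(t_calc, 2))          # ~1.98, well below t_table = 4.30
```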

Therefore the null hypothesis is maintained, and there is no evidence for systematic error at the 95% confidence level.

Detection of Systematic Error (Bias)

Comparing a measured result with a known value: example

Dissolved Fe analysis verified using NASS-3 seawater SRM.
Certified value = 5.85 nM
Experimental results: 5.76 ± 0.17 nM (n = 10)

Compare to ttable; df = 10 - 1 = 9, 95% CL

ttable(df=9,95% CL) = 2.262

If |tcalc| < ttable, results are not significantly different at the 95% CL.

If |tcalc| ≥ ttable, results are significantly different at the 95% CL.

For this example, tcalc (≈1.67) < ttable, so the experimental results are not significantly different at the 95% CL.
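The Fe comparison above can be reproduced in a few lines (Python, for illustration):

```python
import math

xbar, s, n = 5.76, 0.17, 10  # experimental dissolved Fe, nM
mu = 5.85                    # certified NASS-3 value, nM
t_calc = abs(xbar - mu) * math.sqrt(n) / s
print(round(t_calc, 2))      # ~1.67 < t_table(df=9, 95%) = 2.262
```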

Comparison of two experimental means

Here, the chemist has to judge whether a difference in the means of two sets of identical analyses is real and constitutes evidence that the samples are different, or whether the discrepancy is simply a consequence of random errors in the two sets.

Again, if a good estimate of σ is available, the above equation can be modified by replacing t with z and s with σ.

Comparing replicate measurements or comparing means of two sets of data

Example: given the same sample analyzed by two different methods, do the two methods give the same result?

We will compare tcalc to the tabulated value of t at the appropriate df and CL; df = n1 + n2 − 2 for this test.

N.B. We compare our test value of t with the critical value obtained from the table for the particular confidence level desired.

If the absolute value of the test statistic is smaller than the critical value, the null hypothesis is accepted and no significant difference between the means has been demonstrated. If tcalculated < t table (95%), the difference is not significant

A test value of t greater than the critical value of t indicates that there is a significant difference between the means. If tcalculated > ttable (95%), the difference is significant.

Comparing replicate measurements or comparing means of two sets of data: example

Method 1: Atomic absorption spectroscopy
Data: 3.91, 4.02, 3.86, 3.99 mg/g

x̄1 = 3.94 mg/g
s1 = 0.07 mg/g
n1 = 4

Method 2: Spectrophotometry

Data: 3.52, 3.77, 3.49, 3.59 mg/g

x̄2 = 3.59 mg/g
s2 = 0.12 mg/g
n2 = 4

Determination of nickel in sewage sludge using two different methods

Comparing replicate measurements or comparing means of two sets of data: example

Compare to ttable at df = 4 + 4 − 2 = 6 and 95% CL: ttable(df=6, 95% CL) = 2.447

If |tcalc| < ttable, results are not significantly different at the 95% CL.

If |tcalc| ≥ ttable, results are significantly different at the 95% CL. Since |tcalc| (5.056) > ttable (2.447), the results from the two methods are significantly different at the 95% CL.
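A sketch of the pooled two-sample t calculation on the nickel data (Python, for illustration; working from the raw data gives tcalc ≈ 4.9 rather than the slide's 5.056, which reflects rounded intermediate values, but the conclusion is the same):

```python
import math
import statistics

aas = [3.91, 4.02, 3.86, 3.99]   # Method 1 (AAS), mg/g Ni
spec = [3.52, 3.77, 3.49, 3.59]  # Method 2 (spectrophotometry), mg/g Ni
n1, n2 = len(aas), len(spec)
s1, s2 = statistics.stdev(aas), statistics.stdev(spec)
# Pooled standard deviation (valid when the two precisions are comparable)
s_pool = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_calc = abs(statistics.mean(aas) - statistics.mean(spec)) / (
    s_pool * math.sqrt(1 / n1 + 1 / n2))
print(round(t_calc, 2))  # ~4.85, well above t_table(df=6, 95%) = 2.447
```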

Comparing replicate measurements or comparing means of two sets of data

Please Note: There is an important assumption associated with this t-test:

It is assumed that the standard deviations (i.e., the precision) of the two sets of data being compared are not significantly different.

How do you test to see if the two std. devs. are different?

How do you compare two sets of data whose std. devs. are significantly different?

The F test is used to compare the precision of two sets of data. The sets do not necessarily have to be obtained from the same sample, as long as the samples are sufficiently alike that the sources of random error can be assumed to be the same. The F test is designed to indicate whether there is a significant difference between two methods based on their standard deviations. F is defined in terms of the variances of the two methods:

F = s1²/s2² = V1/V2, where s1² > s2²

There are two different degrees of freedom, one for the numerator and one for the denominator. If the calculated F value exceeds the tabulated F value at the selected confidence level, then there is a significant difference between the variances of the two methods.

The F test: comparison of precision

The F test provides insight into either of two questions:
1. Is Method A more precise than Method B?
2. Is there a difference between the precisions of the two methods?
For the first of these applications, the variance of the supposedly more precise procedure is always placed in the denominator. For the second, the larger variance appears in the numerator.

Critical Values of F at the 5% Probability Level (95% confidence level)

Degrees of Freedom   Degrees of Freedom (Numerator)
(Denominator)        2      3      4      5      6      10     12     20     ∞
2                    19.00  19.16  19.25  19.30  19.33  19.40  19.41  19.45  19.50
3                    9.55   9.28   9.12   9.01   8.94   8.79   8.74   8.66   8.53
4                    6.94   6.59   6.39   6.26   6.16   5.96   5.91   5.80   5.63
5                    5.79   5.41   5.19   5.05   4.95   4.74   4.68   4.56   4.36
6                    5.14   4.76   4.53   4.39   4.28   4.06   4.00   3.87   3.67
10                   4.10   3.71   3.48   3.33   3.22   2.98   2.91   2.77   2.54
12                   3.89   3.49   3.26   3.11   3.00   2.75   2.69   2.54   2.30
20                   3.49   3.10   2.87   2.71   2.60   2.35   2.28   2.12   1.84
∞                    3.00   2.60   2.37   2.21   2.10   1.83   1.75   1.57   1.00

F-test to compare standard deviations

From the previous example, let s1 = 0.12 and s2 = 0.073.

Note: Keep 2 or 3 decimal places to compare with Ftable.

Compare Fcalc to Ftable at df = (n1 -1, n2 -1) = 3,3 and 95% CL.

If Fcalc < Ftable, std. devs. are not significantly different at the 95% CL.

If Fcalc ≥ Ftable, std. devs. are significantly different at the 95% CL.

Ftable(df=3,3;95% CL) = 9.28

Since Fcalc (2.70) < Ftable (9.28), std. devs. of the two sets of data are not significantly different at the 95% CL. (Precisions are similar.)
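As a quick check of the arithmetic above (Python, for illustration):

```python
s1, s2 = 0.12, 0.073     # larger standard deviation goes on top
F_calc = s1**2 / s2**2
print(round(F_calc, 2))  # 2.70 < F_table(3, 3) = 9.28
```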

Comparing replicate measurements or comparing means of two sets of data: revisited

The use of the t-test for comparing means was justified for the previous example because we showed that the standard deviations of the two sets of data were not significantly different.

If the F-test shows that the std. devs. of two sets of data are significantly different and you need to compare the means, use a different version of the t-test.

Comparing replicate measurements or comparing means from two sets of data when std. devs. are significantly different
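The lecture does not reproduce the formula for this second version; a common form (Welch's t-test, with unpooled variances and an approximate degrees of freedom that is usually truncated to an integer) can be sketched as follows. The function below is our illustration, not taken from the slides:

```python
import math
import statistics

def welch_t(a, b):
    """t statistic and approximate df for two samples whose standard
    deviations differ significantly (Welch's t-test)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    t = abs(statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1/n1 + v2/n2)
    df = (v1/n1 + v2/n2)**2 / ((v1/n1)**2/(n1 - 1) + (v2/n2)**2/(n2 - 1))
    return t, df

# Applied to the nickel data (where the F-test said pooling was fine,
# so this is only a demonstration of the mechanics):
t, df = welch_t([3.91, 4.02, 3.86, 3.99], [3.52, 3.77, 3.49, 3.59])
print(round(t, 2), math.floor(df))
```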

Flowchart for comparing means of two sets of data or replicate measurements:
1. Use the F-test to see whether the std. devs. of the two sets of data are significantly different or not.
2. If the std. devs. are significantly different, use the 2nd version of the t-test (the beastly version).
3. If the std. devs. are not significantly different, use the 1st version of the t-test (see the previous, fully worked-out example).

One last comment on the F-test

Note that the F-test can be used simply to test whether or not two sets of data have statistically similar precisions.

It can be used to answer a question such as: do method one and method two provide similar precisions for the analysis of the same analyte?

Errors in Hypothesis Testing

The choice of a rejection region for the null hypothesis is made so that we can readily understand the errors involved.

Type I error: the error that results from rejecting H0 when it is true; an unusual result occurred that put our test statistic z or t into the rejection region. For example, at the 95% confidence level, there is a 5% chance that we will reject the null hypothesis even though it is true. The significance level α gives the frequency of rejecting H0 when it is true.

Type II error: we accept H0 when it is false. The probability of a type II error is given the symbol β.

Errors in Hypothesis Testing (contd)

Making α smaller (0.01 instead of 0.05) would appear to minimize the type I error rate. Decreasing the type I error rate, however, increases the type II error rate because they are inversely related. If a type I error is much more likely to have serious consequences than a type II error, it is reasonable to choose a small value of α. On the other hand, in some situations a type II error would be quite serious, and so a larger value of α is employed to keep the type II error rate under control.

As a general rule of thumb, the largest α that is tolerable for the situation should be used.

This ensures the smallest type II error while keeping the type I error within acceptable limits.

Next Class

Statistical Data Treatment and Evaluation (contd)
Q Test

Laboratory Management
Good Laboratory Practices
Laboratory Accreditation
Management Systems
Method Validation
Method Verification
Control Charts
Proficiency Testing
Etc.