The Nine-Step Version of Hypothesis Testingweb.cs.dal.ca/~anwar/ds/Lec6.pdf · 2011-10-31 · The...

The Nine-Step Version of Hypothesis Testing

• State H0.

• State Ha.

• Specify α (the level of significance desired).

• Specify the effect size (ES).

• Specify the level of power desired.

• Determine the proper size of the sample.

• Collect and analyze the sample data.

• Refer to a criterion to assess the sample evidence.

• Make a decision to retain or reject H0.

Step 4: Specifying the Effect Size

• Deciding the smallest study finding that is worth considering.

• Can be small, medium, or large.

• Depends on the kind of test (t-test, correlation, etc.).

• e.g. for a t-test, Cohen's d values of 0.2, 0.5, and 0.8 are conventionally small, medium, and large respectively.

• The effect size is the researcher's judgment of the smallest meaningful deviation from the null hypothesis.

• Example in Excel.

• Effect size calculator: http://www.uccs.edu/~faculty/lbecker/



Step 5: Specifying the Desired Level of Power

• Power is the probability of rejecting H0 when H0 is actually false, i.e. when the true state of affairs is off from H0 by the previously established effect size.

• Power is a probability value (0-1).

• The complement of power is the probability of type II error.

• Usually set to .8, meaning that the chance of a Type II error (retaining H0 when it is false) is .2.

• Power is not usually set greater than 90% (.9) for two reasons:

– It raises the probability that trivial deviations from H0 will be labeled statistically significant.

– It puts too much demand on the researcher, because the sample size needed to meet the required power becomes very large.
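The link between α, effect size, sample size, and power can be sketched numerically. This is a minimal sketch, assuming a two-sided comparison of two means with equal group sizes and a normal approximation in place of the t distribution; the function name `approx_power` and the example inputs are illustrative and not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Assumption: equal group sizes and a normal approximation, so the
    result is slightly optimistic for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = d * sqrt(n_per_group / 2)   # expected test statistic when the ES is d
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power = approx_power(d=0.5, n_per_group=64)
type_ii = 1 - power  # the complement of power
print(f"power = {power:.3f}, P(Type II error) = {type_ii:.3f}")
```

With a medium effect size and 64 subjects per group, the approximation lands close to the conventional .8 target.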

Step 6: Determination of the Sample Size Needed

• Use a calculator when you know the confidence level, the confidence interval, and the population size:

• http://www.surveysystem.com/sscalc.htm

• Another way to determine the sample size, using the margin of error:

• http://www.raosoft.com/samplesize.html
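The two routes to a sample size can be sketched in code. This is a rough sketch under stated assumptions: the first function derives n from α, power, and effect size using a normal approximation; the second uses the usual formula for estimating a proportion with a finite population correction. Whether the linked calculators use exactly these formulas is an assumption, and both function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()


def n_per_group_for_power(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample test (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def n_for_margin_of_error(margin, confidence=0.95, population=None, p=0.5):
    """Sample size for estimating a proportion within +/- margin.

    p = 0.5 is the most conservative assumption about the proportion.
    """
    z = Z.inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction: small populations need fewer subjects.
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


print(n_per_group_for_power(d=0.5))                  # medium ES, power .8
print(n_for_margin_of_error(0.05))                   # very large population
print(n_for_margin_of_error(0.05, population=1000))  # population of 1000
```

Note how raising the desired power or shrinking the effect size drives the required n up quickly, which is the second reason given for not setting power above .9.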


Hypothesis Testing Using Confidence Interval

• Used as an alternative to the critical value or the p-value.

• Provides insight into why H0 was retained or rejected.

• Compute an interval around the sample statistic instead of a single value.

• The alpha level still has to be specified (an α of 0.05 corresponds to a 95% interval).

• Locate the pinpoint value of H0: if it falls outside the CI, H0 is rejected. Otherwise, it is retained.

• Adding and subtracting one standard error to the sample statistic is not alpha-driven; it gives roughly a 68% interval. An alpha-driven interval with α = 0.05 is a 95% interval.

– However, it is traditional to use the standard error.
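The confidence-interval approach can be sketched as follows. This is a minimal illustration, assuming a normal (large-sample) critical value rather than a t value; the sample mean, standard error, and H0 value are hypothetical numbers chosen only to show the mechanics.

```python
from statistics import NormalDist


def z_interval(estimate, std_error, confidence=0.95):
    """Alpha-driven interval: estimate +/- z_crit * SE (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return estimate - z_crit * std_error, estimate + z_crit * std_error


# Coverage of the traditional "statistic +/- 1 SE" interval.
coverage_1se = NormalDist().cdf(1) - NormalDist().cdf(-1)

# Hypothetical study: sample mean 103, SE 1.2, H0: mu = 100.
sample_mean, std_error, h0_value = 103.0, 1.2, 100.0
low, high = z_interval(sample_mean, std_error, confidence=0.95)
reject_h0 = not (low <= h0_value <= high)

print(f"+/- 1 SE covers about {coverage_1se:.1%}")
print(f"95% CI = ({low:.2f}, {high:.2f}), reject H0: {reject_h0}")
```

The interval shows not only that H0 is rejected but also how far the pinpoint value lies from the plausible range.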

Adjusting for an inflated Type I Error

• The risk of an inflated Type I error arises when the alpha level is used more than once in a study, that is, when more than one H0 is tested.

• Solution: adjust the alpha level by dividing alpha by the number of tests in which alpha is used. This is known as the Bonferroni technique. Example on page 196 of the textbook.
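The inflation and the adjustment can be sketched as below. The familywise error formula assumes the tests are independent, and the p-values are hypothetical; the function names are illustrative only.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni(p_values, alpha=0.05):
    """Divide alpha by the number of tests and judge each p-value against it."""
    adjusted_alpha = alpha / len(p_values)
    decisions = [p <= adjusted_alpha for p in p_values]
    return adjusted_alpha, decisions


p_values = [0.003, 0.020, 0.041, 0.300]  # hypothetical results of four tests
inflated = familywise_error(0.05, len(p_values))
adjusted_alpha, decisions = bonferroni(p_values)

print(f"unadjusted familywise error risk: {inflated:.3f}")
print(f"adjusted alpha: {adjusted_alpha}")
print(f"reject H0? {decisions}")
```

Three of the four hypothetical p-values fall below .05, but only one survives the adjusted alpha.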

What is an intuitive solution?

Warnings

• Two types of effect size:

– In the nine-step version of hypothesis testing, ES is specified before the study data are evaluated, as the minimum effect size worth detecting.

– In the seven-step version of hypothesis testing, ES is computed from the data.

Be careful where you report yours!

• The criteria for effect size (small, medium, large) depend on the study statistic.

– For a difference between means (.2, .5, .8 are small, medium, and large)

– For a correlation coefficient (.1, .3, .5 are small, medium, and large)

• The six-step hypothesis testing version is simplistic but unfortunately widely adopted.

• Use the Bonferroni technique to reduce the chance of an inflated Type I error.

Chapter 9

Statistical Inferences Concerning Bivariate Coefficients


• Making an educated guess about the population based on the sample data only.

• Statistical tests applied to correlation coefficients.

• How the Bonferroni technique is used in conjunction with correlation coefficients.

• How to compare two or more correlation coefficients to see if they are significantly different.

• How statistical tests can be applied to reliability and validity coefficients.

Statistical Tests: Single Correlation Coefficient

• Purpose of Inferential Testing

– The entire population cannot be tested, so we infer from the sample data.

• The Null Hypothesis

– A null correlation hypothesis is usually implied as H0: ρ = 0.00

• Deciding If r is Statistically Significant

– By comparing the p-value associated with r against α (usually α is set to 0.05[5%]).

– By comparing the calculated r from the sample to a table of critical values.

Statistical Tests: Single Correlation Coefficient, cont’d

• One-Tailed and Two-Tailed Tests on r

– Most of the time, two-tailed is assumed.

– That is, testing for both negative and positive correlation.

• Tests on specific kinds of correlation

– Spearman, Pearson, Phi, etc.

– If r is indicated with no type, the Pearson product-moment correlation is assumed.
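Deciding whether r is statistically significant can be sketched in code. This is a minimal sketch for Pearson's r under H0: ρ = 0, using the standard conversion between r and t with n - 2 degrees of freedom. The data are hypothetical, and the critical t (2.306 for df = 8, two-tailed α = .05) is assumed to come from a t table.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def critical_r(t_crit, n):
    """Smallest |r| that is significant, derived from a critical t."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)


# Hypothetical paired measurements (n = 10, so df = 8).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r = pearson_r(x, y)
r_crit = critical_r(t_crit=2.306, n=len(x))  # t_crit taken from a t table
significant = abs(r) > r_crit                # two-tailed decision

print(f"r = {r:.3f}, critical r = {r_crit:.3f}, significant: {significant}")
```

Comparing |r| with the critical r is the table-based route; comparing the p-value with α leads to the same decision.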

Tests on Many Correlation Coefficients

• Sometimes two or more correlations are inferentially tested in the same study.

• Presented in various ways:

• Tests on the Entries of a Correlation Matrix

– A test on a correlation coefficient does not test whether the variables per se are correlated.

– Rather, it is the measurements of the variables that are correlated.

• Tests on several correlation coefficients reported in the text.

• The Bonferroni Adjustment Technique

– Adjust α by dividing the overall α by the number of correlations tested, then compare each p-value against the adjusted α.

– Holds down the chance of an inflated Type I error.

• Comparing two Correlation Coefficients Statistically

• Use http://faculty.vassar.edu/lowry/rdiff.html, which uses the Fisher r-to-z transformation.
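The Fisher r-to-z comparison of two correlations from independent samples can be sketched as follows. This follows the standard textbook formula; that the linked calculator computes exactly this is an assumption, and the correlations and sample sizes below are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Test H0: rho1 = rho2 for two independent samples via Fisher's z."""
    z1, z2 = atanh(r1), atanh(r2)              # Fisher r-to-z transformation
    std_error = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / std_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical: r = .60 in a sample of 50 versus r = .30 in a sample of 60.
z, p = compare_correlations(0.60, 50, 0.30, 60)
print(f"z = {z:.3f}, two-tailed p = {p:.4f}")
```

The transformation is needed because the sampling distribution of r is skewed when ρ is not zero, while the transformed values are approximately normal.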



Chapter 10

Inference Concerning One or Two Means (t-tests & z-test)

Inference Concerning a Single Mean

• Single sample

• The sample mean (X̄) is in focus for inferential matters.

• Two approaches:

– Using confidence intervals

– Using the mean to evaluate a null hypothesis

The Inferential Purpose

• An inference about µ is made based on the known value of X̄.

• Interval Estimation

– A confidence interval (CI) is built around the sample mean.

– The CI is a range within which the population mean (µ) probably falls.

– The accompanying confidence level (usually 95%) means that if many samples were drawn, about 95% of the associated CIs would contain the population mean (µ).

– http://pirate.shu.edu/~wachsmut/Teaching/MATH1101/Testing/confidence-mean.html

– CI is affected by the sample size (n), the sample mean (X̄), and the standard deviation (s).

– In Excel, the function is: =CONFIDENCE(alpha, standard deviation of the sample, sample size)

– Good article about the use of CONFIDENCE is here: http://support.microsoft.com/kb/828124
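The interval can also be sketched outside Excel. The function below mirrors the three arguments of CONFIDENCE and returns the half-width that is added to and subtracted from the mean; it assumes a normal critical value. The simulation then illustrates the "many samples" reading of the 95% level, using a hypothetical population with mean 100 and standard deviation 15.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_margin(alpha, standard_dev, size):
    """Half-width of a normal-based CI: z * sd / sqrt(n)."""
    return NormalDist().inv_cdf(1 - alpha / 2) * standard_dev / sqrt(size)


def simulate_coverage(mu=100.0, sigma=15.0, n=50, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated CIs that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        centre = mean(sample)
        margin = confidence_margin(alpha, stdev(sample), n)
        if centre - margin <= mu <= centre + margin:
            hits += 1
    return hits / trials


margin = confidence_margin(0.05, 2.5, 50)
coverage_rate = simulate_coverage()
print(f"margin = {margin:.6f}")
print(f"share of CIs containing mu = {coverage_rate:.3f}")
```

The simulated share should sit near, and typically a little under, 95%, because the normal critical value is slightly too small when s is estimated from the sample.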


The Inferential Purpose, cont’d

• Tests Concerning a Null Hypothesis

– When H0 involves a pinpoint mean value, the sample mean is tested against that value.

• H0: µ=a, where a is the pinpoint value chosen by the researcher.

– If the difference is significant, H0 is rejected.

– t-test and z-test are the most popular approaches for this kind of testing.

– z-test is slightly biased, but the bias is ignorable when the sample size is at least .

– In Excel, t-tests are demonstrated here:

• http://www.youtube.com/watch?v=wGoMEYinf6Y

• And http://www.wellesley.edu/Psychology/Psych205/onettest.html


Inferences Concerning Two Means

• Several points to consider:

– Independent vs. correlated samples

– Testing with two means (comparing samples) using confidence intervals.

– Testing with two means (comparing samples) by setting up an H0.

Inferences Concerning Two Means: Independent vs. Correlated Samples

• Correlated Samples: a relationship exists between each member of one sample and one and only one member of the other sample.

– Test and re-test for the same group.

– Matching: each member of the first sample is paired with a member of the second sample.

– Biological twins split up between the two samples.

• Independent Samples: no such relationship exists.

Inferences Concerning Two Means: The Inferential Purpose

• With the two different types of samples, when they are compared in terms of their means:

– The inference applies to both populations from which the samples were drawn.

– The inference is made about the populations, NOT the samples.

Inferences Concerning Two Means: Setting Up and Testing a Null Hypothesis

• Usually, H0 is not stated. Unless otherwise indicated, assume it is that no difference exists between the means, H0: µ1 - µ2 = 0.

• Use a t-test or an F-test.

• The F-test (as reported by Excel) gives the two-tailed probability that the variances of the two samples are not significantly different.

• The t-test is similar but gives a slightly different outcome, since it compares the means rather than the variances.

• Excel Examples:

X    Y
1    22
2    5
3    26
4    34
5    41
6    14
7    18
8    15

For the data above, the F-test value was 0.000556.

t-Test: Two-Sample Assuming Unequal Variances

                                X           Y
Mean                            4.5         21.875
Variance                        6           134.125
Observations                    8           8
Hypothesized Mean Difference    0
df                              8
t Stat                          -4.15157
P(T<=t) one-tail                0.001601
t Critical one-tail             1.859548
P(T<=t) two-tail                0.003202
t Critical two-tail             2.306004
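The t Stat and df in the Excel output can be reproduced from the X and Y data. This is a minimal sketch of the unequal-variances (Welch) calculation; the function name `welch_t` is illustrative. The p-values are not recomputed here because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [22, 5, 26, 34, 41, 14, 18, 15]


def welch_t(a, b):
    """t statistic and Welch-Satterthwaite df for two independent samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch_t(x, y)
print(f"t = {t_stat:.5f}, df = {df:.3f} (Excel reports the rounded value)")
```

Because |t| exceeds the two-tailed critical value of 2.306004, H0: µ1 - µ2 = 0 is rejected for these data.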

F-test, again

F-test 0.160883202

An F-test returns the two-tailed probability that the variances in array1 (IQ Group1) and array2 (IQ Group 2) are not significantly different. Use this function to determine whether two samples have different variances. For example, given test scores from public and private schools, you can test whether these schools have different levels of test score diversity. In the example below, the value 0.161 is greater than α = 0.05, so the variances of the two groups (about 112 and 299) are not significantly different.

IQ Group1    IQ Group 2
123          78
123          33
111          23
113          45
101          54
103          34
99           61
89           45
110          65
105          65

ANOVA (Analysis of Variance)

Anova: Single Factor

SUMMARY
Groups    Count    Sum    Average    Variance
X         8        36     4.5        6
Y         8        175    21.875     134.125

ANOVA
Source of Variation    SS          df    MS          F          P-value     F crit
Between Groups         1207.563    1     1207.563    17.2355    0.000979    4.60011
Within Groups          980.875     14    70.0625
Total                  2188.438    15

The most important values are F and the P-value. F = Between Groups MS / Within Groups MS.

Here is a good video explaining ANOVA: http://www.youtube.com/watch?v=A6j9oxAkQ3g


ANOVA Example

IQ Group1    IQ Group 2
123          78
123          33
111          23
113          45
101          54
103          34
99           61
89           45
110          65
105          65

Anova: Single Factor

SUMMARY
Groups        Count    Sum     Average    Variance
IQ Group1     10       1077    107.7      112.4556
IQ Group 2    10       503     50.3       299.3444

ANOVA
Source of Variation    SS         df    MS         F           P-value     F crit
Between Groups         16473.8    1     16473.8    80.00874    4.83E-08    4.413873
Within Groups          3706.2     18    205.9
Total                  20180      19

• SS: Sum of Squares.

• MS: Mean Sum of Squares = SS / df.

• The F statistic = Mean Sum of Squares (between) / Mean Sum of Squares (within).

• Between Groups: differences between the averages for each level.

• Within Groups: the variance within each level.

• Degrees of freedom: 1 + 1 = 2 groups; 1 + 18 + 1 = 20 subjects.
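The entries of the ANOVA table can be recomputed from the two IQ groups. This is a minimal single-factor sketch; the function name and the dictionary keys are illustrative. The P-value and F crit columns are not reproduced because the Python standard library has no F distribution.

```python
from statistics import mean

iq_group1 = [123, 123, 111, 113, 101, 103, 99, 89, 110, 105]
iq_group2 = [78, 33, 23, 45, 54, 34, 61, 45, 65, 65]


def one_way_anova(groups):
    """SS, df, MS and F for a single-factor ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between: how far each group average sits from the grand average.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within: spread of the scores around their own group average.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


result = one_way_anova([iq_group1, iq_group2])
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

The same function accepts more than two groups, which is the case where ANOVA is needed instead of a t-test.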

Interval Estimation with Two Means

• A CI is built around the difference between the two sample means and is used instead of a significance test on the difference.

• If the CI does not contain zero, the difference is significant.
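As a sketch of interval estimation with two means, the interval below is built around the difference between the X and Y means from the Excel example. It assumes the unequal-variances standard error and borrows the two-tailed critical t (2.306004, df = 8) reported in that Excel output; the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [22, 5, 26, 34, 41, 14, 18, 15]


def diff_ci(a, b, t_crit):
    """CI for mean(a) - mean(b) using the unequal-variances standard error."""
    diff = mean(a) - mean(b)
    std_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff - t_crit * std_error, diff + t_crit * std_error


low, high = diff_ci(x, y, t_crit=2.306004)  # critical t from the Excel output
significant = not (low <= 0 <= high)        # zero outside the CI

print(f"95% CI for the difference: ({low:.3f}, {high:.3f})")
print(f"difference significant: {significant}")
```

Both limits are negative, so zero is not a plausible value for the difference, which agrees with the t-test decision.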

Multiple Dependent Variables

• Results Presented in Text

– With respect to each variable, state the test of the difference (for a t-test, t(N-2) = t-value and the p-value).

– For an F-test, state F(1, N-2) = F-value in addition to the p-value.

– If comparing means, you may need to state the means and the standard deviations.

• Results Presented in a Table

– Looks like the correlation matrix.

– State variables and conditions and put the means, the SDs, t-statistic in addition to the p-value for each variable.

– The excerpt on page 242 is a good example.

Use of Bonferroni Adjustment Technique

• Usually done by dividing the alpha level (0.05) by the number of dependent variables, each with its own H0.

• When alpha is decreased to a lower level by the researcher, the technique is called Pseudo-Bonferroni.

Effect Size Assessment and Power Analysis

• Deals with the issue of ‘Practical Significance’, not just the statistical significance.

• Effect size calculator: http://www.cemcentre.org/evidence-based-education/effect-size-calculator

• Online calculator: http://www.uccs.edu/~faculty/lbecker/

• Cohen's d: $d = (M_1 - M_2) / s_{pooled}$, where $s_{pooled} = \sqrt{(s_1^2 + s_2^2) / 2}$

• TABLE ON PAGE 246
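As an illustration of an effect size computed from the data, Cohen's d is sketched below for the two IQ groups from the ANOVA example. The pooled standard deviation here is the root of the average of the two sample variances, which suits equal group sizes; the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

iq_group1 = [123, 123, 111, 113, 101, 103, 99, 89, 110, 105]
iq_group2 = [78, 33, 23, 45, 54, 34, 61, 45, 65, 65]


def cohens_d(a, b):
    """Cohen's d with s_pooled = sqrt((s1^2 + s2^2) / 2)."""
    s_pooled = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / s_pooled


d = cohens_d(iq_group1, iq_group2)
print(f"Cohen's d = {d:.3f}")
```

For these data d is far above the .8 benchmark for a large effect, so the difference is practically as well as statistically significant.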

Post Hoc Power Analysis

• An alternative to the estimated effect size.

• Used to clarify the results.

• Usually done if the difference turned out not to be statistically significant.

Comments

• A non-significant result does not mean H0 is true:

• Why?

– There could be more than one conflicting null hypothesis that the same data would fail to reject.

– The measurement may not be reliable.

– A statistical power analysis may not have been done before making final conclusions.

• Overlapping Distributions

– Even though the means can be significantly different, the scores in the two groups may overlap.

– The standard deviations can reveal such cases.

• The Typical Use of t-Test

– The t-test is typically used for different purposes, such as evaluating H0 with one or two means.

– It can also be used to test the difference between two correlations.

• Practical vs. Statistical Significance

• Type I and Type II error.
