Packet Spring 09 Stats

EDFI 641 Statistics in Education

Course Packet

Dr. Rachel Vannatta

Table of Contents

Video #1—Introduction to Statistics ..... 2
Video #2—Frequency Distributions ..... 6
Video #3—Central Tendencies & Variability ..... 10
Video #4—Probability & z Score ..... 18
    M-n-M Activity ..... 19
Video #5—Distribution of Sample Means ..... 24
Video #6—Hypothesis Testing ..... 28
Video #7—t Test ..... 38
Video #8—t Test of Independent Samples ..... 44
    Interpreting Research ..... 49
Video #9—t Test of Related Samples ..... 50
    Interpreting Research ..... 54
    Coke vs. Pepsi Experiment ..... 55
Video #10—ANOVA ..... 57
    Interpreting Research ..... 63
Video #11—Correlation & Regression ..... 64
    Interpreting Research ..... 73
Video #12—Chi Square ..... 75
    Interpreting Research ..... 81
    Statistical Test Grid ..... 82
Unit Normal (z-score) Table ..... 84
t Distribution Table ..... 88
F Distribution (ANOVA) Table ..... 89
Pearson Correlation Table ..... 90
Chi Square Distribution Table ..... 91

Video #1—Introduction to Statistics

Population—the entire group of individuals that the researcher WISHES to study.

Sample—a set of individuals selected from population, intended to represent the population

Parameter—value that describes the population

Statistic—value that describes the sample

Two major types of statistical methods
• descriptive stats—summarize, organize and simplify data (e.g., mean, standard deviation, tables, graphs, distributions)
  • data
  • raw score
• inferential stats—techniques that allow us to study samples and make generalizations about the population from which they were selected (e.g., t test, ANOVA, correlation)
  • sampling error—amount of error between the sample statistic and the population parameter (degree to which the sample differs from the population)
  • random sampling—used to minimize error between sample and population

Inferential statistics also allow us to study relationships between/among variables that the sample holds.
• variable—characteristic/condition that differs among individuals (gender, height, test scores, IQ)
• construct—hypothetical concepts/theory to organize observations
• operational definition—defines a construct in terms of how it is measured

Types of Variables
• categorical variable (discrete)—consists of separate categories (e.g., gender, religion, classification of personality)
• quantitative variable (continuous)—can be divided into an infinite number of fractional parts (e.g., height, time, age)
• independent variable—usually a treatment that has been manipulated (control group versus experimental group), usually categorical
• dependent variable—usually the effect, usually quantitative
• confounding variable—an uncontrolled variable that creates a difference between the control and experimental groups

Variables determine type of relationship being studied
• mutual
• causal
  • Groups must be compared to examine cause and effect; groups are created by a categorical variable

Class #1: In-Class Practice Problems

In the following research questions, identify the independent and dependent variables and indicate whether each is categorical or quantitative.

1. Is there a significant relationship between college GPA and SAT scores among college freshmen?

independent variable—

dependent variable—

research design—

2. Does receiving a special diet of oat bran significantly decrease cholesterol levels among middle-age adults? Note: Researcher compared a treatment group to a control group. Groups were created using random selection and assignment.

independent variable—

dependent variable—

research design—

3. Does socio-economic status (low, middle, high) affect reading achievement among preschoolers?

independent variable—

dependent variable—

research design—

4. Does receiving whole-language reading instruction increase reading achievement among elementary students? Note: Research compared treatment group (whole-language) to control group (traditional). Existing groups were used.

independent variable—

dependent variable—

research design—

                        Causal               Mutual
Independent Variable    Categorical          Quantitative
Dependent Variable      Quantitative         Quantitative
Key Words               Cause                Relate
                        Effect               Relationship
                        Increase/Decrease    Predict
                        Difference           Associate

Research Designs
• Correlational—studies relationships among 2 or more variables to explain or predict behaviors
  • usually both IV and DV are quantitative
  • example: Teacher studies the relationship between English grades and overall GPA.
• Experimental—examines cause and effect; manipulates a treatment and tests the outcome; compares the experimental and control groups (groups are randomly created)
  • IV=nominal; DV=interval/ratio
  • example: Researcher compares grades of a group of students that receive computer-assisted instruction to a group that receives none. Groups were created through random assignment.
• Quasi-Experimental—examines cause and effect; indirectly manipulates a treatment and tests the outcome; compares the experimental and control groups (uses existing groups)
  • example: Researcher compares grades of a group of students that receive computer-assisted instruction to a group that receives none. Existing groups were used.
• Causal Comparative—examines cause and effect (cautiously); compares groups created by some categorical characteristic (gender, religion, ethnicity)
  • example: Researcher compares final grades of male and female students.

Most research is guided by a hypothesis, a prediction about the effect of the treatment.

Measurement Scales
• Nominal—numbers have NO numerical value but represent categories (religion, ethnicity, occupation, gender)
• Ordinal—numbers represent a rank (1 being the best); intervals can vary (e.g., class rank, Olympic ordinals)
• Interval—numbers have typical numerical value; intervals are equal; no real zero (e.g., temperature, test score)
• Ratio—same as interval but has a real zero (e.g., money, time)

Identify the measurement scale (nominal, ordinal, interval, ratio) for each.

_________________5. Size of school district (small, medium, large)

_________________6. Rank of faculty on their teaching ratings

_________________7. Social security number

_________________8. Color of person’s eyes

_________________9. IQ scores

_________________10. Degree in Fahrenheit

_________________11. Religious affiliation

________________12. Medalists in an Olympic event

________________13. Income in actual dollars

Video #2—Frequency Distributions

Frequency distribution—table/graph of the number of individuals located in each category
• places scores in order from highest to lowest
• groups together all individuals who have the same score

X     f
10    1
9     4
8     5
6     6
5     2
4     2

Proportion and Percents of Frequency Distributions
• Proportion—relative frequencies; measures the fraction of the total group that is associated with each score; most often appear as decimals
  • proportion = $p = \frac{f}{N}$
• Percentage—percent of the total group that is associated with each score
  • percentage = $p(100) = \frac{f}{N}(100)$

X     f    p=f/N        %=p(100)    cum f    cum%
10    1    1/20=.05     5           1        5
9     4    4/20=.20     20          5        25
8     5    5/20=.25     25          10       50
6     6    6/20=.30     30          16       80
5     2    2/20=.10     10          18       90
4     2    2/20=.10     10          20       100
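The table above can be rebuilt from raw scores with a few lines of Python. This is only a sketch: the list `scores` is reconstructed from the X/f columns, the function name `frequency_table` is made up for illustration, and the cumulative columns are accumulated from the highest score down because that is how the packet's table is laid out.

```python
from collections import Counter

# Raw scores reconstructed from the X/f table above (N = 20).
scores = [10] + [9] * 4 + [8] * 5 + [6] * 6 + [5] * 2 + [4] * 2


def frequency_table(scores):
    """Return rows (X, f, p, percent, cum_f, cum_percent), highest score first."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    cum_f = 0
    for x in sorted(counts, reverse=True):
        f = counts[x]
        cum_f += f  # accumulated from the top row down, as in the table above
        rows.append((x, f, f / n, 100 * f / n, cum_f, 100 * cum_f / n))
    return rows


for x, f, p, pct, cum_f, cum_pct in frequency_table(scores):
    print(f"{x:>3} {f:>3} {p:>5.2f} {pct:>5.0f} {cum_f:>5} {cum_pct:>5.0f}")
```

Each printed row should match one row of the table: score, frequency, proportion, percent, cumulative frequency, cumulative percent.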

Grouped Frequency Distribution Table
• used when data covers a wide range of values; groups are based on class intervals
• to construct a grouped frequency distribution table, follow these rules:
  • rule 1—number of intervals—shoot for 8-12 intervals, 10 intervals being the ideal
  • rule 2—interval width—use appropriate width to reach appropriate # of intervals
  • rule 3—interval starting pt—should be a multiple of the width
  • rule 4—all intervals should be the same width

Helpful Hints
• use the following equation to determine the number of intervals and the width of intervals that is appropriate for the data
  • number of intervals = $\frac{\text{highest score} - \text{lowest score} + 1}{\text{interval width}}$ *
• ALWAYS round up the number of intervals! It is impossible to have a fourth of an interval at the end of the distribution. So even if the number of intervals (using the above formula) equals 8.25, round up to 9!
• try different widths, until an appropriate number of intervals is calculated

Example: N=25

51, 55, 57, 60, 63, 66, 68, 69, 70, 72, 74, 74, 74, 75, 77, 79, 83, 84, 85, 85, 88, 90, 92, 95, 98

• number of intervals = $\frac{98 - 51 + 1}{5} = \frac{48}{5} = 9.6$ (round up to 10)
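The interval calculation and the resulting grouping for the N=25 example can be sketched in Python. The function names `interval_count` and `grouped_frequencies` are illustrative, and the sketch assumes whole-number scores and a starting point that is a multiple of the width (rule 3).

```python
import math

scores = [51, 55, 57, 60, 63, 66, 68, 69, 70, 72, 74, 74, 74, 75, 77, 79,
          83, 84, 85, 85, 88, 90, 92, 95, 98]


def interval_count(scores, width):
    """(highest score - lowest score + 1) / width, always rounded up."""
    return math.ceil((max(scores) - min(scores) + 1) / width)


def grouped_frequencies(scores, width):
    """Return (low, high, f) rows, highest interval first."""
    bottom = (min(scores) // width) * width  # starting point is a multiple of the width
    top = (max(scores) // width) * width
    rows = []
    for low in range(top, bottom - 1, -width):
        high = low + width - 1
        f = sum(1 for x in scores if low <= x <= high)
        rows.append((low, high, f))
    return rows


# Try different widths until the count lands in the 8-12 range.
for width in (2, 5, 10):
    print(width, interval_count(scores, width))

for low, high, f in grouped_frequencies(scores, 5):
    print(f"{low}-{high}  {f}")
```

Trying several widths in a loop mirrors the hint above: a width of 5 is the one that gives a count inside the 8-12 target for these scores.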

X        f
95-99    2
90-94    2
85-89    3
80-84    2
75-79    3
70-74    5
65-69    3
60-64    2
55-59    2
50-54    1

* keep in mind that since a continuous variable contains an infinite number of points, a score is not assigned a single point but rather an interval with boundaries, also called real limits, that separate a score from the adjacent scores.

Example: X=88
• upper real limit = 88.5
• lower real limit = 87.5
• therefore, a score of 87.75 would fall in the interval of X=88

Frequency Distribution Graphs
• Uses an x-axis to represent scores and a y-axis to represent frequencies
  • List scores increasing in value from left to right
  • List frequencies in increasing value from bottom to top
  • The height of the y-axis should be approximately 2/3 to 3/4 of the length of the x-axis

• Creating a Grouped Frequency Histogram—follow rules for Grouped Frequency Table
  • Histogram—used for interval/ratio data; a bar represents an interval (real limits of the score or class interval); bars touch each other to represent the continuous nature of the data; height corresponds to frequency
  • Example: Using data from the Grouped Frequency Table on previous page

[Figure: grouped frequency histogram. x-axis: class intervals 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90-94, 95-99; y-axis: f, from 1 to 5]
  • Starting pt. is a multiple of the width (5)
  • Interval width is 5 in order to generate 10 intervals
  • 10 intervals meet the 8-12 interval requirement

• Other Types of Frequency Distribution Graphs and Polygons
  • Bar Graph—used for nominal/ordinal data; a bar represents a category, bars do not touch
  • Frequency Distribution Polygons—used for interval/ratio data; a single dot represents an individual score or a class interval; dots are connected
  • Distribution Curve—shows relative frequencies for the population; smooth
    • Normal—symmetrical; greatest frequency in the middle, smallest frequency in the extremes (tails)
    • Positively Skewed—smallest frequency in the positive (right) end of the distribution
    • Negatively Skewed—smallest frequency in the negative (left) end of the distribution

Video #2 In-Class Practice Problems

10, 15, 18, 22, 25, 26, 29, 31, 33, 33, 34, 37, 38, 39, 39, 40, 40, 40, 41, 42, 42, 43, 44, 45, 46, 46, 47, 48, 49, 50

1. Using the data above, do the following:
   a. construct a histogram based upon the grouped frequency distribution

   b. determine the distribution type (normal, positive, negative) from the histogram

Video #3: Central Tendency

Measure of Central Tendency
• describes a group of individuals with a single measurement that is most representative of all individuals
• Types: mean, median, and mode

• Mean—arithmetic average
  • used for interval/ratio (quantitative) data
  • computed by adding all the scores and dividing by the number of scores
  • Population mean = $\mu = \frac{\Sigma X}{N}$; Sample mean = $\bar{X} = \frac{\Sigma X}{n}$

• Median—the midpoint; the score that divides the distribution exactly in half; 50% are above and below the median
  • used for ordinal data or when: there is a skewed distribution, some scores are undetermined, or there is an open-ended distribution
  • Calculating the median when N is an odd number
    • make sure scores are in order; find the middle score
  • Calculating the median when N is an even number
    • make sure scores are in order; find the two middle scores; add the two scores & divide by 2

• Mode—the most frequent score
  • used especially for nominal data
  • represented by the highest point in the frequency distribution
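The three measures can be sketched in a few lines of Python. The sample lists below are hypothetical, chosen only to show the odd-N and even-N median rules, and the function names are illustrative.

```python
from collections import Counter


def mean(scores):
    """Add all the scores and divide by the number of scores."""
    return sum(scores) / len(scores)


def median(scores):
    """Middle score (odd N) or average of the two middle scores (even N)."""
    ordered = sorted(scores)  # make sure scores are in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(scores):
    """Most frequent score(s); a list, because ties are possible."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)


odd_sample = [3, 5, 8, 10, 11]        # hypothetical scores, N = 5
even_sample = [3, 5, 8, 10, 11, 12]   # hypothetical scores, N = 6

print(mean(odd_sample), median(odd_sample), median(even_sample))
print(modes([1, 2, 2, 3, 3, 3, 4]))
```

Returning a list from `modes` is a design choice for this sketch: a distribution can have more than one most-frequent score.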

Central Tendency and the Shape of Distributions

• Normal distribution—mean, median, and mode are equal and smack-dab in the middle of the distribution

• Skewed Distributions
  • not symmetrical
  • mean, median, mode are different
  • extreme scores on one end of the distribution
  • Mean is most affected by extreme scores, so it will be furthest out in the tail

Negatively Skewed—extreme scores are on the low end of the distribution

Order from left to right: Mean, Median, Mode

Positively Skewed—extreme scores are on the high end of the distribution

Order from left to right: Mode, Median, Mean

Variability

Variability—a measure that describes how spread out or close together the scores are within the distribution

• Range—distance between the highest score and the lowest score in the distribution; easiest measure of variability
  • range = (high score - low score)

[Figure: Distribution 1, frequency histogram. x-axis: scores 1 to 11; y-axis: frequency, 0 to 7; bar heights 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

Distribution 1 Range = 10 Mean= 6 Median= 6 Mode= 6 SD= 2.45

• Standard Deviation from the Mean
  • most common measure of variability
  • average distance of scores from the mean

[Figure: Distribution #2, frequency histogram. x-axis: scores 1 to 11; y-axis: frequency, 0 to 9; bar heights 8, 1, 2, 2, 3, 4, 3, 2, 2, 1, 8]


Standard Deviation Activity
o Need 16 pieces of candy (M-n-M’s, Skittles, etc.)
o You must use all 16 pieces for each distribution.
o Use Distribution Graph from Blackboard Course Site (located in Course Documents)

Steps

1. For distribution A create a normal distribution like Dr. Vannatta’s with your candy. Trace outline of distribution.

Now on your own, complete the following:
2. For distribution B, move candy around to create a distribution that has greater variability than A. Trace outline of distribution.
3. For distribution C, move candy around to create a distribution that has less variability than A. Trace outline of distribution.
4. For distribution D, move candy around to create a distribution that has the least possible amount of variability.

Variability Key Concepts
• Variability shows how spread out scores are in the distribution.
  • Range only takes into account the two extreme scores (highest and lowest)
  • Standard deviation compares all scores to the mean
    • When scores are close to the mean, then variability is less.
    • When scores are far from the mean (outliers, extreme ends of the distribution), then variability is more.

Calculating Standard Deviation

• standard deviation for population = $\sigma = \sqrt{\frac{\Sigma(X - \mu)^2}{N}}$

• standard deviation for sample = $s = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}}$

• degrees of freedom (df = n - 1)—an adjustment for sample bias; to calculate the standard deviation, we must know the sample mean. This places a restriction on sample variability, since only (n - 1) scores are free to vary once we know the sample mean.

• Example for calculating the standard deviation for a sample (n = 16, SS = 40)
  • Variance ($s^2$) = $\frac{\Sigma(X - \bar{X})^2}{n - 1} = \frac{SS}{n - 1} = \frac{40}{15} = 2.67$
  • Standard dev ($s$) = $\sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}} = \sqrt{\frac{SS}{n - 1}} = \sqrt{2.67} = 1.63$

• Sum of squares—sum of squared deviation scores or sum of squared differences
  • $SS = \Sigma(X - \bar{X})^2$; also $SS = s^2(n - 1)$

• Variance—mean of squared deviation scores; sum of squares divided by the number of scores minus 1
  • variance = $s^2 = \frac{\Sigma(X - \bar{X})^2}{n - 1}$

Steps to Calculate Standard Deviation
1. Calculate mean ($\bar{X}$)
2. Calculate the difference between each score and the mean ($X - \bar{X}$)
3. Square each difference: $(X - \bar{X})^2$
4. Add the squared differences
   • This is the Sum of Squares (SS) = $\Sigma(X - \bar{X})^2$
5. Divide SS by degrees of freedom (df = n - 1)
   • This is Variance = $\frac{\Sigma(X - \bar{X})^2}{n - 1}$
6. Take the square root of variance
   • This is the Standard Deviation (SD) = $\sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}}$

X    $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
2    5            -3               9
3    5            -2               4
3    5            -2               4
4    5            -1               1
4    5            -1               1
4    5            -1               1
5    5             0               0
5    5             0               0
5    5             0               0
5    5             0               0
6    5             1               1
6    5             1               1
6    5             1               1
7    5             2               4
7    5             2               4
8    5             3               9

SS = 40
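The six steps can be followed line by line in Python using the 16 scores from the table above. This is a sketch for checking hand work; the function name `sample_standard_deviation` is illustrative.

```python
import math

# The 16 scores from the worked table above.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]


def sample_standard_deviation(scores):
    """Return (mean, SS, variance, SD) following the six steps."""
    n = len(scores)
    mean = sum(scores) / n                    # step 1: mean
    deviations = [x - mean for x in scores]   # step 2: X minus mean
    squared = [d ** 2 for d in deviations]    # step 3: square each difference
    ss = sum(squared)                         # step 4: sum of squares
    variance = ss / (n - 1)                   # step 5: divide by df = n - 1
    sd = math.sqrt(variance)                  # step 6: square root
    return mean, ss, variance, sd


mean, ss, variance, sd = sample_standard_deviation(scores)
print(mean, ss, round(variance, 2), round(sd, 2))
```

Returning the intermediate values (mean, SS, variance) makes it easy to compare each column total against a hand-filled table.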

Standard Deviation Calculation Practice

a. Calculate the standard deviation for the following data ($\bar{X}$ = 6).

X     $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
2
2
8
8
10

SS=

b. Calculate the standard deviation for the following data ($\bar{X}$ = 6). Notice the mean is the same, but three scores have been changed to 6.

X     $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
2
6
6
6
10

SS=

How does the change in data affect the SD? Why?

Characteristics of standard deviation
• a small standard deviation indicates that scores are close together
• a large standard deviation indicates that scores are spread out
• adding a constant to each score will not change the standard deviation
• multiplying each score by a constant causes the standard deviation to be multiplied by that same constant
• research articles usually use (SD) to refer to the standard deviation
• Standard deviation and the normal distribution
  • three standard deviations on each side of the mean: -3σ  -2σ  -1σ  mean  +1σ  +2σ  +3σ
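The two characteristics about constants can be checked numerically. The sketch below uses a hypothetical set of five scores (not the practice data) and Python's built-in `statistics.stdev`, which divides by n - 1 like the sample formula in this packet.

```python
import math
from statistics import stdev

scores = [1, 3, 5, 7, 9]                 # hypothetical scores
shifted = [x + 10 for x in scores]       # add a constant to each score
scaled = [x * 3 for x in scores]         # multiply each score by a constant

sd = stdev(scores)
print(round(sd, 3), round(stdev(shifted), 3), round(stdev(scaled), 3))

# Adding 10 leaves the SD alone; multiplying by 3 triples it.
print(math.isclose(stdev(shifted), sd))
print(math.isclose(stdev(scaled), 3 * sd))
```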

Video #3: In-Class Practice Problems

For the following sample of scores: 1, 2, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 9

This data is slightly different from what is presented in the video so that a “cleaner” mean would be calculated.

a. Sketch a frequency distribution histogram.

b. Calculate the following:
   mean = ____________________
   median = ____________________
   mode = ____________________
   range = ____________________
   degrees of freedom = ____________________
   standard deviation = ____________________

c. From your calculations, identify the distribution type.

Video #4: Probability

Probability is used to:
• determine the types of sample we are likely to obtain from a population
• make conclusions about the population from the sample

• Probability—fraction, proportion or percent of selecting a specific outcome out of the total number of possible selections
  • probability of A = $\frac{\text{number of A's}}{\text{total number of possible outcomes}}$
  • probability of selecting a heart out of a deck of cards
    • $p(\text{heart}) = \frac{13}{52} = \frac{1}{4} = .25$ (25%)

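Probability and the Normal Distribution
• A normal distribution holds 100% of the individuals in it
• the mean, median and mode are all equal and divide the distribution in half
• 50% of distribution is above and below the mean
• When the percent is divided by the standard deviations, it looks like this

[Figure: normal curve divided at each standard deviation; 68% of scores fall within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ]

Region             Percent of scores
below -3σ          .13%
-3σ to -2σ         2.14%
-2σ to -1σ         13.59%
-1σ to mean        34.13%
mean to +1σ        34.13%
+1σ to +2σ         13.59%
+2σ to +3σ         2.14%
above +3σ          .13%

Position              -3σ      -2σ      -1σ       mean    +1σ       +2σ       +3σ
Cumulative percent    0.13%    2.28%    15.87%    50%     84.13%    97.72%    99.87%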

z Scores

z score—measure of relative position; identifies position of a raw score in terms of the number of standard deviations it falls above or below the mean

• Use z scores to convert raw score into percentile rank
  • $z = \frac{X - \mu}{\sigma}$

• Example: Jill gets a raw score of 55 on a standardized math test (μ=50, σ=10). What is Jill’s z score?
  • $z = \frac{X - \mu}{\sigma} = \frac{55 - 50}{10} = \frac{5}{10} = .5$, so Jill is .5 standard deviation above the mean.

[Figure: the normal curve with the axis labeled in z scores, from -3z to +3z]

Position              -3z      -2z      -1z       mean    1z        2z        3z
Cumulative percent    0.13%    2.28%    15.87%    50%     84.13%    97.72%    99.87%

• View area under the normal curve in terms of probability and percent:
  • What is the probability of selecting a score that falls beyond 1z? p=.1587
  • What is the probability of selecting a score that falls below -2z? p=.0228
  • What is the percentile rank of someone who has a z score of 2? 98th %tile
  • What is the percentile rank of someone who has a z score of 1? 84th %tile
  • What if we have a z-score of 1.2? How can we find the probability or percentile rank?
    • we use the table of z scores provided in your course packet (see statistical tables on page 84)

Putting it all together

• Suppose Jack receives a raw score of 540 on the SAT-math (μ=500, σ=100). What is Jack’s z score and percentile rank?
  • Jack: $z = \frac{540 - 500}{100} = .4$
  • Proportion (p) = .6554
  • Rank = 65.54 %tile

Raw score    200    300    400    500    600    700    800
z score      -3z    -2z    -1z    0z     +1z    +2z    +3z
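The z score and percentile rank from the example above can be checked in Python. This sketch uses `statistics.NormalDist` from the standard library in place of the printed z table; the function names `z_score` and `percentile_rank` are illustrative.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations a raw score falls above or below the mean."""
    return (x - mu) / sigma


def percentile_rank(z):
    """Percent of the normal distribution falling below z."""
    return 100 * NormalDist().cdf(z)


jack_z = z_score(540, mu=500, sigma=100)
print(jack_z, round(percentile_rank(jack_z), 2))

jill_z = z_score(55, mu=50, sigma=10)
print(jill_z, round(percentile_rank(jill_z), 2))
```

The computed proportion is not rounded to table precision, so a result may differ from a table lookup in the last decimal place.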

• Use a z score to determine an unknown raw score
  • Suppose an individual scored at the 70th percentile on a standardized test (μ=100, σ=10), but for some reason we don’t know his raw score and need to calculate it.

1. Use the equation:

raw score = μ + zσ

2. Use the percentile rank and convert it to a probability (example: 70% ---> .7000).

3. Use the z-table to identify the z-score associated with the probability
   • .7000 corresponds to a z-score of z=.52 (Notice that we could not find a probability of exactly .7000 but had to find a probability that was closest to .7000, which was .6985).
   • Now just plug in z, μ, and σ to our equation
     • raw score = μ + zσ
     • raw score = 100 + .52(10)
     • raw score = 105.2
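Going from a percentile rank back to a raw score can be sketched the same way. Here `NormalDist().inv_cdf` stands in for reading the z table backwards, and the z score is rounded to two decimals to imitate the table's precision; that rounding is an assumption of this sketch, and the function name is illustrative.

```python
from statistics import NormalDist


def raw_score_from_percentile(percentile, mu, sigma):
    """raw score = mu + z * sigma, with z taken from the percentile rank."""
    probability = percentile / 100                    # e.g. 70% becomes .7000
    z = round(NormalDist().inv_cdf(probability), 2)   # two decimals, like the z table
    return mu + z * sigma


print(round(raw_score_from_percentile(70, mu=100, sigma=10), 1))
```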


For the problems 1-4, apply the parameters (μ = 50, σ = 5).

1. Draw the distribution. Include z-scores and mean and standard deviation.

2. Bebe scored 48. Place Bebe’s score on the distribution. What is her z-score and percentile rank?

3. Kenny scored 63. Place Kenny’s score on the distribution. What is his z-score and percentile rank?

4. Sally is at the 71st percentile. Place Sally on the distribution. What is her z-score and raw score?

For the problems 5-7, use the following parameters from the GRE (μ = 500, σ = 100).

5. Mary scored 570. What is her z-score and percentile rank?

6. Dick scored 340. What is his z-score and percentile rank?

7. Jill is at the 38th percentile. What is her raw score?

For the problems 8-10, use the parameters from an IQ test (μ = 100, σ = 15).

8. Wendy scored at the 90th percentile. What is her raw score?

9. What percent falls between the scores of 100 and 115?

10. Jack scored 80. What is his z score and percentile rank?

Answers for Class #4 In-Class Problems:
5) z=.7, percentile rank=75.8
6) z=-1.6, percentile rank=5.48
7) z=-.31, raw score = 469
8) z=1.28, raw score = 119.2
9) 34.13% fall between the mean and 1z
10) z=-1.33, percentile rank = 9.2

Video #5: Distribution of Sample Means

With statistics, we are usually trying to make conclusions/inferences about the population from the studied sample.

• Consequently, we want to compare the sample to the population of similar samples. But in doing so, two issues arise:
  • How do we know if a sample is representative of the population when every sample is different?
  • How can we transform a population distribution of individuals to a population distribution of sample means?

• Every sample is different from the population. This is known as sampling error, or the discrepancy/error between the sample and the population.
  • Random sampling is used to minimize sampling error, which can occur randomly

If we were to take a population distribution of individuals. . .
• randomly group individuals into similar sized samples
• then calculate the means of these samples and place them into a frequency distribution
• a normal curve would form—this distribution is known as the distribution of sample means.
• any distribution that is of sample statistics and NOT individual scores is referred to as a sampling distribution.

Characteristics of the distribution of sample means
• will approach a normal distribution as sample size increases (with a sample size greater than 30, the distribution is considered normal)
• the mean of the distribution of sample means is equal to the population mean of individuals and is also known as the expected value of $\bar{X}$.
• standard deviation of this new distribution is called the standard error of $\bar{X}$.
• standard error ($\sigma_{\bar{X}}$)—measures the standard distance between the sample mean ($\bar{X}$) and the population mean (μ); indicates how good an estimate $\bar{X}$ will be for μ.
  • standard error ($\sigma_{\bar{X}}$) = $\frac{\sigma}{\sqrt{n}}$
  • as sample size increases, the standard error will decrease, which means that the samples are more representative of the population

Probability and the Distribution of Sample Means
• We can now use the distribution of sample means to find the probability of obtaining a specific sample mean from the population of samples

• Example: What is the probability of getting a sample mean of 515 or higher on the SAT-math (μ=500, σ=100) with a random sample of n = 400?
  • Calculate the standard error for samples of n=400.
    • $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{400}} = \frac{100}{20} = 5$
  • Draw distribution of sample means

                          -3z    -2z    -1z    0z     +1z    +2z    +3z
pop of individuals        200    300    400    500    600    700    800
pop of samples (n=400)    485    490    495    500    505    510    515

  • A sample mean of 515 corresponds to +3z
  • Using the z table, +3z corresponds to a probability of .0013 (.13%)

• What if the sample mean does not correspond to a whole z score?
  • Use $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

• Example: What is the probability of getting a sample mean of 104 or higher on an IQ test (μ=100, σ=15) with a random sample of n = 36?
  • Calculate the standard error for samples of n=36.
    • $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{36}} = \frac{15}{6} = 2.5$
  • Draw distribution of sample means

                          -3z     -2z    -1z     0z     +1z      +2z    +3z
pop of individuals        55      70     85      100    115      130    145
pop of samples (n=36)     92.5    95     97.5    100    102.5    105    107.5

  • A sample mean of 104 corresponds to +1.6z
    • $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{104 - 100}{2.5} = \frac{4}{2.5} = 1.6$
  • Using the z table, +1.6z corresponds to a probability of .0548 (5.48%)
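Both examples above follow the same three moves: standard error, z score for the sample mean, then the area in the upper tail. A minimal Python sketch, with `statistics.NormalDist` standing in for the z table and illustrative function names:

```python
import math
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


def prob_sample_mean_at_least(sample_mean, mu, sigma, n):
    """Return (z, p): the z score of a sample mean and the probability of that mean or higher."""
    z = (sample_mean - mu) / standard_error(sigma, n)
    p = 1 - NormalDist().cdf(z)
    return z, p


# IQ example: sample mean of 104 or higher, n = 36
z, p = prob_sample_mean_at_least(104, mu=100, sigma=15, n=36)
print(round(z, 2), round(p, 4))

# SAT-math example: sample mean of 515 or higher, n = 400
z, p = prob_sample_mean_at_least(515, mu=500, sigma=100, n=400)
print(round(z, 2), round(p, 4))
```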


1. A normal population has μ = 70 and σ = 12.
   a. Sketch the population distribution. What proportion of the scores have values greater than a score of X = 73?
   b. Sketch the distribution of sample means for samples of size n = 16. What proportion of the means have values greater than a mean of $\bar{X}$ = 73?

                          -3z    -2z    -1z    0z    +1z    +2z    +3z
pop of individuals
pop of samples (n=16)

2. For a normal population with μ = 70 and σ = 20, what is the probability of obtaining a sample mean greater than $\bar{X}$ = 75
   a. For a random sample of n = 4?
   b. For a random sample of n = 16?
   c. For a random sample of n = 100?

                          -3z    -2z    -1z    0z    +1z    +2z    +3z
pop of samples (n=4)
pop of samples (n=16)
pop of samples (n=100)

Video #6: Hypothesis Testing

Hypothesis Testing—using sample data to evaluate a hypothesis (prediction) about the population so conclusions/inferences can be made about the population from the sample

• We are testing a hypothesis to determine if the treatment has caused a significant change in the population

• the majority of sample means are in the middle of the distribution; so for a sample to be significantly different, it should be with the extreme means in the tails of the distribution, where the probability is very low

Steps in Hypothesis Testing

1. Stating the Hypotheses
2. Establish significance criteria
3. Collect and analyze data
4. Evaluate null hypothesis
5. Draw conclusion

Step 1—Stating the Hypotheses

• hypotheses should be stated in terms of the population
• like a research question, your hypothesis should include three parts: variables, relationship, and sample
• two hypotheses must be developed—an alternative and a null
  • Write alternative hypothesis in statement form
  • Write notation for both alternative and null
• alternative hypothesis—the actual prediction about the change or relationship that may occur in the population
• null hypothesis—statement that the treatment has no effect on the population
• hypotheses can also be directional or non-directional
  • non-directional—just a prediction of a change/effect
    • Key words: effect, impact, difference, cause
  • directional—a prediction of increase or decrease
    • Key words: increase, decrease, higher, lower, positive, negative

• Summary of Hypotheses Notation (applying example values of μ=60)

                                Alternative         Null
One-tailed (Directional)        H1: μsprog > 60     H0: μsprog ≤ 60
                                H1: μsprog < 60     H0: μsprog ≥ 60
Two-tailed (Non-directional)    H1: μsprog ≠ 60     H0: μsprog = 60

• Example: Suppose that a local school district implemented an experimental program for science education. After one year, 100 children in the special program obtained a mean score of $\bar{X}$=63 on a national science achievement test (μ=60, σ=12). Did the program have an impact on the participants’ science achievement?
  • alternative—The science program will significantly affect science achievement among program participants. This is an example of a non-directional hypothesis.
    • H1: μsprog ≠ 60
  • null—The science program will NOT significantly affect science achievement among program participants.
    • H0: μsprog = 60

Step 2—Establish significance criteria
• How much does the population need to change to show a significant effect from the treatment?
• Is the change due to the treatment or sampling error?
• Typically to be significantly different, we require the sample to be different from 95% or 99% of the population
• By setting a benchmark or criterion that requires the change in the population mean to be quite large and the probability of this change being due to chance to be very low, we decrease our chance of a Type I error
  • this criterion is known as the level of significance or alpha level (α)
  • most commonly used alpha levels are .05 (5%) and .01 (1%)
  • these levels of significance correspond with specific z scores, which depend upon whether the hypothesis is directional or non-directional
• non-directional hypothesis ---> 2-tailed test
  • .05 level ---> zcritical = ± 1.96
  • .01 level ---> zcritical = ± 2.58

[Figure: normal curve from -3z to +3z; the middle 95% lies between -1.96z and +1.96z, and the middle 99% lies between -2.58z and +2.58z]

• directional hypothesis ---> 1-tailed test
  • .05 level ---> zcritical = + or - 1.65
  • .01 level ---> zcritical = + or - 2.33

[Figure: normal curve from -3z to +3z; 95% of the distribution lies below +1.65z and 99% lies below +2.33z]

• when the sample mean exceeds the limit, then it differs significantly so we would reject the null
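The critical z values listed above can be recomputed from the alpha level. This is a sketch using `statistics.NormalDist().inv_cdf` rather than the printed z table; the function name `z_critical` and its `tails` argument are illustrative. The unrounded one-tailed .05 value is about 1.645, which tables commonly show as 1.65.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Critical z on the standard normal curve.

    tails=2 splits alpha between both tails (non-directional);
    tails=1 puts all of alpha in one tail (directional).
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01):
    print(alpha, round(z_critical(alpha, 2), 3), round(z_critical(alpha, 1), 3))
```

For a directional hypothesis that predicts a decrease, the same value is used with a negative sign.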

Step 3—Collect & analyze sample data
• random selection highly recommended so that sample is representative of population
• Recall that when a test statistic is calculated by hand, you need to identify the critical value (zcritical), which is then compared to the test statistic (zcalculated) to determine significance.
• Computer automatically determines the probability of obtaining a test statistic due to chance. Consequently, when determining significance you do NOT compare zcalculated to zcritical; rather, you examine the p-value or level of significance.
  • If p (or sig) is less than alpha level (.05 or .01) ---> test statistic is significant ---> reject the null.
  • If p (or sig) is greater than alpha level (.05 or .01) ---> test statistic is NOT significant ---> fail to reject the null.

Decision-making Table

                     Comparison                 Significance?    Decision?              Conclusion
Hand Calculations    zcalculated ≥ zcritical    Significance!    Reject Null            Restate Alternative
                     zcalculated < zcritical    Not!             Fail to Reject Null    Restate Null
Computer             p ≤ alpha                  Significance!    Reject Null            Restate Alternative
                     p > alpha                  Not!             Fail to Reject Null    Restate Null
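The decision-making table can be written as two small functions, one per row group. This is a sketch with illustrative names; comparing the absolute value of the calculated z to the critical value is an assumption made here so that the same rule covers negative z scores in a two-tailed test. The numbers in the calls are hypothetical.

```python
def decision(significant):
    """Map a significance result to (significance, decision, conclusion)."""
    if significant:
        return ("significant", "reject null", "restate alternative")
    return ("not significant", "fail to reject null", "restate null")


def decide_by_z(z_calculated, z_critical):
    """Hand calculations: compare the calculated z to the critical z."""
    # abs() is an assumption of this sketch, for two-tailed tests
    return decision(abs(z_calculated) >= z_critical)


def decide_by_p(p_value, alpha=0.05):
    """Computer output: compare the p-value to the alpha level."""
    return decision(p_value <= alpha)


print(decide_by_z(2.10, 1.96))        # hypothetical z, two-tailed at .05
print(decide_by_z(1.20, 1.96))
print(decide_by_p(0.03, alpha=0.05))  # hypothetical p-values
print(decide_by_p(0.20, alpha=0.05))
```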

Step 4—Evaluate the null hypothesis

• Compare the data with the null
  • if the sample data is significantly different, then reject the null
  • if the sample data is NOT significantly different, then fail to reject the null

Step 5—Draw conclusion
• If null is rejected ---> restate alternative hypothesis for conclusion.
• If you fail to reject the null ---> state the null hypothesis as conclusion

Errors in Hypothesis Testing

Two types of errors are possible when testing a hypothesis:
• Type I Error—we could make the mistake of rejecting the null when H0 is really true, when there really isn’t a significant change due to the treatment
  • this kind of error may be due to sampling error (the sample was above the population mean even before the treatment)
  • minimize a Type I error by setting a low alpha (α) level (low probability for making an error)
  • Type I error is more serious!
• Type II Error—we could make the mistake of not rejecting the null when we should have, when there really is a significant change due to the treatment
  • the treatment effect was not big enough, most likely due to sampling error (the sample was below the population mean even before the treatment)

Putting it all together
Example of a two-tailed test
• Let’s go back to our previous example of the science program: After one year, 100 children in the special program obtained a mean score of 63 on a national science achievement test (μ=60, σ=12). Did the program have an impact on the participants’ science achievement? Test at the .05 level.

• Step 1: Develop hypotheses
  • State Alternative: Special science program will significantly affect science achievement among program participants.
  • Determine if it is a one-tailed or two-tailed test.
    • It is a non-directional hypothesis ------> two-tailed
  • Notation: H1: μsprog ≠ 60   H0: μsprog = 60

• Step 2: Establish significance criteria
  • Computer: α = .05
  • Hand calculations: identify the z scores used for the alpha level and the appropriate test.
    • two-tailed test at .05 corresponds to zcritical = ± 1.96

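• Step 3: Collect and analyze sample data
  • Computer: enter and analyze data
  • Hand calculations
    • Calculate standard error

      $$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{100}} = \frac{12}{10} = 1.2$$

    • Draw distribution of sample means and shade in critical region

      [Figure: distribution of sample means with the middle 95% between -1.96z and +1.96z and the critical regions beyond those cutoffs]

      z score                       -3z    -2z    -1z    0z    +1z    +2z    +3z
      pop of individuals            24     36     48     60    72     84     96
      pop of sample means (n=100)   56.4   57.6   58.8   60    61.2   62.4   63.6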

• Step 4: Compare sample data to null
  • Computer
    • Identify test statistic and level of significance (p-value) in output
      • z = 2.49, p = .0128
    • Compare level of significance with alpha level
      • p-value of .0128 is less than .05 ---> it is significant, reject null
  • Hand calculations
    • Calculate test statistic
    • Convert sample mean into z score to determine if it falls in critical region.

      $$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{63 - 60}{1.2} = \frac{3}{1.2} = 2.5$$

      it exceeds +1.96z, so it is significant, reject the null

• Step 5: Draw conclusion
  • Null is rejected so alternative hypothesis is restated as conclusion
  • Participation in the science program did significantly affect science achievement scores among program participants.
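The hand calculations in this two-tailed example can be checked with a short script. This is only a sketch using Python's standard library: the function name `z_test` is made up for illustration, and `statistics.NormalDist` stands in for the unit normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu, sigma, n):
    """Return (standard error, z) for a single-sample z test."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error
    return standard_error, z


# Science program example: n = 100, sample mean = 63, mu = 60, sigma = 12
standard_error, z = z_test(63, 60, 12, 100)

# Two-tailed critical value at alpha = .05 (the packet rounds this to 1.96)
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Two-tailed p-value: area beyond |z| in both tails
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

decision = "reject null" if abs(z) >= z_critical else "fail to reject null"
print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}, critical = +/-{z_critical:.2f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
print(decision)
```

Because the script starts from the summary values (mean 63, n 100), it gives z = 2.50 exactly, so its p-value will differ slightly from computer output based on z = 2.49.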

Example of a one-tailed test: Suppose we took the same example, but hypothesized that the program would cause a significant increase in achievement scores--this would be a directional hypothesis. In addition, let’s change the level of significance to .01.

Recall: n = 100, X = 63, μ = 60, σ = 12

• Step 1: Develop hypotheses
  • State alternative: Special science program will significantly increase science achievement scores among program participants.
  • Determine if it is a one-tailed or two-tailed test.
    • It is a directional hypothesis ------> one-tailed
  • Notation: H1: μsprog > 60   H0: μsprog ≤ 60

• Step 2: Establish significance criteria
  • Computer: α = .01
  • Hand calculations: identify the z scores used for the alpha level and the appropriate test.
    • one-tailed test at .01 corresponds to zcritical = +2.33; since we are looking for an increase, we are focusing on the positive end of the distribution

• Step 3: Collect and analyze sample data
  • Computer: enter and analyze data
  • Hand calculations
    • Calculate standard error

      $$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{100}} = \frac{12}{10} = 1.2$$

    • Draw distribution of sample means and shade in critical region

      [Figure: distribution of sample means with 99% of the area below +2.33z and the critical region above that cutoff]

      z score                -3z    -2z    -1z    0z    +1z    +2z    +3z
      pop of individuals     24     36     48     60    72     84     96
      sample means (n=100)   56.4   57.6   58.8   60    61.2   62.4   63.6

• Step 4: Compare sample data to null

  • Computer
    • Identify test statistic and level of significance (p-value) in output
      • z = 2.49, p = .0064
    • Compare level of significance with alpha level
      • p-value of .0064 is less than .01 ---> it is significant, reject null
  • Hand calculations
    • Calculate test statistic
    • Convert sample mean to z score to determine if it falls into the critical region.

      $$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{63 - 60}{1.2} = \frac{3}{1.2} = 2.5$$

      it exceeds +2.33z, so it is significant, reject the null

• Step 5: Draw conclusion
  • Null is rejected so alternative hypothesis is restated as conclusion
  • Participation in the science program did significantly increase achievement scores among program participants.
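The one-tailed version differs from the two-tailed one in only two places: the whole alpha sits in the positive tail, and the p-value counts only the area above z. A minimal sketch, again using `statistics.NormalDist` in place of the unit normal table (variable names are illustrative only):

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, mu, sigma = 100, 63, 60, 12
alpha = 0.01

standard_error = sigma / sqrt(n)
z = (sample_mean - mu) / standard_error

# One-tailed (upper) critical value: all of alpha is in the positive tail
z_critical = NormalDist().inv_cdf(1 - alpha)

# One-tailed p-value: area above z only
p_one_tailed = 1 - NormalDist().cdf(z)

decision = "reject null" if z >= z_critical else "fail to reject null"
print(f"z = {z:.2f}, critical = +{z_critical:.2f}, p = {p_one_tailed:.4f}: {decision}")
```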

Assumptions for Hypothesis Testing with z Scores
• random sampling and independent observations
• population standard deviation will remain the same after the treatment; it is like adding a constant: the mean changes but the σ will not
• normal sampling distribution

Reporting of Results of the Statistical Test
• p-value is reported as:
  • reject the null: p<.05
  • fail to reject the null: p>.05
• z test results statement includes the following parts:
  • sample mean; (M=63)
  • z calculated with the degrees of freedom in parentheses; (z(99) = 2.5)
    • to calculate degrees of freedom (df); df = n - 1
    • in our example, n=100, so df = n - 1 = 100 - 1 = 99
  • alpha level; (p< .05)
  • two-tailed or one-tailed
  • population mean and SD (μ=60, σ=12)
• Example from one-tailed test: Participation (M=63) in the science program did significantly increase achievement scores; z(99)=2.5, p<.05, one-tailed; when compared to the population (μ=60, σ=12).

Video #6: In-Class Practice Problems
Complete the process of hypothesis testing for each of the scenarios.

1. A high school counselor created a preparation course for the SAT-verbal (μ=500, σ=100). A random sample of n = 16 students complete the course and then take the SAT. The sample had a mean score of X = 554. Does the course have a significant effect on SAT scores? Test at the .01 level.

Z-test results:
μ - mean of Variable (Std. Dev. = 100)
H0 : μ=500
HA : μ not equal 500

Variable   n    Sample Mean   Std. Err.   Z-Stat   P-value
var1       16   554           25          2.16     0.0308

a. Alternative hypothesis in sentence form.
b. Circle: one-tailed or two-tailed
c. Write the alternative and null hypotheses using correct notation.   H1:     H0:
d. zcalculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form.

2. A researcher believes that children who grow up as an only child develop vocabulary skills at a faster rate than children in large families. To test this, a sample of n = 25 four-year-old only children is tested on a standardized vocabulary test (μ=60, σ=10). The sample obtains a mean of X = 63.8. Test at the .05 level.

Z-test results:
μ - mean of Variable (Std. Dev. = 10)
H0 : μ=10
HA : μ > 10

Variable   n    Sample Mean   Std. Err.   Z-Stat   P-value
var1       25   63.8          2           26.9     <0.0001

There was an error when conducting this test. The population mean is NOT 10 but rather 60. The result is still significant, but the z statistic would have been 1.90 with p=.03.
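To see how much the wrong population mean changes the result, the z statistic can be recomputed from the summary values in the problem. This is a sketch only; the helper `upper_tail_z` is a made-up name, and `statistics.NormalDist` plays the role of the unit normal table.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sigma = 25, 63.8, 10
standard_error = sigma / sqrt(n)          # 10 / 5 = 2


def upper_tail_z(mu):
    """z and one-tailed (upper) p-value for a hypothesized population mean."""
    z = (sample_mean - mu) / standard_error
    return z, 1 - NormalDist().cdf(z)


z_wrong, p_wrong = upper_tail_z(10)   # value mistakenly entered in the output
z_right, p_right = upper_tail_z(60)   # population mean stated in the problem

print(f"mu = 10: z = {z_wrong:.2f}")
print(f"mu = 60: z = {z_right:.2f}, one-tailed p = {p_right:.3f}")
```

With μ = 60 the z statistic works out to about 1.9 and the one-tailed p to about .03, so the result remains significant at the .05 level.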

3. A psychologist investigates IQ among autistic children to determine if their IQ is significantly different from the norm. Using a standardized IQ test (μ=100, σ=10), he tests 10 autistic children, all age 12. The following output was generated using StatCrunch. Test at α = .05.
Sample data are: 105, 110, 130, 150, 185, 100, 125, 95, 85, 120

Z-test results:
μ - mean of Variable (Std. Dev. = 10)
H0 : μ=100
HA : μ not equal 100

Variable   n    Sample Mean   Std. Err.   Z-Stat      P-value
var1       10   120.5         3.1622777   6.4826694   <0.0001

Video #7: The t Statistic
To use the z score as a test statistic, we must know the population standard deviation in order to calculate the standard error of sample means. Unfortunately, most of the time we do not know σ, so what do we do? The t statistic, commonly known as a t test, allows us to compare the sample to the null by using the sample standard deviation to estimate the standard error of sample means.

$$\text{estimated standard error: } s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

The t statistic uses a formula very similar to z but instead utilizes the estimated standard error.

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \qquad t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

Tip on when to use which:
• if you know σ, then use z
• if you don’t know σ, use t

Since we are comparing a single sample mean to a population mean, this t test is called Single Sample t Test or One Sample t Test.

The t Distribution
Since the t statistic utilizes the estimated standard error (sX), the t distribution only approximates the normal distribution and is based on degrees of freedom (df = n - 1), not the total sample size.
• as df and sample size increase, the more closely s represents σ, and the better the t distribution approximates the normal (z) distribution
• since the t distribution has more variability, it is more spread out and flatter
• we use the t statistic in a very similar way as we used z, in that we use a t distribution table to find the probability of a t statistic
• note: since the t statistic is dependent on degrees of freedom, the critical t statistics corresponding to levels of significance (α) vary with the degrees of freedom, unlike the critical z scores (where a two-tailed test at .05 will always correspond to zcritical = ± 1.96)

Summary Table of Hypotheses Notation (applies values from following example)

Alternative Null

One-tailed H1: μ > 27 H0: μ ≤ 27

Two-tailed H1: μ ≠ 27 H0: μ = 27

Reporting of Results of the t Test
t Test results statement includes the following parts:
• results with sample mean and standard deviation; (M = 24.58, SD = 3.48)
• t calculated with the degrees of freedom in parentheses; (t(11) = -2.40)
• alpha level or p-value; (p< .05)
• two-tailed or one-tailed

Example: Subjects (M = 24.58, SD = 3.48) spent significantly less time talking to parents than the therapist’s claim; t(11) = -2.40, p< .05, two-tailed.

Assumptions of the t test: independent observations, normal population

Putting it all together
Example of a two-tailed t test
A family therapist states that parents talk to their teens an average of 27 minutes per week. Surprised by this claim, a counselor collects data on 12 teens and finds the following (X = 24.58, s = 3.48). Does the amount of parent talk for the sample significantly differ from the therapist’s claim? Test at the .05 level.

• Step 1: Develop hypotheses
  • State Alternative: Amount of parent talk for sample will significantly differ from the norm.
  • Determine if it is a one-tailed or two-tailed test.
    • It is a non-directional hypothesis ------> two-tailed
  • H1: μ ≠ 27 (samples will be different)
  • H0: μ = 27 (samples will NOT be different)

• Step 2: Establish significance criteria
  • Computer: α=.05
  • Hand calculations: identify tcritical used for the alpha level, the appropriate test, & df
    • two-tailed test at .05 (df = 11) corresponds to tcritical = ± 2.201

• Step 3: Collect and analyze sample data
  • Computer: enter and analyze data
  • Hand calculations
    • Calculate estimated standard error

      $$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{3.48}{\sqrt{12}} = \frac{3.48}{3.46} = 1.01$$

• Step 4: Compare sample data to null ------> calculate test statistic
  • Computer: identify test statistic and p-value in output
    o t(11)=-2.396, p=.019
    o p-value (.019) is less than alpha (.05), so it is significant ---> reject null
  • Hand calculations
    • Convert the sample mean into a t statistic to determine if it falls into the critical region.

      $$t_{calculated} = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{24.58 - 27}{1.01} = \frac{-2.42}{1.01} = -2.396$$

      it exceeds -2.201, so it is significant, reject null

• Step 5: Draw conclusion
  • Amount of parent talk for sample (M = 24.58, SD = 3.48) significantly differs from the norm; t(11)=-2.396, p<.05, two-tailed.
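The single-sample t calculation follows the same pattern as the z test, with s in place of σ and a critical value that depends on df. A minimal sketch using the summary values from this example; the critical value 2.201 is copied from the example rather than computed, and the variable names are illustrative only.

```python
from math import sqrt

n, sample_mean, s, mu = 12, 24.58, 3.48, 27
t_critical = 2.201            # two-tailed, alpha = .05, df = 11 (from the t table)

df = n - 1
estimated_standard_error = s / sqrt(n)
t = (sample_mean - mu) / estimated_standard_error

# Two-tailed test: compare the size of t with the critical value
decision = "reject null" if abs(t) >= t_critical else "fail to reject null"
print(f"df = {df}, s_X = {estimated_standard_error:.3f}, t = {t:.3f}: {decision}")
```

Because the script does not round the standard error before dividing, its t value comes out near -2.41 rather than the rounded hand value; the decision is the same.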

Video #7: In-Class Practice Problems
1. On a standardized spatial skills task, normative data reveals that people typically get μ = 15 correct solutions. A psychologist tests n = 7 individuals who have brain injuries in the right cerebral hemisphere. For the following data, determine whether or not right-hemisphere damage results in reduced performance on the spatial skills task. Test at the .05 level.
Data: 12, 16, 9, 8, 10, 17, 10

T-test results:
μ - mean of Variable
H0 : μ = 15
HA : μ < 15

Variable   Sample Mean   Std. Err.   DF   T-Stat       P-value
var1       11.714286     1.3222327   6    -2.4849744   0.0237

a. Independent Variable =            Scale (circle): Categorical   Quantitative
b. Dependent Variable =              Scale (circle): Categorical   Quantitative
c. Circle: One-tailed   Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:     H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
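The sample mean, standard error, and t statistic in the output for problem 1 can be reproduced from the raw data. A minimal sketch using Python's `statistics` module; it recomputes the printed values only and does not produce the p-value.

```python
from math import sqrt
from statistics import mean, stdev

data = [12, 16, 9, 8, 10, 17, 10]
mu = 15

n = len(data)
df = n - 1
sample_mean = mean(data)
s = stdev(data)                          # sample SD (n - 1 in the denominator)
estimated_standard_error = s / sqrt(n)
t = (sample_mean - mu) / estimated_standard_error

print(f"mean = {sample_mean:.6f}, std. err. = {estimated_standard_error:.7f}")
print(f"df = {df}, t = {t:.7f}")
```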

2. A researcher would like to examine the effects of humidity on eating behavior. It is known that laboratory rats normally eat an average of μ = 21 grams of food each day. The researcher selects a random sample of n = 25 rats and places them in a controlled-atmosphere room where the relative humidity is maintained at 90%. On the basis of this sample, can the researcher conclude that humidity affects eating behavior? Test at the .05 level.

T-test results:
μ - mean of Variable
H0 : μ = 21
HA : μ not equal 21

Variable   Sample Mean   Std. Err.    DF   T-Stat       P-value
var1       16.12         0.79229623   24   -6.1593122   <0.0001

a. Independent Variable =            Scale (circle): Categorical   Quantitative
b. Dependent Variable =              Scale (circle): Categorical   Quantitative
c. Circle: One-tailed   Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:     H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

3. Does the average age of students enrolled in EDFI 641 differ significantly from the average age of BGSU grad students (24 years)? Test at the .01 level.

T-test results:
μ - mean of Variable
H0 : μ = 24
HA : μ not equal 24

Variable   Sample Mean   Std. Err.   DF   T-Stat      P-value
var1       27.125        1.4314183   15   2.1831493   0.0453

a. Independent Variable =            Scale (circle): Categorical   Quantitative
b. Dependent Variable =              Scale (circle): Categorical   Quantitative
c. Circle: One-tailed   Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:     H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

Video #8: t Test of Independent Samples

So far, we have only used one sample to draw inferences about one population. What if we want to compare two different groups, such as male vs female or Treatment A students vs Treatment B students? The t Test of Independent Samples draws conclusions about two populations by comparing two samples; since we are looking at differences between the two samples and the two populations, the t statistic reflects these multiple comparisons.

$$t_{\text{single sample}} = \frac{\bar{X} - \mu}{s_{\bar{X}}} \qquad t_{\text{ind samples}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} \quad \text{where} \quad s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

Recall that for the single sample t test, we calculated the estimated standard error. Since we are now comparing two samples to two populations, we calculate the standard error of sample mean differences.
Standard error of sample mean differences--total amount of error involved in using two sample means to approximate two population means (averages the error of the two sources).

• However, the preceding formula for sX1 - X2 is only appropriate when the two samples are the same size. To correct for the bias in sample variances, we need to combine the two sample variances into a single value called pooled variance.

Pooled Variance--averages the two sample variances, which allows the bigger sample to carry more weight.

$$\text{pooled variance} = s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$$

• Using the pooled variance, we can now calculate an unbiased measure of the standard error of sample mean differences:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

Hypothesis Testing with t Test of Independent Samples
The t Test of Independent Samples is used to test a hypothesis about the mean difference between two populations
• null hypothesis reflects no difference
• alternative hypothesis reflects a difference

             Alternative                           Null
One-tailed   H1: μ1 > μ2  OR  H1: μ1 − μ2 > 0      H0: μ1 ≤ μ2  OR  H0: μ1 − μ2 ≤ 0
Two-tailed   H1: μ1 ≠ μ2  OR  H1: μ1 − μ2 ≠ 0      H0: μ1 = μ2  OR  H0: μ1 − μ2 = 0

• rejection of null ------> data indicate a significant difference between the two populations
• failure to reject null ------> data indicate NO significant difference between the two populations

Assumptions about t test of independent samples: independent observations, each population must be normal and have equal variances (homogeneity of variance).

Putting it all together
Example of a one-tailed t test
A psychologist would like to examine the effects of fatigue on mental alertness. An attention test is prepared that requires subjects to sit in front of a blank TV screen and press a response button each time a dot appears on the screen. A total of 110 dots are presented during a 90 minute period, and the psychologist records the number of errors for each subject. Two groups of subjects are selected. The first group (n = 5) is tested after they have been awake for 24 hours (X = 34, SS = 63). The second group (n = 10) is tested in the morning after a full night’s sleep (X = 24, SS = 100). Can the psychologist conclude that fatigue significantly increases errors on an attention task? Test at the .05 level.

• Step 1: Develop hypotheses
  • State alternative: Fatigue will significantly increase the number of errors on an attention task.
  • It is a directional hypothesis ------> one-tailed
  • H1: μfatigue > μrested   H0: μfatigue ≤ μrested

• Step 2: Establish significance criteria
  • Computer: α=.05
  • Hand calculations: identify tcritical used for the alpha level, the appropriate test, and df
    • one-tailed test at .05 (df = 13) corresponds to tcritical = +1.771

• Step 3: Collect and analyze sample data
  • Computer
  • Hand calculations
    • Calculate pooled variance

      $$s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2} = \frac{63 + 100}{4 + 9} = \frac{163}{13} = 12.54$$

    • Calculate standard error of sample mean differences

      $$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{\frac{12.54}{5} + \frac{12.54}{10}} = \sqrt{2.51 + 1.25} = 1.94$$

• Step 4: Compare sample data to null ------> calculate test statistic

  • Computer: review output

    Two Sample T-test results (with pooled variances):
    μ1 - mean of var2 where var1=1
    μ2 - mean of var2 where var1=2
    H0 : μ1 - μ2 = 0
    HA : μ1 - μ2 > 0

    Difference   Sample Mean   Std. Err.   DF   T-Stat      P-value
    μ1 - μ2      10            1.9360149   13   5.1652493   <0.0001

    • Identify test statistic and p-value in output: t(13)=5.17, p<.0001
    • Compare p-value to alpha level: p is less than .05 ---> reject null
  • Hand calculations: calculate t

    $$t_{\text{ind samples}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{(34 - 24) - 0}{1.94} = \frac{10}{1.94} = 5.15$$

    • tcalculated > tcritical, reject null

• Step 5: Draw conclusion
  • Null is rejected so alternative hypothesis is restated as conclusion
  • Fatigue significantly increased the number of errors in attention task; t(13)=5.17, p<.0001, one-tailed.
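The pooled variance, standard error, and t for the fatigue example can be worked through in a few lines. This is a sketch under the same assumptions as the hand calculation (pooled variances, null difference of 0); the function name `pooled_t` is made up for illustration and the critical value is the one given in Step 2.

```python
from math import sqrt


def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """Independent-samples t from group means, sums of squares (SS), and sizes."""
    df = (n1 - 1) + (n2 - 1)
    pooled_variance = (ss1 + ss2) / df
    standard_error = sqrt(pooled_variance / n1 + pooled_variance / n2)
    t = (mean1 - mean2) / standard_error      # null hypothesis: mu1 - mu2 = 0
    return df, pooled_variance, standard_error, t


# Fatigued group: n = 5, mean = 34, SS = 63; rested group: n = 10, mean = 24, SS = 100
df, pooled_variance, standard_error, t = pooled_t(34, 63, 5, 24, 100, 10)

t_critical = 1.771                            # one-tailed, alpha = .05, df = 13
decision = "reject null" if t >= t_critical else "fail to reject null"
print(f"df = {df}, pooled variance = {pooled_variance:.2f}")
print(f"standard error = {standard_error:.2f}, t = {t:.2f}: {decision}")
```

The unrounded t here is close to the hand value; the computer output differs slightly because it was presumably run on the raw scores rather than the rounded summary values.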

Some additional thoughts when comparing groups:
• Create frequency polygons for each group to decide which measure of central tendency is appropriate and if they follow a normal distribution
• If possible use information about known groups, such as norms from standardized tests, to compare sample data
• Calculate effect size as a measure of the magnitude of a difference between the two groups. This has become very important in recent years.
  • A t test will not calculate effect size. You must calculate it by hand.
    o A common index of effect size (r2): Percentage of Variance accounted for

      $$\text{effect size } (r^2) = \frac{t^2}{t^2 + df}$$

  • Typically an effect size of 0.50 (50%) or larger signifies an important difference
• Use inferential statistics very cautiously, especially when dealing with non-random samples; be very careful in generalizing your results to the population
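Since the effect size has to be calculated by hand, a short helper can save arithmetic. A minimal sketch applying the r2 formula above to the fatigue example (t = 5.17, df = 13); the function name `effect_size_r2` is made up for illustration.

```python
def effect_size_r2(t, df):
    """Percentage of variance accounted for: r^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)


# Fatigue example: t(13) = 5.17
r2 = effect_size_r2(5.17, 13)
print(f"r2 = {r2:.3f}")
```

For the fatigue example this gives an r2 of roughly 0.67, which is above the 0.50 guideline.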

In-Class Practice Problems
1. Extensive data indicate that first-born children develop different characteristics than later-born children. For example, first-borns tend to be more responsible, hard working, higher achieving, and more self-disciplined than their later-born siblings. The following data represent scores on a test measuring self-esteem and pride. Samples of n=10 first-born college freshmen and n=20 later-born freshmen were each given the self-esteem test. Do these data indicate a significant difference? Test at the .05 level.

Summary statistics for var2 grouped by var1

var1   n    Mean   Variance    Std. Dev.   Std. Err.   Median   Range   Min   Max   Q1   Q3
1      10   43.1   17.211111   4.1486278   1.3119112   43.5     14      36    50    40   46
2      20   36.8   25.010527   5.0010524   1.1182693   36.5     18      30    48    33   40

Two Sample T-test results (with pooled variances):
μ1 - mean of var2 where var1=1
μ2 - mean of var2 where var1=2
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 not equal 0

Difference   Sample Mean   Std. Err.   DF   T-Stat      P-value
μ1 − μ2      6.3           1.8372631   28   3.4290135   0.0019

a. Independent Variable =            Scale (circle): Categorical   Quantitative
b. Dependent Variable =              Scale (circle): Categorical   Quantitative
c. Circle: One-tailed   Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:     H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
j. effect size r2 =

2. Does level of anxiety (measured on a scale from 1 to 10) when enrolling in a statistics class differ by gender? Test at the .05 level.

Summary statistics for var2 grouped by var1

var1   n    Mean   Variance   Std. Dev.   Std. Err.   Median   Range   Min   Max   Q1   Q3
1      10   7.1    8.1        2.8460498   0.9         8        7       3     10    4    10
2      10   5.6    6.711111   2.5905812   0.8192137   5        7       3     10    4    7

Two Sample T-test results (with pooled variances):
μ1 - mean of var2 where var1=1
μ2 - mean of var2 where var1=2
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 not equal 0

Difference   Sample Mean   Std. Err.   DF   T-Stat      P-value
μ1 - μ2      1.5           1.2170091   18   1.2325299   0.2336

a. Independent Variable =            Scale (circle): Categorical   Quantitative
b. Dependent Variable =              Scale (circle): Categorical   Quantitative
c. Circle: One-tailed   Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:     H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
j. effect size r2 =
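When only the summary statistics are available, the pooled test can still be reproduced, because SS = variance × df for each group. A minimal sketch that recomputes the standard error and t shown in the output for problem 2; the function name `pooled_t_from_variances` is made up for illustration, and the p-value is not computed.

```python
from math import sqrt


def pooled_t_from_variances(mean1, var1, n1, mean2, var2, n2):
    """Pooled-variance t from group means, sample variances, and sizes."""
    df1, df2 = n1 - 1, n2 - 1
    # SS = variance * df, so this equals (SS1 + SS2) / (df1 + df2)
    pooled_variance = (var1 * df1 + var2 * df2) / (df1 + df2)
    standard_error = sqrt(pooled_variance / n1 + pooled_variance / n2)
    t = (mean1 - mean2) / standard_error
    return df1 + df2, standard_error, t


# Values taken from the summary statistics table for problem 2
df, standard_error, t = pooled_t_from_variances(7.1, 8.1, 10, 5.6, 6.711111, 10)
print(f"df = {df}, std. err. = {standard_error:.7f}, t = {t:.7f}")
```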

Additional Practice: Interpreting Research Articles
t-test of Independent Samples

Read the following excerpt to complete the questions on the next page:

Researchers studied women enlisted in the Navy and examined the impact of sexual harassment on their satisfaction with the military. Among the participants, 436 were sexually harassed and 582 were not. Participants completed a 7-item questionnaire that utilized a 5 point scale in which higher scores indicate more positive perceptions. Item 3 scores have been reversed to align with the positive nature of the other items.

Table 1. Mean responses and t-test results

Question                                                         Mean Harassed   Mean Not Harassed   t
1. I would recommend the Navy to others.                         3.31            3.60                3.76*
2. I am satisfied with my rating.                                3.24            3.56                4.02*
3. I plan to leave the Navy because I am dissatisfied.           3.17            3.67                5.89*
4. My experiences have encouraged me to stay in the Navy.        2.24            2.58                4.56*
5. This command provides the information people need to make
   decisions about staying in the Navy.                          2.71            3.00                3.80*
6. In general, I am satisfied with the Navy.                     3.29            3.68                5.41*
7. I intend to stay in the Navy for at least 20 years.           2.66            3.22                5.63*
* indicates p<.001

Source: Newell, C.E., Rosenfeld, P., & Culbertson, A. L. (1995). Sexual harassment experiences and equal opportunity perceptions of Navy women. Sex Roles, 32, 159-168.

1. Which group of Navy women is more likely to recommend the Navy to others? In other words, which group has the higher mean for item one?
2. Is the mean difference for item 1 statistically significant?
3. Should we reject the null hypothesis for item 1? Explain.
4. How many items generated statistically significant mean differences?
5. In general, what can we conclude about sexual harassment and Navy satisfaction?

Answers: 1) Those who have NOT been sexually harassed have the higher mean and are more likely to recommend the Navy to others; 2) Yes, it is significant at the p<.001 level; 3) Yes, the t result is significant at p<.001; 4) All items were significant; 5) Navy women who have NOT been sexually harassed are more satisfied with the Navy than those who have been sexually harassed.

Video #9: t Test of Related Samples
Many times research evaluates the effect of a treatment by using a pretreatment and posttreatment design with a single sample; this is called a repeated measures study.

• since the test uses the same sample, there is no risk that one group is different from another even before the treatment begins.
• researchers try to build upon this concept when studying two samples by matching subjects from the two groups--this helps to eliminate pretreatment differences
• t test of related samples compares the differences between the pre and post treatment scores of the sample to pre-post differences in the population.
  • difference score = D = X2 - X1
  • Mean of differences:

    $$\bar{D} = \frac{\sum D}{n}$$

Computing the t of related samples

• Recall:

  $$t_{\text{single sample}} = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

• For t of related samples, the sample data are the difference scores (D) and the population data we are interested in is NOT the population mean but the population mean difference (μD), therefore,

  $$t_{\text{related samples}} = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} \quad \text{where} \quad s_{\bar{D}} = \frac{s}{\sqrt{n}}$$

• We are not comparing means of the pre and post, rather the pre and post scores for each individual are compared!

Developing the hypotheses:

Alternative Null

One-tailed H1: μD > 0 H0: μD ≤ 0

Two-tailed H1: μD ≠ 0 H0: μD = 0

Assumptions of the related samples t test

• independent observations, normal distribution of pop of differences

Putting it all together
Example of a one-tailed t test
A researcher is interested in studying the effects of endorphins (the feeling-good chemical that is released in the brain at the end of aerobic exercise) on pain tolerance. A sample of 16 subjects is obtained; each person’s tolerance for pain is tested before and after a 50 minute session of aerobic exercise. On the average, the pain tolerance for the sample was D = 10.5 higher after exercise than it was before. The SS for the sample difference scores was SS = 960. Do these data indicate a significant increase in pain tolerance following exercise? Test at the .01 level.

• Step 1: Develop hypotheses
  • State alternative: Exercise will significantly increase pain tolerance
  • It is a directional hypothesis ------> one-tailed
  • H1: μD > 0   H0: μD ≤ 0

• Step 2: Establish significance criteria
  • Computer: α=.01
  • Hand calculations: identify tcritical used for the alpha level, the appropriate test, and df
    • one-tailed test at .01 (df = 15) corresponds to tcritical = +2.602

• Step 3: Collect and analyze sample data
  • Computer
  • Hand calculations
    • Calculate sample mean of D: D = 10.5
    • Calculate standard deviation of D scores

      $$s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{960}{15}} = \sqrt{64} = 8$$

    • Calculate estimated standard error of D

      $$s_{\bar{D}} = \frac{s}{\sqrt{n}} = \frac{8}{\sqrt{16}} = 2$$

• Step 4: Compare sample data to null ------> calculate test statistic

  • Computer
    • Identify test statistic and p-value; t(15)=5.25, p<.001
    • Compare p-value with alpha level
      • .001 is less than .01 ---> reject null
  • Hand calculations: calculate t

    $$t_{\text{related samples}} = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} = \frac{10.5 - 0}{2} = 5.25$$

    it exceeds tcritical ---> reject null

• Step 5: Draw conclusion
  • Aerobic exercise significantly increased pain tolerance; t(15)=5.25, p<.001, one-tailed.
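The related-samples calculation is the single-sample t applied to the difference scores. A minimal sketch using the summary values from the exercise example; the critical value 2.602 is taken from Step 2 rather than computed, and the variable names are illustrative only.

```python
from math import sqrt

n = 16
mean_difference = 10.5      # mean of the difference scores (D)
ss = 960                    # SS of the difference scores
mu_d = 0                    # population mean difference under the null
t_critical = 2.602          # one-tailed, alpha = .01, df = 15

df = n - 1
s = sqrt(ss / df)                           # standard deviation of the D scores
estimated_standard_error = s / sqrt(n)      # standard error of the mean difference
t = (mean_difference - mu_d) / estimated_standard_error

decision = "reject null" if t >= t_critical else "fail to reject null"
print(f"s = {s}, s_D = {estimated_standard_error}, t({df}) = {t}: {decision}")
```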

In-Class Practice Problems
1. An investigator for NASA examines the effect of cabin temperature on reaction time. A random sample of 10 astronauts and pilots is selected. Each person’s reaction time to an emergency light is measured in a simulator where the cabin temperature is maintained at 70 degrees F and again the next day at 95 degrees. Using the results of this experiment, can the investigator conclude that temperature has a significant effect on reaction time? Test at the .01 level.

Summary statistics

Column   n    Mean   Variance    Std. Dev.   Std. Err.   Median   Range   Min   Max   Q1    Q3
var1     10   203    381.55554   19.533447   6.177018    205.5    55      176   231   183   216
var2     10   223    417.1111    20.423298   6.458414    224      65      190   255   206   240

Paired T-test results:
μD - mean of the differences between var1 and var2
H0 : μD = 0
HA : μD not equal 0

Difference    Sample Diff.   Std. Err.   DF   T-Stat       P-value
var1 - var2   -20            1.67332     9    -11.952286   <0.0001

a. Independent Variable =            Scale (circle): Categorical   Quantitative
b. Dependent Variable =              Scale (circle): Categorical   Quantitative
c. Circle: One-tailed   Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:     H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

2. Does eating oatmeal decrease cholesterol levels? A researcher implements a 30-day treatment that consists of eating a bowl of oatmeal every day for breakfast. Cholesterol is measured before (var1) and after (var2) the treatment for the 10 participants. An α = .05 was utilized.

Summary statistics

Column   n    Mean    Variance    Std. Dev.   Std. Err.   Median   Range   Min   Max   Q1    Q3
var1     10   258.2   192.4       13.870832   4.3863425   257.5    40      240   280   245   270
var2     10   222     269.33334   16.411379   5.1897335   221      56      190   246   210   230

Paired T-test results:
μD - mean of differences between var1 and var2
H0 : μD = 0
HA : μD > 0

Difference    Sample Diff.   Std. Err.   DF   T-Stat     P-value
var1 - var2   36.2           4.319979    9    8.379669   <0.0001

a. Independent Variable =            Scale (circle): Categorical   Quantitative
b. Dependent Variable =              Scale (circle): Categorical   Quantitative
c. Circle: One-tailed   Two-tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:     H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.

Additional Practice: Interpreting Research Articles
t-test of Related Samples

Seventy-four drug users participated in a Behavioral Counseling Program to reduce drug use. Among the participants, 75% were male, 75% were adults, 12% were minority, and 25% were mandated to obtain counseling by a public agency. With respect to drug use, about 50% used cocaine and 75% used marijuana. The Behavioral Counseling Program consisted of three parts: 1) stimulus control, including competing response training; 2) urge control procedure for interrupting incipient drug use urges, thoughts, and actions; and 3) behavior contracting, especially between youth and parents. Drug use was measured at the beginning of treatment, the end of treatment, and one month after treatment. Drug use decreased substantially from pretreatment to the end of treatment (t=4.28, p<.001) with a slight, nonsignificant decrease from end of treatment to the follow-up month (t=.92, p=.72). The decrease from pretreatment to follow-up remained statistically significant (t=4.42, p<.001).

Source: Azrin, N. H., Acierno, R., Kogan, E. S., Donohue, B., Besalel, V. A., & McMahon, P.T. (1996). Follow-up results of supportive versus behavioral therapy for illicit drug use. Behavior Research and Therapy, 34, 41-46.

1. As is customary in journal articles, the researchers did not state the null hypothesis. Write the appropriate null hypothesis for the first t-test result reported in the excerpt.
2. Should the null hypothesis written for item 1 be rejected? Explain.
3. Should the null hypothesis be rejected for the second t test reported in the excerpt? Explain.
4. The last difference in the excerpt was statistically significant at the .001 level. Was it also significant at the .05 level?

Answers: 1) The treatment of Behavioral Counseling Program will NOT significantly reduce drug use among participants. 2) Yes, since the p-value is less than .05. 3) No, the p-value is greater than .05. 4) Yes, if it is significant at p<.001 then it is also significant at p<.05.

Coke vs. Pepsi Experiment: t tests

We are going to conduct an experiment using the Coke vs. Pepsi Taste Test that investigates two research questions:

1) Are diet drinkers (when compared to regular drinkers) more accurate in tasting the difference between Coke and Pepsi?
   • This question will utilize a t-test of independent samples, which you can complete for 5 points of extra credit (Extra Credit #1).

2) When tasting the difference between Coke and Pepsi, is one’s prediction of accuracy significantly different from one’s actual ability/accuracy?
   • This question will utilize a t-test of related samples, which you can complete for 5 points of extra credit (Extra Credit #2).

In order to complete this experiment, you need at least one other person (who has the same pop preference as you) to participate. It would be great if you can find 2-4 more individuals.

Directions:

1. Identify your pop preference (Diet or Regular).
   • If you prefer diet pop, purchase one can/bottle of Diet Coke and one of Diet Pepsi.
   • If you prefer regular, purchase one can/bottle of Coke and one of Pepsi.

2. In addition to the pop, you will need the following supplies to complete this experiment.
   • 5 small paper cups for each participant
   • Pen or pencil
   • Napkins in case you spill
   • Pretzels or chips for “cleansing one’s palate”

3. Once you have your supplies and participants together, record each participant’s name in the first column of the data grid below and his or her preference (diet=1, regular=2) in the second column.

Data Grid

Name Preference Prediction % Actual %

4. Have each participant predict how accurate they will be in identifying the pop as Coke or Pepsi. Since each person will be given 5 cups of pop, predict how many times out of 5 chances you will be correct in the identification process (e.g., 3/5). Then, convert that fraction into a percent (e.g., 3/5=60%). Record this percent in the third column of the grid.

5. Determine who will complete the taste test first. Have that person turn away while another participant fills 5 cups with pop (make sure that some cups have Pepsi and other cups have Coke and that you know which cups have which pop). Hint: Don’t write the name of the pop on the bottom of the cup; it will show through as the person drinks the pop.

6. Have the taste tester proceed in identifying the pop in each cup, while another participant records the accuracy. Don’t tell the results to the taster until all 5 cups have been tasted. Calculate the number of correct tastes out of five. Convert that fraction into a percent and record the percent in column 4 of the grid.

7. Once you and your fellow participants have finished the taste test, add your results to the spreadsheet below.

8. Go to StatCrunch and enter ALL the data from the spreadsheet (including the data provided for 15 individuals). You should have a minimum of n=17 for your sample. Proceed with the t-test directions.

Extra Credit Worksheets are in Computer Lab Packet!

Video #10: Analysis of Variance

Analysis of Variance (ANOVA) is a hypothesis testing procedure that evaluates mean differences between two or more treatments or groups; a t test can only compare two groups.

Single Factor Design: studies the effect that one factor (independent variable) has on the dependent variable. Note that although there is only one factor, this factor can have more than two categories, so we are comparing two or more groups/treatments.

Hypothesis Testing for ANOVA

• Null hypothesis states that there is no difference among the groups or treatments
   • H0: μ1 = μ2 = μ3

• Alternative hypothesis states that at least one mean is different from the others
   • H1: At least one mean will differ

ANOVA Test Statistic

ANOVA creates a test statistic called an F-ratio that is similar to the t statistic.

• Recall that

$$t = \frac{\text{obtained difference between sample means}}{\text{difference expected by chance (error)}}, \qquad t_{\text{single}} = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

• F is similar to t, but since there are more than two means to compare, variance will be used to represent the differences between all the means being compared.

$$F = \frac{\text{variance (differences) between sample means}}{\text{variance (differences) expected by chance (error)}}$$

• Like t, a large F value indicates a treatment effect (mean differences) that is unlikely to be due to chance.
   • When the treatment has no effect, so that the means are the same (H0 is true), the F-ratio will be close to 1.00.

Distribution of F-ratios

• Like t, F has its own sampling distribution.
• But the F distribution is not normal; it is positively skewed, the degree of which depends upon the degrees of freedom from the two variances.
   • large df -------> nearly all F-ratios are clustered around 1.00
   • small df -------> the F-ratios are more spread out

• Since the F distribution is positively skewed, we are only looking in one tail for the difference. As a result we don’t need to indicate if the test is one or two tailed.

• Recall: we expect F near 1.00 if the null is true and expect a large F if the null is false
   • therefore, significant F-ratios will be in the tail of the F distribution

$$F = \frac{\text{variance (differences) between group means}}{\text{variance (differences) expected by chance/error (within groups)}}$$

Variance (differences) between groups can be due to:

• treatment effect
• individual differences (subjects within the various groups are different even before the treatment begins)
• experimental error (caused by poor equipment, lack of attention/knowledge on the researcher’s part, unpredictable change of events)

Variance within groups can be due to:

• individual differences (subjects within the various groups are different even before the treatment begins)
• experimental error (caused by poor equipment, lack of attention/knowledge on the researcher’s part, unpredictable change of events)

Consequently, if we divide the variance between treatments by the variance within treatments, the individual differences and error cancel out, so we can determine the treatment effect.

$$F = \frac{\text{variance between groups}}{\text{variance within groups}} = \frac{\text{treatment effect} + \text{individual differences} + \text{error}}{\text{individual differences} + \text{error}}$$

The last few steps of ANOVA require the following calculations:

• df between groups = k – 1, where k is the number of groups
• df within groups = N – k, where N is the total number of individuals in all groups
• MS between = variance between treatments: $MS_{between} = \frac{SS_{between}}{df_{between}}$
• MS within = variance within treatments: $MS_{within} = \frac{SS_{within}}{df_{within}}$
• F-ratio: $F = \frac{MS_{between}}{MS_{within}}$

Putting it all together

Example: A number of studies on jetlag have found that jetlag seems to be worse when people are traveling east. A researcher examines how many days it takes a person to adjust after taking a long flight. One group flies west across time zones (NY to CA); a second group flies east (CA to NY); and a third group takes a long flight within one time zone (San Francisco to Seattle). Perform an analysis of variance to determine if jetlag varies for the direction of travel. Use the .05 level of significance.

Computer Results

Analysis of Variance results for var2 grouped by var1

Sample means:

Group   n   Mean   Std. Error
1       6   2.5    0.4281744
2       6   6      0.57735026
3       6   0.5    0.2236068

ANOVA table:

Source       df   SS    MS          F-Stat     P-value
Treatments   2    93    46.5        41.02941   <0.0001
Error        15   17    1.1333333
Total        17   110

Step 1: Develop hypotheses
• State alternative: Direction of travel will significantly affect jetlag.
• H0: μ1 = μ2 = μ3     H1: At least one mean will differ

Step 2: Establish significance criteria
• Computer: α = .05

Step 3: Collect and analyze sample data
• Computer: enter data

Step 4: Compare sample data to null------>calculate test statistic
• Computer
   • Identify test statistic and p-value; F(2, 15) = 41.03, p < .0001
   • Compare p-value with alpha level
      • p < .0001 is less than .05, so reject the null

Step 5: Draw conclusion
• Direction of travel significantly affected jetlag.
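The last few ANOVA calculations can be checked by hand or with a few lines of code. The sketch below is only an illustration: it takes the SS values from the jetlag ANOVA table as given (it does not compute them from raw scores), and the function name `f_ratio` is made up for this example.

```python
def f_ratio(ss_between, ss_within, k, n_total):
    """Return df, MS, and F from the sums of squares of a one-way ANOVA."""
    df_between = k - 1             # k = number of groups
    df_within = n_total - k        # N = total number of individuals
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within

# Jetlag example: SS values from the ANOVA table, 3 groups of 6 travelers each
df_b, df_w, ms_b, ms_w, f = f_ratio(ss_between=93, ss_within=17, k=3, n_total=18)
print(f"MS between = {ms_b:.2f}, MS within = {ms_w:.2f}")
print(f"F({df_b}, {df_w}) = {f:.2f}")  # F(2, 15) = 41.03
```

The printed F matches the F-Stat in the computer results after rounding to two decimals.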

Post Hoc Tests

So far, we have only been able to determine if there is a significant difference (treatment had an effect), but we are unable to determine which group is different. We could do a t test for each comparison, but we increase the risk of a Type I error when we run several hypothesis tests. This risk is called the experimentwise alpha level: the overall probability of a Type I error over a series of separate hypothesis tests. Fortunately, there are some tests that are very conservative and allow us to determine which group is different after ANOVA has been conducted and a difference has been found; these are called Post Hoc Tests. The Scheffe Test is the safest post hoc test used to compare two groups/treatments. It is safe because it uses the value of k to calculate the df and the critical F-ratio from the original ANOVA to determine if the comparison is significant.
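To see why several t tests inflate the risk of a Type I error, the experimentwise alpha level can be approximated. The sketch below assumes the separate tests are independent, which is a simplification (pairwise comparisons on the same groups are not fully independent), so treat the result as a rough guide; the function name is illustrative only.

```python
from math import comb

def experimentwise_alpha(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Three groups (as in the jetlag example) require 3 pairwise comparisons
n_comparisons = comb(3, 2)
overall = experimentwise_alpha(0.05, n_comparisons)
print(n_comparisons, round(overall, 4))  # 3 0.1426
```

Under this assumption, three separate t tests at α = .05 carry roughly a 14% chance of at least one Type I error, which is why a conservative post hoc test is preferred.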

Unfortunately, StatCrunch is unable to conduct Post Hoc tests!

Reporting of ANOVA Results

Much of the time an ANOVA summary table is presented that includes SS, df, and MS for each source as well as the F-ratio; in addition, a table of means and standard deviations (here, standard errors from the StatCrunch output) for each treatment will be presented. Using the previous example, the tables would look like the following:

      Westbound   Eastbound   Same zone
M     2.5         6.0         0.5
SE    0.43        0.58        0.22

ANOVA SUMMARY
Source               SS    df   MS
Between treatments   93    2    46.5    F = 41.03
Within treatments    17    15   1.13
Total                110   17

When space is an issue, the results should include the F-ratio with both degrees of freedom in parentheses and the p-value. Do NOT indicate one-tailed or two-tailed!

• Travel direction does affect jetlag; F(2, 15) = 41.03, p < .05.

Assumptions of ANOVA: independent observations, samples are selected from normal populations that also have equal variances.

ANOVA In-Class Practice Problems

1. The extent to which a person’s attitude can be changed depends on how big a change you are trying to produce. In a classic study on persuasion, Aronson, et al. (1985) obtained three groups of subjects. One group listened to a persuasive message that differed only slightly from the subjects’ original attitudes. For the second group, there was a moderate discrepancy between the message and the original attitudes. For the third group, there was a large discrepancy between the message and the original attitudes. For each subject, the amount of attitude change was measured. Data were entered for the three groups (small, moderate, large discrepancy) and an ANOVA was utilized to determine if the amount of discrepancy between the original attitude and the persuasive argument has a significant effect on the amount of attitude change. Test at the .05 level.

Analysis of Variance results for var2 grouped by var1

Sample means:

Group   n   Mean        Std. Error
1       6   1.5         0.4281744
2       6   6.6666665   0.71492034
3       6   1           0.2581989

ANOVA table:

Source       df   SS           MS          F-Stat     P-value
Treatments   2    118.111115   59.055557   38.79562   <0.0001
Error        15   22.833334    1.5222223
Total        17   140.94444

a. Independent Variable =                 Scale (circle): Categorical   Quantitative
b. Dependent Variable =                   Scale (circle): Categorical   Quantitative
c. Alternative hypothesis in sentence form.
d. Write the alternative and null hypotheses using correct notation.   H1:        H0:
e. Fcalculated =
f. Level of significance (p) =
g. Circle: reject null or fail to reject null
h. Write your conclusion in sentence form.

2. A psychologist would like to examine the relative effectiveness of three therapy techniques for treating mild phobias. A sample of N=15 individuals who display a moderate fear of spiders is obtained. These individuals are randomly assigned to the three therapies. After a certain amount of therapy, the psychologist measures the degree of fear reported by each individual. ANOVA was conducted to determine if there are any significant differences among the three therapies. Test at the .05 level.

Analysis of Variance results for var2 grouped by var1

Sample means:

Group   n   Mean   Std. Error
1       5   4      0.70710677
2       5   1.6    0.50990194
3       5   1.4    0.50990194

ANOVA table:

Source       df   SS          MS          F-Stat      P-value
Treatments   2    20.933332   10.466666   6.1568627   0.0145
Error        12   20.4        1.7
Total        14   41.333332

a. Independent Variable =                 Scale (circle): Categorical   Quantitative
b. Dependent Variable =                   Scale (circle): Categorical   Quantitative
c. Alternative hypothesis in sentence form.
d. Write the alternative and null hypotheses using correct notation.   H1:        H0:
e. Fcalculated =
f. Level of significance (p) =
g. Circle: reject null or fail to reject null
h. Write your conclusion in sentence form.

Additional Practice: Interpreting Research Articles

ANOVA

Read the following excerpt to complete the questions on the next page:

Researchers examined the impact of teacher self-efficacy on classroom technology use. Participants included 101 teachers from four elementary (K-6) schools in Northwest Ohio. Of the 101 participants, 13 were male. Teachers were administered the Teacher Attribute Survey (TAS), which measured classroom technology use (teacher, student, and overall). Teacher self-efficacy was also measured in the instrument and represented one’s belief in affecting student performance. Low, moderate, and high levels of self-efficacy were created. As such, low self-efficacy was defined as 3.29 or below, moderate self-efficacy as ranging from 3.3 to 4.6, and high self-efficacy as 4.61 and higher.

Table 1. Means and ANOVA results for Self-Efficacy groups and Technology Use

                    Technology Use Means by Level of Self-Efficacy
                    Low (n=12)   Moderate (n=78)   High (n=11)   ANOVA Results
Teacher Tech Use    1.73         2.15              2.36          F(2,98)=3.77, p<.05
Student Tech Use    1.24         1.49              1.81          F(2,98)=4.52, p<.05
Overall Tech Use    2.08         1.82              2.08          F(2,98)=4.71, p<.05

1. Which type of technology use is the highest among all levels of self-efficacy?
2. Which group of teachers (low, moderate, or high self-efficacy) report the highest technology use among their students?
3. Write the null hypothesis for self-efficacy and overall technology use, where the ANOVA results indicate: F(2,98)=4.71, p<.05.
4. Considering the null hypothesis that you wrote for item 3, should the null hypothesis be rejected? Explain.

Answers: 1) teacher technology use; 2) teachers with high self-efficacy (M=1.81); 3) Self-efficacy will NOT significantly impact overall technology use among teachers; 4) Reject the null, F(2,98)=4.71, p<.05.

Video #11: Correlation and Regression

Correlation: statistical technique used to measure and describe a relationship between two quantitative variables; correlation measures 3 characteristics:

• direction of relationship
   • positive: as one variable increases so does the other (food intake & weight)
   • negative (inverse): as one variable increases the other decreases (exercise & weight)

[Figure: two scatterplots of y against x, one showing a positive relationship (r = +.90) and one showing a negative relationship (r = -.90)]

• form of relationship
   • linear: the relationship between x and y falls in a straight line
   • curvilinear: the relationship between x and y curves (age across the lifespan is a variable that often creates a curvilinear relationship)

• degree (strength) of relationship
   • degree of relationship is reflected in a correlation coefficient (usually r)
   • r ranges from -1 to +1; 0 indicates no relationship, +1 indicates a perfect positive relationship, and -1 indicates a perfect negative relationship

Pearson Correlation Coefficient

• measures the degree and direction of linear relationship between two variables

$$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{SP}{\sqrt{SS_X\,SS_Y}}$$

• since we will be computing variability for each variable as well as their variability together, we will be using SS and a new concept, SP, sum of products.

• Sum of products is used to compute the amount of covariability of two variables

$$SP = \Sigma (X - \bar{X})(Y - \bar{Y})$$

Correlation
• does NOT measure cause and effect
• when data have a limited range of scores, the value of the correlation can be exaggerated
• interpreting strength of coefficient (practical significance), using the absolute value of r:
   • |r| of .80 or above is very strong
   • |r| = .60 - .79 is strong
   • |r| = .40 - .59 is fair
   • |r| of .39 or below is weak

• to describe how accurately one variable predicts the other, square r. For example, if r = .60, then r² = .36, which can be interpreted as 36% of the variability in Y scores can be predicted from the relationship with X. r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.
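The strength guidelines and the coefficient of determination can be combined into a small helper. This is only a sketch: the function name `describe_r` is invented here, and applying the cutoffs to the absolute value of r with "at or above" boundaries is an assumption about how the guidelines are meant to be read.

```python
def describe_r(r):
    """Label a correlation with the packet's practical-significance guidelines."""
    size = abs(r)  # strength ignores the sign (assumed reading of the guidelines)
    if size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "fair"
    else:
        strength = "weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    # r squared = proportion of variability in Y predictable from X
    return strength, direction, r ** 2

strength, direction, r_squared = describe_r(0.60)
print(strength, direction, round(r_squared, 2))  # strong positive 0.36
```

For r = .60 this reproduces the example above: a strong, positive relationship in which 36% of the variability in Y is predictable from X.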

Hypothesis Testing (hypotheses use the Greek letter rho, ρ, to signify the population correlation)

             Alternative   Null
One-tailed   H1: ρ > 0     H0: ρ ≤ 0
Two-tailed   H1: ρ ≠ 0     H0: ρ = 0

Putting it all together

Example: To measure the relationship between anxiety level and test performance, a psychologist obtains a sample of n=6 college students from an intro stats course. Students arrive fifteen minutes prior to the exam and complete physiological measures of anxiety (heart rate, skin resistance, blood pressure, etc.). Anxiety ratings and exam scores are listed below. Compute the Pearson correlation to determine if a negative relationship exists between anxiety and test performance. Test at the .05 level.

• Step 1: Develop hypotheses.
   • State Alternative: Anxiety and test performance will negatively relate.
   • It is a directional hypothesis, so the test is one-tailed.
     H1: ρ < 0 (population shows negative correlation)
     H0: ρ ≥ 0 (population does not show negative correlation)

• Step 2: Establish significance criteria
   • Computer: StatCrunch does not calculate the p-value for the correlation coefficient. As a result, we must identify the r critical value for the given α, number of tails, and df.
      • df = n – 2 = 6 – 2 = 4, r critical = -.729
      • Notice that df is n – 2 for correlation, since we need two points to create a line.
   • Hand calculations: Identify the r critical value for the given α, number of tails, and df.

• Step 3: Utilize sample data to calculate r
   • Computer
   • Hand calculations: Calculate SP, SSX, SSY, r

Anxiety Rating (X)   Exam Score (Y)   (X - X̄)   (Y - Ȳ)   (X - X̄)(Y - Ȳ)   (X - X̄)²   (Y - Ȳ)²
5                    80               0         -3        0                0          9
2                    88               -3        5         -15              9          25
7                    80               2         -3        -6               4          9
7                    79               2         -4        -8               4          16
4                    86               -1        3         -3               1          9
5                    85               0         2         0                0          4
X̄ = 5                Ȳ = 83                               SP = -32         SSX = 18   SSY = 72

• Step 4: Compare sample data to null------>calculate test statistic
   • Computer: Identify test statistic and compare r calculated to r critical
      • Correlation between var2 and var1 is: -0.8888889
      • r falls into the critical region, so it is significant; reject the null
   • Hand calculations
      • Calculate r:

$$r = \frac{SP}{\sqrt{SS_X\,SS_Y}} = \frac{-32}{\sqrt{18(72)}} = \frac{-32}{36} = -.889$$

      • Compare r calculated to r critical
      • r falls into the critical region, so reject the null

• Step 5: Draw conclusion
   • A negative relationship exists between anxiety and test performance, r(4) = -.889, p < .05, one-tailed.

Computer Output

Correlation between var2 and var1 is: -0.8888889
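The hand calculation can be mirrored step by step in code. The sketch below uses only the anxiety and exam data from the example; the function names are illustrative, and the critical value of -.729 is taken from the step above rather than looked up by the program.

```python
from math import sqrt

def sp_and_ss(x, y):
    """Return SP, SSX, and SSY computed from deviation scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return sp, ss_x, ss_y

def pearson_r(x, y):
    """Pearson correlation: r = SP / sqrt(SSX * SSY)."""
    sp, ss_x, ss_y = sp_and_ss(x, y)
    return sp / sqrt(ss_x * ss_y)

anxiety = [5, 2, 7, 7, 4, 5]        # X
exam = [80, 88, 80, 79, 86, 85]     # Y

sp, ss_x, ss_y = sp_and_ss(anxiety, exam)
r = pearson_r(anxiety, exam)
r_critical = -0.729                 # one-tailed, alpha = .05, df = 4 (from the example)
print(sp, ss_x, ss_y, round(r, 3))  # -32.0 18.0 72.0 -0.889
print("reject null" if r <= r_critical else "fail to reject null")
```

Because the hypothesis is one-tailed in the negative direction, r falls in the critical region only when it is at or below the negative critical value.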

Regression

Regression: statistical technique for finding the best-fitting straight line for a set of data; used when wanting to determine the ability of one variable to predict another variable (e.g., using SAT score to predict freshman college GPA)

Regression line: line that represents the linear relationship; represented by a linear equation

• Y = a + bX, where a = Y-intercept and b = slope
• Least-squares method helps determine the best-fitting line by minimizing the error between the predicted & actual values of Y.
• $\hat{Y} = a + bX$, where $b = \frac{SP}{SS_X}$ and $a = \bar{Y} - b\bar{X}$

Example: Using the correlation problem we just solved, let’s calculate the regression line.

• Step 1: Use X̄, Ȳ, SSX, SP to calculate b and a
   • (previously calculated: X̄ = 5, Ȳ = 83, SP = -32, SSX = 18, SSY = 72)

$$b = \frac{SP}{SS_X} = \frac{-32}{18} = -1.778$$

$$a = \bar{Y} - b\bar{X} = 83 - (-1.778)(5) = 83 + 8.889 = 91.889$$

• Step 2: Calculate regression equation
   • Ŷ = a + bX
   • Ŷ = 91.89 - 1.78X

We can now use the regression equation to predict Y for a given value of X.
• If X = 7, what is the predicted value of Y?
   • Ŷ = 91.89 - 1.78X
   • Ŷ = 91.89 - 1.78(7) = 79.43
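The same least-squares steps can be sketched in code using the anxiety and exam data from the example. The function name is illustrative. Because the code keeps full precision instead of rounding b and a to two decimals, its prediction (about 79.44) differs slightly from the hand result of 79.43.

```python
def least_squares_line(x, y):
    """Return intercept a and slope b, where b = SP / SSX and a = mean(Y) - b * mean(X)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return a, b

anxiety = [5, 2, 7, 7, 4, 5]        # X
exam = [80, 88, 80, 79, 86, 85]     # Y

a, b = least_squares_line(anxiety, exam)
predicted = a + b * 7               # predicted exam score when anxiety X = 7
print(f"Y-hat = {a:.2f} + ({b:.2f})X")
print(f"Predicted Y at X = 7: {predicted:.2f}")
```

The unrounded coefficients agree with the StatCrunch regression equation and its predicted value for X = 7.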

Computer Output: The output in the video will appear different, since a different version of StatCrunch was used.

Simple linear regression results:
Dependent Variable: var2
Independent Variable: var1
var2 = 91.888885 - 1.7777778 var1          <-- Regression equation
Sample size: 6
R (correlation coefficient) = -0.8889      <-- Correlation coefficient
R-sq = 0.79012346
Estimate of error standard deviation: 1.9436506

Parameter estimates:

Parameter   Estimate     Std. Err.    DF   T-Stat      P-Value
Intercept   91.888885    2.4241583    4    37.905483   <0.0001
Slope       -1.7777778   0.45812285   4    -3.88057    0.0178

Analysis of variance table for regression model:

Source   DF   SS          MS          F-stat      P-value
Model    1    56.88889    56.88889    15.058824   0.0178
Error    4    15.111111   3.7777777
Total    5    72

Predicted values:                          <-- Predicted value for Y when X=7

X value   Pred. Y    s.e.(Pred. y)   95% C.I.                95% P.I.
7         79.44444   1.2120792       (76.07917, 82.809715)   (73.08468, 85.80421)

Note: Ignore the p-values in this output since they are NOT for the correlation coefficient (r).

Video #11 In-Class Practice Problems

1. You probably have read about the relationship between years of education and salary potential. The following hypothetical data represent a sample of n = 10 men who have been employed for five years. Do these data indicate a significant relationship between years of higher education and salary? Test at the .05 level. Also find the regression equation for predicting salary from education.

(X) Years of Higher Education: 4, 4, 2, 8, 0, 5, 10, 4, 12, 0
(Y) Salary (in $1000s): 31, 29, 28, 42, 23, 35, 45, 27, 44, 24

Simple linear regression results:
Dependent Variable: salary
Independent Variable: education
salary = 23.135265 + 1.9723947 education
Sample size: 10
R (correlation coefficient) = 0.9601
R-sq = 0.92169785
Estimate of error standard deviation: 2.4466708

Parameter estimates:

Parameter   Estimate    Std. Err.    DF   T-Stat     P-Value
Intercept   23.135265   1.2611643    8    18.34437   <0.0001
Slope       1.9723947   0.20325504   8    9.704039   <0.0001

Analysis of variance table for regression model:

Source   DF   SS          MS          F-stat      P-value
Model    1    563.71045   563.71045   94.168365   <0.0001
Error    8    47.88958    5.9861975
Total    9    611.6

Predicted values:

X value   Pred. Y    s.e.(Pred. y)   95% C.I.                95% P.I.
5         32.99724   0.77397215      (31.212456, 34.78202)   (27.07964, 38.91484)

a. Independent Variable =                 Scale: Categorical   Quantitative
b. Dependent Variable =                   Scale: Categorical   Quantitative
c. Circle: One-Tailed OR Two-Tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:        H0:
f. rcritical =
g. rcalculated =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
j. Regression equation:
k. If one has 5 years of education, what is the predicted salary?

2. Research has shown that similarity in attitudes, beliefs, and interests plays an important role in interpersonal attraction. A therapist examines the correlation in attitudes between husbands (X) and wives (Y). She administers a questionnaire that measures how liberal or conservative one’s attitudes are. Low scores indicate that the person has liberal attitudes while high scores indicate conservatism (scale 1-10). Ten couples participate. Test at the .01 level.

Simple linear regression results:
Dependent Variable: wife att
Independent Variable: hus att
wife att = 0.7785714 + 0.8035714 hus att
Sample size: 10
R (correlation coefficient) = 0.7869
R-sq = 0.61919034
Estimate of error standard deviation: 1.6673064

Parameter estimates:

Parameter   Estimate    Std. Err.    DF   T-Stat       P-Value
Intercept   0.7785714   1.4370375    8    0.54178923   0.6027
Slope       0.8035714   0.22280319   8    3.6066422    0.0069

Analysis of variance table for regression model:

Source   DF   SS          MS          F-stat      P-value
Model    1    36.160713   36.160713   13.007869   0.0069
Error    8    22.239286   2.7799108
Total    9    58.4

Predicted values:

X value   Pred. Y     s.e.(Pred. y)   95% C.I.                 95% P.I.
5         4.7964287   0.57239175      (3.4764907, 6.1163664)   (0.73135275, 8.861505)

a. Independent Variable =                 Scale: Categorical   Quantitative
b. Dependent Variable =                   Scale: Categorical   Quantitative
c. Circle: One-Tailed OR Two-Tailed
d. Alternative hypothesis in sentence form.
e. Write the alternative and null hypotheses using correct notation.   H1:        H0:
f. rcritical =
g. rcalculated =
h. Circle: reject null or fail to reject null
i. Write your conclusion in sentence form.
j. Regression equation:
k. If the husband has a moderate attitude of “5”, what is the predicted value of the wife’s attitude?

Additional Practice: Interpreting Research Articles

Correlation

Read the following excerpt to complete the questions on the next page:

Boivin and Hymel (1997) examined the relationships among social behavior, peer experiences and self-perception. A total of 793 French Canadian children participated in the study (393 girls, 400 boys). The participants ranged from third to fifth grade, were from ten elementary schools and from a variety of socioeconomic backgrounds. The following variables were measured:

Aggression and withdrawal were measured by showing a picture of all classmates and asking each student to choose two classmates who best fit each descriptor. For aggression, a score was obtained for each child by summing the number of times he or she was selected for these descriptors: “gets into lots of fights,” “loses temper easily,” “too bossy,” and “picks on other kids.” For withdrawal, a score was obtained for each child by summing the number of times he or she was selected for these descriptors: “rather play alone than with others” and “very shy.”

Social preference was assessed by asking each child to name three other children they would like most and like least for playing together, inviting others to a birthday party, and sitting next to each other on a bus. (Higher scores indicate greater social preference.)

Victimization by peers was measured by asking each child to nominate up to five other students who could be described as being made fun of, being called names, and getting hit and pushed by other kids. (Higher scores indicate greater victimization.)

Number of affiliative links was measured by asking, “You have probably noticed children in class who often hang around together and others who are more often alone. Could you name children who often hang around together?” (Higher scores indicate a larger number of affiliative links.)

Loneliness was measured with a 16-item questionnaire with higher scores indicating greater loneliness.

Perceived social acceptance and behavior-conduct were two aspects of self-concept measured with Harter’s Self-Perception Profile for Children. Higher scores reflect a better self-concept in each of the two domains.

Table 1. Correlations among the social behavior, peer expectation, and self-perception measures

                                  1      2      3      4      5      6      7      8
1. Withdrawal                     --
2. Aggression                     -.10   --
3. Social Preference              -.39   -.44   --
4. Victimization by Peers         .42    .53    -.68   --
5. # of Affiliate Links           -.35   .05    .35    -.21   --
6. Loneliness                     .29    .12    -.34   .34    -.18   --
7. Perceived social acceptance    -.27   -.04   .28    -.26   .18    -.69   --
8. Perceived behavior-conduct     .06    -.32   .17    -.17   -.06   -.35   .39    --

Source: Boivin, M. & Hymel, S. (1997). Peer experiences and social self-perceptions: A sequential model. Developmental Psychology, 33, 135-143.

Notice that the correlation coefficients are presented in a matrix. The column headers represent the same variables presented in the row headers; however, the column headers use only the number to indicate a certain variable. For example, the coefficient of .39 (row 8, column 7) represents the correlation between “Perceived Social Acceptance” and “Perceived Behavior Conduct”.

1. What is the value of the Pearson r for the relationship between withdrawal and loneliness? Describe this value in terms of strength and direction.
2. What is the value of the Pearson r for the relationship between social preference and victimization by peers? Describe this value in terms of strength and direction.
3. Which variable has the strongest relationship with withdrawal?
4. Which variable has the weakest relationship with withdrawal?
5. The Pearson r for the relationship between withdrawal and loneliness indicates that those who tend to be more lonely tend to be:
   A. more withdrawn
   B. less withdrawn
6. Which of the following pairs has the strongest relationship between them?
   A. Perceived social acceptance and loneliness
   B. Withdrawal and victimization by peers
   C. Number of affiliate links and aggression
7. Which of the following pairs has the weakest relationship between them?
   A. Withdrawal and social preference
   B. Withdrawal and perceived social acceptance
   C. Withdrawal and perceived behavior-conduct

Answers: 1) .29, weak and positive; 2) -.68, strong and negative; 3) Victimization by peers, r=.42; 4) Perceived behavior-conduct, r=.06; 5) A, more withdrawn; 6) A; 7) C.

Video #12: Chi Square Test for Independence

So far we have used parametric tests to evaluate a hypothesis about the population. Parametric tests require certain assumptions about the population parameters, such as a normal distribution, homogeneity of variance, and a quantitative (interval/ratio) dependent variable. When these assumptions for parametric tests cannot be fulfilled, nonparametric tests can be used.

Nonparametric tests
• usually do not state a hypothesis in terms of the population distribution, so they are often called distribution-free tests
• are suited for data that utilize a nominal or ordinal scale
• are not as sensitive as parametric tests; they are more likely to fail in detecting a real difference between two treatments
• one commonly used nonparametric test is the Chi Square Test for Independence.

Chi Square Test for Independence
• Used to test a relationship (differences) between two categorical variables
• If variables are independent of one another, then there is no relationship. As a result the distribution of one variable will have the same shape for all the categories of the second variable.
• Alternative hypothesis for Chi Square Test for Independence can be written to focus on the relationship or on the differences.
   • H1: Gender is related to learning style.
   • H1: Learning style will differ by gender.
• Chi Square Test for Independence compares the observed and expected frequencies. Our expected frequencies come from our null hypothesis and our observed data.

$$\chi^2 = \Sigma \frac{(f_o - f_e)^2}{f_e}$$

Building on our example of females and males with respect to learning styles, the table below presents the data observed for a sample of 125 males and 75 females.

           Audio   Visual   Kinesthetic   Total
Males      30      30       65            125
Females    30      25       20            75
Total      60      55       85

• If the distribution for gender is predicted to be the same for each learning style category, then the same proportion/percent of males and females in each category would be expected.

• to calculate the expected frequency for each category this formula is used

$$f_e = \frac{f_c f_r}{n}$$

   where f_c = column total, f_r = row total, n = sample size

• the table of expected frequencies would look something like this

           Audio              Visual             Kinesthetic        Total
Males      60(125)/200=38     55(125)/200=34     85(125)/200=53     125
Females    60(75)/200=22      55(75)/200=21      85(75)/200=32      75
Total      60                 55                 85

• Degrees of freedom are calculated a bit differently
   • df = (R - 1)(C - 1), where R = number of rows, C = number of columns
   • in our example, df = (2-1)(3-1) = 1(2) = 2
   • using this and α = .05, our χ² critical = 5.99
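The expected-frequency formula and the df rule can be sketched in a few lines. The function name is illustrative. Note one difference from the table above: the code keeps the unrounded expected frequencies (for example 37.5 rather than 38), whereas the packet rounds them to whole numbers.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: males, females; columns: audio, visual, kinesthetic
observed = [[30, 30, 65],
            [30, 25, 20]]

expected = expected_frequencies(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (R - 1)(C - 1)

for label, row in zip(["Males", "Females"], expected):
    print(label, row)
print("df =", df)
```

Each row and column of the expected table keeps the same total as the observed table, which is a quick way to check the arithmetic.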

Putting it all together

Example: Based upon the observed frequencies presented in the table below, can a researcher conclude that learning styles differ by gender? Test at the .05 level.

           Audio   Visual   Kinesthetic   Total
Males      30      30       65            125
Females    30      25       20            75
Total      60      55       85

• Step 1: Develop hypotheses.
   • State Alternative: Learning style will significantly differ by gender.

• Step 2: Establish significance criteria
   • Computer: α = .05
   • Hand calculations: Identify the χ² critical value for the given α and df
      • df = (2-1)(3-1) = 2, χ² critical = 5.99

• Step 3: Utilize sample data to calculate χ²
   • Computer: enter data
   • Hand calculations: Calculate expected frequencies (fe), fo - fe, (fo - fe)², and (fo - fe)²/fe

                       fo    fe    fo-fe   (fo-fe)²   (fo-fe)²/fe
male-audio             30    38    -8      64         1.68
female-audio           30    22    8       64         2.91
male-visual            30    34    -4      16         0.47
female-visual          25    21    4       16         0.76
male-kinesthetic       65    53    12      144        2.72
female-kinesthetic     20    32    -12     144        4.50
                                                      Σ = 13.04

• Step 4: Compare sample data to null------>calculate test statistic
   • Computer: Identify test statistic and compare p-value to α level
      • p-value is less than .05, so reject the null
   • Hand calculations
      • Calculate χ² = 13.04
      • Compare χ² calculated to χ² critical
      • Since χ² = 13.04 exceeds χ² critical = 5.99, the null is rejected

• Step 5: Draw conclusion
   • Males and females differ in learning styles; χ²(2, n=200) = 13.04, p < .05.
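The hand calculation of χ² can be sketched as follows. The first list of expected frequencies uses the whole-number values from the hand-calculation table; the second, included for comparison, carries the unrounded values (37.5, 22.5, and so on), which gives a somewhat smaller statistic of about 12.56. Both exceed the critical value of 5.99, so the decision is the same. The function name is illustrative.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Cell order: male-audio, female-audio, male-visual, female-visual,
# male-kinesthetic, female-kinesthetic
f_o = [30, 30, 30, 25, 65, 20]
f_e_rounded = [38, 22, 34, 21, 53, 32]                     # as in the hand-calculation table
f_e_exact = [37.5, 22.5, 34.375, 20.625, 53.125, 31.875]   # unrounded, for comparison

chi_rounded = chi_square(f_o, f_e_rounded)
chi_exact = chi_square(f_o, f_e_exact)
chi_critical = 5.99                                        # alpha = .05, df = 2

print(round(chi_rounded, 2))   # 13.04
print(round(chi_exact, 2))     # 12.56
print("reject null" if chi_rounded > chi_critical else "fail to reject null")
```

Rounding expected frequencies before computing χ² is convenient by hand, but it shifts the statistic slightly, so hand and software results may not agree to the last decimal.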

Computer Output

Contingency table results:
Rows: var1 (1=male, 2=female)
Columns: var2 (1=audio, 2=visual, 3=kinesthetic)

Cell format: Count, Row percent, Column percent, Total percent

          1          2          3          Total
1         30         30         65         125
          24%        24%        52%        100.00%
          50%        54.55%     76.47%     62.5%
          15%        15%        32.5%      62.5%

2         30         25         20         75
          40%        33.33%     26.67%     100.00%
          50%        45.45%     23.53%     37.5%
          15%        12.5%      10%        37.5%

Total     60         55         85         200
          30%        27.5%      42.5%      100.00%
          100.00%    100.00%    100.00%    100.00%
          30%        27.5%      42.5%      100.00%

Statistic    DF    Value     P-value
Chi-square   2     13.042    0.0019

Assumptions of Chi Square Tests
• Random sampling
• Independence of observations
• Expected frequency for any cell MUST be greater than 5

Reporting Chi Square Results
Statement should include chi-square value with df and n in parentheses, and p-value:

• Males and females differ in learning styles; χ²(2, n=200) = 13.04, p < .05.
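Each cell of the contingency table in the computer output holds four numbers: the count, the row percent, the column percent, and the total percent. The sketch below shows one way those four values can be derived from the raw counts of the learning-style example. The function name and the dictionary keys are illustrative, not taken from any statistics package.

```python
# Counts from the learning-style example (rows: 1=male, 2=female;
# columns: 1=audio, 2=visual, 3=kinesthetic).
counts = [
    [30, 30, 65],
    [30, 25, 20],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

def cell_summary(r: int, c: int) -> dict:
    """Return the four numbers shown for the cell in row r, column c (0-based)."""
    count = counts[r][c]
    return {
        "count": count,
        "row_percent": 100 * count / row_totals[r],
        "column_percent": 100 * count / col_totals[c],
        "total_percent": 100 * count / n,
    }

# Males who prefer kinesthetic learning (row 1, column 3 of the output).
cell = cell_summary(0, 2)
for label, value in cell.items():
    print(f"{label}: {value:.2f}")
```

For this cell the row percent answers "what share of males are kinesthetic learners," while the column percent answers "what share of kinesthetic learners are male."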

Video #12 In-Class Practice Problems

1. The US Senate recently considered a controversial amendment for school prayer. The amendment did not get the required two-thirds majority, but the results of the vote are interesting when viewed in terms of the party affiliation of the senators. Does the vote on the prayer amendment (var2: 1=yes, 2=no) differ by political party (var1: 1=demo, 2=rep)? Test at the .05 level.

Contingency table results:
Rows: var1
Columns: var2

          1          2          Total
1         19         26         45
          42.22%     57.78%     100.00%
          33.93%     59.09%     45%
          19%        26%        45%

2         37         18         55
          67.27%     32.73%     100.00%
          66.07%     40.91%     55%
          37%        18%        55%

Total     56         44         100
          56%        44%        100.00%
          100.00%    100.00%    100.00%
          56%        44%        100.00%

Statistic    DF    Value        P-value
Chi-square   1     6.3032928    0.0121

a. Independent Variable =            Scale (circle): Categorical Quantitative
b. Dependent Variable =              Scale (circle): Categorical Quantitative
c. Alternative hypothesis in sentence form.
d. χ² calculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form. (1 pt)

2. A stats instructor would like to know whether it is worthwhile to require students to do weekly homework assignments. For one section of the course, homework is assigned, collected and graded each week. For the second section, the same problems are recommended but not required. At the end of the semester, all students complete the same final exam. Letter grades (A, B, C, D, F) are tabulated for each student by section. Do these data indicate significant grade differences for students with homework versus no homework? Test at the .05 level.

Contingency table results:
Rows: var1
Columns: var2

          1         2         3         4         5         Total
1         6         5         5         2         2         20
          30%       25%       25%       10%       10%       100.00%
          66.67%    50%       45.45%    28.57%    40%       47.62%
          14.29%    11.9%     11.9%     4.762%    4.762%    47.62%

2         3         5         6         5         3         22
          13.64%    22.73%    27.27%    22.73%    13.64%    100.00%
          33.33%    50%       54.55%    71.43%    60%       52.38%
          7.143%    11.9%     14.29%    11.9%     7.143%    52.38%

Total     9         10        11        7         5         42
          21.43%    23.81%    26.19%    16.67%    11.9%     100.00%
          100.00%   100.00%   100.00%   100.00%   100.00%   100.00%
          21.43%    23.81%    26.19%    16.67%    11.9%     100.00%

Statistic    DF    Value        P-value
Chi-square   4     2.4870248    0.647

a. Independent Variable =            Scale (circle): Categorical Quantitative
b. Dependent Variable =              Scale (circle): Categorical Quantitative
c. Alternative hypothesis in sentence form.
d. χ² calculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form. (1 pt)

Additional Practice: Interpreting Research Articles


Researchers surveyed 120 college sophomores and juniors enrolled in general education psychology courses. Participants were between the ages of 18 and 23 and completed a survey that measured class absenteeism (cutting class for no valid reason) in the past month, along with nine negative behaviors and two positive behaviors, all measured using yes/no responses. Negative behaviors included: speeding, slapping/hitting someone, getting drunk, breaking the law, telling a significant lie, thinking about dropping out of school, feeling depressed, getting a tattoo, and piercing the body. Positive behaviors were reading a book that wasn’t required for class and visiting family.

Table 1. Number and percentage of students answering “yes” to behaviors by groups of students who have cut class (n=68) and not cut class (n=52)

                                 Cutting       Not Cutting
Behavior                         N     %       N     %        χ²
Getting drunk                    59    87      24    46       22.79**
Speeding                         63    93      39    75        7.19*
Breaking law                     35    51      10    19       13.07**
Telling significant lie          14    21       8    15        0.53
Thoughts of dropping out          8    12       3     6        0.79
Feeling depressed                 7    10       5    10        0.02
Hitting/slapping                  8    12      11    21        1.95
Getting tattoo                   12    19       4     8        3.16
Piercing body                    18    26       7    13        3.17
Reading a non-required book      25    37      15    29        0.83
Visiting family                  62    91      40    77        4.61*

Note: * p<.05, ** p<.002

Source: Trice, A. D., Holland, S. A., & Gagne, P. E. (2000). Voluntary class absences and other behaviors in college students: An exploratory analysis. Psychological Reports, 87, 179-182.

1. What percentage of students who did not cut class report reading a non-required book?
2. Is the difference in frequencies for speeding significant for the two groups? Explain.
3. Write the null hypothesis for group differences in getting drunk.
4. Should the null hypothesis you wrote for item 3 be rejected? Explain.
5. What can you conclude about students who cut class and get drunk?

Answers: 1) 29%; 2) yes, χ² = 7.19, p<.05; 3) Students who cut class will NOT significantly differ in the behavior of getting drunk from students who do not cut class; 4) The null should be rejected since χ² = 22.79, p<.002; 5) Students who cut class are more likely to get drunk and vice versa.
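As a check on how a χ² value in Table 1 relates to the reported counts, the sketch below recomputes the "Getting drunk" statistic. It rests on two assumptions that the packet does not state: the number answering "no" in each group is the group size minus the "yes" count, and no continuity correction was applied. Under those assumptions the result is about 22.79, which agrees with the table. The function name is illustrative.

```python
import math

def chi_square_2x2(yes_a: int, n_a: int, yes_b: int, n_b: int) -> float:
    """Pearson chi-square (no continuity correction) for two groups' yes/no counts."""
    observed = [
        [yes_a, n_a - yes_a],   # group A: yes, no
        [yes_b, n_b - yes_b],   # group B: yes, no
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for r in range(2):
        for c in range(2):
            fe = row_totals[r] * col_totals[c] / n   # expected frequency
            total += (observed[r][c] - fe) ** 2 / fe
    return total

# "Getting drunk" row of Table 1: 59 of 68 cutters, 24 of 52 non-cutters.
chi2 = chi_square_2x2(yes_a=59, n_a=68, yes_b=24, n_b=52)

# For df = 1 the upper-tail p-value is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, p below .002: {p_value < 0.002}")
```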

Statistical Test Grid

                                      Independent Variable
Dependent Variable    Categorical                          Quantitative

Categorical           (1) Chi Square Test of
                          Independence

Quantitative          (2) t test (2)                       (3) Pearson Correlation (relate)
                            Single Sample                      Regression (predict)
                            Independent Samples
                            Related Samples
                          ANOVA (3+)
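The grid can also be read as a lookup from the scale of each variable to the candidate tests. The sketch below is one illustrative encoding; the dictionary and function names are hypothetical. It assumes that "(2)" and "(3+)" in the grid refer to the number of groups being compared. Choosing among the tests listed in a cell still depends on the research design.

```python
# Keys are (independent variable scale, dependent variable scale).
TEST_GRID = {
    ("categorical", "categorical"): ["Chi Square Test of Independence"],
    ("categorical", "quantitative"): [
        "t test (2): Single Sample, Independent Samples, Related Samples",
        "ANOVA (3+)",
    ],
    ("quantitative", "quantitative"): [
        "Pearson Correlation (relate)",
        "Regression (predict)",
    ],
}

def candidate_tests(iv_scale: str, dv_scale: str) -> list:
    """Return the tests the grid lists for this pairing, or [] if the cell is empty."""
    return TEST_GRID.get((iv_scale.lower(), dv_scale.lower()), [])

# Example: both variables categorical (cell 1 of the grid).
print(candidate_tests("Categorical", "Categorical"))
```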

Overview Items
1. Does disability category (LD, EBD, none, etc.) differ by gender?
2. Does gender affect GRE scores?
3. Are GRE scores related to graduate GPA?
4. Does SES (low, middle, high) affect reading preparedness (as measured by a test) among preschoolers?
5. Does a seminar on self-esteem increase self-esteem scores? (Self-esteem was measured before and after the seminar.)
6. Does learning style type differ by hand preference?
7. Do ACT scores predict college freshman GPA?
8. Do BGSU’s GRE scores for entering graduate students significantly differ from the population norm?
9. Does a reading intervention significantly increase 4th grade reading proficiency scores? Note: one group receives intervention, while another group receives traditional instruction.
10. Does foot size (small, medium, large) affect IQ?
