November 15. In Chapter 12: 12.1 Paired and Independent Samples 12.2 Exploratory and Descriptive...

Post on 29-Jan-2016

Apr 22, 2023

Chapter 12: Comparing Independent Means

In Chapter 12:

12.1 Paired and Independent Samples

12.2 Exploratory and Descriptive Statistics

12.3 Inference About the Mean Difference

12.4 Equal Variance t Procedure (Optional)

12.5 Conditions for Inference

12.6 Sample Size and Power

Sample Types (for Comparing Means)

• Single sample. One group; no concurrent control group, comparisons made to external population (Ch 11)

• Paired samples. Two samples w/ each data point in one sample uniquely matched to a point in the other; analyze within-pair differences (Ch 11)

• Two independent samples. Two separate groups; no matching or pairing; compare separate groups

Quantitative outcome

• One sample (§11.1 – §11.4)

• Two samples
  – Paired samples (§11.5)
  – Independent samples (Chapter 12)

What Type of Sample?

1. Measure vitamin content in loaves of bread and see if the average meets national standards

2. Compare vitamin content of bread loaves immediately after baking versus values in same loaves 3 days later

3. Compare vitamin content of bread immediately after baking versus loaves that have been on shelf for 3 days

Answers:
1 = single sample
2 = paired samples
3 = independent samples

Illustrative Example: Cholesterol and Type A & B Personality

Group 1 (Type A personality): 233, 291, 312, 250, 246, 197, 268, 224, 239, 239, 254, 276, 234, 181, 248, 252, 202, 218, 212, 325

Group 2 (Type B personality): 344, 185, 263, 246, 224, 212, 188, 250, 148, 169, 226, 175, 242, 252, 153, 183, 137, 202, 194, 213

Do fasting cholesterol levels differ in Type A and Type B personality men? Data (mg/dl) are a subset from the Western Collaborative Group Study*

* Data set is documented on p. 49 in the text.

SPSS Data Table

• One column for the response variable (chol)

• One column for the explanatory variable (group)
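The same two-column layout can be sketched outside SPSS. The snippet below is only an illustration: the names `type_a`, `type_b`, and `rows` are assumptions made for this example, and the values are the 40 cholesterol readings listed in the illustrative example.

```python
# Cholesterol values (mg/dl) copied from the illustrative example.
type_a = [233, 291, 312, 250, 246, 197, 268, 224, 239, 239,
          254, 276, 234, 181, 248, 252, 202, 218, 212, 325]
type_b = [344, 185, 263, 246, 224, 212, 188, 250, 148, 169,
          226, 175, 242, 252, 153, 183, 137, 202, 194, 213]

# "Long" layout: one row per man, one column per variable.
# group is the explanatory variable, chol is the response variable.
rows = ([{"group": 1, "chol": c} for c in type_a] +
        [{"group": 2, "chol": c} for c in type_b])

for row in rows[:3]:
    print(row)
print(len(rows), "rows in total")
```

Keeping the group label in its own column, rather than one column per group, is what lets software split a single response variable by the explanatory variable.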

§12.2: Exploratory & Descriptive Methods

• Start with EDA

• Compare group shapes, locations and spreads

• Examples of applicable techniques: side-by-side stemplots (right), side-by-side boxplots (next slide)

  Group 1 |  | Group 2
 ---------+--+---------
          |1t|3
          |1f|45
          |1s|67
        98|1.|8889
       110|2*|011
     33332|2t|22
     55544|2f|4455
        76|2s|6
         9|2.|
         1|3*|
         2|3t|
          |3f|4
 (×100)

Side-by-Side Boxplots

[Figure: side-by-side boxplots of cholesterol (mg/dl) by GROUP (1 and 2), N = 20 in each group; vertical axis from 100 to 400]

Interpretation:

• Location: group 1 > group 2

• Spreads: group 1 < group 2

• Shapes: Both fairly symmetrical, outside values in each; no major departures from Normality

Summary Statistics

Group n mean std dev

1 20 245.05 36.64

2 20 210.30 48.34
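As a rough check, the summary statistics above can be recomputed from the raw data with Python's standard `statistics` module. The helper name `summarize` is an assumption made for this sketch; `stdev` is the sample standard deviation (n − 1 in the denominator).

```python
from statistics import mean, stdev

type_a = [233, 291, 312, 250, 246, 197, 268, 224, 239, 239,
          254, 276, 234, 181, 248, 252, 202, 218, 212, 325]
type_b = [344, 185, 263, 246, 224, 212, 188, 250, 148, 169,
          226, 175, 242, 252, 153, 183, 137, 202, 194, 213]

def summarize(values):
    """Return (n, sample mean, sample standard deviation)."""
    return len(values), mean(values), stdev(values)

for label, values in (("1", type_a), ("2", type_b)):
    n, xbar, s = summarize(values)
    print(f"Group {label}: n = {n}, mean = {xbar:.2f}, std dev = {s:.2f}")
```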

§12.3 Inference About Mean Difference (Notation)

Parameters (population)

Group 1 N1 µ1 σ1

Group 2 N2 µ2 σ2

Statistics (sample)

Group 1 n1 $\bar{x}_1$ s1

Group 2 n2 $\bar{x}_2$ s2

$\bar{x}_1 - \bar{x}_2$ is the point estimator of $\mu_1 - \mu_2$

Standard Error of Mean Difference

How precise is $\bar{x}_1 - \bar{x}_2$ as an estimator of $\mu_1 - \mu_2$?

$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Standard error of the mean difference

There are two ways to estimate the degrees of freedom for this SE:

• dfWelch = formula on p. 244 [calculate w/ computer]

• dfconservative = the smaller of (n1 – 1) or (n2 – 1)

For the cholesterol comparison data:

$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{36.638^2}{20} + \frac{48.340^2}{20}} = 13.563$$

$df_{Welch}$ = 35.4 (via SPSS)

$df_{conservative}$ = smaller of (n1 – 1) or (n2 – 1) = 20 – 1 = 19
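The standard error and both degrees-of-freedom choices can be sketched from the summary statistics alone. The function names below are assumptions made for illustration, and `df_welch` uses the usual Welch-Satterthwaite expression, which is assumed here to be the formula the text refers to on p. 244.

```python
from math import sqrt

def se_mean_diff(s1, n1, s2, n2):
    """Unpooled standard error of the difference between two sample means."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def df_welch(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom (assumed to match the text)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def df_conservative(n1, n2):
    """The smaller of (n1 - 1) and (n2 - 1)."""
    return min(n1 - 1, n2 - 1)

# Cholesterol data: s1 = 36.638, s2 = 48.340, n1 = n2 = 20
se = se_mean_diff(36.638, 20, 48.340, 20)
print(f"SE = {se:.3f}")
print(f"df (Welch) = {df_welch(36.638, 20, 48.340, 20):.1f}")
print(f"df (conservative) = {df_conservative(20, 20)}")
```

The conservative df is always less than or equal to the Welch df, so it gives a slightly wider confidence interval and a slightly larger P-value.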

Confidence Interval for µ1–µ2

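(1−α)100% confidence interval for µ1 – µ2:

$$(\bar{x}_1 - \bar{x}_2) \pm (t_{df,\,1-\alpha/2})(SE_{\bar{x}_1 - \bar{x}_2})$$

For the cholesterol comparison data, $SE_{\bar{x}_1 - \bar{x}_2} = 13.563$ and $df_{conserv} = 19$ (prior slide), so the 95% confidence interval is:

$$(\bar{x}_1 - \bar{x}_2) \pm (t_{19,\,.975})(SE_{\bar{x}_1 - \bar{x}_2}) = (245.05 - 210.30) \pm (2.093)(13.563) = 34.75 \pm 28.38 = (6.4 \text{ to } 63.1) \text{ mg/dL}$$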
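The interval arithmetic can be sketched in a few lines. The function name `mean_diff_ci` is an assumption made for this example, and the critical value 2.093 is taken from the slide rather than computed, since the Python standard library has no t-quantile function.

```python
def mean_diff_ci(mean1, mean2, se, t_star):
    """Confidence interval: (mean1 - mean2) +/- t_star * se."""
    diff = mean1 - mean2
    margin = t_star * se
    return diff - margin, diff + margin

# Cholesterol data with conservative df = 19, so t* = t(19, .975) = 2.093
lo, hi = mean_diff_ci(245.05, 210.30, se=13.563, t_star=2.093)
print(f"95% CI for mu1 - mu2: {lo:.1f} to {hi:.1f} mg/dL")
```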

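Comparison of CI Formulas: (point estimate) ± (t*)(SE)

Type of sample | point estimate | df for t* | SE
single | $\bar{x}$ | n – 1 | $s / \sqrt{n}$
paired | $\bar{x}_d$ | n – 1 | $s_d / \sqrt{n}$
independent | $\bar{x}_1 - \bar{x}_2$ | smaller of n1−1 or n2−1 | $\sqrt{s_1^2/n_1 + s_2^2/n_2}$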

Hypothesis Test

A. Hypotheses. H0: μ1 = μ2 against Ha: μ1 ≠ μ2 (two-sided)
[Ha: μ1 > μ2 (right-sided) or Ha: μ1 < μ2 (left-sided)]

B. Test statistic.

$$t_{stat} = \frac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}} \quad \text{where} \quad SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

with $df_{Welch}$ or $df_{conserv}$ (described on previous slide)

C. P-value. Convert the tstat to P-value with t table or software. Interpret.

D. Significance level (optional). Compare P to prior specified α level.

Hypothesis Test – Example

A. Hypotheses. H0: μ1 = μ2 vs. Ha: μ1 ≠ μ2

B. Test stat. In prior analyses we calculated sample mean difference = 34.75 mg/dL, SE = 13.563 and dfconserv = 19.

$$t_{stat} = \frac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}} = \frac{34.75}{13.563} = 2.56 \text{ with } df = 19$$

C. P-value. P = 0.019 → good evidence against H0 (“significant difference”).

D. Significance level (optional). The evidence against H0 is significant at α = 0.02 but not at α = 0.01.
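Step C normally relies on a t table or software. As a rough, standard-library-only sketch, the two-sided P-value can be approximated by numerically integrating the t density with Simpson's rule. The function names `t_pdf` and `two_sided_p` and the step count are assumptions made for this illustration; statistical software would use a dedicated t distribution routine instead.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is approximated with Simpson's rule
    (steps must be even) and doubled by symmetry.
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3
    return 1 - central_area

# Cholesterol example: mean difference 34.75, SE 13.563, conservative df = 19
t_stat = 34.75 / 13.563
p = two_sided_p(t_stat, df=19)
print(f"t = {t_stat:.2f}, df = 19, two-sided P = {p:.3f}")
```

The slide reports t = 2.56 and P = 0.019 for these inputs, which gives a reference point for checking the approximation.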

SPSS Output

[Figure: SPSS output, annotated to mark which results come from the equal variance t procedure (§12.4) and which from the preferred method (§12.3)]

12.4 Equal Variance t Procedure (Optional)

• Also called pooled variance t procedure

• Not as robust as prior method, but…

• Historically important

• Calculated by software programs

• Leads to advanced ANOVA techniques

We start by calculating this pooled estimate of variance:

$$s^2_{pooled} = \frac{(df_1)(s_1^2) + (df_2)(s_2^2)}{df_1 + df_2}$$

where $s_i^2$ is the variance in group $i$ and $df_i = n_i - 1$
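A minimal sketch of the pooled variance calculation follows. The function name `pooled_variance` is an assumption made for this example; the inputs are the group standard deviations and sample sizes from the cholesterol data.

```python
def pooled_variance(s1, n1, s2, n2):
    """Average of the two group variances, weighted by df_i = n_i - 1."""
    df1, df2 = n1 - 1, n2 - 1
    return (df1 * s1**2 + df2 * s2**2) / (df1 + df2)

# Cholesterol data: s1 = 36.64, s2 = 48.34, n1 = n2 = 20
s2_pooled = pooled_variance(36.64, 20, 48.34, 20)
print(f"pooled variance = {s2_pooled:.3f}")
```

With equal sample sizes the weights are equal, so the pooled variance is simply the average of the two group variances.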

Pooled variance procedure

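• The pooled variance is used to calculate this standard error estimate:

$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{s^2_{pooled}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

• Confidence interval:

$$(\bar{x}_1 - \bar{x}_2) \pm (t_{df,\,1-\alpha/2})(SE_{\bar{x}_1 - \bar{x}_2})$$

• Test statistic:

$$t_{stat} = \frac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}}$$

• All with df = df1 + df2 = (n1−1) + (n2−1)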

Pooled Variance t Confidence Interval

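Data:

Group n_i s_i $\bar{x}_i$

1 20 36.64 245.05

2 20 48.34 210.30

$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{1839.623\left(\frac{1}{20} + \frac{1}{20}\right)} = 13.56$$

$$df = (20 - 1) + (20 - 1) = 38$$

$$95\%\text{ CI for } \mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm (t_{38,\,.975})(SE_{\bar{x}_1 - \bar{x}_2}) = (245.05 - 210.30) \pm (2.02)(13.56) = 34.75 \pm 27.39 = (7.36,\ 62.14)$$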

Pooled Variance t Test

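Data:

Group n_i s_i $\bar{x}_i$

1 20 36.64 245.05

2 20 48.34 210.30

$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{1839.623\left(\frac{1}{20} + \frac{1}{20}\right)} = 13.56$$

$$df = (20 - 1) + (20 - 1) = 38$$

$$H_0: \mu_1 = \mu_2$$

$$t_{stat} = \frac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}} = \frac{34.75}{13.56} = 2.56;\quad df = 38$$

$$P = 0.015$$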
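The pooled (equal variance) procedure can be sketched end to end from the summary statistics. The function name `pooled_t` is an assumption made for this example, and the critical value 2.02 is taken from the slide rather than computed.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (mean difference, pooled SE, df, t statistic)."""
    df = (n1 - 1) + (n2 - 1)
    s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(s2_pooled * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, df, diff / se

diff, se, df, t_stat = pooled_t(245.05, 36.64, 20, 210.30, 48.34, 20)

t_star = 2.02  # t(38, .975), value taken from the slide
ci = (diff - t_star * se, diff + t_star * se)

print(f"SE = {se:.2f}, df = {df}, t = {t_stat:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because this sketch carries the unrounded SE through the calculation, the interval endpoints can differ from the slide's (7.36, 62.14) in the second decimal place. With equal sample sizes the pooled SE equals the unpooled SE, so only the df (38 instead of 19 or 35.4) changes the result.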

§12.5 Conditions for Inference

Conditions required for t procedures:

“Validity conditions”

a. Good information (no information bias)

b. Good sample (“no selection bias”)

c. “No confounding”

“Sampling conditions”

a. Independence

b. Normal sampling distribution (§9.5, §11.6)