Chapter 4: More about Relationships Between Two Variables.


Page 1: Chapter 4: More about Relationships Between Two Variables.

Chapter 4: More about Relationships Between Two Variables

Page 2: Chapter 4: More about Relationships Between Two Variables.

4.1 – Transforming to Achieve Linearity – Exponential Growth

Page 3: Chapter 4: More about Relationships Between Two Variables.

Not all data can be expressed with a linear model.

Page 4: Chapter 4: More about Relationships Between Two Variables.

PROBLEM! We cannot use least-squares regression for nonlinear data because least-squares regression depends upon correlation, which only measures the strength of linear relationships.

SOLUTION! Transform the data so that it is linear, then use least-squares regression to find the best-fitting line for the transformed data. Finally, apply the inverse transformation to get an equation that models the original nonlinear data.

Page 5: Chapter 4: More about Relationships Between Two Variables.

Properties of Logarithms

1. log(ab) = log a + log b

2. log(a/b) = log a - log b

3. log(x^p) = p log x

Remember: log has a base of 10 and the natural log (ln) has a base of e. It doesn't matter which one you use, as long as you use the same one throughout a problem.

Page 6: Chapter 4: More about Relationships Between Two Variables.

Linearizing Exponential Functions:

We want to write an exponential function of the form y = ab^x as a linear model (where x and y are variables and a and b are constants).

y = ab^x

log y = log(ab^x)

log y = log a + log(b^x)

log y = log a + x log b

[Graphs: (x, log y) and (x, y)]

Page 7: Chapter 4: More about Relationships Between Two Variables.

CONCLUSIONS:

1. If the graph of (x, y) is exponential, then the graph of (x, log y) is linear.

2. If the graph of (x, log y) is linear, then the graph of (x, y) is exponential.

Page 8: Chapter 4: More about Relationships Between Two Variables.

Example #1: Transform the exponential equation to a linear model using logs and then natural logs.

y = 5(2)^x

log y = log(5 · 2^x)

log y = log 5 + log(2^x)

log y = log 5 + x log 2

log y = 0.69897 + 0.3010x

ln y = ln(5 · 2^x)

ln y = ln 5 + ln(2^x)

ln y = ln 5 + x ln 2

ln y = 1.6094 + 0.6931x
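
The coefficients in Example #1 can be checked numerically. The sketch below is only an illustration (the variable names are not from the slides): it computes the intercept and slope for both the log and ln forms and confirms that each linear form reproduces y at a sample x.

```python
import math

# Example #1's model: y = 5(2)^x, so a = 5 and b = 2.
a, b = 5.0, 2.0

# Linear forms: log y = log a + x log b, and ln y = ln a + x ln b.
log_intercept, log_slope = math.log10(a), math.log10(b)
ln_intercept, ln_slope = math.log(a), math.log(b)

print(f"log y = {log_intercept:.5f} + {log_slope:.4f}x")
print(f"ln y  = {ln_intercept:.4f} + {ln_slope:.4f}x")

# Spot check at an arbitrary x: undoing either log recovers the original y.
x = 3
y = a * b ** x
assert math.isclose(10 ** (log_intercept + log_slope * x), y)
assert math.isclose(math.exp(ln_intercept + ln_slope * x), y)
```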

Page 9: Chapter 4: More about Relationships Between Two Variables.

Example #2

Convert the equation back to an exponential function.

ln y = 16 + 9x

Raise e to the power of each side:

y = e^(16 + 9x)

y = e^(16) · e^(9x)

y = e^(16) · (e^9)^x

y = 8,886,110.521(8103.0839)^x

Page 10: Chapter 4: More about Relationships Between Two Variables.

Example #3

Convert the equation back to an exponential function.

log y = 4 + 2x

Raise 10 to the power of each side:

y = 10^(4 + 2x)

y = 10^(4) · 10^(2x)

y = 10^(4) · (10^2)^x

y = 10,000(100)^x
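
Examples #2 and #3 follow the same recipe: raise the base to the intercept to get a, and raise the base to the slope to get b. A minimal sketch of that recipe, where the helper name `to_exponential` is hypothetical and chosen only for this illustration:

```python
import math

def to_exponential(intercept, slope, base):
    """Convert log_base(y) = intercept + slope*x into y = a * b**x.

    Returns the pair (a, b).
    """
    a = base ** intercept
    b = base ** slope
    return a, b

# Example #2: ln y = 16 + 9x (base e)
a2, b2 = to_exponential(16, 9, math.e)

# Example #3: log y = 4 + 2x (base 10)
a3, b3 = to_exponential(4, 2, 10)

print(f"Example 2: y = {a2:,.3f} * ({b2:.4f})^x")
print(f"Example 3: y = {a3:,} * ({b3})^x")
```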

Page 11: Chapter 4: More about Relationships Between Two Variables.

Calculator Tip: Exponential Functions

L1: x

L2: y

L3: leave blank for now!

L4: log y

LinReg(L1, L4, Y1) - (x, log y, Y1)

To prevent an overflow error, convert years to smaller numbers (for example, years since 1900).

Page 12: Chapter 4: More about Relationships Between Two Variables.

Calculator Tip: Residual Plot

After calculating the line of regression: In Lists!

Page 13: Chapter 4: More about Relationships Between Two Variables.

Calculator Tip: Exponential Equation

ExpReg(L1, L2, Y2) - (x, y, Y2)

Page 14: Chapter 4: More about Relationships Between Two Variables.

Exponential to Linear Change:

1. The ratio of the y’s should be fairly constant

2. Graph x and y and look at the pattern

3. Calculate the transformed linear model

4. Describe the r value and the residual plot

Page 15: Chapter 4: More about Relationships Between Two Variables.

Example #4: Consider the following data on the Asian and Pacific Islander population.

Year                       1950  1960  1970  1980  1990  2000
Population (in thousands)  1131  1620  2320  3330  4770  6850

1. Make a scatterplot of the data and describe the graph.

Page 16: Chapter 4: More about Relationships Between Two Variables.

D: Positive, as year increases, population increases

F: Nonlinear

S: Strong

Page 17: Chapter 4: More about Relationships Between Two Variables.

2. Describe the pattern of change and find the percent of change for each y (ratio of y’s).

1620/1131 = 1.4324

2320/1620 = 1.4321

3330/2320 = 1.4353

4770/3330 = 1.432

6850/4770 = 1.436

The ratios of the y’s are fairly consistent, suggesting an exponential model
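
The ratio check can be automated by dividing each y by the one before it. This is a small sketch using the population values from the table; the variable names are illustrative.

```python
# Population (in thousands) for 1950, 1960, ..., 2000, from Example #4.
population = [1131, 1620, 2320, 3330, 4770, 6850]

# Ratio of each value to the previous one (equal 10-year steps in x).
ratios = [later / earlier for earlier, later in zip(population, population[1:])]

for ratio in ratios:
    print(round(ratio, 4))

# A small spread between the largest and smallest ratio suggests
# roughly constant percent growth, i.e. an exponential pattern.
spread = max(ratios) - min(ratios)
print("spread:", round(spread, 4))
```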

Page 18: Chapter 4: More about Relationships Between Two Variables.

3. Find r and describe its meaning

r = 0.968

D: Positive

S: Strong

Page 19: Chapter 4: More about Relationships Between Two Variables.

4. Graph and comment on the residual plot for x and y.

Curve, not a good linear model

Page 20: Chapter 4: More about Relationships Between Two Variables.

5. Take the log of the y-values and make a new scatterplot.

Transformed data (x, log y):

D: Positive

F: Linear

S: Strong

Original data (x, y), for comparison:

D: Positive

F: Nonlinear

S: Strong

Page 21: Chapter 4: More about Relationships Between Two Variables.

6. Find the least squares regression line of the transformed data.

log(Population) = 2.27095 + 0.0156432(Year), where Year is measured in years since 1900
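
As a cross-check on the calculator output, the regression on (x, log y) can be reproduced with a few lines of code. This is a sketch, not the slides' method: `fit_line` is a hypothetical helper written for this illustration, and it assumes x is measured in years since 1900.

```python
import math

def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)

years_since_1900 = [50, 60, 70, 80, 90, 100]  # assumption: x = year - 1900
population = [1131, 1620, 2320, 3330, 4770, 6850]
log_population = [math.log10(p) for p in population]

# Original data: r is high, but the relationship is curved.
_, _, r_original = fit_line(years_since_1900, population)

# Transformed data: regression of log(y) on x.
slope, intercept, r_transformed = fit_line(years_since_1900, log_population)

print(f"log(Population) = {intercept:.5f} + {slope:.7f}(Year)")
print(f"r (original) = {r_original:.3f}, r (transformed) = {r_transformed:.6f}")
```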

Page 22: Chapter 4: More about Relationships Between Two Variables.

7. Find the value of r and describe its meaning.

r = 0.999999

D: Positive

S: Strong

Compare with step 3 (original data):

r = 0.968

D: Positive

S: Strong

Page 23: Chapter 4: More about Relationships Between Two Variables.

8. Construct the residual plot and describe its meaning.

No pattern, so good linear model

Compare with step 4 (residual plot for the original x and y):

Curve, not a good linear model

Page 24: Chapter 4: More about Relationships Between Two Variables.

9. Perform the inverse transformation to express y-hat as an exponential equation.

log ŷ = 2.27095 + 0.0156432x

Raise 10 to the power of each side:

ŷ = 10^(2.27095 + 0.0156432x)

ŷ = 10^(2.27095) · 10^(0.0156432x)

ŷ = 10^(2.27095) · (10^0.0156432)^x

ŷ = 186.6162(1.0367)^x

Page 25: Chapter 4: More about Relationships Between Two Variables.

10. Check your work on your calculator using ExpReg.

Page 26: Chapter 4: More about Relationships Between Two Variables.

11. Make a prediction for the population in 2010 using both equations.

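Using the linear equation, with x = 110 for the year 2010:

log ŷ = 2.27095 + 0.0156432(110)

log ŷ = 3.991697

ŷ = 10^3.991697

ŷ = 9810.6342

Using the exponential equation:

ŷ = 186.6162(1.0367)^x

ŷ = 186.6162(1.0367)^110

ŷ = 9,810.6342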
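
The two routes to the 2010 prediction can be compared in code. The sketch below starts from the fitted line log ŷ = 2.27095 + 0.0156432x and keeps full precision for a and b; rounding b to 1.0367 before raising it to the 110th power would shift the answer noticeably, which is why the unrounded value is used here.

```python
# Fitted line on the transformed data (Example #4, step 6).
intercept, slope = 2.27095, 0.0156432

# Inverse transformation: y-hat = a * b**x, with a = 10^intercept, b = 10^slope.
a = 10 ** intercept
b = 10 ** slope

x = 2010 - 1900  # years since 1900

# Route 1: predict log(y), then undo the log.
prediction_from_log = 10 ** (intercept + slope * x)

# Route 2: plug x straight into the exponential equation.
prediction_from_exp = a * b ** x

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"log route: {prediction_from_log:.1f} thousand")
print(f"exp route: {prediction_from_exp:.1f} thousand")
```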

Page 27: Chapter 4: More about Relationships Between Two Variables.

Example #5: Consider the following data representing an account balance over time:

x: time (months)         0    48      96      144     192     240
y: account balance ($)   100  161.22  259.93  419.06  675.62  1089.30

1. Make a scatterplot of the data and describe the graph.

Page 28: Chapter 4: More about Relationships Between Two Variables.

D: Positive, as time increases, account balance increases

F: Nonlinear

S: Strong

Page 29: Chapter 4: More about Relationships Between Two Variables.

2. Describe the pattern of change and find the percent of change for each y (ratio of y’s).

161.22/100 = 1.6122

259.93/161.22 = 1.6123

419.06/259.93 = 1.6122

675.62/419.06 = 1.612

1089.30/675.62 = 1.6123

Page 30: Chapter 4: More about Relationships Between Two Variables.

3. Find r and describe its meaning

r = 0.9481

D: Positive

S: Strong

Page 31: Chapter 4: More about Relationships Between Two Variables.

4. Graph and comment on the residual plot for x and y.

Curved, not good linear model

Page 32: Chapter 4: More about Relationships Between Two Variables.

5. Take the natural log of the y-values and make a new scatterplot.

Original data (x, y), for comparison:

D: Positive

F: Nonlinear

S: Strong

Transformed data (x, ln y):

D: Positive

F: Linear

S: Strong

Page 33: Chapter 4: More about Relationships Between Two Variables.

6. Find the least squares regression line of the transformed data.

ln(Account Balance) = 4.60516 + 0.00995047(Months)

Page 34: Chapter 4: More about Relationships Between Two Variables.

7. Find r and describe its meaning.

r = 0.999999

D: Positive

S: Strong

Compare with step 3 (original data):

r = 0.9481

D: Positive

S: Strong

Page 35: Chapter 4: More about Relationships Between Two Variables.

8. Construct the residual plot and describe its meaning.

No pattern, so good linear model

Compare with step 4 (residual plot for the original x and y):

Curved, not good linear model

Page 36: Chapter 4: More about Relationships Between Two Variables.

9. Perform the inverse transformation to express y-hat as an exponential equation.

ln ŷ = 4.60516 + 0.00995047x

Raise e to the power of each side:

ŷ = e^(4.60516 + 0.00995047x)

ŷ = e^(4.60516) · e^(0.00995047x)

ŷ = e^(4.60516) · (e^0.00995047)^x

ŷ = 99.9988(1.01)^x

Page 37: Chapter 4: More about Relationships Between Two Variables.

10. Check your work on your calculator using ExpReg.

Page 38: Chapter 4: More about Relationships Between Two Variables.

11. Make a prediction for the account balance in 60 months using both equations.

Using the linear equation:

ln ŷ = 4.60516 + 0.00995047(60)

ln ŷ = 5.20218656728

ŷ = e^5.20218656728

ŷ = $181.67

Using the exponential equation:

ŷ = 99.9988(1.01)^x

ŷ = 99.9988(1.01)^60

ŷ = $181.67

Page 39: Chapter 4: More about Relationships Between Two Variables.

4.1 – Transforming to Achieve Linearity – Power Model

Page 40: Chapter 4: More about Relationships Between Two Variables.

A power model is in the form y = ax^p. To transform this equation into a linear model you must apply the log transformation to both variables x and y.

y = ax^p

log y = log(ax^p)

log y = log a + log(x^p)

log y = log a + p log x

How is this different from exponential functions?

You have to take the log of both x and y to make a linear model.

Page 41: Chapter 4: More about Relationships Between Two Variables.

Example #6: Find the linear form of the equation by taking logs and then natural logs.

y = 4x^5

log y = log(4x^5)

log y = log 4 + log(x^5)

log y = log 4 + 5 log x

log y = 0.6021 + 5 log x

y = 4x^5

ln y = ln(4x^5)

ln y = ln 4 + ln(x^5)

ln y = ln 4 + 5 ln x

ln y = 1.3863 + 5 ln x

Page 42: Chapter 4: More about Relationships Between Two Variables.

Example #7

Convert the equation back to a power equation.

ln y = -5 + 9 ln x

Raise e to the power of each side:

y = e^(-5 + 9 ln x)

y = e^(-5) · e^(9 ln x)

y = e^(-5) · (e^(ln x))^9

y = 0.0067x^9

Page 43: Chapter 4: More about Relationships Between Two Variables.

Example #8

Convert the equation back to a power equation.

log y = 0.5 + 2 log x

Raise 10 to the power of each side:

y = 10^(0.5 + 2 log x)

y = 10^(0.5) · 10^(2 log x)

y = 10^(0.5) · (10^(log x))^2

y = 3.1623x^2

Page 44: Chapter 4: More about Relationships Between Two Variables.

Calculator Tip: Power Functions

L1: x

L2: y

L3: log x

L4: log y

LinReg(L3, L4, Y1) - (log x, log y, Y1)

Page 45: Chapter 4: More about Relationships Between Two Variables.

Calculator Tip: Power Equation

PwrReg(L1, L2, Y2) - (x, y, Y2)

Page 46: Chapter 4: More about Relationships Between Two Variables.

Example #9

The distances from our sun and the periods of the 9 planets in the solar system are given below.

Distance (astronomical units)  .39  .72  1  1.5  5.2  9.5  19  30   40
Period (earth years)           .24  .62  1  1.9  12   29   84  160  250

1. Make a scatterplot of the data and describe the graph.

Page 47: Chapter 4: More about Relationships Between Two Variables.

D: Positive, as distance increases, period increases

F: Nonlinear

S: Strong

Page 48: Chapter 4: More about Relationships Between Two Variables.

2. Describe the pattern of change and find the percent of change for each y (ratio of y’s).

0.62/0.24 = 2.583

1/0.62 = 1.61

1.9/1 = 1.9

12/1.9 = 6.316

29/12 = 2.42

84/29 = 2.90

160/84 = 1.90

250/160 = 1.56

The ratios of the y’s are not similar, so the relationship is perhaps not exponential

Page 49: Chapter 4: More about Relationships Between Two Variables.

3. Find r and describe its meaning

r = 0.9779

D: Positive

S: Strong

Page 50: Chapter 4: More about Relationships Between Two Variables.

4. Graph an exponential model and discuss if it is appropriate to use this model.

Curved, not good linear model

Page 51: Chapter 4: More about Relationships Between Two Variables.

5. Transform the data to a linear model by taking the log of the x’s and the y’s. Make a sketch of the new scatterplot.

Original data (x, y), from step 1:

D: Positive

F: Nonlinear

S: Strong

Transformed data (log x, log y):

D: Positive

F: Linear

S: Strong

Page 52: Chapter 4: More about Relationships Between Two Variables.

6. Find the least squares regression line of the transformed data.

log(Period) = 0.002916 + 1.49627log(Distance)
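
The power-model fit can be reproduced by regressing log(Period) on log(Distance). This is a sketch for checking the calculator result, not the slides' method: `fit_line` is a hypothetical helper defined here, and the final line predicts the period at 35 astronomical units.

```python
import math

def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)

distance = [0.39, 0.72, 1, 1.5, 5.2, 9.5, 19, 30, 40]  # astronomical units
period = [0.24, 0.62, 1, 1.9, 12, 29, 84, 160, 250]    # earth years

# Power model: take the log of BOTH variables before fitting.
log_distance = [math.log10(d) for d in distance]
log_period = [math.log10(p) for p in period]
slope, intercept, r = fit_line(log_distance, log_period)

# Inverse transformation: y-hat = a * x**p, with a = 10^intercept and p = slope.
a, p = 10 ** intercept, slope
predicted_period = a * 35 ** p

print(f"log(Period) = {intercept:.6f} + {slope:.5f} log(Distance), r = {r:.7f}")
print(f"Predicted period at 35 AU: {predicted_period:.1f} earth years")
```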

Page 53: Chapter 4: More about Relationships Between Two Variables.

7. Find the value of r and describe its meaning.

r = 0.9999765

D: Positive

S: Strong

Compare with step 3 (original data):

r = 0.9779

D: Positive

S: Strong

Page 54: Chapter 4: More about Relationships Between Two Variables.

8. Construct the residual plot and describe its meaning.

No pattern, so good linear model

Page 55: Chapter 4: More about Relationships Between Two Variables.

9. Perform the inverse transformation to express y-hat as a power equation.

log ŷ = 0.002916 + 1.49627 log x

Raise 10 to the power of each side:

ŷ = 10^(0.002916 + 1.49627 log x)

ŷ = 10^(0.002916) · 10^(1.49627 log x)

ŷ = 10^(0.002916) · (10^(log x))^1.49627

ŷ = 1.0067x^1.49627

Page 56: Chapter 4: More about Relationships Between Two Variables.

10. Check your work on your calculator using PwrReg.

Page 57: Chapter 4: More about Relationships Between Two Variables.

11. If a planet were discovered 35 astronomical units from our sun, predict its period using both equations.

Using the linear equation:

log ŷ = 0.002916 + 1.49627 log x

log ŷ = 0.002916 + 1.49627 log(35)

log ŷ = 2.31326

ŷ = 10^2.31326

ŷ = 205.709

Using the power equation:

ŷ = 1.0067x^1.49627

ŷ = 1.0067(35)^1.49627

ŷ = 205.709

Page 58: Chapter 4: More about Relationships Between Two Variables.

How do you determine if the model is exponential or power?

1. Graph the original data. Do you see a curve?

2. Look for the ratio of the y values to see if maybe exponential

3. Take the logs of both x and y. Then graph (x, log y) and (log x, log y). Which graph looks more linear?

4. Use the r value and the residual plot to determine the strength of the linear relationship.

Page 59: Chapter 4: More about Relationships Between Two Variables.

Example #10: An experiment was conducted to determine the effect of practice time (in seconds) on the percent of unfamiliar words recalled. Here is a Fathom scatterplot of the results with a least-squares regression line superimposed.

(a) Sketch a residual plot below.

Page 60: Chapter 4: More about Relationships Between Two Variables.

(b) Does a linear model fit the data well? Justify your answer.

No, the residual plot has a curve in it, so it isn’t linear

Page 61: Chapter 4: More about Relationships Between Two Variables.

We used Fathom to transform the original data in hopes of achieving linearity. The screen shots below show the results of two different transformations.

(c) Would an exponential model or a power model fit the original data better? Justify your answer.

A power model: it has the stronger r value, and its residual plot is not as curved.

Page 62: Chapter 4: More about Relationships Between Two Variables.

(d) Use the model you chose in (c) to predict word recall for 25 seconds of practice. Show your method.

ln ŷ = 3.48 + 0.293 ln x

ln ŷ = 3.48 + 0.293 ln(25)

ln ŷ = 4.42313

ŷ = e^4.42313

ŷ = 83.3568

Page 63: Chapter 4: More about Relationships Between Two Variables.

Example #11

Foresters are interested in predicting the amount of usable lumber they can harvest from various tree species. The following data have been collected on the diameter of Ponderosa pine trees, measured at chest height, and the yield in board feet. Note that a board foot is defined as a piece of lumber 12 inches by 12 inches by 1 inch. Determine if an exponential or power model would make a better model. Support your reasoning. Using the model you have chosen, predict the yield in board feet from a diameter of 40.

Page 64: Chapter 4: More about Relationships Between Two Variables.

Diameter Bd Feet

36 192

28 113

28 88

41 294

19 28

32 123

22 51

38 252

25 56

17 16

31 141

20 32

25 86

19 21

39 231

33 187

17 22

37 205

23 57

39 265

1. Graph the original data. Do you see a curve?

yes

Page 65: Chapter 4: More about Relationships Between Two Variables.

2. Look for the ratio of the y values to see if maybe exponential

no

Page 66: Chapter 4: More about Relationships Between Two Variables.

3. Take the logs of both x and y. Then graph (x, log y) and (log x, log y). Which graph looks more linear?

The (log x, log y) graph, which corresponds to the power model, is more linear.

Page 67: Chapter 4: More about Relationships Between Two Variables.

4. Use the r value and the residual plot to determine the strength of the linear relationship.

Exponential (x, log y): r = 0.9751

Power (log x, log y): r = 0.9880

The power transformation has the stronger r value and no curve in its residual plot; therefore, a power model is the better choice.
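
The comparison of the two candidate models can be sketched in code by fitting both transformations to the lumber data and comparing r. The helper `fit_line` is hypothetical and written for this illustration; natural logs are used, which matches the ln form of the prediction equation in this example.

```python
import math

def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)

diameter = [36, 28, 28, 41, 19, 32, 22, 38, 25, 17,
            31, 20, 25, 19, 39, 33, 17, 37, 23, 39]
board_feet = [192, 113, 88, 294, 28, 123, 51, 252, 56, 16,
              141, 32, 86, 21, 231, 187, 22, 205, 57, 265]

ln_x = [math.log(d) for d in diameter]
ln_y = [math.log(b) for b in board_feet]

# Exponential candidate: (x, ln y). Power candidate: (ln x, ln y).
_, _, r_exponential = fit_line(diameter, ln_y)
slope, intercept, r_power = fit_line(ln_x, ln_y)

# Predict the yield at a diameter of 40 with the power model.
predicted_yield = math.exp(intercept + slope * math.log(40))

print(f"r exponential = {r_exponential:.4f}, r power = {r_power:.4f}")
print(f"ln(y-hat) = {intercept:.4f} + {slope:.5f} ln(x)")
print(f"Predicted yield at diameter 40: {predicted_yield:.1f} board feet")
```

The r values alone do not settle the choice; the residual plots should still be inspected, as step 4 says.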

Page 68: Chapter 4: More about Relationships Between Two Variables.

Using the model you have chosen, predict the yield in board feet from a diameter of 40.

ln ŷ = -5.9157 + 3.13667 ln x

ln ŷ = -5.9157 + 3.13667 ln(40)

ln ŷ = 5.6551

ŷ = e^5.6551

ŷ = 285.7443

Page 69: Chapter 4: More about Relationships Between Two Variables.

4.2 – Relationship between Categorical Variables

Page 70: Chapter 4: More about Relationships Between Two Variables.

http://www.ruf.rice.edu/~lane/stat_sim/transformations/index.html

Page 71: Chapter 4: More about Relationships Between Two Variables.

Because we cannot perform direct calculation on categorical data, we use the counts or percents of individuals by category.

Two-Way Table: Classifies categorical data according to two variables.

Marginal Distribution: The distribution of one variable alone, found from the row totals or the column totals in the margins of the table.

Conditional Distribution: Distribution of one variable for given categories of another variable.

Page 72: Chapter 4: More about Relationships Between Two Variables.

Segmented bar graph:

The following segmented bar graph represents the conditional distributions of living arrangements for each race category:

Each bar adds up to 100%; its segments show the conditional percentages within that category.

Page 73: Chapter 4: More about Relationships Between Two Variables.

Example #12: In a national survey of adult Americans in 1998, people were asked to indicate their age and to classify their interest in politics as very much, somewhat, or not much. The ages were grouped in ranges.

            18-35  36-55  56-94
Not Much     146    146     89
Somewhat     192    260    154
Very Much     47    125    106

a. Calculate the row and column totals.

            18-35  36-55  56-94  Total
Not Much     146    146     89    381
Somewhat     192    260    154    606
Very Much     47    125    106    278
Total        385    531    349   1265

Page 74: Chapter 4: More about Relationships Between Two Variables.

b. What proportion of the survey respondents were between ages 18 and 35?

385/1265 = 0.3043

Page 75: Chapter 4: More about Relationships Between Two Variables.

c. What proportion of the survey respondents were between 36 and 55?

531/1265 = 0.41976

Page 76: Chapter 4: More about Relationships Between Two Variables.

d. What proportion of the survey respondents were between 56 and 94?

349/1265 = 0.27589

Page 77: Chapter 4: More about Relationships Between Two Variables.

e. Restrict your attention (for the moment) to just the respondents aged 18 to 35. What proportion of these young respondents classify themselves as having not much interest in politics?

146/385 = 0.3792

Page 78: Chapter 4: More about Relationships Between Two Variables.

f. What proportion of the young respondents classify themselves as somewhat interested in politics?

192/385 = 0.4987

Page 79: Chapter 4: More about Relationships Between Two Variables.

g. What proportion of the young respondents classify themselves as having very much interest in politics?

47/385 = 0.1221

Page 80: Chapter 4: More about Relationships Between Two Variables.

h. Record the conditional distribution that you have just calculated in the table below.

            18-35   36-55   56-94
Not Much    0.3792  0.2750  0.2550
Somewhat    0.4987  0.4896  0.4413
Very Much   0.1221  0.2354  0.3037
Total       1.000   1.000   1.000
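
The marginal and conditional distributions in this example can be computed directly from the counts. The sketch below uses plain dictionaries and lists; the variable names are illustrative and not part of the original example.

```python
ages = ["18-35", "36-55", "56-94"]
counts = {
    "Not Much":  [146, 146, 89],
    "Somewhat":  [192, 260, 154],
    "Very Much": [47, 125, 106],
}

# Marginal totals: sum across each row and down each column.
row_totals = {interest: sum(row) for interest, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(col_totals)

# Marginal distribution of age: each column total over the grand total.
marginal_age = [total / grand_total for total in col_totals]

# Conditional distribution of interest given age:
# each cell divided by its own column total.
conditional = {
    interest: [cell / total for cell, total in zip(row, col_totals)]
    for interest, row in counts.items()
}

for interest, props in conditional.items():
    print(f"{interest:10s}", "  ".join(f"{p:.4f}" for p in props))
```

Each column of the conditional table sums to 1, which is what makes the bars in a segmented bar graph the same height.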

Page 81: Chapter 4: More about Relationships Between Two Variables.

i. Construct a segmented bar graph

[Segmented bar graph: percentage (0 to 1) on the vertical axis, age category (18-35, 36-55, 56-94) on the horizontal axis, with segments for Very Much, Somewhat, and Not Much]

Page 82: Chapter 4: More about Relationships Between Two Variables.

Example #13: The University of California at Berkeley was charged with having discriminated against women in their graduate admissions process for the fall quarter of 1973. The table below identifies the number of acceptances and denials for both men and women applicants in each of the six largest graduate programs at the institution at that time.

            Men Accepted  Men Denied  Women Accepted  Women Denied
Program A       511          314            89             19
Program B       352          208            17              8
Program C       120          205           202            391
Program D       137          270           132            243
Program E        53          138            95            298
Program F        22          351            24            317
Total          1195         1486           559           1276

Page 83: Chapter 4: More about Relationships Between Two Variables.

a. Start by ignoring the program distinction, collapsing the data into a two-way table of gender by admissions status. To do this, find the total number of men accepted and denied and the total number of women accepted and denied. Fill in the table below:

        Accepted  Denied  Total
Men       1195     1486    2681
Women      559     1276    1835
Total     1754     2762    4516

Page 84: Chapter 4: More about Relationships Between Two Variables.

b. Consider for the moment just the men applicants. Of the men who applied to one of these programs, what proportion were accepted? Now consider the women applicants; what proportion of them were accepted? Do these proportions seem to support the claim that men were given preferential treatment in admissions decisions?

Men: 1195/2681 = 0.4457

Women: 559/1835 = 0.3046

Page 85: Chapter 4: More about Relationships Between Two Variables.

c. To try to isolate the program responsible for the alleged mistreatment of women applicants, calculate the proportion of men and the proportion of women within each program who were accepted. Record your results in the table below:

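Within each program, divide the number accepted by the total number of applicants of that gender to that program (accepted + denied):

            Proportion of men Accepted   Proportion of women Accepted
Program A   511/825 = 0.6194             89/108 = 0.8241
Program B   352/560 = 0.6286             17/25 = 0.6800
Program C   120/325 = 0.3692             202/593 = 0.3406
Program D   137/407 = 0.3366             132/375 = 0.3520
Program E   53/191 = 0.2775              95/393 = 0.2417
Program F   22/373 = 0.0590              24/341 = 0.0704

Page 86: Chapter 4: More about Relationships Between Two Variables.

d. Does it seem as if any program is responsible for the large discrepancy between men and women in the overall proportions admitted?

No single program accounts for it. Within each program, the acceptance rate for women is close to or higher than the rate for men (higher in Programs A, B, D, and F; slightly lower in C and E). The overall gap arises because most men applied to Programs A and B, which accepted most of their applicants, while most women applied to Programs C through F, which accepted far fewer.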
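
The within-program acceptance rates asked for in part (c) divide each program's acceptances by that program's own applicants of the same gender. The sketch below computes those rates alongside the overall rates from part (b); the dictionary layout and names are illustrative only.

```python
# (men accepted, men denied, women accepted, women denied) per program
programs = {
    "A": (511, 314, 89, 19),
    "B": (352, 208, 17, 8),
    "C": (120, 205, 202, 391),
    "D": (137, 270, 132, 243),
    "E": (53, 138, 95, 298),
    "F": (22, 351, 24, 317),
}

# Acceptance rate within each program, separately for men and women.
rates = {}
for name, (m_acc, m_den, w_acc, w_den) in programs.items():
    rates[name] = (m_acc / (m_acc + m_den), w_acc / (w_acc + w_den))
    print(f"Program {name}: men {rates[name][0]:.4f}, women {rates[name][1]:.4f}")

# Overall rates after collapsing over programs (part b).
men_accepted = sum(p[0] for p in programs.values())
men_total = sum(p[0] + p[1] for p in programs.values())
women_accepted = sum(p[2] for p in programs.values())
women_total = sum(p[2] + p[3] for p in programs.values())
overall_men = men_accepted / men_total
overall_women = women_accepted / women_total
print(f"Overall: men {overall_men:.4f}, women {overall_women:.4f}")
```

Comparing the per-program rows with the overall row shows how the choice of program, not the decisions within any one program, drives the overall difference.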

Page 87: Chapter 4: More about Relationships Between Two Variables.

Example #14: The following two-way table classifies hypothetical hospital patients according to the hospital that treated them and whether they survived or died:

            Survived  Died  Total
Hospital A     800     200   1000
Hospital B     900     100   1000

a. Calculate the proportion of hospital A’s patients who survived and the proportion of hospital B’s patients who survived. Which hospital saved the higher percentage of its patients?

Hospital A: 800/1000 = 0.80

Hospital B: 900/1000 = 0.90

Page 88: Chapter 4: More about Relationships Between Two Variables.

Suppose that when we further categorize each patient according to whether they were in fair condition or poor condition prior to treatment we obtain the following two-way table:

FAIR CONDITION Survived Died Total

Hospital A 590 10 600

Hospital B 870 30 900

POOR CONDITION Survived Died Total

Hospital A 210 190 400

Hospital B 30 70 100

Page 89: Chapter 4: More about Relationships Between Two Variables.

b. Among those who were in fair condition, compare the recovery rates for the two hospitals. Which hospital saved the greater percentage of its patients who had been in fair condition?

Hospital A: 590/600 = 0.9833

Hospital B: 870/900 = 0.9667

Page 90: Chapter 4: More about Relationships Between Two Variables.

c. Among those who were in poor condition, compare the recovery rates for the two hospitals. Which hospital saved the greater percentage of its patients who had been in poor condition?

Hospital A: 210/400 = 0.525

Hospital B: 30/100 = 0.3

Page 91: Chapter 4: More about Relationships Between Two Variables.

Simpson’s Paradox:

An association that holds within each of several groups can reverse direction when the data from the groups are combined.
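
The reversal can be seen by computing survival rates from the Example #14 counts, first within each condition and then after combining. This is a short sketch with illustrative names.

```python
# (survived, died) by hospital and prior condition, from Example #14.
patients = {
    "Hospital A": {"fair": (590, 10), "poor": (210, 190)},
    "Hospital B": {"fair": (870, 30), "poor": (30, 70)},
}

def survival_rate(survived, died):
    return survived / (survived + died)

by_condition = {}
overall = {}
for hospital, groups in patients.items():
    # Rate within each condition group.
    by_condition[hospital] = {
        condition: survival_rate(*cell) for condition, cell in groups.items()
    }
    # Rate after combining the groups.
    total_survived = sum(cell[0] for cell in groups.values())
    total_died = sum(cell[1] for cell in groups.values())
    overall[hospital] = survival_rate(total_survived, total_died)
    print(hospital, by_condition[hospital], "overall:", overall[hospital])
```

Hospital A has the higher rate in both condition groups, yet Hospital B has the higher rate overall, because Hospital A treated far more patients in poor condition (400 versus 100).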

Page 92: Chapter 4: More about Relationships Between Two Variables.

d. Write a few sentences explaining (arguing from the given data) how it happens that hospital B has the higher recovery rate overall, yet hospital A has the higher recovery rate for each type of patient.

e. Which hospital would you rather go to if you were ill? Explain.

Page 93: Chapter 4: More about Relationships Between Two Variables.

4.3 – Establishing Causation

Page 94: Chapter 4: More about Relationships Between Two Variables.

The only time you can determine causation is when you conduct an experiment.

What if you can’t do an experiment?

• Look for a strong, consistent association

• Larger values of the explanatory variable are associated with stronger responses

• The cause is plausible

Page 95: Chapter 4: More about Relationships Between Two Variables.

x causes y

Page 96: Chapter 4: More about Relationships Between Two Variables.

Common response: it seems that x causes y, but z has an effect on both x and y, making it look like x causes y

“Z is common to both”

Page 97: Chapter 4: More about Relationships Between Two Variables.

Confounding: it seems that x causes y, but z also has an effect on y, so the effects of x and z on y cannot be separated

Page 98: Chapter 4: More about Relationships Between Two Variables.

Example #15: A soccer coach wanted to improve the team's playing ability, so he had them run two miles a day. At the same time the players decided to take vitamins. In two weeks the team was playing noticeably better, but the coach and players did not know whether it was from the running or the vitamins. What type of variable is this?

Confounding.

[Diagram: running and vitamins both point to improved team ability]

Page 99: Chapter 4: More about Relationships Between Two Variables.

Example #16: An article that appeared in the San Luis Obispo Tribune (November 11, 1999) was titled “Study Points Out Dangerous Side to SUV Popularity: Half of All 1996 Ejection Deaths Occur in SUVs.” This article states that SUVs have a much higher rate of passengers being thrown from a window during an accident than do automobiles. The article also states that more than half of all deaths caused by ejection involved SUVs – the basis for the conclusion that SUVs are more dangerous than cars. Later in the article, there is a comment that about 98% of those injured or killed in ejection accidents were not wearing seat belts. Comment on the conclusion that SUVs are more dangerous than cars.

Confounding variable

Page 100: Chapter 4: More about Relationships Between Two Variables.

[Diagram labels: SUV, Roll over, Seat belts]

Page 101: Chapter 4: More about Relationships Between Two Variables.

Example #17: A study showed that households with more TV sets tend to have longer life expectancies. Describe a possible common response relationship.

[Diagram: more money ($) points to both more TVs and longer life expectancy]

COMMON RESPONSE!

Page 102: Chapter 4: More about Relationships Between Two Variables.

Example #18: Based on a survey conducted on the DietSmart.com website, investigators concluded that women who regularly watched Oprah were only one-seventh as likely to crave fattening foods as those who watched other daytime talk shows. Is it reasonable to conclude that watching Oprah causes a decrease in cravings for fattening foods? Explain.

No. This was a survey, not an experiment, so it cannot establish causation.