Categorical Variables, Relative Risk, Odds Ratios

STA 220 – Lecture #8

Categorical Variables

• The raw data from categorical variables consist of group or category names that don’t necessarily have any ordering

• Ordinal variables can be thought of as categorical variables for which the categories have a natural ordering
  – Could define categories for age, income or years of education


• A study of 479 children found that children who slept either with a nightlight or in a fully lit room before the age of two had a higher incidence of myopia (nearsightedness) later in childhood.


Slept with:   No Myopia   Myopia     High Myopia   Total
Darkness      155 (90%)   15 (9%)    2 (1%)        172
Nightlight    153 (66%)   72 (31%)   7 (3%)        232
Full Light    34 (45%)    36 (48%)   5 (7%)        75
Total         342 (71%)   123 (26%)  14 (3%)       479


• Row percents show that the 2 variables are related

• Incidence of myopia increases when the amount of sleep-time light increases

• We can’t conclude that sleeping with light causes myopia, but we can say that the 2 variables are associated



• To analyze the relationship between 2 categorical variables
  – Calculate the percents within either the rows or the columns of the table
  – Row percents, like in the myopia table, are the percents across a row of a contingency table and are based on the total number of observations in the row
  – Column percents are the percents down a column of a contingency table and are based on the total number of observations in the column
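A minimal Python sketch of row and column percents, using the counts from the myopia table. The names MYOPIA, row_percents and column_percents are hypothetical, chosen for illustration; rounding the row percents to whole numbers should reproduce the percents printed in the table.

```python
# Counts from the myopia table: rows are the explanatory variable (light
# while sleeping), columns are the response (No Myopia, Myopia, High Myopia).
MYOPIA = {
    "Darkness": [155, 15, 2],
    "Nightlight": [153, 72, 7],
    "Full Light": [34, 36, 5],
}


def row_percents(table):
    """Each cell as a percent of its row total (percents across a row)."""
    return {
        label: [100 * count / sum(counts) for count in counts]
        for label, counts in table.items()
    }


def column_percents(table):
    """Each cell as a percent of its column total (percents down a column)."""
    columns = list(zip(*table.values()))  # one tuple per column
    col_totals = [sum(col) for col in columns]
    return {
        label: [100 * count / total for count, total in zip(counts, col_totals)]
        for label, counts in table.items()
    }


if __name__ == "__main__":
    for label, pcts in row_percents(MYOPIA).items():
        print(label, [round(p) for p in pcts])
    # Darkness [90, 9, 1]
    # Nightlight [66, 31, 3]
    # Full Light [45, 48, 7]
```

Row percents answer "what fraction of each light group developed myopia", which is the comparison the slides make; column percents instead describe how the children in each myopia category are spread across the light levels.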


• Sometimes, one variable can be designated as the explanatory variable and the other variable as the response variable.

• In these situations it is customary to define the rows using the categories of the explanatory variable and the columns using the categories of the response variable

Categorical Variables

          Ever Divorced?
Smoke?    Yes    No      Total
Yes       238    247     485
No        374    810     1184
Total     612    1057    1669


• Need to calculate row percents to describe the relationship
  – Only interested in comparing divorce rates between smokers and nonsmokers
  – Of those who smoke
    • 49% (238/485) have been divorced
    • 51% (247/485) have not been divorced
  – Of those who don’t smoke
    • 32% (374/1184) have been divorced
    • 68% (810/1184) have not been divorced


In a 1997 poll conducted by the Los Angeles Times, 1218 southern California residents were surveyed about their health and fitness habits. One of the questions was “What is the most important reason why you try to take care of your body: Is it mostly because you want to be attractive to others, or mostly because you want to keep healthy, or mostly because it helps your self-confidence, or what?”

         Healthy   Self-Confidence   Attractive   Don’t know
Men      76%       16%               7%           1%
Women    74%       20%               4%           2%

Notice that the pattern of responses is more or less the same for men and women. It seems reasonable to conclude that the response to the question was not related to the gender of the respondent.

Relative Risk

• When a particular outcome is undesirable, researchers and journalists may describe the risk of that outcome

• The risk that a randomly selected individual within a group falls into the undesirable category is simply the proportion in that category

$$\text{Risk} = \frac{\text{Number in category}}{\text{Total number in group}}$$

Relative Risk

• Common to express risk as a percent rather than a proportion

• Suppose that within a group of 200 individuals, asthma affects 24 people

• In this group the risk of asthma is 24/200 = 0.12, or 12%
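As a small illustration (the function name risk is hypothetical), the risk is just a count divided by the group total, which can then be shown as a proportion or a percent:

```python
def risk(number_in_category, group_total):
    """Proportion of the group that falls in the (undesirable) category."""
    if group_total <= 0:
        raise ValueError("group_total must be positive")
    return number_in_category / group_total


# Asthma example: 24 of 200 people affected.
asthma_risk = risk(24, 200)
print(f"risk = {asthma_risk:.2f}, or {asthma_risk:.0%}")  # risk = 0.12, or 12%
```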

Relative Risk

• Often want to know how the risk of an outcome relates to an explanatory variable

• Use relative risk, which is the ratio of the risks in two different categories of an explanatory variable

$$\text{Relative Risk} = \frac{\text{Risk in category 1}}{\text{Risk in category 2}}$$

Relative Risk

• Relative risk describes the risk in one group as a multiple of the risk in another group

• Suppose that a researcher states that, for those who drive while under the influence of alcohol, the relative risk of an automobile accident is 15
  – The risk of an accident for those who drive under the influence is 15 times the risk for those who don’t drive under the influence

Relative Risk

• Features of relative risk
  – When two risks are the same, the relative risk is 1
  – When two risks are different, the relative risk is different from 1
    • When the category in the numerator has higher risk, the relative risk is greater than 1
  – The risk in the denominator (bottom) of the ratio is often the baseline risk, which is the risk for the category in which no additional treatment or behavior is present

Relative Risk

• Refer back to the smoking and divorce example

• To compute the relative risk of divorce for smokers, we first need to find the risk of divorce in each smoking category
  – For smokers, the risk of divorce is 238/485 = 0.491, or about 49%
  – For nonsmokers, the risk of divorce is 374/1184 = 0.316, or about 32%
    • This will be considered to be the baseline risk of divorce

Relative Risk

• Smoking and divorce example, cont.
  – The relative risk of divorce is the ratio of these two risks:

$$\text{Relative Risk} = \frac{49\%}{32\%} = 1.53$$

  – In this sample, the risk of divorce for smokers is 1.53 times the risk of divorce for nonsmokers
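The same calculation from the raw counts, as a Python sketch with a hypothetical relative_risk function. Note that the exact counts give a relative risk of about 1.55; the value 1.53 on the slide comes from dividing the rounded percents 49% and 32%.

```python
def relative_risk(cases_interest, total_interest, cases_baseline, total_baseline):
    """Risk in the category of interest divided by the baseline risk."""
    risk_interest = cases_interest / total_interest
    risk_baseline = cases_baseline / total_baseline
    return risk_interest / risk_baseline


# Smokers: 238 of 485 ever divorced; nonsmokers (baseline): 374 of 1184.
rr_exact = relative_risk(238, 485, 374, 1184)
# The slide divides the rounded percents instead: 49% / 32%.
rr_rounded = 0.49 / 0.32

print(f"{rr_exact:.2f} {rr_rounded:.2f}")  # 1.55 1.53
```

Rounding the risks before dividing is convenient for hand calculation, but carrying the unrounded proportions through to the end avoids the small discrepancy.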

Percent Increase or Decrease in Risk

• Sometimes an increase or decrease in risk is presented as a percent change instead of a multiple

• The percent increase (or decrease) in risk can be calculated as follows:

$$\text{Percent increase in risk} = (\text{relative risk} - 1) \times 100\%$$

$$\text{Percent increase in risk} = \frac{\text{Difference in risks}}{\text{Baseline risk}} \times 100\%$$

Percent Increase or Decrease in Risk

• Percent increase in the risk of divorce for smokers
  – We’ve already calculated the relative risk, 1.53
  – Percent increase in risk = (1.53 - 1) × 100% = 53%
  – We get the same answer for the percent increase in risk by using the other formula:

$$\text{Percent increase in risk} = \frac{\text{Difference in risks}}{\text{Baseline risk}} \times 100\% = \frac{49 - 32}{32} \times 100\% = 53\%$$
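A sketch of the two percent-increase formulas (the function names are hypothetical). They always agree, because for a risk r1 and a baseline risk r0, (r1/r0 - 1) equals (r1 - r0)/r0; a negative result indicates a percent decrease.

```python
def percent_increase_from_rr(relative_risk):
    """(relative risk - 1) * 100%; negative values are percent decreases."""
    return (relative_risk - 1) * 100


def percent_increase_from_risks(risk_interest, risk_baseline):
    """Difference in risks as a percent of the baseline risk."""
    return (risk_interest - risk_baseline) / risk_baseline * 100


# Smoking and divorce, using the rounded risks 49% (smokers) and 32% (baseline).
print(round(percent_increase_from_rr(49 / 32)))    # 53
print(round(percent_increase_from_risks(49, 32)))  # 53
```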

Odds Ratio

• Sometimes counts for the outcomes of a categorical variable are summarized by comparing the odds of one outcome to another, rather than by comparing one outcome to the total

• For nonsmokers, the odds of getting a divorce to not getting a divorce are 374 divorced to 810 not divorced, or 0.46 divorced to 1 not divorced

Odds Ratio

• For smokers, the odds are 238 divorced to 247 not divorced, or 0.96 to 1
  – Note these are approximately even odds

• Odds are expressed using a phrase with the structure “A to B,” so a ratio is implied but not actually computed
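A sketch (with a hypothetical odds helper) that restates each pair of counts in the "x to 1" form. The odds for smokers, 238 to 247, differ from their risk, 238/485, because the count of one outcome is compared with the other outcome rather than with the total.

```python
def odds(count_outcome, count_other):
    """Odds of one outcome to another, expressed as 'x to 1'."""
    return count_outcome / count_other


# (divorced, not divorced) counts from the smoking/divorce table.
groups = {"nonsmokers": (374, 810), "smokers": (238, 247)}

for name, (divorced, not_divorced) in groups.items():
    print(f"{name}: {divorced} to {not_divorced}, "
          f"or {odds(divorced, not_divorced):.2f} to 1")
# nonsmokers: 374 to 810, or 0.46 to 1
# smokers: 238 to 247, or 0.96 to 1
```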

Odds Ratio

• The odds ratio is used to compare the odds of a certain behavior or event within two different groups
  – May want to compare the odds of success versus failure for two different treatments of clinical depression
  – The odds ratio comparing the odds of divorce for smokers and nonsmokers is

$$\frac{\text{Odds of divorce for smokers}}{\text{Odds of divorce for nonsmokers}} = \frac{238/247}{374/810} = \frac{0.96}{0.46} = 2.1$$
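A sketch of the odds ratio computed from the counts (the function names are hypothetical). The second function uses the algebraically equivalent cross-product form a1*b2 / (a2*b1); both give about 2.09, which rounds to the 2.1 quoted for smokers versus nonsmokers.

```python
def odds_ratio(a1, a2, b1, b2):
    """(a1/a2) / (b1/b2): odds for the category of interest over baseline odds."""
    return (a1 / a2) / (b1 / b2)


def cross_product_ratio(a1, a2, b1, b2):
    """Algebraically equivalent form: a1*b2 / (a2*b1)."""
    return (a1 * b2) / (a2 * b1)


# Smokers: 238 divorced, 247 not; nonsmokers: 374 divorced, 810 not.
or_divorce = odds_ratio(238, 247, 374, 810)
print(f"{or_divorce:.2f}")                               # 2.09
print(f"{cross_product_ratio(238, 247, 374, 810):.2f}")  # 2.09
```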

Odds Ratio

• The value of an odds ratio stays the same if the roles of the response and explanatory variables are reversed

• If we compared the odds of being a smoker for those who have divorced and those who haven’t, the odds ratio is:

$$\frac{\text{Odds of smoking for divorced}}{\text{Odds of smoking for never divorced}} = \frac{238/374}{247/810} = \frac{0.636}{0.305} = 2.1$$
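To check the role-reversal claim numerically, this sketch stores the smoking/divorce counts as a 2x2 list (rows = smoker yes/no, columns = ever divorced yes/no; this layout and the function names are assumptions for illustration) and computes the odds ratio before and after transposing the table.

```python
# 2x2 counts: rows = smoker (yes, no), columns = ever divorced (yes, no).
TABLE = [[238, 247],
         [374, 810]]


def odds_ratio_2x2(t):
    """Odds ratio of a 2x2 table: (t[0][0]/t[0][1]) / (t[1][0]/t[1][1])."""
    return (t[0][0] / t[0][1]) / (t[1][0] / t[1][1])


def transpose(t):
    """Swap the roles of the row and column variables."""
    return [list(col) for col in zip(*t)]


or_divorce = odds_ratio_2x2(TABLE)             # odds of divorce, smokers vs nonsmokers
or_smoking = odds_ratio_2x2(transpose(TABLE))  # odds of smoking, divorced vs never divorced
print(f"{or_divorce:.3f} {or_smoking:.3f}")    # 2.087 2.087
```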

Summary

                        Response Variable
Explanatory Variable    Category 1    Category 2    Total
Category of Interest    A1            A2            TA
Baseline Category       B1            B2            TB

• Risk (of response 1) for category of interest = A1/TA

• Odds (of response 1 to response 2) for category of interest = A1 to A2

$$\text{Relative Risk} = \frac{A_1/T_A}{B_1/T_B}$$

$$\text{Odds Ratio} = \frac{A_1/A_2}{B_1/B_2}$$
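The role-reversal property of the odds ratio can be seen directly in the summary notation. A short derivation, added here for illustration with the same cells A1, A2, B1, B2:

$$\text{Odds Ratio} = \frac{A_1/A_2}{B_1/B_2} = \frac{A_1 B_2}{A_2 B_1}$$

If the roles are reversed, the odds of being in the category of interest rather than the baseline category are A1/B1 among those with response 1 and A2/B2 among those with response 2, so

$$\frac{A_1/B_1}{A_2/B_2} = \frac{A_1 B_2}{A_2 B_1}$$

which is the same quantity. For the smoking and divorce table, both forms equal

$$\frac{238 \times 810}{247 \times 374} \approx 2.09$$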

Confounding Variables

• A confounding variable is a variable that both affects the response variable and also is related to the explanatory variable. The effect of a confounding variable on the response variable cannot be separated from the effect of the explanatory variable

• The term lurking variable is sometimes used to describe a potential confounding variable that is not measured and is not considered in the interpretation of a study

Confounding Variables

Book   # Pages   Price
1      104       32.95
2      188       24.95
3      220       49.95
4      264       79.95
5      336       4.50
6      342       49.95
7      378       4.95
8      385       5.99
9      417       4.95
10     417       39.75
11     436       5.95
12     458       60.00
13     466       49.95
14     469       5.99
15     585       5.95


Confounding Variables

• When we look at the relationship between price (the response variable) and number of pages (the explanatory variable) for each type of book separately, we see that the price does tend to increase with number of pages, especially for the technical books

• Type of book (confounding variable) affects price (response variable), and type of book (confounding variable) is related to number of pages (explanatory variable), because hardcover technical books tend to have fewer pages than paperback novels

Simpson’s Paradox

• Occasionally, the effect of a confounding factor is strong enough to produce a paradox known as Simpson’s Paradox

• The paradox is that the relationship appears to be in a different direction when the confounding variable is not considered than when the data are separated into the categories of the confounding variable

Simpson’s Paradox

• The following hypothetical data are similar to data from several actual studies looking at the association between oral contraceptive use and blood pressure


                 Sample Size   # with High B.P.   % with High B.P.
Use O.C.         800           64                 64 of 800 = 8.0%
Don’t Use O.C.   1600          136                136 of 1600 = 8.5%

Simpson’s Paradox

                  Age 18-34                                Age 35-49
                  Sample Size   n and % with High B.P.     Sample Size   n and % with High B.P.
Use O.C.          600           36 (6%)                    200           28 (14%)
Don’t Use O.C.    400           16 (4%)                    1200          120 (10%)
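Reading the two tables together: within each age group a higher percentage of O.C. users have high blood pressure (6% vs 4%, and 14% vs 10%), yet in the combined table O.C. users have the lower percentage (8.0% vs 8.5%). Most O.C. users are in the younger group (600 of 800), most non-users are in the older group (1200 of 1600), and the older group has higher rates whether or not O.C. are used. A Python sketch that reproduces both views from the counts (the DATA layout and function names are hypothetical):

```python
# Hypothetical O.C. / blood-pressure counts from the slides:
# (number with high B.P., sample size) for each age group.
DATA = {
    "Use O.C.": {"18-34": (36, 600), "35-49": (28, 200)},
    "Don't Use O.C.": {"18-34": (16, 400), "35-49": (120, 1200)},
}


def percent(high, n):
    return 100 * high / n


def stratified(data):
    """Percent with high B.P. within each age group (confounder kept)."""
    return {g: {age: percent(*cell) for age, cell in ages.items()}
            for g, ages in data.items()}


def combined(data):
    """Percent with high B.P. after ignoring age (summing over age groups)."""
    out = {}
    for g, ages in data.items():
        high = sum(h for h, _ in ages.values())
        total = sum(size for _, size in ages.values())
        out[g] = percent(high, total)
    return out


print(stratified(DATA))
print(combined(DATA))
```

Comparing the two printouts shows the direction of the association flipping once age, the confounding variable, is ignored.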
