Statistics: Analyzing 2 Categorical Variables MIDDLE SCHOOL LEVEL Session #1 Presented by: Dr. Del Ferster Immaculata Week 2014 July 28—August 1, 2014


Page 2: Some questions to get us started

Why are statistics significant? Why should we have young students be aware of statistics?

What kind of statistics can we consider with elementary students?

Why do many people who have studied statistics have “bad memories” of the subject?

Page 3: What’s in store for today?

We’re going to spend time today on QUALITATIVE STATISTICS.

We’ll consider effective ways to summarize qualitative statistics.

We’ll build TWO WAY TABLES.

We’ll do an activity involving qualitative statistics that you might be able to adapt for use with your students.

Page 4: Some Basic Definitions

Page 5: Qualitative Variables/Categorical Variables

Qualitative variables classify the data into categories.

The categories may or may not have a natural ordering to them.

Qualitative variables are also called categorical variables.

EXAMPLES
◦ Eye color
◦ Political party
◦ Gender
◦ Do you smoke?

Page 6: Quantitative Variables

Quantitative variables have numerical values that are measurements (length, weight, and so on) or counts (of how many).

Examples:
◦ How many are in your family?
◦ How many cars do you own?

Page 7: More on Quantitative Variables

We further distinguish quantitative variables based on whether or not the values fall on a continuum.
◦ A discrete variable is one for which you can count the number of possible values. Example: how many siblings a person has.
◦ A continuous variable can take on any value within a given interval. Example: a person’s weight.

Page 8: Quantitative Variables

We’ll take a closer look at quantitative variables during our next meeting.

Page 9: 1 Categorical Variable

A look at ways to represent our data

Page 10: Distribution of a categorical variable

The distribution of a categorical variable provides the possible values that a variable can take on and how often these possible values occur.

The distribution of a categorical variable shows the pattern of variation of the variable.

Page 11: Example #1

According to the Bureau of Justice, the following data represent the number of inmates by ethnicity in 2007.

Ethnicity   Inmates
White       338,400
Black       301,900
Hispanic    125,600
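
If you would like to check a distribution by computer, here is a minimal Python sketch (not part of the original slides) that turns the inmate counts into percents. The names `counts` and `distribution` are illustrative choices, not anything the presentation defines.

```python
# Inmate counts by ethnicity, copied from Example #1.
counts = {"White": 338_400, "Black": 301_900, "Hispanic": 125_600}


def distribution(counts):
    """Return each category's share of the total, as a percent."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


percents = distribution(counts)

# Print the categories, how often each occurs, and its percent of the total.
for category, pct in percents.items():
    print(f"{category:<10}{counts[category]:>10,}{pct:>8.1f}%")
```

The percents are shares of the three listed groups only, since those are the only categories given on the slide.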

Page 12: Graphing Qualitative Data

Often, rather than simply presenting numerical values, we choose to graph our data.

When generating a graph of 1 categorical variable, we might consider the following types of graph.
◦ Pie Chart
◦ Bar Graph

Page 13: Pie Chart

A pie chart displays the distribution of the qualitative variable by dividing the circle into wedges corresponding to the categories of the variable such that the angle of each wedge is proportional to the percentage of items in that category.

Pie Charts are easy to do in EXCEL.
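
For students drawing a pie chart by hand with a protractor, the wedge angles can be worked out directly. The sketch below is one possible way to do it in Python, using the inmate counts from Example #1; the function name `wedge_angles` is made up for this illustration.

```python
# Inmate counts by ethnicity, copied from Example #1.
counts = {"White": 338_400, "Black": 301_900, "Hispanic": 125_600}


def wedge_angles(counts):
    """Give each category a wedge angle (in degrees) proportional to its share."""
    total = sum(counts.values())
    # A full circle is 360 degrees, so a category's angle is its fraction times 360.
    return {category: 360 * n / total for category, n in counts.items()}


angles = wedge_angles(counts)
for category, angle in angles.items():
    print(f"{category:<10}{angle:6.1f} degrees")
```

Because every angle is the same fraction of 360 that the category is of the total, the wedges always fill the circle exactly.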

Page 14: A Pie Chart for the Prison Data

[Pie chart of the inmate counts: White 338,400; Black 301,900; Hispanic 125,600]

Page 15: A Pie Chart for the Prison Data (Using Percents)

[Pie chart of the same data, labeled with percents: White 338,400; Black 301,900; Hispanic 125,600]

Page 16: Bar Graph

A bar graph displays the distribution of a qualitative variable by listing the categories of the variable along one axis and drawing a bar over each category with a height equal to the count of items in that category.

The bars should all be of equal width. We could also do one using percents.

Page 17: Bar Graph for the Prison Data

[Bar graph of the inmate counts: White 338,400; Black 301,900; Hispanic 125,600]
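
A bar graph does not need special software. As a rough sketch, the Python below prints a sideways bar graph of the prison data using text characters. The function name `text_bar_graph` and the scale of one `#` per 10,000 inmates are assumptions made only for this example.

```python
# Inmate counts by ethnicity, copied from Example #1.
counts = {"White": 338_400, "Black": 301_900, "Hispanic": 125_600}


def text_bar_graph(counts, scale=10_000):
    """Build one text bar per category; each '#' stands for `scale` items."""
    lines = []
    for category, n in counts.items():
        bar = "#" * round(n / scale)  # bar length is proportional to the count
        lines.append(f"{category:<10}{bar} {n:,}")
    return lines


graph = text_bar_graph(counts)
for line in graph:
    print(line)
```

Every bar uses the same character, so the bars have equal thickness and only their lengths differ, which is the point of the equal-width rule.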

Page 18: Comparing 2 Categorical Variables

• How does one variable compare to another?
• 2 Way Tables

Page 19: Comparing 2 Categorical Variables

Categorical Variables place individuals into one of several groups or categories.

The values of a categorical variable are labels for the different categories.

The distribution of a categorical variable lists the count or percent of individuals who fall into each category.

Page 20: Comparing 2 Categorical Variables

When a dataset involves two categorical variables, we begin by examining the counts or percents in various categories for one of the variables.

Two-way Table: describes two categorical variables, organizing counts according to a row variable and a column variable.

Page 21: Two-Way Tables

Two-way tables come about when we are interested in the relationship between two categorical variables.
◦ One of the variables is the row variable.
◦ The other is the column variable.
◦ Each combination of a row category and a column category is a cell.

Page 22: Example #2

Dr. F. is hosting 38 of his friends at a cookout. Now, Dr. F. has limited cooking skills, so everyone is having a burger. However, he has bought sufficient tomatoes so that anyone who wants tomato on his or her burger will be happy.

The following slide details the results of his burger and tomato survey.
◦ For the record... a good burger needs only 2 things... CHEESE... and KETCHUP!

Page 23: Burger/Tomato Two-Way Table

Let’s look at the components of a 2 way table

GENDER * TOMATOES Crosstabulation (Count)

            TOMATOES
GENDER      N       Y       Total
F           11      8       19
M           6       13      19
Total       17      21      38

Row variable: GENDER
Column variable: TOMATOES
Row Totals: 19 (F), 19 (M)
Column Totals: 17 (N), 21 (Y)
Overall Total: 38
Cells: 11, 8, 6, 13
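
To see how the pieces of the table fit together, here is a small Python sketch (an illustration, not part of the original handout) that stores only the four cells of the burger/tomato table and rebuilds the totals from them. The variable names are chosen just for this example.

```python
# The four cells of the GENDER * TOMATOES table: rows are gender, columns are N/Y.
table = {
    "F": {"N": 11, "Y": 8},
    "M": {"N": 6, "Y": 13},
}

# Row totals: add across each row (one total per gender).
row_totals = {gender: sum(cells.values()) for gender, cells in table.items()}

# Column totals: add down each column (one total per tomato answer).
column_totals = {}
for cells in table.values():
    for answer, n in cells.items():
        column_totals[answer] = column_totals.get(answer, 0) + n

# Overall total: adding the row totals or the column totals gives the same number.
overall_total = sum(row_totals.values())

print("Row totals:", row_totals)
print("Column totals:", column_totals)
print("Overall total:", overall_total)
```

A handy classroom check is that the row totals and the column totals must both add up to the overall total of 38 guests.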

Page 24: Example #3

Dr. F. decided to survey a group of young adults, to determine whether they expected to be rich by the age of 30.

He decided to consider gender as one variable.

The other variable indicates each participant’s expected likelihood of being rich (using the following options)
◦ Almost no chance
◦ Some chance, but probably not
◦ A 50-50 chance
◦ A good chance
◦ Almost certain

Page 25: The summary 2 way table

Young adults by gender and chance of getting rich

Response                         Female   Male     Total
Almost no chance                 96       98       194
Some chance, but probably not    426      286      712
A 50-50 chance                   696      720      1416
A good chance                    663      758      1421
Almost certain                   486      597      1083
Total                            2367     2459     4826

Page 26: Marginal Distribution

Note: Percents are often more informative than counts, especially when comparing groups of different sizes.

The Marginal Distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.

Page 27: More on Marginal Distribution

To examine a marginal distribution:

1. Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals.

2. Make a graph to display the marginal distribution.

Page 28: Marginal Distribution

Young adults by gender and chance of getting rich

Response                         Female   Male     Total
Almost no chance                 96       98       194
Some chance, but probably not    426      286      712
A 50-50 chance                   696      720      1416
A good chance                    663      758      1421
Almost certain                   486      597      1083
Total                            2367     2459     4826

Examine the marginal distribution of chance of getting rich.

Response            Percent
Almost no chance    194/4826 = 4.0%
Some chance         712/4826 = 14.8%
A 50-50 chance      1416/4826 = 29.3%
A good chance       1421/4826 = 29.4%
Almost certain      1083/4826 = 22.4%

[Bar graph: Chance of being wealthy by age 30. Horizontal axis: Survey Response (Almost none, Some chance, 50-50 chance, Good chance, Almost certain). Vertical axis: Percent, 0 to 35.]
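
The marginal distribution above can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the dictionary layout and the names `table`, `row_totals`, and `marginal` are illustrative assumptions.

```python
# Counts from the "chance of getting rich" two-way table.
table = {
    "Almost no chance": {"Female": 96, "Male": 98},
    "Some chance, but probably not": {"Female": 426, "Male": 286},
    "A 50-50 chance": {"Female": 696, "Male": 720},
    "A good chance": {"Female": 663, "Male": 758},
    "Almost certain": {"Female": 486, "Male": 597},
}

# Step 1: the row totals ignore gender and count everyone who gave each response.
row_totals = {response: sum(cells.values()) for response, cells in table.items()}
grand_total = sum(row_totals.values())

# Step 2: divide each row total by the overall total to get a percent.
marginal = {response: 100 * t / grand_total for response, t in row_totals.items()}

for response, pct in marginal.items():
    print(f"{response:<32}{row_totals[response]:>5}/{grand_total} = {pct:.1f}%")
```

Notice that gender never enters the calculation, which is why a marginal distribution cannot tell us how the two variables are related.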

Page 29: The need for more??

Marginal distributions tell us nothing about the relationship between two variables.

A Conditional Distribution of a variable describes the values of that variable among individuals who have a specific value of another variable.

Page 30: Conditional Distribution

To examine or compare conditional distributions:

1. Select the row(s) or column(s) of interest.

2. Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).

3. Make a graph to display the conditional distribution.

4. Use a side-by-side bar graph or segmented bar graph to compare distributions.

Page 31: Conditional Distribution

Young adults by gender and chance of getting rich

Response                         Female   Male     Total
Almost no chance                 96       98       194
Some chance, but probably not    426      286      712
A 50-50 chance                   696      720      1416
A good chance                    663      758      1421
Almost certain                   486      597      1083
Total                            2367     2459     4826

• Calculate the conditional distribution of opinion among males.
• Examine the relationship between gender and opinion.

Response            Male                 Female
Almost no chance    98/2459 = 4.0%       96/2367 = 4.1%
Some chance         286/2459 = 11.6%     426/2367 = 18.0%
A 50-50 chance      720/2459 = 29.3%     696/2367 = 29.4%
A good chance       758/2459 = 30.8%     663/2367 = 28.0%
Almost certain      597/2459 = 24.3%     486/2367 = 20.5%

[Bar graph: Chance of being wealthy by age 30, Males only. Horizontal axis: Opinion. Vertical axis: Percent, 0 to 40.]

[Side-by-side bar graph: Chance of being wealthy by age 30, Males and Females. Horizontal axis: Opinion. Vertical axis: Percent, 0 to 40.]

[Segmented bar graph: Chance of being wealthy by age 30, one bar for Males and one for Females, each split by Opinion (Almost no chance, Some chance, 50-50 chance, Good chance, Almost certain). Vertical axis: Percent, 0% to 100%.]
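
Here is one way the conditional distributions could be checked in Python. It is a sketch under the assumption that the table is stored as a dictionary of rows; the function name `conditional_distribution` is invented for this example.

```python
# Counts from the "chance of getting rich" two-way table.
table = {
    "Almost no chance": {"Female": 96, "Male": 98},
    "Some chance, but probably not": {"Female": 426, "Male": 286},
    "A 50-50 chance": {"Female": 696, "Male": 720},
    "A good chance": {"Female": 663, "Male": 758},
    "Almost certain": {"Female": 486, "Male": 597},
}


def conditional_distribution(table, gender):
    """Percent of each response among the people of one gender only."""
    column = {response: cells[gender] for response, cells in table.items()}
    total = sum(column.values())  # the column total, e.g. 2459 for males
    return {response: 100 * n / total for response, n in column.items()}


male = conditional_distribution(table, "Male")
female = conditional_distribution(table, "Female")

# Side-by-side comparison, the text version of a side-by-side bar graph.
for response in table:
    print(f"{response:<32}Male {male[response]:5.1f}%   Female {female[response]:5.1f}%")
```

The key difference from the marginal distribution is the denominator: each percent is divided by that gender's own total, not by all 4826 respondents.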

Page 32: A chance to work on one together

• An example concerning marginal and conditional distributions

Page 33: Setting up our problem

Enrollment of recent high school graduates. The table below gives some census data concerning the enrollment status of recent high school graduates aged 16 to 24 years.

Page 34: Continuing our problem

How many male recent high school graduates aged 16 to 24 years were enrolled full-time in two-year colleges? Answer: 890

How many female recent high school graduates aged 16 to 24 years were enrolled in graduate schools? Answer: 366

Page 35: Now the big challenge

See if you can use the skills developed in this presentation to complete the handout that Dr. F. will distribute.

Feel free to consult your notes, work together, or ask me if you get really stuck or frustrated.

RELAX… it’s just for FUN!

Page 36: Solution #1: The marginal distribution of gender

status                       men      women
2 year college, full time    890      969
2 year college, part time    340      403
4 year college, full time    2897     3321
4 year college, part time    249      383
graduate school              306      366
total                        4842     5579

Marginal Distributions of gender: men 46.5%, women 53.5%

Page 37: Solution #1: Graph of The marginal distribution of gender

Page 38: Solution #2: The marginal distribution of status

status                       Percent
2 year college, full time    17.8%
2 year college, part time    7.1%
4 year college, full time    59.7%
4 year college, part time    6.1%
graduate school              6.4%
vocational school            2.9%
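
Solutions #1 and #2 can be double-checked with the Python sketch below. One caution: the counts shown in Solution #1 list only five statuses, yet the totals (4842 and 5579) and the later solutions include vocational school. The sketch therefore treats the vocational counts as whatever is left over from the printed totals. That step is an inference made for this example, not a figure stated in the slides.

```python
# Counts copied from the Solution #1 table (vocational school is not listed there).
counts = {
    "2 year college, full time": {"men": 890, "women": 969},
    "2 year college, part time": {"men": 340, "women": 403},
    "4 year college, full time": {"men": 2897, "women": 3321},
    "4 year college, part time": {"men": 249, "women": 383},
    "graduate school": {"men": 306, "women": 366},
}
column_totals = {"men": 4842, "women": 5579}  # totals as printed on the slide

# ASSUMPTION: the gap between each printed total and the listed rows
# is the missing vocational school row.
listed = {g: sum(row[g] for row in counts.values()) for g in column_totals}
counts["vocational school"] = {g: column_totals[g] - listed[g] for g in column_totals}

grand_total = sum(column_totals.values())

# Solution #1: marginal distribution of gender (column totals over the grand total).
gender_marginal = {g: 100 * t / grand_total for g, t in column_totals.items()}

# Solution #2: marginal distribution of status (row totals over the grand total).
status_marginal = {s: 100 * sum(row.values()) / grand_total for s, row in counts.items()}

for g, pct in gender_marginal.items():
    print(f"{g:<28}{pct:5.1f}%")
for s, pct in status_marginal.items():
    print(f"{s:<28}{pct:5.1f}%")
```

With the leftover counts filled in, the computed percents agree with the values shown in Solutions #1 and #2, which suggests the inference is reasonable.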

Page 39: Solution #2: Graph of The marginal distribution of status

Page 40: Solution #3: Conditional Distribution of Gender for each status

status                       Men      Women
2 year college, full time    47.9%    52.1%
2 year college, part time    45.8%    54.2%
4 year college, full time    46.6%    53.4%
4 year college, part time    39.4%    60.6%
graduate school              45.5%    54.5%
vocational school            53.9%    46.1%

Page 41: Solution #3: Graph of Conditional Distribution of Gender for each status

[Side-by-side bar graph: Conditional Distribution of Gender for Each Status. Bars for Men and Women over each status (2 year college, full time; 2 year college, part time; 4 year college, full time; 4 year college, part time; graduate school; vocational school). Vertical axis: 0.0% to 70.0%.]

Page 42: Solution #4: Conditional Distribution of Status for Each Gender

status                       Men      Women
2 year college, full time    18.4%    17.4%
2 year college, part time    7.0%     7.2%
4 year college, full time    59.8%    59.5%
4 year college, part time    5.1%     6.9%
graduate school              6.3%     6.6%
vocational school            3.3%     2.5%
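
Solutions #3 and #4 use the same counts but different denominators, and a short Python sketch can make that contrast concrete. The vocational school counts (160 men, 137 women) are not printed in the slides. They are inferred here from the printed totals of 4842 and 5579, so please treat them as an assumption of this example.

```python
# Enrollment counts; the vocational row is INFERRED from the printed totals.
counts = {
    "2 year college, full time": {"men": 890, "women": 969},
    "2 year college, part time": {"men": 340, "women": 403},
    "4 year college, full time": {"men": 2897, "women": 3321},
    "4 year college, part time": {"men": 249, "women": 383},
    "graduate school": {"men": 306, "women": 366},
    "vocational school": {"men": 160, "women": 137},  # assumption, see note above
}

# Solution #3: gender within each status, so divide by the ROW total.
gender_within_status = {
    status: {g: 100 * n / sum(row.values()) for g, n in row.items()}
    for status, row in counts.items()
}

# Solution #4: status within each gender, so divide by the COLUMN total.
column_totals = {g: sum(row[g] for row in counts.values()) for g in ("men", "women")}
status_within_gender = {
    g: {status: 100 * row[g] / column_totals[g] for status, row in counts.items()}
    for g in column_totals
}

for status in counts:
    s3 = gender_within_status[status]
    print(f"{status:<28}men {s3['men']:5.1f}%  women {s3['women']:5.1f}%")
```

In Solution #3 each status row adds up to 100%, while in Solution #4 each gender adds up to 100%. Asking students which total they divided by is a quick way to catch mix-ups between the two.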

Page 43: Solution #4: Graph of Conditional Distribution of Status for Each Gender

Page 44: Solution #4: A Different Graph of Conditional Distribution of Status for Each Gender

Page 45: Questions or Concerns?

Page 46: Looking Ahead

Next time we’ll be looking at:

1. Analysis of quantitative statistics

2. We’ll consider linear regression (without having to actually calculate the equation of the regression line).

3. We’ll also look at correlation