Soci252 – Data Analysis in Sociological Research Chapter 3 – Displaying and Describing Categorical Data François Nielsen University of North Carolina Chapel Hill January 12, 2012

Soci252 – Data Analysis in Sociological Research

Chapter 3 – Displaying and Describing Categorical Data

François Nielsen

University of North Carolina Chapel Hill

January 12, 2012

Lecture Outline

Three Rules of Data Analysis

Frequency Tables

Graphic Representations

Contingency Tables

Conditional Distributions

Caveats

What Have We Learned?

A Peculiar Event

• What is the event described by the following table?

                      Survived
Class   Sex     Age      No   Yes
1st     Male    Child     0     5
                Adult   118    57
        Female  Child     0     1
                Adult     4   140
2nd     Male    Child     0    11
                Adult   154    14
        Female  Child     0    13
                Adult    13    80
3rd     Male    Child    35    13
                Adult   387    75
        Female  Child    17    14
                Adult    89    76
Other   Male    Child     0     0
                Adult   670   192
        Female  Child     0     0
                Adult     3    20

The Three Rules of Data Analysis

• The three rules of data analysis won’t be difficult to remember:
  ◦ Make a picture: things may be revealed that are not obvious in the raw data. These will be things to think about.
  ◦ Make a picture: important features of and patterns in the data will show up. You may also see things that you did not expect.
  ◦ Make a picture: the best way to tell others about your data is with a well-chosen picture.

Frequency Tables: Making Piles

• We can “pile” the data by counting the number of data values in each category of interest.

• We can organize these counts into a frequency table, which records the totals and the category names.

Figure: A frequency table of the Titanic passengers

Frequency Tables: Making Piles (cont.)

• A relative frequency table is similar, but gives the percentages (instead of counts) for each category.

Figure: A relative frequency table of the Titanic passengers

Frequency Tables: Making Piles (cont.)

• Both types of tables show how cases are distributed across the categories.

• They describe the distribution of a categorical variable because they name the possible categories and tell how frequently each occurs.
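The "piling" step can be sketched in a few lines of Python. This is only an illustration, not the software used in the course: it assumes the class counts reported in the marginal distribution of Class in these slides (325, 285, 706, 885) and rebuilds one hypothetical record per person so that the counting is explicit. The function names are made up for this example.

```python
from collections import Counter

# Class counts as reported in the marginal distribution of Class in these slides.
CLASS_COUNTS = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

# Hypothetical raw data: one label per person, rebuilt from the counts above.
records = [cls for cls, n in CLASS_COUNTS.items() for _ in range(n)]


def frequency_table(values):
    """Count how many values fall in each category."""
    return Counter(values)


def relative_frequency_table(values):
    """Percentage of values falling in each category."""
    counts = frequency_table(values)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


freq = frequency_table(records)
rel = relative_frequency_table(records)

# Print the counts and percentages side by side, one category per line.
for category in CLASS_COUNTS:
    print(f"{category:<8}{freq[category]:>6}{rel[category]:>8.1f}%")
```

Both tables describe the same distribution; the relative version simply divides each pile by the total number of cases.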

What’s Wrong With This Picture?

• You might think that a good way to show the Titanic data is with this display:

The Area Principle

• The ship display makes it look like most of the people on the Titanic were crew members, with a few passengers along for the ride.

• When we look at each ship, we see the area taken up by the ship, instead of the length of the ship.

• The ship display violates the area principle:
  ◦ The area occupied by a part of the graph should correspond to the magnitude of the value it represents.

Bar Charts

• A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.

• A bar chart stays true to the area principle.

• Thus, a better display for the ship data is:
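As a rough stand-in for such a display, here is a minimal Python sketch of a text bar chart. It assumes the class counts from the marginal distribution of Class in these slides. The function name and the bar width of 50 characters are arbitrary choices for illustration. Every bar has the same height (one line), so its area is proportional to its length, and its length is proportional to the count.

```python
# Class counts as reported in the marginal distribution of Class in these slides.
CLASS_COUNTS = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}


def text_bar_chart(counts, width=50):
    """Return one line per category; bar length is proportional to the count."""
    largest = max(counts.values())
    lines = []
    for category, n in counts.items():
        # Scale so that the largest category fills the full width.
        bar = "#" * round(width * n / largest)
        lines.append(f"{category:<8}{bar} {n}")
    return lines


chart = text_bar_chart(CLASS_COUNTS)
print("\n".join(chart))
```

Because only the length varies, a bar twice as long really does represent twice as many people, which is what the ship pictures failed to do.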

Bar Charts (cont.)

• A relative frequency bar chart displays the relative proportion of counts for each category.

• A relative frequency bar chart also stays true to the area principle.

• Replacing counts with percentages in the ship data:

Figure: Relative frequency bar chart

Pie Charts

• When you are interested in parts of the whole, a pie chart might be your display of choice.

• Pie charts show the whole group of cases as a circle.

• They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.

Figure: Titanic passengers in each class

Contingency Tables

• A contingency table allows us to look at two categorical variables together.

• It shows how individuals are distributed along each variable, contingent on the value of the other variable.
  ◦ Example: we can examine the class of ticket and whether a person survived the Titanic:

Figure: Contingency table of ticket class and survival

Contingency Tables (cont.)

• The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables.

• Each frequency distribution is called a marginal distribution of its respective variable.
  ◦ The marginal distribution of Survival is:

    Alive   Dead   Total
      711   1490    2201

  ◦ The marginal distribution of Class is:

    First  Second  Third  Crew  Total
      325     285    706   885   2201
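The margins are simply row and column sums of the cell counts, which a short Python sketch can make concrete. Two assumptions are made here. The Dead row uses the counts shown in the conditional-distribution slides. The Alive row is derived as class total minus deaths, which agrees with the Yes column of the opening table summed over Sex and Age. The function names are hypothetical.

```python
# Cell counts for Survival (rows) by Class (columns).
# Dead row: as reported in these slides. Alive row: class total minus deaths (assumption).
CLASSES = ["First", "Second", "Third", "Crew"]
TABLE = {
    "Alive": [203, 118, 178, 212],
    "Dead": [122, 167, 528, 673],
}


def row_margins(table):
    """Marginal distribution of the row variable: sum each row."""
    return {row: sum(cells) for row, cells in table.items()}


def column_margins(table, columns):
    """Marginal distribution of the column variable: sum each column."""
    return {
        col: sum(cells[i] for cells in table.values())
        for i, col in enumerate(columns)
    }


survival_margin = row_margins(TABLE)
class_margin = column_margins(TABLE, CLASSES)
grand_total = sum(survival_margin.values())

print(survival_margin, class_margin, grand_total)
```

Either margin adds up to the same grand total, since both count every person exactly once.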

Contingency Tables (cont.)

• Each cell of the contingency table gives the count for a combination of values of the two variables.
  ◦ For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sank.
  ◦ The cells are defined according to an “and” logic: this cell consists of individuals who are crew members and who died.

Figure: Contingency table of ticket class and survival (Repeated)

Conditional Distributions

• A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable.
  ◦ The following is the conditional distribution of ticket Class, conditional on having survived:

Figure: Conditional distribution of Class, conditional on having survived

Conditional Distributions (cont.)

• The following is the conditional distribution of ticket Class, conditional on having perished:

         First   Second   Third    Crew   Total
Dead       122      167     528     673    1490
          8.2%    11.2%   35.4%   45.2%    100%
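The percentages in this table come from dividing each count in the Dead row by that row's total, and the same operation on the Alive row gives the distribution conditional on having survived. A minimal Python sketch follows. It assumes the Alive counts are class totals minus deaths, and the function name is made up for this example.

```python
# Cell counts for Survival (rows) by Class (columns).
# Dead row: from the table above. Alive row: class total minus deaths (assumption).
CLASSES = ["First", "Second", "Third", "Crew"]
COUNTS_BY_SURVIVAL = {
    "Alive": [203, 118, 178, 212],
    "Dead": [122, 167, 528, 673],
}


def conditional_distribution(cells):
    """Percentages within one row: the distribution of Class given that row's condition."""
    total = sum(cells)
    return [round(100 * n / total, 1) for n in cells]


class_given_dead = conditional_distribution(COUNTS_BY_SURVIVAL["Dead"])
class_given_alive = conditional_distribution(COUNTS_BY_SURVIVAL["Alive"])

for label, dist in (("Alive", class_given_alive), ("Dead", class_given_dead)):
    print(label, dict(zip(CLASSES, dist)))
```

The denominator is the number of people satisfying the condition (1490 dead, 711 alive), not the 2201 people on board.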

Conditional Distributions (cont.)

• The conditional distributions tell us that there is a difference in class for those who survived and those who perished.

• This is better shown with pie charts of the two distributions:

Figure: Pie charts of the conditional distribution of Class, for the survivors and nonsurvivors, separately

Conditional Distributions (cont.)

• We see that the distribution of Class for the survivors is different from that of the nonsurvivors.

• This leads us to believe that Class and Survival are associated, that they are not independent.

• The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.
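One way to make this definition concrete is to compare each conditional distribution of Class with the marginal distribution of Class. Under independence they would all coincide. The Python sketch below is only an illustration. It assumes the Alive counts are class totals minus deaths, and the function names are hypothetical. It says nothing about how large a gap would count as meaningful.

```python
# Cell counts for Survival (rows) by Class (columns: First, Second, Third, Crew).
# Dead row: as reported in these slides. Alive row: class total minus deaths (assumption).
TABLE = {
    "Alive": [203, 118, 178, 212],
    "Dead": [122, 167, 528, 673],
}


def proportions(cells):
    """Turn a list of counts into proportions that sum to 1."""
    total = sum(cells)
    return [n / total for n in cells]


def max_gap_from_marginal(table):
    """Largest absolute difference between any conditional proportion and the marginal one."""
    # zip(*rows) walks the table column by column, giving the column totals.
    marginal = proportions([sum(col) for col in zip(*table.values())])
    return max(
        abs(p - m)
        for cells in table.values()
        for p, m in zip(proportions(cells), marginal)
    )


gap = max_gap_from_marginal(TABLE)
print(f"largest gap: {100 * gap:.1f} percentage points")
```

A table whose rows are exact multiples of one another, such as [10, 20] and [30, 60], gives a gap of zero, which is the independent case described above.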

Segmented Bar Charts

• A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.

• Each bar is treated as the “whole” and is divided proportionally into segments corresponding to the percentage in each group.

• Here is the segmented bar chart for ticket Class by Survival status:

Figure: A segmented bar chart for Class by Survival

Conditional Distributions (cont.)

• We have looked at the distribution of Class conditional on Survival.

• When one variable represents an outcome to be explained (called the response), and another variable represents a potential explanatory factor, it is more natural in looking for an association to examine the conditional distribution of the response variable, conditional on values of the explanatory variable.
  ◦ In the Titanic example we would look at the conditional distributions of Survival, conditional on Class.
  ◦ These conditional distributions are obtained by percentaging the table by columns, as in this table:

         First   Second   Third   Crew
Alive     62.5     41.4    25.2   24.0
Dead      37.5     58.6    74.8   76.0
Total      100      100     100    100
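Percentaging by columns means dividing each cell by its column total, so that every class adds up to 100%. A minimal Python sketch of that computation follows. It assumes the underlying counts are the Dead row reported in these slides and an Alive row obtained as class total minus deaths. The function name is made up for this example.

```python
# Cell counts for Survival (rows) by Class (columns).
# Dead row: as reported in these slides. Alive row: class total minus deaths (assumption).
CLASSES = ["First", "Second", "Third", "Crew"]
TABLE = {
    "Alive": [203, 118, 178, 212],
    "Dead": [122, 167, 528, 673],
}


def percentage_by_columns(table, columns):
    """Distribution of the row variable within each column, in percent."""
    # zip(*rows) walks the table column by column, giving the column totals.
    column_totals = [sum(col) for col in zip(*table.values())]
    return {
        row: {
            col: round(100 * n / total, 1)
            for col, n, total in zip(columns, cells, column_totals)
        }
        for row, cells in table.items()
    }


survival_given_class = percentage_by_columns(TABLE, CLASSES)
for row, percents in survival_given_class.items():
    print(row, percents)
```

The explanatory variable (Class) defines the groups and the response (Survival) is distributed within each group, which is why the columns rather than the rows sum to 100.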

Conditional Distributions (cont.)

• Looking at the distributions of Survival conditional on Class, we can clearly see the pattern of decreasing probability of survival from first class (62.5%) to third class and crew (25.2% and 24.0%, respectively).

• An important general principle of data analysis is that associations among variables correspond to, and are revealed by, differences in the conditional distributions of the response variable, conditional on the value(s) of the explanatory variable(s).
  ◦ This principle works for any combination of qualitative and quantitative variables.

Simpson’s Paradox

• To be added later.

What Can Go Wrong?

• Don’t violate the area principle.
  ◦ While some people might like the pie chart on the left better, it makes it harder to compare fractions of the whole, which is what a well-done pie chart lets you do.

What Can Go Wrong? (cont.)

• Keep it honest: make sure your display shows what it says it shows.
  ◦ This plot of the percentage of high-school students who engage in specified dangerous behaviors has a problem. Can you see it?

What Can Go Wrong? (cont.)

• Don’t confuse similar-sounding percentages: pay particular attention to the wording of the context.

• Don’t forget to look at the variables separately too: examine the marginal distributions, since it is important to know how many cases are in each category.

What Can Go Wrong? (cont.)

• Be sure to use enough individuals!
  ◦ Do not make a report like “We found that 66.67% of the rats improved their performance with training. The other rat died.”

What Can Go Wrong? (cont.)

• Don’t overstate your case: don’t claim something you can’t.

• Don’t use unfair or silly averages: this could lead to Simpson’s Paradox, so be careful when you average one variable across different levels of a second variable.

What Have We Learned?

• We can summarize categorical data by counting the number of cases in each category (expressing these as counts or percents).

• We can display the distribution in a bar chart or pie chart.

• And we can examine two-way tables called contingency tables, examining marginal and/or conditional distributions of the variables.

• Differences in the conditional distributions of the response variable, conditional on the value(s) of the explanatory variable(s), reveal patterns of dependence (association) among variables.