Describing Categorical Data. Here we study ways of describing a categorical variable.
Copyright © 2014, 2011 Pearson Education, Inc.
Chapter 3 Describing Categorical Data
3.1 Looking At Data
Which hosts send the most visitors to Amazon’s Web site?
Data set consists of 188,996 visits
Host is a categorical variable
To answer this question we must describe the variation in Host
Frequency and Relative Frequency Tables
The distribution of a categorical variable is a list of its values, each with its associated count (frequency)
A frequency table summarizes the distribution of a categorical variable
A relative frequency table shows the proportion (or percentage) in each category
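As a minimal sketch of how these two tables are built, the Python below counts a small, made-up list of host names and then divides each count by the total. The host values and the function names `frequency_table` and `relative_frequency_table` are assumptions chosen for illustration; they are not the Amazon data or any library's API.

```python
from collections import Counter

# Hypothetical sample of the Host variable (illustrative values, not the Amazon data).
hosts = ["google.com", "msn.com", "google.com", "yahoo.com",
         "google.com", "msn.com", "aol.com", "google.com"]

def frequency_table(values):
    """Return {category: count}, ordered from most to least frequent."""
    return dict(Counter(values).most_common())

def relative_frequency_table(values):
    """Return {category: proportion}; the proportions sum to 1."""
    total = len(values)
    return {category: count / total
            for category, count in frequency_table(values).items()}

freq = frequency_table(hosts)
rel = relative_frequency_table(hosts)

# Print one row per category: label, count, percentage.
for category in freq:
    print(f"{category:<12} {freq[category]:>3} {rel[category]:>6.1%}")
```

The relative frequencies carry the same information as the counts, rescaled so that they add up to 1 (or 100%).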
3.2 Charts of Categorical Data
Bar Charts and Pie Charts
Unless you need to know exact counts, charts are better than tables for summarizing more than five categories
The two most common displays of a categorical variable are a bar chart and a pie chart
The Bar Chart
Uses horizontal or vertical bars to show the distribution of a categorical variable
Is called a Pareto chart when the categories are sorted by frequency (popular in quality control)
Becomes cluttered with too many categories
Is appropriate for ordinal categorical variables
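The Pareto ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the defect categories are invented, `pareto_order` and `text_bar_chart` are hypothetical helper names, and the bars are drawn as text so the example needs no plotting library.

```python
from collections import Counter

# Hypothetical defect categories, in the spirit of the quality-control use case.
defects = ["scratch", "dent", "scratch", "misprint", "scratch", "dent"]

def pareto_order(values):
    """Return (category, count) pairs sorted from most to least frequent."""
    counts = Counter(values)
    # Sort by descending count, breaking ties alphabetically for a stable display.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def text_bar_chart(pairs, symbol="#"):
    """Render horizontal bars whose lengths equal the counts (baseline at zero)."""
    width = max(len(category) for category, _ in pairs)
    return [f"{category:<{width}} | {symbol * count} {count}"
            for category, count in pairs]

for line in text_bar_chart(pareto_order(defects)):
    print(line)
```

For an ordinal variable, the natural order of the categories would be kept instead of sorting by frequency.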
Bar Chart (Horizontal) of Top 10 Hosts
Bar Chart (Vertical) of Top 10 Hosts
The Pie Chart
Uses wedges of a circle to show the distribution of a categorical variable
Commonly chosen to illustrate market shares or sources of revenue for a company
Less useful than bar charts if we want to compare actual counts (easier to compare bars than angles of wedges)
Pie Chart of Top 10 Hosts
3.3 The Area Principle
The Fundamental Rule for Data Displays
The area occupied by a part of the graph/chart that displays data should be proportional to the amount of data it represents
Charts decorated to attract attention often violate the area principle
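One common way decorated charts break the area principle is by scaling a picture's width and height together by the data ratio, so that the drawn area grows with the square of that ratio. The Python sketch below works through the arithmetic; the function names and the ratio of 2 are assumptions made up for this illustration.

```python
import math

def area_if_both_sides_scaled(value_ratio):
    """Area ratio that results when width AND height are both scaled by the data ratio."""
    return value_ratio ** 2

def side_scale_respecting_area(value_ratio):
    """Linear scale factor for a 2-D symbol so that its area matches the data ratio."""
    return math.sqrt(value_ratio)

ratio = 2.0  # hypothetical: one category has twice the count of another
print(area_if_both_sides_scaled(ratio))   # area is 4x, overstating a 2x difference
print(side_scale_respecting_area(ratio))  # scale each side by about 1.414 instead
```

A plain bar chart avoids the problem because the bars share a common width, so area is proportional to length.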
An Example Violating the Area Principle
The Same Example Respecting the Area Principle
4M Example 3.1: ROLLING OVER
Motivation
Are certain types of vehicles more prone to roll-over accidents than others?
Method
Data were gathered from the Fatality Analysis Reporting System (FARS) for roll-over accidents on interstate highways. The cases that make up the rows are accidents resulting in roll-overs in 2000. The column of interest is the model of the car involved.
Mechanics
Message
Ford Broncos were involved in more than twice as many roll-over accidents as the next-closest model.
4M Example 3.2: SELLING SMARTPHONES TO BUSINESSES
Motivation
Apple, Google and Research in Motion (RIM) compete aggressively to sell their smartphones to businesses. RIM has dominated with its BlackBerry line, but has that success held up against the intense competition from Apple and Google?
Method
Mechanics
Message
Corporate customers are purchasing more iPhones and Android phones for managers. From 2010 to 2011, BlackBerry sales grew less than sales of iPhones and Android phones. While RIM still had the largest share of the market in 2011, its share had decreased to less than 50%.
3.4 Mode and Median
Mode
Category with the highest frequency
The longest bar in a bar chart
The widest slice in a pie chart
Two or more categories can tie with the highest frequency (bimodal or multimodal)
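A short Python sketch of finding the mode, including the tie case, follows. The function name `modes` and the vehicle types are hypothetical and chosen only for illustration; the function returns a list so that ties are reported rather than silently broken.

```python
from collections import Counter

def modes(values):
    """Return every category that attains the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Keep all categories tied at the top count, sorted for a stable result.
    return sorted(category for category, count in counts.items() if count == top)

print(modes(["SUV", "sedan", "SUV", "pickup"]))           # one mode
print(modes(["SUV", "sedan", "SUV", "sedan", "pickup"]))  # two modes (bimodal)
```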
Median
Not appropriate for nominal data
Data must be ordinal
It is the category label of the middle observation in ordered data
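The median of an ordinal variable can be sketched in Python by sorting the observations according to the category order and reading off the middle one. The rating scale, the data, and the name `ordinal_median` are assumptions for illustration. The handling of an even number of observations (returning both middle categories when they differ) is also a choice made here, not something the slides specify.

```python
def ordinal_median(values, order):
    """Return the category of the middle observation after sorting by `order`.

    `order` lists the categories from lowest to highest. With an even number
    of observations there are two middle observations; if their categories
    differ, both are returned, since category labels cannot be averaged.
    """
    rank = {category: i for i, category in enumerate(order)}
    ordered = sorted(values, key=rank.__getitem__)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    lower, upper = ordered[(n - 1) // 2], ordered[n // 2]
    return lower if lower == upper else (lower, upper)

# Hypothetical survey ratings on an ordered scale.
scale = ["poor", "fair", "good", "excellent"]
ratings = ["good", "poor", "excellent", "good", "fair"]
print(ordinal_median(ratings, scale))
```

For nominal data there is no meaningful `order` to pass in, which is why the median does not apply.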
Best Practices
Use a bar chart to show the frequencies of a categorical variable.
Use a pie chart to show the proportions of a categorical variable.
Keep the baseline of a bar chart at zero.
Preserve the ordering of an ordinal variable.
Respect the area principle.
Show the best plots to answer the motivating question.
Label your chart to show the categories and indicate whether some have been combined or omitted.
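To see why the baseline of a bar chart should stay at zero, the Python sketch below compares the ratio of two drawn bar lengths under different baselines. The counts (110 and 100), the baseline of 90, and the helper name `apparent_ratio` are made up for this illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when bars start at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)

# Hypothetical counts for two categories.
print(apparent_ratio(110, 100))               # bars start at zero
print(apparent_ratio(110, 100, baseline=90))  # truncated axis
```

With these counts the first category is only 10% larger than the second, yet with the axis starting at 90 its bar is drawn twice as long, which also violates the area principle.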
Pitfalls
Avoid elaborate plots that may be deceptive.
Do not show too many categories.
Do not put ordinal data in a pie chart.
Do not carelessly round data.
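One way careless rounding can show up, offered here as an illustrative assumption rather than the slides' own example, is in percentages that no longer add up to 100 after rounding. The Python sketch below uses three made-up categories with equal counts.

```python
# Hypothetical counts: three categories of equal size.
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Round each percentage to a whole number before displaying it.
whole_percent = {category: round(100 * count / total)
                 for category, count in counts.items()}

print(whole_percent)                # each category shows 33
print(sum(whole_percent.values()))  # the displayed percentages sum to 99, not 100
```

Keeping an extra digit, or noting that the figures are rounded, avoids leaving readers with a total that does not match.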