Double Bar Charts
When to Use: Categorical data
How to construct
– Constructed like bar charts, but with two (or more) groups being compared
– MUST use relative frequencies on the vertical axis
– MUST include a key to denote the different bars
Why MUST we use relative frequencies?
Each year the Princeton Review conducts a survey of students applying to college and of parents of college applicants. In 2009, 12,715 high school students responded to the question “Ideally how far from home would you like the college you attend to be?” Also, 3007 parents of students applying to college responded to the question “How far from home would you like the college your child attends to be?” The data are displayed in the frequency table below.
Frequency
Ideal Distance           Students   Parents
Less than 250 miles      4450       1594
250 to 500 miles         3942       902
500 to 1000 miles        2416       331
More than 1000 miles     1907       180
Create a comparative bar chart with these data.
What should you do first?
Relative Frequency
Ideal Distance           Students   Parents
Less than 250 miles      .35        .53
250 to 500 miles         .31        .30
500 to 1000 miles        .19        .11
More than 1000 miles     .15        .06
Students column: found by dividing each frequency by the total number of students (12,715).
Parents column: found by dividing each frequency by the total number of parents (3007).
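The division described above can be sketched in a few lines of Python. This is only an illustration using the frequencies from the table; the names `frequencies` and `relative_frequencies` are made up for the example. Because the two groups differ greatly in size (12,715 students versus 3007 parents), raw counts cannot be compared directly, but relative frequencies can.

```python
# Frequencies from the Princeton Review table above.
categories = ["Less than 250 miles", "250 to 500 miles",
              "500 to 1000 miles", "More than 1000 miles"]
frequencies = {
    "Students": [4450, 3942, 2416, 1907],
    "Parents": [1594, 902, 331, 180],
}


def relative_frequencies(counts):
    """Divide each count by the group total, rounded to two decimals."""
    total = sum(counts)
    return [round(count / total, 2) for count in counts]


rel_freq = {group: relative_frequencies(counts)
            for group, counts in frequencies.items()}

# Print the relative frequency table, one category per row.
for i, category in enumerate(categories):
    students = rel_freq["Students"][i]
    parents = rel_freq["Parents"][i]
    print(f"{category:<22}{students:>6.2f}{parents:>6.2f}")
```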
What does this graph show about the ideal distance college should be from home?
Segmented (or Stacked) Bar Charts
When to Use: Categorical data
How to construct
– MUST first calculate relative frequencies
– Draw a bar representing 100% of the group
– Divide the bar into segments corresponding to the relative frequencies of the categories
Remember the Princeton survey . . .
Relative Frequency
Ideal Distance           Students   Parents
Less than 250 miles      .35        .53
250 to 500 miles         .31        .30
500 to 1000 miles        .19        .11
More than 1000 miles     .15        .06
Create a segmented bar graph with these data.
First draw a bar that represents 100% of the students who answered the survey.
[Figure: segmented bar for Students; vertical axis: relative frequency (0.2 to 1.0); key: Less than 250 miles, 250 to 500 miles, 500 to 1000 miles, More than 1000 miles]
Next, divide the bar into segments.
Do the same thing for parents. Don’t forget a key denoting each category.
Notice that this segmented bar chart displays the same relationship
between the opinions of students and parents concerning the ideal
distance that college is from home as the double bar chart does.
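As a rough sketch of the "divide the bar into segments" step, the boundaries of each segment are running totals of the relative frequencies. The function name `segment_boundaries` is hypothetical, and stacking the categories from the bottom of the bar upward in table order is an assumption; the slides do not specify the stacking order.

```python
from itertools import accumulate

categories = ["Less than 250 miles", "250 to 500 miles",
              "500 to 1000 miles", "More than 1000 miles"]
relative_frequency = {
    "Students": [0.35, 0.31, 0.19, 0.15],
    "Parents": [0.53, 0.30, 0.11, 0.06],
}


def segment_boundaries(rel_freqs):
    """Return the (bottom, top) height of each stacked segment."""
    # Running totals give the top of each segment; round to hide float noise.
    tops = [round(total, 2) for total in accumulate(rel_freqs)]
    bottoms = [0.0] + tops[:-1]
    return list(zip(bottoms, tops))


segments = {group: segment_boundaries(values)
            for group, values in relative_frequency.items()}

for group, bounds in segments.items():
    print(group)
    for category, (bottom, top) in zip(categories, bounds):
        print(f"  {category:<22} from {bottom:.2f} to {top:.2f}")
```

The last segment of each bar ends at 1.00, which is the check that the bar really represents 100% of the group.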
Pie (Circle) Chart
When to Use: Categorical data
How to construct
– Draw a circle to represent the entire data set
– Calculate the size of each “slice”: Relative frequency × 360°
– Using a protractor, mark off each slice
To describe – comment on which category had the largest proportion or smallest proportion
Typos on a résumé do not make a very good impression when applying for a job. Senior executives were asked how many typos in a résumé would make them not consider a job candidate. The resulting data are summarized in the table below.
Number of Typos   Frequency   Relative Frequency
1                 60          .40
2                 54          .36
3                 21          .14
4 or more         10          .07
Don’t know        5           .03
Create a pie chart for these data.
First draw a circle to represent the entire data set.
Next, calculate the size of the slice for “1 typo”: .40 × 360° = 144°
Draw that slice.
Repeat for each slice.
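The "repeat for each slice" step is the same multiplication applied to every category. A minimal sketch, using the relative frequencies from the typo table (the variable names are invented for the example):

```python
# Relative frequencies from the résumé typo table.
relative_frequency = {
    "1": 0.40,
    "2": 0.36,
    "3": 0.14,
    "4 or more": 0.07,
    "Don't know": 0.03,
}

# Slice angle in degrees = relative frequency x 360.
angles = {category: round(rel_freq * 360, 1)
          for category, rel_freq in relative_frequency.items()}

for category, angle in angles.items():
    print(f"{category:<12}{angle:>6.1f} degrees")

# The slices should fill the whole circle.
total_angle = round(sum(angles.values()), 1)
print("Total:", total_angle)
```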
Here is the completed pie chart created
using Minitab.
What does this pie chart tell us about the number of typos occurring in
résumés before the applicant would not be considered for a job?
What strikes you as the most distinctive difference among the distributions of exam scores in
classes A, B, & C ?
1. Center
• discuss where the middle of the data falls
• three measures of central tendency: mean, median, & mode
The mean and/or median is typically reported rather than the mode.
What strikes you as the most distinctive difference among the
distributions of scores in classes D,
E, & F?
2. Spread
• discuss how spread out the data is
• refers to the variability in the data
• Measures of spread are range, standard deviation, and IQR
Remember, Range = maximum value – minimum value
Standard deviation & IQR will be discussed in Chapter 4
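For a quick illustration of center and range, the measures can be computed with Python's standard `statistics` module. The exam scores below are hypothetical, invented only for this sketch; they are not the class data shown on the slides.

```python
import statistics

# Hypothetical exam scores, invented for illustration only.
scores = [62, 70, 74, 74, 78, 81, 85, 90, 97]

# Measures of center.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measure of spread: range = maximum value - minimum value.
data_range = max(scores) - min(scores)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", data_range)
```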
What strikes you as the most distinctive difference among the distributions of exam scores in
classes G, H, & I ?
3. Shape
• refers to the overall shape of the distribution
• symmetrical, uniform, skewed, or bimodal
The following slides will discuss these shapes.
Symmetrical
• refers to data in which both sides are (more or less) the same when the graph is folded vertically down the middle
• bell-shaped is a special type: it has a center mound with two sloping tails
1. Collect data by rolling two dice and recording the sum of the two dice. Repeat three times.
2. Plot your sums on the dotplot on the board.
3. What shape does this distribution have?
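As a point of comparison for the class dotplot, the sketch below counts how many of the 36 equally likely outcomes of two fair dice give each sum. It is a theoretical count, not the class data, and assumes fair six-sided dice; a dotplot built from only a few rolls per student may look rougher.

```python
from collections import Counter
from itertools import product

# Every ordered outcome of two fair six-sided dice, tallied by sum.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Text dotplot: one "*" per outcome that produces the sum.
for total in range(2, 13):
    print(f"{total:>2} | {'*' * sums[total]}")
```

The counts rise to a single peak at 7 and fall off equally on both sides, which is the pattern the slide calls symmetrical.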
Uniform
• refers to data in which every class has equal or approximately equal frequency
1. Collect data by rolling a single die and recording the number rolled. Repeat five times.
2. Plot your numbers on the dotplot on the board.
3. What shape does this distribution have?
To help remember the name for this shape, picture soldiers standing in straight lines. What are they wearing?
Skewed
• refers to data in which one side (tail) is longer than the other side
• the direction of skewness is on the side of the longer tail
1. Collect data by finding the age of five coins in circulation (current year minus year of coin) and record the ages.
2. Plot the ages on the dotplot on the board.
3. What shape does this distribution have?
The directions are positively (or right) skewed or negatively (or left) skewed.
Name a variable with a distribution that is negatively skewed.
Bimodal (multi-modal)
• refers to the number of peaks in the shape of the distribution
• Bimodal would have two peaks
• Multi-modal would have more than two peaks
Bimodal distributions can occur when the data set consists of observations from two different kinds of individuals or objects.
Suppose we collect data on the time it takes to drive from San Luis Obispo, California, to Monterey, California. Some people may take the inland route (approximately 2.5 hours) while others may take the coastal route (between 3.5 and 4 hours).
What shape would this distribution have?
What would a distribution be called if it had ONLY one peak?
Unimodal
What strikes you as the most distinctive difference among the
distributions of exam scores in class J ?
5. In context
• You must write your answer in reference to the context in the problem, using correct statistical vocabulary and using complete sentences!
Stem-and-Leaf Displays
When to Use: Univariate numerical data
How to construct
– Select one or more of the leading digits for the stem
– List the possible stem values in a vertical column
– Record the leaf for each observation beside each corresponding stem value
– Indicate the units for stems and leaves in a key or legend
To describe – comment on the center, spread, and shape of the distribution and if there are any unusual features
Each number is split into two parts:
Stem – consists of the first digit(s)
Leaf – consists of the final digit(s)
Use for small to moderate sized data sets. Doesn’t work well for large data sets.
Be sure to list every stem from the smallest to the largest value.
If you have a long list of leaves behind a few stems, you can split stems in order to spread out the distribution.
Can also create comparative stem-and-leaf displays
Remember the data set collected in Chapter 1 – how many piercings do you have? Would a stem-and-leaf display be a good graph for this distribution? Why or
why not?
The following data are the prices per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Create a stem-and-leaf display with these data.
What would an appropriate stem be?
List the stems vertically.
For the observation of “0.32”, write the 2 behind the “3” stem.
Continue recording each leaf with the corresponding stem.
Stem | Leaf
1 | 7
2 | 1 9 8 3
3 | 2 6
4 |
5 | 4
Stem: tenths   Leaf: hundredths
Describe this distribution.
The median price per ounce for dandruff
shampoo is $0.285, with a range of $0.37. The
distribution is positively skewed with an outlier
at $0.54.
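The construction steps and the summary values quoted above can be checked with a short sketch. The function name `stem_and_leaf` is made up for this example; it assumes two-decimal prices, with the tenths digit as the stem and the hundredths digit as the leaf.

```python
import statistics
from collections import defaultdict

# Price per ounce for the eight shampoo brands.
prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]


def stem_and_leaf(values):
    """Map each tenths-digit stem to its hundredths-digit leaves, in data order."""
    leaves = defaultdict(list)
    for value in values:
        hundredths = round(value * 100)  # 0.32 -> 32; round() guards against float error
        stem, leaf = divmod(hundredths, 10)
        leaves[stem].append(leaf)
    # List every stem from smallest to largest, even one with no leaves.
    return {stem: leaves.get(stem, [])
            for stem in range(min(leaves), max(leaves) + 1)}


display = stem_and_leaf(prices)
for stem, stem_leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in stem_leaves)}")

median = statistics.median(prices)
price_range = max(prices) - min(prices)
print("median:", round(median, 3))
print("range:", round(price_range, 2))
```

The empty "4" stem is kept on purpose: it is the gap that makes the 0.54 observation stand out as an outlier.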
The Census Bureau projects the median age in 2030 for the 50 states and Washington D.C. A stem-and-leaf display is shown below.
Notice that you really cannot see a distinctive
shape for this distribution due to the long list of
leaves
We can split the stems in order to better see the
shape of the distribution.
Notice that now you can see the
shape of this distribution.
We use L for lower leaf values (0-4) and H for higher leaf
values (5-9).
The following are data on the percentage of primary-school-aged children who are enrolled in school for 19 countries in Northern Africa and for 23 countries in Central Africa.
Northern Africa
54.6 34.3 48.9 77.8 59.6 88.5 97.4 92.5 83.9 98.8
91.6 97.8 96.1 92.2 94.9 98.6 86.6 96.9 88.9
Central Africa
58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9 43.0
85.0 63.4 58.4 61.9 40.9 73.9 34.8 74.4 97.4
61.0 66.7 79.6
Create a comparative stem-and-leaf display. What is an appropriate
stem?
Let’s truncate the leaves to the unit place.
“4.6” becomes “4”
Be sure to use comparative language when describing
these distributions!
The median percentage of primary-school-aged children enrolled in school is larger for countries in Northern Africa than in Central Africa, but the ranges are approximately the same. The distribution for countries in Northern Africa is strongly negatively skewed, but the distribution for countries in Central Africa is approximately symmetrical.
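A comparative (back-to-back) display can be sketched as follows, truncating each value to the units place as described above. Several things here are assumptions: the helper name `leaves_by_stem`, the choice to put Central Africa on the left and Northern Africa on the right, and sorting the leaves. The sketch uses the Central Africa values exactly as listed above, of which there are 22.

```python
import statistics

northern = [54.6, 34.3, 48.9, 77.8, 59.6, 88.5, 97.4, 92.5, 83.9, 98.8,
            91.6, 97.8, 96.1, 92.2, 94.9, 98.6, 86.6, 96.9, 88.9]
central = [58.3, 34.6, 35.5, 45.4, 38.6, 63.8, 53.9, 61.9, 69.9, 43.0,
           85.0, 63.4, 58.4, 61.9, 40.9, 73.9, 34.8, 74.4, 97.4,
           61.0, 66.7, 79.6]


def leaves_by_stem(values):
    """Truncate each value to a whole number, then split into tens stem and units leaf."""
    result = {}
    for value in sorted(values):
        stem, leaf = divmod(int(value), 10)  # int() truncates: 54.6 -> 54 -> stem 5, leaf 4
        result.setdefault(stem, []).append(leaf)
    return result


north_leaves = leaves_by_stem(northern)
central_leaves = leaves_by_stem(central)

# Shared stems in the middle; the left-hand leaves read outward from the stem.
print("Central Africa |   | Northern Africa")
for stem in range(3, 10):
    left = "".join(str(d) for d in reversed(central_leaves.get(stem, [])))
    right = "".join(str(d) for d in north_leaves.get(stem, []))
    print(f"{left:>14} | {stem} | {right}")

north_median = statistics.median(northern)
central_median = statistics.median(central)
north_range = round(max(northern) - min(northern), 1)
central_range = round(max(central) - min(central), 1)
print("medians:", north_median, round(central_median, 2))
print("ranges:", north_range, central_range)
```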
Histograms
When to Use: Univariate numerical data
How to construct (discrete data)
– Draw a horizontal scale and mark it with the possible values for the variable
– Draw a vertical scale and mark it with frequency or relative frequency
– Above each possible value, draw a rectangle centered at that value with a height corresponding to its frequency or relative frequency
To describe – comment on the center, spread, and shape of the distribution and if there are any unusual features
Constructed differently for discrete versus continuous data
For comparative histograms – use two separate graphs with the same scale on the horizontal axis
Queen honey bees mate shortly after they become adults. During a mating flight, the queen usually takes several partners, collecting sperm that she will store and use throughout the rest of her life. A study on honey bees provided the following data on the number of partners for 30 queen bees.
12 2 4 6 6 7 8 7 8 11 8 3 5 6 7 10 1 9 7 6 9 7 5 4 7 4 6 7 8 10
Create a histogram for the number of partners of the queen bees.
First draw a horizontal axis, scaled with the possible values of the variable of interest.
Next draw a vertical axis, scaled with frequency or relative frequency.
Draw a rectangle above each value with a height corresponding to the frequency.
Suppose we use relative frequency instead of frequency on the vertical axis.
What do you notice about the shapes of these two histograms?
[Figure: histogram of the number of partners; horizontal axis: 0 to 12; vertical axis: 0 to 7]
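The rectangle heights for the queen bee histogram can be tallied with a short sketch. It prints a sideways text histogram with both the frequency and the relative frequency for each possible value; the variable names are invented for the example.

```python
from collections import Counter

# Number of partners for the 30 queen bees.
partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11, 8, 3, 5, 6, 7, 10, 1, 9, 7, 6,
            9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

counts = Counter(partners)
n = len(partners)

# One row per possible value; the bar length is the frequency.
for value in range(min(partners), max(partners) + 1):
    frequency = counts[value]
    relative = frequency / n
    print(f"{value:>2} | {'#' * frequency:<7} {frequency}  {relative:.3f}")
```

Each relative frequency is the frequency divided by the same constant (30), so switching the vertical axis rescales the heights without changing the shape.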
Histograms
When to Use: Univariate numerical data
How to construct (continuous data)
– Mark the boundaries of the class intervals on the horizontal axis
– Draw a vertical scale and mark it with frequency or relative frequency
– Draw a rectangle directly above each class interval with a height corresponding to its frequency or relative frequency
To describe – comment on the center, spread, and shape of the distribution and if there are any unusual features
This is the type of histogram that most students are familiar with.
A study examined the number of hours spent watching TV per day for a sample of children age 1 and for a sample of children age 3. Below are comparative histograms.
[Figure: comparative histograms; panels: Children Age 1, Children Age 3]
Notice the common scale on the horizontal axis
Write a few sentences comparing the distributions.
The median number of hours spent watching TV per day was greater for the 1-year-olds than for the 3-year-olds. The distribution for the 3-year-olds was more
strongly skewed right than the distribution for the 1-year-olds, but the two distributions had similar ranges.
Histograms with unequal intervals
When to use- when you have a concentration of data in the middle with some extreme values
How to construct- construct similar to histograms with continuous data, but with density on the vertical axis
density = (relative frequency for interval) / (width of interval)
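To illustrate the density calculation, here is a minimal sketch with hypothetical class intervals and relative frequencies, invented for this example only. It also checks the property that makes density useful: the area of each rectangle (density times width) equals the relative frequency of its interval.

```python
# Hypothetical unequal-width intervals and relative frequencies (illustration only).
intervals = [(0, 10), (10, 15), (15, 20), (20, 40)]
relative_frequency = [0.10, 0.35, 0.40, 0.15]


def density(rel_freq, interval):
    """Density = relative frequency for the interval / width of the interval."""
    lower, upper = interval
    return rel_freq / (upper - lower)


densities = [density(rf, iv) for rf, iv in zip(relative_frequency, intervals)]

# Area of each rectangle (density x width) recovers the relative frequency.
areas = [d * (upper - lower) for d, (lower, upper) in zip(densities, intervals)]

for (lower, upper), d, area in zip(intervals, densities, areas):
    print(f"{lower:>2} to <{upper:<2}  density {d:.4f}  area {area:.2f}")
```

In this made-up example the wide interval from 20 to 40 gets a short rectangle even though it holds 15% of the data, which is why density, not relative frequency, goes on the vertical axis.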
Cumulative Relative Frequency Plot
When to use - used to answer questions about percentiles
How to construct
- Mark the boundaries of the intervals on the horizontal axis
- Draw a vertical scale and mark it with relative frequency
- Plot the point corresponding to the upper end of each interval with its cumulative relative frequency, including the beginning point
- Connect the points
A percentile is a value with a given percent of observations at or below that value.
The National Climatic Center has been collecting weather data for many years. The annual rainfall amounts for Albuquerque, New Mexico from 1950 to 2008 were used to create the frequency distribution below.
Annual Rainfall (in inches)   Relative frequency   Cumulative relative frequency
4 to <5                       0.052
5 to <6                       0.103
6 to <7                       0.086
7 to <8                       0.103
8 to <9                       0.172
9 to <10                      0.069
10 to <11                     0.207
11 to <12                     0.103
12 to <13                     0.052
13 to <14                     0.052
Find the cumulative relative frequency for each interval: the first is 0.052, the second is 0.052 + 0.103 = 0.155, and the third is 0.155 + 0.086 = 0.241.
Continue this pattern to complete the table.
Annual Rainfall (in inches)   Relative frequency   Cumulative relative frequency
4 to <5                       0.052                0.052
5 to <6                       0.103                0.155
6 to <7                       0.086                0.241
7 to <8                       0.103                0.344
8 to <9                       0.172                0.516
9 to <10                      0.069                0.585
10 to <11                     0.207                0.792
11 to <12                     0.103                0.895
12 to <13                     0.052                0.947
13 to <14                     0.052                0.999
In the context of this problem, explain the meaning of
this value.
In the context of this problem, explain the meaning of
this value.
Why isn’t the final value (0.999) one (1)?
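The running-total pattern for the cumulative column can be sketched with `itertools.accumulate`. The relative frequencies are taken from the rainfall table; the variable names are invented for the example.

```python
from itertools import accumulate

intervals = ["4 to <5", "5 to <6", "6 to <7", "7 to <8", "8 to <9",
             "9 to <10", "10 to <11", "11 to <12", "12 to <13", "13 to <14"]
relative_frequency = [0.052, 0.103, 0.086, 0.103, 0.172,
                      0.069, 0.207, 0.103, 0.052, 0.052]

# Each cumulative value is the previous cumulative value plus the next relative frequency.
cumulative = [round(total, 3) for total in accumulate(relative_frequency)]

for interval, rel, cum in zip(intervals, relative_frequency, cumulative):
    print(f"{interval:<10}{rel:>7.3f}{cum:>7.3f}")
```

The last entry comes out as 0.999 rather than 1 because the listed relative frequencies have each been rounded to three decimal places, so the rounded values do not sum exactly to 1.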
To create a cumulative relative frequency plot, graph a point for the upper value of the interval and the
cumulative relative frequency
Plot a point for each interval. Plot a starting point at (4,0).
Connect the points.
[Figure: cumulative relative frequency plot; horizontal axis: Rainfall (2 to 14); vertical axis: Cumulative relative frequency (0.2 to 1.0)]
What proportion of years had rainfall amounts that were 9.5 inches or less?
Approximately 0.55
Approximately 30% of the years had annual rainfall less than what amount?
Approximately 7.5 inches
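Both readings can be approximated numerically by linear interpolation along the connected points, which is what reading a value off the straight line segments of the plot amounts to. This is a sketch only; the function names are hypothetical, and the points are the upper interval ends paired with the cumulative relative frequencies from the table, starting at (4, 0).

```python
# (upper end of interval, cumulative relative frequency), starting from (4, 0).
points = [(4, 0.0), (5, 0.052), (6, 0.155), (7, 0.241), (8, 0.344), (9, 0.516),
          (10, 0.585), (11, 0.792), (12, 0.895), (13, 0.947), (14, 0.999)]


def proportion_at_or_below(x):
    """Read the plot vertically: cumulative proportion at rainfall amount x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the plotted range")


def rainfall_at_proportion(p):
    """Read the plot horizontally: rainfall amount with cumulative proportion p."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p is outside the plotted range")


proportion = proportion_at_or_below(9.5)
amount = rainfall_at_proportion(0.30)
print(round(proportion, 2))
print(round(amount, 2))
```

The interpolated proportion at 9.5 inches rounds to 0.55, and the interpolated amount for 30% is a little over 7.5 inches, in line with the approximate answers read from the graph.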
Which interval of rainfall amounts had a larger proportion of years – 9 to 10 inches or 10 to 11 inches? Explain.
The interval 10 to 11 inches, because the plot is steeper over that interval, indicating that a larger proportion of years fell in it.
Scatterplots
When to Use: Bivariate numerical data
How to construct
- Draw a horizontal scale and mark it with appropriate values of the independent variable
- Draw a vertical scale and mark it with appropriate values of the dependent variable
- Plot each point corresponding to the observations
To describe - comment on the relationship between the variables
Scatterplots are discussed in much greater depth in
Chapter 5.
Time Series Plots
When to Use
- measurements collected over time at regular intervals
How to construct
- Draw a horizontal scale and mark it with appropriate values of time
- Draw a vertical scale and mark it with appropriate values of the observed variable
- Plot each point corresponding to the observations and connect the points
To describe - comment on any trends or patterns over time
Can be considered bivariate data where the y-variable is the variable measured
and the x-variable is time