Post on 27-Jun-2020
STAT1010 – picturing data
3.2 Visualizing Distributions of data
• A frequency table provides information on the distribution of data.
  - When we discuss the distribution of a variable, we are referring to the possible values, and which of the values occur more (or less) frequently than the others.
Political affiliation   Frequency
Democrat                517         (occurred a lot)
Republican              371
Independent             112         (occurred less frequently)

The first column lists the possible values.
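As a minimal sketch of how such a frequency table is tallied, the Python snippet below counts how often each value occurs. The list of raw responses is an assumption: the slides give only the totals, so the list is rebuilt from them.

```python
from collections import Counter

# Raw responses rebuilt from the counts in the table. The individual
# survey answers are not in the slides, so this list is an assumption.
responses = ["Democrat"] * 517 + ["Republican"] * 371 + ["Independent"] * 112

# Tally how often each possible value occurs.
frequency = Counter(responses)
n = sum(frequency.values())

# most_common() lists the values from most to least frequent.
for value, count in frequency.most_common():
    print(f"{value:<12} {count:>4}  {count / n:.3f}")
```

The last printed column is the relative frequency (count divided by the number of responses).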
The distribution of the data
• The distribution of data is the way the data values are spread over all possible values.
  - What values occur frequently?
  - If the variable is numeric, what is the maximum value? What is the minimum value?
  - What is the “shape” of the distribution?
[Figure: histogram, “Weight of Contents of Cans of Cola”. Horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 15.]
Graphical displays of distributions
• As the saying goes, “a picture is worth 1000 words”, and distributions are often better conveyed using graphics than tables.
Political affiliation   Frequency
Democrat                517
Republican              371
Independent             112

[Figure: bar graph, “Political affiliation in a 1000 person survey”. Horizontal axis: political affiliation (Democrat, Republican, Independent); vertical axis: frequency of affiliation, 0 to 600.]
Pioneer in Statistical Graphics
• Florence Nightingale
  - See video clip from “Joy of Statistics”
Bar graph
• Used to represent frequencies (or relative frequencies) for qualitative or categorical variables.
[Figure: bar graph, “Political affiliation in a 1000 person survey”. Horizontal axis: political affiliation (Democrat, Republican, Independent); vertical axis: frequency of affiliation, 0 to 600.]
Bar graph - labels
• Always provide useful labels.
[Figure: the bar graph “Political affiliation in a 1000 person survey”, annotated to point out the main title, vertical axis label, horizontal axis label, categories, and tick marks.]
Bar graph - formatting
• Some things to remember…
  - Space between bars (specifically when this is a categorical variable plot)
  - Uniform (arbitrary) bar widths
  - Some white space at top
[Figure: the bar graph “Political affiliation in a 1000 person survey”, annotated with these three points.]
Bar graph – Pareto chart
• A bar graph in which the bars are arranged in frequency order is called a Pareto chart.
[Figure: bar graph “Political affiliation in a 1000 person survey” with bars ordered Democrat, Republican, Independent. A Pareto chart (descending order).]
Bar graph – Pareto chart
Not a Pareto chart
[Figure: bar graph “Political affiliation in a 1000 person survey” with bars ordered Democrat, Independent, Republican, so the bars are not in frequency order.]
Bar graph – Pareto chart
Not a Pareto chart (but it is a bar chart)
A Pareto chart (and also a bar chart)
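The difference between the two orderings can be sketched in Python as follows. The helper names `pareto_order` and `is_pareto` are hypothetical, and the check assumes descending order, as in the Pareto chart shown in these slides.

```python
# Frequencies from the political affiliation table, listed alphabetically.
frequency = {"Democrat": 517, "Independent": 112, "Republican": 371}

def pareto_order(freq):
    """Return (category, count) pairs sorted from most to least frequent."""
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)

def is_pareto(counts):
    """True if the counts never increase from one bar to the next."""
    return all(a >= b for a, b in zip(counts, counts[1:]))

ordered = pareto_order(frequency)
print([name for name, _ in ordered])
print(is_pareto([count for _, count in ordered]))  # bars in frequency order
print(is_pareto(list(frequency.values())))         # bars in alphabetical order
```

Sorting changes only the order of the bars, not their heights, so both versions are still bar graphs of the same data.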
Example: Deflategate
• During the 2014 National Football League (NFL) season, there was a scandal called ‘Deflategate’.
• The Patriots were accused of underinflating their game footballs, which would allow for fewer fumbles (an unfair advantage).
• Did it look like the Patriots had fewer fumbles? If so, how many fewer? We will look at the data as the number of plays per fumble (a higher value means fewer fumbles).
Example: Deflategate (offensive plays)
http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible
[Figure: bar graph of offensive plays per fumble for each team. The categories are the teams; the frequency of fumbles is presented as plays per fumble.]
Dot plot – similar to a bar graph
• If there are only a small number of observations (or counts), a dot plot can be used.
• One dot per observation.
• Sometimes seen as a quick and easy plot in the engineering field.
Pie Charts
• Also used to plot qualitative variables.
• A pie chart is a circle divided so that each wedge represents the relative frequency of a particular category.
Political affiliation   Frequency   Relative frequency
Democrat                517         0.517
Republican              371         0.371
Independent             112         0.112

[Figure: pie chart, “Political affiliation in a 1000 person survey”. Democrat 51.7%, Republican 37.1%, Independent 11.2%.]
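A short Python sketch of the arithmetic behind the pie chart: each relative frequency is the count divided by the total, and the wedge angle is that fraction of the full 360 degrees. The variable names are illustrative only.

```python
# Frequencies from the political affiliation table.
frequency = {"Democrat": 517, "Republican": 371, "Independent": 112}
total = sum(frequency.values())

# Relative frequency of each category (fraction of all responses).
relative_frequency = {cat: count / total for cat, count in frequency.items()}

# Each wedge takes up the same fraction of the full circle (360 degrees).
angles = {cat: rel * 360 for cat, rel in relative_frequency.items()}

for cat in frequency:
    print(f"{cat:<12} {relative_frequency[cat]:.3f}  {angles[cat]:6.1f} degrees")
```

The relative frequencies sum to 1, so the wedge angles sum to 360 degrees and the wedges fill the circle exactly.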
Pie Charts
• As I may have mentioned earlier, research has shown that our brains do not interpret pie charts very well.
• Consider other options first before presenting a pie chart.
[Figure: two charts, captioned “Our brains comprehend this one better than this one.”]
Histograms
• A histogram is like a bar graph, but it shows a distribution for a quantitative variable.
• The bars have a natural order (thus, the classes must be quantitative in nature) and the bar widths have specific meaning.
• The bars in a histogram touch each other because there are no gaps between the classes.
Histogram
[Figure: generic histogram. Horizontal axis: Measurement (quantitative values grouped into bins); vertical axis: Frequency (how ‘often’ a value falls into a given bin).]
Histogram Example
• 24 cola cans were sampled and weighed.
• A frequency table and histogram were created:

Class range of values   Frequency
[340,350)               1
[350,360)               11
[360,370)               8
[370,380)               4

[Figure: histogram, “Weight of Contents of Cans of Cola”. Horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 15.]
Histogram Example
[Figure: histogram “Weight of Contents of Cans of Cola” (Weight (grams) against Frequency), annotated with the points below.]
• No space between bars (specifically when this is a quantitative variable plot).
• Rearranging these bars (as we did in a Pareto chart for qualitative data) would not make sense here. The classes are in order from smallest to largest.
• Some white space at top.
• Axes and labels are still important.
Histogram Example
• Same data, more classes (narrower bins)… histogram looks a bit different.

Class range of values   Frequency
[345,350)               1
[350,355)               6
[355,360)               5
[360,365)               1
[365,370)               7
[370,375)               3
[375,380)               1

[Figure: histogram, “Weight of Contents of Cans of Cola”, with bins of width 5. Horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 10.]
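The two frequency tables for the cola cans describe the same 24 observations with different bin widths. As a rough check, the Python sketch below merges the narrow classes (width 5) into classes of width 10. The function `rebin` is a hypothetical helper and assumes each narrow class lies entirely inside one wide class.

```python
from collections import defaultdict

# Narrow classes [lower, upper) of width 5 g, taken from the frequency table.
narrow = {
    (345, 350): 1, (350, 355): 6, (355, 360): 5, (360, 365): 1,
    (365, 370): 7, (370, 375): 3, (375, 380): 1,
}

def rebin(table, width):
    """Merge classes [lower, upper) into wider classes of the given width."""
    wide = defaultdict(int)
    for (lower, _upper), count in table.items():
        start = (lower // width) * width   # left edge of the wider class
        wide[(start, start + width)] += count
    return dict(sorted(wide.items()))

print(rebin(narrow, 10))
```

Merging only loses detail: the wide table can be computed from the narrow one, but not the other way around.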
Example: Deflategate (all plays)
http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible
[Figure: histogram of plays per fumble for all plays. Horizontal axis: a numeric variable; vertical axis: number of teams falling into each bin. The Patriots and their 187 plays/fumble are marked.]
NOTE: This author should have the bars touching each other for a correct histogram presentation. Don’t put space between bars in a histogram.
Displaying Quantitative Data
• Histogram
  - Provides a picture or shape of the distribution of the data.
  - Collects values into bins.
  - Bins should be of equal width and they should touch each other.
  - Different bin choices can yield different pictures.
  - Can show frequencies or relative frequencies.
Stem-and-leaf plots
• We can’t see individual data points in a histogram due to the binning and the use of the bars for frequencies.
• A stem-and-leaf plot is similar to a histogram, but individual data points are identified.
• As with dot plots, this type of plot probably makes the most sense when the number of observations is relatively small.
Stem-and-leaf plots
• One leaf is associated with one data point.
• Example data: 5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6
Here, a ‘leaf’ is the digit one place to the right of the decimal point.
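A minimal Python sketch of a stem-and-leaf plot for the eight example values, with the whole-number part as the stem and the first decimal digit as the leaf. The function name and the choice to print empty stems are assumptions, not taken from the slides.

```python
from collections import defaultdict

# Example data from the slide.
data = [5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6]

def stem_and_leaf(values):
    """Stem = whole-number part, leaf = first digit after the decimal point."""
    rows = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)        # e.g. 2.6 -> 26
        stem, leaf = divmod(tenths, 10)   # 26 -> stem 2, leaf 6
        rows[stem].append(leaf)
    # Include empty stems so that gaps in the data stay visible.
    return [
        f"{stem} | {''.join(str(leaf) for leaf in rows.get(stem, []))}"
        for stem in range(min(rows), max(rows) + 1)
    ]

print("\n".join(stem_and_leaf(data)))
```

Each of the eight values appears as exactly one leaf, so the original data can be read back off the plot.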
Stem-and-leaf example
• Recall the 80 observations on compressive strengths:

105  97 245 163 207 134 218 199 160 196
221 154 228 131 180 178 157 151 175 201
183 153 174 154 190  76 101 142 149 200
186 174 199 115 193 167 171 163  87 176
121 120 181 160 194 184 165 145 160 150
181 168 158 208 133 135 172 171 237 170
180 167 176 158 156 229 158 148 150 118
143 141 110 133 123 146 169 158 135 149
Stem-and-leaf example
• 80 observations
• Min: 76, Max: 245
• Here, a ‘leaf’ represents the “ones place”.
• Looks somewhat like a histogram turned on its side, but we can identify individual data points.
• Gives you a feel for the distribution of the data.

The decimal point is 1 digit(s) to the right of the |

 7 | 6
 8 | 7
 9 | 7
10 | 15
11 | 058
12 | 013
13 | 133455
14 | 12356899
15 | 001344678888
16 | 0003357789
17 | 0112445668
18 | 0011346
19 | 034699
20 | 0178
21 | 8
22 | 189
23 | 7
24 | 5
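To illustrate that individual data points can be read back from a stem-and-leaf plot, the Python sketch below decodes the plot for the compressive strengths into numbers. The function `decode` is a hypothetical helper that assumes each stem counts tens and each leaf is a ones digit.

```python
# The stem-and-leaf plot for the 80 compressive strengths.
plot = """
 7 | 6
 8 | 7
 9 | 7
10 | 15
11 | 058
12 | 013
13 | 133455
14 | 12356899
15 | 001344678888
16 | 0003357789
17 | 0112445668
18 | 0011346
19 | 034699
20 | 0178
21 | 8
22 | 189
23 | 7
24 | 5
"""

def decode(plot_text):
    """Turn each (stem, leaf) pair back into a value: stem * 10 + leaf."""
    values = []
    for line in plot_text.strip().splitlines():
        stem, leaves = line.split("|")
        for leaf in leaves.strip():
            values.append(int(stem) * 10 + int(leaf))
    return values

values = decode(plot)
print(len(values), min(values), max(values))
```

The decoded list can be compared with the count, minimum, and maximum stated for this data set.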
Line charts
• Also used to represent a quantitative variable.
• Created by connecting the ‘center dots’ at the top of the bars of a histogram.
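As a small Python sketch of where those ‘center dots’ sit, the snippet below uses the cola-can classes of width 10 and places one point at the midpoint of each class, at the height of its bar. The variable names are illustrative only.

```python
# Classes [lower, upper) and frequencies from the cola-can frequency table.
classes = {(340, 350): 1, (350, 360): 11, (360, 370): 8, (370, 380): 4}

# One point per bar: x = midpoint of the class, y = height of the bar.
points = [((lower + upper) / 2, count) for (lower, upper), count in classes.items()]

print(points)
```

Connecting these points in order with straight segments gives the line chart.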
Line chart example
[Figure: line chart drawn over a histogram. A histogram is also shown here, but it is not part of the line chart.]
Time-Series Graph
• If a histogram or line chart has a horizontal axis of time, then it is a time-series graph.
• Time-series plots show how things change over time.
• Often used with financial market information or housing data.
Time-Series Graph – example
• A line chart with a horizontal axis of time (Year) → a time-series graph.
Time-Series Graph – example
[Figure: homes sold in Iowa City by zip code and month. Horizontal axis: Year (data by the month).]
1) What is the general trend over the years 2006-2011?
2) What is the general trend within each year?
3) What is the width of the underlying bin?
Time-Series Graph – example
[Figure: number of Olympic medals plotted against Year.]
1) What is the width of the underlying bin?