Chapter 1
description
Transcript of Chapter 1
Chapter 1: January 29, 2004 1
Chapter 1
Picturing Distributions with Graphs
2Chapter 1: January 29, 2004
StatisticsStatistics is a science that involves the extraction of information from numerical data obtained during an experiment or from a sample. It involves:
1. the design of the experiment or sampling procedure,
2. the collection and analysis of the data, and 3. making inferences (statements, assumptions,
guesses) about the population based upon information in a sample.
3Chapter 1: January 29, 2004
Individuals and Variables Individuals
the objects described by a set of datamay be people, animals, or things
Variableany characteristic of an individualcan take different values for different
individuals
4Chapter 1: January 29, 2004
Variables Categorical
Places an individual into one of several groups or categories
Quantitative (Numerical)Takes numerical values for which
arithmetic operations such as adding and averaging make sense
5Chapter 1: January 29, 2004
Example 1.1
Individuals?
Variables?
Categorical?
Quantitative?
6Chapter 1: January 29, 2004
Distribution
Tells what values a variable takes and how often it takes these values
Can be a table, graph, or function
7Chapter 1: January 29, 2004
Displaying Distributions
Categorical variablesPie chartsBar graphs
Quantitative variablesHistogramsStemplots (stem-and-leaf plots)Time Plots
8Chapter 1: January 29, 2004
Year Count Percent
Freshman 18 41.9%
Sophomore 10 23.3%
Junior 6 14.0%
Senior 9 20.9%
Total 43 100.1%
Data Table
Example: Class Make-up on First Day
9Chapter 1: January 29, 2004
Freshman41.9%
Sophomore23.3%
Junior14.0%
Senior20.9%
Pie Chart
Example: Class Make-up on First Day
10Chapter 1: January 29, 2004
41.9%
23.3%
14.0%
20.9%
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
35.0%
40.0%
45.0%
Freshman Sophomore Junior Senior
Year in School
Per
cen
t
Example: Class Make-up on First Day
Bar Graph
11Chapter 1: January 29, 2004
Example: U.S. Solid Waste (2000)
Data Table
Material Weight (million tons) Percent of total
Food scraps 25.9 11.2 %
Glass 12.8 5.5 %
Metals 18.0 7.8 %
Paper, paperboard 86.7 37.4 %
Plastics 24.7 10.7 %
Rubber, leather, textiles 15.8 6.8 %
Wood 12.7 5.5 %
Yard trimmings 27.7 11.9 %
Other 7.5 3.2 %
Total 231.9 100.0 %
12Chapter 1: January 29, 2004
Example: U.S. Solid Waste (2000)
Pie Chart
13Chapter 1: January 29, 2004
Example: U.S. Solid Waste (2000)
Bar Graph
14Chapter 1: January 29, 2004
Example: U.S. Solid Waste (2000)
Bar Graph
15Chapter 1: January 29, 2004
Histograms
For quantitative variables that take many values
Divide the possible values into class intervals (we will only consider equal widths)
Count how many observations fall in each interval (may change to percents)
Draw picture representing distribution
16Chapter 1: January 29, 2004
Case Study
Weight Data
Introductory Statistics classSpring, 1997
Virginia Commonwealth University
17Chapter 1: January 29, 2004
Weight Data
18Chapter 1: January 29, 2004
Weight Data: Frequency Table
Weight Group Count 100 - <120 7 120 - <140 12 140 - <160 7 160 - <180 8 180 - <200 12 200 - <220 4 220 - <240 1 240 - <260 0 260 - <280 1
sqrt(53) = 7.2, or 8 intervals; range (260100=160) / 8 = 20 = class width
19Chapter 1: January 29, 2004
Weight Data: Histogram
Weight
* Left endpoint is included in the group, right endpoint is not.
0
2
4
6
8
10
12
14
Frequency
100 120 140 160 180 200 220 240 260 280
Nu
mb
er o
f st
ud
ents
20Chapter 1: January 29, 2004
21Chapter 1: January 29, 2004
22Chapter 1: January 29, 2004
Histograms: Class Intervals
How many intervals? One rule is to calculate the square root of the sample
size, and round up.
Size of intervals? Divide range of data (maxmin) by number of intervals
desired, and round to convenient number
Pick intervals so each observation can only fall in exactly one interval (no overlap)
23Chapter 1: January 29, 2004
Examining the Distribution of Quantitative Data
Overall pattern of graphShape of the dataCenter of the dataSpread of the data (Variation)
Deviations from overall pattern Outliers
24Chapter 1: January 29, 2004
Shape of the Data Symmetric
bell shapedother symmetric shapes
Asymmetricright skewed left skewed
Unimodal, bimodal
25Chapter 1: January 29, 2004
Symmetric Bell-Shaped
Symmetric Mound-Shaped
Symmetric Uniform
26Chapter 1: January 29, 2004
Asymmetric Skewed to the Left
Asymmetric Skewed to the Right
27Chapter 1: January 29, 2004
28Chapter 1: January 29, 2004
29Chapter 1: January 29, 2004
30Chapter 1: January 29, 2004
Outliers Extreme values that fall outside the
overall patternMay occur naturallyMay occur due to error in recordingMay occur due to error in measuringObservational unit may be fundamentally
different
31Chapter 1: January 29, 2004
32Chapter 1: January 29, 2004
33Chapter 1: January 29, 2004
Stemplots(Stem-and-Leaf Plots)
For quantitative variables Separate each observation into a stem (first part
of the number) and a leaf (the remaining part of the number)
Write the stems in a vertical column; draw a vertical line to the right of the stems
Write each leaf in the row to the right of its stem; order leaves if desired
34Chapter 1: January 29, 2004
Weight Data
35Chapter 1: January 29, 2004
Weight Data:Stemplot(Stem & Leaf Plot)
Key
20|3 means203 pounds
Stems = 10’sLeaves = 1’s
1011121314151617181920212223242526
192
2
1522
5
135
36Chapter 1: January 29, 2004
Weight Data:Stemplot(Stem & Leaf Plot)
10 016611 00912 003457813 0035914 0815 0025716 55517 00025518 00005556719 24520 321 02522 023242526 0
Key
20|3 means203 pounds
Stems = 10’sLeaves = 1’s
37Chapter 1: January 29, 2004
Extended Stem-and-Leaf Plots
If there are very few stems (when the data
cover only a very small range of values),
then we may want to create more stems by
splitting the original stems.
38Chapter 1: January 29, 2004
Extended Stem-and-Leaf PlotsExample: if all of the data values were between 150 and 179, then we may choose to use the following stems:
151516161717
Leaves 0-4 would go on each upper stem (first “15”), and leaves 5-9 would go on each lower stem (second “15”).
39Chapter 1: January 29, 2004
Time Plots
A time plot shows behavior over time. Time is always on the horizontal axis, and the
variable being measured is on the vertical axis. Look for an overall pattern (trend), and
deviations from this trend. Connecting the data points by lines may emphasize this trend.
Look for patterns that repeat at known regular intervals (seasonal variations).
40Chapter 1: January 29, 2004
Class Make-up on First Day(Fall Semesters: 1985-1993)
0%
10%
20%
30%
40%
50%
60%
70%
Percent of ClassThat Are Freshman
1985 1986 1987 1988 1989 1990 1991 1992 1993
Year of Fall Semester
Class Make-up On First Day
41Chapter 1: January 29, 2004
42Chapter 1: January 29, 2004
Average Tuition (Public vs. Private)