Unit # 4: Statistics
6th grade
Statistics is the study of how to collect, organize, analyze, and interpret data.
“We are rapidly entering a world where everything can be monitored and measured but the big problem is going to be the ability of humans to use, analyze, and make sense of the data.”
The New York Times, August 6, 2009
Essential Questions
1. How do we properly collect data?
2. How do we organize data?
3. How do we analyze data?
4. How do we interpret results?
To properly collect data, we must first ask a question that will result in multiple and varying answers rather than a single answer.
A statistical question will result in variability in the data and involve a real-world context.
Sort the following questions into two groups: statistical and non-statistical.
1. How old is the oldest student in our class?
2. Are there more boys or girls in our class?
3. How old are the students in our class?
4. How long do students in this class spend on math homework each week?
5. How many students in our class like to watch scary movies?
6. What types of movies are preferred by students in our class?
Statistical or Non-Statistical?
Statistical
3. How old are the students in our class?
4. How long do students in this class spend on math homework each week?
6. What types of movies are preferred by students in our class?
Non-Statistical
1. How old is the oldest student in our class?
2. Are there more boys or girls in our class?
5. How many students in our class like to watch scary movies?
Decide whether the questions in your envelope are statistical or non-statistical. Discuss.
If a question is non-statistical, discuss how you could rewrite it to be a statistical question that contains multiple and varying answers.
http://grade6commoncoremath.wikispaces.hcpss.org/Unit+5+Statistics+Probability
6.SP.1 Questions for Sorting
Get into groups of 3.
Pick one of the following topics:
1. TV habits
2. Cell phone usage
3. Eating fruits and vegetables
4. Watching sports
Write a statistical question and a non-statistical question about your topic.
Day 2
Bell work:
Write a statistical question you could ask your classmates about the movies they like.
You will have 10 minutes to ask your classmates the statistical question you wrote during bell work. Create a line plot or a tally chart to display the results.
[Examples shown: a line plot and a tally chart]
A statistical question is a question that will result in multiple and varying answers rather than a single answer.
Statistical questions can be divided into two groups: categorical and numerical.
Categorical data
Categorical data is measured qualitatively by placing the items into categories or groups. Ex. Favorite color, male or female, hair color
Categorical data is often displayed using a bar graph or a circle graph.
Categorical data often does not involve numbers, so it does not have a mean, median, or range.
The mode is the answer that occurs most often. The mode is the most useful measure for categorical data.
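To make the idea of a mode concrete, here is a minimal Python sketch (Python is our choice; the unit itself uses no code). The function name `modes` and the survey answers are hypothetical, made up for illustration. It also assumes the common classroom convention that a data set in which every answer appears only once has no mode.

```python
from collections import Counter


def modes(answers):
    """Return every answer that occurs most often (a data set can have more than one mode).

    Assumed convention: if every answer occurs only once, there is no mode,
    so an empty list is returned.
    """
    counts = Counter(answers)  # tally how many times each category appears
    if not counts:
        return []
    most = max(counts.values())
    if most == 1:
        return []
    # Keep the categories in the order they were first seen.
    return [answer for answer, count in counts.items() if count == most]


# Hypothetical survey: "What is your favorite color?"
favorite_colors = ["blue", "red", "blue", "green", "red", "blue"]
print(modes(favorite_colors))  # ['blue']
```

A mean or median would not make sense for these answers, because colors cannot be added or placed in numerical order; counting is the only operation needed.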
Numerical Data
Numerical data is measured quantitatively and has a value or number for which operations such as addition or averaging make sense.
Numerical data can be analyzed to find the mean, median, range, and other measures.
Numerical data is often displayed using a line plot (dot plot), histogram, or box and whisker plot. *We will study each of these in more detail throughout this unit.
Examples of numerical data graphs
Get into groups of 3.
Sort the questions from your envelope into two groups: categorical and numerical.
http://grade6commoncoremath.wikispaces.hcpss.org/Unit+5+Statistics+Probability
6.SP.1 Categorical and Numerical Questions
Identify whether the following graphs are categorical or numerical
Day 3
Activity:
In five days, it snowed 4 inches, 3 inches, 5 inches, 1 inch, and 2 inches.
[Figure: five stacks of cubes labeled 4 in, 3 in, 5 in, 1 in, and 2 in]
Move the cubes until each stack has the same number of cubes.
Oklahoma Math Connects Course 1 – Glencoe 2011 – pg. 102
1. On average, how many inches did it snow per day in five days? Explain your reasoning.
2. Suppose on the sixth day it snowed 9 inches. If you moved the cubes again, how many cubes would be in each stack?
Activity Summary:
Mean (or average) – the result of equally distributing all the data in a set. The mean can be found in either of the following ways:
1. Level all the values so they have an equal distribution.
http://www.shodor.org/interactivate/activities/PlopIt/
2. Divide the sum of the data by the number of pieces of data.
Ex. Data set: 4, 3, 2, 5, 1
Mean = (4 + 3 + 2 + 5 + 1)/5 = 15/5 = 3
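Both ways of finding the mean can be sketched in Python. This is a minimal illustration under one stated assumption: the hypothetical `level_stacks` function works with whole cubes, moving one cube at a time from a tallest stack to a shortest stack, so the stacks end exactly level only when the total divides evenly by the number of stacks. The data are the snowfall amounts from the activity.

```python
def mean(data):
    """Sum of the data divided by the number of pieces of data."""
    return sum(data) / len(data)


def level_stacks(stacks):
    """Leveling strategy: repeatedly move one cube from a tallest stack to a
    shortest stack until no two stacks differ by more than one cube."""
    stacks = list(stacks)  # work on a copy so the caller's list is unchanged
    while max(stacks) - min(stacks) > 1:
        stacks[stacks.index(max(stacks))] -= 1
        stacks[stacks.index(min(stacks))] += 1
    return stacks


snow = [4, 3, 5, 1, 2]                 # inches of snow on each of five days
print(level_stacks(snow), mean(snow))  # [3, 3, 3, 3, 3] 3.0

six_days = snow + [9]                  # the sixth day from the activity
print(level_stacks(six_days), mean(six_days))  # [4, 4, 4, 4, 4, 4] 4.0
```

When the stacks end level, their common height equals the mean, which is why the two strategies agree.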
Find the mean of the data sets below.
Years of experience on a baseball team: 4, 6, 3, 2, 1, 0, 6, 4, 5, 3, 2
Cell phone usage: 478, 295, 780, 685, 570, 588, 495, 390, 587, 376
Outliers
Outliers are values that are much higher or lower than the others in a data set.
Chapter 5 Test Scores
100 99 98 96 95 88 86 81 79 76 66 64 52 20 0
1. Calculate the mean of the data set.
2. Calculate the mean leaving out the score of 0.
3. Calculate the mean leaving out both the 20 and the 0.
Answers: 1. 73.3, 2. 78.6, 3. 83.1
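A short Python check of the three answers above (a sketch; the helper name `mean` is our own). Filtering by value removes every copy of that value, which is fine here because 0 and 20 each occur only once.

```python
def mean(data):
    return sum(data) / len(data)


scores = [100, 99, 98, 96, 95, 88, 86, 81, 79, 76, 66, 64, 52, 20, 0]

with_all = mean(scores)                                           # 1100 / 15
without_0 = mean([s for s in scores if s != 0])                   # 1100 / 14
without_20_and_0 = mean([s for s in scores if s not in (20, 0)])  # 1080 / 13

print(round(with_all, 1), round(without_0, 1), round(without_20_and_0, 1))
# 73.3 78.6 83.1
```

Leaving out the two low outliers raises the mean by almost 10 points.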
Identify the outlier(s) in these costs:
$115, $125, $55, $135, $400, $105, $115, $140
Calculate the mean of the data set.
Calculate the mean of the data set without the outlier(s).
Answers: The outliers are $400 and $55. The mean of the entire data set is $148.75. The mean without the outliers is $122.50.
Write at least three sentences describing how outliers can distort calculations of the mean.
Day 4
Bell work
Allison’s Math Test Scores
The table shows Allison’s scores on four tests. What score does she need on the 5th test to have an overall mean score of 92?
Test   Score
1      89
2      98
3      85
4      94
5      ?
One possible solution strategy
1. Add up the points for the first 4 test scores: 89 + 98 + 85 + 94 = 366
2. Find out how many points it would take to get a 92 average: 92 x 5 = 460
3. Subtract to find the points needed on the 5th test: 460 - 366 = 94
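The three-step strategy above can be written as one small Python function. This is a sketch that assumes exactly one test remains; the name `score_needed` is hypothetical.

```python
def score_needed(scores_so_far, target_mean, total_tests):
    """Score needed on the one remaining test to reach target_mean over total_tests tests."""
    points_so_far = sum(scores_so_far)         # step 1: 89 + 98 + 85 + 94
    points_needed = target_mean * total_tests  # step 2: 92 x 5
    return points_needed - points_so_far       # step 3: subtract


print(score_needed([89, 98, 85, 94], 92, 5))  # 94
```

If the result is greater than the highest possible score (for example, 100), the target mean cannot be reached.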
Analyzing Numerical Data
A set of data collected to answer a statistical question has a distribution, which can be described by its center, spread, and overall shape.
CENTER - A measure of center summarizes all of the values of a data set with a single number (mean, median, mode).
SPREAD - Describes how spread out or varied the data are (range, interquartile range, mean absolute deviation).
OVERALL SHAPE - Skewed left, skewed right, normal distribution, uniform distribution, bimodal distribution
Measures of Center
Mean – also called the average, fair share, or balance point of a set of data. It can be found using a leveling strategy or by dividing the sum of the data by the number of pieces of data.
Median – the middle number in a set of data ordered from smallest to largest. If the data set has an odd number of elements, the median is the single middle value. If the data set has an even number of elements, the median is the average of the two middle values.
Mode – the number(s) that occur(s) most often (there can be more than one mode in a data set)
Measures of Spread or Variability
Range – the difference between the greatest and least values of the set
Mean Absolute Deviation – the average distance between each data value and the mean of the data set
Hey diddle diddle,
The median’s in the middle;
You add and divide for the mean.
The mode is the one that appears the most,
And the range is the difference between.
Find the mean, median, mode & range.
Points scored by a basketball team
Player   Points
Ethan    16
Collin   9
Nathan   10
Mason    4
Tyler    12
Aaron    8
Cole     2
Find the mean number of points scored.
Find the median number of points scored.
What do the values of the mean and median tell you about the overall shape or distribution of the data?
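The four measures from the rhyme can be computed for the basketball data with a short Python sketch. The helper names are hypothetical, and the `modes` helper assumes the convention that a data set in which every value occurs once has no mode.

```python
from collections import Counter


def median(data):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(data):
    """All values that occur most often; empty list if every value occurs once (assumed convention)."""
    counts = Counter(data)
    most = max(counts.values())
    return [] if most == 1 else sorted(v for v, c in counts.items() if c == most)


points = {"Ethan": 16, "Collin": 9, "Nathan": 10, "Mason": 4,
          "Tyler": 12, "Aaron": 8, "Cole": 2}
values = list(points.values())

mean_points = sum(values) / len(values)  # 61 / 7
print(round(mean_points, 2))             # 8.71
print(median(values))                    # 9
print(modes(values))                     # []
print(max(values) - min(values))         # 14
```

Here the mean (about 8.71) is a little below the median (9).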
Each of the 20 students in Mr. Anderson’s class timed how long it took them to solve a puzzle. Their times (in minutes) are listed below:
Display the data using a line plot.
Find the mean and median of the data. Does it surprise you that the values of the mean and median are not equal? Explain why or why not.
http://www.illustrativemathematics.org/illustrations/877
Puzzle Times
Student         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
Time (minutes)  3  5  4  6  4  8  5  4  9  5  3  4  7  5  8  6  3  6  5  7
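One way to check a hand-drawn line plot is a text version built in Python. This sketch assumes whole-number data with single-digit values so the columns line up; the name `line_plot` is hypothetical.

```python
from collections import Counter

times = [3, 5, 4, 6, 4, 8, 5, 4, 9, 5, 3, 4, 7, 5, 8, 6, 3, 6, 5, 7]


def line_plot(data):
    """Text line plot: one 'x' stacked above each value for every time it occurs."""
    counts = Counter(data)
    scale = range(min(data), max(data) + 1)
    rows = []
    for level in range(max(counts.values()), 0, -1):  # draw the top row first
        rows.append(" ".join("x" if counts[v] >= level else " " for v in scale))
    rows.append(" ".join(str(v) for v in scale))       # the number line
    return "\n".join(rows)


print(line_plot(times))

ordered = sorted(times)
mean_time = sum(times) / len(times)           # 107 / 20
median_time = (ordered[9] + ordered[10]) / 2  # 20 values: average the 10th and 11th
print(mean_time, median_time)                 # 5.35 5.0
```

The mean sits a little above the median because the few longer times (8, 8, and 9 minutes) pull the mean to the right.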
Overall Shape
[Figure: example distributions labeled Normal Distribution, Bimodal Distribution, Uniform Distribution, Skewed Left, and Skewed Right]
Spend a 2nd day on the previous material.
Also, address when finding the mean is the best strategy, when finding the median is best, and when finding the mode is best.
Day 5
This measure is most useful when:
Mean – the data have no outliers (values much larger or smaller than the rest of the data)
Median – the data contain outliers
Mode – the data have many repeated numbers or the data are categorical
Which is the best measure of center for this data set?
Which is the best measure of center for this data set?
Which is the best measure of center for this data set?
1. Jamal said that the number that best represented the following set of data is 27. Which measure of central tendency is he referring to?
28, 32, 21, 25, 33, 32, 20, 26
2. The number of books read by the students in each core literacy class is: 104, 90, 162, 134, 110, 97, 145, 126. Which measure of central tendency best describes the data? Explain.
Glencoe Pre-Algebra (2012) pg. 779
3. The high temperatures for one week are 79°, 81°, 77°, 81°, 82°, 75°, and 76°. If the temperature on the eighth day is 80°, which of the following would be true?
A. The mode will change.
B. The mean will increase and the median will remain the same.
C. The median will increase and the mean will remain the same.
D. Both the mean and the median will increase.
Glencoe Pre-Algebra (2012) pg. 779
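Question 3 can be checked with Python's standard `statistics` module (`mean`, `median`, and `multimode`; `multimode` requires Python 3.8 or later). This sketch only confirms the arithmetic; it does not replace reasoning about why each measure changes.

```python
from statistics import mean, median, multimode

week = [79, 81, 77, 81, 82, 75, 76]  # high temperatures for seven days
eight_days = week + [80]             # add the eighth day

for label, temps in (("7 days", week), ("8 days", eight_days)):
    print(label, "mean:", round(mean(temps), 3),
          "median:", median(temps), "mode:", multimode(temps))
```

The mean rises from about 78.714 to 78.875, the median rises from 79 to 79.5, and the mode stays 81.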
Day 6
Bell work: *ACT Aspire Sample Question
Nigel’s class placed 10 empty rain gauges on the playground Monday morning. The line plot below shows the number of inches of rainwater in each gauge after it rained Monday afternoon.
[Line plot: Number of Inches of Rainwater; the scale is marked 3/8, 1/2, 5/8, 3/4, and 7/8, with one x for each of the 10 gauges]
What is the mean amount of rainwater per gauge, in inches, in the 10 rain gauges?
A. 25/80
B. 5/8
C. 51/80
D. 37/56
E. 51/8
Answer: C
Mean absolute deviation
Measures the average amount that the items of a data set differ from the mean of the set.
1. Find the mean.
2. Find the difference of each value from the mean. Subtract the smaller value from the larger value, so no difference is negative.
3. Find the mean absolute deviation by averaging the differences.
Hours worked on a Saturday: 4, 5, 7, 12
Mean = (4 + 5 + 7 + 12)/4 = 7
Deviations from the mean = 3, 2, 0, 5
Mean absolute deviation = (3 + 2 + 0 + 5)/4 = 10/4 = 2.5
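The three steps above map onto three lines of Python. This is a minimal sketch; the function name is hypothetical, and `abs` plays the role of "subtract the smaller value from the larger value."

```python
def mean_absolute_deviation(data):
    m = sum(data) / len(data)               # step 1: find the mean
    distances = [abs(x - m) for x in data]  # step 2: distance of each value from the mean (never negative)
    return sum(distances) / len(distances)  # step 3: average the distances


hours = [4, 5, 7, 12]                   # hours worked on a Saturday
print(mean_absolute_deviation(hours))  # 2.5
```

A data set whose values are all equal has a mean absolute deviation of 0, because no value differs from the mean.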
Days 7 and 8
See page 81
Bar Graph
Bar graphs can be used to display categorical or numerical data.
Histograms
A histogram is a special type of bar graph used to display numerical data that has been organized into intervals.
The horizontal axis represents the intervals.
The vertical axis represents the frequency, or number of observations, in each interval.
The heights of the bars show the number of people in each group.
To construct a histogram
1. Draw and label a horizontal and vertical axis. Use equal intervals. Every interval from the lowest value to the highest value must be included, even if it has a frequency of 0.
2. Include a title.
3. For each interval, draw a bar whose height is given by the frequency. There is no space between the bars.
Ages of people who entered a store
Age     Frequency
1-10    5
11-20   8
21-30   14
31-40   18
41-50   20
51-60   13
61-70   6
Display the data in a histogram.
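The construction rules can be sketched in Python: one hypothetical function groups whole-number data into equal intervals (keeping intervals with frequency 0), and another prints a sideways text histogram from a frequency table. Text output cannot show "no space between the bars," so treat the text version only as a quick check on a hand-drawn graph. The raw ages in the last example are made up for illustration.

```python
def frequency_table(data, start, width, count):
    """Group whole-number data into `count` equal intervals of `width` values,
    starting at `start`. Every interval is listed, even with frequency 0."""
    table = []
    for i in range(count):
        low = start + i * width
        high = low + width - 1
        frequency = sum(1 for x in data if low <= x <= high)
        table.append((f"{low}-{high}", frequency))
    return table


def text_histogram(title, table):
    """Sideways text histogram: one '#' per observation in each interval."""
    lines = [title]
    for label, frequency in table:
        lines.append(f"{label:>6} | " + "#" * frequency)
    return "\n".join(lines)


store_ages = [("1-10", 5), ("11-20", 8), ("21-30", 14), ("31-40", 18),
              ("41-50", 20), ("51-60", 13), ("61-70", 6)]
print(text_histogram("Ages of people who entered a store", store_ages))

# Hypothetical raw ages, made up to show an empty interval kept at 0.
print(frequency_table([3, 7, 12, 15, 18, 34, 38], start=1, width=10, count=4))
# [('1-10', 2), ('11-20', 3), ('21-30', 0), ('31-40', 2)]
```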
Days 9 and 10
Pre-View of a box and whisker plot
1. What is the scale and interval of the number line below the plot?
2. Are you able to tell how many parts the data set is divided into?
3. What do you think the far right and far left points represent (at the end of the whiskers)?
4. What do you think the line inside the box represents?
Where do you think the following labels should go on our box and whisker plot? Upper extreme, lower extreme, first (lower) quartile, third (upper) quartile, interquartile range, and median.
A Box and Whisker Plot
A box and whisker plot uses a number line to show the distribution of a set of data. It divides a set of data into four parts using the median and quartiles. A box is drawn around the quartile values, and whiskers extend from each quartile to the minimum and maximum values that are not outliers.
Key Features of a Box Plot
Quartiles: The median is the middle quartile. The median of the lower half of the data is the lower quartile. The median of the upper half of the data is the upper quartile.
The lower whisker extends from the lower quartile to the lower extreme. The upper whisker extends from the upper quartile to the upper extreme.
Interquartile Range (IQR) is the difference between the upper quartile and the lower quartile.
Outliers – data values more than 1.5 times the interquartile range below the lower quartile or above the upper quartile
Step 1: Find the median, lower quartile, and upper quartile.
Step 2: Find the interquartile range.
Step 3: Multiply the interquartile range by 1.5.
Step 4: Subtract this value from the lower quartile and add it to the upper quartile. Any data value below the first result or above the second result is an outlier.
Find any outliers in the data set.
Animal Speeds
Animal     Speed (mph)
Squirrel   12
Turkey     15
Elephant   25
Cat        30
Reindeer   32
Rabbit     35
Cheetah    70
Glencoe Pre-Algebra (2012) pg. 793
Step 1: median = 30, lower quartile = 15, upper quartile = 35
Step 2: interquartile range = 35 - 15 = 20
Step 3: IQR x 1.5 = 20 x 1.5 = 30
Step 4: lower quartile - 30 = 15 - 30 = -15; upper quartile + 30 = 35 + 30 = 65
70 is greater than 65, so 70 (the cheetah's speed) is an outlier.
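The four outlier steps can be sketched in Python. Quartile conventions differ: spreadsheets and statistics libraries often interpolate, so their quartiles can differ slightly from the by-hand values. This sketch assumes the median-of-halves method used in the worked example (when the count is odd, the median is left out of both halves) and needs at least 2 values. The function names are hypothetical. The costs from the Day 3 outlier example are included as a second check.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def box_plot_summary(data):
    """Quartiles, IQR, 1.5 x IQR outlier limits, and whisker ends (median-of-halves method)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    q1, q3 = median(lower_half), median(upper_half)             # Step 1
    iqr = q3 - q1                                               # Step 2
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # Steps 3 and 4
    outliers = [x for x in ordered if x < low_limit or x > high_limit]
    kept = [x for x in ordered if low_limit <= x <= high_limit]
    return {
        "median": median(ordered), "q1": q1, "q3": q3, "iqr": iqr,
        "limits": (low_limit, high_limit), "outliers": outliers,
        "whiskers": (kept[0], kept[-1]),  # whiskers end at the extremes that are not outliers
    }


speeds = [12, 15, 25, 30, 32, 35, 70]  # animal speeds (mph)
print(box_plot_summary(speeds))

costs = [115, 125, 55, 135, 400, 105, 115, 140]  # costs from the Day 3 outlier example
print(box_plot_summary(costs)["outliers"])       # [55, 400]
```

For the animal speeds, the limits come out to -15 and 65, matching Step 4, and the upper whisker ends at 35 (the rabbit) because 70 is plotted separately as an outlier.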
Steps to creating a box plot
1. Organize the data from smallest to largest.
2. Identify the lower extreme, lower quartile, median, upper quartile, and upper extreme.
3. Find the interquartile range.
4. Identify any outliers; use an asterisk (*) to indicate an outlier. It is not connected to a whisker.
5. Draw the number line, then mark and label the scale.
6. Draw the box plot.
7. Give the box plot a title.
Write 3-5 sentences summarizing the data displayed.
Key Ideas
The shortest hair measured was 9 cm. The lower quartile is 13 cm. The median is 25 cm. The upper quartile is 33 cm. The longest hair measured was 42 cm. The range is 42 cm - 9 cm = 33 cm. The interquartile range is 33 cm - 13 cm = 20 cm. The middle half of the data lies between 13 cm and 33 cm, so the middle half of the girls measured had hair between 13 cm and 33 cm long. The left part of the box (13 cm to 25 cm) is longer than the right part (25 cm to 33 cm), so the hair lengths just below the median are more spread out than those just above it. Because 25 cm is the median, about half of the girls had hair shorter than 25 cm and about half had hair longer than 25 cm.
Limitations of a box and whisker plot
The box plot provides a summary of the data. It does not show the individual values or the number of observations (so you can’t find the mean), nor does it indicate whether a particular value was especially common (so you can’t find the mode).
Higher-Order Thinking Skills
1. Is the range always, sometimes, or never affected by outliers? Justify your reasoning.
2. True or false: The interquartile range is affected by outliers of the data set. Explain your reasoning.
3. Is it always, sometimes, or never possible for the mean, median, and mode to be equal? Justify your reasoning.
4. Can a data set have more than one median? Explain.
Glencoe Pre-Algebra (2012) pg. 778 and 796