Excursions in Modern Mathematics, 7e: 14.4
Copyright © 2010 Pearson Education, Inc.

14 Descriptive Statistics

14.1 Graphical Descriptions of Data

14.2 Variables

14.3 Numerical Summaries

14.4 Measures of Spread

The Range

An obvious approach to describing the spread of a data set is to take the difference between the highest and lowest values of the data. This difference is called the range of the data set and is usually denoted by R. Thus, R = Max – Min. The range is a useful piece of information when there are no outliers in the data. In the presence of outliers, the range tells a distorted story, because it depends only on the two most extreme values.

For example, the range of the test scores in the Stat 101 exam is 24 – 1 = 23 points, an indication of a big spread within the scores (i.e., a very heterogeneous group of students). True enough, but if we discount the two outliers, the remaining 73 test scores would have a much smaller range of 16 – 6 = 10 points.
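As a minimal sketch of R = Max – Min in Python, the snippet below shows two outliers more than doubling the range. The function name `data_range` and the scores are hypothetical, chosen only to echo the extremes quoted above; they are not the actual Stat 101 data.

```python
def data_range(values):
    """Return the range R = Max - Min of a non-empty list of numbers."""
    return max(values) - min(values)

# Hypothetical scores: a tight core of values plus two outliers (1 and 24).
core_scores = [6, 8, 9, 10, 11, 12, 14, 16]
with_outliers = core_scores + [1, 24]

print(data_range(with_outliers))  # 23: the two outliers become Max and Min
print(data_range(core_scores))    # 10: the range of the core alone
```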

The Interquartile Range

To eliminate the possible distortion caused by outliers, a common practice when measuring the spread of a data set is to use the interquartile range, denoted by the acronym IQR. The interquartile range is the difference between the third quartile and the first quartile (IQR = Q3 – Q1), and it tells us how spread out the middle 50% of the data values are. For many types of real-world data, the interquartile range is a useful measure of spread.
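The following Python sketch computes quartiles and the IQR. Quartile conventions differ between textbooks and software, so as an assumption it uses a locator rule on the sorted data: compute L = p·N/100; if L is a whole number, average the L-th and (L+1)-th values, otherwise round L up and take that value. Python's `statistics.quantiles` uses a different default method and can give slightly different quartiles. The function names and data are hypothetical.

```python
def percentile(values, p):
    """p-th percentile (0 < p < 100) using an assumed locator-rule convention."""
    d = sorted(values)
    n = len(d)
    num = p * n  # the locator is L = num / 100; keep the numerator as an integer
    if num % 100 == 0:
        k = num // 100                # L is a whole number
        return (d[k - 1] + d[k]) / 2  # average of the L-th and (L+1)-th values
    k = num // 100 + 1                # L is not whole: round it up
    return d[k - 1]                   # the k-th value (1-indexed)

def iqr(values):
    """Interquartile range IQR = Q3 - Q1."""
    return percentile(values, 75) - percentile(values, 25)

# Hypothetical data, then the same data with one extreme outlier added.
scores = [3, 5, 6, 7, 8, 9, 10, 12]
with_outlier = scores + [50]

print(percentile(scores, 25), percentile(scores, 75), iqr(scores))  # 5.5 9.5 4.0
print(iqr(with_outlier))                                            # 4
print(max(scores) - min(scores), max(with_outlier) - min(with_outlier))  # 9 47
```

Here the outlier leaves the IQR unchanged while the range jumps from 9 to 47, which is the robustness the paragraph above describes.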

Example 14.18 2007 SAT Math Scores: Part 3

The five-number summary for the 2007 SAT math scores included Min = 200 (yes, there were a few jokers who missed every question!), Q1 = 430, Q3 = 590, and Max = 800 (there are still a few geniuses around!). It follows that the 2007 SAT math scores had a range of 600 points (R = 800 – 200 = 600) and an interquartile range of 160 points (IQR = 590 – 430 = 160).
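The arithmetic in this example can be checked with a few lines of Python. The sketch uses only the summary values quoted in the example; the median is not needed for either measure, and the variable names are illustrative.

```python
# Values from the 2007 SAT math five-number summary quoted in the example.
sat_min, sat_q1, sat_q3, sat_max = 200, 430, 590, 800

sat_range = sat_max - sat_min  # R = Max - Min
sat_iqr = sat_q3 - sat_q1      # IQR = Q3 - Q1
print(sat_range, sat_iqr)      # 600 160
```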

Standard Deviation

The most important and most commonly used measure of spread for a data set is the standard deviation. The key concept for understanding the standard deviation is the concept of deviation from the mean. If A is the average of the data set and x is an arbitrary data value, the difference x – A is x’s deviation from the mean. The deviations from the mean tell us how “far” the data values are from the average value of the data. The idea is to use this information to figure out how spread out the data is.

The deviations from the mean are themselves a data set, which we would like to summarize. One way would be to average them, but if we do that, the negative deviations and the positive deviations always cancel each other out, so we end up with an average of 0. (If there are N data values, the deviations add up to the sum of the data values minus N × A, which is 0 because A is that sum divided by N.) This, of course, makes the average useless in this case. The cancellation of positive and negative deviations can be avoided by squaring each of the deviations.
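The cancellation is easy to see numerically. This Python sketch uses a hypothetical data set and exact fractions (to avoid rounding noise): the deviations average out to exactly 0, while the squared deviations are never negative.

```python
from fractions import Fraction

# Hypothetical data set; Fraction keeps the arithmetic exact.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = Fraction(sum(data), len(data))  # A = 5

deviations = [x - mean for x in data]       # x - A for each data value
squared = [dev ** 2 for dev in deviations]  # (x - A)^2, never negative

print(sum(deviations) / len(deviations))    # 0: positive and negative deviations cancel
print(all(s >= 0 for s in squared))         # True
```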

The squared deviations are never negative, and if we average them, we get an important measure of spread called the variance, denoted by V. Finally, we take the square root of the variance and get the standard deviation, denoted by the Greek letter σ (and sometimes by the acronym SD).

The following is an outline of the definition of the standard deviation of a data set.

THE STANDARD DEVIATION OF A DATA SET

■ Let A denote the mean of the data set. For each number x in the data set, compute its deviation from the mean (x – A), and square each of these numbers. These numbers are called the squared deviations.

■ Find the average of the squared deviations. This number is called the variance V.

■ The standard deviation is the square root of the variance: σ = √V.
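The three steps of this outline translate directly into code. The Python sketch below is illustrative (the function names and the data are hypothetical) and, like the outline, divides by the number of data values N.

```python
import math

def variance(values):
    """Variance V: the average of the squared deviations from the mean."""
    mean = sum(values) / len(values)                  # A, the mean
    squared_devs = [(x - mean) ** 2 for x in values]  # (x - A)^2 for each x
    return sum(squared_devs) / len(values)            # average, dividing by N

def standard_deviation(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set with mean 5
print(variance(data), standard_deviation(data))  # 4.0 2.0
```

Dividing by N matches the outline and Python's `statistics.pvariance` and `statistics.pstdev`. Some calculators and Python's `statistics.stdev` instead divide by N – 1 (the sample standard deviation), which gives a slightly larger value for the same data.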

Example 14.19 Calculation of a SD

Over the course of the semester, Angela turned in all of her homework assignments. Her grades on the 10 assignments (sorted from lowest to highest) were 85, 86, 87, 88, 89, 91, 92, 93, 94, and 95. Our goal in this example is to calculate the standard deviation of this data set the old-fashioned way (i.e., doing our own grunt work).

The first step is to find the mean A of the data set. The 10 grades add up to 900, so A = 900/10 = 90.

The second step is to calculate the deviations from the mean (–5, –4, –3, –2, –1, 1, 2, 3, 4, and 5) and then the squared deviations (25, 16, 9, 4, 1, 1, 4, 9, 16, and 25). The squared deviations add up to 110, so their average is 110/10 = 11. This means that the variance is V = 11, and thus the standard deviation (rounded to one decimal place) is

σ = √11 ≈ 3.3 points.
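The same grunt work can be reproduced step by step in Python. This is a sketch for checking the arithmetic of Example 14.19; the variable names are illustrative.

```python
import math

# Angela's 10 homework grades from Example 14.19.
grades = [85, 86, 87, 88, 89, 91, 92, 93, 94, 95]

mean = sum(grades) / len(grades)             # A = 900 / 10 = 90.0
deviations = [g - mean for g in grades]      # -5.0, -4.0, ..., 5.0
squared_devs = [d ** 2 for d in deviations]  # 25.0, 16.0, ..., 25.0
variance = sum(squared_devs) / len(grades)   # V = 110 / 10 = 11.0
sd = math.sqrt(variance)                     # sigma = sqrt(11)

print(mean, variance, round(sd, 1))          # 90.0 11.0 3.3
```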

Interpreting the Standard Deviation

It is clear from just a casual look at Angela’s homework scores that she was pretty consistent in her homework, never straying too much above or below her average score of 90 points. The standard deviation is, in effect, a way to measure this degree of consistency (or lack thereof). A small standard deviation tells us that the data are consistent and the spread of the data is small, as is the case with Angela’s homework scores.

The ultimate in consistency within a data set is when all the data values are the same (like Angela’s friend Chloe, who got a 20 on every homework assignment). When this happens, every deviation from the mean is 0, so the standard deviation is 0.

On the other hand, when there is a lot of inconsistency within the data set, we are going to get a large standard deviation. This is illustrated by Angela’s other friend, Tiki, whose homework scores were 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95. We would expect the standard deviation of this data set to be quite large, and in fact it is almost 29 points (the mean is 50, the variance is 825, and √825 ≈ 28.7).
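A short Python sketch puts the three friends side by side. The helper `standard_deviation` is a hypothetical name; it averages the squared deviations (dividing by N) as in the definition above.

```python
import math

def standard_deviation(values):
    """Square root of the average squared deviation from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

angela = [85, 86, 87, 88, 89, 91, 92, 93, 94, 95]
chloe = [20] * 10               # the same score on every assignment
tiki = list(range(5, 100, 10))  # 5, 15, 25, ..., 95

for name, scores in [("Angela", angela), ("Chloe", chloe), ("Tiki", tiki)]:
    print(name, round(standard_deviation(scores), 1))
# Angela 3.3
# Chloe 0.0
# Tiki 28.7
```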

Summary of the Standard Deviation

The standard deviation is arguably the most important and most frequently used measure of data spread. Yet it is not a particularly intuitive concept. Here are a few basic guidelines that recap our preceding discussion:

■ The standard deviation of a data set is measured in the same units as the original data. For example, if the data are points on a test, then the standard deviation is also given in points. Conversely, if the standard deviation is given in dollars, then we can conclude that the original data must have been money (prices, salaries, or something like that). For sure, the data couldn’t have been test scores on an exam.

■ It is pointless to compare standard deviations of data sets that are given in different units. Even for data sets that are given in the same units (say, test scores), the underlying scale should be the same. We should not try to compare standard deviations for SAT scores measured on a scale of 200–800 points with standard deviations of a set of homework assignments measured on a scale of 0–100 points.

■ For data sets that are based on the same underlying scale, a comparison of standard deviations can tell us something about the spread of the data. If the standard deviation is small, we can conclude that the data points are all bunched together, with very little spread. As the standard deviation increases, we can conclude that the data points are beginning to spread out. The more spread out they are, the larger the standard deviation becomes. A standard deviation of 0 means that all data values are the same.

As a measure of spread, the standard deviation is particularly useful for analyzing real-life data.
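To see why standard deviations computed on different scales should not be compared directly, the sketch below rescales Angela’s grades. The 0–10 scale is a hypothetical conversion, not taken from the text, and `standard_deviation` is a hypothetical helper that divides by N as in the definition. Dividing every value by 10 divides the standard deviation by 10 as well, even though her consistency is unchanged.

```python
import math

def standard_deviation(values):
    """Square root of the average squared deviation from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

# Angela's grades on a 0-100 scale, and the same grades on a hypothetical 0-10 scale.
grades_100 = [85, 86, 87, 88, 89, 91, 92, 93, 94, 95]
grades_10 = [g / 10 for g in grades_100]

sd_100 = standard_deviation(grades_100)
sd_10 = standard_deviation(grades_10)
print(round(sd_100, 2), round(sd_10, 2))  # 3.32 0.33
```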