Chapter 12: Statistics - websites.rcc.edu

Chapter 12: Statistics

Diana Pell

Section 12.1: Gathering and Organizing Data

Data are measurements or observations that are gathered for an event under study.

Statistics is the branch of mathematics that involves collecting, organizing, summarizing, and presenting data and drawing general conclusions from the data.

A population consists of all subjects under study.

A sample is a representative subgroup or subset of a population.

In order to obtain a random sample, each subject of the population must have an equal chance of being selected.

Descriptive and Inferential Statistics

Statistical techniques used to describe data are called descriptive statistics. This is based on collecting, organizing, and reporting data without using the data to draw any wide-ranging conclusions.

Statistical techniques used to make inferences are called inferential statistics. This is based on studying characteristics of a sample within a larger population and using them to draw conclusions about the entire population.

Exercise 1. Twenty-five volunteers for a medical research study were given a blood test to obtain their blood types. The data obtained are below.

Construct a frequency distribution for the data.

Another type of frequency distribution that can be constructed uses numerical data and is called a grouped frequency distribution. In a grouped frequency distribution, the numerical data are divided into classes.

When deciding on classes, here are some useful guidelines:

1. Try to keep the number of classes between 5 and 15.

2. Make sure the classes don’t overlap.

3. Don’t leave out any numbers between the lowest and highest, even if nothing falls into a particular class.

4. Make sure the range of numbers included in a class is the same for each one.
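The guidelines above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `grouped_frequency` and the sample values are hypothetical (they are not taken from any exercise), and the sketch assumes whole-number data so that class limits such as 3 - 9 and 10 - 16 cannot overlap.

```python
def grouped_frequency(data, num_classes):
    """Build equal-width, non-overlapping classes covering min..max of integer data."""
    low, high = min(data), max(data)
    # Smallest whole-number width for which num_classes classes cover the whole range.
    width = (high - low) // num_classes + 1
    table = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        # A class with no values is still listed, with frequency 0 (guideline 3).
        count = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, count))
    return table


# Hypothetical values chosen only for illustration.
sample = [3, 7, 8, 12, 15, 15, 18, 21, 22, 27, 30, 34]
table = grouped_frequency(sample, 5)
for lower, upper, count in table:
    print(f"{lower:>3} - {upper:<3} {count}")
```

Every class has the same width (guideline 4), and the frequencies add up to the number of data values, which is a quick check that nothing was left out or counted twice.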

Exercise 2. These data represent the record high temperatures for each of the 50 states in degrees Fahrenheit. Construct a grouped frequency distribution for the data using 7 classes.

Exercise 3. In one math class, the data below represent the number of hours each student spends on homework in an average week. Construct a grouped frequency distribution for the data using six classes.

Exercise 4. The data below show the number of games won by the Chicago Cubs in each of the 21 seasons from 1988 to 2011, with the exception of 1994, which was a short season because of a player strike. Draw a stem and leaf plot for the data.

Section 12.2: Picturing Data

Bar Graphs and Pie Charts

Exercise 5. The marketing firm Deloitte Retail conducted a survey of grocery shoppers. The frequency distribution below represents the responses to the survey question “How often do you bring your own bags when grocery shopping?”

Response Frequency

Always 10

Never 39

Frequently 19

Occasionally 32

a) Draw a vertical bar graph to represent the data.

b) Draw a pie chart for the data.

Histograms

Exercise 6. One of the biggest business stories of 2012 was Facebook’s initial public offering of stock, an event that didn’t live up to the hype. The frequency distribution below is for the closing price of the stock in dollars for each of the first 26 business days the stock was traded. Draw a histogram for the data and analyze it.

Class Frequency

25.50 - 27.49 8

27.50 - 29.49 4

29.50 - 31.49 4

31.50 - 33.49 8

33.50 - 35.49 1

35.50 - 37.49 0

37.50 - 39.49 1

Section 12.3: Measures of Average

The Mean

Definition 1. The mean is the sum of the values in a data set divided by the number of values. If $x_1, x_2, \ldots, x_n$ are the data values, we use $\bar{X}$ to stand for the mean, and

$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\Sigma X}{n}$$

Exercise 7. In 2003, there were 12 inmates on death row who were exonerated and freed. In the 8 years after that, there were 6, 2, 1, 3, 4, 9, 1, and 1. Find the mean number of death row inmates proven innocent for the 9 years from 2003 through 2011.

Formula for Finding the Mean for Grouped Data:

$$\bar{X} = \frac{\Sigma (f \cdot x_m)}{n}$$

where

$f$ = frequency

$x_m$ = midpoint of each class

$n = \Sigma f$, the sum of the frequencies (which is the total number of data values)
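As a rough sketch of how the grouped-data formula works, the Python below multiplies each class midpoint by its frequency, adds the products, and divides by the total frequency. The function name `grouped_mean` and the three classes are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)  # n is the sum of the frequencies
    # The midpoint x_m of a class is the average of its lower and upper limits.
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# Hypothetical classes: midpoints are 4.5, 14.5, and 24.5.
sample_classes = [(0, 9, 2), (10, 19, 5), (20, 29, 3)]
mean = grouped_mean(sample_classes)
print(mean)  # (2*4.5 + 5*14.5 + 3*24.5) / 10 = 15.5
```

Because every value in a class is treated as if it sat at the midpoint, the result is an estimate of the mean of the original data, not the exact mean.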

Exercise 8. Find the mean closing price of Facebook stock for the first 26 days it was traded. The frequency distribution is below.

Class Frequency

25.50 - 27.49 8

27.50 - 29.49 4

29.50 - 31.49 4

31.50 - 33.49 8

33.50 - 35.49 1

35.50 - 37.49 0

37.50 - 39.49 1

The Median

Definition 2. The median is the halfway point of a data set when it’s arranged in order.

Steps in Computing the Median of a Data Set

1. Arrange the data in order, from smallest to largest.

2. Select the middle value. If the number of data values is odd, the median is the value in the exact middle of the list. If the number of data values is even, the median is the mean of the two middle data values.
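The two steps above can be sketched in Python as follows. The function name `median` and the two short lists are hypothetical and are used only to show the odd and even cases side by side.

```python
def median(values):
    ordered = sorted(values)  # Step 1: arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the one in the exact middle.
        return ordered[mid]
    # Even number of values: take the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([9, 2, 7, 4, 5])  # ordered: 2, 4, 5, 7, 9
even_result = median([9, 2, 7, 4])    # ordered: 2, 4, 7, 9
print(odd_result, even_result)        # 5 5.5
```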

Exercise 9. The ages of seven members of a family living in a single household are 73, 43, 39, 8, 6, 5, and 3. Find the median age. How does it compare to the mean?

Exercise 10. One of the authors of this book conducts a campus food drive at the end of each semester. The amount of food in pounds gathered over the last eight semesters is shown below. Find the median weight.

2,171 2,292 2,650 2,830 3,150 3,211 3,301 3,398 3,693 4,405

The Mode

Definition 3. The value that occurs most often in a data set is called the mode.

Exercise 11. These data represent the duration (in days) of the final 20 U.S. space shuttle voyages. Find the mode.

11 12 13 12 15 12 15 13 15 12 12 15 13 10 13 15 11 12 15 12

Exercise 12. The number of wins in a 16-game season for the Cincinnati Bengals from 1996 to 2011 is listed below. Find the mode.

8 7 3 4 4 6 2 8 8 11 8 7 4 10 4 9

Exercise 13. A survey of the junior class at Fiesta State University shows the following number of students majoring in each field. Find the mode.

Major Number of students

Business 1,425

Liberal arts 878

Computer science 632

Education 471

General studies 95

The Midrange

Definition 4. (Finding the Midrange for a Data Set)

$$\text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2}$$
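To see how the four measures of average in this section can differ on the same data, here is a small Python sketch using the standard `statistics` module. The data values are hypothetical and are not from any exercise.

```python
import statistics

# Hypothetical data set used only for illustration.
data = [4, 8, 8, 5, 3, 8, 6]

mean = statistics.mean(data)            # sum 42 divided by 7 values
median = statistics.median(data)        # middle of 3, 4, 5, 6, 8, 8, 8
modes = statistics.multimode(data)      # list of the most frequent value(s)
midrange = (min(data) + max(data)) / 2  # (lowest + highest) / 2

print(mean, median, modes, midrange)    # 6 6 [8] 5.5
```

`statistics.multimode` (Python 3.8 and later) returns a list, which is convenient because a data set can have more than one mode.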

Exercise 14. The number of death row inmates exonerated for the years 2003 - 2011 was 12, 6, 2, 1, 3, 4, 9, 1, and 1. Find the midrange.

Exercise 15. The table below lists the number of golfers who finished the Masters tournament with a better score than Tiger Woods between 1997 and 2012.

Year Number

1997 0

1998 7

1999 17

2000 4

2001 0

2002 0

2003 14

2004 21

2005 0

2006 2

2007 1

2008 1

2009 5

2010 3

2011 3

2012 39

Find

a) mean

b) median

c) mode

d) midrange

Section 12.4: Measures of Variation

Range

Definition 5. The range of a data set is the difference between the highest and lowest values in the set.

Range = Highest value − Lowest value

Exercise 16. The first list below is the weights of the dogs in the first picture, and the second is the weights of the dogs in the second picture. Find the mean and range for each list.

70 73 58 60

30 85 40 125 42 75 60 55

Procedure for Finding the Variance and Standard Deviation

1. Find the mean.

2. Subtract the mean from each data value in the data set.

3. Square the differences.

4. Find the sum of the squares.

5. Divide the sum by n − 1 to get the variance, where n is the number of data values.

6. Take the square root of the variance to get the standard deviation.
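The six-step procedure above can be followed line by line in Python. This is a sketch only: the function name `sample_variance_and_sd` and the five data values are hypothetical, picked so that the mean is a whole number and each step can be checked by hand.

```python
import math


def sample_variance_and_sd(data):
    n = len(data)
    mean = sum(data) / n                     # Step 1: find the mean
    differences = [x - mean for x in data]   # Step 2: subtract the mean from each value
    squares = [d ** 2 for d in differences]  # Step 3: square the differences
    total = sum(squares)                     # Step 4: sum the squares
    variance = total / (n - 1)               # Step 5: divide by n - 1
    return variance, math.sqrt(variance)     # Step 6: square root gives the standard deviation


# Hypothetical data: mean is 5, squared differences are 9, 1, 1, 1, 16.
variance, sd = sample_variance_and_sd([2, 4, 4, 6, 9])
print(variance)       # 28 / 4 = 7.0
print(round(sd, 2))   # 2.65
```

The standard library functions `statistics.variance` and `statistics.stdev` also divide by n − 1, so they can be used to check hand calculations.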

Exercise 17. The heights in inches of the top six scorers for the Cleveland Cavaliers during the 2011 - 2012 season are listed below. Find the variance and standard deviation.

75 81 83 78 81 74

Exercise 18. (You Try!) The heights in inches for the players in the starting lineup for the New York Mets (a professional baseball team) on opening day 2012 are listed below. Find the variance and standard deviation.

70 74 72 76 74 76 73 71 72

Definition 6. The variance for a data set is an approximate average of the square of the distance between each value and the mean. If $X$ represents individual values, $\bar{X}$ is the mean, and $n$ is the number of values:

$$s^2 = \frac{\Sigma (X - \bar{X})^2}{n - 1}$$

The standard deviation (s) is the square root of the variance.

Note: Since standard deviation measures how far typical values are from the mean, its size tells us how spread out the data are.

Exercise 19. The mean and standard deviation for heights of all adult males in the United States are 69.3 inches and 2.8 inches, respectively. What do the results of Exercise 17 tell us by comparison?

The mean and standard deviation from Exercise 17 are _____ inches and _____ inches.

The top scorers for a professional basketball team tend to be a lot _____ than average people.

The heights are spread out a bit _____ than the population in general.

Exercise 20. Compare the results of Exercise 18 to the average and standard deviation for the adult male population. What do you think accounts for differences or similarities?

Exercise 21. A professor has two sections of Math 115 this semester. The 8:30 a.m. class has a mean score of 74% with a standard deviation of 3.6%. The 2 p.m. class also has a mean score of 74%, but a standard deviation of 9.2%. What can we conclude about the students’ averages in these two sections?

The morning class has a _____ standard deviation and the afternoon class has a _____.

In the morning class, most of the students probably have scores relatively _____ the mean, with few very high or very low scores. In the afternoon class, the scores vary _____, with a lot of high scores and a lot of low scores that average out to a mean of 74%.

Exercise 22. For the dogs in the two pictures below, discuss what you think the standard deviations might be in comparison to one another.

Section 12.5: Measures of Position

A percentile, or percentile rank, of a data value indicates the percent of data values in a set that are below that particular value.

If your percentile rank on the SAT was 60%, that means that 60% of all students who took the SAT scored lower than you did.
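The definition of percentile rank can be sketched directly in Python: count the values below the one of interest and express that count as a percent of the whole set. The function name `percentile_rank`, the `adjustment` parameter, and the sample scores are all hypothetical. The `adjustment` parameter is an assumption added here because some textbooks add 0.5 to the count of values below; leaving it at 0 matches the definition above.

```python
def percentile_rank(values, x, adjustment=0.0):
    """Percent of the data values that are below x."""
    below = sum(1 for v in values if v < x)
    # adjustment=0.5 gives the variant used in some textbooks (an assumption here).
    return 100 * (below + adjustment) / len(values)


# Hypothetical scores used only for illustration.
scores = [50, 60, 70, 80, 90]
rank = percentile_rank(scores, 70)
adjusted_rank = percentile_rank(scores, 70, adjustment=0.5)
print(rank, adjusted_rank)  # 40.0 50.0
```

If an answer key seems to be off by a few percentile points, it is worth checking which of the two conventions it uses.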

Exercise 23. Suppose you score 77 on a test in a class of 10 people, with the 10 scores listed below. What was your percentile rank?

93 82 64 75 98 52 77 88 90 71

Exercise 24. The weights in pounds for the 12 members of a college gymnastics team are below. Find the percentile rank of the gymnast who weighs 97 pounds.

101 120 88 72 75 80 98 91 105 97 78 85

Exercise 25. The number of words in each of the last 10 presidential inaugural addresses is listed below. Find the length that corresponds to the 30th percentile.

2406 2073 1571 2170 1507 2283 2546 2463 1087 1668

Exercise 26. Two students are competing for one remaining spot in a law school class. Miguel ranked 51st in a graduating class of 1,700, while Dustin ranked 27th in a class of 540. Which student’s position was higher in his class?

Exercise 27. In the 2011 - 2012 school year, the University of Arkansas finished the season ranked fifth out of 120 teams in football and ninth out of 297 teams in baseball. Based on percentile rank, which team had the better ranking?

Another statistical measure we will study is the quartile, which divides a data set into quarters. The second quartile is the same as the median, and divides a data set into an upper half and a lower half. The first quartile is the median of the lower half, and the third quartile is the median of the upper half. We use the symbols Q1, Q2, and Q3 for the first, second, and third quartiles, respectively.
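A minimal Python sketch of the median-of-halves description above follows. The function name `quartiles` and the sample lists are hypothetical. One point is an assumption rather than something stated above: when the number of values is odd, this sketch leaves the median itself out of both halves. Other conventions exist, and software packages may report slightly different quartiles.

```python
import statistics


def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    q1 = statistics.median(lower)
    q2 = statistics.median(ordered)
    q3 = statistics.median(upper)
    return q1, q2, q3


# Hypothetical data sets, one with an even count and one with an odd count.
even_case = quartiles([1, 2, 3, 4, 5, 6, 7, 8])
odd_case = quartiles([1, 2, 3, 4, 5, 6, 7])
print(even_case)  # (2.5, 4.5, 6.5)
print(odd_case)   # (2, 4, 6)
```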

Exercise 28. The data below are the percentages of total electricity generated that comes from nuclear power for the 12 nations most reliant on nuclear power. Find the quartiles.

74.1 51.8 51.2 48.1 42.1 39.4 38.1 38.0 37.3 33.3 33.1 32.2

Exercise 29. The data below are the number of cattle on farms in the United States (in millions) for each year that begins a decade from 1910 to 2010. Find the quartiles.

59.0 70.4 61.0 68.3 78.0 96.2 112.4 111.2 95.8 98.2 93.9

Exercise 30. Draw a box plot for the nuclear power data in Exercise 28, then use it to answer some questions about the data.

1. What does the position of the box tell you about the data set?

2. Find any outliers in the data set.

3. What information about the data set do the values inside the box represent?

Exercise 31. Draw a box plot for the data in Exercise 29, then use it to answer these questions:

1. What does the position of the box tell you about the data?

2. Find any outliers. What does this tell you?

Section 12.6: The Normal Distribution

Definition 7. A normal distribution is a continuous, symmetric, bell-shaped distribution.

[Figure: a bell-shaped normal curve]

1. It is bell-shaped.

2. The mean, median, and mode are all exactly the same, and are located at the center of the distribution.

3. It’s symmetric about its mean. In other words, if you draw a vertical line through the center, the graph is divided into two identical halves.

4. The curve is continuous; it has no gaps or holes, and extends from −∞ to +∞ along the horizontal axis.

5. The area under any portion of the curve is the percentage (in decimal form) of data values that fall between the values that begin and end the region. (We’ll use this property a LOT in Section 12-7.)

6. The total area under the entire curve is 1. This makes sense based on properties 4 and 5: 100% of the data values are somewhere on the real number line.

The Empirical Rule

Definition 8. When data are normally distributed, approximately 68% of the values are within 1 standard deviation of the mean, approximately 95% are within 2 standard deviations of the mean, and approximately 99.7% are within 3 standard deviations of the mean (see Figure below).

[Figure: the empirical rule, showing the areas within 1, 2, and 3 standard deviations of the mean]
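The percentages in the empirical rule can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 and later); the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```

The printed areas are close to 0.68, 0.95, and 0.997, which is why the rule is stated with the word "approximately."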

Exercise 32. According to the website answerbag.com, the mean height for male humans is 5 feet 9.3 inches, with a standard deviation of 2.8 inches. If this is accurate, out of 1,000 randomly selected men, how many would you expect to be between 5 feet 6.5 inches and 6 feet 0.1 inch?

The Standard Normal Distribution

Definition 9. The standard normal distribution is a normal distribution with mean 0 and standard deviation 1.

Note: To change any normal distribution into one with mean 0 and standard deviation 1, we’ll find z scores for given data values.

For a given data value from a data set that is normally distributed, we define that value’s z score to be

$$z = \frac{\text{Data value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$
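The z score formula translates into one line of Python. In this sketch the function name `z_score` and the numbers are hypothetical: a data value of 130 from a distribution with mean 100 and standard deviation 15.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_above = z_score(130, mean=100, std_dev=15)  # (130 - 100) / 15
z_below = z_score(85, mean=100, std_dev=15)   # (85 - 100) / 15
print(z_above, z_below)                       # 2.0 -1.0
```

A positive z score means the value is above the mean, and a negative z score means it is below the mean.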

Exercise 33. According to the website answerbag.com, the mean height for male humans is 5 feet 9.3 inches, with a standard deviation of 2.8 inches. Find the z score for a man who is 6 feet 4 inches tall.

Exercise 34. A standard test of intelligence is scaled so that the mean IQ is 100, and the standard deviation is 15. Find the z score for a person with an IQ of 91.

Finding Areas under the Standard Normal Distribution

Two Important Facts about the Standard Normal Curve

1. The area under any normal curve is divided into two equal halves at the mean. Each of the halves has area 0.500.

2. The area between z = 0 and a positive z score is the same as the area between z = 0 and the negative of that z score.
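Areas under the standard normal curve are usually read from a table, but they can also be computed. The sketch below is one way to do that with `NormalDist` from Python's `statistics` module; the helper names `area_left`, `area_right`, and `area_between` are hypothetical, and the z values are chosen only to illustrate the two facts above.

```python
from statistics import NormalDist

Z = NormalDist()  # defaults to mean 0 and standard deviation 1


def area_left(z):
    return Z.cdf(z)


def area_right(z):
    return 1 - Z.cdf(z)


def area_between(z1, z2):
    low, high = sorted((z1, z2))  # accept the two z scores in either order
    return Z.cdf(high) - Z.cdf(low)


print(area_left(0))                     # 0.5, half of the total area (fact 1)
print(round(area_between(0, 1), 4))     # 0.3413
print(round(area_between(-1, 0), 4))    # 0.3413, same area by symmetry (fact 2)
print(round(area_right(1), 4))          # 0.1587
```

Answers found this way may differ from table answers in the last decimal place, because tables round z scores and areas.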

Exercise 35. Find the area under the standard normal distribution

a. Between z = 1.55 and z = 2.25.

b. Between z = -0.60 and z = -1.35.

c. Between z = 1.50 and z = -1.75.

Exercise 36. Find the area under the standard normal distribution

a. Between z = 2.05 and z = 2.40.

b. Between z = -3.2 and z = -2.0.

c. Between z = -0.55 and z = 1.6.

Exercise 37. Find the area under the standard normal distribution

a. To the right of z = 1.70.

b. To the right of z = 0.95.

Exercise 38. Find the area under the standard normal distribution

a. To the right of z = -2.40.

b. To the right of z = 0.25.

Exercise 39. Find the area under the standard normal distribution

a. To the left of z = -2.20.

b. To the left of z = 1.95.

Exercise 40. Find the area under the standard normal distribution

a. To the left of z = -1.05.

b. To the left of z = 0.1.

Applications of the Normal Distribution

A quantity that can vary randomly from one individual trial to another, like the exact weight of a package of Oreos, is called a random variable.

Exercise 41. If the weights of Oreos in a package are normally distributed with mean 518 grams and standard deviation 4 grams, find the percentage of packages that will weigh less than 510 grams.

Exercise 42. Based on data compiled by the World Health Organization, the mean systolic blood pressure in the United States is 120, the standard deviation is 16, and the pressures are normally distributed. Find each.

a. The percent of individuals who have a blood pressure between 120 and 128.

b. The percent of individuals who have a blood pressure above 132.

c. The percent of individuals who have a blood pressure between 112 and 116.

d. The percent of individuals who have a blood pressure between 124 and 144.

e. The percent of individuals who have a blood pressure lower than 104.

Probability and Area under a Normal Distribution

The area under a normal distribution between two data values is the probability that a randomly selected data value is between those two values.
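Putting the pieces together, a typical application converts a data value to a z score, finds the matching area under the standard normal curve, and reads that area as a probability or an expected count. The Python below is a sketch under stated assumptions: the function name `probability_below` and all the numbers (mean 50, standard deviation 10, a group of 200) are hypothetical.

```python
from statistics import NormalDist


def probability_below(x, mean, std_dev):
    """Probability that a normally distributed value is less than x."""
    z = (x - mean) / std_dev   # convert the data value to a z score
    return NormalDist().cdf(z)  # area to the left of z under the standard normal curve


# Hypothetical setting: mean 50, standard deviation 10, and a group of 200.
p = probability_below(35, mean=50, std_dev=10)  # z = -1.5
expected_count = 200 * p

print(round(p, 4))               # 0.0668
print(round(expected_count, 1))  # 13.4
```

The same area answers three differently worded questions: the probability for one randomly selected value, the percentage of all values, and (after multiplying by the group size) the expected number of values in a group.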

Exercise 43. Based on data in the 2012 Statistical Abstract of the United States, the average American generates 1,570 pounds of garbage per year. Let’s estimate that the number of pounds generated per person is approximately normally distributed with standard deviation 200 pounds. Find the probability that a randomly selected person generates

a. Between 1,250 and 2,050 pounds of garbage per year.

b. More than 2,050 pounds of garbage per year.

Exercise 44. The Statistical Abstract also indicates that of the 1,570 pounds of garbage generated by the average individual, 872 pounds will end up in a landfill. If these amounts are approximately normally distributed with standard deviation 160 pounds, find the probability that a randomly selected person generates

a. Less than 600 pounds that end up in a landfill.

b. Between 600 and 1,000 pounds that end up in a landfill.

Exercise 45. The American Automobile Association reports that the average time it takes to respond to an emergency call is 25 minutes. If the response time is approximately normally distributed and the standard deviation is 4.5 minutes:

a. If 80 calls are randomly selected, approximately how many will have response times less than 15 minutes?

b. In what percentile is a response time of 30 minutes?

Exercise 46. The mean for a reading test given nationwide is 80, and the standard deviation is 8. The random variable is normally distributed. If 10,000 students take the test, find each.

a. The number of students who will score above 90

b. The number of students who will score between 78 and 88

c. The percentile of a student who scores 94

d. The number of students who will score below 76

Exercise 47. A random sample of 25 entrées served by a college cafeteria is tested to find the number of calories per serving, with the data in the list below.

845 460 620 752 683 1,088 785

575 580 755 720 650 512 945

672 526 1,050 725 822 740 773

812 880 911 910

a. Draw a histogram for the data using seven classes. Do the data appear to be normally distributed?

b. Find the mean and standard deviation for the data.

c. What percentage of randomly selected entrées from this cafeteria has more than 700 calories?
