Producing Data


Transcript of Producing Data

Page 1: Producing Data

1

Chapter 3: Numerical Summary Measures

http://anengineersaspect.blogspot.com/2013_05_01_archive.html

Page 2: Producing Data

2

Numerical Summary Measures: Goals
• Describe the center of a distribution by:
  – mean
  – median
  – mode
• Compare the mean and median
• Describe the measure of spread:
  – range
  – variance and standard deviation
  – quartiles
• Be able to determine which summary statistics are appropriate for a given situation
• Empirical Rule and introduction to the normal distribution
• Describe a distribution by a boxplot (five-number summary and outliers)

Page 3: Producing Data

3

Definition

Measures of central tendency indicate where the majority of the data is centered, bunched or clustered.

Page 4: Producing Data

4

Notation

• Lowercase letters x, y, z indicate the variables.
• x1, x2, x3, ..., xn refers to a set of fixed observations of a variable.
• n is the number of observations in a data set, which is called the sample size.

Page 5: Producing Data

5

Sample Mean

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (sample mean)

μ = population mean

Sample --> Latin letters
Population --> Greek letters

Page 6: Producing Data

6

Sample Mean: Example
The following data give the time in months from hire to promotion to manager for a random sample of 20 software engineers from all software engineers employed by a large telecommunications firm.

a) What is the mean time for this sample?

b) Suppose that instead of x20 = 69, we had chosen another engineer who took 483 months to be promoted. What is the mean time for this new sample?

5 7 12 14 18 14 14 22 21 25

23 24 34 37 34 49 64 47 67 69
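Both parts of this example come down to summing the observations and dividing by n. The following is a minimal Python sketch of that calculation; the function and variable names are illustrative and do not come from the slides.

```python
# Months from hire to promotion for the 20 sampled engineers (from the slide).
promotion_months = [5, 7, 12, 14, 18, 14, 14, 22, 21, 25,
                    23, 24, 34, 37, 34, 49, 64, 47, 67, 69]


def sample_mean(observations):
    """Return the sum of the observations divided by the sample size n."""
    return sum(observations) / len(observations)


# Part (a): mean of the original sample.
mean_original = sample_mean(promotion_months)

# Part (b): replace x20 = 69 with 483 months and recompute.
modified_months = promotion_months[:-1] + [483]
mean_modified = sample_mean(modified_months)

print(mean_original)  # 30.0
print(mean_modified)  # 50.7
```

A single extreme observation moves the mean from 30 to 50.7 months, so the mean is sensitive to outliers.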

Page 7: Producing Data

7

Sample Median, x̃
Procedure
1. Sort the n observations from smallest to largest.
2. If n is odd, x̃ is the center observation.
   If n is even, x̃ is the average of the two center observations.

Page 8: Producing Data

8

Sample Median: Example
The following data give the time in months from hire to promotion to manager for a random sample of 20 software engineers from all software engineers employed by a large telecommunications firm.

a) What is the median time for this sample?

b) Suppose that instead of x20 = 69, we had chosen another engineer who took 483 months to be promoted. What is the median time for this new sample?

5 7 12 14 14 14 18 21 22 23

24 25 34 34 37 47 49 64 67 69
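The median procedure (sort, then take the center observation or the average of the two center observations) can be sketched in Python as follows. The function name `sample_median` is an illustrative choice, not something defined in the slides.

```python
# Months from hire to promotion for the 20 sampled engineers (from the slide).
promotion_months = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23,
                    24, 25, 34, 34, 37, 47, 49, 64, 67, 69]


def sample_median(observations):
    """Sort the data, then take the center value (odd n) or the
    average of the two center values (even n)."""
    ordered = sorted(observations)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Part (a): median of the original sample.
median_original = sample_median(promotion_months)

# Part (b): replace x20 = 69 with 483 months and recompute.
modified_months = promotion_months[:-1] + [483]
median_modified = sample_median(modified_months)

print(median_original)  # 23.5
print(median_modified)  # 23.5
```

The median is 23.5 months in both cases, because changing the largest observation does not change which values sit in the center.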

Page 9: Producing Data

9

Mean and Median

Symmetric: mean ≈ median

Left skew: mean < median

Right skew: mean > median

Page 10: Producing Data

10

Mode, M

• The value with the greatest frequency.

Page 11: Producing Data

11

Sample Mode: Example
The following data give the time in months from hire to promotion to manager for a random sample of 20 software engineers from all software engineers employed by a large telecommunications firm.

a) What is the mode for this sample?

5 7 12 14 14 14 18 21 22 23

24 25 34 34 37 47 49 64 67 69
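Finding the mode amounts to counting how often each value occurs. Here is a short Python sketch using the standard library's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Months from hire to promotion for the 20 sampled engineers (from the slide).
promotion_months = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23,
                    24, 25, 34, 34, 37, 47, 49, 64, 67, 69]

# Count how many times each value appears.
frequencies = Counter(promotion_months)
highest_count = max(frequencies.values())

# Collect every value that reaches the highest count (there may be ties).
modes = sorted(value for value, count in frequencies.items()
               if count == highest_count)

print(modes, highest_count)  # [14] 3
```

Returning a list rather than a single value keeps the sketch correct for data sets in which several values tie for the greatest frequency.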

Page 12: Producing Data

12

Variability of Data

Set 1: -15 -10 -5 0 5 10 15
Set 2: -15 -5 -1 0 1 5 15
Set 3: -3 -2 -1 0 1 2 3

[Figure: dot plots of Sets 1, 2, and 3 on a common axis from -20 to 20]
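The three sets share the same center but differ in how far the values spread around it. As a rough illustration, the Python sketch below compares them using the range and the sample standard deviation, two measures the deck introduces in the slides that follow. The dictionary layout and names are assumptions made for this example.

```python
import statistics

# The three data sets from the slide; each has mean 0.
data_sets = {
    "Set 1": [-15, -10, -5, 0, 5, 10, 15],
    "Set 2": [-15, -5, -1, 0, 1, 5, 15],
    "Set 3": [-3, -2, -1, 0, 1, 2, 3],
}

summary = {}
for name, values in data_sets.items():
    value_range = max(values) - min(values)  # largest minus smallest
    spread = statistics.stdev(values)        # sample standard deviation
    summary[name] = (value_range, spread)
    print(name, value_range, round(spread, 2))
```

Sets 1 and 2 have the same range (30), yet Set 2 has the smaller standard deviation because more of its values sit close to 0. The range alone does not capture that difference.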

Page 13: Producing Data

13

Measures of Variability

• Sample range
• Sample variance (sample standard deviation)
• Interquartile Range (IQR)

Page 16: Producing Data

16

Sample Variance

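$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (sample variance)

$s = \sqrt{s^2}$ (sample standard deviation)

σ² = population variance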

Page 17: Producing Data

17

Comments for Standard Deviation

• Variance is used to determine spread for comparisons.
• s² = 0 means that all of the observations are the same; normally s > 0.
• n = 1
• s is not resistant to outliers.
• s has the same units of measurement as the original observations.

Page 18: Producing Data

18

Sample Standard Deviation: Example
The following data give the time in months from hire to promotion to manager for a random sample of 20 software engineers from all software engineers employed by a large telecommunications firm.

a) What is the standard deviation for this sample?

b) Suppose that instead of x20 = 69, we had chosen another engineer who took 483 months to be promoted. What is the standard deviation for this new sample?

5 7 12 14 14 14 18 21 22 23

24 25 34 34 37 47 49 64 67 69
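The sample standard deviation is the square root of the sum of squared deviations from the mean divided by n − 1. The Python sketch below applies that definition to both versions of the sample; the function name is illustrative and not taken from the slides.

```python
import math

# Months from hire to promotion for the 20 sampled engineers (from the slide).
promotion_months = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23,
                    24, 25, 34, 34, 37, 47, 49, 64, 67, 69]


def sample_standard_deviation(observations):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(observations)
    mean = sum(observations) / n
    squared_deviations = sum((x - mean) ** 2 for x in observations)
    return math.sqrt(squared_deviations / (n - 1))


# Part (a): standard deviation of the original sample.
s_original = sample_standard_deviation(promotion_months)

# Part (b): replace x20 = 69 with 483 months and recompute.
modified_months = promotion_months[:-1] + [483]
s_modified = sample_standard_deviation(modified_months)

print(round(s_original, 2))  # 19.76
print(round(s_modified, 2))  # 103.25
```

One extreme observation raises s from about 19.76 to about 103.25 months, which illustrates that s is not resistant to outliers.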

Page 20: Producing Data

20

Quartiles

Q1 Q2 Q3

Page 21: Producing Data

21

Quartiles - Procedure
1. Sort the values from lowest to highest and locate the median.
2. The first quartile, Q1, is the median of the lower half.
   a. Compute d1 = n/4.
   b. If d1 is an integer, then Q1 is the mean of the observations at d1 and d1 + 1.
   c. If d1 is not an integer, then Q1 is the observation at

3. The third quartile, Q3, is the median of the upper half.
   a. Compute d2 = 3n/4.
   b. Repeat steps 2b and 2c.

Page 22: Producing Data

22

Quartiles: Example
The following data give the time in months from hire to promotion to manager for a random sample of 19 software engineers from all software engineers employed by a large telecommunications firm.

a) Find the median and the quartiles.
b) What is the Interquartile Range?
c) Are there any outliers in this data set?

7 12 14 14 14 18 21 22 23

24 25 34 34 37 47 49 64 100 150
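Parts (a) and (b) can be worked through with a short Python sketch of the positional quartile rule (d = n/4 for Q1, d = 3n/4 for Q3). One assumption is made loudly here: when d is not an integer, the sketch takes the observation at d rounded up, which agrees with Q1 and Q3 being the medians of the lower and upper halves for this sample. The function name `positional_quartile` is hypothetical.

```python
import statistics

# Months from hire to promotion for the 19 sampled engineers (from the slide).
promotion_months = [7, 12, 14, 14, 14, 18, 21, 22, 23,
                    24, 25, 34, 34, 37, 47, 49, 64, 100, 150]


def positional_quartile(sorted_values, numerator):
    """Quartile at position d = numerator * n / 4 (1-based).

    ASSUMPTION: a non-integer d is rounded up to the next position.
    """
    whole, remainder = divmod(numerator * len(sorted_values), 4)
    if remainder == 0:
        # d is an integer: average the observations at positions d and d + 1.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # d is not an integer: position ceil(d) is 0-based index `whole`.
    return sorted_values[whole]


ordered = sorted(promotion_months)
median = statistics.median(ordered)
q1 = positional_quartile(ordered, 1)  # d1 = 19/4 = 4.75 -> 5th observation
q3 = positional_quartile(ordered, 3)  # d2 = 57/4 = 14.25 -> 15th observation
iqr = q3 - q1

print(median, q1, q3, iqr)  # 24 14 47 33
```

Statistical software uses several different quartile conventions, so library functions may return slightly different values for Q1 and Q3 than this positional rule.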

Page 23: Producing Data

23

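Outliers
After finding the IQR, find the two inner fences (low and high) and the two outer fences (low and high).

IFL = Q1 – 1.5(IQR)    IFH = Q3 + 1.5(IQR)    (mild)
OFL = Q1 – 3(IQR)      OFH = Q3 + 3(IQR)      (extreme)

An observation beyond an inner fence but not beyond the outer fence is a mild outlier; an observation beyond an outer fence is an extreme outlier.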
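The fence formulas translate directly into a small classifier. The Python sketch below is one way to write it; the function name and the choice to treat a value lying exactly on a fence as not beyond it are assumptions, since the slides do not address that boundary case. The example values Q1 = 14 and Q3 = 47 are the quartiles of the 19-engineer sample under a round-up position rule, which is also an assumption.

```python
def classify_outlier(x, q1, q3):
    """Label x as 'extreme', 'mild', or 'none' using inner and outer fences.

    ASSUMPTION: a value exactly on a fence is not counted as beyond it.
    """
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    if x < outer_low or x > outer_high:
        return "extreme"
    if x < inner_low or x > inner_high:
        return "mild"
    return "none"


# Assumed quartiles for the 19-engineer promotion sample.
q1, q3 = 14, 47

for months in (64, 100, 150):
    print(months, classify_outlier(months, q1, q3))
```

With these quartiles the IQR is 33, so the inner fences are -35.5 and 96.5 and the outer fences are -85 and 146. That makes 100 months a mild outlier and 150 months an extreme outlier.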

Page 25: Producing Data

25

Boxplots
Procedure
1. Find Q1, Q3, median and IQR.
2. Calculate IFL, IFH, OFL, OFH.
3. Draw a central box from Q1 to Q3. Draw a line for the median.
4. Extend lines (whiskers) from the box to the minimum and maximum values that are not outliers.
5. Put in closed circles for mild outliers and open circles for extreme outliers.
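Step 4 is the part of the procedure that is easiest to get wrong: the whiskers stop at the most extreme observations that are still inside the inner fences, not at the fences themselves. The Python sketch below computes those endpoints. The function name is hypothetical, and the quartiles passed in (Q1 = 14, Q3 = 47) are assumed values for the 19-engineer promotion sample.

```python
def whisker_ends(values, q1, q3):
    """Return the smallest and largest observations inside the inner fences.

    ASSUMPTION: a value exactly on an inner fence counts as inside.
    """
    iqr = q3 - q1
    inner_low = q1 - 1.5 * iqr
    inner_high = q3 + 1.5 * iqr
    inside = [x for x in values if inner_low <= x <= inner_high]
    return min(inside), max(inside)


# Months from hire to promotion for the 19 sampled engineers.
promotion_months = [7, 12, 14, 14, 14, 18, 21, 22, 23,
                    24, 25, 34, 34, 37, 47, 49, 64, 100, 150]

low_whisker, high_whisker = whisker_ends(promotion_months, q1=14, q3=47)
print(low_whisker, high_whisker)  # 7 64
```

The upper whisker ends at 64 months even though the high inner fence is 96.5, because 64 is the largest observation that is not an outlier.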

Page 26: Producing Data

26

Boxplot: Example

[Figure: "Boxplot of Promotion"; vertical axis labeled Promotion, with ticks from 0 to 160]

Page 27: Producing Data

27

Distributions and Boxplots

Page 28: Producing Data

28

Side-by-side Boxplot: Example

Page 29: Producing Data

29

Choosing Measures of Center and Spread

Choices
1. Mean and standard deviation
2. Median and IQR

ALWAYS PLOT YOUR DATA!

http://freshspectrum.com/wp-content/uploads/2012/09/Hans-Rosling-Bubble-Plot-Cartoon.jpg

Page 30: Producing Data

30

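Empirical Rule
68-95-99.7 Rule

For an approximately normal distribution, about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.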
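The three percentages in the rule's name can be checked against the standard normal distribution. The Python sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later); the helper name `probability_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def probability_within(k):
    """Probability that a standard normal value lies within k
    standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(probability_within(k), 4))
```

The printed probabilities are roughly 0.683, 0.954, and 0.997, which round to the 68, 95, and 99.7 percent figures. The rule is an approximation that applies to data that are close to normal, not to every distribution.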

Page 31: Producing Data

31

z-score

• The z-score is a measure of relative standing: $z_i = \frac{x_i - \bar{x}}{s}$.
• Given a set of n observations, the sum of the z-scores is 0.
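As a quick check of the sum-to-zero property, the Python sketch below computes z-scores for the 20-engineer promotion sample used earlier in the deck, taking each z-score as the observation minus the sample mean, divided by the sample standard deviation. The variable names are illustrative.

```python
import statistics

# Months from hire to promotion for the 20 sampled engineers.
promotion_months = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23,
                    24, 25, 34, 34, 37, 47, 49, 64, 67, 69]

mean = statistics.mean(promotion_months)
s = statistics.stdev(promotion_months)  # sample standard deviation (n - 1)

# z-score: how many standard deviations each observation is from the mean.
z_scores = [(x - mean) / s for x in promotion_months]

print(round(z_scores[-1], 2))     # z-score of the 69-month observation: 1.97
print(abs(sum(z_scores)) < 1e-9)  # True: the z-scores sum to 0
```

The sum is compared with a small tolerance rather than with exactly 0 because floating-point arithmetic can leave a tiny rounding error.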