Ch 4: Describing Data: Displaying and Exploring Data Goals

38
1 1. Develop and interpret a stem-and-leaf display 2. Develop and interpret a: 1. Dot plot 3. Develop and interpret quartiles, deciles, and percentiles 4. Develop and interpret a: 1. Box plots 5. Compute and understand the: Coefficient of Variation Coefficient of Skewness 6. Draw and interpret a scatter diagram 7. Set up and interpret a contingency table Ch 4: Describing Data: Displaying and Exploring Data Goals

description

Ch 4: Describing Data: Displaying and Exploring Data Goals. Develop and interpret a stem-and-leaf display Develop and interpret a: Dot plot Develop and interpret quartiles, deciles, and percentiles Develop and interpret a: Box plots Compute and understand the: Coefficient of Variation - PowerPoint PPT Presentation

Transcript of Ch 4: Describing Data: Displaying and Exploring Data Goals

Page 1: Ch 4: Describing Data: Displaying and Exploring Data Goals

1

1. Develop and interpret a stem-and-leaf display2. Develop and interpret a:

1. Dot plot3. Develop and interpret quartiles, deciles, and

percentiles4. Develop and interpret a:

1. Box plots5. Compute and understand the:

• Coefficient of Variation• Coefficient of Skewness

6. Draw and interpret a scatter diagram7. Set up and interpret a contingency table

Ch 4: Describing Data:Displaying and Exploring Data Goals

Page 2: Ch 4: Describing Data: Displaying and Exploring Data Goals

2

Note: Advantages of the stem-and-leaf display over a frequency distribution:

1. We do not lose the identity of each observation

2. We can see the distribution

Stem-and-leaf Displays

Stem-and-leaf display: A statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits the leaf.

Page 3: Ch 4: Describing Data: Displaying and Exploring Data Goals

3

50

60

70

80

90

100

1 2 3 4 5 6 7 8 9 10 11 12

Stock prices on twelveconsecutive days for a major

publicly traded company

8 6 , 7 9 , 9 2 , 8 4 , 6 9 , 8 8 , 9 1

8 3 , 9 6 , 7 8 , 8 2 , 8 5 .

Page 4: Ch 4: Describing Data: Displaying and Exploring Data Goals

4

stem leaf

6 9

7 8 9

8 2 3 4 5 6 8

9 1 2 6

Stem and leaf display of stock prices

60 up to 70 170 up to 80 280 up to 90 690 up to 100 3

Totals 12

Compare to:

Leading digit(s) along

vertical axis

Trailing digit(s) along

horizontal axis

Page 5: Ch 4: Describing Data: Displaying and Exploring Data Goals

5

Stem-and-Leaf – Example

Suppose the seven observations in the 90 up to 100 class are: 96, 94, 93, 94, 95, 96, and 97.

The stem value is the leading digit or digits, in this case 9. The leaves are the trailing digits. The stem is placed to the left of a vertical line and the leaf values to the right. The values in the 90 up to 100 class would appear as

Then, we sort the values within each stem from smallest to largest. Thus, the second row of the stem-and-leaf display would appear as follows:

Page 6: Ch 4: Describing Data: Displaying and Exploring Data Goals

6

Stem-and-leaf: Another Example

Listed in Table 4–1 is the number of 30-second radio advertising spots purchased by each of the 45 members of the Greater Buffalo Automobile Dealers Association last year. Organize the data into a stem-and-leaf display. Around what values do the number of advertising spots tend to cluster? What is the fewest number of spots purchased by a dealer? The largest number purchased?

Page 7: Ch 4: Describing Data: Displaying and Exploring Data Goals

7

Stem-and-leaf: Another Example

Page 8: Ch 4: Describing Data: Displaying and Exploring Data Goals

8

Dot Plots

A dot plot groups the data as little as possible and the identity of an individual observation is not lost.

To develop a dot plot, each observation is simply displayed as a dot along a horizontal number line indicating the possible values of the data.

If there are identical observations or the observations are too close to be shown individually, the dots are “piled” on top of each other.

Page 9: Ch 4: Describing Data: Displaying and Exploring Data Goals

9

Dot plots: Report the details of each observation Are useful for comparing two or more data sets

Dot Plot

Page 10: Ch 4: Describing Data: Displaying and Exploring Data Goals

10

Percentage of men participating

In the labor force for the 50 states.

Percentage of women participating

In the labor force for the 50 states.

Page 11: Ch 4: Describing Data: Displaying and Exploring Data Goals

11

Quartiles, Deciles, Percentiles(Measures Of Dispersion)

• Quartiles divide a set of data into four equal parts (three points)– Each interval contains 1/4 of the scores

• Deciles divide a set of data into ten equal parts (nine points)– Each interval contains 1/10 of the scores

• Percentiles divide a set of data into 100 equal parts (99 points)– Each interval contains 1/100 of the scores

• GPA in the 43rd percentile means that 43% of the students have a GPA lower and 57% of the students have a GPA higher

Page 12: Ch 4: Describing Data: Displaying and Exploring Data Goals

12

Location Of A Percentile In An Ordered Array

( 1)100p

PL n

Desired Percentile = PNumber of Observations = nLocation (or position) of the Desired Percentile value = Lp

If P = 43rd Percentile, then = L43

Parts of 100 = P/100

(n + 1)*(P/100) = (n + 1)/2this is to find the value that marks the 50th percentile = (n + 1)/2

If there are an even number of observation, your Lp may be between two numbers

In this case, you must estimate the number that you will report as the percentile

If Lp = 4.25, and the distance between the two numbers is 37, Lower number + .25(37) = percentile

Page 13: Ch 4: Describing Data: Displaying and Exploring Data Goals

13

Compute And Interpret Quartiles

• Quartiles– Quartiles divide a set of data into four equal parts

– Value Q1, Value Q2 (Median), Value Q3 , are the three marking points that divide the data into four parts

• 25% of the values occur below Value Q1

• 50% of the values occur below Value Q2

• 75% of the values occur below Value Q3

• Value Q1 is the median of the lower half of the data

• Value Q2 is the median of all the data

• Value Q3 is the median of the upper half of the data

Page 14: Ch 4: Describing Data: Displaying and Exploring Data Goals

14

$1,460 $1,471 $1,637 $1,721 $1,758 $1,787 $1,940 $2,038$2,047 $2,054 $2,097 $2,205 $2,287 $2,311 $2,406 $2,412

Commisions earned last month by a sample of 16 brokers at Salomon Smith Barney's

Oakland, California Office

n = 16P = 25 (n + 1)/*(25/100) = 4.25P = 50 (n + 1)/*(50/100) = 8.5P = 75 (n + 1)/*(75/100) = 12.75

L25 = 4.25

Range (upper-lower) = $37.004.25 - 4 = 0.25

37 * 0.25 = $9.25Add 9.25 to 1721 = $1,730.25

25th Percentile = $1,730.25

25th Percentile

L50 = 8.5

Range (upper-lower) = $9.008.5 - 8 = 0.509 * 0.5 = 4.5

Add 4.5 to 2038 = $2,042.5050th Percentile = $2,042.50

75th Percentile

L75 = $12.75

Range (upper-lower) = $82.0012.75 - 12 = 0.7582 * 0.75 = 61.5

Add 61.5 to 2205 = $2,266.5075th Percentile = $2,266.50

50th Percentile

Page 15: Ch 4: Describing Data: Displaying and Exploring Data Goals

15

Example 2: Using twelve stock prices, we can find the median, 25th, and 75th percentiles as follows:

L 5 0 = (1 2 + 1 ) 5 01 0 0 = 6 .5 0 th o b se rv a tio n

L 2 5 = (1 2 + 1 ) 2 51 0 0

= 3 .2 5 th o b se rv a tio n

L 7 5 = (1 2 + 1 ) 7 51 0 0

= 9 .7 5 th o b se rv a tio n

Quartile 1

Quartile 3

Median

Page 16: Ch 4: Describing Data: Displaying and Exploring Data Goals

16

969291888685848382797869

121110987654321

25th percentilePrice at 3.25 observation = 79 + .25(82-79) = 79.75

50th percentile: MedianPrice at 6.50 observation = 85 + .5(85-84) = 84.50

75th percentilePrice at 9.75 observation = 88 + .75(91-88) = 90.25

Q1

Q2

Q3

Q4

Page 17: Ch 4: Describing Data: Displaying and Exploring Data Goals

17

Five pieces of data are needed to construct a box plot:

1. Minimum Value

2. First Quartile

3. Median

4. Third Quartile

5. Maximum Value

A box plot is a graphical display, based on quartiles, that helps to picture a set of

data.

Page 18: Ch 4: Describing Data: Displaying and Exploring Data Goals

18

Boxplot - Example

Page 19: Ch 4: Describing Data: Displaying and Exploring Data Goals

19

Boxplot Example

Page 20: Ch 4: Describing Data: Displaying and Exploring Data Goals

20

Outliers

• Outlier > Q3 + 1.5*(Q3 – Q1)

• Outlier < 0

Page 21: Ch 4: Describing Data: Displaying and Exploring Data Goals

21

Coefficient Of Variation• Coefficient Of Variation converts the standard

deviation to standard deviations per unit of mean

• Use Coefficient Of Variation to compare:• Data in different units• Data in the same units, but the means are far

apart• s = Standard Deviation• Xbar = Sample Mean

(100%)s

CVX

Page 22: Ch 4: Describing Data: Displaying and Exploring Data Goals

22

Example 1 of Coefficient of Variance

Mean bonus paid per employee = $2,000 Mean years worked = 15Standard deviation = $755 Standard deviation = 3

Coefficient of Variance = 38% Coefficient of Variance = 20%

There is more dispersion relative to the mean in the distribution of bonus paidcompared with the distribution of years worked*Notice here that the units cancel out

Compare Relative Dispersion in Two Distributions

Page 23: Ch 4: Describing Data: Displaying and Exploring Data Goals

23

Page 24: Ch 4: Describing Data: Displaying and Exploring Data Goals

24

Skewness - Formulas for Computing

The coefficient of skewness can range from -3 up to 3. – Measures the lack of symmetry in a distribution– A value near -3, such as -2.57, indicates considerable negative skewness. – A value such as 1.63 indicates moderate positive skewness. – A value of 0, which will occur when the mean and median are equal, indicates

the distribution is symmetrical and that there is no skewness present.

In Excel use the SKEW function

Page 25: Ch 4: Describing Data: Displaying and Exploring Data Goals

25

Commonly Observed Shapes

Page 26: Ch 4: Describing Data: Displaying and Exploring Data Goals

26

Skewness – An Example

Following are the earnings per share for a sample of 15 software companies for the year 2005. The earnings per share are arranged from smallest to largest.

Compute the mean, median, and standard deviation. Find the coefficient of skewness using Pearson’s estimate. What is your conclusion regarding the shape of the distribution?

Page 27: Ch 4: Describing Data: Displaying and Exploring Data Goals

27

Skewness – An Example Using Pearson’s Coefficient

017.122.5$

)18.3$95.4($3)(3

22.5$115

))95.4$40.16($...)95.4$09.0($

1

95.4$15

26.74$

222

s

MedianXsk

n

XXs

n

XX

The skew is moderately positive. This means that a few large values are pulling the mean up, above the median and mode.

Page 28: Ch 4: Describing Data: Displaying and Exploring Data Goals

28

X ($000) X2 (X-Xbar)/s ((X-Xbar)/s)^3$26.00 676 -1.2114 -1.7778$28.00 784 -0.7067 -0.3529$31.00 961 0.0505 0.0001$33.00 1089 0.5552 0.1712$36.00 1296 1.3124 2.2603

Total 154 4806 0.000000000000000000 0.300923681119145000

n = 5Mean = 30.8Median = 31Mode = nones = 3.962322551sk = -0.15142634sk soft = 0.125384867sk soft = 0.125384867

sk

sk softThe data is skewed slightly positive The bell shape is leaning slightly toward the y-axis. There are a few small values that are pulling the mean up away from the median and mode.

Sample of 5 starting salaries

The data is skewed slightly negative. The bell shape is leaning slightly away from the y-axis. There are a few small values that are pulling the mean down away from the median and mode.

Page 29: Ch 4: Describing Data: Displaying and Exploring Data Goals

29

Describing Relationship between Two Variables

One graphical technique we use to show the relationship between variables is called a scatter diagram.

To draw a scatter diagram we need two variables. We scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis).

Page 30: Ch 4: Describing Data: Displaying and Exploring Data Goals

30

Describing Relationship between Two Variables – Scatter Diagram Examples

Page 31: Ch 4: Describing Data: Displaying and Exploring Data Goals

31

Scatter Diagram Independent Variable (X)

– The Independent Variable provides the basis for estimation– It is the predictor variable

Dependent Variable– The Dependent Variable is the variable being

predicted or estimated

Scatter Diagrams– Visual portrayal of the relationship between two variables– A chart that portrays the relationship between the two variables

• X axis (properly labeled – name and units)• Y axis (properly labeled– name and units)

– Scatter diagram requires both variables to be at least interval scale

Page 32: Ch 4: Describing Data: Displaying and Exploring Data Goals

32

In the Introduction to Chapter 2 we presented data from AutoUSA. In this case the information concerned the prices of 80 vehicles sold last month at the Whitner Autoplex lot in Raytown, Missouri. The data shown include the selling price of the vehicle as well as the age of the purchaser.

Is there a relationship between the selling price of a vehicle and the age of the purchaser? Would it be reasonable to conclude that the more expensive vehicles are purchased by older buyers?

Describing Relationship between Two Variables – Scatter Diagram Excel Example

Page 33: Ch 4: Describing Data: Displaying and Exploring Data Goals

33

Describing Relationship between Two Variables – Scatter Diagram Excel Example

Page 34: Ch 4: Describing Data: Displaying and Exploring Data Goals

34

Contingency Tables

A scatter diagram requires that both of the variables be at least interval scale.

What if we wish to study the relationship between two variables when one or both are nominal or ordinal scale? In this case we tally the results in a contingency table.

Page 35: Ch 4: Describing Data: Displaying and Exploring Data Goals

35

A contingency table is a cross tabulation that simultaneously summarizes two variables of interest.

A contingency table is used to classify observations according to two identifiable characteristics.

Contingency tables are used when one or both variables are nominally or ordinally scaled.

Page 36: Ch 4: Describing Data: Displaying and Exploring Data Goals

36

Weight Loss45 adults, all 60 pounds overweight, are randomly assigned to three weight loss programs. Twenty weeks into the program, a researcher gathers data on weight loss and divides the loss into three categories: less than 20 pounds, 20 up to 40 pounds, 40 or more pounds. Here are the results.

Contingency Tables – Example 1

Page 37: Ch 4: Describing Data: Displaying and Exploring Data Goals

37

Weight

Loss

Plan

Less than 20 pounds

20 up to 40

pounds

40 pounds or more

Plan 1 4 8 3

Plan 2 2 12 1

Plan 3 12 2 1

Compare the weight loss under the three plans.

Contingency Tables – Example 1

Page 38: Ch 4: Describing Data: Displaying and Exploring Data Goals

38

Contingency Tables – Example 2

A manufacturer of preassembled windows produced 50 windows yesterday. This morning the quality assurance inspector reviewed each window for all quality aspects. Each was classified as acceptable or unacceptable and by the shift on which it was produced. Thus we reported two variables on a single item. The two variables are shift and quality. The results are reported in the following table.