Describing data with graphics and numbers

description

Describing data with graphics and numbers. Types of data. Categorical variables, also known as class variables or nominal variables. Quantitative variables, also known as numerical variables, either continuous or discrete. Graphing categorical variables.

Transcript of Describing data with graphics and numbers

Page 1: Describing data with graphics and numbers

Describing data with graphics and numbers

Page 2: Describing data with graphics and numbers

Types of Data

• Categorical variables
  – also known as class variables, nominal variables

• Quantitative variables
  – aka numerical variables
  – either continuous or discrete

Page 3: Describing data with graphics and numbers

Graphing categorical variables

Page 4: Describing data with graphics and numbers

Ten most common causes of death in Americans between 15 and 19 years old in 1999.

Page 5: Describing data with graphics and numbers

Bar graphs

Page 6: Describing data with graphics and numbers

Graphing numerical variables

Page 7: Describing data with graphics and numbers

Heights of BIOL 300 students (cm)

165 168 163 173 170 163 170 155 152 190 170 168 142 160 154 165 156 177 173 165 165 175

155 166 168 165 180 165

Page 8: Describing data with graphics and numbers

Stem-and-leaf plot

Page 9: Describing data with graphics and numbers

Stem-and-leaf plot

19 | 0
18 | 0
17 | 0 0 0 3 3 5 7
16 | 0 3 3 5 5 5 5 5 5 6 8 8 8
15 | 2 4 5 5 6
14 | 2
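As a rough illustration, the stem-and-leaf plot can be rebuilt from the BIOL 300 heights listed earlier. This is a minimal sketch, assuming the tens are used as stems and the units digit as the leaf; the function name `stem_and_leaf` is made up for the example.

```python
from collections import defaultdict

# Heights of BIOL 300 students (cm), as listed on the earlier slide.
heights = [165, 168, 163, 173, 170, 163, 170, 155, 152, 190, 170, 168, 142, 160,
           154, 165, 156, 177, 173, 165, 165, 175, 155, 166, 168, 165, 180, 165]

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

plot = stem_and_leaf(heights)
for stem in sorted(plot, reverse=True):  # largest stem on top
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

A stem with no observations would simply be missing from the output; for these data every stem from 14 to 19 is occupied.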

Page 10: Describing data with graphics and numbers

Frequency table

Height group (cm)   Frequency
141-150
151-160
161-170
171-180
181-190

Page 11: Describing data with graphics and numbers

Frequency table

Height group (cm)   Frequency
141-150              1
151-160              6
161-170             15
171-180              5
181-190              1
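The frequencies in the table can be checked by counting the BIOL 300 heights that fall in each group. A minimal sketch, assuming both ends of each height group are inclusive (as the group labels suggest):

```python
# Heights of BIOL 300 students (cm), as listed on the earlier slide.
heights = [165, 168, 163, 173, 170, 163, 170, 155, 152, 190, 170, 168, 142, 160,
           154, 165, 156, 177, 173, 165, 165, 175, 155, 166, 168, 165, 180, 165]

# Height groups with inclusive lower and upper bounds.
groups = [(141, 150), (151, 160), (161, 170), (171, 180), (181, 190)]

frequency = {
    f"{low}-{high}": sum(low <= h <= high for h in heights)
    for low, high in groups
}

for group, count in frequency.items():
    print(f"{group:>8} {count:>3}")
```

The same counts are the bar heights of a histogram that uses these groups as bins.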

Page 12: Describing data with graphics and numbers

Histogram

Page 13: Describing data with graphics and numbers

Histogram

Page 14: Describing data with graphics and numbers

Histogram

Frequency distribution

Page 15: Describing data with graphics and numbers

Histogram with more data

Page 16: Describing data with graphics and numbers
Page 17: Describing data with graphics and numbers

Cumulative Frequency Distribution

[Figure: cumulative frequency (0 to 1) plotted against height (in cm) of Bio300 students, 150 to 210 cm.]

Page 18: Describing data with graphics and numbers

Cumulative Frequency Distribution

[Figure: cumulative frequency (0 to 1) plotted against height (in cm) of Bio300 students, 150 to 210 cm, with the 50th percentile (median) and the 90th percentile marked.]
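Reading a percentile off a cumulative frequency distribution can be sketched in code. The data behind the plotted curve are not in the transcript, so this example substitutes the 28 BIOL 300 heights listed earlier. It also assumes one particular convention: the p-th percentile is the smallest observed value whose cumulative frequency reaches p. Other conventions interpolate between observations and can give slightly different answers.

```python
# Stand-in data: the 28 BIOL 300 heights (cm) from the earlier slide.
heights = [165, 168, 163, 173, 170, 163, 170, 155, 152, 190, 170, 168, 142, 160,
           154, 165, 156, 177, 173, 165, 165, 175, 155, 166, 168, 165, 180, 165]

def cumulative_frequency(values, x):
    """Fraction of observations less than or equal to x."""
    return sum(v <= x for v in values) / len(values)

def percentile(values, p):
    """Smallest observed value whose cumulative frequency is at least p (0 < p <= 1)."""
    for v in sorted(values):
        if cumulative_frequency(values, v) >= p:
            return v

median_height = percentile(heights, 0.5)
p90_height = percentile(heights, 0.9)
print(median_height, p90_height)
```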

Page 19: Describing data with graphics and numbers

Associations between two categorical variables

Page 20: Describing data with graphics and numbers

Association between reproductive effort and avian malaria

Table 2.3A. Contingency table showing incidence of malaria in female great tits subjected to experimental egg removal.

               Control group   Egg removal group   Row total
Malaria               7               15               22
No malaria           28               15               43
Column total         35               30               65

Page 22: Describing data with graphics and numbers

Mosaic plot

[Figure: mosaic plot of relative frequency (0.0 to 1.0) by treatment (Control, Egg removal).]

Figure 2.3B. Mosaic plot for reproductive effort and avian malaria in great tits (Table 2.3A). Blue fill indicates diseased birds, whereas the white fill indicates birds free of malaria. n = 65 birds.
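The bar segments in a mosaic plot are the relative frequencies within each treatment group. As a sketch, they can be computed from the counts in Table 2.3A; the dictionary layout and variable names below are only illustrative.

```python
# Counts from Table 2.3A, organised by treatment group.
table = {
    "Control":     {"Malaria": 7,  "No malaria": 28},
    "Egg removal": {"Malaria": 15, "No malaria": 15},
}

relative = {}
for treatment, counts in table.items():
    group_total = sum(counts.values())  # column total for this treatment
    relative[treatment] = {
        outcome: count / group_total for outcome, count in counts.items()
    }

for treatment, shares in relative.items():
    print(treatment, {k: round(v, 2) for k, v in shares.items()})
```

In a mosaic plot the width of each bar is usually made proportional to the group size (35 and 30 birds here), which is what distinguishes it from a plain stacked bar chart.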

Page 23: Describing data with graphics and numbers

Grouped Bar Graph

[Figure: grouped bar graph of frequency (0 to 25) for Malaria and No malaria within each treatment group, Control and Egg removal.]

Page 24: Describing data with graphics and numbers

Associations between categorical and numerical variables

Page 25: Describing data with graphics and numbers

Multiple histograms

[Figure: two stacked histograms of protein length (0 to 1000), one panel for non-conserved and one for conserved proteins; each vertical axis runs from 0 to 600.]

Page 26: Describing data with graphics and numbers

Associations between two numerical variables

Page 27: Describing data with graphics and numbers

Scatterplots

Page 28: Describing data with graphics and numbers

Scatterplots

Page 29: Describing data with graphics and numbers

Evaluating Graphics

• Lie factor

• Chartjunk

• Efficiency

Page 30: Describing data with graphics and numbers

Don’t mislead with graphics

Page 31: Describing data with graphics and numbers

Better representation of truth

Page 32: Describing data with graphics and numbers

Lie Factor

• Lie factor = (size of effect shown in graphic) / (size of effect in data)

Page 33: Describing data with graphics and numbers

Lie Factor Example

Effect in graphic: 2.33 / 0.08 = 29.1

Effect in data: 6748 / 5844 = 1.15

Lie factor = 29.1 / 1.15 = 25.3
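The arithmetic of this example can be sketched as a small function. The helper name `lie_factor` is made up for illustration, and the four numbers are taken from the slide as given. Carrying full precision gives a lie factor of about 25.2; the slide's 25.3 comes from dividing the already rounded values 29.1 and 1.15.

```python
def lie_factor(effect_in_graphic, effect_in_data):
    """Ratio of the effect size shown in the graphic to the effect size in the data."""
    return effect_in_graphic / effect_in_data

graphic_effect = 2.33 / 0.08   # ratio of the two sizes drawn in the graphic
data_effect = 6748 / 5844      # ratio of the two underlying data values
lf = lie_factor(graphic_effect, data_effect)

print(round(graphic_effect, 1), round(data_effect, 2), round(lf, 1))
```

A lie factor close to 1 means the graphic represents the data faithfully; values far from 1 mean the graphic exaggerates or understates the effect.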

Page 34: Describing data with graphics and numbers

Chartjunk

Page 35: Describing data with graphics and numbers

[Figure: bar chart of quarterly values (1st Qtr to 4th Qtr, axis 0 to 100) for three series: North, West, East.]

Page 36: Describing data with graphics and numbers

Needless 3D Graphics

Page 37: Describing data with graphics and numbers

Page 38: Describing data with graphics and numbers

Summary: Graphical methods for frequency distributions

Type of data       Method
Categorical data   Bar graph
Numerical data     Histogram
                   Cumulative frequency distribution

Page 39: Describing data with graphics and numbers

Summary: Associations between variables

                     Explanatory variable
Response variable    Categorical                          Numerical
Categorical          Contingency table
                     Grouped bar graph
                     Mosaic plot
Numerical            Multiple histograms                  Scatter plot
                     Cumulative frequency distributions

Page 40: Describing data with graphics and numbers

Great book on graphics

Page 41: Describing data with graphics and numbers

Describing data

Page 42: Describing data with graphics and numbers

Two common descriptions of data

• Location (or central tendency)

• Width (or spread)

Page 43: Describing data with graphics and numbers

Measures of location

Mean

Median

Mode

Page 44: Describing data with graphics and numbers

Mean

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$

n is the size of the sample

Page 45: Describing data with graphics and numbers

Mean

Y1=56, Y2=72, Y3=18, Y4=42

Page 46: Describing data with graphics and numbers

Mean

Y1=56, Y2=72, Y3=18, Y4=42

$\bar{Y} = (56 + 72 + 18 + 42) / 4 = 47$

Page 47: Describing data with graphics and numbers

Median

• The median is the middle measurement in a set of ordered data.

Page 48: Describing data with graphics and numbers

The data:

18 28 24 25 36 14 34

Page 49: Describing data with graphics and numbers

The data:

18 28 24 25 36 14 34

can be put in order:

14 18 24 25 28 34 36

Median is 25.
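The measures of location can be checked with Python's standard `statistics` module. This sketch reuses the two small samples from the mean and median slides; for the mode, which has no worked example of its own, it borrows the BIOL 300 heights listed earlier.

```python
import statistics

mean_sample = [56, 72, 18, 42]               # data from the mean example
median_sample = [18, 28, 24, 25, 36, 14, 34]  # data from the median example
heights = [165, 168, 163, 173, 170, 163, 170, 155, 152, 190, 170, 168, 142, 160,
           154, 165, 156, 177, 173, 165, 165, 175, 155, 166, 168, 165, 180, 165]

mean_value = statistics.mean(mean_sample)        # sum of values divided by n
median_value = statistics.median(median_sample)  # middle value after sorting
mode_value = statistics.mode(heights)            # most frequent value

print(mean_value, median_value, mode_value)
```

With an even number of observations there is no single middle value, and `statistics.median` returns the average of the two middle ones.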

Page 50: Describing data with graphics and numbers

[Figure: histogram of mouse weight at 50 days old, in a line selected for small size; y-axis: Frequency. The mean, median, and mode are marked.]

Page 51: Describing data with graphics and numbers

Mean vs. median in politics

• 2004 U.S. Economy

• Republicans: times are good
  – Mean income increasing ~4% per year

• Democrats: times are bad
  – Median family income fell

• Why?

Page 52: Describing data with graphics and numbers

Mean 169.3 cm

Median 170 cm

Mode 165-170 cm

Page 53: Describing data with graphics and numbers

[Figure: cumulative frequency (0 to 1) plotted against height (in cm) of Bio300 students, 150 to 210 cm.]

Page 54: Describing data with graphics and numbers

Measures of width

• Range

• Standard deviation

• Variance

• Coefficient of variation

Page 55: Describing data with graphics and numbers

Range

14 17 18 20 22 22 24 25 26 28 28 28 30 34 36

Page 56: Describing data with graphics and numbers

Range

14 17 18 20 22 22 24 25 26 28 28 28 30 34 36

The range is 36-14 = 22

Page 57: Describing data with graphics and numbers
Page 58: Describing data with graphics and numbers

Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \mu)^2}{N}$$

Page 59: Describing data with graphics and numbers

Sample variance

$$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$$

n is the sample size

Page 60: Describing data with graphics and numbers

Shortcut for calculating sample variance

$$s^2 = \frac{n}{n - 1} \left( \frac{\sum_{i=1}^{n} Y_i^2}{n} - \bar{Y}^2 \right)$$
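To see that the shortcut agrees with the definition of the sample variance, both can be evaluated on the same data. This is a sketch that reuses the 15 values from the range example; the two function names are illustrative, and `statistics.variance` from the standard library serves as an independent check.

```python
import statistics

# Data from the range example.
data = [14, 17, 18, 20, 22, 22, 24, 25, 26, 28, 28, 28, 30, 34, 36]

def sample_variance(ys):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(ys)
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / (n - 1)

def sample_variance_shortcut(ys):
    """Shortcut: n / (n - 1) times (mean of the squares minus the squared mean)."""
    n = len(ys)
    mean = sum(ys) / n
    return (n / (n - 1)) * (sum(y * y for y in ys) / n - mean ** 2)

print(sample_variance(data), sample_variance_shortcut(data), statistics.variance(data))
```

The shortcut is convenient by hand, but in floating-point arithmetic it subtracts two nearly equal numbers when the mean is large relative to the spread, so the definition is the safer choice in code.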

Page 61: Describing data with graphics and numbers

Standard deviation (SD)

• Positive square root of the variance

σ is the true standard deviation

s is the sample standard deviation

Page 62: Describing data with graphics and numbers

In class exercise

Calculate the variance and standard deviation of a sample

with the following data:

6, 1, 2

Page 63: Describing data with graphics and numbers

Answer

Variance = 7

Standard deviation = √7
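The steps behind this answer can be written out as a short sketch using the sample variance definition (divisor n − 1). The variable names are illustrative.

```python
import math
import statistics

sample = [6, 1, 2]

mean = sum(sample) / len(sample)                         # (6 + 1 + 2) / 3 = 3
squared_deviations = [(y - mean) ** 2 for y in sample]   # 9, 4, 1
variance = sum(squared_deviations) / (len(sample) - 1)   # 14 / 2
sd = math.sqrt(variance)                                 # positive square root

print(variance, sd)
```

`statistics.variance(sample)` and `statistics.stdev(sample)` use the same n − 1 divisor and should give the same results.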

Page 64: Describing data with graphics and numbers

Coefficient of variation (CV)

$$\mathrm{CV} = 100 \, \frac{s}{\bar{Y}}$$
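A small sketch of the CV, applied to the in-class exercise sample (6, 1, 2). The function name is made up for the example. The second call multiplies every value by 10 to show that the CV does not depend on the units of measurement.

```python
import statistics

def coefficient_of_variation(ys):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return 100 * statistics.stdev(ys) / statistics.mean(ys)

cv = coefficient_of_variation([6, 1, 2])
cv_rescaled = coefficient_of_variation([60, 10, 20])  # same data, units 10x smaller

print(round(cv, 2), round(cv_rescaled, 2))
```

The CV is only meaningful for measurements on a scale with a true zero and a positive mean.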

Page 65: Describing data with graphics and numbers

Equal means, different variances

[Figure: three frequency distributions with equal means and variances V = 1, V = 2, and V = 10; x-axis: Value; y-axis: Frequency.]

Page 66: Describing data with graphics and numbers

Manipulating means

• The mean of the sum of two variables:

E[X + Y] = E[X]+ E[Y]

• The mean of the sum of a variable and a constant:

E[X + c] = E[X]+ c

• The mean of a product of a variable and a constant:

E[c X] = c E[X]

• The mean of a product of two variables:

E[X Y] = E[X] E[Y]

if X and Y are independent. (The converse does not hold: the equality only requires X and Y to be uncorrelated.)
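These rules can be verified exactly on a small discrete example. The sketch below assumes two independent fair dice, with probabilities stored as exact fractions; the helper `expect` and the constant `c = 3` are chosen only for illustration. The last part adds a cautionary case: X uniform on {-1, 0, 1} and Y = X² are clearly dependent, yet the product rule still holds, so independence is sufficient but not necessary for it.

```python
from fractions import Fraction
from itertools import product

def expect(dist, f):
    """Expected value of f(outcome) under a dict {outcome: probability}."""
    return sum(p * f(outcome) for outcome, p in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}
# Joint distribution of two independent dice: probabilities multiply.
joint = {(x, y): px * py
         for (x, px), (y, py) in product(die.items(), die.items())}

c = 3
e_x = expect(die, lambda x: x)                      # 7/2
e_sum = expect(joint, lambda o: o[0] + o[1])        # E[X + Y]
e_shift = expect(die, lambda x: x + c)              # E[X + c]
e_scale = expect(die, lambda x: c * x)              # E[c X]
e_product = expect(joint, lambda o: o[0] * o[1])    # E[X Y]

# Dependent pair: X uniform on {-1, 0, 1}, Y = X squared.
sym = {x: Fraction(1, 3) for x in (-1, 0, 1)}
dep_e_xy = expect(sym, lambda x: x * x ** 2)
dep_e_x = expect(sym, lambda x: x)
dep_e_y = expect(sym, lambda x: x ** 2)

print(e_sum, e_shift, e_scale, e_product, dep_e_xy, dep_e_x * dep_e_y)
```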

Page 67: Describing data with graphics and numbers

Manipulating variance

• The variance of the sum of two variables:

Var[X + Y] = Var[X] + Var[Y]

if X and Y are independent (more precisely, if and only if they are uncorrelated).

• The variance of the sum of a variable and a constant:

Var[X + c] = Var[X]

• The variance of a product of a variable and a constant:

Var[c X] = c² Var[X]
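The variance rules can be checked exactly in the same way. This sketch again assumes two independent fair dice with fractional probabilities; the helpers `expect` and `variance` and the constants 5 and 3 are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def expect(dist, f):
    """Expected value of f(outcome) under a dict {outcome: probability}."""
    return sum(p * f(outcome) for outcome, p in dist.items())

def variance(dist, f):
    """Population variance of f(outcome): expected squared deviation from the mean."""
    mean = expect(dist, f)
    return expect(dist, lambda outcome: (f(outcome) - mean) ** 2)

die = {face: Fraction(1, 6) for face in range(1, 7)}
# Joint distribution of two independent dice: probabilities multiply.
joint = {(x, y): px * py
         for (x, px), (y, py) in product(die.items(), die.items())}

var_x = variance(die, lambda x: x)                 # 35/12 for a fair die
var_sum = variance(joint, lambda o: o[0] + o[1])   # Var[X + Y]
var_shift = variance(die, lambda x: x + 5)         # Var[X + c] with c = 5
var_scale = variance(die, lambda x: 3 * x)         # Var[c X] with c = 3

print(var_x, var_sum, var_shift, var_scale)
```

Adding a constant shifts every value and the mean by the same amount, so the deviations are unchanged; multiplying by c scales every deviation by c and every squared deviation by c².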

Page 68: Describing data with graphics and numbers

Parents’ heights

                                Mean    Variance
Father height                   174.3   71.7
Mother height                   160.4   58.3
Father height + Mother height   334.7   184.9
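In this table the means add (174.3 + 160.4 = 334.7), but the variance of the sum, 184.9, is larger than 71.7 + 58.3 = 130.0. That suggests the two heights are not independent within couples. As a sketch, the general identity Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y] gives the covariance this implies, assuming all three variances were computed on the same set of couples. The covariance and correlation are derived here, not reported in the source.

```python
import math

var_father = 71.7
var_mother = 58.3
var_sum = 184.9   # variance of (father height + mother height)

# Rearranged from Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
implied_covariance = (var_sum - var_father - var_mother) / 2

# Correlation implied by that covariance (derived, not from the source).
implied_correlation = implied_covariance / math.sqrt(var_father * var_mother)

print(round(implied_covariance, 2), round(implied_correlation, 2))
```

A positive implied covariance is what one would expect if taller fathers tend to be paired with taller mothers.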