Review BPS chapter 1 Picturing Distributions with Graphs What is Statistics ? Individuals and...



Review BPS chapter 1: Picturing Distributions with Graphs

• What is statistics?

• Individuals and variables

• Two types of data: categorical and quantitative

• Ways to chart categorical data: bar graphs and pie charts

• Ways to chart quantitative data: histograms and stem plots

• Interpreting histograms

• Time plots


Example (BPS chapter 1)

Indicate whether each of the following variables is categorical or quantitative.

a. We have data on 20 individuals measuring the amount of time it takes to climb five flights of stairs.

b. During a clinical trial, an experimental pain relief drug is administered to individuals. Each individual is then asked whether s/he experienced any pain relief.

Answers: a. Quantitative; b. Categorical


Objectives (BPS chapter 2): Describing distributions with numbers

• Measure of center: mean and median

• Measure of spread: quartiles and standard deviation

• The five-number summary and boxplots

• IQR and outliers

• Choosing among summary statistics


Measure of center: the mean

The mean or arithmetic average

To calculate the average, or mean, add all values, then divide by the number of individuals. It is the “center of mass.”

Sum of heights is 1598.3

Divided by 25 women = 63.9 inches

Heights (inches):
58.2  59.5  60.7  60.9  61.9  61.9  62.2  62.2  62.4  62.9  63.9  63.1  63.9
64.0  64.5  64.1  64.8  65.2  65.7  66.2  66.7  67.1  67.8  68.9  69.6


Mathematical notation:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\bar{x} = \frac{1598.3}{25} = 63.9$$

woman (i)   height (x)      woman (i)   height (x)
i = 1       x1 = 58.2       i = 14      x14 = 64.0
i = 2       x2 = 59.5       i = 15      x15 = 64.5
i = 3       x3 = 60.7       i = 16      x16 = 64.1
i = 4       x4 = 60.9       i = 17      x17 = 64.8
i = 5       x5 = 61.9       i = 18      x18 = 65.2
i = 6       x6 = 61.9       i = 19      x19 = 65.7
i = 7       x7 = 62.2       i = 20      x20 = 66.2
i = 8       x8 = 62.2       i = 21      x21 = 66.7
i = 9       x9 = 62.4       i = 22      x22 = 67.1
i = 10      x10 = 62.9      i = 23      x23 = 67.8
i = 11      x11 = 63.9      i = 24      x24 = 68.9
i = 12      x12 = 63.1      i = 25      x25 = 69.6
i = 13      x13 = 63.9      n = 25      Σ = 1598.3

Learn right away how to get the mean using your calculators.
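For anyone checking the arithmetic on a computer rather than a calculator, the same calculation can be sketched in a few lines of Python. The variable and function names are illustrative choices, not part of the slides; the data are the 25 heights listed above.

```python
# Heights (inches) of the 25 women, in the order they appear in the table.
heights = [
    58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9,
    63.9, 63.1, 63.9, 64.0, 64.5, 64.1, 64.8, 65.2, 65.7, 66.2,
    66.7, 67.1, 67.8, 68.9, 69.6,
]


def mean(values):
    """Add all values, then divide by the number of individuals."""
    return sum(values) / len(values)


total = sum(heights)
x_bar = mean(heights)
print(f"sum = {total:.1f}, n = {len(heights)}, mean = {x_bar:.1f}")
# prints: sum = 1598.3, n = 25, mean = 63.9
```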


Measure of center: the median

The median (M) is the midpoint of a distribution: the number such that half of the observations are smaller and half are larger.

1. Sort observations from smallest to largest (n = number of observations).

2. Find the location of the median, L = (n + 1)/2.

(1) If n is odd, the median is observation (n + 1)/2 down the list.

Data: 0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1

n = 25, L = (n + 1)/2 = 26/2 = 13, M = 3.4

(2) If n is even, the median is the mean of the two center observations.

Data: 0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6

n = 24, L = (n + 1)/2 = 12.5, M = (3.3 + 3.4)/2 = 3.35
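The location rule can be written out as a short Python sketch. The function name `median` and the list names are illustrative; the data are the 24 and 25 observations from this slide, and the function sorts them first, as step 1 requires.

```python
def median(values):
    """Median by the location rule L = (n + 1) / 2 applied to sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # n odd: L is a whole number, so take observation L (counting from 1).
        location = (n + 1) // 2
        return ordered[location - 1]
    # n even: L ends in .5, so average the two center observations.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


data_24 = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3,
           3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6]
data_25 = data_24 + [6.1]

print(f"n = 24: M = {median(data_24):.2f}")  # prints: n = 24: M = 3.35
print(f"n = 25: M = {median(data_25):.2f}")  # prints: n = 25: M = 3.40
```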


Comparing the mean and the median

The mean and the median are the same if the distribution is exactly symmetric. In a skewed distribution, the mean is usually farther out in the long tail than is the median. The median is a measure of center that is resistant to skew and outliers. The mean is not.

[Figure: positions of the mean and median for a symmetric distribution, a left-skewed distribution, and a right-skewed distribution]
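A small numerical illustration of that resistance, using Python's standard `statistics` module. The two data sets are hypothetical values made up for this sketch, not data from the slides.

```python
import statistics

# Hypothetical values, chosen only to illustrate the point.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 50.0]  # largest value replaced by an outlier

for label, values in (("symmetric", symmetric), ("with outlier", with_outlier)):
    print(f"{label}: mean = {statistics.mean(values):.1f}, "
          f"median = {statistics.median(values):.1f}")
# prints:
# symmetric: mean = 3.0, median = 3.0
# with outlier: mean = 12.0, median = 3.0
```

The single extreme value moves the mean from 3 to 12, while the median stays at 3.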


Mean and median of a distribution with outliers

The mean is pulled to the right a lot by the outliers (from 3.4 to 4.2).

The median, on the other hand, is only slightly pulled to the right by the outliers (from 3.4 to 3.6).

[Figure: percent of people dying, plotted without the outliers ($\bar{x} = 3.4$) and with the outliers ($\bar{x} = 4.2$)]


Impact of skewed data

Mean and median of a symmetric distribution

Disease X: $\bar{x} = 3.4$, $M = 3.4$

Mean and median are the same.

and a right-skewed distribution

Multiple myeloma: $\bar{x} = 3.4$, $M = 2.5$

The mean is pulled toward the skew.


Example: STAT 200 Midterm Score

Midterm: 30 35 40 40 40 40 45 45 45 45 50 50 55 55 60 65 65 70 100 100

Descriptive Statistics: Midterm

Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Midterm    20   53.75   18.98   30.00     40.00   47.50    63.75   100.00
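The summary line above can be reproduced with Python's standard `statistics` module, as sketched below. One assumption to flag: quartile conventions differ between software packages, and the slides do not say which one produced this output. The `"exclusive"` method, which interpolates at positions (n + 1)/4 and 3(n + 1)/4, is used here because it happens to match the printed Q1 and Q3.

```python
import statistics

midterm = [30, 35, 40, 40, 40, 40, 45, 45, 45, 45,
           50, 50, 55, 55, 60, 65, 65, 70, 100, 100]

count = len(midterm)
mean = statistics.mean(midterm)
stdev = statistics.stdev(midterm)    # sample standard deviation (divides by n - 1)
median = statistics.median(midterm)
# Assumed convention: interpolate at positions (n + 1)/4 and 3(n + 1)/4.
q1, _, q3 = statistics.quantiles(midterm, n=4, method="exclusive")

print(f"N = {count}, mean = {mean:.2f}, stdev = {stdev:.2f}")
print(f"min = {min(midterm)}, Q1 = {q1:.2f}, median = {median:.2f}, "
      f"Q3 = {q3:.2f}, max = {max(midterm)}")
# prints:
# N = 20, mean = 53.75, stdev = 18.98
# min = 30, Q1 = 40.00, median = 47.50, Q3 = 63.75, max = 100
```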


Measure of spread: quartiles

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it.

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it.

Data (n = 25): 0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1

Q1 = first quartile = 2.2

M = median = 3.4

Q3 = third quartile = 4.35
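The values Q1 = 2.2 and Q3 = 4.35 are consistent with a common hand-calculation rule: take Q1 as the median of the observations below the overall median's location and Q3 as the median of those above it. The sketch below assumes that rule, which the slide does not state explicitly. The function name `quartiles` is illustrative.

```python
import statistics


def quartiles(values):
    """Q1 and Q3 as medians of the observations below and above the median.

    When n is odd, the middle observation belongs to neither half.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower = ordered[:half]    # observations to the left of the median location
    upper = ordered[-half:]   # observations to the right of the median location
    return statistics.median(lower), statistics.median(upper)


data = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
        3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

q1, q3 = quartiles(data)
print(f"Q1 = {q1:.2f}, M = {statistics.median(data):.2f}, Q3 = {q3:.2f}")
# prints: Q1 = 2.20, M = 3.40, Q3 = 4.35
```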


Center and spread in boxplots

“Five-number summary” for Disease X:

Smallest = min = 0.6

Q1 = first quartile = 2.2

M = median = 3.4

Q3 = third quartile = 4.35

Largest = max = 6.1

[Figure: boxplot of years until death for Disease X, vertical axis from 0 to 7]


Boxplots for skewed data

Boxplots remain true to the data and clearly depict symmetry or skewness.

[Figure: Comparing box plots for a normal and a right-skewed distribution. Years until death (axis from 0 to 15) for Disease X and multiple myeloma]


IQR and outliers

The interquartile range (IQR) is the distance between the first and third quartiles (the length of the box in the boxplot): IQR = Q3 - Q1

An outlier is an individual value that falls outside the overall pattern.

• How far outside the overall pattern does a value have to fall to be considered an outlier?

• The 1.5 × IQR rule for outliers:

Low outlier: any value < Q1 - 1.5 IQR

High outlier: any value > Q3 + 1.5 IQR


Example: STAT 200 Midterm Score

IQR = Q3 - Q1 = 63.75 - 40.00 = 23.75

Low outlier: any value < Q1 - 1.5 IQR = 40.00 - 1.5(23.75) = 4.375

High outlier: any value > Q3 + 1.5 IQR = 63.75 + 1.5(23.75) = 99.375

Midterm: 30 35 40 40 40 40 45 45 45 45 50 50 55 55 60 65 65 70 100 100

Outliers: the two scores of 100, which are above 99.375.
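The 1.5 × IQR check for this example can be sketched in Python as follows. The function name `iqr_fences` is an illustrative choice; Q1 and Q3 are taken directly from the descriptive statistics given for the midterm scores.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (low, high) cutoffs of the k x IQR rule for outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


midterm = [30, 35, 40, 40, 40, 40, 45, 45, 45, 45,
           50, 50, 55, 55, 60, 65, 65, 70, 100, 100]

# Q1 and Q3 as reported in the descriptive statistics for the midterm.
low, high = iqr_fences(40.00, 63.75)
outliers = [score for score in midterm if score < low or score > high]

print(low, high)   # prints: 4.375 99.375
print(outliers)    # prints: [100, 100]
```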


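Measure of spread: standard deviation

The standard deviation is used to describe the variation around the mean.

1) First calculate the variance s².

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

2) Then take the square root to get the standard deviation s.

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

[Figure: distribution with the mean $\bar{x}$ and the interval mean ± 1 s.d. marked]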


Calculations …

We’ll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator.

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Women’s height (inches):

Mean = 63.4

Sum of squared deviations from mean = 85.2

Degrees of freedom (df) = (n − 1) = 13

s² = variance = 85.2/13 = 6.55 inches squared

s = standard deviation = √6.55 = 2.56 inches
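As a rough check of the numbers on this slide, the two steps (variance, then square root) can be sketched in Python. The individual heights behind this example are not listed in the slides, so the sketch starts from the reported sum of squared deviations. The helper functions `sample_variance` and `sample_sd` are illustrative names showing how the same formula would apply to raw data.

```python
import math


def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Numbers reported on the slide (the raw heights are not given).
sum_squared_deviations = 85.2
df = 13                                  # n - 1

variance = sum_squared_deviations / df
sd = math.sqrt(variance)
print(f"variance = {variance:.2f}, sd = {sd:.2f}")
# prints: variance = 6.55, sd = 2.56
```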


Choosing among summary statistics

• Because the mean is not resistant to outliers or skew, use it to describe distributions that are fairly symmetrical and don’t have outliers. Plot the mean and use the standard deviation for error bars.

• Otherwise, use the median in the five-number summary, which can be plotted as a boxplot.

[Figure: Height of 30 women, in inches (axis from 58 to 69), shown as a box plot and as mean ± s.d.]


Example 1

Suppose a sample of twelve lab rats is found to have the following glucose levels:

3 4 4 6 6 6 8 8 9 10 12 15

1. Find the five-number summary of the data and construct a boxplot.

2. Based on the boxplot, the data set is

a. skewed to left b. roughly symmetric c. skewed to right

Answer to 1: Min = 3, Q1 = 5, M = 7, Q3 = 9.5, Max = 15
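The five-number summary given for the glucose data can be checked with a short Python sketch. It assumes quartiles are computed as the medians of the lower and upper halves of the sorted data, a convention that reproduces the stated answer. The function name `five_number_summary` is illustrative.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])    # median of the lower half
    q3 = statistics.median(ordered[-half:])   # median of the upper half
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]


glucose = [3, 4, 4, 6, 6, 6, 8, 8, 9, 10, 12, 15]
print(five_number_summary(glucose))
# prints: (3, 5.0, 7.0, 9.5, 15)
```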


Example 2

Suppose a researcher is recording fifty values in a database. Suppose she records every value correctly except the lowest value, which is supposed to be “2” but which she incorrectly types as “200”.

In the above scenario, the effect of the researcher’s error on the mean and median is:

a. Her calculated mean will be lower than it would have been without the error, but her calculated median will remain unchanged.

b. Her calculated mean will be higher than it would have been without the error, but her calculated median will remain unchanged.

c. Her calculated mean will remain unchanged, but her calculated median will be lower than it would have been without the error.

d. Her calculated mean will remain unchanged, but her calculated median will be lower than it would have been without the error.


Example 2

In the above scenario, the effect of the researcher’s error on standard deviation is:

a. The error will not affect standard deviation.

b. Her calculated standard deviation will be smaller than it would have been without the error.

c. Her calculated standard deviation will be larger than it would have been without the error.

d. The error is likely to make the calculated standard deviation negative.


Example 3

There are three children in a room, ages 3, 4, and 5. If a four-year-old child enters the room, the

a. mean age and variance will stay the same.

b. mean age and variance will increase.

c. mean age will stay the same but the variance will increase.

d. mean age will stay the same but the variance will decrease.
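One way to check an answer to Example 3 is to compute the mean and variance before and after the fourth child enters. The sketch below uses Python's `statistics.variance`, which divides by n − 1. The question does not say whether the sample or population variance is meant, so that choice is an assumption.

```python
import statistics

before = [3, 4, 5]
after = before + [4]   # a four-year-old enters the room

for label, ages in (("before", before), ("after", after)):
    print(f"{label}: mean = {statistics.mean(ages):.3f}, "
          f"variance = {statistics.variance(ages):.3f}")
# prints:
# before: mean = 4.000, variance = 1.000
# after: mean = 4.000, variance = 0.667
```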