How can you best represent statistical information and draw conclusions from it?


What is statistics?

Statistics is the branch of mathematics concerned with the collection, organization, display, and interpretation of data.

S-1 Organizing Data

How can data be shown in a table or a graph, and how can you read such data?

What is categorical data? When should you use a pie chart, and how is one made? How do you organize a frequency distribution?

Data types: categorical and numeric

Categorical: any non-numeric data
Use frequency distributions, bar charts, and pie charts.

Numeric: anything that can be measured and listed by number
Use dotplots, stem-and-leaf plots, frequency distributions, and histograms.

Example: Does this data mean anything to you, and can you answer questions about it in its current form?

Leisure time activities:
W T A W G T W W C W T W A T T W G W W C A W A W W W T W W T

W = walking, T = weight training, C = cycling, G = gardening, A = aerobics

Displaying Categorical Data

How can you display and interpret categorical data?

Categorical: anything that can’t be measured and listed by number
• Frequency distributions
• Bar charts
• Pie charts

Frequency Distribution

Displays all categories and a tally for each.
Relative frequency: the fraction of the time a category appears in the data, written as a decimal (frequency ÷ total).

Category          Tally                Frequency   Relative frequency
Walking           ///// ///// /////    15          0.500
Weight training   ///// //             7           0.233
Cycling           //                   2           0.067
Gardening         //                   2           0.067
Aerobics          ////                 4           0.133
Total                                  30          1.000
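The tally-and-count procedure above can be sketched in Python. This is a minimal illustration, assuming the 30 activity codes are typed as one space-separated string; `collections.Counter` does the tallying, and each relative frequency is frequency ÷ total.

```python
from collections import Counter

# Leisure-time activity codes from the example (key: W, T, C, G, A)
data = ("W T A W G T W W C W "
        "T W A T T W G W W C "
        "A W A W W W T W W T").split()

names = {"W": "Walking", "T": "Weight training", "C": "Cycling",
         "G": "Gardening", "A": "Aerobics"}

counts = Counter(data)          # tally: activity code -> frequency
total = sum(counts.values())    # number of observations

# Relative frequency = frequency / total, shown as a decimal
for code, name in names.items():
    freq = counts[code]
    print(f"{name:<16}{freq:>4}{freq / total:>8.3f}")
print(f"{'Total':<16}{total:>4}{1.0:>8.3f}")
```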

Bar Chart

Graphs the frequency of categorical data.
• Bars DO NOT touch.
• Categories are on the x-axis.
• Frequencies are on the y-axis.

(Figure: bar chart with the categories Walking, Wt Training, Cycling, Gardening, and Aerobics along the x-axis.)

Pie Charts (circle graphs)

Used when there are not too many categories (rule of thumb: 8 or fewer).
Each “slice” is determined by the relative frequency:
Degrees in slice = relative frequency × 360
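As a quick check of the slice rule, here is a small sketch that applies "relative frequency × 360" to the leisure-time frequencies (15, 7, 2, 2, 4, assumed already tallied). The slices should add up to a full circle.

```python
# Frequencies from the leisure-time example (assumed already tallied)
freqs = {"Walking": 15, "Weight training": 7, "Cycling": 2,
         "Gardening": 2, "Aerobics": 4}
total = sum(freqs.values())

# Degrees in slice = relative frequency x 360
degrees = {name: f / total * 360 for name, f in freqs.items()}
for name, deg in degrees.items():
    print(f"{name:<16}{deg:6.1f} deg")
print(f"{'Total':<16}{sum(degrees.values()):6.1f} deg")
```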

Homework: Worksheet 1

S-2 Displaying Numeric Data

EQ: How do you construct and read stem-and-leaf plots, dotplots, frequency distributions, and histograms?

Numeric: anything that can be measured and listed by number
Use dotplots, stem-and-leaf plots, frequency distributions, and histograms.

Dotplots

• A simple way to represent small amounts of data.
• Each piece of data has its own dot.
• Dots stack vertically above their position on the x-axis.
• Depending on the data set, you may lose the exact value of each piece.

Example data: 512 615 524 632 645 575 592 716 618 521 682 675 549 523 651

(Figure: dotplot of these values on an x-axis labeled 5, 6, 7.)

Stem Plot

• Works for a small to moderate set of data.
• Stems go in a vertical column.
• Stems may be split into low and high halves (leaves 0-4 and 5-9).
• A comparative, or double, stemplot shows multiple data sets.

Data set 1: 51 61 52 63 64 57 59 71 61 52 68 67 54 52 65

5 | 1 2 2 2 4 7 9
6 | 1 1 3 4 5 7 8
7 | 1

Data set 2: 51 61 52 73 54 57 59 71 61 52 68 67 74 52 65

5 | 1 2 2 2 4 7 9
6 | 1 1 5 7 8
7 | 1 3 4
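The stem-plot construction can be sketched as a short function. This sketch assumes two-digit whole numbers, uses the tens digit as the stem, and does not split stems into low and high halves. Running it on the two data sets reproduces the plots above.

```python
from collections import defaultdict

def stem_plot(values):
    """Return stem-and-leaf lines for two-digit whole numbers (stem = tens digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):                  # sorted, so leaves come out in order
        leaves[v // 10].append(v % 10)        # split each value into stem and leaf
    stems = range(min(leaves), max(leaves) + 1)   # keep empty stems in the column
    return [f"{s} | " + " ".join(str(leaf) for leaf in leaves[s]) for s in stems]

set_1 = [51, 61, 52, 63, 64, 57, 59, 71, 61, 52, 68, 67, 54, 52, 65]
set_2 = [51, 61, 52, 73, 54, 57, 59, 71, 61, 52, 68, 67, 74, 52, 65]
for line in stem_plot(set_1):
    print(line)
```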

Histograms

• A bar chart for numeric data.
• Center each rectangle over the indicated value on the x-axis; the bars touch.
• Can be drawn from either the frequency or the relative frequency distribution.

# of partners in local law firms   Frequency   Relative frequency
1                                  2           0.10
2                                  3           0.15
3                                  6           0.30
4                                  6           0.30
5                                  3           0.15
Totals                             20          1
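A rough text rendering of the law-firm histogram is shown below. It is only a sketch: each `#` stands for one firm, and on a real histogram the bars for adjacent values would touch. The relative frequencies are computed from the frequency column.

```python
# Frequency distribution: number of partners -> number of firms (from the table)
freq = {1: 2, 2: 3, 3: 6, 4: 6, 5: 3}
n = sum(freq.values())

# Relative frequency = frequency / total number of firms
rel_freq = {k: f / n for k, f in freq.items()}

# One row per value; '#' marks are a text stand-in for the bars
for k in sorted(freq):
    print(f"{k} | {'#' * freq[k]:<6} {rel_freq[k]:.2f}")
```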

Shapes of Histograms

• Unimodal: has one peak
• Bimodal: has two peaks
• Multimodal: has more than two peaks

Types of Unimodal Curves

• Symmetric
• Normal or bell-shaped
• Heavy-tailed: having long tails; larger standard deviation
• Light-tailed: having short tails; smaller standard deviation

Skewed Curves

(Figure: skewed curves with the lower (left) tail and the upper (right) tail labeled.)

When the long tail, often containing outliers, is to the right, the curve is skewed right.
When the long tail, often containing outliers, is to the left, the curve is skewed left.

Skewness is judged by the tail, not by where the majority of the data lies.

Frequency Distributions: Continuous and Discrete Data

Discrete data
• Individual data points
• The values always come from the set of integers or whole numbers

Continuous data
• Data that may include decimals

Frequency Distributions

There are no natural breaks for continuous data, so we create our own.

Ex. The fuel efficiency of a particular car ranges from 25.3 to 29.8 mpg. We decide to use an interval width of 0.5.

Note: Always start at an even increment below the lowest piece of data and end at an even increment above the highest piece of data.

Interval #   Low    High
1            25.0   25.5
2            25.5   26.0
3            26.0   26.5
4            26.5   27.0
5            27.0   27.5
6            27.5   28.0
7            28.0   28.5
8            28.5   29.0
9            29.0   29.5
10           29.5   30.0

In which interval would you place 27.5 mpg?
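Placing a value into an interval can be automated once a boundary rule is chosen. The sketch below assumes half-open intervals [low, high), so a value that lands exactly on a boundary goes into the interval where it is the low endpoint. That convention is an assumption; the slide leaves the question open.

```python
import math

def interval_number(x, start=25.0, width=0.5):
    """Return the 1-based interval number for x.

    Assumption: intervals are half-open [low, high), so a boundary value
    belongs to the interval where it is the LOW endpoint.
    """
    return math.floor((x - start) / width) + 1

for mpg in (25.3, 27.5, 29.8):
    k = interval_number(mpg)
    low = 25.0 + (k - 1) * 0.5
    print(f"{mpg} mpg -> interval {k}: {low:.1f} to {low + 0.5:.1f}")
```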

Homework: Numeric Data Worksheet 2

Density Graphs

When data is unevenly distributed, you may want to use unequal groups or intervals. This may only be done if you graph the density:

density = relative frequency of the class ÷ class width

Class   Low   High   Frequency   Relative frequency   Density
1       1     10     2           0.09091              0.01010
2       10    20     3           0.13636              0.01364
3       20    30     4           0.18182              0.01818
4       30    40     3           0.13636              0.01364
5       40    50     6           0.27273              0.02727
6       50    100    1           0.04545              0.00091
7       100   200    2           0.09091              0.00091
8       200   1000   1           0.04545              0.00006
Total                22
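Here is a short sketch of the density calculation, taking class width as high minus low. One useful check follows from the definition: because each bar's area (density × width) equals its relative frequency, the areas of a density histogram add up to 1.

```python
# (class, low, high, frequency) rows from the density table
classes = [(1, 1, 10, 2), (2, 10, 20, 3), (3, 20, 30, 4), (4, 30, 40, 3),
           (5, 40, 50, 6), (6, 50, 100, 1), (7, 100, 200, 2), (8, 200, 1000, 1)]
total = sum(f for _, _, _, f in classes)

densities = {}
for name, low, high, f in classes:
    rel = f / total                          # relative frequency
    densities[name] = rel / (high - low)     # density = rel. freq / class width
    print(f"{name}  {low:>4}-{high:<5} {f:>2}  {rel:.5f}  {densities[name]:.5f}")

# Bar area = density x width = relative frequency, so total area is 1
area = sum(densities[c] * (h - l) for c, l, h, _ in classes)
print(f"total area = {area:.3f}")
```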

S-3 Describing the Center of a Data Set

EQ: What are the measures of central tendency, and how can they be determined?

Center and Spread

Two of the most critical descriptors of a data set.
Graphical methods, such as those in the last chapter, give a general impression of both.
Numerical methods give precise values that can be compared in detail.

The three M’s

• Mean: also known as the average
• Median: also called the middle
• Mode: the most frequent value

The Mean

Formula for the sample mean:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

• x = each piece of data
• x_i: the subscript i indicates the position of the data within the original data set
• n = number of pieces of data in the data set
• ∑ = the Greek letter sigma, meaning "add what follows"

Always use more accuracy (more decimal places) than any one piece of data has.

µ is used for the population mean. Greek letters are always used for population values.
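The formula translates directly into code: add the x_i values, then divide by n. The data below is the ten-value set used later in S-5. Python's standard library also provides `statistics.mean`, which gives the same result.

```python
def sample_mean(xs):
    """x-bar = (sum of x_i for i = 1..n) / n"""
    return sum(xs) / len(xs)

data = [20, 15, 12, 18, 17, 15, 17, 16, 18, 25]   # data set from the S-5 example
print(f"{sample_mean(data):.2f}")   # more decimals than any single data value
```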

The Median

The middle value in a list of ordered values.

The median has no symbol but is often abbreviated Med.

If n is odd, the median is the exact middle number.

If n is even, the median is the mean of the two middle numbers.
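The odd and even cases above can be written out in a few lines. This is a sketch; it sorts the input first, because the median is defined on an ordered list.

```python
def median(xs):
    s = sorted(xs)                 # the median needs an ordered list
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                 # n odd: the exact middle number
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # n even: mean of the two middle numbers

print(median([20, 15, 12, 18, 17, 15, 17, 16, 18, 25]))   # even n
print(median([3, 1, 2]))                                  # odd n
```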

Comparison and Contrast of the Mean and Median

The median divides the data into two equal parts: 50% of the data is on either side of the median.

The mean is where the fulcrum would cause the “data scale” to balance if the values had weight.

The mean is very sensitive to outliers.

Balancing the “data scale”

(Figure: mean and median marked on a normal/bell curve, a skewed-left curve, and a skewed-right curve.)

Trimmed Mean

Makes the mean less susceptible to outliers.
• Order the data.
• Remove the same number of pieces of data from each end.
• Recalculate the mean.

(trim %) × n = number of pieces to be removed from EACH end

A small to moderate trim is 5% to 25%.

Trimmed Mean Example: Find the 15% trimmed mean of: 3, 6, 8, 2, 9, 10, 7, 15, 4, 12, 20, 36, 15, 5, 3, 7, 10, 16, 17, 12

Order the numbers: 2, 3, 3, 4, 5, 6, 7, 7, 8, 9, 10, 10, 12, 12, 15, 15, 16, 17, 20, 36

20 items × .15 = 3, so remove 3 pieces of data from each end.

(4 + 5 + 6 + 7 + 7 + 8 + 9 + 10 + 10 + 12 + 12 + 15 + 15 + 16) / 14 = 136/14 ≈ 9.714
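The trimming steps can be sketched as follows. The slide says to remove (trim %) × n pieces from each end but does not say how to round; this sketch assumes rounding down. (SciPy's `scipy.stats.trim_mean` also cuts a whole number of values from each end, if you prefer a library routine.)

```python
def trimmed_mean(xs, trim_percent):
    """Mean after removing trim_percent % of the data from EACH end."""
    s = sorted(xs)                          # order the data
    # % x n pieces come off each end; rounding down is an assumption here
    k = trim_percent * len(s) // 100
    kept = s[k:len(s) - k]                  # remove k values from each end
    return sum(kept) / len(kept)            # recalculate the mean

data = [3, 6, 8, 2, 9, 10, 7, 15, 4, 12, 20, 36, 15, 5, 3, 7, 10, 16, 17, 12]
print(round(trimmed_mean(data, 15), 3))    # 15% trimmed mean
print(round(trimmed_mean(data, 0), 3))     # ordinary mean, for comparison
```

The untrimmed mean is 217/20 = 10.85. The outlier 36 pulls it up, and trimming removes that effect.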

Weighted Mean

A weighted mean is similar to an arithmetic mean (the most common type of average), except that instead of each data point contributing equally to the final average, some data points contribute more than others.

Weighted Mean

             # of students   Class average
1st period   20              75
2nd period   35              79
Total        55

$\text{weighted ave.} = \frac{20(75) + 35(79)}{55} = \frac{4265}{55} \approx 77.55$
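In the class-average example, the weights are the numbers of students. The sketch below multiplies each average by its weight, adds the products, and divides by the total weight.

```python
def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

averages = [75, 79]    # 1st and 2nd period class averages
students = [20, 35]    # number of students in each period = the weights
print(round(weighted_mean(averages, students), 2))
```

The unweighted mean of the two class averages is 77. The weighted mean is higher because the larger 2nd period counts for more.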

Homework: Worksheet 3

S-4 Spread

What are quartiles, percentiles, and boxplots?

Range

Range = High - Low

Interquartile Range (IQR)

IQR = upper quartile (Q3) - lower quartile (Q1)

• Lower quartile: the median of the lower half
• Upper quartile: the median of the upper half
• If n is odd, the exact median is excluded from both halves when finding the quartiles.
• Used because it is resistant to outliers.
• There is no special name for the population IQR.
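Here is a sketch of the quartile rule described above: take the median of each half, and leave the exact median out of both halves when n is odd. Other software (including Python's `statistics.quantiles`) may use different quartile conventions, so its results can differ slightly.

```python
def _median_sorted(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(xs):
    """Q1 and Q3 as medians of the lower and upper halves.

    When n is odd, the middle value is excluded from both halves.
    """
    s = sorted(xs)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]   # skip the exact median when n is odd
    return _median_sorted(lower), _median_sorted(upper)

data = [20, 15, 12, 18, 17, 15, 17, 16, 18, 25]
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)   # Q1, Q3, IQR
```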

Boxplot

• Can be used for many types of summaries.
• IQR = Q3 - Q1
• Outlier: data more than 1.5 × IQR beyond the end of the box
• Extreme outlier: data more than 3 × IQR beyond the end of the box

(Figure: boxplot divided into four sections, each holding 25% of the data.)

Modified Boxplot

(Figure: modified boxplot marking an outlier with a closed circle and an extreme outlier with an open circle.)
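The outlier rules above can be sketched as a small classifier. It measures how far each point lies beyond the nearer end of the box. Q1 = 15 and Q3 = 18 are taken from the S-5 example data and passed in directly.

```python
def classify(xs, q1, q3):
    """Label points beyond 1.5*IQR (outlier) or 3*IQR (extreme) from the box."""
    iqr = q3 - q1
    labels = {}
    for x in xs:
        dist = max(q1 - x, x - q3, 0)    # distance beyond the nearer end of the box
        if dist > 3 * iqr:
            labels[x] = "extreme outlier"
        elif dist > 1.5 * iqr:
            labels[x] = "outlier"
    return labels

data = [12, 15, 15, 16, 17, 17, 18, 18, 20, 25]
print(classify(data, q1=15, q3=18))
```

With an IQR of 3, the value 25 is 7 above the box. That is more than 4.5 but less than 9, so it is an outlier (closed circle) but not an extreme outlier.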

Percentages and percentiles

Percentage = (the score ÷ total possible points) × 100

Percentile = (the position of the score within an ordered list ÷ the total number of items) × 100

EX: 10 students took a 90-point test. Scores, as an ordered list:
Score:      60  65  68  74  75  80  81  81  84  90
Position:    1   2   3   4   5   6   7   8   9  10

What is the percent and the percentile for a score of 81?

Percent: 81/90 × 100 = 90%

Percentile: 7/10 × 100 = 70th percentile
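The two definitions differ only in what goes in the denominator: total possible points for a percentage, and number of items for a percentile. This sketch follows the worked example and uses the first position where a repeated score appears (position 7 for the first 81).

```python
def percent(score, total_points):
    return 100 * score / total_points

def percentile(score, ordered_scores):
    """Position of the score in the ordered list / number of items x 100.

    Uses the FIRST position where the score appears, as in the 81 example.
    """
    position = ordered_scores.index(score) + 1   # positions are 1-based
    return 100 * position / len(ordered_scores)

scores = [60, 65, 68, 74, 75, 80, 81, 81, 84, 90]
print(percent(81, 90), percentile(81, scores))
```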

EXAMPLE: Given the stem-and-leaf plot below, find:

• the median
• the first quartile
• the third quartile
• the interquartile range
• the mode
• the percentile for .271
• the value closest to the 60th percentile

10 2 5 720 1 630 5 8 9 940 2 3 5 7 850 260 3 6

S-5 Measures of Variability

How do the measures of variability help us better understand what our data set might look like?

S-5 Measures of Variability

Range = high - low

Deviation from the mean = $x_i - \bar{x}$
• If positive, then x_i is larger than the mean.
• If negative, then x_i is smaller than the mean.

Mean deviation is the average of the absolute values of the deviations (the deviations themselves always add up to 0).

Sample Variance

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Sample Standard Deviation

The “average distance” the items fall from the mean.

$s = \sqrt{s^2}$

A small s or s² indicates low variability; a large s or s² indicates large variability.

Population Variance (knowing all the data)

$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$

Population Standard Deviation

$\sigma = \sqrt{\sigma^2}$

Compute to the same accuracy as the population.
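The sample and population formulas differ only in the divisor: n - 1 for a sample and n for a population. This sketch implements both directly from the formulas and cross-checks them against `statistics.variance` (sample) and `statistics.pvariance` (population) from Python's standard library.

```python
import math
import statistics

def variance(xs, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1) if sample else ss / n

data = [12, 15, 15, 16, 17, 17, 18, 18, 20, 25]
s2, sigma2 = variance(data), variance(data, sample=False)
print(f"s^2 = {s2:.4f}, s = {math.sqrt(s2):.4f}")
print(f"sigma^2 = {sigma2:.4f}, sigma = {math.sqrt(sigma2):.4f}")

# Cross-check against the standard library
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(sigma2, statistics.pvariance(data))
```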

Uses of the IQR

The standard deviation can be approximated by SD ≈ IQR/1.35.

If SD > IQR/1.35, it suggests heavier or longer tails than the normal curve.

Example: 20, 15, 12, 18, 17, 15, 17, 16, 18, 25

Reorder: 12, 15, 15, 16, 17, 17, 18, 18, 20, 25

Median = 17, Q1 = 15, Q3 = 18

range =
IQR =
SD =
x̄ =

Continued: Find the mean deviation and the standard deviation by hand.

i    x_i   x_i - x̄   (x_i - x̄)²
1    12
2    15
3    15
4    16
5    17
6    17
7    18
8    18
9    20
10   25
Totals
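The by-hand table can be filled in with a short script. It is a sketch that prints each deviation and its square. The mean deviation uses absolute values, because the plain deviations always sum to 0.

```python
data = [12, 15, 15, 16, 17, 17, 18, 18, 20, 25]
n = len(data)
mean = sum(data) / n                                # x-bar

print(f"{'i':>2} {'x_i':>4} {'x_i - xbar':>11} {'(x_i - xbar)^2':>15}")
for i, x in enumerate(data, start=1):
    d = x - mean
    print(f"{i:>2} {x:>4} {d:>11.1f} {d * d:>15.2f}")

deviations = [x - mean for x in data]
# Plain deviations sum to 0, so the mean deviation averages their absolute values
mean_dev = sum(abs(d) for d in deviations) / n
ss = sum(d * d for d in deviations)                 # total of the last column
s = (ss / (n - 1)) ** 0.5                           # sample standard deviation
print(f"mean deviation = {mean_dev:.2f}, s = {s:.3f}")
```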

Given 12, 15, 15, 16, 17, 17, 18, 18, 20, 25, find the SD:

• By IQR
• By calculator
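Here is a sketch comparing the two SD estimates for this data. It uses Q1 = 15 and Q3 = 18 from the example and takes `statistics.stdev` (the sample standard deviation s) as the "calculator" value. It then applies the tail rule from the Uses of the IQR slide.

```python
import statistics

data = [12, 15, 15, 16, 17, 17, 18, 18, 20, 25]
q1, q3 = 15, 18                       # quartiles from the worked example
# SD ~ IQR / 1.35 (for a normal curve the IQR is about 1.349 standard deviations)
sd_by_iqr = (q3 - q1) / 1.35
sd_calc = statistics.stdev(data)      # sample standard deviation s

print(f"SD by IQR: {sd_by_iqr:.3f}")
print(f"SD by calculator (s): {sd_calc:.3f}")
if sd_calc > sd_by_iqr:
    print("s > IQR/1.35: suggests heavier or longer tails than the normal curve")
```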

Homework: Worksheet 5

S-6 Translation and Scale

What is the difference between the impact of a translation and the impact of a scale change on data?

In-class project:

Hints for review #1

How many intervals should be used for a set of data?

The book recommends about √(# of pieces of data).


Homework


TEST 1

S-7 Data Collection

How do you know which method of data collection is most appropriate?

Random Samples

What methods of data collection constitute collecting a random sample?

Sampling

Since time and money usually do not permit a scientist to collect the opinion of, or measure the effect on, every person in the population, scientists take samples. A sample should include all groups so that accurate statements can be made about the entire population.

Simple Random Sample

• Each object in the population has an equal chance of being selected for the sample.
• Each object in the sample is chosen independently of any other object in the sample.

Independent: choosing one object has no bearing on the choice of the next object.
• Independent example: all names are placed in a hat and 10 are chosen.
• Dependent example: two names are drawn, and each of those people asks 4 people to participate with them.
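The "names in a hat" procedure corresponds to drawing without replacement, which Python's `random.sample` does: every group of k names is equally likely. In this sketch the roster is hypothetical, and the seed is there only so a demonstration run can be repeated.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k objects without replacement; every object is equally likely."""
    rng = random.Random(seed)        # seed only to make a demo repeatable
    return rng.sample(population, k)

names = [f"Student {i}" for i in range(1, 31)]   # hypothetical class roster
chosen = simple_random_sample(names, 10, seed=1)
print(chosen)
```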

Bias

When one group is over-represented in the sample.

Causes:
• Basis of selection
• Who responds
• Who asks the questions, or how they are asked

Stratified Sample

The population is divided into groups, and a specified number are chosen from each group.
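A stratified sample draws a simple random sample separately inside each group. In the sketch below, the grade-level groups and the number chosen from each group are hypothetical.

```python
import random

def stratified_sample(groups, per_group, seed=None):
    """Choose a specified number of members at random from each group."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_group[name])
            for name, members in groups.items()}

# Hypothetical population divided into grade-level groups
groups = {
    "9th": [f"9-{i}" for i in range(1, 41)],
    "10th": [f"10-{i}" for i in range(1, 31)],
    "11th": [f"11-{i}" for i in range(1, 21)],
}
sample = stratified_sample(groups, {"9th": 4, "10th": 3, "11th": 2}, seed=7)
for name, chosen in sample.items():
    print(name, chosen)
```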


River Project

The Normal Distribution

How does normally distributed data begin to relate statistics to probability?

The Normal Distribution

When most of the data falls close to the average and only a few pieces of data fall far from the mean, the configuration is often called a bell-shaped or normal curve. For normally distributed data:
• About 68% of the data lies within one standard deviation of the mean.
• About 95% of the data lies within two standard deviations (13.5% lies in the one-to-two SD range on each side).
• About 99.7% of the data lies within three standard deviations (2.35% lies in the two-to-three SD range on each side).
• 0.15% of the data lies beyond three standard deviations on each side.

(Figure: normal curve with the x-axis marked at the mean and at one, two, and three standard deviations on either side.)
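The 68-95-99.7 figures can be checked with the error function. For a standard normal variable Z, P(|Z| < k) = erf(k/√2). The exact values are about 68.27%, 95.45%, and 99.73%. The band figures on the slide (13.5%, 2.35%, 0.15%) come from the rounded percentages, so the exact bands come out slightly different.

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal variable: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")

# Band percentages on each side of the mean, from the exact values
print(f"1 to 2 SD: {(prob_within(2) - prob_within(1)) / 2:.4f}")
print(f"2 to 3 SD: {(prob_within(3) - prob_within(2)) / 2:.4f}")
print(f"beyond 3 SD: {(1 - prob_within(3)) / 2:.5f}")
```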

Normal curves are symmetric about the mean. Some are narrow and some are wide; this is determined by the value of one standard deviation.

The area under a normal curve represents all the data: 100%, or 1. The area under any section represents the percentage, and therefore the probability, that a given piece of data will fall in that region of the curve.

Normal distributions have a direct link to probability through something called z-scores. The z-score tells exactly how many full and partial standard deviations a particular piece of data falls from the mean. A negative z-score means the data is to the left of the mean; a positive z-score means the data is to the right of the mean.

The formula for z-scores is

$z = \frac{x - \bar{x}}{s}$

The attached table gives the probability that a value has a z-score less than a given z (that is, falls to the left of a particular spot on the normal curve).
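Here is a sketch that computes z-scores and the area a z-table would give, using the numbers from examples a, b, and c. The table lookup is replaced by the standard normal cumulative distribution, 0.5 × (1 + erf(z/√2)), which is the area to the left of z.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (negative = left)."""
    return (x - mean) / sd

def normal_cdf(z):
    """Area to the left of z under the standard normal curve (what a z-table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

za = z_score(45, 50, 4)                                    # example a
zb = z_score(56, 60, 10)                                   # example b
z_low, z_high = z_score(20, 50, 10), z_score(60, 50, 10)   # example c

print(f"a) z = {za:.2f}, area to the left = {normal_cdf(za):.4f}")
print(f"b) z = {zb:.2f}, area to the left = {normal_cdf(zb):.4f}")
print(f"c) z from {z_low:.0f} to {z_high:.0f}, area between = "
      f"{normal_cdf(z_high) - normal_cdf(z_low):.4f}")
```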


Examples: Find the z-score for each of the following:

a) 45 when x̄ = 50 and s = 4

b) 56 when x̄ = 60 and s = 10

c) between 20 and 60 when x̄ = 50 and s = 10
