STAT 211 – 019 Dan Piett West Virginia University Lecture 1.

STAT 211 – 019Dan Piett

West Virginia University

Lecture 1

Overview1.1 Science of Statistics1.2 Displaying Small Sets of Numbers1.3 Graphing Categorical DataBREAK - Problems1.4 Frequency Histograms1.5 Density Histograms1.6 Centers of Data1.7 Mean vs Median

Section 1.1

Science of Statistics

DefinitionsStatistics

The science of collecting, organizing, and analyzing data for the purpose of estimation and making inferences.

DataValues which arise from observing characteristics on

a selected group (sample) of individuals. The characteristic(s) which are observed are called variables.

PopulationThe entire group of “things” from which data may be

collected.Sample

The selected group of the population that is studied.

Population vs. SamplePopulation

All undergraduate students at WVU

SampleA group of 100

undergraduate students at WVU

All WVU Students (population)

Group of 100

Students

(sample)

Types of DataVariables

The characteristics which we look at when observing data. These can be classified into four groups.

NumericDiscreteContinuous

CategoricalRanked (Ordinal)Unranked (Nominal)

Numeric VariablesNumeric Variables

Variables whose values represent quantities. This can be further broken down into discrete and continuous.

Discrete NumericVariables which usually arise by countingExamples

Number of cars in a parking lotNumber of courses taken in a semester

Continuous NumericVariables which arise by measuringExamples

Time spent running a marathonWeight

Categorical VariablesCategorical Variables

Variables that are not numeric. These can be further classified into ranked and unranked.

Ranked Categorical VariablesIf the possible values of a categorical value

follow a “natural” orderingExample

Olympic MedalUnranked Categorical Variables

Anything that is not rankedExample

Blood Type

Two Major Branches of Statistics

Descriptive Statistics Inferential Statistics

Use graphical displays and numeric summarizations to represent data.

First half of this semester

Use analytic methods and theory of probability to draw conclusions or make decisions.

Second half of this semester

Section 1.2

Displaying Small Sets of Numbers

Displaying Small Sets of DataWe will be looking at three different ways

to display small sets of data.Dot PlotsStem and Leaf DisplayHistograms

Dot PlotEach Dot

represents one data value

Example – Height of Students62, 71, 65, 68, 64, 72, 66, 68, 70, 67, 67, 68, 64, 65, 68

Stem and Leaf DisplayRepresents the data using the actual digits

that make up the data.Leading digit becomes the stemTrailing digit becomes the leaf

Stem and Leaf ExamplesExample – Exam

Grades76, 74, 82, 96, 66, 76, 93, 86, 84, 62, 82, 75, 58, 71, 73, 79, 65, 80

Example - Decimals1.3, 2.4, 1.7, 3.2, 5.6

OutliersOutlier

An observation whose value is unusual or extreme.

Example15, 21, 13, 18, 23, 19, 26, 16, 71, 22, 14, 21

71 is an outlier

Section 1.3

Graphing Categorical Data

Grouped Frequency TableFrequencies are

tabulated for each value of the categorical variable.

Example – STAT 101 Grade Distribution (2000)

Bar GraphRepresents categorical

data by showing the amount of data that belong to each category as proportionally sized rectangles.

Example: US Coal Production (in millions)

WV – 172.0PA – 189.2KY – 154.8WY – 233.6Other – 120.4

Pie Chart (Circle Graph)A pie chart shows the

amount of data that belongs to each category as a proportional part of a circle.

Example: US Coal Production (in millions)

WV – 172.0PA – 189.2KY – 154.8WY – 233.6Other – 120.4

Section 1.4

Frequency Histograms

Grouped Frequency DistributionsGrouped Frequency Distributions

A list (or table) which pairs ranges for values of a variable with their frequencies (counts). Each range is called a class.

Rules for constructing a grouped frequency distribution1.Each class should be the same width.2.Classes should not overlap.3.Each observation falls into one and only one

class.4.Use between 3 and 15 classes.

Frequency HistogramFrequency Histogram

A graphical representation of a frequency distribution

Example – Math3 Exam Scores

Shape of a DistributionSymmetric Shaped

Mound Shaped

U-Shaped

Uniform

Shape of a DistributionAsymmetric

Shaped

Skewed Right (Positive)

Skewed Left (Negative)

Section 1.5

Density Histograms

Density HistogramRelative Frequency Histogram

A histogram in which the vertical axis represents percentages or proportions, rather than counts.

Example – Math3 Exam Scores

Constructing Density Histograms1. For each class, compute the percent of

observations in each class.2. Divide the percent associated with each

class by the width of that class. This yields the percent of observations associated with each unit of the measurement scale.

3. Draw the histogram using the values computed in step 2.

More on Density HistogramsIf the bars of a histogram are arranged

from largest to smallest, it is called a Pareto Chart.

A density histogram is a histogram whose vertical axis is scaled so that the sum of the areas of it’s rectangles is 1 square unit.

If the sample size, n, is very large, a density histogram can be used to estimate the distribution of the population from which the data was obtained.

Section 1.6

Centers of Data

AveragesAverage

An average is a single value which represents all of the data.

Types of AveragesSample MeanSample MedianSample Mode

Sample Mean (x)Sample MeanFormula

Examplex: 14, 23, 8, 19, 41Sample Mean = (14 + 23 + 8 + 19 +

41)/5= 105/5= 21

More on Sample MeansIf we acquire data from all members of a

population, we can compute the population mean ( )

Because of this, we can use the sample mean to estimate the population mean

Sample Median ( )Sample Median

The middle value of the observed data values ranked from lowest to highest

If the sample contains n observations, the median is the ½(n+1) ranked value

Example:x: 14, 23, 8, 19, 41Ranked: 8, 14, 19, 23, 41 n = 5Position of the median: ½(5+1) observation =

3rd Median = 19

Sample Median Even ExampleIn the previous example, n was oddExample with an even n:x: 43, 26, 37, 19, 52, 80Ranked: 19, 26, 37, 43, 52, 80 n=6Position of the Median: ½ (6+1) observation

=3½ positionMedian is the mean of the 3rd and 4th ranked

observationMedian = (37+43)/2 = 40

Sample ModeSample Mode

The observed value which occurs with the greatest frequency

Example:x: 4, 3, 1, 4, 0, 3, 3, 1, 4, 0, 1, 1, 2, 2Mode = 1

More Mode ExamplesA set of x might

have no mode (every value occurs once)Mode = NoneNOT Mode = 0

A set of x might have more than one modeMode = 5 and 6

Section 1.7

Mean vs. Median

Best Measure of the Center

Mean Median

Use for approximately symmetric distributions

Greatly influenced by outliers

Use for significantly skewed distributions

Not greatly influenced by outliers

How to Handle OutliersIf the outlier is a “mistaken” data value,

eliminate it from the analysis

If the outlier is an actual data value, do not eliminate it

End of Class

STAT 211 – 019 Dan Piett West Virginia University Lecture 1.

Documents

Transcript of STAT 211 – 019 Dan Piett West Virginia University Lecture 1.