basic statistic chap03
-
Upload
irving-adrian-santoso -
Category
Documents
-
view
63 -
download
2
Transcript of basic statistic chap03
-
Chapter 3
Numerical Descriptive MeasuresStatistics for ManagersUsing Microsoft Excel 4th Edition
-
Chapter GoalsAfter completing this chapter, you should be able to: Compute and interpret the mean, median, mode, geometric mean, and quartiles for a set of dataFind the range, variance, standard deviation, and coefficient of variation and know what these values meanConstruct and interpret a box and whiskers plot Compute and explain the correlation coefficientUse numerical measures along with graphs, charts, and tables to describe data
-
Chapter TopicsMeasures of central tendency, variation, and shapeMean, median, mode, geometric meanQuartilesRange, interquartile range, variance and standard deviation, coefficient of variationSymmetric and skewed distributionsPopulation summary measuresMean, variance, and standard deviationThe empirical ruleFive number summary and box-and-whisker plotsCoefficient of correlationEthical considerations in numerical descriptive measures
-
Summary MeasuresArithmetic MeanMedianModeDescribing Data NumericallyVarianceStandard DeviationCoefficient of VariationRangeInterquartile RangeGeometric MeanSkewnessCentral TendencyVariationShapeQuartiles
-
Measures of Central TendencyCentral TendencyArithmetic MeanMedianModeGeometric MeanOverviewMidpoint of ranked valuesMost frequently observed value
-
Arithmetic MeanThe arithmetic mean (mean) is the most common measure of central tendency
For a sample of size n:Sample sizeObserved values
-
Arithmetic MeanMean = sum of values divided by the number of valuesAffected by extreme values (outliers)
(continued)0 1 2 3 4 5 6 7 8 9 10Mean = 3 0 1 2 3 4 5 6 7 8 9 10Mean = 4
-
MedianNot affected by extreme values
In an ordered array, the median is the middle number (50% above, 50% below)
0 1 2 3 4 5 6 7 8 9 10Median = 3 0 1 2 3 4 5 6 7 8 9 10Median = 3
-
Finding the MedianThe location of the median:
If the number of values is odd, the median is the middle numberIf the number of values is even, the median is the average of the two middle numbers
Note that is not the value of the median, only the position of the median in the ranked data
-
ModeA measure of central tendencyValue that occurs most oftenNot affected by extreme valuesMainly used for grouped numerical data or categorical dataThere may may be no modeThere may be several modes
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Mode = 90 1 2 3 4 5 6No Mode
-
Review ExampleFive houses on a hill by the beachHouse Prices: $2,000,000 500,000 300,000 100,000 100,000
-
Review Example:Summary StatisticsMean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000House Prices: $2,000,000 500,000 300,000 100,000 100,000Sum 3,000,000
-
Which measure of location is the best?Mean is generally used, unless extreme values (outliers) existThen median is often used, since the median is not sensitive to extreme values.Example: Median home prices may be reported for a region less sensitive to outliers
-
Geometric MeanGeometric meanUsed to measure the rate of change of a variable over time
Geometric mean rate of returnMeasures the status of an investment over time
Where Ri is the rate of return in time period i
-
Geometric Mean ExampleAn investment of $100,000 declined to $50,000 at the end of year one and rebounded to $100,000 at end of year two:50% decrease 100% increaseThe overall two-year return is zero, since it started and ended at the same level.
-
Geometric Mean ExampleUse the 1-year returns to compute the arithmetic mean and the geometric mean:Arithmetic mean rate of return:Geometric mean rate of return:Misleading resultMore accurate result(continued)
-
Geometric Mean ExampleAn investment of $100,000 declined to $50,000 at the end of year one and rebounded to $100,000 at end of year two:Returns as percents: 50% and 100% are converted to decimals -.5 and 1.00 Add 1 to each decimal yields .5 and 2 Find the geometric mean using the geomean functionSubtract 1 from the answer to get a rate of return of 0
-
QuartilesQuartiles split the ranked data into 4 segments with an equal number of values per segment25%25%25%25%The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are largerQ2 is the same as the median (50% are smaller, 50% are larger)Only 25% of the observations are greater than the third quartile
Q1Q2Q3
-
Quartile FormulasFind a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
-
Quartiles (n = 9) Q1 = is in the (9+1)/4 = 2.5 position of the ranked dataso use the value half way between the 2nd and 3rd values, Q1 = 12.5Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 Example: Find the first quartile Q1 and Q3 are measures of noncentral location Q2 = median, a measure of central tendency
-
Measures of VariationSame center, different variationVariationVarianceStandard DeviationCoefficient of VariationRangeInterquartile RangeMeasures of variation give information on the spread or variability of the data values.
-
RangeSimplest measure of variationDifference between the largest and the smallest observations:
Range = Xlargest Xsmallest
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Range = 14 - 1 = 13Example:
-
Disadvantages of the RangeIgnores the way in which data are distributed
Sensitive to outliers
7 8 9 10 11 12Range = 12 - 7 = 57 8 9 10 11 12Range = 12 - 7 = 51,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,51,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120Range = 5 - 1 = 4Range = 120 - 1 = 119
-
Interquartile Range
You can eliminate some outlier problems by using the interquartile range Difference between the first and third quartiles
Interquartile range = 3rd quartile 1st quartile = Q3 Q1
-
Interquartile RangeMedian(Q2)XmaximumXminimumQ1Q3Example:25% 25% 25% 25%12 30 45 57 70Interquartile range = 57 30 = 27
-
VarianceAverage of squared deviations of each value from the mean
Sample variance:Where = arithmetic meann = sample sizeXi = ith value of the variable X
-
Standard DeviationMost commonly used measure of variationShows variation about the meanHas the same units as the original data
Sample standard deviation:
-
Calculation Example:Sample Standard DeviationSample Data (Xi) : 10 12 14 15 17 18 18 24 n = 8 Mean = X = 16
-
Measuring variationSmall standard deviation
Large standard deviation
-
Comparing Standard DeviationsMean = 15.5 S = 3.338 11 12 13 14 15 16 17 18 19 20 2111 12 13 14 15 16 17 18 19 20 21Data BData AMean = 15.5 S = .925811 12 13 14 15 16 17 18 19 20 21Mean = 15.5 S = 4.57Data C
-
Coefficient of VariationMeasures relative variationAlways a percentage (%)Shows variation relative to meanIs used to compare two or more sets of data measured in different units
-
Comparing Coefficients of VariationStock A:Average price last year = $50Standard deviation = $5
Stock B:Average price last year = $100Standard deviation = $5Both stocks have the same standard deviation, but stock B is less variable relative to its price
-
Shape of a DistributionDescribes how data is distributedShape - Symmetric or skewedMean = Median Mean < Median Median < MeanRight-SkewedLeft-SkewedSymmetric
-
Microsoft ExcelDescriptive Statistics can be obtained from Microsoft Excel
Use menu choice: tools / data analysis / descriptive statistics
Enter details in dialog box
-
Using ExcelUse menu choice: tools / data analysis / descriptive statistics
-
Using ExcelEnter dialog box details
Check box for summary statistics
Click OK(continued)
-
Excel outputMicrosoft Excel descriptive statistics output, using the house price data:House Prices: $2,000,000 500,000 300,000 100,000 100,000
-
Population Summary MeasuresThe population mean is the sum of the values in the population divided by the population size, N = population meanN = population sizeXi = ith value of the variable XWhere
-
Population VarianceAverage of squared deviations of values from the mean
Population variance:Where = population meanN = population sizeXi = ith value of the variable X
-
Population Standard DeviationMost commonly used measure of variationShows variation about the meanHas the same units as the original data
Population standard deviation:
-
If the data distribution is bell-shaped, then the interval:contains about 68% of the values in the population or the sampleThe Empirical Rule68%
-
contains about 95% of the values in the population or the samplecontains about 99.7% of the values in the population or the sampleThe Empirical Rule99.7%95%
-
Exploratory Data AnalysisBox-and-Whisker Plot: A Graphical display of data using 5-number summary:Minimum -- Q1 -- Median -- Q3 -- MaximumExample:25% 25% 25% 25%
-
Shape of Box and Whisker PlotsThe Box and central line are centered between the endpoints if data is symmetric around the median
A Box and Whisker plot can be shown in either vertical or horizontal formatMin Q1 Median Q3 Max
-
Distribution Shape and Box and Whisker PlotRight-SkewedLeft-SkewedSymmetricQ1Q2Q3Q1Q2Q3Q1Q2Q3
-
Box-and-Whisker Plot ExampleBelow is a Box-and-Whisker plot for the following data: 0 2 2 2 3 3 4 5 5 10 27
This data is right skewed, as the plot depicts0 2 3 5 27Min Q1 Q2 Q3 Max
-
Coefficient of CorrelationMeasures the relative strength of the linear relationship between two variables
Sample coefficient of correlation:
-
Features of Correlation Coefficient, rUnit freeRanges between 1 and 1The closer to 1, the stronger the negative linear relationshipThe closer to 1, the stronger the positive linear relationshipThe closer to 0, the weaker any linear relationship
-
Scatter Plots of Data with Various Correlation CoefficientsYXYXYXYXYXr = -1r = -.6r = 0r = +.3r = +1YXr = 0
-
Using Excel to Find the Correlation CoefficientSelect Tools/Data AnalysisChoose Correlation from the selection menuClick OK . . .
-
Using Excel to Find the Correlation CoefficientInput data range and select appropriate optionsClick OK to get output(continued)
-
Interpreting the Resultr = .733
There is a relatively strong positive linear relationship between test score #1 and test score #2
Chart1
82
88
91
90
92
85
89
81
96
77
Test #2 Score
Test #1 Score
Test #2 Score
Scatter Plot of Test Scores
Sheet4
Test #1 ScoreTest #2 Score
Test #1 Score1
Test #2 Score0.73324370471
Sheet1
Test #1 ScoreTest #2 Score
7882
9288
8691
8390
9592
8585
9189
7681
8896
7977
Sheet1
Test #2 Score
Test #1 Score
Test #2 Score
Scatter Plot of Test Scores
Sheet2
Sheet3
-
Ethical ConsiderationsNumerical descriptive measures:
Should document both good and bad resultsShould be presented in a fair, objective and neutral mannerShould not use inappropriate summary measures to distort facts
-
Chapter SummaryDescribed measures of central tendencyMean, median, mode, geometric meanDiscussed quartilesDescribed measures of variationRange, interquartile range, variance and standard deviation, coefficient of variationIllustrated shape of distributionSymmetric, skewed, box-and-whisker plotsDiscussed correlation coefficientAddressed pitfalls in numerical descriptive measures and ethical considerations