Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.
Uploaded by brianne-watts
Transcript of Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.
![Page 1: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/1.jpg)
Edpsy 511
Exploratory Data Analysis
Homework 1: Due 9/20
![Page 2: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/2.jpg)
Landmarks in the data
► Quartiles: We’re often interested in the 25th, 50th and 75th percentiles.
► 39, 38, 38, 36, 36, 31, 29, 29, 28, 19
Steps
► First, order the scores from least to greatest.
► Second, add 1 to the sample size.
  Why?
► Third, multiply (sample size + 1) by the percentile to find the location.
  Q1 = (10 + 1) * .25
  Q2 = (10 + 1) * .50
  Q3 = (10 + 1) * .75
► If the value obtained is a fraction, take the average of the two adjacent X values.
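The steps above can be sketched in Python. This is a minimal illustration of the (n + 1) location rule as the slide states it; the function name `quartile_location_method` is made up for this example, and statistical packages often interpolate differently, so their quartiles may not match these.

```python
import math

def quartile_location_method(scores, p):
    """Percentile by the (n + 1) * p location rule from the slide."""
    ordered = sorted(scores)              # step 1: least to greatest
    n = len(ordered)
    location = (n + 1) * p                # steps 2 and 3: 1-based position
    lower = math.floor(location)
    if lower < 1 or lower > n:
        raise ValueError("location falls outside the data")
    if location == lower:                 # whole number: take that score
        return float(ordered[lower - 1])
    if lower == n:
        raise ValueError("location falls outside the data")
    # fraction: average the two adjacent scores
    return (ordered[lower - 1] + ordered[lower]) / 2

scores = [39, 38, 38, 36, 36, 31, 29, 29, 28, 19]
q1, q2, q3 = (quartile_location_method(scores, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3)  # 28.5 33.5 38.0
```

The locations are 2.75, 5.5 and 8.25, so each quartile is the average of the two scores on either side of that position in the ordered list.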
![Page 3: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/3.jpg)
Box-and-Whiskers Plots (a.k.a., Boxplots)
![Page 4: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/4.jpg)
Shapes of Distributions
► Normal distribution
► Positive Skew
  Or right skewed
► Negative Skew
  Or left skewed
![Page 5: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/5.jpg)
How is this variable distributed?
[Histogram of "score": x-axis score (1 to 8), y-axis Frequency (0.0 to 3.0). Mean = 4.3, Std. Dev. = 1.494, N = 10]
![Page 6: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/6.jpg)
How is this variable distributed?
[Histogram of "right": x-axis right (0.00 to 7.00), y-axis Frequency (0.0 to 3.0). Mean = 2.80, Std. Dev. = 1.75119, N = 10]
![Page 7: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/7.jpg)
How is this variable distributed?
[Histogram of "left": x-axis left (2.00 to 8.00), y-axis Frequency (0.0 to 3.0). Mean = 5.40, Std. Dev. = 1.42984, N = 10]
![Page 8: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/8.jpg)
Descriptive Statistics
![Page 9: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/9.jpg)
Statistics vs. Parameters
► A parameter is a characteristic of a population.
  It is a numerical or graphic way to summarize data obtained from the population.
► A statistic is a characteristic of a sample.
  It is a numerical or graphic way to summarize data obtained from a sample.
![Page 10: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/10.jpg)
Types of Numerical Data
► There are two fundamental types of numerical data:
  1) Categorical data: obtained by determining the frequency of occurrences in each of several categories
  2) Quantitative data: obtained by determining placement on a scale that indicates amount or degree
![Page 11: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/11.jpg)
Techniques for Summarizing Quantitative Data
► Frequency Distributions
► Histograms
► Stem and Leaf Plots
► Distribution curves
► Averages
► Variability
![Page 12: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/12.jpg)
Summary Measures
► Central Tendency: Arithmetic Mean, Median, Mode
► Quartile
► Variation: Range, Variance, Standard Deviation
![Page 13: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/13.jpg)
Measures of Central Tendency
► Central Tendency: Average (Mean), Median, Mode

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad \mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
![Page 14: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/14.jpg)
Mean (Arithmetic Mean)
► Mean (arithmetic mean) of data values
  Sample mean ($n$ = sample size):

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

  Population mean ($N$ = population size):

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
![Page 15: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/15.jpg)
Mean
► The most common measure of central tendency
► Affected by extreme values (outliers)
[Figure: two number lines. Left, scale 0 to 10: Mean = 5. Right, scale 0 to 14, with an extreme value: Mean = 6]
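A small sketch of how one extreme value pulls the mean. The data points in the slide's figure are not recoverable from the transcript, so the two lists below are assumed values, chosen only because they reproduce the reported means of 5 and 6.

```python
# Hypothetical data: not taken from the slide, picked to match its means.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # same scores, largest one replaced by 14

mean_without = sum(without_outlier) / len(without_outlier)
mean_with = sum(with_outlier) / len(with_outlier)

print(mean_without)  # 5.0
print(mean_with)     # 6.0
```

Changing a single score moves the mean by a full point, which is the sensitivity the slide describes.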
![Page 16: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/16.jpg)
Mean of Grouped Frequency

| X | f | fX |
|---|---|----|
| 10 | 1 | |
| 9 | 3 | |
| 8 | 2 | |
| 7 | 4 | |
| 6 | 6 | |
| 5 | 5 | |
| Total | N = 21 | $\sum fX$ |

$$\bar{X} = \frac{\sum fX}{N}$$
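The empty fX column and the resulting mean can be worked out from the X and f columns. The following is a minimal sketch; the variable names are illustrative only.

```python
values = [10, 9, 8, 7, 6, 5]   # X column
freqs = [1, 3, 2, 4, 6, 5]     # f column

# fX: each score multiplied by how often it occurs
fx = [x * f for x, f in zip(values, freqs)]
n = sum(freqs)                 # N, the total number of scores
grouped_mean = sum(fx) / n     # sum of fX divided by N

print(fx)                      # [10, 27, 16, 28, 36, 25]
print(n)                       # 21
print(round(grouped_mean, 2))  # 6.76
```

This gives the same result as listing all 21 scores individually and averaging them.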
![Page 17: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/17.jpg)
Weighted Mean
A form of mean obtained from groups of data in which the different sizes of the groups are accounted for or weighted.

$$\bar{x}_w = \frac{\sum f(\bar{x})}{N_{\text{total}}}$$
![Page 18: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/18.jpg)
| Group | xbar | N | f(xbar) |
|-------|------|---|---------|
| 1 | 30 | 10 | |
| 2 | 25 | 15 | |
| 3 | 40 | 25 | |

$$\bar{x}_w = \frac{\sum f(\bar{x})}{N_{\text{total}}}$$
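A sketch of the weighted mean for the three groups in the table. It assumes that f(xbar) means each group mean multiplied by its group size N, which is the usual reading of this formula but is not spelled out on the slide.

```python
group_means = [30, 25, 40]   # xbar column
group_sizes = [10, 15, 25]   # N column

# Assumed meaning of f(xbar): group mean times group size
weighted_sums = [m * n for m, n in zip(group_means, group_sizes)]
n_total = sum(group_sizes)
weighted_mean = sum(weighted_sums) / n_total

# Unweighted average of the three group means, for comparison
simple_mean = sum(group_means) / len(group_means)

print(weighted_sums)          # [300, 375, 1000]
print(n_total)                # 50
print(weighted_mean)          # 33.5
print(round(simple_mean, 2))  # 31.67
```

The weighted mean sits closer to 40 than the simple average of the group means does, because the group with mean 40 is the largest.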
![Page 19: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/19.jpg)
Median
► Robust measure of central tendency
► Not affected by extreme values
► In an ordered array, the median is the “middle” number
  If n or N is odd, the median is the middle number
  If n or N is even, the median is the average of the two middle numbers
[Figure: two number lines. Left, scale 0 to 10: Median = 5. Right, scale 0 to 14, with an extreme value: Median = 5]
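The odd and even rules can be checked with Python's standard `statistics.median`. The data below are assumed values (the figure's points are not in the transcript), chosen so that both medians come out as 5, as on the slide.

```python
from statistics import median

# Hypothetical data: not taken from the slide.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # largest score replaced by an extreme value
even_count = [1, 3, 5, 7]         # even n: average of the two middle scores

print(median(without_outlier))  # 5
print(median(with_outlier))     # 5
print(median(even_count))       # 4.0
```

Replacing 9 with 14 leaves the middle score where it was, so the median does not move.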
![Page 20: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/20.jpg)
Mode
► A measure of central tendency
► Value that occurs most often
► Not affected by extreme values
► Used for either numerical or categorical data
► There may be no mode
► There may be several modes
[Figure: two number lines. Top, scale 0 to 14: Mode = 9. Bottom, scale 0 to 6: No Mode]
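A sketch of finding modes, including the "no mode" and "several modes" cases. It assumes the convention that a data set has no mode when every distinct value occurs equally often; the helper name `modes` and the sample lists are illustrative, not from the slide.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: all values tied (and more than one value) means no mode
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 3, 5, 9, 9, 9, 12]))   # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))    # []
print(modes([1, 1, 2, 2, 3]))          # [1, 2]
print(modes(["a", "b", "b"]))          # ['b']
```

The last call shows that the same count-based approach works for categorical data.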
![Page 21: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/21.jpg)
The Normal Curve
![Page 22: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/22.jpg)
Different Distributions Compared
![Page 23: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/23.jpg)
Variability
► Refers to the extent to which the scores on a quantitative variable in a distribution are spread out.
► The range represents the difference between the highest and lowest scores in a distribution.
► A five number summary reports the lowest score, the first quartile, the median, the third quartile, and the highest score.
  Five number summaries are often portrayed graphically by the use of box plots.
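As an illustration, the range and five number summary can be computed for the ten scores from the "Landmarks in the data" slide. This sketch assumes the (n + 1) location rule from that slide for the quartiles; the helper `value_at` is a made-up name and does no bounds checking beyond what these data need.

```python
import math

def value_at(ordered, p):
    """Score at location (n + 1) * p; assumes the location lies inside the data."""
    location = (len(ordered) + 1) * p
    lower = math.floor(location)
    if location == lower:
        return float(ordered[lower - 1])
    return (ordered[lower - 1] + ordered[lower]) / 2

scores = sorted([39, 38, 38, 36, 36, 31, 29, 29, 28, 19])

five_number = (
    scores[0],                 # lowest score
    value_at(scores, 0.25),    # first quartile
    value_at(scores, 0.50),    # median
    value_at(scores, 0.75),    # third quartile
    scores[-1],                # highest score
)
data_range = scores[-1] - scores[0]

print(five_number)  # (19, 28.5, 33.5, 38.0, 39)
print(data_range)   # 20
```

These five values are what a box plot of the same scores would draw as the whisker ends, box edges and center line, under this quartile rule.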
![Page 24: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/24.jpg)
Variance
► The variance, $s^2$, represents the amount of variability of the data relative to their mean
► As shown below, the variance is the “average” of the squared deviations of the observations about their mean

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

► The variance, $s^2$, is the sample variance, and is used to estimate the actual population variance, $\sigma^2$

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
![Page 25: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/25.jpg)
Standard Deviation
► Considered the most useful index of variability.
► It is a single number that represents the spread of a distribution.
► If a distribution is normal, then the mean plus or minus 3 SD will encompass about 99.7% of all scores in the distribution.
![Page 26: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/26.jpg)
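Calculation of the Variance and Standard Deviation of a Distribution

| Raw Score ($X$) | Mean ($\bar{X}$) | $X - \bar{X}$ | $(X - \bar{X})^2$ |
|-----------|------|-------|----------|
| 85 | 54 | 31 | 961 |
| 80 | 54 | 26 | 676 |
| 70 | 54 | 16 | 256 |
| 60 | 54 | 6 | 36 |
| 55 | 54 | 1 | 1 |
| 50 | 54 | -4 | 16 |
| 45 | 54 | -9 | 81 |
| 40 | 54 | -14 | 196 |
| 30 | 54 | -24 | 576 |
| 25 | 54 | -29 | 841 |

$$\text{Variance } (SD^2) = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{3640}{9} = 404.44$$

$$\text{Standard deviation } (SD) = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$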
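The table's arithmetic can be reproduced step by step. This is a minimal sketch using only the raw scores shown above; the variable names are illustrative.

```python
import math

raw_scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]

n = len(raw_scores)
mean = sum(raw_scores) / n                      # 54.0
deviations = [x - mean for x in raw_scores]     # X minus the mean
squared = [d ** 2 for d in deviations]          # squared deviations
sum_of_squares = sum(squared)                   # 3640.0

variance = sum_of_squares / (n - 1)             # divide by N - 1, as on the slide
sd = math.sqrt(variance)                        # standard deviation

print(mean)                # 54.0
print(sum_of_squares)      # 3640.0
print(round(variance, 2))  # 404.44
print(round(sd, 2))        # 20.11
```

The deviations sum to zero, which is why they are squared before averaging.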
![Page 27: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/27.jpg)
Comparing Standard Deviations
[Figure: three dot plots on a common scale from 11 to 21.
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = .9258
Data C: Mean = 15.5, S = 4.57]
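A sketch showing how three data sets can share a mean yet differ in spread. The individual points in the slide's dot plots are not recoverable from the transcript, so the lists below are assumed values on the 11 to 21 scale, chosen because they reproduce the reported means and standard deviations.

```python
from statistics import mean, stdev

# Hypothetical data sets: not read from the slide, picked to match its statistics.
data_a = [11, 12, 13, 16, 16, 17, 18, 21]
data_b = [14, 15, 15, 15, 16, 16, 16, 17]
data_c = [11, 11, 11, 12, 19, 20, 20, 20]

for name, data in (("A", data_a), ("B", data_b), ("C", data_c)):
    # stdev() is the sample standard deviation (divides by n - 1)
    print(name, mean(data), round(stdev(data), 4))
```

Data B clusters tightly around 15.5, Data A is more spread out, and Data C piles up at both ends of the scale, so its standard deviation is the largest.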
![Page 28: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/28.jpg)
Facts about the Normal Distribution
► 50% of all the observations fall on each side of the mean.
► 68% of scores fall within 1 SD of the mean in a normal distribution.
► 27% of the observations fall between 1 and 2 SD from the mean.
► 99.7% of all scores fall within 3 SD of the mean.
► This is often referred to as the 68-95-99.7 rule
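These percentages can be checked numerically. For a normal distribution, the proportion of scores within k SD of the mean equals erf(k / √2); the sketch below uses `math.erf` from the standard library, and the helper name `within_sd` is made up for the example.

```python
import math

def within_sd(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

one, two, three = within_sd(1), within_sd(2), within_sd(3)

print(round(one, 4))        # 0.6827
print(round(two, 4))        # 0.9545
print(round(three, 4))      # 0.9973
print(round(two - one, 4))  # 0.2718, the share between 1 and 2 SD
```

Rounded, these are the 68, 95 and 99.7 of the rule, and the 27% between 1 and 2 SD is the difference between the first two.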
![Page 29: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/29.jpg)
Fifty Percent of All Scores in a Normal Curve Fall on Each Side of the Mean
![Page 30: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/30.jpg)
Probabilities Under the Normal Curve
![Page 31: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/31.jpg)
Standard Scores
► Standard scores use a common scale to indicate how an individual compares to other individuals in a group.
► The simplest form of a standard score is a Z score.
► A Z score expresses how far a raw score is from the mean in standard deviation units.
► Standard scores provide a better basis for comparing performance on different measures than do raw scores.
► A probability is a percent stated in decimal form and refers to the likelihood of an event occurring.
► T scores are z scores expressed in a different form (z score x 10 + 50).
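A short sketch of both conversions. The function names are illustrative; the example reuses the mean of 54 from the earlier variance calculation and an SD of 20.11, which is the square root of the 404.44 variance reported there.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

def t_score(z):
    """T score as defined on the slide: z score x 10 + 50."""
    return z * 10 + 50

mean, sd = 54, 20.11   # mean from the earlier table; SD = sqrt(404.44), rounded

z_high = z_score(85, mean, sd)   # highest raw score in that table
z_mean = z_score(54, mean, sd)   # a score exactly at the mean

print(round(z_high, 2), round(t_score(z_high), 1))  # 1.54 65.4
print(z_mean, t_score(z_mean))                      # 0.0 50.0
```

A raw score at the mean has z = 0 and T = 50, and each SD above or below the mean shifts the T score by 10.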
![Page 32: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/32.jpg)
Probability Areas Between the Mean and Different Z Scores
![Page 33: Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/20.](https://reader036.fdocuments.net/reader036/viewer/2022070409/56649e7b5503460f94b7b8d4/html5/thumbnails/33.jpg)
Examples of Standard Scores