Copyright © 2011 Pearson Education, Inc. Statistical Reasoning.
Statistical Reasoning III
8/13/2019 Statistical Reasoning III
1/30
The Basics
Statistical Analysis of Difference
Introduction: When are statistical tests used?
When researchers want to determine whether a statistically significant difference exists between two or more sets of numbers.
The decision to reject or accept the null hypothesis is based on whether or not the observed value falls in the critical region.
What we will try to learn over the next few classes:
- Data handling
- Use of specific statistical tests
Distributions for Analysis of Difference
What types of distribution have you learned so far?
- Standard normal distribution
- z-scores
- When we use these distributions we assume that the population standard deviation is known.
- Because the population standard deviation is usually not known, we cannot ordinarily use the standard normal distribution and its z-scores to draw statistical conclusions from samples.
Distributions for Analysis of Difference
Then what should we do?
Researchers conduct most statistical tests using distributions that resemble the normal distribution but are altered somewhat to account for the errors that are made when population parameters are not known.
The three most common distributions used are the t, F, and chi-square distributions.
How do we use these distributions?
Just as we determine the probability of certain z-scores based on the standard normal distribution, we can determine the probability of obtaining certain t, F, and chi-square statistics based on their respective distributions.
The decision to reject or accept the null hypothesis is based on whether or not the observed value falls in the critical region.
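The critical-region decision described above can be sketched in Python. This is a minimal illustration, not from the slides: the degrees of freedom, alpha level, and observed statistic are all hypothetical, and scipy's t distribution stands in for a printed table of critical values.

```python
from scipy import stats

# Two-tailed t test at alpha = 0.05 with 10 degrees of freedom:
# the critical region is everything beyond +/- the critical t value.
alpha = 0.05
df = 10
t_critical = stats.t.ppf(1 - alpha / 2, df)  # upper critical value

observed_t = 2.5  # hypothetical observed statistic
reject_null = abs(observed_t) > t_critical

print(round(t_critical, 3))  # 2.228
print(reject_null)           # True: 2.5 falls in the critical region
```

The same pattern applies to F and chi-square statistics: compare the observed value with the critical value from the appropriate distribution (using the upper tail only, since those statistics cannot be negative).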
What are the shapes of these distributions?
What things influence the shapes of these distributions?
Degrees of freedom.
The degrees of freedom are calculated in different ways for the different distributions, but in general they are related to two things:
1. The number of participants in the study
2. The number of levels of the independent variable
t-distributions
The picture shows the shape of the t-distribution in comparison to the standard normal (or z) distribution. Notice that the t-distribution becomes flatter with a smaller value of n.
t-distribution
Some characteristics of the t-distribution, also known as Student's t-distribution:
1. The mean of the distribution is equal to 0.
2. The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
3. The standard deviation is always greater than 1.
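The three characteristics above can be verified numerically. A small sketch (not from the slides) using scipy's t distribution, checking a few hypothetical values of v:

```python
from scipy import stats

# For v degrees of freedom (v > 2): mean = 0, variance = v / (v - 2),
# so the standard deviation is always greater than 1.
for v in (3, 5, 30):
    dist = stats.t(df=v)
    assert abs(dist.mean()) < 1e-12                # characteristic 1
    assert abs(dist.var() - v / (v - 2)) < 1e-9    # characteristic 2
    assert dist.std() > 1                          # characteristic 3
print("all three characteristics hold")
```

Note that as v grows, v / (v - 2) approaches 1, which is why the t-distribution approaches the standard normal for large samples.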
F-distribution
The shape of the F-distribution depends upon the degrees of freedom of both the numerator and the denominator. In the picture, red has df1 = 2 and df2 = 3, blue has df1 = 4 and df2 = 30, and black has df1 = 20 and df2 = 20.
F-distribution
Characteristics of the F-distribution:
1. It is not symmetric. The F-distribution is skewed right; that is, it is positively skewed.
2. The shape of the F-distribution depends upon the degrees of freedom in the numerator and denominator.
3. The total area under the curve is 1.
4. The values of F are always greater than or equal to zero; that is, the F-distribution cannot be negative.
The F-distribution is used to test whether two population variances are the same.
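A rough sketch of the two-variance test mentioned above (illustrative only; the samples are randomly generated, not from the slides). The F statistic is the ratio of the two sample variances, referred to an F distribution with numerator and denominator degrees of freedom:

```python
import numpy as np
from scipy import stats

# Hypothetical samples from two groups with equal population variance.
rng = np.random.default_rng(0)
a = rng.normal(loc=50, scale=5, size=25)   # numerator df = 24
b = rng.normal(loc=50, scale=5, size=31)   # denominator df = 30

# F statistic: ratio of the sample variances.
f_stat = a.var(ddof=1) / b.var(ddof=1)
# Two-tailed p-value from the F distribution.
p = 2 * min(stats.f.cdf(f_stat, 24, 30), stats.f.sf(f_stat, 24, 30))

assert f_stat >= 0  # characteristic 4: F can never be negative
print(round(f_stat, 3), round(p, 3))
```

In practice a robust alternative such as Levene's test is often preferred for comparing variances, since the raw F ratio is sensitive to non-normality.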
Chi-square distribution
Notice that in the picture, as df gets large, the curve becomes less skewed and more normal.
Properties of the chi-square distribution
- Chi-square is non-negative: it is the ratio of two non-negative values, therefore it must be non-negative itself.
- Chi-square is non-symmetric (asymmetric).
- There are many different chi-square distributions, one for each degree of freedom.
- The degrees of freedom when working with a single population variance is n - 1.
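A sketch of the single-variance case mentioned in the last bullet (hypothetical numbers, not from the slides): the test statistic is (n - 1) s² / σ₀², referred to a chi-square distribution with n - 1 degrees of freedom.

```python
from scipy import stats

# Chi-square test for a single population variance, df = n - 1.
n = 15
sample_var = 12.0   # hypothetical sample variance s^2
sigma0_sq = 9.0     # hypothesised population variance sigma_0^2

chi_sq = (n - 1) * sample_var / sigma0_sq
p_upper = stats.chi2.sf(chi_sq, df=n - 1)  # upper-tail p-value

assert chi_sq >= 0  # the chi-square statistic can never be negative
print(round(chi_sq, 3))  # 18.667
```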
Let us compare and review the three distributions.

t-distribution:
- A symmetric distribution.
- The shape of the t-distribution varies with its degrees of freedom, which are based on sample size. With a large sample size the t-distribution becomes more like the z-distribution, because df and sample size are large.
- Consists of both positive and negative values.

F-distribution:
- Non-symmetric. Why asymmetric? Because it is obtained from squared scores of the t-statistic.
- The shape of the F-distribution depends on two degrees of freedom, called the numerator and denominator df. The first is associated with the number of groups being compared; the second with sample size.
- Consists of only positive values, and is therefore positively skewed.

Chi-square distribution:
- Non-symmetric; as the df increases it becomes more symmetric. Obtained from the distribution of squared z-scores.
- The shape of the distribution varies with its degrees of freedom.
- The value of chi-square is never negative; therefore it is positively skewed.
Types of Tests

Parametric:
- Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters.
- Major classes of parametric tests are the t-test, analysis of variance, and the Pearson product-moment correlation.
- Based on specific assumptions.
- More powerful and preferred; however, they cannot always be used, because the assumptions on which they rest are not always met.

Non-parametric:
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Examples are chi-square and Spearman rank-order rho.
- Used when those assumptions are not met.
- Considered less powerful; however, they are often used, since in actual research the parametric assumptions frequently do not hold.
Assumptions of Tests of Difference
The assumptions for parametric tests are:
- Random selection from a normally distributed population
- Homogeneity of variance
- Level of measurement (controversial)
Assumption 1: Random selection from a normally distributed population
- Participants are randomly selected from normally distributed populations.
- Even if data sets are only relatively normally distributed, this is still accepted.
- The extent to which a data set is normally distributed can be tested. (We will practice this today in SPSS.)
- When a data set is not normally distributed, one strategy is to transform (convert) the data and then use parametric tests on the transformed data.
- Otherwise, non-parametric tests can be used.
Assumption 2: Homogeneity of Variance
- The population variances of the groups being tested are equal, or homogeneous.
- This can also be tested statistically. (We will practice how to compute it in the next class.)
What to do after checking homogeneity of variance?
- If the variances of the groups are found to differ significantly, non-parametric tests must be used.
- If the sample sizes of the groups being compared are the same, differences in the variances of the groups become less of a concern.
- Researchers often design their studies to have equal sample sizes in the groups.
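The slides say homogeneity of variance "can be tested statistically" without naming a test; one common choice (an assumption here, not stated in the slides) is Levene's test, where a large p-value means there is no evidence the group variances differ. The group scores below are hypothetical.

```python
from scipy import stats

# Hypothetical scores for two groups (illustrative data, not from the slides).
group1 = [23, 25, 28, 30, 32, 34, 35, 36, 40, 41]
group2 = [22, 24, 26, 29, 31, 33, 35, 38, 39, 45]

# Levene's test: H0 is that the group variances are equal.
stat, p = stats.levene(group1, group2)
variances_homogeneous = p > 0.05  # fail to reject H0 at the 0.05 level
print(variances_homogeneous)
```

SPSS reports the same Levene statistic automatically alongside its independent-samples t-test output.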
Assumption 3: Level of Measurement
Do you know what the levels of measurement are? In the previous slide, where we compared parametric and non-parametric tests, did you note which type of sample statistics we use for each?

Parametric:
- Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters.
- Interval and ratio data meet this need.
- The controversy is about the use of parametric tests with ordinal measurements, where those statistics may not remain valid.

Non-parametric:
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Nominal and ranked ordinal data meet this need.
- Interval and ratio data can be converted into ranks or grouped into categories to meet this need.
Assumption 3: Level of Measurement
Note: Regardless of the origin of the numbers, parametric tests can be conducted as long as the data themselves meet the assumptions of parametric tests.
However, the researcher must interpret parametric statistical conclusions based on ordinal data in light of their clinical and practical implications.
This can be illustrated with an example.
Assumption 3: Level of Measurement
Example from rehabilitation research
Variable: the amount of assistance a patient needs to accomplish various functional tasks. The categories are:

Code  Category       Mean score of group
1     Maximal        1.0
2     Moderate       2.0
3     Minimal        3.0
4     Standby        4.0
5     No assistance  5.0

These group means have been found to be significantly different from one another.
If the researchers believe that the real interval between maximal and moderate assistance is greater than the interval between standby and no assistance, they might interpret the difference between 1.0 and 2.0 as more clinically important than the difference between standby and no assistance.
Checking Normality of Data
Learning by example: in this example the null hypothesis (H0) is that the data are normally distributed, and the alternative hypothesis (Ha) is that the data are not normally distributed.

Steps / Actions
Step 1: Select "Analyze -> Descriptive Statistics -> Explore".
Step 2: From the list on the left, move the variable "Age" to the "Dependent List". Click "Plots" on the right. A new window will appear. Check "None" for boxplot, uncheck everything for descriptives, and make sure the box "Normality plots with tests" is checked.
Step 3: The results now appear in the "Output" window.
Step 4: Interpret the result. Look at the third table, where two tests for normality are run. For a data set smaller than 2000 elements we use the Shapiro-Wilk test; otherwise, the Kolmogorov-Smirnov test is used. If the Sig. value of the Shapiro-Wilk test is greater than 0.05, the data are normal. If it is below 0.05, the data deviate significantly from a normal distribution.
Graphical Method
The normal quantile-quantile plot (Q-Q plot) is the most commonly used and effective diagnostic tool for checking normality of the data.
It is constructed by plotting the empirical quantiles of the data against the corresponding quantiles of the normal distribution.
If the empirical distribution of the data is approximately normal, the quantiles of the data will closely match the normal quantiles, and the points on the plot will fall near the line y = x.
Graphical Method
It is impossible to fit a perfectly straight line in a Q-Q plot of real data, because random fluctuations cause the points to drift away from the line and aberrant observations often contaminate the samples.
Only large or systematic departures from the line indicate non-normality of the data. The points will remain reasonably close to the line if there is just natural variability.
Therefore, the straightness of the normal Q-Q plot helps us judge whether the data have the same distribution shape as a normal distribution, while shifts and tilts away from the line y = x indicate differences in location and spread, respectively.
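The "straightness" idea above can be quantified: scipy's probplot computes the Q-Q coordinates together with a least-squares line, and the correlation r of that fit is near 1 when the points hug a straight line. A small sketch on randomly generated (hypothetical) data:

```python
import numpy as np
from scipy import stats

# A hypothetical sample drawn from a genuinely normal population.
rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=200)

# probplot returns the theoretical vs. ordered sample quantiles plus a
# least-squares fit; r close to 1 means the Q-Q points are nearly straight.
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(data, dist="norm")
print(r > 0.99)  # a near-straight Q-Q line for normal data
```

Note that slope and intercept correspond to the "tilt" (spread) and "shift" (location) mentioned above.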
Graphical Method (Q-Q plot interpretation points)
If the data are normally distributed, the data points will be close to the diagonal line. If the data points stray from the line in an obvious non-linear fashion, the data are not normally distributed.
If you are at all unsure of being able to correctly interpret the graph, rely on the numerical methods instead, because it can take a fair bit of experience to judge the normality of data correctly from plots.
Normality Check: Another Way
Histogram: when a histogram's shape approximates a bell curve, it suggests that the data may have come from a normal population.
Example from the Data Set
In both plots there is a single value that appears to be considerably different: an outlier. This happens to be observation number 5 in the data set.
If we readjust the outlier
Analysis of Skewness and Kurtosis
Since the skewness and (excess) kurtosis of the normal distribution are zero, values for these two parameters should be close to zero for data that follow a normal distribution.
A rough measure of the standard error of the skewness is √(6/n), where n is the sample size.
A rough measure of the standard error of the kurtosis is √(24/n), where n is the sample size.
If the absolute value of the skewness of the data is more than twice its standard error, this indicates that the data are not symmetric, and therefore not normal. Similarly, if the absolute value of the kurtosis of the data is more than twice its standard error, this is also an indication that the data are not normal.
Example
Suppose in a data set the skewness is 0.23 (absolute value 0.23) and the kurtosis is -1.53 (absolute value 1.53). The standard error for the skewness is 0.55 and the standard error for the kurtosis is 1.10.
Neither value is more than twice its standard error.
As in the previous slide: if the absolute value of the skewness or kurtosis is more than twice its standard error, this indicates that the data are not symmetric, and therefore not normal.
Here both statistics are within two standard errors, so there is no indication that the data deviate from normality.
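The worked example above can be reproduced with the √(6/n) and √(24/n) rules. The slides do not state the sample size, but the quoted standard errors (0.55 and 1.10) imply n ≈ 20, which is assumed below:

```python
import math

# Reproduce the slide's worked example: skewness 0.23, kurtosis -1.53.
# n = 20 is an assumption inferred from the quoted standard errors.
n = 20
skewness, kurtosis = 0.23, -1.53

se_skew = math.sqrt(6 / n)    # rough SE of skewness, ~0.55
se_kurt = math.sqrt(24 / n)   # rough SE of kurtosis, ~1.10

print(round(se_skew, 2), round(se_kurt, 2))  # 0.55 1.1
print(abs(skewness) < 2 * se_skew)           # True: within two SEs
print(abs(kurtosis) < 2 * se_kurt)           # True: within two SEs
```

Since both statistics fall within two standard errors, this check gives no evidence against normality, matching the slide's conclusion.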