Copyright © 2011 Pearson Education, Inc. Statistical Reasoning.
Statistical Reasoning III
8/13/2019 Statistical Reasoning III
1/30
The Basics
Statistical Analysis of Difference
Introduction: When are statistical tests used?
When researchers want to determine whether a statistically significant difference exists between two or more sets of numbers.
The decision to reject or accept the null hypothesis is based on whether or not the observed value falls in the critical region.
What we will try to learn over the next few classes:
- Data handling
- Use of specific statistical tests
Distributions for Analysis of Difference
What types of distribution have you learned so far?
- Standard normal distribution
- z-scores
- When we use these distributions we assume that the population standard deviation is known.
- Because the population standard deviation is usually not known, we cannot ordinarily use the standard normal distribution and its z-scores to draw statistical conclusions from samples.
Distributions for Analysis of Difference
Then what should we do?
Researchers conduct most statistical tests using distributions that resemble the normal distribution but are altered somewhat to account for the errors that are made when population parameters are not known.
The three most common distributions used are the t, F, and chi-square distributions.
How do we use these distributions?
Just as we determine the probability of certain z-scores based on the standard normal distribution, we can determine the probability of obtaining certain t, F, and chi-square statistics based on their respective distributions.
The decision to reject or accept the null hypothesis is based on whether or not the observed value falls in the critical region.
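The critical-region decision described above can be sketched in Python. This is a minimal illustration, not from the slides: the degrees of freedom, alpha level, and observed statistic are all hypothetical, and scipy's t distribution stands in for a printed table of critical values.

```python
from scipy import stats

# Two-tailed t test at alpha = 0.05 with 10 degrees of freedom:
# the critical region is everything beyond +/- the critical t value.
alpha = 0.05
df = 10
t_critical = stats.t.ppf(1 - alpha / 2, df)  # upper critical value

observed_t = 2.5  # hypothetical observed statistic
reject_null = abs(observed_t) > t_critical

print(round(t_critical, 3))  # 2.228
print(reject_null)           # True: 2.5 falls in the critical region
```

The same pattern applies to F and chi-square statistics: compare the observed value with the critical value from the appropriate distribution (using the upper tail only, since those statistics cannot be negative).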
What are the shapes of these distributions?
What things influence the shapes of these distributions?
Degrees of freedom.
The degrees of freedom are calculated in different ways for the different distributions, but in general they are related to two things:
1. The number of participants in the study
2. The number of levels of the independent variable
t-distributions
The picture shows the shape of the t-distribution in comparison to the standard normal (or z) distribution. Notice that the t-distribution becomes flatter with a smaller value of n.
t-distribution
Some characteristics of the t-distribution, also known as Student's t-distribution:
1. The mean of the distribution is equal to 0.
2. The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
3. The standard deviation is always greater than 1.
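The three characteristics above can be verified numerically. A small sketch (not from the slides) using scipy's t distribution, checking a few hypothetical values of v:

```python
from scipy import stats

# For v degrees of freedom (v > 2): mean = 0, variance = v / (v - 2),
# so the standard deviation is always greater than 1.
for v in (3, 5, 30):
    dist = stats.t(df=v)
    assert abs(dist.mean()) < 1e-12                # characteristic 1
    assert abs(dist.var() - v / (v - 2)) < 1e-9    # characteristic 2
    assert dist.std() > 1                          # characteristic 3
print("all three characteristics hold")
```

Note that as v grows, v / (v - 2) approaches 1, which is why the t-distribution approaches the standard normal for large samples.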
F-distribution
The shape of the F-distribution depends upon the degrees of freedom of both the numerator and the denominator. In the picture, red has df1 = 2 and df2 = 3, blue has df1 = 4 and df2 = 30, and black has df1 = 20 and df2 = 20.
F-distribution
Characteristics of the F-distribution:
1. It is not symmetric. The F-distribution is skewed right; that is, it is positively skewed.
2. The shape of the F-distribution depends upon the degrees of freedom in the numerator and denominator.
3. The total area under the curve is 1.
4. The values of F are always greater than or equal to zero; that is, the F-distribution cannot be negative.
The F-distribution is used to test whether two population variances are the same.
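A rough sketch of the two-variance test mentioned above (illustrative only; the samples are randomly generated, not from the slides). The F statistic is the ratio of the two sample variances, referred to an F distribution with numerator and denominator degrees of freedom:

```python
import numpy as np
from scipy import stats

# Hypothetical samples from two groups with equal population variance.
rng = np.random.default_rng(0)
a = rng.normal(loc=50, scale=5, size=25)   # numerator df = 24
b = rng.normal(loc=50, scale=5, size=31)   # denominator df = 30

# F statistic: ratio of the sample variances.
f_stat = a.var(ddof=1) / b.var(ddof=1)
# Two-tailed p-value from the F distribution.
p = 2 * min(stats.f.cdf(f_stat, 24, 30), stats.f.sf(f_stat, 24, 30))

assert f_stat >= 0  # characteristic 4: F can never be negative
print(round(f_stat, 3), round(p, 3))
```

In practice a robust alternative such as Levene's test is often preferred for comparing variances, since the raw F ratio is sensitive to non-normality.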
Chi-square distribution
Notice that in the picture, as df gets large, the curve becomes less skewed and more normal.
Properties of the chi-square distribution
- Chi-square is non-negative: it is the ratio of two non-negative values, therefore it must be non-negative itself.
- Chi-square is non-symmetric (asymmetric).
- There are many different chi-square distributions, one for each degree of freedom.
- The degrees of freedom when working with a single population variance is n - 1.
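A sketch of the single-variance case mentioned in the last bullet (hypothetical numbers, not from the slides): the test statistic is (n - 1) s² / σ₀², referred to a chi-square distribution with n - 1 degrees of freedom.

```python
from scipy import stats

# Chi-square test for a single population variance, df = n - 1.
n = 15
sample_var = 12.0   # hypothetical sample variance s^2
sigma0_sq = 9.0     # hypothesised population variance sigma_0^2

chi_sq = (n - 1) * sample_var / sigma0_sq
p_upper = stats.chi2.sf(chi_sq, df=n - 1)  # upper-tail p-value

assert chi_sq >= 0  # the chi-square statistic can never be negative
print(round(chi_sq, 3))  # 18.667
```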
Let us compare and review the three distributions.

t-distribution:
- A symmetric distribution.
- The shape of the t-distribution varies with its degrees of freedom, which are based on sample size. With a large sample size the t-distribution becomes more like the z-distribution, because df and sample size are large.
- Consists of both positive and negative values.

F-distribution:
- Non-symmetric. Why asymmetric? Because it is obtained from squared scores of the t-statistic.
- The shape of the F-distribution depends on two degrees of freedom, called the numerator and denominator df. The first is associated with the number of groups being compared; the second with sample size.
- Consists of only positive values, and is therefore positively skewed.

Chi-square distribution:
- Non-symmetric; as the df increases it becomes more symmetric. Obtained from the distribution of squared z-scores.
- The shape of the distribution varies with its degrees of freedom.
- The value of chi-square is never negative; therefore it is positively skewed.
Types of Tests

Parametric:
- Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters.
- Major classes of parametric tests are the t-test, analysis of variance, and the Pearson product-moment correlation.
- Based on specific assumptions.
- More powerful and preferred; however, they cannot always be used, because the assumptions on which they rest are not always met.

Non-parametric:
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Examples are chi-square and Spearman rank-order rho.
- Used when those assumptions are not met.
- Considered less powerful; however, they are often used, since in actual research the parametric assumptions frequently do not hold.
Assumptions of Tests of Difference
The assumptions for parametric tests are:
- Random selection from a normally distributed population
- Homogeneity of variance
- Level of measurement (controversial)
Assumption 1: Random selection from a normally distributed population
- Participants are randomly selected from normally distributed populations.
- Even if data sets are only relatively normally distributed, this is still accepted.
- The extent to which a data set is normally distributed can be tested. (We will practice this today in SPSS.)
- When a data set is not normally distributed, one strategy is to transform (convert) the data and then use parametric tests on the transformed data.
- Otherwise, non-parametric tests can be used.
Assumption 2: Homogeneity of Variance
- The population variances of the groups being tested are equal, or homogeneous.
- This can also be tested statistically. (We will practice how to compute it in the next class.)
What to do after checking homogeneity of variance?
- If the variances of the groups are found to differ significantly, non-parametric tests must be used.
- If the sample sizes of the groups being compared are the same, differences in the variances of the groups become less of a concern.
- Researchers often design their studies to have equal sample sizes in the groups.
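The slides say homogeneity of variance "can be tested statistically" without naming a test; one common choice (an assumption here, not stated in the slides) is Levene's test, where a large p-value means there is no evidence the group variances differ. The group scores below are hypothetical.

```python
from scipy import stats

# Hypothetical scores for two groups (illustrative data, not from the slides).
group1 = [23, 25, 28, 30, 32, 34, 35, 36, 40, 41]
group2 = [22, 24, 26, 29, 31, 33, 35, 38, 39, 45]

# Levene's test: H0 is that the group variances are equal.
stat, p = stats.levene(group1, group2)
variances_homogeneous = p > 0.05  # fail to reject H0 at the 0.05 level
print(variances_homogeneous)
```

SPSS reports the same Levene statistic automatically alongside its independent-samples t-test output.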
Assumption 3: Level of Measurement
Do you know what the levels of measurement are? In the previous slide, where we compared parametric and non-parametric tests, did you note which type of sample statistics we use for each?

Parametric:
- Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters.
- Interval and ratio data meet this need.
- The controversy is about the use of parametric tests with ordinal measurements, where those statistics may not remain valid.

Non-parametric:
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Nominal and ranked ordinal data meet this need.
- Interval and ratio data can be converted into ranks or grouped into categories to meet this need.
Assumption 3: Level of Measurement
Note: Regardless of the origin of the numbers, parametric tests can be conducted as long as the data themselves meet the assumptions of parametric tests.
However, the researcher must interpret parametric statistical conclusions based on ordinal data in light of their clinical and practical implications.
This can be illustrated with an example.
Assumption 3: Level of Measurement
Example from rehabilitation research
Variable: the amount of assistance a patient needs to accomplish various functional tasks. The categories are:

Code  Category       Mean score of group
1     Maximal        1.0
2     Moderate       2.0
3     Minimal        3.0
4     Standby        4.0
5     No assistance  5.0

These group means have been found to be significantly different from one another.
If the researchers believe that the real interval between maximal and moderate assistance is greater than the interval between standby and no assistance, they might interpret the difference between 1.0 and 2.0 as more clinically important than the difference between standby and no assistance.
Checking Normality of Data
Learning by example: in this example the null hypothesis (H0) is that the data are normally distributed, and the alternative hypothesis (Ha) is that the data are not normally distributed.

Steps / Actions
Step 1: Select "Analyze -> Descriptive Statistics -> Explore".
Step 2: From the list on the left, move the variable "Age" to the "Dependent List". Click "Plots" on the right. A new window will appear. Check "None" for boxplot, uncheck everything for descriptives, and make sure the box "Normality plots with tests" is checked.
Step 3: The results now appear in the "Output" window.
Step 4: Interpret the result. Look at the third table, where two tests for normality are run. For a data set smaller than 2000 elements we use the Shapiro-Wilk test; otherwise, the Kolmogorov-Smirnov test is used. If the Sig. value of the Shapiro-Wilk test is greater than 0.05, the data are normal. If it is below 0.05, the data deviate significantly from a normal distribution.
Graphical Method
The normal quantile-quantile plot (Q-Q plot) is the most commonly used and effective diagnostic tool for checking normality of the data.
It is constructed by plotting the empirical quantiles of the data against the corresponding quantiles of the normal distribution.
If the empirical distribution of the data is approximately normal, the quantiles of the data will closely match the normal quantiles, and the points on the plot will fall near the line y = x.
Graphical Method
It is impossible to fit a perfectly straight line in a Q-Q plot of real data, because random fluctuations cause the points to drift away from the line and aberrant observations often contaminate the samples.
Only large or systematic departures from the line indicate non-normality of the data. The points will remain reasonably close to the line if there is just natural variability.
Therefore, the straightness of the normal Q-Q plot helps us judge whether the data have the same distribution shape as a normal distribution, while shifts and tilts away from the line y = x indicate differences in location and spread, respectively.
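The "straightness" idea above can be quantified: scipy's probplot computes the Q-Q coordinates together with a least-squares line, and the correlation r of that fit is near 1 when the points hug a straight line. A small sketch on randomly generated (hypothetical) data:

```python
import numpy as np
from scipy import stats

# A hypothetical sample drawn from a genuinely normal population.
rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=200)

# probplot returns the theoretical vs. ordered sample quantiles plus a
# least-squares fit; r close to 1 means the Q-Q points are nearly straight.
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(data, dist="norm")
print(r > 0.99)  # a near-straight Q-Q line for normal data
```

Note that slope and intercept correspond to the "tilt" (spread) and "shift" (location) mentioned above.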
Graphical Method (Q-Q plot interpretation points)
If the data are normally distributed, the data points will be close to the diagonal line. If the data points stray from the line in an obvious non-linear fashion, the data are not normally distributed.
If you are at all unsure of being able to correctly interpret the graph, rely on the numerical methods instead, because it can take a fair bit of experience to judge the normality of data correctly from plots.
Normality Check: Another Way
Histogram: when a histogram's shape approximates a bell curve, it suggests that the data may have come from a normal population.
Example from the Data Set
In both plots there is a single value that appears to be considerably different: an outlier. This happens to be observation number 5 in the data set.
If we readjust the outlier
Analysis of Skewness and Kurtosis
Since the skewness and (excess) kurtosis of the normal distribution are zero, values for these two parameters should be close to zero for data that follow a normal distribution.
A rough measure of the standard error of the skewness is √(6/n), where n is the sample size.
A rough measure of the standard error of the kurtosis is √(24/n), where n is the sample size.
If the absolute value of the skewness of the data is more than twice its standard error, this indicates that the data are not symmetric, and therefore not normal. Similarly, if the absolute value of the kurtosis of the data is more than twice its standard error, this is also an indication that the data are not normal.
Example
Suppose in a data set the skewness is 0.23 (absolute value 0.23) and the kurtosis is -1.53 (absolute value 1.53). The standard error for the skewness is 0.55 and the standard error for the kurtosis is 1.10.
Neither value is more than twice its standard error.
As in the previous slide: if the absolute value of the skewness or kurtosis is more than twice its standard error, this indicates that the data are not symmetric, and therefore not normal.
Here both statistics are within two standard errors, so there is no indication that the data deviate from normality.
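The worked example above can be reproduced with the √(6/n) and √(24/n) rules. The slides do not state the sample size, but the quoted standard errors (0.55 and 1.10) imply n ≈ 20, which is assumed below:

```python
import math

# Reproduce the slide's worked example: skewness 0.23, kurtosis -1.53.
# n = 20 is an assumption inferred from the quoted standard errors.
n = 20
skewness, kurtosis = 0.23, -1.53

se_skew = math.sqrt(6 / n)    # rough SE of skewness, ~0.55
se_kurt = math.sqrt(24 / n)   # rough SE of kurtosis, ~1.10

print(round(se_skew, 2), round(se_kurt, 2))  # 0.55 1.1
print(abs(skewness) < 2 * se_skew)           # True: within two SEs
print(abs(kurtosis) < 2 * se_kurt)           # True: within two SEs
```

Since both statistics fall within two standard errors, this check gives no evidence against normality, matching the slide's conclusion.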