Basic Statistical Tools for Research

  • 8/4/2019 Basic Statistical Tools for Research

    by Benjamin L. Marciano, Jr.

    Objectives
    • Understand the statistical nature of research data
    • Identify approaches in quantitative research planning (data collection, organization and analysis)
    • Identify appropriate statistical techniques for a given study design

    Considerations in Choosing Statistical Tools
    1. Level of Measurement
    2. Nature of Statistical Relationship
    3. Parametric versus Nonparametric Test

    Levels of Measurement
    • Nominal: numbers are just categories
    • Ordinal: ranks, hierarchy, order
    • Interval: equally spaced scores; ratios are not meaningful (no mathematical concept of multiplicity) because there is no true zero
    • Ratio: highest level of measurement; has a true zero, so ratios are meaningful

    Nature of Statistical Relationship (depends on the objective of the study)
    • Association/correlation
    • Comparing groups or treatment effects
    • Predicting a value of an attribute of interest
    • Testing the effect of several factors on a response

    Parametric vs. Nonparametric
    Choice relies on:
    • the level of measurement
    • the assumption of normality
    • the sample size
    Note: Parametric tests are generally more powerful than nonparametric tests when their assumptions hold.

    Probability and Non-probability Sampling
    • Probability sampling: procedure wherein every element of the population is given a (known) nonzero chance of being selected in the sample
    • Nonprobability sampling: procedure wherein not all the elements in the population are given a chance of being included in the sample

    Issues
    • Choice relies on
        Nature of measurement
        Variation in the population
        Tolerable margin of error
    • Treatment of heterogeneity
        Stratification
        Clustering
        Multi-staging
    • Formula

    Testing Statistical Hypotheses: The Hypotheses
    • Null hypothesis (Ho): the hypothesis of no difference or no effect
    • Alternative hypothesis (Ha): the operational statement that is accepted in case the null hypothesis is rejected

    Testing Statistical Hypotheses: Level of Significance (alpha)
    • The size of the risk (0 < alpha < 1) of erroneously rejecting Ho that the researcher is willing to take
    • The choice of alpha usually depends on the consequences associated with erroneously rejecting Ho.
    • alpha = 0.01 or less => very serious error
    • alpha = 0.05 => moderately serious error
    • alpha = 0.10 => not too serious error

    A Summary of Possible Decisions in Hypothesis Testing

                              State of Nature (True Situation)
    Decision (Data says)      Ho is true                          Ho is false
    Reject Ho                 TYPE I error                        CORRECT decision
                              chance of occurrence = alpha        chance of occurrence = 1 - beta
                              (level of significance)             (power of the test)
    Do not reject Ho          CORRECT decision                    TYPE II error
                              chance of occurrence = 1 - alpha    chance of occurrence = beta

    Testing Statistical Hypotheses: The p-value
    • The smallest level of significance at which Ho will be rejected based on the information contained in the sample
    • Alternative form of decision rule based on the p-value: Reject Ho if the p-value is less than or equal to the level of significance (alpha).
    • Remember: If p is low, Ho must go!
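The decision rule on this slide can be written as a one-line check. This is a minimal sketch; the function name `decide` and the default alpha of 0.05 are illustrative choices, not part of the slides.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject Ho when p-value <= alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "Reject Ho" if p_value <= alpha else "Do not reject Ho"


print(decide(0.03))        # tested at alpha = 0.05
print(decide(0.03, 0.01))  # same p-value, stricter alpha
```

The same p-value can lead to different decisions depending on the alpha chosen before looking at the data.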

    DESCRIPTIVE METHODS: Describing and Summarizing a Set of Measurements
    • Presentation of Tables
    • Construction of Graphs
    • Computation of Summary Measures

    How to Describe Data
    • Averages describe the central value.
      Issue: Which average to use?
    • Variation describes the extent of dispersion.
      Issue: Absolute or comparative dispersion?
    • Skewness describes the degree of asymmetry.
      Where in the range of values do the data cluster?
    • Percentiles identify markers or thresholds.
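The four kinds of summary measure listed above can be computed with Python's standard library. This is a sketch on made-up scores (the data are hypothetical, not from the slides); the skewness formula used is the simple moment-based one, which is one of several conventions.

```python
import statistics

scores = [12, 15, 15, 18, 20, 22, 25, 30, 41, 52]  # hypothetical data

# Averages: two candidates for the "central value"
mean = statistics.mean(scores)
median = statistics.median(scores)

# Variation: absolute (standard deviation) and comparative (coefficient of variation)
sd = statistics.stdev(scores)  # sample standard deviation
cv = 100 * sd / mean           # coefficient of variation, in percent

# Skewness (moment-based): positive values indicate a longer right tail
n = len(scores)
m2 = sum((x - mean) ** 2 for x in scores) / n
m3 = sum((x - mean) ** 3 for x in scores) / n
skewness = m3 / m2 ** 1.5

# Percentiles: the three quartiles (25th, 50th, 75th percentiles)
quartiles = statistics.quantiles(scores, n=4)

print(mean, median, round(sd, 2), round(cv, 1), round(skewness, 2), quartiles)
```

In this right-skewed example the mean exceeds the median, which is one reason the choice of average matters.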

    Chi-Square Test
    • The chi-square test determines the association between two (categorical) variables set in a contingency table.
    • Generally regarded as a nonparametric test, although no parametric counterpart has gained popularity.
    • The Fisher Exact Test is an alternative to this test for 2x2 contingency tables, particularly when expected cell counts are small.

    Chi-Square Test

                    Low Income   Middle Income   High Income
    (-) attitude        31            29              27
    (+) attitude        48            93             165
    Total               79           122             192

    The null and alternative hypotheses are:
    • Ho: Socioeconomic status and attitude are independent.
    • Ha: The two variables are associated.
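As a sketch of how the test statistic for this table could be computed by hand, the code below derives expected counts from the row and column totals and sums the squared deviations. The closed-form p-value used here, exp(-x/2), is valid only because this table has 2 degrees of freedom; other table sizes need a chi-square table or a statistics library.

```python
import math

# Observed counts from the slide: rows are (-) and (+) attitude,
# columns are low, middle and high income
observed = [[31, 29, 27],
            [48, 93, 165]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability of a chi-square variable; this shortcut holds for df = 2 only
p_value = math.exp(-chi2 / 2)

print(round(chi2, 2), df, p_value)
```

The statistic comes out near 20.9 on 2 degrees of freedom, so Ho (independence) would be rejected at the 5% level.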

    Correlation Analysis
    • Correlation means the degree of linear association between two measurements.
    • The most common correlation measure is the Pearson coefficient, r. An alternative is the Spearman coefficient for rank data.
    • Pearson's r ranges from -1 to +1. Values close to either -1 or +1 indicate strong correlation, while near-zero values mean minimal or no linear correlation.

    Correlation Analysis
    • Positive correlation means that as one variable increases, there is a tendency for the other to increase as well. Likewise, both variables tend to decrease together.
    • Negative correlation means that as one variable increases, there is a tendency for the other to decrease, and vice versa.

    Correlation Analysis
    • Example: Refer to the data showing 20 nations ranked with respect to births attended by trained health care personnel and maternal mortality rate. The Spearman correlation (rs) is -0.88 (reported p = 0.000, i.e., p < 0.001). A significant negative correlation exists; there is a general tendency for maternal mortality to decrease when more births are attended by medical personnel.

    Nation         Rank by Attended   Rank by Maternal Mortality Rate
                   Percentage         per 100,000 Live Births
    Bangladesh      1                 18
    Nepal           2                 20
    Morocco         3                 16
    Pakistan        4                 17
    Nigeria         5                 19
    Kenya           6                 14.5
    Philippines     7                 11
    Iran            8                 12.5
    Ecuador         9                 14.5
    Portugal       10                  6.5
    Vietnam        11                 12.5
    Spain          12.5                2.5
    Panama         12.5                9
    Chile          14                 10
    Switzerland    16                  2.5
    USA            16                  5
    Hungary        16                  8
    Netherlands    19                  6.5
    Hong Kong      19                  4
    Belgium        19                  1
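The Spearman coefficient for these ranks can be recomputed in a few lines. This is a sketch, not the software run behind the slides: it shows both the shortcut formula 1 - 6Σd²/(n(n² - 1)), which reproduces the -0.88 quoted in the example, and the Pearson correlation of the ranks, which is the usual form when ties are present and gives a slightly different value.

```python
import math

# Ranks copied from the table (attended-births rank, maternal-mortality rank)
attended = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.5, 12.5, 14,
            16, 16, 16, 19, 19, 19]
mortality = [18, 20, 16, 17, 19, 14.5, 11, 12.5, 14.5, 6.5, 12.5, 2.5, 9, 10,
             2.5, 5, 8, 6.5, 4, 1]

n = len(attended)

# Shortcut formula based on squared rank differences
d_squared = sum((a - m) ** 2 for a, m in zip(attended, mortality))
rs_shortcut = 1 - 6 * d_squared / (n * (n ** 2 - 1))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Pearson correlation applied to the ranks (handles tied ranks)
rs_ties = pearson(attended, mortality)

print(round(rs_shortcut, 2), round(rs_ties, 2))
```

Both versions indicate a strong negative association between the two rankings.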

    Paired-Sample Tests
    • Paired-sample tests are used to test for significant differences in scores between related observations or matched pairs.
    • The two common types of paired-sample tests are:
        Paired t-test (parametric)
        Wilcoxon Signed Ranks Test (nonparametric)

    Paired-Sample Tests
    • The paired t-test is used when scores (strictly, the paired differences) are assumed to be normally distributed, i.e., to follow a bell-shaped histogram.
    • The Wilcoxon signed-ranks test is used when there is marked skewness in the data or when data are measured on an ordinal scale (ranks).
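A paired t-test reduces to a one-sample test on the within-pair differences. The sketch below computes the t statistic on hypothetical before/after scores for eight subjects (the data are invented for illustration) and compares it with the tabulated two-sided 5% critical value for 7 degrees of freedom.

```python
import math
import statistics

# Hypothetical scores for the same eight subjects, before and after a treatment
before = [72, 65, 80, 58, 77, 69, 74, 61]
after = [75, 70, 82, 63, 76, 74, 79, 66]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)             # sample standard deviation of differences
t_stat = mean_d / (sd_d / math.sqrt(n))    # paired t statistic
df = n - 1

T_CRIT_05 = 2.365  # tabulated two-sided 5% critical value of t with 7 df

print(round(t_stat, 2), df, abs(t_stat) > T_CRIT_05)
```

If the differences were markedly skewed, the Wilcoxon signed-ranks test would be the nonparametric alternative named on the slide.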

    Independent-Sample Tests
    • Independent-sample tests are used to determine if scores significantly differ between two disjoint or exclusive groups.
    • The two most common types of independent-sample tests are:
        Independent-sample t-test (parametric)
        Mann-Whitney Test (nonparametric)

    Independent-Sample Tests
    • Like the paired t-test, the independent-sample t-test is used when scores are assumed to be normally distributed, i.e., to follow a bell-shaped histogram.
    • The Mann-Whitney test is used when marked skewness in the observed measurements is present or when data are ordinal (ranks).
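To illustrate the nonparametric option, the sketch below computes the Mann-Whitney U statistic by direct pair counting on two small hypothetical groups. The data and the helper name `mann_whitney_u` are illustrative; significance would still have to be judged against a U table or a normal approximation.

```python
# Hypothetical scores for two disjoint groups
group_a = [12, 15, 11, 19, 14]
group_b = [22, 18, 25, 17, 30]


def mann_whitney_u(x, y):
    """Count pairs (x_i, y_j) with x_i > y_j; a tie counts as half a pair."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# The two counts always sum to n1 * n2; the smaller one is the test statistic
u_stat = min(u_a, u_b)

print(u_a, u_b, u_stat)
```

A small U means the two groups barely overlap, which is evidence against Ho.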

    One-way Analysis of Variance
    • The one-way ANOVA is the extension of the independent-sample t-test to the case of three or more disjoint or exclusive groups.
    • When data are ordinal or when there is skewness, the counterpart procedure is the Kruskal-Wallis test.
    • When the null hypothesis of equality of means is rejected, pairwise comparisons are necessary (e.g., Duncan, Tukey, Scheffe).

    One-way Analysis of Variance
    • Example: Four techniques are being used to perform a task. Five subjects were assigned to each technique in the experimental design to determine whether or not the techniques yield, on the average, the same results (time, in seconds). The analytical results for the 4 techniques are as follows:

    A   58.7   61.4   60.9   59.1   58.2
    B   62.7   64.5   63.1   59.2   60.3
    C   55.9   56.1   57.3   55.2   58.1
    D   60.7   60.3   60.9   61.4   62.3

                 Lab A   Lab B   Lab C   Lab D
    Mean         59.7    62.0    56.5    61.1
    Std. Dev.     1.4     2.2     1.2     0.8

    One-way Analysis of Variance
    • Ho: The means across the four techniques are equal.
    • Ha: At least one mean is different.
    • The F-test statistic has p-value 0.000 (i.e., p < 0.001).
    • At the 5% level of significance, we reject Ho. At least one mean is different.
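The F statistic for the four-technique example can be recomputed from the raw times. This sketch splits the total variation into between-group and within-group sums of squares and compares F with the tabulated 5% critical value for (3, 16) degrees of freedom; it does not reproduce the software's p-value.

```python
# Times in seconds for the four techniques, as listed in the example
groups = {
    "A": [58.7, 61.4, 60.9, 59.1, 58.2],
    "B": [62.7, 64.5, 63.1, 59.2, 60.3],
    "C": [55.9, 56.1, 57.3, 55.2, 58.1],
    "D": [60.7, 60.3, 60.9, 61.4, 62.3],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# Between-group (explained) and within-group (error) sums of squares
ss_between = sum(len(v) * (group_means[g] - grand_mean) ** 2 for g, v in groups.items())
ss_within = sum((x - group_means[g]) ** 2 for g, v in groups.items() for x in v)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRIT_05 = 3.24  # tabulated 5% critical value of F with (3, 16) df

print(round(f_stat, 2), df_between, df_within, f_stat > F_CRIT_05)
```

F is about 13.3, well above the critical value, which agrees with the decision to reject Ho.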

    N-way Analysis of Variance
    • Allows analysis of main effects and interactions
    • Most popular is the two-way ANOVA
    • Presents difficulty for higher-order ANOVA
    • Useful if there are blocking variables

    Regression Analysis
    • Regression analysis is a method relevant to analyzing a variable by using information on other variables. The variable that is being explained or analyzed is called the response or dependent variable.
    • The variables whose effects act on the response are called predictor, regressor or independent variables.

    Regression Analysis
    • When there is only one predictor, we have a simple linear regression model.
    • Response = function (one predictor)
    • Ex. O2 Consumption = function of Running Time
    • The formal model is $Y_i = b_0 + b_1 X_i + \varepsilon_i$, where $\varepsilon_i$ is a random disturbance.
    • O2 = intercept value + slope value × RunTime + random error
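The intercept and slope of a simple linear regression have closed-form least-squares estimates. The sketch below applies them to hypothetical running-time and oxygen-consumption values (the numbers are invented; the slides do not supply this data set).

```python
# Hypothetical data: running time (minutes) and O2 consumption
run_time = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
o2 = [58.0, 55.5, 50.9, 47.8, 45.1, 41.2]

n = len(run_time)
mean_x = sum(run_time) / n
mean_y = sum(o2) / n

# Least-squares estimates: slope = Sxy / Sxx, intercept = ybar - slope * xbar
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(run_time, o2))
s_xx = sum((x - mean_x) ** 2 for x in run_time)
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Residuals are the estimated random errors
residuals = [y - (b0 + b1 * x) for x, y in zip(run_time, o2)]

print(round(b0, 3), round(b1, 3))
```

With an intercept in the model, the fitted line passes through the point of means and the residuals sum to zero.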

    Regression Analysis
    • When there are many predictors, we have a multiple linear regression model (MLRM).
    • Response = function (several predictors)
    • Ex. O2 = function of RunTime and Age
    • The MLRM is written as
      $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + e_i$,
      where
        $Y_i$ is the value of the response variable in the ith observation,
        $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are parameters of the model,
        $X_{1i}, X_{2i}, \dots, X_{ki}$ are the values of the predictors in the ith observation, and
        $e_i$ is the error term.

    So, I want to use regression. What is the first thing I should do?
    IDENTIFY YOUR RESPONSE VARIABLE!
    • This should be quantifiable.
    • Yes/No, High/Low, and similar categorical responses are not valid here.

    How about my predictors?
    • You may choose quantitative and dummy variables as your predictors. Quantitative predictors should be correlated with the response.
    • Make sure there is no redundancy among predictors. Check this by computing their correlations. If there are correlated predictors, choose only the one that has practical significance to your study. There are advanced statistical methods that treat correlated predictors.

    What's next?
    • You are now ready to fit the regression equation. To illustrate, consider an example.
    Renar Interiors operates in medium-size business areas. In considering an expansion into other areas of similar size, it wishes to investigate how sales (Y) can be predicted from the size of the target market, i.e., the 20-39 age group (X1), and the average monthly income of households in the area (X2). Data on these variables in the most recent year for 21 business areas where the company operates are given below.

    Renar Interiors Data
    • See the provided copies.

    How to use Excel?
    In Excel, click Tools, Data Analysis, Regression.
    1. Supply the Input Y Range box with the appropriate cell addresses.
    2. Supply the Input X Range box with the appropriate cell addresses of the X1 and X2 values, contiguously placed in the data matrix.
    3. Supply the Output Range with any convenient location.
    4. Excel shall return an output of analysis.
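For readers without Excel, the same least-squares fit can be sketched in plain Python by solving the normal equations (X'X)b = X'y. The helper `fit_ols` and the small data set are hypothetical: the Renar Interiors data are not reproduced in this transcript, so the example uses values generated exactly from y = 2 + 0.5 x1 + 1.5 x2 to make the result easy to check.

```python
def fit_ols(predictors, response):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in predictors]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, response))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for k in range(p):
            if k != col:
                factor = aug[k][col]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]
    return [aug[i][p] for i in range(p)]


# Hypothetical data generated from y = 2 + 0.5*x1 + 1.5*x2 (no random error)
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 0.5 * a + 1.5 * b for a, b in zip(x1, x2)]

coefficients = fit_ols(list(zip(x1, x2)), y)
print([round(c, 4) for c in coefficients])
```

The three returned values correspond to what Excel lists in its Coefficients column: the intercept first, then one slope per predictor.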

    Results
    • The Coefficients column gives the estimated values of the regression parameters.
    • Here, the fitted model is:
      Y = -3.887 + 0.146 X1 + 0.929 X2
    • SALES = -3.887 + 0.146 × Market Size + 0.929 × Income

    How do I interpret the fitted model?
    -3.887
    • The value of the intercept, -3.887, is not interpreted since the two predictors do not take values equal to zero.
    0.146 × Market Size
    • There is an estimated increase of 0.146 million pesos (i.e., P146,000) in mean sales when the size of the target market increases by one percent, holding the average monthly family income constant.
    0.929 × Income
    • There is an estimated increase of 0.929 million pesos (i.e., P929,000) in mean sales when the average monthly family income increases by one thousand pesos, holding the size of the target market constant.
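The "holding the other predictor constant" reading of each coefficient can be checked numerically. In this sketch the function name `predict_sales` and the input values are hypothetical; only the three coefficients come from the slides, and the inputs are assumed to be in the same units as the Renar data.

```python
def predict_sales(market_size: float, income: float) -> float:
    """Predicted mean sales (million pesos) from the fitted Renar model."""
    return -3.887 + 0.146 * market_size + 0.929 * income


# Hypothetical predictor values, in the units used by the slides
base = predict_sales(market_size=50.0, income=17.0)

# Raise one predictor by one unit while holding the other fixed
effect_market = predict_sales(51.0, 17.0) - base
effect_income = predict_sales(50.0, 18.0) - base

print(round(base, 3), round(effect_market, 3), round(effect_income, 3))
```

Each one-unit change reproduces the corresponding coefficient, which is exactly what the interpretation above states.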

    Can I use the model already for prediction purposes?
    NOT YET!
    • You still need to investigate the model's goodness-of-fit.
    • You need to check whether your predictors are significant.
    • You must also verify that the assumptions of regression hold.

    How do I assess goodness-of-fit?
    Three things:
    • ANOVA
    • F-test
    • R squared
    They lurk somewhere in the Excel output!

    Analysis of Variance (ANOVA)
    • The ANOVA is a decomposition of the total variation in the response into explained (pattern) and unexplained (error) parts.
    • The explained variability is the amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model.
    • The unexplained variability is the amount of variation attributed to random error.

    Results from the ANOVA table for the Renar Interiors data
    • The first column in the table labels the sources of variation (Regression and Residual).
    • The df column refers to the degrees of freedom.
        The df for Regression is always the number of regression parameters minus one.
        The df for Residual is the sample size minus the number of regression parameters.
        The total df is the sum of these two degrees of freedom.

    Results from the ANOVA table for the Renar Interiors data
    • SS refers to Sum of Squares. The value 240.3407 represents the amount of variation in sales explained by the two predictors in the model. The value 21.9658 represents the unexplained variation. These two values sum to 262.3065. There is good fit if the Regression Sum of Squares is much larger than the Residual Sum of Squares.
    • MS refers to Mean Squares. The values in this column are the ratio of each sum of squares to its respective degrees of freedom. Mean squares have no physical meaning but are instrumental in computing the F-statistic.
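The df, MS and F entries of the ANOVA table follow mechanically from the sums of squares. The sketch below rebuilds them from the figures quoted on the slides (21 business areas; three regression parameters: the intercept and two slopes); the variable names are illustrative.

```python
n_obs = 21      # business areas in the Renar data
n_params = 3    # intercept plus two slopes

ss_regression = 240.3407   # explained variation
ss_residual = 21.9658      # unexplained variation
ss_total = ss_regression + ss_residual

df_regression = n_params - 1
df_residual = n_obs - n_params
df_total = df_regression + df_residual

# Mean squares: each sum of squares divided by its degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# The F statistic is the ratio of the two mean squares
f_stat = ms_regression / ms_residual

print(df_regression, df_residual, df_total)
print(round(ms_regression, 4), round(ms_residual, 4), round(f_stat, 2))
```

The resulting F of about 98.47 is the value that the F-test for the overall regression relies on.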

    The F-test
    • The F-test determines if regression is meaningful for the data at hand. When the p-value is small (see Significance F in the Excel output), it means that there is at least one significant predictor in the analysis.

    What is the role of the p-value?
    • The p-value is our evidence against the hypothesis that we do not have any significant predictor in the data. When it is small, we reject that hypothesis.
    • Technically, we call the above hypothesis our null hypothesis or Ho.
    • Remember: WHEN p IS LOW, Ho MUST GO!
    • Rule of Thumb: The p-value is low if it is less than 0.05.

    Results from the Renar Data
    • In the Renar data, the F-statistic is 98.47 with an associated p-value of 2.03 × 10^-10 (almost zero!).
    • Since the p-value is lower than 0.05, we reject Ho. We can therefore conclude that at least one of our two predictors can significantly explain sales.
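The quoted p-value can be checked without statistical software, because the upper tail of an F distribution has a simple closed form when the numerator has exactly 2 degrees of freedom: P(F > f) = (1 + 2f/d2)^(-d2/2). The sketch below relies on that special case (here d2 = 18 residual df); with more than two predictors a table or library would be needed instead.

```python
f_stat = 98.47     # F-statistic reported for the Renar data
df_residual = 18   # 21 observations minus 3 regression parameters

# Upper-tail probability of F with (2, df_residual) degrees of freedom;
# this closed form is valid only when the numerator df equals 2
p_value = (1 + 2 * f_stat / df_residual) ** (-df_residual / 2)

alpha = 0.05
print(f"{p_value:.3g}", p_value < alpha)
```

The result is about 2.03 × 10^-10, matching the Significance F quoted above.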

    The Coefficient of Multiple Determination (R squared)
    • The coefficient of multiple determination, R squared, is a goodness-of-fit measure.
    • R squared is a figure of merit; the higher the R squared, the better the model succeeds in explaining the variation in the response using the set of predictors.

    Results from the Renar Data
    • The R squared is normally expressed as a percentage and is interpreted as the amount of variability in the response explained by the independent variables.
    • The value of R squared = 0.9163 means that 91.63% of the variation in sales can be explained by size of target market and average monthly family income.

    CAVEAT on the Coefficient of Multiple Determination (R2)
    • A drawback of the R squared is that it naturally increases as the number of predictors increases. This is true even if the added predictor(s) are not significant.
    • As an alternative, we use the adjusted R squared (Ra squared).
    • Ra squared penalizes the R squared for the addition of regressors that do not contribute to the explanatory power of the model.
    • The Ra squared is never larger than the R squared, can decrease as regressors are added and, for poorly fitting models, may even be negative.
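Both goodness-of-fit measures can be recomputed from the ANOVA sums of squares quoted earlier. This sketch uses the common definition of the adjusted R squared, 1 - (1 - R²)(n - 1)/(n - p), where p counts all regression parameters including the intercept; the slides do not print the adjusted value, so the number it produces is a derived figure rather than one taken from the Excel output.

```python
n_obs = 21
n_params = 3  # intercept plus two slopes

ss_regression = 240.3407
ss_residual = 21.9658
ss_total = ss_regression + ss_residual

# R squared: share of total variation explained by the predictors
r_squared = ss_regression / ss_total

# Adjusted R squared: penalizes the number of parameters in the model
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_params)

print(round(r_squared, 4), round(adj_r_squared, 4))
```

The adjusted value (about 0.907) is slightly below the R squared of 0.9163, as the caveat above says it always will be.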

    The t-tests
    • The t-test helps in assessing if an individual predictor is significant.
    • Let us interpret the t-tests for the Renar data.
    X Variable 1 (Target Market Size): Since p = 2.05 × 10^-6
