Transcript of Chapter015

Page 1: Chapter015

Chapter 15

Statistical Analysis of Quantitative Data

Page 2: Chapter015

Statistical Analysis

• Descriptive statistics

– Used to describe and synthesize data

• Inferential statistics

– Used to make inferences about the population based on sample data

Page 3: Chapter015

Descriptive Indexes

• Parameter

– A descriptor for a population (e.g., the average age of menses for Canadian females)

• Statistic

– A descriptor for a sample (e.g., the average age of menses for female students at McGill University)

Page 4: Chapter015

Descriptive Statistics: Frequency Distributions

• A systematic arrangement of numeric values on a variable from lowest to highest, and a count of the number of times (and/or percentage) each value was obtained

• Frequency distributions can be described in terms of:

– Shape

– Central tendency

– Variability

• Can be presented in a table (Ns and percentages) or graphically (e.g., frequency polygons)

Page 5: Chapter015

Shapes of Distributions

• Symmetry

– Symmetric

– Skewed (asymmetric)

• Positive skew (long tail points to the right)

• Negative skew (long tail points to the left)

Pages 6–7: Chapter015 (figure-only slides)

Page 8: Chapter015

Shapes of Distributions (cont.)

• Peakedness (how sharp the peak is)

• Modality (number of peaks)

– Unimodal (1 peak)

– Bimodal (2 peaks)

– Multimodal (2+ peaks)

Page 9: Chapter015

Normal Distribution

• Characteristics:

– Symmetric

– Unimodal

– Not too peaked, not too flat

• More popularly referred to as a bell-shaped curve

• Important distribution in inferential statistics

Page 10: Chapter015

Question

Is the following statement True or False?

• A bell-shaped curve is also called a normal distribution.

Page 11: Chapter015

Answer

• True

– A normal distribution is commonly referred to as a bell-shaped curve.

Page 12: Chapter015

Central Tendency

• Index of “typicalness” of a set of scores that comes from the center of the distribution

• Mode—the most frequently occurring score in a distribution

– Ex: 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 Mode = 3

• Median—the point in a distribution above which and below which 50% of cases fall

– Ex: 2, 3, 3, 3, 4 | 5, 6, 7, 8, 9 Median = 4.5

• Mean—equals the sum of all scores divided by the total number of scores

– Ex: 2, 3, 3, 3, 4, 5, 6, 7, 8, 9 Mean = 5.0
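These three indexes can be verified with a minimal Python sketch using only the standard library and the example scores above:

```python
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

print(statistics.mode(scores))    # 3   -- most frequently occurring score
print(statistics.median(scores))  # 4.5 -- midpoint between the 5th and 6th scores
print(statistics.mean(scores))    # 5.0 -- sum of scores (50) / number of scores (10)
```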

Page 13: Chapter015

Comparison of Measures of Central Tendency

• Mode, useful mainly as a gross descriptor, especially of nominal measures

• Median, useful mainly as a descriptor of the typical value when a distribution is skewed (e.g., household income)

• Mean, the most stable and widely used indicator of central tendency

Page 14: Chapter015

Variability

The degree to which scores in a distribution are spread out or dispersed

• Homogeneity—little variability

• Heterogeneity—great variability

Page 15: Chapter015 (figure-only slide)

Page 16: Chapter015

Indexes of Variability

• Range: highest value minus lowest value

• Standard deviation (SD): indicates how much, on average, scores deviate from the mean (computed as the square root of the average squared deviation)
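A minimal Python sketch of both indexes, reusing the central-tendency example scores (statistics.stdev computes the sample SD, dividing by n - 1):

```python
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

value_range = max(scores) - min(scores)  # 9 - 2 = 7
sd = statistics.stdev(scores)            # sample SD; ~2.40 for these scores
print(value_range, round(sd, 2))
```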

Page 17: Chapter015 (figure-only slide)

Page 18: Chapter015

Bivariate Descriptive Statistics

• Used for describing the relationship between two variables

• Two common approaches:

– Contingency tables (Crosstabs)

– Correlation coefficients

Page 19: Chapter015

Contingency Table

• A two-dimensional frequency distribution; frequencies of two variables are cross-tabulated

• “Cells” at intersection of rows and columns display counts and percentages

• Variables usually nominal or ordinal
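A sketch of a crosstab built from hypothetical nominal data (the variables and counts here are invented for illustration):

```python
from collections import Counter

# Hypothetical nominal variables: sex and smoking status
sex    = ["M", "M", "F", "F", "F", "M", "F", "M"]
smoker = ["yes", "no", "no", "yes", "no", "no", "no", "yes"]

cells = Counter(zip(sex, smoker))  # count for each (row, column) cell
n = len(sex)
for (row, col), count in sorted(cells.items()):
    print(f"{row} / {col}: {count} ({100 * count / n:.0f}%)")
```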

Page 20: Chapter015

Correlation Coefficients

• Indicate direction and magnitude of relationship between two variables

• The most widely used correlation coefficient is Pearson’s r.

• Pearson’s r is used when both variables are interval- or ratio-level measures.
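A minimal sketch of Pearson's r computed from its definitional formula; the calorie/weight numbers are invented to mirror the positive-relationship example on a later slide:

```python
import math

def pearson_r(x, y):
    # r = co-deviation of x and y / product of their deviation magnitudes
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

calories = [1800, 2000, 2200, 2500, 3000]  # hypothetical interval-level data
weight   = [60, 64, 63, 70, 80]
print(round(pearson_r(calories, weight), 2))  # ~0.98: strong positive relationship
```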

Page 21: Chapter015

Question

The researcher subtracts the lowest value of data from the highest value of data to obtain which of the following?

a. Mode

b. Median

c. Mean

d. Range

Page 22: Chapter015

Answer

d. Range

• The range is calculated by subtracting the lowest value of data from the highest value of data. The mode refers to the most frequently occurring score. The median refers to the point in a distribution above which and below which 50% of the cases fall. The mean is the sum of all the scores divided by the total number of scores.

Page 23: Chapter015

Correlation Coefficients (cont.)

• Correlation coefficients can range from -1.00 to +1.00

– Negative relationship (0.00 to -1.00) —one variable increases in value as the other decreases, e.g., amount of exercise and weight

– Positive relationship (0.00 to +1.00) —both variables increase or decrease together, e.g., calorie consumption and weight

Page 24: Chapter015

Correlation Coefficients (cont.)

• The greater the absolute value of the coefficient, the stronger the relationship:

Ex: r = -.45 is stronger than r = +.40

• With multiple variables, a correlation matrix can be displayed to show all pairs of correlations.

Page 25: Chapter015

Describing Risk

• Clinical decision-making for EBP may involve the calculation of risk indexes, so that decisions can be made about relative risks for alternative treatments or exposures.

• Some frequently used indexes:

– Absolute Risk

– Absolute Risk Reduction (ARR)

– Odds Ratio (OR)

Page 26: Chapter015

The Odds Ratio (OR)

• The odds = the ratio of people who experience an adverse outcome to those who do not

– e.g., the odds of ….

• The odds ratio is computed to compare the odds of an adverse outcome for two groups being compared (e.g., men vs. women, experimentals vs. controls).
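A worked sketch with an assumed 2 x 2 table (all counts invented for illustration):

```python
# Hypothetical counts:
#              adverse outcome   no adverse outcome
# exposed           a = 20             b = 80
# not exposed       c = 10             d = 90
a, b, c, d = 20, 80, 10, 90

odds_exposed   = a / b                       # 0.25
odds_unexposed = c / d                       # ~0.11
odds_ratio = odds_exposed / odds_unexposed   # equivalently (a * d) / (b * c)
print(round(odds_ratio, 2))  # 2.25 -- the exposed group has 2.25 times the odds
```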

Page 27: Chapter015

Inferential Statistics

• Used to make objective decisions about population parameters using sample data

• Based on laws of probability

• Uses the concept of theoretical distributions

– e.g., the sampling distribution of the mean

Page 28: Chapter015

Sampling Distribution of the Mean

• A theoretical distribution of means for an infinite number of samples drawn from the same population

• Is normally distributed (per the central limit theorem, even when the underlying population is not)

• Its mean equals the population mean.

• Its standard deviation is called the standard error of the mean (SEM).

• SEM is estimated from a sample SD and the sample size.
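The SEM estimate is simply SD / sqrt(n); a sketch with the earlier example scores:

```python
import math
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]  # one sample

sem = statistics.stdev(scores) / math.sqrt(len(scores))  # SEM = SD / sqrt(n)
print(round(sem, 2))  # ~0.76
```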

Page 29: Chapter015

Statistical Inference—Two Forms

•Parameter estimation

•Hypothesis testing (more common among nurse researchers than among medical researchers)

Page 30: Chapter015

Question

Is the following statement True or False?

• A correlation coefficient of -.38 is stronger than a correlation coefficient of +.32.

Page 31: Chapter015

Answer

• True

– For a correlation coefficient, the greater the absolute value of the coefficient, the stronger the relationship. So the absolute value of -.38 is greater than the absolute value of +.32 and thus is stronger.

Page 32: Chapter015

Estimation of Parameters

• Point estimation—A single descriptive statistic that estimates the population value (e.g., a mean, percentage, or OR)

• Interval estimation—A range of values within which a population value probably lies

– Involves computing a confidence interval (CI)

Page 33: Chapter015

Confidence Intervals

• CIs indicate the upper and lower confidence limits and the probability that the population value is between those limits.

– For example, a 95% CI of 40–50 for a sample mean of 45 indicates there is a 95% probability that the population mean is between 40 and 50.
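A sketch of a 95% CI around a sample mean, using the normal-approximation multiplier of 1.96 (for a sample this small a t-based multiplier would actually be more appropriate):

```python
import math
import statistics

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

mean = statistics.mean(scores)
sem = statistics.stdev(scores) / math.sqrt(len(scores))

lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"95% CI: {lower:.1f} to {upper:.1f}")  # ~3.5 to 6.5 around the mean of 5.0
```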

Page 34: Chapter015

Hypothesis Testing

• Based on rules of negative inference: research hypotheses are supported if null hypotheses can be rejected.

• Involves statistical decision-making to either:

– accept the null hypothesis or

– reject the null hypothesis

• Researchers compute a test statistic with their data and then determine whether the statistic falls beyond the critical value (i.e., within the critical region) of the relevant theoretical distribution.

– Values in the critical region indicate that the null hypothesis is improbable, at a specified probability level.

Page 35: Chapter015

Hypothesis Testing (cont.)

• If the value of the test statistic indicates that the null hypothesis is improbable, then the result is statistically significant.

• A nonsignificant result means that any observed difference or relationship could have happened by chance.

• Statistical decisions are either correct or incorrect.

Page 36: Chapter015

Errors in Statistical Decisions

• Type I error: rejection of a null hypothesis when it should not be rejected; a false-positive result

– Risk of error is controlled by the level of significance (alpha), e.g., α = .05 or .01.

• Type II error: failure to reject a null hypothesis when it should be rejected; a false-negative result

– The risk of this error is beta (β).

– Power is the ability of a test to detect true relationships; power = 1 – β.

– By convention, power should be at least .80.

– Larger samples = greater power
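The sample-size/power link can be illustrated by simulation. A minimal Monte Carlo sketch (scipy is an assumption here, not part of the chapter, and the true group difference of 0.8 SD is invented):

```python
import random
from scipy import stats

def estimated_power(n, mean_diff, sd=1.0, alpha=0.05, sims=2000):
    # Fraction of simulated studies whose t-test rejects the null hypothesis
    hits = 0
    for _ in range(sims):
        g1 = [random.gauss(0.0, sd) for _ in range(n)]
        g2 = [random.gauss(mean_diff, sd) for _ in range(n)]
        if stats.ttest_ind(g1, g2).pvalue < alpha:
            hits += 1
    return hits / sims

print(estimated_power(n=20, mean_diff=0.8))  # roughly .65 -- underpowered
print(estimated_power(n=50, mean_diff=0.8))  # roughly .97 -- meets the .80 convention
```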

Page 37: Chapter015 (figure-only slide)

Page 38: Chapter015

Parametric and Nonparametric Tests

• Parametric Statistics:

– Use involves estimation of a parameter; assumes variables are normally distributed in the population; measurements are on interval/ratio scale

• Nonparametric Statistics:

– Use does not involve estimation of a parameter; measurements typically on nominal or ordinal scale; doesn’t assume normal distribution in the population

Page 39: Chapter015

Overview of Hypothesis-Testing Procedures

• Select an appropriate test statistic.

• Establish significance criterion (e.g., α = .05).

• Compute test statistic with actual data.

• Calculate degrees of freedom (df) for the test statistic.

• Obtain a critical value for the statistical test (e.g., from a table).

• Compare the computed test statistic to the tabled value.

• Make decision to accept or reject null hypothesis.
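These steps can be traced end to end with a two-group t-test in Python (a sketch only; scipy and the scores are assumptions for illustration):

```python
from scipy import stats

group1 = [4, 5, 6, 7, 8, 5, 6, 7]   # hypothetical scores, group 1
group2 = [6, 7, 8, 9, 7, 8, 9, 10]  # hypothetical scores, group 2

alpha = 0.05                                       # significance criterion
t_stat, p_value = stats.ttest_ind(group1, group2)  # compute the test statistic
df = len(group1) + len(group2) - 2                 # degrees of freedom
critical = stats.t.ppf(1 - alpha / 2, df)          # two-tailed critical value

# Decision: reject the null hypothesis if |t| exceeds the critical value
print(f"t = {t_stat:.2f}, df = {df}, critical = {critical:.2f}")
print("reject null hypothesis" if abs(t_stat) > critical else "accept null hypothesis")
```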

Page 40: Chapter015

Commonly Used Bivariate Statistical Tests

• t-Test

• Analysis of variance (ANOVA)

• Pearson’s r

• Chi-squared test

Page 41: Chapter015

Question

Is the following statement True or False?

• Parametric statistical testing usually involves measurements on the nominal scale.

Page 42: Chapter015

Answer

• False

– Parametric statistics usually involve measurements that are on the interval or ratio scale; nonparametric tests are used for nominal or ordinal data.

Page 43: Chapter015

t-Test

• Tests the difference between two means

• t-test for independent groups: between-subjects test

– e.g., means for men vs. women

• t-test for dependent (paired) groups: within-subjects test

– e.g., means for patients before and after surgery
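The independent-groups version was sketched after the hypothesis-testing steps above; here is the dependent-groups version, with invented before/after scores for the same six patients (scipy assumed):

```python
from scipy import stats

before = [52, 48, 60, 55, 49, 58]  # hypothetical pre-surgery anxiety scores
after  = [45, 44, 56, 50, 47, 51]  # the same patients after surgery

result = stats.ttest_rel(before, after)  # paired (within-subjects) t-test
print(round(result.statistic, 2), round(result.pvalue, 3))  # large t, small p here
```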

Page 44: Chapter015

Analysis of Variance (ANOVA)

Tests the difference among more than two means

– One-way ANOVA (e.g., 3 groups)

– Multifactor (e.g., two-way) ANOVA

– Repeated measures ANOVA (RM-ANOVA): within subjects
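A one-way ANOVA sketch for three hypothetical groups (scipy's f_oneway; all scores invented):

```python
from scipy import stats

g1 = [5, 6, 7, 6, 5]  # hypothetical outcome scores, group 1
g2 = [7, 8, 9, 8, 7]  # group 2
g3 = [6, 6, 7, 7, 6]  # group 3

f_stat, p_value = stats.f_oneway(g1, g2, g3)  # tests all three means at once
print(round(f_stat, 2), round(p_value, 4))
```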

Page 45: Chapter015

Chi-Squared Test

• Tests the difference in proportions in categories within a contingency table

• Compares observed frequencies in each cell with expected frequencies—the frequencies expected if there were no relationship
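A sketch with an invented 2 x 2 contingency table; scipy's chi2_contingency also returns the expected frequencies it compares against:

```python
from scipy import stats

observed = [[30, 20],  # hypothetical counts: experimental group, improved / not
            [18, 32]]  # control group, improved / not

chi2, p, df, expected = stats.chi2_contingency(observed)
print(round(chi2, 2), round(p, 3))
print(expected)  # the frequencies expected if there were no relationship
```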

Page 46: Chapter015

Correlation

• Pearson’s r is both a descriptive and an inferential statistic.

• As an inferential statistic, tests the null hypothesis that the relationship between two variables is zero.

Page 47: Chapter015

Effect Size

• Effect size is an important concept in power analysis.

• Effect size indexes summarize the magnitude of the effect of the independent variable on the dependent variable.

• In a comparison of two group means (i.e., in a t-test situation), the effect size index is d.

• By convention:

d ≤ .20, small effect

d = .50, moderate effect

d ≥ .80, large effect
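A sketch of d computed as the mean difference in pooled-SD units (the group scores reuse the invented t-test example above):

```python
import math
import statistics

def cohens_d(group1, group2):
    # mean difference divided by the pooled standard deviation
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

g1 = [4, 5, 6, 7, 8, 5, 6, 7]
g2 = [6, 7, 8, 9, 7, 8, 9, 10]
print(round(abs(cohens_d(g1, g2)), 2))  # ~1.53, a large effect by convention
```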

Page 48: Chapter015

Multivariate Statistics

• Statistical procedures for analyzing relationships among 3 or more variables

• Two commonly used procedures in nursing research:

– Multiple regression

– Analysis of covariance (ANCOVA)

Page 49: Chapter015

Multiple Linear Regression

• Used to predict a dependent variable based on two or more independent (predictor) variables

• Dependent variable is continuous (interval or ratio-level data).

• Predictor variables are continuous (interval or ratio) or dichotomous.
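A least-squares sketch with two invented predictors; it also computes R², the index discussed on the next slide (numpy is an assumption, not part of the chapter):

```python
import numpy as np

# Hypothetical data: predict weight (kg) from calorie intake and exercise hours
calories = [1800, 2000, 2200, 2500, 3000, 2100]
exercise = [5, 3, 4, 2, 1, 6]
y = np.array([60, 64, 63, 70, 80, 59])  # continuous dependent variable

X = np.column_stack([np.ones(len(y)), calories, exercise])  # intercept + 2 predictors
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

pred = X @ coefs
r_squared = 1 - np.sum((y - pred) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(coefs)                # [intercept, calories coefficient, exercise coefficient]
print(round(r_squared, 2))  # R^2: proportion of variance accounted for
```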

Page 50: Chapter015

Multiple Correlation Coefficient (R)

• The correlation index for a dependent variable and 2+ independent (predictor) variables: R

• Does not have negative values: shows strength of relationships, not direction

• R2 is an estimate of the proportion of variability in the dependent variable accounted for by all predictors.

Page 51: Chapter015

Analysis of Covariance (ANCOVA)

• Extends ANOVA by removing the effect of confounding variables (covariates) before testing whether mean group differences are statistically significant

• Levels of measurement of variables:

– Dependent variable is continuous—ratio or interval level

– Independent variable is nominal (group status)

– Covariates are continuous or dichotomous

Page 52: Chapter015

Question

Which test would be used to compare the observed frequencies with expected frequencies within a contingency table?

a. Pearson’s r

b. Chi-squared test

c. t-Test

d. ANOVA

Page 53: Chapter015

Answer

b. Chi-squared test

• The chi-squared test evaluates the difference in proportions in categories within a contingency table, comparing the observed frequencies with the expected frequencies. Pearson’s r tests that the relationship between two variables is not zero. The t-test evaluates the difference between two means. ANOVA tests the difference among more than two means.

Page 54: Chapter015

Logistic Regression

• Analyzes relationships between a nominal-level dependent variable and 2+ independent variables

• Yields an odds ratio—the risk of an outcome occurring given one condition, versus the risk of it occurring given a different condition

• The OR is calculated after first removing (statistically controlling) the effects of confounding variables.
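A sketch using scikit-learn (an assumption, not part of the chapter); exponentiating each coefficient gives an OR adjusted for the other predictor. All data are invented, and sklearn's default regularization makes these ORs approximate:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: adverse outcome (1/0) predicted from age and smoking (1/0)
X = np.array([[55, 1], [60, 1], [42, 0], [35, 0], [50, 1],
              [38, 0], [65, 1], [44, 0], [58, 0], [47, 1]])
y = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 0])

model = LogisticRegression(max_iter=1000).fit(X, y)
print(np.exp(model.coef_))  # exponentiated coefficients = (approximate) odds ratios
```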

Page 55: Chapter015

Factor Analysis

• Used to reduce a large set of variables into a smaller set of underlying dimensions (factors)

• Used primarily in developing scales and complex instruments

Page 56: Chapter015

Multivariate Analysis of Variance

• The extension of ANOVA to more than one dependent variable

• Abbreviated as MANOVA

• Can be used with covariates: Multivariate analysis of covariance (MANCOVA)

Page 57: Chapter015

Causal Modeling

• Tests a hypothesized multivariable causal explanation of a phenomenon

• Includes:

– Path analysis

– Structural equation modeling

Page 58: Chapter015

End of Presentation