Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.


Quantitative Data Analysis

Edouard Manet: In the Conservatory, 1879


Quantification of Data

1. Introduction

• To conduct quantitative analysis, responses to open-ended questions in survey research and the raw data collected using qualitative methods must be coded numerically.


Quantification of Data

1. Introduction (Continued)

• Most responses to survey research questions already are recorded in numerical format.
    • In mailed and face-to-face surveys, responses are keypunched into a data file.
    • In telephone and internet surveys, responses are automatically recorded in numerical format.


Quantification of Data

2. Developing Code Categories

• Coding qualitative data can use an existing scheme or one developed by examining the data.

• Coding qualitative data into numerical categories sometimes can be a straightforward process.
    • Coding occupation, for example, can rely upon numerical categories defined by the Bureau of the Census.


Quantification of Data

2. Developing Code Categories (Continued)

• Coding most forms of qualitative data, however, requires much effort.
    • This coding typically requires using an iterative procedure of trial and error.
    • Consider, for example, coding responses to the question, “What is the biggest problem in attending college today?”

• The researcher must develop a set of codes that are:
    • exhaustive of the full range of responses.
    • mutually exclusive (mostly) of one another.


Quantification of Data

2. Developing Code Categories (Continued)

• In coding responses to the question, “What is the biggest problem in attending college today,” the researcher might begin, for example, with a list of 5 categories, then realize that 8 would be better, then realize that it would be better to combine categories 1 and 5 into a single category and use a total of 7 categories.

• Each time the researcher makes a change in the coding scheme, it is necessary to restart the coding process to code all responses using the same scheme.


Quantification of Data

2. Developing Code Categories (Continued)

• Suppose one wanted to code more complex qualitative data (e.g., videotape of an interaction between husband and wife) into numerical categories.

• How does one code the many statements, facial expressions, and body language inherent in such an interaction?

• One can realize from this example that coding schemes can become highly complex.


Quantification of Data

2. Developing Code Categories (Continued)

• Complex coding schemes can take many attempts to develop.

• Once developed, they undergo continuing evaluation.

• Major revisions, however, are unlikely.
    • Rather, new coders are required to learn the existing coding scheme and undergo continuing evaluation for their ability to correctly apply the scheme.


Quantification of Data

3. Codebook Construction

• The end product of developing a coding scheme is the codebook.

• This document describes in detail the procedures for transforming qualitative data into numerical responses.

• The codebook should include notes that describe the process used to create codes, detailed descriptions of codes, and guidelines to use when uncertainty exists about how to code responses.
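
A codebook of this kind can be represented as a simple lookup table from numeric codes to category descriptions. The sketch below is illustrative only: the category labels, the numeric codes, and the helper name `label_for` are hypothetical and are not taken from any actual study.

```python
# Hypothetical codebook for the open-ended question
# "What is the biggest problem in attending college today?"
# All categories and codes below are invented for illustration.
CODEBOOK = {
    1: "Cost of tuition and fees",
    2: "Time management",
    3: "Course difficulty",
    4: "Other",      # catch-all category keeps the scheme exhaustive
    9: "No answer",  # reserved code for missing responses
}

def label_for(code):
    """Return the category label for a numeric code; reject undefined codes."""
    if code not in CODEBOOK:
        raise ValueError(f"Code {code!r} is not defined in the codebook")
    return CODEBOOK[code]

coded_responses = [1, 1, 3, 2, 9, 4, 1]
labels = [label_for(c) for c in coded_responses]
print(labels)
```

Rejecting undefined codes, rather than silently accepting them, is one simple way to catch errors of entry early.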


Quantification of Data

4. Data Entry

• Data recorded in numerical format can be entered by keypunching or the use of sophisticated optical scanners.

• Typically, responses to internet and telephone surveys are entered directly into a numerical database.

5. Cleaning Data

• Logical errors in responses must be reconciled.

• Errors of entry must be corrected.


Quantification of Data

6. Collapsing Response Categories

• Sometimes the researcher might want to analyze a variable by using fewer response categories than were used to measure it.

• In these instances, the researcher might want to “collapse” one or more categories into a single category.

• The researcher might want to collapse categories to simplify the presentation of the results or because few observations exist within some categories.


Quantification of Data

6. Collapsing Response Categories: Example

Response                      Frequency
Strongly disagree                     2
Disagree                             22
Neither agree nor disagree           45
Agree                                31
Strongly agree                        1


Quantification of Data

6. Collapsing Response Categories: Example

One might want to collapse the extreme responses and work with just three categories:

Response                      Frequency
Disagree                             24
Neither agree nor disagree           45
Agree                                32
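
The collapsing step can be sketched as a recode map applied to the original frequencies. This is a minimal illustration using the counts from the example; the variable names are assumptions made for the sketch.

```python
# Frequencies from the five-category example.
original = {
    "Strongly disagree": 2,
    "Disagree": 22,
    "Neither agree nor disagree": 45,
    "Agree": 31,
    "Strongly agree": 1,
}

# Recode map: each original category points to its collapsed category.
recode = {
    "Strongly disagree": "Disagree",
    "Disagree": "Disagree",
    "Neither agree nor disagree": "Neither agree nor disagree",
    "Agree": "Agree",
    "Strongly agree": "Agree",
}

# Add each original count to the total of its collapsed category.
collapsed = {}
for category, count in original.items():
    target = recode[category]
    collapsed[target] = collapsed.get(target, 0) + count

print(collapsed)
```

Keeping the recode map explicit documents the decision and makes it easy to check that the total number of cases is unchanged.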


Quantification of Data

7. Handling “Don’t Knows”

• When asking about knowledge of factual information (“Does your teenager drink alcohol?”) or opinions on a topic the subject might not know much about (“Do school officials do enough to discourage teenagers from drinking alcohol?”), it is wise to include a “don’t know” category as a possible response.

• Analyzing “don’t know” responses, however, can be a difficult task.


Quantification of Data

7. Handling “Don’t Knows” (Continued)

• The research-on-research literature regarding this issue is complex and without clear-cut guidelines for decision-making.

• The decisions about whether to use “don’t know” response categories and how to code and analyze them tend to be idiosyncratic to the research and the researcher.


Quantitative Data Analysis

• Descriptive statistics summarize the data at hand: the distribution of single variables and the relationships among variables, such as how the values of a dependent variable vary with the values of one or more independent variables.

• Inferential statistics attempt to generalize the results of descriptive statistics to a larger population of interest.


Quantitative Data Analysis

1. Data Reduction

• The first step in quantitative data analysis is to calculate descriptive statistics about variables.

• The researcher calculates statistics such as the mean, median, mode, range, and standard deviation.

• Also, the researcher might choose to collapse response categories for variables.


Quantitative Data Analysis

2. Measures of Association

• Next, the researcher calculates measures of association: statistics that indicate the strength of a relationship between two variables.

• Many measures of association rely upon the basic principle of proportionate reduction in error (PRE).


Quantitative Data Analysis

2. Measures of Association (Continued)

• PRE represents how much better one would be at guessing the outcome of a dependent variable by knowing a value of an independent variable.

• For example: How much better could I predict someone’s income if I knew how many years of formal education they have completed? If the answer to this question is “37% better,” then the PRE is 37%.
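
One common way to write this idea, stated here as a general formula rather than something given on the slides, compares the number of prediction errors made without knowledge of the independent variable (E_1) with the number made when the independent variable is known (E_2):

```latex
\mathrm{PRE} = \frac{E_1 - E_2}{E_1}
```

With hypothetical counts of E_1 = 100 errors and E_2 = 63 errors, PRE = (100 - 63) / 100 = 0.37, which corresponds to the “37% better” in the example.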


Quantitative Data Analysis

2. Measures of Association (Continued)

• Many of these statistics are designated by Greek letters.

• Different statistics are used to indicate the strength of association between variables measured at different levels of data.
    • Strength of association for nominal-level variables is indicated by λ (lambda).
    • Strength of association for ordinal-level variables is indicated by γ (gamma).
    • Strength of association for interval-level variables is indicated by correlation (r).


Quantitative Data Analysis

2. Measures of Association (Continued)

• Covariance is the extent to which two variables “change with respect to one another.”
    • As one variable increases, the other variable either increases (positive covariance) or decreases (negative covariance).

• Correlation is a standardized measure of covariance.
    • Correlation ranges from -1 to +1, with values closer to -1 or +1 indicating a stronger relationship and values near zero indicating a weaker one.


Quantitative Data Analysis

2. Measures of Association (Continued)

• Technically, covariance is the extent to which two variables co-vary about their means.
    • If a person’s years of formal education are above the mean of education for all persons and his/her income is above the mean of income for all persons, then this data point would indicate positive covariance between education and income.
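
The relationship between covariance and correlation can be sketched in a few lines. The data below are hypothetical (five invented respondents), and the function names are assumptions made for this illustration; the sample (n - 1) denominator is one common convention.

```python
from math import sqrt

# Hypothetical data: years of education and income (in $1,000s) for five people.
education = [10, 12, 14, 16, 18]
income = [25, 30, 38, 50, 57]

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson's r: covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

print(covariance(education, income))   # positive: the two variables rise together
print(correlation(education, income))  # standardized, so it lies between -1 and +1
```

Covariance depends on the units of measurement, whereas the correlation does not, which is why the correlation is the more convenient indicator of strength.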


Statistics

1. Introduction

• To make inferences from descriptive statistics, one has to know the reliability of these statistics.

• In the same sense that the distribution of one variable has a standard deviation, a parameter estimate has a standard error: the standard deviation of the distribution of estimates that would be obtained across repeated samples.


Statistics

1. Introduction (Continued)

• To better understand the concepts standard deviation and standard error, and why these concepts are important to our course, please review the presentation regarding standard error.

• Presentation on Standard Error.


Statistics

2. Types of Analysis

• The presentation on inferential statistics will cover univariate, bivariate and multivariate analysis.

• Univariate Analysis:
    • Mean.
    • Median.
    • Mode.
    • Standard deviation.


Statistics

2. Types of Analysis (Continued)

• Bivariate Analysis:
    • Tests of statistical significance.
    • Chi-square.

• Multivariate Analysis:
    • Ordinary least squares (OLS) regression.
    • Path analysis.
    • Time-series analysis.
    • Factor analysis.
    • Analysis of variance (ANOVA).


Univariate Analysis

1. Distributions

• Data analysis begins by examining distributions.

• One might begin, for example, by examining the distribution of responses to a question about formal education, where responses are recorded within six categories.

• A frequency distribution will show the number and percent of responses in each category of a variable.


Univariate Analysis

2. Central Tendency

• A common measure of central tendency is the average, or mean, of the responses.

• The median is the value of the “middle” case when all responses are rank-ordered.

• The mode is the most common response.

• When data are highly skewed, meaning heavily weighted toward one end of the distribution, the median or mode might better represent the “most common” or “centered” response.


Univariate Analysis

2. Central Tendency (Continued)

• Consider this distribution of respondent ages:
    • 18, 19, 19, 19, 20, 20, 21, 22, 85

• The mean equals 27. But this number does not adequately represent the “common” respondent because the one person who is 85 skews the distribution toward the high end.

• The median equals 20.
    • This measure of central tendency gives a more accurate portrayal of the “middle of the distribution.”
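
These figures can be checked with Python’s standard `statistics` module. This is a minimal sketch using the nine ages from the example.

```python
import statistics

ages = [18, 19, 19, 19, 20, 20, 21, 22, 85]

print(statistics.mean(ages))    # 27
print(statistics.median(ages))  # 20
print(statistics.mode(ages))    # 19
```

Dropping the single 85-year-old would change the mean substantially but would barely move the median, which is why the median is preferred for skewed data.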


Univariate Analysis

3. Dispersion

• Dispersion refers to the way the values are distributed around some central value, typically the mean.

• The range is the distance separating the lowest and highest values (e.g., the ages listed previously run from 18 to 85, a range of 67).

• The standard deviation is an index of the amount of variability in a set of data.
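
As a brief sketch, the range and standard deviation of the ages listed previously can be computed as follows. The comparison without the 85-year-old is an addition made here to show how strongly one extreme value affects the standard deviation.

```python
import statistics

ages = [18, 19, 19, 19, 20, 20, 21, 22, 85]

age_range = max(ages) - min(ages)          # 85 - 18 = 67
sd_all = statistics.stdev(ages)            # sample standard deviation, all nine ages
sd_trimmed = statistics.stdev(ages[:-1])   # same statistic without the 85-year-old

print(age_range, round(sd_all, 1), round(sd_trimmed, 1))
```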


Univariate Analysis

3. Dispersion (Continued)

• The standard deviation represents dispersion with respect to the normal (bell-shaped) curve.

• Assuming a set of numbers is normally distributed, then each standard deviation equals a certain distance from the mean.

• Each standard deviation (+1, +2, etc.) is the same distance from each other on the bell-shaped curve, but represents a declining percentage of responses because of the shape of the curve (see: Chapter 7).


Univariate Analysis

3. Dispersion (Continued)

• For example, the first standard deviation accounts for 34.1% of the values on each side of the mean.
    • The figure 34.1% is derived from probability theory and the shape of the curve.
    • Thus, approximately 68% of all responses fall within one standard deviation of the mean.
    • The second standard deviation accounts for the next 13.6% of the responses on each side of the mean (27.2% of all responses), and so on.
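
The percentages quoted for each standard deviation can be reproduced from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are chosen for this illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

first_sd_one_side = z.cdf(1) - z.cdf(0)    # about 0.341
second_sd_one_side = z.cdf(2) - z.cdf(1)   # about 0.136
within_one_sd = z.cdf(1) - z.cdf(-1)       # about 0.683
within_two_sd = z.cdf(2) - z.cdf(-2)       # about 0.954

print(first_sd_one_side, second_sd_one_side, within_one_sd, within_two_sd)
```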


Univariate Analysis

3. Dispersion (Continued)

• If the responses are approximately normally distributed and the range of responses is low, meaning that most responses fall close to the mean, then the standard deviation will be small.
    • The standard deviation of professional golfers’ scores on a golf course will be low.
    • The standard deviation of amateur golfers’ scores on a golf course will be high.


Univariate Analysis

4. Continuous and Discrete Variables

• Continuous variables have responses that form a steady progression (e.g., age, income).

• Discrete (i.e., categorical) variables have responses that are considered to be separate from one another (e.g., sex of respondent, religious affiliation).


Univariate Analysis

4. Continuous and Discrete Variables

• Sometimes, it is a matter of debate within the community of scholars about whether a measured variable is continuous or discrete.

• This issue is important because the statistical procedures appropriate for continuous-level data are more powerful, easier to use, and easier to interpret than those for discrete-level data, especially as related to the measurement of the dependent variable.


Univariate Analysis

4. Continuous and Discrete Variables (Continued)

• Example: Suppose one measures amount of formal education within five categories (less than high school, high school, two years of vocational school or college, college, post-college).

• Is this measure continuous (i.e., 1-5) or discrete?

• In practice, five categories seems to be a cutoff point for considering a variable as continuous.

• Using a seven-point response scale will give the researcher a greater chance of deeming a variable to be continuous.


Bivariate Analysis

1. Introduction

• Bivariate analysis refers to an examination of the relationship between two variables.

• We might ask these questions about the relationship between two variables:

• Do they seem to vary in relation to one another? That is, as one variable increases in size does the other variable increase or decrease in size?

• What is the strength of the relationship between the variables?


Bivariate Analysis

1. Introduction (Continued)

• Divide the cases into groups according to the attributes of the independent variable (e.g., men and women).

• Describe each subgroup in terms of attributes of the dependent variable (e.g., what percent of men approve of sexual equality and what percent of women approve of sexual equality).


Bivariate Analysis

1. Introduction (Continued)

• Read the table by comparing the independent variable subgroups with one another in terms of a given attribute of the dependent variable (e.g., compare the percentages of men and women who approve of sexual equality).

• Bivariate analysis gives an indication of how the dependent variable differs across levels or categories of an independent variable.

• This relationship does not necessarily indicate causality.


Bivariate Analysis

1. Introduction (Continued)

• Tables that compare responses to a dependent variable across levels/categories of an independent variable are called contingency tables (or sometimes, “crosstabs”).

• When writing a research report, it is common practice, even when conducting highly sophisticated statistical analysis, to present contingency tables also to give readers a sense of the distributions and bivariate relationships among variables.


Bivariate Analysis

2. Tests of Statistical Significance

• If one assumes a normal distribution, then one can examine parameters and their standard errors with respect to the normal curve to evaluate whether an observed parameter differs from zero by some set margin of error.

• Assume that the researcher sets the probability of a Type I error (i.e., the probability of concluding that a relationship exists when there is none) at 5%.
    • That is, we set our margin of error very low, just 5%.


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• To evaluate statistical significance, the researcher compares a parameter estimate to a “zero point” on a normal curve (its center).

• The question becomes: Is this parameter estimate sufficiently large, given its standard error, that, within a 5% probability of error, we can state that it is not equal to zero?


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• To achieve a probability of error of 5%, the parameter estimate must lie almost two (i.e., 1.96) standard errors from zero; that is, 1.96 standard deviations of its sampling distribution.

• Sometimes in sociological research, scholars say “two standard deviations” in referring to a 5% error rate. Most of the time, they are more precise and state 1.96.


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• Consider this example:
    • Suppose the unstandardized estimate of the effect of self-esteem on marital satisfaction equals 3.50 (i.e., each one-unit increase in self-esteem on its scale is associated with a 3.50-unit increase in marital satisfaction on its scale).

• Suppose the standard error of this estimate equals 1.20.


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• If we divide 3.50 by 1.20 we obtain the ratio of 2.92. This figure is called a t-ratio (or, t-value).

• The figure 2.92 means that the estimate 3.50 is 2.92 standard errors from zero.

• Based upon our set margin of error of 5% (which corresponds to 1.96 standard errors), we can state that at prob. < .05, the effect of self-esteem on marital satisfaction is statistically significant.
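
The arithmetic of this example can be sketched as follows. The code relies on the normal curve, as the slides do, and uses `statistics.NormalDist` from the standard library; the variable names are assumptions made for the illustration.

```python
from statistics import NormalDist

estimate = 3.50        # unstandardized effect of self-esteem on marital satisfaction
standard_error = 1.20  # standard error of that estimate

t_ratio = estimate / standard_error            # about 2.92
critical_value = NormalDist().inv_cdf(0.975)   # about 1.96 for a two-tailed 5% test

# Two-tailed probability of a ratio at least this large if the true effect were zero.
p_value = 2 * (1 - NormalDist().cdf(abs(t_ratio)))

print(round(t_ratio, 2), round(critical_value, 2), p_value < 0.05)
```

With small samples, the t distribution rather than the normal curve would normally be consulted, which gives a somewhat larger critical value than 1.96.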


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• The t-ratio is the ratio of a parameter estimate to its standard error.

• The t-ratio equals the number of standard errors that an estimate lies from the “zero point” (i.e., center) of the normal curve.


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• Why do we state that we need to have 1.96 standard deviations from the zero point of the normal curve?

• Recall the area beneath the normal curve:

• The first standard deviation covers 34.1% of the observations on one side of the zero point.

• The second standard deviation covers the next 13.6% of the observations.


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• Let’s assume for a moment that our estimate is greater than the “real” effect of self-esteem on marital satisfaction.
    • Then, at 1.96 standard deviations, we have covered the 50% probability below the “real” effect, and we have covered 34.1% + 13.4% probability above this effect (13.4% rather than 13.6% because we stop at 1.96, just short of the full second standard deviation).

• In total, we have accounted for 97.5% of the probability that our estimate does not equal zero.


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• That leaves 2.5% of the probability above the “real” estimate.

• But we have to recognize that our estimate might have fallen below the “real” estimate.

• So, we have the probability of error on both sides of “reality.”
    • 2.5% + 2.5% equals 5%.
    • This is our set margin of error!
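
The two tail areas can be verified directly from the standard normal curve. This is a small sketch using `statistics.NormalDist`; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve

upper_tail = 1 - z.cdf(1.96)                  # area above +1.96, about 0.025
lower_tail = z.cdf(-1.96)                     # area below -1.96, about 0.025
two_tailed_error = upper_tail + lower_tail    # about 0.05

# Area between 1 and 1.96 standard deviations on one side: about 0.134.
between_1_and_1_96 = z.cdf(1.96) - z.cdf(1)

print(upper_tail, lower_tail, two_tailed_error, between_1_and_1_96)
```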


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• Thus, inferential statistics are calculated with respect to the properties of the normal curve.

• There are other types of distributions besides the normal curve, but the normal distribution is the one most often used in sociological analysis.


Bivariate Analysis

2. Tests of Statistical Significance (Continued)

• If we know the properties of the normal curve, and we have calculated an estimate of a parameter, and we know the standard error of this estimate (i.e., a measure of how much the estimate would vary from sample to sample), then we can calculate statistical significance.

• Recall that statistical significance does not necessarily equal substantive significance.


Bivariate Analysis

3. Chi-Square

• Chi-square is a test of independence between two variables.

• Typically, one is interested in knowing whether an independent variable (x) “has some effect” on a dependent variable (y).

• Said another way, we want to know if y is independent of x (i.e., if it goes its own way regardless of what happens to x).

• Thus, we might ask, “Is church attendance independent of the sex of the respondent?”


Bivariate Analysis

3. Chi-Square (Continued)

• Scenario 1: Consider these data on sex of the subject and church attendance:

                 Church Attendance
Sex        Yes     No    Total
Male        28     12       40
Female      42     18       60
Total       70     30      100


Bivariate Analysis

3. Chi-Square (Continued)

• Note that:
    • 70% of all persons attend church.
    • 70% of men attend church.
    • 70% of women attend church.

• Thus, we can say that church attendance is independent of the sex of the respondent because, if 70% of all respondents attend church, then, with independence, we expect 70% of men and 70% of women to attend church, and they do.


Bivariate Analysis

3. Chi-Square (Continued)

• Scenario 2: Now, suppose we observed this pattern of church attendance:

                 Church Attendance
Sex        Yes     No    Total
Male        20     20       40
Female      50     10       60
Total       70     30      100


Bivariate Analysis

3. Chi-Square (Continued)

• Note that:
    • 70% of all persons attend church.

• Therefore, if church attendance is independent of the sex of the respondent, then we expect 70% of the men and 70% of the women to attend church.

• But they do not.
    • Instead, 50% of the men attend church and 83.3% of the women attend church.


Bivariate Analysis

3. Chi-Square (Continued)

• So, for this second set of data, is church attendance independent of the sex of the respondent?

• Let’s begin by calculating how much error we would make by assuming men and women behave as expected.

• That is, for each cell of the table, we will calculate the difference between the observed and expected values.


Bivariate Analysis

3. Chi-Square (Continued)

• Each cell shows the observed value minus the expected value.

                 Church Attendance
Sex        Yes              No
Male       20 - 28 = -8     20 - 12 = 8
Female     50 - 42 = 8      10 - 18 = -8


Bivariate Analysis

3. Chi-Square (Continued)

• Note that in each cell, if we assume independence, we make a mistake equal to “8” (sometimes positive and sometimes negative).

• If we add all of our mistakes, we obtain a sum of zero, which would wrongly suggest that we made no mistakes at all.

• So, we will square each mistake to give every number a positive valence.


Bivariate Analysis

3. Chi-Square (Continued)

• How badly did we do in each cell?

• To know the relative magnitude of our mistake in each cell, we will divide the squared mistake by the expected value in the cell (a PRE measure).

• The following table shows our proportionate degree of error in each cell and our total amount of proportionate error for the entire table.


Bivariate Analysis

3. Chi-Square (Continued)

• Proportionate error is calculated for each cell:

                 Church Attendance
Sex        Yes                    No
Male       (-8)^2 / 28 = 2.29     (8)^2 / 12 = 5.33
Female     (8)^2 / 42 = 1.52      (-8)^2 / 18 = 3.56

The total of all proportionate error = 12.70. This is the chi-square value for this table.
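
The whole calculation, from observed counts to the chi-square value, can be sketched in a few lines. The code uses the Scenario 2 counts; the variable names are assumptions made for the illustration.

```python
# Observed counts from Scenario 2: rows are Male, Female; columns are Yes, No.
observed = [[20, 20],
            [50, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)              # [[28.0, 12.0], [42.0, 18.0]]
print(round(chi_square, 2))  # 12.7
```

The expected counts reproduce the Scenario 1 table, which is exactly the pattern that independence would produce with these margin totals.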


Bivariate Analysis

3. Chi-Square (Continued)

• Our chi-square value of 12.70 gives us a number that summarizes our proportionate amount of mistakes for the whole table.

• Is this number big enough to indicate a lack of independence between church attendance and sex of the respondent?

• To make this assessment, we compare our observed chi-square with a standardized distribution of PRE measures: the chi-square distribution.


Bivariate Analysis

3. Chi-Square (Continued)

• The chi-square distribution looks like a lopsided version of the normal curve.

• To compare our observed chi-square with this distribution, we need some indication of where we should be on the distribution, as we did with standard errors on the normal curve.

• On the chi-square distribution, we are “allowed” a certain amount of error depending upon our degrees of freedom.


Bivariate Analysis

3. Chi-Square (Continued)

• To understand degrees of freedom, reconsider our table on observed church attendance:

                 Church Attendance
Sex        Yes     No    Total
Male        20     20       40
Female      50     10       60
Total       70     30      100

Given the margin totals, once we fill in one cell with the correct number, all the other cells are given.


Bivariate Analysis

3. Chi-Square (Continued)

• Degrees of freedom are the number of cells one must fill in correctly before all the other cells are determined by the margin totals.

• Our table has one degree of freedom.

• The more correct guesses one must make, the greater the degrees of freedom and the more proportionate amount of error one is “allowed” within the chi-square distribution before claiming a lack of independence.


Bivariate Analysis

3. Chi-Square (Continued)

• The amount of chi-square we are allowed, at a probability of error set to 5%, for one degree of freedom, equals 3.841.

• Our chi-square exceeds this amount. Thus, we can claim a lack of independence between church attendance and sex of the subject at a probability of error equal to less than 5%.
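
The comparison with the critical value can be sketched as follows. The rule (rows - 1) × (columns - 1) for degrees of freedom is the standard one for contingency tables and is not stated on the slides; the p-value shortcut in the code is valid for one degree of freedom only.

```python
from math import erfc, sqrt

chi_square = 12.70   # chi-square value for the Scenario 2 table
rows, cols = 2, 2
degrees_of_freedom = (rows - 1) * (cols - 1)   # 1 for a 2 x 2 table

critical_value = 3.841   # 5% critical value for one degree of freedom

# With one degree of freedom, chi-square is a squared standard normal value,
# so its upper-tail probability equals the two-tailed normal probability.
p_value = erfc(sqrt(chi_square / 2))

print(degrees_of_freedom, chi_square > critical_value, p_value < 0.05)
```

For tables with more than one degree of freedom, the critical value would have to be looked up in a chi-square table or computed with a statistics library.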


Bivariate Analysis

3. Chi-Square (Continued)

• Are you wondering where the number 3.841 comes from? It is 1.96 squared (1.96 × 1.96 ≈ 3.84).

• Remember 1.96? It is the number of standard deviations within the normal curve that indicates a 5% Type I error rate.

• The t-ratios for the effects of the independent variables in regression analysis each had one degree of freedom.

• So, we are working with the same principles we used for the normal curve, but with a different distribution: the chi-square distribution.


Bivariate Analysis

4. Some Words of Caution

1. Recognize that statistical significance does not necessarily mean that one has substantive significance.

2. Statistical significance refers to mistakes made from sampling error only.

3. Tests of statistical significance depend upon assumptions about sampling and distributions of data, which are not always met in practice.


Multivariate Analysis

1. Regression Analysis

• Regression analysis is a procedure for estimating the outcome of a dependent variable based upon the value of an independent variable.

• Thus, for just two variables, regression analysis is the same as analysis using the covariance or correlation between the variables.
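
For the two-variable case, the link between regression, covariance, and correlation can be sketched directly. The data are hypothetical (five invented respondents) and the variable names are assumptions made for the illustration.

```python
from math import sqrt

# Hypothetical data: years of education (x) and income in $1,000s (y).
x = [10, 12, 14, 16, 18]
y = [25, 30, 38, 50, 57]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

slope = sxy / sxx                    # change in y per one-unit change in x
intercept = mean_y - slope * mean_x  # predicted y when x = 0
r = sxy / sqrt(sxx * syy)            # Pearson correlation
r_squared = r ** 2                   # share of the variance in y explained by x

print(slope, intercept, round(r_squared, 3))
```

The slope is the covariance of x and y divided by the variance of x, and the squared correlation equals the proportion of variance explained, which is why the two-variable regression carries the same information as the correlation.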


Multivariate Analysis

1. Regression Analysis (Continued)

• Typically, regression analysis is used to simultaneously examine the effects of more than one independent variable on a dependent variable.

• One might want to know, for example, the ability to predict income by knowing the education, age, race, and sex of the respondent.


Multivariate Analysis

1. Regression Analysis (Continued)

• The statistic used to summarize the total PRE of multiple variables is the multiple correlation squared, or R-square.

• R-square represents the proportion of the total variance in the dependent variable that is explained.

• It represents “how well we did” in explaining the topic we wanted to explain.


Multivariate Analysis

1. Regression Analysis (Continued)

• R-square ranges from 0 to +1, wherein the larger the value of R-square, the greater the predictive ability of the independent variables.

• The predictive ability of each variable is indicated by the statistic β (beta).

Page 73: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• Consider this equation:

• y = α + β1x1 + β2x2 + β3x3 + β4x4 + ε

• where:
  • y = the value of the dependent variable,
  • α = the intercept, or “starting point” of y,
  • βi = the strength of the effect of xi on y,
  • ε = the amount of error in the prediction of y.

Page 74: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• β is called a parameter estimate. It represents the amount of change in y for a one unit change in x.

• For example, a beta of .42 would mean that for each one unit change in x (e.g., education) we would expect to observe a .42 unit change in y.

Page 75: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• For the example we discussed earlier, we can rewrite the equation as:

• Income = α + β1education + β2age + β3race + β4sex + ε

• where each of the betas (β) ranges in size from -∞ to +∞ to let us know the direction and strength of the relationship between each independent variable and income.

Page 76: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• In standardized form, this equation is:

• Income = β*1education + β*2age + β*3race + β*4sex + ε

• where each of the standardized betas (β*) typically ranges in size from -1 to +1.

• Note that the intercept (α) is omitted because, in standardized form, it equals zero.
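The effect of standardizing can be sketched for the simplest case of one independent variable. The code below converts both variables to z-scores, refits the line, and shows that the intercept drops to zero while the slope becomes the raw slope multiplied by sd(x)/sd(y). The data and helper names are hypothetical.

```python
from math import sqrt

# Hypothetical data: years of education (x) and income in $10,000s (y).
x = [10, 12, 12, 14, 16, 16, 18, 20]
y = [2.1, 2.9, 3.4, 3.8, 4.9, 5.2, 5.6, 7.0]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Sample standard deviation."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def zscores(values):
    m, s = mean(values), sd(values)
    return [(v - m) / s for v in values]


def fit_line(xs, ys):
    """Least-squares intercept and slope for one independent variable."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


raw_intercept, raw_beta = fit_line(x, y)
std_intercept, std_beta = fit_line(zscores(x), zscores(y))

# Standardized beta = raw beta rescaled by the ratio of standard deviations.
rescaled_beta = raw_beta * sd(x) / sd(y)

print(round(std_intercept, 10), round(std_beta, 4), round(rescaled_beta, 4))
```

Because every standardized variable has a mean of zero, the fitted line must pass through the origin, which is why the intercept disappears.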

Page 77: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• Each of the beta terms in these equations represents the partial effect of the variable on the dependent variable, meaning the effect of the independent variable on y after controlling for the effects of all other variables on y.

• The partial effects of independent variables in explaining the variance in a dependent variable can be visualized by thinking about the contributions of each player on a basketball team to the overall team performance.

Page 78: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• Suppose the team wins, 65-60. The player at center is the leading scorer with 18 points.

• So, we might say that the center is the most important contributor to the win. “Not so fast,” says regression analysis.

• Regression analysis also wants to know the contributions of the other players on the team and how they helped the center.

Page 79: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• Suppose that the point guard had 10 assists, 8 of which went to the center. Eight times the point guard drove the lane and then passed the ball to the center for an easy layup, accounting for 16 of the 18 points scored by the center.

• To best understand the contributions of the center, we would calculate the contributions of the center while “controlling for” the contributions of the point guard.

Page 80: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• Similarly, regression analysis shows the contribution to R-square for each variable, while controlling for the contributions of the other variables.

• The contribution of each variable in explaining variance in the dependent variable is summarized as a partial beta coefficient.
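The idea of "controlling for" another variable can be sketched with two independent variables. In the hypothetical data below, y is built so that the true partial effect of x1 is 2 and that of x2 is 3, and x1 and x2 are correlated (like the center and the point guard). The partial slope is obtained here by first removing from x1 whatever x2 can predict and then regressing y on what is left; this residual approach is one standard way to get the partial coefficient, not necessarily how any given statistics package computes it.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Least-squares intercept and slope for one independent variable."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical, correlated independent variables.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
# Dependent variable constructed with known partial effects (2 and 3).
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

# Naive estimate: regress y on x1 alone, ignoring x2.
_, simple_slope = fit_line(x1, y)

# Partial estimate: strip from x1 the part predictable from x2 ...
a, b = fit_line(x2, x1)
x1_residual = [v1 - (a + b * v2) for v1, v2 in zip(x1, x2)]
# ... then regress y on the part of x1 that is independent of x2.
_, partial_slope = fit_line(x1_residual, y)

print(round(simple_slope, 3), round(partial_slope, 3))
```

The slope from the one-variable regression is inflated because x1 gets credit for some of the work done by x2; the partial slope recovers the effect of x1 alone.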

Page 81: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• In summary, regression analysis provides two indications of our ability to explain how societies work:

• The R-Square shows how much variance is explained in the dependent variable.

• The standardized betas (parameter estimates) show the partial effects of the independent variables in explaining the dependent variable.

Page 82: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• The graphic on the next slide diagrams a regression of income (y) on education (x).

• The regression equation (Y2) is shown as a blue line. The intercept (α) is located where the regression line meets the y axis.

• The slope of the line is the beta coefficient (β), which equals .42.

Page 83: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

[Figure: regression line of income (y) on education (x), with intercept α and slope β = .42.]

Page 84: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• We would interpret the results of the regression equation shown on the preceding slide in this manner: “A one unit change in education will result in a .42 unit change in income.”

• We can adjust this interpretation into actual units of education and income as we measured them in our study, to state, for example, “Each additional year of education results in an additional $4,200 in annual income.”
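The conversion into actual units is simple arithmetic once the measurement units are known. The sketch below assumes, purely for illustration, that education was recorded in years and income in units of $10,000; the slides do not state the units, so `INCOME_UNIT_DOLLARS` is a hypothetical value chosen to be consistent with the $4,200 figure.

```python
BETA = 0.42                    # slope of income on education, from the regression
INCOME_UNIT_DOLLARS = 10_000   # assumption: one unit of income equals $10,000


def expected_income_change(extra_years, beta=BETA, unit=INCOME_UNIT_DOLLARS):
    """Expected change in annual income, in dollars, for extra years of education."""
    return beta * extra_years * unit


one_year = expected_income_change(1)
four_years = expected_income_change(4)
print(round(one_year), round(four_years))
```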

Page 85: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• One should be cautious about interpreting the results of regression analysis:

• A high R-square value does not necessarily mean that the researcher can be confident of knowing cause and effect.

• Predictions regarding the dependent variable are valid only within the range of the independent variables used in the regression analysis.

Page 86: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

1. Regression Analysis (Continued)

• The preceding discussion has focused upon linear regression.

• Regression lines can also be curvilinear, or some combination of straight and curved segments.

Page 87: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

2. Path Analysis

• Path analysis is the simultaneous calculation of regression coefficients within a complex model of direct and indirect relationships.

• The elaboration model regarding the success of women-owned businesses is an example of path analysis.

• Path analysis is a very powerful tool for examining cause and effect within a complex theoretical model.
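A minimal sketch of direct and indirect relationships, for a three-variable model in which x affects y both directly and through an intervening variable m. The correlations are hypothetical and all variables are assumed to be standardized, so the path coefficients can be computed from the correlations with the usual two-predictor formulas.

```python
# Hypothetical correlations among three standardized variables.
r_xm = 0.50   # x with the intervening variable m
r_xy = 0.40   # x with the dependent variable y
r_my = 0.60   # m with y

# Path from x to m: with one predictor, the standardized beta is the correlation.
path_x_to_m = r_xm

# Paths into y: standardized partial betas from regressing y on x and m together.
denominator = 1 - r_xm ** 2
path_x_to_y = (r_xy - r_my * r_xm) / denominator   # direct effect of x on y
path_m_to_y = (r_my - r_xy * r_xm) / denominator   # effect of m on y, controlling for x

# Indirect effect of x on y: the product of the paths along x -> m -> y.
indirect_effect = path_x_to_m * path_m_to_y
total_effect = path_x_to_y + indirect_effect

print(round(path_x_to_y, 3), round(indirect_effect, 3), round(total_effect, 3))
```

In this simple model the direct and indirect effects add up to the original correlation between x and y, which shows how path analysis divides an observed relationship into its component routes.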

Page 88: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

3. Time-Series Analysis

• Time-series analysis uses comparisons of statistics and/or parameter estimates across time to learn how changes in the independent variable(s) affect changes in the dependent variable(s).

• Time-series analysis, when the data are available, can be a powerful tool for gaining a stronger indication of cause and effect than one learns from a cross-sectional analysis.

Page 89: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

4. Factor Analysis

• Factor analysis indicates the extent to which a set of variables measures the same underlying concept.

• This procedure assesses the extent to which variables are highly correlated with one another compared with other sets of variables.

• Consider the table of correlations (i.e., a “correlation matrix”) on the following slide:

Page 90: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

4. Factor Analysis (Continued)

      X1    X2    X3    X4    X5    X6
X1    1    .52   .60   .21   .15   .09
X2   .52    1    .59   .12   .13   .11
X3   .60   .59    1    .08   .10   .10
X4   .21   .12   .08    1    .72   .70
X5   .15   .13   .10   .72    1    .73
X6   .09   .11   .10   .70   .73    1

Page 91: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

4. Factor Analysis (Continued)

• Note that variables X1-X3 are moderately correlated with one another, but have weak correlations with variables X4-X6.

• Similarly, variables X4-X6 are moderately correlated with one another, but have weak correlations with variables X1-X3.

• The figures in this table indicate that variables X1-X3 “go together” and variables X4-X6 “go together.”
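The "go together" pattern can be checked mechanically. The sketch below is not factor analysis itself; it is a simple grouping heuristic that links any two variables whose correlation is at least a chosen cutoff. The cutoff of .50 and the function name `group_variables` are assumptions made for illustration, and the diagonal of the matrix is set to 1, as in any correlation matrix.

```python
names = ["X1", "X2", "X3", "X4", "X5", "X6"]

# Correlation matrix from the slide (diagonal entries equal 1).
R = [
    [1.00, 0.52, 0.60, 0.21, 0.15, 0.09],
    [0.52, 1.00, 0.59, 0.12, 0.13, 0.11],
    [0.60, 0.59, 1.00, 0.08, 0.10, 0.10],
    [0.21, 0.12, 0.08, 1.00, 0.72, 0.70],
    [0.15, 0.13, 0.10, 0.72, 1.00, 0.73],
    [0.09, 0.11, 0.10, 0.70, 0.73, 1.00],
]


def group_variables(matrix, labels, cutoff=0.5):
    """Group variables that are linked by correlations at or above the cutoff."""
    n = len(labels)
    assigned = set()
    groups = []
    for start in range(n):
        if start in assigned:
            continue
        # Collect everything reachable from this variable through strong correlations.
        stack, members = [start], []
        while stack:
            j = stack.pop()
            if j in assigned:
                continue
            assigned.add(j)
            members.append(j)
            for k in range(n):
                if k not in assigned and abs(matrix[j][k]) >= cutoff:
                    stack.append(k)
        groups.append([labels[j] for j in sorted(members)])
    return groups


groups = group_variables(R, names)
print(groups)
```

A real factor analysis goes further, estimating how strongly each variable loads on each underlying factor, but the grouping it finds for a matrix this clean would follow the same two blocks.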

Page 92: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

4. Factor Analysis (Continued)

• Factor analysis would separate variables X1-X3 into “Factor 1” and variables X4-X6 into “Factor 2.”

• Suppose variables X1-X3 were designed by the researcher to measure self-esteem and variables X4-X6 were designed to measure marital satisfaction.

Page 93: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

4. Factor Analysis (Continued)

• The researcher could use the results of factor analysis, including the statistics produced by it, to evaluate the construct validity of using X1-X3 to measure self-esteem and using X4-X6 to measure marital satisfaction.

• Thus, factor analysis can be a useful tool for confirming the validity of measures of latent variables.

Page 94: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

4. Factor Analysis (Continued)

• Factor analysis can be used also for exploring groupings of variables.

• Suppose a researcher has a list of 20 statements that measure different opinions about same-sex marriage.

• The researcher might wonder whether the 20 opinions reflect a smaller number of “basic” opinions.

Page 95: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

4. Factor Analysis (Continued)

• Factor analysis of responses to these statements might indicate, for example, that they can be reduced into three latent variables, related to religious beliefs, beliefs about civil rights, and beliefs about sexuality.

• Then, the researcher can create scales of the grouped variables to measure religious beliefs, beliefs about civil rights, and beliefs about sexuality, and use these scales to examine support for same-sex marriage.

Page 96: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

5. Analysis of Variance

• Analysis of variance (ANOVA) examines whether the mean value for one group differs from that of another group.

• Is the mean income for males, for example, statistically different from the mean income for females?

Page 97: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

5. Analysis of Variance (Continued)

• For examining mean differences across the categories of just one other variable, the researcher uses one-way ANOVA, which is equivalent to a t-test when that variable has only two categories.

• For two other variables, the researcher uses two-way ANOVA (and, more generally, factorial ANOVA for more than two). The researcher might be interested, for example, in knowing how mean incomes differ based upon sex of subject and level of education.
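The equivalence between one-way ANOVA and the t-test for two groups can be sketched numerically: the F-ratio from the ANOVA equals the square of the pooled-variance t-ratio. The income figures below (in $1,000s) are hypothetical, and the function names are chosen only for illustration.

```python
from math import sqrt

# Hypothetical annual incomes, in $1,000s.
males = [52, 48, 61, 55, 47, 58]
females = [45, 50, 43, 49, 41, 46]


def mean(values):
    return sum(values) / len(values)


def one_way_f(groups):
    """F-ratio: between-group mean square divided by within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def pooled_t(a, b):
    """Two-sample t-ratio using the pooled estimate of the variance."""
    ss_a = sum((v - mean(a)) ** 2 for v in a)
    ss_b = sum((v - mean(b)) ** 2 for v in b)
    pooled_variance = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_variance * (1 / len(a) + 1 / len(b)))


f_ratio = one_way_f([males, females])
t_ratio = pooled_t(males, females)
print(round(f_ratio, 4), round(t_ratio ** 2, 4))
```

With three or more groups there is no single t-ratio to square, which is where ANOVA becomes the more general tool.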

Page 98: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

5. Analysis of Variance (Continued)

• The logic of a statistical test of a difference in means is identical to that of testing whether an estimate differs from zero, except that the comparison point is the mean of the other group rather than zero.

• Rather than using just the estimate and its standard error for a single group, the procedure is to use the estimates and standard errors of two groups to assess statistical significance.

Page 99: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

5. Analysis of Variance (Continued)

• Suppose we wanted to know if the mean height of male ISU students differs significantly from the mean height of female ISU students.

• Rather than comparing the mean height of male ISU students to a hypothetical zero point, we would compare it to the mean height of female ISU students, where this comparison takes place within the context of standard errors and the shape of the normal curve.

Page 100: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

5. Analysis of Variance (Continued)

• Suppose we find in our sample of 100 female ISU students that their mean height equals 65 inches with a standard deviation of 1.5 inches. These figures indicate that most females (68.2%) are 63.5 to 66.5 inches in height, and that the standard error of the mean is 1.5/√100 = 0.15 inches.

• Suppose that a sample of 100 male ISU students shows a mean height of 70 inches with a standard deviation of 2.0 inches, giving a standard error of 2.0/√100 = 0.20 inches.

Page 101: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

5. Analysis of Variance (Continued)

• Let’s set our significance level (the probability of a Type-1 error) at 5%, meaning that we are looking at 1.96 standard errors on the normal curve to indicate statistical significance.

• Here is our question: If we allow the mean of females to “grow” by 1.96 standard errors and the mean of males to “shrink” by 1.96 standard errors, will they reach one another?

Page 102: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Multivariate Analysis

5. Analysis of Variance (Continued)

• The answer is no, not even close. Treating 1.5 and 2.0 inches as the standard deviations of the two samples of 100 students, the standard error of the difference in means is √(1.5²/100 + 2.0²/100) = 0.25 inches, so the t-ratio (the number of standard errors separating the two means) equals 5/0.25 = 20.

• We can state that the difference in mean heights between ISU males and females is statistically significant at p < .05 (actually, considerably less than that; but that was our test level).
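The height comparison can be worked through in a few lines. This sketch assumes that 1.5 and 2.0 inches are the standard deviations of the two samples (so that each standard error is the standard deviation divided by the square root of 100) and that the two samples are independent; under those assumptions the t-ratio comes out at 20. The function name `t_ratio_for_means` is hypothetical.

```python
from math import sqrt

Z_CRITICAL = 1.96   # normal-curve cutoff for a 5% significance level


def t_ratio_for_means(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference in means divided by the standard error of that difference."""
    se_a = sd_a / sqrt(n_a)                 # standard error of the first mean
    se_b = sd_b / sqrt(n_b)                 # standard error of the second mean
    se_difference = sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_difference


t = t_ratio_for_means(70, 2.0, 100, 65, 1.5, 100)

# The "grow" and "shrink" check: do the two means meet after moving
# each one 1.96 standard errors toward the other?
female_upper = 65 + Z_CRITICAL * (1.5 / sqrt(100))
male_lower = 70 - Z_CRITICAL * (2.0 / sqrt(100))
means_meet = female_upper >= male_lower

print(round(t, 2), round(female_upper, 2), round(male_lower, 2), means_meet)
```

The "grow" and "shrink" picture is a helpful intuition; the formal test uses the single t-ratio, which compares the difference in means to the standard error of that difference.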

Page 103: Quantitative Data Analysis Edouard Manet: In the Conservatory, 1879.

Summary of Data Analysis

• Sociologists have at their disposal a wide range of statistical techniques to help them understand relationships among their variables of interest.

• These techniques, when used properly, can help sociologists understand human societies for the purpose of improving human well-being.

• Students who want to be professional sociologists must learn statistics and the proper applications of these statistics to data analysis.

• Enjoy!