Lecture 4

Survey Design and Data Coding


Transcript of Lecture 4

Page 1: Lecture 4

Lecture 4

Survey Design and Data Coding

Page 2: Lecture 4

Overview

What is a survey?

Question design and considerations

Testing a survey instrument

Data considerations (data coding)

Page 3: Lecture 4

What is a Survey?

To survey: “act of looking or seeing, observing”

Research surveys:

Qualitative interviews, focus groups

Specific, systematic, quantitative data-collection instruments


Page 5: Lecture 4

Example Surveys

US Census

General Social Survey

Online surveys

Page 6: Lecture 4

Some General Considerations for a Survey

Must have a logic for each question

Must have a logic for the question responses (if provided)

Must have a logic for the sequencing of questions

Must have clear wording

Must have clear formatting

Must take into account the sample population that will actually take/use the survey instrument: culture, language, interpretive ambiguity

Page 7: Lecture 4

Question Logic

Avoid redundant questions unless you have a reason.

Exception: sometimes it is a good idea to ask two questions that tap the same concept (first as a scale, then as a categorical decision).

Example (ranks): First ask respondents to rate, from 1 to 10, how much they agree with several statements. Next, ask them to rank the statements by how much they agree with them (e.g., with 5 statements, rank them 1-5).

Page 8: Lecture 4

Question Logic (continued)

Avoid asking unnecessary questions. Example: in a survey on computer usage, are these needed?

Socio-demographic questions (age, gender)?

Risk behavior questions (drug use, etc.)?

Page 9: Lecture 4

Question Response Logic: Scales versus Categories

As a basic rule, metric scales with more range are better than binary or categorical responses (when appropriate).

Example: Happiness

Binary: “Are you happy or sad today?”

Scale: “How happy are you on a scale of 0-10 (0 is least happy, 10 is most happy)?”

With general scales and Likert scales, consider having no “middle” category (neutral, no opinion), so that respondents must lean one way or the other.

Page 10: Lecture 4

Question Response Logic: Categorical Responses

Responses must be mutually exclusive. Example (bad): “Where do you live?” Berkeley, San Fran, Bay Area, Other. Berkeley and San Francisco are both in the Bay Area, so a respondent could pick more than one answer.

Responses must be exhaustive. Example (bad): “What kind of computer do you have?” PC, Mac. Respondents with any other kind of computer, or none, have no answer to pick.

Use ‘Don’t Know’, ‘Other’, ‘Not Applicable’ only when absolutely necessary.
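To make the two rules concrete, here is a minimal Python sketch that checks a draft list of answer options. It assumes you can describe each option as the set of underlying values it covers. The function name `check_categories` and the place and computer lists are hypothetical illustrations, not part of the lecture.

```python
from itertools import combinations

def check_categories(options, universe):
    """Check answer options for overlap and coverage.

    options:  dict mapping each answer label to the set of values it covers
    universe: set of every value a respondent might actually have
    Returns (pairs of overlapping options, values no option covers).
    """
    overlaps = [
        (label_a, label_b)
        for (label_a, set_a), (label_b, set_b) in combinations(options.items(), 2)
        if set_a & set_b  # shared values: options are not mutually exclusive
    ]
    covered = set().union(*options.values())
    uncovered = universe - covered  # leftovers: options are not exhaustive
    return overlaps, uncovered

# Hypothetical "Where do you live?" options, as on the slide.
places = {"Berkeley", "San Francisco", "Oakland", "Sacramento"}
bay_area = {"Berkeley", "San Francisco", "Oakland"}
location_options = {
    "Berkeley": {"Berkeley"},
    "San Fran": {"San Francisco"},
    "Bay Area": bay_area,
    "Other": places - bay_area,
}
loc_overlaps, loc_missing = check_categories(location_options, places)

# Hypothetical "What kind of computer do you have?" options.
computers = {"PC", "Mac", "Chromebook", "None"}
pc_overlaps, pc_missing = check_categories({"PC": {"PC"}, "Mac": {"Mac"}}, computers)
```

Here `loc_overlaps` pairs Berkeley and San Fran with Bay Area. `pc_missing` holds the Chromebook and no-computer cases that a PC/Mac list cannot record.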

Page 11: Lecture 4

Question Sequence

A static order for questions needs some rationale/logic, such as:

Grouping similar items together

Scattering similar items throughout the survey

Personal demographic questions work best at the end of the survey (better response and completion rates).

Alternatively, randomize the question order for all respondents.
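You can randomize per respondent and still keep demographic questions at the end by shuffling only the core questions. The sketch below makes that assumption. It also seeds Python's `random.Random` with the respondent ID so each respondent's order can be reconstructed later during coding. The question IDs and the seeding choice are hypothetical.

```python
import random

def question_order(respondent_id, core_questions, demographic_questions):
    """Return one respondent's question order.

    Core questions are shuffled; demographic questions stay at the end.
    """
    rng = random.Random(respondent_id)  # same ID -> same order, so it can be reconstructed
    order = list(core_questions)
    rng.shuffle(order)
    return order + list(demographic_questions)

core = ["Q1", "Q2", "Q3", "Q4", "Q5"]  # hypothetical question IDs
demographics = ["age", "gender"]
order_for_7 = question_order(7, core, demographics)
```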

Page 12: Lecture 4

Clear Wording / Leading Questions

Clear wording. Example:

Unclear: “What ISP do you use?”

Clearer: “If you have Internet service at home, what company or service provider do you use for Internet access?”

Leading questions. Example:

Leading: “Don’t you think we should support our troops in Iraq?”

Neutral: “How strongly do you agree or disagree with the following statement: ‘We should support our troops in Iraq’?”

Page 13: Lecture 4

Clear Formatting, Logic

Not all questions apply to everyone. Example:

“How much do you spend on gas heat each month?”

Branching is a possible solution. Example:

“Do you have gas heat? If yes, go to the next question. If not, skip to question #3.”

Condense when possible to avoid unnecessary branching. Example:

“How much do you spend on gas heat each month? (Write 0 if you do not have gas heat.)”

Page 14: Lecture 4

Know Your Sample Population

Regional language and terminology

Cultural differences

How you conduct the survey can influence your valid sample:

Door to door?

Registered telephone directory?

Internet-based survey?

Page 15: Lecture 4

Replication and Using Existing Survey Instruments

It is ALWAYS a good idea to find other surveys that are used in your area of interest.

Especially with large, funded surveys, the questions may have been tested for reliability.

Using the same question wording allows comparisons between different samples.

If a question or set of questions is accepted as a good operationalization of the concept you are interested in, you don’t want to reinvent it unless you really intend to argue that your measure is more appropriate.

Page 16: Lecture 4

Pre-Testing and Pilot Studies

Page 17: Lecture 4

Testing a Survey Instrument

Pre-testing versus pilots

Pre-tests: focus on individual questions or on the entire survey instrument/questionnaire.

Pilots: usually larger in scale than pre-tests; they test the entire survey procedure.

Page 18: Lecture 4

Pre-Testing and Pilots

Pre-tests and pilots are always necessary, unless the survey in its existing form has already been given before.

Pre-tests and pilot studies need enough completed responses that you can actually find problems!

Example: You want to survey 100 undergrads for a small study. You may need to pre-test on at least a 20% sample, i.e., 20 undergrads from your intended population who are not part of your intended valid sample (students who could not end up in your final survey).
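The arithmetic in this example is simple: 20% of 100 gives 20 pre-test respondents. Together with the rule that pre-test respondents must not end up in the final survey, it can be sketched as a single random draw split into two groups. The function name, the 1,000-student roster, and the seed are hypothetical.

```python
import random

def split_pretest(population, main_size, pretest_fraction=0.2, seed=None):
    """Draw a pre-test group and a main sample with no people in common."""
    pretest_size = round(main_size * pretest_fraction)  # 100 * 0.2 -> 20
    needed = pretest_size + main_size
    if needed > len(population):
        raise ValueError("population too small for separate pre-test and main samples")
    rng = random.Random(seed)
    drawn = rng.sample(population, needed)  # one draw, so nobody is picked twice
    return drawn[:pretest_size], drawn[pretest_size:]

roster = [f"student_{i}" for i in range(1000)]  # hypothetical undergraduate roster
pretest, main = split_pretest(roster, main_size=100, seed=42)
```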

Page 19: Lecture 4

Testing Your Questions

Does the respondent’s understanding of the question’s meaning match the researcher’s?

Does the researcher expect too much recall from the respondent?

Page 20: Lecture 4

Ways to Test Your Questions

Behavior coding: interview some respondents as you give them the survey questions, and keep track of their requests for clarification.

Ask pre-test respondents to rephrase your questions in their own words.

Panels of ‘experts’: give your questions to groups of individuals for comments/suggestions.
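Behavior coding produces a log of clarification requests, which is easy to tally. A minimal sketch follows. It assumes the log is a list of (respondent ID, question ID) pairs and uses an arbitrary 15% flagging threshold. Both the data and the threshold are hypothetical, not an established standard.

```python
from collections import Counter

def flag_questions(clarification_log, n_respondents, threshold=0.15):
    """Return question IDs for which the share of respondents who asked
    for clarification exceeds the threshold."""
    # set() counts each respondent at most once per question
    askers = Counter(question for _, question in set(clarification_log))
    return sorted(q for q, n in askers.items() if n / n_respondents > threshold)

# Hypothetical log from a 20-person pre-test
log = [(1, "Q3"), (2, "Q3"), (5, "Q3"), (5, "Q3"), (9, "Q7"), (11, "Q3")]
flagged = flag_questions(log, n_respondents=20)
```

Four of the 20 respondents asked about Q3 (20%), so Q3 is flagged. Only one asked about Q7, so it is not.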

Page 21: Lecture 4

The Pitfalls of Skipping the Testing Stage

Best-case scenario: you get some or most of what you wanted, but you often face an uphill battle justifying your operationalizations and wording choices.

Worst-case scenario: responses vary wildly, respondents don’t understand key questions, many surveys are left incomplete, and the money and time spent conducting the survey are wasted (except for your newfound appreciation of pre-testing and pilots).

Page 22: Lecture 4

After the Survey: Data Coding and Error Checking

Page 23: Lecture 4

Data Coding

Coding considerations start with the survey itself.

You develop a codebook that records the possible numeric responses for every question.

No open-ended questions unless they are absolutely necessary for other reasons.
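A codebook can also be kept in machine-readable form so that entered data can be checked against it. This minimal sketch assumes one Python dictionary per variable. The variable names are hypothetical. The 0-10 happiness scale follows the earlier example, and the -9 Missing code follows the convention suggested on the "Making data numeric" slide.

```python
# Hypothetical codebook: variable -> {numeric code: meaning}
CODEBOOK = {
    "gender": {0: "Male", 1: "Female", -9: "Missing"},
    "happy": {**{score: f"{score} on 0-10 scale" for score in range(11)}, -9: "Missing"},
    "gas_heat": {0: "No", 1: "Yes", -9: "Missing"},
}

def invalid_codes(variable, values):
    """Return every value that the codebook does not define for this variable."""
    allowed = CODEBOOK[variable]
    return [v for v in values if v not in allowed]

bad_happy = invalid_codes("happy", [3, 10, 11, -9])  # 11 is not a defined code
```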

Page 24: Lecture 4

Making Data Numeric

Use numbers to represent variable values: assign a numeric value to every value your variables can take. Example: Gender (Male, Female): Male=0, Female=1.

Develop a systematic way of handling missing data! You must enter a value for missing data. Otherwise you will not know whether a value is missing because of an input error, because the question did not apply (N/A), because it was skipped, etc.

Example: use numeric codes that would not normally make sense for the variable (e.g., -9 for Missing, -8 for Not Applicable, etc.).
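The coding rules on this slide (Male=0, Female=1, -9 for Missing, -8 for Not Applicable) could be applied at data entry with a small helper like the hypothetical `code_response` below. Raising an error on unrecognized answers is an assumed design choice: it makes input errors surface instead of being silently coded.

```python
MISSING = -9         # no answer recorded
NOT_APPLICABLE = -8  # question did not apply to this respondent

GENDER_CODES = {"Male": 0, "Female": 1}

def code_response(raw, codes, applicable=True):
    """Convert one raw answer to its numeric code, never leaving a blank."""
    if not applicable:
        return NOT_APPLICABLE
    if raw is None or str(raw).strip() == "":
        return MISSING
    return codes[str(raw).strip()]  # KeyError flags an answer not in the codebook
```

For example, `code_response("Female", GENDER_CODES)` returns 1, and an empty answer returns -9.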

Page 25: Lecture 4

Other Tips for Creating Your Dataset

Use ID numbers: always, no exceptions, seriously! Datasets get manipulated and re-sorted constantly.

Without IDs, errors cannot be corrected and outliers cannot be identified. IDs should allow you to match any case in the dataset with the actual survey taken by that individual.

Use a conventional data structure: rectangular format, where each row is a case and each column is a single variable.
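Here is a minimal sketch of the rectangular layout with an ID column, using only Python's standard `csv` and `io` modules. The variables and values are hypothetical. Because every row carries its ID, re-sorting the data does not stop you from finding a particular respondent's case again.

```python
import csv
import io

# Each row is one case (one completed survey); each column is one variable.
rows = [
    {"id": 101, "gender": 1, "happy": 7},
    {"id": 102, "gender": 0, "happy": -9},  # -9 = Missing
    {"id": 103, "gender": 1, "happy": 4},
]

# Re-sorting is harmless because every row carries its ID.
by_happiness = sorted(rows, key=lambda r: r["happy"], reverse=True)
case = next(r for r in by_happiness if r["id"] == 102)

# Write the rectangular dataset as CSV (to an in-memory buffer here).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "gender", "happy"])
writer.writeheader()
writer.writerows(rows)
```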

Page 26: Lecture 4

Error Checking Data

Why?

Solves problems that may occur later

Makes sure your entire analysis is not bogus

Without it, you may accidentally “find” that couples engage in coitus more often as they get older... WHAT?!

Page 27: Lecture 4

Marital Coital Frequency

Jasso (1985), “Marital Coital Frequency and the Passage of Time: Estimating the Separate Effects of Spouses’ Ages and Marital Duration, Birth and Marriage Cohorts, and Period Influences” (American Sociological Review)

Major findings of the study:

Controlling for cohort and age effects, there was a negative period effect.

Controlling for period and cohort effects, wife’s age had a positive effect.

Both findings differ significantly from earlier studies of the same topic.

Page 28: Lecture 4

Coitus: Part Deux

Kahn and Udry (1986), “Marital Coital Frequency: Unnoticed Outliers and Unspecified Interactions Lead to Erroneous Conclusions” (American Sociological Review)

Major findings:

In the Jasso study, 4 cases were coded as 88: MISSING DATA CODES!!!

4 more cases had very large studentized residuals (each was also very different from the first survey).

The Jasso study missed an important interaction between length of marriage and wife’s age.

Dropping the 8 outliers from a sample of more than 2,000 cases drastically changed the findings.
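A toy illustration of how an unrecognized missing-data code distorts results follows. The numbers are invented and are not the Jasso or Kahn and Udry data. The only detail carried over from the slide is that 88 was a missing-data code.

```python
from statistics import mean

MISSING_CODE = 88  # a missing-data code, not a real frequency

# Invented monthly frequencies; the last two entries are missing-data codes.
reported = [6, 8, 10, 4, 12, 88, 88]

naive_mean = mean(reported)  # wrongly treats 88 as a real answer
clean_mean = mean(x for x in reported if x != MISSING_CODE)
suspicious_max = max(reported)  # a simple descriptive already exposes the 88s
```

Here the naive mean is about 30.9, compared with 8 once the two codes are excluded.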

Page 29: Lecture 4

How to Error Check Data

Know what you are looking to check, and use appropriate methods:

Descriptives

Frequencies

Cross-tabulations
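The three methods can be sketched with the Python standard library alone. The rows are hypothetical, and so are the helper names. Each contains a deliberate error: a gender code of 2, and a respondent with no gas heat who still reports gas spending.

```python
from collections import Counter
from statistics import mean

rows = [
    {"id": 1, "gender": 0, "gas_heat": 1, "gas_spend": 40},
    {"id": 2, "gender": 1, "gas_heat": 0, "gas_spend": 0},
    {"id": 3, "gender": 1, "gas_heat": 1, "gas_spend": 55},
    {"id": 4, "gender": 2, "gas_heat": 0, "gas_spend": 30},  # two entry errors
]

def frequencies(rows, var):
    """Count each value of a variable; codes not in the codebook stand out."""
    return Counter(row[var] for row in rows)

def descriptives(rows, var):
    """n, min, max and mean of a numeric variable; impossible ranges stand out."""
    values = [row[var] for row in rows]
    return {"n": len(values), "min": min(values), "max": max(values), "mean": mean(values)}

def crosstab(rows, get_a, get_b):
    """Count combinations of two (possibly derived) variables."""
    return Counter((get_a(row), get_b(row)) for row in rows)

gender_freq = frequencies(rows, "gender")  # reveals the invalid code 2
spend_desc = descriptives(rows, "gas_spend")
gas_check = crosstab(rows, lambda r: r["gas_heat"], lambda r: r["gas_spend"] > 0)
# gas_check[(0, True)] counts people with no gas heat who still report spending
```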

Page 30: Lecture 4

Error Checking Examples

Checking original variables for errors: frequencies, descriptives

Checking and setting “missing” codes

Recoding and creating new variables from existing variables: frequencies, cross-tabulations
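Setting missing codes and recoding can be checked the same way. First convert the codes and build the new variable. Then cross-tabulate the original against the recoded version to confirm that every value landed in the intended group. The ages, cut points, and helper names below are hypothetical. The original variable is left untouched, so the -9/-8 distinction is not lost.

```python
from collections import Counter

MISSING_CODES = {-9, -8}  # -9 Missing, -8 Not Applicable

ages = {1: 19, 2: 34, 3: -9, 4: 52, 5: -8, 6: 27}  # hypothetical: ID -> age as entered

def set_missing(value):
    """Treat missing-data codes as missing (None) so they cannot enter calculations."""
    return None if value in MISSING_CODES else value

def age_group(age):
    """Recode age into a new categorical variable (hypothetical cut points)."""
    if age is None:
        return None
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

# New variable built from cleaned values; the original `ages` stays unchanged.
age_groups = {case_id: age_group(set_missing(age)) for case_id, age in ages.items()}

group_freq = Counter(age_groups.values())                       # frequencies of the new variable
recode_check = Counter((ages[i], age_groups[i]) for i in ages)  # original x recoded
```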

Page 31: Lecture 4

Example: Class Data Set