Curating and Managing Research Data for Re-Use
Review & Processing
Jared Lyle
We Are Here Today: Review & Processing
http://weknowmemes.com/2011/12/this-is-my-room-what-i-think-it-looks-like-what-my-mom-thinks-it-looks-like/
A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.
Do no harm.
http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf
Review
• Documentation
• Data
• [Disclosure Review]
Is the data collection complete, accurate, and well-documented?
Documentation
http://dx.doi.org/10.3886/ICPSR31521.v1
Essential Descriptive Elements
• Basic front matter
• Variable-level details
• Methodology
Documentation: Front Matter
Title
Principal Investigator(s)
http://dx.doi.org/10.3886/ICPSR31521.v1
Description
Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009. Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O'Malley, and John E. Schulenberg. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2009 [Computer file]. ICPSR28401-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-10-27. doi:10.3886/ICPSR28401.v1
Documentation: Variable-level Details
National Longitudinal Study of Adolescent Health (Add Health), 1994-1995. National Longitudinal Study of Adolescent Health (Add Health), Wave I School Administrator Codebook. http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html
• Variable Name
• Variable Label
• Variable Type
• Question Text
• Values
• Value Labels
• Missing Data
• Summary Statistics
• Constructed Variables
• Skip Patterns
• Notes
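The variable-level elements named on these slides can be held in a small record type, which makes it easy to see which ones are still undocumented for a given variable. The following is a minimal Python sketch, not any archive's actual schema: the class name `CodebookEntry`, its field names, the `gaps` helper, and the sample value labels are all hypothetical. Summary statistics are left out because they would normally be computed from the data rather than typed in.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    """Documentation for one variable (hypothetical layout, for illustration)."""
    name: str                   # variable name, e.g. "Q14"
    label: str = ""             # variable label
    var_type: str = "numeric"   # "numeric", "character", or "categorical"
    question_text: str = ""     # exact wording of the question
    value_labels: dict = field(default_factory=dict)  # code -> value label
    missing_codes: set = field(default_factory=set)   # codes that mean "missing"
    constructed: bool = False   # True if derived from other variables
    skip_pattern: str = ""      # which respondents were routed past the item
    notes: str = ""

    def gaps(self):
        """List the descriptive elements that are still empty."""
        found = []
        if not self.label:
            found.append("label")
        # Constructed variables have no question text of their own.
        if not self.question_text and not self.constructed:
            found.append("question_text")
        if self.var_type == "categorical" and not self.value_labels:
            found.append("value_labels")
        return found


# Made-up example entry; the value labels are invented for this sketch.
entry = CodebookEntry(
    name="Q14",
    label="Q14: Assessment of R's Health",
    var_type="categorical",
    value_labels={1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor"},
    missing_codes={9},
)
print(entry.gaps())  # ['question_text']
```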
Documentation: Variable-level Details (examples)
American National Election Study, 2008-2009 Panel Study Frequency codebook, version 20090903. http://electionstudies.org/studypages/2008_2009panel/anes2008_2009panel_fcodebook.txt
Davis, James A., Tom W. Smith, and Peter V. Marsden. General Social Surveys, 1972-2008 [Cumulative File] [Computer file]. ICPSR25962-v2. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-02-08. doi:10.3886/ICPSR25962
United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2009 [Computer file]. ICPSR29621-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-11-16. doi:10.3886/ICPSR29621
United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics. Capital Punishment in the United States, 1973-2008 [Computer file]. ICPSR27982-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-09-07. doi:10.3886/ICPSR27982
Documentation: Methodology
• Sample design: A description of how the cases that appear in the study were selected, including details about target populations, sampling frames, sample sizes, sampling errors, and sampling methods.
• Data collection procedures: The methods used to collect the data (e.g., telephone, mail, computer-assisted). Where applicable, this includes the exact instructions and protocols used by interviewers when they collected the data.
• Data processing: The activities and quality checks performed on the data collection to generate the final data products from the raw collected data. If files were merged, a full description of the process should be provided.
• Weighting: Where applicable, a description of the criteria for using weights in the analysis of a data collection, including how the weights were created, all weighting formulae or coefficients, a definition of their elements, and an indication of how the formulae are applied to the data.
• Confidentiality issues: Where applicable, a discussion of any confidentiality issues in the data, as well as the steps taken to mitigate disclosure risk.
Other Documentation
• Questionnaire
• User Guide
• Handbook
• Manual
• Report
• Table
• User Agreement
• Errata
Useful Resources: Description
ICPSR, “What is a codebook?” http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
Institute for Health and Care Research Quality Handbook http://www.emgo.nl/kc/preparation/data%20collection/3%20Codebook.html
Princeton University Data and Statistical Services, “How to Use a Codebook” http://dss.princeton.edu/online_help/analysis/codebook.htm
UCLA Social Science Data Archive, “Codebooks” http://dataarchives.ss.ucla.edu/tutor/tutcode.htm
Data
Data Labels
• Does each variable have a variable name and label?
• Do all categorical variables have value labels?
• Are labels consistent?
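These three label questions lend themselves to an automated pass before a human review. The sketch below is one possible way to do it in plain Python, assuming the variable metadata has already been read into a dictionary; the function names, the dictionary layout, and the sample variables are hypothetical and not taken from any particular tool.

```python
from collections import defaultdict


def check_labels(variables):
    """Report variables without a label and categorical codes without value labels.

    `variables` maps a variable name to a dict with optional keys "label",
    "categorical" (bool), "values" (codes seen in the data) and
    "value_labels" (code -> label). This layout is an assumption of the sketch.
    """
    problems = []
    for name, info in variables.items():
        if not info.get("label", "").strip():
            problems.append(f"{name}: no variable label")
        if info.get("categorical"):
            labels = info.get("value_labels", {})
            unlabeled = sorted(set(info.get("values", [])) - set(labels))
            if unlabeled:
                problems.append(f"{name}: codes without value labels {unlabeled}")
    return problems


def inconsistent_spellings(variables):
    """Find value labels that differ only in case or surrounding spaces."""
    spellings = defaultdict(set)
    for info in variables.values():
        for label in info.get("value_labels", {}).values():
            spellings[label.strip().casefold()].add(label)
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)


# Made-up metadata for three variables.
variables = {
    "Q1": {"label": "Q1: Ever smoked", "categorical": True,
           "values": [1, 2, 9], "value_labels": {1: "Yes", 2: "No"}},
    "Q2": {"label": "", "categorical": True,
           "values": [1, 2], "value_labels": {1: "yes", 2: "No"}},
    "AGE": {"label": "Age in years", "categorical": False, "values": [18, 19]},
}

print(check_labels(variables))
# ['Q1: codes without value labels [9]', 'Q2: no variable label']
print(inconsistent_spellings(variables))
# [['Yes', 'yes']]
```

The consistency check here only catches differences in capitalization and spacing; whether two differently worded labels mean the same thing still needs a reviewer's judgment.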
Naming Conventions: Variables
Variable Names:
• One-up numbers (V1, V2)
• Question numbers (Q1, Q2)
• Mnemonic names (age, race)
• Prefix, root, suffix systems (FAED, MOED)
Naming Conventions: Variables
Variable Labels:
• Item/Question number
• Indicate variable content
• Indicate if variable constructed
Q14: Assessment of R’s Health
Naming Conventions: Values
Value Labels:
• Mutually exclusive, exhaustive, and defined
• Preserve original information
• Retain original coding scheme
Respondent’s Employment Status
Self-employed (1)
Somewhere-else (2)
No answer (9)
Not applicable (BK)
Missing Data
• Are there missing data?
• Are missing data labeled?
77 = Inapplicable
88 = Don’t Know
99 = No Answer
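To answer both questions for a single variable, it helps to tabulate valid responses, each labeled missing code, and any blanks that carry no label at all. The following is a minimal sketch that reuses the 77/88/99 codes from the slide; the function name, the use of `None` for a blank cell, and the sample responses are assumptions made for illustration.

```python
# Missing-data codes as shown on the slide.
MISSING_CODES = {77: "Inapplicable", 88: "Don't Know", 99: "No Answer"}


def summarize_missing(values, missing_codes=MISSING_CODES):
    """Count valid responses, each labeled missing code, and unlabeled blanks.

    A blank cell is represented by None (an assumption of this sketch).
    """
    counts = {"valid": 0, "unlabeled missing": 0}
    for label in missing_codes.values():
        counts[label] = 0
    for value in values:
        if value is None:
            counts["unlabeled missing"] += 1
        elif value in missing_codes:
            counts[missing_codes[value]] += 1
        else:
            counts["valid"] += 1
    return counts


# Made-up responses to one question.
responses = [1, 2, 99, 77, None, 3, 88, 99]
print(summarize_missing(responses))
# {'valid': 3, 'unlabeled missing': 1, 'Inapplicable': 1, "Don't Know": 1, 'No Answer': 2}
```

The missing codes are passed per call rather than applied globally because a code such as 77 can be a legitimate value in another variable, for example an age in years.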
Values
• Are the values reasonable (for example, date variables contain dates, gender variables don't have 10 categories, variables aren't all system missing)?
• Are there weight variables? If so, are they well documented?
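The three reasonableness examples in the first bullet (dates that are really dates, a plausible number of categories, variables that are not entirely missing) can be sketched as a single check. This is an illustrative outline only: the function name, its parameters, the ISO date format, and the sample values are assumptions, and real thresholds would depend on the study.

```python
from datetime import datetime


def check_values(name, values, kind, max_categories=None, date_format="%Y-%m-%d"):
    """Return a list of plausibility problems for one variable.

    `kind` is "date" or "categorical"; None stands for a system-missing value.
    """
    present = [v for v in values if v is not None]
    if not present:
        return [f"{name}: all values are missing"]
    problems = []
    if kind == "date":
        bad = []
        for value in present:
            try:
                datetime.strptime(str(value), date_format)
            except ValueError:
                bad.append(value)
        if bad:
            problems.append(f"{name}: not valid dates {bad}")
    elif kind == "categorical" and max_categories is not None:
        n = len(set(present))
        if n > max_categories:
            problems.append(
                f"{name}: {n} distinct categories, expected at most {max_categories}"
            )
    return problems


# Made-up values for three variables.
print(check_values("INTDATE", ["2009-03-01", "2009-02-30", None], "date"))
# ["INTDATE: not valid dates ['2009-02-30']"]
print(check_values("GENDER", list(range(1, 11)), "categorical", max_categories=3))
# ['GENDER: 10 distinct categories, expected at most 3']
print(check_values("Q7", [None, None], "categorical"))
# ['Q7: all values are missing']
```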
Matching Data & Documentation
• Do the data match the documentation? Are values and/or labels listed in one but not in the other?
• Are all codes in the data valid (documented) according to the data collection instrument or PI's codebook?
• Are there duplicate records?
• Does the spelling look OK?
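Comparing the codes in the data against the codes in the codebook, in both directions, and counting repeated case identifiers covers most of these questions; spelling still needs a human eye. Below is a minimal sketch under stated assumptions: the function names, the `CASEID` and `EMPSTAT` variable names, and the sample rows are hypothetical, while the employment-status codes echo the earlier value-label example.

```python
from collections import Counter


def undocumented_codes(data_values, codebook_codes):
    """Codes that occur in the data but are not listed in the codebook."""
    seen = {v for v in data_values if v is not None}
    return sorted(seen - set(codebook_codes), key=str)


def unused_codes(data_values, codebook_codes):
    """Codes that are listed in the codebook but never occur in the data."""
    return sorted(set(codebook_codes) - set(data_values), key=str)


def duplicate_records(rows, key_fields):
    """Key values (as tuples) that identify more than one record."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up data file and codebook excerpt.
rows = [
    {"CASEID": 1, "EMPSTAT": 1},
    {"CASEID": 2, "EMPSTAT": 2},
    {"CASEID": 2, "EMPSTAT": 2},
    {"CASEID": 3, "EMPSTAT": 5},
]
codebook = {1: "Self-employed", 2: "Somewhere-else", 9: "No answer"}
empstat = [row["EMPSTAT"] for row in rows]

print(undocumented_codes(empstat, codebook))  # [5]
print(unused_codes(empstat, codebook))        # [9]
print(duplicate_records(rows, ["CASEID"]))    # [(2,)]
```

An unused code is not necessarily an error, since a documented answer may simply never have been chosen, but an undocumented code in the data usually warrants a question to the data producer.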
Processing History
Useful Resources: Data
UK Data Archive, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.data-archive.ac.uk/create-manage/document/data-level?index=1
ICPSR Guide to Social Science Data Preparation and Archiving: Phase 3: Data Collection and File Creation, “Documenting Your Data/Data Level/Structured Tabular Data” http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3quant.html
Activity
• Review the following data output and report any issues you find.
Examples of What to Look For: [slides 42-47, examples shown as images]
[Disclosure Review]
Discussion
• How much cleaning do you do to a data collection?
• When is it appropriate to change the ‘original order’ of a data collection?
• How many processing details do you include in the study documentation?
Example: Review @ICPSR
A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.
Do no harm.
We Are Here Today: Review & Processing