RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…
![Page 1: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/1.jpg)
Domain Repositories and Institutional Repositories Partnering to Curate: Opportunities and Examples
Jared Lyle
RDAP13
![Page 2: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/2.jpg)
About ICPSR
• Founded in 1962 as a consortium of 21 universities to share the National Election Survey
• Today: 700+ members around the world
• Data dissemination for more than 20 federal and non-government sponsors
• 600,000+ visitors per year
![Page 3: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/3.jpg)
What we do
• Acquire and archive social science data
• Distribute data to researchers
• Preserve data for future generations
• Provide training in quantitative methods

Archive size
• 8,000 data collections, over 60,000 data sets
• Grows by 300+ collections a year
• 9 Terabytes, soon to be 40+ Terabytes
![Page 4: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/4.jpg)
http://www.icpsr.umich.edu
![Page 5: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/5.jpg)
http://www.flickr.com/photos/dwiggs/3983200894/sizes/l/in/photostream/
![Page 6: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/6.jpg)
1. Sharing Data (Archiving)
![Page 7: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/7.jpg)
“It saves funding and avoids repeated data collecting efforts, allows the verification and replication of research findings, facilitates scientific openness, deters scientific misconduct, and supports communication and progress.”
Niu (2006). “Reward and Punishment Mechanism for Research Data Sharing.” http://www.iassistdata.org/downloads/iqvol304niu.pdf
![Page 8: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/8.jpg)
“Virtually all geneticists believe that scientists should share their results freely with peers…”
Louis, Jones, and Campbell (2002). “Sharing in Science.” http://dx.doi.org/10.1511/2002.4.304
![Page 9: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/9.jpg)
“…the era of data sharing has arrived.”
Samet (2009). “Data: To Share or Not to Share?” http://dx.doi.org/10.1097/EDE.0b013e3181930df3
![Page 11: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/11.jpg)
Most PIs indicated that they wanted to be “Good Citizens” and help:
“This sounds like an exciting project.”
“I hope your project is successful because I think that it is important.”
![Page 12: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/12.jpg)
“Good Citizens” = high willingness
…but no time, money, or resources to submit data to us.
![Page 13: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/13.jpg)
[Bar chart: Data Sharing (N=1,544). Categories: Data Are Archived, Has Copy of Data, Data Are Lost. Values as listed on the slide: 14.2%, 58.7%, 25.7%. Vertical axis: 0 to 70.]
Pienta, Gutmann, & Lyle (2009). “Research Data in The Social Sciences: How Much is Being Shared?” http://ori.hhs.gov/content/research-research-integrity-rri-conference-2009
See also: Pienta, Gutmann, Hoelter, Lyle, & Donakowski (2008). “The LEADS Database at ICPSR: Identifying Important ‘At Risk’ Social Science Data.” http://www.data-pass.org/sites/default/files/Pienta_et_al_2008.pdf
![Page 14: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/14.jpg)
Data Sharing (N=935)

| Federal Agency | Shared Formally, Archived (n=111) | Shared Informally, Not Archived (n=415) | Not Shared (n=409) |
|---|---|---|---|
| NSF (27.3%) | 22.4% | 43.7% | 33.9% |
| NIH (72.7%) | 7.4% | 45.0% | 47.6% |
| Total | 11.5% | 44.6% | 43.9% |
Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data”. http://hdl.handle.net/2027.42/78307
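
As a quick arithmetic check on the table above, the Total row can be reproduced as an average of the NSF and NIH rows weighted by each agency's share of the awards. This is a minimal sketch that assumes the percentages next to the agency names (27.3% and 72.7%) are shares of the N=935 awards; the variable and function names are illustrative only.

```python
# Agency shares of the awards and per-agency sharing rates (percent),
# copied from the slide. Treating 27.3% / 72.7% as weights is an assumption.
agency_share = {"NSF": 0.273, "NIH": 0.727}
rates = {
    "NSF": {"archived": 22.4, "informal": 43.7, "not_shared": 33.9},
    "NIH": {"archived": 7.4, "informal": 45.0, "not_shared": 47.6},
}


def weighted_total(category: str) -> float:
    """Share-weighted average of one sharing category across agencies."""
    return sum(agency_share[a] * rates[a][category] for a in agency_share)


categories = ("archived", "informal", "not_shared")
totals = {c: round(weighted_total(c), 1) for c in categories}
print(totals)
```

Under that assumption the rounded results match the Total row on the slide (11.5%, 44.6%, 43.9%).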
![Page 15: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/15.jpg)
2. Enhancing Data (Curating)
![Page 16: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/16.jpg)
A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.
![Page 17: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/17.jpg)
A corollary: Do no harm.
http://img.gawkerassets.com/img/17xbuy519gga2jpg/ku-xlarge.jpg
![Page 18: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/18.jpg)
Data
![Page 19: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/19.jpg)
Documentation
http://dx.doi.org/10.3886/ICPSR31521.v1
![Page 20: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/20.jpg)
![Page 21: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/21.jpg)
![Page 22: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/22.jpg)
![Page 23: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/23.jpg)
Disclosure Issues
• Direct Identifiers?
– personal names
– addresses (including ZIP codes)
– telephone numbers
– social security numbers
– driver license numbers
– patient numbers
– certification numbers
![Page 24: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/24.jpg)
Disclosure Issues
• Indirect Identifiers?
– detailed geography (e.g., state, county, or census tract of residence)
– exact date of birth
– exact occupations held
– exact dates of events
– detailed income
![Page 25: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/25.jpg)
Disclosure Issues
• External Linkages?
– public patient/medical records
– court records
– police and correction records
– Social Security records
– Medicare records
– driver’s licenses
– military records
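
The disclosure checklists above can be turned into a first-pass screen of a file's variable names. The following is a minimal sketch, not an ICPSR tool: the keyword sets and the column names in the usage example are hypothetical, and a real review would also inspect the values and the external linkages listed above.

```python
# Hypothetical variable names standing in for the identifier types on the slides.
DIRECT = {"name", "address", "zip_code", "phone", "ssn",
          "driver_license", "patient_id", "certification_no"}
INDIRECT = {"state", "county", "census_tract", "birth_date",
            "occupation", "event_date", "income"}


def screen_variables(columns):
    """Sort variable names into direct, indirect, and other for manual review."""
    report = {"direct": [], "indirect": [], "other": []}
    for col in columns:
        key = col.lower()
        if key in DIRECT:
            report["direct"].append(col)
        elif key in INDIRECT:
            report["indirect"].append(col)
        else:
            report["other"].append(col)
    return report


# Usage with an invented file layout.
report = screen_variables(["SSN", "county", "birth_date", "q1_satisfaction"])
print(report)
```

A name-based screen like this only flags candidates; it cannot detect identifiers hidden under uninformative variable names.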
![Page 26: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/26.jpg)
http://www.flickr.com/photos/k3v1nm/3366181223/
Opportunity
![Page 27: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/27.jpg)
“It saves funding and avoids repeated data collecting efforts, allows the verification and replication of research findings, facilitates scientific openness, deters scientific misconduct, and supports communication and progress.”
Niu (2006). “Reward and Punishment Mechanism for Research Data Sharing.” http://www.iassistdata.org/downloads/iqvol304niu.pdf
![Page 28: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/28.jpg)
“Search/Compare Variables” examines 2.1 million variables in 4,000 data collections
![Page 29: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/29.jpg)
![Page 30: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/30.jpg)
Emerging sources and types of data
• Geo-spatial
• Video
• Administrative data
• Online text
• Transactions
• Clicks
• Sensors
![Page 31: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/31.jpg)
Partnerships
Green, Ann G., and Myron P. Gutmann. (2007) "Building Partnerships Among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives." OCLC Systems and Services: International Digital Library Perspectives. 23: 35-53. http://hdl.handle.net/2027.42/41214
“We propose that domain specific archives partner with institution based repositories to provide expertise, tools, guidelines, and best practices to the research communities they serve.”
![Page 32: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/32.jpg)
Support:
![Page 34: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/34.jpg)
5 Pilot Data Collections
http://www.flickr.com/photos/smithsonian/2551170386/
![Page 35: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/35.jpg)
Selection & Appraisal
![Page 36: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/36.jpg)
Recovery
![Page 37: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/37.jpg)
Finding interested partners
http://www.flickr.com/photos/usnationalarchives/4726917373/
![Page 38: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/38.jpg)
Time & Willingness
http://www.flickr.com/photos/floridamemory/7026619371/
![Page 39: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/39.jpg)
Inter-university Consortium for Political and Social Research. Survey of Data Curation Services for Repositories, 2012. ICPSR34302-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2012-09-21. doi:10.3886/ICPSR34302.v1
Survey of Repositories’ Data Needs
![Page 40: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/40.jpg)
Repository Suggested Solutions:
• Media recovery, format migration, data recovery
• Cost estimating and policy review
• Metadata tools, documentation, and catalog linkages
• Support networks and training
• Confidential data dissemination and confidentiality review
![Page 41: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/41.jpg)
1. Community Wayfinder
![Page 42: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/42.jpg)
http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf
![Page 43: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/43.jpg)
2. Confidentiality Review & Treatment
![Page 44: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/44.jpg)
• Suppressing unique cases
• Grouping values (e.g., 13-29=1, 30-49=2)
• Top-coding (e.g., >1,000=1,000)
• Aggregating geographic areas
• Swapping values
• Sampling within a larger data collection
• Adding “noise”
• Replacing real data with synthetic data
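
Three of the treatments listed above (grouping values, top-coding, and suppressing unique cases) are simple enough to sketch in a few lines. This is an illustration under stated assumptions, not ICPSR's procedure: the grouping codes and the 1,000 cap come from the slide's examples, while the records, field names, and the choice of key variables are invented.

```python
from collections import Counter


def group_age(age):
    """Grouping values: 13-29 -> 1, 30-49 -> 2 (slide example); other ages -> None."""
    if 13 <= age <= 29:
        return 1
    if 30 <= age <= 49:
        return 2
    return None


def top_code(value, cap=1000):
    """Top-coding: any value above the cap is recorded as the cap."""
    return min(value, cap)


def suppress_unique(records, keys):
    """Drop records whose combination of key values occurs only once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] > 1]


# Invented example records.
records = [
    {"age": 25, "income": 400, "county": "A"},
    {"age": 27, "income": 2500, "county": "A"},
    {"age": 45, "income": 900, "county": "B"},
]
treated = [
    {"age_group": group_age(r["age"]),
     "income": top_code(r["income"]),
     "county": r["county"]}
    for r in records
]
kept = suppress_unique(treated, ["age_group", "county"])
print(kept)
```

Here the third record is the only one in its age group and county, so it is suppressed, and the 2,500 income is recorded as 1,000.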
![Page 45: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/45.jpg)
http://www.icpsr.umich.edu/icpsrweb/content/DSDR/tools/qualanon.html
![Page 46: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/46.jpg)
3. Access to Processing Tools
![Page 47: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/47.jpg)
![Page 48: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/48.jpg)
![Page 49: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/49.jpg)
The Virtual Data Enclave (VDE) provides remote access to quantitative data in a secure environment.
![Page 50: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/50.jpg)
![Page 51: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/51.jpg)
Hermes Outputs
• ASCII data files
– Column- and tab-delimited
• Stat package setup files
– SAS, SPSS, Stata (.do and .dct)
• “Ready-to-go” data files
– SAS transport (CPORT engine)
– SPSS system (.sav)
– Stata system (.dta)
– R (.rda)
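
To make the two ASCII layouts above concrete, here is a minimal sketch that writes the same records as tab-delimited text and as fixed-width columns. It is not Hermes code: it reads "column-delimited" as fixed-width columns, which is an assumption, and the records, field names, and column widths are invented.

```python
import csv
import io

# Invented example records and layout.
rows = [
    {"id": 1, "age_group": 1, "income": 400},
    {"id": 2, "age_group": 2, "income": 1000},
]
fields = ["id", "age_group", "income"]
widths = {"id": 3, "age_group": 2, "income": 6}


def to_tab_delimited(rows, fields):
    """Write a header line plus one tab-separated line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_fixed_columns(rows, widths):
    """Right-justify each value in its column width; no header line."""
    lines = [
        "".join(str(row[f]).rjust(w) for f, w in widths.items())
        for row in rows
    ]
    return "\n".join(lines) + "\n"


tab_text = to_tab_delimited(rows, fields)
fixed_text = to_fixed_columns(rows, widths)
print(tab_text)
print(fixed_text)
```

A fixed-width file carries no variable names, which is why such files are paired with setup files that record each variable's column positions.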
![Page 52: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…](https://reader038.fdocuments.net/reader038/viewer/2022110114/54628447b1af9f7d228b4f02/html5/thumbnails/52.jpg)
Useful categories for discussion?
• Media recovery, format migration, data recovery
• Cost estimating and policy review
• Metadata tools, documentation, and catalog linkages
• Support networks and training
• Confidential data dissemination and confidentiality review
Your ideas on partnerships?