RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Domain Repositories and Institutional Repositories Partnering to Curate: Opportunities and Examples Jared Lyle RDAP13


Jared Lyle, ICPSR Domain Repositories and Institutional Repositories Partnering to Curate: Opportunities and Examples Panel: Partnerships between institutional repositories, domain repositories, and publishers Research Data Access & Preservation Summit 2013 Baltimore, MD April 4, 2013 #rdap13

Transcript of RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Page 1: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Domain Repositories and Institutional Repositories Partnering to Curate: Opportunities and Examples

Jared Lyle, RDAP13

Page 2: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

About ICPSR
• Founded in 1962 as a consortium of 21 universities to share the National Election Survey
• Today: 700+ members around the world
• Data dissemination for more than 20 federal and non-government sponsors
• 600,000+ visitors per year

Page 3: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

What we do
• Acquire and archive social science data
• Distribute data to researchers
• Preserve data for future generations
• Provide training in quantitative methods

Archive size
• 8,000 data collections, over 60,000 data sets
• Grows by 300+ collections a year
• 9 Terabytes, soon to be 40+ Terabytes

Page 4: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

http://www.icpsr.umich.edu

Page 5: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

http://www.flickr.com/photos/dwiggs/3983200894/sizes/l/in/photostream/

Page 6: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

1. Sharing Data (Archiving)

Page 7: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

“It saves funding and avoids repeated data collecting efforts, allows the verification and replication of research findings, facilitates scientific openness, deters scientific misconduct, and supports communication and progress.”

Niu (2006). “Reward and Punishment Mechanism for Research Data Sharing.” http://www.iassistdata.org/downloads/iqvol304niu.pdf

Page 8: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

“Virtually all geneticists believe that scientists should share their results freely with peers…”

Louis, Jones, and Campbell (2002). “Sharing in Science.” http://dx.doi.org/10.1511/2002.4.304

Page 9: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

“…the era of data sharing has arrived.”

Samet (2009). “Data: To Share or Not to Share?” http://dx.doi.org/10.1097/EDE.0b013e3181930df3

Page 10: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

http://www.data-pass.org/

Page 11: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Most PIs indicated that they wanted to be “Good Citizens” and help:

“This sounds like an exciting project.”

“I hope your project is successful because I think that it is important.”

Page 12: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

“Good Citizens” = high willingness

…but no time, money, or resources to submit data to us.

Page 13: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

[Bar chart: Data Sharing (N=1,544). Categories: Data Are Archived, Has Copy of Data, Data Are Lost. Bar labels: 14.2%, 58.7%, 25.7%. Axis ticks run from 0 to 70.]

Pienta, Gutmann, & Lyle (2009). “Research Data in The Social Sciences: How Much is Being Shared?” http://ori.hhs.gov/content/research-research-integrity-rri-conference-2009

See also: Pienta, Gutmann, Hoelter, Lyle, & Donakowski (2008). “The LEADS Database at ICPSR: Identifying Important ‘At Risk’ Social Science Data.” http://www.data-pass.org/sites/default/files/Pienta_et_al_2008.pdf

Page 14: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Data Sharing (N=935)

Federal Agency   Shared Formally, Archived (n=111)   Shared Informally, Not Archived (n=415)   Not Shared (n=409)
NSF (27.3%)      22.4%                               43.7%                                     33.9%
NIH (72.7%)      7.4%                                45.0%                                     47.6%
Total            11.5%                               44.6%                                     43.9%

Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data”. http://hdl.handle.net/2027.42/78307
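As a quick arithmetic check on the table above, the Total row can be reproduced by weighting each agency's row by its share of the sample. This is only a sketch: the dictionary keys are shorthand labels chosen here, and the figures are copied from the slide.

```python
# Agency shares of the N=935 sample and row percentages, copied from the slide.
agency_share = {"NSF": 0.273, "NIH": 0.727}
row_pct = {
    "NSF": {"archived": 22.4, "informal": 43.7, "not_shared": 33.9},
    "NIH": {"archived": 7.4, "informal": 45.0, "not_shared": 47.6},
}

def weighted_total(category):
    """Combine the agency rows, weighting each by its share of the sample."""
    return sum(agency_share[a] * row_pct[a][category] for a in agency_share)

# Round to one decimal place, as on the slide.
totals = {c: round(weighted_total(c), 1) for c in ("archived", "informal", "not_shared")}
print(totals)  # {'archived': 11.5, 'informal': 44.6, 'not_shared': 43.9}
```

The weighted values agree with the Total row of 11.5%, 44.6%, and 43.9%.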

Page 15: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

2. Enhancing Data (Curating)

Page 16: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.

Page 17: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

A corollary: Do no harm.

http://img.gawkerassets.com/img/17xbuy519gga2jpg/ku-xlarge.jpg

Page 18: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Data

Page 19: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Documentation

http://dx.doi.org/10.3886/ICPSR31521.v1

Page 20: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…


Page 21: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…


Page 22: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…
Page 23: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Disclosure Issues

• Direct Identifiers?
  – personal names
  – addresses (including ZIP codes)
  – telephone numbers
  – social security numbers
  – driver license numbers
  – patient numbers
  – certification numbers

Page 24: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Disclosure Issues

• Indirect Identifiers?
  – detailed geography (i.e., state, county, or census tract of residence)
  – exact date of birth
  – exact occupations held
  – exact dates of events
  – detailed income
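Indirect identifiers become a disclosure risk mainly in combination. One way a reviewer might look for this, sketched here under invented assumptions (the records, field names, and the `unique_cases` helper are hypothetical and not an ICPSR tool), is to count how many cases share each combination of indirect identifiers and flag those that occur only once.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"county": "A", "birth_year": 1950, "occupation": "teacher"},
    {"county": "A", "birth_year": 1950, "occupation": "teacher"},
    {"county": "B", "birth_year": 1962, "occupation": "judge"},
    {"county": "B", "birth_year": 1971, "occupation": "nurse"},
    {"county": "B", "birth_year": 1971, "occupation": "nurse"},
]
indirect = ("county", "birth_year", "occupation")

def unique_cases(rows, keys):
    """Return the rows whose combination of key values occurs exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] == 1]

flagged = unique_cases(records, indirect)
print(flagged)  # only the "judge" record is unique on all three fields
```

Flagged cases are candidates for further treatment, not automatic removals; the decision depends on the study and on what external data could be linked.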

Page 25: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Disclosure Issues

• External Linkages?
  – public patient/medical records
  – court records
  – police and correction records
  – Social Security records
  – Medicare records
  – driver’s licenses
  – military records

Page 26: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

http://www.flickr.com/photos/k3v1nm/3366181223/

Opportunity

Page 27: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

“It saves funding and avoids repeated data collecting efforts, allows the verification and replication of research findings, facilitates scientific openness, deters scientific misconduct, and supports communication and progress.”

Niu (2006). “Reward and Punishment Mechanism for Research Data Sharing.” http://www.iassistdata.org/downloads/iqvol304niu.pdf

Page 28: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

“Search/Compare Variables” examines 2.1 million variables in 4,000 data collections

Page 29: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…
Page 30: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Emerging sources and types of data

• Geo-spatial
• Video
• Administrative data
• Online text
• Transactions
• Clicks
• Sensors

Page 31: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Partnerships

Green, Ann G., and Myron P. Gutmann (2007). “Building Partnerships Among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.” OCLC Systems and Services: International Digital Library Perspectives. 23: 35-53. http://hdl.handle.net/2027.42/41214

“We propose that domain specific archives partner with institution based repositories to provide expertise, tools, guidelines, and best practices to the research communities they serve.”

Page 32: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Support:

Page 33: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

http://www.icpsr.umich.edu/icpsrweb/IR/

Page 34: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

5 Pilot Data Collections

http://www.flickr.com/photos/smithsonian/2551170386/

Page 35: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Selection & Appraisal

Page 36: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Recovery

Page 37: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Finding interested partners

http://www.flickr.com/photos/usnationalarchives/4726917373/

Page 38: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Time & Willingness

http://www.flickr.com/photos/floridamemory/7026619371/

Page 39: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Inter-university Consortium for Political and Social Research. Survey of Data Curation Services for Repositories, 2012. ICPSR34302-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2012-09-21. doi:10.3886/ICPSR34302.v1

Survey of Repositories’ Data Needs

Page 40: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Repository Suggested Solutions:
• Media recovery, format migration, data recovery
• Cost estimating and policy review
• Metadata tools, documentation, and catalog linkages
• Support networks and training
• Confidential data dissemination and confidentiality review

Page 41: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

1. Community Wayfinder

Page 42: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf

Page 43: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

2. Confidentiality Review & Treatment

Page 44: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

• Suppressing unique cases
• Grouping values (e.g., 13-29=1, 30-49=2)
• Top-coding (e.g., >1,000=1,000)
• Aggregating geographic areas
• Swapping values
• Sampling within a larger data collection
• Adding “noise”
• Replacing real data with synthetic data
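Two of the treatments listed above, grouping values and top-coding, are simple recodes. The sketch below uses the slide's own example cut points (13-29=1, 30-49=2 and >1,000=1,000); the function names and sample values are hypothetical, and values outside the slide's two bands are deliberately left undefined.

```python
def group_value(value):
    """Recode an exact value into the bands from the slide: 13-29 -> 1, 30-49 -> 2."""
    if 13 <= value <= 29:
        return 1
    if 30 <= value <= 49:
        return 2
    return None  # the slide gives no code for values outside these two bands

def top_code(value, cap=1000):
    """Replace any value above the cap with the cap itself (>1,000 = 1,000)."""
    return min(value, cap)

# Hypothetical inputs chosen to hit each boundary.
exact_values = [13, 29, 30, 49, 72]
amounts = [250, 1000, 1001, 50000]

grouped = [group_value(v) for v in exact_values]
capped = [top_code(a) for a in amounts]
print(grouped)  # [1, 1, 2, 2, None]
print(capped)   # [250, 1000, 1000, 1000]
```

Both recodes trade detail for protection: the grouped and capped variables remain usable for analysis, but an exact value can no longer single out a respondent.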

Page 45: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

http://www.icpsr.umich.edu/icpsrweb/content/DSDR/tools/qualanon.html

Page 46: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

3. Access to Processing Tools

Page 47: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…
Page 48: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…
Page 49: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

The Virtual Data Enclave (VDE) provides remote access to quantitative data in a secure environment.

Page 50: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…
Page 51: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Hermes Outputs

• ASCII data files
  – Column- and tab-delimited
• Stat package setup files
  – SAS, SPSS, Stata (.do and .dct)
• “Ready-to-go” data files
  – SAS transport (CPORT engine)
  – SPSS system (.sav)
  – Stata system (.dta)
  – R (.rda)
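To make the difference between the two ASCII layouts concrete, here is a minimal sketch that writes the same rows as tab-delimited text and as fixed-column text. It is not Hermes and does not reflect how Hermes is implemented; the rows, column widths, and function names are invented for illustration.

```python
import csv
import io

# Hypothetical cases and column widths; not taken from any ICPSR study.
rows = [(1, 34, "MI"), (2, 7, "MD")]
widths = (4, 3, 2)

def to_tab_delimited(data):
    """Serialize rows with one tab between fields."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(data)
    return buf.getvalue()

def to_fixed_columns(data, col_widths):
    """Serialize rows so each field fills a fixed, right-justified column range."""
    lines = ["".join(str(v).rjust(w) for v, w in zip(row, col_widths)) for row in data]
    return "\n".join(lines) + "\n"

tab_text = to_tab_delimited(rows)
fixed_text = to_fixed_columns(rows, widths)
print(tab_text)
print(fixed_text)
```

A fixed-column file carries no delimiters, so a reader needs the column positions from elsewhere, which is the role the setup files in the list above play for the statistical packages.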

Page 52: RDAP13 Jared Lyle: Domain Repositories and Institutional Repositories Partn…

Useful categories for discussion?
• Media recovery, format migration, data recovery
• Cost estimating and policy review
• Metadata tools, documentation, and catalog linkages
• Support networks and training
• Confidential data dissemination and confidentiality review

Your ideas on partnerships?