Cal Poly - Data Management and the DMPTool

Date posted: 17-Oct-2014
Category: Technology

Description

October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University. Many funders now require researchers to submit a Data Management Plan alongside their project proposals. The DMPTool is a free, online wizard that helps you create a data management plan specific to your project, and provides you with links and resources for ensuring your plan is successful.

Transcript of Cal Poly - Data Management and the DMPTool

Page 1: Cal Poly - Data Management and the DMPTool

 

Carly Strasser | @carlystrasser
California Digital Library

Cal Poly, Oct 2013

From Flickr by OZinOH

DMPTool: The Data Management Planning Tool

Page 2: Cal Poly - Data Management and the DMPTool
Page 3: Cal Poly - Data Management and the DMPTool

Background

Current DMP tools

A walkthrough of DMPTool

DMPTool2

Roadmap

From Flickr by (Luciano)

Page 4: Cal Poly - Data Management and the DMPTool

From Flickr by Flickmor

From Flickr by US Army Environmental Command

From Flickr by DW0825

C. Strasser

Courtesy of WHOI

From Flickr by deltaMike

Page 5: Cal Poly - Data Management and the DMPTool

From Flickr by ~Minnea~

Data management

Documentation

Reproducibility

Page 6: Cal Poly - Data Management and the DMPTool

From Flickr by ahockey

DMPs!

Page 7: Cal Poly - Data Management and the DMPTool

What is a data management plan?

A document that describes what you will do with your data both during your research and after you complete your project

From Flickr by Barbies Land

Page 8: Cal Poly - Data Management and the DMPTool

For funders: A short plan submitted alongside grant applications

An outline of
– what will be collected
– methods
– standards
– metadata
– sharing/access
– long-term storage

Includes how and why

But they all have different requirements and express them in different ways

From Flickr by 401(K) 2013

Page 9: Cal Poly - Data Management and the DMPTool

Why prepare a DMP?

• Saves time
• Increases research efficiency
• Satisfies requirements

From Flickr by natalinha

Page 10: Cal Poly - Data Management and the DMPTool

When do you plan?

Yes.

Page 11: Cal Poly - Data Management and the DMPTool

When do you plan?

[Figure: data life cycle with the stages Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze, shown alongside the labels Proposal writing, Research, Publication, Ideas]

Page 12: Cal Poly - Data Management and the DMPTool

Remember!

• Keep your plan current
• Incorporate changes
• Use as a guide for daily activities

From Flickr by gmacfadyen

Page 13: Cal Poly - Data Management and the DMPTool

From Flickr by celikins

Where to start?

Page 14: Cal Poly - Data Management and the DMPTool

Where to start?

Small & Simple
• Document what you know now
• Share the plan with your team
• Avoid procrastination and immobilization

Page 15: Cal Poly - Data Management and the DMPTool

Unruly Data

Researcher

Research Support Office

Data Library / Repository

Computing Support

Faculty Ethics Committee

Etc...

DMP?

Courtesy of Martin Donnelly

Page 16: Cal Poly - Data Management and the DMPTool

Who: Support Services & Collaborators

Plan section | Support
Information about data | PI, co-PIs, research staff
Metadata content & format | Librarians, data repositories
Policies for access, sharing, reuse | Funder, institute, HIPAA, IRB, users
Long-term storage and data management | Librarians, IT staff, data repositories
Budget | Sponsored programs office, funder

Page 17: Cal Poly - Data Management and the DMPTool

DMPs: A Short History

Liz Lyon: Dealing with Data, 2008

UK funder expectations, 2009

2009-10

Page 18: Cal Poly - Data Management and the DMPTool

DMPs: A Short History

Across the Pond…

Federal Funding Accountability and Transparency Act 2006

2010

2010-present

Page 19: Cal Poly - Data Management and the DMPTool

NSF DMP Requirements
From Grant Proposal Guidelines:

DMP supplement may include:

1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project

2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)

3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements

4. policies and provisions for re-use, re-distribution, and the production of derivatives

5. plans for archiving data, samples, and other research products, and for preservation of access to them
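As a rough illustration, the five elements on this slide can be treated as a checklist against a draft plan. This is a minimal sketch: the section keys and the `missing_sections` helper are hypothetical names chosen for this example and are not part of any NSF or DMPTool format.

```python
# Hypothetical short keys for the five elements listed on the slide.
NSF_ELEMENTS = [
    "types_of_data",
    "standards",
    "access_and_sharing",
    "reuse_and_redistribution",
    "archiving_and_preservation",
]


def missing_sections(draft):
    """Return the elements that are absent or blank in a draft plan.

    `draft` maps an element key to the text written for it so far.
    """
    return [key for key in NSF_ELEMENTS if not draft.get(key, "").strip()]


# Made-up draft: two sections written, one left blank, two not started.
draft_plan = {
    "types_of_data": "Stage and sex counts of copepods, stored as .csv files.",
    "standards": "EML metadata.",
    "access_and_sharing": "",
}
print(missing_sections(draft_plan))
```

A check like this only confirms that each element has some text; whether the text is adequate is still a matter for the reviewer.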

Page 20: Cal Poly - Data Management and the DMPTool

1. Types of data & other information

• Types of data produced
• Relationship to existing data
• How/when/where will the data be captured or created?
• How will the data be processed?
• Quality assurance & quality control measures
• Security: version control, backing up
• Who will be responsible for data management during/after project?

biology.kenyon.edu

C. Strasser

From Flickr by Lazurite

Page 21: Cal Poly - Data Management and the DMPTool

2. Data & metadata standards

• What metadata are needed to make the data meaningful?
• How will you create or capture these metadata?
• Why have you chosen particular standards and approaches for metadata?

Wired.com

Page 22: Cal Poly - Data Management and the DMPTool

3. Policies for access & sharing
4. Policies for re-use & re-distribution

• Are you under any obligation to share data?
• How, when, & where will you make the data available?
• What is the process for gaining access to the data?
• Who owns the copyright and/or intellectual property?
• Will you retain rights before opening data to wider use? How long?
• Are permission restrictions necessary?
• Embargo periods for political/commercial/patent reasons?
• Ethical and privacy issues?
• Who are the foreseeable data users?
• How should your data be cited?

Page 23: Cal Poly - Data Management and the DMPTool

5. Plans for archiving & preservation

• What data will be preserved for the long term? For how long?
• Where will data be preserved?
• What data transformations need to occur before preservation?
• What metadata will be submitted alongside the datasets?
• Who will be responsible for preparing data for preservation? Who will be the main contact person for the archived data?

From Flickr by theManWhoSurfedTooMuch

Page 24: Cal Poly - Data Management and the DMPTool

Don’t forget: Budget

• Costs of data preparation & documentation
  – Hardware, software
  – Personnel
  – Archive fees
• How costs will be paid
  – Request funding!

dorrvs.com

Page 25: Cal Poly - Data Management and the DMPTool

NSF’s Vision*

DMPs and their evaluation will grow & change over time

Peer review will determine next steps

Community-driven guidelines

Evaluation will vary with directorate, division, & program officer

*Unofficially

Page 26: Cal Poly - Data Management and the DMPTool

A DMP Example (1)

• Project name: Effects of temperature and salinity on population growth of the estuarine copepod, Eurytemora affinis

• Project participants and affiliations:
  Carly Strasser (University of Alberta and Dalhousie University)
  Mark Lewis (University of Alberta)
  Claudio DiBacco (Dalhousie University and Bedford Institute of Oceanography)

• Funding agency: CAISN (Canadian Aquatic Invasive Species Network)

• Description of project aims and purpose:
  We will rear populations of E. affinis in the laboratory at three temperatures and three salinities (9 treatments total). We will document the population from hatching to death, noting the proportion of individuals in each stage over time. The data collected will be used to parameterize population models of E. affinis. We will build a model of population growth as a function of temperature and salinity. This will be useful for studies of invasive copepod populations in the Northeast Pacific.

• Video Source: Plankton Copepods. Video. Encyclopædia Britannica Online. Web. 13 Jun. 2011

Page 27: Cal Poly - Data Management and the DMPTool

A DMP Example (2)

1. Information about data

Every two days, we will subsample E. affinis populations growing at our treatment conditions. We will use a microscope to identify the stage and sex of the subsampled individuals. We will document the information first in a laboratory notebook, then copy the data into an Excel spreadsheet. For quality control, values will be entered separately by two different people to ensure accuracy. The Excel spreadsheet will be saved as a comma-separated value (.csv) file daily and backed up to a server. After all data are collected, the Excel spreadsheet will be saved as a .csv file and imported into the program R for statistical analysis. Strasser will be responsible for all data management during and after data collection.

Our short-term data storage plan, which will be used during the experiment, will be to save copies of 1) the .txt metadata file and 2) the Excel spreadsheet as .csv files to an external drive, and to take the external drive off site nightly. We will use the Subversion version control system to update our data and metadata files daily on the University of Alberta Mathematics Department server. We will also have the laboratory notebook as a hard copy backup.
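The double-entry quality control step described on this slide (two people keying the same values separately) could be checked along the following lines. This is only a sketch under assumptions: the column names and sample rows are invented for illustration, and the example plan itself names Excel and R, not Python, as its tools.

```python
import csv
import io


def entry_mismatches(first_csv, second_csv):
    """Compare two independently keyed copies of the same data sheet.

    Returns (row_number, column_name, first_value, second_value) tuples
    for every cell where the two copies disagree. Row numbers count data
    rows from 1; the header row is not counted.
    """
    first = list(csv.DictReader(io.StringIO(first_csv)))
    second = list(csv.DictReader(io.StringIO(second_csv)))
    if len(first) != len(second):
        raise ValueError("the two copies have different numbers of rows")
    mismatches = []
    for row_number, (a, b) in enumerate(zip(first, second), start=1):
        for column in a:
            if a[column] != b.get(column):
                mismatches.append((row_number, column, a[column], b.get(column)))
    return mismatches


# Made-up example rows; the column names are illustrative only.
copy_a = (
    "date,treatment,stage,sex,count\n"
    "2011-06-01,T1S1,nauplius,NA,12\n"
    "2011-06-01,T1S1,adult,F,3\n"
)
copy_b = (
    "date,treatment,stage,sex,count\n"
    "2011-06-01,T1S1,nauplius,NA,12\n"
    "2011-06-01,T1S1,adult,F,8\n"
)
print(entry_mismatches(copy_a, copy_b))
```

Any reported mismatch would then be resolved against the laboratory notebook, which the plan keeps as the hard copy record.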

Page 28: Cal Poly - Data Management and the DMPTool

A DMP Example (3)

2. Metadata format & content

We will first document our metadata by taking careful notes in the laboratory notebook that refer to specific data files and describe all columns, units, abbreviations, and missing value identifiers. These notes will be transcribed into a .txt document that will be stored with the data file. After all of the data are collected, we will then use EML (Ecological Metadata Language) to digitize our metadata. EML is one of the accepted formats used in Ecology, and works well for the type of data we will be producing. We will create these metadata using Morpho software, available through the Knowledge Network for Biocomplexity (KNB). The documentation and metadata will describe the data files and the context of the measurements.
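The interim .txt metadata note described on this slide (columns, units, missing value identifiers) might be generated along these lines. This is a sketch only: the file name, column names, and the `describe_columns` helper are hypothetical, and the slides do not list the actual columns. It produces plain text, not EML.

```python
def describe_columns(data_file, columns):
    """Build the text of a plain-text metadata note for one data file.

    `columns` is a list of (name, description, units, missing_value) tuples.
    """
    lines = [f"Data file: {data_file}", "Columns:"]
    for name, description, units, missing in columns:
        lines.append(
            f"  {name}: {description}; units: {units}; missing value: {missing}"
        )
    return "\n".join(lines) + "\n"


# Illustrative columns only; they are assumptions made for this example.
note = describe_columns(
    "copepod_counts.csv",
    [
        ("date", "sampling date (YYYY-MM-DD)", "none", "NA"),
        ("temp", "rearing temperature", "degrees C", "NA"),
        ("count", "individuals in subsample", "individuals", "NA"),
    ],
)
print(note)
```

Keeping the note next to the data file, as the plan proposes, means the column definitions are at hand when the metadata are later written up in EML.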

Page 29: Cal Poly - Data Management and the DMPTool

A DMP Example (4)

3. Policies for access, sharing & reuse

We are required to share our data with the CAISN network after all data have been collected and metadata have been generated. This should be no more than 6 months after the experiments are completed. In order to gain access to CAISN data, interested parties must contact the CAISN data manager ([email protected]) or the authors and explain their intended use. Data requests will be approved by the authors after review of the proposed use.

The authors will retain rights to the data until the resulting publication is produced, within two years of data production. After publication (or after two years, whichever is first), the authors will open data to public use. After publication, we will submit our data to the KNB allowing discovery and use by the wider scientific community. Interested parties will be able to download the data directly from KNB without contacting the authors, but will still be encouraged to give credit to the authors for the data used by citing a KNB accession number either in the publication’s text or in the references list.
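The release rule on this slide (public after publication or after two years, whichever is first) can be worked through as simple date arithmetic. The sketch below assumes "two years" means the same calendar day two years later; the dates and function names are made up for illustration.

```python
from datetime import date


def add_years(day, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def public_release_date(data_produced, published=None):
    """Publication date or two years after data production, whichever is first."""
    deadline = add_years(data_produced, 2)
    if published is None:
        return deadline
    return min(published, deadline)


# Made-up dates for illustration.
print(public_release_date(date(2011, 9, 1), published=date(2012, 5, 15)))
print(public_release_date(date(2011, 9, 1)))
```

With no publication date supplied, the two-year deadline applies on its own, which matches the "whichever is first" wording.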

Page 30: Cal Poly - Data Management and the DMPTool

A DMP Example (5)

4. Long-term storage and data management

The data set will be submitted to KNB for long-term preservation and storage. The authors will submit metadata in EML format along with the data to facilitate its reuse. Strasser will be responsible for updating metadata and data author contact information in the KNB.

5. Budget

A tablet computer will be used for data collection in the field, which will cost approximately $500. Data documentation and preparation for reuse and storage will require approximately one month of salary for one technician. The technician will be responsible for data entry, quality control and assurance, and metadata generation. These costs are included in the budget in lines 12-16.

Page 31: Cal Poly - Data Management and the DMPTool

From Flickr by thewmatt

Page 32: Cal Poly - Data Management and the DMPTool

Background

Current DMP tools

A walkthrough of DMPTool

DMPTool2

Roadmap

From Flickr by (Luciano)

Page 33: Cal Poly - Data Management and the DMPTool

DMPonline: dmponline.dcc.ac.uk

Step-by-step wizard for generating DMP

Create | edit | re-use | share | save | generate

Open to community

Page 34: Cal Poly - Data Management and the DMPTool

From Flickr by Max Chu

dmptool.org

Page 35: Cal Poly - Data Management and the DMPTool

DMPTool Project

• Partners started working in January 2011
• Developed requirements, divided work
• Self-funded / In-kind

Page 36: Cal Poly - Data Management and the DMPTool

• Free
• Guides through creating a DMP
• Helps meet funder requirements

• Supplies questions
• Includes explanation/context provided by the agency
• Provides links to the agency website

dmptool.org

Page 37: Cal Poly - Data Management and the DMPTool

Wait!

Data management planning is complex & requires dialog

Range of support & understanding

Our focus:
• simplify & scale the common parts
• develop community
• provide incremental improvement in functionality

From Flickr by ChrisGoldNY

Page 38: Cal Poly - Data Management and the DMPTool

Background

Current DMP tools

A walkthrough of DMPTool

DMPTool2

Roadmap

From Flickr by (Luciano)

Page 39: Cal Poly - Data Management and the DMPTool
Page 40: Cal Poly - Data Management and the DMPTool
Page 41: Cal Poly - Data Management and the DMPTool
Page 42: Cal Poly - Data Management and the DMPTool
Page 43: Cal Poly - Data Management and the DMPTool
Page 44: Cal Poly - Data Management and the DMPTool
Page 45: Cal Poly - Data Management and the DMPTool
Page 46: Cal Poly - Data Management and the DMPTool
Page 47: Cal Poly - Data Management and the DMPTool
Page 48: Cal Poly - Data Management and the DMPTool
Page 49: Cal Poly - Data Management and the DMPTool
Page 50: Cal Poly - Data Management and the DMPTool
Page 51: Cal Poly - Data Management and the DMPTool

Access

DMPTool can be added to campus single sign-on service

Researchers use campus login to access tool

Researchers like it here

From Flickr by Clonny

Page 52: Cal Poly - Data Management and the DMPTool

Customized Resources

Institution-specific…
• Help text
• Links to resources & services
• Suggested answers

…at different levels
• All DMPs
• All DMPs for a particular funding agency
• Question within a data management plan

From Flickr by lumachrome
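One way to picture the three levels on this slide is a lookup that prefers the most specific resource available. This is a hypothetical sketch, not how the DMPTool stores or resolves its resources: the data layout, the "most specific wins" rule, and all names and texts below are assumptions made for illustration.

```python
def resolve_help_text(resources, funder, question):
    """Pick the most specific institution-supplied help text available.

    `resources` maps a (funder, question) key to text. None in a slot means
    "applies to all": (None, None) covers all DMPs and (funder, None) covers
    all DMPs for that funding agency.
    """
    for key in ((funder, question), (funder, None), (None, None)):
        if key in resources:
            return resources[key]
    return None


# Invented campus resources at the three levels named on the slide.
campus_resources = {
    (None, None): "Contact the library's data services team.",
    ("NSF", None): "See the campus NSF data policy page.",
    ("NSF", "archiving"): "The campus repository accepts datasets.",
}
print(resolve_help_text(campus_resources, "NSF", "archiving"))
print(resolve_help_text(campus_resources, "NSF", "metadata"))
print(resolve_help_text(campus_resources, "NIH", "metadata"))
```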

Page 53: Cal Poly - Data Management and the DMPTool
Page 54: Cal Poly - Data Management and the DMPTool

From Flickr by OZinOH

DMPTool2

Page 55: Cal Poly - Data Management and the DMPTool

DMPTool Uptake

[Chart: DMPTool uptake. Axes: Number of Institutions; Number of Plans (solid) & Unique Users (dashed). Series: Unique Users, Plans, Institutions]

Page 56: Cal Poly - Data Management and the DMPTool

DMPTool 2: Responding to the Community

Coming Soon!

Page 57: Cal Poly - Data Management and the DMPTool

Plan Creators & Plan Administrators

Page 58: Cal Poly - Data Management and the DMPTool

Improvements for Plan Creators

• Collaborative plan creation
• Role-based user authorization & access
• Better plan templates & resources

Page 59: Cal Poly - Data Management and the DMPTool

New Administrator Interface

• Template creation:
• Better plan template granularity: discipline, funder, question
• Better institution granularity: department, college, lab group, …
• Enhanced search and browse of plans
• Access to metrics for reporting & follow-up

What this means for plan creators:
• Better plans
• More granular help
• Local input & assistance

Page 60: Cal Poly - Data Management and the DMPTool

Open RESTful API

?

Page 61: Cal Poly - Data Management and the DMPTool

Yelp

Google Maps

APIs

Page 62: Cal Poly - Data Management and the DMPTool

DMPTool

• Carefully thought out code
• Invisible to user
• Expose specific functionality and/or data
• Other functionality/data protected

Other stuff

API
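The idea on this slide, exposing specific functionality or data while keeping the rest protected, can be sketched in a few lines. This is a toy illustration in plain Python, not the DMPTool API and not an HTTP interface: the `PlanStore` class, its methods, and the sample records are all hypothetical.

```python
class PlanStore:
    """Toy stand-in for an application that holds data management plans."""

    def __init__(self):
        # Internal records: callers are not handed these directly.
        self._plans = {}

    def add_plan(self, plan_id, title, owner_email, is_public):
        """Store one plan record, including fields that stay internal."""
        self._plans[plan_id] = {
            "title": title,
            "owner_email": owner_email,
            "is_public": is_public,
        }

    def public_plan_titles(self):
        """The one piece of data this toy interface chooses to expose."""
        return sorted(
            plan["title"] for plan in self._plans.values() if plan["is_public"]
        )


store = PlanStore()
store.add_plan(1, "Copepod population growth", "pi@example.org", True)
store.add_plan(2, "Unpublished pilot study", "pi@example.org", False)
print(store.public_plan_titles())
```

Another service could build on `public_plan_titles()` without ever seeing owner contact details or private plans, which is the separation the slide describes.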

Page 63: Cal Poly - Data Management and the DMPTool

API Benefits

Interactions

Popularity

Improve functionality

Add more functionality

Combine with their services

Page 64: Cal Poly - Data Management and the DMPTool
Page 65: Cal Poly - Data Management and the DMPTool
Page 66: Cal Poly - Data Management and the DMPTool

IMLS Grant

Improving Data Stewardship with the DMPTool

Provide librarians with the tools and resources to claim the data management education space

Page 67: Cal Poly - Data Management and the DMPTool
Page 68: Cal Poly - Data Management and the DMPTool

blog.dmptool.org  

Page 69: Cal Poly - Data Management and the DMPTool

facebook.com/dmptool  

Page 70: Cal Poly - Data Management and the DMPTool

Questions?

Email: [email protected]
Tweet me: @carlystrasser
My slides: slideshare.net/carlystrasser

Website: dmptool.org
Twitter: @TheDMPTool
Blog: blog.dmptool.org