Oscon 2011 Practicing Open Science

Practicing Open Science
William J Schroeder, Kitware, Inc.
Brian Wylie, Sandia National Labs
Marcus Hanwell, Kitware, Inc.


Page 2: Oscon 2011 Practicing Open Science

Speakers & Topics

§  William Schroeder, President & CEO, Kitware, Inc. -  The whys and hows of Open Science

§  Dr. Marcus Hanwell, R&D Engineer, Kitware, Inc. -  Building an open-source research program (in Chemistry)

§  Brian Wylie, Sandia National Labs -  Research collaborations from a government perspective

Page 3: Oscon 2011 Practicing Open Science

The Scientific Method

•  Document    •  Share  

•  Data  •  Methodology  

•  Archive  

Galileo Galilei 1613

Page 4: Oscon 2011 Practicing Open Science

Open Science

§  Open Documents
-  Hypothesis
-  Descriptions
-  Results

§  Open Data

§  Open Methodology
-  Experimental apparatus
-  Software
-  Workflow
-  Parameter Sets

Ensuring reproducibility

If it isn’t reproducible, it isn’t science

[Diagram: REPRODUCIBILITY. Positive evidence: accumulate support. Negative evidence: disprove hypothesis.]

Page 5: Oscon 2011 Practicing Open Science

Example: OSA Interactive Science Publishing (ISP)

§  Augmented PDF
§  Contains links to executable viewer
§  Downloads data and viewer as necessary to reproduce paper images (results)

Page 6: Oscon 2011 Practicing Open Science

Example: Insight Journal

§  Timely publishing of publications, data, and software
§  Evaluated automatically; further reviewed by community

[Diagram labels: Author; Code; Input Data; PDF doc; Web Site; Journal Git Repository; Build Machines; Results Data]

Page 7: Oscon 2011 Practicing Open Science

Benefits of Open Science

§  Collaboration
-  Leveraging international communities and expertise

§  Agile Innovation
-  Facilitate technology mashups
-  Move science to application faster
-  More focus on technology; less on protection

§  Business Models
-  Growing the pie, creating new opportunities
-  Customization, software integration

“…much of our intelligence and creativity results from interactions with tools and artifacts and from collaborating with other individuals.”

-- Shneiderman

Page 8: Oscon 2011 Practicing Open Science

Example: Collaboration

§  NIH National Center of Biomedical Computing NA-MIC
§  Developing the OS NA-MIC Kit; 3D Slicer application

Page 9: Oscon 2011 Practicing Open Science

Example: Agile Innovation (Open Source for Medical Imaging)

Creating VTK (Visualization Toolkit)

Led to the creation of:
-  ITK
-  VolView
-  BioImageXD
-  OsiriX
-  MedINRIA
-  VisTrails
-  NIH / NCI caBIG – XIP
-  VR-Renderer
-  IGSTK
-  ParaView
-  Etc.

and finally…

Page 10: Oscon 2011 Practicing Open Science

Example: Business Models

§  Kitware: Building open source collaboration platforms
-  The usual support and training
-  Consulting
-  Engaging in collaborative R&D
-  Providing technology integration services, aka creating custom solutions

CMake

CDash

Page 11: Oscon 2011 Practicing Open Science

The Open Technology Highway

§  Provide an open infrastructure
-  Support research, teaching, non-profit and commercial activities
-  Any (legal) activity can hang off of the highway
-  Spur innovation, create opportunities
-  Get from idea to product faster
-  Do not have to replicate technology
-  Too many toll gates (i.e., closed systems, unreasonable IP) slow everything down
-  Prefer non-reciprocal licenses

Page 12: Oscon 2011 Practicing Open Science

Next Up

§  Marcus: Building a research program for chemistry

§  Brian: open science and research collaboration from a government perspective

Page 13: Oscon 2011 Practicing Open Science

Open Chemistry
Growing a Research Program Through Open Source
Dr. Marcus Hanwell, Kitware, Inc.

Page 14: Oscon 2011 Practicing Open Science

Grass Roots Effort

§  Bootstrapped several efforts without funding
-  Spare time
-  Parts of other projects when possible

§  Formed an “unorganization” – Blue Obelisk
-  Published first article in 2005
-  Open data, open standards and open source
-  Meet at ACS and other conferences when possible
-  Follow-up article currently in press

§  Quixote collaboration more recently
-  Provide meaningful data storage and exchange
-  Principally targeting computational chemistry

Page 15: Oscon 2011 Practicing Open Science

The Early Years

§  Avogadro project started in 2006

§  First funded work in 2007 by Marcus Hanwell
-  Google Summer of Code student
-  Final year of Ph.D., spent the summer coding
-  Funded as part of KDE project – Kalzium editor

§  Built on several other open source projects
-  Qt, Eigen, Open Babel, Blue Obelisk Data Repository

§  Also uses open standards, such as OpenGL for rendering

§  Cross-platform, open source stack

Page 16: Oscon 2011 Practicing Open Science

Community Tools, Standards and Resources

§  Make extensive use of Qt for standard GUI elements
-  Much more than just GUI – multithreading, web resources
-  Avogadro chosen as an outstanding example of “Qt in Use”
-  Marcus Hanwell recently chosen as a “Qt Ambassador”

§  OpenGL for cross-platform 3D rendering
-  Accelerated rendering of 3D molecular geometry
-  Facilitates interacting with the scene
-  Use of GLSL for impressive, fast rendering

§  Open Babel for chemical input/output and more
-  There are a lot of chemical file formats…
-  Has a lot of chemical knowledge, e.g. bond perception

§  Git for distributed version control
-  We work across multiple sites, time zones and institutions
-  Gerrit for code review more recently – improving code quality

Page 17: Oscon 2011 Practicing Open Science

Evangelizing: Getting the Message Out

§  Traditional social media used to communicate
-  Blogs, Planets, Twitter, Identi.ca, Friendfeed, Google+

§  Talks and posters at conferences
-  Open source conferences talking about chemistry
-  Chemistry conferences talking about open source chemistry

§  Several meetings and workshops about open chemistry
-  Daresbury Laboratory: Chemical Visualization and Quixote
-  NIH National Cancer Institute – Databases and Open Chemistry

§  Publications in the traditional journals

§  Screencasts showing off what the software can do

§  In-person workshops and training sessions

Page 18: Oscon 2011 Practicing Open Science

Bringing About Real Change

§  2011 is the “International Year of Chemistry”

§  Chemistry has been quite closed traditionally

§  We are working hard to change this

§  Recently led a Phase I SBIR to develop “open chemistry tools”
-  GUI acting as the center of the chemical workflow
-  Database application using MongoDB, chemically aware
-  Cluster integration on the desktop – submit, monitor and retrieve

§  Chemical simulation/calculation now biggest HPC user in military

§  Open tools can use both open and closed computational codes
-  Largely written in Fortran to run on clusters
-  NWChem recently open sourced – PNNL quantum code
-  Already work with GAMESS, GAMESS-UK, Q-Chem, Gaussian…

§  The time is right for change in chemistry
-  Opportunity to accelerate the rate of research

Page 19: Oscon 2011 Practicing Open Science

Funding Open Chemistry Tools

§  Kitware’s core business is based on “open collaboration platforms”

§  Led a Phase I Small Business Innovation Research project (US Army)
-  Invited to apply for Phase II funding, currently pending

§  Make use of Apache and BSD licenses
-  Allow for participation of a wider cross-section of the community
-  Reduced licensing complications
-  Important for industry and government collaboration

§  Successfully taken part in Google Summer of Code – funded students
-  Student in 2007 working on Avogadro and Kalzium
-  Mentor for KDE in 2008-2010
-  VTK organization administrator and mentor in 2011

§  Looking to other funding agencies and collaborations in future

Page 20: Oscon 2011 Practicing Open Science

Developing in Niche Areas

§  The population of active researchers in chemistry is relatively small
-  The number of those researchers who code is even smaller
-  Of those, the number that wish to contribute to open source is tiny

§  Developing and nurturing these communities can be challenging

§  Some students develop a feature in a summer and disappear

§  Other professors might develop code over the summers

§  Have to lower the barrier to entry as much as possible

§  Often need to help with tools, build systems, etc

Page 21: Oscon 2011 Practicing Open Science

Enabling Technologies in Chemistry

§  Large number of computational chemistry codes
-  Many do not have dedicated user interfaces
-  Forming a new area enabling chemical workflows
-  Some of the open source codes that can benefit
   -  NWChem – quantum chemistry code
   -  Quantum Espresso – plane wave code
-  Free-for-use codes such as GAMESS
-  Commercial codes such as Molpro, Q-Chem, others
-  These codes are executed in a separate process

§  Libraries that can be used in the GUI:
-  The Visualization Toolkit (VTK) provides advanced rendering
-  ParaView library provides client-server technology for large data

Page 22: Oscon 2011 Practicing Open Science

Working With Academia, Industry and Government

§  In the past licensing has not been ideal
-  Some form of GPL or non-commercial only license fine for most academics
-  Industry and government need more liberal licenses in general, e.g. BSD, Apache 2

§  Can be challenging to ensure everyone gets something out of the deal

§  Avoiding the trap of dual-licensing – often kills community and shared ownership

§  Funders can find it harder to understand commercialization

§  We normally employ a services/consulting role

Page 23: Oscon 2011 Practicing Open Science

Government Open Source Collaborations

Brian Wylie
Sandia National Laboratories

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Page 24: Oscon 2011 Practicing Open Science

Government Open Source Resources

•  GOSCON: Government Open Source Conference (goscon.org)

•  Open Source Center: Foreign open source intelligence data (opensource.gov)

•  Open Source Software Institute: Non-profit corp/govt/acad (oss-institute.org)

•  Government Open Source Software Resource Centre (gossrc.org)

•  Center for Strategic and International Studies (tracks open source legislation, csis.org)

Page 25: Oscon 2011 Practicing Open Science

Government Open Source Around the World

[Chart: Open Source Initiatives by Region (2000-2009). Regions: Europe, Asia, Latin America, North America, Africa, Middle East. Series: Failed, Proposed, Approved. Vertical axis: 0 to 180.]

Data courtesy of the Center for Strategic and International Studies

Page 26: Oscon 2011 Practicing Open Science

Government Open Source Example Projects

Open source data analysis and visualization platform

Sandia, Los Alamos

Kitware

University of Utah

Page 27: Oscon 2011 Practicing Open Science

Government Open Source Example Projects

Sandia

Kitware

Indiana University, Stanford

Page 28: Oscon 2011 Practicing Open Science

Government Open Source Collaboration Benefits

Government
-  No specific vendor “lock-in/out”
-  Allows a diversified development team
-  Known code base (strengths and weaknesses)
-  Typically easier to integrate with other OS tools
-  Improvement of the OS project

Commercial
-  Money
-  Leveraging project for other/future work
-  Improvement of the OS project

Academic
-  Student/Professor support
-  Publishing/Sharing
-  Improvement of the OS project

Page 29: Oscon 2011 Practicing Open Science

Government Open Source Collaboration Issues

-  Need to relax into existing OS license*
-  New projects should pick a liberal OS license
-  Funding source may hesitate on Open Source
-  Proprietary projects / Intellectual Property
-  Government bureaucracy
-  Mixed software skill set
-  Deliverables can get distorted
*  No gov’t sell back clause
-  Work may not be publication material
-  If you do publish, it may be a joint publication

[Column labels: Government, Commercial, Academic]

Page 30: Oscon 2011 Practicing Open Science

Government Open Source Questions Section

Page 31: Oscon 2011 Practicing Open Science

Contact Information

§  Will Schroeder [email protected]

§  Brian Wylie [email protected]

§  Marcus Hanwell [email protected]

Page 32: Oscon 2011 Practicing Open Science

(view included video)