Data Science Team Collaboration: Forget About Meeting Me Halfway, Take Me the Last Mile | AnacondaCON 2017
![Page 1: Data Science Team Collaboration: Forget About Meeting Me Halfway, Take Me the Last Mile | AnacondaCON 2017](https://reader035.fdocuments.net/reader035/viewer/2022070600/58ce822a1a28ab210a8b5c57/html5/thumbnails/1.jpg)
DATA SCIENCE TEAM COLLABORATION
FORGET ABOUT MEETING ME HALFWAY, TAKE ME THE LAST MILE
#OpenDataScienceMeans #AnacondaCON Ian.Stokes-Rees @ijstokes
OGT molecular dynamics simulation: protein “mouth” opening, 1 µs
CERN computing facility, Geneva, Switzerland
SUCCESS COMES FROM TEAM WORK
http://bit.ly/ac17-collab
SUCCESS COMES FROM TEAM WORK
IAN: ENGINEER, PHYSICIST, BIOLOGIST?
• Ian Stokes-Rees, @ijstokes
• Product Marketing Manager
• Computational Scientist
• Passionate advocate of Open Data Science
• Educator and evangelist for use of Python and Anaconda
FIRST TASTE OF “BIG DATA” COMPUTING
• 100,000 acoustic tri-phone models
• 100 parameters per model
• 10 million parameters to estimate
• adaptation = real-time adjustment
• computation = tricky!
PhD on CERN LHCb COMPUTING TEAM
Distributed computing infrastructure
• 1000s of concurrent users
• 100s of federated computing centers
• no centralized control
• 1M+ servers with software installed
• 20+ year life span
• 20 GB of data per second
• 14 hours per day
• 7 days a week
• 7 months of the year
March 26, 2010 LHCb first physics at 3.5 TeV
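The throughput figures on this slide can be turned into a rough daily and yearly volume. The sketch below is a back-of-envelope upper bound, assuming the 20 GB/s rate is sustained for all 14 operating hours and that "7 months of the year" means roughly 210 days (30 days per month is an assumption, not from the slide). The slides do not say whether 20 GB/s is measured before or after any filtering, so treat the result as raw throughput rather than stored data.

```python
# Back-of-envelope raw data volume from the LHCb figures on the slide.
RATE_GB_PER_S = 20      # "20 GB of data per second"
HOURS_PER_DAY = 14      # "14 hours per day", "7 days a week"
DAYS_PER_MONTH = 30     # assumption: the slide gives months, not days
MONTHS_PER_YEAR = 7     # "7 months of the year"

seconds_per_day = HOURS_PER_DAY * 3600
gb_per_day = RATE_GB_PER_S * seconds_per_day
pb_per_day = gb_per_day / 1_000_000   # decimal units: 1 PB = 10^6 GB
pb_per_year = pb_per_day * DAYS_PER_MONTH * MONTHS_PER_YEAR

print(f"{pb_per_day:.2f} PB/day, ~{pb_per_year:.0f} PB/year")
# prints: 1.01 PB/day, ~212 PB/year
```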
HOW DO CERN PHYSICISTS DO THIS?
• Some smart people over there
  • Who brought us the Web, HTTP, and HTML?
• Big Data
  • Multi-PB per year
• Large collaborating teams
  • 1000s of people accessing systems
• Computation critical
  • Or there is no way to make sense of the data
  • And discover new physics
December 2, 2016: LHCb proton-lead collisions
CERN ATLAS detector: calorimeter end-cap wiring harness
Millions of data feeds @ 40 MHz signal rate
HOW WOULD YOU DO IT?
Custom hardware: CMS L0 muon trigger ASIC
Giant compute and storage clusters
Wicked fast algorithms written in Fortran and C
Python: the Swiss army knife for computational physics
PYTHON: LINGUA FRANCA FOR DATA SCIENCE
• Human readable
• Easy to learn
• Object oriented
• Cleanly wraps C and Fortran
• Amazing foundation of high-quality data science libraries
• Suitable for scripting, algorithms, data processing and applications
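The "cleanly wraps C and Fortran" point can be illustrated with `ctypes` from the standard library, which calls functions in an existing shared C library without writing any extension code. This is a minimal sketch assuming a POSIX system; the slide does not say which wrapping tool the author had in mind, and for Fortran the usual route in the scientific Python stack is NumPy's `f2py`, which is not shown here.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None, CDLL(None)
# falls back to the symbols already linked into the running interpreter
# (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Declare the C signature: int abs(int j);
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.strlen(b"AnacondaCON"))  # 11
print(libc.abs(-42))                # 42
```

Declaring `argtypes` and `restype` up front lets `ctypes` convert and check arguments instead of guessing, which is most of what "cleanly" buys you at this level.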
THE CALCULUS OF NEWTON AND LEIBNIZ
SOMETIMES ESOTERIC IS OK
HERMITS AND HIGH PRIESTS
NPS, Richard Proenneke 1985
MOLECULAR BIOLOGY: FROM PROTONS TO PROTEINS
• It takes 3-9 months in the wet lab to prepare protein samples
• Once prepared, it takes only a few days to “image” those samples and produce digitized representations
• However, the “images” aren’t yet 3D atomic models
• That takes from weeks to months to complete, sitting behind a computer
• You may know it as protein folding
Nature, 2011, PMID: 21240259; Lazarus, Nam, Jiang, Sliz, Walker
HOW DO WE ACCELERATE THE TIME TO INSIGHT?
SUCCESS COMES FROM TEAM WORK
WHAT DOES “HALF WAY” LOOK LIKE?
Today’s “good” data science environment:
• Provide high-performance computing resources
• For example, Hadoop infrastructure
• Deploy a wide selection of the most popular analysis software
• Training and documentation
• Technical support
FISH OUT OF WATER
• Why would we take an expert biochemist and force them to be
  • A software engineer?
  • An IT system administrator?
  • A statistician?
• What can we do to let them focus on being a great biochemist?
FISH OUT OF WATER
• Why would we take an expert business analyst and force them to be
  • A software engineer?
  • An IT system administrator?
  • A statistician?
• What can we do to let them focus on being a great business analyst?
SUCCESS COMES FROM TEAM WORK
TAKE ME THE LAST MILE
• DevOps engineer pre-configures scalable computation
• Laptop to server to cluster
• DevOps team is a partner, not a service provider
• Software engineer creates and customizes software for the task, project or individual
• Avoiding generic, static software setups
• Data scientist composes workflow
• Analyst is provided with a simple, high-level interface
• With option to “drill down”
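One way to get "laptop to server to cluster" without rewriting the workflow is to write it against an executor interface rather than a specific machine. The sketch below uses Python's standard `concurrent.futures`; `score` and `run_workflow` are hypothetical names, and this is an illustration of the idea, not the tooling described in the talk. A thread pool is used here, `ProcessPoolExecutor` is a drop-in replacement on a multi-core server, and a cluster backend would need to provide an object with the same `Executor` interface.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List


def score(x: int) -> int:
    """Hypothetical unit of work standing in for a real analysis step."""
    return x * x


def run_workflow(work: Callable[[int], int],
                 inputs: Iterable[int],
                 executor: Executor) -> List[int]:
    # The workflow depends only on the Executor interface, so the same code
    # can run on a laptop thread pool, a server process pool, or any other
    # backend implementing concurrent.futures.Executor.
    # Executor.map returns results in input order.
    return list(executor.map(work, inputs))


with ThreadPoolExecutor(max_workers=4) as pool:
    results = run_workflow(score, range(5), pool)

print(results)  # [0, 1, 4, 9, 16]
```

The DevOps team chooses and configures the executor; the data scientist's code stays the same.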
WHAT ABOUT THOSE PROTEINS?
• Normally it takes 10-200 hours of computing time to match a “template” protein fragment to the imaging data
• There are 100k templates (known protein “folds”) to choose from
• “Be stupid” and just try them all: sometimes you’ll be surprised!
• I spent 18 months working with biochemists and IT sysadmins across the country to create a sensible parallel and distributed workflow
• 4-40 hours of wall-clock time to run a 2k-20k hour parallel computation
• Real-time updates of results
• Web-based interface to access summary and detailed data visualizations
• Analysis performed in Jupyter Notebook, allowing customization
• File-system based, to enable “drill down” and direct access
• 6M hours per year (~700 years), peak parallelism of 20k cores
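The "be stupid and try them all" strategy is an embarrassingly parallel sweep: every template fit is independent, so the work can be spread across as many cores as are available, and writing one small result file per task supports the file-system based "drill down" and incremental result updates listed on this slide. The sketch below is a hypothetical illustration of that pattern, not the author's actual workflow: `fit_template` is a toy stand-in that returns a made-up score instead of running a real fit, and a local thread pool stands in for the distributed infrastructure.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fit_template(template_id: int) -> float:
    """Hypothetical stand-in for a long template fit; returns a toy score."""
    return ((template_id * 37) % 101) / 100.0


def run_one(template_id: int, out_dir: Path) -> Path:
    # Each task writes its own small file, so partial results are visible
    # (and inspectable by hand) while the rest of the sweep is still running.
    score = fit_template(template_id)
    path = out_dir / f"template_{template_id:06d}.json"
    path.write_text(json.dumps({"template": template_id, "score": score}))
    return path


def sweep(template_ids, out_dir: Path, workers: int = 8) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Try every template; the tasks share nothing, so order does not matter.
        list(pool.map(lambda t: run_one(t, out_dir), template_ids))


def top_hits(out_dir: Path, n: int = 3):
    # Summaries are rebuilt from the files, so a web view or notebook can
    # read the same directory at any point during the run.
    records = [json.loads(p.read_text())
               for p in sorted(out_dir.glob("template_*.json"))]
    return sorted(records, key=lambda r: r["score"], reverse=True)[:n]


with tempfile.TemporaryDirectory() as tmp:
    results_dir = Path(tmp) / "results"
    sweep(range(200), results_dir)
    n_files = len(list(results_dir.glob("template_*.json")))
    best = top_hits(results_dir)

print(n_files, best[0])
```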
DATA SCIENCE PATTERN
• How is it done today?
• What is the opportunity for improvement?
• Prototype and evaluate – is it better? Rinse and repeat
• Standardize and automate the workflow/model
• Scale the workflow/model
• Preprocess and distribute the data
• Instrument execution and set quality metrics
• Establish easy access interface
• Create programmatic APIs
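The "instrument execution and set quality metrics" step can be as light as wrapping each workflow step so that every run records how long it took and whether its output met an agreed threshold. The sketch below is a hypothetical illustration: `instrumented`, `completeness`, `clean` and the in-memory `RUN_LOG` are made-up names, and a real deployment would send these records to whatever monitoring the team already uses.

```python
import functools
import time
from typing import Callable, Dict, List

# Hypothetical in-memory run log; a real setup would ship records elsewhere.
RUN_LOG: List[Dict] = []


def instrumented(metric: Callable[[object], float], threshold: float):
    """Wrap a workflow step: time it, compute a quality metric on its output,
    and flag runs whose metric falls below the threshold."""
    def decorator(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = step(*args, **kwargs)
            elapsed = time.perf_counter() - start
            quality = metric(result)
            RUN_LOG.append({
                "step": step.__name__,
                "seconds": elapsed,
                "quality": quality,
                "passed": quality >= threshold,
            })
            return result
        return wrapper
    return decorator


def completeness(rows) -> float:
    # Fraction of rows that contain no missing (None) values.
    return sum(all(v is not None for v in r) for r in rows) / len(rows) if rows else 0.0


@instrumented(metric=completeness, threshold=0.9)
def clean(rows):
    # Hypothetical preprocessing step: drop rows whose first field is missing.
    return [r for r in rows if r[0] is not None]


out = clean([(1, "a"), (None, "b"), (3, None)])
print(RUN_LOG[-1]["step"], RUN_LOG[-1]["quality"], RUN_LOG[-1]["passed"])
# prints: clean 0.5 False
```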
FIN
SUCCESS COMES FROM TEAM WORK
Remember the footnote? Collaborative cross-functional teams
BREAKING DATA SCIENCE OPEN
ANACONDA & COLLABORATION
STEP 1: ANACONDA
http://continuum.io/downloads
NOTEBOOKS FOR DATA SCIENCE COLLABORATION
Do you understand why notebooks are so popular? There are many angles to this, but my take:
• Visual record of the data science process
• They tell a story, and support rich hyperlinked prose
• Data can be embedded
• Algorithms or analysis techniques are captured
• Interactive visualizations are inline
• Sharable
• Reproducible*
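The asterisk on "Reproducible" is a fair warning: a shared notebook only reproduces if the reader can rebuild the environment it ran in. A small, hedged step in that direction is to record the interpreter and package versions inside the notebook itself. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); `environment_snapshot` is a hypothetical helper and the package list is illustrative. In an Anaconda setup, `conda env export` captures the full environment more completely.

```python
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Record interpreter and package versions so a shared notebook's results
    can be tied to the environment that produced them."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


snap = environment_snapshot(["numpy", "pandas", "hypothetical-missing-package"])
print(snap["python"], snap["packages"])
```

Putting this output in the first cell makes version drift visible the moment a collaborator reruns the notebook.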
STEP 2: ANACONDA CLOUD
http://anaconda.org
STEP 2: (MY) ANACONDA CLOUD
http://anaconda.org/ijstokes
STEP 3: ANACONDA ENTERPRISE (TODAY)
STEP 3: ANACONDA ENTERPRISE (COMING SOON)
ANACONDA: GIVING SUPERPOWERS TO THE PEOPLE WHO CHANGE THE WORLD
TEAMS
THANK YOU! QUESTIONS?
Ian Stokes-Rees @ijstokes
http://bit.ly/ac17-collab