JUPYTER ASCENDING - SEA · 2020. 1. 6. · JUPYTER ASCENDING: A PRACTICAL HAND GUIDE TO GALACTIC...
JUPYTER ASCENDING:
A PRACTICAL HAND GUIDE TO GALACTIC SCALE, REPRODUCIBLE DATA SCIENCE
John Fonner, PhD
University of Texas at Austin
April 5th, 2016
4/5/2016 1
Photos, Tweets, and hate mail all welcome!
Slides: tinyurl.com/FonnerSEA2016
Email: [email protected]
Twitter: @johnfonner
SCIENCE AS A SECOND THOUGHT
1. Formulate a theory
2. Gather data
3. Learn about data storage
4. Learn about data movement protocols
5. Lose data
6. Check out of rehab
7. Learn about backup and replication
8. Gather data
9. Learn about versioning
10. Start preliminary analysis
11. Buy a newer laptop
12. Buy more memory
13. Buy a desktop with more memory
14. Buy a bigger monitor & GPUs “for work”
15. Google “250GB Excel Spreadsheet”
16. Learn about batch processing
17. Learn about batch schedulers
18. Learn about patience
19. Learn more about data storage
20. Learn about distributed systems
21. Go back through notes to remember the science question
22. Learn R & Python
23. Learn Linux admin
24. Finish preliminary analysis
25. Grow a ponytail
26. Write a paper
27. Learn about data publishing
28. Learn about reproducibility
29. Plot the death of your advisor/dept. head
30. Apply for grants & research allocations on public systems
31. Wait to apply next time
32. Finish analyzing data
33. Reformulate your theory
34. Goto 1
SCIENTIFIC REPRODUCIBILITY
SOME ASSEMBLY REQUIRED…
SCIENTISTS, WITH FEW EXCEPTIONS, ARE NOT TRAINED PROGRAMMERS
Research is hard
Coding is hard
Research code is well designed, documented, leverages design patterns, highly reusable, portable, and usually open source.
ACCESSIBILITY >= CAPABILITY
For scientific reproducibility, the impact of your work depends more on its accessibility than on its capability
Domain grad students, not sys admins, are the early adopters
Where can we focus effort to create community around capability?
What has changed the least about the computation you do over the last 10 years?
What do we ask domain researchers to learn to use our tools and data?
Memory/CPU/Disk
Operating System
Applications
Interface
Decoupling the technology “stack”
“Reproducers”
• Web Browser
• GUIs
• Windows / Mac OS Support
• Sample Data and Sample Workflows
“Producers”
• Linux CLI
• Hadoop / GPFS / Lustre
• Clusters / Clouds / Containers
• Dockerfile / Makefile / Ansible
BACKEND INFRASTRUCTURE: SYSTEMS
Categorize systems as either Storage or Execution
Describe and support relevant protocols, directories, schedulers, and quotas
Each system definition includes the credentials used to log in to it (SSH keys, X.509 certificates, or username/password)
Register everything with a JSON document
http://agaveapi.co/documentation/tutorials/system-management-tutorial/
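To make "register everything with a JSON document" concrete, here is a minimal sketch of what a storage-system description might look like when built in Python. The field names, the host, and the system ID are illustrative assumptions and are not taken from the slides; the system-management tutorial linked above is the place to check the actual schema.

```python
import json

# Hypothetical storage-system description. Every field name and value here is
# an illustrative assumption about the document's shape, not the real schema.
storage_system = {
    "id": "demo.storage.example",        # hypothetical system ID
    "name": "Demo storage system",
    "type": "STORAGE",                   # the other category would be "EXECUTION"
    "storage": {
        "host": "storage.example.org",   # placeholder host
        "port": 22,
        "protocol": "SFTP",
        "rootDir": "/",
        "homeDir": "/home/demo",
        "auth": {
            "type": "SSHKEYS",
            "username": "demo",
            # Placeholders only: real key material should be read from disk,
            # never hard-coded in a notebook or script.
            "publicKey": "<contents of id_rsa.pub>",
            "privateKey": "<contents of id_rsa>",
        },
    },
}

# Serialize the description so it can be saved to a file or sent to a service.
system_json = json.dumps(storage_system, indent=2)
print(system_json)
```

Keeping the whole description in one document means the same file can be versioned alongside the analysis it supports.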
BACKEND INFRASTRUCTURE: APPS
An “App” is a versioned instance of a software package on a specific Execution System
App assets are bundled into a directory and stored on a Storage System
Apps can be private, shared with individual users, or made public
Public apps are compressed, assigned a checksum, and stored in a protected space
http://agaveapi.co/documentation/tutorials/app-management-tutorial/
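The "compressed, assigned a checksum" step can be sketched with the Python standard library. This is not how Agave itself publishes apps; it only illustrates the general idea of freezing an app directory into an archive whose SHA-256 digest can later be re-verified. The directory and file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def bundle_and_checksum(app_dir: Path, out_dir: Path) -> Tuple[Path, str]:
    """Zip app_dir into out_dir and return (archive path, SHA-256 hex digest)."""
    archive = Path(
        shutil.make_archive(str(out_dir / app_dir.name), "zip", root_dir=app_dir)
    )
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        # Read in 1 MiB chunks so large bundles do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return archive, digest.hexdigest()


def demo() -> Tuple[Path, str]:
    """Build a throwaway app directory and bundle it (hypothetical names)."""
    work = Path(tempfile.mkdtemp())
    app_dir = work / "demo-app-1.0"
    app_dir.mkdir()
    (app_dir / "wrapper.sh").write_text("#!/bin/bash\necho hello\n")
    out_dir = work / "published"   # kept outside app_dir so the zip excludes itself
    out_dir.mkdir()
    return bundle_and_checksum(app_dir, out_dir)


if __name__ == "__main__":
    archive, checksum = demo()
    print(archive.name, checksum)
```

Anyone who later downloads the archive can recompute the digest and confirm that the app they are running is the one that was published.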
BACKEND INFRASTRUCTURE: JOBS
A “Job” is an execution of an App with a specific set of input files and parameters
All jobs are given an ID, all inputs and parameters are preserved, output is also tracked
Jobs can be shared with others
http://agaveapi.co/documentation/tutorials/job-management-tutorial/
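The bookkeeping described on this slide (an ID per job, preserved inputs and parameters, tracked output) can be illustrated with a small in-memory record. The `JobRecord` class, its fields, and the app and file names are hypothetical and exist only for this sketch; they are not the platform's data model.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical provenance record for one execution of an app."""

    app_id: str
    inputs: Dict[str, str]
    parameters: Dict[str, str]
    # Every job gets its own identifier at creation time.
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    outputs: List[str] = field(default_factory=list)

    def rerun_request(self) -> dict:
        """Return a request that repeats this job with identical settings."""
        return {
            "appId": self.app_id,
            "inputs": dict(self.inputs),
            "parameters": dict(self.parameters),
        }


job = JobRecord(
    app_id="wordcount-1.0",              # hypothetical app name
    inputs={"text": "book.txt"},
    parameters={"min_length": "4"},
)
# frozen=True blocks reassigning fields, but the outputs list can still grow.
job.outputs.append("counts.tsv")
print(job.job_id, job.rerun_request())
```

Because the record carries everything needed to rebuild the request, sharing a job amounts to sharing a recipe that someone else can run again.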
DEVELOPER COMMAND-LINE TOOLS
https://bitbucket.org/agaveapi/cli
Requires bash and Python’s json.tool
Uses caching for authentication
Parses JSON responses to condense output
For a Linux user, this is home sweet home
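Python's `json.tool` module pretty-prints JSON from the command line (`python -m json.tool`). The "condense output" idea can be sketched in a few lines of Python as well. The response envelope below (a `status` string plus a `result` list) and the sample IDs are assumptions made for illustration, and `condense` is a hypothetical helper, not part of the CLI.

```python
import json
from typing import List

# Assumed response shape: a "status" string and a "result" list of objects.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "result": [
    {"id": "demo.storage.example", "type": "STORAGE", "public": false},
    {"id": "demo.exec.example", "type": "EXECUTION", "public": true}
  ]
}
"""


def condense(response_text: str, key: str = "id") -> List[str]:
    """Reduce a verbose JSON response to one field per result entry."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise RuntimeError(f"request failed: {payload.get('message', 'no message')}")
    return [str(item[key]) for item in payload["result"]]


# One line per system, which is easier to scan or pipe into other tools.
print("\n".join(condense(SAMPLE_RESPONSE)))
```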
WHAT ABOUT JUPYTER?
Bleeding edge research will never be on a webpage
Data exploration “outside the app” also needs to be captured
An infrastructure for responsible computing at scale inevitably must support responsible data exploration
Jupyter has broad OS support, domain adoption, domain libraries, and a more interactive UI
AGAVEPY
github.com/TACC/agavepy
Pythonic wrapper for all Agave endpoints
pip install agavepy
Developers actively “dogfooding” the module
(Obviously) usable within Jupyter
Has had greater uptake by users (not just developers)
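A rough sketch of what using the module from a notebook could look like follows. The import path `agavepy.agave.Agave`, the constructor keyword arguments, and the `apps.list()` call reflect my recollection of the AgavePy README and should be treated as assumptions to verify there. The `isPublic` field and the helper names are likewise hypothetical. The offline stand-in client lets the helper run without an account or network access.

```python
from types import SimpleNamespace
from typing import Any, List


def connect(api_server: str, username: str, password: str) -> Any:
    """Create an AgavePy client (needs `pip install agavepy` and network access).

    The keyword arguments are an assumption; check the AgavePy README.
    """
    from agavepy.agave import Agave  # lazy import so the rest runs without it

    return Agave(api_server=api_server, username=username, password=password)


def public_app_ids(client: Any) -> List[str]:
    """Return IDs of apps flagged public, for any client exposing apps.list()."""
    return [app["id"] for app in client.apps.list() if app.get("isPublic")]


# Offline stand-in that mimics the one call used above (hypothetical data).
fake_client = SimpleNamespace(
    apps=SimpleNamespace(
        list=lambda: [
            {"id": "wordcount-1.0", "isPublic": True},
            {"id": "private-tool-0.1", "isPublic": False},
        ]
    )
)
print(public_app_ids(fake_client))
```

Writing helpers against "any object with `apps.list()`" keeps notebook code testable without credentials.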
AGAVE-AWARE JUPYTERHUB
Going one step further – give users a notebook
jupyter.public.tenants.prod.agaveapi.co/
(Free) account creation here:
public.tenants.prod.agaveapi.co/create_account
Beta implementation at the moment:
• Data is purged during updates
• Capacity on the current VM is limited
All notebooks run inside Docker containers
WHAT’S NEXT?
Full-featured developer portal
Open-source reference implementation of an Angular JavaScript portal built on Agave
Additional Jupyter notebook examples
Production-grade support for a hosted JupyterHub
THANKS!
QUESTIONS?
Slides: tinyurl.com/FonnerSEA2016
Email: [email protected]
Twitter: @johnfonner
TACC: www.tacc.utexas.edu
Agave: www.agaveapi.co
AgavePy: github.com/TACC/agavepy