Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

© 2016 Continuum Analytics - Confidential & Proprietary Python for Data: Past, Present, and Future Peter Wang CTO, Co-founder Anaconda / Continuum Analytics

Transcript of Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Page 1: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

© 2016 Continuum Analytics - Confidential & Proprietary

Python for Data: Past, Present, and Future

Peter Wang CTO, Co-founder Anaconda / Continuum Analytics

Page 2: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

© 2017 Anaconda, Inc.

Agenda

• Our Journey with Anaconda
• Why Python for Data?
• The Future

Page 3: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

My Journey with Anaconda

Page 4: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

About Peter

• Degree in Physics (Cornell Univ.)
• Computer graphics developer (C, C++)
• Scientific Python developer and consultant (Chaco, Traits, …)
• Founded Continuum Analytics in 2012 with Travis Oliphant
• Launched / Created: PyData conferences and community, Anaconda distribution, conda package manager, Bokeh web visualization, Blaze data library
• Thinks a lot about the future of Python for data+science, machine learning

Page 5: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

When we started 5 years ago…

Page 6: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

The birth of conda…

“Guido, please help convince core dev to work with us to solve the packaging problem!”

“Meh. Feel free to solve it yourselves.”

Page 7: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

• 500+ Popular Python Packages
• Optimized & Compiled
• Free for Everyone
• Extensible via Conda Package Manager
• Sandbox Packages & Libraries
• Cross-Platform – Windows, Linux, Mac
• Not just Python - over 230 R packages

Page 8: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

[Figure: Anaconda & Miniconda Downloads. Monthly downloads in thousands (y-axis 0 to 3,500), January 2015 through July 2017, with one series each for Anaconda and Miniconda.]

Over 20 Million Downloads

Page 9: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

The Growth of Data Science - Python Leading the Way

https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Page 10: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Other Problems in 2012…

• Performance: You had to choose between a vectorized system like NumPy, or going to Cython or wrapping C code. No nice JIT like Julia.
  • We created Numba
• No system for building simple data-driven web apps, like Shiny for R.
  • We created Bokeh, to serve as both Shiny and D3 for Python
• No easy parallelism, or intrinsic parallel primitives like Spark.
  • We created Dask, which has parallel arrays and dataframes.
  • Also solves the “data doesn't fit in RAM” problem.
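
The performance trade-off in the first bullet can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the function names and sample data are made up for the example, and the fallback `njit` shim is an assumption added so the snippet still runs (without any speedup) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    def njit(func):
        # Hypothetical fallback: return the function unchanged so the
        # sketch runs without Numba (plain Python speed, no compilation).
        return func


@njit
def sum_of_squares_loop(values):
    # Explicit element-by-element loop. Slow in plain Python,
    # but compiled to machine code when Numba is available.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


def sum_of_squares_vectorized(values):
    # The NumPy route: express the whole computation as one array operation.
    return float(np.dot(values, values))


data = np.arange(1000, dtype=np.float64)
loop_result = sum_of_squares_loop(data)
vector_result = sum_of_squares_vectorized(data)
print(loop_result, vector_result)
```

Both functions compute the same value. The point of the JIT approach is that the loop version can stay ordinary Python code rather than being rewritten in Cython or C.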

Page 11: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Python in 2017

• Everyone is learning it, major universities are teaching it
• Proven in production at Serious Places, not merely hip startups
• Vastly outstrips scripting language rivals like Ruby, Perl
• Growing faster than pure analysis langs like R, SAS, Matlab
• Data science, machine learning application is taking off like a rocket
• Python is most popular language for Deep Learning, the most rapidly-innovating area of machine learning
• Python 2 vs 3 rift is less of an issue for most people

Page 12: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

https://www.youtube.com/watch?v=nU09j2gGHYg

Page 13: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Why Python for Data?

Page 14: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

[Figure: timeline with year labels 1968, 1973, 1974, 1981, 1991, 1993, 1996, 2005 and text labels SQL and Numeric.]

Page 15: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Python & ABC

It is interactive, structured, high-level, and intended to be used instead of BASIC, Pascal, or AWK.

It is not meant to be a systems-programming language but is intended for teaching or prototyping.

Page 16: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Analyst

• Uses graphical tools
• Can call functions, cut & paste code
• Can change some variables

Gets paid for: Insight

Excel, VB, Tableau, Python

Analyst / Data Developer

• Builds simple apps & workflows
• Used to be "just an analyst"
• Likes coding to solve problems
• Doesn't want to be a "full-time programmer"

Gets paid (like a rock star) for: Code that produces insight

SAS, R, Matlab, Python

Programmer

• Creates frameworks & compilers
• Uses IDEs
• Degree in CompSci
• Knows multiple languages

Gets paid for: Code

C, C++, Java, JS, Python

Page 17: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Data science != Software Development

• VERY common misconception
• Python is probably the most misunderstood language
• There are “tribes” and ecosystems in Python: web dev, scipy, pydata, embedded, scripting, 3D graphics, etc.
• But businesses tend to pigeonhole it:
  • IT/software/data engineering view: competes with Java, C#, Ruby…
  • Analytics, stats, data science view: competes with R, SAS, Matlab, SPSS, BI systems

Page 18: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Era of Data Literacy

• Data exploration and analysis are going to be a new kind of literacy that will be required to do great work in any field.
• Language is a human instinct and is a natural path to insight. We see this in our interaction with Python/PyData users, whose passion chiefly stems from this expressiveness and agility.
• An analytical language is “thoughtware”, not “software”.

Page 19: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Page 20: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

What’s Next?

Page 21: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

A Few Predictions

• Python will become a preferred way to develop cognitive applications: online model learning and training
• There will be a steady income stream for people who want to maintain Python 2.x codebases
• Multi-language interoperability will be greatly improved once people adopt the Apache Arrow format for storing data. This means Python code running alongside Java/Scala/JVM will not be a second-class citizen.
• Constant improvements in memory and storage, as well as GPUs, mean that people will continue doing lots of Python locally on big workstations.

Page 22: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)
Page 23: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Open Source and Developers

• Not about licenses
• Empowering people & communities to innovate
• Aligns us with users, customers, innovators
• “Software is eating the world”
• Open source is eating software

Page 24: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)
Page 25: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Open Source and Businesses

• Not about cost of software (“capital expense”)
• Not even about maintenance of software (“operational expense”)
• Core business goals:
  • Avoid lock-in
  • Harness innovation

Page 26: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)
Page 27: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

5 Years 25+ Conferences 100s of talks

Page 28: Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

Questions?
