The Joy of SciPy

33
The Joy of SciPy PUB February 14, 2013 David Kammeyer

description

Presentation to Python Users Berlin on 2/14/2013

Transcript of The Joy of SciPy

Page 1: The Joy of SciPy

The Joy of SciPy

PUB February 14, 2013

David Kammeyer

Page 2: The Joy of SciPy

Brief History

Person Package Year

Jim Fulton Matrix Object in Python

1994

Jim Hugunin Numeric 1995

Perry Greenfield, Rick White, Todd Miller Numarray 2001

Travis Oliphant NumPy 2005

Page 3: The Joy of SciPy

SciPy 2001

Founded in 2001 with Travis Vaught

Eric JonesweaveclusterGA*

Pearu Petersonlinalg

interpolatef2py

Travis Oliphantoptimizesparse

interpolateintegratespecialsignalstats

fftpackmisc

Page 4: The Joy of SciPy

SciPy Ecosystem

Page 5: The Joy of SciPy

Community effort• Chuck Harris• Pauli Virtanen• David Cournapeau• Stefan van der Walt• Dag Sverre Seljebotn• Robert Kern• Warren Weckesser• Ralf Gommers• Mark Wiebe• Nathaniel Smith

Page 6: The Joy of SciPy

Why Python for Technical Computing

• Syntax (it gets out of your way)• Over-loadable operators• Complex numbers built-in early• Just enough language support for arrays• “Occasional” programmers can grok it• Supports multiple programming styles• Expert programmers can also use it effectively• Has a simple, extensible implementation• General-purpose language --- can build a system• Critical mass

Page 7: The Joy of SciPy

Putting Science back in Comp Sci

• Much of the software stack is for systems programming --- C++, Java, .NET, ObjC, web- Complex numbers?- Vectorized primitives?• Array-oriented programming has been

supplanted by Object-oriented programming• Software stack for scientists is not as helpful

as it should be• Fortran is still where many scientists end up

Page 8: The Joy of SciPy

NumPy: an Array-Oriented Extension• Data: the array object

– slicing and shaping – data-type map to Bytes

• Fast Math: – vectorization – broadcasting– aggregations

Page 9: The Joy of SciPy

Zen of NumPy

• strided is better than scattered• contiguous is better than strided• descriptive is better than imperative • array-oriented is better than object-oriented• broadcasting is a great idea • vectorized is better than an explicit loop• unless it’s too complicated --- then use Cython/Numba• think in higher dimensions

Page 10: The Joy of SciPy

Memory using Object-oriented

Object

Attr1

Attr2

Attr3

Object

Attr1

Attr2

Attr3

Object

Attr1

Attr2

Attr3

Object

Attr1

Attr2

Attr3

Object

Attr1

Attr2

Attr3

Object

Attr1

Attr2

Attr3

Page 11: The Joy of SciPy

Array-oriented (Table) approach

Attr1 Attr2 Attr3

Object1

Object2

Object3

Object4

Object5

Object6

Page 12: The Joy of SciPy

Benefits of Array-oriented

• Many technical problems are naturally array-oriented (easy to vectorize)• Algorithms can be expressed at a high-level• These algorithms can be parallelized more

simply (quite often much information is lost in the translation to typical “compiled” languages)• Array-oriented algorithms map to modern

hard-ware caches and pipelines.

Page 13: The Joy of SciPy

We need more focus on complied array-oriented languages with fast compilers!

Page 14: The Joy of SciPy

What is good about NumPy?• Array-oriented• Extensive Dtype System (including structures)• C-API• Simple to understand data-structure• Memory mapping• Syntax support from Python• Large community of users• Broadcasting• Easy to interface C/C++/Fortran code

Page 15: The Joy of SciPy

New Project

Blaze

NumPy

Next Generation NumPyOut-of-core

Distributed Tables

Page 16: The Joy of SciPy

Overview

Main Script

Processing Node

Processing Node

Processing Node

Processing Node

Processing Node

Code

Code

Code

Code

Page 17: The Joy of SciPy

Timeline (Available on GitHubNow!)

Date Milestone

July 2012 Pre-alpha release

December 2012 Early Beta Release

June 2013 Version 1.0

Page 18: The Joy of SciPy

Spectrogram Demo

Page 19: The Joy of SciPy

Introducing Numba (lots of kernels to write)

Page 20: The Joy of SciPy

NumPy Users

• Want to be able to write Python to get fast code that works on arrays and scalars• Need access to a boat-load of C-extensions

(NumPy is just the beginning)

PyPy doesn’t cut it for us!

Page 21: The Joy of SciPy

Dynamic compilationPython

Function

NumPy Runtime

Ufu

ncs

Gen

eral

ized

U

Func

s

Func

tion-

base

d In

dexi

ng

Mem

ory

Filte

rs

Win

dow

K

erne

l Fu

ncs

I/O F

ilter

s

Red

uctio

n Fi

lters

Com

pute

d C

olum

ns

Dynamic Compilation

function pointer

Page 22: The Joy of SciPy

SciPy needs a Python compiler

integrateoptimize

odespecial

writing more of SciPy at high-level

Page 23: The Joy of SciPy

Numba -- a Python compiler

• Replays byte-code on a stack with simple type-inference• Translates to LLVM (using LLVM-py)• Uses LLVM for code-gen• Resulting C-level function-pointer can be

inserted into NumPy run-time• Understands NumPy arrays• Is NumPy / SciPy aware

Page 24: The Joy of SciPy

NumPy + Mamba = Numba

LLVM 3.1

Intel Nvidia AppleAMD

OpenCLISPC CUDA CLANGOpenMP

LLVM-PY

Python Function Machine Code

Page 25: The Joy of SciPy

Examples

Page 26: The Joy of SciPy

Software Stack Future?

LLVM

Python

COBJCFORTRAN

R

C++

Plateaus of Code re-use + DSLs

MatlabSQL

TDPL

Page 27: The Joy of SciPy

Wakari/Numba Demo

Page 28: The Joy of SciPy

How to pay for all this?

Page 29: The Joy of SciPy

Dual strategy

Blaze

Page 30: The Joy of SciPy

NumFOCUS

Num(Py) Foundation for Open Code for Usable Science

Page 31: The Joy of SciPy

NumFOCUS

• Mission• To initiate and support educational programs

furthering the use of open source software in science.

• To promote the use of high-level languages and open source in science, engineering, and math research

• To encourage reproducible scientific research• To provide infrastructure and support for open

source projects for technical computing

Page 32: The Joy of SciPy

NumFOCUS

Core Projects

Other Projects (seeking more --- need representatives)

NumPy SciPy IPython Matplotlib

Scikits Image

Page 33: The Joy of SciPy

• Large-scale data analysis products• Anaconda, SciPy in a Box• Wakari.io -- Cloud Hosted SciPy• Python training (data analysis and

development)• NumPy support and consulting• Blaze, Numba, and More Development