Post on 24-May-2020

Fast numerics in Python - NumPy and PyPy

Maciej Fijałkowski

SEA, NCAR

22 February 2012

fijal PyPy

What is this talk about?

- What is PyPy and why?
- Numeric landscape in Python
- What we achieved in PyPy
- Where we’re going

What is PyPy?

- An efficient implementation of the Python language
- A framework for writing efficient dynamic language implementations
- An open source project with a lot of volunteer effort, released under the MIT license
- Agile development, 13000 unit tests, continuous integration, sprints, distributed team
- I’ll talk today about the first part (mostly)

PyPy status right now

- An efficient just-in-time compiler for the Python language
- Relatively “good” on numerics (compared to other dynamic languages)
- Example - real time video processing
- 2-300x faster on Python code

Why should you care?

- If I write this stuff in C/Fortran/assembler it’ll be faster anyway
- maybe, but ...

Why should you care? (2)

- Experimentation is important
- Implementing something faster, in human time, leaves more time for optimizations and improvements
- For novel algorithms, clearer implementation makes them easier to evaluate (Python often is cleaner than C)

Sometimes makes it possible in the first place

Why would you care even more?

- Growing community
- Everything is for free with reasonable licensing
- There are many smart people out there addressing hard problems

Example of why you would care

- You spend a year writing optimized algorithms for a GPU
- Next year a new generation of GPUs comes along
- Your algorithms are no longer optimized

- Alternative - express your algorithms
- Leave low-level details to people who have nothing better to do

... like me (I don’t know enough physics to do the other part)

Numerics in Python

- numpy - for array operations
- scipy, scikits - various algorithms, also exposing C/Fortran libraries
- matplotlib - pretty pictures
- ipython

There is an entire ecosystem!

Which I don’t even know very well

- PyCUDA
- pandas
- mayavi

What’s important?

- There is an entire ecosystem built by people
- It’s available for free, no shady licensing
- It’s being expanded
- It’s growing
- It’ll keep up with hardware advancements

Problems with numerics in Python

Stuff is reasonably fast, but...

- Only if you don’t actually write much Python
- Array operations are fine as long as they’re vectorized
- Not everything is expressible that way
- NumPy allocates intermediates for each operation, which is suboptimal
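
The point about intermediates can be sketched in a few lines. This is only an illustration, not NumPy's actual implementation: plain Python lists stand in for arrays so the snippet runs without any dependencies, and the function names `eager_style` and `fused_loop` are made up for this example.

```python
def eager_style(a, b, c):
    """Mimic eager array evaluation of a + b*c: one full pass per operator."""
    tmp = [y * z for y, z in zip(b, c)]      # intermediate holding b*c
    return [x + t for x, t in zip(a, tmp)]   # second pass computes a + tmp


def fused_loop(a, b, c):
    """Compute a + b*c in a single pass with no intermediate sequence."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i] * c[i]
    return out


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]
result_eager = eager_style(a, b, c)
result_fused = fused_loop(a, b, c)
```

Both functions return the same values; the difference is that the eager version allocates and walks an extra temporary for every operator in the expression.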

Our approach

- Build a tree of operations
- Compile assembler specialized for aliasing and operations
- Execute the specialized assembler
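
A rough sketch of the first and last steps, under loud assumptions: the classes `Array` and `BinOp` and the function `force` are hypothetical names invented here, and the tree is simply interpreted element by element. The approach described on the slide compiles the tree to specialized assembler instead, which this sketch does not attempt.

```python
import operator

_OPS = {"+": operator.add, "*": operator.mul}


class Node:
    """Base class: arithmetic on nodes records the operation, it does not compute."""

    def __add__(self, other):
        return BinOp("+", self, other)

    def __mul__(self, other):
        return BinOp("*", self, other)


class Array(Node):
    """Leaf of the tree: holds concrete data."""

    def __init__(self, data):
        self.data = list(data)

    def size(self):
        return len(self.data)

    def item(self, i):
        return self.data[i]


class BinOp(Node):
    """Inner node: an operator applied to two subtrees."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def size(self):
        return self.left.size()

    def item(self, i):
        # Evaluate the whole expression for element i; no intermediate arrays.
        return _OPS[self.op](self.left.item(i), self.right.item(i))


def force(expr):
    """Walk the result once, computing each element from the tree."""
    return [expr.item(i) for i in range(expr.size())]


a = Array([1.0, 2.0, 3.0])
b = Array([4.0, 5.0, 6.0])
c = Array([0.5, 0.5, 0.5])
expr = a + b * c          # builds a tree, computes nothing yet
result = force(expr)      # single pass over the data
```

Deferring evaluation is what makes the later steps possible: once the whole expression is known, it can be turned into one loop rather than one loop per operator.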

Examples

- a, b, c are single dimensional arrays
- a+a would generate different code than a+b
- a+b*c is as fast as a loop
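
One way to picture why a+a and a+b lead to different code is a "signature" that records both the operations and which operands are the same object. The sketch below is an assumption about how such a key could look, not a description of PyPy's internals; expressions are written as nested tuples and `signature` is a hypothetical helper.

```python
def signature(expr, seen=None):
    """Return a string describing the operations and the aliasing in expr.

    expr is either a list (an array leaf) or a tuple (op, left, right).
    """
    if seen is None:
        seen = {}
    if isinstance(expr, tuple):
        op, left, right = expr
        return "(%s%s%s)" % (signature(left, seen), op, signature(right, seen))
    # Number each distinct array object in order of first appearance,
    # so the same object used twice gets the same number.
    index = seen.setdefault(id(expr), len(seen))
    return "arr%d" % index


a = [1.0, 2.0]
b = [1.0, 2.0]   # equal contents, but a different object than a
c = [3.0, 4.0]

sig_aa = signature(("+", a, a))
sig_ab = signature(("+", a, b))
sig_abc = signature(("+", a, ("*", b, c)))
```

Here `sig_aa` is `(arr0+arr0)` while `sig_ab` is `(arr0+arr1)`, so a cache of compiled loops keyed on the signature would keep separate entries for the aliased and non-aliased cases.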

Performance comparison

|       | NumPy | PyPy | GCC          |
|-------|-------|------|--------------|
| a+b   | 0.6s  | 0.4s | 0.3s (0.25s) |
| a+b+c | 1.9s  | 0.5s | 0.7s (0.32s) |
| 5+    | 3.2s  | 0.8s | 1.7s (0.51s) |

Pathscale is actually slower

Performance comparison SSE

Branch only so far!

|       | PyPy SSE | PyPy | GCC          |
|-------|----------|------|--------------|
| a+b   | 0.3s     | 0.4s | 0.3s (0.25s) |
| a+b+c | 0.35s    | 0.5s | 0.7s (0.32s) |
| 5+    | 0.36s    | 0.8s | 1.7s (0.51s) |

Status

- This works reasonably well
- Far from implementing the entire numpy, although it’s in progress
- Assembler generation backend needs work
- Vectorization in progress

Status benchmarks - slightly more complex

- laplace solution
- solutions:
  - NumPy: 4.3s
    - looped: too long to run, ~2100s
  - PyPy: 1.6s
    - looped: 2.5s
  - C: 0.9s
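
The slides do not show the benchmark code, so the following is only a guess at what a "looped" Laplace solver looks like: a plain Jacobi relaxation on a small grid stored as nested lists. The scheme, the grid size, the boundary values and the names `laplace_sweep` and `solve` are all assumptions made for illustration.

```python
def laplace_sweep(grid):
    """One Jacobi sweep: each interior point becomes the mean of its four neighbours."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]            # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def solve(grid, sweeps):
    """Apply a fixed number of sweeps and return the relaxed grid."""
    for _ in range(sweeps):
        grid = laplace_sweep(grid)
    return grid


n = 5
grid = [[0.0] * n for _ in range(n)]
grid[0] = [1.0] * n                           # top edge held at 1, other edges at 0
result = solve(grid, 200)
```

Element-by-element loops like the two nested `for` statements are the kind of code that the timings above label "looped".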

Progress plan

- Express operations in high-level languages
- Let us deal with low level details

- However, retain knobs and buttons for advanced users
- Don’t get penalized too much for not using them

A few words about the future

Predictions are hard

- Especially when it comes to the future
- Take this with a grain of salt

This is just the beginning...

- PyPy is an easy platform to experiment with
- We did not spend a whole lot of time dealing with the low-level optimizations
- Automatic vectorization over multiple threads
- SSE, GPU, dynamic offloading
- Optimizations based on machine cache size
- We’re running a fundraiser, make your employer donate money

Q&A

http://pypy.org/

http://buildbot.pypy.org/numpy-status/latest.html

http://morepypy.blogspot.com/

Any questions?
