CS 696 Intro to Big Data: Tools and Methods Fall Semester, 2017
Doc 20 Python & Julia Dec 5, 2017
Copyright ©, All rights reserved. 2017 SDSU & Roger Whitney, 5500 Campanile Drive, San Diego, CA 92182-7700 USA. OpenContent (http://www.opencontent.org/openpub/) license defines the copyright on this document.
Python
Very popular
Lots of libraries
Slow
How Popular - Tiobe Index Nov 2017
Nov 2017  Nov 2016  Language           Ratings  Change
1         1         Java               13.2%    -5.5%
2         2         C                   9.3%    +0.1%
3         3         C++                 5.3%    -0.1%
4         5         Python              4.5%    +0.9%
5         4         C#                  3.0%    -0.6%
6         8         JavaScript          3.0%    +0.3%
7         6         Visual Basic .NET   2.9%    -0.2%
8         7         PHP                 1.9%    -1.2%
11        19        R                   1.6%    -0.1%
12        14        Matlab              1.6%    -0.3%
35        -         Julia               0.6%
39        -         Scala               0.5%
51-100    -         SPARK
Over Time
How Slow?
Benchmark      R      Python  Julia  Fortran
fib            533.5  77.8    2.1    0.7
quick sort     264.5  32.9    1.6    1.3
rand_mat_stat  14.6   17.9    1.7    1.5
Some Basics About Python's Implementation
def example():
    a = 10
    a = 'this is a string'
    a = [1,2,3]
    a = {'a':5,'b':6}
    return a

example()
How can the activation record hold an integer, a string, an array, and a dictionary?
a is a pointer to memory on the heap
All data are objects!
sys.getsizeof returns the size of the immediate object in bytes

Data              Call                          Size
Empty string      sys.getsizeof('')             49
Small integer     sys.getsizeof(1)              28
Larger integer    sys.getsizeof(1_073_741_824)  32
Float             sys.getsizeof(1.2)            24
Empty list        sys.getsizeof([])             64
Empty dictionary  sys.getsizeof({})             240

Python 3.6.1, Anaconda 4.4.0, macOS
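A minimal sketch to reproduce the table on your own machine; the exact numbers vary with the CPython version and platform:

```python
import sys

# Sizes are for the immediate object only and vary by CPython
# version and platform (the table above is Python 3.6 on macOS)
samples = {
    "empty string": '',
    "small integer": 1,
    "larger integer": 1_073_741_824,
    "float": 1.2,
    "empty list": [],
    "empty dict": {},
}

for name, value in samples.items():
    print(f"{name:15} {sys.getsizeof(value)} bytes")
```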
Objects are Not Good for Big Data
On a 64-bit processor an integer should take 8 bytes
Python 3 uses 28 bytes - 3.5 times more than needed
So memory can store only 29% as many integers as in C
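The overhead comes from boxing every integer as an object. The standard library's array module stores unboxed machine integers, much as C does; a small sketch:

```python
import array
import sys

boxed = list(range(1000))           # 1000 pointers to int objects
unboxed = array.array('q', boxed)   # 'q' = signed 64-bit C integers

print(unboxed.itemsize)             # 8 bytes per element
# note: getsizeof(boxed) counts only the list's pointer table; the
# 1000 boxed int objects add roughly 28 bytes each on top of it
print(sys.getsizeof(boxed), sys.getsizeof(unboxed))
```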
Memory Implementation
array = [1, 'Cat', 2.3]

(diagram: the list holds three pointers into the heap, one to each of the objects 1, 'Cat', and 2.3)

array uses 192 bytes
Objects are Bad for Big Data - Locality
Iterating over an array is common
We want the elements of the array to be next to each other in memory
to minimize page faults

(diagram: the list's element objects scattered across different memory pages)
Python Class
class FooBar:
    def __init__(self, x, y):
        self.foo = x
        self.bar = y

    def __add__(self, aFooBar):
        return FooBar(self.foo + aFooBar.foo, self.bar + aFooBar.bar)

    def __str__(self):
        return "FooBar(%i, %i)" % (self.foo, self.bar)

a = FooBar(2,3)
a.z = 10
Python Objects Store Fields in Dictionary
b = FooBar(3,1)

b.__dict__     # {'bar': 1, 'foo': 3}
get_size(b)    # 328

b.z = 12
b.__dict__     # {'bar': 1, 'foo': 3, 'z': 12}
get_size(b)    # 406

b.bar = 12
    Follow pointer to object in memory
    Compute hash of 'bar'
    Use hash to find location of 'bar' in dictionary
    Change value
Recursive Compute Total Size
import sys

def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully
    # handle self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum(get_size(v, seen) for v in obj.values())
        size += sum(get_size(k, seen) for k in obj.keys())
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum(get_size(i, seen) for i in obj)
    return size
Methods are Slower than Functions
(diagram: a class hierarchy A, B, C, D, E with foo defined at several levels; each call must search the inheritance chain for foo)

test = A()
test.foo()

test = D()
test.foo()

test = E()
test.foo()
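A rough way to see the cost is to time a plain function call against a method call with timeit; a sketch (absolute numbers and the size of the gap vary by machine and Python version):

```python
import timeit

def foo():
    return 1

class A:
    def foo(self):
        return 1

a = A()

# a.foo() must look 'foo' up on the instance, then the class, and
# build a bound-method object; foo() resolves a single global name
t_function = timeit.timeit('foo()', globals=globals(), number=1_000_000)
t_method = timeit.timeit('a.foo()', globals=globals(), number=1_000_000)
print(f"function: {t_function:.3f}s  method: {t_method:.3f}s")
```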
Why is Python Slow
Dynamically typed
Interpreted
Object model
Global Interpreter Lock (GIL)
http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
Dynamically Typed
# python code
a = 1
b = 2
c = a + b

To execute c = a + b the interpreter must:

1. Assign 1 to a
   1a. Set a->PyObject_HEAD->typecode to integer
   1b. Set a->val = 1
2. Assign 2 to b
   2a. Set b->PyObject_HEAD->typecode to integer
   2b. Set b->val = 2
3. Call binary_add(a, b)
   3a. Find typecode in a->PyObject_HEAD
   3b. a is an integer; value is a->val
   3c. Find typecode in b->PyObject_HEAD
   3d. b is an integer; value is b->val
   3e. Call binary_add<int, int>(a->val, b->val)
   3f. The result is an integer
4. Create a Python object c
   4a. Set c->PyObject_HEAD->typecode to integer
   4b. Set c->val to result
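The per-operation dispatch is visible in the bytecode. A small sketch using the standard dis module (opcode names vary by version: BINARY_ADD before Python 3.11, BINARY_OP from 3.11 on):

```python
import dis

def add(a, b):
    return a + b

# The generic add opcode must inspect the runtime types of both
# operands on every execution
dis.dis(add)
```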
Object Memory
Global Interpreter Lock (GIL)
Occurs in multithreaded CPython & PyPy code
CPython memory management is not thread safe
When a thread wants to execute Python bytecode it must first acquire the global lock (a mutex)
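A sketch that makes the GIL visible: splitting a CPU-bound loop across two threads gives no speedup, because only one thread may execute Python bytecode at a time (timings vary by machine):

```python
import threading
import time

def count_down(n):
    # pure-Python, CPU-bound loop
    while n > 0:
        n -= 1

N = 5_000_000

# one thread does all the work
start = time.perf_counter()
count_down(2 * N)
t_single = time.perf_counter() - start

# two threads split the work, but the GIL lets only one of them
# execute Python bytecode at any moment, so there is no speedup
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
t_threads = time.perf_counter() - start

print(f"single thread: {t_single:.2f}s  two threads: {t_threads:.2f}s")
```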
Some Solutions
NumPy & SciPy
Pandas
Cython
Numba
NumPy
Package for scientific computing with Python

N-dimensional array object
Broadcasting functions
Tools for integrating C/C++ and Fortran code
Linear algebra, Fourier transform, and random number capabilities
corrcoef(x[, y, rowvar, bias, ddof])
Pearson product-moment correlation coefficients.
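A minimal sketch of why the array object matters, assuming NumPy is installed: the Python list stores pointers to boxed int objects, while the ndarray stores contiguous machine integers and sums them in a C loop:

```python
import numpy as np

n = 1_000_000
py_list = list(range(n))                 # n pointers to boxed int objects
np_array = np.arange(n, dtype=np.int64)  # n contiguous 8-byte integers

# Both compute the same value, but ndarray.sum() runs as a C loop
# over contiguous memory instead of dereferencing n object pointers
print(sum(py_list))
print(int(np_array.sum()))
```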
Solving 4'th Order 2D Laplace Equation
Time in seconds:

Compiler/Package  n=50   n=100
Python            46.15  751.78
NumPy             0.61   6.39
Java              0.12   2.20

https://modelingguru.nasa.gov/docs/DOC-1762
Numpy Array
NumPy Array source
static PyObject *
array_new(PyTypeObject *subtype, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"shape", "dtype", "buffer", "offset", "strides",
                             "order", NULL};
    PyArray_Descr *descr = NULL;
    int itemsize;
    PyArray_Dims dims = {NULL, 0};
    PyArray_Dims strides = {NULL, 0};
    PyArray_Chunk buffer;
    npy_longlong offset = 0;
    NPY_ORDER order = NPY_CORDER;
    int is_f_order = 0;
    PyArrayObject *ret;

    buffer.ptr = NULL;
Python Calling C/C++ code
NumPy, SciPy, and Pandas implement their data structures & algorithms in C

If you just use existing algorithms, Python performance is competitive

If you implement new computational algorithms, Python is slow; implement them in C for performance
Pandas rule of thumb: have 5 to 10 times as much RAM as the size of your dataset - Wes McKinney, author of Pandas
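For comparison with Julia's ccall (shown later in these notes), Python can reach C directly through the standard ctypes module. A sketch assuming a POSIX platform, where loading None exposes the symbols already linked into the interpreter, including libc (on Windows the C runtime is loaded differently):

```python
import ctypes

# POSIX assumption: CDLL(None) gives access to symbols linked into
# the running interpreter, which include the C library
libc = ctypes.CDLL(None)
libc.strlen.restype = ctypes.c_size_t
libc.strlen.argtypes = [ctypes.c_char_p]

print(libc.strlen(b"hello"))  # 5
```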
Cython
Special compiler that compiles Python into C binaries
The C binaries still call the CPython interpreter & CPython libraries

def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

%%cython
def fib_cython(n):
    if n < 2:
        return n
    return fib_cython(n-1) + fib_cython(n-2)
Python:

def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

%timeit fib(20)
100 loops, best of 3: 2.6 ms per loop

Cython:

%%cython
def fib_cython(n):
    if n < 2:
        return n
    return fib_cython(n-1) + fib_cython(n-2)

1000 loops, best of 3: 815 µs per loop

Typed Cython:

%%cython
cpdef long fib_cython_type(long n):
    if n < 2:
        return n
    return fib_cython_type(n-1) + fib_cython_type(n-2)

10000 loops, best of 3: 35.2 µs per loop
Numba
JIT compiler for a subset of Python
Uses LLVM
Compiles to native machine code

from numba import jit

@jit
def fib_seq_numba(n):
    if n < 2:
        return n
    a, b = 1, 0
    for i in range(n-1):
        a, b = a+b, a
    return a

%timeit fib_seq_numba(20)
The slowest run took 1001183.05 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 149 ns per loop
fib(20)
Time in microseconds:

Python               2,600
Cython               815
Numba                149
Typed Cython         35
Julia (median of 4)  106
Julia (best of 4)    12.1
Julia 0.6.1
Python 3.6.1 Anaconda 4.4.0
MacBook Pro, 2.8 GHz Core i7
Python Performance
Python is Slow
Calling C code or compiling to C code produces fast code
Julia
New general-purpose computational language
Designed by people doing computational programming
Uses lessons learned from R, Matlab, and Python
Built with current CS technology
Fast & interactive
Julia - Getting Started
Books & tutorials: https://julialang.org/learning/
IDE: Julia Pro - https://juliacomputing.com
Julia - Current State
Currently version 0.6
Version 0.7 will be the 1.0 beta

Runs on Windows, Linux, MacOS
Free open source - MIT license

IDEs:
Command line - REPL
Juno = Atom + Julia
JuliaPro - free version at https://juliacomputing.com
Jupyter Notebook
JuliaBox - juliabox.com
Two Language Problem
Ease of use: interactive, simple syntax
Performance

The two-language problem: prototype in an easy interactive language, then rewrite in a fast one - Julia aims to be a single language that provides both
Language timeline:

Fortran 1957 (revisions: 77, 90, 95, 2003, 2008, 2015)
C 1972
C++ 1983
Matlab 1984
Python 1991
Java 1995
R 1995
LLVM 2003
NumPy 2005
Julia 2012
Julia vs Matlab
Open source & free
Faster
General-purpose computational language
Function names correspond to Matlab function names
Can call Matlab

Syntax is similar to Matlab:

x = 2
y = 3
x + y
Julia vs R
Faster
Not everything is a vector
Has statistical libraries
Can call R code
General purpose computational language
Julia Libraries
1,400+ packages (Julia 0.6)

Julia statistics & machine learning: 36 packages
Basic statistics, R datasets, distributions, R d-p-q-r functions, multivariate statistics, hypothesis tests, time series, statistical models, clustering, generalized linear models, data frames, local regression
Julia vs Python
Faster
Optional type checking
General-purpose computational language

function foo(x::Int64)::Int64
    w::Int64 = x + 1
    w
end

function bar(x)
    x + 1
end
Julia
Federal Reserve Bank model of the US economy
Ported from Matlab to Julia; about 10 times faster

Celeste
Analyzed 178 terabytes of images in 15 minutes
1.54 petaflops using 9,300 nodes

Petaflop club: C, C++, Fortran, Julia
Julia
Designed by and for people doing computational programming using current technology
Fast
Interactive
Simple syntax
Demo
Julia
Designed by and for people doing computational programming using current technology
Fast
Interactive
Simple syntax
Can call C/Fortran/Java/R/Python code
Libraries: statistics, ML, web, graphics, etc.
Lisp-like macros
Plays well with others
Call C & Fortran code directly when compiled as a shared library:

path = ccall((:getenv, "libc"), Cstring, (Cstring,), "SHELL")
unsafe_string(path)

Libraries to call C++, Matlab, Mathematica, Objective-C, Python, Pandas, R
Julia Super Powers
Interactive & fast
Types
LLVM + JIT
Multiple dispatch
Lisp-like macros
Types
Python: dynamic typing; types checked at run time
Scala: type inference; the compiler adds type declarations but still checks types
Julia: the compiler checks types and produces type-specific code
Optional Types
h(x::Int) = 3x^2 - 2*x + 4*π

f(x) = 3x^2 - 2x + 4π

f(0)    # calls f compiled for integers
f(2.3)  # calls f compiled for doubles (Float64)

Julia generates a version of f for each combination of argument types, so f is as fast as h
JIT
Like Java's HotSpot compiler, Julia generates optimized machine code at run time
The first call to a function compiles a type-specialized version; later calls reuse the compiled code

fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)

@time fib(20)   # 0.003403 seconds (503 allocations: 28.684 KiB)  <- first call includes compilation
@time fib(20)   # 0.000056 seconds (5 allocations: 176 bytes)
@time fib(20)   # 0.000060 seconds (5 allocations: 176 bytes)
Multiple Dispatch
No methods, just functions
Dispatch uses the function name and the types of all arguments

function factorial(n::Int)
    n >= 0 || error("n must be non-negative")
    n <= 1 && return 1
    n * factorial(n-1)
end

function factorial(s::String)
    s == "" && return 0
    product = BigInt(1)
    for c in s
        product = product * factorial(BigInt(Int(c)))
    end
    product
end
factorial("a")
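Python offers a limited, single-argument form of this through functools.singledispatch. A hypothetical sketch mirroring the factorial example above; unlike Julia, dispatch here looks only at the type of the first argument:

```python
from functools import singledispatch

# Dispatch happens on the type of the *first* argument only,
# unlike Julia, which dispatches on the types of all arguments
@singledispatch
def factorial(arg):
    raise TypeError(f"no factorial method for {type(arg)}")

@factorial.register(int)
def _(n):
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n <= 1 else n * factorial(n - 1)

@factorial.register(str)
def _(s):
    # mirrors the Julia String method: product of the factorials
    # of the character codes
    if s == "":
        return 0
    product = 1
    for c in s:
        product *= factorial(ord(c))
    return product

print(factorial(5))  # 120
```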
Using Multiple Cores - macro example
addprocs(2)
workers()   # [2, 3]
procs()     # [1, 2, 3]

function count_heads(n)
    c::Int = 0
    for i = 1:n
        c += rand(Bool)
    end
    c
end

a = @spawn count_heads(100000000)
b = @spawn count_heads(100000000)
fetch(a) + fetch(b)
HPAT.jl, ParallelAccelerator.jl
Intel Labs projects to provide high-level, efficient & fast parallel code

ParallelAccelerator.jl
Converts Julia code to C/C++ and imports the compiled C/C++ code back into Julia
Supports a subset of Julia

HPAT.jl
Uses ParallelAccelerator to convert Julia code to C/C++ & MPI calls for distributed computing
Sample using ParallelAccelerator:

using ParallelAccelerator

@acc function calc_pi(n)
    x = rand(n) .* 2.0 .- 1.0
    y = rand(n) .* 2.0 .- 1.0
    return 4.0 * sum(x.^2 .+ y.^2 .< 1.0) / n
end

calc_pi(10_000_000)
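For reference, a plain, unaccelerated Python version of the same Monte Carlo estimate of pi (a loop instead of the vectorized Julia expressions):

```python
import random

def calc_pi(n):
    # Monte Carlo estimate: the fraction of random points in the unit
    # square that land inside the unit circle approaches pi/4
    inside = 0
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / n

random.seed(0)
print(calc_pi(100_000))
```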