NumPy Essentials - Sample Chapter

Transcript of NumPy Essentials - Sample Chapter

Page 1: NumPy Essentials - Sample Chapter

Community Experience Distilled

Boost your scientific and analytic capabilities in no time at all by discovering how to build real-world applications with NumPy

NumPy Essentials

Leo (Liang-Huan) Chin, Tanmay Dutta

NumPy Essentials

In today's world of science and technology, it's all about speed and flexibility. When it comes to scientific computing, NumPy tops the list by giving you both speed and high productivity.

This book will walk you through NumPy using clear, step-by-step examples and theory. We will focus on the fundamentals of NumPy, including array objects, functions, and matrices with practical examples.

You will then learn about different NumPy modules while performing operations such as calculating the Fourier Transform; solving linear systems of equations, interpolation, extrapolation, regression, and curve fitting; and evaluating integrals and derivatives. We will introduce you to using Cython with NumPy arrays and writing extension modules for NumPy code using the C API. This book will give you exposure to the vast NumPy library and help you build efficient, high-speed programs.

Who this book is written for

If you are an experienced Python developer who intends to drive your numerical and scientific applications with NumPy, this book is for you. Prior experience or knowledge of working with the Python language is required.

$29.99 US / £19.99 UK

Prices do not include local sales tax or VAT where applicable

What you will learn from this book

Manipulate the key attributes and universal functions of NumPy

Utilize matrix and mathematical computation using linear algebra modules

Implement regression and curve fitting for models

Perform time frequency / spectral density analysis using the Fourier Transform modules

Collate with the distutils and setuptools modules used by other Python libraries

Integrate Cython and NumPy

Write extension modules for NumPy code using the C API

Build sophisticated data structures using NumPy arrays with libraries such as Pandas and SciPy

Visit www.PacktPub.com for books, eBooks, code, downloads, and PacktLib.

Free Sample

In this package, you will find:

The authors' biography

A preview chapter from the book, Chapter 1 'An Introduction to NumPy'

A synopsis of the book’s content

More information on NumPy Essentials

About the Authors

Leo (Liang-Huan) Chin is a data engineer with more than 5 years of experience in the field of Python. He works for Gogoro smart scooter, Taiwan, where his job entails discovering new and interesting biking patterns. His previous work experience includes ESRI, California, USA, which focused on spatial-temporal data mining. He loves data, analytics, and the stories behind data and analytics. He received an MA degree of GIS in geography from State University of New York, Buffalo. When Leo isn't glued to a computer screen, he spends time on photography, traveling, and exploring some awesome restaurants across the world. You can reach Leo at http://chinleock.github.io/portfolio/.

Tanmay Dutta is a seasoned programmer with expertise in programming languages such as Python, Erlang, C++, Haskell, and F#. He has extensive experience in developing numerical libraries and frameworks for investment banking businesses. He was also instrumental in the design and development of a risk framework in Python (pandas, NumPy, and Django) for a wealth fund in Singapore. Tanmay has a master's degree in financial engineering from Nanyang Technological University, Singapore, and a certification in computational finance from Tepper Business School, Carnegie Mellon University.

Preface

Whether you are new to scientific/analytic programming, or a seasoned expert, this book will provide you with the skills you need to successfully create, optimize, and distribute your Python/NumPy analytical modules.

Starting from the beginning, this book will cover the key features of NumPy arrays and the details of tuning the data format to make it most fit to your analytical needs. You will then get a walkthrough of the core and submodules that are common to various multidimensional, data-typed analysis. Next, you will move on to key technical implementations, such as linear algebra and Fourier analysis. Finally, you will learn about extending your NumPy capabilities for both functionality and performance by using Cython and the NumPy C API. The last chapter of this book also provides advanced materials to help you learn further by yourself.

This guide is an invaluable tutorial if you are planning to use NumPy in analytical projects.

What this book covers

Chapter 1, An Introduction to NumPy, is a Getting Started chapter of this book, which provides the instructions to help you set up the environment. It starts with introducing the Scientific Python Module family (SciPy Stack) and explains the key role NumPy plays in scientific computing with Python.

Chapter 2, The NumPy ndarray Object, covers the essential usage of NumPy ndarray object, including the initialization, the fundamental attributes, data types, and memory layout. It also covers the theory underneath the operation, which gives you a clear picture of ndarray.

Chapter 3, Using NumPy Arrays, is an advanced chapter on NumPy ndarray usage, which continues Chapter 2, The NumPy ndarray Object. It covers the universal functions in NumPy and shows you the tricks to speed up your code. It also shows you the shape manipulation and broadcasting rules.

Chapter 4, NumPy Core and Libs Submodules, includes two sections. The first section has detailed explanation about the relationship between the way NumPy ndarray allocates memory and the interaction of CPU cache. The second part of this chapter covers the special NumPy Array containing multiple data types (the structure/record array). Also, this chapter explores the experimental datetime64 module in NumPy.

Chapter 5, Linear Algebra in NumPy, starts by utilizing matrix and mathematical computation using linear algebra modules. It shows you multiple ways to solve a mathematical problem: using Matrix, vector decomposition, and polynomials. It also provides concrete practice for curve fitting and regression.

Chapter 6, Fourier Analysis in NumPy, covers the signal processing with NumPy FFT module and the Fourier application on amplifying signals/enlarging images without distortion. It also provides the basic usage of the matplotlib package in Python.

Chapter 7, Building and Distributing NumPy Code, covers the basic details around packaging and publishing the code in Python. It provides a basic introduction to NumPy-specific setup files and how to build extension modules.

Chapter 8, Speeding Up NumPy with Cython, introduces the users to the Cython programming language and introduces readers to techniques that can be used to speed up existing Python code.

Chapter 9, Introduction to the NumPy C-API, provides a basic introduction to the NumPy C API and, in general, how to write wrappers around the existing C/C++ library. The chapter aims to provide a gentle introduction along with equipping the readers with a basic knowledge of how to create new wrappers and understand the existing programs.

Chapter 10, Further Reading, is the last chapter of this book. It gives a summary of what we've learned in the book and explores 4 SciPy stack Python modules relying on NumPy arrays, which give you ideas about further scientific Python programming.

Chapter 1: An Introduction to NumPy

“I'd rather do math in a general-purpose language than try to do general-purpose programming in a math language.”

- John D Cook

Python has become one of the most popular programming languages in scientific computing over the last decade. The reasons for its success are numerous, and these will gradually become apparent as you proceed with this book. Unlike many other mathematical languages, such as MATLAB, R and Mathematica, Python is a general-purpose programming language. As such, it provides a suitable framework to build scientific applications and extend them further into any commercial or academic domain. For example, consider a (somewhat) simple application that requires you to write a piece of software that predicts the popularity of a blog post. Usually, these would be the steps that you'd take to do this:

1. Generating a corpus of blog posts and their corresponding ratings (assuming that the ratings here are suitably quantifiable).
2. Formulating a model that generates ratings based on content and other data associated with the blog post.
3. Training a model on the basis of the data you found in step 1. Keep doing this until you are confident of the reliability of the model.
4. Deploying the model as a web service.

Normally, as you move through these steps, you will find yourself jumping between different software stacks. Step 1 requires a lot of web scraping. Web scraping is a very common problem, and there are tools in almost every programming language to scrape the web (if you are already using Python, you would probably choose Beautiful Soup or Scrapy). Steps 2 and 3 involve solving a machine learning problem and require the use of sophisticated mathematical languages or frameworks, such as Weka or MATLAB, which are only a few of the vast variety of tools that provide machine learning functionality. Similarly, step 4 can be implemented in many ways using many different tools. There isn't one right answer. Since this is a problem that has been amply studied and solved (to a reasonable extent) by a lot of scientists and software developers, getting a working solution would not be difficult. However, there are issues, such as stability and scalability, that might severely restrict your choice of programming languages, web frameworks, or machine learning algorithms in each step of the problem.

This is where Python wins over most other programming languages. All the preceding steps (and more) can be accomplished with only Python and a few third-party Python libraries. This flexibility and ease of developing software in Python is precisely what makes it a comfortable host for a scientific computing ecosystem. A very interesting interpretation of Python's prowess as a mature application development language can be found in Python Data Analysis, Ivan Idris, Packt Publishing. Put precisely, Python is a language that is used for rapid prototyping, and it is also used to build production-quality software because of the vast scientific ecosystem it has acquired over time. The cornerstone of this ecosystem is NumPy.

Numerical Python (NumPy) is a successor to the Numeric package. It was originally written by Travis Oliphant to be the foundation of a scientific computing environment in Python. It branched off from the much wider SciPy module in early 2005 and had its first stable release in mid-2006. Since then, it has enjoyed growing popularity among Pythonistas who work in the mathematics, science, and engineering fields. The goal of this book is to make you conversant enough with NumPy so that you're able to use it and can build complex scientific applications with it.

The scientific Python stack

Let's begin by taking a brief tour of the Scientific Python (SciPy) stack.

Note that SciPy can mean a number of things: the Python module named scipy (http://www.scipy.org/scipylib), the entire SciPy stack (http://www.scipy.org/about.html), or any of the three conferences on scientific Python that take place all over the world.

Figure 1: The SciPy stack, standard, and extended libraries

Fernando Perez, the primary author of IPython, said in his keynote at PyCon Canada 2012:

“Computing in science has evolved not only because software has evolved, but also because we, as scientists, are doing much more than just floating point arithmetic.”

This is precisely why the SciPy stack boasts such rich functionality. The evolution of most of the SciPy stack is motivated by teams of scientists and engineers trying to solve scientific and engineering problems in a general-purpose programming language. A one-line explanation of why NumPy matters so much is that it provides the core multidimensional array object that is necessary for most tasks in scientific computing. This is why it is at the root of the SciPy stack. NumPy provides an easy way to interface with legacy Fortran and C/C++ numerical code using time-tested scientific libraries, which we know have been working well for decades. Companies and labs across the world use Python to glue together legacy code that has been around for a long time. In short, this means that NumPy allows us to stand on the shoulders of giants; we do not have to reinvent the wheel. It is a dependency for every other SciPy package. The NumPy ndarray object, which is the subject of the next chapter, is essentially a Pythonic interface to data structures used by libraries written in Fortran, C, and C++. In fact, the internal memory layouts used by NumPy ndarray objects implement C and Fortran layouts. This will be addressed in detail in upcoming chapters.
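As a small illustration of the C and Fortran memory layouts mentioned above, the following sketch builds the same 2x3 array in both orders and inspects how each is laid out. The variable names are illustrative only, and the stride values shown assume 8-byte integers.

```python
import numpy as np

# A 2x3 array of 8-byte integers stored in C (row-major) order.
c_arr = np.arange(6, dtype=np.int64).reshape(2, 3)

# The same values copied into Fortran (column-major) order.
f_arr = np.asfortranarray(c_arr)

# Strides are the number of bytes to step along each axis.
print(c_arr.flags['C_CONTIGUOUS'], c_arr.strides)  # True (24, 8)
print(f_arr.flags['F_CONTIGUOUS'], f_arr.strides)  # True (8, 16)

# The logical contents are identical; only the memory layout differs.
print(np.array_equal(c_arr, f_arr))  # True
```

Because both layouts are supported, an array can usually be handed to C or Fortran routines in the order they expect, with a copy made only when the layout has to change.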

The next layer in the stack consists of SciPy, matplotlib, IPython (the interactive shell of Python; we will use it for the examples throughout the book, and details of its installation and usage will be provided in later sections), and SymPy modules. SciPy provides the bulk of the scientific and numerical functionality that a major part of the ecosystem relies on. Matplotlib is the de facto plotting and data visualization library in Python. IPython is an increasingly popular interactive environment for scientific computing in Python. In fact, the project has had such active development and enjoyed such popularity that it is no longer limited to Python and extends its features to other scientific languages, particularly R and Julia. This layer in the stack can be thought of as a bridge between the core array-oriented functionality of NumPy and the domain-specific abstractions provided by the higher layers of the stack. These domain-specific tools are commonly called SciKits; popular ones among them are scikit-image (image processing), scikit-learn (machine learning), statsmodels (statistics), pandas (advanced data analysis), and so on.

Listing every scientific package in Python would be nearly impossible since the scientific Python community is very active, and there is always a lot of development happening for a large number of scientific problems. The best way to keep track of projects is to get involved in the community. It is immensely useful to join mailing lists, contribute to code, use the software for your daily computational needs, and report bugs. One of the goals of this book is to get you interested enough to actively involve yourself in the scientific Python community.

The need for NumPy arrays

A fundamental question that beginners ask is: why are arrays necessary for scientific computing at all? Surely, one can perform complex mathematical operations on any abstract data type, such as a list. The answer lies in the numerous properties of arrays that make them significantly more useful. In this section, let's go over a few of these properties to emphasize why something such as the NumPy ndarray object exists at all.

Representation of matrices and vectors

The abstract mathematical concepts of matrices and vectors are central to many scientific problems. Arrays provide a direct semantic link to these concepts. Indeed, whenever a piece of mathematical literature makes reference to a matrix, one can safely think of an array as the software abstraction that represents the matrix. In scientific literature, an expression such as A_ij is typically used to denote the element in the ith row and jth column of array A. The corresponding expression in NumPy would simply be A[i,j]. For matrix operations, NumPy arrays also support vectorization (details are addressed in Chapter 3, Using NumPy Arrays), which speeds up execution greatly. Vectorization makes the code more concise, easier to read, and much more akin to mathematical notation. Like matrices, arrays can be multidimensional too. Every element of an array is addressable through a set of integers called indices, and the process of accessing elements of an array with sets of integers is called indexing. This functionality can indeed be implemented without using arrays, but this would be cumbersome and quite unnecessary.
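The following minimal sketch shows both ideas on a small matrix: element access with A[i, j] and a vectorized expression, next to the equivalent nested-list loop. The array values and variable names are made up for illustration. Note that NumPy indices start at 0, whereas mathematical notation usually starts at 1.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Indexing: the element in row i, column j (both zero-based).
i, j = 1, 2
element = A[i, j]
print(element)  # 6.0

# Vectorization: one expression is applied to every element at once.
B = 2 * A + 1

# The same computation written with explicit loops over nested lists.
B_loop = [[2 * x + 1 for x in row] for row in A.tolist()]

print(B.tolist() == B_loop)  # True
```

The vectorized form reads almost like the formula B = 2A + 1, while the loop version has to spell out the traversal.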

Efficiency

Efficiency can mean a number of things in software. The term may be used to refer to the speed of execution of a program, its data retrieval and storage performance, its memory overhead (the memory consumed when a program is executing), or its overall throughput. NumPy arrays are better than most other data structures with respect to almost all of these characteristics (with a few exceptions such as pandas DataFrames or SciPy's sparse matrices, which we shall deal with in later chapters). Since NumPy arrays are statically typed and homogeneous, fast mathematical operations can be implemented in compiled languages (the default implementation uses C and Fortran). Efficiency (the availability of fast algorithms working on homogeneous arrays) makes NumPy popular and important.
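To make "statically typed and homogeneous" concrete, here is a short sketch that inspects an array's single element type and its memory footprint, then sums it with a compiled NumPy routine. The data are arbitrary; no timings are quoted because they depend on the machine.

```python
import numpy as np

values = list(range(1000))               # a plain Python list of ints
arr = np.array(values, dtype=np.int64)   # one fixed type for every element

print(arr.dtype)     # int64
print(arr.itemsize)  # 8 (bytes per element)
print(arr.nbytes)    # 8000 (bytes for the whole data buffer)

# Both sums agree; the array version runs in compiled code.
total_list = sum(values)
total_arr = arr.sum()
print(total_list == total_arr)  # True
```

To compare speeds on your own machine, you could time the two sums with IPython's %timeit magic.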

Ease of development

The NumPy module is a powerhouse of off-the-shelf functionality for mathematical tasks. It adds greatly to Python's ease of development. The following is a brief summary of what the module contains, most of which we shall explore in this book. A far more detailed treatment of the NumPy module is in the definitive Guide to NumPy, Travis Oliphant. The NumPy API is so flexible that it has been adopted extensively by the scientific Python community as the standard API to build scientific applications. Examples of how this standard is applied across scientific disciplines can be found in The NumPy Array: a structure for efficient numerical computation, Van Der Walt, and others:

Submodule     Contents
numpy.core    Basic objects
lib           Additional utilities
linalg        Basic linear algebra
fft           Discrete Fourier transforms
random        Random number generators
distutils     Enhanced build and distribution
testing       Unit testing
f2py          Automatic wrapping of the Fortran code
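As a quick taste of a few of the submodules listed above, the sketch below calls one routine each from linalg, fft, random, and testing. The inputs are toy values chosen so the results are easy to check by hand. The seed 0 is arbitrary.

```python
import numpy as np

# numpy.linalg: solve the system 2x + y = 3, x + 3y = 5.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)  # approximately [0.8 1.4]

# numpy.fft: a constant signal has all of its energy in frequency bin 0.
spectrum = np.fft.fft(np.ones(4))
print(spectrum.real)  # [4. 0. 0. 0.]

# numpy.random: three reproducible samples from the interval [0, 1).
rng = np.random.RandomState(0)
sample = rng.rand(3)

# numpy.testing: raises an AssertionError if the solution does not fit.
np.testing.assert_allclose(A.dot(x), b)
```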

NumPy in Academia and Industry

It is said that, if you stand at Times Square long enough, you will meet everyone in the world. By now, you must have been convinced that NumPy is the Times Square of SciPy. If you are writing scientific applications in Python, there is not much you can do without digging into NumPy. Figure 2 shows the scope of SciPy in scientific computing at varying levels of abstraction. The red arrow denotes the various low-level functions that are expected of scientific software, and the blue arrow denotes the different application domains that exploit these functions. Python, armed with the SciPy stack, is at the forefront of the languages that provide these capabilities.

A Google Scholar search for NumPy returns nearly 6,280 results. Some of these are papers and articles about NumPy and the SciPy stack itself, and many more are about NumPy's applications in a wide variety of research problems. Academics love Python, which is showcased by the increasing popularity of the SciPy stack as the primary language of scientific programming in countless universities and research labs all over the world. The experiences of many scientists and software professionals have been published on the Python website:

Figure 2: Python versus other languages

Code conventions used in the book

Now that the credibility of Python and NumPy has been established, let's get our hands dirty.

The default environment used for all Python code in this book will be IPython. Instructions on how to install IPython and other tools follow in the next section. Throughout the book, you will only have to enter input in either the command window or the IPython prompt. Unless otherwise specified, code will refer to Python code, and command will refer to bash or DOS commands.

All Python input code will be formatted in snippets like these:

In [42]: print("Hello, World!")

In [42]: in the preceding snippet indicates that this is the 42nd input to the IPython session. Similarly, all input to the command line will be formatted as follows:

$ python hello_world.py

On Windows systems, the same command will look something like this:

C:\Users\JohnDoe> python hello_world.py

For the sake of consistency, the $ sign will be used to denote the command-line prompt, regardless of OS. Prompts, such as C:\Users\JohnDoe>, will not appear in the book. While the $ sign conventionally indicates a bash prompt on Unix systems, the same commands (typed without the dollar sign or any other prompt character) can be used on Windows too. If you are using Cygwin or Git Bash, you should also be able to use Bash commands on Windows.

Note that Git Bash is available by default if you install Git on Windows.

Installation requirements

Let's take a look at the various requirements we need to set up before we proceed.

Using Python distributions

The three most important Python modules you need for this book are NumPy, IPython, and matplotlib. The code in this book is compatible with Python 3.4 and 2.7 and is based on NumPy version 1.9 and matplotlib 1.4.3. The easiest way to install these requirements (and more) is to install a complete Python distribution, such as Enthought Canopy, EPD, Anaconda, or Python(x,y). Once you have installed any one of these, you can safely skip the remainder of this section and should be ready to begin.

Note for Canopy users: You can use the Canopy GUI, which includes an embedded IPython console, a text editor, and IPython notebook editors. When working with the command line, for best results use the Canopy Terminal found in Canopy's Tools menu.

Note for Windows OS users: Besides the Python distribution, you can also install the prebuilt Windows Python extension packages from Christoph Gohlke's website at http://www.lfd.uci.edu/~gohlke/pythonlibs/

Using Python package managers

You can also use Python package managers, such as enpkg, Conda, pip, or easy_install, to install the requirements using one of the following commands; replace numpy with any other package name you'd like to install, for example, ipython, matplotlib, and so on:

$ pip install numpy
$ easy_install numpy
$ enpkg numpy # for Canopy users
$ conda install numpy # for Anaconda users

Using native package managers

If the Python interpreter you want to use comes with the OS and is not a third-party installation, you may prefer using OS-specific package managers such as aptitude, yum, or Homebrew. The following table illustrates the package managers and the respective commands used to install NumPy:

Package manager    Command
Aptitude           $ sudo apt-get install python-numpy
Yum                $ yum install python-numpy
Homebrew           $ brew install numpy

Note that, when installing NumPy (or any other Python modules) on OS X systems with Homebrew, Python should have been originally installed with Homebrew.

Detailed installation instructions are available on the respective websites of NumPy, IPython, and matplotlib. As a precaution, to check whether NumPy was installed properly, open an IPython terminal and type the following commands:

In [1]: import numpy as np
In [2]: np.test()

If the first statement looks like it does nothing, this is a good sign. If it executes without any output, this means that NumPy was installed and has been imported properly into your Python session. The second statement runs the NumPy test suite. It is not critically necessary, but one can never be too cautious. Ideally, it should run for a few minutes and produce the test results. It may generate a few warnings, but these are no cause for alarm. If you wish, you may run the test suites of IPython and matplotlib, too.
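If running the full test suite is more than you need, a lighter check is sketched below: it prints the installed version and performs one trivial array computation. This is only a quick sanity check under the assumption that a working import plus a correct result is enough for your purposes. It is not a substitute for np.test(), which, depending on the NumPy version, may also need a separate test runner package to be installed.

```python
import numpy as np

# The version string of the NumPy that was actually imported.
print(np.__version__)

# A trivial computation: 0 + 1 + 2 + 3 + 4 should be 10.
a = np.arange(5)
print(a.sum())  # 10
```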

Note that the matplotlib test suite only runs reliably if matplotlib has been installed from source. However, testing matplotlib is not strictly necessary. If you can import matplotlib without any errors, it indicates that it is ready for use.

Congratulations! We are now ready to begin.

Summary

In this chapter, we introduced ourselves to the NumPy module. We took a look at how NumPy is a useful software tool to have for those of you who are working in scientific computing. We installed the software required to proceed through the rest of this book.

In the next chapter, we will get to the powerful NumPy ndarray object, showing you how to use it efficiently.