Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D...

16
Pandas Python for Data Analysis By: Asst. Prof. Abhishek Singh Sambyal Central University of Jammu

Transcript of Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D...

Page 1: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas

Python for Data Analysis

By: Asst. Prof. Abhishek Singh Sambyal

Central University of Jammu

Page 2: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas - outline● Overview

● Purpose - Why?????

● Terminology

● DataFrame

● DataFrame – Reindexing

● DataFrame - Summarising and Descriptive Statistics

● Bibliography

Page 3: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas - overview● Python Data Analysis Library, similar to:

○ R

○ MATLAB

● Combined with IPython toolkit

● Built on top of NumPy, SciPy, to some Matplotlib

● Open source - BSD License

● Key Component

○ Dataframe

Page 4: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas - Why??????● Ideal tool for data scientists

● Munging data

● Cleaning data

● Analyzing data

● Modeling data

● Organizing the results of the analysis into a form

suitable for plotting or tabular display

Page 5: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas - terminology● IPython is a command shell for interactive computing in

multiple programming languages, especially focused on the

Python programming language, that offers enhanced

introspection, rich media, additional shell syntax, tab

completion, and rich history.

● NumPy is the fundamental package for scientific computing

with Python.

Page 6: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas - terminology● Matplotlib is a python 2D plotting library which produces

publication quality figures in a variety of hardcopy

formats and interactive environments across platforms.

● SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem

of open-source software for mathematics, science, and

engineering.

● Data Munging or Data Wrangling means taking data that's

stored in one format and changing it into another format.

Page 7: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas - terminology● Cython programming language is a superset of Python with

a foreign function interface for invoking C/C++ routines

and the ability to declare the static type of subroutine

parameters and results, local variables, and class

attributes.

Page 8: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas -Dataframe

Data structure

Page 9: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Dataframe data structure● Spreadsheet-like data structure containing an order

collection of columns

● Has both a row and column index

● Consider as dict of Series (with shared index)

Page 10: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Dataframe● Creation with dict of equal-length lists

Page 11: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Dataframe● Creation with dict of dicts

Page 12: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Dataframe● Columns can be retrieved as

Series

○ dict notation

○ attribute notation

● Rows can retrieved by position or by name (using ix attribute)

Page 13: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Dataframe● New Columns can be added (by

computation or direct

assignment)

Page 14: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

DataFrame - Reindexing● Creation of new object with the

data conformed to a new index

Page 15: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

DataFrame - Summarising and Descriptive Statistics

Page 16: Pandas - cujammu.ac.in _Python_for_Data… · Pandas - terminology Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats

Pandas - Bibliography ● Pandas: Python Data Analysis Library.http://pandas.pydata.org/

● IPython. http://ipython.org/http://en.wikipedia.org/wiki/IPython

● NumPy - http://www.numpy.org/

● SciPy - http://scipy.org/

● Matplotlib - http://matplotlib.org/

● Cython - http://www.cython.org/