Power of Python with Big Data
![Page 1: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/1.jpg)
![Page 2: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/2.jpg)
What will you learn today?
Introduction to Big Data
Why is Python popular with Big Data?
Running MapReduce in Python
Working with Python NLTK and Hadoop
Demo on Zombie Invasion Model
Data Analytics with Pandas
![Page 3: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/3.jpg)
Big Data and Hadoop
![Page 4: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/4.jpg)
Big Data
Lots of Data (Terabytes or Petabytes)
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
[Word cloud: cloud, tools, statistics, NoSQL, compression, storage, support, database, analyze, information, terabytes, processing, mobile, Big Data]
![Page 5: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/5.jpg)
Unstructured Data is Exploding
Complex, Unstructured
Relational
2500 exabytes of new information in 2012, with the internet as the primary driver
The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year
![Page 6: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/6.jpg)
Hadoop for Big Data
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model
It is an open-source data management framework with scale-out storage and distributed processing
![Page 7: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/7.jpg)
Why Python With Big Data?
![Page 8: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/8.jpg)
Why is Python popular with Big Data?
Data Cleansing / Preparation
Writing MapReduce using Python
Leveraging the analytical power of Python on big data sets
With libraries like PyDoop and SciPy, it’s a dream come true for Big Data Analytics
![Page 9: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/9.jpg)
Demo: Data Preparation / Cleaning
Extracting Data
- Extract Data from Complex JSON for processing
Text analytics
- Remove stop words from a text Paragraph for further processing
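The two demo tasks above can be sketched in plain Python. This is only a minimal illustration, not the code shown in the demo: the JSON document, the `flatten_orders` and `remove_stop_words` helpers, and the tiny `STOP_WORDS` set are all hypothetical. A real pipeline would more likely take its stop word list from a library such as NLTK.

```python
import json
import re

# Hypothetical nested JSON record, used only for illustration.
raw = """
{"user": {"name": "Asha", "address": {"city": "Pune"}},
 "orders": [
   {"id": 1, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
   {"id": 2, "items": [{"sku": "A1", "qty": 5}]}
 ]}
"""

def flatten_orders(doc):
    """Turn one nested document into flat rows, one row per ordered item."""
    rows = []
    for order in doc.get("orders", []):
        for item in order.get("items", []):
            rows.append({
                "user": doc["user"]["name"],
                "city": doc["user"]["address"]["city"],
                "order_id": order["id"],
                "sku": item["sku"],
                "qty": item["qty"],
            })
    return rows

# Small hand-written stop word set (an assumption, far from complete).
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "for"}

def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

rows = flatten_orders(json.loads(raw))
tokens = remove_stop_words("The challenges of Big Data include storage and analysis")

for row in rows:
    print(row)
print(tokens)
```

Flattening to one row per item keeps the output easy to load into a table or a pandas DataFrame for further processing.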
![Page 10: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/10.jpg)
Demo
![Page 11: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/11.jpg)
PyDoop – Hadoop with Python
One of the biggest advantages of PyDoop is its HDFS API. This allows you to connect to an HDFS installation, read and write files, and get information on files, directories and global file system properties
The MapReduce API of PyDoop allows you to solve many complex problems with minimal programming effort. Advanced MapReduce concepts such as ‘Counters’ and ‘Record Readers’ can be implemented in Python using PyDoop
With the PyDoop package, Python can be used to write Hadoop MapReduce programs and applications that access the HDFS API
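To make the MapReduce idea concrete, here is a minimal word count sketch in plain Python. It does not use the PyDoop API. The `mapper`, `reducer` and `run_local` functions are hypothetical names, and `run_local` only simulates the map, shuffle/sort and reduce phases in a single process. On a real cluster the framework would distribute these phases across nodes.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.strip().lower().split():
        yield word, 1

def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one word."""
    return word, sum(counts)

def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce locally (no Hadoop involved)."""
    mapped = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring identical keys together, as the framework would.
    mapped.sort(key=itemgetter(0))
    return [
        reducer(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    ]

result = run_local([
    "big data with python",
    "python with hadoop",
    "big python",
])
for word, total in result:
    print(word, total)
```

Because the mapper and reducer are ordinary functions, they can be tested locally on a few lines of text before being wired into a cluster job.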
![Page 12: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/12.jpg)
Python NLTK on Hadoop
![Page 13: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/13.jpg)
Python and Data Science
Python has a diverse range of open source libraries for just about everything that a data scientist does in their day-to-day work
Python and most of its libraries are both open source and free
The day-to-day tasks of a data scientist involve many interrelated but different activities such as accessing and manipulating data, computing statistics, creating visual reports on that data, building predictive and explanatory models, evaluating these models on additional data, integrating models into production systems, etc.
![Page 14: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/14.jpg)
SciPy.org
SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering
NumPy: Base N-dimensional array package
IPython: Enhanced interactive console
SciPy library: Fundamental library for scientific computing
SymPy: Symbolic mathematics
Matplotlib: Comprehensive 2D plotting
pandas: Data structures and analysis
![Page 15: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/15.jpg)
Demo: Zombie Invasion Model
This is a lighthearted example: a system of ordinary differential equations (ODEs) can be used to model a "zombie invasion", using the equations specified by Philip Munz
The system is given as:
dS/dt = P - B*S*Z - d*S
dZ/dt = B*S*Z + G*R - A*S*Z
dR/dt = d*S + A*S*Z - G*R
The program includes three scenarios that show how the zombie apocalypse varies with different initial conditions
This involves solving a system of first-order ODEs given by dy/dt = f(y, t), where y = [S, Z, R]
Where:
S: the number of susceptible victims
Z: the number of zombies
R: the number of people "killed"
P: the population birth rate
d: the chance of a natural death
B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie)
G: the chance a dead person is resurrected into a zombie
A: the chance a zombie is totally destroyed
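The system above can be integrated numerically with `scipy.integrate.odeint`. This is a minimal sketch of one scenario, not the demo program itself. The parameter values, the initial conditions and the time span are illustrative assumptions, and `zombie_rhs` is a hypothetical function name.

```python
import numpy as np
from scipy.integrate import odeint

# Illustrative parameter values (assumptions, not taken from the slides).
P = 0.0      # population birth rate
d = 0.0001   # chance of a natural death
B = 0.0095   # chance the "zombie disease" is transmitted
G = 0.0001   # chance a dead person is resurrected into a zombie
A = 0.0001   # chance a zombie is totally destroyed

def zombie_rhs(y, t):
    """Right-hand side f(y, t) of dy/dt = f(y, t), with y = [S, Z, R]."""
    S, Z, R = y
    dS = P - B * S * Z - d * S
    dZ = B * S * Z + G * R - A * S * Z
    dR = d * S + A * S * Z - G * R
    return [dS, dZ, dR]

# Assumed initial conditions: 500 susceptible, no zombies, no dead.
y0 = [500.0, 0.0, 0.0]
t = np.linspace(0.0, 5.0, 1000)   # time grid, in days

solution = odeint(zombie_rhs, y0, t)   # one row per time step
S, Z, R = solution.T

print("final S, Z, R:", S[-1], Z[-1], R[-1])
```

Adding the three equations gives dS/dt + dZ/dt + dR/dt = P, so with P = 0 the total S + Z + R should stay constant. That makes a handy sanity check on the solver output. Other scenarios can be explored by changing `y0` or the parameters.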
![Page 16: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/16.jpg)
Demo
![Page 17: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/17.jpg)
Python Pandas – Data Frames
![Page 18: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/18.jpg)
Demo
![Page 19: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/19.jpg)
Course Details
Become an expert in Python with Edureka
Go to www.edureka.co/python
Edureka's Mastering Python course:
• This course will cover both basic and advanced concepts of Python, like writing Python scripts, sequence and file operations in Python, Machine Learning in Python, Web Scraping, MapReduce in Python, Hadoop Streaming, Python UDF for Pig and Hive.
• You will also go through important and most widely used packages like pydoop, pandas, scikit, numpy, scipy etc.
• Online Live Courses: 30 hours
• Assignments: 40 hours
• Project: 20 hours
• Lifetime Access + 24 X 7 Support
![Page 20: Power of Python with Big Data](https://reader031.fdocuments.net/reader031/viewer/2022030316/58720dad1a28ab176b8b7da3/html5/thumbnails/20.jpg)
Thank You
Questions/Queries/Feedback
Recording and presentation will be made available to you within 24 hours