Introduction to data science
Uploaded by vignesh-prajapati
Introduction to Data Science
Dr. Bill Howe - Director of Research, Scalable Data Analytics
What is data science?
◦ A set of theories and principles for performing data-related tasks, such as:
◦ Data collection
◦ Data cleaning
◦ Data integration
◦ Data modeling
◦ Data visualization
Data science is different from:
◦ Business intelligence
◦ Statistics
◦ Database management
◦ Visualization
◦ Machine learning
Suggestions for students
What to learn, by background:
◦ DBA: unstructured data
◦ Statistician: data that doesn’t fit into memory
◦ Software engineer: statistical models and how to communicate results
◦ Business analyst: algorithms and tradeoffs at scale
Three common skills of a data scientist:
◦ Statistics: traditional analysis
◦ Data munging: parsing, scraping, and formatting data
◦ Visualization: graphs, tools, etc.
What do data scientists do?
Three types of tasks:
◦ Preparing to run a model
◦ Running the model
◦ Communicating the results
What do data scientists do?
◦ Preparing to run a model: gathering, cleaning, integrating, restructuring, transforming, loading, and filtering data
◦ Running the model: choosing appropriate machine learning algorithms for regression, classification, clustering, and recommendations; validating the model; improving the model
◦ Communicating the results
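The three stages above can be sketched end to end in a few lines of Python. This is only an illustrative sketch: the records, the function names (`prepare`, `fit_line`, `mean_abs_error`, `report`), and the choice of a one-variable least-squares regression are assumptions made for the example and do not come from the slides.

```python
from statistics import mean

# Hypothetical raw records (hours_studied, exam_score); some rows are unusable.
RAW = [("1", "52"), ("2", "54"), ("", "60"), ("3", "61"), ("4", "n/a"),
       ("5", "68"), ("6", "71"), ("7", "75"), ("8", "80")]


def prepare(raw):
    """Stage 1: clean, transform, and filter the raw rows."""
    rows = []
    for x, y in raw:
        try:
            rows.append((float(x), float(y)))  # transform strings to numbers
        except ValueError:
            continue  # filter out rows that do not parse
    return rows


def fit_line(rows):
    """Stage 2: fit y = a + b*x by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def mean_abs_error(model, rows):
    """Stage 2 (validation): mean absolute error on held-out rows."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in rows)


def report(model, error):
    """Stage 3: communicate the result as a one-line summary."""
    a, b = model
    return f"score ~ {a:.1f} + {b:.1f} * hours (holdout MAE {error:.1f})"


rows = prepare(RAW)
train, holdout = rows[:-2], rows[-2:]  # keep the last two rows for validation
model = fit_line(train)
error = mean_abs_error(model, holdout)
summary = report(model, error)
print(summary)
```

In practice, the preparation stage usually takes far more code than the model itself, which is why it has the longest list of subtasks above.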
Data science dimensions
Breadth
◦ MapReduce / relational algebra / logistic regression / visualization
Depth
◦ Structure (relational algebra) / statistics (linear algebra)
Scale
◦ Desktop (R) / cloud (Hadoop)
Target
◦ Hackers (R, Java, Python) / analysts (little or no programming)
Data science dimensions
Scale: cloud for big data
Big data can be measured by the three V’s:
◦ Volume: number of rows (size)
◦ Variety: number of columns or sources (text, images, audio, video)
◦ Velocity: number of rows or bytes per unit time (processing time)
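As a rough illustration of how the three V's could be measured on a concrete dataset, the sketch below counts rows (volume), distinct fields (variety), and rows per second (velocity). The `events` records and the `three_vs` helper are hypothetical and exist only for this example.

```python
from datetime import datetime

# Hypothetical event log: one dict per record, with a timestamp and varying fields.
events = [
    {"ts": "2024-01-01T00:00:00", "user": "a", "text": "hello"},
    {"ts": "2024-01-01T00:00:01", "user": "b", "image": "cat.png"},
    {"ts": "2024-01-01T00:00:02", "user": "a", "text": "bye"},
    {"ts": "2024-01-01T00:00:04", "user": "c", "audio": "clip.wav"},
]


def three_vs(records):
    """Return simple volume, variety, and velocity measures for the records."""
    volume = len(records)                               # number of rows
    variety = len({key for r in records for key in r})  # number of distinct columns
    times = [datetime.fromisoformat(r["ts"]) for r in records]
    span = (max(times) - min(times)).total_seconds()
    velocity = volume / span if span > 0 else None      # rows per second
    return {"volume": volume, "variety": variety, "velocity": velocity}


stats = three_vs(events)
print(stats)
```

Real measurements would more likely use bytes on disk and a streaming counter, but the definitions are the same.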
Where does big data come from?
◦ “Data exhaust” from customers
◦ New and pervasive sensors
◦ The ability to “keep everything”
Prerequisites
Prior programming experience
◦ SQL
◦ Python
Basic statistics
Basic database concepts
Programming Assignment 1
Twitter sentiment analysis
◦ Extract tweets from the Twitter API
◦ Calculate the sentiment score for tweets
◦ Calculate the sentiment score for terms in tweets
◦ Calculate the frequency of terms in tweets
◦ Identify the happiest state
◦ Identify the top ten hashtags
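The assignment steps, apart from the API extraction, could look roughly like the sketch below. It is not the assignment's reference solution. It assumes the tweets have already been fetched and reduced to (state, text) pairs, and the tiny `TERM_SCORES` lexicon, the sample tweets, and all function names are hypothetical. Scoring unknown terms by the mean score of the tweets that contain them is one simple heuristic, not necessarily the intended method.

```python
import re
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sentiment lexicon; the slides do not say which one the assignment uses.
TERM_SCORES = {"love": 3, "great": 3, "happy": 3, "bad": -3, "hate": -3}

# Hypothetical tweets, already extracted as (state, text) pairs.
TWEETS = [
    ("WA", "I love this great weather #seattle #happy"),
    ("WA", "Traffic is bad today #seattle"),
    ("CA", "Happy happy day, love it #sunny"),
    ("NY", "I hate delays #subway"),
]


def tokenize(text):
    """Lowercase the text and split it into words and #hashtags."""
    return re.findall(r"#?\w+", text.lower())


def tweet_score(text):
    """Sentiment of a tweet: sum of lexicon scores of its terms."""
    return sum(TERM_SCORES.get(tok, 0) for tok in tokenize(text))


def derived_term_scores(tweets):
    """Score terms missing from the lexicon by the mean score of tweets containing them."""
    seen = defaultdict(list)
    for _, text in tweets:
        score = tweet_score(text)
        for tok in set(tokenize(text)):
            if tok not in TERM_SCORES and not tok.startswith("#"):
                seen[tok].append(score)
    return {tok: mean(scores) for tok, scores in seen.items()}


def term_frequencies(tweets):
    """Relative frequency of each non-hashtag term across all tweets."""
    counts = Counter(tok for _, text in tweets
                     for tok in tokenize(text) if not tok.startswith("#"))
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def happiest_state(tweets):
    """State with the highest mean tweet score."""
    by_state = defaultdict(list)
    for state, text in tweets:
        by_state[state].append(tweet_score(text))
    return max(by_state, key=lambda s: mean(by_state[s]))


def top_hashtags(tweets, n=10):
    """The n most frequent hashtags with their counts."""
    tags = Counter(tok for _, text in tweets
                   for tok in tokenize(text) if tok.startswith("#"))
    return tags.most_common(n)


print(happiest_state(TWEETS), top_hashtags(TWEETS))
```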
Thanks!