Is a Data Scientist the New Quant?...Moving Beyond the Hype to Practical Data Science What you can...
© 2015 The MathWorks, Inc.
Is a Data Scientist the New Quant?
Stuart Kozola
MathWorks
Data Science
Science:
– Knowledge about or study of the natural world based upon facts learned through experiments and observation
– A particular area of scientific study (such as biology, physics, or chemistry)
– A subject that is formally studied in a college, university, etc.
Data:
– Facts or information used usually to calculate, analyze, or plan something
– Information that is produced or stored by a computer
Source: http://www.merriam-webster.com/dictionary/
Quant
An expert at analyzing and managing quantitative data
Source: http://www.merriam-webster.com/dictionary/
What do Data Scientists do?
Data Analysis
Statistics
Machine Learning
Software Engineering
Multivariable Calculus and Linear Algebra
Big Data
Data Munging
Data Visualization and Communication
Source: http://blog.udacity.com/2014/11/data-science-job-skills.html
What do Quants do?
Data Analysis
Statistics
Machine Learning
Software Engineering
Multivariable Stochastic Calculus, Linear Algebra, Mathematical Programming
Big Data
Data Munging
Data Visualization and Communication
[Figure: Venn diagram of Hacking Skills, Math & Statistics Knowledge, and Financial Expertise; region labels: Machine Learning, Quant, Traditional Research, Danger Zone]
Is a Data Scientist the New Quant?
Source: http://indeed.com/trends
Data Science: 2 Major Trends
Big Data
Machine Learning
Big Data - Data Sources
Web
– Social Media
– Web Forms
Businesses
– Transactions
– Customers
Sensors
– Cameras
– Accelerometers
– GPS
– Microphones
– Weather stations
File I/O
• Text
• Spreadsheet
• XML
• CDF/HDF
• Image
• Audio
• Video
• Geospatial
• Web content
Hardware Access
• Data acquisition
• Image capture
• GPU
• Lab instruments
Communication Protocols
• CAN (Controller Area Network)
• DDS (Data Distribution Service)
• OPC (OLE for Process Control)
• XCP (eXplicit Control Protocol)
Database Access
• Financial Data
• ODBC
• JDBC
• HDFS (Hadoop)
Big Data
“Any collection of data sets so large and complex that it becomes difficult to
process using … traditional data processing applications.” (Wikipedia)
Traditional: local in-memory processing
Laptop: 12 GB
>> a = rand(1E9,1); % 8 GB array
What about data processing?
8 GB
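The 8 GB figure above follows from simple arithmetic: one billion double-precision values at 8 bytes each. The Python sketch below reproduces that calculation and checks whether a second array of the same size (for example, the result of an element-wise operation) would still fit on the 12 GB laptop. The helper names and the use of decimal gigabytes are assumptions made for illustration.

```python
BYTES_PER_DOUBLE = 8  # IEEE 754 double precision
GB = 10**9            # decimal gigabyte, matching the slide's round numbers (assumption)


def array_bytes(n_elements, bytes_per_element=BYTES_PER_DOUBLE):
    """Memory needed to hold n_elements values in one contiguous array."""
    return n_elements * bytes_per_element


def fits_in_memory(n_elements, n_copies, ram_bytes):
    """True if n_copies arrays of n_elements doubles fit in ram_bytes."""
    return n_copies * array_bytes(n_elements) <= ram_bytes


n = 10**9             # same element count as rand(1E9,1)
laptop_ram = 12 * GB  # the 12 GB laptop from the slide

size_gb = array_bytes(n) / GB                        # size of the array in GB
holds_array = fits_in_memory(n, 1, laptop_ram)       # the array alone
holds_result_too = fits_in_memory(n, 2, laptop_ram)  # the array plus one same-sized result
```

The array alone fits, but any processing step that materializes a second array of the same size needs 16 GB, which is more than the laptop has.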
2.5 ???
2.5 Zettabytes
Sensor Data
From 1 year of commercial flights in the US
Considerations for Big Data
Data characteristics
– Size, type and location of your data
Compute platform
– Single desktop machine or cluster
Analysis Characteristics
– Embarrassingly Parallel
– Analyze sub-segments of data and aggregate results
– Operate on entire dataset
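The second analysis pattern above (analyze sub-segments and aggregate the results) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the data is a small in-memory list standing in for a file too large to load at once, and the function names are hypothetical.

```python
def chunks(values, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def partial_stats(chunk):
    """Summarize one chunk as (sum, count); this is all we keep per chunk."""
    return (sum(chunk), len(chunk))


def combine(partials):
    """Aggregate per-chunk (sum, count) pairs into the overall mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = list(range(1, 101))  # stand-in for data that would not fit in memory
partials = [partial_stats(c) for c in chunks(data, 30)]
chunked_mean = combine(partials)
```

A mean works here because sums and counts combine exactly across chunks. A median, by contrast, generally cannot be recovered from per-chunk medians, which is why some analyses fall into the "operate on entire dataset" category.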
New Big Data Capabilities in MATLAB
Memory and Data Access
– 64-bit processors
– Memory Mapped Variables
– Disk Variables
– Databases
– Datastores
Platforms
– Desktop (Multicore, GPU)
– Clusters
– Cloud Computing (MDCS on EC2)
– Hadoop
Programming Constructs
– Streaming
– Block Processing
– Parallel-for loops
– GPU Arrays
– SPMD and Distributed Arrays
– MapReduce
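To make the MapReduce construct listed above concrete, here is a minimal single-process sketch of the pattern in Python. It is not MATLAB's `mapreduce` API; the three phase functions, the record layout, and the sample records are all hypothetical and exist only to show the map, group-by-key, and reduce steps.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def delay_mapper(record):
    """Hypothetical mapper: emit (carrier, delay) for one flight record."""
    carrier, delay = record
    yield carrier, delay


def max_reducer(key, values):
    """Hypothetical reducer: keep the largest delay seen for a carrier."""
    return max(values)


# Made-up records, for illustration only.
records = [("AA", 5), ("UA", 12), ("AA", 20), ("UA", 3)]
result = reduce_phase(shuffle(map_phase(records, delay_mapper)), max_reducer)
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread across workers, which is what makes the pattern suitable for data that does not fit on one machine.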
Big Data Challenges
Data Hygiene
– Data is dirty; it wasn’t collected with your use case in mind
Data Munging
– Combining data from different sources
Data Hygiene
Data Hygiene
Anomalies
Missing Data
Data Munging
Cleaning data that has errors, outliers, or duplicates
Handling missing data
– Discarding
– Filtering
– Imputation
Merging and time-aligning data (might have different sample rates)
Working with data in different domains
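Two of the munging steps above, handling missing data and time-aligning series with different sample rates, can be sketched with plain Python. This is a minimal illustration, assuming missing samples are marked as NaN, timestamps are sorted and strictly increasing, and linear interpolation is an acceptable imputation; the function names and sample values are hypothetical.

```python
import math


def discard_missing(times, values):
    """Drop samples whose value is NaN (the 'discarding' strategy)."""
    kept = [(t, v) for t, v in zip(times, values) if not math.isnan(v)]
    return [t for t, _ in kept], [v for _, v in kept]


def interpolate_at(times, values, t):
    """Linearly interpolate a series at time t, holding the end values outside the range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def impute_linear(times, values):
    """Fill NaN samples from their known neighbours (the 'imputation' strategy)."""
    known_t, known_v = discard_missing(times, values)
    return [interpolate_at(known_t, known_v, t) if math.isnan(v) else v
            for t, v in zip(times, values)]


def align(times, values, target_times):
    """Resample a series onto another series' timestamps."""
    return [interpolate_at(times, values, t) for t in target_times]


# Made-up signals: one sampled every second with a gap, one every two seconds.
fast_t = [0.0, 1.0, 2.0, 3.0, 4.0]
fast_v = [10.0, float("nan"), 14.0, 16.0, 18.0]
slow_t = [0.0, 2.0, 4.0]
slow_v = [1.0, 3.0, 5.0]

filled = impute_linear(fast_t, fast_v)        # gap at t = 1.0 filled from neighbours
slow_on_fast = align(slow_t, slow_v, fast_t)  # slow series on the fast time base
```

Once both series share the same timestamps they can be merged row by row. Whether to discard or impute depends on how much data is missing and on how sensitive the later analysis is to invented values.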
Domains
Mathematical
Time-series
Signal
Image & Video
Acoustic
Financial
Geospatial
Text
Weather and Environmental
Machine Learning
Example: Human Activity Learning Using Mobile Phone Data
Machine learning uses data and produces a program to perform a task
Task: Human Activity Detection
Standard Approach
– Hand Written Program:
If X_acc > 0.5 then “SITTING”
If Y_acc < 4 and Z_acc > 5 then “STANDING”
…
– Formula or Equation:
$Y_{activity} = \beta_1 X_{acc} + \beta_2 Y_{acc} + \beta_3 Z_{acc} + \dots$
Machine Learning Approach
– model = <Machine Learning Algorithm>(sensor_data, activity)
– model: Inputs → Outputs
[Diagram labels: Computer Program, Machine Learning]
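The contrast between the two approaches can be sketched in Python. The hand-written rule copies the thresholds from the slide; the "UNKNOWN" fallback is an assumption added so the function always returns a label. The learning side uses a nearest-centroid classifier purely as a simple stand-in for `<Machine Learning Algorithm>`; it is not the algorithm used in the original demo, and the accelerometer readings are made up.

```python
def hand_written_rule(sample):
    """Standard approach: thresholds chosen by a person (values from the slide)."""
    x_acc, y_acc, z_acc = sample
    if x_acc > 0.5:
        return "SITTING"
    if y_acc < 4 and z_acc > 5:
        return "STANDING"
    return "UNKNOWN"  # assumption: the slide elides the remaining rules


def train_nearest_centroid(sensor_data, activity):
    """Machine learning approach: build a model from (sensor_data, activity) pairs."""
    sums, counts = {}, {}
    for sample, label in zip(sensor_data, activity):
        acc = sums.setdefault(label, [0.0] * len(sample))
        for i, value in enumerate(sample):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: [s / counts[label] for s in total]
                 for label, total in sums.items()}

    def model(sample):
        """Inputs -> Outputs: return the label of the closest class centroid."""
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(sample, centroid))
        return min(centroids, key=lambda label: sq_dist(centroids[label]))

    return model


# Made-up training data: (X_acc, Y_acc, Z_acc) readings with known activities.
sensor_data = [(0.9, 1.0, 9.0), (1.1, 1.2, 9.2), (0.1, 9.0, 1.0), (0.2, 9.4, 0.8)]
activity = ["SITTING", "SITTING", "STANDING", "STANDING"]

model = train_nearest_centroid(sensor_data, activity)
prediction = model((1.0, 1.1, 9.1))
```

In the first function a person encodes the decision boundaries; in the second the boundaries come from the labeled data, so retraining on new data changes the program without rewriting it.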
Example: Human Activity Learning Using Mobile Phone Data
Objective: Train a classifier to classify human activity from sensor data
Data:
– Predictors: 3-axial accelerometer and gyroscope data
– Response: Activity
Approach:
– Extract features from raw sensor signals
– Train and compare classifiers
– Test results on new sensor data
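The first and last steps of this approach, extracting features from raw signals and scoring predictions on held-out data, can be sketched as follows. This is a minimal illustration: the window length, the choice of mean and standard deviation as features, the 0.5 threshold, and the "still"/"moving" labels are all assumptions, and a fixed threshold stands in for a trained classifier.

```python
import statistics


def windows(signal, size):
    """Split a signal into consecutive non-overlapping windows of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def extract_features(window):
    """Summarize one window of raw samples as (mean, standard deviation)."""
    return (statistics.mean(window), statistics.pstdev(window))


def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up single-axis accelerometer trace: a quiet segment, then a vigorous one.
signal = [0.0, 0.1, 0.0, 0.1, 2.0, -2.0, 2.0, -2.0]
labels = ["still", "moving"]  # hypothetical ground truth, one label per window

features = [extract_features(w) for w in windows(signal, 4)]

# Stand-in classifier: a hypothetical threshold on the standard deviation.
predicted = ["moving" if std > 0.5 else "still" for _, std in features]
score = accuracy(predicted, labels)
```

Comparing classifiers then amounts to computing the same score for each candidate on data that was not used for training.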
Machine Learning is Everywhere
Image Recognition
Speech Recognition
Stock Prediction
Medical Diagnosis
Data Analytics
Robotics
and more…
Overview – Machine Learning
Type of Learning → Categories of Algorithms
Machine Learning
– Supervised Learning: develop predictive model based on both input and output data
  – Classification
  – Regression
– Unsupervised Learning: group and interpret data based only on input data
  – Clustering
Unsupervised Learning
Clustering
– k-Means, Fuzzy C-Means
– Hierarchical
– Neural Networks
– Gaussian Mixture
– Hidden Markov Model
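As one instance of the clustering methods listed above, here is a minimal k-means sketch in Python. It groups points using only the inputs, with no labels. The initialization (taking the first k points as starting centroids), the fixed iteration count, and the sample points are assumptions chosen to keep the example deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        assignments = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                       for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assignments


# Made-up 2-D points forming two visually separate groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
centroids, assignments = kmeans(points, k=2)
```

k-means only finds a local optimum, so the result depends on the starting centroids; with this naive seeding, a different ordering of the same points can stall in a poor grouping. Practical implementations usually try several random starts.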
Supervised Learning
Regression
– Linear Regression
– Non-linear Reg. (GLM, Logistic)
– Decision Trees
– Ensemble Methods
– Neural Networks
Classification
– Nearest Neighbor
– Discriminant Analysis
– Naive Bayes
– Support Vector Machines
Application: Retail / Supply Chain
Tesco: 2nd largest retailer in the world
How do promotions and weather affect food sales?
Use historical data to develop a predictive model
Validate model and incorporate into business systems
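The develop-then-validate workflow described above can be sketched with a one-variable least-squares fit in Python. This is an illustration only: it does not represent the retailer's actual model, the temperature and sales figures are synthetic, and a single predictor stands in for the promotions and weather variables a real model would combine.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic history (assumption): daily temperature and units sold.
temperature = [10, 15, 20, 25, 30, 35]
sales = [120, 145, 170, 195, 220, 245]

# Develop the model on the earlier observations, validate on the later ones.
train_x, train_y = temperature[:4], sales[:4]
valid_x, valid_y = temperature[4:], sales[4:]

slope, intercept = fit_line(train_x, train_y)
validation_error = mean_abs_error(slope, intercept, valid_x, valid_y)
```

Holding back the most recent observations for validation mimics how the model would be used in practice: fitted on the past, judged on data it has not seen.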
Data
[Diagram: data analytics across industries. Industries: Aeronautics, Off-highway vehicles, Automotive, Oil & Gas, Industrial Automation, Clean Energy, Medical Devices. Applications: Fleet Analytics, Health Monitoring, Asset Analytics, Process Analytics, Integrated Vehicle Health Management, Condition Monitoring]
MathWorks Approach
We don’t expect you to be an expert in everything
Make it easy to analyze all types of data in any domain
Integrated workflow
– Access Data
– Analysis
– Deployment
Flexible language for customizing
One platform for multidisciplinary collaboration
Predictive Analytics Example
Energy Demand Forecasting
Forecast electricity demand for US power grids with live data from
ISOs and weather stations using Neural Network models.
http://ec2-54-165-201-58.compute-1.amazonaws.com:8080/DemandForecastWeb/
Takeaways
Quants – original data scientists?
Data Science – Analytics across multiple domains
– Big Data
– Machine Learning
Demand for data science skills is high
Data is messy – getting the signal from the noise is often the biggest part
Moving Beyond the Hype to Practical Data Science
What you can learn to get ahead of the competition:
Stochastic Processes
– Learn from quantitative finance
From Static to Dynamic Systems
– Move beyond static relationships
– System identification
Decision by Optimization
– Mathematical programming
– Simulation based decision making
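As a small taste of the stochastic-process material from quantitative finance mentioned above, the Python sketch below simulates geometric Brownian motion, a standard model for asset prices. The choice of this particular process, the parameter values, and the function name are assumptions made for illustration.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, horizon, steps, rng):
    """Simulate one geometric Brownian motion path using exact log-normal steps."""
    dt = horizon / steps
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)  # standard normal shock
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        path.append(path[-1] * growth)
    return path


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical parameters: start at 100, 5% drift, 20% volatility, one year, 50 steps.
paths = [simulate_gbm(100.0, 0.05, 0.2, 1.0, 50, rng) for _ in range(1000)]
mean_terminal = sum(p[-1] for p in paths) / len(paths)
expected_terminal = 100.0 * math.exp(0.05)  # theoretical mean of the final value
```

Unlike a static regression, each simulated path is one possible future, so decisions can be evaluated against the whole distribution of outcomes rather than a single point forecast.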
© 2015 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of
additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.