ECE 592: Topics in Data Science
Dror Baron, Associate Professor
Dept. of Electrical and Computer Engr., North Carolina State University, NC, USA
Today’s class
About instructor
Advice for new graduate students
Course structure
Motivation for data science
– What is data science?
– Applications
– Examples
Constantly improving course – please provide feedback
About the instructor
Dr. Dror Baron
– Email: [email protected]
– Office: EB2 2097
– Office hour: after Monday class or by appointment
At NC State since 2010
Also taught
– ECE 308 (control)
– ECE 421 (signal processing)
– ECE 514 (random processes)
– ECE 792 (universal algorithms in communication & signal proc.)
Research interests
– Statistical signal processing & information theory
– Information theoretic approaches to sparse signal processing
– Recent interest in large-scale iterative algorithms
Advice for new graduate students
Who here is new?
Welcome!
Many international graduate students in ECE
– Hope you aren’t shocked
– If you aren’t sure – ask!
– Lots of car drivers are unaware of pedestrians – be careful
Course Structure
Course structure
Course webpage – contains relevant materials
Main course components:
– Providing feedback (might get a message board)
– Prerequisites
– Course purpose
– Outline / main topics
– Textbook(s)
– Matlab and/or Python
– Assignments (homeworks & projects)
– Grade structure
We have a tentative schedule and syllabus
Feedback
Message board (maybe)
Questions?
Prerequisites
Eager to learn about data science
Coursework:
– ECE 421 (signal processing)
– ST 371 (probability)
Comfortable with linear algebra & probability
Comfortable with programming
– Big data → big datasets → code must be fast
– Matlab and/or Python
– Will cover scientific programming
Course purpose
A big picture idea about data science
– Probabilistic / information theoretic perspective
– Scientific programming
What’s it good for?
– Learning from data
– Big data sets
Core techniques?
Components: math, computers, algorithms, data, …
Outline / main topics
Introduction/motivation
Scientific programming
– Computational complexity, data structures, profiling
Optimization
– Dynamic programming, linear programming, convex optimization, integer programming, EM algorithm
Machine learning basics
– Classification, clustering, regression
Sparse signal processing
– Wavelets, sparse acquisition & reconstruction
Dimensionality reduction
– Principal components analysis
New in 2019!
2016: too much sparse signal processing
2017: less sparsity; started with machine learning (ML); students wanted more (and more, and more…) of that
– End of semester: recommended to start with scientific programming (realized they lacked knowledge there)
2018: started w/scientific programming
– Background on probability & information theory (helps understand ML); more optimization & ML; less sparsity
2019: improve projects/homeworks with TA
Textbook(s) and online references
No single textbook
Borrowing from multiple sources:
– Bishop, Pattern Recognition and Machine Learning
– MacKay, Information Theory, Inference, and Learning Algorithms
– Mohri et al., Foundations of Machine Learning
– Hastie et al., The Elements of Statistical Learning
Slides posted online
– Typing details for new stuff as we go along
– Please ask for extra supplemental material if helpful
Matlab and/or Python
Matlab: good for prototyping
Python:
– Closer to a normal programming language
– Increasingly used in industry
Various languages used in data science
– SAS, R, …
– Core implementations often in C/C++, Java, …
Please download Matlab/Python to personal machines
– Links (including tutorial) on webpage
Assignments
Homeworks
– More math
– Less programming
– Every 2-3 weeks
Projects
– 3-4 “homework style” projects
– Integration of math, algorithm development, & programming
– Oriented around application and data
– Final project will focus on topic of specific interest to students; 2-3 students will submit report and present to class
Both HW & projects submitted individually, in pairs, or in triples
Tentative grade structure
Homework 15%
Projects 20%
Final project 20%
Midterm 20%
Final exam 25% (schedule determined by university)
– Note: 2-hour final exam
Extra credit 2-3%
Motivation for course?
Why take ECE 592? [Students suggest reasons for doing so; we discuss]
Motivation & Applications
Keywords: big data, data science
What is data science?
Wikipedia: “Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics.”
Extract knowledge from data
Multi-disciplinary (math, statistics, programming, …)
Why is it receiving attention?
Big data
– Petabytes (10^15 B) now commonplace
– Often requires multiple processors
• Large amounts of storage
• Clusters or GPUs
Big data as societal feedback system
Can extract bigger profits from bigger data
– Note: can replace “profit” by utility, societal benefit, etc.
[Diagram: feedback loop – improved computing capabilities → process more data → learn more from data → provide better service → profits → buy more computers; spend more on R&D… → improved computing capabilities]
Application #1 – Click prediction
Show users ads online
Paid for clicks
Track various data related to each ad
– Ad topic, user history, geographic location, time of day, …
Better prediction → more ad revenue
Personal anecdote
– Read something about Audi
– Lots of Audi commercials that week (creepy?)
Application #2 – Speech recognition
User speaks into phone
Phone converts the audio signal to text
We’re seeing more of this in automated call centers
Technical approach shifting from modeling speech (hidden Markov models) to training on lots of data
– Major trend due to increasing computational power
Application #3 – Mortgage defaults
Consumers have mortgages on homes
Some consumers stop paying (default)
Bank loses $
Want to predict who defaults and how much
Similar to click prediction (binary classification)
Possibly more complex (want default amount)
Similar: credit card payments
Application #4 – Financial prediction
Lots of financial assets (stocks, bonds, …) traded
Data about different assets
– Company sector, profits, growth rate, R&D spending, past prices, …
Want to predict future prices
Want to design portfolio that goes up with low volatility (small fluctuations)
Application #5 – Games
Go – popular game in Asia
Deep learning method trained on millions of games
Beat Korean champion player
Old approach – program computer to play chess
New approach – let computer look at (lots of) games
Application #6 – Identify handwriting
Post office wants to recognize zip codes
Seems “easy”
– Location of zip code on envelope can be identified
– Can partition into individual digits
– Only 10 digits
Typical approach – look at lots of data, compare individual digit to database, choose nearest neighbor(s)
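To make the nearest-neighbor idea concrete, here is a minimal Python/NumPy sketch; the “digit images” are random stand-ins rather than a real dataset, and 1-NN with Euclidean distance is just one simple instance of the approach:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training data: 1000 labeled 8x8 "digit images" flattened to vectors.
train_images = rng.random((1000, 64))
train_labels = rng.integers(0, 10, size=1000)

def classify_digit(image):
    """Return the label of the closest training image (1-nearest neighbor)."""
    distances = np.linalg.norm(train_images - image, axis=1)
    return train_labels[np.argmin(distances)]

print(classify_digit(rng.random(64)))  # predicted digit for a new image
```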
Application #7 – Autonomous cars
You heard about this, right?
Example: Polynomial Curve Fitting [Bishop – Sec. 1.1]
Keywords: curve fitting, least squares
Problem setting
Input variables x = (x_1, …, x_N)^T
Observe noisy target variables t = (t_1, …, t_N)^T
– Want to predict (future) target variables
Model for noisy observations: t_n = sin(2πx_n) + z_n
Measurement noise z_n
Want to perform polynomial curve fitting
Find order-M polynomial that best explains t
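A minimal Python/NumPy sketch of this data model; N and the noise standard deviation are illustrative choices, not values from the slide:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 10                            # number of observations (illustrative)
x = np.linspace(0, 1, N)          # input variables x_1, ..., x_N
z = 0.2 * rng.standard_normal(N)  # Gaussian measurement noise z_n (illustrative std)
t = np.sin(2 * np.pi * x) + z     # noisy targets t_n = sin(2*pi*x_n) + z_n
```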
Why polynomial curve fitting?
Why might polynomial approximation to unknown function work?
– Taylor series – approximate function w/polynomial
Maybe Fourier expansion is “better”
– It is in this case
Side information about problem very useful
– True function sparse in Fourier basis
– Sometimes we have side information; sometimes not
What does “best explains” mean?
Suppose polynomial weights w
We predict y(x,w) = t′ = w_0 + w_1 x + … + w_M x^M
Expect y(x,w)=t’≈t(x)
Let’s provide a score for the weights w:
E(w) = Σ_{n=1}^{N} {y(x_n, w) − t_n}^2
Want w that minimizes E(w)
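In Python/NumPy the score is a short function; a sketch assuming x and t from the data snippet above (np.polyval expects weights ordered from highest power down, hence the reversal):

```python
import numpy as np

def E(w, x, t):
    """Sum-of-squares score for polynomial weights w = (w_0, ..., w_M)."""
    y = np.polyval(w[::-1], x)  # y(x_n, w) = w_0 + w_1*x_n + ... + w_M*x_n^M
    return np.sum((y - t) ** 2)
```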
Why squared error?
Our score for w sums over squared errors:
E(w) = Σ_{n=1}^{N} {y(x_n, w) − t_n}^2
Absolute error would emphasize “typical” errors, with less emphasis on larger ones
– Higher powers bring out outliers (see the sketch below)
Error metric may correspond to the statistical distribution of the noise (squared error corresponds to Gaussian noise)
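A tiny numeric illustration of how the power p changes the emphasis on an outlier; the residual values here are made up:

```python
import numpy as np

residuals = np.array([0.1, 0.2, -0.1, 3.0])  # last entry is an outlier (made up)
for p in (1, 2, 4):
    penalty = np.abs(residuals) ** p
    # Fraction of the total penalty contributed by the outlier under |error|^p
    print(p, penalty[-1] / penalty.sum())    # grows toward 1 as p increases
```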
Math analysis (will revisit details later)
Can write
[ y(x_1,w) ]   [ 1  x_1  ⋯  x_1^M ] [ w_0 ]
[    ⋮     ] = [ ⋮   ⋮   ⋱    ⋮   ] [  ⋮  ]
[ y(x_N,w) ]   [ 1  x_N  ⋯  x_N^M ] [ w_M ]
Shorthand y(x,w) = Xw (matrix-vector product)
Recall t_n = sin(2πx_n) + z_n
Searching for vector w with minimal ||y(x,w) − t||_2^2
– Recall ℓ_p norm: ||z||_p = [Σ_n |z_n|^p]^{1/p}
– Squared Euclidean norm: ||z||_2^2 = Σ_n (z_n)^2
Will study “least squares” – finds w that minimizes ||Xw − t||_2^2
– Closed form: w* = (X^T X)^{-1} X^T t = X^+ t
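A sketch of the closed form in Python/NumPy, continuing from the data snippet above; in practice np.linalg.lstsq is numerically preferable to forming (X^T X)^{-1} explicitly:

```python
import numpy as np

M = 3                                         # polynomial order (illustrative)
X = np.vander(x, M + 1, increasing=True)      # rows [1, x_n, x_n^2, ..., x_n^M]
w_star = np.linalg.pinv(X) @ t                # w* = X^+ t = (X^T X)^{-1} X^T t
w_ls, *_ = np.linalg.lstsq(X, t, rcond=None)  # numerically safer equivalent
```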
Let’s check it out (Matlab on webpage)
N=10 noisy observations
Polynomial order M=0
[Plot: N=10 observations, truth sin(2πx), and fitted order-0 polynomial]
Higher order?
N=10 noisy observations
Polynomial order M=3
[Plot: N=10 observations, truth, and fitted order-3 polynomial]
Even higher order?
N=50 noisy observations
Polynomial order M=20
Overfitting!
More observations?
N=1000 noisy observations
Polynomial order M=3
[Plot: N=1000 observations, truth, and fitted order-3 polynomial]
Discussion
More observations → better curve fit (fixed M)
Small M → constant or linear curve (bad fit)
Large M → overfitting (polynomial will go crazy)
Challenge: How to estimate a “good” M?
Solution: test data (see the sketch below)
– Training data – for computing optimal weights w
– Test data – check how well w explains remaining data
– Find M that results in low error
Detailed discussion in book
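A minimal sketch of this train/test recipe in Python/NumPy; the sample size, noise level, split ratio, and range of M are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Generate data as before and split it 50/50 into training and test sets.
N = 100
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(N)
idx = rng.permutation(N)
train, test = idx[: N // 2], idx[N // 2:]

best_M, best_err = 0, np.inf
for M in range(10):
    X_train = np.vander(x[train], M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X_train, t[train], rcond=None)  # fit on training data
    X_test = np.vander(x[test], M + 1, increasing=True)
    err = np.mean((X_test @ w - t[test]) ** 2)              # held-out test error
    if err < best_err:
        best_M, best_err = M, err

print("best M:", best_M)
```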