
ECE 592: Topics in Data Science

Dror Baron, Associate Professor

Dept. of Electrical and Computer Engr., North Carolina State University, NC, USA

Today’s class


About instructor

Advice for new graduate students

Course structure

Motivation for data science
– What is data science?
– Applications
– Examples

Constantly improving course → please provide feedback

About the instructor


Dr. Dror Baron
– Email: barondror@ncsu.edu
– Office: EB2 2097
– Office hour: after Monday class or by appointment

At NC State since 2010

Also taught
– ECE 308 (control)
– ECE 421 (signal processing)
– ECE 514 (random processes)
– ECE 792 (universal algorithms in communication & signal proc.)

Research interests
– Statistical signal processing & information theory
– Information-theoretic approaches to sparse signal processing
– Recent interest in large-scale iterative algorithms

Advice for new graduate students


Who here is new?

Welcome!

Many international graduate students in ECE
– Hope you aren’t shocked
– If you aren’t sure – ask!
– Lots of car drivers are unaware of pedestrians → be careful

Course Structure

Course structure


Course webpage – contains relevant materials

Main course components:
– Providing feedback (might get a message board)
– Prerequisites
– Course purpose
– Outline / main topics
– Textbook(s)
– Matlab and/or Python
– Assignments (homeworks & projects)
– Grade structure

We have a tentative schedule and syllabus

Feedback


Message board (maybe)

Email

Questions?

Prerequisites


Eager to learn about data science

Coursework:
– ECE 421 (signal processing)
– ST 371 (probability)

Comfortable with linear algebra & probability

Comfortable with programming
– Big data → big datasets → must be fast
– Matlab and/or Python
– Will cover scientific programming

Course purpose


A big-picture idea about data science
– Probabilistic / information-theoretic perspective
– Scientific programming

What’s it good for?
– Learning from data
– Big data sets

Core techniques?

Components: math, computers, algorithms, data, …

Outline / main topics


Introduction / motivation

Scientific programming
– Computational complexity, data structures, profiling

Optimization
– Dynamic programming, linear programming, convex optimization, integer programming, EM algorithm

Machine learning basics
– Classification, clustering, regression

Sparse signal processing
– Wavelets, sparse acquisition & reconstruction

Dimensionality reduction
– Principal components analysis

New in 2019!


2016: too much sparse signal processing

2017: less sparsity; started with machine learning (ML); students wanted more (and more, and more…) of that
– End of semester: recommended to start with scientific programming (realized they lacked knowledge there)

2018: started w/scientific programming
– Background on probability & information theory (helps understand ML); more optimization & ML; less sparsity

2019: improve projects/homeworks with TA

Textbook(s) and online references


No single textbook
Borrowing from multiple sources:

– Bishop, Pattern Recognition and Machine Learning
– MacKay, Information Theory, Inference, and Learning Algorithms
– Mohri et al., Foundations of Machine Learning
– Hastie et al., The Elements of Statistical Learning

Slides posted online
– Typing details for new stuff as we go along
– Please ask for extra supplemental material if helpful

Matlab and/or Python


Matlab: good for prototyping

Python:
– Closer to a normal programming language
– Increasingly used in industry

Various languages used in data science
– SAS, R, …
– Core implementations often in C/C++, Java, …

Please download Matlab/Python to personal machines
– Links (including tutorial) on webpage

Assignments


Homeworks
– More math
– Less programming
– Every 2-3 weeks

Projects
– 3-4 “homework style” projects
– Integration of math, algorithm development, & programming
– Oriented around application and data
– Final project will focus on topic of specific interest to students; 2-3 students will submit report and present to class

Both homeworks & projects submitted individually, in pairs, or in triples

Tentative grade structure


Homework 15%
Projects 20%
Final project 20%
Midterm 20%
Final exam 25% (schedule determined by university)
– Note: 2-hour final exam

Extra credit 2-3%

Motivation for course?


Why take ECE 592? [Students suggest reasons for doing so; we discuss]

Motivation & Applications

Keywords: big data, data science

What is data science?


Wikipedia: “Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics.”

Extract knowledge from data
Multi-disciplinary (math, statistics, programming, …)

Why is it receiving attention?


Big data
– Petabytes (10^15 B) now commonplace
– Often requires multiple processors
• Large amounts of storage
• Clusters or GPUs

Big data as societal feedback system


Can extract bigger profits from bigger data
– Note: can replace “profit” by utility, societal benefit, etc.

[Feedback loop diagram: improved computing capabilities → process more data → learn more from data → provide better service → profits → buy more computers; spend more on R&D… → improved computing capabilities]

Application #1 – Click prediction


Show users ads online
Paid for clicks
Track various data related to each ad
– Ad topic, user history, geographic location, time of day, …

Better prediction → more ad revenue

Personal anecdote
– Read something about Audi
– Lots of Audi commercials that week (creepy?)

Application #2 – Speech recognition


User speaks into phone
Phone converts the audio signal to text
We’re seeing more of this in automated call centers

Technical approach shifting from modeling speech (hidden Markov models) to training on lots of data
– Major trend due to increasing computational power

Application #3 – Mortgage defaults


Consumers have mortgages on homes
Some consumers stop paying (default) → bank loses $
Want to predict who defaults and how much

Similar to click prediction (binary classification)
Possibly more complex (want default amount)

Similar: credit card payments
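Since the slide frames default prediction as binary classification, here is a minimal illustrative sketch in Python (not course material; the features and synthetic labels are hypothetical, and scikit-learn is assumed to be available):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature matrix: one row per mortgage
# (e.g., loan-to-value ratio, borrower income, interest rate).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
# Synthetic labels: 1 = default, 0 = pays on time.
y = (X @ np.array([1.5, -1.0, 0.5]) + rng.normal(size=1000) > 1).astype(int)

clf = LogisticRegression().fit(X, y)
print("estimated default probabilities:", clf.predict_proba(X[:5])[:, 1])
```

Predicting the default amount would turn this into a regression problem on top of the classifier.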

Application #4 – Financial prediction


Lots of financial assets (stocks, bonds, …) traded
Data about different assets
– Company sector, profits, growth rate, R&D spending, past prices, …

Want to predict future prices
Want to design portfolio that goes up with low volatility (small fluctuations)

Application #5 – Games


Go – popular game in Asia
Deep learning method trained on millions of games
Beat Korean champion player

Old approach – program computer to play chess

New approach – let computer look at (lots of) games

Application #6 – Identify handwriting


Post office wants to recognize zip codes

Seems “easy”
– Location of zip code on envelope can be identified
– Can partition into individual digits
– Only 10 digits

Typical approach – look at lots of data, compare individual digit to database, choose nearest neighbor(s); see the sketch below
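A minimal nearest-neighbor sketch of this approach, assuming Python with scikit-learn (its bundled 8×8 digits set stands in for the post office data; this is not part of the course materials):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()  # 8x8 grayscale digit images, labels 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # choose nearest neighbor(s)
knn.fit(X_train, y_train)                  # the "database" of labeled digits
print("test accuracy:", knn.score(X_test, y_test))
```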

Application #7 – Autonomous cars


You heard about this, right?

Example: Polynomial Curve Fitting [Bishop – Sec. 1.1]

Keywords: curve fitting, least squares

Problem setting


Input variables x = (x_1, …, x_N)^T

Observe noisy target variables t = (t_1, …, t_N)^T

– Want to predict (future) target variables

Model for noisy observations: t_n = sin(2πx_n) + z_n

Measurement noise z_n

Want to perform polynomial curve fitting
Find the order-M polynomial that best explains t

Why polynomial curve fitting?


Why might polynomial approximation to an unknown function work?
– Taylor series – approximate function w/polynomial

Maybe Fourier expansion is “better”
– It is in this case

Side information about problem very useful
– True function sparse in Fourier basis
– Sometimes we have side information; sometimes not

What does “best explains” mean?


Suppose polynomial weights w
We predict y(x, w) = t′ = w_0 + w_1 x + … + w_M x^M

Expect y(x,w)=t’≈t(x)

Let’s provide a score for the weights w:

E(w) = Σ_{n=1}^{N} {y(x_n, w) − t_n}^2

Want w that minimizes E(w)
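For concreteness, a small NumPy sketch (an illustration, not from the slides) of the prediction y(x, w) and the score E(w), with the weight vector ordered (w_0, …, w_M):

```python
import numpy as np

def poly_predict(x, w):
    # y(x, w) = w_0 + w_1*x + ... + w_M*x^M, evaluated at each x_n
    return np.polyval(w[::-1], x)  # np.polyval expects the highest power first

def score(x, t, w):
    # E(w) = sum_n {y(x_n, w) - t_n}^2
    return np.sum((poly_predict(x, w) - t) ** 2)
```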

Why squared error?


Our score for w sums over squared errors:

E(w) = Σ_{n=1}^{N} {y(x_n, w) − t_n}^2

Absolute error would emphasize “typical” errors, less emphasis on larger ones
– Higher powers bring out outliers

Error metric may correspond to the statistical distribution of the noise (squared error corresponds to Gaussian noise)
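To spell out the Gaussian connection (a standard argument along the lines of Bishop, Sec. 1.2.5, not worked on these slides): if the noise is i.i.d. Gaussian, z_n ~ N(0, σ²), the log-likelihood of the weights is

$$\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2 + \text{const},$$

so maximizing the likelihood over w is the same as minimizing the squared error E(w).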

Math analysis (will revisit details later)


Can write

  [ y(x_1, w) ]   [ 1  x_1  ⋯  x_1^M ] [ w_0 ]
  [     ⋮     ] = [ ⋮   ⋮   ⋱    ⋮   ] [  ⋮  ]
  [ y(x_N, w) ]   [ 1  x_N  ⋯  x_N^M ] [ w_M ]

Shorthand: y(x, w) = Xw (matrix-vector product)
Recall t_n = sin(2πx_n) + z_n

Searching for vector w with minimal ||y(x,w) − t||_2^2
– Recall the ℓ_p norm: ||z||_p = [Σ_n |z_n|^p]^(1/p)
– Euclidean norm: ||z||_2 = [Σ_n (z_n)^2]^(1/2)

Will study “least squares” → finds w that minimizes ||Xw − t||_2^2

– Closed form: w* = (X^T X)^(-1) X^T t = X^+ t, where X^+ is the pseudoinverse
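A minimal NumPy sketch of the closed form (illustrative only; the Matlab script on the course webpage is the official version, and the noise level 0.2 here is an assumption):

```python
import numpy as np

np.random.seed(0)
N, M = 10, 3
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.2 * np.random.randn(N)  # t_n = sin(2*pi*x_n) + z_n

X = np.vander(x, M + 1, increasing=True)  # rows [1, x_n, ..., x_n^M]
w_star = np.linalg.pinv(X) @ t            # w* = X^+ t (pseudoinverse)
# Numerically preferable to forming (X^T X)^{-1} X^T t explicitly:
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)
```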

Let’s check it out (Matlab on webpage)


N=10 noisy observations, polynomial order M=0

[Plot on x ∈ [0, 1]: observations, truth sin(2πx), and fitted constant polynomial]

Higher order?


N=10 noisy observations, polynomial order M=3

[Plot on x ∈ [0, 1]: observations, truth, and fitted cubic polynomial]

Even higher order?


N=50 noisy observations, polynomial order M=20 → overfitting!

More observations?


N=1000 noisy observations, polynomial order M=3

[Plot on x ∈ [0, 1]: observations, truth, and fitted cubic polynomial]

Discussion


More observations → better curve fit (fixed M)

Small M → constant or linear curve (bad fit)
Large M → overfitting (polynomial will go crazy)

Challenge: how to estimate a “good” M? Solution: test data
– Training data – for computing optimal weights w
– Test data – check how well w explains remaining data
– Find M that results in low error (see the sketch below)
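A minimal sketch of this train/test recipe (the 100/100 split, candidate orders 0-9, and noise level 0.2 are illustrative assumptions, not from the slides):

```python
import numpy as np

np.random.seed(1)
x = np.random.rand(200)
t = np.sin(2 * np.pi * x) + 0.2 * np.random.randn(200)
x_tr, t_tr, x_te, t_te = x[:100], t[:100], x[100:], t[100:]

def fit(x, t, M):
    # Least-squares weights for an order-M polynomial
    X = np.vander(x, M + 1, increasing=True)
    return np.linalg.lstsq(X, t, rcond=None)[0]

def test_error(x, t, w):
    # Mean squared error of the fitted polynomial on held-out data
    return np.mean((np.polyval(w[::-1], x) - t) ** 2)

# Pick the order M with the lowest error on the test data
best_M = min(range(10), key=lambda M: test_error(x_te, t_te, fit(x_tr, t_tr, M)))
print("selected order M =", best_M)
```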

Detailed discussion in book