
Advanced Lecture on Statistical Data Analysis (Lecture 01)

Ichiro Takeuchi

Nagoya Institute of Technology

Ichiro Takeuchi, Nagoya Institute of Technology 1/24


About this course

▶ Instructor: Ichiro Takeuchi

▶ Language: English (Japanese questions are allowed)

▶ Room: 0231 (for lecture), 101A (for computer exercise, unless somebody does not have a laptop)

▶ Time: Fri 08:50 - 10:20 (hopefully shorter)

▶ Handwritten exercises are often assigned

▶ You must attend every class unless you have good reasons

▶ The grade will be determined based on the final report about data analysis

▶ Up-to-date course info will be available from the course website (find a link on the Moodle site)


What we learn in this course

▶ We learn machine learning and statistics

▶ We also learn how to use the statistical software R (only once or twice this year)

▶ In addition, I hope you have a nice experience taking a lecture in English


For Today

▶ Introduction to machine learning and statistics

▶ Introduction to R software

▶ How to set up your account in CSE (if applicable)

▶ We plan to finish within 45 minutes


What is Machine Learning?

▶ The goal of ML is to provide general data analysis tools for prediction or knowledge discovery.

▶ Examples: spam mail filter, financial data analysis, disease diagnosis


ML and AI (Artificial Intelligence)

http://anime.goo.ne.jp/special/tezuka/inf_atom-new.html


Human vs. Computer

http://www.toptens.net/

1997 2011 2013


Can a computer recognize a cat?

▶ Q. Can we write a computer program that can recognize cats?

▶ There are infinitely many cats in the world!

[Figures: a sequence of cat images, captioned in turn "This is a cat", "This is also a cat", and "This is also a kind of cat"]

▶ How can we define a cat explicitly?

Computer Programs

▶ A computer program for recognizing cats would have to be something like this:

    if (x is cute)
        x is a cat;
    if (x has triangle-shape ears)
        x is a cat;
    if (x has fur)
        x is a cat;
    if (x has a tail)
        x is a cat;
    ...

▶ We cannot explicitly specify the properties of all the cats in the world
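To see why hand-written rules break down, the rule list above can be sketched in Python. This is only an illustration: the feature names (`cute`, `triangle_ears`, `fur`, `tail`) and the three example animals are hypothetical and do not come from the lecture.

```python
# Rule-based "cat detector" mirroring the pseudocode on this slide.
# All feature names and example animals are hypothetical.

def is_cat_by_rules(animal: dict) -> bool:
    """Return True if any hand-written rule fires, as in the pseudocode."""
    rules = ("cute", "triangle_ears", "fur", "tail")
    return any(animal.get(rule, False) for rule in rules)

# A typical cat satisfies the rules ...
cat = {"cute": True, "triangle_ears": True, "fur": True, "tail": True}
# ... but so does a dog, so the rules also fire on non-cats.
dog = {"cute": True, "triangle_ears": False, "fur": True, "tail": True}
# An unusual cat (hairless, folded ears, no tail) can miss every rule.
unusual_cat = {"cute": False, "triangle_ears": False, "fur": False, "tail": False}

for name, animal in [("cat", cat), ("dog", dog), ("unusual cat", unusual_cat)]:
    print(name, "->", is_cat_by_rules(animal))
```

The rules accept the dog and reject the unusual cat, so adding or tightening rules by hand only moves the errors around.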


Machine Learning Approach

▶ Instead of explicitly describing the properties of a cat, we provide examples to a computer and let the computer learn what a cat is


Google + Stanford Project (2012)

▶ Data compression by auto-encoder


Types of Problems in ML

▶ Supervised learning
    ▶ Regression
    ▶ Classification
        ▶ Binary classification
        ▶ Multiclass classification

▶ Semi-supervised learning

▶ Unsupervised learning
    ▶ Clustering
    ▶ Density estimation


Regression Learning

▶ Data

$$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$$

▶ Input

$$x_i \in \mathbb{R}^d, \quad i \in \{1, \dots, n\}$$

▶ Output

$$y_i \in \mathbb{R}, \quad i \in \{1, \dots, n\}$$


An Example

    area (x)    price (y)
    3.472       25.9
    3.531       29.5
    2.275       27.9
    4.050       25.9
    4.455       29.9
    ...         ...

[Figure: scatter plot of price (y, axis range 20 to 50) against area (x, axis range 2 to 10)]


Least-square Linear Regression

▶ Training Data:

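$$\{(x_1, y_1), \dots, (x_n, y_n)\}, \quad x_i \in \mathbb{R}^d, \quad y_i \in \mathbb{R}$$

▶ Linear Model:

$$f(x_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_d x_{id} = \beta_0 + \beta^\top x_i$$

▶ Quadratic Loss:

$$\min_{\beta_0, \beta} \; E(\beta_0, \beta) = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2$$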
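As a minimal sketch of this minimization for a single input feature (d = 1), the closed-form solution can be written in plain Python. The function names `fit_line` and `squared_error` are illustrative choices, not from the lecture, and the data are only the five (area, price) rows visible in the example slide, so the fitted values should not be read as the lecture's result.

```python
# Closed-form least-squares fit for one input feature (d = 1).
# fit_line and squared_error are illustrative names, not from the lecture.

def fit_line(xs, ys):
    """Return (beta0, beta1) minimizing sum_i (y_i - beta0 - beta1 * x_i)^2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Setting the gradient of E to zero gives the normal equations;
    # for d = 1 their solution is the ratio below.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

def squared_error(beta0, beta1, xs, ys):
    """E(beta0, beta1): the sum of squared residuals."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))

# The five (area, price) rows visible in the example table.
area = [3.472, 3.531, 2.275, 4.050, 4.455]
price = [25.9, 29.5, 27.9, 25.9, 29.9]

beta0, beta1 = fit_line(area, price)
print("beta0 =", beta0, "beta1 =", beta1)
print("E =", squared_error(beta0, beta1, area, price))
```

For d > 1 the same idea leads to a linear system in (β0, β), which is usually solved with a linear algebra routine rather than by hand.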




Classification Learning

▶ Data

$$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$$

▶ Input

$$x_i \in \mathbb{R}^d, \quad i \in \{1, \dots, n\}$$

▶ Output

$$y_i \in \{-1, +1\}, \quad i \in \{1, \dots, n\}$$


A Binary Classification Example

▶ Consider a medical diagnosis problem (classify cancer or not) based on the activities of two genes A and B:

    ID    gene A    gene B    cancer (+1) or not (−1)
    1     310       150       +1
    2     190       160       +1
    3     280       120       +1
    4     310       170       +1
    5     290       120       +1
    6     200       100       −1
    7     180       130       −1
    8     240       110       −1
    9     150       150       −1
    10    150       110       −1

[Figure: scatter plot of the 10 patients; horizontal axis: Activity of gene A (100 to 400), vertical axis: Activity of gene B (80 to 200)]


A Binary Classification Example

▶ We want to determine the classification boundary so that we can diagnose or classify a new patient.

[Figure: the same scatter plot with a classification boundary; horizontal axis: Activity of gene A, vertical axis: Activity of gene B]


Formulation of Binary Classifier Training

▶ Training Set (with n instances and d features):

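$$\{(x_i, y_i)\}_{i=1}^{n}, \quad x_i = [x_{i1} \; \dots \; x_{id}]^\top \in \mathbb{R}^d, \quad y_i \in \{-1, +1\}$$

▶ (Linear) Classifier:

$$f(x) = w_0 + w^\top x, \qquad y = \begin{cases} -1, & \text{if } f(x) < 0 \\ +1, & \text{if } f(x) > 0 \end{cases}$$

▶ Optimization Problem:

$$\min_{w, w_0} \; \underbrace{\sum_{i=1}^{n} L(y_i, f(x_i))}_{\text{loss term}} + \underbrace{\lambda R(w)}_{\text{regularization term}}.$$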
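The formulation can be made concrete with a small Python sketch on the ten patients from the gene A / gene B example. Several things here are assumptions rather than lecture content: the weights are hand-picked instead of learned, the hinge loss stands in for L, the squared L2 norm stands in for R, and λ = 0.1 is arbitrary.

```python
# Linear classifier f(x) = w0 + w^T x on the ten patients from the example.
# Assumptions (not from the lecture): hand-picked weights, hinge loss for L,
# squared L2 norm for R, and lam = 0.1.

# (gene A, gene B, label) for the ten training instances.
DATA = [
    (310, 150, +1), (190, 160, +1), (280, 120, +1), (310, 170, +1), (290, 120, +1),
    (200, 100, -1), (180, 130, -1), (240, 110, -1), (150, 150, -1), (150, 110, -1),
]

def f(w0, w, x):
    """Decision function f(x) = w0 + w^T x."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))

def predict(w0, w, x):
    """+1 if f(x) > 0, otherwise -1 (the slide leaves f(x) = 0 undefined)."""
    return 1 if f(w0, w, x) > 0 else -1

def hinge_loss(y, fx):
    """One possible choice of L(y, f(x))."""
    return max(0.0, 1.0 - y * fx)

def objective(w0, w, data, lam):
    """Loss term plus lam times the regularization term R(w) = ||w||^2."""
    loss = sum(hinge_loss(y, f(w0, w, (a, b))) for a, b, y in data)
    reg = lam * sum(wj * wj for wj in w)
    return loss + reg

w0, w = -485.0, (1.0, 2.0)  # hand-picked, not the result of training
errors = sum(predict(w0, w, (a, b)) != y for a, b, y in DATA)
print("training errors:", errors)
print("objective:", objective(w0, w, DATA, lam=0.1))
```

With these hand-picked weights all ten training instances fall on the correct side of the boundary, so the loss term is zero and only the regularization term contributes to the objective.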


Support Vector Machine (SVM)

▶ SVM has been one of the most popular binary classifiers in the last decade.

[Figure: three panels (a), (b), (c), each showing the training data with a different classification boundary; horizontal axis: Activity of gene A, vertical axis: Activity of gene B]

Which boundary is the best (although all three boundaries perfectly classify the 10 training instances)?
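One common way to answer this question, and the criterion a hard-margin SVM uses, is to prefer the boundary with the largest margin: the smallest distance from any training point to the boundary. The sketch below compares three hypothetical boundaries that all separate the ten training instances. They are made up for illustration and are not the boundaries drawn in panels (a) to (c).

```python
import math

# (gene A, gene B, label) for the ten training instances.
DATA = [
    (310, 150, +1), (190, 160, +1), (280, 120, +1), (310, 170, +1), (290, 120, +1),
    (200, 100, -1), (180, 130, -1), (240, 110, -1), (150, 150, -1), (150, 110, -1),
]

def margin(w0, w, data):
    """Smallest signed distance from a training point to the line w0 + w^T x = 0.

    The value is positive only if every point lies on its correct side.
    """
    norm = math.hypot(*w)
    return min(y * (w0 + w[0] * a + w[1] * b) for a, b, y in data) / norm

# Hypothetical candidate boundaries (w0, w); not the ones in the figure.
candidates = {
    "h1": (-485.0, (1.0, 2.0)),
    "h2": (-465.0, (1.0, 2.0)),
    "h3": (-620.0, (1.0, 3.0)),
}

for name, (w0, w) in candidates.items():
    print(name, "margin =", margin(w0, w, DATA))

best = max(candidates, key=lambda k: margin(*candidates[k], DATA))
print("largest margin:", best)
```

All three candidates have a positive margin, so each classifies the training set perfectly, yet their margins differ; the one with the largest margin leaves the most room for a new patient that lies near the training data.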


R

▶ R is a freely available language and environment for statistical computing

▶ Many computer scientists and statisticians develop packages

▶ Many practitioners use packages

▶ Useful for your own research (data analysis, visualization)

▶ We will introduce some useful packages in this course

▶ Your final presentation should be prepared using R

▶ To install R, search for "CRAN" (the Comprehensive R Archive Network)


How to set up your account in CSE

▶ Make decisions by 5pm today

▶ For each laboratory, only one of the students should send the list of
    ▶ Name
    ▶ Student ID number
    ▶ Kiban ID (基盤 ID)
    ▶ Department (major)
    ▶ Laboratory
    ▶ Year (M1, M2)
    ▶ Email address

to [email protected] by 5pm Apr 09th.

▶ To issue your account, you need to enter all the required information into the unified database (統一データベース)

▶ If you already have an account, you do not need to do anything


Hope to see you next week
