Introduction
Frank Wood
Columbia University
July 6, 2010
Overview of Topics
Introduction
I Data mining is the search for patterns in large collections of data
I Pattern recognition is concerned with automatically finding patterns in data
I Machine learning is pattern recognition with concern for computational tractability
Example: Handwritten Digit Identification
Figure: Handwritten digits from the USPS
Approaches to Identifying Digits
Goal
I Build a machine that can identify digits automatically
Approaches
I Hand-craft a set of rules that separate each digit from the next
I Such rule sets invariably grow large and unwieldy and require many “exceptions”
I “Learn” a set of models for each digit automatically from labeled training data.
Formalism
I Each digit is a 28×28 pixel image
I Vectorized into a 784-entry vector x
Machine learning approach
I Obtain a set of N digits {x1, . . . , xN} called the training set.
I Label (by hand) the training set to produce a label or “target” t for each digit image x
I Learn a function y(x) which takes an image x as input and returns an output in the same “format” as the target vector.
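The three steps above can be sketched in code. This is a minimal illustration, not the method from the lecture: the data are random arrays standing in for USPS digits, the labels are made up, and y(x) is implemented as a simple nearest-neighbor lookup, one of the simplest functions that maps an input vector to a label in the same format as the targets.

```python
import numpy as np

# Hypothetical toy data standing in for USPS digits: each "image" is a
# 28x28 array, vectorized into a 784-entry vector x (as in the slides).
rng = np.random.default_rng(0)
train_images = rng.random((6, 28, 28))       # N = 6 training "images"
train_x = train_images.reshape(6, -1)        # vectorize: shape (6, 784)
train_t = np.array([0, 1, 2, 0, 1, 2])       # hand-assigned labels t

def y(x, train_x=train_x, train_t=train_t):
    """A minimal y(x): return the label of the nearest training vector."""
    dists = np.linalg.norm(train_x - x, axis=1)  # distance to each x_n
    return train_t[np.argmin(dists)]

# y(x) maps a new 784-entry vector to a label in the same format as t.
print(y(train_x[4]))  # a training point is its own nearest neighbor -> 1
```

A learned model would replace the nearest-neighbor rule with parameters fit to the training set, but the interface is the same: a function from an image vector x to a target-formatted output.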
Figure: Chapter 1, Figure 2
Figure: Chapter 1, Figure 3 (plot labels: x, t, xn, tn, y(xn, w))
Figure: Chapter 1, Figure 4a
Figure: Chapter 1, Figure 4b
Figure: Chapter 1, Figure 4c
Figure: Chapter 1, Figure 4d
Figure: Chapter 1, Figure 5 (Training and Test curves)
Figure: Chapter 1, Figure 6a
Figure: Chapter 1, Figure 6b
Figure: Chapter 1, Figure 7a
Figure: Chapter 1, Figure 7b
Figure: Chapter 1, Figure 8 (Training and Test curves)
Figure: Chapter 1, Figure 9
Figure: Chapter 1, Figure 10 (plot labels: ci, rj, yj, xi, nij)
Figure: Chapter 1, Figure 11a
Figure: Chapter 1, Figure 11b
Figure: Chapter 1, Figure 11c
Figure: Chapter 1, Figure 11d
Figure: Chapter 1, Figure 12 (plot labels: x, δx, p(x), P(x))
Figure: Chapter 1, Figure 13 (plot labels: N(x|µ, σ²), x, µ, 2σ)
Figure: Chapter 1, Figure 14 (plot labels: x, p(x), xn, N(xn|µ, σ²))
Figure: Chapter 1, Figure 15 (panels (a), (b), (c))
Figure: Chapter 1, Figure 16 (plot labels: t, x, x0, 2σ, y(x0, w), y(x, w), p(t|x0, w, β))
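The labels p(t|x0, w, β) and y(x, w) in Figure 16 correspond to the standard Gaussian noise model for curve fitting; reading them this way is an assumption here, but it matches the notation in the surrounding figures:

```latex
p(t \mid x, \mathbf{w}, \beta)
  = \mathcal{N}\!\left(t \mid y(x, \mathbf{w}),\ \beta^{-1}\right)
```

where β is the precision (inverse variance) of the noise, so the target t is modeled as the fitted curve value y(x, w) plus zero-mean Gaussian noise.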
Figure: Chapter 1, Figure 17
Figure: Chapter 1, Figure 18 (runs 1–4)
Figure: Chapter 1, Figure 19
Figure: Chapter 1, Figure 20
Figure: Chapter 1, Figure 21a (x1; D = 1)
Figure: Chapter 1, Figure 21b (x1, x2; D = 2)
Figure: Chapter 1, Figure 21c (x1, x2, x3; D = 3)
Figure: Chapter 1, Figure 22 (vertical axis: volume fraction)
Figure: Chapter 1, Figure 23
Figure: Chapter 1, Figure 24 (plot labels: R1, R2, x0, x̂, p(x, C1), p(x, C2), x)
Figure: Chapter 1, Figure 26 (plot labels: x, p(C1|x), p(C2|x), θ, reject region)
Figure: Chapter 1, Figure 27a (vertical axis: class densities)
Figure: Chapter 1, Figure 27b
Figure: Chapter 1, Figure 28 (plot labels: t, x, x0, y(x0), y(x), p(t|x0))
Figure: Chapter 1, Figure 29a
Figure: Chapter 1, Figure 29b
Figure: Chapter 1, Figure 29c
Figure: Chapter 1, Figure 29d
Figure: Chapter 1, Figure 30a (vertical axis: probabilities)
Figure: Chapter 1, Figure 30b (vertical axis: probabilities)
Figure: Chapter 1, Figure 31 (plot labels: a, b, xλ, chord, f(x))