Multinomial classification and application of ML

Transcript
Page 1: Multinomial classification and application of ML

Machine/Deep Learning with Theano

Softmax classification : Multinomial classification

Application & Tips : Learning rate, data preprocessing, overfitting

Deep Neural Nets for Everyone

Page 2: Multinomial classification and application of ML

Multinomial Classification

Softmax classification

Page 3: Multinomial classification and application of ML

Logistic Regression

$H_L(X) = WX$

$H_L(X) = Z$

$g(Z) = \dfrac{1}{1 + e^{-Z}}$

$H_R(X) = g(H_L(X))$

[Diagram: $X$, $W$ → $Z$ → $\hat{Y}$]

$\hat{Y}$: Prediction (0 ~ 1); $Y$: Real Value (0 or 1)
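The hypothesis above can be sketched in a few lines of plain Python. This is only an illustration: the function names, weights, and feature values are hypothetical, and the standard library stands in for Theano.

```python
import math

def sigmoid(z):
    """Logistic function g(Z) = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(weights, features):
    """H_R(X) = g(H_L(X)), where H_L(X) = W X is a dot product."""
    z = sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights and features, chosen only for illustration.
prediction = hypothesis([0.5, -0.25], [2.0, 4.0])  # Z = 1.0 - 1.0 = 0.0
print(prediction)  # 0.5, because g(0) = 1 / (1 + 1)
```

The output always lies between 0 and 1, which is why it can be read as a prediction for a 0-or-1 label.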

Page 4: Multinomial classification and application of ML

Binomial Classification

Is the figure on the left a circle?

yes/no

Page 5: Multinomial classification and application of ML

Binomial Classification

Feature axes $x_1$, $x_2$: tendency of …리 (label truncated in the source), tendency of lines

Classes: circle, polygon

[Diagram: $X$, $W$ → $Z$ → $\hat{Y}$]

Page 6: Multinomial classification and application of ML

Multinomial Classification

ABC

Which of A, B, or C is the figure on the left?

Page 7: Multinomial classification and application of ML

Multinomial Classification

[Figure: classes A, B, and C plotted against $x_1$ and $x_2$]

Page 8: Multinomial classification and application of ML

Multinomial Classification

[Figure: classes A, B, and C plotted against $x_1$ and $x_2$]

[Diagram: $X$, $W$ → $Z$ → $\hat{Y}$: A?]

Page 9: Multinomial classification and application of ML

Multinomial Classification

[Figure: classes A, B, and C plotted against $x_1$ and $x_2$]

[Diagram: $X$, $W$ → $Z$ → $\hat{Y}$: B?]

Page 10: Multinomial classification and application of ML

Multinomial Classification

[Figure: classes A, B, and C plotted against $x_1$ and $x_2$]

[Diagram: $X$, $W$ → $Z$ → $\hat{Y}$: C?]

Page 11: Multinomial classification and application of ML

Multinomial Classification

[Figure: classes A, B, and C plotted against $x_1$ and $x_2$]

[Diagram: three separate classifiers, each $X$, $W$ → $Z$ → $\hat{Y}$, answering A?, B?, and C?]

Page 12: Multinomial classification and application of ML

Multinomial Classification

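[Diagram: $X$, $W$ → $Z$]

$\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_1 x_1 + w_2 x_2 + w_3 x_3 \end{bmatrix}$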

Page 13: Multinomial classification and application of ML

Multinomial Classification

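[Diagram: three classifiers, each $X$, $W$ → $Z$]

$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3 \end{bmatrix}$

$\begin{bmatrix} w_{B1} & w_{B2} & w_{B3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3 \end{bmatrix}$

$\begin{bmatrix} w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3 \end{bmatrix}$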

Page 14: Multinomial classification and application of ML

Multinomial Classification

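[Diagram: the three classifiers merged into one, $X$, $W$ → $Z$]

$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3 \\ w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3 \\ w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3 \end{bmatrix} = \begin{bmatrix} H_A(X) \\ H_B(X) \\ H_C(X) \end{bmatrix}$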
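The stacked form above is one matrix-vector product: each row of the weight matrix gives the score for one class. A minimal sketch with made-up numbers follows (the function name and all values are hypothetical, and plain Python lists stand in for Theano tensors):

```python
def class_scores(weight_rows, features):
    """Return one score per class: each row of W dotted with X."""
    return [sum(w * x for w, x in zip(row, features)) for row in weight_rows]

# Hypothetical weights: one row each for classes A, B, and C.
W = [
    [1.0, 0.0, 2.0],   # w_A1, w_A2, w_A3
    [0.0, 1.0, 0.0],   # w_B1, w_B2, w_B3
    [-1.0, 0.5, 0.0],  # w_C1, w_C2, w_C3
]
X = [1.0, 2.0, 3.0]    # x_1, x_2, x_3

scores = class_scores(W, X)
print(scores)  # [7.0, 2.0, 0.0] = [H_A(X), H_B(X), H_C(X)]
```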

Page 15: Multinomial classification and application of ML

Multinomial Classification

$\begin{bmatrix} H_A(X) \\ H_B(X) \\ H_C(X) \end{bmatrix} =$ [ 1505−0.1 ]

Example scores for A, B, C

How similar?

Page 16: Multinomial classification and application of ML

Multinomial Classification : Softmax Function

Score → Probability

$H_A(X) = Z_A$ → $\hat{Y}_A$

$H_B(X) = Z_B$ → $\hat{Y}_B$

$H_C(X) = Z_C$ → $\hat{Y}_C$

(2) (1)
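The slide's formula did not survive extraction, so the sketch below uses the standard definition of softmax, $\hat{Y}_i = e^{Z_i} / \sum_j e^{Z_j}$. The scores are hypothetical, and subtracting the maximum score is a common numerical-stability step added here, not something taken from the slides.

```python
import math

def softmax(scores):
    """Turn raw scores Z into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical Z_A, Z_B, Z_C
probs = softmax(scores)
print(probs)       # three values between 0 and 1, largest first here
print(sum(probs))  # 1.0 up to floating-point rounding
```

Larger scores receive larger probabilities, so the ordering of the classes is preserved.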

Page 17: Multinomial classification and application of ML

Multinomial Classification

[Diagram: $X$, $W_A$ → $Z_A$; $X$, $W_B$ → $Z_B$; $X$, $W_C$ → $Z_C$ (classes A, B, C)]

softmax: $\hat{Y}_A = 0.8$, $\hat{Y}_B = 0.15$, $\hat{Y}_C = 0.05$

one-hot encoding (find maximum): 1.0, 0.0, 0.0
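The one-hot step can be sketched as follows, using the three softmax outputs shown on the slide. The function name is hypothetical, and ties are resolved in favor of the first class, which is an arbitrary choice made for this example.

```python
def one_hot(probabilities):
    """Return a vector with 1.0 at the largest probability and 0.0 elsewhere."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return [1.0 if i == best else 0.0 for i in range(len(probabilities))]

# Softmax outputs for A, B, C taken from the slide.
print(one_hot([0.8, 0.15, 0.05]))  # [1.0, 0.0, 0.0]
```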

Page 18: Multinomial classification and application of ML

Cost Function

Cross Entropy Function

Page 19: Multinomial classification and application of ML

Entropy Function

(Information) Entropy

$H(p) = -\sum_x p(x) \log p(x)$

• A measure of the uncertainty contained in the probability distribution p

• The larger this value, the more chaotic the distribution: it has no consistent direction or regularity

• The amount of information (bits) needed to represent p
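A small sketch of the entropy formula, assuming a base-2 logarithm so that the result is in bits. The example distributions are made up, and terms with $p(x) = 0$ are skipped because they contribute nothing.

```python
import math

def entropy(p):
    """H(p) = -sum p(x) log2 p(x), in bits; zero-probability terms are skipped."""
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))                # 1.0 bit: a fair coin
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: four equally likely outcomes
print(entropy([0.9, 0.1]))                # below 1 bit: a biased coin is more predictable
```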

Page 20: Multinomial classification and application of ML

Cross Entropy Function

Cross Entropy

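$H(p, q) = -\sum_x p(x) \log q(x)$

• A way to compute the amount of information that lies between two probability distributions p and q

• The amount of information (bits) needed to go from p to q; equivalently, the average number of bits needed to encode samples drawn from p using a code optimized for q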
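A sketch of the cross-entropy formula, again assuming a base-2 logarithm. The distributions are hypothetical, and the function assumes $q(x) > 0$ wherever $p(x) > 0$ (otherwise `math.log2` raises an error).

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum p(x) log2 q(x), in bits; assumes q(x) > 0 where p(x) > 0."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]
print(cross_entropy(p, p))             # 1.0, the entropy of p itself
print(cross_entropy(p, [0.25, 0.75]))  # about 1.21, larger because q differs from p
```

The value is smallest when q equals p, which is what makes it usable as a cost.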

Page 21: Multinomial classification and application of ML

Cross Entropy Cost Function

[Diagram: $X$, $W_A$ → $Z_A$ → $\hat{Y}_A$; $X$, $W_B$ → $Z_B$ → $\hat{Y}_B$; $X$, $W_C$ → $Z_C$ → $\hat{Y}_C$]

$\hat{Y}$: Prediction (0 ~ 1); $Y$: Real Value (0 or 1)

$D(\hat{Y}_i, Y_i) = -\sum_i Y_i \log \hat{Y}_i$

Page 22: Multinomial classification and application of ML

Cross Entropy Cost Function

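$\begin{bmatrix} Y_A \\ Y_B \\ Y_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} \hat{Y}_A \\ \hat{Y}_B \\ \hat{Y}_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$

$D(\hat{Y}_i, Y_i) = -\sum_i Y_i \log \hat{Y}_i$

The prediction matches the real value, so $D = -1 \cdot \log 1 = 0$.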

Page 23: Multinomial classification and application of ML

Cross Entropy Cost Function

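$D(\hat{Y}_i, Y_i) = -\sum_i Y_i \log \hat{Y}_i$

$\begin{bmatrix} Y_A \\ Y_B \\ Y_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} \hat{Y}_A \\ \hat{Y}_B \\ \hat{Y}_C \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$

The prediction assigns probability 0 to the true class, so $D = -1 \cdot \log 0 = \infty$.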
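The two one-hot cases can be checked numerically. In this sketch the function name is hypothetical, the first argument is taken to be the prediction and the second the real value, and $-\log 0$ is returned as infinity rather than raising an error.

```python
import math

def cost(prediction, real):
    """D = -sum_i real_i * log(prediction_i), with -log(0) treated as infinity."""
    total = 0.0
    for y_hat, y in zip(prediction, real):
        if y == 0:
            continue             # a term with real value 0 contributes nothing
        if y_hat == 0:
            return math.inf      # -log(0) diverges
        total -= y * math.log(y_hat)
    return total

print(cost([1, 0, 0], [1, 0, 0]))  # 0.0: prediction matches the real value
print(cost([0, 1, 0], [1, 0, 0]))  # inf: prediction misses the true class
```

A correct prediction costs nothing and a confidently wrong one costs without bound, which is the behavior a classification cost needs.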

Page 24: Multinomial classification and application of ML

Logistic Cost VS Cross Entropy

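In binomial classification, the real data and H(x) can each take only two forms: $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$

These matrices can be written as follows: $\begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix}$, $\begin{bmatrix} y \\ 1 - y \end{bmatrix}$, where $H(x), y \in \{0, 1\}$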

Page 25: Multinomial classification and application of ML

Logistic Cost VS Cross Entropy

Substituting into the Cross Entropy Cost Function:

$H(H(x), y) = -\begin{bmatrix} y & 1 - y \end{bmatrix} \cdot \log \begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix}$
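Expanding the dot product shows why the two costs agree. The step below is a sketch that keeps the slide's notation and assumes the logarithm is applied element-wise:

```latex
\begin{aligned}
H(H(x), y) &= -\begin{bmatrix} y & 1 - y \end{bmatrix} \cdot \log \begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix} \\
           &= -y \log H(x) - (1 - y) \log\bigl(1 - H(x)\bigr)
\end{aligned}
```

The last line is the usual logistic regression cost, so the logistic cost is the two-class case of cross entropy.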

Page 26: Multinomial classification and application of ML

Cross Entropy Cost Function

$L = \dfrac{1}{N} \sum_n D_n(\hat{Y}, Y) = -\dfrac{1}{N} \sum_n \left( \sum_i Y_i \log \hat{Y}_i \right)$

The sum of the costs over the N training examples, divided by N (the mean cost)

Page 27: Multinomial classification and application of ML

Application & Tips

Learning Rate

Data Preprocessing

Overfitting

Page 28: Multinomial classification and application of ML

Gradient Descent Function

$W = W - \alpha \dfrac{\partial}{\partial W} Cost(W)$

$\alpha$: Learning Rate
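The update rule can be sketched on a one-dimensional toy problem. The cost function, starting point, learning rate, and step count below are all hypothetical values chosen for illustration.

```python
def gradient_descent(grad, w, learning_rate, steps):
    """Repeat W = W - alpha * dCost/dW for a fixed number of steps."""
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Hypothetical cost: Cost(W) = (W - 3)^2, so dCost/dW = 2 * (W - 3).
w_final = gradient_descent(lambda w: 2 * (w - 3), w=0.0, learning_rate=0.1, steps=100)
print(w_final)  # very close to 3, the minimum of the cost
```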

Page 29: Multinomial classification and application of ML

Learning rate : Overshooting

[Figure: $L(W)$ plotted against $W$]

Page 30: Multinomial classification and application of ML

Learning rate : Too small

[Figure: $L(W)$ plotted against $W$]
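Both failure modes show up on a toy cost. The sketch below assumes the hypothetical cost $L(W) = W^2$, whose gradient is $2W$, and compares three made-up learning rates over 20 steps.

```python
def run(learning_rate, steps=20, w=1.0):
    """Gradient descent on the hypothetical cost L(W) = W^2 (gradient 2W)."""
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w

print(abs(run(1.1)))    # grows: each step multiplies W by -1.2 (overshooting)
print(abs(run(0.001)))  # still close to 1.0: each step multiplies W by 0.998 (too small)
print(abs(run(0.3)))    # near 0: each step multiplies W by 0.4
```

With the largest rate the weight moves away from the minimum at every step, and with the smallest it has barely moved after 20 steps.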

Page 31: Multinomial classification and application of ML

Data Preprocessing

[Figures: $L(W)$ plotted against $W$; cost contours over $w_1$ and $w_2$]

Page 32: Multinomial classification and application of ML

Data Preprocessing

[Figure: cost contours over $w_1$ and $w_2$]

$W = W - \alpha \dfrac{\partial}{\partial W} Cost(W)$

When a change affects each weight value differently, it becomes hard to find an appropriate learning rate.

Page 33: Multinomial classification and application of ML

Data Preprocessing : Standardization

$w_i' = \dfrac{w_i - \mu_i}{\sigma_i}$

$\mu_i$: mean of $w_i$

$\sigma_i$: standard deviation of $w_i$

Page 34: Multinomial classification and application of ML

Overfitting

• The model becomes excessively optimized to the training data

• It does not work well on real data

Page 35: Multinomial classification and application of ML

Overfitting

[Figures: two plots of the "circle" class against $x_1$ and $x_2$]

Page 36: Multinomial classification and application of ML

Overfitting

Solution:

• Train with a large amount of training data.

• Reduce the number of features.

• Regularization

Page 37: Multinomial classification and application of ML

Overfitting : Regularization

$L = \dfrac{1}{N} \sum_n D_n(\hat{Y}, Y) + \lambda \sum W^2$

• Keeps the weights from taking overly large values. => Adjusts the cost function so that it does not bend too sharply.

$\lambda$: Regularization Strength
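The regularized loss can be sketched directly from the formula. The function name, per-example costs, weights, and strength below are hypothetical values chosen for illustration.

```python
def regularized_loss(data_costs, weights, strength):
    """L = (1/N) * sum of per-example costs + lambda * sum of squared weights."""
    data_term = sum(data_costs) / len(data_costs)
    penalty = strength * sum(w * w for w in weights)
    return data_term + penalty

# Mean cost 0.3 plus penalty 0.1 * (1 + 4) = 0.5.
loss = regularized_loss([0.2, 0.4], weights=[1.0, -2.0], strength=0.1)
print(loss)  # about 0.8
```

Raising the strength makes large weights more expensive, so the optimizer is pushed toward smaller weights.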

Page 38: Multinomial classification and application of ML

Overfitting : Regularization

$L = \dfrac{1}{N} \sum_n D_n(\hat{Y}, Y) + \lambda \sum W^2$

$\lambda$: Regularization Strength

Page 39: Multinomial classification and application of ML

Application & Tips

Learning and Test data sets

Page 40: Multinomial classification and application of ML

Training, validation and test sets

• The model has already memorized the answers for the training data, so the training data cannot show whether it works well on real data. => Test data is needed!

• A validation step is needed to find an appropriate learning rate and regularization strength for the trained model. => Validation data is needed!
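One way to produce the three sets is to shuffle once and cut the data into slices. The 60/20/20 ratio, the seed, and the function name are assumptions made for this sketch; the slides do not specify a ratio.

```python
import random

def split(examples, train=0.6, validation=0.2, seed=0):
    """Shuffle, then cut into training, validation, and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # whatever remains is the test set

train_set, validation_set, test_set = split(range(10))
print(len(train_set), len(validation_set), len(test_set))
```

The sets do not overlap, so the test score is measured on examples the model never saw during training or tuning.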

Page 41: Multinomial classification and application of ML

Online Learning

Data

Model

• When there is too much data, split it into parts and train on one part at a time.

• Also used when data keeps arriving continuously.
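The idea can be sketched with a model that is updated one batch at a time and never needs the full data set in memory. The one-weight model $y = w x$, the squared-error cost, and the data stream are all hypothetical choices for this illustration.

```python
def online_fit(batches, w=0.0, learning_rate=0.1):
    """Update one weight batch by batch for the model y = w * x (squared error)."""
    for batch in batches:
        # Gradient of the mean squared error over this batch only.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w = w - learning_rate * grad
    return w

# Hypothetical stream: 200 small batches drawn from y = 2x, generated lazily.
stream = ([(1.0, 2.0), (2.0, 4.0)] for _ in range(200))
w = online_fit(stream)
print(w)  # very close to 2
```

Because `stream` is a generator, each batch is consumed and discarded, which mirrors the case where data keeps arriving.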