Machine/Deep Learning with Theano
Softmax Classification: Multinomial Classification
Application & Tips: Learning Rate, Data Preprocessing, Overfitting
Deep Neural Nets for Everyone
Multinomial Classification
Softmax classification
Logistic Regression
$H_L(X) = WX$

$g(z) = \dfrac{1}{1 + e^{-z}}$

$H_R(X) = g(H_L(X))$

$\hat{Y} = H_R(X)$: Prediction (0 ~ 1), $Y$: Real value (0 or 1)
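As a sketch of this hypothesis in plain NumPy (not the course's Theano code; the example weights and input are made up):

```python
import numpy as np

def logistic_hypothesis(W, X):
    """H_R(X) = g(H_L(X)), with H_L(X) = WX and g the sigmoid."""
    z = W @ X                         # linear score H_L(X)
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid g(z): squashes into (0, 1)

W = np.array([1.0, -2.0])
X = np.array([0.5, 0.25])
print(logistic_hypothesis(W, X))  # WX = 0, so g(0) = 0.5
```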
Binomial Classification

Is the figure on the left a □? → yes / no

Binomial Classification

The tendency of one shape vs. the tendency of the other.
(Figure: scatter plot over $x_1$, $x_2$ with a linear decision boundary; a single logistic unit $X \to W \to \hat{Y}$ separates the two classes.)
Multinomial Classification

Is the figure on the left A, B, or C?
(Figure: scatter plot over $x_1$, $x_2$ with three classes A, B, C.)
Multinomial Classification

(Figures: the three-class $x_1$–$x_2$ plot revisited with one binary boundary at a time.)

• "A or not?" — a logistic unit $X \to W_A \to \hat{Y}$ separates A from the rest.
• "B or not?" — a logistic unit $X \to W_B \to \hat{Y}$ separates B from the rest.
• "C or not?" — a logistic unit $X \to W_C \to \hat{Y}$ separates C from the rest.

Three independent binary classifiers, one per class, each with its own weight vector.
Multinomial Classification

Each unit is a matrix product:

$\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_1 x_1 + w_2 x_2 + w_3 x_3 \end{bmatrix}$

One row of weights per class:

$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3 \end{bmatrix}$

$\begin{bmatrix} w_{B1} & w_{B2} & w_{B3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3 \end{bmatrix}$

$\begin{bmatrix} w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3 \end{bmatrix}$
Multinomial Classification

Stack the three weight rows into one matrix, so all three scores come from a single product:

$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3 \\ w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3 \\ w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3 \end{bmatrix} = \begin{bmatrix} H_A(X) \\ H_B(X) \\ H_C(X) \end{bmatrix}$
Multinomial Classification

Example: for one input the three scores might come out as

$\begin{bmatrix} H_A(X) \\ H_B(X) \\ H_C(X) \end{bmatrix} = \begin{bmatrix} 1.5 \\ 0.5 \\ -0.1 \end{bmatrix}$ for classes A, B, C.

How similar is this score vector to the true label? Raw scores are hard to compare directly — we want probabilities.
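The stacked-matrix computation can be sketched in NumPy (the weight values below are made up for illustration):

```python
import numpy as np

# One weight row per class; the numbers are made up for illustration.
W = np.array([[ 1.0, 0.5, -0.5],   # w_A
              [ 0.0, 1.0,  0.5],   # w_B
              [-1.0, 0.5,  0.0]])  # w_C
x = np.array([1.0, 1.0, 2.0])

scores = W @ x  # one matrix product yields [H_A(X), H_B(X), H_C(X)]
print(scores)   # scores for A, B, C: 0.5, 2.0, -0.5
```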
Multinomial Classification: Softmax Function

Score → Probability

$H_A(X) = y_A,\quad H_B(X) = y_B,\quad H_C(X) = y_C$

The softmax function turns the scores $y_A, y_B, y_C$ into probabilities:

$S(y_i) = \dfrac{e^{y_i}}{\sum_j e^{y_j}}$

(1) each output lies between 0 and 1, and (2) the outputs sum to 1.
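A minimal NumPy sketch of the softmax function (subtracting the maximum is a standard numerical-stability detail, not from the slide):

```python
import numpy as np

def softmax(y):
    """S(y_i) = e^{y_i} / sum_j e^{y_j}: scores -> probabilities."""
    e = np.exp(y - np.max(y))  # subtracting max(y) avoids overflow
    return e / e.sum()

p = softmax(np.array([1.5, 0.5, -0.1]))
print(p)  # each entry lies in (0, 1), and they sum to 1
```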
Multinomial Classification

softmax → one-hot encoding (find maximum)

$\hat{y}_A = 0.8,\quad \hat{y}_B = 0.15,\quad \hat{y}_C = 0.05$ → one-hot: $[1.0,\ 0.0,\ 0.0]$

(classes A, B, C; the largest probability becomes 1, the rest 0)
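The one-hot step can be sketched in NumPy:

```python
import numpy as np

p = np.array([0.8, 0.15, 0.05])  # softmax output for A, B, C
one_hot = np.zeros_like(p)
one_hot[np.argmax(p)] = 1.0      # keep only the maximum: "one-hot"
print(one_hot)                   # [1. 0. 0.]
```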
Cost Function

Cross Entropy Function

(Information) Entropy

$H(p) = -\sum_x p(x) \log p(x)$

• A measure of the uncertainty contained in a probability distribution p.
• The larger this value, the more chaotic: no consistent direction or regularity.
• The amount of information (bits) needed to represent p.
Cross Entropy Function

Cross Entropy

$H(p, q) = -\sum_x p(x) \log q(x)$

• A way to compute the amount of information between two probability distributions p and q.
• The amount of information (bits) needed to convert from p to q.
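A NumPy sketch of cross entropy (the eps guard against log 0 is an assumption of this sketch):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x); eps guards against log(0)."""
    return -np.sum(p * np.log(q + eps))

# When q == p this reduces to the entropy of p: here H(p, p) = log 2.
p = np.array([0.5, 0.5])
print(cross_entropy(p, p))  # ~0.693
```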
Cross Entropy Cost Function

Apply cross entropy to the softmax outputs $\hat{Y} = [\hat{y}_A, \hat{y}_B, \hat{y}_C]$ and the labels $Y = [y_A, y_B, y_C]$:

$D(\hat{Y}, Y) = -\sum_i Y_i \log \hat{Y}_i$

$\hat{Y}$: Prediction (0 ~ 1), $Y$: Real value (0 or 1)
Cross Entropy Cost Function

$D(\hat{Y}, Y) = -\sum_i Y_i \log \hat{Y}_i$

Correct prediction: $Y = [1, 0, 0]$, $\hat{Y} = [1, 0, 0]$ → $D = -\log 1 = 0$.

Wrong prediction: $Y = [1, 0, 0]$, $\hat{Y} = [0, 1, 0]$ → $D = -\log 0 = \infty$.
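The two cases can be checked numerically (eps stands in for the infinite cost of log 0):

```python
import numpy as np

def D(y_hat, y, eps=1e-12):
    """D(Y_hat, Y) = -sum_i Y_i log Y_hat_i (eps stands in for log 0)."""
    return -np.sum(y * np.log(y_hat + eps))

label = np.array([1.0, 0.0, 0.0])           # the true class is A
print(D(np.array([1.0, 0.0, 0.0]), label))  # correct prediction: cost ~ 0
print(D(np.array([0.0, 1.0, 0.0]), label))  # wrong prediction: huge cost (-log eps)
```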
Logistic Cost VS Cross Entropy

In binomial classification there are only two possible cases, so the real data and $H(x)$ can each take only the two values $[0\ 1]$ or $[1\ 0]$.

They can be expressed as the following vectors: $\begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix}$ and, for $y \in \{0, 1\}$, $\begin{bmatrix} y \\ 1 - y \end{bmatrix}$.

Logistic Cost VS Cross Entropy

Substituting into the Cross Entropy Cost Function:

$H(H(x), y) = -\begin{bmatrix} y & 1-y \end{bmatrix} \cdot \log \begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix} = -y \log H(x) - (1-y) \log(1 - H(x))$

which is exactly the logistic cost.
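The equivalence can be verified numerically with a small NumPy sketch:

```python
import numpy as np

def logistic_cost(h, y):
    """-y log h - (1 - y) log(1 - h): the binary logistic cost."""
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

def cross_entropy_2class(h, y):
    """Cross entropy over the 2-vectors [h, 1-h] and [y, 1-y]."""
    p = np.array([y, 1 - y])
    q = np.array([h, 1 - h])
    return -np.sum(p * np.log(q))

for h in (0.1, 0.5, 0.9):
    for y in (0.0, 1.0):
        assert np.isclose(logistic_cost(h, y), cross_entropy_2class(h, y))
print("binary cross entropy == logistic cost")
```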
Cross Entropy Cost Function
πΏ= 1πβ
ππ·π (π ,π )=β 1πβ (βπ π logπ π)
N κ°μ training set μ λν Cost λ€μ ν©
Application & Tips

Learning Rate, Data Preprocessing, Overshooting

Gradient Descent Function

$W := W - \alpha \frac{\partial}{\partial W} \mathrm{Cost}(W)$

$\alpha$: Learning Rate

Learning rate: Overshooting

(Figure: cost curve $L(W)$ over $W$; a step that is too large jumps across the minimum and can even leave the valley entirely, so the cost diverges.)

Learning rate: Too small

(Figure: cost curve $L(W)$ over $W$; tiny steps make almost no progress toward the minimum.)
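The three regimes can be demonstrated on a made-up one-dimensional cost, Cost(W) = W^2, whose gradient is 2W:

```python
import numpy as np

def descend(alpha, steps=50, w0=5.0):
    """Gradient descent on Cost(W) = W^2: the update is W := W - alpha * 2W."""
    w = w0
    for _ in range(steps):
        w = w - alpha * 2 * w
    return w

print(descend(alpha=0.1))   # converges toward the minimum at W = 0
print(descend(alpha=1.1))   # overshooting: |W| grows every step and diverges
print(descend(alpha=1e-4))  # too small: barely moves after 50 steps
```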
Data Preprocessing

(Figure: cost contours $L(W)$ over $w_1$, $w_2$; when the features have very different scales the contours are strongly elongated.)

$W := W - \alpha \frac{\partial}{\partial W} \mathrm{Cost}(W)$

• When the update affects each weight on a very different scale, it becomes difficult to find a suitable learning rate.
Data Preprocessing: Standardization

$x_j' = \dfrac{x_j - \mu_j}{\sigma_j}$

$\mu_j$: mean, $\sigma_j$: standard deviation
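A NumPy sketch of standardization on a made-up two-feature data set:

```python
import numpy as np

X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 4000.0]])  # two features on wildly different scales

mu = X.mean(axis=0)       # per-feature mean
sigma = X.std(axis=0)     # per-feature standard deviation
X_std = (X - mu) / sigma  # standardized: mean 0, std 1 per column

print(X_std.mean(axis=0), X_std.std(axis=0))
```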
Overfitting

• A state over-optimized to the training data.
• Does not perform well on real data.

Overfitting

(Figure: two $x_1$–$x_2$ plots; a smooth decision boundary that generalizes vs. a contorted boundary that traces every training point.)

Overfitting

Solution:
• Train with a larger amount of training data.
• Reduce the number of features.
• Regularization
Overfitting: Regularization

$\mathcal{L} = \frac{1}{N} \sum_i D(\hat{Y}_i, Y_i) + \lambda \sum W^2$

$\lambda$: Regularization Strength

• Keeps the weights from taking excessively large values => adjusts so that the cost function's surface is not sharply curved.
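A sketch of the regularized cost in NumPy (the predictions, labels, weights, and λ are made up):

```python
import numpy as np

def regularized_cost(y_hat, y, W, lam, eps=1e-12):
    """Mean cross entropy plus an L2 penalty lam * sum(W^2)."""
    data_cost = -np.mean(np.sum(y * np.log(y_hat + eps), axis=1))
    return data_cost + lam * np.sum(W ** 2)

y     = np.array([[1.0, 0.0], [0.0, 1.0]])
y_hat = np.array([[0.9, 0.1], [0.2, 0.8]])
W_small = np.array([0.1, -0.1])
W_large = np.array([3.0, -3.0])

# Same fit to the data, but larger weights pay a larger penalty.
print(regularized_cost(y_hat, y, W_small, lam=0.1))
print(regularized_cost(y_hat, y, W_large, lam=0.1))
```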
Application & Tips

Learning and Test Data Sets

Training, Validation and Test Sets

• The model has already memorized the answers for the training data, so the training data cannot show how well it performs on real data. => Test data is required!
• To find a suitable learning rate and regularization strength for a trained machine, a validation step is needed. => Validation data is required!
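A minimal sketch of a shuffled 60/20/20 split (the toy data and the exact ratios are assumptions, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))   # 100 examples, 3 features (toy data)

idx = rng.permutation(len(data))   # shuffle before splitting
train = data[idx[:60]]             # 60% for training (answers get memorized)
valid = data[idx[60:80]]           # 20% to tune learning rate / reg. strength
test  = data[idx[80:]]             # 20% held out to estimate real performance

print(len(train), len(valid), len(test))  # 60 20 20
```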
Online Learning

Data → (split into chunks) → Model

• When there is too much data, divide it into pieces and train on them one at a time.
• Also used when data is generated continuously.
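A toy sketch of chunk-by-chunk training (linear regression on y = 2x, made up for illustration; each update sees only one chunk of the data):

```python
import numpy as np

def train_in_chunks(X, y, w, alpha=0.1, chunk_size=2):
    """Least-squares gradient updates, one chunk of data at a time.

    Sketch of online learning: instead of loading all data at once,
    process it chunk by chunk (new chunks could keep arriving forever).
    """
    for start in range(0, len(X), chunk_size):
        Xc, yc = X[start:start + chunk_size], y[start:start + chunk_size]
        grad = 2 * Xc.T @ (Xc @ w - yc) / len(Xc)  # gradient on this chunk only
        w = w - alpha * grad
    return w

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # true relation: y = 2x
w = np.zeros(1)
for _ in range(100):                # several passes over the chunked stream
    w = train_in_chunks(X, y, w)
print(w)                            # approaches [2.]
```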