ML-03 Logistic Regression - IITKGP

Transcript of ML-03 Logistic Regression - IITKGP

CS 60050 Machine Learning

Classification: Logistic Regression

Some slides taken from course materials of Andrew Ng

Classification

Email: Spam / Not Spam?
Online Transactions: Fraudulent / Genuine?
Tumor: Malignant / Benign?

0: "Negative Class" (e.g., benign tumor)
1: "Positive Class" (e.g., malignant tumor)

[Plot: Malignant? (Yes = 1, No = 0) vs. Tumor Size]

Can we solve the problem using linear regression?

[Plot: Malignant? (Yes = 1, No = 0) vs. Tumor Size, with a straight line fitted to the data]

Threshold classifier output $h_\theta(x)$ at 0.5:
If $h_\theta(x) \ge 0.5$, predict "y = 1"
If $h_\theta(x) < 0.5$, predict "y = 0"

Can we solve the problem using linear regression? E.g., fit a straight line and define a threshold at 0.5.

[Same plot, with one additional training example at a much larger tumor size: the re-fitted line shifts, so the 0.5 threshold no longer classifies the earlier points correctly]

Failure due to adding a new point.

Classification: y = 0 or 1

Another drawback of using linear regression for this problem: $h_\theta(x)$ can be > 1 or < 0.

What we need: $0 \le h_\theta(x) \le 1$.

Logistic Regression: $0 \le h_\theta(x) \le 1$.

Logistic Regression Model

Want: $0 \le h_\theta(x) \le 1$

$h_\theta(x) = g(\theta^T x)$, where $g(z) = \dfrac{1}{1 + e^{-z}}$ is the sigmoid function (logistic function).

[Plot: g(z) asymptotes to 0 as z → −∞ and to 1 as z → +∞, crossing 0.5 at z = 0]

A useful property: it is easy to compute the derivative at any point, since $g'(z) = g(z)\,(1 - g(z))$.
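
As an illustration (not part of the slides), a minimal NumPy sketch of the sigmoid and its derivative via this property:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z)); output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Uses the property g'(z) = g(z) * (1 - g(z))."""
    g = sigmoid(z)
    return g * (1.0 - g)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25, the maximum slope
```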

Interpretation of Hypothesis Output

$h_\theta(x)$ = estimated probability that y = 1 on input x

$h_\theta(x) = P(y = 1 \mid x; \theta)$: the "probability that y = 1, given x, parameterized by $\theta$"

Example: if $h_\theta(x) = 0.7$ for a given tumor, tell the patient that there is a 70% chance of the tumor being malignant.
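
A quick numerical illustration; the parameter values and tumor size below are made up:

```python
import numpy as np

theta = np.array([-4.0, 0.1])   # hypothetical [theta_0, theta_1], for illustration only
x = np.array([1.0, 55.0])       # [x_0 = 1 (intercept term), x_1 = tumor size]

h = 1.0 / (1.0 + np.exp(-(theta @ x)))   # h_theta(x) = g(theta^T x)
print(f"P(y = 1 | x; theta) = {h:.2f}")  # ~0.82: estimated chance of malignancy
```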

Logistic regression

Suppose we predict "y = 1" if $h_\theta(x) \ge 0.5$: this happens exactly when $\theta^T x \ge 0$.

Predict "y = 0" if $h_\theta(x) < 0.5$: this happens exactly when $\theta^T x < 0$.
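
In code, the decision rule needs only the sign of $\theta^T x$; a minimal sketch:

```python
import numpy as np

def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0,
    which is equivalent to h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)
```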

Separating two classes of points

• We are attempting to separate two given sets/classes of points
• Separate two regions of the feature space
• Concept of Decision Boundary
• Finding a good decision boundary => learn appropriate values for the parameters $\theta$

Decision Boundary

[Plot: two classes of points in the (x1, x2) plane, with axes running from 1 to 3, separated by a straight line]

Decision Boundary

[Same plot, with the separating straight line drawn]

$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$

Predict "y = 1" if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 \ge 0$; the decision boundary is the line $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$.

How to get the parameter values – will be discussed soon.
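
For instance, with hypothetical parameter values $\theta = (-3, 1, 1)$ (chosen only for illustration, not learned from data), the boundary is the line $x_1 + x_2 = 3$:

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])   # hypothetical values, for illustration only
X = np.array([[1.0, 1.0, 1.0],       # each row is [x_0 = 1, x_1, x_2]
              [1.0, 2.5, 2.5]])
# Points with x1 + x2 >= 3 fall on the "y = 1" side of the boundary.
print((X @ theta >= 0).astype(int))  # -> [0 1]
```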

Non-linear decision boundaries

[Plot: two classes of points in the (x1, x2) plane, with a circular decision boundary passing through ±1 on both axes]

We can learn more complex decision boundaries where the hypothesis function contains higher-order terms (remember polynomial regression).

Non-linear decision boundaries

[Same plot, with the circular boundary through ±1 on both axes]

$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$

Predict "y = 1" if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 \ge 0$; with suitable parameter values the boundary is the unit circle $x_1^2 + x_2^2 = 1$ shown in the plot.

How to get the parameter values – will be discussed soon.
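
A sketch with hypothetical parameters $\theta = (-1, 0, 0, 1, 1)$ (made up for illustration), which give exactly that unit-circle boundary:

```python
import numpy as np

def predict_circle(x1, x2):
    """Hypothetical theta = (-1, 0, 0, 1, 1): predicts y = 1 iff x1^2 + x2^2 >= 1."""
    theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
    features = np.array([1.0, x1, x2, x1**2, x2**2])  # includes higher-order terms
    return int(theta @ features >= 0)

print(predict_circle(0.5, 0.5))  # inside the circle  -> 0
print(predict_circle(1.5, 0.0))  # outside the circle -> 1
```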

Cost function for Logistic Regression

How to get the parameter values?

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$ of m examples, with $y \in \{0, 1\}$ and $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$

How to choose parameters $\theta$?

Cost function

Linear regression used the squared-error cost function:
$J(\theta) = \dfrac{1}{m} \sum_{i=1}^{m} \dfrac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

However, this cost function is non-convex for the hypothesis of logistic regression.

Logistic regression cost function

$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$

The cost is 0 when the prediction matches the label and grows without bound as the prediction approaches the wrong extreme. Since $y$ is always 0 or 1, the two cases can be written in one line:

$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$

$J(\theta) = \dfrac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$

To fit parameters: $\min_\theta J(\theta)$. This cost function is convex.
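
A minimal NumPy sketch of this cost, assuming X carries a leading column of ones:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ],
    with h = sigmoid(X @ theta); X is (m, n+1), y is (m,) of 0/1 labels."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    eps = 1e-12  # guards log() against h being exactly 0 or 1
    return -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m
```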

Gradient Descent

Want: $\min_\theta J(\theta)$

Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$
} (simultaneously update all $\theta_j$)

Substituting the derivative of $J(\theta)$:

Repeat {
  $\theta_j := \theta_j - \dfrac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
} (simultaneously update all $\theta_j$)

The algorithm looks identical to linear regression, but the hypothesis function is different for logistic regression.
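
A minimal sketch of this update loop; the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for logistic regression.
    X is (m, n+1) with a leading column of ones; y is (m,) of 0/1 labels."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta for every example
        grad = X.T @ (h - y) / m                # (1/m) * sum (h - y) * x_j
        theta -= alpha * grad                   # simultaneous update of all theta_j
    return theta
```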

Thus we can use gradient descent to learn the parameter values, and hence compute $h_\theta(x)$ for a new input:

To make a prediction given a new $x$, output
$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$ = estimated probability that y = 1 on input x.
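
Putting the pieces together on a tiny made-up dataset, reusing the gradient_descent sketch above; every number here is invented:

```python
import numpy as np

# Made-up 1-D data: the label is 1 roughly when the feature exceeds 2.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5],
              [1.0, 2.5], [1.0, 3.0], [1.0, 3.5]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = gradient_descent(X, y, alpha=0.5, n_iters=5000)
x_new = np.array([1.0, 2.8])
h = 1.0 / (1.0 + np.exp(-(x_new @ theta)))
print(f"P(y = 1 | x = 2.8) = {h:.2f}")
```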

How to use the estimated probability?
• Refraining from classifying unless confident
• Ranking items
• Multi-class classification

Multi-class classification: one-vs-all

Multiclass classification

News article tagging: Politics, Sports, Movies, Religion, …
Medical diagnosis: Not ill, Cold, Flu, Fever
Weather: Sunny, Cloudy, Rain, Snow

Binary classification: [plot of two classes in the (x1, x2) plane]

Multi-class classification: [plot of three classes in the (x1, x2) plane]

One-vs-all (one-vs-rest):

[Plots: the three-class data set is split into three binary problems in the (x1, x2) plane, one per class, each treating that class as positive and the rest as negative]

Class 1: $h_\theta^{(1)}(x)$
Class 2: $h_\theta^{(2)}(x)$
Class 3: $h_\theta^{(3)}(x)$

One-vs-all

Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.

On a new input $x$, to make a prediction, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$.
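
A sketch of one-vs-all built on the gradient_descent function above, assuming class labels 0 .. n_classes-1:

```python
import numpy as np

def train_one_vs_all(X, y, n_classes, alpha=0.1, n_iters=1000):
    """Fit one binary logistic regression per class: class c against the rest."""
    return np.array([gradient_descent(X, (y == c).astype(float), alpha, n_iters)
                     for c in range(n_classes)])

def predict_one_vs_all(all_theta, X):
    """For each row of X, return the class whose classifier gives the largest h."""
    probs = 1.0 / (1.0 + np.exp(-(X @ all_theta.T)))  # shape (m, n_classes)
    return np.argmax(probs, axis=1)
```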

Advanced Optimization algorithms (not part of this course)

Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS

Advantages of the other algorithms:
- No need to manually pick the learning rate
- Often converge faster than gradient descent

Disadvantages:
- More complex