Stat 306 - happydog · Stat 306: Finding Relationships in Data. Lecture 15 Sections 4.1 and 4.2....


Stat 306: Finding Relationships in Data.

Lecture 15: Sections 4.1 and 4.2


Chapter 4: Variable selection and additional diagnostics

4.1 Variable selection algorithms
4.2 Cross-validation and out-of-sample assessment
4.3 Additional diagnostics
4.4 Transforms and nonlinearity
4.5 Diagnostics for data collected sequentially in time


Four categories of scientific study

                        Observational    Experimental
Goal is Explanation     1.               2.
Goal is Prediction      3.               4.


Goal is Explanation

1. What questions do you want to ask?

2. Define an appropriate model.

3. Define the hypotheses that correspond to the questions of interest.

4. Collect the data.

5. Fit the model as defined earlier.

6. Answer your questions with uncertainty quantification (i.e. with p-values, confidence intervals).


Goal is Prediction

1. What do you want to predict?

2. Define an appropriate metric for evaluating the quality of predictions (e.g. RMSE, absolute prediction error, ROC curve).

3. Collect the data.

4. Separate your data into “train” and “holdout” subsets.

5. Fit many different models to the “train” subset of the data.

6. Pick the model that is “best” (according to your chosen metric) for making predictions on the “holdout” subset of the data.

7. Note that p-values and confidence intervals are not valid.


Goal is Prediction… but you also want some explanations (warning, this is a bit outdated)

1. Collect the data.

2. Select a “model-selection” criterion (e.g. adjusted R² or Cp).

3. Identify all possible regression models with all possible combinations of the predictors.

4. Identify a subset of models that are best in terms of the chosen “model-selection” criterion.

5. Evaluate and refine the models identified in Step 4 by doing residual analyses, transformations, and checks of the model assumptions.

6. Pick a “best” model from the refined subset of models that meets the assumptions and allows you to do some explanations.
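Steps 2 to 4 above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic, the variable names (`x1` to `x4`) and the helper `adjusted_r2` are hypothetical, and adjusted R² is used as the criterion simply because it is one of the two the slide mentions.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y):
    """Adjusted R^2 of the least-squares fit y ~ intercept + columns of X."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    p = A.shape[1]  # number of fitted coefficients, intercept included
    return float(1.0 - (sse / (n - p)) / (sst / (n - 1)))


# Synthetic data (an assumption for illustration): only x1 and x3 matter.
rng = np.random.default_rng(1)
n, names = 100, ["x1", "x2", "x3", "x4"]
X = rng.normal(size=(n, 4))
y = 3.0 + 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)

# Step 3: enumerate every non-empty subset of the predictors (2^4 - 1 = 15).
scores = {}
for k in range(1, len(names) + 1):
    for cols in combinations(range(len(names)), k):
        label = "+".join(names[j] for j in cols)
        scores[label] = adjusted_r2(X[:, list(cols)], y)

# Step 4: keep a short list of the best models for the checks in Step 5.
shortlist = sorted(scores, key=scores.get, reverse=True)[:3]
print(shortlist)
```

The shortlist, not a single winner, is what goes on to the residual analyses in Step 5.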


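4.1 Variable selection algorithms

•  Even with a small number of possible covariates, there are a lot of possible models one could fit: p candidate covariates already give 2^p possible subsets.

•  And think about all the possible interaction terms!

•  This can make fitting and comparing every possible model almost impossible.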


The Cp statistic, a “model-selection” criterion

The Cp statistic and the adjusted R² are very similar
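The formulas on this slide did not survive in the transcript. For reference, the usual textbook definitions are given below; the notation is an assumption, not taken from the slides. Here n is the number of observations, p is the number of coefficients in the candidate model (intercept included), SSE_p is its residual sum of squares, SST is the total sum of squares, and \hat{\sigma}^2 is the mean squared error of the full model with all candidate predictors.

```latex
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p,
\qquad
R^2_{\mathrm{adj}} = 1 - \frac{\mathrm{SSE}_p / (n - p)}{\mathrm{SST} / (n - 1)}
```

Both criteria reward a small SSE_p and penalize the number of coefficients p, which is the sense in which they are similar. Small values of Cp and large values of adjusted R² are preferred.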


1. Forward Selection

- start with one variable, add one variable at a time

2. Backward Elimination

- start with full model (all potential variables), remove one variable at a time
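A minimal sketch of forward selection follows. The slides do not say which criterion decides what to add or when to stop, so this sketch assumes adjusted R² and stops when no addition improves it. It starts from the intercept-only model, so its first pass picks the single best variable. The data and the function names are hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R^2 of the least-squares fit y ~ intercept + columns of X."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    p = A.shape[1]
    return float(1.0 - (sse / (n - p)) / (sst / (n - 1)))


def forward_selection(X, y):
    """Greedy search: add the variable that most improves adjusted R^2,
    and stop as soon as no remaining variable improves it."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = 0.0  # adjusted R^2 of the intercept-only model
    while remaining:
        trial = {j: adjusted_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(trial, key=trial.get)
        if trial[j_best] <= best_score:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = trial[j_best]
    return selected, best_score


# Synthetic data (an assumption): columns 0 and 3 matter, the rest are noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 5))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(size=150)

selected, score = forward_selection(X, y)
print("order of entry:", selected, "adjusted R^2:", round(score, 3))
```

Backward elimination mirrors this: start with every column selected and, at each pass, drop the variable whose removal improves the criterion most.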


4.2 Train/Test

Goal is Prediction


1. What do you want to predict?

2. Define an appropriate metric for evaluating the quality of predictions (e.g. RMSE, absolute prediction error, ROC curve).

3. Collect the data.

4. Separate your data into “train” and “holdout” subsets.

5. Fit many different models to the “train” subset of the data.

6. Pick the model that is “best” (according to your chosen metric) for making predictions on the “holdout” subset of the data.

7. Note that p-values and confidence intervals are not valid.
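The train/holdout recipe above might look like the following in code. This is a sketch under assumptions: the data are synthetic, RMSE is the chosen metric, the 70/30 split is arbitrary, and the helper names (`rmse`, `fit_ols`, `predict_ols`) are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, the metric chosen in step 2."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta


# Synthetic data (an assumption): y depends on x1 and x2; x3 is pure noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Step 4: random split into "train" (70%) and "holdout" (30%).
idx = rng.permutation(n)
train, holdout = idx[:140], idx[140:]

# Step 5: fit several candidate models (column subsets) to "train" only.
candidates = {"x1": [0], "x1+x2": [0, 1], "x1+x2+x3": [0, 1, 2]}
holdout_rmse = {}
for name, cols in candidates.items():
    beta = fit_ols(X[train][:, cols], y[train])
    holdout_rmse[name] = rmse(y[holdout], predict_ols(beta, X[holdout][:, cols]))

# Step 6: pick the model with the smallest holdout RMSE.
best = min(holdout_rmse, key=holdout_rmse.get)
print(holdout_rmse, "->", best)
```

The holdout rows are never used for fitting, which is what makes the comparison an out-of-sample one.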


4.2 Cross-validation

Goal is Prediction


1. What do you want to predict?

2. Define an appropriate metric for evaluating the quality of predictions (e.g. RMSE, absolute prediction error, ROC curve).

3. Collect the data.

4. Separate your data into K random subsets.

5. For k in 1:K
- Fit your model using all the data except the kth subset.
- Calculate the metric (e.g. prediction error) by using the fitted model to predict the kth subset of the data.

6. Calculate the average of the K metrics for each model.

7. Choose the “best model” based on the averaged metric.

8. Note that p-values and confidence intervals are not valid.
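The K-fold loop in steps 4 to 7 can be sketched as follows. The assumptions here are synthetic data, K = 5, mean absolute prediction error as the metric, and linear models fitted by least squares. The function `kfold_mae` and the candidate names are hypothetical.

```python
import numpy as np


def kfold_mae(X, y, cols, K=5, seed=0):
    """Average over K folds of the mean absolute prediction error for the
    linear model that uses the predictor columns listed in `cols`."""
    n = len(y)
    # Step 4: K random subsets (the same seed gives every model the same folds).
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(K) if i != k])
        # Step 5a: fit on everything except the kth subset.
        A_train = np.column_stack([np.ones(len(train)), X[train][:, cols]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Step 5b: predict the kth subset and record the metric.
        A_test = np.column_stack([np.ones(len(test)), X[test][:, cols]])
        errors.append(np.mean(np.abs(y[test] - A_test @ beta)))
    # Step 6: average the K metrics.
    return float(np.mean(errors))


# Synthetic data (an assumption): x1 and x2 matter, x3 is noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

candidates = {"x1": [0], "x1+x2": [0, 1], "x1+x2+x3": [0, 1, 2]}
cv = {name: kfold_mae(X, y, cols) for name, cols in candidates.items()}

# Step 7: choose the model with the smallest averaged metric.
best = min(cv, key=cv.get)
print(cv, "->", best)
```

Reusing the same folds for every candidate keeps the comparison between models from depending on how the data happened to be split.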


For each model, we do 5-fold CV:

Metric: Mean Absolute Prediction Error

Per-fold values: 12, 8, 6, 9, 5

K-averaged metric = (12 + 8 + 6 + 9 + 5) / 5 = 40/5 = 8

Source: http://blog.goldenhelix.com/goldenadmin/cross-validation-for-genomic-prediction-in-svs/


4.2 Leave-one-out

Goal is Prediction
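Leave-one-out is K-fold cross-validation with K equal to the number of observations, so each observation is predicted by a model fitted to all the others. The sketch below is an illustration on synthetic data with hypothetical function names. The second function uses a standard least-squares identity that is not in the slides: the leave-one-out residual equals e_i / (1 - h_ii), where h_ii is the leverage of observation i.

```python
import numpy as np


def loocv_mae_loop(A, y):
    """Leave-one-out mean absolute prediction error by refitting n times.
    A is the design matrix, including the column of ones."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errs[i] = abs(y[i] - A[i] @ beta)
    return float(errs.mean())


def loocv_mae_shortcut(A, y):
    """Same quantity from a single fit, via e_i / (1 - h_ii)."""
    H = A @ np.linalg.pinv(A)  # hat matrix of the least-squares fit
    resid = y - H @ y
    return float(np.mean(np.abs(resid / (1.0 - np.diag(H)))))


# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(4)
n = 40
X = rng.normal(size=(n, 2))
y = 0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)
A = np.column_stack([np.ones(n), X])

loop_value = loocv_mae_loop(A, y)
shortcut_value = loocv_mae_shortcut(A, y)
print(loop_value, shortcut_value)
```

The two functions should agree up to rounding error. The shortcut only applies to models fitted by ordinary least squares, while the loop works for any fitting procedure.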