Object Orie’d Data Analysis, Last Time Organizational Matters .
![Page 1: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/1.jpg)
Object Orie’d Data Analysis, Last Time
• Kernel Embedding
– Embed data in higher dimensional manifold
– Gives greater flexibility to linear methods
• Support Vector Machines
– Aimed at very non-Gaussian Data
– E.g. from Kernel Embedding
• Distance Weighted Discrimination
– HDLSS Improvement of SVM
![Page 2: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/2.jpg)
Support Vector Machines
Graphical View, using Toy Example:
• Find separating plane
• To maximize distances from data to plane
• In particular smallest distance
• Data points closest are called support vectors
• Gap between is called margin
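The quantities on this slide can be sketched in a few lines of Python. This is only an illustration: the toy points and the two candidate planes are made up, and nothing is optimized. For a given plane it computes each point's distance, the smallest distance, and which points attain it (the support vectors).

```python
import math

def signed_distances(points, labels, w, b):
    """Distance y_i * (w.x_i + b) / ||w|| of each point to the plane w.x + b = 0.

    Positive values mean the point lies on the side matching its label.
    """
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [y * (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm
            for x, y in zip(points, labels)]

def margin_and_support(points, labels, w, b, tol=1e-9):
    """Return the smallest distance and the indices of the closest points."""
    dists = signed_distances(points, labels, w, b)
    smallest = min(dists)
    support = [i for i, d in enumerate(dists) if d - smallest <= tol]
    return smallest, support

# Hypothetical toy data: class +1 on the right, class -1 on the left.
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-4.0, -1.0)]
labels = [1, 1, -1, -1]

# Two candidate planes (chosen by hand, not fitted).
vertical = margin_and_support(points, labels, w=(1.0, 0.0), b=0.0)
tilted = margin_and_support(points, labels, w=(1.0, 1.0), b=0.0)
print("vertical plane:", vertical)
print("tilted plane:  ", tilted)
```

Of these two candidates the vertical plane has the larger smallest distance, so it is the one the maximization on this slide would prefer.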
![Page 3: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/3.jpg)
Support Vector Machines
Graphical View, using Toy Example:
![Page 4: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/4.jpg)
Support Vector Machines
Forgotten last time,
Important Extension:
Multi-Class SVMs
Hsu & Lin (2002)
Lee, Lin, & Wahba (2002)
• Defined for “implicit” version
• “Direction Based” variation???
![Page 5: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/5.jpg)
Support Vector Machines
Also forgotten last time,
Toy examples illustrating
Explicit vs. Implicit Kernel Embedding
As well as effect of window width, σ, on Gaussian kernel embedding
![Page 6: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/6.jpg)
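SVMs, Comput’n & Embedding
For an “Embedding Map”, $\Phi(x)$,
e.g. $\Phi : x \mapsto (x, x^2)$
Explicit Embedding:
Maximize:
$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle \Phi(x_i), \Phi(x_j) \rangle$$
Get classification function:
$$f(x) = \sum_{i=1}^{n} \alpha_i y_i \langle \Phi(x_i), \Phi(x) \rangle + b$$
• Straightforward application of embedding
• But loses inner product advantage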
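A small Python sketch of explicit embedding, assuming the example map sends x to (x, x²). The toy data and the separating line are hypothetical and chosen by eye, not fitted by an SVM. The point is that data which no single threshold on x can separate becomes linearly separable after embedding.

```python
def embed(x):
    """Explicit embedding map Phi(x) = (x, x^2), taking 1-d data to 2-d."""
    return (x, x * x)

def linear_rule(z, w, b):
    """Sign of the linear classification function w.z + b in embedded space."""
    value = sum(wj * zj for wj, zj in zip(w, z)) + b
    return 1 if value > 0 else -1

# Hypothetical toy data: class -1 sits between two clumps of class +1,
# so no single threshold on x separates the classes.
xs = [-3.0, -2.5, -0.5, 0.0, 0.4, 2.2, 3.1]
ys = [1, 1, -1, -1, -1, 1, 1]

# In embedded space the horizontal line x^2 = 2 separates them
# (w and b are picked by hand for illustration).
w, b = (0.0, 1.0), -2.0
predictions = [linear_rule(embed(x), w, b) for x in xs]
print(predictions == ys)
```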
![Page 7: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/7.jpg)
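SVMs, Comput’n & Embedding
Implicit Embedding:
Maximize:
$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \Phi(\langle x_i, x_j \rangle)$$
Get classification function:
$$f(x) = \sum_{i=1}^{n} \alpha_i y_i \Phi(\langle x_i, x \rangle) + b$$
• Still defined only via inner products
• Retains optimization advantage
• Thus used very commonly
• Comparison to explicit embedding?
• Which is “better”???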
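One way to see the inner-product point, sketched in Python under the assumption that the example map is Φ(x) = (x, x²). For that map the embedded inner product equals a function of the original inner product s = xz, namely s + s². The helper name `kappa` and the coefficient values are hypothetical, used only to show that both routes give the same classification function.

```python
import math

def embed(x):
    """Assumed example map Phi(x) = (x, x^2)."""
    return (x, x * x)

def explicit_inner(x, z):
    """Inner product computed after explicitly embedding both points."""
    px, pz = embed(x), embed(z)
    return px[0] * pz[0] + px[1] * pz[1]

def kappa(s):
    """Function of the original inner product that reproduces the embedding."""
    return s + s * s

def implicit_inner(x, z):
    """Same quantity, computed from the 1-d inner product x*z only."""
    return kappa(x * z)

def classify(x, alphas, ys, xs, b, inner):
    """f(x) = sum_i alpha_i * y_i * inner(x_i, x) + b."""
    return sum(a * y * inner(xi, x) for a, y, xi in zip(alphas, ys, xs)) + b

# Hypothetical training points, labels, and coefficients (not an SVM fit).
xs = [-2.0, -0.5, 0.5, 2.0]
ys = [1, -1, -1, 1]
alphas = [0.3, 0.3, 0.3, 0.3]
b = -0.5

new_points = (-3.0, 0.1, 1.7)
f_explicit = [classify(x, alphas, ys, xs, b, explicit_inner) for x in new_points]
f_implicit = [classify(x, alphas, ys, xs, b, implicit_inner) for x in new_points]
print(f_explicit)
print(f_implicit)
```

The implicit route never forms Φ(x), which is what makes it practical when the embedded space is very high dimensional.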
![Page 8: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/8.jpg)
Support Vector Machines
Target Toy Data set:
![Page 9: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/9.jpg)
Support Vector Machines
Explicit Embedding, window σ = 0.1:
![Page 10: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/10.jpg)
Support Vector Machines
Explicit Embedding, window σ = 1:
![Page 11: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/11.jpg)
Support Vector Machines
Explicit Embedding, window σ = 10:
![Page 12: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/12.jpg)
Support Vector Machines
Explicit Embedding, window σ = 100:
![Page 13: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/13.jpg)
Support Vector Machines
Notes on Explicit Embedding:
• Window σ too small: poor generalizability
• Window σ too big: miss important regions
• Classical lessons from kernel smoothing
• Surprisingly large “reasonable region”
• I.e. parameter less critical (sometimes?)
Also explore projections (in kernel space)
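The effect of the window width can be previewed with a small Python sketch. It assumes the Gaussian kernel is scaled as exp(-||x - z||² / (2σ²)); the slides do not state the scaling, and the three points are made up. With a very small σ each point is similar only to itself; with a very large σ all points look nearly identical.

```python
import math

def gaussian_kernel(x, z, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2)); the scaling is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, sigma):
    """Matrix of kernel values between all pairs of points."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]

# Hypothetical points; the window values match those shown on the slides.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
for sigma in (0.1, 1.0, 10.0, 100.0):
    print("sigma =", sigma)
    for row in gram_matrix(points, sigma):
        print("  ", ["%.4f" % v for v in row])
```

At σ = 0.1 the matrix is essentially the identity, and at σ = 100 it is essentially all ones, which gives one reading of the two extremes in the notes above.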
![Page 14: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/14.jpg)
Support Vector Machines
Kernel space projection, window σ = 0.1:
![Page 15: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/15.jpg)
Support Vector Machines
Kernel space projection, window σ = 1:
![Page 16: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/16.jpg)
Support Vector Machines
Kernel space projection, window σ = 10:
![Page 17: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/17.jpg)
Support Vector Machines
Kernel space projection, window σ = 100:
![Page 18: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/18.jpg)
Support Vector Machines
Kernel space projection, window σ = 100:
![Page 19: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/19.jpg)
Support Vector Machines
Notes on Kernel space projection:
• Window σ too small:
– Great separation
– But recall, poor generalizability
• Window σ too big: no longer separable
• As above:
– Classical lessons from kernel smoothing
– Surprisingly large “reasonable region”
– I.e. parameter less critical (sometimes?)
Also explore projections (in kernel space)
![Page 20: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/20.jpg)
Support Vector Machines
Implicit Embedding, window σ = 0.1:
![Page 21: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/21.jpg)
Support Vector Machines
Implicit Embedding, window σ = 0.5:
![Page 22: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/22.jpg)
Support Vector Machines
Implicit Embedding, window σ = 1:
![Page 23: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/23.jpg)
Support Vector Machines
Implicit Embedding, window σ = 10:
![Page 24: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/24.jpg)
Support Vector Machines
Notes on Implicit Embedding:
• Similar Large vs. Small lessons
• Range of “reasonable results” seems to be smaller (note different range of windows)
• Much different “edge” behavior
Interesting topic for future work…
![Page 25: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/25.jpg)
Distance Weighted Discrim’n
2-d Visualization:
Pushes Plane Away From Data
All Points Have Some Influence
$$\min_{w,\beta} \sum_{i=1}^{n} \frac{1}{r_i}$$
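A simplified Python sketch of the criterion above, assuming separable data and a fixed candidate plane. The full DWD optimization (norm constraint, penalty terms for points on the wrong side) is not reproduced here, and the toy points are hypothetical. It contrasts the sum of reciprocal distances with the smallest distance that drives the SVM.

```python
import math

def distances(points, labels, w, b):
    """Distances r_i = y_i * (w.x_i + b) / ||w|| from each point to the plane."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [y * (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm
            for x, y in zip(points, labels)]

def dwd_criterion(points, labels, w, b):
    """Sum of 1 / r_i; assumes every point is on its correct side (r_i > 0)."""
    r = distances(points, labels, w, b)
    if min(r) <= 0:
        raise ValueError("this simplified sketch needs separable data")
    return sum(1.0 / ri for ri in r)

def smallest_distance(points, labels, w, b):
    """The quantity a maximum-margin (SVM) plane depends on."""
    return min(distances(points, labels, w, b))

# Hypothetical toy data and plane x1 = 0.
labels = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0
original = [(1.0, 0.0), (5.0, 0.0), (-1.0, 0.0), (-5.0, 0.0)]
moved = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0), (-5.0, 0.0)]  # far point moved closer

print(dwd_criterion(original, labels, w, b), smallest_distance(original, labels, w, b))
print(dwd_criterion(moved, labels, w, b), smallest_distance(moved, labels, w, b))
```

Moving a point that is not closest to the plane changes the DWD criterion but leaves the smallest distance untouched, which is the sense in which all points have some influence.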
![Page 26: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/26.jpg)
Distance Weighted Discrim’n
References for more on DWD:
• Current paper: Marron, Todd and Ahn (2007)
• Links to more papers: Ahn (2007)
• JAVA Implementation of DWD: caBIG (2006)
• SDPT3 Software: Toh (2007)
![Page 27: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/27.jpg)
Batch and Source Adjustment
Recall from Class Notes 8/28/07
• For Stanford Breast Cancer Data (C. Perou)
• Analysis in Benito, et al (2004) Bioinformatics, 20, 105-114. https://genome.unc.edu/pubsup/dwd/
• Adjust for Source Effects
– Different sources of mRNA
• Adjust for Batch Effects
– Arrays fabricated at different times
![Page 28: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/28.jpg)
Source Batch Adj: Biological Class Col. & Symbols
![Page 29: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/29.jpg)
Source Batch Adj: Source Colors
![Page 30: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/30.jpg)
Source Batch Adj: PC 1-3 & DWD direction
![Page 31: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/31.jpg)
Source Batch Adj: DWD Source Adjustment
![Page 32: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/32.jpg)
Source Batch Adj: Source Adj’d, PCA view
![Page 33: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/33.jpg)
Source Batch Adj: S. & B Adj’d, Adj’d PCA
![Page 34: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/34.jpg)
UNC, Stat & OR
Why not adjust using SVM?
Major Problem: Proj’d Distrib’al Shape
Triangular Dist’ns (opposite skewed)
Does not allow sensible rigid shift
![Page 35: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/35.jpg)
Why not adjust using SVM?
Nicely Fixed by DWD
Projected Dist’ns near Gaussian
Sensible to shift
![Page 36: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/36.jpg)
Why not adjust by means?
DWD is complicated: value added?
Xuxin Liu example…
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
(although still not perfect)
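The way differing subtype ratios trip up mean adjustment can be shown with a short Python calculation. All numbers are made up for illustration: one gene, two subtypes with expression levels 0 and 10, and a true batch shift of 1. DWD itself is not implemented here.

```python
def mean(values):
    return sum(values) / len(values)

def make_batch(n_a, n_b, shift, level_a=0.0, level_b=10.0):
    """One gene measured on n_a cases of subtype A and n_b of subtype B.

    'shift' is the batch effect added to every measurement (hypothetical values).
    """
    return [level_a + shift] * n_a + [level_b + shift] * n_b

true_shift = 1.0

# Same subtype ratio in both batches: the mean difference recovers the shift.
balanced = mean(make_batch(5, 5, true_shift)) - mean(make_batch(5, 5, 0.0))

# Different subtype ratios: the mean difference mixes biology with batch effect.
batch1 = make_batch(8, 2, 0.0)
batch2 = make_batch(2, 8, true_shift)
estimated = mean(batch2) - mean(batch1)

print("true shift:", true_shift)
print("mean-based estimate, equal ratios:    ", balanced)
print("mean-based estimate, differing ratios:", estimated)
```

With differing ratios the mean-based estimate is 7 rather than 1, so subtracting it would erase part of the biological difference along with the batch effect.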
![Page 37: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/37.jpg)
Why not adjust by means?
Next time:
Work in before and after, slides like
138-141 from DWDnormPreso.ppt
In Research/Bioinf/caBIG
![Page 38: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/38.jpg)
Twiddle ratios of subtypes
![Page 39: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/39.jpg)
Why not adjust by means?
DWD robust against non-proportional subtypes…
Mathematical Statistical Question:
Are there mathematics behind this?
(will answer next time…)
![Page 40: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/40.jpg)
DWD in Face Recognition
Face Images as Data
(with M. Benito & D. Peña)
Male – Female Difference?
Discrimination Rule?
Represented as long vector of pixel gray levels
Registration is critical
![Page 41: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/41.jpg)
DWD in Face Recognition, (cont.)
Registered Data
Shifts and scale
Manually chosen
To align eyes and mouth
Still large variation
See males vs. females???
![Page 42: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/42.jpg)
DWD in Face Recognition, (cont.)
DWD Direction
Good separation
Images “make sense”
Garbage at ends? (extrapolation effects?)
![Page 43: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/43.jpg)
DWD in Face Recognition, (cont.)
Unregistered Version
Much blurrier, since features don’t properly line up
Nonlinear Variation
But DWD still works
Can see M-F difference?
![Page 44: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/44.jpg)
DWD in Face Recognition, (cont.)
Interesting summary:
Jump between means (in DWD direction)
Clear separation of Maleness vs. Femaleness
![Page 45: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/45.jpg)
DWD in Face Recognition, (cont.)
Fun Comparison:
Jump between means (in SVM direction)
Also distinguishes Maleness vs. Femaleness
But not as well as DWD
![Page 46: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/46.jpg)
DWD in Face Recognition, (cont.)
Analysis of difference: Project onto normals
SVM has “small gap” (feels noise artifacts?)
DWD “more informative” (feels real structure?)
![Page 47: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/47.jpg)
DWD in Face Recognition, (cont.)
Current Work:
Focus on “drivers”:
(regions of interest)
Relation to Discr’n?
Which is “best”?
Lessons for human perception?
![Page 48: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/48.jpg)
Outcomes Data
Breast Cancer Study (C. M. Perou):
• Outcome of interest = death or survival
• Connection with gene expression?
Approach:
• Treat death vs. survival during study as “classes”
• Find “direction that best separates the classes”
![Page 49: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/49.jpg)
Outcomes Data
Find “direction that best separates the classes”
![Page 50: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/50.jpg)
Outcomes Data
Find “direction that best separates the classes”
![Page 51: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/51.jpg)
Outcomes Data
Find “direction that best separates classes”
• DWD Projection
• SVM Projection
Notes:
• SVM is “better separated”? (recall “data piling” problems….)
• DWD gives “more spread between sub-populations”??? (perhaps “more stable”?)
![Page 52: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/52.jpg)
Outcomes Data
Which is “better”?
Approach:
• Find “genes of interest”
• To maximize loadings of direction vectors
(reflects pointing in gene direction)
• Show intensity plot (of gene expression)
• Using top 20 genes in each direction
![Page 53: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/53.jpg)
Outcomes Data
Which is “better”?
• Study with gene intensity plot
• Order cases by DWD score (proj’n)
• Order genes by DWD loading (vec. entry)
• Reduce to top & bottom 20
• Color map shows gene expression
• Shows genes that drive classification
• Gene names also available
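The ordering steps for the gene intensity plot can be sketched in Python. The direction vector, gene names, and cases below are hypothetical stand-ins (five genes and k = 2 instead of 20), and the direction is given rather than computed by DWD.

```python
def scores(cases, direction):
    """Projection (score) of each case onto the direction vector."""
    return [sum(x * w for x, w in zip(case, direction)) for case in cases]

def order_cases(cases, direction):
    """Indices of the cases sorted by score, smallest first."""
    s = scores(cases, direction)
    return sorted(range(len(cases)), key=lambda i: s[i])

def top_and_bottom(loadings, names, k):
    """Names of the k genes with the largest and the k with the smallest loadings."""
    order = sorted(range(len(loadings)), key=lambda j: loadings[j])
    bottom = [names[j] for j in order[:k]]
    top = [names[j] for j in order[-k:][::-1]]
    return top, bottom

# Hypothetical direction vector (one entry per gene) and gene names.
loadings = [0.1, -0.7, 0.6, 0.0, -0.2]
names = ["g1", "g2", "g3", "g4", "g5"]
top, bottom = top_and_bottom(loadings, names, k=2)

# Hypothetical cases (rows = cases, columns = genes).
cases = [(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0)]
case_order = order_cases(cases, loadings)

print("top genes:   ", top)
print("bottom genes:", bottom)
print("case order:  ", case_order)
```

A color map of expression restricted to these genes, with cases in this order, would then give the kind of plot the slide describes.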
![Page 54: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/54.jpg)
Outcomes Data
Which is “better”? DWD direction
![Page 55: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/55.jpg)
Outcomes Data
Which is “better”? SVM direction
![Page 56: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/56.jpg)
Outcomes Data
Which is “better”?
• DWD finds genes showing better separation
• SVM genes are less informative
![Page 57: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/57.jpg)
Outcomes Data
How about Centroid (Mean Diff’nce) Method?
![Page 58: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/58.jpg)
Outcomes Data
How about Centroid (Mean Diff’nce) Method?
![Page 59: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/59.jpg)
Outcomes Data
Compare to DWD direction
![Page 60: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/60.jpg)
Outcomes Data
How about Centroid (Mean Diff’nce) Method?
• Best yet, in terms of red – green plot?
• Projections unacceptably mixed?
• These are two different goals…
• Try for trade-off?
Scale space approach???
• Interesting philosophical point:
Very simple things often “best”
![Page 61: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/61.jpg)
Outcomes Data
Weakness of above analysis:
• Some with “genes prone to disease” have not died yet
• Perhaps can see in DWD expression plot?
Better analysis:
• More sophisticated survival methods
• Work in progress w/ Brent Johnson, Danyu Li, Helen Zhang
![Page 62: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/62.jpg)
Distance Weighted Discrim’n
2-d Visualization:
Pushes Plane Away From Data
All Points Have Some Influence
$$\min_{w,\beta} \sum_{i=1}^{n} \frac{1}{r_i}$$
![Page 63: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/63.jpg)
Distance Weighted Discrim’n
Maximal Data Piling
![Page 64: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/64.jpg)
HDLSS Discrim’n Simulations
Main idea:
Comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions
![Page 65: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/65.jpg)
HDLSS Discrim’n Simulations
Overall Approach:
• Study different known phenomena
– Spherical Gaussians
– Outliers
– Polynomial Embedding
• Common Sample Sizes: $n_+ = n_- = 25$
• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$
![Page 66: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/66.jpg)
HDLSS Discrim’n Simulations
Spherical Gaussians:
![Page 67: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/67.jpg)
HDLSS Discrim’n Simulations
Spherical Gaussians:
• Same setup as before
• Means shifted in dim 1 only, $\mu_1 = 2.2$
• All methods pretty good
• Harder problem for higher dimension
• SVM noticeably worse
• MD best (Likelihood method)
• DWD very close to MD
• Methods converge for higher dimension??
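A rough Python sketch of the spherical Gaussian setting for the MD (mean difference) rule only; SVM and DWD are not implemented. It assumes the class means sit at +2.2 and -2.2 in dimension 1 with unit variance everywhere, which is one reading of the slide, and it uses a single replication with a small hypothetical test set, so the error rates are noisy.

```python
import random

def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sample_class(rng, n, d, mu1):
    """n points from a spherical Gaussian in d dims: mean mu1 in dim 1, else 0."""
    return [[rng.gauss(mu1 if j == 0 else 0.0, 1.0) for j in range(d)]
            for _ in range(n)]

def md_error_rate(rng, d, n_train=25, n_test=100, mu1=2.2):
    """Test error of the mean difference rule trained on n_train points per class.

    Class means at +mu1 and -mu1 in dim 1 are an assumption about the setup.
    """
    plus = sample_class(rng, n_train, d, +mu1)
    minus = sample_class(rng, n_train, d, -mu1)
    m_plus, m_minus = mean_vector(plus), mean_vector(minus)
    w = [a - c for a, c in zip(m_plus, m_minus)]            # MD direction
    mid = [(a + c) / 2.0 for a, c in zip(m_plus, m_minus)]  # midpoint of means
    errors = 0
    for label in (+1, -1):
        for x in sample_class(rng, n_test, d, label * mu1):
            score = sum(wj * (xj - mj) for wj, xj, mj in zip(w, x, mid))
            if score * label <= 0:
                errors += 1
    return errors / (2.0 * n_test)

rng = random.Random(0)  # fixed seed so the run is repeatable
error_by_dim = {d: md_error_rate(rng, d) for d in (10, 40, 100, 400, 1600)}
for d in sorted(error_by_dim):
    print(d, error_by_dim[d])
```

Even this single run should show the problem getting harder as d grows, since the estimated mean difference picks up noise in every added dimension.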
![Page 68: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/68.jpg)
HDLSS Discrim’n Simulations
Outlier Mixture:
![Page 69: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/69.jpg)
HDLSS Discrim’n Simulations
Outlier Mixture:
80% dim. 1 $\mu_1 = 2.2$, other dims 0
20% dim. 1 ±100, dim. 2 ±500, others 0
• MD is a disaster, driven by outliers
• SVM & DWD are both very robust
• SVM is best
• DWD very close to SVM (insig’t difference)
• Methods converge for higher dimension??
Ignore RLR (a mistake)
![Page 70: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/70.jpg)
HDLSS Discrim’n Simulations
Wobble Mixture:
![Page 71: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/71.jpg)
HDLSS Discrim’n Simulations
Wobble Mixture:
80% dim. 1 $\mu_1 = 2.2$, other dims 0
20% dim. 1 ±0.1, rand dim ±100, others 0
• MD still very bad, driven by outliers
• SVM & DWD are both very robust
• SVM loses (affected by margin push)
• DWD slightly better (by w’ted influence)
• Methods converge for higher dimension??
Ignore RLR (a mistake)
![Page 72: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/72.jpg)
HDLSS Discrim’n Simulations
Nested Spheres:
![Page 73: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/73.jpg)
HDLSS Discrim’n Simulations
Nested Spheres:
1st d/2 dim’s, Gaussian with var 1 or C
2nd d/2 dim’s, the squares of the 1st dim’s
(as for 2nd degree polynomial embedding)
• Each method best somewhere
• MD best in highest d (data non-Gaussian)
• Methods not comparable (realistic)
• Methods converge for higher dimension??
• HDLSS space is a strange place
Ignore RLR (a mistake)
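A possible data generator for the nested spheres setting, sketched in Python from the description above. The value C = 4, the zero means, and the assignment of variance 1 to class +1 and variance C to class -1 are assumptions made only for illustration.

```python
import random

def nested_spheres_point(rng, d, variance):
    """One point: first d/2 coordinates Gaussian, last d/2 their squares."""
    half = d // 2
    sd = variance ** 0.5
    first = [rng.gauss(0.0, sd) for _ in range(half)]
    second = [v * v for v in first]  # as in a 2nd degree polynomial embedding
    return first + second

def nested_spheres_sample(rng, n_per_class, d, c):
    """Class +1 uses variance 1, class -1 uses variance c (assumed labeling)."""
    if d % 2 != 0:
        raise ValueError("d must be even")
    points, labels = [], []
    for label, variance in ((+1, 1.0), (-1, c)):
        for _ in range(n_per_class):
            points.append(nested_spheres_point(rng, d, variance))
            labels.append(label)
    return points, labels

rng = random.Random(1)
points, labels = nested_spheres_sample(rng, n_per_class=25, d=10, c=4.0)
print(len(points), len(points[0]), labels.count(1), labels.count(-1))
```

In the first d/2 coordinates the two classes differ only in spread, so no plane separates them there; the squared coordinates are what give linear methods something to work with.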
![Page 74: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/74.jpg)
HDLSS Discrim’n Simulations
Conclusions:
• Everything (sensible) is best sometimes
• DWD often very near best
• MD weak beyond Gaussian
Caution about simulations (and examples):
• Very easy to cherry pick best ones
• Good practice in Machine Learning
– “Ignore method proposed, but read paper for useful comparison of others”
![Page 75: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/75.jpg)
HDLSS Discrim’n Simulations
Caution: There are additional players
E.g. Regularized Logistic Regression
also looks very competitive
Interesting Phenomenon:
All methods come together
in very high dimensions???
![Page 76: Object Orie’d Data Analysis, Last Time](https://reader030.fdocuments.net/reader030/viewer/2022032606/56812f58550346895d94e68f/html5/thumbnails/76.jpg)
HDLSS Discrim’n Simulations
Can we say more about:
All methods come together
in very high dimensions???
Mathematical Statistical Question:
Mathematics behind this???
(will answer next time)