Sorin Alexe RUTCOR, Rutgers University, Piscataway, NJ e-mail: [email protected]
Sorin Alexe RUTCOR, Rutgers University, Piscataway, NJ e-mail: [email protected] URL: rutcor.rutgers.edu/~salexe
Datascope - a new tool for Logical Analysis of Data (LAD)
DIMACS Mixer Series, September 19, 2002
LAD - Problem
[Diagram labels: Dataset, Hidden Function, LAD Approximation]
LAD - Patterns
Positive Pattern Negative Pattern
LAD - Theories, Models, Classifications
Positive Theory Negative Theory
Model
Datascope Functions
Support Set Identification
Space Discretization
Pattern Detection
Model Construction
Discriminant / Prognostic Index
Classification
Feature Analysis
Datascope Dataflow
[Dataflow diagram. Processing stages: Pre-Processing, Discretization, Pandect Generation, Theories/Models, Discriminant Construction. Other labels: Raw Data, Significant Features, Cutpoints, Support Set, Feature Analysis, Pattern Space, Pattern Report, Diagnosis, Prognosis, Risk Stratification, User Excel Model, Matlab Solver, Internal Solver]
1. Support Set Identification
Selects Small Subset of Significant Features
Preserves Hidden Knowledge
Feature Ranking Criteria:
Statistical Correlation with Outcome
Combinatorial Entropy
Distribution Monotonicity
Class Separation
Envelope Eccentricity
E.g., 10 proteins selected out of 15,144
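As a rough illustration of the first ranking criterion (statistical correlation with outcome), the sketch below ranks features by absolute Pearson correlation with a binary outcome and keeps the top k. The function names, the toy data and the value of k are assumptions made for illustration; this is not Datascope's actual procedure, and the other criteria listed above are not covered.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rank_features_by_correlation(rows, outcome, k):
    """Return the indices of the k features with the largest |correlation| with the outcome."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, outcome)), j))
    # Sort by decreasing score; ties are broken by feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]

# Hypothetical toy data: 6 observations, 3 features, binary outcome.
rows = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.1],
    [7.0, 5.0, 0.8],
    [8.0, 5.0, 0.2],
    [9.0, 5.0, 0.7],
]
outcome = [0, 0, 0, 1, 1, 1]
support_set = rank_features_by_correlation(rows, outcome, k=2)
```

Here feature 1 is constant and carries no information, so it is ranked last and excluded from the selected subset.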
Data
Spreadsheet Oriented: OLE (via Clipboard) / Excel Spreadsheet / dBase tables
Training / Test Generation: Bootstrap, k-Folding, Jackknife
New Features
Correlation
Data: Training/Test
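The three training/test generation schemes named in the Data slide (bootstrap, k-folding, jackknife) can be sketched on observation indices as follows. This is a minimal sketch under common textbook definitions (jackknife taken to mean leave-one-out, bootstrap test set taken to be the unsampled observations); the function names and seeds are hypothetical and the slides do not say how Datascope implements these schemes.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test

def jackknife_splits(n):
    """Yield leave-one-out (train, test) index lists."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def bootstrap_split(n, seed=0):
    """Sample n training indices with replacement; unsampled indices form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [j for j in range(n) if j not in chosen]
    return train, test

folds = list(k_fold_splits(10, 5))
loo = list(jackknife_splits(4))
boot_train, boot_test = bootstrap_split(10, seed=3)
```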
2. Space Discretization
Criteria:
Entropy
Correlation with Output
Bins (equipartitioning)
Intervals
Clustered
Class Separation
Parameter Choice: User Defined, Minimizing Support Set
Quality Measures: Entropy, Separability
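To make the entropy criterion concrete, the sketch below picks, for one numerical feature, the cutpoint (a midpoint between consecutive distinct values) that minimizes the size-weighted class entropy of the two resulting sides, and then binarizes the feature. This is a standard entropy-based discretization step offered as an assumption about what the criterion means; the function names and data are hypothetical.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * log2(p)
    return h

def best_entropy_cutpoint(values, labels):
    """Return (cutpoint, weighted_entropy) for the midpoint minimizing class entropy after the split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cutpoint between equal values
        cut = (lo + hi) / 2
        left = [c for v, c in pairs if v < cut]
        right = [c for v, c in pairs if v >= cut]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or h < best[1]:
            best = (cut, h)
    return best

# Hypothetical feature values with binary class labels.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1, 1]
cut, h = best_entropy_cutpoint(values, labels)
binary_feature = [int(v >= cut) for v in values]
```

On this toy feature the two classes are perfectly separated, so the chosen cutpoint leaves zero entropy on both sides.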
[Figure panels: Entropy, Correlation with Output, Bins, Intervals, Clustered, Class Separation]
3. Generation of Maximal Patterns
Pattern Type Selection:
Prime
Cones
Intervals
Spanned
Parameter Bound Settings:
Prevalence: % of positive observations, % of negative observations
Homogeneity: on positive patterns, on negative patterns
Degree
Post-Generation Filters:
By Characteristics
Maximality
Strongness
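The parameter bounds above refer to pattern characteristics that are easy to compute once a pattern is fixed. The sketch below assumes the usual LAD meanings: degree is the number of conditions in the pattern, prevalence is the share of a class covered by the pattern, and homogeneity is the share of covered observations that belong to the pattern's own class. The interval representation (a dict from feature index to bounds) and all names are hypothetical.

```python
def covers(pattern, x):
    """True if observation x satisfies every interval condition of the pattern."""
    return all(lo <= x[j] <= hi for j, (lo, hi) in pattern.items())

def pattern_stats(pattern, positives, negatives):
    """Degree, prevalences and homogeneity of an interval pattern treated as a positive pattern."""
    pos_cov = sum(covers(pattern, x) for x in positives)
    neg_cov = sum(covers(pattern, x) for x in negatives)
    total = pos_cov + neg_cov
    return {
        "degree": len(pattern),
        "positive_prevalence": pos_cov / len(positives),
        "negative_prevalence": neg_cov / len(negatives),
        "homogeneity": pos_cov / total if total else 0.0,
    }

# Hypothetical observations with two features each.
positives = [(7.0, 0.8), (8.0, 0.2), (9.0, 0.7), (4.0, 0.9)]
negatives = [(1.0, 0.3), (2.0, 0.9), (3.0, 0.1), (6.5, 0.4)]
pattern = {0: (6.0, float("inf"))}  # single condition: x0 >= 6
stats = pattern_stats(pattern, positives, negatives)
```

A generator could then keep only patterns whose statistics satisfy user-chosen bounds, for example a minimum positive prevalence and a minimum homogeneity.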
Positive Patterns
[Table columns: Pattern Definition, Training Set, Test Set]
Negative Patterns
[Table columns: Pattern Definition, Training Set, Test Set]
4. Theories and Models
Pandect
Theory Selection via:
Greedy
Bottleneck Greedy
Lexicographic Greedy
Set Covering Heuristics
Model Selection:
2 Set-Covering Problems
Quadratic Set-Covering Problem
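The plain greedy option for theory selection can be read as the classical greedy set-covering heuristic: repeatedly pick the pattern that covers the most still-uncovered observations. The sketch below shows that heuristic only; the bottleneck and lexicographic variants are not covered, and the pattern names and coverage sets are hypothetical.

```python
def greedy_cover(coverage, universe):
    """Greedy set cover; coverage maps a pattern name to the set of observation ids it covers."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pattern covering the most uncovered observations (ties broken by name order).
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # the remaining observations are covered by no pattern
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

# Hypothetical positive patterns and the positive observations each one covers.
coverage = {
    "P1": {1, 2, 3},
    "P2": {3, 4},
    "P3": {4, 5, 6},
    "P4": {1, 6},
}
theory, left_over = greedy_cover(coverage, universe={1, 2, 3, 4, 5, 6})
```

Running the same routine once on the positive patterns and once on the negative patterns would correspond to solving two separate covering problems, one per class.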
4. Example (Model)
5. Example (Classification)
5. Discriminants
Weight Selection Methods:
Direct
1. Prognostic Index
2. Weighted Prognostic Index
LP-Based
3. Distance Maximizing Separator (SVM)
4. Cost Minimizing Separator
5. Expected Value Separator
NLP-Based
6. Regression in Pattern Space (ANN)
7. Best Correlation with Output
(weighted sums of patterns)
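Since a discriminant here is a weighted sum of patterns, the two direct methods can be sketched as follows. The sketch assumes the common LAD form of the prognostic index (equal weights within each class, so the score is the share of positive patterns covering an observation minus the share of negative patterns covering it) and lets the weighted variant accept arbitrary weights. The pattern predicates, weights and the zero threshold are hypothetical; the LP-based and NLP-based weight choices are not shown.

```python
def discriminant(x, pos_patterns, neg_patterns, pos_weights=None, neg_weights=None):
    """Weighted sum of positive patterns covering x minus weighted sum of negative patterns covering x."""
    if pos_weights is None:  # prognostic index: equal weights summing to 1
        pos_weights = [1.0 / len(pos_patterns)] * len(pos_patterns)
    if neg_weights is None:
        neg_weights = [1.0 / len(neg_patterns)] * len(neg_patterns)
    pos = sum(w for w, p in zip(pos_weights, pos_patterns) if p(x))
    neg = sum(w for w, p in zip(neg_weights, neg_patterns) if p(x))
    return pos - neg

def classify(score, threshold=0.0):
    """Map a discriminant score to a class; scores within the threshold band stay unclassified."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "unclassified"

# Hypothetical patterns written as predicates on an observation x = (x0, x1).
pos_patterns = [lambda x: x[0] >= 6, lambda x: x[1] >= 0.5]
neg_patterns = [lambda x: x[0] <= 3, lambda x: x[1] < 0.5]

score_a = discriminant((7.0, 0.8), pos_patterns, neg_patterns)   # covered by positive patterns only
score_b = discriminant((2.0, 0.9), pos_patterns, neg_patterns)   # one pattern of each sign
score_c = discriminant((2.0, 0.9), pos_patterns, neg_patterns,
                       pos_weights=[0.8, 0.2])                   # weighted variant
```

With equal weights the second observation scores exactly zero and is left unclassified; the unequal weights in the last call tip it to the negative side.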
[Figure panels: Prognostic Index; Weighted Prognostic Index; Expected Value Separator; Distance Maximizing Separator; Cost Minimizing Separator; Best Correlation with Output]
[Worked example: Accuracy computed from Sensitivity and Specificity; the numeric expression is not legible in this transcript]
Reporting
Cutpoints
Discretized Space
Pandect
Coverage of Observations by Patterns
Pattern Report (Compact/Full Versions)
Theories/Models
Attribute Analysis
Log File
Pattern Space
[Figure: coverage of Training and Test observations by patterns (columns marked + and -). Legend: Positive Observations, Unclassified Observations, Negative Observations]
Clustered Pattern Space
Validation Procedures
Raw Data
Stratified Random Partition: Bootstrap, K-Folding, Jackknife
LAD Model on Training Set
Performance Evaluation: Accuracy, Sensitivity, Specificity
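The validation flow (stratified random partition, model built on the training part, performance evaluated on the rest) can be sketched as below. Only the partition and the evaluation measures are shown; building the LAD model itself is out of scope. The function names, the 30% test fraction and the toy labels are assumptions, and unclassified observations are not handled.

```python
import random

def stratified_partition(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test so that each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def evaluate(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive, 0 = negative)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(t == 1 for t in y_true)
    n_neg = len(y_true) - n_pos
    return {
        "sensitivity": tp / n_pos if n_pos else float("nan"),
        "specificity": tn / n_neg if n_neg else float("nan"),
        "accuracy": (tp + tn) / len(y_true),
    }

# Hypothetical data: 10 positive and 20 negative observations.
labels = [1] * 10 + [0] * 20
train, test = stratified_partition(labels, test_fraction=0.3, seed=1)

# Hypothetical predictions on a small test set.
metrics = evaluate([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1])
```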
Special Features
User Model Generation (Excel Files)
Datascope Macro Language: Multiple and Complex Experiments
Interface with Other Applications (Datascope Server)
Performance
Comparative results for 5 datasets from the Irvine repository: LAD and 33 other algorithms (1)

Dataset   Naive   Best (B)   Worst (W)   LAD (L)   (L-B)/(W-B)   Accuracy
bcw       35      3          9           3.5        0.08          99.48%
bld       42      28         43          27.8      -0.01         100.28%
hea       44      14         34          14.7       0.04          99.19%
pid       33      22         31          21.5      -0.06         100.64%
vot       39      4          6           4.6        0.30          99.38%
average                                             0.07          99.79%

1: Tjen-Sien Lim, Wei-Yin Loh and Yu-Shan Shih, A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms, Machine Learning, 40, 203-229 (2000)
http://www.ics.uci.edu/~mlearn/MLRepository.html
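The last two columns of the performance table are not defined on the slide, but the numbers are consistent with the following reading, which is an inference rather than something the slide states: Naive, Best, Worst and LAD are error rates in percent, the first derived column is (L - B)/(W - B), the position of LAD between the best and worst of the 33 algorithms, and the final column is LAD's accuracy relative to the best accuracy, (100 - L)/(100 - B). The sketch below recomputes both columns from the table values.

```python
# Error rates (%) as read from the table: (dataset, best B, worst W, LAD L).
rows = [
    ("bcw", 3, 9, 3.5),
    ("bld", 28, 43, 27.8),
    ("hea", 14, 34, 14.7),
    ("pid", 22, 31, 21.5),
    ("vot", 4, 6, 4.6),
]

def relative_position(b, w, l):
    """0 means LAD matches the best algorithm, 1 the worst; negative means better than the best."""
    return (l - b) / (w - b)

def relative_accuracy_pct(b, l):
    """LAD accuracy as a percentage of the best algorithm's accuracy."""
    return 100.0 * (100.0 - l) / (100.0 - b)

derived = {
    name: (relative_position(b, w, l), relative_accuracy_pct(b, l))
    for name, b, w, l in rows
}
mean_position = sum(p for p, _ in derived.values()) / len(derived)
mean_accuracy = sum(a for _, a in derived.values()) / len(derived)
```

Under this reading the recomputed values agree with the table to within rounding, including the two averages.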
LAD Case Studies
Assessing Long-Term Mortality Risk After Exercise Electrocardiography
Ovarian Cancer Detection Using Proteomic Data
Combinatorial Analysis of Breast Cancer Data from Image Cytometry and Gene Expression Microarrays
Cell Proliferation on Medical Implants
Country Risk Rating