RandomForests for Biomedical Applications
![Page 1: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/1.jpg)
Random Forests and Archetypal Analysis of Dietary Patterns in the
Cache County Memory Study
Adele Cutler
Department of Mathematics and Statistics
Utah State University
This research is partially supported by NIH 1R15AG037392-01
![Page 2: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/2.jpg)
04/13/2023 ADMC 2012 2
Leo Breiman, 1928 - 2005
1984 CART
1994 Archetypal Analysis
1996 Bagging
2001 Random Forests
![Page 3: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/3.jpg)
Example 1: Cookbooks and nutrition
• 300 recipes from 12 cookbooks
• Nutritional information (33 predictors)
Joint work with Sheryl Aguilar and Michael Lefevre
Center for Advanced Nutrition, Utah State University
![Page 4: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/4.jpg)
Example 2: The Cache County Memory Study
![Page 5: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/5.jpg)
Utah
![Page 6: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/6.jpg)
Cache Valley, Utah
![Page 7: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/7.jpg)
Utah State University
![Page 8: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/8.jpg)
Example 2: The Cache County Memory Study
• Prospective, population-based study, 1995-2006
• 5,092 people aged 65 and over
• Food frequency questionnaire
Joint work with Heidi Wengreen (2), Chris Corcoran (1), and Anna Quach (1)
(1) Mathematics and Statistics, Utah State University
(2) Nutrition and Food Sciences, Utah State University
![Page 9: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/9.jpg)
Outline
• RF for cookbooks
• RF for memory study
• Archetypes for cookbooks
• Archetypes for memory
• Current development
![Page 10: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/10.jpg)
Random Forests
![Page 11: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/11.jpg)
Random Forests for Classification
Example 1 (cookbooks):
• Predict the author of a recipe based on the nutritional content of the recipe
• Which variables are important?
Example 2 (memory):
• Predict a person’s dementia status (yes/no) based on their diet
• Which variables are important?
![Page 12: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/12.jpg)
Example 1: Cookbooks
?
![Page 13: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/13.jpg)
Cookbooks: Predict the author?

| Cookbook | Error Rate (%) |
| --- | --- |
| AHA | 4 |
| Cookbook 2 | 40 |
| Cookbook 3 | 59 |
| Cookbook 4 | 95 |
| Cookbook 5 | 79 |
| Cookbook 6 | 65 |
| Cookbook 7 | 91 |
| Cookbook 8 | 64 |
| Cookbook 9 | 15 |
| Cookbook 10 | 92 |
| Cookbook 11 | 72 |
| Cookbook 12 | 85 |

Overall error rate: 63%
![Page 14: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/14.jpg)
Cookbooks: important variables
Error rate 63%
• Fat (g)
• Saturated fat (g)
• Cholesterol (mg)
• Monounsaturated fat (g)
• Sodium (mg)
• Protein (g)
• Vitamin B6 (mg)
![Page 15: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/15.jpg)
Two Classes: AHA versus the rest
Error rate 2.33%
• Fat (g)
• Monounsaturated fat (g)
• Saturated fat (g)
• Sodium (mg)
• Polyunsaturated fat (g)
• Protein (g)
• Cholesterol (mg)
![Page 16: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/16.jpg)
Two Classes: AHA versus the rest
Error rate 2.33%

| Actual class | Predicted Other | Predicted AHA | Error Rate (%) |
| --- | --- | --- | --- |
| Other | 274 | 1 | 0.36 |
| AHA | 6 | 19 | 24.00 |

Class weights!
![Page 17: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/17.jpg)
Class Weights
80% weight AHA, 20% weight “Other”
Error rate 5%

| Actual class | Predicted Other | Predicted AHA | Error Rate (%) |
| --- | --- | --- | --- |
| Other | 261 | 14 | 5.1 |
| AHA | 1 | 24 | 4.0 |
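
The error rates in the two AHA-versus-rest tables follow directly from the counts. A minimal sketch of that arithmetic, assuming rows are actual classes and columns are predicted classes (the helper name `error_rates` is illustrative and does not come from the slides):

```python
def error_rates(confusion):
    """Return (per-class error rates, overall error rate), both in percent.

    confusion[i][j] is the number of cases of actual class i
    that were predicted as class j.
    """
    per_class = []
    errors = 0
    total = 0
    for i, row in enumerate(confusion):
        row_total = sum(row)
        row_errors = row_total - row[i]  # everything off the diagonal
        per_class.append(100.0 * row_errors / row_total)
        errors += row_errors
        total += row_total
    return per_class, 100.0 * errors / total


# Rows and columns are ordered (Other, AHA); counts are taken from the slides.
unweighted = [[274, 1], [6, 19]]
weighted = [[261, 14], [1, 24]]

for name, matrix in [("no class weights", unweighted), ("80/20 weights", weighted)]:
    per_class, overall = error_rates(matrix)
    print(name, [round(r, 2) for r in per_class], round(overall, 2))
```

Read side by side, the two results show the trade the weights make: the AHA error rate drops sharply while the "Other" error rate and the overall error rate rise.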
![Page 18: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/18.jpg)
Salford and R
Different weighting schemes!
• R weights only take a weighted bootstrap sample
• Salford does weighted splits as well
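
The slides say only that the R weights act through a weighted bootstrap sample and that Salford also weights the splits. The sketch below illustrates what those two mechanisms could look like in general. It is an assumption for illustration, not the code of either package: the function names, the per-case weights, and the use of Gini impurity as the split criterion are all hypothetical choices.

```python
import random


def weighted_bootstrap(labels, class_weight, rng):
    """Draw a bootstrap sample of case indices, with each case's chance of
    selection proportional to the weight of its class."""
    weights = [class_weight[y] for y in labels]
    return rng.choices(range(len(labels)), weights=weights, k=len(labels))


def weighted_gini(labels, class_weight):
    """Gini impurity of a node in which each case counts as its class weight."""
    totals = {}
    for y in labels:
        totals[y] = totals.get(y, 0.0) + class_weight[y]
    node_weight = sum(totals.values())
    return 1.0 - sum((w / node_weight) ** 2 for w in totals.values())


# Class sizes from the AHA-versus-rest example: 275 "Other", 25 "AHA".
labels = ["Other"] * 275 + ["AHA"] * 25
# Hypothetical per-case weights chosen so that AHA carries 80% of total weight.
class_weight = {"Other": 0.2 / 275, "AHA": 0.8 / 25}
equal_weight = {"Other": 1.0, "AHA": 1.0}

rng = random.Random(0)
sample = weighted_bootstrap(labels, class_weight, rng)
aha_share = sum(labels[i] == "AHA" for i in sample) / len(sample)

print("AHA share of weighted bootstrap sample:", round(aha_share, 2))
print("Root-node Gini, equal weights:", round(weighted_gini(labels, equal_weight), 3))
print("Root-node Gini, class weights:", round(weighted_gini(labels, class_weight), 3))
```

Under these assumptions the first mechanism changes which cases each tree sees, while the second changes how impure a node looks and therefore which splits are chosen. That is one plausible reason the two schemes could rank variables differently.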
![Page 19: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/19.jpg)
R Weights
[Figure: importance (y-axis, 0.000 to 0.025) versus variable number (x-axis, 0 to 30)]
![Page 20: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/20.jpg)
Important Variables (R)
• Fat (g)
• Monounsaturated fat (g)
• Saturated fat (g)
• Sodium (mg)
• Polyunsaturated fat (g)
For all weights!
![Page 21: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/21.jpg)
Salford Weights
[Figure: importance (y-axis, 0 to 14) versus variable number (x-axis, 0 to 30)]
![Page 22: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/22.jpg)
Important Variables (Salford)
• Carb (g)
• Polyunsaturated fat (g)
• Caffeine (mg)
• Cholesterol (mg)
• Fiber (g)
• Protein (g)
• Trans fat (g)
• Fat (g)
![Page 23: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/23.jpg)
Example 2: Memory
![Page 24: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/24.jpg)
Memory: Predict survival
Error rate 28.2%

| Actual class | Predicted Survived | Predicted Died | Error Rate (%) |
| --- | --- | --- | --- |
| Survived | 839 | 591 | 41 |
| Died | 359 | 1584 | 18 |
![Page 25: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/25.jpg)
Memory: Predict dementia?
Error rate 28.1%

| Actual class | Predicted Normal | Predicted Demented | Error Rate (%) |
| --- | --- | --- | --- |
| Normal | 2410 | 24 | 0.99 |
| Demented | 926 | 13 | 98.62 |
![Page 26: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/26.jpg)
Class Weights
30% weight Normal, 70% weight Demented
Error rate 38%

| Actual class | Predicted Normal | Predicted Demented | Error Rate (%) |
| --- | --- | --- | --- |
| Normal | 1646 | 788 | 32 |
| Demented | 508 | 431 | 54 |
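
The two dementia tables are easier to interpret against a trivial baseline. The sketch below recomputes the overall error from the counts and compares it with always predicting "Normal". It also reports a balanced error rate (the mean of the per-class error rates), a summary that is introduced here for illustration and does not appear in the slides. Rows are assumed to be actual classes.

```python
def summarize(confusion):
    """Overall and balanced error rates (percent) for a confusion matrix whose
    rows are actual classes and whose columns are predicted classes."""
    n_classes = len(confusion)
    row_totals = [sum(row) for row in confusion]
    total = sum(row_totals)
    correct = sum(confusion[i][i] for i in range(n_classes))
    per_class = [
        100.0 * (row_totals[i] - confusion[i][i]) / row_totals[i]
        for i in range(n_classes)
    ]
    overall = 100.0 * (total - correct) / total
    balanced = sum(per_class) / n_classes
    return overall, balanced


# Rows and columns are ordered (Normal, Demented); counts are from the slides.
unweighted = [[2410, 24], [926, 13]]
weighted = [[1646, 788], [508, 431]]

# Baseline: predict "Normal" for everyone, so every demented case is an error.
n_demented = sum(unweighted[1])
n_total = sum(sum(row) for row in unweighted)
baseline = 100.0 * n_demented / n_total

print("always-Normal baseline, overall error:", round(baseline, 1))
for name, matrix in [("no class weights", unweighted), ("class weights", weighted)]:
    overall, balanced = summarize(matrix)
    print(name, "overall:", round(overall, 1), "balanced:", round(balanced, 1))
```

Without class weights the forest's overall error is close to the always-Normal baseline (939 of 3,373 cases, about 27.8%), because it labels almost everyone Normal. With class weights the overall error rises to about 38%, but the errors are spread more evenly across the two classes.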
![Page 27: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/27.jpg)
Salford Weights
[Figure: importance (y-axis, 0.000 to 0.010) versus variable number (x-axis, 0 to 80)]
![Page 28: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/28.jpg)
R Weights
[Figure: importance (y-axis, 0 to 4) versus variable number (x-axis, 0 to 80)]
![Page 29: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/29.jpg)
Salford Weights
[Figure: importance (y-axis, 0 to 10); x-axis from 0 to 80]
![Page 30: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/30.jpg)
R Weights
[Figure: importance (y-axis, 0.000 to 0.010); x-axis from 0 to 80]
![Page 31: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/31.jpg)
Summary
• R weights only take a weighted bootstrap sample
• Salford does weighted splits as well
• Salford weights can give different variable importance
![Page 32: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/32.jpg)
Archetypes
Cutler and Breiman, Technometrics, 1994
• Unsupervised learning, alternative to cluster analysis or PCA
• Summarize data using a fixed number of “archetypes”
• The archetypes are extremes
• Data points are approximated by mixtures of archetypes
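
The bullets above can be written as an optimization problem, following the formulation in the cited Cutler and Breiman paper. The notation is chosen here and does not appear in the slides: $x_1, \dots, x_n$ are the data points and $z_1, \dots, z_p$ are the $p$ archetypes.

```latex
\min_{\alpha,\,\beta}\ \sum_{i=1}^{n} \Bigl\| x_i - \sum_{k=1}^{p} \alpha_{ik}\, z_k \Bigr\|^2,
\qquad z_k = \sum_{j=1}^{n} \beta_{kj}\, x_j,
\\[4pt]
\text{subject to}\quad
\alpha_{ik} \ge 0,\ \ \sum_{k=1}^{p} \alpha_{ik} = 1,
\qquad
\beta_{kj} \ge 0,\ \ \sum_{j=1}^{n} \beta_{kj} = 1 .
```

The constraints on $\alpha$ make each data point a mixture of archetypes. The constraints on $\beta$ make each archetype a mixture of data points, and minimizing the residual then pushes the archetypes toward the extremes of the data.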
![Page 33: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/33.jpg)
Archetypes
Example 1 (cookbooks):
• Archetypes represent extreme recipes
• A particular recipe is approximated as a mixture of the extreme recipes
Example 2 (memory):
• Archetypes represent extreme dietary patterns
• A person’s diet is approximated as a mixture of the extreme diets
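
To make "approximated as a mixture" concrete, the sketch below finds mixture weights for one point when the archetypes are already known. It covers only that half of the method (fitting the archetypes themselves is not shown). It uses the Frank-Wolfe algorithm as a convenient solver chosen for this illustration; the slides do not say which algorithm was used. The two-nutrient "recipes" are hypothetical numbers.

```python
def mixture_weights(x, archetypes, iterations=500):
    """Approximate point x as a convex mixture of the given archetypes.

    Minimizes ||x - sum_k a[k] * archetypes[k]||^2 over weights a with
    a[k] >= 0 and sum(a) == 1, using Frank-Wolfe with exact line search.
    """
    p = len(archetypes)
    dim = len(x)
    a = [1.0 / p] * p  # start at the centre of the simplex
    for _ in range(iterations):
        # Current approximation of x and its residual.
        approx = [sum(a[k] * archetypes[k][d] for k in range(p)) for d in range(dim)]
        resid = [approx[d] - x[d] for d in range(dim)]
        # Gradient with respect to a[k] is proportional to <archetypes[k], resid>.
        grad = [sum(archetypes[k][d] * resid[d] for d in range(dim)) for k in range(p)]
        best = min(range(p), key=lambda k: grad[k])
        # Move the approximation toward the archetype with the steepest descent.
        step_dir = [archetypes[best][d] - approx[d] for d in range(dim)]
        denom = sum(v * v for v in step_dir)
        if denom == 0.0:
            break
        # Exact line search for the step size, clipped to [0, 1].
        gamma = -sum(resid[d] * step_dir[d] for d in range(dim)) / denom
        gamma = max(0.0, min(1.0, gamma))
        if gamma == 0.0:
            break
        a = [(1.0 - gamma) * a[k] for k in range(p)]
        a[best] += gamma
    return a


# Hypothetical example: three "extreme recipes" described by (fat g, carb g).
archetypes = [(0.0, 0.0), (40.0, 0.0), (0.0, 60.0)]
recipe = (8.0, 18.0)
weights = mixture_weights(recipe, archetypes)
print([round(w, 3) for w in weights])
```

A point inside the region spanned by the archetypes is reproduced by its weights. A point outside that region is approximated by the nearest mixture, and the leftover distance is the kind of residual that an RMSE curve over the number of archetypes summarizes.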
![Page 34: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/34.jpg)
Example 1: Cookbooks
?
![Page 35: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/35.jpg)
Cookbooks: How many archetypes?
[Figure: RMSE (y-axis, 300 to 450) versus number of archetypes (x-axis, 2 to 10)]
![Page 36: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/36.jpg)
[Figure: plot of 3 archetypes, labeled 1 to 3]
![Page 37: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/37.jpg)
[Figure: plot of 4 archetypes, labeled 1 to 4]
![Page 38: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/38.jpg)
[Figure: plot of 5 archetypes, labeled 1 to 5]
![Page 39: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/39.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6]
![Page 40: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/40.jpg)
[Figure: plot of 7 archetypes, labeled 1 to 7]
![Page 41: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/41.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 1]
![Page 42: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/42.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 2]
![Page 43: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/43.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 3]
![Page 44: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/44.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 4]
![Page 45: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/45.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 5]
![Page 46: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/46.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 6]
![Page 47: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/47.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 7]
![Page 48: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/48.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 8]
![Page 49: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/49.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 9]
![Page 50: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/50.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 10]
![Page 51: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/51.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 11]
![Page 52: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/52.jpg)
[Figure: plot of 6 archetypes, labeled 1 to 6; Cookbook 12]
![Page 53: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/53.jpg)
Example 2: Memory
![Page 54: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/54.jpg)
Memory: How many archetypes?
[Figure: RMSE (y-axis, 2.5 to 4.0) versus number of archetypes (x-axis, 2 to 10)]
![Page 55: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/55.jpg)
[Figure: plot of 7 archetypes, labeled 1 to 7; color = dementia status]
![Page 56: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/56.jpg)
[Figure: plot of 7 archetypes, labeled 1 to 7; color = smoking status]
![Page 57: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/57.jpg)
[Figure: plot of 7 archetypes, labeled 1 to 7; color = drinking status]
![Page 58: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/58.jpg)
[Figure: plot of 7 archetypes, labeled 1 to 7; color = age]
![Page 59: RandomForests for Biomedical Applications](https://reader036.fdocuments.net/reader036/viewer/2022062514/557d7555d8b42a2c428b4908/html5/thumbnails/59.jpg)
Development
Random forests:
• Regression version
• Case weights
• Probability estimates
• Proximities
• Multivariate outcomes
Archetypes:
• Archetypal functions
• Archetypal sets