Overall Relative Cancer Risk (BMI >40 vs 18.5 to 24.9):
M: 1.52 (1.13 – 2.05); F: 1.62 (1.40 – 1.87)
Increased risk of colorectal, pancreatic, liver, esophagus, kidney, multiple myeloma, non-Hodgkin’s lymphoma, gallbladder, prostate, breast, cervical, ovarian, uterine
“Current patterns of overweight and obesity in the United States could account for 14% of all deaths from cancer in men and 20% of those in women”
American Cancer Society Study, 900,000 adults
Calle et al., NEJM, 348:1625-1638, 2003
Guidelines for Healthy Weight
BMI of 26 vs 21
Coronary Heart Disease: 2x increase; Hypertension: 2-3x increase; Type II Diabetes: 8x increase
Weight Change of 15 kg
Coronary Heart Disease: 2x increase; Hypertension: 2-3x increase; Type II Diabetes: 6x increase
Nurses’ Health Study: Willett et al., NEJM, 341:427-434, 1999
Caloric Restriction (CR)
CR is an experimental paradigm in which the dietary/caloric intake of a
group of animals is reduced relative to that eaten by ad libitum fed controls
Caloric restriction is the most potent, most robust, and most reproducible known means of reducing morbidity
and mortality in mammals
Survival Data, 1987 Cohort, Casein Diet
[Figure: survival rate (0 to 100) vs. days (0 to 1400) for AL and DR groups]
DR Reduces Morbidity: Breast Cancer
• Delays tumor onset (initiation and promotion)
• Slows progression
• Can modulate oncogene penetrance
– v-Ha-ras tumors decreased 67% (Fernandes et al., PNAS 92:6494-6498, 1995)
• Can prevent carcinogen-induced tumors
– 7,12-dimethylbenz(a)anthracene (60% AL, 0% DR) (Kritchevsky et al., Cancer Res 44:3174-3177, 1984)
– Even with high fat diet, tumor yield, size, burden down 93-98% (Klurfeld et al., Cancer Res 47:2759-2762, 1987)
CR “Beneficial” Effects
• Lower oxidative stress
– “Better” redox balance
• “Improved” glucose metabolism
– Increased insulin sensitivity
– Reduced blood glucose
– Reduced diabetes risk
• Reduced inflammation
How do we study complex biological/clinical problems?
How do we address such questions in humans, where our ability to manipulate
and analyze the system is limited?
High Throughput and/or Data Density Studies
• Genomics/SNPs
• mRNA expression arrays
• Proteomics
• Small metabolites
Metabolomics: The –omics face of biochemistry
Measurement of changes in populations of low molecular weight metabolites under a given set of conditions (Fiehn)
[Diagram labels: GENOME, ENVIRONMENT, TRANSCRIPTOME, PROTEOME, METABOLOME, HEALTH STATE, DISEASE STATE]
[Figure: metabolomics workflow overview. Stages: Sample Collection, Sample Analysis, Database Curation, Bioinformatics. Insets: chromatogram (response in µA vs. retention time, 0 to 100 minutes); dendrogram of samples AL1 to AL8 and DR1 to DR8; observed vs. predicted values. Bioinformatics tasks: Computational Modeling of Metabolic Serotypes, Modeling Metabolic Interactions, Objectively Defining Class Identity, Following Biochemical Pathways. Applications: Mechanistic Insight, Drug Development, Toxicology, Classification, Prediction, Functional Genomics, Sub-threshold Studies, Others]
What we measure -- biochemically
Metabolites – small molecules
Pathways (eg, purine catabolites)
Interactive pathways (eg, amino acid metabolism)
Compound classes (eg, lipids)
Conceptually linked systems (eg, antioxidants, redox damage products)
What we measure -- conceptually
Biochemical constituents
Excretion products
Precursor – product
Balances (eg, redox systems)
“Collection depots”
Flux
Snapshot view of biochemistry
Integrated signal from genome and environment
Short and long term status
Temporal image
Sub-threshold changes (eg, toxicology, nutrition)
Metabolomics – Some Advantages
Sensitivity: “silent phenotypes”/sub-threshold effects
Discovery
Knowledge base (ie, metabolic pathways)
Limited repertoire – simplifies possibilities (2500 non-lipid endogenous metabolites??)
Metabolome integrates signal
– Nature and Nurture -- genome and environment
– Measurement of system status/defects
Metabolome has the fastest response time
Metabolomics – Some Disadvantages
Too Sensitive?
– cohort effects, site effects, time effects
– sample handling
– individual metabolites responsive to multiple factors: genes, environment, health status, location
– experiment design must account for all factors, controlled or fuzzy, multiple sources
Practical
– Set-up costs
– Possible need for multiple platforms (NMR, MS, HPLC)
– early industry dominance – lots of proprietary data
– incompatible data standards
Metabolomics Technology = Metabolomics Platform
Biology
Analytical Chemistry
Data Analysis
Analytical
Data Analysis
Biology
Caloric intake as a case study
High points only
Ignoring details, other studies, etc
Hypothesis: Long-term, low-calorie diets induce changes in metabolism that persist throughout the lifespan
Predictions
• CR alters the sera “metabolome”
• There exists a “CR Serotype”
• …Part of “CR serotype” reflects beneficial physiological status --- ie, serotype defines health without reference to disease…
Goals
1) Insights into the mechanism of CR
2) Recognize CR in other organisms (e.g., non-human primates)
3) Biochemically determine the effective, long-term caloric intake of an individual (e.g., for epidemiological studies)
4) Identify predictive markers of disease (e.g., to intervene/prevent/focus resources; focus on diseases where intervention is possible)
Model: F344 x BN F1 Rat
Overall Design: AL/CR, male/female, 5 different ages. Different extents and duration of diets. Total experiment ~36 groups, 82 cohorts.
Approach: HPLC separations with coulometric array detection (LC/LC-MS for plasma proteomics). Multilayer statistical and data analysis.
Experimental Design
Analytical Stability
Biologic Variability
[Figure: scatter plots comparing coefficients of variation (CV), in three groups of panels: analytical vs. analytical (AvA), AL vs. DR, and biological vs. analytical. Axis labels: MALCV, FALCV, MDRCV, FDRCV, ANALCV1, ANALCV2, ANALCVT. Panel r² values shown: 0.75, 0.55, 0.36, 0.31, 0.31, 0.25, 0.09, 0.01, 0.01, 0.01, 0.00]
Analytical vs Biological Variation
In rats: biological variability is 5-fold greater than analytical variability
Analytical variability does not influence biological variability
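The comparison above can be illustrated with coefficients of variation (CV = standard deviation / mean), which the CV axis labels in the figure suggest were the quantities plotted. The following is a minimal sketch, assuming the analytical CV comes from repeated runs of one pooled sample and the biological CV from one measurement per animal. All numbers and names are invented for illustration and are not taken from the study.

```python
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical peak responses for one metabolite (arbitrary units).
analytical_replicates = [100.0, 102.0, 98.0, 101.0, 99.0]  # same pooled sample, re-run
biological_replicates = [100.0, 110.0, 90.0, 105.0, 95.0]  # different animals, same diet

analytical_cv = cv(analytical_replicates)
biological_cv = cv(biological_replicates)
ratio = biological_cv / analytical_cv

print(f"analytical CV = {analytical_cv:.3f}")
print(f"biological CV = {biological_cv:.3f}")
print(f"biological / analytical = {ratio:.1f}")
```

In this toy case the biological CV is five times the analytical CV, which is the kind of margin the slide reports for rats.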
Primary Data Analysis
• Multivariate analyses are relatively noise-resistant
• Minimize loss of informative metabolites
• Reduce false negatives (Type II errors)
• Increase false positives (Type I errors)
Does Serotype Encode Sufficient Information to
Identify Diet Group?
Data Exploration and Classification Analysis
• Hierarchical Cluster Analysis (HCA)
– Identifies natural groups in data
• Principal Component Analysis (PCA)
– Finds linear combinations of original variables that account for maximal variation
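As a rough illustration of the HCA step named above, the sketch below scales a small table of metabolite values and then runs agglomerative clustering with complete linkage until two clusters remain. It is a toy under stated assumptions, not the analysis pipeline used for these slides: the data, the sample ordering, and the function names (`autoscale`, `range_scale`, `complete_linkage`) are all hypothetical, and Euclidean distance is assumed.

```python
import math
from statistics import mean, stdev

def autoscale(rows):
    """Scale each column (variable) to mean 0 and unit sample standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def range_scale(rows):
    """Scale each column (variable) linearly onto the interval [0, 1]."""
    cols = list(zip(*rows))
    los = [min(c) for c in cols]
    his = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) for v, lo, hi in zip(row, los, his)] for row in rows]

def complete_linkage(rows, n_clusters=2):
    """Agglomerative clustering; cluster distance = largest pairwise Euclidean distance."""
    clusters = [[i] for i in range(len(rows))]

    def dist(a, b):
        return max(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min((dist(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical data: rows 0-3 are AL animals, rows 4-7 are DR animals; 3 metabolites each.
data = [
    [10.0, 200.0, 1.0], [11.0, 210.0, 1.2], [9.5, 190.0, 0.9], [10.5, 205.0, 1.1],
    [6.0, 120.0, 2.0], [5.5, 130.0, 2.2], [6.5, 125.0, 1.9], [6.2, 115.0, 2.1],
]

clusters_auto = complete_linkage(autoscale(data))
clusters_range = complete_linkage(range_scale(data))
print(sorted(sorted(c) for c in clusters_auto))
print(sorted(sorted(c) for c in clusters_range))
```

Categorization accuracy can then be read off as the fraction of samples that land in the cluster dominated by their own diet group.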
T-tests, p<0.2 ?!
HCA Distinguishes Female AL and DR Rats
[Figure: dendrograms of samples AL1 to AL8 and DR1 to DR8 for two preprocessing methods (autoscale, range scale) and three linkage methods (single, complete, centroid); preprocessing-dependent and linkage-method-dependent categorization accuracy is 100% in all six panels]
63 variables (confirmed), non-independent samples
PCA Distinguishes AL and DR Female Rats
[Figure: PCA score plots (Factor 1, Factor 2, Factor 3) for autoscale and range scale preprocessing, with DR and AL groups labeled]
63 variables (confirmed), non-independent samples
HCA Proof of Principle PCA
Biological Model
[Figure: survival rate (0 to 100) vs. days (0 to 1400) for AL and DR groups]
1075 analytically detectable peaks; sensitivity ~300 pA = ~10 fmole/125 µl sera
[Figure: chromatogram traces numbered 1 to 16, response (µA) vs. retention time (0 to 100 minutes)]
Model Feature Selection
HCA Validation PCA
[Figure: HCA dendrogram, complete linkage, samples AL9 to AL16 and DR9 to DR16; 94% accuracy]
PCA Distinguishes AL and DR Female Rats
[Figure: PCA score plot (Factor 1, Factor 2, Factor 3) with AL and DR groups labeled]
63 variables (confirmed), independent female cohort #2
HCA Simplify Model PCA
[Figure: HCA dendrograms (autoscale, complete linkage) of samples AL17 to AL22 and DR17 to DR21, one built with 63 variables and one with 36 variables; accuracies shown: 91% and 63%]
Removed variables can contribute to AL-DR separations
63/(37/36) variables (confirmed), independent female cohort #3
PCA Distinguishes AL/DR Using Either Dataset
[Figure: PCA score plots (Factor 1, Factor 2, Factor 3) for the 36-variable and 63-variable datasets, with AL and DR groups labeled]
37 variables (confirmed), independent female cohort #3
Status
Proof of principle accuracy: HCA (100%), PCA (100%)
Validation accuracy: HCA (94%), PCA (100%) - subjective rotation
Simplification: HCA (fails), PCA (100% accuracy)
Use larger models? Test components vs distance
“Expert Systems/Supervised Analysis”
KNN – k-nearest neighbor analysis
– Supervised HCA (HCA is KNN with K=1)
– Distance-based metric
– Strength is with small (training) datasets
SIMCA – Soft Independent Modeling of Class Analogy
– Supervised PCA
– Component-based metric
– Strength is modeling flexibility (eg, group-specific interactions)
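To make the distance-based option concrete, here is a minimal k-nearest-neighbor classifier. It is a sketch assuming Euclidean distance and majority voting; the training values, labels, and the function name `knn_predict` are hypothetical and only illustrate how a new sample would be assigned to AL or DR from labeled training samples.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training samples closest to `query`."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-metabolite training set with known diet groups.
train_x = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1],
           [3.0, 2.0], [3.2, 2.1], [2.9, 1.8]]
train_y = ["AL", "AL", "AL", "DR", "DR", "DR"]

print(knn_predict(train_x, train_y, [1.1, 4.9]))  # query close to the AL samples
print(knn_predict(train_x, train_y, [3.1, 2.0]))  # query close to the DR samples
```

A component-based method such as SIMCA instead fits a separate principal component model per class and asks how well a new sample fits each model, which is not shown here.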
In our DR sera metabolomics data, component-based algorithms greatly outperform distance-based algorithms
Profiles are cohort specific
[Figure: 3-D score plot (t[1], t[2], t[3]) titled Cohort Separations; male samples modeled with male/female data set; groups AMAL, AMDR, BMAL, BMDR, CMAL, CMDR]
Cohort Effects
[Figure: Cohort Separations score plot; male samples modeled with male/female data set; groups AMAL, AMDR, BMAL, BMDR, CMAL, CMDR]
[Figure: 3-D score plots (t[1], t[2], t[3]) of groups AMAL, AMDR, BMAL, BMDR, CMAL, CMDR under six data transformations: (a) none, (b) log, (c) winsorize (3SD), (d) log + win (3SD), (e) winsorize (2SD), (f) log + win (2SD)]
[Figure: bar charts of R2X, R2Y, and Q2 (cum), scale 0.0 to 1.0, for each transformation (none, log, winsorize (2SD), winsorize (3SD), log + win (2SD), log + win (3SD)): (a) no scaling (females), (b) no scaling (males), (c) Pareto scaling (females), (d) Pareto scaling (males), (e) UV scaling (females), (f) UV scaling (males)]
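The transformations and scalings compared in these panels can be sketched for a single variable as follows. This is an illustration under assumptions: winsorizing is taken to mean clipping values to mean ± n standard deviations, UV scaling to mean centering followed by division by the standard deviation, and Pareto scaling to mean centering followed by division by the square root of the standard deviation. The function names and the example values are hypothetical.

```python
import math
from statistics import mean, stdev

def log_transform(values):
    """Base-10 log transform; values must be positive."""
    return [math.log10(v) for v in values]

def winsorize(values, n_sd=3.0):
    """Clip values to the interval mean +/- n_sd sample standard deviations."""
    m, s = mean(values), stdev(values)
    lo, hi = m - n_sd * s, m + n_sd * s
    return [min(max(v, lo), hi) for v in values]

def uv_scale(values):
    """Unit-variance scaling: center, then divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pareto_scale(values):
    """Pareto scaling: center, then divide by the square root of the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / math.sqrt(s) for v in values]

# Hypothetical peak responses with one extreme value.
peak = [1.0] * 9 + [100.0]

print(max(winsorize(peak, n_sd=2.0)))   # the extreme value is pulled in
print(max(winsorize(peak, n_sd=3.0)))   # within 3 SD here, so left unchanged
print(stdev(uv_scale(peak)))            # approximately 1 after UV scaling
print(stdev(pareto_scale(peak)))        # approximately sqrt of the original SD
```

Combinations such as "log + win (2SD)" would apply `log_transform` first and `winsorize` second, before any scaling.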
PLS-DA
[Figure: PLS-DA results. Panels: bar chart by component (Comp[1] to Comp[3], scale 0 to 1); score plot (t[1] vs. t[2]); actual vs. predicted AL probability (p<0.001); loading plot (w*c[1] vs. w*c[2]) with peaks labeled by retention time and with CR and AL marked]
[Figure: predicted vs. actual degree of restriction (both axes 40 to 120); r² = 0.877; r² = 0.994 (means); N = 90; each group p<0.05 vs others]
[Figure: loading plot (w*c[1] vs. w*c[2]) with peaks labeled by retention time; additional labels INTAKE and LK]
Markers “Predict” Caloric Intake with High Quantitative Accuracy
-- Proof of Concept --
[Figure: O-PLS score plot (SIMCA-P 11), t[1]P vs. t[2]O, female basic model M1 (OPLS), colored by group; R2X[1] = 0.105822, R2X[2] = 0.162921; ellipse: Hotelling T2 (0.95)]
O-PLS models built for better testing
Iteratively improve models – focus on
analytical robustness
Then test one model…
Validation: Across Lifespan
Validation: Duration
Validation: Extent
[Figure: O-PLS prediction score (A.U.) in three panels. AGE: AL and CR groups at 6, 12, 18, 24, and 30 months of age. Duration: weeks on CR diet. Extent: percent restriction. r² = 0.645 shown]
Validation: Extent
[Figure: O-PLS score (A.U.) vs. percent restriction, two panels. Panel including 12 and 18 AL and CR: r² = 0.636. Panel only including 8 week CR, no AL group: r² = 0.338]
Experimental Design
AL vs DR
Analytical Issues
Biological Issues
Human Studies: Profile = …
• Recent Food (ie, Fast)
• BMI
• Food intake
• Physiology
Human Studies: Profile = …
• Recent Food (ie, Fast)
– Effect size ~ 0.2*StDev
• BMI
• Food intake
• Physiology
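An effect size expressed in standard deviation units, as in the fasting bullet above, can be computed as a standardized mean difference. The sketch below uses the pooled-standard-deviation form (Cohen's d), which is an assumption about how the slide's figure was derived. The score values and the function name `standardized_effect` are hypothetical.

```python
import math
from statistics import mean, stdev

def standardized_effect(group_a, group_b):
    """Mean difference divided by the pooled sample standard deviation (Cohen's d)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical profile scores; the fasted group is the fed group shifted by 0.1.
fed = [5.0, 5.2, 4.8, 5.4, 4.6]
fasted = [v + 0.1 for v in fed]

d = standardized_effect(fasted, fed)
print(f"effect size = {d:.2f} SD")
```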
Human Studies: Profile = …
• Recent Food (ie, Fast)
• BMI
• Food intake
• Physiology
Profile Score Does Not Reflect BMI
[Figure: BMI (15 to 45) plotted against profile score, for individual values and for 3 point means]
Human Studies: Profile = …
• Recent Food (ie, Fast)
• BMI
• Food intake
• Physiology
Human Studies: FFQ vs score (individuals)
[Figure: profile score (−8 to 10) vs. FFQ value (1000 to 3000) for individuals; linear fit f(x) = 0.0010097 x − 1.9244, R² = 0.0330]
Extreme FFQ Quintiles I
• No obvious association with individual FFQs, but…
• FFQs are known to be weakly predictive of individual caloric intake
• But extreme quintiles on the FFQ should differ – or we would likely never see anything in epidemiology
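The reasoning in these bullets, that a weak individual-level association can still show up clearly across quintiles, can be sketched with synthetic data. Everything below is invented for illustration: the intake values, the slope, and the alternating ±3 "noise" term, which is a deterministic stand-in for measurement noise so that the result is reproducible. The function names are hypothetical.

```python
from statistics import mean

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def quintile_means(xs, ys):
    """Sort pairs by x, split into five equal-size bins, return mean x and mean y per bin."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // 5
    bins = [pairs[i * size:(i + 1) * size] for i in range(5)]
    return ([mean([x for x, _ in b]) for b in bins],
            [mean([y for _, y in b]) for b in bins])

# Synthetic data: a small true slope buried under large alternating noise.
intake = [1000.0 + 4.0 * i for i in range(500)]
score = [0.001 * x + (3.0 if i % 2 == 0 else -3.0) for i, x in enumerate(intake)]

r2_individual = r_squared(intake, score)
qx, qy = quintile_means(intake, score)
r2_quintiles = r_squared(qx, qy)

print(f"individuals: r^2 = {r2_individual:.3f}")
print(f"quintile means: r^2 = {r2_quintiles:.3f}")
```

Averaging within each quintile cancels most of the per-individual noise, so the same underlying slope gives a much higher r² for the five bin means than for the individual points.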
Extreme FFQ Quintiles II
[Figure: profile score (−0.80 to 0.60) vs. FFQ quintile value (1200 to 2600); f(x) = 0.0010794 x − 2.0509, R² = 0.8336]
[Figure: FFQ quintile value (0 to 3000) vs. profile score (−0.80 to 0.60); f(x) = 772.213 x + 1881.488, R² = 0.8336]
FFQ quintiles “predict” score; score “predicts” FFQ quintile
Individual values in extreme quintiles differ, p<0.005
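The two fitted lines above regress the same quintile-level points in opposite directions, so their slopes are tied to the shared R². A short consistency check follows, using the standard least-squares identities; the symbols r, s_x, and s_y (the correlation and the two standard deviations) are introduced here for illustration only.

```latex
b_{y\mid x} = r\,\frac{s_y}{s_x}, \qquad
b_{x\mid y} = r\,\frac{s_x}{s_y}
\quad\Longrightarrow\quad
b_{y\mid x}\, b_{x\mid y} = r^{2}

0.0010794 \times 772.213 \approx 0.8336 = R^{2}
```

The product of the two reported slopes reproduces the reported R², as expected when both fits come from the same points.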
Human Studies: Profile = …
• Recent Food (ie, Fast)
• BMI
• Food intake
• Physiology
Testing Physiology
• Hypothesis: Profile score will differ/predict future risk of diseases reduced by CR, eg, breast cancer
• Proposed 1000 cases/paired controls for breast cancer
– Nested within Nurses’ Health Study
– Samples taken 2-10 years before onset
• Initial funding, 2 years, 750 case control pairs
• Today, 210 case:control pairs (first blind break)
Case-Controls - Design
Using the raw RAT models, no corrections, no optimization except for those that address analytical issues in humans
All observations that served as both cases and controls (due to conversion) were excluded for this analysis
210 total case/control pairs scored
The original model is currently the best, so I'll present those numbers... others are close
Case-Controls - Data
These slides not cleared for posting at cohort study level, contact me at [email protected] if you have questions
Human Studies: Profile = …
• Recent Food (ie, Fast)
• BMI
• Food intake
• Physiology
Summary
Created and validated a working model of the CR serotype in both male and female rats
Profiles distinguish diet in blinded studies
Markers pass analytical tests in human plasma
Rat profiles pass key tests in human case/control studies
Other studies ongoing, lipids, macronutrient shifts
The metabolomic markers and profiles identified appear analytically and biologically suitable for studies in defined human populations such as national clinical trials and epidemiological cohorts
Data suggests that the protective phenotype exemplified in the CR rat is broadly conserved across species
Data suggests that we have the “raw material” needed to build marker profiles that can be tailored to provide a continuous ruler for objective measurement of factors such as caloric intake in epidemiological and clinical studies
Data suggests that we have the “raw material” to begin to address personalized disease risk from the nutritional/environmental side
Long-Range Disease Risk Prediction – Algorithmic Information Fusion in a Life and Death Environment
Opportunities
Genetics
Demographics
Questionnaires
Clinical data
Metabolomics, proteomics, lipidomics = GxE
Long-Range Disease Risk Prediction – Algorithmic Information Fusion in a Life and Death Environment
Complications
Genetics… low effect size, low proportion explained, low penetrance, multi-gene effects, sequence vs SNP, epigenome vs sequence, tissue specificity
Demographics… definition and measurement problems
Questionnaires… people lie, people are biased liars, questions are *hard* and often population specific
Clinical data… limited, costly, by the time you have it may be too late, thresholds, different data structure/distributions
Metabolomics… proteomics, lipidomics = GxE, very complex interactions, time effects, diet effects
Long-Range Disease Risk Prediction – Algorithmic Information Fusion in a Life and Death Environment
Complications
Are we limited to decision level analysis?
No… Biology-based fusion works
Long-Range Disease Risk Prediction – Algorithmic Information Fusion in a Life and Death Environment
Step 1
What is Success?
What is Failure?
How do we balance Success or Failure
What does it mean to optimize?
How do we assess?
Long-Range Disease Risk Prediction
Algorithmic Information Fusion
in a Life and Death Environment
Acknowledgements
• Brigham and Women’s Hospital and Burke/Cornell
– Yevgeniya Shurubor, Honglian Shi, Diane Sheldon, Sophie Guo, Susan (Schiavo) Bird, Rose Gathungu
– Ugo Paolucci, Ruiwen Zhou, Vasant Marur, Neil Russell, Matt Sniatynski
• Outside Collaborators
– Walter Willett, Sue Hankinson, Frank Hu, Paul Vouros
– Wayne Matson, Karen Vigneau-Callahan, Paul Milbury
– Tom Vogl, Frank Hsu, Christina Schweikert
• Funding
– NIH (NIA; NCI; NIEHS)
– Winifred Masterson Burke Relief Foundation
– Brigham and Women’s Hospital, and Dept of Neurosurgery
Acknowledgments
Bioinformatics
Animal Studies and Mitochondria Research
Analytical Work