Post on 16-Feb-2019
Characterization of metabolomic signatures in septic shock patients: a data
mining approach
XXIX Cycle 2014 – 2017
POLITECNICO DI MILANO
DEPARTMENT OF ELECTRONICS, INFORMATION AND BIOENGINEERING
DOCTORAL PROGRAM IN BIOENGINEERING
Doctoral dissertation of: Alice Cambiaghi
Advisors: Dr. Manuela FERRARIO Prof. Giuseppe BASELLI Dr. Roberta PASTORELLI Tutor: Prof. Manuela Teresa RAIMONDI Supervisor of the PhD Program: Prof. Andrea ALIVERTI
To Davide
Abstract
Septic shock is one of the major complications in critically ill patients, with a mortality rate
reaching 40%, a high risk of second-line treatments and long-term physical and cognitive
impairments in survivors. Current treatments for septic shock are mainly devoted to restoring
homeostasis and preventing multiple organ failure by administering fluids and vasoactive agents to
avoid prolonged hypotension. Despite significant improvements in clinical care, accurate diagnosis
and risk stratification for septic shock patients remain a challenge, and clinicians are still far from
having found the optimal therapy. In this context, information at the molecular or cellular level
provided by omics analyses is of great importance for the development of new therapeutic targets
and for following individual response to therapy. Thanks to recent technological advances in
high-throughput omics analyses, this type of data is becoming more and more accessible, giving rise to
huge and heterogeneous datasets. In this framework, interest in metabolomics has increased,
since metabolites represent the terminal downstream products of the genome and consist of the
total complement of all low-molecular-weight molecules that cellular processes leave behind.
Hence, metabolomics studies are very promising for modelling complex and multifactorial syndromes,
such as septic shock, and may be a useful tool toward personalized medicine. In spite of this
progress, the management of metabolomics data is still an open challenge. Data mining and
machine learning approaches have recently been applied in this context, but several aspects still
have to be explored in order to obtain reliable tools.
The objective of this thesis is the exploration of machine learning and data mining techniques
for metabolomics data analysis and multilevel integration in two septic shock patient cohorts
selected from the ALBIOS and ShockOmics clinical trials. We focused on a homogeneous and
well-defined group of patients in the same condition (i.e. severe septic shock) and on a short temporal
window (i.e. 48 hours or one week after diagnosis). The models obtained highlighted the role of
lipids, alanine and plasmalogens. The identified pathways could be a further step toward
understanding the complex mechanisms, currently still under study, involved in the
pathogenesis and progression of septic shock.
Sommario
Septic shock is one of the major complications that can arise in intensive care patients; it is
characterized by a mortality rate approaching 40% and by severe long-term consequences
involving both cognitive and physical deterioration. Currently, the treatment of septic shock
aims to restore homeostasis and to prevent multiple organ failure through the intravenous
administration of fluids and vasopressor drugs so as to normalize blood pressure. Despite the
significant improvements of recent years, risk-based stratification of septic shock patients
already at the time of initial diagnosis remains difficult, and the optimal treatment is still far
from having been found. In this context, the information provided by omics data on the mechanisms
involved at the molecular and cellular level is fundamental both for identifying new therapeutic
targets and for continuously monitoring each patient’s response to therapy. Thanks to recent
technological progress in omics analyses, this type of data has become increasingly accessible,
giving rise to very large and heterogeneous databases. In particular, interest in metabolomics is
growing, because metabolites are the end products of cellular metabolism and can therefore
provide valuable indications of the molecular processes under way at a given instant. Metabolomics
studies are thus particularly promising for modelling a complex syndrome such as septic shock and
could be an excellent starting point toward the development of models for personalized medicine.
Nevertheless, despite this progress, the processing and analysis of omics data remain a challenge.
Recently, data mining and machine learning techniques have been used in this field, but much
remains to be explored before reliable methods can be obtained.
In light of the above, the objective of this thesis is the study of machine learning techniques
for the multiscale integration of omics data in two cohorts of septic shock patients selected from
the ALBIOS and ShockOmics clinical studies. We focused on homogeneous groups of patients in the
same condition (severe septic shock) and on a limited time window (48 hours or one week after
diagnosis). The models obtained highlight the role of lipids, alanine and plasmalogens. The
biological processes identified could be a step forward in understanding the complex mechanisms,
still under study, involved in the pathogenesis and progression of septic shock.
Table of Contents
List of abbreviations
Summary
1 INTRODUCTION
1.1 SHOCK
1.1.1 Definition, description and causes
1.1.2 Pathophysiology
1.2 SEPTIC SHOCK
1.2.1 Definition and incidence
1.2.2 Pathophysiology
1.2.3 Treatment of septic shock
1.2.4 The SOFA score
1.3 OMICS DATA
1.3.1 Metabolomics
1.3.2 Proteomics
1.4 MOTIVATIONS AND OBJECTIVES OF THE STUDY
1.4.1 Thesis outline
2 DATA MINING METHODS FOR OMICS DATA ANALYSIS AND INTEGRATION
2.1 THE MULTIPLE TESTING PROBLEM
2.2 LINEAR AND LOGISTIC REGRESSION MODELS
2.2.1 Shrinkage methods
2.3 FEATURE REDUCTION
2.3.1 The minimum-redundancy maximum-relevance (mRMR) algorithm
2.4 LINEAR METHODS FOR CLASSIFICATION
2.4.1 Linear Discriminant Analysis (LDA)
2.4.2 Partial Least Squares Discriminant Analysis (PLS-DA)
2.5 PERFORMANCE EVALUATION
2.6 PROBABILISTIC GRAPHICAL MODELS
2.7 FINAL CONSIDERATIONS
3 MORTALITY PREDICTION FOR SEVERE SEPTIC SHOCK PATIENTS: A TARGETED METABOLOMICS STUDY ON ALBIOS DATABASE
3.1 INTRODUCTION
3.2 MATERIAL AND METHODS
3.2.1 Study design, patients and clinical data
3.2.2 Univariate analyses for metabolomics data
3.2.3 Multivariate analysis
3.3 RESULTS
3.3.1 Clinical characteristics of the study population
3.3.2 Time-course of plasma metabolites and association with mortality
3.3.3 Association between metabolic patterns and mortality
3.3.4 Integrated clinical and metabolomics determinants of mortality
3.4 DISCUSSION
3.5 REMARKS
4 INTEGRATION OF METABOLOMICS AND PROTEOMICS: AN ANCILLARY STUDY ON ALBIOS DATABASE
4.1 INTRODUCTION
4.2 MATERIAL AND METHODS
4.2.1 Study design, patients and clinical data
4.2.2 Proteomics data analyses
4.2.3 Statistical analyses
4.2.4 Multivariate analysis
4.3 RESULTS
4.3.1 Clinical characteristics of the study population
4.3.2 Changes in protein expressions between groups
4.3.3 Time trend variation of proteins and metabolites
4.3.4 Multivariate analysis
4.4 EXPLORATIVE ANALYSES
4.5 DISCUSSION
4.6 REMARKS
5 CHARACTERIZATION OF A METABOLOMIC PROFILE ASSOCIATED WITH RESPONSIVENESS TO THERAPY IN THE ACUTE PHASE OF SEPTIC SHOCK
5.1 INTRODUCTION
5.2 MATERIAL AND METHODS
5.2.1 Study design, patients and clinical data
5.2.2 Statistical analysis
5.2.3 Data from targeted metabolomics analysis
5.2.4 Multivariate analyses
5.3 RESULTS
5.3.1 Clinical characteristics of the study population
5.3.2 Metabolic fingerprinting by untargeted metabolomics
5.3.3 Metabolic profiling by targeted metabolomics
5.3.4 Regression analysis for targeted metabolomics data
5.3.5 Regression models for targeted and untargeted metabolomics data
5.3.6 Discriminant analysis
5.4 EXPLORATIVE ANALYSES
5.5 DISCUSSION
5.6 REMARKS
6 DISCUSSION AND CONCLUSIONS
6.1 MAIN FINDINGS
6.2 LIMITS AND CLINICAL IMPACT OF THE STUDY
6.3 FUTURE DEVELOPMENTS
A METABOLOMICS ANALYSES
A.1 UNTARGETED METABOLOMICS BY FLOW INJECTION-TOF-MS
A.1.1 Samples preparation
A.1.2 Flow Injection-TOF MS/MS
A.1.3 MS Data Processing
A.1.4 Metabolite identification
A.2 TARGETED METABOLOMICS
B PROTEOMICS ANALYSES BY iTRAQ QUANTITATION
B.1 STUDY DESIGN
B.2 SAMPLE PREPARATION
B.2.1 Human Plasma depletion
B.2.2 In solution sample digestion
B.2.3 Peptide labeling
B.2.4 Sample clean-up and fractionation
B.3 LC-MS/MS ANALYSES
B.4 DATABASE SEARCH
B.5 DATA ANALYSIS
C Comparison of metabolomics profile of cardiogenic and septic shock patients
C.1 AIM OF THE ANALYSES
C.2 PRELIMINARY RESULTS
C.3 REMARKS
D LIST OF PUBLICATIONS
Bibliography
List of abbreviations
AA Amino acid
AIC Akaike Information Criterion
ATP Adenosine Triphosphate
AUC Area Under the Curve
BIC Bayesian Information Criterion
BN Bayesian Network
CE Capillary Electrophoresis
CI Conditional Independence
CS Cardiogenic Shock
CV Cross Validation
C1QA Complement C1q subcomponent subunit A
DAG Directed Acyclic Graph
DAMP Damage-Associated Molecular Pattern
DIC Disseminated Intravascular Coagulopathy
EGDT Early Goal Directed Therapy
ETC Electron Transport Chain
FA Fatty Acid
FDR False Discovery Rate
FIA-TOF-MS Flow Injection-Time-of-Flight Mass Spectrometry
FP False Positive
GC Gas Chromatography
HMDB Human Metabolome Database
HPLC High Performance Liquid Chromatography
ICU Intensive Care Unit
IDO Indoleamine 2,3-dioxygenase
iTRAQ Isobaric Tag for Relative and Absolute Quantitation
LC Liquid Chromatography
LCPUFA Long Chain Polyunsaturated Fatty Acid
LDA Linear Discriminant Analysis
lysoPC Lysophosphatidylcholines
MI Mutual Information
MN Markov Network
MODS Multiple Organ Dysfunction Syndrome
MOF Multiple Organ Failure
mRMR minimal-Redundancy Maximal-Relevance
MS Mass Spectrometry
MSE Mean Square Error
NMR Nuclear Magnetic Resonance
NR Non-responder
NS Non-survivor
OLS Ordinary Least Squares
OOB Out-Of-Bag
PAMP Pathogen-Associated Molecular Pattern
PC Phosphatidylcholines
PLA2 Plasma Phospholipase A2
PLS-DA Partial Least Squares Discriminant Analysis
PRR Pattern Recognition Receptor
Q-TOF Quadrupole Time-of-Flight
R Responder
RF Random Forests
ROC Receiver Operating Characteristic
ROS Reactive Oxygen Species
S Survivors
SM Sphingomyelins
SOFA Sequential Organ Failure Assessment
SS Septic Shock
SVM Support Vector Machines
TN True Negative
TNF Tumor Necrosis Factor
TP True Positive
VIP Variable Importance in Projection
Summary
According to reports from the United States and Europe, shock affects about one third of
patients in the Intensive Care Unit (ICU), for a total of more than 1 million victims a year. Septic
shock is a very common kind of shock and remains a major complication in critically ill patients,
owing to its high lethality (40%), high risk of second-line treatments and long-term physical and
cognitive impairments in survivors, with a 5-year mortality rate of 75%¹. Septic shock is defined as a
complication of sepsis characterized by a life-threatening organ dysfunction caused by a
dysregulated host response to infection, which involves circulatory, cellular and metabolic
abnormalities².
Although the pathophysiology of septic shock is not precisely understood, it has become
evident that it involves complex interactions between the pathogen and the host’s immune system.
More precisely, septic shock is characterized by an imbalance in the initial host response to infection
due to the fact that the physiological pro-inflammatory response is not adequately compensated by
anti-inflammatory mechanisms. This leads to an overwhelming and uncontrolled inflammation with
several harmful effects including: damage to cellular proteins, lipids and DNA by excessive
production of reactive oxygen species (ROS), compromised mitochondrial functionality, and
impairment of the coagulation cascade with subsequent formation of microvascular thrombi and
fibrin deposition. This latter event causes microcirculatory alterations which ultimately result in
poor tissue perfusion. Without adequate oxygen delivery, tissues deteriorate and, consequently,
organs begin to fail. This condition is known as multiple organ failure (MOF), i.e. organs not directly
injured by infection become dysfunctional due to systemic disorders involving immunoregulation
and endothelial dysfunction. When MOF occurs, the damage to tissues is already so extensive that
the patient is destined to die, even with adequate medical intervention³,⁴.
1 Iwashyna, T. J., Ely, E. W., Smith, D. M. & Langa, K. M. Long-term cognitive impairment and functional disability among survivors of severe sepsis. JAMA 304, 1787–1794 (2010).
2 Singer, M. et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA 315, 801–810 (2016).
3 Angus, D. C. & van der Poll, T. Severe sepsis and septic shock. N Engl J Med 369, 840–851 (2013).
4 De Backer, D., Orbegozo Cortes, D., Donadello, K. & Vincent, J.-L. Pathophysiology of microcirculatory dysfunction and the pathogenesis of septic shock. Virulence 5, 73–79 (2014).
Current treatments for septic shock are mainly devoted to restoring homeostasis and
preventing MOF. Since the transition to serious illness occurs within a few critical hours, it has been
speculated that early recognition and treatment could provide maximal benefit in
terms of outcome. For this reason, early goal-directed therapy (EGDT) has been attempted
for the treatment of shock patients in the ICU. EGDT consists of an early hemodynamic assessment
on the basis of physical findings and vital signs to detect persistent global tissue hypoxia, so as to
rapidly balance oxygen delivery with oxygen demand.
Despite significant improvements in clinical care, accurate diagnosis and risk stratification
for septic shock patients remain a challenge, and clinicians are still far from having found the optimal
therapy. Currently, the choice of treatment is based only upon the traditional concept of sepsis
progression and the corresponding clinical signs (i.e. organ hypoperfusion); thus it is not tailored to
the individual. Furthermore, given the highly variable and non-specific symptoms and clinical
presentations of this pathology, an early assessment of sepsis severity is not trivial and patients’
response to therapy is difficult to predict. The complex pathophysiology of sepsis suggests that a
single-biomarker approach cannot adequately describe this syndrome. In fact, traditional biomarker
strategies (e.g. measurement of plasma concentrations of C-reactive protein, procalcitonin,
cytokines, etc.) have not yielded a definitive biomarker panel, since they cannot discriminate
individual patient responses and outcomes. Therefore, a comprehensive and integrated analysis of
molecular and clinical measurements is needed to plan an early and appropriate therapeutic
intervention.
In this context, information at the molecular or cellular level provided by omics analyses (i.e.
genomics, transcriptomics, proteomics and metabolomics) may be a suitable means to follow
responsiveness to therapy, to establish new therapeutic targets, and to identify
patients amenable to tailored therapies. Recently, interest in metabolomics has been increasing, as it
may provide a more sensitive readout of individual response phenotypes. Metabolomics consists of
the analysis and quantification of thousands of metabolites, i.e. small molecular compounds which
constitute the end products of cellular metabolism and can thus be considered the chemical
fingerprint of an organism at a precise time point. Metabolomics approaches have become a
powerful tool for revealing molecular pathways and for identifying and quantifying differentially
expressed metabolites, independently of the multiple trigger factors causing the disease. This aspect
is very promising for complex and multifactorial syndromes, such as septic shock, and makes
metabolomics analyses a suitable starting point toward personalized medicine.
In this thesis work, we focused on a homogeneous and well-defined group of patients in the
same condition (i.e. severe septic shock) and on a short temporal window (i.e. one week or 48 hours
after diagnosis). The first cohort consists of a subset of 20 patients from the ALBIOS database
(Albumin Italian Outcome Sepsis study, NCT00707122)⁵, a multicenter clinical trial which enrolled
patients with severe sepsis or septic shock from 100 ICUs in Italy. The second cohort
consists of 21 septic shock patients from the ShockOmics dataset (NCT02141607)⁶.
Our objective was to provide a thorough description of the putative biological pathways which
characterize these patient cohorts, in order to suggest possible biomarkers to be validated in
further investigations. As hundreds of metabolites can be measured and each of them can be
associated with more than one pathway, it is important not only to find changes or differences in
metabolite concentrations, but also to find possible associations among them, which can point to
a prevailing pathway.
5 Caironi, P. et al. Albumin replacement in patients with severe sepsis or septic shock. N Engl J Med 370, 1412–1421 (2014).
6 Aletti, F. et al. ShockOmics: multiscale approach to the identification of molecular biomarkers in acute heart failure induced by shock. Scand. J. Trauma Resusc. Emerg. Med. 24, 9 (2016).
Thanks to recent technological advances in high-throughput omics analyses, omics data
are becoming more and more accessible, giving rise to huge and heterogeneous datasets which
require specialized mathematical, statistical and bioinformatics tools that are not yet fully available.
Some attempts to analyze and integrate omics data have been made, and some general statistical
tools, such as unsupervised multivariate data analysis, correlation analysis or principal components
analysis, have been used and are currently implemented in different software packages. However,
given the complexity of this kind of dataset, traditional statistical tests cannot be used alone for a
robust analysis. In fact, the main objective of these methods is data exploration by considering each
feature as independent from the other variables of the dataset. Consequently, they do not allow for
a general and valid categorization of selected variables or, as in our case, for biological and
functional interpretation. As an attempt to circumvent this problem, data mining and alternative
feature reduction methods are proposed in this dissertation as a novel strategy for selecting and
prioritizing variables. In fact, by considering each feature in relation to all the others, data mining
approaches make it possible to extract previously unsuspected information from omics datasets;
they can thus be used to develop classification models and to elucidate possible associations among
species, which could reveal the metabolic pathways involved in the condition under study⁷.
As already outlined, omics datasets are characterized by a large number of highly correlated
features (hundreds or thousands) compared to the number of observations. For this reason, it is
necessary to develop suitable strategies by combining different data mining techniques, designed ad
hoc for the specific scientific question and for the kind of data considered. As detailed in Chapter 2,
the approach proposed in this dissertation combines feature reduction techniques, linear and
logistic regression models and linear methods for classification.
Briefly, to perform feature reduction we adopted the minimal-redundancy-maximal-relevance
(mRMR) method proposed by Peng et al.⁸ This algorithm sorts the features according to
their relevance to the outcome (maximum relevance criterion) and to their redundancy (minimum
redundancy criterion) with respect to the other variables. The ranking is based on the mutual
information between the outcome and each feature and on the mutual information between each
pair of features. Subsequently, the reduced set of selected features was used to build
logistic regression models with the elastic net regularization approach. The elastic net performs both
variable selection and regularization in order to enhance the prediction accuracy and interpretability
of the statistical model it produces. This shrinkage regression method is effective in the case of several
highly correlated variables, since it performs continuous variable selection, causing some of the
regression coefficients to be exactly zero and thus eliminating redundant features. Moreover, the
subset of variables corresponding to non-zero coefficients can be considered as the ones mainly
associated with the outcome. Linear methods for classification, i.e. Linear Discriminant Analysis
(LDA) and Partial Least Squares Discriminant Analysis (PLS-DA), were also applied to our datasets.
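The mRMR ranking described above can be illustrated with a minimal sketch. It assumes that the features have already been discretized (metabolite concentrations are continuous, so a binning step would be needed first) and it uses the difference form of the criterion, i.e. relevance minus mean redundancy; the variant and settings actually used in the thesis are not specified here. The function names and the toy data (`met_a`, `met_a_dup`, `met_b`) are hypothetical and serve only as an illustration.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr_score(name, selected, features, relevance):
    """Relevance to the outcome minus mean redundancy with the selected features."""
    if not selected:
        return relevance[name]
    redundancy = sum(
        mutual_information(features[name], features[s]) for s in selected
    ) / len(selected)
    return relevance[name] - redundancy


def mrmr_rank(features, outcome, k):
    """Greedy mRMR ranking; returns the names of k features in selection order."""
    relevance = {n: mutual_information(v, outcome) for n, v in features.items()}
    selected, remaining = [], sorted(features)  # sorted for a deterministic tie-break
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda n: mrmr_score(n, selected, features, relevance))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 patients, binary outcome, three binarized features.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "met_a": [0, 0, 0, 1, 1, 1, 1, 1],      # informative
    "met_a_dup": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of met_a (fully redundant)
    "met_b": [0, 1, 0, 0, 0, 1, 1, 1],      # less informative, but complementary
}
print(mrmr_rank(features, outcome, k=3))
```

On this toy data the duplicated feature is ranked last even though its relevance equals that of `met_a`, because its redundancy with the already selected `met_a` outweighs its relevance. The reduced feature set would then be passed to the elastic net logistic regression.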
Additionally, explorative analyses by probabilistic graphical models were performed
with the aim of highlighting conditional dependences among features. Probabilistic graphical models
aim to capture the underlying probabilistic relations between the domain variables and to express
them via a graph structure that is easy to interpret. Specifically, the probabilistic relations are represented
as a network, where features constitute the nodes and each edge connecting them represents a
conditional dependence of the child node on the parent node. Thus, the absence of an edge
7 Baumgartner, C., Lewis, G. D., Netzer, M., Pfeifer, B. & Gerszten, R. E. A new data mining approach for profiling and categorizing kinetic patterns of metabolic biomarkers after myocardial injury. Bioinformatics 26, 1745–1751 (2010).
8 Peng, H., Long, F. & Ding, C. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance and Min-Redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1226–1238 (2005).
connecting two nodes implies that the two corresponding variables are conditionally independent,
given the other variables⁹. There are two main families of graphical representations: Bayesian
Networks, which use directed graphs (i.e. the edges are directed since they have a source and a
target), and Markov Networks (MNs), which use undirected graphs. For our analyses we applied
undirected MNs, which were built using a two-step approach. Firstly, the maximum likelihood
network was found using the algorithm of Chow and Liu¹⁰; then a forward search was performed on
each triangulated graph of the network until no more add-eligible edges were found.
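The first of the two steps can be sketched as follows. The Chow and Liu procedure weights every pair of variables by their mutual information and keeps a maximum-weight spanning tree; the sketch below assumes discrete variables and uses Kruskal's algorithm for the spanning tree, which is one possible choice. The subsequent forward search is not shown, and the variable names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def chow_liu_tree(data):
    """Return the edges of a maximum-weight spanning tree over pairwise
    mutual information.

    data: dict mapping variable name -> list of discrete observations
    """
    # Weight every pair of variables, strongest dependence first.
    weighted = sorted(
        ((mutual_information(data[u], data[v]), u, v) for u, v in combinations(data, 2)),
        reverse=True,
    )
    parent = {v: v for v in data}  # union-find forest for Kruskal's algorithm

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = []
    for _, u, v in weighted:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            edges.append((u, v))
    return edges


# Hypothetical chain x - y - z: y is a noisy copy of x, z a noisy copy of y,
# and x and z are empirically independent.
data = {
    "x": [0, 0, 0, 0, 1, 1, 1, 1],
    "y": [0, 0, 0, 1, 1, 1, 1, 0],
    "z": [0, 0, 1, 1, 1, 1, 0, 0],
}
tree = chow_liu_tree(data)
```

On this toy chain the tree keeps the edges x-y and y-z and drops x-z, so the absence of that edge reflects the conditional independence of x and z given y.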
The first study, presented in Chapter 3, is an explorative analysis aimed at providing absolute
quantitative information on changes in plasma metabolite levels measured one day and one week
after the development of severe septic shock, so as to relate these changes to mortality. The multicenter
ALBIOS clinical trial enrolled patients with severe sepsis or septic shock from 100 ICUs in Italy. Among
the 1818 patients included in this trial, only 20 were suitable for our study according to the
following inclusion criteria: presence of septic shock, total SOFA score > 8, serum lactate > 4 mmol/L,
and availability of plasma samples at day 1 (acute state, D1) and day 7 (steady state, D7) after
diagnosis of septic shock. These criteria were chosen to obtain a subset of patients homogeneous with
those of the ShockOmics clinical trial. We examined the plasma metabolome and clinical features of
these patients at both D1 and D7. Patients were classified into two groups according to their
survival status 28 days after study enrollment: survivors (S, 11 patients) and non-survivors (NS, 9
patients). The two time points were chosen to verify the hypothesis that metabolic changes
over time reflect not only the initial clinical characteristics but also the progression of the disease
and long-term survival. We applied a targeted mass spectrometry-based quantitative
metabolomics approach using the Biocrates platform coupled to a Triple-Quad 5500 LC-MS/MS
system. The association between metabolic patterns and mortality was assessed by univariate and
multivariate analyses, adjusted for clinically relevant variables. The outcome (S = 0, NS = 1) was
considered as the output of the final model. Performance was evaluated by means of the accuracy,
i.e. the proportion of correct classifications among the total number of cases examined. Overall, our
results showed that the metabolite species mainly involved are kynurenine and
lysophosphatidylcholines (lysoPCs), alterations of which have already been reported in septic
shock patients.
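As a small illustration of the accuracy measure defined above, the following sketch counts correct classifications using the outcome coding given in the text (S = 0, NS = 1). The labels are hypothetical and serve only to show the computation.

```python
def accuracy(y_true, y_pred):
    """Proportion of cases whose predicted class equals the observed class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical observed and predicted outcomes (S = 0, NS = 1).
observed = [0, 0, 0, 1, 1]
predicted = [0, 0, 1, 1, 1]
acc = accuracy(observed, predicted)  # 4 correct out of 5
```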
9 Nielsen, T. D. & Jensen, F. V. Bayesian Networks and Decision Graphs. Springer Science & Business Media (2009).
10 Chow, C. K. & Liu, C. N. Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14, 462–467 (1968).
In light of these findings, we speculated that lipid homeostasis and tryptophan catabolism
might influence mortality in septic shock. Therefore, to acquire a more complete view of the
pathways involved in this complex syndrome, we further investigated the changes that occurred in
metabolite concentrations. Chapter 4 illustrates these analyses. The objective of this study was to
better characterize non-survivor patients (NS) according to the variations that occurred in
metabolite concentrations from D1 to D7, expressed as the ratio D7/D1, and to integrate this
information with the variation in proteins and clinical data.
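The D7/D1 ratio can be computed per patient and per metabolite as in the minimal sketch below. The handling of missing or zero values is an assumption of the sketch, not something stated in the text, and the metabolite names and concentrations are hypothetical.

```python
def d7_d1_ratio(d1, d7):
    """Per-metabolite ratio D7/D1 for one patient.

    d1, d7: dicts mapping metabolite name -> concentration at day 1 / day 7.
    Metabolites missing at either time point, or with a zero D1 value,
    are skipped rather than imputed (an assumption of this sketch).
    """
    return {
        name: d7[name] / d1[name]
        for name in d1
        if name in d7 and d1[name] != 0
    }


# Hypothetical concentrations in arbitrary units, for illustration only.
day1 = {"met_A": 2.0, "met_B": 4.0, "met_C": 0.0}
day7 = {"met_A": 3.0, "met_B": 1.0, "met_C": 5.0}
ratios = d7_d1_ratio(day1, day7)
```

A ratio above 1 indicates an increase from D1 to D7 and a ratio below 1 a decrease.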
Protein signals, expressed as peak intensities, were measured by a multi-iTRAQ experiment
and the ratio D7/D1 was computed as for the metabolomics data. Three different classification
models were built: one for metabolites only, one for metabolites and proteins and one integrating
metabolomics and proteomics data with clinical parameters. Firstly, feature reduction by the mRMR
algorithm was performed in order to avoid multicollinearity. We considered the first 10, 20 and 30
ranked metabolites to build three different classification models. To find the most reliable set of
features, we fitted an elastic net logistic model 50 times on a training set, using a logit link function,
and selected the coefficients of the model with the minimal deviance.
We also applied another strategy: we used the shrinkage parameter λ corresponding to the model
with the minimal deviance to fit another elastic net model on a testing set and to obtain the
coefficients of the logistic regression. LDA and PLS-DA were also implemented. More precisely, LDA
was performed on the first 10 ranked metabolites and the coefficients for the linear boundary
between the first and second classes were retrieved. PLS-DA was performed on both the first 10
and the first 20 ranked metabolites, considering 3 PLS components. The results obtained confirmed that
early changes in plasma levels of lipid species are altered in non-survivors, as previously outlined. As
for proteins, the most important differences between the two groups concern proteins
involved in inflammation and in the coagulation cascade, which are two of the most important
pathways involved in septic shock progression. Finally, Markov Networks were built by combining
metabolomics and proteomics data as already done for the classification models. From a qualitative
analysis, it appears that features have different dependences in the two groups, possibly indicating
differences in the functioning of the molecular mechanisms involved. Overall, the results of this
study confirm the feasibility of our data mining approach for the analysis of metabolomic data and
for its integration with proteomics. With respect to our previous analyses on metabolomic data only,
the integration with proteomics seems to indicate the importance of the interaction between
inflammation, coagulation and the complement system in sepsis, which is in line with recent
findings.
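For reference, the elastic net logistic model used in the classification step above is commonly written as the penalized problem below. This is a sketch in one widespread parametrization; the text does not state which parametrization or software was used. Here N is the number of patients, y_i in {0, 1} the outcome, x_i the vector of selected features and λ the shrinkage parameter mentioned in the text, while the mixing parameter α (between 0 and 1) is an assumption introduced for illustration.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
+ \lambda\Big[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
  + \alpha\,\lVert\beta\rVert_1\Big]
```

The first term is the binomial deviance divided by 2N, so selecting the fit with minimal deviance corresponds to minimizing this term on the data at hand. The L1 part of the penalty is what sets some coefficients exactly to zero, while the L2 part stabilizes the estimates when features are highly correlated.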
Chapter 5 presents the study on the ShockOmics clinical trial database. The goal of this study was
to elucidate early metabolic signatures associated with the progression of septic shock and with
responsiveness to therapy, assessed as the change in SOFA score between admission (T1, acute
phase) and 48 hours later (T2, post-resuscitation). We examined the plasma metabolome of 21
septic shock patients, classified into two groups according to the following criterion: patients for
whom both SOFA_T2 > 8 and Δ = SOFA_T1 − SOFA_T2 < 5 were classified as not responsive to therapy (NR,
7 patients), the remaining as responsive (R, 14 patients). Untargeted and targeted mass
spectrometry-based metabolomics strategies were combined to cover as much as possible of the
plasma metabolite repertoire. Firstly, mass metabolic profiling, performed by direct Flow
Injection-Time of Flight-MS, was applied as an untargeted screening to explore the main perturbed
metabolic features. Afterwards, a targeted analysis to measure specific metabolic classes and the
magnitude of their variation was carried out with the same methods already outlined for the ALBIOS dataset.
The variation in metabolite concentrations from T1 to T2 was expressed as Δ = T2 − T1.
Two models were built: one for targeted metabolomics data and one which integrated data from
the targeted and the untargeted approaches. Also in this case, the mRMR algorithm was used for
preliminary feature selection. The top 10, 20 and 30 ranked features were used to build two elastic
net regression models as previously described. LDA and PLS-DA were also implemented. The
multivariate models showed that lower variation in the concentration of plasmalogens and of fatty
acids, in combination with a higher increment of alanine, was associated with non-responsiveness.
These findings support the emerging evidence that lipidome alteration plays an important role in
the individual patient response to infection; thus, understanding the regulatory pathways of lipids
is important for the development of an effective and tailored therapy. Furthermore, alanine
indicated a possible alteration in the glucose-alanine cycle, which occurs in the liver, thus providing a
different picture of liver functionality than bilirubin, which is usually used in the clinic. These results
were also strengthened by the explorative analyses performed with Markov Networks, from which it
appeared that lipid species and gluconeogenic amino acids are important regulatory nodes.
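The grouping criterion stated in this paragraph (NR when the SOFA score at T2 is above 8 and the improvement from T1 to T2 is below 5 points) can be written as a small function. This is an illustrative sketch only; the function name and the example scores are hypothetical.

```python
def classify_response(sofa_t1, sofa_t2):
    """Return "NR" (not responsive) or "R" (responsive) from two SOFA scores.

    Rule taken from the text: NR if SOFA_T2 > 8 and (SOFA_T1 - SOFA_T2) < 5.
    """
    delta = sofa_t1 - sofa_t2
    return "NR" if (sofa_t2 > 8 and delta < 5) else "R"


# Hypothetical SOFA scores at T1 and T2.
examples = [(12, 10), (14, 9), (10, 6)]
groups = [classify_response(t1, t2) for t1, t2 in examples]
```

Note that the sign convention differs from the one used for metabolites: the SOFA difference is computed as T1 minus T2, whereas metabolite variation is expressed as T2 minus T1.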
In conclusion, our results demonstrate the feasibility and the robustness of the proposed
approaches, in spite of the limited number of patients. In fact, the performance of our classification
models is good and the species identified are in line with other studies carried out on larger cohorts and
with investigations of the identified pathways. To the best of our knowledge, no other scientific
works have applied data mining techniques to perform multilevel omics analyses with the aim of finding
associations between plasma metabolome changes and mortality or responsiveness to therapy.
Thus, our findings represent a significant advance in the field and could be an important step toward
devising a personalized therapy.
Even if these results are very promising, some limitations must be discussed. An important
limitation of the study is the small size of the datasets used to build the predictive models.
However, we tried to reduce the confounding factors by focusing on a homogeneous group of
patients in severe septic shock, thus hypothesizing that the changes observed are mainly related to
shock progression and different prognoses. Another limitation is that metabolite concentrations
were measured at only two time points in each study (the second one at day 7 after shock diagnosis
in ALBIOS and 48 hours after admission in ShockOmics): a more
frequent monitoring of temporal changes in metabolites might provide better insights into the pathways
activated at different stages of the disease. Finally, we are aware that these results can be affected by
overfitting since we did not have an independent validation dataset. However, we must recall that
we are not interested in prediction but in the development of an approach to describe the current
datasets and to identify the main pathways involved in pathology progression within the studied
patient cohorts. A thorough investigation of such pathways requires specific experiments, which
take into account specific organs rather than the bloodstream, enzymes or other byproducts of such
pathways. Despite these limitations, our results can be considered a further contribution to the
current clinical scenario.
Specifically, we observed that low levels of lysoPCs are associated with poor outcome.
Recently, systemic lysoPC treatment has been shown to be effective in rodent models of sepsis and
ischemia¹¹. These observations seem to suggest that raising the plasma levels of these lipids can
actually help to relieve serious inflammatory conditions; thus, a systematic administration of lysoPC
therapies may be of great utility in the treatment of septic shock. Moreover, the integration analyses
of metabolomics and proteomics data highlight the interplay between lipid metabolism,
inflammation and coagulation. Inflammation and coagulation are strictly linked; it has therefore been
suggested that drugs acting on both pathways, such as aspirin, may interfere with the
pathogenesis of sepsis. In this regard, a study performed on septic shock patients taking aspirin
showed a reduction in the 30-day mortality rate compared to non-aspirin users, thus suggesting that
11 Yan, J. J., Jung, J. S. et al. Therapeutic effects of lysophosphatidylcholine in experimental sepsis. Nat. Med. 161–167 (2004).
low-dose aspirin administration (100 mg/day) could constitute a putative treatment for septic
shock¹².
In spite of these interesting findings, further investigations are needed to better elucidate
important pathophysiological mechanisms involved in septic shock progression and to identify novel
targets for the administration of a timely and effective therapy. Validation of the results on animal models
of septic shock (currently ongoing) will be performed in order to refine our assumptions and to better
describe the pathways involved.
A comparison with cardiogenic shock patients enrolled in ShockOmics could be a further step
toward identifying common pathways associated, for instance, with acute heart failure. This
information will be very useful to understand the molecular mechanisms which trigger MOF and
heart failure and which are thus independent of the root cause of shock. Eventually, by merging
the information gained from animal experiments and from the analyses on cardiogenic shock
patients, we aim to identify inflammatory mediators and molecular markers activated in shock and
to provide a list of putative biomarkers and pathways involved in its progression in order to guide a
timely early goal-directed therapy.
12 Falcone, M. et al. Septic shock from community-onset pneumonia: is there a role for aspirin plus macrolides
combination? Intensive Care Med. 42, 301–302 (2016).
1 INTRODUCTION
Septic shock remains one of the major problems in the Intensive Care Unit (ICU), with high
mortality and high-risk second-line therapies. Current treatments are mainly devoted to restoring
homeostasis and preventing multiple organ failure (MOF), but clinicians are still far from having found
the optimal therapy. Since the transition to serious illness occurs within a few critical hours, the so-called
“golden hours”, it has been speculated that early recognition and treatment could provide
maximal benefit in terms of outcome. For this reason, an attempt to use early goal-directed therapy
(EGDT) has been made for the treatment of shock patients in the ICU. EGDT consists of an early
hemodynamic assessment on the basis of physical findings and vital signs to detect persistent global
tissue hypoxia, so as to rapidly adjust cardiac preload, afterload, and contractility to balance oxygen
delivery with oxygen demand. Although several studies have been performed to evaluate its
efficacy, EGDT is still the subject of intense debate and a general consensus about its usefulness and
reproducibility has not yet been reached. Improvements in the survival rate of septic shock patients are
still modest, mainly due to the absence of predictive parameters for the monitoring of drug delivery
and patient response. Up to now, most studies have been devoted to finding associations with mortality
or with comorbidities, but a clear picture of therapy effectiveness is lacking.
In recent years, the research community has become more and more aware that individual
response to therapy is crucial and that precision medicine could be an important aspect of treating
acute conditions such as septic shock as well. Precision medicine is based on a multilevel approach
to tailor therapeutics to individual patients; it thus extends personalized medicine beyond the
genome to include broader systems (e.g. the proteome and the metabolome). Specifically,
interest in metabolomics has recently increased because metabolites represent the end result of gene
and protein function and activity and may therefore provide a more sensitive readout of drug
response phenotypes. Metabolomics datasets are very complex; thus, data mining and machine
learning approaches represent valuable techniques for analysis and multilevel data integration. In
fact, these methods can help in elucidating early multilevel marker signatures which could reveal
the molecular pathways involved in septic shock.
This first chapter of the dissertation introduces the context of the PhD thesis. A brief
overview of shock, with particular emphasis on septic shock, will be given. The pathophysiology of
this syndrome, the mechanisms which lead to multiple organ failure and its clinical management
will also be briefly illustrated. Afterwards, an overview of omics data will be presented and the
issues still hampering multilevel data integration will be addressed. A description of the different
approaches currently in use to analyze metabolomics and proteomics data will be provided. The
leading idea of the present study is the need to shed light on the pathways involved in septic shock
to promote early intervention and personalized treatment.
1.1 SHOCK
1.1.1 Definition, description and causes
Shock is a syndrome affecting one third of patients in the ICU and can be defined as a state of
organ hypoperfusion which causes an imbalance between oxygen delivery to tissues and oxygen
consumption, with resultant cellular dysfunction and death. Mortality is still very high, ranging from 20%
to 50%, and depends on different factors, such as the kind of shock, the source of infection and the
leading cause. According to the driving mechanism, shock can be classified as follows[1]:
• Hypovolemic shock: due to the loss of fluid from the circulation which leads to a critical decrease
in intravascular volume. Common causes are: excess fluid loss due to dehydration (e.g. after
severe vomiting or diarrhea) or to diseases which cause excess urination (e.g. diabetes and kidney
failure), extensive burns, blockage in the intestine, pancreatitis or severe bleeding (hemorrhagic
shock).
• Cardiogenic shock: it is a relative or absolute reduction in cardiac output due to a primary cardiac
disorder. This leads to impaired myocardial contractility and, as a consequence, to a dramatic
decrease in the ability of the heart to pump blood. The main causes include: heart attack,
myocarditis, disturbances of the electrical rhythm of the cardiac muscle, mass or fluid
accumulation and blood clots which interfere with the normal flow out of the heart.
• Obstructive shock: it is caused by mechanical factors (usually a physical obstruction) that
interfere with filling or emptying of the heart or of great vessels. Obstructive shock can be caused
by: pulmonary embolism, atrial tumor or clot and pericardial tamponade (i.e. accumulation of
fluid in the pericardial space resulting in a compression of the heart).
• Distributive shock: it results from dilation of blood vessels and pooling of blood in the peripheral
intravascular space. This typically occurs as a consequence of anaphylactic shock, adverse
neurogenic stimuli (e.g. spinal cord injury) or invasion of bacterial endotoxins which directly act
on blood vessels (e.g. sepsis). This latter case can lead to septic shock, a type of distributive shock
due to the progression of an infection.
In spite of the different root causes, all these kinds of shock have in common the collapse of
circulation and the resulting hypoperfusion, which causes tissue anoxia and MOF. Anoxia potentiates
the loss of vascular tone, which ultimately leads to death as a result of cardiorespiratory failure.
1.1.2 Pathophysiology
Shock represents a series of events that, if uninterrupted, act synergistically to produce vicious
cycles that ultimately result in the patient’s death (Figure 1.1)[1],[2].
Figure 1.1 - Pathogenesis of shock. Top row: possible causes of shock resulting in heart failure. Left: shock symptoms. Hypoperfusion of vital organs is the crucial effect in shock. Its consequences at cellular level are the shift to anaerobic metabolism and, if shock persists, cell anoxia and cell death. At systemic level, shock triggers neurohormonal mechanisms which eventually lead to multiple organ failure. From Damjanov (2000)[1].
Depending on its severity, shock can be clinically classified into three stages: (1) non-
progressive or compensated, (2) progressive, and (3) irreversible. Early stages of shock are reversible
and treatable; however, once serious organ failure ensues, shock becomes irreversible.
At cellular level, reduced perfusion of vital tissues implies that cells receive an amount of O2
which is inadequate for aerobic metabolism. This condition is known as anoxia. As a consequence,
cells shift to anaerobic metabolism which is characterized by increased production of CO2 and
accumulation of lactic acid. Cellular function declines, and, if shock persists, irreversible cell damage
and death occur. At systemic level, cardiac failure and the resultant hypoperfusion are initially
compensated by peripheral vasoconstriction (compensated shock). At the beginning,
vasoconstriction is selective, shunting blood to the heart and brain and away from the splanchnic
circulation. With shock progression, vasoconstriction involves the renal blood vessels and results in
renal hypoperfusion and in a decreased glomerular filtration rate. Low urine output and even anuria
are thus typical of this stage. Anoxia and anuria lead to metabolic acidosis which has a depressive
effect on the heart and further potentiates pump failure. At this stage, the compensatory
mechanisms are no longer adequate to counterbalance the loss of blood volume; blood
pressure therefore declines to very low levels and heart function begins to deteriorate (progressive
shock). Heart insufficiency raises intrapulmonary venous pressure, causing stagnation of blood in
the pulmonary circulation which favors the formation of pulmonary edema and affects the
alveolocapillary functional units. Lungs cannot function properly, and this further contributes to
general hypoxia.
As blood pressure further decreases, blood begins to clot in the small vessels. At the same
time, toxins are released from intestine and other tissues that suffer from severe ischemia. In fact,
tissue anoxia results in the release of numerous inflammatory cytokines which cause vasodilation
and promote fluid loss by increasing the permeability of the peripheral blood vessels. In response
to these events, intense tissue deterioration begins and, without adequate medical intervention,
progressive shock evolves into irreversible shock. This last phase of shock is characterized by
declining heart function and by the progressive dilation of peripheral blood vessels. These
events eventually lead to death, regardless of the amount or type of medical treatment applied. In
fact, when shock becomes irreversible, the organs begin to fail and so-called multiple organ
failure (MOF) occurs. Under this condition, organs not directly injured by the original trauma
become dysfunctional due to systemic disorders involving immunoregulation and endothelial
dysfunction. The damage to tissues, including cardiac muscle, is so extensive that the patient is
destined to die, even if adequate blood volume is reestablished and the blood pressure is restored
to its normal value.
1.2 SEPTIC SHOCK
1.2.1 Definition and incidence
Septic shock definition and clinical criteria have been recently revised by The Third
International Consensus Definitions for Sepsis and Septic Shock[3]. According to the new definition,
septic shock can be considered as a subset of sepsis, i.e. a syndrome characterized by a
life-threatening organ dysfunction caused by a dysregulated host response to infection. In septic shock,
the underlying circulatory, cellular and metabolic abnormalities are profound enough to
substantially increase mortality with respect to sepsis alone[3]. According to the updated guidelines
provided by the International Consensus[4], septic shock can be diagnosed by: i) hypotension (i.e.
systolic blood pressure < 90 mmHg or mean arterial pressure < 65 mmHg) and vasopressor
requirement to maintain a mean arterial pressure of 65 mmHg or greater; ii) serum lactate level
greater than 2 mmol/L (> 18 mg/dL) in the absence of hypovolemia. The persistence of both these
conditions is associated with hospital mortality rates higher than 40%[3],[5]. Moreover, previous
clinical studies have shown that hyperlactatemia and the trend of plasma lactate levels over time can be
considered reliable markers of severity and mortality[6].
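The lactate threshold is quoted above in two units, and the two criteria are combined with a logical "and". The sketch below reproduces the unit conversion, using the molar mass of lactic acid (90.08 g/mol), and encodes the criteria exactly as listed in the text. It is an illustration of the definition, not a clinical tool, and the function and parameter names are hypothetical.

```python
LACTATE_MOLAR_MASS_G_PER_MOL = 90.08  # lactic acid, C3H6O3


def lactate_mmol_per_l_to_mg_per_dl(mmol_per_l):
    """Convert a lactate concentration from mmol/L to mg/dL."""
    # mmol/L * g/mol gives mg/L; dividing by 10 gives mg/dL.
    return mmol_per_l * LACTATE_MOLAR_MASS_G_PER_MOL / 10.0


def meets_septic_shock_criteria(vasopressors_needed_for_map_65,
                                lactate_mmol_per_l,
                                hypovolemia):
    """Both criteria from the text must hold:
    i)  hypotension requiring vasopressors to keep MAP >= 65 mmHg;
    ii) serum lactate > 2 mmol/L in the absence of hypovolemia.
    """
    return bool(
        vasopressors_needed_for_map_65
        and lactate_mmol_per_l > 2.0
        and not hypovolemia
    )


threshold_mg_dl = lactate_mmol_per_l_to_mg_per_dl(2.0)  # about 18 mg/dL
```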
The clinical manifestations of sepsis and septic shock are highly variable and subtle,
depending on several factors, such as age, sex, ethnicity, health condition of the patient, initial site
of infection, type of infection, and interval elapsed before treatment administration. Incidence rates
are known to increase with age, probably due to age-related differences in immune function[7].
As reported by the Healthcare Cost and Utilization Project databases, sepsis accounted for
more than $20 billion, corresponding to 5.2% of total United States hospital costs, in 2011[8]. The
incidence of sepsis is continuously increasing and conservative estimates show that it is one of the
leading causes of mortality in critically ill patients worldwide[9]. In Europe, severe sepsis has an
incidence of 90.4 cases per 100 000 adults per year and an overall hospital mortality of 36%, as described in the last
Sepsis Occurrence in Acutely Ill Patients study[10]. Patients who survive severe sepsis or septic shock
often have long-term physical and cognitive impairment and are at risk of early death within
5 years, with mortality rates as high as 75%. This situation has a significant impact on health care
costs as well as important social implications[11].
1.2.2 Pathophysiology
Although the pathophysiology of septic shock is not precisely understood, it has become
evident that it involves complex interactions between the pathogen and the host’s immune system.
More precisely, the initial host response to an infection triggers subsequent compensatory
anti-inflammatory mechanisms, which on the one hand contribute to the clearance of infection and
tissue repair, and on the other hand are implicated in organ injury and in susceptibility to secondary
infections[4].
When an infection occurs, pathogens are recognized by the host through pattern recognition
receptors (PRRs, e.g. toll-like receptors), which are proteins expressed by cells of the innate immune
system to identify two classes of molecules: pathogen-associated molecular patterns (PAMPs), and
damage-associated molecular patterns (DAMPs). PAMPs are highly conserved and unique structures
of microbial pathogens whereas DAMPs, or alarmins, are generated endogenously in presence of
cellular damage or injury. After interaction with PAMPs, PRRs trigger a complex signaling system.
This results in a series of concatenated events including the release of inflammatory mediators and
reactive oxygen species (ROS), local vasodilation, increased endothelial permeability, and activation
of coagulation pathways. In sepsis and septic shock, the physiological inflammatory response is
overwhelming and not adequately compensated by the anti-inflammatory mechanisms which
should limit its potentially harmful effects. More specifically, excessive production of ROS
damages cellular proteins, lipids and DNA and compromises mitochondrial function.
Moreover, the impairment of the coagulation cascade promotes the formation of
microvascular thrombi and fibrin deposition, thus causing microcirculatory alterations which
ultimately result in poor tissue perfusion[4],[12]. The interactions between inflammation,
coagulation and complement activation during sepsis progression are illustrated in Figure 1.2.
Inflammation and coagulation are tightly interconnected in septic shock. In fact, uncontrolled
inflammation promotes disseminated intravascular coagulopathy (DIC), a syndrome characterized
by massive thrombin production and platelet activation, coupled with impaired fibrinolysis and
microvascular thrombosis. The combination of these events results in consumptive coagulopathy
and bleeding, which contribute to organ failure. DIC is a central event in the pathophysiology of
sepsis and one of the most important markers of poor prognosis[13].
Figure 1.2 - Schematic representation of the interactions between inflammation, coagulation and complement activation during sepsis progression. From Lupu et al. (2013)[13].
At systemic level, all the events described here result in inadequate oxygen delivery to
peripheral tissues and in consequent tissue anoxia, which leads to the same pathology progression
already described for shock.
1.2.2.1 Effect of hypoxia at cellular level: the role of mitochondria
It has been observed that patients with sepsis have dysfunctional mitochondria which are
damaged by high levels of ROS[14]. Mitochondria are membrane-bound organelles found in most
eukaryotic cells. They are usually described as the "cellular power plant" because most
of the cell's supply of adenosine triphosphate (ATP) is generated within them. In addition to supplying cellular
energy, mitochondria are also involved in other essential tasks such as signaling, cellular
differentiation, cell death, control of the cell cycle and cell growth. The number of mitochondria in
a cell varies widely by organism and tissue type, according to their energy demand[15],[16].
Mitochondria have a double membrane structure: the outer membrane encapsulates the
organelle whereas the inner one surrounds the central matrix space. These two phospholipid
membranes define four distinct compartments (Figure 1.3)[16],[17]:
1. outer membrane: it provides a permeability barrier and contains several integral proteins,
called porins, which regulate the exchange of substances;
2. intermembrane space: it has an ionic composition similar to that of the cytosol but it also contains a
distinct group of carrier proteins specific to the mitochondrion;
3. inner mitochondrial membrane: it is highly folded into cristae, which greatly increase its surface
area. It is a highly specialized membrane since its lipid bilayer contains cardiolipin, a four-tailed
phospholipid which makes the membrane especially selective for ions. This membrane also
houses the electron transport chain (ETC) complexes and respirasomes, thus giving structural
support to the phosphorylation apparatus;
4. matrix: it contains hundreds of metabolic enzymes, ribosomes, mitochondrial DNA and RNA. It is
here that ATP is produced through the oxidation of pyruvate and fatty acids, which then enter the
Krebs cycle (see Figure 1.4).
Figure 1.3 – Mitochondrion structure. It has a double membrane: the inner one contains the ETC apparatus and has deep grooves (cristae) which increase its surface area. The ATP synthesis occurs inside the mitochondrial matrix where also mitochondrial DNA and RNA are contained.
The particular structure of the mitochondrion provides a compartmentalization of its
metabolism: the membranes resemble a sieve which regulates substrate and waste product
exchange whereas all the reactions necessary for energy production occur in the matrix. The main
substrates for mitochondrial oxidation are pyruvate and fatty acids (FAs). Pyruvate comes from
glucose or other sugars originating from carbohydrate metabolism, while FAs come from fats. Both
these fuel molecules are transported across the inner mitochondrial membrane and then converted
to the crucial metabolic intermediate acetyl-CoA by enzymes located in the mitochondrial matrix
(Figure 1.4)[16],[17].
Figure 1.4 - Representation of classic pathways of cellular metabolism. Substrates (glucose and FAs) are transported across the cell membrane into the cytosol where they are activated to pyruvate and acetyl-CoA. These two metabolic intermediates are transported inside the mitochondrion by specific transport systems. Once inside, the substrates enter the Krebs cycle and their reducing equivalents are used by the electron transport chain to generate a proton gradient which is used for ATP production. From Doenst et al.[18].
The main mechanisms involved in cellular metabolism are the following:
• Glucose use: the glucose used for ATP generation either comes from uptake of exogenous
molecules or, to a lesser extent (<40%), from glycogen stores. Glucose is transported into the
cytosol by glucose transporters, including GLUT 1 and GLUT 4. In the cell, glucose is
phosphorylated to glucose-6-phosphate, which enters the glycolytic pathway. Glycolysis
generates pyruvate, which can be transported into the mitochondrial matrix where it is oxidized
to acetyl-CoA by the multienzyme complex pyruvate dehydrogenase[18]. This complex is a key
regulator of pyruvate oxidation since it is inhibited by accumulation of the end products of FA
oxidation[17].
• FA use: the oxidation of FAs is a complex process which occurs within the mitochondria and
represents the major source of energy for cells. The process by which FAs are broken down into
energy involves different kinds of proteins, listed in Table 1.1, and it can be divided into several
steps, as schematized in Figure 1.5[18],[19]:
1. uptake of FAs into the cytosol, facilitated by transport proteins and plasma membrane FA-
binding proteins (FABP, FAT)[20];
2. addition of a CoA group by fatty-acyl-CoA synthase (FACs) and formation of long chain acyl-
CoA, a temporary compound which can enter the mitochondria;
3. conversion of long-chain acyl-CoA into long-chain acylcarnitine by carnitine
palmitoyltransferase I (CPT I). This reaction represents a crucial regulatory node in FA
oxidation. In fact, this enzyme is subject to feedback inhibition by the acyl-CoA breakdown
product malonyl-CoA that builds up during high rates of FA oxidation. Accumulation of
malonyl-CoA reduces FA oxidation and further increases cytoplasmic free FA and acyl-CoA
metabolites leading to energy deficiency[21],[22].
4. transportation into the mitochondrial matrix via carnitine translocase (CAT). Inside the
matrix long-chain acylcarnitine is converted back to long-chain acyl-CoA by CPT II;
5. inside the mitochondrial matrix, long chain acyl-CoA molecules are broken down to acetyl-
CoA, which is then oxidized in the Krebs Cycle.
Figure 1.5 - Schematic representation of FAO. FAs primarily enter a cell via fatty acid protein transporters on the cell surface (FABP, FAT). Once inside, FACS adds a CoA group to the fatty acid which is then converted to acylcarnitine by CPT I. Acylcarnitine is transported by CAT across the inner mitochondrial membrane. Once in the matrix CPT II converts the acylcarnitine back to acyl-CoA which enters the fatty acid β-oxidation pathway, resulting in the production of one acetyl-CoA from each cycle of β-oxidation. Acetyl-CoA then enters the Krebs cycle. The NADH and FADH2 produced by both β-oxidation and the Krebs cycle are used by the electron transport chain to produce ATP[21].
• ATP production: the common end product of glucose and FA oxidation is acetyl-CoA, which
enters the Krebs Cycle. The cycle converts the carbon atoms from acetyl-CoA to CO2, which is
released from the cell as a waste product. The cycle also generates high-energy electrons,
carried by the activated carrier molecules NADH and FADH2. These high-energy electrons are
then transferred to the inner mitochondrial membrane, where they enter the ETC for oxidative
phosphorylation. As electrons move along this chain, energy is stored as an electrochemical
proton gradient across the inner membrane which is used to drive ATP synthesis[16]. Finally,
ATP is transported from the mitochondrial matrix to the cytoplasm through the adenine
nucleotide transporter, making energy available for cellular work.
PROTEIN | NAME | FUNCTION
Fatty acid binding protein | FABP | Peripheral membrane protein: traps FAs to facilitate their absorption[20].
FA translocase | FAT | Integral membrane protein: enables FAs to enter the cell[20].
Fatty acyl-CoA synthase | FACs | Cytosolic enzyme responsible for esterification of FAs to long chain fatty acyl-CoA[21].
Carnitine palmitoyltransferase I | CPT I | Located in the outer mitochondrial membrane, this enzyme converts acyl-CoA into acylcarnitine so that it can enter the mitochondrion.
Carnitine translocase | CAT | Shuttles the acylcarnitine across the inner mitochondrial membrane[19].
Carnitine palmitoyltransferase II | CPT II | Located in the inner mitochondrial membrane, this enzyme converts acylcarnitine back to acyl-CoA[23].
Table 1.1 – List of proteins involved in FA oxidation and main functions.
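The stoichiometry of the β-oxidation step described above can be illustrated with a short calculation. The following is only a minimal sketch, assuming a saturated fatty acid with an even number of carbon atoms; the function name and the returned fields are hypothetical and chosen for illustration. Each cycle removes one two-carbon unit and yields one NADH and one FADH2, while the last cycle leaves a second acetyl-CoA.

```python
def beta_oxidation_yield(n_carbons: int) -> dict:
    """Count beta-oxidation products for a saturated, even-chain fatty acid.

    Illustrative assumption: odd-chain and unsaturated fatty acids,
    which need additional steps, are not handled here.
    """
    if n_carbons < 4 or n_carbons % 2 != 0:
        raise ValueError("expected an even number of carbon atoms >= 4")
    cycles = n_carbons // 2 - 1   # each cycle shortens the chain by two carbons
    acetyl_coa = n_carbons // 2   # the final cycle releases two acetyl-CoA
    return {
        "cycles": cycles,
        "acetyl_coa": acetyl_coa,
        "nadh": cycles,           # one NADH per cycle
        "fadh2": cycles,          # one FADH2 per cycle
    }


# Palmitate (16 carbon atoms) as an example substrate.
print(beta_oxidation_yield(16))
```

For palmitate this gives 7 cycles and 8 acetyl-CoA molecules, which then enter the Krebs Cycle.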
ROS production is also an important aspect of the mitochondrial life cycle. Under normal
circumstances, about 98% of mitochondrial oxygen consumption is linked to ATP formation through
oxidative phosphorylation. A by-product of this process is the generation of ROS, a variety of
molecules and free radicals derived from molecular oxygen[24]. ROS are important in the redox
signaling from the mitochondrion to the rest of the cell but they also contribute to mitochondrial
damage in several pathologies. In fact, ROS production can lead to oxidative damage to
mitochondrial proteins and membranes, thus impairing the ability of these organelles to synthesize
ATP and to carry out their wide range of metabolic functions which are central to the normal
operation of cells. Mitochondrial DNA is also susceptible to attack by ROS. Since expression of the
entire mitochondrial genome is required to maintain the functional integrity of mitochondria,
mtDNA damage and depletion result in an impaired mitochondrial respiratory capacity and in cell
growth arrest[25]. Mitochondrial oxidative damage can also increase the tendency of mitochondria
to release intermembrane space proteins and thereby activate the cell apoptotic machinery[26].
Recent investigations indicate that mitochondrial damage contributes significantly to the
pathogenesis of sepsis-induced MOF through dysregulation of oxygen metabolism, i.e. cytopathic
hypoxia, accelerated oxidant production and promotion of cell death[25]. Cytopathic hypoxia results in
impaired oxygen utilization and development of tissue acidosis, as demonstrated by excess lactate
levels in the blood of septic shock patients. Cellular oxygen metabolism can be impaired by a number
of different mechanisms but there is evidence that it is primarily altered at the cellular level, particularly
in the mitochondria. In fact, during sepsis several critical components of the ETC are compromised,
so that mitochondria are no longer able to efficiently provide energy to the cell. In addition to tissue
hypoxia, some mediators of the innate immune system, such as tumor necrosis factor α (TNFα)
and various interleukins, have an important role in the etiology of cytopathic changes and
mitochondrial injury. In particular, TNFα acts via receptor-mediated signaling pathways and triggers
mechanisms which induce cytotoxicity. Mitochondrial death pathways are also involved in the
depletion of lymphocytes and intestinal epithelial cells thus compromising the host’s immune
response and physical barriers to infection.
1.2.3 Treatment of septic shock
The current treatment for septic shock patients is mainly devoted to restoring homeostasis and
to preventing MOF. The Surviving Sepsis Campaign[27] recently provided the updated guidelines for
the management of severe sepsis and septic shock, which we briefly summarize here. Within 6 hours
after shock diagnosis, initial aggressive resuscitation with vasopressor administration is provided to
patients with hypotension. In fact, septic shock patients require the administration of fluids, usually
crystalloids, and vasoactive agents, such as noradrenaline, in order to avoid prolonged hypotension.
They often receive lung support with a ventilator in order to achieve adequate oxygenation. In this
initial phase bacterial cultures are also collected to make a possible diagnosis and to start an
empirical antimicrobial therapy. Early antimicrobial therapy is of primary importance in the
treatment of septic shock, since the prompt administration of an appropriate therapy significantly
reduces mortality risk[28]. The choice of the therapy depends on several factors, such as the
suspected source of infection and the medical history of each patient. Overall, the aim of the
treatment is both to restore hemodynamic stability and to mitigate the effect of uncontrolled
infection. This initial stage is followed by the so-called supportive therapy: patients are continuously
monitored and a set of different procedures, aimed at supporting organ function, can be performed,
such as blood product administration, mechanical ventilation and renal replacement therapy. When
possible, de-escalation of therapy is done to avoid the emergence of resistant organisms and to
minimize the risk of drug toxicity[4],[27]. Despite the progress made in the understanding of
the underlying biologic features of sepsis, clinicians are still far from having found the optimal clinical
treatment, thus new strategies should be adopted to translate advances in molecular biology into
effective new therapies.
1.2.4 The SOFA score
To assess efficacy and cost-effectiveness of new therapies, mortality alone is not a sufficient
parameter. In clinical practice, the severity and progression of organ dysfunction is usually assessed
by means of scores. Several scoring systems have been introduced to quantify abnormalities
according to clinical findings, laboratory data and therapeutic treatments. The predominant one,
currently in use, is the Sequential Organ Failure Assessment (SOFA) score[3]. This score is based on
the assumption that organ failure is not an “all-or-none” phenomenon but rather a dynamic process
for which also the progression of events is important. A regular assessment of organ function is
therefore necessary to follow the evolution of the disease. The SOFA score estimates dysfunctions
regarding liver, kidney, coagulation, cardiovascular, respiratory and central nervous systems; it also
accounts for clinical interventions and laboratory variables like PaO2, platelet count, creatinine and
bilirubin levels. It is composed of six scores, one for each organ system (respiratory, cardiovascular,
hepatic, coagulation, renal and neurological), graded from 0 to 4 according to the degree of
dysfunction, 4 being the worst condition[29]. A higher SOFA score is thus associated with an
increased probability of mortality. The variation, the mean and the highest SOFA scores are all
predictors of outcome. Specifically, an increase of 2 points or more is associated with an in-hospital
mortality greater than 10%[3] and maximum SOFA greater than 15 points is associated with a
mortality rate above 90%[29].
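The way the six organ sub-scores combine into the total SOFA score, and how its variation, mean and maximum are tracked over time, can be sketched as follows. This is only an illustrative sketch: the function names and the example patient values are hypothetical, the sub-scores are assumed to be already graded from 0 to 4, and the two flags simply restate the thresholds quoted above (an increase of 2 points or more, a maximum greater than 15).

```python
ORGAN_SYSTEMS = (
    "respiratory", "cardiovascular", "hepatic",
    "coagulation", "renal", "neurological",
)


def total_sofa(subscores: dict) -> int:
    """Sum the six organ sub-scores (each graded 0-4) into the total SOFA score."""
    missing = set(ORGAN_SYSTEMS) - set(subscores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for organ in ORGAN_SYSTEMS:
        if subscores[organ] not in range(5):
            raise ValueError(f"{organ} sub-score must be an integer from 0 to 4")
    return sum(subscores[organ] for organ in ORGAN_SYSTEMS)


def summarize_sofa(assessments: list) -> dict:
    """Variation, mean and maximum of the total score over repeated assessments."""
    totals = [total_sofa(s) for s in assessments]
    delta = totals[-1] - totals[0]
    return {
        "totals": totals,
        "delta": delta,
        "mean": sum(totals) / len(totals),
        "max": max(totals),
        "increase_of_2_or_more": delta >= 2,   # threshold quoted in the text
        "max_above_15": max(totals) > 15,      # threshold quoted in the text
    }


# Hypothetical patient assessed at two time points (values invented for illustration).
first = {"respiratory": 2, "cardiovascular": 3, "hepatic": 1,
         "coagulation": 1, "renal": 2, "neurological": 1}
second = {"respiratory": 3, "cardiovascular": 4, "hepatic": 1,
          "coagulation": 2, "renal": 2, "neurological": 1}
print(summarize_sofa([first, second]))
```

The sketch deliberately leaves out the grading rules that map PaO2, platelet count, creatinine and bilirubin to each sub-score, since those are defined in the original SOFA publications[3],[29].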
1.3 OMICS DATA
Omics data (e.g. genomics, transcriptomics, proteomics and metabolomics) provide system-level
information for all types of cell components and interactions in an organism. Each type of omics
data describes a different step of the biological information flow, from DNA to the
expression of a particular cell phenotype (Figure 1.6).
Figure 1.6 - Flow of biological information represented also as omics data
Briefly, in every organism DNA (genomics) is first transcribed to mRNA (transcriptomics) and
translated into proteins (proteomics). Proteins, such as enzymes or transcription factors, catalyze
reactions of which metabolites (metabolomics), glycoproteins and oligosaccharides (glycomics) and
various lipids (lipidomics) are byproducts. All processes involved are generally dictated by different
kinds of molecular interactions: protein-DNA interactions in the case of transcription, protein-
protein interactions and enzymatic reactions in translational processes. Ultimately, the metabolic
pathways comprise integrated networks, or flux maps (fluxomics), which dictate the cellular
behavior or phenotype (phenomics)[30].
Although the knowledge of each type of omics data is crucial for global understanding of
cellular processes, this information alone is not sufficient to gain a comprehensive view of all the
biological mechanisms involved in the phenomenon under study. Integrated approaches combining
two or more omics fields (e.g. metabolomics and proteomics) are thus required to gain deeper
insights. Although this information is usually available, multilevel integration of different omics data
is still an open issue and effective data integration is far from being achieved. In fact, the handling,
processing, analysis and integration of omics data require specialized mathematical, statistical and
bioinformatics tools which are not fully available yet. Several technical problems are still hampering
a rapid progress in the field, and researchers usually have to compare multiple databases and to
manually extract and assemble the information needed[31]. Hence, it is easy to infer that a
meaningful comparison and exchange of omics data, obtained from different platforms or different
laboratories, are cumbersome. This is mainly due to the lack of standards for data formats, data
processing parameters, and data quality assessment. In addition to this, it is extremely challenging
to figure out how to actually analyze the huge amounts of data generated and how to deduce
biological insights. The necessity of an integrated pipeline for comprehensive analysis of complex
omics data sets has therefore become a critical aspect of multilevel data integration studies[32],
[33]. A more detailed review on omics data analysis and on the currently available software has
been published in Briefings in Bioinformatics (see footnote 1).
1.3.1 Metabolomics
Metabolomics is a rapidly growing field of biological sciences which has lately reached
widespread applications in many different areas, including molecular epidemiology, biomarker
discovery and identification, drug development and personalized health care[34]. It consists of the
analysis and quantification of thousands of metabolites, i.e. small molecular compounds (< 1500
Dalton) which constitute the end products of cellular metabolism and can thus be considered the
chemical fingerprint of an organism at a precise time point. Overall, the aim of a metabolomic study
is to correlate the changes in metabolite concentration with pathological states, or with the effect
of environmental influencing factors, such as drugs or contaminants[35].
1 A. CAMBIAGHI, M. Ferrario, M. Masseroli, “Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration”, Briefings in Bioinformatics (2016)
There are two analytical approaches to metabolomic analysis: targeted and
untargeted. Targeted metabolomics is the measurement of a small set of known compounds quantified
according to a standard. The choice of the set of metabolites is driven by a specific biochemical
question, so it includes one or more already defined pathways. The main limitation of the targeted
approach is that a priori knowledge of the compounds of interest is needed; therefore, this
method is less suitable for the discovery and identification of novel metabolic markers[36], [37]. The
untargeted approach is more global in scope since it does not depend on an a priori hypothesis. The
aim of this technique is to simultaneously measure as many metabolites as possible without any
bias. Variations in metabolite concentrations are observed as total changes of chromatographic
patterns without requiring the previous knowledge of the compounds under investigation. Each
metabolite produces one or more chromatographic peaks, which correspond to an ion with unique
mass-to-charge ratio and retention time. Masses are not precisely measured: it is possible to obtain
only their relative quantification, usually expressed as fold change[37], [38]. Metabolite
identification from chromatographic peaks is a manual and time-consuming process and, due to the
lack of annotation databases of high coverage, the association of metabolites with their spectra is
still a challenge[37].
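To make the notion of relative quantification concrete, the fold change of a chromatographic peak between two groups of samples can be computed as sketched below. This is only a minimal illustration: the function names and the peak intensities are hypothetical, and the ratio of group means on a log2 scale is one common convention, assumed here rather than taken from the procedure adopted in this study.

```python
import math
from statistics import mean


def fold_change(case_intensities, control_intensities):
    """Ratio of the mean peak intensity in cases to that in controls."""
    return mean(case_intensities) / mean(control_intensities)


def log2_fold_change(case_intensities, control_intensities):
    """Fold change on a log2 scale: +1 means doubled, -1 means halved."""
    return math.log2(fold_change(case_intensities, control_intensities))


# Hypothetical peak intensities (arbitrary units) for a single metabolite feature.
cases = [200, 220, 180]
controls = [100, 90, 110]
print(fold_change(cases, controls))
print(log2_fold_change(cases, controls))
```

The log2 scale is often preferred because increases and decreases of the same magnitude become symmetric around zero.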
An absolute targeted metabolomic approach may appear limited and risky when
compared to untargeted strategies, which emphasize the global behavior of the metabolome. In
fact, one may argue that a targeted approach might miss novel metabolic features linked to
metabolic derangement in the condition under study. However, untargeted approaches have
several drawbacks, such as difficulties in identifying all the detected signals, the reliance on the
intrinsic analytical coverage of the MS platform employed, the bias towards the detection of highly
abundant metabolites and the lack of absolute quantitative information on metabolites. Targeted
metabolomics, by contrast, provides quantitative information about the molar concentrations of the
metabolites involved in a pathway, facilitating the immediate understanding of any alterations
between different biological states.
1.3.1.1 Techniques for metabolomics data analyses
There are several techniques which can be used for metabolite profiling, each with its own
advantages and drawbacks. Therefore, a combination of different approaches is usually
applied to gain a broader perspective. The two main analytical methods are mass spectrometry
(MS) and nuclear magnetic resonance (NMR) spectroscopy[38].
MS is the most widely applied technology in metabolomics, due to its sensitivity and to the
wide range of covered metabolites. Mass spectrometers operate by ion formation, separation of
ions according to their mass-to-charge (m/z) ratio and detection of separated ions. MS can be used
to analyze biological samples either directly via direct-injection MS or coupled with
chromatographic or electrophoretic separation[39]. However, given the heterogeneity of the
landscape of the metabolome, this latter strategy is preferred to decrease sample complexity. The
most commonly used chromatographic separation techniques are gas chromatography (GC), high-
performance liquid chromatography (HPLC) and capillary electrophoresis (CE)[40]. After separation,
data are usually collected on a quadrupole time-of-flight (Q-TOF) mass spectrometer or by ion trap
instruments such as an Orbitrap[37]. The metabolomics analyses presented in this dissertation have
been performed by Q-TOF, as described in Appendix A. Q-TOF mass spectrometry is based on the
assessment of an ion's m/z by a time measurement. Briefly, ions are accelerated by an electric field
of known strength, thus each ion has the same kinetic energy as any other ion with the same charge.
The velocity of the ion depends on the m/z: for equal charge, the heavier ions are slower, whereas
for equal mass, ions with higher charge are faster. The time each ion takes to reach the detector
depends on its velocity, and therefore it is a measure of its m/z. From this ratio and from known
experimental parameters, it is possible to identify the ion.
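The relation between flight time and m/z described above can be sketched numerically. The following is an idealized illustration, assuming a linear field-free drift tube in which an ion of charge q accelerated through a potential U acquires kinetic energy qU; real Q-TOF instruments add further elements that are not modelled here. The function names, the 20 kV accelerating voltage and the 1 m drift length are assumptions chosen for illustration, not parameters of the instrument used in this work.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulomb
DALTON = 1.66053906660e-27            # kilogram


def flight_time(mass_da, charge, accel_voltage, drift_length):
    """Time (s) for an ion to cross the drift tube after acceleration.

    Energy balance: q * U = 1/2 * m * v**2, hence t = L / v.
    """
    mass_kg = mass_da * DALTON
    charge_c = charge * ELEMENTARY_CHARGE
    velocity = math.sqrt(2.0 * charge_c * accel_voltage / mass_kg)
    return drift_length / velocity


def mz_from_flight_time(time_s, accel_voltage, drift_length):
    """Invert the relation above: m/z (Da per elementary charge) from the flight time."""
    return (2.0 * ELEMENTARY_CHARGE * accel_voltage
            * (time_s / drift_length) ** 2 / DALTON)


# Illustrative settings (assumed, not taken from Appendix A).
U, L = 20_000.0, 1.0
t_light = flight_time(500.0, 1, U, L)
t_heavy = flight_time(1000.0, 1, U, L)
t_heavy_doubly_charged = flight_time(1000.0, 2, U, L)
print(t_light, t_heavy, t_heavy_doubly_charged)
print(mz_from_flight_time(t_light, U, L))
```

Under these assumptions the heavier singly charged ion arrives later, whereas the doubly charged ion of mass 1000 Da arrives together with the singly charged ion of mass 500 Da, since both have the same m/z.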
NMR is also used for metabolite detection since it is a rapid and non-destructive method
which requires only minimal sample preparation. NMR spectroscopy functions by the application of
strong magnetic fields and radio frequency pulses to the nuclei of atoms, thus it exploits the
magnetic properties of certain atomic nuclei to determine the physical and chemical properties of
molecules. The output of an NMR analysis is a spectrum which is often convoluted and hard to
interpret. This aspect and the low sensitivity of this technique make NMR inappropriate for the
analysis of large numbers of low-abundance metabolites[38],[39].
1.3.1.2 Metabolomics analysis workflow
In the following, we provide a brief overview of how a metabolomic analysis is
performed[37],[38],[41]. A typical metabolomic study consists of several steps, as illustrated in
Figure 1.7, which can be grouped as follows[31],[38],[42],[43]:
Figure 1.7 - Flow chart of a typical metabolomics study. After sample preparation, specific metabolic signals are acquired using heterogeneous analytical platforms (DATA ACQUISITION). Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analysis (DATA PROCESSING). Significantly expressed metabolites are then linked to the biological context, through enrichment and pathway analysis, and mapped into networks. Finally, metabolomics data are integrated with other ‘omics’ data and with prior knowledge to gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION).
1. SAMPLE PREPARATION: according to the kind of sample under analysis (i.e. blood plasma,
serum, urine, saliva, solid tissues or cultured cells), a different approach must be adopted. In
fact, a correct sample preparation is essential to ensure the optimal extraction of metabolites
and thus reduce experimental error[40],[44].
2. DATA ACQUISITION: different approaches are used to separate and chemically characterize
diverse groups of metabolites on the basis of both their chemical and physical properties. As
previously outlined, compound separation techniques such as GC, HPLC and CE are combined with
compound detection techniques, such as MS or NMR. Each method has different resolution,
sensitivity and technological limitations in identifying metabolites, thus it should be chosen in
accordance with the chemical and physical characteristics of each sample and with the kind of
analysis performed (targeted or untargeted)[40],[41],[43].
3. DATA PROCESSING: to facilitate compound quantification, the acquired raw signals
(chromatograms, spectra or NMR data) are pre-processed by ad hoc software tools such as the
commercial software SIEVE™ by Thermo Scientific (www.thermofisher.com) or the cloud-based
platform XCMS (Center for Metabolomics at the Scripps Research Institute[45]). Some groups
also use in-house scripts developed with different software (e.g. Matlab). The pre-processing
stage usually involves noise reduction, retention time correction, peak detection and
integration, and chromatogram alignment. In untargeted studies, metabolites have to be
identified from spectral information, usually by searching different databases, such as
the Human Metabolome Database (HMDB, http://www.hmdb.ca/[46]) or the MEtabolite and
Tandem MS Database (METLIN, http://metlin.scripps.edu[47]). Once the metabolites list is
ready, a statistical analysis is performed to find significant differences between sample sets. A
typical statistical analysis for metabolomic data consists of two phases: a more general analysis
using traditional statistical methods followed by a more focused investigation applying data
mining strategies. Overall, traditional statistical methods are used to gain a global view of the
considered datasets and to identify which metabolites significantly change under the studied
conditions. A limit of traditional approaches is that they highlight relationships among variables
based only on mathematical criteria (e.g., maximization of variance, or correlation), thus they
do not take into account correlations of biological origin[48]. These analyses should be combined
with data mining techniques, which allow better discrimination of groups of functionally related
metabolites (i.e., metabolite sets) which can be used for biological interpretation[36], [42].
Chapter 2 will present the main techniques of data mining, emphasizing the ones adopted in
this study.
4. DATA INTERPRETATION AND INTEGRATION: in this final step, the set of significant metabolites
previously found is linked to the biological context under study. In fact, to better understand
the biological role of each metabolite, the chemical information derived from metabolomics
analyses has to be related to both their biochemical origins and physiological
consequences[35],[43]. This can be done through enrichment and pathway analyses.
Enrichment analysis aims to investigate the enrichment (i.e., over and/or under-expression) of
predefined groups of functionally related metabolites in order to find significant expression
changes among them. Moreover, the identification of altered metabolites allows the selection of
specific biological pathways, or disease conditions, which can be further investigated[31].
Pathway analysis involves the description and visualization of the interactions among genes,
proteins, or metabolites within cells, tissues or organs. Its goal is to identify the pathways which
significantly impact a given biological process[49]. Enrichment and pathway analyses are
usually performed using specific software tools (e.g. MetPA[49]), which map significant
metabolites to known biochemical pathways on the basis of the information contained in public
databases (e.g. the Kyoto Encyclopedia of Genes and Genomes (KEGG))[50]. Pathway data are
usually presented as networks, with metabolites as nodes and reactions as edges. To obtain a
comprehensive view of all the biological processes involved, the information regarding the
metabolic pathways has to be integrated with transcriptomics and proteomics data[35],[51].
Integration with biological knowledge derived from the literature or from previous
experimental data is also suggested to reach a more reliable evaluation of the process under
study[43],[48].
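The enrichment step outlined in the workflow above can be illustrated with an over-representation test. The sketch below is a simplified example, assuming that enrichment is assessed with a one-sided hypergeometric test on the overlap between the significant metabolites and a predefined metabolite set; dedicated tools may rely on different statistics, so this should not be read as the method implemented by any specific software. The function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_significant, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    Probability of observing at least `n_overlap` members of a metabolite
    set of size `n_in_set` among `n_significant` metabolites drawn without
    replacement from `n_background` measured metabolites.
    """
    upper = min(n_in_set, n_significant)
    total = comb(n_background, n_significant)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 100 measured metabolites, 20 of them significant,
# a pathway containing 10 measured metabolites, 6 of which are significant.
p = enrichment_p_value(n_background=100, n_in_set=10,
                       n_significant=20, n_overlap=6)
print(p)
```

When many metabolite sets are tested, the resulting p-values would normally be corrected for multiple comparisons before interpretation.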
1.3.2 Proteomics
Proteomics refers to the large-scale analysis of the proteins encoded by the genome. It
involves the application of different technologies to detect and quantify the overall protein content
of a cell, tissue or organism in order to understand protein structure and function. Like
metabolomics, proteomics is used in many research fields such as biomarker discovery, vaccine
production and study of the alteration of expression patterns in response to different stimuli or
disease states. Proteomics analyses are very complex because they consist of the identification of
the protein signatures of the whole genome, which differ from cell to cell and from time to time.
For instance, the human genome harbors from about 26000 to 31000 protein-encoding genes,
whereas the total number of human protein products, including splice variants and essential
post-translational modifications, is estimated to be close to one million. In spite of the advance of new
technologies, comprehensive proteomics analysis of biological samples (e.g. plasma, serum or other
bodily fluids or tissues) has not been fully developed yet, mainly due to the high complexity of the
samples and to the wide dynamic range of protein concentrations. Therefore, processing and
analysis of proteomics data is a very long and complicated process[52],[53].
There are two main approaches to proteomic analyses: top-down and bottom-up. In top-down
proteomics, intact proteins or large fragments are ionized and analyzed by mass
spectrometry, whereas bottom-up approaches rely on peptides generated by proteolytic digestion of
protein samples. Since top-down proteomics is limited by protein size (<50 kDa), bottom-up
techniques are currently the most commonly used[54]. Like metabolomics, proteomics analyses can
be further divided into targeted and untargeted. Targeted proteomics experiments are hypothesis-driven,
thus they are designed to quantify a limited number of proteins (i.e. fewer than one hundred)
with very high precision. Untargeted proteomics studies instead aim at identifying as many proteins
as possible across a broad dynamic range.
Irrespective of the approach adopted, several different techniques can be applied to
perform a proteomics analysis[53]. Generally, there are two main strategies: gel-based and shotgun
proteomics, both of which include a great variety of analytical methods. Gel-based applications
consist of one-dimensional and two-dimensional polyacrylamide gel electrophoresis and they were
developed well before the term proteomics was coined. In spite of this, they are still
extensively used, mostly for qualitative experiments, protein separation and quantitative expression
profiling[55]. The main drawbacks of these techniques are the poor reproducibility and the inability to
detect certain classes of proteins, including acidic, basic, low-abundance and hydrophobic ones.
Shotgun proteomics techniques, also called gel-free or MS-based techniques, have become the most common
methods for proteomics analyses since they are more sensitive and reproducible than gel-based ones.
They include several methods among which multidimensional protein identification technology,
isotope-coded affinity tag, stable isotope labeling with amino acids in cell culture, and isobaric
tagging for relative and absolute quantitation (iTRAQ)[56]. This latter approach will be further
described in the following since data from iTRAQ experiments were analyzed in this study to
complement the metabolomics analyses. Additional details on the experimental procedure adopted in
this study can be found in Appendix B.
1.3.2.1 The isobaric tags for relative and absolute quantification (iTRAQ) method
The iTRAQ technique can be used for both relative and absolute quantitation and enables the
simultaneous analysis of 4 to 8 different samples. It is based on the use of stable isotope labeled
molecules (i.e. isobaric reagents) which covalently bind to the N-terminus and to the side-chain
primary amines of peptides and proteins. The iTRAQ analytical process is long and consists of different steps. Firstly,
proteins are isolated from the biological samples (protein extraction) to obtain a protein isolate.
Several strategies can be adopted for protein extraction according to the kind of sample being
analyzed and to the purpose of the study. For instance, extraction is more selective for a targeted
study than for an untargeted one. After extraction, highly abundant proteins have to be removed to
avoid any bias, usually by chromatographic separation techniques. Albumin is very abundant in
human plasma, thus it is usually removed prior to performing a proteomics analysis in order to
better detect and quantify lower abundance proteins. Before iTRAQ labelling, proteins are digested
using an enzyme, usually trypsin, to generate smaller proteolytic peptides. This is done since most
proteins are too big to fall within the limited mass range which a typical mass spectrometer can
measure. Each sample is labeled with a different iTRAQ reagent and then combined into one sample
mixture (Figure 1.8).
Figure 1.8 - Example of iTRAQ proteomic quantitation used on 6 different samples. After trypsin digestion, samples are labeled with individual mass tags and then combined into a single mixture for LC-MS/MS analysis. Since the masses of all of the tags are the same, identical peptides from different samples elute together in the LC column. After the analysis by tandem MS, the released tags make it possible to quantitate relative peptide intensities, while the peptide fragment ions are sequenced for protein identification.
The combined sample mixture is then analyzed by liquid chromatography and tandem mass
spectrometry (LC-MS/MS) for both identification and quantification. Liquid chromatography
separates the peptide mixture into smaller fractions in order to simplify subsequent
analysis and interpretation of results; eluted compounds are then injected into the mass spectrometer.
In the first round of MS, peptides are ionized and their mass-to-charge ratio measured to yield a
precursor ion spectrum. During the second MS, the isobaric tags are broken off and quantification
is performed based on the relative abundance of these tags. The relative quantity of a peptide
among the treated samples is determined by comparing the intensities of reporter ion signals also
present in the MS/MS scan[56],[57].
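The comparison of reporter ion intensities described above can be sketched as follows. This is only an illustrative example: the function names, the channel labels (114 to 117, as in a 4-plex experiment) and the intensities are assumptions introduced here, and summarizing peptide-level ratios by their median is one common choice rather than the procedure adopted in this study.

```python
from statistics import median


def reporter_ratios(reporter_intensities, reference):
    """Ratio of each reporter ion intensity to that of the reference channel."""
    ref = reporter_intensities[reference]
    if ref <= 0:
        raise ValueError("the reference channel has no signal")
    return {channel: intensity / ref
            for channel, intensity in reporter_intensities.items()}


def protein_ratio(peptide_spectra, channel, reference):
    """Median of the peptide-level ratios assigned to the same protein."""
    return median(reporter_ratios(spectrum, reference)[channel]
                  for spectrum in peptide_spectra)


# Hypothetical reporter ion intensities from three MS/MS spectra
# of peptides belonging to the same protein.
spectra = [
    {"114": 1000.0, "115": 2000.0, "116": 500.0, "117": 1000.0},
    {"114": 800.0, "115": 2000.0, "116": 400.0, "117": 800.0},
    {"114": 1200.0, "115": 1800.0, "116": 600.0, "117": 1200.0},
]
print(reporter_ratios(spectra[0], reference="114"))
print(protein_ratio(spectra, channel="115", reference="114"))
```

In this invented example the sample labeled with channel 115 shows a twofold higher abundance of the protein than the reference sample labeled with channel 114.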
1.3.2.2 Proteomics data analysis and interpretation
After data acquisition with any of the methods previously mentioned, the raw signals
obtained (e.g. chromatographic peaks for iTRAQ analyses) have to be translated into proteins and
into biological information. This is a complex process which can be summarized in four main
stages[33]:
1. DATA PROCESSING: raw data are processed to extract relevant information. After peak
detection, more elaborate algorithms are used to discriminate signal from noise and to derive
more accurate measurements.
2. PEPTIDE IDENTIFICATION: the acquired MS spectra are compared with those collected
in the available databases in order to find matches with known peptides. Several search engines are
available, such as Sequest, Mascot, Comet and X!Tandem. Protein identification via database
searches is computationally intensive and time-consuming. In fact, the assignment of spectra
to peptide sequences is not direct, but it involves matching and scoring large sets of
experimental spectra with predicted masses from fragment ions of peptide sequences. The
various search engines do not yield identical results as they are based on different algorithms
and scoring functions. This makes comparison and integration of results from different studies
extremely challenging[33].
3. PROTEIN IDENTIFICATION AND VALIDATION: in this step, identified peptides are reassembled
in silico into proteins. The association of identified peptides with their precursor proteins is a
very critical procedure since many peptides are common to several proteins, thus making
protein assignments quite ambiguous. For this reason, ad hoc tools such as ProteinProphet or
Mascot are used to assess the validity of the protein inference and associate a probability to
it[56]. These tools cleave every protein in the specified search database in silico according to
specific rules depending on the cleavage enzyme used for digestion and then they calculate the
theoretical mass for each peptide. Afterwards, the software computes a score based on the
probability that the peptides from a sample match those in the selected protein database and
derives the ones which better explain the observed peptides. Clearly, proteins with multiple
peptide matches have a much greater confidence in their assignment than proteins identified
only by one peptide[33]. Therefore, the outcome of a proteome analysis is usually a long list of
identified factors, associated to a probability score and ideally also to a quantitative value.
4. DATA INTERPRETATION AND INTEGRATION: once the proteomics analysis per se is finished and
the list of relevant proteins is ready, functional analysis is performed with the aim of revealing
pathways and interactions relevant to the biological question of interest. The first step of a
functional analysis is to connect the protein name to a unique identifier and then to its
associated Gene Ontology terms (http://www.geneontology.org). In this way proteins are
matched to their corresponding gene from which it is easier to infer the molecular pathway
involved. As in metabolomics, enrichment and pathway analyses are commonly performed in
order to link the list of significant proteins to the biological context under study. Proteins
involved in the chemical reaction and those that have regulatory influence are combined in so-
called pathway databases, such as KEGG, Reactome, Ingenuity Pathway Knowledge Base or
BioCarta. Almost all pathway databases are equipped with software that can also perform
enrichment analysis, so a single tool is sufficient to extract information on the pathways
involved and on their abundance[58].
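The in silico cleavage mentioned in stage 3 can be illustrated with a minimal Python sketch. It assumes the commonly cited trypsin rule (cleavage after lysine K or arginine R, except when the next residue is proline P); the function name and the example sequence are hypothetical, and real tools such as Mascot or ProteinProphet apply far more elaborate rules (missed cleavages, modifications, theoretical mass calculation) that are not reproduced here.

```python
def tryptic_digest(protein):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: cleave after K or R unless the following residue is P.
    Missed cleavages and modifications are ignored in this sketch.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing peptide that does not end with K or R
        peptides.append(protein[start:])
    return peptides


# Hypothetical sequence: the K at position 3 is followed by P, so it is not cleaved
peptides = tryptic_digest("AAKPGGRCCKDD")
print(peptides)  # ['AAKPGGR', 'CCK', 'DD']
```

Each resulting peptide would then be assigned a theoretical mass and compared with the observed spectra, which is the scoring step described above.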
1.4 MOTIVATIONS AND OBJECTIVES OF THE STUDY
Despite significant improvements in clinical care, accurate diagnosis and risk stratification
for septic shock patients remain an important challenge. In fact, the early assessment of sepsis
severity is complicated by the highly variable and non-specific symptoms and clinical presentations
of this syndrome. Moreover, the choice of treatment is based only upon the traditional concept of
sepsis progression and corresponding clinical signs (i.e. organ hypoperfusion), thereby therapies are
not optimized for individual patients[7],[59].
The complex pathophysiology of sepsis suggests that a single biomarker approach cannot
adequately describe and stratify patients affected by this syndrome. Traditional biomarker
strategies, which imply measuring the concentration of a panel of circulating proteins
(e.g. C-reactive protein, procalcitonin, cytokines, etc.), have not yielded a definitive set of
biomarkers, since they lack the sensitivity and specificity to discriminate individual patient
prognoses and outcomes. Therefore, a comprehensive and integrated analysis of molecular
measurements and multiple clinico-pathologic data may facilitate an early and appropriate
therapeutic intervention. Integration of omics and clinical data may thus provide a means to follow
responsiveness to therapy, to establish new therapeutic targets, and finally to enable identification
of patients prone to specific therapies. Overall, early patient stratification may improve septic shock
outcome thanks to a prompt intervention with a tailored therapy[60],[61].
Although the so-called omics technologies have been available for well over a decade,
technological advances in the field are continually increasing the feasibility and accessibility of this
kind of analysis, accompanied by a reduction in costs. In recent years, interest in metabolomics
has increased, since metabolites represent the terminal downstream products of the genome and
consist of the total complement of all low-molecular-weight molecules that cellular processes leave
behind. Since metabolite concentration levels vary as a consequence of genetic, physiological,
pathological or environmental changes, metabolomics studies can be applied in many different
fields and are thus very useful to reveal molecular pathways and for identifying and quantifying
differentially expressed molecules, independently from multiple trigger factors causing the disease
under investigation. This aspect is very promising for complex and multifactorial syndromes, such
as septic shock, and makes metabolomics analyses a suitable starting point toward personalized
medicine in this area.
Personalized medicine refers to the tailoring of medical treatment to the individual
characteristics of each patient and it is a concept of therapeutic and preventive approach for
disease, which takes into account individual variability in genes, environment and lifestyle[62]. All
these factors cannot be synthesized in a single omics analysis. Only a multilevel approach, which
integrates clinical measurements and different omics data, could elucidate the complex
pathophysiological mechanisms of a disease and may thus provide a more precise picture of the
pathways involved. It is worth underlining that high-throughput omics approaches generate a huge
amount of data, giving rise to very complex and heterogeneous datasets, which cannot be
analyzed using traditional statistical analysis; hence the increasing interest in data mining and
machine learning.
This thesis deals with the exploration of machine learning and data mining techniques for
metabolomics data analysis and multilevel integration in two cohorts of septic shock patients. We
focused on a homogeneous and well-defined group of patients in the same condition (i.e. severe
septic shock) and on a short temporal window (i.e. 48 hours or one week after diagnosis). The first
cohort consists of a subset of 20 patients from the ALBIOS database (Albumin Italian Outcome
Sepsis study, NCT00707122)[63], a multicenter clinical trial which enrolled patients with severe
sepsis or septic shock from 100 ICUs in Italy. The second cohort consists of 21 septic shock
patients from the ShockOmics dataset (NCT02141607)[64].
The primary objective of this thesis is the development of a strategy to analyze and integrate
omics data using data mining approaches in order to identify changes in metabolite patterns
associated with septic shock progression and patient response to treatment. As in other omics studies,
the number of features, e.g. hundreds of metabolite concentrations, is much higher than the
number of observations, i.e. the number of patients. Therefore, the novelty and value of this work
lie in the development of suitable and reliable strategies to cope with this situation: different data
mining techniques, tailored to the specific scientific question and to the kind of data
considered, were explored.
The models developed made it possible to highlight the molecular pathways involved and to suggest
a list of candidate biomarkers that could guide treatment design and so lower the mortality risk.
Indeed, the importance of this study is that it could be useful to physicians in septic shock
management and in the design of a therapy tailored to each patient. Moreover, the models
obtained by our data mining approach highlighted some species and pathways which could help in
understanding the complex mechanisms, currently still under study, involved in the pathogenesis
and progression of septic shock.
1.4.1 Thesis outline
This thesis is organized into six chapters, including two introductory chapters and one on
conclusions, future directions and clinical impact of the work. Three appendixes complete the
dissertation, giving further details on metabolomics and proteomics data analyses.
Chapter 1 illustrates the pathophysiological background, with particular emphasis on septic
shock and omics data. Chapter 2 provides an overview of the data mining techniques applied in
our analyses. Chapters 3 and 4 present the analyses performed on a cohort of patients from the ALBIOS
database (Albumin Italian Outcome Sepsis study, NCT00707122)[63]. Chapter 5 describes the
analyses performed on septic shock patients from the ShockOmics clinical trial (NCT02141607)[64].
Chapter 6 summarizes the results, illustrates the clinical impact of the work and outlines the future
steps. The appendixes contain additional information about the analytical protocols followed to
perform targeted and untargeted metabolomics analyses, details on proteomics data from
multi-iTRAQ experiments, and an exploratory analysis on the qualitative comparison of cardiogenic and septic
shock patients. The last appendix reports the complete list of the publications of the PhD candidate.
2 DATA MINING METHODS FOR OMICS DATA ANALYSIS AND
INTEGRATION
The main objective of traditional statistical methods is data exploration, considering each
feature as independent of the other variables in the dataset. With a statistical test it
is therefore possible to calculate whether confidence in a hypothesis exceeds a significance level, based solely
on a sample-based estimate. Such tests do not allow for a general and valid categorization of selected
variables or, as in our case, for biological and functional interpretation. To circumvent
this problem, data mining and alternative feature reduction methods constitute a better strategy
for selecting and prioritizing variables[65]. By considering each feature in relation to all the
others, data mining approaches make it possible to extract previously unsuspected information or
patterns from complex databases, such as the omics ones. It is important to underline that data
mining methods are not meant to replace classical statistical tests, but the two approaches should
be used in a complementary fashion: the former can be considered as hypothesis-generating
methods, while the latter can be used for hypothesis testing[66].
In this chapter, the machine learning techniques employed are reviewed and briefly
described. First, a concise introduction to linear regression methods is given, followed by
a more detailed explanation of logistic regression methods, specifically of the elastic net
technique. Afterwards, an overview of the feature reduction methods available in the literature is
provided, with particular emphasis on the minimum-redundancy maximum-relevance (mRMR)
algorithm, which has been applied in this study. Linear classification methods, specifically Linear
Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis (PLS-DA), are
described next. Finally, we briefly outline probabilistic graphical models.
2.1 THE MULTIPLE TESTING PROBLEM
Omics datasets are characterized by the so-called “curse of dimensionality” problem, which
arises when the number of features p is much higher than the number of observations n. In this
situation, the number of statistical tests to perform increases and, as a consequence, also the
probability of wrongly rejecting the null hypothesis (type I error) increases. In other words, when
setting a p-value threshold of, for example, 0.05, there is a 5% chance that the result is a false
positive, i.e. although results are statistically significant there is actually no difference in the group
means. While 5% is acceptable for one test, if we perform many tests on the data, then this 5% can result
in a large number of false positives. This is known as the multiple testing problem. Type I errors are
particularly undesirable in omics studies, as false findings may seriously affect the outcome. There are
two possible ways of handling this problem: the Bonferroni or the False Discovery Rate (FDR)
corrections. The Bonferroni correction is the classical approach used for the multiple testing
problem. Instead of setting the critical p-value for significance to 0.05, a lower critical value is used,
obtained by dividing the significance level by the number of comparisons. For instance, if there are 100 features
and thus 100 tests to perform, the critical value for an individual test would be
0.05/100=0.0005, so only features for which the p-value<0.0005 are considered significant. It
is evident that in the case of high-dimensional datasets this condition is too conservative, in the sense
that while it reduces the number of false positives, it also reduces the number of true
discoveries[67].
The FDR approach also determines adjusted p-values for each test, but it controls the
proportion of false discoveries only among those tests which are significant. This is different from the
Bonferroni correction, which guards against any false rejection across all tests. Because of this, the FDR
approach is less conservative and has greater power to find truly significant results. The FDR
correction calculates a corrected p-value, called q-value, for each tested feature. This q-value is a
function of the p-value and of the distribution of the entire set of p-values from the family of tests
being considered. For each feature, the associated q-value can be seen as the expected proportion
of false positives incurred when that feature is declared significantly different. Hence, a
feature having a q-value of 0.05 implies that 5% of the features showing p-values as small as that of this
feature are false positives. Specifically, a p-value of 0.05 implies that 5% of all tests will result in false
positives, whereas a q-value of 0.05 means that 5% of the significant tests will result in false
positives[36], [67]. Thus, when imposing a significance threshold, both the p-value and the FDR
correction should be taken into account.
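The difference between the two corrections can be illustrated with a short Python sketch. It assumes the Benjamini-Hochberg procedure as the FDR method (the text above does not state which FDR procedure is meant, and the adjusted values it returns are only loosely called q-values) and uses ten made-up p-values; the function and variable names are illustrative.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(k) * m / k for the k-th smallest p-value
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Made-up p-values for ten hypothetical features
pvals = np.array([0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
                  0.0278, 0.0298, 0.0344, 0.0459, 0.3240])
alpha = 0.05

bonferroni_cutoff = alpha / pvals.size            # 0.05 / 10 = 0.005
n_bonferroni = int(np.sum(pvals < bonferroni_cutoff))

qvals = benjamini_hochberg(pvals)
n_fdr = int(np.sum(qvals < alpha))

print(n_bonferroni, n_fdr)  # 3 8
```

On these toy values the Bonferroni cutoff retains three features while the FDR criterion retains eight, which is the sense in which the FDR correction is less conservative.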
2.2 LINEAR AND LOGISTIC REGRESSION MODELS
A linear regression model assumes that the regression function y=f(x) is linear in the inputs
x1, … xn, and that y is a continuous variable. The aim of regression models is to identify the function
f, which expresses the relationship between the target variable y, also called dependent variable or
outcome, and n explanatory variables xn, also termed independent variables or predictors. To
achieve a more convincing and sound interpretation, the functional relationship between the
dependent and independent variables, mathematically represented by f, should be of causal nature,
i.e. it should express a cause-effect nexus, with the independent variables xn playing the causal role
and the dependent variable y being the effect.
A linear regression model assumes that, given n input variables x1, … xn, the response y is
predicted or estimated by the linear function:
$$ y = \beta_0 + x_1\beta_1 + \dots + x_n\beta_n \tag{2.1} $$
where β = (β0, …, βn) are the unknown parameters or coefficients of the model, estimated by the
model fitting procedure. The most popular estimation method is ordinary least squares (OLS),
in which the coefficients are obtained by minimizing the residual sum of squares over the N observations [68]:
$$ \beta_{OLS} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{n} x_{ij}\beta_j \Big)^2 \tag{2.2} $$
where each regression coefficient βj represents the change in y for a one-unit change in the corresponding xj, with the other inputs held fixed. The input
variables are usually normalized, i.e. centered to have mean 0 and scaled to have standard deviation
1 (Z-score normalization). In this way the coefficients can be interpreted as weights.
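A minimal Python sketch of Equation 2.2 with Z-score normalized inputs is given below. The data are simulated and the "true" coefficients are arbitrary values chosen for illustration; `np.linalg.lstsq` is used here as one possible least squares solver and is not the tool used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 200, 3  # N observations, n input variables (simulated)

# Inputs on very different scales
X = rng.normal(loc=[5.0, -2.0, 10.0], scale=[1.0, 4.0, 0.5], size=(N, n))

# Z-score normalization: mean 0 and standard deviation 1 for every column
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

# Simulated response with arbitrary "true" coefficients (assumption)
true_beta0, true_beta = 1.5, np.array([2.0, 0.0, -1.0])
y = true_beta0 + Xz @ true_beta + rng.normal(scale=0.1, size=N)

# OLS: minimize the residual sum of squares, intercept as a column of ones
A = np.column_stack([np.ones(N), Xz])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(beta_hat, 2))  # intercept first, then one weight per input
```

Because the inputs share the same scale after normalization, the magnitudes of the fitted coefficients can be compared directly, which is what allows them to be read as weights.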
The goal of linear regression models is twofold. On the one hand, they highlight dependency
of the target variable on the predictors, thus enabling a functional and/or causal interpretation and
this was precisely the objective of our metabolomics data analyses. On the other hand, they can
also be used to predict the future value of the target attribute, based upon the functional
relationship identified and upon the future values of the explanatory attributes. Therefore, the
development of a regression model makes it possible to achieve a deeper understanding of the phenomenon
under study and to evaluate the effects on the target variable of different combinations
of values assumed by the predictors[69].
In spite of their simplicity, linear models provide an adequate and interpretable description
of how the inputs xn affect the output y. For prediction purposes, they outperform more complex
nonlinear models in some particular situations, e.g. with small numbers of training cases, low signal-
to-noise ratio or sparse data, thus they are broadly used in many different fields[68].
When the dependent variable y is categorical or binomial, for example in a classical
dichotomous problem (sick/healthy, dead/alive, etc.), we can consider f(x) as a reasonable estimate
of the posterior probabilities Pr(G = 1|X = x). However, f(x) can be negative or greater than 1, and
typically some are. These violations in themselves do not guarantee that this approach will not work,
and in fact on many problems it gives similar results to more standard linear methods for
classification. If we allow linear regression onto basis expansions h(x) of the inputs, this approach
can lead to consistent estimates of the probabilities. However, logistic regression is to be
preferred[68].
The logistic regression model arises from the desire to model the posterior probabilities of
the K classes via linear functions in x, while at the same time ensuring that they sum to one and
remain in [0, 1]. The model is specified in terms of (K – 1) log-odds or logit transformations,
reflecting the constraint that the probabilities sum to one. In the case of a dichotomous problem (K=2
classes) the model reduces to a single equation:
$$ \operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + x_1\beta_1 + \dots + x_n\beta_n \tag{2.3} $$
where p is the probability of the presence of the characteristic or class of interest. The logit
transformation is the natural logarithm of the odds, which are defined as:
$$ odds = \frac{p}{1-p} = \frac{\text{probability of the presence of the characteristic}}{\text{probability of the absence of the characteristic}} \tag{2.4} $$
Because the dependent variable is not continuous, the goal of logistic regression is
different from that of linear regression. In fact, it predicts the probability that y is equal to 1 given
certain values of x. That is, if x and y have a positive relationship, then the probability that an
observation will have y = 1 increases as the value of x increases. So, rather than
choosing the parameters that minimize the sum of squared errors (as in ordinary regression),
estimation in logistic regression selects the parameters that maximize the likelihood of observing
the sample values. Because of the logit function, the logistic regression coefficients are
not as easy to interpret; thus they are translated into the so-called odds ratios using the exponential
function. The odds ratio of a predictor is equal to $e^{\beta}$, where an odds ratio of 1 (i.e. β=0) indicates that there is no
relationship between x and y [68].
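The maximum likelihood estimation and the conversion of coefficients into odds ratios can be sketched as follows. This is an illustrative Newton-Raphson implementation on simulated data with one predictor; the function name, the simulated coefficients and the fixed number of iterations are assumptions made for the example, not the procedure used in this work.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood logistic regression via Newton-Raphson (sketch)."""
    A = np.column_stack([np.ones(len(y)), X])   # intercept column first
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))   # inverse of the logit
        gradient = A.T @ (y - p)                # gradient of the log-likelihood
        weights = p * (1.0 - p)
        hessian = A.T @ (A * weights[:, None])  # negative Hessian
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Simulated data: logit(p) = -0.5 + 1.0 * x (arbitrary values)
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
y = (rng.random(2000) < p_true).astype(float)

beta = fit_logistic(x[:, None], y)
odds_ratios = np.exp(beta[1:])   # exp(beta) for each predictor

print(np.round(beta, 2), np.round(odds_ratios, 2))
```

An odds ratio above 1 for x indicates that the odds of y = 1 grow as x increases, while a value of exactly 1 would correspond to β = 0, i.e. no relationship.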
2.2.1 Shrinkage methods
When considering linear regression models, according to the Gauss-Markov Theorem, the
OLS coefficients are the best linear unbiased estimators. This means that, among all linear estimates
with no bias, they have the smallest variance, and thus the smallest mean squared error (MSE).
However, there exist biased estimators with smaller MSE which trade a little bias for a larger
reduction in variance and thus perform better in the presence of many correlated variables. In this case,
in fact, the $\beta_{OLS}$ can become poorly determined and exhibit high variance: a large positive
coefficient on one variable can be cancelled by a similarly large negative coefficient on its correlated
counterpart. This issue can be partially mitigated by imposing a size constraint on the $\beta$, applying
methods which shrink the regression coefficients by imposing a penalty on their size. For example,
ridge regression[70] is a continuous shrinkage method which minimizes the residual sum of squares
while applying a penalty on the (squared) L2-norm¹ of the regression coefficients:
$$ \beta_{ridge} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{n} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{n} |\beta_j|^2 \right\} \tag{2.5} $$
where λ is a complexity parameter that controls the amount of shrinkage[68].
Even though ridge regression generally achieves better prediction performances than OLS, it cannot
produce a parsimonious model since it always keeps all the predictors.
To obtain a more easily interpretable model, a technique called the lasso (“least absolute
shrinkage and selection operator”) was proposed by Tibshirani and colleagues[71]. The lasso shrinks
some coefficients and sets others to zero, thus performing subset selection at the same time: only
the variables mainly associated with the outcome obtain a non-null coefficient and are thus included
in the model. The lasso imposes an L1-norm² penalty on the regression coefficients, which are
estimated by:
$$ \beta_{lasso} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{n} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{n} |\beta_j| \right\} \tag{2.6} $$
where the only difference between the ridge regression and lasso is the penalty term (L2 and L1
norm respectively). Although the lasso is successful in many situations, it has two main
limitations[72]:
• the number of variables that the lasso can select, out of the p available, is bounded by the number of
observations n in the dataset, e.g. in the case $p \gg n$, the lasso selects at most n variables before
it saturates;
• it fails to perform group selection; if there is a group of highly correlated variables, the lasso
tends to select only one variable and ignores the others.
¹ The L2-norm, also called Euclidean norm or distance, is expressed as $\lVert\beta\rVert_2 = \sqrt{\beta_1^2 + \dots + \beta_n^2}$
² The L1-norm is defined as $\lVert\beta\rVert_1 = |\beta_1| + \dots + |\beta_n|$
These issues make the lasso an inappropriate variable selection method when variables are grouped
and when $p \gg n$, as is often the case in datasets containing biological data (e.g. gene
microarray analysis, proteomics or metabolomics).
To solve the problems highlighted above, Zou and colleagues proposed a new regularization
technique, called the elastic net[72]. Similarly to the lasso, the elastic net simultaneously performs
automatic variable selection and continuous shrinkage, but it can also select groups of correlated
variables. The elastic net estimator $\beta_{el\,net}$ is the minimizer of the equation:
$$ \beta_{el\,net} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{n} x_{ij}\beta_j \Big)^2 + \lambda_2 \sum_{j=1}^{n} |\beta_j|^2 + \lambda_1 \sum_{j=1}^{n} |\beta_j| \right\} \tag{2.7} $$
where the elastic net penalty is a combination of the ridge and the lasso penalties. More precisely, the
elastic net selects variables like the lasso, and shrinks together the coefficients of correlated
predictors like ridge regression.
The penalty term for the three models can then be expressed as follows:
$$ P_\alpha = \sum_{j=1}^{n} \left[ \tfrac{1}{2}(1-\alpha)\,\beta_j^2 + \alpha\,|\beta_j| \right] \tag{2.8} $$
For $\alpha = 0$ we obtain the ridge penalty, for $\alpha = 1$ the lasso, and for $0 < \alpha < 1$ the elastic net. In this
latter case, the closer $\alpha$ is to 0, the closer the model is to ridge regression. The algorithm presented here is the so-called
naïve elastic net, which does not perform satisfactorily unless it is very close to either ridge
regression ($\alpha \approx 0$) or the lasso ($\alpha \approx 1$). In fact, the parameters are penalized twice with the same $\alpha$:
this double shrinkage does not decrease variance and introduces extra bias. This issue can be
overcome by imposing:
$$ P_\alpha = (1-\alpha)\,|\beta_j| + \alpha\,\beta_j^2 \quad \text{where} \quad \alpha = \frac{\lambda_2}{\lambda_1+\lambda_2} \tag{2.9} $$
Thus, we obtain the elastic net estimators as:
$$ \beta_{el\,net} = (1+\lambda_2)\,\beta_{el\,naive} \tag{2.10} $$
The elastic net produces a sparse model with good prediction accuracy, while encouraging a
grouping effect. Empirical results demonstrate that the elastic net not only performs well
but also outperforms the lasso, particularly when dealing with several
groups of correlated variables, as in our datasets.
The lasso and elastic net techniques can be used for variable selection and shrinkage with
any linear regression model. For logistic regression, the lasso estimator can be written as:
$$ \beta_{lasso} = \underset{\beta}{\operatorname{argmin}} \left\{ \frac{1}{N}\,Dev(\beta_0, \beta) + \lambda \sum_{j=1}^{n} |\beta_j| \right\} \tag{2.11} $$
where Dev is the deviance of the model fit to the responses using β0 as intercept and β as
predictor coefficients. The larger the deviance of the observed values from the expected ones,
the poorer the fit of the model.
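As for the elastic net, i.e. for α strictly between 0 and 1, we obtain:
$$ \beta_{el\,net} = \underset{\beta}{\operatorname{argmin}} \left\{ \frac{1}{N}\,Dev(\beta_0, \beta) + \lambda P_\alpha(\beta) \right\} \tag{2.12} $$
where
$$ P_\alpha(\beta) = \frac{(1-\alpha)}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1 = \sum_{j=1}^{n} \left[ \frac{(1-\alpha)}{2}\,\beta_j^2 + \alpha\,|\beta_j| \right] \tag{2.13} $$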
In this work, the elastic net model was developed in the Matlab® environment with the lasso
and lassoglm functions, available in the Statistics and Machine Learning Toolbox.
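To make the role of the penalty in Equation 2.13 more concrete, the following Python sketch implements a plain coordinate descent for the linear (squared-error) case. It is not the Matlab implementation used in this work: the objective is assumed to be (1/(2N)) times the residual sum of squares plus λPα(β), the inputs are assumed to be Z-score normalized, and the function names, the simulated data and the values of λ and α are all illustrative.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrinks z towards 0 by gamma."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def elastic_net(X, y, lam, alpha, n_sweeps=200):
    """Coordinate descent for
    (1/(2N)) * RSS + lam * sum((1 - alpha)/2 * b_j**2 + alpha * |b_j|).
    Assumes the columns of X are centered (sketch, no convergence check)."""
    N, n = X.shape
    beta0 = y.mean()                 # intercept for centered inputs
    beta = np.zeros(n)
    resid = y - beta0
    col_ms = (X ** 2).mean(axis=0)   # mean square of each column
    for _ in range(n_sweeps):
        for j in range(n):
            # Correlation of column j with the partial residual
            rho = X[:, j] @ resid / N + col_ms[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ms[j] + lam * (1.0 - alpha))
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta0, beta


# Simulated data: columns 0 and 1 are identical (extreme correlation),
# column 2 is pure noise; only the shared signal drives y
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1, rng.normal(size=100)])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)

_, b_lasso = elastic_net(X, y, lam=0.5, alpha=1.0)   # lasso
_, b_enet = elastic_net(X, y, lam=0.5, alpha=0.5)    # elastic net

print(np.round(b_lasso, 3))  # one of the two identical columns is dropped
print(np.round(b_enet, 3))   # the two identical columns share the weight
```

On this toy example the lasso keeps only one of the two identical variables, whereas the elastic net assigns them equal coefficients, which is the grouping effect described above.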
2.3 FEATURE REDUCTION
The purpose of feature reduction (or feature selection) is to eliminate from the dataset a
subset of uninfluential attributes, which are not suited to accurately explain the investigated
phenomenon. The key concept of feature selection is that correlated variables provide no extra
information about the classes, thereby they constitute noise for the predictor and are possible
source of bias. This implies that the total information content can be obtained only from fewer
unique features, which have maximum discrimination information about the classes[73]. Although
the elastic net or lasso penalty performs feature selection, reducing the number of variables before
building the model can have several advantages. Indeed, not only the computational time of the
learning algorithm decreases, but also the models generated are more robust, accurate and easier
to understand[69]. Moreover, feature reduction is extremely useful when a model is affected by
multicollinearity, as often happens with biological variables. In fact, if we have high collinearity
and a condition where $p \gg n$, the algorithm for estimating the coefficients can fail, the overall
significance of the model is compromised and the estimate of the regression coefficients can be
inaccurate. In the case of linear regression, multicollinearity and the $p \gg n$ condition do not permit
computing the hat matrix ($H = X(X^T X)^{-1}X^T$, with $\hat{y} = X\beta = Hy$), as $X^T X$ is not invertible.
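The rank deficiency behind this statement can be checked numerically. The sketch below uses a simulated matrix with 20 observations and 100 features (arbitrary sizes chosen for illustration) and also shows, as an aside, that adding a ridge-type term λI restores full rank.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_feat = 20, 100                 # far fewer observations than features
X = rng.normal(size=(n_obs, n_feat))    # simulated data matrix

gram = X.T @ X                          # n_feat x n_feat matrix to be inverted
rank = np.linalg.matrix_rank(gram)
print(rank, n_feat)                     # the rank cannot exceed n_obs

# A ridge-type term lam * I makes the matrix full rank (lam is arbitrary here)
lam = 1.0
rank_ridge = np.linalg.matrix_rank(gram + lam * np.eye(n_feat))
print(rank_ridge)
```

Since the rank of the product is bounded by the number of observations, the inverse required by the hat matrix does not exist whenever the features outnumber the observations.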
To remove redundant features, a criterion which measures the relevance of each feature
with the output class must be applied. Several methods have been developed for this purpose and
a brief overview of the main ones is provided in the following.
According to the literature, feature reduction algorithms can be broadly classified into filter,
wrapper and embedded methods. Filter methods are named after the fact that they are applied
before classification in order to “filter out” less relevant attributes. They perform feature selection
by ordering features according to their relevance: a suitable ranking criterion is applied to score the
model variables and a threshold is set in order to remove the variables which fall below it. In this
way, only top ranked features are selected and used for prediction. Feature relevance, i.e. the
usefulness of a feature in discriminating the different classes, can be measured according to several
different criteria, among which the most commonly used are conditional independence, correlation
and mutual information[74]. Filter methods are simple and robust against overfitting, but the best
features subset selected may not be unique or the optimal one[73].
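A filter method of this kind can be sketched in a few lines of Python. The example ranks features by the absolute Pearson correlation with the outcome, which is only one of the possible criteria listed above, and keeps the top k; the data are simulated and all names are illustrative.

```python
import numpy as np


def filter_by_correlation(X, y, k):
    """Score each feature by |Pearson correlation| with y and keep the top k."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    ranking = np.argsort(scores)[::-1]   # best feature first
    return ranking[:k], scores


# Simulated data: only features 2 and 7 are related to the outcome
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 2] - 2.0 * X[:, 7] + rng.normal(scale=0.1, size=200)

top, scores = filter_by_correlation(X, y, k=2)
print(sorted(top.tolist()))
```

Note that the ranking is computed once, before any model is trained, which is what makes filter methods fast but also blind to interactions among the selected features.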
Unlike filter methods, in wrapper methods the predictor is “wrapped” in a specific search
algorithm, which selects the features yielding the highest predictor performance. Since evaluating all
the 2^n subsets of features is an NP-hard problem, suboptimal subsets have to be found heuristically.
Two classes of algorithms are used for this purpose: sequential selection and heuristic search
algorithms. Sequential selection algorithms are iterative: they start with an empty set (or full set)
and, at each iteration, they add (or remove) a feature according to its classification accuracy. At
each step, the new subset is evaluated according to the predictor performance and the process is
repeated until the required number of features is reached. Heuristic search algorithms instead
evaluate different features subsets in order to optimize the predictor performance. Different
features subsets can be generated either by searching around the features space or by generating
new solutions, e.g. by randomly selecting without replacement the variables in the dataset.
Although wrapper methods could in principle find the best feature subset, they are prone to
overfitting[73],[74].
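Sequential forward selection, the simplest of the wrapper strategies described above, can be sketched as follows. As a stand-in for the predictor, the example uses a least squares fit scored by its residual sum of squares on the training data; this choice, the simulated data and the function names are assumptions made for brevity, and a realistic wrapper would rather use a cross-validated performance measure to limit overfitting.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of a least squares fit on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return float(residuals @ residuals)


def forward_selection(X, y, k):
    """Start from the empty set and add, at each step, the best feature."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < k:
        best = min(remaining, key=lambda c: rss(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Simulated data: only features 2 and 7 are related to the outcome
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 2] - 2.0 * X[:, 7] + rng.normal(scale=0.1, size=200)

selected = forward_selection(X, y, k=2)
print(sorted(selected))
```

Each step refits the predictor once per candidate feature, which illustrates why wrapper methods are slower than filter methods.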
In embedded methods feature selection is incorporated in the process of building the model,
thus feature reduction and learning are not two separate stages but continuously interact. As
a result, these methods simultaneously attempt to select features and determine model
parameters[75],[76]. Embedded methods can be further divided into three categories: built-in,
pruning and regularization methods. The first set of models has a built-in mechanism for feature
selection and is represented by classification and regression trees. Pruning methods train a model
with all features, then they attempt to remove some of the features by setting the associated
coefficients to 0, while keeping the model performance. Some examples are support vector machine
and nearest shrunken centroids. The regularization methods aim to minimize the fitting error and
in the meantime to force the coefficients to be small. The coefficients which are close to 0 are then
removed[75].
Every family of feature selection methods (i.e. filter, wrapper and embedded) has its own
advantages and drawbacks, therefore they have to be chosen according to the problem under
investigation and the nature of the data being analyzed. In general, filter methods are fast, since
they do not incorporate learning; wrapper methods are slower than filter, since they evaluate the
model performance at each iteration. Embedded methods tend to have higher capacity than filter
methods and are therefore more likely to overfit. Filter methods usually perform better when the
training set is small, as in our datasets, whereas embedded methods outperform filter methods
when the number of observations is much higher than the number of features[76].
2.3.1 The minimum-redundancy maximum-relevance (mRMR) algorithm
The feature reduction approach adopted in this study was the minimum-redundancy
maximum-relevance (mRMR), a filter method based on mutual information (MI) [77]. As the name
suggests, this algorithm combines two criteria, the maximal relevance (Max-Relevance) and minimal
redundancy (Min-Redundancy). The Max-Relevance selects the features with the highest relevance
to the target, estimated in terms of MI. Given two continuous random variables x and y, their mutual
information $I(x, y)$ is defined in terms of their probability density functions as:
$$I(x, y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy \tag{2.14}$$
MI is zero if x and y are independent, greater than zero if they are dependent. Given a feature subset
S with n features $x_i$ and a target class c, the Max-Relevance criterion consists in searching for the features
which satisfy:
$$\max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) \tag{2.15}$$
The features with the highest MI value are the most correlated to the outcome. However, it has
been recognized that combining the individually most relevant features does not
necessarily lead to good classification performance; this criterion alone is therefore not enough to
select the optimal feature subset. Variables selected according to the Max-Relevance
criterion are likely to be redundant, i.e. to carry the same information content as the others. In spite of
being strongly dependent on the target class, such variables do not add any meaningful information to the
model: if one of them is removed, the class-discriminative power of the subset
does not change. A criterion to reduce redundancy has thus been introduced to select mutually
exclusive features:
$$\min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \tag{2.16}$$
Redundancy is thus expressed as high values of MI among the features. The minimum-redundancy
maximum-relevance (mRMR) criterion combines the two constraints 2.15 and 2.16 according to:
$$\max \Phi(D, R), \quad \Phi = D - R \tag{2.17}$$
For our analyses we used a two-stage approach. In the first stage, we found a smaller set of
candidate features by applying the mRMR algorithm. Then, we used the selected features to build
the models with the elastic net technique. This allowed us to select a compact set of superior
features and to obtain models with very good accuracies. The source code of mRMR,
developed by Peng[77], is freely available.
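As an illustration of how the relevance and redundancy terms interact, the following is a minimal Python sketch of a greedy, incremental mRMR search on discrete features. It is not the implementation by Peng used in this study: the function names, the plug-in MI estimator based on counts and the toy data are assumptions introduced here, and continuous metabolite concentrations would first need to be discretized or handled with a density-based MI estimator.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Plug-in estimate of I(a; b), in nats, for two equally long discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    return sum(
        (nab / n) * log(nab * n / (pa[va] * pb[vb]))
        for (va, vb), nab in joint.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR: at each step add the feature with the largest
    relevance minus mean redundancy with the features already selected."""
    k = min(k, len(features))
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    while len(selected) < k:
        best_name, best_score = None, float("-inf")
        for name, col in features.items():
            if name in selected:
                continue
            redundancy = (
                sum(mutual_information(col, features[s]) for s in selected) / len(selected)
                if selected else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical toy data: "a" and "a_copy" are identical, "b" carries different information.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a":      [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b":      [0, 1, 0, 0, 1, 1, 0, 1],
}
print(mrmr(features, target, k=2))
```

In this toy example the duplicated feature is individually more relevant than "b", but it is fully redundant with "a", so the second pick falls on "b"; a ranking based on Max-Relevance alone would have kept both copies.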
2.4 LINEAR METHODS FOR CLASSIFICATION
Linear classification methods deal with the problem of finding the best subdivision of the
variable domain so as to assess to which group a sample is most likely to belong. More precisely, in
a classification problem every observation $x_i$ in the data matrix X is associated with a qualitative
outcome (or class) $y_i$, which can take only values from a discrete set G. Since the predictor
$\hat{G}(x)$ also takes values in G, the input space can always be divided into a collection of regions labeled
according to the classification. Depending on the prediction function, the boundaries of these
regions can be smooth or rough. For an important class of procedures, called linear methods for
classification, these boundaries are linear[68]. There are several specific algorithms useful for
classification problems; here we briefly describe Linear Discriminant Analysis (LDA) and Partial Least
Squares Discriminant Analysis (PLS-DA), which have been applied in this study. Both methods were
implemented in the Matlab® environment: LDA with the fitcdiscr function, available in the Statistics and
Machine Learning Toolbox, and PLS-DA with the freely available source code developed by Li et
al.[78].
2.4.1 Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a supervised classification method based upon the
concept of searching for a linear combination of variables (predictors) that best separates two
classes. The goal of LDA is to project a feature space (i.e. a dataset of n-dimensional samples) onto
a smaller subspace of dimension k (where k ≤ n−1) while maintaining the class-discriminatory information and
avoiding overfitting by minimizing the error in parameter estimation. If we consider a two-class
classification problem, we can define separability by means of a score function S:
$$S(\beta) = \frac{\beta^T \mu_1 - \beta^T \mu_2}{\beta^T C \beta} \tag{2.18}$$
where β are the coefficients of the linear model and $\mu_1$, $\mu_2$ are the mean vectors of the two classes. Given the score
function S, the problem is to estimate the linear coefficients which maximize the score; it can be
solved as follows:
$$\beta = C^{-1}(\mu_1 - \mu_2) \tag{2.19}$$
with C being the pooled covariance matrix, expressed as:
$$C = \frac{1}{n_1 + n_2}(n_1 C_1 + n_2 C_2) \tag{2.20}$$
where $C_1$, $C_2$ are the covariance matrices and $n_1$, $n_2$ are the numbers of samples in the two classes.
One way to assess the effectiveness of the discrimination is to calculate the Mahalanobis distance
∆ between the two groups:
$$\Delta^2 = \beta^T(\mu_1 - \mu_2) \tag{2.21}$$
where a distance greater than 3 means that the two averages differ by more than 3 standard
deviations. This implies that the overlap (i.e. the probability of misclassification) is quite small.
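The computation of the discriminant coefficients, the pooled covariance matrix and the Mahalanobis distance (Eqs. 2.19-2.21) can be illustrated with a short Python sketch. It is only an illustrative example restricted to two features, so that the pooled covariance matrix can be inverted in closed form; it is not the Matlab fitcdiscr implementation used in this study, and the function names and toy data are hypothetical.

```python
from math import sqrt


def mean_vector(rows):
    """Column-wise mean of a list of samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows, mu):
    """Covariance matrix of a list of samples (divides by n)."""
    n, p = len(rows), len(mu)
    return [
        [sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / n for b in range(p)]
        for a in range(p)
    ]


def lda_two_features(class1, class2):
    """Return (beta, Mahalanobis distance) for a two-class, two-feature problem."""
    n1, n2 = len(class1), len(class2)
    mu1, mu2 = mean_vector(class1), mean_vector(class2)
    c1, c2 = covariance(class1, mu1), covariance(class2, mu2)
    # Pooled covariance matrix, Eq. 2.20.
    c = [[(n1 * c1[a][b] + n2 * c2[a][b]) / (n1 + n2) for b in range(2)] for a in range(2)]
    # Closed-form inverse of the 2x2 pooled covariance matrix.
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det], [-c[1][0] / det, c[0][0] / det]]
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    # Discriminant coefficients, Eq. 2.19.
    beta = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Squared Mahalanobis distance, Eq. 2.21.
    delta_sq = beta[0] * diff[0] + beta[1] * diff[1]
    return beta, sqrt(delta_sq)


# Hypothetical toy data: two well separated groups with identity covariance.
group1 = [(0, 0), (2, 0), (0, 2), (2, 2)]
group2 = [(4, 4), (6, 4), (4, 6), (6, 6)]
beta, delta = lda_two_features(group1, group2)
print(beta, delta)
```

With more than two features the closed-form inverse would be replaced by a general linear solver, which is also where the requirement of having fewer variables than observations becomes apparent: otherwise the pooled covariance matrix is singular.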
LDA has two main limitations. First, the number of variables needs to be smaller than the
number of observations; hence, when the number of variables exceeds the number of samples,
feature reduction has to be performed before applying this algorithm. The other disadvantage is
that LDA does not take into account the differing variance structure of each group, since it only uses
a single pooled covariance matrix C. This approach is not appropriate for a dispersed group, for
which a relatively large distance from the mean may be less significant than for a compact one.
We want to remark that, with two classes, there is a simple correspondence between linear
discriminant analysis and classification by linear least squares. In fact, if we code the
targets in the two classes as +1 and −1 respectively, it is easy to show that the coefficient vector
from least squares is proportional to the LDA direction[79].
2.4.2 Partial Least Squares Discriminant Analysis (PLS-DA)
Partial least squares discriminant analysis (PLS-DA) is a supervised method mainly used for
classification purposes, i.e. to determine to which group a sample is most likely to belong, given
a set of measurements. In the two-dimensional case, the PLS-DA algorithm can be regarded as a
linear two-class classifier which aims to find a straight line that divides the space into two regions.
When there are more than two variables, the decision function is represented by a hyperplane in
the multidimensional space. To optimize the separation between groups, PLS-DA links the two data
matrices X (raw data) and Y (class membership) by maximizing the covariance between them. The
original variables are thus summarized into fewer new ones (called PLS components or latent
variables) with the constraint of explaining as much as possible of the covariance between X and Y [80],
[81].
The PLS-DA algorithm is derived from PLS regression and involves building a regression
model between X and Y by decomposing the two matrices into the product of a common set of scores and specific
loadings. The fundamental PLS-DA equations are the following[79]:
$$X = TP + E \tag{2.22}$$
$$Y = Tq + f \tag{2.23}$$
where T is the common score matrix, i.e. it contains the projections of X and Y onto the hyperplane
representing the maximum covariance between them. P and q are the loading matrices, which
contain the directions of the hyperplane with respect to the X and Y variables. Thereby, the loading
matrices give information about how each variable influences the model. Finally, E and f are the
residuals, i.e. the errors of X and Y left unaccounted for by the model. The PLS-DA algorithm works as
follows:
1. Calculate the PLS weight vector w
$$w = X^T Y \tag{2.24}$$
2. Calculate the scores, given by
$$T = \frac{Xw}{\sqrt{\sum w^2}} \tag{2.25}$$
3. Calculate the X loadings by
$$P = \frac{T^T X}{\sum T^2} \tag{2.26}$$
4. Calculate the Y loadings (a scalar)
$$q = \frac{Y^T T}{\sum T^2} \tag{2.27}$$
5. Subtract the effect of the new PLS component from the data matrix to obtain a residual data
matrix
$$E = X - TP \tag{2.28}$$
$$f = Y - Tq \tag{2.29}$$
Scores and loadings are computed in such a way that the first score of X has the maximum
covariance with the first one of Y, and so on. Variable importance in a PLS-DA model can be measured
by the Variable Importance in Projection (VIP) score, which expresses the contribution of each
variable in the model. The VIP score of a variable is calculated as a weighted sum of the squared
correlations between the PLS-DA components and the original variable and the weights correspond
to the percentage of variation explained by the PLS-DA component in the model. Once a model is
built, it is possible to predict the value of Y both for the original data and for new samples.
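The five steps of the algorithm listed above can be sketched in Python for a single PLS component and a single response vector. This is a hedged illustration and not the code by Li et al. used in this study: it assumes that X has already been column-centered, and the function name and toy data are introduced here only for the example. Further components would be obtained by repeating the same steps on the residuals E and f.

```python
from math import sqrt


def pls_component(X, y):
    """Extract one PLS component from column-centered X (list of n rows, p columns)
    and a response vector y of length n."""
    n, p = len(X), len(X[0])
    # Step 1 (Eq. 2.24): weight vector w = X^T y.
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Step 2 (Eq. 2.25): scores t = X w / sqrt(sum of w^2).
    norm_w = sqrt(sum(wj * wj for wj in w))
    t = [sum(X[i][j] * w[j] for j in range(p)) / norm_w for i in range(n)]
    tt = sum(ti * ti for ti in t)
    # Step 3 (Eq. 2.26): X loadings = t^T X / sum of t^2.
    loadings = [sum(t[i] * X[i][j] for i in range(n)) / tt for j in range(p)]
    # Step 4 (Eq. 2.27): Y loading q = y^T t / sum of t^2 (a scalar).
    q = sum(y[i] * t[i] for i in range(n)) / tt
    # Step 5 (Eqs. 2.28-2.29): residuals after removing the component.
    E = [[X[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
    f = [y[i] - t[i] * q for i in range(n)]
    return t, loadings, q, E, f


# Hypothetical toy data: the class (coded -1/+1) depends on the first variable only.
X = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
y = [-1, -1, 1, 1]
t, loadings, q, E, f = pls_component(X, y)
print(t, loadings, q)
```

In this toy case the whole weight is assigned to the first variable, and the residual f is zero because one component already explains the class membership completely.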
It is worth underlining that when the group sizes are unequal, as in our case, the
decision boundary is shifted towards the larger group and consequently many samples can be
misclassified. The solution is to adjust the centering of X by subtracting the average of the
means of the two groups, $(\bar{X}_A + \bar{X}_B)/2$, from the columns, so as to shift the decision boundary away
from the larger group.
The PLS-DA approach is widely used in omics data analysis since it can handle highly collinear
and noisy data, which are very common outputs of this kind of studies. In addition, it also provides
loading weights and VIP scores, which can be used to identify the most important variables. Results
are shown in low dimensional score plots which illustrate the separation between groups in an easily
interpretable way. Moreover, comparing the loading and score plots makes it possible to investigate
relationships among important variables that may be specific to the group of interest. This aspect is
fundamental in fields such as metabolomics, in which we are interested not only in deciding to which
group a sample belongs, but also in assessing which variables are the best discriminators.
In spite of these advantages, the PLS-DA algorithm has a tendency to overfit, especially
when the number of variables significantly exceeds the number of observations. This is due to the
fact that some correlations can be found just by chance, given the high number of variables. A way
to avoid overfitting is to split the samples into training and test sets or, as an alternative when the
number of samples is too low, to perform variable reduction[79],[81].
2.5 PERFORMANCE EVALUATION
Within a classification analysis it is usually advisable to validate the model on an independent
dataset. To do so, the dataset is divided into two smaller subsets: the training and testing set. As
the names suggest, the training set is used for training the classification model, that is for deriving
the functional relationship between the target variable and the explanatory variables. What remains
of the available data, i.e. the testing set, is used later to evaluate the performance of the generated
model or to select the best model out of those developed using alternative classification methods.
To guarantee that each observation is used for both training and testing, and exactly once for
testing, the cross-validation (CV) method can be applied to partition the dataset. The dataset is
divided into k subsets and, at each iteration, one of the k subsets is used as the test set while the other
k−1 subsets are used as the training set. The procedure is repeated k times, using each of the k training
sets in turn and evaluating the model performance each time on the corresponding test set. At the
end of the procedure, the overall accuracy is computed as the average of the k individual
performances. To reduce variability, multiple rounds of cross-validation are performed using
different random partitions, and the validation results are averaged over the rounds[69].
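The partitioning scheme of k-fold cross-validation can be sketched as follows. This is a minimal Python example under stated assumptions: the function names are hypothetical, and `fit_and_score` stands for any user-supplied routine that trains a model on the training indices and returns its performance on the test indices.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; every sample falls in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, k, fit_and_score, seed=0):
    """Average the score returned by fit_and_score(train, test) over the k folds."""
    scores = [fit_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# Example with 20 samples and 5 folds: each test fold contains 4 samples.
for train, test in k_fold_indices(20, 5):
    print(len(train), len(test))
```

Repeating the call with different values of `seed` gives the multiple rounds on different partitions mentioned above, whose results can then be averaged.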
In a binary classification such as ours, i.e. when there are only two classes, the model prediction for each instance
in the test set is expressed as the probability of belonging to the case class, with a
value in the range [0,1]. Two strategies were used for performance evaluation: confusion matrices
and Receiver Operating Characteristic (ROC) curves.
In a confusion matrix, the columns are the predicted classes and the rows are the actual classes.
The performance of a model is evaluated by means of the accuracy, i.e. the proportion of correct
classifications (True Negatives, TN, and True Positives, TP) among the total number of cases examined.
When data are imbalanced, the predictive accuracy derived from a confusion matrix may not be
appropriate, and ROC curves may be used instead. A ROC curve is a standard technique for
summarizing classifier performance over a range of trade-offs between the true positive (TP) rate, i.e. the proportion of positive
examples correctly classified, and the false positive (FP) rate, i.e. the proportion of negative examples incorrectly
classified as positive. An accepted performance metric for a ROC curve is the Area Under
the Curve (AUC), as it is independent of the decision criterion selected and of prior probabilities[82].
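The two performance measures can be illustrated with a short Python sketch. The function names, the 0.5 threshold and the toy predictions are assumptions made for this example; the AUC is computed through its rank interpretation, i.e. the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counted as one half).

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (TP, FP, TN, FN) after thresholding the predicted probabilities."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_prob, threshold=0.5):
    """Proportion of correct classifications at the given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return (tp + tn) / (tp + fp + tn + fn)


def roc_auc(y_true, y_prob):
    """Threshold-free AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced example: 3 cases and 5 controls.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
print(confusion_counts(y_true, y_prob), accuracy(y_true, y_prob), roc_auc(y_true, y_prob))
```

Unlike the accuracy, the AUC does not change when the decision threshold is moved, which is why it is preferred when the classes are imbalanced.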
2.6 PROBABILISTIC GRAPHICAL MODELS
The framework of probabilistic graphical models is quite broad and encompasses a variety
of different types of models and of methods related to them. The aim of probabilistic graphical
models is to capture the underlying probabilistic relations between variables of interest and to
express the underlying set of conditional independence (CI) assumptions via a graph structure.
Let P(x) be a joint probability distribution of n discrete variables ($x_1, \dots, x_n$); the aim is to find
an approximation of this distribution by taking advantage of the properties of statistical
independence, i.e. $P(x_i|x_j) = P(x_i)$ if $x_i$ and $x_j$ are statistically independent.
The basic idea is that variables are statistically dependent only on very few others. These
interactions can be represented as a network, thus providing an easily interpretable picture of the
complex statistical relationships that exist among the domain variables[83], [84]. Graphical models
are used in many different fields of knowledge for diagnosis, prediction, classification and decision
making. They have also found a wide application in medicine and biology since they can be used to
predict and model the molecular basis of complex disease states.
An example of a simple graphical model is shown in Figure 2.1. Variables in the domain are
modeled as random variables and represented as nodes; the edges connecting them represent a
conditional dependence of the child node on the parent node. Thus, the absence of an edge
connecting two nodes implies that the two corresponding variables are conditionally independent,
given the other variables. In addition to the graph structure, each node is annotated with the
conditional distribution of the variable given the values of its parents, and this information can be
used to infer the most probable values of variables in the network, given assignments to other
variables[84].
Figure 2.1 - Example of a simple graphical model. The nodes represent the variables A, B, C and D, whereas the edges denote the conditional dependence among them (e.g. the nodes A and B are conditionally independent, given the other variables).
There are two main families of graphical representations: the Bayesian Networks (BN), which
use directed graphs (i.e. the edges are directed since they have a source and a target), and the
Markov Networks (MN), also called Markov Random Fields, which use undirected graphs[83]. For
both, an analysis consists of two main steps: structure learning and inference.
Structure learning consists in building the network, whereas inference is the process of computing
the consequences of the network for outcome prediction[84].
The graphical structure of a BN is a Directed Acyclic Graph (DAG), in which a conditional
probability distribution is associated with each node $N_i$ and described as
$$P(N_i \mid Pa(N_i)) \tag{2.30}$$
This implies that the node $N_i$ is conditionally independent of its non-descendants, given its
immediate parents $Pa(N_i)$. The joint distribution of the nodes is given by:
$$P(N_1, \dots, N_n) = \prod_i P(N_i \mid Pa(N_i)) \tag{2.31}$$
Figure 2.2 illustrates an example of a directed graph, which represents the following joint
probability function: P(G, S, R) = P(G|S,R)P(S|R)P(R). The nodes are connected by arrows, which
depict the dependence relations graphically.
Figure 2.2 – Example of a DAG. Note that the edges are directed.
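The factorization P(G, S, R) = P(G|S,R)P(S|R)P(R) and its use for inference can be illustrated with a small Python sketch. Everything numerical here is an assumption: the three nodes are treated as binary variables and the conditional probability tables are invented for illustration only, since the text does not specify them.

```python
from itertools import product

# Hypothetical conditional probability tables for binary nodes R, S and G.
p_r = {True: 0.2, False: 0.8}                         # P(R)
p_s_given_r = {True: 0.01, False: 0.4}                # P(S=True | R)
p_g_given_sr = {(False, False): 0.0, (False, True): 0.8,
                (True, False): 0.9, (True, True): 0.99}  # P(G=True | S, R)


def joint(g, s, r):
    """P(G=g, S=s, R=r) obtained as the product of the local distributions."""
    pg = p_g_given_sr[(s, r)] if g else 1.0 - p_g_given_sr[(s, r)]
    ps = p_s_given_r[r] if s else 1.0 - p_s_given_r[r]
    return pg * ps * p_r[r]


# The factorized joint distribution sums to one over all configurations.
total = sum(joint(g, s, r) for g, s, r in product([True, False], repeat=3))

# Inference by enumeration: P(R=True | G=True).
p_g = sum(joint(True, s, r) for s, r in product([True, False], repeat=2))
p_r_given_g = sum(joint(True, s, True) for s in [True, False]) / p_g
print(total, p_r_given_g)
```

Enumeration is feasible only for very small networks; its cost grows exponentially with the number of nodes, which is why dedicated inference algorithms are used in practice.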
If no a priori hypotheses or mechanistic models are available, the correspondence
between the CI of the random variables and the representation of the corresponding nodes on the
graph has to be modeled by specific structure learning algorithms, which can be grouped into three
main categories[85]:
• Constraint-based algorithms: they find the network that best explains dependencies and
independencies in the data. They use CI tests to detect the Markov blanket of each variable, i.e.
the set of nodes that includes the parents, the children and all the nodes that share a child with that
particular node. The Markov blankets are then used to build the network structure. These algorithms consist of
three steps[85],[86]:
1. Build the skeleton of the network (i.e. the undirected graph underlying the network
structure). Since an exhaustive search is computationally unfeasible, all learning algorithms
use some kind of optimization, such as restricting the search to the Markov blanket of each
node;
2. Set the direction of the edges by considering each triplet of nodes;
3. Set the directions of the remaining arcs as needed to satisfy the acyclicity constraint.
• Score-based algorithms: they find the highest-scoring network structure by assigning a
goodness-of-fit score to each candidate BN and trying to maximize it with heuristic search
algorithms (e.g. hill climbing, tabu search). During the exploration process, the scoring function
is applied in order to evaluate the fitness of each candidate structure to the data[85], [87].
Popular network scores include the log-likelihood score, the Akaike information criterion (AIC)
and the Bayesian Information Criterion (BIC) which are all based on the maximization of the
value of the likelihood function associated to each model;
• Hybrid approaches: they integrate constraint-based and score-based algorithms, since they use
conditional independence tests and network scores at the same time[85],[86].
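The notion of Markov blanket used by constraint-based algorithms can be made concrete with a short Python sketch. It assumes that the DAG is already known and stored as a dictionary mapping each node to its parents; this representation, the function name and the example graph are hypothetical, and the sketch does not perform any CI test or structure learning.

```python
def markov_blanket(dag, node):
    """Return the Markov blanket of `node`: its parents, its children and
    the other parents of its children. `dag` maps each node to its parents."""
    parents = set(dag[node])
    children = {n for n, pa in dag.items() if node in pa}
    spouses = {p for child in children for p in dag[child]} - {node}
    return parents | children | spouses


# Hypothetical DAG with edges A->C, B->C, C->D and E->D.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C", "E"], "E": []}
print(sorted(markov_blanket(dag, "C")))
print(sorted(markov_blanket(dag, "A")))
```

For node C the blanket contains its parents A and B, its child D and the node E, which shares the child D with C; for node A it contains the child C and the co-parent B.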
Structure learning algorithms are driven by distinct principles and metrics, so the resulting
models may be different. The algorithms based on independence tests perform a qualitative study
of the dependence and independence relationships between the variables in the domain, thus they
attempt to find a network that represents these relationships as far as possible.
Score-based algorithms instead attempt to find a graph that maximizes the selected score.
In other words, the score represents a measure of closeness in approximating P(x) by a combination of
lower order distributions. Each algorithm is characterized by the specific scoring function and search
procedure used, so results may differ according to the methods applied[87]. For this reason,
it is recommended to use one algorithm from each category and to compare the results. Ideally, the
edges which appear in all models should represent the strongest dependencies.
Being forced to choose the direction of edges, BNs are not suitable for some domains (e.g.
spatial or relational data). In such cases, MNs can be applied. For MNs too, the relationship among
variables (i.e. the nodes $N_i$) is based on CI, but it is defined by the Markov property, expressed by:
$$N_1 \perp N_2 \mid N_i \tag{2.32}$$
where ⊥ indicates independence in the joint distribution over the domain. Thus two nodes are
conditionally independent if there is no edge between them. Since directionality of the edges is not
considered, CI alone can be used to build a network[88]. In spite of this, to obtain more robust
networks, it is advisable to apply the same algorithms already described for BN structure learning.
One of the main differences between BNs and MNs is that MNs may be cyclic, therefore they
can represent cyclic dependencies, which a BN cannot. Given their directionality, BNs can instead
represent induced dependencies, and this makes it possible to represent causality: an edge from A to B
indicates in fact that “variable A causes B” and this information can thus be used for inference[83].
Probabilistic graphical models are an attractive methodology in the omics field, since they
can model complex interactions between many variables of interest in an easily interpretable way.
Furthermore, they provide a robust analytic approach for identifying both predictors of between-
individual variation within a group of interest and other potentially interesting interactions between
physiological, pathological and environmental variables among different groups (e.g. survivors vs
non survivors). Probabilistic graphical models, and above all BNs, have already been applied in some
genomics and metabolomics studies to identify novel biomarkers, to reconstruct pathways of
interest and for prediction or classification purposes[66],[89],[90]. In this study, probabilistic
graphical models have been used for explorative analyses both on metabolomics data and on
integrated proteomics and metabolomics data sets. The models have been built using the R
packages gRapHD and bnlearn[85],[91].
2.7 FINAL CONSIDERATIONS
We are aware that the methods presented here do not cover all the possible strategies
available for omics data analysis and integration. In fact, as reported by Gromski et al.[81], several
methods suitable for this purpose can be found in the literature and are currently used to analyze
large and highly complex datasets. In that review, two other approaches are suggested: random
forests (RF) and support vector machines (SVM), which are briefly described in this section.
The random forests technique is an ensemble learning method that generates many
classification trees and aggregates them to compute a classification. In a decision tree, the instances
within each node are split into subgroups using, among all the features available, the one that
optimizes a given criterion. The criterion, usually based on the Gini impurity index, evaluates the
homogeneity of the new nodes locally. Random forests introduce variation in the samples and
instability over the single classification trees by drawing several bootstrap samples from the original
training data. Each bootstrap sample is used to fit a single classification tree and, at each split in the
tree, the algorithm randomly restricts the set of predictor variables to select from. The ensemble
thus consists of a diverse set of trees. For prediction, an average over the predictions of the
single trees is used, as it has been shown to be more accurate than any of the single trees[92].
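The splitting criterion can be illustrated with a minimal Python sketch of the Gini impurity and of the search for the best threshold on a single variable. The function names and the toy data are assumptions made for this example; an actual random forest would repeat this search at every node, on a random subset of the variables and on a bootstrap sample.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the two child nodes produced by a threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return sum(len(part) / n * gini_impurity(part) for part in (left, right) if part)


def best_threshold(values, labels):
    """Threshold that yields the most homogeneous child nodes."""
    candidates = sorted(set(values))[:-1]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


# Hypothetical variable that separates survivors (S) from non-survivors (NS).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["S", "S", "S", "NS", "NS", "NS"]
print(gini_impurity(labels), best_threshold(values, labels))
```

Minimizing the weighted impurity of the child nodes is equivalent to maximizing the decrease in impurity with respect to the parent node.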
The main issue related to RF is that it is very difficult to interpret them in terms of the underlying
mechanisms leading to the obtained classification. It is indeed critical to understand which variables
or interactions between variables provide the predictive accuracy. The use of internal out-of-bag
(OOB) estimates was proposed by Breiman[92] to estimate the importance of each variable in
the model. Each tree is grown on a number of samples randomly selected with replacement.
Because of the replacement, a subset of samples is not included in the
building process of each tree: this is the out-of-bag sample set of that tree. To assess the role of each
variable in the prediction performance of the forest, the OOB sets of all the trees can be used. The procedure
proposed by Breiman uses the samples not included in the building process of each tree of the forest to
test the tree performance, and then the same sample subset with the values of one variable randomly
permuted to test it again. The comparison of the results yields an index which is related to the importance of that
variable. If the performance of the forest does not change when that variable is permuted, it means that
the considered variable has low importance for the classification performance of the forest.
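The two ingredients of this procedure, i.e. the out-of-bag set of a bootstrap sample and the permutation test on one variable, can be sketched in Python as follows. This is a simplified illustration for a single tree: `predict` is a hypothetical stand-in for a fitted tree, and the function names and toy data are assumptions; Breiman's index aggregates the same comparison over all the trees of the forest.

```python
import random


def bootstrap_with_oob(n_samples, rng):
    """Draw n_samples indices with replacement and return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


def permutation_importance(predict, X, y, feature, rng):
    """Drop in accuracy on (X, y) after randomly permuting one variable."""
    def acc(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    shuffled = [r[feature] for r in X]
    rng.shuffle(shuffled)
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(X, shuffled)]
    return acc(X) - acc(permuted)


rng = random.Random(0)
in_bag, oob = bootstrap_with_oob(10, rng)

# Hypothetical out-of-bag data and a "tree" that only looks at the first variable.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 3.0], [0.9, 2.0], [0.3, 4.0], [0.7, 0.0]]
y = [0, 0, 1, 1, 0, 1]
predict = lambda row: int(row[0] > 0.5)
print(permutation_importance(predict, X, y, 0, rng),
      permutation_importance(predict, X, y, 1, rng))
```

Permuting the second variable leaves the accuracy unchanged, because the hypothetical tree never uses it, whereas permuting the first variable can only lower the accuracy.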
Support Vector Machine (SVM) instead performs classification by constructing separating
hyperplanes which distinguish between objects of different class memberships in a multidimensional
space[93]. Given labeled training data, the algorithm aims at finding the discriminating hyperplane
that maintains an optimal margin from the closest training samples of each class. These closest samples are
called support vectors, and the resulting hyperplane is used to categorize new instances. The optimal hyperplane, i.e. the
one that achieves the maximum geometric margin, is obtained by an iterative training algorithm that
solves a quadratic optimization problem.
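For reference, in its simplest form (linearly separable classes, hard margin) the quadratic optimization problem is usually written as shown below. The symbols w (normal vector of the hyperplane), b (offset) and the coding of the class labels $y_i$ as ±1 are notation introduced here for illustration; soft-margin and kernel formulations add further terms.

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w^T x_i + b) \ge 1, \qquad i = 1, \dots, n
```

The geometric margin equals $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin; the support vectors are the samples for which the constraint holds with equality.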
Unlike the lasso (least absolute shrinkage and selection operator) regression or other shrinkage
and regularization approaches, neither RF nor SVM performs variable selection. In this
case a wrapper strategy should be adopted. As already outlined, wrapper methods evaluate subsets
of variables, which makes it possible to detect possible interactions between variables. The variables are
removed according to a variable ranking (variable importance in RF or the ranking obtained by applying mRMR). The
selected subset is the most parsimonious one whose performance is within one standard error of the best
(one-standard-error rule). The two main disadvantages of wrapper techniques are the increasing risk of overfitting
when the number of observations is insufficient and the significant computation time when the
number of variables is large.
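The one-standard-error rule can be sketched in Python as follows. The function name and the cross-validation errors are hypothetical and serve only to illustrate the selection step: given the per-fold errors obtained for each candidate subset size, the smallest subset whose mean error lies within one standard error of the best mean error is chosen.

```python
from math import sqrt
from statistics import mean, stdev


def one_standard_error_choice(cv_errors):
    """cv_errors maps a subset size to its list of per-fold errors.
    Return the smallest size whose mean error is within one standard error
    of the lowest mean error."""
    summary = {
        size: (mean(errs), stdev(errs) / sqrt(len(errs)))
        for size, errs in cv_errors.items()
    }
    best_size = min(summary, key=lambda size: summary[size][0])
    threshold = summary[best_size][0] + summary[best_size][1]
    return min(size for size, (avg, _) in summary.items() if avg <= threshold)


# Hypothetical cross-validated error rates for subsets of 2, 5, 10 and 20 variables.
cv_errors = {
    2: [0.30, 0.35, 0.40, 0.35],
    5: [0.20, 0.24, 0.22, 0.26],
    10: [0.18, 0.26, 0.20, 0.24],
    20: [0.25, 0.30, 0.28, 0.29],
}
print(one_standard_error_choice(cv_errors))
```

In this example the lowest mean error is obtained with 10 variables, but the subset of 5 variables performs within one standard error of it and is therefore preferred as the more parsimonious solution.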
To conclude, since metabolomics is a relatively young and complex field, to date there is no
universal choice of an analytical method which is superior in all cases. For this reason, only the
application of a variety of different approaches, suited to the data under analysis, may lead to robust
results.
3 MORTALITY PREDICTION FOR SEVERE SEPTIC SHOCK PATIENTS: A
TARGETED METABOLOMICS STUDY ON ALBIOS DATABASE
In this study we examined plasma metabolome and clinical features in a subset of 20 patients
with severe septic shock (SOFA score >8), enrolled in the multicenter Albumin Italian Outcome
Sepsis study (ALBIOS, NCT00707122). This work was partly presented at the XVI Congress of the
European Shock Society (ESS)1, and published as a journal paper in Scientific Reports2. Our purpose
was to identify metabolite changes associated with 28-day mortality and to elucidate early
biomarker signatures, which might help clinicians in prioritizing individual patient treatment during
shock. Blood samples were analyzed at the laboratory of Mass Spectrometry at IRCCS Mario Negri
Institute in Milan under the supervision of Dr. Roberta Pastorelli.
A mass spectrometry-based quantitative metabolomics approach was used to
simultaneously measure different metabolite classes, including acylcarnitines, amino acids,
biogenic amines, glycerophospholipids, sphingolipids, and sugars. The elastic net technique was
applied to find association with mortality on the basis of metabolite concentration levels and clinical
parameters. Our results showed that low unsaturated long-chain phosphatidylcholines species and
lysophosphatidylcholines species were associated with survival together with circulating
kynurenine. Moreover, these glycerophospholipids were negatively correlated to the event in
combination with clinical variables such as cardiovascular SOFA score. Overall, we observed that
early changes in plasma levels of both lipid species and kynurenine are associated with mortality,
and this may have potential implications for early intervention and for the discovery of new targeted
therapies.
In the following, we present the rationale behind the study, the dataset used and the
methods applied. A discussion of the results closes the chapter.
1 A. CAMBIAGHI, L. Brunelli, Caironi P, et al., “SCK-3: Target metabolomics for improving early prediction of death in patients with septic shock”, XVI. Congress of the EUROPEAN SHOCK SOCIETY, Cologne, Germany, September 24-26 2015 (abstract published in Shock Journal, 2015, 44(2) - pp: 1-27)
2 M. Ferrario, A. CAMBIAGHI, L. Brunelli, S. Giordano, P. Caironi, L. Guatteri, F. Raimondi, L. Gattinoni, R. Latini, S. Masson, G. Ristagno, R. Pastorelli, “Mortality prediction in patients with severe septic shock: a pilot study using a target metabolomics approach”, Scientific Reports, 6 (2016)
3.1 INTRODUCTION
ALBIOS (Albumin Italian Outcome Sepsis study, NCT00707122) is a recent, large, multicenter,
randomized clinical trial that enrolled 1818 patients with severe sepsis or septic shock. Given the
high mortality rate (46.7%)[63], similar to that observed in other comparable clinical studies[94],
[95], we believed that a multimarker strategy may be helpful to better understand the complex
pathogenesis of the disease and its evolution, for early risk stratification and the implementation of
personalized therapies. Several recent studies have focused on investigating plasma metabolomics profiles
as predictive signatures of ICU mortality in adult patients[89], [96]–[98]; the use of emerging
omics tools capable of examining physiological responses at the system level is thus particularly promising for
complex conditions such as septic shock.
In previous studies, different composite metabolite patterns have been identified with
nuclear magnetic resonance (NMR) or mass spectrometry (MS). Although these methods have
different intrinsic metabolomics coverage potential, they all clearly highlight the widespread
metabolic abnormalities in patients with septic shock, and the interplay of several different
biochemical pathways.
In the present study, we used a targeted mass spectrometry-based quantitative metabolomics
approach, focusing our attention on several series of metabolites, some of which have already been
identified as part of key biochemical pathways in septic shock. More precisely, the metabolite
species mainly involved are kynurenine and lysophosphatidylcholines, whose alterations
have already been reported in septic shock patients[89],[94]-[98]. We applied this strategy to a
selected subset of patients with severe septic shock (SOFA>8 and lactate level >4 mmol/L) enrolled
in the ALBIOS study.
Our explorative study was designed to provide absolute quantitative information on changes
in plasma metabolite levels measured one day (initial acute phase) and one week after development
of severe septic shock, and to relate these changes with mortality. The two time points were chosen
to verify the hypothesis that the metabolic changes over the time period reflect not only initial
clinical characteristics, but also the progression of the disease and long-term survival. Association
between metabolic patterns and mortality was assessed with univariate and multivariate analyses
adjusted for clinically relevant variables.
The primary goal of this pilot investigation was to verify the feasibility of our metabolomics
approach, which is intended to be used for the ShockOmics clinical trial (NCT02141607), a study aimed at
elucidating early multilevel marker signatures that could reveal the metabolic pathways involved
in this syndrome, as a necessary step toward targeted therapy.
3.2 MATERIAL AND METHODS
3.2.1 Study design, patients and clinical data
The multicenter ALBIOS clinical trial enrolled patients with severe sepsis or septic shock from
100 ICUs in Italy (NCT00707122), as fully described in the original article[63]. We analyzed only a
subset of patients, selected according to the following inclusion criteria: the presence of septic shock
(i.e. a proven or suspected infection in at least one site; two or more signs of systemic
inflammatory response syndrome; an acute sepsis-related cardiovascular
dysfunction or systolic blood pressure < 90 mmHg), total SOFA score > 8, serum lactate > 4 mmol/L,
and availability of plasma samples at day 1 (acute state, D1) and day 7 (steady state, D7) after
diagnosis of septic shock. Exclusion criteria were active hematological
malignancy or cancer, immunodepression, HIV, chronic renal failure, and cirrhosis. Only patients
discharged from the ICU between 7 and 14 days after the onset of shock were considered. These
inclusion and exclusion criteria were chosen in accordance with those of the multicenter clinical
study ShockOmics (NCT02141607), since the current study is a preliminary investigation for that trial.
Only 20 of the 1818 patients who were enrolled in the ALBIOS trial and had plasma samples stored in
the biobank fulfilled the inclusion criteria. These patients were analyzed according to their survival
status 28 days after study enrollment and were thus classified into two groups: survivors (11
patients, S) and non-survivors (9 patients, NS). For each time point at which the blood samples were
collected (i.e. D1 and D7), we considered 24 clinical parameters and 137 metabolite concentrations
(µM).
3.2.2 Univariate analyses for metabolomics data
A combined direct flow injection and liquid
chromatography (LC) tandem mass spectrometry (MS/MS) assay was applied for targeted quantitative
metabolomics analysis. Details of the protocol are given in Appendix A. The changes in
metabolite concentrations from D1 to D7 within the same group were evaluated by means of the
paired Wilcoxon signed-rank test. The comparisons between S and NS patients were performed with the
unpaired Wilcoxon rank-sum test at both D1 and D7. Finally, for each metabolite, the change in
concentration over time (i.e. ∆=D7-D1) was compared between the two groups by the
Wilcoxon rank-sum test. To account for the large number of statistical comparisons,
the false discovery rate (FDR) was computed using bootstrapping, i.e. oversampling with
replacement to obtain a sample size of 20 patients per group, for a total of 40 observations. Results
were considered statistically significant when p-value <0.05 and FDR <0.15.
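The screening rule described above can be illustrated with a minimal sketch. It is not the code used in this work: the rank-sum p-value relies on a normal approximation without tie correction, and the Benjamini-Hochberg adjustment is an assumption standing in for the FDR estimate, whose exact procedure is not detailed here. The function names (`rank_sum_p`, `benjamini_hochberg`, `screen`) and the input layout (one list of concentrations per metabolite and group) are hypothetical.

```python
import math
import random


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank shared by tied values
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs((w - mean) / sd) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def screen(data_s, data_ns, size=20, seed=0):
    """Names with raw p < 0.05 and FDR < 0.15 (FDR from oversampled groups).

    data_s / data_ns: dict mapping metabolite name -> list of values per patient.
    """
    rng = random.Random(seed)
    names = sorted(data_s)
    # Assumption: patients are resampled once, so all metabolites share the draw.
    idx_s = rng.choices(range(len(data_s[names[0]])), k=size)
    idx_ns = rng.choices(range(len(data_ns[names[0]])), k=size)
    raw, boot = [], []
    for name in names:
        s, ns = data_s[name], data_ns[name]
        raw.append(rank_sum_p(s, ns))
        boot.append(rank_sum_p([s[i] for i in idx_s], [ns[i] for i in idx_ns]))
    fdr = benjamini_hochberg(boot)
    return [n for n, p, q in zip(names, raw, fdr) if p < 0.05 and q < 0.15]
```

With only 9 and 11 patients per group, an exact rank-sum test from a statistics library would be preferable to the normal approximation used in this sketch.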
3.2.3 Multivariate analysis
The aim of the multivariate models was to predict NS patients. Four models were built
according to the data set used: two models on metabolite concentrations at D1 and D7,
respectively, and two that combined metabolites and clinical parameters at D1 and D7,
respectively. The technique used was the Elastic Net. Data were first normalized (Z-score
normalization) to have zero mean and unit variance. Given the low number of subjects, bootstrapping
with replacement was again used to obtain a sample size of 20 patients per
group, for a total of 40 observations. For every data set analyzed, models were built with
2-, 4-, 5- and 10-fold cross-validation (CV), and the model with the minimum Mean Squared Error (MSE)
was selected. The outcome (S = 0, NS = 1) was the output of the model. The best model
was selected among the different CV models based on the one-standard-error rule. Performance
was evaluated by means of the accuracy, i.e. the proportion of correct classifications (True Negatives, TN,
and True Positives, TP) among the total number of cases examined.
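Three of the steps above (Z-score normalization, the one-standard-error rule and the accuracy) can be sketched as follows. This is an illustration under assumptions, not the original implementation: the 0.5 threshold used to turn the continuous model output into a class is hypothetical, and the cross-validated errors passed to `one_standard_error_rule` are assumed to come from an Elastic Net fit performed elsewhere.

```python
import statistics


def zscore_columns(rows):
    """Scale each feature (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # sample SD; assumes no constant column
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def one_standard_error_rule(lambdas, cv_mse, cv_se):
    """Largest penalty whose CV error lies within one SE of the minimum CV error."""
    best = min(range(len(cv_mse)), key=cv_mse.__getitem__)
    limit = cv_mse[best] + cv_se[best]
    return max(lam for lam, mse in zip(lambdas, cv_mse) if mse <= limit)


def accuracy(y_true, y_score, threshold=0.5):
    """(TP + TN) / total, after thresholding the continuous model output."""
    hits = sum((score >= threshold) == bool(label)
               for label, score in zip(y_true, y_score))
    return hits / len(y_true)
```

Choosing the largest penalty within one standard error of the minimum favors the sparser model, which is a common motivation for this rule when subjects are few and features are many.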
3.3 RESULTS
3.3.1 Clinical characteristics of the study population
Clinical characteristics, scores and comorbidities of the 20 patients at study enrolment (D1)
are reported in Table 3.1. Patients were randomized to receive either 20% albumin and crystalloid
solutions (13 patients) or crystalloid solutions alone (7 patients) for volume replacement. In 11
patients (55%), the source of infection was identified by site culture, including gram-negative (5
patients), gram-positive (2 patients) and both gram-negative and gram-positive bacterial infections
(3 patients), as well as fungal infection (1 patient). In 9 of these patients (82%), the antibiotic therapy
empirically administered during the first 24 hours was appropriate. On day 28, the mortality rate was
45% (9 patients died). No significant differences were found between the two groups (S and NS) at
enrolment. All patients were treated according to the internationally accepted standard guidelines
for the treatment of patients with severe sepsis or septic shock.
                                                ALL PATIENTS          S                     NS
# patients                                      20                    11 (55%)              9 (45%)
Gender (Male) [# (%)]                           13 (65%)              6 (55%)               7 (78%)
Age (years)                                     70.5 (56.0, 77.0)     70.5 (56.0, 77.0)     72.0 (69.0, 76.5)
BMI - Body Mass Index                           26.8 (25.4, 29.4)     26.8 (25.4, 29.4)     27.8 (23.8, 30.3)
Heart Rate (bpm)                                101.0 (96.0, 109.0)   101.0 (96.0, 109.0)   105.0 (100.0, 110.3)
Mean Arterial Pressure (mmHg)                   75.0 (65.0, 85.2)     75.0 (65.0, 85.2)     68.3 (65.3, 83.1)
Central Venous Pressure (mmHg)                  12.5 (7.5, 15.0)      12.5 (7.5, 15.0)      13.0 (9.5, 13.3)
Positive End-Expiratory Pressure (cmH2O)        10.0 (6.5, 10.0)      10.0 (6.5, 10.0)      10.0 (7.5, 10.5)
FiO2                                            57.5 (50.0, 60.0)     57.5 (50.0, 60.0)     60.0 (40.0, 61.3)
Central Venous O2 saturation (%)                78.5 (72.0, 81.5)     78.5 (72.0, 81.5)     80.0 (74.5, 83.0)
PvCO2 (mmHg)                                    50.0 (42.5, 51.5)     50.0 (42.5, 51.5)     51.0 (43.3, 52.0)
PvO2 (mmHg)                                     45.5 (41.0, 49.0)     45.5 (41.0, 49.0)     47.0 (43.8, 51.5)
PaCO2 (mmHg)                                    45.5 (37.5, 49.0)     45.5 (37.5, 49.0)     49.0 (37.5, 49.3)
PaO2 (mmHg)                                     104.0 (85.0, 136.0)   104.0 (85.0, 136.0)   92.0 (80.0, 116.5)
Lactate (mmol/L)                                3.4 (2.7, 5.4)        3.4 (2.7, 5.4)        4.7 (3.2, 6.3)
Platelets (10^3/mm3)                            47.0 (27.0, 81.0)     47.0 (27.0, 81.0)     28.0 (20.5, 120.0)
Creatinine (mg/dL)                              2.5 (1.4, 3.2)        2.5 (1.4, 3.2)        1.9 (1.3, 3.4)
Bilirubin (mg/dL)                               1.9 (1.1, 3.0)        1.9 (1.1, 3.0)        2.0 (1.4, 6.5)
Arterial pH                                     7.4 (7.3, 7.4)        7.4 (7.3, 7.4)        7.4 (7.3, 7.4)
Venous pH                                       7.4 (7.3, 7.4)        7.4 (7.3, 7.4)        7.4 (7.3, 7.4)
Urine Output (mL/day)                           1625 (975, 3060)      1625 (975, 3060)      1600 (547, 2122)
Use of renal replacement therapy [# (%)]        3 (15%)               1 (9%)                2 (22%)
Presence of ventilatory support [# (%)]         20 (100%)             11 (100%)             9 (100%)

CLINICAL SCORES
SOFA                                            12.5 (9.5, 13.5)      12.5 (9.5, 13.5)      13.0 (10.0, 14.3)
Respiratory System                              3.0 (2.0, 3.0)        3.0 (2.0, 3.0)        3.0 (1.8, 3.0)
Coagulation                                     2.5 (2.0, 3.0)        2.5 (2.0, 3.0)        3.0 (1.0, 3.3)
Liver                                           1.5 (0.0, 2.0)        1.5 (0.0, 2.0)        2.0 (0.8, 2.3)
Cardiovascular System                           4.0 (3.0, 4.0)        4.0 (3.0, 4.0)        4.0 (3.0, 4.0)
Renal System                                    2.0 (1.0, 3.0)        2.0 (1.0, 3.0)        2.0 (0.8, 3.3)

COMORBIDITIES
Liver Disease [# (%)]                           0 (0%)                0 (0%)                0 (0%)
Chronic obstructive pulmonary disease [# (%)]   2 (10%)               1 (9%)                1 (11%)
Chronic Renal Failure [# (%)]                   0 (0%)                0 (0%)                0 (0%)
Immunodeficiency [# (%)]                        0 (0%)                0 (0%)                0 (0%)
Congestive or ischemic heart disease [# (%)]    2 (10%)               1 (9%)                1 (11%)
Table 3.1 - Characteristics at study enrollment in the two groups of patients (S: survivors; NS: non-survivors). Data are presented as median (25th, 75th percentile) or as frequency (%). The two groups did not differ significantly (p-value >0.05, Wilcoxon rank-sum test for continuous variables and Fisher exact test for categorical variables).
3.3.2 Time-course of plasma metabolites and association with mortality
We first assessed by univariate analysis whether metabolite levels changed significantly
from D1 to D7 within the same group (Wilcoxon signed-rank test p<0.05, FDR<0.15). Figure 3.1 gives
a pictorial overview of the changes in metabolite concentrations (rows, mean log2 μM) between D1
and D7 in survivors (left panel) and non-survivors (right panel). Five different species of
lysophosphatidylcholines (lysoPC), 19 of diacyl-phosphatidylcholines (PC aa), 26 of acyl-alkyl
phosphatidylcholines (PC ae), 2 of acylcarnitines (carnitine, C0; butyrylcarnitine, C4) and 4 of long-chain
sphingomyelins (SM) increased from D1 to D7 in S patients, while kynurenine decreased. In NS
patients, we observed an overall increase from D1 to D7 in lysoPC, PC, and SM species; amino acids
doubled their plasma concentration, as did putrescine.
Profiles of specific metabolites differed significantly between NS and S (Table 3.2). The
majority of PC and lysoPC species showed lower values at D1 and D7 in NS than in S,
whereas higher concentrations of acetylcarnitine (C2) and of kynurenine were observed in NS
on D1 and D7, respectively. Six lipid species, comprising saturated long-chain lysoPC and
polyunsaturated very long-chain PC, showed decreased levels at D7 in NS.
METABOLITE       S                            NS                           p-value  FDR      NS vs S

D1
lysoPC a C16:1   0.657 (0.334, 0.970)         0.313 (0.291, 0.591)         0.040    0.003    ↓
PC aa C30:2      0.005 (0.005, 0.026)         0.099 (0.017, 0.147)         0.046    0.005    ↑
PC aa C38:1      0.757 (0.499, 0.936)         1.025 (0.886, 1.645)         0.028    <10^-6   ↑
PC aa C38:6      164.022 (131.032, 174.123)   92.206 (47.999, 137.945)     0.033    <10^-6   ↓
PC ae C38:0      2.633 (2.418, 3.202)         1.712 (1.124, 2.241)         0.015    <10^-6   ↓
SM C20:2         0.095 (0.068, 0.121)         0.055 (0.043, 0.088)         0.048    0.001    ↓
C2               5.080 (3.369, 8.774)         11.066 (8.189, 21.852)       0.048    0.028    ↑

D7
lysoPC a C16:0   47.046 (24.384, 58.821)      18.150 (14.455, 33.212)      0.048    0.001    ↓
lysoPC a C18:0   10.807 (6.188, 14.137)       5.684 (3.502, 7.111)         0.040    0.003    ↓
lysoPC a C24:0   0.096 (0.086, 0.108)         0.066 (0.062, 0.085)         0.010    <10^-6   ↓
PC aa C32:3      3.486 (2.769, 4.240)         2.019 (1.807, 2.486)         0.028    0.001    ↓
PC aa C34:4      8.604 (6.879, 11.438)        4.150 (3.464, 5.720)         0.028    <10^-6   ↓
PC aa C36:4      615.675 (487.555, 717.315)   369.822 (340.428, 463.466)   0.048    <10^-6   ↓
PC ae C34:3      26.979 (20.127, 31.112)      18.013 (15.057, 21.835)      0.048    0.002    ↓
PC ae C40:1      1.633 (1.209, 1.706)         0.844 (0.770, 1.230)         0.010    0.012    ↓
PC ae C42:4      0.788 (0.679, 0.892)         0.608 (0.487, 0.663)         0.040    <10^-6   ↓
Kynurenine       7.680 (4.965, 8.735)         12.000 (8.745, 23.800)       0.012    <10^-6   ↑
Table 3.2 - Comparison of metabolite levels between survivors (S) and non-survivors (NS) at day 1 and at day 7. Only significant results are reported (p < 0.05, FDR < 0.15). Plasma concentrations are expressed in μM and shown as median (25th, 75th percentile). The arrows indicate whether the metabolite concentration in the NS group is lower (↓) or higher (↑) than in the S group.
Figure 3.1 - Heat maps of the metabolites (mean Log2 μM) whose concentrations changed significantly from D1 to D7 in S (right panel) and NS (left panel) (Wilcoxon signed rank test p<0.05, FDR<0.15)
The variations of metabolites from D1 to D7, expressed as ∆=D7-D1, were then compared
between S and NS. Significant differences in metabolite levels were found (Figure 3.2) (Wilcoxon
test p<0.05, FDR<0.15). In S patients, a clear negative variation was observed for kynurenine,
whereas lysoPC and PC (mainly low saturated long-chain species) showed a positive variation.
Figure 3.2 - Comparison of the absolute differences in metabolite concentrations (μM) from day 1 to day 7 (Δ=D7–D1) in survivors (S) and non-survivors (NS), shown as box-plots. Outliers, defined as values lying more than 1.5 times the interquartile range beyond the box, are marked with a cross. Each plot represents a different metabolite (Wilcoxon test p < 0.05, FDR < 0.15).
3.3.3 Association between metabolic patterns and mortality
As a combination of features can give more information than features considered individually,
we used prediction models with the aim of identifying the set of features most strongly associated
with the target class, i.e. the NS group. The model coefficients can be interpreted as follows: the higher
their absolute value, the higher their weight in the model; a positive coefficient denotes a positive
correlation with the event (i.e. 28-day mortality), a negative coefficient a negative one. On D1, PC aa
C38:1 and C4 (butyrylcarnitine) had a strong positive correlation with 28-day mortality,
whereas PC aa C40:6 and PC ae C38:0 were inversely associated (Figure 3.3 panel A). On D7,
kynurenine and PC aa C42:4 were positively correlated with 28-day mortality, while PC aa C40:1 and
PC ae C40:1 were negatively correlated (Figure 3.3 panel B). Model accuracy was 0.84 ± 0.25 for the
model built on D1 data and 0.73 ± 0.35 for the one built on D7 data.
Figure 3.3 - Elastic Net coefficients of the models built on metabolite concentrations at D1 (A) and D7 (B). Model accuracy was 0.84 ± 0.25 and 0.73 ± 0.35, respectively.
3.3.4 Integrated clinical and metabolomics determinants of mortality
We next checked for a possible redundancy of prognostic information between circulating
metabolites and clinical variables. Figure 3.4 shows the best elastic net regression model that
considered both metabolites and clinical variables measured at D7, whereas no reliable predictive
models were obtained with metabolites and clinical parameters collected at D1. As shown in Figure
3.4, daily urinary output, plasma concentration of lysoPC a C24:0, and mean arterial pressure were
negatively associated with the outcome, while the risk of death increased with the cardiovascular
subcomponent of the SOFA score, which represents the need for vasoactive drugs. The accuracy of
the model was 0.86 ± 0.03.
Figure 3.4 - Elastic Net coefficients built on metabolite concentrations and clinical parameters at D7. Model accuracy was 0.86 ± 0.03.
3.4 DISCUSSION
This study is a preliminary investigation aimed at characterizing the metabolomic profiles of
patients with severe septic shock, and at integrating them with the clinical manifestations
of this syndrome. We identified several metabolomic alterations previously reported
in patients with severe sepsis and septic shock[89], [96]–[102], supporting the feasibility and the
rationale underlying our pilot study design. Overall, profiles of specific metabolites measured on
day 1 and day 7 differed markedly between S and NS patients, and some metabolic features
appeared to be associated with mortality. Though we cannot discuss the significance of every single
metabolite, some general comments on the main metabolic pathways and their pathological
relevance to septic shock are warranted.
NS patients were characterized by a significant elevation of the polyamine pool (e.g.
spermidine) from D1 to D7. Since polyamines mediate the complex interplay between bacterial
infection and the host immune response[103],[104], this might suggest an altered regulation of
pathogen-host interactions in these patients. Moreover, non-survivors had increased plasma levels
of glucogenic amino acids, in line with the relative hepatic dysfunction occurring early in sepsis and
the consequent derangement of hepatic gluconeogenesis[103],[104]. A peculiarity of NS patients
was the significant increase in plasma kynurenine from D1 to D7 (Figure 3.2): the kynurenine level at D7
was about 1.6 times higher in 28-day NS than in S (median 12.0 vs 7.7 μM; Table 3.2). A clear relation
has already been established
between accelerated tryptophan catabolism along the kynurenine pathway and inflammatory
reactions[107], [108]. Furthermore, it has recently been shown that the kynurenine plasma level might
predict the development of sepsis in major trauma patients[109], and its modulation has already
been associated with 28-day mortality in critically ill patients[89]. Increased production of kynurenine
has been proposed to contribute to hypotension in sepsis[110] and has been associated with
a dysregulated immune response and impaired microvascular reactivity[111]. We can thus speculate
that the lower kynurenine concentration found in S patients may represent a favorable host
response trait. However, whether kynurenine metabolism is a pathogenic factor in sepsis or rather
an epiphenomenon needs further evaluation.
Decreased plasma levels of PC and lysoPC species were a prominent component of the
metabolic phenotype in NS (Table 3.2 and Figure 3.2), in accordance with the overall lipidome
alterations observed in septic and critically ill patients[96],[101],[102]. Already 24 hours after
admission (i.e. D1), NS patients showed a marked decrease in PC species containing long-chain
polyunsaturated fatty acids (LCPUFAs), which persisted at D7 with further elongation/desaturation
products. Since LCPUFAs reduce T-cell activation and dampen inflammation[112], it might be
speculated that a decrease in PC containing LCPUFAs hampers their protective effects, including
a concerted action of either withdrawing pro-inflammatory eicosanoids or increasing
anti-inflammatory eicosanoids. We can reasonably exclude a dietary influence on LCPUFAs, since
the differences in their concentrations were already present at D1, and all patients received
dietary support according to standard guidelines on the treatment of patients with severe sepsis
or septic shock[59].
A general explanation for these findings is that the lower circulating levels of PCs found in
NS patients might be due to a reduced or unbalanced availability of fatty acid substrates for their
biosynthesis, consistent with a dysregulated mitochondrial and/or peroxisomal beta-oxidation
occurring early in sepsis, as already anticipated in Chapter 1. The regulation of fatty acid synthesis and
oxidation in the mitochondria is schematized in Figure 3.5. Briefly, malonyl-CoA, produced during
fatty acid synthesis, inhibits the uptake of fatty acylcarnitine (and thus fatty acid oxidation) by
mitochondria. When fatty acyl-CoA levels rise, fatty acid synthesis is inhibited and fatty acid
oxidation increases. Thus, the lower plasma acetylcarnitine (C2, Table 3.2) observed in S
patients compared to NS ones would indicate a generally more efficient use of substrates for energy
production and a probably reversible mitochondrial damage in survivors.
Figure 3.5 - Schema of the regulation of fatty acid synthesis and oxidation in the mitochondria.
A further bio-signature characterizing NS patients was the reduction over time in circulating
mono-saturated and saturated lysoPCs. These changes in lysoPC concentration are in agreement with
Park et al.[100], who showed a similar downward trend of lysoPCs in 28-day non-survivors
compared to survivors of sepsis. Decreased lysoPC levels have also been reported in septic
patients compared to healthy controls[99]. The well-known pro-inflammatory activities of
lysoPCs[111],[112] seem at odds with the poor outcome associated with the lower lysoPC levels observed in
sepsis. The reduction in circulating lysoPC may simply reflect their enhanced conversion to
lysophosphatidic acid, which is known to induce a multitude of cellular responses through its action
on immunologically relevant cells[115]. Therefore, it is conceivable that lysoPC reduction may
promote an excessive immune response with detrimental effects in those patients who will not
survive[99],[100].
In the multivariate models, low unsaturated long-chain PC species were associated with
mortality together with circulating kynurenine (Figure 3.3). Moreover, lysoPC a C24:0 was negatively
correlated with the event in combination with clinical variables (Figure 3.4). The recurrent decrease in
lysoPC a C24:0 may denote an alteration in very long-chain fatty acids, such as lignoceric acid, which are
preferred substrates for peroxisomal beta-oxidation. Consequently, a down-regulation of the
peroxisomal lignoceryl-CoA ligase activity might be hypothesized[116].
Overall, our findings suggest a multifactorial origin for such abnormal phospholipid metabolism,
in which dysregulation of phospholipases, catabolism of lysoPC, peroxisomal dysfunction and imbalance
in the levels of saturated/unsaturated fatty acids could all be involved. However, the underlying
molecular mechanisms potentially regulating the circulating PC and lysoPC species in S patients as
compared to NS remain unclear and need further investigation.
We acknowledge that this study has several limitations. First, a targeted approach restricts,
by its nature, the panel of candidate markers and focuses on only a few metabolic pathways. Second,
the sample size is limited, and confirmatory studies are necessary. This is mainly because
many patients with septic shock of this severity do not reach day 7 of ICU stay, so we ended up
with a limited number of patients fulfilling our inclusion criteria. Third, we measured metabolites at
only two time points within one week from the diagnosis of septic shock; metabolites with temporal
changes outside this time window might thus provide more precise insight into the clinical
progression of the disease. Nevertheless, we identified a combination of circulating metabolites
altered during the early course of severe septic shock and associated with mortality.
This preliminary investigation was therefore very informative in capturing the possible evolution and
variation of metabolic signatures during a full-blown, durable and well-established
pathophysiologic manifestation of severe septic shock. Focusing on a homogeneous group of
patients rather than on a larger number of scattered phenotypes allowed for better control of
potentially confounding factors. Therefore, the metabolic changes observed in our samples pertain
more closely to the selected pathophysiological condition, and they should be confirmed in a larger cohort
that includes different phenotypes and not only the most severe patients.
3.5 REMARKS
In conclusion, the data presented here confirm the feasibility of our approach in determining
changes in circulating metabolites able to characterize the progress of septic shock. Our
results are in line with recent findings indicating that lipid homeostasis and tryptophan catabolism
might influence mortality in septic shock. The most important result of our study is the association
of early changes in the plasma levels of both lipid species and kynurenine with mortality, with possible
implications for early intervention. Although our analyses cannot determine causality, they
suggest that alterations in kynurenine and lipid species might represent not only risk factors for
patients with severe septic shock, but also important pathophysiologic mechanisms deserving further
investigation.
4 INTEGRATION OF METABOLOMICS AND PROTEOMICS: AN
ANCILLARY STUDY ON ALBIOS DATABASE
Integration of metabolomics and proteomics information is a promising approach for
revealing molecular pathways as well as for identifying and quantifying differentially expressed
molecules, independently from multiple trigger factors leading to septic shock. To this purpose, we
examined plasma metabolome, proteome and clinical features in a subset of patients with severe
septic shock (SOFA score >8), enrolled in the multicenter ALBIOS study[63], already described in
Chapter 3. Proteomics analyses were performed at the Proteomics Platform, Parc Cientific de
Barcelona, Spain, under the supervision of Dr. Eliandre de Oliveira.
Overall, we aimed to integrate the results of the metabolomic analyses previously
reported with the information coming from proteomics, in order to obtain a wider picture of the
interactions occurring between metabolites and proteins and to gain deeper insight into septic
shock progression and the individual patient’s response. We merged the results obtained by
mass spectrometry-based quantitative metabolomics with protein signals measured by iTRAQ. We then
applied the Elastic Net technique, LDA and PLS-DA to build integrated classification models used to
find features associated with mortality. Our results confirm that plasma levels of
lipid species are altered early in non-survivors. As for proteins, the most important differences between
the two groups concern proteins that are part of the inflammatory response and of the
coagulation cascade, two of the most important pathways involved in septic shock
progression.
In the following, we present the rationale behind our analyses, the study design and the
methods applied. The results obtained are then compared and discussed.
4.1 INTRODUCTION
In the last decade, advances in high-throughput approaches have allowed the development
of proteomic and metabolomic studies for evaluating the association of genetic and phenotypic
variability with disease progression. These considerations are of fundamental importance in the case of
complex multifactorial syndromes such as septic shock. In fact, response to treatment differs from
patient to patient and is extremely difficult to predict. Thus, integration of both proteomics and
metabolomics approaches may better describe the pathophysiological mechanisms involved and
allow a more complete characterization of this condition.
In the study previously outlined (see Chapter 3)[117], we found that profiles of specific
metabolites measured one day (D1) and one week (D7) after diagnosis of septic shock differed
markedly between survivors (S) and non-survivors (NS). More precisely, we observed that
low unsaturated long-chain phosphatidylcholine (PC) species and lysophosphatidylcholine
(lysoPC) species were associated with survival together with circulating kynurenine. We thus
speculated that lipid homeostasis and tryptophan catabolism might influence mortality in septic
shock.
In light of these considerations, the objective of these analyses is to better characterize
non-survivors according to the variations in metabolite concentrations from D1
to D7, expressed as the ratio D7/D1. This information is then integrated with proteomics and
clinical data in order to acquire a more complete view of the pathways involved in this complex
syndrome.
4.2 MATERIAL AND METHODS
4.2.1 Study design, patients and clinical data
This pilot retrospective investigation was an ancillary study of the multicenter ALBIOS clinical
trial (NCT00707122)[63]. The inclusion criteria for the present study are the same as those adopted in
our previous analyses (see Chapter 3, paragraph 3.2.1)[117]. Three of the 20 patients included
in our previous study were excluded because of hemolysis in their blood samples, which made them
unsuitable for the proteomics analyses.
Patients were analyzed according to their survival at 28 days after study enrollment. For each
patient, plasma samples were available at day 1 (acute state, D1) and at day 7 (steady state; D7)
after diagnosis of septic shock. For each time point (D1 and D7), we considered 24 clinical
parameters (Table 1), 137 metabolite concentrations (µM) and 132 protein values, expressed as
peak intensities, for a total of 293 features.
4.2.2 Proteomics data analyses
A multi-iTRAQ experiment was designed to compare the plasma protein expression pattern
between S and NS patients. Details of the protocol are given in Appendix B. Briefly, samples
from 17 septic shock patients and from 5 healthy donors (M1 to M5) were arranged in six iTRAQ™
8plex experiments. The 5 healthy donors were used for LC-MS normalization purposes. In each of
the six iTRAQ runs, more than 200 proteins were quantified.
4.2.2.1 Criteria of proteins selection
The following procedure was followed for protein selection: 1) retention of proteins
detected in all six runs; 2) removal of contaminant proteins, i.e. the
most abundant proteins that should already have been depleted before iTRAQ analyses (e.g. serum
albumin); 3) exclusion of proteins identified by only one peptide, even if unique for that protein.
After this selection, a total of 132 proteins were retained and considered for further analyses.
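The three selection criteria can be expressed as a simple filter. The sketch below is only illustrative: the `Protein` record and its field names are hypothetical and do not reflect the output format of the proteomics platform.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    runs_detected: int    # number of iTRAQ runs in which the protein was quantified
    n_peptides: int       # number of peptides supporting the identification
    is_contaminant: bool  # e.g. abundant proteins that should have been depleted


def select_proteins(proteins, n_runs=6):
    """Keep proteins seen in every run, not contaminants, with at least 2 peptides."""
    return [
        p for p in proteins
        if p.runs_detected == n_runs      # criterion 1
        and not p.is_contaminant          # criterion 2
        and p.n_peptides >= 2             # criterion 3
    ]
```

For example, a protein quantified in only five of the six runs would be dropped by the first criterion regardless of how many peptides support it.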
4.2.2.2 Quality control
To assess whether measurements from the six different runs are comparable, blood samples from five
healthy controls (M1 to M5) were included in the analyses and their replicates in different runs were
used to test for significant differences or bias. For each of the five control samples, the differences
in measured protein abundance between pairs of replicates were computed (e.g. for the sample
labeled as M1 we computed Δ RUN 1-5 = M1 RUN1 – M1 RUN5).
For each series of differences, the Lilliefors test was performed against the null
hypothesis of a Gaussian distribution, and Student's t-test against the null hypothesis that the series
has a mean value equal to zero. In this way, we statistically verified whether the differences between runs
are randomly distributed around zero. For each series, both tests gave p-values > 0.05, thus
we cannot reject the hypothesis that the differences are normally distributed around zero. This suggests
that there is no systematic bias between runs.
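The replicate comparison can be sketched as follows, assuming each run is available as a mapping from protein name to measured abundance (a hypothetical layout). The sketch computes only the protein-wise differences and the one-sample t statistic for a zero mean; the corresponding p-value (Student t distribution with n - 1 degrees of freedom) and the Lilliefors normality test are left to a statistics library.

```python
import math
import statistics


def replicate_differences(run_a, run_b):
    """Protein-wise differences between two replicate runs of the same control."""
    shared = sorted(set(run_a) & set(run_b))  # proteins quantified in both runs
    return [run_a[p] - run_b[p] for p in shared]


def one_sample_t(differences):
    """t statistic and degrees of freedom for H0: mean difference = 0."""
    n = len(differences)
    mean = statistics.fmean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return mean / standard_error, n - 1
```

A t statistic close to zero, as expected when the differences scatter randomly around zero, corresponds to a large p-value and therefore gives no evidence of a run-dependent bias.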
4.2.3 Statistical analyses
For each of the 132 protein abundances, comparisons between S (9 patients) and NS (8
patients) at D1 and D7 were performed using a 2-way ANOVA. The observations were grouped
according to two factors: run and outcome (S/NS). Two separate tests were performed, one for D1
and one for D7. For each protein, a total of 3 p-values were calculated, defined as p-value_OUTCOME,
p-value_RUN, and p-value_OUTCOME*RUN. Only those proteins with p-value_OUTCOME < 0.05 and
p-value_OUTCOME*RUN > 0.05 were considered.
For these proteins, we compared the peak intensities measured at D1 and D7 between the S and NS
groups by means of the Wilcoxon rank-sum test. To account for the large number of
statistical comparisons, we also computed the false discovery rate (FDR). The FDR was assessed after
the bootstrapping procedure: the sample size was increased from 9 to 20 subjects for the S group
and from 8 to 20 subjects for the NS group by means of random sampling with replacement, for a total of 40
observations. The bootstrapping procedure was used only for the FDR assessment, in order to increase
the number of samples for the estimation of the p-value distribution. Results were considered statistically
significant when p-value <0.05 (no bootstrapping) and FDR <0.15.
Comparisons between D1 and D7 within the same group were performed with a 2-way
ANOVA for repeated measures. The repeated measures model included outcome, run, and day (D1
or D7). A total of 4 p-values were computed: p-value_DAY, p-value_OUTCOME*DAY, p-value_RUN*DAY, and
p-value_RUN*DAY*OUTCOME, where DAY represents the repeated factor. Those proteins (3 in total) which
were affected by the run (i.e. p-value_RUN*DAY*OUTCOME < 0.05 and p-value_RUN*DAY < 0.05) were excluded
from further analyses, as we could not exclude a run effect. Post hoc comparisons were then
performed on the remaining proteins and their trend from D1 to D7 was compared by means of the
paired Mann-Whitney t-test.
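A minimal sketch of the paired D1 versus D7 comparison within one group, assuming that the paired test is the Wilcoxon signed-rank test named in the caption of Table 4.3. The eight intensity values are invented for illustration only.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical log peak intensities of one protein in 8 subjects
d1 = np.array([17.0, 16.8, 17.3, 16.9, 17.1, 16.5, 17.4, 16.7])
drop = np.array([0.3, 0.5, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8])
d7 = d1 - drop  # every subject decreases from D1 to D7

# Paired, two-sided signed-rank test on the within-subject differences
stat, p_value = wilcoxon(d1, d7)
decreasing = np.median(d7 - d1) < 0
```

With only 8 or 9 paired observations, the smallest achievable two-sided p-value is 2/2^n, which limits how strongly any single protein can stand out.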
Finally, we compared the ratio D7/D1 for both metabolite concentrations and protein peak
intensities between S and NS by Wilcoxon rank-sum test. Also in this case the FDR was computed as
previously described and results were considered statistically significant when p-value <0.05 (no
bootstrapping) and FDR <0.15.
4.2.4 Multivariate analysis
4.2.4.1 Data from targeted metabolomics analyses
Our aim was to characterize NS patients, in particular to find the species which are most
strongly associated with the outcome.
We built the classification models on the ratio D7/D1 of metabolite concentrations. Because
of the small sample size (17 patients) and the large number of features (137 metabolites),
collinearity is a crucial issue. The method used to reduce the number of features is the
minimal-redundancy-maximal-relevance (mRMR) criterion, as previously described (see Chapter 2). We
discretized the distribution of each feature according to its interquartile range before applying the
mRMR algorithm.
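The ranking step can be illustrated with a greedy mRMR sketch. Two points are assumptions rather than details taken from the text: the three-level discretization (below Q1, within the interquartile range, above Q3) and the use of the difference form of the criterion (relevance minus mean redundancy). The function names are hypothetical.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def discretize_iqr(x):
    """ASSUMED scheme: -1 below Q1, 0 within the IQR, +1 above Q3."""
    q1, q3 = np.percentile(x, [25, 75])
    return np.where(x < q1, -1, np.where(x > q3, 1, 0))


def mrmr_rank(X, y, n_select):
    """Greedy mRMR ranking: maximize relevance minus mean redundancy."""
    n_features = X.shape[1]
    Xd = np.column_stack([discretize_iqr(X[:, j]) for j in range(n_features)])
    relevance = np.array([mutual_info_score(Xd[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    remaining = set(range(n_features)) - set(selected)
    redundancy_sum = np.zeros(n_features)
    while remaining and len(selected) < n_select:
        last = selected[-1]
        for j in remaining:
            # accumulate mutual information with the most recently selected feature
            redundancy_sum[j] += mutual_info_score(Xd[:, j], Xd[:, last])
        candidates = sorted(remaining)
        scores = [relevance[j] - redundancy_sum[j] / len(selected) for j in candidates]
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical usage: 17 subjects, 137 features, keep the first 30 ranked
rng = np.random.default_rng(0)
y_demo = np.array([0] * 9 + [1] * 8)
X_demo = rng.normal(size=(17, 137))
ranking = mrmr_rank(X_demo, y_demo, n_select=30)
```

Because a duplicated feature has maximal mutual information with its copy, it is pushed down the ranking even when its relevance is high, which is the behaviour that makes mRMR useful against collinearity.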
We considered the first 10, 20 and 30 ranked metabolites to build three different
classification models. Data were first normalized (Z score normalization) and the dataset was divided
into a training set and a test set, comprising two thirds and one third of the observations, respectively.
We adopted two strategies to further select a smaller subset of features. In the first, we fitted
an elastic net logistic model (logit link) to the training set 50 times. We considered a
binary classification (S = 0, NS = 1), so the output of the model is a value between 0 and 1, which
can be read as a probability of belonging to the NS class. We then selected the coefficients of the
model with the minimal deviance. In the second strategy, we took the shrinkage parameter λ
corresponding to the model with the minimal deviance and used it to fit another elastic net model
and obtain the coefficients of the logistic regression. In both cases, the models were then evaluated
on the test set and the performance was assessed by the number of correct classifications.
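The λ selection by minimal deviance can be sketched with scikit-learn. This is not the original implementation: it assumes that the 50 repetitions correspond to repeated random cross-validation splits, it uses an arbitrary mixing parameter (`l1_ratio=0.5`), and it maps λ to scikit-learn's inverse regularization strength as `C = 1/λ`, which is not numerically identical to λ as parameterized in other packages. All data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold


def binomial_deviance(y_true, p_hat):
    """Deviance = 2 * negative log-likelihood of the predicted probabilities."""
    return 2.0 * log_loss(y_true, p_hat, normalize=False, labels=[0, 1])


def make_model(lam, l1_ratio):
    return LogisticRegression(penalty="elasticnet", solver="saga",
                              l1_ratio=l1_ratio, C=1.0 / lam, max_iter=5000)


def fit_min_deviance(X, y, lambdas, l1_ratio=0.5, n_repeats=50, seed=0):
    """Return the lambda with minimal mean CV deviance and the refitted model."""
    mean_dev = np.zeros(len(lambdas))
    for r in range(n_repeats):
        cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed + r)
        for k, lam in enumerate(lambdas):
            dev = 0.0
            for train, valid in cv.split(X, y):
                model = make_model(lam, l1_ratio).fit(X[train], y[train])
                dev += binomial_deviance(y[valid], model.predict_proba(X[valid])[:, 1])
            mean_dev[k] += dev / n_repeats
    best_lambda = lambdas[int(np.argmin(mean_dev))]
    final_model = make_model(best_lambda, l1_ratio).fit(X, y)
    return best_lambda, final_model


# Hypothetical Z-scored data: 17 subjects, 10 ranked features, S = 0 and NS = 1
rng = np.random.default_rng(0)
y = np.array([0] * 9 + [1] * 8)
X = rng.normal(size=(17, 10))
X[y == 1, 0] += 2.0
lambdas = [0.1, 1.0, 10.0]
best_lambda, final_model = fit_min_deviance(X, y, lambdas, n_repeats=5)
coefficients = final_model.coef_.ravel()  # zero entries are features dropped by the penalty
```

Averaging the deviance over several random splits reduces the dependence of the selected λ on any single partition, which matters with so few observations.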
LDA and PLS-DA were also implemented. More precisely, LDA was performed on the first 10
ranked metabolites and the coefficients of the linear boundary between the first and second
classes were retrieved. PLS-DA was performed on both the first 10 and the first 20 ranked metabolites,
considering 3 PLS components. Since the groups are unbalanced, weighted centering was applied to the
data matrix in order to avoid a decision boundary shifted towards the most numerous group.
The performance of the classification models was evaluated by considering the number of correct
classifications.
4.2.4.2 Integration of targeted metabolomics and proteomics data
We built an integrated model by merging targeted metabolomics and proteomics data. Also
for proteomics data we computed the ratio D7/D1 for each of the 132 protein peak intensities. To
avoid multicollinearity, the mRMR algorithm was applied and the first 50 ranked proteins were
selected. These proteins were then combined with the first 50 ranked metabolites and the mRMR
was performed again on this new features subset composed of 50 metabolites and 50 proteins.
After Z score normalization, we considered the first 10, 20 and 30 ranked features to build the
classification models using the two strategies described in the previous paragraph. LDA and PLS-DA
were also performed as stated above.
4.2.4.3 Integration of metabolomics, proteomics and clinical data
Finally, we built a comprehensive model which combines targeted metabolomics,
proteomics and clinical data. Only continuous clinical variables were considered, and for each of
them the ratio D7/D1 was computed. Total SOFA score and partial SOFA scores were not included to
avoid redundancy: they are calculated from clinical parameters which are already included in the model
and are thus likely to be correlated with them. In total, 17 clinical variables were included.
The 17 clinical variables were added to the first 20 ranked features from the set of
metabolites and proteins, obtained as previously described. The mRMR was then performed on this
subset of features to further reduce the number of features. After Z score normalization, the first
10, 20 and 30 ranked features were selected to build the classification models. LDA and PLS-DA were
also performed on these three sets of features.
4.3 RESULTS
4.3.1 Clinical characteristics of the study population
Patients with severe septic shock enrolled in the multicenter ALBIOS clinical trial[63], and
fulfilling the inclusion/exclusion criteria as previously reported, were analyzed. The baseline
characteristics of these 17 patients, and the source and kind of infection, are reported in Table 4.1.
Note that they are quite similar to those of the previous cohort (see Table 3.1). In 9 patients, the
source of infection was identified at site culture, including gram-negative (4 patients), gram-positive
(2 patients) and both gram-negative and gram-positive bacterial infection (gram mix, 2 patients), as
well as other kinds of microorganisms (mixed, 1 patient). Two patients (one S and one NS) had multiple
infections (S abdomen and other, NS lungs and other). 9 out of 17 patients (82%) received antibiotic therapy
empirically decided during the first 24 hours. Patients were randomized to receive either 20%
albumin and crystalloid solutions (10 patients) or crystalloid solutions alone (7 patients) for volume
replacement. On day 28, the mortality rate was 47% (8 patients died).
                          ALL PATIENTS    S              NS
Age (years)               66.1 ± 13.9     63.8 ± 16.6    67.9 ± 12.5
BMI (kg/m2)               27 ± 3.9        27.5 ± 3.9     27.9 ± 3.2
SOURCE OF INFECTION
Lungs [# (%)]             6 (35%)         1 (11%)        5 (63%)
Abdomen [# (%)]           8 (47%)         4 (44%)        2 (25%)
Genitourinary [# (%)]     5 (29%)         5 (56%)        0 (0%)
Other [# (%)]             3 (18%)         1 (11%)        2 (25%)
KIND OF INFECTION
Negative [# (%)]          8 (47%)         3 (33%)        5 (63%)
Mixed [# (%)]             1 (6%)          0 (0%)         1 (13%)
Gram positive [# (%)]     2 (12%)         1 (11%)        1 (13%)
Gram negative [# (%)]     4 (24%)         3 (33%)        1 (13%)
Gram mix [# (%)]          2 (12%)         2 (22%)        0 (0%)
Table 4.1 – Clinical characteristics in survivors (S) and non-survivors (NS) at enrollment. For 8 patients (3 S and 5 NS) the bacterial culture gave a negative result (negative). No statistically significant differences were found between the two groups.
Clinical and laboratory variables on day 1 (D1) and day 7 (D7) are reported in Table 4.2. All the
patients were treated according to the internationally accepted standard guidelines for the
treatment of patients with severe sepsis or septic shock. No significant differences were found
between the two groups, except for mean arterial pressure and urine output at D7.
                                 D1                                 D7
                                 S                NS                S                   NS
Heart Rate (bpm)                 103.5 ± 28.4     106.1 ± 12.8      80.4 ± 11.0         91.1 ± 8.7
Mean Arterial Pressure (mmHg)    76.6 ± 18.3      72.0 ± 11.0       96.3 ± 13.8 *       78.4 ± 12.0 *
Central Venous Pressure (mmHg)   11.4 ± 5.8       11.5 ± 4.4        7.9 ± 5.1           8.8 ± 2.1
Urine output (mL)                2556.1 ± 918.4   1840.0 ± 1652.8   3705.6 ± 1580.2 *   1737.5 ± 1478.9 *
FiO2 (%)                         59.7 ± 12.4      56.3 ± 20.8       40.6 ± 8.5          46.3 ± 23.4
ScvO2 (%)                        73.3 ± 11.6      78.5 ± 7.5        77.1 ± 7.3          77.9 ± 4.5
PvCO2 (mmHg)                     46.8 ± 5.7       47.8 ± 4.9        50.6 ± 5.2          46.5 ± 7.8
PaCO2 (mmHg)                     42.3 ± 6.3       44.3 ± 6.4        45.3 ± 3.9          41.6 ± 8.3
PvO2 (mmHg)                      43.3 ± 4.6       46.5 ± 7.4        44.1 ± 5.9          44.6 ± 5.9
PaO2 (mmHg)                      122.2 ± 61.0     98.5 ± 32.0       126.9 ± 29.8        115.1 ± 61.0
LAT (mmol/L)                     3.0 ± 1.6        5.0 ± 2.3         1.4 ± 0.5           2.4 ± 2.2
Platelets (x10^3/mm^3)           63.9 ± 35.4      61.4 ± 68.1       112.0 ± 67.2        80.3 ± 50.9
Serum creatinine (mg/dL)         2.7 ± 0.9        2.1 ± 1.3         1.8 ± 1.7           1.8 ± 1.5
Serum bilirubin (mg/dL)          1.7 ± 0.9        5.0 ± 4.8         1.9 ± 1.3           9.1 ± 10.8
Presepsin (µg/L)                 1486 ± 1256      2673 ± 2351       830 ± 458           4969 ± 5826
Arterial pH                      7.4 ± 0.1        7.4 ± 0.1         7.5 ± 0.0           7.4 ± 0.1
Venous pH                        7.4 ± 0.0        7.4 ± 0.1         7.4 ± 0.0           7.4 ± 0.1
CVV [# (%)]                      0 (0%)           2 (25%)           1 (11%)             3 (38%)
Ventilatory Support [# (%)]      9 (100%)         8 (100%)          4 (44%)             7 (88%)
CLINICAL SCORES
SOFA                             11.3 ± 2.4       12.4 ± 3.2        5.0 ± 2.1           9.3 ± 5.1
Respiratory System               2.4 ± 1.0        2.4 ± 1.3         1.2 ± 0.8           1.9 ± 1.0
Coagulation                      2.3 ± 0.9        2.5 ± 1.4         1.6 ± 1.1           1.9 ± 1.2
Liver                            1.1 ± 0.9        1.9 ± 1.2         1.0 ± 1.0           2 ± 1.8
Cardiovascular System            3.6 ± 0.5        3.6 ± 0.5         0.0 ± 0.0           1.1 ± 1.5
Renal System                     1.9 ± 0.6        2.0 ± 1.7         1.2 ± 1.5           2.4 ± 1.8
Table 4.2 – Clinical and laboratory variables at D1 and D7 for the 17 patients, divided in survivors (S, 9 pts) and non-survivors (NS, 8 pts). Data are presented as mean ± SD or as frequency. Mean Arterial Pressure and Urine output (marked with *) at D7 were significantly different between the two groups (p-value <0.05 Wilcoxon rank-sum test).
4.3.2 Changes in protein expressions between groups
A multi-iTRAQ experiment was designed to compare the plasma protein expression pattern
between S and NS patients. Criteria for protein selection and quality control are described in detail
in Appendix B. In total, 132 proteins were selected after quality control. For the significant proteins,
extended names and main functions are reported in Table B.1 (Appendix B).
We first assessed by univariate analysis whether protein levels were significantly different between
S and NS, separately at the two time points (Wilcoxon rank-sum test p-value < 0.05, FDR < 0.15).
Proteins P02745, Q86VB7, Q96PD5 and Q9Y5Y7 were significantly different between S and NS at D1
(Figure 4.1) and proteins P05543, P13796 and P36222 at D7 (Figure 4.2).
Figure 4.1 - Boxplots of protein peak intensities significantly different between S (blue) and NS (orange) at D1 (Wilcoxon rank-sum test p < 0.05, FDR < 0.15). Distribution of differences is shown as box-plot, each plot represents a different protein: P02745, Complement C1q subcomponent subunit A; Q86VB7, Scavenger receptor cysteine-rich type 1 protein M130; Q96PD5, N-acetylmuramoyl-L-alanine amidase; Q9Y5Y7, Lymphatic vessel endothelial hyaluronic acid receptor 1.
Figure 4.2 - Boxplots of protein peak intensities significantly different between S (blue) and NS (orange) at D7 (Wilcoxon rank-sum test p < 0.05, FDR < 0.15). Distribution of differences is shown as box-plot; each plot represents a different protein: P05543, Thyroxine-binding globulin; P13796, Recombinase Flp protein; P36222, Chitinase-3-like protein 1.
4.3.3 Time trend variation of proteins and metabolites
Changes in protein levels from D1 to D7 within the same group were assessed as well: 14
proteins change significantly from D1 to D7 in the NS group and 10 in the S group (Wilcoxon
signed-rank test p-value < 0.05). Of these proteins, 9 are significantly different from D1 to D7 in both groups.
The temporal trends in the two groups are reported in Table 4.3.
Differences in the ratio D7/D1 between S and NS patients for proteins and metabolites are
shown in Figures 4.3 and 4.4: 9 proteins and 5 metabolites are significantly different between the
two groups.
Figure 4.3 - Boxplot of the ratio D7/D1 of protein peak intensities significantly different between S (blue) and NS (orange) (Wilcoxon rank-sum test p < 0.05, FDR < 0.15). Distribution of differences is shown as box-plots. Each plot represents a different protein: P00746, Complement factor D; P00915, Carbonic anhydrase 1; P02649, Apolipoprotein E; P02745, Complement C1q subcomponent subunit A; P02746, Complement C1q subcomponent subunit B; P02765, Alpha-2-HS-glycoprotein; P05155, Plasma protease C1 inhibitor; P18065, Insulin-like growth factor-binding protein 2; Q9Y5Y7, Lymphatic vessel endothelial hyaluronic acid receptor 1.
         S                                                          NS
         D1                       D7                       TREND    D1                       D7                       TREND
P00751   16.704 (16.203, 17.223)  16.308 (16.183, 16.619)    ↓      17.004 (16.665, 17.101)  16.630 (16.122, 16.759)  * ↓
P01011   17.046 (16.880, 17.197)  16.799 (16.493, 17.059)    ↓      17.292 (16.897, 17.452)  16.509 (16.228, 16.994)  * ↓
P02649   15.413 (14.987, 15.639)  15.691 (15.234, 16.187)    ↑      15.222 (15.077, 15.757)  16.347 (16.001, 16.892)  * ↑
P02741   17.228 (16.847, 18.673)  15.921 (15.482, 16.165)  * ↓      18.037 (17.845, 18.162)  16.385 (15.815, 16.683)  * ↓
P02750   17.099 (16.557, 17.530)  16.510 (15.985, 16.653)  * ↓      17.445 (16.468, 17.906)  16.611 (15.951, 17.189)  * ↓
P06681   15.394 (15.059, 15.611)  14.951 (14.627, 15.195)  * ↓      15.579 (15.479, 15.909)  15.446 (15.129, 15.582)  * ↓
P07358   15.294 (14.675, 15.771)  14.908 (14.361, 15.434)    ↓      14.573 (14.521, 15.862)  14.434 (14.298, 15.415)  * ↓
P07360   15.731 (15.366, 16.235)  15.434 (15.343, 15.690)  * ↓      15.889 (15.530, 16.313)  15.490 (15.107, 15.969)  * ↓
P15169   14.701 (14.506, 15.917)  14.482 (14.119, 15.433)    ↓      14.922 (14.082, 15.805)  14.641 (13.806, 15.523)  * ↓
P18428   15.553 (14.594, 15.776)  14.077 (13.318, 14.515)  * ↓      15.485 (14.620, 15.728)  14.378 (13.827, 14.895)  * ↓
P22792   14.339 (14.097, 14.900)  14.112 (13.789, 14.686)  * ↓      14.521 (14.228, 14.798)  14.190 (13.916, 14.551)  * ↓
P25311   16.106 (15.665, 16.971)  17.427 (16.729, 17.668)  * ↑      16.466 (15.381, 17.172)  17.398 (16.155, 17.793)  * ↑
P36222   13.330 (12.748, 14.194)  11.798 (11.524, 11.972)  * ↓      14.467 (14.065, 14.723)  12.388 (12.076, 13.035)  * ↓
P49908   13.549 (13.239, 14.047)  14.240 (14.188, 14.828)  * ↑      13.163 (12.914, 13.674)  13.938 (13.582, 14.338)  * ↑
Q15582   14.452 (13.352, 14.611)  13.777 (12.819, 14.227)  * ↓      14.157 (13.277, 14.384)  14.079 (13.147, 14.143)    ↓
Table 4.3 - Comparison of protein level changes in survivors (S) and non-survivors (NS) at D1 and at D7. Significant differences between D1 and D7 are marked with * (Wilcoxon signed-rank test p-value < 0.05). Plasma levels are expressed as peak intensities and shown as median (25th, 75th percentiles).
Figure 4.4 - Boxplot of the ratio D7/D1 of metabolite concentrations significantly different between S (blue) and NS (orange) (Wilcoxon rank-sum test p < 0.05, FDR < 0.15). Distribution of differences is shown as box-plots.
4.3.4 Multivariate analysis
4.3.4.1 Regression analysis for targeted metabolomics data
We used classification models with the aim of identifying the set of features which are most
strongly associated with the target class, i.e. the non-survivors (NS). The coefficients of the models
obtained from metabolite concentrations only are reported in Table 4.4. The interpretation of the
coefficients in a logistic regression is not trivial. If we express the relationship as:
\[ \frac{p}{1-p} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots) \qquad (4.1) \]
we can say that if the coefficient βi is positive, then an increase of feature xi will be associated with
an increase of the odds p/(1-p), i.e. a higher probability of belonging to class 1 (NS) rather than to
class 0 (S), all other variables xj being equal. Conversely, if the coefficient βi is negative, then an
increase of feature xi will be associated with a decrease of the odds, i.e. the probability of belonging
to class 1 is lower. Three metabolites were selected in all models: PC aa C42:6, PC aa C36:3 and tyrosine.
Figure 4.5.A shows the coefficient values of the model built according to the criterion of minimal
deviance on the first 30 ranked features. All the obtained models correctly classify the observations
in the testing set.
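As a worked reading of Eq. 4.1, the effect of a single coefficient can be expressed as a multiplicative change of the odds. The two values below are taken from the 30-feature minimal-deviance column of Table 4.4 (Tyr and PC aa C42:6); the interpretation per standard deviation assumes that the coefficients refer to the Z-scored D7/D1 ratios.

```latex
\[
\frac{\mathrm{odds}(x_i + 1)}{\mathrm{odds}(x_i)}
  = \frac{\exp\bigl(\beta_0 + \dots + \beta_i (x_i + 1) + \dots\bigr)}
         {\exp\bigl(\beta_0 + \dots + \beta_i x_i + \dots\bigr)}
  = e^{\beta_i}
\]
\[
\beta_{\mathrm{Tyr}} = 0.751 \;\Rightarrow\; e^{0.751} \approx 2.12,
\qquad
\beta_{\mathrm{PC\,aa\,C42:6}} = -0.557 \;\Rightarrow\; e^{-0.557} \approx 0.57
\]
```

Under this reading, a one standard deviation increase in the tyrosine ratio roughly doubles the odds of belonging to the NS class, while the same increase in PC aa C42:6 roughly halves them, all other features being equal.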
4.3.4.2 Regression analysis for targeted metabolomics and proteomics data
We built the classification models combining metabolomics and proteomics data, as
described in the methods section. The coefficients of the models are reported in Table 4.5; Figure
4.5.B shows the coefficient values of the model built according to the criterion of minimal deviance
on the first 30 ranked features. Table 4.5 shows that lysoPC a C24:0 and the protein
P02745 are selected by all models. Moreover, PC aa C36:3 and PC aa C42:6 were again selected in
these models, and their coefficients maintain the same signs as in the previous ones. All the obtained
models correctly classify the observations in the testing set.
4.3.4.3 Regression analysis for targeted metabolomics, proteomics and clinical data
Finally, we built a model combining metabolomics, proteomics and clinical data as described
in the methods section. The coefficients of the models are reported in Table 4.6. Figure 4.5.C shows
the coefficient values of the model built according to the criterion of minimal deviance on the first
30 ranked features. In these models too, the protein P02745 appears among the
most important predictors. Another protein, P02790, and PC aa C34:3 were also selected by all
models. All the obtained models correctly classify the observations in the testing set.
Figure 4.5 - Coefficient values of the logistic regression models built according to the criterion of minimal deviance on the first 30 ranked features for targeted metabolomics (panel A), integration of metabolomics and proteomics (panel B) and for the integration of omics data with clinical parameters (panel C). VCP: Central Venous Pressure; PEE: Positive End-expiratory pressure; PAC: PaCO2; MAP: Mean Arterial Pressure.
                  10 features           20 features           30 features
METABOLITES       min Dev    fixed λ    min Dev    fixed λ    min Dev    fixed λ
PC aa C42:6       -0.763     -0.213     -0.672     -2.083     -0.557     -0.466
PC aa C40:6       -          -          -          -          -0.380     -0.005
PC ae C42:1       -          -          -          -1.009     -          -
lysoPC a C24:0    -0.498     -0.223     -          -0.622     -0.233     -0.241
lysoPC a C20:4    -          -          -          -          -0.188     -0.025
SM OH C16:1       -          -          -          -1.137     -          -
SM C24:1          -          -          -          0.263      -0.182     -
SM C22:3          -          -          -          -          -          -0.167
SM C24:0          -          -          -          -          -          -0.030
PC ae C42:5       -          -          -          -          -0.160     -
PC aa C42:2       -1.103     -          -0.136     -          -0.149     -
PC aa C34:4       -1.333     -          -          -0.191     -          -
Met               -          -          -          -          -0.105     -0.082
PC ae C30:2       -          -          -          -0.565     -0.073     -0.173
PC aa C36:6       -          -          -          -          0.013      -
PC aa C42:5       -          -          -          -          0.063      0.271
PC aa C36:3       2.280      0.338      0.442      1.931      0.178      0.479
Pro               -          -          0.652      1.946      0.198      -
PC aa C34:3       -          -          0.262      -          0.582      -
PC aa C42:1       -          -          1.135      2.824      0.653      0.716
Tyr               3.151      0.061      0.021      1.075      0.751      0.126
PC ae C30:1       -          -          0.305      1.151      0.820      0.300
Creatinine        -          -          0.377      1.489      1.623      0.253
Performance       Dev=4.02   Dev=23.77  Dev=8.69   Dev=24.98  Dev=9.15   Dev=25.62
Table 4.4 - Coefficient values of the logistic regression models for the first 10, 20 and 30 metabolites, computed according to the two strategies (minimal deviance and estimated λ). The coefficients of the metabolites which are common to all models are in bold. The bottom row reports the deviance of the obtained models.
                  10 features           20 features           30 features
FEATURES          min Dev    fixed λ    min Dev    fixed λ    min Dev    fixed λ
P02790            -          -          -0.416     -0.227     -1.630     -0.354
lysoPC a C24:0    -1.175     -0.641     -0.993     -0.372     -1.251     -0.628
PC aa C42:6       -          -          -0.801     -0.395     -0.186     -0.579
P02745            -1.087     -0.829     -0.187     -0.289     -0.774     -0.497
P20851            -0.485     -          -          -          -          -
lysoPC a C17:0    -          -          -          -          -0.306     -0.105
SM OH C16:1       -          -0.096     -0.335     -0.050     -          -0.487
P02746            -          -          -0.235     -0.091     -0.249     -0.347
PC aa C42:2       -0.064     -          -          -          -0.044     -
PC aa C34:3       -          -          -          0.115      0.012      -
PC ae C30:1       -          -          -          -          0.017      0.346
O75882            0.389      0.239      -          0.086      0.212      0.472
Pro               -          -          0.054      -          0.238      0.169
P06276            -          -          -          -          0.240      -
P06727            -          -          -          -          -          0.245
P19823            -          -          0.701      0.357      0.256      0.913
P05543            0.093      0.301      0.011      0.108      -          0.327
PC ae C42:1       -          -          -0.286     -          0.283      -
Tyr               -          -          -          -          -          0.309
P01034            -          -          -          -          0.909      0.895
PC aa C36:3       -          -          0.628      0.343      1.184      0.888
Performance       Dev=10.6   Dev=33.89  Dev=8.66   Dev=21.52  Dev=3.96   Dev=16.21
Table 4.5 - Coefficient values of the logistic regression models for the integration of metabolomics and proteomics, built on the first 10, 20 and 30 features and computed according to the two strategies (minimal deviance and estimated λ). The coefficients of the features which are common to all models are in bold. The bottom row reports the deviance of the obtained models.
10 features (min Dev, fixed λ)  20 features (min Dev, fixed λ)  30 features (min Dev, fixed λ)
FEATURES  min Dev  fixed λ  min Dev  fixed λ  min Dev  fixed λ
P02745  -1.140  -0.664  -0.410  -0.805  -0.374  -0.450
PC ae C34:3  -  -  -  -  -0.235  -
Mean Arterial Pressure  -0.580  -0.148  -0.255  -0.221  -
PC aa C34:3  -0.315  -0.325  -0.279  -0.271  -0.187  -0.027
PaCO2  -  -  -0.192  -0.539  -0.138  -0.118
P02790  -0.273  -0.306  -0.383  -0.229  -0.117  -0.135
ScvO2  -  -  -0.082  -0.067  -  -
Serum bilirubin  -  0.029  -  -
FiO2  -  -  0.137  0.291  -  0.023
O75882  -  -  -  -  0.150
P20851  -  -  -  -  0.169
PC ae C44:4  -  -  -  -  0.211  0.149
PEEP  0.738  0.299  -  0.439  0.267  0.054
Central Venous Pressure  -  -  -  -  0.372  -
Heart Rate  0.124  -  -  -  -  -
P06276  -  -  0.260  -  -  -
PC ae C44:4  -  -  0.379  0.572  -  -
Urine Output  0.414  -  0.138  0.242  -  -
Serum creatinine  -  -  0.432  0.423  -  0.033
Performance  Dev=10.22  Dev=27.08  Dev=11.65  Dev=32.83  Dev=11.75  Dev=30.04
Table 4.6 - Coefficient values of the logistic regression models for the integration of omics data with clinical parameters, built on the first 10, 20 and 30 features and computed according to the two strategies (minimal deviance and estimated λ). The coefficients of the features which are common to all models are in bold. The bottom row reports the deviance of the obtained models.
4.3.4.4 Discriminant analysis
Tables 4.7, 4.8 and 4.9 report the coefficient values of the LDA models and the VIP scores of
the PLS-DA models built on the first 10 and 20 ranked features according to mRMR for targeted
metabolomics, for metabolomics and proteomics data, and for omics and clinical data, respectively.
We could not use the entire subset of 30 features because of the small number of observations (i.e. 17
patients only). In fact, as explained in §2.4, the computation of the boundary region requires the
covariance matrix to be invertible, which is not the case when the features outnumber the observations.
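The invertibility problem can be checked numerically: with n observations, the sample covariance matrix has rank at most n - 1, so it is singular whenever the number of features is larger. The sketch below uses random data with the dimensions of this study; the values themselves are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects = 17

# 30 features: more features than observations, covariance is rank deficient
X30 = rng.normal(size=(n_subjects, 30))
cov30 = np.cov(X30, rowvar=False)
rank30 = np.linalg.matrix_rank(cov30)

# 10 features: fewer features than observations, covariance is full rank
X10 = rng.normal(size=(n_subjects, 10))
cov10 = np.cov(X10, rowvar=False)
rank10 = np.linalg.matrix_rank(cov10)

singular_with_30 = rank30 < cov30.shape[0]
invertible_with_10 = rank10 == cov10.shape[0]
```

Regularized or shrinkage covariance estimates are a common workaround, but they are outside what is described in this chapter.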
In the metabolites model, it is worth noting that PC aa C36:3, which already played an
important role in the regression models, occupies the second and first position in the VIP ranking
when considering 20 and 10 features, respectively. As for the integrated model, P02745 is in the first
position and lysoPC a C24:0 in the third (20-feature model) and second (10-feature model), thus
confirming the importance of these features, which had already emerged from the regression analysis.
P02745 also occupies the first position in the classification models for omics and clinical data, where
it is followed by another protein, P02790, in agreement with the regression analysis.
Three-dimensional PLS-DA score plots on 20 features for the three models are shown in Figure 4.6.
In all cases, the groups separate perfectly.
METABOLITES       VIP PLS-DA 20   VIP PLS-DA 10   LDA
PC aa C42:1       1.633           -               -
PC aa C36:3       1.400           1.516           10.290
lysoPC a C24:0    1.372           1.411           0.200
lysoPC a C17:0    1.233           -               -
PC aa C42:6       1.159           1.120           1.785
PC ae C30:1       1.137           -               -
PC aa C34:3       1.118           -               -
Tyr               1.096           1.042           7.865
Pro               1.058           -               -
Creatinine        0.965           -               -
PC ae C42:1       0.915           0.827           -3.442
PC ae C30:2       0.858           0.836           6.010
PC aa C42:2       0.815           0.886           -6.153
SM C24:1          0.756           -               -
PC aa C42:5       0.753           -               -
SM OH C16:1       0.639           0.747           -13.680
PC ae C34:3       0.600           0.417           -
PC aa C36:6       0.585           -               -3.395
PC aa C34:4       0.521           0.686           1.921
PC ae C44:4       0.266           -               -
Table 4.7 – VIP scores of PLS-DA and coefficients of LDA for the metabolites models.
FEATURES          VIP PLS-DA 20   VIP PLS-DA 10   LDA
P02745            1.438           1.576           -4.822
PC aa C36:3       1.367           -               -
lysoPC a C24:0    1.324           1.433           -2.041
P19823            1.282           -               -
PC aa C42:6       1.267           -               -
P02746            1.249           -               -
P02790            1.223           -               -
P05543            1.053           1.092           1.627
PC aa C34:3       1.035           -               -
PC aa C42:2       0.936           1.006           0.729
PC ae C42:1       0.909           -               -
SM OH C16:1       0.876           0.782           -0.979
O75882            0.866           0.920           0.961
P22792            0.801           0.826           -2.583
P16070            0.745           0.653           1.940
P20851            0.721           0.438           -0.326
P06276            0.629           0.706           0.692
Q14520            0.533           -               -
PC ae C34:3       0.397           -               -
PC ae C44:4       0.232           -               -
Table 4.8 – VIP scores of PLS-DA and coefficients of LDA for the integrated metabolomics and proteomics data models.
FEATURES                   VIP PLS-DA 20   VIP PLS-DA 10   LDA
P02745                     1.681           1.620           -15.017
P02790                     1.455           1.412           1.349
PC ae C44:4                1.334           -               -
PvCO2                      1.308           -               -
PEEP                       1.235           1.053           5.906
PaCO2                      1.229           -               -
PC aa C34:3                1.124           1.030           0.172
FiO2                       1.061           -               -
Serum creatinine           0.986           -               -
Urine Output               0.984           0.794           8.590
P05543                     0.896           -               -
Mean Arterial Pressure     0.847           0.825           -1.140
Serum bilirubin            0.814           0.601           6.668
P06276                     0.659           -               -
Serum lactate              0.601           0.268           -1.285
Central Venous Pressure    0.559           -               -
Heart Rate                 0.552           0.789           6.036
ScvO2                      0.522           -               -
P16070                     0.515           0.919           6.023
pHa                        0.266           -               -
Table 4.9 - VIP scores of PLS-DA and coefficients of LDA for the integrated omics data and clinical parameters models.
Figure 4.6 - Three-dimensional PLS-DA score plots on 20 features for the metabolites model (panel A), integration of targeted metabolomics and proteomics (panel B) and for the integration of omics data with clinical parameters (panel C). The two groups are perfectly separated.
4.4 EXPLORATIVE ANALYSES
Explorative analysis by probabilistic graphical models was performed with the aim of
highlighting dependences among features and verifying whether these
dependences differ between S and NS patients. To this purpose, we considered the dataset on which we
built the integration model for metabolites and proteins (i.e. first 50 ranked metabolites and
proteins) to build a Markov Network (MN) for S and NS patients, respectively. We adopted a two-step
approach. Firstly, the maximum likelihood network was found by applying the algorithm of
Chow and Liu[118]. Afterwards, forward search was performed on each triangulated graph: the
algorithm repeatedly adds the edge that optimizes a selected measure until no more add-eligible
edges are found. For both steps, the measure minimized is the Bayesian Information Criterion,
as described in Chapter 2.6. The two networks obtained for S and NS patients are shown in Figure
4.7.
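The first step of this procedure can be illustrated with a compact Chow-Liu sketch: the tree is the maximum-weight spanning tree over pairwise mutual information. The sketch makes simplifying assumptions that are not taken from the text: features are treated as jointly Gaussian, so mutual information is computed from the correlation coefficient, plain mutual information is used as edge weight instead of a BIC-penalized score, and the forward search step is not reproduced. The function name is hypothetical.

```python
import numpy as np


def chow_liu_edges(X):
    """Edges of the maximum-weight spanning tree over pairwise Gaussian MI."""
    r = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(r, 0.0)
    # mutual information of a bivariate Gaussian: -0.5 * ln(1 - r^2)
    mi = -0.5 * np.log(1.0 - np.clip(r ** 2, 0.0, 1.0 - 1e-12))
    n_nodes = mi.shape[0]
    in_tree = [0]
    edges = []
    # Prim's algorithm: repeatedly attach the outside node with the strongest link
    while len(in_tree) < n_nodes:
        outside = [j for j in range(n_nodes) if j not in in_tree]
        sub = mi[np.ix_(in_tree, outside)]
        a, b = np.unravel_index(np.argmax(sub), sub.shape)
        edges.append((in_tree[a], outside[b]))
        in_tree.append(outside[b])
    return edges


# Hypothetical data generated as a chain x0 -> x1 -> x2 -> x3
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = x2 + rng.normal(size=n)
tree = chow_liu_edges(np.column_stack([x0, x1, x2, x3]))
```

A tree over p nodes always has p - 1 edges, which is why the subsequent forward search is needed to recover denser dependence structures.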
Since it may be difficult to capture meaningful dependencies among so many features, we
isolated the so-called "hubs", i.e. nodes with several direct neighbors (colored in the figure). In fact,
due to their high connectivity, we speculate that these nodes could have an important role in the
network. There is also a high number of "leaves", i.e. vertices having only one
edge. By comparing the two networks, we can see that they have different structures and
different hubs. More precisely, the S group network has one hub which is connected with six direct
neighbors, whereas the NS group network has one hub connected with seven direct neighbors. The
number of leaves was 17 in the S model (34%) and 14 in the NS model (28%).
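Hubs and leaves follow directly from the node degrees of the fitted network. A minimal sketch, assuming the network is available as a list of undirected edges; the degree threshold used to call a node a hub is a hypothetical parameter, since the text only speaks of "several" direct neighbors.

```python
from collections import Counter


def degree_summary(edges, hub_min_degree=6):
    """Return (hubs, leaves) from an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hubs = sorted(node for node, d in degree.items() if d >= hub_min_degree)
    leaves = sorted(node for node, d in degree.items() if d == 1)
    return hubs, leaves


# Hypothetical toy network: one central node with six neighbors plus a short chain
toy_edges = [("hub", f"n{i}") for i in range(6)] + [("n0", "m1"), ("m1", "m2")]
hubs, leaves = degree_summary(toy_edges)
leaf_fraction = len(leaves) / len({n for e in toy_edges for n in e})
```

The leaf fraction computed this way corresponds to the percentages quoted above (number of leaves over the 50 nodes of each network).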
To better understand the network structure, we concentrated on the neighborhood of the
different hubs up to the second node. The hub is constituted by P19823 in S patients, and by
methionine (Met) in NS patients, as shown in Figure 4.8.
Figure 4.7- MN model of metabolites concentration in S (left panel) and NS (right panel) patients, highlighting the hubs (1 red for S and 1 blue for NS).
Figure 4.8- MN model of metabolites concentration in S (left panel) and NS (right panel) patients, highlighting the hubs (1 red for S and 1 blue for NS).
4.5 DISCUSSION
We performed a feature reduction to select the variables to enter the classification models,
which were again built using different techniques (regularized logistic regression and discriminant
analysis).
Our results are in line with the previous results (Chapter 3) and confirm the involvement of
lysoPCs and PCs in septic shock progression, which indicates an overall lipidome alteration in NS
patients. The novelty of the present study is the integration with the proteomics analysis. In
particular, the proteins significantly different between the two groups are involved in the pathways
of coagulation, innate immunity and inflammatory response (see Table B.1 in Appendix B for a
complete list). We focused in particular on one protein, P02745, i.e. complement C1q
subcomponent subunit A, whose peak intensities were significant in both the univariate and the
multivariate analysis (see Figures 4.1, 4.3 and 4.5). Complement C1q subcomponent subunit A
associates with the proenzymes C1r and C1s to yield C1, the first component of the serum
complement system, as shown in Figure 4.9.
Figure 4.9 – Scheme of the complement protein C1 and of its subcomponents. Efficient activation of C1 takes place on interaction of the globular heads of C1q with the tail of IgG or IgM antibody present in immune complexes.
The complement system is a part of the immune system which enhances the ability of
antibodies and phagocytic cells to clear microbes and damaged cells and promotes inflammation. It
consists of several small proteins circulating in the blood as inactive precursors. After stimulation,
specific proteases cleave proteins to release cytokines and initiate an amplifying cascade of further
cleavages. The end result of this complement activation cascade is stimulation of phagocytes to
clear foreign and damaged material, promotion of inflammation to attract additional phagocytes,
and activation of the cell-killing membrane attack complex[119].
Even if the role of protein P02745 in sepsis has not been completely elucidated yet[120], the
involvement of the coagulation and complement systems in sepsis is well known, as already
illustrated in paragraph 1.2. Therefore, to confirm the mass spectrometry data and to further
strengthen this finding, a validation using an antibody-based method (ELISA) was performed to
obtain a quantitative measure of the protein P02745. The concentration values measured in the two
groups were compared by Wilcoxon rank-sum test at D1 and D7. A significant difference between S
and NS was found only at D1, in line with what was already found for the data expressed as peak
intensities. As reported in Figure 4.10, the trend is also the same: the protein concentration is
higher in NS patients than in S ones.
Figure 4.10 – Boxplot showing P02745 concentration measured by ELISA (left) and by mass spectrometry (right). In both cases the difference is significant (Wilcoxon rank-sum test p-value<0.05) and the protein is more abundant in NS patients.
4.6 REMARKS
In conclusion, our results confirm the feasibility of our data mining approach for the analysis
of proteomics data and for their integration with the metabolomic ones. With respect to our previous
analyses on metabolomic data only (see Chapter 3), the integration with proteomics seems to
indicate the importance of the interaction between inflammation, coagulation and the complement
system in sepsis, which is in line with recent findings[12].
This aspect is very interesting, since it is an example of how data integration can better
elucidate the several pathways involved in septic shock, thus providing a more complete
view of disease progression. Although further analyses are needed, this may constitute an important
step toward the identification of the molecular mechanisms that could be targets for new
therapies.
As for the integration with clinical data, the models are difficult to interpret. From the
regression analysis, it may seem that the clinical parameters are more relevant than the omics ones,
but we argue that this could be due to the fact that we consider a rather long time interval (7 days).
However, the protein P02745 still has the highest weight in the model. More focused investigations
are needed to better elucidate the interplay between clinical parameters and omics data in order
to better characterize the patient's profile.
5 CHARACTERIZATION OF A METABOLOMIC PROFILE ASSOCIATED
WITH RESPONSIVENESS TO THERAPY IN THE ACUTE PHASE OF
SEPTIC SHOCK
Elucidation of early metabolic signatures associated with the progression of septic shock and
with responsiveness to therapy can be useful in the development of targeted therapies. In this study,
we examined the plasma metabolome of 21 septic shock patients enrolled in the ShockOmics
clinical trial (NCT02141607). Part of this work was submitted as an abstract to the 40th Annual
Conference on Shock¹ and as a journal paper to Scientific Reports². Our aim was to verify whether different
responses to therapy, assessed as the change in the Sequential Organ Failure Assessment (SOFA) score
measured at admission (T1, acute phase) and 48 hours later (T2, post-resuscitation), are associated
with different trends in metabolite patterns. To this purpose, we combined untargeted and targeted
mass spectrometry-based metabolomics strategies to cover as much of the plasma
metabolite repertoire as possible. Metabolite concentration changes from T1 to T2 (expressed as Δ = T2-T1)
were used to build classification models. Our results support the emerging evidence that lipidome
alteration plays an important role in the individual patient response to infection. Understanding
the regulatory pathways of lipids is thus important for the development of an effective and tailored
therapy. Furthermore, alanine indicates a possible alteration in the glucose-alanine cycle, which occurs
in the liver, thus providing a different picture of liver functionality than bilirubin.
Blood samples were analyzed at the laboratory of Mass Spectrometry at IRCCS Mario Negri
Institute in Milan under the supervision of Dr. Roberta Pastorelli.
We first present the rationale, the study design, the dataset and the methods applied.
The results obtained are then compared and discussed.
¹ A. CAMBIAGHI, B. Bollen Pinto, L. Brunelli, F. Falcetta, K. Bendjelid, F. Aletti, R. Pastorelli, M. Ferrario, “Responsiveness to therapy in the acute phase of septic shock: a metabolomics analysis”, 40th Annual Conference on Shock, Fort Lauderdale, Florida, June 9-12, 2017 (submitted).
² A. CAMBIAGHI, B. Bollen Pinto, L. Brunelli, F. Falcetta, K. Bendjelid, F. Aletti, R. Pastorelli, M. Ferrario, “Characterization of a metabolomic profile associated with responsiveness to therapy in the acute phase of septic shock”, Scientific Reports (submitted).
5. Metabolomics profile associated to responsiveness to therapy
5.1 INTRODUCTION
The most important phase in critically ill patients, such as septic shock patients, is the initial
one, i.e. shock diagnosis and the beginning of treatment. In fact, early supportive therapy
with fluid resuscitation and vasopressors to restore hemodynamics and decrease tissue
hypoperfusion is decisive for the patient’s outcome and has been part of treatment guidelines for
decades[27]. However, mortality rates for septic shock may reach 60% even in the era of early recognition
and treatment[9], with the present poor prognosis being mainly related to multiple organ
dysfunction (MOF). The modest improvement in septic shock survival can be explained by the inability
to prospectively identify the patients who are most likely to benefit from a given therapy and by the
absence of predictive markers for monitoring drug delivery and response. The research community
is increasingly aware that the individual response to therapy is important, and precision
medicine is becoming an important research topic also for acute illness and septic
shock[121]. Precision medicine extends personalized medicine beyond the genome to include
a broader, systems-level and multilevel approach to tailoring therapeutics to individual patients.
Interest in metabolomic approaches has recently increased, as the metabolome
represents the end result of gene and protein function and activity and may therefore provide a
more sensitive readout of drug response phenotypes, because most drugs impact components of
metabolism[122]. Several studies have used metabolomic analyses of various classes of blood
metabolites in search of predictive signatures of intensive care unit (ICU) mortality in adult
patients[98], [123], [124], but less attention has been given to the investigation of putative
metabolic determinants able to classify patient responsiveness to initial therapy during the first 48
hours in the ICU.
To the best of our knowledge, this is the first study that investigates a population of septic
shock patients during the acute phase. We examined the plasma metabolomic profile of septic
shock patients during the acute phase of resuscitation. Blood samples were collected at study
enrolment (time T1) and after about 48 hours (time T2). The patients received initial therapy
according to the standards[27] immediately after shock diagnosis (time T0). The time interval
between T0 and T1 was on average 10 hours.
We merged untargeted and targeted mass spectrometry-based metabolomics strategies to
cover as much of the plasma metabolite repertoire as possible. We first adopted an unbiased strategy
(untargeted metabolomics) to profile as many plasma metabolites as possible without any
a priori hypothesis. To this purpose, rapid yet accurate metabolic mass profiling performed
by direct flow injection-TOF-MS[125] was applied as untargeted screening to explore the main
perturbed metabolic features.
Targeted metabolomics, instead, is a method in which a specified list of metabolites is
measured and quantified against a standard in order to achieve absolute quantification of
defined metabolite classes[126]. Since metabolic signatures showing alterations in circulating
kynurenine, fatty acids, lysophosphatidylcholine species and/or carnitine esters have already been
reported in different settings of septic shock patients[89],[123],[124],[127], and in our previous
analyses (see Chapters 3 and 4), we hypothesized that they might be involved in the first
phase of shock as well, and that they could help in understanding the different trajectories of SS patients.
Consequently, we also applied a targeted approach focused on the measurement of these specific
metabolic classes to quantify the magnitude of their changes in our clinical setting and
possibly to validate the information obtained by our untargeted analysis.
The primary objective of these analyses was to verify whether a different response to
therapy, measured as the change in organ dysfunction, i.e. in the Sequential Organ Failure Assessment
(SOFA) score, is associated with a different trend in metabolite patterns. Our aim is thus to provide a
thorough description of the possible biological pathways that characterize this population, so as to
suggest putative biomarkers for further investigation.
5.2 MATERIAL AND METHODS
5.2.1 Study design, patients and clinical data
The current work is an ancillary study of the multicenter prospective observational trial
named ShockOmics (ClinicalTrials.gov Identifier NCT02141607). Details of the protocol are fully
described in the work of Aletti et al.[64]. Between October 2014 and December 2015, patients
admitted with septic shock to the ICU of Geneva University Hospitals were screened for inclusion.
Adult (>18 years old) patients with an admission SOFA score ≥ 6 and arterial lactate levels ≥ 2 mmol/L
were enrolled. Patients with a high risk of death within the first 24 hours after admission, systemic
immunosuppression, hematological diseases, metastatic cancer, pre-existing dialysis,
decompensated cirrhosis or patients that had received more than 4 units of red blood cells or any
fresh frozen plasma before ICU admission were excluded. Informed consent was obtained from
patients or proxies. Patient management was performed by the clinical care team according to
international guidelines[27].
For each patient, plasma samples were available at time point T1 (i.e. the acute
phase, within 16 hours after ICU admission or development of shock) and at time point
T2 (i.e. the post-treatment phase). In this study, we analyzed blood samples from 21 patients,
available for both untargeted and targeted metabolomics analyses. Patients were classified into two
groups according to their responsiveness to therapy. All the patients had a SOFA score higher than
9 at T1; patients who still had a SOFA score higher than 8 at T2 and did not show a decrease of at
least 4 points were classified as not responsive to therapy (NR); all the others were
classified as responsive to therapy (R). In other words, the NR group consists of 7 patients who had
at T2 a SOFA score > 8 and ΔSOFA < 5 (Δ = T1-T2 values of SOFA).
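As an illustration, the grouping rule above can be sketched in Python. The function and parameter names are hypothetical and are not part of the original analysis. The minimum decrease is left as a parameter because the rule is stated above both as "a decrease of at least 4 points" and as ΔSOFA < 5; the default used here follows the first wording and is an assumption.

```python
def classify_response(sofa_t1, sofa_t2, residual_sofa=8, min_drop=4):
    """Label a patient as 'NR' (not responsive) or 'R' (responsive).

    Hypothetical helper: a patient is NR when the SOFA score at T2 is still
    above `residual_sofa` AND the decrease from T1 to T2 is smaller than
    `min_drop`; every other patient is R.
    """
    drop = sofa_t1 - sofa_t2  # Delta = T1 - T2, as defined in the text
    if sofa_t2 > residual_sofa and drop < min_drop:
        return "NR"
    return "R"


# Illustrative (made-up) SOFA pairs, not patient data.
examples = [(13, 12), (13, 7), (14, 9)]
labels = [classify_response(t1, t2) for t1, t2 in examples]
```

Calling the function with `min_drop=5` reproduces the second wording of the rule.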
5.2.2 Statistical analysis
5.2.2.1 Data from untargeted metabolomics
An untargeted analysis by flow injection-TOF-MS was performed to screen for metabolic
features significantly characterizing the responsiveness (R group) and non-responsiveness (NR
group) to therapy in septic shock patients. Technical details can be found in Appendix A. A total of
14001 and 2190 metabolite masses were measured as peak intensities in positive and negative ion
mode respectively. Given the high number of masses measured, we performed some preliminary
statistical analyses in order to select only the most significant ones for subsequent metabolite
identification.
First, we tested the presence or absence of the species in the two groups, i.e. whether the
incidence of the peaks at each mass-to-charge ratio (m/z) differs between the groups at T1 and at T2.
For each m/z, we constructed a contingency table by counting the number of patients in the R and NR
groups in whom the ion was detected (i.e. above the limit of detection), and we applied the Fisher exact
test, considering the data at T1 and T2 separately. For each m/z, we also constructed a contingency table
by counting the number of patients in whom the ion was detected at T1 and at T2, and we
applied the McNemar test to verify whether the incidence of detected masses changed from T1 to T2.
In positive ion mode, 63 masses at T1 and 172 at T2 had a significantly different incidence
(p-value<0.05) between R and NR, whereas in negative ion mode 8 masses at T1 and 20 at T2 did.
The McNemar test was significant (p-value<0.05) for 653 and 119 masses in positive and negative mode,
respectively (results not shown).
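The two incidence tests can be illustrated with a minimal, dependency-free Python sketch. It is not the code used for the analysis: the function names are hypothetical, the two-sided Fisher p-value uses the common "sum of tables no more probable than the observed one" convention, and the McNemar test is written in its exact (binomial) form, which is an assumption since the variant applied is not specified above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Example layout: rows = R / NR, columns = ion detected / not detected.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x detections in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def mcnemar_exact(only_t1, only_t2):
    """Exact two-sided McNemar test on the discordant pairs.

    only_t1: patients with the ion detected at T1 but not at T2.
    only_t2: patients with the ion detected at T2 but not at T1.
    """
    n = only_t1 + only_t2
    if n == 0:
        return 1.0
    k = min(only_t1, only_t2)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up counts for one m/z: detected in 12 of 14 R and in 2 of 7 NR patients.
p_incidence = fisher_exact_two_sided(12, 2, 2, 5)
# Made-up paired counts: 0 patients lost the ion, 5 gained it from T1 to T2.
p_change = mcnemar_exact(0, 5)
```

In practice, each test would be repeated for every m/z and the resulting p-values collected for the multiple-comparison step.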
As a second step, we compared the distributions of peak intensities. Unpaired and paired
univariate analyses were performed by means of the Wilcoxon rank-sum test and the Wilcoxon
signed-rank test, respectively.
To address the large number of statistical comparisons, in all analyses the
false discovery rate (FDR) was computed from the p-values obtained from the tests.
Results were considered statistically significant when p-value <0.05 and FDR <0.15. For the
univariate analysis, only masses for which peaks were detected in more than 5 patients in the R group
and in more than 3 patients in the NR group were considered. In positive ion mode, 25 and 79 masses
were significantly different between R and NR at T1 and T2, respectively; in negative ion mode, 10 and
19 masses (Wilcoxon rank-sum test, p<0.05). As for the paired analysis (T1 vs T2 within the same group), in
positive ion mode 119 and 48 masses significantly changed from T1 to T2 in R and NR, respectively;
in negative ion mode, 50 and 41. This preliminary analysis was used for the selection of the
metabolites to be identified (Appendix A).
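The double threshold on the p-value and on the FDR can be sketched as follows. The FDR procedure is not named above, so the Benjamini-Hochberg adjustment used here is an assumption, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(p_values, alpha=0.05, fdr=0.15):
    """Indices of tests with raw p-value < alpha and adjusted p-value < fdr."""
    q_values = benjamini_hochberg(p_values)
    return [i for i, (p, q) in enumerate(zip(p_values, q_values))
            if p < alpha and q < fdr]


# Made-up p-values for four masses, for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.5]
kept = select_significant(raw_p)
```

Both thresholds (0.05 and 0.15) are the ones stated in the text; only the adjustment method is assumed.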
For the identified metabolites, we evaluated the ability of each metabolite, taken individually,
to separate the two groups by computing the area under the ROC curve (AUC) with leave-one-out
cross-validation (Figure 5.1). The species identified by untargeted metabolomics and also quantified
by the targeted approach were compared in order to verify, by means of Spearman correlation analysis,
whether their peak intensities and their concentrations were correlated (Figure 5.2).
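A single-feature AUC with leave-one-out resampling can be sketched as below. How the leave-one-out step was combined with the ROC analysis is not detailed above, so this sketch makes an assumption: the AUC is recomputed on each subset obtained by removing one observation and the values are averaged. The function names are hypothetical, and each class is assumed to contain at least two observations.

```python
def auc(pos, neg):
    """AUC as the probability that a positive-class value exceeds a
    negative-class value (ties count one half); Mann-Whitney estimate."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_out_auc(values, labels):
    """Average AUC over all subsets obtained by dropping one observation.

    labels: 1 for the positive class (e.g. NR), 0 for the other (e.g. R).
    """
    aucs = []
    for skip in range(len(values)):
        kept = [(v, y) for i, (v, y) in enumerate(zip(values, labels)) if i != skip]
        pos = [v for v, y in kept if y == 1]
        neg = [v for v, y in kept if y == 0]
        aucs.append(auc(pos, neg))
    return sum(aucs) / len(aucs)


# Made-up deltas for one metabolite, for illustration only.
deltas = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
groups = [0, 0, 0, 1, 1, 1]
mean_auc = leave_one_out_auc(deltas, groups)
```

A metabolite that is lower in the positive class yields an AUC below 0.5 with this definition; taking `max(a, 1 - a)` would make the measure direction-independent.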
Figure 5.1 – AUC analyses for untargeted metabolomics. The ability of the delta in peak intensity of each identified species, taken individually, to separate the two groups was evaluated by computing the area under the ROC curve using the leave-one-out cross-validation (CV) technique. Notice that the performance in classifying the two groups is poor: the average AUC of each metabolite is below 0.8, with the only exception of creatinine.
Figure 5.2 – Spearman correlation between concentrations and peak intensities of the same species quantified with both approaches (targeted and untargeted, respectively). Notice that all species show a good, significant correlation (ρ > 0.85 and p-value < 10⁻⁵).
5.2.3 Data from targeted metabolomics analysis
A targeted quantitative approach using a combined direct flow injection and liquid
chromatography (LC) tandem mass spectrometry (MS/MS) assay was applied for targeted
metabolomics analysis (see Appendix A for technical details). We compared the metabolite
concentrations of the R and NR groups at T1 and at T2 by means of the Wilcoxon rank-sum test. The
variations in metabolite concentration from T1 to T2 were assessed separately for the R and NR
groups by means of the Wilcoxon signed-rank test. Finally, for each metabolite, the time-trend variations
in concentration (i.e. ∆=T2-T1) were compared between the two groups by the Wilcoxon
rank-sum test. To address the large number of statistical comparisons, we
also computed the false discovery rate (FDR). The FDR was assessed after a bootstrapping
procedure: the sample size was increased from 14 to 20 subjects for the R group and from 7 to 10
subjects for the NR group by bootstrapping with replacement, for a total of 30 observations. The bootstrapping
procedure was used only for the FDR assessment. Results were considered statistically significant
when p-value <0.05 (no bootstrapping) and FDR <0.15. Also for the metabolite concentrations, we
evaluated the ability of each metabolite, taken individually, to separate the two groups by computing the
area under the ROC curve with leave-one-out cross-validation (Figure 5.3).
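The resampling step used for the FDR assessment (14 to 20 subjects in R, 7 to 10 in NR) can be illustrated with a short sketch based on the Python standard library. The function name, the random seed and the delta values are hypothetical; how the FDR was then derived from the resampled data is not specified above and is not reproduced here.

```python
import random


def bootstrap_group(values, target_size, rng):
    """Draw `target_size` observations from `values` with replacement."""
    return [rng.choice(values) for _ in range(target_size)]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible

# Made-up deltas (T2 - T1) of one metabolite: 14 R and 7 NR patients.
delta_r = [0.5, 1.2, 0.8, 2.1, 1.7, 0.3, 1.1, 0.9, 1.5, 2.4, 0.7, 1.0, 1.9, 0.6]
delta_nr = [0.1, -0.2, 0.3, 0.0, -0.4, 0.2, 0.1]

boot_r = bootstrap_group(delta_r, 20, rng)
boot_nr = bootstrap_group(delta_nr, 10, rng)
n_total = len(boot_r) + len(boot_nr)  # 30 observations, as in the text
```

The resampled groups would then be passed to the same univariate tests used on the original data.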
Figure 5.3 – AUC analyses for targeted metabolomics. Only the best 30 metabolites are shown (AUC > 0.5). The ability of the delta of each metabolite, taken individually, to separate the two groups was evaluated by computing the area under the ROC curve using the leave-one-out cross-validation (CV) technique. Notice that the performance in classifying the two groups is poor: the average AUC of each metabolite is below 0.8.
5.2.4 Multivariate analyses
5.2.4.1 Data from targeted metabolomics analysis
Our aim was to classify NR patients. The classification models were built on metabolite
concentration changes from T1 to T2, expressed as Δ = T2-T1. Since metabolite concentrations are
highly correlated and the number of observations (21 patients) is much lower than the number of
features (130 metabolites), it was necessary to perform feature reduction before building the model.
We adopted the mRMR algorithm, as previously described, and we discretized the feature
distributions according to the interquartile range. Multivariate analysis was performed similarly to
what was presented in Chapter 4. Briefly, we considered the first 10, 20 and 30 ranked
metabolites to build three different classification models. The dataset was divided into a training
set and a test set, comprising two thirds and one third of the observations, respectively. Data were
normalized (Z-score normalization) before performing the elastic net.
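The split and normalization steps can be sketched with the Python standard library. The helper names and the seed are hypothetical. Two choices in the sketch are assumptions, since they are not stated above: the split is a plain random one (not stratified by group), and the Z-score parameters are estimated on the training set and then applied to the test set.

```python
import random
import statistics


def split_indices(n, train_fraction=2 / 3, seed=0):
    """Randomly split range(n) into training and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_fraction)
    return idx[:n_train], idx[n_train:]


def zscore_fit(column):
    """Mean and sample standard deviation of one feature."""
    return statistics.mean(column), statistics.stdev(column)


def zscore_apply(column, mean, sd):
    """Standardize one feature with previously estimated parameters."""
    return [(x - mean) / sd for x in column]


train_idx, test_idx = split_indices(21)  # 14 training and 7 test patients

# Made-up values of one feature, for illustration only.
feature = [float(i) for i in range(21)]
mu, sigma = zscore_fit([feature[i] for i in train_idx])
train_z = zscore_apply([feature[i] for i in train_idx], mu, sigma)
test_z = zscore_apply([feature[i] for i in test_idx], mu, sigma)
```

With 21 patients, the split yields 14 observations for training and 7 for testing.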
We adopted two strategies to further select a smaller subset of features. In the first, we fitted
an elastic net logistic model (logit link function) to the training set 50 times. We considered a
binary classification (R = 0, NR = 1), and the output of the model is a value between 0 and 1, which
can be read as a probability of belonging to the NR class. We then selected the coefficients of the
model with the minimal deviance. In the second strategy, we used the shrinkage parameter λ corresponding to
the model with the minimal deviance to fit another elastic net model and to obtain the coefficients
of the logistic regression. In both cases, the models were then evaluated on the test set, and
performance was assessed by the number of correct classifications.
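For reference, the elastic net logistic model can be written in its standard penalized-likelihood form. The notation is ours: x_i is the vector of Z-scored Δ values of patient i, y_i is the class label, N is the number of training observations, λ is the shrinkage parameter mentioned above, and α is the mixing parameter between the L1 and L2 penalties, whose value is not reported in this section. The scaling of the likelihood term differs between software implementations.

```latex
\[
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{ -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \eta_i - \log\!\big(1 + e^{\eta_i}\big) \Big]
+ \lambda \left( \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right) \right\},
\qquad \eta_i = \beta_0 + x_i^{\top}\beta
\]
\[
\hat{p}_i = \frac{1}{1 + e^{-\hat{\eta}_i}} \in (0, 1),
\qquad y_i \in \{0 \ (\mathrm{R}),\ 1 \ (\mathrm{NR})\}
\]
```

The first term is the binomial deviance up to a constant factor, so choosing λ at the minimal deviance corresponds to minimizing the unpenalized term on the evaluation folds.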
LDA and PLS-DA were also implemented. More precisely, LDA was performed on the first 10
ranked metabolites, and the coefficients of the linear boundary between the two
classes were retrieved. PLS-DA was performed on both the first 10 and the first 20 ranked metabolites,
considering 3 PLS components. Since the groups are unbalanced, the data matrix was centered with a
weighted mean in order to avoid shifting the decision boundary towards the larger group.
The performance of the classification models was evaluated by the number of correct classifications.
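One common way of centering unbalanced data before PLS-DA is to subtract, for each feature, the average of the per-class means rather than the overall mean, so that each class contributes equally to the center. The sketch below illustrates this idea; it is an assumption about what the weighted centering amounts to, not a reproduction of the procedure used, and the function name is hypothetical.

```python
def class_balanced_center(rows, labels):
    """Center each column on the average of the per-class means.

    rows: list of observations (each a list of feature values).
    labels: class label of each observation.
    Returns the centered rows and the center that was subtracted.
    """
    classes = sorted(set(labels))
    n_cols = len(rows[0])
    center = []
    for j in range(n_cols):
        class_means = []
        for c in classes:
            col = [row[j] for row, y in zip(rows, labels) if y == c]
            class_means.append(sum(col) / len(col))
        # Each class weighs the same, whatever its size.
        center.append(sum(class_means) / len(class_means))
    centered = [[x - m for x, m in zip(row, center)] for row in rows]
    return centered, center


# Made-up single-feature data: four observations in class 0, two in class 1.
rows = [[0.0], [0.0], [0.0], [0.0], [6.0], [6.0]]
labels = [0, 0, 0, 0, 1, 1]
centered, center = class_balanced_center(rows, labels)
```

In this toy example the overall mean is 2, whereas the class-balanced center is 3, halfway between the two class means.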
5.2.4.2 Integration of data from targeted and untargeted analysis
We built an integrated model by using the concentrations of the metabolites and the peak
intensities of the species identified by untargeted analysis. Note that for those
metabolites also quantified in the targeted approach (i.e. acetylcarnitine, tyrosine and histidine),
we used the concentration values instead of the peak intensities, as they are more reliable. Therefore,
the metabolites identified by the untargeted approach and used for these analyses were:
acetylcarnitine, pyruvic acid, lactic acid, stearic acid, kynuramine, citric acid, myristic acid,
palmitoleic acid, palmitic acid, oleic acid, tyrosine, histidine. We built the model on the changes
from T1 to T2 (Δ = T2-T1). Untargeted metabolomics data (10 features in total) were then combined
with the first 20 ranked metabolites from the targeted analysis. We considered all 30 features and
applied the mRMR algorithm again to find the first 10, 20 and 30 ranked features. The
classification models were built using regularized logistic regression on normalized data, as
described in the previous paragraph. LDA and PLS-DA were also performed as stated above.
5.3 RESULTS
5.3.1 Clinical characteristics of the study population
The characteristics of the 21 patients at study enrolment are reported in Table 5.1, whereas
comorbidities, sources of infection and administered therapies are listed in Table 5.2. No
significant differences were found between the two groups at enrolment. However, although the
differences were not significant, in the not responsive to therapy (NR) group one patient died within one week;
we also noticed a higher percentage of non-survivors within 28 days (43% in the NR group vs 14% in the
R group) and a longer hospital stay (33 days in NR vs 15 days in R).
Variable | ALL PATIENTS | R | NR
# patients | 21 | 12 (57%) | 9 (43%)
Sex (Male) [# (%)] | 16 (76%) | 8 (67%) | 8 (89%)
Age (years) | 69.649 (63.366, 80.26) | 66.503 (60.622, 74.784) | 74.918 (66.198, 81.886)
BMI - Body Mass Index | 26.85 (24.975, 30.335) | 26.54 (23.505, 30.630) | 27.76 (25.46, 30.407)
Heart Rate (bpm) | 111 (91.5, 125.5) | 110 (89.5, 122) | 111 (98, 134)
Mean Arterial Pressure (mmHg) | 60 (56.25, 62.25) | 58.5 (54, 61.5) | 62 (58.5, 63.25)
Systolic Arterial Pressure (mmHg) | 85 (74.5, 91.25) | 87.5 (74, 91.5) | 85 (80.25, 91.25)
Diastolic Arterial Pressure (mmHg) | 47 (44.5, 49.5) | 46.5 (42.5, 47) | 49 (46.5, 51.25)
FiO2 | 0.5 (0.438, 0.663) | 0.55 (0.45, 0.675) | 0.5 (0.37, 0.637)
O2 Saturation (%) | 96 (95, 98) | 96.5 (95.5, 98) | 95 (93, 97.25)
PaO2 (mmHg) | 88 (77.5, 109) | 90 (77.5, 112.5) | 83 (75.5, 106.5)
PaCO2 (mmHg) | 38 (35.75, 49) | 38 (34.5, 40) | 48 (37.25, 49.5)
HCO3 (mmol/L) | 19 (17.75, 20.25) | 18 (16.5, 20.5) | 19 (18.75, 21.75)
Prothrombin Time (seconds) | 62 (49.25, 81.25) | 59 (41.5, 76) | 62 (56.5, 86)
Fibrinogen (mg/dL) | 4.7 (3.95, 6.625) | 4.8 (4.6, 6.25) | 4.5 (2.275, 7.325)
C-Reactive Protein (mg/L) | 273.4 (155.4, 352.05) | 274.65 (173.9, 342.7) | 188 (98.35, 405.35)
Lactate (mmol/L) | 4 (2.95, 5.6) | 4.25 (3, 5.6) | 3.6 (2.7, 5.275)
Platelets (10³/mm³) | 168 (95.5, 198) | 153 (90, 197) | 194 (94.5, 202.25)
White Blood Cells (10³/mm³) | 11.7 (9.75, 19.675) | 14.3 (10.15, 27.55) | 10.5 (9.2, 12.6)
Creatinine (mg/dL) | 1.7 (1.175, 2.025) | 1.65 (1.3, 2.05) | 1.7 (1.075, 2.075)
Bilirubin (mg/dL) | 1.3 (0.95, 2.05) | 1.44 (1.025, 1.95) | 1.3 (0.775, 3.7)
Glycemia (mg/dL) | 167 (124.25, 196.5) | 160 (120.5, 178) | 185 (119.5, 212.5)
pH | 7.31 (7.235, 7.345) | 7.285 (7.245, 7.335) | 7.33 (7.21, 7.36)
Urine Output (mL/day) | 1550 (803.75, 1945) | 1778.5 (1135, 1975) | 805 (417.5, 1741.25)
CLINICAL SCORES
SOFA | 13 (12, 14.25) | 12.5 (11, 14) | 14 (13, 15.75)
Respiratory System | 3 (2, 3) | 3 (2, 3) | 3 (2, 3)
Nervous System | 4 (3.75, 4) | 4 (4, 4) | 4 (3.25, 4)
Cardiovascular System | 4 (3.75, 4) | 4 (3, 4) | 4 (4, 4)
Liver | 1 (0, 2) | 1 (0, 1) | 1 (0.25, 2.75)
Coagulation | 0 (0, 1.25) | 0 (0, 1) | 1 (0, 2.5)
Renal System | 1 (0.75, 2) | 1 (0, 2) | 2 (1, 2.75)
APACHE II | 26 (23, 30.25) | 24 (20, 27) | 30 (25, 32.5)
GCS – Glasgow Coma Score | 3 (3, 5.25) | 3 (3, 4) | 3 (3, 5.75)
SAS – Sedation Agitation Scale | 2 (1, 2.25) | 1 (1, 2) | 2 (1.25, 2.75)
MORTALITY AND LENGTH OF STAY
Patients dead within 7 days after ICU | 1 (5%) | 0 (0%) | 1 (14%)
Patients dead within 28 days [# (%)]
Total | 5 (24%) | 2 (14%) | 3 (43%)
In hospital [withdrawal of care] | 4 [3] (19% [14%]) | 1 [1] (7%) | 3 [2] (43% [23%])
* Days in ICU before discharge | 5 (3.25, 9) | 4 (3, 6) | 10 (5, 10)
* Days in hospital | 21.5 (11, 32.5) | 15 (11, 30) | 33 (24.75, 42)
Table 5.1 - Characteristics at study enrolment in the two groups of patients (R: responsive; NR: not responsive to therapy). Data are presented as median, 25th and 75th percentile or as frequency (%). The two groups did not significantly differ (p-value >0.05 Wilcoxon rank-sum test). The * indicates that analyses have been performed on 19 subjects (3 patients were excluded since they died in ICU).
Variable | ALL PATIENTS | R | NR
COMORBIDITIES
Acute Heart Failure [# (%)] | 12 (57%) | 7 (50%) | 5 (71%)
Acute Myocardial Infarction [# (%)] | 0 (0%) | 0 (0%) | 0 (0%)
Prolonged arrhythmias [# (%)] | 4 (19%) | 1 (7%) | 3 (43%)
Chronic Organ Insufficiency [# (%)] | 19 (91%) | 13 (93%) | 6 (87%)
Arterial Hypertension [# (%)] | 9 (43%) | 5 (36%) | 4 (57%)
Diabetes Mellitus [# (%)] | 8 (38%) | 6 (43%) | 2 (23%)
Coronary artery disease [# (%)] | 2 (10%) | 1 (7%) | 1 (14%)
Systolic heart failure [# (%)] | 1 (5%) | 0 (0%) | 1 (14%)
Diastolic heart failure [# (%)] | 0 (0%) | 0 (0%) | 0 (0%)
Cerebrovascular Disease [# (%)] | 2 (10%) | 2 (14%) | 0 (0%)
Peripheral vascular disease [# (%)] | 1 (5%) | 1 (7%) | 0 (0%)
Dementia [# (%)] | 2 (10%) | 2 (14%) | 0 (0%)
Chronic Lung Disease [# (%)] | 3 (14%) | 1 (7%) | 2 (23%)
Rheumatic/connective tissue disease [# (%)] | 0 (0%) | 0 (0%) | 0 (0%)
Inflammatory Bowel Disease [# (%)] | 0 (0%) | 0 (0%) | 0 (0%)
Peptic ulcer [# (%)] | 0 (0%) | 0 (0%) | 0 (0%)
Mild liver disease [# (%)] | 0 (0%) | 0 (0%) | 0 (0%)
Moderate/severe liver disease [# (%)] | 2 (10%) | 1 (7%) | 1 (14%)
Chronic Kidney Disease [# (%)] | 0 (0%) | 0 (0%) | 0 (0%)
Tumour without metastasis [# (%)] | 2 (10%) | 1 (7%) | 1 (14%)
Hemiplegia/paraplegia [# (%)] | 1 (5%) | 1 (7%) | 0 (0%)
SOURCE OF INFECTION
Respiratory [# (%)] | 5 (24%) | 3 (21%) | 2 (23%)
Abdominal [# (%)] | 7 (33%) | 3 (21%) | 4 (57%)
Urinary Tract [# (%)] | 6 (29%) | 5 (36%) | 1 (14%)
Others [# (%)] | 3 (14%) | 3 (21%) | 0 (0%)
THERAPIES
Beta-blocker [# (%)] | 3 (14%) | 2 (14%) | 1 (14%)
Inotropic drugs [# (%)] | 7 (33%) | 4 (29%) | 3 (43%)
Sedation drugs [# (%)] | 21 (100%) | 14 (100%) | 7 (100%)
Other drugs [# (%)] | 19 (91%) | 14 (100%) | 5 (71%)
Tracheal Intubation [# (%)] | 19 (91%) | 13 (93%) | 6 (87%)
Renal Replacement Therapy [# (%)] | 1 (5%) | 1 (7%) | 0 (0%)
Transfusion [# (%)] | 2 (10%) | 0 (0%) | 2 (23%)
Table 5.2 – Comorbidities and sources of infection at study enrolment, administered therapy during the acute phase in the two groups of patients (R: responsive; NR: not responsive to therapy). Data are presented as frequency (%). No significant differences were found (p-value >0.05 Fisher exact test).
5.3.2 Metabolic fingerprinting by untargeted metabolomics
The statistical analyses performed on the species identified by the untargeted approach
showed that at T1 the two groups are quite similar and that most of the differences occur at T2. In fact,
none of the identified species differed significantly between R and NR at T1. At T2, stearic
acid was lower in NR, whereas pyruvic acid, lactic acid and histidine were higher (Figure 5.4). The
changes in peak intensities from T1 to T2 were assessed in the two groups separately and then
compared (Table 5.3). A general increase in circulating essential amino acids such as arginine,
tyrosine, threonine and lysine was observed at T2 in R and NR patients. Only lysine and threonine
significantly increased in both groups. Similarly, a significant decrease in the abundance of
acetylcarnitine was observed in R and NR patients (Table 5.3). Only NR patients showed a statistically
significant reduction in circulating fatty acids, mainly saturated and monounsaturated (myristic acid,
palmitoleic acid, palmitic acid, oleic acid and stearic acid). The endogenous kynuramine,
derived from tryptophan, increased over time in both groups, although significantly only in R.
As for the trend, 3 species differed significantly between the two groups: creatinine decreased in R
and increased in NR, whereas myristic acid and oleic acid significantly decreased in NR patients only
(Figure 5.5).
Figure 5.4 - Comparison of metabolite peak intensities in responsive (R) and non-responsive (NR) groups at T2 (Wilcoxon rank-sum test p-value < 0.05, FDR < 0.15).
R = Responsive to Therapy; NR = Not Responsive to Therapy
Metabolite | R: T1 | R: T2 | R: Δ=T2-T1 | NR: T1 | NR: T2 | NR: Δ=T2-T1
Creatinine | 12116 (10211, 17348) | 11109 (8507, 13271) * | -1797 (-3691, -377) ↓ § | 13371 (10815, 15693) | 13749 (10223, 19343) | 792 (-557, 2907) ↑
L-Arginine | 4780 (3369, 5930) | 6528 (5183, 7638) | 2113 (552, 3812) ↑ | 3434 (3200, 4627) | 7275 (5063, 8561) * | 2958 (1379, 3837) ↑
L-Acetylcarnitine | 14079 (7830, 17703) | 9163 (6284, 11268) * | -4143 (-10588, 1024) ↓ | 22939 (11352, 28847) | 14851 (9771, 19232) * | -4307 (-12256, -2900) ↓
L-Threonine | 1155 (819, 1518) | 1747 (1415, 2296) * | 816 (-53, 1075) ↑ | 1038 (823, 1518) | 1754 (1520, 2214) * | 481 (238, 1005) ↑
Taurine | 1302 (929, 1837) | 859 (729, 1264) * | -282 (-486, -83) ↓ | 1516 (1032, 1902) | 1095 (883, 1421) | -362 (-975, 44) ↑
Kynuramine | 1496 (1418, 1688) | 1732 (1589, 1776) * | 122 (28, 285) ↑ | 1381 (1233, 1481) | 1584 (1320, 1701) | 72 (-2.55, 254) ↑
L-Tyrosine | 874 (788, 1040) | 1181 (867, 1476) * | 105 (22, 667) ↑ | 927 (801, 1089) | 1167 (931, 1740) | 281 (194, 403) ↑
Citric acid | 26765 (16429, 33356) | 16942 (7661, 24011) * | -7611 (-12761, -1338) ↓ | 33185 (24219, 44715) | 27909 (22868, 32550) | -8845 (-12989, -1882) ↓
L-Lysine | 1133 (891, 1405) | 1491 (1280, 2053) * | 336 (90, 1190) ↑ | 1149 (840, 1626) | 1515 (1324, 1973) * | 446 (254, 819) ↑
Stearic acid | 37344 (32419, 58541) | 38207 (32244, 42399) | -3464 (-9555, 2738) ↑ | 38710 (31144, 51784) | 28711 (26782, 32069) * | -11046 (-14567, -6515) ↓
Myristic acid | 2145 (1463, 3917) | 1511 (972, 3111) | -457 (-1804, 284) ↓ § | 3217 (1971, 6396) | 1024 (816, 2326) * | -1365 (-3421, -991) ↓
Palmitoleic acid | 5130 (2286, 9028) | 4338 (1625, 5849) | -1586 (-4475, 1268) ↓ | 5852 (4761, 11911) | 2271 (1798, 4542) * | -3707 (-8859, -2472) ↓
Palmitic acid | 61453 (47464, 76280) | 58577 (48918, 61999) | -5683 (-18696, 5115) ↓ | 68002 (52214, 96968) | 47928 (36302, 53361) * | -21621 (-48579, -11540) ↓
Oleic acid | 46791 (31181, 92927) | 43499 (27535, 85340) | -11901 (-56910, 11496) ↓ § | 94476 (76885, 179484) | 42356 (24828, 65253) * | -63747 (-87753, -33127) ↓
Table 5.3 – Trends of metabolite peak intensities from T1 to T2 in the two groups (untargeted approach). Significant differences between T1 and T2 are marked with * (Wilcoxon signed-rank test p-value <0.05), whereas § marks differences in the delta between R and NR patients (Wilcoxon rank-sum test p-value<0.05). The arrows indicate whether the metabolite peak intensity at T2 is lower (↓) or higher (↑) than at T1.
Figure 5.5– Trends of metabolites peak intensities from T1 to T2 in the two groups. Box-plots in the top right corner show difference in metabolite peak intensity between T1 and T2 expressed as delta (Δ=T2 – T1). We performed Wilcoxon rank-sum test between the delta of the two groups and Wilcoxon signed rank between T1 and T2 in each group separately. Significant differences are marked with * (p-value<0.05). Only the three species for which delta was significantly different between R and NR groups have been plotted.
5.3.3 Metabolic profiling by targeted metabolomics
Similarly to what was observed for the untargeted approach, the univariate statistical analyses
showed that most of the differences in metabolite levels occur at T2. In addition, the NR group did
not show any significant changes in metabolite abundance from T1 to T2. More precisely, only 4
metabolites had a statistically significant difference in concentration between the two groups at T1,
whereas 23 metabolites did at T2. In particular, as Figure 5.6 shows, at T2 the plasma levels of 6 species
of lysophosphatidylcholines (lysoPCs), 7 of diacyl-phosphatidylcholines (PC aa), 2 of acyl-alkyl
phosphatidylcholines (PC ae) and 2 of long-chain sphingomyelins (SM), together with glutamic acid, were
reduced in NR with respect to the R group, whereas increased abundance was observed for amino
acids such as alanine, methionine, phenylalanine and histidine.
No metabolites changed significantly in the NR group from T1 to T2, whereas 54 metabolites
significantly changed in the R group. In particular, in the R group 8 species of lysoPCs, 17 of PCaa, 22
of PCae, 11 of SM, 7 amino acids (AAs) and 2 biogenic amines significantly increased from T1 to T2,
whereas histidine, creatinine and taurine significantly decreased. For 38 metabolites the trend
(Δ=T2 – T1) was different between R and NR (Table 5.4). Moreover, kynurenine, a product of the
tryptophan catabolism, increased in the NR group significantly more than in the R group, as shown in
Figure 5.7.
Figure 5.6 - Comparison of metabolite concentrations (μM) in responsive (R) and non-responsive (NR) group at T2 (Wilcoxon rank-sum test p-value < 0.05, FDR < 0.15). 4 out of 23 metabolites significantly different between the two groups at T2 are shown as example: lysoPC a C18:0 trend is similar to other lipids not shown here; all of them have a higher concentration in the R group with respect to NR.
R = Responsive to Therapy; NR = Not Responsive to Therapy
Metabolite | R: T1 | R: T2 | R: Δ = T2-T1 | NR: T1 | NR: T2 | NR: Δ = T2-T1
lysoPC a C16:0 | 3.825 (2.320, 5.910) | 15.7 (10.3, 17.5) * | 10.205 (2.77, 13.66) ↑ § | 2.780 (1.233, 4.902) | 4.020 (1.400, 6.572) | 0.250 (0.146, 1.648) ↑
lysoPC a C16:1 | 0.185 (0.124, 0.254) | 0.604 (0.470, 0.690) * | 0.383 (0.118, 0.525) ↑ § | 0.116 (0.090, 0.160) | 0.158 (0.099, 0.238) | 0.012 (-0.002, 0.107) ↑
lysoPC a C17:0 | 0.139 (0.102, 0.191) | 0.312 (0.124, 0.376) * | 0.131 (0.025, 0.199) ↑ | 0.094 (0.059, 0.145) | 0.110 (0.080, 0.165) | 0.005 (-0.012, 0.064) ↑
lysoPC a C18:0 | 0.902 (0.603, 1.650) | 3.220 (2.680, 4.400) * | 2.054 (0.762, 3.330) ↑ § | 0.683 (0.382, 1.252) | 1.040 (0.494, 1.415) | 0.161 (0.071, 0.403) ↑
lysoPC a C18:1 | 1.170 (1.070, 1.970) | 6.090 (3.010, 6.770) * | 4.160 (1.070, 5.320) ↑ § | 1.020 (0.709, 1.360) | 1.470 (1.170, 2.200) | 0.513 (0.242, 0.964) ↑
lysoPC a C18:2 | 1.215 (1.050, 1.380) | 6.075 (2.330, 8.060) * | 3.455 (1.123, 6.732) ↑ § | 0.758 (0.572, 0.990) | 1.160 (0.916, 2.310) | 0.697 (0.212, 1.355) ↑
lysoPC a C20:3 | 0.222 (0.145, 0.333) | 0.474 (0.333, 0.834) | 0.245 (0.104, 0.600) ↑ § | 0.165 (0.116, 0.229) | 0.222 (0.147, 0.275) | 0.045 (0.022, 0.056) ↑
lysoPC a C20:4 | 0.485 (0.359, 0.823) | 1.430 (0.800, 1.830) | 0.821 (0.262, 1.241) ↑ § | 0.478 (0.367, 0.688) | 0.456 (0.375, 0.985) | 0.059 (-0.006, 0.314) ↑
PC aa C28:1 | 1.480 (1.280, 1.900) | 1.805 (1.460, 2.120) * | 0.310 (-0.070, 0.520) ↑ | 1.40 (1.105, 1.770) | 1.25 (1.083, 2.313) | 0.078 (-0.265, 0.430) ↑
PC aa C32:3 | 0.948 (0.828, 1.130) | 1.110 (0.949, 1.420) * | 0.152 (0.066, 0.270) ↑ | 1.16 (0.816, 1.258) | 0.89 (0.850, 1.249) | 0.087 (0.349, 0.147) ↑
PC aa C34:3 | 22.95 (17.9, 33.9) | 38.85 (24.9, 47.4) * | 10.00 (0.700, 18.550) ↑ | 23.50 (14.20, 29.00) | 23.70 (14.85, 34.25) | 1.300 (-1.725, 3.325) ↑
PC aa C34:4 | 1.58 (1.29, 1.86) | 2.21 (1.38, 2.48) * | 0.560 (0.190, 0.920) ↑ | 1.48 (1.295, 1.693) | 1.160 (1.143, 1.600) | -0.141 (-0.255, -0.023) ↓
PC aa C36:1 | 55.8 (45.5, 70.9) | 81.75 (58.6, 95.4) * | 20.15 (-0.500, 41.300) ↑ | 50.90 (44.20, 55.40) | 63.60 (48.20, 82.25) | 6.500 (1.250, 21.850) ↑
PC aa C36:2 | 264.5 (182, 325) | 361.5 (324, 495) * | 121 (9, 221) ↑ | 222 (207, 223.75) | 298 (229.5, 313.25) | 74 (3.25, 102.25) ↑
PC aa C36:3 | 124.5 (106, 147) | 188 (136, 227) * | 76 (2, 97.6) ↑ | 104 (88.60, 121) | 120 (110.25, 133.5) | 18.0 (-7.25, 19.80) ↑
PC aa C38:0 | 1.630 (1.250, 2.080) | 1.745 (1.380, 2.730) * | 0.340 (-0.140, 0.802) ↑ § | 2.010 (1.978, 2.237) | 1.670 (1.585, 1.778) | -0.440 (-0.548, -0.245) ↓
PC aa C38:1 | 0.601 (0.353, 0.833) | 0.661 (0.408, 0.740) | 0.079 (-0.021, 0.259) ↑ § | 0.602 (0.385, 1.107) | 0.417 (0.347, 0.532) | -0.097 (-0.938, 0.003) ↓
PC aa C38:3 | 28.8 (23.4, 41.2) | 36.750 (29.200, 47.700) * | 9.25 (-1.000, 11.900) ↑ § | 28.30 (16.60, 31.175) | 25.30 (18.75, 30.025) | -0.90 (-6.300, 2.200) ↓
PC aa C38:5 | 37.8 (29.4, 49.2) | 47.85 (44.2, 51.9) * | 13.85 (2.9, 19.4) ↑ § | 49.6 (41.225, 52.9) | 38.7 (30.7, 47.025) | -7.100 (-13.450, 2.175) ↓
PC aa C38:6 | 78.45 (51.6, 90.4) | 88.15 (70.8, 101) * | 15.7 (5.9, 33.5) ↑ § | 69.8 (57.5, 91.2) | 62.2 (57.175, 77.9) | -11.30 (-17.85, -0.825) ↓
PC aa C40:4 | 1.655 (1.410, 2.020) | 2.115 (1.750, 2.630) | 0.39 (-0.13, 0.860) ↑ § | 1.840 (1.250, 2.473) | 1.590 (1.478, 2.003) | -0.310 (-0.540, 0.253) ↓
PC aa C40:5 | 5.760 (3.700, 7.180) | 6.670 (6.220, 7.350) * | 1.515 (-0.26, 3.08) ↑ § | 5.900 (4.890, 7.527) | 5.500 (4.575, 6.715) | -0.680 (-1.485, 0.512) ↓
PC aa C40:6 | 19.45 (15.2, 20.4) | 23.8 (17.7, 25.4) * | 5.450 (2.500, 10.600) ↑ § | 19.4 (14.25, 23.725) | 19.4 (16.025, 19.8) | 0.000 (-3.775, 0.900) //
PC aa C42:2 | 0.122 (0.101, 0.150) | 0.154 (0.121, 0.176) * | 0.037 (-0.017, 0.067) ↑ | 0.110 (0.089, 0.158) | 0.111 (0.102, 0.116) | -0.008 (-0.039, 0.015) ↓
PC aa C42:4 | 0.085 (0.065, 0.112) | 0.105 (0.077, 0.151) * | 0.012 (0.002, 0.037) ↑ | 0.086 (0.080, 0.127) | 0.079 (0.075, 0.110) | -0.004 (-0.042, 0.018) ↓
PC ae C34:2 | 13.4 (11, 19.1) | 15.3 (11.1, 23.8) * | 2.80 (0.000, 6.230) ↑ | 14.70 (11.650, 16.200) | 11.80 (11.025, 16.525) | -0.300 (-2.625, 1.212) ↓
PC ae C34:3 | 6.205 (4.340, 7.470) | 6.575 (4.900, 9.800) | 0.070 (-0.600, 2.500) ↑ § | 6.760 (4.633, 8.373) | 4.490 (3.763, 6.342) | -1.570 (-3.197, -0.070) ↓
PC ae C36:2 | 16.8 (14.6, 20.9) | 21.9 (15.2, 27.6) * | 4.150 (0.400, 9.700) ↑ | 18.20 (15.425, 21.150) | 17.20 (15.550, 24.925) | 2.60 (-1.825, 3.210) ↑
PC ae C36:3 | 6.580 (5.270, 8.760) | 7.065 (6.210, 12.200) * | 1.140 (-0.260, 2.800) ↑ § | 6.490 (6.460, 7.298) | 5.680 (5.215, 6.982) | -0.930 (-1.225, 0.212) ↓
PC ae C36:4 | 15.7 (13.4, 18.2) | 16.85 (13.2, 18.7) | 0.950 (-0.600, 3.650) ↑ § | 15.7 (14.675, 18.3) | 13.2 (12.225, 14.1) | -2.500 (-5.675, -1.700) ↓
PC ae C36:5 | 9.88 (8.99, 11.9) | 11.1 (9.99, 12.2) | 0.920 (-1.200, 3.210) ↑ § | 10.2 (8.402, 14.125) | 8.470 (7.172, 10.372) | -3.700 (-3.850, -0.845) ↓
PC ae C38:0 | 1.265 (1.120, 1.350) | 1.415 (1.280, 1.620) * | 0.185 (0.100, 0.590) ↑ § | 1.190 (1.080, 1.395) | 0.970 (0.954, 1.238) | -0.110 (-0.199, -0.084) ↓
PC ae C38:1 | 0.407 (0.276, 0.614) | 0.649 (0.518, 1.020) * | 0.162 (0.075, 0.387) ↑ | 0.596 (0.248, 0.725) | 0.690 (0.582, 0.870) | 0.158 (-0.011, 0.250) ↑
PC ae C38:2 | 1.725 (1.530, 2.040) | 1.845 (1.400, 2.730) * | 0.360 (-0.040, 0.746) ↑ | 1.860 (1.763, 2.013) | 1.910 (1.845, 2.113) | -0.030 (-0.088, 0.335) ↓
PC ae C38:3 | 3.340 (2.760, 3.960) | 4.040 (2.960, 4.480) | 0.800 (-0.570, 1.290) ↑ § | 3.160 (2.530, 4.058) | 2.920 (2.353, 3.075) | -0.590 (-1.000, -0.198) ↓
PC ae C38:4 | 11.75 (7.42, 14.2) | 10.71 (9.1, 14.1) | 0.350 (-0.800, 2.400) ↑ § | 14.1 (11.6, 15.15) | 10 (8.668, 12.1) | -2.540 (-4.033, -1.975) ↓
PC ae C38:5 | 12.9 (11.4, 14.4) | 13.3 (10.2, 17.4) | 1.750 (-1.000, 4.400) ↑ § | 16.2 (13.9, 18.875) | 12.9 (10.725, 13.8) | -4.100 (-5.880, -3.000) ↓
PC ae C38:6 | 4.910 (4.050, 5.720) | 4.720 (4.280, 7.240) | 0.685 (-0.100, 1.390) ↑ § | 5.270 (4.880, 6.490) | 3.940 (3.760, 4.133) | -1.760 (-2.057, -1.080) ↓
PC ae C40:2 | 1.010 (0.851, 1.140) | 1.215 (1.060, 1.460) * | 0.219 (0.120, 0.310) ↑ § | 1.180 (1.000, 1.335) | 1.010 (0.933, 1.275) | -0.120 (-0.377, -0.067) ↓
PC ae C40:5 | 2.040 (1.770, 2.370) | 2.405 (1.600, 2.700) | 0.390 (-0.100, 0.860) ↑ § | 2.660 (2.395, 2.767) | 2.110 (1.822, 2.190) | -0.700 (-0.935, -0.298) ↓
PC ae C40:6 | 2.750 (2.390, 3.550) | 2.840 (2.640, 3.910) * | 0.565 (0.220, 0.840) ↑ § | 3.020 (2.740, 3.775) | 2.570 (2.252, 2.865) | -0.410 (-1.168, -0.338) ↓
PC ae C42:2 | 0.223 (0.188, 0.259) | 0.250 (0.222, 0.294) * | 0.219 (0.120, 0.310) ↑ | 0.258 (0.195, 0.313) | 0.265 (0.209, 0.295) | -0.036 (-0.058, 0.041) ↓
PC ae C42:3 | 0.339 (0.243, 0.395) | 0.395 (0.256, 0.512) * | 0.078 (0.004, 0.140) ↑ § | 0.286 (0.266, 0.349) | 0.331 (0.261, 0.362) | 0.002 (-0.053, 0.040) //
PC ae C42:4 | 0.272 (0.217, 0.415) | 0.300 (0.242, 0.482) | 0.036 (0.009, 0.090) ↑ § | 0.316 (0.272, 0.410) | 0.251 (0.216, 0.297) | -0.060 (-0.134, -0.005) ↓
PC ae C42:5 | 0.875 (0.693, 0.944) | 0.984 (0.579, 1.160) | 0.066 (-0.146, 0.182) ↑ § | 1.020 (0.956, 1.293) | 0.887 (0.791, 1.051) | -0.169 (-0.365, -0.074) ↓
PC ae C44:5 | 0.421 (0.302, 0.464) | 0.430 (0.231, 0.535) | -0.009 (-0.075, 0.154) ↓ § | 0.487 (0.438, 0.639) | 0.376 (0.323, 0.492) | -0.111 (-0.149, -0.090) ↓
PC ae C44:6 | 0.319 (0.234, 0.345) | 0.342 (0.247, 0.435) | 0.053 (-0.015, 0.105) ↑ § | 0.405 (0.354, 0.517) | 0.304 (0.259, 0.361) | -0.126 (-0.177, -0.031) ↓
SM (OH) C14:1 | 2.295 (1.830, 3.050) | 3.065 (2.220, 3.380) * | 0.425 (0.160, 0.890) ↑ | 2.660 (2.023, 2.830) | 2.210 (2.152, 3.325) | 0.000 (-0.642, 0.352) //
SM (OH) C16:1 | 0.948 (0.761, 1.260) | 1.300 (1.010, 1.390) | 0.250 (0.041, 0.416) ↑ | 1.170 (0.998, 1.405) | 1.030 (0.849, 1.293) | -0.240 (-0.353, 0.054) ↓
SM (OH) C22:1 | 0.731 (0.684, 0.915) | 0.837 (0.727, 1.140) * | 0.082 (-0.004, 0.289) ↑ | 0.738 (0.574, 1.065) | 0.732 (0.516, 0.916) | -0.130 (-0.258, 0.146) ↓
SM (OH) C24:1 | 0.071 (0.053, 0.098) | 0.077 (0.052, 0.098) | 0.003 (-0.011, 0.023) ↑ § | 0.065 (0.054, 0.095) | 0.054 (0.041, 0.069) | -0.015 (-0.033, -0.010) ↓
SM C16:0 | 35.95 (31, 39.9) | 48.95 (38.7, 54.9) * | 13.3 (3.8, 18.3) ↑ § | 42.9 (30.475, 48.575) | 40.6 (37.9, 45.275) | -0.900 (-6.175, 6.100) ↓
SM C16:1 | 5.570 (5.100, 6.520) | 6.815 (5.710, 8.650) * | 1.260 (0.220, 3.190) ↑ § | 6.610 (5.390, 7.965) | 6.410 (5.200, 7.570) | 0.060 (-1.385, 0.643) ↑
SM C18:0 | 6.245 (5.380, 7.030) | 7.385 (6.880, 8.770) * | 1.300 (0.450, 2.620) ↑ § | 6.010 (5.343, 9.427) | 6.430 (4.728, 6.750) | -1.580 (-3.072, 0.520) ↓
SM C18:1 | 3.020 (2.890, 3.700) | 3.585 (2.950, 4.210) * | 0.525 (0.200, 0.840) ↑ § | 3.560 (2.887, 4.277) | 2.960 (2.325, 3.502) | -0.880 (-1.333, 0.065) //
SM C20:2 | 0.127 (0.108, 0.142) | 0.169 (0.106, 0.258) * | 0.045 (0.005, 0.116) ↑ | 0.168 (0.128, 0.182) | 0.117 (0.107, 0.199) | 0.008 (-0.041, 0.044) ↑
SM C24:0 | 1.050 (0.838, 1.380) | 1.235 (1.020, 1.560) * | 0.158 (0.050, 0.465) ↑ | 1.000 (0.747, 1.370) | 0.954 (0.882, 1.115) | -0.056 (-0.247, 0.145) ↓
SM C24:1 | 3.410 (2.740, 4.310) | 4.885 (3.580, 5.310) * | 0.565 (0.050, 2.420) ↑ | 4.060 (2.388, 5.902) | 3.680 (3.293, 4.188) | -0.580 (-1.877, 0.998) ↓
Arginine | 42.25 (29.2, 58.2) | 64 (56.1, 86) * | 23.15 (3.500, 49.20) ↑ | 40.70 (31.750, 59.850) | 69.20 (51.125, 86.200) | 25.10 (10.425, 54.075) ↑
Histidine | 64.2 (51, 72.7) | 47.1 (37.1, 56.4) * | -15.55 (-24.3, -2.4) ↓ | 64.70 (51.525, 80.650) | 61.30 (56.500, 91.850) | 1.00 (-3.750, 7.375) ↑
Lysine | 114.5 (103, 141) | 161 (135, 241) * | 49.95 (6.0, 130) ↑ | 146 (93.925, 271.750) | 190 (155.250, 217.750) | 55 (38.825, 81.575) ↑
Ornithine | 23.65 (20.1, 31.1) | 58.95 (31.2, 84.4) * | 12.60 (-2.70, 65.10) ↑ | 31.30 (22.625, 42.875) | 69.40 (47.675, 92.650) | 34.00 (15.250, 76.493) ↑
Serine | 41.05 (33.5, 50.3) | 63.1 (55.2, 84.4) * | 22.30 (5.10, 29.80) ↑ | 43.60 (38.275, 56.375) | 66.60 (54.800, 81.300) | 26.30 (7.900, 48.300) ↑
Threonine | 56.2 (37.4, 68.7) | 78.35 (60, 89.6) * | 24.70 (-3.80, 43.20) ↑ | 53.70 (42.550, 74.100) | 80.40 (72.200, 102.300) | 19.80 (15.625, 35.325) ↑
Tryptophan | 16.25 (11.9, 21.6) | 27.6 (20.5, 35.1) * | 9.335 (0.200, 19.10) ↑ | 13.50 (9.720, 30.275) | 22.20 (21.875, 70.950) | 14.88 (4.050, 28.010) ↑
Tyrosine | 40.2 (32.9, 45.9) | 50.95 (37.4, 62.4) * | 9.05 (-4.900, 31.100) ↑ | 52.00 (36.675, 62.500) | 48.10 (37.050, 90.875) | 2.80 (-2.475, 28.450) ↑
Creatinine | 91.95 (64.7, 134) | 74 (55.9, 98.1) * | -14.65 (-35.90, -7.0) ↓ | 92.40 (82.475, 132.250) | 106 (77.225, 168.500) | 1.50 (-9.450, 19.60) ↑
Met SO | 0.552 (0.001, 0.735) | 0.800 (0.591, 1.190) * | 0.220 (-0.062, 0.859) ↑ | 0.905 (0.116, 1.423) | 1.380 (0.227, 2.768) | 0.000 (-0.075, 2.196) ↓
Taurine | 27.15 (14.6, 36.7) | 15.7 (10.7, 20.3) * | -10.55 (-13.40, -4.0) ↑ | 38.60 (30.125, 40.550) | 23.10 (13.900, 29.725) | -14.20 (-25.950, - ↓
Kynurenine | 3.610 (2.530, 4.850) | 3.325 (2.630, 6.450) | 0.215 (-0.630, 1.780) ↑ § | 4.110 (2.848, 4.633) | 6.010 (4.840, 9.938) | 1.520 (1.380, 5.632) ↑
Table 5.4 - Trends of metabolite concentration (μM) from T1 to T2 in the two groups. Significant differences between T1 and T2 are marked with * (Wilcoxon signed-rank test, p-value<0.05), whereas § marks significant differences in the delta between R and NR patients (Wilcoxon rank-sum test, p-value<0.05). Note that no metabolite abundance changed significantly from T1 to T2 in the NR group.
Figure 5.7 - Trends of metabolite concentrations from T1 to T2 in the two groups. Box-plots in the top right corner show the difference in metabolite concentration between T1 and T2 expressed as delta (Δ = T2 - T1). We performed the Wilcoxon rank-sum test between the deltas of the two groups and the Wilcoxon signed-rank test between T1 and T2 in each group separately. Significant differences are marked with * (p-value<0.05). Only four metabolites are plotted as an example.
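The paired comparison between T1 and T2 reported in Table 5.4 and Figure 5.7 can be illustrated with a minimal sketch of an exact two-sided Wilcoxon signed-rank test. The function name `signed_rank_exact_p` and the concentration values are hypothetical and are not taken from the study, and the statistical software actually used for the thesis is not reproduced here. The sketch assumes one common convention: zero differences are discarded and tied absolute differences share their average rank.

```python
from itertools import product


def signed_rank_exact_p(t1, t2):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples."""
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [b - a for a, b in zip(t1, t2) if b != a]
    n = len(diffs)
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely:
    # enumerate all 2^n patterns (feasible only for small cohorts).
    as_extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            as_extreme += 1
    return as_extreme / 2 ** n


# Hypothetical concentrations (μM) of one metabolite in five patients.
t1_values = [1.0, 2.0, 3.0, 4.0, 5.0]
t2_values = [2.0, 4.0, 6.0, 8.0, 10.0]
p_value = signed_rank_exact_p(t1_values, t2_values)
```

With only five pairs the smallest attainable two-sided p-value is 2/32, which shows why small groups limit the power of this test.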
5.3.4 Regression analysis for targeted metabolomics data
We used classification models to identify the set of features most strongly
associated with the target class, i.e. the patients not responsive to therapy (NR group). The coefficients of the
models obtained from metabolite concentrations only are reported in Table 5.5. As explained in
the previous chapter, interpreting the coefficients of a logistic regression is not trivial.
If we express the odds as the exponential of a linear combination of the independent variables, we
can say that if the coefficient βᵢ is positive, then an increase of the feature xᵢ is associated with
an increase of the odds, i.e. the probability of belonging to class 1 rises relative to that of class 0,
all other variables being equal. On the contrary, if the coefficient βᵢ is negative, then an increase
of the feature xᵢ is associated with a decrease of the odds, i.e. the probability of belonging
to class 1 falls relative to that of class 0.
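The sign rule above can be made concrete with a short sketch. The two coefficients are read from the first column of Table 5.5 (10 features, minimal deviance); the helper names, the zero intercept and the one-unit increase are assumptions made only for illustration, and the unit of xᵢ depends on how the features were scaled, which is not restated in this section.

```python
import math


def odds_multiplier(beta, delta_x=1.0):
    """Factor by which the odds of class 1 change when x_i grows by delta_x."""
    return math.exp(beta * delta_x)


def probability(intercept, betas, xs):
    """P(class 1) under a logistic model with the given coefficients."""
    eta = intercept + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-eta))


# Coefficients from Table 5.5 (10 features, minimal deviance).
beta_alanine = 0.919
beta_pc_ae_c40_2 = -0.551

# A positive coefficient multiplies the odds by a factor above 1,
# a negative coefficient by a factor below 1.
alanine_factor = odds_multiplier(beta_alanine)
pc_factor = odds_multiplier(beta_pc_ae_c40_2)

# Hypothetical single-feature models with a zero intercept (assumption).
p_alanine = probability(0.0, [beta_alanine], [1.0])
p_pc = probability(0.0, [beta_pc_ae_c40_2], [1.0])
```

Here class 1 is the NR group, so a factor above 1 means that higher values of the feature push a patient towards non-responsiveness.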
Three metabolites were selected in all models: PC ae C40:2, PC ae C38:0 and alanine. Figure
5.8.A shows the coefficient values of the model built according to the criterion of minimal deviance
on the first 20 ranked features. All the obtained models correctly classify the observations in the
testing set.
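As a rough illustration of the minimal-deviance criterion, the sketch below computes the binomial deviance of a set of predicted probabilities and picks, among candidate penalty values λ, the one with the lowest deviance. It assumes the standard definition of deviance for a binary outcome; the function names, the λ values and the predictions are hypothetical, and this section does not restate on which data split the deviances in Table 5.5 were evaluated.

```python
import math


def binomial_deviance(y_true, p_hat, eps=1e-12):
    """Deviance = -2 * log-likelihood of binary labels under p_hat."""
    log_likelihood = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        log_likelihood += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * log_likelihood


def pick_min_deviance(candidates):
    """Return the penalty whose predictions give the lowest deviance.

    candidates maps a penalty value to a (labels, predicted probabilities) pair.
    """
    return min(candidates, key=lambda lam: binomial_deviance(*candidates[lam]))


# Hypothetical labels (1 = NR, 0 = R) and predictions for three penalties.
labels = [1, 0, 1, 0]
candidates = {
    0.01: (labels, [0.9, 0.2, 0.8, 0.1]),
    0.10: (labels, [0.7, 0.4, 0.6, 0.3]),
    1.00: (labels, [0.5, 0.5, 0.5, 0.5]),
}
best_lambda = pick_min_deviance(candidates)
```

A model that predicts 0.5 for every observation has deviance 2·n·ln 2, which is a useful reference when reading the deviance values reported with the coefficients.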
METABOLITES  10 features (min Dev)  10 features (fixed λ)  20 features (min Dev)  20 features (fixed λ)  30 features (min Dev)  30 features (fixed λ)
PC ae C40:2 -0.551 -0.500 -1.129 -0.306 -0.231 -0.266
Glu -0.915 - - -
PC ae C38:0 -0.616 -0.460 -0.913 -0.415 -0.265 -1.002
PC ae C34:3 - - - - - -0.447
PC aa C36:6 - - - - -0.040 -0.518
PC aa C38:1 -0.222 -0.102 -0.535 - - -0.082
PC ae C40:5 - - -0.534 -0.194 - -0.470
lysoPC a C20:3 -0.369 -0.321 -0.384 -0.140 -0.039 -0.236
lysoPC a C18:0 -0.070 -0.099 -0.336 - - -
lysoPC a C18:1 - - - - -0.017 -0.040
PC ae C44:6 - - - -0.342 -0.162 -0.473
PC ae C40:6 -0.183 -0.283 -0.316 - -
PC ae C38:5 - - -0.316 -0.327 -0.348 -0.051
SM C16:1 -0.219 -0.055 -0.209 - -0.185 -
lysoPC a C16:0 - - -0.186 -0.131 -0.070 -0.146
lysoPC a C16:1 - - - - - -0.211
Cit - - - 0.038 - 0.297
PC aa C38:1 - - - -0.065 - -
PC aa C38:0 - - - -0.008 - -
Tyr -0.249 - 0.328 - - -
His - - 0.421 0.084 0.243 0.256
Kynurenine - - - 0.527
Ala 0.919 0.403 1.315 0.345 0.252 0.867
sugars - - 1.518 - - -
Performance Dev=15.58 Dev=19.44 Dev=2.29 Dev=22.34 Dev=15.17 Dev=18.76
Table 5.5 - Coefficient values of the logistic regression models for the first 10, 20 and 30 features, computed according to the two strategies (minimal deviance and estimated λ). The coefficients of the metabolites which are selected by all models are in bold. The bottom row reports the values of deviance obtained from each model.
5.3.5 Regression models for targeted and untargeted metabolomics data
We built the integrated models by using 10 features from untargeted metabolomics data
and the first 20 ranked metabolites from targeted analysis, as explained in the Methods section. The
selected set of features again includes lysoPCs, PCs and alanine. Moreover, six
further species not measured by targeted analysis entered the models: stearic acid, palmitoleic
acid, palmitic acid, oleic acid, myristic acid and citric acid. The coefficients of the models are
reported in Table 5.6. Figure 5.8.B shows the coefficients of the model built according to the
criterion of minimal deviance on the first 20 ranked features.
We can notice that alanine, PC aa C38:0, PC ae C38:1, myristic acid and palmitoleic acid are
selected by all models. It is worth underlining that PC aa C38:0 and alanine have coefficients with
the same sign as in the models built on targeted metabolomics data only.
Figure 5.8 - Coefficient values of the logistic regression models for targeted metabolomics (panel A) and for the integration of targeted and untargeted metabolomics (panel B).
METABOLITES  10 features (min Dev)  10 features (fixed λ)  20 features (min Dev)  20 features (fixed λ)  30 features (min Dev)  30 features (fixed λ)
PC ae C40:2 - -0.260 -1.982 - -0.556 -0.203
PC ae C40:5 -0.556 -0.471 - -1.645 -0.445 -0.304
lysoPC a C20:3 - - -1.604 -1.279 - -0.140
lysoPC a C18:0 - - -1.056 -
lysoPC a C16:0 - - -1.003 -0.146 - -0.112
PC ae C44:6 -0.915 -0.563 -0.970 -3.321 - -0.517
SM C16:1 - - -0.933 - - -
Stearic acid - - -0.785 -0.306 - -
Palmitoleic acid -0.513 -0.318 -0.724 -1.039 -0.098 -0.237
Palmitic acid - -0.102 -0.717 -0.731 -0.256 -0.187
PC ae C38:5 - - -0.711 -0.137 - -
Oleic acid - - -0.668 -0.569 -0.552 -0.218
Myristic acid -0.212 -0.180 -0.521 -0.385 -0.056 -0.054
PC aa C38:1 -0.539 -0.121 -0.394 -0.747 -0.209 -0.096
Citric acid - - -0.283 0.977 - -
PC ae C40:5 - - - - - -
PC aa C38:0 - - -0.092 - - -0.032
PC ae C38:0 -0.531 -0.442 -0.041 -2.174 -0.063 -0.518
PC ae C38:4 - - - -0.688 - -0.022
PC ae C38:5 - - - - -0.063 -0.303
Cit - - 0.376 0.789 0.036 0.067
His - - - - 0.377 0.198
Ala 1.010 0.430 0.353 1.953 0.105 0.292
Performance Dev=10.75 Dev=18.27 Dev=0.36 Dev=22.23 Dev=5.67 Dev=23.21
Table 5.6 - Coefficient values of the logistic regression models for the first 10, 20 and 30 features, computed according to the two strategies (minimal deviance and estimated λ). The coefficients of the metabolites which are selected by all models are in bold. The bottom row reports the values of deviance obtained from each model.
5.3.6 Discriminant analysis
The coefficient values of the LDA models and the VIP scores of the PLS-DA models built on
the first 10 and 20 ranked features after mRMR are reported in Tables 5.7 and 5.8 for targeted
metabolomics and for targeted and untargeted metabolomics data, respectively. We could not use the
entire subset of 30 features because of the small number of observations (21 patients only). In fact,
as explained in §2.4, the computation of the boundary region requires the covariance matrix to
be invertible, which is not the case here.
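The invertibility problem can be checked numerically. The sketch below builds the sample covariance matrix of a randomly generated dataset with 21 observations and 30 features (hypothetical data, not the study measurements) and computes its rank with exact rational arithmetic. Because the data are centered on the feature means, the rank cannot exceed the number of observations minus one, so a 30 × 30 covariance matrix estimated from 21 patients is necessarily singular. The function names are illustrative only.

```python
import random
from fractions import Fraction


def covariance(rows):
    """Sample covariance matrix (features x features) in exact arithmetic."""
    n, p = len(rows), len(rows[0])
    means = [sum(Fraction(r[j]) for r in rows) / n for j in range(p)]
    centered = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def matrix_rank(matrix):
    """Rank by Gaussian elimination; exact because entries are Fractions."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


random.seed(0)
n_patients, n_features = 21, 30
# Hypothetical integer "concentrations"; any real dataset obeys the same bound.
data = [[random.randint(0, 1000) for _ in range(n_features)]
        for _ in range(n_patients)]
cov_rank = matrix_rank(covariance(data))
is_invertible = cov_rank == n_features
```

Reducing the number of features below the number of observations, as done here with the 10 and 20 top-ranked features, is one way to stay away from this bound.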
In the targeted metabolomics model, it is worth underlining that PC ae C38:0, which already
played an important role in the regression models, occupies the first position in the VIP ranking
when considering 20 features. Similarly, PC ae C40:2 is in
the first position when considering 10 features. This latter metabolite also has the highest score in
the integrated model. Three-dimensional PLS-DA score plots on 20 features for the two models are
shown in Figure 5.9 (metabolites only in panel A; integrated model in panel B). In both cases, the
models misclassify only one subject (an NR patient in the targeted model and an R patient in the
integrated one).
Figure 5.9 -Three-dimensional PLS-DA score plots on 20 features for metabolites model (panel A) and for the integrated metabolomics and proteomics model (panel B).
METABOLITES | VIP - PLS DA 20 | VIP - PLS DA 10 | COEF LDA
PC ae C38:0 | 1.351 | - | -
PC ae C38:5 | 1.275 | - | -
PC ae C44:6 | 1.210 | - | -
PC ae C40:2 | 1.209 | 1.263 | -3.535
PC ae C40:6 | 1.207 | 1.241 | 2.699
PC ae C40:5 | 1.176 | - | -
PC aa C38:0 | 1.171 | 1.185 | -4.278
PC ae C38:4 | 1.090 | - | -
Ala | 1.049 | 1.069 | 1.978
His | 0.939 | - | -
PC aa C38:6 | 0.918 | 0.972 | -0.448
PC aa C38:1 | 0.875 | 0.941 | -0.612
lysoPC a C16:0 | 0.851 | - | -
lysoPC a C20:3 | 0.850 | 0.830 | -2.231
lysoPC a C18:0 | 0.823 | 0.906 | 1.343
Cit | 0.816 | - | -
SM C16:1 | 0.812 | 0.888 | 0.850
Glu | 0.782 | 0.431 | -
Tyr | 0.579 | - | -1.058
sugars | 0.487 | - | -
Table 5.7 – VIP scores of PLS-DA and coefficients of LDA for the targeted metabolomics models.
METABOLITES | VIP - PLS DA 20 | VIP - PLS DA 10 | COEF LDA
PC ae C40:2 | 1.431 | 1.271 | -2.002
PC ae C40:6 | 1.418 | - | -
PC aa C38:0 | 1.383 | 1.244 | -3.779
Palmitoleic acid | 1.168 | 0.987 | -0.909
Myristic acid | 1.162 | 1.053 | 2.412
PC aa C38:6 | 1.144 | - | -
Ala | 1.134 | 0.874 | 1.327
PC aa C38:1 | 1.120 | 0.964 | -0.671
Oleic acid | 1.103 | - | -
Palmitic acid | 1.088 | 0.926 | -3.425
lysoPC a C18:0 | 1.058 | 0.903 | 2.225
SM C16:1 | 0.995 | 0.871 | 0.285
lysoPC a C20:3 | 0.960 | 0.791 | -4.209
Stearic acid | 0.775 | - | -
Citric acid | 0.667 | - | -
L-Acetylcarnitine | 0.631 | - | -
Tyr | 0.479 | - | -
Pyruvic acid | 0.362 | - | -
Kynuramine | 0.343 | - | -
L-Lactic acid | 0.334 | - | -
Table 5.8 – VIP scores of PLS-DA and coefficients of LDA for the integrated targeted and untargeted metabolomics models.
5.4 EXPLORATIVE ANALYSES
Explorative analyses by probabilistic graphical models were performed to
highlight dependencies among features and to verify whether these
dependencies differ between R and NR patients. To this purpose, we considered the full dataset (i.e. the
values of 130 metabolite concentrations expressed as delta) to build a Markov Network (MN) for R
and NR patients separately. We adopted a two-step approach. First, the maximum-likelihood
network was found by applying the algorithm of Chow and Liu[118]. Afterwards, a forward search
was performed on each triangulated graph: the algorithm repeatedly adds the edge that optimizes
a selected measure until no more add-eligible edges are found. For both steps, the measure to be
minimized is the Bayesian Information Criterion, as described in Chapter 2.
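The first step can be sketched as follows. The Chow and Liu algorithm finds the maximum-likelihood tree by computing a maximum-weight spanning tree in which each edge is weighted by the mutual information between the two variables. The sketch assumes jointly Gaussian variables, for which the pairwise mutual information is -0.5·ln(1 - ρ²); the function names, the four node labels and the correlation values are hypothetical. It covers neither the BIC penalty nor the subsequent forward search, and it is not the software used for the thesis.

```python
import math


def gaussian_mutual_information(rho):
    """Mutual information of two jointly Gaussian variables (|rho| < 1)."""
    return -0.5 * math.log(1.0 - rho * rho)


def chow_liu_tree(names, corr):
    """Maximum-weight spanning tree (Kruskal) over pairwise mutual information."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            edges.append((gaussian_mutual_information(corr[i][j]), i, j))
    edges.sort(reverse=True)  # strongest dependencies first

    parent = list(range(len(names)))  # union-find forest to detect cycles

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((names[i], names[j]))
    return tree


# Hypothetical correlation matrix of the deltas of four metabolites.
names = ["m1", "m2", "m3", "m4"]
corr = [
    [1.0, 0.9, 0.5, 0.1],
    [0.9, 1.0, 0.6, 0.2],
    [0.5, 0.6, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
tree = chow_liu_tree(names, corr)
```

A tree over n nodes always has n - 1 edges; the forward search described above then adds further edges only when they improve the selected criterion.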
The two networks obtained for R and NR patients are shown in Figure 5.10. We followed the
same procedure already described in paragraph 4.4, i.e. we isolated the “hubs” (colored in Figure
5.10) and we counted the number of “leaves”. By comparing the two networks, we can notice that they
have different structures and different hubs. More precisely, the R group network has two hubs
which are connected with six direct neighbors, whereas the NR group network has three hubs
connected with seven direct neighbors. The number of leaves was 41 in the R model (31.5%) and 31
in the NR model (23.8%). We concentrated on the neighborhood of the different hubs up to the second
node. In R patients, the hubs are constituted by SM C16:0 and Isoleucine (Ile), as reported in Figure
5.11. As for NR patients, the three hubs are constituted by SM OH C22:1 and by the amino acids
Glycine (Gly) and Histidine (His), as shown in Figure 5.12.
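Counting hubs and leaves amounts to inspecting node degrees. The following sketch works on a small hypothetical edge list; the function name and the degree threshold used to call a node a hub are assumptions, since the exact criterion is the one described in paragraph 4.4 and is not restated here. The last two lines only re-derive the percentages quoted above from the leaf counts and the 130 nodes of each network.

```python
from collections import Counter


def degree_summary(edges, n_nodes, hub_degree):
    """Return hubs, leaves and the percentage of leaves in an undirected graph."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    leaves = sorted(node for node, d in degree.items() if d == 1)
    hubs = sorted(node for node, d in degree.items() if d >= hub_degree)
    return hubs, leaves, 100.0 * len(leaves) / n_nodes


# Hypothetical toy network: node "h" has three neighbors, "c" links "h" to "d".
toy_edges = [("h", "a"), ("h", "b"), ("h", "c"), ("c", "d")]
hubs, leaves, leaf_percentage = degree_summary(toy_edges, n_nodes=5, hub_degree=3)

# Percentages of leaves reported in the text (130 metabolites per network).
r_leaf_percentage = round(100 * 41 / 130, 1)
nr_leaf_percentage = round(100 * 31 / 130, 1)
```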
Figure 5.10 - MN model of metabolite concentrations in R (left panel) and NR (right panel) patients, highlighting the hubs (2 for R and 3 for NR).
Figure 5.11– Visualization of the hub nodes in R group. The neighborhood of vertex SM C16:0 (red, left panel) and the neighborhood of vertex Ile (blue, right panel), both including only nodes within a radius of two.
Figure 5.12 - Visualization of the hub nodes in the NR group. The neighborhoods of vertex SM OH C22:2 (red, upper left panel), Gly (blue, upper right panel) and His (green, bottom panel), including only nodes within a radius of two.
5.5 DISCUSSION
The results presented in this study highlighted biological pathways that could have a clinical
impact on septic shock progression and management. Improvement of organ function as assessed
by a drop in SOFA score in the first days of sepsis and septic shock has been shown to be associated
with improved outcomes[128], [129] but the mechanisms behind organ improvement remain to be
fully elucidated. We performed a comprehensive metabolomics study of septic shock patients
stratified into responders and non-responders according to changes in SOFA score in the first 2 days
of ICU stay.
By combining untargeted and targeted metabolomics approaches, i.e. untargeted MS data
acquisition and high-resolution MRM transitions targeting multiple
metabolites, we obtained a wider picture of the patients' metabolic states and of their metabolic
trajectory during the first 48 hours after ICU admission. Univariate analysis and the classification
models confirm that the NR group presents an overall lipidome alteration, as previously reported[98],
[100]–[102]. Here we showed that in NR patients specific lysophosphatidylcholine species (lysoPC
C16:0, C16:1, C18:0, C18:1, C18:2, C20:3) did not significantly change from T1 to T2, whereas in R
patients they significantly increased and were markedly higher than in NR patients at T2 (see
Figure 5.4 and Table 5.5). In addition, their respective free fatty acids, such as palmitic (C16:0),
palmitoleic (C16:1), stearic (C18:0) and oleic (C18:1) acids, significantly decreased at T2 in the NR group
only. The role of lysophosphatidylcholines (lysoPCs) in metabolism is very complex. They are
primarily generated by the activity of the phospholipase A2 (PLA2) enzyme and, like the enzyme, have a direct
role in toxic inflammatory responses. Reduced plasma lysoPC levels have been noted in sepsis
patients, and systemic treatment with lysoPCs has been shown to be therapeutic in rodent models of
sepsis and ischemia[130]. These observations suggest that elevation of plasma levels of these lipids
can actually help to relieve serious inflammatory conditions. Cunningham et al.[130] demonstrated
that specific lysoPCs act as uncompetitive product inhibitors of plasma secreted PLA2 enzymes,
especially under conditions of elevated enzyme activity, thus providing a feedback mechanism for
the observed anti-inflammatory effects of these compounds. Therefore, the reduction in circulating
lysoPC observed in NR patients may simply reflect their enhanced conversion to lysophosphatidic
acid, which is known to induce a multitude of cellular responses through its action on immunologically
relevant cells[115]. It is conceivable that lysoPC reduction may also promote an excessive immune
response with detrimental effects in NR patients[99],[100].
Interestingly, a decrease in circulating levels of lysoPC C16 and C18 species has also been
reported in inflammatory liver disease[131]–[133]. Notably, NR patients also showed a marked
decrease in PC species, which originate in the liver. The imbalance of lysoPC/PC cycling might
suggest that hepatic homeostasis and functionality are compromised even before clinical
manifestation and that bilirubin alone cannot give a clear picture of the liver condition. In addition, NR
patients had lower levels of PC species containing long chain polyunsaturated fatty acids (LCPUFAs),
such as PC aa C38:6, PC aa C36:6, PC aa C40:5 with further elongation/desaturation products. This
profile is in agreement with our previous finding of a different composition of PC species as potential
metabolic determinants of mortality in septic shock patients[117] (see Chapter 3). Here again we
can speculate that, since LCPUFAs reduce T-cell activation and dampen inflammation[112], a
decrease in PC containing LCPUFAs can hamper their protective effects, including a concerted action
of either withdrawing pro-inflammatory eicosanoids or incrementing anti-inflammatory
eicosanoids. As a matter of fact, eicosanoid and pro-resolving lipid profiles have recently been
correlated with survival and clinical outcome in sepsis[134].
The multivariate models showed that a lower variation in plasmalogen concentration
(plasmenylcholines PC ae C44:6, PC ae C40:2, PC ae C40:5, plasmanylcholine PC ae C38:0), in lysoPC
C16:0, and in fatty acids, in combination with a higher increment of alanine, was associated with non-
responsiveness (Figure 5.5). Plasmalogens serve as endogenous antioxidants, mediators of
membrane structure and dynamics, storage for polyunsaturated fatty acids and lipid
mediators[135]. Increasing plasmalogen levels protect human endothelial cells during hypoxia[136].
The reduced plasmalogen abundance in the NR group might reflect an increased oxidative
imbalance, probably due to an exaggerated systemic inflammatory response with a resulting
elevated level of oxidative stress. Diminished plasmalogen levels have been reported as a surrogate
marker of oxidative stress in elderly septic patients[137]. Furthermore, an exaggerated systemic
inflammatory response in NR patients would be in accordance with the observed increased levels of
kynurenine, supporting the role of an accelerated tryptophan catabolism along the kynurenine
pathway in sepsis outcome[110],[117].
A novelty of this study is the emerging role of alanine. Alanine is a gluconeogenic amino acid
and plays a key role in the glucose-alanine cycle, a series of reactions in which amino groups and carbons
from muscle are transported to the liver, as schematized in Figure 5.13. When muscles degrade
amino acids for energy needs, the resulting nitrogen is transaminated to pyruvate to form alanine.
This is performed by the enzyme alanine transaminase, which converts L-glutamate and pyruvate
into α-ketoglutarate and L-alanine. The resulting L-alanine is shuttled to the liver where the nitrogen
enters the urea cycle and the pyruvate is used to make glucose.
Figure 5.13– Schema of the glucose-alanine cycle.
Enhanced production of glucose by the liver (hepatic gluconeogenesis) is a prominent
feature of the solid organ response to injury and provides fuel to the cellular elements of the
inflammatory response. The increase of plasma alanine in NR patients may be a sign of lower hepatic capacity
for conversion of alanine to glucose. The higher levels of pyruvic acid and lactic acid found in NR
patients at T2 (Figure 5.1) seem to further support this interpretation.
5.6 REMARKS
In conclusion, the data presented here reinforce the emerging evidence that lipidome
alteration plays an important role in the individual patient response to infection. Moreover, changes
in the levels of metabolites over time have been shown to discriminate positive response to therapy.
Understanding the regulatory pathways of lipids is thus crucial for the development of an effective
and tailored therapy. Furthermore, the emerging role of alanine could suggest a different approach
for monitoring hepatic functionality, which would be more specific than bilirubin. Further
studies should investigate whether metabolic dysregulation could be corrected by a more targeted therapy.
We acknowledge the possibility of overfitting of the classifier model to our limited set of
subjects in this investigation, despite our attempt to minimize such effects with the statistical
methods used. Furthermore, in these analyses we could not take into account all the possible
confounding factors, such as different renal and hepatic functions, type of nutrition (parenteral or
enteral) and latent insulin resistance.
6 DISCUSSION AND CONCLUSIONS
The main purpose of this PhD thesis was the analysis and integration of metabolomics data
in septic shock patient cohorts by applying data mining approaches. In particular, machine learning
tools were used to identify the main pathways associated with septic shock progression. In this study
the datasets analyzed included more than one hundred features and 20 patients or fewer, i.e. the
number of features is much greater than the number of observations. It is worth underlining that
this situation is not unusual when the data come from clinical trials, and omics datasets are indeed
characterized by a huge number of features (hundreds for targeted metabolomics or thousands for
untargeted metabolomics). It was necessary to develop a suitable strategy by combining different
data mining techniques tailored to the specific scientific question and to the kind of data
considered.
In this dissertation, we demonstrated the feasibility and the robustness of the proposed
approaches: the performance of our models is good and the species identified by the models are
in line with other studies performed on larger populations and with investigations on the identified
pathways. To the best of our knowledge, no other scientific work has applied data mining techniques
to perform multilevel omics analyses with the aim of finding associations between plasma metabolome
changes and mortality or responsiveness to therapy. Thus, the obtained results represent a
significant advance in the field and could be an important step forward in identifying putative
biological pathways to be further investigated.
The following sections summarize the main achievements and results of the thesis, the
limits of the study and its possible clinical impact. Future directions of the project close the
dissertation.
6.1 MAIN FINDINGS
The analyses performed on both the ALBIOS and ShockOmics datasets showed an overall
lipidome alteration associated with poor prognosis. This suggests an impairment of the energetic
metabolism and of mitochondrial functionality. Both aspects are thus determinants of pathology
progression and could constitute an important target for the development of a new therapy.
Interestingly, the liver also seems to play a crucial role in the evolution of septic shock. In
fact, from the analyses performed on ALBIOS data, increased plasma levels of gluconeogenic amino
acids indicated the occurrence of an early hepatic dysfunction in non-survivors. An
impairment of hepatic gluconeogenesis, associated with poor prognosis (i.e. non-responsiveness to
therapy), was also supported by the results obtained from the ShockOmics study. More precisely,
the emerging role of alanine, a gluconeogenic amino acid involved in the glucose-alanine cycle,
could suggest a new possible marker of liver functionality, which may eventually complement the
information of the commonly measured bilirubin or may give a new picture of patient progress. The
measurement of total blood bilirubin is currently used in clinics to assess liver function but it is an
unspecific test, performed to diagnose and/or monitor several different diseases (e.g., cirrhosis,
hepatitis, cancer, etc.). As a consequence, it only reflects problems in the liver due to deficiencies in
one specific metabolite, without providing any clue on shock progression.
Lastly, when we integrated metabolomics and proteomics data, our findings shed light on
other pathophysiological mechanisms, i.e. coagulation and the complement system. Although the
impairment in coagulation is already known to occur in septic shock patients, the integration of
proteomics and metabolomics data revealed the interplay among the different pathways involved,
thus highlighting previously unsuspected interactions (e.g. coagulation and lipid metabolism).
In conclusion, these findings confirmed the feasibility of our approach and the robustness of
our models, in spite of the limited number of patients. As for the pathophysiological pathways
identified from our analyses, they are not only in line with recent findings but also underline
some new interesting metabolic mechanisms, such as the glucose-alanine cycle, which deserve
further investigation.
6.2 LIMITS AND CLINICAL IMPACT OF THE STUDY
Even if the results of the current work are very promising, some limitations must be
discussed. As already underlined, an important limit of the study is the small size of
the datasets used to build the classification models. However, we tried to reduce the confounding
factors by focusing on a homogeneous group of patients, i.e. those with severe septic shock. We thus
hypothesized that the changes observed are mainly related to shock progression and different
prognoses. Another limitation is that metabolite concentrations were measured at only two time
points (7 days apart in one dataset and 48 hours apart in the other one). We argue that a
more frequent monitoring of temporal changes in metabolite concentrations might provide better insights into the
pathways activated at different stages of the disease.
Finally, we are aware that these results can be affected by overfitting, since we did not have
an independent validation dataset. However, we must recall that we are not interested in prediction
but in the development of an approach to describe the current datasets and to identify the main
pathways involved in pathology progression within the studied cohorts. A thorough investigation of
such pathways requires a specific experimental design, which takes into account specific organs and
not only blood plasma. The latter in fact contains enzymes and other byproducts of such pathways,
which merge in the blood stream, thus hampering a precise understanding of the molecular
mechanisms involved.
Despite these limitations, our results can be considered a further contribution to the current
clinical scenario, where the identification of biomarkers to stratify the SS patients at highest risk of
poor outcomes, or the identification of those patients who could benefit from more specific therapies,
is critical. In this context, metabolomics analyses are promising for several reasons. First, the most
widely used prognostic biomarker is currently lactate, a byproduct of anaerobic metabolism, which
is very unspecific and unsuitable to monitor the complexity and the evolution of septic shock.
Second, in the spectrum from genotype to phenotype, metabolites are most closely correlated with
phenotype, and thus they are likely to be much more correlated with disease progression.
In conclusion, we can examine our results in light of recent findings which are in line with what
we have observed. In fact, our results showed that low levels of lysoPCs are associated with poor
outcome. Reduced plasma lysoPC levels have already been observed in sepsis patients, and recently
systemic lysoPC treatment has been shown to be effective in rodent models of sepsis and
ischemia[138], [139]. These observations seem to suggest that elevation of plasma levels of these
lipids can actually help to relieve serious inflammatory conditions. Cunningham et al.[130] reported
that the two most abundant lysoPC species in plasma, i.e. palmitoyl and stearoyl lysoPC
(lysoPC(16:0) and lysoPC(18:0)), inhibit plasma circulating phospholipase A2 (PLA2) enzymes, which
are involved in the activity of the innate immune system and in several inflammatory disorders.
Consequently, under conditions of severe inflammatory stress and subsequent elevation of PLA2
enzyme activity, elevation of circulating levels of lysoPCs may promote the inhibition
of PLA2 enzymes, thus favoring cytoprotection. In light of these findings, lysoPC therapies may be of
great utility if administered systemically at appropriate levels and at the appropriate point in the
inflammatory cascade.
The integration analyses performed on metabolomics and proteomics data shed light on the
interplay between lipid metabolism, inflammation and coagulation. Inflammation and coagulation
are strictly linked; therefore, it has been suggested to use agents that interfere with the pathogenesis
of sepsis by modulating both pathways. To this purpose, Falcone et al.[140] analyzed patients with
septic shock from community-onset pneumonia, in order to evaluate whether any specific
therapeutic intervention was associated with improved survival. Some of the enrolled patients were
treated with aspirin, which is commonly used in clinical practice for both its anti-inflammatory and
anti-coagulant properties. Interestingly, it was found that this drug had a beneficial effect, and
patients taking aspirin showed a reduction in the 30-day mortality rate compared to non-aspirin
users (4.9 % vs 23.4 %, respectively). Whether this is due to an anti-inflammatory effect or is rather the
result of an augmented pro-inflammatory cytokine response and pathogen clearance, as suggested
by Kiers et al.[141], is still unclear. However, even if a specific analysis is lacking in SS patients, it is
known that lysoPC and lipids are affected by aspirin[142]. If low-dose aspirin administration (i.e. 100
mg/day) could constitute a putative treatment for septic shock, a metabolomics approach could
help in understanding the pathways involved and thus in stratifying the patients who will benefit
from this therapy.
6.3 FUTURE DEVELOPMENTS
Further investigations are needed to better elucidate important pathophysiological
mechanisms involved in septic shock progression, so as to suggest novel targets for the administration
of a timely and effective therapy.
Animal experiments are currently ongoing and will be used to validate our results. In fact,
for validation, controlled experimental conditions of septic shock reduce the commonly large
variability in the data collected in a clinical setting, where the timing of blood samples is crucial and
no baseline is available. Animal studies will be used to fine-tune the hypotheses generated from the
clinical trial, on the basis of the results of the omics analyses. This will allow us to reinforce or refine
our assumptions on the trigger mechanisms of septic shock and, eventually, to define new
therapeutic targets.
Analyses on cardiogenic shock patients enrolled in the ShockOmics clinical trial will also be
performed. Some preliminary correlation analyses, done to qualitatively compare cardiogenic shock
patients and septic shock ones, are reported in Appendix C. The goal of these investigations will be
to analyze both groups in order to identify common pathways associated with disease progression;
for example, similarities in the inflammation process, derived from infection in one case and from
ischemia/reperfusion injury in the other, will be evaluated. This information will be very useful to understand
the molecular mechanisms that trigger MOF and heart failure and that may be independent
of the root cause of shock.
Eventually, by merging the information gained from animal experiments and from the
analyses on cardiogenic shock patients, we aim to identify inflammatory mediators and molecular
markers activated in shock and to provide a list of putative biomarkers and pathways involved in
the progression of this syndrome. This knowledge would in fact be crucial to guide a timely, early
goal-directed therapy and a personalized treatment.
Appendix
A METABOLOMICS ANALYSES
A.1 UNTARGETED METABOLOMICS BY FLOW INJECTION-TOF-MS
A rapid untargeted analysis by flow injection-TOF-MS was performed to screen for metabolic
features significantly characterizing the responsiveness (R group) and non-responsiveness (NR
group) to therapy in septic shock patients.
A.1.1 Sample preparation
Metabolites were extracted by adding four volumes of cold methanol to the plasma sample
(10 μL); samples were vortexed and incubated at -20°C for 1 hour. They were then centrifuged for 10
min at 14,000 x g, and the supernatant was collected, dried in a SpeedVac and resuspended in 50 μL
of 0.1% formic acid[143]. A portion (15 μL) of the metabolite extract was analyzed by mass
spectrometry.
A.1.2 Flow Injection-TOF MS/MS
The analysis was performed on an Agilent 1290 Infinity Series system coupled to an Agilent 6550
iFunnel Q-TOF mass spectrometer (Agilent, Santa Clara, CA) equipped with an electrospray source
operated in negative and positive mode. The flow rate was 150 μL/min, with a mobile phase consisting
of isopropanol/water (60:40, v/v) buffered with 5 mM ammonium at pH 9 for negative mode and
methanol/water (60:40, v/v) with 0.1% formic acid at pH 3 for positive mode. Reference masses for
internal calibration were infused continuously during the analysis (m/z 121.050873 and
922.009798 for positive ionization; m/z 112.9856 and 1033.9881 for negative ionization). Mass spectra were
recorded from m/z 50 to 1100. Source temperature was set to 320°C with 15 L/min drying gas and
a nebulizer pressure of 35 psig. Fragmentor, skimmer, and octopole voltages were set to 175, 65,
and 750 V, respectively. MS/MS fragmentation patterns of the significant features were collected
and used to confirm metabolite identity.
A.1.3 MS Data Processing
All steps of data processing and analysis were performed with Matlab R2016a (The
MathWorks, Natick, MA) using in-house developed scripts following the workflow proposed by
Fuhrer[125]. Centroid m/z lists were exported to .csv format. Briefly, in this procedure, we applied
a cut-off to filter out peaks with fewer than 500 ion counts in negative ionization and 1000 ion counts in positive
ionization, to avoid detecting background noise. Centroid m/z lists from different samples were
merged into a single matrix by binning the accurate centroid masses within the tolerance given by the
instrument resolution (about 10 ppm). The output m x n matrix contains the intensities of the m mass
peaks for each of the n analyzed samples. Because mass axis calibration is applied online during
acquisition, no m/z correction was applied during processing to correct for potential drifts.
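As an illustration only, the filtering and binning steps described above can be sketched in Python. This is a minimal sketch and not the in-house Matlab script used in this work: the function names, the greedy binning rule (a peak joins the current bin if it lies within the tolerance of the running bin mean) and the toy peak lists are assumptions made for the example.

```python
# Hypothetical sketch: intensity cut-off followed by ppm-based m/z binning.

def filter_peaks(peaks, min_counts):
    """Keep only (mz, intensity) centroids with at least min_counts ion counts."""
    return [(mz, inten) for mz, inten in peaks if inten >= min_counts]


def bin_centroids(samples, tol_ppm=10.0):
    """Merge per-sample centroid lists into one (m peaks) x (n samples) matrix.

    samples: list of peak lists, one per sample, each a list of (mz, intensity).
    Returns (bin_mz, matrix), where matrix[i][j] is the intensity of bin i
    in sample j (0.0 if the mass was not detected in that sample).
    """
    # Pool all centroids and sort them by m/z, remembering the sample index.
    pooled = sorted(
        (mz, inten, j) for j, peaks in enumerate(samples) for mz, inten in peaks
    )
    bins = []  # each bin: [sum of m/z values, number of centroids, {sample: intensity}]
    for mz, inten, j in pooled:
        if bins:
            center = bins[-1][0] / bins[-1][1]
            if abs(mz - center) / center * 1e6 <= tol_ppm:
                bins[-1][0] += mz
                bins[-1][1] += 1
                bins[-1][2][j] = bins[-1][2].get(j, 0.0) + inten
                continue
        bins.append([mz, 1, {j: inten}])
    bin_mz = [b[0] / b[1] for b in bins]
    matrix = [[b[2].get(j, 0.0) for j in range(len(samples))] for b in bins]
    return bin_mz, matrix


# Toy negative-mode example (cut-off of 500 ion counts, as in the text).
sample_a = filter_peaks([(90.0550, 1200.0), (147.0764, 300.0), (180.0634, 5000.0)], 500)
sample_b = filter_peaks([(90.0553, 900.0), (180.0640, 4500.0)], 500)
bin_mz, matrix = bin_centroids([sample_a, sample_b], tol_ppm=10.0)
```

In this toy case the peak at m/z 147.0764 is discarded by the cut-off, and the two remaining masses of each sample fall within 10 ppm of each other, so the result is a matrix with two rows (masses) and two columns (samples).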
A.1.4 Metabolite identification
Metabolite identification was performed after the preliminary statistical analyses described
in Chapter 5. The statistically significant m/z values were used for batch searches on metabolomics
databases. Metabolic species were identified by matching the experimental accurate mass and tandem
mass spectra (MS/MS) in positive and negative ionization with those available in metabolomics
databases (METLIN and HMDB). Only the positively and negatively charged forms of the molecule
([M+H]+ or [M-H]-), and no additional ion variants, were considered for metabolite identification
by means of databases. We did not attempt to identify complex lipids, because their unambiguous
identification requires internal standards for each lipid class. It should be noted that a given
molecule may be represented by several different features, such as naturally occurring components
of its isotopic cluster or non-specific adduct ions. Several analytes were detected only in positive
mode, while others were observed only in negative ion mode, as already reported for plasma
samples.
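The accurate-mass part of this matching can be illustrated with a short Python sketch. It is not the batch search offered by METLIN or HMDB: the function names, the three-entry toy database and the 10 ppm search tolerance are assumptions chosen for the example (the text does not state the tolerance used for the database search).

```python
# Hypothetical sketch: matching an observed m/z to [M+H]+ or [M-H]- ions.

PROTON_MASS = 1.007276  # mass of a proton in Da

# Toy database of monoisotopic neutral masses (Da); illustrative entries only.
TOY_DATABASE = {
    "alanine": 89.047678,      # C3H7NO2
    "sarcosine": 89.047678,    # C3H7NO2, isomer of alanine
    "lactic acid": 90.031694,  # C3H6O3
}


def ion_mz(neutral_mass, mode):
    """Theoretical m/z of [M+H]+ (positive mode) or [M-H]- (negative mode)."""
    if mode == "positive":
        return neutral_mass + PROTON_MASS
    if mode == "negative":
        return neutral_mass - PROTON_MASS
    raise ValueError("mode must be 'positive' or 'negative'")


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_candidates(observed_mz, mode, database=TOY_DATABASE, tol_ppm=10.0):
    """Return (name, ppm error) for database entries within the tolerance."""
    hits = []
    for name, mass in database.items():
        err = ppm_error(observed_mz, ion_mz(mass, mode))
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


positive_hits = match_candidates(90.0550, "positive")
negative_hits = match_candidates(89.0244, "negative")
```

In this toy example the positive-mode feature matches both alanine and sarcosine, which share the same elemental formula. Accurate mass alone cannot separate such isomers, which is one reason why the MS/MS spectra were also compared.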
A.2 TARGETED METABOLOMICS
Targeted metabolomics analysis of plasma samples from study subjects was performed using
the Biocrates AbsoluteIDQ™ p180 kit (Biocrates Life Sciences AG, Innsbruck, Austria). This validated
targeted assay allows for simultaneous detection and quantification of metabolites in biological
samples in a high-throughput manner. The metabolite extracts were processed following the
manufacturer's instructions and analyzed on a triple-quadrupole mass spectrometer (AB SCIEX
triple-quad 5500) operating in multiple reaction monitoring (MRM) mode. The assay is based on PITC
(phenylisothiocyanate) derivatization in the presence of internal standards for the analysis of amino
acids and biogenic amines, which are resolved and quantified by liquid chromatography-tandem mass
spectrometry (LC-MS/MS) using scheduled MRMs. Subsequent flow injection analysis tandem mass
spectrometry (FIA-MS/MS) was performed to analyze acylcarnitines, glycerophospholipids and hexose.
MRM detection was used for quantification, applying the spectra parsing algorithm integrated into the
MetIQ software (Biocrates Life Sciences AG, Innsbruck, Austria). Concentrations were calculated and
evaluated by comparing measured analytes in a defined extracted ion count section to those of
specific labeled or non-labeled internal standards provided by the kit. The measurements are
made in a 96-well format. Seven calibration standards, five quality control samples, three zero
samples (methanol) and one blank (solvents) are integrated into the plate. The plate design is shown in
Figure A.1.
Figure A.1 - Plate design for targeted metabolomics analysis of plasma samples using the Biocrates AbsoluteIDQ™ p180 kit.
The limit of detection for the individual metabolites is set to three times the value of the “zero
samples”. The accepted average coefficient of variation (CV) of the metabolites among the biological replicates
was set to 30%, since this variation is the sum of the biological and technical ones. Based on the five
quality controls (QCs) included in the mass spectrometric analysis to monitor the instrumental
performance and evaluate the quality of the data, the CV was below 15% (technical variation). For
glycerophospholipids, the precise position of the double bonds and the distribution of the carbon
atoms in different fatty acid side chains cannot be determined with this technology. As a
consequence, the detected MRM signal is a sum of several isobaric/isomeric lipids. For example,
according to the LIPID MAPS database (www.lipidmaps.org), the signal of PC aa C36:6 can arise from at
least 15 different lipid species that differ in fatty acid composition (e.g. PC 16:1/20:5 versus
PC 18:4/18:2), in the sn-1/sn-2 position of the fatty acids (e.g. PC 18:4/18:2 versus PC 18:2/18:4) and
in the double bond positions and stereochemistry of those fatty acid chains (e.g.
PC(18:4(6Z,9Z,12Z,15Z)/18:2(9Z,12Z)) versus PC(18:4(9E,11E,13E,15E)/18:2(9Z,12Z))).
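The detection limit and CV criteria stated at the beginning of this paragraph can be illustrated with a short Python sketch. It is only an illustration and not the vendor software: the function names and the example values are hypothetical, and summarizing the three zero samples by their median is an assumption, since the text only refers to "the value" of the zero samples.

```python
# Hypothetical sketch: limit of detection and coefficient of variation checks.
from statistics import mean, median, stdev


def limit_of_detection(zero_sample_values, factor=3.0):
    """LOD as three times the zero-sample value (median used as an assumption)."""
    return factor * median(zero_sample_values)


def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean) in percent."""
    return stdev(values) / mean(values) * 100.0


def passes_qc(qc_values, max_cv=15.0):
    """True if the CV across the QC samples is below the technical threshold."""
    return cv_percent(qc_values) < max_cv


# Invented concentrations for one metabolite.
lod = limit_of_detection([0.1, 0.2, 0.3])
stable_qc = [10.0, 11.0, 9.0, 10.0, 10.0]    # five QC samples, CV of about 7%
unstable_qc = [5.0, 10.0, 15.0, 10.0, 10.0]  # CV of about 35%
```

With these invented numbers, the first metabolite would pass the 15% technical criterion and the second would not.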
Lipid side-chain composition is abbreviated as Cx:y, where x denotes the number of carbons
in the side chain and y the number of double bonds. The nature of the fatty acid linkage is expressed
as aa for diacyl or ae for acyl-alkyl. For example, PC aa C32:1 denotes a diacyl-phosphatidylcholine with
32 carbons in the two fatty acid side chains and a single double bond in one of them. The list of all
the measurable metabolites is provided in Table A.1.
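To make the naming convention concrete, the following Python sketch splits an abbreviation such as PC aa C32:1 into its parts. The function name, the returned field names and the regular expression are assumptions made for illustration; only phosphatidylcholine and lysoPC abbreviations are handled.

```python
# Hypothetical sketch: parsing glycerophospholipid abbreviations such as "PC aa C32:1".
import re

LINKAGE_NAMES = {"aa": "diacyl", "ae": "acyl-alkyl", "a": "lyso"}

LIPID_PATTERN = re.compile(r"(lysoPC|PC)\s*(aa|ae|a)\s*C(\d+):(\d+)")


def parse_lipid(abbreviation):
    """Return class, linkage, total side-chain carbons and double bonds."""
    match = LIPID_PATTERN.fullmatch(abbreviation.strip())
    if match is None:
        raise ValueError(f"unrecognized lipid abbreviation: {abbreviation!r}")
    lipid_class, linkage, carbons, double_bonds = match.groups()
    return {
        "lipid_class": lipid_class,
        "linkage": LINKAGE_NAMES[linkage],
        "carbons": int(carbons),            # x in Cx:y
        "double_bonds": int(double_bonds),  # y in Cx:y
    }


example = parse_lipid("PCaaC32:1")
```

Note that the parsed values describe the summed side chains only: as discussed above, the abbreviation does not say how carbons and double bonds are distributed between the individual fatty acids.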
METABOLITE CLASS | NUMBER | METABOLITE NAME OR ABBREVIATION | BIOLOGICAL RELEVANCE
AMINO ACIDS | 21 | Alanine, arginine, aspartate, citrulline, glutamine, glutamate, glycine, histidine, isoleucine, leucine, lysine, methionine, ornithine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, valine | Amino acid metabolism, urea cycle, activity of gluconeogenesis and glycolysis, insulin sensitivity, neurotransmitter metabolism, oxidative stress
CARNITINE | 1 | C0 | Energy metabolism, fatty acid transport and mitochondrial fatty acid oxidation, ketosis, oxidative stress, mitochondrial membrane damage
ACYLCARNITINE | 39 | C2, C3, C3:1, C3-OH, C4, C4:1, C4-OH, C5, C5:1, C5:1-DC, C5-DC, C5-M-DC, C5-OH, C6, C6:1, C7-DC, C8, C9, C10, C10:1, C10:2, C12, C12-DC, C14, C14:1, C14:1-OH, C14:2, C14:2-OH, C16, C16:1, C16:1-OH, C16:2, C16:2-OH, C16-OH, C18, C18:1, C18:1-OH, C18:2 | Energy metabolism, fatty acid transport and mitochondrial fatty acid oxidation, ketosis, oxidative stress, mitochondrial membrane damage
BIOGENIC AMINES | 19 | Acetylornithine, asymmetric dimethylarginine, total dimethylarginine, alpha-aminoadipic acid, carnosine, creatinine, histamine, kynurenine, methionine sulfoxide, nitrotyrosine, hydroxyproline, phenylethylamine, putrescine, sarcosine, serotonin, spermidine, spermine, taurine | Neurological disorders, cell proliferation, cell cycle progression, DNA stability, oxidative stress
LYSO-PHOSPHATIDYLCHOLINES | 14 | lysoPC a C14:0/C16:0/C16:1/C17:0/C18:0/C18:1/C18:2/C20:3/C20:4/C26:0/C26:1/C28:0/C28:1 | Degradation of phospholipids, membrane damage, signaling cascades, fatty acid profile
DIACYL-PHOSPHATIDYLCHOLINES | 38 | PC aa C24:0/C26:0/C28:1/C30:0/C30:2/C32:0/C32:1/C32:2/C32:3/C34:1/C34:2/C34:3/C34:4/C36:0/C36:1/C36:2/C36:3/C36:4/C36:5/C36:6/C38:0/C38:1/C38:3/C38:4/C38:5/C38:6/C40:1/C40:2/C40:3/C40:4/C40:5/C40:6/C42:0/C42:1/C42:2/C42:4/C42:5/C42:6 | Dyslipidemia, membrane composition and damage, fatty acid profile, activity of desaturases
ACYL-ALKYL-PHOSPHATIDYLCHOLINE | 38 | PC ae C30:0/C30:2/C32:1/C32:2/C34:0/C34:1/C34:2/C34:3/C36:0/C36:1/C36:2/C36:3/C36:4/C36:5/C38:0/C38:1/C38:2/C38:3/C38:4/C38:5/C38:6/C40:1/C40:2/C40:3/C40:4/C40:5/C40:6/C42:0/C42:1/C42:2/C42:3/C42:4/C42:5/C44:3/C44:4/C44:5/C44:6 | Dyslipidemia, membrane composition and damage, fatty acid profile, activity of desaturases
SPHINGOMYELINS | 15 | SM (OH) C14:1, SM C16:0, SM C16:1, SM (OH) C16:1, SM C18:0, SM C18:1, SM C20:2, SM C22:3, SM (OH) C22:1, SM (OH) C22:2, SM C24:0, SM C24:1, SM (OH) C24:1, SM C26:0, SM C26:1 | Signaling cascades, membrane damage (e.g. neurodegeneration)
HEXOSE | 1 | H1 | Carbohydrate metabolism
TOTAL | 186 | |
Table A.1 - List of the measurable metabolites using the Biocrates AbsoluteIDQ p180 kit. aa, acyl-acyl; ae, acyl-alkyl; a, lyso; Cx:y, where x is the number of carbons in the fatty acid side chain and y is the number of double bonds in the fatty acid side chain; DC, decarboxyl; M, methyl; OH, hydroxyl; PC, phosphatidylcholine; SM, sphingomyelin.
B PROTEOMICS ANALYSES BY ITRAQ QUANTITATION
B.1 STUDY DESIGN
A multi-iTRAQ experiment was designed to compare the plasma protein expression patterns
of survivor (S) and non-survivor (NS) patients. Two time points were analyzed to compare
both study groups: day 1 (acute state, D1) and day 7 after diagnosis of septic shock (steady state,
D7). Samples from 17 septic shock patients (9 S and 8 NS) and from 5 healthy donors (M1 to M5)
were arranged in six iTRAQ™ 8plex experiments. The 5 healthy donors were used for LC-MS
normalization purposes.
B.2 SAMPLE PREPARATION
B.2.1 Human Plasma depletion
Collected human plasma samples (30 µl) were depleted to remove the 14 most abundant
plasma proteins using a Seppro® IgY 14 LC2 immunoaffinity column (Sigma Aldrich) as per the
manufacturer’s instructions. After depletion, the resulting plasma was concentrated and buffer
exchanged to Tris-HCl/150 mM NaCl (pH 7.4) using centrifugal filters (Millipore 4.5 ml filters, 10 kDa,
4000 g, 15 °C), and then quantified with the Micro BCA™ Protein Assay Kit (Thermo Scientific).
B.2.2 In solution sample digestion
35 µg of depleted plasma protein were brought up to 75 µl with 50 mM triethylammonium
bicarbonate (TEAB) and then denatured with 9 µl of Rapigest® detergent (Waters) (0.15% w/v in the
digestion step). Samples were reduced with tris(2-carboxyethyl)phosphine (5.5 mM, 60 min, 60 °C)
and alkylated with iodoacetamide (25 mM, RT, 30 minutes in the dark). Proteins were digested with
1.4 µg of trypsin per sample for 4 h (Promega, sequencing grade; 37 °C, pH 8.0) and then re-digested
for an additional 16 h (37 °C, pH 8.0) with 7 µg of trypsin per sample. After digestion, the resulting
solution was acidified with trifluoroacetic acid (TFA; 1% final concentration, pH<2) and incubated for
60 min at 37 °C to hydrolyze Rapigest®. The acidified solution was centrifuged at 14000 rpm, and the
supernatant peptide mixture was recovered and desalted in a C18 tip (P200 Toptip, PolyLC), as per the
manufacturer's instructions. The peptide solution was dried in a SpeedVac system and kept at -20 °C
until used.
B.2.3 Peptide labeling
Figure B.1 shows the experimental design. Each iTRAQ run (numbered from 1 to 6 in Figure
B.1) was composed of samples from three septic shock patients both at D1 and D7 (six samples in
total), and of samples from two healthy donors as internal standard to test technical reproducibility.
Digested and depleted samples were resuspended in 30 μL 500 mM TEAB, to perform iTRAQ labeling
(iTRAQ™ 8plex Multiplex kit) according to the product specifications. Briefly, 70 µl of isopropanol
were added to each vial of iTRAQ labeling reagent. The vials were vortexed for 1 minute and
spun down. The content of these vials was transferred to each sample tube, and sample-iTRAQ
mixtures were mixed and incubated at room temperature for 2 h to allow the iTRAQ labeling
reaction. Samples were arranged and labeled with the isobaric tag reporters as schematized in
Figure B.1.
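iTRAQ reporter | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 121
Time point | D1 | D7 | D1 | D7 | D1 | D7 | Internal standard | Internal standard
Run 1 | 1 | 1 | 2 | 2 | 3 | 3 | M1 | M3
Run 2 | 4 | 4 | 5 | 5 | 6 | 6 | M4 | M5
Run 3 | 7 | 7 | 8 | 8 | 9 | 9 | M2 | M5
Run 4 | 10 | 10 | 11 | 11 | 12 | 12 | M3 | M4
Run 5 | 13 | 13 | 14 | 14 | 15 | 15 | M2 | M1
Run 6 | 16 | 16 | 17 | 17 | - | - | M5 | M1
Legend: numbers 1 to 17 identify the patients (the shading that distinguishes S from NS patients in the original figure is not reproduced here); D1/D7, acute state/steady state; M1-M5, healthy donors.
Figure B.1 – Scheme of the iTRAQ experimental design. Note that for each run samples from S and NS were combined.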
An aliquot of each reaction was cleaned up with a C18 homemade stage tip and analyzed by
LC-MS/MS to ensure complete labeling. 100 µl of water were added to reaction mixtures in order
to quench the iTRAQ reaction and labeled samples were combined and dried down in a SpeedVac
system.
B.2.4 Sample clean-up and fractionation
Before LC-MS/MS analysis, two clean-up steps were performed on the labelled mixture. The
iTRAQ-labeled samples were first fractionated into 10 parts with a high pH reversed phase spin
column (Pierce). In the first clean-up step, the sample was resuspended in 100 μL 1% formic acid
(FA) solution, desalted in a C18 tip (P200 Toptip, PolyLC) and dried in a SpeedVac system. In the
second clean-up step, dried peptides were resuspended in 100 μL 20 % acetonitrile (ACN)/0.1 % FA
(pH 2.7-3), cleaned in a strong cation exchange tip (P200 Toptip, PolySULFOETHYL A, PolyLC /0.1%
FA) and dried in a SpeedVac system.
The sample was then subjected to high pH fractionation with a high pH reversed phase
peptide fractionation kit (Pierce, ref.84868) following the manufacturer’s instructions. Briefly the
samples were loaded onto a spin column in 0.1% TFA, washed and buffer exchanged with high pH
buffer and then eluted in 9 fractions of increasing acetonitrile (ACN) concentration (f1 = 10% ACN;
f2 = 12.5% ACN; f3 = 15% ACN; f4 = 17.5% ACN; f5= 20 % ACN; f6 = 22.5% ACN; f7= 25% ACN; f8 =
50% ACN; f9 = 75% ACN). Flow through and wash fractions were pooled and analyzed as FTwash
fraction. The fractions (a total of 10) were dried down in a speed-vacuum centrifuge.
B.3 LC-MS/MS ANALYSES
The 10 dried-down fractions were analysed in a nanoAcquity liquid chromatograph
(Waters) coupled to an LTQ-Orbitrap Velos (Thermo Scientific) mass spectrometer. Tryptic labelled
peptides of each fraction were resuspended in 2% ACN/1% FA solution and an aliquot was injected
for chromatographic separation. Peptides were trapped on a Symmetry C18™ trap column (5
µm, 180 µm x 20 mm; Waters) and separated using a C18 reverse phase capillary column (75 μm Øi,
25 cm, nanoAcquity, 1.7 μm BEH column; Waters). The gradient used for the elution of the peptides
was 2 to 35 % B in 155 minutes, followed by a gradient from 35% to 45% B in 20 min (A: 0.1% FA; B:
100% ACN, 0.1% FA), with a 250 nL/min flow rate.
Eluted peptides were subjected to electrospray ionization in an emitter needle (PicoTip™,
New Objective) with an applied voltage of 2000 V. Peptide masses (m/z 300-1800) were analysed in
data-dependent mode, where a full scan MS was acquired in the Orbitrap with a resolution of 30,000 full width
at half maximum (FWHM) at 400 m/z. Up to the 15 most abundant peptides (minimum
intensity of 2000 counts) were selected from each MS scan and then fragmented with HCD (Higher
Energy Collision Dissociation) in the C-trap using nitrogen as collision gas with 40% normalized collision
energy. The fragments were then analyzed in the Orbitrap with a resolution of 7,500 FWHM at 400 m/z.
The scan time settings were: Full MS 250 ms (1 microscan) and MSn 300 ms (2 microscans).
Generated raw data files (raw format) were collected with Thermo Xcalibur (v.2.2).
B.4 DATABASE SEARCH
Thermo Proteome Discoverer (v.1.4.1.14) was used to search with the SequestHT search engine
against the SwissProt Human public database (v. March 2015). For each iTRAQ batch, the 10 raw files
corresponding to the 10 injections from the MS analyses were used to perform a single search
against this database. A database search against both a target and a decoy database was made
to obtain a false discovery rate (FDR), and thus estimate the number of incorrect peptide-spectrum
matches which exceed a given threshold. Additionally, to improve the sensitivity of the database
search, the semi-supervised Percolator algorithm was used in order to enhance the
discrimination between correct and incorrect peptide-spectrum matches. Percolator assigns a q-value to
each spectrum, which is defined as the minimal FDR at which the identification is deemed to be
correct. These q-values are estimated using the distribution of scores from a decoy database search.
A quantification method for iTRAQ™ 8-plex mass tags optimized for Thermo Scientific instruments
was applied to obtain the reporter ion intensities. The following search parameters were applied:
• Database/Taxonomy: SwissProt Human (v. March 2015) + contaminants
• Enzyme: Trypsin
• Missed cleavages: 2
• Fixed modifications: carbamidomethylation of cysteine, iTRAQ8plex (N-term)
• Variable modifications: oxidation of methionine, iTRAQ8plex (Y), iTRAQ8plex (K)
• Peptide tolerance: 10 ppm and 0.1 Da (for MS and MS/MS spectra, respectively)
• Percolator: Target FDR (Strict) 0.01; validation based on q-value < 0.01
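The target-decoy idea behind the q-value criterion can be illustrated with a simplified Python sketch. This is not the Percolator algorithm, which additionally re-scores the peptide-spectrum matches (PSMs) with a semi-supervised classifier; the sketch only shows how q-values may be derived from a ranked list of target and decoy scores. The function name, the simple estimate FDR = decoys/targets and the toy scores are assumptions.

```python
# Hypothetical sketch: q-values from a target-decoy search (simplified, not Percolator).

def q_values(psms):
    """psms: list of (score, is_decoy) pairs, higher score meaning a better match.

    Returns a list of (score, is_decoy, q_value) sorted by decreasing score.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were placed at this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the minimal FDR at which this PSM would still be accepted.
    q_rev = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_rev.append(running_min)
    q = q_rev[::-1]
    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]


# Toy scores: True marks a decoy hit.
toy_psms = [(9.0, False), (8.5, False), (8.0, False), (7.0, True),
            (6.5, False), (6.0, False), (5.0, True), (4.0, False)]
scored = q_values(toy_psms)
accepted = [score for score, is_decoy, qv in scored if not is_decoy and qv < 0.01]
```

In this toy list only the three target PSMs ranked above the first decoy have a q-value below 0.01, so only those would be retained under the criterion listed above.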
B.5 DATA ANALYSIS
Reporter intensities from Proteome Discoverer quantitation files were used to perform iTRAQ
quantitation. Within each iTRAQ™ 8plex experiment, reporter ion intensities of each individual
peptide were summed over the 10 injected fractions (LC-MS runs), log2 transformed and then LOESS
normalized against the mean global intensity from all six iTRAQ™ 8plex experiments. Protein
abundance was defined as the mean of the normalized intensity values belonging to the given
protein for each reporter. Data validation and normalization were performed with R v3.1.2 and InfernoRDN
software v 1.0 (graphical front-end to R for common data analysis; Pacific Northwest National
Laboratory; US Department of Energy).
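The aggregation steps described above can be illustrated with a minimal Python sketch. It covers only the summation over fractions, the log2 transformation and the averaging per protein; the LOESS normalization performed with R and InfernoRDN is deliberately omitted, so this is not a reimplementation of the full pipeline. The record layout, the function name and the peptide labels are hypothetical.

```python
# Hypothetical sketch: from reporter ion intensities to protein abundance
# (LOESS normalization step omitted).
import math
from collections import defaultdict


def protein_abundance(records, channels):
    """records: dicts with keys 'protein', 'peptide', 'fraction', 'intensities'.

    'intensities' maps each reporter channel to a raw intensity.
    Returns {protein: {channel: mean log2 intensity over its peptides}}.
    """
    # 1) Sum the reporter intensities of each peptide over all fractions.
    summed = defaultdict(lambda: defaultdict(float))
    for rec in records:
        key = (rec["protein"], rec["peptide"])
        for ch in channels:
            summed[key][ch] += rec["intensities"][ch]
    # 2) log2 transform the summed intensities (zero sums are skipped).
    per_protein = defaultdict(lambda: defaultdict(list))
    for (protein, _peptide), sums in summed.items():
        for ch in channels:
            if sums[ch] > 0:
                per_protein[protein][ch].append(math.log2(sums[ch]))
    # 3) Protein abundance: mean of the peptide values for each reporter.
    return {
        protein: {ch: sum(vals) / len(vals) for ch, vals in chans.items()}
        for protein, chans in per_protein.items()
    }


# Toy data: two peptides of one protein, two reporter channels.
toy_records = [
    {"protein": "P02741", "peptide": "PEPTIDE_A", "fraction": 1,
     "intensities": {113: 100.0, 114: 200.0}},
    {"protein": "P02741", "peptide": "PEPTIDE_A", "fraction": 2,
     "intensities": {113: 156.0, 114: 312.0}},
    {"protein": "P02741", "peptide": "PEPTIDE_B", "fraction": 3,
     "intensities": {113: 1024.0, 114: 512.0}},
]
abundance = protein_abundance(toy_records, channels=[113, 114])
```

In the toy data, PEPTIDE_A sums to 256 and 512 counts in the two channels (log2 values 8 and 9) and PEPTIDE_B gives 10 and 9, so the protein-level value is 9 in both channels.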
UniProt ID | Protein name | Main functions
O75882 | Attractin | Inflammatory response
P00746 | Complement factor D | complement activation
P00751 | Complement factor B | complement alternate pathway, innate immunity
P00951 | Carbonic anhydrase 1 | bicarbonate transport
P01011 | Alpha-1-antichymotrypsin | acute phase inflammatory response
P02649 | Apolipoprotein E | lipid metabolism and transport
P02741 | C-reactive protein | acute phase
P02745 | Complement C1q subcomponent subunit A | complement pathway, innate immunity
P02746 | Complement C1q subcomponent subunit B | complement pathway, innate immunity
P02750 | Leucine-rich alpha-2-glycoprotein | angiogenesis, endothelial cell proliferation
P02765 | Alpha-2-HS-glycoprotein | acute-phase response
P02790 | Hemopexin | Host-virus interactions
P05155 | Plasma protease C1 inhibitor | blood coagulation
P05543 | Thyroxine-binding globulin | negative regulation of endopeptidase activity, thyroid hormone transport
P06276 | Cholinesterase | contributes to the inactivation of the neurotransmitter acetylcholine
P06681 | Complement C2 | complement pathway, innate immunity and immunity
P06727 | Apolipoprotein A-IV | lipid transport
P07358 | Complement component C8 beta chain | complement pathway, innate immunity and immunity, cytolysis
P07360 | Complement component C8 gamma chain | complement pathway
P01034 | Cystatin-C | cysteine proteinases inhibitor
P13769 | Recombinase Flp protein | DNA recombination and integration
P15169 | Carboxypeptidase N catalytic chain | peptide metabolic process, protein processing
P18065 | Insulin-like growth factor-binding protein 2 | T-cell regulation
P18428 | Lipopolysaccharide-binding protein | lipid transport, innate immunity and immunity
P19823 | Inter-alpha-trypsin inhibitor heavy chain H2 | serine-type endopeptidase inhibitor activity
P20851 | C4b-binding protein beta chain | controls the classical pathway of complement activation
P22792 | Carboxypeptidase N subunit 2 | regulation of catalytic activity
P25311 | Zinc-alpha-2-glycoprotein | stimulation of lipid degradation
P36222 | Chitinase-3-like protein 1 | apoptotic process, inflammatory response
P49908 | Selenoprotein P | response to oxidative stress
Q15582 | Transforming growth factor-beta-induced protein ig-h3 | cell adhesion, sensory transduction
Q86VB7 | Scavenger receptor cysteine-rich type 1 protein M130 | acute-phase response
Q96PD5 | N-acetylmuramoyl-L-alanine amidase | innate immune response, regulation of inflammatory response
Q9Y5Y7 | Lymphatic vessel endothelial hyaluronic acid receptor 1 | receptor activity (transport), metabolic processes
Table B.1 – List of the proteins identified in univariate and multivariate analyses (alphabetic order) reporting UniProt ID, extended protein names and main functions.
C COMPARISON OF METABOLOMICS PROFILE OF CARDIOGENIC AND SEPTIC
SHOCK PATIENTS
C.1 AIM OF THE ANALYSES
In this ancillary study we compared 13 cardiogenic shock (CS) patients and 21 septic shock (SS)
patients admitted with shock to the ICU of Geneva University Hospitals and enrolled in the
multicenter clinical trial ShockOmics (NCT02141607). The aim of this exploratory analysis was to
qualitatively compare cardiogenic shock patients and septic shock patients by means of the correlations
between metabolite concentrations measured at T1 (acute phase) and T2 (post-treatment phase), as
already described in Chapter 5. The time trend, expressed as the difference Δ = T1 - T2 in
metabolite concentrations, was also analyzed.
C.2 PRELIMINARY RESULTS
Correlations among different metabolites were computed by means of the Pearson correlation.
Metabolites were considered correlated when both conditions p-value<0.05 and |ρ| > 0.7 held,
where ρ is the Pearson correlation coefficient. Correlations are represented in the following
heatmaps: red cells indicate positive correlations, blue cells negative ones. White cells imply that
the correlation is not meaningful. The intensity of the color represents the strength of the
correlation: the darker the color, the stronger the correlation. Results for the individual
time points T1 and T2 and for the Δ are shown in Figures C.1, C.2 and C.3, respectively.
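The selection rule above can be illustrated with a self-contained Python sketch. It is only an illustration and not the code used for the heatmaps: the function names and the toy data are hypothetical, and the two-sided p-value is obtained here by numerically integrating the Student t density (Simpson's rule), an implementation choice made to keep the example free of external libraries.

```python
# Hypothetical sketch: flagging metabolite pairs with p < 0.05 and |rho| > 0.7.
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for H0: rho = 0, using t = r * sqrt((n - 2) / (1 - r^2))."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Simpson's rule for the area under the t density between 0 and t.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * area)


def is_correlated(x, y, r_min=0.7, alpha=0.05):
    """True when both |rho| > r_min and p-value < alpha hold."""
    r = pearson_r(x, y)
    return abs(r) > r_min and two_sided_p(r, len(x)) < alpha


# Toy concentrations for three metabolites measured in eight patients.
met_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
met_b = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]  # closely follows met_a
met_c = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # unrelated to met_a
```

With these toy values the pair (met_a, met_b) would appear as a dark red cell in a heatmap, while the pair (met_a, met_c) would be left white.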
At T1 (Figure C.1) the main differences concern the correlation of biogenic amines with PCs
and amino acids (AAs). In fact, different metabolites are correlated and correlations are both
positive and negative in CS patients, and only positive in SS patients. Moreover, the correlation
among AAs and lipid species (lysoPC, PC and sphingolipids) is completely absent in the SS cohort,
unlike in CS patients.
At T2, we notice that overall there are more correlations among metabolites in CS patients
than in SS ones (Figure C.2). More precisely, correlations of biogenic amines and AAs with lipid
species (lysoPC, PC and sphingolipids) are completely absent in SS patients, whereas a few correlations
(both positive and negative) are present in CS patients. Moreover, PCs seem more correlated in the
CS group.
As for the Δ, the main differences can again be found in the correlations of biogenic amines
and AAs with lipid species (lysoPCs, PCs and sphingolipids) and also of lysoPCs with PCs and
sphingolipids.
Interestingly, we noticed that correlations among lysoPCs are similar in CS and SS patients in
all the analyses.
C.3 REMARKS
Overall, this qualitative exploratory analysis compares the metabolomic profiles of CS and SS.
Lipids appear to have a key role in both conditions, even if different interactions among them and
with other species seem to occur. Further hypothesis-driven analyses are advisable to verify which
metabolic pathways are differentially expressed in the two groups and, conversely, which are in
common. Studies in animal models of SS and CS are ongoing, and the opportunity to test a common
metabolic pathway under such controlled experimental conditions could help in elucidating key inflammatory
mechanisms.
Figure C.1 – Correlation analyses of cardiogenic and septic shock patients at T1
Figure C.2 – Correlation analyses of cardiogenic and septic shock patients at T2
Figure C.3 – Correlation analyses of cardiogenic and septic shock patients for the Δ
D LIST OF PUBLICATIONS
The complete list of journal papers and conference proceedings contributions published
by the candidate during the PhD program is reported here:
ISI Journal Papers
• A. CAMBIAGHI, B. Bollen Pinto, L. Brunelli, F. Falcetta, K. Bendjelid, F. Aletti, R. Pastorelli, M. Ferrario, “Characterization of a metabolomic profile associated with responsiveness to therapy in the acute phase of septic shock”, Scientific Reports (submitted)
• N. Clendenen, A. Tollefson, M. Dzieciatkowska, A. CAMBIAGHI, M. Ferrario, M. Kroehl, A. Banerjee, A. D'Alessandro, K. C. Hansen, N. Weitzel, “Correlation of pre-operative plasma protein concentrations in cardiac surgery patients with bleeding outcomes using a targeted quantitative proteomics approach”, Proteomics-Clinical Applications, 2 (2017)
• A. CAMBIAGHI, M. Ferrario, M. Masseroli, “Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration”, Briefings in Bioinformatics (2016)
• M. Ferrario, A. CAMBIAGHI, L. Brunelli, S. Giordano, P. Caironi, L. Guatteri, F. Raimondi, L. Gattinoni, R. Latini, S. Masson, G. Ristagno, R. Pastorelli, “Mortality prediction in patients with severe septic shock: a pilot study using a target metabolomics approach”, Scientific Reports, 6 (2016)
Conference Proceedings
• A. CAMBIAGHI, B. Bollen Pinto, L. Brunelli, F. Falcetta, K. Bendjelid, F. Aletti, R. Pastorelli, M. Ferrario, “Responsiveness to therapy in the acute phase of septic shock: a metabolomics analysis”, 40th Annual Conference on Shock, Fort Lauderdale, Florida, June 9-12 2017 (accepted)
• A. CAMBIAGHI, M. Ferrario, B. Bollen Pinto, K. Bendjelid, L. Brunelli, R. Pastorelli, “Metabolomic state as early indicator of organ improvement in septic shock patients: feasibility study for small data sample”, 12th Annual Conference of the Metabolomics Society, Dublin, Ireland
• A. CAMBIAGHI, M. Ferrario, E. Moore et al., “A classification model based on metabolomics and proteomics for acute traumatic coagulopathy patients: a feasibility study”, 39th Annual Conference on Shock, Austin, Texas, June 11-14 2016 (abstract published in SHOCK 2016, 45 (6) – pp: 78-78)
• A. CAMBIAGHI, L. Brunelli, P. Caironi et al., “SCK-3: Target metabolomics for improving early prediction of death in patients with septic shock”, XVI Congress of the European Shock Society, Cologne, Germany, September 24-26 2015 (abstract published in SHOCK 2015, 44 (2) – pp: 1-27)
Bibliography
[1] I. Damjanov, Pathology for the Health Professions, 4th ed. St. Louis, Missouri (USA): Elsevier, Saunders, 2000.
[2] Miller-Keane, Encyclopedia and Dictionary of Medicine, Nursing, and Allied Health, 7th ed. 2003.
[3] M. Singer et al., “The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3),” JAMA, vol. 315, no. 8, pp. 801–10, 2016.
[4] D. C. Angus and T. van der Poll, “Severe sepsis and septic shock,” N Engl J Med, vol. 369, no. 9, pp. 840–51, 2013.
[5] M. Shankar-Hari et al., “Developing a New Definition and Assessing New Clinical Criteria for Septic Shock,” JAMA, vol. 315, no. 8, p. 775, 2016.
[6] M. Garcia-Alvarez, P. Marik, and R. Bellomo, “Sepsis-associated hyperlactatemia,” Crit. Care, vol. 18, no. 5, p. 503, 2014.
[7] D. C. Angus, W. Linde-Zwirble, J. Lidicker, G. Clermont, J. Carcillo, and M. R. Pinsky, “Epidemiology of severe sepsis in the United States: analysis of incidence, outcome, and associated costs of care,” Crit. Care Med., vol. 29, no. 7, pp. 1301–1310, 2001.
[8] C. Torio and R. Andrews, “National inpatient hospital costs: the most expensive conditions by payer, 2011.,” in Healthcare Cost and Utilization Project (HCUP), Statistical Briefs., 2013.
[9] C. Fleischmann, A. Scherag, N. K. Adhikari et al.; International Forum of Acute Care Trialists, “Assessment of global incidence and mortality of hospital-treated sepsis: current estimates and limitations,” Am J Respir Crit Care Med, vol. 193, no. 3, pp. 259–72, 2016.
[10] J. Vincent et al., “Sepsis in European intensive care units: results of the SOAP study,” Crit. Care Med., vol. 34, no. 2, pp. 344–353, 2006.
[11] T. J. Iwashyna, E. W. Ely, D. M. Smith, and K. M. Langa, “Long-term cognitive impairment and functional disability among survivors of severe sepsis,” JAMA, vol. 304, no. 16, pp. 1787–1794, 2010.
[12] D. De Backer, D. Orbegozo Cortes, K. Donadello, and J.-L. Vincent, “Pathophysiology of microcirculatory dysfunction and the pathogenesis of septic shock.,” Virulence, vol. 5, no. 1, pp. 73–9, 2014.
[13] F. Lupu, R. S. Keshari, J. D. Lambris, and K. M. Coggeshall, “Crosstalk between the coagulation and complement systems in sepsis,” Thrombosis Research, vol. 133, pp. S28–S31, 2014.
[14] D. Brealey et al., “Association between mitochondrial dysfunction and severity and outcome of septic shock,” Lancet, vol. 360, no. 9328, pp. 219–223, 2002.
[15] H. M. McBride, M. Neuspiel, and S. Wasiak, “Mitochondria: more than just a powerhouse.,” Curr. Biol., vol. 16, no. 14, pp. R551-60, Jul. 2006.
[16] B. Alberts, A. Johnson, and J. Lewis, Molecular Biology of the Cell, 4th ed. New York: Garland Science, 2002.
[17] E. J. Lesnefsky, S. Moghaddas, B. Tandler, J. Kerner, and C. L. Hoppel, “Mitochondrial dysfunction in cardiac disease: ischemia--reperfusion, aging, and heart failure.,” J. Mol. Cell. Cardiol., vol. 33, no. 6, pp. 1065–89, Jun. 2001.
[18] T. Doenst, T. D. Nguyen, and E. D. Abel, “Cardiac metabolism in heart failure: implications beyond ATP production,” Circ. Res., vol. 113, no. 6, pp. 709–24, 2013.
[19] H. Ashrafian, M. P. Frenneaux, and L. H. Opie, “Metabolic mechanisms in heart failure.,” Circulation, vol. 116, no. 4, pp. 434–48, Jul. 2007.
[20] G. J. van der Vusse, M. van Bilsen, and J. F. Glatz, “Cardiac fatty acid uptake and transport in health and disease.,” Cardiovasc. Res., vol. 45, no. 2, pp. 279–93, Jan. 2000.
[21] N. Fillmore, A. A. Osama, and G. D. Lopaschuk, “Fatty Acid β-Oxidation: an overview,” 2011.
[22] M. Van Bilsen, P. J. H. Smeets, A. J. Gilde, and G. Van der Vusse, “Metabolic remodelling of the failing heart: the cardiac burn-out syndrome?,” Cardiovasc. Res., vol. 61, no. 2, pp. 218–226, Feb. 2004.
[23] T. Doenst, T. D. Nguyen, and E. D. Abel, “Cardiac metabolism in heart failure: implications beyond ATP production.,” Circ. Res., vol. 113, no. 6, pp. 709–24, Aug. 2013.
[24] J. F. Turrens, “Mitochondrial formation of reactive oxygen species.,” J. Physiol., vol. 552, no. Pt 2, pp. 335–44, Oct. 2003.
[25] E. D. Crouser, “Mitochondrial dysfunction in septic shock and multiple organ dysfunction syndrome.,” Mitochondrion, vol. 4, no. 5–6, pp. 729–41, Sep. 2004.
[26] M. P. Murphy, “How mitochondria produce reactive oxygen species.,” Biochem. J., vol. 417, no. 1, pp. 1–13, Jan. 2009.
[27] R. Dellinger, M. Levy, and A. Rhodes, “Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock, 2012,” Crit. Care Med., vol. 41, no. 2, pp. 580–637, 2013.
[28] A. Kumar, P. Ellis, and Y. Arabi, “Initiation of inappropriate antimicrobial therapy results in a fivefold reduction of survival in human septic shock,” Chest, vol. 136, no. 5, pp. 1237–1248, 2009.
[29] J. L. Vincent et al., “Use of the SOFA score to assess the incidence of organ dysfunction/failure in intensive care units: Results of a multicenter, prospective study,” Crit. Care Med., vol. 26, no. 11, pp. 1793–1800, 1998.
[30] A. R. Joyce and B. Ø. Palsson, “The model organism as a system: integrating ‘omics’ data sets.,” Nat. Rev. Mol. Cell Biol., vol. 7, no. 3, pp. 198–210, Mar. 2006.
[31] J. Xia and D. S. Wishart, “MSEA: a web-based tool to identify biologically meaningful patterns in quantitative metabolomic data.,” Nucleic Acids Res., vol. 38, no. Web Server issue, pp. W71-7, Jul. 2010.
[32] D. Gomez-Cabrero et al., “Data integration in the era of omics: current and future challenges,” BMC Syst. Biol., vol. 8, no. Suppl 2, p. I1, 2014.
[33] B. Domon and R. Aebersold, “Challenges and Opportunities in Proteomics Data Analysis,” Mol. Cell. Proteomics, vol. 5, no. 10, pp. 1921–1926, 2006.
[34] D. S. Wishart, “Current progress in computational metabolomics.,” Brief. Bioinform., vol. 8, no. 5, pp. 279–93, Sep. 2007.
[35] B. Mehrotra and P. Mendes, “Bioinformatics approaches to integrate metabolomics and other system biology data,” in Plant Metabolomics, K. Saito, R. A. Dixon, and L. Willmitzer, Eds. Springer Berlin Heidelberg, 2006, pp. 105–115.
[36] M. Vinaixa, S. Samino, I. Saez, J. Duran, J. J. Guinovart, and O. Yanes, “A guideline to univariate statistical analysis for LC/MS-based untargeted metabolomics-derived data.,” Metabolites, vol. 2, no. 4, pp. 775–95, Jan. 2012.
[37] G. J. Patti, O. Yanes, and G. Siuzdak, “Metabolomics: the apogee of the omics trilogy,” Nat. Rev. Mol. Cell Biol., vol. 13, no. 4, pp. 263–269, 2012.
[38] V. Shulaev, “Metabolomics technology and bioinformatics.,” Brief. Bioinform., vol. 7, no. 2, pp. 128–39, Jun. 2006.
[39] W. B. Dunn and D. I. Ellis, “Metabolomics: Current analytical platforms and methodologies,” TrAC - Trends Anal. Chem., vol. 24, no. 4, pp. 285–294, 2005.
[40] A. Krastanov, “Metabolomics - The state of art,” Biotechnol. Biotechnol. Equip., vol. 24, no. 1, pp. 1537–1543, 2010.
[41] A. Zhang, H. Sun, P. Wang, Y. Han, and X. Wang, “Modern analytical techniques in metabolomics analysis,” Analyst, vol. 137, no. 2, p. 293, 2012.
[42] M. Sugimoto, M. Kawakami, M. Robert, T. Soga, and M. Tomita, “Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis,” Curr. Bioinform., vol. 7, no. 1, pp. 96–108, 2012.
[43] A. L. Castle, O. Fiehn, R. Kaddurah-Daouk, and J. C. Lindon, “Metabolomics Standards Workshop and the development of international standards for reporting metabolomics experimental results.,” Brief. Bioinform., vol. 7, no. 2, pp. 159–65, Jun. 2006.
[44] D. Vuckovic, “Current trends and challenges in sample preparation for global metabolomics using liquid chromatography-mass spectrometry,” Anal. Bioanal. Chem., vol. 403, no. 6, pp. 1523–1548, 2012.
[45] C. A. Smith, E. J. Want, G. O'Maille, R. Abagyan, and G. Siuzdak, “XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification,” Anal. Chem., vol. 78, no. 3, pp. 779–787, 2006.
[46] D. S. Wishart et al., “HMDB 3.0-The Human Metabolome Database in 2013,” Nucleic Acids Res., vol. 41, no. D1, pp. 801–807, 2013.
[47] C. A. Smith et al., “METLIN: a metabolite mass spectral database,” Ther. Drug Monit., vol. 27, no. 6, pp. 747–751, 2005.
[48] P. Reshetova, A. K. Smilde, A. H. C. van Kampen, and J. A. Westerhuis, “Use of prior knowledge for the analysis of high-throughput transcriptomics and metabolomics data,” BMC Syst. Biol., vol. 8, no. Suppl 2, p. S2, Jan. 2014.
[49] J. Xia and D. S. Wishart, “MetPA: A web-based metabolomics tool for pathway analysis and visualization,” Bioinformatics, vol. 26, no. 18, pp. 2342–2344, 2010.
[50] H. Ogata, S. Goto, K. Sato, W. Fujibuchi, H. Bono, and M. Kanehisa, “KEGG: Kyoto encyclopedia of genes and genomes,” Nucleic Acids Res., vol. 27, no. 1, pp. 29–34, 1999.
[51] R. Cavill, D. Jennen, J. Kleinjans, and J. J. Briedé, “Transcriptomic and metabolomic data integration,” Brief. Bioinform., vol. 17, no. 5, pp. 891–901, 2016.
[52] M. Bantscheff and M. Schirle, “Quantitative mass spectrometry in proteomics: a critical review,” Anal. Bioanal. Chem., vol. 389, pp. 1017–1031, 2007.
[53] K. Chandramouli and P.-Y. Qian, “Proteomics: challenges, techniques and possibilities to overcome biological sample complexity.,” Hum. Genomics Proteomics, vol. 2009, no. 239204, 2009.
[54] E. Dalmasso, D. Casena, and S. Miller, “Top-Down, Bottom-Up - The merging of two High-Performance technologies,” Bioradiations, no. 129, 2009.
[55] K. L. Cox, V. Devanarayan, A. Kriauciunas, C. Montrose, and S. Sittampalam, “Immunoassay Methods,” in Assay Guidance Manual, G. Sittampalam, N. Coussens, and K. Brimacombe, Eds. Eli Lilly & Company and the National Center for Advancing Translational Sciences, 2014, pp. 1–44.
[56] K. Chandramouli and P.-Y. Qian, “Proteomics: challenges, techniques and possibilities to overcome biological sample complexity.,” Hum. Genomics Proteomics, vol. 2009, no. 239204, 2009.
[57] S. Wiese, K. A. Reidegeld, H. E. Meyer, and B. Warscheid, “Protein labeling by iTRAQ: A new tool for quantitative mass spectrometry in proteome research,” Proteomics, vol. 7, no. 3, pp. 340–350, 2007.
[58] A. Schmidt, I. Forne, and A. Imhof, “Bioinformatic analysis of proteomics data,” BMC Syst. Biol., vol. 8, no. Suppl 2, pp. 1–7, 2014.
[59] R. Dellinger, M. Levy, and A. Rhodes, “Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock, 2012,” Crit. Care Med., vol. 41, no. 2, pp. 580–637, 2013.
[60] M. Garcia-Simon et al., “Prognosis biomarkers of severe sepsis and septic shock by 1H NMR urine metabolomics in the intensive care unit,” PLoS One, vol. 10, no. 11, pp. 1–12, 2015.
[61] S. Skibsted, M. K. Bhasin, W. C. Aird, and N. I. Shapiro, “Bench-to-bedside review: future novel diagnostics for sepsis - a systems biology approach.,” Crit. Care, vol. 17, no. 5, p. 231, 2013.
[62] O. Golubnitschaja et al., “Medicine in the early twenty-first century: paradigm and anticipation - EPMA position paper 2016,” EPMA J., vol. 7, no. 1, p. 23, 2016.
[63] P. Caironi et al., “Albumin replacement in patients with severe sepsis or septic shock,” N Engl J Med, vol. 370, no. 15, pp. 1412–1421, 2014.
[64] F. Aletti et al., “ShockOmics: multiscale approach to the identification of molecular biomarkers in acute heart failure induced by shock.,” Scand. J. Trauma. Resusc. Emerg. Med., vol. 24, no. 1, p. 9, 2016.
[65] C. Baumgartner, G. D. Lewis, M. Netzer, B. Pfeifer, and R. E. Gerszten, “A new data mining approach for profiling and categorizing kinetic patterns of metabolic biomarkers after myocardial injury,” Bioinformatics, vol. 26, no. 14, pp. 1745–1751, 2010.
[66] A. Rodin, T. H. Mosley, A. G. Clark, C. F. Sing, and E. Boerwinkle, “Mining Genetic Epidemiology Data with Bayesian Networks Application to APOE Gene Variation and Plasma Lipid Levels,” J. Comput. Biol., vol. 12, no. 1, pp. 1–11, 2005.
[67] J. D. Storey and R. Tibshirani, “Statistical significance for genomewide studies,” Proc. Natl. Acad. Sci., vol. 100, no. 16, pp. 9440–9445, 2003.
[68] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed. New York: Springer, 2009.
[69] C. Vercellis, Business Intelligence: Data Mining and Optimization for Decision Making. John Wiley & Sons, Ltd, 2009.
[70] A. E. Hoerl and R. W. Kennard, “Ridge Regression: Biased Estimation for Nonorthogonal Problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
[71] R. Tibshirani, “Regression Shrinkage and Selection via the Lasso,” J. R. Stat. Soc. Ser. B, vol. 58, no. 1, pp. 267–288, 1996.
[72] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” J. R. Stat. Soc. Ser. B, vol. 67, no. 2, pp. 301–320, 2005.
[73] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Comput. Electr. Eng., vol. 40, no. 1, pp. 16–28, 2014.
[74] R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artif. Intell., vol. 97, no. 1–2, pp. 273–324, 1997.
[75] S. H. Huang, “Supervised feature selection: A tutorial,” Artif. Intell. Res., vol. 4, no. 2, p. 22, 2015.
[76] T. N. Lal, O. Chapelle, J. Weston, and A. Elisseeff, “Embedded Methods,” Stud. Fuzziness Soft Comput., vol. 207, pp. 137–165, 2006.
[77] H. Peng, F. Long, and C. Ding, “Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance and Min-Redundancy,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, 2005.
[78] H. Li, Q. Xu, and Y. Liang, “libPLS: An Integrated Library for Partial Least Squares Regression and Discriminant Analysis,” PeerJ Prepr., vol. 2, 2014.
[79] R. G. Brereton and G. R. Lloyd, “Partial least squares discriminant analysis: Taking the magic away,” J. Chemom., vol. 28, no. 4, pp. 213–225, 2014.
[80] H. Abdi, “Partial least squares regression and projection on latent structure regression,” Wiley Interdiscip. Rev. Comput. …, vol. 2, pp. 97–106, 2010.
[81] P. S. Gromski et al., “A tutorial review: Metabolomics and partial least squares-discriminant analysis - a marriage of convenience or a shotgun wedding,” Anal. Chim. Acta, vol. 879, pp. 10–23, 2015.
[82] N. V Chawla, “Data Mining for Imbalanced Datasets: An Overview,” in Data Mining and Knowledge Discovery Handbook, 2006, pp. 853–867.
[83] T. D. Nielsen and F. V. Jensen, Bayesian Networks and Decision Graphs. 2009.
[84] M. J. McGeachie, H. H. Chang, and S. T. Weiss, “CGBayesNets: Conditional Gaussian Bayesian Network Learning and Inference with Mixed Discrete and Continuous Data,” PLoS Comput. Biol., vol. 10, no. 6, 2014.
[85] M. Scutari, “Learning Bayesian Networks with the bnlearn R Package,” J. Stat. Softw., vol. 35, no. 3, pp. 1–22, 2010.
[86] C. Yuan, B. M. Malone, and X. Wu, “Learning Optimal Bayesian Networks Using A* Search.,” Int. Jt. Conf. Artif. Intell., pp. 2186–2191, 2011.
[87] S. Acid, L. M. De Campos, J. M. Fernández-Luna, S. Rodríguez, J. María Rodríguez, and J. Luis Salcedo, “A comparison of learning algorithms for Bayesian networks: A case study based on data from an emergency medical service,” Artif. Intell. Med., vol. 30, no. 3, pp. 215–232, 2004.
[88] K. P. Murphy, Machine Learning: a probabilistic perspective. 2012.
[89] A. J. Rogers et al., “Metabolomic derangements are associated with mortality in critically ill adult patients,” PLoS One, vol. 9, no. 1, pp. 1–7, 2014.
[90] J. Krumsiek, K. Suhre, T. Illig, J. Adamski, and F. J. Theis, “Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data.,” BMC Syst. Biol., vol. 5, no. 1, p. 21, 2011.
[91] G. C. G. Abreu, R. Labouriau, and D. Edwards, “High-Dimensional Graphical Model Search with the gRapHD R Package,” J. Stat. Softw., vol. 37, no. 1, pp. 1–18, 2010.
[92] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[93] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Schölkopf, “Support vector machines,” IEEE Intell. Syst., vol. 13, no. 4, pp. 18–28, 1998.
[94] A. Perner, “Hydroxyethyl starch 130/0.42 versus Ringer’s acetate in severe sepsis,” N. Engl. J. Med., vol. 367, no. 2, pp. 82–83, 2012.
[95] P. Asfar et al., “High versus low blood-pressure target in patients with septic shock,” N Engl J Med, vol. 370, no. 17, pp. 1583–1593, 2014.
[96] D. Schmerler, S. Neugebauer, K. Ludewig, F. M. Bremer-Streck, S. Brunkhorst, and M. Kiehntopf, “Targeted metabolomics for discrimination of systemic inflammatory disorders in critically ill patients,” J. Lipid Res., vol. 53, no. 7, pp. 1369–1375, 2012.
[97] B. Mickiewicz, G. E. Duggan, B. W. Winston, C. Doig, P. Kubes, and H. J. Vogel, “Metabolic profiling of serum samples by 1H nuclear magnetic resonance spectroscopy as a potential diagnostic approach for septic shock,” Crit. Care Med., vol. 42, no. 5, pp. 1140–9, 2014.
[98] R. J. Langley et al., “An integrated clinico-metabolomic model improves prediction of death in sepsis,” Sci. Transl. Med., vol. 5, no. 195, p. 195ra95, 2013.
[99] W. Drobnik et al., “Plasma ceramide and lysophosphatidylcholine inversely correlate with mortality in sepsis patients.,” J. Lipid Res., vol. 44, no. 4, pp. 754–61, 2003.
[100] D. W. Park et al., “Impact of serial measurements of lysophosphatidylcholine on 28-day mortality prediction in patients admitted to the intensive care unit with severe sepsis or septic shock,” J. Crit. Care, vol. 29, no. 5, p. 882.e5-882.e11, 2014.
[101] R. A. Claus, A. C. Bunk, C. L. Bockmeyer, W. Losche, R. Kinscherf, and H. Deigner, “Role of increased sphingomyelinase activity in apoptosis and organ failure of patients with severe sepsis,” FASEB J., vol. 19, no. 12, pp. 1719–1721, 2005.
[102] T. Rival et al., “Alteration of plasma phospholipid fatty acid profile in patients with septic shock,” Biochimie, vol. 95, no. 11, pp. 2177–2181, 2013.
[103] H. J. Rhee, E. J. Kim, and J. K. Lee, “Physiological polyamines: simple primordial stress molecules,” J. Cell. Mol. Med., vol. 11, no. 4, pp. 685–703, 2007.
[104] L. L. Anzaldi and E. P. Skaar, “The evolution of a superbug: how Staphylococcus aureus overcomes its unique susceptibility to polyamines,” Mol. Microbiol., vol. 82, no. 1, pp. 1–3, 2011.
[105] H. R. Freund, J. A. Ryan Jr., and J. E. Fischer, “Amino acid derangements in patients with sepsis: treatment with branched chain amino acid rich infusions,” Ann. Surg., vol. 188, no. 3, pp. 423–30, 1978.
[106] Y. Ohtake and M. G. Clemens, “Interrelationship between hepatic ureagenesis and gluconeogenesis in early sepsis,” Am. J. Physiol., vol. 260, no. 3, pp. E453-8, 1991.
[107] M. D. Sharma et al., “Reprogrammed foxp3(+) regulatory T cells provide essential help to support cross-presentation and CD8(+) T cell priming in naive mice,” Immunity, vol. 33, no. 6, pp. 942–54, 2010.
[108] M. D. Sharma et al., “Indoleamine 2,3-dioxygenase controls conversion of Foxp3+ Tregs to TH17-like cells in tumor-draining lymph nodes,” Blood, vol. 113, no. 24, pp. 6102–11, 2009.
[109] T. T. Lögters et al., “Increased plasma kynurenine values and kynurenine-tryptophan ratios after major trauma are early indicators for the development of sepsis,” Shock, vol. 32, no. 1, pp. 29–34, 2009.
[110] D. Changsirivathanathamrong et al., “Tryptophan metabolism to kynurenine is a potential novel contributor to hypotension in human sepsis,” Crit. Care Med., vol. 39, no. 12, pp. 2678–83, 2011.
[111] C. J. Darcy et al., “An observational cohort study of the Kynurenine to Tryptophan ratio in sepsis: association with impaired immune and microvascular function,” PLoS One, vol. 6, no. 6, p. e21185, 2011.
[112] P. C. Calder, S. J. Bevan, and E. A. Newsholme, “The inhibition of T-lymphocyte proliferation by fatty acids is via an eicosanoid-independent mechanism,” Immunology, vol. 75, no. 1, pp. 108–115, 1992.
[113] C. G. Radu, L. V. Yang, M. Riedinger, M. Au, and O. N. Witte, “T cell chemotaxis to lysophosphatidylcholine through the G2A receptor,” Proc. Natl. Acad. Sci., vol. 101, no. 1, pp. 245–50, 2004.
[114] B. L. Spangelo and W. D. Jarvis, “Lysophosphatidylcholine stimulates interleukin-6 release from rat anterior pituitary cells in vitro,” Endocrinology, vol. 137, no. 10, pp. 4419–26, 1996.
[115] M. H. Gräler and E. J. Goetzl, “Lysophospholipids and their G protein-coupled receptors in inflammation and immunity,” Biochim. Biophys. Acta, vol. 1582, no. 1–3, pp. 168–74, 2002.
[116] S. W. Standage, C. C. Caldwell, B. Zingarelli, and H. R. Wong, “Reduced peroxisome proliferator-activated receptor α expression is associated with decreased survival and increased tissue bacterial load in sepsis,” Shock, vol. 37, no. 2, pp. 164–9, 2012.
[117] M. Ferrario et al., “Mortality prediction in patients with severe septic shock: a pilot study using a target metabolomics approach,” Sci. Rep., vol. 6, p. 20391, 2016.
[118] C. K. Chow and C. N. Liu, “Approximating discrete probability distributions with dependence trees,” IEEE Trans. Inf. Theory, vol. 14, no. 3, pp. 462–467, 1968.
[119] P. N. Nesargikar, B. Spiller, and R. Chavez, “The complement system: history, pathways, cascade and inhibitors,” Eur. J. Microbiol. Immunol., vol. 2, no. 2, pp. 103–111, 2012.
[120] H. H. Tong, Y. X. Li, G. L. Stahl, and J. M. Thurman, “Enhanced susceptibility to acute pneumococcal otitis media in mice deficient in complement C1qa, factor B, and factor B/C2,” Infect. Immun., vol. 78, no. 3, pp. 976–983, 2010.
[121] H. R. Wong et al., “Developing a clinically feasible personalized medicine approach to pediatric septic shock,” Am. J. Respir. Crit. Care Med., vol. 191, no. 3, pp. 309–315, 2015.
[122] H. P. Leite and L. F. P. de Lima, “Metabolic resuscitation in sepsis: a necessary step beyond the hemodynamic?,” J. Thorac. Dis., vol. 8, no. 7, pp. E552-7, 2016.
[123] A. J. Rogers et al., “Metabolomic derangements are associated with mortality in critically ill adult patients,” PLoS One, vol. 9, no. 1, p. e87538, 2014.
[124] S. Neugebauer et al., “Metabolite Profiles in Sepsis: Developing Prognostic Tools Based on the Type of Infection,” Crit Care Med, vol. 44, no. 9, pp. 1649–1662, 2016.
[125] T. Fuhrer, D. Heer, B. Begemann, and N. Zamboni, “High-throughput, accurate mass metabolome profiling of cellular extracts by flow injection-time-of-flight mass spectrometry,” Anal. Chem., vol. 83, no. 18, pp. 7074–7080, 2011.
[126] T. Cajka and O. Fiehn, “Toward Merging Untargeted and Targeted Methods in Mass Spectrometry-Based Metabolomics and Lipidomics,” Anal Chem, vol. 88, no. 1, pp. 524–545, 2016.
[127] E. D. Peltz et al., “Pathologic metabolism: an exploratory study of the plasma metabolome of critical injury,” J Trauma Acute Care Surg, vol. 78, no. 4, pp. 742–751, 2015.
[128] M. M. Levy et al., “Early changes in organ function predict eventual survival in severe sepsis.,” Crit. Care Med., vol. 33, no. 10, pp. 2194–2201, 2005.
[129] F. L. Ferreira et al., “Serial evaluation of the SOFA score to predict outcome in critically ill patients,” JAMA, vol. 286, no. 14, pp. 1754–1758, 2001.
[130] T. J. Cunningham, L. Yao, and A. Lucena, “Product inhibition of secreted phospholipase A2 may explain lysophosphatidylcholines’ unexpected therapeutic properties,” J Inflamm, vol. 5, p. 17, 2008.
[131] N. Tanaka, T. Matsubara, K. W. Krausz, A. D. Patterson, and F. J. Gonzalez, “Disruption of phospholipid and bile acid homeostasis in mice with nonalcoholic steatohepatitis,” Hepatology, vol. 56, no. 1, pp. 118–129, 2012.
[132] R. Lehmann et al., “Circulating lysophosphatidylcholines are markers of a metabolically benign nonalcoholic fatty liver,” Diabetes Care, vol. 36, no. 8, pp. 2331–2338, 2013.
[133] I. Maricic, E. Girardi, D. M. Zajonc, and V. Kumar, “Recognition of lysophosphatidylcholine by type II NKT cells and protection from an inflammatory liver disease,” J Immunol, vol. 193, no. 9, pp. 4580–4589, 2014.
[134] J. Dalli et al., “Human Sepsis Eicosanoid and Proresolving Lipid Mediator Temporal Profiles: Correlations With Survival and Clinical Outcomes,” Crit Care Med, vol. 45, no. 1, pp. 58–68, 2017.
[135] P. Brites, H. R. Waterham, and R. J. Wanders, “Functions and biosynthesis of plasmalogens in health and disease,” Biochim Biophys Acta, vol. 1636, no. 2–3, pp. 219–231, 2004.
[136] R. A. Zoeller, T. J. Grazia, P. LaCamera, J. Park, D. P. Gaposchkin, and H. W. Farber, “Increasing plasmalogen levels protects human endothelial cells during hypoxia,” Am J Physiol Heart Circ Physiol, vol. 283, no. 2, pp. H671-9, 2002.
[137] T. Brosche, T. Bertsch, C. C. Sieber, and U. Hoffmann, “Reduced plasmalogen concentration as a surrogate marker of oxidative stress in elderly septic patients,” Arch Gerontol Geriatr, vol. 57, no. 1, pp. 66–69, 2013.
[138] J. J. Yan, J. S. Jung, J. E. Lee et al., “Therapeutic effects of lysophosphatidylcholine in experimental sepsis,” Nat. Med., vol. 10, no. 2, pp. 161–167, 2004.
[139] O. Murch, M. Collin, B. Sepodes, S. J. Foster, H. Mota-Filipe, and C. Thiemermann, “Lysophosphatidylcholine reduces the organ injury and dysfunction in rodent models of Gram-negative and Gram-positive shock,” Br. J. Pharmacol., vol. 148, no. 6, pp. 769–777, 2006.
[140] M. Falcone et al., “Septic shock from community-onset pneumonia: is there a role for aspirin plus macrolides combination?,” Intensive Care Med., vol. 42, no. 2, pp. 301–302, 2016.
[141] H. D. Kiers, M. Kox, W. A. van der Heijden, N. P. Riksen, and P. Pickkers, “Aspirin may improve outcome in sepsis by augmentation of the inflammatory response,” Intensive Care Med., vol. 42, no. 6, p. 1096, 2016.
[142] R. C. Block et al., “The Effects of EPA, DHA, and Aspirin Ingestion on Plasma Lysophospholipids and Autotaxin,” Prostaglandins Leukot. Essent. Fat. Acids, vol. 87, no. 4–5, pp. 143–151, 2012.
[143] L. Brunelli et al., “A combination of untargeted and targeted metabolomics approaches unveils changes in the kynurenine pathway following cardiopulmonary resuscitation,” Metabolomics, vol. 9, no. 4, pp. 839–852, 2013.