
Neuroinformatics https://doi.org/10.1007/s12021-018-9398-5

    ORIGINAL ARTICLE

Fused Group Lasso Regularized Multi-Task Feature Learning and Its Application to the Cognitive Performance Prediction of Alzheimer's Disease

Xiaoli Liu 1,2 · Peng Cao 1 · Jianzhong Wang 3 · Jun Kong 3,4 · Dazhe Zhao 1,2

    © Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract
Alzheimer's disease (AD) is characterized by gradual neurodegeneration and loss of brain function, especially for memory during early stages. Regression analysis has been widely applied to AD research to relate clinical and biomarker data, such as predicting cognitive outcomes from MRI measures. Recently, multi-task based feature learning (MTFL) methods with the sparsity-inducing ℓ2,1-norm have been widely studied to select a discriminative feature subset from MRI features by incorporating inherent correlations among multiple clinical cognitive measures. However, existing MTFL assumes the correlation among all tasks is uniform, and the task relatedness is modeled by encouraging a common subset of features via sparsity-inducing regularizations that neglect the inherent structure of tasks and MRI features. To address this issue, we propose a fused group lasso regularization to model the underlying structures, involving 1) a graph structure within tasks and 2) a group structure among the image features. To this end, we present a multi-task feature learning framework with a mixed norm of fused group lasso and ℓ2,1-norm to model these more flexible structures. For optimization, we employed the alternating direction method of multipliers (ADMM) to efficiently solve the proposed non-smooth formulation. We evaluated the performance of the proposed method using the Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets. The experimental results demonstrate that incorporating the two prior structures with the fused group lasso norm into multi-task feature learning can improve prediction performance over several competing methods, with estimated correlations of cognitive functions and identification of cognition-relevant imaging markers that are clinically and biologically meaningful.

    Keywords Alzheimer’s disease · Multi-task learning · Sparse group lasso · Fused lasso

    Introduction

Alzheimer's disease (AD) is the most common cause of dementia, which mainly affects memory function; progression ultimately culminates in a state of dementia where all

✉ Peng Cao [email protected]

    Xiaoli [email protected]

1 Computer Science and Engineering, Northeastern University, Shenyang, China

2 Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang, China

3 College of Information Science and Technology, Northeast Normal University, Changchun, China

4 Key Laboratory of Applied Statistics of MOE, Changchun, China

cognitive functions are affected. The disease poses a serious challenge to the aging society. The worldwide prevalence of AD is predicted to quadruple from 46.8 million in 2016 (Alzheimer's Association et al. 2016) to 131.5 million by 2050 according to ADI's World Alzheimer Report (Batsch and Mittelman 2015). Dementia also has a huge economic impact. Today, the total estimated worldwide cost of dementia is US$818 billion, and dementia is expected to become a trillion-dollar disease by 2018.

Predicting cognitive performance of subjects from neuroimaging measures and identifying relevant imaging biomarkers are important focuses of the study of Alzheimer's disease. Magnetic resonance imaging (MRI) allows the direct observation of brain changes such as cerebral atrophy or ventricular expansion (Castellani et al. 2010). Previous work showed that brain atrophy detected by MRI is correlated with neuropsychological deficits (Frisoni et al. 2010). The relationships between commonly used cognitive measures and structural changes detected by MRI have been



previously studied using regression models. Many analyses have demonstrated a relationship between baseline MRI features and cognitive measures. The most commonly used cognitive measures are the Alzheimer's Disease Assessment Scale cognitive total score (ADAS), the Mini-Mental State Exam score (MMSE), and the Rey Auditory Verbal Learning Test (RAVLT). ADAS is the gold standard in AD drug trials for cognitive function assessment, and is the most popular cognitive testing instrument for measuring the severity of the most important symptoms of AD. MMSE measures cognitive impairment, including orientation to time and place, attention and calculation, immediate and delayed recall of words, and language and visuo-constructional functions. RAVLT measures episodic memory and is used for the diagnosis of memory disturbances; it includes eight recall trials and a recognition test.

Early studies focused on traditional regression models to predict cognitive outcomes one at a time. To achieve more appropriate predictive models of performance and identify relevant imaging biomarkers, many previous works formulated the prediction of multiple cognitive outcomes as a multi-task learning problem and developed regularized multi-task learning methods to model disease cognitive outcomes (Wan et al. 2012, 2014; Zhou et al. 2013; Wang et al. 2012). Multi-task learning (MTL) (Caruana 1998) describes a learning paradigm that seeks to improve the generalization performance of a learning task with the help of other related tasks. The fundamental hypothesis of MTL methods is that if tasks are related, then learning of one task can benefit from the learning of other tasks. Learning multiple related tasks simultaneously has been shown, both theoretically and empirically, to significantly improve performance. The key of MTL is how to exploit correlation among tasks via an appropriate shared representation. Two popular shared representations to model task relatedness are model parameter sharing (Argyriou et al. 2008; Jebara 2011) and feature representation sharing (Evgeniou and Pontil 2004; Yu et al. 2005; Xue et al. 2007). There are inherent correlations among different cognitive scores. Therefore, the prediction of different types of cognitive scores can be modeled as an MTL formulation in which the tasks are related in the sense that they all share a small set of features; this is the multi-task feature learning (MTFL) problem. To solve MTFL problems, regularization has been introduced, producing better performance than the traditional solution using single-task learning. The most commonly used regularization is the ℓ2,1-norm (Liu et al. 2009), which is employed to extract features that impact all or most clinical scores, since the assumption is that a given imaging marker can affect multiple cognitive scores and only a subset of brain regions (regions of interest, ROIs) are relevant. (Wang et al. 2011) and (Zhang and Yeung 2012a) employed multi-task feature learning strategies to select biomarkers that could predict multiple clinical scores. Specifically, (Wang et al. 2011) employed an ℓ1-norm regularizer to impose sparsity among all elements and proposed the use of combined ℓ2,1-norm and ℓ1-norm regularizations to select features. (Zhang et al. 2012) proposed a multi-task learning model with the ℓ2,1-norm to select a common subset of relevant features for multiple variables from each modality.

A major limitation of existing MTFL methods is that complex relationships among imaging markers and among cognitive outcomes are often ignored. Correlating multiple prediction models assumes that all tasks share the same feature subset. This is not a realistic assumption, since it treats all cognitive outcomes (responses) and MRI features (predictors) equally and neglects underlying correlations between the cognitive tasks and structure within the MRI features. Specifically, 1) for the cognitive outcomes, each assessment typically yields multiple evaluation scores from a set of relevant cognitive tasks, and thus these scores are inherently correlated. An example would be the scores of TOTAL and TOT6 in the RAVLT. Different assessments can evaluate different cognitive functions, resulting in low correlation and a preference for different brain regions. For example, the tasks in TRAILS aim to test a combination of visual, motor, and executive functions, while the RAVLT aims to test verbal learning and memory. It is reasonable to assume that correlations among tasks are not equal, and some tasks may be more closely related than others in assessment tests of cognitive outcomes. 2) On the other hand, for MRI data, many MRI features are interrelated and together reveal brain cognitive functions (Yan et al. 2015). In our data, multiple shape measures (volume, area, and thickness) from the same region provide a comprehensive quantitative evaluation of cortical atrophy, and tend to be selected together as joint predictors. Our previous study proposed a model in which prior knowledge guides multi-task feature learning; using group information to enforce intra-group similarity has been demonstrated to be an effective approach (Liu et al. 2017). Overall, it is important to explore and utilize such interrelated structures and to select important and structurally correlated features together.

To address these model limitations, we designed a novel multi-task feature learning method that models a common representation with respect to MRI features across tasks as well as the local task structure with respect to brain regions. Specifically, we designed a novel mixed structured sparsity norm, called the fused group lasso, to capture the underlying structures at the level of tasks and features. This regularizer is based on the natural assumption that if some tasks are correlated, they should have similar weight vectors and similar selected brain regions. It penalizes differences between the prediction models of highly correlated tasks, and encourages similarity in the selected features


of highly correlated tasks. To discover such dependent structures among the cognitive outcomes, we employed the Pearson correlation coefficient to uncover the interrelations among cognitive measures and estimated the correlation matrix of all tasks. In our work, all the cognitive measures (20 in total) in the ADNI dataset were used to exploit the relationship. To the best of our knowledge, our approach is the first work to analyze all cognitive measures in the ADNI dataset and their relationships. With the estimated task correlation, we employ the idea of fused lasso to capture the dependence of the response variables. At the same time, taking into account the group structure among predictors, prior group information is incorporated into the fused lasso norm, promoting intra-group similarity with group sparsity. By incorporating the fused group lasso into the MTFL model, we can better understand the underlying associations among the prediction tasks of cognitive measures, allowing more stable identification of cognition-relevant imaging markers. The resulting formulation is challenging to solve due to the use of non-smooth penalties, including the fused group lasso and the ℓ2,1-norm. An effective ADMM algorithm is proposed to tackle the complex non-smoothness.

Through empirical evaluation and comparison with different baseline methods and recently developed MTL methods using data from ADNI, we illustrate that the proposed FGL-MTFL method outperforms other methods. Improvements are statistically significant for most scores (tasks). The results demonstrate that incorporating the fused group lasso into the traditional MTFL formulation improves predictive performance relative to traditional machine learning methods. We discuss the most prominent ROIs and task correlations identified by FGL-MTFL. We found that the results corroborate previous studies in neuroscience. Our previous works also formulated prediction tasks within a multi-task learning scheme. The algorithm SMKMTL (sparse multi-kernel based multi-task learning) in (Cao et al. 2017) exploits a nonlinear prediction model based on multi-kernel learning; however, it assumes that correlations among tasks are equal and that the features are independent. Although the GSGL-MTL (group-guided sparse group lasso regularized multi-task learning) algorithm (Liu et al. 2017) exploits the group structure of features by incorporating a priori group information, it does not consider the complex relationships among cognitive outcomes.

The rest of the paper is organized as follows. In "Preliminary Methodology", we provide a description of the preliminary methodology: multi-task learning (MTL), the ℓ2,1-norm, the group lasso norm, and the fused lasso norm. A detailed mathematical formulation and optimization of FGL-MTFL is provided in "Fused Group Lasso Regularized Multi-Task Feature Learning, FGL-MTFL". In "Experimental Results and Discussions", we present the experimental results and evaluate the performance of FGL-MTFL using data from the ADNI-1 dataset. The conclusions are presented in "Conclusion".

    Preliminary Methodology

    Multi-Task Learning

Consider a multi-task learning (MTL) setting with k tasks. Let p be the number of covariates, shared across all the tasks, and let n be the number of samples. Let X ∈ ℝ^{n×p} denote the matrix of covariates, Y ∈ ℝ^{n×k} the matrix of responses with each row corresponding to a sample, and Θ ∈ ℝ^{p×k} the parameter matrix, with column θ_{·m} ∈ ℝ^p corresponding to task m, m = 1, …, k, and row θ_{j·} ∈ ℝ^k corresponding to feature j, j = 1, …, p. The MTL problem can be constructed by estimating the parameters based on a suitably regularized loss function. In order to effectively associate imaging markers and cognitive measures, the MTL model minimizes the following objective:

min_{Θ ∈ ℝ^{p×k}} L(Y, X, Θ) + λ R(Θ),  (1)

where L(·) denotes the loss function and R(·) is the regularizer. In the current context, we assume the loss to be the square loss, i.e.,

L(Y, X, Θ) = ‖Y − XΘ‖²_F = Σ_{i=1}^{n} ‖y_i − x_iΘ‖²₂,  (2)

where y_i ∈ ℝ^{1×k} and x_i ∈ ℝ^{1×p} are the i-th rows of Y and X, respectively, corresponding to the multi-task response and covariates for the i-th sample. We note that the MTL framework can be easily extended to other loss functions. Clearly, different choices of the penalty R(Θ) may yield quite different multi-task methods. Using some prior knowledge, we then add the penalty R(Θ) to encode relatedness among tasks.
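Since Eq. 2 drives everything that follows, it helps to verify numerically that the Frobenius form equals the sum of per-sample squared ℓ2 norms. A minimal sketch; the shapes and synthetic data are illustrative assumptions, not ADNI values:

```python
import numpy as np

# Multi-task square loss (Eq. 2) computed two equivalent ways.
rng = np.random.default_rng(0)
n, p, k = 50, 10, 4                    # samples, features, tasks (illustrative)
X = rng.normal(size=(n, p))            # covariate matrix
Theta = rng.normal(size=(p, k))        # parameter matrix, one column per task
Y = X @ Theta + 0.1 * rng.normal(size=(n, k))  # multi-task responses

def square_loss(Y, X, Theta):
    """Squared Frobenius loss ||Y - X Theta||_F^2."""
    R = Y - X @ Theta
    return np.sum(R * R)

# The Frobenius form equals the sum of per-sample squared l2 norms.
per_sample = sum(np.sum((Y[i] - X[i] @ Theta) ** 2) for i in range(n))
assert np.isclose(square_loss(Y, X, Theta), per_sample)
```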

ℓ2,1-norm

One appealing property of the ℓ2,1-norm regularization is that it encourages multiple predictors from different tasks to share similar parameter sparsity patterns. The MTFL model via the ℓ2,1-norm regularization considers

‖Θ‖_{2,1} = Σ_{j=1}^{p} ‖θ_{j·}‖₂,  (3)

and is suitable for the simultaneous prioritization of sparsity over features for all tasks.

The key point of Eq. 3 is the use of the ℓ₂-norm for θ_{j·}, which forces grouping of the weights corresponding to the j-th feature across multiple tasks and tends to select features


based on the joint strength of all k tasks (see Fig. 1a). There is a correlation among multiple cognitive tests. A relevant imaging predictor typically may have more or less influence on all these scores, and it is possible that only a subset of brain regions is relevant to each assessment. By employing MTFL, the correlation among different tasks can be incorporated into the model to build a more appropriate predictive model and identify a subset of features. However, the rows of Θ are treated equally in MTFL, which implies that the underlying structures among predictors are ignored.
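The row-wise coupling of Eq. 3 can be sketched in a few lines; the 2×2 matrix below is a hypothetical example chosen so that one feature is shared across tasks and one is dropped for all of them:

```python
import numpy as np

# l2,1-norm (Eq. 3): sum of l2 norms of the rows of Theta, coupling each
# feature's weights across all tasks.
def l21_norm(Theta):
    return np.sum(np.linalg.norm(Theta, axis=1))

Theta = np.array([[3.0, 4.0],   # row norm 5 -> feature kept jointly
                  [0.0, 0.0]])  # zero row -> feature dropped for all tasks
print(l21_norm(Theta))  # 5.0
```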

    G2,1-norm

Despite the above achievements, few regression models take into account the covariance structure among predictors. To achieve a certain function, brain imaging measures are often correlated with each other. For MRI data, groups

correspond to specific regions of interest (ROIs) in the brain, e.g., the entorhinal cortex and hippocampus. Individual features are the specific properties of those regions, e.g., cortical volume and thickness. In this study, for each region (group), multiple features were extracted to measure the atrophy of each ROI, involving cortical thickness, surface area, and volume from gray matter and white matter. Multiple shape measures from the same region provide a comprehensive quantitative evaluation of cortical atrophy, and tend to be selected together as joint predictors.

We assume the p covariates to be divided into g disjoint groups G_ℓ, ℓ = 1, …, g, with each group including ν_ℓ covariates. In the context of AD, each group corresponds to a region of interest (ROI) in the brain, and the covariates in each group correspond to specific features of that region. For AD, the number of features in each group, ν_ℓ, is 1 or 4, and the number of groups g can be in

Fig. 1 The illustration of three different regularizations. Each column of Θ corresponds to a single task and each row represents a feature dimension. The MRI features in each region belong to a group. We assume the p features to be divided into g disjoint groups G_ℓ, ℓ = 1, …, g, with each group having ν_ℓ features. For each element in Θ, white indicates zero-valued elements and color indicates non-zero values


the hundreds. We then introduce a G2,1-norm according to the relationship between brain regions (ROIs) and cognitive tasks, and encourage a task-specific subset of ROIs (see Fig. 1b). The G2,1-norm ‖Θ‖_{G2,1} is defined as:

‖Θ‖_{G2,1} = Σ_{ℓ=1}^{g} Σ_{m=1}^{k} w_ℓ ‖θ_{G_ℓ,m}‖₂,  (4)

where w_ℓ = √ν_ℓ is the weight for each group and θ_{G_ℓ,m} ∈ ℝ^{ν_ℓ} is the coefficient vector for group G_ℓ and task m.
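A sketch of Eq. 4 under the stated weighting w_ℓ = √ν_ℓ; the group partition and coefficient values below are illustrative assumptions:

```python
import numpy as np

# G2,1-norm (Eq. 4): weighted sum of per-group, per-task l2 norms.
def g21_norm(Theta, groups):
    """groups: list of disjoint index arrays, one per ROI."""
    total = 0.0
    for g in groups:
        w = np.sqrt(len(g))                 # group weight w_l = sqrt(nu_l)
        for m in range(Theta.shape[1]):     # per-task group norms
            total += w * np.linalg.norm(Theta[g, m])
    return total

Theta = np.arange(8.0).reshape(4, 2)        # 4 features, 2 tasks (hypothetical)
groups = [np.array([0, 1]), np.array([2, 3])]  # two ROIs, 2 features each
val = g21_norm(Theta, groups)
```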

    Fused Lasso

Fused lasso is one of these variants, in which pairwise differences between variables are penalized using the ℓ₁-norm, which results in successive variables being similar. The fused lasso norm is defined as:

‖HΘᵀ‖₁ = Σ_{m=1}^{k−1} ‖θ_{·m} − θ_{·,m+1}‖₁,  (5)

where H is a (k − 1) × k sparse matrix with H_{m,m} = 1 and H_{m,m+1} = −1.

It encourages θ_{·m} and θ_{·,m+1} to take the same value by shrinking the difference between them toward zero. This approach has been employed to incorporate temporal smoothness when modeling disease progression. In a longitudinal model, it is assumed that the difference of the cognitive scores between two successive time points is relatively small.
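The difference matrix H of Eq. 5 can be constructed explicitly and checked against the pairwise-difference form; the small Θ below is hypothetical:

```python
import numpy as np

# Fused lasso penalty (Eq. 5): ||H Theta^T||_1 via an explicit (k-1) x k
# difference matrix H with H[m, m] = 1 and H[m, m+1] = -1.
def fused_lasso(Theta):
    k = Theta.shape[1]
    H = np.zeros((k - 1, k))
    for m in range(k - 1):
        H[m, m], H[m, m + 1] = 1.0, -1.0
    return np.abs(H @ Theta.T).sum()

Theta = np.array([[1.0, 1.0, 3.0],
                  [0.0, 2.0, 2.0]])      # p = 2 features, k = 3 tasks
# Matches the sum of l1 norms of successive column differences.
direct = sum(np.abs(Theta[:, m] - Theta[:, m + 1]).sum() for m in range(2))
assert np.isclose(fused_lasso(Theta), direct)
```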

Fused Group Lasso Regularized Multi-Task Feature Learning, FGL-MTFL

    Formulation

There are two limitations of traditional MTFL. When regularized by the ℓ2,1-norm in Eq. 3, sparsity is achieved by treating each task equally, which ignores the underlying structures among predictors. On the other hand, for some highly correlated features, traditional MTFL tends to identify one and ignore the others, which is inadequate for yielding a biologically meaningful interpretation. Our previous work exploited relationships of features in the learning procedure, which motivated us to consider the underlying structure of these relationships. To address the limitations of MTFL with the ℓ2,1-norm, we consider the structure of tasks and features instead of assuming that all tasks have similar weights and commonly selected features. More specifically, we propose a new regularization for multi-task feature learning, called the fused group lasso. The underlying idea is that if tasks are highly correlated according to the task interrelations, the tasks are supposed to share common brain regions, whereas tasks with low correlation are more likely to involve different brain regions.

To model the task-feature relationships, we first construct a graph G to model the task correlation, as shown in Fig. 2 (left). Let G = (V, E) be an undirected graph, where V represents the set of vertices and E is the set of edges. Each task is treated as a graph node, and each edge (pairwise link) e_{m,l} ∈ E in G corresponds to an edge between the m-th task and the l-th task, where |r_{m,l}| encodes the strength of the relationship between the m-th task and the l-th task. In

Fig. 2 The illustration of the FGL-MTFL method. The method involves two steps: 1) estimation of the task correlation and construction of an undirected graph (left); and 2) joint learning of all regression models in a fused group lasso regularized multi-task feature learning (FGL-MTFL) formulation based on the estimated correlation matrix (right)


    this model, we adopt a simple and commonly-used approachto infer G from data. In this approach, we first computepairwise Pearson correlation coefficients for each pair oftasks, and then connect two nodes with an edge only if theircorrelation coefficient is above a given threshold τ .
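The graph-construction step described above (Pearson correlations thresholded at τ) can be sketched as follows; the synthetic task scores and τ = 0.5 are illustrative assumptions:

```python
import numpy as np

# Build a task graph: connect tasks whose Pearson correlation exceeds tau.
rng = np.random.default_rng(1)
base = rng.normal(size=100)
scores = np.column_stack([base + 0.1 * rng.normal(size=100),   # task 1
                          base + 0.1 * rng.normal(size=100),   # task 2
                          rng.normal(size=100)])               # unrelated task 3

R = np.corrcoef(scores, rowvar=False)      # k x k Pearson correlation matrix
tau = 0.5                                  # illustrative threshold
edges = [(m, l) for m in range(R.shape[0])
         for l in range(m + 1, R.shape[0]) if abs(R[m, l]) > tau]
print(edges)  # only tasks 1 and 2 are strongly correlated
```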

Brain imaging measures are often correlated with each other, so incorporation of the covariance of MRI features can improve the performance of traditional MTL methods. Thus, to identify biologically meaningful markers, we utilize prior knowledge of the interrelated structure to group related features together to guide the learning process. The benefit of this strategy was also revealed in our previous work (Liu et al. 2017), in which we proposed a group lasso multi-task learning algorithm that explicitly models the correlation structure within features, and achieved good performance in the prediction of cognitive scores from imaging measures.

    Once the graph of task correlation G is constructed, wethen integrate the group structure of features into the fusedlasso norm and propose a new fused group lasso regularizedmulti-task feature learning (FGL-MTFL) model as follows.

min_Θ ½‖Y − XΘ‖²_F + λ₁‖Θ‖_{2,1} + λ₂ Σ_{(m,l)∈E} |r_{m,l}| ‖θ_{·m} − sign(r_{m,l}) θ_{·l}‖_{G2,1},  (6)

where |r_{m,l}| indicates the strength of the correlation between two tasks connected by an edge, and λ₁ and λ₂ are regularization parameters that determine the amount of the two separate penalties. In the new fused group lasso norm, if two tasks m and l are highly correlated, the difference between the two corresponding regression coefficient vectors θ_{·m} and θ_{·l} will be penalized more than differences for other pairs of tasks with weaker correlation. The regularization tends to flatten the values of regression coefficients for each feature across multiple highly correlated tasks, so that the strength of the influence of each feature becomes more similar across those tasks.
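Putting Eqs. 4-6 together, the fused group lasso penalty for a given edge set can be evaluated directly; the edge set, correlation matrix, and groups below are hypothetical:

```python
import numpy as np

# Fused group lasso penalty (Eq. 6): for every graph edge (m, l), penalize
# the G2,1-norm of theta_.m - sign(r_ml) * theta_.l, weighted by |r_ml|.
def fgl_penalty(Theta, edges, r, groups):
    total = 0.0
    for (m, l) in edges:
        diff = Theta[:, m] - np.sign(r[m, l]) * Theta[:, l]
        g21 = sum(np.sqrt(len(g)) * np.linalg.norm(diff[g]) for g in groups)
        total += abs(r[m, l]) * g21
    return total

Theta = np.ones((4, 3))                        # identical columns (hypothetical)
r = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
groups = [np.array([0, 1]), np.array([2, 3])]
# Identical, positively correlated columns incur zero fusion penalty.
assert fgl_penalty(Theta, [(0, 1)], r, groups) == 0.0
```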

The FGL-MTFL model includes two regularization processes: (1) all tasks are regularized by the ℓ2,1-norm regularizer, which captures global relationships by encouraging multiple predictors across tasks to share similar parameter sparsity patterns; this ensures that a small subset of features will be selected for all regression models from a global perspective. (2) Two local structures (graph structures within tasks and group structures within features) are considered from a local perspective by the proposed fused group lasso (FGL) regularizer, which combines two structured sparsity regularizers (fused lasso and group lasso) to capture task-specific ROI structures. This encourages related tasks to have similar selected brain regions.

This new model not only preserves the strength of the ℓ2,1-norm in requiring similarity across multiple scores from a cognitive test, but also considers the complex graph structure among responses and the interrelated group structure of imaging predictors. To the best of our knowledge, no sparsity-based algorithm has been described that includes both global and local task correlation for the prediction of cognitive outcomes in AD.

We used a symmetric correlation matrix D ∈ ℝ^{k×k} to describe the correlation in the fusion penalty, defined as:

D_{m,l} = −r_{m,l},  if (m, l) ∈ E and m ≠ l;
D_{m,m} = Σ_{l≠m, (m,l)∈E} |r_{m,l}|;
D_{m,l} = 0,  otherwise,  (7)

where m, l = 1, 2, …, k. We normalize the matrix by κ, the number of edges in E, defining Z = (1/κ)D. Then the formulation can be written in the following matrix form:

min_Θ ½‖Y − XΘ‖²_F + λ₁‖Θ‖_{2,1} + λ₂‖ΘZ‖_{G2,1}.  (8)

    The framework of the proposed method is illustrated inFig. 2.
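The construction of D in Eq. 7 and its normalization Z = (1/κ)D can be sketched as follows, with an illustrative 3-task edge set and correlation values:

```python
import numpy as np

# Build the fusion matrix D (Eq. 7) and Z = (1/kappa) * D from an edge set.
def build_Z(k, edges, r):
    D = np.zeros((k, k))
    for (m, l) in edges:            # undirected edges, m != l
        D[m, l] = D[l, m] = -r[m, l]
        D[m, m] += abs(r[m, l])     # diagonal accumulates |r_ml| per incident edge
        D[l, l] += abs(r[m, l])
    kappa = len(edges)              # number of edges in E
    return D / kappa

r = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, -0.4],
              [0.0, -0.4, 1.0]])    # illustrative correlation matrix
Z = build_Z(3, [(0, 1), (1, 2)], r)
assert np.allclose(Z, Z.T)          # Z is symmetric, as the text notes
```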

    Efficient Optimization for FGL-MTFL

    ADMM

    In recent years, ADMM has become popular since it is ofteneasy to parallelize distributed convex problems. In ADMM,the solutions to small local subproblems are coordinated toidentify the globally optimal solution (Boyd et al. 2011).

    minx,z

    f (x) + g(z)s.t . Ax + Bz = c.

    The variant augmented Largrangian of ADMM methodis formulated as follows:

    Lρ(x, z, u) = f (x) + g(z) + uT (Ax + Bz − c)+ρ2

    ‖Ax + Bz − c‖2

    where f and g are convex functions, and variables A ∈R

    p×n, x ∈ Rn, B ∈ Rp×m, z ∈ Rm, c ∈ Rp. u is a scaleddual augmented Lagrangian multiplier, and ρ is a non-negative penalty parameter. In each iteration of ADMM, thisproblem is solved by alternating minimization Lρ(x, z, u)


over x, z, and u. At the (k + 1)-th iteration, the update of ADMM is carried out by:

x^{k+1} := argmin_x L_ρ(x, z^k, u^k)
z^{k+1} := argmin_z L_ρ(x^{k+1}, z, u^k)
u^{k+1} := u^k + ρ(Ax^{k+1} + Bz^{k+1} − c)
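To make the generic updates concrete, here is a minimal ADMM sketch for the lasso (splitting x − z = 0), which follows the same x/z/u pattern; the problem sizes, λ, and ρ are illustrative assumptions, not the paper's setting:

```python
import numpy as np

# ADMM for the lasso: min 0.5*||Ax - b||^2 + lam*||x||_1, with x - z = 0.
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]          # sparse ground truth (illustrative)
b = A @ x_true
lam, rho = 0.1, 1.0

x = np.zeros(10)
z = np.zeros(10)
u = np.zeros(10)                       # scaled dual variable
AtA, Atb = A.T @ A, A.T @ b
F = np.linalg.inv(AtA + rho * np.eye(10))   # factor once, reuse each iteration
for _ in range(200):
    x = F @ (Atb + rho * (z - u))                                  # x-update
    z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)  # soft-threshold
    u = u + x - z                                                  # dual update

assert np.allclose(z, x_true, atol=0.1)    # recovers the sparse signal
```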

    Efficient Optimization for FGL-MTFL

We developed an efficient ADMM-based algorithm to solve the objective function in Eq. 8, which is equivalent to the following constrained optimization problem:

min_{Θ,Q,S} ½‖Y − XΘ‖²_F + λ₁‖Q‖_{2,1} + λ₂‖S‖_{G2,1}
s.t. Θ − Q = 0, ΘZ − S = 0,  (9)

where Q and S are slack variables. Eq. 9 can then be solved by ADMM. The augmented Lagrangian is

L_ρ(Θ, Q, S, U, V) = ½‖Y − XΘ‖²_F + λ₁‖Q‖_{2,1} + λ₂‖S‖_{G2,1} + ⟨U, Θ − Q⟩ + (ρ/2)‖Θ − Q‖² + ⟨V, ΘZ − S⟩ + (ρ/2)‖ΘZ − S‖²,  (10)

where U and V are augmented Lagrangian multipliers.

Update Θ: From the augmented Lagrangian in Eq. 10, the update of Θ at the (t + 1)-th iteration is carried out by:

Θ^{(t+1)} = argmin_Θ ½‖Y − XΘ‖²_F + ⟨U^{(t)}, Θ − Q^{(t)}⟩ + (ρ/2)‖Θ − Q^{(t)}‖² + ⟨V^{(t)}, ΘZ − S^{(t)}⟩ + (ρ/2)‖ΘZ − S^{(t)}‖²,  (11)

which has a closed-form solution, derived by setting the gradient of Eq. 11 to zero:

0 = −Xᵀ(Y − XΘ) + U^{(t)} + ρ(Θ − Q^{(t)}) + V^{(t)}Z + ρ(ΘZ − S^{(t)})Z.  (12)

Note that Z is a symmetric matrix. We define Ω = ZZ, where Ω is also a symmetric matrix with Ω_{m,l} denoting the weight for the pair (m, l). With this linearization, Θ can be updated in parallel over the individual columns θ_{·m}. Thus, in the (t + 1)-th iteration, θ^{(t+1)}_{·m} can be updated efficiently using Cholesky factorization:

0 = −Xᵀ(y_m − Xθ_{·m}) + u^{(t)}_{·m} + ρ(θ_{·m} − q^{(t)}_{·m}) + (v^{(t)}_{·m} − Σ_{l≠m} Z_{m,l} v^{(t)}_{·l}) + ρ(Ω_{m,m} θ_{·m} − Σ_{l≠m} Ω_{m,l} θ_{·l}) − ρ(s^{(t)}_{·m} − Σ_{l≠m} Z_{m,l} s^{(t)}_{·l}).  (13)

The above optimization problem is quadratic. The optimal solution is given by θ^{(t+1)}_{·m} = F_m^{−1} b^{(t)}_m, where

F_m = XᵀX + ρ(1 + Ω_{m,m}) I,

b^{(t)}_m = Xᵀy_m − u^{(t)}_{·m} − (v^{(t)}_{·m} − Σ_{l≠m} Z_{m,l} v^{(t)}_{·l}) + ρ q^{(t)}_{·m} + ρ(s^{(t)}_{·m} − Σ_{l≠m} Z_{m,l} s^{(t)}_{·l}) + ρ Σ_{l≠m} Ω_{m,l} θ_{·l}.  (14)
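The per-task solve θ_{·m} = F_m^{−1} b_m can be sketched with a Cholesky factorization rather than an explicit inverse; NumPy returns the lower-triangular factor L with F_m = LLᵀ, which plays the role of Aᵀ_m A_m in Eq. 15 (with A_m = Lᵀ). X, b_m, ρ, and Ω_{m,m} below are illustrative stand-ins:

```python
import numpy as np

# Per-task theta update theta_m = F_m^{-1} b_m via Cholesky factorization.
rng = np.random.default_rng(3)
n, p = 40, 8
X = rng.normal(size=(n, p))
rho, Omega_mm = 1.0, 0.5               # illustrative values
b_m = rng.normal(size=p)               # stand-in for b_m^{(t)} of Eq. 14

F_m = X.T @ X + rho * (1.0 + Omega_mm) * np.eye(p)  # constant, positive definite
L = np.linalg.cholesky(F_m)            # F_m = L L^T, factored once up front
# Two triangular solves per iteration: L y = b_m, then L^T theta = y.
y = np.linalg.solve(L, b_m)
theta_m = np.linalg.solve(L.T, y)

assert np.allclose(F_m @ theta_m, b_m)  # solves the linear system exactly
```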

The computation of θ^{(t+1)}_{·m} involves solving a linear system, which is the most time-consuming part of the whole algorithm. To compute θ^{(t+1)}_{·m} efficiently, we can compute the Cholesky factorization of F_m at the beginning of the algorithm:

F_m = Aᵀ_m A_m.  (15)

Note that F_m is a constant, positive definite matrix. Using the Cholesky factorization, we only need to solve the following two linear systems in each iteration:

Aᵀ_m θ̂_{·m} = b^{(t)}_m,  A_m θ_{·m} = θ̂_{·m}.  (16)

Since A_m is an upper triangular matrix, these two linear systems are very efficient to solve.

Update Q: The update for Q effectively needs to solve the following problem:

Q^{(t+1)} = argmin_Q (ρ/2)‖Q − Θ^{(t+1)}‖² + λ₁‖Q‖_{2,1} − ⟨U^{(t)}, Q⟩,  (17)

    which is equivalent to the following problem:

    Q(t+1) =argminQ

    {λ1

    ρ‖Q‖2,1 + 1

    2‖Q − O(t+1)‖2

    }. (18)

  • Neuroinform

where $O^{(t+1)} = \Theta^{(t+1)} + \frac{1}{\rho}U^{(t)}$. It is clear that Eq. 18 can be decoupled into

$q^{(t+1)}_{i.} = \arg\min_{q_{i.}} \phi(q_{i.}) = \arg\min_{q_{i.}} \Big\{\frac{1}{2}\|q_{i.} - o^{(t+1)}_{i.}\|^2 + \frac{\lambda_1}{\rho}\|q_{i.}\|\Big\}$   (19)

where $q_{i.}$ and $o_{i.}$ are the i-th rows of $Q^{(t+1)}$ and $O^{(t+1)}$, respectively. Since $\phi(q_{i.})$ is strictly convex, we conclude that $q^{(t+1)}_{i.}$ is its unique minimizer. We then introduce the following lemma (Liu et al. 2009) to solve Eq. 19.

Lemma 1 For any $\lambda_1 \geq 0$, we have

$q_{i.} = \frac{\max\big\{\|o_{i.}\|_2 - \frac{\lambda_1}{\rho},\, 0\big\}}{\|o_{i.}\|_2}\, o_{i.}.$   (20)
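Lemma 1 is a row-wise soft-thresholding rule; a minimal sketch (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def prox_l21(O, tau):
    """Closed-form minimizer of tau*||Q||_{2,1} + 0.5*||Q - O||^2:
    each row of O is shrunk toward zero by tau in Euclidean norm (Eq. 20)."""
    norms = np.linalg.norm(O, axis=1, keepdims=True)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, np.finfo(float).eps)
    return scale * O

O = np.array([[3.0, 4.0],     # row norm 5.0: shrunk by tau to norm 4.0
              [0.1, 0.1]])    # row norm below tau: set to zero
Q = prox_l21(O, tau=1.0)
# Q[0] == [2.4, 3.2]; Q[1] == [0.0, 0.0]
```

The S-update below (Lemma 2) applies the same shrinkage per feature group and task, with the threshold scaled by the group weight.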

Update S: The update for S effectively needs to solve the following problem

$S^{(t+1)} = \arg\min_{S} \frac{\rho}{2}\|S - \Theta^{(t+1)}Z\|^2 + \lambda_2\|S\|_{G_{2,1}} - \langle V^{(t)}, S\rangle,$   (21)

which is effectively equivalent to computing the proximal operator of the $G_{2,1}$-norm. In particular, the problem can be written as

$S^{(t+1)} = \arg\min_{S} \Big\{\frac{\lambda_2}{\rho}\|S\|_{G_{2,1}} + \frac{1}{2}\|S - \Pi^{(t+1)}\|^2\Big\},$   (22)

where $\Pi^{(t+1)} = \Theta^{(t+1)}Z + \frac{1}{\rho}V^{(t)}$. Since the groups $G_\ell$ used in our work are disjoint, Eq. 22 can be decoupled into

$s^{(t+1)}_{G_\ell m} = \arg\min_{s_{G_\ell m}} \phi(s_{G_\ell m}) = \arg\min_{s_{G_\ell m}} \Big\{\frac{1}{2}\|s_{G_\ell m} - \pi^{(t+1)}_{G_\ell m}\|^2 + \frac{\lambda_2}{\rho}\|s_{G_\ell m}\|\Big\}$   (23)

where $s_{G_\ell m}$ and $\pi_{G_\ell m}$ are the rows in group $G_\ell$ for task m of $S^{(t+1)}$ and $\Pi^{(t+1)}$, respectively. We then introduce the following lemma (Yuan et al. 2013).

Lemma 2 For any $\lambda_2 \geq 0$, we have

$s_{G_\ell m} = \frac{\max\big\{\|\pi_{G_\ell m}\|_2 - \frac{\lambda_2 \nu_\ell}{\rho},\, 0\big\}}{\|\pi_{G_\ell m}\|_2}\, \pi_{G_\ell m}.$   (24)

Dual Update for U and V: Following the standard ADMM dual update, the updates for the dual variables in our setting are as follows:

$U^{(t+1)} = U^{(t)} + \rho(\Theta^{(t+1)} - Q^{(t+1)})$   (25a)
$V^{(t+1)} = V^{(t)} + \rho(\Theta^{(t+1)}Z - S^{(t+1)}).$   (25b)

The dual updates can be performed in an element-wise parallel manner. Algorithm 1 summarizes the whole algorithm. The MATLAB code of the proposed algorithm is available at: https://bitbucket.org/XIAOLILIU/fgl-mtfl.
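Algorithm 1 itself is not reproduced in this excerpt, but the update cycle of Eqs. 11-25 can be sketched as follows. This is a simplified illustration, not the authors' MATLAB implementation: the Theta-step solves the linear system of Eq. 12 exactly via an eigendecomposition of $ZZ^T$, and the group shrinkage of Lemma 2 is collapsed to row-wise shrinkage (singleton groups with $\nu_\ell = 1$).

```python
import numpy as np

def row_shrink(M, tau):
    # Row-wise soft-thresholding (Lemmas 1 and 2 with singleton groups).
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12) * M

def fgl_mtfl_admm(X, Y, Z, lam1, lam2, rho=1.0, iters=100):
    p, k = X.shape[1], Y.shape[1]
    Theta, Q, S = np.zeros((p, k)), np.zeros((p, k)), np.zeros((p, k))
    U, V = np.zeros((p, k)), np.zeros((p, k))
    XtX, XtY = X.T @ X, X.T @ Y
    d, P = np.linalg.eigh(Z @ Z.T)          # ZZ^T = P diag(d) P^T
    for _ in range(iters):
        # Theta-step (Eq. 12): X^T X Theta + rho Theta (I + ZZ^T) = RHS,
        # decoupled column-wise in the eigenbasis of ZZ^T.
        RHS = XtY - U + rho * Q - V @ Z.T + rho * (S @ Z.T)
        B = RHS @ P
        Tt = np.column_stack([
            np.linalg.solve(XtX + rho * (1.0 + dj) * np.eye(p), B[:, j])
            for j, dj in enumerate(d)])
        Theta = Tt @ P.T
        # Q-step (Eq. 18) and S-step (Eq. 22): proximal shrinkage.
        Q = row_shrink(Theta + U / rho, lam1 / rho)
        S = row_shrink(Theta @ Z + V / rho, lam2 / rho)
        # Dual updates (Eq. 25).
        U = U + rho * (Theta - Q)
        V = V + rho * (Theta @ Z - S)
    return Theta

# Toy usage with synthetic data and an identity task graph.
rng = np.random.default_rng(0)
X, Y, Z = rng.standard_normal((40, 10)), rng.standard_normal((40, 3)), np.eye(3)
Theta_hat = fgl_mtfl_admm(X, Y, Z, lam1=0.1, lam2=0.1)
```

In the paper's setting the Theta-step is instead carried out per task with the cached Cholesky factorizations of Eqs. 15-16, and the S-step uses the true feature groups.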

Convergence

The convergence of Algorithm 1 is established by the following theorem.

Theorem 3 Suppose there exists at least one solution $\Theta^*$ of Eq. 8. Assume $L(\Theta)$ is convex and $\lambda_1 > 0$, $\lambda_2 > 0$. Then the following property holds for the FGL-MTFL iteration in Algorithm 1:

$\lim_{t\to\infty} L(\Theta^{(t)}) + \lambda_1\|\Theta^{(t)}\|_{2,1} + \lambda_2\|\Theta^{(t)}Z\|_{G_{2,1}} = L(\Theta^*) + \lambda_1\|\Theta^*\|_{2,1} + \lambda_2\|\Theta^* Z\|_{G_{2,1}}$   (26)

Furthermore,

$\lim_{t\to\infty} \|\Theta^{(t)} - \Theta^*\| = 0$   (27)

whenever Eq. 8 has a unique solution.

Note that the conditions for convergence in Theorem 3 are quite easy to satisfy: $\lambda_1$ and $\lambda_2$ are regularization parameters and should always be larger than zero. The detailed proof is discussed in Cai et al. (2009). Unlike Cai et al., we do not require $L(\Theta)$ to be differentiable, and we explicitly treat the non-differentiability of $L(\Theta)$ by using its subgradient $\partial L(\Theta)$, similar to the strategy used by Ye and Xie (2011).

    Experimental Results and Discussions

In this section, we present experimental results to demonstrate the effectiveness of the proposed FGL-MTFL in characterizing AD progression, using a dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Weiner et al. 2010).



    Experimental Setup

MR images and data used in this work were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu) (Weiner et al. 2010). The primary goal of ADNI is to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. Approaches to characterize AD progression will help researchers and clinicians develop new treatments and monitor their effectiveness. Further, a better understanding of disease progression will increase the safety and efficacy of drug development and potentially decrease the time and cost of clinical trials. In ADNI, all participants received 1.5 Tesla (T) structural MRI. The MRI features used in our experiments are based on imaging data from the ADNI database processed by a team from UCSF (University of California at San Francisco), who performed cortical reconstruction and volumetric segmentation with the FreeSurfer image analysis suite (http://surfer.nmr.mgh.harvard.edu/) according to the atlas generated in Desikan et al. (2006). The FreeSurfer software was employed to automatically label the cortical and subcortical tissue classes for the structural MRI scan of each subject, and to extract thickness measures of the cortical regions of interest (ROIs) and volume measures of cortical and subcortical regions.

Briefly, this processing includes motion correction and averaging (Reuter et al. 2010) of multiple volumetric T1-weighted images (when more than one is available), removal of non-brain tissue using a hybrid watershed/surface deformation procedure (Segonne et al. 2004), automated Talairach transformation, segmentation of the subcortical white matter and deep gray matter volumetric structures (including hippocampus, amygdala, caudate, putamen, and ventricles) (Fischl et al. 2002, 2004), intensity normalization (Sled et al. 1998), tessellation of the gray matter/white matter boundary, automated topology correction (Fischl et al. 2001; Ségonne et al. 2007), and surface deformation following intensity gradients to optimally place the gray/white and gray/cerebrospinal fluid borders at the location where the greatest shift in intensity defines the transition to the other tissue class (Dale et al. 1999; Dale and Sereno 1993).

In total, 48 cortical regions and 44 subcortical regions were generated, with typically 1 or 4 features in each group. The names of the cortical and subcortical regions are listed in Tables 1 and 2. For each cortical region, the cortical thickness average (TA), standard deviation of thickness (TS), surface area (SA), and cortical volume (CV) were calculated as features. For each subcortical region, the subcortical volume was calculated as a feature. The separate SA values for the left and right hemispheres and the total intracranial volume (ICV) were also included.

Table 1 Cortical features from the following 71 (= 35 × 2 + 1) cortical regions generated by FreeSurfer

    ID ROI name Laterality Type

    1 Banks superior temporal sulcus L, R CV, SA, TA, TS

    2 Caudal anterior cingulate cortex L, R CV, SA, TA, TS

    3 Caudal middle frontal gyrus L, R CV, SA, TA, TS

    4 Cuneus cortex L, R CV, SA, TA, TS

    5 Entorhinal cortex L, R CV, SA, TA, TS

    6 Frontal pole L, R CV, SA, TA, TS

    7 Fusiform gyrus L, R CV, SA, TA, TS

    8 Inferior parietal cortex L, R CV, SA, TA, TS

    9 Inferior temporal gyrus L, R CV, SA, TA, TS

    10 Insula L, R CV, SA, TA, TS

11 Isthmus cingulate L, R CV, SA, TA, TS

    12 Lateral occipital cortex L, R CV, SA, TA, TS

    13 Lateral orbital frontal cortex L, R CV, SA, TA, TS

    14 Lingual gyrus L, R CV, SA, TA, TS

    15 Medial orbital frontal cortex L, R CV, SA, TA, TS

    16 Middle temporal gyrus L, R CV, SA, TA, TS

    17 Paracentral lobule L, R CV, SA, TA, TS

    18 Parahippocampal gyrus L, R CV, SA, TA, TS

    19 Pars opercularis L, R CV, SA, TA, TS

    20 Pars orbitalis L, R CV, SA, TA, TS

    21 Pars triangularis L, R CV, SA, TA, TS

    22 Pericalcarine cortex L, R CV, SA, TA, TS

    23 Postcentral gyrus L, R CV, SA, TA, TS

    24 Posterior cingulate cortex L, R CV, SA, TA, TS

    25 Precentral gyrus L, R CV, SA, TA, TS

    26 Precuneus cortex L, R CV, SA, TA, TS

    27 Rostral anterior cingulate cortex L, R CV, SA, TA, TS

    28 Rostral middle frontal gyrus L, R CV, SA, TA, TS

    29 Superior frontal gyrus L, R CV, SA, TA, TS

    30 Superior parietal cortex L, R CV, SA, TA, TS

    31 Superior temporal gyrus L, R CV, SA, TA, TS

    32 Supramarginal gyrus L, R CV, SA, TA, TS

    33 Temporal pole L, R CV, SA, TA, TS

    34 Transverse temporal cortex L, R CV, SA, TA, TS

    35 Hemisphere L, R SA

    36 Total intracranial volume Bilateral CV

275 (= 34 × 2 × 4 + 1 × 2 × 1 + 1) cortical features calculated were analyzed in this study. Laterality indicates different feature types calculated for L (left hemisphere), R (right hemisphere) or Bilateral (whole hemisphere)

This yielded a total of p = 319 MRI features extracted from cortical/subcortical ROIs in each hemisphere (Tables 1 and 2). Details of the analysis procedure are available at http://adni.loni.ucla.edu/research/mri-post-processing/.

The ADNI project is a longitudinal study, with data collected repeatedly over a 6-month or 1-year interval. The



Table 2 Subcortical features from the following 44 (= 16 × 2 + 12) subcortical regions generated by FreeSurfer

ID ROI name Laterality Type

    1 Accumbens area L, R SV

    2 Amygdala L, R SV

    3 Caudate L, R SV

    4 Cerebellum cortex L, R SV

    5 Cerebellum white matter L, R SV

    6 Cerebral cortex L, R SV

    7 Cerebral white matter L, R SV

    8 Choroid plexus L, R SV

    9 Hippocampus L, R SV

    10 Inferior lateral ventricle L, R SV

    11 Lateral ventricle L, R SV

    12 Pallidum L, R SV

    13 Putamen L, R SV

    14 Thalamus L, R SV

    15 Ventricle diencephalon L, R SV

    16 Vessel L, R SV

    17 Brain stem Bilateral SV

    18 Corpus callosum anterior Bilateral SV

    19 Corpus callosum central Bilateral SV

    20 Corpus callosum middle anterior Bilateral SV

    21 Corpus callosum middle posterior Bilateral SV

    22 Corpus callosum posterior Bilateral SV

    23 Cerebrospinal fluid Bilateral SV

    24 Fourth ventricle Bilateral SV

    25 Non white matter hypointensities Bilateral SV

    26 Optic chiasm Bilateral SV

    27 Third ventricle Bilateral SV

    28 White matter hypointensities Bilateral SV

44 subcortical features calculated were analyzed in this study. Laterality indicates different feature types calculated for L (left hemisphere), R (right hemisphere) or Bilateral (whole hemisphere)

scheduled screening date for subjects becomes the baseline after approval, and the time point of each follow-up visit is denoted by its duration from the baseline. In our current work, we investigated the prediction performance of our method in inferring cognitive outcomes, based on a number of neuropsychological assessments, at the initial baseline. In this work, we further performed the following preprocessing steps:

– remove features with more than 10% missing entries (for all patients and all time points);
– remove the ROI whose name is "unknown";
– remove the instances with missing values of cognitive scores;
– exclude patients without baseline MRI records;
– complete the missing entries using the average value.

This yields a total of n = 788 subjects, who were then categorized into three baseline diagnostic groups: Cognitively Normal (CN, n1 = 225), Mild Cognitive Impairment (MCI, n2 = 390), and Alzheimer's Disease (AD, n3 = 173). Table 3 lists the demographic information of all subjects, including age, gender, and education. We used 10-fold cross-validation to evaluate our model and conduct the comparison. In each of the ten trials, a 5-fold nested cross-validation procedure was employed to tune the regularization parameters. The data were z-scored before applying the regression methods. The range of each parameter varied from 10^-1 to 10^3. The average (avg) and standard deviation (std) of each performance measure were calculated and are shown as avg ± std for each experiment. For each run, all methods received exactly the same training and test sets. The reported results are the best results of each method with the optimal parameters. For predictive modeling, all the cognitive assessments (a total of 20 tasks) in Table 4 were examined. To the best of our knowledge, no previous works have used all the cognitive scores to train and evaluate their MTL models.
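The evaluation protocol above (10-fold outer cross-validation, 5-fold inner cross-validation for tuning, z-scoring on the training folds, and a 10^-1 to 10^3 parameter grid) can be sketched with scikit-learn; Ridge stands in here for the actual MTL models, and the data shapes are hypothetical:

```python
import numpy as np
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))   # hypothetical stand-in for the 788 x 319 data
y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(100)

param_grid = {"ridge__alpha": np.logspace(-1, 3, 5)}   # the 10^-1 .. 10^3 range
outer = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train, test in outer.split(X):
    # 5-fold inner CV tunes the regularization parameter; z-scoring is fit
    # on the training fold only, inside the pipeline.
    model = GridSearchCV(make_pipeline(StandardScaler(), Ridge()),
                         param_grid, cv=5)
    model.fit(X[train], y[train])
    scores.append(model.score(X[test], y[test]))
```

Fitting the scaler inside the pipeline keeps the test folds out of the z-scoring statistics, which is what a nested procedure requires.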

For the quantitative performance evaluation, we employed the metrics of Correlation Coefficient (CC) and Root Mean Squared Error (rMSE) between the predicted clinical scores and the target clinical scores for each regression task. To evaluate the overall performance across all tasks, the normalized mean squared error (nMSE) (Argyriou et al. 2008; Zhou et al. 2013) and weighted R-value (wR) (Stonnington et al. 2010) were used. The rMSE, CC, nMSE and wR are defined as follows:

$\mathrm{rMSE}(y, \hat{y}) = \sqrt{\frac{\|y - \hat{y}\|_2^2}{n}}$   (28)

$\mathrm{Corr}(y, \hat{y}) = \frac{\mathrm{cov}(y, \hat{y})}{\sigma(y)\,\sigma(\hat{y})}$   (29)

where y is the ground truth of the target for a single task and $\hat{y}$ is the corresponding prediction according to a prediction model, cov is the covariance, and $\sigma$ is the standard deviation.

$\mathrm{nMSE}(Y, \hat{Y}) = \frac{\sum_{h=1}^{k} \|Y_h - \hat{Y}_h\|_2^2 / \sigma(Y_h)}{\sum_{h=1}^{k} n_h}$   (30)

    Table 3 Summary of ADNI dataset and subject information

    Category CN MCI AD

    Number 225 390 173

    Gender (M/F) 116/109 252/138 88/85

Age (y, ag ± sd) 75.87 ± 5.04 74.75 ± 7.39 75.42 ± 7.25
Education (y, ag ± sd) 16.03 ± 2.85 15.67 ± 2.95 14.65 ± 3.17

    M, male; F, female; y, years; ag, average; sd, standard deviation


Table 4 Description of the cognitive scores considered in the experiments

    Num Score name Description

    1 ADAS Alzheimer’s disease assessment scale

    2 MMSE Mini-mental state exam

    3 RAVLT TOTAL Total score of the first 5 learning trials

    4 TOT6 Trial 6 total number of words recalled

    5 TOTB Immediately after the fifth learning trial

    6 T30 30 minute delay total number of words recalled

    7 RECOG 30 minute delay recognition

    8 FLU ANIM Animal total score

    9 VEG Vegetable total score

    10 TRAILS A Trail making test A score

    11 B Trail making test B score

    12 LOGMEM IMMTOTAL Immediate recall

    13 DELTOTAL Delayed recall

    14 CLOCK DRAW Clock drawing

    15 COPYSCORE Clock copying

    16 BOSNAM Total number correct

    17 ANART ANART total score

18 DSPAN FOR Digit span forward

    19 BAC Digit span backward

    20 DIGIT Digit symbol substitution

$\mathrm{wR}(Y, \hat{Y}) = \frac{\sum_{h=1}^{k} \mathrm{Corr}(Y_h, \hat{Y}_h)\, n_h}{\sum_{h=1}^{k} n_h}$   (31)

where $Y$ and $\hat{Y}$ are the ground truth cognitive scores and the predicted cognitive scores, respectively.

Smaller values of nMSE and rMSE and larger values of CC and wR indicate better regression performance. We report the mean and standard deviation over 10 experimental iterations on different splits of the data for all comparative experiments. A Student's t-test at a significance level of 0.05 was performed to determine whether the performance differences are significant.
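These four metrics can be computed directly; a small self-contained sketch (per-task scores are passed as lists of vectors, and all data here are toy values):

```python
import numpy as np

def rmse(y, y_hat):                     # Eq. 28
    return np.sqrt(np.sum((y - y_hat) ** 2) / len(y))

def cc(y, y_hat):                       # Eq. 29; corrcoef = cov/(sigma*sigma)
    return np.corrcoef(y, y_hat)[0, 1]

def nmse(Ys, Yhats):                    # Eq. 30, normalizing by sigma(Y_h)
    num = sum(np.sum((y - yh) ** 2) / np.std(y) for y, yh in zip(Ys, Yhats))
    return num / sum(len(y) for y in Ys)

def wr(Ys, Yhats):                      # Eq. 31, sample-size-weighted CC
    num = sum(cc(y, yh) * len(y) for y, yh in zip(Ys, Yhats))
    return num / sum(len(y) for y in Ys)

y = np.array([1.0, 2.0, 3.0, 4.0])
assert rmse(y, y) == 0.0 and np.isclose(cc(y, 2 * y), 1.0)
assert nmse([y], [y]) == 0.0 and np.isclose(wr([y], [y]), 1.0)
```

Note that `nmse` follows Eq. 30 literally, dividing each task's squared error by $\sigma(Y_h)$ as printed.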

    Comparison with Baseline Comparable Methods

In this section, we conduct an empirical evaluation of the proposed methods by comparing them with two single-task learning methods, Lasso and Ridge (both applied independently to each task), and with representative multi-task learning methods:

1. Multi-task feature learning (MTFL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda\|\Theta\|_{2,1}$.
2. Multi-task feature learning combined with lasso (SGL-MTFL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_{2,1} + \lambda_2\|\Theta\|_1$.
3. Fused lasso regularized multi-task learning (FL-MTL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda\|\Theta Z\|_1$.
4. Fused group lasso regularized multi-task learning (FGL-MTL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda\|\Theta Z\|_{G_{2,1}}$.
5. Fused lasso regularized multi-task feature learning (FL-MTFL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_{2,1} + \lambda_2\|\Theta Z\|_1$.
6. Group lasso regularized multi-task feature learning (GL-MTFL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_{2,1} + \lambda_2\|\Theta\|_{G_{2,1}}$.
7. Fused lasso regularized SGL-MTL (FSGL-MTL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_1 + \lambda_2\|\Theta Z\|_{G_{2,1}}$.
8. Fused group lasso regularized MTFL (FGL-MTFL, the proposed method): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_{2,1} + \lambda_2\|\Theta Z\|_{G_{2,1}}$.
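The regularizers that distinguish these formulations can be evaluated directly on a coefficient matrix; a small sketch (the matrix and the group index set are hypothetical):

```python
import numpy as np

Theta = np.arange(12, dtype=float).reshape(4, 3)  # hypothetical 4 features x 3 tasks

l1  = np.abs(Theta).sum()                 # ||Theta||_1: element-wise lasso
l21 = np.linalg.norm(Theta, axis=1).sum() # ||Theta||_{2,1}: l2 per row, then sum

# G2,1-norm: rows are partitioned into disjoint feature groups (in the paper,
# the 1 or 4 measures of one ROI form a group), and the l2 norm is taken per
# group and per task before summing.
groups = [[0, 1], [2, 3]]                 # hypothetical disjoint row groups
g21 = sum(np.linalg.norm(Theta[g, m])
          for g in groups for m in range(Theta.shape[1]))
```

The fused variants apply the same norms to $\Theta Z$ instead of $\Theta$, i.e. to differences of task columns along the task graph.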

The average and standard deviation of each performance measure were calculated by 10-fold cross-validation on different splits of the data, and are shown in Tables 5 and 6. It is worth noting that the same training and testing data were used across experiments for all methods, for fair comparison. Note that all pairwise links between tasks were incorporated into both the fused lasso and the fused group lasso regularized methods, and the graph was created using only the training data.

From the experimental results in Tables 5 and 6, we observe the following:

1. FL-MTFL and FGL-MTFL both demonstrated improved performance over the other baseline methods in terms of nMSE and wR, while FGL-MTFL performed the best among all competing methods. The


Table 5 Performance comparison of various methods in terms of CC (the first 20 columns) and wR (the last column) on 10-fold cross-validation cognitive prediction tasks (RAVLT: TOTAL, TOT6, TOTB, T30, RECOG; FLU: ANIM, VEG; TRAILS: A, B; LOGMEM: IMMTOTAL, DELTOTAL; CLOCK: DRAW, COPYSCOR; DSPAN: FOR, BAC)

Method      ADAS         MMSE         TOTAL        TOT6         TOTB         T30          RECOG        ANIM         VEG          A            B
Ridge       0.598±0.053  0.417±0.073  0.407±0.120  0.352±0.131  0.145±0.093  0.367±0.132  0.259±0.107  0.200±0.135  0.390±0.132  0.316±0.111  0.363±0.125
Lasso       0.659±0.066  0.547±0.065  0.512±0.108  0.501±0.117  0.312±0.083  0.514±0.116  0.407±0.112  0.353±0.095  0.504±0.089  0.382±0.099  0.477±0.094
MTFL        0.667±0.065  0.536±0.065  0.515±0.121  0.491±0.132  0.297±0.094  0.509±0.123  0.405±0.123  0.377±0.091  0.505±0.089  0.413±0.087  0.476±0.098
FL-MTL      0.641±0.074  0.523±0.095  0.491±0.112  0.472±0.102  0.287±0.085  0.473±0.096  0.409±0.085  0.355±0.107  0.485±0.076  0.352±0.085  0.385±0.159
FGL-MTL     0.662±0.060  0.557±0.065  0.479±0.109  0.501±0.116  0.286±0.080  0.516±0.113  0.409±0.115  0.370±0.106  0.512±0.077  0.347±0.101  0.351±0.119
SGL-MTFL    0.667±0.063  0.543±0.061  0.517±0.120  0.495±0.128  0.308±0.089  0.513±0.120  0.408±0.123  0.380±0.090  0.506±0.089  0.414±0.086  0.477±0.097
GL-MTFL     0.668±0.063  0.546±0.062  0.517±0.119  0.496±0.129  0.308±0.087  0.514±0.120  0.410±0.123  0.380±0.090  0.508±0.088  0.414±0.086  0.477±0.097
FL-MTFL     0.665±0.067  0.548±0.074  0.522±0.118  0.498±0.121  0.324±0.084  0.513±0.115  0.420±0.127  0.396±0.086  0.510±0.084  0.414±0.086  0.479±0.095
FSGL-MTL    0.623±0.071  0.503±0.083  0.519±0.091  0.437±0.101  0.306±0.065  0.430±0.092  0.352±0.122  0.387±0.072  0.483±0.080  0.379±0.104  0.476±0.100
FGL-MTFL    0.668±0.068  0.549±0.068  0.522±0.114  0.499±0.121  0.324±0.069  0.514±0.110  0.429±0.121  0.399±0.081  0.505±0.087  0.419±0.086  0.481±0.095

Method      IMMTOTAL     DELTOTAL     DRAW         COPYSCOR     BOSNAM       ANART        FOR          BAC          DIGIT        wR
Ridge       0.417±0.108  0.426±0.121  0.222±0.100  0.126±0.091  0.359±0.149  0.054±0.071  0.002±0.055  0.041±0.114  0.389±0.049  0.293±0.046*
Lasso       0.493±0.093  0.524±0.107  0.374±0.078  0.209±0.050  0.472±0.104  0.129±0.078  0.025±0.083  0.167±0.132  0.415±0.099  0.399±0.049*
MTFL        0.507±0.085  0.519±0.099  0.380±0.097  0.199±0.114  0.468±0.126  0.104±0.080  0.089±0.106  0.153±0.081  0.478±0.064  0.404±0.055*
FL-MTL      0.470±0.097  0.514±0.108  0.379±0.093  0.221±0.102  0.442±0.085  0.094±0.104  0.083±0.102  0.177±0.108  0.414±0.083  0.383±0.046*
FGL-MTL     0.499±0.093  0.531±0.106  0.378±0.087  0.212±0.098  0.479±0.096  0.139±0.079  0.110±0.105  0.205±0.108  0.450±0.049  0.400±0.047*
SGL-MTFL    0.508±0.086  0.520±0.098  0.394±0.092  0.210±0.096  0.471±0.118  0.110±0.081  0.099±0.104  0.167±0.087  0.477±0.063  0.409±0.054*
GL-MTFL     0.509±0.084  0.521±0.098  0.395±0.087  0.220±0.102  0.471±0.118  0.109±0.081  0.089±0.103  0.169±0.085  0.477±0.063  0.410±0.054*
FL-MTFL     0.510±0.084  0.526±0.095  0.408±0.083  0.251±0.092  0.471±0.096  0.133±0.087  0.093±0.098  0.208±0.113  0.478±0.069  0.418±0.054
FSGL-MTL    0.452±0.073  0.492±0.087  0.398±0.074  0.2412±0.091 0.440±0.070  0.119±0.113  0.109±0.113  0.209±0.126  0.435±0.085  0.390±0.045*
FGL-MTFL    0.514±0.084  0.533±0.094  0.408±0.083  0.252±0.092  0.473±0.089  0.137±0.082  0.096±0.090  0.218±0.113  0.484±0.070  0.421±0.052

The superscript symbol * indicates that FGL-MTFL significantly outperformed that method on that score (Student's t-test at a level of 0.05). The bold value indicates the best performance


Table 6 Performance comparison of various methods in terms of rMSE (the first 20 columns) and nMSE (the last column) on 10-fold cross-validation cognitive prediction tasks (column groups as in Table 5)

Method      ADAS         MMSE         TOTAL        TOT6         TOTB         T30          RECOG        ANIM         VEG          A            B
Ridge       7.445±0.370  2.568±0.146  11.16±0.735  3.909±0.366  1.982±0.125  4.061±0.287  4.311±0.430  6.307±0.552  4.276±0.386  26.19±3.764  80.01±8.103
Lasso       6.727±0.424  2.215±0.088  9.909±0.907  3.305±0.266  1.666±0.166  3.421±0.244  3.634±0.247  5.464±0.486  3.690±0.192  23.29±4.297  69.78±5.152
MTFL        6.662±0.411  2.191±0.106  9.656±0.695  3.324±0.260  1.671±0.149  3.441±0.233  3.626±0.272  5.266±0.448  3.676±0.180  23.01±3.492  69.88±5.281
FL-MTL      7.012±0.471  2.295±0.226  10.02±0.830  3.429±0.435  1.796±0.199  3.598±0.536  3.684±0.319  5.362±0.617  3.772±0.239  24.77±4.299  78.18±10.52
FGL-MTL     6.691±0.458  2.157±0.097  10.04±0.593  3.309±0.271  1.676±0.153  3.420±0.235  3.627±0.265  5.280±0.471  3.672±0.182  24.85±3.600  81.07±7.425
SGL-MTFL    6.653±0.427  2.191±0.098  9.647±0.755  3.315±0.271  1.664±0.150  3.432±0.246  3.627±0.246  5.260±0.499  3.676±0.208  23.00±3.568  69.82±5.184
GL-MTFL     6.650±0.429  2.186±0.099  9.644±0.754  3.315±0.270  1.664±0.150  3.431±0.248  3.623±0.247  5.261±0.497  3.673±0.207  22.99±3.566  69.82±5.177
FL-MTFL     6.680±0.445  2.204±0.096  9.6223±0.784 3.324±0.273  1.663±0.169  3.445±0.286  3.617±0.192  5.220±0.503  3.677±0.222  22.97±3.660  69.67±5.007
FSGL-MTL    7.183±0.730  2.523±0.124  9.811±0.859  3.623±0.481  2.009±0.180  3.791±0.537  3.912±0.222  5.365±0.635  3.929±0.383  23.27±4.204  69.75±5.084
FGL-MTFL    6.678±0.474  2.214±0.081  9.631±0.788  3.335±0.276  1.668±0.167  3.456±0.305  3.610±0.186  5.221±0.493  3.697±0.223  22.90±3.666  69.53±5.011

Method      IMMTOTAL     DELTOTAL     DRAW         COPYSCOR     BOSNAM       ANART        FOR          BAC          DIGIT        nMSE
Ridge       4.696±0.366  5.277±0.509  1.153±0.108  0.775±0.060  4.563±0.540  11.24±0.756  2.570±0.262  2.559±0.193  12.76±1.273  10.36±1.089*
Lasso       4.198±0.327  4.586±0.471  0.990±0.099  0.657±0.082  3.992±0.457  9.813±0.651  2.100±0.285  2.134±0.191  11.65±1.318  7.898±0.586*
MTFL        4.145±0.303  4.589±0.436  0.967±0.120  0.647±0.103  3.952±0.478  9.611±0.722  2.010±0.123  2.160±0.178  11.22±1.225  7.762±0.633*
FL-MTL      4.319±0.496  4.672±0.689  1.174±0.263  0.942±0.323  4.073±0.494  9.806±0.850  2.131±0.252  2.225±0.179  11.76±1.391  9.102±1.134*
FGL-MTL     4.170±0.334  4.560±0.494  0.973±0.112  0.656±0.081  3.938±0.471  9.746±0.640  1.989±0.136  2.116±0.199  11.58±1.287  9.114±0.923*
SGL-MTFL    4.143±0.327  4.585±0.456  0.960±0.117  0.644±0.093  3.965±0.425  9.584±0.743  1.998±0.134  2.143±0.186  11.23±1.261  7.742±0.617*
GL-MTFL     4.140±0.326  4.584±0.456  0.960±0.114  0.643±0.092  3.965±0.424  9.585±0.743  2.001±0.133  2.142±0.186  11.23±1.261  7.740±0.616*
FL-MTFL     4.147±0.354  4.582±0.491  0.974±0.101  0.654±0.076  3.978±0.459  9.491±0.728  1.998±0.145  2.121±0.187  11.21±1.263  7.711±0.585*
FSGL-MTL    4.467±0.525  4.859±0.719  1.485±0.072  1.307±0.049  4.208±0.411  9.553±0.721  2.289±0.208  2.394±0.244  11.56±1.320  8.329±0.593*
FGL-MTFL    4.139±0.366  4.566±0.505  0.973±0.102  0.654±0.076  3.984±0.485  9.471±0.708  1.997±0.149  2.119±0.191  11.17±1.241  7.689±0.565

The superscript symbol * indicates that FGL-MTFL significantly outperformed that method on that score (Student's t-test at a level of 0.05). The bold value indicates the best performance

  • Neuroinform

t-test results show that FGL-MTFL significantly outperforms all the other methods in terms of nMSE, and all the other methods except FL-MTFL in terms of CC. Thus, simultaneously exploiting the structure among the tasks and among the features resulted in significantly better prediction performance.

2. The norms of both the fused lasso and the fused group lasso can improve the performance of the traditional MTFL, which demonstrates that considering the local structure within the tasks can improve the overall prediction performance. A fundamental step is to estimate the true relationships among tasks, thus promoting the most appropriate information sharing among related tasks while avoiding the use of information from unrelated tasks.

3. We investigated the effect of the group penalty in our model. The traditional MTFL considers only the sparsity of the regression coefficients, thus failing to capture the group structure of the features in the data.

From Tables 5 and 6, we can see that both FL-MTFL and FGL-MTFL are better than the traditional MTFL. This is because the assumption in MTFL is a strict constraint and may limit the flexibility of the multi-task model, as described in "ℓ2,1-norm". The extraction of multiple features to measure the atrophy of each imaging biomarker can further improve prediction performance by capturing inherent feature structures.

GL-MTFL is similar to FGL-MTFL, but its G2,1-norm regularization term ignores the structure of the tasks. Our approach can flexibly take into account the complex relationships among imaging markers via the fused lasso regularization, rather than relying on a simple grouping scheme. Moreover, compared with GL-MTFL, FGL-MTFL can flexibly consider the complex relationships among outcomes in a group format rather than within a simple group lasso scheme. Overall, FGL-MTFL achieved better performance than

[Figure 3 appears here. Fig. 3 Scatter plots of actual versus predicted values of cognitive scores on each fold of the testing data, using the MTFL and FGL-MTFL methods based on MRI features. (a) ADAS: CC(MTFL) = 0.666, CC(FGL-MTFL) = 0.667. (b) MMSE: CC(MTFL) = 0.535, CC(FGL-MTFL) = 0.548. (c) TOTAL: CC(MTFL) = 0.515, CC(FGL-MTFL) = 0.522. (d) ANIM: CC(MTFL) = 0.376, CC(FGL-MTFL) = 0.399.]


FL-MTFL, demonstrating the benefit of employing group structural information among the features.

4. All compared multi-task learning methods improve predictive performance over the independent regression algorithms (Ridge, Lasso, GOSCAR, and ncFGS). This justifies the motivation of learning multiple tasks simultaneously.

Additionally, we show the scatter plots of the predicted values versus the actual values for ADAS, MMSE, TOTAL, and ANIM on the testing data in the cross-validation in Fig. 3.

To investigate the influence of edge weight on performance, we compared our proposed FGL-MTFL with an unweighted FGL-MTFL (λ1‖Θ‖2,1 + λ2 ∑(m,l)∈E ‖θ·m − sign(rm,l)θ·l‖G2,1), which only considers the graph topology of tasks and treats all links equally. The result, shown in Table 7, demonstrates an obvious improvement of the weighted FGL-MTFL over the unweighted one.
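As a concrete illustration, the weighted and unweighted variants of the fused group lasso term could be evaluated as in the following minimal sketch; the function name, the edge set covering all task pairs, and the feature-grouping interface are our illustrative assumptions, not the paper's code:

```python
import numpy as np

def fused_group_lasso_penalty(Theta, R, groups, weighted=True):
    """Fused group lasso term over all task pairs (m, l), m < l.

    Theta  : (d, k) coefficient matrix (d features, k tasks)
    R      : (k, k) estimated task-correlation matrix
    groups : list of index arrays partitioning the d features

    For each pair, the fused difference theta_m - sign(r_ml) * theta_l
    is penalized by a G2,1-type norm (sum of group-wise l2 norms); the
    weighted variant additionally scales each link by |r_ml|, so that
    weakly correlated task pairs are fused less strongly.
    """
    d, k = Theta.shape
    total = 0.0
    for m in range(k):
        for l in range(m + 1, k):
            diff = Theta[:, m] - np.sign(R[m, l]) * Theta[:, l]
            g21 = sum(np.linalg.norm(diff[g]) for g in groups)
            total += (abs(R[m, l]) if weighted else 1.0) * g21
    return total
```

With |r_ml| ≤ 1, the weighted penalty is never larger than the unweighted one, which matches the intuition that only reliable links should pull task models together.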

Comparison with the State-of-the-Art MTL Methods

To illustrate how well our FGL-MTFL works by means of modeling the correlation among the tasks, we comprehensively compared our proposed methods with several popular state-of-the-art related methods. The representative multi-task learning algorithms used for comparison include the following models:

1. Robust Multi-Task Feature Learning (RMTL) (Chen et al. 2011):

RMTL (minΘ L(X, Y, Θ) + λ1‖P‖∗ + λ2‖S‖2,1, subject to Θ = P + S) assumes that the model Θ can be decomposed into two components: a shared feature structure P that captures task relatedness and a group-sparse structure S that can detect outliers.

2. Clustered Multi-Task Learning (CMTL) (Zhou et al. 2011):

CMTL (minΘ,M:MᵀM=Ic L(X, Y, Θ) + λ1(Tr(ΘᵀΘ) − Tr(MᵀΘᵀΘM)) + λ2Tr(ΘᵀΘ), where M ∈ R^{k×c} is an orthogonal cluster indicator matrix and the tasks are clustered into c < k clusters) incorporates a regularization term to induce clustering between tasks and then shares information only among tasks belonging to the same cluster. In CMTL, the number of clusters is set to 11 because the 20 tasks belong to 11 sets of cognitive functions.

3. Trace-Norm Regularized Multi-Task Learning (Trace) (Ji and Ye 2009): assumes that all models share a common low-dimensional subspace (minΘ L(X, Y, Θ) + λ‖Θ‖∗).

4. Sparse Regularized Multi-Task Learning (SRMTL) (Zhou):

SRMTL (minΘ L(X, Y, Θ) + λ1‖ΘZ‖²F + λ2‖Θ‖1, where Z ∈ R^{k×k}) contains two regularization

Table 7 The influence of the weighting scheme of the fused group lasso norm in FGL-MTFL in terms of CC and wR

Method | ADAS | MMSE | RAVLT-TOTAL | RAVLT-TOT6 | RAVLT-TOTB | RAVLT-T30 | RAVLT-RECOG | FLU-ANIM | FLU-VEG | TRAILS-A | TRAILS-B
unweighted | 0.669±0.062 | 0.542±0.055 | 0.517±0.116 | 0.492±0.128 | 0.295±0.095 | 0.510±0.120 | 0.399±0.120 | 0.381±0.090 | 0.507±0.091 | 0.416±0.085 | 0.478±0.098
weighted | 0.668±0.068 | 0.549±0.068 | 0.522±0.114 | 0.499±0.121 | 0.324±0.069 | 0.514±0.110 | 0.429±0.121 | 0.399±0.081 | 0.505±0.087 | 0.419±0.086 | 0.481±0.095

Method | LOGMEM-IMMTOTAL | LOGMEM-DELTOTAL | CLOCK-DRAW | CLOCK-COPY | BOSNAM | ANART | DSPAN-FOR | DSPAN-BAC | DIGIT-SCOR | wR
unweighted | 0.508±0.086 | 0.519±0.099 | 0.341±0.077 | 0.090±0.132 | 0.471±0.113 | 0.107±0.085 | 0.086±0.114 | 0.141±0.098 | 0.479±0.065 | 0.397±0.052∗
weighted | 0.514±0.084 | 0.533±0.094 | 0.408±0.083 | 0.252±0.092 | 0.473±0.089 | 0.137±0.082 | 0.096±0.090 | 0.218±0.113 | 0.484±0.070 | 0.421±0.052

The first 20 columns are CC and the last one is wR. The bold value indicates the best performance


Table 8 Performance comparison of various methods in terms of CC (the first 20 columns) and wR (the last column) on 10-fold cross validation for all the cognitive prediction tasks

Method | ADAS | MMSE | RAVLT-TOTAL | RAVLT-TOT6 | RAVLT-TOTB | RAVLT-T30 | RAVLT-RECOG | FLU-ANIM | FLU-VEG | TRAILS-A | TRAILS-B
Robust | 0.604±0.056 | 0.394±0.085 | 0.423±0.132 | 0.420±0.143 | 0.248±0.121 | 0.427±0.119 | 0.339±0.122 | 0.272±0.130 | 0.444±0.102 | 0.309±0.114 | 0.319±0.121
CMTL | 0.646±0.060 | 0.429±0.070 | 0.494±0.090 | 0.474±0.106 | 0.263±0.084 | 0.480±0.102 | 0.381±0.094 | 0.360±0.088 | 0.494±0.086 | 0.419±0.092 | 0.479±0.088
Trace | 0.559±0.070 | 0.255±0.121 | 0.413±0.125 | 0.428±0.124 | 0.227±0.142 | 0.445±0.111 | 0.356±0.135 | 0.265±0.155 | 0.419±0.111 | 0.282±0.104 | 0.313±0.094
SRMTL | 0.664±0.062 | 0.542±0.064 | 0.495±0.095 | 0.497±0.117 | 0.323±0.081 | 0.514±0.112 | 0.401±0.118 | 0.373±0.101 | 0.502±0.089 | 0.376±0.095 | 0.425±0.122
p-MSSL | 0.666±0.071 | 0.541±0.074 | 0.520±0.099 | 0.493±0.113 | 0.312±0.066 | 0.510±0.107 | 0.420±0.128 | 0.394±0.085 | 0.497±0.080 | 0.378±0.087 | 0.398±0.114
G-SMuRFS | 0.670±0.062 | 0.535±0.060 | 0.506±0.114 | 0.482±0.124 | 0.284±0.100 | 0.492±0.116 | 0.391±0.113 | 0.366±0.095 | 0.495±0.092 | 0.409±0.085 | 0.471±0.101
MTRL | 0.636±0.056 | 0.543±0.071 | 0.504±0.081 | 0.476±0.104 | 0.313±0.070 | 0.478±0.095 | 0.387±0.113 | 0.395±0.082 | 0.506±0.081 | 0.417±0.097 | 0.477±0.091
FGL-MTFL | 0.668±0.068 | 0.549±0.068 | 0.522±0.114 | 0.499±0.121 | 0.324±0.069 | 0.514±0.110 | 0.429±0.121 | 0.399±0.081 | 0.505±0.087 | 0.419±0.086 | 0.481±0.095

Method | LOGMEM-IMMTOTAL | LOGMEM-DELTOTAL | CLOCK-DRAW | CLOCK-COPY | BOSNAM | ANART | DSPAN-FOR | DSPAN-BAC | DIGIT-SCOR | wR
Robust | 0.448±0.109 | 0.470±0.106 | 0.342±0.099 | 0.133±0.112 | 0.373±0.130 | 0.050±0.083 | 0.007±0.120 | 0.130±0.089 | 0.392±0.049 | 0.327±0.059∗
CMTL | 0.497±0.086 | 0.515±0.096 | 0.370±0.077 | 0.203±0.114 | 0.440±0.079 | 0.134±0.080 | 0.044±0.130 | 0.188±0.116 | 0.468±0.070 | 0.389±0.046∗
Trace | 0.423±0.121 | 0.470±0.119 | 0.294±0.123 | 0.065±0.136 | 0.236±0.141 | 0.045±0.082 | 0.041±0.137 | 0.149±0.069 | 0.323±0.094 | 0.301±0.070∗
SRMTL | 0.504±0.091 | 0.525±0.102 | 0.362±0.077 | 0.084±0.120 | 0.482±0.095 | 0.117±0.082 | 0.092±0.092 | 0.178±0.125 | 0.463±0.051 | 0.396±0.046∗
p-MSSL | 0.512±0.087 | 0.538±0.098 | 0.384±0.063 | 0.236±0.110 | 0.483±0.085 | 0.150±0.078 | 0.096±0.113 | 0.197±0.128 | 0.469±0.074 | 0.410±0.045∗
G-SMuRFS | 0.502±0.091 | 0.514±0.105 | 0.366±0.100 | 0.197±0.110 | 0.475±0.129 | 0.091±0.078 | 0.070±0.102 | 0.132±0.104 | 0.477±0.059 | 0.396±0.054∗
MTRL | 0.493±0.075 | 0.507±0.089 | 0.402±0.076 | 0.262±0.084 | 0.470±0.087 | 0.143±0.089 | 0.108±0.102 | 0.223±0.128 | 0.469±0.074 | 0.410±0.045∗
FGL-MTFL | 0.514±0.084 | 0.533±0.094 | 0.408±0.083 | 0.252±0.092 | 0.473±0.089 | 0.137±0.082 | 0.096±0.090 | 0.218±0.113 | 0.484±0.070 | 0.421±0.052

The superscript symbol ∗ indicates that FGL-MTFL significantly outperformed that method on that score (Student's t-test at a level of 0.05). The bold value indicates the best performance


Table 9 Performance comparison of various methods in terms of rMSE (the first 20 columns) and nMSE (the last column) on 10-fold cross validation for the cognitive prediction tasks

Method | ADAS | MMSE | RAVLT-TOTAL | RAVLT-TOT6 | RAVLT-TOTB | RAVLT-T30 | RAVLT-RECOG | FLU-ANIM | FLU-VEG | TRAILS-A | TRAILS-B
Robust | 7.339±0.549 | 2.811±0.131 | 10.82±0.815 | 3.587±0.333 | 1.729±0.139 | 3.742±0.238 | 3.887±0.364 | 5.762±0.491 | 3.948±0.307 | 26.52±3.501 | 85.30±7.559
CMTL | 15.88±1.291 | 21.52±0.443 | 27.94±1.645 | 4.944±0.654 | 3.520±0.172 | 4.562±0.623 | 8.824±0.537 | 14.06±1.105 | 9.731±0.777 | 43.51±4.236 | 126.2±11.40
Trace | 7.969±0.925 | 4.310±1.464 | 11.50±1.000 | 3.629±0.325 | 1.805±0.136 | 3.746±0.220 | 3.988±0.562 | 6.086±0.724 | 4.179±0.593 | 28.41±2.181 | 88.38±5.778
SRMTL | 6.925±0.464 | 2.405±0.316 | 10.40±0.814 | 4.067±0.878 | 3.028±1.786 | 4.233±0.940 | 4.116±0.737 | 5.432±0.495 | 4.243±0.869 | 24.07±3.794 | 75.16±8.209
p-MSSL | 6.698±0.521 | 2.330±0.117 | 9.675±0.803 | 3.400±0.383 | 1.802±0.177 | 3.514±0.397 | 3.693±0.211 | 5.267±0.577 | 3.780±0.303 | 23.49±3.806 | 75.87±7.699
G-SMuRFS | 6.641±0.422 | 2.209±0.107 | 9.741±0.708 | 3.348±0.283 | 1.684±0.150 | 3.483±0.240 | 3.666±0.262 | 5.307±0.482 | 3.719±0.196 | 23.07±3.581 | 70.23±5.622
MTRL | 7.010±0.567 | 2.197±0.101 | 9.825±0.749 | 3.386±0.282 | 1.659±0.155 | 3.540±0.323 | 3.682±0.232 | 5.232±0.436 | 3.708±0.181 | 22.95±4.051 | 69.57±4.612
FGL-MTFL | 6.678±0.474 | 2.214±0.081 | 9.631±0.788 | 3.335±0.276 | 1.668±0.167 | 3.456±0.305 | 3.610±0.186 | 5.221±0.493 | 3.697±0.223 | 22.90±3.666 | 69.53±5.011

Method | LOGMEM-IMMTOTAL | LOGMEM-DELTOTAL | CLOCK-DRAW | CLOCK-COPY | BOSNAM | ANART | DSPAN-FOR | DSPAN-BAC | DIGIT-SCOR | nMSE
Robust | 4.436±0.374 | 4.909±0.431 | 1.002±0.125 | 0.699±0.098 | 4.488±0.355 | 10.74±0.650 | 2.146±0.143 | 2.228±0.182 | 12.55±1.275 | 10.44±1.070∗
CMTL | 7.890±0.807 | 6.559±0.976 | 3.465±0.099 | 3.771±0.093 | 20.76±0.510 | 13.83±1.009 | 6.899±0.230 | 5.386±0.251 | 31.85±1.789 | 47.57±2.176∗
Trace | 4.649±0.487 | 4.994±0.645 | 1.134±0.255 | 0.939±0.306 | 5.595±1.363 | 10.97±0.858 | 2.313±0.360 | 2.331±0.227 | 13.80±1.025 | 11.88±1.516∗
SRMTL | 4.896±1.048 | 5.197±0.629 | 2.698±2.063 | 3.356±3.077 | 4.040±0.519 | 10.08±0.852 | 3.635±2.039 | 3.130±1.288 | 12.15±1.571 | 11.68±4.228∗
p-MSSL | 4.199±0.440 | 4.586±0.598 | 1.200±0.058 | 0.953±0.049 | 4.022±0.390 | 9.481±0.719 | 2.114±0.184 | 2.231±0.222 | 11.30±1.162 | 8.518±0.771∗
G-SMuRFS | 4.168±0.310 | 4.615±0.454 | 0.976±0.116 | 0.653±0.088 | 3.960±0.408 | 9.681±0.731 | 2.025±0.127 | 2.179±0.184 | 11.26±1.246 | 7.844±0.668∗
MTRL | 4.223±0.308 | 4.685±0.468 | 0.959±0.119 | 0.632±0.099 | 3.966±0.540 | 9.443±0.611 | 1.985±0.136 | 2.106±0.187 | 11.278±1.315 | 7.761±0.045
FGL-MTFL | 4.139±0.366 | 4.566±0.505 | 0.973±0.102 | 0.654±0.076 | 3.984±0.485 | 9.471±0.708 | 1.997±0.149 | 2.119±0.191 | 11.17±1.241 | 7.689±0.565

The superscript symbol ∗ indicates that FGL-MTFL significantly outperformed that method on that score (Student's t-test at a level of 0.05). The bold value indicates the best performance


processes: (1) all tasks are regularized by their mean value, and therefore knowledge obtained from one task can be utilized by other tasks via the mean value; (2) sparsity is enforced in the learning with the ℓ1-norm.

5. Multi-task Sparse Structure Learning from task parameters (p-MSSL) (Goncalves et al. 2014):

p-MSSL (minΘ,Ω≻0 L(X, Y, Θ) − (k/2) log |Ω| + Tr(ΘΩΘᵀ) + λ1‖Ω‖1 + λ2‖Θ‖1, where Ω ∈ R^{k×k} is a matrix that captures the task relationship structure).

6. Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) (Yan et al. 2015):

G-SMuRFS (minΘ L(X, Y, Θ) + λ1‖Θ‖2,1 + λ2 ∑l=1..q wl √(∑j∈Gl ‖θj·‖²)) takes into account coupled features and group sparsity across tasks. Parameters for these methods are set following the same approach as that used for the baseline methods. Tables 8 and 9 show the results of the compared MTL methods in terms of rMSE, CC, nMSE and wR.

7. Multi-Task Relationship Learning (MTRL) (Zhang and Yeung 2012b):

MTRL (minW,b,Ω ∑i=1..m (1/ni) ∑j=1..ni (yij − wiᵀxij − bi)² + (λ1/2) tr(WWᵀ) + (λ2/2) tr(WΩ⁻¹Wᵀ), s.t. Ω ⪰ 0, tr(Ω) = 1) can simultaneously model positive task correlation, describe negative task correlation, and identify outlier tasks based on the same underlying principle.
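To make the roles of the regularizers in the models above concrete, the following is a minimal sketch of three ingredients: the ℓ2,1-norm behind joint feature selection, the trace norm behind the shared low-rank subspace in Trace, and the mean-value regularization in SRMTL. The helper names and the mean-centering construction Z = I − (1/k)·11ᵀ are our illustrative assumptions; the paper only states Z ∈ R^{k×k}.

```python
import numpy as np

def l21_norm(Theta):
    # Sum over features (rows) of the l2 norm across tasks: a zeroed-out
    # row means the feature is discarded jointly for all tasks.
    return float(np.sum(np.linalg.norm(Theta, axis=1)))

def trace_norm(Theta):
    # Nuclear norm (sum of singular values); small values keep the task
    # weight vectors close to a common low-dimensional subspace.
    return float(np.sum(np.linalg.svd(Theta, compute_uv=False)))

def mean_regularizer(Theta):
    # Assumed SRMTL-style construction: Theta @ Z with Z = I - (1/k)*11^T
    # subtracts each feature's mean weight across tasks, so the squared
    # Frobenius norm penalizes deviation of every task from the task mean.
    k = Theta.shape[1]
    Z = np.eye(k) - np.ones((k, k)) / k
    return float(np.linalg.norm(Theta @ Z, 'fro') ** 2)
```

Note that the mean regularizer vanishes exactly when all tasks share identical weights, which is the sense in which SRMTL couples the tasks.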

From the experimental results in Tables 8 and 9, we observe that the proposed method significantly outperforms the other recently developed algorithms. Specifically, the t-test results indicate that FGL-MTFL significantly outperforms all other methods in terms of CC, and all but MTRL in terms of nMSE. Compared with the other multi-task learning methods that utilize different assumptions, G-SMuRFS and our proposed method, both multi-task feature learning methods with sparsity-inducing norms, have an advantage. Since not all brain regions are associated with AD, many of the features are irrelevant and redundant. Thus, sparsity-based MTL methods such as FGL-MTFL and G-SMuRFS are more appropriate for predicting cognitive measures, achieving better performance than the non-sparse MTL methods. Unlike G-SMuRFS, the group regularization G2,1-norm in FGL-MTFL decouples the group sparse regularization across tasks to provide more flexibility.

Identification of Correlation among Cognitive Outcomes

A fundamental component of our FGL-MTFL is to estimate the relationship structure among tasks, thus promoting appropriate information sharing among related tasks. In this section, we investigate and evaluate the estimated task relationships. Figure 4 shows the normalized estimated correlation matrix Z.
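A normalized correlation matrix like the one visualized in Fig. 4 can be obtained from any symmetric task-relationship estimate by dividing out the diagonal scales; this is the standard construction, sketched here under our own naming:

```python
import numpy as np

def normalize_to_correlation(S):
    """Rescale a symmetric positive-definite relationship matrix S so that
    Z[m, l] = S[m, l] / sqrt(S[m, m] * S[l, l]); the diagonal becomes 1
    and off-diagonal entries fall in [-1, 1]."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)
```

The normalization makes correlations between tasks measured on very different scales (e.g., MMSE vs. TRAILS times) directly comparable.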

Fig. 4 Correlation matrix of the 20 cognitive prediction tasks (ADAS, MMSE, RAVLT TOTAL/TOT6/TOTB/T30/RECOG, FLU ANIM/VEG, TRAILS A/B, LOGMEM IMMTOTAL/DELTOTAL, CLOCK DRAW/COPY/SCOR, BOSNAM, ANART, DSPAN FOR/BAC, DIGIT); color scale from −0.6 to 1

Using the previous results obtained by our FGL-MTFL, the constructed graph G includes all pairwise links for each pair of tasks. The correlation of some of these links may be weak, or the tasks may not actually be correlated, due to poor estimation of the correlation matrix. To clearly analyze the influence of the estimated task relationships on the prediction performance of cognitive outcomes, we constructed graphs G that vary the value of the threshold τ based on the estimated correlation matrix Z. The range of the τ value is 0.1, 0.3, 0.5, and 0.7. The graphs of these four examples are shown in Fig. 5. In this experiment, rather than including all pairwise links, a specific threshold value was applied to the correlation matrix Z, and the performance with

different threshold values is presented in Table 10. In this case, by thresholding the estimated correlation matrix with a higher τ, we can construct a sparse undirected graph that represents only the most reliable correlations. From the data shown in Fig. 5, we can see that with an increase in the threshold value, the graph of tasks becomes less dense. When τ = 0.1, only one link is removed; when τ increases to 0.3 and 0.5, the numbers of remaining links are only 133 and 48, respectively. When τ = 0.7, only eight links among highly correlated cognitive scores remain. We found that these remaining cognitive outcomes (including ADAS, MMSE, and RAVLT) are the most commonly used in multi-task learning to predict cognitive outcomes (Yan et al. 2013,

Fig. 5 The correlation graphs with four different threshold values: (a) τ = 0.1 (189 links), (b) τ = 0.3 (133 links), (c) τ = 0.5 (48 links), (d) τ = 0.7 (8 links)
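The thresholding experiment amounts to keeping only the links whose estimated correlation magnitude exceeds τ; a minimal sketch (the function name is ours). With k = 20 tasks there are 20·19/2 = 190 candidate links, consistent with the 189 links remaining after one removal at τ = 0.1.

```python
import numpy as np

def threshold_graph(Z, tau):
    """Return the undirected edges (m, l), m < l, of the task graph whose
    absolute estimated correlation exceeds the threshold tau; larger tau
    keeps only the most reliable links, giving a sparser graph."""
    k = Z.shape[0]
    return [(m, l) for m in range(k) for l in range(m + 1, k)
            if abs(Z[m, l]) > tau]
```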


Table 10 The influence of the threshold on the prediction performance for the fused lasso or fused group lasso regularized model

Method | τ | ADAS | MMSE | RAVLT-TOTAL | RAVLT-TOT6 | RAVLT-TOTB | RAVLT-T30 | RAVLT-RECOG | FLU-ANIM | FLU-VEG | TRAILS-A | TRAILS-B
FL-MTL | 0.7 | 0.665±0.068 | 0.553±0.069 | 0.498±0.101 | 0.506±0.120 | 0.286±0.060 | 0.517±0.117 | 0.411±0.128 | 0.382±0.102 | 0.509±0.081 | 0.363±0.089 | 0.368±0.122
FL-MTL | 0.5 | 0.662±0.066 | 0.550±0.070 | 0.488±0.103 | 0.507±0.119 | 0.308±0.068 | 0.516±0.116 | 0.410±0.119 | 0.375±0.104 | 0.509±0.085 | 0.355±0.093 | 0.358±0.121
FL-MTL | 0.3 | 0.657±0.064 | 0.557±0.070 | 0.474±0.108 | 0.501±0.117 | 0.287±0.082 | 0.511±0.114 | 0.407±0.107 | 0.366±0.106 | 0.505±0.086 | 0.348±0.097 | 0.348±0.120
FL-MTL | 0.1 | 0.641±0.074 | 0.522±0.095 | 0.491±0.112 | 0.472±0.102 | 0.287±0.085 | 0.472±0.096 | 0.409±0.085 | 0.356±0.107 | 0.485±0.076 | 0.352±0.085 | 0.386±0.159
FL-MTL | – | 0.641±0.074 | 0.523±0.095 | 0.491±0.112 | 0.472±0.102 | 0.287±0.085 | 0.473±0.096 | 0.409±0.085 | 0.355±0.107 | 0.485±0.076 | 0.352±0.085 | 0.385±0.159
FGL-MTL | 0.7 | 0.659±0.066 | 0.541±0.068 | 0.504±0.097 | 0.503±0.122 | 0.286±0.053 | 0.521±0.116 | 0.409±0.137 | 0.390±0.086 | 0.508±0.081 | 0.367±0.093 | 0.385±0.119
FGL-MTL | 0.5 | 0.660±0.064 | 0.538±0.067 | 0.496±0.099 | 0.506±0.120 | 0.307±0.064 | 0.521±0.117 | 0.412±0.130 | 0.388±0.098 | 0.513±0.081 | 0.361±0.096 | 0.372±0.119
FGL-MTL | 0.3 | 0.663±0.062 | 0.553±0.067 | 0.487±0.105 | 0.506±0.117 | 0.291±0.077 | 0.520±0.116 | 0.411±0.120 | 0.380±0.102 | 0.514±0.078 | 0.353±0.099 | 0.359±0.119
FGL-MTL | 0.1 | 0.662±0.060 | 0.556±0.065 | 0.479±0.109 | 0.501±0.116 | 0.287±0.080 | 0.517±0.113 | 0.410±0.116 | 0.371±0.106 | 0.512±0.077 | 0.347±0.101 | 0.351±0.119
FGL-MTL | – | 0.662±0.060 | 0.557±0.065 | 0.479±0.109 | 0.501±0.116 | 0.286±0.080 | 0.516±0.113 | 0.409±0.115 | 0.370±0.106 | 0.512±0.077 | 0.347±0.101 | 0.351±0.119
FGL-MTFL | 0.7 | 0.667±0.062 | 0.549±0.061 | 0.517±0.118 | 0.500±0.127 | 0.320±0.081 | 0.516±0.117 | 0.421±0.126 | 0.380±0.083 | 0.505±0.082 | 0.414±0.085 | 0.479±0.098
FGL-MTFL | 0.5 | 0.665±0.063 | 0.547±0.057 | 0.523±0.113 | 0.496±0.120 | 0.317±0.078 | 0.513±0.114 | 0.413±0.133 | 0.388±0.081 | 0.499±0.089 | 0.416±0.092 | 0.481±0.095
FGL-MTFL | 0.3 | 0.667±0.069 | 0.549±0.069 | 0.522±0.113 | 0.498±0.121 | 0.327±0.069 | 0.512±0.110 | 0.427±0.123 | 0.400±0.080 | 0.503±0.087 | 0.419±0.087 | 0.482±0.095
FGL-MTFL | 0.1 | 0.668±0.068 | 0.549±0.068 | 0.522±0.114 | 0.499±0.121 | 0.324±0.069 | 0.514±0.110 | 0.429±0.121 | 0.399±0.081 | 0.505±0.087 | 0.419±0.086 | 0.481±0.095
FGL-MTFL | – | 0.668±0.068 | 0.549±0.068 | 0.522±0.114 | 0.499±0.121 | 0.324±0.069 | 0.514±0.110 | 0.429±0.121 | 0.399±0.081 | 0.505±0.087 | 0.419±0.086 | 0.481±0.095

Method | τ | LOGMEM-IMMTOTAL | LOGMEM-DELTOTAL | CLOCK-DRAW | CLOCK-COPY | BOSNAM | ANART | DSPAN-FOR | DSPAN-BAC | DIGIT-SCOR | wR
FL-MTL | 0.7 | 0.510±0.092 | 0.534±0.104 | 0.369±0.075 | 0.226±0.099 | 0.478±0.088 | 0.156±0.093 | 0.108±0.120 | 0.187±0.137 | 0.456±0.065 | 0.404±0.048∗
FL-MTL | 0.5 | 0.503±0.093 | 0.529±0.105 | 0.378±0.073 | 0.226±0.096 | 0.480±0.096 | 0.142±0.087 | 0.117±0.116 | 0.193±0.134 | 0.448±0.059 | 0.403±0.047∗
FL-MTL | 0.3 | 0.496±0.095 | 0.524±0.108 | 0.372±0.089 | 0.218±0.100 | 0.470±0.103 | 0.119±0.078 | 0.121±0.112 | 0.191±0.102 | 0.441±0.056 | 0.396±0.048∗
FL-MTL | 0.1 | 0.470±0.097 | 0.514±0.108 | 0.379±0.093 | 0.221±0.102 | 0.442±0.084 | 0.094±0.104 | 0.083±0.102 | 0.177±0.108 | 0.414±0.083 | 0.383±0.046∗
FL-MTL | – | 0.470±0.097 | 0.514±0.108 | 0.379±0.093 | 0.221±0.102 | 0.442±0.085 | 0.094±