
Neuroinformatics https://doi.org/10.1007/s12021-018-9398-5

    ORIGINAL ARTICLE

Fused Group Lasso Regularized Multi-Task Feature Learning and Its Application to the Cognitive Performance Prediction of Alzheimer's Disease

Xiaoli Liu 1,2 · Peng Cao 1 · Jianzhong Wang 3 · Jun Kong 3,4 · Dazhe Zhao 1,2

    © Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract
Alzheimer's disease (AD) is characterized by gradual neurodegeneration and loss of brain function, especially for memory during early stages. Regression analysis has been widely applied to AD research to relate clinical and biomarker data, such as predicting cognitive outcomes from MRI measures. Recently, multi-task based feature learning (MTFL) methods with the sparsity-inducing ℓ2,1-norm have been widely studied to select a discriminative feature subset from MRI features by incorporating inherent correlations among multiple clinical cognitive measures. However, existing MTFL assumes the correlation among all tasks is uniform, and the task relatedness is modeled by encouraging a common subset of features via sparsity-inducing regularizations that neglect the inherent structure of tasks and MRI features. To address this issue, we propose a fused group lasso regularization to model the underlying structures, involving 1) a graph structure within tasks and 2) a group structure among the image features. To this end, we present a multi-task feature learning framework with a mixed norm of fused group lasso and ℓ2,1-norm to model these more flexible structures. For optimization, we employed the alternating direction method of multipliers (ADMM) to efficiently solve the proposed non-smooth formulation. We evaluated the performance of the proposed method using the Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets. The experimental results demonstrate that incorporating the two prior structures with the fused group lasso norm into multi-task feature learning can improve prediction performance over several competing methods, with estimated correlations of cognitive functions and identification of cognition-relevant imaging markers that are clinically and biologically meaningful.

    Keywords Alzheimer’s disease · Multi-task learning · Sparse group lasso · Fused lasso

    Introduction

Alzheimer's disease (AD) is the most common cause of dementia, which mainly affects memory function; progression ultimately culminates in a state of dementia where all

✉ Peng Cao [email protected]

    Xiaoli [email protected]

1 Computer Science and Engineering, Northeastern University, Shenyang, China

2 Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang, China

3 College of Information Science and Technology, Northeast Normal University, Changchun, China

4 Key Laboratory of Applied Statistics of MOE, Changchun, China

cognitive functions are affected. The disease poses a serious challenge to the aging society. The worldwide prevalence of AD is predicted to quadruple from 46.8 million in 2016 (Alzheimer's Association et al. 2016) to 131.5 million by 2050 according to ADI's World Alzheimer Report (Batsch and Mittelman 2015). Dementia also has a huge economic impact. Today, the total estimated worldwide cost of dementia is US$818 billion, and dementia is expected to become a trillion-dollar disease by 2018.

Predicting cognitive performance of subjects from neuroimaging measures and identifying relevant imaging biomarkers are important focuses of the study of Alzheimer's disease. Magnetic resonance imaging (MRI) allows the direct observation of brain changes such as cerebral atrophy or ventricular expansion (Castellani et al. 2010). Previous work showed that brain atrophy detected by MRI is correlated with neuropsychological deficits (Frisoni et al. 2010). The relationships between commonly used cognitive measures and structural changes detected by MRI have been



previously studied using regression models. Many analyses have demonstrated a relationship between baseline MRI features and cognitive measures. The most commonly used cognitive measures are the Alzheimer's Disease Assessment Scale cognitive total score (ADAS), the Mini-Mental State Exam score (MMSE), and the Rey Auditory Verbal Learning Test (RAVLT). ADAS is the gold standard in AD drug trials for cognitive function assessment, and is the most popular cognitive testing instrument for measuring the severity of the most important symptoms of AD. MMSE measures cognitive impairment, including orientation to time and place, attention and calculation, immediate and delayed recall of words, and language and visuo-constructional functions. RAVLT measures episodic memory and is used for the diagnosis of memory disturbances; it includes eight recall trials and a recognition test.

Early studies focused on traditional regression models to predict cognitive outcomes one at a time. To achieve more appropriate predictive models of performance and identify relevant imaging biomarkers, many previous works formulated the prediction of multiple cognitive outcomes as a multi-task learning problem and developed regularized multi-task learning methods to model disease cognitive outcomes (Wan et al. 2012, 2014; Zhou et al. 2013; Wang et al. 2012). Multi-task learning (MTL) (Caruana 1998) describes a learning paradigm that seeks to improve the generalization performance of a learning task with the help of other related tasks. The fundamental hypothesis of MTL methods is that if tasks are related, then learning of one task can benefit from the learning of other tasks. Learning multiple related tasks simultaneously has been shown, both theoretically and empirically, to significantly improve performance. The key of MTL is how to exploit correlation among tasks via an appropriate shared representation. Two popular shared representations to model task relatedness are model parameter sharing (Argyriou et al. 2008; Jebara 2011) and feature representation sharing (Evgeniou and Pontil 2004; Yu et al. 2005; Xue et al. 2007). There are inherent correlations among different cognitive scores. Therefore, the prediction of different types of cognitive scores can be modeled as an MTL formulation in which the tasks are related in the sense that they all share a small set of features; this is the multi-task feature learning (MTFL) problem. To solve MTFL problems, regularization has been introduced, producing better performance than the traditional solution using single-task learning. The most commonly used regularization is the ℓ2,1-norm (Liu et al. 2009), which is employed to extract features that impact all or most clinical scores, since the assumption is that a given imaging marker can affect multiple cognitive scores and only a subset of brain regions (regions of interest, ROIs) are relevant. (Wang et al. 2011) and (Zhang and Yeung 2012a) employed multi-task feature learning strategies to select biomarkers that could predict multiple clinical scores. Specifically, (Wang et al. 2011) employed an ℓ1-norm regularizer to impose sparsity among all elements and proposed the use of combined ℓ2,1-norm and ℓ1-norm regularizations to select features. (Zhang et al. 2012) proposed a multi-task learning model with the ℓ2,1-norm to select a common subset of relevant features for multiple variables from each modality.

A major limitation of existing MTFL methods is that complex relationships among imaging markers and among cognitive outcomes are often ignored. Correlating multiple prediction models assumes that all tasks share the same feature subset. This is not a realistic assumption, since it treats all cognitive outcomes (responses) and MRI features (predictors) equally and neglects underlying correlations between the cognitive tasks and structure within the MRI features. Specifically, 1) for the cognitive outcomes, each assessment typically yields multiple evaluation scores from a set of relevant cognitive tasks, and thus these scores are inherently correlated. An example would be the scores of TOTAL and TOT6 in the RAVLT. Different assessments can evaluate different cognitive functions, resulting in low correlation and a preference for different brain regions. For example, the tasks in TRAILS aim to test a combination of visual, motor, and executive functions, while the RAVLT aims to test verbal learning and memory. It is reasonable to assume that correlations among tasks are not equal, and some tasks may be more closely related than others in assessment tests of cognitive outcomes. 2) On the other hand, for MRI data, many MRI features are interrelated and together reveal brain cognitive functions (Yan et al. 2015). In our data, multiple shape measures (volume, area, and thickness) from the same region provide a comprehensive quantitative evaluation of cortical atrophy, and tend to be selected together as joint predictors. Our previous study proposed a model in which prior knowledge guides multi-task feature learning; using group information to enforce intra-group similarity has been demonstrated to be an effective approach (Liu et al. 2017). Overall, it is important to explore and utilize such interrelated structures and to select important and structurally correlated features together.

To address these model limitations, we designed a novel multi-task feature learning method that models a common representation with respect to MRI features across tasks as well as the local task structure with respect to brain regions. Specifically, we designed a novel mixed structured sparsity norm, called the fused group lasso, to capture the underlying structures at the level of tasks and features. This regularizer is based on the natural assumption that if some tasks are correlated, they should have similar weight vectors and similar selected brain regions. It penalizes differences between the prediction models of highly correlated tasks, and encourages similarity in the selected features


of highly correlated tasks. To discover such dependent structures among the cognitive outcomes, we employed the Pearson correlation coefficient to uncover the interrelations among cognitive measures and estimated the correlation matrix of all tasks. In our work, all the cognitive measures (20 in total) in the ADNI dataset were used to exploit the relationship. To the best of our knowledge, our approach is the first work to analyze all cognitive measures in the ADNI dataset and their relationships. With the estimated task correlation, we employ the idea of fused lasso to capture the dependence of the response variables. At the same time, taking into account the group structure among predictors, prior group information is incorporated into the fused lasso norm, promoting intra-group similarity with group sparsity. By incorporating the fused group lasso into the MTFL model, we can better understand the underlying associations among the prediction tasks of cognitive measures, allowing more stable identification of cognition-relevant imaging markers. The resulting formulation is challenging to solve due to the use of non-smooth penalties, including the fused group lasso and the ℓ2,1-norm. An effective ADMM algorithm is proposed to tackle the complex non-smoothness.

Through empirical evaluation and comparison with different baseline methods and recently developed MTL methods using data from ADNI, we illustrate that the proposed FGL-MTFL method outperforms other methods. Improvements are statistically significant for most scores (tasks). The results demonstrate that incorporating the fused group lasso into the traditional MTFL formulation improves predictive performance relative to traditional machine learning methods. We discuss the most prominent ROIs and task correlations identified by FGL-MTFL. We found that the results corroborate previous studies in neuroscience. Our previous works also formulated prediction tasks within a multi-task learning scheme. The algorithm SMKMTL (sparse multi-kernel based multi-task learning) in (Cao et al. 2017) exploits a nonlinear prediction model based on multi-kernel learning; however, it assumes that correlations among tasks are equal and that the features are independent. Although the GSGL-MTL (group-guided sparse group lasso regularized multi-task learning) algorithm (Liu et al. 2017) exploits the group structure of features by incorporating a priori group information, it does not consider the complex relationships among cognitive outcomes.

The rest of the paper is organized as follows. In "Preliminary Methodology", we provide a description of the preliminary methodology: multi-task learning (MTL), the ℓ2,1-norm, the group lasso norm, and the fused lasso norm. A detailed mathematical formulation and optimization of FGL-MTFL is provided in "Fused Group Lasso Regularized Multi-Task Feature Learning, FGL-MTFL". In "Experimental Results and Discussions", we present the experimental results and evaluate the performance of FGL-MTFL using data from the ADNI-1 dataset. The conclusions are presented in "Conclusion".

    Preliminary Methodology

    Multi-Task Learning

Consider a multi-task learning (MTL) setting with k tasks. Let p be the number of covariates, shared across all the tasks, and let n be the number of samples. Let X ∈ ℝ^{n×p} denote the matrix of covariates, Y ∈ ℝ^{n×k} the matrix of responses with each row corresponding to a sample, and Θ ∈ ℝ^{p×k} the parameter matrix, with column θ_{·m} ∈ ℝ^p corresponding to task m, m = 1, …, k, and row θ_{j·} ∈ ℝ^k corresponding to feature j, j = 1, …, p. The MTL problem can be constructed by estimating the parameters based on a suitably regularized loss function. In order to effectively associate imaging markers and cognitive measures, the MTL model minimizes the following objective:

min_{Θ ∈ ℝ^{p×k}} L(Y, X, Θ) + λ R(Θ),  (1)

where L(·) denotes the loss function and R(·) is the regularizer. In the current context, we assume the loss to be the square loss, i.e.,

L(Y, X, Θ) = ‖Y − XΘ‖²_F = Σ_{i=1}^{n} ‖y_i − x_iΘ‖²₂,  (2)

where y_i ∈ ℝ^{1×k} and x_i ∈ ℝ^{1×p} are the i-th rows of Y and X, respectively, corresponding to the multi-task response and covariates for the i-th sample. We note that the MTL framework can be easily extended to other loss functions. Clearly, different choices of the penalty R(Θ) may yield quite different multi-task methods. Using some prior knowledge, we then add the penalty R(Θ) to encode relatedness among tasks.
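Since Eq. 2 drives everything that follows, it helps to verify numerically that the Frobenius form equals the sum of per-sample squared ℓ2 norms. A minimal sketch; the shapes and synthetic data are illustrative assumptions, not ADNI values:

```python
import numpy as np

# Multi-task square loss (Eq. 2) computed two equivalent ways.
rng = np.random.default_rng(0)
n, p, k = 50, 10, 4                    # samples, features, tasks (illustrative)
X = rng.normal(size=(n, p))            # covariate matrix
Theta = rng.normal(size=(p, k))        # parameter matrix, one column per task
Y = X @ Theta + 0.1 * rng.normal(size=(n, k))  # multi-task responses

def square_loss(Y, X, Theta):
    """Squared Frobenius loss ||Y - X Theta||_F^2."""
    R = Y - X @ Theta
    return np.sum(R * R)

# The Frobenius form equals the sum of per-sample squared l2 norms.
per_sample = sum(np.sum((Y[i] - X[i] @ Theta) ** 2) for i in range(n))
assert np.isclose(square_loss(Y, X, Theta), per_sample)
```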

ℓ2,1-norm

One appealing property of the ℓ2,1-norm regularization is that it encourages multiple predictors from different tasks to share similar parameter sparsity patterns. The MTFL model via the ℓ2,1-norm regularization considers

‖Θ‖_{2,1} = Σ_{j=1}^{p} ‖θ_{j·}‖₂,  (3)

and is suitable for the simultaneous prioritization of sparsity over features for all tasks.

The key point of Eq. 3 is the use of the ℓ₂-norm for θ_{j·}, which forces grouping of the weights corresponding to the j-th feature across multiple tasks and tends to select features


based on the joint strength of all k tasks (see Fig. 1a). There is a correlation among multiple cognitive tests. A relevant imaging predictor typically may have more or less influence on all these scores, and it is possible that only a subset of brain regions is relevant to each assessment. By employing MTFL, the correlation among different tasks can be incorporated into the model to build a more appropriate predictive model and identify a subset of features. However, the rows of Θ are treated equally in MTFL, which implies that the underlying structures among predictors are ignored.
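The row-wise coupling of Eq. 3 can be sketched in a few lines; the 2×2 matrix below is a hypothetical example chosen so that one feature is shared across tasks and one is dropped for all of them:

```python
import numpy as np

# l2,1-norm (Eq. 3): sum of l2 norms of the rows of Theta, coupling each
# feature's weights across all tasks.
def l21_norm(Theta):
    return np.sum(np.linalg.norm(Theta, axis=1))

Theta = np.array([[3.0, 4.0],   # row norm 5 -> feature kept jointly
                  [0.0, 0.0]])  # zero row -> feature dropped for all tasks
print(l21_norm(Theta))  # 5.0
```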

    G2,1-norm

Despite the above achievements, few regression models take into account the covariance structure among predictors. To achieve a certain function, brain imaging measures are often correlated with each other. For MRI data, groups

correspond to specific regions of interest (ROIs) in the brain, e.g., the entorhinal cortex and hippocampus. Individual features are the specific properties of those regions, e.g., cortical volume and thickness. In this study, for each region (group), multiple features were extracted to measure the atrophy of each ROI, involving cortical thickness, surface area, and volume from gray matter and white matter. Multiple shape measures from the same region provide a comprehensive quantitative evaluation of cortical atrophy, and tend to be selected together as joint predictors.

We assume the p covariates to be divided into g disjoint groups G_ℓ, ℓ = 1, …, g, with each group including ν_ℓ covariates. In the context of AD, each group corresponds to a region of interest (ROI) in the brain, and the covariates in each group correspond to specific features of that region. For AD, the number of features in each group, ν_ℓ, is 1 or 4, and the number of groups g can be in

Fig. 1 The illustration of three different regularizations. Each column of Θ corresponds to a single task and each row represents a feature dimension. The MRI features in each region belong to a group. We assume the p features to be divided into g disjoint groups G_ℓ, ℓ = 1, …, g, with each group having ν_ℓ features. For each element in Θ, white indicates zero-valued elements and color indicates non-zero values


the hundreds. We then introduce a G2,1-norm according to the relationship between brain regions (ROIs) and cognitive tasks, and encourage a task-specific subset of ROIs (see Fig. 1b). The G2,1-norm ‖Θ‖_{G2,1} is defined as:

‖Θ‖_{G2,1} = Σ_{ℓ=1}^{g} Σ_{m=1}^{k} w_ℓ ‖θ_{G_ℓ,m}‖₂,  (4)

where w_ℓ = √ν_ℓ is the weight for each group and θ_{G_ℓ,m} ∈ ℝ^{ν_ℓ} is the coefficient vector for group G_ℓ and task m.
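A sketch of Eq. 4 under the stated weighting w_ℓ = √ν_ℓ; the group partition and coefficient values below are illustrative assumptions:

```python
import numpy as np

# G2,1-norm (Eq. 4): weighted sum of per-group, per-task l2 norms.
def g21_norm(Theta, groups):
    """groups: list of disjoint index arrays, one per ROI."""
    total = 0.0
    for g in groups:
        w = np.sqrt(len(g))                 # group weight w_l = sqrt(nu_l)
        for m in range(Theta.shape[1]):     # per-task group norms
            total += w * np.linalg.norm(Theta[g, m])
    return total

Theta = np.arange(8.0).reshape(4, 2)        # 4 features, 2 tasks (hypothetical)
groups = [np.array([0, 1]), np.array([2, 3])]  # two ROIs, 2 features each
val = g21_norm(Theta, groups)
```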

    Fused Lasso

Fused lasso is one of these variants, in which pairwise differences between variables are penalized using the ℓ₁-norm, which results in successive variables being similar. The fused lasso norm is defined as:

‖HΘᵀ‖₁ = Σ_{m=1}^{k−1} ‖θ_{·m} − θ_{·,m+1}‖₁,  (5)

where H is a (k − 1) × k sparse matrix with H_{m,m} = 1 and H_{m,m+1} = −1.

It encourages θ_{·m} and θ_{·,m+1} to take the same value by shrinking the difference between them toward zero. This approach has been employed to incorporate temporal smoothness when modeling disease progression. In a longitudinal model, it is assumed that the difference of the cognitive scores between two successive time points is relatively small.
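The difference matrix H of Eq. 5 can be constructed explicitly and checked against the pairwise-difference form; the small Θ below is hypothetical:

```python
import numpy as np

# Fused lasso penalty (Eq. 5): ||H Theta^T||_1 via an explicit (k-1) x k
# difference matrix H with H[m, m] = 1 and H[m, m+1] = -1.
def fused_lasso(Theta):
    k = Theta.shape[1]
    H = np.zeros((k - 1, k))
    for m in range(k - 1):
        H[m, m], H[m, m + 1] = 1.0, -1.0
    return np.abs(H @ Theta.T).sum()

Theta = np.array([[1.0, 1.0, 3.0],
                  [0.0, 2.0, 2.0]])      # p = 2 features, k = 3 tasks
# Matches the sum of l1 norms of successive column differences.
direct = sum(np.abs(Theta[:, m] - Theta[:, m + 1]).sum() for m in range(2))
assert np.isclose(fused_lasso(Theta), direct)
```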

Fused Group Lasso Regularized Multi-Task Feature Learning, FGL-MTFL

    Formulation

There are two limitations of traditional MTFL. When regularized by the ℓ2,1-norm in Eq. 3, sparsity is achieved by treating each task equally, which ignores the underlying structures among predictors. On the other hand, for some highly correlated features, traditional MTFL tends to identify one and ignore the others, which is inadequate for yielding a biologically meaningful interpretation. Our previous work exploited relationships of features in the learning procedure, which motivated us to consider the underlying structure of these relationships. To address the limitations of MTFL with the ℓ2,1-norm, we consider the structure of tasks and features instead of assuming that all tasks have similar weights and commonly selected features. More specifically, we propose a new regularization for multi-task feature learning, called the fused group lasso. The underlying idea is that if tasks are highly correlated according to the task interrelations, the tasks are supposed to share common brain regions, whereas tasks with low correlation are more likely to involve different brain regions.

To model the task-feature relationships, we first construct a graph G to model the task correlation, as shown in Fig. 2 (left). Let G = (V, E) be an undirected graph, where V represents the set of vertices and E is the set of edges. Each task is treated as a graph node, and each edge (pairwise link) e_{m,l} ∈ E in G corresponds to an edge between the m-th task and the l-th task, where |r_{m,l}| encodes the strength of the relationship between the m-th task and the l-th task. In

Fig. 2 The illustration of the FGL-MTFL method. The method involves two steps: 1) estimation of the task correlation and construction of an undirected graph (left); and 2) joint learning of all regression models in a fused group lasso regularized multi-task feature learning (FGL-MTFL) formulation based on the estimated correlation matrix (right)


    this model, we adopt a simple and commonly-used approachto infer G from data. In this approach, we first computepairwise Pearson correlation coefficients for each pair oftasks, and then connect two nodes with an edge only if theircorrelation coefficient is above a given threshold τ .
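The graph-construction step described above (Pearson correlations thresholded at τ) can be sketched as follows; the synthetic task scores and τ = 0.5 are illustrative assumptions:

```python
import numpy as np

# Build a task graph: connect tasks whose Pearson correlation exceeds tau.
rng = np.random.default_rng(1)
base = rng.normal(size=100)
scores = np.column_stack([base + 0.1 * rng.normal(size=100),   # task 1
                          base + 0.1 * rng.normal(size=100),   # task 2
                          rng.normal(size=100)])               # unrelated task 3

R = np.corrcoef(scores, rowvar=False)      # k x k Pearson correlation matrix
tau = 0.5                                  # illustrative threshold
edges = [(m, l) for m in range(R.shape[0])
         for l in range(m + 1, R.shape[0]) if abs(R[m, l]) > tau]
print(edges)  # only tasks 1 and 2 are strongly correlated
```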

Brain imaging measures are often correlated with each other, so incorporation of the covariance of MRI features can improve the performance of traditional MTL methods. Thus, to identify biologically meaningful markers, we utilize prior knowledge of the interrelated structure to group related features together to guide the learning process. The benefit of this strategy was also revealed in our previous work (Liu et al. 2017), in which we proposed a group lasso multi-task learning algorithm that explicitly models the correlation structure within features, and achieved good performance in the prediction of cognitive scores from imaging measures.

    Once the graph of task correlation G is constructed, wethen integrate the group structure of features into the fusedlasso norm and propose a new fused group lasso regularizedmulti-task feature learning (FGL-MTFL) model as follows.

min_Θ ½‖Y − XΘ‖²_F + λ₁‖Θ‖_{2,1} + λ₂ Σ_{(m,l)∈E} |r_{m,l}| ‖θ_{·m} − sign(r_{m,l}) θ_{·l}‖_{G2,1},  (6)

where |r_{m,l}| indicates the strength of the correlation between two tasks connected by an edge, and λ₁ and λ₂ are regularization parameters that determine the amount of the two separate penalties. In the new fused group lasso norm, if two tasks m and l are highly correlated, the difference between the two corresponding regression coefficient vectors θ_{·m} and θ_{·l} will be penalized more than differences for other pairs of tasks with weaker correlation. The regularization tends to flatten the values of regression coefficients for each feature across multiple highly correlated tasks, so that the strength of the influence of each feature becomes more similar across those tasks.
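Putting Eqs. 4-6 together, the fused group lasso penalty for a given edge set can be evaluated directly; the edge set, correlation matrix, and groups below are hypothetical:

```python
import numpy as np

# Fused group lasso penalty (Eq. 6): for every graph edge (m, l), penalize
# the G2,1-norm of theta_.m - sign(r_ml) * theta_.l, weighted by |r_ml|.
def fgl_penalty(Theta, edges, r, groups):
    total = 0.0
    for (m, l) in edges:
        diff = Theta[:, m] - np.sign(r[m, l]) * Theta[:, l]
        g21 = sum(np.sqrt(len(g)) * np.linalg.norm(diff[g]) for g in groups)
        total += abs(r[m, l]) * g21
    return total

Theta = np.ones((4, 3))                        # identical columns (hypothetical)
r = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
groups = [np.array([0, 1]), np.array([2, 3])]
# Identical, positively correlated columns incur zero fusion penalty.
assert fgl_penalty(Theta, [(0, 1)], r, groups) == 0.0
```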

The FGL-MTFL model includes two regularization processes: (1) all tasks are regularized by the ℓ2,1-norm regularizer, which captures global relationships by encouraging multiple predictors across tasks to share similar parameter sparsity patterns; this ensures that a small subset of features will be selected for all regression models from a global perspective. (2) Two local structures (graph structures within tasks and group structures within features) are considered from a local perspective by the proposed fused group lasso (FGL) regularizer, which combines two structured sparsity regularizers (fused lasso and group lasso) to capture task-specific ROI structures. This encourages related tasks to have similar selected brain regions.

This new model not only preserves the strength of the ℓ2,1-norm in requiring similarity across multiple scores from a cognitive test, but also considers the complex graph structure among responses and the interrelated group structure of imaging predictors. To the best of our knowledge, no sparsity-based algorithm has been described that includes both global and local task correlation for the prediction of cognitive outcomes in AD.

We used a symmetric correlation matrix D ∈ ℝ^{k×k} to describe the correlation in the fusion penalty, defined as:

D_{m,l} = −r_{m,l},  if (m, l) ∈ E and m ≠ l;
D_{m,m} = Σ_{l≠m, (m,l)∈E} |r_{m,l}|;
D_{m,l} = 0,  otherwise,  (7)

where m, l = 1, 2, …, k. We normalize the matrix by κ, the number of edges in E, defining Z = (1/κ)D. Then the formulation can be written in the following matrix form:

min_Θ ½‖Y − XΘ‖²_F + λ₁‖Θ‖_{2,1} + λ₂‖ΘZ‖_{G2,1}.  (8)

    The framework of the proposed method is illustrated inFig. 2.
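The construction of D in Eq. 7 and its normalization Z = (1/κ)D can be sketched as follows, with an illustrative 3-task edge set and correlation values:

```python
import numpy as np

# Build the fusion matrix D (Eq. 7) and Z = (1/kappa) * D from an edge set.
def build_Z(k, edges, r):
    D = np.zeros((k, k))
    for (m, l) in edges:            # undirected edges, m != l
        D[m, l] = D[l, m] = -r[m, l]
        D[m, m] += abs(r[m, l])     # diagonal accumulates |r_ml| per incident edge
        D[l, l] += abs(r[m, l])
    kappa = len(edges)              # number of edges in E
    return D / kappa

r = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, -0.4],
              [0.0, -0.4, 1.0]])    # illustrative correlation matrix
Z = build_Z(3, [(0, 1), (1, 2)], r)
assert np.allclose(Z, Z.T)          # Z is symmetric, as the text notes
```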

    Efficient Optimization for FGL-MTFL

    ADMM

    In recent years, ADMM has become popular since it is ofteneasy to parallelize distributed convex problems. In ADMM,the solutions to small local subproblems are coordinated toidentify the globally optimal solution (Boyd et al. 2011).

    minx,z

    f (x) + g(z)s.t . Ax + Bz = c.

    The variant augmented Largrangian of ADMM methodis formulated as follows:

    Lρ(x, z, u) = f (x) + g(z) + uT (Ax + Bz − c)+ρ2

    ‖Ax + Bz − c‖2

    where f and g are convex functions, and variables A ∈R

    p×n, x ∈ Rn, B ∈ Rp×m, z ∈ Rm, c ∈ Rp. u is a scaleddual augmented Lagrangian multiplier, and ρ is a non-negative penalty parameter. In each iteration of ADMM, thisproblem is solved by alternating minimization Lρ(x, z, u)


over x, z, and u. At the (k + 1)-th iteration, the update of ADMM is carried out by:

x^{k+1} := argmin_x L_ρ(x, z^k, u^k)
z^{k+1} := argmin_z L_ρ(x^{k+1}, z, u^k)
u^{k+1} := u^k + ρ(Ax^{k+1} + Bz^{k+1} − c)
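To make the generic updates concrete, here is a minimal ADMM sketch for the lasso (splitting x − z = 0), which follows the same x/z/u pattern; the problem sizes, λ, and ρ are illustrative assumptions, not the paper's setting:

```python
import numpy as np

# ADMM for the lasso: min 0.5*||Ax - b||^2 + lam*||x||_1, with x - z = 0.
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]          # sparse ground truth (illustrative)
b = A @ x_true
lam, rho = 0.1, 1.0

x = np.zeros(10)
z = np.zeros(10)
u = np.zeros(10)                       # scaled dual variable
AtA, Atb = A.T @ A, A.T @ b
F = np.linalg.inv(AtA + rho * np.eye(10))   # factor once, reuse each iteration
for _ in range(200):
    x = F @ (Atb + rho * (z - u))                                  # x-update
    z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)  # soft-threshold
    u = u + x - z                                                  # dual update

assert np.allclose(z, x_true, atol=0.1)    # recovers the sparse signal
```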

    Efficient Optimization for FGL-MTFL

We developed an efficient ADMM-based algorithm to solve the objective function in Eq. 8, which is equivalent to the following constrained optimization problem:

min_{Θ,Q,S} ½‖Y − XΘ‖²_F + λ₁‖Q‖_{2,1} + λ₂‖S‖_{G2,1}
s.t. Θ − Q = 0, ΘZ − S = 0,  (9)

where Q and S are slack variables. Eq. 9 can then be solved by ADMM. The augmented Lagrangian is

L_ρ(Θ, Q, S, U, V) = ½‖Y − XΘ‖²_F + λ₁‖Q‖_{2,1} + λ₂‖S‖_{G2,1} + ⟨U, Θ − Q⟩ + (ρ/2)‖Θ − Q‖² + ⟨V, ΘZ − S⟩ + (ρ/2)‖ΘZ − S‖²,  (10)

where U and V are augmented Lagrangian multipliers.

Update Θ: From the augmented Lagrangian in Eq. 10, the update of Θ at the (t + 1)-th iteration is carried out by:

Θ^{(t+1)} = argmin_Θ ½‖Y − XΘ‖²_F + ⟨U^{(t)}, Θ − Q^{(t)}⟩ + (ρ/2)‖Θ − Q^{(t)}‖² + ⟨V^{(t)}, ΘZ − S^{(t)}⟩ + (ρ/2)‖ΘZ − S^{(t)}‖²,  (11)

which has a closed-form solution, derived by setting the gradient of Eq. 11 to zero:

0 = −Xᵀ(Y − XΘ) + U^{(t)} + ρ(Θ − Q^{(t)}) + V^{(t)}Z + ρ(ΘZ − S^{(t)})Z.  (12)

Note that Z is a symmetric matrix. We define Ω = ZZ, where Ω is also a symmetric matrix with Ω_{m,l} denoting the weight for the pair (m, l). With this linearization, Θ can be updated in parallel over the individual columns θ_{·m}. Thus, in the (t + 1)-th iteration, θ^{(t+1)}_{·m} can be updated efficiently using Cholesky factorization:

0 = −Xᵀ(y_m − Xθ_{·m}) + u^{(t)}_{·m} + ρ(θ_{·m} − q^{(t)}_{·m}) + (v^{(t)}_{·m} − Σ_{l≠m} Z_{m,l} v^{(t)}_{·l}) + ρ(Ω_{m,m} θ_{·m} − Σ_{l≠m} Ω_{m,l} θ_{·l}) − ρ(s^{(t)}_{·m} − Σ_{l≠m} Z_{m,l} s^{(t)}_{·l}).  (13)

The above optimization problem is quadratic. The optimal solution is given by θ^{(t+1)}_{·m} = F_m^{−1} b^{(t)}_m, where

F_m = XᵀX + ρ(1 + Ω_{m,m}) I,

b^{(t)}_m = Xᵀy_m − u^{(t)}_{·m} − (v^{(t)}_{·m} − Σ_{l≠m} Z_{m,l} v^{(t)}_{·l}) + ρ q^{(t)}_{·m} + ρ(s^{(t)}_{·m} − Σ_{l≠m} Z_{m,l} s^{(t)}_{·l}) + ρ Σ_{l≠m} Ω_{m,l} θ_{·l}.  (14)
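The per-task solve θ_{·m} = F_m^{−1} b_m can be sketched with a Cholesky factorization rather than an explicit inverse; NumPy returns the lower-triangular factor L with F_m = LLᵀ, which plays the role of Aᵀ_m A_m in Eq. 15 (with A_m = Lᵀ). X, b_m, ρ, and Ω_{m,m} below are illustrative stand-ins:

```python
import numpy as np

# Per-task theta update theta_m = F_m^{-1} b_m via Cholesky factorization.
rng = np.random.default_rng(3)
n, p = 40, 8
X = rng.normal(size=(n, p))
rho, Omega_mm = 1.0, 0.5               # illustrative values
b_m = rng.normal(size=p)               # stand-in for b_m^{(t)} of Eq. 14

F_m = X.T @ X + rho * (1.0 + Omega_mm) * np.eye(p)  # constant, positive definite
L = np.linalg.cholesky(F_m)            # F_m = L L^T, factored once up front
# Two triangular solves per iteration: L y = b_m, then L^T theta = y.
y = np.linalg.solve(L, b_m)
theta_m = np.linalg.solve(L.T, y)

assert np.allclose(F_m @ theta_m, b_m)  # solves the linear system exactly
```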

The computation of θ^{(t+1)}_{·m} involves solving a linear system, which is the most time-consuming part of the whole algorithm. To compute θ^{(t+1)}_{·m} efficiently, we can compute the Cholesky factorization of F_m at the beginning of the algorithm:

F_m = Aᵀ_m A_m.  (15)

Note that F_m is a constant, positive definite matrix. Using the Cholesky factorization, we only need to solve the following two linear systems in each iteration:

Aᵀ_m θ̂_{·m} = b^{(t)}_m,  A_m θ_{·m} = θ̂_{·m}.  (16)

Since A_m is an upper triangular matrix, these two linear systems are very efficient to solve.

Update Q: The update for Q effectively needs to solve the following problem:

Q^{(t+1)} = argmin_Q (ρ/2)‖Q − Θ^{(t+1)}‖² + λ₁‖Q‖_{2,1} − ⟨U^{(t)}, Q⟩,  (17)

    which is equivalent to the following problem:

    Q(t+1) =argminQ

    {λ1

    ρ‖Q‖2,1 + 1

    2‖Q − O(t+1)‖2

    }. (18)

  • Neuroinform

where $O^{(t+1)} = \Theta^{(t+1)} + \frac{1}{\rho}U^{(t)}$. It is clear that Eq. 18 can be decoupled into

$q^{(t+1)}_{i.} = \arg\min_{q_{i.}} \phi(q_{i.}) = \arg\min_{q_{i.}} \Big\{\frac{1}{2}\|q_{i.} - o^{(t+1)}_{i.}\|^2 + \frac{\lambda_1}{\rho}\|q_{i.}\|\Big\}$   (19)

where $q_{i.}$ and $o_{i.}$ are the i-th rows of $Q^{(t+1)}$ and $O^{(t+1)}$, respectively. Since $\phi(q_{i.})$ is strictly convex, we conclude that $q^{(t+1)}_{i.}$ is its unique minimizer. We then introduce the following lemma (Liu et al. 2009) to solve Eq. 19.

Lemma 1 For any $\lambda_1 \geq 0$, we have

$q_{i.} = \frac{\max\big\{\|o_{i.}\|_2 - \frac{\lambda_1}{\rho},\, 0\big\}}{\|o_{i.}\|_2}\, o_{i.}.$   (20)
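Lemma 1 is a row-wise soft-thresholding rule; a minimal sketch (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def prox_l21(O, tau):
    """Closed-form minimizer of tau*||Q||_{2,1} + 0.5*||Q - O||^2:
    each row of O is shrunk toward zero by tau in Euclidean norm (Eq. 20)."""
    norms = np.linalg.norm(O, axis=1, keepdims=True)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, np.finfo(float).eps)
    return scale * O

O = np.array([[3.0, 4.0],     # row norm 5.0: shrunk by tau to norm 4.0
              [0.1, 0.1]])    # row norm below tau: set to zero
Q = prox_l21(O, tau=1.0)
# Q[0] == [2.4, 3.2]; Q[1] == [0.0, 0.0]
```

The S-update below (Lemma 2) applies the same shrinkage per feature group and task, with the threshold scaled by the group weight.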

Update S: The update for S effectively needs to solve the following problem

$S^{(t+1)} = \arg\min_{S} \frac{\rho}{2}\|S - \Theta^{(t+1)}Z\|^2 + \lambda_2\|S\|_{G_{2,1}} - \langle V^{(t)}, S\rangle,$   (21)

which is effectively equivalent to computing the proximal operator of the $G_{2,1}$-norm. In particular, the problem can be written as

$S^{(t+1)} = \arg\min_{S} \Big\{\frac{\lambda_2}{\rho}\|S\|_{G_{2,1}} + \frac{1}{2}\|S - \Pi^{(t+1)}\|^2\Big\},$   (22)

where $\Pi^{(t+1)} = \Theta^{(t+1)}Z + \frac{1}{\rho}V^{(t)}$. Since the groups $G_\ell$ used in our work are disjoint, Eq. 22 can be decoupled into

$s^{(t+1)}_{G_\ell m} = \arg\min_{s_{G_\ell m}} \phi(s_{G_\ell m}) = \arg\min_{s_{G_\ell m}} \Big\{\frac{1}{2}\|s_{G_\ell m} - \pi^{(t+1)}_{G_\ell m}\|^2 + \frac{\lambda_2}{\rho}\|s_{G_\ell m}\|\Big\}$   (23)

where $s_{G_\ell m}$ and $\pi_{G_\ell m}$ are the rows in group $G_\ell$ for task m of $S^{(t+1)}$ and $\Pi^{(t+1)}$, respectively. We then introduce the following lemma (Yuan et al. 2013).

Lemma 2 For any $\lambda_2 \geq 0$, we have

$s_{G_\ell m} = \frac{\max\big\{\|\pi_{G_\ell m}\|_2 - \frac{\lambda_2 \nu_\ell}{\rho},\, 0\big\}}{\|\pi_{G_\ell m}\|_2}\, \pi_{G_\ell m}.$   (24)

Dual Update for U and V: Following the standard ADMM dual update, the updates for the dual variables in our setting are as follows:

$U^{(t+1)} = U^{(t)} + \rho(\Theta^{(t+1)} - Q^{(t+1)})$   (25a)
$V^{(t+1)} = V^{(t)} + \rho(\Theta^{(t+1)}Z - S^{(t+1)}).$   (25b)

The dual updates can be performed in an element-wise parallel manner. Algorithm 1 summarizes the whole algorithm. The MATLAB code of the proposed algorithm is available at: https://bitbucket.org/XIAOLILIU/fgl-mtfl.
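Algorithm 1 itself is not reproduced in this excerpt, but the update cycle of Eqs. 11-25 can be sketched as follows. This is a simplified illustration, not the authors' MATLAB implementation: the Theta-step solves the linear system of Eq. 12 exactly via an eigendecomposition of $ZZ^T$, and the group shrinkage of Lemma 2 is collapsed to row-wise shrinkage (singleton groups with $\nu_\ell = 1$).

```python
import numpy as np

def row_shrink(M, tau):
    # Row-wise soft-thresholding (Lemmas 1 and 2 with singleton groups).
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12) * M

def fgl_mtfl_admm(X, Y, Z, lam1, lam2, rho=1.0, iters=100):
    p, k = X.shape[1], Y.shape[1]
    Theta, Q, S = np.zeros((p, k)), np.zeros((p, k)), np.zeros((p, k))
    U, V = np.zeros((p, k)), np.zeros((p, k))
    XtX, XtY = X.T @ X, X.T @ Y
    d, P = np.linalg.eigh(Z @ Z.T)          # ZZ^T = P diag(d) P^T
    for _ in range(iters):
        # Theta-step (Eq. 12): X^T X Theta + rho Theta (I + ZZ^T) = RHS,
        # decoupled column-wise in the eigenbasis of ZZ^T.
        RHS = XtY - U + rho * Q - V @ Z.T + rho * (S @ Z.T)
        B = RHS @ P
        Tt = np.column_stack([
            np.linalg.solve(XtX + rho * (1.0 + dj) * np.eye(p), B[:, j])
            for j, dj in enumerate(d)])
        Theta = Tt @ P.T
        # Q-step (Eq. 18) and S-step (Eq. 22): proximal shrinkage.
        Q = row_shrink(Theta + U / rho, lam1 / rho)
        S = row_shrink(Theta @ Z + V / rho, lam2 / rho)
        # Dual updates (Eq. 25).
        U = U + rho * (Theta - Q)
        V = V + rho * (Theta @ Z - S)
    return Theta

# Toy usage with synthetic data and an identity task graph.
rng = np.random.default_rng(0)
X, Y, Z = rng.standard_normal((40, 10)), rng.standard_normal((40, 3)), np.eye(3)
Theta_hat = fgl_mtfl_admm(X, Y, Z, lam1=0.1, lam2=0.1)
```

In the paper's setting the Theta-step is instead carried out per task with the cached Cholesky factorizations of Eqs. 15-16, and the S-step uses the true feature groups.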

Convergence

The convergence of Algorithm 1 is established by the following theorem.

Theorem 3 Suppose there exists at least one solution $\Theta^*$ of Eq. 8. Assume $L(\Theta)$ is convex and $\lambda_1 > 0$, $\lambda_2 > 0$. Then the following property holds for the FGL-MTFL iteration in Algorithm 1:

$\lim_{t\to\infty} L(\Theta^{(t)}) + \lambda_1\|\Theta^{(t)}\|_{2,1} + \lambda_2\|\Theta^{(t)}Z\|_{G_{2,1}} = L(\Theta^*) + \lambda_1\|\Theta^*\|_{2,1} + \lambda_2\|\Theta^* Z\|_{G_{2,1}}$   (26)

Furthermore,

$\lim_{t\to\infty} \|\Theta^{(t)} - \Theta^*\| = 0$   (27)

whenever Eq. 8 has a unique solution.

Note that the conditions for convergence in Theorem 3 are quite easy to satisfy: $\lambda_1$ and $\lambda_2$ are regularization parameters and should always be larger than zero. The detailed proof is discussed in Cai et al. (2009). Unlike Cai et al., we do not require $L(\Theta)$ to be differentiable, and we explicitly treat the non-differentiability of $L(\Theta)$ by using its subgradient $\partial L(\Theta)$, similar to the strategy used by Ye and Xie (2011).

    Experimental Results and Discussions

In this section, we present experimental results to demonstrate the effectiveness of the proposed FGL-MTFL in characterizing AD progression, using a dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Weiner et al. 2010).



    Experimental Setup

MR images and data used in this work were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu) (Weiner et al. 2010). The primary goal of ADNI is to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. Approaches to characterize AD progression will help researchers and clinicians develop new treatments and monitor their effectiveness. Further, a better understanding of disease progression will increase the safety and efficacy of drug development and potentially decrease the time and cost of clinical trials. In ADNI, all participants received 1.5 Tesla (T) structural MRI. The MRI features used in our experiments are based on imaging data from the ADNI database processed by a team from UCSF (University of California at San Francisco), who performed cortical reconstruction and volumetric segmentation with the FreeSurfer image analysis suite (http://surfer.nmr.mgh.harvard.edu/) according to the atlas generated in Desikan et al. (2006). The FreeSurfer software was employed to automatically label the cortical and subcortical tissue classes for the structural MRI scan of each subject, and to extract thickness measures of the cortical regions of interest (ROIs) and volume measures of cortical and subcortical regions.

Briefly, this processing includes motion correction and averaging (Reuter et al. 2010) of multiple volumetric T1-weighted images (when more than one is available), removal of non-brain tissue using a hybrid watershed/surface deformation procedure (Segonne et al. 2004), automated Talairach transformation, segmentation of the subcortical white matter and deep gray matter volumetric structures (including hippocampus, amygdala, caudate, putamen, and ventricles) (Fischl et al. 2002, 2004), intensity normalization (Sled et al. 1998), tessellation of the gray matter/white matter boundary, automated topology correction (Fischl et al. 2001; Ségonne et al. 2007), and surface deformation following intensity gradients to optimally place the gray/white and gray/cerebrospinal fluid borders at the location where the greatest shift in intensity defines the transition to the other tissue class (Dale et al. 1999; Dale and Sereno 1993).

In total, 48 cortical regions and 44 subcortical regions were generated, with typically 1 or 4 features in each group. The names of the cortical and subcortical regions are listed in Tables 1 and 2. For each cortical region, the cortical thickness average (TA), standard deviation of thickness (TS), surface area (SA), and cortical volume (CV) were calculated as features. For each subcortical region, the subcortical volume was calculated as a feature. The separate SA values for the left and right hemispheres and the total intracranial volume (ICV) were also included.

Table 1 Cortical features from the following 71 (= 35 × 2 + 1) cortical regions generated by FreeSurfer

    ID ROI name Laterality Type

    1 Banks superior temporal sulcus L, R CV, SA, TA, TS

    2 Caudal anterior cingulate cortex L, R CV, SA, TA, TS

    3 Caudal middle frontal gyrus L, R CV, SA, TA, TS

    4 Cuneus cortex L, R CV, SA, TA, TS

    5 Entorhinal cortex L, R CV, SA, TA, TS

    6 Frontal pole L, R CV, SA, TA, TS

    7 Fusiform gyrus L, R CV, SA, TA, TS

    8 Inferior parietal cortex L, R CV, SA, TA, TS

    9 Inferior temporal gyrus L, R CV, SA, TA, TS

    10 Insula L, R CV, SA, TA, TS

11 Isthmus cingulate L, R CV, SA, TA, TS

    12 Lateral occipital cortex L, R CV, SA, TA, TS

    13 Lateral orbital frontal cortex L, R CV, SA, TA, TS

    14 Lingual gyrus L, R CV, SA, TA, TS

    15 Medial orbital frontal cortex L, R CV, SA, TA, TS

    16 Middle temporal gyrus L, R CV, SA, TA, TS

    17 Paracentral lobule L, R CV, SA, TA, TS

    18 Parahippocampal gyrus L, R CV, SA, TA, TS

    19 Pars opercularis L, R CV, SA, TA, TS

    20 Pars orbitalis L, R CV, SA, TA, TS

    21 Pars triangularis L, R CV, SA, TA, TS

    22 Pericalcarine cortex L, R CV, SA, TA, TS

    23 Postcentral gyrus L, R CV, SA, TA, TS

    24 Posterior cingulate cortex L, R CV, SA, TA, TS

    25 Precentral gyrus L, R CV, SA, TA, TS

    26 Precuneus cortex L, R CV, SA, TA, TS

    27 Rostral anterior cingulate cortex L, R CV, SA, TA, TS

    28 Rostral middle frontal gyrus L, R CV, SA, TA, TS

    29 Superior frontal gyrus L, R CV, SA, TA, TS

    30 Superior parietal cortex L, R CV, SA, TA, TS

    31 Superior temporal gyrus L, R CV, SA, TA, TS

    32 Supramarginal gyrus L, R CV, SA, TA, TS

    33 Temporal pole L, R CV, SA, TA, TS

    34 Transverse temporal cortex L, R CV, SA, TA, TS

    35 Hemisphere L, R SA

    36 Total intracranial volume Bilateral CV

275 (= 34 × 2 × 4 + 1 × 2 × 1 + 1) cortical features calculated were analyzed in this study. Laterality indicates different feature types calculated for L (left hemisphere), R (right hemisphere) or Bilateral (whole hemisphere)

This yielded a total of p = 319 MRI features extracted from cortical/subcortical ROIs in each hemisphere (Tables 1 and 2). Details of the analysis procedure are available at http://adni.loni.ucla.edu/research/mri-post-processing/.

The ADNI project is a longitudinal study, with data collected repeatedly over a 6-month or 1-year interval. The



Table 2 Subcortical features from the following 44 (= 16 × 2 + 12) subcortical regions generated by FreeSurfer

ID ROI name Laterality Type

    1 Accumbens area L, R SV

    2 Amygdala L, R SV

    3 Caudate L, R SV

    4 Cerebellum cortex L, R SV

    5 Cerebellum white matter L, R SV

    6 Cerebral cortex L, R SV

    7 Cerebral white matter L, R SV

    8 Choroid plexus L, R SV

    9 Hippocampus L, R SV

    10 Inferior lateral ventricle L, R SV

    11 Lateral ventricle L, R SV

    12 Pallidum L, R SV

    13 Putamen L, R SV

    14 Thalamus L, R SV

    15 Ventricle diencephalon L, R SV

    16 Vessel L, R SV

    17 Brain stem Bilateral SV

    18 Corpus callosum anterior Bilateral SV

    19 Corpus callosum central Bilateral SV

    20 Corpus callosum middle anterior Bilateral SV

    21 Corpus callosum middle posterior Bilateral SV

    22 Corpus callosum posterior Bilateral SV

    23 Cerebrospinal fluid Bilateral SV

    24 Fourth ventricle Bilateral SV

    25 Non white matter hypointensities Bilateral SV

    26 Optic chiasm Bilateral SV

    27 Third ventricle Bilateral SV

    28 White matter hypointensities Bilateral SV

44 subcortical features calculated were analyzed in this study. Laterality indicates different feature types calculated for L (left hemisphere), R (right hemisphere) or Bilateral (whole hemisphere)

scheduled screening date for subjects becomes the baseline after approval, and the time point of each follow-up visit is denoted by its duration from the baseline. In our current work, we investigated the prediction performance of our method in inferring cognitive outcomes, based on a number of neuropsychological assessments, at the initial baseline. In this work, we further performed the following preprocessing steps:

– remove features with more than 10% missing entries (for all patients and all time points);
– remove the ROI whose name is "unknown";
– remove the instances with missing values of cognitive scores;
– exclude patients without baseline MRI records;
– complete the missing entries using the average value.

This yields a total of n = 788 subjects, who were then categorized into three baseline diagnostic groups: Cognitively Normal (CN, n1 = 225), Mild Cognitive Impairment (MCI, n2 = 390), and Alzheimer's Disease (AD, n3 = 173). Table 3 lists the demographic information of all subjects, including age, gender, and education. We used 10-fold cross-validation to evaluate our model and conduct the comparison. In each of the ten trials, a 5-fold nested cross-validation procedure was employed to tune the regularization parameters. The data were z-scored before applying the regression methods. The range of each parameter varied from 10^-1 to 10^3. The average (avg) and standard deviation (std) of each performance measure were calculated and are shown as avg ± std for each experiment. For each run, all methods received exactly the same training and test sets. The reported results are the best results of each method with the optimal parameters. For predictive modeling, all the cognitive assessments (a total of 20 tasks) in Table 4 were examined. To the best of our knowledge, no previous works have used all the cognitive scores to train and evaluate their MTL models.
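The evaluation protocol above (10-fold outer cross-validation, 5-fold inner cross-validation for tuning, z-scoring on the training folds, and a 10^-1 to 10^3 parameter grid) can be sketched with scikit-learn; Ridge stands in here for the actual MTL models, and the data shapes are hypothetical:

```python
import numpy as np
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))   # hypothetical stand-in for the 788 x 319 data
y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(100)

param_grid = {"ridge__alpha": np.logspace(-1, 3, 5)}   # the 10^-1 .. 10^3 range
outer = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train, test in outer.split(X):
    # 5-fold inner CV tunes the regularization parameter; z-scoring is fit
    # on the training fold only, inside the pipeline.
    model = GridSearchCV(make_pipeline(StandardScaler(), Ridge()),
                         param_grid, cv=5)
    model.fit(X[train], y[train])
    scores.append(model.score(X[test], y[test]))
```

Fitting the scaler inside the pipeline keeps the test folds out of the z-scoring statistics, which is what a nested procedure requires.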

For the quantitative performance evaluation, we employed the metrics of Correlation Coefficient (CC) and Root Mean Squared Error (rMSE) between the predicted clinical scores and the target clinical scores for each regression task. To evaluate the overall performance across all tasks, the normalized mean squared error (nMSE) (Argyriou et al. 2008; Zhou et al. 2013) and weighted R-value (wR) (Stonnington et al. 2010) were used. The rMSE, CC, nMSE and wR are defined as follows:

$\mathrm{rMSE}(y, \hat{y}) = \sqrt{\frac{\|y - \hat{y}\|_2^2}{n}}$   (28)

$\mathrm{Corr}(y, \hat{y}) = \frac{\mathrm{cov}(y, \hat{y})}{\sigma(y)\,\sigma(\hat{y})}$   (29)

where y is the ground truth of the target for a single task and $\hat{y}$ is the corresponding prediction according to a prediction model, cov is the covariance, and $\sigma$ is the standard deviation.

$\mathrm{nMSE}(Y, \hat{Y}) = \frac{\sum_{h=1}^{k} \|Y_h - \hat{Y}_h\|_2^2 / \sigma(Y_h)}{\sum_{h=1}^{k} n_h}$   (30)

    Table 3 Summary of ADNI dataset and subject information

    Category CN MCI AD

    Number 225 390 173

    Gender (M/F) 116/109 252/138 88/85

Age (y, ag ± sd) 75.87 ± 5.04 74.75 ± 7.39 75.42 ± 7.25
Education (y, ag ± sd) 16.03 ± 2.85 15.67 ± 2.95 14.65 ± 3.17

    M, male; F, female; y, years; ag, average; sd, standard deviation


Table 4 Description of the cognitive scores considered in the experiments

    Num Score name Description

    1 ADAS Alzheimer’s disease assessment scale

    2 MMSE Mini-mental state exam

    3 RAVLT TOTAL Total score of the first 5 learning trials

    4 TOT6 Trial 6 total number of words recalled

    5 TOTB Immediately after the fifth learning trial

    6 T30 30 minute delay total number of words recalled

    7 RECOG 30 minute delay recognition

    8 FLU ANIM Animal total score

    9 VEG Vegetable total score

    10 TRAILS A Trail making test A score

    11 B Trail making test B score

    12 LOGMEM IMMTOTAL Immediate recall

    13 DELTOTAL Delayed recall

    14 CLOCK DRAW Clock drawing

    15 COPYSCORE Clock copying

    16 BOSNAM Total number correct

    17 ANART ANART total score

18 DSPAN FOR Digit span forward

    19 BAC Digit span backward

    20 DIGIT Digit symbol substitution

$\mathrm{wR}(Y, \hat{Y}) = \frac{\sum_{h=1}^{k} \mathrm{Corr}(Y_h, \hat{Y}_h)\, n_h}{\sum_{h=1}^{k} n_h}$   (31)

where $Y$ and $\hat{Y}$ are the ground truth cognitive scores and the predicted cognitive scores, respectively.

Smaller values of nMSE and rMSE and larger values of CC and wR indicate better regression performance. We report the mean and standard deviation over 10 experimental iterations on different splits of the data for all comparative experiments. A Student's t-test at a significance level of 0.05 was performed to determine whether the performance differences are significant.
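These four metrics can be computed directly; a small self-contained sketch (per-task scores are passed as lists of vectors, and all data here are toy values):

```python
import numpy as np

def rmse(y, y_hat):                     # Eq. 28
    return np.sqrt(np.sum((y - y_hat) ** 2) / len(y))

def cc(y, y_hat):                       # Eq. 29; corrcoef = cov/(sigma*sigma)
    return np.corrcoef(y, y_hat)[0, 1]

def nmse(Ys, Yhats):                    # Eq. 30, normalizing by sigma(Y_h)
    num = sum(np.sum((y - yh) ** 2) / np.std(y) for y, yh in zip(Ys, Yhats))
    return num / sum(len(y) for y in Ys)

def wr(Ys, Yhats):                      # Eq. 31, sample-size-weighted CC
    num = sum(cc(y, yh) * len(y) for y, yh in zip(Ys, Yhats))
    return num / sum(len(y) for y in Ys)

y = np.array([1.0, 2.0, 3.0, 4.0])
assert rmse(y, y) == 0.0 and np.isclose(cc(y, 2 * y), 1.0)
assert nmse([y], [y]) == 0.0 and np.isclose(wr([y], [y]), 1.0)
```

Note that `nmse` follows Eq. 30 literally, dividing each task's squared error by $\sigma(Y_h)$ as printed.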

    Comparison with Baseline Comparable Methods

In this section, we conduct an empirical evaluation of the proposed methods by comparing them with two single-task learning methods, Lasso and Ridge (both applied independently to each task), and with representative multi-task learning methods:

1. Multi-task feature learning (MTFL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda\|\Theta\|_{2,1}$.
2. Multi-task feature learning combined with lasso (SGL-MTFL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_{2,1} + \lambda_2\|\Theta\|_1$.
3. Fused lasso regularized multi-task learning (FL-MTL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda\|\Theta Z\|_1$.
4. Fused group lasso regularized multi-task learning (FGL-MTL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda\|\Theta Z\|_{G_{2,1}}$.
5. Fused lasso regularized multi-task feature learning (FL-MTFL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_{2,1} + \lambda_2\|\Theta Z\|_1$.
6. Group lasso regularized multi-task feature learning (GL-MTFL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_{2,1} + \lambda_2\|\Theta\|_{G_{2,1}}$.
7. Fused lasso regularized SGL-MTL (FSGL-MTL): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_1 + \lambda_2\|\Theta Z\|_{G_{2,1}}$.
8. Fused group lasso regularized MTFL (FGL-MTFL, the proposed method): $\min_\Theta \frac{1}{2}\|Y - X\Theta\|_F^2 + \lambda_1\|\Theta\|_{2,1} + \lambda_2\|\Theta Z\|_{G_{2,1}}$.
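The regularizers that distinguish these formulations can be evaluated directly on a coefficient matrix; a small sketch (the matrix and the group index set are hypothetical):

```python
import numpy as np

Theta = np.arange(12, dtype=float).reshape(4, 3)  # hypothetical 4 features x 3 tasks

l1  = np.abs(Theta).sum()                 # ||Theta||_1: element-wise lasso
l21 = np.linalg.norm(Theta, axis=1).sum() # ||Theta||_{2,1}: l2 per row, then sum

# G2,1-norm: rows are partitioned into disjoint feature groups (in the paper,
# the 1 or 4 measures of one ROI form a group), and the l2 norm is taken per
# group and per task before summing.
groups = [[0, 1], [2, 3]]                 # hypothetical disjoint row groups
g21 = sum(np.linalg.norm(Theta[g, m])
          for g in groups for m in range(Theta.shape[1]))
```

The fused variants apply the same norms to $\Theta Z$ instead of $\Theta$, i.e. to differences of task columns along the task graph.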

The average and standard deviation of each performance measure were calculated by 10-fold cross-validation on different splits of the data, and are shown in Tables 5 and 6. It is worth noting that the same training and testing data were used across experiments for all methods, for fair comparison. Note that all pairwise links between tasks were incorporated into both the fused lasso and the fused group lasso regularized methods, and the graph was created using only the training data.

From the experimental results in Tables 5 and 6, we observe the following:

1. FL-MTFL and FGL-MTFL both demonstrated improved performance over the other baseline methods in terms of nMSE and wR, while FGL-MTFL performed the best among all competing methods. The


Table 5 Performance comparison of various methods in terms of CC (the first 20 columns) and wR (the last column) on 10-fold cross-validation cognitive prediction tasks (RAVLT: TOTAL, TOT6, TOTB, T30, RECOG; FLU: ANIM, VEG; TRAILS: A, B; LOGMEM: IMMTOTAL, DELTOTAL; CLOCK: DRAW, COPYSCOR; DSPAN: FOR, BAC)

Method      ADAS         MMSE         TOTAL        TOT6         TOTB         T30          RECOG        ANIM         VEG          A            B
Ridge       0.598±0.053  0.417±0.073  0.407±0.120  0.352±0.131  0.145±0.093  0.367±0.132  0.259±0.107  0.200±0.135  0.390±0.132  0.316±0.111  0.363±0.125
Lasso       0.659±0.066  0.547±0.065  0.512±0.108  0.501±0.117  0.312±0.083  0.514±0.116  0.407±0.112  0.353±0.095  0.504±0.089  0.382±0.099  0.477±0.094
MTFL        0.667±0.065  0.536±0.065  0.515±0.121  0.491±0.132  0.297±0.094  0.509±0.123  0.405±0.123  0.377±0.091  0.505±0.089  0.413±0.087  0.476±0.098
FL-MTL      0.641±0.074  0.523±0.095  0.491±0.112  0.472±0.102  0.287±0.085  0.473±0.096  0.409±0.085  0.355±0.107  0.485±0.076  0.352±0.085  0.385±0.159
FGL-MTL     0.662±0.060  0.557±0.065  0.479±0.109  0.501±0.116  0.286±0.080  0.516±0.113  0.409±0.115  0.370±0.106  0.512±0.077  0.347±0.101  0.351±0.119
SGL-MTFL    0.667±0.063  0.543±0.061  0.517±0.120  0.495±0.128  0.308±0.089  0.513±0.120  0.408±0.123  0.380±0.090  0.506±0.089  0.414±0.086  0.477±0.097
GL-MTFL     0.668±0.063  0.546±0.062  0.517±0.119  0.496±0.129  0.308±0.087  0.514±0.120  0.410±0.123  0.380±0.090  0.508±0.088  0.414±0.086  0.477±0.097
FL-MTFL     0.665±0.067  0.548±0.074  0.522±0.118  0.498±0.121  0.324±0.084  0.513±0.115  0.420±0.127  0.396±0.086  0.510±0.084  0.414±0.086  0.479±0.095
FSGL-MTL    0.623±0.071  0.503±0.083  0.519±0.091  0.437±0.101  0.306±0.065  0.430±0.092  0.352±0.122  0.387±0.072  0.483±0.080  0.379±0.104  0.476±0.100
FGL-MTFL    0.668±0.068  0.549±0.068  0.522±0.114  0.499±0.121  0.324±0.069  0.514±0.110  0.429±0.121  0.399±0.081  0.505±0.087  0.419±0.086  0.481±0.095

Method      IMMTOTAL     DELTOTAL     DRAW         COPYSCOR     BOSNAM       ANART        FOR          BAC          DIGIT        wR
Ridge       0.417±0.108  0.426±0.121  0.222±0.100  0.126±0.091  0.359±0.149  0.054±0.071  0.002±0.055  0.041±0.114  0.389±0.049  0.293±0.046*
Lasso       0.493±0.093  0.524±0.107  0.374±0.078  0.209±0.050  0.472±0.104  0.129±0.078  0.025±0.083  0.167±0.132  0.415±0.099  0.399±0.049*
MTFL        0.507±0.085  0.519±0.099  0.380±0.097  0.199±0.114  0.468±0.126  0.104±0.080  0.089±0.106  0.153±0.081  0.478±0.064  0.404±0.055*
FL-MTL      0.470±0.097  0.514±0.108  0.379±0.093  0.221±0.102  0.442±0.085  0.094±0.104  0.083±0.102  0.177±0.108  0.414±0.083  0.383±0.046*
FGL-MTL     0.499±0.093  0.531±0.106  0.378±0.087  0.212±0.098  0.479±0.096  0.139±0.079  0.110±0.105  0.205±0.108  0.450±0.049  0.400±0.047*
SGL-MTFL    0.508±0.086  0.520±0.098  0.394±0.092  0.210±0.096  0.471±0.118  0.110±0.081  0.099±0.104  0.167±0.087  0.477±0.063  0.409±0.054*
GL-MTFL     0.509±0.084  0.521±0.098  0.395±0.087  0.220±0.102  0.471±0.118  0.109±0.081  0.089±0.103  0.169±0.085  0.477±0.063  0.410±0.054*
FL-MTFL     0.510±0.084  0.526±0.095  0.408±0.083  0.251±0.092  0.471±0.096  0.133±0.087  0.093±0.098  0.208±0.113  0.478±0.069  0.418±0.054
FSGL-MTL    0.452±0.073  0.492±0.087  0.398±0.074  0.2412±0.091 0.440±0.070  0.119±0.113  0.109±0.113  0.209±0.126  0.435±0.085  0.390±0.045*
FGL-MTFL    0.514±0.084  0.533±0.094  0.408±0.083  0.252±0.092  0.473±0.089  0.137±0.082  0.096±0.090  0.218±0.113  0.484±0.070  0.421±0.052

The superscript symbol * indicates that FGL-MTFL significantly outperformed that method on that score (Student's t-test at a level of 0.05). The bold value indicates the best performance


Table 6 Performance comparison of various methods in terms of rMSE (the first 20 columns) and nMSE (the last column) on 10-fold cross-validation cognitive prediction tasks (column groups as in Table 5)

Method      ADAS         MMSE         TOTAL        TOT6         TOTB         T30          RECOG        ANIM         VEG          A            B
Ridge       7.445±0.370  2.568±0.146  11.16±0.735  3.909±0.366  1.982±0.125  4.061±0.287  4.311±0.430  6.307±0.552  4.276±0.386  26.19±3.764  80.01±8.103
Lasso       6.727±0.424  2.215±0.088  9.909±0.907  3.305±0.266  1.666±0.166  3.421±0.244  3.634±0.247  5.464±0.486  3.690±0.192  23.29±4.297  69.78±5.152
MTFL        6.662±0.411  2.191±0.106  9.656±0.695  3.324±0.260  1.671±0.149  3.441±0.233  3.626±0.272  5.266±0.448  3.676±0.180  23.01±3.492  69.88±5.281
FL-MTL      7.012±0.471  2.295±0.226  10.02±0.830  3.429±0.435  1.796±0.199  3.598±0.536  3.684±0.319  5.362±0.617  3.772±0.239  24.77±4.299  78.18±10.52
FGL-MTL     6.691±0.458  2.157±0.097  10.04±0.593  3.309±0.271  1.676±0.153  3.420±0.235  3.627±0.265  5.280±0.471  3.672±0.182  24.85±3.600  81.07±7.425
SGL-MTFL    6.653±0.427  2.191±0.098  9.647±0.755  3.315±0.271  1.664±0.150  3.432±0.246  3.627±0.246  5.260±0.499  3.676±0.208  23.00±3.568  69.82±5.184
GL-MTFL     6.650±0.429  2.186±0.099  9.644±0.754  3.315±0.270  1.664±0.150  3.431±0.248  3.623±0.247  5.261±0.497  3.673±0.207  22.99±3.566  69.82±5.177
FL-MTFL     6.680±0.445  2.204±0.096  9.6223±0.784 3.324±0.273  1.663±0.169  3.445±0.286  3.617±0.192  5.220±0.503  3.677±0.222  22.97±3.660  69.67±5.007
FSGL-MTL    7.183±0.730  2.523±0.124  9.811±0.859  3.623±0.481  2.009±0.180  3.791±0.537  3.912±0.222  5.365±0.635  3.929±0.383  23.27±4.204  69.75±5.084
FGL-MTFL    6.678±0.474  2.214±0.081  9.631±0.788  3.335±0.276  1.668±0.167  3.456±0.305  3.610±0.186  5.221±0.493  3.697±0.223  22.90±3.666  69.53±5.011

Method      IMMTOTAL     DELTOTAL     DRAW         COPYSCOR     BOSNAM       ANART        FOR          BAC          DIGIT        nMSE
Ridge       4.696±0.366  5.277±0.509  1.153±0.108  0.775±0.060  4.563±0.540  11.24±0.756  2.570±0.262  2.559±0.193  12.76±1.273  10.36±1.089*
Lasso       4.198±0.327  4.586±0.471  0.990±0.099  0.657±0.082  3.992±0.457  9.813±0.651  2.100±0.285  2.134±0.191  11.65±1.318  7.898±0.586*
MTFL        4.145±0.303  4.589±0.436  0.967±0.120  0.647±0.103  3.952±0.478  9.611±0.722  2.010±0.123  2.160±0.178  11.22±1.225  7.762±0.633*
FL-MTL      4.319±0.496  4.672±0.689  1.174±0.263  0.942±0.323  4.073±0.494  9.806±0.850  2.131±0.252  2.225±0.179  11.76±1.391  9.102±1.134*
FGL-MTL     4.170±0.334  4.560±0.494  0.973±0.112  0.656±0.081  3.938±0.471  9.746±0.640  1.989±0.136  2.116±0.199  11.58±1.287  9.114±0.923*
SGL-MTFL    4.143±0.327  4.585±0.456  0.960±0.117  0.644±0.093  3.965±0.425  9.584±0.743  1.998±0.134  2.143±0.186  11.23±1.261  7.742±0.617*
GL-MTFL     4.140±0.326  4.584±0.456  0.960±0.114  0.643±0.092  3.965±0.424  9.585±0.743  2.001±0.133  2.142±0.186  11.23±1.261  7.740±0.616*
FL-MTFL     4.147±0.354  4.582±0.491  0.974±0.101  0.654±0.076  3.978±0.459  9.491±0.728  1.998±0.145  2.121±0.187  11.21±1.263  7.711±0.585*
FSGL-MTL    4.467±0.525  4.859±0.719  1.485±0.072  1.307±0.049  4.208±0.411  9.553±0.721  2.289±0.208  2.394±0.244  11.56±1.320  8.329±0.593*
FGL-MTFL    4.139±0.366  4.566±0.505  0.973±0.102  0.654±0.076  3.984±0.485  9.471±0.708  1.997±0.149  2.119±0.191  11.17±1.241  7.689±0.565

The superscript symbol * indicates that FGL-MTFL significantly outperformed that method on that score (Student's t-test at a level of 0.05). The bold value indicates the best performance

  • Neuroinform

t-test results show that FGL-MTFL significantly outperforms all the other methods in terms of nMSE, and all the other methods except FL-MTFL in terms of CC. Thus, simultaneously exploiting the structure among the tasks and among the features resulted in significantly better prediction performance.

2. The norms of both the fused lasso and the fused group lasso can improve the performance of the traditional MTFL, which demonstrates that considering the local structure within the tasks can improve the overall prediction performance. A fundamental step is to estimate the true relationships among tasks, thus promoting the most appropriate information sharing among related tasks while avoiding the use of information from unrelated tasks.

3. We investigated the effect of the group penalty in our model. The traditional MTFL considers only the sparsity of the regression coefficients, thus failing to capture the group structure of the features in the data.

From Tables 5 and 6, we can see that both FL-MTFL and FGL-MTFL are better than the traditional MTFL. This is because the assumption in MTFL is a strict constraint and may limit the flexibility of the multi-task model, as described in "ℓ2,1-norm". The extraction of multiple features to measure the atrophy of each imaging biomarker can further improve prediction performance by capturing inherent feature structures.

GL-MTFL is similar to FGL-MTFL, but its G2,1-norm regularization term ignores the structure of the tasks. Our approach can flexibly take into account the complex relationships among imaging markers via the fused lasso regularization, rather than relying on a simple grouping scheme. Moreover, compared with GL-MTFL, FGL-MTFL can flexibly consider the complex relationships among outcomes in a group format rather than within a simple group lasso scheme. Overall, FGL-MTFL achieved better performance than

[Figure 3 appears here. Fig. 3 Scatter plots of actual versus predicted values of cognitive scores on each fold of the testing data, using the MTFL and FGL-MTFL methods based on MRI features. (a) ADAS: CC(MTFL) = 0.666, CC(FGL-MTFL) = 0.667. (b) MMSE: CC(MTFL) = 0.535, CC(FGL-MTFL) = 0.548. (c) TOTAL: CC(MTFL) = 0.515, CC(FGL-MTFL) = 0.522. (d) ANIM: CC(MTFL) = 0.376, CC(FGL-MTFL) = 0.399.]


FL-MTFL, demonstrating the benefit of employing group structural information among the features.

4. All compared multi-task learning methods improve predictive performance over the independent regression algorithms (Ridge, Lasso, GOSCAR, and ncFGS). This justifies the motivation of learning multiple tasks simultaneously.

Additionally, we show the scatter plots of the predicted values versus the actual values for ADAS, MMSE, TOTAL, and ANIM on the testing data in the cross-validation in Fig. 3.

To investigate the influence of edge weight on performance, we compared our proposed FGL-MTFL with an unweighted FGL-MTFL (λ1‖Θ‖2,1 + λ2 ∑(m,l)∈E ‖θ·m − sign(rm,l)θ·l‖G2,1), which only considers the graph topology of tasks and treats all links equally. The result, shown in Table 7, demonstrates an obvious improvement of the weighted FGL-MTFL over the unweighted one.
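As a concrete illustration, the weighted and unweighted variants of the fused group lasso term could be evaluated as in the following minimal sketch; the function name, the edge set covering all task pairs, and the feature-grouping interface are our illustrative assumptions, not the paper's code:

```python
import numpy as np

def fused_group_lasso_penalty(Theta, R, groups, weighted=True):
    """Fused group lasso term over all task pairs (m, l), m < l.

    Theta  : (d, k) coefficient matrix (d features, k tasks)
    R      : (k, k) estimated task-correlation matrix
    groups : list of index arrays partitioning the d features

    For each pair, the fused difference theta_m - sign(r_ml) * theta_l
    is penalized by a G2,1-type norm (sum of group-wise l2 norms); the
    weighted variant additionally scales each link by |r_ml|, so that
    weakly correlated task pairs are fused less strongly.
    """
    d, k = Theta.shape
    total = 0.0
    for m in range(k):
        for l in range(m + 1, k):
            diff = Theta[:, m] - np.sign(R[m, l]) * Theta[:, l]
            g21 = sum(np.linalg.norm(diff[g]) for g in groups)
            total += (abs(R[m, l]) if weighted else 1.0) * g21
    return total
```

With |r_ml| ≤ 1, the weighted penalty is never larger than the unweighted one, which matches the intuition that only reliable links should pull task models together.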

Comparison with the State-of-the-Art MTL Methods

To illustrate how well our FGL-MTFL works by means of modeling the correlation among the tasks, we comprehensively compared our proposed methods with several popular state-of-the-art related methods. The representative multi-task learning algorithms used for comparison include the following models:

1. Robust Multi-Task Feature Learning (RMTL) (Chen et al. 2011):

RMTL (minΘ L(X, Y, Θ) + λ1‖P‖∗ + λ2‖S‖2,1, subject to Θ = P + S) assumes that the model Θ can be decomposed into two components: a shared feature structure P that captures task relatedness and a group-sparse structure S that can detect outliers.

2. Clustered Multi-Task Learning (CMTL) (Zhou et al. 2011):

CMTL (minΘ,M:MᵀM=Ic L(X, Y, Θ) + λ1(Tr(ΘᵀΘ) − Tr(MᵀΘᵀΘM)) + λ2Tr(ΘᵀΘ), where M ∈ R^{k×c} is an orthogonal cluster indicator matrix and the tasks are clustered into c < k clusters) incorporates a regularization term to induce clustering between tasks and then shares information only among tasks belonging to the same cluster. In CMTL, the number of clusters is set to 11 because the 20 tasks belong to 11 sets of cognitive functions.

3. Trace-Norm Regularized Multi-Task Learning (Trace) (Ji and Ye 2009): assumes that all models share a common low-dimensional subspace (minΘ L(X, Y, Θ) + λ‖Θ‖∗).

4. Sparse Regularized Multi-Task Learning (SRMTL) (Zhou):

SRMTL (minΘ L(X, Y, Θ) + λ1‖ΘZ‖²F + λ2‖Θ‖1, where Z ∈ R^{k×k}) contains two regularization

Table 7 The influence of the weighting scheme of the fused group lasso norm in FGL-MTFL in terms of CC and wR

Method | ADAS | MMSE | RAVLT-TOTAL | RAVLT-TOT6 | RAVLT-TOTB | RAVLT-T30 | RAVLT-RECOG | FLU-ANIM | FLU-VEG | TRAILS-A | TRAILS-B
unweighted | 0.669±0.062 | 0.542±0.055 | 0.517±0.116 | 0.492±0.128 | 0.295±0.095 | 0.510±0.120 | 0.399±0.120 | 0.381±0.090 | 0.507±0.091 | 0.416±0.085 | 0.478±0.098
weighted | 0.668±0.068 | 0.549±0.068 | 0.522±0.114 | 0.499±0.121 | 0.324±0.069 | 0.514±0.110 | 0.429±0.121 | 0.399±0.081 | 0.505±0.087 | 0.419±0.086 | 0.481±0.095

Method | LOGMEM-IMMTOTAL | LOGMEM-DELTOTAL | CLOCK-DRAW | CLOCK-COPY | BOSNAM | ANART | DSPAN-FOR | DSPAN-BAC | DIGIT-SCOR | wR
unweighted | 0.508±0.086 | 0.519±0.099 | 0.341±0.077 | 0.090±0.132 | 0.471±0.113 | 0.107±0.085 | 0.086±0.114 | 0.141±0.098 | 0.479±0.065 | 0.397±0.052∗
weighted | 0.514±0.084 | 0.533±0.094 | 0.408±0.083 | 0.252±0.092 | 0.473±0.089 | 0.137±0.082 | 0.096±0.090 | 0.218±0.113 | 0.484±0.070 | 0.421±0.052

The first 20 columns are CC and the last one is wR. The bold value indicates the best performance


Table 8 Performance comparison of various methods in terms of CC (the first 20 columns) and wR (the last column) on 10-fold cross validation for all the cognitive prediction tasks

Method | ADAS | MMSE | RAVLT-TOTAL | RAVLT-TOT6 | RAVLT-TOTB | RAVLT-T30 | RAVLT-RECOG | FLU-ANIM | FLU-VEG | TRAILS-A | TRAILS-B
Robust | 0.604±0.056 | 0.394±0.085 | 0.423±0.132 | 0.420±0.143 | 0.248±0.121 | 0.427±0.119 | 0.339±0.122 | 0.272±0.130 | 0.444±0.102 | 0.309±0.114 | 0.319±0.121
CMTL | 0.646±0.060 | 0.429±0.070 | 0.494±0.090 | 0.474±0.106 | 0.263±0.084 | 0.480±0.102 | 0.381±0.094 | 0.360±0.088 | 0.494±0.086 | 0.419±0.092 | 0.479±0.088
Trace | 0.559±0.070 | 0.255±0.121 | 0.413±0.125 | 0.428±0.124 | 0.227±0.142 | 0.445±0.111 | 0.356±0.135 | 0.265±0.155 | 0.419±0.111 | 0.282±0.104 | 0.313±0.094
SRMTL | 0.664±0.062 | 0.542±0.064 | 0.495±0.095 | 0.497±0.117 | 0.323±0.081 | 0.514±0.112 | 0.401±0.118 | 0.373±0.101 | 0.502±0.089 | 0.376±0.095 | 0.425±0.122
p-MSSL | 0.666±0.071 | 0.541±0.074 | 0.520±0.099 | 0.493±0.113 | 0.312±0.066 | 0.510±0.107 | 0.420±0.128 | 0.394±0.085 | 0.497±0.080 | 0.378±0.087 | 0.398±0.114
G-SMuRFS | 0.670±0.062 | 0.535±0.060 | 0.506±0.114 | 0.482±0.124 | 0.284±0.100 | 0.492±0.116 | 0.391±0.113 | 0.366±0.095 | 0.495±0.092 | 0.409±0.085 | 0.471±0.101
MTRL | 0.636±0.056 | 0.543±0.071 | 0.504±0.081 | 0.476±0.104 | 0.313±0.070 | 0.478±0.095 | 0.387±0.113 | 0.395±0.082 | 0.506±0.081 | 0.417±0.097 | 0.477±0.091
FGL-MTFL | 0.668±0.068 | 0.549±0.068 | 0.522±0.114 | 0.499±0.121 | 0.324±0.069 | 0.514±0.110 | 0.429±0.121 | 0.399±0.081 | 0.505±0.087 | 0.419±0.086 | 0.481±0.095

Method | LOGMEM-IMMTOTAL | LOGMEM-DELTOTAL | CLOCK-DRAW | CLOCK-COPY | BOSNAM | ANART | DSPAN-FOR | DSPAN-BAC | DIGIT-SCOR | wR
Robust | 0.448±0.109 | 0.470±0.106 | 0.342±0.099 | 0.133±0.112 | 0.373±0.130 | 0.050±0.083 | 0.007±0.120 | 0.130±0.089 | 0.392±0.049 | 0.327±0.059∗
CMTL | 0.497±0.086 | 0.515±0.096 | 0.370±0.077 | 0.203±0.114 | 0.440±0.079 | 0.134±0.080 | 0.044±0.130 | 0.188±0.116 | 0.468±0.070 | 0.389±0.046∗
Trace | 0.423±0.121 | 0.470±0.119 | 0.294±0.123 | 0.065±0.136 | 0.236±0.141 | 0.045±0.082 | 0.041±0.137 | 0.149±0.069 | 0.323±0.094 | 0.301±0.070∗
SRMTL | 0.504±0.091 | 0.525±0.102 | 0.362±0.077 | 0.084±0.120 | 0.482±0.095 | 0.117±0.082 | 0.092±0.092 | 0.178±0.125 | 0.463±0.051 | 0.396±0.046∗
p-MSSL | 0.512±0.087 | 0.538±0.098 | 0.384±0.063 | 0.236±0.110 | 0.483±0.085 | 0.150±0.078 | 0.096±0.113 | 0.197±0.128 | 0.469±0.074 | 0.410±0.045∗
G-SMuRFS | 0.502±0.091 | 0.514±0.105 | 0.366±0.100 | 0.197±0.110 | 0.475±0.129 | 0.091±0.078 | 0.070±0.102 | 0.132±0.104 | 0.477±0.059 | 0.396±0.054∗
MTRL | 0.493±0.075 | 0.507±0.089 | 0.402±0.076 | 0.262±0.084 | 0.470±0.087 | 0.143±0.089 | 0.108±0.102 | 0.223±0.128 | 0.469±0.074 | 0.410±0.045∗
FGL-MTFL | 0.514±0.084 | 0.533±0.094 | 0.408±0.083 | 0.252±0.092 | 0.473±0.089 | 0.137±0.082 | 0.096±0.090 | 0.218±0.113 | 0.484±0.070 | 0.421±0.052

The superscript symbol ∗ indicates that FGL-MTFL significantly outperformed that method on that score (Student's t-test at a level of 0.05). The bold value indicates the best performance


Table 9 Performance comparison of various methods in terms of rMSE (the first 20 columns) and nMSE (the last column) on 10-fold cross validation for the cognitive prediction tasks

Method | ADAS | MMSE | RAVLT-TOTAL | RAVLT-TOT6 | RAVLT-TOTB | RAVLT-T30 | RAVLT-RECOG | FLU-ANIM | FLU-VEG | TRAILS-A | TRAILS-B
Robust | 7.339±0.549 | 2.811±0.131 | 10.82±0.815 | 3.587±0.333 | 1.729±0.139 | 3.742±0.238 | 3.887±0.364 | 5.762±0.491 | 3.948±0.307 | 26.52±3.501 | 85.30±7.559
CMTL | 15.88±1.291 | 21.52±0.443 | 27.94±1.645 | 4.944±0.654 | 3.520±0.172 | 4.562±0.623 | 8.824±0.537 | 14.06±1.105 | 9.731±0.777 | 43.51±4.236 | 126.2±11.40
Trace | 7.969±0.925 | 4.310±1.464 | 11.50±1.000 | 3.629±0.325 | 1.805±0.136 | 3.746±0.220 | 3.988±0.562 | 6.086±0.724 | 4.179±0.593 | 28.41±2.181 | 88.38±5.778
SRMTL | 6.925±0.464 | 2.405±0.316 | 10.40±0.814 | 4.067±0.878 | 3.028±1.786 | 4.233±0.940 | 4.116±0.737 | 5.432±0.495 | 4.243±0.869 | 24.07±3.794 | 75.16±8.209
p-MSSL | 6.698±0.521 | 2.330±0.117 | 9.675±0.803 | 3.400±0.383 | 1.802±0.177 | 3.514±0.397 | 3.693±0.211 | 5.267±0.577 | 3.780±0.303 | 23.49±3.806 | 75.87±7.699
G-SMuRFS | 6.641±0.422 | 2.209±0.107 | 9.741±0.708 | 3.348±0.283 | 1.684±0.150 | 3.483±0.240 | 3.666±0.262 | 5.307±0.482 | 3.719±0.196 | 23.07±3.581 | 70.23±5.622
MTRL | 7.010±0.567 | 2.197±0.101 | 9.825±0.749 | 3.386±0.282 | 1.659±0.155 | 3.540±0.323 | 3.682±0.232 | 5.232±0.436 | 3.708±0.181 | 22.95±4.051 | 69.57±4.612
FGL-MTFL | 6.678±0.474 | 2.214±0.081 | 9.631±0.788 | 3.335±0.276 | 1.668±0.167 | 3.456±0.305 | 3.610±0.186 | 5.221±0.493 | 3.697±0.223 | 22.90±3.666 | 69.53±5.011

Method | LOGMEM-IMMTOTAL | LOGMEM-DELTOTAL | CLOCK-DRAW | CLOCK-COPY | BOSNAM | ANART | DSPAN-FOR | DSPAN-BAC | DIGIT-SCOR | nMSE
Robust | 4.436±0.374 | 4.909±0.431 | 1.002±0.125 | 0.699±0.098 | 4.488±0.355 | 10.74±0.650 | 2.146±0.143 | 2.228±0.182 | 12.55±1.275 | 10.44±1.070∗
CMTL | 7.890±0.807 | 6.559±0.976 | 3.465±0.099 | 3.771±0.093 | 20.76±0.510 | 13.83±1.009 | 6.899±0.230 | 5.386±0.251 | 31.85±1.789 | 47.57±2.176∗
Trace | 4.649±0.487 | 4.994±0.645 | 1.134±0.255 | 0.939±0.306 | 5.595±1.363 | 10.97±0.858 | 2.313±0.360 | 2.331±0.227 | 13.80±1.025 | 11.88±1.516∗
SRMTL | 4.896±1.048 | 5.197±0.629 | 2.698±2.063 | 3.356±3.077 | 4.040±0.519 | 10.08±0.852 | 3.635±2.039 | 3.130±1.288 | 12.15±1.571 | 11.68±4.228∗
p-MSSL | 4.199±0.440 | 4.586±0.598 | 1.200±0.058 | 0.953±0.049 | 4.022±0.390 | 9.481±0.719 | 2.114±0.184 | 2.231±0.222 | 11.30±1.162 | 8.518±0.771∗
G-SMuRFS | 4.168±0.310 | 4.615±0.454 | 0.976±0.116 | 0.653±0.088 | 3.960±0.408 | 9.681±0.731 | 2.025±0.127 | 2.179±0.184 | 11.26±1.246 | 7.844±0.668∗
MTRL | 4.223±0.308 | 4.685±0.468 | 0.959±0.119 | 0.632±0.099 | 3.966±0.540 | 9.443±0.611 | 1.985±0.136 | 2.106±0.187 | 11.278±1.315 | 7.761±0.045
FGL-MTFL | 4.139±0.366 | 4.566±0.505 | 0.973±0.102 | 0.654±0.076 | 3.984±0.485 | 9.471±0.708 | 1.997±0.149 | 2.119±0.191 | 11.17±1.241 | 7.689±0.565

The superscript symbol ∗ indicates that FGL-MTFL significantly outperformed that method on that score (Student's t-test at a level of 0.05). The bold value indicates the best performance


processes: (1) all tasks are regularized by their mean value, and therefore knowledge obtained from one task can be utilized by other tasks via the mean value; (2) sparsity is enforced in the learning with the ℓ1-norm.

5. Multi-task Sparse Structure Learning from task parameters (p-MSSL) (Goncalves et al. 2014):

p-MSSL (minΘ,Ω≻0 L(X, Y, Θ) − (k/2) log |Ω| + Tr(ΘΩΘᵀ) + λ1‖Ω‖1 + λ2‖Θ‖1, where Ω ∈ R^{k×k} is a matrix that captures the task relationship structure).

6. Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) (Yan et al. 2015):

G-SMuRFS (minΘ L(X, Y, Θ) + λ1‖Θ‖2,1 + λ2 ∑l=1..q wl √(∑j∈Gl ‖θj·‖²)) takes into account coupled features and group sparsity across tasks. Parameters for these methods are set following the same approach as that used for the baseline methods. Tables 8 and 9 show the results of the compared MTL methods in terms of rMSE, CC, nMSE and wR.

7. Multi-Task Relationship Learning (MTRL) (Zhang and Yeung 2012b):

MTRL (minW,b,Ω ∑i=1..m (1/ni) ∑j=1..ni (yij − wiᵀxij − bi)² + (λ1/2) tr(WWᵀ) + (λ2/2) tr(WΩ⁻¹Wᵀ), s.t. Ω ⪰ 0, tr(Ω) = 1) can simultaneously model positive task correlation, describe negative task correlation, and identify outlier tasks based on the same underlying principle.
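To make the roles of the regularizers in the models above concrete, the following is a minimal sketch of three ingredients: the ℓ2,1-norm behind joint feature selection, the trace norm behind the shared low-rank subspace in Trace, and the mean-value regularization in SRMTL. The helper names and the mean-centering construction Z = I − (1/k)·11ᵀ are our illustrative assumptions; the paper only states Z ∈ R^{k×k}.

```python
import numpy as np

def l21_norm(Theta):
    # Sum over features (rows) of the l2 norm across tasks: a zeroed-out
    # row means the feature is discarded jointly for all tasks.
    return float(np.sum(np.linalg.norm(Theta, axis=1)))

def trace_norm(Theta):
    # Nuclear norm (sum of singular values); small values keep the task
    # weight vectors close to a common low-dimensional subspace.
    return float(np.sum(np.linalg.svd(Theta, compute_uv=False)))

def mean_regularizer(Theta):
    # Assumed SRMTL-style construction: Theta @ Z with Z = I - (1/k)*11^T
    # subtracts each feature's mean weight across tasks, so the squared
    # Frobenius norm penalizes deviation of every task from the task mean.
    k = Theta.shape[1]
    Z = np.eye(k) - np.ones((k, k)) / k
    return float(np.linalg.norm(Theta @ Z, 'fro') ** 2)
```

Note that the mean regularizer vanishes exactly when all tasks share identical weights, which is the sense in which SRMTL couples the tasks.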

From the experimental results in Tables 8 and 9, we observe that the proposed method significantly outperforms the other recently developed algorithms. Specifically, the t-test results indicate that FGL-MTFL significantly outperforms all other methods in terms of CC, and all but MTRL in terms of nMSE. Compared with the other multi-task learning methods that utilize different assumptions, G-SMuRFS and our proposed method, both multi-task feature learning methods with sparsity-inducing norms, have an advantage. Since not all brain regions are associated with AD, many of the features are irrelevant and redundant. Thus, sparsity-based MTL methods such as FGL-MTFL and G-SMuRFS are more appropriate for predicting cognitive measures, achieving better performance than the non-sparse MTL methods. Unlike G-SMuRFS, the group regularization G2,1-norm in FGL-MTFL decouples the group sparse regularization across tasks to provide more flexibility.

Identification of Correlation among Cognitive Outcomes

A fundamental component of our FGL-MTFL is to estimate the relationship structure among tasks, thus promoting appropriate information sharing among related tasks. In this section, we investigate and evaluate the estimated task relationships. Figure 4 shows the normalized estimated correlation matrix Z.
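A normalized correlation matrix like the one visualized in Fig. 4 can be obtained from any symmetric task-relationship estimate by dividing out the diagonal scales; this is the standard construction, sketched here under our own naming:

```python
import numpy as np

def normalize_to_correlation(S):
    """Rescale a symmetric positive-definite relationship matrix S so that
    Z[m, l] = S[m, l] / sqrt(S[m, m] * S[l, l]); the diagonal becomes 1
    and off-diagonal entries fall in [-1, 1]."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)
```

The normalization makes correlations between tasks measured on very different scales (e.g., MMSE vs. TRAILS times) directly comparable.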

Fig. 4 Correlation matrix of the 20 cognitive prediction tasks (ADAS, MMSE, RAVLT TOTAL/TOT6/TOTB/T30/RECOG, FLU ANIM/VEG, TRAILS A/B, LOGMEM IMMTOTAL/DELTOTAL, CLOCK DRAW/COPY/SCOR, BOSNAM, ANART, DSPAN FOR/BAC, DIGIT); color scale from −0.6 to 1

Using the previous results obtained by our FGL-MTFL, the constructed graph G includes all pairwise links for each pair of tasks. The correlation of some of these links may be weak, or the tasks may not actually be correlated, due to poor estimation of the correlation matrix. To clearly analyze the influence of the estimated task relationships on the prediction performance of cognitive outcomes, we constructed graphs G that vary the value of the threshold τ based on the estimated correlation matrix Z. The range of the τ value is 0.1, 0.3, 0.5, and 0.7. The graphs of these four examples are shown in Fig. 5. In this experiment, rather than including all pairwise links, a specific threshold value was applied to the correlation matrix Z, and the performance with

different threshold values is presented in Table 10. In this case, by thresholding the estimated correlation matrix with a higher τ, we can construct a sparse undirected graph that represents only the most reliable correlations. From the data shown in Fig. 5, we can see that with an increase in the threshold value, the graph of tasks becomes less dense. When τ = 0.1, only one link is removed; when τ increases to 0.3 and 0.5, the numbers of remaining links are only 133 and 48, respectively. When τ = 0.7, only eight links among highly correlated cognitive scores remain. We found that these remaining cognitive outcomes (including ADAS, MMSE, and RAVLT) are the most commonly used in multi-task learning to predict cognitive outcomes (Yan et al. 2013,

Fig. 5 The correlation graphs with four different threshold values: (a) τ = 0.1 (189 links), (b) τ = 0.3 (133 links), (c) τ = 0.5 (48 links), (d) τ = 0.7 (8 links)
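The thresholding experiment amounts to keeping only the links whose estimated correlation magnitude exceeds τ; a minimal sketch (the function name is ours). With k = 20 tasks there are 20·19/2 = 190 candidate links, consistent with the 189 links remaining after one removal at τ = 0.1.

```python
import numpy as np

def threshold_graph(Z, tau):
    """Return the undirected edges (m, l), m < l, of the task graph whose
    absolute estimated correlation exceeds the threshold tau; larger tau
    keeps only the most reliable links, giving a sparser graph."""
    k = Z.shape[0]
    return [(m, l) for m in range(k) for l in range(m + 1, k)
            if abs(Z[m, l]) > tau]
```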


Table 10 The influence of the threshold on the prediction performance for the fused lasso or fused group lasso regularized model

Method | τ | ADAS | MMSE | RAVLT-TOTAL | RAVLT-TOT6 | RAVLT-TOTB | RAVLT-T30 | RAVLT-RECOG | FLU-ANIM | FLU-VEG | TRAILS-A | TRAILS-B
FL-MTL | 0.7 | 0.665±0.068 | 0.553±0.069 | 0.498±0.101 | 0.506±0.120 | 0.286±0.060 | 0.517±0.117 | 0.411±0.128 | 0.382±0.102 | 0.509±0.081 | 0.363±0.089 | 0.368±0.122
FL-MTL | 0.5 | 0.662±0.066 | 0.550±0.070 | 0.488±0.103 | 0.507±0.119 | 0.308±0.068 | 0.516±0.116 | 0.410±0.119 | 0.375±0.104 | 0.509±0.085 | 0.355±0.093 | 0.358±0.121
FL-MTL | 0.3 | 0.657±0.064 | 0.557±0.070 | 0.474±0.108 | 0.501±0.117 | 0.287±0.082 | 0.511±0.114 | 0.407±0.107 | 0.366±0.106 | 0.505±0.086 | 0.348±0.097 | 0.348±0.120
FL-MTL | 0.1 | 0.641±0.074 | 0.522±0.095 | 0.491±0.112 | 0.472±0.102 | 0.287±0.085 | 0.472±0.096 | 0.409±0.085 | 0.356±0.107 | 0.485±0.076 | 0.352±0.085 | 0.386±0.159
FL-MTL | – | 0.641±0.074 | 0.523±0.095 | 0.491±0.112 | 0.472±0.102 | 0.287±0.085 | 0.473±0.096 | 0.409±0.085 | 0.355±0.107 | 0.485±0.076 | 0.352±0.085 | 0.385±0.159
FGL-MTL | 0.7 | 0.659±0.066 | 0.541±0.068 | 0.504±0.097 | 0.503±0.122 | 0.286±0.053 | 0.521±0.116 | 0.409±0.137 | 0.390±0.086 | 0.508±0.081 | 0.367±0.093 | 0.385±0.119
FGL-MTL | 0.5 | 0.660±0.064 | 0.538±0.067 | 0.496±0.099 | 0.506±0.120 | 0.307±0.064 | 0.521±0.117 | 0.412±0.130 | 0.388±0.098 | 0.513±0.081 | 0.361±0.096 | 0.372±0.119
FGL-MTL | 0.3 | 0.663±0.062 | 0.553±0.067 | 0.487±0.105 | 0.506±0.117 | 0.291±0.077 | 0.520±0.116 | 0.411±0.120 | 0.380±0.102 | 0.514±0.078 | 0.353±0.099 | 0.359±0.119
FGL-MTL | 0.1 | 0.662±0.060 | 0.556±0.065 | 0.479±0.109 | 0.501±0.116 | 0.287±0.080 | 0.517±0.113 | 0.410±0.116 | 0.371±0.106 | 0.512±0.077 | 0.347±0.101 | 0.351±0.119
FGL-MTL | – | 0.662±0.060 | 0.557±0.065 | 0.479±0.109 | 0.501±0.116 | 0.286±0.080 | 0.516±0.113 | 0.409±0.115 | 0.370±0.106 | 0.512±0.077 | 0.347±0.101 | 0.351±0.119
FGL-MTFL | 0.7 | 0.667±0.062 | 0.549±0.061 | 0.517±0.118 | 0.500±0.127 | 0.320±0.081 | 0.516±0.117 | 0.421±0.126 | 0.380±0.083 | 0.505±0.082 | 0.414±0.085 | 0.479±0.098
FGL-MTFL | 0.5 | 0.665±0.063 | 0.547±0.057 | 0.523±0.113 | 0.496±0.120 | 0.317±0.078 | 0.513±0.114 | 0.413±0.133 | 0.388±0.081 | 0.499±0.089 | 0.416±0.092 | 0.481±0.095
FGL-MTFL | 0.3 | 0.667±0.069 | 0.549±0.069 | 0.522±0.113 | 0.498±0.121 | 0.327±0.069 | 0.512±0.110 | 0.427±0.123 | 0.400±0.080 | 0.503±0.087 | 0.419±0.087 | 0.482±0.095
FGL-MTFL | 0.1 | 0.668±0.068 | 0.549±0.068 | 0.522±0.114 | 0.499±0.121 | 0.324±0.069 | 0.514±0.110 | 0.429±0.121 | 0.399±0.081 | 0.505±0.087 | 0.419±0.086 | 0.481±0.095
FGL-MTFL | – | 0.668±0.068 | 0.549±0.068 | 0.522±0.114 | 0.499±0.121 | 0.324±0.069 | 0.514±0.110 | 0.429±0.121 | 0.399±0.081 | 0.505±0.087 | 0.419±0.086 | 0.481±0.095

Method | τ | LOGMEM-IMMTOTAL | LOGMEM-DELTOTAL | CLOCK-DRAW | CLOCK-COPY | BOSNAM | ANART | DSPAN-FOR | DSPAN-BAC | DIGIT-SCOR | wR
FL-MTL | 0.7 | 0.510±0.092 | 0.534±0.104 | 0.369±0.075 | 0.226±0.099 | 0.478±0.088 | 0.156±0.093 | 0.108±0.120 | 0.187±0.137 | 0.456±0.065 | 0.404±0.048∗
FL-MTL | 0.5 | 0.503±0.093 | 0.529±0.105 | 0.378±0.073 | 0.226±0.096 | 0.480±0.096 | 0.142±0.087 | 0.117±0.116 | 0.193±0.134 | 0.448±0.059 | 0.403±0.047∗
FL-MTL | 0.3 | 0.496±0.095 | 0.524±0.108 | 0.372±0.089 | 0.218±0.100 | 0.470±0.103 | 0.119±0.078 | 0.121±0.112 | 0.191±0.102 | 0.441±0.056 | 0.396±0.048∗
FL-MTL | 0.1 | 0.470±0.097 | 0.514±0.108 | 0.379±0.093 | 0.221±0.102 | 0.442±0.084 | 0.094±0.104 | 0.083±0.102 | 0.177±0.108 | 0.414±0.083 | 0.383±0.046∗
FL-MTL | – | 0.470±0.097 | 0.514±0.108 | 0.379±0.093 | 0.221±0.102 | 0.442±0.085 | 0.094±