Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic...

13
Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity Issues in Younger and Older Brains Elisabeth Wenger, 1 * Johan Ma ˚rtensson, 1,2 Hannes Noack, 1,3 Nils Christian Bodammer, 1 Simone Kuhn, 1 Sabine Schaefer, 1 Hans-Jochen Heinze, 4,5 Emrah Duzel, 3,4,6 Lars Backman, 7 Ulman Lindenberger, 1 and Martin Lovd en 1,7 * 1 Center for Lifespan Psychology, Max Planck Institute for Human Development, Germany 2 Department of Psychology, Lund University, Sweden 3 Institute of Medical Psychology and Behavioral Neurobiology, Tubingen University, Germany 4 Department of Neurology, Otto-von-Guericke University of Magdeburg, Germany 5 German Centre for Neurodegenerative Diseases (DZNE), Magdeburg, Germany 6 Institute of Cognitive Neuroscience, University College London, United Kingdom 7 Aging Research Center, Karolinska Institutet and Stockholm University, Sweden r r Abstract: We compared hippocampal volume measures obtained by manual tracing to automatic seg- mentation with FreeSurfer in 44 younger (20–30 years) and 47 older (60–70 years) adults, each meas- ured with magnetic resonance imaging (MRI) over three successive time points, separated by four months. Retest correlations over time were very high for both manual and FreeSurfer segmentations. With FreeSurfer, correlations over time were significantly lower in the older than in the younger age group, which was not the case with manual segmentation. Pearson correlations between manual and FreeSurfer estimates were sufficiently high, numerically even higher in the younger group, whereas intra-class correlation coefficient (ICC) estimates were lower in the younger than in the older group. FreeSurfer yielded higher volume estimates than manual segmentation, particularly in the younger age group. Importantly, FreeSurfer consistently overestimated hippocampal volumes independently of manually assessed volume in the younger age group, but overestimated larger volumes in the older age group to a less extent, introducing a systematic age bias in the data. Age differences in hippocam- pal volumes were significant with FreeSurfer, but not with manual tracing. Manual tracing resulted in a significant difference between left and right hippocampus (right > left), whereas this asymmetry Additional Supporting Information may be found in the online version of this article. Contract grant sponsor: Max Planck Society; Sofja Kovalevskaja Award, the Alexander von Humboldt foundation, the German Federal Ministry for Education and Research (BMBF); German Research Council; BMBF; Swedish Research Council; Swedish Brain Power; Alexander von Humboldt Research Award; af Joc- hnick Foundation; International Max Planck Research School “The Life Course: Evolutionary and Ontogenetic Dynamics” (LIFE). *Correspondence to: Elisabeth Wenger, Center for Lifespan Psy- chology, Max Planck Institute for Human Development, Lent- zeallee 94, 14195 Berlin, Germany. E-mail: Wenger@mpib- berlin.mpg.de (or) Martin Lovd en, Center for Lifespan Psychol- ogy, Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany. E-mail: [email protected] Disclosure Statement: No conflicts of interest. Received for publication 3 December 2013; Accepted 14 January 2014. DOI 10.1002/hbm.22473 Published online 00 Month 2014 in Wiley Online Library (wileyonlinelibrary.com). r Human Brain Mapping 00:00–00 (2014) r V C 2014 Wiley Periodicals, Inc.

Transcript of Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic...

Page 1: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

Comparing Manual and Automatic Segmentationof Hippocampal Volumes Reliability and Validity

Issues in Younger and Older Brains

Elisabeth Wenger1 Johan Martensson12 Hannes Noack13

Nils Christian Bodammer1 Simone Keurouhn1 Sabine Schaefer1

Hans-Jochen Heinze45 Emrah Deurouzel346 Lars Beuroackman7

Ulman Lindenberger1 and Martin Leuroovden17

1Center for Lifespan Psychology Max Planck Institute for Human Development Germany2Department of Psychology Lund University Sweden

3Institute of Medical Psychology and Behavioral Neurobiology Teuroubingen University Germany4Department of Neurology Otto-von-Guericke University of Magdeburg Germany

5German Centre for Neurodegenerative Diseases (DZNE) Magdeburg Germany6Institute of Cognitive Neuroscience University College London United Kingdom7Aging Research Center Karolinska Institutet and Stockholm University Sweden

r r

Abstract We compared hippocampal volume measures obtained by manual tracing to automatic seg-mentation with FreeSurfer in 44 younger (20ndash30 years) and 47 older (60ndash70 years) adults each meas-ured with magnetic resonance imaging (MRI) over three successive time points separated by fourmonths Retest correlations over time were very high for both manual and FreeSurfer segmentationsWith FreeSurfer correlations over time were significantly lower in the older than in the younger agegroup which was not the case with manual segmentation Pearson correlations between manual andFreeSurfer estimates were sufficiently high numerically even higher in the younger group whereasintra-class correlation coefficient (ICC) estimates were lower in the younger than in the older groupFreeSurfer yielded higher volume estimates than manual segmentation particularly in the younger agegroup Importantly FreeSurfer consistently overestimated hippocampal volumes independently ofmanually assessed volume in the younger age group but overestimated larger volumes in the olderage group to a less extent introducing a systematic age bias in the data Age differences in hippocam-pal volumes were significant with FreeSurfer but not with manual tracing Manual tracing resulted ina significant difference between left and right hippocampus (right gt left) whereas this asymmetry

Additional Supporting Information may be found in the onlineversion of this article

Contract grant sponsor Max Planck Society Sofja KovalevskajaAward the Alexander von Humboldt foundation the GermanFederal Ministry for Education and Research (BMBF) GermanResearch Council BMBF Swedish Research Council SwedishBrain Power Alexander von Humboldt Research Award af Joc-hnick Foundation International Max Planck Research School ldquoTheLife Course Evolutionary and Ontogenetic Dynamicsrdquo (LIFE)

Correspondence to Elisabeth Wenger Center for Lifespan Psy-chology Max Planck Institute for Human Development Lent-

zeallee 94 14195 Berlin Germany E-mail Wengermpib-berlinmpgde (or) Martin Leuroovden Center for Lifespan Psychol-ogy Max Planck Institute for Human Development Lentzeallee94 14195 Berlin Germany E-mail MartinlovdenkiseDisclosure Statement No conflicts of interest

Received for publication 3 December 2013 Accepted 14 January2014

DOI 101002hbm22473Published online 00 Month 2014 in Wiley Online Library(wileyonlinelibrarycom)

r Human Brain Mapping 0000ndash00 (2014) r

VC 2014 Wiley Periodicals Inc

effect was considerably smaller with FreeSurfer estimates We conclude that FreeSurfer constitutes afeasible method to assess differences in hippocampal volume in young adults FreeSurfer estimates inolder age groups should however be interpreted with care until the automatic segmentation pipelinehas been further optimized to increase validity and reliability in this age group Hum Brain Mapp00000ndash000 2014 VC 2014 Wiley Periodicals Inc

Key words hippocampus FreeSurfer manual segmentation left right asymmetry aging

r r

INTRODUCTION

Given its importance in memory and learning the hip-pocampus (Hc) has been extensively studied with neuroi-maging techniques in the last decade [eg Squire et al2004] Measurement of hippocampal volume has becomean important diagnostic tool in clinical settings [egMeurouller et al 2010] and volume differences have beenrevealed for a variety of neurological and psychiatric dis-orders [Bremner et al 1995 Chetelat and Baron 2003Heckers 2001 Jack et al 1999 Sheline et al 1996] Meas-uring hippocampal volume is also important in studies ofhealthy adults [Bhatia et al 1993 Honeycutt and Smith1995] The Hc is considered one of the very few brainregions that preserve their potential for neurogenesis intoadulthood and aging [Christie and Cameron 2006] It istherefore a valuable model for studying both the potentialfor neuroplasticity as well as its limits in mature brainsAnimal studies have demonstrated hippocampal changesin response to experience [Churchhill et al 2002 Jess-berger and Gage 2008 Kempermann et al 2002 Kronen-berg et al 2006 Rosenzweig and Bennett 1996 van Praaget al 2000] In the last years neuroplasticity within thehippocampal formation has also been shown in humansIn their seminal studies Maguire et al showed that Lon-don taxi drivers have a larger posterior hippocampusregion compared to controls and that successful spatialknowledge acquisition is related to hippocampal growth[Maguire et al 2000 2006 Woollett and Maguire 2011]Other types of learning can trigger changes in hippocam-pal volume as well for example learning how to juggleover a period of 3 months [Boyke et al 2008] studyingfor a final medical exam [Draganski et al 2006] spatialnavigation training [Leuroovden et al 2012] or intense study-ing of a foreign language [Martensson et al 2012]

Automated morphometric methods such as voxel-basedmorphometry [Ashburner and Friston 2000] have beenrepeatedly criticized for inaccurate normalization proce-dures and for being susceptible to minor changes in theprocessing pipelines [Bookstein 2001 Thomas et al 2009but see Ashburner and Friston 2001 for a rebuttal] There-fore manual segmentation is still considered the goldstandard However manual segmentation is time-consuming and labor-intensive and thus often not feasiblefor large data sets It also requires at least two individual

tracers to avoid biases In many cases it is desirable tosegment the hippocampal formation quickly and effi-ciently either to gather information on the health status ofa specific person or to investigate large data sets resource-efficiently Thus it is important to further validate the use-fulness of available automatic tools and to specify underwhich conditions they do and do not work well

Manual and automatic segmentation methods have beencompared previously [eg Cherbuin et al 2009 Moreyet al 2009] The findings mostly validate the use of auto-matic methods for segmentation of brain structures such asthe Hc Cherbuin et al [2009] segmented 430 randomlyselected individuals aged 44ndash48 years and showed thatabsolute hippocampal volumes were significantly largerwith the automated measure compared with manual seg-mentation supposedly due to relatively uniform over-inclusion of boundary voxels and surrounding cerebrospi-nal fluid and the inclusion of the subiculumentorhinalparahippocampal regions Still correlations between thetwo methods were high and the relationship of hippocam-pal volume to selected sociodemographic and cognitive var-iables were unaffected by measurement method Deweyet al [2010] analyzed scans of 120 HIV-infected patients(867 male with a mean age of 473 years) and comparedtwo automated volumetric outputs namely FreeSurfer andindividual brain atlases using statistical parametric map-ping (IBASPM) to auto-assisted manual tracings They eval-uated FreeSurfer to be effective for subcortical volumetrybut recommend visual inspection of segmentation outputalong with manual correction to ensure validity of the dataThey suspected especially the border between amygdalaand Hc to often be erroneous as well as the border betweentail of the Hc and lateral ventricle Morey et al [2009] com-pared manual tracings to automatically segmented hippo-campal volumes with FreeSurfer and FSLFIRST in 20subjects Validated in relation to hand tracing hippocampalmeasurements with FreeSurfer were superior to FIRST onall aspects of the objective measures they used They alsoreported a systematic inflation of Hc volume in Freesurferestimates compared to hand tracing which they attribute toshape differences in the anterior-medial surface andincreased variance in this region of the Hc Shen et al [2010]compared manual and automated estimates of hippocampalvolumes in patients with cognitive complaints (n 5 39mean age 5 728) MCI (n 5 37 mean age 5 727) and early

r Wenger et al r

r 2 r

AD (n 5 11 mean age 5 756) as well as in healthy controls(n 5 38 mean age 5 706) The two methods agreedstrongly confirming FreeSurferrsquos potential to determine hip-pocampal volumes in large-scale studies even though therewas a systematic volume difference between FreeSurfer andmanual results The authors argue that this difference stemsfrom FreeSurferrsquos tendency to be more inclusive especiallyin the tail region and to have a few local excursions on thesurfaces Tae et al [2008]) conducted a study on 21 femalepatients aged 18ndash60 years with major depressive disorder inwhich FreeSurfer showed good agreement with manual hip-pocampal volume and detected hippocampal atrophy in thepatients compared with healthy controls Similar to thestudies mentioned earlier the authors speculated that thevolume overestimation was caused by an expansion of theHc segmentation into adjacent white matter at the bottom ofthe Hc as well as the inclusion of the entorhinal cortex andparts of the lateral ventricle and an overall inaccurate dif-ferentiation of the Hc from the amygdala Oscar-Bermanand Song [2011] investigated the usefulness of FreeSurfer in32 alcoholics and 37 controls with a mean age of about 53years and showed that many brain structures as well as theHc could be segmented reliably but correlations betweenFreeSurfer and a semiautomated-supervised-system werelowest in subjects with the greatest abnormalities Sanchez-Benavides et al [2010] compared hippocampal automatedmeasures with manual tracings in healthy (n 5 41) mildcognitive impairment (MCI n 5 23) and Alzheimerrsquos dis-ease patients (n 5 25 covering the age range of 60ndash80 years)and found adequate validity for FreeSurfer estimates with ageneral tendency for volume overestimation even morepronounced in atrophic brains

In this study we compare results from a manual seg-mentation protocol that has extensively been described inLeuroovden et al [2012] mainly following the guidelines byPruessner et al [2000] to an automated segmentation ofthe hippocampus done with the FreeSurfer toolbox (wwwsurfernmrmghharvardedu) Here we focus on twoaspects that have not received sufficient attention so farFirst we investigate potential age group differences in reli-ability and validity of the automated volume estimationmethod of FreeSurfer As an aged brain could also be seenas a brain with abnormalities that are more or less pro-nounced depending on interindividual differences in bio-logical aging it is warranted to examine potentiallimitations of FreeSurfer in this population when compar-ing different age groups Second we investigate potentialdifferences with respect to the segmentation of left andright Hc Brain morphometric studies often incorporatehemispheric asymmetry analyses of segmented brainstructures Such asymmetry has been studied in manyneurological conditions for example post-traumatic stressdisorder [Pavic et al 2007] MCI and Alzheimerrsquos disease[Chetelat and Baron 2003 Geroldi et al 2000 Shi et al2009] and depression [Kronmeurouller et al 2009] Since ananalysis of hippocampal asymmetry should be important

not only for diagnostic purposes it is warranted to investi-gate the performance of automatic segmentation methodscompared to manual tracing more closely also in normalpopulations

METHODS

Participants and Study Design

To compare manual and automated segmentation of Hcwe used a previously acquired dataset including 44younger participants (20ndash30 years M 5 260 SD 5 28)and 47 older participants (60ndash70 years M 5 650 SD 5

28) A detailed description of the effective sample and thedesign of the study have been reported elsewhere [seeLeuroovden et al 2011 2012 Wenger et al 2012] In short thestudy consisted of a pretest measure (henceforth calledtime point A) with magnetic resonance imaging (MRI) atraining phase of about 4 months posttest 1 (henceforthcalled time point B) with MRI immediately after the train-ing phase a period of another 4 months without any train-ing followed by posttest 2 (henceforth called time pointC) with a final MRI measure Participants in the experi-mental group performed a navigation task in a virtualenvironment of a zoo while walking on a treadmill Partic-ipants in the control group walked the exact same amountof time on a treadmill but without performing the naviga-tion task Here we analyze participants from both groupstogether and the data from all three time points Since theresults for each time point are highly similar we willfocus on time point A The replicated results from timepoint B and C can be found in the appendix For cross-time point analyses we use training group as a covariateto control for differential training effects across group andtime which have already been reported elsewhere[Leuroovden et al 2011 2012 Wenger et al 2012]

MRI Acquisition

Three high-resolution T1-weighted images from each par-ticipant were acquired on the same 3 Tesla Magnetom Trioscanner (Siemens Erlangen Germany) with an 8-channelphased-array head coil We used an MPRAGE sequencewith the following parameters TE 5 512 ms TR 5 2600 msTI 5 1100 ms flip angle 5 7 bandwidth 5 140 Hzpixelmatrix 5 320 3 320 3 240 isometric voxel size 5 08 mm3

Manual Tracing

All manual tracing was performed based on the T1-weighted MPRAGE images using a stylus on a WacomDTU-710 pen tablet (Wacom Technology Vancouver WA)and the Analyze 81 software package (AnalyzeDirectOverland Park KS)

The two experienced raters were always blinded togroup and measurement occasion before performing

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 3 r

segmentation Images were displayed in native space at amagnification factor of 3 We manually aligned all imagesand determined the anterior and posterior limits of thehippocampus to define the set of relevant slices Bothraters worked on a subset of randomly chosen slices of thesame size to minimize rater-related biases in single-datasets [Raz et al 2004] To enhance intraperson reliabilityacross the 3 measurement occasions we used a 2-step seg-mentation procedure In the first step one data set fromeach participant was randomly chosen and segmentedaccording to the protocol described below The resultingregions of interest (ROIs) were then coregistered to theremaining two data sets of that person using automaticrigid-body coregistration routines provided in Analyze 81All data sets were then anonymized with respect to themeasurement point such that the raters were unaware ofthe origin of the template ROI The final segmentation wasdone on the basis of both the segmentation protocol andthe template that overlaid the images from each timepoint The entire Hc was segmented anterior-to-posteriorusing the coronal view as default Generally our approachfollowed the protocol provided by Pruessner et al [2000]and was based on intrinsic anatomic properties of the hip-pocampus [Duvernoy 2005] following a number of guid-ing rules Full details of our protocol are reported asSupporting Information To determine the reliability of theROIs that entered our data analysis we randomly chose asubsample of 20 data sets after the segmentation of thewhole sample was completed A parallel version of theROIs that already existed for these data sets was added sothat each rater now processed those slices that the otherrater had processed before The interrater agreements forall slice-based and volume-based comparisons exceeded093 [ICC2 Shrout and Fleiss 1979]

FreeSurfer Segmentation

All cortical reconstruction and volumetric segmentationwere performed with the FreeSurfer software package ver-sion 53 using the longitudinal processing scheme imple-mented to incorporate the subject-wise correlation oflongitudinal data into the processing stream (httpsurfernmrmghharvardedu eg Fischl et al 2002) on a homoge-nous GNULinux based computer cluster comprising of 608CPU cores and a total memory size of 23 TB Assessments oftest-retest reliability of FreeSurfer have revealed high intra-class correlations of 0994 for MPRAGE sequences [Wonder-lick et al 2009] All reconstructed data were visually checkedfor segmentation accuracy at each time point No manualinterventions on the data were performed

Statistical Analysis

All statistical analyses were conducted in SPSS 20 withan alpha level for all analyses of p 5 005 We calculated sta-bility coefficients over time for the two methods with par-tial Pearson correlations using training group as a covariate

These test-retest correlations represent lower bound esti-mates of retest-reliability rather than test-retest reliabilityper se given that they are calculated on a data set in whichnot only measurement error can introduce variance but alsoboth normal aging and training effects can allow for truevariance in change All other analyses were run using thedata from the first time point A Replication of the reportedpattern of results on the data of time point B and time pointC can be found in the appendix

We calculated bivariate Pearson correlations as well asintra-class correlation coefficients (ICC2) between the twoestimation methods Both measures can be used to comparetwo raters or methods The Pearson correlation is a measureof interclass agreement that represents the overlap in rela-tive ordering of individual data points in two data sets Incontrast ICC are based on the assumption that two data setspertain to one class with common variance and mean[Shrout and Fleiss 1979 McGraw and Wong 1996] As aconsequence ICCs of the ratings of two methods do notonly decrease in response to relative ordering of the singledata points but also in response to differences in variance ordifferences in mean In that sense ICCs provide more strin-gent information about rater agreement [Meurouller andBeurouttner 1994] Conceptually ICCs are more appropriate toquantify the agreement between manual and automatedsegmentation procedures To provide better comparabilityto results of earlier studies [Cherbuin et al 2009 Morreyet al 2009 Sanchez-Benavides et al 2010] however wealso report Pearson correlations

As Shrout and Fleiss [1979] point out in their seminalarticle there are many different types of ICC and the ver-sion used must be selected carefully Following sugges-tions by McGraw and Wong [1996] we used a two-waymixed model single measure intraclass correlation withagreement type as is appropriate for the data at hand

To investigate potential age and hemisphere differenceswe conducted a 3-way repeated measures analysis of var-iance (ANOVA) with hemisphere (left right) and method(manual FreeSurfer) as within-subjects factors and age(young old) as a between-subjects factor To trace thesource of significant interactions we calculated 2-wayrepeated measures ANOVAs for each method with hemi-sphere and age as factors In addition we conductedpaired-samples t-tests to assess differences between esti-mation methods separately for the two hemispheres andbetween hemispheres for each estimation method Weconducted these follow-up analyses across as well aswithin age groups To assess potential bias in the auto-matic volume estimation we computed bivariate Pearsonrsquoscorrelations between manual estimates and the differencebetween FreeSurfer and manual estimates

RESULTS

Stability Coefficients Over Time

Stability coefficients are summarized in Table I Gener-ally partial correlations controlling for training group

r Wenger et al r

r 4 r

between the 3 measurement points were very high Formanual tracing correlations were comparably high foryounger and older adults alike with no significant differ-ences between correlations For FreeSurfer segmentationshowever we observed significant age-group differences inthe correlations Correlations were lower for the older thanfor the younger age group between some time pointsBetween pretest and posttest 1 and posttest 1 and posttest2 correlations were significantly lower in the older agegroup (computed with Fisher r-to-z transformation z 227 p lt 005 see Table I for exact correlation values)

Correlation Between FreeSurfer and Manual

Volumes

Results for Pearson correlations between manual andFreeSurfer estimates as well as ICC values are listed inTable II Pearson correlations between manual and Free-Surfer segmentations were 688 for the left Hc and 785 forthe right Hc Correlations between manual and FreeSurfersegmentations were numerically though not significantlyhigher in the young (rleft 5 0824 rright 5 0890) than inthe old (rleft 5 0659 rright 5 0784) ICC estimates wereconsiderably lower than Pearson correlations 0321 for theleft Hc and 0466 for the right Hc In addition age groupcomparisons indicated that the numerical pattern of agree-ment was reversed compared with Pearson correlationsICCs were lower in the younger group (ICCleft 5 0275ICCright 5 0381) than in the older group (ICCleft 5 0361ICCright 5 0577) reflecting a more pronounced disagree-

ment between estimates from manual and automated seg-mentation in the younger age group

Mean Differences Between FreeSurfer and

Manual Volumes as a Function of Age and

Hemisphere

The 3-way ANOVA with Hemisphere Method andAge as factors revealed a significant main effect ofMethod F(189) 5 56539 p lt 0001 h2

p 5 0864 indicat-ing a consistently higher volume estimation in FreeSur-fer tleft(90) 5 1712 p lt 0001 r2 5 0765 tright(90) 5

1584 p lt 0001 r2 5 0736 (see Table III for means andstandard deviations) Further we obtained a significantMethod 3 Age Group interaction F(189) 5 7493 p lt0001 h2

p 5 0457 reflecting two facts First there was alarger mean volume difference in younger tleft(43) 5

2100 p lt 0001 r2 5 0911 tright(43) 5 2131 p lt 0001r2 5 0914 than in older adults tleft(46) 5 1049 p lt0001 r2 5 705 tright(46) 5 880 p lt 001 r2 5 627 Sec-ond there was a significant age-related difference in hip-pocampal volume for FreeSurfer but not for manualsegmentation (Fig 1) Hence the main effect of Age inthe 2-way ANOVA was reliable for FreeSurfer (F(189) 5

4057 p lt 0001 h2p 5 0313) but not for manual tracing

(F(189) 5 194 p 5 0168 h2p 5 0021)

The significant Hemisphere 3 Method interaction in the3-way ANOVA F(189) 5 2547 p lt 0001 h2

p 5 0223indicates differences in hemispheric segmentation betweenthe two methods (see Fig 2) Specifically with manual

TABLE I Stability coefficients over time separate for

age groups controlling for training group

rpre post1 rpost1 post2 rpre post2

Left Right Left Right Left Right

YoungManual 0973 0978 0987 0983 0977 0976FreeSurfer 0987 0991 0992 0994 0987 0990

OldManual 0974 0982 0978 0982 0979 0980FreeSurfer 0967 0976 0970 0978 0977 0981

TABLE II Bivariate Pearson correlations and intra-class

correlations (agreement type) between manual segmen-

tations and FreeSurfer estimates at time point A

rmanual FreeSurfer

Partial r controllingfor age ryoung rold

Left 0688 0751 0824 0659Right 0785 0836 0890 0784

ICCoverall ICCyoung ICCold

Left 0321 0275 0361Right 0466 0381 0577

TABLE III Means and standard deviations of manual and FreeSurfer estimates as well as absolute differences

between the two methods at first measurement time point A

Left Right

Manual FreeSurfer Manual FreeSurferM 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 353014 6 37931 427572 6 49254 74558 371403 6 41350 424482 6 51514 530579Young (n 5 44) 358256 6 40077 446469 6 49117 88213 377816 6 41627 452015 6 501762 74199Old (n 5 47) 348108 6 35535 390519 6 30738 42411 365398 6 40612 398706 6 37893 33308

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 5 r

segmentation right Hc was estimated to be significantlylarger than left Hc t(90) 5 1093 p lt 0001 r2 5 0570FreeSurfer also estimated the right Hc slightly larger thanthe left however with a considerably smaller effect sizet(90) 5 286 p 5 0005 r2 5 0083

Correlation of Difference Between FreeSurfer

and Manual Estimates and Manual Segmentation

as a Proxy for True Hippocampal Volume

In the younger age group there was no relation betweenmanually assessed hippocampal volume and the differencebetween FreeSurfer and manual estimates neither in leftHc r 5 0013 p 5 0931 nor in right Hc r 5 0130 p 5

0401 However in the older age group there was a nega-tive association both in left Hc r 5 20551 p lt 0001 andin right Hc r 5 20421 p 5 0003 This suggests a bias inautomatic volume estimation for the old age group Specif-ically in the younger age group FreeSurfer overestimatedhippocampal volume and it did so independently of man-ually assessed hippocampal volume (Fig 3 see also Fig 4for exemplary visualization of small and large young andold Hc segmentations) By contrast in the older age groupthe difference between FreeSurfer and manual segmenta-tion decreased as the manually segmented Hc sizeincreased indicating a differential bias toward relativelyunderestimating larger volumes in older adults

The same analyses were run on the second and thirdmeasurement time point where results display the exactsame pattern as on the first time point (see appendix)

DISCUSSION

Our results reveal high stability coefficients over timefor both manual and FreeSurfer segmentations With Free-Surfer correlations over time were significantly lower inthe older than in the younger age group which was notthe case with manual segmentation The Pearson correla-tions between the two assessment procedures were rela-tively high (0683 for left Hc and 0766 for right Hc)

Figure 1

Mean differences between the two estimation methods were more pronounced in the young age group

Figure 2

Manual segmentation yielded a pronounced hemispheric differ-

ence in hippocampal volume FreeSurfer also segmented the

right Hc slightly larger than the left however with a consider-

ably smaller effect size reflected in a significant Hemisphere 3

Method interaction

r Wenger et al r

r 6 r

Absolute agreements between the two measures howeverwere considerably lower as FreeSurfer estimated volumesto be higher This volume difference was larger in theyoung than in the old FreeSurfer detected a significantage difference in hippocampal volume whereas manualtracing did not However manual tracing resulted in a sig-nificant difference between left and right Hc whereasFreeSurfer segmented both sides in a more similar man-ner FreeSurfer overestimated hippocampal size independ-ently of manually assessed volume in the younger agegroup but relatively underestimated larger volumes inolder adults thus introducing a bias in this age group

To our knowledge no other investigation up to date hascompared manual segmentation with FreeSurfer segmenta-tion in healthy younger adults and healthy older adultswithin the same study Doing so has allowed us to dis-cover an age-differential segmentation bias that wouldhave gone unnoticed otherwise For younger adults indi-vidual differences in volume estimates derived from Free-Surfer and manual segmentation were highly correlatedas the mean volumes estimates with FreeSurfer exceededmanually obtained estimates by a constant amount In con-trast for older adults adding a constant value did notcapture differences between the two methods as thedegree of disparity between the two methods was foundto depend on the size of manually estimated volumesSpecifically the difference between FreeSurfer and manualsegmentation decreased as the manually segmented hippo-campus volumes increased pointing to a differential biastowards relatively underestimating larger volumes in olderadults with FreeSurfer compared to manual segmentationThus FreeSurfer may yield cross-sectional age-group dif-ferences in instances in which manual segmentation wouldnot This finding is novel and important for the field ashippocampal segmentation based on FreeSurfer is increas-

ingly being used in studies on normal cognitive agingpathological cognitive aging and in studies involvingelderly populations suffering from some kind of disease

These results thus portray a twofold picture of reliabilityand validity in manual and automatic Hc segmentationsIn the younger age group FreeSurfer can be regarded as areliable and valid method for assessing differences in hip-pocampal volume with at least as high reliability as man-ual tracing However in the older age group theoverestimation with FreeSurfer was dependent on ldquotruerdquohippocampal volume as it was smaller for larger hippo-campi than for smaller hippocampi Clearly these novelresults concerning an age-differential bias in older hippo-campi are important for the growing field using automatichippocampal segmentations in aging and diseasedpopulations

The absence of a significant main effect of age for ourmanual segmentation method may be regarded as surpris-ing given previous reports of hippocampal shrinkage innormal aging [eg Raz et al 2005 Scahill et al 2003]However there have also been studies finding no cross-sectional differences in hippocampal volume with age[Bigler et al 1997 Du et al 2006 Liu et al 2003 Sullivanet al 1995 2005 Szentkuti et al 2004] It has also beenshown previously that several individuals from an olderage group in their seventies can indeed have a similar oreven larger hippocampal volume than 20ndash30 year oldadults and that the dispersion around the mean of hippo-campal volume does not necessarily have to differ foryounger and older adults [Lupien et al 2007] which isexactly what we see in our data from manual tracing Fur-thermore it is of course important to note the strict inclu-sion criteria that had to be met in order to participate inthe study Participants had to be able to walk on a tread-mill without any gait problems be free from neurological

Figure 3

Difference between FreeSurfer and manual estimates as a function of manually segmented ldquotruerdquo

hippocampal volume decreases in the old age group but stays stable for the young age group

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 7 r

or other severe diseases and had to be able to undergoMRI scanning which removed participants with implantsTherefore participants in our older age group may be clas-sified as high-functioning elderly adults As such they are

more likely to have a better-preserved brain structure thanthe average person from this age cohort

The bias in volume estimation within FreeSurfer in theolder age group is not only interfering with cross-sectional

Figure 4

A Small Hc of a young participant B Large Hc of a young participant C Small Hc of an old par-

ticipant D Large Hc of an old participant

r Wenger et al r

r 8 r

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 2: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

effect was considerably smaller with FreeSurfer estimates We conclude that FreeSurfer constitutes afeasible method to assess differences in hippocampal volume in young adults FreeSurfer estimates inolder age groups should however be interpreted with care until the automatic segmentation pipelinehas been further optimized to increase validity and reliability in this age group Hum Brain Mapp00000ndash000 2014 VC 2014 Wiley Periodicals Inc

Key words hippocampus FreeSurfer manual segmentation left right asymmetry aging

r r

INTRODUCTION

Given its importance in memory and learning the hip-pocampus (Hc) has been extensively studied with neuroi-maging techniques in the last decade [eg Squire et al2004] Measurement of hippocampal volume has becomean important diagnostic tool in clinical settings [egMeurouller et al 2010] and volume differences have beenrevealed for a variety of neurological and psychiatric dis-orders [Bremner et al 1995 Chetelat and Baron 2003Heckers 2001 Jack et al 1999 Sheline et al 1996] Meas-uring hippocampal volume is also important in studies ofhealthy adults [Bhatia et al 1993 Honeycutt and Smith1995] The Hc is considered one of the very few brainregions that preserve their potential for neurogenesis intoadulthood and aging [Christie and Cameron 2006] It istherefore a valuable model for studying both the potentialfor neuroplasticity as well as its limits in mature brainsAnimal studies have demonstrated hippocampal changesin response to experience [Churchhill et al 2002 Jess-berger and Gage 2008 Kempermann et al 2002 Kronen-berg et al 2006 Rosenzweig and Bennett 1996 van Praaget al 2000] In the last years neuroplasticity within thehippocampal formation has also been shown in humansIn their seminal studies Maguire et al showed that Lon-don taxi drivers have a larger posterior hippocampusregion compared to controls and that successful spatialknowledge acquisition is related to hippocampal growth[Maguire et al 2000 2006 Woollett and Maguire 2011]Other types of learning can trigger changes in hippocam-pal volume as well for example learning how to juggleover a period of 3 months [Boyke et al 2008] studyingfor a final medical exam [Draganski et al 2006] spatialnavigation training [Leuroovden et al 2012] or intense study-ing of a foreign language [Martensson et al 2012]

Automated morphometric methods such as voxel-basedmorphometry [Ashburner and Friston 2000] have beenrepeatedly criticized for inaccurate normalization proce-dures and for being susceptible to minor changes in theprocessing pipelines [Bookstein 2001 Thomas et al 2009but see Ashburner and Friston 2001 for a rebuttal] There-fore manual segmentation is still considered the goldstandard However manual segmentation is time-consuming and labor-intensive and thus often not feasiblefor large data sets It also requires at least two individual

tracers to avoid biases In many cases it is desirable tosegment the hippocampal formation quickly and effi-ciently either to gather information on the health status ofa specific person or to investigate large data sets resource-efficiently Thus it is important to further validate the use-fulness of available automatic tools and to specify underwhich conditions they do and do not work well

Manual and automatic segmentation methods have beencompared previously [eg Cherbuin et al 2009 Moreyet al 2009] The findings mostly validate the use of auto-matic methods for segmentation of brain structures such asthe Hc Cherbuin et al [2009] segmented 430 randomlyselected individuals aged 44ndash48 years and showed thatabsolute hippocampal volumes were significantly largerwith the automated measure compared with manual seg-mentation supposedly due to relatively uniform over-inclusion of boundary voxels and surrounding cerebrospi-nal fluid and the inclusion of the subiculumentorhinalparahippocampal regions Still correlations between thetwo methods were high and the relationship of hippocam-pal volume to selected sociodemographic and cognitive var-iables were unaffected by measurement method Deweyet al [2010] analyzed scans of 120 HIV-infected patients(867 male with a mean age of 473 years) and comparedtwo automated volumetric outputs namely FreeSurfer andindividual brain atlases using statistical parametric map-ping (IBASPM) to auto-assisted manual tracings They eval-uated FreeSurfer to be effective for subcortical volumetrybut recommend visual inspection of segmentation outputalong with manual correction to ensure validity of the dataThey suspected especially the border between amygdalaand Hc to often be erroneous as well as the border betweentail of the Hc and lateral ventricle Morey et al [2009] com-pared manual tracings to automatically segmented hippo-campal volumes with FreeSurfer and FSLFIRST in 20subjects Validated in relation to hand tracing hippocampalmeasurements with FreeSurfer were superior to FIRST onall aspects of the objective measures they used They alsoreported a systematic inflation of Hc volume in Freesurferestimates compared to hand tracing which they attribute toshape differences in the anterior-medial surface andincreased variance in this region of the Hc Shen et al [2010]compared manual and automated estimates of hippocampalvolumes in patients with cognitive complaints (n 5 39mean age 5 728) MCI (n 5 37 mean age 5 727) and early

r Wenger et al r

r 2 r

AD (n 5 11 mean age 5 756) as well as in healthy controls(n 5 38 mean age 5 706) The two methods agreedstrongly confirming FreeSurferrsquos potential to determine hip-pocampal volumes in large-scale studies even though therewas a systematic volume difference between FreeSurfer andmanual results The authors argue that this difference stemsfrom FreeSurferrsquos tendency to be more inclusive especiallyin the tail region and to have a few local excursions on thesurfaces Tae et al [2008]) conducted a study on 21 femalepatients aged 18ndash60 years with major depressive disorder inwhich FreeSurfer showed good agreement with manual hip-pocampal volume and detected hippocampal atrophy in thepatients compared with healthy controls Similar to thestudies mentioned earlier the authors speculated that thevolume overestimation was caused by an expansion of theHc segmentation into adjacent white matter at the bottom ofthe Hc as well as the inclusion of the entorhinal cortex andparts of the lateral ventricle and an overall inaccurate dif-ferentiation of the Hc from the amygdala Oscar-Bermanand Song [2011] investigated the usefulness of FreeSurfer in32 alcoholics and 37 controls with a mean age of about 53years and showed that many brain structures as well as theHc could be segmented reliably but correlations betweenFreeSurfer and a semiautomated-supervised-system werelowest in subjects with the greatest abnormalities Sanchez-Benavides et al [2010] compared hippocampal automatedmeasures with manual tracings in healthy (n 5 41) mildcognitive impairment (MCI n 5 23) and Alzheimerrsquos dis-ease patients (n 5 25 covering the age range of 60ndash80 years)and found adequate validity for FreeSurfer estimates with ageneral tendency for volume overestimation even morepronounced in atrophic brains

In this study we compare results from a manual seg-mentation protocol that has extensively been described inLeuroovden et al [2012] mainly following the guidelines byPruessner et al [2000] to an automated segmentation ofthe hippocampus done with the FreeSurfer toolbox (wwwsurfernmrmghharvardedu) Here we focus on twoaspects that have not received sufficient attention so farFirst we investigate potential age group differences in reli-ability and validity of the automated volume estimationmethod of FreeSurfer As an aged brain could also be seenas a brain with abnormalities that are more or less pro-nounced depending on interindividual differences in bio-logical aging it is warranted to examine potentiallimitations of FreeSurfer in this population when compar-ing different age groups Second we investigate potentialdifferences with respect to the segmentation of left andright Hc Brain morphometric studies often incorporatehemispheric asymmetry analyses of segmented brainstructures Such asymmetry has been studied in manyneurological conditions for example post-traumatic stressdisorder [Pavic et al 2007] MCI and Alzheimerrsquos disease[Chetelat and Baron 2003 Geroldi et al 2000 Shi et al2009] and depression [Kronmeurouller et al 2009] Since ananalysis of hippocampal asymmetry should be important

not only for diagnostic purposes it is warranted to investi-gate the performance of automatic segmentation methodscompared to manual tracing more closely also in normalpopulations

METHODS

Participants and Study Design

To compare manual and automated segmentation of Hcwe used a previously acquired dataset including 44younger participants (20ndash30 years M 5 260 SD 5 28)and 47 older participants (60ndash70 years M 5 650 SD 5

28) A detailed description of the effective sample and thedesign of the study have been reported elsewhere [seeLeuroovden et al 2011 2012 Wenger et al 2012] In short thestudy consisted of a pretest measure (henceforth calledtime point A) with magnetic resonance imaging (MRI) atraining phase of about 4 months posttest 1 (henceforthcalled time point B) with MRI immediately after the train-ing phase a period of another 4 months without any train-ing followed by posttest 2 (henceforth called time pointC) with a final MRI measure Participants in the experi-mental group performed a navigation task in a virtualenvironment of a zoo while walking on a treadmill Partic-ipants in the control group walked the exact same amountof time on a treadmill but without performing the naviga-tion task Here we analyze participants from both groupstogether and the data from all three time points Since theresults for each time point are highly similar we willfocus on time point A The replicated results from timepoint B and C can be found in the appendix For cross-time point analyses we use training group as a covariateto control for differential training effects across group andtime which have already been reported elsewhere[Leuroovden et al 2011 2012 Wenger et al 2012]

MRI Acquisition

Three high-resolution T1-weighted images from each par-ticipant were acquired on the same 3 Tesla Magnetom Trioscanner (Siemens Erlangen Germany) with an 8-channelphased-array head coil We used an MPRAGE sequencewith the following parameters TE 5 512 ms TR 5 2600 msTI 5 1100 ms flip angle 5 7 bandwidth 5 140 Hzpixelmatrix 5 320 3 320 3 240 isometric voxel size 5 08 mm3

Manual Tracing

All manual tracing was performed based on the T1-weighted MPRAGE images using a stylus on a WacomDTU-710 pen tablet (Wacom Technology Vancouver WA)and the Analyze 81 software package (AnalyzeDirectOverland Park KS)

The two experienced raters were always blinded togroup and measurement occasion before performing

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 3 r

segmentation Images were displayed in native space at amagnification factor of 3 We manually aligned all imagesand determined the anterior and posterior limits of thehippocampus to define the set of relevant slices Bothraters worked on a subset of randomly chosen slices of thesame size to minimize rater-related biases in single-datasets [Raz et al 2004] To enhance intraperson reliabilityacross the 3 measurement occasions we used a 2-step seg-mentation procedure In the first step one data set fromeach participant was randomly chosen and segmentedaccording to the protocol described below The resultingregions of interest (ROIs) were then coregistered to theremaining two data sets of that person using automaticrigid-body coregistration routines provided in Analyze 81All data sets were then anonymized with respect to themeasurement point such that the raters were unaware ofthe origin of the template ROI The final segmentation wasdone on the basis of both the segmentation protocol andthe template that overlaid the images from each timepoint The entire Hc was segmented anterior-to-posteriorusing the coronal view as default Generally our approachfollowed the protocol provided by Pruessner et al [2000]and was based on intrinsic anatomic properties of the hip-pocampus [Duvernoy 2005] following a number of guid-ing rules Full details of our protocol are reported asSupporting Information To determine the reliability of theROIs that entered our data analysis we randomly chose asubsample of 20 data sets after the segmentation of thewhole sample was completed A parallel version of theROIs that already existed for these data sets was added sothat each rater now processed those slices that the otherrater had processed before The interrater agreements forall slice-based and volume-based comparisons exceeded093 [ICC2 Shrout and Fleiss 1979]

FreeSurfer Segmentation

All cortical reconstruction and volumetric segmentationwere performed with the FreeSurfer software package ver-sion 53 using the longitudinal processing scheme imple-mented to incorporate the subject-wise correlation oflongitudinal data into the processing stream (httpsurfernmrmghharvardedu eg Fischl et al 2002) on a homoge-nous GNULinux based computer cluster comprising of 608CPU cores and a total memory size of 23 TB Assessments oftest-retest reliability of FreeSurfer have revealed high intra-class correlations of 0994 for MPRAGE sequences [Wonder-lick et al 2009] All reconstructed data were visually checkedfor segmentation accuracy at each time point No manualinterventions on the data were performed

Statistical Analysis

All statistical analyses were conducted in SPSS 20 withan alpha level for all analyses of p 5 005 We calculated sta-bility coefficients over time for the two methods with par-tial Pearson correlations using training group as a covariate

These test-retest correlations represent lower bound esti-mates of retest-reliability rather than test-retest reliabilityper se given that they are calculated on a data set in whichnot only measurement error can introduce variance but alsoboth normal aging and training effects can allow for truevariance in change All other analyses were run using thedata from the first time point A Replication of the reportedpattern of results on the data of time point B and time pointC can be found in the appendix

We calculated bivariate Pearson correlations as well asintra-class correlation coefficients (ICC2) between the twoestimation methods Both measures can be used to comparetwo raters or methods The Pearson correlation is a measureof interclass agreement that represents the overlap in rela-tive ordering of individual data points in two data sets Incontrast ICC are based on the assumption that two data setspertain to one class with common variance and mean[Shrout and Fleiss 1979 McGraw and Wong 1996] As aconsequence ICCs of the ratings of two methods do notonly decrease in response to relative ordering of the singledata points but also in response to differences in variance ordifferences in mean In that sense ICCs provide more strin-gent information about rater agreement [Meurouller andBeurouttner 1994] Conceptually ICCs are more appropriate toquantify the agreement between manual and automatedsegmentation procedures To provide better comparabilityto results of earlier studies [Cherbuin et al 2009 Morreyet al 2009 Sanchez-Benavides et al 2010] however wealso report Pearson correlations

As Shrout and Fleiss [1979] point out in their seminalarticle there are many different types of ICC and the ver-sion used must be selected carefully Following sugges-tions by McGraw and Wong [1996] we used a two-waymixed model single measure intraclass correlation withagreement type as is appropriate for the data at hand

To investigate potential age and hemisphere differenceswe conducted a 3-way repeated measures analysis of var-iance (ANOVA) with hemisphere (left right) and method(manual FreeSurfer) as within-subjects factors and age(young old) as a between-subjects factor To trace thesource of significant interactions we calculated 2-wayrepeated measures ANOVAs for each method with hemi-sphere and age as factors In addition we conductedpaired-samples t-tests to assess differences between esti-mation methods separately for the two hemispheres andbetween hemispheres for each estimation method Weconducted these follow-up analyses across as well aswithin age groups To assess potential bias in the auto-matic volume estimation we computed bivariate Pearsonrsquoscorrelations between manual estimates and the differencebetween FreeSurfer and manual estimates

RESULTS

Stability Coefficients Over Time

Stability coefficients are summarized in Table I Gener-ally partial correlations controlling for training group

r Wenger et al r

r 4 r

between the 3 measurement points were very high Formanual tracing correlations were comparably high foryounger and older adults alike with no significant differ-ences between correlations For FreeSurfer segmentationshowever we observed significant age-group differences inthe correlations Correlations were lower for the older thanfor the younger age group between some time pointsBetween pretest and posttest 1 and posttest 1 and posttest2 correlations were significantly lower in the older agegroup (computed with Fisher r-to-z transformation z 227 p lt 005 see Table I for exact correlation values)

Correlation Between FreeSurfer and Manual

Volumes

Results for Pearson correlations between manual andFreeSurfer estimates as well as ICC values are listed inTable II Pearson correlations between manual and Free-Surfer segmentations were 688 for the left Hc and 785 forthe right Hc Correlations between manual and FreeSurfersegmentations were numerically though not significantlyhigher in the young (rleft 5 0824 rright 5 0890) than inthe old (rleft 5 0659 rright 5 0784) ICC estimates wereconsiderably lower than Pearson correlations 0321 for theleft Hc and 0466 for the right Hc In addition age groupcomparisons indicated that the numerical pattern of agree-ment was reversed compared with Pearson correlationsICCs were lower in the younger group (ICCleft 5 0275ICCright 5 0381) than in the older group (ICCleft 5 0361ICCright 5 0577) reflecting a more pronounced disagree-

ment between estimates from manual and automated seg-mentation in the younger age group

Mean Differences Between FreeSurfer and

Manual Volumes as a Function of Age and

Hemisphere

The 3-way ANOVA with Hemisphere Method andAge as factors revealed a significant main effect ofMethod F(189) 5 56539 p lt 0001 h2

p 5 0864 indicat-ing a consistently higher volume estimation in FreeSur-fer tleft(90) 5 1712 p lt 0001 r2 5 0765 tright(90) 5

1584 p lt 0001 r2 5 0736 (see Table III for means andstandard deviations) Further we obtained a significantMethod 3 Age Group interaction F(189) 5 7493 p lt0001 h2

p 5 0457 reflecting two facts First there was alarger mean volume difference in younger tleft(43) 5

2100 p lt 0001 r2 5 0911 tright(43) 5 2131 p lt 0001r2 5 0914 than in older adults tleft(46) 5 1049 p lt0001 r2 5 705 tright(46) 5 880 p lt 001 r2 5 627 Sec-ond there was a significant age-related difference in hip-pocampal volume for FreeSurfer but not for manualsegmentation (Fig 1) Hence the main effect of Age inthe 2-way ANOVA was reliable for FreeSurfer (F(189) 5

4057 p lt 0001 h2p 5 0313) but not for manual tracing

(F(189) 5 194 p 5 0168 h2p 5 0021)

The significant Hemisphere 3 Method interaction in the3-way ANOVA F(189) 5 2547 p lt 0001 h2

p 5 0223indicates differences in hemispheric segmentation betweenthe two methods (see Fig 2) Specifically with manual

TABLE I Stability coefficients over time separate for

age groups controlling for training group

rpre post1 rpost1 post2 rpre post2

Left Right Left Right Left Right

YoungManual 0973 0978 0987 0983 0977 0976FreeSurfer 0987 0991 0992 0994 0987 0990

OldManual 0974 0982 0978 0982 0979 0980FreeSurfer 0967 0976 0970 0978 0977 0981

TABLE II Bivariate Pearson correlations and intra-class

correlations (agreement type) between manual segmen-

tations and FreeSurfer estimates at time point A

rmanual FreeSurfer

Partial r controllingfor age ryoung rold

Left 0688 0751 0824 0659Right 0785 0836 0890 0784

ICCoverall ICCyoung ICCold

Left 0321 0275 0361Right 0466 0381 0577

TABLE III Means and standard deviations of manual and FreeSurfer estimates as well as absolute differences

between the two methods at first measurement time point A

Left Right

Manual FreeSurfer Manual FreeSurferM 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 353014 6 37931 427572 6 49254 74558 371403 6 41350 424482 6 51514 530579Young (n 5 44) 358256 6 40077 446469 6 49117 88213 377816 6 41627 452015 6 501762 74199Old (n 5 47) 348108 6 35535 390519 6 30738 42411 365398 6 40612 398706 6 37893 33308

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 5 r

segmentation right Hc was estimated to be significantlylarger than left Hc t(90) 5 1093 p lt 0001 r2 5 0570FreeSurfer also estimated the right Hc slightly larger thanthe left however with a considerably smaller effect sizet(90) 5 286 p 5 0005 r2 5 0083

Correlation of Difference Between FreeSurfer

and Manual Estimates and Manual Segmentation

as a Proxy for True Hippocampal Volume

In the younger age group there was no relation betweenmanually assessed hippocampal volume and the differencebetween FreeSurfer and manual estimates neither in leftHc r 5 0013 p 5 0931 nor in right Hc r 5 0130 p 5

0401 However in the older age group there was a nega-tive association both in left Hc r 5 20551 p lt 0001 andin right Hc r 5 20421 p 5 0003 This suggests a bias inautomatic volume estimation for the old age group Specif-ically in the younger age group FreeSurfer overestimatedhippocampal volume and it did so independently of man-ually assessed hippocampal volume (Fig 3 see also Fig 4for exemplary visualization of small and large young andold Hc segmentations) By contrast in the older age groupthe difference between FreeSurfer and manual segmenta-tion decreased as the manually segmented Hc sizeincreased indicating a differential bias toward relativelyunderestimating larger volumes in older adults

The same analyses were run on the second and thirdmeasurement time point where results display the exactsame pattern as on the first time point (see appendix)

DISCUSSION

Our results reveal high stability coefficients over timefor both manual and FreeSurfer segmentations With Free-Surfer correlations over time were significantly lower inthe older than in the younger age group which was notthe case with manual segmentation The Pearson correla-tions between the two assessment procedures were rela-tively high (0683 for left Hc and 0766 for right Hc)

Figure 1

Mean differences between the two estimation methods were more pronounced in the young age group

Figure 2

Manual segmentation yielded a pronounced hemispheric differ-

ence in hippocampal volume FreeSurfer also segmented the

right Hc slightly larger than the left however with a consider-

ably smaller effect size reflected in a significant Hemisphere 3

Method interaction

r Wenger et al r

r 6 r

Absolute agreements between the two measures howeverwere considerably lower as FreeSurfer estimated volumesto be higher This volume difference was larger in theyoung than in the old FreeSurfer detected a significantage difference in hippocampal volume whereas manualtracing did not However manual tracing resulted in a sig-nificant difference between left and right Hc whereasFreeSurfer segmented both sides in a more similar man-ner FreeSurfer overestimated hippocampal size independ-ently of manually assessed volume in the younger agegroup but relatively underestimated larger volumes inolder adults thus introducing a bias in this age group

To our knowledge no other investigation up to date hascompared manual segmentation with FreeSurfer segmenta-tion in healthy younger adults and healthy older adultswithin the same study Doing so has allowed us to dis-cover an age-differential segmentation bias that wouldhave gone unnoticed otherwise For younger adults indi-vidual differences in volume estimates derived from Free-Surfer and manual segmentation were highly correlatedas the mean volumes estimates with FreeSurfer exceededmanually obtained estimates by a constant amount In con-trast for older adults adding a constant value did notcapture differences between the two methods as thedegree of disparity between the two methods was foundto depend on the size of manually estimated volumesSpecifically the difference between FreeSurfer and manualsegmentation decreased as the manually segmented hippo-campus volumes increased pointing to a differential biastowards relatively underestimating larger volumes in olderadults with FreeSurfer compared to manual segmentationThus FreeSurfer may yield cross-sectional age-group dif-ferences in instances in which manual segmentation wouldnot This finding is novel and important for the field ashippocampal segmentation based on FreeSurfer is increas-

ingly being used in studies on normal cognitive agingpathological cognitive aging and in studies involvingelderly populations suffering from some kind of disease

These results thus portray a twofold picture of reliabilityand validity in manual and automatic Hc segmentationsIn the younger age group FreeSurfer can be regarded as areliable and valid method for assessing differences in hip-pocampal volume with at least as high reliability as man-ual tracing However in the older age group theoverestimation with FreeSurfer was dependent on ldquotruerdquohippocampal volume as it was smaller for larger hippo-campi than for smaller hippocampi Clearly these novelresults concerning an age-differential bias in older hippo-campi are important for the growing field using automatichippocampal segmentations in aging and diseasedpopulations

The absence of a significant main effect of age for ourmanual segmentation method may be regarded as surpris-ing given previous reports of hippocampal shrinkage innormal aging [eg Raz et al 2005 Scahill et al 2003]However there have also been studies finding no cross-sectional differences in hippocampal volume with age[Bigler et al 1997 Du et al 2006 Liu et al 2003 Sullivanet al 1995 2005 Szentkuti et al 2004] It has also beenshown previously that several individuals from an olderage group in their seventies can indeed have a similar oreven larger hippocampal volume than 20ndash30 year oldadults and that the dispersion around the mean of hippo-campal volume does not necessarily have to differ foryounger and older adults [Lupien et al 2007] which isexactly what we see in our data from manual tracing Fur-thermore it is of course important to note the strict inclu-sion criteria that had to be met in order to participate inthe study Participants had to be able to walk on a tread-mill without any gait problems be free from neurological

Figure 3

Difference between FreeSurfer and manual estimates as a function of manually segmented ldquotruerdquo

hippocampal volume decreases in the old age group but stays stable for the young age group

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 7 r

or other severe diseases and had to be able to undergoMRI scanning which removed participants with implantsTherefore participants in our older age group may be clas-sified as high-functioning elderly adults As such they are

more likely to have a better-preserved brain structure thanthe average person from this age cohort

The bias in volume estimation within FreeSurfer in theolder age group is not only interfering with cross-sectional

Figure 4

A Small Hc of a young participant B Large Hc of a young participant C Small Hc of an old par-

ticipant D Large Hc of an old participant

r Wenger et al r

r 8 r

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 3: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

AD (n 5 11 mean age 5 756) as well as in healthy controls(n 5 38 mean age 5 706) The two methods agreedstrongly confirming FreeSurferrsquos potential to determine hip-pocampal volumes in large-scale studies even though therewas a systematic volume difference between FreeSurfer andmanual results The authors argue that this difference stemsfrom FreeSurferrsquos tendency to be more inclusive especiallyin the tail region and to have a few local excursions on thesurfaces Tae et al [2008]) conducted a study on 21 femalepatients aged 18ndash60 years with major depressive disorder inwhich FreeSurfer showed good agreement with manual hip-pocampal volume and detected hippocampal atrophy in thepatients compared with healthy controls Similar to thestudies mentioned earlier the authors speculated that thevolume overestimation was caused by an expansion of theHc segmentation into adjacent white matter at the bottom ofthe Hc as well as the inclusion of the entorhinal cortex andparts of the lateral ventricle and an overall inaccurate dif-ferentiation of the Hc from the amygdala Oscar-Bermanand Song [2011] investigated the usefulness of FreeSurfer in32 alcoholics and 37 controls with a mean age of about 53years and showed that many brain structures as well as theHc could be segmented reliably but correlations betweenFreeSurfer and a semiautomated-supervised-system werelowest in subjects with the greatest abnormalities Sanchez-Benavides et al [2010] compared hippocampal automatedmeasures with manual tracings in healthy (n 5 41) mildcognitive impairment (MCI n 5 23) and Alzheimerrsquos dis-ease patients (n 5 25 covering the age range of 60ndash80 years)and found adequate validity for FreeSurfer estimates with ageneral tendency for volume overestimation even morepronounced in atrophic brains

In this study we compare results from a manual seg-mentation protocol that has extensively been described inLeuroovden et al [2012] mainly following the guidelines byPruessner et al [2000] to an automated segmentation ofthe hippocampus done with the FreeSurfer toolbox (wwwsurfernmrmghharvardedu) Here we focus on twoaspects that have not received sufficient attention so farFirst we investigate potential age group differences in reli-ability and validity of the automated volume estimationmethod of FreeSurfer As an aged brain could also be seenas a brain with abnormalities that are more or less pro-nounced depending on interindividual differences in bio-logical aging it is warranted to examine potentiallimitations of FreeSurfer in this population when compar-ing different age groups Second we investigate potentialdifferences with respect to the segmentation of left andright Hc Brain morphometric studies often incorporatehemispheric asymmetry analyses of segmented brainstructures Such asymmetry has been studied in manyneurological conditions for example post-traumatic stressdisorder [Pavic et al 2007] MCI and Alzheimerrsquos disease[Chetelat and Baron 2003 Geroldi et al 2000 Shi et al2009] and depression [Kronmeurouller et al 2009] Since ananalysis of hippocampal asymmetry should be important

not only for diagnostic purposes it is warranted to investi-gate the performance of automatic segmentation methodscompared to manual tracing more closely also in normalpopulations

METHODS

Participants and Study Design

To compare manual and automated segmentation of Hcwe used a previously acquired dataset including 44younger participants (20ndash30 years M 5 260 SD 5 28)and 47 older participants (60ndash70 years M 5 650 SD 5

28) A detailed description of the effective sample and thedesign of the study have been reported elsewhere [seeLeuroovden et al 2011 2012 Wenger et al 2012] In short thestudy consisted of a pretest measure (henceforth calledtime point A) with magnetic resonance imaging (MRI) atraining phase of about 4 months posttest 1 (henceforthcalled time point B) with MRI immediately after the train-ing phase a period of another 4 months without any train-ing followed by posttest 2 (henceforth called time pointC) with a final MRI measure Participants in the experi-mental group performed a navigation task in a virtualenvironment of a zoo while walking on a treadmill Partic-ipants in the control group walked the exact same amountof time on a treadmill but without performing the naviga-tion task Here we analyze participants from both groupstogether and the data from all three time points Since theresults for each time point are highly similar we willfocus on time point A The replicated results from timepoint B and C can be found in the appendix For cross-time point analyses we use training group as a covariateto control for differential training effects across group andtime which have already been reported elsewhere[Leuroovden et al 2011 2012 Wenger et al 2012]

MRI Acquisition

Three high-resolution T1-weighted images from each par-ticipant were acquired on the same 3 Tesla Magnetom Trioscanner (Siemens Erlangen Germany) with an 8-channelphased-array head coil We used an MPRAGE sequencewith the following parameters TE 5 512 ms TR 5 2600 msTI 5 1100 ms flip angle 5 7 bandwidth 5 140 Hzpixelmatrix 5 320 3 320 3 240 isometric voxel size 5 08 mm3

Manual Tracing

All manual tracing was performed based on the T1-weighted MPRAGE images using a stylus on a WacomDTU-710 pen tablet (Wacom Technology Vancouver WA)and the Analyze 81 software package (AnalyzeDirectOverland Park KS)

The two experienced raters were always blinded togroup and measurement occasion before performing

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 3 r

segmentation Images were displayed in native space at amagnification factor of 3 We manually aligned all imagesand determined the anterior and posterior limits of thehippocampus to define the set of relevant slices Bothraters worked on a subset of randomly chosen slices of thesame size to minimize rater-related biases in single-datasets [Raz et al 2004] To enhance intraperson reliabilityacross the 3 measurement occasions we used a 2-step seg-mentation procedure In the first step one data set fromeach participant was randomly chosen and segmentedaccording to the protocol described below The resultingregions of interest (ROIs) were then coregistered to theremaining two data sets of that person using automaticrigid-body coregistration routines provided in Analyze 81All data sets were then anonymized with respect to themeasurement point such that the raters were unaware ofthe origin of the template ROI The final segmentation wasdone on the basis of both the segmentation protocol andthe template that overlaid the images from each timepoint The entire Hc was segmented anterior-to-posteriorusing the coronal view as default Generally our approachfollowed the protocol provided by Pruessner et al [2000]and was based on intrinsic anatomic properties of the hip-pocampus [Duvernoy 2005] following a number of guid-ing rules Full details of our protocol are reported asSupporting Information To determine the reliability of theROIs that entered our data analysis we randomly chose asubsample of 20 data sets after the segmentation of thewhole sample was completed A parallel version of theROIs that already existed for these data sets was added sothat each rater now processed those slices that the otherrater had processed before The interrater agreements forall slice-based and volume-based comparisons exceeded093 [ICC2 Shrout and Fleiss 1979]

FreeSurfer Segmentation

All cortical reconstruction and volumetric segmentationwere performed with the FreeSurfer software package ver-sion 53 using the longitudinal processing scheme imple-mented to incorporate the subject-wise correlation oflongitudinal data into the processing stream (httpsurfernmrmghharvardedu eg Fischl et al 2002) on a homoge-nous GNULinux based computer cluster comprising of 608CPU cores and a total memory size of 23 TB Assessments oftest-retest reliability of FreeSurfer have revealed high intra-class correlations of 0994 for MPRAGE sequences [Wonder-lick et al 2009] All reconstructed data were visually checkedfor segmentation accuracy at each time point No manualinterventions on the data were performed

Statistical Analysis

All statistical analyses were conducted in SPSS 20 withan alpha level for all analyses of p 5 005 We calculated sta-bility coefficients over time for the two methods with par-tial Pearson correlations using training group as a covariate

These test-retest correlations represent lower bound esti-mates of retest-reliability rather than test-retest reliabilityper se given that they are calculated on a data set in whichnot only measurement error can introduce variance but alsoboth normal aging and training effects can allow for truevariance in change All other analyses were run using thedata from the first time point A Replication of the reportedpattern of results on the data of time point B and time pointC can be found in the appendix

We calculated bivariate Pearson correlations as well asintra-class correlation coefficients (ICC2) between the twoestimation methods Both measures can be used to comparetwo raters or methods The Pearson correlation is a measureof interclass agreement that represents the overlap in rela-tive ordering of individual data points in two data sets Incontrast ICC are based on the assumption that two data setspertain to one class with common variance and mean[Shrout and Fleiss 1979 McGraw and Wong 1996] As aconsequence ICCs of the ratings of two methods do notonly decrease in response to relative ordering of the singledata points but also in response to differences in variance ordifferences in mean In that sense ICCs provide more strin-gent information about rater agreement [Meurouller andBeurouttner 1994] Conceptually ICCs are more appropriate toquantify the agreement between manual and automatedsegmentation procedures To provide better comparabilityto results of earlier studies [Cherbuin et al 2009 Morreyet al 2009 Sanchez-Benavides et al 2010] however wealso report Pearson correlations

As Shrout and Fleiss [1979] point out in their seminalarticle there are many different types of ICC and the ver-sion used must be selected carefully Following sugges-tions by McGraw and Wong [1996] we used a two-waymixed model single measure intraclass correlation withagreement type as is appropriate for the data at hand

To investigate potential age and hemisphere differenceswe conducted a 3-way repeated measures analysis of var-iance (ANOVA) with hemisphere (left right) and method(manual FreeSurfer) as within-subjects factors and age(young old) as a between-subjects factor To trace thesource of significant interactions we calculated 2-wayrepeated measures ANOVAs for each method with hemi-sphere and age as factors In addition we conductedpaired-samples t-tests to assess differences between esti-mation methods separately for the two hemispheres andbetween hemispheres for each estimation method Weconducted these follow-up analyses across as well aswithin age groups To assess potential bias in the auto-matic volume estimation we computed bivariate Pearsonrsquoscorrelations between manual estimates and the differencebetween FreeSurfer and manual estimates

RESULTS

Stability Coefficients Over Time

Stability coefficients are summarized in Table I Gener-ally partial correlations controlling for training group

r Wenger et al r

r 4 r

between the 3 measurement points were very high Formanual tracing correlations were comparably high foryounger and older adults alike with no significant differ-ences between correlations For FreeSurfer segmentationshowever we observed significant age-group differences inthe correlations Correlations were lower for the older thanfor the younger age group between some time pointsBetween pretest and posttest 1 and posttest 1 and posttest2 correlations were significantly lower in the older agegroup (computed with Fisher r-to-z transformation z 227 p lt 005 see Table I for exact correlation values)

Correlation Between FreeSurfer and Manual

Volumes

Results for Pearson correlations between manual andFreeSurfer estimates as well as ICC values are listed inTable II Pearson correlations between manual and Free-Surfer segmentations were 688 for the left Hc and 785 forthe right Hc Correlations between manual and FreeSurfersegmentations were numerically though not significantlyhigher in the young (rleft 5 0824 rright 5 0890) than inthe old (rleft 5 0659 rright 5 0784) ICC estimates wereconsiderably lower than Pearson correlations 0321 for theleft Hc and 0466 for the right Hc In addition age groupcomparisons indicated that the numerical pattern of agree-ment was reversed compared with Pearson correlationsICCs were lower in the younger group (ICCleft 5 0275ICCright 5 0381) than in the older group (ICCleft 5 0361ICCright 5 0577) reflecting a more pronounced disagree-

ment between estimates from manual and automated seg-mentation in the younger age group

Mean Differences Between FreeSurfer and

Manual Volumes as a Function of Age and

Hemisphere

The 3-way ANOVA with Hemisphere Method andAge as factors revealed a significant main effect ofMethod F(189) 5 56539 p lt 0001 h2

p 5 0864 indicat-ing a consistently higher volume estimation in FreeSur-fer tleft(90) 5 1712 p lt 0001 r2 5 0765 tright(90) 5

1584 p lt 0001 r2 5 0736 (see Table III for means andstandard deviations) Further we obtained a significantMethod 3 Age Group interaction F(189) 5 7493 p lt0001 h2

p 5 0457 reflecting two facts First there was alarger mean volume difference in younger tleft(43) 5

2100 p lt 0001 r2 5 0911 tright(43) 5 2131 p lt 0001r2 5 0914 than in older adults tleft(46) 5 1049 p lt0001 r2 5 705 tright(46) 5 880 p lt 001 r2 5 627 Sec-ond there was a significant age-related difference in hip-pocampal volume for FreeSurfer but not for manualsegmentation (Fig 1) Hence the main effect of Age inthe 2-way ANOVA was reliable for FreeSurfer (F(189) 5

4057 p lt 0001 h2p 5 0313) but not for manual tracing

(F(189) 5 194 p 5 0168 h2p 5 0021)

The significant Hemisphere 3 Method interaction in the3-way ANOVA F(189) 5 2547 p lt 0001 h2

p 5 0223indicates differences in hemispheric segmentation betweenthe two methods (see Fig 2) Specifically with manual

TABLE I Stability coefficients over time separate for

age groups controlling for training group

rpre post1 rpost1 post2 rpre post2

Left Right Left Right Left Right

YoungManual 0973 0978 0987 0983 0977 0976FreeSurfer 0987 0991 0992 0994 0987 0990

OldManual 0974 0982 0978 0982 0979 0980FreeSurfer 0967 0976 0970 0978 0977 0981

TABLE II Bivariate Pearson correlations and intra-class

correlations (agreement type) between manual segmen-

tations and FreeSurfer estimates at time point A

rmanual FreeSurfer

Partial r controllingfor age ryoung rold

Left 0688 0751 0824 0659Right 0785 0836 0890 0784

ICCoverall ICCyoung ICCold

Left 0321 0275 0361Right 0466 0381 0577

TABLE III Means and standard deviations of manual and FreeSurfer estimates as well as absolute differences

between the two methods at first measurement time point A

Left Right

Manual FreeSurfer Manual FreeSurferM 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 353014 6 37931 427572 6 49254 74558 371403 6 41350 424482 6 51514 530579Young (n 5 44) 358256 6 40077 446469 6 49117 88213 377816 6 41627 452015 6 501762 74199Old (n 5 47) 348108 6 35535 390519 6 30738 42411 365398 6 40612 398706 6 37893 33308

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 5 r

segmentation right Hc was estimated to be significantlylarger than left Hc t(90) 5 1093 p lt 0001 r2 5 0570FreeSurfer also estimated the right Hc slightly larger thanthe left however with a considerably smaller effect sizet(90) 5 286 p 5 0005 r2 5 0083

Correlation of Difference Between FreeSurfer

and Manual Estimates and Manual Segmentation

as a Proxy for True Hippocampal Volume

In the younger age group there was no relation betweenmanually assessed hippocampal volume and the differencebetween FreeSurfer and manual estimates neither in leftHc r 5 0013 p 5 0931 nor in right Hc r 5 0130 p 5

0401 However in the older age group there was a nega-tive association both in left Hc r 5 20551 p lt 0001 andin right Hc r 5 20421 p 5 0003 This suggests a bias inautomatic volume estimation for the old age group Specif-ically in the younger age group FreeSurfer overestimatedhippocampal volume and it did so independently of man-ually assessed hippocampal volume (Fig 3 see also Fig 4for exemplary visualization of small and large young andold Hc segmentations) By contrast in the older age groupthe difference between FreeSurfer and manual segmenta-tion decreased as the manually segmented Hc sizeincreased indicating a differential bias toward relativelyunderestimating larger volumes in older adults

The same analyses were run on the second and thirdmeasurement time point where results display the exactsame pattern as on the first time point (see appendix)

DISCUSSION

Our results reveal high stability coefficients over timefor both manual and FreeSurfer segmentations With Free-Surfer correlations over time were significantly lower inthe older than in the younger age group which was notthe case with manual segmentation The Pearson correla-tions between the two assessment procedures were rela-tively high (0683 for left Hc and 0766 for right Hc)

Figure 1

Mean differences between the two estimation methods were more pronounced in the young age group

Figure 2

Manual segmentation yielded a pronounced hemispheric differ-

ence in hippocampal volume FreeSurfer also segmented the

right Hc slightly larger than the left however with a consider-

ably smaller effect size reflected in a significant Hemisphere 3

Method interaction

r Wenger et al r

r 6 r

Absolute agreements between the two measures howeverwere considerably lower as FreeSurfer estimated volumesto be higher This volume difference was larger in theyoung than in the old FreeSurfer detected a significantage difference in hippocampal volume whereas manualtracing did not However manual tracing resulted in a sig-nificant difference between left and right Hc whereasFreeSurfer segmented both sides in a more similar man-ner FreeSurfer overestimated hippocampal size independ-ently of manually assessed volume in the younger agegroup but relatively underestimated larger volumes inolder adults thus introducing a bias in this age group

To our knowledge no other investigation up to date hascompared manual segmentation with FreeSurfer segmenta-tion in healthy younger adults and healthy older adultswithin the same study Doing so has allowed us to dis-cover an age-differential segmentation bias that wouldhave gone unnoticed otherwise For younger adults indi-vidual differences in volume estimates derived from Free-Surfer and manual segmentation were highly correlatedas the mean volumes estimates with FreeSurfer exceededmanually obtained estimates by a constant amount In con-trast for older adults adding a constant value did notcapture differences between the two methods as thedegree of disparity between the two methods was foundto depend on the size of manually estimated volumesSpecifically the difference between FreeSurfer and manualsegmentation decreased as the manually segmented hippo-campus volumes increased pointing to a differential biastowards relatively underestimating larger volumes in olderadults with FreeSurfer compared to manual segmentationThus FreeSurfer may yield cross-sectional age-group dif-ferences in instances in which manual segmentation wouldnot This finding is novel and important for the field ashippocampal segmentation based on FreeSurfer is increas-

ingly being used in studies on normal cognitive agingpathological cognitive aging and in studies involvingelderly populations suffering from some kind of disease

These results thus portray a twofold picture of reliabilityand validity in manual and automatic Hc segmentationsIn the younger age group FreeSurfer can be regarded as areliable and valid method for assessing differences in hip-pocampal volume with at least as high reliability as man-ual tracing However in the older age group theoverestimation with FreeSurfer was dependent on ldquotruerdquohippocampal volume as it was smaller for larger hippo-campi than for smaller hippocampi Clearly these novelresults concerning an age-differential bias in older hippo-campi are important for the growing field using automatichippocampal segmentations in aging and diseasedpopulations

The absence of a significant main effect of age for ourmanual segmentation method may be regarded as surpris-ing given previous reports of hippocampal shrinkage innormal aging [eg Raz et al 2005 Scahill et al 2003]However there have also been studies finding no cross-sectional differences in hippocampal volume with age[Bigler et al 1997 Du et al 2006 Liu et al 2003 Sullivanet al 1995 2005 Szentkuti et al 2004] It has also beenshown previously that several individuals from an olderage group in their seventies can indeed have a similar oreven larger hippocampal volume than 20ndash30 year oldadults and that the dispersion around the mean of hippo-campal volume does not necessarily have to differ foryounger and older adults [Lupien et al 2007] which isexactly what we see in our data from manual tracing Fur-thermore it is of course important to note the strict inclu-sion criteria that had to be met in order to participate inthe study Participants had to be able to walk on a tread-mill without any gait problems be free from neurological

Figure 3

Difference between FreeSurfer and manual estimates as a function of manually segmented ldquotruerdquo

hippocampal volume decreases in the old age group but stays stable for the young age group

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 7 r

or other severe diseases and had to be able to undergoMRI scanning which removed participants with implantsTherefore participants in our older age group may be clas-sified as high-functioning elderly adults As such they are

more likely to have a better-preserved brain structure thanthe average person from this age cohort

The bias in volume estimation within FreeSurfer in theolder age group is not only interfering with cross-sectional

Figure 4

A Small Hc of a young participant B Large Hc of a young participant C Small Hc of an old par-

ticipant D Large Hc of an old participant

r Wenger et al r

r 8 r

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 4: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

segmentation Images were displayed in native space at amagnification factor of 3 We manually aligned all imagesand determined the anterior and posterior limits of thehippocampus to define the set of relevant slices Bothraters worked on a subset of randomly chosen slices of thesame size to minimize rater-related biases in single-datasets [Raz et al 2004] To enhance intraperson reliabilityacross the 3 measurement occasions we used a 2-step seg-mentation procedure In the first step one data set fromeach participant was randomly chosen and segmentedaccording to the protocol described below The resultingregions of interest (ROIs) were then coregistered to theremaining two data sets of that person using automaticrigid-body coregistration routines provided in Analyze 81All data sets were then anonymized with respect to themeasurement point such that the raters were unaware ofthe origin of the template ROI The final segmentation wasdone on the basis of both the segmentation protocol andthe template that overlaid the images from each timepoint The entire Hc was segmented anterior-to-posteriorusing the coronal view as default Generally our approachfollowed the protocol provided by Pruessner et al [2000]and was based on intrinsic anatomic properties of the hip-pocampus [Duvernoy 2005] following a number of guid-ing rules Full details of our protocol are reported asSupporting Information To determine the reliability of theROIs that entered our data analysis we randomly chose asubsample of 20 data sets after the segmentation of thewhole sample was completed A parallel version of theROIs that already existed for these data sets was added sothat each rater now processed those slices that the otherrater had processed before The interrater agreements forall slice-based and volume-based comparisons exceeded093 [ICC2 Shrout and Fleiss 1979]

FreeSurfer Segmentation

All cortical reconstruction and volumetric segmentationwere performed with the FreeSurfer software package ver-sion 53 using the longitudinal processing scheme imple-mented to incorporate the subject-wise correlation oflongitudinal data into the processing stream (httpsurfernmrmghharvardedu eg Fischl et al 2002) on a homoge-nous GNULinux based computer cluster comprising of 608CPU cores and a total memory size of 23 TB Assessments oftest-retest reliability of FreeSurfer have revealed high intra-class correlations of 0994 for MPRAGE sequences [Wonder-lick et al 2009] All reconstructed data were visually checkedfor segmentation accuracy at each time point No manualinterventions on the data were performed

Statistical Analysis

All statistical analyses were conducted in SPSS 20 withan alpha level for all analyses of p 5 005 We calculated sta-bility coefficients over time for the two methods with par-tial Pearson correlations using training group as a covariate

These test-retest correlations represent lower bound esti-mates of retest-reliability rather than test-retest reliabilityper se given that they are calculated on a data set in whichnot only measurement error can introduce variance but alsoboth normal aging and training effects can allow for truevariance in change All other analyses were run using thedata from the first time point A Replication of the reportedpattern of results on the data of time point B and time pointC can be found in the appendix

We calculated bivariate Pearson correlations as well asintra-class correlation coefficients (ICC2) between the twoestimation methods Both measures can be used to comparetwo raters or methods The Pearson correlation is a measureof interclass agreement that represents the overlap in rela-tive ordering of individual data points in two data sets Incontrast ICC are based on the assumption that two data setspertain to one class with common variance and mean[Shrout and Fleiss 1979 McGraw and Wong 1996] As aconsequence ICCs of the ratings of two methods do notonly decrease in response to relative ordering of the singledata points but also in response to differences in variance ordifferences in mean In that sense ICCs provide more strin-gent information about rater agreement [Meurouller andBeurouttner 1994] Conceptually ICCs are more appropriate toquantify the agreement between manual and automatedsegmentation procedures To provide better comparabilityto results of earlier studies [Cherbuin et al 2009 Morreyet al 2009 Sanchez-Benavides et al 2010] however wealso report Pearson correlations

As Shrout and Fleiss [1979] point out in their seminalarticle there are many different types of ICC and the ver-sion used must be selected carefully Following sugges-tions by McGraw and Wong [1996] we used a two-waymixed model single measure intraclass correlation withagreement type as is appropriate for the data at hand

To investigate potential age and hemisphere differenceswe conducted a 3-way repeated measures analysis of var-iance (ANOVA) with hemisphere (left right) and method(manual FreeSurfer) as within-subjects factors and age(young old) as a between-subjects factor To trace thesource of significant interactions we calculated 2-wayrepeated measures ANOVAs for each method with hemi-sphere and age as factors In addition we conductedpaired-samples t-tests to assess differences between esti-mation methods separately for the two hemispheres andbetween hemispheres for each estimation method Weconducted these follow-up analyses across as well aswithin age groups To assess potential bias in the auto-matic volume estimation we computed bivariate Pearsonrsquoscorrelations between manual estimates and the differencebetween FreeSurfer and manual estimates

RESULTS

Stability Coefficients Over Time

Stability coefficients are summarized in Table I Gener-ally partial correlations controlling for training group

r Wenger et al r

r 4 r

between the 3 measurement points were very high Formanual tracing correlations were comparably high foryounger and older adults alike with no significant differ-ences between correlations For FreeSurfer segmentationshowever we observed significant age-group differences inthe correlations Correlations were lower for the older thanfor the younger age group between some time pointsBetween pretest and posttest 1 and posttest 1 and posttest2 correlations were significantly lower in the older agegroup (computed with Fisher r-to-z transformation z 227 p lt 005 see Table I for exact correlation values)

Correlation Between FreeSurfer and Manual

Volumes

Results for Pearson correlations between manual andFreeSurfer estimates as well as ICC values are listed inTable II Pearson correlations between manual and Free-Surfer segmentations were 688 for the left Hc and 785 forthe right Hc Correlations between manual and FreeSurfersegmentations were numerically though not significantlyhigher in the young (rleft 5 0824 rright 5 0890) than inthe old (rleft 5 0659 rright 5 0784) ICC estimates wereconsiderably lower than Pearson correlations 0321 for theleft Hc and 0466 for the right Hc In addition age groupcomparisons indicated that the numerical pattern of agree-ment was reversed compared with Pearson correlationsICCs were lower in the younger group (ICCleft 5 0275ICCright 5 0381) than in the older group (ICCleft 5 0361ICCright 5 0577) reflecting a more pronounced disagree-

ment between estimates from manual and automated seg-mentation in the younger age group

Mean Differences Between FreeSurfer and

Manual Volumes as a Function of Age and

Hemisphere

The 3-way ANOVA with Hemisphere Method andAge as factors revealed a significant main effect ofMethod F(189) 5 56539 p lt 0001 h2

p 5 0864 indicat-ing a consistently higher volume estimation in FreeSur-fer tleft(90) 5 1712 p lt 0001 r2 5 0765 tright(90) 5

1584 p lt 0001 r2 5 0736 (see Table III for means andstandard deviations) Further we obtained a significantMethod 3 Age Group interaction F(189) 5 7493 p lt0001 h2

p 5 0457 reflecting two facts First there was alarger mean volume difference in younger tleft(43) 5

2100 p lt 0001 r2 5 0911 tright(43) 5 2131 p lt 0001r2 5 0914 than in older adults tleft(46) 5 1049 p lt0001 r2 5 705 tright(46) 5 880 p lt 001 r2 5 627 Sec-ond there was a significant age-related difference in hip-pocampal volume for FreeSurfer but not for manualsegmentation (Fig 1) Hence the main effect of Age inthe 2-way ANOVA was reliable for FreeSurfer (F(189) 5

4057 p lt 0001 h2p 5 0313) but not for manual tracing

(F(189) 5 194 p 5 0168 h2p 5 0021)

The significant Hemisphere 3 Method interaction in the3-way ANOVA F(189) 5 2547 p lt 0001 h2

p 5 0223indicates differences in hemispheric segmentation betweenthe two methods (see Fig 2) Specifically with manual

TABLE I Stability coefficients over time separate for

age groups controlling for training group

rpre post1 rpost1 post2 rpre post2

Left Right Left Right Left Right

YoungManual 0973 0978 0987 0983 0977 0976FreeSurfer 0987 0991 0992 0994 0987 0990

OldManual 0974 0982 0978 0982 0979 0980FreeSurfer 0967 0976 0970 0978 0977 0981

TABLE II Bivariate Pearson correlations and intra-class

correlations (agreement type) between manual segmen-

tations and FreeSurfer estimates at time point A

rmanual FreeSurfer

Partial r controllingfor age ryoung rold

Left 0688 0751 0824 0659Right 0785 0836 0890 0784

ICCoverall ICCyoung ICCold

Left 0321 0275 0361Right 0466 0381 0577

TABLE III Means and standard deviations of manual and FreeSurfer estimates as well as absolute differences

between the two methods at first measurement time point A

Left Right

Manual FreeSurfer Manual FreeSurferM 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 353014 6 37931 427572 6 49254 74558 371403 6 41350 424482 6 51514 530579Young (n 5 44) 358256 6 40077 446469 6 49117 88213 377816 6 41627 452015 6 501762 74199Old (n 5 47) 348108 6 35535 390519 6 30738 42411 365398 6 40612 398706 6 37893 33308

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 5 r

segmentation right Hc was estimated to be significantlylarger than left Hc t(90) 5 1093 p lt 0001 r2 5 0570FreeSurfer also estimated the right Hc slightly larger thanthe left however with a considerably smaller effect sizet(90) 5 286 p 5 0005 r2 5 0083

Correlation of Difference Between FreeSurfer

and Manual Estimates and Manual Segmentation

as a Proxy for True Hippocampal Volume

In the younger age group there was no relation betweenmanually assessed hippocampal volume and the differencebetween FreeSurfer and manual estimates neither in leftHc r 5 0013 p 5 0931 nor in right Hc r 5 0130 p 5

0401 However in the older age group there was a nega-tive association both in left Hc r 5 20551 p lt 0001 andin right Hc r 5 20421 p 5 0003 This suggests a bias inautomatic volume estimation for the old age group Specif-ically in the younger age group FreeSurfer overestimatedhippocampal volume and it did so independently of man-ually assessed hippocampal volume (Fig 3 see also Fig 4for exemplary visualization of small and large young andold Hc segmentations) By contrast in the older age groupthe difference between FreeSurfer and manual segmenta-tion decreased as the manually segmented Hc sizeincreased indicating a differential bias toward relativelyunderestimating larger volumes in older adults

The same analyses were run on the second and thirdmeasurement time point where results display the exactsame pattern as on the first time point (see appendix)

DISCUSSION

Our results reveal high stability coefficients over timefor both manual and FreeSurfer segmentations With Free-Surfer correlations over time were significantly lower inthe older than in the younger age group which was notthe case with manual segmentation The Pearson correla-tions between the two assessment procedures were rela-tively high (0683 for left Hc and 0766 for right Hc)

Figure 1

Mean differences between the two estimation methods were more pronounced in the young age group

Figure 2

Manual segmentation yielded a pronounced hemispheric differ-

ence in hippocampal volume FreeSurfer also segmented the

right Hc slightly larger than the left however with a consider-

ably smaller effect size reflected in a significant Hemisphere 3

Method interaction

r Wenger et al r

r 6 r

Absolute agreements between the two measures howeverwere considerably lower as FreeSurfer estimated volumesto be higher This volume difference was larger in theyoung than in the old FreeSurfer detected a significantage difference in hippocampal volume whereas manualtracing did not However manual tracing resulted in a sig-nificant difference between left and right Hc whereasFreeSurfer segmented both sides in a more similar man-ner FreeSurfer overestimated hippocampal size independ-ently of manually assessed volume in the younger agegroup but relatively underestimated larger volumes inolder adults thus introducing a bias in this age group

To our knowledge no other investigation up to date hascompared manual segmentation with FreeSurfer segmenta-tion in healthy younger adults and healthy older adultswithin the same study Doing so has allowed us to dis-cover an age-differential segmentation bias that wouldhave gone unnoticed otherwise For younger adults indi-vidual differences in volume estimates derived from Free-Surfer and manual segmentation were highly correlatedas the mean volumes estimates with FreeSurfer exceededmanually obtained estimates by a constant amount In con-trast for older adults adding a constant value did notcapture differences between the two methods as thedegree of disparity between the two methods was foundto depend on the size of manually estimated volumesSpecifically the difference between FreeSurfer and manualsegmentation decreased as the manually segmented hippo-campus volumes increased pointing to a differential biastowards relatively underestimating larger volumes in olderadults with FreeSurfer compared to manual segmentationThus FreeSurfer may yield cross-sectional age-group dif-ferences in instances in which manual segmentation wouldnot This finding is novel and important for the field ashippocampal segmentation based on FreeSurfer is increas-

ingly being used in studies on normal cognitive agingpathological cognitive aging and in studies involvingelderly populations suffering from some kind of disease

These results thus portray a twofold picture of reliabilityand validity in manual and automatic Hc segmentationsIn the younger age group FreeSurfer can be regarded as areliable and valid method for assessing differences in hip-pocampal volume with at least as high reliability as man-ual tracing However in the older age group theoverestimation with FreeSurfer was dependent on ldquotruerdquohippocampal volume as it was smaller for larger hippo-campi than for smaller hippocampi Clearly these novelresults concerning an age-differential bias in older hippo-campi are important for the growing field using automatichippocampal segmentations in aging and diseasedpopulations

The absence of a significant main effect of age for ourmanual segmentation method may be regarded as surpris-ing given previous reports of hippocampal shrinkage innormal aging [eg Raz et al 2005 Scahill et al 2003]However there have also been studies finding no cross-sectional differences in hippocampal volume with age[Bigler et al 1997 Du et al 2006 Liu et al 2003 Sullivanet al 1995 2005 Szentkuti et al 2004] It has also beenshown previously that several individuals from an olderage group in their seventies can indeed have a similar oreven larger hippocampal volume than 20ndash30 year oldadults and that the dispersion around the mean of hippo-campal volume does not necessarily have to differ foryounger and older adults [Lupien et al 2007] which isexactly what we see in our data from manual tracing Fur-thermore it is of course important to note the strict inclu-sion criteria that had to be met in order to participate inthe study Participants had to be able to walk on a tread-mill without any gait problems be free from neurological

Figure 3

Difference between FreeSurfer and manual estimates as a function of manually segmented ldquotruerdquo

hippocampal volume decreases in the old age group but stays stable for the young age group

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 7 r

or other severe diseases and had to be able to undergoMRI scanning which removed participants with implantsTherefore participants in our older age group may be clas-sified as high-functioning elderly adults As such they are

more likely to have a better-preserved brain structure thanthe average person from this age cohort

The bias in volume estimation within FreeSurfer in theolder age group is not only interfering with cross-sectional

Figure 4

A Small Hc of a young participant B Large Hc of a young participant C Small Hc of an old par-

ticipant D Large Hc of an old participant

r Wenger et al r

r 8 r

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 5: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

between the 3 measurement points were very high Formanual tracing correlations were comparably high foryounger and older adults alike with no significant differ-ences between correlations For FreeSurfer segmentationshowever we observed significant age-group differences inthe correlations Correlations were lower for the older thanfor the younger age group between some time pointsBetween pretest and posttest 1 and posttest 1 and posttest2 correlations were significantly lower in the older agegroup (computed with Fisher r-to-z transformation z 227 p lt 005 see Table I for exact correlation values)

Correlation Between FreeSurfer and Manual

Volumes

Results for Pearson correlations between manual andFreeSurfer estimates as well as ICC values are listed inTable II Pearson correlations between manual and Free-Surfer segmentations were 688 for the left Hc and 785 forthe right Hc Correlations between manual and FreeSurfersegmentations were numerically though not significantlyhigher in the young (rleft 5 0824 rright 5 0890) than inthe old (rleft 5 0659 rright 5 0784) ICC estimates wereconsiderably lower than Pearson correlations 0321 for theleft Hc and 0466 for the right Hc In addition age groupcomparisons indicated that the numerical pattern of agree-ment was reversed compared with Pearson correlationsICCs were lower in the younger group (ICCleft 5 0275ICCright 5 0381) than in the older group (ICCleft 5 0361ICCright 5 0577) reflecting a more pronounced disagree-

ment between estimates from manual and automated seg-mentation in the younger age group

Mean Differences Between FreeSurfer and

Manual Volumes as a Function of Age and

Hemisphere

The 3-way ANOVA with Hemisphere Method andAge as factors revealed a significant main effect ofMethod F(189) 5 56539 p lt 0001 h2

p 5 0864 indicat-ing a consistently higher volume estimation in FreeSur-fer tleft(90) 5 1712 p lt 0001 r2 5 0765 tright(90) 5

1584 p lt 0001 r2 5 0736 (see Table III for means andstandard deviations) Further we obtained a significantMethod 3 Age Group interaction F(189) 5 7493 p lt0001 h2

p 5 0457 reflecting two facts First there was alarger mean volume difference in younger tleft(43) 5

2100 p lt 0001 r2 5 0911 tright(43) 5 2131 p lt 0001r2 5 0914 than in older adults tleft(46) 5 1049 p lt0001 r2 5 705 tright(46) 5 880 p lt 001 r2 5 627 Sec-ond there was a significant age-related difference in hip-pocampal volume for FreeSurfer but not for manualsegmentation (Fig 1) Hence the main effect of Age inthe 2-way ANOVA was reliable for FreeSurfer (F(189) 5

4057 p lt 0001 h2p 5 0313) but not for manual tracing

(F(189) 5 194 p 5 0168 h2p 5 0021)

The significant Hemisphere 3 Method interaction in the3-way ANOVA F(189) 5 2547 p lt 0001 h2

p 5 0223indicates differences in hemispheric segmentation betweenthe two methods (see Fig 2) Specifically with manual

TABLE I Stability coefficients over time separate for

age groups controlling for training group

rpre post1 rpost1 post2 rpre post2

Left Right Left Right Left Right

YoungManual 0973 0978 0987 0983 0977 0976FreeSurfer 0987 0991 0992 0994 0987 0990

OldManual 0974 0982 0978 0982 0979 0980FreeSurfer 0967 0976 0970 0978 0977 0981

TABLE II Bivariate Pearson correlations and intra-class

correlations (agreement type) between manual segmen-

tations and FreeSurfer estimates at time point A

rmanual FreeSurfer

Partial r controllingfor age ryoung rold

Left 0688 0751 0824 0659Right 0785 0836 0890 0784

ICCoverall ICCyoung ICCold

Left 0321 0275 0361Right 0466 0381 0577

TABLE III Means and standard deviations of manual and FreeSurfer estimates as well as absolute differences

between the two methods at first measurement time point A

Left Right

Manual FreeSurfer Manual FreeSurferM 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 353014 6 37931 427572 6 49254 74558 371403 6 41350 424482 6 51514 530579Young (n 5 44) 358256 6 40077 446469 6 49117 88213 377816 6 41627 452015 6 501762 74199Old (n 5 47) 348108 6 35535 390519 6 30738 42411 365398 6 40612 398706 6 37893 33308

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 5 r

segmentation right Hc was estimated to be significantlylarger than left Hc t(90) 5 1093 p lt 0001 r2 5 0570FreeSurfer also estimated the right Hc slightly larger thanthe left however with a considerably smaller effect sizet(90) 5 286 p 5 0005 r2 5 0083

Correlation of Difference Between FreeSurfer

and Manual Estimates and Manual Segmentation

as a Proxy for True Hippocampal Volume

In the younger age group there was no relation betweenmanually assessed hippocampal volume and the differencebetween FreeSurfer and manual estimates neither in leftHc r 5 0013 p 5 0931 nor in right Hc r 5 0130 p 5

0401 However in the older age group there was a nega-tive association both in left Hc r 5 20551 p lt 0001 andin right Hc r 5 20421 p 5 0003 This suggests a bias inautomatic volume estimation for the old age group Specif-ically in the younger age group FreeSurfer overestimatedhippocampal volume and it did so independently of man-ually assessed hippocampal volume (Fig 3 see also Fig 4for exemplary visualization of small and large young andold Hc segmentations) By contrast in the older age groupthe difference between FreeSurfer and manual segmenta-tion decreased as the manually segmented Hc sizeincreased indicating a differential bias toward relativelyunderestimating larger volumes in older adults

The same analyses were run on the second and thirdmeasurement time point where results display the exactsame pattern as on the first time point (see appendix)

DISCUSSION

Our results reveal high stability coefficients over timefor both manual and FreeSurfer segmentations With Free-Surfer correlations over time were significantly lower inthe older than in the younger age group which was notthe case with manual segmentation The Pearson correla-tions between the two assessment procedures were rela-tively high (0683 for left Hc and 0766 for right Hc)

Figure 1

Mean differences between the two estimation methods were more pronounced in the young age group

Figure 2

Manual segmentation yielded a pronounced hemispheric differ-

ence in hippocampal volume FreeSurfer also segmented the

right Hc slightly larger than the left however with a consider-

ably smaller effect size reflected in a significant Hemisphere 3

Method interaction

r Wenger et al r

r 6 r

Absolute agreements between the two measures howeverwere considerably lower as FreeSurfer estimated volumesto be higher This volume difference was larger in theyoung than in the old FreeSurfer detected a significantage difference in hippocampal volume whereas manualtracing did not However manual tracing resulted in a sig-nificant difference between left and right Hc whereasFreeSurfer segmented both sides in a more similar man-ner FreeSurfer overestimated hippocampal size independ-ently of manually assessed volume in the younger agegroup but relatively underestimated larger volumes inolder adults thus introducing a bias in this age group

To our knowledge no other investigation up to date hascompared manual segmentation with FreeSurfer segmenta-tion in healthy younger adults and healthy older adultswithin the same study Doing so has allowed us to dis-cover an age-differential segmentation bias that wouldhave gone unnoticed otherwise For younger adults indi-vidual differences in volume estimates derived from Free-Surfer and manual segmentation were highly correlatedas the mean volumes estimates with FreeSurfer exceededmanually obtained estimates by a constant amount In con-trast for older adults adding a constant value did notcapture differences between the two methods as thedegree of disparity between the two methods was foundto depend on the size of manually estimated volumesSpecifically the difference between FreeSurfer and manualsegmentation decreased as the manually segmented hippo-campus volumes increased pointing to a differential biastowards relatively underestimating larger volumes in olderadults with FreeSurfer compared to manual segmentationThus FreeSurfer may yield cross-sectional age-group dif-ferences in instances in which manual segmentation wouldnot This finding is novel and important for the field ashippocampal segmentation based on FreeSurfer is increas-

ingly being used in studies on normal cognitive agingpathological cognitive aging and in studies involvingelderly populations suffering from some kind of disease

These results thus portray a twofold picture of reliabilityand validity in manual and automatic Hc segmentationsIn the younger age group FreeSurfer can be regarded as areliable and valid method for assessing differences in hip-pocampal volume with at least as high reliability as man-ual tracing However in the older age group theoverestimation with FreeSurfer was dependent on ldquotruerdquohippocampal volume as it was smaller for larger hippo-campi than for smaller hippocampi Clearly these novelresults concerning an age-differential bias in older hippo-campi are important for the growing field using automatichippocampal segmentations in aging and diseasedpopulations

The absence of a significant main effect of age for ourmanual segmentation method may be regarded as surpris-ing given previous reports of hippocampal shrinkage innormal aging [eg Raz et al 2005 Scahill et al 2003]However there have also been studies finding no cross-sectional differences in hippocampal volume with age[Bigler et al 1997 Du et al 2006 Liu et al 2003 Sullivanet al 1995 2005 Szentkuti et al 2004] It has also beenshown previously that several individuals from an olderage group in their seventies can indeed have a similar oreven larger hippocampal volume than 20ndash30 year oldadults and that the dispersion around the mean of hippo-campal volume does not necessarily have to differ foryounger and older adults [Lupien et al 2007] which isexactly what we see in our data from manual tracing Fur-thermore it is of course important to note the strict inclu-sion criteria that had to be met in order to participate inthe study Participants had to be able to walk on a tread-mill without any gait problems be free from neurological

Figure 3

Difference between FreeSurfer and manual estimates as a function of manually segmented ldquotruerdquo

hippocampal volume decreases in the old age group but stays stable for the young age group

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 7 r

or other severe diseases and had to be able to undergoMRI scanning which removed participants with implantsTherefore participants in our older age group may be clas-sified as high-functioning elderly adults As such they are

more likely to have a better-preserved brain structure thanthe average person from this age cohort

The bias in volume estimation within FreeSurfer in theolder age group is not only interfering with cross-sectional

Figure 4

A Small Hc of a young participant B Large Hc of a young participant C Small Hc of an old par-

ticipant D Large Hc of an old participant

r Wenger et al r

r 8 r

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 6: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

segmentation right Hc was estimated to be significantlylarger than left Hc t(90) 5 1093 p lt 0001 r2 5 0570FreeSurfer also estimated the right Hc slightly larger thanthe left however with a considerably smaller effect sizet(90) 5 286 p 5 0005 r2 5 0083

Correlation of Difference Between FreeSurfer

and Manual Estimates and Manual Segmentation

as a Proxy for True Hippocampal Volume

In the younger age group there was no relation betweenmanually assessed hippocampal volume and the differencebetween FreeSurfer and manual estimates neither in leftHc r 5 0013 p 5 0931 nor in right Hc r 5 0130 p 5

0401 However in the older age group there was a nega-tive association both in left Hc r 5 20551 p lt 0001 andin right Hc r 5 20421 p 5 0003 This suggests a bias inautomatic volume estimation for the old age group Specif-ically in the younger age group FreeSurfer overestimatedhippocampal volume and it did so independently of man-ually assessed hippocampal volume (Fig 3 see also Fig 4for exemplary visualization of small and large young andold Hc segmentations) By contrast in the older age groupthe difference between FreeSurfer and manual segmenta-tion decreased as the manually segmented Hc sizeincreased indicating a differential bias toward relativelyunderestimating larger volumes in older adults

The same analyses were run on the second and thirdmeasurement time point where results display the exactsame pattern as on the first time point (see appendix)

DISCUSSION

Our results reveal high stability coefficients over timefor both manual and FreeSurfer segmentations With Free-Surfer correlations over time were significantly lower inthe older than in the younger age group which was notthe case with manual segmentation The Pearson correla-tions between the two assessment procedures were rela-tively high (0683 for left Hc and 0766 for right Hc)

Figure 1

Mean differences between the two estimation methods were more pronounced in the young age group

Figure 2

Manual segmentation yielded a pronounced hemispheric differ-

ence in hippocampal volume FreeSurfer also segmented the

right Hc slightly larger than the left however with a consider-

ably smaller effect size reflected in a significant Hemisphere 3

Method interaction

r Wenger et al r

r 6 r

Absolute agreements between the two measures howeverwere considerably lower as FreeSurfer estimated volumesto be higher This volume difference was larger in theyoung than in the old FreeSurfer detected a significantage difference in hippocampal volume whereas manualtracing did not However manual tracing resulted in a sig-nificant difference between left and right Hc whereasFreeSurfer segmented both sides in a more similar man-ner FreeSurfer overestimated hippocampal size independ-ently of manually assessed volume in the younger agegroup but relatively underestimated larger volumes inolder adults thus introducing a bias in this age group

To our knowledge no other investigation up to date hascompared manual segmentation with FreeSurfer segmenta-tion in healthy younger adults and healthy older adultswithin the same study Doing so has allowed us to dis-cover an age-differential segmentation bias that wouldhave gone unnoticed otherwise For younger adults indi-vidual differences in volume estimates derived from Free-Surfer and manual segmentation were highly correlatedas the mean volumes estimates with FreeSurfer exceededmanually obtained estimates by a constant amount In con-trast for older adults adding a constant value did notcapture differences between the two methods as thedegree of disparity between the two methods was foundto depend on the size of manually estimated volumesSpecifically the difference between FreeSurfer and manualsegmentation decreased as the manually segmented hippo-campus volumes increased pointing to a differential biastowards relatively underestimating larger volumes in olderadults with FreeSurfer compared to manual segmentationThus FreeSurfer may yield cross-sectional age-group dif-ferences in instances in which manual segmentation wouldnot This finding is novel and important for the field ashippocampal segmentation based on FreeSurfer is increas-

ingly being used in studies on normal cognitive agingpathological cognitive aging and in studies involvingelderly populations suffering from some kind of disease

These results thus portray a twofold picture of reliabilityand validity in manual and automatic Hc segmentationsIn the younger age group FreeSurfer can be regarded as areliable and valid method for assessing differences in hip-pocampal volume with at least as high reliability as man-ual tracing However in the older age group theoverestimation with FreeSurfer was dependent on ldquotruerdquohippocampal volume as it was smaller for larger hippo-campi than for smaller hippocampi Clearly these novelresults concerning an age-differential bias in older hippo-campi are important for the growing field using automatichippocampal segmentations in aging and diseasedpopulations

The absence of a significant main effect of age for ourmanual segmentation method may be regarded as surpris-ing given previous reports of hippocampal shrinkage innormal aging [eg Raz et al 2005 Scahill et al 2003]However there have also been studies finding no cross-sectional differences in hippocampal volume with age[Bigler et al 1997 Du et al 2006 Liu et al 2003 Sullivanet al 1995 2005 Szentkuti et al 2004] It has also beenshown previously that several individuals from an olderage group in their seventies can indeed have a similar oreven larger hippocampal volume than 20ndash30 year oldadults and that the dispersion around the mean of hippo-campal volume does not necessarily have to differ foryounger and older adults [Lupien et al 2007] which isexactly what we see in our data from manual tracing Fur-thermore it is of course important to note the strict inclu-sion criteria that had to be met in order to participate inthe study Participants had to be able to walk on a tread-mill without any gait problems be free from neurological

Figure 3

Difference between FreeSurfer and manual estimates as a function of manually segmented ldquotruerdquo

hippocampal volume decreases in the old age group but stays stable for the young age group

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 7 r

or other severe diseases and had to be able to undergoMRI scanning which removed participants with implantsTherefore participants in our older age group may be clas-sified as high-functioning elderly adults As such they are

more likely to have a better-preserved brain structure thanthe average person from this age cohort

The bias in volume estimation within FreeSurfer in theolder age group is not only interfering with cross-sectional

Figure 4

A Small Hc of a young participant B Large Hc of a young participant C Small Hc of an old par-

ticipant D Large Hc of an old participant

r Wenger et al r

r 8 r

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 7: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

Absolute agreements between the two measures howeverwere considerably lower as FreeSurfer estimated volumesto be higher This volume difference was larger in theyoung than in the old FreeSurfer detected a significantage difference in hippocampal volume whereas manualtracing did not However manual tracing resulted in a sig-nificant difference between left and right Hc whereasFreeSurfer segmented both sides in a more similar man-ner FreeSurfer overestimated hippocampal size independ-ently of manually assessed volume in the younger agegroup but relatively underestimated larger volumes inolder adults thus introducing a bias in this age group

To our knowledge no other investigation up to date hascompared manual segmentation with FreeSurfer segmenta-tion in healthy younger adults and healthy older adultswithin the same study Doing so has allowed us to dis-cover an age-differential segmentation bias that wouldhave gone unnoticed otherwise For younger adults indi-vidual differences in volume estimates derived from Free-Surfer and manual segmentation were highly correlatedas the mean volumes estimates with FreeSurfer exceededmanually obtained estimates by a constant amount In con-trast for older adults adding a constant value did notcapture differences between the two methods as thedegree of disparity between the two methods was foundto depend on the size of manually estimated volumesSpecifically the difference between FreeSurfer and manualsegmentation decreased as the manually segmented hippo-campus volumes increased pointing to a differential biastowards relatively underestimating larger volumes in olderadults with FreeSurfer compared to manual segmentationThus FreeSurfer may yield cross-sectional age-group dif-ferences in instances in which manual segmentation wouldnot This finding is novel and important for the field ashippocampal segmentation based on FreeSurfer is increas-

ingly being used in studies on normal cognitive agingpathological cognitive aging and in studies involvingelderly populations suffering from some kind of disease

These results thus portray a twofold picture of reliabilityand validity in manual and automatic Hc segmentationsIn the younger age group FreeSurfer can be regarded as areliable and valid method for assessing differences in hip-pocampal volume with at least as high reliability as man-ual tracing However in the older age group theoverestimation with FreeSurfer was dependent on ldquotruerdquohippocampal volume as it was smaller for larger hippo-campi than for smaller hippocampi Clearly these novelresults concerning an age-differential bias in older hippo-campi are important for the growing field using automatichippocampal segmentations in aging and diseasedpopulations

The absence of a significant main effect of age for ourmanual segmentation method may be regarded as surpris-ing given previous reports of hippocampal shrinkage innormal aging [eg Raz et al 2005 Scahill et al 2003]However there have also been studies finding no cross-sectional differences in hippocampal volume with age[Bigler et al 1997 Du et al 2006 Liu et al 2003 Sullivanet al 1995 2005 Szentkuti et al 2004] It has also beenshown previously that several individuals from an olderage group in their seventies can indeed have a similar oreven larger hippocampal volume than 20ndash30 year oldadults and that the dispersion around the mean of hippo-campal volume does not necessarily have to differ foryounger and older adults [Lupien et al 2007] which isexactly what we see in our data from manual tracing Fur-thermore it is of course important to note the strict inclu-sion criteria that had to be met in order to participate inthe study Participants had to be able to walk on a tread-mill without any gait problems be free from neurological

Figure 3

Difference between FreeSurfer and manual estimates as a function of manually segmented ldquotruerdquo

hippocampal volume decreases in the old age group but stays stable for the young age group

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 7 r

or other severe diseases and had to be able to undergoMRI scanning which removed participants with implantsTherefore participants in our older age group may be clas-sified as high-functioning elderly adults As such they are

more likely to have a better-preserved brain structure thanthe average person from this age cohort

The bias in volume estimation within FreeSurfer in theolder age group is not only interfering with cross-sectional

Figure 4

A Small Hc of a young participant B Large Hc of a young participant C Small Hc of an old par-

ticipant D Large Hc of an old participant

r Wenger et al r

r 8 r

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 8: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

or other severe diseases and had to be able to undergoMRI scanning which removed participants with implantsTherefore participants in our older age group may be clas-sified as high-functioning elderly adults As such they are

more likely to have a better-preserved brain structure thanthe average person from this age cohort

The bias in volume estimation within FreeSurfer in theolder age group is not only interfering with cross-sectional

Figure 4

A Small Hc of a young participant B Large Hc of a young participant C Small Hc of an old par-

ticipant D Large Hc of an old participant

r Wenger et al r

r 8 r

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 9: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

comparisons but also poses challenges on the programsrsquoapplicability in longitudinal studies FreeSurfer would pre-sumably be less sensitive to longitudinal decreases in hip-pocampal volume in samples of older adults as theoverestimation is higher for smaller volumes in this agegroup and would also be less likely to find longitudinalincreases as larger hippocampal volumes are not beingoverestimated to the same extent as smaller ones

Another intriguing difference between the two segmen-tation methods concerns the leftndashright Hc asymmetry Asexpected from the literature [eg Barnes et al 2008 Bassoet al 2006 Honeycutt and Smith 1995 Jack et al 2003Krasuski et al 1998 Pruessner et al 2000] manual seg-mentation yielded a bigger right than left Hc FreeSurferalso estimates the right Hc as slightly bigger but with aconsiderably smaller effect size When we used earlierprocessing versions of FreeSurfer we even saw an absenceof significant differences between the hippocampi Thedata pattern observed in our study is in good agreementwith the left-right asymmetry reported in Sanchez-Benavides et al [2010] Sanchez-Benavides et al [2010]found that the left-right discrepancy was statistically reli-able with manual segmentation but not with FreeSurfersegmentation Although the left-right asymmetric segmen-tations result is repeatedly reported in several studiesthere are also others that do not find a significant differ-ence between left and right hippocampus in healthy indi-viduals both with segmentation on MR images [egBigler et al 1997 Raz et al 2004] and in autopsy data ofpost-mortem brains [Bogerts et al 1990] This ambiguitymight stem from a potential bias in manual segmentationbased on the laterality of visual perception [Maltbie et al2012] It is possible that human tracers do prefer one sideof a displayed brain during tracing and are thus moreaccurate on one side of the brain therefore producing onebigger and one smaller Hc A possible solution could be todisplay both hippocampi on the same side of the monitorby flipping the MR image during segmentation As it can-not be determined from manual segmentation resultsalone how big the true anatomical left-right asymmetryeffect is it cannot be concluded whether FreeSurfer is per-forming erroneously or correctly in producing a consider-ably smaller effect

We have discussed that FreeSurferrsquos reliability andvalidity is different for younger and older hippocampi Itis quite likely that the values for tissue priors and coordi-nates of the default segmentation atlas used in FreeSurferare further away from optimal values for older adults thanfor younger adults The default segmentation atlas in Free-Surfer is created on the basis of 39 middle-aged brains(mean age around 38 years 6 10) primarily drawn from asample described by Goldstein et al [1999] The hippo-campi of this sample were manually segmented accordingto the conventions of the Center for Morphometric Analy-sis [Fischl et al 2002 2004] Probably the brain structureof these middle-aged brains was more similar to the brainstructure of the younger adults of our study who were

aged 20ndash30 years than to the brain structure of the olderadults who were aged 60ndash70 years resulting in moreaccurate segmentation for younger than for older brainsOlder brain structures as displayed on T1-weighted MRimages typically show a less distinct gray-white mattercontrast [Westlye et al 2009] and are more ldquoblurryrdquowhich makes it harder to define the true border aroundthe Hc and its surrounding structures or cerebrospinalfluid However it remains unknown why the automaticsegmentation algorithm did not overestimate the volumefor bigger older Hc in the same way as for smaller olderHc or younger Hc in general Considering the blurrinessof old brain structures on MR images it would have beenmaybe even more plausible if bigger older Hc had beenmore overestimated However the present data yieldedthe opposite pattern Future studies should follow up onthis finding of overestimation in younger and smallerolder Hc coupled with a relative underestimation of largerolder Hc and try to determine sources of this estimationbias A useful step would be to build age group-specificdefault segmentation atlases [see also Avants et al 2010]Given that normal aging is associated with gradual ana-tomical changes [eg Raz et al 2005] and is often modu-lated by age-correlated conditions that affect the brainsuch as cardiovascular and metabolic syndromes theimportance of arriving at an age-adjusted template isobvious Rerunning the FreeSurfer pipeline with custom-ized masks in the background and examining how suchdefault-segmentation atlases may change subcortical vol-ume estimates may help to circumvent the current ambi-guities in the data As the generation of a whole-brainsegmentation atlas is very time-consuming cooperationbetween different laboratories performing manual wholebrain segmentation is desirable with the common goal ofcreating an age-graded segmentation atlas

In general there are marked differences in the labelingprotocol between FreeSurfer and our manual segmenta-tions which followed the rules provided by Pruessneret al [2000] Visual inspection of the graphic overlay ofboth segmentations gives hints to differences in the label-ing protocol especially in the subiculumentorhinalpara-hippocampal regions the border between amygdala andthe hippocampus and the border between the tail of thehippocampus and the lateral ventricle as has also beenreported by others [eg Cherbuin et al 2009 Deweyet al 2010 Morey et al 2009] In all these areas FreeSur-fer seems to be more inclusive whereas the manual trac-ing protocol provided clear rules of where to draw theanatomical boundary between the hippocampus and itssurroundings which is in general difficult to define[Duvernoy 2005 also see Supporting Information for fur-ther information] These differences in segmentation proto-cols might prevent the two methods from beingcompletely interchangeable and might pose challenges onstudies in which hippocampal shape in these regions is ofmain interest The tendency of FreeSurfer towards overin-clusion of boundary voxels and a more liberal definition

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 9 r

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 10: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

of boundaries to the amygdala entorhinal cortex and lat-eral ventricle would not constitute a major drawback formany research questions if it were consistent across indi-viduals and volumes However the results of this studyshow that this tendency varies by volume and age groupresulting in biased comparisons between younger andolder adults and biased individual estimates withingroups of older adults We note a need for furtherimprovements of the FreeSurfer segmentation pipeline toidentify and overcome these sources of bias

To summarize there were differences in segmentationoutcomes between manual and FreeSurfer estimates withregard to age effects and left-right asymmetry with fur-ther investigation needed to establish which segmentationmethod introduces a bias in volume estimation and left-right asymmetry We conclude that FreeSurfer is a credibleand valid method to assess differences in hippocampalvolume in younger age groups and can therefore be a fea-sible method to analyze large datasets of young adultsHowever no definitive statements about the size of leftand right Hc should be made at this point and researchwhere such asymmetries are of importance should takeheed Importantly FreeSurfer estimates in older agegroups should be interpreted with care until the automaticsegmentation pipeline has been further optimized toincrease validity and reliability in this age group

ACKNOWLEDGMENTS

The authors thank Colin Bauer Daniel Bittner MarcelDerks Isabel Dziobek Gabriele Faust Martin KanowskiJeuroorn Kaufmann Kristen Kennedy Shireen Kwiatkowska-Naqvi Naftali Raz Karen Rodrigue Michael SchellenbachClaus Tempelmann and all student assistants

APPENDIX

Similar to the data from the first measurement timepoint correlations between manual and FreeSurfer esti-mates were numerically higher in the younger than in the

older age group at both later time points B and C (seeTable IV) FreeSurfer again yielded consistently higher hip-pocampal volume than manual segmentation which wasagain more pronounced in younger adults Manual seg-mentation resulted in a hemispheric difference betweenleft and right hippocampus whereas FreeSurfer estimatedthem more similarly (see Table V) Replicating resultsfrom the first measurement time point we found no rela-tion between manually assessed hippocampal volume andthe difference between manual and FreeSurfer estimates inyounger adults (time point B rleft Hc 5 0073 p 5 0637rright Hc 5 0123 p 5 0428 time point C rleft Hc 5 0032 p5 0836 rright Hc 5 0178 p 5 0247) but a negative associ-ation in the older age group (time point B rleft Hc 520550 p lt 0001 rright Hc 5 20465 p 5 0001 time pointC rleft Hc 5 20538 p lt 0001 rright Hc 5 20447 p 50002) Again FreeSurfer overestimated hippocampal vol-ume consistently in the young age group but in the olderage group FreeSurfer overestimated large hippocampi to aless extent than small hippocampi

TABLE IV Bivariate Pearson correlations and intraclass

correlations between manual segmentations and Free-

Surfer estimates at time point B and C

rmanual FreeSurfer

Partial r

controllingfor age ryoung rold

Time point BLeft 0695 0746 0829 0641Right 0778 0827 0869 0798

ICCoverall ICC young ICC old

Left 0325 0276 0351Right 0456 0358 0579Time point CLeft 0687 0764 0834 0683Right 0765 0814 0873 0768

ICCoverall ICCyoung ICCold

Left 0323 0266 0397Right 0452 0356 0574

TABLE V Means and standard deviations of manual and FreeSurfer estimates for timepoint B and C as well as

absolute differences between the two methods

Left Right

Manual FreeSurfer Manual FreeSurferTime point B M 6 SD M 6 SD Difference M 6 SD M 6 SD Difference

All (n 5 91) 351988 6 38037 417121 6 50428 65133 369867 6 40758 423330 6 51378 53463Young (n 5 44) 358292 6 39541 446750 6 50165 88458 377119 6 40150 452083 6 49703 74964Old (n 5 47) 346086 6 35993 389289 6 31403 43203 363077 6 40612 396412 6 36390 33335Time point CAll (n 5 91) 352708 6 37973 417100 6 49918 64392 369511 6 41143 423242 6 52822 53731Young (n5 44) 357684 6 39343 447374 6 48206 89690 376450 6 39881 452455 6 50803 76005Old (n 5 47) 348050 6 36448 388758 6 31713 40708 363015 6 41665 395893 6 38285 32878

r Wenger et al r

r 10 r

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 11: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

REFERENCES

Ashburner J Friston KJ (2000) Voxel-based morphometry - Themethods Neuroimage 11805-821 doi httpdxdoiorg101006nimg20000582

Ashburner J Friston KJ (2001) Why voxel-based morphometryshould be used Neuroimage 141238-1243 doi httpdxdoiorg101006nimg20010961

Avants BB Yushkevich P Pluta J Minkoff D Korczykowski MDetre J Gee JC (2010) The optimal template effect in hippo-campus studies of diseased populations Neuroimage 492457-2466 doi httpdxdoiorg101016jneuroimage200909062

Barnes J Foster J Boyes RG Pepple T Moore EK Schott JM FoxNC (2008) A comparison of methods for the automated calcu-lation of volumes and atrophy rates in the hippocampus Neu-roimage 401655-1671 doi httpdxdoiorg101016jneuroimage200801012

Basso M Yang J Warren L MacAvoy MG Varma P Bronen RaVan Dyck CH (2006) Volumetry of amygdala and hippocam-pus and memory performance in Alzheimerrsquos disease Psychia-try Res 146251ndash261 doi101016jpscychresns200601007

Bhatia S Bookheimer SY Gaillard WD Theodore WH (1993)Measurement of whole temporal lobe and hippocampus forMR volumetry Normative data Neurology 432006-2010 doihttpdxdoiorg101212WNL43102006

Bigler ED Blatter DD Anderson CV Johnson SC Gale SDHopkins RO Burnett B (1997) Hippocampal volume in normalaging and traumatic brain injury Am J Neuroradiol 1811-23

Bogerts B Falkai P Haupts M Greve B Ernst S Tapernon-FranzU Heinzmann U (1990) Post-mortem volume measurementsof limbic system and basal ganglia structures in chronic schiz-ophrenics Initial results from a new brain collection Schiz-ophr Res 3295-301 doi httpdxdoiorg1010160920-9964(90)90013-W

Bookstein FL (2001) ldquoVoxel-based morphometryrdquo should not beused with imperfectly registered images Neuroimage 141454-1462 doi httpdxdoiorg101006nimg20010770

Boyke J Driemeyer J Gaser C Beurouchel C May A (2008) Training-induced brain structure changes in the elderly J Neurosci 287031-7035 doi httpdxdoiorg101523JNEUROSCI0742-082008

Bremner JD Randall P Scott TM Bronen RA Seibyl JPSouthwick SM Innis RB (1995) MRI-based measurement ofhippocampal volume in patients with combat-related postrau-matic stress disorder Am J Psychiatry 152973-981

Cherbuin N Anstey KJ Reglade-Meslin C Sachdev PS (2009) Invivo hippocampal measurement and memory A comparisonof manual tracing and automated segmentation in a largecommunity-based sample PLoS One 4e5265 doi httpdxdoiorg101371journalpone0005265

Chetelat G Baron JC (2003) Early diagnosis of Alzheimerrsquos dis-ease Contribution of structural neuroimaging NeuroImage 18525-541 doi httpdxdoiorg101016S1053-8119(02)00026-5

Christie BR Cameron HA (2006) Neurogenesis in the adult hip-pocampus Hippocampus 16199-207 doi 101002hipo20151

Churchhill JD Galvez R Colcombe S Swain RA Kramer AFGreenough WT (2002) Exercise experience and the agingbrain Neurobiol Aging 23941-955 doi httpdxdoiorg101016S0197-4580(02)00028-3

Dewey J Hana G Russell T Price J McCaffrey D Harezlak JConsortium HN (2010) Reliability and validity of MRI-basedautomated volumetry software realtive to auto-assisted manual

measurement of subcortical structures in HIV-infected patientsfrom a multisite study Neuroimage 511334-1344 doi httpdxdoiorg101016jneuroimage201003033

Draganski B Gaser C Kempermann G Kuhn HG Winkler JBeurouchel C May A (2006) Temporal and spatial dynamics ofbrain structure changes during extensive learning J Neurosci266314-6317 doi httpdxdoiorg101523JNEUROSCI4628-052006

Du AT Schuff N Chao LL Kornak J Jagust WJ Kramer JHWeiner MW (2006) Age effects on atrophy rates of entorhinalcortex and hippocampus Neurobiol Aging 27733-740 doihttpdxdoiorg101016jneurobiolaging200503021

Duvernoy HM (2005) The Human Hippocampus FunctionalAnatomy Vascularization And Serial Sections with MRI 3rded Berlin Springer-Verlag

Fischl B Salat DH Busa E Albert M Dieterich M Haselgrove CDale AM (2002) Whole brain segmentation Automated label-ing of neuroanatomic structures in the human brain Neuron33341-355 doi httpdxdoiorg101016S0896-6273(02)00569-X

Fischl B van der Kouwe A Destrieux C Halgren E Segonne FSalat DH Dale AM (2004) Automatically Parcellating thehuman cerebral cortex Cereb Cortex 1411-22 doi httpdxdoiorg101093cercorbhg087

Geroldi C Laasko MP DeCarli C Beltamello A Bianchetti ASoininen H Frisoni GB (2000) Apolipoprotein E genotype andhippocampal asymmetry in Alzheimerrsquos disease A volumetricMRI study J Neurol Neurosurg Psychiatry 6893-96 doihttpdxdoiorg101136jnnp68193

Goldstein JM Goodman JM Seidman LJ Kennedy DN Makris NLee H Tsuang MT (1999) Cortical abnormalities in schizo-phrenia identified by structural magnetic resonance imagingArch General Psychiatry 56537-547 doi httpdxdoiorg101001archpsyc566537

Heckers S (2001) Neuroimaging studies of the hippocamps inschozophrenia Hippocampus 11520-528 doi httpdxdoiorg101002hipo1068

Honeycutt NA Smith CD (1995) Hippocampal volume measure-ments using magnetic resonance imaging in normal youngadults J Neuroimaging 595-100

Jack Jr CR Peterson RC Xu YC OrsquoBrian eC Smith GE Ivnik RJKokmen E (1999) Prediction of AD with MRI-based hippocam-pal volume in mild cognitive impairment Neurology 521397-1403 doi httpdxdoiorg101212WNL5271397

Jack Jr CR Slomkowski M Gracon S Hoover TM Felmlee JPStewat K Xu Y et al (2003) MRI as a biomarker of diseaseprogression in a therapeutic trial of milameline for AD Neu-rology 60253ndash260 doi httpdxdoiorg10121201WNL00000424808687203

Jessberger S Gage FH (2008) Stem cell-associated structural andfunctional plasticity in the aging hippocampus Psychol Aging23684-691 doi httpdxdoiorg101037a0014188

Kempermann G Gast D Gage FH (2002) Neuroplasticity in oldage Sustained fivefold induction of hippocampal neurogenesisby long-term environmental enrichment Ann Neurol 52135-143 doi httpdxdoiorg101002ana10262

Krasuski JS Alexander GE Horwitz B Daly EM Murphy DGMRapoport SI Schapiro MB (1998) Volumes of medial temporallobe structures in patients with Alzheimerrsquos disease and mildcognitive impairment (and in Healthy Controls) Biol Psychia-try 4360-68 doi httpdxdoiorg101016S0006-3223(97)00013-9

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 11 r

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 12: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

Kronenberg G Bick-Sander A Bunk E Wolf C Ehninger DKempermann G (2006) Physical exercise prevents age-relateddecline in precursor cell activity in the mouse dntate gyrusNeurobiol Aging 271505-1513 doi httpdxdoiorg101016jneurobiolaging200509016

Kronmeurouller KT Schreurooder J Keuroohler S Geurootz B Victor D Unger JEssig M Pantel J (2009) Hippocampal volume in first episodeand recurrent depression Psychiatry Res Neuroimaging 17462-66 doi httpdxdoiorg101016jpscychresns200808001

Liu RS Lemieux L Bell GS Sisodiya SM Shorvon SD Sander JWDuncan JS (2003) A longitudinal study of brain morphomet-rics using quantitative magnetic resonance imaging and differ-ence image analysis NeuroImage 2022-33 doi httpdxdoiorg101016S1053-8119(03)00219-2

Leuroovden M Schaefer S Noack H Kanowski M Kaufmann JTempelmann C Beuroackman L (2011) Performance-relatedincreases in hippocampal N-acetylaspartate (NAA) induced byspatial navigation training are restricted to BDNF Val homozy-gotes Cerebral Cortex 211435-1442

Leuroovden M Schaefer S Noack H Bodammer NC Keurouhn S HeinzeH-J Lindenberger U (2012) Spatial navigation training protectsthe hippocampus against age-related changes during early andlate adulthood Neurobiol Aging 33620e629-620e622 doihttpdxdoiorg101016jneurobiolaging201102013

Lupien SJ Evans A Lord C Miles J Pruessner M Pike BPruessner JC (2007) Hippocampal volume is as variable inyoung as in older adults Implications for the notion of hippo-camapl atrophy in humans NeuroImage 34479-485 doihttpdxdoiorg101016jneuroimage200609041

Maguire EA Gadian DG Johnsrude IS Good CD Ashburner JFrackowiak RS Frith CD (2000) Navigation-related structuralchange in the hippocampus of taxi drivers Proc Natl Acad SciUSA 974398-4403 doi httpdxdoiorg101073pnas070039597

Maguire EA Woollett K Spiers HJ (2006) London taxi and busdrivers A structural MRI and neuropsychological analysisHippocampus 161091-1101 doi httpdxdoiorg101002hipo20233

Maltbie E Bhatt K Paniagua B Smith RG Graves MM MosconiMW Styner MA (2012) Asymmetric bias in user guided seg-mentations of brain structures Neuroimage 591315-1323 doihttpdxdoiorg101016jneuroimage201108025

Martensson J Eriksson J Bodammer NC Lindgren M JohanssonM Leuroovden M (2012) Growth of language-related brain areasafter foreign language learning NeuroImage 63240-244 doidxdoiorg101016jneuroimage201206043

McGraw KO Wong SP (1996) Forming inferences about someintraclass correlation coefficients Psychol Methods 130ndash46doi httpdxdoiorg1010371082-989X1130

Morey RA Petty CM Xu Y Hayes JP Wagner HR II Lewis DVMcCarthy G (2009) A comparison of automated segmentationand manual tracing for quantifying hippocampal and amyg-dala volumes Neuroimage 45855-866 doi httpdxdoiorg101016jneuroimage200812033

Meurouller R Beurouttner P (1994) A critical discussion of intraclass corre-lation coefficients Stat Med 132465ndash2476 doi httpdxdoiorg101002sim4780132310

Meurouller SG Schuff N Yaffe K Madison C Miller B Weiner MW(2010) Hippocampal atrophy patterns in mild cognitiveimpairment and alzheimerrsquos disease Hum Brain Mapp 311339-1347 doi httpdxdoiorg101002hbm20934

Oscar-Berman M Song J (2011) Brain volumetric measures inalcoholics A comparison of two segmnetaiton methods Neu-ropsychiatr Dis Treat 765-75 doi httpdxdoiorg102147NDTS13405

Pavic L Rudolf G Rados M Brklijacic B Brajkovic L Simentin-Pavic L Kalousek V (2007) Smaler right hippocampus in warveterans with posttraumatic stress disorder Psychiatry ResNeuroimaging 154191-198

Pruessner JC Li LM Series W Pruessner M Collins DL KabaniN Evans AC (2000) Volumetry of hippocampus and amyg-dala with high-resolution MRI and three-dimensional analysissoftware Minimizing the discrepancies between laboratoriesCerebral Cortex 10433-442 doi httpdxdoiorg101093cer-cor104433

Raz N Gunning-Dixon F Head D Rodrigue KM Williamson AAcker JD (2004) Aging sexual dimorphism and hemisphericasymmetry of the cerebral cortex Replicability of regional dif-ferences in volume Neurobiol Aging 25377-396 doi httpdxdoiorg101016S0197-4580(03)00118-0

Raz N Lindenberger U Rodrigue KM Kennedy KM Head DWilliamson A Acker JD (2005) Regional brain changes inaging healthy adults General trend individual differences andmodifiers Cerebral Cortex 151676-1689

Rosenzweig MR Bennett EL (1996) Psychobiology of plasticityEffects of training and experience on brain and behaviorBehav Brain Res 7857-65 doi httpdxdoiorg1010160166-4328(95)00216-2

Sanchez-Benavides G Gomez-Anson B Sainz A Vives Y DelfinoM Pe~na-Casanova J (2010) Manual validation of FreeSurferrsquosautomated hippocampal segmentation in normal aging mildcognitive impairment and Alzheimer Disease subjects Psychi-atry Res 181219ndash25 doi101016jpscychresns200910011

Scahill RI Frost C Jenkins R Whitwell JL Rossor MN Fox NC(2003) A longitudinal study of brain volume changes in nor-mal aging using serial magnetic resonance imaging Arch Neu-rol 60989-994 doi httpdxdoiorg101001archneur607989

Sheline Y Wang pW Gado MH Csernansky JG Vannier MW(1996) Hippocampal atrophy in recurrent major depressionProc Natl Acad Sci USA 933908-3913 doi httpdxdoiorg101073pnas9393908

Shen L Sayhin AJ Kim S Firpi iA West JD Risacher SLFlashman LA (2010) Comparison of manual and automateddetermination of hippocampal volumes in MCI and early ADBrain Imaging Behavior 486-95 doi httpdxdoiorg101007s11682-010-9088-x

Shi F Liu B Zhou Y Yu C Jiang T (2009) Hippocampal volumeand asymmetry in mild cognitive impairment and Alzheimerrsquosdisease Meta-analyses of MRI studies Hippocampus 191055-1064 doi httpdxdoiorg101002hipo20573

Shrout PE Fleiss JL (1979) Intraclass correlations Uses in assess-ing rater reliability Psychol Bull 86420-428 doi httpdxdoiorg1010370033-2909862420

Squire LR Stark CEL Clark RE (2004) The medial temporal lobeAnnu Rev Neurosci 27279-306 doi httpdxdoiorg101146annurevneuro27070203144130

Sullivan EV Marsh L Mathalon DH Lim KO Pfefferbaum A(1995) Age-related decline in MRI volumes of temporal lobegray matter but not hippocampus Neurobiol Aging 16591-606 doi httpdxdoiorg1010160197-4580(95)00074-O

Sullivan EV Marsh L Pfefferbaum A (2005) Preservation of hip-pocampal volume throughout adulthood in healthy men and

r Wenger et al r

r 12 r

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r

Page 13: Comparing Manual and Automatic Segmentation of Hippocampal ... · Comparing Manual and Automatic Segmentation of Hippocampal Volumes: Reliability and Validity ... (Hc) has been ...

women Neurobiol Aging 261093-1098 doi httpdxdoiorg101016jneurobiolaging200409015

Szentkuti A Guderian S Schiltz K Kaufmann J Meurounte TFHeinze H-J Deurouzel E (2004) Quantitative MR analyses of thehippocampus Unspecific metabolic changes in aging J Neurol2511345-1353 doi 101007s00415-004-0540-y

Tae WS Kim SS Lee KU Nam E-C Kim KW (2008) Validationof hippocampal volumes measured using a manual methodand two automated methods (FreeSurfer and IBASPM) inchronic major depressive disorder Neuroradiology 50569ndash581doi httpdxdoiorg101007s00234-008-0383-9

Thomas AG Marrett S Saad ZS Ruff DA Martin A BandettiniPA (2009) Functional but not structural changes associatedwith learning An exploration of longitudinal Voxel-BasedMorphometry (VBM) Neuroimage 48117-125 doi httpdxdoiorg101016jneuroimage200905097

van Praag H Kempermann G Gage FH (2000) Neural consequen-ces of environmental enrichment Nat Rev Neurosci 1191-198doi httpdxdoiorg10103835044558

Wenger E Schaefer S Noack H Keurouhn S Martensson J Heinze H-J Leuroovden M (2012) Cortical thickness changes following spa-tial navigation training in adulthood and aging Neuroimage593389-3397 doi httpdxdoiorg

Westlye LT Walhovd KB Dale AM Espeseth T Reinvang I RazN Fjell AM (2009) Increased sensitivity to effects of normalaging and Alzheimerrsquos disease on cortical thickness by adjust-ment for local variability in graywhite contrast A multi-sample MRI study Neuroimage 471545-1557 doi httpdxdoiorg101016jneuroimage200905084

Wonderlick JS Ziegler DA Hosseini-Varnamkhasti P Locascio JJBakkour A van der Kouwe A Dickerson BC (2009) Reliabilityof MRI-derived cortical and subcortical morphometric meas-ures Effects of pulse sequence voxel geometry and parallelimaging Neuroimage 441324-1333 doi httpdxdoiorg101016jneuroimage200810037

Woollett K Maguire EA (2011) Acquiring ldquothe knowledgerdquo ofLondonrsquos layout drives structural brain changes Curr Biol 212109-2114 doi httpdxdoiorg101016jcub201111018

r Comparing Manual and Automatic Segmentation of Hc Volumes r

r 13 r