All Hands Meeting 2005 / FBIRN AHM 2006

Statistics Working Group Update

Greg Brown, UCSD

Hal Stern, UCI

Statistics Update Discussion Points

Aims of Statistics Workgroup

Activities last 6 months (highlights only)

Future Plans

Aims

Aim 1. Refine tools to assess the quality and reliability of fMRI data, and apply these tools to guide the collection and analysis of multi-site imaging data

Aim 2. Develop statistical methods to analyze multi-site fMRI data, while accounting for between site variation

Aim 3. Develop statistical and machine learning tools to identify homogeneous subgroups

Statistics Workgroup Structure

Statistical WG: Hal Stern, Co-Chair; Greg Brown, Co-Chair

Subcommittees and leaders:

• Statistical and FBIRN Programming Integration: Greg Brown, Leader

• Reliability and Calibration: Hal Stern, Leader

• Data Processing: Anthony Gamst, Leader

• Algorithm Development: Padhraic Smyth, Leader

Subcommittee members (grouped as on the original slide):

• UCSD (Greg Brown), UNM (Lee Friedman), Stanford (Gary Glover), Yale (Dan Mathalon), Duke (Greg McCarthy), BWH (Kelly Zou)

• UCSD (Greg Brown), UCI (Hyo Jong Lee), Duke (Greg McCarthy), UCSD (Burak Ozyurt), UCSD (Randy Yumel), UCSD (Data Technician TBD)

• UCSD (Anders Dale), BWH (Ola Friman), UCSD (Anthony Gamst), BWH (Steve Pieper), UCI (Hal Stern), MGH (Mark Vangel), BWH (Simon Warfield), BWH/MIT (Sandy Wells)

• UCSD (Anders Dale), Duke (Syam Gadde), UCSD (Anthony Gamst), MGH (Doug Greve), UCI (Hyo Jong Lee), UCSD (Randy Notestine), UCSD (Burak Ozyurt), BWH (Steve Pieper), UCSD (Nik Schork)

Activities over the past 6 months, organized by Statistics WG subcommittees

Data Processing Statistics WG

Developed download scripts at several sites

Continual download script running at San Diego site

• Field maps for GE sites need a special file structure to upload

• Download time varies by download site and by download software options

Data Processing Statistics WG

Preprocessing Scripts (stand-alone modules and integration with FIPS are available for most scripts)

• All scripts run on Analyze 7.5 format

• Some scripts also run on AFNI BRIK format

• Scripts are available for:

Slice time correction Motion correction B0 inhomogeneity warping Spatial smoothing to a target smoothness (several approaches are available)

• Scripts are in place for Siemens sites, and scripts for GE sites are being integrated into the fBIRN stream

• These scripts have been run on all auditory oddball images from Minnesota, MGH, and New Mexico

Data Processing Statistics WG

Several QA tools have been tested: Duke tools, GabLab tools, AIRT, AFNI tools

• Goal is to develop automated or semi-automated QA tools usable with large image datasets

• Validation of these tools will require visual inspection

Migrating Functional Image Processing System (FIPS) throughout the fBIRN consortium.

• Five fBIRN sites are currently using FIPS to test the processing of images at their site

• FIPS Power Users are being trained at several sites. These power users are meant to be a regional fBIRN resource as well as a local resource. They will relieve the FIPS developers from day-to-day consultations about FIPS

Data Processing Statistics WG

Processing strategy to test between-group hypotheses involving auditory oddball and Sternberg Memory Scanning paradigms.

• One site will be the lead site for this analysis so that the fBIRN community presents a uniform report of results to the general imaging community

• UCSD has volunteered to be the lead site for between group hypotheses

• Other sites have volunteered to analyze images from their site: UCI and the University of Minnesota

• The lead site will re-analyze a subset of images from volunteer sites to ensure uniformity of analysis results.

Reliability and Calibration WG

What is the outer limit of reliability of robustly activating paradigms in multi-site fMRI studies?

How reliably did the Phase I traveling subjects study measure site variation?

How much unwanted variance can be reduced in multisite sensorimotor imaging data by breath hold calibration once between site differences in image intensity are controlled?

What is the outer limit of reliability of robustly activating paradigms in multi-site fMRI studies?

Outer limit of reliability: Sensorimotor Task and Breath Hold Tasks

Generalizability and dependability coefficients (image value: mean across ROI):

Task                               FIPS Signed Magnitude Top 10%      AFNI % Signal Change from Average
                                   Generalizability  Dependability    Generalizability  Dependability
Sensorimotor (Visual ROI)          .92               .79              .93               .80
Breath Hold (Average across ROIs)  .92               .86              .94               .88

Outer limit of reliability: Conclusions

For simple sensorimotor and breath hold tasks, the reliability of intensity corrected measures of BOLD response for a region of interest can be very good to excellent.

One-month test-retest correlation coefficients for subtests of the Wechsler Adult Intelligence Scale III for adults 30 to 54 range from .70 to .93.

Consistency measures of fMRI reliability can be as good as or better than those of well-constructed psychological test scores.
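The generalizability (consistency) and dependability (absolute agreement) coefficients reported in these slides come from generalizability theory. As a minimal sketch only, assuming a simplified one-facet person-by-site design (the actual fBIRN analysis crosses person, site, day, run, and hemisphere and uses GENOVA), the two coefficients could be computed from variance components as follows; the function names and illustrative inputs are hypothetical:

```python
def generalizability_coefficient(var_person, var_person_x_site, n_sites):
    # Consistency coefficient: only the person-by-site interaction
    # counts as measurement error.
    relative_error = var_person_x_site / n_sites
    return var_person / (var_person + relative_error)

def dependability_coefficient(var_person, var_site, var_person_x_site, n_sites):
    # Absolute-agreement coefficient: the site main effect also counts
    # as error, so dependability <= generalizability.
    absolute_error = (var_site + var_person_x_site) / n_sites
    return var_person / (var_person + absolute_error)

# Illustrative (hypothetical) variance components, averaged over 10 sites
g = generalizability_coefficient(18.0, 8.0, 10)
d = dependability_coefficient(18.0, 32.0, 8.0, 10)
```

The key design point this illustrates: a large site variance depresses dependability but leaves the consistency coefficient untouched, which is why the two columns in the tables above can differ.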

Are sensorimotor and breath hold tasks equally sensitive to site and subject effects?

Sensorimotor: Variance Components Analysis

Percent Variance Accounted For in Visual ROI (GENOVA)

Variance Source       FIPS Signed Magnitude Top 10%   AFNI
                      (Uncorrected)                   (Uncorrected)
Person                18.24                           16.67
Day                   0.13                            Neg
Run                   0.02                            Neg
Site                  32.43                           29.01
Person by Day         Neg                             Neg
Person by Run         0.65                            Neg
Person by Site        7.61                            5.28
Person by Hemisphere  2.81                            Neg
Person 3-ways         34.02                           41.78
Residual (4-way +)    2.89                            5.67

Breath Hold Task: Variance Components Analysis

Percent Variance Accounted For (GENOVA)

Variance Source       FIPS Signed Magnitude Top 10%   AFNI
                      (6 ROIs)                        (10 ROIs)
Person                37.02                           37.33
Day                   0.27                            0.03
Run                   0.27                            0.30
Site                  22.44                           23.87
Person by Day         0.76                            1.77
Person by Run         1.65                            0.63
Person by Site        4.90                            7.58
Person by Hemisphere  -                               -
Person 3-ways         13.46                           25.93
Residual (4-way +)    16.22                           0.90

Sensorimotor – Breath Hold Task Comparison

The sensorimotor task is more sensitive to site effects than to subject effects.

The breath hold task is more sensitive to subject than to site effects.

How reliably did the Phase I traveling subjects study measure site variation?

Reliability of Site Differences

Treat Site rather than Subject as Measurement Object

Same variance components tables presented previously can be used to estimate how reliably site differences were measured across the study factors of run, day, and person.

Reliability of Site Differences: AFNI Analysis

Task                       Consistency  Dependability
Sensorimotor (visual ROI)  .97          .92

Measurement of Site Variance

Measurements of site variability provided by the Phase I traveling subject study were very reliable, at least in the visual region of interest.

How much unwanted variance can be reduced in multisite sensorimotor imaging data by breath hold calibration once between site differences in image intensity are controlled?

Breath Hold Correction of Sensorimotor Data

Our previous work showed that breath hold calibration improved the dependability of native regression weights that were not intensity corrected.

Site specific calibration:

  calibrated value = sensorimotor value × (grand breath hold mean / mean breath hold at a specific site)

Subject tailored calibration:

  calibrated value = sensorimotor value × (grand breath hold mean / breath hold value for comparable subject, site, day, and run)

Reliability of Breath Hold Calibrated Intensity Corrected Values: Visual ROI

                  FIPS Sign Mag Top 10%             AFNI %Change Entire ROI
Condition         Generalizability  Dependability   Generalizability  Dependability
Uncorrected       .92               .79             .93               .80
Site Specific     .93               .83             .94               .86
Subject Tailored  (analyses done but not double checked; Subject Tailored calibration appears to be no better than Site Specific calibration)

Reliability of Breath Hold Calibrated Intensity Corrected Values: Hand ROI

                  FIPS Sign Mag Top 10%             AFNI %Change Entire ROI
Condition         Generalizability  Dependability   Generalizability  Dependability
Uncorrected       .92               .81             .88               .84
Site Specific     .92               .83             .87               .85
Subject Tailored  (analyses done but not double checked; Subject Tailored calibration appears to be no better than Site Specific calibration)

Visual ROI Task: Site Specific Correction

Percent Variance Accounted For in Visual ROI (GENOVA)

                      FIPS Signed Magnitude Top 10%             AFNI %Change Entire ROI
Variance Source       Uncorrected  Site Specific BH Corrected   Uncorrected  Site Specific BH Corrected
Person                18.24        20.08                        16.67        20.87
Day                   0.13         0.19                         Neg          Neg
Run                   0.02         0.06                         Neg          0.03
Site                  32.43        27.04                        29.01        21.47
Person by Day         Neg          Neg                          Neg          Neg
Person by Run         0.65         0.67                         Neg          1.62
Person by Site        7.61         7.29                         5.28         6.95
Person by Hemisphere  2.81         2.26                         Neg          Neg
Person 3-ways         34.02        37.47                        41.78        41.85
Residual (4-way +)    2.89         3.55                         5.67         5.89

Hand Area ROI: Site Specific Correction

Percent Variance Accounted For in Hand ROI (GENOVA)

                      FIPS Signed Magnitude Top 10%             AFNI %Change Entire ROI
Variance Source       Uncorrected  Site Specific BH Corrected   Uncorrected  Site Specific BH Corrected
Person                15.15        15.69                        23.32        23.68
Day                   Neg          Neg                          Neg          Neg
Run                   0.46         0.46                         0.0          0.0
Site                  21.80        17.03                        11.28        7.81
Person by Day         0.41         0.70                         Neg          Neg
Person by Run         0.14         0.22                         Neg          Neg
Person by Site        6.68         6.60                         28.92        31.26
Person by Hemisphere  0.58         0.25                         0.38         0.29
Person 3-ways         38.61        41.53                        27.04        27.66
Residual (4-way +)    11.51        12.31                        6.47         7.60

% of Site Variance Reduced by Site-Specific Breath Hold Calibration:

ROI     FIPS     AFNI
Visual  16.62%   27.88%
Hand    21.88%   30.76%

  % reduced = (%var uncorrected site factor − %var corrected site factor) / %var uncorrected site factor
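As a check on the arithmetic, the reduction formula can be applied directly to the site variance components from the GENOVA tables above. This is a small sketch; the helper name is ours:

```python
def pct_site_variance_reduced(uncorrected, corrected):
    # (%var uncorrected site factor - %var corrected site factor)
    # divided by %var uncorrected site factor, as a percentage.
    return 100.0 * (uncorrected - corrected) / uncorrected

# Site variance components taken from the site-specific correction tables
fips_visual = pct_site_variance_reduced(32.43, 27.04)  # ~16.62
fips_hand = pct_site_variance_reduced(21.80, 17.03)    # ~21.88
afni_hand = pct_site_variance_reduced(11.28, 7.81)     # ~30.76
```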

Conclusions on Breath Hold Correction (for intensity normalized MR images)

Site specific breath hold calibration does not improve consistency measures of reliability, at least for the highly consistent fBIRN sensorimotor task.

Site specific breath hold calibration produces modest increases in absolute agreement measures of reliability for the fBIRN sensorimotor task.

Site specific breath hold calibration reduces the unwanted variance associated with site by 16% to 31%, depending on ROI and processing choices.

Site specific breath hold calibration did not reduce unwanted person by site variance for intensity normalized MR images.

Statistical and Programming Integration WG

Much of this work has migrated to other Statistics WGs, such as Data Processing

The WG is working with the BIRN CC to implement the workflow scheduling system Condor at the San Diego site.

Future Plans: Reliability and Calibration

Complete Phase I variance components analysis of breath hold calibration

• Confirm subject specific analysis

• Complete analysis of auditory ROI

• Compare results from completely crossed design with run and day nested under site

• Compare traditional method of moments (mean squares) method for estimating variance components with Bayesian method

Perform a generalizability and variance components analysis of smooth-to correction

Test newly developed calibration methods on Phase I data
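For the method of moments comparison mentioned above, the mean-squares estimator can be sketched for the simplest balanced one-way random-effects case. The fBIRN analyses use GENOVA over the full crossed design; this toy function and its name are ours:

```python
from statistics import mean

def oneway_variance_components(groups):
    # Method-of-moments (ANOVA mean squares) estimates for a balanced
    # one-way random-effects design; groups is a list of equal-size lists.
    k = len(groups)            # number of groups (e.g. sites)
    n = len(groups[0])         # observations per group
    grand = mean(x for g in groups for x in g)
    ms_within = sum(sum((x - mean(g)) ** 2 for x in g)
                    for g in groups) / (k * (n - 1))
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    var_error = ms_within
    # Negative estimates are truncated at zero, the usual source of the
    # "Neg" entries in the GENOVA tables above.
    var_group = max(0.0, (ms_between - ms_within) / n)
    return var_group, var_error

vg, ve = oneway_variance_components([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
```

The Bayesian alternative mentioned in the bullet list avoids this zero-truncation by placing a prior on the components instead of equating observed and expected mean squares.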

Future Plans – Data Processing: Preprocessing

Preprocess all image sets

Complete artifact detection

Upload preprocessed images and artifact detection log into the Federated Database

Artifact correct images and upload corrected images

Data Processing

Train FIPS Power Users

Complete all subject level analyses and upload them into the database

Extend FIPS to level II (several extensions might be required)

Compare between-group analysis plans

• Conventional fixed effects design with site and group

• Conventional fixed effects design with covariates

Site specific covariate adjustment Pooled covariate adjustment

• Meta-analytic methods with site-specific error weighting

Future Plans - Statistical and Programming Integration

Make the integration of FreeSurfer into FIPS pipeline more generally available.

Complete an implementation of Condor at San Diego site.

Algorithm Development

Extend work on independent components analysis done at Yale to multi-group and multi-task applications and incorporate into FIPS (perhaps through FSL Melodic).

Extend work done at BWH/MIT on Multivariate Autoregressive (MAR) Model for effective connectivity analyses to multi-group context.

Extend work with the expectation-maximization STAPLE method of analyzing inter-site differences to the voxel level.

Further develop the UCI parametric response surface model and integrate into the analysis pipeline.

Continue work on group classifiers.

Future Plans: Revise Aim 3

Current Aim 3: Develop statistical and machine learning tools to identify homogeneous subgroups

Proposed New Aim: Develop novel statistical and machine learning tools to analyze multisite imaging data.

(e.g., STAPLE, Independent Components Analysis for multisite-multitask image data, Parametric Response Surface Modeling, MAR)

The new aim would include the search for homogeneous subgroups to the extent that it is feasible, but acknowledge other novel methods in development.

New Aim elevates the creative work being done in the Statistical Working Group to a formal project goal.

Revise Statistics Workgroup Structure

Data Processing

Greg Brown

Image Pipeline Forum

Doug Greve, Lee Friedman

Level II Statistical Modeling of MultiSite-MultiGroup Imaging Data

END

Timeline

• Train FIPS Power Users

• Vince's work

• Sandy's work

• Condor

Activities last 6 months

• Data download

• Using FIPS

• Extending FIPS to second level analysis

• Phase II image analysis plan

• Variance components analysis of Phase I images

Future Plans

Testing the Testbed Hypothesis

Testbed Hypothesis:

Before a federated imaging database can be released to the medical and scientific community, it must be tested by performing a large-scale study involving patients.

Alternative Hypothesis:

(Field of Dreams Hypothesis): If you build it they will come.

Confirming the Testbed Hypothesis

The Testbed Hypothesis is being confirmed (with a vengeance)

Revisions of the Testbed need to be programmed into our resource planning (especially personnel)

What are the implications of the Testbed Thesis for the use of distributed imaging databases outside of arenas where they have been tested (e.g., longitudinal studies, drug trials)?

Need for advocacy and exchange with medical scientists outside of BIRN

Statistics Update Discussion Points

Aims

Activities last 6 months

• Data download and Testing the Database

• Creation of Preprocessing Scripts

• Using FIPS

• Extending FIPS to second level analysis

• Phase II image analysis plan

• Variance Components analysis of Phase I images

• Pipeline Forum

Future Plans

Subject Tailored Correction

Percent Variance Accounted For in Visual ROI (GENOVA)

                      FIPS Signed Magnitude Top 10%            AFNI
Variance Source       Uncorrected  Site Mean BH Corrected      Uncorrected  Site Mean BH Corrected
Person                18.24        37.02                       16.67        37.33
Day                   0.13         0.27                        Neg          0.03
Run                   0.02         Neg                         Neg          0.30
Site                  32.43        22.44                       29.01        23.87
Person by Day         Neg          0.76                        Neg          1.77
Person by Run         0.65         1.74                        Neg          0.63
Person by Site        7.61         4.90                        5.28         7.58
Person by Hemisphere  2.81         -                           Neg          -
Person 3-ways         34.02        13.46                       41.78        25.93
Residual (4-way +)    2.89         16.22                       5.67         0.90