Array CGH Tech Guide
Genome Technology, September 2008

A Troubleshooting Guide: Experts Share Their Advice on Performing Array Comparative Genomic Hybridization

http://www.genome-technology.com

Table of Contents

Letter from the Editor
Index of Experts
Q1: How do you ensure optimal sample preparation, including DNA extraction and amplification?
Q2: What steps do you take to make sure you have good labeling and hybridization techniques?
Q3: How do you determine what type of array (BAC or oligo) to use?
Q4: How do you validate your results?
Q5: How do you ensure reproducibility?
Q6: What steps do you take to optimize visualization and data analysis?
List of Resources


Letter from the editor

This month, GT brings you a technical guide on array CGH. Array comparative genomic hybridization evolved from CGH, which was originally used to detect copy number gain and loss at the chromosome level. Several companies now make whole-genome microarrays for CGH that improve on this technique, offering both higher resolution and increased reproducibility.

While other detection methods have come online, including using SNP arrays to perform comparative intensity analysis, many labs have turned toward oligo arrays for CGH. BAC arrays are still used by many clinical genetics labs to diagnose cancer and birth defects, but oligo arrays are shaping up to be a better choice for large-scale genomics research, mainly because they’re cheaper, easier to make, and offer higher resolution than BAC arrays.

Whatever the choice of platform, though, it’s still important to nail the basics. To that end, we’ve compiled expert advice to address the ABCs of CGH, from optimizing DNA amplification to proper labeling, hybridization, and validation techniques. One of the main challenges to performing array CGH is data analysis, and our experts offer their suggestions on this topic, too. And for additional help, be sure to take a look at our resources section on p. 18.

— Jeanene Swanson

Index of experts

Genome Technology would like to thank the following contributors for taking the time to respond to the questions in this tech guide.

Timothy Graubert, Washington University School of Medicine
Eli Hatchwell, SUNY at Stony Brook
Matthew Hurles, Wellcome Trust Sanger Institute
Christa Martin, Emory University
Steve Scherer, The Centre for Applied Genomics, The Hospital for Sick Children, Toronto
Bauke Ylstra, VU University Medical Center, Amsterdam



How do you ensure optimal sample preparation, including DNA extraction and amplification?

For long oligonucleotide array CGH, we have found the platforms quite forgiving with respect to sample preparation. We have not seen a significant difference in data quality using templates prepared by crude phenol:chloroform extraction vs. spin column purification. We also see equivalent results using DNA extracted from a variety of mouse tissues and from tumor vs. wild type templates. We would caution against use of whole genome amplified templates if copy number determination is the goal. In our hands, the genotype calls are highly concordant (pre- vs. post-WGA), but copy number is not always faithfully preserved during amplification.

— TIMOTHY GRAUBERT

For DNA samples we obtain in our own work (usually isolated from peripheral blood), we routinely use the Promega Wizard kit for DNA extraction. For genomic DNA samples sent to us by collaborators or customers, we check concentration, integrity, and purity by both gel electrophoresis (to ensure that there is minimal DNA degradation) and by NanoDrop measurement, checking that the 260/280 ratio is as close to the range 1.8 to 2.0 as possible. For those gDNA samples that appear to be impure (on the basis of poor NanoDrop readings), we re-extract; our preferred method is phenol/chloroform extraction, followed by isopropanol precipitation.

There is little that can be done for samples that are heavily degraded; we try these, but the results are often disappointing.

Where amounts of DNA are limiting, we favor amplification with Phi29 polymerase, using the GenomiPhi Kit from GE.

— ELI HATCHWELL
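The NanoDrop purity check described above can be sketched in a few lines of Python. This is only an illustration: the function name, the sample readings, and the decision to treat 1.8 to 2.0 as hard cutoffs are assumptions, since the answer only asks that the ratio be as close to that range as possible.

```python
def purity_flag(a260: float, a280: float,
                low: float = 1.8, high: float = 2.0) -> str:
    """Return 'pass' if A260/A280 lies in [low, high], else 're-extract'.

    The hard cutoffs are an assumption for illustration; the text only asks
    that the ratio be as close to 1.8-2.0 as possible.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    return "pass" if low <= ratio <= high else "re-extract"


# Hypothetical absorbance readings: (A260, A280) per sample.
readings = {"sample_1": (1.90, 1.00), "sample_2": (1.55, 1.00)}
for name, (a260, a280) in readings.items():
    print(name, purity_flag(a260, a280))
```

A ratio check like this says nothing about degradation, which is why the gel is run alongside it.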

DNA is extracted from peripheral blood or tissue using a Puregene kit and the quality is checked by gel electrophoresis. If the sample is fragmented, we perform a DNA cleanup step using size exclusion columns. The quantity of DNA obtained from the extraction is checked using a NanoDrop. Our laboratory will only proceed with microarray analysis if the DNA passes these initial quality steps. We do not amplify the samples, since we have ample genomic DNA from the peripheral blood samples that we are analyzing.

— CHRISTA MARTIN

The DNA should be isolated from the same laboratory using the same technique. Blood DNA is preferable but saliva-based samples also work well.

We maintain optimal DNA quantity and quality using NanoDrop or PicoGreen measurements for quantity and agarose gel analysis for

“Blood DNA is preferable, but saliva-based samples also work well.”

— Steve Scherer

continued on page 17



What steps do you take to make sure you have good labeling and hybridization techniques?

These steps are often performed in a core facility or contract laboratory. Quality control often includes routine UV spectroscopy, gel visualization, and assessment of yield post-labeling.


So long as the DNA quality and integrity are good, there should be no issues with DNA labeling. Our throughput is sufficiently high that our reagents tend to be fresh. On occasion, we have had difficulty with precipitates in the Cy5 dye, but we overcome this by hard spinning just before the actual hybridization (i.e., after the Cot-1 annealing).

For hybridization, it is critical to make sure that all solutions/hybridization chambers are pre-warmed. The hybridization solutions tend to be very viscous and contain nucleic acids at high concentration, making precipitation a serious concern. In the final analysis, this part of the procedure is highly dependent on the skill and experience of the individual performing the experiment.

— ELI HATCHWELL

We follow the Agilent protocol and perform the labeling step in an ozone-free environment. After labeling, the DNA is purified using Microcon YM-30 filters and analyzed using a NanoDrop spectrophotometer to determine yield and labeling efficiency. We use opposite sex normal controls (a pool of five individuals, either male or female) for each hybridization performed. During microarray analysis, the sex chromosomes are used as our internal hybridization control; if the array shows the expected gain and loss of the sex chromosomes (gain of X and loss of Y in a female patient or loss of X and gain of Y in a male patient), then the array data can be analyzed.

— CHRISTA MARTIN
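The opposite-sex internal control lends itself to an automated check. The sketch below is one possible way to express it, not the laboratory's actual procedure: the function name, the use of median log2 ratios, and the `min_shift` threshold of 0.3 are all assumptions made for illustration.

```python
from statistics import median


def sex_control_ok(log2_x, log2_y, patient_sex, min_shift=0.3):
    """Check the opposite-sex internal control on one array.

    log2_x / log2_y: log2(patient/reference) ratios for X and Y probes,
    where the reference pool is of the opposite sex to the patient.
    min_shift: hypothetical minimum |median log2 ratio| counted as a shift.
    """
    med_x, med_y = median(log2_x), median(log2_y)
    if patient_sex == "F":   # female vs. male pool: gain of X, loss of Y
        return med_x >= min_shift and med_y <= -min_shift
    if patient_sex == "M":   # male vs. female pool: loss of X, gain of Y
        return med_x <= -min_shift and med_y >= min_shift
    raise ValueError("patient_sex must be 'F' or 'M'")


# Hypothetical probe values for a female patient against a male pool.
print(sex_control_ok([0.8, 0.9, 1.0], [-1.5, -2.0, -1.8], "F"))
```

An array that fails this check would be set aside before any autosomal calls are interpreted.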

We use experienced staff and minimize the number of people that are involved in a particular protocol or experiment. We also use vendor-provided kits, follow protocol guidelines strictly, and use liquid handling robotic instrumentation for consistency and accuracy.

— STEVE SCHERER

We always check incorporation after labeling as a last quality measure before arraying. We judge array quality by calculating the median absolute deviation of all the spots. When working with tumor samples we use a matched reference sample from the same individual when possible. This approach not only gives tighter profiles of the copy number aberrations, but also profiles devoid of copy number variations (Buffart et al., 2008).

— BAUKE YLSTRA
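For readers unfamiliar with the metric, the median absolute deviation (MAD) is the median of each spot's absolute distance from the array-wide median. A minimal sketch follows; applying it to per-spot log2 ratios is an assumption here, since the answer does not say which spot value is used, and the example values are invented.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of |value - median(values)|, a robust measure of spread."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


# Hypothetical per-spot log2 ratios from one array; one spot is an outlier.
spots = [-0.10, 0.00, 0.10, 0.05, 2.00]
print(median_absolute_deviation(spots))
```

Because it is based on medians, a single aberrant spot barely moves the MAD, which is what makes it useful as an array-level noise score.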


How do you determine what type of array (BAC or oligo) to use?

Our interest has been primarily detection of copy number alterations at the highest possible resolution. For this reason, we have turned to oligo arrays. Some projects in the lab have required whole-genome views, while others have targeted specific regions of the mouse or human genomes. The flexibility of the NimbleGen custom array design is well suited to these demands.

The choice of array is based on a set of considerations that include cost, availability, and, most importantly, available knowledge of normal variation for the platform chosen. In our case, our staple aCGH platform has been a human 19K tiling path BAC array, designed as a collaboration between my group and that of Norma Nowak at the Roswell Park Cancer Institute, and printed at RPCI. The advantage of this platform for our group is that we have data on close to 1,000 normal individuals assayed using exactly the same platform (i.e., the 19K array). Thus, it is an easy matter for us to rapidly determine which of the CNVs we uncover in disease cohorts appear to be disease-specific and which are present in normal populations.

There is an increasing amount of data available online (especially at http://projects.tcag.ca/variation/) which lists structural variation discovered in normals. However, we have found that the data is patchy, with poor concordance between data elicited using different platforms and with many examples of copy number variation in supposedly normal individuals that is highly surprising (i.e., would otherwise be expected to be strongly associated with severe phenotypes).

Thus, in our opinion, it is important to possess structural variation data that has been discovered using the same platform as that used for disease studies.

The above discussion notwithstanding, however, it is clear that the increasing resolution of aCGH afforded by emerging platforms makes these increasingly attractive. We have some experience with the NimbleGen 2.1M oligo array platform and are impressed with it (our lab was chosen as one of the beta test sites). We are also excited about trying out the new 1M feature Agilent arrays when these become available.

For region-specific analysis, where extremely high resolution is desirable, we have extensively used custom designed arrays from NimbleGen and have been pleased with the data obtained.

Clearly, another consideration in the choice of arrays is the equipment infrastructure required. For our staple BAC arrays, static hybridizations work fine (no hybridization equipment required) and a standard 5-μm resolution Axon scanner suffices. For the new Agilent arrays, it will be mandatory to use a 2-μm scanner (preferably Agilent) and desirable to use a 2-μm scanner also for the NimbleGen arrays. For Agilent, the hybridization equipment is fairly cheap while for NimbleGen, the preferred tool is a MAUI system (expensive, especially for the 12-position model, at about $50,000).

One note of caution with regard to the new generation of very high-resolution oligo arrays available from Agilent or NimbleGen: These arrays are likely to produce more data than can be interpreted rationally. Hardly any data exists on cohorts of normal individuals analyzed with these new platforms, and there are few plans to create such datasets. One company, Population Diagnostics, has as one of its stated missions to generate large sets of data for high-resolution copy number variation in normal populations of varying ethnic backgrounds.

— ELI HATCHWELL

This depends on the study, and requires us to assess the most cost efficient means of achieving the scientific objectives of the study. This not only requires that we think about the type of array, but also what array format and which supplier, because resolution and sensitivity differ between different oligo array suppliers. Increasingly, the higher resolution, ease of generating custom arrays, and printing reproducibility of oligo arrays are leading to the selection of these platforms for our experiments.

— MATTHEW HURLES

Our laboratory started out using BAC arrays, but quickly moved to validating oligo arrays when they became available. Oligo arrays are easier to reproduce reliably; we have noticed much more variation in the quality of BAC arrays in comparison to our current oligo arrays. In addition, with oligo arrays, it is easier to obtain a higher density of probes across the whole genome so that imbalances can be accurately sized as compared to having intervening gaps between BAC clones.

— CHRISTA MARTIN

This is the question we are most often asked. It really depends on the purpose of the study/need for resolution/available budget. No platform is perfect and each has its strengths and weaknesses. You need to use what works for you. In a 2007 Nature Genetics paper we ran the same DNA sample on all available platforms and got significantly different CNV calls with each technology and CNV calling algorithm. All vendors are moving to higher resolution arrays so the data will start to stabilize, but even when using 1 million feature oligonucleotide arrays (e.g., Illumina 1M and Affymetrix 6.0) you still only see a maximum of 50% CNV call overlap.

BAC arrays are widely used in the diagnostic setting as a first screening method for exclusion of large (typically >500 Kb) cytogenetic abnormalities. Several labs have developed their own custom BAC array (spotted locally) and will therefore prefer to use it as a first tool. BAC arrays are also traditionally less noisy. These arrays are tedious to make and the trend is to move towards the easier-to-manufacture oligonucleotide arrays, but BAC arrays still have a role in clinical laboratories.

Oligo arrays can be of high probe density and/or tiling, which permits achieving high resolution compared to BAC arrays, meaning that they can detect smaller and more numerous candidate CNV regions. It is the type of array preferred for research purposes, either for


“The trend is to move towards oligonucleotide arrays.”
— Steve Scherer


How do you validate your results?

This is a critical step in an aCGH experiment, especially when using oligo arrays which tend to generate somewhat noisy data. In our view, findings should be validated using orthogonal technology (e.g., PCR, SNP array, FISH). Validation should be performed on as many calls as possible, with highest priority given to the "riskiest" calls (i.e., low amplitude deviation from normal copy number, low-density probe coverage). Another critical point that has not completely permeated the literature is that detection of somatically acquired copy number alterations (or copy number neutral loss of heterozygosity) is unreliable unless matched samples from affected/unaffected tissues are directly compared.


Traditionally, we have attempted to obtain FISH validation on all the copy number variants we were interested in pursuing further. This approach, however, requires the availability of both cells from the affected individual and a willing cytogenetics laboratory. Furthermore, FISH will not work for the validation of very small deletions or small tandem duplications (which require FISH to be more quantitative than it currently is). We have tended not to use qPCR for validation, although many people do use this approach. A 2:1 change (heterozygous deletion, for example) will be manifested by a 1-cycle difference in qPCR, while a 3:2 change (heterozygous duplication) will manifest as a ~0.5-cycle difference. Multiple replicates are required, and the CV needs to be very low for this approach to work.

We favor the use of MLPA, a method we have used for some years. Historically, we have used electrophoresis-based MLPA but are currently working on Luminex bead-based MLPA, which affords greater multiplexing and does not require the use of very long oligonucleotides. Our group has developed software for the automatic design of MLPA assays, whether for electrophoresis-based or Luminex bead-based outputs.

Homozygous deletions can clearly be validated by the use of standard PCR, which will fail to amplify the relevant sequences.

When using region-specific oligo arrays for detailed delineation of deletion/duplication/translocation breakpoints, we generally design primers that will amplify a unique junction fragment, which can then be sequenced. This provides incontrovertible validation of the structural change suspected but is limited to arrays with sufficient resolution to allow for the direct inference of junction sequences.

— ELI HATCHWELL
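The qPCR cycle differences quoted above follow from the assumption that template doubles every cycle, so the expected shift is the base-2 logarithm of the copy number ratio. The sketch below works through that arithmetic; the function name is hypothetical and perfect (100%) amplification efficiency is assumed, which real assays only approximate.

```python
from math import log2


def expected_delta_ct(test_copies, reference_copies=2):
    """Expected cycle shift relative to a normal two-copy locus.

    Assumes ideal doubling per cycle. Positive values mean the test locus
    crosses threshold later (fewer copies), negative values mean earlier.
    """
    return log2(reference_copies / test_copies)


for label, copies in [("heterozygous deletion", 1),
                      ("heterozygous duplication", 3)]:
    print(label, round(expected_delta_ct(copies), 2))
```

Under this idealization a heterozygous deletion shifts the curve by exactly one cycle, while a heterozygous duplication shifts it by only about 0.58 of a cycle in the other direction, close to the half-cycle figure given above. That small shift is why multiple replicates and a very low CV are needed.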

As there is no gold-standard reference genome or genome(s) against which we can compare results from a given experiment, we find that we generally have to generate a significant amount of validation data for each new study. We use validation data for two subtly distinct purposes. The first is to tune the parameters in our analysis; for example, where to set CNV calling thresholds. This occurs earlier in a project. The second occurs later in a project, when we want to estimate what proportion of the CNVs identified are likely to be false positives. We think that it is important that each major survey has an unbiased estimate of its false positive rate obtained using independent validation data, so as to give users confidence and let them plan their experiments accordingly. Gaining an unbiased estimate of the false positive rate is not a simple procedure, not least because there is no single ideal validation technology capable of detecting the existence of all classes of CNV with a negligible false negative rate. It is important when estimating this false positive rate in the primary CNV screen that CNVs are randomly selected for validation, rather than pre-selected on the basis of size, frequency, type, or potential biological impact.

We typically use both locus-specific validation assays, such as real-time PCR using either TaqMan probes or SYBR green, and multiplexed validation assays, such as custom microarrays. For more complex variants, or for greater characterization of seemingly simple events, we use cytogenetic methods including metaphase, interphase, and fiber-FISH.

— MATTHEW HURLES
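The point about random selection can be made concrete. The sketch below draws CNVs for validation uniformly at random and then puts a confidence interval around the observed false positive proportion. It is an illustration only: the function names, the fixed seed, and the choice of a Wilson score interval are assumptions, not the method used by the group.

```python
import random
from math import sqrt


def pick_for_validation(cnv_ids, n, seed=0):
    """Draw n CNVs uniformly at random, with no pre-selection by size or type."""
    rng = random.Random(seed)  # fixed seed only so the draw is repeatable
    return rng.sample(list(cnv_ids), n)


def wilson_interval(false_positives, n_tested, z=1.96):
    """Approximate 95% Wilson score interval for the false positive rate."""
    p = false_positives / n_tested
    denom = 1 + z * z / n_tested
    centre = (p + z * z / (2 * n_tested)) / denom
    half = z * sqrt(p * (1 - p) / n_tested
                    + z * z / (4 * n_tested ** 2)) / denom
    return centre - half, centre + half


# Hypothetical example: 50 of 1,000 calls tested, 5 failed to validate.
chosen = pick_for_validation(range(1000), 50)
low, high = wilson_interval(5, 50)
print(len(chosen), round(low, 3), round(high, 3))
```

Note that a call failing one assay is not proof of a false positive, for the reason given above: no single validation technology detects every class of CNV.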

We validate all of our abnormal microarray results with FISH analysis, if the size of the imbalance is large enough (~100 Kb for losses and ~500 Kb for gains). If the imbalance is too small for FISH, we use qPCR, MLPA, or another array platform. FISH is our preferred methodology, since it reveals the mechanism of the imbalance (e.g., an unbalanced translocation). This information is important for recurrence risk estimates in families with a proband with a new imbalance identified by oligo array. FISH also allows performing parental testing to determine if one of the parents carries a balanced form of the rearrangement, which would not be detectable by microarray analysis since microarrays can only identify unbalanced segments of DNA.

— CHRISTA MARTIN

We use standard samples genotyped across labs, which permits comparison of results obtained with different platforms, array resolutions, and CNV detection algorithms. We also use replicates, or samples genotyped repeatedly over time.

For validation we use non-microarray technology. Research labs will typically use qPCR or other experimental quantitative measurement (e.g., TaqMan, MLPA) by comparing the test CNV locus against a reference locus known to have two DNA copies. Clinical labs usually use FISH. We have also found that using multiple programs to call CNVs works well to increase discovery and help prioritize regions for validation.

How not to validate is by comparing to other published CNVs (i.e., just by electronic comparison to, say, the Database of Genomic Variants). You need to do some type of laboratory-based validation.

— STEVE SCHERER
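One simple way to use calls from multiple programs for prioritization is to keep the calls that a second program also reports at roughly the same location. The sketch below uses a 50% reciprocal overlap rule; that rule, the function names, and the example coordinates are assumptions for illustration and are not taken from the answer above.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals given as (start, end)."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def calls_supported_by_both(calls_a, calls_b, min_overlap=0.5):
    """Calls from program A matched by a same-chromosome call from program B.

    Each call is a (chromosome, start, end) tuple.
    """
    return [
        a for a in calls_a
        if any(a[0] == b[0]
               and reciprocal_overlap(a[1:], b[1:]) >= min_overlap
               for b in calls_b)
    ]


# Hypothetical calls from two CNV calling programs.
program_a = [("chr1", 1000, 2000), ("chr2", 500, 900)]
program_b = [("chr1", 1100, 2100), ("chr3", 500, 900)]
print(calls_supported_by_both(program_a, program_b))
```

Agreement between programs only ranks regions for follow-up; as stated above, it does not replace laboratory-based validation.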

We have used different ways to validate results, with FISH as the most common procedure. We are now in a process of moving to use Affymetrix arrays as a validation to the Agilent arrays.

— BAUKE YLSTRA



How do you ensure reproducibility?

Replicate arrays can help with this, but we have opted instead to use fewer arrays and rely on validation by other techniques (e.g., PCR/qPCR).


Our aCGH protocol has evolved over a period of years. Every step has been optimized. The most crucial requirement to ensure reproducibility is to stick to the protocol exactly. The first thing we teach new people in the lab who embark on aCGH experiments is to stick to the exact steps of the protocol. In our experience, most of the explanations for poor experimental data can be boiled down to variation in the way the protocol is followed. We have written down every step in great detail, so there is no need to read between the lines.

— ELI HATCHWELL

We can assess reproducibility both in terms of CNV calling and breakpoint estimation relatively easily through duplicate experiments. Analysis of these duplicate experiments has proven invaluable in a number of studies. There are also statistical methods to allow false positive and false negative rates to be estimated from these types of data, which can be compared against empirical estimates of these parameters.

Ensuring reproducibility is a different matter. We take great care to order critical reagents in large batches to minimize the batch effects. Seasonal effects, such as ozone, can be mitigated by carefully controlling the laboratory environment; for example, by installing ozone scrubbers. We monitor data quality over time and actively look for time effects.

Reproducibility can also be enhanced by defining QC metrics targeted to different types of failure, and re-running failed experiments to generate a consistent final dataset. The QC metrics adopted by different companies differ substantially. We typically use three or four QC metrics designed to capture experiments with high random noise, high systematic noise (autocorrelation), poor dose-response, and across array heterogeneity. We typically end up re-running or excluding 5 to 25% of experiments.

— MATTHEW HURLES
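Two of the failure types named above, random noise and systematic noise, can be approximated with simple statistics on log2 ratios ordered by genomic position. The sketch below shows one plausible pair of metrics; the specific definitions (spread of adjacent-probe differences, lag-1 autocorrelation) and the function names are assumptions and are not the metrics used by the group.

```python
from statistics import mean, median


def adjacent_diff_mad(log2_ratios):
    """Robust spread of differences between neighbouring probes (random noise).

    Needs at least two probes, ordered by genomic position.
    """
    diffs = [b - a for a, b in zip(log2_ratios, log2_ratios[1:])]
    centre = median(diffs)
    return median(abs(d - centre) for d in diffs)


def lag1_autocorrelation(log2_ratios):
    """Correlation between each probe and its neighbour (systematic noise)."""
    mu = mean(log2_ratios)
    dev = [v - mu for v in log2_ratios]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dev, dev[1:])) / denom


# Hypothetical profile with a slow wave running along the chromosome.
wavy = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
print(adjacent_diff_mad(wavy), round(lag1_autocorrelation(wavy), 2))
```

A high autocorrelation with a low adjacent-difference spread, as in this example, points to a smooth systematic trend rather than probe-to-probe scatter.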

To minimize variation between technologists, we follow a standardized protocol developed in our laboratory that includes numerous quality control steps to check each major step of the protocol. All array processing is carried out in a controlled environment to eliminate any interfering environmental factors, such as temperature, humidity, and ozone. In addition, we are trying to automate

“Replicate arrays can help with this.”
— Timothy Graubert



What steps do you take to optimize visualization and data analysis?

This is very much still a work in progress. Oligo array CGH data can be noisy and the datasets are very large. We and others have developed a number of algorithms to find copy number changes with high sensitivity/specificity, define boundaries with precision, and resolve complex local architecture (i.e., juxtaposition of deletions and amplifications). Currently available tools perform reasonably well, but there are still significant challenges. High on the list is the need to move from qualitative genotype calls ("normal" vs. "abnormal") to quantitative assessment of 1, 2, 3 … copies at copy number variable regions.


We routinely use a 5-μm Axon scanner and GenePix Pro. Choosing the best PMT values to use for the scan is no trivial exercise. Many people rely on the histogram to determine which values to use, but we do not favor this approach. It is important that the Cy5 and Cy3 signals are evenly matched in the features, not on the slide in general (mild increases in Cy5 or Cy3 background can skew the histogram and suggest PMT values that do not yield balanced signals on the features). We typically choose a small region with a few representative spots and then scan at different PMTs until we find the correct values that will yield roughly equal intensities in both channels. We then apply those PMT values to the whole slide. This method works well.

Historically, we relied on GenePix Pro to extract feature data and then performed manual analysis on the resulting Excel files (or used some simple macros). For the last three years, however, we have been using BlueFuse software from BlueGnome (Cambridge, UK). We favor this software for a number of reasons: the software has an algorithm which intelligently determines which features are good quality and which are not; grid alignment, feature signal extraction, fusion of data from different features with identical content, copy number calling, etc., are all automatic; and once parameters have been chosen for the software, those parameters can be used consistently for every experiment, which ensures that data from multiple arrays can be compared to each other. In fact, we deposit all our data in a MySQL database, so that we can easily study the behavior of individual features across all arrays.

— ELI HATCHWELL

With each new dataset we spend quite a considerable length of time visualizing the data in different ways, to get a feel for the data and the likely sources of bias that might be minimized through normalization. Simply examining the data plotted against genomic position is a great way of visualizing the data. With noisier data, smoothing the datapoints to get a sense of large-scale genomic trends has proven to be particularly useful in terms of characterizing the “wave” effect that we see in all datasets. In part, this effect results from the heterogeneous distribution of G and C nucleotides throughout the genome, and the difficulties in eradicating subtle base composition biases in all nucleic acid-based laboratory protocols.
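Smoothing to expose large-scale trends can be as simple as a running median over probes ordered by genomic position. The sketch below is one such smoother, written in Python rather than the R or C/C++ the group uses; the function name, the window size, and the choice of a median over, say, a loess fit are assumptions for illustration.

```python
from statistics import median


def running_median(values, window=5):
    """Median of each value and its neighbours within the window.

    values must be ordered by genomic position; the window shrinks at
    the two ends of the list.
    """
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Hypothetical log2 ratios with one noisy probe in an otherwise flat region.
print(running_median([0, 0, 9, 0, 0], window=3))
```

Plotting the smoothed values against position, alongside the raw points, is usually enough to see whether a wave is present before deciding how to correct for it.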

Visualizing the distribution of log2 ratios at single probes across an entire dataset is also very useful in exploring variation in probe performance. Extracting outlier probes (for example, those with unusually high variance) and investigating reasons for these outliers is a useful first step in probe QC. This approach enables us to identify artifacts, such as autosomal probes responding to sex chromosomal content.
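As a rough illustration of the outlier-probe step, the sketch below computes each probe's variance across samples and flags probes whose variance is several times the median. The threshold of three times the median, the function name, and the example data are all assumptions; any real cutoff would need to be tuned to the dataset.

```python
from statistics import median, pvariance


def high_variance_probes(log2_by_probe, fold=3.0):
    """Probe IDs whose variance across samples exceeds fold x the median variance.

    log2_by_probe maps a probe ID to its log2 ratios across all samples.
    """
    variances = {probe: pvariance(vals) for probe, vals in log2_by_probe.items()}
    cutoff = fold * median(variances.values())
    return sorted(probe for probe, var in variances.items() if var > cutoff)


# Hypothetical data: three probes measured in three samples.
data = {
    "probe_1": [0.0, 0.1, -0.1],
    "probe_2": [0.05, -0.05, 0.0],
    "probe_3": [1.0, -1.0, 0.5],
}
print(high_variance_probes(data))
```

A flagged probe is only a candidate for investigation: high variance may reflect a genuine common CNV as easily as a poorly performing probe.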

We use the R environment extensively for most data visualization, but prefer to use C/C++ for normalization pipelines for the speed advantages. Nevertheless, R is useful for prototyping these most computationally intensive methods.

To enable these analyses we typically have to think carefully about how we store the data such that we can easily access data for all probes within a given sample as well as data for a single probe (or genomic region) across all samples. Lightweight MySQL databases have proven very useful in our work.

The model that we have adopted for data analysis typically requires a bespoke normalization pipeline to be constructed. Increasingly, this pipeline is constructed from modules that we have used before; for example, for quantile normalization and wave correction.
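Quantile normalization, one of the modules mentioned, forces every array to share the same distribution of values by replacing each value with the mean of the values at the same rank across arrays. The sketch below is a plain-Python illustration of the general technique, not the group's module; it breaks ties by position rather than averaging them, which production implementations usually handle more carefully.

```python
def quantile_normalize(arrays):
    """Give every array the same value distribution (the mean of sorted values).

    arrays: list of equal-length lists, one list of probe values per array.
    Ties within an array are broken by position, a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    reference = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for three probes on two arrays.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

Each probe keeps its rank within its own array; only the scale of the values changes.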

Once a dataset has been finalized, sample QC is a critical step. We are typically pretty conservative. No set of QC metrics is ever perfect, and poor experiments can sometimes be best identified by examining the output of the analysis and identifying outlier samples (for example, those samples with the most/least CNV calls). Sample QC is also necessary to capture other forms of biological variation that we wish to exclude, such as likely cell line artifacts.

There are a lot of people generating excellent software for CNV analyses, and, as well as generating our own software, we try to keep abreast of the literature. The ever-changing nature of CNV analyses requires that we take a modular approach to our analyses so as to be able to integrate new tools for individual steps in the analysis as they become available. For example, there is currently rapid growth in cross-sample CNV calling algorithms.

Once a set of CNV regions has been defined, many of the downstream analyses are very similar (for example, examining overlaps with different genomic annotations), and like most groups, we have our own in-house scripts.
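A downstream overlap check of the kind mentioned above might look like the following sketch. It is not any group's in-house script; the function name, the tuple layouts, and the half-open coordinate convention are assumptions, and the simple nested scan would need an interval index for genome-scale annotation sets.

```python
def annotate_regions(cnv_regions, annotations):
    """Map each CNV region to the names of the annotations it overlaps.

    cnv_regions: list of (chrom, start, end) tuples.
    annotations: list of (chrom, start, end, name) tuples.
    Coordinates are treated as half-open intervals [start, end), so
    features that merely touch end-to-start do not count as overlapping.
    """
    hits = {}
    for chrom, start, end in cnv_regions:
        hits[(chrom, start, end)] = [
            name
            for a_chrom, a_start, a_end, name in annotations
            if a_chrom == chrom and a_start < end and start < a_end
        ]
    return hits


if __name__ == "__main__":
    regions = [("chr1", 1000, 2000), ("chr2", 500, 800)]
    genes = [
        ("chr1", 1500, 2500, "GENE_A"),  # placeholder names, not real genes
        ("chr1", 2000, 3000, "GENE_B"),
        ("chr2", 100, 400, "GENE_C"),
    ]
    print(annotate_regions(regions, genes))
```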

For association studies we really need good statistical methods for robust association testing. If we are to adapt those methods from SNP genotyping then we need to obtain robust CNV genotypes.

— MATTHEW HURLES

We perform quality control analyses prior to CNV analysis. We use a powerful desktop computer for analyses (for Windows-based programs) and a Linux cluster for everything else. We organize data into databases with browser capabilities, and rank CNVs based on a prioritization list (this will depend on the project).

— STEVE SCHERER

For cancer research we have written many data analysis tools in the programming language R, and often integrate these with other successful array CGH bioinformatics tools developed by colleagues (van de Wiel et al., 2007). For the diagnostics arrays that are analyzed by the clinical genetics department we stay with CGH Analytics, a user-friendly interface offered by Agilent Technologies.

— BAUKE YLSTRA


as much of the array processing procedure as possible. We recently introduced the use of a Little Dipper for the post-hybridization array washes. Software analysis settings and guidelines are globally set in the laboratory so that all analyses are performed using the same parameters.

— CHRISTA MARTIN

We ensure reproducibility by reducing error/variability and ensuring consistency between experiments. Following protocols and including blind duplicate samples are key. If performing CGH, we try to use the right competitive hybridization sample.

Randomization of experiment/study design to reduce batch effects is important. For example, for family-based studies, it is ideal to have the whole family genotyped with the same batch of reagents; the same applies for case-control association studies. One 96-well plate of submitted samples would be filled with an equal number of cases and controls: half-filled with cases, half-filled with controls. One also needs to evaluate the quality of a CNV algorithm before applying it to study samples, by, for example, randomly picking detected regions and validating them experimentally.
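The balanced-plate idea above can be sketched as a small layout generator: draw equal numbers of cases and controls and assign them to wells in random order. Randomizing well positions within the plate is an assumption added here (the text only asks for equal numbers per plate), and the function name, sample labels, and seed handling are hypothetical.

```python
import random
from string import ascii_uppercase


def plate_layout(cases, controls, seed=0, rows=8, cols=12):
    """Assign equal numbers of cases and controls to wells of one plate.

    Returns a dict mapping well name (e.g. "A1") to (sample_id, status).
    Defaults describe a standard 96-well plate (8 rows x 12 columns).
    """
    half = (rows * cols) // 2
    if len(cases) < half or len(controls) < half:
        raise ValueError(f"need at least {half} cases and {half} controls")

    rng = random.Random(seed)  # fixed seed so the layout can be regenerated
    chosen = [(s, "case") for s in rng.sample(cases, half)]
    chosen += [(s, "control") for s in rng.sample(controls, half)]
    rng.shuffle(chosen)  # mix cases and controls across well positions

    wells = [
        f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)
    ]
    return dict(zip(wells, chosen))


if __name__ == "__main__":
    layout = plate_layout(
        [f"case{i}" for i in range(60)],
        [f"ctrl{i}" for i in range(60)],
        seed=1,
    )
    print(layout["A1"], layout["H12"])
```

Recording the seed alongside the plate manifest makes the layout reproducible if the run has to be audited later.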

Ideally, results would come from one single analysis method, but no single analysis method is perfect. The more methods used, the better for discovery. A compromise is to prioritize on calls detected by at least two algorithms in order to reduce the amount of false positive calls.
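Prioritizing calls seen by at least two algorithms requires some rule for deciding when two calls are "the same". The sketch below uses a 50% reciprocal overlap, which is an assumption made for illustration rather than a criterion stated in the text; the function names, algorithm labels, and tuple layout are likewise hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between calls a and b.

    Each call is a (chrom, start, end) tuple with half-open coordinates.
    Returns 0.0 for calls on different chromosomes or with no overlap.
    """
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def supported_calls(calls_by_algorithm, min_algorithms=2, min_overlap=0.5):
    """Keep calls that at least min_algorithms algorithms agree on.

    calls_by_algorithm maps an algorithm label to its list of calls.
    Returns (algorithm, call, support) tuples, where support counts the
    calling algorithm plus every other algorithm with a matching call.
    """
    kept = []
    for algo, calls in calls_by_algorithm.items():
        for call in calls:
            support = 1  # the algorithm that made this call
            for other, other_calls in calls_by_algorithm.items():
                if other != algo and any(
                    reciprocal_overlap(call, o) >= min_overlap for o in other_calls
                ):
                    support += 1
            if support >= min_algorithms:
                kept.append((algo, call, support))
    return kept


if __name__ == "__main__":
    calls = {
        "algoA": [("chr1", 100, 200), ("chr2", 500, 600)],
        "algoB": [("chr1", 110, 210)],
        "algoC": [("chr3", 10, 20)],
    }
    for row in supported_calls(calls):
        print(row)
```

Each supported region is reported once per algorithm that called it; merging those into a single consensus interval is a further design choice not made here.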

— STEVE SCHERER

genome-wide screens or as a custom array for candidate region fine-mapping (i.e., follow-up of a collection of potentially interesting CNV regions). We'll surely start to see more and more labs wanting to have a custom oligo array for screening of candidate gene regions for a particular syndromic disease or group of diseases (e.g., an array for cancer-related genes, an array for autoimmune disorders, or an array for neurological/neuropsychiatric disorders).

For purposes of CNV association analysis, in general, oligo arrays are becoming increasingly cheap, and many labs will be able to afford them, so they will probably slowly replace BAC arrays in the future. There will soon be specialized arrays with high probe coverage of common CNVs allowing CNV association testing in common diseases.

Note that the use of array technology doesn't replace karyotyping and FISH for detection of balanced structural chromosome changes (e.g., inversions and translocations).

— STEVE SCHERER

We always go for the highest resolution possible, and in that respect oligo arrays outperform BACs. Since 2006, our lab has no longer produced BAC arrays (Coe et al., 2007; Ylstra et al., 2006).

— BAUKE YLSTRA




DNA degradation.

— STEVE SCHERER

We work a lot with formalin-fixed paraffin-embedded material. An overnight incubation of the isolated DNA with NaSCN seems to be beneficial there for the final array results. We use the NanoDrop spectrum to assess if protein or phenol contaminants can be detected in the sample, and if necessary we do another cleanup using phase-lock gels and an additional precipitation. For the FFPE material we routinely perform isothermal whole genome amplification as a DNA quality assessment (Buffart et al., 2007).

— BAUKE YLSTRA



List of resources

Our panel of experts referred to a number of publications and online tools that may be able to help you get a handle on array CGH. Whether you’re a novice or a pro at the CNV game, these resources are sure to come in handy.

PUBLICATIONS

Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science. 1992 Oct 30;258(5083):818-21.

Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998 Oct;20(2):207-11.

Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, Jeffrey SS, Botstein D, Brown PO. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet. 1999 Sep;23(1):41-6.

Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE, Carter NP, Hurles ME, Feuk L. Challenges and standards in integrating surveys of structural variation. Nat Genet. 2007 Jul;39(7 Suppl):S7-15.

Snijders AM, Nowak NJ, Huey B, Fridlyand J, Law S, Conroy J, Tokuyasu T, Demir K, Chiu R, Mao JH, Jain AN, Jones SJ, Balmain A, Pinkel D, Albertson DG. Mapping segmental and sequence variations among laboratory mice using BAC array CGH. Genome Res. 2005 Feb;15(2):302-11.

Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, Hamilton G, Hindle AK, Huey B, Kimura K, Law S, Myambo K, Palmer J, Ylstra B, Yue JP, Gray JW, Jain AN, Pinkel D, Albertson DG. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet. 2001 Nov;29(3):263-4.

Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Döhner H, Cremer T, Lichter P. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer. 1997 Dec;20(4):399-407.

WEB SITES

http://cancer.ucsf.edu/array/analysis/index.php

http://flintbox.ca/technology.asp?Page=706

http://sigma.bccrc.ca/

DATABASES

Center for Information Biology Gene Expression Database (CIBEX)
http://cibex.nig.ac.jp/index.jsp

Coriell Cell Repositories NIGMS Human Genetic Cell Repository
http://locus.umdnj.edu/nigms/

Database of Chromosomal Imbalance and Phenotypes in Humans using Ensembl Resources (DECIPHER)
http://www.sanger.ac.uk/PostGenomics/decipher/

Human Segmental Duplication Database
http://projects.tcag.ca/humandup/

Human Structural Variation Database
http://humanparalogy.gs.washington.edu/structuralvariation/

NCBI Single Nucleotide Polymorphism Database (dbSNP)
http://www.ncbi.nlm.nih.gov/projects/SNP/

Segmental Duplication Database
http://humanparalogy.gs.washington.edu
