Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are...

12
Minireview DNA microarray technology: the anticipated impact on the study of human disease Javed Khan, Michael L. Bittner, Yidong Chen, Paul S. Meltzer, Je¡rey M. Trent * Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA Received 30 November 1998; accepted 20 January 1999 Keywords : DNA microarray technology ; Medical research ; Data uniformity ; Microarray Contents 1. Introduction .......................................................... M17 2. All chips are not created equal ............................................ M18 3. So many ESTs to choose from, so little room on each slide ....................... M18 4. Technology, technology, technology ......................................... M20 4.1. Instrumentation .................................................... M21 4.2. Biochemistry ....................................................... M23 4.3. Informatics ........................................................ M26 5. Summary ............................................................ M27 Acknowledgements ......................................................... M27 References ............................................................... M27 1. Introduction This review focuses on the DNA microarray tech- nology, its preliminary results, and the presumed enormous future impact the technique will have on the study of human disease. The development of this technology is gaining increasing importance as a di- rect result of the Human Genome Project, and espe- cially due to the recent focus on establishing an in- creasingly massive collection of genes (expressed sequence tags, ESTs) for human and model organ- isms. Our belief is that the continued development of the microarray technology will be invaluable to the study of human disease. However, it is a striking reality that this important research tool remains largely restricted to the few laboratories that have developed expertise in this area, and a growing num- ber of commercial concerns. Ultimately the real value 0304-419X / 99 / $ ^ see front matter ß 1999 Elsevier Science B.V. All rights reserved. PII:S0304-419X(99)00004-9 * Corresponding author. Fax: +1 (301) 402-2040; E-mail : [email protected] Biochimica et Biophysica Acta 1423 (1999) M17^M28

Transcript of Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are...

Page 1: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

Minireview

DNA microarray technology: the anticipated impact on the studyof human disease

Javed Khan, Michael L. Bittner, Yidong Chen, Paul S. Meltzer, Je¡rey M. Trent *Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA

Received 30 November 1998; accepted 20 January 1999

Keywords: DNA microarray technology; Medical research; Data uniformity; Microarray

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M17

2. All chips are not created equal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M18

3. So many ESTs to choose from, so little room on each slide . . . . . . . . . . . . . . . . . . . . . . . M18

4. Technology, technology, technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M204.1. Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M214.2. Biochemistry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M234.3. Informatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M26

5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M27

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M27

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M27

1. Introduction

This review focuses on the DNA microarray tech-nology, its preliminary results, and the presumedenormous future impact the technique will have onthe study of human disease. The development of thistechnology is gaining increasing importance as a di-

rect result of the Human Genome Project, and espe-cially due to the recent focus on establishing an in-creasingly massive collection of genes (expressedsequence tags, ESTs) for human and model organ-isms. Our belief is that the continued development ofthe microarray technology will be invaluable to thestudy of human disease. However, it is a strikingreality that this important research tool remainslargely restricted to the few laboratories that havedeveloped expertise in this area, and a growing num-ber of commercial concerns. Ultimately the real value

0304-419X / 99 / $ ^ see front matter ß 1999 Elsevier Science B.V. All rights reserved.PII: S 0 3 0 4 - 4 1 9 X ( 9 9 ) 0 0 0 0 4 - 9

* Corresponding author. Fax: +1 (301) 402-2040;E-mail : [email protected]

BBACAN 87436 9-3-99

Biochimica et Biophysica Acta 1423 (1999) M17^M28

Page 2: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

of microarray technology will only be realized whenthis approach is available to the widest possible num-ber of scientists.

2. All chips are not created equal

Miniaturized DNA microarrays (called DNAchips) have evolved as an important genome researchtool. One major type of DNA chip contains high-density arrays of short (930 nucleotides long) oligo-nucleotides immobilized by photolithography to asolid surface [1^4]. This approach, pioneered by Af-fymetrix, allows as many as 300 000 oligonucleotidesto be arrayed in a 6 2 cm2 area. The resulting hy-bridization pattern is captured with the aid of a £uo-rescent scanning device and associated software thatallows a merging of the hybridization pattern withthe precise sequence of the arrayed DNA molecules.The applications of this approach are varied, buthave included `re-sequencing' of DNA for mutationdetection [5] and more recently expression analysis[6]. It has the distinct advantage of high density,and the distinct disadvantage (until the human ge-nome is sequenced in its entirety) of an absoluterequirement for the arrayed sequence to be known.This report will not deal further with this commer-cially available technologic approach for generatingor using microarrays. The reader is referred to URL 1(see p. M27) for further information and potentialapplications.

The focus of this review will be on an alternativeapproach to comparatively analyze genome-wide pat-terns of mRNA expression. This approach, termedcDNA microarrays, uses cDNA clone inserts (insteadof oligonucleotides), which are robotically printedonto a glass slide and subsequently hybridized totwo di¡erent £uorescently labeled probes. FormRNA expression studies, the ultimate goal is to de-velop microarrays with every gene in a genomeagainst which mRNA expression levels can be quan-titatively assessed. At the current time this approachallows the deposition of as many as 25 000 cDNAs ona single microscope slide. While limited in terms ofthe number of targets per cm2 compared to DNAchips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by thenecessity to know the sequence of the clone being

arrayed. The technology for cDNA microarray hy-bridization is based on a method whereby the probesare pools of £uorescently labeled cDNAs synthesizedafter isolating mRNA from cells or tissues in twostates that one wishes to compare. The di¡erentiallylabeled probes are hybridized to the microarray slide;the resulting £uorescent intensities are measured us-ing a laser confocal £uorescent microscope; and ratioinformation is obtained following image processing.

Finally, database development and design are crit-ical, in order to deal with the immense amount ofinformation generated by cDNA microarrays. Thisminireview brie£y considers aspects of clone-set iden-ti¢cation, printing and reading instrumentation,standard protocols, image analysis, and the databasemanagement behind this technology. As an illustra-tion, it will highlight the approach developed in ourlaboratory. The reader is referred to URL 2 for fur-ther information and speci¢c details of the NIH Mi-croarray Project, and an associated Web Links pageURL 3 referencing other academic and corporatesites involved in this technology.

3. So many ESTs to choose from, so little room oneach slide

Estimates of the total number of genes in the hu-man genome range from 70 000 to s 100 000 [7^10].With the sequence of the human genome about 6%completed (F. Collins, personal communication) andwith the completion of the ¢rst sequence of the hu-man now anticipated by 2003, we should, in the nextdecade, identify most (if not all) of the genes of thehuman genome and other targeted model organisms(see URL 4) [11]. However, it is obvious that inorder for an organism to function, each gene mustbe expressed in a speci¢c temporal and spatial con-text. Genome-wide expression studies using microar-ray-based technologies are an important link betweensequence and function. In addition to the large bodyof sequence information from genomic sequences, alarge volume of EST sequence data (from many or-ganisms) has been archived over the past decade.Now, as a result of this deposition of EST sequenceinformation, there are more DNA sequences in Gen-Bank than there are publications in the biomedicalliterature [12] (Fig. 1, URL 5).

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28M18

Page 3: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

The slope of this curve should continue to risewith future publications of expression pro¢les ofthousands of genes in a single report (in contrast tothe current practice of reporting the expression pro-¢le of a single gene).

Although sequence-based information on ESTs is

essential for genome-wide expression analyses, it isalso problematic. Speci¢cally, based on the postingof 8 October 1998 of the UniGene [13,14] (URL 6)collection of human expressed sequences, the totalnumber of EST sequences numbers 797 691 (URL7). With an estimated 100 000 human genes, andthe certainty that all genes do not yet reside in thesedatabases, it is obvious that signi¢cant redundancyexists in current EST databases. For practical con-cerns of economics and time, it is essential to identifya non-redundant collection of human and model sys-tem genes and ESTs. The UniGene collection of theNational Center for Biotechnology Information ofthe NIH has currently identi¢ed such a set for hu-man ESTs, with 40 000 genes (termed clusters) cur-rently represented. A subset of 15 000 transcribedhuman sequences, which we refer to as the 15K set,has been compiled, and information about this set isavailable at URL 8. A similar UniGene set for mur-ine ESTs has been generated, and other model or-ganisms will follow. It is this non-redundant `back-bone' of ESTs that our laboratory (and many others)have used in selecting clones for arraying.

Fig. 2. Schematic of probe preparation, hybridization, scanning and image analysis.

Fig. 1. Cumulative growth molecular biology and genetics liter-ature compared with DNA sequences. Articles in the `G5' (mo-lecular biology and genetics) subset of MEDLINE were plottedalong side of DNA sequence records in GenBank over thesame time period. Used with permission [12].

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28 M19

Page 4: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

4. Technology, technology, technology

Regardless of the clone set chosen for any micro-array, experiments are performed in much the sameway. For the purposes of this minireview, we will

limit our comments to the cDNA microarray tech-nology utilized in our laboratory at the NationalHuman Genome Research Institute, NIH. Brie£y,RNA is extracted from the test and control cells/tis-sue, puri¢ed, and then labeled by the reverse tran-

Fig. 3. Close view of print head loaded with 16 `quills'.

Fig. 4. cDNA microarray target printing apparatus. Computer-controlled robotic cantilever arm, capable of moving in XYZ direction-al planes, can be armed with up to 16 (two rows of eight) `quill' print tips on the print head. In one automated print cycle, the printhead dips the quills into a set of target DNA wells arrayed in 96 well microwell plates; then the print head traverses the vacuum tableand touches the quill tips to each glass slide in succession, depositing target DNA; the print head continues to the wash/dry stationwhere the tips are cleaned twice with water and dried. This cycle repeats as the print head returns to wet the tips in the next set oftargets, continuing until all targets of a 96 well microwell plate have been printed. An autoloading mechanism removes spent micro-well plates and can serve up new plates. By this method, microarray slides can be printed with as many as 15 000 precise and discretecDNA targets. Used with permission [16].

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28M20

Page 5: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

scriptase reaction to produce either £uorescently`tagged' cDNA (Fig. 2). The labeled cDNA probesare then hybridized onto the array. Following hy-bridization, the array is scanned using a confocallaser scanning microscope, and the resulting dataare analyzed.

The major issues a¡ecting this technology can bedivided into those related to instrumentation, bio-chemistry, and bioinformatics. Each will be consid-ered brie£y.

4.1. Instrumentation

Instrumentation remains a major and limiting fac-tor at the present time, in that the most advancedexpression systems are very expensive. As a result,they have only been available to laboratories inwhich this technology is being developed or to largecommercial companies. Even ¢lter-based radioactivesystems, the lower cost alternative to £uorescentcDNA microarrays or DNA chips, are certainlynot inexpensive to produce or to buy. To make allbut the smallest ¢lter array, one would need to con-struct a robot, order and amplify the clone set ofcDNAs to array, re-sequence these to be sure of their

identity, and develop the technical approach to arraythem on membranes.

This is impractical for single investigators at mostresearch centers, leaving the alternative, to pay anaverage of $0.25 per clone for arrayed clone sets onnylon ¢lters. Although these nylon ¢lter arrays canusually be stripped and re-probed, the cost per ex-periment for large sets of ESTs can be prohibitive.Nevertheless quantitative expression data can be ob-tained by scanning these radioactive nylon arrayswith a phosphoimager and analyzing the imagedata with relevant software (e.g. deArray, developedby Dr. Yidong Chen [15]) and other commerciallyavailable systems. This eliminates the need to buyan expensive £uorescence detector, and, as describedbelow, lowers the amount of mRNA required forprobe labeling. Until the cost per experiment falls,the number of experiments that can be performedand the number of replicates per experiment will belimited. In contrast to the printing of large arrayscontaining thousands of ESTs, it may be practicalfor groups of investigators to adapt the more widelyavailable standard laboratory robots to the printingof a few hundred selected cDNAs (which can beobtained from public clone archives) that correspond

Fig. 5. Entire arrayer showing the casing and controlling PC.

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28 M21

Page 6: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

to genes involved in pathways of interest for partic-ular projects.

Our laboratory at the NHGRI, in collaborationwith the Biomedical Engineering and Instrumenta-tion Program (BEIP) and the Division of ComputerResearch Technology (DCRT) at NIH, has devel-oped the robotic instrumentation for arrayingcDNA clones (termed arrayer), as well as a confocal

scanning £uorescent microscope to examine £uores-cent intensities of microarrays (termed reader). As inthe case of experimental protocols, the instrumenta-tion we are using is under constant development in aresearch setting. Accordingly, the information pro-vided merely illustrates the common components ofarrayers and readers.

The arraying instrument was conceptually modeled

Fig. 6. Overview of confocal laser scanning microscope reader.

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28M22

Page 7: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

after that of Brown and colleagues (URL 9), andcomprises an XYZ cantilever type robot holding 4^16 quill pens (Fig. 3, URL 10). Other features of thearrayer include the provision of a vacuum chuck forholding 48 standard microscope slides, a microtiter-tray loader/stacker, a wash/dry station (Fig. 4) [16], acontrolling PC, air-handling components and a cab-inet (Fig. 5, URL 11). The arrayer is able to putdown in one cycle up to 16 spots on each of 48 slides,and wash and dry the quill pens for the next set ofcDNAs, all in about 70 s. Most of this time is takenup with the actual spotting, since the wash and drycycles take about 2 s each. Loading takes about 10 s.Thus, the contents of one 96 well tray can be spottedon 48 slides every 7 min, and 10 000 spots would takeabout 12 h.

The reader designed at the NIH is also based onthe original design by Brown and colleagues (URL9), although a number of additional features havebeen added. The reader is basically a computer-con-trolled inverted scanning £uorescent confocal micro-scope with a triple laser illumination system (Fig. 6,URL 12). The optical system is folded and arrangedon an optical breadboard. The breadboard is hungwith shock mounts in a vertical plane (to save space)inside a lightweight enclosure which also protects theoptics from laboratory dust and personnel from laserlight. Illumination is from three air-cooled lasers: a

488 nm, 100 mW argon ion laser for exciting FITC;a 532 nm, 100 mW NdYag for Cy3, and a 633 nm,35 mW HeNe for Cy5. Any two lasers may be turnedon simultaneously and their beams combined withdichroic mirrors and delivered to the specimen viaa single dichroic and an objective lens (0.75 NA,0.66 mm wd). The objective lens can be preciselyfocused with a digital controller. The emitted lightfrom the £uorescent targets, after passing backthrough the objective and primary dichroic, is fo-cused through a confocal pinhole and through a sec-ondary dichroic onto two cooled PMTs which oper-ate in parallel for the two di¡erent wavelengths.Data are acquired with a custom integrator andstandard 16 bit A/D card in a PC. The operatorcan set the gain, speed, pixel size, pattern position,and pattern area. All of the electronics and powersupplies are mounted in the cabinet bay next to theoptics.

At 100 mm/s, with 20 Wm2 pixels, a 50U20 spotarray with spots on 400 Wm centers involves 400traverses each about 20 mm long and can be scannedin about 4 min. We can reliably detect about 10 pg/Wlof each species of cDNA [17].

4.2. Biochemistry

In an ideal world, to facilitate interpretation of

Fig. 7. Representative microarray hybridization. This pseudocolored image represents a portion of a microarray with the referenceprobe (normal ¢broblasts) in green and rhabdomyosarcoma in red. The up (red) and down (green) regulation of several genes are il-lustrated. Representative genes of interest are boxed (A = FKHR, B = MYCN, C = CDK4, D = MYBL2 and E = NGFR). Ratio datafor each of these individual spots are calculated and used for further analysis. Used with permission [19].

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28 M23

Page 8: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

expression data, the source of the mRNA to bestudied should be isolated from a homogeneous pop-ulation of cells, collected in a way that accuratelypreserves mRNA expression levels, and is availablein an unlimited quantity. These criteria are best ¢t bythe study of cultured cell lines, which can be derivedfrom clonal populations and can be readily manipu-lated by pharmacological or physiological means forexpression studies. In reality, one would like to beable to study human or animal tissues in addition tocell lines. However, tissues are di¤cult to study, be-cause most are composed of numerous cell types, andthe techniques and timing of sample collection vary,as does the quantity and quality of mRNA.

One way around the important issue of multiplecellular populations within a single biopsy specimenis to use the method of laser capture micro-dissection[18] to isolate speci¢c cells for analysis. The limitingfactor in this analysis is that very little RNA can be

prepared from the numbers of cells (6 1000) ob-tained. Consequently, methods are being devisedthat permit reverse transcription, linear ampli¢ca-tion, and alternative direct labeling strategies of van-ishingly small amounts of RNA, which hopefully willmaintain the relative concentrations of RNA speciesin the starting material.

Perhaps the most important advantage of £uores-cent probes is the ability to use multiple spectrallydistinct £uorophores. Because they can be multi-plexed, hybridization signals from two or more dif-ferently labeled probes can be detected separately onthe same slide. Thus, two-color £uorescence hybrid-izations allow direct comparison between two probesthat were hybridized simultaneously to the same ar-ray. Ratio measurements also increase the accuracyof the comparative analysis. Finally, a series of pro-tocols from our laboratory, which detail current pro-tocols for mRNA isolation and probe labeling, canbe found and downloaded at URL 13.

Fig. 9. The hierarchical clustering dendrogram indicates the or-der in which the 13 cell lines are combined to form clusters.The calculation of the dendrogram uses 1 minus Pearson corre-lation coe¤cient of log-ratios as the dissimilarity measure. Thescale represents the distance between merged clusters and celllines most similar are combined ¢rst. Using this method theseven ARMS cell lines (red) again cluster together. Used withpermission [19].

Fig. 8. The positions of all 13 cell lines in two-dimensional Eu-clidean distances are determined using the method of multidi-mensional scaling to make the between cell lines correspond asclosely as possible to 1 minus the Pearson correlation coe¤cientof the log-ratio values. The X and Y scales are arbitrary. Celllines falling close to one another in the plot have high correla-tion values. Using this method the ARMS cell lines (red) clustertogether and the non-ARMS tumors (black) fall at the periph-ery of the plot. Used with permission [19].

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28M24

Page 9: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

Fig. 10. Genes overexpressed in ARMS. Thirty-seven genes highly expressed relative to the reference probe in at least four out of sev-en ARMS and their functions are listed in the ¢rst and second columns. The third column indicates the number of ARMS cell linesin which each gene is up-regulated compared with the control at a level greater than the 99% con¢dence interval, and the fourth col-umn provides the UniGene cluster designation. The expression ratios from each cell line are color-coded such that a red color indi-cates overexpression and green color reduced expression in the tumor compared with the control cell line. When the ratios of expres-sion exceed the 99% con¢dence interval, the saturation of the red or green color increases in proportion to the ratio. A ratio colorscale is shown at the bottom of the ¢gure. Used with permission [19].

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28 M25

Page 10: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

4.3. Informatics

It is obvious that large-scale, high-throughput ex-perimental methods require information processingcoupled to a variety of analysis tools. LaboratoryInformation Management (LIM) software and data-base systems to design arrays, to track clones, tocollect, analyze, and interpret data from gene expres-sion studies, are in their infancy. Among otherthings, such systems have to catalog the expressionbehavior of thousands of genes in a single experi-ment, and subsequently make comparisons acrosstissues, developmental and pathological states, or cel-lular perturbations. Very large quantities of datahave to be managed both prior to and after an ex-periment, because direct access is required to all se-quences, annotations, and physical DNA resourcesfor the genes of the organism studied. Prior to anal-ysis of data, the readout of relative expression levelsobserved on an array must be stored and preservedso that it is available for image processing, statistical,and, ¢nally, biological analysis. The latter includesidentifying transcripts that show statistically signi¢-cant changes in absolute or relative levels of ex-pression. Our laboratory has developed an imageanalysis program called deArray, developed byDr. Yidong Chen [15], which handles, in seconds,the image capture and array spot identi¢cationsteps, and catalogues the information on all £uores-cent signals within a hybridization. The criteria forstatistical signi¢cance and methods can be found atURL 14.

Very recently our laboratory has also published anarray database management system, called ArrayDB,which provides a template for handling the £ow ofarray data and presents several analysis componentsfor `mining' data generated by this technology. URL15 depicts the elements of this system. The interestedreader is referred to Ermolaeva and colleagues [12]for a text-based description and can download theschemata and software for this Sybase-anchored da-tabase at URL 13. This database performs the mostobvious and straightforward of requirements includ-ing the provision of information (by hyperlinks toEntrez, NCBI) about any available sequence, struc-tures, and functions of the gene products of interest.While these tools are both necessary and extraordi-narily time-saving to the biologist, interpreting this

wealth of information clearly remains the responsi-bility of the investigator, who must be able to inter-rogate the data sets in multiple ways.

One approach we are incorporating within Ar-rayDB is a link to all biochemical pathways to whicha particular transcript belongs and to genes withwhich the transcript is thought to interact. In thelong run, our laboratory (and many others) are look-ing at ways to modify the software so that it is capa-ble of pre-interpreting data (using a biochemicalknowledge base and set of heuristics), presenting aninvestigator with alternative hypotheses or explana-tions of its meaning, and predicting relationships andpossible pathways of interactions of the genes understudy. It should be clear that there is an absoluterequirement for computational biology, mathemati-cal modeling, and developmental biostatistical ap-proaches for successful biologic interpretation ofthe results of array experiments. This conjoining ofdisciplines is the only way that experiments involvinghundreds of thousands of genes, with tens of thou-sands of changes across multiple experiments, can bemanaged.

Very recently, we utilized cDNA microarrays toinvestigate the gene expression pro¢le of a group ofseven alveolar rhabdomyosarcoma (ARMS) cell linescharacterized by the presence of the PAX3-FKHRfusion gene and six unrelated controls [19]. Thisstudy illustrates the power and approaches of model-ing data from array experiments. The study is basedon the fact that several forms of human sarcoma,lymphoma, and leukemia are characterized by so-matically acquired chromosome translocations whichresult in fusion genes encoding chimeric transcriptionfactors with oncogenic properties. Fig. 7 shows anexample of a microarray experiment where the ex-pression pro¢le of a normal myo-¢broblast cell line(green) was compared with that of an ARMS cellline (red). Using the method of multidimensionalscaling (Fig. 8) to represent the relationships amongthe cell lines in two-dimensional Euclidean space, wefound that ARMS cells show a consistent pattern ofgene expression allowing them to be clustered togeth-er (Fig. 9). By searching across the seven ARMS celllines, we found that a minimal subset of only 37genes were most consistently expressed in ARMSrelative to a reference cell line (Fig. 10). These resultsdemonstrate the potential of cDNA microarray tech-

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28M26

Page 11: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

nology to elucidate a tumor-speci¢c gene expressionpro¢le or `¢ngerprint' in human cancers.

5. Summary

One can imagine that, one day, there will be ageneral requirement that relevant array data be de-posited, at the time of publication of manuscripts inwhich they are described, into a single site madeavailable for the storage and analysis of array data(modeled after the GenBank submission require-ments for DNA sequence information). With thissystem in place, one can anticipate a time whendata from thousands of gene expression experimentswill be available for meta-analysis, which has thepotential to balance out artifacts from many individ-ual studies, thus leading to more robust results andsubtle conclusions. This will require that data adhereto some type of uniform structure and format thatwould ideally be independent of the particular ex-pression technology used to generate it. The prosand cons of various publication modalities for theselarge electronic data sets have been discussed else-where [12], but, practical di¤culties aside, generaldepositing must occur for this technology to reachthe broadest range of investigators.

Finally, as mentioned at the beginning of this re-view, it is unfortunate that this important researchtool remains largely restricted to a few laboratoriesthat have developed expertise in this area and to agrowing number of commercial interests. Ultimatelythe real value of microarray technology will only berealized when this approach is generally available. Itis hoped that issues including platforms, instrumen-tation, clone availability, and patents [20] will beresolved shortly, making this technology accessibleto the broadest range of scientists at the earliest pos-sible moment.

URLs.URL 1: http://www.a¡ymetrix.com/URL 2: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/URL 3: http://www.nhgri.nih.gov/DIR/LCG/15K/links.htmlURL 4: http://www.nhgri.nih.gov/98plan/URL 5: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

¢g1.gifURL 6: http://www.ncbi.nlm.nih.gov/UniGene/URL 7: http://www.ncbi.nlm.nih.gov/UniGene/Hs.stats.shtml

URL 8: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/p15kabout.html

URL 9: http://www.stanford.edu/pbrown/array.htmlURL 10: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

images/pins_jpeg.jpgURL 11: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

images/arrayer_jpeg.jpgURL 12: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

images/reader_jpeg.jpgURL 13: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

protocol.htmlURL 14: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

img_analysis.htmlURL 15: http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

ng_paper.html

Acknowledgements

The custom-built robotic arrayer was developed byStephen B. Leighton and scanner optics by Paul D.Smith. Thomas Pohida developed the electronics ofboth the arrayer and scanner. We thank Yuan Jiang,Gerald C. Gooden, John Lueders, Kim A. Gayton,Art A. Glatfelter and Robert L. Walker for theirexcellent technical assistance on this work. We alsothank Lao H. Saal and Darryl Leja for assistance inthe preparation of the illustrations.

References

[1] R.J. Lipshutz et al., Using oligonucleotide probe arraysto access genetic diversity, BioTechniques 19 (1995) 442^447.

[2] S.P. Fodor et al., Multiplexed biochemical assays with bio-logical chips, Nature 364 (1993) 555^556.

[3] S.P. Fodor et al., Light-directed, spatially addressable paral-lel chemical synthesis, Science 251 (1991) 767^773.

[4] A.C. Pease et al., Light-generated oligonucleotide arrays forrapid DNA sequence analysis, Proc. Natl. Acad. Sci. USA91 (1994) 5022^5026.

[5] J.G. Hacia, L.C. Brody, M.S. Chee, S.P. Fodor, F.S. Col-lins, Detection of heterozygous mutations in BRCA1 usinghigh density oligonucleotide arrays and two-colour £uores-cence analysis [see comments], Nature Genet. 14 (1996) 441^447.

[6] D.J. Lockhart et al., Expression monitoring by hybridizationto high-density oligonucleotide arrays, Nature Biotechnol. 14(1996) 1675^1680.

[7] M.S. Guyer, F.S. Collins, How is the Human Genome Proj-ect doing, and what have we learned so far?, Proc. Natl.Acad. Sci. USA 92 (1995) 10841^10848.

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28 M27

Page 12: Minireview DNA microarray technology: the anticipated … · chips, cDNA microarrays are signi¢cantly more £ex-ible in the printing process and are not limited by the necessity

[8] G.D. Schuler et al., A gene map of the human genome,Science 274 (1996) 540^546.

[9] L. Rowen, G. Mahairas, L. Hood, Sequencing the humangenome, Science 278 (1997) 605^607.

[10] C. Fields, M.D. Adams, O. White, J.C. Venter, How manygenes in the human genome?, Nature Genet. 7 (1994) 345^346.

[11] F.S. Collins et al., New goals for the U.S. Human GenomeProject: 1998^2003, Science 282 (1998) 682^689.

[12] O. Ermolaeva et al., Data management and analysis for geneexpression arrays, Nature Genet. 20 (1998) 19^23.

[13] M.S. Boguski, G.D. Schuler, ESTablishing a human tran-script map [news], Nature Genet. 10 (1995) 369^371.

[14] G.D. Schuler, Pieces of the puzzle: expressed sequence tagsand the catalog of human genes, J. Mol. Med. 75 (1997)694^698.

[15] Y. Chen, E.R. Dougherty, M.L. Bittner, Ratio-based deci-

sions and the quantitative analysis of cDNA microarray im-ages, Biomed. Opt. 2 (1997) 364^374.

[16] J. Khan et al., Expression pro¢ling in cancer using cDNAmicroarrays, Electrophoresis (1998) (in press).

[17] J. DeRisi et al., Use of a cDNA microarray to analyse geneexpression patterns in human cancer, Nature Genet. 14(1996) 457^460.

[18] N.L. Simone, R.F. Bonner, J.W. Gillespie, M.R. Emmert-Buck, L.A. Liotta, Laser-capture microdissection: openingthe microscopic frontier to molecular analysis, Trends Gen-et. 14 (1998) 272^276.

[19] J. Khan et al., Gene expression pro¢ling of alveolar rhabdo-myosarcoma with cDNA microarrays, Cancer Res. 58 (1998)5009^5013.

[20] R.F. Service, Will patent ¢ghts hold DNA chips hostage?,Science 282 (1998) 397.

BBACAN 87436 9-3-99

J. Khan et al. / Biochimica et Biophysica Acta 1423 (1999) M17^M28M28