Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to...

9
www.ajhg.org The American Journal of Human Genetics Volume 78 April 2006 671 Report Recently Mobilized Transposons in the Human and Chimpanzee Genomes Ryan E. Mills, 1,2 E. Andrew Bennett, 1,2,3 Rebecca C. Iskow, 1,2,3 Christopher T. Luttig, 1 Circe Tsui, 1,2 W. Stephen Pittard, 1,2,4 and Scott E. Devine 1,2,3 1 Department of Biochemistry, 2 Emory Center for Bioinformatics, 3 Graduate Program in Genetics and Molecular Biology, and 4 BimCore, Emory University School of Medicine, Atlanta Transposable genetic elements are abundant in the genomes of most organisms, including humans. These endogenous mutagens can alter genes, promote genomic rearrangements, and may help to drive the speciation of organisms. In this study, we identified almost 11,000 transposon copies that are differentially present in the human and chimpan- zee genomes. Most of these transposon copies were mobilized after the existence of a common ancestor of humans and chimpanzees, 6 million years ago. Alu, L1, and SVA insertions accounted for 195% of the insertions in both species. Our data indicate that humans have supported higher levels of transposition than have chimpanzees during the past several million years and have amplified different transposon subfamilies. In both species, 34% of the insertions were located within known genes. These insertions represent a form of species-specific genetic variation that may have contributed to the differential evolution of humans and chimpanzees. In addition to providing an initial overview of recently mobilized elements, our collections will be useful for assessing the impact of these insertions on their hosts and for studying the transposition mechanisms of these elements. Received July 15, 2005; accepted for publication December 30, 2005; electronically published February 2, 2006. Address for correspondence and reprints: Dr. Scott E. Devine, Emory University School of Medicine, 4133 Rollins Research Center, 1510 Clifton Road NE, Atlanta, GA 30322. E-mail: [email protected] Am. J. Hum. Genet. 2006;78:671–679. 2006 by The American Society of Human Genetics. All rights reserved. 0002-9297/2006/7804-0013$15.00 Transposable genetic elements collectively occupy 44% of the human genome (International Human Genome Sequencing Consortium 2001). Although most of these transposons lost the ability to transpose long ago, some copies have transposed in relatively recent human history (Kazazian et al. 1988; Wallace et al. 1991; Moran et al. 1996; reviewed by Ostertag and Kazazian 2001; reviewed by Batzer and Deininger 2002; Bennett et al. 2004). These recently mobilized transposons are of great inter- est for a number of reasons. First, recent insertions with- in or near genes may cause phenotypic changes in hu- mans, including diseases (Kazazian et al. 1988; Wallace et al. 1991; reviewed by Ostertag and Kazazian 2001; reviewed by Batzer and Deininger 2002). Several dozen transposon insertions have been identified to date that cause human diseases, and human populations are likely to harbor additional transposon insertions that influence phenotypes as well. Some of these recently mobilized transposons also remain actively mobile today and con- tinue to generate new transposition events elsewhere in the genome (Moran et al. 1996; reviewed by Ostertag and Kazazian 2001; Dewannieux et al. 2003). Active retrotransposons in particular have been observed to be the most potent endogenous mutagens in humans, and these elements continue to generate mutations and ge- netic variation in human populations (reviewed by Os- tertag and Kazazian 2001). In some cases, transposon insertions also may go on to create genomic rearrange- ments by recombining with other transposon copies (re- viewed by Ostertag and Kazazian 2001). Thus, recently mobilized transposons continue to restructure the hu- man genome through a variety of mechanisms. The completion of a draft chimpanzee genome se- quence provided an opportunity to identify these recently mobilized transposons in both humans and chimpan- zees (Chimpanzee Sequencing and Analysis Consortium 2005). Transposons that inserted into either of these ge- nomes during the past 6 million years (i.e., since the existence of the most recent common ancestor of humans and chimpanzees) would be expected to be present in only one of the two genomes. We used a comparative genomics approach to identify these recently inserted transposon copies (fig. 1). We began by aligning the se- quences of the human and chimpanzee genomes to iden- tify all insertions and deletions (indels). We screened indels for the presence of transposable elements by comparing each indel to a library of known transposons (RepBase v. 10.02) (Jurka 2000). Using this

Transcript of Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to...

Page 1: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

www.ajhg.org The American Journal of Human Genetics Volume 78 April 2006 671

Report

Recently Mobilized Transposons in the Human and Chimpanzee GenomesRyan E. Mills,1,2 E. Andrew Bennett,1,2,3 Rebecca C. Iskow,1,2,3 Christopher T. Luttig,1Circe Tsui,1,2 W. Stephen Pittard,1,2,4 and Scott E. Devine1,2,3

1Department of Biochemistry, 2Emory Center for Bioinformatics, 3Graduate Program in Genetics and Molecular Biology, and 4BimCore, EmoryUniversity School of Medicine, Atlanta

Transposable genetic elements are abundant in the genomes of most organisms, including humans. These endogenousmutagens can alter genes, promote genomic rearrangements, and may help to drive the speciation of organisms.In this study, we identified almost 11,000 transposon copies that are differentially present in the human and chimpan-zee genomes. Most of these transposon copies were mobilized after the existence of a common ancestor of humansand chimpanzees, ∼6 million years ago. Alu, L1, and SVA insertions accounted for 195% of the insertions in bothspecies. Our data indicate that humans have supported higher levels of transposition than have chimpanzees duringthe past several million years and have amplified different transposon subfamilies. In both species, ∼34% of theinsertions were located within known genes. These insertions represent a form of species-specific genetic variationthat may have contributed to the differential evolution of humans and chimpanzees. In addition to providing aninitial overview of recently mobilized elements, our collections will be useful for assessing the impact of theseinsertions on their hosts and for studying the transposition mechanisms of these elements.

Received July 15, 2005; accepted for publication December 30, 2005; electronically published February 2, 2006.Address for correspondence and reprints: Dr. Scott E. Devine, Emory University School of Medicine, 4133 Rollins Research Center, 1510

Clifton Road NE, Atlanta, GA 30322. E-mail: [email protected]. J. Hum. Genet. 2006;78:671–679. � 2006 by The American Society of Human Genetics. All rights reserved. 0002-9297/2006/7804-0013$15.00

Transposable genetic elements collectively occupy ∼44%of the human genome (International Human GenomeSequencing Consortium 2001). Although most of thesetransposons lost the ability to transpose long ago, somecopies have transposed in relatively recent human history(Kazazian et al. 1988; Wallace et al. 1991; Moran et al.1996; reviewed by Ostertag and Kazazian 2001; reviewedby Batzer and Deininger 2002; Bennett et al. 2004).These recently mobilized transposons are of great inter-est for a number of reasons. First, recent insertions with-in or near genes may cause phenotypic changes in hu-mans, including diseases (Kazazian et al. 1988; Wallaceet al. 1991; reviewed by Ostertag and Kazazian 2001;reviewed by Batzer and Deininger 2002). Several dozentransposon insertions have been identified to date thatcause human diseases, and human populations are likelyto harbor additional transposon insertions that influencephenotypes as well. Some of these recently mobilizedtransposons also remain actively mobile today and con-tinue to generate new transposition events elsewhere inthe genome (Moran et al. 1996; reviewed by Ostertagand Kazazian 2001; Dewannieux et al. 2003). Activeretrotransposons in particular have been observed to bethe most potent endogenous mutagens in humans, and

these elements continue to generate mutations and ge-netic variation in human populations (reviewed by Os-tertag and Kazazian 2001). In some cases, transposoninsertions also may go on to create genomic rearrange-ments by recombining with other transposon copies (re-viewed by Ostertag and Kazazian 2001). Thus, recentlymobilized transposons continue to restructure the hu-man genome through a variety of mechanisms.

The completion of a draft chimpanzee genome se-quence provided an opportunity to identify these recentlymobilized transposons in both humans and chimpan-zees (Chimpanzee Sequencing and Analysis Consortium2005). Transposons that inserted into either of these ge-nomes during the past ∼6 million years (i.e., since theexistence of the most recent common ancestor of humansand chimpanzees) would be expected to be present inonly one of the two genomes. We used a comparativegenomics approach to identify these recently insertedtransposon copies (fig. 1). We began by aligning the se-quences of the human and chimpanzee genomes to iden-tify all insertions and deletions (indels).

We screened indels for the presence of transposableelements by comparing each indel to a library of knowntransposons (RepBase v. 10.02) (Jurka 2000). Using this

Page 2: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

Figure 1 Overview of our transposon insertion–discovery pipeline. A, The time line for speciation of humans and chimpanzees is comparedwith the time line for the generation of transposon insertions. Common insertions occurred a very long time ago and are fixed in both species.“Species-specific” insertions are differentially present in the two species and occurred mostly during the past ∼6 million years. MYA p millionyears ago. B, Our strategy for identifying new transposon insertions in humans and chimpanzees. Recently mobilized transposons are flankedby TSDs and are precisely absent from one of the two genomes. One of the two copies of the TSD is actually found within the indel. Thus,the transposon plus one TSD copy equals the “fill.” C, Our computational pipeline. The five sequential steps of our computational pipeline fordiscovering species-specific transposon insertions in humans and chimpanzees are depicted. The draft chimpanzee-genome (build panTro1) andhuman-genome (build hg17) sequences were obtained from the University of California Santa Cruz browser (Kent et al. 2002). BAC clonesequences for the chimpanzee genome were obtained from GenBank (National Center for Biotechnology Information [NCBI]). BLAST programsalso were obtained from NCBI. Repeatmasker was obtained from Arian Smit (Institute for Systems Biology). RepBase version 10.02 and theconsensus sequence for the L1-Hs element were obtained from Jurzy Jurka (Jurka 2000). Full-length consensus sequences for L1-PA2, L1-PA3,L1-PA4, and L1-PA5 were obtained from GenBank (Boissinot et al. 2000). Custom MySQL databases and PERL scripts were generated asnecessary. All analysis was performed locally on SUN SunFire v40z or Dell Power Edge 2500 servers running Linux operating systems. Ourcomputational pipeline began with identification of all indels in humans versus chimpanzees using genomic alignments that were generated withBLASTz. Next, indels containing transposons were identified using Repeatmasker (A. Smit, unpublished material) and RepBase version 10.02(Jurka 2000). RepBase libraries for humans and chimpanzees were modified to include full-length L1-PA2, L1-PA3, L1-PA4, L1-PA5 consensussequences (Boissinot et al. 2000). TSDs were identified using a Smith-Waterman local alignment algorithm on the regions flanking each indeljunction. The algorithm was restricted to require the optimum alignment to be located within 5 bp of the indel junction. Aligned sequencessmaller than 4 bp or having an identity !90% were not scored as TSDs. A probability scoring system was developed to determine the likelihoodthat a given indel was caused by a single transposon insertion plus its TSD. This score was obtained by adding together the fraction of theindel that was accounted for by the transposon, its TSD, and a poly (A) tail (if present). A score of 1.0 indicated that the gap was fully accountedfor by the transposon and associated sequences. We empirically determined that a lower cutoff of 0.85 provided accurate results while eliminatingfew, if any, true positives. SVA elements initially were annotated poorly by Repeatmasker. This program often split SVA elements into 2–3segments (and thus counted most elements more than once). We developed a new method to reassemble these segments into a single element,where appropriate.

Page 3: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

www.ajhg.org The American Journal of Human Genetics Volume 78 April 2006 673

Table 1

Summary of Transposon Insertions

TRANSPOSON

CLASS

NO. (%) OF TRANSPOSON

INSERTIONS

Human( )n p 7,786

Chimpanzee( )n p 2,933

Alu (All): 5,530 (71.0%) 1,642 (56.1%)Alu S 263 (3.3%) 50 (1.7%)Alu Ya5 1,709 (21.9%) 10 (.3%)Alu Yb8 1,290 (16.6%) 9 (.3%)Alu Y 484 (6.2%) 360 (12.3%)Alu Yc1 356 (4.6%) 979 (33.4%)Alu Yg6 261 (3.4%) 1 (.1%)

L1 (All): 1,174 (15.1%) 758 (25.9%)L1 Hs (Ta) 271 (3.5%) 0 (.0%)L1 Hs (Non Ta) 252 (3.2%) 210 (7.2%)L1 PA2 490 (6.3%) 476 (16.2%)

SVA (All) 864 (11.1%) 396 (13.6%)Other 219 (2.8%) 127 (4.4%)

approach, we initially identified a total of 14,783 trans-poson copies that were differentially present in the twogenomes. Many of these copies appeared to be recentlymobilized transposon insertions, whereas others weresimply transposon copies that happened to be locatedwithin larger genomic duplications or deletions in thetwo genomes.

To identify all of the insertions that were caused byactual transposition events, we next screened our collec-tions for insertions that (1) were precisely flanked by tar-get-site duplications (TSDs) and (2) precisely accountedfor a gap in one of the two genomes. Using these criteria,we identified 10,719 insertions of single transposon cop-ies that appeared to have been caused by transpositionevents (see the tab-delimited ASCII files, which can beimported into spreadsheets, of data sets 1 and 2 [onlineonly]). The remaining 4,064 examples lacked TSDs or,in general, did not precisely account for the indels, whichsuggests that they were caused by alternative mechanisms.Of the 10,719 transposon insertions, 7,786 (72.6%)were found in humans and only 2,933 (27.4%) werefound in chimpanzees. Therefore, it appears that tran-sposons have been significantly more active in the humangenome during the parallel evolution of these organisms(see data sets 1 and 2 [online only]). The different pop-ulation dynamics of these organisms during the past sev-eral million years also may have helped to shape thefinal patterns of transposons observed.

The most abundant classes of new transposon inser-tions in both chimpanzees and humans were Alu, L1,and SVA element insertions, and these three classes col-lectively accounted for 195% of the recently mobilizedtransposons in both species (table 1 and fig. 2). However,the relative abundance of these elements and their sub-families differed between the two species (table 1; datasets 1 and 2 [online only]; fig. 2; see below). Other, less-abundant classes of transposon insertions also wereidentified in our study. For example, long terminal repeat(LTR) retroelement insertions were observed in both spe-cies, including insertions of human endogenous retro-viruses (HERVs) and solo LTRs of these elements (dataset 1 [online only]). Solo LTR insertions have been shownto influence the expression of nearby genes, which makesthese insertions of particular interest (Landry and Mager2003). Also identified were five full-length HERV-K in-sertions with relatively long ORFs (up to several thou-sand amino acids in length) that could remain capableof retrotransposition (data set 1 [online only]). Insertionsof chimpanzee endogenous retroviruses (CERVs) alsowere identified (data set 2 [online only]) (Yohn et al.2005). Finally, mammalian interspersed repetitive ele-ments, copies of satellite DNA flanked by unusual TSDs,and small numbers of other interesting transposable el-ements were identified in the two species (data sets 1 and2 [online only]).

Alu insertions were by far the most abundant class oftransposon insertions in both humans and chimpanzees,and these insertions collectively accounted for the bulkof transposons in our study (table 1 and fig. 2). Thenumber of Alu insertions in humans (5,530) was 3.4-fold higher than the number observed in chimpanzees(1,642). The distributions of these elements among var-ious Alu subfamilies also differed between the two or-ganisms (table 1; data sets 1 and 2 [online only]; fig. 2).For example, Alu Ya5, Alu Yb8, Alu Y, and Alu Yc1were highly abundant in humans, whereas only Alu Yc1and Alu Y were highly abundant in chimpanzees. Ourdata indicate that Alu S elements, which have been pre-sumed to have been inactive for the past 35 million years(Johanning et al. 2003), apparently have been active inhumans and less active in chimpanzees during the past∼6 million years (table 1). It is possible that some ofthese older Alu S “insertions” were caused by the precisedeletion of Alu S elements from one of the two genomes(van de Lagemaat et al. 2005) or by gene-conversionevents (reviewed by Batzer and Deininger 2002). How-ever, these results also are in agreement with recent datafrom our laboratory, which indicates that a small num-ber of younger Alu S elements are polymorphic in hu-mans and appear to have transposed more recently thanthe bulk of Alu S elements (Bennett et al. 2004). Overall,our results indicate that the human genome has sup-ported higher levels of Alu retrotransposition and hasamplified a different set of Alu elements than has thechimpanzee genome (table 1 and fig. 2). These resultsconfirm and extend previous classifications of Alu el-ements of chimpanzee chromosome 22 (Hedges et al.2004; Watanabe et al. 2004).

L1 insertions also were abundant in both organisms.In humans, almost 1,200 recently mobilized L1 insertions

Page 4: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

674 The American Journal of Human Genetics Volume 78 April 2006 www.ajhg.org

Figure 2 Classes of species-specific transposons in humans and chimpanzees. A, The overall composition of species-specific insertions inhumans and chimpanzees. Note that 97.2% of all insertions in humans and 95.6% of all insertions in chimpanzees are Alu, L1, and SVAinsertions. B, The distributions of Alu and L1 subfamilies for humans. C, The distributions of Alu and L1 subfamilies for chimpanzees. Notethat different Alu and L1 subfamilies were amplified in humans (B) and chimpanzees (C).

with TSDs were identified that precisely accounted forgaps in the chimpanzees genome (data set 1 [online only]and table 1). These human L1 elements predominantlyincluded members of the L1-Hs and L1-PA2 families(data set 1 [online only]; table 1; fig. 2) (Boissinot et al.2000; Brouha et al. 2003). The human L1-Hs elementsincluded members of the pre-Ta, Ta0, and Ta1 subfam-ilies (data set 1 [online only]; grouped together as “L1-Hs Ta” in table 1 and fig. 2), which are known to behighly active in humans (Brouha et al. 2003). Also iden-tified in humans were additional L1-Hs and L1-PA2 sub-families that had unique base combinations at the ninekey positions described elsewhere (data set 1 [online only];grouped together as “L1-Hs non-Ta” or “L1-PA2” intable 1 and fig. 2) (Boissinot et al. 2000; Brouha et al.2003). These novel subfamilies contained 3–65 copies(data set 1 [online only]). The remaining L1 insertionsin humans belonged to older L1-PA2, L1-PA3, and L1-PA4 groups (data set 1 [online only] and fig. 2).

The L1 insertions identified in chimpanzees, in con-trast, were notably different from those outlined abovefor humans (table 1; data set 2 [online only]; fig. 2). For

example, fewer recently mobilized L1 insertions wereidentified in chimpanzees than in humans (758 in chim-panzees vs. 1,174 in humans). Only 4 of the chimpanzeeL1 insertions were full-length (compared with 1200 newfull-length insertions in humans), and none of the chim-panzee L1 insertions had intact ORFs (data set 2 [onlineonly]). The initial draft sequence of the chimpanzee ge-nome is likely to contain assembly errors that may ac-count for at least some of these observed differences.However, we also observed differences in the L1 sub-families of these organisms that are unrelated to genomeassembly issues. For example, proportionally more L1-PA2 insertions and fewer L1-Hs insertions were observedin chimpanzees than in humans (data set 2 [online only]and fig. 2). Initially, we were surprised to find L1-Hselements in chimpanzees at all, since these elements wereexpected to be found only in humans. However, furtheranalysis revealed that most of the L1-Hs elements inchimpanzees actually were “intermediate” elements thatmatched L1-Hs overall but had ORF1 sequences thatwere more similar to L1-PA2 elements (data set 2 [onlineonly]). Therefore, the L1-Hs family of elements includes

Page 5: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

www.ajhg.org The American Journal of Human Genetics Volume 78 April 2006 675

Table 2

Analysis of L1 ORFs

ORF

NO. OF ELEMENTS

FullHumanGenome(versionhg17)

Chimpanzee

BACs(260 Mb)

Full Genome(Extrapolation)

L1 15,500 bp 8,483 702 8,100Intact ORF1 (1,017 bp) 633 20 230Intact ORF2 (3,828 bp) 205 4 46Intact ORF1 and ORF2 126 2 23

subfamilies that are truly human specific as well as otherL1-Hs–like elements that are not human specific. We alsoaligned and analyzed all of our chimpanzee L1 inser-tions, using ClustalW and PAUP, to determine whetherany new L1 subfamilies (equivalent to L1-Ta elementsin humans) were present in chimpanzees. In addition, weclassified all of our chimpanzee L1 insertions, using thenine key positions that have been used elsewhere to clas-sify human L1 elements (Boissinot et al. 2000; Brouhaet al. 2003). In both cases, we failed to identify any newextended subfamilies of L1 elements within our collec-tion of chimpanzee insertions. Therefore, a dominant classof new offspring elements analogous to L1-Ta elementsin humans does not appear to have been produced inrecent chimpanzee history (data set 2 [online only]).

We next examined all of the existing L1 ORFs in thehuman and chimpanzee genomes to further characterizepossible differences between the L1 elements of thesespecies. We screened the human and chimpanzee genomesfor ORFs in all nearly full-length elements (15,500 bp)and identified 633 L1 elements with intact ORF1 se-quences in the human genome but only 39 elements withintact ORF1 sequences in the draft chimpanzee sequence(see the tab-delimited ASCII files, which can be importedinto spreadsheets, of data sets 3 and 4 [online only]).Moreover, we identified 205 L1 elements with intactORF2 sequences in the human genome (table 2) butfailed to detect intact ORF2 sequences in the draft chim-panzee sequence (data set 3 [online only]). These resultssuggested that functional L1 elements were likely to berare in chimpanzees. As outlined above, however, it alsowas possible that the quality of the chimpanzee draftsequence affected our ability to detect ORFs accurately.We determined that the sequence quality of the 15,500bp L1 elements in chimpanzees had average scores thatgenerally were high (140 Phred scores) (Ewing et al.1998; Kent et al. 2002; data set 3 [online only]). How-ever, single bases of low quality (!10 Phred scores) (Ew-ing et al. 1998) also were distributed throughout thedraft sequence at sporadic intervals (Kent et al. 2002).These single bases, although rare, often resulted in frame-shifts. Therefore, it was possible that these sporadic low-quality bases were interfering with our ability to detectORFs accurately.

To independently examine the frequency of intact L1ORFs in chimpanzees, we analyzed all L1 elements thatwere present in finished BAC sequences that had beengenerated for the chimpanzee genome project (see thetab-delimited ASCII file, which can be imported into aspreadsheet, of data set 5 [online only]). Approximately260 Mb of finished sequence was available in GenBankfrom chimpanzee BACs, and the quality of these se-quences was identical to that of the finished human ge-nome sequence. We identified a total of two L1 elementsin these BACs that were 15,500 bp in length and also

had intact ORFs (data set 5 [online only] and table 2).Neither of these two elements was present in the draftsequence, so it is unclear whether the quality of the draftaffected our ability to detect these ORFs. Nevertheless,extrapolation of these results to the whole genome(3,000 Mb) predicts that chimpanzees harbor ∼23 full-length L1s with intact ORFs (compared with 126 inhumans; table 2). Thus, the chimpanzee draft sequenceindicates that L1 elements with intact ORFs are up to20-fold less abundant in chimpanzees than in humans,whereas the BACs indicate that such elements are ∼5.5-fold less abundant (table 2). Since the draft chimpanzeesequence contains some sporadic low-quality bases, theBAC estimate is likely to be more accurate. Both of theseestimates indicate that functional L1 elements are lessabundant in chimpanzees than in humans.

We next examined the L1 ORF1 and ORF2 codingregions from chimpanzees to determine whether the en-coded proteins are likely to remain active today. Brouhaet al. (2003) showed elsewhere that human L1 elementsthat differ by an average of only 21 nucleotide changesfrom an active human L1 consensus were inactive. Thus,even elements that were 199% identical to this consensuscould be inactive, and no human elements that were!99% identical were found to be active. We determinedthat the human genome contains at least 119 elementswith 199% nucleotide identity to the active human L1consensus (Brouha et al. 2003) within the regions en-coding ORF1 and ORF2 (data set 4 [online only]). Incontrast, no L1 ORFs in the chimpanzee genome orBACs had 199% identity to this active human L1 con-sensus (data set 3 [online only]). We cannot rule out thepossibility that some of the chimpanzee L1 elements thatare !99% identical to the human L1 consensus couldremain active. Other “hot L1” elements might haveevolved separately in chimpanzees that are 11% variantfrom the most-active human elements. However, sincewe failed to detect extended subfamilies of L1-PA2 orL1-Hs elements within our collection of chimpanzee in-sertions (analogous to the L1-Ta subfamily in humans),such elements generally would be present at low copy

Page 6: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

676 The American Journal of Human Genetics Volume 78 April 2006 www.ajhg.org

numbers in the chimpanzee genome. Thus, the landscapeof potentially functional L1 elements in chimpanzees ap-pears to be quite different from the landscape of activeL1 elements in humans.

To verify these ORF results, we used an ORF1-trapping method to recover full-length ORF1 sequencesfrom L1 elements in humans and chimpanzees (Ivics andIzsvak 1997). We recovered and sequenced 41 intactORF1 sequences from humans and 51 intact ORF1 se-quences from chimpanzees (see the tab-delimited ASCIIfiles, which can be imported into spreadsheets, of datasets 6 and 7 [online only]) and observed results that werevery similar to those obtained through the computationalmethods described above. None of the intact chimpanzeeORF1 sequences recovered through ORF1 trapping were199% identical to the active human L1 consensus,whereas 17 (41%) of the 41 intact human ORF1 se-quences were 199% identical to the active human L1consensus (data sets 6 and 7 [online only]). Almost allof the intact ORF1 sequences trapped from chimpanzees(48/51; 94%) were L1-PA2 ORF1 sequences (data set 6[online only]). In contrast, only 21 (51%) of the 41 intactORF1 sequences recovered from humans were L1-PA2ORF1 sequences and 18 (44%) were L1-Hs sequences(data set 7 [online only]). Thus, our ORF1-trapping ex-periments confirmed that the most recently active ele-ments in chimpanzees (i.e., those with intact ORFs) con-tained ORF1 sequences that were divergent from theactive L1 consensus in humans (Brouha et al. 2003).

Our method for trapping ORF1 in humans and chim-panzees employed the pbFUS plasmid (Ivics and Izsvak1997). Briefly, full-length ORF1 sequences were amplifiedfrom human and chimpanzee genomic DNA (NA1MR91and NA03448A, respectively [Coriell Cell Repository])using PCR. The PCR primers were identical for hu-mans and chimpanzees, and had the following sequences5′-CCTGATCTGCGGCCGCATGGGGAAAAAACAG-AACAGAAAAACTGG-3′ and 5′-CGTCCGAACGATAT-CCATTTTGGCATGATTTTGCAGCGGCTGG-3′. Weused a combination of human and chimpanzee ORF1sequences to design these primers. The ORF1 sequencesidentified in our chimpanzee BAC experiments werealigned to generate a consensus sequence using ClustalW.This sequence was compared to the human L1 consen-sus, and we determined that the primers chosen wereconserved in human L1 sequences. Finally, we comparedthe candidate primer regions with L1-PA2, L1-PA3, andL1-PA4 elements and determined that the primer se-quences also were completely conserved in these ele-ments. Thus, the primers chosen were capable of am-plifying a wide spectrum of ORF1 sequences in bothhumans and chimpanzees (including L1-Hs, L1-PA2, L1-PA3, and L1-PA4 elements). The NotI and EcoRV re-striction sites that were introduced for cloning purposesare underlined. PCR products were cut with these en-

zymes and ligated to NotI/SmaI-digested pßFUS, suchthat the complete ORF1 sequence would be inframe withthe AUG-less lacZ in the plasmid. Recombinants wereidentified on LB medium containing X-gal (recombi-nants with ORFs were blue, whereas those withoutORFs and the empty vector alone were white). DNAwas prepared and sequenced at Agencourt Biosciencesusing the primers 5′-CCAGTCACGTTGTAAAACGAC-3′ and 5′-CTAGGCCTGTACGGAAGTGTTAC-3′. High-quality sequences were analyzed and assembled usingSequencher version 4.1.2.

In addition to Alu and L1 insertions, we also foundthat SVA elements have been highly active in humans andchimpanzees (table 1). In fact, SVA insertions were al-most as abundant as L1 insertions in humans during thepast ∼6 million years (table 1). SVA is an unusual com-posite element that contains four components: (1) a tan-dem repeat of TCTCCC(n), (2) an unusual Alu elementin reverse orientation, (3) a central variable-number-of-tandem-repeat (VNTR) region that is rich in CpG se-quences, and (4) a SINE-R sequence that was derivedfrom an LTR element (Shen et al. 1994). SVA ends witha poly (A) tail and is flanked by TSDs that closely re-semble the TSDs of Alu and L1 elements (Ostertag etal. 2003; Bennett et al. 2004). SVA recently was foundto be highly polymorphic among humans (Bennett et al.2004), and a few instances have been reported of SVAinsertions causing diseases (Kobayashi et al. 1998; Os-tertag et al. 2003). Our study now provides further evi-dence that SVA has been actively mobile in relativelyrecent primate history and may remain active today.

The ORF1 and ORF2 proteins of L1 elements performa specific retrotransposition mechanism known as “tar-get-primed reverse transcription” (TPRT) (Luan et al.1993), in which L1 mRNAs are copied into cDNAs andintegrated into the genome (reviewed by Ostertag andKazazian 2001). Alu RNAs (and other cellular RNAs)can compete for the L1 machinery during the TPRTprocess, which leads to the retrotransposition of thesealternative RNAs instead of the normal L1 mRNAs (Es-nault et al. 2000; Wei et al. 2001; Dewannieux et al.2003). This “trans” mechanism of retrotransposition isthought to have led to the massive expansion of Alu(Dewannieux et al. 2003) and SVA (Ostertag et al. 2003;Bennett et al. 2004) elements in the human genome.Therefore, if L1 elements are indeed less functional inchimpanzees, as predicted above (table 2), we likewisemight expect to see fewer Alu and SVA insertions in thechimpanzee genome. Table 1 and figure 3 show that thisis, in fact, the case. Since other factors also influence theamplification rates of Alu (and probably SVA) elements,these differences may not be totally caused by lowerlevels of L1 activity in chimpanzees. It is possible, forexample, that humans had a larger number of potentiallyactive Alu and SVA source elements than did chimpan-

Page 7: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

Figure 3 Genomic distributions of transposon insertions. A, Genomic distribution of Alu, L1, SVA, and other elements in the humangenome. B, Genomic distribution of Alu, L1, SVA and other elements in the chimpanzee genome. For both genomes, the number of insertionsin each chromosome is generally proportional to the amount of DNA present. Note that the Y-axis is the same for both charts. Thus, manymore transposon insertions are present throughout the human genome than the chimpanzee genome (compare the number of insertions depictedin panels A and B).

Page 8: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

678 The American Journal of Human Genetics Volume 78 April 2006 www.ajhg.org

Table 3

Transposon Insertions within Genes

Insertions in Genes Human Chimpanzee

Total no. insertions in genes 2,642 990No. of unique genes hit 1,891 828No. of promoters 50 13No. of exons 7 4No. of introns 2,478 973No. of terminators 17 4No. unclassified 90 0No. of insertions per gene:

0 16,901 19,3281 1,457 7042 265 973 99 224 37 35 13 16 9 07 4 08 1 09 5 110 1 0

zees in recent history. However, when combined withour other data demonstrating that chimpanzees (1) havefewer full-length L1 insertions than humans, (2) havefewer L1 elements with intact ORFs than humans, (3)have ORF sequences that are divergent from active hu-man L1 elements, and (4) lack extended subfamilies ofnew insertions, these data collectively indicate that chim-panzees are likely to have supported lower levels of L1activity in recent history compared with humans.

We next examined the genomic distributions of ourrecently mobilized transposon insertions. In both humansand chimpanzees, these insertions generally were distrib-uted according to the amount of DNA that was presenton each chromosome (fig. 3). We also examined the dis-tributions of new insertions relative to genes (table 3and tab-delimited ASCII files, which can be importedinto spreadsheets, of data sets 8 and 9 [online only]).Approximately 34% of the new insertions in both ge-nomes were located within known genes (defined as 3kb upstream to 0.5 kb downstream of a RefSeq gene)(table 3 and data sets 8 and 9 [online only]). Using thesame criteria, we determined that genes occupy ∼34%of the human and chimpanzee genomes (33.5% and34.8%, respectively). Therefore, the fraction of inser-tions in genes was very close to that expected if inte-gration (and mechanisms that subsequently remove in-sertions) had occurred randomly during the past ∼6 mil-lion years.

However, further analysis of these patterns revealedthat they were not, in fact, random. Although we iden-tified insertions in only ∼14% of all human genes, manyof these genes had more than one insertion (table 3 anddata set 9 [online only]). Overall, about a third of thehuman genes with insertions contained multiple inser-tions. Similar results were observed in the chimpanzeegenome (16.5% of the genes with insertions had multipleinsertions). We performed one-sample Z tests with ourinsertions and determined that the observed patternswere not consistent with a random integration model.For example, we observed 16,901 human genes thatlacked new transposon insertions from our collections(table 3). The chance of observing this many human geneswithout such insertions is zero ( ) with a randomP p 0integration model. Similar results were observed with Ztests for the remaining integration classes listed in table3 (data not shown). Therefore, our statistical tests al-lowed us to reject the hypothesis of random integrationwith a very high degree of confidence. On the basis ofthis analysis, it appears that a large fraction of the newtransposon insertions in humans and chimpanzees (themajority of which were Alu, L1, and SVA elements) weretargeted preferentially to specific genes. It is also possi-ble that negative selection eliminated insertions from alarger initial collection over time, and this led to theappearance of nonrandom integration. Although tar-

geted integration of L1 has not been observed previouslyin biochemical or cell culture experiments, previous stud-ies indicate that transposons are eliminated through neg-ative selection (Boissinot et al. 2001). Thus, negativeselection is likely to have played a role in dictating thefinal patterns of transposons observed. Our data alsomay reflect an integration targeting mechanism that isnot functional in cell-culture systems but is active in thegermline of whole organisms, where all of our insertionsoccurred.

Our study indicates that a relatively large number ofinsertions occurred within genes during the evolution ofhumans and chimpanzees (2,642 in humans and 990 inchimpanzees) (table 3). It is likely that at least some ofthese insertions altered the expression of the target genes,perhaps to the extent that mutant phenotypes emerged.Thus, at least some of the insertions might have had animpact on the differential speciation of humans and chim-panzees by influencing the expression of nearby genes.Since humans received at least 4,853 additional trans-poson insertions compared with chimpanzees, the im-pact of transposon mutagenesis was likely to be greatestin humans during the past several million years.

In conclusion, we have determined that the originalset of transposons in the common ancestor of humansand chimpanzees behaved differently during the subse-quent evolution of these organisms. More than 95% ofthe new transposon insertions in both organisms wereAlu, L1, and SVA insertions. However, our data indicatethat humans and chimpanzees have amplified very dif-ferent subfamilies of these elements. Our combined dataalso indicate that chimpanzees have supported lower lev-els of L1 activity than have humans during the past

Page 9: Report Recently Mobilized Transposons in the Human and ...mobilized transposons continue to restructure the hu-man genome through a variety of mechanisms. The completion of a draft

www.ajhg.org The American Journal of Human Genetics Volume 78 April 2006 679

several million years, and this has led to decreased levelsof Alu, L1, and SVA transposition in chimpanzees. Otherfactors, such as differences in population sizes and dif-ferences in population bottlenecks, also are likely to haveinfluenced the final patterns of transposon insertions ob-served in these organisms. In some cases, apparent “in-sertions” may have been caused by the precise deletionof transposon copies through homologous recombina-tion at the TSDs flanking these elements (van de Lage-maat et al. 2005). A fraction of our insertions also mayhave been older polymorphisms that were subject to lin-eage sorting. Thus, the final patterns of transposons inthese genomes are likely to have been shaped not onlyby integration and excision mechanisms but also by thepopulation dynamics of these organisms during the pastseveral million years.

Acknowledgments

We thank the Washington University Genome SequencingCenter and the Whitehead Genome Center for their chimpanzeedraft sequence data. We thank Zoltan Ivics for the pbFUS plas-mid. We thank Shari Corin for critical review of the manuscriptand for helpful advice. We thank Haiyan Wu for help withstatistical analysis. This work was supported by National In-stitutes of Health training grant 2T32GM00849 (to E.A.B. andR.C.I.), a grant from SUN Microsystems (to W.S.P. and S.E.D.),and National Institutes of Health grant 1R01HG002898 (toS.E.D.).

ReferencesBatzer MA, Deininger PL (2002) Alu repeats and human genomic

diversity. Nat Rev Genet 3:370–379Bennett EA, Coleman LE, Tsui C, Pittard WS, Devine SE (2004) Nat-

ural genetic variation caused by transposable elements in humans.Genetics 168:933–951

Boissinot S, Chevret P, Furano A (2000) L1 (LINE-1) retrotransposonevolution and amplification in recent human history. Mol Biol Evol17:915–928

Boissinot S, Entezam A, Furano AV (2001) Selection against deleteriousLINE-1-containing loci in the human lineage. Mol Biol Evol 18:926–935

Brouha B, Schstak J, Badge RM, Lutz-Prigg S, Farbey AH, Moran JV,Kazazian HH (2003) Hot L1s account for the bulk of retrotrans-position in the human population. Proc Natl Acad Sci USA 100:5280–5285

Chimpanzee Sequencing and Analysis Consortium (2005) Initial se-quence of the chimpanzee genome and comparison with the humangenome. Nature 437:69–87

Dewannieux M, Esnault C, Heidmann T (2003) LINE-mediated ret-rotransposition of marked Alu sequences. Nat Genet 35:41–48

Esnault C, Maestre J, Heidmann T (2000) Human LINE retrotran-sposons generate processed pseudogenes. Nat Genet 24:363–367

Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of au-tomated sequencer traces using Phred. I. Accuracy assessment. Ge-nome Res 8:175–185

Hedges DJ, Callinan PA, Cordaux R, Xing J, Barnes E, Batzer MA

(2004) Differential Alu mobilization and polymorphism among thehuman and chimpanzee lineages. Genome Res 14:1068–1075

International Human Genome Sequencing Consortium (2001) Initialsequencing and analysis of the human genome. Nature 409:860–921

Ivics Z, Izsvak Z (1997) Family of plasmid vectors for the expressionof b-galactosidase fusion proteins in eukaryotic cells. Biotechniques22:254–256

Johanning K, Stevenson CA, Oyeniran OO, Gozal YM, Roy-EngelAM, Jurka J, Deininger PL (2003) Potential for retroposition by oldAlu subfamilies. J Mol Evol 56:658–664

Jurka J (2000) Repbase update a database and an electronic journalof repetitive elements. Trends Genet 16:418–420

Kazazian HH, Wong C, Youssoufian H, Scott AF, Phillips DG, An-tonarakis SE (1988) Haemophilia A resulting from de novo insertionof L1 sequences represents a novel mechanism for mutation in man.Nature 332:164–166

Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM,Hausler D (2002) The human genome browser at UCSC. GenomeRes 12:996–1006

Kobayashi K, Nakahori Y, Miyake M, Matsumura K, Kondo-lida E,Nomura Y, Segawa M, Yoshioka M, Saito K, Osawa M, HamanoK, Sakakihara Y, Nonaka I, Nakagome Y, Kanazawa I, NakamuraY, Tokunaga K, Toda T (1998) An ancient retrotransposal insertioncauses Fukuyama-type congenital muscular dystrophy. Nature 394:388–392

Landry JR, Mager DL (2003) Functional analysis of the endogenousretroviral promoter of the human endothelin B receptor gene. J Virol77:7459–466

Luan DD, Korman MH, Jakubczak JL, Eickbush TH (1993) Reversetranscription of R2Bm RNA is primed by a nick at the chromosomaltarget site: a mechanism for non-LTR retrotransposition. Cell 72:595–605

Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Kazazian HH (1996)High frequency retrotransposition in cultured mammalian cells. Cell87:917–927

Ostertag EM, Goodier JL, Zhang Y, Kazazian HH Jr (2003) SVAelements are nonautonomous retrotransposons that cause disease inhumans. Am J Hum Genet 73:1444–1451

Ostertag EM, Kazazian HH (2001) Biology of mammalian L1 retro-transposons. Annu Rev Genet 35:501–538

Shen L, Wu L, Sanlioglu S, Chen R, Mendoza AR, Dangel AW, CarrollMC, Zipf WB, Yu CY (1994) Structure and genetics of the partiallyduplicated gene RP located immediately upstream of the comple-ment C4A and C4B genes in the HLA class III region. J Biol Chem269:8466–8476

van de Lagemaat LN, Gagnier L, Medstrand P, Mager DL (2005) Ge-nomic deletions and precise removal of transposable elements medi-ated by short identical DNA segments in primates. Genome Res 15:1243–1249

Wallace MR, Andersen LB, Saulino AM, Gregory PE, Glover TW,Collins FS (1991) A de novo Alu insertion results in neurofibro-matosis type 1. Nature 353:864–866

Watanabe H, Fujiyama A, Hattori M, Taylor TD, Toyoda A, KurokiY, Noguchi H, et al (2004) DNA sequence and comparative analysisof chimpanzee chromosome 22. Nature 429:382–388

Wei W, Gilbert N, Ooi SL, Lawler JF, Ostertag EM, Kazazian HH,Boeke JD, Moran JV (2001) Human L1 retrotransposition: cis pref-erence versus trans complementation. Mol Cell Biol 21:1429–1439

Yohn CT, Jiang Z, McGrath SD, Hayden KE, Khaitovich P, JohnsonME, Eichler MY, McPherson JD, Zhao S, Paabo S, Eichler EE (2005)Lineage-specific expansions of retroviral insertions within the ge-nomes of African great apes but not humans and orangutans. PLoSBiol 3:e110