New Gene Evolution: Little Did We Know

GE47CH15-Long ARI 31 August 2013 8:56

REVIEWS IN ADVANCE

Manyuan Long,1,2,∗ Nicholas W. VanKuren,1,2 Sidi Chen,3 and Maria D. Vibranovski4

1 Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637
2 Committee on Genetics, Genomics, and Systems Biology, The University of Chicago, Chicago, Illinois 60637
3 Department of Biology and the Koch Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
4 Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil 05508
∗ Corresponding author

Annu. Rev. Genet. 2013. 47:325–51

The Annual Review of Genetics is online at genet.annualreviews.org

This article's doi: 10.1146/annurev-genet-111212-133301

Copyright © 2013 by Annual Reviews. All rights reserved

Review in Advance first posted online on September 13, 2013. (Changes may still occur before final publication online and in print.)

Annu. Rev. Genet. 2013.47. Downloaded from www.annualreviews.org by Monash University on 09/23/13. For personal use only.

Keywords

evolutionary patterns, evolutionary rates, phenotypic evolution, brain evolution, sex dimorphism, gene networks

Abstract

Genes are perpetually added to and deleted from genomes during evolution. Thus, it is important to understand how new genes are formed and how they evolve to be critical components of the genetic systems that determine the biological diversity of life. Two decades of effort have shed light on the process of new gene origination and have contributed to an emerging comprehensive picture of how new genes are added to genomes, ranging from the mechanisms that generate new gene structures to the presence of new genes in different organisms to the rates and patterns of new gene origination and the roles of new genes in phenotypic evolution. We review each of these aspects of new gene evolution, summarizing the main evidence for the origination and importance of new genes in evolution. We highlight findings showing that new genes rapidly change existing genetic systems that govern various molecular, cellular, and phenotypic functions.


BACKGROUND AND HISTORICAL OVERVIEW

Understanding how genes originate and subsequently evolve is crucial to explaining the genetic basis for the origin and evolution of novel phenotypes and, ultimately, biological diversity. Gene origination is thus a widely interesting, yet difficult, problem to study. Perhaps unsurprisingly, the peculiar structures, functions, and evolution of evolutionarily new genes have attracted the interest of pioneers in genetics and evolution since the early twentieth century. Sturtevant (129) was one of the first to identify a duplicated gene, the Bar duplication in Drosophila melanogaster, from which Muller (103) developed the first prevalent model of new gene evolution in 1936. Muller (103, p. 529) predicted that a new duplicate copy of a gene could acquire a novel function and be preserved in the genome, and further that "there remains no reason to doubt the application of the dictum 'all life from pre-existing life' and 'every cell from a pre-existing cell' to the gene: 'every gene from a pre-existing gene.'" This early thinking on single-gene and whole-chromosome duplications (55) was greatly expanded in the 1970s. Ohno (112) further developed Muller's model in 1970, and Gilbert (52) proposed an entirely new model of new gene formation in 1978, whereby pieces of unrelated genes can be recombined into new genes rather than just be strictly duplicated. However, experimental work on new genes did not begin until the early 1990s, when a plausible framework for experimental studies of new gene formation and evolution was proposed: studies must focus on genes that were recently formed because young genes still carry all the signatures of the evolutionary forces that shaped their origination and the evolution of their new structures and functions (83). As genes age, they accumulate mutations that obscure the structural or evolutionary signals from their early history (53, 79). Genes younger than 10–30 million years have not experienced much sequence evolution and thus constitute a valid system in which to investigate the evolution of new genes and to understand their properties. This idea was first manifested in the discovery of jingwei, a three million-year-old gene in two species of African Drosophila (85). Jingwei revealed several interesting features of new gene evolution that are now known to be general: (a) recombination of existing genes, leading to a hybrid gene structure; (b) rapid sequence evolution driven by positive selection; and (c) acquisition of new biochemical functions (150, 162).

Today, it is clear that new gene origination is a general process in evolution and that species-specific or lineage-specific genes exist in many, if not all, organisms. Gigantic databases of genomic sequences from thousands of species reveal that genomes contain huge numbers and a large diversity of protein-coding genes. For example, the plant Glycine max genome encodes more than 50,000 protein-coding genes, whereas the bacterial genome of Candidatus Hodgkinia cicadicola contains only 189 genes. In addition, the abundance and diversity of non-protein-coding genes is only now beginning to be realized. Even genomes with similar gene numbers can have very different, unrelated genes. These recent data reveal a widespread process of birth and death of genes in organisms in which new genes enter the genome and old genes are lost. What mechanisms and forces dictate gene birth and death? Specifically, how are new genes and novel functions added to genomes?

In the two decades since the discovery of jingwei, there have been several hundred additional publications reporting various interesting and significant observations of new genes and new gene functions in many different organisms. Regrettably, we can only choose a few representative publications to sketch several lines of observation that can provide insight into an emerging, global picture of new gene evolution. We follow the growth of scientific information and underlying ideas and concepts in new gene evolution, beginning by discussing the methods for identifying new genes and mechanistic processes of new gene formation. We then describe the rates and patterns of new gene origination and evolution that may indicate some rules governing these processes and discuss the evolutionary forces that act on new genes. Finally, we review the rapid growth of studies of the phenotypic effects of new genes and their impact on phenotypic evolution.

Fixation: the population genetic process by which a mutation spreads to all individuals in a population

Monophyletic group: a group of taxa that share a common ancestor

THE CONCEPT OF NEW GENE ORIGINATION

To understand various basic properties of new gene evolution, we need to have some conception of the process of new gene origination and an operational definition for the process. This definition helps us explore methods for new gene identification.

The Process of New Gene Origination

New gene origination is a microevolutionary process. A protogene structure is first generated by a mutation in a single germ-cell genome. This protogene structure must then spread through the population until it is fixed. Various evolutionary forces, such as natural selection and genetic drift, govern the spread of the protogene through the population, thus making protogene fixation a population genetic process. Both before and after fixation, the protogene accumulates mutations that confer on it new structures and beneficial, sometimes novel, functions that are acted on by natural selection. From the point that the protogene carries an optimized function and is fixed in the genome, it is essentially the same as most other, older genes in the genome and can be considered a new gene. New gene studies typically focus on these first two stages (the fixation process and acquisition of a beneficial function) and the consequences of accepted mutations on the sequence, structure, and function of the new gene. As the last section of this review shows, these microevolutionary changes produce macroevolutionary changes in traits such as development and brain function.
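The fixation stage described above can be illustrated with a small simulation. The sketch below assumes a textbook Wright-Fisher model (constant population size, non-overlapping generations, selection acting on gene copies), which is a simplification chosen here for illustration and not a model taken from the studies reviewed. The function names, the population size, and the selection coefficient `s` are all hypothetical.

```python
import random


def protogene_fixes(pop_size, s=0.0, rng=random):
    """Follow one new protogene in a Wright-Fisher population.

    pop_size: number of diploid individuals (2 * pop_size gene copies).
    s: selective advantage of the protogene (0.0 means pure genetic drift).
    Returns True if the protogene reaches fixation, False if it is lost.
    """
    copies = 2 * pop_size
    count = 1  # the protogene starts as a single germ-line mutation
    while 0 < count < copies:
        p = count / copies
        # Selection shifts the expected frequency before random sampling.
        p_sel = p * (1 + s) / (1 + p * s)
        # Drift: each copy in the next generation is drawn independently.
        count = sum(rng.random() < p_sel for _ in range(copies))
    return count == copies


def fixation_fraction(pop_size, s=0.0, trials=2000, seed=1):
    """Fraction of independent protogenes that fix (seeded for repeatability)."""
    rng = random.Random(seed)
    fixed = sum(protogene_fixes(pop_size, s, rng) for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    print("neutral:", fixation_fraction(20, s=0.0))
    print("beneficial:", fixation_fraction(20, s=0.2))
```

Under these assumptions a neutral protogene should fix in roughly 1 of every 2N trials (here about 1 in 40), whereas a beneficial one fixes far more often, which is the sense in which both drift and selection govern the spread of a protogene.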

Interest in new gene origination has raised several general questions. What molecular mechanisms generate new gene structures? What are the evolutionary forces that drive the origination of new genes? How often are new genes fixed in a species? Are there any rules or patterns of new gene origination? What are the roles of new genes in phenotypic evolution? This review provides an overview of efforts to answer these questions.

Approaches to Identifying New Genes

All new gene identification methods are based on comparative analysis of the structures of genes and genomes. Within a group of closely related species, we can define new genes as those that are present in all members of a monophyletic group but absent from all outgroup species (Figure 1). Early studies often serendipitously identified new genes by analyzing the phylogenetic distribution of genomic DNA Southern blot signals or via characterization of small genomic regions (e.g., 85, 108). Microarrays (42, 44, 45) and especially next-generation sequencing (168, 169) have made recent searches for new genes more purposeful efforts.

Multiple genomes. Syntenic alignments (Figure 1) of genomes can be used to identify new genes from related species for which we know the phylogenetic relationships. Syntenic alignments of each gene in each species allow identification of genes that are present or absent in one genome relative to another (Figure 1). In these comparisons, a gene can be defined as a new gene candidate if it is present in a certain clade or single species and absent in all outgroup species (Figure 1). Additionally, the orthologous genes that flank the new gene candidate appear in all species under consideration. This strategy has been used with great success in Drosophila and mammals (35, 168, 169, 172). New genes formed by different mechanisms also have correspondingly different structural features that can be used to infer the mechanism of new gene formation and the ancestral and derived characters.
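The presence/absence criterion above can be written down as a short filter. This is a minimal sketch that assumes ortholog presence and gene order have already been derived from syntenic alignments; the data structures, function names, and toy data (modeled on the S1–S4 and G1–G3 labels of Figure 1a) are illustrative assumptions, not the pipeline used in the cited studies.

```python
def new_gene_candidates(presence, clade, outgroups):
    """Return genes present in every clade member and absent from every outgroup.

    presence: dict mapping gene name -> set of species carrying an ortholog.
    clade: set of species forming the monophyletic group of interest.
    outgroups: set of species outside that group.
    """
    candidates = []
    for gene, species in presence.items():
        in_all_ingroup = clade <= species          # subset test
        in_no_outgroup = not (species & outgroups)  # empty intersection
        if in_all_ingroup and in_no_outgroup:
            candidates.append(gene)
    return sorted(candidates)


def flanks_conserved(gene_order, left, right, all_species):
    """Check that both flanking genes appear in every species' gene order."""
    return all(left in gene_order[sp] and right in gene_order[sp]
               for sp in all_species)


# Toy data mirroring Figure 1a: G2 is present in S1-S3 and absent in S4.
presence = {
    "G1": {"S1", "S2", "S3", "S4"},
    "G2": {"S1", "S2", "S3"},
    "G3": {"S1", "S2", "S3", "S4"},
}
gene_order = {
    "S1": ["G1", "G2", "G3"],
    "S2": ["G1", "G2", "G3"],
    "S3": ["G1", "G2", "G3"],
    "S4": ["G1", "G3"],
}
clade, outgroups = {"S1", "S2", "S3"}, {"S4"}
hits = new_gene_candidates(presence, clade, outgroups)
print(hits, flanks_conserved(gene_order, "G1", "G3", clade | outgroups))
```

In practice the hard part is building a reliable `presence` table, because, as noted elsewhere in this section, annotation gaps can make a gene look absent when it is not.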

[Figure 1: panel a, species tree for S1–S4 with genes G1, G2, and G3; panel b, syntenic alignment of the Cdic, Sdic, and AnnX region in D. simulans, D. mauritiana, D. melanogaster, and D. yakuba]

Figure 1
New genes are defined using syntenic and sequence comparisons between the genomes of a group of related species. (a) The general procedure to identify new genes. The relationship of species S1–S4 is shown by the blue tree. The relationships between the genes G1 (yellow), G2 (red), and G3 (green) are shown within the species tree. Aligning the genomes of species S1–S4 shows that the new gene G2 is present in S1–S3 but absent in S4, indicating that G2 arose in the common ancestor of S1–S3. G2 was thus generated in the genome between old genes G1 and G3 in the common ancestor of S1, S2, and S3 (red star). (b) An example of using syntenic alignments to identify new genes. Sdic exists only in Drosophila melanogaster (110, 160). In this case, Sdic originated as a chimeric gene through recombination of duplicates of the two flanking genes, a 5′ piece of Cdic encoding a cytoplasmic dynein intermediate chain and a 3′ piece of AnnX.

Single genomes. Duplicate genes within a single genome can be identified using exhaustive pairwise comparisons between all annotated genes in that genome. Most mechanisms to form new gene structures (see below) result in certain structural changes in the new gene. For example, new genes created by RNA-based duplication (retrogenes) most often lack introns, contain a stretch of adenine nucleotides at their 3′ end, and contain a pair of short flanking direct repeats. These signals fade with evolutionary time. Betrán et al. (11), Bai et al. (4), and Meisel et al. (100) took advantage of these new structures to identify new retrogenes in fruit flies; Wang et al. (147) in silkworm; and Emerson et al. (43), Marques et al. (92), and Vinckenbosch et al. (144) in primates and specifically humans. Divergence between the new retrogene and the original gene from which the retrogene was derived can be used to define the age of the new genes using a molecular clock. However, both strategies that we have discussed so far can depend on the current annotations, which are biased against the newest genes, so caution must be taken when making claims about the presence/absence of genes in different genomes (167).
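The molecular clock dating mentioned above reduces to simple arithmetic once a divergence estimate is in hand. The sketch below assumes a strict clock and uses the Jukes-Cantor correction as a stand-in for whatever distance measure a real study would use (synonymous divergence is a common choice). The function names and the substitution rate in the usage line are hypothetical placeholders, not values from the cited work.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected substitutions per site for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns that contain a gap in either sequence.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned, ungapped sites to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def duplicate_age_years(distance, rate_per_site_per_year):
    """Age of a duplication under a strict molecular clock.

    Both copies accumulate changes independently after the duplication,
    hence the factor of two in the denominator.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate_per_site_per_year)


if __name__ == "__main__":
    # Hypothetical inputs: 0.06 substitutions per site, 1e-8 per site per year.
    print(duplicate_age_years(0.06, 1e-8))
```

Because the structural signals of retrogenes fade with time, an age estimated this way is most trustworthy for young copies, which is also where the clock assumption is least strained.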

Predicting functionality of new genes. It is desirable to predict whether candidate new genes are functional before beginning more laborious functional and phenotypic analyses. Comparisons of open reading frame length, transcription of new gene candidates, and substitution rates between nonsynonymous and synonymous sites (Ka versus Ks) and polymorphism and divergence (60, 97) are often used to predict whether the new gene is functional. A Ka/Ks ratio significantly lower than one (for single genome data, Ka/Ks < 0.5 in a comparison between the new gene and its parental copy), for example, indicates functional constraint acting on the new gene, which we would expect if disruptive mutations were being prevented from accumulating in new protein-coding genes by natural selection. These methods are widely used as the first step to predict if a new gene is likely functional (e.g., 4, 11, 43, 147, 168, 169).
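The two Ka/Ks cutoffs quoted above can be expressed as a small screening helper. The sketch assumes Ka and Ks have already been estimated by an external tool, and it checks only the threshold, not statistical significance, which a real analysis would also need. The rationale encoded in `pairwise_ratio` (a fully constrained parental copy paired with a neutrally evolving new copy averages out to about 0.5) is a common interpretation that is assumed here rather than stated in the text; all names are illustrative.

```python
def pairwise_ratio(omega_parent, omega_new):
    """Expected pairwise Ka/Ks if both copies share the same Ks since duplication.

    omega_parent, omega_new: per-lineage Ka/Ks of the parental and new copies.
    """
    return (omega_parent + omega_new) / 2


def looks_constrained(ka, ks, compared_with_parent):
    """Threshold-only screen for functional constraint on a new gene.

    compared_with_parent: True when Ka/Ks comes from a new-gene versus
    parental-copy comparison within one genome (cutoff 0.5); False when it
    comes from orthologous copies of the new gene (cutoff 1.0).
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    cutoff = 0.5 if compared_with_parent else 1.0
    return ka / ks < cutoff


if __name__ == "__main__":
    # A neutral new copy paired with a fully constrained parent averages 0.5.
    print(pairwise_ratio(0.0, 1.0))
    print(looks_constrained(0.02, 0.10, compared_with_parent=True))
```

A ratio below the cutoff is only a first-pass indication; very young genes often have too few substitutions for the ratio to be estimated with any confidence.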

MECHANISMS TO FORM NEW GENE STRUCTURES

How are new gene structures formed? Mutation toward a new gene structure is the first step of new gene evolution, and at least a dozen distinct molecular processes are known that contribute to the formation of new genes. These mechanisms are covered in depth elsewhere (65, 84), so we only briefly touch on them here. We highlight several examples in Figure 2.


Pseudogenes: genes that are thought to have lost their ability to code for a full-length protein

Gene Duplication

Gene duplication is thought to contribute most to the generation of new genes. A single (or a few) new gene structure(s) can be formed at one time by DNA-based duplication (the copying and pasting of a DNA sequence from one genomic region to another) or retroposition. Although DNA-based duplications are often tandem (134), retroposed genes most often move to a new genomic environment (14, 15, 65, 172), where they must acquire new regulatory elements or risk becoming processed pseudogenes. An important gene duplication mechanism is whole-genome duplication (WGD), which has occurred multiple times in eukaryote evolution, particularly in plants (126). Hundreds to thousands of duplicate genes are formed by a WGD event, and the vast majority of duplicates are quickly lost. However, estimates of duplicate gene retention after WGDs in teleost fishes (∼15% after 350 million years) (16), yeast (∼12% after 80 million years) (68), and Arabidopsis (∼30% after 80 million years) (13) all suggest that large fractions of duplicated loci can be retained. We show below that there are a variety of ways that new gene structures can subsequently acquire new functions (2, 33, 61, 78, 158, 170). McLysaght et al. (98) showed that WGD may more easily generate new paralogs.

Alteration of Existing Gene Structures

New gene structures can be generated by modifying existing genes, domains, or exons. Gilbert (52) proposed that exons and domains could be recombined to produce new chimeric gene structures (Figure 2a,b). Chimeric proteins formed by gene recombination have been found in many organisms since their discovery in the LDL receptor gene (86, 130), including yeast (133), Drosophila (85, 118, 119), Caenorhabditis elegans (67), mammals (92), and plants (151), and are estimated to have contributed ∼19% of new exons in eukaryotes (see Reference 74 and references therein). In addition, retroposed sequences may jump into or near existing genes and recruit existing exons, or be recruited into an existing coding sequence (164). Conversely, new gene structures may be formed by splitting existing genes. Wang et al. (149), for example, found that gene duplication is an intermediate stage in an evolutionary process leading to gene fission (Figure 2c). Okamura et al. (113) demonstrated that frameshift mutations often generate new coding sequences and found 470 human gene duplicates that had done so. Xue et al. (157) found that the Epstein-Barr virus contains an early gene that undergoes frequent frameshifts, probably to combat host immunity. In addition, divergence in alternative splicing patterns between duplicate genes can generate distinct transcripts that produce noncoding RNAs or polypeptides with slightly or entirely different functions and rapidly alter duplicate gene structures and functions (51, 57, 69, 163, 173).

De Novo Genes

New gene structures may arise from previously noncoding DNA (Figure 2d). Chen et al. (24) were the first to show that antifreeze proteins, which bind and halt the growth of ice crystals in the blood of some polar fishes, were created by amplification of previously noncoding microsatellite DNA. Since then, a number of de novo genes originating from noncoding regions have been identified in Drosophila (6, 26, 75, 168, 172), humans (71, 153, 155, 169), primates (137), murine rodents (104), protozoa (159), yeast (17, 21), rice (154), and viruses (122). Similar to strict de novo gene origination, horizontal gene transfer (HGT), the exchange of genes between genomes from distantly related taxa, can immediately add new genes and functions to a genome (Figure 2f). HGT is a major mechanism for the addition of new genes to prokaryotic genomes (73, 111) but has also been reported in a number of eukaryotic organisms, including plants (8, 161), insects (102), and fungi (56).

Noncoding RNAs

Not all new genes code for proteins. Noncoding RNAs were found to play an important role in neuronal functions in the early 1990s (136). A large number of functional RNAs from noncoding regions have been reported to play vital roles in a wide variety of organisms (7, 80). MicroRNAs appear to turn over rapidly, but can be strongly influenced by positive selection (89, 90, 109). Strikingly, Dai et al. (34) showed that a new long noncoding RNA influences courtship behavior in D. melanogaster. Pseudogenes are conventionally thought of as dead genes that play no functional roles (41), but they may evolve functions in regulating expression of related genes. Zheng & Gerstein (171) recently found that many mammalian pseudogenes are transcribed and thus may still function. McCarrey & Riggs (96) predicted that pseudogenes may regulate their parental genes, similar to long noncoding RNAs or miRNAs. An explicit mechanistic model of the use of pseudogene transcripts as decoys for cross-regulating expression of target genes was proposed and tested by Marques et al. (93, 94).

New Gene Regulatory Systems

New genes must acquire a specific transcription regulatory system to ensure certain temporal and spatial expression patterns. Betrán et al. (10) investigated the origin of the male-specific expression of Dntf-2r, a retroposed gene in the D. melanogaster–Drosophila simulans clade. The new retrogene did not contain the parental promoter but had acquired a new β2-tubulin-like promoter by recruiting a novel 5′ regulatory sequence. This regulatory sequence drives testis-specific expression of β2-tubulin and appears to still do so for Dntf-2r. In addition, the new retrogene Xcbp1 recruited existing neuron promoters present at its site of integration (29). This co-opted mode of promoter recruitment is also observed in human retrogenes (144) and may be a general mode for retrogene promoter gain (65). Additionally, Ni et al. (107) observed that eight new genes essential for Drosophila development evolved binding sites for the CCCTC binding factor (CTCF) insulator under positive selection, ensuring the delineation of the regulatory domains of these genes.

Transposable Elements

Transposable elements (TEs) can contribute to functional divergence between duplicate genes through several methods, all similar to those described above (12). For instance, TEs can mediate gene recombination by carrying coding sequences from one part of the genome to another (63, 158) and can even themselves be incorporated into existing coding sequences (46, 88, 106). In addition, TEs were recently found to be a source of microRNAs, which are major components of posttranscriptional regulation of expression (116).

Although we still have a developing picture of the contributions of each of these mechanisms for new gene formation in different taxa, work in humans and Drosophila suggests that ∼80% of genes are formed by DNA-based duplication, 5% to 10% by de novo origination, and ∼10% by retroposition (168, 169). And although these mechanisms may generate the initial gene structures, many new structures (in a large variety of taxa) undergo radical structural renovation to change exon-intron structure and even recruit new or existing coding sequences into the new locus (30, 49, 151, 172).

Figure 2
Representative new genes exhibiting various new gene origination mechanisms. (a) Jingwei, a new gene found only in Drosophila teissieri and Drosophila yakuba, was generated by a combination of retroposition, DNA-based duplication, and gene recombination, which formed a chimeric gene consisting of an Adh-derived enzymatic domain and a hydrophobic domain from Ymp (85, 150). (b) PIPSL in humans is a consequence of gene fusion between two adjacent ancestral genes by read-through transcription and subsequent coretroposition (164). (c) Gene fission split the ancestral gene monkeyking into two distinct genes in Drosophila mauritiana, revealing an intermediate process of gene fission aided by gene duplication and complementary degeneration (149). (d) The gene ENSMUSG00000078384 in mouse revealed the evolutionary process of de novo gene origination (104). Red boxes are ancestral stop codons (TGA) with two triangles showing the positions of the enabling mutations, including a substitution and a deletion. (e) Two new genes in humans, DAF and mNSCI, were generated by domesticating transposable elements, Alu, and short interspersed elements (B1–B4) (91, 106). DAF and Alu elements together make an interesting case in which alternative splicing generated a new isoform in the mammalian genome. (f) Horizontal gene transfer (HGT) is prevalent in bacteria with mechanisms such as homologous recombination (111). Antibiotic resistance genes can be acquired by host genomes containing the intI gene (which encodes integrase), a recombination site (att), and a promoter to express the captured gene, as depicted by the process shown in the three panels on the left.

Evolution of Transcription Units

Beyond the origination and evolution of the macrostructure of genes described above, it was recently found that the transcription units in the genes of vertebrates have been directionally evolving toward productive transcription. Almada et al. (1) reported a highly significant linear correlation between gene age and the critical signals that define transcription units in a gene, including the U1 small nuclear ribonucleoprotein recognition sites and polyadenylation sites (PASs). The observed incremental gain of the U1 sites and gradual loss of PASs in the 5′ end of protein-coding genes revealed a selection for a U1-PAS axis for productive transcription.

ABUNDANCE AND ORIGINATION RATES OF NEW GENES

The advent of whole-genome sequences for many organisms allowed identification of many new DNA-based and RNA-based duplicate genes (e.g., 11, 43). With more genome sequences available, especially in closely related groups such as the twelve Drosophila species (32), it became possible to investigate the rates of new gene origination in particular lineages. We review these findings in Drosophila, mammals, and plants. There have been no reports of new gene origination rates for mechanisms other than DNA-based duplication, RNA-based duplication, de novo origination, and gene recombination. Thus, the rates of new gene origination we highlight should be viewed as serious underestimates.

Drosophila

The first estimate of the rate of new gene origination was made for retrogenes in Drosophila in 2002 by Betrán et al. (11), who identified ∼150 retrogenes in D. melanogaster (4, 11) that arose after the divergence of the Drosophila and Sophophora subgenera approximately 50 Mya. Their estimate of three new retrogenes per million years in the lineage leading to D. melanogaster was corroborated by an independent estimation of ∼1.5 new retrogenes per million years based on cDNA hybridization against salivary polytene chromosomes in species in the D. melanogaster subgroup (∼25 million years old) (158). Zhou et al. (172) computationally estimated the rate of new gene origination by DNA-based duplication, retroposition, de novo origination, and gene recombination in the D. melanogaster subgroup to be 5–11 new genes per million years and found different rates for the four mechanisms. In particular, approximately 80% of new genes added to the D. melanogaster lineage genome were generated by DNA-based duplication. More extensive and detailed analyses of DNA-based and RNA-based duplicates were conducted by Vibranovski et al. (142), Meisel et al. (100), and Zhang et al. (168). Zhang et al. (168) analyzed the 12 Drosophila genomes and estimated that ∼17 duplicate genes per million years arose in the Drosophila genome. Figure 3a shows the distribution of these new genes on the Drosophila phylogeny.
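The per-lineage rates quoted above are averages of the form "new genes divided by elapsed time". The sketch below makes that arithmetic explicit using the approximate retrogene figures from this paragraph; it assumes a constant rate along the lineage, which is a simplification, and the function name is hypothetical.

```python
def origination_rate(new_genes, span_million_years):
    """Average number of new genes fixed per million years along a lineage."""
    if span_million_years <= 0:
        raise ValueError("time span must be positive")
    return new_genes / span_million_years


# About 150 retrogenes accumulated over roughly 50 million years.
retrogene_rate = origination_rate(150, 50)
print(retrogene_rate)  # 3.0 new retrogenes per million years
```

Because such counts include only genes that survived and remain detectable, and cover only some origination mechanisms, a rate computed this way is best read as a lower bound.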

Mammals

Emerson et al. (43) and Marques et al. (92) identified ∼120 retrogenes in the human genome, yielding an estimated retrogene origination rate of one retrogene per million years in the lineage leading to humans. Zhang et al. (166, 169) systematically identified new genes in vertebrates, especially in primates, and showed that the rates of new gene origination are variable in different evolutionary stages of vertebrates (Figure 3b), although 25–30 genes generated de novo and by DNA-based and RNA-based duplication arise per million years. Interestingly, this rate is much higher on the branches closer to human (66 new genes per million years in the human lineage alone) (166).

[Figure 3 graphic not reproduced. Panel a, Drosophila: phylogeny of D. melanogaster, D. sechellia, D. simulans, D. yakuba, D. erecta, D. ananassae, D. persimilis, D. pseudoobscura, D. willistoni, D. grimshawi, D. mojavensis, and D. virilis, with numbered branches and a time axis in Mya. Panel b, Vertebrates: phylogeny of human, chimp, orangutan, rhesus, marmoset, mouse, guinea pig, dog, cow, armadillo, tenrec, opossum, platypus, chicken, lizard, frog, fugu, and zebrafish, with numbered branches and a time axis in Mya. Per-branch gene counts are not reliably recoverable from the extracted text.]

Figure 3
The phylogenetic distribution of new gene origination events in (a) Drosophila and (b) vertebrates. These genes were generated by DNA-based duplication, retroposition, and de novo origination (168, 169). The number of new genes that originated in each time period is shown above the branch. For example, in a, branch 1 shows that 220 genes originated between 36 and 41 Mya in Drosophila. In b, red numbers are new genes that originated in the hominoid branches or specifically in humans.

Plants

In contrast to flies and mammals, Zhang et al. (165) reported that 0.6 retrogenes per million years arose in the Arabidopsis thaliana genome, a rate comparable to Populus (174), and a microarray-based study in Arabidopsis identified 94 new genes created by DNA-based duplication and retroposition (45). Surprisingly, Wang et al. (151) found that a very high rate of retrogene and chimeric gene origination was present in rice: More than 1,000 retrogenes were identified in the rice genome, 380 of which evolved chimeric gene structures by recruiting previously existing genes into their gene structures. These authors determined the rate of chimeric gene origination to be 7 per million years in grass genomes in the lineage leading to rice, 50 times the origination rate of chimeric genes in humans (144), and the highest rate of chimeric gene origination known. In addition, Jiang et al. (63) identified more than 3,000 gene recombinants in rice mediated by Pack-Mutator-like transposable elements (Pack-MULEs). These results suggest a huge potential for protein diversity in plant genomes.

Along with these extensive studies in Drosophila, mammals, and plants, there have been many valuable investigations of chimeric genes and retrogenes in Caenorhabditis elegans (66), fish (25, 49), silkworm (147), and chicken (62).

Copy Number Variation

Inexpensive whole-genome analysis has also made it possible to identify genes at the very earliest stages of their evolution, before fixation. Abundant copy number variation (CNV) of individual genes has been detected in Drosophila (40, 42, 124), humans (47), mouse (54), and C. elegans (81). Dopman & Hartl (40), Emerson et al. (42), Cardoso-Moreira & Long (20), and Cardoso-Moreira et al. (19) identified more than 1,000 partial and 100 complete gene duplications/deletions in just 15 strains of D. melanogaster relative to the reference genome using microarray hybridization. In addition, next-generation sequencing and microarrays have identified more than 1,200 partial and 600 complete gene duplications/deletions in 179 individual human genomes relative to the reference genome (101, 125). The recent sequencing of 43 genomes in two D. melanogaster populations detected more CNVs, including 2,588 duplications and 3,336 deletions relative to the reference genome (74). The large number of new genes segregating in populations is just now beginning to be appreciated and investigated further. An active area of research will be to perform functional and statistical analyses of these new genes to understand their earliest stages of evolution. In all, these studies have shown that new gene origination rates can differ between taxa, yet are appreciable in all groups studied. These results further strengthen the conclusion that new gene origination is a general evolutionary process.

PATTERNS OF NEW GENE ORIGINATION

Gene Traffic in Drosophila, Humans, and Other Organisms

With the large number of new genes identified in various organisms, researchers were able to investigate statistical patterns of new gene characteristics to explore the mechanistic and evolutionary forces that impact the formation, origination, and evolution of new genes. Betran et al. (11) examined the chromosomal distribution of retrogenes and their parental copies in D. melanogaster (Figure 4a). Surprisingly, these authors found a significant excess of autosomal retrogenes derived from X-linked parental genes (X→A) and a significant deficiency of retrogenes formed in the opposite direction (A→X) or between autosomes (A→A). Bai et al. (4) further revealed that retrogenes derived from autosomal parental copies tend to locate to the same chromosome as the parental copies. However, 42 out of the 43 retrogenes exhibited X→A movement; only one retrogene moved X→X. These two observations clearly reveal a striking pattern of new gene origination in flies: Retrogenes derived from X-linked genes prefer to copy into autosomes. This directional movement of new genes is called gene traffic (43). These results hold in the 12 sequenced species of Drosophila (100, 142) and in Anopheles gambiae (5, 138). Interestingly, 90% of X→A retrogenes in D. melanogaster are expressed in testis, a significantly higher proportion of testis-expressed genes than average (11), suggesting that the retrogene’s function (in this case, male-beneficial function) can influence its relocation. The symmetric pattern was observed in silkworm, which has ZW sex determination (females are ZW and males ZZ), whereby genes retroposed from Z→A tend to be ovary expressed (147). Gene traffic appears to be general in Drosophila for different mechanisms of new gene formation, as Vibranovski et al. (142) also showed that new genes created by DNA-based duplication exhibit the same X→A movement and testis expression. Moreover, the neo-X chromosome, an autosomal chromosome arm that fused to the ancestral X chromosome during the evolution of the Drosophila genus, also shows the same excess of gene traffic (100, 142).

[Figure 4 graphic not reproduced. Arrow diagrams of retrogene movement between the X chromosome and the autosomes in (a) Drosophila and (b) humans, annotated with the percent excess or deficiency of each movement and with the labels "Excess male-biased functions" and, in b, "Excess non-sex and female functions". The assignment of percentages to individual arrows is not recoverable from the extracted text.]

Figure 4
Retrogene traffic in (a) Drosophila (11, 142) and (b) humans (43). Each arrow indicates the movement of retrogenes from the parental gene chromosomal location to the retrogene’s location. The size of the arrow indicates the intensity of gene movement between chromosomes, and the percentages show quantitatively the excess of movement over the null expectation (random origination and insertion). The functions of the retrogenes are indicated.

Relative to Drosophila, human and mouse studies revealed similar yet distinct patterns of gene traffic (43). Compared with a neutral expectation based on the chromosomal distribution of processed pseudogenes, which are expected to be evolving neutrally, there is an excess of X→A retrogene movement and most X→A retrogenes exhibit testis expression. However, there is also a significant excess of A→X retrogene movement in human, and these A→X retrogenes exhibit either female expression or unbiased expression. A→A movement is very low in humans (43). The mouse genome shows a very similar pattern. Zhang et al. (166, 168) have shown that these patterns exist for DNA-based duplicates, retrogenes, and de novo genes in Drosophila, humans, and mouse.
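The excess or deficiency of gene traffic in each direction is expressed as a percent deviation of the observed number of retrogenes from a null expectation. The sketch below illustrates only that calculation. The counts are hypothetical and are not taken from refs. 11 or 43, and deriving the expected counts (for example, from random insertion or from the distribution of processed pseudogenes) is assumed to have been done elsewhere.

```python
def percent_excess(observed: int, expected: float) -> float:
    """Percent deviation of an observed count from its null expectation."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 100.0 * (observed - expected) / expected


# Hypothetical counts for illustration only (not data from the cited studies).
observed = {"X->A": 30, "A->X": 10, "A->A": 60}
expected = {"X->A": 15.0, "A->X": 15.0, "A->A": 70.0}

for direction in observed:
    excess = percent_excess(observed[direction], expected[direction])
    print(f"{direction}: {excess:+.0f}% relative to the null expectation")
```

A positive value indicates more movement than the null predicts, and a negative value indicates a deficiency. Whether a deviation is statistically significant requires a separate test.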

Consequences of Gene Traffic for Genome Evolution

If gene traffic has been historically important for genome evolution, the majority of testis-biased/male-biased genes should be autosomal, contrary to the previous conclusion that the X was a hotbed for male-biased genes (148). Several microarray-based studies of male-biased genes and their chromosome locations by Ranz et al. (117) and Parisi et al. (114) in Drosophila, Khil et al. (70) in mouse, and later by Zhang et al. (166) in Drosophila, humans, and mouse have confirmed this prediction. In Drosophila, Zhang et al. (168) showed a smooth transition of new male-biased genes from X linkage to autosomal linkage over evolutionary time.

MSCI model: X chromosome inactivation during spermatogenesis favors relocation of genes involved in spermatogenesis to autosomes

Models to Interpret the Causes of Gene Traffic

In general, models to explain gene traffic, and experimental evaluation of those models, show that natural selection is a major force governing gene traffic but that mutational processes likely also play a role (38). Meiotic sex chromosome inactivation (MSCI) in the male germ line (11, 43, 139, 140), dosage compensation in the heterogametic sex (3, 143), sexual antagonism between male- and female-beneficial genes (22, 128), and meiotic drive (131, 132) have all been implicated in driving gene traffic. The relative role of each of these forces has been hotly debated. MSCI has a strong effect in mammals (70), and experimental evidence for MSCI in Drosophila comes from several studies (59, 139, 140). Vibranovski et al. (139) showed that genes that are highly expressed in the meiotic phase of spermatogenesis (when the X chromosome is predicted to be inactivated) are significantly enriched on the autosomes. Conversely, genes expressed in the mitotic phases of spermatogenesis are randomly distributed throughout the genome. Other studies suggest reduced expression throughout spermatogenesis, including in the spermatogonia, which also discredits dosage compensation models (99; however, see 141). A clear-cut single-cell transcriptome is needed to clarify these issues. Along with the MSCI model, other non-germ-line-based models, e.g., sexual antagonism, are also necessary to interpret the expression of new genes in the male somatic cells, although these models need to be rigorously experimentally tested.

Correlation Between Gene Age and Expression

Early studies revealed a connection between the expression and the ages of new genes. Betran & Long (10) showed that Dntf-2r, a ∼10 million-year-old gene in the D. melanogaster subgroup, is expressed only in testis; however, its parent Dntf-2 is expressed ubiquitously. Almost all retrogenes in Drosophila appear to have testis expression (4) and to have maintained testis-biased or testis-specific expression independent of age (50). Vinckenbosch et al. (144) showed that new human retrogenes are often transcribed in testis and later evolve stronger and more diverse spatial expression patterns, coining the “out of the testis” hypothesis. Whether or not the testis is the starting point for new genes, a general survey of the expression patterns for new genes that originated within vertebrates revealed a strong positive correlation of both transcription intensity and spatial expression with gene age (167). It is possible that this testis-biased pattern of retrogene expression is due to our inability to detect genes expressed at low levels in different tissues, but this issue should be resolved soon with advances in next-generation sequencing.

EVOLUTIONARY FORCES ACTING ON NEW GENES

Evolutionary forces, such as natural selection and genetic drift, operate on both facets of new gene evolution: the fixation of new gene loci and their acquisition of a beneficial function. These two facets may overlap. In this section, we discuss theoretical models developed to describe how new genes arise and acquire novel functions as well as general approaches to studying new genes and the selective forces that act on them.


Neofunctionalization: the process by which a new gene acquires a novel function

Selective Models of New Gene Evolution

Muller (103) was among the first to recognize the potential importance of duplicate genes in evolution. He proposed a simple model whereby new duplicate genes could acquire novel, beneficial functions distinct from those of the original copies. Ohno (112) elaborated on Muller’s model and named the fate Muller described as neofunctionalization. However, Ohno also predicted that duplicate genes are most often inactivated and become pseudogenes. This classic model assumes that the new gene is functional upon duplication and that the new gene subsequently acquires mutations that provide a novel beneficial function. The novel function is then preserved in the genome by natural selection.

However, strictly duplicate genes are redundant, and beneficial mutations are extremely rare. How do new duplicate genes remain in the population long enough to accumulate a beneficial, selected mutation(s)? This problem led to the development of models that predict selective preservation of both copies at all stages of their evolution: adaptive radiation (AR), innovation-amplification-divergence (IAD), and escape from adaptive conflict (EAC). The AR model proposes that gene duplication itself is favored, e.g., for increased dosage of a gene product, and that the new duplicates then undergo functional radiation (48). Thus, AR posits that novel functions are acquired after duplication. IAD and EAC, in contrast, propose that ancestral loci develop novel beneficial secondary functions before duplication (9, 36). Under IAD, repeated gene duplication is favored to increase the dosage of the novel secondary function. Different duplicates are then free to optimize the ancestral or novel secondary function, and only the two best copies are retained in the genome. The increase in the number of duplicate genes within the AR and IAD models also provides additional targets for beneficial mutations, thus increasing the probability and speed of functional improvement. EAC predicts that the bifunctional ancestral gene is subject to selection before gene duplication, that adaptive conflict between the ancestral function and the new function constrains improvement of the selected function(s) before duplication, and that adaptive changes and functional improvement occur in the daughter genes after duplication.

For additional information on duplicate gene evolution, see Conant & Wolfe (33), who suggest that preservation of new genes stems from the co-option of existing functions to serve new purposes, and Walsh (145, 146), who gives a detailed mathematical description of the models and relative probabilities of neofunctionalization and pseudogenization.

Examples of EAC (36), IAD (105), and AR (48) have been published, and each model has specific predictions for what we should observe if a new gene originated by each process (33). However, none of these models can be used as a statistical framework for rigorously testing the roles of evolutionary forces in new gene origination. Classic molecular population genetic tests based on nucleotide substitution patterns and allele frequency spectra do provide this framework and have been used extensively to detect selection on new genes. These tests, such as the M-K (McDonald-Kreitman) test (97) and the HKA (Hudson, Kreitman, and Aguade) test (60), detect elevated rates of amino acid substitutions (M-K) or reduced effective population size (HKA) at loci. In addition, Thornton (135) introduced a coalescent-based model that can be used to test for selection on CNV. The HKA test and Thornton’s test compare measurements of nucleotide variation in genes with a distribution of parameter values derived from neutral coalescent simulations. Thus, the M-K, HKA, and Thornton’s tests are used to test the classic model. Each of these five models (classic, AR, IAD, EAC, and statistical) predicts that new genes should experience strong natural selection after they are formed. We now discuss some of the evidence indicating that this often appears to be the case.
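The M-K test compares counts of nonsynonymous and synonymous changes that are fixed between species with those that are polymorphic within a species, arranged as a 2×2 table. As a minimal sketch, and assuming a one-sided Fisher's exact test is an acceptable way to evaluate the table, the calculation can be written with the standard library alone. The function name and the example counts are hypothetical and do not come from any cited study.

```python
from math import comb


def mk_test_one_sided(dn: int, ds: int, pn: int, ps: int) -> float:
    """One-sided Fisher's exact P-value for an excess of fixed
    nonsynonymous differences in the McDonald-Kreitman table.

                      fixed   polymorphic
    nonsynonymous      dn         pn
    synonymous         ds         ps
    """
    nonsyn = dn + pn            # row total: all nonsynonymous changes
    fixed = dn + ds             # column total: all fixed differences
    total = dn + ds + pn + ps
    if total == 0:
        raise ValueError("the table is empty")
    # Hypergeometric tail: probability of observing dn or more nonsynonymous
    # changes among the fixed differences, with all margins held constant.
    tail = sum(
        comb(nonsyn, k) * comb(total - nonsyn, fixed - k)
        for k in range(dn, min(nonsyn, fixed) + 1)
    )
    return tail / comb(total, fixed)


# Hypothetical counts for illustration only.
p_value = mk_test_one_sided(dn=20, ds=5, pn=5, ps=20)
print(f"one-sided P = {p_value:.2e}")
```

A small P-value indicates more fixed amino acid changes than expected from the polymorphism data, the pattern that the text describes as an elevated rate of amino acid substitution. Published analyses may instead use a two-sided test or a G-test.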


[Figure 5 graphic not reproduced. Panel a: counts of fixed and polymorphic retrogenes originating on the autosomes versus the X, shown with their parental genes, for D. simulans and D. melanogaster and for chimpanzee and humans. Panel b: Adh and jingwei in D. teissieri and D. yakuba after retroposition, with ratios on the branches. Panel c: trees of Adh and the Adh-derived chimeric genes Jingwei, Adh-Finnegan, Siren, and Adh-twain, annotated with KA and KS values. Numeric annotations and the assignment of genes to clades are not reliably recoverable from the extracted text.]

Figure 5
Positive Darwinian selection acting on new genes. (a) Positive selection for the fixation of new retrogenes in Drosophila (124) and humans (123). The numerator and denominator show the numbers of retrogenes that originate on the autosomes and the X, respectively. Tests based on the M-K framework indicate an excess of fixed X→A retrogenes in both species and strong positive selection for X→A retrogene movement. (b) The jingwei (jgw) gene in Drosophila (85). The ratios over the branches are the numbers of nonsynonymous changes over the numbers of synonymous changes, and the ratios in the triangles are the ratios of divergence between the species and the polymorphisms. M-K tests and Ka/Ks ratios indicate strong positive selection acted on jgw shortly after it originated. (c) Selection acted on all Adh-derived chimeric genes in Drosophila (64), as indicated by elevated Ka/Ks ratios.

Fixation of New Genes Within Species and Populations

The first study to identify signatures of selection on a new gene journeying to fixation was performed by Llopart et al. (82), who analyzed a new variant of the jingwei gene in Drosophila teissieri, which lost its second intron. This D. teissieri–specific intron presence-absence polymorphism exhibits a significant excess of rare alleles and patterns of nucleotide polymorphism that are consistent with moderate natural selection driving the polymorphism to fixation. Selection has also been detected on CNV in D. melanogaster and other organisms. Emerson et al. (42) found a genome-wide pattern consistent with strong purifying selection on all CNV except duplications of whole genes. That is, single-gene duplications are under significantly weaker purifying selection than partial gene duplications or partial or complete gene deletions. Similarly, Schrider et al. (123, 124) showed a significant excess of fixed versus polymorphic retrogene CNV originating from the X chromosome in both Drosophila and humans, indicating that natural selection governs the patterns of retrogene CNV evolution (Figure 5a).


Overall, these studies show that natural selection can play a key role in driving new genes to fixation. In addition, they highlight the use of classic population genetic tests in determining whether selection acts on new genes during their journeys to fixation.

Selection on Sequence Changes in New Genes

In addition to studies of the evolutionary forces governing the fixation of new genes, many studies have investigated the effects of selection and drift on new gene sequences. Long & Langley (85) showed that the new chimeric gene jingwei in D. teissieri and Drosophila yakuba contains a significant excess of nonsynonymous substitutions compared with nonsynonymous polymorphisms (relative to the ratio of synonymous substitutions to polymorphisms), indicating that amino acid substitutions were rapidly driven to fixation shortly after the origination of jingwei (Figure 5b). Similarly, Nurminsky et al. (110) showed that a D. melanogaster–specific gene family, Sdic, involved in sperm motility rapidly acquired a new exon-intron structure and testis-specific expression (Figure 1). Sdic is a chimeric gene composed of a 5′ piece of Cdic, encoding a cytoplasmic dynein intermediate chain, and a 3′ piece of AnnX, a phospholipid binding protein. This fusion protein underwent rapid structural renovations, including the conversion of a Cdic intron into an exon and an AnnX exon and Cdic intron into a testis-specific promoter. Low levels of sequence polymorphism, preservation of coding potential, and the absence of Sdic in other closely related species suggest that Sdic was rapidly swept to fixation.

These first discoveries sparked searches for general evolutionary patterns in new genes. Jones & Begun (64) searched for common patterns in the evolution of three Adh-derived chimeric genes in different lineages of Drosophila. All three new genes quickly accumulated a large number of amino acid replacement substitutions, several at identical amino acid sites, in the Adh-derived region shortly after they arose. Strikingly, Jones & Begun (64) and Shih & Jones (127) showed that different Adh-derived fusion genes often accumulate mutations at the same sites, regardless of to which other gene they have fused (Figure 5c). In addition, each of the four Adh-derived fusion genes exhibits strong signals of accelerated amino acid substitution using classic population genetic statistical tests (e.g., M-K test).

Some of these observations have recently been borne out by genome-wide studies. Xu et al. (156) surveyed structural differences between more than 600 paralogous pairs of genes in plants and found that most new genes underwent radical changes in exon/intron content and boundaries as well as insertion/deletions. And using molecular population genetic tests, Chen et al. (30) found that young genes in D. melanogaster show strong signals of selection. These authors predicted that ∼25% of amino acid substitutions in young essential genes were fixed by natural selection. In addition, this signal of selection diminishes as genes grow older. Altogether these studies indicate that there are general patterns to new gene evolution: New genes often undergo rapid (or immediate) structural and sequence renovations and expression pattern changes that are driven by strong natural selection.
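A proportion of adaptive amino acid substitutions, such as the ∼25% quoted above, can be estimated from M-K-type counts. One widely used estimator is α = 1 − (Ds·Pn)/(Dn·Ps), where Dn and Ds are fixed nonsynonymous and synonymous differences and Pn and Ps are the corresponding polymorphisms. The sketch below assumes this estimator for illustration. It may not be the exact procedure used in ref. 30, and the counts are hypothetical, chosen only so that the result is 0.25.

```python
def alpha_from_mk_counts(dn: int, ds: int, pn: int, ps: int) -> float:
    """Estimated fraction of amino acid substitutions fixed by positive
    selection, alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be nonzero for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical pooled counts for illustration only.
alpha = alpha_from_mk_counts(dn=40, ds=60, pn=30, ps=60)
print(f"alpha = {alpha:.2f}")
```

Estimates of this kind are sensitive to slightly deleterious polymorphisms, which inflate Pn and bias α downward, so the value is usually interpreted with caution.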

Analysis of New Gene Structure and Function

In addition to analyses of new gene frequencies and nucleotide changes, many groups have investigated the evolutionary forces acting on new genes by analyzing new gene functions, genomic locations, or expression patterns. This complementary approach has revealed several fundamental patterns of new gene origination. Chen et al. (24) and Cheng & Chen (31), for example, investigated the antifreeze proteins found in the blood of several orders of Arctic and Antarctic fish. These proteins independently evolved in the different orders, yet they consist of nearly identical tripeptide repeats. These tripeptide repeats were generated de novo by amplification of short nucleotide sequences. These studies showed that similar environmental pressures may favor the generation of genes with similar functions.

In addition, as we showed in the previous section, testis-biased genes are underrepresented on the D. melanogaster and mammalian X chromosome. Diaz-Castillo & Ranz’s (38) analysis of the genomic location of genes relative to the position of chromosome domains during spermatogenesis led the authors to alternatively propose that the enrichment of testis-biased retrogenes on the autosomes is caused by an increased availability during spermatogenesis of open chromatin domains that contain testis-expressed genes. This larger target for retrogene integration allows a higher proportion of these retrogenes to acquire testis-biased expression. These general observations of the location of sex-biased genes, and their general movement off of the X chromosome, indicate that differences in expression alone can dictate where in the genome new genes originate. Together, these results show that studies of general patterns of extant gene locations, structures, and expressions can be informative of new gene origination and evolution.

PHENOTYPIC EFFECTS OF NEW GENES

Studying the roles of new genes in phenotypic evolution recently became feasible with the advent of sophisticated genetic tools and molecular techniques as well as significant progress in related areas of important phenotypes in biology. Young genes are often assumed to be dispensable because important functions are thought to require a long evolutionary period to be developed and optimized (76). However, studies in the past decade have found numerous young genes with important, and sometimes essential, functions at the molecular, cellular, and individual level (27).

Biochemical Pathways

New genes can generate new biochemical pathways and products if they are enzymes or become enzymes. Zhang et al. (162) showed that jingwei evolved the capacity to catalyze breakdown of long-chain alcohols in D. yakuba and D. teissieri, whereas the parent Adh can only act on short-chain alcohols. In Arabidopsis, Weng et al. (152) and Matsuno et al. (95) demonstrated that three recently evolved new duplicate genes from the P-450 family, Cyp98A9, Cyp98A8, and Cyp84A4, assembled two new biochemical pathways related to phenolic metabolism that are required for pollen development and α-pyrone synthesis.

Gene Expression Networks

[Figure 6 graphic not reproduced. Panel a: yeast physical interaction network with nodes labeled by gene name (for example, DID4, ABP1, SLA1, SLA2, LSB3, and YSC84). Panel b: protein-protein interaction network with new genes as hubs. Panel c: sequence logos of the Caf40 and Zeus DNA-binding motifs and their nucleic acid binding grooves, with the annotations "4–6 Mya after generating Zeus through retroposition", "107 amino acid substitutions in Zeus", and "Zeus has created 193 new gene links and kept only 30% (129) of ancestral links of caf40".]

Figure 6
New genes integrated into gene networks and reshaped those networks. (a) New yeast genes that originated through duplication-based (blue) and non-duplication-based (red) mechanisms since the recent whole-genome duplication (<100 Mya) were integrated into the physical interaction network (18). The orange box highlights a module composed of two new genes involved in the pathway to form and process actin. DID4 (green box) interacts with 13 new genes within a few steps. (b) New genes form hubs in protein-protein interaction networks (30). (c) The Drosophila melanogaster–Drosophila simulans-specific gene Zeus quickly accumulated more than 100 amino acid substitutions in its nucleotide-binding domains under positive selection. Consequently, it evolved into a new DNA-binding motif that evolved hundreds of new gene links to rewire the gene networks that control reproduction (28).

New genes can also be quickly integrated into existing gene networks. Chen et al. (30) observed that almost all young essential genes have been assimilated into protein-protein physical interaction networks in Drosophila, and a significant number of these young genes have developed multiple interactions with old genes (Figure 6). Integration appears to be driven by natural selection. Several new genes have become new hubs. Analysis of one new gene, Zeus, derived from the DNA-binding protein Caf40 via retroposition (28), revealed that it retained ∼30% of Caf40’s DNA-binding sites. However, in a short evolutionary period (4–6 million years) Zeus acquired 193 new binding sites through which it activates or represses hundreds of downstream genes involved in reproduction. This observation indicates that gene expression networks can be rapidly and globally reshaped in evolution by new genes. Li et al. (77) showed that a de novo gene in yeast can suppress a previously existing mating type–control pathway, thus rewiring the structure of gene networks in the species. Capra et al. (18) revealed that new genes in yeast become more integrated into cellular networks over time. The modified networks are not necessarily novel or unimportant, either: Konikoff et al. (72) found that genes have been continually added and removed from the Wnt and TGF-β signaling pathways, ancient networks involved in animal development.
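The comparison of Zeus with its parent Caf40 amounts to set arithmetic on the two genes' binding targets: links that are retained, links that are gained, and links that are lost. The sketch below shows only that bookkeeping. The function name and the gene identifiers are made up, and the toy sets are not the actual Caf40 or Zeus targets reported in ref. 28.

```python
def compare_binding_targets(parent: set[str], derived: set[str]) -> dict[str, float]:
    """Summarize how a derived gene's target set differs from its parent's."""
    if not parent:
        raise ValueError("parent target set must not be empty")
    retained = parent & derived   # ancestral links still bound by the new gene
    gained = derived - parent     # links found only for the new gene
    lost = parent - derived       # ancestral links no longer bound
    return {
        "retained": len(retained),
        "gained": len(gained),
        "lost": len(lost),
        "retained_fraction": len(retained) / len(parent),
    }


# Toy target sets with hypothetical identifiers (not real binding data).
parent_targets = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"}
derived_targets = {"g1", "g2", "g3", "g11", "g12", "g13", "g14"}

summary = compare_binding_targets(parent_targets, derived_targets)
print(summary)
```

With real data, the inputs would be the sets of genes or genomic sites bound by each protein, and the retained fraction would correspond to the ∼30% figure quoted in the text.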

Development

Surprisingly, new genes can quickly acquire essential roles in development. Chen et al. (30) identified 59 genes that originated in the past ∼35 million years in Drosophila and evolved essential developmental functions. Silencing expression of these young genes causes developmental failure in early to late pupae and, in some cases, at even earlier stages (Figure 7a,b). Furthermore, tissue-specific knockdown of these young genes can cause morphological defects in adult flies. Silencing new genes can also have a critical effect on reproduction, even when the individual can complete development. The duplicate gene nsr (novel spermatogenesis regulator) exists only in the four species of the D. melanogaster clade that diverged 3 Mya, yet it evolved an essential function required for sperm individualization (39). Similarly, silencing Zeus, a gene in the same group of Drosophila, causes sterility by disrupting testis and sperm development (28).

Recent work on Umbrea, a 12–15 million-year-old gene in Drosophila, carefully dissected the evolutionary steps this young gene took to becoming essential in D. melanogaster (121). Umbrea arose by DNA-based duplication of heterochromatin protein 6 (HP6) 12–15 Mya. Subsequent loss of one of its two domains (the chromodomain) and the accumulation of protein coding changes in the remaining chromoshadow domain gave Umbrea a distinct chromatin localization pattern at the centromere. Umbrea appears to have become essential only after it lost the chromodomain 5–7 Mya. Careful molecular dissection, ancestral protein resurrection, and population genetic analyses are the keys to understanding the processes and time new genes take to acquire important roles in organisms.

Brain Evolution in Flies and Humans

Chen et al. (29) investigated the expression patterns of new genes in Drosophila and found that approximately five new genes per million years evolved brain expression patterns, mostly in structures involved in olfaction and learning/memory. All new brain genes are expressed in the α/β lobe, an evolutionarily new set of neurons, implicating new genes in the evolution of this brain structure. Some of the new brain genes have significant effects on behavior. For example, Xcbp1 and Desr influence foraging behaviors (29), and sphinx influences courtship behaviors (34). The frequent acquisition of new brain genes by the genome and the behavioral phenotypes of some of these genes suggest rapid evolution of behaviors, which is consistent with the remarkable observations of Rollmann et al. (120), who detected great variation in the olfactory behavioral response associated with odorant receptor gene duplicates within the natural population of D. melanogaster. The incorporation of new genes into the brain is not specific to Drosophila. Zhang et al. (166) found a correlation between new genes and brain evolution in the human lineage. A high proportion of hominoid-specific and human-specific genes are expressed in the prefrontal cortex and temporal lobe, the newest brain structures, in early fetal development. Strikingly, 54 of 380 human-specific genes are expressed in these two brain regions, which are critical for proper cognitive function. One of these genes, SRGAP2, is involved in neocortical development (23, 37).
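To put the human-specific gene count above on a proportional scale, the short sketch below converts the reported 54 of 380 genes into a percentage. The variable names are illustrative only; the two counts are the ones quoted in the text.

```python
# Counts quoted in the text for human-specific genes.
expressed_in_regions = 54   # expressed in prefrontal cortex / temporal lobe
human_specific_total = 380  # all human-specific genes considered

fraction = expressed_in_regions / human_specific_total
print(f"{fraction:.1%} of human-specific genes")
```

This is simple arithmetic rather than a statistical test; judging whether the proportion is unusually high would require a background expectation, which the sketch does not attempt to supply.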


Figure 7
The essential effects of new genes on development. (a) Development was terminated at the final stage when three different genes were knocked down using RNA interference (RNAi). (b) YLL1 originated in the common ancestor of the Drosophila melanogaster subgroup species ∼6–10 Mya, yet showed lethal effects in the pupal stage when silenced by RNAi, mutated by EMS, or disrupted by the P element (30).

Panel a labels (image not reproduced): CG6289, 9 Mya, early larva; G32376, 18 Mya, pharate; G13463, 30 Mya, early pupa.

Panel b labels (image not reproduced): YLL1, gene duplication, CG7627; species D. ananassae, D. erecta, D. yakuba, D. teissieri, D. simulans, D. mauritiana, and D. melanogaster; time axis 0–8 Mya.

Panel b table (line: method/mutation; phenotype):
11343: P-element insertion; lethal, pupal stage
SH2-0995: EMS/G717S; lethal, pupal stage
SH2-1101: EMS/T765I; lethal, pupal stage
SH2-0504: EMS/synonymous; viable
V39539: RNAi/constitutive Gal4; lethal, pupal stage
V39540: RNAi/constitutive Gal4; lethal, pupal stage

Sexual Dimorphism and Sexual Reproduction

New genes impact sexual dimorphism by participating in the genetic systems that control sexual reproduction and sex determination (87). As the aforementioned patterns of new gene origination show, the vast majority of new genes are sex-biased, especially male-biased, and their origination processes show directional copying between the sex chromosomes and autosomes (e.g., 11, 43). A number of new genes have been identified with various phenotypic effects, including testicular descent in therian mammals (RLN3) (115), testis size in mouse (noncoding RNA gene, Poldi) (58), sperm competition in D. melanogaster (Sdic) (160), and spermatogenesis in Drosophila (nsr) (39).

The ability of new genes to be incorporated into such conserved pathways, networks, and developmental programs warrants considerable further study. What specific roles can new genes play, and what characteristics of new genes enable them to become essential components of these processes so quickly? New genes now appear to be potent drivers of phenotypic evolution and of the genetic control of important biological processes, and they show that organismal development and organ development have evolved species-specific and lineage-specific components. Understanding the evolution and modification of these components through the incorporation of new genes is crucial to further research.

CHALLENGES FOR THE FUTURE

It is apparent that we have just a glimpse of the emerging world of new genes and that these genes play crucial roles in the rapid evolution of the genetic systems that govern biological diversity. Questions about new gene evolution have opened many doors both to our understanding of existing diversity and to new research. For example, most studies have examined new genes generated from a few mechanisms, e.g., duplication and de novo origination, leaving open a vast array of mechanisms to be investigated. Continued efforts will be invaluable for understanding the abundance of new genes, the mechanisms that have been neglected so far, and even new gene evolution in nonmodel organisms. An outstanding challenge is to understand the roles of new genes in the evolution and biology of phenotypes, and the studies we have highlighted have left important, unresolved questions to be answered. For example, what evolutionary forces drive gene traffic? How do new genes evolve essential developmental functions, and how quickly? How is CNV driven to fixation, and when do CNVs acquire novel functions? How are important structures, such as the human brain, able to incorporate new gene functions, and how do new genes contribute to novel cognitive function? Future studies of more diverse phenotypes will help shed light on the general patterns and modes of new gene evolution and on the influence of new genes on evolving systems. In addition, understanding how phenotypes rapidly evolve will require a deep understanding of the underlying local and global gene networks. This will be a tremendous challenge, ranging from the experimental deciphering and graphic description of the gene networks to a valid comparative analysis of the ancestral and derived networks shaped by new genes and eventually to the causal relationship of the altered networks with the evolution of phenotypes.

DISCLOSURE STATEMENT

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS

We thank all members of the Manyuan Long lab, past and present, for their scientific contribution to the relevant topics discussed in this review. We also thank the NIH, the NSF, and the Packard Foundation as well as the late Edna K. Papazian for their support of the study of new genes throughout the past fifteen years as we explored this new and exciting area. M.L. is currently supported by NIH grants 1R01GM100768-01A1, NSF1051826, and NSF1026200; N.W.V. by the NSF Graduate Research Fellowship and partially by the NIH genetics training grant T32 GM007197; S.C. by the NSF Doctoral Dissertation Improvement Grant DEB-1110607; and M.D.V. by a Pew Latin American Postdoctoral Fellowship.

LITERATURE CITED

1. Almada AE, Wu X, Kriz AJ, Burge CB, Sharp PA. 2013. Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature 499:360–63
2. Arguello JR, Chen Y, Yang S, Wang W, Long M. 2006. Origination of an X-linked testes chimeric gene by illegitimate recombination in Drosophila. PLoS Genet. 2(5):e77
3. Bachtrog D, Toda NRT, Lockton S. 2010. Dosage compensation and demasculinization of X chromosomes in Drosophila. Curr. Biol. 20(16):1476–81
4. Bai Y, Casola C, Feschotte C, Betran E. 2007. Comparative genomics reveals a constant rate of origination and convergent acquisition of functional retrogenes in Drosophila. Genome Biol. 8(1):R11.1–1.9
5. Baker DA, Russell S. 2011. Role of testis-specific gene expression in sex-chromosome evolution of Anopheles gambiae. Genetics 189(3):1117–20
6. Begun DJ, Lindfors HA, Kern AD, Jones CD. 2007. Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics 176(2):1131–37
7. Berezikov E. 2011. Evolution of microRNA diversity and regulation in animals. Nat. Rev. Genet. 12(12):846–60
8. Bergthorsson U, Adams KL, Thomason B, Palmer JD. 2003. Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature 424(6945):197–201
9. Bergthorsson U, Andersson DI, Roth JR. 2007. Ohno’s dilemma: evolution of new genes under continuous selection. Proc. Natl. Acad. Sci. USA 104(43):17004–9
10. Betran E, Long M. 2003. Dntf-2r, a young Drosophila retroposed gene with specific male expression under positive Darwinian selection. Genetics 164(3):977–88
11. Betran E, Thornton K, Long M. 2002. Retroposed new genes out of the X in Drosophila. Genome Res. 12:1854–59
12. Bohne A, Brunet F, Galiana-Arnoux D, Schultheis C, Volff J-N. 2008. Transposable elements as drivers of genomic and biological diversity in vertebrates. Chromosome Res. 16(1):203–15
13. Bowers JE, Chapman BA, Rong J. 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422:433–38
14. Brosius J. 1991. Retroposons: seeds of evolution. Science 251(4995):753
15. Brosius J. 2003. The contribution of RNAs and retroposition to evolutionary novelties. Genetica 118(2–3):99–116
16. Brunet FG, Crollius HR, Paris M, Aury J-M, Gibert P, et al. 2006. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol. Biol. Evol. 23(9):1808–16
17. Cai J, Zhao R, Jiang H, Wang W. 2008. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics 179(1):487–96
18. Capra JA, Pollard KS, Singh M. 2010. Novel genes exhibit distinct patterns of function acquisition and network integration. Genome Biol. 11(12):R127
19. Cardoso-Moreira M, Emerson JJ, Clark AG, Long M. 2011. Drosophila duplication hotspots are associated with late-replicating regions of the genome. PLoS Genet. 7(11):e1002340
20. Cardoso-Moreira M, Long M. 2010. Mutational bias shaping fly copy number variation: implications for genome evolution. Trends Genet. 26(6):243–47
21. Carvunis A-R, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, et al. 2012. Proto-genes and de novo gene birth. Nature 487(7407):370–74
22. Charlesworth B, Coyne JA, Barton NH. 1987. The relative rates of evolution of sex chromosomes and autosomes. Am. Nat. 130(1):113–46
23. Charrier C, Joshi K, Coutinho-Budd J, Kim J-E, Lambert N, et al. 2012. Inhibition of SRGAP2 function by its human-specific paralogs induces neoteny during spine maturation. Cell 149(4):923–35


24. Chen L, DeVries AL, Cheng CH. 1997. Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. Proc. Natl. Acad. Sci. USA 94(8):3811–16
25. Chen M, Zou M, Fu B, Li X, Vibranovski MD, et al. 2011. Evolutionary patterns of RNA-based duplication in non-mammalian chordates. PLoS ONE 6(7):e21466
26. Chen S-T, Cheng H-C, Barbash DA, Yang H-P. 2007. Evolution of hydra, a recently evolved testis-expressed gene with nine alternative first exons in Drosophila melanogaster. PLoS Genet. 3(7):e107
27. Chen S, Krinsky BH, Long M. 2013. New genes as drivers of phenotypic evolution. Nat. Rev. Genet. In press
28. Chen S, Ni X, Krinsky BH, Zhang YE, Vibranovski MD, et al. 2012. Reshaping of global gene expression networks and sex-biased gene expression by integration of a young gene. EMBO J. 31(12):2798–809
29. Chen S, Spletter M, Ni X, White KP, Luo L, Long M. 2012. Frequent recent origination of brain genes shaped the evolution of foraging behavior in Drosophila. Cell Rep. 1(2):118–32
30. Chen S, Zhang YE, Long M. 2010. New genes in Drosophila quickly become essential. Science 330(6011):1682–85
31. Cheng C-HC, Chen L. 1999. Evolution of an antifreeze glycoprotein. Nature 401:443–44
32. Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, et al. 2007. Evolution of genes and genomes on the Drosophila phylogeny. Nature 450(7167):203–18
33. Conant GC, Wolfe KH. 2008. Turning a hobby into a job: how duplicated genes find new functions. Nat. Rev. Genet. 9(12):938–50
34. Dai H, Chen Y, Chen S, Mao Q, Kennedy D, et al. 2008. The evolution of courtship behaviors through the origination of a new gene in Drosophila. Proc. Natl. Acad. Sci. USA 105(21):7478–83
35. Demuth JP, De Bie T, Stajich JE, Cristianini N, Hahn MW. 2006. The evolution of mammalian gene families. PLoS ONE 1(1):e85
36. Deng C, Cheng C-HC, Ye H, He X, Chen L. 2010. Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict. Proc. Natl. Acad. Sci. USA 107(50):21593–98
37. Dennis MY, Nuttle X, Sudmant PH, Antonacci F, Graves TA, et al. 2012. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication. Cell 149(4):912–22
38. Díaz-Castillo C, Ranz JM. 2012. Nuclear chromosome dynamics in the Drosophila male germ line contribute to the nonrandom genomic distribution of retrogenes. Mol. Biol. Evol. 29(9):2105–8
39. Ding Y, Zhao L, Yang S, Jiang Y, Chen Y, et al. 2010. A young Drosophila duplicate gene plays essential roles in spermatogenesis by regulating several y-linked male fertility genes. PLoS Genet. 6(12):e1001255
40. Dopman EB, Hartl DL. 2007. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 104(50):19920–25
41. Duret L, Chureau C, Samain S, Weissenbach J, Avner P. 2006. The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science 312(5780):1653–55
42. Emerson JJ, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science 320(5883):1629–31
43. Emerson JJ, Kaessmann H, Betran E, Long M. 2004. Extensive gene traffic on the mammalian X chromosome. Science 303(5657):537–40
44. Fan C, Chen Y, Long M. 2008. Recurrent tandem gene duplication gave rise to functionally divergent genes in Drosophila. Mol. Biol. Evol. 25(7):1451–58
45. Fan C, Vibranovski MD, Chen Y, Long M. 2007. A microarray based genomic hybridization method for identification of new genes in plants: case analyses of Arabidopsis and Oryza. J. Integr. Plant Biol. 49(6):915–26
46. Feschotte C. 2008. Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 9(5):397–405
47. Feuk L, Carson AR, Scherer SW. 2006. Structural variation in the human genome. Nat. Rev. Genet. 7(2):85–97
48. Francino MP. 2005. An adaptive radiation model for the origin of new gene functions. Nat. Genet. 37(6):573–77
49. Fu B, Chen M, Zou M, Long M, He S. 2010. The rapid generation of chimerical genes expanding protein diversity in zebrafish. BMC Genomics 11(1):657


50. Gallach M, Chandrasekaran C, Betran E. 2010. Analyses of nuclearly encoded mitochondrial genes suggest gene duplication as a mechanism for resolving intralocus sexually antagonistic conflict in Drosophila. Genome Biol. Evol. 2:835–50
51. Gardiner A, Barker D, Butlin RK, Jordan WC, Ritchie MG. 2008. Evolution of a complex locus: exon gain, loss and divergence at the Gr39a locus in Drosophila. PLoS ONE 3(1):e1513
52. Gilbert W. 1978. Why genes in pieces? Nature 271:501
53. Gillespie J. 1987. Molecular evolution and the neutral allele theory. In Oxford Surveys in Evolutionary Biology, Vol. 4, ed. P Harvey, L Partridge, pp. 10–37. New York: Oxford Univ. Press
54. Graubert TA, Cahan P, Edwin D, Selzer RR, Richmond TA, et al. 2007. A high-resolution map of segmental DNA copy number variation in the mouse genome. PLoS Genet. 3(1):e3
55. Haldane J. 1932. The time of action of genes, and its bearing on some evolutionary problems. Am. Nat. 66(702):5–24
56. Hall C, Brachat S, Dietrich FS. 2005. Contribution of horizontal gene transfer to the evolution of Saccharomyces cerevisiae. Eukaryot. Cell 4(6):1102–15
57. Harr B, Turner LM. 2010. Genome-wide analysis of alternative splicing evolution among Mus subspecies. Mol. Ecol. 19:228–39
58. Heinen TJAJ, Staubach F, Haming D, Tautz D. 2009. Emergence of a new gene from an intergenic region. Curr. Biol. 19(18):1527–31
59. Hense W, Baines JF, Parsch J. 2007. X chromosome inactivation during Drosophila spermatogenesis. PLoS Biol. 5(10):e273
60. Hudson RR, Kreitman M, Aguade M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–59
61. Innan H, Kondrashov F. 2010. The evolution of gene duplications: classifying and distinguishing between models. Nat. Rev. Genet. 11(2):97–108
62. Int. Chicken Genome Seq. Consort. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432(7018):695–716
63. Jiang N, Bao Z, Zhang X, Eddy SR, Wessler SR. 2004. Pack-MULE transposable elements mediate gene evolution in plants. Nature 431:569–73
64. Jones CD, Begun DJ. 2005. Parallel evolution of chimeric fusion genes. Proc. Natl. Acad. Sci. USA 102(32):11373–78
65. Kaessmann H, Vinckenbosch N, Long M. 2009. RNA-based gene duplication: mechanistic and evolutionary insights. Nat. Rev. Genet. 10(1):19–31
66. Katju V, Lynch M. 2003. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome. Genetics 165(4):1793–803
67. Katju V, Lynch M. 2006. On the formation of novel genes by duplication in the Caenorhabditis elegans genome. Mol. Biol. Evol. 23(5):1056–67
68. Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–24
69. Keren H, Lev-Maor G, Ast G. 2010. Alternative splicing and evolution: diversification, exon definition and function. Nat. Rev. Genet. 11(5):345–55
70. Khil PP, Smirnova NA, Romanienko PJ, Camerini-Otero RD. 2004. The mouse X chromosome is enriched for sex-biased genes not subject to selection by meiotic sex chromosome inactivation. Nat. Genet. 36(6):642–46
71. Knowles DG, McLysaght A. 2009. Recent de novo origin of human protein-coding genes. Genome Res. 19(10):1752–59
72. Konikoff CE, Wisotzkey RG, Stinchfield MJ, Newfeld SJ. 2010. Distinct molecular evolutionary mechanisms underlie the functional diversification of the Wnt and TGF-β signaling pathways. J. Mol. Evol. 70:303–12
73. Koonin EV, Makarova KS, Aravind L. 2001. Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55:709–42
74. Langley CH, Stevens K, Cardeno C, Lee Y, Schrider DR, et al. 2012. Genomic variation in natural populations of Drosophila melanogaster. Genetics 192(2):533–98


75. Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ. 2006. Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc. Natl. Acad. Sci. USA 103(26):9935–39
76. Lewin B, Krebs JE, Goldstein ES, Kilpatrick ST. 2011. Lewin’s Genes X. Sudbury, MA: Jones and Bartlett Publ.
77. Li D, Dong Y, Jiang Y, Jiang H, Cai J, Wang W. 2010. A de novo originated gene depresses budding yeast mating pathway and is repressed by the protein encoded by its antisense strand. Cell Res. 20(4):408–20
78. Li WH, Gojobori T. 1983. Rapid evolution of goat and sheep globin genes following gene duplication. Mol. Biol. Evol. 1(1):94–108
79. Li W-H. 1997. Molecular Evolution. Sunderland, MA: Sinauer Assoc.
80. Li Z, Liu M, Zhang L, Zhang W, Gao G, et al. 2009. Detection of intergenic non-coding RNAs expressed in the main developmental stages in Drosophila melanogaster. Nucleic Acids Res. 37(13):4308–14
81. Lipinski KJ, Farslow JC, Fitzpatrick KA, Lynch M, Katju V, Bergthorsson U. 2011. High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21(4):306–10
82. Llopart A, Comeron JM, Brunet G, Lachaise D, Long M. 2002. Intron presence: absence polymorphism in Drosophila. Proc. Natl. Acad. Sci. USA 99(12):8121–26
83. Long M. 1992. The origin and evolutionary mechanisms of new genes. PhD Diss. Univ. Calif., Davis. 139 pp.
84. Long M, Betran E, Thornton K, Wang W. 2003. The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4(11):865–75
85. Long M, Langley CH. 1993. Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science 260(5104):91–95
86. Long M, Rosenberg C, Gilbert W. 1995. Intron phase correlations and the evolution of the intron/exon structure of genes. Proc. Natl. Acad. Sci. USA 92(26):12495–99
87. Long M, Vibranovski MD, Zhang YE. 2012. Evolutionary interactions between sex chromosomes and autosomes. In Rapidly Evolving Genes and Genetic Systems, ed. R Singh, J Xu, R Kulathinal, pp. 101–14. Oxford: Oxford Univ. Press
88. Lorenc A, Makałowski W. 2003. Transposable elements and vertebrate protein diversity. Genetica 118(2–3):183–91
89. Lu J, Fu Y, Kumar S, Shen Y, Zeng K, et al. 2008. Adaptive evolution of newly emerged micro-RNA genes in Drosophila. Mol. Biol. Evol. 25(5):929–38
90. Lu J, Shen Y, Wu Q, Kumar S, He B, et al. 2008. The birth and death of microRNA genes in Drosophila. Nat. Genet. 40(3):351–55
91. Makalowski W. 1995. SINEs as a genomic scrap yard: an essay on genomic evolution. In The Impact of Short Interspersed Elements (SINEs) on the Host Genome, ed. R Maraia, pp. 81–104. Austin, TX: R.G. Landes
92. Marques AC, Dupanloup I, Vinckenbosch N, Reymond A, Kaessmann H. 2005. Emergence of young human genes after a burst of retroposition in primates. PLoS Biol. 3(11):e357
93. Marques AC, Tan J, Lee S, Kong L, Heger A, Ponting CP. 2012. Evidence for conserved post-transcriptional roles of unitary pseudogenes and for frequent bifunctionality of mRNAs. Genome Biol. 13(11):R102
94. Marques AC, Tan J, Ponting CP. 2011. Wrangling for microRNAs provokes much crosstalk. Genome Biol. 12(11):132
95. Matsuno M, Compagnon V, Schoch GA, Schmitt M, Debayle D, et al. 2009. Evolution of a novel phenolic pathway for pollen development. Science 325(5948):1688–92
96. McCarrey JR, Riggs AD. 1986. Determinator-inhibitor pairs as a mechanism for threshold setting in development: a possible function for pseudogenes. Proc. Natl. Acad. Sci. USA 83(3):679–83
97. McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–54
98. McLysaght A, Hokamp K, Wolfe KH. 2002. Extensive genomic duplication during early chordate evolution. Nat. Genet. 31(2):200–4
99. Meiklejohn CD, Landeen EL, Cook JM, Kingan SB, Presgraves DC. 2011. Sex chromosome-specific regulation in the Drosophila male germline but little evidence for chromosomal dosage compensation or meiotic inactivation. PLoS Biol. 9(8):e1001126


100. Meisel RP, Han MV, Hahn MW. 2009. A complex suite of forces drives gene traffic from Drosophila X chromosomes. Genome Biol. Evol. 1:176–88
101. Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, et al. 2011. Mapping copy number variation by population-scale genome sequencing. Nature 470(7332):59–65
102. Moran NA, Jarvik T. 2010. Lateral transfer of genes from fungi underlies carotenoid production in aphids. Science 328(5978):624–27
103. Muller HJ. 1936. Bar duplication. Science 83(2161):528–30
104. Murphy DN, McLysaght A. 2012. De novo origin of protein-coding genes in murine rodents. PLoS ONE 7(11):e48650
105. Nasvall J, Sun L, Roth JR, Andersson DI. 2012. Real-time evolution of new genes by innovation, amplification, and divergence. Science 338(6105):384–87
106. Nekrutenko A, Li WH. 2001. Transposable elements are found in a large number of human protein-coding genes. Trends Genet. 17(11):619–21
107. Ni X, Zhang YE, Negre N, Chen S, Long M, White KP. 2012. Adaptive evolution and the birth of CTCF binding sites in the Drosophila genome. PLoS Biol. 10(11):e1001420
108. Nozawa M, Aotsuka T, Tamura K. 2005. A novel chimeric gene, siren, with retroposed promoter sequence in the Drosophila bipectinata complex. Genetics 171(4):1719–27
109. Nozawa M, Miura S, Nei M. 2010. Origins and evolution of microRNA genes in Drosophila species. Genome Biol. Evol. 2:180–89
110. Nurminsky DI, Nurminskaya MV, De Aguiar D, Hartl DL. 1998. Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature 396:572–75
111. Ochman H, Lawrence JG, Groisman EA. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405(6784):299–304
112. Ohno S. 1970. Evolution by Gene Duplication. New York: Springer-Verlag. 160 pp.
113. Okamura K, Feuk L, Marques-Bonet T, Navarro A, Scherer SW. 2006. Frequent appearance of novel protein-coding sequences by frameshift translation. Genomics 88(6):690–97
114. Parisi M, Nuttall R, Naiman D, Bouffard G, Malley J, et al. 2003. Paucity of genes on the Drosophila X chromosome showing male-biased expression. Science 299(5607):697–700
115. Park J, Semyonov J, Chang CL, Yi W, Warren W, et al. 2008. Origin of INSL3-mediated testicular descent in therian mammals. Genome Res. 18:974–85
116. Piriyapongsa J, Jordan IK. 2008. Dual coding of siRNAs and miRNAs by plant transposable elements. Bioinformatics 14:814–21
117. Ranz JM, Castillo-Davis CI, Meiklejohn CD, Hartl DL. 2003. Sex-dependent gene expression and evolution of the Drosophila transcriptome. Science 300(5626):1742–45
118. Rogers RL, Bedford T, Hartl DL. 2009. Formation and longevity of chimeric and duplicate genes in Drosophila melanogaster. Genetics 181(1):313–22
119. Rogers RL, Hartl DL. 2012. Chimeric genes as a source of rapid evolution in Drosophila melanogaster. Mol. Biol. Evol. 29(2):517–29
120. Rollmann SM, Wang P, Date P, West SA, Mackay TF, Anholt RR. 2010. Odorant receptor polymorphisms and natural variation behavior in Drosophila melanogaster. Genetics 186:687–97
121. Ross BD, Rosin L, Thomae AW, Hiatt MA, Vermaak D, et al. 2013. Stepwise evolution of essential centromere function in a Drosophila neogene. Science 340(6137):1211–14
122. Sabath N, Wagner A, Karlin D. 2012. Evolution of viral proteins originated de novo by overprinting. Mol. Biol. Evol. 29(12):3767–80
123. Schrider DR, Navarro FCP, Galante PAF, Parmigiani RB, Camargo AA, et al. 2013. Gene copy-number polymorphism caused by retrotransposition in humans. PLoS Genet. 9(1):e1003242
124. Schrider DR, Stevens K, Cardeno CM, Langley CH, Hahn MW. 2011. Genome-wide analysis of retrogene polymorphisms in Drosophila melanogaster. Genome Res. 21(12):2087–95
125. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, et al. 2004. Large-scale copy number polymorphism in the human genome. Science 305(5683):525–28
126. Semon M, Wolfe KH. 2007. Consequences of genome duplication. Curr. Opin. Genet. Dev. 17(6):505–12
127. Shih H-J, Jones CD. 2008. Patterns of amino acid evolution in the Drosophila ananassae chimeric gene, siren, parallel those of other Adh-derived chimeras. Genetics 180(2):1261–63


128. Sturgill D, Zhang Y, Parisi M, Oliver B. 2007. Demasculinization of X chromosomes in the Drosophilagenus. Nature 450(7167):238–41

129. Sturtevant AH. 1925. The effects of unequal crossing over at the bar locus in Drosophila. Genetics 10:117–47

130. Sudhof TC, Goldstein JL, Brown MS, Russell DW. 1985. The LDL receptor gene: a mosaic of exons shared with different proteins. Science 228(4701):815–22

131. Tao Y, Araripe L, Kingan SB, Ke Y, Xiao H, Hartl DL. 2007. A sex-ratio meiotic drive system in Drosophila simulans. II: An X-linked distorter. PLoS Biol. 5(11):e293

132. Tao Y, Masly JP, Araripe L, Ke Y, Hartl DL. 2007. A sex-ratio meiotic drive system in Drosophila simulans. I: An autosomal suppressor. PLoS Biol. 5(11):e292

133. Thomson TM, Lozano JJ, Loukili N, Carrio R, Serra F, et al. 2000. Fusion of the human gene for the polyubiquitination coeffector UEV1 with Kua, a newly identified gene. Genome Res. 10(11):1743–56

134. Thornton KR. 2003. Gene conversion and natural selection at duplicate loci in Drosophila melanogaster. PhD Diss. Univ. Chicago. 181 pp.

135. Thornton KR. 2007. The neutral coalescent process for recent gene duplications and copy-number variants. Genetics 177(2):987–1000

136. Tiedge H, Chen W, Brosius J. 1993. Primary structure, neural-specific expression, and dendritic location of human BC200 RNA. J. Neurosci. 13:2382–90

137. Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, et al. 2009. Origin of primate orphan genes: a comparative genomics approach. Mol. Biol. Evol. 26(3):603–12

138. Toups MA, Hahn MW. 2010. Retrogenes reveal the direction of sex-chromosome evolution in mosquitoes. Genetics 186(2):763–66

139. Vibranovski MD, Lopes HF, Karr TL, Long M. 2009. Stage-specific expression profiling of Drosophila spermatogenesis suggests that meiotic sex chromosome inactivation drives genomic relocation of testis-expressed genes. PLoS Genet. 5(11):e1000731

140. Vibranovski MD, Zhang YE, Kemkemer C, Lopes HF, Karr TL, Long M. 2012. Re-analysis of the larval testis data on meiotic sex chromosome inactivation revealed evidence for tissue-specific gene expression related to the Drosophila X chromosome. BMC Biol. 10(1):49; author reply 50

141. Vibranovski MD, Zhang YE, Kemkemer C, VanKuren NW, Lopes HF, et al. 2012. Segmental dataset and whole body expression data do not support the hypothesis that non-random movement is an intrinsic property of Drosophila retrogenes. BMC Evol. Biol. 12:169

142. Vibranovski MD, Zhang Y, Long M. 2009. General gene movement off the X chromosome in the Drosophila genus. Genome Res. 19(5):897–903

143. Vicoso B, Charlesworth B. 2009. The deficit of male-biased genes on the D. melanogaster X Chromosome is expression-dependent: a consequence of dosage compensation? J. Mol. Evol. 68(5):576–83

144. Vinckenbosch N, Dupanloup I, Kaessmann H. 2006. Evolutionary fate of retroposed gene copies in the human genome. Proc. Natl. Acad. Sci. USA 103(9):3220–25

145. Walsh B. 2003. Population-genetic models of the fates of duplicate genes. Genetica 118(2–3):279–94

146. Walsh JB. 1995. How often do duplicated genes evolve new functions? Genetics 139(1):421–28

147. Wang J, Long M, Vibranovski MD. 2012. Retrogenes moved out of the Z chromosome in the silkworm. J. Mol. Evol. 74(3–4):113–26

148. Wang J, Mager J, Chen Y, Schneider E, Cross JC, et al. 2001. Imprinted X inactivation maintained by a mouse Polycomb group gene. Nat. Genet. 28(4):371–75

149. Wang W, Yu H, Long M. 2004. Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species. Nat. Genet. 36(5):523–27

150. Wang W, Zhang J, Alvarez C, Llopart A, Long M. 2000. The origin of the jingwei gene and the complex modular structure of its parental gene, yellow emperor, in Drosophila melanogaster. Mol. Biol. Evol. 17(9):1294–301

151. Wang W, Zheng H, Fan C, Li J, Shi J, et al. 2006. High rate of chimeric gene origination by retroposition in plant genomes. Plant Cell 18:1791–802

152. Weng J-K, Li Y, Mo H, Chapple C. 2012. Assembly of an evolutionarily new pathway for α-pyrone biosynthesis in Arabidopsis. Science 337(6097):960–64

153. Wu D-D, Irwin DM, Zhang Y-P. 2011. De novo origin of human protein-coding genes. PLoS Genet. 7(11):e1002379

154. Xiao W, Liu H, Li Y, Li X, Xu C, et al. 2009. A rice gene of de novo origin negatively regulates pathogen-induced defense response. PLoS ONE 4(2):e4603

155. Xie C, Zhang YE, Chen J-Y, Liu C-J, Zhou W-Z, et al. 2012. Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genet. 8(9):e1002942

156. Xu G, Guo C, Shan H, Kong H. 2012. Divergence of duplicate genes in exon-intron structure. Proc. Natl. Acad. Sci. USA 109(4):1187–92

157. Xue S, Jones MD, Lu Q, Middeldorp JM, Griffin BE. 2003. Genetic diversity: frameshift mechanisms alter coding of a gene (Epstein-Barr virus LF3 gene) that contains multiple 102-base-pair direct sequence repeats. Mol. Cell. Biol. 23(6):2192–201

158. Yang S, Arguello JR, Li X, Ding Y, Zhou Q, et al. 2008. Repetitive element-mediated recombination as a mechanism for new gene origination in Drosophila. PLoS Genet. 4(1):e3

159. Yang Z, Huang J. 2011. De novo origin of new genes with introns in Plasmodium vivax. FEBS Lett. 585(4):641–44

160. Yeh S-D, Do T, Chan C, Cordova A, Carranza F, et al. 2012. Functional evidence that a recently evolved Drosophila sperm-specific gene boosts sperm competition. Proc. Natl. Acad. Sci. USA 109(6):2043–48

161. Yoshida S, Maruyama S, Nozaki H, Shirasu K. 2010. Horizontal gene transfer by the parasitic plant Striga hermonthica. Science 328:1128

162. Zhang J, Dean AM, Brunet F, Long M. 2004. Evolving protein functional diversity in new genes of Drosophila. Proc. Natl. Acad. Sci. USA 101(46):16246–50

163. Zhang PG, Huang SZ, Pin A-L, Adams KL. 2010. Extensive divergence in alternative splicing patterns after gene and genome duplication during the evolutionary history of Arabidopsis. Mol. Biol. Evol. 27(7):1686–97

164. Zhang Y, Lu S, Zhao S, Zheng X, Long M, Wei L. 2009. Positive selection for the male functionality of a co-retroposed gene in the hominoids. BMC Evol. Biol. 9:252

165. Zhang Y, Wu Y, Liu Y, Han B. 2005. Computational identification of 69 retroposons in Arabidopsis. Plant Physiol. 138:935–48

166. Zhang YE, Landback P, Vibranovski MD, Long M. 2011. Accelerated recruitment of new brain development genes into the human genome. PLoS Biol. 9(10):e1001179

167. Zhang YE, Landback P, Vibranovski M, Long M. 2012. New genes expressed in human brains: implications for annotating evolving genomes. BioEssays 34(11):982–91

168. Zhang YE, Vibranovski MD, Krinsky BH, Long M. 2010. Age-dependent chromosomal distribution of male-biased genes in Drosophila. Genome Res. 20(11):1526–33

169. Zhang YE, Vibranovski MD, Landback P, Marais GAB, Long M. 2010. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS Biol. 8(10):e1000494

170. Zhen Y, Aardema ML, Medina EM, Schumer M, Andolfatto P. 2012. Parallel molecular evolution in an herbivore community. Science 337(6102):1634–37

171. Zheng D, Gerstein MB. 2007. The ambiguous boundary between genes and pseudogenes: The dead rise up, or do they? Trends Genet. 23(5):219–24

172. Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, et al. 2008. On the origin of new genes in Drosophila. Genome Res. 18(9):1446–55

173. Zhou R, Moshgabadi N, Adams KL. 2011. Extensive changes to alternative splicing patterns following allopolyploidy in natural and resynthesized polyploids. Proc. Natl. Acad. Sci. USA 108(38):16122–27

174. Zhu Z, Zhang Y, Long M. 2009. Extensive structural renovation of retrogenes in the evolution of the Populus genome. Plant Physiol. 151(4):1943–51
