Predominant patterns of splicing evolution on human ...

12
ORIGINAL ARTICLE Predominant patterns of splicing evolution on human, chimpanzee and macaque evolutionary lineages Jieyi Xiong 1,2 , Xi Jiang 1 , Angeliki Ditsiou 3,4 , Yang Gao 1 , Jing Sun 1 , Elijah D. Lowenstein 2,5 , Shuyun Huang 1,6 and Philipp Khaitovich 7,8,9, * 1 Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 200031 Shanghai, China, 2 Max Delbru ¨ ck Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany, 3 JBC/WTB Biocentre, University of Dundee, DD1 5EH Scotland, UK, 4 JMS Building, University of Sussex, BN1 9QG Brighton, UK, 5 Freie Universita ¨ t Berlin, 14195 Berlin, Germany, 6 School of Life Science and Technology, ShanghaiTech University, 201210 Shanghai, China, 7 Skolkovo Institute of Science and Technology, 143025 Skolkovo, Russia, 8 Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany and 9 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 200031 Shanghai, China *To whom correspondence should be addressed at: Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 200031 Shanghai, China. Tel: þ86 21 54920454; Fax: þ86 21 54920451; Email: [email protected] Abstract Although splicing is widespread and evolves rapidly among species, the mechanisms driving this evolution, as well as its functional implications, are not yet fully understood. We analyzed the evolution of splicing patterns based on transcriptome data from five tissues of humans, chimpanzees, rhesus macaques and mice. In total, 1526 exons and exon sets from 1236 genes showed significant splicing differences among primates. More than 60% of these differences represent constitutive-to- alternative exon transitions while an additional 25% represent changes in exon inclusion frequency. These two dominant evolutionary patterns have contrasting conservation, regulation and functional features. The sum of these features indicates that, despite their prevalence, constitutive-to-alternative exon transitions do not substantially contribute to long-term func- tional transcriptome changes. Conversely, changes in exon inclusion frequency appear to be functionally relevant, especially for changes taking place in the brain on the human evolutionary lineage. Introduction Alternative splicing (AS) enables the expansion of the transcrip- tome repertoire through the creation of novel transcript isoforms (1). The proportion of genes undergoing AS increases with organ- ismal complexity, from 25% in nematode worms to 95% in hu- mans (2,3). AS further increases transcriptome variability by creating isoform variation among tissues within an individual (3,4), among different age groups (5) and among human popula- tions (6). Comprehensive comparison studies in vertebrates (7) and mammals (8) revealed that splicing divergence increases rap- idly as phylogenetic distances increase. Numerous studies reported that AS plays a critical role in various physiological processes in eukaryotes (911). Yet, func- tions of the vast majority of alternative isoforms are still un- known. A large proportion of detected isoforms might not be functional and may instead represent aberrant splicing prod- ucts, as indicated by their low expression and low splice site se- quence conservation (12). Similarly, evolutionary analysis of Received: November 19, 2017. Revised: February 1, 2018. Accepted: February 12, 2018 V C The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 1474 Human Molecular Genetics, 2018, Vol. 27, No. 8 1474–1485 doi: 10.1093/hmg/ddy058 Advance Access Publication Date: 14 February 2018 Original Article Downloaded from https://academic.oup.com/hmg/article-abstract/27/8/1474/4857231 by MPI für evolutionäre Anthropologie user on 27 April 2018

Transcript of Predominant patterns of splicing evolution on human ...

Page 1: Predominant patterns of splicing evolution on human ...

O R I G I N A L A R T I C L E

Predominant patterns of splicing evolution on human

chimpanzee and macaque evolutionary lineagesJieyi Xiong12 Xi Jiang1 Angeliki Ditsiou34 Yang Gao1 Jing Sun1Elijah D Lowenstein25 Shuyun Huang16 and Philipp Khaitovich7891Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology Shanghai Institutes forBiological Sciences Chinese Academy of Sciences 200031 Shanghai China 2Max Delbruck Center forMolecular Medicine in the Helmholtz Association (MDC) 13125 Berlin Germany 3JBCWTB BiocentreUniversity of Dundee DD1 5EH Scotland UK 4JMS Building University of Sussex BN1 9QG Brighton UK 5FreieUniversitat Berlin 14195 Berlin Germany 6School of Life Science and Technology ShanghaiTech University201210 Shanghai China 7Skolkovo Institute of Science and Technology 143025 Skolkovo Russia 8Max PlanckInstitute for Evolutionary Anthropology 04103 Leipzig Germany and 9Shanghai Institutes for BiologicalSciences Chinese Academy of Sciences 200031 Shanghai China

To whom correspondence should be addressed at Shanghai Institutes for Biological Sciences Chinese Academy of Sciences 200031 Shanghai ChinaTel thorn86 21 54920454 Fax thorn86 21 54920451 Email khaitovichevampgde

AbstractAlthough splicing is widespread and evolves rapidly among species the mechanisms driving this evolution as well as itsfunctional implications are not yet fully understood We analyzed the evolution of splicing patterns based on transcriptomedata from five tissues of humans chimpanzees rhesus macaques and mice In total 1526 exons and exon sets from 1236genes showed significant splicing differences among primates More than 60 of these differences represent constitutive-to-alternative exon transitions while an additional 25 represent changes in exon inclusion frequency These two dominantevolutionary patterns have contrasting conservation regulation and functional features The sum of these features indicatesthat despite their prevalence constitutive-to-alternative exon transitions do not substantially contribute to long-term func-tional transcriptome changes Conversely changes in exon inclusion frequency appear to be functionally relevant especiallyfor changes taking place in the brain on the human evolutionary lineage

IntroductionAlternative splicing (AS) enables the expansion of the transcrip-tome repertoire through the creation of novel transcript isoforms(1) The proportion of genes undergoing AS increases with organ-ismal complexity from 25 in nematode worms to 95 in hu-mans (23) AS further increases transcriptome variability bycreating isoform variation among tissues within an individual(34) among different age groups (5) and among human popula-tions (6) Comprehensive comparison studies in vertebrates (7)

and mammals (8) revealed that splicing divergence increases rap-idly as phylogenetic distances increase

Numerous studies reported that AS plays a critical role invarious physiological processes in eukaryotes (9ndash11) Yet func-tions of the vast majority of alternative isoforms are still un-known A large proportion of detected isoforms might not befunctional and may instead represent aberrant splicing prod-ucts as indicated by their low expression and low splice site se-quence conservation (12) Similarly evolutionary analysis of

Received November 19 2017 Revised February 1 2018 Accepted February 12 2018

VC The Author(s) 2018 Published by Oxford University Press All rights reservedFor Permissions please email journalspermissionsoupcom

1474

Human Molecular Genetics 2018 Vol 27 No 8 1474ndash1485

doi 101093hmgddy058Advance Access Publication Date 14 February 2018Original Article

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

splicing patterns among primate species indicates that the ma-jority of observed splicing differences might represent evolu-tionary neutral drift (13)

Previous studies also suggest that evolutionary dynamicsmechanisms and functionality of alternatively spliced isoformsmay differ substantially depending on their splicing patternsinclusion frequency and evolutionary history For examplenovel isoforms formed by exonization may not be as functionalas those produced by other types of exon recruitment indepen-dent of novel exon inclusion frequency (14)

In this study we investigated the relative frequency evolu-tionary conservation evolutionary birth and death rates aswell as potential functionality of isoforms created by differentmechanisms Our analysis was based on transcriptome se-quencing data collected from five somatic tissues from humanschimpanzees rhesus macaques and mice The relatively shortdivergent times (15ndash18) and large proportion of alternativelyspliced genes in these species provide a good system to investi-gate the different types and properties of splicing changes alongthe evolutionary lineages

ResultsSplicing differences among species

We estimated splicing differences among humans chimpanzeesrhesus macaques and mice using poly-A-plus transcriptome datacollected in three brain regions (prefrontal cortex visual cortexand cerebellum) kidney and muscle in six adult individuals ofeach species (Supplementary Material Table S1) (19) The totaltranscriptome data contained 187 billion 100 nucleotide (nt) longreads 134 billion (72) of which were mapped to the correspond-ing genome (Supplementary Material Table S2)

In the first step of our splicing analysis we constructed con-sensus transcriptome maps of the three primate species (humanchimpanzee and macaque) or all four species (primates andmouse) based on the RNA-seq data Specifically after mappingRNA-seq reads to the corresponding genome in an annotation-free manner we mapped all identified canonical splice junctionsto the human genome This procedure yielded 224 325 splicejunctions detected in three primate species and 94 544 splicejunctions in all four species Using these junctions we identifiedexon sets consisting of an internal gene segment and two flank-ing exons Exon sets containing low occurrence isoforms (abun-dancelt 10 in all species-tissue combinations) were excludedfrom further analysis Following this procedure we detected 5480cassette exon sets with sufficient exon junction coverage in threeprimate species and 2859 cassette exon sets in all four species

Splicing variation analysis conducted based on percent-spliced-in (PSI) measurements of internal gene segments (singleexons or exon groups) located within the 1925 exon sets de-tected in all tissues of three primate species showed clear sepa-ration among organs and species (Fig 1A) Concordantly 23 ofthe total splicing variation was explained by the differencesamong tissues and 18 by the differences among the three pri-mate species (permutations Plt 00001) (Fig 1B) Similar resultswere observed using the four species data (Fig 1C and DSupplementary Material Fig S1)

To assess the validity of identified splicing differences wecompared them with published microarray and PCR results (20)Of the 281 and 340 splicing differences reported in cerebellumbetween humans and chimpanzees or macaques respectively111 and 127 were covered by exon sets detected in our study Ofthese 94 (847) and 103 (811) showed consistent difference

direction in our data (Supplementary Material Fig S2) Theseproportions are close to the RT-PCR validation rate (83) re-ported in the study (20) Direct comparison between our dataand the published PCR validation tests showed that 20 of the 23detected splicing events showed consistent differences amongthree primates while the remaining three did not yield conclu-sive results based on visual inspection of the gel images(Supplementary Material Table S3)

Patterns of splicing evolution

To identify significant splicing differences among species weapplied the generalized linear model to 5480 cassette exon setsdetected in primates and to 2859 cassette exon sets detected infour species This was done assuming that the distribution ofjunction-spanning reads followed the quasi-binomial model(21ndash23) The false discovery rate was set to 5 based on 1000permutations of species labels We further required that themean PSI difference between species be greater or equal to 02Based on these criteria a total of 1526 cassette exon sets from1236 genes showed significant splicing differences among pri-mates in at least one tissue Similarly 1385 cassette exon setsfrom 1035 genes showed significant splicing differences amongthe four species in at least one tissue

In agreement with previous reports (7 8) the number ofsplicing differences between species was mainly proportional tothe genetic distances (Pearson correlation rfrac14 088 Plt 00001Fig 2A Supplementary Material Table S4)

Assignment of splicing differences to the evolutionary line-ages and identification of the direction of splicing changes pro-duced an unexpected observation on all three evolutionarylineages changes leading to less exon inclusion were on averagemore than four times more frequent than changes leading togreater exon inclusion (Fig 2B) To understand the possible evo-lutionary mechanisms of this phenomenon we further sortedsplicing events into six evolutionary patterns The three pat-terns of decreased exon inclusion are (D1) decreased inclusionof previously constitutive exons (constitutive-to-alternative)(D2) decreased inclusion of alternative exons (alternative-to-al-ternative) and (D3) constitutive exclusion of alternative exons(exon death) The three patterns of increased exon inclusionare (I1) birth of alternative exons from intronic sequences (exonbirth) (I2) increased inclusion of alternative exons (alternative-to-alternative) and (I3) constitutive inclusion of previously alter-native exons (alternative-to-constitutive) (Fig 2C and D)

Strikingly distribution of observed splicing differences amongthese patterns was far from even The constitutive-to-alternativeD1 pattern contained 78 of all changes leading to a decrease inexon inclusion and 61 of all species-specific splicing eventsidentified on the three primate lineages (Fig 2E) Modifications ofthe inclusion frequency of alternative exons (alternative-to-alter-native D2 and I2 patterns) occupied 15 and 10 of all splicingevents respectively Exon birth and alternative-to-constitutivepatterns I1 and I3 were even more rare constituting 7 and 5 ofall splicing events respectively The exon death pattern D3 wasthe most rare contributing less than 2 of all splicing eventsThis uneven distribution of splicing patterns was observed ineach of the five tissues (Fig 2F Supplementary Material Fig S3)

Evolutionary properties of the main splicing patterns

If changes included in the D1 pattern ie constitutive-to-alternative exon transitions are retained during species

1475Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

evolution then they should lead to a rapid increase in the pro-portion of alternative isoforms To assess this we examined theevolutionary conservation of splicing changes in the three dom-inant splicing patterns D1 D2 and I2 Using mice as an out-group we assigned splicing events to the human chimpanzeeand macaque evolutionary lineages as well as to the ape line-age leading to the most recent common ancestor of humansand chimpanzees (Fig 3A Materials and Methods)

Notably the proportion of D1 splicing events on the ape evo-lutionary lineage was approximately two times smaller than oneither the human or chimpanzee lineages (Fig 3B) This findingis unexpected as the estimated length of the ape lineage isthree to four times greater that that of the human or chimpan-zee lineages (15ndash18) The scarcity of D1 events assigned to theape lineage can be explained either by the recent independentlyoccurring acceleration of D1 pattern events in the human andchimpanzee lineages or more likely by the rapid evolutionaryloss of D1 events originating on the ape lineage In contrast toD1 other splicing patterns such as D2 and I2 have similar evolu-tionary pattern characterized by more events assigned to longerevolutionary lineages (Fig 3C) This difference suggests thatconstitutive-to-alternative exon transitions (D1 pattern) appearand disappear more often during the course of evolution thanother splicing events

We next calculated evolutionary birth and death rates ofsplicing events in the three dominant splicing patterns D1 D2and I2 on the human chimpanzee ape and macaque evolu-tionary lineages using the maximum likelihood approach Weshow that in agreement with our predictions the birth rate ofD1 splicing events was eight times greater than the birth rateof D2 and I2 splicing events taken together 00067 (00051ndash00085) splicing events per cassette exon set per million yearsfor the D1 pattern and 000083 (30 10ndash6ndash00019) for the D2thorn I2patterns The death rate of these novel splicing events ie re-verse splicing changes was also substantially higher for theD1 pattern than for the D2 and I2 patterns 0093 (0071ndash012)per event per million years for the D1 pattern and 0037 (0023ndash0058) for the D2thorn I2 patterns (Fig 3D) These results furtherconfirm the conclusion that evolutionary events leading to ASof constitutive exons are common but the majority of the re-sulting alternative isoforms are rapidly lost during the courseof evolution In contrast changes affecting the inclusion fre-quency of existing alternative exons are less frequent butmore likely to be conserved Thus even though decreased in-clusion of constitutive exons is the most common pattern insplicing evolution it might not play the leading role in the in-crease in isoform repertoire over substantial evolutionarytime

A C

B D

Figure 1 Overview of splicing variation (A) and (C) PCA plots based on PSI values for 5480 events in 90 primate samples (A) and 2859 events in 120 samples from the

four species (C) The symbols show species and colors show tissues (B) and (D) The proportion of total splicing variation explained by the factors species tissues and

sex in primates (B) and four species (D) The proportion of variation explained by chance (gray bars) and their 95 confidential intervals (error bars) were calculated by

1000 permutations of the factor labels

1476 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Sequence conservation in the main splicing patterns

To assess the functional significance of splicing events in thethree dominant patterns D1 D2 and I2 we examined the se-quence and functional conservation of corresponding alterna-tive exons Because the exon boundaries of splicing eventsassessed in our study were conserved across species the vastmajority of splicing events assessed in our study (86ndash97)

resided within the protein-coding region (CDS) of a gene(Supplementary Material Fig S4) Among them the proportionof splicing events disrupting the reading frame was significantlyhigher in the D1 pattern (Fig 4A) Furthermore DNA sequenceconservation of alternatively spliced exons as well as conserva-tion of the entire CDS region was significantly lower in the D1pattern (Fig 4B) These observations suggest the neutral or dele-terious character of D1 pattern events

A

C

D

E F

B

Figure 2 Splicing divergence among species (A) The relationship between splicing divergence and genetic distances among species (B) The number of cassette exon

sets showing splicing changes specific to the lineages detected among primate species (top) or in the four species comparison (bottom) The red and blue numbers rep-

resent the number of cassette exon sets showing increased and decreased exon inclusion frequencies respectively (C) Human-specific splicing changes When splicing

changes were detected in multiple tissues the one with the most significant difference representing the major splicing pattern at this position is shown The colored

dots below the depicted changes indicate tissues M muscle K kidney PC prefrontal cortex VCvisual cortex and Cb cerebellum The bottom colored bar marks the

six evolutionary splicing patterns The same color-coding of splice patterns is used for the arrows in (D) and bar plots in (E) and (F) (D) Schematic representation of the

six evolutionary splicing patterns (E) The proportions occupied by each of the six splice patterns on the three primate lineages (F) The number of cassette exon sets

with species-specific splicing in each of the six patterns in each of the five tissues

1477Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Surprisingly despite the relatively high sequence conserva-tion of D2 and I2 exons they displayed higher KaKs ratioswhen compared with D1 or control exon groups (Fig 4C) Thisfinding parallels observations of elevated KaKs ratios in alter-natively spliced exons of mammals suggesting a stronger direc-tional selection pressure on these regions (2425)

Regulatory mechanisms of the main splicing patterns

The different evolutionary properties of the D1 and D2I2 pat-terns suggest differences in their regulatory mechanisms To as-sess this we examined the extent of DNA sequence changesaffecting splice site strength as well as the presence of regula-tory motifs in the vicinity of splice sites

D1 pattern events indeed contained more sequence changeswithin the upstream intron region adjacent to the 30 splice sitecompared with control exons (Fisher exact test Plt 005) (Fig 5A)Furthermore the resulting changes in splice site strength coin-cided with changes in exon inclusion frequency (binomial test

Plt 0001) This connection between splice site mutation and exoninclusion frequency changes is consistent with the results of aprevious comparison between humans and mice (26) Howeverthe upstream intron region flanking the 30 splice site in the D2and I2 patterns shows fewer sequence changes compared withcontrol exons (Fisher exact test Plt 00005) (Fig 5A) Still for thefew observed sequence changes the resulting decrease or in-crease in splice site strength also coincided with the change inexon inclusion frequency (binomial test Plt 005) These resultsindicate that mutations within the upstream intron affecting 30

splice site strength contribute substantially to splicing changes inall three of the dominant patterns

The birth and death frequency of regulatory splicing motifsnear junctions flanking alternatively spliced segments also var-ied among the three dominant splicing patterns Both D1 andD2 patterns had a higher proportion of sequence changes lead-ing to regulatory motif turnover in the 5rsquo-end region of alterna-tive exons compared with control exons (Fisher exact testPlt 005) (Fig 5B) For the D1 pattern further changes affectingregulatory motif turnover occurred in the proximal and distant

A B D

C

Figure 3 Evolutionary properties of the main splicing patterns (A) Schematic diagram of the evolutionary tree of the three primate species The same lineage color-

coding is used in (B) and (C) MY millions of years (B) The number of gene segments within the I2 and D2 patterns (I2 thorn D2) or within the D1 pattern in four-species

(C) The relationship between species divergence and the number of cassette exon sets for the I2 thorn D2 patterns and the D1 pattern In the D1 correlation plot the dots

representing human and chimpanzee values largely overlap (D) Estimated birth rate (x-axis) and death rate (y-axis) of splicing events in the I2 thorn D2 patterns (red) and

the D1 pattern (blue) White crosses show the maximum likelihood estimates the dark and light color areas show the 95 and 99 confidence intervals respectively

A B C

Figure 4 Sequence conservation properties of the main splicing patterns (A) The proportion of exons preserving the open reading frame in the dominant evolutionary

splicing patterns Red and green asterisks respectively show the significance of higher and lower ratios calculated in comparison with control exons (CNTL) using the

two-tailed Fisherrsquos exact test (B) The distribution of phastCons scores in the dominant evolutionary splicing patterns for gene segments of differently spliced regions

(blue) and for whole coding regions (orange) In both assays the D1 pattern shows significantly lower sequence conservation for both spliced exons (Plt00001) and cod-

ing regions (Plt0001) compared with any other pattern (two-tailed Wilcoxon test) (C) The rates of synonymous (Ks) and nonsynonymous (Ka) substitutions on the hu-

man and chimpanzee evolutionary lineages In (A) and (C) the error bars show the 95 confidence intervals

1478 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

positions of upstream introns a region that has been reportedas important for AS regulation (27) (Fisher exact test Plt 005)Finally no increase in regulatory pattern turnover was detectedfor the I2 pattern (Fig 5B) These observations indicate thatchanges in regulatory elements contribute differently to theevolution of the three dominant splicing patterns

Human-specific features of splicing evolution

In order to characterize features of splicing evolution specific tohumans we compared the frequencies of lineage-specific splic-ing events between the human and chimpanzee lineages Inagreement with the phylogenetic distances splicing changescorresponding to D1 and D2 patterns were distributed equallyon the human and chimpanzee lineages in all five tissuesConversely splicing changes corresponding to the I2 pattern oc-curred approximately two times more frequently on the humanlineage than on the chimpanzee lineage (one-sided Fisher exacttest Pfrac14 003) This difference was even greater for the exon setspresent in all four species (Fig 6A)

To further examine the authenticity of human-specific splic-ing changes we conducted additional semi-quantitative PCRexperiments We designed primer pairs for twelve out of 289human-specific splicing events detected using RNA-seq dataNine of these primer pairs effectively amplified splice segmentsin all three species From these nine splicing events eightshowed consistent differences in isoform concentrationsamong three primates coinciding with the differences observedin RNA-seq data in all three biological replicates (Fig 6B and CSupplementary Material Fig S5 and Table S5)

Among these eight splicing events five were predicted toaffect the protein sequence of the gene (Supplementary

Material Fig S5) To test whether these splicing changes led tothe expression of different protein isoforms in humans com-pared with chimpanzees and macaques we performed westernblot experiments using cerebellum samples from three humansthree chimpanzees and three macaques (SupplementaryMaterial Table S6) From the four antibodies tested only thepolyclonal antibody against CEP63 protein produced specificbands In agreement with our predictions based on the RNA-seqresults the western blot showed the presence of one CEP63 pro-tein isoform in the chimpanzee and macaque brains and twoprotein isoforms in the human brain (Fig 6DndashF)

To assess the potential functional significance of the identi-fied species-specific splicing changes we tested the enrichmentof splicing events from D1 D2 and I2 splicing patterns amongthe gene ontology (GO) functional categories Among all line-ages and patterns the human-specific I2 and D2 patternsshowed significant functional enrichment (FDRlt 005 Fig 6Gand H Supplementary Material Tables S7 and S8) Interestinglyalthough the enriched GO terms for I2 and D2 patterns did notoverlap in both cases they were mainly related to neuronalfunctionality The D2 pattern was enriched in seven cellularcomponent GO terms predominantly associated with synapticlocation while the I2 pattern was enriched in 23 biological pro-cess GO terms including axon guidance and related functionsFunctions enriched in the I2 pattern were reported to be af-fected by AS of cell adhesion proteins (28ndash30) These proteins in-clude products of two cell adhesion genes from the L1 familyCHL1 and NRCAM and two semaphorin genes SEMA4D andSEMA6C which we identified to be alternatively spliced in ourstudy More generally among 33 genes containing I2 or D2 splic-ing changes on the human lineage and falling into the enrichedGO terms 29 showed significant splicing changes in at least one

A

B

Figure 5 Regulatory properties of the main splicing patterns (A) The proportion of splice sites with strength changes on the ape and rhesus macaque lineages The col-

ored and white sections of the bars show the strength of change consistent and inconsistent with the direction of exon inclusion frequency changes respectively

White asterisks within the colored sections of the bars indicate the significance of the colored proportions calculated using the binomial test (B) The proportion of reg-

ulatory motif turnover on the human and chimpanzee evolutionary lineages In (A) and (B) error bars show 95 confidence intervals The schematic exon-intron struc-

ture in the middle shows the positions of splice sites in (A) and evaluated regions in (B) The length of evaluated regions in introns and exons depicted in (B) were 200

and 39 nt respectively

1479Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

A

D

H

G

E

F

B C

Figure 6 Human-specific features of splicing evolution (A) The proportion of cassette exon sets showing species-specific splicing changes for each of the domi-

nant splicing patterns The proportions for human and chimpanzee lineages were calculated in the primate species comparison (upper panel) and by four species

comparison (lower panel) Exon set numbers are shown within the bars The error bars show 95 confidential intervals based on the binomial distribution The as-

terisks mark the significance of the ratio differences between the human and chimpanzee branches (one-tailed Fisherrsquos exact test P lt 005) (B) Comparison of PSI

values obtained based on RNA-seq and PCR measurements (C) The comparison of human-specific PSI changes (human minus the mean of chimpanzee and rhe-

sus) was calculated based on RNA-seq and PCR measurements In (B) and (C) one case showing inconsistent splicing patterns in macaques in PCR experiment

compared with RNA-seq data is shown as triangles (D)ndash(F) RNA-seq reads coverage (D) PCR results (E) and western blot results (F) for human-specific splicing in

the CEP63 gene In (D) coverage in three primates using visual cortex data is shown as blue blocks junction reads are shown as orange lines The y-axis indicates

the reads number Human gene annotation is shown at the bottom of the figure the differentially spliced exon (located at 930-1067nt within the coding region) is

colored in red In (E) PCR was conducted in visual cortex samples of three individuals for each of the three primate species In (F) western blot was conducted in

cerebellum samples from three individuals for each of the three primate species In both (E) and (F) the longer and shorter products corresponding to exon inclu-

sion and exclusion isoforms are detected in the upper and lower rows respectively (G) and (H) Characteristics of genes showing I2-type (G) and D2-type (H) splic-

ing changes on the human evolutionary lineage and the corresponding enriched GO terms The red color indicates the presence of the feature Bar plots show the

significance of each GO term after multiple test correction

1480 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

brain region and 19 included splicing changes that altered pro-tein sequences

DiscussionSplicing evolution includes a variety of processes includingbirth and death of novel isoforms as well as increases and de-creases in the frequency of existing ones We investigated eachof these processes separately by classifying differences in exoninclusion levels among four species into six evolutionary pat-terns This classification resulted in an unexpected observationdifferent splicing patterns substantially varied in their frequen-cies and potential functional implications

Among the six splicing patterns one representing the birthof novel isoforms by constitutive-to-alternative exon transitioncontained 60ndash65 of all detected splicing differences Despite itsprevalence our results suggest that this pattern represents nei-ther the primary driving force of long-term isoform repertoireevolution nor the source of novel gene functionality Isoformscreated within this pattern often result in reading frame disrup-tion are less conserved at the DNA sequence level do not clus-ter in any functional categories and are lost more rapidlyduring the course of evolution than isoform changes in theother splicing patterns Although the CDS regions of genes con-taining D1-splicing events are less conserved at the sequencelevel compared with other patterns (Fig 4B) they show thesame proportion of one-to-one orthologs in comparisons to dis-tant species chicken and frog (Supplementary Material Fig S6)This result argues against the reduced functionality of genescontaining D1-splicing events Instead our observation ofhigher synonymous substitution rates (Ks) but not non-synonymous substitution rates (Ka) in D1 pattern exons(Fig 4C) indicates that the lower CDS conservation of D1 patterngenes might be due to a higher local mutation rate Accordinglya higher local mutation rate might lead to overrepresentation ofsplice sites mutations and mutations affecting cis-factor bindingcites associated with D1 pattern events (Fig 5) Our analysisalso suggests that constitutive-to-alternative splicing changesmight be primarily driven by mutations within proximal splicesites and splicing factor binding motifs thereby decreasing theirefficiency Overall these observations strongly imply the neu-tral or deleterious nature of splicing events leading toconstitutive-to-alternative exon transition Nonetheless wecannot exclude that perhaps a few of the novel alternative iso-forms contained in this pattern may gain new functionality dur-ing evolution

The majority of the remaining splicing events (22ndash27 of thetotal) does not affect isoform numbers but instead changes theirrelative frequencies These alternative isoforms tend to pre-serve protein reading frames are highly conserved at the DNAsequence level but tend to show high KaKs ratios indicating anacceleration of protein sequence evolution and tend to be pre-served by evolution We further identified mutations in trans-regulator binding sites as a potential regulatory mechanismdriving this type of splicing change These observations suggestthat changes in inclusion frequencies of existing isoforms rep-resent functional evolutionary transitions

Changes in the inclusion frequency of existing isoformsmight be particularly relevant for evolution of the human brainOne of the two patterns constituting this group of events theincrease in isoform inclusion levels showed a 2-fold accelera-tion on the human evolutionary lineage when compared withthe chimpanzee lineage This acceleration particularly affectedthe isoform repertoire in the brain Furthermore despite the

relatively small number of observed splicing events both pat-terns clustered significantly in biological processes relevant tobrain function

Due to the limited numbers of detected events we were un-able to provide a detailed characterization of the remainingthree evolutionary splicing patterns However this omissiondoes not mean that the splicing processes described by thesepatterns do not play a role in evolution For instance novelexon birth is one of the essential mechanisms leading to theemergence of new gene functions (31ndash33) Another limitation ofour study is the small number of examined tissues As largerdata sets representing a wider selection of species tissues andevolutionary distances appear categorizing splicing events intodiscrete evolutionary patterns might lead to novel insights intothe evolutionary mechanisms and functional consequences ofsplicing changes

Materials and MethodsSamples

The RNA-seq data used in this study were taken from the litera-ture (19) (Ensembl ID GSE49379 Supplementary MaterialTable S1) Selected human chimpanzee and rhesus macaquesamples used in the RNA-seq experiment were also utilized inthe PCR validation experiments as labeled in SupplementaryMaterial Table S1

For western blot human samples were obtained from theNational Institute of Child Health and Human Development(NICHD) Brain and Tissue Bank for Developmental Disorders atthe University of Maryland Written consent for the use of hu-man tissues for research was obtained from all donors or theirnext of kin All subjects were defined as healthy controls by fo-rensic pathologists at the corresponding tissue bank All sub-jects suffered sudden death with no prolonged agonal stateChimpanzee samples were obtained from the BiomedicalPrimate Research Centre The Netherlands Rhesus macaquesamples were obtained from the Suzhou Experimental AnimalCenter China All nonhuman primates used in this study suf-fered sudden deaths for reasons other than their participationin this study and without any relation to the tissue used Thecerebellum samples were dissected from the lateral part of thecerebellar hemispheres

Reads mapping

About 187 million 1 100 nt single-end RNA-seq reads weremapped to human chimpanzee rhesus macaque and mousereference genomes (genome assemble version hg19 panTro3rheMac2 and mm9) using the RNA-seq read mapping softwareTophat (v203 httptophatcbcbumdedu) (34) In the firstround of mapping we detected de-novo junctions in each sam-ple A total of 134 billion reads including 305 million junctionreads were successfully mapped to reference genomes(mapped proportion 72 Supplementary Material Table S2)We combined all detected junctions after we transferred theircoordinates to other species using liftOver (httpsgenomeucsceducgi-binhgLiftOver) The union of all junctions was in-putted in the second round of mapping in order to equalizejunction detection sensitivity in all species Considering thathuman genome annotations are more complete than those forchimpanzees and macaques here we did not use public genestructure annotations in order to avoid analysis bias

1481Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 2: Predominant patterns of splicing evolution on human ...

splicing patterns among primate species indicates that the ma-jority of observed splicing differences might represent evolu-tionary neutral drift (13)

Previous studies also suggest that evolutionary dynamicsmechanisms and functionality of alternatively spliced isoformsmay differ substantially depending on their splicing patternsinclusion frequency and evolutionary history For examplenovel isoforms formed by exonization may not be as functionalas those produced by other types of exon recruitment indepen-dent of novel exon inclusion frequency (14)

In this study we investigated the relative frequency evolu-tionary conservation evolutionary birth and death rates aswell as potential functionality of isoforms created by differentmechanisms Our analysis was based on transcriptome se-quencing data collected from five somatic tissues from humanschimpanzees rhesus macaques and mice The relatively shortdivergent times (15ndash18) and large proportion of alternativelyspliced genes in these species provide a good system to investi-gate the different types and properties of splicing changes alongthe evolutionary lineages

ResultsSplicing differences among species

We estimated splicing differences among humans chimpanzeesrhesus macaques and mice using poly-A-plus transcriptome datacollected in three brain regions (prefrontal cortex visual cortexand cerebellum) kidney and muscle in six adult individuals ofeach species (Supplementary Material Table S1) (19) The totaltranscriptome data contained 187 billion 100 nucleotide (nt) longreads 134 billion (72) of which were mapped to the correspond-ing genome (Supplementary Material Table S2)

In the first step of our splicing analysis we constructed con-sensus transcriptome maps of the three primate species (humanchimpanzee and macaque) or all four species (primates andmouse) based on the RNA-seq data Specifically after mappingRNA-seq reads to the corresponding genome in an annotation-free manner we mapped all identified canonical splice junctionsto the human genome This procedure yielded 224 325 splicejunctions detected in three primate species and 94 544 splicejunctions in all four species Using these junctions we identifiedexon sets consisting of an internal gene segment and two flank-ing exons Exon sets containing low occurrence isoforms (abun-dancelt 10 in all species-tissue combinations) were excludedfrom further analysis Following this procedure we detected 5480cassette exon sets with sufficient exon junction coverage in threeprimate species and 2859 cassette exon sets in all four species

Splicing variation analysis conducted based on percent-spliced-in (PSI) measurements of internal gene segments (singleexons or exon groups) located within the 1925 exon sets de-tected in all tissues of three primate species showed clear sepa-ration among organs and species (Fig 1A) Concordantly 23 ofthe total splicing variation was explained by the differencesamong tissues and 18 by the differences among the three pri-mate species (permutations Plt 00001) (Fig 1B) Similar resultswere observed using the four species data (Fig 1C and DSupplementary Material Fig S1)

To assess the validity of identified splicing differences wecompared them with published microarray and PCR results (20)Of the 281 and 340 splicing differences reported in cerebellumbetween humans and chimpanzees or macaques respectively111 and 127 were covered by exon sets detected in our study Ofthese 94 (847) and 103 (811) showed consistent difference

direction in our data (Supplementary Material Fig S2) Theseproportions are close to the RT-PCR validation rate (83) re-ported in the study (20) Direct comparison between our dataand the published PCR validation tests showed that 20 of the 23detected splicing events showed consistent differences amongthree primates while the remaining three did not yield conclu-sive results based on visual inspection of the gel images(Supplementary Material Table S3)

Patterns of splicing evolution

To identify significant splicing differences among species weapplied the generalized linear model to 5480 cassette exon setsdetected in primates and to 2859 cassette exon sets detected infour species This was done assuming that the distribution ofjunction-spanning reads followed the quasi-binomial model(21ndash23) The false discovery rate was set to 5 based on 1000permutations of species labels We further required that themean PSI difference between species be greater or equal to 02Based on these criteria a total of 1526 cassette exon sets from1236 genes showed significant splicing differences among pri-mates in at least one tissue Similarly 1385 cassette exon setsfrom 1035 genes showed significant splicing differences amongthe four species in at least one tissue

In agreement with previous reports (7 8) the number ofsplicing differences between species was mainly proportional tothe genetic distances (Pearson correlation rfrac14 088 Plt 00001Fig 2A Supplementary Material Table S4)

Assignment of splicing differences to the evolutionary line-ages and identification of the direction of splicing changes pro-duced an unexpected observation on all three evolutionarylineages changes leading to less exon inclusion were on averagemore than four times more frequent than changes leading togreater exon inclusion (Fig 2B) To understand the possible evo-lutionary mechanisms of this phenomenon we further sortedsplicing events into six evolutionary patterns The three pat-terns of decreased exon inclusion are (D1) decreased inclusionof previously constitutive exons (constitutive-to-alternative)(D2) decreased inclusion of alternative exons (alternative-to-al-ternative) and (D3) constitutive exclusion of alternative exons(exon death) The three patterns of increased exon inclusionare (I1) birth of alternative exons from intronic sequences (exonbirth) (I2) increased inclusion of alternative exons (alternative-to-alternative) and (I3) constitutive inclusion of previously alter-native exons (alternative-to-constitutive) (Fig 2C and D)

Strikingly distribution of observed splicing differences amongthese patterns was far from even The constitutive-to-alternativeD1 pattern contained 78 of all changes leading to a decrease inexon inclusion and 61 of all species-specific splicing eventsidentified on the three primate lineages (Fig 2E) Modifications ofthe inclusion frequency of alternative exons (alternative-to-alter-native D2 and I2 patterns) occupied 15 and 10 of all splicingevents respectively Exon birth and alternative-to-constitutivepatterns I1 and I3 were even more rare constituting 7 and 5 ofall splicing events respectively The exon death pattern D3 wasthe most rare contributing less than 2 of all splicing eventsThis uneven distribution of splicing patterns was observed ineach of the five tissues (Fig 2F Supplementary Material Fig S3)

Evolutionary properties of the main splicing patterns

If changes included in the D1 pattern ie constitutive-to-alternative exon transitions are retained during species

1475Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

evolution then they should lead to a rapid increase in the pro-portion of alternative isoforms To assess this we examined theevolutionary conservation of splicing changes in the three dom-inant splicing patterns D1 D2 and I2 Using mice as an out-group we assigned splicing events to the human chimpanzeeand macaque evolutionary lineages as well as to the ape line-age leading to the most recent common ancestor of humansand chimpanzees (Fig 3A Materials and Methods)

Notably the proportion of D1 splicing events on the ape evo-lutionary lineage was approximately two times smaller than oneither the human or chimpanzee lineages (Fig 3B) This findingis unexpected as the estimated length of the ape lineage isthree to four times greater that that of the human or chimpan-zee lineages (15ndash18) The scarcity of D1 events assigned to theape lineage can be explained either by the recent independentlyoccurring acceleration of D1 pattern events in the human andchimpanzee lineages or more likely by the rapid evolutionaryloss of D1 events originating on the ape lineage In contrast toD1 other splicing patterns such as D2 and I2 have similar evolu-tionary pattern characterized by more events assigned to longerevolutionary lineages (Fig 3C) This difference suggests thatconstitutive-to-alternative exon transitions (D1 pattern) appearand disappear more often during the course of evolution thanother splicing events

We next calculated evolutionary birth and death rates ofsplicing events in the three dominant splicing patterns D1 D2and I2 on the human chimpanzee ape and macaque evolu-tionary lineages using the maximum likelihood approach Weshow that in agreement with our predictions the birth rate ofD1 splicing events was eight times greater than the birth rateof D2 and I2 splicing events taken together 00067 (00051ndash00085) splicing events per cassette exon set per million yearsfor the D1 pattern and 000083 (30 10ndash6ndash00019) for the D2thorn I2patterns The death rate of these novel splicing events ie re-verse splicing changes was also substantially higher for theD1 pattern than for the D2 and I2 patterns 0093 (0071ndash012)per event per million years for the D1 pattern and 0037 (0023ndash0058) for the D2thorn I2 patterns (Fig 3D) These results furtherconfirm the conclusion that evolutionary events leading to ASof constitutive exons are common but the majority of the re-sulting alternative isoforms are rapidly lost during the courseof evolution In contrast changes affecting the inclusion fre-quency of existing alternative exons are less frequent butmore likely to be conserved Thus even though decreased in-clusion of constitutive exons is the most common pattern insplicing evolution it might not play the leading role in the in-crease in isoform repertoire over substantial evolutionarytime

A C

B D

Figure 1 Overview of splicing variation (A) and (C) PCA plots based on PSI values for 5480 events in 90 primate samples (A) and 2859 events in 120 samples from the

four species (C) The symbols show species and colors show tissues (B) and (D) The proportion of total splicing variation explained by the factors species tissues and

sex in primates (B) and four species (D) The proportion of variation explained by chance (gray bars) and their 95 confidential intervals (error bars) were calculated by

1000 permutations of the factor labels

1476 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Sequence conservation in the main splicing patterns

To assess the functional significance of splicing events in thethree dominant patterns D1 D2 and I2 we examined the se-quence and functional conservation of corresponding alterna-tive exons Because the exon boundaries of splicing eventsassessed in our study were conserved across species the vastmajority of splicing events assessed in our study (86ndash97)

resided within the protein-coding region (CDS) of a gene(Supplementary Material Fig S4) Among them the proportionof splicing events disrupting the reading frame was significantlyhigher in the D1 pattern (Fig 4A) Furthermore DNA sequenceconservation of alternatively spliced exons as well as conserva-tion of the entire CDS region was significantly lower in the D1pattern (Fig 4B) These observations suggest the neutral or dele-terious character of D1 pattern events

A

C

D

E F

B

Figure 2 Splicing divergence among species (A) The relationship between splicing divergence and genetic distances among species (B) The number of cassette exon

sets showing splicing changes specific to the lineages detected among primate species (top) or in the four species comparison (bottom) The red and blue numbers rep-

resent the number of cassette exon sets showing increased and decreased exon inclusion frequencies respectively (C) Human-specific splicing changes When splicing

changes were detected in multiple tissues the one with the most significant difference representing the major splicing pattern at this position is shown The colored

dots below the depicted changes indicate tissues M muscle K kidney PC prefrontal cortex VCvisual cortex and Cb cerebellum The bottom colored bar marks the

six evolutionary splicing patterns The same color-coding of splice patterns is used for the arrows in (D) and bar plots in (E) and (F) (D) Schematic representation of the

six evolutionary splicing patterns (E) The proportions occupied by each of the six splice patterns on the three primate lineages (F) The number of cassette exon sets

with species-specific splicing in each of the six patterns in each of the five tissues

1477Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Surprisingly despite the relatively high sequence conserva-tion of D2 and I2 exons they displayed higher KaKs ratioswhen compared with D1 or control exon groups (Fig 4C) Thisfinding parallels observations of elevated KaKs ratios in alter-natively spliced exons of mammals suggesting a stronger direc-tional selection pressure on these regions (2425)

Regulatory mechanisms of the main splicing patterns

The different evolutionary properties of the D1 and D2I2 pat-terns suggest differences in their regulatory mechanisms To as-sess this we examined the extent of DNA sequence changesaffecting splice site strength as well as the presence of regula-tory motifs in the vicinity of splice sites

D1 pattern events indeed contained more sequence changeswithin the upstream intron region adjacent to the 30 splice sitecompared with control exons (Fisher exact test Plt 005) (Fig 5A)Furthermore the resulting changes in splice site strength coin-cided with changes in exon inclusion frequency (binomial test

Plt 0001) This connection between splice site mutation and exoninclusion frequency changes is consistent with the results of aprevious comparison between humans and mice (26) Howeverthe upstream intron region flanking the 30 splice site in the D2and I2 patterns shows fewer sequence changes compared withcontrol exons (Fisher exact test Plt 00005) (Fig 5A) Still for thefew observed sequence changes the resulting decrease or in-crease in splice site strength also coincided with the change inexon inclusion frequency (binomial test Plt 005) These resultsindicate that mutations within the upstream intron affecting 30

splice site strength contribute substantially to splicing changes inall three of the dominant patterns

The birth and death frequency of regulatory splicing motifsnear junctions flanking alternatively spliced segments also var-ied among the three dominant splicing patterns Both D1 andD2 patterns had a higher proportion of sequence changes lead-ing to regulatory motif turnover in the 5rsquo-end region of alterna-tive exons compared with control exons (Fisher exact testPlt 005) (Fig 5B) For the D1 pattern further changes affectingregulatory motif turnover occurred in the proximal and distant

A B D

C

Figure 3 Evolutionary properties of the main splicing patterns (A) Schematic diagram of the evolutionary tree of the three primate species The same lineage color-

coding is used in (B) and (C) MY millions of years (B) The number of gene segments within the I2 and D2 patterns (I2 thorn D2) or within the D1 pattern in four-species

(C) The relationship between species divergence and the number of cassette exon sets for the I2 thorn D2 patterns and the D1 pattern In the D1 correlation plot the dots

representing human and chimpanzee values largely overlap (D) Estimated birth rate (x-axis) and death rate (y-axis) of splicing events in the I2 thorn D2 patterns (red) and

the D1 pattern (blue) White crosses show the maximum likelihood estimates the dark and light color areas show the 95 and 99 confidence intervals respectively

A B C

Figure 4 Sequence conservation properties of the main splicing patterns (A) The proportion of exons preserving the open reading frame in the dominant evolutionary

splicing patterns Red and green asterisks respectively show the significance of higher and lower ratios calculated in comparison with control exons (CNTL) using the

two-tailed Fisherrsquos exact test (B) The distribution of phastCons scores in the dominant evolutionary splicing patterns for gene segments of differently spliced regions

(blue) and for whole coding regions (orange) In both assays the D1 pattern shows significantly lower sequence conservation for both spliced exons (Plt00001) and cod-

ing regions (Plt0001) compared with any other pattern (two-tailed Wilcoxon test) (C) The rates of synonymous (Ks) and nonsynonymous (Ka) substitutions on the hu-

man and chimpanzee evolutionary lineages In (A) and (C) the error bars show the 95 confidence intervals

1478 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

positions of upstream introns a region that has been reportedas important for AS regulation (27) (Fisher exact test Plt 005)Finally no increase in regulatory pattern turnover was detectedfor the I2 pattern (Fig 5B) These observations indicate thatchanges in regulatory elements contribute differently to theevolution of the three dominant splicing patterns

Human-specific features of splicing evolution

In order to characterize features of splicing evolution specific tohumans we compared the frequencies of lineage-specific splic-ing events between the human and chimpanzee lineages Inagreement with the phylogenetic distances splicing changescorresponding to D1 and D2 patterns were distributed equallyon the human and chimpanzee lineages in all five tissuesConversely splicing changes corresponding to the I2 pattern oc-curred approximately two times more frequently on the humanlineage than on the chimpanzee lineage (one-sided Fisher exacttest Pfrac14 003) This difference was even greater for the exon setspresent in all four species (Fig 6A)

To further examine the authenticity of human-specific splic-ing changes we conducted additional semi-quantitative PCRexperiments We designed primer pairs for twelve out of 289human-specific splicing events detected using RNA-seq dataNine of these primer pairs effectively amplified splice segmentsin all three species From these nine splicing events eightshowed consistent differences in isoform concentrationsamong three primates coinciding with the differences observedin RNA-seq data in all three biological replicates (Fig 6B and CSupplementary Material Fig S5 and Table S5)

Among these eight splicing events five were predicted toaffect the protein sequence of the gene (Supplementary

Material Fig S5) To test whether these splicing changes led tothe expression of different protein isoforms in humans com-pared with chimpanzees and macaques we performed westernblot experiments using cerebellum samples from three humansthree chimpanzees and three macaques (SupplementaryMaterial Table S6) From the four antibodies tested only thepolyclonal antibody against CEP63 protein produced specificbands In agreement with our predictions based on the RNA-seqresults the western blot showed the presence of one CEP63 pro-tein isoform in the chimpanzee and macaque brains and twoprotein isoforms in the human brain (Fig 6DndashF)

To assess the potential functional significance of the identi-fied species-specific splicing changes we tested the enrichmentof splicing events from D1 D2 and I2 splicing patterns amongthe gene ontology (GO) functional categories Among all line-ages and patterns the human-specific I2 and D2 patternsshowed significant functional enrichment (FDRlt 005 Fig 6Gand H Supplementary Material Tables S7 and S8) Interestinglyalthough the enriched GO terms for I2 and D2 patterns did notoverlap in both cases they were mainly related to neuronalfunctionality The D2 pattern was enriched in seven cellularcomponent GO terms predominantly associated with synapticlocation while the I2 pattern was enriched in 23 biological pro-cess GO terms including axon guidance and related functionsFunctions enriched in the I2 pattern were reported to be af-fected by AS of cell adhesion proteins (28ndash30) These proteins in-clude products of two cell adhesion genes from the L1 familyCHL1 and NRCAM and two semaphorin genes SEMA4D andSEMA6C which we identified to be alternatively spliced in ourstudy More generally among 33 genes containing I2 or D2 splic-ing changes on the human lineage and falling into the enrichedGO terms 29 showed significant splicing changes in at least one

A

B

Figure 5 Regulatory properties of the main splicing patterns (A) The proportion of splice sites with strength changes on the ape and rhesus macaque lineages The col-

ored and white sections of the bars show the strength of change consistent and inconsistent with the direction of exon inclusion frequency changes respectively

White asterisks within the colored sections of the bars indicate the significance of the colored proportions calculated using the binomial test (B) The proportion of reg-

ulatory motif turnover on the human and chimpanzee evolutionary lineages In (A) and (B) error bars show 95 confidence intervals The schematic exon-intron struc-

ture in the middle shows the positions of splice sites in (A) and evaluated regions in (B) The length of evaluated regions in introns and exons depicted in (B) were 200

and 39 nt respectively

1479Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

A

D

H

G

E

F

B C

Figure 6 Human-specific features of splicing evolution (A) The proportion of cassette exon sets showing species-specific splicing changes for each of the domi-

nant splicing patterns The proportions for human and chimpanzee lineages were calculated in the primate species comparison (upper panel) and by four species

comparison (lower panel) Exon set numbers are shown within the bars The error bars show 95 confidential intervals based on the binomial distribution The as-

terisks mark the significance of the ratio differences between the human and chimpanzee branches (one-tailed Fisherrsquos exact test P lt 005) (B) Comparison of PSI

values obtained based on RNA-seq and PCR measurements (C) The comparison of human-specific PSI changes (human minus the mean of chimpanzee and rhe-

sus) was calculated based on RNA-seq and PCR measurements In (B) and (C) one case showing inconsistent splicing patterns in macaques in PCR experiment

compared with RNA-seq data is shown as triangles (D)ndash(F) RNA-seq reads coverage (D) PCR results (E) and western blot results (F) for human-specific splicing in

the CEP63 gene In (D) coverage in three primates using visual cortex data is shown as blue blocks junction reads are shown as orange lines The y-axis indicates

the reads number Human gene annotation is shown at the bottom of the figure the differentially spliced exon (located at 930-1067nt within the coding region) is

colored in red In (E) PCR was conducted in visual cortex samples of three individuals for each of the three primate species In (F) western blot was conducted in

cerebellum samples from three individuals for each of the three primate species In both (E) and (F) the longer and shorter products corresponding to exon inclu-

sion and exclusion isoforms are detected in the upper and lower rows respectively (G) and (H) Characteristics of genes showing I2-type (G) and D2-type (H) splic-

ing changes on the human evolutionary lineage and the corresponding enriched GO terms The red color indicates the presence of the feature Bar plots show the

significance of each GO term after multiple test correction

1480 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

brain region and 19 included splicing changes that altered pro-tein sequences

DiscussionSplicing evolution includes a variety of processes includingbirth and death of novel isoforms as well as increases and de-creases in the frequency of existing ones We investigated eachof these processes separately by classifying differences in exoninclusion levels among four species into six evolutionary pat-terns This classification resulted in an unexpected observationdifferent splicing patterns substantially varied in their frequen-cies and potential functional implications

Among the six splicing patterns one representing the birthof novel isoforms by constitutive-to-alternative exon transitioncontained 60ndash65 of all detected splicing differences Despite itsprevalence our results suggest that this pattern represents nei-ther the primary driving force of long-term isoform repertoireevolution nor the source of novel gene functionality Isoformscreated within this pattern often result in reading frame disrup-tion are less conserved at the DNA sequence level do not clus-ter in any functional categories and are lost more rapidlyduring the course of evolution than isoform changes in theother splicing patterns Although the CDS regions of genes con-taining D1-splicing events are less conserved at the sequencelevel compared with other patterns (Fig 4B) they show thesame proportion of one-to-one orthologs in comparisons to dis-tant species chicken and frog (Supplementary Material Fig S6)This result argues against the reduced functionality of genescontaining D1-splicing events Instead our observation ofhigher synonymous substitution rates (Ks) but not non-synonymous substitution rates (Ka) in D1 pattern exons(Fig 4C) indicates that the lower CDS conservation of D1 patterngenes might be due to a higher local mutation rate Accordinglya higher local mutation rate might lead to overrepresentation ofsplice sites mutations and mutations affecting cis-factor bindingcites associated with D1 pattern events (Fig 5) Our analysisalso suggests that constitutive-to-alternative splicing changesmight be primarily driven by mutations within proximal splicesites and splicing factor binding motifs thereby decreasing theirefficiency Overall these observations strongly imply the neu-tral or deleterious nature of splicing events leading toconstitutive-to-alternative exon transition Nonetheless wecannot exclude that perhaps a few of the novel alternative iso-forms contained in this pattern may gain new functionality dur-ing evolution

The majority of the remaining splicing events (22ndash27 of thetotal) does not affect isoform numbers but instead changes theirrelative frequencies These alternative isoforms tend to pre-serve protein reading frames are highly conserved at the DNAsequence level but tend to show high KaKs ratios indicating anacceleration of protein sequence evolution and tend to be pre-served by evolution We further identified mutations in trans-regulator binding sites as a potential regulatory mechanismdriving this type of splicing change These observations suggestthat changes in inclusion frequencies of existing isoforms rep-resent functional evolutionary transitions

Changes in the inclusion frequency of existing isoformsmight be particularly relevant for evolution of the human brainOne of the two patterns constituting this group of events theincrease in isoform inclusion levels showed a 2-fold accelera-tion on the human evolutionary lineage when compared withthe chimpanzee lineage This acceleration particularly affectedthe isoform repertoire in the brain Furthermore despite the

relatively small number of observed splicing events both pat-terns clustered significantly in biological processes relevant tobrain function

Due to the limited numbers of detected events we were un-able to provide a detailed characterization of the remainingthree evolutionary splicing patterns However this omissiondoes not mean that the splicing processes described by thesepatterns do not play a role in evolution For instance novelexon birth is one of the essential mechanisms leading to theemergence of new gene functions (31ndash33) Another limitation ofour study is the small number of examined tissues As largerdata sets representing a wider selection of species tissues andevolutionary distances appear categorizing splicing events intodiscrete evolutionary patterns might lead to novel insights intothe evolutionary mechanisms and functional consequences ofsplicing changes

Materials and MethodsSamples

The RNA-seq data used in this study were taken from the litera-ture (19) (Ensembl ID GSE49379 Supplementary MaterialTable S1) Selected human chimpanzee and rhesus macaquesamples used in the RNA-seq experiment were also utilized inthe PCR validation experiments as labeled in SupplementaryMaterial Table S1

For western blot human samples were obtained from theNational Institute of Child Health and Human Development(NICHD) Brain and Tissue Bank for Developmental Disorders atthe University of Maryland Written consent for the use of hu-man tissues for research was obtained from all donors or theirnext of kin All subjects were defined as healthy controls by fo-rensic pathologists at the corresponding tissue bank All sub-jects suffered sudden death with no prolonged agonal stateChimpanzee samples were obtained from the BiomedicalPrimate Research Centre The Netherlands Rhesus macaquesamples were obtained from the Suzhou Experimental AnimalCenter China All nonhuman primates used in this study suf-fered sudden deaths for reasons other than their participationin this study and without any relation to the tissue used Thecerebellum samples were dissected from the lateral part of thecerebellar hemispheres

Reads mapping

About 187 million 1 100 nt single-end RNA-seq reads weremapped to human chimpanzee rhesus macaque and mousereference genomes (genome assemble version hg19 panTro3rheMac2 and mm9) using the RNA-seq read mapping softwareTophat (v203 httptophatcbcbumdedu) (34) In the firstround of mapping we detected de-novo junctions in each sam-ple A total of 134 billion reads including 305 million junctionreads were successfully mapped to reference genomes(mapped proportion 72 Supplementary Material Table S2)We combined all detected junctions after we transferred theircoordinates to other species using liftOver (httpsgenomeucsceducgi-binhgLiftOver) The union of all junctions was in-putted in the second round of mapping in order to equalizejunction detection sensitivity in all species Considering thathuman genome annotations are more complete than those forchimpanzees and macaques here we did not use public genestructure annotations in order to avoid analysis bias

1481Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 3: Predominant patterns of splicing evolution on human ...

evolution then they should lead to a rapid increase in the pro-portion of alternative isoforms To assess this we examined theevolutionary conservation of splicing changes in the three dom-inant splicing patterns D1 D2 and I2 Using mice as an out-group we assigned splicing events to the human chimpanzeeand macaque evolutionary lineages as well as to the ape line-age leading to the most recent common ancestor of humansand chimpanzees (Fig 3A Materials and Methods)

Notably the proportion of D1 splicing events on the ape evo-lutionary lineage was approximately two times smaller than oneither the human or chimpanzee lineages (Fig 3B) This findingis unexpected as the estimated length of the ape lineage isthree to four times greater that that of the human or chimpan-zee lineages (15ndash18) The scarcity of D1 events assigned to theape lineage can be explained either by the recent independentlyoccurring acceleration of D1 pattern events in the human andchimpanzee lineages or more likely by the rapid evolutionaryloss of D1 events originating on the ape lineage In contrast toD1 other splicing patterns such as D2 and I2 have similar evolu-tionary pattern characterized by more events assigned to longerevolutionary lineages (Fig 3C) This difference suggests thatconstitutive-to-alternative exon transitions (D1 pattern) appearand disappear more often during the course of evolution thanother splicing events

We next calculated evolutionary birth and death rates ofsplicing events in the three dominant splicing patterns D1 D2and I2 on the human chimpanzee ape and macaque evolu-tionary lineages using the maximum likelihood approach Weshow that in agreement with our predictions the birth rate ofD1 splicing events was eight times greater than the birth rateof D2 and I2 splicing events taken together 00067 (00051ndash00085) splicing events per cassette exon set per million yearsfor the D1 pattern and 000083 (30 10ndash6ndash00019) for the D2thorn I2patterns The death rate of these novel splicing events ie re-verse splicing changes was also substantially higher for theD1 pattern than for the D2 and I2 patterns 0093 (0071ndash012)per event per million years for the D1 pattern and 0037 (0023ndash0058) for the D2thorn I2 patterns (Fig 3D) These results furtherconfirm the conclusion that evolutionary events leading to ASof constitutive exons are common but the majority of the re-sulting alternative isoforms are rapidly lost during the courseof evolution In contrast changes affecting the inclusion fre-quency of existing alternative exons are less frequent butmore likely to be conserved Thus even though decreased in-clusion of constitutive exons is the most common pattern insplicing evolution it might not play the leading role in the in-crease in isoform repertoire over substantial evolutionarytime

A C

B D

Figure 1 Overview of splicing variation (A) and (C) PCA plots based on PSI values for 5480 events in 90 primate samples (A) and 2859 events in 120 samples from the

four species (C) The symbols show species and colors show tissues (B) and (D) The proportion of total splicing variation explained by the factors species tissues and

sex in primates (B) and four species (D) The proportion of variation explained by chance (gray bars) and their 95 confidential intervals (error bars) were calculated by

1000 permutations of the factor labels

1476 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Sequence conservation in the main splicing patterns

To assess the functional significance of splicing events in thethree dominant patterns D1 D2 and I2 we examined the se-quence and functional conservation of corresponding alterna-tive exons Because the exon boundaries of splicing eventsassessed in our study were conserved across species the vastmajority of splicing events assessed in our study (86ndash97)

resided within the protein-coding region (CDS) of a gene(Supplementary Material Fig S4) Among them the proportionof splicing events disrupting the reading frame was significantlyhigher in the D1 pattern (Fig 4A) Furthermore DNA sequenceconservation of alternatively spliced exons as well as conserva-tion of the entire CDS region was significantly lower in the D1pattern (Fig 4B) These observations suggest the neutral or dele-terious character of D1 pattern events

A

C

D

E F

B

Figure 2 Splicing divergence among species (A) The relationship between splicing divergence and genetic distances among species (B) The number of cassette exon

sets showing splicing changes specific to the lineages detected among primate species (top) or in the four species comparison (bottom) The red and blue numbers rep-

resent the number of cassette exon sets showing increased and decreased exon inclusion frequencies respectively (C) Human-specific splicing changes When splicing

changes were detected in multiple tissues the one with the most significant difference representing the major splicing pattern at this position is shown The colored

dots below the depicted changes indicate tissues M muscle K kidney PC prefrontal cortex VCvisual cortex and Cb cerebellum The bottom colored bar marks the

six evolutionary splicing patterns The same color-coding of splice patterns is used for the arrows in (D) and bar plots in (E) and (F) (D) Schematic representation of the

six evolutionary splicing patterns (E) The proportions occupied by each of the six splice patterns on the three primate lineages (F) The number of cassette exon sets

with species-specific splicing in each of the six patterns in each of the five tissues

1477Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Surprisingly despite the relatively high sequence conserva-tion of D2 and I2 exons they displayed higher KaKs ratioswhen compared with D1 or control exon groups (Fig 4C) Thisfinding parallels observations of elevated KaKs ratios in alter-natively spliced exons of mammals suggesting a stronger direc-tional selection pressure on these regions (2425)

Regulatory mechanisms of the main splicing patterns

The different evolutionary properties of the D1 and D2I2 pat-terns suggest differences in their regulatory mechanisms To as-sess this we examined the extent of DNA sequence changesaffecting splice site strength as well as the presence of regula-tory motifs in the vicinity of splice sites

D1 pattern events indeed contained more sequence changeswithin the upstream intron region adjacent to the 30 splice sitecompared with control exons (Fisher exact test Plt 005) (Fig 5A)Furthermore the resulting changes in splice site strength coin-cided with changes in exon inclusion frequency (binomial test

Plt 0001) This connection between splice site mutation and exoninclusion frequency changes is consistent with the results of aprevious comparison between humans and mice (26) Howeverthe upstream intron region flanking the 30 splice site in the D2and I2 patterns shows fewer sequence changes compared withcontrol exons (Fisher exact test Plt 00005) (Fig 5A) Still for thefew observed sequence changes the resulting decrease or in-crease in splice site strength also coincided with the change inexon inclusion frequency (binomial test Plt 005) These resultsindicate that mutations within the upstream intron affecting 30

splice site strength contribute substantially to splicing changes inall three of the dominant patterns

The birth and death frequency of regulatory splicing motifsnear junctions flanking alternatively spliced segments also var-ied among the three dominant splicing patterns Both D1 andD2 patterns had a higher proportion of sequence changes lead-ing to regulatory motif turnover in the 5rsquo-end region of alterna-tive exons compared with control exons (Fisher exact testPlt 005) (Fig 5B) For the D1 pattern further changes affectingregulatory motif turnover occurred in the proximal and distant

A B D

C

Figure 3 Evolutionary properties of the main splicing patterns (A) Schematic diagram of the evolutionary tree of the three primate species The same lineage color-

coding is used in (B) and (C) MY millions of years (B) The number of gene segments within the I2 and D2 patterns (I2 thorn D2) or within the D1 pattern in four-species

(C) The relationship between species divergence and the number of cassette exon sets for the I2 thorn D2 patterns and the D1 pattern In the D1 correlation plot the dots

representing human and chimpanzee values largely overlap (D) Estimated birth rate (x-axis) and death rate (y-axis) of splicing events in the I2 thorn D2 patterns (red) and

the D1 pattern (blue) White crosses show the maximum likelihood estimates the dark and light color areas show the 95 and 99 confidence intervals respectively

A B C

Figure 4 Sequence conservation properties of the main splicing patterns (A) The proportion of exons preserving the open reading frame in the dominant evolutionary

splicing patterns Red and green asterisks respectively show the significance of higher and lower ratios calculated in comparison with control exons (CNTL) using the

two-tailed Fisherrsquos exact test (B) The distribution of phastCons scores in the dominant evolutionary splicing patterns for gene segments of differently spliced regions

(blue) and for whole coding regions (orange) In both assays the D1 pattern shows significantly lower sequence conservation for both spliced exons (Plt00001) and cod-

ing regions (Plt0001) compared with any other pattern (two-tailed Wilcoxon test) (C) The rates of synonymous (Ks) and nonsynonymous (Ka) substitutions on the hu-

man and chimpanzee evolutionary lineages In (A) and (C) the error bars show the 95 confidence intervals

1478 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

positions of upstream introns a region that has been reportedas important for AS regulation (27) (Fisher exact test Plt 005)Finally no increase in regulatory pattern turnover was detectedfor the I2 pattern (Fig 5B) These observations indicate thatchanges in regulatory elements contribute differently to theevolution of the three dominant splicing patterns

Human-specific features of splicing evolution

In order to characterize features of splicing evolution specific tohumans we compared the frequencies of lineage-specific splic-ing events between the human and chimpanzee lineages Inagreement with the phylogenetic distances splicing changescorresponding to D1 and D2 patterns were distributed equallyon the human and chimpanzee lineages in all five tissuesConversely splicing changes corresponding to the I2 pattern oc-curred approximately two times more frequently on the humanlineage than on the chimpanzee lineage (one-sided Fisher exacttest Pfrac14 003) This difference was even greater for the exon setspresent in all four species (Fig 6A)

To further examine the authenticity of human-specific splic-ing changes we conducted additional semi-quantitative PCRexperiments We designed primer pairs for twelve out of 289human-specific splicing events detected using RNA-seq dataNine of these primer pairs effectively amplified splice segmentsin all three species From these nine splicing events eightshowed consistent differences in isoform concentrationsamong three primates coinciding with the differences observedin RNA-seq data in all three biological replicates (Fig 6B and CSupplementary Material Fig S5 and Table S5)

Among these eight splicing events five were predicted toaffect the protein sequence of the gene (Supplementary

Material Fig S5) To test whether these splicing changes led tothe expression of different protein isoforms in humans com-pared with chimpanzees and macaques we performed westernblot experiments using cerebellum samples from three humansthree chimpanzees and three macaques (SupplementaryMaterial Table S6) From the four antibodies tested only thepolyclonal antibody against CEP63 protein produced specificbands In agreement with our predictions based on the RNA-seqresults the western blot showed the presence of one CEP63 pro-tein isoform in the chimpanzee and macaque brains and twoprotein isoforms in the human brain (Fig 6DndashF)

To assess the potential functional significance of the identi-fied species-specific splicing changes we tested the enrichmentof splicing events from D1 D2 and I2 splicing patterns amongthe gene ontology (GO) functional categories Among all line-ages and patterns the human-specific I2 and D2 patternsshowed significant functional enrichment (FDRlt 005 Fig 6Gand H Supplementary Material Tables S7 and S8) Interestinglyalthough the enriched GO terms for I2 and D2 patterns did notoverlap in both cases they were mainly related to neuronalfunctionality The D2 pattern was enriched in seven cellularcomponent GO terms predominantly associated with synapticlocation while the I2 pattern was enriched in 23 biological pro-cess GO terms including axon guidance and related functionsFunctions enriched in the I2 pattern were reported to be af-fected by AS of cell adhesion proteins (28ndash30) These proteins in-clude products of two cell adhesion genes from the L1 familyCHL1 and NRCAM and two semaphorin genes SEMA4D andSEMA6C which we identified to be alternatively spliced in ourstudy More generally among 33 genes containing I2 or D2 splic-ing changes on the human lineage and falling into the enrichedGO terms 29 showed significant splicing changes in at least one

A

B

Figure 5 Regulatory properties of the main splicing patterns (A) The proportion of splice sites with strength changes on the ape and rhesus macaque lineages The col-

ored and white sections of the bars show the strength of change consistent and inconsistent with the direction of exon inclusion frequency changes respectively

White asterisks within the colored sections of the bars indicate the significance of the colored proportions calculated using the binomial test (B) The proportion of reg-

ulatory motif turnover on the human and chimpanzee evolutionary lineages In (A) and (B) error bars show 95 confidence intervals The schematic exon-intron struc-

ture in the middle shows the positions of splice sites in (A) and evaluated regions in (B) The length of evaluated regions in introns and exons depicted in (B) were 200

and 39 nt respectively

1479Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

A

D

H

G

E

F

B C

Figure 6 Human-specific features of splicing evolution (A) The proportion of cassette exon sets showing species-specific splicing changes for each of the domi-

nant splicing patterns The proportions for human and chimpanzee lineages were calculated in the primate species comparison (upper panel) and by four species

comparison (lower panel) Exon set numbers are shown within the bars The error bars show 95 confidential intervals based on the binomial distribution The as-

terisks mark the significance of the ratio differences between the human and chimpanzee branches (one-tailed Fisherrsquos exact test P lt 005) (B) Comparison of PSI

values obtained based on RNA-seq and PCR measurements (C) The comparison of human-specific PSI changes (human minus the mean of chimpanzee and rhe-

sus) was calculated based on RNA-seq and PCR measurements In (B) and (C) one case showing inconsistent splicing patterns in macaques in PCR experiment

compared with RNA-seq data is shown as triangles (D)ndash(F) RNA-seq reads coverage (D) PCR results (E) and western blot results (F) for human-specific splicing in

the CEP63 gene In (D) coverage in three primates using visual cortex data is shown as blue blocks junction reads are shown as orange lines The y-axis indicates

the reads number Human gene annotation is shown at the bottom of the figure the differentially spliced exon (located at 930-1067nt within the coding region) is

colored in red In (E) PCR was conducted in visual cortex samples of three individuals for each of the three primate species In (F) western blot was conducted in

cerebellum samples from three individuals for each of the three primate species In both (E) and (F) the longer and shorter products corresponding to exon inclu-

sion and exclusion isoforms are detected in the upper and lower rows respectively (G) and (H) Characteristics of genes showing I2-type (G) and D2-type (H) splic-

ing changes on the human evolutionary lineage and the corresponding enriched GO terms The red color indicates the presence of the feature Bar plots show the

significance of each GO term after multiple test correction

1480 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

brain region and 19 included splicing changes that altered pro-tein sequences

DiscussionSplicing evolution includes a variety of processes includingbirth and death of novel isoforms as well as increases and de-creases in the frequency of existing ones We investigated eachof these processes separately by classifying differences in exoninclusion levels among four species into six evolutionary pat-terns This classification resulted in an unexpected observationdifferent splicing patterns substantially varied in their frequen-cies and potential functional implications

Among the six splicing patterns one representing the birthof novel isoforms by constitutive-to-alternative exon transitioncontained 60ndash65 of all detected splicing differences Despite itsprevalence our results suggest that this pattern represents nei-ther the primary driving force of long-term isoform repertoireevolution nor the source of novel gene functionality Isoformscreated within this pattern often result in reading frame disrup-tion are less conserved at the DNA sequence level do not clus-ter in any functional categories and are lost more rapidlyduring the course of evolution than isoform changes in theother splicing patterns Although the CDS regions of genes con-taining D1-splicing events are less conserved at the sequencelevel compared with other patterns (Fig 4B) they show thesame proportion of one-to-one orthologs in comparisons to dis-tant species chicken and frog (Supplementary Material Fig S6)This result argues against the reduced functionality of genescontaining D1-splicing events Instead our observation ofhigher synonymous substitution rates (Ks) but not non-synonymous substitution rates (Ka) in D1 pattern exons(Fig 4C) indicates that the lower CDS conservation of D1 patterngenes might be due to a higher local mutation rate Accordinglya higher local mutation rate might lead to overrepresentation ofsplice sites mutations and mutations affecting cis-factor bindingcites associated with D1 pattern events (Fig 5) Our analysisalso suggests that constitutive-to-alternative splicing changesmight be primarily driven by mutations within proximal splicesites and splicing factor binding motifs thereby decreasing theirefficiency Overall these observations strongly imply the neu-tral or deleterious nature of splicing events leading toconstitutive-to-alternative exon transition Nonetheless wecannot exclude that perhaps a few of the novel alternative iso-forms contained in this pattern may gain new functionality dur-ing evolution

The majority of the remaining splicing events (22ndash27 of thetotal) does not affect isoform numbers but instead changes theirrelative frequencies These alternative isoforms tend to pre-serve protein reading frames are highly conserved at the DNAsequence level but tend to show high KaKs ratios indicating anacceleration of protein sequence evolution and tend to be pre-served by evolution We further identified mutations in trans-regulator binding sites as a potential regulatory mechanismdriving this type of splicing change These observations suggestthat changes in inclusion frequencies of existing isoforms rep-resent functional evolutionary transitions

Changes in the inclusion frequency of existing isoformsmight be particularly relevant for evolution of the human brainOne of the two patterns constituting this group of events theincrease in isoform inclusion levels showed a 2-fold accelera-tion on the human evolutionary lineage when compared withthe chimpanzee lineage This acceleration particularly affectedthe isoform repertoire in the brain Furthermore despite the

relatively small number of observed splicing events both pat-terns clustered significantly in biological processes relevant tobrain function

Due to the limited numbers of detected events we were un-able to provide a detailed characterization of the remainingthree evolutionary splicing patterns However this omissiondoes not mean that the splicing processes described by thesepatterns do not play a role in evolution For instance novelexon birth is one of the essential mechanisms leading to theemergence of new gene functions (31ndash33) Another limitation ofour study is the small number of examined tissues As largerdata sets representing a wider selection of species tissues andevolutionary distances appear categorizing splicing events intodiscrete evolutionary patterns might lead to novel insights intothe evolutionary mechanisms and functional consequences ofsplicing changes

Materials and MethodsSamples

The RNA-seq data used in this study were taken from the litera-ture (19) (Ensembl ID GSE49379 Supplementary MaterialTable S1) Selected human chimpanzee and rhesus macaquesamples used in the RNA-seq experiment were also utilized inthe PCR validation experiments as labeled in SupplementaryMaterial Table S1

For western blot human samples were obtained from theNational Institute of Child Health and Human Development(NICHD) Brain and Tissue Bank for Developmental Disorders atthe University of Maryland Written consent for the use of hu-man tissues for research was obtained from all donors or theirnext of kin All subjects were defined as healthy controls by fo-rensic pathologists at the corresponding tissue bank All sub-jects suffered sudden death with no prolonged agonal stateChimpanzee samples were obtained from the BiomedicalPrimate Research Centre The Netherlands Rhesus macaquesamples were obtained from the Suzhou Experimental AnimalCenter China All nonhuman primates used in this study suf-fered sudden deaths for reasons other than their participationin this study and without any relation to the tissue used Thecerebellum samples were dissected from the lateral part of thecerebellar hemispheres

Reads mapping

About 187 million 1 100 nt single-end RNA-seq reads weremapped to human chimpanzee rhesus macaque and mousereference genomes (genome assemble version hg19 panTro3rheMac2 and mm9) using the RNA-seq read mapping softwareTophat (v203 httptophatcbcbumdedu) (34) In the firstround of mapping we detected de-novo junctions in each sam-ple A total of 134 billion reads including 305 million junctionreads were successfully mapped to reference genomes(mapped proportion 72 Supplementary Material Table S2)We combined all detected junctions after we transferred theircoordinates to other species using liftOver (httpsgenomeucsceducgi-binhgLiftOver) The union of all junctions was in-putted in the second round of mapping in order to equalizejunction detection sensitivity in all species Considering thathuman genome annotations are more complete than those forchimpanzees and macaques here we did not use public genestructure annotations in order to avoid analysis bias

1481Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 4: Predominant patterns of splicing evolution on human ...

Sequence conservation in the main splicing patterns

To assess the functional significance of splicing events in thethree dominant patterns D1 D2 and I2 we examined the se-quence and functional conservation of corresponding alterna-tive exons Because the exon boundaries of splicing eventsassessed in our study were conserved across species the vastmajority of splicing events assessed in our study (86ndash97)

resided within the protein-coding region (CDS) of a gene(Supplementary Material Fig S4) Among them the proportionof splicing events disrupting the reading frame was significantlyhigher in the D1 pattern (Fig 4A) Furthermore DNA sequenceconservation of alternatively spliced exons as well as conserva-tion of the entire CDS region was significantly lower in the D1pattern (Fig 4B) These observations suggest the neutral or dele-terious character of D1 pattern events

A

C

D

E F

B

Figure 2 Splicing divergence among species (A) The relationship between splicing divergence and genetic distances among species (B) The number of cassette exon

sets showing splicing changes specific to the lineages detected among primate species (top) or in the four species comparison (bottom) The red and blue numbers rep-

resent the number of cassette exon sets showing increased and decreased exon inclusion frequencies respectively (C) Human-specific splicing changes When splicing

changes were detected in multiple tissues the one with the most significant difference representing the major splicing pattern at this position is shown The colored

dots below the depicted changes indicate tissues M muscle K kidney PC prefrontal cortex VCvisual cortex and Cb cerebellum The bottom colored bar marks the

six evolutionary splicing patterns The same color-coding of splice patterns is used for the arrows in (D) and bar plots in (E) and (F) (D) Schematic representation of the

six evolutionary splicing patterns (E) The proportions occupied by each of the six splice patterns on the three primate lineages (F) The number of cassette exon sets

with species-specific splicing in each of the six patterns in each of the five tissues

1477Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Surprisingly despite the relatively high sequence conserva-tion of D2 and I2 exons they displayed higher KaKs ratioswhen compared with D1 or control exon groups (Fig 4C) Thisfinding parallels observations of elevated KaKs ratios in alter-natively spliced exons of mammals suggesting a stronger direc-tional selection pressure on these regions (2425)

Regulatory mechanisms of the main splicing patterns

The different evolutionary properties of the D1 and D2I2 pat-terns suggest differences in their regulatory mechanisms To as-sess this we examined the extent of DNA sequence changesaffecting splice site strength as well as the presence of regula-tory motifs in the vicinity of splice sites

D1 pattern events indeed contained more sequence changeswithin the upstream intron region adjacent to the 30 splice sitecompared with control exons (Fisher exact test Plt 005) (Fig 5A)Furthermore the resulting changes in splice site strength coin-cided with changes in exon inclusion frequency (binomial test

Plt 0001) This connection between splice site mutation and exoninclusion frequency changes is consistent with the results of aprevious comparison between humans and mice (26) Howeverthe upstream intron region flanking the 30 splice site in the D2and I2 patterns shows fewer sequence changes compared withcontrol exons (Fisher exact test Plt 00005) (Fig 5A) Still for thefew observed sequence changes the resulting decrease or in-crease in splice site strength also coincided with the change inexon inclusion frequency (binomial test Plt 005) These resultsindicate that mutations within the upstream intron affecting 30

splice site strength contribute substantially to splicing changes inall three of the dominant patterns

The birth and death frequency of regulatory splicing motifsnear junctions flanking alternatively spliced segments also var-ied among the three dominant splicing patterns Both D1 andD2 patterns had a higher proportion of sequence changes lead-ing to regulatory motif turnover in the 5rsquo-end region of alterna-tive exons compared with control exons (Fisher exact testPlt 005) (Fig 5B) For the D1 pattern further changes affectingregulatory motif turnover occurred in the proximal and distant

A B D

C

Figure 3 Evolutionary properties of the main splicing patterns (A) Schematic diagram of the evolutionary tree of the three primate species The same lineage color-

coding is used in (B) and (C) MY millions of years (B) The number of gene segments within the I2 and D2 patterns (I2 thorn D2) or within the D1 pattern in four-species

(C) The relationship between species divergence and the number of cassette exon sets for the I2 thorn D2 patterns and the D1 pattern In the D1 correlation plot the dots

representing human and chimpanzee values largely overlap (D) Estimated birth rate (x-axis) and death rate (y-axis) of splicing events in the I2 thorn D2 patterns (red) and

the D1 pattern (blue) White crosses show the maximum likelihood estimates the dark and light color areas show the 95 and 99 confidence intervals respectively

A B C

Figure 4 Sequence conservation properties of the main splicing patterns (A) The proportion of exons preserving the open reading frame in the dominant evolutionary

splicing patterns Red and green asterisks respectively show the significance of higher and lower ratios calculated in comparison with control exons (CNTL) using the

two-tailed Fisherrsquos exact test (B) The distribution of phastCons scores in the dominant evolutionary splicing patterns for gene segments of differently spliced regions

(blue) and for whole coding regions (orange) In both assays the D1 pattern shows significantly lower sequence conservation for both spliced exons (Plt00001) and cod-

ing regions (Plt0001) compared with any other pattern (two-tailed Wilcoxon test) (C) The rates of synonymous (Ks) and nonsynonymous (Ka) substitutions on the hu-

man and chimpanzee evolutionary lineages In (A) and (C) the error bars show the 95 confidence intervals

1478 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

positions of upstream introns a region that has been reportedas important for AS regulation (27) (Fisher exact test Plt 005)Finally no increase in regulatory pattern turnover was detectedfor the I2 pattern (Fig 5B) These observations indicate thatchanges in regulatory elements contribute differently to theevolution of the three dominant splicing patterns

Human-specific features of splicing evolution

In order to characterize features of splicing evolution specific tohumans we compared the frequencies of lineage-specific splic-ing events between the human and chimpanzee lineages Inagreement with the phylogenetic distances splicing changescorresponding to D1 and D2 patterns were distributed equallyon the human and chimpanzee lineages in all five tissuesConversely splicing changes corresponding to the I2 pattern oc-curred approximately two times more frequently on the humanlineage than on the chimpanzee lineage (one-sided Fisher exacttest Pfrac14 003) This difference was even greater for the exon setspresent in all four species (Fig 6A)

To further examine the authenticity of human-specific splic-ing changes we conducted additional semi-quantitative PCRexperiments We designed primer pairs for twelve out of 289human-specific splicing events detected using RNA-seq dataNine of these primer pairs effectively amplified splice segmentsin all three species From these nine splicing events eightshowed consistent differences in isoform concentrationsamong three primates coinciding with the differences observedin RNA-seq data in all three biological replicates (Fig 6B and CSupplementary Material Fig S5 and Table S5)

Among these eight splicing events five were predicted toaffect the protein sequence of the gene (Supplementary

Material Fig S5) To test whether these splicing changes led tothe expression of different protein isoforms in humans com-pared with chimpanzees and macaques we performed westernblot experiments using cerebellum samples from three humansthree chimpanzees and three macaques (SupplementaryMaterial Table S6) From the four antibodies tested only thepolyclonal antibody against CEP63 protein produced specificbands In agreement with our predictions based on the RNA-seqresults the western blot showed the presence of one CEP63 pro-tein isoform in the chimpanzee and macaque brains and twoprotein isoforms in the human brain (Fig 6DndashF)

To assess the potential functional significance of the identi-fied species-specific splicing changes we tested the enrichmentof splicing events from D1 D2 and I2 splicing patterns amongthe gene ontology (GO) functional categories Among all line-ages and patterns the human-specific I2 and D2 patternsshowed significant functional enrichment (FDRlt 005 Fig 6Gand H Supplementary Material Tables S7 and S8) Interestinglyalthough the enriched GO terms for I2 and D2 patterns did notoverlap in both cases they were mainly related to neuronalfunctionality The D2 pattern was enriched in seven cellularcomponent GO terms predominantly associated with synapticlocation while the I2 pattern was enriched in 23 biological pro-cess GO terms including axon guidance and related functionsFunctions enriched in the I2 pattern were reported to be af-fected by AS of cell adhesion proteins (28ndash30) These proteins in-clude products of two cell adhesion genes from the L1 familyCHL1 and NRCAM and two semaphorin genes SEMA4D andSEMA6C which we identified to be alternatively spliced in ourstudy More generally among 33 genes containing I2 or D2 splic-ing changes on the human lineage and falling into the enrichedGO terms 29 showed significant splicing changes in at least one

A

B

Figure 5 Regulatory properties of the main splicing patterns (A) The proportion of splice sites with strength changes on the ape and rhesus macaque lineages The col-

ored and white sections of the bars show the strength of change consistent and inconsistent with the direction of exon inclusion frequency changes respectively

White asterisks within the colored sections of the bars indicate the significance of the colored proportions calculated using the binomial test (B) The proportion of reg-

ulatory motif turnover on the human and chimpanzee evolutionary lineages In (A) and (B) error bars show 95 confidence intervals The schematic exon-intron struc-

ture in the middle shows the positions of splice sites in (A) and evaluated regions in (B) The length of evaluated regions in introns and exons depicted in (B) were 200

and 39 nt respectively

1479Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

A

D

H

G

E

F

B C

Figure 6 Human-specific features of splicing evolution (A) The proportion of cassette exon sets showing species-specific splicing changes for each of the domi-

nant splicing patterns The proportions for human and chimpanzee lineages were calculated in the primate species comparison (upper panel) and by four species

comparison (lower panel) Exon set numbers are shown within the bars The error bars show 95 confidential intervals based on the binomial distribution The as-

terisks mark the significance of the ratio differences between the human and chimpanzee branches (one-tailed Fisherrsquos exact test P lt 005) (B) Comparison of PSI

values obtained based on RNA-seq and PCR measurements (C) The comparison of human-specific PSI changes (human minus the mean of chimpanzee and rhe-

sus) was calculated based on RNA-seq and PCR measurements In (B) and (C) one case showing inconsistent splicing patterns in macaques in PCR experiment

compared with RNA-seq data is shown as triangles (D)ndash(F) RNA-seq reads coverage (D) PCR results (E) and western blot results (F) for human-specific splicing in

the CEP63 gene In (D) coverage in three primates using visual cortex data is shown as blue blocks junction reads are shown as orange lines The y-axis indicates

the reads number Human gene annotation is shown at the bottom of the figure the differentially spliced exon (located at 930-1067nt within the coding region) is

colored in red In (E) PCR was conducted in visual cortex samples of three individuals for each of the three primate species In (F) western blot was conducted in

cerebellum samples from three individuals for each of the three primate species In both (E) and (F) the longer and shorter products corresponding to exon inclu-

sion and exclusion isoforms are detected in the upper and lower rows respectively (G) and (H) Characteristics of genes showing I2-type (G) and D2-type (H) splic-

ing changes on the human evolutionary lineage and the corresponding enriched GO terms The red color indicates the presence of the feature Bar plots show the

significance of each GO term after multiple test correction

1480 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

brain region and 19 included splicing changes that altered pro-tein sequences

DiscussionSplicing evolution includes a variety of processes includingbirth and death of novel isoforms as well as increases and de-creases in the frequency of existing ones We investigated eachof these processes separately by classifying differences in exoninclusion levels among four species into six evolutionary pat-terns This classification resulted in an unexpected observationdifferent splicing patterns substantially varied in their frequen-cies and potential functional implications

Among the six splicing patterns one representing the birthof novel isoforms by constitutive-to-alternative exon transitioncontained 60ndash65 of all detected splicing differences Despite itsprevalence our results suggest that this pattern represents nei-ther the primary driving force of long-term isoform repertoireevolution nor the source of novel gene functionality Isoformscreated within this pattern often result in reading frame disrup-tion are less conserved at the DNA sequence level do not clus-ter in any functional categories and are lost more rapidlyduring the course of evolution than isoform changes in theother splicing patterns Although the CDS regions of genes con-taining D1-splicing events are less conserved at the sequencelevel compared with other patterns (Fig 4B) they show thesame proportion of one-to-one orthologs in comparisons to dis-tant species chicken and frog (Supplementary Material Fig S6)This result argues against the reduced functionality of genescontaining D1-splicing events Instead our observation ofhigher synonymous substitution rates (Ks) but not non-synonymous substitution rates (Ka) in D1 pattern exons(Fig 4C) indicates that the lower CDS conservation of D1 patterngenes might be due to a higher local mutation rate Accordinglya higher local mutation rate might lead to overrepresentation ofsplice sites mutations and mutations affecting cis-factor bindingcites associated with D1 pattern events (Fig 5) Our analysisalso suggests that constitutive-to-alternative splicing changesmight be primarily driven by mutations within proximal splicesites and splicing factor binding motifs thereby decreasing theirefficiency Overall these observations strongly imply the neu-tral or deleterious nature of splicing events leading toconstitutive-to-alternative exon transition Nonetheless wecannot exclude that perhaps a few of the novel alternative iso-forms contained in this pattern may gain new functionality dur-ing evolution

The majority of the remaining splicing events (22ndash27 of thetotal) does not affect isoform numbers but instead changes theirrelative frequencies These alternative isoforms tend to pre-serve protein reading frames are highly conserved at the DNAsequence level but tend to show high KaKs ratios indicating anacceleration of protein sequence evolution and tend to be pre-served by evolution We further identified mutations in trans-regulator binding sites as a potential regulatory mechanismdriving this type of splicing change These observations suggestthat changes in inclusion frequencies of existing isoforms rep-resent functional evolutionary transitions

Changes in the inclusion frequency of existing isoformsmight be particularly relevant for evolution of the human brainOne of the two patterns constituting this group of events theincrease in isoform inclusion levels showed a 2-fold accelera-tion on the human evolutionary lineage when compared withthe chimpanzee lineage This acceleration particularly affectedthe isoform repertoire in the brain Furthermore despite the

relatively small number of observed splicing events both pat-terns clustered significantly in biological processes relevant tobrain function

Due to the limited numbers of detected events we were un-able to provide a detailed characterization of the remainingthree evolutionary splicing patterns However this omissiondoes not mean that the splicing processes described by thesepatterns do not play a role in evolution For instance novelexon birth is one of the essential mechanisms leading to theemergence of new gene functions (31ndash33) Another limitation ofour study is the small number of examined tissues As largerdata sets representing a wider selection of species tissues andevolutionary distances appear categorizing splicing events intodiscrete evolutionary patterns might lead to novel insights intothe evolutionary mechanisms and functional consequences ofsplicing changes

Materials and MethodsSamples

The RNA-seq data used in this study were taken from the litera-ture (19) (Ensembl ID GSE49379 Supplementary MaterialTable S1) Selected human chimpanzee and rhesus macaquesamples used in the RNA-seq experiment were also utilized inthe PCR validation experiments as labeled in SupplementaryMaterial Table S1

For western blot human samples were obtained from theNational Institute of Child Health and Human Development(NICHD) Brain and Tissue Bank for Developmental Disorders atthe University of Maryland Written consent for the use of hu-man tissues for research was obtained from all donors or theirnext of kin All subjects were defined as healthy controls by fo-rensic pathologists at the corresponding tissue bank All sub-jects suffered sudden death with no prolonged agonal stateChimpanzee samples were obtained from the BiomedicalPrimate Research Centre The Netherlands Rhesus macaquesamples were obtained from the Suzhou Experimental AnimalCenter China All nonhuman primates used in this study suf-fered sudden deaths for reasons other than their participationin this study and without any relation to the tissue used Thecerebellum samples were dissected from the lateral part of thecerebellar hemispheres

Reads mapping

About 187 million 1 100 nt single-end RNA-seq reads weremapped to human chimpanzee rhesus macaque and mousereference genomes (genome assemble version hg19 panTro3rheMac2 and mm9) using the RNA-seq read mapping softwareTophat (v203 httptophatcbcbumdedu) (34) In the firstround of mapping we detected de-novo junctions in each sam-ple A total of 134 billion reads including 305 million junctionreads were successfully mapped to reference genomes(mapped proportion 72 Supplementary Material Table S2)We combined all detected junctions after we transferred theircoordinates to other species using liftOver (httpsgenomeucsceducgi-binhgLiftOver) The union of all junctions was in-putted in the second round of mapping in order to equalizejunction detection sensitivity in all species Considering thathuman genome annotations are more complete than those forchimpanzees and macaques here we did not use public genestructure annotations in order to avoid analysis bias

1481Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 5: Predominant patterns of splicing evolution on human ...

Surprisingly despite the relatively high sequence conserva-tion of D2 and I2 exons they displayed higher KaKs ratioswhen compared with D1 or control exon groups (Fig 4C) Thisfinding parallels observations of elevated KaKs ratios in alter-natively spliced exons of mammals suggesting a stronger direc-tional selection pressure on these regions (2425)

Regulatory mechanisms of the main splicing patterns

The different evolutionary properties of the D1 and D2I2 pat-terns suggest differences in their regulatory mechanisms To as-sess this we examined the extent of DNA sequence changesaffecting splice site strength as well as the presence of regula-tory motifs in the vicinity of splice sites

D1 pattern events indeed contained more sequence changeswithin the upstream intron region adjacent to the 30 splice sitecompared with control exons (Fisher exact test Plt 005) (Fig 5A)Furthermore the resulting changes in splice site strength coin-cided with changes in exon inclusion frequency (binomial test

Plt 0001) This connection between splice site mutation and exoninclusion frequency changes is consistent with the results of aprevious comparison between humans and mice (26) Howeverthe upstream intron region flanking the 30 splice site in the D2and I2 patterns shows fewer sequence changes compared withcontrol exons (Fisher exact test Plt 00005) (Fig 5A) Still for thefew observed sequence changes the resulting decrease or in-crease in splice site strength also coincided with the change inexon inclusion frequency (binomial test Plt 005) These resultsindicate that mutations within the upstream intron affecting 30

splice site strength contribute substantially to splicing changes inall three of the dominant patterns

The birth and death frequency of regulatory splicing motifsnear junctions flanking alternatively spliced segments also var-ied among the three dominant splicing patterns Both D1 andD2 patterns had a higher proportion of sequence changes lead-ing to regulatory motif turnover in the 5rsquo-end region of alterna-tive exons compared with control exons (Fisher exact testPlt 005) (Fig 5B) For the D1 pattern further changes affectingregulatory motif turnover occurred in the proximal and distant

A B D

C

Figure 3 Evolutionary properties of the main splicing patterns (A) Schematic diagram of the evolutionary tree of the three primate species The same lineage color-

coding is used in (B) and (C) MY millions of years (B) The number of gene segments within the I2 and D2 patterns (I2 thorn D2) or within the D1 pattern in four-species

(C) The relationship between species divergence and the number of cassette exon sets for the I2 thorn D2 patterns and the D1 pattern In the D1 correlation plot the dots

representing human and chimpanzee values largely overlap (D) Estimated birth rate (x-axis) and death rate (y-axis) of splicing events in the I2 thorn D2 patterns (red) and

the D1 pattern (blue) White crosses show the maximum likelihood estimates the dark and light color areas show the 95 and 99 confidence intervals respectively

A B C

Figure 4 Sequence conservation properties of the main splicing patterns (A) The proportion of exons preserving the open reading frame in the dominant evolutionary

splicing patterns Red and green asterisks respectively show the significance of higher and lower ratios calculated in comparison with control exons (CNTL) using the

two-tailed Fisherrsquos exact test (B) The distribution of phastCons scores in the dominant evolutionary splicing patterns for gene segments of differently spliced regions

(blue) and for whole coding regions (orange) In both assays the D1 pattern shows significantly lower sequence conservation for both spliced exons (Plt00001) and cod-

ing regions (Plt0001) compared with any other pattern (two-tailed Wilcoxon test) (C) The rates of synonymous (Ks) and nonsynonymous (Ka) substitutions on the hu-

man and chimpanzee evolutionary lineages In (A) and (C) the error bars show the 95 confidence intervals

1478 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

positions of upstream introns a region that has been reportedas important for AS regulation (27) (Fisher exact test Plt 005)Finally no increase in regulatory pattern turnover was detectedfor the I2 pattern (Fig 5B) These observations indicate thatchanges in regulatory elements contribute differently to theevolution of the three dominant splicing patterns

Human-specific features of splicing evolution

In order to characterize features of splicing evolution specific tohumans we compared the frequencies of lineage-specific splic-ing events between the human and chimpanzee lineages Inagreement with the phylogenetic distances splicing changescorresponding to D1 and D2 patterns were distributed equallyon the human and chimpanzee lineages in all five tissuesConversely splicing changes corresponding to the I2 pattern oc-curred approximately two times more frequently on the humanlineage than on the chimpanzee lineage (one-sided Fisher exacttest Pfrac14 003) This difference was even greater for the exon setspresent in all four species (Fig 6A)

To further examine the authenticity of human-specific splic-ing changes we conducted additional semi-quantitative PCRexperiments We designed primer pairs for twelve out of 289human-specific splicing events detected using RNA-seq dataNine of these primer pairs effectively amplified splice segmentsin all three species From these nine splicing events eightshowed consistent differences in isoform concentrationsamong three primates coinciding with the differences observedin RNA-seq data in all three biological replicates (Fig 6B and CSupplementary Material Fig S5 and Table S5)

Among these eight splicing events five were predicted toaffect the protein sequence of the gene (Supplementary

Material Fig S5) To test whether these splicing changes led tothe expression of different protein isoforms in humans com-pared with chimpanzees and macaques we performed westernblot experiments using cerebellum samples from three humansthree chimpanzees and three macaques (SupplementaryMaterial Table S6) From the four antibodies tested only thepolyclonal antibody against CEP63 protein produced specificbands In agreement with our predictions based on the RNA-seqresults the western blot showed the presence of one CEP63 pro-tein isoform in the chimpanzee and macaque brains and twoprotein isoforms in the human brain (Fig 6DndashF)

To assess the potential functional significance of the identi-fied species-specific splicing changes we tested the enrichmentof splicing events from D1 D2 and I2 splicing patterns amongthe gene ontology (GO) functional categories Among all line-ages and patterns the human-specific I2 and D2 patternsshowed significant functional enrichment (FDRlt 005 Fig 6Gand H Supplementary Material Tables S7 and S8) Interestinglyalthough the enriched GO terms for I2 and D2 patterns did notoverlap in both cases they were mainly related to neuronalfunctionality The D2 pattern was enriched in seven cellularcomponent GO terms predominantly associated with synapticlocation while the I2 pattern was enriched in 23 biological pro-cess GO terms including axon guidance and related functionsFunctions enriched in the I2 pattern were reported to be af-fected by AS of cell adhesion proteins (28ndash30) These proteins in-clude products of two cell adhesion genes from the L1 familyCHL1 and NRCAM and two semaphorin genes SEMA4D andSEMA6C which we identified to be alternatively spliced in ourstudy More generally among 33 genes containing I2 or D2 splic-ing changes on the human lineage and falling into the enrichedGO terms 29 showed significant splicing changes in at least one

A

B

Figure 5 Regulatory properties of the main splicing patterns (A) The proportion of splice sites with strength changes on the ape and rhesus macaque lineages The col-

ored and white sections of the bars show the strength of change consistent and inconsistent with the direction of exon inclusion frequency changes respectively

White asterisks within the colored sections of the bars indicate the significance of the colored proportions calculated using the binomial test (B) The proportion of reg-

ulatory motif turnover on the human and chimpanzee evolutionary lineages In (A) and (B) error bars show 95 confidence intervals The schematic exon-intron struc-

ture in the middle shows the positions of splice sites in (A) and evaluated regions in (B) The length of evaluated regions in introns and exons depicted in (B) were 200

and 39 nt respectively

1479Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

A

D

H

G

E

F

B C

Figure 6 Human-specific features of splicing evolution (A) The proportion of cassette exon sets showing species-specific splicing changes for each of the domi-

nant splicing patterns The proportions for human and chimpanzee lineages were calculated in the primate species comparison (upper panel) and by four species

comparison (lower panel) Exon set numbers are shown within the bars The error bars show 95 confidential intervals based on the binomial distribution The as-

terisks mark the significance of the ratio differences between the human and chimpanzee branches (one-tailed Fisherrsquos exact test P lt 005) (B) Comparison of PSI

values obtained based on RNA-seq and PCR measurements (C) The comparison of human-specific PSI changes (human minus the mean of chimpanzee and rhe-

sus) was calculated based on RNA-seq and PCR measurements In (B) and (C) one case showing inconsistent splicing patterns in macaques in PCR experiment

compared with RNA-seq data is shown as triangles (D)ndash(F) RNA-seq reads coverage (D) PCR results (E) and western blot results (F) for human-specific splicing in

the CEP63 gene In (D) coverage in three primates using visual cortex data is shown as blue blocks junction reads are shown as orange lines The y-axis indicates

the reads number Human gene annotation is shown at the bottom of the figure the differentially spliced exon (located at 930-1067nt within the coding region) is

colored in red In (E) PCR was conducted in visual cortex samples of three individuals for each of the three primate species In (F) western blot was conducted in

cerebellum samples from three individuals for each of the three primate species In both (E) and (F) the longer and shorter products corresponding to exon inclu-

sion and exclusion isoforms are detected in the upper and lower rows respectively (G) and (H) Characteristics of genes showing I2-type (G) and D2-type (H) splic-

ing changes on the human evolutionary lineage and the corresponding enriched GO terms The red color indicates the presence of the feature Bar plots show the

significance of each GO term after multiple test correction

1480 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

brain region and 19 included splicing changes that altered pro-tein sequences

DiscussionSplicing evolution includes a variety of processes includingbirth and death of novel isoforms as well as increases and de-creases in the frequency of existing ones We investigated eachof these processes separately by classifying differences in exoninclusion levels among four species into six evolutionary pat-terns This classification resulted in an unexpected observationdifferent splicing patterns substantially varied in their frequen-cies and potential functional implications

Among the six splicing patterns one representing the birthof novel isoforms by constitutive-to-alternative exon transitioncontained 60ndash65 of all detected splicing differences Despite itsprevalence our results suggest that this pattern represents nei-ther the primary driving force of long-term isoform repertoireevolution nor the source of novel gene functionality Isoformscreated within this pattern often result in reading frame disrup-tion are less conserved at the DNA sequence level do not clus-ter in any functional categories and are lost more rapidlyduring the course of evolution than isoform changes in theother splicing patterns Although the CDS regions of genes con-taining D1-splicing events are less conserved at the sequencelevel compared with other patterns (Fig 4B) they show thesame proportion of one-to-one orthologs in comparisons to dis-tant species chicken and frog (Supplementary Material Fig S6)This result argues against the reduced functionality of genescontaining D1-splicing events Instead our observation ofhigher synonymous substitution rates (Ks) but not non-synonymous substitution rates (Ka) in D1 pattern exons(Fig 4C) indicates that the lower CDS conservation of D1 patterngenes might be due to a higher local mutation rate Accordinglya higher local mutation rate might lead to overrepresentation ofsplice sites mutations and mutations affecting cis-factor bindingcites associated with D1 pattern events (Fig 5) Our analysisalso suggests that constitutive-to-alternative splicing changesmight be primarily driven by mutations within proximal splicesites and splicing factor binding motifs thereby decreasing theirefficiency Overall these observations strongly imply the neu-tral or deleterious nature of splicing events leading toconstitutive-to-alternative exon transition Nonetheless wecannot exclude that perhaps a few of the novel alternative iso-forms contained in this pattern may gain new functionality dur-ing evolution

The majority of the remaining splicing events (22ndash27 of thetotal) does not affect isoform numbers but instead changes theirrelative frequencies These alternative isoforms tend to pre-serve protein reading frames are highly conserved at the DNAsequence level but tend to show high KaKs ratios indicating anacceleration of protein sequence evolution and tend to be pre-served by evolution We further identified mutations in trans-regulator binding sites as a potential regulatory mechanismdriving this type of splicing change These observations suggestthat changes in inclusion frequencies of existing isoforms rep-resent functional evolutionary transitions

Changes in the inclusion frequency of existing isoformsmight be particularly relevant for evolution of the human brainOne of the two patterns constituting this group of events theincrease in isoform inclusion levels showed a 2-fold accelera-tion on the human evolutionary lineage when compared withthe chimpanzee lineage This acceleration particularly affectedthe isoform repertoire in the brain Furthermore despite the

relatively small number of observed splicing events both pat-terns clustered significantly in biological processes relevant tobrain function

Due to the limited numbers of detected events we were un-able to provide a detailed characterization of the remainingthree evolutionary splicing patterns However this omissiondoes not mean that the splicing processes described by thesepatterns do not play a role in evolution For instance novelexon birth is one of the essential mechanisms leading to theemergence of new gene functions (31ndash33) Another limitation ofour study is the small number of examined tissues As largerdata sets representing a wider selection of species tissues andevolutionary distances appear categorizing splicing events intodiscrete evolutionary patterns might lead to novel insights intothe evolutionary mechanisms and functional consequences ofsplicing changes

Materials and MethodsSamples

The RNA-seq data used in this study were taken from the litera-ture (19) (Ensembl ID GSE49379 Supplementary MaterialTable S1) Selected human chimpanzee and rhesus macaquesamples used in the RNA-seq experiment were also utilized inthe PCR validation experiments as labeled in SupplementaryMaterial Table S1

For western blot human samples were obtained from theNational Institute of Child Health and Human Development(NICHD) Brain and Tissue Bank for Developmental Disorders atthe University of Maryland Written consent for the use of hu-man tissues for research was obtained from all donors or theirnext of kin All subjects were defined as healthy controls by fo-rensic pathologists at the corresponding tissue bank All sub-jects suffered sudden death with no prolonged agonal stateChimpanzee samples were obtained from the BiomedicalPrimate Research Centre The Netherlands Rhesus macaquesamples were obtained from the Suzhou Experimental AnimalCenter China All nonhuman primates used in this study suf-fered sudden deaths for reasons other than their participationin this study and without any relation to the tissue used Thecerebellum samples were dissected from the lateral part of thecerebellar hemispheres

Reads mapping

About 187 million 1 100 nt single-end RNA-seq reads weremapped to human chimpanzee rhesus macaque and mousereference genomes (genome assemble version hg19 panTro3rheMac2 and mm9) using the RNA-seq read mapping softwareTophat (v203 httptophatcbcbumdedu) (34) In the firstround of mapping we detected de-novo junctions in each sam-ple A total of 134 billion reads including 305 million junctionreads were successfully mapped to reference genomes(mapped proportion 72 Supplementary Material Table S2)We combined all detected junctions after we transferred theircoordinates to other species using liftOver (httpsgenomeucsceducgi-binhgLiftOver) The union of all junctions was in-putted in the second round of mapping in order to equalizejunction detection sensitivity in all species Considering thathuman genome annotations are more complete than those forchimpanzees and macaques here we did not use public genestructure annotations in order to avoid analysis bias

1481Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 6: Predominant patterns of splicing evolution on human ...

positions of upstream introns a region that has been reportedas important for AS regulation (27) (Fisher exact test Plt 005)Finally no increase in regulatory pattern turnover was detectedfor the I2 pattern (Fig 5B) These observations indicate thatchanges in regulatory elements contribute differently to theevolution of the three dominant splicing patterns

Human-specific features of splicing evolution

In order to characterize features of splicing evolution specific tohumans we compared the frequencies of lineage-specific splic-ing events between the human and chimpanzee lineages Inagreement with the phylogenetic distances splicing changescorresponding to D1 and D2 patterns were distributed equallyon the human and chimpanzee lineages in all five tissuesConversely splicing changes corresponding to the I2 pattern oc-curred approximately two times more frequently on the humanlineage than on the chimpanzee lineage (one-sided Fisher exacttest Pfrac14 003) This difference was even greater for the exon setspresent in all four species (Fig 6A)

To further examine the authenticity of human-specific splic-ing changes we conducted additional semi-quantitative PCRexperiments We designed primer pairs for twelve out of 289human-specific splicing events detected using RNA-seq dataNine of these primer pairs effectively amplified splice segmentsin all three species From these nine splicing events eightshowed consistent differences in isoform concentrationsamong three primates coinciding with the differences observedin RNA-seq data in all three biological replicates (Fig 6B and CSupplementary Material Fig S5 and Table S5)

Among these eight splicing events five were predicted toaffect the protein sequence of the gene (Supplementary

Material Fig S5) To test whether these splicing changes led tothe expression of different protein isoforms in humans com-pared with chimpanzees and macaques we performed westernblot experiments using cerebellum samples from three humansthree chimpanzees and three macaques (SupplementaryMaterial Table S6) From the four antibodies tested only thepolyclonal antibody against CEP63 protein produced specificbands In agreement with our predictions based on the RNA-seqresults the western blot showed the presence of one CEP63 pro-tein isoform in the chimpanzee and macaque brains and twoprotein isoforms in the human brain (Fig 6DndashF)

To assess the potential functional significance of the identi-fied species-specific splicing changes we tested the enrichmentof splicing events from D1 D2 and I2 splicing patterns amongthe gene ontology (GO) functional categories Among all line-ages and patterns the human-specific I2 and D2 patternsshowed significant functional enrichment (FDRlt 005 Fig 6Gand H Supplementary Material Tables S7 and S8) Interestinglyalthough the enriched GO terms for I2 and D2 patterns did notoverlap in both cases they were mainly related to neuronalfunctionality The D2 pattern was enriched in seven cellularcomponent GO terms predominantly associated with synapticlocation while the I2 pattern was enriched in 23 biological pro-cess GO terms including axon guidance and related functionsFunctions enriched in the I2 pattern were reported to be af-fected by AS of cell adhesion proteins (28ndash30) These proteins in-clude products of two cell adhesion genes from the L1 familyCHL1 and NRCAM and two semaphorin genes SEMA4D andSEMA6C which we identified to be alternatively spliced in ourstudy More generally among 33 genes containing I2 or D2 splic-ing changes on the human lineage and falling into the enrichedGO terms 29 showed significant splicing changes in at least one

A

B

Figure 5 Regulatory properties of the main splicing patterns (A) The proportion of splice sites with strength changes on the ape and rhesus macaque lineages The col-

ored and white sections of the bars show the strength of change consistent and inconsistent with the direction of exon inclusion frequency changes respectively

White asterisks within the colored sections of the bars indicate the significance of the colored proportions calculated using the binomial test (B) The proportion of reg-

ulatory motif turnover on the human and chimpanzee evolutionary lineages In (A) and (B) error bars show 95 confidence intervals The schematic exon-intron struc-

ture in the middle shows the positions of splice sites in (A) and evaluated regions in (B) The length of evaluated regions in introns and exons depicted in (B) were 200

and 39 nt respectively

1479Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

A

D

H

G

E

F

B C

Figure 6 Human-specific features of splicing evolution (A) The proportion of cassette exon sets showing species-specific splicing changes for each of the domi-

nant splicing patterns The proportions for human and chimpanzee lineages were calculated in the primate species comparison (upper panel) and by four species

comparison (lower panel) Exon set numbers are shown within the bars The error bars show 95 confidential intervals based on the binomial distribution The as-

terisks mark the significance of the ratio differences between the human and chimpanzee branches (one-tailed Fisherrsquos exact test P lt 005) (B) Comparison of PSI

values obtained based on RNA-seq and PCR measurements (C) The comparison of human-specific PSI changes (human minus the mean of chimpanzee and rhe-

sus) was calculated based on RNA-seq and PCR measurements In (B) and (C) one case showing inconsistent splicing patterns in macaques in PCR experiment

compared with RNA-seq data is shown as triangles (D)ndash(F) RNA-seq reads coverage (D) PCR results (E) and western blot results (F) for human-specific splicing in

the CEP63 gene In (D) coverage in three primates using visual cortex data is shown as blue blocks junction reads are shown as orange lines The y-axis indicates

the reads number Human gene annotation is shown at the bottom of the figure the differentially spliced exon (located at 930-1067nt within the coding region) is

colored in red In (E) PCR was conducted in visual cortex samples of three individuals for each of the three primate species In (F) western blot was conducted in

cerebellum samples from three individuals for each of the three primate species In both (E) and (F) the longer and shorter products corresponding to exon inclu-

sion and exclusion isoforms are detected in the upper and lower rows respectively (G) and (H) Characteristics of genes showing I2-type (G) and D2-type (H) splic-

ing changes on the human evolutionary lineage and the corresponding enriched GO terms The red color indicates the presence of the feature Bar plots show the

significance of each GO term after multiple test correction

1480 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

brain region and 19 included splicing changes that altered pro-tein sequences

DiscussionSplicing evolution includes a variety of processes includingbirth and death of novel isoforms as well as increases and de-creases in the frequency of existing ones We investigated eachof these processes separately by classifying differences in exoninclusion levels among four species into six evolutionary pat-terns This classification resulted in an unexpected observationdifferent splicing patterns substantially varied in their frequen-cies and potential functional implications

Among the six splicing patterns one representing the birthof novel isoforms by constitutive-to-alternative exon transitioncontained 60ndash65 of all detected splicing differences Despite itsprevalence our results suggest that this pattern represents nei-ther the primary driving force of long-term isoform repertoireevolution nor the source of novel gene functionality Isoformscreated within this pattern often result in reading frame disrup-tion are less conserved at the DNA sequence level do not clus-ter in any functional categories and are lost more rapidlyduring the course of evolution than isoform changes in theother splicing patterns Although the CDS regions of genes con-taining D1-splicing events are less conserved at the sequencelevel compared with other patterns (Fig 4B) they show thesame proportion of one-to-one orthologs in comparisons to dis-tant species chicken and frog (Supplementary Material Fig S6)This result argues against the reduced functionality of genescontaining D1-splicing events Instead our observation ofhigher synonymous substitution rates (Ks) but not non-synonymous substitution rates (Ka) in D1 pattern exons(Fig 4C) indicates that the lower CDS conservation of D1 patterngenes might be due to a higher local mutation rate Accordinglya higher local mutation rate might lead to overrepresentation ofsplice sites mutations and mutations affecting cis-factor bindingcites associated with D1 pattern events (Fig 5) Our analysisalso suggests that constitutive-to-alternative splicing changesmight be primarily driven by mutations within proximal splicesites and splicing factor binding motifs thereby decreasing theirefficiency Overall these observations strongly imply the neu-tral or deleterious nature of splicing events leading toconstitutive-to-alternative exon transition Nonetheless wecannot exclude that perhaps a few of the novel alternative iso-forms contained in this pattern may gain new functionality dur-ing evolution

The majority of the remaining splicing events (22ndash27 of thetotal) does not affect isoform numbers but instead changes theirrelative frequencies These alternative isoforms tend to pre-serve protein reading frames are highly conserved at the DNAsequence level but tend to show high KaKs ratios indicating anacceleration of protein sequence evolution and tend to be pre-served by evolution We further identified mutations in trans-regulator binding sites as a potential regulatory mechanismdriving this type of splicing change These observations suggestthat changes in inclusion frequencies of existing isoforms rep-resent functional evolutionary transitions

Changes in the inclusion frequency of existing isoformsmight be particularly relevant for evolution of the human brainOne of the two patterns constituting this group of events theincrease in isoform inclusion levels showed a 2-fold accelera-tion on the human evolutionary lineage when compared withthe chimpanzee lineage This acceleration particularly affectedthe isoform repertoire in the brain Furthermore despite the

relatively small number of observed splicing events both pat-terns clustered significantly in biological processes relevant tobrain function

Due to the limited numbers of detected events we were un-able to provide a detailed characterization of the remainingthree evolutionary splicing patterns However this omissiondoes not mean that the splicing processes described by thesepatterns do not play a role in evolution For instance novelexon birth is one of the essential mechanisms leading to theemergence of new gene functions (31ndash33) Another limitation ofour study is the small number of examined tissues As largerdata sets representing a wider selection of species tissues andevolutionary distances appear categorizing splicing events intodiscrete evolutionary patterns might lead to novel insights intothe evolutionary mechanisms and functional consequences ofsplicing changes

Materials and MethodsSamples

The RNA-seq data used in this study were taken from the litera-ture (19) (Ensembl ID GSE49379 Supplementary MaterialTable S1) Selected human chimpanzee and rhesus macaquesamples used in the RNA-seq experiment were also utilized inthe PCR validation experiments as labeled in SupplementaryMaterial Table S1

For western blot human samples were obtained from theNational Institute of Child Health and Human Development(NICHD) Brain and Tissue Bank for Developmental Disorders atthe University of Maryland Written consent for the use of hu-man tissues for research was obtained from all donors or theirnext of kin All subjects were defined as healthy controls by fo-rensic pathologists at the corresponding tissue bank All sub-jects suffered sudden death with no prolonged agonal stateChimpanzee samples were obtained from the BiomedicalPrimate Research Centre The Netherlands Rhesus macaquesamples were obtained from the Suzhou Experimental AnimalCenter China All nonhuman primates used in this study suf-fered sudden deaths for reasons other than their participationin this study and without any relation to the tissue used Thecerebellum samples were dissected from the lateral part of thecerebellar hemispheres

Reads mapping

About 187 million 1 100 nt single-end RNA-seq reads weremapped to human chimpanzee rhesus macaque and mousereference genomes (genome assemble version hg19 panTro3rheMac2 and mm9) using the RNA-seq read mapping softwareTophat (v203 httptophatcbcbumdedu) (34) In the firstround of mapping we detected de-novo junctions in each sam-ple A total of 134 billion reads including 305 million junctionreads were successfully mapped to reference genomes(mapped proportion 72 Supplementary Material Table S2)We combined all detected junctions after we transferred theircoordinates to other species using liftOver (httpsgenomeucsceducgi-binhgLiftOver) The union of all junctions was in-putted in the second round of mapping in order to equalizejunction detection sensitivity in all species Considering thathuman genome annotations are more complete than those forchimpanzees and macaques here we did not use public genestructure annotations in order to avoid analysis bias

1481Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 7: Predominant patterns of splicing evolution on human ...

A

D

H

G

E

F

B C

Figure 6 Human-specific features of splicing evolution (A) The proportion of cassette exon sets showing species-specific splicing changes for each of the domi-

nant splicing patterns The proportions for human and chimpanzee lineages were calculated in the primate species comparison (upper panel) and by four species

comparison (lower panel) Exon set numbers are shown within the bars The error bars show 95 confidential intervals based on the binomial distribution The as-

terisks mark the significance of the ratio differences between the human and chimpanzee branches (one-tailed Fisherrsquos exact test P lt 005) (B) Comparison of PSI

values obtained based on RNA-seq and PCR measurements (C) The comparison of human-specific PSI changes (human minus the mean of chimpanzee and rhe-

sus) was calculated based on RNA-seq and PCR measurements In (B) and (C) one case showing inconsistent splicing patterns in macaques in PCR experiment

compared with RNA-seq data is shown as triangles (D)ndash(F) RNA-seq reads coverage (D) PCR results (E) and western blot results (F) for human-specific splicing in

the CEP63 gene In (D) coverage in three primates using visual cortex data is shown as blue blocks junction reads are shown as orange lines The y-axis indicates

the reads number Human gene annotation is shown at the bottom of the figure the differentially spliced exon (located at 930-1067nt within the coding region) is

colored in red In (E) PCR was conducted in visual cortex samples of three individuals for each of the three primate species In (F) western blot was conducted in

cerebellum samples from three individuals for each of the three primate species In both (E) and (F) the longer and shorter products corresponding to exon inclu-

sion and exclusion isoforms are detected in the upper and lower rows respectively (G) and (H) Characteristics of genes showing I2-type (G) and D2-type (H) splic-

ing changes on the human evolutionary lineage and the corresponding enriched GO terms The red color indicates the presence of the feature Bar plots show the

significance of each GO term after multiple test correction

1480 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

brain region and 19 included splicing changes that altered pro-tein sequences

DiscussionSplicing evolution includes a variety of processes includingbirth and death of novel isoforms as well as increases and de-creases in the frequency of existing ones We investigated eachof these processes separately by classifying differences in exoninclusion levels among four species into six evolutionary pat-terns This classification resulted in an unexpected observationdifferent splicing patterns substantially varied in their frequen-cies and potential functional implications

Among the six splicing patterns one representing the birthof novel isoforms by constitutive-to-alternative exon transitioncontained 60ndash65 of all detected splicing differences Despite itsprevalence our results suggest that this pattern represents nei-ther the primary driving force of long-term isoform repertoireevolution nor the source of novel gene functionality Isoformscreated within this pattern often result in reading frame disrup-tion are less conserved at the DNA sequence level do not clus-ter in any functional categories and are lost more rapidlyduring the course of evolution than isoform changes in theother splicing patterns Although the CDS regions of genes con-taining D1-splicing events are less conserved at the sequencelevel compared with other patterns (Fig 4B) they show thesame proportion of one-to-one orthologs in comparisons to dis-tant species chicken and frog (Supplementary Material Fig S6)This result argues against the reduced functionality of genescontaining D1-splicing events Instead our observation ofhigher synonymous substitution rates (Ks) but not non-synonymous substitution rates (Ka) in D1 pattern exons(Fig 4C) indicates that the lower CDS conservation of D1 patterngenes might be due to a higher local mutation rate Accordinglya higher local mutation rate might lead to overrepresentation ofsplice sites mutations and mutations affecting cis-factor bindingcites associated with D1 pattern events (Fig 5) Our analysisalso suggests that constitutive-to-alternative splicing changesmight be primarily driven by mutations within proximal splicesites and splicing factor binding motifs thereby decreasing theirefficiency Overall these observations strongly imply the neu-tral or deleterious nature of splicing events leading toconstitutive-to-alternative exon transition Nonetheless wecannot exclude that perhaps a few of the novel alternative iso-forms contained in this pattern may gain new functionality dur-ing evolution

The majority of the remaining splicing events (22ndash27 of thetotal) does not affect isoform numbers but instead changes theirrelative frequencies These alternative isoforms tend to pre-serve protein reading frames are highly conserved at the DNAsequence level but tend to show high KaKs ratios indicating anacceleration of protein sequence evolution and tend to be pre-served by evolution We further identified mutations in trans-regulator binding sites as a potential regulatory mechanismdriving this type of splicing change These observations suggestthat changes in inclusion frequencies of existing isoforms rep-resent functional evolutionary transitions

Changes in the inclusion frequency of existing isoformsmight be particularly relevant for evolution of the human brainOne of the two patterns constituting this group of events theincrease in isoform inclusion levels showed a 2-fold accelera-tion on the human evolutionary lineage when compared withthe chimpanzee lineage This acceleration particularly affectedthe isoform repertoire in the brain Furthermore despite the

relatively small number of observed splicing events both pat-terns clustered significantly in biological processes relevant tobrain function

Due to the limited numbers of detected events we were un-able to provide a detailed characterization of the remainingthree evolutionary splicing patterns However this omissiondoes not mean that the splicing processes described by thesepatterns do not play a role in evolution For instance novelexon birth is one of the essential mechanisms leading to theemergence of new gene functions (31ndash33) Another limitation ofour study is the small number of examined tissues As largerdata sets representing a wider selection of species tissues andevolutionary distances appear categorizing splicing events intodiscrete evolutionary patterns might lead to novel insights intothe evolutionary mechanisms and functional consequences ofsplicing changes

Materials and MethodsSamples

The RNA-seq data used in this study were taken from the litera-ture (19) (Ensembl ID GSE49379 Supplementary MaterialTable S1) Selected human chimpanzee and rhesus macaquesamples used in the RNA-seq experiment were also utilized inthe PCR validation experiments as labeled in SupplementaryMaterial Table S1

For western blot human samples were obtained from theNational Institute of Child Health and Human Development(NICHD) Brain and Tissue Bank for Developmental Disorders atthe University of Maryland Written consent for the use of hu-man tissues for research was obtained from all donors or theirnext of kin All subjects were defined as healthy controls by fo-rensic pathologists at the corresponding tissue bank All sub-jects suffered sudden death with no prolonged agonal stateChimpanzee samples were obtained from the BiomedicalPrimate Research Centre The Netherlands Rhesus macaquesamples were obtained from the Suzhou Experimental AnimalCenter China All nonhuman primates used in this study suf-fered sudden deaths for reasons other than their participationin this study and without any relation to the tissue used Thecerebellum samples were dissected from the lateral part of thecerebellar hemispheres

Reads mapping

About 187 million 1 100 nt single-end RNA-seq reads weremapped to human chimpanzee rhesus macaque and mousereference genomes (genome assemble version hg19 panTro3rheMac2 and mm9) using the RNA-seq read mapping softwareTophat (v203 httptophatcbcbumdedu) (34) In the firstround of mapping we detected de-novo junctions in each sam-ple A total of 134 billion reads including 305 million junctionreads were successfully mapped to reference genomes(mapped proportion 72 Supplementary Material Table S2)We combined all detected junctions after we transferred theircoordinates to other species using liftOver (httpsgenomeucsceducgi-binhgLiftOver) The union of all junctions was in-putted in the second round of mapping in order to equalizejunction detection sensitivity in all species Considering thathuman genome annotations are more complete than those forchimpanzees and macaques here we did not use public genestructure annotations in order to avoid analysis bias

1481Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 8: Predominant patterns of splicing evolution on human ...

brain region and 19 included splicing changes that altered pro-tein sequences

DiscussionSplicing evolution includes a variety of processes includingbirth and death of novel isoforms as well as increases and de-creases in the frequency of existing ones We investigated eachof these processes separately by classifying differences in exoninclusion levels among four species into six evolutionary pat-terns This classification resulted in an unexpected observationdifferent splicing patterns substantially varied in their frequen-cies and potential functional implications

Among the six splicing patterns one representing the birthof novel isoforms by constitutive-to-alternative exon transitioncontained 60ndash65 of all detected splicing differences Despite itsprevalence our results suggest that this pattern represents nei-ther the primary driving force of long-term isoform repertoireevolution nor the source of novel gene functionality Isoformscreated within this pattern often result in reading frame disrup-tion are less conserved at the DNA sequence level do not clus-ter in any functional categories and are lost more rapidlyduring the course of evolution than isoform changes in theother splicing patterns Although the CDS regions of genes con-taining D1-splicing events are less conserved at the sequencelevel compared with other patterns (Fig 4B) they show thesame proportion of one-to-one orthologs in comparisons to dis-tant species chicken and frog (Supplementary Material Fig S6)This result argues against the reduced functionality of genescontaining D1-splicing events Instead our observation ofhigher synonymous substitution rates (Ks) but not non-synonymous substitution rates (Ka) in D1 pattern exons(Fig 4C) indicates that the lower CDS conservation of D1 patterngenes might be due to a higher local mutation rate Accordinglya higher local mutation rate might lead to overrepresentation ofsplice sites mutations and mutations affecting cis-factor bindingcites associated with D1 pattern events (Fig 5) Our analysisalso suggests that constitutive-to-alternative splicing changesmight be primarily driven by mutations within proximal splicesites and splicing factor binding motifs thereby decreasing theirefficiency Overall these observations strongly imply the neu-tral or deleterious nature of splicing events leading toconstitutive-to-alternative exon transition Nonetheless wecannot exclude that perhaps a few of the novel alternative iso-forms contained in this pattern may gain new functionality dur-ing evolution

The majority of the remaining splicing events (22ndash27 of thetotal) does not affect isoform numbers but instead changes theirrelative frequencies These alternative isoforms tend to pre-serve protein reading frames are highly conserved at the DNAsequence level but tend to show high KaKs ratios indicating anacceleration of protein sequence evolution and tend to be pre-served by evolution We further identified mutations in trans-regulator binding sites as a potential regulatory mechanismdriving this type of splicing change These observations suggestthat changes in inclusion frequencies of existing isoforms rep-resent functional evolutionary transitions

Changes in the inclusion frequency of existing isoformsmight be particularly relevant for evolution of the human brainOne of the two patterns constituting this group of events theincrease in isoform inclusion levels showed a 2-fold accelera-tion on the human evolutionary lineage when compared withthe chimpanzee lineage This acceleration particularly affectedthe isoform repertoire in the brain Furthermore despite the

relatively small number of observed splicing events both pat-terns clustered significantly in biological processes relevant tobrain function

Due to the limited numbers of detected events we were un-able to provide a detailed characterization of the remainingthree evolutionary splicing patterns However this omissiondoes not mean that the splicing processes described by thesepatterns do not play a role in evolution For instance novelexon birth is one of the essential mechanisms leading to theemergence of new gene functions (31ndash33) Another limitation ofour study is the small number of examined tissues As largerdata sets representing a wider selection of species tissues andevolutionary distances appear categorizing splicing events intodiscrete evolutionary patterns might lead to novel insights intothe evolutionary mechanisms and functional consequences ofsplicing changes

Materials and MethodsSamples

The RNA-seq data used in this study were taken from the litera-ture (19) (Ensembl ID GSE49379 Supplementary MaterialTable S1) Selected human chimpanzee and rhesus macaquesamples used in the RNA-seq experiment were also utilized inthe PCR validation experiments as labeled in SupplementaryMaterial Table S1

For western blot human samples were obtained from theNational Institute of Child Health and Human Development(NICHD) Brain and Tissue Bank for Developmental Disorders atthe University of Maryland Written consent for the use of hu-man tissues for research was obtained from all donors or theirnext of kin All subjects were defined as healthy controls by fo-rensic pathologists at the corresponding tissue bank All sub-jects suffered sudden death with no prolonged agonal stateChimpanzee samples were obtained from the BiomedicalPrimate Research Centre The Netherlands Rhesus macaquesamples were obtained from the Suzhou Experimental AnimalCenter China All nonhuman primates used in this study suf-fered sudden deaths for reasons other than their participationin this study and without any relation to the tissue used Thecerebellum samples were dissected from the lateral part of thecerebellar hemispheres

Reads mapping

About 187 million 1 100 nt single-end RNA-seq reads weremapped to human chimpanzee rhesus macaque and mousereference genomes (genome assemble version hg19 panTro3rheMac2 and mm9) using the RNA-seq read mapping softwareTophat (v203 httptophatcbcbumdedu) (34) In the firstround of mapping we detected de-novo junctions in each sam-ple A total of 134 billion reads including 305 million junctionreads were successfully mapped to reference genomes(mapped proportion 72 Supplementary Material Table S2)We combined all detected junctions after we transferred theircoordinates to other species using liftOver (httpsgenomeucsceducgi-binhgLiftOver) The union of all junctions was in-putted in the second round of mapping in order to equalizejunction detection sensitivity in all species Considering thathuman genome annotations are more complete than those forchimpanzees and macaques here we did not use public genestructure annotations in order to avoid analysis bias

1481Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 9: Predominant patterns of splicing evolution on human ...

PSI calculation

To minimize interspecific artifacts caused by length or mappingefficiency changes inside exons we only used junction reads forthe PSI value calculation The PSI value is equal to the propor-tion of inclusion reads among the total reads For example let[L R] denote the exclusion junction ie the junction supportingthe exon-exclusive isoform where L and R represent the posi-tions of its left (50-side) and right (30-side) splice sites respec-tively Also let [L M1] and [M2 R] denote its left and rightinclusion junctions ie the junction supporting the exon-inclusive isoform where M1 and M2 are two reference positionsbetween L and R In the event of a single cassette exon M1 andM2 are the two ends of this exon For each species the PSI valueof this gene segment (one or several adjacent cassette exons) intissue t was calculated as follows

PSIt frac14It

It thorn Et

It frac1412

XiM1

it LM1frac12 thornXiM2

it M2Rfrac12

0

1A

Et frac14X

i

itfrac12LR

where it LM1frac12 denotes the observed read number of this junc-tion in individual i and tissue t and It and Et denote the inclu-sion and exclusion junction read number respectively Lastlyonly splicing events that satisfied the criteria below were usedfor further analysis

1) Must have at least one inclusion or exclusion junction read inevery primate individual for three-species comparison analy-sis or one read in every individual for four-species comparisonanalysis in any tissue We did not require that both inclusionand exclusion reads be detected at the same time in order toallow for the identification of species-specific isoforms

2) In at least one species-tissue combination there must notbe less than four individuals with a left inclusion read andalso no fewer than four individuals with a right inclusionread in at least in one species-tissue combination theremust not be less than four individuals that have an exclu-sion read Additionally we filtered out splicing events withPSI lt 01 or PSI gt 09 in all species-tissue combinations toremove low-abundance alternative isoforms (12)

3) BothP

i it LM1frac12 andP

i it M2Rfrac12 should be larger than thenumber of non-junction reads covering the left or right splicesite ie the retention ratio of either the left or right intronshould be less than 05

Pi it LM1frac12 should not be less than

15 ofP

i it M2Rfrac12 and vice versa This criterion aims to ex-clude events that clearly indicate a mixed splicing type includ-ing intron retention alternative donor sites or alternativeacceptor sites

4) Both for inclusion and exclusion reads the multiple-mapped read number should be less than 15 This crite-rion can alleviate the bias that arises when a splicing regionhas a unique nucleotide sequence in the genome of onespecies but is duplicated in the genome of another species

Detecting lineage-specific spliced gene segments

We first required that the splice sites of analyzed cassette exonsets should be conserved among all compared species and that

the genes should be expressed in all individuals of these species Inthe primate comparison assay rhesus macaque was regarded asthe outgroup in order to detect splicing changes that occurred onthe human and chimpanzee lineages In the four-species compari-son assay mouse was regarded as the outgroup in order to detectsplicing changes that occurred on the ape and rhesus macaque lin-eages All comparisons were done separately in each tissue

In the primate comparison assay a quasi-binomial general-ized linear model was applied based on inclusion and exclusionread numbers using the glm function in R v2 test P-values werecalculated using the species term of this model For multiple-test correction 1000 permutations were performed by randomlyshuffling the species labels We used 005 as the false discoveryrate cutoff We further required that the differences betweenthe maximum and minimum PSI values among the three pri-mate species be larger than 02 The cassette exon sets that metthese criteria were considered as differentially spliced in pri-mates Next we classified each differentially spliced exon set ashuman lineage-specific chimpanzee lineage-specific or otherin each of the five tissues For example we regarded min

H Cj j H Rj jeth THORN gt 2 C Rj j as human lineage-specific and re-garded min H Cj j C Rj jeth THORN gt 2 H Rj j as chimpanzee lineage-specific where H C and R denote the PSI values in humanchimpanzee and rhesus macaque respectively

In the four-species comparison assay we also used the quasi-binomnial generalized linear model test to detect divergent cassetteexon sets followed by 1000 permutations for multiple-test correc-tion with a 02 PSI change cutoff To further detect splicing changesspecific to the rhesus macaque and ape lineages we compared thePSI values of mouse (M) macaque (R) and ape (ie the mean valueof human and chimpanzee) (A) We required that the inclusionlevel on the mouse lineage should be consistent with the inclusionlevel either on the macaque or ape lineages (ie MAj j lt 02 orM Rj j lt 02) Under this condition cassette exon sets that satis-

fied min M Aj j RAj jeth THORN gt 2 M Rj jwere regarded as ape lineage-specific while the cassette exon sets that satisfied min

M Rj j RAj jeth THORN gt 2 MAj j were regarded as rhesus macaquelineage-specific In the birth-and-death analysis of splicing patterns(Fig 3BndashD) equivalent criteria min M Hj j H 1

2 Cthorn Reth THORN

gt 2M 1

2 Cthorn Reth THORN and min M Cj j C 1

2 Hthorn Reth THORN

gt 2 M 12 Hthorn Reth THORN

were applied to human and chimpanzee lineage-specific cassetteexon sets in order to make the numbers of changed cassette exonsets comparable among the four lineages

For the convenience of comprehensively analyzing splicingchanges we chose a representative tissue from the five tissuesfor each changed cassette exon set The representative tissuechosen had the smallest P-value of interspecific PSI changeamong the tissues whose splice change lineage assignment wasthe most common For example if a cassette exon set has P-val-ues 001 0005 003 005 005 in muscle kidney prefrontal-cortex visual-cortex and cerebellum tissues specificallyspliced in human macaque human human unassigned line-ages respectively then the representative tissue is musclesince its most common splice change lineage assignment is hu-man and muscle has the lowest P-value among them (001)

The remaining conserved cassette exon sets with PSIchanges below 02 among three primates in all tissues wereused as a control group on further analysis

Defining the six splicing change patterns

We compared PSI values of lineage-specific cassette exon setsin the changed lineage against the other two (primate assay) orthree (four-species assay) unchanged lineages For each exon

1482 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 10: Predominant patterns of splicing evolution on human ...

set if its PSI value on the changed lineage was below 003 orabove 097 it was assigned to the D3 or I3 pattern respectivelyif its PSI value on any of the unchanged lineages was below 003or above 097 it was assigned to the I1 or D2 pattern respec-tively otherwise it was assigned to the I2 or D2 pattern depend-ing on the direction of change

Birth and death rate calculation for the D1 and I2thornD2patterns

Suppose that in a unit of evolutionary time eg a million yearsthe probability of a cassette exon set whose splicing changes ina specific pattern (pattern birth) is p1 and for a cassette exon setwhose splicing formerly changed the probability of a reversechange (pattern death) is p2 For the human chimpanzee andrhesus macaque lineages after a time period t the expectedcassette exon set number changed on lineage s (ms) in a givenpattern can be described by the differential equation as follows

ms frac14 np1 msp2

mstfrac140 frac14 0

s 2 human chimpanzee macaquef g

where n is the number of total alternatively spliced cassetteexon sets Solving this equation we get the proportion q of cas-sette exon sets with detectable splicing changes as a function oft as follows

q sp1 p2eth THORN frac14 f teth THORN frac14 ms

nfrac14 p1 p1ep2t

p2

s 2 human chimpanzee and macaquef g

For simplicity we did not consider the case of parallel evolu-tion as its probability is extremely low

Next we built a model for the ape lineage Let t1 denote thetime range of the ape lineage (the green line in Fig 3A) and t2

denote the time range since the human and chimpanzee line-ages separated A pattern specific to the ape lineage can both beborn and die during t1 but can only die during t2 since patterndeath occurring in either the human or chimpanzee lineageswould make the ape-specific change undetectable The ex-pected number of cassette exon sets specifically spliced in theape lineage mape can be described according to the differentialequation as follows

mape frac14 2mapep2

mapet2frac140 frac14 n f etht1THORN

Solving this equation we also get the proportion q of cassetteexon sets with splicing changes detectable in the ape lineage as

qethsfrac14apep1p2THORN frac14mape

nfrac14 p1e2p2t2 ethep2t1 1THORN

p2

For all four lineages it is reasonable to assume that the ac-tual observed number of cassette exon set changes Ms in lineages follows a binominal distribution

Ms Bethn qethsp1 p2THORNTHORN

Therefore the likelihood for given p1 and p2 is

Pr Mjn p1 p2eth THORN frac14Y

s

PrethMsjBethn qethsp1 p2THORNTHORNTHORN

To find p1 and p2 with the highest likelihood we calculatedPr Mjnp1p2eth THORN in a 1000 1000 Cartesian space in the range ofp1 frac14 [0 0003] p2 frac14 [0 01] for the I2thornD2 patterns and p1 frac14[0 0015] p2 frac14 [0 025] for the D2 pattern As the likelihood for p1

and p2 in the boundaries of these spaces was extremely smallwe ignored the likelihoods outside of this area In Figure 3D wealso show the high-likelihood sub-areas for the confidence in-terval The likelihoods of these sub-areas are 95 and 99 ofthe sum of the total likelihood respectively

KaKs analysis

For this analysis we used coding regions of gene segments spe-cifically spliced on human and chimpanzee lineages The se-quences of these regions were compared with a primateconsensus genome in order to find out the nucleotide mutationson either the human or chimpanzee lineages The numbers ofsynonymous and non-synonymous mutations were calculatedusing the program yn00 in PLAM (v47a httpabacusgeneuclacuksoftwarepamlhtml) (35)

Splice site analysis

We used Splice Site Tools (httpibistauacilssatSpliceSiteFramehtm) to assess the strength of splice sites The 50 splicesite strength was calculated based on nine nucleotides from po-sition 3 tothorn6 while the 30 splice site strength was calculatedbased on 15 nucleotides from position 14 tothorn1 We only usedsplice sites flanking cassette exon sets with consistentstrengths between human and chimpanzee lineages and com-pared them with the rhesus macaque lineage We used a totalof 128 D1 exon sets specifically spliced on either the rhesus ma-caque or ape lineage which were detected in the four-speciescomparison To get sufficient I2thornD2 exon sets for statisticalanalysis we used 178 cassette exon sets which were alterna-tively spliced in all three primates (003ltPSIlt 097) with simi-lar inclusion levels in the human and chimpanzee but differentinclusion levels from the macaque (assessed using similar crite-ria as earlier) These exon sets had I2 and D2 splicing changeson either the rhesus macaque or ape lineages The other 1026cassette exon sets with PSI changes below 02 among three pri-mates were used as a control group

RNA-binding motif analysis

Sequences of 329 RNA-binding motifs from 52 splicing factors inhumans were downloaded from SpliceAid database (httpwwwintroniitsplicinghtml) (36) To check for the existence of

1483Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 11: Predominant patterns of splicing evolution on human ...

these motifs in the three primate species we screened six re-gions of each cassette exon including 200nt from the proximaland distal ends of adjacent introns (with an intronlengthgt300nt) and 39nt from the ends of the cassette exon(with an exon lengthgt60nt) We detected motif changes (turn-on and turn-off) on both the human and chimpanzee lineagesusing rhesus macaque as an outgroup

GO enrichment analysis

We tested GO enrichment for each combination of the threedominant patterns (D1 I2 and D2) and three primate lineagesGenes with specific splicing changes in the tested dominantpatterns and primate lineages in any of five tissues were usedFor the I2 and D2 patterns we excluded marginal cases by re-quiring that the minority isoform in any primate species shouldbe greater than 10 For each test a background gene set wasrandomly chosen from the cassette-exon-containing genes tomake sure that the background gene set had a similar expres-sion distribution as the test gene set GO enrichments weretested using the R package topGO (37) with lsquoclassicrsquo algorithmand lsquoFisher exact testrsquo followed by the Benjamini-Hochbergmultiple test correction with significant cutoff FDRfrac14 005 Tomake the results more concise if two GO terms from father andchild nodes were both significant with the exact same matchedgenes only the child term was reported The enriched ontolo-gies and their significances are shown in SupplementaryMaterial Tables S7 and S8

PCR confirmation

PCR primers were designed at the two neighboring constitutiveexons of spliced gene segments in order to detect longer (exon-in-cluded) and shorter (exon-excluded) products (SupplementaryMaterial Table S5) First-strand cDNA was synthesized usingSuperScript II reverse transcriptase (Invitrogen) After doublestrand cDNA synthesis PCR reactions were performed at 94C for2 min followed by 30 cycles of 95C for 20s 47ndash58C (depending onthe primers) for 20s and 72C for 30s and complete elongation at72C for 5 min with rTaq DNA polymerase (TAKARA) Each PCRproduct was verified in a 2 (wv) agarose gel with size estima-tion based on the DL 2000 DNA marker (Takara) or verified byDNA 1000 chip (Agilent) as the protocol suggested The PCR PSIvalues were calculated as the mean proportion of the longerproduct strength between the two expected products To validateprimer efficiency in all species we first tested 12 cassette exonsets in three PCRs (3 species 1 tissue 1 individuals) Nine pro-duced discernable PCR bands with sizes corresponding to twopredicted alternative transcript isoforms Out of the nine casesone showed inconsistent band intensities in the rhesus macaquefor unknown reasons The other eight cases were further ex-panded for a total of nine PCRs (3 species 1 tissue 3 individ-uals) All repeat experiments showed band intensities consistentwith our RNA-seq results (Supplementary Material Fig S5)

Western blot

Western blot samples are in Supplementary Material Table S6Among the seven tested antibodies Anti-CEP63 polyclonal rab-bit antibody was purchased from Merck Millipore (CatalogueNumber 06ndash1292) Electrophoresis and western blot procedureswere carried out as described in the NuPAGE Novex system(Invitrogen) Briefly the protein was denatured at 70C for

10 min with reduced loading buffer The electrophoresis cas-sette was set and run at 200 V for 1ndash15 h according to the size ofthe target protein The blotting sandwich was assembled andthe protein was transferred from the gel onto the polyvinyli-dene difluoride (PVDF) membrane at 100 V for 1ndash2 h Afterunspecific binding was blocked for 2 h at room temperature themembrane was incubated with the primary antibody at 4Covernight The next day the membrane was incubated in horse-radish peroxidase (HRP)-linked secondary antibody for 1 h aftera brief wash in PBST (PBS with 05 TWEEN 20) The chemilumi-nescent signal was generated by adding the substrate to themembrane and exposing it with photographic film in the darkroom

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingThis work was supported by the Strategic Priority ResearchProgram of the Chinese Academy of Sciences (XDB13010200)National Natural Science Foundation of China (9133120331171232 31501047 and 31420103920) National One ThousandForeign Experts Plan (WQ20123100078) Bureau of InternationalCooperation Chinese Academy of Sciences (GJHZ201313) andRussian Science Foundation (16ndash14-00220)

References1 Black DL (2003) Mechanisms of alternative pre-messenger

RNA splicing Annu Rev Biochem 72 291ndash3362 Ramani AK Calarco JA Pan Q Mavandadi S Wang Y

Nelson AC Lee LJ Morris Q Blencowe BJ Zhen Met al (2011) Genome-wide analysis of alternative splicing inCaenorhabditis elegans Genome Res 21 342ndash348

3 Wang ET Sandberg R Luo S Khrebtukova I Zhang LMayr C Kingsmore SF Schroth GP and Burge CB (2008)Alternative isoform regulation in human tissue transcrip-tomes Nature 456 470ndash476

4 Johnson JM Castle J Garrett-Engele P Kan Z LoerchPM Armour CD Santos R Schadt EE Stoughton R andShoemaker DD (2003) Genome-wide survey of human al-ternative pre-mRNA splicing with exon junction microar-rays Science 302 2141ndash2144

5 Mazin P Xiong J Liu X Yan Z Zhang X Li M He LSomel M Yuan Y Phoebe Chen YP et al (2013)Widespread splicing changes in human brain developmentand aging Mol Syst Biol 9 633

6 Gonzalez-Porta M Calvo M Sammeth M and Guigo R(2012) Estimation of alternative splicing variability in humanpopulations Genome Res 22 528ndash538

7 Barbosa-Morais NL Irimia M Pan Q Xiong HYGueroussov S Lee LJ Slobodeniuc V Kutter C Watt SColak R et al (2012) The evolutionary landscape of alterna-tive splicing in vertebrate species Science 338 1587ndash1593

8 Merkin J Russell C Chen P and Burge CB (2012)Evolutionary dynamics of gene and isoform regulation inmammalian tissues Science 338 1593ndash1599

9 Kelemen O Convertini P Zhang Z Wen Y Shen MFalaleeva M and Stamm S (2013) Function of alternativesplicing Gene 514 1ndash30

1484 | Human Molecular Genetics 2018 Vol 27 No 8

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018

Page 12: Predominant patterns of splicing evolution on human ...

10 Buljan M Chalancon G Eustermann S Wagner GPFuxreiter M Bateman A and Babu MM (2012)Tissue-specific splicing of disordered segments that embedbinding motifs rewires protein interaction networks MolCell 46 871ndash883

11 Ellis JD Barrios-Rodiles M Colak R Irimia M Kim TCalarco JA Wang X Pan Q OrsquoHanlon D Kim PM et al(2012) Tissue-specific alternative splicing remodelsprotein-protein interaction networks Mol Cell 46 884ndash892

12 Pickrell JK Pai AA Gilad Y and Pritchard JK (2010)Noisy splicing drives mRNA isoform diversity in humancells PLoS Genet 6 e1001236

13 Reyes A Anders S Weatheritt RJ Gibson TJ SteinmetzLM and Huber W (2013) Drift and conservation of differen-tial exon usage across tissues in primate species Proc NatlAcad Sci U S A 110 15377ndash15382

14 Keren H Lev-Maor G and Ast G (2010) Alternative splicingand evolution diversification exon definition and functionNat Rev Genet 11 345ndash355

15 Glazko GV and Nei M (2003) Estimation of divergencetimes for major lineages of primate species Mol Biol Evol20 424ndash434

16 Raaum RL Sterner KN Noviello CM Stewart CB andDisotell TR (2005) Catarrhine primate divergence dates es-timated from complete mitochondrial genomes concor-dance with fossil and nuclear DNA evidence J Hum Evol48 237ndash257

17 Steiper ME and Young NM (2006) Primate molecular di-vergence dates Mol Phylogenet Evol 41 384ndash394

18 Langergraber KE Prufer K Rowney C Boesch CCrockford C Fawcett K Inoue E Inoue-Muruyama MMitani JC Muller MN et al (2012) Generation times in wildchimpanzees and gorillas suggest earlier divergence timesin great ape and human evolution Proc Natl Acad Sci U SA 109 15716ndash15721

19 Bozek K Wei Y Yan Z Liu X Xiong J Sugimoto MTomita M Paabo S Pieszek R Sherwood CC et al (2014)Exceptional evolutionary divergence of human muscle andbrain metabolomes parallels human cognitive and physicaluniqueness PLoS Biol 12 e1001871

20 Lin L Shen S Jiang P Sato S Davidson BL and Xing Y(2010) Evolution of alternative splicing in primate brain tran-scriptomes Hum Mol Genet 19 2958ndash2973

21 Mccullagh P (1984) Generalized linear-models Eur J OperRes 16 285ndash292

22 Consul PC (1975) Characterization of Lagrangian Poissonand quasi-binomial distributions Commun Stat 4 555ndash563

23 Consul PC and Mittal SP (1975) New urn model with pre-determined strategy Biometr Z 17 67ndash75

24 Ermakova EO Nurtdinov RN and Gelfand MS (2006) Fastrate of evolution in alternatively spliced coding regions ofmammalian genes BMC Genomics 7 84

25 Plass M and Eyras E (2006) Differentiated evolutionaryrates in alternative exons and the implications for splicingregulation BMC Evol Biol 6 50

26 Lev-Maor G Goren A Sela N Kim E Keren H Doron-Faigenboim A Leibman-Barak S Pupko T and Ast G(2007) The ldquoalternativerdquo choice of constitutive exonsthroughout evolution PLoS Genet 3 e203

27 Fox-Walsh KL Dou Y Lam BJ Hung SP Baldi PF andHertel KJ (2005) The architecture of pre-mRNAs affectsmechanisms of splice-site pairing Proc Natl Acad Sci U SA 102 16176ndash16181

28 Cunningham BA Hemperly JJ Murray BA PredigerEA Brackenbury R and Edelman GM (1987) Neural celladhesion molecule structure immunoglobulin-like do-mains cell surface modulation and alternative RNA splic-ing Science 236 799ndash806

29 Gower HJ Barton CH Elsom VL Thompson J MooreSE Dickson G and Walsh FS (1988) Alternative splicinggenerates a secreted form of N-CAM in muscle and brainCell 55 955ndash964

30 Dalva MB McClelland AC and Kayser MS (2007) Cell ad-hesion molecules signalling functions at the synapse NatRev Neurosci 8 206ndash220

31 Schmitz J and Brosius J (2011) Exonization of transposedelements a challenge and opportunity for evolutionBiochimie 93 1928ndash1934

32 Lin L Jiang P Shen S Sato S Davidson BL and Xing Y(2009) Large-scale analysis of exonized mammalian-wide in-terspersed repeats in primate genomes Hum Mol Genet 182204ndash2214

33 Lin L Shen S Tye A Cai JJ Jiang P Davidson BL andXing Y (2008) Diverse splicing patterns of exonized Alu ele-ments in human tissues PLoS Genet 4 e1000225

34 Kim D Pertea G Trapnell C Pimentel H Kelley R andSalzberg SL (2013) TopHat2 accurate alignment of tran-scriptomes in the presence of insertions deletions and genefusions Genome Biol 14 R36

35 Yang Z (1997) PAML a program package for phylogeneticanalysis by maximum likelihood Comput Appl Biosci 13555ndash556

36 Piva F Giulietti M Nocchi L and Principato G (2009)SpliceAid a database of experimental RNA target motifs boundby splicing proteins in humans Bioinformatics 25 1211ndash1213

37 Alexa A and Rahnenfuhrer J (2016) topGO EnrichmentAnalysis for Gene Ontology R package version 2280 httpsbioconductororgpackagesreleasebiochtmltopGOhtml

1485Human Molecular Genetics 2018 Vol 27 No 8 |

Downloaded from httpsacademicoupcomhmgarticle-abstract27814744857231by MPI fuumlr evolutionaumlre Anthropologie useron 27 April 2018