Nature Genetics: doi:10.1038/ng · Supplementary Figure 4 Enrichment of de novo mutations in genes...



Page 1: Nature Genetics: doi:10.1038/ng · Supplementary Figure 4 Enrichment of de novo mutations in genes with near-complete depletion of truncating variants across schizophrenia and neurodevelopmental

Supplementary Figure 1

The use of frequency and size cutoffs in CNV gene set enrichment tests to reduce genomic inflation.

Quantile–quantile plots were generated based on P values from CNV enrichment tests of random gene sets, using different MAF cutoffs (<0.1%, 1%) and CNV size cutoffs (removing the top 5% and 10% of CNVs overlapping the most genes). Each dot represents a different gene set. The 95% CI assuming uniformly distributed P values is displayed as the gray shaded area. The genomic inflation factor (λ) is provided for each distribution. Inflation followed the reasonable null distribution when more stringent MAF thresholds and size cutoffs were applied (see MAF < 0.1% and removing the 10% of CNVs overlapping the most genes).
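The genomic inflation factor reported in these plots is conventionally computed as the ratio of the observed median test statistic to the median expected under the null. The following is a minimal sketch of that median-based calculation, assuming each P value is mapped to a 1-df chi-square statistic; the function name and the input values are hypothetical, and the authors' exact procedure may differ.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median-based genomic inflation factor for P values in (0, 1]."""
    std_normal = NormalDist()
    # Map each P value to a 1-df chi-square statistic: chi2 = z^2,
    # where z is the two-sided standard normal quantile for that P value.
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution under the null (about 0.455).
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2_stats) / expected_median


# Hypothetical input: 1,001 evenly spaced P values mimicking a uniform null.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
lambda_null = genomic_inflation(null_p)
```

A value close to 1 indicates that the P values behave as expected under the null, while values well above 1 indicate inflation.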

Nature Genetics: doi:10.1038/ng.3903


Supplementary Figure 2

Quantile–quantile plots of P values from enrichment tests of 1,766 gene sets.

Top left, case–control SNVs from whole-exome sequence data. Top right, de novo mutations from 1,077 trios. Bottom left, case–control CNVs. Bottom right, meta-analysis P values from Fisher’s method (dark blue). Tailored enrichment tests were applied to each variant type (Online Methods). Each dot represents a different gene set. The 95% CI assuming uniformly distributed P values is displayed as the gray shaded area. The genomic inflation factor (λ) is provided for each distribution. General inflation of P values from tests of disruptive variants (loss-of-function and CNVs) was observed, but no inflation was observed for tests of synonymous variants. Damaging missense, missense variants with CADD Phred > 15.
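Fisher's method combines k independent P values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null. The sketch below illustrates the general technique, assuming the per-variant-type P values are independent; the function name is hypothetical and the details of the authors' implementation are given in the Online Methods, not here.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values (each in (0, 1]) with Fisher's method."""
    k = len(p_values)
    # Half of Fisher's statistic: X / 2 = -sum(ln p_i), with X ~ chi-square(2k).
    half_stat = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square upper tail has a closed form:
    # P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total
```

For example, `fisher_combined_p([p_snv, p_denovo, p_cnv])` would combine three per-gene-set P values, where the three argument names are placeholders for the case–control SNV, de novo, and CNV results.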


Supplementary Figure 3

Quantile–quantile plots of P values from enrichment tests of random gene sets.

Top left, case–control SNVs from whole-exome sequence data. Top right, de novo mutations from 1,077 trios. Bottom left, case–control CNVs. Bottom right, meta-analysis P values from Fisher’s method (dark blue). Genes were randomly sampled from the genome to create gene sets with the same size distribution as the 1,766 tested gene sets. Each dot represents a different gene set. The 95% CI assuming uniformly distributed P values is displayed as the gray shaded area. Tailored enrichment tests were applied to each variant type (Online Methods). The genomic inflation factor (λ) is provided for each distribution. No inflation of test statistics was observed in the meta-analysis P values. Damaging missense, missense variants with CADD Phred > 15.
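The size-matched random sampling described in this legend can be sketched as follows. This is only an illustration: it assumes uniform sampling without replacement within each set, and the function name, seed, and toy inputs are hypothetical.

```python
import random


def sample_matched_gene_sets(all_genes, set_sizes, seed=0):
    """Draw one random gene set per entry in set_sizes, matching each size."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    genes = list(all_genes)
    # Genes are sampled without replacement within a set, but the same gene
    # may appear in several different sets.
    return [rng.sample(genes, size) for size in set_sizes]


# Hypothetical toy genome of 1,000 genes and three set sizes.
toy_genome = [f"GENE{i}" for i in range(1000)]
random_sets = sample_matched_gene_sets(toy_genome, [120, 250, 500])
```

Running the enrichment tests on such sets gives an empirical null against which the P values of the real gene sets can be compared.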


Supplementary Figure 4

Enrichment of de novo mutations in genes with near-complete depletion of truncating variants across schizophrenia and neurodevelopmental disorders.

In schizophrenia, ASD, and severe neurodevelopmental disorders, de novo mutations were enriched in a subset of genes intolerant of loss-of-function variants, with no excess of polygenic burden in the remaining genes. To generate 95% CIs and P values, the rates of de novo mutations in affected trios (1,077 schizophrenia trios, 4,038 trios with ASD, and 1,133 trios with severe neurodevelopmental disorders) were compared against the rate in unaffected control trios (2,038 trios) using Poisson exact tests. Plotted P values are from the Poisson test of loss-of-function mutations. Damaging missense, missense variants with CADD Phred > 15.
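One common way to compare two Poisson rates exactly is to condition on the total number of events, which reduces the comparison to a binomial test. The sketch below uses that formulation with a one-sided alternative; whether the authors used this exact variant is an assumption, the function name is hypothetical, and only the trio counts in the usage line come from the legend.

```python
import math


def poisson_rate_test(case_count, case_trios, control_count, control_trios):
    """One-sided exact test that the case mutation rate exceeds the control rate."""
    total = case_count + control_count
    # Under the null of equal rates, and conditional on the total count,
    # the case count is Binomial(total, share of all trios that are cases).
    p_null = case_trios / (case_trios + control_trios)
    p_value = sum(
        math.comb(total, i) * p_null ** i * (1 - p_null) ** (total - i)
        for i in range(case_count, total + 1)
    )
    # Ratio of per-trio mutation rates; infinite if controls carry no mutations.
    if control_count == 0:
        rate_ratio = math.inf
    else:
        rate_ratio = (case_count / case_trios) / (control_count / control_trios)
    return rate_ratio, p_value
```

A call such as `poisson_rate_test(n_case_mutations, 1077, n_control_mutations, 2038)` would compare schizophrenia trios against control trios, with the two mutation counts supplied by the user.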


Supplementary Figure 5

Enrichment of damaging rare variants in genes ordered and grouped by the degree of loss-of-function intolerance in schizophrenia, ASD, and severe neurodevelopmental disorders.

(a) Schizophrenia cases compared to controls for rare SNVs and indels. (b) Rates of de novo mutation in schizophrenia, ASD, and severe neurodevelopmental disorder probands as compared to control probands. Genes are ordered by their degree of loss-of-function intolerance (pLI score) and grouped into six categories: the 10% with the highest pLI score, the top 10–20% as ranked by pLI score, 20–40% as ranked by pLI score, and so on. Calculation of the 95% CIs and P values for the trio data followed the same method as in Supplementary Figure 4. A significant enrichment of rare, damaging variants was only observed in the 20% of genes with the highest pLI score, while no signal was observed in the remaining genes. Error bars are 95% CIs of the estimates. Damaging missense, missense variants with CADD Phred > 15.
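The six-way grouping by pLI rank can be illustrated with a short sketch. It assumes the bins are 0–10%, 10–20%, 20–40%, 40–60%, 60–80%, and 80–100% of genes ranked from highest to lowest pLI, reading "and so on" as continuing in 20% steps; the function name and toy scores are hypothetical.

```python
def pli_bins(pli_by_gene):
    """Assign each gene to one of six pLI rank bins (0 = highest pLI scores)."""
    # Upper bin boundaries in tenths of the ranked list: 10%, 20%, 40%, ..., 100%.
    upper_tenths = [1, 2, 4, 6, 8, 10]
    ranked = sorted(pli_by_gene, key=pli_by_gene.get, reverse=True)
    n = len(ranked)
    bins = {}
    for rank, gene in enumerate(ranked):
        # Pick the first bin whose upper boundary lies above this gene's rank.
        # Integer arithmetic avoids floating-point edge cases at the boundaries.
        bins[gene] = next(
            b for b, tenths in enumerate(upper_tenths) if rank * 10 < n * tenths
        )
    return bins


# Hypothetical toy scores for ten genes, from pLI 1.0 down to 0.1.
toy_pli = {f"G{i}": 1 - i / 10 for i in range(10)}
toy_bins = pli_bins(toy_pli)
```

Enrichment tests can then be run separately on the genes in each bin.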


Supplementary Figure 6

Non-random sampling of genes in the 1,766 tested gene sets.

(a) Genes are ranked and plotted based on the number of gene sets to which they belong. The top 1,000 genes were over-represented in gene sets from public databases, and genes outside the top 5,000 genes were under-represented. (b) Distribution of overlap coefficients with the set of loss-of-function-intolerant genes. The overlap coefficients between each of the 1,766 discovery gene sets and the set of loss-of-function-intolerant genes were calculated. The overlap coefficients between randomly sampled gene sets and loss-of-function-intolerant genes were similarly computed. These values are displayed as two density plots. The overlap coefficient is a similarity measure defined as

$$\mathrm{overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)},$$

where X and Y are sets of genes.
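The overlap coefficient is the size of the intersection of two sets divided by the size of the smaller set, |X ∩ Y| / min(|X|, |Y|). A minimal sketch follows; the function name and the gene labels are hypothetical.

```python
def overlap_coefficient(x, y):
    """Overlap coefficient between two collections of gene identifiers."""
    x, y = set(x), set(y)
    if not x or not y:
        raise ValueError("both gene sets must be non-empty")
    # Size of the intersection relative to the smaller of the two sets.
    return len(x & y) / min(len(x), len(y))


# Hypothetical toy gene sets sharing two of their members.
set_a = {"A", "B", "C"}
set_b = {"B", "C", "D", "E"}
coefficient = overlap_coefficient(set_a, set_b)
```

The coefficient equals 1 whenever one set is entirely contained in the other, which makes it suited to comparing gene sets of very different sizes.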


Supplementary Figure 7

Heat map of overlap coefficients calculated between the 35 significant gene sets (FDR < 5%).

The overlap coefficients of the 35 gene sets enriched for rare coding variants conferring risk for schizophrenia were computed and are clustered and displayed as a heat map. The overlap coefficient is a similarity measure defined as

$$\mathrm{overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)},$$

where X and Y are sets of genes. The overlap coefficients between each gene set and the set of loss-of-function-intolerant genes are displayed as rounded values. See Supplementary Table 2 and the Supplementary Note for more information on each gene set.


Supplementary Figure 8

Summary of cognition and educational attainment data available for the schizophrenia whole-exome data set.

(a) Individuals diagnosed with schizophrenia (cases). (b) Individuals without a diagnosis of schizophrenia (controls). Information on population is also provided (UK, Finland, and Sweden).


Supplementary Figure 9

Enrichment of rare loss-of-function variants in loss-of-function-intolerant genes after excluding known developmental disorder–associated genes in schizophrenia cases stratified by information on cognitive function as compared to controls.

The P values shown were calculated using the variant threshold method comparing the burden of loss-of-function variants between the corresponding cases and controls. Error bars represent the 95% CIs of the point estimates. Damaging missense, missense variants with CADD Phred > 15.


Supplementary Note

Description of gene sets

We first accessed and combined gene sets from five public databases: Gene Ontology (release 146; June 22, 2015 release), KEGG (July 1, 2011 release), PANTHER (May 18, 2015 release), REACTOME (March 23, 2015 release), and the Molecular Signatures Database (MSigDB) hallmark processes (version 4, March 26, 2015 release). Given our focus on very rare and de novo variants, we had limited power to robustly detect enrichment in small gene sets, as evident in previous studies of schizophrenia and autism rare variation in which the strongest signals came from aggregating hundreds of genes [1–3]. Thus, we restricted our analyses to gene sets from the five public databases with more than 100 genes. We further tested a number of gene sets selected based on biological hypotheses about schizophrenia risk and on genome-wide screens investigating rare variants in broader neurodevelopmental disorders. These included gene sets described in previous enrichment analyses of schizophrenia rare variants [1]: translational targets of FMRP [4,5], components of the post-synaptic density [1,6], ion channel proteins [1], components of the ARC, mGluR5, and NMDAR complexes [1], proteins at cortical inhibitory synapses [7,8], targets of miR-137 [1], and genes near schizophrenia common risk loci [1,9]. We additionally incorporated gene sets previously shown to be enriched for autism risk genes: targets of CHD8 [3,10,11], splice targets of RBFOX [3,12,13], hippocampal gene expression networks [14], and neuronal gene lists from the Genes2Cognition database (http://www.genes2cognition.org) [3].

We used the pLI metric described in the ExAC v0.3.1 database as a measure of gene-level selective constraint [15]. Since the full v0.3.1 release contained the Swedish schizophrenia study, for all analyses in this study we used pLI scores calculated from the subset of the ExAC database containing 45,376 exomes that excluded individuals with a psychiatric diagnosis. The 3,488 genes annotated with pLI > 0.9 were described as “loss-of-function intolerant”. We also tested for enrichment in the remaining set of 14,753 genes. We further ranked and grouped genes into deciles and bideciles according to the pLI metric (Supplementary Figure 5).

ASD risk genes were defined as genes with an FDR < 10% or < 30% by Sanders et al. in the largest meta-analysis of ASD exomes to date [16]. For a less stringent list, we separately defined ASD and developmental disorder de novo genes as genes hit by a LoF or a LoF/missense de novo variant in the Sanders et al. and DDD studies [16,17]. The DECIPHER Developmental Disorder Genotype-Phenotype (DDG2P) database (April 13, 2015 release) was used to define genes diagnostic of developmental disorders [17,18]. For a high-confidence list, as used for clinical reporting in the DDD study, we included genes with a monoallelic or an X-linked dominant mode of effect and robust evidence in the literature (“Confirmed DD Genes”, “Probable DD gene”, “Both DD and IF”). From these genes, we created four lists based on mechanism (LoF or LoF/missense) and affected organ system (brain/cognition or any organ system). We further extended these lists with novel genes for severe developmental disorders identified in 4,293 parent-proband trios exome sequenced in the DDD study [19]. The 94 genome-wide significant genes were described in Supplementary Table 3 in McRae et al., and genes with de novo LoF mutations were appended to the LoF and LoF/missense lists, while genes with only de novo missense mutations were only added to the LoF/missense lists.

Finally, for background gene lists, we defined cerebellar and cortical genes as those that are expressed in at least 80% of the corresponding human brain samples in the BrainSpan RNA-seq dataset [20]. A gene was defined as expressed in a sample if the exon and whole gene read counts were greater than 10 counts, and the Cufflinks lower-bound FPKM estimate was greater than 0 [21]. For brain-enriched genes, we compared the differential expression of individual genes in the brain against all other tissues in the GTEx dataset [22], and identified a subset that is 2-fold enriched with an FDR < 5%.
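The expression rule used for the background gene lists can be written as a simple filter. This is a sketch under stated assumptions: the input layout (one tuple of exon read count, whole-gene read count, and Cufflinks lower-bound FPKM per sample) and all names are hypothetical, and only the thresholds come from the text.

```python
def is_expressed(exon_count, gene_count, fpkm_lower):
    """Per-sample rule: both read counts above 10 and lower-bound FPKM above 0."""
    return exon_count > 10 and gene_count > 10 and fpkm_lower > 0


def expressed_genes(samples_by_gene, min_fraction=0.8):
    """Return genes expressed in at least min_fraction of their samples.

    samples_by_gene maps a gene name to a list of
    (exon_count, gene_count, fpkm_lower) tuples, one per brain sample.
    """
    kept = set()
    for gene, samples in samples_by_gene.items():
        n_expressed = sum(is_expressed(*sample) for sample in samples)
        if samples and n_expressed >= min_fraction * len(samples):
            kept.add(gene)
    return kept


# Hypothetical toy data: GENE_A passes in 4 of 5 samples, GENE_B in 3 of 5.
toy_samples = {
    "GENE_A": [(50, 80, 1.2)] * 4 + [(3, 5, 0.0)],
    "GENE_B": [(50, 80, 1.2)] * 3 + [(3, 5, 0.0)] * 2,
}
cortical_like = expressed_genes(toy_samples)
```

Applying the filter separately to the cerebellar and cortical samples would give the two background lists.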

Consortia

UK10K consortium

Richard Anney, Mohammad Ayub, Anthony Bailey, Gillian Baird, Jeff Barrett, Douglas Blackwood, Patrick Bolton, Gerome Breen, David Collier, Paul Cormican, Nick Craddock, Lucy Crooks, Sarah Curran, Petr Danecek, Richard Durbin, Louise Gallagher, Jonathan Green, Hugh Gurling, Richard Holt, Chris Joyce, Ann LeCouteur, Irene Lee, Jouko Lönnqvist, Shane McCarthy, Peter McGuffin, Andrew McIntosh, Andrew McQuillin, Alison Merikangas, Anthony Monaco, Dawn Muddyman, Michael O'Donovan, Michael Owen, Aarno Palotie, Jeremy Parr, Tiina Paunio, Olli Pietilainen, Karola Rehnström, Tarjinder Singh, David Skuse, Jim Stalker, David St. Clair, Jaana Suvisaari, Hywel Williams

INTERVAL study

Participants in the INTERVAL randomised controlled trial were recruited with the active collaboration of NHS Blood and Transplant England (http://www.nhsbt.nhs.uk), which has supported field work and other elements of the trial. DNA extraction and genotyping were funded by the National Institute of Health Research (NIHR), the NIHR BioResource (http://bioresource.nihr.ac.uk/) and the NIHR Cambridge Biomedical Research Centre (www.cambridge-brc.org.uk). The academic coordinating centre for INTERVAL was supported by core funding from: NIHR Blood and Transplant Research Unit in Donor Health and Genomics, UK Medical Research Council (G0800270), British Heart Foundation (SP/09/002), and NIHR Research Cambridge Biomedical Research Centre. A complete list of the investigators and contributors to the INTERVAL trial is provided in reference 23 and at http://www.intervalstudy.org.uk/about-the-study/whos-involved/interval-contributors/.

References for Supplementary Note

1. Purcell, S. M. et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506, 185–90 (2014).

2. Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic networks. Nature 506, 179–184 (2014).

3. De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–15 (2014).

4. Darnell, J. C. et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247–61 (2011).


5. Ascano, M. et al. FMRP targets distinct mRNA sequence elements to regulate protein expression. Nature 492, 382–386 (2012).

6. Kirov, G. et al. De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia. Mol. Psychiatry 17, 142–53 (2012).

7. Heller, E. A. et al. The biochemical anatomy of cortical inhibitory synapses. PLoS One 7, (2012).

8. Pocklington, A. J. et al. Novel Findings from CNVs Implicate Inhibitory and Excitatory Signaling Complexes in Schizophrenia. Neuron 86, 1203–1214 (2015).

9. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–7 (2014).

10. Sugathan, A. et al. CHD8 regulates neurodevelopmental pathways associated with autism spectrum disorder in neural progenitors. Proc. Natl. Acad. Sci. 111, E4468–E4477 (2014).

11. Cotney, J. et al. The autism-associated chromatin modifier CHD8 regulates other autism risk genes during human neurodevelopment. Nat. Commun. 6, 6404 (2015).

12. Weyn-Vanhentenryck, S. M. et al. HITS-CLIP and Integrative Modeling Define the Rbfox Splicing-Regulatory Network Linked to Brain Development and Autism. Cell Rep. 6, 1139–1152 (2014).

13. Fogel, B. L. et al. RBFOX1 regulates both splicing and transcriptional networks in human neuronal development. Hum. Mol. Genet. 21, 4171–4186 (2012).

14. Johnson, M. R. et al. Systems genetics identifies a convergent gene network for cognitive function and neurodevelopmental disease. Nat. Neurosci. in press (2015).

15. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

16. Sanders, S. J. et al. Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci. Neuron 87, 1215–1233 (2015).

17. The Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–8 (2015).

18. Firth, H. V. et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am. J. Hum. Genet. 84, 524–33 (2009).

19. Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017).

20. Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–9 (2011).

21. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–78 (2012).

22. The GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348, 648–660 (2015).

23. Moore, C. et al. The INTERVAL trial to determine whether intervals between blood donations can be safely and acceptably decreased to optimise blood supply: study protocol for a randomised controlled trial. Trials 15, 363 (2014).
