Evolutionary Forces Shaping Human Genetic Variation #1000GB

EVOLUTIONARY FORCES SHAPING HUMAN GENETIC VARIATION Ryan D. Hernandez [email protected] twitter: @rdhernand

description

Talk I gave at the 1000 Genomes Project and Beyond meeting in Cambridge, UK.

Transcript of Evolutionary Forces Shaping Human Genetic Variation #1000GB

Page 1: Evolutionary Forces Shaping Human Genetic Variation #1000GB

EVOLUTIONARY FORCES SHAPING HUMAN GENETIC VARIATION

Ryan D. Hernandez

[email protected] twitter: @rdhernand

Page 2: Evolutionary Forces Shaping Human Genetic Variation #1000GB

http://www.finca.org

Page 3: Evolutionary Forces Shaping Human Genetic Variation #1000GB

[Map: “Human colonization of the world” (http://ngm.nationalgeographic.com). Legend: migration date, generalized route, fossil or artifact site. Numbered routes 1-6 are shown. Geographic labels: Africa, Asia, Australia, Europe, North America, South America, Equator, Nile River, Red Sea, Andaman Islands.]

Migration dates shown on the routes: 200,000 years ago; 70,000-50,000 years ago; 50,000 years ago; 40,000 years ago; 40,000-30,000 years ago; 20,000-15,000 years ago; 15,000-12,000 years ago.

Fossil or artifact sites: Omo Kibish (oldest modern human, 195,000 years ago); Klasies River Mouth (120,000 years ago); Qafzeh (100,000 years ago); Malakunanja (50,000 years ago); Lake Mungo (45,000 years ago); Niah Cave (40,000 years ago); Pestera cu Oase (35,000 years ago); Yana River (30,000 years ago); Meadowcroft (19,000-12,000 years ago); Minatogawa (18,000 years ago); Monte Verde (14,800 years ago); Clovis (13,500 years ago); Zhoukoudian (Shandingdong) (11,000 years ago); Kennewick (9,500 years ago); Spirit Cave (9,500-9,400 years ago).

SOURCES: SUSAN ANTON, NEW YORK UNIVERSITY; ALISON BROOKS, GEORGE WASHINGTON UNIVERSITY; PETER FORSTER, UNIVERSITY OF CAMBRIDGE; JAMES F. O'CONNELL, UNIVERSITY OF UTAH; STEPHEN OPPENHEIMER, OXFORD UNIVERSITY; SPENCER WELLS, NATIONAL GEOGRAPHIC SOCIETY; OFER BAR-YOSEF, HARVARD UNIVERSITY

NGM MAPS. © 2006 National Geographic Society. All rights reserved.

Page 4: Evolutionary Forces Shaping Human Genetic Variation #1000GB

The Effect of “Positive Selection”

[Figure: putatively neutral diversity levels. Legend: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]

Page 5: Evolutionary Forces Shaping Human Genetic Variation #1000GB

The Effect of “Positive Selection”

[Figure: putatively neutral diversity levels, with the label “Selective Sweep”. Legend: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]

Page 6: Evolutionary Forces Shaping Human Genetic Variation #1000GB

The Effect of “Positive Selection”

[Figure: putatively neutral diversity levels, with the label “Selective Sweep”. Legend: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]

Page 7: Evolutionary Forces Shaping Human Genetic Variation #1000GB

The Effect of “Positive Selection”

[Figure: putatively neutral diversity levels, with the label “Selective Sweep”. Legend: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]

• Repeated fixation of functional mutations in coding regions over evolutionary timescales can lead to a disproportionate number of amino acid substitutions relative to observed polymorphisms.

• This can be summarized by a 2x2 table and analyzed using the McDonald-Kreitman test:

              Non-Syn   Syn
Fixed         F_N       F_S
Polymorphic   P_N       P_S

1000 Genomes Project Data
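The 2x2 McDonald-Kreitman table on this slide can be turned into a test with a few lines of code. The following is a minimal sketch, not the analysis used in the talk: the function names and the example counts are hypothetical, the neutrality index and alpha = 1 - NI are standard summaries that the slide does not mention, and Fisher's exact test is one common choice for testing the table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def mk_test(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn):
    """Return (neutrality index, alpha, p-value) for one gene's MK table."""
    if fixed_nonsyn == 0 or poly_syn == 0:
        ni = alpha = None  # ratio undefined; too few observations for this gene
    else:
        # NI = (Pn / Ps) / (Fn / Fs); NI < 1 indicates an excess of
        # nonsynonymous substitutions relative to polymorphism.
        ni = (poly_nonsyn * fixed_syn) / (poly_syn * fixed_nonsyn)
        alpha = 1.0 - ni
    p_value = fisher_exact_two_sided(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn)
    return ni, alpha, p_value


# Hypothetical counts for a single gene (not taken from the talk).
ni, alpha, p_value = mk_test(fixed_nonsyn=12, fixed_syn=20, poly_nonsyn=5, poly_syn=40)
print(ni, alpha, p_value)
```

With only a handful of counts per gene, the ratio is often undefined or has a wide sampling distribution, which is the noise problem that motivates pooling information across genes.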

Page 8: Evolutionary Forces Shaping Human Genetic Variation #1000GB

SnIPRE: an improvement to MKT

Since few SNPs and substitutions are usually observed per gene, MKT can be noisy. Pooling observations across the genome using a mixed effects model vastly increases power.

Eilertson et al, 2012

Page 9: Evolutionary Forces Shaping Human Genetic Variation #1000GB

SnIPREASR in 1000 Genomes Project

Human-chimp divergence

Pos Sel    Conserved
410        8027

• Conserved genes are either neutral or under purifying selection.

Page 10: Evolutionary Forces Shaping Human Genetic Variation #1000GB

SnIPREASR: an improvement to SnIPRE

• Alignments are generated using MOSAIC, a program we developed that rigorously integrates putative orthologs from an arbitrary number of sources.

• Using PAML, we perform AIC-based model selection to infer the substitutions along the human lineage since our divergence from chimp.

pythonhosted.org/bio-MOSAIC/ Maher & Hernandez (arXiv)

[Figure: phylogeny of Human, Chimp, Orang, Gorilla]

Cyrus Maher

Page 11: Evolutionary Forces Shaping Human Genetic Variation #1000GB

SnIPREASR works well for positive selection

• Simulations: Human-specific substitutions; Gutenkunst et al. demographic model.

• 𝛾 is the population-scaled selection coefficient.

• SnIPREASR is best-powered to estimate values of 𝛾 > 0.

[Figure: phylogeny of Human, Chimp, Orang, Gorilla]

Page 12: Evolutionary Forces Shaping Human Genetic Variation #1000GB

ASR removes genes positively selected in chimp

                      Human-chimp divergence
Human only (ASR)      Pos Sel    Conserved    Total
Pos Sel               343        0            343
Conserved             67         8027         8094
Total                 410        8027

• Conserved genes are either neutral or under purifying selection.

• 67/410 (16%) of genes identified as positively selected when comparing human-chimp are conserved along the human lineage.
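The marginal totals and the 16% figure follow directly from the four cell counts. A short check, using only the numbers on this slide (the variable names are illustrative):

```python
# Cell counts from the slide. Keys are (human-only ASR call, human-chimp call).
counts = {
    ("pos_sel", "pos_sel"): 343,
    ("pos_sel", "conserved"): 0,
    ("conserved", "pos_sel"): 67,
    ("conserved", "conserved"): 8027,
}

# Column totals: genes called by the human-chimp divergence comparison.
hc_pos_sel = counts[("pos_sel", "pos_sel")] + counts[("conserved", "pos_sel")]
hc_conserved = counts[("pos_sel", "conserved")] + counts[("conserved", "conserved")]

# Row totals: genes called along the human lineage only (ASR).
asr_pos_sel = counts[("pos_sel", "pos_sel")] + counts[("pos_sel", "conserved")]
asr_conserved = counts[("conserved", "pos_sel")] + counts[("conserved", "conserved")]

# Fraction of human-chimp positively selected genes that ASR calls conserved.
fraction_reclassified = counts[("conserved", "pos_sel")] / hc_pos_sel
print(hc_pos_sel, hc_conserved, asr_pos_sel, asr_conserved, fraction_reclassified)
```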

Page 13: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Positively selected genes dominated by smell & response to pathogens

GOrilla

Page 14: Evolutionary Forces Shaping Human Genetic Variation #1000GB

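The footprint of adaptive amino acid substitutions

[Figure: schematic of neutral diversity levels around an amino acid substitution (labels: “Amino acid substitution”, “Neutral diversity levels”, “n substitutions”). Annotations: “Reflects the fraction of amino acid substitutions that are adaptive”; “Reflects the typical strength of selection”.]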

• Goal: compare the pattern around amino acid substitutions to the pattern around synonymous substitutions.

Hernandez et al. Science (2011)

Page 15: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Observed Patterns of Diversity Around Human Substitutions

Hernandez et al. Science (2011)

Page 16: Evolutionary Forces Shaping Human Genetic Variation #1000GB

The Effect of Negative Selection

[Figure: putatively neutral diversity levels. Legend: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]

Genetic diversity reduced: π = f0 π0 (decrease in effective population size [Ne])

Page 17: Evolutionary Forces Shaping Human Genetic Variation #1000GB

The Effect of Negative Selection

[Figure: putatively neutral diversity levels. Legend: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]

Genetic diversity reduced: π = f0 π0 (decrease in effective population size [Ne])

Page 18: Evolutionary Forces Shaping Human Genetic Variation #1000GB

The Effect of Negative Selection

[Figure: putatively neutral diversity levels. Legend: Adaptive, Neutral, Nearly Neutral, Mildly Deleterious, Fairly Deleterious, Strongly Deleterious.]

Genetic diversity reduced: π = f0 π0 (decrease in effective population size [Ne])
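The relation π = f0 π0 can be made concrete under one classic model. The sketch below assumes the no-recombination background selection result of Charlesworth, Morgan & Charlesworth (1993), in which f0 = exp(-U / (2sh)) is the equilibrium fraction of chromosomes free of deleterious mutations. The slides do not say how f0 is obtained, so this model, the function names, and all parameter values are assumptions for illustration only.

```python
import math


def mutation_free_fraction(U, sh):
    """f0 = exp(-U / (2 * sh)) under the assumed no-recombination model.

    U:  diploid deleterious mutation rate for the region (per generation)
    sh: selection coefficient against heterozygous carriers
    """
    if U < 0 or sh <= 0:
        raise ValueError("U must be >= 0 and sh must be > 0")
    return math.exp(-U / (2.0 * sh))


def reduced_diversity(pi0, U, sh):
    """Neutral diversity after background selection: pi = f0 * pi0."""
    return mutation_free_fraction(U, sh) * pi0


# Hypothetical values, chosen only to show the scale of the effect.
pi0 = 0.001  # neutral diversity without linked selection
f0 = mutation_free_fraction(U=0.01, sh=0.01)
pi = reduced_diversity(pi0, U=0.01, sh=0.01)
print(f0, pi)
```

Because neutral diversity scales with effective population size, the same factor f0 can be read as a local reduction of Ne, which is how the slide phrases it.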

Page 19: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Background Selection in Humans

[Figure legend: Observed, Predicted]

Hernandez, et al. Science (2011).

Page 20: Evolutionary Forces Shaping Human Genetic Variation #1000GB

BGS correlates with Fst at neutral sites

Background Selection Influences Population Specific Patterns of Diversity and the Site-Frequency Spectrum in Humans

Raul Torres and Ryan D. Hernandez, University of California, San Francisco

1 - Introduction

With the observation of thousands of genomes from global populations, worldwide patterns of human genetic diversity can be described more fully and the genome-wide impacts of natural selection on these patterns better understood. In many studies, genomic scans have revealed linked selection, which manifests as a local trough in diversity, to be one important driver of such patterns. However, it is less well understood how linked selection interacts with other drivers of population diversity, such as demography. Using thousands of whole genomes from phase 3 of the 1000 Genomes Project (TGP) [1], we investigate the impact of one mode of linked selection, background selection (BGS), on population-specific patterns of genetic diversity as a function of demographic history. We also observe that BGS influences the site-frequency spectrum (SFS), increasing the proportion of rare variants. This observation is especially important because many inferences of demography use the SFS and can be biased if they are performed on regions of the genome under strong BGS.

2 - Background Selection

Through purifying selection, neutral variants linked to a negatively selected site are eliminated from the population in a process called ‘background selection’ (BGS). Because of recombination and variation in the magnitude of selection on deleterious variants, the strength of BGS varies across the genome, producing different reductions in genetic diversity. Below, we show the variation in the strength of BGS across chromosome 22, as estimated by McVicker et al. 2009 [2].

[Figure: Estimate [2] of the strength of background selection across chromosome 22. x-axis: position (10,000,000 to 40,000,000); y-axis: fraction of diversity reduced due to BGS (1-B), 0 to 0.8. Schematic legend: neutral variant, deleterious variant, BGS.]

4 - Demography and BGS Across 1000 Genomes

To test whether demography affects the relative patterns of diversity (π) caused by BGS, as predicted by our simulations, we measured the ratio of diversity across different bins of BGS strength, B, in the human genome in each of 20 populations from phase 3 of the TGP. We observe that BGS causes the greatest relative reduction of genetic diversity in non-African populations. To limit potential biases from coverage, high-coverage (exome) and low-coverage regions of the genome were analyzed separately.

[Figure: π by population in three B bins (0-50, 476-525, 951-1000), and the ratio π(strong BGS)/π(weak BGS), i.e. (B = 0-50)/(B = 951-1000), shown separately for high-coverage (exome) and low-coverage data. Populations: ESN, LWK, YRI, MSL, GWD (AFR); IBS, CEU, TSI, GBR, FIN (EUR); CHB, JPT, CHS, KHV, CDX (EASN); BEB, PJL, ITU, STU, GIH (SASN).]

4 - Population Differentiation as a Function of BGS

The local decrease in Ne across the genome caused by BGS (inferred [2] from the value B, in which lower values indicate stronger BGS) may affect the rate of genetic drift at specific loci. To investigate this effect, we measured FST between TGP populations as a function of BGS strength. Our results suggest that the strength of BGS is a predictor of population differentiation, with an increase in genetic drift driving this effect.

[Figure: FST (estimator method) vs. background selection for three comparisons: African vs. Asian, African vs. European, European vs. Asian. x-axis: B value (BGS strength) in bins from 0-24 to 975-1000; y-axis: FST (population differentiation), 0.10 to 0.22.]

4 - BGS Skews the SFS Towards Rare Variants

Purifying selection on linked sites can distort gene genealogies, leading to potential skews in the site-frequency spectrum. To investigate these effects, we measured the SFS as a function of B separately across the high-coverage and low-coverage regions of phase 3 TGP populations. We observed a marked increase in the number of rare variants, especially singletons, in both datasets as a function of BGS strength. This pattern is amplified in non-African vs. African populations.

[Figure: SFS for YRI, TSI, CHS and ITU in three B bins (0-50, 476-525, 951-1000), shown separately for high-coverage and low-coverage data. x-axis: derived allele count (log scale); y-axis: frequency.]

[Figure: Ratio of singleton frequency in the strong BGS bin vs. the weak BGS bin for each of the 20 populations (AFR, EUR, EASN, SASN), for low-coverage and high-coverage data. y-axis: ratio.]

5 - Forward Simulations of Demography and BGS

Using a distribution of fitness effects and a demographic model inferred in previous studies [3,4], we ran forward simulations with SFS_CODE [5] to estimate how human demography affects the reduction in genetic diversity caused by BGS. The effects of BGS are strongest in populations that have experienced sharp population bottlenecks (i.e., Europeans and Asians). However, the expected reduction in diversity due to BGS across all human populations is still greater than for a simulated population of constant size, illustrating the importance of population expansions for determining the overall strength of BGS.

[Figure: Demographic model for AFR, EUR and ASN, with population sizes in units of Na (labels: Na, 1.8 x Na, 0.28 x Na, 0.15 x Na, 0.08 x Na, 1.8 x Na, 3.4 x Na, 5.8 x Na). Selection model: deleterious locus of length 100 bp, uniform recombination rate, distribution of selection coefficients γ ~ Γ(α, β).]

[Figure: Estimate of reduction in diversity due to BGS across African, European, and Asian populations. x-axis: CONST, AFR, EUR, ASN; y-axis: π_BGS/π_0, 0.60 to 0.90.]

6 - Conclusions

The complex demographic history of humans has influenced how background selection shapes the patterns of diversity within specific populations. Using data from the 1000 Genomes Project, we recapitulate patterns that are evident from both forward simulations and population genetic theory. We also observe that linked selection, through the mechanism of BGS, can strongly distort the SFS. We caution that even strictly neutral sites can suffer from such distortions merely by being on the same background as another site under selection, and that care should be taken to account for such distortions when performing inference using the SFS.

References:

1. Abecasis G, et al. 2012. Nature. 491: 56-65.

2. McVicker G, et al. 2009. PLOS Genet. 5(5): e1000471.

3. Boyko AR, et al. 2008. PLOS Genet. 4(5): e1000083.

4. Gutenkunst RN, et al. 2009. PLOS Genet. 5(10): e1000695.

5. Hernandez RD. 2008. Bioinformatics. 24(23): 2786-2787.

Acknowledgements: R.T. is supported by an NIGMS Initiative for Maximizing Student Development grant (R25-GM56847), an NSF Graduate Research Fellowship and a UCSF Earle C. Anthony Travel Award.
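The comparison of FST across bins of B described in the poster can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the poster says only “estimator method”, so the use of Hudson's estimator (combined across sites as a ratio of sums) is an assumption, as are the function names, the bin width of 25 on a 0-1000 B scale, and the toy input.

```python
from collections import defaultdict


def hudson_terms(p1, n1, p2, n2):
    """Per-site numerator and denominator of Hudson's FST estimator.

    p1, p2: allele frequencies in the two populations
    n1, n2: number of sampled chromosomes in each population
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_by_b_bin(sites, bin_width=25, b_max=1000):
    """FST within bins of B, as sum of numerators over sum of denominators.

    sites: iterable of (B, p1, n1, p2, n2), with B on a 0..b_max scale
           (lower B means stronger background selection).
    Returns {bin_start: FST}; bins whose denominator is zero are skipped.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    last_bin = b_max - bin_width  # the final bin also holds B == b_max
    for b, p1, n1, p2, n2 in sites:
        start = min(int(b) // bin_width * bin_width, last_bin)
        num, den = hudson_terms(p1, n1, p2, n2)
        sums[start][0] += num
        sums[start][1] += den
    return {start: num / den for start, (num, den) in sorted(sums.items()) if den > 0}


# Toy input (hypothetical): two sites under strong BGS, two under weak BGS.
toy_sites = [
    (10, 0.10, 100, 0.50, 100),
    (12, 0.20, 100, 0.60, 100),
    (990, 0.30, 100, 0.35, 100),
    (1000, 0.50, 100, 0.50, 100),
]
fst_bins = fst_by_b_bin(toy_sites)
print(fst_bins)
```

Summing numerators and denominators before dividing, instead of averaging per-site ratios, keeps rare variants from dominating the estimate within a bin.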

www.postersession.com

4 - Demography and BGS Across 1000 Genomes !

4 - Population Differentiation as a Function of BGS!The decrease in Ne locally across the genome as a result of BGS (inferred2 by the value, B, in which lower values indicate stronger BGS) may impact the rate of genetic drift at specific loci. To investigate this effect, we measured FST between TGP populations as a function of BGS strength. Our results suggest that the strength of BGS is a predictor of population differentiation, with an increase in genetic drift driving this effect.

5 - Forward Simulations of Demography and BGS!

With the observations of thousands of genomes from global populations, worldwide patterns of human genetic diversity can be more fully described and the genome-wide impacts of natural selection on these patterns better understood. In many studies, genomic scans have revealed linked-selection, which manifests itself as a local trough in diversity, to be one important driver of such patterns. However, it less well understood how the actions of linked-selection interact with other drivers of population diversity, such as demography. Utilizing thousands of whole-genomes from phase 3 of the 1000 Genomes Project (TGP) 1, we investigate the impact of one mode of linked-selection, background selection (BGS), on population specific patterns of genetic diversity as a function of demographic history. Additionally, we also observe that BGS influences the site-frequency spectrum (SFS), generating an increase towards the proportion of rare variants. This observation is especially crucial since many inferences of demography utilize the SFS and can be biased if they are performed on regions of the genome under strong BGS.

1 - Introduction!

Background Selection Influences Population Specific Patterns of Diversity and the Site-

Frequency Spectrum in Humans !!

Raul Torres and Ryan D. Hernandez!University of California, San Francisco!

2 - Background Selection!Through purifying selection, neutral variants linked to a negatively selected site will be eliminated from the population in a process called ‘background selection’ (BGS). Due to recombination and variation in the magnitude of selection on deleterious variants, the strength of BGS will vary across the genome, resulting in differing amounts of reduction in genetic diversity. Below, we show the variation in the strength of BGS across chromosome 22, as estimated by McVicker et al. 20082.

6 - Conclusions! The complex demographic history of humans has influenced the action of background selection on driving the patterns of diversity that exist within specific populations. Using data from the 1000 Genomes Project, we are able to recapitulate patterns that are evident by both forward simulations and population genetic theory. We also observe that linked selection, through the mechanism of BGS, can have a large impact on distorting the SFS. We caution that even strictly neutral sites can suffer from such distortions through the mere process of being on the same background as another site under selection and that care should be taken to account for such distortions when performing inference using the SFS.

References: 1. Abecasis G, et al. 2012. Nature. 491: 56-65. 2. McVicker G, et al. 2009. PLOS Genet. 5(5): e1000471. 3. Boyko AR, et al. 2008. PLOS Genet. 4(5): e1000083. 4. Gutenkunst RN, et al. 2009. PLOS Genet. 5(10): e1000695. 5. Hernandez RD. 2008. Bioinformatics. 24(23): 2786-2787. Acknowledgements: R.T. is supported by an NIGMS Initiative for Maximizing Student Development grant (R25-GM56847), an NSF Graduate Research Fellowship and a UCSF Earle C. Anthony Travel Award.

Na!

1.8 x Na!

0.28 x Na!

1.8 x Na! 3.4 x Na! 5.8 x Na!

0.15 x Na !

0.08 x Na !

AFR! EUR! ASN!

. . .�Deleterious locus:!!

Distribution of selection !coefficients: γ � Γ(α, β)

Length = 100 bp!

Uniform recombination

rate!

Demographic Model!Selection Model!

0.60

0.65

0.70

0.75

0.80

0.85

0.90

ππππ0

CONST AFR CEU ASNEUR! ASN!

Strength of Background Selection Across Chromosome 22

Position

Due

to B

ackg

roun

d Se

lect

ion

10000000 20000000 30000000 40000000

0.8

0.6

0.4

0.2

0

AFR!CONST!

Neutral Variant

Deleterious Variant

BGS!

Using a distribution of fitness effects and a demographic model inferred from previous studies3,4, we ran forward simulations using SFS_CODE5 to estimate the effect of human demography on determining the reduction in genetic diversity caused by BGS, observing that the effects of BGS are strongest for those populations that have experienced sharp population bottlenecks (i.e., Europeans and Asians). However, the expected reduction in diversity due to BGS across all human populations is still greater than for a simulated population of constant size, illustrating the importance of population expansions for determining the overall strength of BGS.

Estimate of Reduction in Diversity Due to BGS across African,

European, and Asian Populations!

π BG

S/π 0"

To observe if demography impacts the relative patterns of diversity (π) caused by BGS as predicted by our simulations, we measured the ratio of diversity across different bins of BGS strength, B, in the human genome in each of 20 populations from phase 3 of the TGP. We observe that BGS causes the greatest relative reduction of genetic diversity in non-African populations. To limit potential biases from coverage, high-coverage (exome) and low-coverage regions of the genome were analyzed separately.

π stro

ng -B

GS/

πw

eak-

BG

S"

π"

AFR!!

EUR!!

EASN!!

●●

● ● ● ●

●● ●

●●

● ●●

● ●

● ● ● ●

● ● ●

Fst (estimator method) vs. Background SelectionAfrican vs. Asian

0.10

0.12

0.14

0.16

0.18

0.20

0.22

0−24 225−249 475−499 725−749 975−1000

●● ●

● ● ●●

● ●

●●

●● ●

●●

● ●

●●

Fst (estimator method) vs. Background SelectionAfrican vs. European

0.10

0.12

0.14

0.16

0.18

0.20

0.22

0−24 225−249 475−499 725−749 975−1000

●● ●

●● ●

●● ● ● ●

● ● ● ●●

●●

● ●

● ●

●●

Fst (estimator method) vs. Background SelectionEuropean vs. Asian

0.10

0.12

0.14

0.16

0.18

0.20

0.22

0−24 225−249 475−499 725−749 975−1000

B value!

F ST!

BGS strength!

popu

latio

n di

ffere

ntia

tion!

F ST!

F ST!

B value! B value!

African vs. Asian! African vs. European! European vs. Asian!

High-Coverage (Exome)!

Position!

0.00

040.

0006

0.00

080.

0010

0.00

12

● ● ● ● ●

● ● ● ● ●● ● ● ● ●

● ● ● ● ●

ESN

LWK

YRI

MSL

GW

D

IBS

CEU TS

I

GBR FI

N

CH

B

JPT

CH

S

KHV

CD

X

BEB

PJL

ITU

STU

GIH

B : 0−50B : 476−525B : 951−1000

0.00

040.

0006

0.00

080.

0010

● ● ● ● ●

● ● ● ● ●● ● ● ● ●

● ● ● ● ●

ESN

LWK

YRI

MSL

GW

D

IBS

CEU TS

I

GBR FI

N

CH

B

JPT

CH

S

KHV

CD

X

BEB

PJL

ITU

STU

GIH

B : 0−50B : 476−525B : 951−1000

SASN!!

0.32

0.34

0.36

0.38

0.40

ESN

LWK

YRI

MSL

GWD

IBS

CEU TSI

GBR FIN

CHB

JPT

CHS

KHV

CDX

BEB

PJL

ITU

STU

GIH

(B=0−50)/(B=951−1000)

AFR!!

EUR!!

EASN!!

SASN!!

AFR!!

EUR!!

EASN!!

SASN!!

0.40

0.42

0.44

0.46

ESN

LWK

YRI

MSL

GWD

IBS

CEU TSI

GBR FIN

CHB

JPT

CHS

KHV

CDX

BEB

PJL

ITU

STU

GIH

(B=0−50)/(B=951−1000)

π stro

ng -B

GS/

πw

eak-

BG

S"

π"

AFR!!

EUR!!

EASN!!

SASN!!

High-Coverage (Exome)!

Low-Coverage!Low-Coverage!

4 - BGS Skews the SFS Towards Rare Variants!Purifying selection on linked sites can cause distortions in gene geneologies, leading to potential skews in the site-frequency spectrum. To investigate these effects, we measured the SFS as a function of B separately across the high-coverage and low-coverage regions of phase 3 TGP populations. We observed a marked increase in the number of of rare variants, especially singletons, in both datasets as a function of BGS strength. This pattern is amplified in non-African vs. African populations.

Derived Allele Count (log-scale)!

frequ

ency!

frequ

ency!

0.0

0.1

0.2

0.3

0.4

0.5

YRI

1 2 3 5 10 25 50 150

B : 0−50B : 476−525B : 951−1000

0.0

0.1

0.2

0.3

0.4

CHS

1 2 3 5 10 25 50 150

B : 0−50B : 476−525B : 951−1000

CHS!

0.0

0.1

0.2

0.3

0.4

0.5

TSI

1 2 3 5 10 25 50 150

B : 0−50B : 476−525B : 951−1000

Derived Allele Count (log-scale)!

0.0

0.1

0.2

0.3

0.4

0.5

CHS

1 2 3 5 10 25 50 150

B : 0−50B : 476−525B : 951−1000

Derived Allele Count (log-scale)!

0.0

0.1

0.2

0.3

0.4

0.5

ITU

1 2 3 5 10 25 50 150

B : 0−50B : 476−525B : 951−1000

Derived Allele Count (log-scale)!

0.0

0.1

0.2

0.3

0.4

ITU

1 2 3 5 10 25 50 150

B : 0−50B : 476−525B : 951−1000

0.0

0.1

0.2

0.3

0.4

YRI

1 2 3 5 10 25 50 150

B : 0−50B : 476−525B : 951−1000

YRI!

0.0

0.1

0.2

0.3

0.4

TSI

1 2 3 5 10 25 50 150

B : 0−50B : 476−525B : 951−1000

TSI!

Low-!Coverage!

High-!Coverage!

1

ratiovec[1]

ESN

LWK

YRI

MSL

GWD

IBS

CEU TSI

GBR FIN

CHB

JPT

CHS

KHV

CDX

BEB

PJL

ITU

STU

GIH

1.15

1.25

1.35

1.45 ●

Low−CoverageHigh−Coverage

Ratio of Singleton Frequency in Strong BGS Bin vs. Weak BGS Bin!

ITU!

AFR!!

EUR!!

EASN!!

SASN!!

ratio!

Estimate2 of the Strength of Background Selection Across Chromosome 22!

Frac

tion

of D

iver

sity!

Redu

ced

Due

to B

GS

(1-B)!

B value

www.postersession.com

4 - Demography and BGS Across 1000 Genomes !

4 - Population Differentiation as a Function of BGS!The decrease in Ne locally across the genome as a result of BGS (inferred2 by the value, B, in which lower values indicate stronger BGS) may impact the rate of genetic drift at specific loci. To investigate this effect, we measured FST between TGP populations as a function of BGS strength. Our results suggest that the strength of BGS is a predictor of population differentiation, with an increase in genetic drift driving this effect.

5 - Forward Simulations of Demography and BGS!

With the observations of thousands of genomes from global populations, worldwide patterns of human genetic diversity can be more fully described and the genome-wide impacts of natural selection on these patterns better understood. In many studies, genomic scans have revealed linked-selection, which manifests itself as a local trough in diversity, to be one important driver of such patterns. However, it less well understood how the actions of linked-selection interact with other drivers of population diversity, such as demography. Utilizing thousands of whole-genomes from phase 3 of the 1000 Genomes Project (TGP) 1, we investigate the impact of one mode of linked-selection, background selection (BGS), on population specific patterns of genetic diversity as a function of demographic history. Additionally, we also observe that BGS influences the site-frequency spectrum (SFS), generating an increase towards the proportion of rare variants. This observation is especially crucial since many inferences of demography utilize the SFS and can be biased if they are performed on regions of the genome under strong BGS.

1 - Introduction!

Background Selection Influences Population Specific Patterns of Diversity and the Site-

Frequency Spectrum in Humans !!

Raul Torres and Ryan D. Hernandez!University of California, San Francisco!

2 - Background Selection!Through purifying selection, neutral variants linked to a negatively selected site will be eliminated from the population in a process called ‘background selection’ (BGS). Due to recombination and variation in the magnitude of selection on deleterious variants, the strength of BGS will vary across the genome, resulting in differing amounts of reduction in genetic diversity. Below, we show the variation in the strength of BGS across chromosome 22, as estimated by McVicker et al. 20082.

6 - Conclusions! The complex demographic history of humans has influenced the action of background selection on driving the patterns of diversity that exist within specific populations. Using data from the 1000 Genomes Project, we are able to recapitulate patterns that are evident by both forward simulations and population genetic theory. We also observe that linked selection, through the mechanism of BGS, can have a large impact on distorting the SFS. We caution that even strictly neutral sites can suffer from such distortions through the mere process of being on the same background as another site under selection and that care should be taken to account for such distortions when performing inference using the SFS.

References: 1. Abecasis G, et al. 2012. Nature. 491: 56-65. 2. McVicker G, et al. 2009. PLOS Genet. 5(5): e1000471. 3. Boyko AR, et al. 2008. PLOS Genet. 4(5): e1000083. 4. Gutenkunst RN, et al. 2009. PLOS Genet. 5(10): e1000695. 5. Hernandez RD. 2008. Bioinformatics. 24(23): 2786-2787. Acknowledgements: R.T. is supported by an NIGMS Initiative for Maximizing Student Development grant (R25-GM56847), an NSF Graduate Research Fellowship and a UCSF Earle C. Anthony Travel Award.

Na!

1.8 x Na!

0.28 x Na!

1.8 x Na! 3.4 x Na! 5.8 x Na!

0.15 x Na !

0.08 x Na !

AFR! EUR! ASN!

. . .�Deleterious locus:!!

Distribution of selection !coefficients: γ � Γ(α, β)

Length = 100 bp!

Uniform recombination

rate!

Demographic Model!Selection Model!

0.60

0.65

0.70

0.75

0.80

0.85

0.90

ππππ0

CONST AFR CEU ASNEUR! ASN!

Strength of Background Selection Across Chromosome 22

Position

Due

to B

ackg

roun

d Se

lect

ion

10000000 20000000 30000000 40000000

0.8

0.6

0.4

0.2

0

AFR!CONST!

Neutral Variant

Deleterious Variant

BGS!

Using a distribution of fitness effects and a demographic model inferred from previous studies3,4, we ran forward simulations using SFS_CODE5 to estimate the effect of human demography on determining the reduction in genetic diversity caused by BGS, observing that the effects of BGS are strongest for those populations that have experienced sharp population bottlenecks (i.e., Europeans and Asians). However, the expected reduction in diversity due to BGS across all human populations is still greater than for a simulated population of constant size, illustrating the importance of population expansions for determining the overall strength of BGS.

Estimate of Reduction in Diversity Due to BGS across African,

European, and Asian Populations!

π BG

S/π 0"

To observe if demography impacts the relative patterns of diversity (π) caused by BGS as predicted by our simulations, we measured the ratio of diversity across different bins of BGS strength, B, in the human genome in each of 20 populations from phase 3 of the TGP. We observe that BGS causes the greatest relative reduction of genetic diversity in non-African populations. To limit potential biases from coverage, high-coverage (exome) and low-coverage regions of the genome were analyzed separately.

[Figure: Fst (estimator method) vs. Background Selection, three panels: African vs. Asian, African vs. European, European vs. Asian. x-axis: B value (BGS strength), bins from 0−24 to 975−1000; y-axis: FST (population differentiation), 0.10 to 0.22.]

[Figure: Diversity (π) in each of 20 TGP populations (ESN, LWK, YRI, MSL, GWD, IBS, CEU, TSI, GBR, FIN, CHB, JPT, CHS, KHV, CDX, BEB, PJL, ITU, STU, GIH; grouped as AFR, EUR, EASN, SASN) for B bins 0−50, 476−525, and 951−1000, with separate panels for High-Coverage (Exome) and Low-Coverage data.]

[Figure: Ratio π_strong-BGS/π_weak-BGS, (B = 0−50)/(B = 951−1000), for the same populations, with separate panels for High-Coverage (Exome) and Low-Coverage data.]

4 - BGS Skews the SFS Towards Rare Variants

Purifying selection on linked sites can distort gene genealogies, potentially skewing the site-frequency spectrum (SFS). To investigate these effects, we measured the SFS as a function of B separately across the high-coverage and low-coverage regions of phase 3 TGP populations. We observed a marked increase in the proportion of rare variants, especially singletons, in both datasets as BGS strength increases. This pattern is amplified in non-African relative to African populations.
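As a rough sketch of the comparison described above, the following computes a normalized SFS from derived-allele counts and the ratio of singleton proportions between a strong-BGS bin and a weak-BGS bin. The function names and toy counts are hypothetical, and the poster's own handling of coverage and sample-size projection is not reproduced here.

```python
import numpy as np

def normalized_sfs(derived_counts, n_chrom):
    """Proportion of segregating sites at each derived-allele count 1..n_chrom-1."""
    counts = np.asarray(derived_counts, dtype=int)
    counts = counts[(counts > 0) & (counts < n_chrom)]  # keep segregating sites only
    sfs = np.bincount(counts, minlength=n_chrom)[1:n_chrom]
    return sfs / sfs.sum()

def singleton_ratio(strong_counts, weak_counts, n_chrom):
    """Singleton proportion in the strong-BGS bin divided by that in the weak-BGS bin."""
    strong = normalized_sfs(strong_counts, n_chrom)[0]
    weak = normalized_sfs(weak_counts, n_chrom)[0]
    return strong / weak

# Toy data (hypothetical): derived-allele counts in a sample of 10 chromosomes.
strong_bin = [1, 1, 1, 2, 3]   # sites with B in 0-50
weak_bin = [1, 2, 2, 3, 5]     # sites with B in 951-1000
sr = singleton_ratio(strong_bin, weak_bin, n_chrom=10)
print(sr)
```

A ratio above 1 corresponds to an excess of singletons where BGS is strong, which is the direction of the skew the text reports.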

[Figure: Site-frequency spectra for YRI, CHS, TSI, and ITU in B bins 0−50, 476−525, and 951−1000, with separate rows for Low-Coverage and High-Coverage data. x-axis: Derived Allele Count (log-scale), 1 to 150; y-axis: frequency.]

[Figure: Ratio of Singleton Frequency in Strong BGS Bin vs. Weak BGS Bin for each of the 20 populations (grouped as AFR, EUR, EASN, SASN), Low-Coverage and High-Coverage. y-axis: ratio, 1.15 to 1.45.]

[Figure: Estimate [2] of the Strength of Background Selection Across Chromosome 22. y-axis: Fraction of Diversity Reduced Due to BGS (1−B), annotated strong to weak.]

• Neutral sites defined as PhyloP ⊂ (-1.2, 1.2)

Raul Torres

Page 21: Evolutionary Forces Shaping Human Genetic Variation #1000GB

BGS in the human genome

Low Coverage WGS

High Coverage exome


4 - Demography and BGS Across 1000 Genomes

4 - Population Differentiation as a Function of BGS

The local decrease in Ne across the genome caused by BGS (inferred [2] from the value B, where lower values indicate stronger BGS) may affect the rate of genetic drift at specific loci. To investigate this effect, we measured FST between TGP populations as a function of BGS strength. Our results suggest that the strength of BGS predicts population differentiation, with increased genetic drift driving this effect.
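One way to measure FST as a function of B is sketched below. The poster's figures say only "estimator method", so the choice of Hudson's estimator (ratio of averages across sites) is an assumption, as are the function names, bin edges, and toy allele frequencies.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST from sample allele frequencies, as a ratio of sums across sites.

    p1, p2 : sample allele frequencies in populations 1 and 2
    n1, n2 : number of sampled chromosomes in each population
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()

def fst_by_b_bin(p1, p2, b_values, n1, n2, bins):
    """FST computed separately for sites whose B value falls in each (lo, hi) bin."""
    p1, p2, b_values = (np.asarray(a, dtype=float) for a in (p1, p2, b_values))
    result = {}
    for lo, hi in bins:
        mask = (b_values >= lo) & (b_values <= hi)
        if mask.any():  # skip bins with no sites
            result[(lo, hi)] = hudson_fst(p1[mask], p2[mask], n1, n2)
    return result

# Toy data (hypothetical): four SNPs, two in a strong-BGS bin and two in a weak-BGS bin.
p_afr = [0.10, 0.50, 0.30, 0.20]
p_eur = [0.60, 0.10, 0.35, 0.25]
b_vals = [10, 20, 980, 990]
fst_bins = fst_by_b_bin(p_afr, p_eur, b_vals, 200, 200, [(0, 24), (975, 1000)])
print(fst_bins)
```

Summing numerators and denominators within a bin before dividing keeps rare variants from dominating the estimate, which matters here because strong-BGS bins are enriched for rare variants.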

5 - Forward Simulations of Demography and BGS

Background Selection Influences Population Specific Patterns of Diversity and the Site-Frequency Spectrum in Humans

Raul Torres and Ryan D. Hernandez
University of California, San Francisco

1 - Introduction

With thousands of genomes now sequenced from global populations, worldwide patterns of human genetic diversity can be described more fully, and the genome-wide impact of natural selection on these patterns better understood. In many studies, genomic scans have revealed linked selection, which manifests as a local trough in diversity, to be one important driver of such patterns. However, it is less well understood how linked selection interacts with other drivers of population diversity, such as demography. Using thousands of whole genomes from phase 3 of the 1000 Genomes Project (TGP) [1], we investigate the impact of one mode of linked selection, background selection (BGS), on population-specific patterns of genetic diversity as a function of demographic history. We also observe that BGS influences the site-frequency spectrum (SFS), increasing the proportion of rare variants. This observation is especially important because many inferences of demography use the SFS and can be biased if they are performed on regions of the genome under strong BGS.

2 - Background Selection

Through purifying selection, neutral variants linked to a negatively selected site are eliminated from the population, a process called 'background selection' (BGS). Because of recombination and variation in the magnitude of selection on deleterious variants, the strength of BGS varies across the genome, resulting in differing reductions in genetic diversity. The poster figure shows the variation in the strength of BGS across chromosome 22, as estimated by McVicker et al. 2009 [2].
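To make the dependence on recombination and selection strength concrete, here is a small sketch based on a standard approximation from the background-selection literature, in which each deleterious site i contributes u·t / (t + r_i(1 − t))² to the exponent of B = π/π0, with t the heterozygous selection coefficient and r_i the recombination fraction between site i and the focal neutral site. This formula is not stated on the poster, and all parameter values and names below are assumptions chosen for illustration.

```python
import math

def bgs_b(deleterious_positions, focal, u, t, r_per_bp):
    """Approximate B = pi/pi0 at a focal neutral site.

    deleterious_positions : positions (bp) of sites under purifying selection
    focal                 : position (bp) of the neutral site
    u                     : deleterious mutation rate per site per generation
    t                     : heterozygous selection coefficient (s*h)
    r_per_bp              : recombination rate per bp (assumed uniform)
    """
    exponent = 0.0
    for x in deleterious_positions:
        # Recombination fraction assumed linear in distance, capped at 0.5.
        r = min(0.5, r_per_bp * abs(x - focal))
        exponent += u * t / (t + r * (1.0 - t)) ** 2
    return math.exp(-exponent)

# Hypothetical parameters: a 100 bp deleterious locus, neutral sites near and far.
locus = range(100)
b_near = bgs_b(locus, focal=50, u=1e-8, t=1e-3, r_per_bp=1e-8)
b_far = bgs_b(locus, focal=1_000_000, u=1e-8, t=1e-3, r_per_bp=1e-8)
print(b_near, b_far)
```

Under this approximation B approaches 1 (no reduction) as the neutral site moves away from the selected locus, which is why the strength of BGS varies along a chromosome.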

6 - Conclusions

The complex demographic history of humans has influenced how background selection shapes the patterns of diversity within specific populations. Using data from the 1000 Genomes Project, we recapitulate patterns predicted by both forward simulations and population genetic theory. We also observe that linked selection, through the mechanism of BGS, can substantially distort the SFS. We caution that even strictly neutral sites can suffer such distortions merely by sharing a genetic background with a site under selection, and that care should be taken to account for them when performing inference using the SFS.

References:
1. Abecasis G, et al. 2012. Nature. 491: 56-65.
2. McVicker G, et al. 2009. PLOS Genet. 5(5): e1000471.
3. Boyko AR, et al. 2008. PLOS Genet. 4(5): e1000083.
4. Gutenkunst RN, et al. 2009. PLOS Genet. 5(10): e1000695.
5. Hernandez RD. 2008. Bioinformatics. 24(23): 2786-2787.

Acknowledgements: R.T. is supported by an NIGMS Initiative for Maximizing Student Development grant (R25-GM56847), an NSF Graduate Research Fellowship and a UCSF Earle C. Anthony Travel Award.

[Figure: Demographic Model for AFR, EUR, and ASN, with population sizes labeled in units of Na: Na, 1.8 x Na, 0.28 x Na, 1.8 x Na, 3.4 x Na, 5.8 x Na, 0.15 x Na, 0.08 x Na.]

[Figure: Selection Model. Deleterious locus, length = 100 bp, uniform recombination rate; distribution of selection coefficients: γ ~ Γ(α, β).]

[Figure: π/π0 (0.60 to 0.90) for CONST, AFR, CEU, and ASN.]

[Figure: Strength of Background Selection Across Chromosome 22. x-axis: Position (10000000 to 40000000); y-axis label ends "Due to Background Selection" (0 to 0.8).]

[Figure: schematic of BGS. Legend: Neutral Variant, Deleterious Variant.]


Page 22: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Modeling assumptions impact results

[Figure: π/π0 (0.0 to 1.0) as a function of distance (bp), from -20000 to 20000, in four panels: Multiplicative, 2Ns = -5; Additive, 2Ns = -5; Multiplicative, 2Ns ~ Γ(α, β); Additive, 2Ns ~ Γ(α, β). Curves are shown for L = 5e4, 5e5, 1e6, 1e7, 5e7, and 1e8.]

Lawrence Uricchio

Page 23: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Complex signatures of selection

• Soft selective sweeps result in multiple haplotypes increasing in frequency.

Soft Sweep

Zach Szpiech

Page 24: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Extended Multiple Haplotype Homozygosity

-- haplotype sample size

-- set of distinct haplotypes from the locus to marker x

-- ith most frequent haplotype

-- number of haplotypes

[Figure: EHH versus Distance from core (bp). Legend: Adaptive, Neutral.]
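The symbols and formula for the slide's extended statistic did not survive in this transcript, so the sketch below shows only ordinary extended haplotype homozygosity (EHH), computed from the same ingredients the slide lists: the haplotype sample size and the set of distinct haplotypes between the core locus and marker x. The function name and toy haplotypes are hypothetical, and this is not the redacted statistic itself.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, x):
    """Probability that two sampled haplotypes are identical from the core to marker x.

    haplotypes : equal-length sequences of alleles (strings or tuples)
    core, x    : marker indices of the core locus and the marker of interest
    """
    lo, hi = sorted((core, x))
    n = len(haplotypes)
    # Count each distinct haplotype over the interval [core, x].
    groups = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)

# Toy data (hypothetical): four haplotypes that all carry the core allele at marker 0.
haps = ["0000", "0001", "0011", "0000"]
decay = [ehh(haps, core=0, x=x) for x in range(4)]
print(decay)
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break haplotypes apart; a slower decay is the signature of a recent sweep.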

SelScan: Szpiech & Hernandez (arXiv)

Sorry, redacted for now… More coming soon!!

Page 25: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Power

[Figure: six panels in two rows. Top row: % increase in power over iHS for Constant Demography, African Demography, and European Demography (s = 0.01). Bottom row: Power (0% to 100%) for the same three demographies (s = 0.01). Axis and legend labels: Frequency at which selection begins; Sampling Frequency. Tick values: 0, 0.01, 0.02, 0.05, 0.10 and 0.70, 0.80, 0.90.]

Page 26: Evolutionary Forces Shaping Human Genetic Variation #1000GB

A genomic approach to detecting selection

• Most SNPs are non-coding.

• Most regulatory elements do not act on the nearest gene.

• We can use genome-wide signatures of selection to infer selection on genes using eQTL information.

ARTICLE

Sherlock: Detecting Gene-Disease Associations by Matching Patterns of Expression QTL and GWAS

Xin He,1,2 Chris K. Fuller,1 Yi Song,1 Qingying Meng,3 Bin Zhang,4 Xia Yang,3 and Hao Li1,*

Genetic mapping of complex diseases to date depends on variations inside or close to the genes that perturb their activities. A strong body of evidence suggests that changes in gene expression play a key role in complex diseases and that numerous loci perturb gene expression in trans. The information in trans variants, however, has largely been ignored in the current analysis paradigm. Here we present a statistical framework for genetic mapping by utilizing collective information in both cis and trans variants. We reason that for a disease-associated gene, any genetic variation that perturbs its expression is also likely to influence the disease risk. Thus, the expression quantitative trait loci (eQTL) of the gene, which constitute a unique "genetic signature," should overlap significantly with the set of loci associated with the disease. We translate this idea into a computational algorithm (named Sherlock) to search for gene-disease associations from GWASs, taking advantage of independent eQTL data. Application of this strategy to Crohn disease and type 2 diabetes predicts a number of genes with possible disease roles, including several predictions supported by solid experimental evidence. Importantly, predicted genes are often implicated by multiple trans eQTL with moderate associations. These genes are far from any GWAS association signals and thus cannot be identified from the GWAS alone. Our approach allows analysis of association data from a new perspective and is applicable to any complex phenotype. It is readily generalizable to molecular traits other than gene expression, such as metabolites, noncoding RNAs, and epigenetic modifications.

Introduction

Recent application of genome-wide association studies (GWASs) to complex human diseases led to the discovery that the majority of disease-associated variants (estimated to be as high as 88%) are located in noncoding sequences, potentially affecting gene expression rather than protein function.1,2 Because of the complexity of gene regulation, the expression of a gene can be modulated by mutations in cis (proximal to the gene) and/or in trans (distal to the gene or on different chromosomes, such as upstream transcription/chromatin factors, distal regulatory elements, etc.).3,4 As a result of a large mutational target size (primarily because of mutations in trans) and the buffering of gene regulatory systems that helps tolerate expression changes, genetic variants altering expression levels are common in populations.5 Indeed, many studies of expression quantitative trait loci (eQTL) demonstrate that the expression of most genes is influenced by multiple loci, most of which act in trans.6–9 Despite their individually small effect sizes, trans eQTL are collectively important for variation of gene expression and by some estimates account for a larger proportion of the heritability of gene expression than do cis eQTL.8,10

Because of their prevalence in the human population, expression variations, especially those in trans, provide systematic perturbations of the gene regulatory networks underlying various complex phenotypes, and as such might reveal important information about the genetic basis of these phenotypes. Thus there is a pressing need to develop a general framework to mine the collective information in both cis- and trans-expression QTL in the context of association studies. So far, information from trans variations has largely been ignored because only cis variants can be assigned to their target genes based on proximity by using the GWAS data alone. The growing collection of eQTL data for various human tissues makes it possible to associate trans variants with target genes.8 Although previous studies demonstrated the utility of eQTL data for aiding the analysis of association studies,9,11 most of these used only cis eQTL located close to the genes. This reflects some fundamental difficulties of utilizing information in trans. Because trans eQTL are usually much weaker than those in cis, the statistical signal of an individual trans eQTL is difficult to detect; it may fall far below the genome-wide threshold.9 Another major challenge is the pleiotropic effect of trans variation. A gene's trans perturbation may come from the mutation of a regulatory molecule, but this mutation may also affect multiple other genes.

Here we present a general strategy to infer genes whose perturbations contribute to the etiology of complex diseases by tapping into statistical information provided by both cis and trans variations affecting gene expression. Although individual variants are often weak and not particularly informative, the overall pattern of expression variants of a gene can provide a strong statistical signal. A unique aspect of this strategy is that because we utilize trans variants far from target genes, it is possible to identify important genes distal to any GWAS association signals and thus impossible to detect with GWAS alone.

1 Department of Biochemistry and Biophysics, University of California at San Francisco, San Francisco, CA 94143, USA; 2 Lane Center of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; 3 Department of Integrative Biology and Physiology, University of California at Los Angeles, Los Angeles, CA 90095, USA; 4 Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029, USA
*Correspondence: [email protected]
http://dx.doi.org/10.1016/j.ajhg.2013.03.022. © 2013 by The American Society of Human Genetics. All rights reserved.

The American Journal of Human Genetics 92, 667–680, May 2, 2013 667

He et al. AJHG (2013)

Page 27: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Detecting selection on regulatory networks

The likelihood function at each SNP at a given Z is computed by summing over the hidden variables U_i and V_i:

$$P(x_i, y_i \mid Z) = \sum_{U_i, V_i} P(U_i)\, P(V_i \mid Z, U_i)\, P(x_i \mid U_i)\, P(y_i \mid V_i).$$
(Equation 3)

U_i is a Bernoulli random variable with the success probability α (the prior probability of a SNP being associated with the expression trait). The information about the gene-disease relationship is encoded in the conditional probability P(V_i | Z, U_i). When Z = 0 or when Z = 1 and U_i = 0 (a false eSNP), V_i is a Bernoulli random variable with the success probability β (the prior probability of a SNP being associated with the phenotype). When Z = 1 and U_i = 1, according to our discussion before, V_i should also be 1. The probability terms P(x_i | U_i) and P(y_i | V_i) reflect the distribution of p values under the null or alternative hypothesis, and we derive these distributions in Appendix A.

The Bayes factor defined in Equation 2 can be expressed in terms of the following variables:

$$B_{i,x} = \frac{P(x_i \mid U_i = 1)}{P(x_i \mid U_i = 0)}, \qquad B_{i,y} = \frac{P(y_i \mid V_i = 1)}{P(y_i \mid V_i = 0)}.$$
(Equation 4)

These are the Bayes factors measuring the association of the ith SNP with the expression and the phenotypic trait, respectively. We show, in Appendix B, that the BF of the ith SNP in Equation 2 is given by

$$B_i = \frac{1-\alpha}{1-\alpha+\alpha B_{i,x}} + \frac{\alpha B_{i,x}}{1-\alpha+\alpha B_{i,x}} \cdot \frac{B_{i,y}}{1-\beta+\beta B_{i,y}}.$$
(Equation 5)

Thus the Bayes factor of the gene being tested depends only on the parameters α, β, and the SNP-level Bayes factors. If Bayesian inference has been performed in both the eQTL and GWAS analysis, it is straightforward to combine the resulting BFs to obtain the BF for the gene.
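Equation 5 is simple enough to evaluate directly. The sketch below implements only the per-SNP Bayes factor as written in the excerpt; the function name and the example values of α, β, and the SNP-level Bayes factors are hypothetical, and how per-SNP values are combined into a gene-level score (Equation 2) is not shown in this excerpt, so it is left out.

```python
def snp_bayes_factor(alpha, beta, b_x, b_y):
    """Per-SNP Bayes factor B_i from Equation 5.

    alpha : prior probability that a SNP is associated with the expression trait
    beta  : prior probability that a SNP is associated with the phenotype
    b_x   : Bayes factor for association with expression (B_{i,x})
    b_y   : Bayes factor for association with the phenotype (B_{i,y})
    """
    denom_x = 1.0 - alpha + alpha * b_x
    denom_y = 1.0 - beta + beta * b_y
    return (1.0 - alpha) / denom_x + (alpha * b_x / denom_x) * (b_y / denom_y)

# Hypothetical values: a strong eQTL with and without a matching GWAS signal.
matched = snp_bayes_factor(0.01, 0.01, b_x=100.0, b_y=100.0)
unmatched = snp_bayes_factor(0.01, 0.01, b_x=100.0, b_y=0.1)
uninformative = snp_bayes_factor(0.01, 0.01, b_x=5.0, b_y=1.0)
print(matched, unmatched, uninformative)
```

When the GWAS data are uninformative at a SNP (B_{i,y} = 1), the two terms sum to exactly 1, so that SNP neither supports nor penalizes the gene. A strong eQTL with no GWAS signal gives a value below 1, which is how the model penalizes mismatched signatures.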

Figure 1. The Sherlock Algorithm: Matching Genetic Signatures of Gene Expression Traits to that of the Disease to Identify Gene-Disease Associations
(A) Perturbation of the expression level of a disease-associated gene at any of its eQTL changes the disease risk, and thus the eQTL tend to be associated with the complex disease as well (the dashed lines). The eQTL associations may contain false positives, so we use binary indicator variables, U, to represent the true SNP-gene expression relationship; similarly we use indicator variables, V, for the SNP-disease relationship. Z is a binary variable indicating whether the expression trait influences the disease risk.
(B) Hypothetic genome-wide association plots of the causal expression trait (top) and a complex disease (bottom). The genetic signature of the gene expression trait partially overlaps with that of the disease. Red arrows indicate the matched loci.
(C) Alignment of genetic signatures of a gene expression trait and the phenotype. Three different scenarios are shown, represented by the green, red, and black boxes.
(D) The probabilistic model representing the dependency of the variables. The semantics of the variables U, V, and Z are shown in (A). When Z = 0, U and V are independent (top). When Z = 1, V depends on both Z and U; if U = 1, then V is also likely to be 1 (bottom). The association statistics of a SNP with respect to the gene expression trait and the disease (x and y) depend on the hidden variables U and V. Shaded and open circles indicate observed and latent variables, respectively.


He et al. AJHG (2013)

Page 28: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Detecting selection on regulatory networks

Rank    GENE        BF
1       IFNAR2      14.0216
2       DARS        13.3106
3       RARRES2     12.7859
4       SLC25A43    11.8157
5**     EXT1        11.4169
6       FAM20B      11.3852
7**     MICB        11.2997
8**     MICA        11.2997
9**     HLA-B       11.2997
10**    HLA-C       11.2359
11      RHBDL1      11.1828
12**    RBMS3       11.142
13      FNBP1       11.1387
14      P4HB        10.8784
15**    SOX5        10.8667
16      KCNK3       10.7472
17      RGS20       10.5487
18      MPST        10.5474
19**    HLA-DPB1    10.4441
20      QSOX1       10.4326
21**    IL16        10.4201
22**    SYT17       10.3908
23      MALL        10.3165
24**    CRTC1       10.2577
25      MEMO1       10.2574
26      ISOC2       10.2464
27      PCF11       10.0775
28      XKR8        10.0043
29      RNF216L     10.0043
30**    SCG2        10.0012

** indicates genes in GWAS association database for complex phenotype

Page 29: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Selection on standing variation driven by response to pathogens

Description P-value FDR q-value

cytokine-mediated signaling pathway 5.92E-06 6.26E-02

immune effector process 7.47E-06 3.95E-02

regulation of immune system process 7.47E-06 2.64E-02

regulation of defense response to virus 8.53E-06 2.26E-02

lymphocyte costimulation 9.36E-06 1.98E-02

T cell costimulation 9.36E-06 1.65E-02

GOrilla

Page 30: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Haplotype-based selection signals recapitulate geography

[Figure: PCA of the top 1% of windows. x-axis: PC1 (14.4%); y-axis: PC2 (12.6%). Points are labeled by population: ACB, ASW, CDX, CEU, CHB, CHS, CLM, FIN, GBR, GIH, IBS, JPT, KHV, LWK, MKK, MXL, PEL, PUR, TSI, YRI.]

• TGP samples with phased OMNI genotype data

• Used iHS

• 100kb windows for each population are coded 1 if selection score is in top 1% (0 otherwise)
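The coding and projection steps in the bullets above might be sketched as follows. The per-window scores here are random placeholders, since the slide does not say how iHS values were summarized per 100kb window, and the function names are hypothetical. PCA is done by SVD of the column-centered 0/1 matrix.

```python
import numpy as np

def top_fraction_indicator(scores, top_fraction=0.01):
    """Code each window 1 if its score is in the population's top fraction, else 0.

    scores : array of shape (n_populations, n_windows)
    """
    scores = np.asarray(scores, dtype=float)
    cutoffs = np.quantile(scores, 1.0 - top_fraction, axis=1, keepdims=True)
    return (scores >= cutoffs).astype(float)

def pca(matrix, n_components=2):
    """Principal-component coordinates and variance fractions via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return coords, explained[:n_components]

# Placeholder data (hypothetical): 6 populations, 1000 windows of random scores.
rng = np.random.default_rng(0)
window_scores = rng.normal(size=(6, 1000))
indicator = top_fraction_indicator(window_scores)
coords, explained = pca(indicator)
print(coords.shape, explained)
```

With real data, populations that share selection signals in the same windows would have similar rows in the indicator matrix and so would fall close together on PC1 and PC2.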

Page 31: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Conclusions

• Many complex signatures of selection in the human genome.

• Mixtures of positive and negative selection

• Complicated modes of selection (including soft sweeps)

• Predominant signature of ancient human-lineage selection seems to be from olfactory processes

• Recent selection on standing variation associated with complex traits, including pathogen response.

Page 32: Evolutionary Forces Shaping Human Genetic Variation #1000GB

Thanks!

1000 Genomes Project Consortium

Funding: NHGRI; QB3; CHARM; CTSI

[email protected]

Nicolas Strauli

Cyrus Maher

Raul Torres

Lawrence Uricchio

Zach Szpiech