Ecological genetics of Arabidopsis thaliana from reservoir populations in low-disturbance habitats

Neil Pearson, Warwick HRI
Contact: n.pearson@warwick.ac.uk
Student ID: 0867630
Supervisors: Professor Eric Holub, Professor Robin Allaby
Funding: BBSRC

Summary

Bioinformatic analyses of high-density SNP data from Arabidopsis thaliana accessions collected in the UK will attempt to identify regions in the genome that could trace the history of haplotypes back to founding populations, and to determine whether regions are under selection due to parasitism (e.g., by Albugo candida, the white blister rust pathogen). A number of genome-wide studies have recently discovered evidence of selection in the human genome, and this project will extend such techniques into the field of Arabidopsis genomic research.

Haplotype blocks under selection will be identified by incidences of contiguous covariant SNPs at a rate significantly higher than expected under a neutral model. The lengths and distributions of these haplotypes may grant insight into the migratory history and recent adaptive walks of A. thaliana, and may also indicate the allelic composition of the founders of the UK population. The population history inference software DADI will also be used to compare a frequency spectrum derived from SNP data with models of potential population histories.

In parallel, genome-wide association mapping will be used to identify regions that may confer susceptibility to the common oomycete parasite Albugo candida (white blister rust) in a global sample of A. thaliana. Mapping the response to A. candida will allow a test for correlation with haplotype blocks, which would indicate positive selection for resistance to infection. Correlation between haplotype blocks and environment types (gardens versus low-disturbance wall sites), or between broader habitat types, will be investigated using available geographic data. A software solution dedicated to finding evidence of positive selection from such combined SNP and phenotype data will be produced and released to facilitate further research into the underlying genetic causes of phenotypic variation using this approach.

The aim of this project is therefore to identify local genetic variation in A. thaliana that can be attributed to the action of selection, especially that caused by A. candida.

Objective 1: Identification of haplotypes

Previous work (Platt et al., 2010) indicated that globally, the Arabidopsis thaliana population followed a model of gene flow known as isolation by distance – in which the likelihood of two individuals sharing alleles decreases as geographic separation increases – and that this model held true at all scales examined across Eurasia. Due to the relatively small number of loci available, however, this approach could not be used to investigate specific predictions concerning selection acting on particular loci. Applying a similar approach with the 250K SNP dataset, though, allows such predictions to be made.

Two subsets of accessions were selected from the full dataset: accessions collected in the UK, and accessions from the Nordborg-Bergelson set. SNP data were divided into windows of 100 adjacent loci each, and a script was used to locate pairwise similarity of 99% or greater per window. A K-means clustering algorithm was then used to separate out haplotypes that failed to be distinguished due to proximity (see Figure 2).
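The windowing step described above can be sketched as follows. This is a minimal illustration, not the script used in the project: the function names, the "0"/"1" allele coding, the toy accession names, and the choice to ignore missing calls and incomplete trailing windows are all assumptions made for the example.

```python
from itertools import combinations


def window_similarity(a, b):
    """Fraction of loci at which two equal-length allele sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def similar_pairs_by_window(genotypes, window_size=100, threshold=0.99):
    """Map window index -> accession pairs whose similarity meets the threshold.

    genotypes: dict of accession name -> allele string, all of equal length.
    Trailing loci that do not fill a complete window are ignored (assumption).
    """
    names = sorted(genotypes)
    n_loci = len(genotypes[names[0]])
    hits = {}
    for w, start in enumerate(range(0, n_loci - window_size + 1, window_size)):
        end = start + window_size
        pairs = [
            (a, b)
            for a, b in combinations(names, 2)
            if window_similarity(genotypes[a][start:end],
                                 genotypes[b][start:end]) >= threshold
        ]
        if pairs:
            hits[w] = pairs
    return hits


# Toy data: three made-up accessions, 200 loci, alleles coded "0"/"1".
toy = {
    "acc_A": "0" * 200,
    "acc_B": "0" * 99 + "1" + "0" * 50 + "1" * 50,
    "acc_C": "1" * 200,
}
print(similar_pairs_by_window(toy))
```

In the toy data only acc_A and acc_B reach 99% similarity, and only in the first window; runs of consecutive windows shared by the same accessions are what the clustering step then has to resolve into haplotypes.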

Results (Images 3a and 3b) show a close similarity in the distribution of haplotypes across the genome in both subsets. This suggests that most haplotypes are older than the species’ entry to the UK; this will be tested by a comparison seeking corresponding blocks occurring in both subsets, which will also allow haplotype frequencies to be compared between subsets (see Objective 3).

Objective 2: QTL mapping Albugo resistance

A set of Multiparent Advanced Generation Inter-cross (MAGIC) lines (see Kover et al., 2009) was grown and inoculated with ACEM2, a race of Albugo candida, and the resulting phenotypes were recorded (see Images 2a to 2g). Analysis of phenotypes using MAGIC mapping software revealed three association peaks, closely corresponding with genes WRR4 (Borhan et al., 2008), WRR5/6 (Holub and Cevik, pers. comm.), and an unnamed gene. The first analysis, using all lines, revealed one strong association peak (see Images 3c and 3d). Removing lines showing complete resistance revealed two more peaks (Image 3e).

This experiment is now being repeated with a second A. candida isolate collected from C. bursa-pastoris, in order to establish whether the two isolates are clonal and provoke the same phenotypes, associated with the same defence genes, in A. thaliana.

Objective 3: Further investigation

Several lines of enquiry may now be followed:

• Measure A. candida resistance phenotypes of the accessions used in the 250K dataset

• Carry out DADI analysis (Gutenkunst et al., 2010) of frequency spectra in order to infer population history, comparing them with models specified from data derived from Platt et al. (2010), in addition to simple geographic correlations of haplotypes

• Use Kimura’s equation (Kimura & Ohta, 1973) to estimate divergence time (in generations) between haplotypes found in UK and Nordborg-Bergelson accessions, assuming neutrality

• Resample in regions showing differences in frequency, and use F-statistics and tests of Hardy-Weinberg equilibrium to identify instances of gene flow between populations in distinct geographic areas, and of selection
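The quantities named in the last two points can be sketched as small helper functions. This is an illustrative sketch under stated assumptions, not the project's analysis code: the function names are hypothetical, the age formula is the expected age of a neutral allele at frequency x in a diploid population of effective size N_e (one reading of "Kimura's equation"), F_ST is computed for a single biallelic locus with two equally weighted populations, and the Hardy-Weinberg test is a plain chi-square statistic on genotype counts.

```python
import math


def kimura_ohta_age(x, n_e):
    """Expected age, in generations, of a neutral allele at frequency x.

    Uses t = -4 * N_e * (x / (1 - x)) * ln(x), assuming neutrality and a
    diploid population of constant effective size n_e.
    """
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    return -4 * n_e * (x / (1 - x)) * math.log(x)


def fst(p1, p2):
    """Wright's F_ST at one biallelic locus from two allele frequencies."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)       # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                          # locus is monomorphic overall
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)         # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Made-up inputs, purely to show the calls.
print(kimura_ohta_age(0.5, 1000))
print(fst(0.2, 0.8))
print(hwe_chi_square(50, 0, 50))
```

Per-locus values like these would normally be summarised over windows or haplotype blocks before being compared between the UK and Nordborg-Bergelson subsets.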

The end goal…

RELATE THE ECOLOGY TO THE GENETICS

An overview of run structure in this group:

Window     1  2  3  4  5  6  7  8  9  10
Sq_1       |  |  |  |  |  -  -  -  -  -
NFA_8      -  |  |  |  |  -  -  -  -  -
Hil_1      |  |  |  |  |  -  -  -  -  -
Crl_1      |  |  |  |  |  -  -  -  -  -
Edburgh_8  |  |  -  -  -  -  -  -  -  -
HR_5       -  -  -  -  -  -  |  |  |  |
HR_10      -  -  -  -  |  |  |  |  |  |
Cnt_1      -  -  -  -  |  |  |  |  |  |
UKSE6_640  -  -  -  -  -  -  |  |  -  -
UKSE6_618  -  -  -  -  |  |  |  |  |  |
UKID35     -  -  -  -  -  -  |  |  -  -
UKID87     -  -  -  -  |  |  |  |  |  |
UKID103    -  -  -  -  |  |  |  |  |  |
UKID28     -  -  -  -  |  |  -  -  -  -
UKID17     -  -  -  -  |  |  |  |  |  |
CIBC_5A    -  -  -  -  -  -  |  |  |  |
UKSE6_626  -  -  -  -  -  -  |  |  |  |
PHW_13     -  -  -  -  -  -  -  -  |  |

Figure 2 An example of similarity within a short series of windows, demonstrating the necessity of employing clustering analysis to determine haplotype structure. Vertical marks represent ≥99% similarity between two or more accessions; horizontal marks represent less extensive similarity.
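One way such overlapping runs might be separated is to treat each accession's row of marks as a 0/1 vector and cluster the rows. The sketch below is an assumption about how k-means could be applied here, not the project's implementation: the encoding, the function names, the deterministic choice of starting centroids, and the value of k are all illustrative.

```python
def encode(row):
    """Turn a row of marks such as '| | - -' into a 0/1 list ('|' -> 1)."""
    return [1 if mark == "|" else 0 for mark in row.split()]


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per point.

    Starting centroids are the first k distinct points, so the result is
    reproducible (a simplification; random restarts are more usual).
    """
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer than k distinct points")
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break                            # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep old centroid if empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Four rows taken from Figure 2.
rows = {
    "Sq_1":  "| | | | | - - - - -",
    "Hil_1": "| | | | | - - - - -",
    "HR_10": "- - - - | | | | | |",
    "Cnt_1": "- - - - | | | | | |",
}
labels = kmeans([encode(r) for r in rows.values()], k=2)
print(dict(zip(rows, labels)))
```

Here Sq_1 and Hil_1 fall into one cluster and HR_10 and Cnt_1 into the other, even though all four accessions are marked as similar in window 5.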

Background

Efforts to understand the genetic basis of phenotypic diversity have advanced in three major stages, with techniques generally being pioneered in human genetics and subsequently applied to the study of other model organisms, including A. thaliana:

1. Human Genome Project: First complete sequence of the entire genome. Raised the possibility that the genetic causes of all phenotypic variation might soon be known

2. International HapMap Consortium: Utilised high-density SNP data to attempt to trace genetic differences responsible for phenotypic variation. Shifted perspective from simple Mendelian characters to more complex, quantitative traits, as described by Plomin (2010)

3. 1000 Genomes Project: Resequencing effort, made possible by technological advances. Addresses biases inherent to the HapMap approach, and enables comprehensive genome-wide association mapping techniques

These techniques, pioneered in human genetic research, have proven effective when applied to A. thaliana, being used – for example – to identify genes associated with flowering time (Ehrenreich et al., 2009). It is further argued (Bergelson and Roux, 2010) that placing such genome-wide association studies in an ecological context enables the study of past evolutionary trajectories, and the prediction of future ones – i.e., selective walks.

Following this line of thought, three complementary tests were applied to a set of genome-wide SNP data generated from A. thaliana by Horton et al. (2012), in order to identify previously unknown genomic regions that are under selection. Many were found, but the exact details of the population history responsible for these results are, as yet, unknown.

DADI analysis

Image 1 Albugo candida, encountered on Arabidopsis thaliana’s close relative and competitor Capsella bursa-pastoris (shepherd’s purse), causing the disease ‘white rust’

Image 3a-e Haplotypes (≥99% similarity) identified from 250K SNP data in UK accessions (a) and international accessions (b), and MAGIC mapping traces using all phenotypes (c), binary resistant/susceptible (d), and entirely resistant phenotypes removed (e)

Image 4 PCA of international 250K SNP data (taken from Horton et al., 2012, supplementary data)

Images 2a-g Observed response phenotypes to Albugo candida infection in Arabidopsis thaliana, ranging from complete resistance (a) through partial resistance (b, c, d, e) to full susceptibility (f, g)

Image 5 Initial 2-dimensional comparison of 250K SNP data (UK and Nordborg-Bergelson groups) against a frequency spectrum (FS) derived from a bottlenecked and diverging population model. Note that the process of constructing the data is, as yet, flawed.
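For orientation, a 2-dimensional (joint) frequency spectrum of the kind compared in Image 5 can be built by counting, at every SNP, how many copies of one allele occur in each group. The sketch below is an assumption-laden illustration and does not use the DADI package: the function name and toy haplotypes are hypothetical, alleles are coded "0"/"1" with "1" taken as the counted (ideally derived) allele, and missing calls are not handled.

```python
def joint_sfs(pop1, pop2):
    """Joint frequency spectrum of two groups of haplotypes.

    pop1, pop2: lists of equal-length allele strings coded '0'/'1'.
    Returns a (len(pop1) + 1) x (len(pop2) + 1) table in which entry [i][j]
    is the number of loci with i copies of '1' in pop1 and j in pop2.
    """
    n1, n2 = len(pop1), len(pop2)
    n_loci = len(pop1[0])
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for locus in range(n_loci):
        count1 = sum(h[locus] == "1" for h in pop1)
        count2 = sum(h[locus] == "1" for h in pop2)
        sfs[count1][count2] += 1
    return sfs


# Made-up haplotypes: two from one group, three from the other, four loci.
group_uk = ["0101", "0111"]
group_nb = ["0011", "0001", "0001"]
for row in joint_sfs(group_uk, group_nb):
    print(row)
```

A table of this shape, built from the observed SNP data, is what would be compared against the spectrum expected under a candidate population history such as the bottleneck-and-divergence model.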

References

• Borhan, M. H., Gunn, N., Cooper, A., Gulden, S., Tör, M., Rimmer, S. R., & Holub, E. B. (2008). WRR4 encodes a TIR-NB-LRR protein that confers broad-spectrum white rust resistance in Arabidopsis thaliana to four physiological races of Albugo candida. Molecular Plant-Microbe Interactions, 21(6), 757-68.

• Ehrenreich, I. M., Hanzawa, Y., Chou, L., Roe, J. L., Kover, P. X., & Purugganan, M. D. (2009). Candidate gene association mapping of Arabidopsis flowering time. Genetics, 183(1), 325-35.

• Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C. D. (2010). Inferring the demographic history of multiple populations from genomic polymorphism data. Statistics, 4-4.

• Horton, M. W., Hancock, A. M., Huang, Y. S., Toomajian, C., Atwell, S., Auton, A., Muliyati, N. W., et al. (2012). Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nature Genetics, 44(2), 212-216.

• Kimura, M., & Ohta, T. (1973). The age of a neutral mutant persisting in a finite population. Genetics, 75(1). Retrieved from http://www.genetics.org/content/75/1/199.short

• Kover, P. X., Valdar, W., Trakalo, J., Scarcelli, N., Ehrenreich, I. M., Purugganan, M. D., Durrant, C., et al. (2009). A Multiparent Advanced Generation Inter-Cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genetics, 5(7), e1000551.

• Platt, A., Horton, M., Huang, Y. S., Li, Y., Anastasio, A. E., Mulyati, N. W., Agren, J., et al. (2010). The scale of population structure in Arabidopsis thaliana. PLoS Genetics, 6(2), e1000843.

• Plomin, R., Haworth, C. M. A., & Davis, O. S. P. (n.d.). quantitative traits. Genetics.

Acknowledgments

Prof. Eric Holub
Prof. Robin Allaby
Doc. Volkan Cevik
Warwick School of Life Sciences
BBSRC