Development and evaluation of single nucleotide polymorphism (SNP) markers for population structure analysis in the humpback whale



Introduction

The specific objectives of this study were to:

• develop a suite of informative SNP markers from intron sequences;

• estimate by computer simulation the statistical power of a combined panel of intronic and exonic SNPs to detect population genetic structure, compared with a suite of microsatellite markers;

• compare the results of the simulation with an analysis of molecular variance between closely related and distant humpback populations.

Methods

Intronic SNP Discovery

Illumina humpback whale transcriptome sequences were aligned to exonic regions of the cow genome using Bowtie (Langmead et al. 2009).

[Figure: primer design schematic. Labels: forward (F) and reverse (R) primers; 500 – 800 bp; exons are highly conserved but not very variable; introns are weakly conserved but more variable.]

Exons were then randomly selected for primer design to amplify an intervening intron (41 primer sets were designed).

Products were then sequenced for a subset of 8 to 24 individuals representing humpback whales that migrate along the east and west coast of Australia, and explored for SNPs.

[Figure: TaqMan® genotyping schematic. Labels: reporter, quencher, FRET, primer, polymerase (Taq DNA pol.), matched and mismatched probes, and the resulting genotype calls (homozygous C/C, homozygous A/A, heterozygous C/A).]

TaqMan® PCR probes (Applied Biosystems) were designed for the intronic SNPs as well as 10 exonic SNPs discovered in parallel (Polanowski et al. 2011). These probes were used to genotype samples from western Australia (N=22), eastern Australia (N=23) and California (N=22). Samples were also genotyped at 10 microsatellite loci.
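Genotype calls of this kind (for example C/C, C/A, A/A) reduce to allele counts, and from there to the allele frequencies that any downstream simulation needs. The following is a minimal sketch of that tally, assuming calls are stored as strings such as "C/A" and that missing calls are empty strings or None; the function name and the example calls are hypothetical and are not data from this study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for one SNP locus.

    `genotypes` is a list of diploid calls such as "C/A" (assumed format).
    Empty strings or None are treated as missing and skipped.
    """
    counts = Counter()
    for call in genotypes:
        if not call:
            continue  # missing genotype
        counts.update(call.split("/"))  # adds two alleles per individual
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical calls for four individuals at one locus
example_calls = ["C/C", "C/A", "A/A", "C/A"]
example_freqs = allele_frequencies(example_calls)
```

Each individual contributes two alleles, so a sample of N individuals yields 2N allele observations per locus.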

Based on empirical estimates of allele frequencies, the simulation software POWSIM was then used to:

• simulate the statistical power of SNPs and microsatellite markers to detect population structure for different values of FST.

• simulate the effect of population sample size on the statistical power of each marker type to detect population structure at an expected level of FST = 0.005.

• simulate the relationship between population sample size and the number of SNP loci needed to detect an FST of 0.005, using average allele frequencies from the Australian dataset.
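The logic of such a power simulation can be sketched in a few lines. The sketch below is not POWSIM and does not reproduce its procedure: as a stand-in, it draws each population's allele frequency from a Balding-Nichols beta distribution for a chosen FST, samples 2N alleles per population, runs a 2x2 chi-square test per locus, and combines loci with Fisher's method. All function names, the placeholder allele frequency of 0.3 and the replicate count are assumptions made for illustration only.

```python
import math
import random


def chi2_sf_even(x, df):
    """Upper tail of a chi-square distribution with even df (closed form)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def locus_p_value(a1, n1, a2, n2):
    """P-value of a 2x2 chi-square test (1 df) on allele counts."""
    total_a, total_n = a1 + a2, n1 + n2
    if total_a == 0 or total_a == total_n:
        return 1.0  # locus monomorphic in both samples: no information
    chi2 = 0.0
    for obs_a, n in ((a1, n1), (a2, n2)):
        exp_a = n * total_a / total_n
        exp_b = n - exp_a
        chi2 += (obs_a - exp_a) ** 2 / exp_a
        chi2 += ((n - obs_a) - exp_b) ** 2 / exp_b
    return math.erfc(math.sqrt(chi2 / 2.0))


def draw_frequency(p, fst, rng):
    """Population allele frequency under the Balding-Nichols model (assumed)."""
    if fst <= 0.0:
        return p
    scale = (1.0 - fst) / fst
    return rng.betavariate(p * scale, (1.0 - p) * scale)


def simulate_power(freqs, n_individuals, fst, reps=200, alpha=0.05, seed=1):
    """Proportion of replicates in which two populations differ significantly."""
    rng = random.Random(seed)
    n_alleles = 2 * n_individuals
    significant = 0
    for _rep in range(reps):
        log_sum = 0.0
        for p in freqs:
            counts = []
            for _pop in range(2):
                q = draw_frequency(p, fst, rng)
                counts.append(sum(rng.random() < q for _draw in range(n_alleles)))
            p_val = locus_p_value(counts[0], n_alleles, counts[1], n_alleles)
            log_sum += math.log(max(p_val, 1e-300))  # guard against log(0)
        # Fisher's method: -2 * sum(ln p) follows chi-square with 2L df
        if chi2_sf_even(-2.0 * log_sum, 2 * len(freqs)) < alpha:
            significant += 1
    return significant / reps


# Placeholder frequencies for 17 biallelic loci (not the study's estimates)
placeholder_freqs = [0.3] * 17
power_null = simulate_power(placeholder_freqs, 23, 0.0)
power_strong = simulate_power(placeholder_freqs, 23, 0.2)
```

Calling simulate_power over a range of n_individuals, or with longer lists of loci, gives curves of the same kind as the sample-size and locus-number comparisons listed above, although the numbers will not match POWSIM output because the underlying model differs.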

Finally, genetic differentiation among all three regions was assessed for both the SNP and microsatellite loci in an Analysis of Molecular Variance (AMOVA).
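For biallelic SNPs, pairwise differentiation can also be illustrated directly from allele counts. The sketch below is not an AMOVA: it swaps in the simpler estimator often called Hudson's FST, computed as a ratio of sums across loci, purely to show how a pairwise FST arises from sample allele frequencies. The function name and the example counts are hypothetical.

```python
def hudson_fst(counts_pop1, counts_pop2):
    """Pairwise FST from per-locus (allele_count, n_alleles) tuples.

    Uses the Hudson estimator with the ratio-of-sums across loci.
    Each population needs at least two sampled alleles per locus.
    """
    numerator = 0.0
    denominator = 0.0
    for (a1, n1), (a2, n2) in zip(counts_pop1, counts_pop2):
        p1, p2 = a1 / n1, a2 / n2
        # squared frequency difference, corrected for sampling variance
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n1 - 1)
                      - p2 * (1 - p2) / (n2 - 1))
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator <= 0:
        return float("nan")
    return numerator / denominator


# Hypothetical counts: one locus fixed for different alleles in each sample
fst_fixed = hudson_fst([(44, 44)], [(0, 46)])
# Hypothetical counts: one locus with identical sample frequencies
fst_same = hudson_fst([(22, 44)], [(23, 46)])
```

Because of the sampling correction, the estimate can be slightly negative when two samples have near-identical frequencies; such values are usually read as no detectable differentiation.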

Results

Of the 16 potential intronic SNPs found among 11 loci and 13,771 bp of sequence, 10 were verified through further sequencing, with 7 of these producing consistent and interpretable genotypes using the TaqMan probes (i.e. 7 intronic and 10 exonic SNPs were used in subsequent analyses).

Simulations

● Both the 17 SNPs and 10 microsatellites had sufficient power (> 80 % proportion of significance) to detect an FST > 0.03, given the sample sizes of this study (Figure 1).

[Figure 1 plot. x-axis: nuclear FST (0.00 to 0.07); y-axis: proportion of significances (0.0 to 1.0); series: 17 SNP loci (χ2 test) and ten microsatellite loci (Fisher's test).]

Figure 1: Simulated estimates of power (using the χ2 test for SNPs and Fisher’s test for microsatellites) to detect population differentiation between two populations when drawing a sample of N = 23 individuals from each at various true nuclear FST for 17 SNP and ten microsatellite loci.

● For the detection of structure (FST = 0.005) that is typical among some humpback populations, sample sizes > 150 would be required for the 17 SNPs compared to N > 50 for the microsatellites (Figure 2).

[Figure 2 plot. x-axis: no. of samples per population (0 to 250); y-axis: proportion of significances (0.0 to 1.0); series: 17 SNP loci, FST = 0.005 (χ2 test) and ten microsatellite loci, FST = 0.005 (Fisher's test).]

Figure 2: Effect of sample size (number of individuals genotyped per population) on simulated estimates of power (using the χ2 test for SNPs and Fisher’s test for microsatellites) to detect population differentiation between two populations at the level of FST = 0.005 for the 17 SNP and ten microsatellite loci.

● If three times the number of SNP loci is used (50 loci), at least 75 samples from each population are needed to detect an FST of 0.005 (Figure 3).

[Figure 3 plot. x-axis: no. of SNP loci (0 to 60); y-axis: no. of samples per population (0 to 250).]

Figure 3: Effect of sample size per population on the statistical power (χ2 test) to detect an expected level of FST = 0.005 for different numbers of SNP loci.

Consistent with the simulations, an AMOVA found significant differentiation between California and each of the two Australian regions at the microsatellites (FST = 0.029 and 0.036), but only between eastern Australia and California using the SNPs (FST = 0.034; between western Australia and California, FST = 0.018).

Conclusions

With sufficient sample sizes, a small to moderate number of SNP markers of varying quality and utility can detect differentiation between both neighbouring and distant regions in the humpback whale (FST in the range of 0.005 to 0.04).

This study is one of the first to examine the utility of SNP markers in humpback whale population structure analysis, an area of research where microsatellites have, until now, been the most viable option for nuclear markers (Bourret et al. 2008). SNPs appear to offer the best option so far for overcoming some of the challenges associated with the future need to combine data from multiple researchers, and could serve as a foundation for future regional and global collaborations.

Development and evaluation of single nucleotide polymorphism (SNP) markers for population structure analysis in the humpback whale

Acknowledgements

Research in Australia was funded by the Australian Marine Mammal Centre under permits from the Commonwealth of Australia and the states of Western Australia and New South Wales. Synthetic humpback DNA samples from California were provided by Scott Baker and Debbie Steel, and were prepared by Beth Slikas at Oregon State University.

References

Bourret, V., M. Mace, M. Bonhomme and B. Crouau-Roy. 2008. Microsatellites in Cetaceans: An Overview. The Open Marine Biology Journal 2:38-42.

Langmead, B., C. Trapnell, M. Pop and S.L. Salzberg. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25. doi:10.1186/gb-2009-10-3-r25.

Polanowski, A.M., N.T. Schmitt, M.C. Double, N. Gales and S.N. Jarman. 2011. TaqMan assays for genotyping 45 single nucleotide polymorphisms in the humpback whale nuclear genome. Conservation Genetics Resources. doi:10.1007/s12686-011-9424-5.

Natalie T Schmitt 1,2, Andrea M Polanowski 1, Mike C Double 1, Scott Baker 3, Debbie Steel 3, Rod Peakall 2 and Simon N Jarman 1

1 Australian Marine Mammal Centre Science, Australian Antarctic Division, Kingston, Tasmania 7050, Australia

2 Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra ACT 0200, Australia

3 Marine Mammal Institute and the Department of Fisheries and Wildlife, Oregon State University, 2030 SE Marine Science Dr, Newport Oregon 97365 USA


Australian Government

Department of Sustainability, Environment, Water, Population and Communities

Australian Antarctic Division

Photos: David Donnelly. Poster design: Mathew Oakes, Australian Antarctic Division Multimedia Unit

Abstract

Studies of cetacean population genetics would benefit from the use of SNPs as scoring is more reproducible than for microsatellites, making them suitable for long-term, collaborative studies of globally distributed species. Here we report the development of SNPs in the humpback whale (Megaptera novaeangliae) and the assessment of their statistical power for population genetic structure analysis. TaqMan® assays for 17 SNPs and ten microsatellite loci were used to genotype samples from western Australia (N=22), eastern Australia (N=23) and California (N=22). POWSIM was used to simulate the statistical power of both marker types to detect population structure for different values of FST, based on empirical estimates of allele frequencies. The simulations suggest adequate power to detect FST > 0.03 for both marker types, given the sample sizes of this study. For the detection of structure (FST = 0.005) that is typical among some humpback populations, sample sizes > 150 would be required for the SNPs, or N > 75 if 50 loci are used, compared to N > 50 for the microsatellites. Consistent with the simulation, an AMOVA found significant differentiation between California and each of the two Australian regions at the microsatellites (FST = 0.029 and 0.036), but only between eastern Australia and California using the SNPs (FST = 0.034).