Genome-wide Association Mapping

Avjinder Singh Kaler

PhD Candidate

Department of Crop, Soil, and Environmental Sciences

University of Arkansas

Nov-15-2016

Plant Breeding Lecture

Identify genomic regions associated with phenotypes

Phenotypic Data

• Flowering time

• Plant height

• Yield

• Phenotype Variation

• Phenotypes are response variables

Genotypic Data

• Genomic markers that span the entire genome

• Single nucleotide polymorphisms (SNPs) are commonly used as markers

• Markers are explanatory variables

Functional Diversity: Phenotype

[Figure: examples of phenotypic variation in plant height and seed color]

Genetic Architecture of Complex Traits

Phenotype (P) = Genotype (G) + Environment (E) + Genotype × Environment interaction (GE)

P = G + E + GE

How do we connect genotype to phenotype?

Functional Diversity: Phenotype Variation

• Linkage mapping: few recombination events, resulting in relatively low mapping resolution

• Association mapping: historical recombination events and natural genetic diversity, resulting in high mapping resolution

GWAS based on Linkage Disequilibrium (LD)

• LD is the non-random association of alleles at two loci

• D, D′ (normalized D), and r² are commonly used summary statistics to estimate pairwise LD

• r² is preferred in association studies because it is more indicative of how markers might correlate with QTL
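The three LD statistics above can be computed from two-locus haplotype counts. The following is a minimal sketch, assuming phased haplotypes at two biallelic loci (alleles A/a and B/b); the function name `ld_stats` and the count arguments are hypothetical and not taken from any GWAS package.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) from counts of the four two-locus haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n   # frequency of allele B at locus 2
    D = p_AB - p_A * p_B      # deviation from independence
    # D' rescales D by its largest possible magnitude given the allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the alleles at the two loci
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Three of four haplotypes present: D' is at its maximum while r2 is well below 1
D, d_prime, r2 = ld_stats(40, 10, 0, 50)
```

With one haplotype class absent, D′ reaches 1 even though the two loci are only partly correlated, which is one reason r² is the more useful guide to how well a marker tags a QTL.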

Visualize extent of LD between pairs of loci

[Figure panels: LD decay plot; LD block (haplotype view)]

Genome-wide association study (GWAS)

• Identify genomic regions associated with a phenotype

• Fit a statistical model at each SNP in genome

• Use fitted models to test H0: no association between the SNP and the phenotype
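A single-marker test of this kind can be sketched as a simple linear regression of the phenotype on allele dosage. This is an illustration only: the 0/1/2 dosage coding, the function name `snp_association`, and the normal approximation used for the p-value are assumptions, and the model ignores population structure and relatedness.

```python
import math


def snp_association(genotypes, phenotypes):
    """Regress phenotype on allele dosage; return (slope, t statistic, approx. p-value).

    Assumes the SNP is polymorphic in the sample and the fit is not perfect.
    """
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    y_mean = sum(phenotypes) / n
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx                      # estimated allele effect
    intercept = y_mean - slope * g_mean
    sse = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(sse / (n - 2) / sxx)    # standard error of the slope
    t = slope / se
    # Two-sided p-value for H0: slope = 0, using a normal approximation to the
    # t distribution (an assumption that is reasonable only for large samples)
    p = math.erfc(abs(t) / math.sqrt(2))
    return slope, t, p


# Toy data: each extra allele copy adds about 2 units to the trait
dosage = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
trait = [10.1, 9.8, 12.2, 11.9, 14.1, 13.8, 10.0, 12.1, 14.0, 11.8]
slope, t, p = snp_association(dosage, trait)
```

In a GWAS this test is repeated at every SNP, and the resulting p-values are then corrected for multiple testing.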

Associating SNPs with phenotypes

• At each SNP: Conduct a test of association with trait

• Significant SNP/trait association suggests:

– SNP has direct biological function (functional polymorphism)

– SNP in LD with functional polymorphism(s)

[Figure: genotypes of Lines 1–6 at five SNPs (A/C, T/C, G/A, A/G, G/T)]

Genetic diversity can lead to false positives in a GWAS

• Two sources for false positives:

– Population structure—allele frequency differences among individuals due to local adaptation or diversifying selection

– Familial relatedness—allele frequency differences among individuals due to recent co-ancestry

Genetic Diversity of 2,815 Maize Inbreds

[Figure: principal coordinate plot of the inbreds; axis shown: Principal Coordinate 1]

Romay et al. (2013)

Controlling False Positives due to Population Structure

• STRUCTURE (Q)

– Identifies subpopulations within a sample of individuals collected from a population of unknown structure

– Estimates the Q matrix

– Time-consuming

• Principal Component Analysis (PCA)

– Fast and effective approach to diagnose population structure

– Summarizes variation observed across all markers into a smaller number of underlying component variables

– Estimates the PC matrix

Principal Component Analysis

• Scree plot: shows the fraction of total variance in the data explained by each PC

• PCs are selected at the bend (elbow) of the L-shaped scree curve
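To make the scree-plot quantities concrete, the sketch below extracts the leading PCs of a small genotype matrix and the fraction of total variance each one explains. It assumes 0/1/2 dosage coding and at least one polymorphic marker, and it uses plain power iteration with deflation purely for illustration; the function name `top_pcs` is hypothetical, and GWAS software uses optimized eigen-decomposition routines instead.

```python
import math


def top_pcs(genotypes, k=2, iters=500):
    """genotypes: rows = individuals, columns = markers (0/1/2 dosage).

    Returns (scores, fractions): PC scores per individual and the fraction
    of total variance explained by each of the first k PCs.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    X = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # n x n Gram matrix; its non-zero eigenvalues match those of X'X
    G = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    total = sum(G[i][i] for i in range(n))   # total sum of squares
    scores, fractions = [], []
    for _ in range(k):
        v = [math.sin(i + 1.0) for i in range(n)]   # arbitrary deterministic start
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        scores.append([math.sqrt(max(lam, 0.0)) * x for x in v])
        fractions.append(lam / total)
        # Deflate so the next pass finds the next-largest component
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores, fractions


# Two toy subpopulations of three lines each
geno = [[0, 0, 2, 2], [0, 0, 2, 2], [0, 1, 2, 2],
        [2, 2, 0, 0], [2, 2, 0, 0], [2, 1, 0, 0]]
scores, fractions = top_pcs(geno, k=2)
```

Plotting `fractions` against PC number gives the scree plot; the PC scores are what enter the association model as covariates for population structure.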

Controlling False Positives due to Familial Relatedness

• The kinship coefficient (F) is the probability that two homologous alleles, one drawn at random from each of two individuals, are identical by descent (IBD)

• Kinship estimated from genetic markers is a relative kinship based on the probability that alleles are identical by state (IBS)
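A marker-based relatedness matrix can be sketched as the average proportion of alleles shared identical by state between each pair of individuals. This is one simple estimator among several and not necessarily the one used in any particular GWAS package; the 0/1/2 dosage coding, the absence of missing calls, and the function name `ibs_kinship` are assumptions.

```python
def ibs_kinship(genotypes):
    """genotypes: rows = individuals, columns = markers coded 0/1/2.

    Returns an n x n matrix of mean identity-by-state similarity in [0, 1].
    """
    n = len(genotypes)
    m = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # At each marker, two individuals share 2 - |difference in dosage| alleles
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = K[j][i] = shared / (2 * m)
    return K


# Lines 0 and 1 are identical; line 2 differs at two of three markers
K = ibs_kinship([[0, 2, 1], [0, 2, 1], [2, 0, 1]])
```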

Mixed models reduce false positives in GWAS

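$$Y_i = \mu + \sum_{j=1}^{k} \beta_j P_{ij} + \alpha x_i + \mathrm{Line}_i + \varepsilon_i$$

• $Y_i$ = phenotype of the ith individual

• $\mu$ = grand mean

• $\sum_{j} \beta_j P_{ij}$ = fixed effects: account for population structure (Q-matrix columns or PCs)

• $\alpha$ = marker effect; $x_i$ = observed SNP alleles of the ith individual

• $\mathrm{Line}_i$ = random effects: account for familial relatedness, with $(\mathrm{Line}_1, \ldots, \mathrm{Line}_n) \sim \mathrm{MVN}(0, 2K\sigma^2_G)$

• $K$ = kinship matrix; measures relatedness between individuals

• $\varepsilon_i$ = random error, $\varepsilon_i \sim$ i.i.d. $N(0, \sigma^2_E)$

Yu et al. (2006)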

Association Mapping Pipeline

Germplasm Selection

• Choice of germplasm is critical to the success of the association analysis

Phenotyping

• Design the experiment

• Collect high-quality phenotypic data

Phenotypic Outliers

• Outliers are “unusual” data points that deviate substantially from the mean and strongly influence parameter estimates

• ALWAYS check for outliers in your data sets

• Do NOT ignore outliers if detected

Phenotypic Outliers

• Outliers can

– increase error variance

– reduce the power of statistical tests

– distort estimates

– decrease normality if non-randomly distributed

• Potential causes of outliers

– Human errors in data collection, recording, or entry

– Technical errors from faulty or non-calibrated phenotyping equipment

– Intentional or motivated misreporting, such as “speed” phenotyping in a hot field environment

Evaluate Data for Outliers

• Histogram

• Box plot (box-and-whisker plot)

• Quantile-quantile (QQ) plot: graphical method for comparing two probability distributions to assess goodness of fit

Get to know your data!
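The visual box-plot check has a simple numeric counterpart: flag values that fall outside the whiskers. The sketch below assumes the common convention of whiskers at 1.5 times the interquartile range; the function name `tukey_outliers` and the multiplier default are illustrative choices, not something the slides specify.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying beyond k * IQR from the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr          # whisker limits
    return [v for v in values if v < low or v > high]


# Plant heights with one suspicious record
heights = [10, 11, 12, 12, 13, 13, 14, 15, 40]
flagged = tukey_outliers(heights)
```

Flagged values are candidates for closer inspection against field notes, not automatic deletions.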

Statistical Identification of Outliers

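• Cook's distance: measures the influence of a data point, i.e., how much the effect estimates change when that point is removed.

• Deleted studentized residuals: measure how far a data point lies from the model fitted without it, scaled by its standard error. They flag points that the least squares fit describes poorly (leverage itself is measured by the hat values).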

Two of several possible methods
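Both diagnostics can be computed in closed form for a least squares fit. The sketch below assumes a simple regression of the trait on a single covariate (y = a + b·x) so that everything fits in plain Python; the function name `outlier_diagnostics` is hypothetical, and statistical packages provide the same quantities for general linear models.

```python
import math


def outlier_diagnostics(x, y):
    """Fit y = a + b*x by least squares; return (cooks_d, deleted_t) lists."""
    n, p = len(x), 2                      # p = number of fitted coefficients
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]   # hat values (leverage)
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)
    # Cook's distance combines residual size and leverage
    cooks = [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    # Deleted (externally) studentized residual: residual scaled by the error
    # variance estimated without the point in question
    deleted_t = [e * math.sqrt((n - p - 1) / (sse * (1 - h) - e * e))
                 for e, h in zip(resid, lev)]
    return cooks, deleted_t


# Toy data: the fifth observation was recorded as 30 instead of about 10
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 30.0, 12.1, 13.9, 16.2, 17.8, 20.1]
cooks, deleted_t = outlier_diagnostics(x, y)
```

The observation with the largest Cook's distance and the largest absolute deleted studentized residual is the first candidate to check against the original records.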

Removal of Outliers

• Removing anomalous data points from data sets is controversial to some folks.

• If outliers are not removed, inferences made from the fitted model may not be representative of the population under study.

• If you remove outliers, then be sure to report it in the manuscript.

Non-Normal Trait Data

• When fitting a mixed model, two very important assumptions are that the error terms follow a normal distribution and that they have constant variance.

• When data are non-normal, these two assumptions in particular could be violated.

Analysis of Non-Normal Trait Data

• Generalized linear mixed models can be used to analyze non-normal data

• The Box-Cox procedure can be used to find the most appropriate transformation that corrects for non-normality of the error terms and unequal variances.

Box-Cox Transformation
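The Box-Cox procedure transforms a positive trait y to (y^λ − 1)/λ, or log(y) when λ = 0, and picks the λ that maximizes the normal log-likelihood of the transformed data. The sketch below assumes a mean-only model and a grid search over λ from −2 to 2; in a real analysis the likelihood would be profiled over the residuals of the fitted model. The function names are hypothetical.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of positive values y for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood of lambda under a normal, mean-only model."""
    n = len(y)
    z = boxcox(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    # Second term is the Jacobian of the transformation
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in y)


def best_lambda(y, grid=None):
    """Return the grid value of lambda with the highest profile log-likelihood."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


# Right-skewed toy trait that is symmetric on the log scale
trait = [math.exp(z) for z in (-1.5, -1, -0.5, -0.2, 0, 0.2, 0.5, 1, 1.5)]
lam = best_lambda(trait)
```

A selected λ near 0 points to a log transformation, near 0.5 to a square root, and near 1 to no transformation.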

Genotyping

• SNPs most commonly used in association mapping

Genotype-Quality Control

• Remove monomorphic markers

• Remove markers with a minor allele frequency (MAF) below the chosen threshold (e.g., < 5% or < 3%)

• Remove markers with a high missing rate (e.g., > 10%)

• Impute the remaining missing data (LD-kNNi, FILLIN, FSHAP, BEAGLE)
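The three marker filters above can be sketched as a single per-marker check. The coding (0/1/2 dosage with `None` for a missing call), the function name `passes_qc`, and the default thresholds are assumptions chosen to match the example values on the slide; imputation is not covered here.

```python
def passes_qc(calls, maf_min=0.05, max_missing=0.10):
    """calls: one marker's genotypes across individuals (0/1/2, None = missing).

    Returns True if the marker survives the missing-rate and MAF filters.
    """
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed or missing_rate > max_missing:
        return False
    freq = sum(observed) / (2 * len(observed))   # frequency of the counted allele
    maf = min(freq, 1 - freq)
    # Monomorphic markers have maf == 0 and are removed by the same test
    return maf >= maf_min


markers = {
    "snp_ok": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],
    "snp_monomorphic": [0] * 10,
    "snp_missing": [0, 1, 2, 1, 0, None, None, 1, 2, 0],
}
kept = [name for name, calls in markers.items() if passes_qc(calls)]
```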

Controlling False Positives

• Population structure—allele frequency differences among individuals due to local adaptation or diversifying selection

• Familial relatedness—allele frequency differences among individuals due to recent co-ancestry

• If not properly controlled both can cause spurious associations in GWAS

Controlling False Positives

• Population structure

– STRUCTURE (Q matrix)

– Principal Component Analysis (PC matrix)

• Familial relatedness

– Kinship matrix


What is a significant association?

• Bonferroni correction: procedure to control the family-wise error rate (FWER), i.e., the probability of making one or more type I errors

– Simplest and most conservative method to control FWER

– Threshold calculated as α/n, where n is the number of hypotheses (i.e., SNPs tested)

• False discovery rate (FDR): procedure to control the expected proportion of false discoveries among the tests declared significant

– Less stringent than Bonferroni

– The q-value is the FDR analogue of the p-value; e.g., calling all SNPs with q ≤ 0.10 significant means about 10 false discoveries are expected per 100 SNPs declared significant

• Use the list of p-values from ALL SNP tests as input to the R function p.adjust or to packages such as qvalue, fdrtool, and others

Slide adapted from Prof. Jim Holland
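Both corrections are short enough to sketch directly. The FDR function below implements the Benjamini-Hochberg adjustment, which is what R's `p.adjust` computes with `method = "BH"`; the q-values reported by the qvalue package are estimated differently and can be somewhat smaller. Function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the FWER at alpha."""
    return alpha / n_tests


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, enforcing monotone adjusted values
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top          # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
threshold = bonferroni_threshold(0.05, len(pvals))   # 0.0125
significant_bonf = [p <= threshold for p in pvals]
adjusted = bh_adjust(pvals)
```

In this toy example Bonferroni keeps two of the four tests at α = 0.05, while all four have BH-adjusted values at or below 0.05, which illustrates why FDR control is the less stringent of the two.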

Genome-wide Association Mapping Results

Manhattan plot: summarizes GWAS results for all SNPs that were analyzed

QQ-plot: assesses performance of the statistical model

[Figure panels: simple model without correcting for population structure; mixed linear model]
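The points of a GWAS QQ-plot can be sketched as follows: observed −log10(p) values are sorted and paired with the values expected if every SNP followed the null hypothesis (uniform p-values). The plotting positions (i − 0.5)/n are one common convention and an assumption here, as is the function name `qq_points`; p-values must be strictly positive.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-values, both in ascending order."""
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    # Expected quantiles of -log10(p) when p is uniform on (0, 1)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return expected, observed


expected, observed = qq_points([0.125, 0.375, 0.625, 0.875])
```

Points that rise above the diagonal across most of the range suggest inflation, for example from uncorrected population structure, whereas a well-behaved model follows the diagonal and departs from it only in the upper tail.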

Software for GWAS

• TASSEL

• GAPIT

• PLINK

• GEMMA

• FarmCPU

• JMP Genomics

• https://omictools.com/gwas-category

• Tutorials

– http://www.slideshare.net/AvjinderSingh/basic-tutorial-of-association-mapping-by-avjinder-kaler

– http://www.slideshare.net/AvjinderSingh/tutorial-for-association-mapping-with-farm-cpu
