Page 1

From Genomes to Breeding Decisions with GenoMagic™

Paul Chomet

Tomato Breeders Workshop

April 2018


Page 2

Join Us for the Pan-Genome Discussion, 5-6:30

Summary:

› New method of capturing sequence-based diversity
  › Based on a pan-genome, NOT a single reference
  › Diversity captured efficiently by a haplotype DB
  › Pan-genome consortium formed for many species

› Utility of the system:
  › Identifies all genetic variants
  › Genome-to-genome mapping
  › Cost-effective, accurate imputation service
  › Marker discovery
  › Genotyping platform optimization

CONFIDENTIAL © 2017 All rights reserved to Energin R. Technologies Ltd.

Page 3

Genome sequence: a key for crop engineering & improvement

[Figure: R² value along Chr 3 in a trait-discovery scan, pointing to an NBS-LRR resistance gene for S. Leaf Blight]

› Trait discovery
› Genome modification: editing, transgenes, mutagenesis
› Marker-aided breeding
› Crop improvement

Page 4

How do you analyze data across genomes?

Reference-genome-based approach:

› High rate of undetected polymorphisms due to unmapped sequences
› High rate of false-positive polymorphism calls due to misalignment
› Only part of the polymorphism is discovered: SNPs and small INDELs (no structural variation)

[Figure: sample reads aligned to Ref. Genome Chromosome 1 and Ref. Genome Chromosome 2]
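The first bullet can be illustrated with a toy Python sketch (not NRGene's pipeline). Reads are tiled across a sample carrying a hypothetical insertion that the reference lacks, then "mapped" by exact substring search. Every read overlapping the insertion fails to map, so any variant inside it stays invisible to a single-reference workflow. Exact matching is a simplifying assumption; real aligners tolerate mismatches and clip reads.

```python
# Toy illustration of reference bias: reads that overlap sequence absent from
# the reference cannot be placed, so variants there are never called.

def tile_reads(genome: str, read_len: int, step: int = 1) -> list[str]:
    """Return every read of length read_len, starting every `step` bases."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]


def split_mapped(reads: list[str], reference: str) -> tuple[list[str], list[str]]:
    """Split reads into those found verbatim in the reference and those not found."""
    mapped = [r for r in reads if r in reference]
    unmapped = [r for r in reads if r not in reference]
    return mapped, unmapped


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    insertion = "TTTTGGGG"  # hypothetical sample-specific insertion
    sample = reference[:10] + insertion + reference[10:]
    mapped, unmapped = split_mapped(tile_reads(sample, read_len=6), reference)
    print(f"{len(mapped)} mapped, {len(unmapped)} unmapped")
```

In this example all 13 reads that touch the insertion are unmapped, while the 12 reads drawn entirely from shared sequence map.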

Page 5

Access to relevant genomics data impacts data use and quality.

[Figure: usable data vs. SNP quality. Panels: % of reads mapped to the Heinz1706 reference genome; % of SNPs that are high quality. Groups compared: the Heinz1706 reference and trait donors & breeding material (wild accessions, landraces, "old" cultivars).]

With permission from Ruth Wagner, Monsanto, PAG Conference 2018. Monsanto variant calling (unpublished) on data from "Exploring genetic variation in the tomato clade by whole genome sequencing," Plant J. 2014 Oct;80(1):136-48.

Page 6

Most GWAS hits are outside the genic regions in complex genomes.

From @jrossibarra, Twitter, 4:33 PM, 28 Aug 2017: "Writing perspective on genome size & adaptation in plants w/ @wbmei @dangates_j @MGStetter @mcstitzer (Hint: we think genome size matters)"

Page 7

How is NRGene's approach different?

Most design methods:
› Use one reference
› Map reads to the reference
› Biased diversity

NRGene's approach:
› Pan-genome: not single-reference based
› Haplotype DB: sequence captures germplasm diversity
› Utilities:
  › Optimal marker set
  › Efficient imputation
  › Dynamic

"The strategies behind GenoMAGIC are a step above conventional means and enable clear value gains for downstream analytics, directly impacting cost and timeline models."
-Joseph Clarke, Principal Research Scientist, Syngenta

Page 8

Overview of NRGene's breeding solutions

Components: single/multiple full genomes; diversity analysis (haplotype DB); full genome diversity genotyping (imputation); marker design; trait mapping; genomic selection.

Downstream analysis:
› Genome assembly…
› Comparative genomics, genome evolution, genes, gene function, causative polymorphisms…
› Markers, genotyping, diversity analysis…
› Recurrent genotyping, genomic selection, breeding, trait associations…

Page 9

Capture genomic information to move across genomes

1. Select key lines
2. De novo assembly of the selected key lines
3. All-to-all genome mapping
4. Transcript mapping; PAV/CNV and translocation calling

Page 10

Accurate and Cost-Effective De Novo Assembly of Complex Genomes

1. Selection of key lines: best representatives of relevant genetic diversity
2. DNA extraction: pure, concentrated, high-quantity, low-fragmentation gDNA extracted from a single individual
3. Library preparation: optimized recipe of library types
4. Sequencing: optimized sequencing coverage per library type
5. Raw sequencing data
6. Scaffold-level assembly
7. Fully phased assembly of contigs and scaffolds

Library type   Read length   Insert size   Coverage
PCR-free PE    250x2         450 bp        60x
PCR-free PE    150x2         800 bp        30x
MP             150x2         3 kbp         30x
MP             150x2         6 kbp         30x
MP             150x2         9 kbp         30x
10X            150x2         100 kbp       30x
Total                                      210x

PE = paired-end; MP = mate-pair.
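The coverage column can be turned into a rough sequencing budget. This Python sketch assumes coverage = 2 × read length × read pairs / genome size, and it takes the genome size as an input because the slide gives none; the 1 Gbp value in the usage block is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    read_len: int     # bases per read (each pair has two reads)
    insert: str       # insert size, kept only as a label
    coverage: float   # target fold coverage for this library


# Library recipe copied from the table above.
RECIPE = [
    Library("PCR-free PE", 250, "450 bp", 60),
    Library("PCR-free PE", 150, "800 bp", 30),
    Library("MP", 150, "3 kbp", 30),
    Library("MP", 150, "6 kbp", 30),
    Library("MP", 150, "9 kbp", 30),
    Library("10X", 150, "100 kbp", 30),
]


def total_coverage(recipe: list[Library]) -> float:
    """Sum of per-library coverage."""
    return sum(lib.coverage for lib in recipe)


def read_pairs_needed(lib: Library, genome_size_bp: float) -> float:
    """Read pairs required to reach lib.coverage over a genome of genome_size_bp."""
    bases = lib.coverage * genome_size_bp
    return bases / (2 * lib.read_len)


if __name__ == "__main__":
    genome_size = 1.0e9  # hypothetical 1 Gbp genome, for illustration only
    print(f"total coverage: {total_coverage(RECIPE):.0f}x")
    for lib in RECIPE:
        pairs = read_pairs_needed(lib, genome_size)
        print(f"{lib.name:12s} {lib.insert:>8s}: {pairs / 1e6:,.0f} M read pairs")
```

For the hypothetical 1 Gbp genome, the 60x PCR-free library alone needs 120 million 250x2 read pairs.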

Page 11

Global Adoption of NRGene's DeNovoMagic Technology

[Figure: species assembled with DeNovoMagic, grouped by region (region labels not recoverable):]
› Wheat, Maize, Barley, Rye
› Wheat, Maize, Soy, Apple, Mango, Tomato, Cucumber, Pepper, Pumpkin, Zucchini, Strawberry, Blueberry, Ryegrass, Guayule, Hummingbird, Trout Fish, Bean, Brassica, Grasses, Linen
› Wheat, Sunflower, Sinapis, Lentil, Bean
› Maize, Rose
› Opium, Durum Wheat
› Wild Wheat, Sesame, Basil, Tomato, Melon, Pepper, Jojoba, Eucalyptus
› Canola, Cotton
› Chickpea, Pigeon Pea
› Strawberry, Sweet Potato
› Wheat, Maize, Barley, Rice, Cotton, Peanut, Wheatgrass
› Bovine, Maize
› Beet
› Potato, Cucumber, Squash
› Wheat, Ryegrass

Assembly                              Scaffold N50       Scaffold N90        Total assembly   Unfilled gaps   Completeness (BUSCO,
                                      (no. scaffolds)    (no. scaffolds)     size             (N)             % complete genes)
German bread wheat Julius (AABBDD)    38.0 Mbp (102)     6.6 Mbp (448)       14.38 Gbp        1.13%           95.29%
Bovine                                38.9 Mbp (22)      8 Mbp (74)          2.71 Gbp         1.77%           93.37%
Maize                                 35.5 Mbp (18)      11 Mbp (58)         2.13 Gbp         1.9%            96.03%
Strawberry (heterozygous octoploid)   3.34 Mbp (131)     0.92 Mbp (425)      1.4 Gbp          0.71%           94.77%
Canola (tetraploid)                   8.4 Mbp (37)       0.54 Mbp (405)      1.04 Gbp         0.94%           97.07%
Ryegrass (heterozygous)               3.1 Mbp (420)      0.29 Mbp (1,934)    4.53 Gbp         1.30%           97.85%
Canadian bread wheat (AABBDD)         14.6 Mbp (269)     2.4 Mbp (1,166)     14.43 Gbp        1.83%           98.06%

Avni et al., 2017; Hirsch et al., 2016; Lu et al., 2015
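For readers less familiar with contiguity statistics, here is a Python sketch of how Nx values such as the N50 and N90 above are conventionally computed. Scaffolds are sorted longest first; Nx is the length of the scaffold at which the running total first reaches x% of the assembly, and Lx (the number in parentheses) is how many scaffolds that took. This is the standard definition, not code taken from NRGene.

```python
def nx_lx(lengths: list[int], x: float) -> tuple[int, int]:
    """Return (Nx, Lx) for a list of scaffold lengths, with x given as a percentage."""
    if not lengths:
        raise ValueError("no scaffolds")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    target = sum(lengths) * x / 100.0
    running = 0
    # Walk scaffolds from longest to shortest until x% of the assembly is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable: running total always reaches the target")


if __name__ == "__main__":
    toy_assembly = [10, 8, 6, 4, 2]  # hypothetical scaffold lengths, total 30
    print("N50, L50:", nx_lx(toy_assembly, 50))
    print("N90, L90:", nx_lx(toy_assembly, 90))
```

In the toy assembly, the two longest scaffolds (10 + 8) already pass half of the 30-unit total, so N50 is 8 with L50 = 2.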

Page 12

Pan-genomic analyses build a common coordinate system

1. Select key lines
2. De novo assembly of the selected key lines
3. All-to-all genome mapping
4. Transcript mapping; PAV/CNV and translocation calling

Page 13

All-to-All Mapping of Reference Genomes

Input: two de novo assembled reference genomes (Genome A and Genome B)

* illustration

Page 14

All-to-All Mapping of Reference Genomes

Input: two de novo assembled reference genomes (Genome A and Genome B)
Output: whole-genome mapping depicting areas of homology and sequence polymorphism

sample         chromosome  start     end       sample          chromosome  start    end      match
mo17__ver100   3           1009414   1010165   b73v4__ver100   3           114772   115523   TRUE
mo17__ver100   3           1010165   1010229   b73v4__ver100   3           115523   115587   FALSE
mo17__ver100   3           1010229   1010725   b73v4__ver100   3           115587   116083   TRUE
mo17__ver100   3           1010725   1010789   b73v4__ver100   3           116083   116147   FALSE
mo17__ver100   3           1010789   1011171   b73v4__ver100   3           116147   116529   TRUE
mo17__ver100   3           1011171   1011252   b73v4__ver100   3           116529   116610   FALSE
mo17__ver100   3           1011252   1011427   b73v4__ver100   3           116610   116785   TRUE
mo17__ver100   3           1011427   1011491   b73v4__ver100   3           116785   116849   FALSE
mo17__ver100   3           1011491   1011499   b73v4__ver100   3           116849   116857   TRUE
mo17__ver100   3           1011499   1011563   b73v4__ver100   3           116857   116921   FALSE
mo17__ver100   3           1011563   1011638   b73v4__ver100   3           116921   116996   TRUE
mo17__ver100   3           1011638   1011702   b73v4__ver100   3           116996   117060   FALSE
mo17__ver100   3           1011702   1011707   b73v4__ver100   3           117060   117065   TRUE
mo17__ver100   3           1011707   1011771   b73v4__ver100   3           117065   117129   FALSE
mo17__ver100   3           1011771   1011778   b73v4__ver100   3           117129   117136   TRUE
mo17__ver100   3           1011778   1011842   b73v4__ver100   3           117136   117200   FALSE
mo17__ver100   3           1011842   1011956   b73v4__ver100   3           117200   117314   TRUE
mo17__ver100   3           1011956   1012020   b73v4__ver100   3           117314   117378   FALSE
mo17__ver100   3           1012020   1012918   b73v4__ver100   3           117378   118276   TRUE
mo17__ver100   3           1012918   1012982   b73v4__ver100   3           118276   118340   FALSE

* illustration
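A mapping table like the one above is what lets a position in one genome be translated into the other. The Python sketch below is a hypothetical lift-over over the first four rows. It assumes blocks are sorted, non-overlapping, half-open [start, end) intervals of equal length in both genomes (true for every row shown), and it reads FALSE as "aligned but not identical", that is, a polymorphic segment. It is not NRGene's implementation.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    a_start: int   # Genome A (mo17__ver100) start
    a_end: int     # Genome A end (exclusive)
    b_start: int   # Genome B (b73v4__ver100) start
    b_end: int     # Genome B end (exclusive)
    match: bool    # TRUE/FALSE column of the table


# First four rows of the chromosome 3 table above.
BLOCKS = [
    Block(1009414, 1010165, 114772, 115523, True),
    Block(1010165, 1010229, 115523, 115587, False),
    Block(1010229, 1010725, 115587, 116083, True),
    Block(1010725, 1010789, 116083, 116147, False),
]


def lift_over(pos: int, blocks: list[Block]) -> Optional[tuple[int, bool]]:
    """Translate a Genome A position to Genome B; None if no block covers it."""
    starts = [b.a_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    blk = blocks[i]
    if not blk.a_start <= pos < blk.a_end:
        return None
    return blk.b_start + (pos - blk.a_start), blk.match


def matched_fraction(blocks: list[Block]) -> float:
    """Share of aligned Genome A bases that fall in TRUE (matching) blocks."""
    total = sum(b.a_end - b.a_start for b in blocks)
    matched = sum(b.a_end - b.a_start for b in blocks if b.match)
    return matched / total


if __name__ == "__main__":
    print(lift_over(1009500, BLOCKS))  # inside the first TRUE block
    print(lift_over(1010200, BLOCKS))  # inside a FALSE (polymorphic) block
    print(f"{matched_fraction(BLOCKS):.1%} of aligned bases match")
```

Binary search over block starts keeps each lookup logarithmic, which matters once a whole chromosome's worth of blocks is loaded.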

Page 15

Transcript analysis and structural variant calling

1. Locate transcript areas
2. Match annotation and indicate PAV/CNV and translocations

Transcript analysis enables gene variation calling coupled with accurate mappings.

Legend: MATCH, PAV / CNV, MINOR TRANSLOCATION, MAJOR TRANSLOCATION

* illustration
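The calling step can be pictured as comparing where each transcript lands in the two genomes. The rules in this Python sketch are hypothetical simplifications chosen for illustration: absent in one genome means PAV, different copy numbers means CNV, and same copy number on different chromosomes means translocation. The slide's split between minor and major translocations is not defined in the source, so it is not modeled here.

```python
from collections import Counter


def classify(hits_a: list[str], hits_b: list[str]) -> str:
    """Classify one transcript from the chromosomes it maps to in genomes A and B.

    Hypothetical rules, for illustration only:
      PAV           - mapped in one genome but not the other
      CNV           - mapped in both, with different copy numbers
      TRANSLOCATION - same copy number, but on different chromosomes
      MATCH         - same copy number on the same chromosomes
    """
    if not hits_a or not hits_b:
        return "PAV"
    if len(hits_a) != len(hits_b):
        return "CNV"
    if Counter(hits_a) != Counter(hits_b):
        return "TRANSLOCATION"
    return "MATCH"


if __name__ == "__main__":
    # Hypothetical transcripts and their chromosome hits in genomes A and B.
    examples = {
        "tx1": (["chr2"], ["chr2"]),
        "tx2": (["chr1"], []),
        "tx3": (["chr1"], ["chr1", "chr5"]),
        "tx4": (["chr3"], ["chr7"]),
    }
    for name, (a, b) in examples.items():
        print(name, classify(a, b))
```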

Page 16

Capturing the shared and unique genes across the pan-genome

› Transcripts were mapped to 5 unique de novo assembled maize lines
› Shared syntenic transcript mappings are revealed while building the pan-genome
› Unique transcripts are also revealed

[Figure: core and dispensable transcriptome within different maize lines. Shared and unique transcripts per maize line; y-axis: number of transcripts (20,000 to 90,000); x-axis: maize variants]
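Once transcripts have been placed on a common coordinate system, the shared/unique split in the figure reduces to set operations. This Python sketch assumes each line's transcripts are already given as a set of pan-genome transcript IDs; the line names and IDs in the usage block are hypothetical.

```python
def shared_and_unique(transcripts: dict[str, set[str]]) -> dict[str, tuple[int, int]]:
    """Per line: (# transcripts also found in another line, # found only in this line)."""
    result = {}
    for line, ts in transcripts.items():
        others = set().union(*(v for o, v in transcripts.items() if o != line))
        unique = ts - others
        result[line] = (len(ts) - len(unique), len(unique))
    return result


def core_transcripts(transcripts: dict[str, set[str]]) -> set[str]:
    """Transcripts present in every line (the core transcriptome)."""
    return set.intersection(*transcripts.values())


if __name__ == "__main__":
    demo = {  # hypothetical lines and transcript IDs
        "lineA": {"g1", "g2", "g3"},
        "lineB": {"g1", "g2", "g4"},
        "lineC": {"g1", "g3", "g5"},
    }
    print(shared_and_unique(demo))
    core = core_transcripts(demo)
    dispensable = set().union(*demo.values()) - core
    print("core:", sorted(core), "dispensable:", sorted(dispensable))
```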

Page 17

Overview of NRGene's product portfolio

Components: single/multiple full genomes; diversity analysis (haplotype DB); full genome diversity genotyping (imputation); marker design; trait mapping; genomic selection.

Downstream analysis:
› Genome assembly…
› Comparative genomics, genome evolution, genes, gene function, causative polymorphisms…
› Markers, genotyping, diversity analysis…
› Recurrent genotyping, genomic selection, breeding, trait associations…

Page 18

Low coverage sequence captures haplotype information

1. Short reads (>30x): Illumina, optional 10X libraries
2. Contigs/scaffolds:
   › Filter noise/errors
   › Phasing of heterozygous/polyploid contigs
   › Longer scaffolds
3. Scaffold mapping against the pan-genome DB, not a single reference genome:
   › Accurate mapping
   › Identify ALL types of polymorphisms
4. GenoMAGIC pseudo-chromosomes

Statistics for maize line assemblies

Coverage   Scaffold N50   Assembly size   % accuracy of scaffold mapping (defined)   % of mapped assembled sequences
180x       9.4 Mbp        2.2 Gbp         99.99%*                                    97%
60x        32 kbp         2.1 Gbp         99%                                        80%-97%
30x        11 kbp         1.8 Gbp         99%                                        70%-97%

*Serves as gold standard; validated with genetic maps.

Page 19

Discover, Store, and Compare Haplotype Markers

Haplotype markers differentiate between variants, including indels, SVs, SNPs and translocations.

[Figure: illustrative alignment of one locus in maize lines CML247, PHG47, PH207 and B73, showing SNPs, indels and longer alternative sequences]

A polymorphism is a change between two different lines and is therefore only relevant to the two lines examined.

A haplotype marker is the sequence that uniquely defines a haplotype as compared to the common pan-genome.
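The contrast between the two definitions can be sketched in Python. Here a pairwise polymorphism is a position where two aligned lines differ, while a haplotype marker is approximated by k-mers that one haplotype carries and no other haplotype at the locus does. Using k-mers is an assumption made for this illustration; the slide does not say how NRGene represents haplotype markers. Lines with identical sequence are grouped into one haplotype first.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pairwise_snps(a: str, b: str) -> list[int]:
    """Positions where two pre-aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def haplotype_markers(lines: dict[str, str], k: int) -> dict[tuple[str, ...], set[str]]:
    """Group lines with identical sequence into haplotypes, then return the
    k-mers each haplotype carries that no other haplotype at the locus carries."""
    groups: dict[str, list[str]] = {}
    for name, seq in lines.items():
        groups.setdefault(seq, []).append(name)
    hap_kmers = {tuple(names): kmers(seq, k) for seq, names in groups.items()}
    markers = {}
    for hap, ks in hap_kmers.items():
        others = set().union(*(v for h, v in hap_kmers.items() if h != hap))
        markers[hap] = ks - others
    return markers


if __name__ == "__main__":
    locus = {"line1": "ACGTAC", "line2": "ACGTTC", "line3": "ACGTAC"}  # hypothetical
    print(pairwise_snps(locus["line1"], locus["line2"]))  # relevant to this pair only
    print(haplotype_markers(locus, k=3))
```

The SNP at position 4 only says that line1 and line2 differ, whereas the k-mers GTT and TTC identify line2's haplotype against every other haplotype at the locus.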

Page 20

Haplotype database & genotype (array, amplicon, GBS) imputation

Building the haplotype DB:
› Pan-genome: key diversity lines (180-220x)
› Resequencing: diversity panel lines (3x-40x)
› Result: sequence haplotype DB; diversity analysis with haplotype markers (millions per sample)

Imputation:
› Genotype data: SNP (array/amplicon) OR GBS
› Haplotype imputation against the haplotype DB yields enriched marker data
› Dynamic iterations of DB update
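The imputation idea can be illustrated with a deliberately simple Python sketch: a sample genotyped at a few array or GBS positions is matched to the database haplotype that agrees at the most observed positions, and that haplotype fills in every other marker. This assumes inbred (homozygous) material and a single window with no recombination; it is a nearest-haplotype toy, not NRGene's imputation algorithm. Positions and haplotype names are hypothetical.

```python
def impute(sample: dict[int, str],
           haplotype_db: dict[str, dict[int, str]]) -> tuple[str, dict[int, str]]:
    """Fill a sparsely genotyped sample from its best-matching DB haplotype.

    sample: position -> allele observed on the array/GBS.
    haplotype_db: haplotype name -> (position -> allele) for all marker positions.
    Ties are broken by haplotype name (alphabetical).
    """
    def agreement(hap: dict[int, str]) -> int:
        return sum(1 for pos, allele in sample.items() if hap.get(pos) == allele)

    # max() keeps the first maximal item, so sorting names makes ties deterministic.
    best = max(sorted(haplotype_db), key=lambda name: agreement(haplotype_db[name]))
    imputed = dict(haplotype_db[best])
    imputed.update(sample)  # observed genotypes always override imputed ones
    return best, imputed


if __name__ == "__main__":
    db = {  # hypothetical two-haplotype DB over four marker positions
        "hapA": {100: "A", 200: "C", 300: "G", 400: "T"},
        "hapB": {100: "A", 200: "T", 300: "G", 400: "C"},
    }
    array_calls = {100: "A", 200: "T"}  # only two positions genotyped
    print(impute(array_calls, db))
```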


Page 22

Use case study with GenoMAGIC

Comparison of SNP array data: single reference genome vs. haplotype marker DB

Page 23

NRGene haplotypes are consistent with Monsanto expectations and provide insights through increased resolution compared to bi-allelic marker approaches.

[Figure: five paired tracks (SNP vs. Seq) of haplotype blocks; scale labels >10 kb and >1 Mb]

SNP = SNP haplotypes based on a 50K SNP chip and a Monsanto algorithm
Seq = haplotype similarity blocks (>1 Mb) based on an NRGene algorithm

https://pag.confex.com/pag/xxvi/meetingapp.cgi/Paper/31991
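Haplotype similarity blocks of the ">1 Mb" kind can be pictured as runs of consecutive windows in which two lines are identical. This Python sketch is a generic run-merging approach, not the NRGene algorithm named on the slide. It assumes the per-window identity calls between two lines are already available as sorted (start, end, identical) tuples; the window values in the usage block are hypothetical.

```python
from typing import Optional


def similarity_blocks(windows: list[tuple[int, int, bool]],
                      min_len: int) -> list[tuple[int, int]]:
    """Merge contiguous identical windows into blocks; keep blocks >= min_len bp."""
    blocks: list[tuple[int, int]] = []
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end, identical in windows:
        if identical and cur_end is not None and start == cur_end:
            cur_end = end  # extend the open block
            continue
        # The open block (if any) ends here: keep it if it is long enough.
        if cur_start is not None and cur_end - cur_start >= min_len:
            blocks.append((cur_start, cur_end))
        cur_start, cur_end = (start, end) if identical else (None, None)
    if cur_start is not None and cur_end - cur_start >= min_len:
        blocks.append((cur_start, cur_end))
    return blocks


if __name__ == "__main__":
    windows = [  # hypothetical 400 kb windows along one chromosome
        (0, 400_000, True),
        (400_000, 800_000, True),
        (800_000, 1_200_000, True),
        (1_200_000, 1_600_000, False),
        (1_600_000, 2_000_000, True),
    ]
    print(similarity_blocks(windows, min_len=1_000_000))
```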

Page 24

Case example using a 400 kb region of chr1

>94% shared SNPs or markers between B73 and Flint1

Affymetrix 600K array: 167 SNPs (B73, Flint1, Flint2)
› 59% GOOD: SNPs are polymorphic between B73 and Flint2
› 41% BAD: SNPs shared between unrelated haplotypes

NRGene haplotype markers: 1610 markers (B73, Flint1, Flint2)
› 95% GOOD: unique haplotype markers between B73 and Flint2
› 5% BAD: markers shared with the B73 haplotype

NRGene's system:
› Includes all types of markers
› Greatly reduces false-positive SNP markers
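Converting the percentages back into counts shows the size of the gap: 167 × 0.59 ≈ 99 informative array SNPs versus 1610 × 0.95 ≈ 1,530 informative haplotype markers in the same 400 kb. Because the slide's percentages are rounded, these counts are approximate. The short Python helper below does the arithmetic and reports marker density per kb.

```python
REGION_KB = 400  # size of the chr1 region in the case example

# (total markers, fraction marked GOOD) as reported on the slide
PLATFORMS = {
    "Affymetrix 600K array SNPs": (167, 0.59),
    "NRGene haplotype markers": (1610, 0.95),
}


def informative_markers(total: int, good_fraction: float) -> float:
    """Approximate number of GOOD markers (slide percentages are rounded)."""
    return total * good_fraction


def density_per_kb(count: float, region_kb: float = REGION_KB) -> float:
    """Markers per kilobase over the region."""
    return count / region_kb


if __name__ == "__main__":
    for name, (total, good) in PLATFORMS.items():
        n = informative_markers(total, good)
        print(f"{name}: ~{n:.0f} informative markers, ~{density_per_kb(n):.2f} per kb")
```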

Page 25

Capturing the shared and unique hap markers across 9 maize variants

› 9 full de novo assemblies were screened for hap markers
› Overall marker analysis revealed high genetic diversity:
  1. An average of 4.2M markers per sample
  2. A large number (17,516,792) of unique markers
  3. 3,139,431 (18%) are standard SNP markers*

* The polymorphism is a SNP (45%) and has an alternative sequence with high allele frequency (26%).

[Figure: cumulative number of markers (log scale, 10,000 to 100,000,000) vs. number of samples (genomes, 1 to 9), plotted for unique markers and shared markers; labeled values 3.9M, 17.5M and 160k]
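A cumulative curve like the one in the figure can be computed by adding genomes one at a time. This Python sketch assumes "unique markers" means distinct markers accumulated so far (the union) and "shared markers" means markers present in every genome so far (the intersection). That reading fits the 17.5M end point but is an interpretation, not a definition from the slide. The marker sets in the usage block are hypothetical.

```python
from typing import Optional


def accumulation_curve(samples: list[set[str]]) -> list[tuple[int, int]]:
    """For n = 1..len(samples): (# distinct markers, # markers shared by all n)."""
    curve: list[tuple[int, int]] = []
    union: set[str] = set()
    shared: Optional[set[str]] = None
    for markers in samples:
        union |= markers
        shared = set(markers) if shared is None else shared & markers
        curve.append((len(union), len(shared)))
    return curve


if __name__ == "__main__":
    genomes = [  # hypothetical marker sets for three genomes
        {"m1", "m2", "m3"},
        {"m2", "m3", "m4"},
        {"m3", "m4", "m5"},
    ]
    for n, (distinct, shared) in enumerate(accumulation_curve(genomes), start=1):
        print(f"{n} genomes: {distinct} distinct, {shared} shared")
```

The distinct count can only grow and the shared count can only shrink as genomes are added, which is why the two curves in the figure diverge.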