Page 1: Improving genome assemblies and capturing …ksiconnect.icrisat.org/wp-content/uploads/2015/03/Dave...Improving genome assemblies and capturing genome variation data for applied crop

Improving genome assemblies and capturing genome variation data for applied crop improvement.

David Edwards

University of Western Australia, Australia

[email protected]

Outline

• Sequencing wheat chromosome arms

• Chickpea chromosomal genomics

• Skim GBS based genome assembly

• Validation and improvement of the canola genome

Sequencing wheat chromosome arms

[Figure labels: Ta 7DS, Bd 1, Bd 3]

www.wheatgenome.info

Berkman, et al., Plant Biotechnology Journal (2011)

Contig duplication

[Bar chart: level of sequence duplication (% of assembled chromosome arm), axis 0.00% to 45.00%; legend: IWGSC]

Genes not supported by read data

[Bar chart: number of genes with no read supporting them (axis 0 to 30), by chromosome arm: 2DL, 1DL, 5AL, 6AS, 5AS, 3AL, 2DS, 2AL, 6DL, 5BL, 4DL, 5BL, 6BL, 3AS, 4DS, 2BS, 1AS]

New assembly

[Bar chart: total assembly size in bp (axis 0 to 600,000,000) per chromosome arm assembly; series: IWGSC, non-normalized Velvet]

Chickpea

Chickpea desi vs kabuli

Chickpea kabuli reference

Kabuli reference

Kabuli reference

[Figure labels: Desi, Kabuli]

Desi reference

[Figure labels: Desi, Kabuli, Desi WGS]

Skim GBS

• Determine SNPs by sequencing parents and running SGSautoSNP

• Skim sequence the segregating population at low coverage

• Map reads to the reference genome

• Call genotype where reads cover previously defined SNP

• Impute and clean to define haplotype blocks

Genotype calling

Call genotype of previously predicted SNPs

[Figure labels: read allele A; predicted SNPs C/A and T/C; read allele A]
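The genotype-calling step can be illustrated with a minimal sketch. It is not the pipeline used in the talk: the input layout (a dictionary of predicted parental SNPs and a dictionary of read bases per site), the labels "P1"/"P2", and the function name `call_genotypes` are all assumptions made for illustration. It also assumes homozygous lines, so conflicting reads at a site are treated as missing.

```python
from typing import Dict, List, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position); hypothetical layout


def call_genotypes(
    parent_snps: Dict[Site, Tuple[str, str]],
    read_bases: Dict[Site, List[str]],
) -> Dict[Site, Optional[str]]:
    """Assign each predicted SNP to parent "P1" or "P2", or None if uncallable."""
    calls: Dict[Site, Optional[str]] = {}
    for site, (p1_allele, p2_allele) in parent_snps.items():
        observed = set(read_bases.get(site, []))
        if observed == {p1_allele}:
            calls[site] = "P1"
        elif observed == {p2_allele}:
            calls[site] = "P2"
        else:
            # No reads, conflicting reads, or an allele seen in neither parent.
            calls[site] = None
    return calls


# Illustrative data only: alleles are given as (parent 1, parent 2).
parent_snps = {
    ("chr1", 100): ("C", "A"),
    ("chr1", 250): ("T", "C"),
    ("chr1", 400): ("G", "T"),
}
# Low-coverage skim data: most sites have zero or one read.
read_bases = {
    ("chr1", 100): ["A"],
    ("chr1", 250): ["T", "T"],
}

calls = call_genotypes(parent_snps, read_bases)
for site, call in calls.items():
    print(site, call)
```

Sites left as None here are the ones a later imputation step would try to fill from neighbouring calls.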

Haplotype blocks

TN1  A G G T C C A G G A T A A T
TN2  A G G T C C A G G A T A A T
TN3  T C C A G G C G G A T A A T
TN4  A G G T C C A G G A T A A T
TN5  T C C A G G C T C G C G G C
TN6  A G G T C C A G G A T A A T
TN7  T C C A G G C T C G C G G C

T    A G G T C C A G G A T A A T
N    T C C A G G C T C G C G G C
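As a rough illustration of how haplotype blocks can be read off such a table, the sketch below assigns each progeny allele to parent T or N, fills missing calls that are flanked by the same parent, and collapses the result into blocks. The function names, the use of "-" for a missing call, and the simple flank-based fill rule are assumptions for this example, not the imputation method used in the talk.

```python
from typing import List, Optional, Tuple


def parental_origin(progeny: str, parent_t: str, parent_n: str) -> List[Optional[str]]:
    """Label each SNP 'T' or 'N' by the parent it matches; None if missing or ambiguous."""
    origin: List[Optional[str]] = []
    for allele, t, n in zip(progeny, parent_t, parent_n):
        if allele == t and allele != n:
            origin.append("T")
        elif allele == n and allele != t:
            origin.append("N")
        else:
            origin.append(None)
    return origin


def impute_gaps(origin: List[Optional[str]]) -> List[Optional[str]]:
    """Fill runs of missing calls only when both flanking calls agree."""
    filled = list(origin)
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1
        left = filled[i - 1] if i > 0 else None
        right = filled[j] if j < n else None
        if left is not None and left == right:
            for k in range(i, j):
                filled[k] = left
        i = j
    return filled


def haplotype_blocks(origin: List[Optional[str]]) -> List[Tuple[str, int, int]]:
    """Collapse consecutive same-parent calls into (parent, first_index, last_index)."""
    blocks: List[Tuple[str, int, int]] = []
    for idx, parent in enumerate(origin):
        if parent is None:
            continue
        if blocks and blocks[-1][0] == parent:
            blocks[-1] = (parent, blocks[-1][1], idx)
        else:
            blocks.append((parent, idx, idx))
    return blocks


# Parental haplotypes and line TN3 as shown on the slide (spaces removed).
PARENT_T = "AGGTCCAGGATAAT"
PARENT_N = "TCCAGGCTCGCGGC"
tn3 = parental_origin("TCCAGGCGGATAAT", PARENT_T, PARENT_N)
print(haplotype_blocks(tn3))

# The same line with two calls masked ("-") to mimic skim coverage.
tn3_gappy = parental_origin("TC-AGGCGG-TAAT", PARENT_T, PARENT_N)
print(haplotype_blocks(impute_gaps(tn3_gappy)))
```

For TN3 both calls yield one N block over the first seven SNPs followed by one T block over the last seven, which places the recombination point between the seventh and eighth SNP. A gap that spans the recombination point itself is left unfilled, because its flanks disagree.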

Pre-imputation

After imputation and cleaning

Kabuli improved genome assessment

Genome assessment with isolated chromosome sequence data:

• A) Heat maps of released genome (chromosome) size and quality

• B-H) Heat maps of improved genome assessment with isolated chromosome sequence data

• I) Gene density plots

• J) SNP density plots

Desi improved genome assessment

Genome assessment with isolated chromosome sequence data:

• A) Heat maps of released genome (chromosome) size and quality

• B-H) Heat maps of improved genome assessment with isolated chromosome sequence data

• I) Heat map produced with WGS (ICC 4958)

• J) Gene density plots

Genome stats

[Bar chart: chromosome size in Mbp (axis 0 to 80) for chromosomes Ca1 to Ca8; series: Kabuli v1.0, Kabuli v2.6.2, Desi v1.0, Desi v1.1]

• Kabuli v2.6.2: overall chromosome length has increased from 303.1 Mbp to 423.2 Mbp by placing 1,987 contigs

• Desi v1.1: overall chromosome length has increased from 124.3 Mbp to 416.9 Mbp by placing 133,840 contigs

Desi vs Kabuli comparison

Improvement of Darmor genome

GBS using Darmor

• Both parental individuals sequenced at high coverage (~50x)

• 92 doubled haploid Tapidor x Ningyou individuals, average coverage 1.6x

• Novel algorithm: contigPlacer

• Uses tagging SNPs per contig, compares their genotype patterns with those of placed contigs, and places unplaced contigs accordingly
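The idea of matching genotype patterns can be sketched as follows. This is not contigPlacer itself, whose implementation is not shown in the slides. The pattern encoding (one parental call per individual, None when missing), the concordance score, the thresholds, and all names are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Tuple

# One parental call ("T" or "N") per individual; None where no call was made.
Pattern = List[Optional[str]]


def concordance(a: Pattern, b: Pattern) -> Tuple[float, int]:
    """Return (fraction of matching calls, number of individuals called in both)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0, 0
    matches = sum(1 for x, y in shared if x == y)
    return matches / len(shared), len(shared)


def place_contig(
    unplaced: Pattern,
    placed: Dict[str, Tuple[str, int, Pattern]],
    min_concordance: float = 0.95,  # assumed threshold
    min_shared: int = 5,            # assumed threshold
) -> Optional[Tuple[float, str, int, str]]:
    """Find the placed contig whose genotype pattern best matches the unplaced one.

    Returns (score, chromosome, position, placed contig name), or None when no
    placed contig passes both thresholds.
    """
    best: Optional[Tuple[float, str, int, str]] = None
    for name, (chrom, pos, pattern) in placed.items():
        score, n_shared = concordance(unplaced, pattern)
        if n_shared < min_shared or score < min_concordance:
            continue
        if best is None or score > best[0]:
            best = (score, chrom, pos, name)
    return best


# Illustrative data: eight individuals, two placed contigs.
placed_contigs = {
    "ctgA": ("A01", 1_000_000, list("TTNNTNTN")),
    "ctgB": ("C03", 5_000_000, list("NTTNNNTT")),
}
unplaced_pattern: Pattern = ["T", "T", "N", None, "T", "N", "T", "N"]

hit = place_contig(unplaced_pattern, placed_contigs)
print(hit)
```

Contigs that are physically close rarely recombine apart, so their segregation patterns across the population are nearly identical. That is why a high concordance with a placed contig suggests a nearby position.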

Darmor genome

• 10 A-chromosomes, 9 C-chromosomes

• 22 sets of unplaced contigs

• Assembled size: 850.29 Mbp

• 645.95 Mbp placed (75.8%), 204.33 Mbp unplaced contigs

Darmor improvement

• Identified 1,006,985 SNPs

• ~60% of alleles called, rising to ~80% after imputation

• Data in contigPlacer

Darmor improvement

• Before contigPlacer: 645.95 Mbp placed (75.8%), 204.33 Mbp unplaced contigs

• After contigPlacer: 798.95 Mbp placed (93.9%), 51.33 Mbp unplaced

• 98.5% of unplaced contigs with initial chromosome assignment mapped to the same chromosome
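The size of the gain follows directly from the before and after figures. The short calculation below uses only the four Mbp values quoted above; the variable names are illustrative.

```python
# Figures from the slide, in Mbp.
placed_before, unplaced_before = 645.95, 204.33
placed_after, unplaced_after = 798.95, 51.33

# The total should be unchanged: contigs only move from unplaced to placed.
total_before = placed_before + unplaced_before
total_after = placed_after + unplaced_after

newly_placed = placed_after - placed_before
share_of_unplaced = newly_placed / unplaced_before

print(f"total before: {total_before:.2f} Mbp, total after: {total_after:.2f} Mbp")
print(f"newly placed: {newly_placed:.2f} Mbp")
print(f"share of previously unplaced sequence now placed: {share_of_unplaced:.1%}")
```

This gives 153.00 Mbp of newly placed sequence, or about 74.9% of what was previously unplaced. Dividing the placed figures by the 850.28 Mbp sum gives fractions a fraction of a percentage point above the 75.8% and 93.9% quoted on the slide, which may reflect rounding or a slightly different denominator.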

Resolving minor structural errors

Fixing minor structural errors is important for accurate trait fine mapping

Conclusions

• Many high-quality published genomes can be improved

• Chromosomal genomics, skimGBS and bioinformatics can validate and improve genome assemblies

• Quality genome assemblies improve trait association and are essential for pan-genome assembly and assessment of structural variation

Acknowledgements

Philipp Bayer

Pradeep Ruperao

Juan Montenegro

Kenneth Chan

Michal Lorenc

Agnieszka Golicz

Kaitao Lai

Paul Visendi

Paula Martinez

Jenny Lee

Paul Berkman

Jiri Stiller

Sahana Manoli

Jacqueline Batley

Alice Hayward

Emma Campbell

Jessica Dalton-Morgan

Satomi Hayashi

Reece Tollenaere

Hana Šimková

Marie Kubaláková

Jaroslav Doležel

Tim Sutton

Deepa Jaganathan

Rajeev Varshney

(and colleagues)

Contact:

[email protected]