THEME – 4 Genomic diversity of domestication in soybean

Genomic diversity of domestication in soybean. Li-juan Qiu, Institute of Crop Science, Chinese Academy of Agricultural Sciences. International Workshop on “Applied Mathematics and Omics Technologies for Discovering Biodiversity and Genetic Resources for Climate Change Mitigation and Adaptation to Sustain Agriculture in Drylands”, IAV Hassan II, Rabat, Morocco, 24-27 June 2014.


Transcript of THEME – 4 Genomic diversity of domestication in soybean

Page 1: THEME – 4 Genomic diversity of domestication in soybean

Genomic diversity of domestication in soybean

Institute of Crop Science

Chinese Academy of Agricultural Sciences

Li-juan Qiu

International Workshop on “Applied Mathematics and Omics Technologies for

Discovering Biodiversity and Genetic Resources for Climate Change Mitigation

and Adaptation to Sustain Agriculture in Drylands”

IAV Hassan II – Rabat - Morocco, 24-27 June 2014

Page 2: THEME – 4 Genomic diversity of domestication in soybean

1. Background

2. Genetic diversity of G. soja and G. max

3. Pan-genome of G. soja

4. Genomic variation between G. soja and G. max W82

5. Selection genes during domestication

Outline

Page 3: THEME – 4 Genomic diversity of domestication in soybean

Leguminosae, Papilionoideae, Glycine

Subgenus Glycine: 26 perennial wild species (mainly in Australia)

Subgenus Soja: annual wild soybean (G. soja) (East Asia) and cultivated soybean (G. max) (worldwide)

1. Background

Page 4: THEME – 4 Genomic diversity of domestication in soybean

G. soja → Domestication → G. max (landrace) → Improvement → G. max (modern cultivars)

Glycine soja - the wild relative of cultivated soybean G. max

Secondary Gene Pool (GP-2): unknown

Tertiary Gene Pool (GP-3): wild perennial species

From: Harlan and de Wet (1971)

Two bottlenecks: domestication and breeding

Page 5: THEME – 4 Genomic diversity of domestication in soybean

G. soja

G. max vs G. soja

Plant

— Plant height

— Growth habit

Seed

— Size

— Color

— Pod dehiscence

Physiological trait

— Protein content

— Oil content

Modern cultivar

Page 6: THEME – 4 Genomic diversity of domestication in soybean

Genetic variation controlled the difference

The variation of soybean genome during domestication

Genetic variation, e.g. SNP, InDel, PAV, CNV

Domestication trait related gene

The genetic variation between

wild and cultivated soybean ?

Domestication related traits


Page 7: THEME – 4 Genomic diversity of domestication in soybean

Soybean has been cultivated for more than 4,500 years, since the time of the agricultural ancestor Houji, who planted five crops including soybean.

According to written records, the earliest name for soybean was “shu”, found in “The Book of Odes”.

The names for soybean in other languages were derived from “shu”.

Cultivated soybean is native to China

Page 8: THEME – 4 Genomic diversity of domestication in soybean

China holds the largest collection of soybean germplasm

More than 170,000 soybean accessions are held in germplasm collections. Among them, 45,000 accessions are unique (Carter et al. 2004)

More than 23,000 cultivated and 7,000 wild soybean accessions are conserved in the Chinese National Gene Bank (CNGB).

Page 9: THEME – 4 Genomic diversity of domestication in soybean

Constructing different level of core collections

Qiu et al. 2003, Scientia Agricultura Sinica; Qiu et al. 2009, PMB 2013

Core collection: represent the genetic diversity of a crop species

and its relatives with a minimum of repetitiveness
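The idea of representing diversity "with a minimum of repetitiveness" can be illustrated with a small greedy selection. This is only a sketch under stated assumptions, not the sampling method used for the collections on this slide: the function name `greedy_core`, the toy accession names and the allele labels are all hypothetical.

```python
def greedy_core(accessions):
    """Pick accessions one at a time, always taking the one that adds
    the most alleles not yet covered, until every allele is represented.

    accessions: dict mapping accession name -> set of alleles it carries.
    Returns the list of chosen accession names in selection order.
    """
    remaining = dict(accessions)
    all_alleles = set().union(*accessions.values())
    covered = set()
    core = []
    while covered != all_alleles:
        # sorted() makes tie-breaking deterministic (first name wins)
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        core.append(best)
        covered |= remaining.pop(best)
    return core


# Hypothetical toy data: letters stand for alleles, as in the slide schematic
toy = {
    "acc1": {"A", "B"},
    "acc2": {"A", "B"},        # redundant with acc1
    "acc3": {"C", "D", "E"},
    "acc4": {"C", "D"},        # subset of acc3
    "acc5": {"F", "G", "H"},
}
core = greedy_core(toy)
print(core)
```

Redundant accessions (acc2, acc4) are never selected because they add no new alleles once a richer accession has been chosen.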

[Figure: stepwise construction of core collections]

Basic collection (23587)

Methods: location, phenotype

Primary core collection (2794)

Methods: phenotype, genotype

Core collection in the different level (248; 433…)

Schematic: AAAABBBB CCCCDDDDEEEE FFFGGGHHH → AABB CCDDEE FFGGHHH → ABCEFGH

Page 10: THEME – 4 Genomic diversity of domestication in soybean

The primary division of genetic diversity was between the

wild and domesticated accessions.

G. soja and G. max represent distinct germplasm pools.

[Figure, panels A and B: population structure of G. max and G. soja at K=2 to K=6 based on SSR, SNP and SSR+SNP markers; groups labelled NER, NR, HR, SR, C, R, K, J]

2. Differentiation between G.soja and G. max

[Figure: groups S, HH, N, NE, Russia, Korea, Japan analysed with 99 SSR, 554 SNP and SSR+SNP]

Li et al. New Phytologist, 2010; Li et al. Theor Appl Genet, 2008

1863 landraces: 59 SSR; 112 wild soybean: 99 SSR, 554 SNP

Population structure within each species is in accordance with geographic origin, in cultivated and wild soybeans respectively

Page 11: THEME – 4 Genomic diversity of domestication in soybean

Genetic diversity was remarkably decreased after domestication

Study                              Accessions                Molecular data
Hyten et al. (2006) PNAS           26 G. soja, 94 G. max     111 fragments from 102 genes
Li et al. (2010) New Phytologist   92 G. soja, 279 G. max    554 SNP markers, 99 SSR markers

[Figure values: Wild 1807, 0.871; Cultivated 1473, 0.687; percentages 78.3% and 81.5%]

Page 12: THEME – 4 Genomic diversity of domestication in soybean

From Schmutz et al., Nature 2010; 463:178-183

The development of

sequencing technique

Cultivated soybean reference genome

G. max W82

As an important source of genetic diversity, the gene repertoire of G. soja remains largely unexplored

Page 13: THEME – 4 Genomic diversity of domestication in soybean

Pan-genome: The set of all genes present in the

genomes of a group of organisms

3. Pan-genome of G. soja

From: Morgante et al. Current Opinion in Plant Biology 10, 149-155 (2007)

Core genome: shared among individuals.

Dispensable genome: individual-specific or partially shared among individuals.
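The three definitions above map directly onto set operations. The following is a minimal sketch, assuming each genome is summarised as a set of gene (family) identifiers; the function name `split_pan_genome` and the gene labels are hypothetical, and the accession names are reused only as toy labels.

```python
def split_pan_genome(gene_sets):
    """Split genes into pan, core and dispensable sets.

    gene_sets: dict mapping genome name -> set of gene identifiers.
    pan         = genes present in at least one genome (union)
    core        = genes present in every genome (intersection)
    dispensable = genes present in some but not all genomes
    """
    genomes = list(gene_sets.values())
    pan = set().union(*genomes)
    core = set.intersection(*genomes)
    dispensable = pan - core
    return pan, core, dispensable


# Hypothetical toy presence data
toy = {
    "GsojaA": {"g1", "g2", "g3"},
    "GsojaB": {"g1", "g2", "g4"},
    "GsojaC": {"g1", "g2", "g3", "g5"},
}
pan, core, dispensable = split_pan_genome(toy)
print(sorted(pan), sorted(core), sorted(dispensable))
```

Here g3 is partially shared and g4, g5 are individual-specific, so all three fall in the dispensable set.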

Page 14: THEME – 4 Genomic diversity of domestication in soybean

Why pan-genome ?

Li et al. New Phytologist, 2010

The largest component of variation (~75%) was among individuals within populations

A single genome sequence might not reflect the entire genomic

complement of a species

AMOVA

Page 15: THEME – 4 Genomic diversity of domestication in soybean

[Figure: values from 0.0 to 1.0 plotted for accessions 1 to 91; GsojaA, GsojaB, GsojaC, GsojaD, GsojaE and GsojaG are marked]

Seven representative wild soybean accessions (New Phytologist, 2010)

China: Northeast, North,

Huanghuai and South regions

Other countries:

Japan, Korea, Russia

Three libraries: 180 bp, 500 bp, 2 kbp

Data: 817 Gb, 111.9× on average

Pan-genome of annual wild soybean

Page 16: THEME – 4 Genomic diversity of domestication in soybean

ID                            GsojaA   GsojaB   GsojaC    GsojaD    GsojaE   GsojaF   GsojaG
Predicted genome size (Mbp)   981.0    1000.8   1053.78   1118.34   956.43   992.66   889.33
Assembled genome size (Mbp)   813      895      841       985       920      886      878
Contig N50 (Kbp)*             9        22.2     8         11        27       24.3     19.2
Scaffold N50 (Kbp)            18.3     57.2     17        48.7      65.1     52.4     44.9
No. of genes predicted        58,756   56,655   60,377    62,048    58,414   57,573   58,169
No. of genes confirmed        55,061   54,256   56,542    57,631    55,901   54,805   54,797

Number of predicted genes: average 55,570 genes/genome

RNA-Seq validation: 67.3% of predicted genes

Summary of data and assembly
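As a quick arithmetic check on the table, the per-genome averages can be recomputed from the two gene-count rows. The values below are copied from the slide; the variable names are hypothetical. In this sketch the quoted average of 55,570 genes per genome matches the mean of the "confirmed" row rather than the "predicted" row.

```python
# Gene counts per accession, copied from the slide table (GsojaA .. GsojaG)
predicted = [58756, 56655, 60377, 62048, 58414, 57573, 58169]
confirmed = [55061, 54256, 56542, 57631, 55901, 54805, 54797]

mean_predicted = sum(predicted) / len(predicted)
mean_confirmed = sum(confirmed) / len(confirmed)

print(f"mean predicted genes per genome: {mean_predicted:,.0f}")
print(f"mean confirmed genes per genome: {mean_confirmed:,.0f}")
```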

Page 17: THEME – 4 Genomic diversity of domestication in soybean

The pan-genome is dynamic and a single genome does not adequately represent the diversity of the species

The total number of genes increased as additional genomes were added, while the number of core genes decreased

The average pan-genome size of any two accessions accounted for 78.2% of that found using all seven accessions
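The accumulation behaviour described on this slide can be sketched by averaging pan and core sizes over every combination of k genomes. This is an illustrative sketch on hypothetical toy gene sets, not the procedure or data behind the slide; the function name `accumulation` is an assumption.

```python
from itertools import combinations
from statistics import mean


def accumulation(gene_sets):
    """For k = 1..N genomes, average the pan size (union) and core size
    (intersection) over all combinations of k genomes.

    Returns a list of (k, mean_pan_size, mean_core_size) tuples.
    """
    names = sorted(gene_sets)
    rows = []
    for k in range(1, len(names) + 1):
        pan_sizes, core_sizes = [], []
        for combo in combinations(names, k):
            sets = [gene_sets[name] for name in combo]
            pan_sizes.append(len(set().union(*sets)))
            core_sizes.append(len(set.intersection(*sets)))
        rows.append((k, mean(pan_sizes), mean(core_sizes)))
    return rows


# Hypothetical toy presence data
toy = {
    "GsojaA": {"g1", "g2", "g3"},
    "GsojaB": {"g1", "g2", "g4"},
    "GsojaC": {"g1", "g2", "g3", "g5"},
}
rows = accumulation(toy)
for k, pan_size, core_size in rows:
    print(k, round(pan_size, 2), round(core_size, 2))
```

Dividing the mean pan size at k = 2 by the pan size at k = N gives the kind of ratio quoted on the slide for two versus seven accessions.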

Page 18: THEME – 4 Genomic diversity of domestication in soybean

Pan-genome

Core: 48.6% of genes and 80.1% of genome sequence

Dispensable: 51.4% of genes and 19.9% of genome sequence

59,080 genes Genome size: 986.3 Mbp

Pan-genome of annual wild soybean

Page 19: THEME – 4 Genomic diversity of domestication in soybean

8.86/kb

19.93/kb

The dispensable gene set was more variable than the core

gene set, both structurally and functionally.

The dispensable genes have experienced weaker purifying

selection and evolved more quickly than core genes

Core genome vs. dispensable genome

Page 20: THEME – 4 Genomic diversity of domestication in soybean

58.3% of the dispensable gene set could not be assigned any functional annotation, versus 33.9% for the core gene set.

95.5% of core genes had homologs in other species based on BLAST searches against 32 plant genomes (excluding soybean), significantly more than the dispensable gene set (83.5%, chi-square test, p < 0.01).
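A chi-square test on a 2×2 table of this kind can be sketched with the standard library alone. The gene counts below are an assumption: they are reconstructed from percentages quoted elsewhere in the talk (59,080 pan-genome genes, 48.6% core) and may not be the counts the authors tested. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
        [[a, b],
         [c, d]]
    With 1 degree of freedom, the p-value is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED counts, derived from percentages on the slides
total_genes = 59080
n_core = round(total_genes * 0.486)
n_disp = total_genes - n_core
core_with = round(n_core * 0.955)   # core genes with homologs
disp_with = round(n_disp * 0.835)   # dispensable genes with homologs

chi2, p_value = chi_square_2x2(core_with, n_core - core_with,
                               disp_with, n_disp - disp_with)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")
```

With gene sets this large, a 12-point difference in proportions gives a very large statistic, so the p-value underflows to zero in floating point.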

Lineage-specific genes evolved faster than genes shared between species, through either a higher evolutionary rate or a higher gene loss rate

Core genes were more functionally conservative among plant species than dispensable genes

Page 21: THEME – 4 Genomic diversity of domestication in soybean

Evolution of the G. max /G. soja species complex

G. soja diverged from G. max more than 0.8 mya

Nearly 3 times older than a previous estimate of 0.27 mya

based on re-sequencing of a single G. soja genome

670 conserved single-copy

gene orthologs

Page 22: THEME – 4 Genomic diversity of domestication in soybean

4. Genomic variation between G. soja and G. max W82

SNPs: 3.63~4.72 million

Indels: 0.50~0.77 million

Structural variation: CNV, PAV

Thousands of genes

affected by above

variations, some of

which may be useful for

future crop improvement.

Page 23: THEME – 4 Genomic diversity of domestication in soybean

G.soja vs G.max: Genomic basis of agronomic traits

Photosensing and light signaling coordinately control flowering

Two 3-nt indels and 9 non-synonymous SNPs; two variation hotspots

Page 24: THEME – 4 Genomic diversity of domestication in soybean

G. max

G. soja

Data set                                    G. soja-specific   G. max-specific   CNV-loss   CNV-gain   Large InDel (5-100bp)   Small InDel (1-5bp)   SNP missed in Re-seq   SNP
Re-sequencing* (1 G. soja + 1 G. max)       ?                  712               ?          ?          ?                       19.6M                 ?                      250M
De novo sequencing (7 G. soja + 1 G. max)   338                16                1179       726        15M                     85M                   480M                   510M
Re-sequencing# (25 G. soja + 30 G. max)     ?                  ?                 ?          ?          ?                       70M                   ?                      510M

#: From Li et al. BMC Genomics, 2013; *: From Kim et al. PNAS, 2010

Specific variations identified in this comparison

Page 25: THEME – 4 Genomic diversity of domestication in soybean

9 SNPs in a 62bp fragment

More SNPs were found by assembly-based method

10 million SNPs, twice the number identified by re-sequencing (Li et al. BMC Genomics, 2013)

New SNPs came mostly from divergent regions, where assembled sequences could be aligned but short sequencing reads are difficult to map

Page 26: THEME – 4 Genomic diversity of domestication in soybean

Copy number variation: 1978 genes

1179 loss

726 gain

73 gain and loss

Category: G. soja > G.max

Number: G. max > G. soja

R genes

Page 27: THEME – 4 Genomic diversity of domestication in soybean

>100 bp and <95% identity

PAV sequence: 30.3 Mb

G. soja specific: 11.3 Mb

G. max specific: 19 Mb

PAV genes: 354

G. soja specific: 338

G. max specific: 16
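The threshold at the top of this slide can be read as a filter on aligned segments. The sketch below assumes that interpretation (a segment counts as a PAV candidate when it is longer than 100 bp and its best match in the other genome has less than 95% identity); the `Segment` class, its fields and the example records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    length_bp: int
    best_identity: float  # best identity to the other genome, from 0.0 to 1.0


def is_pav(seg, min_len=100, max_identity=0.95):
    """Assumed reading of the slide: >100 bp and <95% identity."""
    return seg.length_bp > min_len and seg.best_identity < max_identity


# Hypothetical example segments
segments = [
    Segment("segA", 80, 0.50),     # too short
    Segment("segB", 450, 0.99),    # present in both genomes
    Segment("segC", 1200, 0.60),   # PAV candidate
]
pav = [s.name for s in segments if is_pav(s)]
print(pav)
```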

Page 28: THEME – 4 Genomic diversity of domestication in soybean

PAV genes: 24.3% involved in defense response

Gs1-3: biotic and abiotic stress tolerance or plant development

56 re-sequenced accessions: frequency G. soja > G. max

Gs1 Gs2 Gs3

8kb

Page 29: THEME – 4 Genomic diversity of domestication in soybean

[Figure: wild and cultivated populations, each labelled 1 to 5]

Population bottleneck or artificial selection will result in the fixation

of alleles during domestication

5. Selection genes during domestication

Page 30: THEME – 4 Genomic diversity of domestication in soybean

G. soja

Landrace

Elite cultivar

25 accessions; 93.55 Gb; 98.2% Glyma1.01

31 accessions (Lam et al. 2010)

17 G. soja

14 G. max

25 accessions

Total: 5,102,244 SNPs

Specific to our accessions: 25.5%

Li et al. BMC Genomics, 2013

Page 31: THEME – 4 Genomic diversity of domestication in soybean

[Figure: number of regions and number of genes on chromosomes Gm01 to Gm20; axes 0 to 60 and 0 to 140]

394 regions: 1.47% of the whole genome (950M)

928 genes: 2.0% of 46,430 predicted genes

θπ (cultivated/wild), Tajima’s D values, FST

20 kb sliding window (2 kb step size).

Artificial Selection
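The sliding-window scan named on this slide can be sketched for one of its statistics, the diversity ratio θπ(cultivated/wild). This is a simplified illustration on hypothetical toy SNPs, not the pipeline used in the study: π is estimated per site as the sum of 2p(1-p)·n/(n-1) over SNPs in the window divided by window length, and Tajima’s D and FST are left out. All function names and inputs are assumptions.

```python
def windows(chrom_len, size=20_000, step=2_000):
    """Yield (start, end) for 20 kb windows moved in 2 kb steps."""
    start = 0
    while start + size <= chrom_len:
        yield start, start + size
        start += step


def pi_per_site(snps, n_chrom, start, end):
    """Nucleotide diversity per site in [start, end).

    snps: list of (position, allele_count) for one population.
    n_chrom: number of sampled chromosomes in that population.
    """
    total = 0.0
    for pos, count in snps:
        if start <= pos < end:
            p = count / n_chrom
            total += 2 * p * (1 - p) * n_chrom / (n_chrom - 1)
    return total / (end - start)


def diversity_ratio(wild, cult, n_wild, n_cult, chrom_len):
    """Return (start, end, pi_cultivated / pi_wild) for each window
    where the wild population has non-zero diversity."""
    out = []
    for start, end in windows(chrom_len):
        pi_wild = pi_per_site(wild, n_wild, start, end)
        pi_cult = pi_per_site(cult, n_cult, start, end)
        if pi_wild > 0:
            out.append((start, end, pi_cult / pi_wild))
    return out


# Hypothetical toy data: cultivated lines lost all variation below 20 kb
wild_snps = [(1_000, 10), (5_000, 10), (25_000, 10), (30_000, 10)]
cult_snps = [(25_000, 10), (30_000, 10)]
ratios = diversity_ratio(wild_snps, cult_snps, 40, 40, chrom_len=40_000)
for start, end, ratio in ratios:
    print(start, end, round(ratio, 2))
```

Windows with a ratio near zero are where cultivated accessions have lost diversity relative to wild ones, the pattern expected in a selected region.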

Page 32: THEME – 4 Genomic diversity of domestication in soybean

The distribution of selected regions was neither random nor uniform across the genome

Clusters were apparent in certain genomic regions

Gm08

Gm12

Similar to the distribution pattern of QTLs underlying domestication

related traits (Ross-Ibarra, Genetics of Adaptation, 2005)

Page 33: THEME – 4 Genomic diversity of domestication in soybean

A homolog of the domestication gene Grain Incomplete Filling 1

(GIF1) in rice

GIF1 encodes a cell-wall invertase that regulates sugar levels to meet the demands of cell division and growth during grain development.

Increased grain size and weight in transgenic rice

From: Wang et al. Nat Genet, 2008

Selection gene: Glyma03g35520.1

Page 34: THEME – 4 Genomic diversity of domestication in soybean

GmTfl1 (Glyma19g37890.1): Tian et al. 2010; Liu et al. 2010

                           gDNA           cDNA
                           θ      π       θ      π
GmTfl1 (Glyma19g37890.1)
  Elite cultivars          1.86   0.98    0.98   0.52
  Landraces                1.78   1.05    1.78   1.61
  G. soja                  1.65   1.28    0      0
Glyma03g35250.1
  G. max (89)              0      0       0      0
  Elite cultivars          0      0       0      0
  Landraces                0      0       0      0
  G. soja (20)             0.66   0.73    0.85   0.54

The homolog of Glyma03g35250.1 in sunflower experienced

selective sweeps during evolution (From Blackman et al. 2011).

Selection gene: Glyma03g35250.1

Page 35: THEME – 4 Genomic diversity of domestication in soybean

Confirmed some regions or genes

• 100-seed weight: QTL by Yan et al., Plant Breeding, 2014

Type       No. of SNPs   No. of haplotypes   Haplotype diversity
Total      72            32                  0.762
G. soja    71            28                  0.952
Landrace   29            5                   0.568
Elite      3             4                   0.552

[Figure panels: Total, Wild, Landrace, Elite]
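Haplotype diversity is commonly computed with Nei's estimator, H = n/(n-1) · (1 - Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sampled sequences. The slide does not state which estimator was used, so the sketch below is an assumption, and the sample haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).

    haplotypes: list with one haplotype label per sampled sequence.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample: three haplotypes at frequencies 3/6, 2/6 and 1/6
sample = ["h1", "h1", "h1", "h2", "h2", "h3"]
h = haplotype_diversity(sample)
print(round(h, 3))
```

A population fixed for a single haplotype gives 0, and the value approaches 1 when every sampled sequence carries a different haplotype, consistent with the drop from G. soja to landraces and elite cultivars in the table.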

Page 36: THEME – 4 Genomic diversity of domestication in soybean

G. soja: black

Landrace: diverse color

Elite cultivar: yellow

CHS1, CHS3, CHS4, CHS5, and CHS9

Multiple-allele I locus

Soybean seed coat color

[Figure: y-axis 0 to 6; x-axis -40000 to 160000; series labelled 1 to 20]

Page 37: THEME – 4 Genomic diversity of domestication in soybean

The hierarchical genetic structure of soybean landraces reflected geographic region.

A pan-genome was constructed by de novo sequencing and

assembling seven G. soja accessions.

Inter-genomic comparisons identified up to 3,000 lineage-specific genes and genes with CNV, PAV or large-effect mutations, some of which may contribute to variation in agronomic traits such as resistance, seed composition, flowering time and biomass.

A set of candidate genes significantly affected by selection for preferred agricultural traits underlying soybean domestication was identified, and some genes were confirmed.

These results will facilitate the harnessing of untapped genetic

diversity from wild soybean for developing elite cultivars.

Summary

Page 38: THEME – 4 Genomic diversity of domestication in soybean

Funding:

National Natural Science Foundation of China

State Key Basic Research and Development Plan of China (973)

National Key Technologies R&D Program in the 11th Five-Year Plan (863)

Acknowledgments

Novogene

Prof. Ruiqiang Li

Guangyu Zhou

Wenkai Jiang

Zhouhuao Zhang

University of Georgia

Prof. Scott A. Jackson

Purdue University

Dr. Jianxin Ma