Building a foundation for microbial metagenomics analysis

Kevin McCluskey (1) and Scott Baker (2)

(1) The Fungal Genetics Stock Center
(2) Department of Energy, Pacific Northwest National Laboratory

WDCM and CODATA Joint Workshop, ICCC13, Beijing, China
ONE-GENE, ONE-ENZYME


Outline

• Background

• Role of collections

• Whole genome survey programs

• Accessing genome databases

• Evaluating genome diversity


What is metagenomics?

• The direct genetic analysis of communities of microbes

– Environmental

– Intestinal/Rumen

• Growth of metagenomics publications (from PubMed)


Metagenomic studies are proliferating

• US NCBI SRA lists over 17,000 public metagenome studies

• 16,466 are private


Metagenome studies depend on robust reference materials

• 2005: Most taxa are unidentified or identified only to phylum

• Microbial “Dark Matter”


Resources in collections are essential for metagenomics

77 different species representing 23 different genera

59 species with 10 or fewer isolates

> 70 strains from whole genome sequencing programs

# strains   Species
18744       Neurospora crassa
1188        Aspergillus nidulans
672         Neurospora intermedia
550         Fusarium sp.
299         Neurospora tetrasperma
274         Neurospora sitophila
253         Schizophyllum commune
241         Sordaria sp.
152         Neurospora discreta
134         Magnaporthe grisea
138         Aspergillus niger
75          Pichia pastoris
52          Neurospora sp.
28          Ascobolus sp.
26          Gelasinospora sp.
53          Aspergillus fumigatus


Increasing pressure on operations of culture collections in post-genomics era

Neurospora genome was published in 2004

[Figure: Numbers of items distributed by the FGSC in recent years. x-axis: years 2000 to 2011; y-axis scale: 0 to 2500; series: Neurospora, Aspergillus, Other, Plasmids, Plates.]


Taxonomic studies and WGS

• 1000 Fungal Genomes seeks to provide one complete genome sequence for every fungal FAMILY

• GEBA will fill gaps in bacterial and archaeal genome sequences


US DOE Joint Genome Institute programs

• Most are user programs

– community sequencing program

– emerging technologies opportunity program

– technology development pilot program


US DOE Joint Genome Institute programs

• Emphasizing relationships with living microbe collections

– Genomic Encyclopedia of Bacteria and Archaea (GEBA): http://www.jgi.doe.gov/programs/GEBA/

• DSMZ

– 1000 Fungal Genomes program: http://1000.fungalgenomes.org

• USDA, USFS, FGSC collections


US DOE JGI Data Release

• JGI supports open data

• Data published immediately

– Available immediately on JGI portal

• Fort Lauderdale (2003) and Toronto (2009) agreements

– Reserves right of first publication

• Not strongly enforced

• Registration required (Fall 2013)

– Archived at NCBI

• Analyses and time-of-publication data are the responsibility of the investigator


1000 Fungal Genomes

• PIs: Spatafora, Stajich, McCluskey, Crous, Turgeon, Lindner, O'Donnell, Ward, Rokas, Glass, Arnold, Martin, Grigoriev

• Sampling to completely sequence ≥ one species from each Family


Current Databases are Taxonomic

• JGI

• Genomes OnLine Database (GOLD)


BLAST at JGI

• Somewhat hidden

• Identifies SOME strain names


BLAST at PubMed

• Taxonomically dense

• Not all hits are to extant strains


Searching collection catalogs by gene ontology

• Search by traits

• Limited at JGI


Searching by GO term

• Could allow a taxonomically blind search

• Also does not require that user have DNA sequence for trait
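
The idea of a taxonomically blind search that needs no query sequence can be sketched in a few lines. The sketch assumes a catalog in which each strain record carries the set of GO term identifiers from its genome annotation. The records, the strain labels, and the function name `strains_with_go_term` are hypothetical, and no JGI or collection API is implied.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StrainRecord:
    """Hypothetical catalog entry: a strain plus GO terms from its annotation."""
    strain_id: str
    species: str
    go_terms: frozenset = field(default_factory=frozenset)


def strains_with_go_term(catalog, go_id):
    """Return records annotated with go_id, ignoring taxonomy entirely."""
    return [rec for rec in catalog if go_id in rec.go_terms]


# Illustrative records only; these are not real catalog entries.
CATALOG = [
    StrainRecord("STRAIN-A", "Neurospora crassa",
                 frozenset({"GO:0008810", "GO:0004096"})),
    StrainRecord("STRAIN-B", "Aspergillus niger", frozenset({"GO:0008810"})),
    StrainRecord("STRAIN-C", "Pichia pastoris", frozenset({"GO:0004096"})),
]

# GO:0008810 is the GO identifier for cellulase activity.
hits = strains_with_go_term(CATALOG, "GO:0008810")
print([rec.strain_id for rec in hits])
```

The query is a trait identifier rather than a sequence or a species name, so hits can come from any genus in the catalog.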


• Only as good as the annotation

• Alternatives: KEGG, KOG


How can you know if your gene is interesting?

• Resequencing characterizes within-species genetic diversity

• Enables allele finding

• Amenable to quantitative analysis


Resequencing reveals “Haplotypes”

• 3 Backgrounds:

– 7035 is in a “St. Lawrence” background

• 3831 and 2261 are intermediate

– 1363 is in a “Lindegren” background

– 821 is in an “Abbott” background

• This figure shows polymorphisms on chromosome 7 from five strains

• Other chromosomes show different amounts of Lindegren vs St. Lawrence sequence

[Figure legend: Shared SNP, Unique SNP, Indel]


Size distribution of Indels

• Indels of size +/- 4 are over-represented

• Indels that do not cause frameshift are greatly over-represented in coding sequences

[Figure: Size distribution of indels. x-axis: indel size (number of bases), 1 to 18; y-axis: number of indels (0 to 30000). Second panel: % CDS indels by indel size (0 to 40).]
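
An indel whose length is a multiple of three leaves the reading frame intact. This is the usual explanation for why non-frameshift indels dominate in coding sequence. A minimal sketch of that classification follows. The function names and the example lengths are made up for illustration and are not values read from the figure.

```python
from collections import Counter


def is_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    if indel_length == 0:
        raise ValueError("indel length must be non-zero")
    return abs(indel_length) % 3 != 0


def cds_indel_summary(indel_lengths):
    """Count in-frame vs. frameshift indels in a list of signed lengths.

    Positive lengths are insertions, negative lengths are deletions.
    """
    counts = Counter(
        "frameshift" if is_frameshift(n) else "in-frame" for n in indel_lengths
    )
    return {"in-frame": counts["in-frame"], "frameshift": counts["frameshift"]}


# Made-up example lengths, not data from the slide.
example = [3, -3, 6, 1, -2, 9, 4, -6]
print(cds_indel_summary(example))
```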


Mitochondrial polymorphisms among WGS strains

Strain   SNP   INDEL   CDS INDEL
106      2     14      0
305      4     65      5
309      3     23      0
322      7     83      5
821      6     133     8
1211     4     69      4
1303     4     53      4
1363     6     11      0
2261     2     10      0
3114     4     51      2
3246     3     28      1
3562     6     27      1
3564     3     70      4
3566     36    322     21
3831     16    153     10
3921     3     18      2
7022     11    92      3
7035     9     29      1

Griffiths, Collins and Nargang (1995)

• No two strains have the same mitochondrial genotype

• Some strains are heteroploid


How is the variability relevant?

• Resequencing reveals multiple alleles at many loci

• Evaluating generated or natural mutants requires a background of natural or neutral variability


Allele variability among resequenced strains

• Depends on organism

• Neurospora has 9,730 genes

– Sequenced 18 strains (pilot study)

Potential range of allele variability

[Scale: 9,730 = all identical to reference genome; 175,000 = every allele unique; observed value marked “?”]
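
The two ends of this range follow directly from the gene and strain counts. The sketch below reproduces the arithmetic. The function name is illustrative, and the slide's 175,000 is taken to be the exact product rounded.

```python
def allele_count_bounds(n_genes, n_strains):
    """Return (minimum, maximum) number of distinct alleles across all genes.

    Minimum: every strain matches the reference, so one allele per gene.
    Maximum: every strain carries a unique allele at every gene.
    """
    if n_genes < 1 or n_strains < 1:
        raise ValueError("need at least one gene and one strain")
    return n_genes, n_genes * n_strains


low, high = allele_count_bounds(n_genes=9_730, n_strains=18)
print(low, high)  # 9730 175140
```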


Allele variability: Neurospora example

• All strains had SNPs and indels

• >400 ORFs had no SNPs

• All strains had nonsense mutations

[Figure: Nonsense SNPs per strain; y-axis scale: 0 to 150.]

Strain   Total SNPs
309      13274
7035     18487
1211     20493
3246     21533
3831     22961
106      23579
3566     37516
3114     41085
2261     44839
3564     47981
1303     59356
7022     78991
3921     80311
305      90195
3562     106533
322      142489
1363     146641
821      188346
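
To make the spread in this table concrete, the sketch below copies the per-strain totals and reports the least and most divergent strains, the median, and the fold difference between the extremes. The variable names are illustrative.

```python
from statistics import median

# Total SNPs per strain (FGSC number -> count), copied from the table above.
TOTAL_SNPS = {
    309: 13274, 7035: 18487, 1211: 20493, 3246: 21533, 3831: 22961,
    106: 23579, 3566: 37516, 3114: 41085, 2261: 44839, 3564: 47981,
    1303: 59356, 7022: 78991, 3921: 80311, 305: 90195, 3562: 106533,
    322: 142489, 1363: 146641, 821: 188346,
}

least = min(TOTAL_SNPS, key=TOTAL_SNPS.get)
most = max(TOTAL_SNPS, key=TOTAL_SNPS.get)
fold = TOTAL_SNPS[most] / TOTAL_SNPS[least]
mid = median(TOTAL_SNPS.values())

print(f"fewest SNPs: strain {least} ({TOTAL_SNPS[least]})")
print(f"most SNPs:   strain {most} ({TOTAL_SNPS[most]})")
print(f"median: {mid:.0f}, fold difference: {fold:.1f}")
```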

[Figure: SNPs per strain; y-axis scale: 0 to 80000.]


Allele variability among resequenced strains

• 18 Laboratory mutant strains

– two lineages

[Scale: 10,000 = all identical to reference genome; 180,000 = every allele unique; observed: 33,172 alleles among 18 strains]
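
A rough way to read the observed figure is as an average number of alleles per gene and as a position on the scale. The sketch assumes that 33,172 counts distinct alleles summed over the 9,730 genes quoted for Neurospora, and it uses the exact bounds rather than the rounded 10,000 and 180,000. The names are illustrative.

```python
N_GENES = 9_730
N_STRAINS = 18
OBSERVED_ALLELES = 33_172


def position_in_range(observed, n_genes, n_strains):
    """Fraction of the way from 'all identical' to 'every allele unique'."""
    low = n_genes               # one allele per gene
    high = n_genes * n_strains  # a unique allele per gene per strain
    return (observed - low) / (high - low)


mean_alleles_per_gene = OBSERVED_ALLELES / N_GENES
fraction = position_in_range(OBSERVED_ALLELES, N_GENES, N_STRAINS)

print(f"{mean_alleles_per_gene:.2f} alleles per gene on average")
print(f"{fraction:.1%} of the way along the scale")
```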


Lineage dictates number of unique alleles

• Strains crossed with reference genome strain have fewest novel alleles

[Figure: Number of novel alleles. x-axis: Strain Number (FGSC), left to right 1363, 821, 322, 305, 3562, 3921, 7022, 2261, 1303, 3564, 106, 3566, 3246, 3114, 3831, 7035, 309, 1211, annotated from less related to more related; y-axis: Non-reference Alleles (0 to 7000). Key: 2489 = St. Lawrence/Oak Ridge; 1363 = Lindegren; 821 = Abbott.]

[Same figure, re-annotated left to right: Fewer Backcrosses to More Backcrosses.]

Culture Collections support metagenomics analyses

• Taxonomic diversity studies

– GEBA

– 1000 Fungal Genomes Project

– BGI

• Genetic diversity studies

– Pan genome

– Neurospora

– Dothideomycetes

– Eurotiomycetes


Acknowledgements

• US National Science Foundation grant 0235887 (FGSC)

– Mike Plamann, Aric Wiest

• US National Science Foundation grant 1203112 (RCN)

• US NIH award GM068087 (Jay Dunlap, PI)

– Knock-out collection

• DOE JGI Fungal Genomics Program

– Scott Baker, Igor Grigoriev, Wendy Schackwitz, Anna Lipzen, Joel Martin
