Microbial Agrogenomics 4/2/2015, UK-MX Workshop
Microbial Agrogenomics: Where can it lead us?
Leighton Pritchard, Information and Computational Sciences, The James Hutton Institute
Acceptable Use Policy
Recording of this talk, taking photos, discussing the content using email, Twitter, blogs, etc. is permitted (and encouraged), providing distraction to others during the presentation is minimised.
These slides will be available on SlideShare.
Table of Contents
Introduction
Why Genomics?
2003-Now
Implications
Where Next?
Conclusions
Centres of Expertise
http://www.hutton.ac.uk
• Dundee Effector Consortium (DEC, with University of Dundee) [link]
• Centre for Research on Potato and Other Solanaceous Plants (CRPS) [link]
• Centre for Human and Animal Pathogens in the Environment (HAP-E) [link]
Plant-Pathogen Interactions
Pathogens of barley (e.g. Rhynchosporium commune), and soft fruit (e.g. Raspberry Leaf Blotch Virus (RLBV))
Plant-Pathogen Interactions
Potato pathogens, pests, and vectors.
• soft-rot bacteria (Dickeya, Pectobacterium, Erwinia)
• blight (Phytophthora infestans)
• Potato Cyst Nematode (PCN) (Globodera)
• aphids (Myzus persicae)
Issue 1: Food Security
• Economic cost and burden of crop disease
  • P. infestans: €1bn Europe; $4bn global
• Societal impact (human health, commodity prices; farming)
• Emerging pathogens (JIT supply chain; climate change)
• Plant-associated human pathogens
• Food fraud
Issue 2: Environmental Sustainability
• Pesticide minimisation and withdrawal
• Durable resistance, soil-beneficial microbes, plant growth/nutritional enhancement
• Traditional breeding, GM, or engineering?
• Soils: rhizosphere interactions/soil diversity
• Farming practices (water run-off, rotation, equipment cleaning - EU sulphuric acid ban)
What Have Genomes Ever Done For Us?
• Catalogue features (genes, regulatory elements, etc.) in an organism.
Plant-microbe interactions
Gene products at the host-microbe interface
Dodds & Rathjen (2010) Nat. Rev. Genet. 11:539-548 doi:10.1038/nrg2812
Plant-Nematode Interactions
RNA-seq identification of 27 putative nematode effectors:
Small proteins, expressed in gland cells during feeding stage only.
Cotton et al. (2014) Genome Biol. 15:R43 doi:10.1186/gb-2014-15-3-r43
Plant Defence
Prediction of NB-LRR genes (sequence capture).
Jupe et al. (2013) Plant J. 76:530-544 doi:10.1111/tpj.12307
Jupe et al. (2012) BMC Genomics 13:75 doi:10.1186/1471-2164-13-75
What Have Genomes Ever Done For Us?
• Catalogue features (genes, regulatory elements, etc.) in an organism.
• If we have multiple genomes. . .
  • What common features associate with phenotype or environment?
Plant-microbe interactions
GWAS/QTLs/genotyping for plant breeding
http://ics.hutton.ac.uk/flapjack/
Milne et al. (2010) Bioinformatics 26:3133-3134 doi:10.1093/bioinformatics/btq580
Plant-microbe interactions
Structural changes to genomes: repeat-driven expansion
duplication, mutation, recombination, epigenetic control of effectors . . .
Haas et al. (2009) Nature 461:393-398 doi:10.1038/nature08358
Plant-microbe interactions
Structural changes to genomes: genome reductions
Buchnera, Serratia symbiotica - aphid symbionts, ‘random’ inactivation
Gil et al. (2002) Proc. Natl. Acad. Sci. USA 99:4454-4458 doi:10.1073/pnas.062067299
Burke & Moran (2011) Genome Biol. Evol. 3:195-208 doi:10.1093/gbe/evr002
Plant-microbe interactions
Lateral gene transfer (virulence-associated genes)
Bell et al. (2004) Proc. Natl. Acad. Sci. 101:11105-11110 doi:10.1073/pnas.0402424101
Plant-microbe interactions
Closely-related bacteria, different host/environmental preference.
Pectobacterium atrosepticum
Holden et al. (2009) FEMS Micro. Rev. 33:689-703 doi:10.1111/j.1574-6976.2008.00153.x
Toth et al. (2006) Annu. Rev. Phytopath. 44:305-336 doi:10.1146/annurev.phyto.44.070505.143444
What Have Genomes Ever Done For Us?
• Catalogue features (genes, regulatory elements, etc.) in an organism.
• If we have multiple genomes. . .
  • What common features associate with phenotype or environment?
  • Epidemiology: spread and transmission
Historical Origins
Retracing 19th-century P. infestans pandemics
Yoshida et al. (2014) PLoS Pathog. 10:e1004028 doi:10.1371/journal.ppat.1004028
International Emergence
Distribution of Dickeya spp. in Europe
Map key: • D. dianthicola; ◦ D. solani; third marker: Dickeya spp. on potato
Toth et al. (2011) Plant Pathol. 60:385-399 doi:10.1111/j.1365-3059.2011.02427.x
Host Jumps
Movement of Dickeya from ornamental to crop plants
Parkinson et al. (2015) Eur. J. Plant Pathol. 141:63-70 doi:10.1007/s10658-014-0523-5
Diagnostic Tools
Quarantine and legislation require precise identification.
Genomes enable rapid, robust RT-PCR diagnostics.
[Figure: primer targets (I-V) plotted against genomes (I-V)]
https://github.com/widdowquinn/find_differential_primers
Pritchard et al. (2013) Plant Pathol. 62:587-596 doi:10.1111/j.1365-3059.2012.02678.x
Pritchard et al. (2012) PLoS One 7:e34498 doi:10.1371/journal.pone.0034498
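The selection logic behind differential diagnostic primers can be sketched as a set comparison: keep only primer sets that amplify every genome in the target group and none outside it. This is a minimal illustration, not the find_differential_primers implementation; the function name, the `primer_hits` mapping, and the example data are all hypothetical, and it assumes in silico amplification results are already available.

```python
def differential_primers(primer_hits, target_genomes, all_genomes):
    """Return primer sets that hit every target genome and no other genome.

    primer_hits maps a primer-set name to the set of genomes it is
    predicted to amplify (hypothetical, precomputed input).
    """
    targets = set(target_genomes)
    off_targets = set(all_genomes) - targets
    selected = []
    for name, hits in sorted(primer_hits.items()):
        hits = set(hits)
        # Must cover all targets, and must not touch any off-target genome.
        if targets <= hits and not (hits & off_targets):
            selected.append(name)
    return selected


# Hypothetical example: five genomes, four candidate primer sets.
genomes = ["I", "II", "III", "IV", "V"]
primer_hits = {
    "primer_A": {"I", "II"},
    "primer_B": {"I", "II", "III"},
    "primer_C": {"I"},
    "primer_D": {"IV", "V"},
}
print(differential_primers(primer_hits, ["I", "II"], genomes))
```

With genomes I and II as the target group, only `primer_A` qualifies: `primer_B` also amplifies genome III, and `primer_C` misses genome II.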
2003: E. carotovora subsp. atroseptica
• £250k collaboration between SCRI, University of Cambridge, WT Sanger Institute
• Single isolate: E. carotovora subsp. atroseptica SCRI1043
• First sequenced enterobacterial plant pathogen (32 authors!)
• Annotation: 6 people, for 6 months ≈ three person-years
• Result: single, complete 5Mbp circular chromosome (10.2X)
Bell et al. (2004) Proc. Natl. Acad. Sci. USA 101(30):11105-11110 doi:10.1073/pnas.0402424101
2003: E. carotovora subsp. atroseptica
Compared against all 142 then-available bacterial genomes
Bell et al. (2004) Proc. Natl. Acad. Sci. USA 101(30):11105-11110 doi:10.1073/pnas.0402424101
2013: Dickeya spp.
Sequenced and annotated 25 isolates of Dickeya over two years
• Multiple sequencing methods: 454, Illumina (SE, PE)
• Automated annotation, limited manual correction
• Results: 12-237 fragments: 4.2-5.1Mbp/genome (6-84X)
Pritchard et al. (2013) Genome Ann. 1 (4) doi:10.1128/genomeA.00087-12
Pritchard et al. (2013) Genome Ann. 1 (6) doi:10.1128/genomeA.00978-13
2013: Dickeya spp.
Whole genome-based species definitions: sp. nov. D. solani
van der Wolf et al. (2014) Int. J. Syst. Evol. Micr. 64:768-774 doi:10.1099/ijs.0.052944-0
2013: Dickeya spp.
Differences in metabolic capacity (but ≈ 20% orphan EC activities)
2014: E. coli
Sequenced and annotated ≈ 190 isolates of E. coli
All bacteria environmental, sampled from lysimeters
• Illumina PE sequencing, cost ≈£11k
• Automated annotation: PROKKA
(w/ Fiona Brennan, Florence Abram, NUI Galway)
2014: E. coli
Whole genome-based subspecies classification
[Figure: clustered heatmap of pairwise ANIm values between the sequenced isolates (Lys, AW and numbered isolates) and reference genomes (Brunei, Muenster, Senftenberg, DSM), with colour key and histogram of counts over ANIm values 0.90-0.98; group legend: A, B1, B2, C, D, E, F, U, X]
(w/ Fiona Brennan, Florence Abram, NUI Galway)
2014: Campylobacter spp.
≈1034 clinical, animal, food-associated Campylobacter isolates
• Illumina PE sequencing, cost ≈£60k
• Automated annotation: PRODIGAL
(w/ Ken Forbes, Norval Strachan, University of Aberdeen)
2014: Campylobacter spp.
• 15554 ‘gene families’ in 1034 isolates.
• Calculation: 4e12 pairwise protein comparisons!
(w/ Ken Forbes, Norval Strachan, University of Aberdeen)
So what’s changed?
Everything.
• Cost: £250k → £60 per genome.
  Now cheaper to sequence than analyse a genome!
  Offload work from people to software.
• Location: sequencing centre, to benchtop (Nanopore!)
• Speed: sequencing run time can be less than a day
• Data: massive volume increase
Predicting the future is hard. . .
Su et al. attempted to do it, though:
10,000 prokaryotes in 2015 was an underestimate.
http://sulab.org/2013/06/sequenced-genomes-per-year/
So what’s changed?
Everything.
• Cost: £250k → £60 per genome.
• Location: sequencing centre, to benchtop (Nanopore!)
• Speed: sequencing run time can be less than a day
• Data: massive volume increase
  More data ≈ better, but also more challenging.
• Software: more (≠ better. . .) software for more things
• New experiments: genomes, exomes, variant calling, methylated sequences, STARR-seq, . . .
• New applications: diagnostics, epidemic tracking, metagenomics, . . .
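The per-genome cost can be checked against the project figures quoted earlier in the talk (≈£11k for ≈190 E. coli isolates; ≈£60k for ≈1034 Campylobacter isolates). A quick sketch of that arithmetic follows; the function name is illustrative, all inputs are the approximate values from the slides, and the 2003 figure covered a whole collaboration, so the fold-change is only indicative.

```python
def per_genome_cost(total_cost_gbp, n_genomes):
    """Average cost per genome in pounds sterling."""
    return total_cost_gbp / n_genomes


# Approximate project totals and isolate counts, as quoted in the slides.
projects = {
    "E. coli (2014)": (11_000, 190),
    "Campylobacter (2014)": (60_000, 1034),
}

for name, (total, n_genomes) in projects.items():
    print(f"{name}: about £{per_genome_cost(total, n_genomes):.0f} per genome")

# Indicative fold-change from the single £250k genome of 2003 to ~£60 each.
fold_change = 250_000 / 60
print(f"Roughly {fold_change:,.0f}-fold cheaper per genome")
```

Both 2014 projects come out at a little under £60 per genome, consistent with the figure on the slide.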
Sequence first. . . ask questions, later
• “Why?” has sometimes been replaced by “What?”
http://dilbert.com/strip/2000-01-03
“The thesis is not hypothesis driven. Add a hypothesis and refer to it in subsequent
chapters.”
More isn’t always better
Deeper sequencing (more reads) ≠ more information or better assembly.
60-80X coverage the ‘sweet spot’ for bacterial genomes.
More reps > more reads!
Conway & Bromage (2011) Bioinformatics 27:479-486 doi:10.1093/bioinformatics/btq697
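Mean coverage is the total number of sequenced bases divided by the genome size, so the 60-80X range translates directly into a read budget. A minimal sketch, assuming a 5 Mbp genome (the size quoted for the 2003 genome) and 150 bp reads; the read length and the function names are assumptions, not taken from the talk.

```python
import math


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Mean depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_coverage(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


GENOME_SIZE_BP = 5_000_000  # ~5 Mbp bacterial chromosome
READ_LENGTH_BP = 150        # assumed read length

for coverage in (60, 80):
    n_reads = reads_for_coverage(coverage, READ_LENGTH_BP, GENOME_SIZE_BP)
    print(f"{coverage}X needs about {n_reads:,} reads")
```

Under these assumptions the sweet spot is roughly 2.0 to 2.7 million reads per isolate; reads beyond that are arguably better spent on additional replicates or isolates.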
Are database annotations reliable?
Automated annotation is essential
The Critical Assessment of Function Annotation (CAFA) project.
Radivojac et al. (2013) Nat. Meth. 10:221-227 doi:10.1038/nmeth.2340
Do biased database annotations matter?
Experimental annotations of proteins are incomplete. Is that important?
Tested by simulation, and following databases for three years.
• Yes. It matters.
• Current large scale annotations are meaningful and almost surprisingly reliable.
• The nature and level of data incompleteness, and type of classification model have an effect.
• “Low precision, high recall” (i.e. less discriminating) tools most significantly affected.
Molecular function prediction is usually more reliable than biological process prediction
Jiang et al. (2014) Bioinformatics 30:i609-i616 doi:10.1093/bioinformatics/btu472
Cozzetto et al. (2013) BMC Bioinf. 14:S3-S1 doi:10.1186/1471-2105-14-S3-S1
CAFA results
The Critical Assessment of Function Annotation (CAFA) 2013 results. (F-measure combines precision and recall)
• You can do better than BLAST.
• Best-performing methods do comparably well.
• Best methods used evolutionary relationships, structure, and expression data.
• Machine Learning methods work best.
Radivojac et al. (2013) Nat. Meth. 10:221-227 doi:10.1038/nmeth.2340
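For reference, the F-measure mentioned above is the harmonic mean of precision and recall; CAFA summarises each method by the best F-measure across score thresholds. The sketch below shows the calculation only: the operating points are hypothetical, not CAFA results, and the function names are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_max(precision_recall_pairs):
    """Best F-measure over a set of (precision, recall) operating points."""
    return max(f_measure(p, r) for p, r in precision_recall_pairs)


# Hypothetical operating points for one method at three score thresholds.
points = [(0.9, 0.1), (0.6, 0.5), (0.2, 0.9)]
print([round(f_measure(p, r), 3) for p, r in points])
print(round(f_max(points), 3))
```

Because the harmonic mean is dominated by the smaller value, a tool with very high recall but low precision (or the reverse) still scores poorly.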
More Isn’t Always Better
Statistical inference on large datasets requires extra care.
Hypothesis tests may incorrectly reject null hypotheses (B-H)
More Isn’t Always Better
• More tests → random effect seems ‘real’
• May be considering a large set of inferences simultaneously (and yet not notice!): “p-hacking”, “Researcher Degrees of Freedom”
“good scientists are skilled at looking hard enough and subsequently coming up with good stories (plausible even to themselves, as well as to their colleagues and peer reviewers) to back up any statistically-significant comparisons they happen to come up with.” Gelman & Loken (2013) “The Garden of Forking Paths”
(“Data-dredging”)
True for all large data analyses: genomics, metabolomics, proteomics, health screening, finding terrorists, etc.
Xia et al. (2012) Metabolomics 9:280-299 doi:10.1007/s11306-012-0482-9
Broadhurst & Kell (2006) Metabolomics 2:171-196 doi:10.1007/s11306-006-0037-z
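One common guard against the multiple-testing problem is the Benjamini-Hochberg (B-H) procedure, which controls the false discovery rate. The sketch below is a plain-Python version of the standard step-up rule; the p-values are invented for illustration and the function name is an assumption.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected at FDR alpha."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Invented p-values from ten tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive = sum(p < 0.05 for p in p_values)
corrected = sum(benjamini_hochberg(p_values, alpha=0.05))
print(f"Uncorrected p < 0.05: {naive} rejections")
print(f"Benjamini-Hochberg at FDR 0.05: {corrected} rejections")
```

In this invented example five tests pass an uncorrected threshold of 0.05, but only two survive the correction.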
Genome-Scale Predictions
• Imagine a paper describing a predictor for protein functional class (e.g. pathogen effector)
• The paper reports sensitivity = 0.95, FPR = 0.01
• We run the predictor on 20,000 proteins in an organism
• It predicts 130 members of the class. How many of them are likely to be true positives?
• We need a baseline level of that class (fX) in the genome to determine this.
• Estimate ≈ 200 in gene complement, so fX = 0.01
  • fX = 0.01 ⇒ P(class|+ve) = 0.490 ≈ 0.5: 65 TP
Pritchard & Broadhurst (2014) Meth. Mol. Biol. 9:280-299 doi:10.1007/978-1-62703-986-4_4
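The figure P(class|+ve) ≈ 0.49 follows from Bayes' theorem applied to the reported sensitivity, false positive rate, and the estimated base rate fX. A short sketch of the calculation using the values on the slide; the function name is illustrative.

```python
def posterior_positive(sensitivity, fpr, base_rate):
    """P(class | positive prediction), by Bayes' theorem."""
    true_pos = sensitivity * base_rate   # P(+ve and in class)
    false_pos = fpr * (1 - base_rate)    # P(+ve and not in class)
    return true_pos / (true_pos + false_pos)


# Values from the slide: sensitivity 0.95, FPR 0.01, base rate fX = 0.01.
p = posterior_positive(sensitivity=0.95, fpr=0.01, base_rate=0.01)
print(f"P(class | +ve) = {p:.3f}")
print(f"Expected true positives among 130 predictions: {130 * p:.1f}")
```

The posterior is about 0.490, so roughly 64 of the 130 predictions are expected to be true positives; rounding the posterior to 0.5 gives the 65 quoted on the slide. Even with an apparently excellent predictor, about half of the positive calls are wrong, because true class members are rare.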
Baserate Fallacy
http://bit.ly/1EFbzCI
http://armchairbiology.blogspot.co.uk/2014/07/the-baserate-fallacy-revisited.html
A Literature Example
Reported sensitivity ≈ 0.71, FPR ≈ 0.15
Arnold et al. (2009) PLoS Pathog. 5:e1000376 doi:10.1371/journal.ppat.1000376
Big Data: New Problems
• Lots of high throughput experiments, and large datasets (but even more small datasets)
• Historically ill-formed data (sequences in Word documents,BLAST results pasted into notebooks).
• How do we connect all this data in a productive way?
This section influenced heavily by C. Titus Brown and Philip Bourne
Big Data: New Problems
• Data management. Too often: “Goodbye to the student is goodbye to the data”
• Persistence of data resources (link rot, database entropy)
http://www.phdcomics.com/comics/archive.php?comicid=382
Big Data: New Problems
• How reproducible are computational results?
• Software/data versions prevent exact reproduction: approximately 280h to reproduce one paper - in the same lab!
Garijo et al. (2013) PLoS One doi:10.1371/journal.pone.0080278
http://www.slideshare.net/pebourne/sib0114
Big Data: New Problems
Maybe we can get away with all of this in a traditional model of science publishing. . .
http://www.slideshare.net/c.titus.brown/2015-baltiandbioinformatics
Big Data: New Problems
. . .but lots of biological data doesn’t make sense except in the light of other biological data.
http://www.slideshare.net/c.titus.brown/2015-baltiandbioinformatics
Big Data: New Solutions
Everyone could be better off with collaboration and data sharing.
What is winning: career progression, or feeding people?
(still competing, but on analysis and insight, not on who holds what data. . .)
http://www.slideshare.net/c.titus.brown/2015-baltiandbioinformatics
Big Data: New Solutions
Data quality ≈ data trust:
• Sustainable: storage, archiving, maintenance
• Findable: “where is the dataset?”, “is it available?”
• Queryable: “is X in the dataset?”
• Analysable: metadata, annotation
http://www.slideshare.net/pebourne/sib0114
Big Data: New Solutions
Interoperable digital assets: datasets, software, lab books, etc.
• Uniquely identified (DOI, PMID, etc.)
• Provenance (version and access control)
• Open standards - what data to keep, how to organise it: MINSEQE (sequencing), MIAME (microarray), MIASE (simulation), MIAPE (proteomics), MIARE (RNAi), SBML, GFF3, SAM/BAM/CRAM, etc.
• Sustainable infrastructure for biological information(ELIXIR, “The Commons” [US], RDF, Open Data)
http://www.slideshare.net/pebourne/sib0114
https://pebourne.wordpress.com/2014/10/07/the-commons/
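As a small, concrete instance of provenance tracking, an analysis script can record a checksum for each input file together with the software version it ran under. The sketch below uses only the Python standard library; the record layout, function names, and the demonstration file are assumptions for illustration, not a community standard.

```python
import hashlib
import os
import platform
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(paths):
    """Collect file checksums and the interpreter version for a lab-book entry."""
    return {
        "python_version": platform.python_version(),
        "files": {os.path.basename(p): sha256_of_file(p) for p in paths},
    }


# Demonstration on a throwaway file (hypothetical content, not real data).
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo_contigs.fasta")
    with open(demo_path, "wb") as handle:
        handle.write(b">contig1\nACGT\n")
    record = provenance_record([demo_path])

print(record)
```

Storing such a record alongside results makes it possible to tell later whether an input file or the software environment has changed.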
Big Data: New Solutions
Too much software is difficult to use for experts, or unusable for non-experts.
Veretnik et al. (2008) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1000136
Big Data: New Solutions
Workflows, pipelines, and service integrative frameworks
Cock et al. (2014) Methods Mol. Biol. 1127:3-15 doi:10.1007/978-1-62703-986-4_1
Cock et al. (2013) PeerJ 1:e167 doi:10.7717/peerj.167
http://galaxy-community.org.uk/
Big Data: New Solutions
Sometimes new software is needed.
Writing good software is difficult, and expensive.
http://www.theregister.co.uk/2015/01/22/us_military_finds_f35_software_is_a_buggy_mess/
Big Data: New Solutions
Not enough software engineers to go round: train what we have.
Programming literacy, computational thinking: versioned, readable, maintainable code.
http://www.software.ac.uk/
http://software-carpentry.org/
http://datacarpentry.org/
Cheap Sequencing In The Field
Diagnostics and epidemic tracking by sequencing
Global Microbial Identifier (GMI) http://www.globalmicrobialidentifier.org:
Global system of databases for microbial/disease identification and diagnostics.
Quick et al. (2014) BMJ Open 11:e006278 doi:10.1136/bmjopen-2014-006278
Sequencing In The Field
Live prediction for epidemiology?
(Peter Skelsey, JHI)
Sequence Isn’t Everything
Organisms are dynamic, and multi-scale
• Context: epigenetics, tissue differentiation, mesoscale systems, symbiosis, etc.
• Phenotypic plasticity: responses to environment - stress, temperature, etc.
The Phytobiome
Phytobiome: the plant, and its associated microbial community
• American Phytopathological Society “Phytobiomes Initiative”
• “a complete systems approach that spans foundational to applied science focused on downstream application”
• We are not at war with all microbes. . .
https://www.apsnet.org/members/outreach/ppb/phytobiomes
Genomes Are Parts Lists
We know (some of) the bits that make up the machinery. . .
Flux Balance Analysis
Flux Balance Analysis: constraint-based static representation of metabolism (RNA/ChIP-seq adds dynamics to models)
• Set upper, lower bounds to reaction rate, define objective phenotype (biomass, target flux profile)
• in silico knockouts; viable states; nutrient usage
• A basis for synthetic biology and engineering
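The bullet points above correspond to the usual linear-programming statement of flux balance analysis. The notation below is conventional rather than taken from the slides: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the weights defining the objective (for example, the biomass reaction), and l and u the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j
\end{aligned}
```

The constraint S v = 0 is the steady-state assumption that no metabolite accumulates. In this formulation an in silico knockout of reaction j amounts to setting l_j = u_j = 0 and re-solving, and different nutrient conditions are expressed by changing the bounds on uptake reactions.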
Flux Balance Analysis
Dickeya: 29 × FBA, host range ≈ nutrient-dependent growth
also transposon mutant libraries
(w/ Sonia Humphris, Ian Toth, JHI)
Plant-Microbe Interactions Are Systems
Components, interactions, dynamics etc. = systems biology
Interaction creates a third system from host and microbe
Pritchard & Birch (2014) Mol. Plant. Pathol. 15:865-870 doi:10.1111/mpp.12210
Pritchard & Birch (2011) Plant Sci. 180:584-603 doi:10.1016/j.plantsci.2010.12.008
Plant-Microbe Interactions Are Systems
[Figure: schematic of the host-microbe interaction model (microbe bulk/local, PAMP, PRR/PRR*, effector external/internalised, effector translocation, R protein/R protein*, callose, cell wall), with simulated pathogen and callose timecourses (arbitrary units, time 0-200) for four host types: No Response, PTI, PTI+ETS, PTI+ETS+ETI]
Integrate Models and Data
Integration of models and datasets still a challenge
• Models at different scales
• Kinetic, metabolomic, proteomic, transcriptomic, genomic datasets
Hartmann & Schreiber (2014) Front. Bioeng. Biotechnol. 8:226-244 doi:10.3389/fbioe.2014.00091
Types of Model
• Combining data: models at different scales.
• Information required/produced depends on model type.
• Size/detail trade-off
Hartmann & Schreiber (2014) Front. Bioeng. Biotechnol. 8:226-244 doi:10.3389/fbioe.2014.00091/abstract
Synthetic Biology
Engineering new response modes into crops.
Gurr & Rushton (2005) Trends Biotech. 23:283-290 doi:10.1016/j.tibtech.2005.04.009
Genome Editing
TALENs and CRISPR/Cas9s
http://www.lifetechnologies.com/
http://www.umassmed.edu/xuelab
Trait Stacking
For resistance and other beneficial traits (yield, nutrients, biofuels)
Vanholme et al. (2010) Trends Biotechnol. 28:543-547 doi:10.1016/j.tibtech.2010.07.008
Engineering Soil-Beneficial Microbes
Refactoring of Klebsiella nitrogen fixation:
Temme et al. (2012) Proc. Natl. Acad. Sci. USA 109:7085-7090 doi:10.1073/pnas.1120788109
Engineering New Biology
dCas9 logic circuits, integrating with host regulation
Nielsen & Voigt (2014) Mol. Syst. Biol. 10:763 doi:10.15252/msb.20145735
Data
Sequencing is ever cheaper and more productive:
• Very large datasets
• More information (with good planning)
• Challenges for data storage and sharing
• Challenges for analysis (“why” vs. “what”)
• Challenges for software, accessibility (workflows, multidisciplinary training)
• Interdisciplinary collaboration and data integration will be essential
Systems/Synthetics
A parts list only gets us so far:
• Cells are dynamic biophysical systems
• Organisms are dynamic cellular systems
• ‘Real’ plant systems include the phytobiome
• Systems biology essential to understand plant-microbe interactions
• Synthetic biology promises to be a powerful tool to improve plant health, nutrition, etc.
• BUT: ethical issues around deployment of synthetic systems
Acknowledgements
James Hutton Institute: Paul Birch, Emma Campbell, Peter Cock, Ingo Hein, Nicola Holden, Sonia Humphris, Florian Jupe, Ian Toth
NUI Galway: Florence Abram, Fiona Brennan
University of Aberdeen: Ken Forbes, Norval Strachan
University of Alberta: David Broadhurst
SASA: Vincent Mulholland, Gerry Saddler
Fera: Valerie Bertrand, John Elphinstone, Rachel Glover, Neil Parkinson
University of Munster: Martina Bielaszewska, Helge Karch
University of Salford: Natalie Ferry, Ryan Joynson
And many others!