Biodiversity initiative:
Integrating Taxonomy, Genomics and Biodiversity
Speaker: Benjamin Linard
Alfried Vogler Team
Arthropods metagenomics 1 / 8
1 SAMPLE: 480 beetle specimens captured in Borneo
Mixed sample: all 480 beetles, est. 288 species
Workflow (diagram labels):
• DNA extraction
• PCR barcodes
• Pool DNA by volume
• 1x Illumina MiSeq run (8.5 Gb)
• De novo assembly into contigs
• Mitochondrial contigs
Mitochondrial DNA 2 / 8
Harpalinae
Chrysomelidae
Coccinellidae
Curculionidae
Tenebrionidae
Buprestidae
Log(Biomass) = -2.37 + 0.85 × Log(No. reads); P < 0.001; F(1,84) = 73.32; R² = 0.47
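As a rough illustration of how the fitted line above could be used, the sketch below predicts log biomass from a read count. The slide does not state the log base or the biomass unit; base 10 is assumed here, and the function name is hypothetical.

```python
import math

# Coefficients as shown on the slide.
INTERCEPT = -2.37
SLOPE = 0.85


def predict_log_biomass(n_reads: float, base: float = 10.0) -> float:
    """Return the predicted log(biomass) for a given number of reads.

    Assumption: the slide does not state the log base; base 10 is a guess.
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return INTERCEPT + SLOPE * math.log(n_reads, base)


if __name__ == "__main__":
    for reads in (100, 1_000, 10_000):
        print(reads, round(predict_log_biomass(reads), 2))
```

With R² = 0.47, read counts explain roughly half of the variance in log biomass, so any single prediction of this kind is coarse.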
~5% beetle mitochondrial DNA
Shotgun output
No. reads                               33,796,432
Est. proportion mitochondrial reads     4.94%
Complete mitogenomes                    35
Partial mitogenomes > 10 kb             85
Partial mitogenomes 2-10 kb             420
Results of Alex Crampton-Platt
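To make the shotgun figures concrete, the short sketch below converts the estimated mitochondrial proportion into a read count and totals the recovered mitogenome contigs. Only the numbers come from the slide; the variable and function names are illustrative.

```python
TOTAL_READS = 33_796_432   # "No. reads" on the slide
MITO_FRACTION = 0.0494     # "Est. proportion mitochondrial reads" (4.94%)

# Mitogenome contigs by size class, as listed on the slide.
MITOGENOMES = {"complete": 35, "partial > 10 kb": 85, "partial 2-10 kb": 420}


def estimated_mito_reads(total_reads: int, mito_fraction: float) -> int:
    """Estimated number of mitochondrial reads, rounded to a whole read."""
    return round(total_reads * mito_fraction)


mito_reads = estimated_mito_reads(TOTAL_READS, MITO_FRACTION)
other_reads = TOTAL_READS - mito_reads
total_mitogenomes = sum(MITOGENOMES.values())

print(f"{mito_reads:,} mitochondrial reads, {other_reads:,} other reads")
print(f"{total_mitogenomes} mitogenome contigs of 2 kb or more")
```

This works out to roughly 1.67 million mitochondrial reads and 540 mitogenome contigs in total.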
Genomic information ? 3 / 8
~95% genomic information, ~45% is Coleoptera DNA
~5% beetle mitochondria: Taxonomy, Abundance
Genomic analyses: Functional information?
[Figure: Homologous DNA between 4 beetle metagenomic samples. Tribolium castaneum, chromosome 3. X axis: position. Y axis: # homologous contigs. Legend: homologous contig; % GC; chromosome region with known sequence; NNNN region (unresolved sequence).]
Arthropods metagenomics
Computational requirements to analyse 1 arthropod soup:
Server: ~128 GB RAM, 24 cores, Xeon 2.4 GHz
Assemblies:
Type             RAM (GB)    Time (6 cores)    Disk (GB)
Mitochondrial    < 10        < 12 hours        < 30
Total DNA        ~ 100       ~ 5 days          ~ 300-500
(in the best case... when data complexity is manageable by current algorithms)
Our last DNA assembly
One assembly at a time, unpredictable risk of memory overload
Several assemblies (~1.5 per week)
[Chart legend: Successful / Aborted]
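The "one assembly at a time" constraint follows from the figures above, and can be checked with a small back-of-the-envelope sketch. It assumes that RAM and cores are the only limits (disk and I/O are ignored) and treats the "<" values as upper bounds; the class and function names are hypothetical.

```python
from dataclasses import dataclass

SERVER_RAM_GB = 128
SERVER_CORES = 24


@dataclass(frozen=True)
class Job:
    name: str
    ram_gb: float   # peak RAM per job
    cores: int      # cores used per job
    days: float     # wall-clock time per job


def max_concurrent(job: Job) -> int:
    """Jobs that fit on the server at once, limited by RAM and by cores."""
    by_ram = int(SERVER_RAM_GB // job.ram_gb)
    by_cores = SERVER_CORES // job.cores
    return min(by_ram, by_cores)


def per_week(job: Job) -> float:
    """Best-case number of completed jobs per week."""
    return max_concurrent(job) * 7 / job.days


MITO = Job("Mitochondrial", ram_gb=10, cores=6, days=0.5)
TOTAL = Job("Total DNA", ram_gb=100, cores=6, days=5)

for job in (MITO, TOTAL):
    print(job.name, max_concurrent(job), round(per_week(job), 1))
```

Under these assumptions a total DNA assembly is RAM-bound to a single job, which gives about 1.4 assemblies per week, in line with the "~1.5 per week" on the slide. Mitochondrial assemblies are limited by cores instead.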
4 / 8
Genomic analyses:
Type                     RAM (GB)    Time (6 cores)    Disk (GB)
Homology / alignments    < 2         ~ 5 days          < 10
Statistics / graphs      < 2         ~ 1 day           < 2
Need support of an SQL database.
Currently ~300 GB
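Adding up the assembly and analysis figures gives a rough per-soup budget. The sketch below assumes the four stages run one after another, takes every "<" value as an upper bound, and counts the ~300 GB database in full; none of these assumptions is stated on the slides.

```python
# (stage, days, disk_low_gb, disk_high_gb), values taken from the two tables.
STAGES = [
    ("Mitochondrial assembly", 0.5, 30, 30),
    ("Total DNA assembly", 5.0, 300, 500),
    ("Homology / alignments", 5.0, 10, 10),
    ("Statistics / graphs", 1.0, 2, 2),
]
DATABASE_GB = 300  # current size of the supporting SQL database


def soup_budget(stages, database_gb):
    """Return (total days, low disk estimate, high disk estimate) in GB."""
    days = sum(stage[1] for stage in stages)
    disk_low = sum(stage[2] for stage in stages) + database_gb
    disk_high = sum(stage[3] for stage in stages) + database_gb
    return days, disk_low, disk_high


days, disk_low, disk_high = soup_budget(STAGES, DATABASE_GB)
print(f"~{days} days, ~{disk_low}-{disk_high} GB of disk")
```

If the database is shared between soups, its 300 GB should be counted once rather than per soup.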
5 / 8
Future? For the analysis of 1 arthropod soup:
• ~1 000 arthropod transcriptomes/genomes
• ~50 beetle species transcriptomes
• ~50 beetle draft/complete genomes
Long-term perspective: disk space consuming...
• more reference genomes
• larger/more complex databases
Growth of the analysis pipeline! More CPU to perform complete metagenomic analyses
• MIGS standard (D Field et al., 2008)
• MINIMESS standard (J Raes et al., 2007)
Biodiversity & functional analysis
Arthropods biodiversity
n traps per site (soil, canopy, ground…)
N plots
Mitochondrial analysis, not a problem
Full DNA analysis... Which computational power could we access? More computation to answer more interesting questions
Pooling DNA? We will lose metagenomic resolution... More complex assemblies
Many soup analyses...
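The scaling concern can be sketched numerically. The example below assumes one soup per trap, uses the per-assembly times quoted earlier in the talk (about 5 days for total DNA, under 12 hours for mitochondrial), and picks N = 10 plots with n = 3 traps purely for illustration; the values of N and n and the function name are hypothetical.

```python
import math


def campaign_days(n_plots: int, traps_per_site: int,
                  days_per_soup: float, concurrent_jobs: int = 1):
    """Return (number of soups, best-case wall-clock days for all assemblies).

    Assumes one soup per trap and that jobs run in full batches.
    """
    soups = n_plots * traps_per_site
    batches = math.ceil(soups / concurrent_jobs)
    return soups, batches * days_per_soup


# Hypothetical campaign: 10 plots, 3 traps per site.
soups, total_dna_days = campaign_days(10, 3, days_per_soup=5, concurrent_jobs=1)
_, mito_days = campaign_days(10, 3, days_per_soup=0.5, concurrent_jobs=4)

print(f"{soups} soups: ~{total_dna_days} days total DNA, ~{mito_days} days mitochondrial")
```

For this hypothetical campaign the mitochondrial assemblies finish within days, while the total DNA assemblies take months on a single server. Pooling traps would reduce the number of soups, at the cost of resolution noted above.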
6 / 8
A source of DNA collection 7 / 8
Metadata (1 arthropod soup)
General: sampling location, date, methods...
Mitochondrial information: identified species/taxa, abundance, other identified species (plants, fungi...)
Genomic information: identified genes, identified functions (sugar degradation)
Soup metadata: 125 species, French Guiana
Soup metadata: 24 unidentified Tenebrionidae from Madagascar
Soup metadata: 53 species + abundance data
Soup metadata: 24 species, Congo
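One possible way to represent the metadata of a soup is a small record type with the three groups of fields listed above. This is only a sketch: the class name, field names and helper function are hypothetical, and the four example records carry only what the slide states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SoupMetadata:
    # General
    location: Optional[str] = None
    sampling_date: Optional[str] = None
    methods: List[str] = field(default_factory=list)
    # Mitochondrial information
    n_species: Optional[int] = None
    taxa: List[str] = field(default_factory=list)
    abundance: Dict[str, float] = field(default_factory=dict)
    other_species: List[str] = field(default_factory=list)  # plants, fungi...
    # Genomic information
    genes: List[str] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)
    notes: str = ""


# The four soups shown on the slide; unknown fields are left empty.
soups = [
    SoupMetadata(location="French Guiana", n_species=125),
    SoupMetadata(location="Madagascar", taxa=["Tenebrionidae"],
                 notes="24 unidentified Tenebrionidae"),
    SoupMetadata(n_species=53, notes="abundance data available"),
    SoupMetadata(location="Congo", n_species=24),
]


def soups_from(records: List[SoupMetadata], location: str) -> List[SoupMetadata]:
    """Return the soups sampled at the given location."""
    return [s for s in records if s.location == location]


print(soups_from(soups, "Congo"))
```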
[Diagram labels: Scientific community; in NHM; access; NHM data portal; KE EMu]
Integration, links, queries...
“I want all NHM collection data concerning the species X”
Data storage? NHM databases integration?
8 specimens, 4 images, 2 metagenomic samples ...
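A query of this kind could look like the following sketch, which uses an in-memory SQLite database. The table, its columns and the rows are invented for illustration and do not reflect the actual NHM schema; the toy rows are chosen only to reproduce the counts quoted on the slide.

```python
import sqlite3

# Hypothetical single-table schema linking every collection item to a species.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection_record ("
    " id INTEGER PRIMARY KEY,"
    " species TEXT NOT NULL,"
    " record_type TEXT NOT NULL)"
)

# Toy data, not real NHM records.
rows = (
    [("species X", "specimen")] * 8
    + [("species X", "image")] * 4
    + [("species X", "metagenomic sample")] * 2
    + [("species Y", "specimen")] * 3
)
conn.executemany(
    "INSERT INTO collection_record (species, record_type) VALUES (?, ?)", rows
)


def records_for_species(connection, species):
    """Count collection records of each type for one species."""
    cursor = connection.execute(
        "SELECT record_type, COUNT(*) FROM collection_record"
        " WHERE species = ? GROUP BY record_type",
        (species,),
    )
    return dict(cursor.fetchall())


summary = records_for_species(conn, "species X")
print(summary)
```

In practice the specimens, images and metagenomic samples would probably live in separate systems, which is where the integration question comes in.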
Thank you for your attention.
BEETLE SOUP, your daily source of DNA!