Assembly: before and after

Assembly – before and after, Lex Nederbragt, @lexnederbragt

A talk I gave at the Dec 2013 Assembly Masterclass at UC Davis. Really licensed under CC0. UPDATED May 2014, for the presentation I gave at the combined SeRC Nordic Assembly Workshop in Stockholm, Sweden, May 14th 2014

Transcript of Assembly: before and after

Page 1: Assembly: before and after

Assembly – before and after

Lex Nederbragt

@lexnederbragt

Page 2: Assembly: before and after

A warning

The list is by no means complete

Nor do we have experience with all the programs mentioned

Page 3: Assembly: before and after

Sample

DNA

Reads

Genome assembly

DNA isolation

Sequencing

Assembly

QC QC QC

Page 4: Assembly: before and after

Reads

Genome

assembly

Assembly

QC

Page 5: Assembly: before and after

Fastqc

Page 6: Assembly: before and after

Prinseq

Page 7: Assembly: before and after

Many others…

www.nipgr.res.in/ngsqctoolkit.html

Page 8: Assembly: before and after

preqc (sga)

http://arxiv.org/abs/1307.8026

Page 9: Assembly: before and after

Reads

Genome

assembly

Assembly

Grooming

Page 10: Assembly: before and after

Format conversion

http://en.wikipedia.org/wiki/FASTQ_format

Fastq format hell
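One common source of the "format hell" above is the quality offset: older Illumina pipelines (1.3 to 1.7) wrote Phred scores with an ASCII offset of 64, while Sanger-style FASTQ uses 33. A minimal sketch of the re-encoding, assuming the input really is Phred+64 (the function name is made up for illustration, and Solexa-scaled scores are not handled):

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33.

    Assumption: the input uses the Illumina 1.3-1.7 encoding (offset 64).
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64          # decode with the old offset
        if score < 0:
            # A negative score means the string was not Phred+64 after all.
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        converted.append(chr(score + 33))  # encode with the Sanger offset
    return "".join(converted)


if __name__ == "__main__":
    print(phred64_to_phred33("hhhhB@"))
```

In practice an established converter is the safer choice; the point of the sketch is only that the sequence stays the same and each quality character shifts by 31.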

Page 11: Assembly: before and after

Adapter/quality trimming

http://www.biostars.org/p/53528/

Celera assembler

Overlap based trimming

Fastx Toolkit, Seqtk, PrinSeq, NGS QC Toolkit, Trimmomatic, BioPieces, Cutadapt, …
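To make the idea of quality trimming concrete, here is a minimal sketch of the simplest strategy: removing bases from the 3' end while their quality is below a threshold. It is not how any particular tool in the list works; the function name, the threshold of 20 and the Phred+33 offset are assumptions for illustration.

```python
def trim_3prime(seq: str, qual: str, min_quality: int = 20, offset: int = 33):
    """Drop trailing bases whose Phred score is below min_quality.

    Assumptions: qual is Phred-encoded with the given ASCII offset and has
    the same length as seq.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality must have the same length")
    end = len(seq)
    # Walk back from the 3' end until a base of sufficient quality is found.
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    print(trim_3prime("ACGTACGT", "IIIII###"))
```

Real trimmers usually use sliding windows or running sums rather than a single-base cutoff, and they also handle adapters.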

Page 12: Assembly: before and after

Mate pair splitting and orientation

150 – 600 bases

Illumina paired end reads

2 – 40 kilobases

Illumina mate pair reads

2 – 40 kilobases

454 mate pair reads

linker

Page 13: Assembly: before and after

Mate pair splitting and orientation

Illumina paired end reads

Illumina mate pair reads

454 mate pair reads

linker

junction junction

+ +

paired end reads ‘contamination’

Page 14: Assembly: before and after

Mate pair splitting and orientation

Illumina paired end reads

Illumina mate pair reads

454 mate pair reads

linker

junction junction

+ +

paired end reads ‘contamination’

Check what orientation your assembler expects for the reads!
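If the assembler expects inward-facing (forward-reverse) pairs and the mate pair reads face outward (reverse-forward), one way to convert is to reverse-complement both reads of each pair. A minimal sketch, assuming each read is a (sequence, quality) tuple; the function names are made up for illustration:

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_read(seq: str, qual: str):
    """Reverse-complement the bases and reverse the qualities to match."""
    return reverse_complement(seq), qual[::-1]


def rf_to_fr(read1, read2):
    """Turn an outward-facing pair into an inward-facing one.

    Assumption: both reads of the pair are in the same (outward) orientation.
    """
    return flip_read(*read1), flip_read(*read2)


if __name__ == "__main__":
    print(rf_to_fr(("AACGT", "IIII#"), ("GGGTT", "#IIII")))
```

Note that the quality string is reversed but not otherwise changed, so each score stays attached to its base.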

Page 15: Assembly: before and after

Reads

Genome

assembly

Assembly

Preparing

Page 16: Assembly: before and after

Error-correction

Stand-alone or built into assembler

Page 17: Assembly: before and after

Merging pairs

List from Torsten Seemann’s blog

http://thegenomefactory.blogspot.no/2012/11/tools-to-merge-overlapping-paired-end.html

COPE http://sourceforge.net/projects/coperead/

SeqPrep https://github.com/jstjohn/SeqPrep

FLASH http://www.cbcb.umd.edu/software/flash

fastq-join http://code.google.com/p/ea-utils/wiki/FastqJoin

PANDAseq https://github.com/neufeld/pandaseq

mergePairs.py http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py

Recent addition
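The principle behind merging overlapping pairs can be sketched in a few lines: reverse-complement the second read, look for the longest exact overlap between the end of read 1 and the start of that mate, and join them. This is a simplification for illustration and not the method of any tool listed above, which score mismatches and use base qualities; the function names and the default minimum overlap are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10):
    """Merge an inward-facing pair if the reads overlap exactly.

    Returns the merged fragment, or None when no overlap of at least
    min_overlap bases is found. Only exact matches are considered.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    mate = reverse_complement(read2)
    # Try the longest possible overlap first, then shrink.
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


if __name__ == "__main__":
    # A 12-base fragment sequenced with two 8-base reads.
    print(merge_pair("ACGTACGG", "TGAACCGT", min_overlap=4))
```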

Page 18: Assembly: before and after

Extend reads

http://140.116.235.124/~tliu/arf-pe/

Page 19: Assembly: before and after

Digital normalisation

http://arxiv.org/abs/1203.4802
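The core of digital normalisation fits in a short loop: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is below a coverage cutoff. The sketch below illustrates that idea with an exact in-memory counter rather than a memory-efficient probabilistic one, and it ignores reverse complements; the function names and parameter values are assumptions for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int):
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalisation(reads, k: int = 20, cutoff: int = 20):
    """Keep a read only while its median k-mer count is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate from
        # Estimate the coverage of this read from k-mers seen so far.
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads add to the counts
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = ["ACGTTGCA"] * 10 + ["GGGGCCCCAA"]
    print(len(digital_normalisation(reads, k=4, cutoff=3)))
```

Because only retained reads update the counts, highly covered regions stop accumulating reads once they reach the cutoff, while low-coverage regions are left untouched.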

Page 20: Assembly: before and after

Estimate kmer to use

preqc (SGA)

http://arxiv.org/abs/1307.8026

Page 21: Assembly: before and after

Reads

Genome

assembly

Assembly

What can the reads tell us about the genome

Page 22: Assembly: before and after

kmer-based

preqc (SGA)

Kmerspectrumanalyzer

http://arxiv.org/abs/1307.8026

Khmer from Titus
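What these k-mer based tools have in common is the k-mer spectrum: a histogram of how many distinct k-mers occur once, twice, and so on in the reads. A minimal sketch of computing it, assuming the reads fit in memory (the tools above use far more efficient counting; the function name is made up for illustration):

```python
from collections import Counter


def kmer_spectrum(reads, k: int) -> Counter:
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # Histogram of the counts themselves: multiplicity -> number of k-mers.
    return Counter(kmer_counts.values())


if __name__ == "__main__":
    spectrum = kmer_spectrum(["ACGTA", "ACGTC"], k=4)
    for multiplicity in sorted(spectrum):
        print(multiplicity, spectrum[multiplicity])
```

On real data, the k-mers that occur only once or twice are mostly sequencing errors, and the position of the main peak reflects the k-mer coverage of the genome.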

Page 23: Assembly: before and after

Reads

Genome

assembly

Assembly

This talk

Page 24: Assembly: before and after

Reads

Genome

assembly

Assembly

QC

Page 25: Assembly: before and after

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Page 26: Assembly: before and after

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Page 27: Assembly: before and after

Assemblathon stats

http://korflab.ucdavis.edu/datasets/Assemblathon/Assemblathon2/Basic_metrics/assemblathon_stats.pl

OR

https://github.com/lexnederbragt/sequencetools/
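The best known of these basic metrics is N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch, not taken from either script above, assuming the contig or scaffold lengths have already been collected into a list:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Add sequences from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


if __name__ == "__main__":
    print(n50([10, 8, 6, 4, 2]))
```

For the lengths 10, 8, 6, 4 and 2 (total 30), the two longest sequences already sum to 18, which is more than half, so the N50 is 8.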

Page 28: Assembly: before and after

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Page 29: Assembly: before and after

Gap closing

IMAGE2

Page 30: Assembly: before and after

Correcting bases

Quiver from Pacific Biosciences

Page 31: Assembly: before and after

Separate scaffolding

Page 32: Assembly: before and after

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Page 33: Assembly: before and after

Assembly merging/reconciliation

Page 34: Assembly: before and after

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Page 35: Assembly: before and after

Mapped genomic reads

FRCBAM

Page 36: Assembly: before and after

Mapped transcriptomic reads

Page 37: Assembly: before and after

Gene finding

Page 38: Assembly: before and after

Binning

Nederbragt et al, 2010

Page 39: Assembly: before and after

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Page 40: Assembly: before and after

Genome browser(s)

IGV

Page 41: Assembly: before and after

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Page 42: Assembly: before and after

Comparative measures

Log Average Probability (LAP)

Assembly Likelihood Evaluation (ALE)

See also Howison, Zapata and Dunn (2013) Toward a statistically explicit understanding of de novo sequence assembly. doi: 10.1093/bioinformatics/btt525

Page 43: Assembly: before and after

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Page 44: Assembly: before and after

Reference comparison

Mauve assembly metrics

Page 45: Assembly: before and after

Review

Page 46: Assembly: before and after

Too many tools…

http://seqanswers.com/wiki/Software/list

Page 47: Assembly: before and after

Too many tools…

http://wwwdev.ebi.ac.uk/fg/hts_mappers

88 short-read mappers

Page 48: Assembly: before and after

Embargo!

Page 49: Assembly: before and after

Benchmarking, anyone?

Page 50: Assembly: before and after

All-in-one assembly pipeline

doi:10.1186/1471-2105-15-126