Dr Aron Fazekas - Plant DNA Barcoding; data workflow

Posted on 01-Dec-2014

description

Dr Fazekas's process for checking and editing DNA sequences before publishing on BOLD.

Transcript of Dr Aron Fazekas - Plant DNA Barcoding; data workflow

Plant DNA Barcoding: data workflow

Aron Fazekas University of Guelph

Workflow Outline:

raw sequence editing

data alignment

re-edit the sequence file

upload to BOLD

quality checks using BOLD / GenBank

Sequence editing: primer trimming

5’ GTTATGCATGAACGTAATGCTC

GAGCATTACGT….
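The second fragment on this slide matches the start of the primer's reverse complement, which is how the primer shows up at the far end of a read from the opposite direction. A minimal sketch of trimming both forms follows. The function names and the test read are hypothetical, and it assumes exact primer matches only; real reads with miscalls inside the primer region would need fuzzy matching or manual trimming.

```python
# Hypothetical helpers for illustration; not taken from the slides or from BOLD.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def trim_primers(read: str, five_prime_primer: str = "", three_prime_primer: str = "") -> str:
    """Strip an exact primer match from the 5' end of the read, and the
    reverse complement of the opposite primer from the 3' end."""
    read = read.upper()
    if five_prime_primer and read.startswith(five_prime_primer.upper()):
        read = read[len(five_prime_primer):]
    if three_prime_primer:
        tail = reverse_complement(three_prime_primer)
        if read.endswith(tail):
            read = read[:-len(tail)]
    return read


# Primer sequence as shown on the slide.
primer = "GTTATGCATGAACGTAATGCTC"

# Made-up read: a short insert followed by the primer's reverse complement.
raw_read = "ACGTACGTAA" + reverse_complement(primer)
trimmed = trim_primers(raw_read, three_prime_primer=primer)
print(trimmed)
```

Exact matching keeps the sketch short; it will silently leave a primer in place if even one base in it was miscalled, so the alignment check later in the workflow is still needed.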

Sequence editing: editing miscalls

Sequence editing: congruence between forward/reverse reads
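One way to check congruence is to reverse-complement the reverse read and compare it base by base with the forward read. The sketch below does only that. It assumes both reads have already been trimmed to exactly the same region, which is a simplification; in practice the reads usually overlap only partially and are assembled in a trace editor. All names and sequences are illustrative.

```python
# Hypothetical helper names; the sequences below are made up for illustration.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def read_disagreements(forward: str, reverse: str):
    """List (zero-based position, forward base, reverse base) for every
    position where the two reads disagree.

    Assumes both reads cover exactly the same region."""
    forward = forward.upper()
    reverse_as_forward = reverse_complement(reverse)
    if len(forward) != len(reverse_as_forward):
        raise ValueError("reads must cover the same region for this sketch")
    return [
        (i, f, r)
        for i, (f, r) in enumerate(zip(forward, reverse_as_forward))
        if f != r
    ]


forward_read = "ACGTTGCA"
reverse_read = "TGCTACGT"  # reverse complement of ACGTAGCA
conflicts = read_disagreements(forward_read, reverse_read)
print(conflicts)
```

Each reported position is a candidate miscall to resolve against the trace files rather than something to correct automatically.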

Sequence Alignment

rbcL easy to align - most programs work well

matK tricky to align - TransAlign seems to do the best job

trnH difficult (impossible between genera?)

ITS difficult (impossible between genera?)

After editing: need to align the data

Kelchner (2000) Ann Missouri Bot Gard

Clustal www.clustal.org

TransAlign http://www.biomedcentral.com/1471-2105/6/156

K-Align http://www.ebi.ac.uk/Tools/msa/kalign/

Problems to look for after alignment:

- primers not trimmed

- gaps at the ends

- gaps in the middle (protein coding)

- translation shows stop codons
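The gap-related checks in this list can be sketched for a single aligned sequence as below. The function name and the example string are hypothetical. It assumes `-` is the gap character and treats an internal gap run whose length is not a multiple of three as frame-breaking, which only makes sense for protein-coding regions such as rbcL and matK.

```python
import re


def gap_report(aligned: str, gap: str = "-") -> dict:
    """Summarise gaps in one aligned sequence (hypothetical helper).

    Returns the number of leading and trailing gap characters, the internal
    gap runs as (zero-based start, length), and the subset of internal runs
    whose length is not a multiple of three."""
    leading = len(aligned) - len(aligned.lstrip(gap))
    trailing = len(aligned) - len(aligned.rstrip(gap))
    core = aligned.strip(gap)
    internal = [
        (match.start() + leading, len(match.group()))
        for match in re.finditer(re.escape(gap) + "+", core)
    ]
    frame_breaking = [run for run in internal if run[1] % 3 != 0]
    return {
        "leading": leading,
        "trailing": trailing,
        "internal": internal,
        "frame_breaking": frame_breaking,
    }


# Made-up aligned sequence with end gaps and two internal gap runs.
report = gap_report("--ATGGC-TTAA---CC-")
print(report)
```

A one-base gap in a coding region usually points to a miscalled or missing base in the original read, so it is worth going back to the trace before accepting the alignment.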

Sequence Alignment

- primers not trimmed (trnH-psbA, real data submitted for publication)

- gaps at the ends

- gaps in the middle of a coding region (rbcL, data submitted for publication)

Translate coding regions (rbcL, matK) to ensure there are no stop codons present
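A minimal sketch of the stop-codon check follows. It builds the standard genetic code table by hand and reports which codons translate to a stop. The function names are hypothetical, and the reading frame is an assumption the user must supply, because a barcode fragment does not necessarily start at codon position one.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA string in the given frame (0, 1 or 2).

    Alignment gaps are removed first; codons containing ambiguous
    bases are translated as X."""
    seq = seq.upper().replace("-", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stop_codon_positions(seq: str, frame: int = 0) -> list:
    """Return zero-based codon indices that translate to a stop."""
    return [i for i, aa in enumerate(translate(seq, frame)) if aa == "*"]


print(translate("ATGTCACCACAA"))
print(stop_codon_positions("ATGTAACCA"))
```

If every frame shows stop codons, the likely causes are an insertion or deletion introduced during editing, or a sequence that is still in reverse-complement orientation.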

Edit both the alignment file and the original sequence file

Can trnH-psbA (or other non-coding sequence) be aligned across diverse species?

Upload to BOLD

After the data are edited and aligned: use BOLD to create a tree

• Check for misplaced taxa – remove them from the dataset

• Check for singleton species – make a list
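The singleton list can be produced from a simple table of records. The sketch below assumes the data are available as (sample ID, species name) pairs; the sample IDs and the record layout are made up and do not reflect any BOLD export format.

```python
from collections import Counter


def singleton_species(records):
    """Return the sorted species names that appear in exactly one record.

    `records` is an iterable of (sample_id, species_name) pairs."""
    counts = Counter(species for _sample_id, species in records)
    return sorted(name for name, n in counts.items() if n == 1)


# Illustrative records only.
records = [
    ("S001", "Acer rubrum"),
    ("S002", "Acer rubrum"),
    ("S003", "Acer saccharum"),
    ("S004", "Betula papyrifera"),
]
singletons = singleton_species(records)
print(singletons)
```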

BOLD BLAST check

GenBank BLAST check

Acknowledgements

Sujeevan Ratnasingham & BOLD Team