Dr Aron Fazekas - Plant DNA Barcoding; data workflow

Plant DNA Barcoding: data workflow
Aron Fazekas, University of Guelph


Dr Fazekas's process for checking and editing DNA sequences before publishing them on BOLD.


Workflow outline:

- raw sequence editing

- data alignment

- re-edit the sequence file

- upload to BOLD

- quality checks using BOLD / GenBank


Sequence editing: primer trimming


5’ GTTATGCATGAACGTAATGCTC

GAGCATTACGT…
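The second line on this slide, GAGCATTACGT…, matches the first bases of the reverse complement of the primer above, which is how a primer site appears when a read is viewed from the other strand. As a minimal sketch (the function names are hypothetical, and only exact matches are trimmed), primer sites at both ends of a forward-oriented read could be removed like this:

```python
# Hypothetical helpers for exact-match primer trimming (a sketch, not the
# software used in the talk).

# Complement table covering the IUPAC nucleotide codes.
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def trim_primers(read: str, fwd_primer: str, rev_primer: str) -> str:
    """Strip primer sites from both ends of a read in forward orientation.

    Assumes the read starts with the forward primer and ends with the reverse
    complement of the reverse primer. Only exact matches are removed, so a
    primer site containing a miscall is left in place for manual editing.
    """
    read = read.upper()
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer)
    start = len(fwd) if read.startswith(fwd) else 0
    end = len(read) - len(rev_rc) if read.endswith(rev_rc) else len(read)
    return read[start:end]
```

For the primer on the slide, `reverse_complement("GTTATGCATGAACGTAATGCTC")` returns `"GAGCATTACGTTCATGCATAAC"`, whose first eleven bases are the fragment shown.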


Sequence editing: editing miscalls
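Miscalls are corrected by eye against the chromatogram. Where per-base Phred quality values are available from the base caller, one way to shortlist positions worth re-checking is to flag low-quality or ambiguous calls. This is a sketch under assumptions: the function name is hypothetical and the cut-off of 20 (roughly a 1% error probability) is an illustrative choice, not one taken from the talk.

```python
def flag_low_quality(bases: str, quals: list[int], min_q: int = 20) -> list[tuple[int, str, int]]:
    """Return (1-based position, base, quality) for calls worth re-checking.

    A call is flagged when its Phred quality is below `min_q` or when it is
    not one of A, C, G, T (e.g. an N or an IUPAC ambiguity code).
    The default threshold of 20 is an assumption.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and qualities must have the same length")
    flagged = []
    for pos, (base, q) in enumerate(zip(bases.upper(), quals), start=1):
        if q < min_q or base not in "ACGT":
            flagged.append((pos, base, q))
    return flagged
```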


Sequence editing: congruence between forward/reverse reads
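Checking congruence means comparing the forward read with the reverse complement of the reverse read and resolving every disagreement. A minimal sketch, assuming both reads have already been trimmed to exactly the same region (so they line up base for base; real reads usually need to be overlapped or aligned first); the function name is hypothetical:

```python
# Complement table covering the IUPAC nucleotide codes.
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def compare_reads(forward: str, reverse: str) -> list[tuple[int, str, str]]:
    """List (1-based position, forward base, reverse base) where the reads disagree.

    Assumes both reads cover the same region, so the reverse complement of
    `reverse` lines up position by position with `forward`.
    """
    rc = reverse_complement(reverse)
    if len(rc) != len(forward):
        raise ValueError("reads must cover the same region (equal lengths)")
    return [
        (pos, f, r)
        for pos, (f, r) in enumerate(zip(forward.upper(), rc), start=1)
        if f != r
    ]
```

An empty result means the two reads agree; each listed position is a place to go back to both chromatograms.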


Sequence Alignment

After editing, the data need to be aligned.

Kelchner (2000) Ann. Missouri Bot. Gard.

- rbcL: easy to align; most programs work well

- matK: tricky to align; TransAlign seems to do the best job

- trnH: difficult (impossible between genera?)

- ITS: difficult (impossible between genera?)

Alignment programs:

- Clustal: www.clustal.org

- TransAlign: http://www.biomedcentral.com/1471-2105/6/156

- K-Align: http://www.ebi.ac.uk/Tools/msa/kalign/


Problems to look for after alignment:

- primers not trimmed

- gaps at the ends

- gaps in the middle of protein-coding regions

- translation shows stop codons
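Two of these problems, end gaps and internal gaps, can be spotted programmatically once the alignment is loaded. The sketch below is a hypothetical helper, assuming the alignment is held as a dictionary of equal-length gapped sequences with "-" as the gap character. In a protein-coding region any internal gap deserves a look, but one whose length is not a multiple of 3 would also shift the reading frame, so it is labelled as a likely frameshift.

```python
import re


def alignment_gap_report(aligned: dict[str, str]) -> dict[str, list[str]]:
    """Report end gaps and internal gaps for each sequence in an alignment.

    `aligned` maps sequence names to gapped sequences of equal length
    ('-' = gap). Columns are reported 1-based.
    """
    if len({len(s) for s in aligned.values()}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    report = {}
    for name, seq in aligned.items():
        issues = []
        lead = len(seq) - len(seq.lstrip("-"))
        trail = len(seq) - len(seq.rstrip("-"))
        if lead or trail:
            issues.append(f"end gaps: {lead} leading, {trail} trailing")
        # Gap runs remaining after the end gaps are stripped lie between bases.
        for run in re.finditer(r"-+", seq.strip("-")):
            size = len(run.group())
            kind = "frameshift" if size % 3 else "in-frame"
            column = run.start() + lead + 1
            issues.append(f"internal gap of {size} at column {column} ({kind})")
        report[name] = issues
    return report
```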


- primers not trimmed (trnH-psbA; real data submitted for publication)

- gaps at the ends


- gaps in the middle of a coding region (rbcL; data submitted for publication)


Translate coding regions (rbcL, matK) to check that no stop codons are present.
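rbcL and matK are plastid genes, translated with NCBI translation table 11, whose stop codons (TAA, TAG, TGA) are the same as in the standard code. Because a barcode fragment may not begin on the first codon position, a minimal sketch (hypothetical function names) scans all three frames and reports which are free of stops:

```python
# Stop codons of NCBI translation table 11 (bacterial, archaeal and plant
# plastid code); they are the same three as in the standard code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq: str, frame: int = 0) -> list[int]:
    """Return 1-based positions of stop codons read in `frame` (0, 1 or 2).

    Gap characters are removed first, so positions refer to the ungapped sequence.
    """
    seq = seq.upper().replace("-", "")
    return [i + 1 for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def open_frames(seq: str) -> list[int]:
    """Return the reading frames (0, 1, 2) that contain no stop codon."""
    return [frame for frame in range(3) if not stop_codon_positions(seq, frame)]
```

A coding-region read with no open frame, or with a stop in the frame shared by the rest of the alignment, usually points to a miscall or a spurious indel left from editing.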


Edit both the alignment file and the original sequence file
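When corrections have been made only in the alignment, one way to bring the original sequence file back in step is to regenerate it from the alignment by stripping the gap characters. The sketch below assumes both files are FASTA and that "-" or "." mark gaps; the function names are hypothetical. Edits made in a chromatogram editor still have to be carried over by hand.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; a repeated header overwrites the earlier record."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def degap_fasta(aligned_text: str, width: int = 60, gap_chars: str = "-.") -> str:
    """Rebuild unaligned FASTA text from an edited alignment by removing gaps."""
    drop_gaps = str.maketrans("", "", gap_chars)
    lines = []
    for name, seq in parse_fasta(aligned_text).items():
        lines.append(">" + name)
        ungapped = seq.translate(drop_gaps)
        # Wrap the sequence at `width` characters per line.
        lines.extend(ungapped[i:i + width] for i in range(0, len(ungapped), width))
    return "".join(line + "\n" for line in lines)
```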


Can trnH-psbA (or other non-coding sequences) be aligned across diverse species?


Upload to BOLD


After the data are edited and aligned, use BOLD to create a tree.
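BOLD builds the tree itself. To make explicit what a distance-based barcode tree measures, here is a minimal sketch of the Kimura 2-parameter (K2P) distance, a model widely used for barcode neighbour-joining trees; which model BOLD applies to a given plant dataset is not stated in the slides, and the function name is hypothetical. With P the proportion of transitions and Q the proportion of transversions among compared sites, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). Gaps and ambiguous bases are skipped pairwise.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    transitions = transversions = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes (pairwise deletion)
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # too divergent for the K2P correction
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```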


• Check for misplaced taxa – remove them from the dataset

• Check for singleton species – make a list
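Both checks can be sketched in a few lines. Here "misplaced" is taken to mean a specimen that is closer to some specimen of another species than to any specimen of its own; this criterion, the use of uncorrected p-distance for brevity, and the function names are all assumptions for illustration. Singleton species have no conspecific to compare against, which is one reason to list them separately.

```python
from collections import Counter


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguity codes."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return float("inf")
    return sum(x != y for x, y in pairs) / len(pairs)


def singleton_species(species_of: dict[str, str]) -> list[str]:
    """Species represented by a single specimen (specimen ID -> species name)."""
    counts = Counter(species_of.values())
    return sorted(sp for sp, n in counts.items() if n == 1)


def possibly_misplaced(seqs: dict[str, str], species_of: dict[str, str]) -> list[str]:
    """Specimens closer to another species than to their own (singletons are skipped)."""
    flagged = []
    for a, seq_a in seqs.items():
        same = [p_distance(seq_a, seqs[b]) for b in seqs
                if b != a and species_of[b] == species_of[a]]
        other = [p_distance(seq_a, seqs[b]) for b in seqs
                 if species_of[b] != species_of[a]]
        if same and other and min(other) < min(same):
            flagged.append(a)
    return flagged
```

Flagged specimens are candidates for checking (misidentification, contamination, sample mix-up) before they are removed from the dataset.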


BOLD BLAST check


GenBank BLAST check
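The GenBank check is shown on the slides as web BLAST results. If the search is instead run with BLAST+, a sketch like the following could flag specimens whose best hit falls in a different genus from the recorded identification. It assumes tab-separated output with four columns (query ID, subject scientific name, percent identity, bit score), such as the columns requested by `-outfmt "6 qseqid sscinames pident bitscore"`. Comparing at genus level is an assumption, chosen because barcodes do not always separate congeneric species; the function names are hypothetical.

```python
def top_hits(blast_tsv: str) -> dict[str, tuple[str, float, float]]:
    """Best hit per query, by bit score: (scientific name, % identity, bit score)."""
    best: dict[str, tuple[str, float, float]] = {}
    for line in blast_tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        query, name, pident, bits = line.split("\t")[:4]
        score = float(bits)
        if query not in best or score > best[query][2]:
            best[query] = (name, float(pident), score)
    return best


def genus_conflicts(blast_tsv: str, species_of: dict[str, str]) -> list[tuple[str, str, str, float]]:
    """(query, recorded name, top-hit name, % identity) where the genera differ."""
    conflicts = []
    for query, (hit_name, pident, _) in top_hits(blast_tsv).items():
        recorded = species_of.get(query)
        if recorded is None:
            continue  # no identification on record for this query
        # Compare the first word of each name (the genus); [:1] tolerates empty names.
        if hit_name.split()[:1] != recorded.split()[:1]:
            conflicts.append((query, recorded, hit_name, pident))
    return sorted(conflicts)
```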



Acknowledgements

Sujeevan Ratnasingham & the BOLD team