Sequence Analysis with Artemis and Artemis Comparison Tool (ACT)

Caribbean Bioinformatics Workshop, 18th-29th January 2010



atcttttacttttttcatcatctatacaaaaaatcatagaatattcatcatgttgtttaaaataatgtattccattatgaactttattacaaccctcgtt

tttaattaattcacattttatatctttaagtataatatcatttaacattatgttatcttcctcagtgtttttcattattatttgcatgtacagtttatca

tttttatgtaccaaactatatcttatattaaatggatctctacttataaagttaaaatctttttttaattttttcttttcacttccaattttatattccg

cagtacatcgaattctaaaaaaaaaaataaataatatataatatataataaataatatataataaataatatataatatataataaataatatataatat

ataatatataataaataatatataatatataatatataataaataatatataataaataatatataatatataatatataatactttggaaagattattt

atatgaatatatacacctttaataggatacacacatcatatttatatatatacatataaatattccataaatatttatacaacctcaaataaaataaaca

tacatatatatatataaatatatacatatatgtatcattacgtaaaaacatcaaagaaatatactggaaaacatgtcacaaaactaaaaaaggtattagg

agatatatttactgattcctcatttttataaatgttaaaattattatccctagtccaaatatccacatttattaaattcacttgaatattgttttttaaa

ttgctagatatattaatttgagatttaaaattctgacctatataaacctttcgagaatttataggtagacttaaacttatttcatttgataaactaatat

tatcatttatgtccttatcaaaatttattttctccatttcagttattttaaacatattccaaatattgttattaaacaagggcggacttaaacgaagtaa

ttcaatcttaactccctccttcacttcactcattttatatattccttaatttttactatgtttattaaattaacatatatataaacaaatatgtcactaa

taatatatatatatatatatatatatatatatattataaatgttttactctattttcacatcttgtccttttttttttaaaaatcccaattcttattcat

taaataataatgtattttttttttttttttttttttttattaattattatgttactgttttattatatacactcttaatcatatatatatatttatatat

atatatatatatatatatatattattcccttttcatgttttaaacaagaaaaaaaactaaaaaaaaaaaaaataataaaatatatttttataacatatgt

attattaaaatgtatatataaaaatatatattccatttattattatttttttatatacattgttataagagtatcttctcccttctggtttatattacta

ccatttcactttgaacttttcataaaaattaatagaatatcaaatatgtataatatataacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaata

tatatatatatatatatacatataatatatatttcatctaatcatttaaaattattattatatattttttaaaaaatatatttatgataacataaaaaga

atttaattttaattaaatatatataattacatacatctaatattattatatatatataataagttttccaaatagaatacttatatattatatatatata

tatatatatatatattcttccataaaaagaataaaataaaataaaaacaccttaaaagtatttgtaaaaaattccccacattgaatatatagttgtattt

ataaaattaaagaaaaagcataaagttaccatttaatagtggagattagtaacattttcttcattatcaaaaatatttatttcctaattttttttttttg

taaaatatatttaaaaatgtaatagattatgtattaaataatataaatatagcaaaatgttcaattttagaaatttgcctctttttgacaaggataattc

aaaagatacaggtaaaaaaaaaaaaataaagtaaaacaaaacaaaacaaaaaacaaaaaaaaaaaaaaaaaaaaaaatgacatgttataatataatataa

taaataaaaattatgtaatatatcataatcgaagaaacatatatgaaaccaaaaagaaacagatcttgatttattaatacatatataactaacattcata

tctttatttttgtagatgatataaaaaattttataaactcttatgaagggatatatttttcatcatccaataaatttataaatgtatttctagacaaaat

tctgatcattgatccgtcttccttaaatgttattacaataaatacagatctgtatgtagttgatttcctttttaatgagaaaaataagaatcttattgtt

ttagggtaatgaaatatatatagatttatatttttatttatttattatatattattttttaatttttcttttatatatttattttatttagtgtataaaa

tgatatcctttatatttatatttacatgggatattcaaataataacaaaaatgagtatacacatatatatatatatatatatatatatgtatattttttt

tttttttttatgttcctataggaaagggaagaattcactgatttgtagtgtttacaatattagggaatgcaactttacacttttgaaaaaaattcagtta

agcaaaaatattaataacattaaaaagacactgatagcaaaatgtaatgaatatataataacattagaaaataagaaaattactttttatttcttaaata

aagattatagtataaatcaaagtgaattaatagaagacggaaaagaacttattgaaaatatctatttgtcaaaaaatcatatcttgttagtaataaaaaa

ttcatatgtatatatataccaattagatattaaaaattcccatattagttatacacttattgatagtttcaatttaaatttatcctacctcagagaatct

ataaataataaaaaaaagcatataaataaaataaatgatgtatcaaataatgacccaaaaaaggataataatgaaaaaaatacttcatctaataatataa


atcttttacttttttcatcatctatacaaaaaatcatagaatattcatcatgttgtttaaaataatgtattccattatgaactttattacaaccctcgtt

tttaattaattcacattttatatctttaagtataatatcatttaacattatgttatcttcctcagtgtttttcattattatttgcatgtacagtttatca

tttttatgtaccaaactatatcttatattaaatggatctctacttataaagttaaaatctttttttaattttttcttttcacttccaattttatattccg

cagtacatcgaattctaaaaaaaaaaataaataatatataatatataataaataatatataataaataatatataatatataataaataatatataatat

ataatatataataaataatatataatatataatatataataaataatatataataaataatatataatatataatatataatactttggaaagattattt

atatgaatatatacacctttaataggatacacacatcatatttatatatatacatataaatattccataaatatttatacaacctcaaataaaataaaca

tacatatatatatataaatatatacatatatgtatcattacgtaaaaacatcaaagaaatatactggaaaacatgtcacaaaactaaaaaaggtattagg

agatatatttactgattcctcatttttataaatgttaaaattattatccctagtccaaatatccacatttattaaattcacttgaatattgttttttaaa

ttgctagatatattaatttgagatttaaaattctgacctatataaacctttcgagaatttataggtagacttaaacttatttcatttgataaactaatat

tatcatttatgtccttatcaaaatttattttctccatttcagttattttaaacatattccaaatattgttattaaacaagggcggacttaaacgaagtaa

ttcaatcttaactccctccttcacttcactcattttatatattccttaatttttactatgtttattaaattaacatatatataaacaaatatgtcactaa

taatatatatatatatatatatatatatatatattataaatgttttactctattttcacatcttgtccttttttttttaaaaatcccaattcttattcat

taaataataatgtattttttttttttttttttttttttattaattattatgttactgttttattatatacactcttaatcatatatatatatttatatat

atatatatatatatatatatattattcccttttcatgttttaaacaagaaaaaaaactaaaaaaaaaaaaaataataaaatatatttttataacagatgt

attattaaaatgtatatataaaaatatatattccatttattattatttttttatatacattgttataagagtatcttctcccttctggtttatattacta

ccatttcactttgaacttttcataaaaattaatagaatatcaaatatgtataatatataacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaata

tatatatatatatatatacatataatatatatttcatctaatcatttaaaattattattatatattttttaaaaaatatatttatgataacataaaaaga

atttaattttaattaaatatatataattacatacatctaatattattatatatatataataagttttccaaatagaatacttatatattatatatatata

tatatatatatatattcttccataaaaagaataaaataaaataaaaacaccttaaaagtatttgtaaaaaattccccacattgaatatatagttgtattt

ataaaattaaagaaaaagcataaagttaccatttaatagtggagattagtaagtttttcttcattatcaaaaatatttatttcctaattttttttttttg

taaaatatatttaaaaatgtaatagattatgtattaaataatataaatatagcaaaatgttcaattttagaaatttgcctctttttgacaaggataattc

aaaagatacaggtaaaaaaaaaaaaataaagtaaaacaaaacaaaacaaaaaacaaaaaaaaaaaaaaaaaaaaaaatgacatgttataatataatataa

taaataaaaattatgtaatatatcataatcgaagaaacatatatgaaaccaaaaagaaacagatcttgatttattaatacatatataactaacattcata

tctttatttttgtagatgatataaaaaattttataaactcttatgaagggatatatttttcatcatccaataaatttataaatgtatttctagacaaaat

tctgatcattgatccgtcttccttaggtgttattacaataaatacagatctgtatgtagttgatttcctttttaatgagaaaaataagaatcttattgtt

ttagggtaatgaaatatatatagatttatatttttatttatttattatatattattttttaatttttcttttatatatttattttatttagtgtataaaa

tgatatcctttatatttatatttacatgggatattcaaataataacaaaaatgagtatacacatatatatatatatatatatatatatgtatattttttt

tttttttttatgttcctataggaaagggaagaattcactgatttgtagtgtttacaatattagggaatgcaactttacacttttgaaaaaaattcagtta

agcaaaaatattaataacattaaaaagacactgatagcaaaatgtaatgaatatataataacattagaaaataagaaaattactttttatttcttaaata

aagattatagtataaatcaaagtgaattaatagaagacggaaaagaacttattgaaaatatctatttgtcaaaaaatcatatcttgttagtaataaaaaa

ttcatatgtatatatataccaattagatattaaaaattcccatattagttatacacttattgatagtttcaatttaaatttatcctacctcagagaatct

ataaataataaaaaaaagcatataaataaaataaatgatgtatcaaataatgacccaaaaaaggataataatgaaaaaaatacttcatctaataatataa

Sequencing is just the beginning of the process

Extracting information & interpreting

What's there?

Where are the genes?

Which genes?

How to find them?

SEQUENCE ANNOTATION


Strategies for sequence annotation

Predictive methods: interpretation of the DNA sequence into genes according to rules

Comparative methods: interpretation of the DNA sequence into genes according to similarities with other sequences

Experimental methods: interpretation of the DNA sequence into genes according to experimental results (e.g. cDNA)


EST Blast Hit


Gene prediction programs: ORFs and CDSs

ORFs are not equivalent to CDSs

Not all open reading frames are coding sequences
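To see why an ORF is not automatically a CDS, it helps to look at how little it takes to call an ORF. The sketch below scans all six reading frames for ATG-to-stop stretches above a minimum length. `find_orfs` and its `min_codons` cut-off are hypothetical, chosen for illustration; this is not how Artemis or any of the gene finders discussed here work, and every ORF it reports would still need supporting evidence before being annotated as a coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Report ATG...stop ORFs in all six frames (hypothetical helper).

    Returns (strand, frame, start, end) tuples with 0-based, end-exclusive
    coordinates on the strand that was scanned.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG of the current open stretch
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # ORF length in codons, stop codon included
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=3)` reports a single four-codon ORF on the forward strand in frame 2.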


Gene prediction

Genefinder, Glimmer, Orpheus, PHAT, GeneMark


Gene finding programs

• Gene-finding software packages use hidden Markov models (HMMs).

• They predict coding, intergenic and intron sequences.

• They need to be trained on a specific organism.

• Never perfect!
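As a rough illustration of the HMM idea, here is a two-state toy model with "coding" and "intergenic" states, decoded with the Viterbi algorithm. All probabilities are invented for an imaginary AT-rich genome in which coding DNA is somewhat more GC-rich. This is not the model used by any of the programs above: real gene finders use far richer state structures (codon positions, starts, stops, splice sites) whose parameters are trained on genes from the target organism, which is why they have to be retrained for each organism.

```python
import math

STATES = ("coding", "intergenic")
# Hypothetical parameters, invented for illustration; real models are trained.
START = {"coding": 0.5, "intergenic": 0.5}
TRANS = {
    "coding": {"coding": 0.99, "intergenic": 0.01},
    "intergenic": {"coding": 0.01, "intergenic": 0.99},
}
EMIT = {
    "coding": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "intergenic": {"A": 0.45, "T": 0.45, "G": 0.05, "C": 0.05},
}


def viterbi(seq: str) -> list[str]:
    """Return the most probable state for each base (log-space Viterbi)."""
    seq = seq.upper()
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # best previous state from which to move into state s
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]
```

On `"A" * 40 + "GC" * 20 + "T" * 40` the toy model labels the GC-rich middle block as coding and the flanks as intergenic; with different (untrained) parameters the same sequence could be labelled quite differently.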


Gene prediction programs: Problems

• ORFs are not equivalent to CDSs

• Gene prediction programs find new genes that share properties with a given set of genes.

• They can be confounded by:

– Sequence constraints (ribosomal proteins etc.)

– Sequence biases

– Different sets of genes

– Horizontal gene transfer

– Non-coding DNA
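To make "share properties with a given set of genes" concrete, the toy scorer below learns codon frequencies from a training set and rates a candidate ORF by its mean log-odds per codon against a uniform background. The function names, the add-one smoothing and the uniform background are simplifying assumptions, not what Glimmer or GeneMark actually do. It illustrates how a gene with atypical codon usage, for example one acquired by horizontal transfer, can score poorly even though it is real.

```python
import math
from collections import Counter

CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codon_counts(seqs: list[str]) -> Counter:
    """Count in-frame codons across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 2, 3):
            counts[s[i:i + 3]] += 1
    return counts


def train(training_genes: list[str]) -> dict[str, float]:
    """Log-frequency of each of the 64 codons, with add-one smoothing."""
    counts = codon_counts(training_genes)
    total = sum(counts[c] for c in CODONS) + len(CODONS)
    return {c: math.log((counts[c] + 1) / total) for c in CODONS}


def score(orf: str, model: dict[str, float]) -> float:
    """Mean log-odds per codon versus a uniform (1/64) codon background."""
    orf = orf.upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    background = math.log(1 / 64)
    values = [model[c] - background for c in codons if c in model]
    return sum(values) / len(values) if values else 0.0
```

A positive score means the ORF "looks like" the training genes; a negative score means it does not, whatever its biological status.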


Gene prediction programs: Problems

Different gene training sets: Plasmodium falciparum

Original annotation

Updated annotation


Gene prediction programs: Problems

Non-protein coding regions: S. typhi ribosomal RNA genes

Tracks: glimmer, genefinder, final, orpheus


Gene prediction programs: Problems

Non-protein coding regions: N. meningitidis DNA repeats

Tracks: glimmer, orpheus, final


Gene prediction programs: Problems

Pseudogenes: M. leprae


Gene prediction programs: Problems

Pseudogenes: M. leprae

Glimmer


Gene prediction programs: Problems

Pseudogenes: M. leprae

ORPHEUS


Gene prediction programs: Problems

Pseudogenes: M. leprae

WUBLASTX vs. M. tuberculosis


Gene prediction programs: Problems

Pseudogenes: M. leprae

Final annotation


The Gene Prediction Process

DNA SEQUENCE

ANALYSIS SOFTWARE: AT content, gene finders, codon usage, BlastX, FASTA, ESTs

Annotator

Useful CDS prediction


Eukaryotic gene

Diagram labels: 5’UTR, ATG, exon I, GT...AG (intron), exon II, GT...AG (intron), exon III, stop, 3’UTR; mRNA with CAP and poly-A tail (AAAAAAAAAA); cDNA (TTTTTTTTT); ESTs
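The GT...AG pairs in the diagram are the canonical dinucleotides at the start (donor) and end (acceptor) of each intron. The sketch below assumes exon coordinates are already known (1-based, inclusive, forward strand); `splice` is a hypothetical helper. It joins the exons into a coding sequence and flags introns that break the GT-AG rule. A minority of real introns are non-canonical (for example GC-AG), so a flag means "look again", not "wrong".

```python
def splice(genome: str, exons: list[tuple[int, int]]) -> tuple[str, list[int]]:
    """Join exons (1-based, inclusive, forward strand) into a CDS.

    Also returns the indices of introns whose ends are not the canonical
    GT (donor) ... AG (acceptor) pair.
    """
    genome = genome.upper()
    exons = sorted(exons)
    non_canonical = []
    for k, ((_, end), (next_start, _)) in enumerate(zip(exons, exons[1:])):
        # intron spans 1-based positions end+1 .. next_start-1
        intron = genome[end:next_start - 1]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            non_canonical.append(k)
    # 1-based inclusive (start, end) maps to the 0-based slice [start-1:end]
    cds = "".join(genome[start - 1:end] for start, end in exons)
    return cds, non_canonical
```

For example, `splice("ATGAAAGTTTTAGCCCTAA", [(1, 6), (14, 19)])` removes the seven-base GTTTTAG intron and returns `("ATGAAACCCTAA", [])`.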


AT content

• In AT-rich genomes, coding regions have a higher GC content than non-coding regions.
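A sliding-window GC content calculation, similar in spirit to the GC content graphs Artemis can display, is straightforward to sketch. The window and step sizes below are arbitrary choices, not Artemis defaults, and `gc_windows` is a hypothetical name.

```python
def gc_windows(seq: str, window: int = 120, step: int = 60) -> list[tuple[int, float]]:
    """Return (0-based window start, GC fraction) for each full window."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        # count G and C; other symbols (A, T, N) count as non-GC
        gc = sum(base in "GC" for base in chunk) / window
        result.append((start, gc))
    return result
```

In an AT-rich genome, windows that rise above the local background are candidates for closer inspection as possible coding regions.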


AT content


CODON USAGE

• Codon bias is different for each organism.

• DNA composition in coding regions is restricted, but it is not restricted in non-coding regions.

• The codon usage for any particular gene can influence expression.


Codon usage

• All organisms have a preferred set of codons. Example: relative usage of the four valine codons

Codon   Malaria   Trypanosoma
GUU     0.41      0.28
GUC     0.06      0.19
GUA     0.42      0.14
GUG     0.11      0.39
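The values in each column are relative frequencies of the four synonymous valine codons, so they sum to 1 for each organism. A minimal sketch of how such figures could be computed from a set of coding sequences, written in DNA (T in place of U) and assuming the inputs are in-frame CDSs; `valine_usage` is a hypothetical name.

```python
from collections import Counter

VALINE_CODONS = ("GTT", "GTC", "GTA", "GTG")  # GUU, GUC, GUA, GUG in RNA


def valine_usage(cds_list: list[str]) -> dict[str, float]:
    """Relative usage of each valine codon across in-frame CDSs."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in VALINE_CODONS:
                counts[codon] += 1
    total = sum(counts[c] for c in VALINE_CODONS)
    # fractions sum to 1 when at least one valine codon was seen
    return {c: counts[c] / total if total else 0.0 for c in VALINE_CODONS}
```

The same counting applied to every amino acid gives a full codon usage table of the kind available from the Kazusa database.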


Codon Usage

• http://www.kazusa.or.jp/codon/


Codon Usage in Artemis

Panels: forward frames, reverse frames


Codon usage & gene finding in Leishmania


GC frame plot

• Plots the GC content at the third codon position in each frame of a DNA sequence.

• In coding DNA, the GC content of the third base is often higher.

• Gives a good prediction of coding regions in malaria and trypanosomes.
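A simplified version of the GC frame plot calculation: for each window and each of the three forward frames, take the third base of every codon and report its GC fraction. Frames are defined from the start of the sequence, so they stay consistent as the window slides. The window and step sizes are arbitrary and `gc3_by_frame` is a hypothetical name; this is a sketch of the idea, not the Artemis implementation.

```python
def gc3_by_frame(seq: str, window: int = 120, step: int = 30) -> list[tuple[int, list[float]]]:
    """For each window, return GC fractions at codon position 3 for frames 0, 1, 2."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        fractions = []
        for frame in range(3):
            # codons of this frame begin at frame, frame+3, ...; their
            # third bases are at positions i with (i - frame) % 3 == 2
            thirds = [seq[i] for i in range(start, start + window)
                      if (i - frame) % 3 == 2]
            gc = sum(base in "GC" for base in thirds)
            fractions.append(gc / len(thirds) if thirds else 0.0)
        result.append((start, fractions))
    return result
```

In a coding region, the frame actually in use often shows a clearly higher value than the other two, which is what makes the plot useful for spotting genes.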


GC frame plot of tubulin gene cluster on T. brucei Chr 1


Homology Data

• Coding regions are more conserved than non-coding regions because of selective pressure.

• Comparing all possible translations against all known proteins gives clues to known genes.

• BLASTX
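BLASTX compares the conceptual translations of a nucleotide query in all six reading frames against a protein database. The translation step alone can be sketched with the standard genetic code (an assumption; some organisms and organelles use alternative codes). The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate in frame; codons not in the table (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict[str, str]:
    """Translate frames +1..+3 and -1..-3 ('*' marks a stop codon)."""
    seq = seq.upper()
    rc = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq[f:])
        frames[f"-{f + 1}"] = translate(rc[f:])
    return frames
```

For `"ATGGCCTAA"`, frame +1 gives `"MA*"` and frame -1 gives `"LGH"`.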


Gene finding: using ACT

TBLASTX comparisons: P. knowlesi, P. falciparum, P. yoelii
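ACT can load tabular BLAST output as a comparison file. If the TBLASTX comparison is run with BLAST+ in tabular format (`-outfmt 6`), weak hits can be dropped beforehand to keep the view readable. The sketch below assumes the 12 default `-outfmt 6` columns; `filter_hits` and its cut-off values are illustrative only, not recommended settings.

```python
import csv
from collections.abc import Iterable, Iterator

# Default BLAST+ -outfmt 6 columns
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-5,
                min_length: int = 30) -> Iterator[str]:
    """Yield tabular BLAST lines passing e-value and alignment-length cut-offs."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) <= max_evalue and int(hit["length"]) >= min_length:
            yield "\t".join(row)
```

Passing an open file handle works as well as a list of lines, since `csv.reader` accepts any iterable of strings.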


Gene finding by RNA-Seq (transcriptional landscape of Neospora caninum tachyzoites)

Day 3 Tachyzoites (RNAseq)

Day 4 Tachyzoites (RNAseq)


Day 3 Tachyzoites (RNAseq)

Day 4 Tachyzoites (RNAseq)

N. caninum Chr08

T. gondii Chr08

5’ UTR 3’ UTR

TBLASTX matches visualised in ACT

Transcriptome sequencing in Neospora (RNA-Seq is useful for predicting/confirming UTR boundaries)
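One simple way RNA-Seq coverage can suggest UTR boundaries is to extend outwards from an annotated CDS while read coverage stays above a threshold. The sketch assumes per-base coverage is already available as a 0-based list for a forward-strand gene and uses an arbitrary `min_cov` threshold; `estimate_utrs` is a hypothetical helper, and real tools must also handle splicing, noise and neighbouring transcripts.

```python
def estimate_utrs(coverage: list[int], cds_start: int, cds_end: int,
                  min_cov: int = 5) -> tuple[int, int]:
    """Extend a CDS (0-based, end-exclusive) to the covered transcript region.

    Walks left from cds_start and right from cds_end while coverage stays
    at or above min_cov. On the forward strand, [start, cds_start) is the
    putative 5' UTR and [cds_end, end) the putative 3' UTR.
    """
    start = cds_start
    while start > 0 and coverage[start - 1] >= min_cov:
        start -= 1
    end = cds_end
    while end < len(coverage) and coverage[end] >= min_cov:
        end += 1
    return start, end
```

With coverage `[0, 0, 8, 9, 10, 10, 10, 10, 7, 6, 1, 0]` and a CDS at positions 4 to 8, the transcript is estimated to span positions 2 to 10.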


RNA-Seq: correcting gene models

Panels: before and after correction, each with a %GC plot; time points 16 hr, 32 hr, 48 hr