In the Pursuit of Optimal Sequence Trimming Parameters
for EST Projects
Fabiano C. Peixoto & J. Miguel Ortega
LCC-CENAPAD
ATGC Bioinformática, UFMG
Noticed:
• BLAST results
• Phred 15
• Too much trimming
Query: 469  TTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCG 528
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1038 ttaggaggatcgtttttagaatcccctgcaacgttaccacggtggatttcactgactgcg 979

Query: 529  ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga 588
            ||||||||||||||||| || |||||||||||||||||| ||||||||||||||||||||
Sbjct: 978  acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga 919

Query: 589  tgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccata 648
            |||||||||||||| |||||||||| |||| ||||||||||| |||||||||||||||||
Sbjct: 918  tgcatttcgacagacttgacttcagccgaccaaccttgcggaccaaaagtgacgaccata 859

Query: 649  ccaggcttgatgataccagtttcaacgc 676
            ||||||||||||||||||||||||||||
Sbjct: 858  ccaggcttgatgataccagtttcaacgc 831
.TGAAGCTTTCAGCTTCTTTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCGACGTTCTTAACGTTGAATCCAACGttGCTACCAgggagagcctcagtaagtgcttcatgatgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccataccaggcttgatgataccagtttcaacgcctcggggccaggctggcgtgaacagggcctagcgggtccgcgggggaagggtcccggctcaatccaccaatagagcggagctaaagtgacgggggcgcca
Phred 15
Experimental approach
Sequences:
• pUC18 plasmid vector (published sequence)
• Sequencing reaction:
  • Single pool - 3 plates (96 samples)
  • MegaBACE sequencer
  • 3 reads for each plate, esd processing - 846 reads
Processing:
• BLAST (MegaBLAST, as in UniGene)
• Phred
  • trim: a chromatogram analyzer
  • trim_alt: trim_cutoff parameter 1% up to 25%
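To make the role of the trim_cutoff parameter concrete, here is a minimal Python sketch of modified Mott trimming, the approach that Phred's trim_alt option is documented to use. It is an illustration under stated assumptions, not Phred's source code: the function name `mott_trim` and the example quality values are hypothetical, and error probabilities are derived from Phred quality values as p = 10^(-q/10), so Phred 15 corresponds to roughly a 3.2% error probability.

```python
def mott_trim(quals, trim_cutoff=0.05):
    """Return (start, end), a half-open interval of the bases to keep.

    Each base contributes (trim_cutoff - error probability) to a running
    score. The score is reset whenever it drops to zero or below, and the
    segment that reaches the highest score is kept.
    """
    best_score = 0.0
    best_start = best_end = 0
    score = 0.0
    start = 0
    for i, q in enumerate(quals):
        p_err = 10 ** (-q / 10.0)      # Phred quality -> error probability
        score += trim_cutoff - p_err
        if score <= 0:
            # Segment is no longer worth extending: restart after this base.
            score = 0.0
            start = i + 1
        elif score > best_score:
            best_score = score
            best_start, best_end = start, i + 1
    return best_start, best_end


# Hypothetical read: low-quality ends (q=10) around a good core (q=30).
example_quals = [10, 10, 30, 30, 30, 10, 10]
strict = mott_trim(example_quals, trim_cutoff=0.05)
lenient = mott_trim(example_quals, trim_cutoff=0.25)
print(strict, lenient)
```

In this toy read, a cutoff of 5% keeps only the high-quality core, while a cutoff of 25% keeps every base. That is the trade-off explored here by sweeping trim_cutoff from 1% to 25%.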
[Figure: Number of bases versus trim_cutoff parameter value (%), from 1% to 25%. Series: Included (trim), Discarded (trim), Included (TrimAlt), Discarded (TrimAlt).]
[Figure: Number of bases (x 1000) versus trim_cutoff parameter value (%), from 1% to 25%. Series: Included (trim), Discarded (trim), Included (TrimAlt), Discarded (TrimAlt), Total (trim_alt).]
[Figure: Number of sequences versus trim_cutoff parameter value, from 1% to 25%. Series: Included (Trim), Discarded (Trim), Included (TrimAlt), Discarded (TrimAlt).]
[Figure: BLAST gaps/mismatches (% of bases) versus trim_cutoff parameter value, from 1% to 25%; y-axis from 0% to 30%. Series: total miscall, stepwise miscall. Labels on the chart: Trim_alt sequence, BLAST, Additional bases, 16%, 17%, 3%.]
Conclusions
• The trim_alt algorithm can be used with the trim_cutoff parameter up to 18% without including miscalled bases
• With the proper parameters, the trim_alt algorithm recovers more information than the trim algorithm
• Other trimming algorithms, such as window-based ones, may also be analyzed in the same way
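As an illustration of the window-based alternative mentioned in the last point, the following Python sketch keeps the region between the first and last windows whose mean quality reaches a threshold. It is one simple variant among many, not a method evaluated in this work; the function name `window_trim`, the window size, and the threshold are all assumptions chosen for the example.

```python
def window_trim(quals, window=10, min_mean_q=15):
    """Return (start, end), a half-open interval of the bases to keep.

    A window is "good" when its mean quality is at least min_mean_q.
    The kept region runs from the start of the first good window to the
    end of the last good window; (0, 0) means nothing is kept.
    """
    n = len(quals)
    if n < window:
        return (0, 0)
    good = [
        i
        for i in range(n - window + 1)
        if sum(quals[i:i + window]) / window >= min_mean_q
    ]
    if not good:
        return (0, 0)
    return (good[0], good[-1] + window)


# Hypothetical read: 5 poor bases, 20 good bases, 5 poor bases.
example_quals = [5] * 5 + [30] * 20 + [5] * 5
print(window_trim(example_quals, window=5, min_mean_q=20))
```

Because a window is judged by its mean, a few low-quality bases at each edge can survive inside an otherwise good window. That is the kind of behavior the same BLAST-against-reference comparison could quantify.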
Acknowledgements
Sequences:
• Laboratório de Genética e Bioquímica
• Laboratório de Imunologia de Doenças Infecciosas
• Laboratório de Biodiversidade e Evolução Molecular
• Marina M. Mourão, Lucila Grossi and Renata A. Ribeiro (UFMG, Rede Genoma de Minas Gerais)
Computing facilities:
• CENAPAD-MG/CO