Uploaded by derek-shaw
(PSI-)BLAST & MSA via Max-Planck
General Issues
• Where? (to find homologues)
• Structural templates- search against the PDB
• Sequence homologues- search against SwissProt or UniProt (recommended!)
• How many?
• As many as possible, as long as the MSA looks good (next week…)
• How long? (length of homologues)
• Fragments- short homologues (less than 50-60% of the query's length) give a bad alignment
• Ensure your sequences contain the wanted domain(s)
• N/C termini tend to vary in length between homologues
• How close? (distance from the query sequence)
• All too close- no new information
• Too many too far- bad alignment
• Ensure that you have a balanced collection!
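A minimal Python sketch of the length rule above, assuming hits are held as a plain name-to-sequence dict (a hypothetical format, not any tool's export) and using 60% of the query length as the fragment cutoff:

```python
def filter_fragments(query_len, hits, min_fraction=0.6):
    """Keep hits whose length is at least min_fraction of the query length.

    hits: dict mapping sequence name -> sequence string (hypothetical format).
    min_fraction: 0.6 here, following the 50-60% rule of thumb.
    """
    cutoff = min_fraction * query_len
    return {name: seq for name, seq in hits.items() if len(seq) >= cutoff}


hits = {
    "homologue_A": "M" * 250,  # about 92% of a 273-aa query: kept
    "fragment_B": "M" * 120,   # about 44%: dropped as a fragment
}
kept = filter_fragments(273, hits)
print(sorted(kept))  # ['homologue_A']
```

Length alone does not show whether the wanted domain is present, so this only replaces the first of the checks listed above.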
• From who? (which species the sequence belongs to)
• Don’t care, all homologues are welcome
• Orthologues/paralogues may be helpful
• Sequences from distant/close species provide different types of information
• Which method? (BLAST/PSI-BLAST)
• Depends on the protein, available homologues, the goal in mind…
Rules For Choosing Sequences
• Very similar sequences carry little information
• Very different sequences cause trouble (<30% identical with more than half of the other sequences in the set)
• Choose sequences as distantly related as possible: sequences between 30-80% identical with more than half of the sequences in the set
• The more sequences the better
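The identity rules above can be checked with a short Python sketch. It assumes the sequences are already aligned (equal length, "-" for gaps) and counts identity only over columns where neither sequence has a gap; other identity definitions exist, so the numbers may differ from BioEdit's matrix. The label names and the toy alignment are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two aligned sequences, over gap-free columns only."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def classify(alignment, low=30.0, high=80.0):
    """Label each sequence by its identity to more than half of the others."""
    names = list(alignment)
    ident = {}
    for n1, n2 in combinations(names, 2):
        ident[(n1, n2)] = ident[(n2, n1)] = percent_identity(alignment[n1], alignment[n2])
    labels = {}
    for n in names:
        others = [ident[(n, m)] for m in names if m != n]
        half = len(others) / 2
        if sum(p < low for p in others) > half:
            labels[n] = "too far"    # <30% to more than half the set
        elif sum(low <= p <= high for p in others) > half:
            labels[n] = "good"       # 30-80% to more than half the set
        elif sum(p > high for p in others) > half:
            labels[n] = "too close"  # little new information
        else:
            labels[n] = "mixed"
    return labels


alignment = {
    "q": "ACDEFGHIKL",
    "a": "ACDEFGHIKV",
    "b": "ACDEFAAAAA",
    "c": "WWWWWWWWWW",
}
print(classify(alignment))
# {'q': 'mixed', 'a': 'mixed', 'b': 'good', 'c': 'too far'}
```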
Overall work steps
1. Run the search:
   1. Select database
   2. E-value threshold
   3. BLAST or PSI-BLAST- how many rounds?
2. Take out sequences- HSP (slider region) or full sequences
3. Align sequences- choose alignment program
4. View alignment with BioEdit or another program
5. Calculate trees, conservation scores (ConSurf) etc.
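Step 2 (HSP vs. full sequence) comes down to slicing the subject sequence. A minimal Python sketch, assuming the subject start/end are taken from the BLAST report as 1-based, inclusive coordinates (the convention BLAST reports use); the function name is hypothetical:

```python
def extract_region(subject_seq, hsp_start, hsp_end, use_hsp=True):
    """Return either the HSP region or the full subject sequence.

    hsp_start/hsp_end: 1-based, inclusive subject coordinates.
    use_hsp=False keeps the whole sequence instead.
    """
    if not use_hsp:
        return subject_seq
    if not (1 <= hsp_start <= hsp_end <= len(subject_seq)):
        raise ValueError("HSP coordinates out of range")
    # convert 1-based inclusive coordinates to a 0-based Python slice
    return subject_seq[hsp_start - 1:hsp_end]


print(extract_region("MKTAYIAKQR", 3, 6))                 # TAYI
print(extract_region("MKTAYIAKQR", 3, 6, use_hsp=False))  # MKTAYIAKQR
```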
(PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/sections/search
• Databases- SwissProt, TrEMBL, NR, env, PDB or any combination for proteins, but only NT for DNA.
• All BLAST programs
Main advantage- you can easily extract and filter the HSPs, in addition to full sequences
The Query Protein
Name: Dihydrodipicolinate reductase
Enzyme reaction:
Molecular process: Lysine biosynthesis (early stages)
Organism: E. coli
Sequence length: 273 aa
Query: DAPB_ECOLI
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
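The query above is in FASTA format: a ">" header line followed by the sequence. A minimal Python parser sketch (the function name is hypothetical; it keys each record by the first word of its header). The example uses only the first 20 residues of DAPB_ECOLI; for the full record, the length should match the 273 aa stated above.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of header ID -> sequence."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # ID = first word after ">"
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


excerpt = ">DAPB_ECOLI\nMHDANIRVAI\nAGAGGRMGRQ\n"  # truncated excerpt
seqs = read_fasta(excerpt)
print(seqs)                      # {'DAPB_ECOLI': 'MHDANIRVAIAGAGGRMGRQ'}
print(len(seqs["DAPB_ECOLI"]))   # 20
```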
(PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/psi_blast/
Choose database or databases
(selecting a few using CTRL)
Upload sequence or MSA
The E-value threshold can be assessed using the E-value distribution of the hits
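To see the distribution outside the toolkit, the hits can be binned by order of magnitude. A minimal Python sketch, assuming the E-values have already been copied out of the result page into a list (the values below are made up); the function names are hypothetical:

```python
import math
from collections import Counter


def evalue_histogram(evalues):
    """Count hits per order of magnitude, i.e. floor(log10(E-value)).

    E-values reported as exactly 0 get their own 'zero' bin, since
    log10(0) is undefined.
    """
    bins = Counter()
    for e in evalues:
        if e == 0:
            bins["zero"] += 1
        else:
            bins[math.floor(math.log10(e))] += 1
    return bins


def count_below(evalues, threshold):
    """Number of hits that would pass a given E-value threshold."""
    return sum(1 for e in evalues if e <= threshold)


evalues = [0, 2e-80, 3e-75, 2e-40, 0.5, 2.0]  # made-up example values
print(evalue_histogram(evalues))
print(count_below(evalues, 1e-3))  # 4
```

A natural threshold often sits in a gap of the histogram, between a cluster of strong hits and the tail of weak ones; this is a heuristic, not a rule.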
Forward results to MSA
http://toolkit.tuebingen.mpg.de/sections/alignment
All marked hits, or filter by E-value
HSP (slider region) or full sequences
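Forwarding "filtered by E-value" amounts to writing the passing hits as FASTA input for the alignment program. A minimal Python sketch, assuming the hits are held as (name, E-value, sequence) tuples, a hypothetical in-memory format and not the toolkit's own export:

```python
def hits_to_fasta(hits, max_evalue):
    """Write hits with E-value <= max_evalue as FASTA text.

    hits: list of (name, evalue, sequence) tuples (hypothetical format).
    """
    lines = []
    for name, evalue, seq in hits:
        if evalue <= max_evalue:
            lines.append(f">{name}")
            # wrap at 60 residues per line, a common FASTA convention
            lines.extend(seq[i:i + 60] for i in range(0, len(seq), 60))
    return "\n".join(lines) + ("\n" if lines else "")


hits = [("hit1", 1e-30, "ACDE"), ("hit2", 0.5, "WWW")]
print(hits_to_fasta(hits, 1e-3), end="")
# >hit1
# ACDE
```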
Align via Max-Planck
Alignment results:
Save the alignment
Alignment viewing & editing: BioEdit
• http://www.mbio.ncsu.edu/BioEdit/BioEdit.html
• Easy-to-use sequence alignment editor
• View and manipulate alignments of up to 20,000 sequences.
• Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing which behaves like a text editor.
• Reads and writes GenBank, FASTA, Phylip 3.2, Phylip 4, and NBRF/PIR formats. Also reads GCG and Clustal formats.
Easiest Using Bioedit
http://www.mbio.ncsu.edu/BioEdit/bioedit.html
• Find a specific sequence: “Edit-> search -> in titles”
• Erase/add sequences: “Edit-> cut/paste/delete sequence”
• “Sequence Identity Matrix” under “Alignment”- useful for a rough evaluation of distances within the alignment.
• After taking out sequences, “Minimize Alignment” under “Alignment” removes unnecessary gaps.
• Can save an image using: “File -> Graphic View” & then “Edit -> Copy page as BITMAP”
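After deleting sequences, some columns may contain only gaps. The Python sketch below drops such columns, which is similar in spirit to BioEdit's "Minimize Alignment"; that BioEdit does exactly this and nothing more is an assumption. The function name is hypothetical.

```python
def remove_gap_only_columns(alignment, gap="-"):
    """Drop alignment columns in which every sequence has a gap.

    alignment: dict name -> aligned sequence (all the same length).
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(length) if any(s[i] != gap for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


# the 4th column is all gaps once some other sequence has been deleted
print(remove_gap_only_columns({"a": "AC--D", "b": "A-G-D"}))
# {'a': 'AC-D', 'b': 'A-GD'}
```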
A little about ConSurf
Compute Conservation Scores
• Give it an MSA, or it will compute one for you
(given a FASTA sequence, it runs BLAST & builds the MSA)
Main advantage: filters short HSPs, removes redundant sequences
• Shows conservation scores on the sequence or on a protein structure (if available)
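Removing redundant sequences can be sketched as a greedy identity filter. This is an illustration of the general idea, not ConSurf's actual procedure; the 95% cutoff and the function names are hypothetical. Identity is counted over gap-free columns of already-aligned sequences.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences, over gap-free columns only."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(sequences, max_identity=95.0):
    """Greedy filter: keep a sequence only if it is below max_identity
    to every sequence kept so far.

    sequences: list of (name, aligned_sequence) in priority order
    (e.g. the query first, so it is always kept).
    """
    kept = []
    for name, seq in sequences:
        if all(percent_identity(seq, other) < max_identity for _, other in kept):
            kept.append((name, seq))
    return kept


seqs = [("q", "ACDEFGHIKL"), ("dup", "ACDEFGHIKL"), ("far", "WWWWWGHIKL")]
print([name for name, _ in remove_redundant(seqs)])  # ['q', 'far']
```

Because the filter is greedy, the input order matters: putting the query first guarantees it survives.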
ConSurf
http://consurf.tau.ac.il/results/1321532763/output.php
• MSA colored by conservation
• PSI-BLAST result
• MSA
• Phylogenetic tree
• Sequences used
• Sequence conservation
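For intuition about per-position conservation, here is a deliberately simple Python score: the fraction of residues in each column that match the column's most common residue. ConSurf itself uses a phylogeny-aware method (Rate4Site), so this is a stand-in for illustration only; the function name is hypothetical.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Per-column fraction of non-gap residues equal to the most common one.

    alignment: list of aligned sequences. 1.0 = fully conserved;
    gap-only columns score 0.0.
    """
    scores = []
    for col in zip(*alignment):
        residues = [c for c in col if c != gap]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


print(column_conservation(["ACD", "ACE", "GC-"]))
```

For this toy alignment the scores are about 0.67, 1.0 and 0.5: the middle column is fully conserved.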
Jmol- Easy web-based viewer
WebLogo
http://weblogo.berkeley.edu/logo.cgi
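In a sequence logo, the height of each stack reflects the column's information content in bits. A minimal Python sketch of that quantity for protein columns, log2(20) minus the Shannon entropy of the observed residues, ignoring gaps and without the small-sample correction that logo tools may apply; the function name is hypothetical.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20, gap="-"):
    """Information content (bits) per column: log2(alphabet_size) - entropy."""
    bits = []
    for col in zip(*alignment):
        residues = [c for c in col if c != gap]
        if not residues:
            bits.append(0.0)
            continue
        n = len(residues)
        entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
        bits.append(math.log2(alphabet_size) - entropy)
    return bits


# column 1 is fully conserved (about 4.32 bits), column 2 is an A/C mix (1 bit less)
print(column_information(["AA", "AC"]))
```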
Each sequence is a different story
Adjust parameters:
• BLAST- E-value, substitution matrix, gap penalties, database, minimum length, redundancy level, fragment overlap…
• PSI-BLAST- BLAST parameters + PSSM inclusion threshold (or choose manually), number of rounds…
• Try using HSP or full sequences, different MSA programs…
No “Miracle solution”