CENTRE FOR INTEGRATIVE BIOINFORMATICS VU: Walter Pirovano 24 Oct 2007...
CENTRE FOR INTEGRATIVE BIOINFORMATICS VU
Walter Pirovano
24 Oct 2007
Genome analysis
Lecture 11: literature discussion
[2] 24 Oct 2007 - Walter Pirovano - Genome analysis
Papers
• Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments
  Dariusz Przybylski and Burkhard Rost
  Nucleic Acids Research 2007
• Heads or Tails: A Simple Reliability Check for Multiple Sequence Alignments
  Giddy Landan and Dan Graur
  Molecular Biology and Evolution 2007
[3] 24 Oct 2007 - Walter Pirovano - Genome analysis
1st paper
[4] 24 Oct 2007 - Walter Pirovano - Genome analysis
BLAST and PSI-BLAST
• BLAST is a sequence-sequence method:
  Sequence (query) – Sequence (nr database)
• PSI-BLAST is a profile-sequence method:
  RUN 1: just like normal BLAST
  RUN 2: Profile (query) – Sequence (nr database)
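To illustrate what "Profile (query) – Sequence" means in RUN 2, here is a minimal Python sketch. It is not PSI-BLAST's actual scoring or search heuristic: the uniform background frequency, the pseudocount, the ungapped scoring and the function names `build_profile` and `score_against_profile` are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned_hits, pseudocount=1.0):
    """Turn equal-length aligned hits into per-column log-odds scores.

    Assumption: uniform background frequencies (real tools use observed ones).
    """
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*aligned_hits):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile

def score_against_profile(profile, sequence):
    """Ungapped profile-sequence score: sum of position-specific scores."""
    return sum(column[aa] for column, aa in zip(profile, sequence))

# Toy "RUN 1" hits (hypothetical), already aligned without gaps.
hits = ["FATNMGTSDPPT", "FVTNMNNSDGPT"]
profile = build_profile(hits)

related = score_against_profile(profile, "FATNMGTSDPPT")
unrelated = score_against_profile(profile, "WWWWWWWWWWWW")
print(related > unrelated)
```

The point of the sketch is that each position gets its own score table, so conserved columns weigh more than variable ones, which a single substitution matrix in a sequence-sequence comparison cannot express.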
[5] 24 Oct 2007 - Walter Pirovano - Genome analysis
Accuracy vs. Speed
the usual dilemma …
Sequence – Sequence
Profile – Sequence
Profile – Profile
[Figure: the three method types shown against an ACCURACY axis and a SPEED axis]
[6] 24 Oct 2007 - Walter Pirovano - Genome analysis
Consensus sequences - 1
• “1-D simplification of the sequence profile”
• Compromise between accuracy and speed
Profile (rows A, C, D, .., Y)
Sequence 1: F A T N M G T S D P P T
Sequence 2: F V T N M N N S D G P T
Consensus:  F * T N M * * S D * P T
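The toy example on this slide can be reproduced with a few lines of Python. The slide does not state the rule used to place the `*` symbols, so this sketch assumes the simplest one: keep a residue where the two sequences agree and write `*` where they differ. The function name `toy_consensus` is hypothetical, and the rule used in the paper to derive a consensus from a full profile may well differ.

```python
def toy_consensus(seq_a, seq_b, wildcard="*"):
    """Collapse two aligned sequences into one line.

    Assumption: agreement keeps the residue, disagreement becomes a wildcard.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(a if a == b else wildcard for a, b in zip(seq_a, seq_b))

sequence_1 = "FATNMGTSDPPT"
sequence_2 = "FVTNMNNSDGPT"
consensus = toy_consensus(sequence_1, sequence_2)
print(" ".join(consensus))  # F * T N M * * S D * P T
```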
[7] 24 Oct 2007 - Walter Pirovano - Genome analysis
Consensus sequences - 2
• How can we display consensus sequences?
  • Replace the complete sequence by the consensus sequence (100%)
  • Replace only local parts by consensus segments (top 50% & low 50%)
• Tests on:
  • Sequence – Consensus
  • Consensus – Consensus
  • Profile – Consensus
[8] 24 Oct 2007 - Walter Pirovano - Genome analysis
Method
[9] 24 Oct 2007 - Walter Pirovano - Genome analysis
Evaluation of results
• Ability to identify functionally related proteins
• Correctly align them based on structural alignments
• Function is more conserved than Structure
[10] 24 Oct 2007 - Walter Pirovano - Genome analysis
Functional evaluation: SCOP
[Figure: SCOP hierarchy levels: classes, folds, superfamilies, families]
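One way to make the SCOP levels operational is through the `sccs` identifier that SCOP assigns to each domain, written as class.fold.superfamily.family (for example `a.1.1.2`). The sketch below reports the deepest level two domains share. The function name `deepest_shared_level` and the example identifiers are illustrative assumptions, and the slide does not say which level the paper treats as "related".

```python
LEVELS = ("class", "fold", "superfamily", "family")

def deepest_shared_level(sccs_a, sccs_b):
    """Return the deepest SCOP level shared by two sccs strings, or None."""
    shared = None
    for level, part_a, part_b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if part_a != part_b:
            break  # the hierarchy is nested, so stop at the first mismatch
        shared = level
    return shared

# Illustrative identifiers only.
print(deepest_shared_level("a.1.1.2", "a.1.1.3"))  # superfamily
print(deepest_shared_level("a.1.1.2", "a.1.2.1"))  # fold
print(deepest_shared_level("a.1.1.2", "b.1.1.2"))  # None
```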
[11] 24 Oct 2007 - Walter Pirovano - Genome analysis
Structural evaluation: 3D model quality
query:    MAGFWIL
template: MLGKSLL
• Making the model: simply copy coordinates
• Test model quality through LGA superposition (query model with query structure)
• ‘Gold standard’: structural alignment of known structure of query & template with MAMMOTH
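"Simply copy coordinates" can be sketched as follows: walk along the query-template alignment and, for every column where both have a residue, give the query residue the template residue's coordinates. This is a minimal sketch with made-up coordinates and a hypothetical function name `copy_coordinates`; it does not reproduce LGA or MAMMOTH, which are separate programs.

```python
def copy_coordinates(query_aln, template_aln, template_coords, gap="-"):
    """Build a crude query model from an alignment.

    Returns {query_residue_index: (x, y, z)} for aligned (non-gap) columns.
    Query residues aligned to a gap get no coordinates.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("alignment rows must have the same length")
    model = {}
    q_idx = t_idx = 0
    for q_res, t_res in zip(query_aln, template_aln):
        if q_res != gap and t_res != gap:
            model[q_idx] = template_coords[t_idx]
        if q_res != gap:
            q_idx += 1
        if t_res != gap:
            t_idx += 1
    return model

# Made-up C-alpha positions for the 7 template residues (assumption).
template_coords = [(float(i), 0.0, 0.0) for i in range(7)]
model = copy_coordinates("MAGFWIL", "MLGKSLL", template_coords)
print(len(model), model[3])
```

A poor alignment assigns query residues to the wrong template positions, so the quality of the resulting model reflects the quality of the alignment.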
[12] 24 Oct 2007 - Walter Pirovano - Genome analysis
Final sets for alignment test
• Set 1: most related, non-trivial pairs (no. = 1647)
• Set 2: more difficult, most diverged (no. = 5551)
[13] 24 Oct 2007 - Walter Pirovano - Genome analysis
Results functional analysis
[14] 24 Oct 2007 - Walter Pirovano - Genome analysis
Results structural analysis
[15] 24 Oct 2007 - Walter Pirovano - Genome analysis
2nd paper
[16] 24 Oct 2007 - Walter Pirovano - Genome analysis
There are quite a few multiple alignment methods ...
PRALINE
... but what about accuracy?
[17] 24 Oct 2007 - Walter Pirovano - Genome analysis
Benchmarking: usually done on structural alignments.
• There are several alignment benchmarks, such as BAliBASE, HOMSTRAD or SABMARK
• But they can only tell us the alignment quality on their predefined sets
• Alignment methods need to define quality and consistency criteria.
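As an illustration of how a benchmark can put a number on alignment quality, a common measure is a sum-of-pairs style score: the fraction of residue pairs aligned in the reference alignment that the tested alignment also aligns. The sketch below is a simplified version under that assumption; the function names are hypothetical and it is not the scoring program of any particular benchmark.

```python
from itertools import combinations

def aligned_pairs(alignment, gap="-"):
    """Set of residue pairs ((seq, pos), (seq, pos)) that share a column."""
    counters = [0] * len(alignment)  # next residue index per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_idx, char in enumerate(column):
            if char != gap:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        pairs.update(combinations(present, 2))
    return pairs

def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    reference_pairs = aligned_pairs(reference)
    if not reference_pairs:
        return 0.0
    return len(aligned_pairs(test) & reference_pairs) / len(reference_pairs)

reference = ["AC-GT", "ACAGT"]  # toy reference alignment
test = ["ACG-T", "ACAGT"]       # same sequences, one residue misplaced
print(sum_of_pairs_score(test, reference))  # 0.75
```

Such a score is only defined where a trusted reference exists, which is the limitation the slide points out.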
[18] 24 Oct 2007 - Walter Pirovano - Genome analysis
Heads-or-Tails method
?