Multiple sequence comparison (MSC)

Reading: Setubal/Meidanis, 3.4

Gusfield, Algorithms on Strings, Trees and Sequences, chapter 14

Why care about similarity?

• Similar sequences have similar structure

Similar structure -> similar sequence?

• No, the converse is not true!

• Convergent evolution. Outwardly similar solutions to similar problems may be internally different.

• Tiger and ‘Tasmanian tiger’. Fish and dolphin. Bat and bird.

• Same is true of molecular ‘species’ and ‘anatomies’!

Sequence --> function

• Similar sequences have similar function

• ‘[T]he same genes that work in flies are the ones that work in humans.’ -- Eric Wieschaus, 1995 Nobel Prize for Drosophila work

Common origins

• Similar sequences have common origins

• ‘Descent with modification’ is Nature’s design mechanism

• Strong similarity may imply recent common origin (what do we mean by ‘strong’ and ‘recent’?)

• Strong similarity may imply strong conservation of sequence or motif

Is multiple sequence comparison a generalization?

• From a CS point of view, we’re going from two strings to many strings: a generalization

• Yes, in that it helps detect faint similarities

• No, in that we go from known biological similarity to suspected sequence similarity

‘Big’ uses for MSC

• Represent protein families

• Identify conserved sequence features

• Deduce evolutionary history

Profile representation

• Definition: Given a multiple alignment of a set of strings, a profile specifies, for each column, the frequency of each character (including the gap character) in that column

Profile example

Alignment

a b c - a

a b a b a

a c c b -

c b - b c

Profile

    C1   C2   C3   C4   C5
a  .75    0  .25    0  .50
b    0  .75    0  .75    0
c  .25  .25  .50    0  .25
-    0    0  .25  .25  .25
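The profile for the example alignment can be recomputed mechanically. The following is a minimal sketch, assuming the alignment is given as equal-length strings with `-` as the gap character; the function name `build_profile` is a hypothetical helper, not something named in the slides.

```python
from collections import Counter

def build_profile(alignment):
    """Return one {character: frequency} dict per alignment column."""
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    profile = []
    for j in range(ncols):
        # Count every character in column j, gaps included.
        counts = Counter(row[j] for row in alignment)
        profile.append({ch: n / len(alignment) for ch, n in counts.items()})
    return profile

# The four-row alignment from the example above.
alignment = ["abc-a", "ababa", "accb-", "cb-bc"]
profile = build_profile(alignment)
for j, column in enumerate(profile, start=1):
    print(f"C{j}", dict(sorted(column.items())))
```

Characters that never occur in a column are simply absent from that column's dict, which corresponds to a frequency of 0.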

Fit string S to profile P

• Given a profile P and a string S, what is the best alignment (fit) of S to P?

• Example:

S: A a b - b c

P: 1 - 2 3 4 5

Two key issues

• How to score an alignment of a string to a profile

• How to compute an optimal alignment, given a scoring system

Scoring and alignment of profile

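• Scoring: assuming letter-to-letter scores are given, score a character against a column as the sum of its letter-to-letter scores, weighted by the column’s character frequencies

• Optimal alignment: by dynamic programming (DP), similar to string-to-string optimal alignment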

• Q: How would you do profile-to-profile scoring and alignment?
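The weighted-sum scoring and the DP can be sketched together. This is an illustrative sketch only: the letter-to-letter scores in `pair_score` (match +2, mismatch -1, character against gap -1, gap against gap 0) are assumptions chosen for the example, and the names `column_score` and `fit_string_to_profile` are hypothetical. A profile is a list of `{character: frequency}` dicts, one per column.

```python
def pair_score(a, b):
    """Assumed letter-to-letter scores; substitute any scoring scheme."""
    if a == "-" and b == "-":
        return 0.0
    if a == "-" or b == "-":
        return -1.0
    return 2.0 if a == b else -1.0

def column_score(ch, column):
    """Weighted sum: score of ch against each character, times its frequency."""
    return sum(freq * pair_score(ch, c) for c, freq in column.items())

def fit_string_to_profile(s, profile):
    """Value of the best global alignment of string s to the profile."""
    n, m = len(s), len(profile)
    V = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # s[i-1] placed against a new all-gap column
        V[i][0] = V[i - 1][0] + pair_score(s[i - 1], "-")
    for j in range(1, m + 1):          # profile column j aligned to a gap in s
        V[0][j] = V[0][j - 1] + column_score("-", profile[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + column_score(s[i - 1], profile[j - 1]),
                V[i - 1][j] + pair_score(s[i - 1], "-"),
                V[i][j - 1] + column_score("-", profile[j - 1]),
            )
    return V[n][m]

# A two-column profile written out by hand for illustration.
small_profile = [{"a": 0.75, "c": 0.25}, {"b": 0.75, "c": 0.25}]
print(fit_string_to_profile("ab", small_profile))
```

The recurrence has the same three cases as string-to-string alignment; only the cost of each case changes, because one side of every comparison is a column of frequencies rather than a single character.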

Signature (motif) representation

• A motif is a regular expression (re)

• Example: a helicase motif

[&H][&A]D[DE]xn[TSN][x4][QK]Gx7[&A], where

– [abc] = any of a, b, c

– & = [ILVMFYW]

– x = any amino acid

– a3 = up to 3 a’s

– an = any number of a’s

• Find a motif by grep-ing
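Searching for a motif by "grep-ing" can be sketched with Python's `re` module. The translation below is an assumption-laden reading of the slide notation: `&` is expanded to `ILVMFYW`, `x` becomes `[A-Z]`, `xn` becomes `[A-Z]*`, `x4` and `x7` become `{0,4}` and `{0,7}` (following "a3 = up to 3 a's"), and the second group is read as `[&A]D[DE]`. The test sequences are synthetic, not real helicases.

```python
import re

AMP = "ILVMFYW"  # the class written as & in the motif

# Hand-translated regular expression for the helicase motif (assumed reading).
HELICASE = re.compile(
    f"[{AMP}H]"      # [&H]
    f"[{AMP}A]"      # [&A]
    "D"              # D
    "[DE]"           # [DE]
    "[A-Z]*"         # xn: any number of any amino acid
    "[TSN]"          # [TSN]
    "[A-Z]{0,4}"     # x4: up to 4 of any amino acid
    "[QK]"           # [QK]
    "G"              # G
    "[A-Z]{0,7}"     # x7: up to 7 of any amino acid
    f"[{AMP}A]"      # [&A]
)

def find_motif(sequence):
    """Return (start, matched_text) for the first motif hit, or None."""
    m = HELICASE.search(sequence)
    return (m.start(), m.group()) if m else None

print(find_motif("MKLADEKKTAAQGLPP"))  # synthetic sequence containing the motif
print(find_motif("MKLADEKKAAAQGL"))    # synthetic sequence lacking [TSN]
```

Because `xn` allows an unbounded stretch, a hit can span a long region; a practical search would probably cap that stretch, which is a design choice outside what the slide specifies.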

Finding optimal MS alignment

• Need a scoring system

• Given a scoring system, an (efficient) method of calculation

• If no efficient method of getting the right answer, an efficient way of getting a plausible answer

Need MSC measure

• Desirable characteristics:

– variable number of sequences

– column-wise calculation

– order independence

MQPILLL

MLR-LL-

MK-ILLL

MPPVLIL

Sum-of-pairs (SP) measure

• Column score = sum of the pairwise scores within the column

• k choose 2 = k(k-1)/2 pairs per column, for k sequences

• Reduces to pairwise alignment when k = 2

• Need to assign a value to the (-,-) pair

• May compute in either row or column order
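The SP measure and the row/column equivalence can be checked on the four-sequence alignment shown earlier. This is a minimal sketch; the pair scores (match +1, mismatch -1, character against gap -2, and (-,-) set to 0) are assumptions for illustration, and the function names are hypothetical.

```python
from itertools import combinations

def pair_score(a, b):
    """Assumed scores: match +1, mismatch -1, char/gap -2, gap/gap 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1

def sp_by_columns(alignment):
    """Sum, over columns, of the scores of all k-choose-2 pairs in the column."""
    return sum(
        pair_score(a, b)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )

def sp_by_rows(alignment):
    """Sum, over all pairs of rows, of the score of the induced pairwise alignment."""
    return sum(
        pair_score(a, b)
        for row1, row2 in combinations(alignment, 2)
        for a, b in zip(row1, row2)
    )

alignment = ["MQPILLL", "MLR-LL-", "MK-ILLL", "MPPVLIL"]
print(sp_by_columns(alignment), sp_by_rows(alignment))
```

Both functions add up exactly the same set of pair scores in a different order, which is why the two totals agree. Scoring (-,-) as 0 means a shared gap column neither rewards nor penalises a pair of rows.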

DP approach

• Generalization of two-sequence comparison

• k-dimensional array

• Space complexity is O(n^k) for k sequences of length n

• MSC with SP measure is NP-complete
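The k-dimensional generalization can be written directly for tiny inputs. The sketch below is illustrative only: it assumes the SP measure with match +1, mismatch -1, character against gap -2 and (-,-) scored 0, and the name `msa_sp_score` is hypothetical. It fills one table entry per cell of the k-dimensional array, so it is usable only for a handful of very short sequences.

```python
from itertools import combinations, product

def pair_score(a, b):
    """Assumed scores: match +1, mismatch -1, char/gap -2, gap/gap 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1

def column_score(column):
    """SP score of one alignment column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def msa_sp_score(seqs):
    """Optimal SP score of a multiple alignment of seqs, by k-dimensional DP."""
    k = len(seqs)
    lengths = [len(s) for s in seqs]
    # Each move says which sequences contribute a character to the new column.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best = {(0,) * k: 0}
    # product() yields cells in lexicographic order, so every predecessor
    # of a cell has already been filled in when the cell is reached.
    for cell in product(*(range(n + 1) for n in lengths)):
        if not any(cell):
            continue
        candidates = []
        for move in moves:
            prev = tuple(c - d for c, d in zip(cell, move))
            if min(prev) < 0:
                continue
            column = [seqs[i][cell[i] - 1] if move[i] else "-" for i in range(k)]
            candidates.append(best[prev] + column_score(column))
        best[cell] = max(candidates)
    return best[tuple(lengths)]

print(msa_sp_score(["AC", "AC", "AC"]))
```

Each cell looks at up to 2^k - 1 predecessors, on top of the table size that grows as the product of the sequence lengths, which is why the exact method does not scale.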

MSA speedup heuristic

• This ‘heuristic’ guarantees the right answer!

• But .. it doesn’t guarantee the speedup

• General idea:

– find a lower bound on L

– if value for a cell exceeds L, it cannot enter into an optimal solution

Common method -- iterative

• Simplest implementation

• Begin with Si and Sj which are pairwise closest

• Iteratively merge in the additional string with the smallest edit distance to any string already in the multiple alignment

• Equivalent to finding a minimum spanning tree on the graph of pairwise edit distances
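The order in which strings get merged can be sketched as follows. This covers only the choice of merge order, not the merging of each string into the growing alignment; unit-cost edit distance is an assumption, and `merge_order` is a hypothetical helper name.

```python
def edit_distance(a, b):
    """Unit-cost edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def merge_order(seqs):
    """Return (anchor, new) index pairs in the order strings would be merged."""
    k = len(seqs)
    d = [[edit_distance(a, b) for b in seqs] for a in seqs]
    # Start with the pairwise closest strings Si and Sj.
    i, j = min(((i, j) for i in range(k) for j in range(i + 1, k)),
               key=lambda p: d[p[0]][p[1]])
    in_alignment = [i, j]
    remaining = set(range(k)) - {i, j}
    order = [(i, j)]
    while remaining:
        # Pick the outside string closest to any string already merged.
        new, anchor = min(((r, a) for r in sorted(remaining) for a in in_alignment),
                          key=lambda p: d[p[0]][p[1]])
        order.append((anchor, new))
        in_alignment.append(new)
        remaining.remove(new)
    return order

print(merge_order(["AAAA", "AAAT", "CCCC", "AATT"]))
```

Each returned pair names the already-merged string that pulled the new one in, so the pairs form a tree over the input strings. Ties are broken by index order here, which is an arbitrary choice.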

Clustering method

• Almost any clustering algorithm can be adapted to MSC

• Usually start with small clusters and build big ones

• Also possible to start with one big cluster and divide and conquer

• Not clear which method is best
