RNA and Bioinformatics
Daniel Gautheret, V.2006.0, http://www.esil.univ-mrs.fr/~dgaut/Cours

Course outline
• Session 1
– RNA molecules
– RNA structures
  • Secondary
  • Tertiary
– Structure prediction
  • Energies and covariation
• Session 2
– RNomics
  • Experimental results, RFAM
– Searching for RNAs in genomes
  • Known RNAs
  • Unknown RNAs
• Session 3
– Practical: tRNA in 3D / Mfold / Erpin

1a. RNA molecules

Messenger RNA

Non-messenger RNAs, or non-coding RNAs (ncRNA)
• tRNA
• rRNA
• snRNA
• snoRNA
• microRNA
• Other RNAs: 4.5S, 10Sa, Spot42, DicF, MicF, OxyS, DsrA, 6S (prokaryotes); products of the XIST, H19, IPW, 7H4, His-1 and NTT genes (mammals).
18S and 28S rRNA: pol-I promoters. tRNA and 5S rRNA: pol-III promoters.

Transfer RNA (tRNA)
• Potentially 64 different tRNAs in a genome
• In practice, about forty in microbial genomes and a few hundred in mammalian genomes.
• The first RNA observed in 3D (1974)

Ribosomal RNA (rRNA)
• 5S, 16S and 23S in prokaryotes (about 120, 1500 and 3000 nt)
• 5S, 5.8S, 18S and 28S in eukaryotes.
• Complexed with proteins (~40)
• Present in a few copies in prokaryotes (7 operons in E. coli)
• Vertebrate genomes can hold several hundred identical copies (gene clusters).
• 3D structure in 1999
Source: http://www.cytochemistry.net/Cell-biology/ribosome.htm

Small nuclear RNA (snRNA)
• Small RNAs of the nucleus involved in splicing and telomere maintenance
• Complexed with proteins
• Several kinds, named U1, U2, U4, U6, U12…
Source: RFAM

Small nucleolar RNAs (snoRNAs)
• Guide rRNA methylation
• Eukaryotes and archaea
• Two families: box C/D and box H/ACA
• Nomenclature: E2, E3, U3, U14, U23…
Source: plant snoRNA database http://bioinf.scri.sari.ac.uk/cgi-bin/plant_snorna/home

microRNA (miRNA)
[Figure: miRNA biogenesis. Nucleus: the miRNA transcription unit gives a polycistronic transcript (pri-miRNA), which is cut (Drosha) into a stem-loop precursor (pre-miRNA, 70 nt). Cytoplasm: Dicer yields the mature miRNA (21-22 nt), which joins the miRNA+RISC complex; the effect on the target mRNA is translation block or cleavage.]

miRNA
• Found in animals and plants
• 300 to 1000 in humans
• A single miRNA can target 100 genes or more
• Searching for miRNAs
– Conservation
– Structure

RNA genomes
• RNA viruses
– Double-stranded, single-stranded
– Partly coding, partly non-coding
Source: Wikipedia

1b. RNA structure

The bases

Ribose or deoxyribose
[Figure: sugar ring with atoms C1′, C2′, C3′, C4′, C5′, O2′, O3′, O4′ and O5′ labeled. 2′-OH: ribose.]
Credit: Richard Hallick. http://www.blc.arizona.edu/Molecular_Graphics/DNA_Structure/DNA_Tutorial.HTML

Nucleosides
Credit: Richard Hallick. http://www.blc.arizona.edu/Molecular_Graphics/DNA_Structure/DNA_Tutorial.HTML

The DNA/RNA chain
[Figure: nucleotide backbone with atoms C1′ to C5′, O2′, O3′, O4′, O5′, the phosphate (P, OA, OB) and the base labeled.]
Credit: Richard Hallick. http://www.blc.arizona.edu/Molecular_Graphics/DNA_Structure/DNA_Tutorial.HTML

The nucleotide's degrees of freedom
The sugar torsion angles ν0 to ν4 are summarized by two values: phase and amplitude.

Sugar pucker
Because of non-covalent interactions between the ring substituents, the substituents tend to sit as far from one another as possible. As a result, 4 of the 5 ribose ring atoms lie approximately in a plane, while the 5th is displaced from that plane by about 0.5 Å. 4 major conformations: C2′-endo, C3′-exo, C3′-endo, C2′-exo.
Endo: on the base side; exo: away from the base.
[Figure: C3′-endo and C2′-endo puckers]
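
The reduction of the five ring torsions to a phase and an amplitude can be sketched as follows. This assumes the usual pseudorotation description, in which each torsion follows ν_j = ν_max · cos(P + 144°·(j − 2)); the function names and the phase ranges used for classification are illustrative choices, not taken from the slides.

```python
import math

# Constant from the pseudorotation formula: 2 * (sin 36 deg + sin 72 deg).
K = 2.0 * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (phase in degrees within [0, 360), amplitude in degrees).

    Uses tan P = ((nu4 + nu1) - (nu3 + nu0)) / (K * nu2); atan2 picks the
    correct quadrant, so no separate correction is needed when nu2 < 0.
    """
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = K * nu2
    phase = math.degrees(math.atan2(numerator, denominator)) % 360.0
    amplitude = math.hypot(numerator, denominator) / K
    return phase, amplitude


def torsions_from(phase, amplitude):
    """Inverse helper: ideal torsions nu0..nu4 for a given phase and amplitude."""
    return [amplitude * math.cos(math.radians(phase + 144.0 * (j - 2)))
            for j in range(5)]


def classify(phase):
    """Rough labels (assumed ranges): 0-36 deg is C3'-endo, 144-180 deg is C2'-endo."""
    if 0.0 <= phase <= 36.0:
        return "C3'-endo"
    if 144.0 <= phase <= 180.0:
        return "C2'-endo"
    return "other"


phase, amplitude = pseudorotation(*torsions_from(162.0, 38.0))
print(round(phase, 1), round(amplitude, 1), classify(phase))
```

Feeding real torsion angles measured from a structure file gives the pucker of each residue; the round trip above only checks that the two functions are consistent with each other.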

Base orientation
[Figure: a G–C pair drawn with its atoms (N, O, H) and the 3′ ends labeled]
– Effect on strand orientation (parallel or antiparallel):

Constraints on torsion angles
– Conformational wheels

The double helix
(DNA)

Watson-Crick pairs
The distance between the two attachment points is identical for A–T and G–C pairs.
The same geometry is compatible with A–T and T–A pairs, or with G–C and C–G pairs.

The grooves
[Figure labels: major groove, minor groove, helix axis, pseudo-symmetry axis]

Hydrogen-bond donors and acceptors: the identity of base pairs
[Figure labels: major groove, minor groove, donor, acceptor, P, O, OH (RNA)]

A, B and Z helices
A: DNA/RNA   B: DNA   Z: DNA/RNA

Axial views of the A and B helices

Helix types
Depending on: ionic strength, solvents, degree of hydration.

| | A | B | Z |
|---|---|---|---|
| Found in | DNA/RNA | DNA | DNA/RNA |
| Helix sense | right | right | left |
| Nt/turn | 11 | 10 | 12 |
| Sugar conformation | 3′-endo | 2′-endo | 2′-endo (py), 3′-endo (pu) |
| Diameter | 26 Å | 20 Å | 18 Å |
| Glycosidic bond | anti | anti | anti (py), syn (pu) |
| Bp displacement from axis | 4 Å | none | |
| Major groove width | 3 Å | 12 Å | flat |
| Major groove depth | 13.5 Å | 9 Å | |
| Minor groove width | 11 Å | 6 Å | narrow |
| Minor groove depth | 3 Å | 7.5 Å | deep |

Base stacking
Stacking in a B-type strand
• Stacking is not imposed by the double helix: it is one of the main factors stabilizing nucleic acids.
• Aromatic rings about 4 Å apart
• Causes: hydrophobicity and van der Waals interactions
• YpR and RpY sequences stack differently: the sequence affects the stability and the shape of the helix.

Helix parameters
A double helix can be defined by the following parameters:
• Tilt (theta-t): rotation about the pseudo-symmetry axis
• Twist (t): rotation per residue about the helix axis
• Propeller twist (theta-p): between a base and its partner
• Axial rise (h): rise per residue
• Dislocation (D): distance between the base-pair center and the helix axis
• Roll (theta-r): angle about the third axis (C6-C8 axis).
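
As a small worked example of the twist and axial rise parameters: the twist per residue follows from the number of residues per turn (11, 10 and 12 for the A, B and Z helices in these slides), and the pitch is the number of residues per turn times the rise. This is only a sketch; the function names are hypothetical, the negative sign for a left-handed helix is a convention chosen here, and the 3.4 Å rise used for B-DNA is the common textbook value rather than a figure from the slides.

```python
def twist_per_residue(nt_per_turn, left_handed=False):
    """Rotation about the helix axis per residue, in degrees."""
    if nt_per_turn <= 0:
        raise ValueError("nt_per_turn must be positive")
    twist = 360.0 / nt_per_turn
    return -twist if left_handed else twist


def pitch(nt_per_turn, rise_per_residue):
    """Length of one full helical turn, in the same unit as the rise."""
    return nt_per_turn * rise_per_residue


for name, nt, left in [("A", 11, False), ("B", 10, False), ("Z", 12, True)]:
    print(name, round(twist_per_residue(nt, left), 1), "degrees per residue")

# Assumed textbook rise for B-DNA, not a value from the slides.
print("B-DNA pitch:", round(pitch(10, 3.4), 1), "angstroms")
```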

RNA: secondary structures
Excerpt from 23S rRNA

Secondary and tertiary interactions

Secondary structure elements
a. hairpin
b. internal loop
c. bulge
d. multi-branch loop
e. duplex (long-range)
f. pseudoknot
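
To make two of these elements concrete, here is a minimal sketch that reads a structure in dot-bracket notation (an assumption: the slides do not use this notation), lists the base pairs that close hairpin loops, and tests a set of pairs for the crossing pattern that defines a pseudoknot. All function names are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of base pairs (i, j), 0-based, with i < j."""
    stack, pairs = [], []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % index)
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def hairpin_closing_pairs(pairs):
    """Pairs that enclose no other pair: each one closes a hairpin loop."""
    return [(i, j) for (i, j) in pairs
            if not any(i < k and l < j for (k, l) in pairs)]


def has_pseudoknot(pairs):
    """True if two pairs cross, i.e. i < k < j < l."""
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


pairs = parse_dot_bracket("((...))..((....))")
print(hairpin_closing_pairs(pairs))
print(has_pseudoknot(pairs))
print(has_pseudoknot([(0, 10), (5, 15)]))
```

Plain dot-bracket strings cannot encode crossing pairs, which is why the pseudoknot check takes a list of pairs rather than a string.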

Secondary and tertiary: an example…
– The 1030-1124 region of 23S rRNA has several tertiary interactions: two canonical lone base pairs (1082:1086 and 1087:1102), a base triple ((1092:1099)1072) and three non-canonical tertiary base pairs (1032:1122, 1039:1116, and 1040:1115)

tRNA
Many unusual nucleosides are present, e.g. inosine (I), pseudouridine (Ψ), dihydrouridine (D), ribothymidine (T), Y base, etc.

tRNA: from 2D to 3D
Source: Molecular Biology of the Cell (http://www.garlandscience.com/textbooks/0815332181.asp)

tRNA
[Figure labels: TΨC loop, D loop, variable loop, acceptor stem, anticodon loop]
"L" shape
Interactions in the tertiary structure:
Formation of base triples; interactions with the backbone phosphates (P–) and with the ribose 2′-OH

tRNA: tertiary interactions

The ribosome
Total mass: 2.6 × 10³ kDa (100 times more than lysozyme)
Composition: 1/3 protein, 2/3 nucleotides
30S subunit: interaction with the mRNA codons and the tRNA anticodons
50S subunit: peptidyl-transferase activity and interaction with GTP-binding proteins.
Best structures currently available:
70S ribosome of T. thermophilus at 7.8 Å (1999)
30S subunit of T. thermophilus at 4.5 Å (1999)
50S subunit of H. marismortui at 2.4 Å (2000)

23S: secondary and tertiary interactions (5′ half)

23S: secondary and tertiary interactions (3′ half)

Ribosome: tRNA binding sites
Source: Molecular Biology of the Cell (http://www.garlandscience.com/textbooks/0815332181.asp)

The large subunit (23S RNA)
Crystal structure of the large ribosomal subunit from Haloarcula marismortui at 2.4 angstrom resolution
Large subunit: 35 proteins + 2 RNAs (23S, 5S)
Nenad Ban, Poul Nissen, Jeffrey Hansen, Peter B. Moore, Thomas A. Steitz. Science 289:878-9, 2000

Proteins of the large subunit

Domain folding is dominated by inter-helix interactions

A proposed mechanism of peptide synthesis catalyzed by the ribosome. A2486 (A2451) is shown as the standard tautomer in all steps, but could be represented as the imino tautomer, which would have a negative unprotonated N3 and a neutral protonated N3. We expect that the electronic distribution is actually between these two extremes. (A) The N3 of A2486 abstracts a proton from the NH2 group as the latter attacks the carbonyl carbon of the peptidyl-tRNA. (B) A protonated N3 stabilizes the tetrahedral carbon intermediate by hydrogen bonding to the oxyanion. (C) The proton is transferred from the N3 to the peptidyl tRNA 3' OH as the newly formed peptide deacylates. Among the variations on this mechanism that should be considered would be a protonated A2486 stabilizing the intermediate, as in (B), with less contribution on acid-base catalysis, as shown in (A) and (C).
Slide annotations:
• Peptidyl-transferase center: main function of the 50S subunit
• P site, A site
• Proton abstraction by N3 of A2486
• Nucleophilic attack
• Stabilization of the tetrahedral carbon by N+
• P site freed by transfer of the peptide onto the A-site tRNA
• Stabilization of the imino tautomer of A2486 raises the pKa of A2486 N3 to 7.6
• Note the tertiary interactions
• No peptide within 18 Å

The polypeptide exit tunnel. (A) The subunit has been cut in half, roughly bisecting its central protuberance and its peptide tunnel along the entire length. The two halves have been opened like the pages of a book. All ribosome atoms are shown in space-filling representation, with all RNA atoms that do not contact solvent shown in white and all protein atoms that do not contact solvent shown in green. Surface atoms of both protein and RNA are color-coded with carbon yellow, oxygen red, and nitrogen blue. A possible trajectory for a polypeptide passing through the tunnel is shown as a white ribbon. PT, peptidyl transferase site. (B) Detail of the polypeptide exit tunnel showing distribution of polar and nonpolar groups, with atoms colored as in (A), the constriction and bend in the tunnel formed by proteins L4 and L22 (green patches close to PT), and the relatively wide exit of the tunnel. A modeled polypeptide is in white. (C) The tunnel surface is shown with backbone atoms of the RNA color coded by domain. Domains I (yellow), II (light blue), III (orange), IV (green), V (light red), 5S (pink), and proteins are blue.
Polypeptide exit tunnel

RNA motifs
[Figure, panels a to e. Panel labels: GNRA loop; GNRA–internal loop interaction; U-turn; UNCG loop; adenine platform (stacking); E loop]
Some RNA motifs. Dashed lines denote tertiary interactions. Arrows indicate the 5′→3′ direction. The question mark in motif d indicates that the partner of this tertiary interaction is not known.

The G:U wobble pair

The Hoogsteen pair
TΨC loop of yeast tRNA-Phe

The U-turn
Anticodon loop of yeast tRNA-Phe

The GNRA loop

Base triples
Triple helix 10-13 + 22-25 + 9+45+46 of yeast tRNA-Phe

1c. Predicting RNA secondary structures

Dot plots

The Zuker and Stiegler algorithm
• A dynamic programming algorithm applied to secondary structures
• It finds the lowest-energy structures for all sub-fragments of a sequence (Zuker and Stiegler, 1981).
• It guarantees an optimal solution within the chosen energy model. It does not allow pseudoknots.
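
The idea of solving every sub-fragment first can be illustrated with a much simpler stand-in. The sketch below is not the Zuker and Stiegler energy minimization: it is the base-pair maximization recursion (often attributed to Nussinov), which shares the same dynamic programming layout but scores each pair as +1 instead of using free energies. The minimum hairpin loop size of 3 and all names are assumptions made for the example. Like the algorithm described above, it cannot produce pseudoknots.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(sequence, min_loop=3):
    """Maximum number of nested base pairs, with at least min_loop
    unpaired bases inside every hairpin."""
    n = len(sequence)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the sub-fragment sequence[i..j].
    best = [[0] * n for _ in range(n)]
    # Fill by increasing fragment length so shorter fragments are ready.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (sequence[k], sequence[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


print(max_pairs("GGGAAAUCC"))
```

Replacing the "+1" by loop and stacking energies, and maximization by minimization, is what turns this layout into an energy-based folding algorithm; that step needs additional tables and is omitted here.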

Predicting secondary structures: stacking free energies (Turner et al.)
– Within the helix (Watson-Crick and GU pairs only):
Each block reads 5′-UX-3′ over 3′-AY-5′ (then 3′-CY-5′, 3′-GY-5′, 3′-UY-5′). Rows give X, columns give Y; a dot means no value is listed.

| X | UX/AY: A | C | G | U | UX/CY: A | C | G | U | UX/GY: A | C | G | U | UX/UY: A | C | G | U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | . | . | . | -8.1 | . | . | . | . | . | . | . | -4.0 | . | . | . | . |
| C | . | . | -13.3 | . | . | . | . | . | . | . | -5.3 | . | . | . | . | . |
| G | . | -10.5 | . | -4.0 | . | . | . | . | . | -1.8 | . | -5.3 | . | . | . | . |
| U | -6.6 | . | -3.6 | . | . | . | . | . | -3.6 | . | -3.6 | . | . | . | . | . |

– Caution: A·U stacked on U·A is not the same stack as U·A stacked on A·U; strand orientation matters.
[Figure: two stacks of A·U pairs with the 5′ ends labeled]
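
Reading such a table is easier with a small lookup. The sketch below stores only the two non-empty blocks shown on the slide (top strand 5′-UX-3′ over bottom strand 3′-AY-5′ or 3′-GY-5′). The numbers are copied as printed; their unit is not stated on the slide, and the dictionary layout and function name are hypothetical. A full energy evaluation would need the blocks for the other closing pairs, which are not given here.

```python
# Keys: (first base of the bottom strand, X, Y); values as printed on the slide.
STACK_AFTER_U = {
    ("A", "A", "U"): -8.1,
    ("A", "C", "G"): -13.3,
    ("A", "G", "C"): -10.5,
    ("A", "G", "U"): -4.0,
    ("A", "U", "A"): -6.6,
    ("A", "U", "G"): -3.6,
    ("G", "A", "U"): -4.0,
    ("G", "C", "G"): -5.3,
    ("G", "G", "C"): -1.8,
    ("G", "G", "U"): -5.3,
    ("G", "U", "A"): -3.6,
    ("G", "U", "G"): -3.6,
}


def stack_value(top, bottom):
    """Look up a stack given the top strand (written 5'->3') and the
    bottom strand (written 3'->5'), e.g. top="UA", bottom="AU"."""
    if len(top) != 2 or len(bottom) != 2:
        raise ValueError("each strand must be a dinucleotide")
    if top[0] != "U":
        raise KeyError("only the 5'-UX-3' blocks are available in this sketch")
    return STACK_AFTER_U[(bottom[0], top[1], bottom[1])]


# Same base composition, different arrangement, different value.
print(stack_value("UA", "AU"))
print(stack_value("UU", "AA"))
```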

Turner's free energies (continued)
– Terminal mismatches:
Each block reads 5′-AX-3′ over 3′-AY-5′ (then 3′-CY-5′, 3′-GY-5′, 3′-UY-5′). Rows give X, columns give Y; a dot means no value is listed.

| X | AX/AY: A | C | G | U | AX/CY: A | C | G | U | AX/GY: A | C | G | U | AX/UY: A | C | G | U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | . | . | . | -4.0 | . | . | . | -4.3 | . | . | . | -3.8 | -4.3 | -6.0 | -6.0 | -6.0 |
| C | . | . | -5.2 | . | . | . | -7.2 | . | . | . | -7.1 | . | -2.6 | -2.4 | -2.4 | -2.4 |
| G | . | -10.3 | . | -7.2 | . | -5.2 | . | -4.8 | . | -9.4 | . | -6.6 | -3.4 | -6.9 | -6.9 | -6.9 |
| U | -4.3 | . | -4.3 | . | -2.6 | . | -2.6 | . | -3.4 | . | -3.4 | . | -3.3 | -3.3 | -3.3 | -3.3 |

[Figure: example of a terminal mismatch with the 5′ ends labeled]

Turner's free energies (continued)
– Dangling ends:
Each block reads 5′-AX-3′ over a single base 3′-A-5′ (then 3′-C-5′, 3′-G-5′, 3′-U-5′). Columns give X; a dot means no value is listed.

| AX/A: A | C | G | U | AX/C: A | C | G | U | AX/G: A | C | G | U | AX/U: A | C | G | U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| . | . | . | . | . | . | . | . | . | . | . | . | -4.9 | -0.9 | -5.5 | -2.3 |

[Figure: example of a dangling end with the 5′ ends labeled]

Comparative analysis
Comparative analysis is the most reliable way to determine secondary structures.
It relies on detecting covariations.
It requires several aligned homologous sequences.
It works even for non-Watson-Crick pairs!

Measuring covariation
• Contingency table: table p normal 53 61
Rows: position 53; columns: position 61. Values are percentages, with the marginal percentage of each base in parentheses.

| 53 \ 61 | a (0) | c (97) | g (0) | u (3) | - (0) |
|---|---|---|---|---|---|
| a (3) | 0 | 0 | 0 | 3 | 0 |
| c (0) | 0 | 0 | 0 | 0 | 0 |
| g (97) | 0 | 97 | 0 | 0 | 0 |
| u (0) | 0 | 0 | 0 | 0 | 0 |
| - (0) | 0 | 0 | 0 | 0 | 0 |

gc = (29, 28.03, 96.7%)   au = (1, 0.03, 3.3%)
• Tests:
– Chi-square
– Mutual information
– Phylogenetic events
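
A minimal sketch of two of these measures, applied to columns that mimic the example above (29 sequences with g at position 53 and c at position 61, one with a and u). The triplets on the slide appear to read as (observed count, count expected if the two positions were independent, frequency); that reading is an assumption, although the expected counts computed below do match the printed 28.03 and 0.03. Function names are hypothetical.

```python
from collections import Counter
from math import log2


def expected_count(column_i, column_j, base_i, base_j):
    """Count of the (base_i, base_j) combination expected under independence."""
    n = len(column_i)
    return column_i.count(base_i) * column_j.count(base_j) / n


def mutual_information(column_i, column_j):
    """Mutual information, in bits, between two alignment columns."""
    if len(column_i) != len(column_j) or not column_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(column_i)
    freq_i, freq_j = Counter(column_i), Counter(column_j)
    freq_ij = Counter(zip(column_i, column_j))
    total = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        total += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return total


# One character per aligned sequence, in the same sequence order.
position_53 = "g" * 29 + "a"
position_61 = "c" * 29 + "u"

print(round(expected_count(position_53, position_61, "g", "c"), 2))
print(round(expected_count(position_53, position_61, "a", "u"), 2))
print(round(mutual_information(position_53, position_61), 3))
```

Mutual information is zero when the two columns vary independently and grows when a change at one position is matched by a change at the other, which is the signal comparative analysis looks for.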

Example: covariation of one nucleotide with all the others
– Position 1 of tRNA against all other positions:
![Page 63: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/63.jpg)
Lecture 2. RNomics
![Page 64: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/64.jpg)
Course materials:
http://www.esil.univ-mrs.fr/~dgaut/cours/orsay.html
From November onward:
http://rna.igmors.u-psud.fr/
![Page 65: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/65.jpg)
Bacterial genome
• 90% transcribed (~90% coding)
[Figure: genome map with coding and intergenic regions; "T" marks transcription]
![Page 66: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/66.jpg)
Vertebrate Genomes: 90% transcribed too!
[Figure: genome composition, with "T" marking transcription ("T?" where uncertain): intergenic DNA; introns, UTRs; satellite DNA; transposable elements; tRNA, rRNA; coding regions: 1.5%]
Vertebrate gene: 30 kb (coding: 1.5 kb)
![Page 67: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/67.jpg)
RFAM: The RNA database
Sam Griffiths-Jones, Simon Moxon, Mhairi Marshall, Ajay Khanna, Sean R. Eddy and Alex Bateman.
![Page 68: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/68.jpg)
RFAM Stats
• 503 RNA families
– one family = a group of homologous, alignable RNAs
– miRNA: >20 families
• Human:
– ~100 families
– 3000 distinct annotated RNAs
• E. coli: ~40 families
![Page 69: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/69.jpg)
Families, orthologs, etc.
![Page 70: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/70.jpg)
How many other ncRNAs?
![Page 71: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/71.jpg)
2.1 Experimental approaches
![Page 72: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/72.jpg)
Cloning
– RNomics*
– Extract total RNA
– Isolate small RNAs
– Tag & reverse transcribe
– Clone & sequence
• ~200 ncRNAs identified in mouse
• ~100 ncRNAs in bacteria
* Hüttenhofer et al., 2001
![Page 73: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/73.jpg)
Limitations
• Sensitivity: some ncRNAs are rare, expressed only at specific sites or for a short time
• Not exhaustive
![Page 74: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/74.jpg)
Other approaches
– Full-length cDNA projects
• FANTOM3:
– 100,000 mouse cDNAs
– 32,000 non-coding!
– Tiling arrays
• Half of the human transcriptome is polyA-, cytoplasmic and maps to unannotated loci
![Page 75: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/75.jpg)
Limitations
• Many non-functional transcripts (transcriptional « leakage »)
• No evidence of function as RNA
![Page 76: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/76.jpg)
2.2 Bioinformatic search for RNA genes
![Page 77: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/77.jpg)
What’s Special About ncRNA Detection?
[Figure: BLAST, HMM]
– No ORF
– No Markov model / sequence statistics
– ncRNA is defined by both primary and secondary structure
– « Substitution matrices » for nucleic acids are terrible compared to their amino-acid counterparts
![Page 78: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/78.jpg)
2.2.a Looking for known ncRNAs
– How can we detect ncRNA genes from known families?
![Page 79: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/79.jpg)
Descriptor-based programs
Rnamot / Rnamotif (Gautheret 91, Macke ‘02)
Palingol (Viari 96)
Patscan (Overbeek ’00)
PatSearch (Pesole ‘01)
h1 s1 h1 s2 h2 s3 h2
h1 5:5 1
h2 5:5 NNNNR:YNNNN
s1 7:7 NUNNNNN
s2 4:40
s3 7:7 UUCNNNN
RnaMot descriptor for the anticodon + TΨC domain of tRNA
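To make the descriptor idea concrete, the sketch below scans a sequence for the first part of the descriptor only (h1 s1 h1: a 5-bp helix closing a 7-nt loop matching NUNNNNN, with at most one helix mismatch). It is a toy re-implementation written for this example, not the RnaMot algorithm; the function names and the test sequence are illustrative.

```python
# Canonical and wobble pairs accepted in the helix.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
# Minimal IUPAC table, enough for the patterns used in the descriptor.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "N": "ACGU", "R": "AG", "Y": "CU"}

def matches(pattern, seq):
    """True if seq satisfies the IUPAC pattern position by position."""
    return len(pattern) == len(seq) and all(b in IUPAC[p] for p, b in zip(pattern, seq))

def find_hairpins(seq, stem=5, loop_pattern="NUNNNNN", max_mismatches=1):
    """Return (start, 5' strand, loop, 3' strand) for every hairpin found."""
    hits = []
    loop = len(loop_pattern)
    size = 2 * stem + loop
    for start in range(len(seq) - size + 1):
        five = seq[start:start + stem]
        mid = seq[start + stem:start + stem + loop]
        three = seq[start + stem + loop:start + size]
        if not matches(loop_pattern, mid):
            continue
        # Pair the 5' strand with the 3' strand read backwards.
        mismatches = sum((a, b) not in PAIRS for a, b in zip(five, reversed(three)))
        if mismatches <= max_mismatches:
            hits.append((start, five, mid, three))
    return hits

# Toy sequence: an anticodon-like stem-loop with two flanking bases on each side.
print(find_hairpins("AACCAGACUGAAGAUCUGGAA"))
```

A full descriptor search would chain several such elements (here s2, h2, s3) with variable lengths, which is where the choice of search order starts to matter.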
![Page 80: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/80.jpg)
Descriptor-based programs
| PROS | CONS |
|---|---|
| Draft descriptors can be quickly sketched and tested | Requires a good prior knowledge of secondary structure and sequence constraints |
| Alignment is not compulsory, although it is very helpful to have one | Requires basic computer skills to translate biological constraints into computer script |
| Biologists decide what features are important or not (see also CONS!) | Biologists have the responsibility of correctly weighting each important feature |
![Page 81: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/81.jpg)
Probabilistic ncRNA search programs
• Stochastic Context-Free Grammars (first adaptation of CFG to RNA: Searls 94; SCFG: Eddy & Durbin 94)
– Time cost = $O(N^4)$ for a sequence of length N
– Not « practical » for large alignments or genome-wide searches
– Pseudoknots not allowed
[Figure: training set → production rules, which describe how to generate any structure in the training set]
![Page 82: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/82.jpg)
ERPIN: Secondary Structure Profiles
• 16-row matrix captures base correlations and base-pair freqs.
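Alignment (columns 1-3 pair with columns 9-7; columns 4-6 are single-stranded):

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| A | C | C | C | G | A | G | G | U |
| A | C | C | C | A | A | G | G | U |
| G | C | G | C | G | - | U | G | C |
| A | C | C | - | G | - | G | G | U |
| A | C | G | - | G | A | C | G | U |

Helix columns: 16-row matrix (rows A:A, G:A, C:A, U:A, A:G, G:G, C:G, U:G, ...), with $S_{b_1,b_2} = \log\left(F_{b_1 b_2} / (F_{b_1} \times F_{b_2})\right)$
Single strands: usual 5-row matrix (rows A, G, C, U, -)
Weight matrices as implemented in PSI-Blast or Prosite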
![Page 83: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/83.jpg)
Profile Search Strategy
[Figure: a target sequence is scanned with a helix profile (strands h5 and h3) and a single-strand profile (14 positions). The helix score for h5-h3 is computed from the helix profile. The single strand is scored at each allowed length, from l=10 (best score with 4 gaps) to l=14 (best score with 0 gaps); example sequence of 14 nt.]

Training set:
GTTCTTGCATGTTTGACGGAAC
GTTCTTGCATGATTGACGGAAC
GTTCTTGCATGTTTGACGGAAC
TTTCCTGCATGCTTGACGGAAC
TTTAT--CAAGTTCAT-ATAAA
ATTAT--CGTGCCTTC-ATAAT
ATTAT--CGTGTCTTC-ATAAT
ATTAT--CATGTTTC--ATAAT
![Page 84: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/84.jpg)
ERPIN: Profile-based search
Gautheret & Lambert, JMB, 2001, 313, p. 1005.
Training set →
– Single-strand profile (5×N; rows A, G, C, U, -): $S_b = \log(F_b / E_b)$
– Helix profile (16×N; rows A:A, G:A, C:A, U:A, A:G, G:G, C:G, U:G, ..., U:U): $S_{b_1,b_2} = \log\left(F_{b_1 b_2} / (F_{b_1} \times F_{b_2})\right)$
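The two log-odds formulas can be tried on the 5-sequence, 9-column toy alignment of the « Secondary Structure Profiles » slide. This is a sketch under stated assumptions, not ERPIN's code: the background frequencies are taken as uniform, $F_{b_1}$ and $F_{b_2}$ are read as background frequencies of the two bases, and unobserved symbols receive an arbitrary floor `LOW`. All names are illustrative.

```python
import math
from collections import Counter

LOW = -10.0                          # arbitrary floor for unobserved symbols (assumption)
BG4 = {b: 0.25 for b in "ACGU"}      # assumed uniform background for paired bases
BG5 = {b: 0.20 for b in "ACGU-"}     # assumed uniform background, gap included

# Toy alignment: columns 1-3 pair with columns 9-7, columns 4-6 are single-stranded.
ALIGNMENT = ["ACCCGAGGU", "ACCCAAGGU", "GCGCG-UGC", "ACC-G-GGU", "ACG-GACGU"]

def column(k):
    """Return alignment column k (1-based, as numbered on the slide)."""
    return [row[k - 1] for row in ALIGNMENT]

def strand_scores(col):
    """5-row profile column: S_b = log(F_b / E_b)."""
    counts = Counter(col)
    n = len(col)
    return {b: (math.log(counts[b] / n / BG5[b]) if counts[b] else LOW) for b in BG5}

def helix_scores(col5, col3):
    """16-row profile column: S_b1,b2 = log(F_b1b2 / (F_b1 * F_b2))."""
    counts = Counter(zip(col5, col3))
    n = len(col5)
    scores = {}
    for b1 in "ACGU":
        for b2 in "ACGU":
            f = counts[(b1, b2)] / n
            scores[f"{b1}:{b2}"] = math.log(f / (BG4[b1] * BG4[b2])) if f else LOW
    return scores

helix_2_8 = helix_scores(column(2), column(8))   # column 2 pairs with column 8
strand_5 = strand_scores(column(5))
print(helix_2_8["C:G"], helix_2_8["G:C"], strand_5["G"])
```

Column pair 2/8 is 100% C:G, so every other pair, including G:C, falls to the floor value: this is the situation that motivates pseudocounts.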
![Page 85: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/85.jpg)
Problem with information-poor training sets
A Mir-133 training-set:
(( - ((((((( ------ ((( - (((( ---------------------- )))) ))) ------ ))))))) - ))
TC t GGCTGGT caaac- GGA a CCAA gtccgtcttcctgagaggt--- TTGG TCC CCTTCA ACCAGCT a CA
TG t GGCTGGT caaac- GGA a CCAA gtcaggtgtttctgtgaggt-- TTGG TCC CCTTCA ACCAGAC t AT
TG t GGCTGGT aaaac- GGA a CCAA gtcaggtgtttttgtgaggt-- TTGG TCC CCTTCA ACCAGCT a TG
TG c GGCTGGT gaaaa- GGA a CCAC atcaacccagaaaaaggat--- TTGG TCC CCTTCA ACCAGCC g CA
TA t GGCTGGT caaac- GGA a CCAA gtccgtcttccttagaggt--- TTGG TCC CCTTCA ACCAGCT a TT
AG t TGCTGGT aaaac- GGA a CCAA gtcgggtgtttgcgagaggt-- TTGG TCC CTTTCA ACCAGCT a CT
TG t GGCTGGT caaat- GGA a CCAA gtcaggtgtttctgcgaggt-- TTGG TCC CCTTCA ACCAGCT a CT
100% C:G
Other scores = log (obs/expected) = arbitrary low value!
What about G:C or A:U in this column? Is it as bad as C:C or A:G?
![Page 86: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/86.jpg)
Pseudocounts
– Principle: fill columns with expected counts, based on a reasonable model
– Example: column c contains 7 C:Gs; we know C:G often substitutes for G:C, so let’s allow for some G:Cs.
– We need substitution matrices!
![Page 87: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/87.jpg)
Henikoff & Henikoff pseudocounts (for amino acids)
[Equation: pseudocount formula, with its terms labelled: total # pseudocounts in column c; counts of a in column c; a substituted by i]
With the previous example:
Column c is 100% C:G
Probability(C:G) = 1, others = 0
Count of A:Ts = Bc * 1 * Probability (C:G | A:T)
Count of A:As = Bc * 1 * Probability (C:G | A:A), etc.
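The labels and the worked example on this slide are consistent with the following form of the pseudocount formula. This is a reconstruction from those labels, not a copy of the slide's equation: $b_{ci}$ (pseudocount of symbol $i$ in column $c$), $n_{ca}$ (count of $a$ in column $c$), $N_c$ (total count in column $c$) and $P(a \rightarrow i)$ (probability that $a$ is substituted by $i$, written Probability (a | i) on the slide) are notations assumed here.

```latex
b_{ci} \;=\; B_c \sum_{a} \frac{n_{ca}}{N_c}\, P(a \rightarrow i)
\qquad\text{e.g.}\qquad
b_{c,\mathrm{A:T}} \;=\; B_c \times 1 \times P(\mathrm{C{:}G} \rightarrow \mathrm{A{:}T})
\quad\text{when column } c \text{ is } 100\%\ \mathrm{C{:}G}.
```

For base pairs, $a$ and $i$ run over the 16 possible pairs, which is why a 16×16 substitution matrix is needed.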
![Page 88: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/88.jpg)
RNA substitution matrices
Obtained from a euk+archae+bac 16S/18S rRNA alignment

Base-pair matrix (16×16):

| | AA | AT | AG | AC | TA | TT | TG | TC | GA | GT | GG | GC | CA | CT | CG | CC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AA | 6.54e-04 | 5.20e-06 | 3.88e-05 | 4.22e-05 | 2.13e-05 | 5.51e-06 | 1.21e-05 | 3.84e-05 | 8.52e-05 | 1.28e-05 | 1.76e-04 | 2.89e-06 | 1.47e-05 | 6.47e-06 | 3.19e-06 | 4.69e-06 |
| AT | 7.96e-05 | 9.00e-04 | 5.19e-05 | 1.78e-04 | 1.69e-04 | 1.43e-04 | 8.85e-05 | 1.86e-04 | 4.15e-05 | 1.69e-04 | 1.22e-04 | 1.99e-04 | 8.73e-05 | 2.44e-04 | 1.25e-04 | 3.30e-04 |
| AG | 1.00e-04 | 8.72e-06 | 1.35e-03 | 1.27e-04 | 1.72e-05 | 5.09e-06 | 3.10e-05 | 1.38e-04 | 5.74e-05 | 1.59e-05 | 1.01e-04 | 8.22e-06 | 9.99e-06 | 1.62e-05 | 1.33e-05 | 2.56e-05 |
| AC | 4.11e-05 | 1.13e-05 | 4.81e-05 | 9.79e-04 | 2.79e-06 | 7.02e-06 | 2.79e-06 | 4.47e-05 | 4.93e-06 | 1.97e-05 | 3.05e-05 | 8.06e-06 | 5.40e-06 | 1.55e-05 | 2.47e-06 | 7.30e-05 |
| TA | 4.23e-04 | 2.19e-04 | 1.33e-04 | 5.69e-05 | 1.16e-03 | 2.21e-04 | 2.35e-04 | 2.78e-04 | 9.59e-05 | 1.18e-04 | 1.79e-04 | 1.08e-04 | 3.54e-04 | 2.04e-04 | 2.24e-04 | 9.28e-05 |
| TT | 1.05e-05 | 1.80e-05 | 3.80e-06 | 1.38e-05 | 2.14e-05 | 9.30e-04 | 2.57e-05 | 7.75e-05 | 5.79e-06 | 2.33e-05 | 4.87e-05 | 1.18e-05 | 1.57e-05 | 8.72e-05 | 1.83e-05 | 5.25e-04 |
| TG | 1.05e-04 | 5.03e-05 | 1.04e-04 | 2.49e-05 | 1.03e-04 | 1.16e-04 | 1.14e-03 | 1.80e-04 | 4.69e-05 | 4.56e-05 | 1.25e-04 | 4.26e-05 | 1.70e-04 | 2.15e-04 | 7.52e-05 | 3.23e-05 |
| TC | 1.45e-05 | 4.59e-06 | 2.03e-05 | 1.73e-05 | 5.30e-06 | 1.52e-05 | 7.82e-06 | 1.60e-04 | 4.55e-06 | 8.99e-06 | 4.77e-06 | 3.66e-06 | 9.00e-06 | 6.17e-05 | 2.95e-06 | 1.61e-05 |
| GA | 2.57e-04 | 8.19e-06 | 6.74e-05 | 1.53e-05 | 1.46e-05 | 9.11e-06 | 1.63e-05 | 3.64e-05 | 1.47e-03 | 2.50e-05 | 8.70e-05 | 2.12e-05 | 3.02e-05 | 2.83e-05 | 4.40e-06 | 8.02e-06 |
| GT | 1.24e-04 | 1.07e-04 | 6.02e-05 | 1.96e-04 | 5.81e-05 | 1.18e-04 | 5.10e-05 | 2.31e-04 | 8.04e-05 | 1.28e-03 | 9.39e-05 | 8.77e-05 | 2.53e-05 | 9.12e-05 | 3.55e-05 | 4.58e-05 |
| GG | 1.82e-04 | 8.24e-06 | 4.08e-05 | 3.24e-05 | 9.35e-06 | 2.61e-05 | 1.49e-05 | 1.30e-05 | 2.97e-05 | 9.98e-06 | 5.62e-04 | 6.96e-06 | 6.83e-06 | 8.80e-06 | 1.32e-05 | 1.06e-05 |
| GC | 1.14e-04 | 5.14e-04 | 1.26e-04 | 3.27e-04 | 2.16e-04 | 2.44e-04 | 1.94e-04 | 3.84e-04 | 2.78e-04 | 3.57e-04 | 2.67e-04 | 1.49e-03 | 1.07e-04 | 5.26e-04 | 2.57e-04 | 2.87e-04 |
| CA | 1.30e-05 | 5.04e-06 | 3.43e-06 | 4.90e-06 | 1.58e-05 | 7.22e-06 | 1.73e-05 | 2.10e-05 | 8.85e-06 | 2.30e-06 | 5.85e-06 | 2.40e-06 | 5.30e-04 | 5.30e-05 | 1.58e-05 | 7.68e-06 |
| CT | 3.86e-06 | 9.54e-06 | 3.78e-06 | 9.51e-06 | 6.16e-06 | 2.71e-05 | 1.48e-05 | 9.78e-05 | 5.60e-06 | 5.61e-06 | 5.10e-06 | 7.94e-06 | 3.58e-05 | 2.95e-04 | 5.22e-06 | 3.52e-05 |
| CG | 1.04e-04 | 2.68e-04 | 1.70e-04 | 8.32e-05 | 3.71e-04 | 3.12e-04 | 2.83e-04 | 2.56e-04 | 4.77e-05 | 1.19e-04 | 4.21e-04 | 2.12e-04 | 5.86e-04 | 2.86e-04 | 1.35e-03 | 2.50e-04 |
| CC | 2.13e-06 | 9.81e-06 | 4.54e-06 | 3.41e-05 | 2.12e-06 | 1.24e-04 | 1.69e-06 | 1.94e-05 | 1.21e-06 | 2.14e-06 | 4.70e-06 | 3.31e-06 | 3.95e-06 | 2.68e-05 | 3.48e-06 | 5.45e-04 |

Single-base matrix (4×4):

| | A | T | G | C |
|---|---|---|---|---|
| A | 9.13e-04 | 8.22e-05 | 1.05e-04 | 9.35e-05 |
| T | 5.57e-05 | 6.70e-04 | 7.98e-05 | 1.41e-04 |
| G | 6.94e-05 | 7.78e-05 | 7.32e-04 | 5.03e-05 |
| C | 4.09e-05 | 9.15e-05 | 3.33e-05 | 6.03e-04 |
![Page 89: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/89.jpg)
Performance as a function of training set size and pseudocount weight
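[Figure: sensitivity and specificity curves for training sets of 1 to 19 sequences, as a function of the pseudocount weight]
$\mathrm{pcw} = \dfrac{\text{pseudocounts}}{\text{pseudocounts} + \text{true-counts}}$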
![Page 90: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/90.jpg)
Setting the pseudocount weight
$\mathrm{pcw} = \dfrac{\text{pseudocounts}}{\text{pseudocounts} + \text{true-counts}}$
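Given a chosen pcw, the total number of pseudocounts follows by rearranging the definition: pseudocounts = true-counts × pcw / (1 − pcw). The sketch below applies this to a column containing 7 C:G pairs. The substitution probabilities are made-up illustrative numbers restricted to three pair types, not values from the rRNA matrix, and the function names are hypothetical.

```python
def pseudocount_total(true_counts, pcw):
    """Total pseudocounts B such that pcw = B / (B + true_counts)."""
    if not 0.0 <= pcw < 1.0:
        raise ValueError("pcw must be in [0, 1)")
    return true_counts * pcw / (1.0 - pcw)

def smoothed_frequencies(counts, substitution, pcw):
    """Column frequencies after adding substitution-based pseudocounts.

    counts: observed count per symbol in the column.
    substitution[a][i]: probability that a is substituted by i (rows sum to 1).
    """
    n = sum(counts.values())
    b_total = pseudocount_total(n, pcw)
    freqs = {}
    for i in substitution:
        pseudo = b_total * sum(
            counts.get(a, 0) / n * substitution[a][i] for a in substitution
        )
        freqs[i] = (counts.get(i, 0) + pseudo) / (n + b_total)
    return freqs

# Hypothetical substitution probabilities (illustration only).
SUBSTITUTION = {
    "C:G": {"C:G": 0.80, "G:C": 0.15, "A:A": 0.05},
    "G:C": {"C:G": 0.15, "G:C": 0.80, "A:A": 0.05},
    "A:A": {"C:G": 0.10, "G:C": 0.10, "A:A": 0.80},
}

freqs = smoothed_frequencies({"C:G": 7}, SUBSTITUTION, pcw=0.2)
print(freqs)
```

With these numbers G:C ends up more frequent than A:A, so its log-odds score is less severe: a G:C in this column is no longer « as bad as » a mismatch.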
![Page 91: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/91.jpg)
An E-value for RNA motifs?
E-value: # expected hits of score > S, by chance
Can we guess this?
Idea 1:
– Run against a random database and compute the score distribution
– Extrapolate for any score
![Page 92: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/92.jpg)
High Scores are Well Behaved
• $Y = 6.6 \times 10^{5}\, e^{-0.3 S}$
• $E = K m n\, e^{-\lambda S}$ (extreme value distribution)
• Score 30 (min): 3.8 hits/100 Mb
• Score 40 (ave): 0.13 hits/100 Mb
[Figure: SECIS hits in 700 Mb of randomized sequences; #hits (log scale, 10 to 10000) vs. score (20 to 35), for « random » and « markov » sequence models]
An E-value is possible
1 day to run simulation!
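The extrapolation step can be sketched as a straight-line fit of log(#hits) against score, followed by rescaling to the size of the database actually searched. The counts below are synthetic, generated from the fitted curve quoted on this slide rather than taken from the original simulation; function names and the database sizes in the usage line are illustrative.

```python
import math

def fit_exponential_tail(scores, counts):
    """Least-squares fit of log(count) = log(A) - lam * score; returns (A, lam)."""
    logs = [math.log(c) for c in counts]
    n = len(scores)
    mean_s = sum(scores) / n
    mean_l = sum(logs) / n
    sxx = sum((s - mean_s) ** 2 for s in scores)
    sxy = sum((s - mean_s) * (l - mean_l) for s, l in zip(scores, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_s
    return math.exp(intercept), -slope

def expected_hits(score, a, lam, db_size, simulated_size):
    """Expected chance hits at or above `score` in a database of db_size nt."""
    return a * math.exp(-lam * score) * db_size / simulated_size

# Synthetic tail counts following Y = 6.6e5 * exp(-0.3 * S).
scores = [20, 25, 30, 35]
counts = [6.6e5 * math.exp(-0.3 * s) for s in scores]
a, lam = fit_exponential_tail(scores, counts)

# Hypothetical use: a 10 Mb genome, with a simulation run on 700 Mb.
print(expected_hits(32, a, lam, db_size=10e6, simulated_size=700e6))
```

The fit is only valid in the high-score tail, and it has to be redone for every new motif, which is the cost the slide points out.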
![Page 93: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/93.jpg)
Global Score Distribution is not as Good!
[Figure: distribution of all scores in a helix profile (rows A:A, G:A, C:A, ..., U:U), showing « Log(0) » or pseudocount values, « finite » scores, and high scores]
Not Gaussian
How can we model it? (same for single-strand profile)
![Page 94: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/94.jpg)
Discrete convolutions
[Figure: score distribution of a helix profile (rows A:A, G:A, C:A, ..., U:U), computation vs. simulation]
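The idea is that the score of a motif is a sum of per-column scores, so if columns are treated as independent its exact distribution is the convolution of the per-column distributions. The sketch below assumes scores have been discretized to integers and uses two invented columns; it illustrates the technique and is not ERPIN's implementation.

```python
from collections import defaultdict

def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete scores."""
    out = defaultdict(float)
    for score_a, p_a in dist_a.items():
        for score_b, p_b in dist_b.items():
            out[score_a + score_b] += p_a * p_b
    return dict(out)

def motif_score_distribution(column_dists):
    """Convolve all per-column distributions, starting from a point mass at 0."""
    total = {0: 1.0}
    for dist in column_dists:
        total = convolve(total, dist)
    return total

def tail_probability(dist, threshold):
    """Probability that a random site scores at least `threshold`."""
    return sum(p for score, p in dist.items() if score >= threshold)

# Two invented columns: {discretized score: probability under the background}.
col1 = {2: 0.25, -1: 0.75}
col2 = {3: 0.0625, -2: 0.9375}
dist = motif_score_distribution([col1, col2])
p = tail_probability(dist, 2)
# E-value estimate = tail probability * number of positions scanned.
print(p, p * 1_000_000)
```

Because the full distribution is computed rather than fitted, no assumption about its shape (Gaussian or otherwise) is needed.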
![Page 95: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/95.jpg)
E-values of complete motifs
[Figure: simulated vs. computed E-values]
![Page 96: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/96.jpg)
Profile-based search
| PROS | CONS |
|---|---|
| All constraints in the training set are efficiently exploited, resulting in highly specific detections | Alignment and secondary structure constraints must be accurate |
| No programming is needed | Helices of variable length need to be reduced to their shortest consensus |
| Scoring system is defined automatically | Program will not depart from initial alignment in terms of motif size |
| E-values are provided for each hit | Users still have to decide on search order and masked elements |
![Page 97: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/97.jpg)
Running a successful ncRNA search
Example: the Signal Recognition Particle (SRP) RNA
– 172 sequences available
– All 3 kingdoms
– Signature: 50-nt domain IV
![Page 98: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/98.jpg)
Organize ncRNA information
Alignment is a must
– Should be structure-based
– ClustalW OK only as a first attempt
– RNAalifold (Vienna package) can identify covarying base pairs
Secondary structure annotation
– Will help identify sequence/structure constraints: helix sizes, conserved bases, etc.
![Page 99: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/99.jpg)
Want to publish your finds in the absence of an E-value? Prepare a control procedure
Sensitivity: $SN = \dfrac{TP}{TP+FN}$ (denominator: total « true » objects)
Specificity: $SP = \dfrac{TP}{TP+FP}$ (denominator: total predictions)
TP and FN: easy to obtain, using the training set (leave-one-out)
FP: harder! How do you know a hit is false?
Hint: express SP as FP / Mb in a random sequence
Make it large enough and of the same composition (mono- & di-nt) as the search database (e.g. with the shuffle program)
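These control statistics are simple to script. The sketch below follows the slide's definitions (specificity here is TP / (TP + FP), i.e. the fraction of predictions that are true) and adds the FP / Mb measure. The shuffle shown preserves mononucleotide composition only; preserving dinucleotide composition requires a dedicated algorithm and is not attempted here. Function names and the numbers in the usage lines are illustrative.

```python
import random

def sensitivity(tp, fn):
    """SN = TP / (TP + FN): fraction of true objects that were found."""
    return tp / (tp + fn)

def specificity(tp, fp):
    """SP = TP / (TP + FP): fraction of predictions that are true."""
    return tp / (tp + fp)

def false_positives_per_mb(n_hits, db_length_nt):
    """Hits per megabase in a randomized database, where every hit is false."""
    return n_hits / (db_length_nt / 1e6)

def shuffle_sequence(seq, seed=0):
    """Mononucleotide shuffle: same base composition, random order."""
    letters = list(seq)
    random.Random(seed).shuffle(letters)
    return "".join(letters)

# Hypothetical numbers: 18 of 20 training sequences recovered in leave-one-out,
# 6 hits judged false, and 38 hits in 1000 Mb of shuffled sequence.
print(sensitivity(18, 2), specificity(18, 6), false_positives_per_mb(38, 1_000_000_000))
```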
![Page 100: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/100.jpg)
Using the ERPIN program
>structure
000000000000000000000000000000000000001111001000
222443333333666555588877777788899996661111443222
>AQU.AEO.
AGGGUGAACU-CCCCCAGGCCCGAA--AGGGAGCAAGGGUAAGC-CCG
>THE.THE.
GGCGUGAACC-GGGUCAGGUCCGGA--AGGAAGCAGCCCUAAGC-GCC

erpin srp.epn sequence.fasta -8,8 -nomask
erpin srp.epn sequence.fasta -2,2 -nomask
erpin srp.epn sequence.fasta -2,2 -umask 5 9 -nomask
![Page 101: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/101.jpg)
ERPIN results
Score: based on profile values
E-value: How many hits expected at this score or higher?
No need for random sequence tests!
![Page 102: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/102.jpg)
Looking for MicroRNA Precursors
[Figure: miRNA biogenesis across nucleus and cytoplasm: miRNA transcription unit; polycistronic transcript (pri-miRNA); stem-loop precursor (pre-miRNA, 70 nt); mature miRNA (21-22 nt); miRNA+RISC complex; mRNA target; translation block or cleavage]
![Page 103: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/103.jpg)
miR Training sets
• >50 miRNA families with different signatures
– Can’t use a single profile for all
• 18 training sets built for 18 miRNA families
– Using CLUSTALW + Alifold
– 10 sequences/family on average
Legendre, Lambert, Gautheret, Bioinformatics 2004
![Page 104: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/104.jpg)
ERPIN vs BLAST
• 20 animal genomes scanned
• Compare Erpin and WU-BLAST w/ sensitive parameters (W=7)
• E-value ≤ 0.01
ERPIN WU-BLAST
43 (0) 212 (5) 41 (9)
![Page 105: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/105.jpg)
Analysis of a miR Cluster
[Figure: miR17 cluster across species, including Ciona]
Grey: initial training set
“E” indicates hits identified by ERPIN only,
“EB” indicates hits identified by both ERPIN and BLAST.
• Important homologues missed by WU-BLAST
• Profile search a must in miRNA detection
![Page 106: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/106.jpg)
2.2b De novo ncRNA finding
– How can we detect ncRNA genes when no prior sequence/structure data is available?
![Page 107: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/107.jpg)
Thermodynamic Profiling (Le et al. 88)
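[Figure: a 100-base window slides along the sequence; a value is computed for each window (examples shown: -20, -2, -3, -2, -4) and the values are plotted as a profile]
$Z\text{-score} = \dfrac{\text{window free energy} - \text{mean}(\text{energy of rnd seq.})}{\sqrt{\text{Var}(\text{energy of rnd seq.})}}$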
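The Z-score procedure can be sketched end to end as follows. Two loud assumptions: the « energy » is a stand-in (minus the maximum number of nested base pairs, computed with a Nussinov-style recursion) rather than a real free energy from a folding program, and the randomization is a plain mononucleotide shuffle. Function names are illustrative.

```python
import random
import statistics

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def toy_energy(seq, min_loop=3):
    """Stand-in for a folding energy: minus the maximum number of nested pairs."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case: j left unpaired
            for k in range(i, j - min_loop):    # case: j paired with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return -best[0][n - 1]

def z_score(window, n_shuffles=100, seed=0):
    """(native energy - mean of shuffled energies) / their standard deviation."""
    rng = random.Random(seed)
    native = toy_energy(window)
    energies = []
    for _ in range(n_shuffles):
        letters = list(window)
        rng.shuffle(letters)                    # mononucleotide shuffle
        energies.append(toy_energy("".join(letters)))
    sd = statistics.pstdev(energies)
    if sd == 0:
        return 0.0
    return (native - statistics.mean(energies)) / sd

print(z_score("GGGGGGGGAAAACCCCCCCC"))
```

A strongly negative Z-score flags a window that folds better than shuffled versions of itself. Sliding the window along a genome and plotting the Z-score gives the profile described above.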
![Page 108: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/108.jpg)
The problem with thermodynamics
– OK for strong local structures (some success in viral genomes)
– However: true ncRNAs (tRNA, rRNA) do not fold more stably than random sequences of the same composition (di-nt: Rivas & Eddy 2000)
– The method would not detect many known ncRNAs
– G+C content alone is a better ncRNA predictor than free energy
![Page 109: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/109.jpg)
G+C content
– In a high A+T background (thermophilic archaebacteria), ncRNAs stand out clearly.
– Combining (G+C)% and CpG% provides the best discriminant (Schattner ’02).
– A dozen such predicted ncRNAs were experimentally confirmed in M. jannaschii and P. furiosus.
– Does not work in genomes with « normal » G+C contents, except as a complement to other methods (thermodynamics, etc.)
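The two compositional features are easy to compute over sliding windows. The sketch below only produces the features; how they are combined into a discriminant is not reproduced here, and the window size, step and function names are arbitrary choices for illustration.

```python
def gc_and_cpg(window):
    """Return (G+C fraction, CpG dinucleotide fraction) for one window."""
    w = window.upper()
    n = len(w)
    gc = (w.count("G") + w.count("C")) / n
    # "CG" cannot overlap itself, so str.count gives the exact number of CpGs.
    cpg = w.count("CG") / (n - 1) if n > 1 else 0.0
    return gc, cpg

def scan(seq, size=100, step=50):
    """Return (start, G+C fraction, CpG fraction) for each window along seq."""
    return [
        (start,) + gc_and_cpg(seq[start:start + size])
        for start in range(0, len(seq) - size + 1, step)
    ]

# Toy genome: an A+T-only stretch followed by a G+C-only stretch.
for start, gc, cpg in scan("AT" * 50 + "GC" * 50, size=100, step=100):
    print(start, round(gc, 2), round(cpg, 2))
```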
![Page 110: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/110.jpg)
RNAGenie (Carter, Dubchak & Holbrook, 2001)
– Partition E. coli sequences into 2 sets of overlapping 80-nt windows:
• 675 kb intergenic (negative set)
• 8 kb true ncRNA (16S, 23S, 5S, tRNA, other small RNAs) (positive set)
– Train a neural network capturing different « compositional characters »
• nt and di-nt composition
• Frequency of typical RNA motifs: UNCG, GNRA, AAR, CUAG
• Folding free energy
![Page 111: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/111.jpg)
RNAGenie results
– Claim about 80-90% accuracy
– Predict 370 new ncRNAs
– 13 predictions confirmed subsequently
– Limitations
• Small training set: skewed prediction
• Possible contamination of the « negative » (non-ncRNA) set by true ncRNA sequences
![Page 112: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/112.jpg)
Comparative Genomics rules!
Wassarman et al. ‘01: comparison of Escherichia, Salmonella, Klebsiella: 60 ncRNAs predicted, 23 confirmed
Many ongoing projects in bacteria, Xenopus, Ciona, human
![Page 113: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/113.jpg)
Comparative genomics: a major source of ncRNA in eukaryotes
– 5-6% of the mammalian genome is under selection vs 1.5% coding (3 times as much as in nematodes)
– Tiling arrays and full-length cDNA identify as many transcripts in intergenic regions as in known genes
– As many polyA- as polyA+
![Page 114: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/114.jpg)
Functional assignment of conserved regions
– Coding exons
– Regulatory sequences in exons and introns
– Promoters
– ncRNA
– Ancestral repeats
– Others (matrix attachment, etc.)
[Figure: fraction of conserved sequences in each category (AR = ancestral repeats); annotation: « Detect this! ». Margulies et al, 2003]
Need classification software
- QRNA (Rivas & Eddy, 2001)
- RNAz (Hofacker & Stadler, 2005)
![Page 115: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/115.jpg)
Q-RNA (Rivas & Eddy 2001)
– Analysis of Blast alignment (SCFG based)
• Model for protein-coding gene: synonymous mutations
• Model for ncRNA: compensatory mutations
(also includes loop probabilities obtained from a training set of real ncRNA)
![Page 116: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/116.jpg)
RNA-Alifold (Hofacker, Stadler)
• Originally: secondary structure prediction
• Predicts the best common structure for a set of aligned RNAs
• Dynamic programming, averaging:
– Energies of aligned bases
– Covariation term
![Page 117: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/117.jpg)
RNAz (Washietl et al. 2004)
• Uses multiple alignments.
• Two basic components:
– (1) A measure of RNA secondary structure conservation, based on computing an RNAalifold consensus secondary structure (Structure Conservation Index)
– (2) A measure of thermodynamic stability, based on a z-score that is normalized w.r.t. both sequence length and base composition and can be calculated without sampling from shuffled sequences (SVM)
– An SVM again to combine 1 and 2
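For reference, the Structure Conservation Index is usually written as a ratio of folding energies. The form below is a sketch of that usual definition, with notation assumed here: $E_A$ is the consensus minimum free energy computed by RNAalifold on the alignment, and $\bar{E}$ is the mean of the minimum free energies of the $N$ sequences folded individually.

```latex
\mathrm{SCI} \;=\; \frac{E_A}{\bar{E}},
\qquad
\bar{E} \;=\; \frac{1}{N}\sum_{k=1}^{N} E_k
```

An SCI close to 0 means the sequences share no common structure, while values near or above 1 indicate a well-conserved structure, possibly supported by compensatory mutations.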
![Page 118: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/118.jpg)
Q-RNA & RNAz
– QRNA
• Limited range for similarity (65%-85%): too dissimilar = incorrect Blast alignments, too similar = no covariation
• Pairwise: limited covariation
– RNAz
• Multiple alignment: better use of sequence variation
• Already applied to mammals (5-species alignment) & Ciona
Problem: Human/mouse/rat ncRNAs are not in the 65%-85% similarity range!
![Page 119: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/119.jpg)
The right species for ncRNA detection?
Human/mouse ncRNA: ~98-100% id
18S fugu/xenopus/human: 95% id!
Obvious interest for diverse animal models
Can’t observe covariation
Conserved elements detected using various species
Human/mouse tRNAAsp
![Page 120: ARN et Bioinformatiquedenise/CoursBioinfo/ARN-BIBS-CASM...ADN/ARN ADN ADN/ARN droit droit gauche 2’endo 2’ endo (py) 3’endo (pu) 20 Å 18 Å anti Anti (py) Syn (pu) aucun 12](https://reader034.fdocuments.net/reader034/viewer/2022052423/5f07bf6d7e708231d41e8ab1/html5/thumbnails/120.jpg)