Based on: MicroRNA identification based on sequence and structure alignment


description

A Modified miRAlign Approach to Finding MicroRNAs in the Chicken Genome. Based on: MicroRNA identification based on sequence and structure alignment. Presented by: Neeta Jain, Nehar Arora, and Jeff Bonis. Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li.

Transcript of Based on: MicroRNA identification based on sequence and structure alignment

Page 1: Based on: MicroRNA identification based on  sequence  and structure alignment

Based on:

MicroRNA identification based on sequence and structure alignment

Presented by: Neeta Jain, Nehar Arora, and Jeff Bonis

Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li

A Modified miRAlign Approach to Finding MicroRNAs in the Chicken Genome

Page 2: Based on: MicroRNA identification based on  sequence  and structure alignment

Outline

Introduction

Motivation

Methods

Results

Conclusion

Page 3: Based on: MicroRNA identification based on  sequence  and structure alignment

Introduction

What are miRNAs and why are they important?

miRNAs are ~22 nt long non-coding RNAs

They are derived from ~70 nt precursors, which typically have a hairpin structure

Importance of miRNAs:

They are found to regulate the expression of target genes via complementary base pair interactions.

Page 4: Based on: MicroRNA identification based on  sequence  and structure alignment

Motivation

miRNAs are short (~22 nt) and more conserved in their secondary structure than in their primary sequence

Hence, conventional sequence alignment methods such as BLAST can only find relatively close homologues

Several steps of miRAlign are replaceable, and the resulting increase or decrease in performance should be evaluated

Prof. Joan at the Delaware BioTechnology Institute is working on identifying miRNAs in the chicken genome, but the secondary structure information has not yet been exploited

Page 5: Based on: MicroRNA identification based on  sequence  and structure alignment

Methods

Data

Reference sets

  miRBase Registry Version 8.0 (http://microrna.sanger.ac.uk/sequences)

  MicroRNA Registry Version 5.0 was used previously

  1300 animal miRNAs from six species and their precursors (1104) composed our raw training set, Train_All.

  Train_Sub_1: miRNAs from all six animal species except G. gallus

  Train_Sub_2: miRNAs from all six animal species except G. gallus and C. elegans

Genomic sequences

  Only the chicken genome (G. gallus) was used.
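The two training subsets above amount to filtering Train_All by species. A minimal sketch, assuming a simple in-memory record type: the `MiRNA` tuple, the entry names, and the sequences are hypothetical placeholders, not data from the slides.

```python
from collections import namedtuple

# Hypothetical record type; the slides do not specify a data format.
MiRNA = namedtuple("MiRNA", ["name", "species", "sequence"])


def make_subset(train_all, excluded_species):
    """Return the entries of train_all whose species is not excluded."""
    excluded = set(excluded_species)
    return [m for m in train_all if m.species not in excluded]


# Toy stand-in for Train_All (placeholder names and sequences).
train_all = [
    MiRNA("toy-1", "G. gallus", "ACGUACGUACGUACGUACGUAC"),
    MiRNA("toy-2", "C. elegans", "UGCAUGCAUGCAUGCAUGCAUG"),
    MiRNA("toy-3", "species_X", "GGAUCCGGAUCCGGAUCCGGAU"),
]

train_sub_1 = make_subset(train_all, ["G. gallus"])
train_sub_2 = make_subset(train_all, ["G. gallus", "C. elegans"])
```

Holding out the G. gallus entries means any chicken miRNA recovered later is found through its homologues in other species rather than by matching itself.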

Page 6: Based on: MicroRNA identification based on  sequence  and structure alignment

Methods (contd)

Page 7: Based on: MicroRNA identification based on  sequence  and structure alignment

Methods (cont.)

Preprocessing

Known precursors from the training set are aligned against the chicken genome with BLAT (instead of BLAST)

The resulting hits are used as the candidate precursor miRNAs

We experienced difficulty extracting flanking sequences
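Extracting a hit together with its flanking sequence can be sketched as below. This is only an illustration under stated assumptions: coordinates are taken to be 0-based, half-open, and given on the forward strand, and the `flank` default of 50 nt is a hypothetical value, not one reported in the slides.

```python
_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(_DNA_COMPLEMENT[b] for b in reversed(seq.upper()))


def extract_with_flanks(chrom_seq, start, end, strand="+", flank=50):
    """Return chrom_seq[start:end] plus up to `flank` bases on each side.

    start/end are assumed 0-based, half-open, on the forward strand.
    The window is clamped to the chromosome ends; minus-strand hits are
    returned as the reverse complement of the forward-strand window.
    """
    left = max(0, start - flank)
    right = min(len(chrom_seq), end + flank)
    region = chrom_seq[left:right].upper()
    if strand == "-":
        region = reverse_complement(region)
    return region


chrom = "AAAACCCCGGGGTTTT"
plus_hit = extract_with_flanks(chrom, 4, 8, "+", flank=2)
minus_hit = extract_with_flanks(chrom, 4, 8, "-", flank=2)
```

Clamping at the chromosome ends matters for hits near the start or end of a contig, where a fixed-width flank would otherwise produce a negative slice index.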

Page 8: Based on: MicroRNA identification based on  sequence  and structure alignment

Experiment (contd)

“Modified” miRAlign

(1.) Secondary Structure Prediction

Both the candidate sequence and its reverse complement are analyzed by RNAfold to predict hairpins.

Alternatively, sequences were also analyzed in parallel by mFold to predict their secondary structures.

Only hairpins with an MFE lower than -20 kcal/mol are retained.
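The MFE filter in step (1.) can be sketched with the folding program passed in as a function, so that RNAfold or mFold output could be substituted. The `fold` callable, which is assumed to return a `(dot_bracket, mfe)` pair, and the `toy_fold` lookup table are illustrative stand-ins; if the ViennaRNA Python bindings are installed, `RNA.fold` should fit the same shape, but that is not tested here.

```python
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement_rna(seq):
    """Reverse complement as RNA; DNA input (T) is converted to U first."""
    rna = seq.upper().replace("T", "U")
    return "".join(_RNA_COMPLEMENT[b] for b in reversed(rna))


def retain_stable_hairpins(candidates, fold, mfe_cutoff=-20.0):
    """Fold each candidate and its reverse complement; keep MFE < cutoff.

    `fold` is any callable mapping a sequence to (dot_bracket, mfe).
    Returns a list of (sequence, strand, dot_bracket, mfe) tuples.
    """
    kept = []
    for seq in candidates:
        forward = seq.upper().replace("T", "U")
        for strand, s in (("+", forward), ("-", reverse_complement_rna(seq))):
            structure, mfe = fold(s)
            if mfe < mfe_cutoff:
                kept.append((s, strand, structure, mfe))
    return kept


# Toy stand-in for a folding program: made-up energies, for illustration only.
_TOY_TABLE = {
    "GGGAAACCC": ("(((...)))", -25.0),
    "GGGUUUCCC": ("(((...)))", -10.0),
}


def toy_fold(seq):
    return _TOY_TABLE.get(seq, ("." * len(seq), 0.0))


kept = retain_stable_hairpins(["GGGAAACCC"], toy_fold)
```

Folding both strands is needed because a BLAT hit does not reveal which strand is transcribed, and the two strands generally fold with different energies.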

(2.) Pairwise sequence alignment

Sequences from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set

The sequence similarity score between the candidate and the known mature miRNAs is calculated by CLUSTALW.

If the score exceeds a user-defined threshold (default=70), then the candidate to known miRNA pairs are kept for further analysis
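The thresholding in step (2.) can be illustrated with a much simpler score than the one CLUSTALW computes. The sketch below is a stand-in, not the CLUSTALW algorithm: it slides the known miRNA along the candidate without gaps and reports the best percent identity. All function names and the toy sequences are hypothetical.

```python
def best_window_identity(candidate, mirna):
    """Best ungapped percent identity of `mirna` against any window of
    `candidate`, plus the 0-based offset of that window.

    Returns (0.0, -1) if the miRNA is empty or longer than the candidate.
    """
    candidate, mirna = candidate.upper(), mirna.upper()
    n = len(mirna)
    if n == 0 or len(candidate) < n:
        return 0.0, -1
    best_score, best_offset = -1.0, -1
    for offset in range(len(candidate) - n + 1):
        window = candidate[offset:offset + n]
        matches = sum(a == b for a, b in zip(window, mirna))
        score = 100.0 * matches / n
        if score > best_score:  # strict '>' keeps the first best window
            best_score, best_offset = score, offset
    return best_score, best_offset


def keep_pairs(candidates, known_mirnas, threshold=70.0):
    """Keep (candidate, miRNA, score, offset) pairs whose score exceeds
    the threshold. Both inputs are dicts mapping names to sequences."""
    pairs = []
    for cname, cseq in candidates.items():
        for mname, mseq in known_mirnas.items():
            score, offset = best_window_identity(cseq, mseq)
            if score > threshold:
                pairs.append((cname, mname, score, offset))
    return pairs


candidates = {"cand-1": "AAAACGUACGUAAAA"}
known = {"exact": "CGUACGUA", "one-off": "CGUACGUU", "unrelated": "GGGGGGGG"}
pairs = keep_pairs(candidates, known)
```

The offset of the best window is returned alongside the score because the later position checks need to know where on the hairpin the putative mature miRNA sits.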

Page 9: Based on: MicroRNA identification based on  sequence  and structure alignment

Methods (contd)

(3.) Checking miRNA’s position on stem-loop

The potential miRNA should not be located on the terminal loop of the hairpin

Omitted because the offsets of the known mature miRNAs within their pre-miRNAs were unavailable:

  The potential miRNA should be located on the same arm of the hairpin as its known homologues

  The position of the potential miRNA on the hairpin should not differ too much from that of its known homologues (chosen delta_len: 15)
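The three position conditions can be sketched on a dot-bracket structure as follows. The sketch assumes a single unbranched hairpin and 0-based, half-open coordinates; treating delta_len as the maximum allowed difference in start offset is an assumption made here for illustration, since the slides do not define it precisely.

```python
def terminal_loop_span(structure):
    """0-based half-open span of the terminal loop of a simple hairpin."""
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    if last_open == -1 or first_close == -1 or last_open > first_close:
        raise ValueError("structure is not a single unbranched hairpin")
    return last_open + 1, first_close


def overlaps_loop(structure, start, end):
    """True if [start, end) overlaps the terminal loop."""
    loop_start, loop_end = terminal_loop_span(structure)
    return start < loop_end and end > loop_start


def arm(structure, start, end):
    """Return '5p', '3p', or 'loop' for the span [start, end)."""
    loop_start, loop_end = terminal_loop_span(structure)
    if end <= loop_start:
        return "5p"
    if start >= loop_end:
        return "3p"
    return "loop"


def position_ok(structure, start, end, hom_arm, hom_start, delta_len=15):
    """All three checks; hom_arm and hom_start describe the known homologue.

    Assumption: delta_len bounds the difference in start offset.
    """
    if overlaps_loop(structure, start, end):
        return False
    if arm(structure, start, end) != hom_arm:
        return False
    return abs(start - hom_start) <= delta_len


hairpin = "((((....))))"
```

Only `overlaps_loop` needs nothing beyond the candidate itself. The other two checks require the homologue's arm and offset, which is the information reported above as unavailable.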

Page 10: Based on: MicroRNA identification based on  sequence  and structure alignment

Methods (contd)

(4.) RNA secondary structure alignment

RNAforester computes a pairwise structure alignment and gives a similarity score

The score is the sum of all base (base-pair) match, insertion, and deletion scores.

The normalized similarity score of structures C and m is given as: [equation not preserved in the transcript]

An alternative structure alignment program, SimTree, transforms the structures into labeled trees, then computes the distance between them and assigns a normalized score.
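Representing a secondary structure as a labeled tree can be sketched as below. This is a generic conversion from dot-bracket notation, not the encoding used internally by SimTree or RNAforester; the node labels `root`, `P` (base pair), and `U` (unpaired base) are hypothetical choices.

```python
def dot_bracket_to_tree(structure):
    """Convert dot-bracket notation to a nested (label, children) tree.

    'P' nodes are base pairs, 'U' nodes are unpaired bases, and a
    single 'root' node holds the top-level elements.
    """
    root = ("root", [])
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = ("P", [])
            stack[-1][1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: extra ')'")
            stack.pop()
        elif ch == ".":
            stack[-1][1].append(("U", []))
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    return root


def count_nodes(tree):
    """Total number of nodes in the tree, including the root."""
    _, children = tree
    return 1 + sum(count_nodes(child) for child in children)


tree = dot_bracket_to_tree("((..))")
```

Once both structures are trees, comparing them reduces to a tree alignment or tree edit distance problem, which is the general family of methods the two tools above belong to.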

Page 11: Based on: MicroRNA identification based on  sequence  and structure alignment

Methods (contd)

(5.) Total similarity score

After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence.

Where,

C- candidate sequence ; R – set composed of all C’s

Page 12: Based on: MicroRNA identification based on  sequence  and structure alignment

Results

The search for miRNAs in the chicken genome proved somewhat difficult. BLAT was used instead of BLAST because of time constraints

For secondary structure prediction, mFold predicted a lower MFE than RNAfold, on average

T-Coffee could be used for pairwise sequence alignment instead of CLUSTALW, but is about N times slower

Page 13: Based on: MicroRNA identification based on  sequence  and structure alignment

Results (cont.)

Requirements for the position of the mature miRNA on the stem-loop were reduced

  Only the non-loop location condition was enforced

  The orientation (5’ vs. 3’) of known pre-miRNAs was needed to check arm location and hairpin length

  It was previously found that over 97.5% of known animal miRNAs met the non-stringent cutoff of a hairpin length difference of 15

For secondary structure alignments, SimTree was used along with the original RNAforester.

  SimTree uses tree alignment methods similar to those of RNAforester

Page 14: Based on: MicroRNA identification based on  sequence  and structure alignment

Conclusion

Final results are still under analysis

Future work:

  Perform primary sequence steps first, then secondary structure filter steps

    Primary sequence filters provide a greater reduction in the candidate set than the secondary structure-based filters

  Additional secondary structure prediction, primary sequence alignment, and secondary structure alignment tools could be evaluated

    Different combinations of these tools could also lead to better performance

  Tertiary structure tools could supplement/replace some of the filtering steps

Page 15: Based on: MicroRNA identification based on  sequence  and structure alignment

THANK YOU

Questions ??