Matlab Bioinformatics Toolkit Evaluation Kanishka Bhutani
What did I expect?
  Local and global sequence alignments.
  Multiple sequence alignments.
  A choice of scoring matrices (BLOSUM, PAM) for evaluation.
  Building hidden Markov models.
  Easy import of sequences from databases (PFAM, PDB, Swiss-Prot).
What did I find?
  Most of the expected features.
  Bonus: microarray normalization tools.
  Bonus: microarray visualization tools, including box plots and heat maps.
Any surprises?
  No multiple sequence alignment.
  Average and standard deviation of hydrophobicity and solvent accessibility: is there a command for this?
  proteinplot: a GUI for protein structure analysis. Import your file to view it, select parameters, and display statistics.
What did I try?
  Local alignment and global alignment.
  For short sequences: swalign(seq1, seq2) for local alignment, nwalign(seq1, seq2) for global alignment.
  seq1, seq2: amino acid (AA) or nucleotide (NT) sequences.
  For imported long sequences: convert each sequence into a vector of integer values.
  Commands: nt2int, aa2int.
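To make the integer conversion step concrete, here is a minimal Python sketch of what such an encoding does. It is not the toolbox implementation: the function names are hypothetical, and the mapping assumed here (A=1, C=2, G=3, T or U=4, anything else 0) covers only the unambiguous bases, whereas nt2int also assigns codes to ambiguity symbols.

```python
# Assumed mapping for unambiguous bases only; every other symbol becomes 0.
NT_CODES = {"A": 1, "C": 2, "G": 3, "T": 4, "U": 4}
NT_LETTERS = {1: "A", 2: "C", 3: "G", 4: "T"}


def encode_nt(seq):
    """Return one integer code per nucleotide (case-insensitive)."""
    return [NT_CODES.get(base, 0) for base in seq.upper()]


def decode_nt(codes):
    """Map integer codes back to letters; unknown codes become '*'."""
    return "".join(NT_LETTERS.get(code, "*") for code in codes)


if __name__ == "__main__":
    codes = encode_nt("acgtn")
    print(codes)
    print(decode_nt(codes))
```

Working on integer vectors rather than character strings keeps long imported sequences compact and makes lookups into a scoring matrix a plain index operation.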
Pairwise sequence alignment
      S = getgenbank('NM_00001')
      M = getgenbank('NM_00002')
  Output: header information and a sequence.
      K = nt2int(S.Sequence)
      B = nt2int(M.Sequence)
      [sc, align] = nwalign(K, B, 'Alphabet', 'NT')
  Output: alignment score (sc) and aligned sequences (align).
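For readers who want to see what a global alignment computes, the following Python sketch implements the Needleman-Wunsch recurrence with a traceback. It is an illustration only: the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for readability and are not the defaults of nwalign, which uses a scoring matrix and its own gap settings.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    sc, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(sc)
    print(top)
    print(bottom)
```

The two return values mirror the two outputs named on the slide: a single alignment score and the pair of gapped sequences.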
Getting sequences: very easy!
  getgenbank: retrieve sequence information from the GenBank database.
  getembl: retrieve sequence information from the EMBL database.
  getgenpept: retrieve sequence information from the GenPept database.
  gethmmprof: retrieve a profile HMM from the PFAM database.
Experiment
      hmmodel = gethmmprof('PF00001')
Visualization of the model
      showhmmprof(hmmodel, 'Scale', 'logodds')
Get GPCR sequences
      S = getgenbank('NM_024531')
      disp(S.Sequence)
Alignment of the sequences
      var = gethmmalignment('PF00001', 'Type', 'seed')
      disp([char(var.Header) char(var.Sequence)])
For GPCR family C
  Similarly for other families.
  Multiple aligned sequences are retrieved.
GUI: proteinplot
  User friendly.
  Average and standard deviation values for:
    Hydrophobicity.
    Secondary structure propensity (alpha helices or beta strands).
    Accessibility (accessible and buried residues).
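The average and standard deviation of a per-residue property can also be computed by hand, which is useful when the GUI is not available. The Python sketch below does this for hydrophobicity. It assumes the Kyte-Doolittle scale and the population standard deviation; the scale and the statistic that proteinplot actually uses may differ, and the function name is hypothetical.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_stats(seq):
    """Return (mean, population std. dev.) of hydropathy over a protein sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in seq.upper()]
    return mean(values), pstdev(values)


if __name__ == "__main__":
    avg, sd = hydrophobicity_stats("MKVLAAGIVGL")
    print(round(avg, 3), round(sd, 3))
```

The same pattern applies to the other properties on the slide: swap in a different per-residue table and keep the statistics unchanged.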
mGluR1 plot (proteinplot)
Test a sequence with the HMM
  Retrieve mGluR1 from GenBank:
      mgr = getgenbank('NM_012407')
      glusequence = mgr.Sequence
  Test it with the class A HMM model:
      [a, sglu] = hmmprofalign(modelA, glusequence, 'ShowScore', true)
  Score = -203.53
  Seq =
Log-odds score plot for the best path
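A log-odds score compares how probable each residue is under the model with how probable it is under a background distribution, so a strongly negative total suggests the sequence fits the family worse than chance. The Python sketch below shows only that idea on a heavily simplified model: match states only, no insert or delete states, no transition probabilities, and no search over paths. The base-2 logarithm, the function name, and the toy probabilities are all assumptions for illustration, not the computation performed by hmmprofalign.

```python
import math


def log_odds_score(seq, emissions, background):
    """Sum of log2(P(residue | model position) / P(residue | background)).

    emissions: one dict per model position, mapping residue -> probability.
    background: dict mapping residue -> background probability.
    The sequence must have exactly one residue per model position.
    """
    if len(seq) != len(emissions):
        raise ValueError("sequence length must equal the number of model positions")
    return sum(
        math.log2(emissions[i][residue] / background[residue])
        for i, residue in enumerate(seq)
    )


if __name__ == "__main__":
    # Toy two-letter alphabet and a two-position model.
    background = {"A": 0.5, "C": 0.5}
    emissions = [{"A": 0.5, "C": 0.5}, {"A": 0.25, "C": 0.75}]
    print(log_odds_score("AC", emissions, background))
    print(log_odds_score("AA", emissions, background))
```

In this toy model the second position prefers C, so a sequence with A there scores below zero while one with C scores above zero.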
Difficulties & questions
  No multiple sequence alignment.
  Demos: not very helpful.
  Difficult to view the sequences, as no display command was found.
  Bugs:
    Storing huge sequences (GPCR A) in a file causes a parsing error.
    The hmmprofdemo command stops abruptly and gives errors.
    proteinplot (GUI) often hangs the machine.
  Verify the sequences using the HMM models?
  Regular expression matches, and highlighting the matched positions?
Suggested experiment
  Given: an unknown sample dataset of proteins and a known dataset of proteins (with known structural information).
  Use the BLMT to extract over-expressed 4-grams in a protein sequence, or in a group of protein sequences, from the known set.
  Use the regular expression search function in the Matlab toolkit to look for those 4-grams in the unknown proteins, and hence predict their structure.
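The two steps of the suggested experiment can be sketched in Python as follows. This is a stand-in, not the BLMT or the Matlab toolkit: all function names are hypothetical, and "over-expressed" is approximated by a raw count threshold (min_count) rather than a statistical comparison against expected frequencies.

```python
import re
from collections import Counter


def count_ngrams(sequences, n=4):
    """Count every overlapping n-gram across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def frequent_ngrams(sequences, n=4, min_count=2):
    """Return n-grams seen at least min_count times, sorted alphabetically."""
    counts = count_ngrams(sequences, n)
    return sorted(gram for gram, c in counts.items() if c >= min_count)


def find_ngram_positions(seq, ngrams):
    """Return {ngram: [0-based start positions]} for n-grams found in seq.

    A lookahead pattern is used so that overlapping occurrences are reported.
    """
    hits = {}
    for gram in ngrams:
        pattern = re.compile("(?=" + re.escape(gram) + ")")
        positions = [m.start() for m in pattern.finditer(seq)]
        if positions:
            hits[gram] = positions
    return hits


if __name__ == "__main__":
    known = ["MKVLAAGKVLA", "GGKVLAT"]   # toy "known" protein set
    grams = frequent_ngrams(known)
    print(grams)
    print(find_ngram_positions("AKVLAKVLA", grams))  # toy "unknown" protein
```

The reported positions are what a highlighting step would need, and any structural label attached to a 4-gram in the known set could then be transferred to the matching region of the unknown protein.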