  • Matlab Bioinformatics Toolkit Evaluation
    Kanishka Bhutani

  • What I expected
    Local/global sequence alignments.
    Multiple sequence alignments.
    Choice of different scoring matrices (BLOSUM, PAM) for evaluation.
    Building hidden Markov models.
    Easy import of sequences from databases (PFAM, PDB, Swiss-Prot).

  • What I found
    Most of the features.
    Bonus: microarray normalization tools.
    Bonus: microarray visualization tools, including box plots and heat maps.

  • Any surprises?
    No multiple sequence alignments.
    Avg./std. dev. of hydrophobicity and solvent accessibility: is there a command for these?
    proteinplot: GUI for protein structure analysis. Import your file to view, select parameters and display stats.

  • What all I tried?
    Local alignment, global alignment.
    For short sequences: swalign(seq1,seq2), nwalign(seq1,seq2)
    seq1, seq2: AA or NT sequences.
    For imported long sequences: convert the sequence into a vector of integer values.
    Commands: nt2int, aa2int
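A minimal sketch of the short-sequence case, assuming the Bioinformatics Toolbox is installed. The two peptide strings are made-up examples, and the choice of BLOSUM62 and PAM250 is only for illustration of the `'ScoringMatrix'` option, not something taken from the slides.

```matlab
% Two short amino acid sequences (illustrative, not from the slides).
seq1 = 'VSPAGMASGYD';
seq2 = 'IPGKASYD';

% Global (Needleman-Wunsch) alignment with an explicitly chosen matrix.
[globalScore, globalAln] = nwalign(seq1, seq2, 'ScoringMatrix', 'BLOSUM62');

% Local (Smith-Waterman) alignment with a different matrix.
[localScore, localAln] = swalign(seq1, seq2, 'ScoringMatrix', 'PAM250');

% Integer encodings, as used for longer imported sequences.
aaCodes = aa2int(seq1);      % amino acid letters -> integer codes
ntCodes = nt2int('ACGT');    % nucleotide letters -> integer codes

fprintf('Global score: %g, local score: %g\n', globalScore, localScore);
disp(globalAln)
disp(localAln)
```

Each alignment comes back as a three-row character array: the two aligned sequences with the match line between them.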

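  • Pairwise sequence alignment
    S = getgenbank('NM_00001')
    M = getgenbank('NM_00002')
    Output: a header and a sequence.
    K = nt2int(S.Sequence)
    B = nt2int(M.Sequence)
    [sc, align] = nwalign(K, B, 'Alphabet', 'NT')
    sc: alignment score. align: aligned sequences.
    Note: nwalign treats its input as amino acids unless 'Alphabet' is set to 'NT'.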
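The same steps can be sketched offline with two literal nucleotide strings in place of the GenBank records. The sequences below are made up for illustration, and the Bioinformatics Toolbox is assumed to be available.

```matlab
% Stand-ins for S.Sequence and M.Sequence (illustrative strings only).
s1 = 'GATTACAGATTACA';
s2 = 'GATCACAGTTACA';

% Encode the nucleotides as integer vectors.
K = nt2int(s1);
B = nt2int(s2);

% Global alignment; the alphabet must be stated for nucleotide input.
[sc, aln] = nwalign(K, B, 'Alphabet', 'NT');

fprintf('Global alignment score: %g\n', sc);
disp(aln)
```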

  • Getting sequences: very easy!
    getgenbank: retrieve sequence information from the GenBank database.
    getembl: retrieve sequence information from the EMBL database.
    getgenpept: retrieve sequence information from the GenPept database.
    gethmmprof: get an HMM profile from the PFAM database.

  • Experiment
    hmmodel = gethmmprof('PF00001')

  • Visualization of model
    showhmmprof(hmmodel, 'Scale', 'logodds')

  • Get GPCR seqs
    S = getgenbank('NM_024531')
    disp(S.Sequence)

  • Alignment of the seqs
    var = gethmmalignment('PF00001', 'Type', 'seed')
    disp([char(var.Header) char(var.Sequence)])

  • For GPCR family C
    Similarly for different families.
    Multiple aligned sequences retrieved.

  • GUI proteinplot
    User friendly.
    Avg./std. dev. values for:
    Hydrophobicity.
    Secondary structure propensity (alpha helices or beta strands).
    Accessibility (accessible and buried residues).
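For a quick command-line check of the hydrophobicity figures, a rough sketch in base MATLAB could look like the following. It is not what proteinplot does internally: it assumes the Kyte-Doolittle scale, averages over the whole sequence rather than a sliding window, and uses a made-up test sequence.

```matlab
% Kyte-Doolittle hydropathy values, listed in the same order as 'letters'.
letters = 'ARNDCQEGHILKMFPSTWYV';
kd = [ 1.8 -4.5 -3.5 -3.5  2.5 -3.5 -3.5 -0.4 -3.2  4.5 ...
       3.8 -3.9  1.9  2.8 -1.6 -0.8 -0.7 -0.9 -1.3  4.2];

seq = 'MKTAYIAKQR';   % illustrative sequence, not from the slides

% Map each residue to its scale value; unknown symbols are skipped.
[found, idx] = ismember(upper(seq), letters);
h = kd(idx(found));

avgH = mean(h);
stdH = std(h);
fprintf('Mean hydrophobicity: %.2f, std. dev.: %.2f\n', avgH, stdH);
```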

  • Mglur1 plot (Proteinplot)

  • Mglur1 results

  • Test a seq. with HMM
    Retrieve mGluR1 from GenBank:
    mgr = getgenbank('NM_012407')
    glusequence = mgr.Sequence
    Test it with the HMM model for class A:
    [a, sglu] = hmmprofalign(modelA, glusequence, 'ShowScore', true)
    Score = -203.53
    Seq =

  • Log-odds score plot for best path

  • Difficulties & questions
    No multiple sequence alignment.
    Demos: not very helpful.
    Difficult to view the sequences, as no disp command was found.
    Bugs:
    Storing huge sequences (GPCR A) in a file gives a parsing error.
    The hmmprofdemo command abruptly stops and gives errors.
    proteinplot (GUI) often hangs the machine.
    Verify the sequences using the HMM models?
    Regular expression matches, and highlighting those positions?
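On the last question, one possible approach in base MATLAB is to locate matches with `regexp` and mark them by case. The sequence and the motif below are made up for illustration; uppercase-on-lowercase is just one simple way to highlight positions in plain text.

```matlab
seq   = 'MKTAYIAKQRNGSAYIAKNATL';   % illustrative sequence
motif = 'N[^P][ST]';                % illustrative pattern: N, any residue but P, then S or T

% Start and end indices of every non-overlapping match.
[starts, ends] = regexp(seq, motif, 'start', 'end');

% Highlight: whole sequence in lowercase, matched positions in uppercase.
marked = lower(seq);
for k = 1:numel(starts)
    marked(starts(k):ends(k)) = upper(marked(starts(k):ends(k)));
end

disp(starts)
disp(marked)
```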

  • Suggestions of experiment
    Given: an unknown sample dataset of proteins, and a known dataset of proteins (with known structural information).
    Use the BLMT to extract overexpressed 4-grams in a protein sequence, or a group of protein sequences, from the known set.
    Use the regular expression search function in the Matlab toolkit to look for those 4-grams in the unknown proteins, and hence predict their structure.
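The shape of this experiment can be sketched in base MATLAB alone. The sketch does not use the BLMT: plain 4-gram counting stands in for its extraction step, a raw count stands in for "overexpressed" (a real analysis would compare against a background frequency), and all sequences are made up.

```matlab
% Toy datasets (illustrative only).
known   = {'MKTAYIAKQR', 'GGTAYIAKLL', 'TAYIPPPPKQ'};
unknown = {'AAATAYIGGG', 'CCCCCCCCCC'};
n = 4;

% Count every overlapping 4-gram across the known set.
counts = containers.Map('KeyType', 'char', 'ValueType', 'double');
for i = 1:numel(known)
    s = known{i};
    for j = 1:(length(s) - n + 1)
        gram = s(j:j+n-1);
        if isKey(counts, gram)
            counts(gram) = counts(gram) + 1;
        else
            counts(gram) = 1;
        end
    end
end

% Pick the most frequent 4-gram (raw count, no background correction).
grams = keys(counts);
vals  = cell2mat(values(counts));
[topCount, idx] = max(vals);
topGram = grams{idx};

% Search for that 4-gram in each unknown sequence.
hits = cellfun(@(s) regexp(s, topGram, 'start'), unknown, 'UniformOutput', false);

fprintf('Most frequent 4-gram: %s (%d occurrences)\n', topGram, topCount);
for i = 1:numel(unknown)
    fprintf('%s: match positions [%s]\n', unknown{i}, num2str(hits{i}));
end
```

Going from a 4-gram hit to a structure prediction would still need the structural annotations of the known set, which this sketch leaves out.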