8024 Bio Info


Transcript of 8024 Bio Info

  • 8/3/2019 8024 Bio Info


    Introduction to Bioinformatics


    What is Bioinformatics?

    Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical, and biophysical information.

    Bioinformatics is a computational science and a subset of the larger field of computational biology.


    What is Bioinformatics?

    Bioinformatics is the use of computers to study biology.

    Bioinformatics is the science of using information to understand biology.

    Bioinformatics is the integration of information technology (IT) and biology.

    Bioinformatics is the development of computational methods for studying the structure, function, and evolution of genes, proteins, and whole genomes.


    Some Terminology

    The cell is the basic unit of life.

    A cell consists of molecules, chemical reactions, and a copy of the genome for that organism.

    All life on this planet depends on three types of molecules: DNA, RNA, and proteins.


    Some Terminology

    DNA

    Holds the information on how the cell works.

    RNA

    Transfers short pieces of information to different parts of the cell.

    Provides templates from which proteins are synthesized.

    Proteins

    Form enzymes, send signals to other cells, and regulate gene activity.

    Form the body's major components (e.g., hair and skin).


    DNA - Deoxyribonucleic Acid

    Genetic material

    Consists of two long strands

    Each strand is made of nucleotides, and each nucleotide contains:

    a phosphate

    a sugar (deoxyribose)

    a base: A (adenine), G (guanine), C (cytosine), or T (thymine)


    DNA: Double Helix Structure
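In the double helix, each base on one strand pairs with a complementary base on the other (A with T, G with C), and the two strands run in opposite directions. A minimal Python sketch, assuming the input is an uppercase string over A, C, G, T only, derives the partner strand as the reverse complement; `reverse_complement` is an illustrative name, not a library function.

```python
# Watson-Crick pairing: A<->T, G<->C (assumes uppercase DNA letters only)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' like the input."""
    if set(strand) - set("ACGT"):
        raise ValueError("expected only A, C, G, T")
    # Complement every base, then reverse, because the strands are antiparallel.
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original strand, which mirrors the fact that each strand fully determines the other.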


    The Central Dogma of Molecular Biology

    Information is transferred from DNA (the information storage molecule) to RNA (the information transfer molecule) to a specific protein (a functional, non-coding product).

    DNA → (transcription) → RNA → (translation) → Protein


    More Terminology: Transcription of DNA

    DNA is transcribed into RNA.

    RNA exists as a single-stranded unit and, in some regions, as a double helix as well.

    RNA consists of A, C, G, and U (uracil).

    Types of RNA

    Messenger RNA (mRNA)

    Transfer RNA (tRNA)

    Ribosomal RNA (rRNA)
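As a simplified sketch of transcription: the mRNA carries the same sequence as the DNA coding (sense) strand, with U in place of T. The example assumes the input is the coding strand written 5'->3'; the actual process (RNA polymerase copying the template strand, promoters, RNA processing) is not modeled, and `transcribe` is an illustrative name.

```python
def transcribe(coding_strand: str) -> str:
    """Simplified transcription: copy the coding strand, replacing T with U."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("expected a DNA sequence over A, C, G, T")
    return dna.replace("T", "U")


print(transcribe("ATGTTTGGCTAA"))  # AUGUUUGGCUAA
```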


    More Terminology

    Translation of messenger RNA (mRNA):

    mRNA is translated into protein.

    Proteins:

    linear polymers built from amino acids

    The transfer of information from DNA to a specific protein via RNA takes place according to the genetic code.

    The RNA sequence is divided into blocks of three letters. Each block is called a CODON.

    Each codon corresponds to a specific amino acid (a few codons instead signal the end of translation).
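A minimal Python sketch of translation, using the standard genetic code with one-letter amino-acid symbols. It assumes the mRNA starts exactly at the first codon (there is no search for a start codon or untranslated region) and stops at the first stop codon; `CODON_TABLE` and `translate` are illustrative names.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G
# (first codon position varies slowest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read mRNA codon by codon from the first base until a stop codon."""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUGGCUAA"))  # MFG
```

Here AUG, UUU and GGC give methionine (M), phenylalanine (F) and glycine (G), and UAA stops translation.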


    More Terminology

    Four different nucleotides are used to build DNA and RNA molecules: A, G, C, T in DNA and A, G, C, U in RNA.

    20 different amino acids are used in protein synthesis.

    Four nucleotides can be arranged in 64 different combinations of three: there are 4 × 4 × 4 = 64 different codons.

    Some codons are redundant (several codons code for the same amino acid), and some have the special function of terminating the translation process.
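Worked out, the numbers on this slide fit together as follows. The count of three stop codons (UAA, UAG, UGA) refers to the standard genetic code; the first line shows why codons must be at least three letters long.

```latex
\begin{aligned}
4^{2} &= 16 < 20 && \text{(two-letter words could not encode 20 amino acids)}\\
4^{3} &= 4 \times 4 \times 4 = 64 && \text{(three-letter codons)}\\
64 - 3 &= 61 && \text{(sense codons, after removing the stop codons)}\\
61 / 20 &\approx 3.05 && \text{(average number of codons per amino acid)}
\end{aligned}
```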


    Why is bioinformatics important?

    Traditionally, research was carried out entirely in the experimental laboratory, but the huge increase in data in the genomic era has created a need to incorporate computers into the research process.

    There are three central biological processes around which bioinformatics tools must be developed:

    DNA sequence determines protein sequence

    Protein sequence determines protein structure

    Protein structure determines protein function


    Major research areas

    Sequence analysis: A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species. It includes:

    the comparison of sequences in order to find similar and dissimilar regions (sequence alignment)

    identification of gene structures, reading frames, distributions of introns and exons, and regulatory elements

    revealing the evolution and genetic diversity of organisms.
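Sequence alignment can be illustrated with the classic Needleman-Wunsch dynamic-programming algorithm for global pairwise alignment. The sketch assumes a simple scoring scheme chosen for illustration (match +1, mismatch -1, gap -1); real tools typically use substitution matrices such as BLOSUM or PAM and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two letters
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GATCA")
print(score)   # 3
print(top)
print(bottom)
```

With these scores, GATTACA and GATCA align with five matches and two gaps (score 3); where the gaps are placed depends on how ties are broken during traceback.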


    Computational evolutionary biology: Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:

    trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone,

    build complex computational models of populations to predict the outcome of the system over time, and

    track and share information on an increasingly large number of species and organisms.
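One simple way to measure changes in DNA between organisms is the p-distance: the proportion of aligned sites at which two sequences differ. The sketch assumes the sequences are already aligned to equal length and skips positions with a gap ('-'); substitution models such as Jukes-Cantor correct this raw proportion for multiple changes at the same site, which is not done here. The taxon names and sequences are made-up toy data.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq1, seq2):
        if x == "-" or y == "-":
            continue  # skip gap positions
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(seqs: dict) -> dict:
    """Pairwise p-distances for every pair of named, aligned sequences."""
    return {(a, b): p_distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}


aligned = {"taxonA": "ACGTACGTAC", "taxonB": "ACGTACGAAC", "taxonC": "ACTTACGAAT"}
print(distance_matrix(aligned))
# {('taxonA', 'taxonB'): 0.1, ('taxonA', 'taxonC'): 0.3, ('taxonB', 'taxonC'): 0.2}
```

A distance matrix like this is the usual input to distance-based tree-building methods.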


    Prediction of protein structure: Protein structure prediction is another important application of bioinformatics.

    In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function.

    MODELLER is one of the most widely used programs for homology modelling. The Protein Data Bank (PDB) is the database of 3D coordinates of proteins.
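Coordinates from the Protein Data Bank are commonly distributed in the legacy PDB format, where each atom occupies one fixed-width ATOM or HETATM line with x, y and z (in angstroms) in columns 31-38, 39-46 and 47-54. A minimal sketch, assuming well-formed legacy PDB text (not the newer mmCIF format), reads those coordinates and measures an interatomic distance; the two sample lines are hand-made, not taken from a real entry, and the function names are illustrative.

```python
import math


def parse_atoms(pdb_text: str) -> list:
    """Return (atom_name, residue_name, chain, x, y, z) for each ATOM/HETATM line."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append((
                line[12:16].strip(),  # atom name, columns 13-16
                line[17:20].strip(),  # residue name, columns 18-20
                line[21],             # chain identifier, column 22
                float(line[30:38]),   # x, columns 31-38
                float(line[38:46]),   # y, columns 39-46
                float(line[46:54]),   # z, columns 47-54
            ))
    return atoms


def distance(atom1: tuple, atom2: tuple) -> float:
    """Euclidean distance between two parsed atoms, in the file's units."""
    return math.dist(atom1[3:], atom2[3:])


sample = (
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000\n"
    "ATOM      2  CA  GLY A   2       3.000   4.000   0.000\n"
)
atoms = parse_atoms(sample)
print(atoms[0][:3], distance(atoms[0], atoms[1]))  # ('CA', 'ALA', 'A') 5.0
```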


    Drug designing: Drug design is the approach of finding drugs by design, based on their biological targets. Computer-assisted drug design uses computational chemistry to discover, enhance, or study drugs and related biologically active molecules.

    Phylogenetics: Predicting the genetic or evolutionary relationships of a set of organisms. Mitochondrial SNPs and microsatellites (DNA repeats) are the markers most often used in phylogenetics. MEGA and PAUP* are some of the important software packages. Maximum parsimony and maximum likelihood are the most commonly used methods.

    MEGA: http://www.megasoftware.net/
    PAUP*: http://paup.csit.fsu.edu/
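Maximum parsimony scores a tree by the minimum number of character changes it requires. The sketch below implements Fitch's small-parsimony algorithm for a fixed, rooted binary tree written as nested tuples of taxon names, assuming the sequences are already aligned to equal length. The taxa and sequences are made-up toy data, and a real analysis (as in PAUP* or MEGA) would also search over many tree topologies rather than score two by hand.

```python
def fitch_site(tree, states: dict):
    """Fitch's algorithm at one site: return (candidate state set, change count)."""
    if isinstance(tree, str):  # leaf: the taxon's observed character
        return {states[tree]}, 0
    left, right = tree  # internal node with exactly two children
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change here


def parsimony_score(tree, sequences: dict) -> int:
    """Minimum number of changes on a fixed tree, summed over all aligned sites."""
    length = len(next(iter(sequences.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in sequences.items()}
        total += fitch_site(tree, column)[1]
    return total


seqs = {"s1": "ACGT", "s2": "ACGA", "s3": "GCTT", "s4": "GCTA"}
print(parsimony_score((("s1", "s2"), ("s3", "s4")), seqs))  # 4
print(parsimony_score((("s1", "s3"), ("s2", "s4")), seqs))  # 5
```

Under parsimony the first tree, which needs fewer changes, would be preferred.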

    Biological databases: why?

    The need for storing and communicating large datasets has grown.

    To make biological data available to scientists.

    To make biological data available in computer-readable form.
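FASTA format is one of the simplest computer-readable formats for sequence data: a header line starting with '>' followed by one or more sequence lines. A minimal parser sketch, assuming well-formed input (sequence letters are not validated, and a repeated header silently overwrites the earlier record); `parse_fasta` is an illustrative name, and the records are toy data.

```python
def parse_fasta(text: str) -> dict:
    """Map each record's header (text after '>') to its concatenated sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>seq1 toy example
ACGTACGT
ACGT
>seq2 another toy example
MKTAYIAK
"""
print(parse_fasta(example))
# {'seq1 toy example': 'ACGTACGTACGT', 'seq2 another toy example': 'MKTAYIAK'}
```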


    Types of data

    nucleotide sequences

    protein sequences

    protein sequence patterns or motifs

    macromolecular 3D structures

    gene expression data

    metabolic pathways


    Different classifications of databases

    Primary or derived databases

    Primary databases: experimental results are entered directly into the database

    Secondary databases: results of the analysis of primary databases

    Aggregate of many databases

    Links to other data items

    Combination of data

    Consolidation of data


    Nucleotide sequence databases

    EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases.

    EMBL www.ebi.ac.uk/embl/

    GenBank www.ncbi.nlm.nih.gov/Genbank/

    DDBJ www.ddbj.nig.ac.jp


    GenBank

    An annotated collection of all publicly available nucleotide and protein sequences

    Set up in 1979 at LANL (Los Alamos).

    Maintained since 1992 by NCBI (Bethesda).

    http://www.ncbi.nlm.nih.gov
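GenBank records can also be retrieved programmatically through NCBI's E-utilities web service. The sketch only builds an `efetch` request URL for a nucleotide record (the `nuccore` database) and does not download anything; NM_000518 is used purely as an example RefSeq accession, and `efetch_url` is an illustrative helper name. NCBI's usage guidelines (request-rate limits, optional API key) apply when actually sending requests.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL that returns the record as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


print(efetch_url("NM_000518"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NM_000518&rettype=fasta&retmode=text
```

Passing `rettype="gb"` requests the GenBank flat-file format instead of FASTA; the resulting URL could then be opened with `urllib.request.urlopen`.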


    EMBL Nucleotide Sequence DB

    An annotated collection of all publicly available nucleotide and protein sequences

    Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.

    Maintained since 1994 by EBI-Cambridge.

    http://www.ebi.ac.uk/embl.html


    DDBJ: DNA Data Bank of Japan

    An annotated collection of all publicly available nucleotide and protein sequences

    Started in 1984 at the National Institute of Genetics (NIG) in Mishima.

    Still maintained at this institute by a team led by Takashi Gojobori. http://www.ddbj.nig.ac.jp


    Other NCBI nucleic acid DBs

    EST database: A collection of expressed sequence tags, or short, single-pass sequence reads from mRNA (cDNA).

    GSS database: A database of genome survey sequences, or short, single-pass genomic sequences.

    HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.

    HTG database: A collection of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences.

    SNPs database: A central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.

    RefSeq: A database of non-redundant reference sequence standards, including genomic DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within NCBI and with external groups, support data-gathering efforts.

    STS database: A database of sequence tagged sites, or short sequences that are operationally unique in the genome.

    UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).

    UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters, each representing a unique known or putative human gene annotated with mapping and expression information and cross-references to other sources.

    http://www.ncbi.nlm.nih.gov/dbEST/index.html
    http://www.ncbi.nlm.nih.gov/dbGSS/index.html
    http://www.ncbi.nlm.nih.gov/HomoloGene/
    http://www.ncbi.nlm.nih.gov/HTGS/
    http://www.ncbi.nlm.nih.gov/SNP/
    http://www.ncbi.nlm.nih.gov/RefSeq/
    http://www.ncbi.nlm.nih.gov/dbSTS/index.html
    http://www.ncbi.nlm.nih.gov/genome/sts/
    http://www.ncbi.nlm.nih.gov/UniGene/

    Bioinformatics Tools

    BLAST: The Basic Local Alignment Search Tool (BLAST) compares gene and protein sequences against others in public databases. It now comes in several types, including PSI-BLAST, PHI-BLAST, and BLAST 2 Sequences. Specialized BLASTs are also available.

    FASTA: A database search tool used to compare a nucleotide or peptide sequence to a sequence database. It was the first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words".
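The "words" idea can be sketched as: index every k-letter word of the database sequence, then look up each word of the query to find exact seed matches that a BLAST- or FASTA-style program would go on to extend into local alignments. This is a simplified illustration assuming exact word matches only and an arbitrary word size; BLAST also scores near-matching words, and both programs add extension and scoring steps not shown here. The function names are illustrative.

```python
from collections import defaultdict


def word_index(sequence: str, k: int) -> dict:
    """Map every length-k word to the positions where it starts in sequence."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query: str, subject: str, k: int = 3) -> list:
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = word_index(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


print(find_seeds("GATTACA", "TTGATTAC", k=4))  # [(0, 2), (1, 3), (2, 4)]
```

All three seeds lie on the same diagonal (subject position minus query position is 2), which is the kind of clustering a word-based method looks for before attempting an alignment.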


    ClustalW: ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences, calculates the best match for the selected sequences, and lines them up so that the identities, similarities, and differences can be seen.

    RasMol: A powerful research tool to display the structure of DNA, proteins, and smaller molecules. Protein Explorer, a derivative of RasMol, is an easier-to-use program.

    DeepView (also known as Swiss-PdbViewer): For viewing and exploring macromolecular models in three dimensions, and for manual and semi-automated homology modeling.
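Once ClustalW has lined sequences up, identical columns can be flagged. This sketch marks only fully conserved, gap-free columns with '*', in the spirit of the conservation line printed beneath ClustalW alignments; ClustalW also marks similar columns with ':' and '.', which needs amino-acid similarity groups not modeled here. The input alignment is made-up toy data, and `conservation_line` is an illustrative name.

```python
def conservation_line(alignment: list) -> str:
    """Mark columns where every sequence has the same non-gap residue with '*'."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*alignment):
        residues = set(column)
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


aligned = ["MKT-AYIA", "MKTLAYIA", "MRT-AYVA"]
for seq in aligned:
    print(seq)
print(conservation_line(aligned))  # "* * ** *"
```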


    Conclusion

    Bioinformatics in India is at an early stage of development. But at four to five centers in the country, one sees a mature understanding of the needs of this sector and world-class development of tools and applications. These centers will ensure that India's traditional strengths in IT are leveraged to place us on par with the developed countries.