Introduction to Bioinformatics 234525-236523 Lecturer: Dr. Yael Mandel-Gutfreund Teaching...

Post on 17-Jan-2016


Introduction to Bioinformatics 234525-236523

Lecturer: Dr. Yael Mandel-Gutfreund

Teaching Assistants:

Martin Akerman

Sivan Bercovici

Course web site: http://webcourse.cs.technion.ac.il/234525

2

What is Bioinformatics?

3

Course Objectives

• To introduce the bioinformatics discipline

• To make the students familiar with the major biological questions which can be addressed by bioinformatics tools

• To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations, etc.)

4

Course Structure and Requirements

1. Class structure
   • 2-hour lecture
   • 1-hour tutorial

2. Homework
   • Homework projects will be given every second week
   • The homework will be done in pairs
   • 5/5 homework projects submitted

3. A final project will be conducted and submitted in pairs

5

Grading

• 30% homework assignments

• 70% final project

6

Literature list

• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.

• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.

• Mount, D.W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.

Advanced Reading

• Jones, N.C., Pevzner, P.A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

7

What is Bioinformatics?

8

What is Bioinformatics?

“The field of science in which biology, computer science, and information technology merge to form a single discipline”

Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

9

Bioinformatics: Bio = Informatics

From a purely lab-based science to an information science

10

Central Paradigm in Molecular Biology

Gene (DNA) → mRNA → Protein

21st century:

Genome → Transcriptome → Proteome
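
The Gene (DNA) → mRNA → Protein flow can be illustrated with a minimal sketch. It assumes the input is the coding strand, so transcription reduces to replacing T with U. The toy gene and the deliberately partial codon table are illustrative and not part of the course material.

```python
# Partial RNA codon table: only the codons used by the toy gene below.
RNA_CODONS = {
    "AUG": "M",  # Met (start)
    "CAA": "Q",  # Gln
    "UAU": "Y",  # Tyr
    "GGA": "G",  # Gly
    "UUG": "L",  # Leu
    "UGA": "*",  # stop
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = RNA_CODONS[mrna[i:i + 3]]  # KeyError for codons outside the toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_gene = "ATGCAATATGGACAATTGTGA"   # hypothetical example sequence
mrna = transcribe(toy_gene)          # AUGCAAUAUGGACAAUUGUGA
protein = translate(mrna)            # MQYGQL
print(mrna, protein)
```

A real translation step would use the full 64-codon genetic code; the six-entry table here only keeps the example short.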

11

Genome

• Chromosomal DNA of an organism

• Coding and non-coding DNA

• Genome size and number of genes do not necessarily determine organism complexity

12

Transcriptome

• Complete collection of all possible mRNAs (including splice variants) of an organism.

• Regions of an organism’s genome that get transcribed into messenger RNA.

• Transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes.

13

Proteome

• The complete collection of proteins that can be produced by an organism.

• Can be studied either as a static (sum of all proteins possible) or a dynamic (all proteins found at a specific time point) entity

14

From DNA to Genome

Timeline (axis 1955 to 1985): Watson and Crick DNA model; first protein sequence (1955); first protein structure

15

Timeline (axis 1990 to 2000): first bacterial genome (Haemophilus influenzae); yeast genome; first human genome draft (2000)

16

Complete Genomes

             2008   2007
Eukaryotes     78     43
Bacteria      578    383
Archaea        50     29
Total         706    456

17

How human are chimps?

Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%

Perhaps not surprising!

18

What’s Next?

The “post-genomics” era

Goal: to understand the living cell

• Annotation
• Comparative genomics
• Structural genomics
• Functional genomics

19

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG AAT ......

.............. TGAAAAACGTA

Annotation

20

Annotation

Identify the genes within a given sequence of DNA

Identify the sites which regulate the gene

Predict the function

21

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG

AAT .................................

.............. TGAAAAACGTA

Labels on the sequence:

• TF binding site
• Promoter
• Transcription Start Site
• Ribosome binding site
• ORF = Open Reading Frame
• CDS = Coding Sequence
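
As a rough illustration of what locating an ORF involves, here is a minimal sketch that scans the three forward reading frames for an ATG followed by an in-frame stop codon. The function name `find_orfs` and the toy sequence are assumptions for illustration. The toy sequence reuses the first codons shown on the slide and appends a stop codon directly, since the slide elides the middle of the gene. Real gene finders also consider the reverse strand, promoters, and ribosome binding sites.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop stretch on the forward strand.

    Coordinates are 0-based, end-exclusive. min_codons counts the start and
    stop codons as well.
    """
    dna = dna.upper().replace(" ", "")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                   # close it and keep scanning
    return sorted(orfs)


# Hypothetical toy sequence: two flanking bases, ATG CAA TAT GGA CAA TTG, a TGA stop, two more bases.
toy = "CC" + "ATGCAATATGGACAATTG" + "TGA" + "AA"
print(find_orfs(toy))   # [(2, 23, 'ATGCAATATGGACAATTGTGA')]
```

The ATG inside `TATGGA` opens a candidate in another frame, but no in-frame stop follows it, so only one ORF is reported.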

22

Comparative genomics

Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGG--GGATGCGGGCCCTATACCC
Mouse ATAGCG---GGATGCGGCGC-TATACCA
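
The alignment above can be summarized numerically. The sketch below counts identical columns for each pair of rows. It assumes one particular convention: a gap against a base counts as a mismatch, and columns where both rows have a gap are skipped. The helper name `aligned_identity` is hypothetical, and other identity definitions exist.

```python
from itertools import combinations

alignment = {
    "Human": "ATAGCGGGGGGATGCGGGCCCTATACCC",
    "Chimp": "ATAGGGG--GGATGCGGGCCCTATACCC",
    "Mouse": "ATAGCG---GGATGCGGCGC-TATACCA",
}


def aligned_identity(a, b, gap="-"):
    """Return (matches, compared_columns) for two rows of the same alignment."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    # Skip columns in which both rows carry a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    matches = sum(1 for x, y in columns if x == y)
    return matches, len(columns)


for name_a, name_b in combinations(alignment, 2):
    matches, columns = aligned_identity(alignment[name_a], alignment[name_b])
    print(f"{name_a} vs {name_b}: {matches}/{columns} = {matches / columns:.1%}")
```

On this short excerpt human and chimp agree in 25 of 28 columns, while human and mouse agree in 21 of 28, which matches the expected ordering of relatedness.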

23

Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.

Conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.

24

Functional genomics

25

Understanding the function of genes and other parts of the genome

26

A large network of 8184 interactions among 4140 S. cerevisiae proteins

A network of interactions can be built for all proteins in an organism
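
One common way to hold such a network in memory is an undirected adjacency map built from pairwise interactions. The sketch below is illustrative only: the protein names `P1` to `P4` and the interaction pairs are made up, not taken from the yeast data set.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map {protein: set of partners}."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


# Hypothetical interaction pairs.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
network = build_network(pairs)

# Degree = number of interaction partners; the highest-degree node is a "hub".
degrees = {protein: len(partners) for protein, partners in network.items()}
hub = max(degrees, key=degrees.get)
print(degrees, hub)
```

With 8184 interactions among 4140 proteins, the same structure would have an average degree of about four partners per protein.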

27

Structural genomics

28

Assigning the structures of all proteins

• Protein-ligand complexes
• Functional sites
• Fold
• Evolutionary relationship
• Shape and electrostatics
• Active sites
• Protein complexes
• Biological processes

29

Resources and Databases

The different types of data are collected in databases:

– Sequence databases
– Structural databases
– Databases of experimental results

All databases are connected

30

Sequence databases

• Gene database

• Genome database

• SNP database

• Disease-related mutation database

31

Gene database

• Provide information on gene function

• Alternative splicing of genes
  – Alternative patterns of exons included to create the gene product

• ESTs (expressed sequence tags)

32

Genome Databases

• Data organized by species

• Clones assembled into contiguous pieces (‘contigs’) or whole chromosomes

• Information on non-coding regions

• Relativity

33

Genome Browsers

• Annotation adds value to sequence

• Easy “walk” through the genome

• Comparative genomics

34

Genome Browsers

• UCSC Genome Browser: http://genome.ucsc.edu/

• Ensembl Genome Browser: http://www.ensembl.org

• WormBase: http://www.wormbase.org/

• AceDB: http://www.acedb.org/

• Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl

• FlyBase: http://flybase.bio.indiana.edu/

35

SNP database

Single Nucleotide Polymorphisms (SNPs)

• A single-base difference at a single position between two individuals of the same species

• Play an important role in differentiation and disease

36

Sickle Cell Anemia

• Due to a single-base substitution of T for A (GAG → GTG), causing the inserted amino acid to be valine instead of glutamic acid in hemoglobin

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

37

Healthy Individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
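
The header lines starting with `>` follow the older NCBI FASTA convention of pipe-separated identifiers. A minimal sketch of splitting such a header into its parts is shown below. The function name `parse_ncbi_header` and the field names in the returned dictionary are assumptions for illustration, and the sketch only handles the `gi|number|db|accession|description` layout seen on these slides.

```python
def parse_ncbi_header(line):
    """Split a '>gi|<gi>|<db>|<accession>| <description>' FASTA header into fields."""
    if not line.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    fields = line[1:].split("|", 4)            # at most 5 parts; description keeps any later '|'
    if len(fields) != 5 or fields[0] != "gi":
        raise ValueError("unexpected header layout")
    return {
        "gi": fields[1],
        "db": fields[2],
        "accession": fields[3],
        "description": fields[4].strip(),
    }


header = ">gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA"
record = parse_ncbi_header(header)
print(record["accession"], "|", record["description"])
```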

38

Diseased Individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
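
The single-base difference behind the E → V change in the two protein sequences can be located programmatically. The sketch below compares the first ten codons of the coding region (starting at the ATG). The diseased excerpt carries the A → T change in the codon that yields the valine seen in the diseased protein (GAG → GTG). The helper names are hypothetical, and the two-entry codon lookup covers only the affected codon.

```python
# Only the two codons involved in the change.
AFFECTED_CODONS = {"GAG": "Glu (E)", "GTG": "Val (V)"}


def point_differences(a, b):
    """Return (index, base_a, base_b) for every mismatching position (0-based)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def codon_at(cds, index):
    """Return (codon_number, codon) for the codon containing a 0-based CDS index."""
    start = index - index % 3
    return start // 3 + 1, cds[start:start + 3]


# First 30 bases of the coding sequence: ATG GTG CAT CTG ACT CCT GAG GAG AAG TCT
healthy = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
diseased = "ATGGTGCATCTGACTCCTGTGGAGAAGTCT"

for index, ref_base, alt_base in point_differences(healthy, diseased):
    number, ref_codon = codon_at(healthy, index)
    _, alt_codon = codon_at(diseased, index)
    print(f"CDS position {index + 1}: {ref_base}>{alt_base}, codon {number} "
          f"{ref_codon} ({AFFECTED_CODONS[ref_codon]}) -> "
          f"{alt_codon} ({AFFECTED_CODONS[alt_codon]})")
```

Codon 7 here counts the initiator methionine as codon 1, which is why it corresponds to the seventh residue of the protein sequences shown on the slides.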

39

Disease Databases

• Genes are involved in disease

• Many diseases are well studied

• Description of diseases and what is known about them is stored

40

Structure Databases

• 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.

• 3-D data is available due to techniques such as NMR and X-ray crystallography

41

42

Databases of Experimental Results

• Data such as experimental microarray images (expression data)

• Proteomic data

• Metabolic pathways, protein-protein interaction data, regulatory networks

• Etc.

43

Literature Databases

PubMed

• MEDLINE publication database
  – Over 17,000 journals
  – 15 million citations since 1950

Service of the National Library of Medicine

http://www.ncbi.nlm.nih.gov/PubMed
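
PubMed can also be queried programmatically. The sketch below only builds a search URL for NCBI's E-utilities ESearch endpoint and does not send any request. The choice of E-utilities, the helper name `pubmed_search_url`, and the example query are assumptions added for illustration and are not part of the slides.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, max_results=20):
    """Build an ESearch URL that asks PubMed for record IDs matching `term`."""
    params = {
        "db": "pubmed",          # database to search
        "term": term,            # query text
        "retmax": max_results,   # maximum number of IDs to return
        "retmode": "json",       # response format
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


print(pubmed_search_url("hemoglobin beta", max_results=5))
```

Fetching the URL (for example with `urllib.request.urlopen`) would return the matching PubMed IDs; NCBI's usage guidelines on request rates apply.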

44

Putting it all Together

• Each Database contains specific information

• Like biological systems, these databases are interrelated

45

GENOMIC DATA: GenBank, DDBJ, EMBL

ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR

PROTEIN: PIR, SWISS-PROT

STRUCTURE: PDB, MMDB, SCOP

LITERATURE: PubMed

PATHWAY: KEGG, COG

DISEASE: LocusLink, OMIM, OMIA

GENES: RefSeq, AllGenes, GDB

SNPs: dbSNP

ESTs: dbEST, UniGene

MOTIFS: BLOCKS, Pfam, Prosite

GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress