Introduction to Bioinformatics 236523/234525


Page 1: Introduction to Bioinformatics 236523/234525

Introduction to Bioinformatics 236523/234525

Lecturer: Prof. Yael Mandel-Gutfreund

Teaching Assistants: Shula Shazman, Idit Kosti

Course web site: http://webcourse.cs.technion.ac.il/236523

Page 2: Introduction to Bioinformatics 236523/234525

2

What is Bioinformatics?

Page 3: Introduction to Bioinformatics 236523/234525

3

Course Objectives

• To introduce the bioinformatics discipline

• To familiarize students with the major biological questions that can be addressed with bioinformatics tools

• To introduce the major tools used for sequence and structure analysis and explain, in general terms, how they work and what their limitations are

Page 4: Introduction to Bioinformatics 236523/234525

4

Course Structure and Requirements

1. Class structure: 2-hour lecture and 1-hour tutorial

2. Homework

• Homework assignments will be given every second week

• Homework will be done in pairs

• All 5 homework assignments must be submitted

3. A final project will be conducted and submitted in pairs

Page 5: Introduction to Bioinformatics 236523/234525

5

Grading

• 20% homework assignments

• 80% final project

Page 6: Introduction to Bioinformatics 236523/234525

6

Literature list

• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.

• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.

• Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.

Advanced reading

• Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

Page 7: Introduction to Bioinformatics 236523/234525

7

What is Bioinformatics?

Page 8: Introduction to Bioinformatics 236523/234525

8

What is Bioinformatics?

“The field of science in which biology, computer science, and information technology merge to form a single discipline”

Ultimate goal: to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

Page 9: Introduction to Bioinformatics 236523/234525

9

Central Paradigm in Molecular Biology

Gene (DNA) → mRNA → Protein

21st century:

Genome → Transcriptome → Proteome

Page 10: Introduction to Bioinformatics 236523/234525

10

From DNA to Genome

Timeline, 1955-1985 (figure): Watson and Crick DNA model; first protein sequence; first protein structure

Page 11: Introduction to Bioinformatics 236523/234525

11

Timeline, 1990-2000 (figure): first genome (Haemophilus influenzae); yeast genome; first human genome draft (2000)

Page 12: Introduction to Bioinformatics 236523/234525

12

Complete Genomes

             2010   2005
Total        1379    294
Eukaryotes    133     39
Bacteria     1152    235
Archaea        94     23

Page 13: Introduction to Bioinformatics 236523/234525

1,000 Genomes Project: Expanding the Map of Human Genetics

Researchers hope the effort will speed up the discovery of the genetic roots of many diseases

13

Page 14: Introduction to Bioinformatics 236523/234525

14

Main goal: to understand the living cell

Annotation

Comparative genomics

Structural genomics

Functional genomics

25,000 genomes… What’s next? The “post-genomics” era

Page 15: Introduction to Bioinformatics 236523/234525

From… 25,000 genomes

To… understanding living cells

Page 16: Introduction to Bioinformatics 236523/234525

16

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATGCGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAACTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTCAGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGAAGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAATAT GGA CAA TTG GTT TCT TCT CTG AAT .................... TGAAAAACGTA

Annotation

Page 17: Introduction to Bioinformatics 236523/234525

17

Annotation

Identify the genes within a given sequence of DNA

Identify the sites that regulate the gene

Predict the function

Page 18: Introduction to Bioinformatics 236523/234525

18

How do we identify a gene in a genome?

A gene is characterized by several features (promoter, ORF, …); some are easier and some are harder to detect.
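One of the easier features to detect is the open reading frame (ORF). The sketch below is only an illustration, not the method used in the course: it scans the three forward reading frames for an ATG start codon followed in frame by a stop codon. The function name `find_orfs`, the `min_codons` threshold, and the toy sequence are assumptions made for this example; real gene finders also consider the reverse strand, promoters, and statistical signals.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF here is ATG ... stop, read in one frame; `end` is exclusive
    and includes the stop codon. `min_codons` counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence (hypothetical, not taken from the slides).
toy = "CCATGGCAAAATTTTGACC"
for begin, end in find_orfs(toy):
    print(begin, end, toy[begin:end])
```

In a bacterial genome this kind of scan already finds many genes; in genomes with introns, a simple ORF scan is not sufficient.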

Page 19: Introduction to Bioinformatics 236523/234525

19

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATGCGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAACTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTCAGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGAAGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAATAT GGA CAA TTG GTT TCT TCT CTG AAT .................................

.............. TGAAAAACGTA

Labels (figure): TF binding site; promoter; transcription start site; ribosome binding site

ORF = Open Reading Frame

CDS = Coding Sequence

Page 20: Introduction to Bioinformatics 236523/234525

20

Using Bioinformatics approaches for Gene hunting

Relatively easy in simple organisms (e.g. bacteria)

VERY HARD in higher organisms (e.g. humans)

Page 21: Introduction to Bioinformatics 236523/234525

21

Comparative genomics

Page 22: Introduction to Bioinformatics 236523/234525

22

How human are chimps?

A comparison of the full draft human and chimp genomes revealed that they differ by only 1.23%.

Perhaps not surprising!

Page 23: Introduction to Bioinformatics 236523/234525

So where are we different ??

23

Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACCA
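The differences in the alignment above can be counted column by column. The following is a minimal sketch, assuming the three rows are already aligned to equal length with "-" marking gaps; the function name `compare_aligned` is made up for this example and is not part of any library.

```python
def compare_aligned(a, b):
    """Count (matches, mismatches, gap columns) between two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1          # a gap in either row
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


# Rows copied from the slide, with gaps written as "-".
human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"

for name, other in (("chimp", chimp), ("mouse", mouse)):
    m, mm, g = compare_aligned(human, other)
    print(f"human vs {name}: {m} matches, {mm} mismatches, {g} gap columns")
```

On this short fragment the chimp row differs from the human row in fewer columns than the mouse row does, which is the point the slide illustrates.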

Page 24: Introduction to Bioinformatics 236523/234525

24

And where are we similar?

VERY SIMILAR: conserved between many organisms

VERY DIFFERENT

Page 25: Introduction to Bioinformatics 236523/234525

25

Functional genomics

Page 26: Introduction to Bioinformatics 236523/234525

26

TO BE IS NOT ENOUGH

At any given time point, a gene may or may not be active (expressed)

Page 27: Introduction to Bioinformatics 236523/234525

27

From the gene expression pattern we can learn:

• What does the gene do?

• When is it needed?

• What other genes or proteins interact with it?

• …

What's wrong??

Page 28: Introduction to Bioinformatics 236523/234525

28

Structural genomics

Page 29: Introduction to Bioinformatics 236523/234525

29

The three-dimensional structure of a protein can tell us much more than the sequence alone:

• Protein-ligand complexes

• Functional sites

• Fold

• Evolutionary relationships

• Shape and electrostatics

• Active sites

• Protein complexes

• Biological processes

Page 30: Introduction to Bioinformatics 236523/234525

30

Resources and Databases

The different types of data are collected in databases:

– Sequence databases

– Structural databases

– Databases of experimental results

All databases are connected

Page 31: Introduction to Bioinformatics 236523/234525

31

Sequence databases

• Gene databases

• Genome databases

• Disease-related mutation databases

• …

Page 32: Introduction to Bioinformatics 236523/234525

32

Genome Browsers

Easy “walk” through the genome

Page 33: Introduction to Bioinformatics 236523/234525

33

Genome Browsers

• UCSC Genome Browser http://genome.ucsc.edu/

• Ensembl Genome Browser (http://www.ensembl.org)

• WormBase: http://www.wormbase.org/

• AceDB: http://www.acedb.org/

• Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl

• FlyBase: http://flybase.bio.indiana.edu/

Page 34: Introduction to Bioinformatics 236523/234525

34

Mutation database

• A single-base difference at a single position between two individuals of the same species

• Such differences play an important role in differentiation and disease

Page 35: Introduction to Bioinformatics 236523/234525

35

Sickle Cell Anemia

• Caused by a single-base substitution (A to T) that replaces glutamic acid with valine in the beta chain of hemoglobin

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

Page 36: Introduction to Bioinformatics 236523/234525

36

Healthy Individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
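The records on this slide are in FASTA format: a header line starting with ">" followed by one or more lines of sequence. As a rough illustration, the sketch below reads such text into (header, sequence) pairs. The function name `parse_fasta` is an assumption for this example, and only the first line of the protein sequence from the slide is included, so the sequence here is truncated.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Join wrapped sequence lines and drop stray spaces.
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Header and first sequence line copied from the slide (sequence truncated).
example = """>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
"""

records = parse_fasta(example)
header, sequence = records[0]
accession = header.split("|")[3]  # fourth "|"-separated field of this header style
print(accession, len(sequence))
```

Splitting the header on "|" works for the header style shown on the slide; other databases format their headers differently.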

Page 37: Introduction to Bioinformatics 236523/234525

37

Diseased Individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA

GGTGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
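The healthy and diseased sequences differ at a single base, which changes a single amino acid. The sketch below makes that concrete using the first ten codons of the coding region, copied from the two mRNA sequences above starting at the ATG start codon. The codon table is deliberately partial (only the codons that occur in this fragment), and the names `translate` and `differences` are assumptions for this example.

```python
# Partial codon table: only the codons that occur in the fragments below.
CODONS = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L", "ACT": "T",
    "CCT": "P", "GAG": "E", "AAG": "K", "TCT": "S",
}


def translate(cds):
    """Translate a coding fragment codon by codon using the partial table."""
    return "".join(CODONS[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def differences(a, b):
    """Return (1-based position, symbol in a, symbol in b) where a and b differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First ten codons of the HBB coding region, from the slides.
healthy_cds = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
sickle_cds = "ATGGTGCATCTGACTCCTGTGGAGAAGTCT"

healthy_protein = translate(healthy_cds)
sickle_protein = translate(sickle_cds)

print(differences(healthy_cds, sickle_cds))          # base-level change
print(differences(healthy_protein, sickle_protein))  # amino-acid change
```

Positions here are counted from the start codon, so the initiator methionine is residue 1; numbering schemes that omit it shift the changed residue by one. E is glutamic acid and V is valine.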

Page 38: Introduction to Bioinformatics 236523/234525

38

Structure Databases

• Three-dimensional structures of proteins, nucleic acids, molecular complexes, etc.

• 3D data are available thanks to techniques such as NMR and X-ray crystallography

Page 39: Introduction to Bioinformatics 236523/234525

39

Page 40: Introduction to Bioinformatics 236523/234525

40

Databases of Experimental Results

• Experimental microarray images: gene expression data

• Proteomic data: protein expression data

• Metabolic pathways, protein-protein interaction data, regulatory networks

• Etc.

Page 41: Introduction to Bioinformatics 236523/234525

41

Literature Databases

PubMed

Service of the National Library of Medicine

http://www.ncbi.nlm.nih.gov/pubmed/

Page 42: Introduction to Bioinformatics 236523/234525

42

Putting it all Together

• Each database contains specific information

• Like biological systems themselves, these databases are interrelated

Page 43: Introduction to Bioinformatics 236523/234525

43

GENOMIC DATA: GenBank, DDBJ, EMBL

ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR

PROTEIN: PIR, SWISS-PROT

STRUCTURE: PDB, MMDB, SCOP

LITERATURE: PubMed

PATHWAY: KEGG, COG

DISEASE: LocusLink, OMIM, OMIA

GENES: RefSeq, AllGenes, GDB

SNPs: dbSNP

ESTs: dbEST, UniGene

MOTIFS: BLOCKS, Pfam, Prosite

GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress