Bioinformaticsdouglas/Classes/cs521/bioinformatics/Bioinformatics2005.pdf–Gene regulation –DNA...


Bioinformatics

Joshua Gilkerson, Albert Kalim, Ka-him Leung, David Owen



What is Bioinformatics?

Bioinformatics: “The collection, classification, storage, and analysis of biochemical and biological information using computers especially as applied in molecular genetics and genomics.” (Dictionary.com)

Molecular genetics: “The branch of genetics that deals with the expression of genes by studying the DNA sequences of chromosomes.” (Dictionary.com)


What is Bioinformatics? (cont.)

Another definition of molecular genetics: “The branch of genetics that deals with hereditary transmission and variation on the molecular level.” (Dictionary.com)

Genomics: “A branch of biotechnology concerned with applying the techniques of genetics and molecular biology to the genetic mapping and DNA sequencing of sets of genes or the complete genomes of selected organisms using high-speed methods, with organizing the results in databases, and with applications of the data (as in medicine or biology).” (Dictionary.com)


How old is the discipline?

The answer to this one depends on which source you choose to read.

From T K Attwood and D J Parry-Smith's "Introduction to Bioinformatics", Prentice-Hall 1999 [Longman Higher Education; ISBN 0582327881]: "The term bioinformatics is used to encompass almost all computer applications in biological sciences, but was originally coined in the mid-1980s for the analysis of biological sequence data."


How old is the discipline? (cont.)

From Mark S. Boguski's article in the "Trends Guide to Bioinformatics", Elsevier, Trends Supplement 1998, p. 1: "The term 'bioinformatics' is a relatively recent invention, not appearing in the literature until 1991 and then only in the context of the emergence of electronic publishing..."


Bioinformatics Research up to 2005

DNA sequence
Gene expression
Protein expression
Protein structure
Genome mapping
Metabolic networks
Regulatory networks
Trait mapping
Gene function analysis
Scientific literature


What remains to be done?

Comparative genomics
Description of mRNAs, proteins (identity and structure)
Functional analyses
Detailed understanding of development, regulation, variation


The Human Genetic Code


Bioinformatics Activity: Where Is Bioinformatics Done?

The biggest and best source of bioinformatics links is the GenomeWeb at the Rosalind Franklin Centre for Genomics Research at the Genome Campus near Cambridge, United Kingdom.

Others: Research Centers, Sequencing Centers, and "Virtual" Centers (for example consortia and communities).


Research Centers

Centro Nacional de Biotecnologia (CNB), Madrid, Spain.
Computational Biology and Informatics Laboratory at the University of Pennsylvania, Philadelphia, USA.
CIRB: Centro Interdipartimentale di Ricerche Biotecnologiche, Bologna, Italy.
Cold Spring Harbor Labs, New York, USA.
European Molecular Biology Laboratory (EMBL), Heidelberg, Germany.
Généthon, France.
GIRI: Genetic Information Research Institute, California, USA.
MRC Human Genetics Unit, Edinburgh, United Kingdom.
MRC Rosalind Franklin Centre for Genomics Research (RFCGR), Hinxton, United Kingdom.


Sequencing Centers

The Department of Genome Analysis at the Institute of Molecular Biotechnology, Jena, Germany.
The Australian Genome Research Facility, Australia.
Baylor College of Medicine, USA.
Michael Smith Genome Sciences Centre, Canada.


Virtual Centers

International Center for Cooperation in Bioinformatics network (ICCBnet): http://www.iccbnet.org/

Belgian EMBnet node: http://www.be.embnet.org/


Online Resources: What Bioinformatics Websites Are There?

Blogs
Information
Directories
Portals
Societies
Tools
Tutorials


Blogs

Bioinformatics.Org is a bioinformatics blog.

The Bio-Web (http://cellbiol.com/) links to resources online for molecular and cell biologists and covers current news in various biological/computational fields.

Genehack (http://genehack.org/) is one of the first bioinformatics blogs.


Information

The Australian National Genomic Information Service (ANGIS) is operated by the Australian Genomic Information Centre (http://www.angis.org.au/new/about/generalinfo.html#AGIC, currently at the University of Sydney) to offer software, databases, documentation, training and support for biologists.

"The University of Maryland AgNIC gateway (http://agnic.umd.edu/) is a guide to quality agricultural biotechnology information on the Internet."


Directories

Christy Hightower, Engineering Librarian at the Science and Engineering Library, University of California Santa Cruz, has already done this better than me.

Visit her excellent article (http://www.istl.org/istl/02-winter/internet.html) about bioinformatics Net resources in Issues in Science and Technology Librarianship.


Societies

Humberto Ortiz Zuazaga kindly introduced The International Society for Computational Biology (http://www.iscb.org/) which he points out "has links to programs of study and online courses in computational biology and to job postings".


Collection of Tools

Bioinformatics.Org hosts a collection of bioinformatics tools.

The Rosalind Franklin Centre's "GenomeWeb" (http://www.rfcgr.mrc.ac.uk/GenomeWeb/).

Of historical interest only now is the legendary "Pedro's Molecular Biology Search and Analysis Tools" (http://www.public.iastate.edu/~pedro/research_tools.html), which provides a collection of WWW links to information and services useful to molecular biologists.


Portals

Bioinformatics.Org is an international organization which promotes freedom and openness in the field of bioinformatics and is the root domain of a damned fine Website.

CCP11 (Collaborative Computational Project 11, http://www.rfcgr.mrc.ac.uk/CCP11/index.jsp) is another product of the UK's Genome Campus. CCP11 is funded by the BBSRC and is hosted at the MRC Rosalind Franklin Centre for Genomics Research (RFCGR), located on the Wellcome Trust Genome Campus, Cambridge.

Jennifer Steinbachs runs compbiology.org, which is a general computational biology site as well as being a portal to her own work.

BioPlanet (http://www.bioplanet.com/index.php) is well worth visiting. It describes itself as "a not-for-profit site, funded with our resources, for [its users'] benefit."

ColorBasePair (http://www.colorbasepair.com/) is a densely packed portal with lots of bioinformatics links.


Genome Project

Ka-Him Leung


Genomics

Genome
– complete set of genetic instructions for making an organism

Genomics
– attempts to analyze or compare the entire genetic complement of a species


Genomic Issues

Genomic DNA is a linear sequence of 4 nucleotides (A, C, G, T).

DNA forms the double helix by pairing with its reverse complement (A-T, G-C).

Genomic DNA contains many genes, each of which is formed from one or more exons (stretches of genomic DNA), separated by introns.

A gene is copied into complementary RNA in a process called transcription (U substitutes for T).
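The base-pairing and transcription rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and `transcribe` assumes its input is the coding strand, so the RNA is the same sequence with U in place of T.

```python
# Pairing partner of each DNA base (A pairs with T, G pairs with C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna: str) -> str:
    """Return the strand that pairs with `dna`, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def transcribe(coding_strand: str) -> str:
    """Return the RNA matching the coding strand, with U substituted for T."""
    return coding_strand.replace("T", "U")

print(reverse_complement("AAGCT"))  # AGCTT
print(transcribe("AAGCT"))          # AAGCU
```

Taking the reverse complement twice returns the original sequence, which is a quick sanity check on the pairing table.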


Genomic Issues (cont.)

DNA sequencing is the process of determining the exact order of the 3 billion chemical building blocks (called bases and abbreviated A, T, C, and G) that make up the DNA of the 24 different human chromosomes.

In the human genome, about 3 billion bases are arranged along the chromosomes in a particular order for each unique individual.

One million bases (called a megabase and abbreviated Mb) of DNA sequence data is roughly equivalent to 1 megabyte of computer data storage space. Since the human genome is 3 billion base pairs long, 3 gigabytes of computer data storage space are needed to store the entire genome.
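The storage estimate above assumes one byte per base. As a rough worked example (the variable names are illustrative, and the 2-bit packing is a point added here rather than one made on the slide), the arithmetic looks like this:

```python
GENOME_BASES = 3_000_000_000  # approximate human genome length from the slide

# One byte per base, e.g. one ASCII character for each A, C, G, or T.
bytes_as_text = GENOME_BASES * 1

# Four possible bases need only 2 bits each, so 4 bases fit in one byte.
bytes_packed = GENOME_BASES * 2 // 8

print(bytes_as_text / 1e9)  # 3.0   (gigabytes, taking 1 GB = 10**9 bytes)
print(bytes_packed / 1e6)   # 750.0 (megabytes)
```

So the 3 gigabyte figure is an upper bound for plain text storage; a packed encoding would need about a quarter of that.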


Different Genomics

Comparative Genomics: the management and analysis of the millions of data points that result from Genomics

Functional Genomics: ways of identifying gene functions and associations

Structural Genomics: emphasizes high-throughput, whole-genome analysis.


History of Genome

1980
– First complete genome sequence for an organism is published
  • ΦX174: 5,386 base pairs coding nine proteins (~5 kb)
1995
– First bacterial genome (Haemophilus influenzae) sequenced (1.8 Mb)
1996
– Saccharomyces cerevisiae genome sequenced (baker's yeast, 12.1 Mb)
1997
– E. coli genome sequenced (4.7 Mbp)
1998
– Sequence of first human chromosome completed
2000
– A. thaliana genome (flower) (100 Mb)
– D. melanogaster genome (fruit fly) (180 Mb)
2001
– 10,000 full-length human cDNAs sequenced
2003
– Human genome sequence completed


Human Genome Project

The U.S. Human Genome Project was a 13-year effort coordinated by the Department of Energy and the National Institutes of Health.

It started in 1990, with the aim of mapping and understanding all the genes of human beings.

In June 2000, scientists completed the first working draft of the human genome.

A high-quality, "finished" full sequence was completed in April 2003.


Goals of HGP

– identify all the approximately 20,000-25,000 genes in human DNA,
– determine the sequences of the 3 billion chemical base pairs that make up human DNA,
– store this information in databases,
– improve tools for data analysis,
– transfer related technologies to the private sector, and
– address the ethical, legal, and social issues (ELSI) that may arise from the project.


DNA Sequencing Process

Mapping
– Identify set of clones that span region of genome to be sequenced
Library Creation
– Make sets of smaller clones from mapped clones
Template Preparation
– Purify DNA from smaller clones
– Set up and perform sequencing chemistries
Gel Electrophoresis
– Determine sequences from smaller clones
Pre-finishing and Finishing
– Specialty techniques to produce high quality sequences
Data Editing and Annotation
– Quality assurance; verification; biological annotation
– Submission to public database


Future of HGP

HGP is the first step in understanding humans at the molecular level. Work is still ongoing to determine the function of many of the human genes.

What still needs to be done:
– Gene number, exact locations, and functions
– Gene regulation
– DNA sequence organization
– Chromosomal structure and organization
– Noncoding DNA types, amount, distribution, information content, and functions
– Coordination of gene expression, protein synthesis, and post-translational events
– Interaction of proteins in complex molecular machines
– Predicted vs. experimentally determined gene function
– Evolutionary conservation among organisms
– Protein conservation (structure and function)
– Proteomes (total protein content and function) in organisms


Sequence Alignment

Joshua Gilkerson


Sequence Alignment

In genomics, many situations arise when sequences need to be compared or searched for similar sub-sequences.

Both of these tasks are aided by aligning the sequences to one another.

The two sequences are called the subject and the query.


Local vs. Global

Global alignment aligns the entire query to the entire subject.

Local alignment aligns a piece of one sequence to a piece of the other.

Which is used depends on the application.

Surprisingly, these are computationally equivalent.

Sometimes a mixed local-global alignment is used, aligning the entire query sequence against any one part of the subject.


Example Alignments

Global Alignment

AGCTCGA--GATTGCTGGACATGCTGCTGCT
|  ||||  ||||||     |||| ||||||
A--TCGAGCGATTGC-----ATGCAGCTGCT

Local Alignment
– Same subject as above
– Query Sequence: GAGAT

AGCTCGAGATTGCTGGACATGCTGCTGCT
|| | |||||     || ||
AGAT GAGAT     GAGAT
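The query placements in the local example can be reproduced with a simple ungapped scan: slide the query along the subject and count matching positions at each offset. This is a minimal sketch and not a full local alignment algorithm (it allows no gaps); the function name `match_counts` is hypothetical.

```python
def match_counts(subject: str, query: str) -> list[int]:
    """For every offset, count positions where query equals subject (no gaps)."""
    n, m = len(subject), len(query)
    return [
        sum(1 for k in range(m) if subject[offset + k] == query[k])
        for offset in range(n - m + 1)
    ]

subject = "AGCTCGAGATTGCTGGACATGCTGCTGCT"
query = "GAGAT"
counts = match_counts(subject, query)

# Offset (0-based) with the most matching positions.
best_offset = max(range(len(counts)), key=counts.__getitem__)
print(best_offset, counts[best_offset])  # 5 5
print(counts[15])                        # 4
```

Offset 5 is the exact match in the middle of the example, and offset 15 is the placement with a single mismatch (GACAT against GAGAT).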


Model for Alignment

The best alignment is the one chosen from all possible alignments that minimizes the score.

Scoring is done pairwise at each position along the alignment.

Introducing a gap is more expensive than extending one already introduced (affine gap penalty).


Model for Alignment

Score = ∑ gap penalties + ∑ similarity weights

Gap penalty = open penalty + size * size penalty

Open penalty and size penalty are constants >= 0.

Similarity weight is zero for same base, >= 0 for disparate bases.

BLOSUM similarity weights are most commonly used.


Scoring Example

Same example as earlier, using:
– Gap opening penalty of 1
– Gap size penalty of 1
– Similarity weight of 1 for every mismatch (0 for a match)

AGCTCGA--GATTGCTGGACATGCTGCTGCT
|  ||||  ||||||     |||| ||||||
A--TCGAGCGATTGC-----ATGCAGCTGCT
0210000210000002111100001000000 = 13
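The per-column scores in this example can be checked with a short sketch of the scoring model: each gap pays the opening penalty once plus the size penalty per gap column, and each mismatched pair pays the similarity weight. The function name and parameter names are illustrative, not taken from the slides.

```python
def alignment_score(top: str, bottom: str,
                    gap_open: int = 1, gap_size: int = 1, mismatch: int = 1) -> int:
    """Score a gapped alignment; lower is better and identical bases cost 0."""
    assert len(top) == len(bottom)
    score = 0
    prev_gap = None  # which row held the gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            if gap_row != prev_gap:
                score += gap_open  # a new gap starts in this column
            score += gap_size      # every gap column pays the size penalty
            prev_gap = gap_row
        else:
            score += 0 if a == b else mismatch
            prev_gap = None
    return score

top    = "AGCTCGA--GATTGCTGGACATGCTGCTGCT"
bottom = "A--TCGAGCGATTGC-----ATGCAGCTGCT"
print(alignment_score(top, bottom))  # 13
```

The three gaps contribute 3 + 3 + 6 and the single mismatch contributes 1, giving the total of 13 shown on the slide.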


Needleman-Wunsch Algorithm

Sequences Q and S
Scoring matrix M, (len(Q)+1) x (len(S)+1)
Similarity matrix s
Gap length penalty: g; opening penalty: 0

M(i,j): score for best alignment of first i elements of Q and first j elements of S.

M(0,0) = 0, M(i,0) = i*g, M(0,j) = j*g

M(i,j) = minimum of
– M(i-1,j) + g,
– M(i,j-1) + g,
– M(i-1,j-1) + s(Q(i),S(j))
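The recurrence can be written as a short dynamic program. The sketch below follows the minimizing formulation on this slide with a linear gap penalty g; the default similarity function (0 for identical bases, 1 otherwise) is an assumption made for illustration, and the function name is hypothetical.

```python
def needleman_wunsch_score(q: str, s: str, g: int = 1, sim=None) -> int:
    """Minimum global alignment score of q and s with a linear gap penalty g."""
    if sim is None:
        sim = lambda a, b: 0 if a == b else 1  # assumed unit mismatch weight
    rows, cols = len(q) + 1, len(s) + 1
    M = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, rows):
        M[i][0] = i * g
    for j in range(1, cols):
        M[0][j] = j * g
    for i in range(1, rows):
        for j in range(1, cols):
            M[i][j] = min(
                M[i - 1][j] + g,                             # gap in s
                M[i][j - 1] + g,                             # gap in q
                M[i - 1][j - 1] + sim(q[i - 1], s[j - 1]),   # pair two elements
            )
    return M[rows - 1][cols - 1]

print(needleman_wunsch_score("CAT", "TAG"))  # 2
```

Passing a different `sim` callable lets the same code use any similarity matrix, for example a dictionary lookup keyed by base pairs.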


Needleman-Wunsch Example

CAT vs TAG, g = 1

Similarity matrix s:

     G  T  C  A
G    0
T    0  0
C    1  3  0
A    1  1  1  0

Scoring matrix M (first row and column filled in):

         3  G
         2  A
         1  T
3  2  1  0
T  A  C


Needleman-Wunsch Example

CAT vs TAG, g = 1

Similarity matrix s:

     G  T  C  A
G    0
T    0  0
C    1  3  0
A    1  1  1  0

Scoring matrix M (completed):

2  3  3  3  G
3  2  2  2  A
2  2  2  1  T
3  2  1  0
T  A  C



Needleman-Wunsch Example

Two equally good alignments:

-CAT        C-AT
  |    and    |
T-AG        -TAG


Needleman-Wunsch

Runs in O(n²) time.

Easily generalized to allow a gap opening penalty by using 3 copies of M: one for prefixes ending with a match, and one for prefixes ending with a gap in each sequence.

Easily generalized to local alignment by saying M(i,j) is the best score for an alignment of some suffix of the sequences ending at i and j. In practice, this means:
– The first row and column are filled with all zeroes instead of just the top-left-most position.
– The end of the alignment is at the globally minimal position, not the lower-right corner.
– The beginning is at the location where backtracking cannot continue.
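The three-matrix generalization for a gap opening penalty can be sketched as follows. This is one common way to set it up, not necessarily the construction the presenters had in mind: D holds scores for prefixes whose last column pairs two bases, and X and Y hold scores for prefixes ending with a gap in one sequence or the other. All names are hypothetical, and the unit mismatch weight is an assumption.

```python
INF = float("inf")

def affine_global_score(q: str, s: str, gap_open: int = 1, gap_size: int = 1,
                        mismatch: int = 1) -> float:
    """Minimum global score where a gap of length L costs gap_open + L * gap_size."""
    n, m = len(q), len(s)
    # D: last column pairs two bases; X: last column is a gap in s; Y: gap in q.
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_size
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_size
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = 0 if q[i - 1] == s[j - 1] else mismatch
            D[i][j] = pair + min(D[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extending an existing gap skips the opening penalty.
            X[i][j] = gap_size + min(D[i - 1][j] + gap_open, X[i - 1][j],
                                     Y[i - 1][j] + gap_open)
            Y[i][j] = gap_size + min(D[i][j - 1] + gap_open, Y[i][j - 1],
                                     X[i][j - 1] + gap_open)
    return min(D[n][m], X[n][m], Y[n][m])

print(affine_global_score("AAA", "A"))  # 3
```

With an opening penalty, one gap of length 2 (cost 3) beats two separate gaps of length 1 (cost 4), which is why "AAA" against "A" scores 3.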


Other Alignment Tools

The Basic Local Alignment Search Tool (BLAST) is probably the most widely used tool in genomics.
– Finds local alignments.
– Used on very large sequences (entire genomes)

Smith-Waterman Algorithm: adaptation of Needleman-Wunsch for local alignments.

FASTA package


The Importance of Bioinformatics and Summary

David Owen


The importance of bioinformatics

Traditionally, molecular biology research was done entirely in a laboratory.

But the genome projects have increased the amount of data enormously. Researchers therefore need computers to make sense of the vast amount of data.


Challenges

Intelligent and efficient storage of the massive data.
Easy and reliable access to the data.
Development of tools which allow the extraction of meaningful information.

The developer of the tool must also consider the following:

The user (biologist) might not be an expert with computers.
The tool must be able to provide access across the internet.


Processes

Three main processes a bioinformatics tool must handle:
DNA sequence determines protein sequence
Protein sequence determines protein structure
Protein structure determines protein function

The information obtained from these processes allows us to better understand the biology of organisms.
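The first of these processes, DNA sequence determining protein sequence, can be illustrated with a small translation sketch. It uses the standard genetic code and assumes the input is a coding-strand DNA sequence read from its first base; the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed for codons TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna: str) -> str:
    """Translate coding-strand DNA codon by codon ('*' marks a stop codon)."""
    usable = len(coding_dna) - len(coding_dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[coding_dna[i:i + 3]] for i in range(0, usable, 3))

print(translate("ATGGCCTAA"))  # MA*
```

Real genes add complications this sketch ignores, such as introns, alternative reading frames, and organism-specific code variants.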


Computer Scientist vs. Biologist

Computer scientist:
– Logic
– Problem-solving
– Process-oriented
– Algorithmic
– Optimizing

Biologist:
– Knowledge gathering
– Experimentally-focused
– Exceptions are as common as rules
– Describe work as a story
– Develop conclusions and models

The need for communication between computer scientist and biologist.


Research Areas

Further research areas include:
Sequence alignment
Protein structure prediction
Prediction of gene expression
Protein-protein interactions
Modeling of evolution


Future of Bioinformatics

- Integration of a wide variety of data sources. E.g. combining GIS data (maps) and weather systems with crop health and genotype data allows us to predict successful outcomes of agricultural experiments.

- Large-scale comparative genomics. E.g. the development of tools that can do 10-way comparisons of genomes.

- Modeling and visualization of full networks of complex systems.


Ultimate Goal

Obtain a better understanding of the biology of organisms through the examination of biological information hidden in the vast amount of data we have.

This knowledge will allow us to improve our standard of living.


References

http://www.ornl.gov/sci/techresources/Human_Genome/project/about.shtml

http://www.genome.gov/

http://bioinfo.mbb.yale.edu/course/projects/final-4/

http://www.dictionary.com

http://www.ebi.ac.uk/2can/bioinformatics/index.html

http://bioinformatics.ca/workshop_pages/bioinformatics/day1-files/1.0_intro_bffo_2005.pdf


References (cont.)

http://elegans.uky.edu/520/Lecture/index.html

http://bioinformatics.org/