CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10...

17
©2012 Sami Khuri Biology/CS 123A Bioinformatics I 2.1 Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012 Bioinformatics Two Introduction to Bioinformatics ©2012 Sami Khuri What is Bioinformatics? The Human Genome Project (HGP) Mapping Model Organisms Types of Databases Applications of Bioinformatics Genome Research ©2012 Sami Khuri ©2012 Sami Khuri Pathway to Genomic Medicine Sequencing of the human DNA Personalized medicine Cure for diseases Implicating genetic variants with human disease Interpreting the human genome sequence Human Genome Project ENCODE Project HapMap Project Genomic Medicine ©2012 Sami Khuri The Human Genome Project • The HGP is a multinational effort, begun by the USA in 1988, whose aim is to produce a complete physical map of all human chromosomes, as well as the entire human DNA sequence. The ultimate goal of genome research is to find all the genes in the DNA sequence and to develop tools for using this information in the study of human biology and medicine. The primary goal of the project is to make a series of descriptive diagrams (called maps) of each human chromosome at increasingly finer resolutions. ©2012 Sami Khuri Bioinformatics and the Internet The recent enormous increase in biological data has made it necessary to use computer information technology to collect, organize, maintain, access, and analyze the data. Computer speed, memory, exchange of information over the Internet has greatly facilitated bioinformatics. • The bioinformatics tools available over the Internet are accessible, generally well developed, fairly comprehensive, and relatively easy to use. @2002-10 Sami Khuri Cytogenetic map of chromosome 19 telomere centromere telomere ©2012 Sami Khuri

Transcript of CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10...

Page 1: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.1

Sami Khuri

Department of Computer Science

San José State University

Biology/CS 123A

Fall 2012

Bioinformatics

Two

Introduction to Bioinformatics

©2012 Sami Khuri

What is Bioinformatics?

� The Human Genome

Project (HGP)

� Mapping

� Model Organisms

� Types of Databases

�Applications of

Bioinformatics

� Genome Research

©2012 Sami Khuri

©2012 Sami Khuri

Pathway to Genomic

Medicine

Sequencing of

the human

DNA

Personalized

medicine

Cure for

diseases

Implicating

genetic

variants with

human disease

Interpreting

the human

genome

sequence

Human

Genome

Project

ENCODE

Project

HapMap

Project

Genomic

Medicine

©2012 Sami Khuri

The Human Genome

Project• The HGP is a multinational effort, begun by the USA in

1988, whose aim is to produce a complete physical map of all human chromosomes, as well as the entire human DNA sequence.

• The ultimate goal of genome research is to find all the genes in the DNA sequence and to develop tools for using this information in the study of human biologyand medicine.

• The primary goal of the project is to make a series of descriptive diagrams (called maps) of each human chromosome at increasingly finer resolutions.

©2012 Sami Khuri

Bioinformatics and

the Internet• The recent enormous increase in biological data

has made it necessary to use computer information technology to collect, organize, maintain, access, and analyze the data.

• Computer speed, memory, exchange of information over the Internet has greatly facilitated bioinformatics.

• The bioinformatics tools available over the Internet are accessible, generally well developed, fairly comprehensive, and relatively easy to use.

@2002-10 Sami Khuri

Cytogenetic map

of chromosome

19

telomere

centromeretelomere

©2012 Sami Khuri

Page 2: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.2

©2012 Sami Khuri

@2002-10 Sami Khuri

Other Species

As part of the HGP, genomes of other organisms, such as

bacteria, yeast, flies and mice are also being studied.

p53 gene

pax6 geneDiabetes

C elegans

Baker’s yeast

DNA repair

Cell division

Chimps are infected with SIV

Very rarely progress to AIDS

©2012 Sami Khuri

@2002-10 Sami Khuri

Other Sequenced

Genomes

©2012 Sami Khuri

Model Organisms

• A model organism is an organism that is extensively studied to understand particular biological phenomena.

• Why have model organisms? The hope is that discoveries made in model organisms will provide insight into the workings of other organisms.

• Why is this possible? This works because evolution reuses fundamental biological principles and conserves metabolic, regulatory, and developmental pathways.

Studying Human Diseases

Copyright © 2006 Pearson Prentice Hall, Inc.

©2012 Sami Khuri

©2012 Sami Khuri

Goals of the HGP

• To identify all the approximately 20,000-

25,000 genes in human DNA,

• To determine the sequences of the 3.2 billion

chemical base pairs that make up human

DNA,

• To store this information in databases,

• To improve tools for data analysis,

• To address the ethical, legal, and social issues

(ELSI) that may arise from the project.

©2012 Sami Khuri

HGP Finished Before

Deadline

• In 1991, the USA Congress was told

that the HGP could be done by 2005 for

$3 billion.

• It ended in 2003 for $2.7 billion,

because of efficient computational

methods.

Page 3: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.3

©2012 Sami Khuri

What is Bioinformatics?

Set of Tools

• The use of computers to collect,

analyze, and interpret biological

information at the molecular level.

• A set of software tools

for molecular sequence

analysis

©2012 Sami Khuri

What is Bioinformatics?

A Discipline

• The field of science, in which biology,

computer science, and information

technology merge into a single discipline.

Definition of NCBI (National Center for Biotechnology Information)

• The ultimate goal of bioinformatics is to

enable the discovery of new biological insights

and to create a global perspective from which

unifying principles in biology can be discerned.

©2012 Sami Khuri

Why Study Bioinformatics (I)

• Bioinformatics is intrinsically

interesting.

• Bioinformatics offers the prospect

of finding better drug targets earlier

in the drug development process.

– By looking for genes in model organisms that are

similar to a given human gene, researchers can

learn about protein the human gene encodes and

search for drugs to block it.

@2002-10 Sami Khuri

How can Bioinformatics Help?

©2012 Sami Khuri

Rational drug designStructure-based drug design

use docking algorithmsto design molecule thatcould bind the model structure

Scientific American July 2000 ©2012 Sami Khuri ©2012 Sami Khuri

Why Study Bioinformatics (II)

• Molecular biology is the new

frontier of 21st century science.

– DNA, RNA, genes, stem cells,

etc.. are everywhere in the

news.

• Science Magazine celebrated

its 125th anniversary by issuing twenty five

big questions facing science over the next

quarter-century. www.sciencemag.org/sciext/125th

Page 4: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.4

©2012 Sami Khuri

Science: Top 25 Questions (I)

* What Is the Universe Made Of?

* What is the Biological Basis of Consciousness?

• Why Do Humans Have So Few Genes?

• To What Extent Are Genetic Variation and Personal Health Linked?

* Can the Laws of Physics Be Unified?

* How Much Can Human Life Span Be Extended?

• What Controls Organ Regeneration?

• How Can a Skin Cell Become a Nerve Cell?

• How Does a Single Somatic Cell Become a Whole Plant?

* How Does Earth's Interior Work?

* Are We Alone in the Universe?

* How and Where Did Life on Earth Arise?

©2012 Sami Khuri

Science: Top 25 Questions (II)

• What Determines Species Diversity?

• What Genetic Changes Made Us Uniquely Human?* How Are Memories Stored and Retrieved?

• How Did Cooperative Behavior Evolve?

• How Will Big Pictures Emerge from a Sea of Biological Data?

* How Far Can We Push Chemical Self-Assembly?

* What Are the Limits of Conventional Computing?

• Can We Selectively Shut Off Immune Responses?• Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?

• Is an Effective HIV Vaccine Feasible?* How Hot Will the Greenhouse World Be?

* What Can Replace Cheap Oil -- and When?

@2002-10 Sami Khuri @2002-10 Sami Khuri

Red Blood Cells

©2012 Sami Khuri

@2002-10 Sami Khuri

©2012 Sami Khuri ©2012 Sami Khuri

What do Bioinformaticians do?

• They analyze and interpret data

• Develop and implement algorithms

• Design user interface

• Design database

• Automate genome analysis

• They assist molecular biologists in data

analysis and experimental design.

Page 5: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.5

©2012 Sami Khuri

Databases for Storage

and Analysis

- Databases store data that need to be analyzed

- By comparing sequences, we discover:- How organisms are related to one another

- How proteins function

- How populations vary

- How diseases occur

- The improvement of sequencing methods generated a lot of data that need to be:

- stored - organized - curated

- annotated - managed - networked

- accessed - assessed

Types of Databases

In 2006 there were

858 databases

classified into 14

major categories

©2012 Sami Khuri

Three Major Databases

• GenBank from the NCBI (National Center of Biotechnology Information), National Library of Medicine http://www.ncbi.nlm.nih.gov

• EBI (European Bioinformatics Institute) from the European Molecular Biology Library

http://www.ebi.ac.uk

• DDBJ (DNA DataBank of Japan)

http://www.ddbj.nig.ac.jp

©2012 Sami Khuri

GenBank Taxonomic

Sampling

Homo sapiens

Mus musculus

Drosophila melanogaster

Caenorhabditis elegans

Arabidopsis thaliana

Oryza sativa

Rattus norvegicus

Danio rerio

Saccharomyces cerevisiae

62.1%

7.7%

6.1%

3.3%

2.9%

1.3%

0.8%

0.6%

0.6%

©2012 Sami Khuri

©2012 Sami Khuri

GenBank

GenBank is the NIH genetic sequence

database of all publicly available DNA

and derived protein sequences, with

annotations describing the biological

information these records contain.

©2012 Sami Khuri

What does NCBI do?

NCBI: established in 1988 as a national resource

for molecular biology information.

– it creates public databases,

– it conducts research in computational biology,

– it develops software tools for analyzing genome data,

and

– it disseminates biomedical information,

all for the better understanding of molecular

processes affecting human health and disease.

Page 6: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.6

©2012 Sami Khuri

Applications of

Genome Research

Current and potential applications of Genome Research include:

– Molecular Medicine

– Microbial Genomics

– Risk Assessment

– Bioarcheology, Anthropology, Evolution and Human Migration

– DNA Identification

– Agriculture, Livestock Breeding and Bioprocessing

©2012 Sami Khuri

Molecular Medicine

• Improve the diagnosis of disease

• Detect genetic predispositions to disease

• Create drugs based on molecular

information

• Use gene therapy and control systems as

drugs

• Design custom drugs on individual genetic

profiles.

©2012 Sami Khuri

©2012 Sami Khuri

Microbal Genomics

• Swift detection and treatment in clinics of disease-causing microbes: pathogens

• Development of new energy sources: biofuels

• Monitoring of the environment to detect chemical warfare

• Protection of citizens from biological and chemical warfare

• Efficient and safe clean up of toxic waste.

©2012 Sami Khuri

DNA Identification I

• Identify potential suspects whose DNA may match evidence left at crime scenes

• Exonerate persons wrongly accused of crimes

• Establish paternity and other family relationships

• Match organ donors with recipients in transplant programs

Louis XVII

Louis XVII: son of Louis XV1 and Marie-Antoinette who

died from tuberculosis in 1795 at the age of 12

©2012 Sami Khuri ©2012 Sami Khuri

DNA Identification II

• Identify endangered and protected species as an aid to wildlife officials and also to prosecute poachers

• Detect bacteria and other organisms that may pollute air, water, soil, and food

• Determine pedigree for seed or livestock breeds

• Authenticate consumables such as wine and caviar

Page 7: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.7

©2012 Sami Khuri

Agriculture, Livestock

Breeding and Bioprocessing

• Grow disease-resistant, insect-resistant, and drought-resistant crops

• Breed healthier, more productive, disease-resistant farm animals

• Grow more nutritious produce

• Develop biopesticides

• Incorporate edible vaccines into food products

@2002-10 Sami Khuri

©2012 Sami Khuri

Mighty Mouse versus

Human Genome

• Humans and mice have

similar genomes, but

their genes are ordered

differently

• ~245 rearrangements

– Reversals

– Fusions

– Fissions

– Translocation

©2012 Sami Khuri ©2012 Sami Khuri

Translocation

Fusion and Fission

Translocation1 2 3 9 8 7

44 5 6 2 4 1

1 2 6 2 4 1

4 5 3 9 8 7

1 2 3 4

5 61 2 3 4 5 6

Fusion

Fission

Figure 4.13a The Biology of Cancer (© Garland Science 2007)

Chromosomal Translocations

©2012 Sami Khuri

@2002-10 Sami Khuri

Flies have orthologs to

humans disease-causing

genes in categories such

as:

• neurological

• renal

• immunological

• endocrine

• cardiovascular

• metabolic

• blood-vessel and

• cancerous disorders

Flies can provide insights into human disease at the

systems level, revealing how different genes interact in vivoDiscovering Genomics, Campbell, 2007 ©2012 Sami Khuri

Page 8: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.8

What have we learned

from the HGP?

A small

portion of the

genome

codes for

proteins,

tRNAs

and rRNAs

©2012 Sami Khuri

@2002-10 Sami Khuri

What have we learned

from the HGP?

The small number of genes

©2012 Sami Khuri

©2012 Sami Khuri

Alternative Splicing

Genomic Medicine by Guttmacher et al., NEJM, 2002

@2002-10 Sami Khuri

The Alpha-Tropomyosin

Gene

©2012 Sami Khuri

©2012 Sami Khuri

Gene Prediction

• Problem: Given a genomic DNA sequence, identify where the genes are.

• Input: A genomic DNA sequence.

• Output: Location of gene elements in the raw, genomic DNA sequence, including (for eukaryotes):

– exons

– introns

Anatomy of an Intron

©2012 Sami Khuri

Page 9: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.9

©2012 Sami Khuri

Building upon the

Foundations of HGP • As we build upon the foundation laid by the

Human Genome Project, our ability to explore

uncharted frontiers will hinge upon melding

biological know-how with expertise in computer

science, physics, math, clinical research,

bioethics, and many other disciplines.

• A firm understanding of the powerful potential

of genomics, proteomics, and bioinformatics

will be essential to success in this amazing new

world. Discovering Genomics, Campbell, 2007 – Preface by Francis Collins

©2012 Sami Khuri

Genomics is a Way of

Seeing Life• Genome: the complete (haploid) DNA content of an

organism.

• Genomics: the field of genome studies.

• Genomics

– is not just a collection of methods

– has become an enhanced way of seeing life.

• Genomics includes the study of interaction of

molecules inside the cell:

DNA Protein Lipids Carbohydrates

• Genomics requires us to analyze, hypothesize, think,

and formulate models.

©2012 Sami Khuri

Pathway to Genomic

Medicine

Sequencing of

the human

DNA

Personalized

medicine

Cure for

diseases

Implicating

genetic

variants with

human disease

Interpreting

the human

genome

sequence

Human

Genome

Project

ENCODE

Project

HapMap

Project

Genomic

Medicine

©2012 Sami Khuri

ENCODE

ENCODE: ENCyclopedia Of DNA

Elements

• Goal: Compile a Comprehensive

Encyclopedia of all Functional Elements in

the Human Genome

• Initial Pilot Project: 1% of Human Genome

• Apply Multiple, Diverse Approaches to Study

and Analyze that 1% in a Consortium Fashion

©2012 Sami Khuri

The ENCODE Project

• 44 regions of the human genome were

selected, spanning 30 megabases (about 1% of

the human genome).

• The ENCODE regions include:

– About 50% randomly selected loci

– About 50% containing well-known genes

• Example: alpha and beta globins, CFTR

• The ENCODE Project Consortium released

its findings in a 2007 article (>250 coauthors)

©2012 Sami Khuri

Major Findings of ENCODE

• The majority of all nucleotides are transcribed

as part of

– Coding transcripts

– Noncoding RNAs

– Random transcripts that may have no biological

function.

• Many genes have multiple, previously

undetected, transcription start sites

– Regulatory sequences are as likely to be upstream

as downstream of the major start sites.

Page 10: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.10

Constrained Sequences

Many important

functional

sequences are not

conserved.

They are even highly

variable among humans.

©2012 Sami Khuri ©2012 Sami Khuri

Pathway to Genomic

Medicine

Sequencing of

the human

DNA

Personalized

medicine

Cure for

diseases

Implicating

genetic

variants with

human

disease

Interpreting

the human

genome

sequence

Human

Genome

Project

ENCODE

Project

HapMap

Project

Genomic

Medicine

©2012 Sami Khuri

Genomic Variations

• Collection of genomic variations

makes any person a unique human

being. It contributes to that person’s:

– Potential to learn

– Predisposition to disease

– Predisposition to drug addiction

– Response to pharmaceutical interventions

• There are variations within, as well

as, between populations.

• The variation between individual genomes has

sparked a biotech boom in the area of SNP discovery.

GCCCGCCTC

CGGGCGGAG

GCC CTC

CGG GAG

Single Nucleotide Polymorphism (SNP)

GCCCACCTCCTC

CGGGTGGAGGAGCopy Number Variant (CNV)

“Indel” Polymorphism

GCCCACCTC

CGGGTGGAG

Types of

Genomic Variations

©2012 Sami Khuri

Single Nucleotide

PolymorphismA Single Nucleotide Polymorphism is a single base-pair

mutation that occurs at a specific site in the DNA

sequence.

. 90% of all human chromosomes

have the following sequence at a

particular location (i.e., unique locus)

But 10% of all alleles have a

slightly different sequence at that

particular location (i.e., unique locus)

©2012 Sami Khuri

What is a Polymorphism?

• A polymorphism is a difference in DNA

sequence among individuals.

• Genetic variations occurring in more than

1% of a population would be considered

useful polymorphisms for genetic analysis.

• SNP: position in a genome at which two or

more different bases occur in the population,

each with a frequency greater than 1%.

©2012 Sami Khuri

Page 11: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.11

©2012 Sami Khuri

SNPs and Human

Variations To classify a variation as a SNP it should

occur in at least 1% of the population.

©2012 Sami Khuri

Where do SNPs Fall?

• SNPs may fall:

–within coding sequences of genes,

–noncoding regions of genes, or

– in the intergenic regions between

genes.

©2012 Sami Khuri

International

Haplotype Map Project

• Goal of International Haplotype Map Project

– Develop a haplotype map of the human genome.

• The “HapMap” describes common patterns of human

DNA sequence variation

– key source for researchers to use to find genes affecting

health, disease, and responses to drugs, and environmental

factors.

• Haplotypes are groups of SNPs transmitted in

“blocks”.

• These blocks can be characterized by a subset of their

SNPs (tags).©2012 Sami Khuri

CAGATCGCTGGATGAATCGCATCTGTAAGCAT

CGGATTGCTGCATGGATCGCATCTGTAAGCAC

CAGATCGCTGGATGAATCGCATCTGTAAGCAT

CAGATCGCTGGATGAATCCCATCAGTACGCAT

CGGATTGCTGCATGGATCCCATCAGTACGCAT

CGGATTGCTGCATGGATCCCATCAGTACGCAC

SNP2

SNP3

SNP4

SNP5

↓SNP6

↓SNP1

Block 1 Block 2

SNP7

↓SNP8

Teri A

. Man

olio

, NH

GR

I, 2008

©2012 Sami Khuri

CAGATCGCTGGATGAATCGCATCTGTAAGCAT

CGGATTGCTGCATGGATCGCATCTGTAAGCAC

CAGATCGCTGGATGAATCGCATCTGTAAGCAT

CAGATCGCTGGATGAATCCCATCAGTACGCAT

CGGATTGCTGCATGGATCCCATCAGTACGCAT

CGGATTGCTGCATGGATCCCATCAGTACGCAC

SNP3

↓SNP5

↓SNP6

Block 1 Block 2

SNP7

↓SNP8

Teri A

. Man

olio

, NH

GR

I, 2008

Block 1 Block 2

©2012 Sami Khuri

CAGATCGCTGGATGAATCGCATCTGTAAGCAT

CGGATTGCTGCATGGATCGCATCTGTAAGCAC

CAGATCGCTGGATGAATCGCATCTGTAAGCAT

CAGATCGCTGGATGAATCCCATCAGTACGCAT

CGGATTGCTGCATGGATCCCATCAGTACGCAT

CGGATTGCTGCATGGATCCCATCAGTACGCAC

SNP3

↓SNP6

↓SNP8

Teri A

. Man

olio

, NH

GR

I, 2008

Block 1 Block 2

Page 12: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.12

©2012 Sami Khuri

Tag SNP

GTT 35%

CTC 30%

GTT 10%

GAT 8%

CAT 7%

CAC 6%

other haplotypes 4%

Block 1 Block 2 FrequencySingleton

Teri A

. Man

olio

, NH

GR

I, 2008

©2012 Sami Khuri

GWAS: The New Wave

Hokusai: The Great Wave

The New Wave:

Genome Wide

Association Studies

©2012 Sami Khuri

Genome-Wide Association

Study• Method for interrogating all 10 million

variable points across human genome.

• Variation is inherited in groups, or blocks, so

not all 10 million points have to be tested.

• NIH is interested in advancing genome-wide

association studies (GWAS) to identify

common genetic factors that influence health

and disease.

Teri A. Manolio, NHGRI, 2008

©2012 Sami Khuri

Mapping Relationships

Among SNPs

Christensen and Murray, 2007

©2012 Sami Khuri

@2002-10 Sami Khuri

SNP with its two allelic possibilities

Associations between

SNP variants: various

shades from whiteto red. Deepest red

indicating the

strongest association.

Patterns of triangular

blocks of strong association

are separated by short nodes with very little association.

Finding set of tags is equivalent to

“Minimum Dominating Set Problem”

©2012 Sami Khuri

@2002-10 Sami Khuri

SNP with its two allelic possibilities

Testing for one

SNP might provide

almost complete genetic

information for that block.

Tagging SNP:

block 1 � 3t

block 2 � 8t They can serve as a surrogate

for any variant within its block

Page 13: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.13

To produce a genome-wide map of common variation

Genotype 6 Million SNPs in Four populations in Two Phases:

• CEPH (CEU) (Europe - n = 90, trios)

• Yoruban (YRI) (Africa - n = 90, trios)

• Japanese (JPT) (Asian - n = 45)

• Chinese (HCB) (Asian - n =45)

Nature 437: 1299-320, 2005

www.hapmap.orgwww.hapmap.org

©2012 Sami Khuri ©2012 Sami Khuri

SNP: rs2162709 on chromosome 5

©2012 Sami Khuri

Myocardial Infarction

and rs1333049

©2012 Sami Khuri

Genome-wide Association Analysis of Coronary Artery Disease, by Samani et al, NEJM 2007

©2012 Sami Khuri

Methods Used for

SNP Analysis• Regression Models

• Exhaustive Searches

– Two-Locus Interaction

– Higher-Order Interaction

• Recursive Partitioning

Approach

• Random Forest Approach

• Multifactor

Dimensionality Reduction

• Bayesian model selectionDetecting gene-gene interactions that underlie human diseases” by Cordell, Nature, June 2009

©2012 Sami Khuri

Pathway to Genomic

Medicine

Sequencing of

the human

DNA

Personalized

medicine

Cure for

diseases

Implicating

genetic

variants with

human

disease

Interpreting

the human

genome

sequence

Human

Genome

Project

ENCODE

Project

HapMap

Project

Genomic

Medicine

Page 14: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.14

©2012 Sami Khuri

Understanding the link between -

DNA sequence Biology/Disease

(Genotype) (Phenotype)

Environment

ATTCGCATGGACC

CA

The Next ChallengePersonalized medicine is the use of diagnostic and screening methods to better manage the individual patient’s disease or predisposition toward a disease.

Personalized medicine will enable risk assessment, diagnosis, prevention, and therapy specifically tailored to the unique characteristics of the individual, thus enhancing the quality of life and public health.

Personalized Medicine is Genotype-Specific Treatment.

Personalized Medicine

©2012 Sami Khuri

©2012 Sami Khuri

Prostate Cancer Diagnosis

cancercontrol.cancer.gov/od/phg/presentations/Xu.pdf

©2012 Sami Khuri

@2002-10 Sami Khuri

GWAS and

Prostate Cancer

cancercontrol.cancer.gov/od/phg/presentations/Xu.pdf

23andme.com

decodeme.com

navigenics.com

Genetic Testing

for the Public

©2012 Sami Khuri ©2012 Sami Khuri

Biomarkers and

Human Disease

• Improve clinical trial design

• Identify disease subsets

• Guide disease selection for new therapies

Page 15: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.15

©2012 Sami Khuri

Pharmacogenomics

• Pharmacogenomics deals with the influence of genetic variation on drug response in patients by correlating gene expression or SNPs with a drug's efficacy or toxicity.

• Pharmacogenomics aims to optimize drug therapy, with respect to the patients' genotype, to ensure maximum efficacy with minimal adverse effects.

• Such approaches promise the advent of “personalized medicine” in which drugs and drug combinations are optimized for each individual's unique genetic makeup. www.wikipedia.com

©2012 Sami Khuri

Genotype and

Phenotype Interaction

• Evans and Relling from St. Jude’s Children’s hospital in Memphis considered the efficacy and toxicity of a drug that requires two genes:

– An activator with 2 alleles, and

– A binding site with 2 alleles.

• There are 9 possible genotypes.

• Therapeutic effects depend on the genotype of the drug receptors in combination with the amount of active drug in circulation.

homozygous

for mutation

very

toxic

Discovering Genomics, Campbell, 2007 ©2012 Sami Khuri ©2012 Sami Khuri

Therapeutic

Effects of Drugs• The example highlights the complex web of

protein interactions that pharmacogenomics hopes to decipher.

• Drug response is polygenic, and new technologies are needed to understand the connections between relevant proteins involved in drug responses.

• If genotype-specific medication becomes viable, the physician will need to know the genotype of the ill person to determine the appropriate medication and dosage for optimal therapy.

©2012 Sami Khuri

@2002-10 Sami Khuri

Origins of African

Americans

Source: Esteban González Burchard ©2012 Sami Khuri

Ancestry Informative

Marker

• An Ancestry-Informative Marker (AIM) is a set of polymorphisms for a locus which exhibits substantially different frequencies between populations from different geographical regions.

• By using a number of AIMs one can estimate the geographical origins of the ancestors of an individual and ascertain what proportion of ancestry is derived from each geographical region.

en.wikipedia.org/wiki/

Page 16: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.16

©2012 Sami Khuri

SNPs and AIMs

Source: Esteban González Burchard ©2012 Sami Khuri

@2002-10 Sami Khuri

Origins of Latinos and

African Americans

African

Americans

Source: Esteban González Burchard

Self-Identified Race:

Genetic Ancestry

©2012 Sami Khuri ©2012 Sami Khuri

The Superior Doctor

Superior doctors prevent the disease

Mediocre doctors treat the disease before evident

Inferior doctors treat the full blown disease-Huang Dee: Nai - Ching

(2600 B.C. 1st Chinese Medical Text)

©2012 Sami Khuri

Preventive Medicine

• Prevent disease from occurring

• Identify the cause of the disease

• Treat the cause of the disease rather than the symptoms

• Genomics identifies the cause of disease

• “All medicine may become pediatrics” Paul Wise

• Effects of environment, accidents, aging, penetrance …

• Health care costs can be greatly reduced if

– invests in preventive medicine

– one targets the cause of disease rather than symptoms

©2012 Sami Khuri

@2002-09 Sami Khuri

Anatomy Lesson

of Dr. Nicolaes Tulp

1632 oil painting by Rembrandt Harmenszoon van Rijn

Page 17: CS123A TWO Bioinformatics F2012 - SJSU...• Incorporate edible vaccines into food products @2002-10 Sami Khuri ©2012 Sami Khuri Mighty Mouse versus Human Genome • Humans and mice

©2012 Sami Khuri

Biology/CS 123A

Bioinformatics I

2.17

©2012 Sami Khuri

If Rembrandt was

Around Today

Source: Carlos Cordon-Cardo, Columbia University ©2012 Sami Khuri

Convert all this progress into real riches for science, society, and patients

The Future

©2012 Sami Khuri

Concluding Remarks (I)

• Biology is becoming an information science

• Progression: in vivo to in vitro to in silico

• Are natural languages adequate in predicting quantitative behavior of biological systems?

– Need to produce biological knowledge and

operations in ways that natural languages do not allow

• “Biology easily has 500 years of exciting problems to work on”. Donald Knuth

• We are here to add what we can to, not to get what we can from, Life. William Osler

©2012 Sami Khuri

Concluding Remarks (II)

• Today’s biology courses need to cast a wide net

to capture the imaginations of students

representing many different interests, skills, and

viewpoints.

• Today’s biologists need to think quantitatively

and from a multidisciplinary perspective.

• The role that mathematics and computer science

are playing in biology is still in an embryonic

stage.