Bioinformatics: Definitions, Challenges and Impact on Health Care Systems. Daniel Masys, M.D., Professor and Chair, Department of Biomedical Informatics, Vanderbilt University School of Medicine


Transcript of presentation

Page 1: presentation

Bioinformatics: Definitions, Challenges and Impact on Health Care Systems

Daniel Masys, M.D., Professor and Chair

Department of Biomedical Informatics, Vanderbilt University School of Medicine

Page 2: presentation

Topics

1. What is Bioinformatics?
2. Health Informatics compared to Bioinformatics
3. Scope of Bioinformatics
4. Genomics data and patient care
5. Impact of Bioinformatics on Health Information Systems

Page 3: presentation

Central Dogma of Molecular Biology

DNA → RNA → Protein → Phenotype

Replication (DNA → DNA); Transcription (DNA → RNA); Translation (RNA → Protein); Post-translational modification (Protein)
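The transcription and translation steps in the diagram can be made concrete with a short sketch. It assumes the input is the coding (sense) strand of an intron-free gene read from its first base, uses the standard genetic code, and ignores replication, splicing and post-translational modification; the function names are illustrative.

```python
# Standard genetic code packed into one string: codons are enumerated with
# bases in T, C, A, G order (first base slowest, third base fastest);
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA: the mRNA matches the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read codons in frame from position 0 until a stop codon."""
    dna_letters = mrna.upper().replace("U", "T")  # reuse the DNA-keyed table
    protein = []
    for pos in range(0, len(dna_letters) - 2, 3):
        amino_acid = CODON_TABLE[dna_letters[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTTTTAA")
print(mrna)             # AUGGCCUUUUAA
print(translate(mrna))  # MAF
```

Packing the 64-codon table into one string keeps the example short; a real pipeline would also handle reading frames, alternative genetic codes and ambiguous bases.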

Page 4: presentation

What is Bioinformatics?

Definitions…

Page 5: presentation

NIH Working Definition

Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

http://www.bisti.nih.gov/CompuBioDef.pdf

Page 6: presentation

Another definition: NCBI (National Center for Biotechnology Information)

Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html

Page 7: presentation

Bioinformatics & Health Informatics

Bioinformatics is the study of the flow of information in biological sciences.

Health Informatics is the study of the flow of information in patient care.

These two fields are on a collision course as genomics data come into use in patient care.

Russ Altman, MD, PhD, Stanford University

Page 8: presentation

Different Areas of Strength

Bioinformatics:
Much more data available on the Internet than Health Informatics
Much more progress on database integration across multiple data sources

Health (Clinical) Informatics:
Focus on tailoring common functions to local (very complex) healthcare environments
More need for aggregation of local, regional, national outcomes, statistics, knowledge
Much more progress on terminologies for integration of data

Page 9: presentation

Scope of Bioinformatics

OMES and OMICS

Page 10: presentation

Omes and Omics

Genomics: primarily sequences (DNA and RNA); databanks and search algorithms; supports studies of molecular evolution (“Tree wars”).

Proteomics: sequences (protein) and structures; mass spectrometry, X-ray crystallography; databanks, knowledge bases, visualization.

Functional Genomics (transcriptomics): microarray data; databanks, analysis tools, controlled terminologies.

Systems Biology (metabolomics): metabolites and interacting systems (interactomics); graphs, visualization, modeling, networks of entities.

Page 11: presentation

Central Dogma of Molecular Biology

DNA → RNA → Protein → Phenotype

Studied by: Structural Genomics (DNA); Functional Genomics, or Transcriptomics (RNA); Proteomics (Protein); Phenomics (Phenotype)

Page 12: presentation

Genome and Genomics

Genome: the entire complement of DNA in a species, both nuclear and mitochondrial/chloroplast, including variants among individuals.

Genomics: the study of the sequence, structure and function of the genome. Studies relationships among sets of genes rather than single genes.

Comparative genomics: the study of the differences among species. Usually covers evolutionary studies of differences and conservation over time.

Page 13: presentation

Genome Databases (e.g., GenBank)

Consist of long strings of DNA bases (ATCG…) plus annotations that attach meaning to the sequence data.

Example entry from GenBank, hemochromatosis gene HFE: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_000410&dopt=gb
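To make “long strings of bases plus annotations” concrete, here is a minimal sketch that pulls the sequence out of a GenBank flat-file record. It reads only the LOCUS line and the ORIGIN block, and the record it parses is a made-up toy, not the HFE entry. For real records a maintained parser such as Biopython's `Bio.SeqIO` (format name `"genbank"`) is the safer choice.

```python
from typing import Tuple


def parse_genbank_sequence(record_text: str) -> Tuple[str, str]:
    """Return (locus name, sequence) from one GenBank flat-file record.

    Simplified sketch: reads only the LOCUS line and the ORIGIN block and
    ignores all annotation (DEFINITION, FEATURES, etc.).
    """
    locus = ""
    chunks = []
    in_origin = False
    for line in record_text.splitlines():
        if line.startswith("LOCUS"):
            locus = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of record
        elif in_origin:
            # Each sequence line: a base-position number, then blocks of 10 bases
            chunks.extend(line.split()[1:])
    return locus, "".join(chunks).upper()


# A made-up toy record, NOT a real GenBank entry
TOY_RECORD = """\
LOCUS       TOYSEQ                    24 bp    mRNA    linear   PRI
DEFINITION  Hypothetical toy record for illustration.
FEATURES             Location/Qualifiers
ORIGIN
        1 atggccttta acgtgaaaga ttaa
//
"""

name, seq = parse_genbank_sequence(TOY_RECORD)
print(name, len(seq), seq)  # TOYSEQ 24 ATGGCCTTTAACGTGAAAGATTAA
```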

Page 14: presentation

Human Genome Project

International research effort to determine the sequence of the human genome and other model organisms. Began 1987, completed 2003.

Next steps for ~20,000 genes: function and regulation of all genes; significance of variations between people; cures, therapies, “genomic healthcare”.

Page 15: presentation
Page 16: presentation

The Genome Sequence is at hand…so?

“The good news is that we have the human genome. The bad news is it’s just a parts list”

Page 17: presentation

“The Human Genome Project has catalyzed striking paradigm changes in biology - biology is an information science.”

Leroy Hood, MD, PhD, Institute for Systems Biology, Seattle, Washington

Page 18: presentation

Genomes In Public Databases

Source: http://www.genomesonline.org/

Date   | Published complete genomes | Ongoing prokaryotic genomes | Ongoing eukaryotic genomes | Total
12/01  | 72  | 255 | 158 | 485
10/02  | 104 | 316 | 218 | 638
8/03   | 156 | 386 | 246 | 788
6/2006 | 375 | 945 | 730 | 2050

Page 19: presentation

Genomics activities

Sequence the genes and chromosomes, done by breaking the DNA into parts.

Map the location of various gene entities to establish their order.

Compare the sequences with other known sequences to determine similarity: across species, conserved sequence “motifs”; predict secondary structure of proteins.

Create large databases: GenBank, EMBL, DDBJ.

Develop algorithms and similarity measures: BLAST and its many forms.
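BLAST is a fast heuristic for finding local alignments; the exact dynamic-programming method it approximates is Smith-Waterman local alignment. The sketch below computes a Smith-Waterman score with a linear gap penalty. It is not BLAST itself, and the scoring values (match +2, mismatch -1, gap -1) are illustrative assumptions rather than the substitution matrices real tools use.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between sequences a and b (linear gaps)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap in b
            left = curr[j - 1] + gap   # gap in a
            # Clamping at 0 lets a local alignment start anywhere
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGTTGCA", "ACGTGCA"))  # 13: 7 matches, 1 gap
```

Keeping only two rows of the matrix is enough for the score; recovering the alignment itself would require keeping the full matrix (or traceback pointers).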

Page 20: presentation

Structural genomics vocabulary

Homolog: a gene from one species, for example the mouse, that shares a common origin with a gene from another species, for example humans, Drosophila, or yeast, and often has the same function.

Orthologs: genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution.

Paralogs: genes related by duplication within a genome.

Orthologs typically retain the same function in the course of evolution, whereas paralogs often evolve new functions.

Page 21: presentation

Central Dogma of Molecular Biology

DNA → RNA → Protein → Phenotype

Fields: Genomics; Transcriptomics; Functional Genetics; Proteomics

Page 22: presentation

Proteome vs Transcriptome

Functional genomics (transcriptomics) looks at the timing and regulation of gene products (primarily mRNA).

The proteome is the final end product (the set of many or all proteins).

The relationship between transcriptome and proteome is complex, due to the longevity of the mRNA signal, subsequent control of translation to protein, and post-translational modifications.

Page 23: presentation

Functional Genomics: Microarrays

Transcriptome and transcriptomics

A high-throughput technique designed to measure the relative abundance of mRNA in a cell or tissue in response to an experiment.

Also called gene expression analysis.

Page 24: presentation

Functional Genomics Technologies: Slide, Chip and Filter Arrays

Page 25: presentation

How Microarrays Work

Conceptual description:

A set of targets (oligonucleotides, cDNAs, proteins, tissues, etc.) is immobilized in predetermined positions on a substrate.

A solution containing tagged molecules capable of binding to the targets is placed over the targets.

Binding occurs between targets and tagged molecules.

Fluorescent or radiolabel tags allow visualization of the targets that have been bound.

Page 26: presentation

Schematic of probe preparation, hybridization, scanning and image analysis for slide arrays

Page 27: presentation

Array slidesAmino-silane/poly l-lysine coated

Page 28: presentation

Arrayer

Page 29: presentation
Page 30: presentation

GeneChip synthesis

Page 31: presentation

Genechip analysis system

Page 32: presentation

Genechip array design

Page 33: presentation

Raw data

Page 34: presentation

Genechip analysis software

Page 35: presentation

Duplicate Experiments

Determination of the confidence level between duplicates. 3-fold differences are generally considered significant.
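The 3-fold rule of thumb can be applied as in the sketch below. It averages duplicate measurements and flags genes whose treated/control ratio is at least 3-fold in either direction, comparing on a log2 scale so up- and down-regulation are treated symmetrically. Gene names and intensities are hypothetical, and a fold-change cutoff by itself does not estimate the confidence level between duplicates.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def flag_fold_changes(
    measurements: Dict[str, Tuple[List[float], List[float]]],
    threshold: float = 3.0,
) -> Dict[str, float]:
    """Return {gene: log2 fold change} for genes changed >= `threshold`-fold.

    measurements maps gene -> (treated replicates, control replicates);
    duplicates are simply averaged before taking the ratio.
    """
    cutoff = math.log2(threshold)
    flagged = {}
    for gene, (treated, control) in measurements.items():
        log2_fc = math.log2(mean(treated) / mean(control))
        if abs(log2_fc) >= cutoff:  # symmetric: 3-fold up or 3-fold down
            flagged[gene] = log2_fc
    return flagged


# Hypothetical intensities from duplicate arrays
data = {
    "geneA": ([950.0, 970.0], [290.0, 310.0]),  # 3.2-fold up
    "geneB": ([95.0, 105.0], [390.0, 410.0]),   # 4-fold down
    "geneC": ([480.0, 520.0], [440.0, 460.0]),  # essentially unchanged
}
print(sorted(flag_fold_changes(data)))  # ['geneA', 'geneB']
```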

Page 36: presentation

Experimental Design

A fundamental challenge of microarray experiments: underdetermined systems.

Kohane IS, Kho AT, Butte AJ. Microarrays for an Integrative Genomics. (The MIT Press; Cambridge, MA; 2003), p. 11.

Page 37: presentation

Characteristics of Array Data

Voluminous – tens of thousands of variables with relatively few observations of each (upside down vs. classical biostatistics)

Noisy: error rates up to 8%.

Methods designed to detect patterns and associations always find patterns and associations.
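The last point is easy to demonstrate. The sketch below generates purely random “expression” values for many genes across only a few samples, then reports the strongest correlation with an equally random outcome; with thousands of variables and few observations, a near-perfect correlation appears by chance alone. The gene and sample counts are hypothetical, chosen to mimic the shape of array data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongest_chance_correlation(n_genes=10_000, n_samples=5, seed=0):
    """Largest |r| between a random 'outcome' and purely random 'genes'."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_genes):
        expression = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(expression, outcome)))
    return best


print(f"best |r| from pure noise: {strongest_chance_correlation():.3f}")
```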

Page 38: presentation
Page 39: presentation

Uses of Expression Profiling

Pharmaceutical research: identify drug targets by comparing the expression profile of drug-treated cells with those of cells containing mutations in genes encoding known drug targets.

Disease diagnosis and treatment: distinguish morphologically similar cancers, e.g., DLBCL (Poulsen et al. (2005). Microarray-based classification of diffuse large B-cell lymphomas. European Journal of Haematology 74(6):453-65).

Therapy potential: Rabson AB, Weissmann D. From microarray to bedside: targeting NF-kappaB for therapy of lymphomas. Clin Cancer Res. 2005 Jan 1;11(1):2-6.
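The drug-target idea in the first item amounts to ranking a compendium of reference profiles by similarity to the drug-treated profile. Here is a minimal sketch that assumes profiles are already normalized log ratios over the same genes. It uses Pearson correlation as the similarity measure, which is one common choice but not the only one. All profile values and mutant names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_similarity(drug_profile: List[float],
                       mutant_profiles: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Mutant profiles sorted from most to least similar to the drug profile."""
    scores = [(name, pearson(drug_profile, p)) for name, p in mutant_profiles.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical log-ratio profiles over the same four genes
drug = [2.0, -1.0, 0.5, 3.0]
mutants = {
    "mutA": [1.8, -0.9, 0.4, 2.7],    # resembles the drug response
    "mutB": [-2.0, 1.0, -0.5, -3.0],  # opposite response
    "mutC": [0.1, 0.2, -0.1, 0.0],    # little change
}
for name, r in rank_by_similarity(drug, mutants):
    print(f"{name}: r = {r:+.2f}")
```

A top-ranked mutant suggests, but does not prove, that the drug acts on the product of the mutated gene.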

Page 40: presentation

Future Applications

Diagnostic tool to screen for infective agents: a chip imprinted with a set of pathogenic genomes, used to identify bacterial, viral, or parasite genomic material in a patient’s body fluids.

Diagnostic chip to check for mutations involved in drug-gene interactions, e.g., the Roche AmpliChip.

Page 41: presentation

Public Microarray Data Repositories

Major public repositories:

GEO (NCBI): http://www.ncbi.nlm.nih.gov/geo/

ArrayExpress (EBI): http://www.ebi.ac.uk/arrayexpress/

Page 42: presentation

Standards and Repositories

Brazma A, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genetics. 2001 Dec;29(4):365-71. http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v29/n4/full/ng1201-365.html

Ball CA, et al. Submission of Microarray Data to Public Repositories. PLoS Biology. 2004 Sep;2(9):e317. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15340489

Page 43: presentation

Central Dogma of Molecular Biology

DNA → RNA → Protein → Phenotype (tissues, organs, organisms)

Fields: Genomics; Transcriptomics; Functional Genetics; Proteomics

Page 44: presentation

Proteome and Proteomics

Proteome – the entire set of proteins (and other gene products) made by the genome.

Proteomics – study of the interactions among proteins in the proteome, including networks of interacting proteins and metabolic considerations. Also includes differences in developmental stages, tissues and organs.

Page 45: presentation

Protein Functions

Catalysis
Transport
Nutrition and storage
Contraction and mobility
Structural elements: cytoskeleton, basement membranes
Defense mechanisms
Regulation: genetic, hormonal
Buffering capacity

Page 46: presentation

Protein Databases

UniProt (SwissProt, PIR): http://www.pir.uniprot.org/

GENE: http://www.ncbi.nlm.nih.gov/gene

InterPro: http://www.ebi.ac.uk/interpro/

Correspond to (and are derived from) genome databases.

All connected by Reference Sequences (RefSeq, NCBI).

Page 47: presentation

Gene/Protein Database entries

HFE record in Entrez GENE (NCBI): http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?&db=gene&cmd=retrieve&dopt=Graphics&list_uids=3077
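For programmatic rather than browser access, NCBI provides the Entrez Programming Utilities (E-utilities). The sketch only builds request URLs for the HFE gene record (Gene ID 3077) and for the NM_000410 GenBank entry cited in this section. It makes no network call; to retrieve the data you would pass a URL to an HTTP client such as `urllib.request.urlopen`. The helper name `eutils_url` is hypothetical.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool: str, **params: str) -> str:
    """Build an NCBI E-utilities request URL, e.g. tool='esummary.fcgi'."""
    return EUTILS_BASE + tool + "?" + urlencode(params)


# Summary of the HFE gene record (Gene ID 3077)
gene_summary = eutils_url("esummary.fcgi", db="gene", id="3077")
# Full GenBank flat file for the NM_000410 nucleotide record
nucleotide_record = eutils_url("efetch.fcgi", db="nuccore", id="NM_000410",
                               rettype="gb", retmode="text")
print(gene_summary)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=3077
```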

Page 48: presentation

Structure & Function Determination

X-ray crystallography
Nuclear magnetic resonance (NMR) spectroscopy
Tandem mass spectrometry (MS/MS)
Computational modeling: sequence alignment from others; homology modeling

Page 49: presentation

Structure Databases

Contain experimentally determined and predicted structures of biological molecules.

Most structures are determined by X-ray crystallography or NMR.

Example: MMDB (Molecular Modeling Database), http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

HFE entry: http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?form=6&db=t&Dopt=s&uid=9816

Page 50: presentation

Protein Interaction Databases

Record observations of protein-protein interactions in cells

Attempt to detail interactions observed in thousands of small-scale experiments described in published articles.

Examples: BIND (Biomolecular Interaction Network Database); DIP (Database of Interacting Proteins); MIPS (Munich Information Center for Protein Sequences); PRONET (Protein interaction on the Web); many others, both academic and commercial.
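Interaction databases map naturally onto graphs, with proteins as nodes and observed interactions as edges. The sketch below pools hypothetical pairwise observations into an undirected graph. It then uses breadth-first search to find a shortest chain of interactions between two proteins. The protein labels are placeholders, not records from BIND, DIP or MIPS.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set, Tuple


def build_network(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected graph: each observed interaction links two proteins."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest chain of interactions."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no chain of interactions connects the two


# Hypothetical interactions pooled from several small-scale experiments
observed = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P5"), ("P5", "P4")]
net = build_network(observed)
print(shortest_path(net, "P1", "P4"))  # ['P1', 'P5', 'P4']
```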

Page 51: presentation

Controlled Vocabularies in Bioinformatics

The Gene Ontology (http://www.geneontology.org/): knowledge about gene function (the ontology itself); annotation of gene products (for comparisons).

The MGED Ontology, arising from MIAME (http://mged.sourceforge.net/): annotation of microarray experiments for public repositories.

Clinical Bioinformatics Ontology (http://www.cerner.com/cbo): annotation of gene tests in electronic medical records.

MIAPE from the Proteomics Standards Initiative (PSI) (http://psidev.sourceforge.net/): annotation of proteomics experiments for public repositories.
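Gene Ontology terms form a directed acyclic graph. Under GO's “true path rule,” a gene product annotated to a term is implicitly annotated to all of that term's ancestors. Propagating annotations this way is one reason gene products annotated at different levels of detail can still be compared, as the sketch shows. The four-term ontology and gene names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical mini-ontology: term -> parent terms (a DAG; T4 has two parents)
PARENTS: Dict[str, List[str]] = {
    "T4": ["T2", "T3"],
    "T2": ["T1"],
    "T3": ["T1"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following parent links upward from `term`."""
    found: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(parents.get(t, []))
    return found


def propagate(direct: Dict[str, Set[str]],
              parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Extend each gene's direct annotations with every ancestor term."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in direct.items()
    }


direct = {"geneX": {"T4"}, "geneY": {"T3"}}  # hypothetical direct annotations
full = propagate(direct, PARENTS)
print(sorted(full["geneX"] & full["geneY"]))  # ['T1', 'T3']
```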

Page 52: presentation

Genomics Data and Patient Care

From genotype to phenotype

Page 53: presentation

Human Disease Gene Specifics

Genes linked to human diseases (as of 9/2004): +425 in 2 years; 1700 of ~20,000 loci (8.5%, roughly 9% of loci).

[Chart: number of disease-linked loci by year, 2002 to 2004]

Page 54: presentation

Informatics Issues related to Genomics Data and Patient Care

Linking known data for genes causing human diseases to clinical decision support and EMR documentation

Representation of genetic data in electronic medical records

Page 55: presentation

Clinical Bioinformatics: Common Questions

What genes cause the condition?
What is the normal function of the gene?
What mutations have been linked to diseases?
How does the mutation alter gene function?
What laboratories are performing DNA tests?
Are there gene therapies or clinical trials?
What names are used to refer to the genes and the diseases?
What other conditions are linked to these same genes?

Page 56: presentation

Answers exist online… but finding them is not easy; the answers are in many places.

Can’t navigate by gene names; must use hot links and numeric identifiers.

The number and function of alternate forms of the protein are inconsistently reported.

Synonymy (many names, same meaning) and polysemy (same name, different meanings) cause confusion.

Upper and lower case are used for species distinctions (e.g., human HFE vs. mouse Hfe).
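One way to cope with synonymy and polysemy is to resolve every name through an alias table that may return more than one official symbol. Ambiguous names are then flagged instead of guessed. The table below is entirely hypothetical. The lookup is deliberately case-sensitive because letter case can distinguish species.

```python
from typing import Dict, Set

# Hypothetical alias table: name as written -> official symbol(s) it may denote
ALIASES: Dict[str, Set[str]] = {
    "GENEA": {"GENEA"},
    "OLDA": {"GENEA"},           # synonymy: a retired name for GENEA
    "GENEB": {"GENEB"},
    "AMB1": {"GENEA", "GENEB"},  # polysemy: one name used for two genes
}


def resolve(name: str) -> Set[str]:
    """Exact, case-sensitive lookup, since letter case can encode species."""
    return set(ALIASES.get(name, set()))


def is_ambiguous(name: str) -> bool:
    """True when a name maps to more than one official symbol."""
    return len(resolve(name)) > 1


print(resolve("OLDA"))       # {'GENEA'}
print(is_ambiguous("AMB1"))  # True
print(resolve("Genea"))      # set()
```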

Page 57: presentation

Major Challenges of Navigation

Complexity of data
Dynamic nature of the data
Diverse foci and number of data/knowledge base systems
Data and knowledge representation lack standards

Can navigate if you know what you are looking for.

Page 58: presentation

Genetics Home Reference

Consumer health resource to help the public navigate from phenotype to genotype.

Focus on health implications of the Human Genome Project.

http://ghr.nlm.nih.gov

Mitchell, Fun, McCray. JAMIA. 2004 Nov;11(6):439-447.

Page 59: presentation

Genetics is Impacting Medicine Today

1700 genes linked to health conditions; more than 1100 gene tests for diagnosis.

These relate to diagnosis, therapy, drug dosage, occupational hazards, reproductive plans, health risks, …

Page 60: presentation

Well-known Examples

Pharmacogenetics: CYP450 alleles cause exaggerated, diminished or ultra-rapid drug responses. E.g., warfarin: 93% of patients are OK on standard doses, while 7% of patients have severe hemorrhage. CYP2C9*2 and CYP2C9*3 are the most severe of 6 known mutations.

Environmental susceptibility: sickle cell trait carrier and malaria parasite.

Nutrition: PKU and avoidance of phenylalanine.

Page 61: presentation

Iressa (gefitinib)

Non-small cell lung cancer: ~140,000 patients/yr.

Iressa (AstraZeneca) causes remission in 1 of 10 patients if taken daily for life.

Iressa efficacy correlates with EGFR mutation in the tumor. Gene testing for EGFR now exists, so the appropriate patients can be targeted. http://www.sciencemag.org/cgi/content/full/305/5688/1222a

BUT: AstraZeneca can’t make money on only 14,000 patients per year.

http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=131550

Page 62: presentation

Implications for Health Care System

More gene tests will be ordered. [reports of 300% increase in gene tests in 2003.]

Arch Pathol Lab Med – 2004, 128(12):1330-1333

Simultaneous testing will cause the “Incidentalome”: unanticipated findings on screening genetic tests.

Kohane IS, Masys DR, Altman RB. The incidentalome: a threat to genomic medicine. JAMA. 2006;296(2):212-5.

Preventive healthcare will play a larger part. Environmental risk factors dictate an OSHA-type approach to worker empowerment and education about safe behavior.
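A simple calculation shows why simultaneous testing produces incidental findings. Assume n independent tests, each with false-positive rate p. The chance of at least one false positive is then 1 - (1 - p)^n, which approaches certainty as n grows. The independence assumption and the rate of 0.001 used below are hypothetical simplifications for illustration.

```python
def prob_at_least_one_false_positive(n_tests: int, false_positive_rate: float) -> float:
    """P(>= 1 false positive) = 1 - (1 - p)^n, assuming independent tests."""
    return 1.0 - (1.0 - false_positive_rate) ** n_tests


# How the risk grows with panel size at a hypothetical 0.1% false-positive rate
for n in (1, 100, 1000, 10_000):
    print(n, round(prob_at_least_one_false_positive(n, 0.001), 3))
```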

Page 63: presentation

Unsolved Informatics Issues:What Should Be Stored in the EMR?

Should the complete DNA sequence for specific genes go into the EMR? Where?

Metadata about the DNA sequence? If not the sequence itself (i.e., the differences from a reference sequence), what should be done when the reference sequence changes?

How to trigger alerts and reminders? And for what?
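The reference-sequence question can be illustrated with a sketch that stores only single-base differences. Each difference is tagged with the reference version it was called against. Rebuilding the sequence refuses to proceed when the reference version or the expected reference base does not match, rather than silently producing a wrong sequence. The version labels, field names and sequence are hypothetical. Real variant representations also cover insertions, deletions and larger rearrangements.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    reference_version: str  # hypothetical label for the reference build used
    position: int           # 0-based offset into that reference sequence
    ref: str                # base expected in the reference
    alt: str                # base observed in the patient


def apply_variants(reference: str, reference_version: str,
                   variants: List[Variant]) -> str:
    """Rebuild a patient sequence from stored single-base differences."""
    bases = list(reference)
    for v in variants:
        if v.reference_version != reference_version:
            raise ValueError(
                f"variant recorded against {v.reference_version}, not {reference_version}")
        if reference[v.position] != v.ref:
            raise ValueError(f"reference mismatch at position {v.position}")
        bases[v.position] = v.alt
    return "".join(bases)


stored = [Variant("ref-v1", 2, "G", "A")]
print(apply_variants("ACGTACGT", "ref-v1", stored))  # ACATACGT
```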

Page 64: presentation

Genetic data in electronic medical records

Implications for component systems: laboratory; pharmacy; computerized order entry; documentation and notes.

Knowledge management: alerts and reminders; finding patients matching profiles; practice guidelines and clinical trials.
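Alerts and reminders in computerized order entry could work roughly like the sketch below. When a drug is ordered, the patient's stored genotype is checked against a table of flagged alleles. The warfarin and CYP2C9*2/*3 pairing comes from the pharmacogenetics example earlier in this presentation. The table structure, function name and message text are hypothetical placeholders, not clinical guidance.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge base: drug -> genes to check; (gene, allele) -> message
DRUG_GENES: Dict[str, List[str]] = {"warfarin": ["CYP2C9"]}
ALLELE_RULES: Dict[Tuple[str, str], str] = {
    ("CYP2C9", "*2"): "reduced-function allele; review dosing",
    ("CYP2C9", "*3"): "reduced-function allele; review dosing",
}


def alerts_for_order(drug: str, genotype: Dict[str, List[str]]) -> List[str]:
    """Return alert messages when an ordered drug meets a flagged allele in the EMR."""
    alerts = []
    for gene in DRUG_GENES.get(drug, []):
        for allele in genotype.get(gene, []):
            message = ALLELE_RULES.get((gene, allele))
            if message:
                alerts.append(f"{drug}: {gene}{allele} {message}")
    return alerts


patient = {"CYP2C9": ["*1", "*3"]}  # hypothetical stored genotype
print(alerts_for_order("warfarin", patient))
# ['warfarin: CYP2C9*3 reduced-function allele; review dosing']
```

Keeping the rules in a table rather than in code is what would let the knowledge base be updated as new gene-drug associations are published.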

Page 65: presentation

Genome Data and Other Information Systems

Genomic information will be pervasive in all healthcare information systems.

Also in public health systems: newborn screening; tissue and organ banks; DOD requires DNA samples; bioterrorism and homeland security; identification of World Trade Center victims.

Privacy and security issues are important but not inherently different from those of other EMR data.

Page 66: presentation

Summary

Informatics will be the key enabling technology for personalized, genomic medicine.

Current separation between bioinformatics and clinical informatics will diminish as the two subdisciplines merge

Page 67: presentation

Optional Exercise: Hands-on with GHR

Scavenger hunt with hemochromatosis and the genes that influence it.

Explore the Genetics Home Reference by answering the following questions. Start at http://ghr.nlm.nih.gov .

Page 68: presentation

GHR Scavenger Hunt

How common is hemochromatosis?

How many genes have been proven to be involved in hemochromatosis when the genes are mutated?

What are the symbols for these genes?

Can you find the link to MedlinePlus with health information on hemochromatosis?

Page 69: presentation

GHR Scavenger Hunt

What are the names of the patient support associations for hemochromatosis?

One synonym for this condition is “bronze diabetes”. Can you find a reason for this?

What kind of damage is done to the liver of people with hemochromatosis?

Page 70: presentation

GHR Scavenger Hunt

For the genes involved in hemochromatosis, how many have a DNA test available?

Give one place where you would choose to send a tissue sample for DNA testing.

What sites are listed under “Research Resources” for the TFR2 gene? How many alternatively spliced proteins are there for TFR2? In what tissues is this gene expressed?

Page 71: presentation

GHR Scavenger Hunt

How do people inherit hemochromatosis?

Do the genes involved in hemochromatosis cause other health conditions when they are mutated?

Can you find a protein sequence for one of the genes?

What clinical trials are available for hemochromatosis patients close to where you live?