21st Century Biology: Informatics in the Post-Genome Era · Informatics in the Post-Genome Era...
Transcript of 21st Century Biology: Informatics in the Post-Genome Era · Informatics in the Post-Genome Era...
21st Century Biology:Informatics in the Post-Genome Era
Robert J. RobbinsFred Hutchinson Cancer Research Center
1124 Columbia Street, LV-101Seattle, Washington 98104
[email protected](206) 667 4778
2
Topics
• Biotechnology will be the “magic”technology of the 21st Century.
3
Topics
• Biotechnology will be the “magic”technology of the 21st Century.
• Information Technology (IT) has aspecial relationship with biology.
4
Topics
• Biotechnology will be the “magic”technology of the 21st Century.
• Information Technology (IT) has aspecial relationship with biology.
• Moore’s Law constantly transforms IT(and everything else).
5
Topics
• Biotechnology will be the “magic”technology of the 21st Century.
• Information Technology (IT) has aspecial relationship with biology.
• Moore’s Law constantly transforms IT(and everything else).
• Current funding mechanisms for bio-information infrastructure are hopelesslyinadequate to meet future needs and mustbe radically reformed.
Introduction
MagicalTechnology
7
Magic
To a person from 1897, much currenttechnology would seem like magic.
8
Magic
To a person from 1897, much currenttechnology would seem like magic.
What technology of 2097 would seemmagical to a person from 1997?
9
Magic
To a person from 1897, much currenttechnology would seem like magic.
What technology of 2097 would seemmagical to a person from 1997?
Candidate: Biotechnology so advancedthat the distinction between living andnon-living is blurred.
IT-BiologySynergism
11
IT is Special
Information Technology:
• affects the performance and themanagement of tasks
12
IT is Special
Information Technology:
• affects the performance and themanagement of tasks
• allows the manipulation of hugeamounts of highly complex data
13
IT is Special
Information Technology:
• affects the performance and themanagement of tasks
• allows the manipulation of hugeamounts of highly complex data
• is incredibly plastic(programming and poetry are both exercises in pure thought)
14
IT is Special
Information Technology:
• affects the performance and themanagement of tasks
• allows the manipulation of hugeamounts of highly complex data
• is incredibly plastic(programming and poetry are both exercises in pure thought)
• improves exponentially(Moore’s Law)
15
Biology is Special
Life is Characterized by:
• individuality
16
Biology is Special
Life is Characterized by:
• individuality
• historicity
17
Biology is Special
Life is Characterized by:
• individuality
• historicity
• contingency
18
Biology is Special
Life is Characterized by:
• individuality
• historicity
• contingency
• high (digital) information content
19
Biology is Special
Life is Characterized by:
• individuality
• historicity
• contingency
• high (digital) information contentNo law of large numbers...
20
Biology is Special
Life is Characterized by:
• individuality
• historicity
• contingency
• high (digital) information contentNo law of large numbers, since everyliving thing is genuinely unique.
21
IT-Biology Synergism
• Physics needs calculus, the method formanipulating information aboutstatistically large numbers of vanishinglysmall, independent, equivalent things.
22
IT-Biology Synergism
• Physics needs calculus, the method formanipulating information aboutstatistically large numbers of vanishinglysmall, independent, equivalent things.
• Biology needs information technology, themethod for manipulating informationabout large numbers of dependent,historically contingent, individual things.
23
Biology is Special
For it is in relation to the statistical point of viewthat the structure of the vital parts of livingorganisms differs so entirely from that of anypiece of matter that we physicists and chemistshave ever handled in our laboratories ormentally at our writing desks.
Erwin Schrödinger. 1944. What is Life.
24
Genetics as Code
[The] chromosomes ... contain in some kind of code-script the entire pattern of the individual's futuredevelopment and of its functioning in the mature state.... [By] code-script we mean that the all-penetratingmind, once conceived by Laplace, to which everycausal connection lay immediately open, could tellfrom their structure whether [an egg carrying them]would develop, under suitable conditions, into a blackcock or into a speckled hen, into a fly or a maize plant,a rhodo-dendron, a beetle, a mouse, or a woman.
Erwin Schrödinger. 1944. What is Life.
25
One Human SequenceWe now know thatSchrödinger’s mysterioushuman “code-script”consists of 3.3 billionbase pairs of DNA.
26
One Human Sequence
Typed in 10-pitch font, one human sequence would stretch for morethan 5,000 miles. Digitally formatted, it could be stored on one CD-ROM. Biologically encoded, it fits easily within a single cell.
We now know thatSchrödinger’s mysterioushuman “code-script”consists of 3.3 billionbase pairs of DNA.
27
Bio-digital Information
DNA is a highly efficient digital storage device:
• There is more mass-storage capacity in theDNA of a side of beef than in all the hard drivesof all the world’s computers.
28
Bio-digital Information
DNA is a highly efficient digital storage device:
• There is more mass-storage capacity in theDNA of a side of beef than in all the hard drivesof all the world’s computers.
• Storing all of the (redundant) information in allof the world’s DNA on computer hard diskswould require that the entire surface of the Earthbe covered to a depth of three miles in Conner1.0 gB drives.
Genomics:An Example
30
Computers as Instruments
Computers are not just tools for catalogingexisting knowledge. They are instruments thatchange the way we can see the biologicalworld. Computers allow us to see genomes,just as radio telescopes let us see quasars andmicroscopes let us see cells.
31
Human Genome Project - Goals– construction of a high-resolution genetic map of the human
genome;
USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.
32
Human Genome Project - Goals– construction of a high-resolution genetic map of the human
genome;
– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;
USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.
33
Human Genome Project - Goals– construction of a high-resolution genetic map of the human
genome;
– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;
– determination of the complete sequence of human DNA andof the DNA of selected model organisms;
USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.
34
Human Genome Project - Goals– construction of a high-resolution genetic map of the human
genome;
– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;
– determination of the complete sequence of human DNA andof the DNA of selected model organisms;
– development of capabilities for collecting, storing,distributing, and analyzing the data produced;
USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.
35
Human Genome Project - Goals– construction of a high-resolution genetic map of the human
genome;
– production of a variety of physical maps of all humanchromosomes and of the DNA of selected modelorganisms;
– determination of the complete sequence of human DNA andof the DNA of selected model organisms;
– development of capabilities for collecting, storing,distributing, and analyzing the data produced;
– creation of appropriate technologies necessary to achievethese objectives.
USDOE. 1990. Understanding Our Genetic Inheritance.The U.S. Human Genome Project: The First Five Years.
36
Base Pairs in GenBank
0
10 0 ,00 0 ,00 0
20 0 ,00 0 ,00 0
30 0 ,00 0 ,00 0
40 0 ,00 0 ,00 0
50 0 ,00 0 ,00 0
60 0 ,00 0 ,00 0
70 0 ,00 0 ,00 0
80 0 ,00 0 ,00 0
90 0 ,00 0 ,00 0
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
GenBank Release Numbers
9493929190898887 95 96 97
37
0
10 0 ,00 0 ,00 0
20 0 ,00 0 ,00 0
30 0 ,00 0 ,00 0
40 0 ,00 0 ,00 0
50 0 ,00 0 ,00 0
60 0 ,00 0 ,00 0
70 0 ,00 0 ,00 0
80 0 ,00 0 ,00 0
90 0 ,00 0 ,00 0
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
GenBank Release Numbers
9493929190898887 95 96 97
Base Pairs in GenBank
Growth in GenBank is exponential.More data were added in the last 10weeks than were added in the first 10years of the project.
38
Infrastructure and the HGP
Progress towards all of the [Genome Project]goals will require the establishment of well-funded centralized facilities, including a stockcenter for the cloned DNA fragmentsgenerated in the mapping and sequencingeffort and a data center for the computer-basedcollection and distribution of large amounts ofDNA sequence information.
National Research Council. 1988. Mapping and Sequencing theHuman Genome. Washington, DC: National Academy Press. p. 3
39
Databases and the Genome Project
[The] database developer should provide, insome real sense, an intellectual focus for theinterpretation of genomic data.
NIH-DOE Ad Hoc Committee on Genome Databases
21st CenturyBiology
The Approach
41
Paradigm Shift in Biology
The new paradigm, now emerging, is that all the‘genes’ will be known (in the sense of beingresident in databases available electronically),and that the starting point of a biologicalinvestigation will be theoretical. An individualscientist will begin with a theoretical conjecture,only then turning to experiment to follow or testthat hypothesis.
Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.
42
Paradigm Shift in Biology
To use [the] flood of knowledge, which will pouracross the computer networks of the world,biologists not only must become computerliterate, but also change their approach to theproblem of understanding life.
Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.
21st CenturyBiology
The People
44
Human Resources Issues
• Reduction in need for non-IT staff
45
Human Resources Issues
• Reduction in need for non-IT staff
• Increase in need for IT staff, especially“information engineers”
46
Human Resources Issues
• Reduction in need for non-IT staff
• Increase in need for IT staff, especially“information engineers”
In modern biology, a general trend is toconvert expert work into staff work andfinally into computation. New expertise isrequired to design, carry out, and interpretcontinuing work.
47
Human Resources Issues
Elbert Branscomb: “You must recognize thatsome day you may need as many computerscientists as biologists in your labs.”
48
Human Resources Issues
Elbert Branscomb: “You must recognize thatsome day you may need as many computerscientists as biologists in your labs.”
Craig Venter: “At TIGR, we already havetwice as many computer scientists on ourstaff.”
Exchange at DOE workshop on high-throughput sequencing.
21st CenturyBiology
The Science
50
Fundamental Dogma
The fundamental dogma ofmolecular biology is that genes act tocreate phenotypes through a flow ofinformation from DNA to RNA toproteins, to interactions amongproteins, and ultimately tophenotypes.
Collections of individual phenotypes,of course, constitute a population.
DNA
RNA
Proteins
Circuits
Phenotypes
Populations
51
Fundamental DogmaDNA
RNA
Proteins
Circuits
Phenotypes
Populations
GenBankEMBLDDBJ
MapDatabases
SwissPROTPIR
PDB
Although a few databases already existto distribute molecular information,
52
Fundamental DogmaDNA
RNA
Proteins
Circuits
Phenotypes
Populations
GenBankEMBLDDBJ
MapDatabases
SwissPROTPIR
PDB
Gene Expression?
Clinical Data ?
Regulatory Pathways?Metabolism?
Biodiversity?
Neuroanatomy?
Development ?
Molecular Epidemiology?
Comparative Genomics?
the post-genomic era will need manymore to collect, manage, and publishthe coming flood of new findings.
Although a few databases already existto distribute molecular information,
21st CenturyBiology
The Literature
54
Electronic Data Publishing
G D B -- Beta Hemoglobin---------------------------------------------- ** Locus Detail View **-------------------------------------------------------- Symbol: HBB Name: hemoglobin, beta MIM Num: 141900 Location: 11p15.5 Created: 01 Jan 86 00:00-------------------------------------------------------- ** Polymorphism Table **--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI
55
Electronic Data Publishing
O M I M -- Beta Hemoglobin----------------------------------------------Title*141900 HEMOGLOBIN--BETA LOCUS[HBB; SICKLE CELL ANEMIA, INCLUDED;BETA-THALASSEMIAS, INCLUDED;HEINZ BODY ANEMIAS, BETA-GLOBINTYPE, ...]
The alpha and beta loci determine thestructure of the 2 types of polypeptidechains in adult hemoglobin, Hb A. Byautoradiography using heavy-labeledhemoglobin-specific messenger RNA,Price et al. (1972) found labeling of achromosome 2 and a group B chromo-some. They concluded, incorrectly as itturned out, that the beta-gamma-deltalinkage group was on a group Bchromosome since the zone of labelingwas longer on that chromosome than onchromosome 2 (which by this reasoning
G D B -- Beta Hemoglobin---------------------------------------------- ** Locus Detail View **-------------------------------------------------------- Symbol: HBB Name: hemoglobin, beta MIM Num: 141900 Location: 11p15.5 Created: 01 Jan 86 00:00-------------------------------------------------------- ** Polymorphism Table **--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI
56
Electronic Data Publishing
GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF] [HUMHBB] Human beta globin regionLOCUS [LOC] HUMHBBACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161
KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;
SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactccagcct
GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF] [HUMHBB] Human beta globin regionLOCUS [LOC] HUMHBBACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161
KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;
SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactccagcct
O M I M -- Beta Hemoglobin----------------------------------------------Title*141900 HEMOGLOBIN--BETA LOCUS[HBB; SICKLE CELL ANEMIA, INCLUDED;BETA-THALASSEMIAS, INCLUDED;HEINZ BODY ANEMIAS, BETA-GLOBINTYPE, ...]
The alpha and beta loci determine thestructure of the 2 types of polypeptidechains in adult hemoglobin, Hb A. Byautoradiography using heavy-labeledhemoglobin-specific messenger RNA,Price et al. (1972) found labeling of achromosome 2 and a group B chromo-some. They concluded, incorrectly as itturned out, that the beta-gamma-deltalinkage group was on a group Bchromosome since the zone of labelingwas longer on that chromosome than onchromosome 2 (which by this reasoning
G D B -- Beta Hemoglobin---------------------------------------------- ** Locus Detail View **-------------------------------------------------------- Symbol: HBB Name: hemoglobin, beta MIM Num: 141900 Location: 11p15.5 Created: 01 Jan 86 00:00-------------------------------------------------------- ** Polymorphism Table **--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI
57
Electronic Data PublishingP I R -- Beta Hemoglobin----------------------------------------------DEFINITION
[HBHU] Hemoglobin beta chain- Human, chimpanzee, pygmy chimpanzee, and gorilla
SUMMARY [SUM] #Type Protein #Molecular-weight 15867 #Length 146 #Checksum 1242
SEQUENCEV H L T P E E K S A V T A L W GK V N V D E V G G E A L G R L LV V Y P W T Q R F F E S F G D LS T P D A V M G N P K V K A H GK K V L G A F S D G L A H L D NL K G T F A T L S E L H C D K LH V D P E N F R L L G N V L V CV L A H H F G
P I R -- Beta Hemoglobin----------------------------------------------DEFINITION
[HBHU] Hemoglobin beta chain- Human, chimpanzee, pygmy chimpanzee, and gorilla
SUMMARY [SUM] #Type Protein #Molecular-weight 15867 #Length 146 #Checksum 1242
SEQUENCEV H L T P E E K S A V T A L W GK V N V D E V G G E A L G R L LV V Y P W T Q R F F E S F G D LS T P D A V M G N P K V K A H GK K V L G A F S D G L A H L D NL K G T F A T L S E L H C D K LH V D P E N F R L L G N V L V CV L A H H F G
GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF] [HUMHBB] Human beta globin regionLOCUS [LOC] HUMHBBACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161
KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;
SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactccagcct
GenBank -- Beta Hemoglobin----------------------------------------------DEFINITION [DEF] [HUMHBB] Human beta globin regionLOCUS [LOC] HUMHBBACCESSION NO. [ACC]J00179 J00093 J00094 J00096J00158 J00159 J00160 J00161
KEYWORDS [KEY]Alu repetitive element; HPFH;KpnI repetitive sequence; RNApolymerase III; allelicvariation; alternate cap site;
SEQUENCEgaattctaatctccctctcactactgtctagtatccctcaaggagtggtggctcatgtcttgagctcaagagtttgatataaaaaaaaattagccaggcaaatgggaggatcccttgagcgcactccagcct
O M I M -- Beta Hemoglobin----------------------------------------------Title*141900 HEMOGLOBIN--BETA LOCUS[HBB; SICKLE CELL ANEMIA, INCLUDED;BETA-THALASSEMIAS, INCLUDED;HEINZ BODY ANEMIAS, BETA-GLOBINTYPE, ...]
The alpha and beta loci determine thestructure of the 2 types of polypeptidechains in adult hemoglobin, Hb A. Byautoradiography using heavy-labeledhemoglobin-specific messenger RNA,Price et al. (1972) found labeling of achromosome 2 and a group B chromo-some. They concluded, incorrectly as itturned out, that the beta-gamma-deltalinkage group was on a group Bchromosome since the zone of labelingwas longer on that chromosome than onchromosome 2 (which by this reasoning
G D B -- Beta Hemoglobin---------------------------------------------- ** Locus Detail View **-------------------------------------------------------- Symbol: HBB Name: hemoglobin, beta MIM Num: 141900 Location: 11p15.5 Created: 01 Jan 86 00:00-------------------------------------------------------- ** Polymorphism Table **--------------------------------------------------------Probe Enzyme=================================beta-globin cDNA RsaIbeta-globin cDNA,JW10+ AvaIIPstbeta,JW102,BD23,pB+ BamHIpRK29,Unknown HindIIbeta-IVS2 probe HphIIVS-2 normal HphIUnknown AvrIIbeta-IVS2 probe AsuI
58
Electronic Scholarly Publishing
HTTP://WWW.ESP.ORGThe ESP site is dedicated to the electronic publishing ofscientific and other scholarly materials. Of particularinterest are the history of science, genetics,computational biology, and genome research.
59
Electronic Scholarly Publishing
The Classical Genetics:Foundations seriesprovides ready accessto typeset-quality,electronic editions ofimportant publicationsthat can otherwise bevery difficult to find.
60
Electronic Scholarly Publishing
“Hardy” (of Hardy-Weinberg) is a namewell known to moststudents of biology.
61
Electronic Scholarly Publishing
But how many haveread, or even seen,all of Hardy’sbiological writings?
This is it: A single,one-page letter to theeditor of Science.
62
Electronic Scholarly Publishing
http://www.blocks.fhcrc.org/~kinesinElectronic publishing is especially appropriate for somekinds of dynamic review papers.
63
Electronic Scholarly Publishing
Reviews may be revisedand maintained in realtime...
64
Electronic Scholarly Publishing
and it is easy to providelarge amounts of in-depth supporting andrelated data.
65
Electronic Scholarly Publishing
66
Electronic Scholarly Publishing
http://www.esp.org/books/darwin/beagleEntire monographs can be made instantly available to readers world-wide..
67
Electronic Scholarly Publishing
Today’s computertechnology was nearlyunimaginable just tenyears ago. The technol-ogy of ten years fromnow will also bringmany surprises.
How is it that IT canmaintain such anamazing rate ofsustained change?
And what, if any, arethe implications of thatrate of change forbiology?
Moore’s Law
Transforms InfoTech(and everything else)
69
Moore’s Law: The Statement
Every eighteen months, thenumber of transistors that canbe placed on a chip doubles.
Gordon Moore, co-founder of Intel...
70
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Time
100,000
10,000
1,000
100
10
71
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
72
Moore’s Law: The Effect
Three Phases of Novel IT Applications
• It’s Impossible
73
Moore’s Law: The Effect
Three Phases of Novel IT Applications
• It’s Impossible
• It’s Impractical
74
Moore’s Law: The Effect
Three Phases of Novel IT Applications
• It’s Impossible
• It’s Impractical
• It’s Overdue
75
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
76
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
77
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
78
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
C
79
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
C
80
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
AA
C
81
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
A
C
82
Moore’s Law: The EffectP
erfo
rman
ce(c
on
stan
t co
st)
Cos
t(c
on
stan
t p
erfo
rma
nce
)
Time
100,000 100,000
10,000
1,000
100
10
10,000
1,000
100
10
D
P
A
C
Relevance for biology?
83
Cost (constant performance)
1,000
10,000
100,000
1,000,000
10,000,000
1975 1980 1985 1990 1995 2000 2005
UniversityPurchase
84
Cost (constant performance)
1,000
10,000
100,000
1,000,000
10,000,000
1975 1980 1985 1990 1995 2000 2005
UniversityPurchase
DepartmentPurchase
85
Cost (constant performance)
1,000
10,000
100,000
1,000,000
10,000,000
1975 1980 1985 1990 1995 2000 2005
RO1 GrantPurchase
UniversityPurchase
DepartmentPurchase
86
Cost (constant performance)
1,000
10,000
100,000
1,000,000
10,000,000
1975 1980 1985 1990 1995 2000 2005
PersonalPurchase
RO1 GrantPurchase
UniversityPurchase
DepartmentPurchase
Funding forInformation
Infrastructure
88
The Problem
• IT moves at “Internet Speed” and respondsrapidly to market forces.
89
The Problem
• IT moves at “Internet Speed” and respondsrapidly to market forces.
• IT will play a central role in 21st Centurybiology.
90
The Problem
• IT moves at “Internet Speed” and respondsrapidly to market forces.
• IT will play a central role in 21st Centurybiology.
• Current levels of support for public bio-information infrastructure are too low.
91
The Problem
• IT moves at “Internet Speed” and respondsrapidly to market forces.
• IT will play a central role in 21st Centurybiology.
• Current levels of support for public bio-information infrastructure are too low.
• Reallocation of federal funding is difficult,and subject to political pressures.
92
The Problem
• IT moves at “Internet Speed” and respondsrapidly to market forces.
• IT will play a central role in 21st Centurybiology.
• Current levels of support for public bio-information infrastructure are too low.
• Reallocation of federal funding is difficult,and subject to political pressures.
• Federal-funding decision processes areponderously slow and inefficient.
93
Federal Funding of Bio-Databases
The challenges:
94
Federal Funding of Bio-Databases
The challenges:
• providing adequate funding levels
95
Federal Funding of Bio-Databases
The challenges:
• providing adequate funding levels
• making timely, efficient decisions
96
Federal Funding of Bio-Databases
Appropriate funding level:
97
Federal Funding of Bio-Databases
Appropriate funding level:
• approx. 10% of research funding
98
Federal Funding of Bio-Databases
Appropriate funding level:
• approx. 10% of research funding
• i.e., 1 - 2 billion dollars per year
99
Federal Funding of Bio-Databases
Appropriate funding level:
• approx. 10% of research funding
• i.e., 1 - 2 billion dollars per year
Source of estimate:
- Experience of IT-transformed industries.
- Current support for IT-rich biological research.
100
Market Forces
Vendors
productsservices
Buyers
In a simple market economy, vendors try to anticipatethe needs of buyers and offer products and services tomeet those needs.
101
Market Forces
Vendors
productsservices
Buyers
$
purchases
In a simple market economy, vendors try to anticipatethe needs of buyers and offer products and services tomeet those needs.
Real users decide whether or not to buy a product orservice, depending upon whether or not it meets a realneed at a reasonable price.
102
Market Forces
Vendors
productsservices
Buyers
$
purchases
In a simple market economy, vendors try to anticipatethe needs of buyers and offer products and services tomeet those needs.
Real users decide whether or not to buy a product orservice, depending upon whether or not it meets a realneed at a reasonable price.
Business 101 Insight:
Successful vendors target aniche and excel at meeting theneeds of that niche.
103
Market Forces
VentureCapital
Vendors$
Buyers
$Stock
Offerings
Funding to initiate the developmentof products and services come frominvestors, not from buyers.
$
VendorInvestment
productsservices
$
purchases
104
Market Forces
VentureCapital
Vendors$
Buyers
$Stock
Offerings
Funding to initiate the developmentof products and services come frominvestors, not from buyers.
Investors decide whether or not toprovide start-up funding based uponthe estimated ability of the vendor tocreate products and services that willmeet real needs at competitive prices.
$
VendorInvestment
productsservices
$
purchases
105
Federal Funding
Database
Users
productsservices
$
purchases
If biological databases were drivenby market forces, individual userswould choose what services theyneed and individual databaseproviders would choose whatservices to make available.
106
Federal Funding
Investors
Database
$
Users
productsservices
$
purchases
If biological databases were drivenby market forces, individual userswould choose what services theyneed and individual databaseproviders would choose whatservices to make available.
Investors would provide start-upmoney on the likelihood ofsuccessful products and servicesbeing developed.
107
Federal Funding
Investors
Database
$
Users
productsservices
$
purchases
If biological databases were drivenby market forces, individual userswould choose what services theyneed and individual databaseproviders would choose whatservices to make available.
Investors would provide start-upmoney on the likelihood ofsuccessful products and servicesbeing developed.
Ultimate success would depend onmeeting the needs of real users.Decisions could be made rapidly, inresponse to changing needs andemerging opportunities.
108
Federal Funding
Agency
Database
Reviewers
OtherAgencies
AgencyAdvisors
Congress
productsservices
OMB $
$ DatabaseAdvisors
Users
Instead, funding decisions forbiological databases can follow aponderously slow course, withalmost no opportunity for input fromreal users.
Those most knowledgeable about aparticular database are oftenexcluded from participating in thereview process because of a possible“conflict of interest” status with thedatabase provider.
109
Federal Funding of Bio-Databases
Possible solution - create market forces:
110
Federal Funding of Bio-Databases
Possible solution - create market forces:
• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.
111
Federal Funding of Bio-Databases
Possible solution - create market forces:
• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.
• start supporting the demand side through fast,efficient processes.
112
Federal Funding of Bio-Databases
Possible solution - create market forces:
• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.
• start supporting the demand side through fast,efficient processes.
• provide guaranteed supplementary funding,redeemable only for access to bio-databases.
113
Federal Funding of Bio-Databases
Possible solution - create market forces:
• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.
• start supporting the demand side through fast,efficient processes.
• provide guaranteed supplementary funding,redeemable only for access to bio-databases.
• data stamps
114
Federal Funding of Bio-Databases
Possible solution - create market forces:
• stop supporting the supply side of biodatabasesthrough slow, inefficient processes.
• start supporting the demand side through fast,efficient processes.
• provide guaranteed supplementary funding,redeemable only for access to bio-databases.
• data stamps, AKA food (for-thought) stamps ?!
115
Food (for thought) Stamps
Funding Agencies could:
• provide a 10% supplement to every researchgrant in the form of “stamps” redeemable only atdatabase providers.
116
Food (for thought) Stamps
Funding Agencies could:
• provide a 10% supplement to every researchgrant in the form of “stamps” redeemable only atdatabase providers.
• allow the “stamps” to be transferable amongscientists, so that a market for them couldemerge.
117
Food (for thought) Stamps
Funding Agencies could:
• provide a 10% supplement to every researchgrant in the form of “stamps” redeemable only atdatabase providers.
• allow the “stamps” to be transferable amongscientists, so that a market for them couldemerge.
• provide funding only after the stamps have beenredeemed at a database provider.
118
Food (for thought) Stamps
Problems:
• how to estimate the amount of FFT stamps thatwould actually be redeemed (and thus therequired budget set-aside).
119
Food (for thought) Stamps
Problems:
• how to estimate the amount of FFT stamps thatwould actually be redeemed (and thus therequired budget set-aside).
• how to identify “approved” database providers.
120
Food (for thought) Stamps
Problems:
• how to estimate the amount of FFT stamps thatwould actually be redeemed (and thus therequired budget set-aside).
• how to identify “approved” database providers.
• how to initiate the FFT system.
121
Food (for thought) Stamps
Problems:
• how to estimate the amount of FFT stamps thatwould actually be redeemed (and thus therequired budget set-aside).
• how to identify “approved” database providers.
• how to initiate the FFT system.
• etc etc
122
Food (for thought) Stamps
Alternatives (if no solution emerges):• increasingly inefficient research activities (abject
failure will occur when it becomes easier to redoresearch than to obtain the results of prior work).
123
Food (for thought) Stamps
Alternatives (if no solution emerges):• increasingly inefficient research activities (abject
failure will occur when it becomes easier to redoresearch than to obtain the results of prior work).
• loss of access to bio-databases for public-sectorresearch.
124
Food (for thought) Stamps
Alternatives (if no solution emerges):• increasingly inefficient research activities (abject
failure will occur when it becomes easier to redoresearch than to obtain the results of prior work).
• loss of access to bio-databases for public-sectorresearch.
• movement of majority of “important” biologicalresearch into the private sector.
125
Food (for thought) Stamps
Alternatives (if no solution emerges):• increasingly inefficient research activities (abject
failure will occur when it becomes easier to redoresearch than to obtain the results of prior work).
• loss of access to bio-databases for public-sectorresearch.
• movement of majority of “important” biologicalresearch into the private sector.
• loss of American pre-eminence (if othercountries solve the problems first).
126
Food (for thought) Stamps
Bottom Line:
• This might not be the answer, but
127
Food (for thought) Stamps
Bottom Line:
• This might not be the answer, but
• it’s FOOD FOR THOUGHT
128
Slides:
http://www.esp.org/rjr/beckman.pdf