“software” of life. Genomes to function Lessons from genome projects Most genes have no known...
Genomes to function
Lessons from genome projects
• Most genes have no known function
• Most genes w/ known function assigned from sequence-similarity matches to other organisms
• Need methods to experimentally assay gene activity on a genome-wide scale
[Figure: microarray comparing Condition 1 RNA and Condition 2 RNA, marking genes enriched in condition 1 and genes enriched in condition 2; 17,997 genes, 94% of genome]
Measure expression on genome-wide scale: DNA Microarrays
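A minimal sketch of how a gene might be called "enriched" in one of two conditions, using the log2 ratio of its two intensities. The gene names, intensities and the 2-fold cutoff are illustrative assumptions, not values from the slides.

```python
import math

def call_enrichment(cond1, cond2, min_fold=2.0):
    """Label each gene by the log2 ratio of its two intensities."""
    cutoff = math.log2(min_fold)
    calls = {}
    for gene in cond1:
        log_ratio = math.log2(cond1[gene] / cond2[gene])
        if log_ratio >= cutoff:
            calls[gene] = "enriched in condition 1"
        elif log_ratio <= -cutoff:
            calls[gene] = "enriched in condition 2"
        else:
            calls[gene] = "unchanged"
    return calls

# Hypothetical spot intensities for three made-up genes.
cond1 = {"geneA": 800.0, "geneB": 100.0, "geneC": 250.0}
cond2 = {"geneA": 100.0, "geneB": 900.0, "geneC": 240.0}
calls = call_enrichment(cond1, cond2)
for gene, call in calls.items():
    print(gene, call)
```

Working on the log scale makes an 8-fold increase and an 8-fold decrease symmetric around zero, which is why a single cutoff serves both directions.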
Global Analyses of Gene Expression
• Collect all microarrays from the world
• Gene activity across thousands of conditions
[Figure: expression matrix of genes (20k) by conditions (~5k)]
Digital Age of Biology
• Biologists drowning in data
• Bottleneck now is developing computational resources for discovery
• Think GenBank before BLAST...
Discovering Gene Function on a Global Scale
• Gene Networks
• Search Engines
Matt Weirauch, Corey Powell, Chad Chen, Charlie Vaske, Alex Williams, Martina Koeva
Gene Networks
• link 2 genes together if they are co-activated in multiple organisms
• build networks from all the links
• discover function from a gene’s links
• understand bigger picture of gene regulation
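The linking rule in the bullets above can be sketched as follows. This is a toy illustration, not the talk's actual pipeline: it assumes genes have already been mapped to shared ortholog-group ids (`og1`, `og2`, ... are made up), uses Pearson correlation as the co-activation measure, and the `min_r` and `min_organisms` thresholds are arbitrary.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def conserved_links(expr_by_organism, min_r=0.8, min_organisms=2):
    """Link two genes if their profiles correlate in enough organisms."""
    support = {}
    for expr in expr_by_organism.values():
        for g1, g2 in combinations(sorted(expr), 2):
            if pearson(expr[g1], expr[g2]) >= min_r:
                support[(g1, g2)] = support.get((g1, g2), 0) + 1
    return {pair for pair, n in support.items() if n >= min_organisms}

# Hypothetical expression profiles, keyed by ortholog-group id.
expr_by_organism = {
    "worm": {"og1": [1, 2, 3, 4], "og2": [2, 4, 6, 8], "og3": [4, 1, 3, 2]},
    "fly": {"og1": [5, 3, 1, 2], "og2": [10, 6, 2, 5], "og3": [1, 2, 3, 4]},
}
links = conserved_links(expr_by_organism)
print(links)
```

Requiring support in more than one organism is what filters out correlations that are specific to a single dataset.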
Principle #1
Gene networks are “scale free”
• Scale free – gene networks may arise from processes like expansion of WWW
[Figure: some links on the WWW]
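As a rough illustration of the kind of growth process the slide alludes to, the sketch below grows a network by preferential attachment, where new nodes prefer to link to already well-connected nodes. The model choice, node count and seed are assumptions for illustration; the slides do not say which model was used.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a network where each new node links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # each node appears once per link it has
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

edges = preferential_attachment(2000)
degree = Counter(node for edge in edges for node in edge)
degree_counts = Counter(degree.values())  # degree -> number of nodes
print("nodes with one link:", degree_counts[1])
print("largest hub degree:", max(degree.values()))
```

Most nodes end up with a single link while a few hubs collect many, which is the heavy-tailed degree pattern usually meant by "scale free".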
Principle #2
Genes self assemble into modular subcomponents
http://www.cse.ucsc.edu/~jstuart/multispecies
[Figure: histogram of Percent of Cores (y-axis, 0 to 60) by Core Size (x-axis, 5 to 115); series: Network, Random]
Principle #3
Coordinated activity is a signature of gene function
[Figure: network regions labeled proliferation; transcription; ribosome biogenesis; ribosomal subunits; respiration; protein modification; secretion; fatty acid metab.; tissue growth; neuronal; immune response; development / hox genes; cell polarity, cell structure; newly evolved]
Proteasome “module”
[Figure: proteasome module; labels: integrators, subunits]
Principle #4
Local network topology reports on gene function
top 3 integrators:
Integrators have more cis-regulatory complexity
[Figure legend: integrators, subunits]
Integrators have different phenotypes
[Figure: bar chart of genes (%) (y-axis, 0 to 90) by phenotype class (WT, UNC, LVA, STP, RUP, EMB, PCH, GRO, Other); series: integrators, subunits]
Current Directions for Gene Networks
• Gene isoform networks to capture alternative splicing
• Predict drug targets from synthetic lethal nets (w/ Lokey Lab)
Gene Isoform Networks
• Most human genes (>60%) are alternatively spliced.
• Alternative splicing gives rise to different proteins from the same gene
• The particular variant expressed can be very important (e.g. sex determination in flies)
• The functional implications of alt. splicing in humans are still largely unexplored.
• Provides a higher-resolution understanding of gene expression and its relationship to health & disease
Splicing Microarrays
• Measure particular subparts of the gene structure (e.g. exon-exon junctions)
• Data now available for human and mouse tissue compendiums
• Infer isoforms from expression of subparts across the tissues
• Identify isoform modules
A functional network of gene isoforms
[Figure: isoform patterns; isoform network]
• assemble into modules
• functional signatures
• global network design
Search Engines
Search engines to discover gene function
identify every member of a pathway
[Figure: Retinoblastoma pathway (slide from Art Owen)]
[Figure: gene recommender loop. A query is used to search for regulating conditions; the regulating conditions are used to search for new candidates; the result is query + “hits”]
gene recommender procedure
query: Rb, hda-1, lin-36, rba-2, lin-9
1. Score experiments
2. Score genes
hits: dpl-1, rba-2, K12D12.1, Rb, R06C7.8, hda-1, B0464.6, R06F6.1, T16G12.5, F55A3.7, plk-1, lin-9, lin-36
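A minimal sketch of the two steps named above (score experiments, then score genes). The slides do not give the scoring formulas, so both are assumptions here: an experiment scores high when the query genes respond strongly and consistently, and genes are ranked by distance to the query's mean profile over the `top_k` best experiments. Gene names and values are hypothetical.

```python
from statistics import mean, pvariance

def score_experiments(expr, query):
    """Step 1: score each experiment by how strongly and consistently
    the query genes respond in it (assumed signal-to-noise score)."""
    n_exp = len(next(iter(expr.values())))
    scores = []
    for j in range(n_exp):
        values = [expr[g][j] for g in query]
        scores.append(abs(mean(values)) / (pvariance(values) ** 0.5 + 1e-9))
    return scores

def score_genes(expr, query, exp_scores, top_k=2):
    """Step 2: rank all genes by closeness to the query's mean profile,
    looking only at the top-scoring experiments."""
    top = sorted(range(len(exp_scores)), key=lambda j: -exp_scores[j])[:top_k]
    centroid = {j: mean(expr[g][j] for g in query) for j in top}

    def distance(gene):
        return sum((expr[gene][j] - centroid[j]) ** 2 for j in top)

    return sorted(expr, key=distance)

# Hypothetical normalized expression values over four experiments.
expr = {
    "q1": [2.0, -1.9, 0.1, 0.9],
    "q2": [2.1, -2.0, -0.8, -0.7],
    "cand": [1.9, -2.1, 0.9, 0.0],
    "other": [-0.2, 0.1, 1.0, -1.0],
}
query = ["q1", "q2"]
ranking = score_genes(expr, query, score_experiments(expr, query))
print(ranking)
```

Restricting step 2 to the selected experiments is the point of the design: a candidate only has to track the query where the query genes themselves act together.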
computational validation
query (no Rb): hda-1, lin-36, rba-2, lin-9
1. Score experiments
2. Score genes
hits: 1. rba-2, 2. lin-9, 3. dpl-1, 4. R06C7.8, 5. hda-1, 6. B0464.6, 7. R06F6.1, 8. K12D12.1, 9. T16G12.5, 10. F55A3.7, 11. plk-1, 12. Rb, 13. lin-36
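The validation idea above (drop Rb from the query and see where it lands among the hits) generalizes to leaving out each known member in turn. The sketch below assumes a stand-in ranker, `rank_by_centroid`, which is hypothetical and much simpler than the gene recommender; the gene names and values are made up.

```python
def rank_by_centroid(expr, query):
    """Hypothetical stand-in ranker: sort genes by squared distance
    to the query's mean expression profile."""
    n = len(next(iter(expr.values())))
    centroid = [sum(expr[g][j] for g in query) / len(query) for j in range(n)]

    def dist(gene):
        return sum((expr[gene][j] - centroid[j]) ** 2 for j in range(n))

    return sorted(expr, key=dist)

def leave_one_out_ranks(expr, pathway, ranker):
    """Drop each known member from the query and record its 1-based
    position in the resulting hit list."""
    ranks = {}
    for held_out in pathway:
        query = [g for g in pathway if g != held_out]
        hits = ranker(expr, query)
        ranks[held_out] = hits.index(held_out) + 1
    return ranks

# Hypothetical profiles: m1..m3 are pathway members, x1 and x2 are not.
expr = {
    "m1": [1.0, 1.0, 0.0],
    "m2": [1.1, 0.9, 0.0],
    "m3": [0.9, 1.0, 0.1],
    "x1": [-1.0, 0.0, 1.0],
    "x2": [0.0, -1.0, 1.0],
}
ranks = leave_one_out_ranks(expr, ["m1", "m2", "m3"], rank_by_centroid)
print(ranks)
```

A held-out member that comes back near the top of the list is evidence that the search can recover genes it was never told about.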
Searching 1 organism
[Figure: bar chart of precision at 50% recall (axis ticks 0 to 300) by category: Ribosome*, Calcium Channels, Glycolysis*, Electron Transport*, Proteasome*, tRNA Synthetases, Fatty Acid Deg*, TCA Cycle, Translation Factors, Cell cycle, Cholesterol*, Collagen; series: Bacteria, Yeast, Plant, Worm, Fly, Human]
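The metric named on this chart, precision at 50% recall, can be computed from a ranked hit list and a set of known pathway members. The sketch below shows the standard definition; the gene labels are placeholders, and how the slide's axis values were scaled is not stated in the source.

```python
import math

def precision_at_recall(ranked_genes, known_members, recall=0.5):
    """Walk down the ranked list until the requested fraction of known
    members has been found, then report the precision at that depth."""
    known = set(known_members)
    needed = math.ceil(recall * len(known))
    found = 0
    for depth, gene in enumerate(ranked_genes, start=1):
        if gene in known:
            found += 1
            if found >= needed:
                return found / depth
    return 0.0  # the list never reaches the requested recall

# Placeholder ranking: a, b, c, d are known members; x and y are not.
ranked = ["a", "x", "b", "y", "c", "d"]
p50 = precision_at_recall(ranked, {"a", "b", "c", "d"})
print(round(p50, 3))
```

Here half of the four known members are found by depth 3, so the precision at 50% recall is 2 out of 3.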
Multiple Species Search Engine
[Figure: H.sap query; Ortholog Map; per-species searches and hits for H.sap, D.mel, C.ele, S.cer, A.tha and H.pyl; combined Ecdy, Anim, Opis, Euk and Cell hits; H.sap hits]
Orthology Map
H.sap cell cycle query    C.ele     D.mel
CDK4                      cdk-4     Cdk4
MCM5                      mcm-5     Mcm5
MCM7                      mcm-7     Mcm7
E2F1                      n/a       E2f
PCNA                      pcn-1     Mus209
HDAC1                     hda-1     Rpd3
…                         …         …

H.sap hits (GR): MCM3 (8), MCM6 (9), MCM5 (28), HDAC1 (69), RBBP4 (86), RPA1 (428), BUB1 (1866), ...
C.ele hits (GR): mcm-3, rpa-1, mcm-6, bub-1, rba-2, hda-1, ...
H.sap BTPs of C.ele hits: MCM3 (6), RPA1 (9), MCM6 (15), BUB1 (24), RBBP4 (25), HDAC1 (114), ...
D.mel hits (GR): Hdac1, Bub1, Mcm6, Rpa1, Mcm3, ...
H.sap BTPs of D.mel hits: HDAC1 (3), BUB1 (21), MCM6 (26), RPA1 (48), MCM3 (60), ...
Ecdy hits: MCM6* (1), BUB1* (2), HDAC1* (3), MCM3* (4), RPA1 (5), ...
Anim hits: MCM3* (1), MCM6* (2), HDAC1* (3), MCM5* (4), RBBP4 (5), ...
Related genes sort to the top of the search lists
Multiple species search is more precise
immunological synapse
Gene product Comment
CD8 antigen query
unknown tyrosine kinase lymphocyte specific
T-cell receptor zeta query
CD2 antigen participates in T-cell activation
CD4 antigen (p55) query
unknown Src-like adaptor
negative regulator of T-cell receptor signaling
CD8 antigen query
unknown transcription factor T-cell specific
paired box gene 8 (PAX8) new association
[Unlabeled figure values: 17, 34, 2, 11, 4, 21, 28, 14, 42, 12, 36, 26, 24, 23, 7, 1, 5, 22, 3, 15571, 15572]
Search Engine Directions
• Network Recommender
– Search gene networks for pathway members
– Incorporate multiple data sources in search
– Faster than scanning raw data
• Discriminative search engines
– E.g. identify genes coregulated with DNA damage genes more so than S-phase genes
Network Recommender
[Figure: networks labeled coexpression, synthetic lethal, physical protein interactions]
Iterative Propagation Algorithm
1. Given a set of genes in a pathway A
2. Score gene g based on how connected it is to predicted pathway members in network i
• $S_i(g) = \sum_h w^i_{gh}\, p(h) \,/\, \sum_h w^i_{gh}$
• where h ranges over neighbors of g in network i
3. Compute the posterior that each gene g is in the pathway
• Construct a positive distribution $P(S_i(g) \mid g \in A)$
• Construct a negative distribution $P(S_i(g) \mid g \notin A)$
4. Set $p(g) = \prod_i P(g \in A \mid S_i(g))$
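The four steps above can be sketched as a short loop. Several details are assumptions, since the slide does not specify them: the positive and negative distributions are estimated with smoothed histograms, the prior is the fraction of genes already in the pathway, known members are held at p = 1, and the number of rounds is fixed. The networks and gene names are toy data.

```python
def undirected(edges):
    """Build a symmetric weighted adjacency dict from (g, h, w) triples."""
    net = {}
    for g, h, w in edges:
        net.setdefault(g, {})[h] = w
        net.setdefault(h, {})[g] = w
    return net

def neighbor_score(network, p, g):
    """Step 2: S_i(g), the weighted average of p(h) over neighbors h of g."""
    nbrs = network.get(g, {})
    total = sum(nbrs.values())
    return sum(w * p[h] for h, w in nbrs.items()) / total if total else 0.0

def binned_likelihood(scores, n_bins):
    """Step 3: histogram estimate of P(S | class) with add-one smoothing."""
    counts = [1.0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def propagate(networks, genes, pathway, rounds=3, n_bins=4):
    prior = len(pathway) / len(genes)  # assumed prior P(g in A)
    p = {g: 1.0 if g in pathway else prior for g in genes}
    for _ in range(rounds):
        new_p = {g: 1.0 for g in genes}
        for net in networks:
            s = {g: neighbor_score(net, p, g) for g in genes}
            pos = binned_likelihood([s[g] for g in genes if g in pathway], n_bins)
            neg = binned_likelihood([s[g] for g in genes if g not in pathway], n_bins)
            for g in genes:
                b = min(int(s[g] * n_bins), n_bins - 1)
                post = prior * pos[b] / (prior * pos[b] + (1 - prior) * neg[b])
                new_p[g] *= post  # step 4: product over networks
        # Assumption: known members stay clamped at 1 between rounds.
        p = {g: 1.0 if g in pathway else new_p[g] for g in genes}
    return p

genes = ["a", "b", "c", "d", "e", "f"]
coexpression = undirected([("a", "b", 1), ("a", "c", 1), ("b", "c", 1),
                           ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
                           ("c", "d", 0.2)])
physical = undirected([("a", "b", 1), ("b", "c", 1), ("d", "e", 1), ("e", "f", 1)])
p = propagate([coexpression, physical], genes, pathway={"a", "b"})
print(sorted(p.items(), key=lambda kv: -kv[1]))
```

In this toy run gene `c`, which is tied to the known members in both networks, ends up with a higher score than `d`, `e` or `f`.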
Network Recommender Performance
[Figure: precision vs. recall curve]
Network Recommender Results
Network Recommender for cell cycle
- physical protein interaction
- gene coexpression
Supplemental Material
[Figure: Genetic interactions. Percent Interactions (y-axis, 0 to 50) by network distance (x-axis: 1, 3, 5, 7, 9, 11, 13, no path); series: % Synth Leth, % Background]