Modeling Dependencies in Protein-DNA Binding Sites

Yoseph Barash (1), Gal Elidan (1), Nir Friedman (1), Tommy Kaplan (1,2)

(1) School of Computer Science & Engineering
(2) Hadassah Medical School

The Hebrew University, Jerusalem, Israel

[Figure: a binding site within a promoter]

Dependent positions in binding sites

Pros: Biology suggests dependencies
  A single amino acid interacts with two nucleotides
  Change in conformation of protein or DNA

Cons: Modeling dependencies is harder
  Additional parameters
  Requires more data, not as robust

To model or not to model dependencies? [Man & Stormo 2001, Bulyk et al, 2002, Benos et al, 2002]

Most approaches assume position independence

Can we learn dependencies from available genomic data?

Do dependency models perform better?

Outline
  Flexible models of dependencies
  Learning from (un)aligned sequences
  Systematic evaluation
  Biological insights

Data driven approach

How to model binding sites?

Represent a distribution of binding sites: $P(X_1, X_2, X_3, X_4, X_5) = {?}$

Profile: Independence model

$$P(X_1 \ldots X_5) = P(X_1)\,P(X_2)\,P(X_3)\,P(X_4)\,P(X_5)$$

Tree: Direct dependencies

$$P(X_1 \ldots X_5) = P(X_1)\,P(X_2 \mid X_3)\,P(X_3 \mid X_1)\,P(X_4)\,P(X_5 \mid X_3)$$

Mixture of Profiles: Global dependencies

$$P(X_1 \ldots X_5) = \sum_T P(T)\,P(X_1 \mid T)\,P(X_2 \mid T)\,P(X_3 \mid T)\,P(X_4 \mid T)\,P(X_5 \mid T)$$

Mixture of Trees: Both types of dependencies

$$P(X_1 \ldots X_5) = \sum_T P(T)\,P(X_1 \mid T)\,P(X_2 \mid X_3, T)\,P(X_3 \mid X_1, T)\,P(X_4 \mid T)\,P(X_5 \mid X_3, T)$$
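To make the model families on this slide concrete, here is a minimal sketch of how a Profile and a Tree could be estimated from aligned sites and used to score a site. The function names, the pseudocount, the toy sites and the parent assignment `PARENTS` are illustrative assumptions, not the authors' implementation; the parent assignment simply mirrors a five-position tree in which X3 depends on X1, X2 and X5 depend on X3, and X4 is independent.

```python
import itertools
import math
from collections import Counter

ALPHABET = "ACGT"

def smoothed(counts, n, pseudo):
    """Turn nucleotide counts into probabilities using a pseudocount (assumed smoothing)."""
    total = n + pseudo * len(ALPHABET)
    return {a: (counts[a] + pseudo) / total for a in ALPHABET}

def fit_profile(sites, pseudo=1.0):
    """Profile: one independent nucleotide distribution per position."""
    return [smoothed(Counter(s[i] for s in sites), len(sites), pseudo)
            for i in range(len(sites[0]))]

def profile_log2_prob(profile, site):
    return sum(math.log2(profile[i][x]) for i, x in enumerate(site))

def fit_tree(sites, parents, pseudo=1.0):
    """Tree: P(X_i | X_parent(i)); parents[i] is None for positions without a parent."""
    tables = []
    for i, p in enumerate(parents):
        contexts = [None] if p is None else list(ALPHABET)
        table = {}
        for ctx in contexts:
            rows = [s for s in sites if p is None or s[p] == ctx]
            table[ctx] = smoothed(Counter(s[i] for s in rows), len(rows), pseudo)
        tables.append(table)
    return tables

def tree_log2_prob(tables, parents, site):
    return sum(math.log2(tables[i][None if p is None else site[p]][site[i]])
               for i, p in enumerate(parents))

def mixture_log2_prob(weights, component_log2_probs):
    """log2 of sum_T P(T) * P(site | T), given log2 P(site | T) for each component T."""
    return math.log2(sum(w * 2.0 ** lp for w, lp in zip(weights, component_log2_probs)))

def total_probability(log2_prob, length):
    """Sanity check: sum the model probability over every sequence of the given length."""
    return sum(2.0 ** log2_prob("".join(s))
               for s in itertools.product(ALPHABET, repeat=length))

# Toy aligned sites (hypothetical) in which position 3 follows position 1.
sites = ["ACGTA", "ACGTT", "TCATA", "TCATT", "ACGTA", "TCATA"]
PARENTS = [None, 2, 0, None, 2]  # zero-based parent of each position

profile = fit_profile(sites)
tree = fit_tree(sites, PARENTS)
print(profile_log2_prob(profile, "ACGTA"))
print(tree_log2_prob(tree, PARENTS, "ACGTA"))
```

A mixture model would combine per-component scores with `mixture_log2_prob`, where each component is itself a profile or a tree. Both fitted models sum to one over all 4^5 sequences, which `total_probability` can confirm.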

Learning models: Aligned binding sites

Learning based on methods for probabilistic graphical models (Bayesian networks)

Aligned binding sites:
GCGGGGCCGGGC TGGGGGCGGGGT AGGGGGCGGGGG TAGGGGCCGGGC TGGGGGCGGGGT AAAGGGCCGGGC
GGGAGGCCGGGA GCGGGGCGGGGC GAGGGGACGAGT CCGGGGCGGTCC ATGGGGCGGGGC

[Figure: aligned binding sites are fed to the learning machinery, which produces models over X1 X2 X3 X4 X5]

Learning machinery: select maximum likelihood model
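One standard way to select a maximum-likelihood tree over positions is the Chow-Liu procedure: weight each pair of positions by its empirical mutual information and take a maximum-weight spanning tree. Whether the talk's learning machinery uses exactly this procedure is an assumption; the sketch below only illustrates the idea, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(sites, i, j):
    """Empirical mutual information, in bits, between alignment columns i and j."""
    n = len(sites)
    ci = Counter(s[i] for s in sites)
    cj = Counter(s[j] for s in sites)
    cij = Counter((s[i], s[j]) for s in sites)
    return sum((c / n) * math.log2(c * n / (ci[a] * cj[b]))
               for (a, b), c in cij.items())

def chow_liu_parents(sites):
    """Maximum-weight spanning tree over positions (Prim's algorithm), rooted at position 0.

    Returns a list where entry j is the parent position of j, or None for the root.
    """
    length = len(sites[0])
    mi = {(i, j): mutual_information(sites, i, j)
          for i, j in combinations(range(length), 2)}
    parents = [None] * length
    in_tree = {0}
    while len(in_tree) < length:
        # Pick the strongest edge that connects the current tree to a new position.
        i, j = max(((i, j) for i in in_tree for j in range(length) if j not in in_tree),
                   key=lambda e: mi[min(e), max(e)])
        parents[j] = i
        in_tree.add(j)
    return parents

# Toy aligned sites (hypothetical): columns 0 and 2 always agree, column 1 is constant.
toy_sites = ["ACA", "ACA", "TCT", "TCT", "ACA", "TCT"]
print(mutual_information(toy_sites, 0, 2))
print(chow_liu_parents(toy_sites))
```

This sketch always returns a fully connected tree. A learner that can leave a position unconnected would need an additional step, for example dropping edges whose mutual information is negligible or scoring structures with a complexity penalty.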

Evaluation using aligned data

Estimate generalization of each model:

Cross-validation:

[Figure labels: data set, training set, test set, test log-likelihood]

Test: how probable is the site given the model?

Data set:
ATGGGGCGGGGC GTGGGGCGGGGC ATGGGGCGGGGC GTGGGGCGGGGC GCGGGGCGGGGC GAGGGGACGAGT CCGGGGCGGTCC ATGGGGCGGGGC
GCGGGGCCGGGC TGGGGGCGGGGT AGGGGGCGGGGG TAGGGGCCGGGC TGGGGGCGGGGT TGGGGGCCGGGC

Test log-likelihood per site:
-20.34 -23.03 -21.31 -19.10 -18.42 -19.70 -22.39 -23.54 -22.39 -23.54 -18.07 -19.18 -18.31 -21.43

Test avg. LL = -20.77

95 TFs with ≥ 20 binding sites from TRANSFAC database [Wingender et al, 2001]
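The cross-validated test log-likelihood used for this evaluation can be sketched as follows for the simplest case, a Profile. The number of folds, the pseudocount, and the split of the slide's sequence string into 12-base sites are assumptions made for illustration; the sketch is not expected to reproduce the numbers shown on the slide.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def fit_profile(train, pseudo=1.0):
    """Per-position nucleotide probabilities with a pseudocount (assumed smoothing)."""
    total = len(train) + pseudo * len(ALPHABET)
    profile = []
    for i in range(len(train[0])):
        counts = Counter(s[i] for s in train)
        profile.append({a: (counts[a] + pseudo) / total for a in ALPHABET})
    return profile

def avg_test_log2_likelihood(sites, k=5, pseudo=1.0):
    """Average log2-likelihood of held-out sites over k cross-validation folds."""
    per_site = []
    for fold in range(k):
        test = sites[fold::k]  # every k-th site is held out in this fold
        train = [s for idx, s in enumerate(sites) if idx % k != fold]
        profile = fit_profile(train, pseudo)
        per_site += [sum(math.log2(profile[i][x]) for i, x in enumerate(s))
                     for s in test]
    return sum(per_site) / len(per_site)

# Sites taken from the slide, assuming each site is 12 bases long.
SITES = [
    "ATGGGGCGGGGC", "GTGGGGCGGGGC", "ATGGGGCGGGGC", "GTGGGGCGGGGC",
    "GCGGGGCGGGGC", "GAGGGGACGAGT", "CCGGGGCGGTCC", "ATGGGGCGGGGC",
    "GCGGGGCCGGGC", "TGGGGGCGGGGT", "AGGGGGCGGGGG", "TAGGGGCCGGGC",
    "TGGGGGCGGGGT", "TGGGGGCCGGGC",
]
print(avg_test_log2_likelihood(SITES))
```

Replacing `fit_profile` and the scoring expression with a tree or mixture learner would give the corresponding score for a dependency model, so the models can be compared on the same folds.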

Example: Arabidopsis ABA binding factor 1

Profile
  Test LL per instance: -19.93

Mixture of Profiles (76% shown in figure)
  Test LL per instance: -18.70 (+1.23; improvement in likelihood > 2-fold)

[Figure: model over positions X4 X5 X6 X7 X8 X9 X10 X11 X12]
  Test LL per instance: -18.47 (+1.46; improvement in likelihood > 2.5-fold)
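The fold improvements quoted for this example are consistent with log-likelihoods measured in bits (base 2). The slide does not state the base, so this is an assumption; under it, a difference of d in log-likelihood corresponds to a 2^d-fold change in likelihood:

$$2^{1.23} \approx 2.35 > 2, \qquad 2^{1.46} \approx 2.75 > 2.5$$

With natural logarithms the same differences would correspond to roughly 3.4-fold and 4.3-fold changes instead.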

Likelihood improvement over profiles

[Figure: likelihood improvement over profiles for the 95 aligned TRANSFAC data sets (axis ticks 10 to 90); legend: significant (paired t-test), not significant]

Significant improvement in generalization

Data often exhibits dependencies

Motif finding problem

Input: A set of potentially co-regulated genes

Sources of data:
  Gene annotation (e.g. Hughes et al, 2000)
  Gene expression (e.g. Spellman et al, 1998; Tavazoie et al, 2000)
  ChIP (e.g. Simon et al, 2001; Lee et al, 2002)

Output: A common motif in their promoters

Evaluation for unaligned data

Learning models: unaligned data

Use the EM algorithm to simultaneously:
  Identify binding site positions
  Learn a dependency model
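The alternation between locating sites and re-estimating the model can be sketched with a deliberately simplified EM loop. The sketch assumes exactly one site per sequence, a uniform background, a Profile rather than a dependency model, and a seed site for initialization; all of these are illustrative assumptions, and the function name `em_motif` is hypothetical.

```python
ALPHABET = "ACGT"

def em_motif(seqs, width, seed_site, n_iter=50, pseudo=0.5, background=0.25):
    """EM for a profile motif with one occurrence per sequence (simplified sketch)."""
    # Initialise the profile around the seed site.
    profile = [{a: (0.7 if a == seed_site[i] else 0.1) for a in ALPHABET}
               for i in range(width)]
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability of each start position in each sequence.
        posteriors = []
        for s in seqs:
            scores = []
            for start in range(len(s) - width + 1):
                ratio = 1.0
                for i in range(width):
                    ratio *= profile[i][s[start + i]] / background
                scores.append(ratio)
            z = sum(scores)
            posteriors.append([x / z for x in scores])
        # M-step: re-estimate the profile from posterior-weighted counts.
        counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
        for s, post in zip(seqs, posteriors):
            for start, w in enumerate(post):
                for i in range(width):
                    counts[i][s[start + i]] += w
        profile = [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in counts]
    return profile, posteriors

# Toy promoters (hypothetical) with the motif GGGCGG planted at positions 4, 2, 7, 4.
seqs = ["ATATGGGCGGTATA", "TTGGGCGGATATAT", "ATTATATGGGCGGA", "TATAGGGCGGTTAT"]
profile, posteriors = em_motif(seqs, width=6, seed_site="GGGCGG")
best_starts = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
print(best_starts)
```

To learn a dependency model instead, the M-step would fit a tree or mixture to the weighted site instances, and the E-step would score each window with that model in place of the profile product.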

[Figure: EM loop on unaligned data, alternating between "learn a model" and "identify binding sites", producing models over X1 X2 X3 X4 X5]

ChIP location analysis [Lee et al, 2002]

Yeast genome-wide location experiments
Target genes for 106 TFs in 146 experiments

[Figure: table of ~6000 genes (YAL001C, YAL002W, YAL003W, YAL005C, ..., YAL010C, YAL012C, YAL013W, YPR201W), each marked + or – as a target of each TF (ABF1 targets, ZAP1 targets, ...)]

Example: Models learned for ABF1 (YPD), autonomously replicating sequence-binding factor 1

[Figure panels: known profile (from TRANSFAC), learned profile, learned Mixture of Profiles]

Evaluating Performance

Detect target genes on a genomic scale:

[Figure: promoter sequence ACGTAT…AGGGATGCGAGC spanning positions -1000 to 0, with position -473 marked; gene list YAL005C, YAL007C, YAL008W, YAL009W, YAL010C, YAL012C, YAL013W, YPR201W]

Bonferroni corrected p-value ≤ 0.01

Example: Gal4 regulates Gal80

[Figure: Profile and Mixture of Trees along the promoter (positions -180 to -60), with the biologically verified site marked]

Evaluation using ChIP location data [Lee et al, 2002]

Evaluate using a 5-fold cross-validation test:

[Figure: genes YAL005C, YAL007C, YAL008W, YAL009W, YAL010C, YAL012C, YAL013W, YPR201W with their + / – labels in the data set and test set, next to the model's predictions; each prediction is marked as correct (√), false negative (FN), or false positive (FP)]

Evaluation using ChIP location data [Lee et al, 2002]

Example: ROC curve of HSF1

[Figure: ROC curves for Profile, Mixture of Profiles, and Mixture of Trees; axis: False Positive Rate, 0% to 5%; annotation: ~60 FP]

Improvement in sensitivity & specificity

[Figure: Tree vs. Profile; axis: Δ sensitivity, ticks -20 to 60]

Sensitivity = TP / True
Specificity = TP / Predicted
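The two measures defined on this slide can be computed directly from the set of true targets and the set of predicted targets. Note that "specificity" as defined here (TP / Predicted) is what is more commonly called precision or positive predictive value. The gene assignments in the example are illustrative, not taken from the data.

```python
def sensitivity_specificity(true_targets, predicted_targets):
    """Sensitivity = TP / True and specificity = TP / Predicted, as defined on the slide."""
    true_targets = set(true_targets)
    predicted_targets = set(predicted_targets)
    tp = len(true_targets & predicted_targets)
    sensitivity = tp / len(true_targets) if true_targets else 0.0
    specificity = tp / len(predicted_targets) if predicted_targets else 0.0
    return sensitivity, specificity

# Illustrative labels: one true positive, one false negative, one false positive.
true_targets = {"YAL007C", "YAL010C"}
predicted_targets = {"YAL010C", "YAL012C"}
print(sensitivity_specificity(true_targets, predicted_targets))  # (0.5, 0.5)
```

The Δ sensitivity plotted for each data set would then be the difference between this value for a dependency model and for the Profile.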

105 unaligned data sets from Lee et al.

[Figure: Mixture of Profiles vs. Profile; axis: Δ sensitivity, ticks -20 to 60]

[Figure: Mixture of Trees vs. Profile; axis: Δ sensitivity, ticks -20 to 60]

“Is it worthwhile to model dependencies?” Evaluation clearly supports this

What about the underlying biology? (with Prof. Hanah Margalit, Hadassah Medical School)

Distance between dependent positions

Tree models learned from the aligned data sets

[Figure: dependencies by distance between positions (1 to 11), grouped by strength: weak (< 0.3 bits), medium (< 0.7 bits), strong; annotation: < 1/3 of the dependencies]

Structural families

Dependency models vs. Profile on aligned data sets

[Figure: data sets (axis ticks 10 to 90) grouped by structural family: Zinc finger, bZIP, bHLH, Helix Turn Helix, β Sheet, others, ???; legend: significant (paired t-test), not significant]

Conclusions
  Flexible framework for learning dependencies
  Dependencies are found in many cases
  It is worthwhile to model them: better learning and binding site prediction

http://compbio.cs.huji.ac.il/TFBN

Future work
  Link to the underlying structural biology
  Incorporate as part of other regulatory mechanism models
