Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011
![Page 1: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/1.jpg)
Techniques for Improved Probabilistic Inference in Protein-Structure Determination via X-Ray Crystallography
Ameet Soni
Department of Computer Sciences
Doctoral Defense
August 10, 2011
![Page 2: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/2.jpg)
Protein-Structure Determination
• Proteins essential to cellular function
  • Structural support
  • Catalysis/enzymatic activity
  • Cell signaling
• Protein structures determine function
• X-ray crystallography main technique for determining structures
[Chart: X-ray 88.1%; NMR 11.3%; Other 0.6%]
![Page 3: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/3.jpg)
Sequences vs Structure Growth
![Page 4: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/4.jpg)
Task Overview
Given
• A protein sequence (e.g., SAVRVGLAIM...)
• Electron-density map (EDM) of protein
Do
• Automatically produce a protein structure that
  • Contains all atoms
  • Is physically feasible
![Page 5: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/5.jpg)
Thesis Statement
Using biochemical domain knowledge and enhanced algorithms for probabilistic inference will produce more accurate and more complete protein structures.
![Page 6: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/6.jpg)
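Challenges & Related Work
[Figure: electron-density map quality at resolutions of 1 Å, 2 Å, 3 Å, and 4 Å, with the ranges handled by ARP/wARP, TEXTAL & RESOLVE, and our method, ACMI]
• Resolution is a property of the protein
• Higher resolution means better image quality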
![Page 7: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/7.jpg)
Outline
• Background and Motivation
• ACMI Roadmap and My Contributions
• Inference in ACMI
• Guided Belief Propagation
• Probabilistic Ensembles in ACMI (PEA)
• Conclusions and Future Directions
![Page 9: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/9.jpg)
ACMI Roadmap (Automated Crystallographic Map Interpretation)
• Phase 1: Perform Local Match (output: prior probability of each AA’s location)
• Phase 2: Apply Global Constraints (output: posterior probability of each AA’s location)
• Phase 3: Sample Structure (output: all-atom protein structures)
[Figure labels: bk-1, bk, bk+1; structures 1…M]
![Page 10: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/10.jpg)
Analogy: Face Detection
• Phase 1: Find nose, find eyes, find mouth
• Phase 2: Combine and apply constraints
• Phase 3: Infer face
![Page 11: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/11.jpg)
Phase 1: Local Match Scores
General CS area: 3D shape matching/object recognition
Given: EDM, sequence
Do: For each amino acid in the sequence, score its match to every location in the EDM
My Contributions
• Spherical-harmonic decompositions for local match [DiMaio, Soni, Phillips, and Shavlik, BIBM 2007] {Ch. 7}
• Filtering methods using machine learning [DiMaio, Soni, Phillips, and Shavlik, IJDMB 2009] {Ch. 7}
• Structural homology using electron density [Ibid.] {Ch. 7}
![Page 12: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/12.jpg)
Phase 2: Apply Global Constraints
General CS area: Approximate probabilistic inference
Given: Sequence, Phase 1 scores, constraints
Do: Posterior probability for each amino acid’s 3D location given all evidence
My Contributions
• Guided belief propagation using domain knowledge [Soni, Bingman, and Shavlik, ACM BCB 2010] {Ch. 5}
• Residual belief propagation in ACMI [Ibid.] {Ch. 5}
• Probabilistic ensembles for improved inference [Soni and Shavlik, ACM BCB 2011] {Ch. 6}
![Page 13: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/13.jpg)
Phase 3: Sample Protein Structure
General CS area: Statistical sampling
Given: Sequence, EDM, Phase 2 posteriors
Do: Sample all-atom protein structure(s)
My Contributions
• Sample protein structures using particle filters [DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, Shavlik, Bioinformatics 2007] {Ch. 8}
• Informed sampling using domain knowledge [Unpublished elsewhere] {Ch. 8}
• Aggregation of probabilistic ensembles in sampling [Ibid. ACM BCB 2011] {Ch. 6}
![Page 14: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/14.jpg)
Comparison to Related Work
[DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007] [Ch. 8 of dissertation]
![Page 15: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/15.jpg)
Outline
• Background and Motivation
• ACMI Roadmap and My Contributions
• Inference in ACMI
• Guided Belief Propagation
• Probabilistic Ensembles in ACMI (PEA)
• Conclusions and Future Directions
![Page 16: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/16.jpg)
ACMI Roadmap
• Phase 1: Perform Local Match (output: prior probability of each AA’s location)
• Phase 2: Apply Global Constraints (output: posterior probability of each AA’s location)
• Phase 3: Sample Structure (output: all-atom protein structures)
[Figure labels: bk-1, bk, bk+1; structures 1…M]
![Page 17: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/17.jpg)
Phase 2 – Probabilistic Model
ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)
[Figure: MRF with one node per amino acid: ALA1, GLY2, LYS3, LEU4, SER5]
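For reference, a pairwise MRF factorizes in the standard way sketched below. The notation is assumed here for illustration and is not taken from the slides: b_i is the location of amino acid i, ψ_i is a node potential (the local evidence for that amino acid), ψ_ij is an edge potential (a constraint between two amino acids), E is the edge set, and Z is the normalizing constant.

```latex
P(b_1, \dots, b_N) = \frac{1}{Z} \prod_{i=1}^{N} \psi_i(b_i) \prod_{(i,j) \in E} \psi_{ij}(b_i, b_j)
```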
![Page 18: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/18.jpg)
Size of Probabilistic Model
• # nodes: ~1,000
• # edges: ~1,000,000
![Page 19: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/19.jpg)
Approximate Inference
• Best structure is intractable to calculate, i.e., we cannot infer the underlying structure analytically
• Phase 2 uses Loopy Belief Propagation (BP) to approximate the solution
  • Local, message-passing scheme
  • Distributes evidence among nodes
• Convergence not guaranteed
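As a rough illustration of the message-passing idea, the sketch below runs sum-product loopy BP on a small discrete pairwise MRF. It is a generic textbook version under stated assumptions, not ACMI's implementation: the function name `loopy_bp`, the dictionary-based inputs, the fixed round-robin message order, and the convergence tolerance are all hypothetical choices made for this example.

```python
import numpy as np

def loopy_bp(node_pot, edge_pot, n_iters=50, tol=1e-6):
    """Sum-product loopy BP on a discrete pairwise MRF (illustrative only).

    node_pot: dict node -> 1-D array of node potentials
    edge_pot: dict (i, j) -> 2-D array indexed [state_i, state_j]
    Returns (beliefs, converged).
    """
    neighbors = {i: set() for i in node_pot}
    for i, j in edge_pot:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # One message per directed edge, initialised to uniform.
    msgs = {}
    for i, j in edge_pot:
        msgs[(i, j)] = np.ones(len(node_pot[j])) / len(node_pot[j])
        msgs[(j, i)] = np.ones(len(node_pot[i])) / len(node_pot[i])

    def pairwise(i, j):
        # Potential matrix indexed [state_i, state_j] for either direction.
        return edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T

    converged = False
    for _ in range(n_iters):
        max_change = 0.0
        for (i, j) in list(msgs):  # fixed round-robin order
            # Combine local evidence at i with messages from all neighbors but j.
            incoming = np.asarray(node_pot[i], dtype=float)
            for k in neighbors[i] - {j}:
                incoming = incoming * msgs[(k, i)]
            new = pairwise(i, j).T @ incoming  # sum over states of node i
            new = new / new.sum()
            max_change = max(max_change, np.abs(new - msgs[(i, j)]).max())
            msgs[(i, j)] = new
        if max_change < tol:  # may never trigger on loopy graphs
            converged = True
            break

    beliefs = {}
    for i in node_pot:
        b = np.asarray(node_pot[i], dtype=float)
        for k in neighbors[i]:
            b = b * msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs, converged
```

On a tree-structured graph this procedure returns exact marginals; on graphs with cycles the beliefs are only approximations and the loop may stop at `n_iters` without converging.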
![Page 20: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/20.jpg)
Example: Belief Propagation
[Figure: nodes LYS31 and LEU32 with marginals pLYS31 and pLEU32; message mLYS31→LEU32 passed from LYS31 to LEU32]
![Page 21: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/21.jpg)
Example: Belief Propagation
[Figure: nodes LYS31 and LEU32 with marginals pLYS31 and pLEU32; message mLEU32→LYS31 passed from LEU32 back to LYS31]
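The message between two neighboring amino acids can be written with the standard sum-product update, shown here in its discrete form as a sketch. The symbols are assumed notation rather than the slides' own: ψ_i is the node potential of amino acid i, ψ_ij the pairwise potential, N(i) the set of neighbors of node i, and p_i the resulting approximate marginal.

```latex
m_{i \to j}(b_j) \propto \sum_{b_i} \psi_i(b_i)\, \psi_{ij}(b_i, b_j) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(b_i)

p_i(b_i) \propto \psi_i(b_i) \prod_{k \in N(i)} m_{k \to i}(b_i)
```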
![Page 22: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/22.jpg)
Shortcomings of Phase 2
• Inference is very difficult
  • ~10^6 possible locations for each amino acid
  • ~100s to 1000s of amino acids in one protein
  • Evidence is noisy
  • O(N^2) constraints
• Solutions are approximate, room for improvement
![Page 23: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/23.jpg)
Outline
• Background and Motivation
• ACMI Roadmap and My Contributions
• Inference in ACMI
• Guided Belief Propagation
• Probabilistic Ensembles in ACMI (PEA)
• Conclusions and Future Directions
![Page 24: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/24.jpg)
Message Scheduling [ACM-BCB 2010] {Ch. 5}
• Key design choice: message-passing schedule
• When BP is approximate, ordering affects the solution [Elidan et al, 2006]
• Phase 2 uses a naïve, round-robin schedule
  • Best case: wasted resources
  • Worst case: poor information has excessive influence
[Figure: MRF nodes SER, LYS, ALA]
![Page 25: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/25.jpg)
Using Domain Knowledge
• Biochemist insight: well-structured regions of protein correlate with strong features in density map
  • e.g., helices/strands have stable conformations
  • Disordered regions are more difficult to detect
• General idea: prioritize the order in which messages are sent using expert knowledge
  • e.g., disordered amino acids receive less priority
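One way to picture the prioritization idea is a schedule that visits residues from most ordered to most disordered. The sketch below is only an illustration under assumptions: the function name `guided_order` and the per-residue disorder scores are hypothetical, and the actual priority function used in the thesis may differ.

```python
def guided_order(disorder):
    """Return residue indices sorted from lowest to highest predicted
    disorder, so well-ordered residues send their messages first.
    Ties keep sequence order because Python's sort is stable."""
    return sorted(range(len(disorder)), key=lambda i: disorder[i])

# Hypothetical disorder scores for four residues (higher = more disordered).
scores = [0.8, 0.1, 0.4, 0.1]
print(guided_order(scores))  # [1, 3, 2, 0]
```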
![Page 26: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/26.jpg)
Guided Belief Propagation
![Page 27: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/27.jpg)
Related Work
• Assumption: messages with largest change in value are more useful
• Residual Belief Propagation [Elidan et al, UAI 2006]
  • Calculates residual factor for each node
  • Each iteration, highest-residual node passes messages
  • General BP technique
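The residual idea can be sketched in a few lines. The code below is a simplified illustration, not the published algorithm: measuring a residual as the largest absolute change in a message (a max-norm) is an assumption, and the names `message_residual` and `next_node` are hypothetical.

```python
import numpy as np

def message_residual(old_msg, new_msg):
    """Residual of one message: largest absolute change between updates
    (the max-norm is an assumed choice for this sketch)."""
    return float(np.max(np.abs(np.asarray(new_msg) - np.asarray(old_msg))))

def next_node(residuals):
    """Pick the node whose pending update has the largest residual."""
    return max(residuals, key=residuals.get)

# Hypothetical residuals for three nodes; LYS would send its messages next.
print(next_node({"ALA": 0.01, "LYS": 0.24, "SER": 0.05}))  # LYS
```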
![Page 28: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/28.jpg)
Experimental Methodology
• Our previous technique: naive, round robin (ORIG)
• My new technique: guidance using disorder prediction (GUIDED)
  • Disorder prediction using DisEMBL [Linding et al, 2003]
  • Prioritize residues with high stability (i.e., low disorder)
• Residual factor (RESID) [Elidan et al, 2006]
![Page 29: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/29.jpg)
Experimental Methodology
• Run whole ACMI pipeline
  • Phase 1: Local amino-acid finder (prior probabilities)
  • Phase 2: Either ORIG, GUIDED, or RESID
  • Phase 3: Sample all-atom structures from Phase 2 results
• Test set of 10 poor-resolution electron-density maps
  • From UW Center for Eukaryotic Structural Genomics
  • Deemed the most difficult of a large set of proteins
![Page 30: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/30.jpg)
Phase 2 Accuracy: Percentile Rank
x    P(x)
A    0.10
B    0.30
C    0.35
D    0.20
E    0.05
[Figure: two example placements of the true location ("Truth"), with percentile ranks of 100% and 60%]
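The two percentages on this slide are consistent with defining percentile rank as the fraction of candidate locations whose probability does not exceed that of the true location. That definition is inferred from the numbers, and the choice of C and D as the two "true" locations is an assumption made for this sketch; the function name `percentile_rank` is hypothetical.

```python
def percentile_rank(probs, truth):
    """Fraction of candidate locations whose probability is less than or
    equal to the probability assigned to the true location."""
    p_true = probs[truth]
    return sum(p <= p_true for p in probs.values()) / len(probs)

marginal = {"A": 0.10, "B": 0.30, "C": 0.35, "D": 0.20, "E": 0.05}
print(percentile_rank(marginal, "C"))  # 1.0
print(percentile_rank(marginal, "D"))  # 0.6
```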
![Page 31: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/31.jpg)
Phase 2 Marginal Accuracy
![Page 32: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/32.jpg)
Protein-Structure Results
• Do these better marginals produce more accurate protein structures?
• RESID fails to produce structures in Phase 3
  • Marginals are high in entropy (28.48 vs 5.31)
  • Insufficient sampling of correct locations
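To make the entropy comparison concrete, the sketch below computes the Shannon entropy of a discrete distribution. The logarithm base and the way the slide's figures (28.48 vs 5.31) were computed are not stated, so bits are an assumption here and the function name `entropy` is hypothetical. A flat, uninformative marginal has high entropy; a sharply peaked one has low entropy.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 (uniform over four locations)
print(entropy([0.97, 0.01, 0.01, 0.01]))  # much lower: mass on one location
```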
![Page 33: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/33.jpg)
Phase 3 Accuracy: Correctness and Completeness
• Correctness akin to precision – percent of predicted structure that is accurate
• Completeness akin to recall – percent of true structure predicted accurately
[Figure: Truth, Model A, Model B]
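The two measures can be sketched as simple ratios. The counts in the example are hypothetical, the function name is made up for illustration, and the rule for deciding when a predicted residue counts as accurate (for example, a distance cutoff) is not given on the slide, so it is left outside this sketch.

```python
def correctness_completeness(n_predicted, n_true, n_correct):
    """Correctness: share of the predicted structure that is accurate.
    Completeness: share of the true structure predicted accurately."""
    correctness = n_correct / n_predicted if n_predicted else 0.0
    completeness = n_correct / n_true if n_true else 0.0
    return correctness, completeness

# Hypothetical model: 50 residues placed, 45 of them accurate,
# out of 100 residues in the true structure.
print(correctness_completeness(50, 100, 45))  # (0.9, 0.45)
```

A model like this one is highly correct but incomplete, which is why both numbers are reported together.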
![Page 34: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/34.jpg)
Protein-Structure Results
![Page 35: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/35.jpg)
Outline
• Background and Motivation
• ACMI Roadmap and My Contributions
• Inference in ACMI
• Guided Belief Propagation
• Probabilistic Ensembles in ACMI (PEA)
• Conclusions and Future Directions
![Page 36: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/36.jpg)
Ensemble Methods [ACM-BCB 2011] {Ch. 6}
• Ensembles: the use of multiple models to improve predictive performance
• Tend to outperform best single model [Dietterich '00]
  • e.g., 2009 Netflix Prize
![Page 37: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/37.jpg)
Phase 2: Standard ACMI
[Figure: MRF, Protocol, P(bk)]
Protocol: message scheduler, i.e., how ACMI sends messages
![Page 38: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/38.jpg)
Phase 2: Ensemble ACMI
[Figure: one MRF run under Protocol 1, Protocol 2, …, Protocol C, yielding P1(bk), P2(bk), …, PC(bk)]
![Page 39: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/39.jpg)
Probabilistic Ensembles in ACMI (PEA)
• New ensemble framework (PEA)
  • Run inference multiple times, under different conditions
  • Output: multiple, diverse estimates of each amino acid’s location
• Phase 2 now has several probability distributions for each amino acid, so what?
  • Need to aggregate distributions in Phase 3
![Page 40: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/40.jpg)
ACMI Roadmap
• Phase 1: Perform Local Match (output: prior probability of each AA’s location)
• Phase 2: Apply Global Constraints (output: posterior probability of each AA’s location)
• Phase 3: Sample Structure (output: all-atom protein structures)
[Figure labels: bk-1, bk, bk+1; structures 1…M]
![Page 41: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/41.jpg)
Backbone Step (Prior Work)
Place next backbone atom
(1) Sample bk from empirical Cα-Cα-Cα pseudoangle distribution
[Figure: bk-2, bk-1, and candidate positions b'k marked "?"]
![Page 42: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/42.jpg)
Backbone Step (Prior Work)
Place next backbone atom
(2) Weight each sample by its Phase 2 computed marginal
[Figure: bk-2, bk-1, and candidate positions b'k with weights 0.25, 0.20, 0.15, …]
![Page 43: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/43.jpg)
Backbone Step (Prior Work)
Place next backbone atom
(3) Select bk with probability proportional to sample weight
[Figure: bk-2, bk-1, and candidate positions b'k with weights 0.25, 0.20, 0.15, …]
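Steps (2) and (3) of the backbone step amount to weighted random selection. The sketch below assumes the candidate positions from step (1) are already available and that the Phase 2 marginal can be looked up as a function; the names `select_candidate` and `marginal` are hypothetical, and the weights are the example values shown on the slide.

```python
import random

def select_candidate(candidates, marginal, rng=random):
    """Weight each candidate position by its marginal, then pick one
    with probability proportional to its weight."""
    weights = [marginal(c) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Three hypothetical candidates carrying the example weights.
example = {"c1": 0.25, "c2": 0.20, "c3": 0.15}
chosen = select_candidate(list(example), example.get, rng=random.Random(0))
print(chosen)  # one of c1, c2, c3; c1 is the most likely choice
```

Weights need not sum to one, since `random.choices` normalizes them internally.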
![Page 44: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/44.jpg)
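Backbone Step for PEA
[Figure: bk-2, bk-1, and candidate b'k; component marginals P1(b'k), P2(b'k), …, PC(b'k) with example values 0.23, 0.15, 0.04 feed an aggregator (to be chosen) that outputs the weight w(b'k)]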
![Page 45: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/45.jpg)
Backbone Step for PEA: Average
[Figure: component marginals P1(b'k), P2(b'k), …, PC(b'k) with example values 0.23, 0.15, 0.04; the AVG aggregator outputs 0.14]
![Page 46: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/46.jpg)
Backbone Step for PEA: Maximum
[Figure: component marginals P1(b'k), P2(b'k), …, PC(b'k) with example values 0.23, 0.15, 0.04; the MAX aggregator outputs 0.23]
![Page 47: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/47.jpg)
Backbone Step for PEA: Sample
[Figure: component marginals P1(b'k), P2(b'k), …, PC(b'k) with example values 0.23, 0.15, 0.04; the SAMP aggregator outputs 0.15]
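The three aggregators can be sketched as follows, using the example values from these slides. The function name `aggregate` is hypothetical, and drawing the SAMP component uniformly at random is an assumption; the slides only show that SAMP returns one component's estimate.

```python
import random

def aggregate(marginals, how, rng=random):
    """Combine the component estimates for one candidate into a weight."""
    if how == "AVG":
        return sum(marginals) / len(marginals)
    if how == "MAX":
        return max(marginals)
    if how == "SAMP":
        # Assumed: choose one component's estimate uniformly at random.
        return rng.choice(marginals)
    raise ValueError(f"unknown aggregator: {how}")

estimates = [0.23, 0.15, 0.04]
print(round(aggregate(estimates, "AVG"), 2))  # 0.14
print(aggregate(estimates, "MAX"))            # 0.23
print(aggregate(estimates, "SAMP"))           # one of the three estimates
```

The resulting weight plays the role that the single Phase 2 marginal played in the prior-work backbone step.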
![Page 48: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/48.jpg)
Recap of ACMI (Prior Work)
[Figure: Phase 2: a single Protocol yields P(bk). Phase 3: bk-2, bk-1, and candidates weighted 0.25, 0.20, 0.15, …]
![Page 49: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/49.jpg)
Recap of PEA
[Figure: Phase 2: several Protocols feed an Aggregator. Phase 3: bk-2, bk-1, and candidates weighted 0.14, 0.26, 0.05, …]
![Page 50: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/50.jpg)
Results: Impact of Ensemble Size
![Page 51: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/51.jpg)
Experimental Methodology
• PEA (Probabilistic Ensembles in ACMI)
  • 4 ensemble components
  • Aggregators: AVG, MAX, SAMP
• ACMI
  • ORIG – standard ACMI (prior work)
  • EXT – run inference 4 times as long
  • BEST – test best of 4 PEA components
![Page 52: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/52.jpg)
Phase 2 Results: PEA vs ACMI
*p-value < 0.01
![Page 53: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/53.jpg)
Protein-Structure Results: PEA vs ACMI
*p-value < 0.05
![Page 54: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/54.jpg)
Protein-Structure Results: PEA vs ACMI
![Page 55: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/55.jpg)
Outline
• Background and Motivation
• ACMI Roadmap and My Contributions
• Inference in ACMI
• Guided Belief Propagation
• Probabilistic Ensembles in ACMI (PEA)
• Conclusions and Future Directions
![Page 56: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/56.jpg)
My Contributions
Perform Local Match
• Local matching with spherical harmonics
• First-pass filtering
• Machine-learning search filter
• Structural homology detection
Apply Global Constraints
• Guided BP using domain knowledge
• Residual BP in ACMI
• Probabilistic Ensembles in ACMI
Sample Structure
• All-atom structure sampling using particle filters
• Incorporating domain knowledge into sampling
• Aggregation of ensemble estimates
![Page 57: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/57.jpg)
Overall Conclusions
• ACMI is the state-of-the-art method for determining protein structures in low-quality images
• Broader implications
  • Phase 1: Shape Matching, Signal Processing, Search Filtering
  • Phase 2: Graphical models, Statistical Inference
  • Phase 3: Sampling, Video Tracking
• Structural biology is a good example of a challenging probabilistic inference problem
  • Guiding BP and PEA are general solutions
![Page 58: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/58.jpg)
UCH37 [PDB 3IHR]
E. S. Burgie et al. Proteins: Structure, Function, and Bioinformatics. In-Press
![Page 59: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/59.jpg)
Further Work on ACMI
• Advanced filtering in Phase 1
• Generalize Guided BP
  • Requires domain knowledge priority function
• Generalize PEA
  • Learning; compare to other approaches
• More structures (membrane proteins)
• Domain knowledge in Phase 3 scoring
![Page 60: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/60.jpg)
Future Work
• Inference in complex domains
  • Non-independent data
  • Combining multiple object types
  • Relations among data sets
• Biomedical applications
  • Medical diagnosis
  • Brain imaging
  • Cancer screening
  • Health record analysis
![Page 61: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/61.jpg)
Acknowledgements
Advisor: Jude Shavlik
Committee: George Phillips, David Page, Mark Craven, Vikas Singh
Collaborators: Frank DiMaio, Sriraam Natarajan, Craig Bingman, Sethe Burgie, Dmitry Kondrashov
Funding: NLM R01-LM008796, NLM Training Grant T15-LM007359, NIH PSI Grant GM074901
Practice Talk Attendees: Craig, Trevor, Deborah, Debbie, Aubrey
ML Group
![Page 62: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/62.jpg)
Acknowledgements
Friends: Nick, Amy, Nate, Annie, Greg, Ila, 2*(Joe and Heather), Dana, Dave, Christine, Emily, Matt, Jen, Mike, Angela, Scott, Erica, and others
Family: Bharat, Sharmistha, Asha, Ankoor, and Emily
Dale, Mary, Laura, and Jeff
![Page 63: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/63.jpg)
Thank you!
![Page 64: Ameet Soni Department of Computer Sciences Doctoral Defense August 10, 2011](https://reader035.fdocuments.net/reader035/viewer/2022062501/56816733550346895ddbded0/html5/thumbnails/64.jpg)
Publications
• A. Soni and J. Shavlik, “Probabilistic ensembles for improved inference in protein-structure determination,” in Proceedings of the ACM International Conference on Bioinformatics and Computational Biology, 2011.
• A. Soni, C. Bingman, and J. Shavlik, “Guiding belief propagation using domain knowledge for protein-structure determination,” in Proceedings of the ACM International Conference on Bioinformatics and Computational Biology, 2010.
• E. S. Burgie, C. A. Bingman, S. L. Grundhoefer, A. Soni, and G. N. Phillips, Jr., “Structural characterization of Uch37 reveals the basis of its auto-inhibitory mechanism,” Proteins: Structure, Function, and Bioinformatics, In-Press. PDB ID: 3IHR.
• F. DiMaio, A. Soni, G. N. Phillips, and J. Shavlik, “Spherical-harmonic decomposition for molecular recognition in electron-density maps,” International Journal of Data Mining and Bioinformatics, 2009.
• F. DiMaio, A. Soni, and J. Shavlik, “Machine learning in structural biology: Interpreting 3D protein images,” in Introduction to Machine Learning and Bioinformatics, ed. Sushmita Mitra, Sujay Datta, Theodore Perkins, and George Michailidis, Ch. 8, 2008.
• F. DiMaio, A. Soni, G. N. Phillips, and J. Shavlik, “Improved methods for template matching in electron-density maps using spherical harmonics,” in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2007.
• F. DiMaio, D. Kondrashov, E. Bitto, A. Soni, C. Bingman, G. Phillips, and J. Shavlik, “Creating protein models from electron-density maps using particle-filtering methods,” Bioinformatics, 2007.