Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.
![Page 1: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/1.jpg)
Exciting Bioinformatics Adventures
Limsoon Wong
Institute for Infocomm Research
![Page 2: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/2.jpg)
Plan
• Treatment optimization of childhood ALL
• Treatment prognosis of DLBC lymphoma
• Prediction of translation initiation site
• Prediction of vaccine target
• Reliability assessment of Y2H expts
![Page 3: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/3.jpg)
Treatment Optimization of
Childhood Leukemia
Image credit: FEER
![Page 4: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/4.jpg)
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Childhood ALL
• Major subtypes are: T-ALL, E2A-PBX, TEL-AML, MLL genome rearrangements, Hyperdiploid>50, BCR-ABL
• Diff subtypes respond differently to same Tx
• Over-intensive Tx
– Development of secondary cancers
– Reduction of IQ
• Under-intensive Tx
– Relapse
• The subtypes look similar
• Conventional diagnosis
– Immunophenotyping
– Cytogenetics
– Molecular diagnostics
• Unavailable in most ASEAN countries
![Page 5: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/5.jpg)
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Image credit: Affymetrix
Single-Test Platform of Microarray & Machine Learning
![Page 6: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/6.jpg)
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Multidimensional Scaling Plot Subtype Diagnosis
![Page 7: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/7.jpg)
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
Is there a new subtype?
• Hierarchical clustering of gene expression profiles reveals a novel subtype of childhood ALL
![Page 8: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/8.jpg)
Conclusions
Conventional Tx:
• intermediate intensity to everyone
– 10% suffer relapse
– 50% suffer side effects
– costs US$150m/yr
Our optimized Tx:
• high intensity to 10%
• intermediate intensity to 40%
• low intensity to 50%
• costs US$100m/yr
Outcome:
• High cure rate of 80%
• Less relapse
• Fewer side effects
• Save US$51.6m/yr
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong
![Page 9: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/9.jpg)
References
• E.-J. Yeoh et al., “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1:133--143, 2002
![Page 10: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/10.jpg)
Treatment Prognosis for DLBC
Lymphoma
Image credit: Rosenwald et al, 2002
![Page 11: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/11.jpg)
Diffuse Large B-Cell Lymphoma
• DLBC lymphoma is the most common type of lymphoma in adults
• Can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients
DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy
• Intl Prognostic Index (IPI)
– age, “Eastern Cooperative Oncology Group” performance status, tumor stage, lactate dehydrogenase level, sites of extranodal disease, ...
• Not very good for stratifying DLBC lymphoma patients for therapeutic trials
Use gene-expression profiles to predict outcome of chemotherapy?
Copyright © 2005 by Limsoon Wong. Adapted from Huiqing Liu
![Page 12: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/12.jpg)
Knowledge Discovery from Gene Expression of “Extreme” Samples
[Figure: workflow]
• 240 samples
• “Extreme” sample selection: < 1 yr vs > 8 yrs
– 47 short-term survivors
– 26 long-term survivors
• Knowledge discovery from gene expression
– 7399 genes
– 84 genes
• 80 samples
• T is long-term if S(T) < 0.3
• T is short-term if S(T) > 0.7
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
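The two thresholds above can be read as a simple decision rule on the risk score S(T). The following is a minimal sketch only: the function name `risk_group` and the label "intermediate" for scores between the thresholds are assumptions, since the slide does not name the middle band or say how S(T) is computed.

```python
def risk_group(score, low=0.3, high=0.7):
    """Map a risk score S(T) to a survival group using the slide's thresholds.

    The "intermediate" label for scores in [low, high] is an assumption;
    the slide only defines the two extreme groups.
    """
    if score < low:
        return "long-term"
    if score > high:
        return "short-term"
    return "intermediate"


# Hypothetical scores, for illustration only.
groups = [risk_group(s) for s in (0.1, 0.5, 0.9)]
```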
![Page 13: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/13.jpg)
Kaplan-Meier Plot for 80 Test Cases
• p-value of log-rank test: < 0.0001
• Risk score thresholds: 0.7, 0.3
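For readers unfamiliar with the plot type, a Kaplan-Meier curve can be sketched in a few lines. This is a generic illustration of the product-limit estimator, not the authors' code; the function name `kaplan_meier` and the toy follow-up data are assumptions. The value computed here is the survival probability, which is distinct from the risk score S(T) used for the thresholds.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of survival probability.

    times:  follow-up duration of each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # patients leaving the risk set at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]     # deaths and censored cases both leave
    return curve


# Toy data (hypothetical): four patients, the second one censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Plotting one such curve per risk group, and comparing the groups with a log-rank test, gives the kind of figure summarized on this slide.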
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
![Page 14: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/14.jpg)
Improvement Over IPI
(A) IPI low, p-value = 0.0063
(B) IPI intermediate, p-value = 0.0003
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
![Page 15: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/15.jpg)
Merit of “Extreme” Samples
(A) W/o sample selection (p = 0.38)
(B) With sample selection (p = 0.009)
Without training sample selection, there is no clear difference in overall survival among the 80 samples in the validation group of the DLBCL study
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
![Page 16: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/16.jpg)
References
• H. Liu et al, “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004, pages 382--392
![Page 17: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/17.jpg)
Protein Translation Initiation Site Recognition
![Page 18: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/18.jpg)
A Sample cDNA
299 HSU27655.1 CAT U27655 Homo sapiens
CGTGTGTGCAGCAGCCTGCAGCTGCCCCAAGCCATGGCTGAACACTGACTCCCAGCTGTG 80
CCCAGGGCTTCAAAGACTTCTCAGCTTCGAGCATGGCTTTTGGCTGTCAGGGCAGCTGTA 160
GGAGGCAGATGAGAAGAGGGAGATGGCCTTGGAGGAAGGGAAGGGGCCTGGTGCCGAGGA 240
CCTCTCCTGGCCAGGAGCTTCCTCCAGGACAAGACCTTCCACCCAACAAGGACTCCCCT
............................................................ 80
................................iEEEEEEEEEEEEEEEEEEEEEEEEEEE 160
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 240
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
• What makes the second ATG the TIS?
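To make the question concrete, the sketch below lists every candidate ATG in the first two lines of the sample cDNA and compares them with the annotated start (the `i` in the annotation track). The helper name `find_atg` is an assumption; the sequence and annotation strings are copied from the slide.

```python
def find_atg(seq):
    """Return the 0-based index of every ATG in seq, overlaps included."""
    positions = []
    start = seq.find("ATG")
    while start != -1:
        positions.append(start)
        start = seq.find("ATG", start + 1)
    return positions


# First two 60-nt lines of the sample cDNA shown on the slide.
cdna = (
    "CGTGTGTGCAGCAGCCTGCAGCTGCCCCAAGCCATGGCTGAACACTGACTCCCAGCTGTG"
    "CCCAGGGCTTCAAAGACTTCTCAGCTTCGAGCATGGCTTTTGGCTGTCAGGGCAGCTGTA"
)
# Matching annotation track: '.' = upstream, 'i' = TIS, 'E' = coding.
annotation = "." * 60 + "." * 32 + "i" + "E" * 27

candidates = find_atg(cdna)
tis_index = annotation.index("i")
# Rank of the annotated TIS among the candidate ATGs (0 = first ATG).
tis_rank = candidates.index(tis_index)
```

Here the annotated TIS is the second candidate, so a rule of always picking the first ATG would fail on this sequence.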
Copyright © 2005 by Limsoon Wong
![Page 19: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/19.jpg)
Approach
• Training data gathering
• Signal generation
– k-grams, distance, domain know-how, ...
• Signal selection
– Entropy, χ², CFS, t-test, domain know-how, ...
• Signal integration
– SVM, ANN, PCL, CART, C4.5, kNN, ...
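The first two stages of the approach above can be sketched as follows. This is an illustration under stated assumptions: k-gram counting stands in for signal generation, and plain information gain on a present/absent feature stands in for the entropy-based selection, whose exact form the slide does not give. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def kgram_counts(seq, k):
    """Signal generation: count every overlapping k-gram in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(present, labels):
    """Signal selection: entropy reduction from splitting on a binary feature."""
    n = len(labels)
    with_f = [y for f, y in zip(present, labels) if f]
    without_f = [y for f, y in zip(present, labels) if not f]
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


# Toy example: does the 2-gram "MA" occur in each window? (hypothetical data)
windows = ["MAMA", "MAGT", "GTCC", "CCGT"]
labels = [1, 1, 0, 0]                      # 1 = true TIS, 0 = false TIS
present = [kgram_counts(w, 2)["MA"] > 0 for w in windows]
gain = information_gain(present, labels)
```

Features ranked by such a score would then be passed to one of the listed classifiers for signal integration.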
Copyright © 2005 by Limsoon Wong
![Page 20: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/20.jpg)
Amino-Acid Features
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
![Page 21: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/21.jpg)
Amino-Acid Features
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
![Page 22: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/22.jpg)
Amino Acid K-grams Discovered (by entropy)
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
![Page 23: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/23.jpg)
Validation Results (on Hatzigeorgiou’s)
• Using top 100 features selected by entropy and trained on Pedersen & Nielsen’s dataset
Copyright © 2005 by Limsoon Wong. Adapted from Huiqing Liu
![Page 24: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/24.jpg)
Validation Results (on Chr X and Chr 21)
[Figure: ATGpr vs. our method]
• Using top 100 features selected by entropy and trained on Pedersen & Nielsen’s dataset
Copyright © 2005 by Limsoon Wong. Adapted from Huiqing Liu
![Page 25: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/25.jpg)
References
• L. Wong et al., “Using feature generation and feature selection for accurate prediction of translation initiation sites”, GIW 13:192--200, 2002
![Page 26: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/26.jpg)
Image credit: Asif Khan
Vaccine Target Prediction
![Page 27: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/27.jpg)
T-Cell Epitope Prediction
• Why?
– Only 1%-5% of peptides from a protein bind to any one HLA molecule
– Traditional approaches are slow, & inapplicable to large-scale screening
Computer Modeling
– Enable systematic screening for HLA binders
– Minimize number of expts
– Reduce cost 10x
• Challenges:
– There are ~2000 variants of HLA classified in ~20 supertypes
– Relatively small number of expt data on peptides that bind HLA molecules
– For majority of HLA molecules expt data do not exist
[Figure: HLA molecules H1, H2, H3, H4 (one supertype) and peptides P1, P2, P3, P4 (promiscuous peptides)]
Copyright © 2005 by Limsoon Wong. Adapted from Asif Khan.
![Page 28: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/28.jpg)
Multipred Approach
Copyright © 2005 by Asif Khan, Guanglan Zhang, Vladimir Brusic
![Page 29: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/29.jpg)
Expt Validation
[Figure: DR supertype predictions along the HCV IB protein sequence, with the cut-off threshold, false positives (FP), and false negatives (FN) marked]
Copyright © 2005 by Asif Khan, Guanglan Zhang, Vladimir Brusic
![Page 30: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/30.jpg)
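Accuracy of Multipred
[Figure: bar chart of ANN, HMM, and SVM, vertical axis from 0.00 to 1.00]

| Model | A-0201 | A-0202 | A-0204 | A-0205 | A-0206 | average |
|-------|--------|--------|--------|--------|--------|---------|
| ANN   | 0.87   | 0.76   | 0.88   | 0.93   | 0.91   | 0.87    |
| HMM   | 0.93   | 0.73   | 0.92   | 0.94   | 0.88   | 0.88    |
| SVM   | 0.90   | 0.81   | 0.93   | 0.97   | 0.85   | 0.89    |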
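As a quick consistency check, the "average" column can be recomputed from the five per-allele scores. The snippet is only a sketch of that arithmetic; the variable names are assumptions, and the slide does not state which accuracy measure the scores represent.

```python
# Per-allele scores from the table, in the order
# A-0201, A-0202, A-0204, A-0205, A-0206.
scores = {
    "ANN": [0.87, 0.76, 0.88, 0.93, 0.91],
    "HMM": [0.93, 0.73, 0.92, 0.94, 0.88],
    "SVM": [0.90, 0.81, 0.93, 0.97, 0.85],
}

# Mean over the five alleles, rounded to two decimals as on the slide.
averages = {model: round(sum(vals) / len(vals), 2)
            for model, vals in scores.items()}
```

The recomputed means agree with the reported column.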
Copyright © 2005 by Asif Khan, Guanglan Zhang, Vladimir Brusic
![Page 31: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/31.jpg)
Conclusions
• Computer models are necessary to aid in identification of vaccine targets
• Prediction models built are both sensitive and specific
• MULTIPRED can identify promiscuous peptides and immunological hot-spots which are useful for vaccine design
• Hot-spots are ideal for development of epitope-based vaccines
![Page 32: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/32.jpg)
References
• K.N. Srinivasan, et al. “Predictions of Class I T-cell epitopes: Evidence of presence of immunological hot spots inside antigens”, Bioinformatics, 20:i297-i302, 2004.
![Page 33: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/33.jpg)
Assessing Reliability of Protein-Protein Interaction Expts
[Figure: % of TP based on co-localization; % of TP based on shared cellular role (I = 1); % of TP based on shared cellular role (I = .95)]
TP = ~50%
Image credit: Sprinzak et al, 2003
![Page 34: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/34.jpg)
Large disagreement betw methods
Copyright © 2005 by Limsoon Wong. Adapted from Sprinzak et al, 2003
Some Protein Interaction Data Sets
• Can we find a way to rank candidate interacting pairs according to their reliability?
![Page 35: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/35.jpg)
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al, 2004
Some “Reasonable” Speculations
• A true interacting pair is often connected by at least one alternative path (reason: a biological function is performed by a highly interconnected network of interactions)
• The shorter the alternative path, the more likely the interaction (reason: evolution of life is through “add-on” interactions of other or newer folds onto existing ones)
Existence of a strong short alternative path connecting an interacting pair indicates that the interaction is “reliable”
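The idea of a short alternative path can be illustrated with a breadth-first search that ignores the direct edge between the two proteins. This is a simplified stand-in, not the IPR measure itself, whose formula is not given in this transcript; the function name and the toy network are assumptions.

```python
from collections import deque


def shortest_alternative_path(edges, u, v):
    """Length of the shortest u-v path that avoids the direct edge u-v.

    edges: iterable of undirected (protein_a, protein_b) pairs
    Returns the number of edges on that path, or None if no such path exists.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if x == u and y == v:
                continue          # skip the direct interaction itself
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return dist[y]
                queue.append(y)
    return None


# Toy network (hypothetical): A-B is backed by the path A-C-B, D-E by nothing.
network = [("A", "B"), ("A", "C"), ("C", "B"), ("D", "E")]
supported = shortest_alternative_path(network, "A", "B")
unsupported = shortest_alternative_path(network, "D", "E")
```

Under the speculations above, the pair A-B (alternative path of length 2) would be considered more reliable than D-E (no alternative path).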
![Page 36: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/36.jpg)
Interaction Pathway Reliability
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al, 2004
![Page 37: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/37.jpg)
Evaluation wrt Reproducible Interactions
The number of pairs in the intersection of Ito & Uetz increases wrt the ipr value of the pairs
The number of pairs not in the intersection of Ito & Uetz is not changed much wrt the ipr value of the pairs
• “ipr” correlates well to “reproducible” interactions
• “ipr” seems to work
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al, 2004
![Page 38: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/38.jpg)
Evaluation wrt Common Cellular Role, etc
At the ipr threshold that eliminated 80% of pairs, ~85% of the remaining pairs have common cellular roles
• “ipr” correlates well to common cellular roles, localization, & expression
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al, 2004
![Page 39: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/39.jpg)
Evaluation wrt “Many-few” Interactions
• Number of “Many-few” interactions increases when more “reliable” IPR threshold is used to filter interactions
• Consistent with the Maslov-Sneppen prediction
[Figure: part of the network of physical interactions reported by Ito et al., PNAS, 2001]
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al., 2004
![Page 40: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/40.jpg)
Evaluation wrt “Cross-Talkers”
• A MIPS functional cat:
– | 02 | ENERGY
– | 02.01 | glycolysis and gluconeogenesis
– | 02.01.01 | glycolysis methylglyoxal bypass
– | 02.01.03 | regulation of glycolysis & gluconeogenesis
• First 2 digits is top cat
• Other digits add more granularity to the cat
Compare non-co-localized high- & low-IPR pairs to find the number that fall into the same cat. If more high-IPR pairs are in the same cat, then IPR works
• For top cat
– 148/257 high-IPR pairs are in same cat
– 65/260 low-IPR pairs are in same cat
• For fine-granularity cat
– 135/257 high-IPR pairs are in same cat
– 37/260 low-IPR pairs are in same cat
IPR works
IPR pairs that are not co-localized are real cross-talkers!
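The category comparison and the resulting proportions can be sketched as below. The helper `same_category` is a hypothetical illustration of comparing MIPS codes at a chosen depth (depth 1 is the top cat); the counts are the ones quoted on the slide.

```python
def same_category(cat_a, cat_b, depth):
    """True if two MIPS codes agree on their first `depth` levels.

    depth=1 compares the top cat (first 2 digits); larger depths are finer.
    Codes with fewer than `depth` levels never match.
    """
    a, b = cat_a.split("."), cat_b.split(".")
    return len(a) >= depth and len(b) >= depth and a[:depth] == b[:depth]


# Counts quoted on the slide: (pairs in same cat, total pairs).
counts = {
    "top":  {"high": (148, 257), "low": (65, 260)},
    "fine": {"high": (135, 257), "low": (37, 260)},
}
fractions = {
    level: {group: round(same / total, 3)
            for group, (same, total) in groups.items()}
    for level, groups in counts.items()
}
```

At both granularities the high-IPR fraction is more than twice the low-IPR fraction, which is the basis for the slide's conclusion.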
Copyright © 2005 by Limsoon Wong.
![Page 41: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/41.jpg)
Conclusions
• There are latent local & global “motifs” that indicate the likelihood of protein interactions
• These motifs can be exploited in computational elimination of false positives from high-throughput Y2H expts
Copyright © 2005 by Limsoon Wong.
![Page 42: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/42.jpg)
References
• J. Chen et al, “Mining high-throughput experimental data for reliable protein interaction data using network”, 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2004), Florida, November 15-17, 2004
![Page 43: Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649ddc5503460f94ad3f18/html5/thumbnails/43.jpg)
Acknowledgements
• Childhood ALL:
– Jinyan Li, Huiqing Liu
– Allen Yeoh
• DLBC Lymphoma:
– Jinyan Li, Huiqing Liu
• Translation Initiation:
– Fanfan Zeng, Roland Yap
– Huiqing Liu
• T-Cell Epitopes:
– Vladimir Brusic, Asif Khan, Guanglan Zhang
– Tom August, KN Srinivasan
• Protein Interaction Reliability:
– Jin Chen, Mong Li Lee, Wynne Hsu
– See-Kiong Ng
– Prasanna Kolatkar, Jer-Ming Chia