Limsoon Wong Kent Ridge Digital Labs Singapore
Show & Tell
Limsoon Wong
Kent Ridge Digital Labs
Singapore
From Informatics to Bioinformatics
Show & Tell
What is Bioinformatics?
Show & Tell
What are the Themes of Bioinformatics?
Bioinformatics =
Data Mgmt + Knowledge Discovery
Data Mgmt =
Integration + Transformation + Cleansing
Knowledge Discovery =
Statistics + Algorithms + Databases
Show & Tell
What are the Benefits of Bioinformatics?
To the patient: Better drug, better treatment
To the pharma: Save time, save cost, make more $
To the scientist: Better science
Show & Tell
Data Integration
A DOE “impossible query”:
For each gene on a given cytogenetic band, find its non-human homologs.

source   type     location    remarks
GDB      Sybase   Baltimore   Flat tables; SQL joins; location info
Entrez   ASN.1    Bethesda    Nested tables; keywords; homolog info
Show & Tell
Data Integration Results
sybase-add (#name: "GDB", ...);
create view L from locus_cyto_location using GDB;
create view E from object_genbank_eref using GDB;
select
#accn: g.#genbank_ref, #nonhuman-homologs: H
from
L as c, E as g,
(select u
from g.#genbank_ref.na-get-homolog-summary as u
where not(u.#title string-islike "%Human%") andalso
not(u.#title string-islike "%H.sapien%")) as H
where
c.#chrom_num = "22" andalso
g.#object_id = c.#locus_id andalso
not (H = { });
• Using Kleisli:
• Clear
• Succinct
• Efficient
• Handles
  • heterogeneity
  • complexity
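The logic of the query above can be restated outside Kleisli. The following is a minimal Python sketch, not Kleisli code: the three in-memory collections are hypothetical stand-ins for the GDB tables and the Entrez homolog lookup, and every row and accession in them is invented for illustration.

```python
# Hypothetical stand-ins for the two GDB tables (invented rows).
locus_cyto_location = [
    {"locus_id": 1, "chrom_num": "22"},
    {"locus_id": 2, "chrom_num": "7"},
    {"locus_id": 3, "chrom_num": "22"},
]
object_genbank_eref = [
    {"object_id": 1, "genbank_ref": "ACC_A"},
    {"object_id": 2, "genbank_ref": "ACC_B"},
    {"object_id": 3, "genbank_ref": "ACC_C"},
]
# Hypothetical stand-in for the Entrez homolog lookup (invented titles).
homolog_summaries = {
    "ACC_A": [{"title": "Human example gene"}, {"title": "Mouse example gene"}],
    "ACC_B": [{"title": "Rat example gene"}],
    "ACC_C": [{"title": "H.sapiens example gene"}],
}


def get_homolog_summary(accession):
    """Return the nested list of homolog summaries for one accession."""
    return homolog_summaries.get(accession, [])


def nonhuman_homologs(chrom_num):
    """Join locations to GenBank refs, then keep only non-human homologs."""
    results = []
    for c in locus_cyto_location:
        if c["chrom_num"] != chrom_num:
            continue
        for g in object_genbank_eref:
            if g["object_id"] != c["locus_id"]:
                continue
            # Inner sub-query: drop titles that mention human.
            h = [u for u in get_homolog_summary(g["genbank_ref"])
                 if "Human" not in u["title"] and "H.sapien" not in u["title"]]
            if h:  # mirrors "not (H = { })"
                results.append({"accn": g["genbank_ref"], "nonhuman_homologs": h})
    return results


print(nonhuman_homologs("22"))
```

The result is a nested value (a list of homologs inside each output row), which is the kind of structure a flat SQL join cannot return directly.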
Show & Tell
Data Warehousing
Motivation
• efficiency
• availability
• “denial of service”
• data cleansing
Requirements
• efficient to query
• easy to update
• model data naturally
{(#uid: 6138971,
#title: "Homo sapiens adrenergic ...",
#accession: "NM_001619",
#organism: "Homo sapiens",
#taxon: 9606,
#lineage: ["Eukaryota", "Metazoa", …],
#seq: "CTCGGCCTCGGGCGCGGC...",
#feature: {
(#name: "source",
#continuous: true,
#position: [
(#accn: "NM_001619",
#start: 0, #end: 3602,
#negative: false)],
#anno: [
(#anno_name: "organism",
#descr: "Homo sapiens"), …] ), …)}
Show & Tell
Data Warehousing Results
Relational DBMS is insufficient because it forces us to fragment data into 3NF.
Kleisli turns flat relational DBMS into nested relational DBMS. It can use flat relational DBMS such as Sybase, Oracle, MySQL, etc. to be its updatable complex object store. It can even use all of these systems simultaneously!
! Log in
oracle-cplobj-add (#name: "db", ...);
! Define table
create table GP (#uid: "NUMBER", #detail: "LONG") using db;
! Populate table with GenPept reports
select #uid: x.#uid, #detail: x
into GP
from aa-get-seqfeat-general "PTP" as x
using db;
! Map GP to that table
create view GP from GP using db;
! Run a query to get title of 131470
select x.#detail.#title
from GP as x
where x.#uid = 131470;
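The idea of keeping a whole nested record in one column of a flat table can be illustrated without Kleisli. The sketch below is an analogy only, assuming SQLite and JSON in place of Oracle and Kleisli's own object encoding; the helper names `make_store`, `put`, and `get_title` are hypothetical, and the sample record reuses a few fields of the nested record shown on the Data Warehousing slide.

```python
import json
import sqlite3


def make_store():
    """Create a flat two-column table: a key and a serialized nested object."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")
    return conn


def put(conn, record):
    """Store the whole nested record in one column instead of splitting it into 3NF."""
    conn.execute("INSERT INTO GP (uid, detail) VALUES (?, ?)",
                 (record["uid"], json.dumps(record)))


def get_title(conn, uid):
    """Fetch one record by key and navigate into the nested structure."""
    row = conn.execute("SELECT detail FROM GP WHERE uid = ?", (uid,)).fetchone()
    return None if row is None else json.loads(row[0])["title"]


conn = make_store()
put(conn, {
    "uid": 6138971,
    "title": "Homo sapiens adrenergic ...",
    "accession": "NM_001619",
    "taxon": 9606,
    "feature": [{"name": "source", "continuous": True}],
})
print(get_title(conn, 6138971))
```

The relational engine only sees a key and an opaque value; the nesting is interpreted by the layer above it, which is roughly the division of labour the slide describes.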
Show & Tell
Epitope Prediction
TRAP-559AA
MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSEEVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLNLNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRSLLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVILTDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNRFLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEKTASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQCEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENIIDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQKPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDNQNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGNRHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHEKPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVPGAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
Show & Tell
Epitope Prediction Results
Prediction by our ANN model for HLA-A11: 29 predictions, 22 epitopes, 76% specificity
Prediction by BIMAS matrix for HLA-A*1101:
Rank by BIMAS: 1, 66, 100
Number of experimental binders: 19 (52.8%), 5 (13.9%), 12 (33.3%)
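The percentages on this slide can be re-derived from the counts. The following is a small arithmetic check, assuming that the 76% figure is the share of the 29 ANN predictions that turned out to be epitopes, and that the three binder percentages are shares of their combined total. Both are readings of the slide, not statements made on it.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# ANN model: 22 epitopes among 29 predictions.
ann_share = percent(22, 29)

# BIMAS: experimental binders counted in three rank ranges.
binders = [19, 5, 12]
total_binders = sum(binders)
binder_shares = [percent(b, total_binders) for b in binders]

print(ann_share, total_binders, binder_shares)
```

Under these assumptions the three binder counts add up to 36, and the computed shares agree with the 52.8%, 13.9%, and 33.3% printed on the slide.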
Show & Tell
Gene Expression Analysis
• Clustering gene expression profiles
• Classifying gene expression profiles
  • find stable differentially expressed genes
Show & Tell
Gene Expression Analysis Results
The Discovery System
• Correlation test
• Voter selection
• Class prediction
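The slide names three stages without giving their details. The sketch below shows one plausible reading of such a pipeline, not the Discovery System itself: Pearson correlation against the class label as the correlation test, the top-k correlated genes as voters, and an unweighted nearest-class-mean vote for prediction. The function names and the toy expression values are all invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_voters(profiles, labels, k):
    """Correlation test + voter selection: keep the k genes most correlated with the label."""
    n_genes = len(profiles[0])
    scores = [(abs(pearson([p[g] for p in profiles], labels)), g) for g in range(n_genes)]
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def predict(profiles, labels, voters, sample):
    """Class prediction: each voter gene votes for the class whose mean is closer."""
    votes = {0: 0, 1: 0}
    for g in voters:
        means = {}
        for c in (0, 1):
            vals = [p[g] for p, y in zip(profiles, labels) if y == c]
            means[c] = sum(vals) / len(vals)
        nearest = min((0, 1), key=lambda c: abs(sample[g] - means[c]))
        votes[nearest] += 1
    return max(votes, key=votes.get)


# Toy data: 4 samples x 3 genes, two classes (invented values).
profiles = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [3.0, 5.1, 0.3], [3.3, 4.9, 0.8]]
labels = [0, 0, 1, 1]
voters = select_voters(profiles, labels, 2)
print(voters, predict(profiles, labels, voters, [3.1, 5.0, 0.5]))
```

Real pipelines of this kind usually add a significance threshold to the correlation test and weight each vote; both are omitted here to keep the three stages visible.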
Show & Tell
Protein Interaction Extraction
“What are the protein-protein interaction pathways from the latest reported discoveries?”
Show & Tell
Protein Interaction Extraction Results
Rule-based system for processing free texts in scientific abstracts
Specialized in
• extracting protein names
• extracting protein-protein interactions
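To give a feel for what a rule over free text looks like, here is a minimal sketch of a single hand-written rule using a regular expression. It is not the system described on the slide: the verb list, the name heuristic (a capitalized alphanumeric token), and the protein names in the sample sentence are all placeholders chosen for illustration.

```python
import re

# Hypothetical rule: <Name> <interaction verb> <Name>.
INTERACTION_VERBS = ("interacts with", "binds to", "binds", "phosphorylates",
                     "activates", "inhibits")
# Crude name heuristic: a capitalized token of letters, digits, or hyphens.
NAME = r"[A-Z][A-Za-z0-9-]+"
RULE = re.compile(
    r"\b(" + NAME + r")\s+(" + "|".join(INTERACTION_VERBS) + r")\s+(" + NAME + r")\b"
)


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the single rule."""
    return RULE.findall(text)


# Made-up abstract text with placeholder protein names.
abstract = ("We show that PROT1 binds to PROT2. "
            "In addition, KinA phosphorylates SubB in vitro.")
print(extract_interactions(abstract))
```

A single pattern like this misses passive voice, coordination, and multi-word names, which is why practical rule-based extractors pair a dedicated name recognizer with many such rules.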
Show & Tell
Transcription Start Prediction
Show & Tell
Transcription Start Prediction Results
Show & Tell
Medical Record Analysis
Looking for patterns that are
• valid
• novel
• useful
• understandable

age  sex  chol  ecg   heart  sick
49   M    266   Hyp   171    N
64   M    211   Norm  144    N
58   F    283   Hyp   162    N
58   M    284   Hyp   160    Y
58   M    224   Abn   173    Y
Show & Tell
Medical Record Analysis Results
DeEPs, a novel “emerging pattern” method
Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
Works for gene expressions
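An emerging pattern is, in general, a combination of attribute values whose support differs sharply between two classes. The sketch below computes that support ratio (often called the growth rate) on the five patient records from the Medical Record Analysis slide. It illustrates the general notion only and is not the DeEPs algorithm; the function names are hypothetical, and five rows are far too few for any real conclusion.

```python
# The five records from the Medical Record Analysis slide.
RECORDS = [
    {"age": 49, "sex": "M", "chol": 266, "ecg": "Hyp", "heart": 171, "sick": "N"},
    {"age": 64, "sex": "M", "chol": 211, "ecg": "Norm", "heart": 144, "sick": "N"},
    {"age": 58, "sex": "F", "chol": 283, "ecg": "Hyp", "heart": 162, "sick": "N"},
    {"age": 58, "sex": "M", "chol": 284, "ecg": "Hyp", "heart": 160, "sick": "Y"},
    {"age": 58, "sex": "M", "chol": 224, "ecg": "Abn", "heart": 173, "sick": "Y"},
]


def support(pattern, rows):
    """Fraction of rows that match every attribute=value pair in the pattern."""
    if not rows:
        return 0.0
    hits = sum(all(r[k] == v for k, v in pattern.items()) for r in rows)
    return hits / len(rows)


def growth_rate(pattern, records, target="Y"):
    """Support in the target class divided by support in the other class."""
    pos = [r for r in records if r["sick"] == target]
    neg = [r for r in records if r["sick"] != target]
    sp, sn = support(pattern, pos), support(pattern, neg)
    if sn == 0:
        # Present in the target class but absent elsewhere: unbounded growth.
        return float("inf") if sp > 0 else 0.0
    return sp / sn


print(growth_rate({"age": 58, "sex": "M"}, RECORDS))
print(growth_rate({"ecg": "Hyp"}, RECORDS))
```

In this toy table the pattern {age = 58, sex = M} occurs in both sick patients and in none of the others, so its growth rate is unbounded, while {ecg = Hyp} is slightly more common among the non-sick rows.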
Show & Tell
Behind the Scene
Research: Vladimir Bajic, Vladimir Brusic, Jinyan Li, See-Kiong Ng, Limsoon Wong, Louxin Zhang
Business: Peter Saunders
Industry Assignees: Hao Han (gX), Rahul Despande (MC)
Engineering: Allen Chong, Judice Koh, SPT Krishnan, Seng Hong Seah, Guanglan Zhang, Zhuo Zhang
Students: Huiqing Liu, Song Zhu, Kun Yu