NUS-KI Course on Bioinformatics, Nov 2005 Sequence Analysis and Function Prediction Limsoon Wong.
Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, &...
![Page 1: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/1.jpg)
Copyright © 2005 by Limsoon Wong
Building Gene Networks by Information Extraction,
Cleansing, & Integration
Limsoon Wong
Institute for Infocomm Research
Singapore
![Page 2: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/2.jpg)
Plan
• Motivation for study of gene network
• Example Efforts at I2R
  – Disease Pathweaver
  – Dragon ERG Solution
• Technical Challenges Involved
  – Named entity recognition
  – Co-reference resolution
  – Protein interaction extraction
  – Database cleansing
![Page 3: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/3.jpg)
Motivation
![Page 4: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/4.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
Some Useful Information Sources
• Disease-centric resources
  – OMIM [NCBI OMIM, 2004]
  – Gene2Disease [Perez-Iratxeta et al., Nature, 2002]
  – MedGene [Hu et al., J. Proteome Res., 2003]
• Emphasized direct gene-disease relationships
  – Provide lists of disease-related genes
  – Do not provide info on gene-gene interactions & their networks
• Related interaction resources
  – KEGG [Kanehisa, NAR, 2000]
    • Manually constructed protein interaction networks
    • Mostly metabolic pathways and few disease pathways
      – Only 7 disease pathways
![Page 5: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/5.jpg)
Monogenic Heterogenic
Why Gene Network?
• Many common diseases are
  – not caused by a genetic variation within a single gene
• But are influenced by
  – complex interactions among multiple genes
  – environmental & lifestyle factors
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
![Page 6: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/6.jpg)
Desired Outcome of Gene Network Study
• Help scientists understand the mechanism of complex diseases by
  – Greatly reducing work load for primary study of genetic diseases, broadening the scope of molecular studies
  – Easily identifying key players in the gene network, helping to find potential drug targets
• Scalable framework
  – Extend to many genetic diseases
  – Include other resources of gene interactions
Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
![Page 7: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/7.jpg)
Some Gene Network Study Efforts at I2R
• Disease Pathweaver
• Dragon ERG Solution
![Page 8: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/8.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
Disease Pathweaver, Zhang et al. APBC 2005
• Automatically constructing disease pathways
  – Identify core genes
  – Mine info on core genes
  – Construct interaction network betw core genes
• Data sources:
  – Online literature
  – High-thru’put biological data
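The last step of the pipeline (constructing an interaction network between core genes) can be illustrated with a minimal sketch. It assumes that two core genes mentioned in the same sentence count as evidence for a link, which is a deliberate simplification and not necessarily how Disease Pathweaver works. The function name `build_cooccurrence_network`, the gene symbols, and the toy sentences are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import re

def build_cooccurrence_network(sentences, core_genes, min_count=1):
    """Count how often each pair of core genes appears in the same sentence."""
    core = {g.upper() for g in core_genes}
    edges = Counter()
    for sentence in sentences:
        # Naive tokenization; real gene-name recognition is much harder.
        tokens = {t.upper() for t in re.findall(r"[A-Za-z0-9]+", sentence)}
        mentioned = sorted(tokens & core)
        for a, b in combinations(mentioned, 2):
            edges[(a, b)] += 1
    # Keep only pairs supported by at least min_count sentences.
    return {pair: n for pair, n in edges.items() if n >= min_count}

# Toy sentences for illustration only, not curated biological facts.
sentences = [
    "HTT interacts with HAP1 in neurons.",
    "HAP1 binds HTT and BDNF transport is impaired.",
    "CASP3 cleaves HTT.",
]
network = build_cooccurrence_network(sentences, ["HTT", "HAP1", "BDNF", "CASP3"])
for (a, b), n in sorted(network.items()):
    print(a, b, n)
```

Raising `min_count` trades coverage for confidence: pairs seen in only one sentence are dropped first.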
![Page 9: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/9.jpg)
Disease Pathweaver:
The Portal
• Portal for human nervous disease gene networks
  – http://research.i2r.a-star.edu/NSDPath
• Statistics
  – 37 Human Nervous System Disorders
  – 7 ~ 60 core genes per disease
  – 2 ~ 320 core interactions per disease
Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
![Page 10: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/10.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
Disease Pathweaver:
A Tour
![Page 11: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/11.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
Disease Pathweaver:
A Tour
![Page 12: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/12.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
Disease Pathweaver:
A Tour
![Page 13: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/13.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
Disease Pathweaver:
A Tour
![Page 14: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/14.jpg)
Disease Pathweaver:
DPW vs KEGG on Huntington Disease
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhuo Zhang, Suisheng Tang, & See-Kiong Ng
![Page 15: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/15.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Suisheng Tang & Vlad Bajic
Estrogen-Responsive Genes
• Why
  – Affects human physiology in many aspects
  – Related to many diseases
  – Widely used in clinic
• Challenges
  – Multiple pathways
  – Difficult to predict ERE
  – Many estrogen-responsive genes but only a few are well-studied
  – Difficult to keep up w/ speed of knowledge accumulation
• Needed
  – Tools to predict ERE & estrogen-responsive genes
  – Database of useful info
  – Systems to predict impt regulatory units, associate gene functions, & generate global view of gene network
![Page 16: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/16.jpg)
Dragon ERG Solution
[Diagram. Pathway labels: ERE dependent; ERE independent; E2 dependent; E2 independent; ERs bind to other TFs; Membrane receptors. Tool labels: ERE Finder; ERG Finder; ERGDB; ERG Explorer. Annotations: "Coming soon!"; "text mining here!"]
Adapted w/ permission from Suisheng Tang & Vlad Bajic
![Page 17: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/17.jpg)
http://sdmc.i2r.a-star.edu.sg/ERE-V2/index
• Predict functional ERE in genomic DNA
• One prediction in 13.3k bp
• Allow further analysis
Dragon ERG Solution:
Dragon ERE Finder, Bajic et al, NAR, 2003
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Suisheng Tang & Vlad Bajic
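To make the idea of ERE prediction concrete, here is a minimal sketch that scans DNA for windows resembling the consensus estrogen-response element GGTCAnnnTGACC (two palindromic half-sites separated by a 3-bp spacer). This is a naive consensus match with a mismatch allowance, not the model used by Dragon ERE Finder; the function name `find_ere_candidates`, the `max_mismatches` parameter, and the test sequence are assumptions made for illustration.

```python
ERE_HALF_LEFT = "GGTCA"
ERE_HALF_RIGHT = "TGACC"
SPACER = 3  # the three unconstrained bases between the half-sites

def find_ere_candidates(dna, max_mismatches=1):
    """Return (position, site, mismatches) for windows close to GGTCAnnnTGACC."""
    dna = dna.upper()
    half = len(ERE_HALF_LEFT)
    width = half + SPACER + len(ERE_HALF_RIGHT)
    hits = []
    for i in range(len(dna) - width + 1):
        site = dna[i:i + width]
        left, right = site[:half], site[half + SPACER:]
        # Count mismatches in the two half-sites only; the spacer is free.
        mismatches = sum(a != b for a, b in zip(left, ERE_HALF_LEFT))
        mismatches += sum(a != b for a, b in zip(right, ERE_HALF_RIGHT))
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits

# Made-up sequence containing one perfect and one imperfect site.
sequence = "ttacGGTCAcagTGACCtgaaGGTCAtttTGATCaa"
for position, site, mismatches in find_ere_candidates(sequence):
    print(position, site, mismatches)
```

A plain consensus scan like this produces many false positives on real genomes, which is why a dedicated predictor with a controlled false-positive rate is needed.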
![Page 18: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/18.jpg)
• Only for human genome
• Using 117 bp ERE frame
• Evaluated by PubMed & microarray data
Dragon ERG Solution:
Dragon Estrogen-Responsive Gene Finder, Tang et al, NAR, 2004a
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Suisheng Tang & Vlad Bajic
http://sdmc.i2r.a-star.edu.sg/DRAGON/ERGP1_0
![Page 19: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/19.jpg)
• Contains >1000 genes
• Manually curated
• Basic gene info
• Experimental evidence
• Full set of references
• ERE sites annotated
Dragon ERG Solution:
Dragon Estrogen-Responsive Genes Database, Tang et al, NAR, 2004b
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Suisheng Tang & Vlad Bajic
http://sdmc.i2r.a-star.edu.sg/promoter/Ergdb-v11
![Page 20: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/20.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Suisheng Tang & Vlad Bajic
Dragon ERG Solution:
DEERGF
![Page 21: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/21.jpg)
Dragon ERG Solution:
Case-Specific TF relation networks, Pan et al, NAR, 2004
• Analyse abstracts
• Stemming, POS tagging
• Use ANNs, SVM, discriminant analysis
• Simplified rules for sentence analysis
• Constraints on the forms of sentences
• Sensitivity ~75%
• Precision ~82%
• Produce reports & direct links to PubMed docs, & graphical presentations of entity links
Adapted w/ permission from Vlad Bajic
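For readers unfamiliar with the two figures quoted on this slide, sensitivity (also called recall) is TP / (TP + FN) and precision is TP / (TP + FP). The sketch below computes both; the counts are hypothetical values chosen only to land near the reported ~75% and ~82%, and are not taken from Pan et al.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real associations that the system found."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives, false_positives):
    """Fraction of reported associations that are real."""
    return true_positives / (true_positives + false_positives)

# Hypothetical counts for illustration only.
tp, fn, fp = 75, 25, 16
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"precision   = {precision(tp, fp):.2f}")
```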
![Page 22: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/22.jpg)
Technical Challenges
• Named entity recognition
• Co-reference resolution
• Data cleansing
![Page 23: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/23.jpg)
Bio Entity Name Recognition, Zhou et al., BioCreAtIvE, 2004
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Jian Su
![Page 24: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/24.jpg)
Bio Entity Name Recognition:
Ensemble Classification Approach
• Features considered
  – orthographic, POS, morphologic, surface word, trigger words (TW1: receptor, enhancer, etc. TW2: activation, stimulation, etc.)
• SVM
  – Context of 7 words
  – Each word gives 5 features, plus its position
• HMMs
  – 3 features used (orthographic, POS, surface word)
  – HMM1 & HMM2 use POS taggers trained on diff corpora
• Classifiers combined by majority voting
  – HMM1: balanced precision & recall
  – HMM2: low precision & high recall
  – SVM: high precision & low recall
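The majority-voting step that combines HMM1, HMM2, and the SVM can be sketched as follows. This is a minimal illustration assuming each classifier emits one label per token; the tag names, the example outputs, and the tie-breaking rule (fall back to the first classifier when no label has a strict majority) are assumptions, not details from Zhou et al.

```python
from collections import Counter

def majority_vote(*label_sequences, tie_breaker=0):
    """Combine per-token labels from several taggers by strict majority."""
    combined = []
    for labels in zip(*label_sequences):
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count > len(labels) / 2:
            combined.append(top_label)
        else:
            # No strict majority: trust the designated tagger (an assumption).
            combined.append(labels[tie_breaker])
    return combined

# Hypothetical outputs of the three classifiers on a four-token sentence.
hmm1 = ["B-PROT", "I-PROT", "O", "O"]
hmm2 = ["B-PROT", "I-PROT", "B-PROT", "O"]       # high recall, over-predicts
svm = ["B-PROT", "O", "O", "O"]                  # high precision, under-predicts
ensemble = majority_vote(hmm1, hmm2, svm)
print(ensemble)
```

Voting lets the high-recall and high-precision classifiers cancel each other's typical errors: a label survives only when at least two of the three agree.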
![Page 25: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/25.jpg)
Bio Entity Name Recognition:
Performance at BioCreAtIvE 2004
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Zhou et al., BioCreAtIvE 2004
![Page 26: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/26.jpg)
Co-Reference Resolution, Yang et al., IJCNLP, 2004
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Yang et al., IJCNLP 2004
![Page 27: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/27.jpg)
Co-Reference Resolution: Baseline Features Used
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Yang et al., IJCNLP 2004
![Page 28: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/28.jpg)
Co-Reference Resolution:
New Features Used & Performance
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Yang et al., IJCNLP 2004
Base Classifier:C5.0
![Page 29: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/29.jpg)
Protein Interaction Extraction, Xiao et al, submitted
• The max entropy model:
$$P(o \mid h) = \frac{1}{Z(h)} \exp\Big(\sum_j \lambda_j f_j(h, o)\Big)$$
• where
  – o is the outcome
  – h is the feature vector
  – Z(h) is the normalization function
  – f_j are feature functions
  – λ_j are feature weights
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Xiao et al., IJCNLP 2004
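A maximum entropy model of this kind is conventionally written P(o | h) = exp(Σ_j λ_j f_j(h, o)) / Z(h), where Z(h) sums the exponentiated scores over all outcomes. The sketch below evaluates that expression directly. The two feature functions, their weights, and the outcome names are hypothetical and only illustrate the mechanics; they are not the features used by Xiao et al.

```python
import math

def maxent_probabilities(h, outcomes, feature_functions, weights):
    """Return P(o | h) for every outcome o under a max entropy model."""
    scores = {
        o: math.exp(sum(w * f(h, o) for f, w in zip(feature_functions, weights)))
        for o in outcomes
    }
    z = sum(scores.values())  # Z(h), the normalization function
    return {o: s / z for o, s in scores.items()}

# Hypothetical binary feature functions f_j(h, o).
def f_verb(h, o):
    return 1.0 if h["has_interaction_verb"] and o == "interact" else 0.0

def f_far(h, o):
    return 1.0 if h["token_distance"] > 10 and o == "no_interact" else 0.0

h = {"has_interaction_verb": True, "token_distance": 3}
probs = maxent_probabilities(h, ["interact", "no_interact"], [f_verb, f_far], [1.2, 0.8])
print(probs)
```

In practice the weights λ_j are learned from annotated sentences rather than set by hand as they are here.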
![Page 30: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/30.jpg)
Protein Interaction Extraction:
Features Used
Adapted w/ permission from Xiao et al.
![Page 31: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/31.jpg)
Protein Interaction Extraction:
Performance on IEPA Corpus
Adapted w/ permission from Xiao et al.
![Page 32: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/32.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Judice Koh, Mong Li Lee, & Vladimir Brusic
Data Cleansing, Koh et al, DBiBD, 2005
• 11 types & 28 subtypes of data artifacts
  – Critical artifacts (vector contaminated sequences, duplicates, sequence structure violations)
  – Non-critical artifacts (misspellings, synonyms)
• > 20,000 seq records in public db’s contain artifacts
• Identification of these artifacts is impt for accurate knowledge discovery
• Sources of artifacts
  – Diverse sources of data
    • Repeated submissions of seqs to db’s
    • Cross-updating of db’s
  – Data annotation
    • Db’s have diff ways for data annotation
    • Data entry errors can be introduced
    • Different interpretations
  – Lack of standardized nomenclature
    • Variations in naming
    • Synonyms, homonyms, & abbrevn
  – Inadequacy of data quality control mechanisms
    • Systematic approaches to data cleaning are lacking
![Page 33: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/33.jpg)
Data Cleansing:
A Classification of Errors
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Judice Koh, Mong Li Lee, & Vladimir Brusic
![Page 34: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/34.jpg)
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Judice Koh, Mong Li Lee, & Vladimir Brusic
Data Cleansing:
Example Spelling Errors
• Usually typo errors
• Occur in different fields of the record
• We identified 569 possible misspelled words affecting up to 20,505 nucleotide records in Entrez.
Misspellings, corrections, and context of the misspellings:
• asociated → associated
  – EMBL:Y18050 E.faecium pbp5 gene
  – TITLE Modification of penicillin-binding protein 5 asociated with high level ampicillin resistance in Enterococcus faecium
  – gi|1143442|emb|X92687.1|EFPBP5G
• tranmembrane → transmembrane
  – Swiss-Prot:P03385 Env polyprotein precursor
  – DEFINITION Env polyprotein precursor [Contains: Surface protein (SU) (GP70); Tranmembrane protein (TM) (p15E); R protein].
  – gi|119478|sp|P03385|ENV_MLVMO
• Cassete → Cassette
  – Patent Database:A76783 Sequence 11 from Patent WO9315210
  – CDS <1..150 /note="gene cassete encoding intercalating jun-zipper and linker"
  – gi|6088638|emb|A76783.1||pat|WO|9315210|11[6088638]
• Immunoglobin → Immunoglobulin
  – GenBank:AAD26534 nectin-1 [Rattus norvegicus]
  – TITLE Nectin/PRR: An Immunogloblin-like Cell Adhesion Molecule Recruited to Cadherin-based Adherens Junctions through Interaction with Afadin, a PDZ Domain-containing Protein
  – gi|4590334|gb|AAD26534.1
[Figure: classification of data artifacts. Levels: ATTRIBUTE; RECORD; SINGLE SOURCE DATABASE; MULTIPLE SOURCE DATABASE. Artifact types: invalid values; ambiguity; incompatible schema; spelling errors; format violation; annotation error; dubious sequences; sequence redundancy; data provenance flaws; cross-annotation error; sequence structure violation; vector contaminated sequence; erroneous data transformation]
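One simple way to flag candidate misspellings like those on this slide is to compare each word against a reference vocabulary and report near-misses. The sketch below does this with Python's standard `difflib`; it is only an illustration under that assumption and not the procedure Koh et al. used. The vocabulary, the `cutoff` value, and the function name `find_misspellings` are hypothetical.

```python
import difflib

def find_misspellings(words, vocabulary, cutoff=0.85):
    """Map each unknown word to its closest vocabulary entry, if one is close enough."""
    vocab = [v.lower() for v in vocabulary]
    known = set(vocab)
    suggestions = {}
    for word in words:
        w = word.lower()
        if w in known:
            continue  # spelled correctly as far as the vocabulary knows
        close = difflib.get_close_matches(w, vocab, n=1, cutoff=cutoff)
        if close:
            suggestions[word] = close[0]
    return suggestions

vocabulary = ["associated", "transmembrane", "cassette", "immunoglobulin", "protein"]
annotation_words = ["asociated", "protein", "tranmembrane", "cassete", "immunoglobin"]
print(find_misspellings(annotation_words, vocabulary))
```

A similarity cutoff that is too low would flag legitimate but rare biological terms, so any real use would need a domain vocabulary and manual review of the candidates.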
![Page 35: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/35.jpg)
Data Cleansing:
Example Meaningless Seqs
[Figure: classification of data artifacts. Levels: ATTRIBUTE; RECORD; SINGLE SOURCE DATABASE; MULTIPLE SOURCE DATABASE. Artifact types: invalid values; ambiguity; incompatible schema; uninformative sequences; undersized sequences; annotation error; dubious sequences; sequence redundancy; data provenance flaws; cross-annotation error; sequence structure violation; vector contaminated sequence; erroneous data transformation]
• Among the 5,146,255 protein records queried using Entrez to the major protein or translated nucleotide databases, 3,327 protein sequences are shorter than four residues (as of Sep 2004).
• In Nov 2004, the total number of undersized protein sequences increased to 3,350.
• Among 43,026,887 nucleotide records queried using Entrez to major nucleotide databases, 1,448 records contain sequences shorter than six bases (as of Sep 2004).
• In Nov 2004, the total number of undersized nucleotide sequences increased to 1,711.
[Chart: Undersized protein sequences in major databases. x-axis: sequence length (1 to 3); y-axis: number of records; series: DDBJ, EMBL, GenBank, PDB, SwissProt, PIR]
[Chart: Undersized nucleotide sequences in major databases. x-axis: sequence length (1 to 5); y-axis: number of records; series: DDBJ, EMBL, GenBank, PDB]
Copyright © 2005 by Limsoon Wong. Adapted w/ permission from Judice Koh, Mong Li Lee, & Vladimir Brusic
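The undersized-sequence check described on this slide reduces to a length threshold: proteins shorter than four residues and nucleotide sequences shorter than six bases. A minimal sketch follows; the record layout, record IDs, and function name `is_undersized` are hypothetical, and only the two thresholds come from the slide.

```python
# Thresholds from the slide: "shorter than four residues" / "shorter than six bases".
MIN_LENGTH = {"protein": 4, "nucleotide": 6}

def is_undersized(sequence, kind):
    """Return True if the sequence is below the minimum length for its kind."""
    return len(sequence.strip()) < MIN_LENGTH[kind]

# Hypothetical records: (record id, sequence kind, sequence).
records = [
    ("rec1", "protein", "MK"),
    ("rec2", "protein", "MKTAYIAK"),
    ("rec3", "nucleotide", "ATGCA"),
    ("rec4", "nucleotide", "ATGCAT"),
]
undersized = [rid for rid, kind, seq in records if is_undersized(seq, kind)]
print(undersized)
```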
![Page 36: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/36.jpg)
References
• Zhang et al., “Toward discovering disease-specific gene networks from online literature”, APBC, 3:161-169, 2005
• Pan et al., “Dragon TF association miner: A system for exploring transcription factor associations through text mining”, NAR, 32:W230-W234, 2004
• Tang et al., “Computational method for discovery of estrogen-responsive genes”, NAR, 32:6212-6217, 2004a
• Tang et al., “ERGDB: Estrogen-responsive genes database”, NAR, 32:D533-D536, 2004b
• Bajic et al., “Dragon ERE finder ver.2: A tool for accurate detection and analysis of estrogen-response elements in vertebrate genomes”, NAR, 31:3605-3607, 2003
• Koh et al., “A Classification of Biological Data Artifacts”, DBiBD, 2005
![Page 37: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/37.jpg)
References
• Zhou et al., “Recognition of protein and gene names from text using an ensemble of classifiers and effective abbreviation resolution”, Proc. BioCreAtIvE Workshop, pp 26-30, 2004
• Yang et al., “Improving Noun Phrase Co-reference Resolution by Matching Strings”, IJCNLP, 1:226-333, 2004
• Xiao et al., “Protein-protein interaction extraction: A supervised learning approach”, submitted
![Page 38: Copyright © 2005 by Limsoon Wong Building Gene Networks by Information Extraction, Cleansing, & Integration Limsoon Wong Institute for Infocomm Research.](https://reader035.fdocuments.net/reader035/viewer/2022070308/551bf57c550346a34f8b4642/html5/thumbnails/38.jpg)
I2R
[Figure: I2R organization chart. Labels: Services & Applications; Media; Communications & Devices; Media Processing; Human Centric Media; Media Semantics; Infocomm Security; Context-Aware Systems; Knowledge Discovery; Radio Systems; Networking; Lightwave; Embedded Systems; Digital Wireless]
Acknowledgements
Data Cleansing: Judice Koh, Vladimir Brusic, Mong Li Lee, Asif M. Khan, Paul T.J. Tan, Heiny Tan, Kenneth Lee, Wilson Goh, Songsak Tongchusak, Kavitha Gopalakrishnan
Pathweaver: Zhuo Zhang, See-Kiong Ng, Suisheng Tang, Chris Tan
Dragon ERG & TF Miner: Suisheng Tang, Vlad Bajic, Zuo Li, Pan Hong, Vidhu Chaudhary, Raja Kangasa
Info Extraction: Guodong Zhou, Jian Su, ChewLim Tan, Juan Xiao, Xiaofeng Yang, Chris Tan, Dan Shen, Jie Zhang