The pragmatic text miner: From literature to electronic health records
Lars Juhl Jensen
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
dictionary-based approach
identification required
dictionary
cyclin dependent kinase 1
CDC2
expansion rules
CDC2
hCdc2
flexible matching
cyclin dependent kinase 1
cyclin-dependent kinase 1
“black list”
SDS
>10 km
<10 hours
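The dictionary-based steps above (dictionary, expansion rules, flexible matching, black list) can be sketched roughly as follows. This is a minimal illustration, not the tagger used in the talk: the identifiers, the "h" prefix rule, and all function names are assumptions made up for the example.

```python
import re

# Hypothetical toy dictionary: name -> identifier (identifiers are invented).
DICTIONARY = {
    "cyclin dependent kinase 1": "GENE:0001",
    "CDC2": "GENE:0001",
    "SDS": "GENE:0002",
}
# Names considered too ambiguous to tag (SDS is also a common lab detergent).
BLACK_LIST = {"sds"}


def normalize(name):
    """Flexible matching: ignore case and treat hyphens like spaces."""
    return re.sub(r"[\s-]+", " ", name.lower()).strip()


def expand(name):
    """Assumed expansion rule: single-word names also match with an 'h' prefix."""
    variants = {name}
    if " " not in name:
        variants.add("h" + name)  # e.g. CDC2 -> hCDC2
    return variants


def build_index(dictionary, black_list):
    """Map every normalized name variant to its identifier, skipping black-listed names."""
    index = {}
    for name, identifier in dictionary.items():
        if normalize(name) in black_list:
            continue
        for variant in expand(name):
            index[normalize(variant)] = identifier
    return index


def tag(text, index):
    """Return (matched text, identifier) pairs in order of appearance."""
    hits = []
    for key, identifier in index.items():
        words = [re.escape(word) for word in key.split(" ")]
        pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), identifier))
    return [(matched, identifier) for _, matched, identifier in sorted(hits)]


index = build_index(DICTIONARY, BLACK_LIST)
print(tag("Cyclin-dependent kinase 1 (hCdc2) was run on an SDS gel.", index))
```

The sketch does not resolve overlapping matches; a real tagger would need a rule for that, such as preferring the longest name.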
the formal way
benchmark
manually annotated corpus
automatic tagging
compare
quality metrics
precision
recall
F-score
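For reference, the three quality metrics can be computed from a manually annotated corpus and the automatic tagging as in this short sketch. It assumes annotations are represented as sets of comparable items such as (start, end) spans; the function name is illustrative.

```python
def precision_recall_f(gold, predicted):
    """Compare automatic tags against manual annotations.

    gold, predicted: collections of hashable annotations, e.g. (start, end) spans.
    Returns (precision, recall, F-score), where F is the harmonic mean of the two.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


gold = {(0, 4), (10, 15), (20, 25)}
predicted = {(0, 4), (10, 15), (30, 35), (40, 45)}
print(precision_recall_f(gold, predicted))
```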
manually annotated corpus
use existing corpus
not new
make new corpus
hard work
natural language processing
part-of-speech tagging
semantic tagging
sentence parsing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
handle negations
directionality
high precision
poor recall
highly domain specific
the pragmatic way
benchmark light™
requires fewer calories
non-annotated corpus
automatic tagging
random sampling
manual inspection
precision
no recall
relative recall
compare methods
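One way to read the "benchmark light" recipe is sketched below: draw a random sample of automatic matches, judge them by hand to estimate precision, and compare two methods by their estimated number of correct matches. The relative recall formula here is an assumed interpretation of the slide, and all names and numbers are illustrative.

```python
import random


def sample_for_inspection(matches, k, seed=0):
    """Draw up to k automatic matches at random for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(matches, min(k, len(matches)))


def estimated_precision(judgements):
    """judgements: list of booleans, True if the inspected match was correct."""
    return sum(judgements) / len(judgements)


def relative_recall(n_a, precision_a, n_b, precision_b):
    """Ratio of estimated correct matches of method A to those of method B.

    Without an annotated corpus the true number of entities is unknown,
    so only this ratio between methods can be estimated, not absolute recall.
    """
    return (n_a * precision_a) / (n_b * precision_b)


matches = [f"match_{i}" for i in range(1000)]  # placeholder tagger output
to_inspect = sample_for_inspection(matches, k=100)
print(len(to_inspect))
print(estimated_precision([True, True, False, True]))
print(relative_recall(1000, 0.9, 600, 0.75))
```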
co-mentioning
within documents
within paragraphs
within sentences
weighted score
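A weighted co-mentioning score could look something like the sketch below, where a pair of entities earns more credit the closer together it is mentioned. The weights and the input layout (documents as lists of paragraphs, paragraphs as lists of sentences, sentences as sets of tagged entities) are assumptions for illustration, not the scoring scheme from the talk.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-mentions count for more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}


def pairs(entities):
    """All unordered entity pairs, each as an alphabetically sorted tuple."""
    return set(combinations(sorted(entities), 2))


def comention_scores(documents, weights=WEIGHTS):
    """Sum, over documents, a weight for each level at which a pair co-occurs."""
    scores = defaultdict(float)
    for document in documents:
        sentence_pairs, paragraph_pairs = set(), set()
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                sentence_pairs |= pairs(sentence)
                paragraph_entities |= set(sentence)
            paragraph_pairs |= pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in pairs(document_entities):
            scores[pair] += weights["document"]
            if pair in paragraph_pairs:
                scores[pair] += weights["paragraph"]
            if pair in sentence_pairs:
                scores[pair] += weights["sentence"]
    return dict(scores)


# One document with two paragraphs; the first paragraph has two sentences.
documents = [[[{"A", "B"}, {"C"}], [{"D"}]]]
print(comention_scores(documents))
```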
benchmark
associations good?
tagging good enough
unifying text & data
web resources
text mining
curated knowledge
experimental data
computational predictions
common identifiers
quality scores
proteins
STRING
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
small molecules
Kuhn et al., Nucleic Acids Research, 2012
compartments
compartments.jensenlab.org
tissues
tissues.jensenlab.org
diseases
environments
electronic health records
Jensen et al., Nature Reviews Genetics, 2012
structured data
Jensen et al., Nature Reviews Genetics, 2012
unstructured data
clinical narrative
comorbidity
Jensen et al., Nature Reviews Genetics, 2012
Roque et al., PLoS Computational Biology, 2011
in Danish
by busy doctors
confounding factors
age and gender
reporting bias
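To illustrate why age and gender matter as confounders, the sketch below computes an observed-to-expected comorbidity ratio in which the expected count is calculated within each age and gender stratum. This is a simple stratification chosen for illustration, not necessarily the method of the cited papers; the patient records and function name are made up.

```python
from collections import defaultdict


def stratified_comorbidity_ratio(patients, disease_a, disease_b):
    """Observed / expected co-occurrence of two diagnoses.

    patients: iterable of (age_group, gender, set_of_diagnoses).
    The expected count assumes independence within each (age_group, gender)
    stratum, so that shared age and gender effects do not inflate the ratio.
    """
    n = defaultdict(int)
    count_a = defaultdict(int)
    count_b = defaultdict(int)
    observed = 0
    for age_group, gender, diagnoses in patients:
        stratum = (age_group, gender)
        has_a = disease_a in diagnoses
        has_b = disease_b in diagnoses
        n[stratum] += 1
        count_a[stratum] += has_a
        count_b[stratum] += has_b
        observed += has_a and has_b
    expected = sum(count_a[s] * count_b[s] / n[s] for s in n)
    return observed / expected if expected else float("nan")


# Toy cohort: both diagnoses occur only in the older group.
patients = [
    ("60+", "F", {"A", "B"}),
    ("60+", "F", {"A", "B"}),
    ("60+", "F", {"A"}),
    ("60+", "F", {"B"}),
    ("0-19", "M", set()),
    ("0-19", "M", set()),
    ("0-19", "M", set()),
    ("0-19", "M", set()),
]
print(stratified_comorbidity_ratio(patients, "A", "B"))
```

In this toy cohort, ignoring the strata would give an expected count of 3 × 3 / 8 = 1.125 and an apparent ratio of about 1.8, whereas within the older stratum the two diagnoses co-occur slightly less often than expected by chance.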
temporal correlation
diagnosis trajectories
Jensen et al., in preparation, 2013
pharmacovigilance
adverse drug reactions
Eriksson et al., in preparation, 2013
ADR profiles
Eriksson et al., in preparation, 2013
ADR frequencies
Eriksson et al., in preparation, 2013
Acknowledgments
STRING
Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
Text mining
Sune Frankild, Evangelos Pafilis, Alberto Santos, Kalliopi Tsafou, Janos Binder, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Julia Schnetzer, Aikaterini Vasileiadou, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhard Schneider, Sean O’Donoghue
EHR mining
Robert Eriksson, Peter Bjødstrup Jensen, Anders Boeck Jensen, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak