MIE 2008
Assessment of Biomedical Knowledge According to Confidence Criteria
Ines Jilani: [email protected]
Natalia Grabar, Pierre Meneton, Marie-Christine Jaulent
Wednesday, 28th of May 2008
Context
• Increasing number of biomedical articles in Pubmed*
• Follow-up work on automatic extraction of functional knowledge about genes/proteins from scientific articlesΔ indexed in Pubmed
– Using lexico-syntactic patterns:
• Language-specific automaton (grammar)
o Syntactic elements (Verb, Noun, Adjective…)
o Semantic elements (Meaning of words…)
* http://www.ncbi.nlm.nih.gov/sites/entrez
Δ Jilani I, Grabar N & Jaulent M.-C. Fitting the finite-state automata platform for mining gene functions from biological scientific literature. In SMBM in Jena (Germany) 2006
Example of lexico-syntactic pattern
o (Sox2; sensory organ development)
o (Hint; murine development)
Introduction
• Limits of the system
– Loss of context: reliability and confidence of the claim
• Solution
– Use some devices to « weight » the extracted knowledge
• In order to make more confident use of extracted knowledge
o Hedge, modifier, qualifier
o Confidence markers
Hedges, modifiers, qualifiers …
• Linguistic devices used by authors to qualify their assertions
– Different grammatical categories: verbs, adverbs, adjectives…
– “Copper deficiency is a plausible cause of Alzheimer disease (AD). This hypothesis should be tested with a lengthy trial of copper supplementation”*
• “hedge” was first used by LakoffΔ: “words whose job it is to make things more or less fuzzy”
• HylandΦ and others carried out qualitative studies of these qualifiers
– without modelling them
– and without integrating their use for weighting any kind of information in a knowledge extraction system
* Quoted from the abstract of the article with Pubmed Identifier 17928161
Δ Lakoff, G. 1972. Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts. Chicago Linguistic Society, 8, pp. 183-228.
Φ Hyland, K. 1995. The Author in the Text: Hedging Scientific Writing. Hong Kong Papers in Linguistics and Language Teaching.
Objectives
• Work on confidence markers in scientific articles
– Their use
– Their significance
– Their classification
– Their automatic detection in texts for knowledge weighting purposes
• The main aim was to document the information so that it could be used confidently
E.g.: (Sox2; sensory organ development)
– Sox2 is required for sensory organ development
– Sox2 might be required for sensory organ development
– Sox2 is probably required for sensory organ development
– Our findings suggest that Sox2 is required for sensory organ development
– Doe et al. have demonstrated that Sox2 is required for sensory organ development
Materials
• 3 corpora obtained by querying Pubmed
• Lexical resource: WordNet®* is a large lexical database of layman English: nouns, verbs, adjectives and adverbs
– Used to enrich the extracted confidence markers by identifying their synonyms

Corpus | Query | Species | Source | Specificity | Number of sentences
CORP1 | 160 genes + Alzheimer disease | human | Pubmed | 355 abstracts | 817
CORP2 | 160 genes + Alzheimer disease | human | Pubmed Central | 68 full texts | 27,912
CORP3 | 160 genes + Alzheimer disease | worm | Pubmed | 348 abstracts | 825

* WordNet, An Electronic Lexical Database, Christiane Fellbaum ed., (1998), The MIT Press, Cambridge, Mass
Methods
• Manual collection of confidence markers from CORP1, CORP2 and CORP3
• Enrichment of the list of confidence markers
– Using WordNet®
• Classification of confidence markers according to 2 types of classes
• Addition of the Impact Factor (IF) as another confidence criterion
– Hypothesis: the IF of a journal is subjectively related to the reliability of the biological and medical information it publishes
• Modeling confidence criteria: develop a formula that consistently orders the triplets (representing annotations) by their confidence score
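The enrichment step can be pictured with a minimal sketch. It assumes a small hand-written synonym table (`SYNONYMS`, hypothetical) standing in for a WordNet lookup; the verbs are those listed in the modeling patterns, and the function name `enrich` is illustrative only.

```python
# Hypothetical stand-in for a WordNet lookup: marker verb -> synonyms.
# The entries are hand-written for illustration, not real WordNet output.
SYNONYMS = {
    "hypothesise": {"speculate", "expect", "predict", "suspect"},
    "anticipate": {"expect"},
}


def enrich(markers, synonyms):
    """Return the seed markers plus every synonym listed for them."""
    enriched = set(markers)
    for marker in markers:
        # Markers without an entry contribute no new synonyms.
        enriched |= synonyms.get(marker, set())
    return enriched


seed = {"hypothesise", "anticipate", "confirm"}
enriched = enrich(seed, SYNONYMS)
print(sorted(enriched))
```

Using sets keeps the enriched list free of duplicates when two seed markers share a synonym, as "expect" does here.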
Results
• A list of 250 manually collected confidence markers was generated
• Enrichment using WordNet® increased the number of confidence markers listed to 478
• Classification
– Type 1: 4 different categories in ascending order of confidence
– Type 2: 10 distinct qualifiers modifying confidence levels within the Type 1 categories, characterizing subjectivity in texts
Results: Type 1 class
1 - Interrogation or trial and error of the author: Knowledge that remains unproven and requires demonstration. e.g.: “remain to be confirmed”, “has yet to be identified”, “?”
2 - Distance taken by the author from his assertions or from the knowledge presented in the text: It may also correspond to a restriction of the knowledge concerned to a specific context (e.g.: the context of the article or experiment). e.g.: “our findings suggest that”, “in this case we conclude that”, “it is possible that”
3 - Studies by other researchers, references to other works, articles or methods: If an article is cited, we assume that the information is accepted, or at worst simply believed, to be true. e.g.: “previous observation”, “it is now believed that”, “it has been proposed that”
4 - Demonstration or proof given by the author: This corresponds to work carried out by the author and presented in the article concerned. e.g.: “we reveal that”, “we show here that”, “our results indicate that”…
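The four Type 1 categories can be pictured as a phrase lookup. The sketch below is illustrative only: it uses just the example phrases quoted above, plain substring matching, and the assumption (not stated on the slides) that when several categories match, the lowest, most cautious one wins. The names `TYPE1_MARKERS` and `type1_category` are hypothetical.

```python
# Example marker phrases copied from the four Type 1 categories above.
TYPE1_MARKERS = {
    1: ["remain to be confirmed", "has yet to be identified", "?"],
    2: ["our findings suggest that", "in this case we conclude that",
        "it is possible that"],
    3: ["previous observation", "it is now believed that",
        "it has been proposed that"],
    4: ["we reveal that", "we show here that", "our results indicate that"],
}


def type1_category(sentence):
    """Return the Type 1 category of a sentence, or None if no marker is found.

    Assumption: categories are tried from 1 to 4, so the lowest
    (least confident) matching category is returned.
    """
    text = sentence.lower()
    for category in sorted(TYPE1_MARKERS):
        if any(phrase in text for phrase in TYPE1_MARKERS[category]):
            return category
    return None


print(type1_category(
    "Our findings suggest that Sox2 is required for sensory organ development"))
```

A sentence with no marker, such as "Sox2 is required for sensory organ development", returns `None` here; how unmarked sentences are categorized is not covered by this sketch.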
Results: Type 2 class*
• 10 qualifiers representing probabilities from negation to affirmation, i.e. from the least probable to the most probable
[Figure: scale of the 10 qualifiers, from Confidence - - to Confidence + +]
* Work derived from: Ian Jacobs. 1995. English Modal Verbs
Results: Modeling
• Modeling confidence criteria for their automatic extraction
– Regular expressions are used
• “we anticipate” and “we expect”
we<have>*(<anticipate>+<expect>)
– Synonyms are used
• “we hypothesise” and “we suspect”
we<have>*(<hypothesise>+<speculate>+<expect>+<predict>+<suspect>)
• “have been previously confirmed”, “is now largely confirmed” and “is widely confirmed”
<have>*<be>(previously+now)*(largely+widely+extensively+generally)*<confirm>
Example matches:
We had anticipated that…
We have anticipated that…
We expect that…
We have expected that…
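The first pattern can be approximated with Python's `re` module. This is a sketch under stated assumptions: angle brackets are read as "any inflected form of this lemma", `+` as alternation and `*` as zero or more repetitions. The inflected forms are listed by hand here, and the helper name `lemma` is hypothetical.

```python
import re


def lemma(*forms):
    """Build a non-capturing alternation over hand-listed inflected forms."""
    return "(?:" + "|".join(forms) + ")"


# Hand-listed inflections standing in for <have>, <anticipate>, <expect>.
HAVE = lemma("have", "has", "had")
ANTICIPATE = lemma("anticipate", "anticipates", "anticipated")
EXPECT = lemma("expect", "expects", "expected")

# Approximation of: we<have>*(<anticipate>+<expect>)
PATTERN = re.compile(
    rf"\bwe\s+(?:{HAVE}\s+)*(?:{ANTICIPATE}|{EXPECT})\b",
    re.IGNORECASE,
)

sentences = [
    "We had anticipated that…",
    "We have anticipated that…",
    "We expect that…",
    "We have expected that…",
]
for sentence in sentences:
    print(sentence, bool(PATTERN.search(sentence)))
```

The trailing `\b` forces a whole-word match, so the shorter form "anticipate" does not stop the search before "anticipated" is tried.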
Results: Application
• Context of apolipoprotein E gene
[Figure: points per triplet (Gene, Function, PMID)]
Results: Explanations
- ApoE allelic variability influences pupil response to cholinergic challenge and cognitive impairment. (1)
- The Apolipoprotein E (ApoE) epsilon4 allele role in LOD is controversial, while “it is still unknown” in vascular depression. (2)
- ApoE4 “seems” to facilitate HSV-1 latency in the brain much more so than ApoE3. (3)

Triplet | Type 1 | Type 2 | IF
(1) ApoE / cognitive impairment / 16764677 | 4 | 10 | 4.091
(2) ApoE epsilon4 allele / vascular depression / 17337010 | 1 | 10 | 2.035
(3) ApoE4 / HSV-1 latency / 16699018 | 2 | 10 | 5.178
Triplets in descending order of confidence:
1 ; 3 ; 2
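The slides do not give the scoring formula. As a minimal sketch only, assume the triplets are compared lexicographically on (Type 1, Type 2, IF), with the impact factors read using the comma as a decimal separator. Under that assumption, sorting from most to least confident reproduces the listing 1 ; 3 ; 2. The names `TRIPLETS` and `confidence_key` are hypothetical.

```python
# (sentence number, Type 1, Type 2, impact factor), from the table above.
TRIPLETS = [
    (1, 4, 10, 4.091),  # ApoE / cognitive impairment
    (2, 1, 10, 2.035),  # ApoE epsilon4 allele / vascular depression
    (3, 2, 10, 5.178),  # ApoE4 / HSV-1 latency
]


def confidence_key(row):
    """Assumed ordering rule: Type 1 first, then Type 2, then IF."""
    _, type1, type2, impact_factor = row
    return (type1, type2, impact_factor)


# Most confident first.
ranked = sorted(TRIPLETS, key=confidence_key, reverse=True)
order = [row[0] for row in ranked]
print(" ; ".join(str(number) for number in order))
```

A weighted sum of the three criteria would be another plausible choice; a lexicographic key is used here only because it needs no weights.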
Discussion / Conclusion
• Confidence markers were collected manually from
– Abstracts
– Full-text articles
• They are extended with the WordNet® resource
• They are classified into 4 categories of Type 1 and 10 categories of Type 2
• This study constitutes preliminary work: the confidence markers will easily be added to the lexico-syntactic patterns already generated for annotating genes/proteins functionally
• Annotations already present in databases could be additionally documented with confidence markers
– Gene Ontology Annotation files
– Swissprot / Uniprot
• The confidence markers can be used by curators to annotate genes/proteins through a system able to detect those qualifiers