iSimp: A Sentence Simplification System for Biomedical Text
Transcript of iSimp: A Sentence Simplification System for Biomedical Text
iSimp: A Sentence Simplification System for Biomedical Text
Yifan Peng, Catalina O. Tudor, Manabu Torii, Cathy H. Wu, and K. Vijay-Shanker
Computer & Information Sciences, University of Delaware
Oct 6, 2012
Outline
1 Introduction and motivation
2 iSimp: what to simplify, and how
3 Evaluation
4 Summary and future work
Yifan Peng, et al. iSimp: A Sentence Simplification System for Biomedical Text
Introduction
• Many text mining applications have been developed for biomedical text
• The complexity of sentences is a challenge
• iSimp simplifies the text so that existing text mining tools can be improved
• This topic is still new, though we are not the first to work on it (Miwa, 2010; Jonnalagadda, 2010; Siddharthan, 2003)
How do we extract relations?
Sentences
• PKC phosphorylates GAP-43 on serine 41.
• It was suggested that Yak1 phosphorylates Crf1 to promote its nuclear entry.
Word sequence
... ProteinA phosphorylates ProteinB ...
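A word-sequence pattern of this kind can be sketched as a regular expression. This is only an illustration, not the pattern used by any particular extraction system: the loose protein-name sub-pattern `PROTEIN` and the helper `extract_pairs` are assumptions made for this example.

```python
import re

# Hypothetical, deliberately loose protein-name pattern: a token that starts
# with a letter and may contain letters, digits, and hyphens (assumption).
PROTEIN = r"[A-Za-z][A-Za-z0-9-]*"

# Word sequence "... ProteinA phosphorylates ProteinB ..."
PATTERN = re.compile(rf"\b({PROTEIN}) phosphorylates ({PROTEIN})\b")


def extract_pairs(sentence):
    """Return the (ProteinA, ProteinB) pairs matched by the word sequence."""
    return PATTERN.findall(sentence)


print(extract_pairs("PKC phosphorylates GAP-43 on serine 41."))
print(extract_pairs(
    "It was suggested that Yak1 phosphorylates Crf1 to promote its nuclear entry."
))
```

Both example sentences match because the verb sits directly between the two protein names, which is exactly the situation the word sequence describes.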
Three alternative ways
Sentence
Raf-1 phosphorylates and activates MEK1, which in turn phosphorylates and activates the MAP kinases/extracellular signal regulated kinases, ERK1 and ERK2.
Word sequence
... ProteinA phosphorylates ProteinB ...
They are the same because of the subject-object relation
1 Design rules for all possible variations
  • There are TOO many variations
2 Improve deep representations of sentences
  • Parsers become error-prone for long and complex sentences
  • Parsers will be less efficient for long sentences
3 Simplify sentences to reduce variations
Simplification as a preprocessing module
Assume we are building a phosphorylation extraction system.
Sentence
Raf-1 phosphorylates and activates MEK1, which in turn phosphorylates and activates the MAP kinases/extracellular signal regulated kinases, ERK1 and ERK2.
Simplify
• Raf-1 phosphorylates MEK1
• MEK1 phosphorylates ERK1
• MEK1 phosphorylates ERK2
• Raf-1 activates MEK1
...
Word sequence
... ProteinA phosphorylates ProteinB ...
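The effect of simplification as a preprocessing step can be illustrated by applying one fixed word-sequence pattern to the original sentence and to the simplified sentences. The sketch below assumes protein names are already known (the `PROTEINS` lexicon is an assumption), and the simplified sentences are written out by hand from the slide; no simplifier is run here.

```python
import re

# Protein names are assumed to be known in advance, e.g. from a named-entity
# tagger; this small lexicon exists only for the example.
PROTEINS = ["Raf-1", "MEK1", "ERK1", "ERK2"]
NAME = "|".join(re.escape(p) for p in PROTEINS)

# Word sequence "... ProteinA phosphorylates ProteinB ..."
PATTERN = re.compile(rf"({NAME}) phosphorylates ({NAME})")

original = ("Raf-1 phosphorylates and activates MEK1, which in turn "
            "phosphorylates and activates the MAP kinases/extracellular "
            "signal regulated kinases, ERK1 and ERK2.")

# Simplified sentences copied from the slide (hand-written input).
simplified = [
    "Raf-1 phosphorylates MEK1",
    "MEK1 phosphorylates ERK1",
    "MEK1 phosphorylates ERK2",
    "Raf-1 activates MEK1",
]


def match_all(sentences):
    """Apply the word-sequence pattern to every sentence and collect pairs."""
    return [pair for s in sentences for pair in PATTERN.findall(s)]


print(match_all([original]))  # the pattern does not fit the complex sentence
print(match_all(simplified))  # three phosphorylation pairs are found
```

The pattern itself is unchanged between the two calls; only the input differs, which is the point of treating simplification as a separate preprocessing module.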
Outline
1 Introduction and motivation
2 iSimp: what to simplify, and how
3 Evaluation
  • iSimp accuracy
  • Improvement of recall of information extraction systems
  • Improvement of sentence ranking systems
4 Summary and future work
How does iSimp work?
Active Raf-1 phosphorylates and activates MEK1, which in turn phosphorylates and activates the MAP kinases/extracellular signal regulated kinases, ERK1 and ERK2.
The sentence is simplified one construct at a time:
1 Verb conjunction: “phosphorylates and activates” is split, giving “Raf-1 phosphorylates MEK1” and “which phosphorylates ...”
2 Relative clause: “which phosphorylates ...” becomes “MEK1 phosphorylates ...”
3 Apposition: “MEK1 phosphorylates ...” becomes “MEK1 phosphorylates ERK1 and ERK2”
4 Noun conjunction: “ERK1 and ERK2” is split, giving “MEK1 phosphorylates ERK1” and “MEK1 phosphorylates ERK2”
Simplified sentences
1 Raf-1 phosphorylates MEK1
2 MEK1 phosphorylates ERK1
3 MEK1 phosphorylates ERK2
Types of simplification constructs
• Conjunction
• Relative clause
• Apposition
• Subordinate clause
• Introductory phrase
• Parenthesized element
Conjunction, relative clause, and apposition:
• Almost all abstracts contain at least one of these three constructs
• They are challenging to detect
iSimp pipeline
1 Part-of-speech tagging & simple phrase chunking
  • Tagger and chunker are trained on the GENIA corpus*
2 Detection of complex constructions
  • 6 constructions are detected: conjunction, relative clause, apposition, etc.
3 Generation of simplified sentences
  • For each type of construct, use a proper template
* Y. Tateisi, A. Yakushiji, T. Ohta, and J. Tsujii, “Syntax annotation for the GENIA corpus,” in Procs. of the IJCNLP, Companion volume, 2005, pp. 222–227.
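The generation step can be pictured as filling one template per construct type. The sketch below is hypothetical: the function names, their arguments, and the two templates are assumptions chosen to reproduce the Raf-1 example, not iSimp's actual templates, and the detection step that would supply the arguments is not shown.

```python
def simplify_coordination(prefix, conjuncts, suffix=""):
    """Template for a coordination: repeat the shared context per conjunct."""
    return [" ".join(part for part in (prefix, conjunct, suffix) if part)
            for conjunct in conjuncts]


def simplify_relative_clause(main_clause, referred_np, clause_body):
    """Template for a relative clause: keep the main clause, then restate the
    clause with the noun phrase it refers to as its subject."""
    return [main_clause, f"{referred_np} {clause_body}"]


# Hand-supplied pieces of the example sentence (normally produced by detection).
step1 = simplify_relative_clause(
    "Raf-1 phosphorylates MEK1", "MEK1", "phosphorylates ERK1 and ERK2")
step2 = simplify_coordination("MEK1 phosphorylates", ["ERK1", "ERK2"])

print(step1)
print(step2)
```

Keeping one small function per construct mirrors the slide's "one template per type" idea and lets the output of one template feed the next.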
Detection
• Look for triggers: “and”, “which”, etc.
• Scan to the right and left of the trigger to determine the type of construct
• Use part-of-speech tags and chunking boundaries to determine the boundary of the construct
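The first two detection steps can be sketched as follows. This is a strong simplification under stated assumptions: the trigger table lists only "and" and "which" from the slide plus "or" (an assumption), the part-of-speech tags are assigned by hand using Penn Treebank conventions, and only the immediate neighbours of a trigger are inspected, whereas the real system also uses chunk boundaries.

```python
# Trigger words mapped to the construct they may signal ("or" is an assumption).
TRIGGERS = {"and": "conjunction", "or": "conjunction", "which": "relative clause"}


def find_triggers(tokens):
    """Return (index, word, candidate construct) for every trigger token."""
    return [(i, tok, TRIGGERS[tok.lower()])
            for i, tok in enumerate(tokens) if tok.lower() in TRIGGERS]


def conjunction_kind(pos_tags, i):
    """Guess what the trigger at index i coordinates from the tags of its
    left and right neighbours (NN* = noun, VB* = verb)."""
    if i == 0 or i + 1 >= len(pos_tags):
        return "unknown"
    left, right = pos_tags[i - 1], pos_tags[i + 1]
    if left.startswith("VB") and right.startswith("VB"):
        return "verb conjunction"
    if left.startswith("NN") and right.startswith("NN"):
        return "noun conjunction"
    return "unknown"


# Shortened version of the example sentence with hand-assigned tags.
tokens = ["Raf-1", "phosphorylates", "and", "activates", "MEK1", ",",
          "which", "phosphorylates", "ERK1", "and", "ERK2", "."]
pos = ["NN", "VBZ", "CC", "VBZ", "NN", ",",
       "WDT", "VBZ", "NN", "CC", "NN", "."]

for index, word, construct in find_triggers(tokens):
    kind = conjunction_kind(pos, index) if construct == "conjunction" else construct
    print(index, word, kind)
```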
Challenges – detection of simplification types
Example
• eIF2alpha dephosphorylation, GADD34 and CreP, ...
  noun phrase conjunction
• Two markers, [D16S3070 and D16S3275], ...
  apposition and noun conjunction
Criteria for apposition detection
One of the two noun phrases begins with a number, a determiner (e.g., “a”, “an”, “the”), or the words “other” or “another”
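The apposition criterion stated above can be written as a small predicate. This is a sketch under assumptions: the list of spelled-out numbers is invented for the example (a real system would more likely rely on the part-of-speech tag for numbers), and noun phrases are passed in as plain strings.

```python
DETERMINERS = {"a", "an", "the"}
OTHER_WORDS = {"other", "another"}
# Spelled-out numbers recognised by this sketch (assumption).
NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}


def starts_like_appositive(noun_phrase):
    """True if the phrase begins with a number, a determiner, or other/another."""
    words = noun_phrase.split()
    if not words:
        return False
    first = words[0].lower()
    return (first.isdigit() or first in NUMBER_WORDS
            or first in DETERMINERS or first in OTHER_WORDS)


def is_apposition(np1, np2):
    """Apply the criterion to a pair of comma-separated noun phrases."""
    return starts_like_appositive(np1) or starts_like_appositive(np2)


print(is_apposition("Two markers", "D16S3070 and D16S3275"))
print(is_apposition("eIF2alpha dephosphorylation", "GADD34 and CreP"))
```

On the two slide examples, the first pair satisfies the criterion because "Two" is a number, while the second pair does not and is therefore left as a noun phrase conjunction.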
Challenges – detection of boundary
Example
hyperglycemic clamps in carriers of a CA repeat in the IGF-I promoter and an ApaI polymorphism in the IGF-II gene...
Candidate boundaries
• noun phrase “of” [noun phrase and noun phrase]...
• [noun phrase “of” noun phrase and noun phrase]...
Criteria for noun phrase similarity
• same word
• numbers
• Greek letters (alpha, beta, ...)
• numbers followed by letters
...
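One way to read the similarity criteria is as a test that two candidate conjuncts share a word or a word class. The sketch below is an interpretation, not the system's rule set: the Greek-letter list, the stopword list, and the decision to treat any shared class as sufficient are all assumptions, and the slide does not say how the criteria are combined.

```python
import re

GREEK = {"alpha", "beta", "gamma", "delta", "epsilon", "kappa", "lambda"}
# Function words ignored by the "same word" test (assumption).
STOPWORDS = {"a", "an", "the", "of", "in"}


def word_class(word):
    """Map a word to a coarse class, or None if it has no special class."""
    if word in GREEK:
        return "GREEK"
    if re.fullmatch(r"\d+", word):
        return "NUMBER"
    if re.fullmatch(r"\d+[a-z]+", word):
        return "NUMBER+LETTERS"
    return None


def content_words(noun_phrase):
    """Lower-cased words of the phrase without stopwords."""
    return [w for w in noun_phrase.lower().split() if w not in STOPWORDS]


def similar(np1, np2):
    """True if the phrases share a content word or a word class."""
    words1, words2 = content_words(np1), content_words(np2)
    if set(words1) & set(words2):
        return True
    classes1 = {word_class(w) for w in words1} - {None}
    classes2 = {word_class(w) for w in words2} - {None}
    return bool(classes1 & classes2)


print(similar("PKC alpha", "PKC beta"))
print(similar("exon 2", "intron 5"))
print(similar("a CA repeat", "the IGF-II gene"))
```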
Results of simplification detection
• 100 abstracts from PubMed, for a total of 998 sentences
• 5 judges annotated the corpus
[Figure: bar chart of precision and recall (0% to 100%) for conjunctions, relative clauses, and appositions, comparing detection of type with detection of type+boundary. Values in the order shown in the chart, type: 100%, 100%, 93.8%, 87.9%, 93.0%, 83.3%; type+boundary: 76.8%, 88.5%, 93.8%, 85.5%, 91.3%, 83.3%.]
Improvement of recall of information extraction systems
RLIMS-P*
• Protein phosphorylation information extraction system
• Hand-coded patterns
• 1,000 Medline abstracts related to phosphorylation
• With simplification, we expect the recall to go up
• Number of pairs matched: 1,768 → 2,111 (about 20% more)
• Manual verification shows that the precision stays the same
* Z.-Z. Hu, M. Narayanaswamy, K. E. Ravikumar, K. Vijay-Shanker, and C. H. Wu, “Literature mining and database annotation of protein phosphorylation using a rule-based system,” Bioinformatics, vol. 21, no. 11, pp. 2759–2765, 2005.
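The reported gain in matched pairs can be checked with a one-line calculation. The helper name `relative_increase` is introduced here only for illustration.

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Number of matched pairs without and with simplification, from the slide.
gain = relative_increase(1768, 2111)
print(f"{gain:.1%}")  # 19.4%, which the slide rounds to 20%
```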
Improvement of sentence ranking systems
RankPref*
• Ranks sentences containing a particular gene and a relevant term
• SVM with linear kernel
• 100 gene-term pairs
• With simplification, we expect the relation between the gene and the relevant term to be clearer
• nDCG of ranked sentences containing gene and relevant term: 67% → 74% (relative improvement: 10.4%)
nDCG (normalized discounted cumulative gain) is a metric widely used in information retrieval to evaluate the quality of ranked lists.
* C. O. Tudor and K. Vijay-Shanker, “RankPref: Ranking sentences describing relation between biomedical entities with an application,” in Procs. of BioNLP in conjunction with NAACL-HLT, 2012.
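For readers unfamiliar with the metric, a minimal nDCG computation is sketched below. It uses a common formulation (gain divided by log2 of rank + 1); which variant was used in the RankPref evaluation is not stated on the slide, and the relevance scores in the example are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Made-up relevance scores for two rankings of the same three sentences.
print(ndcg([3, 2, 1]))  # ideal order
print(ndcg([1, 2, 3]))  # worst order, lower score

# Relative improvement reported on the slide: 67% -> 74%.
print(f"{(0.74 - 0.67) / 0.67:.1%}")  # 10.4%
```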
Summary
• We developed a sentence simplifier, iSimp
  • Detects six simplification structures
  • Generates simplified sentences
  • Runs efficiently in linear time
• We confirmed in experiments that iSimp helps improve text mining tools
  • Rule-based information extraction tools
  • Machine-learning-based sentence ranking tools
Future work
Analysis and development
• Improve boundary detection for conjunction constructs
• Examine the utility of iSimp for other biomedical text mining tools
• Analyze the use of simplification for different entity/concept relations
Dissemination
• Make iSimp available as a software module
• Release the benchmark corpus used in the study
Acknowledgment
• National Science Foundation (grant number 1062520)
• National Institutes of Health (grant number 1G08LM010720)
• OpenNLP for the MaxEnt tagger and chunker
• GENIA for the training corpus
• Judges who helped annotate the corpus
Q & A