Efficient biomedical literature mining

Product: BioXM Knowledge Management Environment
Applications: knowledge management and semantic data integration, research collaboration, information publishing and project management
Product contact: [email protected]

Biomax Informatics AG, Lochhamer Str. 9, 82152 Martinsried, Germany, +49 89 895574-0
Sophic Systems Alliance, Inc., 200 Main Street, Suite 201, Falmouth, MA 02536, USA, (508) 495-3801
Internet: www.biomax.com, www.sophicalliance.com

Scientists involved in disease-specific research, target-gene identification, target validation, chemical-compound development, diagnostics and treatment spend valuable time screening scientific literature. This screening comes at the expense of productive research time, and the results are often unstructured and notably incomplete.

The BioLT™ Literature Mining Tool provides an alternative. The intuitive software performs structured text mining using a number of highly curated biological and medical term dictionaries. The tool extracts relations between search terms (and their synonyms) and terms in selected dictionaries. More than 166 million pre-calculated relations and free-text search capabilities ensure comprehensive research area coverage. The resulting structured information can be easily shared, extended and updated. The results provide a starting point to generate and refine knowledge and hypotheses. The BioLT tool allows researchers to save time and produce significantly superior output compared to common PubMed searches. Integration of the BioLT tool in research infrastructures, for example the BioXM™ Knowledge Management Environment, can improve efficiencies and outcomes of R&D projects considerably.

Building a knowledge base for oncology with BioLT linguistics

The BioLT tool is the central data mining component used to create a manually curated, up-to-date index covering all cancer genes, including their compound and disease relationships. Preliminary results were published at ISMB 2005.

Biomax also offers to carry out customized text-mining projects for other disease areas and biological contexts.

The BioLT tool provides comprehensive, structured and ranked answers to the following types of questions:

• Which genes and proteins are known to be related to breast cancer? (For example, the BioLT tool presents a sorted list of about 3,000 gene/protein terms, compared to over 130,000 abstracts in PubMed.)
• For obesity, which genes show genetic variation and which variants are described (e.g., nutrigenomics and pharmacogenomics use cases)?
• Which diseases and drug compounds are potentially related to Alzheimer’s disease?

[Figure: The BioLT tool with query results for diseases related to the protein apoE]



Automatically generated expert knowledge

The BioLT tool delivers clearly structured results with extraordinary recall and precision, as shown in the following benchmark example. The BioLT results were compared to a manually curated list of "all major pathways and hereditary cancer predisposition types" each related to one of 57 representative predisposition genes (Vogelstein and Kinzler, 2004*). With 100% recall, all 57 genes and 57 cancer types were represented in the BioLT dictionaries. 95% of the relationships were ranked in the top three results of up to thousands of hits. For the remaining three genes, the corresponding diseases were found in positions four and five.
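
The benchmark figures above can be checked with a small top-k calculation. The sketch below is illustrative only: the function names are hypothetical, and the rank list is an assumed distribution that merely matches the totals quoted in the text (57 genes, three of them with the disease at position four or five). It is not the evaluation code used for the benchmark.

```python
from typing import List


def rank_of(expected: str, ranked_hits: List[str]) -> int:
    """Return the 1-based rank of the expected term, or 0 if it is absent."""
    try:
        return ranked_hits.index(expected) + 1
    except ValueError:
        return 0


def top_k_rate(ranks: List[int], k: int) -> float:
    """Fraction of benchmark pairs whose expected term is ranked in the top k."""
    found = [r for r in ranks if 1 <= r <= k]
    return len(found) / len(ranks)


# Assumed rank distribution: 54 gene/disease pairs inside the top three
# (rank 1 is used as a placeholder) and three pairs at positions four or five.
# The exact split between positions four and five is not given in the text.
ranks = [1] * 54 + [4, 4, 5]

print(f"top-3 rate: {top_k_rate(ranks, 3):.1%}")  # 54 of 57 pairs
print(f"top-5 rate: {top_k_rate(ranks, 5):.1%}")  # all 57 pairs
```

With these assumed ranks, 54 of 57 pairs fall in the top three, which rounds to the 95% quoted above, and all 57 fall in the top five.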

The BioLT tool automatically generates comprehensive results comparable to the knowledge of expert scientists. The BioLT text-mining approach works for other disease areas (such as cardiovascular, neurological and infectious diseases) and for additional biological research areas as well.

Integration into biological and medical project management

The BioLT tool uses high-quality thematic dictionaries to identify relationships between research objects. The dictionaries can be extended and customized. The following dictionaries are currently available:

• Disease: 260,000 entries
• Gene name: 130,000 human gene names, including name variants
• Compound: 82,000 entries
• Pathway: 61,000 entries
• Organism: 275,000 entries
• Other subdomains (e.g., polymorphism, therapy, tissues, cells)

These relationship data sets can be imported into the BioXM Knowledge Management Environment for further curation. With the upload, they are automatically integrated into a user-defined biological or medical context. Thus, BioLT results become part of an efficient infrastructure even for large distributed R&D projects.

* Vogelstein B and Kinzler KW (2004) Cancer genes and the pathways they control. Nat Med 10(8):789–99

Text-mining technology

In contrast to classical information retrieval systems, the BioLT software preprocesses the underlying text databases (such as scientific or patent information) with specific background information. The system first recognizes all chunks of text (phrases), special patterns for scientific notations and words belonging to terminology dictionaries. After the syntactic analysis, the system tries to determine the meaning of ambiguous terms. To ensure the most complete results, potentially false meanings are marked, but are not deleted from the knowledge database. The resulting text databases are manually curated by experts to create the thematic dictionaries used by the BioLT system.
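
The idea of recognizing dictionary terms and marking, rather than deleting, ambiguous meanings can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `DICTIONARIES` table, the `Match` record and the `tag` function are hypothetical and do not reflect the actual BioLT implementation or its dictionaries.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical mini-dictionaries: surface form -> list of (dictionary, canonical term).
# A surface form with more than one sense is treated as ambiguous.
DICTIONARIES: Dict[str, List[Tuple[str, str]]] = {
    "apoe": [("gene", "APOE")],
    "apolipoprotein e": [("gene", "APOE")],
    "alzheimer's disease": [("disease", "Alzheimer's disease")],
    "cat": [("gene", "CAT"), ("organism", "Felis catus")],
}


@dataclass
class Match:
    start: int
    end: int
    surface: str
    dictionary: str
    term: str
    ambiguous: bool  # marked, but kept in the results


def tag(text: str) -> List[Match]:
    """Find whole-word dictionary terms; keep every sense of ambiguous terms."""
    matches: List[Match] = []
    for surface, senses in DICTIONARIES.items():
        # Word-boundary guards prevent "cat" from matching inside "catalase".
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            for dictionary, term in senses:
                matches.append(
                    Match(m.start(), m.end(), m.group(0), dictionary, term, len(senses) > 1)
                )
    return sorted(matches, key=lambda x: (x.start, x.dictionary))


for match in tag("ApoE is linked to Alzheimer's disease; CAT encodes catalase."):
    print(match)
```

In this sketch both senses of "CAT" are returned with `ambiguous=True`, so a later disambiguation or curation step can decide between them without information having been discarded.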

The BioLT tool uses the BioRS™ Integration and Retrieval System to add Boolean free-text search capabilities. Diverse analysis parameters including the scope of the search, the level of precision, the resolution of terms with multiple meanings and the statistical representation of the results can be selected.
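
As a generic illustration of Boolean free-text search, the sketch below builds a small inverted index and answers AND and OR queries over it. It is not the BioRS system or its API; the function names and the three sample abstracts are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of document IDs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def search_all(index: Dict[str, Set[str]], terms: Iterable[str]) -> Set[str]:
    """Boolean AND: documents that contain every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_any(index: Dict[str, Set[str]], terms: Iterable[str]) -> Set[str]:
    """Boolean OR: documents that contain at least one term."""
    result: Set[str] = set()
    for t in terms:
        result |= index.get(t.lower(), set())
    return result


# Hypothetical sample abstracts, for illustration only.
docs = {
    "a": "APOE and Alzheimer disease",
    "b": "APOE in lipid transport",
    "c": "Obesity genetics",
}
index = build_index(docs)
print(sorted(search_all(index, ["APOE", "Alzheimer"])))
print(sorted(search_any(index, ["APOE", "obesity"])))
```

The AND query narrows the result to abstracts that mention both terms, while the OR query widens it to any abstract that mentions either one.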

[Figure: Dictionary terms in all abstracts]

[Figure: BioLT results in the context of a clinical study, displayed in the BioXM software]

Biomax, BioLT, BioRS and BioXM are registered trademarks of Biomax Informatics AG in Germany and other countries. Registered names, trademarks, etc., used in this document, even when not specifically marked as such, are not to be considered unprotected by law.

BIOLTPPR0602

FREE TRIAL

The BioLT tool for efficient text mining of the MEDLINE database is available using a common Web browser from www.biomax.com/products/biolt/biolt.htm. Contact us for a free demo account and see how the BioLT tool can speed your research.