UIMA Introduction
SHARPn Summit June 11, 2012
Outline
UIMA Terminology (not just TLAs)
Parts of a UIMA pipeline
Running a pipeline
Viewing annotations interactively
UIMA Terminology
CAS
XCAS
JCAS
View
Analysis Engine (AE) / Annotator
XML output: XCAS, XMI
Type System
JCasGen
CAS Visual Debugger (CVD)
CPE (Collection Processing Engine)
UIMA
Framework
– Defining data types
– Passing data from one component to another
Tooling
– Viewing results
– Debugging
– Editing XML visually
Data Through a Pipeline
Type System
– Defines the data types passed along
CAS (Common Analysis Structure)
– Container for the data passed along
– Created by UIMA from the Type System
Parts of a UIMA Pipeline
Collection Reader
– Read input document
Analysis Engine(s) / Annotator(s)
– Process document
CAS Consumer
– Output data
Tying a Pipeline Together
CPE descriptor (Collection Processing Engine)
– Collection Reader
– Analysis Engine(s)
– CAS Consumer
Aggregate analysis engine
– Multiple Analysis Engines and their order
Pipeline Example
UIMA term            Example
Collection Reader    Read files from a dir
Analysis Engine      Sentence detector
Analysis Engine      Tokenizer annotator
Analysis Engine      Part of Speech tagger
CAS Consumer         Output tokens to DB
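The flow in this example can be sketched in plain Java. This is a toy model only, not the UIMA API: every class name below (ToyCas, ToyAnnotation, ToyAnalysisEngine and so on) is hypothetical, the sentence and token rules are deliberately naive, the Part of Speech step is left out for brevity, and an in-memory list stands in for the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT the UIMA API.

/** A typed span over the document text. */
final class ToyAnnotation {
    final String type;
    final int begin;
    final int end;

    ToyAnnotation(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

/** Toy stand-in for a CAS: the document text plus the annotations added so far. */
final class ToyCas {
    final String text;
    final List<ToyAnnotation> annotations = new ArrayList<>();

    ToyCas(String text) {
        this.text = text;
    }

    /** Returns a copy of the annotations of one type, in insertion order. */
    List<ToyAnnotation> select(String type) {
        List<ToyAnnotation> result = new ArrayList<>();
        for (ToyAnnotation a : annotations) {
            if (a.type.equals(type)) {
                result.add(a);
            }
        }
        return result;
    }

    String coveredText(ToyAnnotation a) {
        return text.substring(a.begin, a.end);
    }
}

/** Toy stand-in for an Analysis Engine / Annotator. */
interface ToyAnalysisEngine {
    void process(ToyCas cas);
}

/** Naive rule: a sentence starts at the first non-blank character and ends at '.', '!' or '?'. */
final class ToySentenceDetector implements ToyAnalysisEngine {
    public void process(ToyCas cas) {
        int start = -1;
        for (int i = 0; i < cas.text.length(); i++) {
            char c = cas.text.charAt(i);
            if (start < 0 && !Character.isWhitespace(c)) {
                start = i;
            }
            if (start >= 0 && (c == '.' || c == '!' || c == '?')) {
                cas.annotations.add(new ToyAnnotation("Sentence", start, i + 1));
                start = -1;
            }
        }
        if (start >= 0) { // trailing text without a terminator
            cas.annotations.add(new ToyAnnotation("Sentence", start, cas.text.length()));
        }
    }
}

/** Naive whitespace tokenizer; it reads the Sentence annotations, so it must run after the detector. */
final class ToyTokenizer implements ToyAnalysisEngine {
    public void process(ToyCas cas) {
        for (ToyAnnotation s : cas.select("Sentence")) {
            int start = -1;
            for (int i = s.begin; i <= s.end; i++) {
                boolean boundary = i == s.end || Character.isWhitespace(cas.text.charAt(i));
                if (!boundary && start < 0) {
                    start = i;
                } else if (boundary && start >= 0) {
                    cas.annotations.add(new ToyAnnotation("Token", start, i));
                    start = -1;
                }
            }
        }
    }
}

final class ToyPipeline {
    /** Runs every document through the engines in list order and returns what the consumer wrote. */
    static List<String> run(List<String> documents, List<ToyAnalysisEngine> engines) {
        List<String> sink = new ArrayList<>();            // stands in for the database
        for (String doc : documents) {                    // "Collection Reader": one CAS per document
            ToyCas cas = new ToyCas(doc);
            for (ToyAnalysisEngine engine : engines) {    // "Analysis Engines", in order
                engine.process(cas);
            }
            for (ToyAnnotation t : cas.select("Token")) { // "CAS Consumer": output tokens
                sink.add(cas.coveredText(t));
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        List<ToyAnalysisEngine> engines = Arrays.asList(new ToySentenceDetector(), new ToyTokenizer());
        System.out.println(run(Arrays.asList("Family history of hyperlipidemia. No chest pain."), engines));
    }
}
```

In this sketch the tokenizer finds nothing if it is listed before the sentence detector, which is the reason an aggregate has to record the order of its engines as well as which engines it contains.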
UIMA plugin for Eclipse
Provides visual editors for descriptors
– Mini GUI for selecting options
– Rather than editing XML directly
An “Update site” exists for installing the plugin
http://www.apache.org/dist/incubator/uima/eclipse-update-site
UIMA Tooling Options
Tools:
– CPE Configurator
– CVD (CAS Visual Debugger)
Options:
– Command line scripts/.bat files
– Run within Eclipse
Running a Pipeline - CPE
cTAKES provides a script and a bat file: runctakesCPE
Choose a CPE descriptor, such as test_plaintext.xml
from cTAKESdesc/cdpdesc/collection_processing_engine
Viewing Annotations - CVD
Viewing annotations using the CVD
– Load the Type System
– Load the XCAS or XMI
Annotation Viewers
UIMA tools
– CVD (CAS Visual Debugger)
– Annotation viewer
Viewing XML output
– Any XML viewer
– Any text editor
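Because the output is ordinary XML, it can also be inspected with a few lines of code. The sketch below uses only the JDK's DOM parser, not UIMA. It assumes the usual XMI layout in which the document text sits in the sofaString attribute of a cas:Sofa element and each annotation is an element carrying begin and end offsets. The sample document is hand-written for illustration (it is not real cTAKES output), the "ex" namespace and its types are hypothetical, and only a single Sofa (view) is handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmiPeek {
    // Hand-written sample, not real pipeline output; the "ex" namespace and types are hypothetical.
    static final String SAMPLE =
          "<xmi:XMI xmlns:xmi='http://www.omg.org/XMI' xmlns:cas='http:///uima/cas.ecore'"
        + " xmlns:ex='http:///example.ecore' xmi:version='2.0'>"
        + "<cas:Sofa xmi:id='1' sofaNum='1' sofaID='_InitialView' mimeType='text'"
        + " sofaString='Family history of hyperlipidemia.'/>"
        + "<ex:Sentence xmi:id='2' sofa='1' begin='0' end='33'/>"
        + "<ex:Token xmi:id='3' sofa='1' begin='0' end='6'/>"
        + "</xmi:XMI>";

    /** Returns one line per element with begin/end offsets: "name [begin,end) covered text". */
    static List<String> listAnnotations(String xmi) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMI", e);
        }
        NodeList children = doc.getDocumentElement().getChildNodes();

        // First pass: find the document text (assumes a single Sofa).
        String text = "";
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n instanceof Element && ((Element) n).hasAttribute("sofaString")) {
                text = ((Element) n).getAttribute("sofaString");
            }
        }

        // Second pass: report every element that carries begin/end offsets.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (!(n instanceof Element)) {
                continue;
            }
            Element e = (Element) n;
            if (e.hasAttribute("begin") && e.hasAttribute("end")) {
                int begin = Integer.parseInt(e.getAttribute("begin"));
                int end = Integer.parseInt(e.getAttribute("end"));
                lines.add(e.getTagName() + " [" + begin + "," + end + ") " + text.substring(begin, end));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : listAnnotations(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

For anything beyond a quick look, the CVD and the annotation viewer remain the better choice, since they resolve types against the Type System rather than relying on attribute names.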
Supplemental slides follow
Options to Run a Pipeline
CPE GUI
CVD GUI
– Single Aggregate Analysis Engine
– No Collection Reader
Instantiate a CollectionProcessingEngine from a CpeDescription and invoke its process() method
uimaFIT
– removes dependency on XML
Creating a New Annotator
Within Eclipse
– Create Java project
– Right click -> Add UIMA Nature
– Add UIMA jars to .classpath (Build Path)
– Create Analysis Engine (AE) descriptor
– Add types to AE descriptor, or optionally create separate Type System descriptor
– Write code!
Running an AE in CVD
Using CVD to run an Analysis Engine
– No Collection Reader
– Single Analysis Engine (can be an aggregate)
– No CAS Consumer
– Load an Analysis Engine
– Paste/type in text to process
Family history of hyperlipidemia.
Modifying a parameter
UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.
Links
Getting started with UIMA http://uima.apache.org/doc-uima-annotator.html
UIMA Update site for use in Eclipse http://www.apache.org/dist/incubator/uima/eclipse-update-site