Module 13 -- OCR Full Text and PDF Conversion
Kofax Technical Training Page 1
Module 13: OCR Full Text and PDF Conversion
Slide 2
OCR Full Text and PDF Conversion
• OCR Full Text definition
• OCR Full Text configuration
   • Batch class setup
   • Document class setup
   • Export setup
• PDF properties and output
• PDF configuration
   • Batch class setup
   • Document class setup
   • Export setup
Slide 3
OCR Full Text Definition
• OCR Full Text
   • Uses Optical Character Recognition to convert image pixels into text
   • Allows inclusion of pictures and graphics
   • Creates an electronically active, searchable document in a variety of formats:
      • Comma Separated Values - .csv
      • Text - .txt
      • Rich Text Format - .rtf
      • HTML - .mht
      • Word - .doc
      • Excel - .xls
Slide 4
Application
• OCR Full Text uses:
   • Legal profession (case documents)
   • Medical documents
   • Scientific documents
   • Government documents
   • Historical documents
   • Technical manuals
   • Financial records
   • Any kind of printed document that needs to be archived and made electronically active or searchable
• OCR Full Text is not intended for data extraction.
Slide 5
Required Steps for OCR Full Text Setup
• There are three required steps for setting up OCR Full Text within the Administration module:
1) Add OCR Full Text to the batch class workflow (Batch Class properties, Queues tab)
2) Set the properties for OCR Full Text (Document Class, OCR tab)
3) Export the OCR Full Text-generated documents (export connector)
Slide 6
Batch Class Setup
Add OCR Full Text to the batch class workflow
Slide 7
Document Class Properties OCR Tab
Set the properties for OCR Full Text. [Enable OCR full text] must be checked
An optional dictionary file may be used to add words that are not in the OCR dictionary
Slide 8
Dictionary Files
• A text file with the following format:
   • One value per line
   • Up to 32 characters per line
   • Up to 1000 words total
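The dictionary limits above are easy to check before a file is attached to a recognition profile. The following Python sketch is a hypothetical pre-flight check, not a Kofax tool: the function names are illustrative, and it assumes the file is UTF-8 text and that blank lines are ignored.

```python
from pathlib import Path

MAX_CHARS_PER_LINE = 32  # limit from the slide
MAX_WORDS = 1000         # limit from the slide


def check_dictionary(lines):
    """Return a list of problems found in dictionary lines (empty if none)."""
    problems = []
    # Assumption: blank lines are ignored and surrounding whitespace is trimmed.
    words = [line.strip() for line in lines if line.strip()]
    if len(words) > MAX_WORDS:
        problems.append(f"{len(words)} words; at most {MAX_WORDS} allowed")
    for number, word in enumerate(words, start=1):
        if len(word) > MAX_CHARS_PER_LINE:
            problems.append(
                f"entry {number} has {len(word)} characters; "
                f"at most {MAX_CHARS_PER_LINE} allowed"
            )
    return problems


def check_dictionary_file(path):
    """Read a dictionary file (assumed UTF-8) and check it."""
    text = Path(path).read_text(encoding="utf-8")
    return check_dictionary(text.splitlines())
```

For example, `check_dictionary_file("custom_terms.txt")` (a hypothetical file name) returns an empty list when the file stays within the limits.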
Slide 9
Recognition Profile - Edit
The first page of a document may be skipped, for example, when separator sheets are used
A recognition profile’s properties are available by clicking [Edit]
Slide 10
Advanced OCR Full Text
The Non-natural language option may be selected; the valid characters can then be defined
Select a language
Slide 11
Languages Supported
More than 170 language choices are available
Slide 12
Mark Level – Spell Check
Mark level is the confidence percentage below which recognized characters are flagged
Spell check may be selected, and a character may be entered that will be placed before misspelled words
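To make the mark level and spell-check settings concrete, the sketch below models their effect on recognized text. It is an illustration only, not how Kofax implements them: the `(character, confidence)` input, the 0-100 percentage scale, the marker placement, and the `~` spell-check marker are all assumptions.

```python
def mark_low_confidence(chars, mark_level, marker="^"):
    """Insert `marker` before each character whose confidence is below mark_level.

    chars: list of (character, confidence) pairs, confidence in percent (0-100).
    Characters exactly at the mark level are not flagged.
    """
    out = []
    for ch, confidence in chars:
        if confidence < mark_level:
            out.append(marker)
        out.append(ch)
    return "".join(out)


def mark_misspelled(words, known_words, marker="~"):
    """Prefix each word missing from known_words (lowercase set) with `marker`."""
    return " ".join(marker + w if w.lower() not in known_words else w for w in words)
```

With a mark level of 80, the recognized characters `c` (95%), `a` (60%), `t` (99%) become `c^at`; raising the mark level flags more characters for review.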
Slide 13
Advanced and Output Options are Available
Advanced options may be edited
Slide 14
Advanced Options – Attributes
To help the OCR engine process the text, the amount by which the text is rotated can be specified
Converting color or grayscale images to bitonal should improve performance
Text type and character type may be selected
Apply the Low resolution mode setting to improve results for low-resolution documents
Slide 15
Advanced Options -- Elements
Table options may be selected
If the document does not include pictures or barcodes, disabling these options should improve performance
Text block forces the recognition engine to treat the recognition zone as a text block. The other settings for detecting tables, pictures, and barcodes are then unavailable
Slide 16
Advanced Options -- Operation
These settings balance performance against accuracy
Select this check box to ensure that small text is recognized clearly
Slide 17
Output Options
Output options may be edited
Slide 18
Output Formats
Six output formats are available, with different options for each format
Slide 19
Plain Text Options
Plain Text options may be selected
Slide 20
Rich Text Options
Text attributes and color may be selected
Pictures may be removed, and the resolution level may be set
Slide 21
HTML (.mht) Options
Pictures may be removed, and the resolution level may be set
For best results, set JPEG Quality between 85 and 100
Text attributes and color may be selected
Slide 22
Word Options
Text attributes and color may be selected
Pictures may be removed, and the resolution level may be set
Slide 23
Comma-Separated Options
A comma (,) is the default separator; however, other characters may be used
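When the CSV output is consumed by another program, the reader must use the same separator chosen here. A minimal sketch with Python's standard `csv` module, assuming the export follows ordinary CSV quoting rules; the function name and sample data are made up for illustration.

```python
import csv
import io


def read_ocr_csv(text, separator=","):
    """Parse CSV text exported by OCR Full Text into a list of rows.

    `separator` must match the character chosen in the Comma-Separated options.
    When reading from a file instead, open it with newline="" as the csv docs advise.
    """
    return list(csv.reader(io.StringIO(text), delimiter=separator))
```

For example, `read_ocr_csv("Invoice;Total\n1001;25.00\n", separator=";")` yields `[["Invoice", "Total"], ["1001", "25.00"]]`; reading the same text with the default comma would leave each line as a single field.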
Slide 24
Excel Options
Text options may be selected
Slide 25
“Save As” to Save Changes
When changes are made, use the [Save As] command button to save the profile under a new name
Slide 26
Recognition Profile Selected
The new profile may be selected
Slide 27
Text Export Connector Setup
Export the OCR Full Text-generated documents
Select OCR output, and then enter the directory path or browse to it
Select an output format if the image files will also be exported
Slide 28
Processing
• Create a batch based on the OCR Full Text batch class
• Scan or import the batch of documents
• Process the batch through the OCR Full Text queue
• Export the OCR Full Text documents
Slide 29
OCR Full Text Processing
Batch name, class, and progress are displayed
Document information is displayed
An event log is displayed with auto-scroll enabled
A batch is opened, processed, and closed automatically
Slide 30
OCR Full Text Files – Word Format
Documents are exported in the format selected
Slide 31
OCR Output File - Word
This file can be opened easily with Microsoft Word
Notice the “^” indicating a questionable character
Slide 32
Required Steps for PDF File Conversion Setup
• There are three required steps for setting up PDF file conversion within the Administration module:
1) Add PDF Generator to the batch class workflow (Batch Class properties, Queues tab)
2) Set the properties for PDF generation (Document Class, PDF tab)
3) Export the PDF documents (export connector)
Slide 33
Adding Queues
Add the PDF Generator queue to the workflow.
PDF Generator has no user-definable properties to examine.
Slide 34
PDF Properties
Enable PDF generation for this document class.
Remember: the PDF Generator queue has already been added to the batch class properties.
Set the properties for PDF generation. [Enable Kofax PDF generation] must be checked
Slide 35
Dictionary File
An optional dictionary file can improve OCR results. Create it in a text editor such as Windows Notepad. The guidelines are:
• Up to 32 characters per word
• One entry per line
• Up to 1000 words total
• Save in *.txt format
Slide 36
Multiple PDF Output Types are Available
• PDF output options include: PDF Image Only, PDF Image + Text, and PDF Text Under Image.
• PDF Text Under Image is similar to the Kofax PDF Image + Text profile, but differs as follows:
   • Offers improved speed
   • Sets the default “Page content” selection for the output format to Text Under Image, instead of Text Over Image
   • Sets the default resolution to 72 dpi, instead of 200 dpi
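The resolution difference matters because pixel count grows with the square of the dpi. A worked example in Python for a US Letter page (an example page size chosen for illustration, not a Kofax setting):

```python
def page_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


# US Letter page (8.5 x 11 inches), used here only as an example size.
low = page_pixels(8.5, 11, 72)    # Text Under Image default
high = page_pixels(8.5, 11, 200)  # Image + Text default
print(low, high)                  # (612, 792) (1700, 2200)

# Pixel count scales with dpi squared: (200 / 72) ** 2 is about 7.7.
ratio = (high[0] * high[1]) / (low[0] * low[1])
print(f"{ratio:.1f}x more pixels at 200 dpi")  # 7.7x more pixels at 200 dpi
```

Uncompressed image data scales with the pixel count, so the 72 dpi default carries far less image data per page; the final PDF size also depends on the compression settings described below.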
Slide 37
Select a Recognition Profile
Click [Edit] to examine or change the profile properties
Select the recognition profile. Options include: PDF Image Only, PDF Image + Text, and PDF Text Under Image.
Remember: Kofax PDF Image + Text and PDF Text Under Image require an additional license.
Slide 38
Output Options – PDF Image Only
There is very little to edit in this profile: only an attached Image Cleanup profile and the Output options.
Slide 39
PDF Image Only Settings
Resolution is specified in dots per inch (dpi). Define the compression format for the output file; formats include JPEG, CCITT, and JPEG 2000. JPEG Quality is inversely related to compression: the higher the quality, the less compression, and vice versa.
PDF Version is selectable: Auto, 1.3, 1.4, 1.5, 1.6, or 1.7
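To confirm which version a generated file declares, read its header: a PDF file begins with a line such as `%PDF-1.4`. A minimal Python sketch (the function names are illustrative); note that PDF 1.4 and later allow a `/Version` entry in the document catalog to override the header, which this simple check does not inspect.

```python
import re
from pathlib import Path

# Matches the version in a header such as b"%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(data):
    """Return the header version string (e.g. "1.4") from PDF bytes, or None."""
    # The header belongs on the first line, but many readers tolerate leading
    # bytes, so only the first 1024 bytes are searched.
    match = _HEADER.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def pdf_file_version(path):
    """Read the start of a PDF file and return its header version."""
    with Path(path).open("rb") as f:
        return pdf_header_version(f.read(1024))
```

With PDF Version set to Auto, this is a quick way to see which version was actually written for a given batch.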
Slide 40
Image Compression Profiles and PDF Compression
• PDF file compression is available and defined by PDF image compression profiles.
• PDF Compression reduces storage and bandwidth requirements.
• PDF image compression profiles are a convenient way to specify image compression settings for a process.
• Kofax Capture supports the Kofax PDF image compression profile.
Slide 41
Image Compression Profile Settings
When setting up a PDF profile, PDF Compression can be configured by using an Image Compression profile.
Remember: PDF Compression requires additional licensing.
Slide 42
Sensitivity, Background Quality, Picture Detection
• Sensitivity
   • High settings preserve faint data; low settings suppress textured backgrounds. A high setting increases quality but increases file size; a low setting reduces quality and reduces file size.
• Background Quality
   • The background is rendered as a compressed image, and this parameter controls the quality of this image. A high setting increases quality but decreases compression, increasing file size; a low setting reduces quality but increases compression, reducing file size.
• Picture Detection
   • Pictures are rendered as compressed images. This option enables detection and adjusts picture quality and file size. A high setting increases quality but increases file size; a low setting reduces quality and reduces file size.
Slide 43
Recognition Profile – PDF Image + Text & Text Under Image
Click [Edit] to examine or change the profile properties
Slide 44
Output Options – PDF Image + Text & Text Under Image
Advanced settings are available
Use the text entry field below the check box to specify the characters that are valid within a word.
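The valid-character setting decides whether a string such as `e-mail` becomes one searchable word or two in the text layer. The sketch below is a simplified model, not Kofax's algorithm: it assumes letters and digits are always word characters and that the characters entered in the field are added to them; `split_words` and `extra_word_chars` are hypothetical names.

```python
def split_words(text, extra_word_chars="-'"):
    """Split text into words made of letters/digits plus extra_word_chars."""
    words, current = [], []
    for ch in text:
        if ch.isalnum() or ch in extra_word_chars:
            current.append(ch)  # still inside a word
        elif current:
            words.append("".join(current))  # a non-word character ends the word
            current = []
    if current:
        words.append("".join(current))
    return words
```

With `-` and `'` allowed, `"e-mail isn't spam."` splits into `e-mail`, `isn't`, and `spam`; with no extra characters, `e-mail` splits into `e` and `mail`, so a search for the whole term may not match.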
Slide 45
PDF Image + Text & Text Under Image – Advanced Settings
A variety of advanced settings are available
Slide 46
PDF Image + Text & Text Under Image – Output Settings
Output settings are also available
Slide 47
PDF Image + Text & Text Under Image – Output Options
Resolution is specified in dots per inch (dpi). JPEG Quality is inversely related to compression: the higher the quality, the less compression, and vice versa.
PDF Version is selectable
Text attributes can be set
Choose whether the text layer is placed above or below the image layer
Slide 48
PDF Image + Text & Text Under Image – PDF Compression
PDF Compression can be configured using an Image Compression profile.
Remember: PDF Compression requires additional licensing.
Slide 49
PDF/A Support
• PDF/A features:
   • Preserves visual appearance over time.
   • Independent of the tools and systems used for creating or storing the file.
   • 100% self-contained: all information necessary for reproducing the file is embedded.
   • Available for the PDF Image Only, PDF Image + Text, and PDF Text Under Image formats.
Select between Standard PDF and PDF/A output formats.
Slide 50
PDF Header Information
Data can be added to the PDF file header.
Click [OK] to save.
Slide 51
Setting the PDF Output File Destination
Specify where the PDF documents are exported by setting up the export folder.
The image files can be ignored if a PDF file is detected at export.
Click [OK] to save
Slide 52
Processing
• Create a batch based on a batch class that includes the PDF Generator
• Scan or import the batch of documents
• Process the batch through the PDF Generator queue
• Export the PDF documents
Slide 53
Demonstration
OCR Full Text and PDF Conversion
Slide 54
Lab
• OCR Full Text and PDF conversion
Refer to the Kofax Capture 10 Lab and Reference Guide