
Module 13 -- OCR Full Text and PDF Conversion

Kofax Technical Training Page 1

Module 13: OCR Full Text and PDF Conversion

Slide 2

OCR Full Text and PDF Conversion

• OCR Full Text definition

• OCR Full Text configuration

  • Batch class setup

  • Document class setup

  • Export setup

• PDF properties and output

• PDF configuration

  • Batch class setup

  • Document class setup

  • Export setup

Slide 3

OCR Full Text Definition

• OCR Full Text

  • Uses Optical Character Recognition to convert pixels to text

  • Allows inclusion of pictures and graphics

  • Creates an active document in a variety of formats:

    • Comma Separated Values - .csv

    • Text - .txt

    • Rich Text Format - .rtf

    • HTML - .mht

    • Word - .doc

    • Excel - .xls
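For scripts that pick up the exported files, the six formats above can be kept in a lookup table. This is a minimal Python sketch; `OUTPUT_FORMATS` and `output_file_name` are hypothetical names, and the assumption that an output file is named after its document is illustrative only, not a documented Kofax Capture naming rule.

```python
# Hypothetical lookup table of the six OCR Full Text output formats and the
# file extension each one produces.
OUTPUT_FORMATS = {
    "Comma Separated Values": ".csv",
    "Text": ".txt",
    "Rich Text Format": ".rtf",
    "HTML": ".mht",
    "Word": ".doc",
    "Excel": ".xls",
}


def output_file_name(document_name: str, output_format: str) -> str:
    """Append the extension for `output_format` to a (hypothetical) document name."""
    try:
        extension = OUTPUT_FORMATS[output_format]
    except KeyError:
        raise ValueError(f"unsupported output format: {output_format!r}") from None
    return document_name + extension
```

For example, `output_file_name("Invoice_0001", "Word")` returns `"Invoice_0001.doc"`.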

Slide 4

Application

• OCR Full Text uses:

  • Legal profession (case documents)

  • Medical documents

  • Scientific documents

  • Government documents

  • Historical documents

  • Technical manuals

  • Financial records

  • Any kind of printed document that needs to be archived and made electronically active or searchable

• OCR Full Text is not intended for data extraction.

Slide 5

Required Steps for OCR Full Text Setup

• There are three required steps for setting up OCR Full Text within the Administration module:

1) Add OCR Full Text to the batch class workflow (Batch Class properties, Queues tab)

2) Set the properties for OCR Full Text (Document Class, OCR tab)

3) Export the OCR Full Text-generated documents (export connector)

Slide 6

Batch Class Setup

Add OCR Full Text to the batch class workflow

Slide 7

Document Class Properties OCR Tab

Set the properties for OCR Full Text. [Enable OCR full text] must be checked.

A dictionary support file may be used to add words that are not in the OCR dictionary.

Slide 8

Dictionary Files

• A text file with the following format:

  • One value per line

  • Up to 32 characters per line

  • Up to 1000 words
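A dictionary file can be checked against these limits before it is attached to a document class. The following is a minimal Python sketch, assuming the file is saved as UTF-8 and that blank lines can be ignored; `validate_dictionary` and `validate_dictionary_file` are hypothetical helpers, not part of Kofax Capture.

```python
from pathlib import Path

MAX_ENTRY_LENGTH = 32  # up to 32 characters per line
MAX_ENTRIES = 1000     # up to 1000 words per file


def validate_dictionary(text: str) -> list[str]:
    """Return the problems found in dictionary-file text; an empty list means valid."""
    problems = []
    # One value per line; blank lines are skipped (an assumption of this sketch).
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries (limit {MAX_ENTRIES})")
    for number, entry in enumerate(entries, start=1):
        if len(entry) > MAX_ENTRY_LENGTH:
            problems.append(
                f"entry {number} has {len(entry)} characters (limit {MAX_ENTRY_LENGTH})"
            )
    return problems


def validate_dictionary_file(path: Path) -> list[str]:
    """Check for a .txt extension, then validate the contents (read as UTF-8, assumed)."""
    problems = [] if path.suffix.lower() == ".txt" else [f"{path.name} is not a .txt file"]
    return problems + validate_dictionary(path.read_text(encoding="utf-8"))
```

If the file was saved by Notepad in a legacy ANSI encoding, pass the matching encoding to `read_text` instead.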

Slide 9

Recognition Profile - Edit

The first page of a document may be skipped, for example when separator sheets are used.

A recognition profile’s properties are available by clicking [Edit].

Slide 10

Advanced OCR Full Text

Select a language.

Alternatively, select the Non-natural language option and then define the valid characters.

Slide 11

Languages Supported

More than 170 language choices are available.

Slide 12

Mark Level – Spell Check

Mark level is the confidence percentage below which recognized characters are flagged.

Spell check may be selected, and a character may be entered that will be placed before misspelled words.
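The two settings can be illustrated with a small Python sketch. It assumes the engine reports a confidence from 0 to 100 for each character and that the flag character is inserted immediately before the flagged character; the Word output example later in this module shows "^" marking a questionable character, but the exact placement here is an assumption. `apply_mark_level`, `mark_misspelled`, and the "#" spell-check marker are hypothetical.

```python
def apply_mark_level(recognized, mark_level, marker="^"):
    """Rebuild text from (character, confidence) pairs, placing `marker`
    before every character whose confidence (0-100) is below `mark_level`."""
    pieces = []
    for char, confidence in recognized:
        if confidence < mark_level:
            pieces.append(marker)
        pieces.append(char)
    return "".join(pieces)


def mark_misspelled(text, known_words, marker="#"):
    """Place `marker` before every word not found in `known_words`.
    Splitting on whitespace collapses runs of spaces and ignores punctuation;
    this is a sketch, not a spell checker."""
    return " ".join(
        word if word.lower() in known_words else marker + word
        for word in text.split()
    )
```

With a mark level of 60, `apply_mark_level([("c", 95), ("a", 40), ("t", 90)], 60)` returns `"c^at"`: only the "a" falls below the threshold.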

Slide 13

Advanced and Output Options are Available

Advanced options may be edited

Slide 14

Advanced Options – Attributes

To help the OCR engine process the text, the amount by which the text is rotated can be specified.

Converting color or grayscale images to bitonal should enhance performance.

Text type and character type may be selected.

Apply the Low resolution mode setting to improve results for low-resolution documents.

Slide 15

Advanced Options -- Elements

Table options may be selected

If the document does not include pictures or barcodes, disabling these options should enhance performance.

Text block forces the recognition engine to treat the recognition zone as a text block. The other settings for detecting tables, pictures, and barcodes are then unavailable.

Slide 16

Advanced Options -- Operation

Settings allow for balancing performance against accuracy.

A check box is available to ensure that small text is clearly recognized.

Slide 17

Output Options

Output options may be edited

Slide 18

Output Formats

Six output formats are available, with different options for each format.

Slide 19

Plain Text Options

Plain Text options may be selected

Slide 20

Rich Text Options

Text attributes and color may be selected.

Pictures may be removed, and the resolution level may be set.

Slide 21

HTML (mht) Options

Pictures may be removed, and the resolution level may be set.

For best results, set JPEG Quality between 85 and 100.

Text attributes and color may be selected.

Slide 22

Word Options

Text attributes and color may be selected.

Pictures may be removed, and the resolution level may be set.

Slide 23

Comma-Separated Options

A comma (,) is the default separator; however, other characters may be used.
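Any program that consumes the CSV output must parse it with the same separator configured here. A minimal Python sketch using the standard `csv` module; `read_ocr_csv` is a hypothetical helper, and it assumes the output follows common CSV quoting conventions, which this module does not confirm.

```python
import csv
import io


def read_ocr_csv(text: str, separator: str = ",") -> list[list[str]]:
    """Parse CSV text written with `separator` (the csv module requires a single character)."""
    return list(csv.reader(io.StringIO(text), delimiter=separator))
```

To read an exported file directly, open it with `newline=""` and pass the file object to `csv.reader` with the same `delimiter`.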

Slide 24

Excel Options

Text options may be selected

Slide 25

“Save As” to Save Changes

When changes are made, use the [Save As] command button to save the profile under a new name.

Slide 26

Recognition Profile Selected

The new profile may be selected

Slide 27

Text Export Connector Setup

Export the OCR Full Text-generated documents.

Select OCR output, and enter or browse to the directory.

Select an output format if the image files will also be exported.

Slide 28

Processing

• Create a batch based on the OCR Full Text batch class

• Scan or import the batch of documents

• Process batch through the OCR Full Text queue

• Export the OCR Full Text documents

Slide 29

OCR Full Text Processing

Batch name, class and progress are displayed

Document information is displayed.

An event log is displayed with auto-scroll enabled.

A batch is opened, processed, and closed automatically.

Slide 30

OCR Full Text Files – Word Format

Documents are exported in the format selected

Slide 31

OCR Output File - Word

This file can be opened easily with Microsoft Word.

Notice the “^” indicating a questionable character.

Slide 32

Required Steps for PDF File Conversion Setup

• There are three required steps for setting up PDF file conversion within the Administration module:

1) Add PDF Generator to the batch class workflow (Batch Class properties, Queues tab)

2) Set the properties for PDF generation (Document Class, PDF tab)

3) Export the PDF documents (export connector)

Slide 33

Adding Queues

Add the PDF Generator queue to the workflow.

PDF Generator has no user-definable properties to examine.

Slide 34

PDF Properties

Enable PDF generation for this document class.

Remember, we have already added the PDF Generator queue to the batch class properties.

Set the properties for PDF generation. [Enable Kofax PDF generation] must be checked.

Slide 35

Dictionary File

An optional dictionary file can improve OCR results. Create it in a text editor such as Windows Notepad. The guidelines are:

• Up to 32 characters per word

• One entry per line

• Up to 1000 words total

• Save in *.txt format

Slide 36

Multiple PDF Output Types are Available

• PDF output options include PDF Image Only, PDF Image + Text, and PDF Text Under Image.

• PDF Text Under Image is similar to the Kofax PDF Image + Text profile, but differs as follows:

  • Offers improved speed

  • Sets the default “Page content” selection for the output format to Text Under Image, instead of Text Over Image

  • Sets the default resolution to 72 dpi, instead of 200 dpi
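The lower default resolution means far less image data per page, if this setting governs the page image written to the PDF. A quick Python calculation, assuming a US Letter page (8.5 x 11 inches): pixel count grows with the square of the resolution, so a 200 dpi page carries roughly 7.7 times as many pixels as a 72 dpi page. `page_pixels` is a hypothetical helper for illustration.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels in a page image of the given size (inches) and resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


letter_200 = page_pixels(8.5, 11, 200)  # 1700 x 2200 pixels
letter_72 = page_pixels(8.5, 11, 72)    # 612 x 792 pixels
print(f"{letter_200:,} vs {letter_72:,} pixels, ratio {letter_200 / letter_72:.1f}")
```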

Slide 37

Select a Recognition Profile

Click [Edit] to examine or change the profile properties.

Select the recognition profile. Options include PDF Image Only, PDF Image + Text, and PDF Text Under Image.

Remember: Kofax PDF Image + Text and PDF Text Under Image require an additional license.

Slide 38

Output Options – PDF Image Only

There is very little to edit with this profile: only an attached Image Cleanup profile and Output options.

Slide 39

PDF Image Only Settings

Resolution is in dots per inch (dpi). Define the compression format for the output file; formats include JPEG, CCITT, and JPEG 2000. JPEG Quality relates to compression: the higher the quality, the lower the compression, and vice versa.

PDF Version is selectable: Auto, 1.3, 1.4, 1.5, 1.6, or 1.7
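After export, the version actually written can be checked from the file header, which begins with `%PDF-` followed by the version number (for example `%PDF-1.4`). This is useful when the Auto setting is selected. A minimal Python sketch; `pdf_header_version` is a hypothetical helper. Note that from PDF 1.4 onward the document catalog may carry a `/Version` entry that overrides the header, which this sketch does not read.

```python
import re
from typing import Optional

# A PDF file starts with a header line such as "%PDF-1.7".
_PDF_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version in the PDF header (for example "1.4"), or None if absent."""
    match = _PDF_HEADER.match(data)
    return match.group(1).decode("ascii") if match else None
```

Typical use reads only the first bytes of an exported file, for example `pdf_header_version(f.read(16))` on a file opened in `"rb"` mode.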

Slide 40

Image Compression Profiles and PDF Compression

• PDF file compression is available and defined by PDF image compression profiles.

• PDF Compression reduces storage and bandwidth requirements.

• PDF image compression profiles are a convenient way to specify image compression settings for a process.

• Kofax Capture supports the Kofax PDF image compression profile.

Slide 41

Image Compression Profile Settings

When configuring a PDF profile, PDF Compression can be configured using an Image Compression profile.

Remember: PDF Compression requires additional licensing.

Slide 42

Sensitivity, Background Quality, Picture Detection

• Sensitivity

  • High settings preserve faint data; low settings suppress textured backgrounds. A high setting increases quality but also increases file size; a low setting reduces both quality and file size.

• Background Quality

  • The background is rendered as a compressed image, and this parameter controls the quality of that image. A high setting increases quality but decreases compression, increasing file size; a low setting reduces quality but increases compression, reducing file size.

• Picture Detection

  • Pictures are rendered as compressed images. This option enables picture detection and adjusts picture quality and file size. A high setting increases quality but also increases file size; a low setting reduces both quality and file size.

Slide 43

Recognition Profile – PDF Image + Text & Text Under Image

Click [Edit] to examine or change the profile properties.

Slide 44

Output Options – PDF Image + Text & Text Under Image

Advanced settings are available

Use the text entry field below the check box to specify the characters that are valid in a word.

Slide 45

PDF Image + Text & Text Under Image – Advanced Settings

A variety of advanced settings are available

Slide 46

PDF Image + Text & Text Under Image – Output Settings

Output settings are also available

Slide 47

PDF Image + Text & Text Under Image – Output Options

Resolution is in dots per inch (dpi). JPEG Quality relates to compression: the higher the quality, the lower the compression, and vice versa.

PDF Version is selectable.

Text attributes can be set.

Determine whether the text layer is above or below the image layer.

Slide 48

PDF Image + Text & Text Under Image – PDF Compression

PDF Compression can be configured using an Image Compression profile.

Remember: PDF Compression requires additional licensing.

Slide 49

PDF/A Support

PDF/A features:

• Preserves visual appearance over time.

• Independent of the tools and systems used for creating or storing the file.

• 100% self-contained: all information necessary for reproducing the file is embedded.

• Available for the PDF Image Only, PDF Image + Text, and PDF Text Under Image formats.

Select between Standard PDF and PDF/A output formats.

Slide 50

PDF Header Information

Data can be added to the PDF file header.

Click [OK] to save.

Slide 51

Setting the PDF Output File Destination

Specify where the PDF documents are exported by setting up the export folder.

The image files can be ignored if a PDF file is detected at export.

Select [OK] to save
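The option to ignore image files when a PDF is detected can be pictured as a simple filter over each document's files. This is a minimal Python sketch under stated assumptions: the set of image extensions and the `files_to_export` helper are hypothetical, and the sketch does not describe how the export connector is actually implemented.

```python
from pathlib import Path

# Assumed image extensions for this sketch.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def files_to_export(document_files: list[Path], ignore_images_if_pdf: bool = True) -> list[Path]:
    """Return the files to copy to the export folder for one document."""
    has_pdf = any(f.suffix.lower() == ".pdf" for f in document_files)
    if ignore_images_if_pdf and has_pdf:
        # A PDF was generated for this document, so leave the page images behind.
        return [f for f in document_files if f.suffix.lower() not in IMAGE_SUFFIXES]
    return list(document_files)
```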

Slide 52

Processing

• Create a batch based on a batch class that includes the PDF Generator

• Scan or import the batch of documents

• Process the batch through the PDF Generator queue

• Export the PDF documents

Slide 53

Demonstration

OCR Full Text and PDF Conversion

Slide 54

Lab

• OCR Full Text and PDF conversion

Refer to the Kofax Capture 10 Lab and Reference Guide