Institut für Print- und Medientechnik der TU Chemnitz [Institute for Print and Media Technology...

Post on 28-Mar-2015


Institut für Print- und Medientechnik der TU Chemnitz [Institute for Print and Media Technology • Chemnitz University of Technology]

Director: Prof. Dr. Arved C. Hübler • Reichenhainer Str. 70 • 09126 Chemnitz • Germany

http://www.tu-chemnitz.de/pm • pmhuebler@mb.tu-chemnitz.de • Tel: +49-371-531-2364 • Fax: -3780

Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Stefan Pletschacher; Marcel Eckert; Arved C. Hübler

2 Pletschacher • Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing • ELPUB 2006

Digitization of Historical Documents

GEB1150


Alphabet and Font Extraction

[Figure: an XML instance with two parts. The alphabet and font definition holds the glyphs (glyph ID1, glyph ID2, ...); the content refers to these glyphs by ID (ID1 ID2 ID3 ID3 ID4).]
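The split between a glyph definition part and an ID-based content part can be illustrated with a minimal sketch. The element and attribute names (`document`, `alphabet`, `glyph`, `content`, `ref`) are hypothetical and chosen only for this example; the slides do not name the actual schema.

```python
import xml.etree.ElementTree as ET

def build_instance(glyph_paths, content_ids):
    """Build an XML instance: one glyph definition per ID, content as ID references.

    All element and attribute names are hypothetical, for illustration only.
    """
    root = ET.Element("document")
    alphabet = ET.SubElement(root, "alphabet")
    for glyph_id, path_data in glyph_paths.items():
        # Each distinct glyph shape is stored exactly once.
        ET.SubElement(alphabet, "glyph", {"id": glyph_id, "d": path_data})
    content = ET.SubElement(root, "content")
    for glyph_id in content_ids:
        # The running text only points at glyph IDs.
        ET.SubElement(content, "ref", {"glyph": glyph_id})
    return root

glyphs = {"ID1": "M0 0 L10 0 L10 10 Z", "ID2": "M0 0 L5 10 L10 0 Z"}
doc = build_instance(glyphs, ["ID1", "ID2", "ID1"])
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Because a repeated glyph costs only one more reference, the size of the content part grows with the text length while the definition part grows only with the number of distinct shapes.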


Vectorization - Raster to Vector Conversion

[Figure: conversions between three representations: bitmap graphic, encoded text (e.g. ASCII, 41 hex) and vector font. Labelled steps: OCR, font assignment, RIP, vectorization.]


DIA System and Workflow

[Figure: document image → region-based segmentation → blocks → block classification → textual blocks and image blocks, plus structural information. Textual blocks → segmentation → text lines → segmentation → words → segmentation → characters. Example page layout: 1. text (headline), 2. bitmap image, 3. text block, 4. text block.]


DIA System and Workflow

[Figure: character images → clustering → set of prototypes → classification of vectorisable glyphs. Vectorisable glyphs → vectorisation → set of vectorised paths → transformation to SVG → set of SVG glyph descriptions → assignment of private Unicode code points (e.g. &#xE000;) → document specific SVG font. Non vectorisable images → ID assignment → set of bitmap symbols.]
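The assignment of private Unicode code points can be sketched as follows. The Private Use Area of the Basic Multilingual Plane spans U+E000 to U+F8FF; handing out code points sequentially in prototype order is an assumption made for this example, not something the slides specify.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area

def assign_private_code_points(glyph_ids):
    """Map each glyph ID to the next free private-use code point (assumed order)."""
    mapping = {}
    for offset, glyph_id in enumerate(glyph_ids):
        code_point = PUA_START + offset
        if code_point > PUA_END:
            raise ValueError("more glyphs than BMP private-use code points")
        mapping[glyph_id] = code_point
    return mapping

def as_character_reference(code_point):
    """Format a code point as an XML numeric character reference."""
    return f"&#x{code_point:X};"

mapping = assign_private_code_points(["ID1", "ID2", "ID3"])
print({gid: as_character_reference(cp) for gid, cp in mapping.items()})
```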


DIA System and Workflow

[Figure: structural information, image blocks, the set of bitmap symbols and the document specific SVG font are combined through encoding and references into an XML + SVG encoded document. Further labels: OCR, XML, layout modification by means of XSLT, specific output formats. Example page layout: 1. text (headline), 2. bitmap image, 3. text block, 4. text block.]


Vectorization Approaches

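• Contour based

$\mathrm{Core} := \{\, x \in \mathrm{Comp} : N(x) \subseteq \mathrm{Comp} \,\}$

$\mathrm{Cont} := \{\, x \in \mathrm{Comp} : x \notin \mathrm{Core} \,\}$

• Skeleton based

$S := \{\, x \in \mathrm{Comp} : \exists\, y, z \in \mathrm{Cont}(\mathrm{Comp}),\ y \neq z,\ \mathrm{dist}(x, \mathrm{Cont}(\mathrm{Comp})) = \mathrm{dist}(x, y) = \mathrm{dist}(x, z) \,\}$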
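For the contour-based approach, the split of a component Comp into Core and Cont can be sketched on a pixel set. The sketch assumes that N(x) is the 4-neighbourhood of x; the slides do not say which neighbourhood is used.

```python
def core_and_contour(comp):
    """Split a component into core and contour pixels.

    comp: set of (row, col) black pixels of one connected component.
    Assumption: N(x) is the 4-neighbourhood of x.
    """
    def neighbours(p):
        r, c = p
        return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]

    # Core pixels have all their neighbours inside the component.
    core = {p for p in comp if all(q in comp for q in neighbours(p))}
    # Contour pixels are the component pixels that are not core pixels.
    contour = comp - core
    return core, contour

block = {(r, c) for r in range(3) for c in range(3)}
core, contour = core_and_contour(block)
print(sorted(core), len(contour))
```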


Applied Algorithms

• Pre-processing
  - Finding connected components (Region Growing)
  - Contour extraction (Contour following)

• Polygonal Approximation Based on Relaxation
  - Phase 1: Clustering of polygonal points
  - Phase 2: Relaxation (Error correction)

• Automatic Parameter Control
  - Rasterization of the resulting glyph images
  - Ascertaining a weighted error (Ground Truth)
  - Selecting appropriate vectorization parameters


Finding Connected Components

Ü Ö Ä % “ !


Region Growing
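A minimal region-growing sketch for finding connected components in a binary glyph image is given below. It assumes 8-connectivity and a breadth-first growth order; both are illustrative choices rather than details taken from the slides.

```python
from collections import deque

def connected_components(image):
    """Return the connected components of a binary image.

    image: list of rows, 1 = black pixel, 0 = white pixel.
    Returns a list of sets of (row, col). Assumes 8-connectivity.
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or (r, c) in seen:
                continue
            # Grow a new region from this unvisited black seed pixel.
            region = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(region)
    return components

# An "i"-like shape: the dot and the stem are separate components.
glyph = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]
parts = connected_components(glyph)
print(len(parts))
```

Glyphs that consist of several parts, such as a letter with a dot or diaeresis, come out as more than one component and have to be regrouped in a later step.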


Contour Following

[Figure legend: white pixel, black pixel, starting point, examination order]
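Contour following can be sketched with Moore-neighbour tracing, which is assumed here as a stand-in because the slides do not name the exact variant. The starting point is the topmost, leftmost black pixel, and the examination order is clockwise around the current pixel, beginning at the last white pixel that was examined.

```python
# Clockwise Moore neighbourhood as (row, col) offsets, starting at west.
OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def _step(comp, current, backtrack):
    """Find the next contour pixel by scanning clockwise from the backtrack direction."""
    for i in range(1, 8):
        d = (backtrack + i) % 8
        cand = (current[0] + OFFSETS[d][0], current[1] + OFFSETS[d][1])
        if cand in comp:
            # The neighbour examined just before cand is white; it becomes
            # the backtrack pixel, expressed relative to cand.
            p = (backtrack + i - 1) % 8
            white = (current[0] + OFFSETS[p][0], current[1] + OFFSETS[p][1])
            return cand, OFFSETS.index((white[0] - cand[0], white[1] - cand[1]))
    return None  # isolated pixel

def trace_contour(comp):
    """Return the outer contour of a component as an ordered pixel list.

    comp: set of (row, col) black pixels of one 8-connected component.
    """
    start = min(comp)  # topmost row, then leftmost column; its west neighbour is white
    first = _step(comp, start, 0)
    if first is None:
        return [start]
    contour = [start]
    state = first
    for _ in range(8 * len(comp)):  # upper bound on distinct (pixel, direction) states
        contour.append(state[0])
        state = _step(comp, *state)
        if state == first:  # about to repeat the first move: the loop is closed
            break
    if contour[-1] == start:
        contour.pop()  # drop the closing duplicate of the starting point
    return contour

square = {(r, c) for r in range(3) for c in range(3)}
outline = trace_contour(square)
print(outline)
```

The trace stops when it is about to repeat its first move, which avoids stopping too early on thin strokes that pass through the starting point more than once.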


Clustering of Polygonal Points


Relaxation


SVG Representation
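As an illustration of how vectorised polygons could be written out, the sketch below emits an SVG 1.1 font with one `glyph` element per private-use code point. The `font`, `font-face` and `glyph` elements and their attributes come from the SVG 1.1 font module; the function names, the font ID and the units-per-em value are assumptions for this example. Note that SVG glyph coordinates have the y axis pointing up, so image coordinates would normally need to be flipped first, which the sketch leaves out.

```python
import xml.etree.ElementTree as ET

def polygon_to_path(points):
    """Turn a closed polygon [(x, y), ...] into SVG path data with linear segments."""
    head, *tail = points
    parts = [f"M{head[0]} {head[1]}"] + [f"L{x} {y}" for x, y in tail] + ["Z"]
    return " ".join(parts)

def build_svg_font(glyph_polygons, font_id="docfont", units_per_em=1000):
    """glyph_polygons: {code_point: [polygon, ...]}; returns SVG text (hypothetical layout)."""
    svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg"})
    defs = ET.SubElement(svg, "defs")
    font = ET.SubElement(defs, "font", {"id": font_id, "horiz-adv-x": str(units_per_em)})
    ET.SubElement(font, "font-face",
                  {"font-family": font_id, "units-per-em": str(units_per_em)})
    for code_point, polygons in glyph_polygons.items():
        # Outer and inner contours of one glyph go into a single path.
        d = " ".join(polygon_to_path(p) for p in polygons)
        ET.SubElement(font, "glyph", {"unicode": chr(code_point), "d": d})
    return ET.tostring(svg, encoding="unicode")

svg_text = build_svg_font({0xE000: [[(0, 0), (10, 0), (10, 10)]]})
print(svg_text)
```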


Visual Quality


Formal Quality Measurement - Ground Truth

Error function
- absolute number of wrong pixels
- weighted by the distance to the nearest true component
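One possible reading of this error function is sketched below. It compares the original glyph bitmap (ground truth) with the re-rasterized vector glyph, counts the wrong pixels, and weights each wrong pixel by its Euclidean distance to the nearest ground-truth pixel that has the same value. This weighting rule and the brute-force distance search are assumptions for illustration; the slides do not give the exact formula.

```python
import math

def weighted_error(truth, result):
    """Return (number of wrong pixels, distance-weighted error).

    truth, result: equally sized lists of rows with values 0 (white) or 1 (black).
    Assumed weighting: distance from a wrong pixel to the nearest ground-truth
    pixel that carries the value the wrong pixel was given.
    """
    rows, cols = len(truth), len(truth[0])
    by_value = {0: [], 1: []}
    for r in range(rows):
        for c in range(cols):
            by_value[truth[r][c]].append((r, c))

    wrong, weighted = 0, 0.0
    for r in range(rows):
        for c in range(cols):
            value = result[r][c]
            if value == truth[r][c]:
                continue
            wrong += 1
            targets = by_value[value]
            if targets:
                weighted += min(math.hypot(r - tr, c - tc) for tr, tc in targets)
            else:
                # No such pixel exists in the ground truth: use the image diagonal.
                weighted += math.hypot(rows, cols)
    return wrong, weighted

truth = [[1, 0, 0]]
near = weighted_error(truth, [[1, 1, 0]])
far = weighted_error(truth, [[1, 0, 1]])
print(near, far)
```

Both test images contain exactly one wrong pixel, but the one farther from the true component receives the larger weighted error.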


Results


Adaptive Parameter Control

[Plot: accuracy (0.3 to 1) versus vectorization parameter ε (0 to 1.2) for the glyphs H, K and d]

[Plot: accuracy gradient (-5 to 2) versus vectorization parameter ε (0 to 1.2) for the glyphs H, K and d]
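How a parameter could be selected from such accuracy curves is sketched below. The rule used here, taking the largest ε reached before the accuracy gradient falls below a threshold, is an assumption for illustration and not the selection rule from the slides. The sample values and the threshold are made up for the example and are not measured results.

```python
def gradients(samples):
    """Finite-difference accuracy gradient between consecutive (epsilon, accuracy) samples."""
    return [(a1 - a0) / (e1 - e0)
            for (e0, a0), (e1, a1) in zip(samples, samples[1:])]

def select_epsilon(samples, min_gradient=-1.0):
    """Pick the largest epsilon before accuracy starts to drop steeply.

    samples: list of (epsilon, accuracy), sorted by epsilon.
    min_gradient: hypothetical threshold on the accuracy gradient.
    """
    chosen = samples[0][0]
    for i, g in enumerate(gradients(samples)):
        if g < min_gradient:
            break
        chosen = samples[i + 1][0]
    return chosen

# Made-up accuracy samples for one glyph, for illustration only.
samples = [(0.1, 0.95), (0.2, 0.94), (0.3, 0.93), (0.4, 0.70)]
best = select_epsilon(samples)
print(best)
```

A larger ε generally means fewer polygon points, so a rule of this kind trades a small loss in accuracy for a more compact glyph description.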


Compression rates


Conclusions

• Good vectorization results are already achieved with linear primitives
• High compression rates can be achieved
• Extracted fonts can be easily scaled and further formatted
• Known vectorization methods have been extended towards an adaptive system for automatic parameter control
• These methods can be applied to the preservation and handling of unknown typefaces in digitized documents
• Originals may be re-encoded using a document specific alphabet and font
• Direct integration into XML/SVG based processes is possible
• Various output formats can be supported by means of XSL transformations


Thank you very much!

stefan.pletschacher@mb.tu-chemnitz.de

Questions