SCSIT Talk, Nottingham University, Thursday 16th June 2005
FRE 2645
Indexing of Graphic Document Images : a Perceptive Approach
Mathieu Delalandre¹,²
Thursday 16th June 2005
¹ PSI Laboratory, Rouen University, France
² SCSIT, Nottingham University, UK
Who am I?
Mathieu Delalandre
Thesis: fourth year of PhD (defence in September)
Lab: PSI Laboratory, Rouen, France
Supervisors: E. Trupin, J.M. Ogier, J. Labiche
Team: S. Adam, H. Locteau, P. Héroux, E. Barbu, Y. Lecourtier
Field: Document Image Analysis (Graphics Recognition)
Postdoc: IPI, SCSIT, from April to September (4-5 months) with Tony Pridmore
Outline: Introduction, Systems Overview, The Knowledge Level, Conclusion
Introduction: Indexing & Retrieval (I & R)
Indexing & Retrieval [Greengrass’00]
Indexing: identification and recording of attributes of data that will aid retrieval.
Retrieval: ability of a database management system to get back data that were stored there previously.
Applications: videos (MPEG, AVI, …), Web pages (XML, XHTML, …), structured documents (PDF, PS, Word, …), images (JPG, GIF, …)
-Indexing & Retrieval (I & R)
-Categorization of Images
-I & R of Document Images
-My Topic
Introduction: Categorization of Images
[Figure: categorization of images. Labels: document images (trademark, logo, heading, journal, manual); photographs; foreground/background images]
Introduction: I & R of Document Images (1/3)
[Figure: composition of Web pages. Images vs markup languages (HTML, XHTML, …): 30% / 70%. Within images, document images (logos, headings, …) vs photographs: 60% / 40%]
Today, document images are not indexed by search engines, due to the complexity of the Document Image Analysis (DIA) task [Doerman’98][Walker’00][Baird’03]
Is indexing of document images really needed? Two questions.
Question: how many document images are there, and where? [Spring’95] [Cleveland’98] [Steve’99] [Ouf’01] [Baird’03] [Hu’04]
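[Figure: the Web (8×10^15 ko) and the Deep Web; shares shown: 0.3% / 99.3%. The Deep Web holds digital libraries (a large, or main, part) and others (software, databases, …). Digital libraries hold document images and structured documents, with the labels "minor part" and "main part"]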
Introduction: I & R of Document Images (2/3)
Question: new or just old document images?
Paper (and image) has too many desirable properties; document images and structured documents will increasingly co-exist in the future [Breul’04]
Introduction: I & R of Document Images (3/3)
To conclude:
(1) DIA is needed (and will remain needed) for the future of I & R of documents [Baird’03] [Breul’04]
(2) DIA must come back today by way of I & R [Baird’03]
Introduction: My Topic
Indexing of graphic document images
Indexing & Retrieval → Indexing: identification and recording of attributes of data that will aid retrieval; the first step before retrieval
Document images → graphic document images: line drawing, symbol, logo, Asian script, historical heading
Systems Overview: Introduction
Overview of systems that index graphic document images: we call them Graphics Indexing Systems.
Graphics Indexing Systems are specialised from DIA systems applied to the recognition and understanding of graphic document images [Tombre’03]: we call those Graphics Recognition Systems.
-Introduction
-Graphics Recognition Systems
-Graphics Indexing Systems
-Open Problems
Systems Overview: Graphics Recognition Systems (1/3)
Applications deal with the graphics parts (symbol and linear): text/graphics segmentation [Tombre’02], vectorisation [Mejbri’02], symbol recognition [Llados’02], document interpretation (or understanding) [Ablameko’00], …
[Figure labels: symbol, linear, text]
Graphics Recognition Systems: graphic document images → structured documents
Systems Overview: Graphics Recognition Systems (2/3)
Graphics are structured and connected
Graphics Recognition Systems are based on structural methods: “relational organization of low-level features (graphic primitives) into higher-level structures (graph)” [Tombre’96] [Shi’89]
[Figure: a symbol and its structure, and a connected symbol in a drawing. Low-level features (graphic primitives): line, connect point, T link. Higher-level structure (graph): line, connect edge, T edge. The graph feeds symbol recognition]
Systems Overview: Graphics Recognition Systems (3/3)
Graphic Primitive Extraction, some methods [Wenyin’98] [Delalandre’04]: skeletonization [Hilaire’04], contouring [Ramel’00], tracking [Song’00], labelling [Badawy’02], transform [Couasnon’01], meshes [Vaxiviere’95], region segmentation [Cao’00], run-length [Burge’98], …
Recognition: Graph Matching [Bunke’00], Graph Transform [Blostein’05], Primitive Matching [Foggia’99], …
Architecture of Graphics Recognition Systems :
[Figure: document images → Graphic Primitive Extraction → graph of graphic primitives → Recognition (using Graphic Models) → structured document, e.g. <network><part id="1"><symbols><labels></labels></symbols></part></network>]
Systems Overview: Graphics Indexing Systems (1/3)
Graphics Indexing Systems [Doerman’98] [Tombre’03], 3 classes:
Title block recognition [Arias’98], [Najman’01], [Lamiroy’02], …
Statistical framework [Samet’96], [Worring’99], [Tabbone’03], [Terrades’03], …: connected symbols are not matched
Graphics indexing [Kasturi’88], [Lorenz’95], [Huang’97], [Hu’97], [Barbu’04], [Valasoulis’04], …: partial matching
Systems Overview: Graphics Indexing Systems (2/3)
Architecture of Graphics Indexing Systems:
[Figure: Graphic Primitive Extraction → graph of graphic primitives → Indexing (indexing attributes, specific set of graphic primitives) → index (attributes + document links) → document links]
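The index in this architecture pairs indexing attributes with document links. A minimal sketch of that idea as an inverted index, assuming (hypothetically) that each document has already been reduced to a set of string attributes. The names `build_index` and `retrieve` and the sample attributes are illustrative and are not taken from any of the cited systems.

```python
from collections import defaultdict

def build_index(documents):
    """Map each indexing attribute to the set of document links that carry it."""
    index = defaultdict(set)
    for link, attributes in documents.items():
        for attribute in attributes:
            index[attribute].add(link)
    return index

def retrieve(index, query_attributes):
    """Return the links of documents that carry every queried attribute."""
    sets = [index.get(a, set()) for a in query_attributes]
    if not sets:
        return set()
    return set.intersection(*sets)

# Hypothetical attributes, as if produced by a graphic primitive extraction step.
documents = {
    "plan_01.tif": {"loop", "T-junction", "long-line"},
    "plan_02.tif": {"loop", "arc"},
    "logo_07.tif": {"arc", "long-line"},
}
index = build_index(documents)
matches = retrieve(index, ["loop", "long-line"])
```

Indexing runs once per document, while retrieval is a set intersection per query, which is why approximate but cheap attributes suit large databases.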
Systems Overview: Graphics Indexing Systems (3/3)
Works | Graphic Primitives Extraction | Graph of Graphic Primitives | Indexing
[Huang’97] | thinning and chaining | line graph of skeleton | cycle search, width and length matching of lines
[Kasturi’88] | run length encoding and polygonisation | straight line graph of contours and skeleton | Fourier approximation of line graph
[Lorenz’95] | contouring and polygonisation | 2-D strings of contours | string matching
[Barbu’04] | thinning and neighbour analysis of skeleton’s pixels | region adjacency graph | graph mining
[Hu’04] | thinning, chaining, and polygonisation | set of straight lines of skeleton | string matching
[Dosh’04] | thinning, chaining, and polygonisation | set of straight lines of skeleton | vectorial signature
[Figure labels: thinning, contouring, region graph, skeleton graph; statistical, structural]
Systems Overview: Open Problems (1/2)
All these systems use a Lexical/Syntactic (or Bottom/Up) approach [Tombre’96]
Lexical (Bottom): extraction of graphical primitives from images in a fixed way
Syntactic (Up): analysis of graphical primitives without returning to the image
So, all these systems use a Document Understanding approach, but I & R is not an Understanding problem
Criterion | Understanding | I & R | Issue
Image Size | large | small and medium | complexity
Data Base Size | small | large | complexity
Process Execution | one shot | every time | complexity
Graphic Primitives | accurate | approximated | robustness
Noise Level | high and medium | low and medium | robustness
Prior Knowledge | yes | no | content adaptation
Document Class | few and known | several and unknown | content adaptation
Content adaptation is the most important feature of I & R systems
Systems Overview: Open Problems (2/2)
Examples of content adaptation:
A broad class of documents: region based [Roque’03], both based [Ramel’00], line based [Hilaire’04]
Context: text/graphics segmentation, noise adaptation
To conclude:
I & R must deal with content adaptation
Content adaptation can’t be solved without a knowledge-based approach
The Knowledge Level: Introduction
Some (general) definitions [Tuthill’90] [Holsapple’04]
Knowledge: human mental grasp of reality
Representation: placement (and meaning) of knowledge into (from) computer memory
Formalism: a set of symbols corresponding to knowledge inside computers
[Figure: Knowledge (human) and Formalism(s) (computer), linked by placement and meaning (human/computer)]
Different types of knowledge: on strategies [], on case based reasoning [], on ontologies [], …
-Introduction
-Graphical Knowledge
-Graphics Model
-a Perceptive Approach
The Knowledge Level: Graphical Knowledge (1/2)
Graphical Knowledge [Delalandre’05]: a type of knowledge corresponding to the human mental grasp of graphics
Levels of Graphical Knowledge
[Figure: abstraction levels from image to symbol (perception, interpretation) set against formalism levels: pixel-based and vector-based formalisms (graphic primitives), graph-based formalisms (high-level objects). Annotation: "it is a gate!"]
The Knowledge Level: Graphical Knowledge (2/2)
Two formalism levels [Tombre’96]
Graphic Primitives [Murray’96]
Pixel-based formalism: pixel, raster, run, connected component, …
Vector-based formalism: vector, arc, curve, ellipse, square, …
Graph-based formalisms [Sowa’99]: Relational Attributed Graphs (RAG), Frames, Object-Oriented Languages, …
[Figure: Relational Attributed Graphs [Seong’93]; labels: primitives, line, images]
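A Relational Attributed Graph can be sketched as attributed nodes (graphic primitives) joined by attributed edges (relations). The class below is a hypothetical minimal structure for illustration only; the attribute names (`kind`, `length`, `relation`) are assumptions and do not reproduce the formalism of [Seong’93].

```python
from dataclasses import dataclass, field

@dataclass
class RelationalAttributedGraph:
    """Nodes and edges both carry free-form attribute dictionaries."""
    nodes: dict = field(default_factory=dict)   # node id -> attributes
    edges: dict = field(default_factory=dict)   # (id, id) -> attributes

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = attributes

    def add_edge(self, a, b, **attributes):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both end nodes must exist before linking them")
        # Store undirected edges under a sorted key so (a, b) == (b, a).
        self.edges[tuple(sorted((a, b)))] = attributes

    def neighbours(self, node_id, relation=None):
        """Nodes linked to node_id, optionally filtered by relation type."""
        found = []
        for (a, b), attrs in self.edges.items():
            if node_id in (a, b) and relation in (None, attrs.get("relation")):
                found.append(b if a == node_id else a)
        return sorted(found)

# Three lines: l2 meets l1 in a T-junction and l3 in an L-junction.
rag = RelationalAttributedGraph()
rag.add_node("l1", kind="line", length=40)
rag.add_node("l2", kind="line", length=25)
rag.add_node("l3", kind="line", length=25)
rag.add_edge("l1", "l2", relation="T-junction")
rag.add_edge("l2", "l3", relation="L-junction")
```

Here the nodes are vector-level primitives and the edges are their spatial relations, so the same structure could sit on top of either pixel-based or vector-based primitives.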
The Knowledge Level: Graphics Model (1/2)
Model [Seguela’01]: a knowledge representation using given formalisms, for a given system’s purposes
Graphics Model [Delalandre’05]: a model used to represent graphical knowledge
[Figure: a (simple) shape and its graphic primitives (extremity, junction, line); line based model (junction edge, line); junction based model (extremity, junction, line edge)]
The Knowledge Level: Graphics Model (2/2)
One system = one model: a considerable number of models [Joseph’92] [Pasternak’93] [Han’94] [Burgue’95] [Yu’97] [Lee’98] [Ramel’00] [Couasnon’01] [Badawy’02] [Yan’04] …
Models depend on the extracted graphic primitives; we can define a graphics model taxonomy with 3 classes [Delalandre’05]
Region based models: component, loop, neighbour, include
Contour based models: quadrilateral, line link, junction link
Skeleton based models: extremity, junction, line, edge
The Knowledge Level: a Perceptive Approach (1/6)
[Figure: perception levels of representations, from global to local: Region Level, Contour Level, Skeleton Level]
Two links between levels: specialisation and aggregation
The Knowledge Level: a Perceptive Approach (2/6)
[Figure: perception levels of representations, from global to local (Region Level, Contour Level, Skeleton Level), showing classic models, hybrid models, and the perceptive approach (jump or browse)]
The Knowledge Level: a Perceptive Approach (3/6)
First step, the region level: connected component analysis [Alnuweiri’92]
[Figure: foreground and background; the foreground’s components; the background’s components, split into the main background and loops]
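A minimal sketch of this region-level step, under stated assumptions: the binary image is a list of strings (`#` foreground, `.` background), 4-connectivity is used for both classes for simplicity, and a background component counts as main background if it touches the image border and as a loop otherwise (a hypothetical rule chosen for illustration). This is a plain flood-fill labelling, not the parallel algorithms surveyed in [Alnuweiri’92].

```python
from collections import deque

def components(image, value):
    """Label 4-connected components of pixels equal to `value` (flood fill)."""
    height, width = len(image), len(image[0])
    seen = set()
    found = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != value or (y, x) in seen:
                continue
            queue, pixels = deque([(y, x)]), []
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == value and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            found.append(pixels)
    return found

def touches_border(pixels, height, width):
    """True if any pixel of the component lies on the image border."""
    return any(y in (0, height - 1) or x in (0, width - 1) for y, x in pixels)

# A hollow box (one loop inside) plus one isolated foreground pixel.
image = [
    ".........",
    ".#####...",
    ".#...#.#.",
    ".#####...",
    ".........",
]
foreground = components(image, "#")
background = components(image, ".")
h, w = len(image), len(image[0])
loops = [c for c in background if not touches_border(c, h, w)]
main_background = [c for c in background if touches_border(c, h, w)]
```

On this toy image the box and the isolated pixel form two foreground components, and the three pixels enclosed by the box form the only loop.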
The Knowledge Level: a Perceptive Approach (4/6)
Six features:
(F) Foreground
(B) Background
(R) Resolution (i.e. distance)
(N) Neighbouring
(S) Size
(I) Inclusion
The Knowledge Level: a Perceptive Approach (5/6)
Use-Case Queries
[Figure: query results on a starting image: FR1, FR2, BR2, BR2S2, BR2S2N2]
The Knowledge Level: a Perceptive Approach (6/6)
True-Life Query
[Figure: results of the queries FS1 and BR2 N>2]
Conclusion
Conclusion: this is just a bibliographic study and some ideas. Start work on these ideas?
Perspectives: contour and skeleton levels? A system to control the building of the representation?