Writer identification through information retrieval Ralph Niels, Franc Grootjen & Louis Vuurpijl.
August 21st, 2008, ICFHR, Montreal
A search engine for forensic experts
Overview
• Forensic writer identification
• Prototypical shapes in handwriting
• Information retrieval (IR)
  – Traditional
  – Writer identification using prototypes
• Experiments
  – Method
  – Results
• Conclusions & future work
Forensic writer identification
Forensic information retrieval
• Web search: a query of words is used to search documents containing words
• Forensic search: a query of characters is used to search documents containing characters
• Previous work*: sub-character level, binary features
• This work: based on characters, which improves the possibilities for justification
* A. Bensefia, T. Paquet, and L. Heutte. A writer identification and verification system. Pattern Recogn. Letters, 26(13):2080–2092, 2005.
Forensic information retrieval
• Dictionary of character shapes: prototypes
  – Experts use prototypes
  – Describe query & documents by prototype usage
[Figure: instances of a prototype]
Prototypes
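Character to prototype matcher
• Find the most similar prototype for each character
  – Each character is labeled with its letter and prototype number, e.g.: W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12 (…)
[Figure: example prototypes for the letter a: a5, a9, a16, a52, (…)]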
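The matching step can be sketched in Python as a nearest-prototype search followed by a frequency count. This is a minimal sketch under stated assumptions: each segmented character comes with its letter, prototypes are named by letter plus number (as in a9 above), and only prototypes of the same letter are compared. The names `label_characters`, `prototype_frequencies` and `mean_point_distance` are hypothetical, and the distance is a simple stand-in, not the matcher used by the authors.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]

def mean_point_distance(a: Trajectory, b: Trajectory) -> float:
    """Stand-in distance (assumption): average Euclidean distance between
    corresponding points of two trajectories resampled to the same length."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def label_characters(
    chars: List[Tuple[str, Trajectory]],
    prototypes: Dict[str, Trajectory],
    distance: Callable[[Trajectory, Trajectory], float] = mean_point_distance,
) -> List[str]:
    """Label each (letter, trajectory) pair with its most similar prototype.

    Prototype names are assumed to be a letter followed by a number (e.g. "a9");
    only prototypes of the character's own letter are candidates."""
    labels = []
    for letter, trajectory in chars:
        candidates = [name for name in prototypes
                      if name.rstrip("0123456789") == letter]
        best = min(candidates,
                   key=lambda name: distance(trajectory, prototypes[name]))
        labels.append(best)
    return labels

def prototype_frequencies(labels: List[str]) -> Counter:
    """af: how often each prototype occurs among the labeled characters."""
    return Counter(labels)
```

Applied to all characters of one writer, the resulting counter plays the role of that writer's prototype-usage vector af(w); applied to the query characters, it gives af(q).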
Prototypes
• Averaged shapes of real handwritten characters
• Dynamic Time Warping (DTW) distance to find the most similar prototype
R. Niels, L. Vuurpijl, and L. Schomaker. Automatic allograph matching in forensic writer identification. International Journal of Pattern Recognition and Artificial Intelligence, 21(1):61–81, February 2007.
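A DTW distance between two pen trajectories can be sketched with the textbook dynamic programming recurrence, using Euclidean distance between points. This is an illustrative assumption: the function name `dtw_distance` is hypothetical, and the DTW variant in the cited paper may impose additional constraints that this plain version omits.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Plain DTW: minimal summed point distance over all monotonic alignments."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because points may be repeated in the alignment, two trajectories that trace the same shape at different speeds get distance 0, which is why DTW suits online handwriting better than a fixed point-by-point comparison.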
Prototypes
The IR model for writer identification
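[Figure: IR model. Writer input and query input each pass through the character to prototype matcher (using the prototype list), giving af(w) and af(q); indexing turns af(w) into aw(w); matching compares af(q) with each aw(w), producing a ranked list with justification.]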
Indexing: create weighted vectors
• Vector of prototype usage for each writer: af(w)
• Adjust the weight of prototypes in that vector:
  – Prototypes used by many writers are not distinctive -> lower weight
  – wf(p) = number of writers using prototype p
• Weighted vector of prototype use for each writer
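$$\mathrm{iwf}(p) = \log_2 \frac{n}{\mathrm{wf}(p)}$$
$$\mathrm{aw}_p(w) = \mathrm{af}_p(w) \cdot \mathrm{iwf}(p)$$
where n is the number of writers in the database.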
The IR model for writer identification
[Figure: IR model diagram (repeated), annotating af(q) as the prototype frequency in the query]
Matching
• Input
  – ‘Database writers’: indexed writer vectors aw(w)
  – ‘Query writer’: vector af(q)
• Match
  – Calculate the cosine of the angle between af(q) and each aw(w)
• Output
  – Ranked list of writers (by similarity to the query)
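The matching step can be sketched as a cosine similarity over sparse vectors stored as dictionaries, followed by a sort. The names `cosine` and `rank_writers` are hypothetical; only the use of the cosine of the angle comes from the slide.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_writers(af_q: Vector,
                 aw_by_writer: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Database writers sorted by similarity to the query, most similar first."""
    scores = [(writer, cosine(af_q, aw)) for writer, aw in aw_by_writer.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because the cosine ignores vector length, a writer is not ranked higher merely because more of their characters are in the database.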
Justification
• Similarity value (cosine of the angle)
• Prototype contribution to the retrieval result
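The slides do not define how a prototype's contribution is computed. One natural choice, sketched here as an assumption, splits the cosine into per-prototype terms af_p(q) · aw_p(w) / (|af(q)| |aw(w)|); these terms add up exactly to the similarity value. The function name `prototype_contributions` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]

def prototype_contributions(af_q: Vector, aw_w: Vector) -> List[Tuple[str, float]]:
    """Per-prototype share of cos(af(q), aw(w)), largest first.

    Only prototypes present in both vectors can contribute."""
    norm = (math.sqrt(sum(x * x for x in af_q.values()))
            * math.sqrt(sum(x * x for x in aw_w.values())))
    if norm == 0.0:
        return []
    shared = [p for p in af_q if p in aw_w]
    contributions = [(p, af_q[p] * aw_w[p] / norm) for p in shared]
    return sorted(contributions, key=lambda item: item[1], reverse=True)
```

Sorting by contribution gives the expert a short list of the prototypes that drove a match, which can then be checked against the actual character shapes.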
Justification
• The forensic expert can inspect the justification further
Experiment
• 43 writers from the plucoll database
  – Online data
  – Segmented into characters
• How well does our technique perform given a certain amount of data (characters)?
  – Number of characters per writer in the database (d)
  – Number of characters in the query (q)
Experiment
• Pick d random letters from each database writer
• Pick q other random letters from one writer, and use those as the query
• Find the most similar writer
  – Prototypes
  – iwf(p), aw(w)
  – Matching
• Vary d and q
• Repeat 10 times for each writer
Repeat 10 times for each combination of d and q
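The protocol above can be sketched as a loop that samples disjoint database and query characters and scores whether the top-ranked writer is correct. This is a sketch under assumptions: `rank` is a hypothetical callable standing in for the full pipeline (prototype matching, iwf/aw indexing, cosine matching), and counting a hit only when the correct writer is ranked first is our reading of "find most similar writer".

```python
import random
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical signature: rank(database, query) -> writer ids, most similar first
RankFn = Callable[[Dict[str, List[Any]], List[Any]], List[str]]

def identification_rate(chars_by_writer: Dict[str, Sequence[Any]], d: int, q: int,
                        rank: RankFn, repeats: int = 10, seed: int = 0) -> float:
    """Percentage of queries whose top-ranked writer is the true writer."""
    rng = random.Random(seed)
    hits = trials = 0
    for query_writer in chars_by_writer:
        for _ in range(repeats):
            database: Dict[str, List[Any]] = {}
            query: List[Any] = []
            for writer, chars in chars_by_writer.items():
                # Sampling d + q characters without replacement keeps the query
                # characters disjoint from that writer's database characters.
                n_pick = d + q if writer == query_writer else d
                picked = rng.sample(list(chars), n_pick)
                database[writer] = picked[:d]
                if writer == query_writer:
                    query = picked[d:]
            if rank(database, query)[0] == query_writer:
                hits += 1
            trials += 1
    return 100.0 * hits / trials
```

Calling this once per combination of d and q yields one percentage per cell of a results table.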
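Results
Percentage of correctly identified writers, by number of characters in the query (q, rows) and per database writer (d, columns):

q \ d    100    300    500    1000
10        59     79     83      88
30        86     97     99     100
50        94     99    100     100
70        96    100    100     100
100       98    100    100     100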
Conclusions & future work
• Needed for 100%: 70 characters in the query (q), 300 characters per database writer (d)
• Average English sentence: 75–100 characters
• No black box: results are justified
• Online data: applicable in forensic practice?
  – Extract semi-automatically with the help of an expert
  – Use an offline matching technique
• Just 43 writers
  – Larger experiments (more writers & more techniques) planned
• Promising results