Writer identification through information retrieval Ralph Niels, Franc Grootjen & Louis Vuurpijl.

Writer identification through information retrieval Ralph Niels, Franc Grootjen & Louis Vuurpijl August 21st, 2008 ICFHR, Montreal


Writer identification through information retrieval

Ralph Niels, Franc Grootjen & Louis Vuurpijl

August 21st, 2008, ICFHR, Montreal

A search engine for forensic experts

Overview

• Forensic writer identification
• Prototypical shapes in handwriting
• Information retrieval (IR)
  – Traditional
  – Writer identification using prototypes
• Experiments
  – Method
  – Results
• Conclusions & future work

Forensic writer identification

Forensic information retrieval

• Web search: query of words to search in documents containing words

• Forensic search: query of characters to search in documents containing characters

• Previous work*: sub-character level, binary features
• Based on characters: improves justification possibilities

* A. Bensefia, T. Paquet, and L. Heutte. A writer identification and verification system. Pattern Recogn. Letters, 26(13):2080–2092, 2005.

Forensic information retrieval

• Dictionary of character shapes: prototypes
  – Experts use prototypes
  – Describe query & documents by prototype usage

[Figure: instances of a prototype]

Prototypes

Character to prototype matcher

• Find most similar prototype for each character

W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12 (…)

[Figure: candidate prototypes for the letter a, labelled a5, a9, a16, a52 (…)]

Prototypes

• Averaged shapes of real handwritten characters
• Dynamic Time Warping-distance to find most similar prototype
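As a rough illustration of the matching step, the sketch below computes a textbook Dynamic Time Warping distance between two pen trajectories and returns the closest prototype. The trajectory format (lists of (x, y) points), the Euclidean point cost, the toy prototype shapes, and the names `dtw_distance` and `nearest_prototype` are assumptions made for this example; the DTW variant actually used by the authors may differ.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW distance between two trajectories (lists of (x, y) points)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point-to-point cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def nearest_prototype(char_points, prototypes):
    """Return the label of the prototype with the smallest DTW distance."""
    return min(prototypes, key=lambda label: dtw_distance(char_points, prototypes[label]))

# Hypothetical toy prototypes; only the label style follows the slides.
prototypes = {
    "a5": [(0, 0), (1, 1), (2, 0)],
    "a9": [(0, 0), (1, -1), (2, 0)],
}
sample = [(0, 0), (0.5, 0.6), (1, 0.9), (2, 0.1)]
print(nearest_prototype(sample, prototypes))  # a5
```

Running the matcher over every segmented character yields a label sequence such as the "W48 h16 a9 ..." example on the matcher slide.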

R. Niels, L. Vuurpijl, and L. Schomaker. Automatic allograph matching in forensic writer identification. International Journal of Pattern Recognition and Artificial Intelligence, 21(1):61–81, February 2007.

Prototypes

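The IR model for writer identification

[Diagram: writer input and query input each pass through a character to prototype matcher that uses the prototype list. Writer input yields af(w), which indexing turns into aw(w); query input yields af(q). Matching compares af(q) with aw(w) and outputs a ranked list with justification.]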

Indexing: create weighted vectors

• Vector of prototype usage for each writer: af(w)
• Adjust weight of prototypes in that vector:
  – Protos used by many writers: not distinctive -> lower weight
  – wf(p) = number of writers using proto p
  – $iwf(p) = \log_2\left(\frac{n}{wf(p)}\right)$, with n the number of writers
• Weighted vector of prototype use for each writer:
  $aw_p(w) = af_p(w) \cdot iwf(p)$
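The indexing step can be sketched in a few lines. This example assumes that af(w) holds raw prototype counts, that n is the number of writers in the database, and that each writer's characters have already been mapped to prototype labels; the function name `index_writers` and the toy data are hypothetical.

```python
import math
from collections import Counter

def index_writers(writer_prototypes):
    """writer_prototypes: dict writer -> list of prototype labels, one per character."""
    n = len(writer_prototypes)  # number of writers (assumed meaning of n)
    # af(w): how often each prototype occurs in the writer's characters
    af = {w: Counter(labels) for w, labels in writer_prototypes.items()}
    # wf(p): number of writers that use prototype p at least once
    wf = Counter(p for counts in af.values() for p in counts)
    # iwf(p) = log2(n / wf(p)): common prototypes get a lower weight
    iwf = {p: math.log2(n / wf[p]) for p in wf}
    # aw(w): weighted usage vector, one entry per prototype the writer uses
    aw = {w: {p: c * iwf[p] for p, c in counts.items()} for w, counts in af.items()}
    return af, iwf, aw

writers = {
    "w1": ["a5", "a5", "d16", "i25"],
    "w2": ["a9", "d16", "d16", "i6"],
}
af, iwf, aw = index_writers(writers)
print(iwf["d16"])  # 0.0, because both writers use d16
print(aw["w1"])    # {'a5': 2.0, 'd16': 0.0, 'i25': 1.0}
```

A prototype used by every writer receives weight zero, so it cannot influence the ranking; this mirrors the behaviour of inverse document frequency in text retrieval.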

The IR model for writer identification

[Diagram repeated, highlighting af(q): prototype frequency in query]

Matching

• Input
  – ‘Database writers’: indexed writer vectors aw(w)
  – ‘Query writer’: vector af(q)
• Match
  – Calculate cosine of angle between af(q) and each aw(w)
• Output
  – Ranked list of writers (similarity to query)
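A minimal sketch of the matching step, assuming the vectors are stored as sparse dictionaries from prototype label to value. The names `cosine` and `rank_writers` and the toy vectors are illustrative and not taken from the authors' system.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def rank_writers(af_q, aw):
    """Return (writer, similarity) pairs, most similar writer first."""
    scores = {w: cosine(af_q, vec) for w, vec in aw.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical indexed database and query.
aw = {
    "w1": {"a5": 2.0, "i25": 1.0},
    "w2": {"a9": 1.0, "i6": 1.0},
}
af_q = {"a5": 3, "i25": 1, "a9": 1}
ranking = rank_writers(af_q, aw)
print(ranking)  # w1 is ranked first
```

Because the cosine depends only on direction, the ranking is unaffected by how many characters the query or a database writer contains overall.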

Justification

• Similarity value (cosine of angle)
• Prototype contribution to retrieval result

Justification

• Forensic expert can further inspect justification
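One possible way to obtain per-prototype contributions is to split the cosine into its additive terms, one per prototype shared by query and writer. The slides do not state how the contribution is computed, so this decomposition, the name `prototype_contributions`, and the toy vectors are assumptions.

```python
import math

def prototype_contributions(af_q, aw_w):
    """Split the cosine similarity into one additive term per shared prototype."""
    norm_q = math.sqrt(sum(x * x for x in af_q.values()))
    norm_w = math.sqrt(sum(x * x for x in aw_w.values()))
    if norm_q == 0.0 or norm_w == 0.0:
        return {}
    shared = set(af_q) & set(aw_w)
    # Each term is that prototype's share of the normalised dot product.
    return {p: af_q[p] * aw_w[p] / (norm_q * norm_w) for p in shared}

# Hypothetical query vector and one indexed writer vector.
af_q = {"a5": 3, "i25": 1, "a9": 1}
aw_w = {"a5": 2.0, "i25": 1.0}
contrib = prototype_contributions(af_q, aw_w)
for p, value in sorted(contrib.items(), key=lambda item: item[1], reverse=True):
    print(p, round(value, 3))
```

The terms sum to the similarity value itself, so an expert could see which character shapes account for most of a match before inspecting them by eye.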

Experiment

• 43 writers from plucoll database
  – Online data
  – Segmented into characters
• How well does our technique perform given a certain amount of data (characters)?
  – Number of characters in database (d)
  – Number of characters in query (q)

Experiment

• Pick d random letters from each database writer
• Pick q random other letters from one writer, and use those as query
• Find most similar writer
  – Prototypes
  – iwf(p), aw(w)
  – Matching
• Vary d and q

Repeat 10 times for each writer

Repeat 10 times for each combination of d and q
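The sampling protocol can be sketched as a small evaluation loop. The data here is synthetic and invented for illustration (it is not the plucoll set), the characters are assumed to be already mapped to prototype labels, and all function names are hypothetical, so the printed numbers are not comparable to the reported results.

```python
import math
import random
from collections import Counter

def cosine(u, v):
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def run_trial(data, query_writer, d, q, rng):
    """Sample d labels per database writer and q other labels from query_writer."""
    db, query = {}, []
    for w, labels in data.items():
        shuffled = labels[:]
        rng.shuffle(shuffled)
        db[w] = shuffled[:d]
        if w == query_writer:
            query = shuffled[d:d + q]  # disjoint from this writer's database sample
    n = len(db)
    af = {w: Counter(labels) for w, labels in db.items()}
    wf = Counter(p for counts in af.values() for p in counts)
    iwf = {p: math.log2(n / wf[p]) for p in wf}
    aw = {w: {p: c * iwf[p] for p, c in counts.items()} for w, counts in af.items()}
    af_q = Counter(query)
    best = max(aw, key=lambda w: cosine(af_q, aw[w]))
    return best == query_writer

def accuracy(data, d, q, repeats=10, seed=0):
    """Percentage of trials in which the top-ranked writer is the query writer."""
    rng = random.Random(seed)
    trials = [run_trial(data, w, d, q, rng) for w in data for _ in range(repeats)]
    return 100.0 * sum(trials) / len(trials)

def make_synthetic_data(num_writers=5, chars_per_writer=400, seed=1):
    """Invented writers: each prefers one of 20 made-up prototypes per letter."""
    rng = random.Random(seed)
    letters = "abcde"
    data = {}
    for w in range(num_writers):
        preferred = {c: rng.randrange(20) for c in letters}
        labels = []
        for _ in range(chars_per_writer):
            c = rng.choice(letters)
            variant = preferred[c] if rng.random() < 0.95 else rng.randrange(20)
            labels.append(f"{c}{variant}")
        data[f"w{w}"] = labels
    return data

data = make_synthetic_data()
for q in (10, 50):
    print(q, accuracy(data, d=300, q=q))
```

Here the ten repetitions are applied per writer within each (d, q) setting; looping `accuracy` over a grid of d and q values would produce a table of the same layout as the results slide.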

Results

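Values are percentages. Columns: number of characters in database (d). Rows: number of characters in query (q).

q \ d    100   300   500   1000
10        59    79    83     88
30        86    97    99    100
50        94    99   100    100
70        96   100   100    100
100       98   100   100    100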

Conclusions & future work

• Needed for 100%: 70 chars (q), 300 chars (d)
  – Average English sentence: 75-100 characters
• No black box: results are justified
• Online data: forensic practice?
  – Extract semi-automatically with the help of an expert
  – Use offline matching technique
• Just 43 writers
  – Bigger (n writers & n techniques) experiments planned
• Promising results
