Mr. JOTL: A User Friendly Matching Software. Stéphane Lhuillery, Julio Raffo & Fernando Lladós. December 2010. 2nd "NameGame" APE-INV workshop.


2nd "NameGame" APE-INV workshop 1

Mr. JOTL: A User Friendly Matching Software

Stéphane Lhuillery, Julio Raffo & Fernando Lladós 

December 2010


Outline

• Background
• Objectives & Rationale
• Results
• User Friendly Software
  – Concept
  – Alpha test
• Further steps


Background

• Automatic patent retrieval is becoming a necessity given the size of data sets.
• A growing literature looks at this NameGame:
  – On firms' names: Derwent, 2002; Magerman et al., 2006; Hall, 2006; Thoma et al., 2007.
  – On inventors' names: Trajtenberg et al., 2006; Hoisl, 2006; Lissoni et al., 2006; Mariani et al., 2007; Raffo & Lhuillery, 2009; etc.
• Our ESF Project outcomes:
  – New matching best practices
  – APE-INV database


Objectives of the NameGame

• Minimize false positives (= higher precision)
• Minimize false negatives (= higher recall)
• Maximize true positives
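The two objectives can be made concrete with a small calculation. The sketch below is only an illustration: the function name `precision_recall` and the sample record pairs are hypothetical and do not come from the slides. It treats a matching result as a set of record pairs and compares it with a hand-checked set of true pairs.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for predicted match pairs against true pairs."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)    # pairs correctly matched
    false_pos = len(predicted - truth)   # pairs matched by mistake
    false_neg = len(truth - predicted)   # true pairs that were missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, recall


# Hypothetical pairs of (record id in source A, record id in source B)
truth = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
predicted = {(1, "a"), (2, "b"), (5, "e")}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Fewer false positives raise precision, fewer false negatives raise recall, and the two usually pull against each other when a matching threshold is moved.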


Rationale: a three-step game (cleaning, matching, filtering)


Examples on matching (EPFL)
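The slide's own examples are not reproduced in this transcript. As a rough illustration of what a matching step does, the sketch below compares two simple similarity measures on name strings that are assumed to be already cleaned. Everything in it is an assumption made for illustration: the function names, the sample names, and the threshold are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is only a stand-in, not necessarily an algorithm used in Mr. JOTL.

```python
from difflib import SequenceMatcher


def edit_similarity(a, b):
    """Character-based similarity in [0, 1]; sensitive to word order."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a, b):
    """Share of distinct words the two names have in common; ignores word order."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match(names_a, names_b, similarity, threshold):
    """Return every pair whose similarity reaches the threshold."""
    return [(a, b) for a in names_a for b in names_b
            if similarity(a, b) >= threshold]


# Hypothetical, already cleaned name variants
names_a = ["ecole polytechnique federale de lausanne"]
names_b = [
    "ecole polytechnique federale lausanne",
    "federale de lausanne ecole polytechnique",
    "universite de lausanne",
]

pairs = match(names_a, names_b, token_jaccard, threshold=0.75)
for a, b in pairs:
    print(a, "<->", b)
```

The reordered variant is a perfect match for the token-based measure but not for the character-based one, which is one reason different algorithms can rank differently from one data set to another.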


Examples on filtering (EPFL)
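The slide's examples are likewise missing from this transcript. A minimal sketch of a filtering (disambiguation) step, under stated assumptions: candidate pairs come from a previous matching step, and a pair is kept only if the two records agree on at least one supplementary field. The field names (`city`, `ipc`), the records, and the function `filter_candidates` are all hypothetical.

```python
def filter_candidates(candidates, records_a, records_b, fields):
    """Keep candidate pairs whose records agree on at least one supplementary field."""
    kept = []
    for id_a, id_b in candidates:
        rec_a, rec_b = records_a[id_a], records_b[id_b]
        agrees = any(
            rec_a.get(field) is not None and rec_a.get(field) == rec_b.get(field)
            for field in fields
        )
        if agrees:
            kept.append((id_a, id_b))
    return kept


# Hypothetical records sharing the same cleaned name
records_a = {1: {"name": "j muller", "city": "lausanne", "ipc": "H04L"}}
records_b = {
    "x": {"name": "j muller", "city": "lausanne", "ipc": "G06F"},
    "y": {"name": "j muller", "city": "munich", "ipc": "A61K"},
    "z": {"name": "j muller", "city": None, "ipc": None},
}
candidates = [(1, "x"), (1, "y"), (1, "z")]

confirmed = filter_candidates(candidates, records_a, records_b, ["city", "ipc"])
print(confirmed)
```

Record `z` is dropped because it carries no supplementary data at all: with this rule, missing fields can never confirm a match, so the filter is only as good as the extra data available.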


What have we learned so far?

• General
  – Matching algorithms are not perfect, but they improve the results considerably.
• Cleaning step
  – The origin of the data substantially changes the data preparation process.
• Matching step
  – There is a hierarchy pattern across algorithms, although it is specific to each particular case.
• Filtering step
  – The availability of supplementary data enhances or constrains the disambiguation process.
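To make the cleaning step concrete, here is a minimal sketch of one possible name normalization: strip accents, lowercase, replace punctuation with spaces, and collapse whitespace. The function name `clean_name` and the chosen rules are assumptions for illustration; as noted above, the right preparation depends on where the data comes from.

```python
import re
import unicodedata


def clean_name(raw):
    """Normalize a raw name string into a plain lowercase ASCII key."""
    # Decompose accented characters, then drop the combining accent marks
    decomposed = unicodedata.normalize("NFKD", raw)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = without_accents.lower()
    # Replace anything that is not a letter, digit, or space with a space
    no_punctuation = re.sub(r"[^a-z0-9 ]+", " ", lowered)
    # Collapse runs of whitespace into single spaces
    return " ".join(no_punctuation.split())


print(clean_name("Lladós,  Fernando"))
print(clean_name("LHUILLERY, Stéphane"))
```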


Why create user-friendly software?

[Diagram of data sources: PATSTAT / APE-INV Database, Survey, PATVAL, EU FW Program, SCOPUS, ISI Thomson]


Concept behind Mr. JOTL

• Intuitive for beginner users
• Flexible on inputs and their preparation
• Fair variety of standard matching processes
• Adaptable on the disambiguation filters
• But soundly customizable for advanced users
• Conceived and coded to be expanded in the future by multiple developers


From concept to reality

• (OK, for the moment just an alpha!)


Inputs

IPTS, Sevilla May 2010. 12


Parsing


Matching


Disambiguation


SSM


LET’S TEST IT!


Technical notes

• OS supported (so far):
  – Windows XP, Vista, Seven (Server & x64)
• Coded in C#
  – Pros:
    • Free development environment
    • Low cost of entry
    • Large developer community
  – Cons:
    • Proprietary language and libraries
    • Less efficient memory management
• Libraries needed: Scintilla (open source lexer, syntax highlighter)
• Customizable code:
  – C# & VBA
• Suggested environment for future development:
  – Visual Studio (Express version is free to use)
  – Mono on Linux


Further developments

• Fully coding the existing algorithms.
• Testing performance against large datasets (> 1 million records).
• Pre-setting standard routines (as XML).
• Drafting documentation (+ video).
• Proof-testing with first-time users (at EPFL).
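As a purely illustrative sketch of what a preset routine stored as XML might look like, the example below defines a small preset and loads it with Python's standard `xml.etree.ElementTree`. The element names, attributes, and values are all hypothetical; the slides do not describe the actual format planned for Mr. JOTL.

```python
import xml.etree.ElementTree as ET

# Hypothetical preset: one routine covering the three steps
PRESET_XML = """
<routine name="inventor-names-default">
  <cleaning lowercase="true" strip_accents="true"/>
  <matching algorithm="token_jaccard" threshold="0.75"/>
  <filtering fields="city,ipc"/>
</routine>
"""


def load_preset(xml_text):
    """Parse a preset routine into a plain dictionary of settings."""
    root = ET.fromstring(xml_text.strip())
    cleaning = root.find("cleaning")
    matching = root.find("matching")
    filtering = root.find("filtering")
    return {
        "name": root.get("name"),
        "lowercase": cleaning.get("lowercase") == "true",
        "strip_accents": cleaning.get("strip_accents") == "true",
        "algorithm": matching.get("algorithm"),
        "threshold": float(matching.get("threshold")),
        "filter_fields": filtering.get("fields").split(","),
    }


preset = load_preset(PRESET_XML)
print(preset["name"], preset["algorithm"], preset["threshold"])
```

Keeping a routine in a small declarative file like this would let beginners pick a standard setup while advanced users edit the values directly.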


Openness and its governance

• How to share it?
  – GitHub?
  – Forums
• How to develop a dynamic sharing community?


Thank you!
