2nd "NameGame" APE-INV workshop 1
Mr. JOTL: A User Friendly Matching Software
Stéphane Lhuillery, Julio Raffo & Fernando Lladós
December 2010
Outline
• Background
• Objectives & Rationale
• Results
• User Friendly Software
– Concept
– Alpha test
• Further steps
Background
• Automatic patent retrieval is becoming essential given the size of the data sets.
• Growing literature on this NameGame:
– On firms’ names: Derwent, 2002; Magerman et al., 2006; Hall, 2006; Thoma et al., 2007.
– On inventors’ names: Trajtenberg et al., 2006; Hoisl, 2006; Lissoni et al., 2006; Mariani et al., 2007; Raffo & Lhuillery, 2009; etc.
• Our ESF project outcomes:
– New matching best practices
– APE-INV database
Objectives of the NameGame
• Minimize false positives (= higher precision)
• Minimize false negatives (= higher recall)
• Maximizing true positives
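The two objectives correspond to the usual definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The minimal Python sketch below assumes that the proposed matches and a hand-checked reference set are both given as sets of (record_a, record_b) ID pairs; the function name and data layout are illustrative assumptions, not part of Mr. JOTL.

```python
def precision_recall(predicted: set, truth: set) -> tuple:
    """Return (precision, recall) for predicted vs. true match pairs.

    Both arguments are sets of hashable pairs, e.g. (patent_inventor_id, survey_id).
    """
    true_pos = len(predicted & truth)      # pairs correctly matched
    false_pos = len(predicted - truth)     # spurious matches (hurt precision)
    false_neg = len(truth - predicted)     # missed matches (hurt recall)
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, recall


predicted = {(1, "a"), (2, "b"), (3, "x")}
truth = {(1, "a"), (2, "b"), (4, "d")}
print(precision_recall(predicted, truth))
```

Here two of the three proposed pairs are correct and one true pair is missed, so precision and recall both equal 2/3.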
Rationale: a three-step game (cleaning, matching, filtering)
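The three steps (cleaning, matching, filtering) can be chained as a simple pipeline. The Python skeleton below is a sketch under assumptions: records are (id, name) tuples, cleaning only normalizes case and punctuation, matching is exact equality of cleaned names, and filtering is a caller-supplied predicate. None of these function names come from Mr. JOTL.

```python
import re
from typing import Callable, Iterable

Record = tuple[str, str]  # (record id, raw name) -- hypothetical layout


def clean(records: Iterable[Record]) -> list[Record]:
    """Step 1: normalise names (upper case, punctuation stripped, spaces collapsed)."""
    out = []
    for rid, name in records:
        name = re.sub(r"[^\w\s]", " ", name.upper())
        out.append((rid, " ".join(name.split())))
    return out


def match(left: list[Record], right: list[Record]) -> list[tuple[str, str]]:
    """Step 2: propose candidate pairs; here simply identical cleaned names."""
    index: dict[str, list[str]] = {}
    for rid, name in right:
        index.setdefault(name, []).append(rid)
    return [(lid, rid) for lid, name in left for rid in index.get(name, [])]


def filter_pairs(pairs: list[tuple[str, str]],
                 keep: Callable[[str, str], bool]) -> list[tuple[str, str]]:
    """Step 3: drop candidate pairs rejected by a disambiguation rule."""
    return [p for p in pairs if keep(*p)]


left = clean([("P1", "Smith, John"), ("P2", "Müller, Anna")])
right = clean([("S1", "SMITH JOHN"), ("S2", "Muller Anna")])
pairs = match(left, right)
print(filter_pairs(pairs, keep=lambda a, b: True))
```

With this crude cleaning, "Müller, Anna" and "Muller Anna" remain unmatched because the accent is kept, which illustrates how much the later steps depend on the cleaning step.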
Examples on matching (EPFL)
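The EPFL matching examples survive only as a slide image. As a generic illustration of one standard family of matching algorithms, the Python sketch below scores name pairs with the Levenshtein edit distance and accepts a pair when the distance is at most a threshold `max_edits`; the threshold value and the sample names are hypothetical, not taken from the slide.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(a: str, b: str, max_edits: int = 1) -> bool:
    """Accept the pair when the names differ by at most `max_edits` edits."""
    return levenshtein(a, b) <= max_edits


print(levenshtein("LHUILLERY", "LHUILERY"))  # -> 1 (one deleted letter)
print(fuzzy_match("LLADOS", "LLADOZ"))       # -> True
```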
Examples on filtering (EPFL)
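Filtering keeps only those candidate pairs that supplementary data support. As a hypothetical illustration (the field names `city` and `ipc_class`, and the rule itself, are assumptions rather than the filters shown on the slide), the sketch below accepts a matched pair only if the two records agree on at least `min_agree` of the supplementary fields known in both.

```python
from typing import Optional

Record = dict[str, Optional[str]]

SUPPLEMENTARY_FIELDS = ("city", "ipc_class")  # hypothetical fields


def agreement(a: Record, b: Record) -> tuple[int, int]:
    """Return (fields known in both and equal, fields known in both)."""
    known = [f for f in SUPPLEMENTARY_FIELDS if a.get(f) and b.get(f)]
    agree = sum(a[f] == b[f] for f in known)
    return agree, len(known)


def keep_pair(a: Record, b: Record, min_agree: int = 1) -> bool:
    """Accept the pair if enough supplementary fields agree.

    If no supplementary field is known in both records, the filter cannot
    decide and keeps the pair unchanged.
    """
    agree, known = agreement(a, b)
    return known == 0 or agree >= min_agree


patent = {"name": "SMITH JOHN", "city": "LAUSANNE", "ipc_class": "G06F"}
survey_same = {"name": "SMITH JOHN", "city": "LAUSANNE", "ipc_class": None}
survey_other = {"name": "SMITH JOHN", "city": "GENEVA", "ipc_class": "A61K"}
print(keep_pair(patent, survey_same), keep_pair(patent, survey_other))
```

Pairs without any shared supplementary data pass through untouched, which shows how data availability limits what the filtering step can achieve.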
What have we learned so far?
• General
– Matching algorithms are not perfect, but they improve the results considerably.
• Cleaning step
– The data origin substantially changes the data preparation process.
• Matching step
– There is a hierarchical pattern across algorithms, although it is specific to each particular case.
• Filtering step
– The availability of supplementary data enhances or constrains the disambiguation process.
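Because the data origin shapes the cleaning step, a cleaning routine is typically assembled from small, source-specific operations. A minimal Python sketch, assuming names arrive as free text: accents are stripped via Unicode NFKD decomposition, case and punctuation are normalized, and a hypothetical, source-specific set of noise tokens (here legal-form suffixes of firm names) is removed.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters (é -> e + combining accent) and drop the marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_name(raw: str, noise_tokens: frozenset = frozenset()) -> str:
    """Normalise a name: no accents, upper case, no punctuation, no noise tokens."""
    text = strip_accents(raw).upper()
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)  # anything else becomes a space
    return " ".join(t for t in text.split() if t not in noise_tokens)


# Hypothetical source-specific settings: firm names carry legal-form suffixes.
FIRM_NOISE = frozenset({"SA", "AG", "GMBH", "LTD", "INC"})

print(clean_name("Stéphane Lhuillery"))
print(clean_name("Müller-Thurgau GmbH.", FIRM_NOISE))
```

Letters with no Unicode decomposition (for example Ø) are turned into spaces by this sketch, so a real cleaning step would need extra, source-specific transliteration rules.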
Why create user-friendly software?
Diagram: PATSTAT / APE-INV database, Survey, PATVAL, EU FW Program, SCOPUS, ISI Thomson
Concept behind Mr. JOTL
• Intuitive for beginner users
• Flexible on inputs and their preparation
• A fair variety of standard matching processes
• Adaptable disambiguation filters
• Yet fully customizable for advanced users
• Conceived and coded to be extended in the future by multiple developers
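One common way to keep a tool open to new matching processes contributed by several developers is a registry of interchangeable algorithms sharing one signature. The Python sketch below illustrates only that design pattern; Mr. JOTL itself is coded in C#, and the decorator `register`, the registry `MATCHERS` and the two sample algorithms are hypothetical.

```python
from typing import Callable, Dict

# A matcher takes two cleaned names and returns a similarity score in [0, 1].
Matcher = Callable[[str, str], float]
MATCHERS: Dict[str, Matcher] = {}


def register(name: str) -> Callable[[Matcher], Matcher]:
    """Decorator that makes a new matching algorithm available under `name`."""
    def decorator(func: Matcher) -> Matcher:
        if name in MATCHERS:
            raise ValueError(f"matcher {name!r} is already registered")
        MATCHERS[name] = func
        return func
    return decorator


@register("exact")
def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


@register("token_jaccard")
def token_jaccard(a: str, b: str) -> float:
    """Share of distinct words common to both names."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def run(algorithm: str, a: str, b: str) -> float:
    """Look up an algorithm chosen by the user (e.g. from a menu) and apply it."""
    return MATCHERS[algorithm](a, b)


print(sorted(MATCHERS))
print(run("token_jaccard", "JOHN A SMITH", "JOHN SMITH"))
```

Adding an algorithm then means writing one decorated function, without touching the code that lists or runs the matchers.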
From concept to reality (OK, for the moment just an alpha!)
Inputs
IPTS, Sevilla May 2010. 12
Parsing
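The parsing screen survives only as an image. As a hedged illustration of what name parsing typically involves, the sketch below splits a raw inventor name into surname and given names, assuming just two formats, `Surname, Given Names` and `Given Names Surname`; the function name and rules are assumptions.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    surname: str
    given: str


def parse_name(raw: str) -> ParsedName:
    """Split a raw name into surname and given names (hypothetical rules).

    - "Surname, Given Names": the comma separates the two parts.
    - "Given Names Surname": the last word is taken as the surname.
    """
    raw = " ".join(raw.split())  # collapse runs of whitespace
    if "," in raw:
        surname, given = (part.strip() for part in raw.split(",", 1))
        return ParsedName(surname, given)
    words = raw.split(" ")
    if len(words) == 1:
        return ParsedName(words[0], "")
    return ParsedName(words[-1], " ".join(words[:-1]))


print(parse_name("Raffo,  Julio"))
print(parse_name("Fernando   Lladós"))
```

The last-word rule breaks on particles such as "van der" or on Spanish double surnames, so real data would need additional rules.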
Matching
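Besides edit distances, a standard family of approximate string matching compares the character n-grams (q-grams) of two names. The minimal sketch below, assuming padded bigrams and a hypothetical acceptance threshold of 0.5, computes the Jaccard similarity of the two bigram sets; it is a generic technique, not necessarily one of the algorithms in the alpha.

```python
def qgrams(name: str, q: int = 2) -> set[str]:
    """Set of overlapping character q-grams, padded so word edges count too."""
    padded = f"#{name}#"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two q-gram sets."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def qgram_match(a: str, b: str, threshold: float = 0.5) -> bool:
    """Accept the pair when the similarity reaches the (hypothetical) threshold."""
    return qgram_similarity(a, b) >= threshold


print(round(qgram_similarity("LHUILLERY", "LHUILERY"), 3))         # -> 0.9
print(qgram_match("RAFFO", "RAFO"), qgram_match("RAFFO", "LLADOS"))  # -> True False
```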
Disambiguation
SSM
LET’S TEST IT!
Technical notes
• OS supported (so far):
– Windows XP, Vista and 7 (Server & x64)
• Coded in C#
– Pros: free development environment; low cost of entry; large developer community
– Cons: proprietary language and libraries; less efficient memory management
• Libraries needed: Scintilla (open-source lexer and syntax highlighter)
• Customizable code:
– C# & VBA
• Suggested environment for future development:
– Visual Studio (the Express edition is free to use)
– Mono on Linux
Further developments
• Fully coding the existing algorithms.
• Testing performance against large datasets (> 1 million records).
• Pre-setting standard routines (as XML).
• Drafting documentation (+ video).
• Proof-testing with first-time users (at EPFL).
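The plan to pre-set standard routines as XML can be illustrated with a small, entirely hypothetical routine file: the element and attribute names (`routine`, `step`, `type`, `algorithm`, `threshold`, `field`) are invented for illustration, since the actual schema is not described. The sketch parses it with Python's standard `xml.etree.ElementTree` into an ordered list of steps.

```python
import xml.etree.ElementTree as ET

# Hypothetical preset: element and attribute names are illustrative only.
ROUTINE_XML = """
<routine name="inventor-to-survey">
  <step type="clean" remove_accents="true"/>
  <step type="match" algorithm="levenshtein" threshold="1"/>
  <step type="filter" field="city"/>
</routine>
"""


def load_routine(xml_text: str) -> tuple[str, list[dict[str, str]]]:
    """Return the routine name and its steps, in file order, as attribute dicts."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "routine":
        raise ValueError(f"expected <routine>, got <{root.tag}>")
    steps = [dict(step.attrib) for step in root.findall("step")]
    return root.get("name", ""), steps


name, steps = load_routine(ROUTINE_XML)
print(name)
for step in steps:
    print(step)
```

Keeping presets as data rather than code would let first-time users rerun a standard routine while advanced users edit the file.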
Openness and its governance
• How to share it?
– GitHub?
– Forums
• How to develop a dynamic sharing community?
Thank you!