(,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of...

14
Citation: 90 J. Pat. & Trademark Off. Soc'y 448 2008 Content downloaded/printed from HeinOnline (http://heinonline.org) Tue Oct 4 18:34:19 2011 -- Your use of this HeinOnline PDF indicates your acceptance of HeinOnline's Terms and Conditions of the license agreement available at http://heinonline.org/HOL/License -- The search text of this PDF is generated from uncorrected OCR text. -- To obtain permission to use this article beyond the scope of your HeinOnline license, please use: https://www.copyright.com/ccc/basicSearch.do? &operation=go&searchType=0 &lastSearch=simple&all=on&titleOrStdNo=0882-9098

Transcript of (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of...

Page 1: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

Citation: 90 J. Pat. & Trademark Off. Soc'y 448 2008

Content downloaded/printed from HeinOnline (http://heinonline.org)Tue Oct 4 18:34:19 2011

-- Your use of this HeinOnline PDF indicates your acceptance of HeinOnline's Terms and Conditions of the license agreement available at http://heinonline.org/HOL/License

-- The search text of this PDF is generated from uncorrected OCR text.

-- To obtain permission to use this article beyond the scope of your HeinOnline license, please use:

https://www.copyright.com/ccc/basicSearch.do? &operation=go&searchType=0 &lastSearch=simple&all=on&titleOrStdNo=0882-9098

Page 2: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

An Analysis of PatentSearch Systems

Shigeyuki Sakurai' and Alfonso F. Cardenas2

1. IntroductionThe number of patent applications isgrowing steadily all over the world.Approximately 445,000 and 467,000 patentapplications are filed to the USPTO in FY2006 and FY 2007 respectively 3. As forinternational patent applications underPCT, approximately 149,000 and 156,000applications are filed to the WIPO in 2006and 2007 respectively'. The number ofPCT applications is continuously growingby around 5% per year'.

The enormous number of patent appli-cations becomes a large backlog for patentauthorities such as the USPTO, and fur-thermore, the enormous number ofpatents granted means much more diffi-culty in prior-art search during futurepatent examination.

At the same time, for the applicants andprivate companies, the unprecedented vol-ume of patent applications makes it muchmore difficult to research and find con-tending technologies of other companies.

To analyze the increasing enormouspatent documents, a powerful search sys-tem for patent documents and the contin-uous development of it are significantlyimportant. It helps patent authoritiesimprove their efficiency of prior-art searchand applicants improve their efficiency ofsearch for contending technologies ofother companies.

Although commercial patent searchsystems don't disclose the inside struc-ture and the algorithms of their systemsclearly, in this research we analyze theexisting public and commercial patent

1 Visiting Scholar at Computer Science Department, University of California Los Angeles. Also Patent Examiner at 4th PatentExamination Department, Japan Patent Office, Tokyo, Japan

The opinions in this article are merely the current opinions of the authors.

2 Professor at Computer Science Department, University of California Los Angeles.3 USPTO, Performance and Accountability Report Fiscal Year 2007, page 109, 2007.

4 WIPO, 'Unprecedented Number of International Patent Filings in 2007' (PR/2008/536), Feb. 21, 2008.http://www.wipo.int/pressroomlen/articles/2008/article0006.ht ml.

5 WIPO, WIPO Overview 2007 edition, page 32, 2007.

Page 3: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

JUNE 2008 AN ANALYSIS OF PATENT SEARCH SYSTEMS

search systems. After that, we considerthe evaluation of patent search systems.Also, we consider evaluation data setsessential for promoting development ofpatent search systems.

2. Analysis of Public and Com-mercial Patent Search Systems

2-1. Patent search systemsIn this section we address existing public,i.e. free, and commercial patent searchsystems. Table 1 shows search algorithmhighlights, searchable fields and opera-tors, result ordering, patent corpus, andcost of each existing patent search system.

The most basic patent search system isthe full-text patent search system of theUSPTO. We can access the system on theInternet at no cost. Within the systemthere are two web pages corresponding totwo databases, issued patents and pub-lished applications. Therefore we canconduct full-text search of publication ofissued patents and published unexam-ined applications. The system offers basicoperators such as AND, OR, and AND-NOT. The query "A ANDNOT B"retrieves documents that contain A anddo not contain B. After retrieval, it showsa list of documents in reverse chronologi-cal order, which means newer publica-tions come first.

Many patent authorities are adoptingsimple word finding full-text search,because it is fail-safe. Word finding full-text search retrieves all documents thatcontain all words in a user query.However, the result, i.e. a list of docu-ments, contains a lot of documents, so

users have to use more qualifying wordsto narrow the result.

Google Patent Search was launched inDecember 2006; thus it is a rather newsystem. This system scores retrievedpatent documents and shows higherscored documents first. It says "As withGoogle Web Search, we rank patentresults according to their relevance to agiven search query. We use a number ofsignals to evaluate how relevant eachpatent is to a user's query, and we deter-mine our results algorithmically"6. Butthe detail of the search algorithm ofGoogle Patent Search is not provided.(Note that Google Web Search scoresresults based on PageRank algorithm.)

Delphion is a commercial patent searchsystem. This system scores results by thefrequency and location of search words.Basic search allows searching of the frontpage for free, while Premier search allowssearching for the full-text of the wholedocument at a price.

There is a study, [Larkey 19991, whichalso uses search word frequency. Thisstudy makes a ranked list of retrievedpatents based on tf-idf, i.e. term frequen-cy-inverse document frequency, weight-ing score. Therefore it calculates howmany times the search words appear ineach document, normalizes the wordcounts, and makes a vector of the wordcounts for each document.

PatentCafe also uses a vector of wordcounts. Judging from the explanation ofPatentCafe, it compares the vector ofsearch words to vectors on the patentdatabase and ranks the result. PatentCafeis a commercial system.

6 Cited from 'About Google Patent Search' http://www.google.com/googlepatents/about.html.

7 Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, 'rhe PageRank Citation Ranking: Bringing Order to the Web',

Stanford Digital Library Technologies Project, 1998.

JUNE 2008 AN ANALYSIS OF PATENT SEARCH SYSTEMS

Page 4: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

Table 1: Patent Search Systems

Name Search Algorithm, Searchable Fields, Result Patent Corpus CostOther Features Operators Ordering

USPTO Full Text Search -All Fields Reverse chrono- Full text of US patent $0Patent Full-Text and Full- -Specific Fields: logical order: publication (1976-)Page Image Databases Patent Number newer comes (Patents of 1790-1975

Issue date first are searchable only byhttp://www.uspto.govl US Classification Issue Date, Patentpatft/index.html and so on Number, and Current

US Classification.)-Operators:

[Quick search]: Full text of US patentAND, OR, application publicationANDNOT (Mar. 15th 2001-)

[Advanced search]:AND, OR,ANDNOT,Unifier parenthesis"0111Phrase operator"<phrase>"

Google Patent Search Using a number of -All Fields Scored US patent publication $0signals to evaluate -Specific Fields: Ranking: higher (1790s-)

http:lwww.google.coml how relevant each Patent Number scored comespatents patent is to a user's Issue date first

query (cited from US Classification"About Google and so onPatent Search")

-Operators:AND, OR,Without operator"-" (=ANDNOTI,Phrase operator"<phrase>"

Delphion(The ThomsonCorporation)

http:llwww.delphion.com/simple

Front page Search(Basic)

Full Text Search(Premier)

Scoring by thefrequency and loca-tion of searchword(s) (cited fromDelphion)

Proximity Search,Wild card,Thesaurus(Premier, Unlimited)

Citation Link tool(which showscitation graph)

-Front page (Basic)

-All Fields (Premier,Unlimited)-Specific Fields:

<in> operator-Operators:

AND, OR,NOT(=ANDNOT)Wild card:? (1 character* (0- characters)

Proximity:<near/n><order><sentence><paragraph>

Thesaurus:<thesaurus>

ScoredRanking: higherscore comesfirst

US, EP, DE, JP, CH,WO, DWPI, INPADOCUS patent publication(full text is 1974-, bib-liographic text is 1971-)

Basis registration:Quick searchingagainst the US grantedbibliographic collec-tion and Patent num-ber searching againstworldwide collections

Premier:$124.501M0

Unlimited:$249/Mo

SHIGEYUKI SAKURAI AND ALFONSO F CARDENAS JPTOS

Page 5: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

AN ANALYSIS OF PATENT SEARCH SYSTEMS 451

Table 1: (continued)

Name I

PatentCafe

http:/www.patentcafe.com/

FreepatentsOnine(started by a formerpatent searcher)

http:Ilwww.freepatentsonline.com/

MicroPatent Patent Web(The ThomsonCorporation)

http:Iiwww.micropat.comstaticlpatentweb.htm

Search Algorithm,Other Features

Searchable Fields,Operators

-- - -- - - - -- - - - - - - -- -- - -- - -LLinguistics engineconverts enteredwords into a con-cept, representedby a mathematicalvector. ProSearchmatches and ranksthe vector to similarvectors inPatentCafe'sSemantic database(cited fromPatentCafe)

Advanced searchtechniques:-proximity searching-search termweighing

-latent semanticsearch

Full Text Search

(contains Forwardand backwardcitation data)

PatentCafe allows to usefull calims section as aquery.

Also it allows certainspecific paragraphs inspecification as a query(cited from PatentCafe)

ResultOrdering

ScoredRanking: higherscore comesfirst

Patent Corpus Cost

US, EP, JP, DE, CA,FR, GB, WO(PCT),

I 1 4 .1 ______

-All Fields-Specific Fields

-Operators:Proximity:

"cat dog",,5('cat' within 5 wordsof 'dog')

Relevancy Weight:catA5 OR dog (Cat is 5 times moreimportant to the rele-vancy of documentsthan dog)

Wildcard:

-Multi-field search

-Operators:Boolean, proximity, andtruncation

Users canchoose Reversechronologicalorder orRelevancy order

US, EP, PAJ(PatentAbstracts of Japan),WO(PCT)

1 4 -I-(unknown) FullText database: US,

DE, EP, GB, JP, andWO

INPADOC database(EPO examiner'sdatabase)

AUREKA (charged tool of citation analysis ChargedMicroPatent (this tool shows

citation graph)http:1lwww.micropat.com/staticlaureka.htm

LexisNexis TotalPatent (unknown) -Multiple Field (unknown) 22 full-text databases subscrip--Multiple Operators of the major patent tion basis,

http:llwwwlexisnexis.coml -Multiple Wildcards authorities also avail-ip/totalpatent/ able trans-

actionally

$499/Mo

$0

First $50deposit isneeded.

Daily orAnnualsubscrip-tion

JUNE 2008

Page 6: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

JPTOSSHIGEYUKI SAKURAI AND ALFONSO F. CARDENAS

Table 1: (continued)

DEPATISnet(German Patent andTrade Mark Office)

http:Ilwww.dpma.delservice/depatisnet.html

Search Algorithm,Other Features

Full Text Search

Searchable Fields,Operators

-Multiple Field

-Operators:AND, OR,

NOT(=ANDNOTIWild card:!(1 character)? (0- characters)# (0-1 characters)

Proximity:(F) terms within sameparagraph(L) terms within samefield(A) finds search termsin any order, with nowords between them.(#A) # defines themaximum number ofwords between thesearch terms.(W) finds search termsin order with nowords between them.(NOTW) search termsin order, but notimmediately next toeach other(#W) # defines themaximum number ofwords between thesearch terms.

Result Ordering

User can selecthow to order theresult.Chronological,Reverse-Chronological,Applicationdate, title, andso on.

Patent Corpus

DE (Full Text) WO, GB,US, JP(Title & Abstract only)

[Larkey 1999]' Search using tf-idf All fields ranked list with US (1980-1996) (Used at(University of algorithm tf-idf score USPTO)Massachusetts Centerfor IntelligentInformation Retrieval)

[Li et al. 2007]' Classification Citation field ranked list of US (451,853 patents,(University of Arizona algorithm based on similarity of 1/111990-)Department of "citation network" citation networkManagement on patents labeled withInformation Systems) subclass

[Fall et al. 2003]" IPC classification Claims, First 300 words, Assigned IPC WIPO-alpha(WIPO, and so on) using Support Vector Abstracts class and 1998-2002)

Machine algorithm subclass

8 Leah S. Larkey, 'A Patent Search and Classification System', Proc. DL-99, 4th ACM Conference on Digital Libraries, pages 179-187,1999.

9 Xin Li et al., 'Automatic Patent Classification using Citation Network Information: An Experimental Study in Nanotechnology',Proc. JCDL-07, ACM 2007 conference on Digital Libraries, June 2007.

10 C.J. Fall, A. Tdrcsviri, K. Benzineb, G. Karetka, 'Automated categorization in the international patent classification', ACM SIGIRForum archive, Volume 37, Issue 1, Pages 10-25, Spring 2003.

T

$0

J

Table 1-(continued)

Page 7: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

JUNE 2008 AN ANALYSIS OF PATENT SEARCH SYSTEMS

FreePatentsOnline is a free patentsearch system. Its web page mentions thatFreePatentsOnline is using advancedsearch techniques such as search termweighing and "latent semantic search".However, what "latent semantic search"means is unclear. FreePatentsOnline isinteresting because it offers weightingoperator for search words to modifyscored ranking.

MicroPatent has an interesting toolnamed AUREKA, which contains thefunction of citation analysis. (FIGURE 1)The function presents a citation graph, inwhich cited patents are graphically con-nected to a patent citing it using a line.Delphion also has a similar tool namedCitation Link. (FIGURE 2)

FIGURE 1: AUREKA Advanced Patent Analysis of MicroPatent"

FIGURE 2: Citation Link of Delphion'2

11 Cited from http://www.micropat.com/static/advanced.htm.

12 Cited from http://www.delphion.com/products/research/products-citelink.

AN ANALYSIS OF PATENT SEARCH SYSTEMSJUNE 2008

Page 8: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

SHIGEVUKi SAKURAI AND ALFONSO F. CARDENAS JPTOS

There is an interesting study using thecitation relation among patents. [Li et al.2007]13 leverages citation relation for auto-matic classification. It compares randomwalk paths of a patent to be classified andprior patents. The random walk paths arelabeled with classification categories.(FIGURE 3) They found that the algorithmof the labeled graph kernel (KGra) has86.67% accuracy and 88.04% class aver-aged F-measure.

DEPATISnet managed by the GermanPatent and Trade Mark Office has someunique features, although its main cor-pus is Germany patents. One feature isthe number and variety of useful proxim-ity operators, e.g. an operator that findssearch words within the same paragraphor the same field, and an operator thatfinds search words in any order (or inorder) within the maximum number ofwords between them. Delphion also hasthe similar proximity operators. Anotherfeature of DEPATISnet is result ordering.

Users can select result ordering fromamong several options such as reverse-chronological, application date, title, andso on.

2-2. Summary of search algorithms

We made a survey of the public and com-mercial patent search systems. We cansummarize that there are three types ofpatent search algorithms:

(1) Word finding full-text search withindex (e.g. USPTO, DEPATISnet)

(2) Semantically scoring search (e.g. tf-idf, Delphion, PatentCafe)

(3) Search using metadata (e.g. citationrelation)

Of course it is possible to combine two ofthese three types in order to implement apatent search engine.

Almost all free patent search systemsare adopting word finding full-text searchusing index. Some non-free search sys-

FIGURE 3: Random walk paths on a patent citation network14

s Random walk path

2. S-

C, *; 3. -C4. S-.C-,2

5. S-'C1-*-C1C, 7. S---.---C4

c C4 8. s---r-*cr-*cC, I 10. S-Cr-Cr-+C3

S: Start (patent to be classified or prior patent)C: Cited patent node labeled with classification category

13 Xin Li et al., 'Automatic Patent Classification using Citation Network Information: An Experimental Study in Nanotechnology',Proc. JCDL-07, ACM 2007 conference on Digital Libraries, June 2007.

14 Cited from [Li et al. 2007].

JPTOSSHIGEYUKI SAKURAI AND ALFONSO F. CARDENAS

Page 9: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

JUNE 2008 AN ANALYSIS OF PATENT SEARCH SYSTEMS

tems implement so-called semantic searchalgorithms, which are not clearlyexplained in the web pages of the systems.Some vendors, e.g. Delphion and AURE-KA, offer the visualization tool of citationrelation among patent documents.

2-3. Summary ofsearchable Fields

We can conduct field specific search, e.g.patent number search or inventor namesearch, with almost all patent search ven-dor software. This is essential for retriev-ing a specific patent document.

2-4. Summary of operators

The common and popular operatorsamong patent search systems are:

- Boolean: AND, OR, ANDNOT,unifier 0

- Phrase: "<phrase>"-Wild card: exact 1 character, 0 charac-

ter or more, 1 character or more-Proximity: within the same para-graph, within the same field, within nwords in order, within n words in anyorder

2-5. Summary of result ordering

The most basic result ordering is reverse-chronological (newer publication datecomes first). Some search systems scorethe results according to their algorithmand show the results in scored rankingorder. Moreover, some systems such asFreePatentsOnline offer a weighting oper-ator for search words to modify scoredranking. DEPATISnet allows to order theresult, e.g. reverse-chronological, applica-tion date, or patent number order.

3. Evaluation of PatentSearch SystemsLet us compare the patent search systemabove. We selected three search softwareproducts, USPTO, Google Patent Search,and FreePatentsOnline, because they aretotally free.

We retrieved one sample patent, PatentNo. 7,163,290. This patent claims babysunglasses having a band comprising amesh piece. With the use of Public PAIR 5

that shows file wrapper of patent appli-cations, we can see the office actions writ-ten by the patent examiner in charge ofthe sample patent. During prosecution,the examiner cites three prior-art patents,No. 5,926,855, 5,481,763, and 5,042,094.So we investigated whether these patentsearch systems can retrieve the citedprior-art patents.

We made a query suitable for the sam-ple patent referring to the examiner'ssearch strategy in Public PAIR: "((eye-glass OR spectacle OR eyewear) ANDband AND mesh) AND 1/1/1976<=applicationdate<=12/28/2004". Themeaning of the query is to retrieve patentdocuments that contain the words, "eye-glass OR spectacle OR eyewear" ANDband AND mesh, and the patent applica-tion date between 1/1/1976 and12/28/2004. We limited the query withthe range of application date because thesample patent was filed on Dec. 28, 2004and the database of the full-text patentsearch system of USPTO has data from1976. The result of retrieval by the threesearch software products, USPTO, GooglePatent Search, and FreePatentsOnline isshown on Table 2. The table shows theresults for the query formally indicated in

15 http://portal.uspto.gov/external/portal/pair.

JUNE 2008 AN ANALYSIS OF PATENT SEARCH SYSTEMS

Page 10: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

SHIGEYUKI SAKURAI AND ALFONSO E CARDENAS JPTOS

the column and in plain English. We bers of the six patents are 7,239,802,retrieved six more sample patents and 6,909,889, 7,274,290, 7,232,976, 7,152,184,conducted the same research work show- and 7,095,829.ing the results also in Table 2. The num-

TABLE 2: The result of retrieval on search systems 6

Case Patent No. Query USPTO Google Patent FreePatentsOnlineNo. & Issue Date Search

(Number of * The meanings of these queries are number cited number cited number citedpatents cited by displayed in the table below, of patents of patents of patents

examiner) patents found patents found patents foundretrieved retrieved retrieved

#1 7,163,290 ((eyeglass OR spectacle OR eyewear) 69 1 8 1 192 1Jan. 16, 2007 AND band AND mesh) AND

(3) Application- date/ll/1976->1212812004

#2 7,239,802 (USClass/715/805 OR USClass70711) 383 2 204 1 486 2Jul. 3, 2007 AND select AND button AND

(3) Application -datell/ll1976->11/30/2001

#3 6,909,889 Jun. (mobile OR wireless OR cellS) AND 36 1 1 0 39 121, 2005 (3) digital AND (photo OR image) AND

"print condition" AND Application (Worddate/lll1976-> 10/5/2001 Stemming

is offbecauseusingwildcard)

#4 7,274,290 medical AND tray AND wireless 317 1 111 1 423 1Sep. 25, 2007 AND mobile AND Application-

(5) date/l/l/1976->5/11/2006

#5 7,232,976 !mouse pad" AND therapeutic AND 31 1 9 1 51 1Jun. 19, 2007 Application-date/11/1976->

(4) 1/18/2006

#6 7,152,184 (USClass/707/204 OR 348 3 32 1 522 4Dec. 19, 2006 USClass/711/162 OR USClass/714/5)

(5) AND input AND update AND sec-ond AND third AND Application-datell/111976-> U/22/2001

#7 7,095,829 (USClass/7071200 OR 148 3 65 2 285 3Aug. 22, 2006 USClass/707/204 OR

(6) USClass/7091206 ORUSClass/709/225 ORUSClass/379/88.23) AND (mail ORmessage) AND archive AND recipientAND Applicationjdatell/l/1976->9/7/2004

16 Using database of issued patents only, not using database of published applications.

Page 11: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

Case No. Meaning of the query

#1 To retrieve patent documents that contain the words, "eyeglass OR spectacle OR eyewear" AND band AND mesh, andthe patent application date between 1/111976 and 12/2812004.

#2 To retrieve patent documents that are classified to 7151805 OR 70711 under U.S. Patent Classification, and that containthe words, select AND button, and the patent application date between 1/1/1976 and 11/3012001.

#3 To retrieve patent documents that contain the words, "mobile OR wireless OR cellS ($ is indicating truncation.)" ANDdigital AND "photo OR image" AND "print condition" (phrase), and the patent application date between 11111976 and10/5/2001.

#4 To retrieve patent documents that contain the words, medical AND tray AND wireless AND mobile, and the patentapplication date between 1/111976 and 5/11/2006.

#5 To retrieve patent documents that contain the words, "mouse pad" (phrase) AND therapeutic, and the patent applica-tion date between 11/111976 and 1/1812006.

#6 To retrieve patent documents that are classified to 7071204 OR 711/162 OR 714/5 under U.S. Patent Classification, andthat contain the words, input AND update AND second AND third, and the patent application date between 1/111976and 11/22/2001.

#7 To retrieve patent documents that are classified to 707/200 OR 7071204 OR 7091206 OR 7091225 OR 379188.23 underU.S. Patent Classification, and that contain the words, "mail OR message" AND archive AND recipient, and the patentapplication date between 1/111976 and 91712004.

The result shows that the three search sys-tems found different numbers of patentssatisfying the query, and found only asubset of the number of prior-art patentscited by the patenf examiner in charge ofeach patent (the number of prior-artpatents cited by the examiner is indicatedin parenthesis under the patent number inthe table).

We found the fact that the three systemsretrieved the closest prior-art documentscited by the patent examiner. We call theclosest prior-art "the main prior-art". Forexample, for case #1 (Patent no. 7,163,290),Patent 5,926,855 is cited as a prior-art inthe actions written by the examiner andthe prior-art patent is the main prior-artdocument. All three systems retrieved themain prior-art document for case #1.Whether or not a patent search system canretrieve the main prior-art documents thatare the closest as a whole to the applica-tion at issue is important.

In addition, we can say that afterretrieval of the main prior-art patent,

examiners conduct other searches usingother queries to find the rest of the prior-art documents. These can be combinedwith the main prior-art document tobridge the difference between the mainprior-art document and the patent appli-cation at issue (e.g. case #1). So we needother queries to retrieve the rest of theprior-art documents. We call the rest of theprior-art "the sub prior-art".

Our research shows, as illustrated inTable 2, that Google Patent Searchretrieves fewer documents when we addmore search terms in a query. This isbecause the aim of Google Patent Search isdifferent from other patent search systems.It places more emphasis on focus, i.e. nar-rowing down documents retrieved, andscored relevancy ranking of documentsretrieved. The search system of the USPTOuses a simple word finding algorithm inorder to be fail-safe, and retrieves moredocuments than Google Patent Search.FreePatentsOnline retrieves the most num-ber of documents that are thought to be

JUNE 2008 AN ANALYSIS OF PATENT SEARCH SYSTEMS

Page 12: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

SHIGEVUKI SAKURAI AND ALFONSO F. CARDENAS JPTOS

relevant to a user query. One of the reasonsfor this is the Word Stemming function ofFreePatentsOnline, which expands eachword of a user query to other forms of theword. Note that within FreePatentsOnlinewe can choose result ordering: reversechronological order or relevancy order.This is a useful function.

One possible usage of patent search sys-tems is to combine use of different types ofthe systems. First we can use GooglePatent Search to retrieve a narrowed set ofprior-art documents in relevancy order.After that we can use a fail-safe search sys-tem such as the USPTO system orFreePatentsOnline to retrieve a more com-prehensive set of prior-art documents.

The discussion in this section addressesprior-art patent search. However, we rec-ognize that patent examiners also conductnon-patent literature searches as well aspatent searches. Patent examiners oftenalso cite documents of non-patent litera-ture such as proceedings of conference orarticles in magazines. Thus, there is apotential need for a system to search non-patent literature.

4. Test Set and Training Setof Patent PublicationsSome projects such as WIPO-alpha man-aged by WIPO 7 and NTCIR in Japan"5

have provided test sets and training setsof patent publications for researchers ofpatent search algorithms. However, thedata sets are available on a registrationbasis and for a limited time.

To promote research for automaticpatent search algorithms, patent authori-ties such as the USPTO should provide anexperimental set of patent documents. Wesuggest a possible experimental set ofpatent documents that contains:

Training collection of patent publica-tions

Target test collection of publishedpatent applications- Prior-art publications as an answer to

target patent applications

We suggest providing the patent publica-tions cited in the office actions written bypatent examiners, i.e. Non-FinalRejections and List of references cited byexaminer, as an answer to target patentapplications. Patent examiners try to findthe closest prior-art during examination.Therefore, we can use the patent publica-tions cited by patent examiners as ananswer to target patent applications. Thepatent publications cited by patent exam-iners are also shown in the field of"References Cited" on the cover page ofpatent publications with asterisk.

5. ConclusionIn this research, we analyzed existingpublic and commercial patent search sys-tems, investigated the search algorithmsand search results for sample patents, andconsidered the common operators and theresult ordering. Continuous research anddevelopment of patent search systems are

17 C.J. Fall, A. Tbrcsviri, K. Benzineb, G. Karetka, 'Automated categorization in the international patent classification', ACM SIGIRForum archive, Volume 37, Issue 1, Pages 10-25, Spring 2003.

18 Sumio Fujita, 'Technology survey and invalidity search: A comparative study of different tasks for Japanese patent documentretrieval', Information Processing and Management: an International Journal archive, Volume 43, Issue 5, pages 1154-1172 September2007.

JPTOSSHIGEYUKI SAKURAI AND ALFONSO F CARDENAS

Page 13: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

jUNE 2008 AN ANALYSIS OF PATENT SEARCH SYSTEMS

significantly important to acceleratepatent examination and streamline searchfor contending technologies. To promotethe research, providing testing and train-

ing sets of patent documents is essential.We hope for a more active and open dis-cussion of established patent search sys-tems and algorithms.

REFERENCES

USPTO, Performance and AccountabilityReport Fiscal Year 2007, page 109, 2007.

WIPO, 'Unprecedented Number ofInternational Patent Filings in 2007'(PR/2008/536), Feb. 21, 2008.http://www.wipo.int/pressroomlenlarticlesl2008/article_0006.html.

WIPO, WIPO Overview 2007 edition, page32, 2007.

'About Google Patent Search'http://www.google.com/googlepatents/about.html.

Lawrence Page, Sergey Brin, RajeevMotwani, Terry Winograd, 'The PageRankCitation Ranking: Bringing Order tothe Web', Stanford Digital LibraryTechnologies Project, 1998.

USPTO Patent Full-Text and Full-PageImage Databaseshttp://www.uspto.gov/patft/index.html.

Google Patent Searchhttp://www.google.com/patents.

Delphionhttp://www.delphion.com/si-mple.http://www.delphion.com/products/research/products-citelink.

PatentCafehttp://www.patentcafe.com/.

FreePatentsOnlinehttp://www.freepatentsonline.com/.

MicroPatent Patent Webhttp://www.micropat.com/static/patentweb.htm.http://www.micropat.com/static/advanced.htm.

AUREKAhttp://www.micropat.com/static/aureka.htm.

LexisNexis TotalPatenthttp://www.lexisnexis.com/ip/totalpatent/.

DEPATISnethttp://www.dpma.de/service/depatisnet.html.

[Larkey 19991 Leah S. Larkey, 'A PatentSearch and Classification System', Proc.DL-99, 4th ACM Conference on DigitalLibraries, pages 179-187, 1999.

[Li et al. 2007] Xin Li et al., 'AutomaticPatent Classification using CitationNetwork Information: An ExperimentalStudy in Nanotechnology', Proc. JCDL-07,ACM 2007 conference on Digital Libraries,June 2007.

JUNE 2008 AN ANALYSIS OF PATENT SEARCH SYSTEMS

Page 14: (,1 2 1/,1( - Cardenas 90JPatTrademarkOffSocy448.pdfnamed AUREKA, which contains the function of citation analysis. (FIGURE 1) The function presents a citation graph, in 2 *; -Search

SHIGEVUKI SAKURAi AND ALFONSO F. CARDENAS JPTOS

[Fall et al. 20031 C.J. Fall, A. T6rcsvdri, K.Benzineb, G. Karetka, 'Automated cate-gorization in the international patentclassification', ACM SIGIR Forumarchive, Volume 37, Issue 1, Pages 10-25,Spring 2003.

USPTO Public PAIRhttp://portal.uspto.gov/external/portal/pair.

[Fujita 2007] Sumio Fujita, 'Technologysurvey and invalidity search: A compara-tive study of different tasks for Japanesepatent document retrieval', InformationProcessing and Management: anInternational Journal archive, Volume 43,Issue 5, pages 1154-1172, September 2007.

[Lamirel et al. 20031 Jean-CharlesLamirel, Shadi Al Shehabi, MartialHoffmann, Claire Franqois, 'Intelligentpatent analysis through the use of a neu-ral network: experiment of multi-view-point analysis with the MultiSOMmodel', Proceedings of the ACL-2003workshop on Patent corpus processing,Volume 20, Pages 7-23, 2003

[Chen et al. 20031 Liang Chen, NaoyukiTokuda, Hisahiro Adachi, 'A patent docu-ment retrieval system addressing bothsemantic and syntactic properties',Proceedings of the ACL-2003 workshopon Patent corpus processing, Volume 20,Pages 1-6, 2003

SHIGEYUKI SAKURAI AND ALFONSO F. CARDENAS JPTOS