The Fung Institute Patent Lab: Products and Future Plans

The Fung Institute Patent Lab: Products and Future Plans Lee Fleming, Director of the Coleman Fung Institute for Engineering Leadership May 2015 With Gabe Fierro, Ben Balsmeier, Guan-Cheng Li, Kevin Johnson, Aditya Kaulagi, Douglas O'Reagan, Bill Yeh We gratefully acknowledge support from the National Science Foundation Grant #1064182, the US Patent and Trademark Office, and the American Institutes for Research

Transcript of The Fung Institute Patent Lab: Products and Future Plans

Page 1: The Fung Institute Patent Lab: Products and Future Plans


Page 2: The Fung Institute Patent Lab: Products and Future Plans

My objectives for today’s chat

•  Give you an understanding of our work
–  Disambiguation (upcoming JEMS paper)
–  Visualization and tools
–  Future plans (PAIR)
•  Get your feedback on our research
•  Help me understand bigger picture of data efforts in innovation and entrepreneurship
–  I want to get our stuff used
–  and at the same time, aid replication and help our field to stop re-inventing inferior wheels

Page 3: The Fung Institute Patent Lab: Products and Future Plans

Continuing opportunity with patent data

•  Despite many papers, basic data remain inaccessible
–  Unstructured and dirty text is difficult to aggregate across entities
–  (Semi) manual and uncoordinated efforts to date for granted patents
•  We provide parsing, a database, and automatic disambiguation of grants and applications:
–  inventors
–  assignees
–  patent lawyers’ firms
–  location
•  Everything made public and supportive of complementary efforts (mainly AIR and USPTO)

Page 4: The Fung Institute Patent Lab: Products and Future Plans

Basic data flow (~2-3 weeks)

Page 5: The Fung Institute Patent Lab: Products and Future Plans

Conceptual database schema (figure: database-simplified.svg, 10/18/13)

Entities: Patent, Application, Citation, OtherReference, USRelDoc, USPC, MainClass, SubClass, IPCR, Inventor, RawInventor, Assignee, RawAssignee, Lawyer, RawLawyer, Location, RawLocation

Relationships labeled in the figure:
–  Patent to Lawyer, Assignee, Inventor
–  Patent to RawLawyer, RawAssignee, RawInventor
–  RawLawyer to Lawyer, RawInventor to Inventor, RawAssignee to Assignee, RawLocation to Location
–  Location to Assignee and Inventor; RawLocation to RawAssignee and RawInventor
–  Patent to USPC and IPCR; USPC to MainClass and SubClass
–  Patent to Application, Citation (citations, citedby), USRelDoc, OtherReference
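A minimal sketch of how a schema of this shape could be queried, using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical and heavily simplified; they are not the lab's actual schema. The sketch only illustrates raw inventor records pointing at a disambiguated inventor identifier.

```python
import sqlite3

# Hypothetical, simplified tables; the real database has many more
# entities and columns than this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patent (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE inventor (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE rawinventor (
    patent_id TEXT REFERENCES patent(id),
    inventor_id TEXT REFERENCES inventor(id),
    name_as_printed TEXT
);
""")

conn.executemany("INSERT INTO patent VALUES (?, ?)", [
    ("P1", "Speech coder"),
    ("P2", "Voice recognizer"),
    ("P3", "Valve assembly"),
])
conn.executemany("INSERT INTO inventor VALUES (?, ?)", [
    ("I1", "J SMITH"),
    ("I2", "A JONES"),
])
# Two differently printed names resolve to the same disambiguated inventor.
conn.executemany("INSERT INTO rawinventor VALUES (?, ?, ?)", [
    ("P1", "I1", "J. Smith"),
    ("P2", "I1", "John Smith"),
    ("P3", "I2", "A. Jones"),
])

# Count patents per disambiguated inventor by joining through the raw records.
rows = conn.execute("""
    SELECT i.id, COUNT(DISTINCT r.patent_id) AS n_patents
    FROM inventor AS i
    JOIN rawinventor AS r ON r.inventor_id = i.id
    GROUP BY i.id
    ORDER BY i.id
""").fetchall()
print(rows)
```

Keeping the raw record separate from the disambiguated entity means a later disambiguation run only has to rewrite the link column, not the printed names.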

Page 6: The Fung Institute Patent Lab: Products and Future Plans

Accessible data: monthly disambiguated grant and application data, Jan ‘75 – Dec ‘14: http://funglab.berkeley.edu/database

•  Parse, clean, disambiguate:
–  inventors
–  geography (Google lookup)
–  assignee (crude Jaro-Winkler)
–  lawyer (crude Jaro-Winkler)
–  consistent inventor identifiers
–  cites, claims, non-pat refs…
–  .csv download or SQL query
–  future: blocking, tech control
–  > 300M observations (not all characterized yet); ~50GB

Page 7: The Fung Institute Patent Lab: Products and Future Plans

Will the real Matt Marx please stand up?

Plainview NY Everett MA Mt View CA

Class 704

Page 8: The Fung Institute Patent Lab: Products and Future Plans

Disambiguation: a classifier problem

•  Popular methods (we currently use the last three):
–  Manual
–  Linear weighting + manual tuning
–  Naïve Bayes, supervised and semi-supervised
–  String matching
–  K-means intra and inter cluster optimization
–  Look up (Google provided access to library)
•  Active research topic in machine learning
•  Julia Lane is planning a contest
•  Had more complex approach (Li et al. 2014)
–  latest is simpler, faster, supportable, improvable
–  though not as accurate yet – tends to oversplit

Page 9: The Fung Institute Patent Lab: Products and Future Plans

Inventor disambiguation

•  Start with (block on) exact name matches
•  Euclidean distance for exact attribute matches
•  Balance min intra cluster and max inter cluster distances
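The first two steps can be sketched roughly as follows. This is one possible reading, not the lab's code: records are grouped by exact inventor name, and each remaining attribute contributes 0 to the distance on an exact match and 1 otherwise. The record layout and all values are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical records: (patent_id, inventor_name, city, assignee, tech_class)
records = [
    ("P1", "J SMITH", "Plainview NY", "ACME", "704"),
    ("P2", "J SMITH", "Everett MA", "ACME", "704"),
    ("P3", "J SMITH", "Mt View CA", "OTHER", "123"),
    ("P4", "A JONES", "Everett MA", "ACME", "704"),
]

def block_by_name(recs):
    """Group records whose inventor name matches exactly."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[rec[1]].append(rec)
    return dict(blocks)

def attribute_distance(a, b):
    """Euclidean distance over 0/1 mismatch indicators (assumed encoding)."""
    mismatches = [0.0 if x == y else 1.0 for x, y in zip(a[2:], b[2:])]
    return sqrt(sum(m * m for m in mismatches))

blocks = block_by_name(records)
print(len(blocks["J SMITH"]))                      # records sharing the name
print(attribute_distance(records[0], records[1]))  # only the city differs
print(attribute_distance(records[0], records[2]))  # all three attributes differ
```

Blocking keeps the pairwise comparisons within a name group, so the cost grows with the size of each block rather than with the whole corpus.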

Page 10: The Fung Institute Patent Lab: Products and Future Plans

•  Look for no further improvement

–  4 in this case
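A rough sketch of the stopping rule: increase the number of clusters until the balance between intra-cluster and inter-cluster distance stops improving. The objective used here (mean distance to own centroid divided by mean distance between centroids) and the deterministic farthest-point initialization are assumptions chosen for illustration; the slides do not specify either. The points are hypothetical two-dimensional feature vectors.

```python
from itertools import combinations
from math import dist  # Python 3.8+

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centers[j])  # keep the old center for an empty cluster
        if updated == centers:
            break
        centers = updated
    labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
    return labels, centers

def balance(points, labels, centers):
    """Assumed objective: mean intra-cluster over mean inter-cluster distance (lower is better)."""
    intra = sum(dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)
    pairs = list(combinations(centers, 2))
    inter = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return intra / inter if inter > 0 else float("inf")

def choose_k(points):
    """Start at k=2 and stop at the first k that brings no further improvement.

    Needs at least two distinct points; deciding between one and two
    clusters would need a separate rule, since the ratio is undefined at k=1.
    """
    best_k, best = 2, balance(points, *kmeans(points, 2))
    for k in range(3, len(set(points)) + 1):
        score = balance(points, *kmeans(points, k))
        if score >= best:
            break
        best_k, best = k, score
    return best_k

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(choose_k(points))
```

With these six points the ratio worsens when a third cluster is added, so the search stops at two clusters.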

Page 11: The Fung Institute Patent Lab: Products and Future Plans

•  Re-label each column with a cluster
•  Relax exact name match and merge
•  Use correlation of co-authors as well

Page 12: The Fung Institute Patent Lab: Products and Future Plans

Future of inventor disambiguation

•  Relax strict matching
•  Bring in additional data
–  All tech fields
–  Lexical overlap
–  Law firms
–  Prior art citations and non patent references
•  New algorithms
•  Make everything public and support AIR tournament

Page 13: The Fung Institute Patent Lab: Products and Future Plans

Assignee disambiguation

•  Jaro-Winkler after simple string cleaning
•  Unique assignees from 6,700,000 to 507,000
•  Identifier, raw and cleaned name available
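A minimal sketch of this step: a simple cleaning pass followed by the standard Jaro-Winkler similarity, written in plain Python. The cleaning rules (uppercase, strip punctuation, collapse spaces) are an assumption about what "simple string cleaning" could mean, and the example names are illustrative.

```python
import re

def clean(name):
    """Assumed cleaning: uppercase, drop punctuation, collapse whitespace."""
    name = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    return " ".join(name.split())

def jaro(s, t):
    """Standard Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(clean("General Electric Co."))
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
print(jaro_winkler(clean("General Electric Co."), clean("GENERAL ELECTRIC CO")))
```

In practice two cleaned names would be merged when the score exceeds a threshold; the threshold is a tuning choice the slides do not state.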

Page 14: The Fung Institute Patent Lab: Products and Future Plans

Future of assignee disambiguation

•  Coordinate with NBER and HBS efforts
–  The field needs to curate and maintain cumulative progress
•  CONAME data from USPTO
•  Normalize common affixes
•  Train with manually developed NBER disambiguation
•  Apply inventor algorithm
•  Provide Compustat identifier
•  Add subsidiary information
–  BvD sample of 6,000 major U.S. firms revealed 50,000 subsidiaries under parental control (>50% in 2012)
–  GE: 250 subsidiaries, ~98% of patents filed under GE

Page 15: The Fung Institute Patent Lab: Products and Future Plans

Law firms

•  Similar algorithms to assignees
•  Not aware of any applications yet

Page 16: The Fung Institute Patent Lab: Products and Future Plans

Locations

•  Use Google’s geocoding API
•  Unique cities from 333K to 66K
•  City, region, country
–  Lat and Long being developed
–  Do not provide street level data

Page 17: The Fung Institute Patent Lab: Products and Future Plans

If you’re allergic to SQL: http://rosencrantz.berkeley.edu

Page 18: The Fung Institute Patent Lab: Products and Future Plans

Approximate results (full 2014 data in process)

http://funglab.berkeley.edu/database

Page 19: The Fung Institute Patent Lab: Products and Future Plans

Tools and applications

•  Look for this stuff and high level explanations at:
–  http://www.funginstitute.berkeley.edu/blog-categories/faculty-directors-blog#

Page 20: The Fung Institute Patent Lab: Products and Future Plans

Visualizations

•  Clean tech inventions mapped by type and source
•  Inventor mobility movies
•  Patent location in technology “space”
•  The convergence and divergence, the coalescence and reconfiguration of components – the flow of technology – over time
•  Visualizing the patent application process

Page 21: The Fung Institute Patent Lab: Products and Future Plans

Clean Tech Patent Mapper

•  Li, G., K. Paisner, “A List of Clean Tech Patents.”
•  http://funglab.berkeley.edu/cleantechx/
•  Energy: wind, solar, bio, hydro, geo, nuclear
•  Assignee: VC backed, university, government, large and small incumbents, no assignee

Page 22: The Fung Institute Patent Lab: Products and Future Plans

VC patents 1990-1999

Innovation and Entrepreneurship in Clean Energy: Nanda, Younge, Fleming

Note scale of funding activity 1990-1999

Page 23: The Fung Institute Patent Lab: Products and Future Plans

VC patents 2000-2009

Innovation and Entrepreneurship in Clean Energy: Nanda, Younge, Fleming

See Nanda, R., K. Younge, and L. Fleming. “Innovation and Entrepreneurship in Clean Energy,” forthcoming in Rethinking Science and Innovation Policy, NBER.

Much greater funding activity 2000-2009

Page 24: The Fung Institute Patent Lab: Products and Future Plans

Midwest clean tech

Page 25: The Fung Institute Patent Lab: Products and Future Plans

Kansas City clean tech

Page 26: The Fung Institute Patent Lab: Products and Future Plans

Mobility mapper: http://funglab.berkeley.edu/mobility/

•  Larger states
•  Example: 1987 immigration to MI (note one IL inventor):

Page 27: The Fung Institute Patent Lab: Products and Future Plans

Figure panels labeled 1987 and 1982

Illustrates causal impact of noncompetes on brain drain (Marx, Singh, Fleming, forthcoming in Research Policy)

Page 28: The Fung Institute Patent Lab: Products and Future Plans


Variety of states

Page 29: The Fung Institute Patent Lab: Products and Future Plans

Visualizing an acquisition

Page 30: The Fung Institute Patent Lab: Products and Future Plans

Acknowledgment of government support

–  Hillary Greene, Dennis Yao, Guan Cheng
•  What proportion of 2015 patents can be traced to govt?

Page 31: The Fung Institute Patent Lab: Products and Future Plans

5M patent applications as a Markov process? Starting with an analysis of Bilski vs. Kappos
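One way to picture the Markov framing: treat each application's history as a sequence of states and estimate transition probabilities from counts. The state names and histories below are hypothetical and are not drawn from PAIR data; this is a sketch of the general technique only.

```python
from collections import Counter, defaultdict

# Hypothetical prosecution histories, one list of states per application.
histories = [
    ["filed", "rejected", "amended", "allowed"],
    ["filed", "rejected", "abandoned"],
    ["filed", "allowed"],
    ["filed", "rejected", "amended", "rejected", "abandoned"],
]

def transition_probabilities(sequences):
    """Maximum-likelihood first-order transition probabilities from observed sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

probs = transition_probabilities(histories)
print(probs["filed"])
print(probs["rejected"])
```

A first-order model like this ignores how many earlier rejections an application has seen; adding that history to the state would be one possible refinement.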

Page 32: The Fung Institute Patent Lab: Products and Future Plans

Network Interface – http://douglasoreagan.com/socialnetwork/

Page 33: The Fung Institute Patent Lab: Products and Future Plans

Semiconductor patents in 438/283 from 1998-2000

Page 34: The Fung Institute Patent Lab: Products and Future Plans

Method to illustrate network around seed inventors

Page 35: The Fung Institute Patent Lab: Products and Future Plans

Cool pics – but what do they mean?

–  Need to validate visualizations with ground truth
–  Mixed visualization and historical study of biggest semiconductor breakthrough of last decade – the FinFET

Page 36: The Fung Institute Patent Lab: Products and Future Plans

Why FinFET?

•  Study intended to explore/develop breakthrough visualization tools
–  tie to reality w/o conflating variables
•  All patents Northern CA 1995-2000
•  Ranked by future citations
•  Tech distance
–  from our brains, close but moldy
•  Geographic distance
–  about 40 yards
•  Social distance
–  head of search committee that hired me
–  neighbor

Page 37: The Fung Institute Patent Lab: Products and Future Plans

Quintessential architectural breakthrough

Source: King 2012

Page 38: The Fung Institute Patent Lab: Products and Future Plans

Inventors brokered social and academic/industry networks

Page 39: The Fung Institute Patent Lab: Products and Future Plans

But they also integrated outsiders

Page 40: The Fung Institute Patent Lab: Products and Future Plans

The flow of technology

1)  Words are components -> little differentiation, this is so incremental

2)  No geographic localization of trajectories

3)  How did university plop in and do this?

4)  FinFET may have been only govt supported patent

Page 41: The Fung Institute Patent Lab: Products and Future Plans

Coming attractions

•  Blocking actions – better than citations as a measure of patent impact?
•  Lexical novelty
–  First appearance of new word in corpus
–  First pair-wise combination of words
•  Lexical distance between classes
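The two lexical novelty measures can be sketched as a single pass over a date-ordered corpus: a word or word pair is novel for the first patent in which it appears. The corpus, the whitespace tokenization, and the function name are hypothetical simplifications.

```python
from itertools import combinations

# Hypothetical corpus of (patent_id, text), assumed already sorted by filing date.
corpus = [
    ("P1", "gate oxide transistor"),
    ("P2", "fin transistor gate"),
    ("P3", "fin gate oxide"),
]

def lexical_novelty(docs):
    """For each document, list words and unordered word pairs not seen in earlier documents."""
    seen_words, seen_pairs = set(), set()
    result = {}
    for patent_id, text in docs:
        words = set(text.lower().split())
        # Sorting first makes each pair a canonical (alphabetical) tuple.
        pairs = set(combinations(sorted(words), 2))
        result[patent_id] = {
            "new_words": sorted(words - seen_words),
            "new_pairs": sorted(pairs - seen_pairs),
        }
        seen_words |= words
        seen_pairs |= pairs
    return result

novelty = lexical_novelty(corpus)
print(novelty["P2"])
print(novelty["P3"])
```

Pair counts grow quadratically with document length, so a full-corpus version would likely restrict the vocabulary (for example by dropping stop words) before forming pairs.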

Page 42: The Fung Institute Patent Lab: Products and Future Plans

Identification of blocking patents – pdf challenges: OCR 101,195 PDF files…

Page 43: The Fung Institute Patent Lab: Products and Future Plans

Claim Rejections – 35 USC 103 3. The folowing is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth …

Detail Enhancement

Noise Reduction

OCR

Page 44: The Fung Institute Patent Lab: Products and Future Plans

OCRed blocking data

Page 45: The Fung Institute Patent Lab: Products and Future Plans

First results from 2012

•  2011 now complete as well
•  Need to characterize each type of action

Page 46: The Fung Institute Patent Lab: Products and Future Plans

I may come to you tin cup in hand…

•  Download, parse, clean, disambiguate, store and serve up > 300M observations (and weekly updates)
–  Julia Lane taking over part of this
•  Blocking data: must OCR ~400M documents
•  Disambiguation takes weeks, PAIR years
–  ~$150K hardware alone past year
–  database person in Si Valley (~$140K + Cal tax)
•  Mention maintenance in NSF proposal => ding
•  Public good (~50,000 downloads)
•  Talking with firms and private philanthropy