Visually Exploring Patent Collections for Events and Patterns

Derek X. Wang, Associate Director of the Charlotte Visualization Center

Together with: Wenwen Dou, Wlodek Zadrozny, Suraj Ankam, Debbie Strumsky, Terry Rabinowitz

Description

My talk on patent visualization at the 3rd IEEE Workshop on Interactive Visual Text Analytics. The primary focus is to introduce the scalable visual analytics research that my team is working on. The workshop paper is available at: http://vialab.science.uoit.ca/textvis2013/papers/Ankam-TextVis2013.pdf

Transcript of Visually Exploring Patent Collections for Events and Patterns

Page 1: Visually Exploring Patent Collections for Events and Patterns

Visually Exploring Patent Collections for Events and Patterns

Derek X. Wang

Associate Director of the Charlotte Visualization Center

Together with: Wenwen Dou, Wlodek Zadrozny, Suraj Ankam, Debbie Strumsky, Terry Rabinowitz

Page 7

Value

Businesses

• 800 patents: $1 billion worth of patents from AOL to Microsoft

• 1,100 patents from Kodak: $525 million to a licensing group

• 17,000 patents: $12.5 billion, Motorola Mobility to Google

Page 13

Technology

[Figure: topics over time, 2006-2010]

Cyan topic: variable, uncertainty, trend, correlation, linear, multivariate, sensitivity

Blue topic: dimension, quality, cluster, measure, lda, attribute, reduction, projection

Dataset: 123 publications from the VAST proceedings, 2006-2010.

FODAVA

**X. Wang et al., ParallelTopics: A probabilistic approach to exploring document collections, IEEE VAST 2011

Page 20

Goal

• Can we spot an emerging new technology?

  – Text mining and visualization

• Can we spot novelty within a patent?

  – How much do claims differ from class descriptions?

  – How much do claims differ from claims in other similar patents?

• Can we list “all” patents relevant for some technology? (And what does “all” mean?)

Page 23

A Robust and Scalable Patent Analysis Infrastructure Is Needed

Visual Analytics Will Play a Key Role

Technology

Human + Computer = Balanced Analytics

Page 26

Challenge

• Unstructured or semi-structured

• Highly heterogeneous

• Leading to highly heterogeneous models

• Incomplete or with holes

• With intrinsic uncertainty (and in some cases deception)

• Inside and outside the enterprise

• Containing detailed time and space information:

Page 33

Research

Structuring the Unstructured: Topic Modeling

• Latent Dirichlet Allocation (LDA)

• Reveals latent topics in large text corpora

• Describes each topic by a coherent set of its most likely words

• Topics are defined by keyword groups

• Topics in text collections can be inferred effectively
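The LDA bullets above can be made concrete with a small collapsed Gibbs sampler. This is a minimal sketch under stated assumptions, not the team's Parallel-LDA implementation: documents are assumed to be already tokenized, and the function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, and the iteration count are illustrative choices. Each inferred topic is a distribution over words, whose most likely words form the topic's keyword group, and each document is a mixture of topics.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (topic_word, doc_topic): topic_word[k][w] estimates p(w | topic k),
    doc_topic[d][k] estimates p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    topics = range(num_topics)
    n_dk = [[0] * num_topics for _ in docs]   # tokens per (document, topic)
    n_kw = [defaultdict(int) for _ in topics]  # tokens per (topic, word)
    n_k = [0] * num_topics                     # tokens per topic
    z = []                                     # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts ...
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ... and resample its topic from the full conditional.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta)
                    for j in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in vocab}
        for k in topics
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


def top_words(topic_word, k, n=3):
    """The n most likely words of topic k (the topic's keyword group)."""
    return sorted(topic_word[k], key=topic_word[k].get, reverse=True)[:n]


if __name__ == "__main__":
    docs = [["antenna", "signal", "antenna", "band"],
            ["storage", "software", "storage", "memory"]] * 5
    topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
    for k in range(2):
        print(k, top_words(topic_word, k))
```

Collapsed Gibbs sampling is one common way to infer LDA; variational inference is another, and either yields the same two kinds of output shown here.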

Page 38

Structuring the Unstructured: Investigative Element Extraction

• Recognition of entities, including people, locations, buildings and organizations

• Recognition of times and dates

• Construction of a near-real-time analysis pipeline for entity association
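One simple reading of “entity association” is co-occurrence: two elements are associated when they appear in the same document, weighted by how often that happens. The sketch below assumes that definition. The `GAZETTEER` dictionary and the date regular expression are hypothetical stand-ins for a trained recognizer (the talk later mentions an OpenNLP-based extractor), and the example entities are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer standing in for a trained entity recognizer.
GAZETTEER = {
    "Apple": "ORGANIZATION",
    "Motorola": "ORGANIZATION",
    "Charlotte": "LOCATION",
}
# ISO dates such as 2007-01-09, or bare years from 1900 to 2099.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b(?:19|20)\d{2}\b")


def extract_elements(text):
    """Return the set of (type, value) elements found in one document."""
    elements = set()
    for name, etype in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            elements.add((etype, name))
    for match in DATE_PATTERN.finditer(text):
        elements.add(("DATE", match.group()))
    return elements


def associate(documents, counts=None):
    """Count co-occurring element pairs; pass `counts` back in to update it incrementally."""
    counts = Counter() if counts is None else counts
    for text in documents:
        for pair in combinations(sorted(extract_elements(text)), 2):
            counts[pair] += 1
    return counts
```

Passing the returned `Counter` back in as `counts` updates it with new documents instead of recomputing from scratch, which is the kind of incremental step a near-real-time pipeline would need.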

Page 43

Structuring the Unstructured: Event Structuring

Events: meaningful occurrences in space and time

[Figure labels: Motivating Event; Particular Topic Stream]

Narrative: a series of clustered (event-based) stories, temporally linked based on content similarity.
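A narrative as defined above can be approximated by chaining event clusters across consecutive time windows. The sketch assumes each cluster is summarized as a bag of words, uses cosine similarity as the content-similarity measure, and links greedily to the best match above a hypothetical `threshold`; none of these choices come from the talk.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_narratives(windows, threshold=0.5):
    """Chain event clusters across consecutive time windows into narratives.

    windows: time-ordered list; each entry is a list of Counters, one per
    event cluster in that window. Returns narratives as lists of
    (window_index, cluster_index) pairs.
    """
    if not windows:
        return []
    narratives = [[(0, c)] for c in range(len(windows[0]))]
    for t in range(1, len(windows)):
        taken = set()
        for story in narratives:
            last_t, last_c = story[-1]
            if last_t != t - 1:
                continue  # this narrative ended in an earlier window
            candidates = [
                (cosine(windows[last_t][last_c], cluster), c)
                for c, cluster in enumerate(windows[t])
                if c not in taken
            ]
            if candidates:
                score, best = max(candidates)
                if score >= threshold:
                    story.append((t, best))
                    taken.add(best)
        # Clusters that continue no existing narrative start a new one.
        narratives.extend([(t, c)] for c in range(len(windows[t])) if c not in taken)
    return narratives
```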

Page 48

Results

Can we spot an emerging new technology?

Data: 50,000 telecommunication patents from the past 10 years (abstract text and patent meta-information); 1.5 GB of raw patent documents

Methods: Topic modeling and visualization

Results: We can see a significant change in the topic “software and storage” in communication patents around 2007 (corresponding to the Apple iPhone?)

Page 50

Can we spot an emerging new technology?

Model:

• 100 topics

• Each topic is a distribution over words

• Each abstract is a combination of topics

Note: The width of the graph is proportional to the number of patents and the number of words from a particular topic (topic signal strength). The number of class 455 patents grew from 2,234 in 2005 to 7,647 in 2012.

**W. Dou et al., HierarchicalTopics: Visually Exploring Large Text Collections Using Topic Hierarchies, IEEE VAST 2013
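The note above ties stream width to both the number of patents and the number of words drawn from a topic. Assuming each abstract has topic proportions from a model such as the 100-topic one above, one way to get such a width is the expected number of topic words per year: the sum, over that year's abstracts, of abstract length times topic share. The function name `topic_signal_strength` and its inputs are hypothetical; this is a sketch, not the HierarchicalTopics code.

```python
from collections import defaultdict


def topic_signal_strength(doc_years, doc_lengths, doc_topic):
    """Expected number of words drawn from each topic, per year.

    doc_years[d]   -- filing year of abstract d
    doc_lengths[d] -- number of word tokens in abstract d
    doc_topic[d]   -- topic proportions of abstract d (sums to 1)
    Returns {year: [strength of topic 0, strength of topic 1, ...]}.
    """
    num_topics = len(doc_topic[0])
    strength = defaultdict(lambda: [0.0] * num_topics)
    for year, length, theta in zip(doc_years, doc_lengths, doc_topic):
        for k, share in enumerate(theta):
            # More abstracts, longer abstracts, or a larger share all widen the stream.
            strength[year][k] += length * share
    return dict(strength)
```

Because this measure grows with filing volume, the rise in class 455 patents from 2,234 to 7,647 (7,647 / 2,234 ≈ 3.4 times as many) would widen every stream on its own; dividing each year's strengths by their total is one way to separate shifts in topic share from growth in volume.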

Page 54

Emergent: “storage, software, …”

Typical keyword: “transistor”

Page 57

Can we spot novelty within an existing patent?

Data:

• Initially: a random sample of 40 patents in several classes, with a focus on class 455 (telecom).

• Recently: confirmed through automated analysis of several subclasses of 455.

Method: Compare the words in the claims with the words in the class plus subclass definition.

Results: Large symmetric differences

$\mathrm{words}(\mathrm{Claims}) \div \mathrm{words}(\mathrm{Definition})$

$\mathrm{words}(\mathrm{Abstracts}) \div \mathrm{words}(\mathrm{Definition})$
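The claims-versus-definition comparison can be sketched with Python sets. The slide does not spell out the ÷ operation, so this sketch simply reports the shared words, the words unique to each side, the size of the symmetric difference, and the Jaccard distance |A △ B| / |A ∪ B| as one assumed normalization. The stop-word list is illustrative, and no stemming is applied, so “queens” and “queen” count as different words.

```python
import re

# Illustrative stop-word list (an assumption; the talk does not give one).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "with", "that",
              "said", "at", "from", "is", "are", "this", "under", "as", "or", "be"}


def words(text):
    """Lower-cased set of words in a text, without stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def compare(claims, definition):
    """Compare the vocabulary of patent claims with class plus subclass definition text."""
    c, d = words(claims), words(definition)
    union = c | d
    return {
        "shared": c & d,
        "claims_only": c - d,            # candidate novelty terms
        "definition_only": d - c,
        "symmetric_difference": len(c ^ d),
        "jaccard_distance": len(c ^ d) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    claim = ("A process for rearing bumblebee queens comprising generating "
             "a colony with workers")
    # Subclass 8 definition plus the subclass 1 definition ("Process.").
    definitions = ("Structure with provision to encourage and care for the production "
                   "of a bee larvae into a queen bee. Process.")
    print(compare(claim, definitions))
```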

Page 58

Example

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&p=1&u=%2Fnetahtml%2Fsearch-bool.html&r=2&f=G&l=50&d=pall&s1=449%2F8.CCLS.&OS=CCL/449/8&RS=CCL/449/8

Patent Title: Process for rearing bumblebee queens and process for rearing bumblebees

Main Classification: 449/1; 449/2; 449/8

Class 449 – Bee Culture / Subclass 1

Class 449 – Bee Culture / Subclass 8

Page 59

We claim:

1. A process for rearing bumblebee queens (genus Bombus) comprising generating a colony with workers in the presence of fertilized eggs and/or larvae from at least one colony, in a room with a controlled climate provided with food, and allowing the colony to grow until bumblebee queens are produced, wherein subadult and/or adult workers that originate from at least one different colony are brought together with said fertilized eggs and/or larvae.

2. The process according to claim 1, wherein the workers that originate from said at least one different colony are brought together with a young colony in the eusocial phase, consisting of a fertilized queen, brood and the first born workers.

3. The process according to claim 1, wherein more than 100 workers are brought together.

4. The process according to claim 1, wherein rearing is carried out using a workers: fertilized eggs ratio of 0.5-4.

5. The process according to claim 1, wherein the workers originating from said at least one different colony are first kept in a room without any queen and without brood for one day.

6. The process according to claim 1, wherein brood and workers from different bumblebee species are brought together.

7. A process for rearing bumblebees (genus Bombus), comprising rearing bumblebee queens by generating a colony with workers in the presence of fertilized eggs and/or larvae from at least one colony, in a room with a controlled climate provided with food, and allowing the colony to grow, wherein subadult and/or adult workers that originate from at least one different colony are brought together with said fertilized eggs and/or larvae, and using said bumblebee queens for rearing bumblebees.

Page 61

Subclass nesting in Class 449:

• 1 -> Class Definition

• 8 -> 7 -> 3 -> Class Definition

Class Name: Bee Culture

Class Definition: This class includes the methods of and structures for propagating, raising and caring for bees; as well as certain ancillary methods and structures.

Page 62

Class 449, Subclass 1

Subclass Name: Method

Subclass Definition: This subclass is indented under the class definition. Process.

Page 63

Class 449, Subclass 8

Subclass Name: Queen Raising

Subclass Definition: This subclass is indented under subclass 7. Structure with provision to encourage and care for the production of a bee larvae into a queen bee.

Page 64

Words in the class/subclass definitions found in the patent claims (occurrences in the claims):

method: 0
colony: 11
process: 7
culture: 0
queen: 6
propagate: 0
raise: 0
encourage: 0
care: 0
larvae: 4
production: 1
bee: 7
multi: 0
swarm: 0
capture: 0
house: 0
hive: 0
structure: 0

Page 65

Words in the claims that were not in the definitions (occurrences in the claims):

rearing: 5
worker: 10
egg: 5
fertilize: 6
climate: 2
food: 2
different: 5
control: 2

Page 67

Can we spot novelty within an existing patent?

Observations

• Novelty lies in words/relations that are not part of the definition (but appear in the patent claims or abstract)

• Some things can be left unsaid. Is there a boundary?

• This happens in all patents (but the degree varies)

Page 68

Can we spot novelty within an existing patent?

Next

• Opportunity to text-mine these differences

  – Are they random on a time scale?

  – Would descriptions of emerging technologies emerge from these patterns?

  – Do combination patents have more of these?

Page 74

Can we list “all” patents relevant for some technology?

– Data: patents, Wikipedia

– Potential data: cell phone manuals or other descriptions

– Method: text mining of patents in certain classes, text mining of filings by certain market/technology players, and text mining of other patents, using Wikipedia and manuals as guidance on what to look for.
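As one illustration of using Wikipedia and manuals as guidance, candidate patents can be ranked by TF-IDF cosine similarity to a guiding text. This is a swapped-in technique, not necessarily what the team used; `rank_patents` and its inputs are hypothetical, and a ranking still leaves open where to cut it off, which is part of the question of what “all” means.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """TF-IDF vectors as dicts, with idf(w) = log(N / df(w)) + 1."""
    counts = [Counter(tokenize(t)) for t in texts]
    df = Counter(word for c in counts for word in c)
    idf = {word: math.log(len(texts) / n) + 1.0 for word, n in df.items()}
    return [{w: tf * idf[w] for w, tf in c.items()} for c in counts], idf


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_patents(guide_text, patents, top_n=10):
    """Rank patent texts by similarity to a guiding description of the technology."""
    vectors, idf = tfidf_vectors(patents)
    # Words of the guide that never occur in the patents cannot match anything.
    query = {w: tf * idf[w] for w, tf in Counter(tokenize(guide_text)).items() if w in idf}
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(i, score) for score, i in scored[:top_n]]
```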

Page 81

Scale

Scalable Computing Architecture for Extracting Latent Topics and Events*

**X. Wang et al., I-SI: Scalable Visual Analytics Architecture for Analyzing Latent Topical-Level Information From Social Media Data, Journal of Computer Graphics Forum, 2012

Distributed Data Storage and Pre-Processing Environment

• MapReduce procedures for data cleaning and pre-processing

• A distributed storage solution (MongoDB) is used for data storage, analysis and retrieval
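The map and reduce steps for pre-processing can be sketched in plain Python. This single-process simulation is an assumption made for illustration: the hypothetical `map_clean` strips markup and stop words from one document, `shuffle` groups values by key as the framework would, and `reduce_df` builds a document frequency and posting list per term. On a real cluster the same functions would run under a MapReduce framework and the output would go to the distributed store (MongoDB in the talk), which is not shown.

```python
import re
from collections import defaultdict
from itertools import chain

STOP_WORDS = {"a", "an", "the", "and", "of", "for", "to", "in", "with"}


def map_clean(doc_id, text):
    """Map step: strip tags, lower-case, drop stop words, emit (term, doc_id) once per term."""
    text = re.sub(r"<[^>]+>", " ", text)  # remove HTML/XML tags
    terms = {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}
    return [(term, doc_id) for term in sorted(terms)]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_df(term, doc_ids):
    """Reduce step: document frequency and posting list for one term."""
    return term, {"df": len(doc_ids), "docs": sorted(doc_ids)}


def run(corpus):
    """corpus: {doc_id: raw text}. Returns {term: {"df": ..., "docs": [...]}}."""
    mapped = chain.from_iterable(map_clean(d, t) for d, t in corpus.items())
    return dict(reduce_df(term, ids) for term, ids in shuffle(mapped).items())
```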

Page 82

• MapReduce-based social media crawlers for Twitter, blogs and news articles

  – Unstructured contents: textual information, images, comments

  – Structured contents: user graph, geo-tags, hashtags

Page 84

Parallel Data Analytics Cluster

• MPI-based Parallel-LDA implementation for topic modeling, with memory-sharing optimization
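The talk does not detail its MPI-based Parallel-LDA, so the sketch below shows a different, well-known scheme for comparison: the synchronization step of approximate distributed LDA (AD-LDA, Newman et al.). Each worker samples its own shard of documents against a private copy of the global topic-word counts, and the copies are then reconciled by adding every worker's changes to the old global counts. The worker sweeps themselves (for instance, a collapsed Gibbs sweep) and the MPI communication are not shown; with MPI, the sum of changes could be computed with an allreduce. The function names are hypothetical.

```python
from collections import Counter


def partition(doc_ids, num_workers):
    """Round-robin split of documents into one shard per worker (e.g. per MPI rank)."""
    return [doc_ids[rank::num_workers] for rank in range(num_workers)]


def merge_topic_word_counts(global_counts, worker_counts):
    """AD-LDA synchronization: new global = old global + sum of every worker's change.

    global_counts: Counter of (topic, word) -> count before the sweep.
    worker_counts: one Counter per worker; each started as a copy of
    global_counts and was updated by a sweep over that worker's shard.
    """
    merged = Counter(global_counts)
    for local in worker_counts:
        for key in set(local) | set(global_counts):
            merged[key] += local[key] - global_counts[key]
    return +merged  # unary plus drops entries whose count is zero (or negative)
```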

Page 85

• OpenNLP-based parallel implementation for entity extraction

• Customized PBS to schedule jobs for the parallel computing environment

Page 86

News Briefing App

Page 87

Resources we’d be happy to share

• Complete US patents and applications (through Q1 2013) with a search engine (Lucene) interface

• Patent classes

• Other text resources (Wikipedia, Wiktionary, etc.)

We’d be happy to prepare specialized extracts or combinations for those who need them.

Page 88

Thank you!

Derek Xiaoyu Wang [email protected]

News Briefing App @News_Briefing

Now FREE at App Store