Supplemental Information

1. Patent Search Methodology

Thomson Innovation and Thomson Data Analyzer software were used to perform patent data mining. The search included both granted patents and published patent applications. Keyword searches were performed on each patent’s title, abstract and claims sections across a complete global dataset of 50 patent authorities. All searches covered the period September 1, 2006 – December 27, 2013 (Supp. Table 2). The applicant and inventor data were cleaned to remove duplicate entries arising from spelling errors, initialisation, international variation (Ltd, Pty, GmbH, etc.) or equivalence (Ltd, Limited, etc.).

The initial dataset is the most inclusive of the wider range of possible iPSC patent technologies: it includes all generation, cell-culturing and miscellaneous patents, a large sampling of differentiation patents and some over-inclusion of ESC-related patents due to overlap in terminology. The first cleaned dataset comprised 1,015 applications and granted patents. It represents a compromise position, which was later cleaned further to home in on patents related to iPSC-production technologies. Harvested documents may include invalid, discontinued or denied applications. Random manual review of harvested documents was performed to further clean subsets of denied or irrelevant patents.
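The applicant-name cleaning described above was done in Thomson Data Analyzer, and the source does not give its rules. As a rough illustration only, the sketch below shows how variants that differ in case, punctuation or legal suffix could be grouped. The function names and sample applicants are hypothetical, the suffix list holds only the forms named in the text, and spelling errors or initialisation are not handled.

```python
import re
from collections import defaultdict

# Assumption: only the legal forms named in the text; a real list would be longer.
LEGAL_SUFFIXES = {"ltd", "limited", "pty", "gmbh"}


def normalize_applicant(name: str) -> str:
    """Build a comparison key: lower-case, strip punctuation, drop legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def group_applicants(names):
    """Group raw applicant strings that share the same normalized key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_applicant(raw)].append(raw)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical applicant strings, for illustration only.
    sample = ["Acme Cells Ltd", "ACME Cells Limited", "Acme Cells Pty. Ltd.", "Beta Bio GmbH"]
    for key, variants in group_applicants(sample).items():
        print(key, "<-", variants)
```

Grouping on a normalized key keeps the raw strings available, so merged entries can still be checked by hand.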
Filtering Search (Derwent Manual Code)

All searches were filtered to limit results to two general Derwent manual code taxonomies within which cell technologies occur: D16 (concerned with the fermentation industry, including microbiology, antibody production, cell and tissue culture, and genetic engineering) and B04 (concerned with biotechnology for natural products and polymers, including testing of microorganisms for pathogenicity, testing of chemicals for mutagenicity or human toxicity, and fermentative production of DNA or RNA) (Derwent class codes: http://images.webofknowledge.com/WOK48B5/help/DII/hcodes_classes.html#class_d, accessed 12/29/2013). This filtering prevented keyword searches from returning results from unrelated fields such as electrical engineering or botany.

Keyword search stream

A number of keyword search streams were performed, and at least the first 100 documents of each were manually reviewed to compare relevancy (Supp. Table 1). When a search yielded fewer documents than the previous search, the resulting datasets were compared by reviewing at least the first 100-300 omitted documents to ensure that relevant patents had not been removed.

Cleaning the dataset

The documents yielded by the search terms were further cleaned by sub-searching terms and by manual review of those sub-search results to remove unrelated or denied patents. The result was a cleaner dataset, erring on the side of over-inclusion (Supp. Table 1).
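The class filter and the omitted-document comparison described in this section were carried out inside Thomson Innovation. As a minimal sketch, assuming the search results had been exported as simple records, the same two steps could look like this. The `Record` type, its field names and both helper functions are hypothetical. The date bounds are those stated in the text.

```python
from dataclasses import dataclass
from datetime import date

ALLOWED_CLASSES = {"D16", "B04"}  # Derwent classes named in the text
START, END = date(2006, 9, 1), date(2013, 12, 27)  # search period stated in the text


@dataclass(frozen=True)
class Record:
    doc_id: str                  # hypothetical document identifier
    derwent_classes: frozenset   # Derwent class codes assigned to the document
    published: date              # publication date


def passes_filter(rec: Record) -> bool:
    """Keep a record only if it carries D16 or B04 and falls inside the search period."""
    return bool(rec.derwent_classes & ALLOWED_CLASSES) and START <= rec.published <= END


def omitted_for_review(broader, narrower, limit=300):
    """List IDs that the narrower search dropped, in the broader result order, capped at `limit`."""
    kept = {r.doc_id for r in narrower}
    return [r.doc_id for r in broader if r.doc_id not in kept][:limit]
```

The cap of 300 mirrors the upper end of the 100-300 omitted documents reviewed. The reviewer still decides by hand whether any omitted document is relevant.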

Technology areas within the PSC landscape

Patents within the broad PSC landscape in Supplemental Figure 1 were sub-searched to determine the number of patents related to different technology focus areas (Supplemental Table 1). These results are overly inclusive and may capture some unrelated patents. In particular, the number of differentiation patents versus cell-culturing patents is difficult to estimate from keyword sub-searches because the two areas share common terminology. Nonetheless, the figures provide an approximation. The total number of PSC granted patents and pending applications is broken down by technology category (generation, differentiation, cell-culturing, other), cell source (fibroblast, adipose-derived, cord blood) and cell product (cardiac, neural, hematopoietic, ophthalmic). The different cell types are a good indication of the most active therapeutic areas of medicine in which research was conducted. The leading areas for PSC research divide broadly among cardiovascular, ophthalmic, cancer and neurological, which can best be seen in the peaks on the patent map in Supplemental Figure 1.

Nature Biotechnology: doi:10.1038/nbt.2975

2. Figures

©Thomson Reuters

Supplemental Figure 1 Broader Pluripotent Stem Cell Patent Map

The map is a visual representation of the broader pluripotent cell landscape, including patents relating to ESCs and iPSCs. It serves as a macroscopic snapshot that places the more specific iPSC field within its wider context. These PSC-related patents cite pluripotency as an aspect of the patent, whether ancillary or otherwise. The dots represent patents and are clustered together according to similar terminology in the title and abstract. The more similar the patents’ terminology, the more elevated the cluster. White snow-capped peaks represent the areas of highest patent density. The landscape comprises 2,797 patented inventions, with 9,189 documents searched from September 2006 through 2013 inclusive. This number is based on a keyword search of each patent’s title, abstract and claims sections for variations of the word “pluripotent” (filtered to retrieve results only in the related field of mammalian cells). For illustrative purposes, the red dots are the 2,057 patents that contain the phrase “potency determining factor” in the title or abstract.


©Thomson Reuters

Supplemental Figure 2 iPSC patent map displaying ‘reprogramming’ patents in red

The broad iPSC-related technology dataset comprises 1,388 iPSC-related patented inventions (families), combining published patent applications and granted patents from September 2006 - December 27, 2013. Here, the map displays the spread of patents related to cellular ‘reprogramming’ technologies as red discs. They are widely distributed across the map and concentrate in the following areas: producing animal embryos, liver disease, retinal epithelium cells, cardiomyocyte production and antineoplastic research. The therapeutic indications named in several peaks, from liver disease to spinal cord injury, also reflect a broad range of research using differentiation techniques for specialized cell and tissue types.

3. Tables

Supplemental Table 1 Keyword Search Streams

PSC dataset: pluripoten*

Broad Initial iPSC dataset

CTB=((((induc* ADJ pluripotent ADJ stem ADJ cell*) or (induc* NEAR3 pluripoten*) or ((reprogram* or re-program* or dedifferentiat* or de-differentiat*) NEAR2 (somatic or cell*))) AND (pluripoten* NEAR8 (dedifferentiat* OR retro-differentiat* or retrodifferentiat* OR de-differentiat* OR induc* OR *2program* OR generat*)))) AND DC=(D16 or b04) AND DP>=(20060901) AND DP<=(20131231);


Scientific Literature Search

(induced ADJ pluripotent ADJ stem ADJ cell*1) OR (INDUC* NEAR5 PLURIPOTEN*)) and (produc* or generat* or reprogram* or program*)) NOT SSC=("PHYSICS" OR "BEETLE" or "COMPUTER SCIENCE" OR "ENGINEERING")

*Including “iPS” retrieved 844 additional publications. These were manually reviewed, and 62 relevant publications were selected and merged into the final workfile of 3312 documents. 9 publications were added from the following search (with a total of 3992 documents):

(induced ADJ pluripotent ADJ stem ADJ cell*1) OR IPS) and (produc* or generat* or reprogram* or program*)) NOT SSC=("PHYSICS" OR "BEETLE" or "COMPUTER SCIENCE" OR "ENGINEERING")

Exemplar Search Stream reviewed, compared and eliminated

(CTB=(((((induced ADJ pluripotent ADJ stem ADJ cell*) OR (iPS*2 NOT (ipsa OR ipsen OR IPSS OR Ipse OR IPSP OR insect* OR pest*1 OR computer OR islet ADJ producing OR ipsi OR IPSO OR sexual OR plant OR (interrogation ADJ position*) OR film OR saccharide OR optical OR sensor OR (idiopathic ADJ parkinson* ADJ syndrome) OR (IPS ADJ receptor) OR (immun* ADJ privilege* ADJ site))) OR (iPS ADJ cell*)) NEAR10 (((pluripotent ADJ stem) OR iPS) ADJ cell*)) AND (*5different* OR production OR producing OR produce* OR obtain* OR induc* OR *2program* OR derive* OR generat* OR engineer*))) AND DP>=(20030101)) AND (DC=(B04 or D16));

Other keywords for iPSC creation searched (results compared to ensure relevant patents were not excluded)

induc* Near8 pluripoten* (4136 total documents) vs. induc* AND pluripoten* (6080 documents)
- induced type stem cell (iPS cell)
- production OR producing OR produce Near5 pluripoten*
- artificial Near6 stem adj cell or artificial Near8 stem adj cell or artificial Near3 pluripoten*
- human parthenogenetic stem cell
- induc* pluripotency
- induc* Near3 stem cell
- deriv*/derivation Near6 pluripoten*
- *5differentiat* Near6 pluripoten* (ie retro-differentiate)
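The search streams above rely on truncation (`*`) and proximity (`NEARn`) operators. As a rough illustration of why induc* Near8 pluripoten* returns fewer documents than induc* AND pluripoten*, the sketch below approximates the two operators on plain text. It assumes that `*` at the end of a term matches any run of word characters and that `NEARn` means the two terms occur within n word positions of each other in either order. The search platform's exact counting rules may differ, and leading or limited truncation such as `*2program*` is not handled. The function names are hypothetical.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a term with optional trailing * into a regex for a single word."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(rf"{stem}{suffix}", re.IGNORECASE)


def near(text: str, left: str, right: str, n: int) -> bool:
    """True if a word matching `left` lies within n positions of one matching `right`."""
    words = re.findall(r"[\w-]+", text.lower())
    left_re, right_re = wildcard_to_regex(left), wildcard_to_regex(right)
    left_pos = [i for i, w in enumerate(words) if left_re.fullmatch(w)]
    right_pos = [i for i, w in enumerate(words) if right_re.fullmatch(w)]
    return any(i != j and abs(i - j) <= n for i in left_pos for j in right_pos)


if __name__ == "__main__":
    title = "Method for inducing pluripotent stem cells from somatic cells"
    print(near(title, "induc*", "pluripoten*", 3))
```

A plain AND would accept any document containing both terms anywhere, so the proximity window is what trims the loosely related hits.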

Supplemental Table 2 Dataset Summary

| | Initial Dataset | Cleaned Dataset 1 | Cleaned Dataset 2 |
|---|---|---|---|
| Number of Patent Families (ie Total Number of Patented Inventions) | 1,905 | 1,015 | 901 |
| Number of Patent Publications | 6,335 | 3,323 | _ |
| Number of Granted Patents | 444 | 131 | 115 |
| Peak Patent Publication Year | 2012 | 2012 [256 inventions] | 2012 [228 inventions] |

Search Date Range of Published Patents (Incl.): (Sept.) 2006 - 2013 (Dec.)

Top Inventor Country: US

Top Patent Issuing Country: US

Top Patent Assignee: University of Kyoto

Date of report creation: 27-Dec-13
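To make the effect of each cleaning step in Supplemental Table 2 easier to read, the patent-family counts can be expressed as a share of the initial dataset. This short calculation is an illustration and not part of the original analysis. Only the three family counts come from the table, and the function name is hypothetical.

```python
# Patent-family counts copied from Supplemental Table 2.
FAMILIES = {"Initial Dataset": 1905, "Cleaned Dataset 1": 1015, "Cleaned Dataset 2": 901}


def retained_share(counts, baseline="Initial Dataset"):
    """Express each dataset's family count as a percentage of the baseline dataset."""
    base = counts[baseline]
    return {name: round(100 * n / base, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in retained_share(FAMILIES).items():
        print(f"{name}: {pct}% of initial families")
```

On these counts, roughly half of the initially harvested families remain after the first cleaning pass.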

Supplemental Table 3: Keyword Clustering of PSC Patents

| Term | Number of patented inventions containing term (from a total of 2,841 families of 9,504 patents) |
|---|---|
| differentiat*, differentiat* | 1,300-1,400* |
| cell culturing | 820 |
| iPSC generation | 650* |
| scaffold, matrix, apparatus/device | 40-60 |
| cord blood cell* | 40 |
| adipose-derived or adipose | 200 |
| cardiomyocyte | 250 |
| cardiomyocyte*, cardiac, coronary | 1,313 |
| Fibroblast or dermal fibroblast | 850 |
| hematopoietic | 470 |
| neural or neuron* | 750 |

*The estimate is based on a keyword subsearch and manual review of the first 150 documents yielded. The estimated generation of iPSCs includes culturing techniques, growth factors and other essential steps. Estimates do not account for invalid, discontinued or denied applications.
